Re: Inconsistent replicas in a shard

2017-10-06 Thread Webster Homer
This SolrCloud has had some issues of late. We had a network glitch that
caused a shard leader of one of the collections to write over 5,000
zero-length tlogs to its filesystem. Whenever it started up it ran out of
file handles, which killed the IndexWriter and caused lots of unhappy
collections. This may be related to that. No one was alerted to the errors
for several days.

These replicas have been out of sync for a while. Indeed, one of the
collections we just did a data load to today, and it stayed bad. I can say
that we have NEVER done a FORCELEADER, unless some internal SolrCloud code
does this.

The data load we did had no errors and both replicas are in the active
state; no replica was offline.

Oh, I just went back and looked, and the empty replica in the collection we
just loaded has now caught up and has data. It took a while, but it now
matches its leader. Perhaps all we need to do is run new data loads to the
out-of-sync collections?

On Fri, Oct 6, 2017 at 2:04 PM, Webster Homer wrote:

> We are using Solr 6.2.0 in solrcloud mode
>
> I have a QA solrcloud that has multiple collections. All collections have
> 2 shards each with two replicas.
>
> I have several replicas where the numDocs in the same shard do not match.
> In two collections with three different shards I have one replica with data
> and the other has no data. All six replicas appear healthy in the Solr
> console.
>
> So how does that happen where two replicas in the same shard have
> different amounts of data?
>
> How do you diagnose this when the replicas are active and seemingly
> healthy?
>
> How do I get the replicas with no data to pull data from their leader? In all
> three cases the replica with data is the leader.
>
> I also see two other collections where the replicas' numDocs don't quite
> match. In those two cases the leader has a few more docs than the other
> replica.
>
> How do I remedy this situation?
>
> This SolrCloud is a target of CDCR replication, but I'm not sure why that
> would matter, since I believe CDCR has the shard leaders communicate and the
> followers should just get their updates from their leader as they would
> from a normal update.
>
> I'm just lucky that this is not a production SolrCloud! Still need to know
> how to fix it.
>
> Thanks!
>

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.


[ANNOUNCE] Apache Solr 7.0.1 released

2017-10-06 Thread Steve Rowe
6 October 2017, Apache Solr™ 7.0.1 available 

Solr is the popular, blazing fast, open source NoSQL search platform from the 
Apache Lucene project. Its major features include powerful full-text search, 
hit highlighting, faceted search and analytics, rich document parsing, 
geospatial search, extensive REST APIs as well as parallel SQL. Solr is 
enterprise grade, secure and highly scalable, providing fault tolerant 
distributed search and indexing, and powers the search and navigation 
features of many of the world's largest internet sites. 

This release includes 2 bug fixes since the 7.0.0 release: 

* Solr 7.0 cannot read indexes from 6.x versions. 

* Message "Lock held by this virtual machine" during startup. 
Solr is trying to start some cores twice. 

Furthermore, this release includes Apache Lucene 7.0.1 which includes 1 bug 
fix since the 7.0.0 release. 

The release is available for immediate download at: 

http://www.apache.org/dyn/closer.lua/lucene/solr/7.0.1 

Please read CHANGES.txt for a detailed list of changes: 

https://lucene.apache.org/solr/7_0_1/changes/Changes.html 

Please report any feedback to the mailing lists 
(http://lucene.apache.org/solr/discussion.html) 

Note: The Apache Software Foundation uses an extensive mirroring 
network for distributing releases. It is possible that the mirror you 
are using may not have replicated the release yet. If that is the 
case, please try another mirror. This also goes for Maven access.

spell-check does not return collations when using search query with filter

2017-10-06 Thread Arnold Bronley
When 'polt' is passed as the keyword, both the suggestions and collations
parameters are returned. But if I pass 'tag:polt' as the search query, then
only the suggestions parameter is returned. Is this a bug?


Re: Inconsistent replicas in a shard

2017-10-06 Thread Erick Erickson
Shouldn't be happening of course (replicas with different numbers of docs),
at least not permanently. It can regularly happen on a _temporary_ basis,
however. And there are ways you can cause this to happen permanently.
Here's an outline.

> Temporarily out of sync: because commits happen at different wall-clock
> times, different replicas in the same shard can be skewed for the
> autocommit interval. Ways to check:
>> Stop indexing, wait for the CDCR to catch up _plus_ your autocommit interval,
>> and check.
>> Fire a query at the replica that cuts off some time in the past and add
>> distrib=false, then examine the number of hits returned. The query looks
>> something like
>> "...solr/collection1_shard1_replica1/query?q=*:*&fq=timestamp:[* TO NOW-(2x
>> autocommit interval + CDCR latency)]&distrib=false". This requires a
>> reliable timestamp, of course.

> Permanently out of sync:
>> If you ever fired a FORCELEADER at a replica, you are risking this.
>> If you stopped the (non-leader) replica and kept indexing, then stopped the
>> leader and started the replica back up: Solr does the best it can to
>> preserve the data, but if a replica is offline it doesn't have updates in
>> the tlog to replay. So when leader election happens, if the old replica is
>> elected leader it won't have all the updates.
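The per-replica count check above can be sketched as below. The host, core name, and `timestamp` field are placeholders; a real check would issue the resulting URL against each replica of the shard and compare the numFound values.

```python
from urllib.parse import urlencode

def replica_count_url(core_url, cutoff="NOW-1HOUR"):
    """Build a doc-count query against a single replica.

    distrib=false stops SolrCloud from fanning the request out, so the
    core answers only from its own index; the fq excludes recent docs
    that may still be inside the autocommit/CDCR window.
    """
    params = urlencode({
        "q": "*:*",
        "fq": "timestamp:[* TO %s]" % cutoff,  # assumes a reliable timestamp field
        "rows": 0,                             # only numFound is needed
        "distrib": "false",
    })
    return "%s/query?%s" % (core_url, params)

url = replica_count_url("http://host:8983/solr/collection1_shard1_replica1")
```

If the numFound values still differ once the autocommit interval has passed, the replicas are genuinely out of sync.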


Best,
Erick

On Fri, Oct 6, 2017 at 12:04 PM, Webster Homer  wrote:
> We are using Solr 6.2.0 in solrcloud mode
>
> I have a QA solrcloud that has multiple collections. All collections have 2
> shards each with two replicas.
>
> I have several replicas where the numDocs in the same shard do not match.
> In two collections with three different shards I have one replica with data
> and the other has no data. All six replicas appear healthy in the Solr
> console.
>
> So how does that happen where two replicas in the same shard have different
> amounts of data?
>
> How do you diagnose this when the replicas are active and seemingly healthy?
>
> How do I get the replicas with no data to pull data from their leader? In all
> three cases the replica with data is the leader.
>
> I also see two other collections where the replicas' numDocs don't quite
> match. In those two cases the leader has a few more docs than the other
> replica.
>
> How do I remedy this situation?
>
> This SolrCloud is a target of CDCR replication, but I'm not sure why that
> would matter, since I believe CDCR has the shard leaders communicate and the
> followers should just get their updates from their leader as they would
> from a normal update.
>
> I'm just lucky that this is not a production SolrCloud! Still need to know
> how to fix it.
>
> Thanks!
>


Inconsistent replicas in a shard

2017-10-06 Thread Webster Homer
We are using Solr 6.2.0 in solrcloud mode

I have a QA solrcloud that has multiple collections. All collections have 2
shards each with two replicas.

I have several replicas where the numDocs in the same shard do not match.
In two collections with three different shards I have one replica with data
and the other has no data. All six replicas appear healthy in the Solr
console.

So how does that happen where two replicas in the same shard have different
amounts of data?

How do you diagnose this when the replicas are active and seemingly healthy?

How do I get the replicas with no data to pull data from their leader? In all
three cases the replica with data is the leader.

I also see two other collections where the replicas' numDocs don't quite
match. In those two cases the leader has a few more docs than the other
replica.

How do I remedy this situation?

This SolrCloud is a target of CDCR replication, but I'm not sure why that
would matter, since I believe CDCR has the shard leaders communicate and the
followers should just get their updates from their leader as they would
from a normal update.

I'm just lucky that this is not a production SolrCloud! Still need to know
how to fix it.

Thanks!



RE: Complexphrase treats wildcards differently than other query parsers

2017-10-06 Thread Allison, Timothy B.
That could be it.  I'm not able to reproduce this with trunk.  More next week.

In trunk, if I add this to schema15.xml:

  [fieldType and field definitions stripped by the mail archiver: an analysis
  chain with a mapping char filter that folds ISO-Latin-1 accented characters,
  plus an "iso-latin1" field that uses it]
This test passes.

  @Test
  public void testCharFilter() {
assertU(adoc("iso-latin1", "cr\u00E6zy tr\u00E6n", "id", "1"));
assertU(commit());
assertU(optimize());

assertQ(req("q", "{!complexphrase} iso-latin1:craezy")
, "//result[@numFound='1']"
, "//doc[./str[@name='id']='1']"
);

assertQ(req("q", "{!complexphrase} iso-latin1:traen")
, "//result[@numFound='1']"
, "//doc[./str[@name='id']='1']"
);

assertQ(req("q", "{!complexphrase} iso-latin1:caezy~1")
, "//result[@numFound='1']"
, "//doc[./str[@name='id']='1']"
);

assertQ(req("q", "{!complexphrase} iso-latin1:crae*")
, "//result[@numFound='1']"
, "//doc[./str[@name='id']='1']"
);

assertQ(req("q", "{!complexphrase} iso-latin1:*aezy")
, "//result[@numFound='1']"
, "//doc[./str[@name='id']='1']"
);

assertQ(req("q", "{!complexphrase} iso-latin1:crae*y")
, "//result[@numFound='1']"
, "//doc[./str[@name='id']='1']"
);

assertQ(req("q", "{!complexphrase} iso-latin1:\"craezy traen\"")
, "//result[@numFound='1']"
, "//doc[./str[@name='id']='1']"
);

assertQ(req("q", "{!complexphrase} iso-latin1:\"caezy~1 traen\"")
, "//result[@numFound='1']"
, "//doc[./str[@name='id']='1']"
);

assertQ(req("q", "{!complexphrase} iso-latin1:\"craez* traen\"")
, "//result[@numFound='1']"
, "//doc[./str[@name='id']='1']"
);

assertQ(req("q", "{!complexphrase} iso-latin1:\"*aezy traen\"")
, "//result[@numFound='1']"
, "//doc[./str[@name='id']='1']"
);

assertQ(req("q", "{!complexphrase} iso-latin1:\"crae*y traen\"")
, "//result[@numFound='1']"
, "//doc[./str[@name='id']='1']"
);
  }



-Original Message-
From: Bjarke Buur Mortensen [mailto:morten...@eluence.com] 
Sent: Friday, October 6, 2017 6:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Complexphrase treats wildcards differently than other query parsers

Thanks a lot for your effort, Tim.

Looking at it from the Solr side, I see some use of local classes. The snippet 
below in particular caught my eye (in 
solr/core/src/java/org/apache/solr/search/ComplexPhraseQParserPlugin.java).
The instance of ComplexPhraseQueryParser is not the clean one from Lucene, but
a modified one. If any of the modifications mess with the analysis logic,
well, then that might answer it.

What do you make of it?

lparser = new ComplexPhraseQueryParser(defaultField,
    getReq().getSchema().getQueryAnalyzer()) {

  protected Query newWildcardQuery(org.apache.lucene.index.Term t) {
    try {
      org.apache.lucene.search.Query wildcardQuery =
          reverseAwareParser.getWildcardQuery(t.field(), t.text());
      setRewriteMethod(wildcardQuery);
      return wildcardQuery;
    } catch (SyntaxError e) {
      throw new RuntimeException(e);
    }
  }

  private Query setRewriteMethod(org.apache.lucene.search.Query query) {
    if (query instanceof MultiTermQuery) {
      ((MultiTermQuery) query).setRewriteMethod(
          org.apache.lucene.search.MultiTermQuery.SCORING_BOOLEAN_REWRITE);
    }
    return query;
  }

  protected Query newRangeQuery(String field, String part1, String part2,
      boolean startInclusive, boolean endInclusive) {
    boolean reverse =
        reverseAwareParser.isRangeShouldBeProtectedFromReverse(field, part1);
    return super.newRangeQuery(field,
        reverse ? reverseAwareParser.getLowerBoundForReverse() : part1,
        part2,
        startInclusive || reverse,
        endInclusive);
  }
};

Thanks,
Bjarke




Re: Error adding replica after a delete replica

2017-10-06 Thread Webster Homer
Unfortunately as developers we have no access to the actual solr nodes, and
certainly no privileges to delete stuff, even in the development
environment.

On Fri, Oct 6, 2017 at 1:34 PM, Erick Erickson wrote:

> for future reference, a less harsh way to fix it would be to
> > stop the Solr instance where the replica resides
> > rm -rf SOLR_HOME/collection1_replica1_shard1
>
> where "collection1_replica1_shard1" is the directory of the replica in
> question, you should see a "core.properties" file in that directory...
>
> That said, this shouldn't be necessary, just in case.
>
> Best,
> Erick
>
> > On Fri, Oct 6, 2017 at 10:34 AM, Webster Homer wrote:
> > The replica was deleted using the deleteReplica collections API call. The
> > call timed out, but eventually completed. However something still held a
> > write lock, and it was still held a day later, but the replica was removed
> > as far as we could tell in the Solr admin console.
> >
> > Since it was a development collection, we "fixed" the problem by deleting
> > the collection and re-creating it.
> >
> >
> >
> > On Fri, Oct 6, 2017 at 2:44 AM, Emir Arnautović <
> > emir.arnauto...@sematext.com> wrote:
> >
> >> Hi,
> >> How did you delete the replica? Did you see any errors in logs after
> >> deleting?
> >> How did/does it look from ZK perspective after deleting that replica?
> >>
> >> Thanks,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >> > On 5 Oct 2017, at 16:17, Webster Homer wrote:
> >> >
> >> > A colleague of mine was testing how solrcloud replica recovery works. We
> >> > have had a lot of issues with replicas going into recovery mode, replicas
> >> > down and in recovery failed states. So to test, he deleted a healthy
> >> > replica in one of our development environments. First the delete operation
> >> > timed out, but the replica appears to be gone. However, addReplica always
> >> > fails with this error:
> >> >
> >> > Error CREATEing SolrCore 'sial-content-citations_shard1_replica1': Unable
> >> > to create core [sial-content-citations_shard1_replica1] Caused by: Lock
> >> > held by this virtual machine: /var/solr/data/sial-content-
> >> > citations_shard1_replica1/data/index/write.lock
> >> >
> >> > This cloud has 4 nodes. The collection has two shards with two replicas
> >> > per shard. They are all hosted in a Google Cloud environment.
> >> >
> >> > So if the delete deleted the replica, why would it then hold a lock? We
> >> > want to understand this.
> >> >
> >> > We are using Solr 6.2.0
> >> >
> >>
> >>
> >

Re: Error adding replica after a delete replica

2017-10-06 Thread Erick Erickson
for future reference, a less harsh way to fix it would be to
> stop the Solr instance where the replica resides
> rm -rf SOLR_HOME/collection1_replica1_shard1

where "collection1_replica1_shard1" is the directory of the replica in
question, you should see a "core.properties" file in that directory...

That said, this shouldn't be necessary, just in case.

Best,
Erick

On Fri, Oct 6, 2017 at 10:34 AM, Webster Homer  wrote:
> The replica was deleted using the deleteReplica collections API call. The
> call timed out, but eventually completed. However something still held a
> write lock, and it was still held a day later, but the replica was removed
> as far as we could tell in the solr admin console.
>
> Since it was a development collection, we "fixed" the problem by deleting
> the collection and re-creating it
>
>
>
> On Fri, Oct 6, 2017 at 2:44 AM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi,
>> How did you delete replica? Did you see any errors in logs after deleting?
>> How did/does it look from ZK perspective after deleting that replica?
>>
>> Thanks,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>> > On 5 Oct 2017, at 16:17, Webster Homer  wrote:
>> >
>> > A colleague of mine was testing how solrcloud replica recovery works. We
>> > have had a lot of issues with replicas going into recovery mode, replicas
>> > down and in recovery failed states.  So to test, he deleted a healthy
>> > replica in one of our development environments. First the delete operation timed out,
>> > but the replica appears to be gone. However, addReplica always fails with
>> > this error:
>> >
>> > Error CREATEing SolrCore 'sial-content-citations_shard1_replica1': Unable
>> > to create core [sial-content-citations_shard1_replica1] Caused by: Lock
>> > held by this virtual machine: /var/solr/data/sial-content-
>> > citations_shard1_replica1/data/index/write.lock
>> >
>> > This cloud has 4 nodes. The collection has two shards with two replicas
>> > per shard. They are all hosted in a Google Cloud environment.
>> >
>> > So if the delete deleted the replica, why would it then hold a lock? We
>> > want to understand this.
>> >
>> > We are using Solr 6.2.0
>> >
>>
>>
>


Re: vespa

2017-10-06 Thread Doug Turnbull
Diego, I wrote this article today after our initial high-level review at
OpenSource Connections

http://opensourceconnections.com/blog/2017/10/06/vespa-vs-lucene-initial-impressions/

On Wed, Sep 27, 2017 at 11:44 AM Diego Ceccarelli (BLOOMBERG/ LONDON) <
dceccarel...@bloomberg.net> wrote:

> Hi all,
>
> Yesterday Yahoo open sourced Vespa (i.e., "The open big data serving
> engine: Store, search, rank and organize big data at user serving time"),
> and looking at the API they provide search.
> I did a quick search on the code for "lucene", getting only 5 results.
>
> Does anyone know more about the framework? Does it provide a new way to do
> search? How does it compare with Solr?
>
> https://github.com/vespa-engine/vespa
> http://vespa.ai
>
> --
Consultant, OpenSource Connections. Contact info at
http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)


Re: Error adding replica after a delete replica

2017-10-06 Thread Webster Homer
The replica was deleted using the deleteReplica collections API call. The
call timed out, but eventually completed. However something still held a
write lock, and it was still held a day later, but the replica was removed
as far as we could tell in the solr admin console.

Since it was a development collection, we "fixed" the problem by deleting
the collection and re-creating it



On Fri, Oct 6, 2017 at 2:44 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi,
> How did you delete replica? Did you see any errors in logs after deleting?
> How did/does it look from ZK perspective after deleting that replica?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 5 Oct 2017, at 16:17, Webster Homer  wrote:
> >
> > A colleague of mine was testing how solrcloud replica recovery works. We
> > have had a lot of issues with replicas going into recovery mode, replicas
> > down and in recovery failed states.  So to test, he deleted a healthy
> > replica in one of our development environments. First the delete operation timed out,
> > but the replica appears to be gone. However, addReplica always fails with
> > this error:
> >
> > Error CREATEing SolrCore 'sial-content-citations_shard1_replica1': Unable
> > to create core [sial-content-citations_shard1_replica1] Caused by: Lock
> > held by this virtual machine: /var/solr/data/sial-content-
> > citations_shard1_replica1/data/index/write.lock
> >
> > This cloud has 4 nodes. The collection has two shards with two replicas
> > per shard. They are all hosted in a Google Cloud environment.
> >
> > So if the delete deleted the replica, why would it then hold a lock? We
> > want to understand this.
> >
> > We are using Solr 6.2.0
> >
>
>



Re: Doubt about facet with dates

2017-10-06 Thread Chris Hostetter

: https://lucene.apache.org/solr/guide/6_6/working-with-dates.html#WorkingwithDates-DateMath
: 
: Your query would be something like
: mydate:[* TO NOW/DAY] AND mydate:[NOW+1DAY/DAY TO *]

specifically you could use those with "facet.query" ... instead of trying 
to do them with "facet.range"


-Hoss
http://www.lucidworks.com/
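Hoss's suggestion, expressed as request parameters (a sketch; the field name "mydate" comes from the thread): each facet.query clause is counted independently and reported under facet_queries in the response.

```python
from urllib.parse import urlencode

# Two independent facet.query clauses instead of a facet.range:
# each one returns its own count in the facet_queries section.
params = urlencode([
    ("q", "*:*"),
    ("rows", 0),
    ("facet", "true"),
    ("facet.query", "mydate:[* TO NOW/DAY]"),
    ("facet.query", "mydate:[NOW+1DAY/DAY TO *]"),
])
```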


Re: Slow when query with cursorMark

2017-10-06 Thread Chris Hostetter

: I would guess that your first query is hitting the queryResultCache.

yeah, that's almost certainly why you're seeing the "page#0" query be so 
fast -- but IIRC the cursorMark pages can't be cached in the same way?  

Where you'll start to see significant speed-ups is in the subsequent
"pages"...

https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/



-Hoss
http://www.lucidworks.com/
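The cursor loop itself can be sketched as follows. `search` stands in for whatever client call issues the HTTP request (here stubbed with canned responses), and the sort must include the uniqueKey field for cursorMark to work.

```python
def iterate_all(search, params):
    """Deep-page with cursorMark: resend the same sorted query, passing
    back nextCursorMark until it stops changing.  `search` is any
    callable that takes params and returns a Solr-style response dict."""
    cursor = "*"
    while True:
        resp = search(dict(params, cursorMark=cursor))
        yield from resp["response"]["docs"]
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:   # cursor unchanged: no more results
            break
        cursor = next_cursor

# Stub standing in for a real Solr request, keyed by cursor value.
pages = {"*": {"response": {"docs": [{"id": 1}, {"id": 2}]}, "nextCursorMark": "A"},
         "A": {"response": {"docs": [{"id": 3}]}, "nextCursorMark": "A"}}
docs = list(iterate_all(lambda p: pages[p["cursorMark"]],
                        {"q": "*:*", "sort": "id asc", "rows": 2}))
```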


Re: FieldValueCache in solr 6.6

2017-10-06 Thread Yonik Seeley
On Fri, Oct 6, 2017 at 12:45 PM, sile  wrote:
> Hi Yonik,
>
> Thanks for your answer :).
>
> It works.
>
> Another question:
>
> What is recommended to be used in solr 6.6 for faceting (docValues or
> UnInvertedField), because UnInvertedField performs better for subsequent
> requests?
>
> I assume that docValues is more beneficial in terms of heap memory use, but
> should I use fieldValueCache instead if hit ratio is good?

docValues is a safer default. UIF needs to rebuild every time the
index is changed (which is also one reason why it's faster once it is
built).
If one uses real docValues (at index time), then there will be even
less heap memory used.
A note to those trying out UIF: method=uif is an execution hint only
and will be ignored if you've indexed docValues. We should probably
add some way of forcing it at some point.

-Yonik


Re: FieldValueCache in solr 6.6

2017-10-06 Thread sile
Hi Yonik,

Thanks for your answer :).

It works. 

Another question:

What is recommended to be used in solr 6.6 for faceting (docValues or
UnInvertedField), because UnInvertedField performs better for subsequent
requests? 

I assume that docValues is more beneficial in terms of heap memory use, but
should I use fieldValueCache instead if hit ratio is good? 

Regards,

Sile





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr not preserving milliseconds precision for zero milliseconds

2017-10-06 Thread Pratik Patel
Thanks for the clarification. I'll change my code to accommodate this
behavior.

On Thu, Oct 5, 2017 at 6:24 PM, Chris Hostetter wrote:

> : > "startTime":"2013-02-10T18:36:07.000Z"
> ...
> : handler. It gets added successfully but when I retrieve this document
> back
> : using "id" I get following.
> ...
> : > "startTime":"2013-02-10T18:36:07Z",
> ...
> : As you can see, the milliseconds precision in date field "startTime" is
> : lost. Precision is preserved for non-zero milliseconds but it's being
> lost
> : for zero values. The field type of "startTime" field is as follows.
> ...
> : Does anyone know how I can preserve milliseconds even if its zero? Or is
> it
> : not possible at all?
>
> ms precision is being preserved -- but as you mentioned, the fractional
> seconds you indexed are "0" therefore they are not needed/preserved when
> writing the response to maintain ms precision.
>
> This is the correct formatting as specified in the specification for the
> time format that Solr follows...
>
> https://lucene.apache.org/solr/guide/working-with-dates.html
> https://www.w3.org/TR/xmlschema-2/#dateTime
>
> >>> 3.2.7.2 Canonical representation
> >>> ...
> >>> The fractional second string, if present, must not end in '0';
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: FieldValueCache in solr 6.6

2017-10-06 Thread Yonik Seeley
If you're using regular faceting (as opposed to the JSON Facet API),
you can try facet.method=uif
https://issues.apache.org/jira/browse/SOLR-8466

Background:
UIF (UnInvertedField, whose entries are what the FieldValueCache holds) was
completely removed from use at some point in the 5.x timeframe.
It remained part of the JSON Facet API though, and so SOLR-8466 later added
back support in regular faceting by calling JSON Faceting when
facet.method=uif

-Yonik


On Fri, Oct 6, 2017 at 11:22 AM, sile  wrote:
> Hi,
>
> I'm new to solr, and I'm using solr 6.6.
>
> I did some testing with solr 4.9 and 6.6 on the same index with the same
> faceting queries on the multivalued fields.
>
> In the first run (with an empty cache) solr 6.6 performs much better, but when I
> run the same queries a couple more times solr 4.9 is a little bit faster than
> solr 6.6.
>
> FieldValueCache is empty in solr 6.6, and solr 4.9 uses this cache with a
> good hit ratio (0.9).
>
> I have specified this cache inside solrconfig.xml for both solr 6.6 and 4.9.
>
> I have also tried same thing by reindexing documents with docValues set to
> false for the faceting fields and run queries again and FieldValueCache is
> still empty.
>
> Is it possible to use FieldValueCache in solr 6.6?
>
> Thanks in advance.
>
> Regards,
>
> Sile
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


FieldValueCache in solr 6.6

2017-10-06 Thread sile
Hi,

I'm new to solr, and I'm using solr 6.6.

I did some testing with solr 4.9 and 6.6 on the same index with the same
faceting queries on the multivalued fields. 

In the first run (with an empty cache) solr 6.6 performs much better, but when I
run the same queries a couple more times solr 4.9 is a little bit faster than
solr 6.6.

FieldValueCache is empty in solr 6.6, and solr 4.9 uses this cache with a
good hit ratio (0.9).

I have specified this cache inside solrconfig.xml for both solr 6.6 and 4.9.

I have also tried same thing by reindexing documents with docValues set to
false for the faceting fields and run queries again and FieldValueCache is
still empty.

Is it possible to use FieldValueCache in solr 6.6?  

Thanks in advance.

Regards,

Sile



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr test runs: test skipping logic

2017-10-06 Thread Nawab Zada Asad Iqbal
Thanks Chris,

That very likely is the reason. I had noticed the seed and realized that it
will be controlling the random input generation for the tests to make
failures reproducible. However, i didn't consider that it can also cause
test skipping.

Thanks!
Nawab


On Thu, Oct 5, 2017 at 3:13 PM, Chris Hostetter 
wrote:

>
> : I am seeing that in different test runs (e.g., by executing 'ant test' on
> : the root folder in 'lucene-solr') a different subset of tests are
> skipped.
> : Where can I find more about it? I am trying to create parity between test
> : successes before and after my changes and this is causing  confusion.
>
> The test randomization logic creates an arbitrary "master seed" that is
> assigned by ant.  This master seed is
> then used to generate some randomized default properties for the
> forked JVMs (default timezones, default Locale, default charset, etc...)
> Each test class run in a forked JVM then gets its own Random seed
> (generated from the master seed as well), which the solr test-framework
> uses to randomize some more things (that are specific to the solr
> test-framework).
>
> In some cases, tests have @Assume or assumeThat(...) logic if we know
> that certain tests are completely incompatible with certain randomized
> aspects of the environment -- for example: some tests won't bother to run
> if the randomized Locale uses "tr" because of external third-party
> dependencies that break with this Locale (due to uppercase/lowercase
> behavior).
>
> This is most likely the reason you are seeing a diff "set" of tests run
> at diff times.  But if you want true parity between test runs, use the
> same master seed -- which is printed at the beginning of every "ant
> test" run, as well as any time a test fails, and can be overridden on the
> ant command line for future runs.
>
> run "ant test-help" for the specifics.
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: FilterCache size should reduce as index grows?

2017-10-06 Thread Yonik Seeley
On Fri, Oct 6, 2017 at 6:50 AM, Toke Eskildsen  wrote:
> Letting the default use maxSizeMB would be better IMO. But I assume
> that FastLRUCache is used for a reason, so that would have to be
> extended to support that parameter first.

FastLRUCache is the default for the filter cache because it was shown
to be (and was developed for the purpose of being) better under concurrency.
But I don't know if anyone has analyzed / thought about what is the
better default since the size option was added to LRUCache.

Would be nice to try something based on Caffeine + LFU + size-based limits.
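As a rough illustration of what count-bounded eviction looks like (as
opposed to the size/RAM-aware eviction discussed above), a minimal
count-bounded LRU can be built from the JDK alone. This is only a sketch,
not Solr's FastLRUCache; a real filter cache would also weigh entries in
bytes, which is where Caffeine-style weighers come in:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedLruCache(int maxEntries) {
        // accessOrder=true makes iteration (and eviction) follow
        // least-recently-accessed order, i.e. true LRU semantics
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the least recently used entry once we exceed the cap
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedLruCache<String, long[]> cache = new BoundedLruCache<>(2);
        cache.put("fq1", new long[]{1, 2});
        cache.put("fq2", new long[]{3});
        cache.get("fq1");                // touch fq1, so fq2 is now eldest
        cache.put("fq3", new long[]{4}); // evicts fq2
        System.out.println(cache.keySet()); // [fq1, fq3]
    }
}
```

Note the limit here is entry count only; two cached filters of wildly
different bitset sizes count the same, which is exactly the problem a
size-based limit would fix.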

-Yonik


fieldValueCache not used in solr 6.6

2017-10-06 Thread sile
Hi,

I'm new to solr, and I'm using solr 6.6.

I have set it up and done some testing. I've noticed that fieldValueCache is
not used for multivalued field faceting (or any other query) at all, even
though I specified it in solrconfig.xml and set docValues to false for all
faceting fields in order to use it; indexing was done with docValues=false
for those specific fields.

How can I use fieldValueCache in solr 6.6?

I have tested solr 6.6 (with docValues) and solr 4.9 (with fieldValueCache
enabled), and solr 4.9 performs a little bit better when the same queries
are repeated, probably because of fieldValueCache.

Thanks in advance






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Doubt about facet with dates

2017-10-06 Thread Toke Eskildsen
On Fri, 2017-10-06 at 13:16 +0200, Miguel Valencia Zurera wrote:
>      I need get faceted results  by a date field. The facets must be
> two:
> 1) all values lower than the system date
> 2) and values greater than the system date,

https://lucene.apache.org/solr/guide/6_6/working-with-dates.html#WorkingwithDates-DateMath

Your two facet queries would be something like
mydate:[* TO NOW/DAY] and mydate:[NOW+1DAY/DAY TO *]
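Expressed as request parameters (assuming the field is named mydate, as in
the example above), that would be two facet.query parameters, one per
bucket:

```
facet=true
facet.query=mydate:[* TO NOW/DAY]
facet.query=mydate:[NOW+1DAY/DAY TO *]
```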

- Toke Eskildsen, Royal Danish Library



Doubt about facet with dates

2017-10-06 Thread Miguel Valencia Zurera

Hi

I need to get faceted results by a date field. The facets must be two:
1) all values lower than the system date
2) all values greater than the system date

Is it possible to get these two facets?

I've been reading the solr wiki about facet.date and facet.range but I
haven't found a good solution.


Any ideas.
Thanks




Re: FilterCache size should reduce as index grows?

2017-10-06 Thread Toke Eskildsen
On Thu, 2017-10-05 at 21:56 -0700, S G wrote:
> So for large indexes, there is a chance that filterCache of 128 can
> cause bad GC.

Large indexes measured in document count, yes. Or you could argue that
a large index is likely to be served with a much larger heap and that
it will offset the increased filterCache requirements.

> And for smaller indexes, it would really not matter that much because
> well, the index size is small and probably whole of it is in OS-cache 
> anyways.

More fuzzy. You can easily have a small index measured in document
count that is large in bytes (i.e. large documents) and have complex
(slow) filters.

> So perhaps a default of 64 would be a much saner choice to get the
> best of both the worlds?

Hard to say without empiric measurements. At this point it is all
hand-waving, made worse by the fact that Solr indexes differ a lot in where
the scale & complexity is. I am told that PostgreSQL has the same
problem with default tuning parameters.

Letting the default use maxSizeMB would be better IMO. But I assume
that FastLRUCache is used for a reason, so that would have to be
extended to support that parameter first.
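For comparison, a count-bounded versus RAM-bounded filter cache in
solrconfig.xml might look like the fragment below. The attribute is
maxRamMB in Solr (assuming that is what "maxSizeMB" refers to above), and
it is LRUCache, not FastLRUCache, that supports it:

```xml
<!-- count-bounded: evicts by number of entries (FastLRUCache default) -->
<filterCache class="solr.FastLRUCache" size="128" autowarmCount="0"/>

<!-- RAM-bounded: evicts when the estimated heap footprint exceeds the limit -->
<filterCache class="solr.LRUCache" size="512" maxRamMB="64" autowarmCount="0"/>
```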


Looking much further ahead, the whole caching system would benefit from
having constraints that encompasses all the shards & collections served
in the same Solr. Unfortunately it is a daunting task just to figure
out the overall principles in this.

- Toke Eskildsen, Royal Danish Library



Re: Complexphrase treats wildcards differently than other query parsers

2017-10-06 Thread Bjarke Buur Mortensen
Thanks a lot for your effort, Tim.

Looking at it from the Solr side, I see some use of local classes. The
snippet below in particular caught my eye (in
solr/core/src/java/org/apache/solr/search/ComplexPhraseQParserPlugin.java).
The instance of ComplexPhraseQueryParser is not the clean one from Lucene,
but a modified one. If any of the modifications messes with the analysis
logic, well then that might answer it.

What do you make of it?

lparser = new ComplexPhraseQueryParser(defaultField,
    getReq().getSchema().getQueryAnalyzer()) {

  protected Query newWildcardQuery(org.apache.lucene.index.Term t) {
    try {
      org.apache.lucene.search.Query wildcardQuery =
          reverseAwareParser.getWildcardQuery(t.field(), t.text());
      setRewriteMethod(wildcardQuery);
      return wildcardQuery;
    } catch (SyntaxError e) {
      throw new RuntimeException(e);
    }
  }

  private Query setRewriteMethod(org.apache.lucene.search.Query query) {
    if (query instanceof MultiTermQuery) {
      ((MultiTermQuery) query).setRewriteMethod(
          org.apache.lucene.search.MultiTermQuery.SCORING_BOOLEAN_REWRITE);
    }
    return query;
  }

  protected Query newRangeQuery(String field, String part1, String part2,
      boolean startInclusive, boolean endInclusive) {
    boolean reverse =
        reverseAwareParser.isRangeShouldBeProtectedFromReverse(field, part1);
    return super.newRangeQuery(field,
        reverse ? reverseAwareParser.getLowerBoundForReverse() : part1,
        part2,
        startInclusive || reverse,
        endInclusive);
  }
};

Thanks,
Bjarke

2017-10-05 21:15 GMT+02:00 Allison, Timothy B. :

> After some more digging, I'm wrong even at the Lucene level.
>
> When I use the CustomAnalyzer and make my UC vowel mock filter
> MultitermAware, I get this with Lucene in trunk:
>
> "the* quick~" name:thE* name:qUIck~2 name:thE name:qUIck
>
> So, there's room for improvement with phrases, but the regular multiterms
> should be ok.
>
> Still no answer for you...
>
> 2017-10-05 14:34 GMT+02:00 Allison, Timothy B. :
>
> > There's every chance that I'm missing something at the Solr level, but
> > it _looks_ at the Lucene level, like ComplexPhraseQueryParser is still
> > not applying analysis to multiterms.
> >
> > When I call this on 7.0.0:
> >QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName,
> > analyzer);
> > return qp.parse(qString);
> >
> >  where the analyzer is a mock "uppercase vowel" analyzer[1] and the
> > qString is;
> >
> > "the* quick~" the* quick~ the quick
> >
> > I get this:
> > "the* quick~" name:the* name:quick~2 name:thE name:qUIck
>
>


RE: Solr 5.4.0: Colored Highlight and multi-value field ?

2017-10-06 Thread Bruno Mannina
Hi Erik,

Sorry for the late reply, I wasn't in my office this week...

So, I give more information:

* IC is a multi-value field defined like this:


* The request I use (e.g.):
http://my_host/solr/collection/select?
q=ic:(A63C10* OR G06F22/086)
=0
=10
=json
=true
=pd+desc
=*
// HighLight
=true
=ti,ab,ic,inc,cpc,apc
=
=
=colored
=true
=true
=true
=999
=true

* Result:
I have only one color (in my case yellow) for all the different values found

* BUT *

If I use a non-multi-value field like ti (title) in a query with some keywords


* Result (e.g. ti:(foo OR merge)):
I get different colors for each different term found


Question:
- Is it because the IC field is not defined with all the term*="true" options?
- How can I have different colors without using pre and post tags?


Many thanks for your help !

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, October 4, 2017 15:48
To: solr-user
Subject: Re: Solr 5.4.0: Colored Highlight and multi-value field ?

How does it not work for you? Details matter, an example set of values and the 
response from Solr are good bits of info for us to have.

On Tue, Oct 3, 2017 at 3:59 PM, Bruno Mannina 
wrote:

> Dear all,
>
>
>
> Is it possible to have a colored highlight in a multi-value field ?
>
>
>
> I succeeded in doing it on a text field but not in a multi-value field,
> where SOLR takes hl.simple.pre / hl.simple.post as the tags.
>
>
>
> Thanks a lot for your help,
>
>
>
> Cordialement, Best Regards
>
> Bruno Mannina
>
> www.matheo-software.com
>
> www.patent-pulse.com
>
> Tél. +33 0 970 738 743
>
> Mob. +33 0 634 421 817
>
>





Re: search request audit logging

2017-10-06 Thread Michal Hlavac
Hi,
 
I've noticed that in SOLR-7484 the Solr-specific part of the http request
handling was moved to HttpSolrCall, so there is no way to handle
SolrQueryRequest and SolrQueryResponse in SolrDispatchFilter.

Internal request logging happens in SolrCore.execute(SolrRequestHandler,
SolrQueryRequest, SolrQueryResponse).

Is there a way to get at the SOLR request/response to make a custom log in a
SolrCloud environment?
 
thank you, m.


Hi,

I would like to ask how to implement search audit logging. I've implemented
one idea, but I would like to ask if there is a better approach.

Requirement is to log username, search time, all request parameters (q, fq, 
etc.), response data (count, etc) and important thing is to log all errors.

As I need it only for search requests I implemented custom SearchHandler with 
something like:

public class AuditSearchHandler extends SearchHandler {

    @Override
    public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
        try {
            super.handleRequest(req, rsp);
        } finally {
            doAuditLog(req, rsp);
        }
    }
}
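For a handler like this to run, it would be registered in solrconfig.xml in
place of the stock search handler, along these lines (the package name is
hypothetical):

```xml
<requestHandler name="/select" class="com.example.AuditSearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>
```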

Custom SearchComponent is not option, because it can't handle all errors.

I also read
http://lucene.472066.n3.nabble.com/Solr-request-response-lifecycle-and-logging-full-response-time-td4006044.html
[1] where a custom Servlet Filter was mentioned, but I didn't find an example
of how to implement a Servlet Filter in SOLR the proper way, or whether it's
ok to edit web.xml.

thanks for suggestions, m.





[1] 
http://lucene.472066.n3.nabble.com/Solr-request-response-lifecycle-and-logging-full-response-time-td4006044.html


Re: Error adding replica after a delete replica

2017-10-06 Thread Emir Arnautović
Hi,
How did you delete replica? Did you see any errors in logs after deleting? How 
did/does it look from ZK perspective after deleting that replica?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 5 Oct 2017, at 16:17, Webster Homer  wrote:
> 
> A colleague of mine was testing how solrcloud replica recovery works. We
> have had a lot of issues with replicas going into recovery mode, replicas
> down and in recovery failed states.  So to test, he deleted a healthy
> replica in one of our development environments. First the delete operation timed out,
> but the replica appears to be gone. However, addReplica always fails with
> this error:
> 
> Error CREATEing SolrCore 'sial-content-citations_shard1_replica1': Unable
> to create core [sial-content-citations_shard1_replica1] Caused by: Lock
> held by this virtual machine: /var/solr/data/sial-content-
> citations_shard1_replica1/data/index/write.lock
> 
> This cloud has 4 nodes. The collection has two shards with two replicas per
> shard. They are all hosted in a google cloud environment.
> 
> So if the delete deleted the replica why would it then hold a lock? We want
> to understand this.
> 
> We are using Solr 6.2.0
> 