RE: solr query gives different numFound upon refreshing

2014-09-16 Thread Joshi, Shital
We wrote a script which queries each Solr instance in the cloud 
(http://$host/solr/replication?command=details), subtracts the 
‘replicableVersion’ number from the ‘indexVersion’ number, converts the 
difference to minutes, and alerts if it exceeds 20 minutes. We get alerted 
many times a day. The soft commit setting is every 7 minutes. 

Any idea what might be wrong here?
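
The check described above can be sketched as follows. This is a minimal illustration, not the script from the thread: it assumes the details response has been parsed into a dict, that both version numbers are epoch timestamps in milliseconds, and that replicableVersion sits under a "master" key. The function names and the sample values are hypothetical.

```python
def replication_lag_minutes(details: dict) -> float:
    """Return (indexVersion - replicableVersion) converted to minutes.

    Assumption: both values are epoch milliseconds, and replicableVersion
    is nested under details["master"]. Adjust the lookups if your
    response is laid out differently.
    """
    index_version = int(details["indexVersion"])
    replicable_version = int(details["master"]["replicableVersion"])
    return (index_version - replicable_version) / 60_000.0


def should_alert(details: dict, threshold_minutes: float = 20.0) -> bool:
    """Alert when the version gap exceeds the threshold (20 minutes in the thread)."""
    return replication_lag_minutes(details) > threshold_minutes


# Hypothetical sample: the two versions are 1,500,000 ms apart.
sample = {
    "indexVersion": 1410876000000,
    "master": {"replicableVersion": 1410874500000},
}
print(replication_lag_minutes(sample), should_alert(sample))
```

Keeping the arithmetic in a pure function makes it easy to test the threshold logic without a running cluster.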

This is our commit setting. 

<autoCommit>
   <maxTime>15000</maxTime>
   <maxDocs>10</maxDocs>
   <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
   <maxTime>45</maxTime>
</autoSoftCommit>

We got rid of all max new searcher errors. 



Re: solr query gives different numFound upon refreshing

2014-09-04 Thread shamik
I've noticed similar behavior with our SolrCloud cluster for a while, though
it's random. We have 2 shards with 3 replicas each. At times, I've observed
that the same query on refresh will fetch different results (numFound) as
well as different content. The only way to mitigate it is to refresh the index
with the documents until the nodes are in sync. I always use SolrJ, which talks
to Solr through ZooKeeper, but even with that it seemed to be unavoidable at
times. We are committing every 10 minutes. I'm pretty sure there's a minor
glitch which creates a sync issue at times.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-query-gives-different-numFound-upon-refreshing-tp4155414p4157026.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr query gives different numFound upon refreshing

2014-09-04 Thread Erick Erickson
Does this persist if you issue a hard commit? You can do something like
http://solr/collection/update?stream.body=<commit/>
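
A hard commit can also be scripted. The sketch below is only an illustration: it assumes the commit=true request parameter on the update handler as an alternative to the stream.body form, and the base URL and collection name are placeholders, not values from this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hard_commit_url(base_url: str, collection: str) -> str:
    """Build an update URL that asks Solr for an explicit (hard) commit."""
    query = urlencode({"commit": "true"})
    return f"{base_url.rstrip('/')}/{collection}/update?{query}"


def send_hard_commit(base_url: str, collection: str, timeout: float = 30.0) -> int:
    """Issue the commit and return the HTTP status code (needs a live Solr)."""
    with urlopen(hard_commit_url(base_url, collection), timeout=timeout) as resp:
        return resp.status


# Placeholder host and collection, for illustration only.
print(hard_commit_url("http://localhost:8983/solr/", "collection1"))
```

After the commit, rerun the query a few times and check whether numFound still changes between refreshes.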


RE: solr query gives different numFound upon refreshing

2014-08-29 Thread Joshi, Shital
Erick,

Thanks for your reply. 

We will increase the autocommit setting and let you know.

We are using SolrCloud (4.8.0). When we select a collection in the Solr admin 
GUI and open the Overview tab, we see three versions of the index even though 
we have just 1 replica. 

Master (Searching)
Master (Replicable)
Slave (Searching)

What is Master Searching vs. Master Replicable vs Slave Searching? 

Thanks. 


Re: solr query gives different numFound upon refreshing

2014-08-29 Thread Erick Erickson
bq: What is Master Searching vs. Master Replicable vs Slave Searching

Likely a leftover from the days when master/slave was the only option. You
can pretty much ignore it in SolrCloud, I think.

Best,
Erick



RE: solr query gives different numFound upon refreshing

2014-08-28 Thread Joshi, Shital
Hi Shawn,

Thanks for your reply. 

We did some tests enabling shards.info=true and confirmed that there is no 
duplicate copy of our index.  

We have one replica, but many times we see three versions on the Admin 
GUI/Overview tab. All three have different versions and gen. Is that a problem?
Master (Searching)  
Master (Replicable) 
Slave (Searching)   

We constantly see the max searcher open exception. The warmup time is 1.5 
minutes, but the difference between the openedAt date and the registeredAt date 
is at times more than 4-5 minutes. Is the true searcher time the difference 
between the two dates and not the warmupTime?

openedAt:   2014-08-28T16:17:24.829Z
registeredAt:   2014-08-28T16:21:02.278Z
warmupTime: 65727

Thanks for all the help. 
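
For the sample values above, the gap between openedAt and registeredAt can be worked out directly. This is a small sketch that assumes warmupTime is reported in milliseconds; it only does the arithmetic and does not say what Solr does during the remaining time.

```python
from datetime import datetime

# Timestamp layout as shown in the searcher statistics above.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def seconds_between(opened_at: str, registered_at: str) -> float:
    """Elapsed seconds from openedAt to registeredAt."""
    opened = datetime.strptime(opened_at, FMT)
    registered = datetime.strptime(registered_at, FMT)
    return (registered - opened).total_seconds()


gap_s = seconds_between("2014-08-28T16:17:24.829Z", "2014-08-28T16:21:02.278Z")
warmup_s = 65727 / 1000.0  # assumption: warmupTime is in milliseconds

print(f"openedAt -> registeredAt: {gap_s:.3f} s")
print(f"warmupTime:               {warmup_s:.3f} s")
print(f"unaccounted for:          {gap_s - warmup_s:.3f} s")
```

In this sample the searcher took a little over three and a half minutes to be registered, of which only about 66 seconds is reported as warmup.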



Re: solr query gives different numFound upon refreshing

2014-08-28 Thread Erick Erickson
First, I want to be sure you're not mixing old-style
replication and SolrCloud. Your mention of Master/Slave
prompts this question.

Second, your maxWarmingSearchers error indicates that
your commit interval is too short relative to your autowarm
times. Try lengthening your autocommit settings (probably
soft commit) until you no longer see that error message
and see if the problem goes away. If it doesn't, let us know.

Best,
Erick
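
The relationship between commit interval and warm time can be put into rough numbers. The sketch below uses a deliberately simple model, which is an assumption and not how Solr schedules searchers: a new searcher is opened at a fixed interval and each one takes a fixed time to warm. The 1.5 minute and 5 minute figures and the limit of 2 come from this thread; the 60 second interval is hypothetical.

```python
import math


def max_overlapping_warmups(warm_seconds: float, commit_interval_seconds: float) -> int:
    """Upper bound on searchers warming at the same time.

    Simplified model (assumption): one new searcher per commit interval,
    each warming for warm_seconds. Explicit client commits are ignored.
    """
    if warm_seconds <= 0:
        return 0
    return math.ceil(warm_seconds / commit_interval_seconds)


MAX_WARMING_SEARCHERS = 2  # limit quoted in the error message in this thread

# Figures reported in the thread: 1.5 min warmup, soft commit every 5 min.
print(max_overlapping_warmups(90, 300))

# Hypothetical: searchers opened every 60 s while one takes 217 s to register.
print(max_overlapping_warmups(217.449, 60) > MAX_WARMING_SEARCHERS)
```

Under this model the configured soft commit alone should not exceed the limit, so the error suggests that commits are arriving more often than the auto-commit settings imply, or that searchers take longer to come online than the reported warmup time.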



Re: solr query gives different numFound upon refreshing

2014-08-27 Thread Bryan Bende
Theoretically this shouldn't happen, but is it possible that the two
replicas for a given shard are not fully in sync?

Say shard1 replica1 is missing a document that is in shard1 replica2... if
you run a query that would hit on that document and run it a bunch of
times, sometimes replica 1 will handle the request and sometimes replica 2
will handle it, and it would change your number of results if one of them
is missing a document. You could write a program that compares each
replica's documents by querying them with distrib=false.

If there was a replica out of sync, I would think it would detect that on a
restart when comparing itself against the leader for that shard, but I'm
not sure.
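
A rough sketch of such a comparison program follows. It assumes each replica is queried directly on its own core URL with distrib=false and rows=0, and that the numFound values have already been collected into a dict. The core URLs, shard names, replica names, and counts are made up for illustration.

```python
from urllib.parse import urlencode


def replica_count_url(core_url: str) -> str:
    """URL that asks a single core for its own document count only."""
    params = urlencode({"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"})
    return f"{core_url.rstrip('/')}/select?{params}"


def out_of_sync_shards(counts: dict) -> dict:
    """Return the shards whose replicas report different numFound values.

    counts maps shard name -> {replica name -> numFound}.
    """
    return {
        shard: replicas
        for shard, replicas in counts.items()
        if len(set(replicas.values())) > 1
    }


# Hypothetical counts gathered by requesting replica_count_url() per core.
observed = {
    "shard1": {"replica1": 1000, "replica2": 999},
    "shard2": {"replica1": 2000, "replica2": 2000},
}
print(replica_count_url("http://host1:8983/solr/collection1_shard1_replica1"))
print(out_of_sync_shards(observed))
```

Counts should be taken right after a commit and while indexing is paused, otherwise replicas can legitimately differ for a short time.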


On Wed, Aug 27, 2014 at 11:37 AM, Joshi, Shital shital.jo...@gs.com wrote:

> Hi,
>
> We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. We have
> three collections. We recently upgraded from 4.4.0 to 4.8. We have ~850
> million documents.
>
> We are facing an issue where refreshing a Solr query may give different
> results (number of documents returned). This issue is seen in all three
> collections.
>
> We found that Solr admin would report Solr instance states as not
> “current”. Is it indicative of the above issue?
>
> We checked logs and found various errors/warnings, but they don’t seem to
> be indicative of the above issue (or if they are, it’s not yet
> clear/obvious, or maybe they are indirectly related). The error message is
> like this:
> 8/27/2014 2:01:24 AM ERROR SolrCmdDistributor
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
> opening new searcher. exceeded limit of maxWarmingSearchers=2, try again
> later.
>
> This is our autocommit setting.
>
> <autoCommit>
>    <maxTime>15000</maxTime>
>    <maxDocs>10</maxDocs>
>    <openSearcher>false</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>    <maxTime>30</maxTime>
> </autoSoftCommit>
>
> The searcher takes less than 1.5 minutes and the soft commit setting is
> set for every 5 minutes. So there is no way to end up with more than two
> searchers.
>
> The searcher registeredAt time and openedAt time are sometimes 12-13 hours
> old and we end up bouncing the cloud.
>
> Any help to solve this issue is appreciated.


Re: solr query gives different numFound upon refreshing

2014-08-27 Thread Shawn Heisey
A replica out of sync is a possibility, but the most common reason for a
changing numFound is because the overall distributed index has more than
one document with the same uniqueKey value -- different versions of the
same document in more than one shard.

SolrCloud tries really hard to never end up with replicas out of sync,
but either due to highly unusual circumstances or bugs, it could still
happen.

Thanks,
Shawn
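
To check for the duplicate uniqueKey case, one option is to collect the uniqueKey values from each shard separately and look for values that show up in more than one shard. The sketch below assumes the ids have already been fetched per shard (for example with distrib=false queries); the shard names and ids are invented for illustration.

```python
from collections import defaultdict


def ids_in_multiple_shards(ids_by_shard: dict) -> dict:
    """Map each uniqueKey value found in more than one shard to those shards.

    ids_by_shard maps shard name -> iterable of uniqueKey values.
    """
    shards_for_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_for_id[doc_id].add(shard)
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_for_id.items()
        if len(shards) > 1
    }


# Hypothetical ids: "doc-7" exists in two shards.
sample_ids = {
    "shard1": ["doc-1", "doc-7"],
    "shard2": ["doc-2", "doc-7"],
    "shard3": ["doc-3"],
}
print(ids_in_multiple_shards(sample_ids))
```

For an index of the size mentioned in this thread, the ids would need to be streamed and compared in batches rather than held in memory at once.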