Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread Erick Erickson
bq: If I change the routing strategy back to composite (which it should be), is
it OK?

I sincerely doubt it. The docs have already been routed to the wrong
place (actually, I'm not sure how it worked at all). You can't get
them redistributed simply by changing the definition in ZooKeeper,
they're _already_ in the wrong place.

I'd tear down the corrupted data center and rebuild the collection.
Here "tear down" means deleting all the affected collections and starting
over.

On the plus side, if you can get a window during which you are _not_
indexing, you can copy the indexes from one of your good data centers
to the new one. Do it like this:


- Stop indexing.

- Set up the new collection in the corrupted data center. It's
important that it have _exactly_ the same number of shards as the DC
you're going to transfer _from_. Also, make it leader-only, i.e.
exactly 1 replica per shard.

- Copy the indexes over from the good data center to the corresponding
shards. Here "corresponding" means that the source and destination
have the same hash range, which you can see from the state.json (or
clusterstate.json if you're on an earlier format). NOTE: there are two
ways to do this:
-- Just do file copies, scp, hand-carry CDs, whatever. Solr should be
offline in the target data center.
-- Use the replication API to issue a "fetchindex" command. This works
even in cloud mode; all the target Solr instance needs is access to a
URL it can pull from. Solr of course needs to be running in this case.
There's a rough sketch of this and of ADDREPLICA below, after the list.

- Bring up Solr on the target data center and verify it's working.

- Use the Collections API to ADDREPLICA on the target system until you
build out the collection with the number of replicas you want.

- Start indexing to the target data center.
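To make those steps concrete, here's a rough sketch using curl. The collection
name, config name, shard count, hosts and core names below are placeholders,
not your actual setup -- adjust everything to your own layout:

# 1) Create the leader-only collection in the rebuilt DC
#    (same number of shards as the source DC, replicationFactor=1)
curl 'http://newdc-host1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&collection.configName=myconfig'

# 2) For each shard, pull the index from the matching core in the good DC
#    (replication handler; runs per core, and the target Solr must be up and
#    able to reach the source URL)
curl 'http://newdc-host1:8983/solr/mycollection_shard1_replica1/replication?command=fetchindex&masterUrl=http://gooddc-host1:8983/solr/mycollection_shard1_replica1'

# 3) Once the leader-only collection checks out, build out more replicas
curl 'http://newdc-host1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=newdc-host2:8983_solr'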

The bit about shutting off indexing is a safety measure; it
guarantees that the indexes are consistent. If you can't shut indexing
down during the transfer, you'll need to index docs to the
newly-rebuilt cluster in some manner that guarantees the two DCs will
eventually have the same docs.

Best,
Erick



On Tue, Mar 14, 2017 at 3:26 PM, vbindal  wrote:
> I think I didn't explain it properly.
>
> I have 3 data centers each with its own SOLR cloud.
>
> My original strategy was composite routing, but when one data center went
> down and we brought it back, somehow the routing strategy on it changed to
> implicit (the other 2 DCs still have composite and are working absolutely
> fine).
>
> This might be the reason for the data corruption on that DC, since the
> routing strategy got changed.
>
> If I change the routing strategy back to composite (which it should be), is
> it OK? Do I need to do anything more than simply changing the strategy in
> the clusterstate.json?


Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread vbindal
I think I didn't explain it properly.

I have 3 data centers each with its own SOLR cloud. 

My original strategy was composite routing, but when one data center went
down and we brought it back, somehow the routing strategy on it changed to
implicit (the other 2 DCs still have composite and are working absolutely
fine).

This might be the reason for the data corruption on that DC, since the
routing strategy got changed.

If I change the routing strategy back to composite (which it should be), is
it OK? Do I need to do anything more than simply changing the strategy in
the clusterstate.json?





Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread Erick Erickson
That would make the problem even worse. If you created the collection
with implicit routing, there are no hash ranges for each shard.
CompositeId requires hash ranges to be defined for each shard. Don't
even try.
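For reference, a two-shard compositeId collection looks roughly like this in
clusterstate.json / state.json (trimmed, and the exact fields vary a bit by
version); with the implicit router the "range" values are null and the router
name says "implicit". "mycollection" is just a placeholder here:

  "mycollection":{
    "shards":{
      "shard1":{"range":"80000000-ffffffff","state":"active","replicas":{...}},
      "shard2":{"range":"0-7fffffff","state":"active","replicas":{...}}},
    "router":{"name":"compositeId"}}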

Best,
Erick

On Tue, Mar 14, 2017 at 11:13 AM, vbindal  wrote:
> Compared it against the other 2 data centers and they both have
> `compositeId`.
>
> This started happening after one of our ZooKeeper nodes died due to a hardware
> issue and we had to set up a new ZooKeeper machine, update the config on all
> the Solr machines, and restart the cloud. My guess is something went wrong and
> the `implicit` router got created.
>
> Can I simply change the `clusterstate.json` to take care of this?


Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread vbindal
Compared it against the other 2 data centers and they both have
`compositeId`.

This started happening after one of our ZooKeeper nodes died due to a hardware
issue and we had to set up a new ZooKeeper machine, update the config on all
the Solr machines, and restart the cloud. My guess is something went wrong and
the `implicit` router got created.

Can I simply change the `clusterstate.json` to take care of this?





Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread Erick Erickson
The default router has always been compositeId, but when you created
your collection you may have created it with implicit. Looking at the
clusterstate.json and/or state.json in the individual collection
should show you (admin UI>>cloud>>tree).

But we need to be very clear about what a "duplicate" document is.
Solr routes/replaces documents based on whatever you've defined as
<uniqueKey> in your schema file (assuming compositeId routing). When
you say you get dups when you re-index, it sounds like you are somehow
using different <uniqueKey> values for what you consider the "same"
document.
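In schema.xml that's the field declared along these lines (the field name "id"
is just the usual convention, not necessarily what yours is called):

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <uniqueKey>id</uniqueKey>

Two documents indexed with different values in that field are, as far as Solr
is concerned, two different documents, no matter how identical the rest of
their fields are.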

bq: Also, this started happening after one of our ZooKeeper nodes died due to a
hardware issue and we had to set up a new ZooKeeper machine, update the config
on all the Solr machines, and restart the cloud.

Hmmm. "update the config on all the Solr machines...". You should not
have to do this in SolrCloud. All the configs are stored in ZooKeeper
and loaded from ZK when the Solr instance starts. What it's starting
to sound like is that you've somehow mixed up SolrCloud and older
"stand-alone" concepts and "somehow" your restoration process messed
up your configs.

So if the issue isn't that you're somehow using different
<uniqueKey> values, I'd recommend just starting over with a new collection
since you can re-index from scratch.

Best,
Erick

On Tue, Mar 14, 2017 at 10:35 AM, vbindal  wrote:
> Hi Shawn,
>
> We are on version 4.10.0. Is that the default router in this version? Also,
> we don't see all the documents duplicated, only some of them. I have an
> indexer job to index data into Solr. After I delete all the records and run
> this job, the count is correct, but when I run the job again, we start seeing
> a higher count and duplicate records (random records) in shards.
>
> Also, this started happening after one of our ZooKeeper nodes died due to a
> hardware issue and we had to set up a new ZooKeeper machine, update the config
> on all the Solr machines, and restart the cloud.


Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread vbindal
Hi Shawn,

We are on version 4.10.0. Is that the default router in this version? Also,
we don't see all the documents duplicated, only some of them. I have an
indexer job to index data into Solr. After I delete all the records and run
this job, the count is correct, but when I run the job again, we start seeing
a higher count and duplicate records (random records) in shards.

Also, this started happening after one of our ZooKeeper nodes died due to a
hardware issue and we had to set up a new ZooKeeper machine, update the config
on all the Solr machines, and restart the cloud.







Re: Inconsistent numFound in SC when querying core directly

2017-03-13 Thread Shawn Heisey
On 3/13/2017 3:16 AM, vbindal wrote:
> I am facing the same issue where my query *:* returns an inconsistent count,
> almost 3 times the actual number (which is in the millions).
>
> When I try distrib=false on every machine, the results are correct, but
> without `distrib=false` the results are incorrect.

This most likely means that you've got duplicate documents in different
shards of your cloud.  This most commonly happens when the router is
"implicit" which is easier to understand if you imagine that "implicit"
is "manual" -- which would be a far better name for it.

In a later message you mention duplicate documents and 3 shards.  It
sounds like you have indexed all your documents to all three shards, and
are probably using the implicit router.

The implicit router basically means that there *IS* no shard routing --
documents would be either indexed by the shard that received them, or
directed to the shard indicated by explicit routing parameters.  With
three shards and "distrib=false" requests to one shard, you should see
about one-third of your total document count, not the full document count.
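A quick way to confirm this (host and core names below are placeholders, not
your actual ones) is to compare the collection-wide count against each core's
own count:

  # distributed count -- goes through SolrCloud and hits one replica per shard
  curl 'http://host1:8983/solr/yourcollection/select?q=*:*&rows=0&wt=json'

  # per-core counts, distributed search turned off
  curl 'http://host1:8983/solr/yourcollection_shard1_replica1/select?q=*:*&rows=0&wt=json&distrib=false'
  curl 'http://host2:8983/solr/yourcollection_shard2_replica1/select?q=*:*&rows=0&wt=json&distrib=false'
  curl 'http://host3:8983/solr/yourcollection_shard3_replica1/select?q=*:*&rows=0&wt=json&distrib=false'

If each core reports (roughly) the full document count instead of about a
third of it, the same documents have been indexed to every shard.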

The router that you typically want if you don't want to be concerned
about how documents are routed to different shards is the compositeId
router.  This *should* be the default for a multi-shard collection in
any recent version, but the first one or two releases of SolrCloud might
have created the wrong type.

Thanks,
Shawn



Re: Inconsistent numFound in SC when querying core directly

2017-03-13 Thread vbindal
Hi,

I am facing the same issue where my query *:* returns an inconsistent count,
almost 3 times the actual number (which is in the millions).

When I try distrib=false on every machine, the results are correct, but
without `distrib=false` the results are incorrect.

Can you guys suggest something?





Re: Inconsistent numFound in SC when querying core directly

2013-12-05 Thread Tim Vaillancourt
Very good point. I've seen this issue occur once before when I was playing
with 4.3.1 and don't  remember it happening since 4.5.0+, so that is good
news - we are just behind.

For anyone that is curious, on my earlier mention that
Zookeeper/clusterstate.json was not taking updates: this was NOT correct.
Zookeeper has no issues taking set/creates to clusterstate.json (or any
znode), just this one node seemed to stay stuck as state: active while it
was very inconsistent for reasons unknown, potentially just bugs.
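(For anyone wanting to check the same thing: a get/set against that znode can
be done with the stock ZooKeeper CLI, roughly as below -- the host is a
placeholder for your own ensemble:

  ./zkCli.sh -server zk1:2181
  get /clusterstate.json
  set /clusterstate.json '{ ...full cluster state JSON... }'
)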

The good news is this will be resolved today with a create/destroy of the
bad replica.

Thanks all!

Tim


On 4 December 2013 16:50, Mark Miller markrmil...@gmail.com wrote:

 Keep in mind, there have been a *lot* of bug fixes since 4.3.1.

 - Mark

 On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt t...@elementspace.com wrote:

  Hey all,
 
  Now that I am getting correct results with distrib=false, I've
 identified that 1 of my nodes has just 1/3rd of the total data set and
 totally explains the flapping in results. The fix for this is obvious
 (rebuild replica) but the cause is less obvious.
 
  There is definitely more than one issue going on with this SolrCloud
 (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that
 /clusterstate.json doesn't seem to get updated when nodes are brought
 down/up is the reason why this replica remained in the distributed request
 chain without recovering/re-replicating from leader.
 
  I imagine my Zookeeper ensemble is having some problems unrelated to
 Solr that is the real root cause.
 
  Thanks!
 
  Tim
 
  On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
  Chris, this is extremely helpful and it's silly I didn't think of this
 sooner! Thanks a lot, this makes the situation make much more sense.
 
  I will gather some proper data with your suggestion and get back to the
 thread shortly.
 
  Thanks!!
 
  Tim
 
  On 04/12/13 02:57 PM, Chris Hostetter wrote:
  :
  : I may be incorrect here, but I assumed when querying a single core
 of a
  : SolrCloud collection, the SolrCloud routing is bypassed and I am
 talking
  : directly to a plain/non-SolrCloud core.
 
  No ... every query received from a client by solr is handled by a
 single
  core -- if that core knows it's part of a SolrCloud collection then it
  will do a distributed search across a random replica from each shard in
  that collection.
 
  If you want to bypass the distribute search logic, you have to say so
  explicitly...
 
  To ask an arbitrary replica to only search itself add distrib=false
 to
  the request.
 
  Alternatively: you can ask that only certain shard names (or certain
  explicit replicas) be included in a distribute request..
 
  https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
 
 
 
  -Hoss
  http://www.lucidworks.com/




Re: Inconsistent numFound in SC when querying core directly

2013-12-05 Thread Tim Vaillancourt
I spoke too soon; my plan for fixing this didn't quite work.

I've moved this issue into a new thread/topic: No /clusterstate.json
updates on Solrcloud 4.3.1 Cores API UNLOAD/CREATE.

Thanks all for the help on this one!

Tim


On 5 December 2013 11:37, Tim Vaillancourt t...@elementspace.com wrote:

 Very good point. I've seen this issue occur once before when I was playing
 with 4.3.1 and don't  remember it happening since 4.5.0+, so that is good
 news - we are just behind.

 For anyone that is curious, on my earlier mention that
 Zookeeper/clusterstate.json was not taking updates: this was NOT correct.
 Zookeeper has no issues taking set/creates to clusterstate.json (or any
 znode), just this one node seemed to stay stuck as state: active while it
 was very inconsistent for reasons unknown, potentially just bugs.

 The good news is this will be resolved today with a create/destroy of the
 bad replica.

 Thanks all!

 Tim


 On 4 December 2013 16:50, Mark Miller markrmil...@gmail.com wrote:

 Keep in mind, there have been a *lot* of bug fixes since 4.3.1.

 - Mark

 On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt t...@elementspace.com
 wrote:

  Hey all,
 
  Now that I am getting correct results with distrib=false, I've
 identified that 1 of my nodes has just 1/3rd of the total data set and
 totally explains the flapping in results. The fix for this is obvious
 (rebuild replica) but the cause is less obvious.
 
   There is definitely more than one issue going on with this SolrCloud
 (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that
 /clusterstate.json doesn't seem to get updated when nodes are brought
 down/up is the reason why this replica remained in the distributed request
 chain without recovering/re-replicating from leader.
 
  I imagine my Zookeeper ensemble is having some problems unrelated to
 Solr that is the real root cause.
 
  Thanks!
 
  Tim
 
  On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
  Chris, this is extremely helpful and it's silly I didn't think of this
 sooner! Thanks a lot, this makes the situation make much more sense.
 
  I will gather some proper data with your suggestion and get back to
 the thread shortly.
 
  Thanks!!
 
  Tim
 
  On 04/12/13 02:57 PM, Chris Hostetter wrote:
  :
  : I may be incorrect here, but I assumed when querying a single core
 of a
  : SolrCloud collection, the SolrCloud routing is bypassed and I am
 talking
  : directly to a plain/non-SolrCloud core.
 
  No ... every query received from a client by solr is handled by a
 single
  core -- if that core knows it's part of a SolrCloud collection then it
  will do a distributed search across a random replica from each shard
 in
  that collection.
 
  If you want to bypass the distribute search logic, you have to say so
  explicitly...
 
  To ask an arbitrary replica to only search itself add distrib=false
 to
  the request.
 
  Alternatively: you can ask that only certain shard names (or certain
  explicit replicas) be included in a distribute request..
 
  https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
 
 
 
  -Hoss
  http://www.lucidworks.com/





Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

To add two more pieces of data:

1) This occurs with real, conditional queries as well (eg: 
q=key:timvaillancourt), not just the q=*:* I provided in my email.
2) I've noticed when I bring a node of the SolrCloud down it is 
remaining state: active in my /clusterstate.json - something is really 
wrong with this cloud! Would a Zookeeper issue explain my varied results 
when querying a core directly?


Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).


Currently we are noticing inconsistent results from the SolrCloud when 
performing the same simple /select query many times to our collection. 
Almost every other query the numFound count (and the returned data) 
jumps between two very different values.


Initially I suspected a replica in a shard of the collection was 
inconsistent (and every other request hit that node) and started 
performing the same /select query direct to the individual cores of 
the SolrCloud collection on each instance, only to notice the same 
problem - the count jumps between two very different values!


I may be incorrect here, but I assumed when querying a single core of 
a SolrCloud collection, the SolrCloud routing is bypassed and I am 
talking directly to a plain/non-SolrCloud core.


As you can see here, the count for 1 core of my SolrCloud collection 
fluctuates wildly, and is only receiving updates and no deletes to 
explain the jumps:


solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]


Could anyone help me understand why the same /select query direct to a 
single core would return inconsistent, flapping results if there are 
no deletes issued in my app to cause such jumps? Am I incorrect in my 
assumption that I am querying the core directly?


An interesting observation is when I do an /admin/cores call to see 
the docCount of the core's index, it does not fluctuate, only the 
query result.
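(For anyone curious, the CoreAdmin STATUS call is the one that reports per-core
index stats; numDocs/maxDoc sit under "index" in the response:

  curl 'http://backend:8983/solr/admin/cores?action=STATUS&core=app_shard2_replica2&wt=json'
)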


That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


RE: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-4260

Join the club Tim! Can you upgrade to trunk or incorporate the latest patches 
of related issues? You can fix it by trashing the bad node's data, although 
without multiple clusters it may be difficult to decide which node is bad.

We use the latest commits now (since tuesday) and are still waiting for it to 
happen again.

-Original message-
 From:Tim Vaillancourt t...@elementspace.com
 Sent: Wednesday 4th December 2013 23:38
 To: solr-user@lucene.apache.org
 Subject: Re: Inconsistent numFound in SC when querying core directly
 
 To add two more pieces of data:
 
 1) This occurs with real, conditional queries as well (eg: 
 q=key:timvaillancourt), not just the q=*:* I provided in my email.
 2) I've noticed when I bring a node of the SolrCloud down it is 
 remaining state: active in my /clusterstate.json - something is really 
 wrong with this cloud! Would a Zookeeper issue explain my varied results 
 when querying a core directly?
 
 Thanks again!
 
 Tim
 
 On 04/12/13 02:17 PM, Tim Vaillancourt wrote:
  Hey guys,
 
  I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
  3-node external Zookeeper and 1 collection (2 shards, 2 replicas).
 
  Currently we are noticing inconsistent results from the SolrCloud when 
  performing the same simple /select query many times to our collection. 
  Almost every other query the numFound count (and the returned data) 
  jumps between two very different values.
 
  Initially I suspected a replica in a shard of the collection was 
  inconsistent (and every other request hit that node) and started 
  performing the same /select query direct to the individual cores of 
  the SolrCloud collection on each instance, only to notice the same 
  problem - the count jumps between two very different values!
 
  I may be incorrect here, but I assumed when querying a single core of 
  a SolrCloud collection, the SolrCloud routing is bypassed and I am 
  talking directly to a plain/non-SolrCloud core.
 
  As you can see here, the count for 1 core of my SolrCloud collection 
  fluctuates wildly, and is only receiving updates and no deletes to 
  explain the jumps:
 
  solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
    "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

  solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
    "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

  solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
    "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

  solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
    "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]
 
 
  Could anyone help me understand why the same /select query direct to a 
  single core would return inconsistent, flapping results if there are 
  no deletes issued in my app to cause such jumps? Am I incorrect in my 
  assumption that I am querying the core directly?
 
  An interesting observation is when I do an /admin/cores call to see 
  the docCount of the core's index, it does not fluctuate, only the 
  query result.
 
  That was hard to explain, hopefully someone has some insight! :)
 
  Thanks!
 
  Tim
 


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Thanks Markus,

I'm not sure if I'm encountering the same issue. This JIRA mentions a
difference of 10s of docs; I'm seeing differences in the multi-millions of
docs, and even more strangely it very predictably flaps between a 123M
value and an 87M value, a 30M+ doc difference.


Secondly, I'm not comparing values from 2 instances (Leader to Replica);
I'm currently performing the same curl call to the same core directly
and am seeing flapping results each time I perform the query, so this is
currently happening within a single instance/core unless I am
misunderstanding how to directly query a core.


Cheers,

Tim

On 04/12/13 02:46 PM, Markus Jelsma wrote:

https://issues.apache.org/jira/browse/SOLR-4260

Join the club Tim! Can you upgrade to trunk or incorporate the latest patches 
of related issues? You can fix it by trashing the bad node's data, although 
without multiple clusters it may be difficult to decide which node is bad.

We use the latest commits now (since tuesday) and are still waiting for it to 
happen again.

-Original message-

From: Tim Vaillancourt t...@elementspace.com
Sent: Wednesday 4th December 2013 23:38
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent numFound in SC when querying core directly

To add two more pieces of data:

1) This occurs with real, conditional queries as well (eg:
q=key:timvaillancourt), not just the q=*:* I provided in my email.
2) I've noticed when I bring a node of the SolrCloud down it is
remaining state: active in my /clusterstate.json - something is really
wrong with this cloud! Would a Zookeeper issue explain my varied results
when querying a core directly?

Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).

Currently we are noticing inconsistent results from the SolrCloud when
performing the same simple /select query many times to our collection.
Almost every other query the numFound count (and the returned data)
jumps between two very different values.

Initially I suspected a replica in a shard of the collection was
inconsistent (and every other request hit that node) and started
performing the same /select query direct to the individual cores of
the SolrCloud collection on each instance, only to notice the same
problem - the count jumps between two very different values!

I may be incorrect here, but I assumed when querying a single core of
a SolrCloud collection, the SolrCloud routing is bypassed and I am
talking directly to a plain/non-SolrCloud core.

As you can see here, the count for 1 core of my SolrCloud collection
fluctuates wildly, and is only receiving updates and no deletes to
explain the jumps:

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
   "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
   "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]


Could anyone help me understand why the same /select query direct to a
single core would return inconsistent, flapping results if there are
no deletes issued in my app to cause such jumps? Am I incorrect in my
assumption that I am querying the core directly?

An interesting observation is when I do an /admin/cores call to see
the docCount of the core's index, it does not fluctuate, only the
query result.

That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Chris Hostetter
: 
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single 
core -- if that core knows it's part of a SolrCloud collection then it 
will do a distributed search across a random replica from each shard in 
that collection.

If you want to bypass the distribute search logic, you have to say so 
explicitly...

To ask an arbitrary replica to only search itself add distrib=false to 
the request.

Alternatively: you can ask that only certain shard names (or certain 
explicit replicas) be included in a distribute request..

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
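Roughly, with illustrative host/collection/core names (swap in your own):

  # only search the core you hit:
  curl 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&rows=0&distrib=false'

  # or restrict a distributed request to particular shards, either by logical
  # shard name or by explicit replica addresses:
  curl 'http://backend:8983/solr/yourcollection/select?q=*:*&rows=0&shards=shard2'
  curl 'http://backend:8983/solr/yourcollection/select?q=*:*&rows=0&shards=host1:8983/solr/app_shard1_replica1,host2:8983/solr/app_shard2_replica2'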



-Hoss
http://www.lucidworks.com/


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt
Chris, this is extremely helpful and it's silly I didn't think of this 
sooner! Thanks a lot, this makes the situation make much more sense.


I will gather some proper data with your suggestion and get back to the 
thread shortly.


Thanks!!

Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:

:
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single
core -- if that core knows it's part of a SolrCloud collection then it
will do a distributed search across a random replica from each shard in
that collection.

If you want to bypass the distribute search logic, you have to say so
explicitly...

To ask an arbitrary replica to only search itself add distrib=false to
the request.

Alternatively: you can ask that only certain shard names (or certain
explicit replicas) be included in a distribute request..

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests



-Hoss
http://www.lucidworks.com/


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Hey all,

Now that I am getting correct results with distrib=false, I've 
identified that 1 of my nodes has just 1/3rd of the total data set and 
totally explains the flapping in results. The fix for this is obvious 
(rebuild replica) but the cause is less obvious.


There is definitely more than one issue going on with this SolrCloud 
(but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that 
/clusterstate.json doesn't seem to get updated when nodes are brought 
down/up is the reason why this replica remained in the distributed 
request chain without recovering/re-replicating from leader.


I imagine my Zookeeper ensemble is having some problems unrelated to 
Solr that is the real root cause.


Thanks!

Tim

On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
Chris, this is extremely helpful and it's silly I didn't think of this 
sooner! Thanks a lot, this makes the situation make much more sense.


I will gather some proper data with your suggestion and get back to 
the thread shortly.


Thanks!!

Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:

:
: I may be incorrect here, but I assumed when querying a single core 
of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am 
talking

: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single
core -- if that core knows it's part of a SolrCloud collection then it
will do a distributed search across a random replica from each shard in
that collection.

If you want to bypass the distribute search logic, you have to say so
explicitly...

To ask an arbitrary replica to only search itself add distrib=false to
the request.

Alternatively: you can ask that only certain shard names (or certain
explicit replicas) be included in a distribute request..

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests



-Hoss
http://www.lucidworks.com/


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Mark Miller
Keep in mind, there have been a *lot* of bug fixes since 4.3.1.

- Mark

On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt t...@elementspace.com wrote:

 Hey all,
 
 Now that I am getting correct results with distrib=false, I've identified 
 that 1 of my nodes has just 1/3rd of the total data set and totally explains 
 the flapping in results. The fix for this is obvious (rebuild replica) but 
 the cause is less obvious.
 
 There is definitely more than one issue going on with this SolrCloud (but 1 
 down thanks to Chris' suggestion!), so I'm guessing the fact that 
 /clusterstate.json doesn't seem to get updated when nodes are brought down/up 
 is the reason why this replica remained in the distributed request chain 
 without recovering/re-replicating from leader.
 
 I imagine my Zookeeper ensemble is having some problems unrelated to Solr 
 that is the real root cause.
 
 Thanks!
 
 Tim
 
 On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
 Chris, this is extremely helpful and it's silly I didn't think of this 
 sooner! Thanks a lot, this makes the situation make much more sense.
 
 I will gather some proper data with your suggestion and get back to the 
 thread shortly.
 
 Thanks!!
 
 Tim
 
 On 04/12/13 02:57 PM, Chris Hostetter wrote:
 :
 : I may be incorrect here, but I assumed when querying a single core of a
 : SolrCloud collection, the SolrCloud routing is bypassed and I am talking
 : directly to a plain/non-SolrCloud core.
 
 No ... every query received from a client by solr is handled by a single
 core -- if that core knows it's part of a SolrCloud collection then it
 will do a distributed search across a random replica from each shard in
 that collection.
 
 If you want to bypass the distribute search logic, you have to say so
 explicitly...
 
 To ask an arbitrary replica to only search itself add distrib=false to
 the request.
 
 Alternatively: you can ask that only certain shard names (or certain
 explicit replicas) be included in a distribute request..
 
 https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
 
 
 
 -Hoss
 http://www.lucidworks.com/