Re: Inconsistent numFound in SC when querying core directly
bq: If I changed the routing strategy back to composite (which it should be), is it ok?

I sincerely doubt it. The docs have already been routed to the wrong place (actually, I'm not sure how it worked at all). You can't get them redistributed simply by changing the definition in ZooKeeper; they're _already_ in the wrong place.

I'd tear down the corrupted data center and rebuild the collection. Here "tear down" means delete all the affected collections and start over again. On the plus side, if you can get a window during which you are _not_ indexing, you can copy the indexes from one of your good data centers to the new one. Do it like this:

- Stop indexing.
- Set up the new collection in the corrupted data center. It's important that it have _exactly_ the same number of shards as the DC you're going to transfer _from_. Also, make it leader-only, i.e. exactly 1 replica per shard.
- Copy the indexes over from the good data center to the corresponding shards. Here "corresponding" means that the source and destination have the same hash range, which you can see in state.json (or clusterstate.json if you're on an earlier format). NOTE: there are two ways to do this:
-- Just do file copies: scp, hand-carry CDs, whatever. Solr should be offline in the target data center.
-- Use the replication API to issue a "fetchindex" command. This works even in cloud mode; all the target Solr instance needs is access to a URL it can pull from. Solr of course needs to be running in this case.
- Bring up Solr in the target data center and verify it's working.
- Use the Collections API to ADDREPLICA on the target system until you build out the collection with the number of replicas you want.
- Start indexing to the target data center.

The bit about shutting off indexing is a safety measure: it guarantees that the indexes are consistent.
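A quick sketch of the "corresponding shards" matching in the copy step above, in Python. The shard names and hash ranges here are invented; in practice you'd read them out of each collection's state.json:

```python
# Hypothetical, trimmed-down state.json-style shard maps for the source
# (good) and target (rebuilt) data centers. Names and ranges are made up.
source_state = {
    "shard1": {"range": "80000000-ffffffff"},
    "shard2": {"range": "0-7fffffff"},
}
target_state = {
    "shardA": {"range": "0-7fffffff"},
    "shardB": {"range": "80000000-ffffffff"},
}

def match_shards_by_range(source, target):
    """Map each source shard to the target shard with the identical hash range."""
    by_range = {info["range"]: name for name, info in target.items()}
    return {name: by_range[info["range"]] for name, info in source.items()}

print(match_shards_by_range(source_state, target_state))
# {'shard1': 'shardB', 'shard2': 'shardA'}
```

The copy (scp or fetchindex) would then go from each source shard to its matched target shard, never by shard name alone.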
If you can't shut indexing down during the transfer, you'll need to index docs to the newly-rebuilt cluster in some manner that guarantees the two DCs will eventually have the same docs.

Best,
Erick

On Tue, Mar 14, 2017 at 3:26 PM, vbindal wrote:
> I think I didn't explain properly.
>
> I have 3 data centers, each with its own Solr cloud.
>
> My original strategy was composite routing, but when one data center went
> down and we brought it back, somehow the routing strategy on this changed to
> implicit (the other 2 DCs still have composite and they are working absolutely
> fine).
>
> This might be the reason for the data corruption on that DC, because the
> routing strategy got changed.
>
> If I change the routing strategy back to composite (which it should be), is
> it ok? Do I need to do anything more than simply changing the strategy in
> the clusterstate.json?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Inconsistent-numFound-in-SC-when-querying-core-directly-tp4105009p4325001.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Inconsistent numFound in SC when querying core directly
I think I didn't explain properly.

I have 3 data centers, each with its own Solr cloud.

My original strategy was composite routing, but when one data center went down and we brought it back, somehow the routing strategy on this changed to implicit (the other 2 DCs still have composite and they are working absolutely fine).

This might be the reason for the data corruption on that DC, because the routing strategy got changed.

If I change the routing strategy back to composite (which it should be), is it ok? Do I need to do anything more than simply changing the strategy in the clusterstate.json?

--
View this message in context: http://lucene.472066.n3.nabble.com/Inconsistent-numFound-in-SC-when-querying-core-directly-tp4105009p4325001.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Inconsistent numFound in SC when querying core directly
That would make the problem even worse. If you created the collection with implicit routing, there are no hash ranges defined for each shard, and compositeId requires hash ranges to be defined for each shard. Don't even try.

Best,
Erick

On Tue, Mar 14, 2017 at 11:13 AM, vbindal wrote:
> Compared it against the other 2 data centers and they both have `compositeId`.
>
> This started happening after one of our ZooKeeper nodes died due to a hardware issue
> and we had to set up a new ZooKeeper machine, update the config on all the
> Solr machines, and restart the cloud. My guess is something went wrong and an
> `implicit` router got created.
>
> Can I simply change the `clusterstate.json` to take care of this?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Inconsistent-numFound-in-SC-when-querying-core-directly-tp4105009p4324950.html
> Sent from the Solr - User mailing list archive at Nabble.com.
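To illustrate the point: in a hypothetical, trimmed-down pair of state.json-style structures (router names are real Solr values, but shard names and ranges here are invented), compositeId shards carry hash ranges while implicit shards don't, which is why flipping the router name alone can't work:

```python
# Trimmed, invented state.json-style structures for the two router types.
composite_state = {
    "router": {"name": "compositeId"},
    "shards": {
        "shard1": {"range": "80000000-ffffffff"},
        "shard2": {"range": "0-7fffffff"},
    },
}
implicit_state = {
    "router": {"name": "implicit"},
    "shards": {
        "shard1": {"range": None},  # implicit shards have no hash range
        "shard2": {"range": None},
    },
}

def can_switch_to_composite(state):
    """A hypothetical sanity check: renaming the router is only meaningful
    if every shard already has a hash range -- implicit collections don't."""
    return all(s.get("range") for s in state["shards"].values())

print(can_switch_to_composite(composite_state))  # True
print(can_switch_to_composite(implicit_state))   # False
```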
Re: Inconsistent numFound in SC when querying core directly
Compared it against the other 2 data centers and they both have `compositeId`.

This started happening after one of our ZooKeeper nodes died due to a hardware issue and we had to set up a new ZooKeeper machine, update the config on all the Solr machines, and restart the cloud. My guess is something went wrong and an `implicit` router got created.

Can I simply change the `clusterstate.json` to take care of this?

--
View this message in context: http://lucene.472066.n3.nabble.com/Inconsistent-numFound-in-SC-when-querying-core-directly-tp4105009p4324950.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Inconsistent numFound in SC when querying core directly
The default router has always been compositeId, but when you created your collection you may have created it with implicit. Looking at clusterstate.json and/or state.json in the individual collection should show you (admin UI >> cloud >> tree).

But we need to be very clear about what a "duplicate" document is. Solr routes/replaces documents based on whatever you've defined as <uniqueKey> in your schema file (assuming compositeId routing). When you say you get dups when you re-index, it sounds like you are somehow using different <uniqueKey>s for what you consider the "same" document.

bq: Also, This started happening after 1 of our zookeeper died due to hardware issue and we had to setup a new zookeeper machine. update the config in all the solr machine and restart the cloud.

Hmmm. "update the config in all the solr machine...". You should not have to do this in SolrCloud. All the configs are stored in ZooKeeper and loaded from ZK when the Solr instance starts. What it's starting to sound like is that you've somehow mixed up SolrCloud and older "stand-alone" concepts, and "somehow" your restoration process messed up your configs.

So if the issue isn't that you're somehow using different <uniqueKey>s, I'd recommend just starting over with a new collection since you can re-index from scratch.

Best,
Erick

On Tue, Mar 14, 2017 at 10:35 AM, vbindal wrote:
> Hi Shawn,
>
> We are on version 4.10.0. Is that the default router in this version? Also,
> we don't see all the documents duplicated, only some of them. I have an
> indexer job to index data in Solr. After I delete all the records and run
> this job, the count is correct, but when I run the job again, we start seeing
> a higher count and duplicate records (random records) in shards.
>
> Also, this started happening after one of our ZooKeeper nodes died due to a hardware
> issue and we had to set up a new ZooKeeper machine, update the config on all
> the Solr machines, and restart the cloud.
> > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Inconsistent-numFound-in-SC-when-querying-core-directly-tp4105009p4324937.html > Sent from the Solr - User mailing list archive at Nabble.com.
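As a toy illustration of the routing difference behind the duplicates (this is NOT Solr's real code -- compositeId actually hashes the <uniqueKey> with MurmurHash3; plain hash() below is just a stand-in):

```python
import random

# Toy model of indexing the same docs twice under each router type.
NUM_SHARDS = 3

def composite_route(doc_id):
    # Stand-in for compositeId routing: the shard is a deterministic
    # function of the uniqueKey alone, so a re-indexed doc overwrites itself.
    return hash(doc_id) % NUM_SHARDS

def index(shards, doc_id, receiving_shard, router):
    if router == "compositeId":
        target = composite_route(doc_id)
    else:
        # implicit: whichever shard receives the update simply keeps the doc.
        target = receiving_shard
    shards[target][doc_id] = True  # add/replace by uniqueKey within one shard

results = {}
for router in ("compositeId", "implicit"):
    random.seed(0)
    shards = [dict() for _ in range(NUM_SHARDS)]
    # Index the same 1000 docs twice, arriving at random nodes each run.
    for _ in range(2):
        for i in range(1000):
            index(shards, f"doc-{i}", random.randrange(NUM_SHARDS), router)
    results[router] = sum(len(s) for s in shards)

print(results["compositeId"])  # 1000: the second run overwrote the first
print(results["implicit"])     # > 1000: many docs now live on two shards
```

Under implicit routing the "same" doc lands wherever the update happened to arrive, so a second indexing pass inflates the distributed count exactly as described in this thread.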
Re: Inconsistent numFound in SC when querying core directly
Hi Shawn,

We are on version 4.10.0. Is that the default router in this version? Also, we don't see all the documents duplicated, only some of them. I have an indexer job to index data in Solr. After I delete all the records and run this job, the count is correct, but when I run the job again, we start seeing a higher count and duplicate records (random records) in shards.

Also, this started happening after one of our ZooKeeper nodes died due to a hardware issue and we had to set up a new ZooKeeper machine, update the config on all the Solr machines, and restart the cloud.

--
View this message in context: http://lucene.472066.n3.nabble.com/Inconsistent-numFound-in-SC-when-querying-core-directly-tp4105009p4324937.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Inconsistent numFound in SC when querying core directly
On 3/13/2017 3:16 AM, vbindal wrote:
> I am facing the same issue, where my query *:* returns an inconsistent number
> (almost 3 times the actual number), in the millions.
>
> When I try distrib=false on every machine, the results are correct, but
> without `distrib=false` the results are incorrect.

This most likely means that you've got duplicate documents in different shards of your cloud. This most commonly happens when the router is "implicit", which is easier to understand if you imagine that "implicit" is "manual" -- which would be a far better name for it.

In a later message you mention duplicate documents and 3 shards. It sounds like you have indexed all your documents to all three shards, and are probably using the implicit router. The implicit router basically means that there *IS* no shard routing -- documents are either indexed by the shard that received them, or directed to the shard indicated by explicit routing parameters.

With three shards and "distrib=false" requests to one shard, you should see about one-third of your total document count, not the full document count.

The router that you typically want, if you don't want to be concerned about how documents are routed to different shards, is the compositeId router. This *should* be the default for a multi-shard collection in any recent version, but the first one or two releases of SolrCloud might have created the wrong type.

Thanks,
Shawn
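The arithmetic above can be sketched with toy numbers (all counts invented for illustration):

```python
# With 3 shards and correct routing, a distrib=false query to one shard shows
# ~1/3 of the docs; if every shard ended up with a full copy, the distributed
# count is ~3x the real total -- matching the "almost 3 times" symptom.

real_total = 900

# Correct compositeId-style routing: docs are split across the shards.
healthy_shards = [300, 300, 300]
assert sum(healthy_shards) == real_total      # distributed numFound is right
assert healthy_shards[0] == real_total // 3   # distrib=false: one-third each

# Broken case: all docs indexed to all three shards.
broken_shards = [900, 900, 900]
print(sum(broken_shards))  # distributed numFound: 2700, ~3x the real total
```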
Re: Inconsistent numFound in SC when querying core directly
Hi,

I am facing the same issue, where my query *:* returns an inconsistent number (almost 3 times the actual number), in the millions. When I try distrib=false on every machine, the results are correct, but without `distrib=false` the results are incorrect.

Can you guys suggest something?

--
View this message in context: http://lucene.472066.n3.nabble.com/Inconsistent-numFound-in-SC-when-querying-core-directly-tp4105009p4324561.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Inconsistent numFound in SC when querying core directly
Very good point. I've seen this issue occur once before when I was playing with 4.3.1 and don't remember it happening since 4.5.0+, so that is good news - we are just behind.

For anyone that is curious, on my earlier mention that Zookeeper/clusterstate.json was not taking updates: this was NOT correct. Zookeeper has no issues taking set/creates to clusterstate.json (or any znode); just this one node seemed to stay stuck as state: active while it was very inconsistent, for reasons unknown, potentially just bugs. The good news is this will be resolved today with a create/destroy of the bad replica.

Thanks all!

Tim

On 4 December 2013 16:50, Mark Miller markrmil...@gmail.com wrote:
Keep in mind, there have been a *lot* of bug fixes since 4.3.1. - Mark

On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt t...@elementspace.com wrote:
Hey all, Now that I am getting correct results with distrib=false, I've identified that 1 of my nodes has just 1/3rd of the total data set, which totally explains the flapping in results. The fix for this is obvious (rebuild replica) but the cause is less obvious. There is definitely more than one issue going on with this SolrCloud (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that /clusterstate.json doesn't seem to get updated when nodes are brought down/up is the reason why this replica remained in the distributed request chain without recovering/re-replicating from the leader. I imagine my Zookeeper ensemble is having some problems unrelated to Solr that are the real root cause. Thanks! Tim

On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
Chris, this is extremely helpful and it's silly I didn't think of this sooner! Thanks a lot, this makes the situation make much more sense. I will gather some proper data with your suggestion and get back to the thread shortly. Thanks!! Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by Solr is handled by a single core -- if that core knows it's part of a SolrCloud collection then it will do a distributed search across a random replica from each shard in that collection. If you want to bypass the distributed search logic, you have to say so explicitly... To ask an arbitrary replica to only search itself, add distrib=false to the request. Alternatively, you can ask that only certain shard names (or certain explicit replicas) be included in a distributed request:

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests

-Hoss
http://www.lucidworks.com/
Re: Inconsistent numFound in SC when querying core directly
I spoke too soon; my plan for fixing this didn't quite work. I've moved this issue into a new thread/topic: "No /clusterstate.json updates on Solrcloud 4.3.1 Cores API UNLOAD/CREATE".

Thanks all for the help on this one!

Tim

On 5 December 2013 11:37, Tim Vaillancourt t...@elementspace.com wrote:
Very good point. I've seen this issue occur once before when I was playing with 4.3.1 and don't remember it happening since 4.5.0+, so that is good news - we are just behind. For anyone that is curious, on my earlier mention that Zookeeper/clusterstate.json was not taking updates: this was NOT correct. Zookeeper has no issues taking set/creates to clusterstate.json (or any znode); just this one node seemed to stay stuck as state: active while it was very inconsistent, for reasons unknown, potentially just bugs. The good news is this will be resolved today with a create/destroy of the bad replica. Thanks all! Tim

On 4 December 2013 16:50, Mark Miller markrmil...@gmail.com wrote:
Keep in mind, there have been a *lot* of bug fixes since 4.3.1. - Mark

On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt t...@elementspace.com wrote:
Hey all, Now that I am getting correct results with distrib=false, I've identified that 1 of my nodes has just 1/3rd of the total data set, which totally explains the flapping in results. The fix for this is obvious (rebuild replica) but the cause is less obvious. There is definitely more than one issue going on with this SolrCloud (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that /clusterstate.json doesn't seem to get updated when nodes are brought down/up is the reason why this replica remained in the distributed request chain without recovering/re-replicating from the leader. I imagine my Zookeeper ensemble is having some problems unrelated to Solr that are the real root cause. Thanks! Tim

On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
Chris, this is extremely helpful and it's silly I didn't think of this sooner! Thanks a lot, this makes the situation make much more sense. I will gather some proper data with your suggestion and get back to the thread shortly. Thanks!! Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by Solr is handled by a single core -- if that core knows it's part of a SolrCloud collection then it will do a distributed search across a random replica from each shard in that collection. If you want to bypass the distributed search logic, you have to say so explicitly... To ask an arbitrary replica to only search itself, add distrib=false to the request. Alternatively, you can ask that only certain shard names (or certain explicit replicas) be included in a distributed request:

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests

-Hoss
http://www.lucidworks.com/
Re: Inconsistent numFound in SC when querying core directly
To add two more pieces of data:

1) This occurs with real, conditional queries as well (e.g. q=key:timvaillancourt), not just the q=*:* I provided in my email.

2) I've noticed when I bring a node of the SolrCloud down, it remains state: active in my /clusterstate.json - something is really wrong with this cloud! Would a Zookeeper issue explain my varied results when querying a core directly?

Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:
Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with a 3-node external Zookeeper and 1 collection (2 shards, 2 replicas).

Currently we are noticing inconsistent results from the SolrCloud when performing the same simple /select query many times to our collection. Almost every other query, the numFound count (and the returned data) jumps between two very different values.

Initially I suspected a replica in a shard of the collection was inconsistent (and every other request hit that node) and started performing the same /select query direct to the individual cores of the SolrCloud collection on each instance, only to notice the same problem - the count jumps between two very different values! I may be incorrect here, but I assumed when querying a single core of a SolrCloud collection, the SolrCloud routing is bypassed and I am talking directly to a plain/non-SolrCloud core.

As you can see here, the count for 1 core of my SolrCloud collection fluctuates wildly, and it is only receiving updates and no deletes to explain the jumps:

[tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
"response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

[tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
"response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

[tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
"response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

[tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
"response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]

Could anyone help me understand why the same /select query direct to a single core would return inconsistent, flapping results if there are no deletes issued in my app to cause such jumps? Am I incorrect in my assumption that I am querying the core directly?

An interesting observation: when I do an /admin/cores call to see the docCount of the core's index, it does not fluctuate, only the query result.

That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim
RE: Inconsistent numFound in SC when querying core directly
https://issues.apache.org/jira/browse/SOLR-4260

Join the club Tim! Can you upgrade to trunk or incorporate the latest patches of related issues? You can fix it by trashing the bad node's data, although without multiple clusters it may be difficult to decide which node is bad. We use the latest commits now (since Tuesday) and are still waiting for it to happen again.

-Original message-
From: Tim Vaillancourt t...@elementspace.com
Sent: Wednesday 4th December 2013 23:38
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent numFound in SC when querying core directly

To add two more pieces of data:

1) This occurs with real, conditional queries as well (e.g. q=key:timvaillancourt), not just the q=*:* I provided in my email.

2) I've noticed when I bring a node of the SolrCloud down, it remains state: active in my /clusterstate.json - something is really wrong with this cloud! Would a Zookeeper issue explain my varied results when querying a core directly?

Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:
Hey guys, I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with a 3-node external Zookeeper and 1 collection (2 shards, 2 replicas). Currently we are noticing inconsistent results from the SolrCloud when performing the same simple /select query many times to our collection. Almost every other query, the numFound count (and the returned data) jumps between two very different values. Initially I suspected a replica in a shard of the collection was inconsistent (and every other request hit that node) and started performing the same /select query direct to the individual cores of the SolrCloud collection on each instance, only to notice the same problem - the count jumps between two very different values! I may be incorrect here, but I assumed when querying a single core of a SolrCloud collection, the SolrCloud routing is bypassed and I am talking directly to a plain/non-SolrCloud core.

As you can see here, the count for 1 core of my SolrCloud collection fluctuates wildly, and it is only receiving updates and no deletes to explain the jumps:

[tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
"response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

[tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
"response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

[tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
"response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

[tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
"response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]

Could anyone help me understand why the same /select query direct to a single core would return inconsistent, flapping results if there are no deletes issued in my app to cause such jumps? Am I incorrect in my assumption that I am querying the core directly? An interesting observation: when I do an /admin/cores call to see the docCount of the core's index, it does not fluctuate, only the query result.

That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim
Re: Inconsistent numFound in SC when querying core directly
Thanks Markus, I'm not sure if I'm encountering the same issue. This JIRA mentions 10s of docs difference; I'm seeing differences in the multi-millions of docs, and even more strangely it very predictably flaps between a 123M value and an 87M value, a 30M+ doc difference.

Secondly, I'm not comparing values from 2 instances (leader to replica); I'm currently performing the same curl call to the same core directly and am seeing flapping results each time I perform the query, so this is currently happening within a single instance/core, unless I am misunderstanding how to directly query a core.

Cheers,

Tim

On 04/12/13 02:46 PM, Markus Jelsma wrote:
https://issues.apache.org/jira/browse/SOLR-4260

Join the club Tim! Can you upgrade to trunk or incorporate the latest patches of related issues? You can fix it by trashing the bad node's data, although without multiple clusters it may be difficult to decide which node is bad. We use the latest commits now (since Tuesday) and are still waiting for it to happen again.

-Original message-
From: Tim Vaillancourt t...@elementspace.com
Sent: Wednesday 4th December 2013 23:38
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent numFound in SC when querying core directly

To add two more pieces of data:

1) This occurs with real, conditional queries as well (e.g. q=key:timvaillancourt), not just the q=*:* I provided in my email.

2) I've noticed when I bring a node of the SolrCloud down, it remains state: active in my /clusterstate.json - something is really wrong with this cloud! Would a Zookeeper issue explain my varied results when querying a core directly?

Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:
Hey guys, I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with a 3-node external Zookeeper and 1 collection (2 shards, 2 replicas). Currently we are noticing inconsistent results from the SolrCloud when performing the same simple /select query many times to our collection. Almost every other query, the numFound count (and the returned data) jumps between two very different values. Initially I suspected a replica in a shard of the collection was inconsistent (and every other request hit that node) and started performing the same /select query direct to the individual cores of the SolrCloud collection on each instance, only to notice the same problem - the count jumps between two very different values! I may be incorrect here, but I assumed when querying a single core of a SolrCloud collection, the SolrCloud routing is bypassed and I am talking directly to a plain/non-SolrCloud core.

As you can see here, the count for 1 core of my SolrCloud collection fluctuates wildly, and it is only receiving updates and no deletes to explain the jumps:

[tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
"response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

[tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
"response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

[tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
"response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

[tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
"response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]

Could anyone help me understand why the same /select query direct to a single core would return inconsistent, flapping results if there are no deletes issued in my app to cause such jumps? Am I incorrect in my assumption that I am querying the core directly?

An interesting observation: when I do an /admin/cores call to see the docCount of the core's index, it does not fluctuate, only the query result.

That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim
Re: Inconsistent numFound in SC when querying core directly
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by Solr is handled by a single core -- if that core knows it's part of a SolrCloud collection then it will do a distributed search across a random replica from each shard in that collection.

If you want to bypass the distributed search logic, you have to say so explicitly...

To ask an arbitrary replica to only search itself, add distrib=false to the request.

Alternatively: you can ask that only certain shard names (or certain explicit replicas) be included in a distributed request:

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests

-Hoss
http://www.lucidworks.com/
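A small sketch of the "random replica per shard" behavior described above, with per-replica counts invented to line up with the two flapping totals reported earlier in the thread (2 shards, 2 replicas each; one replica of shard2 is missing docs):

```python
from itertools import product

# Invented per-replica doc counts. Shard1's replicas agree; shard2's do not.
shard1_replicas = [60_000_000, 60_000_000]
shard2_replicas = [63_596_839, 24_739_144]  # hypothetical inconsistent pair

# Every replica combination a distributed query could hit, and the total
# numFound each combination would report:
totals = {r1 + r2 for r1, r2 in product(shard1_replicas, shard2_replicas)}
print(sorted(totals))  # [84739144, 123596839] -> the "flapping" numFound
```

Because each query picks one replica per shard at random, the reported total flips between the two values depending on which shard2 replica was chosen, even though the request went to a single core's URL.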
Re: Inconsistent numFound in SC when querying core directly
Chris, this is extremely helpful and it's silly I didn't think of this sooner! Thanks a lot, this makes the situation make much more sense. I will gather some proper data with your suggestion and get back to the thread shortly.

Thanks!!

Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by Solr is handled by a single core -- if that core knows it's part of a SolrCloud collection then it will do a distributed search across a random replica from each shard in that collection. If you want to bypass the distributed search logic, you have to say so explicitly... To ask an arbitrary replica to only search itself, add distrib=false to the request. Alternatively, you can ask that only certain shard names (or certain explicit replicas) be included in a distributed request:

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests

-Hoss
http://www.lucidworks.com/
Re: Inconsistent numFound in SC when querying core directly
Hey all,

Now that I am getting correct results with distrib=false, I've identified that 1 of my nodes has just 1/3rd of the total data set, which totally explains the flapping in results. The fix for this is obvious (rebuild replica) but the cause is less obvious.

There is definitely more than one issue going on with this SolrCloud (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that /clusterstate.json doesn't seem to get updated when nodes are brought down/up is the reason why this replica remained in the distributed request chain without recovering/re-replicating from the leader. I imagine my Zookeeper ensemble is having some problems unrelated to Solr that are the real root cause.

Thanks!

Tim

On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
Chris, this is extremely helpful and it's silly I didn't think of this sooner! Thanks a lot, this makes the situation make much more sense. I will gather some proper data with your suggestion and get back to the thread shortly. Thanks!! Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by Solr is handled by a single core -- if that core knows it's part of a SolrCloud collection then it will do a distributed search across a random replica from each shard in that collection. If you want to bypass the distributed search logic, you have to say so explicitly... To ask an arbitrary replica to only search itself, add distrib=false to the request. Alternatively, you can ask that only certain shard names (or certain explicit replicas) be included in a distributed request:

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests

-Hoss
http://www.lucidworks.com/
Re: Inconsistent numFound in SC when querying core directly
Keep in mind, there have been a *lot* of bug fixes since 4.3.1. - Mark

On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt t...@elementspace.com wrote:
Hey all, Now that I am getting correct results with distrib=false, I've identified that 1 of my nodes has just 1/3rd of the total data set, which totally explains the flapping in results. The fix for this is obvious (rebuild replica) but the cause is less obvious. There is definitely more than one issue going on with this SolrCloud (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that /clusterstate.json doesn't seem to get updated when nodes are brought down/up is the reason why this replica remained in the distributed request chain without recovering/re-replicating from the leader. I imagine my Zookeeper ensemble is having some problems unrelated to Solr that are the real root cause. Thanks! Tim

On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
Chris, this is extremely helpful and it's silly I didn't think of this sooner! Thanks a lot, this makes the situation make much more sense. I will gather some proper data with your suggestion and get back to the thread shortly. Thanks!! Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by Solr is handled by a single core -- if that core knows it's part of a SolrCloud collection then it will do a distributed search across a random replica from each shard in that collection. If you want to bypass the distributed search logic, you have to say so explicitly... To ask an arbitrary replica to only search itself, add distrib=false to the request. Alternatively, you can ask that only certain shard names (or certain explicit replicas) be included in a distributed request:

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests

-Hoss
http://www.lucidworks.com/