I spoke too soon: my plan for fixing this didn't quite work. I've moved this issue into a new thread/topic: "No /clusterstate.json updates on Solrcloud 4.3.1 Cores API UNLOAD/CREATE".
Thanks all for the help on this one!

Tim

On 5 December 2013 11:37, Tim Vaillancourt <t...@elementspace.com> wrote:

> Very good point. I've seen this issue occur once before when I was playing
> with 4.3.1 and don't remember it happening since 4.5.0+, so that is good
> news - we are just behind.
>
> For anyone that is curious, on my earlier mention that
> Zookeeper/clusterstate.json was not taking updates: this was NOT correct.
> Zookeeper has no issues taking set/creates to clusterstate.json (or any
> znode), just this one node seemed to stay stuck as "state: active" while it
> was very inconsistent for reasons unknown, potentially just bugs.
>
> The good news is this will be resolved today with a create/destroy of the
> bad replica.
>
> Thanks all!
>
> Tim
>
>
> On 4 December 2013 16:50, Mark Miller <markrmil...@gmail.com> wrote:
>
>> Keep in mind, there have been a *lot* of bug fixes since 4.3.1.
>>
>> - Mark
>>
>> On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt <t...@elementspace.com>
>> wrote:
>>
>> > Hey all,
>> >
>> > Now that I am getting correct results with "distrib=false", I've
>> > identified that 1 of my nodes has just 1/3rd of the total data set,
>> > which totally explains the flapping in results. The fix for this is
>> > obvious (rebuild replica) but the cause is less obvious.
>> >
>> > There is definitely more than one issue going on with this SolrCloud
>> > (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact
>> > that /clusterstate.json doesn't seem to get updated when nodes are
>> > brought down/up is the reason why this replica remained in the
>> > distributed request chain without recovering/re-replicating from
>> > leader.
>> >
>> > I imagine my Zookeeper ensemble is having some problems unrelated to
>> > Solr that are the real root cause.
>> >
>> > Thanks!
>> >
>> > Tim
>> >
>> > On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
>> >> Chris, this is extremely helpful and it's silly I didn't think of
>> >> this sooner! Thanks a lot, this makes the situation make much more
>> >> sense.
>> >>
>> >> I will gather some proper data with your suggestion and get back to
>> >> the thread shortly.
>> >>
>> >> Thanks!!
>> >>
>> >> Tim
>> >>
>> >> On 04/12/13 02:57 PM, Chris Hostetter wrote:
>> >>> :
>> >>> : I may be incorrect here, but I assumed when querying a single core
>> >>> : of a SolrCloud collection, the SolrCloud routing is bypassed and I
>> >>> : am talking directly to a plain/non-SolrCloud core.
>> >>>
>> >>> No ... every query received from a client by solr is handled by a
>> >>> single core -- if that core knows it's part of a SolrCloud
>> >>> collection then it will do a distributed search across a random
>> >>> replica from each shard in that collection.
>> >>>
>> >>> If you want to bypass the distributed search logic, you have to say
>> >>> so explicitly...
>> >>>
>> >>> To ask an arbitrary replica to only search itself add
>> >>> "distrib=false" to the request.
>> >>>
>> >>> Alternatively: you can ask that only certain shard names (or certain
>> >>> explicit replicas) be included in a distributed request..
>> >>>
>> >>> https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
>> >>>
>> >>> -Hoss
>> >>> http://www.lucidworks.com/
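
For anyone landing on this thread later, here is a rough sketch of the per-replica check Hoss describes: query each core with "distrib=false" and compare the document counts. The host, port and core names are made up for illustration, and treating the largest count as the "expected" one is only a heuristic I am assuming here (it only makes sense for replicas of the same shard). The "distrib" parameter and the "response.numFound" field are standard Solr; everything else is hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_query_url(base_url, core, query="*:*"):
    """Build a select URL that asks one core to search only itself."""
    params = {"q": query, "rows": 0, "wt": "json", "distrib": "false"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


def fetch_num_found(url, timeout=10):
    """Run the query and return numFound from Solr's JSON response."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["response"]["numFound"]


def out_of_sync(counts):
    """Return the replicas whose count differs from the largest count seen.

    Assumption: all replicas belong to the same shard, so a healthy set
    should report identical counts.
    """
    if not counts:
        return []
    expected = max(counts.values())
    return sorted(name for name, n in counts.items() if n != expected)
```

Usage would look something like this (again, hypothetical names): build one URL per replica with replica_query_url("http://solr1.example.com:8983/solr", "collection1_shard1_replica1"), collect fetch_num_found() for each into a dict keyed by replica, then pass that dict to out_of_sync(). A replica holding only a third of the data, as in Tim's case, would show up in the returned list.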