Thank you, Tomás, for pointing to the Javadoc: http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/common/cloud/Replica.State.html#ACTIVE
The Javadoc is quite clear. So this stale state.json is not an issue after all.

However, it's very confusing that when a node goes down, state.json may be updated for one collection while it remains stale in the other collection.

Also, in our case, the node did not crash as per the Javadoc... it was a normal server stop/shut-down.
We may need to review our shut-down process and see whether things change.

Thank you very much, Erick and Tomás, for your valuable help... much appreciated.

Arcadius.

On 8 September 2015 at 18:28, Erick Erickson <erickerick...@gmail.com> wrote:

> bq: You were probably referring to state.json
>
> yep, I'm never sure whether people are on the old or new ZK versions.
>
> OK, With Tomás' comment, I think it's explained... although confusing.
>
> WDYT?
>
>
> On Tue, Sep 8, 2015 at 10:03 AM, Arcadius Ahouansou
> <arcad...@menelic.com> wrote:
> > Hello Erick.
> >
> > Yes,
> >
> > 1> liveNodes has N nodes listed (correctly): Correct, liveNodes is always
> > right.
> >
> > 2> clusterstate.json has N+M nodes listed as "active": clusterstate.json is
> > always empty as it's no longer being "used" in 5.3. You were
> > probably referring to state.json which is in individual collections. Yes,
> > that one reflects the wrong value, i.e. N+M
> >
> > 3> using the collection API to get CLUSTERSTATUS always returns the correct
> > value N
> >
> > 4> The front-end code in cloud.js displays the right colour when
> > nodes go down because it checks for the live node
> >
> > The problem is only with state.json under certain circumstances.
> >
> > Thanks.
> >
> > On 8 September 2015 at 17:51, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> Arcadius:
> >>
> >> Hmmm. It may take a while for the cluster state to change, but I'm
> >> assuming that this state persists for minutes/hours/days.
> >>
> >> So to recap: If you dump the entire ZK node from the root, you have
> >> 1> liveNodes has N nodes listed (correctly)
> >> 2> clusterstate.json has N+M nodes listed as "active"
> >>
> >> Doesn't sound right to me, but I'll have to let people who are deep
> >> into that code speculate from here.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Sep 8, 2015 at 1:13 AM, Arcadius Ahouansou <arcad...@menelic.com>
> >> wrote:
> >> > On Sep 8, 2015 6:25 AM, "Erick Erickson" <erickerick...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Perhaps the browser cache? What happens if you, say, use
> >> >> Zookeeper client tools to bring down the cluster state in
> >> >> question? Or perhaps just refresh the admin UI when showing
> >> >> the cluster status....
> >> >>
> >> >
> >> > Hello Erick.
> >> >
> >> > Thank you very much for answering.
> >> > I did use the ZooInspector tool to check the state.json in all 5 zk nodes
> >> > and they are all out of date and identical to what I get through the tree
> >> > view in the Solr admin UI.
> >> >
> >> > Looking at the source code, cloud.js, which correctly displays nodes as
> >> > "gone" in the graph view, calls the end point /zookeeper?wt=json and
> >> > relies on the live nodes to mark a node as down instead of state.json.
> >> >
> >> > Thanks.
> >> >
> >> >> Shot in the dark,
> >> >> Erick
> >> >>
> >> >> On Mon, Sep 7, 2015 at 6:09 PM, Arcadius Ahouansou <arcad...@menelic.com>
> >> >> wrote:
> >> >> > We are running the latest Solr 5.3.0
> >> >> >
> >> >> > Thanks.
> >
> > --
> > Arcadius Ahouansou
> > Menelic Ltd | Information is Power
> > M: 07908761999
> > W: www.menelic.com
> > ---
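
P.S. For anyone finding this thread later: the upshot is that a replica listed as "active" in state.json should only be trusted if its node also appears under the live nodes. Below is a minimal Python sketch of that cross-check. It is only an illustration, not Solr's own code: the function name and the sample data are made up, and it assumes state.json has already been read from ZooKeeper and parsed, with the usual collection > shards > replicas layout and a "node_name" and "state" field on each replica.

```python
def effective_replica_states(state_json, live_nodes):
    """Cross-check recorded replica states against the live nodes.

    state_json: parsed content of a collection's state.json (assumed layout:
                {collection: {"shards": {shard: {"replicas": {name: {...}}}}}}).
    live_nodes: iterable of node names listed under the live nodes in ZK.
    Returns {(collection, shard, replica): effective_state}.
    """
    live = set(live_nodes)
    result = {}
    for coll_name, coll in state_json.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                recorded = replica.get("state", "unknown")
                node = replica.get("node_name")
                # The recorded state may be stale: if the hosting node is not
                # live, treat the replica as down whatever state.json says.
                effective = recorded if node in live else "down"
                result[(coll_name, shard_name, replica_name)] = effective
    return result


# Hypothetical sample: host2 has gone away but state.json still says "active".
SAMPLE_STATE = {
    "collection1": {
        "shards": {
            "shard1": {
                "replicas": {
                    "core_node1": {"node_name": "host1:8983_solr", "state": "active"},
                    "core_node2": {"node_name": "host2:8983_solr", "state": "active"},
                    "core_node3": {"node_name": "host1:8983_solr", "state": "recovering"},
                }
            }
        }
    }
}
SAMPLE_LIVE_NODES = ["host1:8983_solr"]

if __name__ == "__main__":
    for key, state in sorted(effective_replica_states(SAMPLE_STATE, SAMPLE_LIVE_NODES).items()):
        print(key, state)
```

With the sample above, core_node2 comes out as "down" even though its recorded state is "active", which matches what the graph view and CLUSTERSTATUS showed us, while state.json on its own did not.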