Re: SolrCloud - Strategy for recovering cluster states

2016-03-02 Thread danny teichthal
According to what you describe, I really don't see the need for core discovery in Solr Cloud. It will only be used to eagerly load a core on startup. If I understand correctly, when ZK = truth, this eager loading can/should be done by consulting ZooKeeper instead of the local disk. I agree that it is

Re: SolrCloud - Strategy for recovering cluster states

2016-03-02 Thread Jeff Wartes
Well, with the understanding that someone who isn’t involved in the process is describing something that isn’t built yet... I could imagine changes like:
- Core discovery ignores cores that aren’t present in the ZK cluster state
- New cores are automatically created to bring a node in line

Re: SolrCloud - Strategy for recovering cluster states

2016-03-02 Thread danny teichthal
Thanks Jeff, I understand your philosophy and it sounds correct. Since we had many problems with ZooKeeper when switching to Solr Cloud, we couldn't use it as a source of knowledge and had to rely on a more stable source. The issue is that when we get such a ZooKeeper event, it brought our

Re: SolrCloud - Strategy for recovering cluster states

2016-03-01 Thread Jeff Wartes
I’ve been running SolrCloud clusters in various versions for a few years here, and I can only think of two or three cases where the ZK-stored cluster state was broken in a way that required me to intervene manually by hand-editing the contents of ZK. I think I’ve seen Solr fixes go by for those

Re: SolrCloud - Strategy for recovering cluster states

2016-03-01 Thread danny teichthal
Hi, Just summarizing my questions in case the long mail is a little intimidating:
1. Is there a best practice/automated tool for overcoming problems in cluster state coming from ZooKeeper disconnections?
2. Creating a collection via core admin is discouraged; is that also true for core.properties

SolrCloud - Strategy for recovering cluster states

2016-02-29 Thread danny teichthal
Hi, I would like to describe a process we use for overcoming problems in cluster state when we have networking issues. I would appreciate it if anyone could point out the flaws in this solution and say what the best practice is for recovery in case of network problems involving ZooKeeper. I'm