Re: Interesting failure scenario, SolrCloud and ZK nodes on different times

2013-08-08 Thread Grant Ingersoll
I seem to recall seeing this on my cluster when we didn't have clocks in sync, but perhaps my memory is fuzzy as well.
-Grant
On Aug 7, 2013, at 7:41 AM, Erick Erickson erickerick...@gmail.com wrote:
Well, we're reconstructing a chain of _possibilities_ post-mortem, so there's not much I can

Re: Interesting failure scenario, SolrCloud and ZK nodes on different times

2013-08-07 Thread Erick Erickson
Well, we're reconstructing a chain of _possibilities_ post-mortem, so there's not much I can say for sure. Mostly just throwing this out there in case it sparks some aha moments. Not knowing ZK well, anything I say is speculation. But I speculate that this isn't really the root of the problem

Interesting failure scenario, SolrCloud and ZK nodes on different times

2013-08-06 Thread Erick Erickson
I've become aware of a situation I thought I'd pass along. A SolrCloud installation had several ZK nodes that had significantly offset times. They were being hit with the "ClusterState says we are the leader, but locally we don't think we are" error when nodes were recovering. Of course whether

Re: Interesting failure scenario, SolrCloud and ZK nodes on different times

2013-08-06 Thread Shawn Heisey
On 8/6/2013 1:56 PM, Erick Erickson wrote:
I've become aware of a situation I thought I'd pass along. A SolrCloud installation had several ZK nodes that had significantly offset times. They were being hit with the "ClusterState says we are the leader, but locally we don't think we are" error

Re: Interesting failure scenario, SolrCloud and ZK nodes on different times

2013-08-06 Thread Chris Hostetter
: When the times were coordinated, many of the problems with recovery went
: away. We're trying to reconstruct the scenario from memory, but it
: prompted me to pass the incident in case it sparked any thoughts.
: Specifically, I wonder if there's anything that comes to mind if the ZK
: