Re: SolrCloud replicas out of sync

2016-01-29 Thread David Smith
Tomás, good find, but I don’t think the rate of updates was high enough during the network outage to create the overrun situation described in the ticket. I did notice that one of the proposed fixes, https://issues.apache.org/jira/browse/SOLR-8586, is an entire-index consistency check between …

Re: SolrCloud replicas out of sync

2016-01-28 Thread Tomás Fernández Löbbe
Maybe you are hitting the reordering issue described in SOLR-8129? Tomás On Wed, Jan 27, 2016 at 11:32 AM, David Smith wrote: > Sure. Here is our SolrCloud cluster: + Three (3) instances of Zookeeper on three separate (physical) servers. The ZK servers …
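For anyone checking whether reordered updates left the replicas holding different versions of the same document, one approach is to query each core directly with distrib=false and compare the _version_ field. A minimal sketch; the hostnames, core names, and document id are illustrative, not taken from this thread:

    # Minimal sketch: compare _version_ for one document across replicas.
    # Hostnames, core names, and the document id are illustrative.
    import json
    from urllib.request import urlopen

    replicas = [
        "http://solr1:8983/solr/collection1_shard1_replica1",
        "http://solr2:8983/solr/collection1_shard1_replica2",
        "http://solr3:8983/solr/collection1_shard1_replica3",
    ]
    doc_id = "example-doc-id"
    for base in replicas:
        # distrib=false restricts the query to the local core only
        url = f"{base}/select?q=id:{doc_id}&fl=id,_version_&distrib=false&wt=json"
        docs = json.load(urlopen(url))["response"]["docs"]
        print(base, docs[0]["_version_"] if docs else "MISSING")

If the three printed versions differ, that document took different update histories on different replicas.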

Re: SolrCloud replicas out of sync

2016-01-27 Thread David Smith
Jeff, again, very much appreciate your feedback. It is interesting: the article you linked to by Shalin is exactly why we picked SolrCloud over ES, because (eventual) consistency is critical for our application and we will sacrifice availability for it. To be clear, after the outage, NONE …

Re: SolrCloud replicas out of sync

2016-01-27 Thread Shawn Heisey
On 1/27/2016 8:59 AM, David Smith wrote: > So we definitely don’t have CP yet: our very first network outage resulted in multiple overlapped lost updates. As a result, I can’t pick one replica and make it the new “master”. I must rebuild this collection from scratch, which I can do, …

Re: SolrCloud replicas out of sync

2016-01-27 Thread Jeff Wartes
On 1/27/16, 8:28 AM, "Shawn Heisey" wrote: > I don't think any documentation states this, but it seems like a good idea to me to use an alias from day one, so that you always have the option of swapping the "real" collection that you are using without needing to change …
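For reference, the Collections API CREATEALIAS action both creates an alias and repoints an existing one, which is what makes this swap possible without client changes. A minimal sketch, assuming illustrative host, alias, and collection names:

    # Minimal sketch: repoint the alias "products" at a rebuilt collection.
    # Host, alias, and collection names are illustrative.
    from urllib.request import urlopen

    host = "http://solr1:8983"
    params = "action=CREATEALIAS&name=products&collections=products_v2"
    # CREATEALIAS overwrites an existing alias, so clients querying
    # "products" silently switch to products_v2.
    print(urlopen(f"{host}/solr/admin/collections?{params}").read())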

Re: SolrCloud replicas out of sync

2016-01-27 Thread Jeff Wartes
If you can identify the problem documents, you can just re-index those after forcing a sync. Might save a full rebuild and downtime. You might describe your cluster setup, including ZK. It sounds like you’ve done your research, but improper ZK node distribution could certainly invalidate some …
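For the "forcing a sync" step, one option is the CoreAdmin REQUESTRECOVERY action, which asks a core to recover from its shard leader before you re-index the known-bad documents. A minimal sketch; the node URL and core name are illustrative:

    # Minimal sketch: ask one core to recover (re-sync) from its leader.
    # Node URL and core name are illustrative.
    from urllib.request import urlopen

    node = "http://solr2:8983"
    core = "collection1_shard1_replica2"
    # REQUESTRECOVERY puts the core into recovery against the shard leader
    urlopen(f"{node}/solr/admin/cores?action=REQUESTRECOVERY&core={core}")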

Re: SolrCloud replicas out of sync

2016-01-27 Thread David Smith
Sure. Here is our SolrCloud cluster: + Three (3) instances of Zookeeper on three separate (physical) servers. The ZK servers are beefy and fairly recently built, with 2x10 GigE (bonded) Ethernet connectivity to the rest of the data center. We recognize the importance of the stability and …
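A quick way to sanity-check an ensemble like this is ZooKeeper’s four-letter commands over the client port. A minimal sketch, assuming the default port 2181 and illustrative hostnames:

    # Minimal sketch: probe each ZooKeeper node with "ruok" and "srvr".
    # Hostnames are illustrative; 2181 is the default client port.
    import socket

    def four_letter(host, port, cmd):
        with socket.create_connection((host, port), timeout=5) as s:
            s.sendall(cmd.encode())
            return s.recv(8192).decode()

    for host in ["zk1", "zk2", "zk3"]:
        print(host, four_letter(host, 2181, "ruok"))  # expect "imok"
        # "srvr" includes the node's mode: leader, follower, or standalone
        print(four_letter(host, 2181, "srvr"))

With a healthy three-node ensemble you should see exactly one leader and two followers.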

Re: SolrCloud replicas out of sync

2016-01-27 Thread Brian Narsi
This on the surface appears to be similar to an earlier thread by me: "Query results change" On Tue, Jan 26, 2016 at 4:32 PM, Jeff Wartes wrote: > Ah, perhaps you fell into something like this then? https://issues.apache.org/jira/browse/SOLR-7844 That says it’s …

Re: SolrCloud replicas out of sync

2016-01-26 Thread Jeff Wartes
My understanding is that the "version" represents the timestamp at which the searcher was opened, so it doesn’t really offer any assurances about your data. Although you could probably bounce a node and get your document counts back in sync (by provoking a check), it’s interesting that you’re in this …
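A more direct signal than the searcher "version" is the Lucene index version and generation each core reports through its ReplicationHandler; if these diverge across replicas, the underlying indexes differ. A minimal sketch with illustrative replica URLs:

    # Minimal sketch: compare index version/generation per replica.
    # Replica URLs are illustrative.
    import json
    from urllib.request import urlopen

    replicas = [
        "http://solr1:8983/solr/collection1_shard1_replica1",
        "http://solr2:8983/solr/collection1_shard1_replica2",
        "http://solr3:8983/solr/collection1_shard1_replica3",
    ]
    for base in replicas:
        resp = json.load(urlopen(f"{base}/replication?command=indexversion&wt=json"))
        print(base, resp["indexversion"], resp["generation"])

Note that version/generation can legitimately differ after independent merges, so matching document sets matter more than matching generations.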

Re: SolrCloud replicas out of sync

2016-01-26 Thread David Smith
Thanks Jeff! A few comments: >> Although you could probably bounce a node and get your document counts back in sync (by provoking a check) >> If the check is a simple doc count, that will not work. We have found that replica1 and replica3, although they contain the same doc count, …
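When the counts match but the contents differ, one way to find the mismatched documents is to page through every id on each replica (distrib=false with cursorMark) and diff the sets. A minimal sketch; the replica URLs are illustrative:

    # Minimal sketch: diff the full id sets of two replicas.
    # Replica URLs are illustrative.
    import json
    from urllib.parse import quote
    from urllib.request import urlopen

    def all_ids(base):
        ids, cursor = set(), "*"
        while True:
            url = (f"{base}/select?q=*:*&fl=id&sort=id+asc&rows=1000"
                   f"&distrib=false&cursorMark={quote(cursor)}&wt=json")
            resp = json.load(urlopen(url))
            ids.update(doc["id"] for doc in resp["response"]["docs"])
            if resp["nextCursorMark"] == cursor:
                break
            cursor = resp["nextCursorMark"]
        return ids

    r1 = all_ids("http://solr1:8983/solr/collection1_shard1_replica1")
    r3 = all_ids("http://solr3:8983/solr/collection1_shard1_replica3")
    print("only in replica1:", sorted(r1 - r3))
    print("only in replica3:", sorted(r3 - r1))

Identical id sets can still hide per-document differences; comparing _version_ per id (as in the earlier sketch) covers that case.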

Re: SolrCloud replicas out of sync

2016-01-26 Thread Jeff Wartes
Ah, perhaps you fell into something like this then? https://issues.apache.org/jira/browse/SOLR-7844 That says it’s fixed in 5.4, but that would be an example of a split-brain type incident, where different documents were accepted by different replicas that each thought they were the leader. If …
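When diagnosing a suspected split-brain after the fact, the Collections API CLUSTERSTATUS action shows which replica each shard currently considers the leader. A minimal sketch; the host and collection name are illustrative:

    # Minimal sketch: print per-replica state and current leader per shard.
    # Host and collection name are illustrative.
    import json
    from urllib.request import urlopen

    url = ("http://solr1:8983/solr/admin/collections"
           "?action=CLUSTERSTATUS&collection=collection1&wt=json")
    status = json.load(urlopen(url))
    shards = status["cluster"]["collections"]["collection1"]["shards"]
    for shard, info in shards.items():
        for name, replica in info["replicas"].items():
            leader = " (leader)" if replica.get("leader") == "true" else ""
            print(shard, name, replica["state"] + leader)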

SolrCloud replicas out of sync

2016-01-22 Thread David Smith
I have a SolrCloud v5.4 collection with 3 replicas that appear to have fallen permanently out of sync. Users started to complain that the same search, executed twice, sometimes returned different result counts. Sure enough, our replicas are not identical: shard1_replica1: 89867 documents …
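A reproducible way to confirm this kind of divergence is to ask each core for its own count with distrib=false, which bypasses the distributed query layer and its replica round-robin. A minimal sketch; the replica URLs are illustrative:

    # Minimal sketch: compare numFound across replicas, one core at a time.
    # Replica URLs are illustrative.
    import json
    from urllib.request import urlopen

    replicas = [
        "http://solr1:8983/solr/collection1_shard1_replica1",
        "http://solr2:8983/solr/collection1_shard1_replica2",
        "http://solr3:8983/solr/collection1_shard1_replica3",
    ]
    for base in replicas:
        resp = json.load(urlopen(f"{base}/select?q=*:*&rows=0&distrib=false&wt=json"))
        print(base, resp["response"]["numFound"])

If the three numbers differ on a quiesced cluster, the replicas really have diverged rather than just lagging on commits.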