Re: solrcloud replicas not in sync

2017-05-24 Thread Walter Underwood
Funny, I took a different approach to the same monitoring problem. Each document has a published_timestamp field set when it is generated. The schema has an indexed_timestamp field with a default of NOW. I wrote some Python to get the set of nodes in the collection, query each one, then report

Re: solrcloud replicas not in sync

2017-05-24 Thread Webster Homer
Actually I wrote a service that calls the Collections API CLUSTERSTATUS, but it adds data for each replica by calling the Core Admin STATUS (https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-STATUS). My service fills in the index information for more data. This returns the
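
A minimal sketch of the kind of service described above, not the poster's actual code. It assumes the usual JSON layout of CLUSTERSTATUS (`cluster.collections.<name>.shards.<shard>.replicas`) and of Core Admin STATUS (`status.<core>.index`). The names `replica_index_report` and `fetch_json` are hypothetical, and the fetcher is injectable so the merge logic can be tried without a live cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Default fetcher (assumption): GET the URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def replica_index_report(solr_base, collection, fetch=fetch_json):
    """Combine CLUSTERSTATUS with each replica's Core Admin STATUS index info.

    solr_base is assumed to look like "http://host:8983/solr".
    """
    params = urlencode({"action": "CLUSTERSTATUS", "collection": collection, "wt": "json"})
    cluster = fetch(f"{solr_base}/admin/collections?{params}")
    shards = cluster["cluster"]["collections"][collection]["shards"]

    report = []
    for shard_name, shard in sorted(shards.items()):
        for replica in shard["replicas"].values():
            core = replica["core"]
            # Core Admin STATUS has to be asked of the node that hosts the core.
            params = urlencode({"action": "STATUS", "core": core, "wt": "json"})
            status = fetch(f"{replica['base_url']}/admin/cores?{params}")
            index = status["status"][core].get("index", {})
            report.append({
                "shard": shard_name,
                "core": core,
                "state": replica.get("state"),
                "numDocs": index.get("numDocs"),
                "current": index.get("current"),
            })
    return report
```

Comparing the `numDocs` values within one shard gives a quick view of which replicas disagree.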

Re: solrcloud replicas not in sync

2017-05-24 Thread Webster Homer
oh, those logs probably reflect the update job that runs every 15 minutes if there are updates, typically 1 or 2 changes. thanks for the info On Wed, May 24, 2017 at 10:37 AM, Erick Erickson wrote: > By default, enough closed log files will be kept to hold the last 100

Re: solrcloud replicas not in sync

2017-05-24 Thread Erick Erickson
By default, enough closed log files will be kept to hold the last 100 documents indexed. This is for "peer sync" purposes. Say replica1 goes offline for a bit. When it comes back online, if it's fallen behind by no more than 100 docs, the docs are replayed from another replica's tlog. Having such
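
The rule described above can be written as a small decision function. This is only an illustration of the reasoning, not Solr's code: the function name and return labels are hypothetical, and the "full-replication" branch is not stated in the excerpt; it reflects my understanding that a replica too far behind for peer sync copies the index from the leader instead.

```python
PEER_SYNC_WINDOW = 100  # default number of recent documents kept in the tlogs, per the message


def recovery_strategy(missed_docs, window=PEER_SYNC_WINDOW):
    """Return how a returning replica would catch up, given how far behind it is."""
    if missed_docs < 0:
        raise ValueError("missed_docs must be non-negative")
    if missed_docs == 0:
        return "in-sync"
    if missed_docs <= window:
        # Close enough: replay the missed docs from another replica's tlog.
        return "peer-sync"
    # Assumption: beyond the window the tlog no longer covers the gap.
    return "full-replication"
```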

Re: solrcloud replicas not in sync

2017-05-24 Thread Webster Homer
The tlog sizes are strange. In the case of the collection where we had issues with the replicas, the tlog sizes are 740 bytes and 938 bytes on the target side and the same on the source side. There are a lot of them on the source side. When do tlog files get deleted? On Tue, May 23, 2017 at 12:52

Re: solrcloud replicas not in sync

2017-05-24 Thread Erick Erickson
I wouldn't rely on the "current" flag in the admin UI as an indicator. As long as your numDocs and the like match I'd say it's a UI issue. Best, Erick On Wed, May 24, 2017 at 8:15 AM, Webster Homer wrote: > We see data in the target clusters. CDCR replication is working.

Re: solrcloud replicas not in sync

2017-05-24 Thread Webster Homer
We see data in the target clusters. CDCR replication is working. We first noticed the current=false flag on the target replicas, but since I started looking I see it on the source too. I have removed the IgnoreCommitOptimizeUpdateProcessorFactory from our update processor chain, I did two data

Re: solrcloud replicas not in sync

2017-05-23 Thread Erick Erickson
This is all quite strange. Optimize (BTW, it's rarely necessary/desirable on an index that changes, despite its name) shouldn't matter here. CDCR forwards the raw documents to the target cluster. Ample time indeed. With a soft commit of 15 seconds, that's your window (with some slop for how long

Re: solrcloud replicas not in sync

2017-05-23 Thread Webster Homer
We see a pretty consistent issue where the replicas show in the admin console as not current, indicating that our auto commit isn't committing. In one case we loaded the data to the source, cdcr replicated it to the targets and we see the source and the target as having current = false. It is

Re: solrcloud replicas not in sync

2017-05-22 Thread Erick Erickson
You can ping individual replicas by addressing to a specific replica and setting distrib=false, something like http://SOLR_NODE:port/solr/collection1_shard1_replica1/query?distrib=false=.. But one thing to check first is that you've committed. I'd: 1> turn off indexing on the source
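
As a rough sketch of the per-replica check suggested above: query each core directly with `distrib=false` and compare `numFound`. The helper names are hypothetical, the `/select` handler and `rows=0` are my choices rather than anything from the message, and the fetcher is injectable so the comparison can be exercised without a cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Default fetcher (assumption): GET the URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def replica_query_url(node, core, q="*:*"):
    """Build a query that is answered only by the addressed core."""
    params = urlencode({"q": q, "distrib": "false", "rows": 0, "wt": "json"})
    return f"{node}/solr/{core}/select?{params}"


def replica_doc_counts(node_cores, fetch=fetch_json):
    """Map each core name to the numFound it reports on its own."""
    counts = {}
    for node, core in node_cores:
        body = fetch(replica_query_url(node, core))
        counts[core] = body["response"]["numFound"]
    return counts


def out_of_sync(counts):
    """True when replicas of the same shard report different counts."""
    return len(set(counts.values())) > 1
```

Counts are only comparable after a commit on every replica and with indexing paused, which is why the advice above starts with checking that you have committed.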

solrcloud replicas not in sync

2017-05-22 Thread Webster Homer
I have a solrcloud collection with 2 shards and 4 replicas. The replicas for shard 1 have different numbers of records, so different queries will return different numbers of records. I am not certain how this occurred; it happened in a collection that was a cdcr target. Is there a way to limit a

Re: SolrCloud replicas out of sync

2016-01-29 Thread David Smith
Tomás, Good find, but I don’t think the rate of updates was high enough during the network outage to create the overrun situation described in the ticket. I did notice that one of the proposed fixes, https://issues.apache.org/jira/browse/SOLR-8586, is an entire-index consistency check between

Re: SolrCloud replicas out of sync

2016-01-28 Thread Tomás Fernández Löbbe
Maybe you are hitting the reordering issue described in SOLR-8129? Tomás On Wed, Jan 27, 2016 at 11:32 AM, David Smith wrote: > Sure. Here is our SolrCloud cluster: > >+ Three (3) instances of Zookeeper on three separate (physical) > servers. The ZK servers

Re: SolrCloud replicas out of sync

2016-01-27 Thread David Smith
Jeff, again, very much appreciate your feedback. It is interesting — the article you linked to by Shalin is exactly why we picked SolrCloud over ES, because (eventual) consistency is critical for our application and we will sacrifice availability for it. To be clear, after the outage, NONE

Re: SolrCloud replicas out of sync

2016-01-27 Thread Shawn Heisey
On 1/27/2016 8:59 AM, David Smith wrote: > So we definitely don’t have CP yet — our very first network outage resulted > in multiple overlapped lost updates. As a result, I can’t pick one replica > and make it the new “master”. I must rebuild this collection from scratch, > which I can do,

Re: SolrCloud replicas out of sync

2016-01-27 Thread Jeff Wartes
On 1/27/16, 8:28 AM, "Shawn Heisey" wrote: > >I don't think any documentation states this, but it seems like a good >idea to me use an alias from day one, so that you always have the option >of swapping the "real" collection that you are using without needing to >change
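
For illustration, the alias swap mentioned above comes down to one Collections API call. The sketch below only builds the CREATEALIAS URL; the function name, host, and collection names are hypothetical. As far as I know, issuing CREATEALIAS again with the same alias name repoints it at the new collection.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base, alias, collection):
    """Build a Collections API CREATEALIAS request URL.

    solr_base is assumed to look like "http://host:8983/solr".
    """
    params = urlencode({"action": "CREATEALIAS", "name": alias, "collections": collection})
    return f"{solr_base}/admin/collections?{params}"


# Clients always query "products"; the real collection behind it can change.
first = create_alias_url("http://localhost:8983/solr", "products", "products_v1")
swap = create_alias_url("http://localhost:8983/solr", "products", "products_v2")
```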

Re: SolrCloud replicas out of sync

2016-01-27 Thread Jeff Wartes
If you can identify the problem documents, you can just re-index those after forcing a sync. Might save a full rebuild and downtime. You might describe your cluster setup, including ZK. It sounds like you’ve done your research, but improper ZK node distribution could certainly invalidate some

Re: SolrCloud replicas out of sync

2016-01-27 Thread David Smith
Sure. Here is our SolrCloud cluster: + Three (3) instances of Zookeeper on three separate (physical) servers. The ZK servers are beefy and fairly recently built, with 2x10 GigE (bonded) Ethernet connectivity to the rest of the data center. We recognize importance of the stability and

Re: SolrCloud replicas out of sync

2016-01-27 Thread Brian Narsi
This on the surface appears to be similar to an earlier thread by me: "Query results change" On Tue, Jan 26, 2016 at 4:32 PM, Jeff Wartes wrote: > > Ah, perhaps you fell into something like this then? > https://issues.apache.org/jira/browse/SOLR-7844 > > That says it’s

Re: SolrCloud replicas out of sync

2016-01-26 Thread Jeff Wartes
My understanding is that the "version" represents the timestamp the searcher was opened, so it doesn’t really offer any assurances about your data. Although you could probably bounce a node and get your document counts back in sync (by provoking a check), it’s interesting that you’re in this

Re: SolrCloud replicas out of sync

2016-01-26 Thread David Smith
Thanks Jeff! A few comments >> >> Although you could probably bounce a node and get your document counts back >> in sync (by provoking a check) >> If the check is a simple doc count, that will not work. We have found that replica1 and replica3, although they contain the same doc count,

Re: SolrCloud replicas out of sync

2016-01-26 Thread Jeff Wartes
Ah, perhaps you fell into something like this then? https://issues.apache.org/jira/browse/SOLR-7844 That says it’s fixed in 5.4, but that would be an example of a split-brain type incident, where different documents were accepted by different replicas who each thought they were the leader. If

SolrCloud replicas out of sync

2016-01-22 Thread David Smith
I have a SolrCloud v5.4 collection with 3 replicas that appear to have fallen permanently out of sync. Users started to complain that the same search, executed twice, sometimes returned different result counts. Sure enough, our replicas are not identical: >> shard1_replica1: 89867 documents