Re: solrcloud replicas not in sync

2017-05-24 Thread Walter Underwood
Funny, I took a different approach to the same monitoring problem. Each document has a published_timestamp field set when it is generated. The schema has an indexed_timestamp field with a default of NOW. I wrote some Python to get the set of nodes in the collection, query each one, then report
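
The preview above is cut off before it says what the script reports, so the following is only a guess at one useful comparison: how far each node's newest indexed_timestamp trails the most recent one. The function names and the input shape (node name mapped to its newest indexed_timestamp string) are assumptions for illustration, not Walter's actual code.

```python
from datetime import datetime, timezone


def parse_solr_date(value):
    """Parse a Solr UTC date string, with or without fractional seconds."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in value else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def indexing_lag_seconds(newest_by_node):
    """Return, per node, how many seconds its newest indexed_timestamp
    trails the newest value seen on any node.

    newest_by_node is a hypothetical input: {node_name: date_string},
    gathered by querying each node for its most recent indexed_timestamp.
    """
    parsed = {node: parse_solr_date(ts) for node, ts in newest_by_node.items()}
    latest = max(parsed.values())
    return {node: (latest - ts).total_seconds() for node, ts in parsed.items()}
```

A node whose lag keeps growing while the others stay near zero would be a candidate for a closer look.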

Re: solrcloud replicas not in sync

2017-05-24 Thread Webster Homer
Actually, I wrote a service that calls the Collections API Cluster Status and adds data for each replica by calling the Core Admin STATUS (https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-STATUS). My service fills in the index information for more data. This returns the
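
A rough sketch of the kind of merge described above, assuming the JSON shapes that CLUSTERSTATUS and Core Admin STATUS return (replicas nested under cluster/collections/shards, and per-core index details under status/<core>/index). The function names are hypothetical, and fetching the two responses over HTTP is left out; this is not the actual service.

```python
def replicas_from_clusterstatus(status, collection):
    """Flatten a parsed CLUSTERSTATUS response into one row per replica."""
    rows = []
    shards = status["cluster"]["collections"][collection]["shards"]
    for shard_name, shard in shards.items():
        for replica_name, replica in shard["replicas"].items():
            rows.append({
                "shard": shard_name,
                "replica": replica_name,
                "core": replica["core"],
                "base_url": replica["base_url"],
                "state": replica["state"],
                # The leader flag is assumed to arrive as the string "true";
                # a real boolean is accepted as well.
                "leader": str(replica.get("leader", "false")).lower() == "true",
            })
    return rows


def add_index_info(row, core_status):
    """Copy selected index fields from a parsed Core Admin STATUS response
    (for the core named in row) into a new, enriched row."""
    index = core_status["status"][row["core"]]["index"]
    merged = dict(row)
    for key in ("numDocs", "maxDoc", "version", "current"):
        merged[key] = index.get(key)
    return merged
```

With the rows enriched this way, replicas of the same shard can be compared on numDocs and version side by side.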

Re: solrcloud replicas not in sync

2017-05-24 Thread Webster Homer
Oh, those logs probably reflect the update job that runs every 15 minutes if there are updates, typically 1 or 2 changes. Thanks for the info. On Wed, May 24, 2017 at 10:37 AM, Erick Erickson wrote: > By default, enough closed log files will be kept to hold the last 100 > documents indexed. This

Re: solrcloud replicas not in sync

2017-05-24 Thread Erick Erickson
By default, enough closed log files will be kept to hold the last 100 documents indexed. This is for "peer sync" purposes. Say replica1 goes offline for a bit. When it comes back online, if it's fallen behind by no more than 100 docs, the docs are replayed from another replica's tlog. Having such

Re: solrcloud replicas not in sync

2017-05-24 Thread Webster Homer
The tlog sizes are strange. In the case of the collection where we had issues with the replicas, the tlog sizes are 740 bytes and 938 bytes on the target side and the same on the source side. There are a lot of them on the source side. When do tlog files get deleted? On Tue, May 23, 2017 at 12:52

Re: solrcloud replicas not in sync

2017-05-24 Thread Erick Erickson
I wouldn't rely on the "current" flag in the admin UI as an indicator. As long as your numDocs and the like match I'd say it's a UI issue. Best, Erick On Wed, May 24, 2017 at 8:15 AM, Webster Homer wrote: > We see data in the target clusters. CDCR replication is working. We first > noticed the c

Re: solrcloud replicas not in sync

2017-05-24 Thread Webster Homer
We see data in the target clusters. CDCR replication is working. We first noticed the current=false flag on the target replicas, but since I started looking I see it on the source too. I have removed the IgnoreCommitOptimizeUpdateProcessorFactory from our update processor chain. I did two data lo

Re: solrcloud replicas not in sync

2017-05-23 Thread Erick Erickson
This is all quite strange. Optimize (BTW, it's rarely necessary/desirable on an index that changes, despite its name) shouldn't matter here. CDCR forwards the raw documents to the target cluster. Ample time indeed. With a soft commit of 15 seconds, that's your window (with some slop for how long C

Re: solrcloud replicas not in sync

2017-05-23 Thread Webster Homer
We see a pretty consistent issue where the replicas show in the admin console as not current, indicating that our auto commit isn't committing. In one case we loaded the data to the source, CDCR replicated it to the targets, and we see the source and the target as having current = false. It is search

Re: solrcloud replicas not in sync

2017-05-22 Thread Erick Erickson
You can ping individual replicas by addressing the request to a specific replica and setting distrib=false, something like http://SOLR_NODE:port/solr/collection1_shard1_replica1/query?distrib=false&q=.. But one thing to check first is that you've committed. I'd: 1> turn off indexing on the source c
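
The per-replica check above can be sketched in Python, assuming a match-all query with rows=0 and a JSON response carrying response.numFound. The helper names, the default "select" handler, and the {shard: {core: count}} layout are illustrative assumptions; pass handler="query" to match the URL in the message.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_query_url(base_url, core, q="*:*", handler="select"):
    """Build a URL that queries one replica core only (distrib=false)."""
    params = {"q": q, "distrib": "false", "rows": 0, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/{handler}?{urlencode(params)}"


def fetch_num_found(url, timeout=10):
    """Run the query and return numFound from the JSON response."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["response"]["numFound"]


def mismatched_shards(counts):
    """Given {shard: {core: numFound}}, return only the shards whose
    replicas report different document counts."""
    return {
        shard: per_core
        for shard, per_core in counts.items()
        if len(set(per_core.values())) > 1
    }
```

Counts are only comparable once indexing has stopped and a commit has gone through, which is why the advice above starts with checking that you've committed.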

solrcloud replicas not in sync

2017-05-22 Thread Webster Homer
I have a SolrCloud collection with 2 shards and 4 replicas. The replicas for shard 1 have different numbers of records, so different queries will return different numbers of records. I am not certain how this occurred; it happened in a collection that was a CDCR target. Is there a way to limit a