Funny, I took a different approach to the same monitoring problem.
Each document has a published_timestamp field set when it is generated, and the
schema has an indexed_timestamp field with a default of NOW. I wrote some
Python to get the set of nodes in the collection, query each one, and report
the lag between the two timestamps.
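In rough outline (a sketch, not the real script; the hosts and collection
name are placeholders):

    import requests

    NODES = ["http://solr1:8983", "http://solr2:8983"]   # placeholder hosts
    COLLECTION = "collection1"                           # placeholder name

    for node in NODES:
        # distrib=false keeps the query on this node's local replica.
        resp = requests.get(
            f"{node}/solr/{COLLECTION}/select",
            params={
                "q": "*:*",
                "sort": "published_timestamp desc",
                "fl": "published_timestamp,indexed_timestamp",
                "rows": 1,
                "distrib": "false",
            },
        ).json()
        for doc in resp["response"]["docs"]:
            print(node, doc["published_timestamp"], doc["indexed_timestamp"])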
Actually, I wrote a service that calls the Collections API CLUSTERSTATUS,
and then adds data for each replica by calling the Core Admin STATUS API:
https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-STATUS
STATUS returns the index details for a core (numDocs, size, the current flag,
and so on), which my service uses to fill in the per-replica information.
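Roughly like this (sketch only; the host and collection names are placeholders):

    import requests

    BASE = "http://solr1:8983/solr"   # any node in the cluster

    # Cluster-wide layout of shards and replicas.
    cluster = requests.get(
        f"{BASE}/admin/collections",
        params={"action": "CLUSTERSTATUS", "collection": "collection1"},
    ).json()

    shards = cluster["cluster"]["collections"]["collection1"]["shards"]
    for shard_name, shard in shards.items():
        for replica in shard["replicas"].values():
            # Ask the replica's own node for its per-core index info.
            status = requests.get(
                f"{replica['base_url']}/admin/cores",
                params={"action": "STATUS", "core": replica["core"]},
            ).json()
            index = status["status"][replica["core"]]["index"]
            print(shard_name, replica["core"],
                  index["numDocs"], index.get("current"))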
Oh, those logs probably reflect the update job that runs every 15 minutes.
When there are updates, it's typically only 1 or 2 changes. Thanks for the info.
On Wed, May 24, 2017 at 10:37 AM, Erick Erickson wrote:
> By default, enough closed log files will be kept to hold the last 100
By default, enough closed log files will be kept to hold the last 100
documents indexed. This is for "peer sync" purposes. Say replica1 goes
offline for a bit. When it comes back online, if it's fallen behind by
no more than 100 docs, the docs are replayed from another replica's
tlog.
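If you need a bigger replay window, that 100 is the numRecordsToKeep setting
on the updateLog in solrconfig.xml; something like (values here illustrative):

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numRecordsToKeep">500</int>   <!-- default is 100 -->
      <int name="maxNumLogsToKeep">10</int>    <!-- default is 10 -->
    </updateLog>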
Having such
The tlog sizes are strange.
In the case of the collection where we had issues with the replicas, the
tlog sizes are 740 bytes and 938 bytes on the target side, and the same on
the source side. There are a lot of them on the source side; when do tlog
files get deleted?
On Tue, May 23, 2017 at 12:52
I wouldn't rely on the "current" flag in the admin UI as an indicator.
As long as your numDocs and the like match, I'd say it's a UI issue.
Best,
Erick
On Wed, May 24, 2017 at 8:15 AM, Webster Homer wrote:
> We see data in the target clusters. CDCR replication is working.
We see data in the target clusters. CDCR replication is working. We first
noticed the current=false flag on the target replicas, but since I started
looking, I see it on the source too.
I have removed the IgnoreCommitOptimizeUpdateProcessorFactory from our
update processor chain, and I did two data loads
This is all quite strange. Optimize (BTW, it's rarely
necessary/desirable on an index that changes, despite its name)
shouldn't matter here. CDCR forwards the raw documents to the target
cluster.
Ample time indeed. With a soft commit of 15 seconds, that's your
window (with some slop for how long it takes to open a new searcher).
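That window comes from the autoSoftCommit setting in solrconfig.xml; for
reference, a 15-second setup looks something like this (the hard-commit
interval is just illustrative):

    <autoCommit>
      <maxTime>60000</maxTime>           <!-- illustrative hard commit -->
      <openSearcher>false</openSearcher> <!-- flush to disk, no new searcher -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>15000</maxTime>           <!-- 15s visibility window -->
    </autoSoftCommit>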
We see a pretty consistent issue where the replicas show in the admin
console as not current, indicating that our auto commit isn't committing. In
one case we loaded the data to the source, CDCR replicated it to the
targets, and we see both the source and the target as having current = false. It
is
You can ping individual replicas by addressing a specific replica directly
and setting distrib=false, something like
http://SOLR_NODE:port/solr/collection1_shard1_replica1/query?distrib=false&q=...
But one thing to check first is that you've committed. I'd:
1> turn off indexing on the source
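Then a quick check along these lines will show whether the per-replica
counts line up (hosts and core names are illustrative):

    import requests

    # Address each core directly; distrib=false stops the query from
    # fanning out to the rest of the collection.
    REPLICAS = [
        "http://solr1:8983/solr/collection1_shard1_replica1",
        "http://solr2:8983/solr/collection1_shard1_replica2",
    ]

    for url in REPLICAS:
        resp = requests.get(
            f"{url}/query",
            params={"q": "*:*", "rows": 0, "distrib": "false"},
        ).json()
        print(url, resp["response"]["numFound"])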
I have a SolrCloud collection with 2 shards and 4 replicas. The replicas
for shard 1 have different numbers of records, so the same query can
return different numbers of records depending on which replica serves it.
I am not certain how this occurred; it happened in a collection that was a
CDCR target.
Is there a way to limit a
Tomás,
Good find, but I don’t think the rate of updates was high enough during the
network outage to create the overrun situation described in the ticket.
I did notice that one of the proposed fixes,
https://issues.apache.org/jira/browse/SOLR-8586, is an entire-index consistency
check between replicas.
Maybe you are hitting the reordering issue described in SOLR-8129?
Tomás
On Wed, Jan 27, 2016 at 11:32 AM, David Smith
wrote:
> Sure. Here is our SolrCloud cluster:
>
>+ Three (3) instances of Zookeeper on three separate (physical)
> servers. The ZK servers
Jeff, again, very much appreciate your feedback.
It is interesting — the article you linked to by Shalin is exactly why we
picked SolrCloud over ES, because (eventual) consistency is critical for our
application and we will sacrifice availability for it. To be clear, after the
outage, NONE
On 1/27/2016 8:59 AM, David Smith wrote:
> So we definitely don’t have CP yet — our very first network outage resulted
> in multiple overlapped lost updates. As a result, I can’t pick one replica
> and make it the new “master”. I must rebuild this collection from scratch,
> which I can do,
On 1/27/16, 8:28 AM, "Shawn Heisey" wrote:
>
>I don't think any documentation states this, but it seems like a good
>idea to me to use an alias from day one, so that you always have the option
>of swapping the "real" collection that you are using without needing to
>change
If you can identify the problem documents, you can just re-index those after
forcing a sync. That might save a full rebuild and downtime.
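A rough sketch of that repair, assuming all fields are stored and you already
have the problem IDs (URLs and IDs are placeholders):

    import requests

    GOOD = "http://solr1:8983/solr/collection1_shard1_replica1"  # trusted copy
    COLL = "http://solr1:8983/solr/collection1"                  # normal endpoint
    problem_ids = ["doc-123", "doc-456"]                         # placeholders

    for doc_id in problem_ids:
        # Read the stored fields from the replica we trust; distrib=false
        # keeps the read local to that core.
        resp = requests.get(
            f"{GOOD}/select",
            params={"q": f'id:"{doc_id}"', "fl": "*", "distrib": "false"},
        ).json()
        for doc in resp["response"]["docs"]:
            doc.pop("_version_", None)   # let Solr assign a fresh version
            requests.post(f"{COLL}/update", json=[doc])

    requests.get(f"{COLL}/update", params={"commit": "true"})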
You might describe your cluster setup, including ZK. It sounds like you've done
your research, but improper ZK node distribution could certainly invalidate
some
Sure. Here is our SolrCloud cluster:
+ Three (3) instances of Zookeeper on three separate (physical) servers.
The ZK servers are beefy and fairly recently built, with 2x10 GigE (bonded)
Ethernet connectivity to the rest of the data center. We recognize the importance
of the stability and
This on the surface appears to be similar to an earlier thread by me: "Query
results change"
On Tue, Jan 26, 2016 at 4:32 PM, Jeff Wartes wrote:
>
> Ah, perhaps you fell into something like this then?
> https://issues.apache.org/jira/browse/SOLR-7844
>
> That says it’s
My understanding is that the "version" represents the timestamp the searcher
was opened, so it doesn’t really offer any assurances about your data.
Although you could probably bounce a node and get your document counts back in
sync (by provoking a check), it’s interesting that you’re in this
Thanks Jeff! A few comments
>>
>> Although you could probably bounce a node and get your document counts back
>> in sync (by provoking a check)
>>
If the check is a simple doc count, that will not work. We have found that
replica1 and replica3, although they contain the same doc count, do not
contain the same set of documents.
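For what it's worth, a sketch of the comparison we ran (replica URLs are
illustrative; assumes the full id set fits in memory):

    import requests

    def ids_of(replica_url):
        # Page through every doc id on one specific replica (distrib=false).
        ids, cursor = set(), "*"
        while True:
            resp = requests.get(
                f"{replica_url}/select",
                params={"q": "*:*", "fl": "id", "rows": 10000,
                        "sort": "id asc", "cursorMark": cursor,
                        "distrib": "false"},
            ).json()
            ids.update(d["id"] for d in resp["response"]["docs"])
            if resp["nextCursorMark"] == cursor:
                break
            cursor = resp["nextCursorMark"]
        return ids

    r1 = ids_of("http://solr1:8983/solr/collection1_shard1_replica1")
    r3 = ids_of("http://solr3:8983/solr/collection1_shard1_replica3")
    print("only in replica1:", len(r1 - r3))
    print("only in replica3:", len(r3 - r1))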
Ah, perhaps you fell into something like this then?
https://issues.apache.org/jira/browse/SOLR-7844
That says it’s fixed in 5.4, but that would be an example of a split-brain type
incident, where different documents were accepted by different replicas that
each thought they were the leader. If
I have a SolrCloud v5.4 collection with 3 replicas that appear to have fallen
permanently out of sync. Users started to complain that the same search,
executed twice, sometimes returned different result counts. Sure enough, our
replicas are not identical:
>> shard1_replica1: 89867 documents