On 10/14/2018 6:25 PM, dami...@gmail.com wrote:
> I had an issue with async backup on solr 6.5.1 reporting that the backup
> was complete when clearly it was not. I was using 12 shards across 6 nodes.
> I only noticed this issue when one shard was much larger than the others.
> There were no answers here
> http://lucene.472066.n3.nabble.com/async-backup-td4342776.html

One detail I thought I had included, but didn't:  The backup did fully complete -- all 30 shards were in the backup location.  There wasn't much in each shard backup, because the collection was empty.  It would be easy enough to add a few thousand documents to the collection before doing the backup.

If the backup process reports that it's done before it's ACTUALLY done, that's a bad thing.  It's hard to say whether that problem is related to the problem I described.  Since I haven't dived into the code, I cannot say for sure, but it honestly would not surprise me to find they are connected.  Every time I try to understand Collections API code, I find it extremely difficult to follow.

I'm sorry that you never got resolution on your problem.  Do you know whether that is still a problem in 7.x?  Setting up a reproduction where one shard is significantly larger than the others will take a little bit of work.
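To make the mismatch concrete, here's a rough Python sketch of the check a reproduction could do.  The response shape is simplified and assumed -- a real REQUESTSTATUS response has more fields -- but it models the two parts discussed here: an overall state that says "completed", and one sub-response per shard that actually reported back:

```python
import json

# Simplified/assumed shape of a REQUESTSTATUS response for an async backup.
# Only the parts relevant to this thread are modeled: the overall state,
# plus one entry per shard that replied.
SAMPLE_STATUS = json.loads("""
{
  "status": {"state": "completed", "msg": "found [mybackup] in completed tasks"},
  "success": {
    "shard1": {}, "shard2": {}, "shard3": {},
    "shard4": {}, "shard5": {}, "shard6": {}
  }
}
""")

def missing_shards(status, num_shards):
    """Return shard names expected but absent from the per-shard responses."""
    reported = set(status.get("success", {}))
    expected = {f"shard{i}" for i in range(1, num_shards + 1)}
    return sorted(expected - reported, key=lambda s: int(s[5:]))

# With 12 shards but only 6 sub-responses, the overall state can read
# "completed" even though half the shards never showed up in the response.
print(SAMPLE_STATUS["status"]["state"])
print(missing_shards(SAMPLE_STATUS, 12))
```

If the second line prints a non-empty list while the first prints "completed", you've reproduced the kind of mismatch described above.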

> I was focusing on the STATUS returned from the REQUESTSTATUS command, but
> looking again now I can see a response from only 6 shards, and each shard
> is from a different node. So this fits with what you're seeing. I assume
> your shards 1, 7, 9 are all on different nodes.

I did not actually check, and the cloud example I was using isn't around any more, but each of the shards in the status response was PROBABLY on a separate node.  The cloud example used 3 nodes.  It's an easy enough scenario to replicate, and I provided enough details for anyone to do it.

The person on IRC who reported this problem had a cluster of 15 nodes, and the status response mentioned ten shards (out of 30).  It was shards 1-9 and shard 20.  The suspicion is that there's something hard-coded that limits it to 10 responses ... because without that, I would expect the number of shards in the response to match the number of nodes.
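This is pure speculation about the cause, but the suspected truncation would look something like the sketch below: a hypothetical collector that stops after a hard-coded cap of 10 responses, which would explain seeing only a subset of the 30 shards regardless of how many actually finished:

```python
def collect_responses(shard_responses, limit=10):
    """Hypothetical: keep only the first `limit` shard responses, the way a
    hard-coded cap somewhere in the Collections API code might."""
    out = {}
    for shard, resp in shard_responses.items():
        if len(out) >= limit:
            break
        out[shard] = resp
    return out

# All 30 shards respond, but only 10 survive the cap.
all_shards = {f"shard{i}": {"ok": True} for i in range(1, 31)}
print(len(collect_responses(all_shards)))
```

Which ten survive would depend on iteration order in the real code, which could account for the odd shard 1-9 plus shard 20 set.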

Thanks,
Shawn
