I think it is just a side-effect of the current implementation that the ranges are assigned linearly. You can also verify this by choosing a document from each shard, running its uniqueKey through the CompositeIdRouter's sliceHash method, and checking that the resulting hash falls within that shard's range.
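Something along these lines should work as a rough check (an untested sketch with solr-solrj on the classpath; the class name and example id are made up, the exact sliceHash signature differs a little between Solr versions, and passing nulls is only safe for plain ids with no "!" composite prefix and no router.field configured on the collection):

    import java.util.List;

    import org.apache.solr.common.cloud.CompositeIdRouter;
    import org.apache.solr.common.cloud.DocRouter;

    public class CheckShardRange {
      public static void main(String[] args) {
        // A uniqueKey value copied from one of the shards (hypothetical example).
        String id = "some-document-id";

        CompositeIdRouter router = new CompositeIdRouter();

        // Hash the id the same way the router does at index time. The null
        // arguments (doc, params, collection) are fine for plain ids; the
        // exact signature may vary between Solr versions.
        int hash = router.sliceHash(id, null, null, null);

        // Recompute the ranges as they would have been assigned when the
        // collection was created (15 shards, never split).
        List<DocRouter.Range> ranges = router.partitionRange(15, router.fullRange());

        // If the linear assignment holds, position 1 corresponds to shard1,
        // position 2 to shard2, and so on.
        for (int i = 0; i < ranges.size(); i++) {
          if (ranges.get(i).includes(hash)) {
            System.out.println(id + " hashes to " + Integer.toHexString(hash)
                + ", which falls in range " + ranges.get(i)
                + " (position " + (i + 1) + ")");
          }
        }
      }
    }

If the document actually lives on the shard whose position matches, that supports the linear assignment.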
I couldn't reproduce this but I didn't try too hard either. If you are able to isolate a reproducible example then please do report back. I'll spend some time to review the related code again to see if I can spot the problem.

On Thu, Feb 27, 2014 at 2:19 AM, Greg Pendlebury <greg.pendleb...@gmail.com> wrote:
> Thanks Shalin, that code might be helpful... do you know if there is a reliable way to line up the ranges with the shard numbers? When the problem occurred we had 80 million documents already in the index, and could not issue even a basic 'deleteById' call. I'm tempted to assume they are just assigned linearly since our Test and Prod clusters both look to work that way now, but I can't be sure whether that is by design or just happenstance of boot order.
>
> And no, unfortunately we have not been able to reproduce this issue consistently despite trying a number of different things such as graceless stop/start and screwing with the underlying WAR file (which is what we thought puppet might be doing). The problem has occurred twice since, but always in our Test environment. The fact that Test has only a single replica per shard is the most likely culprit for me, but as mentioned, even gracelessly killing the last replica in the cluster seems to leave the range set correctly in clusterstate when we test it in isolation.
>
> In production (45 JVMs, 15 shards with 3 replicas each) we've never seen the problem, despite a similar number of rollouts for version changes etc.
>
> Ta,
> Greg
>
> On 26 February 2014 23:46, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>
>> If you have 15 shards and assuming that you've never used shard splitting, you can calculate the shard ranges by using new CompositeIdRouter().partitionRange(15, new CompositeIdRouter().fullRange())
>>
>> This gives me:
>> [80000000-9110ffff, 91110000-a221ffff, a2220000-b332ffff,
>> b3330000-c443ffff, c4440000-d554ffff, d5550000-e665ffff,
>> e6660000-f776ffff, f7770000-887ffff, 8880000-1998ffff,
>> 19990000-2aa9ffff, 2aaa0000-3bbaffff, 3bbb0000-4ccbffff,
>> 4ccc0000-5ddcffff, 5ddd0000-6eedffff, 6eee0000-7fffffff]
>>
>> Have you done any more investigation into why this happened? Anything strange in the logs? Are you able to reproduce this in a test environment?
>>
>> On Wed, Feb 19, 2014 at 5:16 AM, Greg Pendlebury <greg.pendleb...@gmail.com> wrote:
>> > We've got a 15 shard cluster spread across 3 hosts. This morning our puppet software rebooted them all and afterwards the 'range' for each shard has become null in zookeeper. Is there any way to restore this value short of rebuilding a fresh index?
>> >
>> > I've read various questions from people with a similar problem, although in those cases it is usually a single shard that has become null allowing them to infer what the value should be and manually fix it in ZK. In this case I have no idea what the ranges should be. This is our test cluster, and checking production I can see that the ranges don't appear to be predictable based on the shard number.
>> >
>> > I'm also not certain why it even occurred. Our test cluster only has a single replica per shard, so when a JVM is rebooted the cluster is unavailable... would that cause this? Production has 3 replicas so we can do rolling reboots.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.

--
Regards,
Shalin Shekhar Mangar.