My client has a Solr 4.6 test cluster with three instances, 1, 2, and 3, hosting
shards 1, 2, and 3, respectively. There is no replication in this cluster. We
started receiving OOMEs during indexing; the batches were likely too large. The
cluster was rebooted to restore the system. However, since the reboot, instance 2
shows up as a replica of shard1, and its original shard2 is down with a null range.
Instance 2 is still queryable with shards.tolerant=true&distrib=false (a sample
query is included after the clusterstate snippet) and returns a different set of
records than instance 1, as would be expected during normal operation. The
clusterstate.json is similar to the following:
mycollection:{
  shard1:{
    range:8000000-d554ffff,
    state:active,
    replicas:{
      instance1....state:active...,
      instance2....state:active...
    }
  },
  shard3:{....state:active.....},
  shard2:{
    range:null,
    state:active,
    replicas:{
      instance2:{....state:down....}
    }
  },
  maxShardsPerNode:1,
  replicationFactor:1
}
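
For reference, this is roughly how I am querying instance 2 directly (the host,
port, and collection name in the URL are placeholders for our actual values):

  curl "http://instance2:8983/solr/mycollection/select?q=*:*&rows=0&distrib=false&shards.tolerant=true"
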
Any ideas on how this could have come to pass? Would manually correcting
clusterstate.json in ZooKeeper fix the situation? If so, I have sketched below
what I think the procedure would be.
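
This is the sort of thing I have in mind, using the zkcli.sh script that ships
with Solr and presumably with the Solr nodes stopped first (the ZooKeeper host,
script path, and temp file below are placeholders, and I am assuming getfile,
clear, and putfile work this way in 4.6):

  # pull the current clusterstate.json out of ZooKeeper
  ./zkcli.sh -zkhost zk1:2181 -cmd getfile /clusterstate.json /tmp/clusterstate.json
  # hand-edit /tmp/clusterstate.json: move instance2 back under shard2 and restore its range
  # remove the old node, then push the corrected file back
  ./zkcli.sh -zkhost zk1:2181 -cmd clear /clusterstate.json
  ./zkcli.sh -zkhost zk1:2181 -cmd putfile /clusterstate.json /tmp/clusterstate.json

Is that a reasonable approach, or is there a safer way to repair the state?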