Looking at the master, it appears that at some point some shards went down. I am seeing log entries like the following:
INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12)
Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
INFO: Updating live nodes... (9)
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
INFO: Running the leader process.
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
INFO: Checking if I should try and be the leader.
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
INFO: My last published State was Active, it's okay to be the leader.
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
INFO: I may be the new leader - try and sync

On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmil...@gmail.com> wrote:

> I don't think the versions you are thinking of apply here. PeerSync does
> not look at that - it looks at version numbers for updates in the
> transaction log - it compares the last 100 of them on leader and replica.
> What it's saying is that the replica seems to have versions that the
> leader does not. Have you scanned the logs for any interesting exceptions?
>
> Did the leader change during the heavy indexing? Did any zk session
> timeouts occur?
>
> - Mark
>
> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
> > I am currently looking at moving our Solr cluster to 4.2 and noticed a
> > strange issue while testing today. Specifically, the replica has a
> > higher version than the master, which is causing the index to not
> > replicate. Because of this the replica has fewer documents than the
> > master. What could cause this, and how can I resolve it short of taking
> > down the index and scp-ing the correct version in?
> >
> > MASTER:
> > Last Modified: about an hour ago
> > Num Docs: 164880
> > Max Doc: 164880
> > Deleted Docs: 0
> > Version: 2387
> > Segment Count: 23
> >
> > REPLICA:
> > Last Modified: about an hour ago
> > Num Docs: 164773
> > Max Doc: 164773
> > Deleted Docs: 0
> > Version: 3001
> > Segment Count: 30
> >
> > In the replica's log it says this:
> >
> > INFO: Creating new http client,
> > config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
> >
> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> > INFO: PeerSync: core=dsc-shard5-core2
> > url=http://10.38.33.17:7577/solr START replicas=[
> > http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
> >
> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> > Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
> >
> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> > Our versions are newer. ourLowThreshold=1431233788792274944
> > otherHigh=1431233789440294912
> >
> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> > INFO: PeerSync: core=dsc-shard5-core2
> > url=http://10.38.33.17:7577/solr DONE. sync succeeded
> >
> > which again seems to indicate that it thinks it has a newer version of
> > the index, so it aborts. This happened while 10 threads were each
> > indexing 10,000 items, writing to a 6-shard (1 replica each) cluster.
> > Any thoughts on this, or what I should look for, would be appreciated.
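For anyone following along: the behavior Mark describes (comparing recent transaction-log version numbers between leader and replica, and skipping the fetch when the replica's versions look newer) can be illustrated with a minimal sketch. This is NOT the actual Solr PeerSync code; the class and method names below (`PeerSyncSketch`, `ourVersionsAreNewer`) are hypothetical, and the comparison is deliberately simplified to "compare the highest recent version on each side":

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical simplification of the check described in this thread:
// each node tracks the version numbers of its last ~100 updates in the
// transaction log, and the syncing node decides whether it already has
// everything the other node has.
class PeerSyncSketch {

    static boolean ourVersionsAreNewer(List<Long> ourVersions, List<Long> theirVersions) {
        // Highest recent version on each side. In this simplified model,
        // if our newest update is at least as new as theirs, there is
        // nothing for us to fetch, so the sync "succeeds" without copying.
        long ourHigh = Collections.max(ourVersions);
        long otherHigh = Collections.max(theirVersions);
        return ourHigh >= otherHigh;
    }

    public static void main(String[] args) {
        // Replica has versions the leader does not (the situation in this
        // thread), so it concludes its versions are newer and skips fetching
        // -- even though it actually has fewer documents.
        List<Long> replica = Arrays.asList(101L, 102L, 103L, 107L);
        List<Long> leader  = Arrays.asList(101L, 102L, 103L);
        System.out.println(ourVersionsAreNewer(replica, leader)); // prints "true"
    }
}
```

Under this model, the replica reporting "Our versions are newer" and then "sync succeeded" with fewer documents is consistent: the decision is based on recent update versions, not document counts, which is why Mark asks about leader changes and zk session timeouts during the heavy indexing.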