Sorry for spamming here... shard5-core2 is the instance we're having issues with.
Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException: Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok status:503, message:Service Unavailable
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
        at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
        at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> Here is another one that looks interesting:
>
> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the leader, but locally we don't think so
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>         at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>         at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>         at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>         at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>
> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
>> Looking at the master, it looks like at some point there were shards that went down. I am seeing things like what is below.
>>
>> INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12)
>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
>> INFO: Updating live nodes... (9)
>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>> INFO: Running the leader process.
>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>> INFO: Checking if I should try and be the leader.
>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>> INFO: My last published State was Active, it's okay to be the leader.
>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>> INFO: I may be the new leader - try and sync
>>
>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>
>>> I don't think the versions you are thinking of apply here. PeerSync does not look at that - it looks at the version numbers for updates in the transaction log and compares the last 100 of them on the leader and the replica. What it's saying is that the replica seems to have versions that the leader does not. Have you scanned the logs for any interesting exceptions?
>>>
>>> Did the leader change during the heavy indexing? Did any zk session timeouts occur?
>>>
>>> - Mark
>>>
>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>
>>> > I am currently looking at moving our Solr cluster to 4.2 and noticed a strange issue while testing today. Specifically, the replica has a higher version than the master, which is causing the index not to replicate. Because of this, the replica has fewer documents than the master. What could cause this, and how can I resolve it short of taking down the index and scp-ing the right version in?
>>> >
>>> > MASTER:
>>> > Last Modified: about an hour ago
>>> > Num Docs: 164880
>>> > Max Doc: 164880
>>> > Deleted Docs: 0
>>> > Version: 2387
>>> > Segment Count: 23
>>> >
>>> > REPLICA:
>>> > Last Modified: about an hour ago
>>> > Num Docs: 164773
>>> > Max Doc: 164773
>>> > Deleted Docs: 0
>>> > Version: 3001
>>> > Segment Count: 30
>>> >
>>> > In the replica's log it says this:
>>> >
>>> > INFO: Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>>> >
>>> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>>> >
>>> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>>> >
>>> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our versions are newer. ourLowThreshold=1431233788792274944 otherHigh=1431233789440294912
>>> >
>>> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE. sync succeeded
>>> >
>>> > which again seems to indicate that it thinks it has a newer version of the index, so it aborts. This happened while 10 threads were indexing 10,000 items, writing to a 6-shard (1 replica each) cluster. Any thoughts on this, or what I should look for, would be appreciated.
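
To make Mark's description concrete, here is a rough sketch, in Java, of the version-window comparison PeerSync appears to be doing - this is not the actual Solr source; every name, threshold, and number below is a simplified assumption (real versions are signed longs, with deletes stored negated, compared by absolute value). The point is only the shape of the check: when the replica's recent-update window looks strictly newer than the peer's, PeerSync logs "Our versions are newer" and reports success without fetching anything, which would be consistent with a replica that ends up with fewer documents despite "sync succeeded".

import java.util.List;

// Rough sketch of PeerSync's version-window check; all names and numbers
// are illustrative assumptions, not Solr's actual implementation.
public class PeerSyncSketch {

    // Lists are assumed sorted newest-first by absolute value, like the
    // last ~100 update versions pulled from a transaction log.
    static long percentile(List<Long> sortedDesc, double p) {
        int idx = Math.min((int) (sortedDesc.size() * p), sortedDesc.size() - 1);
        return Math.abs(sortedDesc.get(idx));
    }

    // true = "Our versions are newer": nothing to fetch, sync reports success.
    static boolean ourVersionsAreNewer(List<Long> ours, List<Long> theirs) {
        long ourLowThreshold = percentile(ours, 0.8);   // old end of our window
        long otherHigh       = percentile(theirs, 0.2); // new end of their window
        // If even the old end of our window is newer than the new end of the
        // peer's window, the replica decides it is ahead and skips fetching.
        return ourLowThreshold > otherHigh;
    }

    public static void main(String[] args) {
        // Hypothetical version numbers, newest first.
        List<Long> replica = List.of(120L, 119L, 118L, 117L, 116L,
                                     115L, 114L, 113L, 112L, 111L);
        List<Long> leader  = List.of(105L, 104L, 103L, 102L, 101L,
                                     100L,  99L,  98L,  97L,  96L);
        if (ourVersionsAreNewer(replica, leader)) {
            System.out.println("Our versions are newer. sync 'succeeds' without fetching");
        }
    }
}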