It would appear it's a bug given what you have said.

Any other exceptions would be useful. It might be best to start tracking this in a JIRA
issue as well.

To fix it, I'd bring the node that is behind down and back up again.
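
[If taking the whole node down is awkward, a related option is to ask just the
lagging core to re-enter recovery via the CoreAdmin REQUESTRECOVERY action. The
rough SolrJ 4.x sketch below is only an illustration, not a guaranteed fix for
the version mismatch; the host, port and core name are taken from the logs
quoted further down in this thread.]

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.common.params.CoreAdminParams.CoreAdminAction;

public class RequestRecovery {
  public static void main(String[] args) throws Exception {
    // Point at the node hosting the behind replica, not at the core URL itself.
    HttpSolrServer server = new HttpSolrServer("http://10.38.33.17:7577/solr");
    try {
      CoreAdminRequest req = new CoreAdminRequest();
      req.setAction(CoreAdminAction.REQUESTRECOVERY);
      req.setCoreName("dsc-shard5-core2");
      req.process(server);  // the core should then try to sync/replicate from its shard leader
    } finally {
      server.shutdown();
    }
  }
}

[Whether recovery actually pulls in the missing documents is exactly the open
question in this thread, so watch the replica's log after trying it.]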

Unfortunately, I'm pressed for time, but we really need to get to the bottom of 
this and fix it, or determine if it's fixed in 4.2.1 (spreading to mirrors now).

- Mark

On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2...@gmail.com> wrote:

> Sorry, I didn't ask the obvious question.  Is there anything else that I
> should be looking for here, and is this a bug?  I'd be happy to trawl
> through the logs further if more information is needed; just let me know.
> 
> Also, what is the most appropriate mechanism to fix this?  Is it required to
> kill the index that is out of sync and let Solr resync things?
> 
> 
> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> 
>> sorry for spamming here....
>> 
>> shard5-core2 is the instance we're having issues with...
>> 
>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>> SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:
>> org.apache.solr.common.SolrException: Server at http://10.38.33.17:7577/solr/dsc-shard5-core2
>> returned non ok status:503, message:Service Unavailable
>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>         at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>>         at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> 
>> 
>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> 
>>> here is another one that looks interesting
>>> 
>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
>>> the leader, but locally we don't think so
>>>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>>>         at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>>>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>>>         at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>>>         at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>>>         at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>>>         at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>>>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>>>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>> 
>>> 
>>> 
>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> 
>>>> Looking at the master, it looks like at some point some shards went down.
>>>> I am seeing things like what is below.
>>>> 
>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected
>>>> type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
>>>> nodes size: 12)
>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3
>>>> process
>>>> INFO: Updating live nodes... (9)
>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>>>> runLeaderProcess
>>>> INFO: Running the leader process.
>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>>>> shouldIBeLeader
>>>> INFO: Checking if I should try and be the leader.
>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>>>> shouldIBeLeader
>>>> INFO: My last published State was Active, it's okay to be the leader.
>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>>>> runLeaderProcess
>>>> INFO: I may be the new leader - try and sync
>>>> 
>>>> 
>>>> 
>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>> 
>>>>> I don't think the versions you are thinking of apply here. PeerSync
>>>>> does not look at that - it looks at version numbers for updates in the
>>>>> transaction log - it compares the last 100 of them on the leader and the
>>>>> replica. What it's saying is that the replica seems to have versions that
>>>>> the leader does not. Have you scanned the logs for any interesting exceptions?
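
[Purely as an illustrative aside: the toy snippet below is not the PeerSync
source, and its version numbers are invented. It only sketches the situation
Mark describes, where the replica's transaction log reports versions the
leader never saw, so the replica concludes its index is newer and fetches
nothing.]

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TlogVersionsExample {
  public static void main(String[] args) {
    // Pretend these are the most recent update versions from each node's tlog.
    List<Long> replicaVersions = Arrays.asList(105L, 104L, 103L, 102L, 101L);
    List<Long> leaderVersions  = Arrays.asList(103L, 102L, 101L, 100L, 99L);

    long replicaNewest = Collections.max(replicaVersions);
    long leaderNewest  = Collections.max(leaderVersions);

    if (replicaNewest > leaderNewest) {
      // The surprising state in this thread: the replica looks ahead of the
      // leader, so it skips fetching the updates it is actually missing.
      System.out.println("Replica has versions the leader does not; nothing fetched.");
    } else {
      System.out.println("Leader is ahead; replica would request the missed updates.");
    }
  }
}
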
>>>>> 
>>>>> Did the leader change during the heavy indexing? Did any zk session
>>>>> timeouts occur?
>>>>> 
>>>>> - Mark
>>>>> 
>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>> 
>>>>>> I am currently looking at moving our Solr cluster to 4.2 and noticed a
>>>>>> strange issue while testing today.  Specifically, the replica has a higher
>>>>>> version than the master, which is causing the index to not replicate.
>>>>>> Because of this the replica has fewer documents than the master.  What
>>>>>> could cause this, and how can I resolve it short of taking down the index
>>>>>> and scp-ing the right version in?
>>>>>> 
>>>>>> MASTER:
>>>>>> Last Modified: about an hour ago
>>>>>> Num Docs: 164880
>>>>>> Max Doc: 164880
>>>>>> Deleted Docs: 0
>>>>>> Version: 2387
>>>>>> Segment Count: 23
>>>>>> 
>>>>>> REPLICA:
>>>>>> Last Modified: about an hour ago
>>>>>> Num Docs: 164773
>>>>>> Max Doc: 164773
>>>>>> Deleted Docs: 0
>>>>>> Version: 3001
>>>>>> Segment Count: 30
>>>>>> 
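
[Illustrative aside: these per-core numbers can also be pulled programmatically
with the CoreAdmin STATUS call, which makes it easier to compare many shards at
once. A minimal SolrJ 4.x sketch; the hosts and core names are the ones that
appear in this thread, and which node is the "master"/leader here is only
inferred from the logs.]

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.common.util.NamedList;

public class CompareCoreStatus {
  public static void main(String[] args) throws Exception {
    printIndexInfo("http://10.38.33.16:7575/solr", "dsc-shard5-core1");  // presumed leader/"master"
    printIndexInfo("http://10.38.33.17:7577/solr", "dsc-shard5-core2");  // the replica having issues
  }

  static void printIndexInfo(String baseUrl, String core) throws Exception {
    SolrServer server = new HttpSolrServer(baseUrl);
    try {
      NamedList<Object> status = CoreAdminRequest.getStatus(core, server).getCoreStatus(core);
      // The "index" section of the STATUS response carries numDocs, maxDoc,
      // version, segmentCount, etc. -- the same figures shown in the admin UI.
      System.out.println(core + ": " + status.get("index"));
    } finally {
      server.shutdown();
    }
  }
}
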
>>>>>> in the replica's log it says this:
>>>>>> 
>>>>>> INFO: Creating new http client,
>>>>>> config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>>>>>> 
>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START
>>>>>> replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>>>>>> 
>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>>>> Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>>>>>> 
>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>>>> Our versions are newer. ourLowThreshold=1431233788792274944
>>>>>> otherHigh=1431233789440294912
>>>>>> 
>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE.
>>>>>> sync succeeded
>>>>>> 
>>>>>> 
>>>>>> which again seems to indicate that it thinks it has a newer version of the
>>>>>> index, so it aborts.  This happened while having 10 threads indexing 10,000
>>>>>> items, writing to a 6-shard (1 replica each) cluster.  Any thoughts on this
>>>>>> or what I should look for would be appreciated.
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
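
[Illustrative aside: the load described above -- 10 threads indexing 10,000
items into a 6-shard collection -- can be approximated with a small SolrJ
program along the lines of the sketch below. The ZooKeeper address, collection
name and field names are placeholders rather than values from this thread, and
splitting the 10,000 documents as 1,000 per thread is an assumption.]

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexLoadTest {
  public static void main(String[] args) throws Exception {
    final CloudSolrServer server = new CloudSolrServer("zkhost1:2181");  // placeholder ZK address
    server.setDefaultCollection("dsc");                                  // placeholder collection name
    final int threads = 10;
    final int docsPerThread = 1000;  // 10 threads x 1,000 docs = 10,000 total

    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      final int threadId = t;
      pool.submit(new Runnable() {
        public void run() {
          try {
            for (int i = 0; i < docsPerThread; i++) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", threadId + "-" + i);
              doc.addField("text", "load test doc " + i);
              server.add(doc);  // sent to the cluster; Solr forwards it to the correct shard leader
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);
    server.commit();
    server.shutdown();
  }
}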
