Hi,

My cluster hung again while running an update process; the HTTP POST request was 
aborted because of a timeout error. After the hang, I couldn't do any more updates 
without restarting the cluster.
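
To illustrate what I mean by the aborted POST: with a SolrJ client, the timeout that 
kills the request on the client side would be configured more or less like the sketch 
below (the URL, collection, timeout values and fields are placeholders, not my actual 
setup):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class UpdateClient {
        public static void main(String[] args) throws Exception {
            // Placeholder URL, collection and timeouts -- not the real cluster configuration.
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            server.setConnectionTimeout(5000); // ms allowed to open the connection
            server.setSoTimeout(60000);        // ms of socket inactivity before the POST is aborted

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            doc.addField("name", "example");

            server.add(doc);   // the request that never returns while the cluster is hung
            server.commit();
            server.shutdown();
        }
    }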

I could see the error below in the node's log after killing it. It looks as if Solr 
waits for the shard update response forever, and no more operations can be handled 
until this one finishes.

[qtp301150411-1248] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: interrupted waiting for shard update response
    at org.apache.solr.update.SolrCmdDistributor.checkResponses(SolrCmdDistributor.java:429)
    at org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:99)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:447)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1140)
    at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:179)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.solr.update.SolrCmdDistributor.checkResponses(SolrCmdDistributor.java:408)
    ... 35 more
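
To check whether a bunch of threads are stuck in the fragment Erick posted, a thread 
dump of the Solr JVM is what's needed. A minimal sketch of grabbing one programmatically 
(running "jstack <solr-pid>" against the server process gives the same kind of dump):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;

    public class DumpThreads {
        public static void main(String[] args) {
            // Prints the stack of every live thread in the current JVM, including lock info;
            // to inspect a running Solr node, "jstack <pid>" produces equivalent output.
            for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
                System.out.print(info);
            }
        }
    }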

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Monday, June 3, 2013 at 2:18 AM, Erick Erickson wrote:

> Did you take a stack trace of your _server_ and see if the
> fragment I posted is the place a bunch of threads are
> stuck? If so, then it's what I mentioned, and the patch
> I pointed to should fix it up (when it's ready)...
>  
> The fact that it hangs more frequently with replication > 1
> is consistent with the JIRA.
>  
> Shawn:
>  
> Thanks, you beat me to the punch for clarifying "replication"!
>  
> Best
> Erick
>  
> On Sun, Jun 2, 2013 at 12:41 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
> > Shawn:
> >  
> > replicationFactor higher than one yes.
> >  
> > --
> > Yago Riveiro
> > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> >  
> >  
> > On Sunday, June 2, 2013 at 4:07 PM, Shawn Heisey wrote:
> >  
> > > On 6/2/2013 8:28 AM, Yago Riveiro wrote:
> > > > Erick:
> > > >  
> > > > In my case, when the server hangs, no exception is thrown and the logs on both 
> > > > servers stop registering the update INFO messages. If I shut down one node, the 
> > > > log of the surviving node immediately registers some update INFO messages that 
> > > > appear to have been stuck at some point in the update operation.
> > > >  
> > > > Another thing I noticed is that the cluster hangs more frequently when the 
> > > > collection has replication.
> > >  
> > > Just to clarify, you are talking about a replicationFactor higher than
> > > one, not old-style master-slave replication, correct? I'm pretty sure
> > > that's the case, I'm just trying to keep this topic from getting derailed.
> > >  
> > > Thanks,
> > > Shawn
> > >  
> >  
> >  
>  
>  
>  

