Hi,

My cluster hung again while running an update process; the HTTP POST request was aborted because of a timeout error. After the hang, I couldn't do any more updates without restarting the cluster.
I saw this error in the node's log after killing it. It's as if Solr waits forever for the update response, and no other operation can be handled until that one finishes:

[qtp301150411-1248] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: interrupted waiting for shard update response
    at org.apache.solr.update.SolrCmdDistributor.checkResponses(SolrCmdDistributor.java:429)
    at org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:99)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:447)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1140)
    at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:179)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.solr.update.SolrCmdDistributor.checkResponses(SolrCmdDistributor.java:408)
    ... 35 more

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Monday, June 3, 2013 at 2:18 AM, Erick Erickson wrote:
> Did you take a stack trace of your _server_ and see if the
> fragment I posted is the place a bunch of threads are
> stuck? If so, then it's what I mentioned, and the patch
> I pointed to should fix it up (when it's ready)...
>
> The fact that it hangs more frequently with replication > 1
> is consistent with the JIRA.
>
> Shawn:
>
> Thanks, you beat me to the punch for clarifying "replication"!
>
> Best
> Erick
>
> On Sun, Jun 2, 2013 at 12:41 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
> > Shawn:
> >
> > Yes, a replicationFactor higher than one.
> >
> > --
> > Yago Riveiro
> >
> > On Sunday, June 2, 2013 at 4:07 PM, Shawn Heisey wrote:
> >
> > > On 6/2/2013 8:28 AM, Yago Riveiro wrote:
> > > > Erick:
> > > >
> > > > In my case, when the server hangs, no exception is thrown; the logs
> > > > on both servers stop registering the update INFO messages. If I shut
> > > > down one node, the log of the surviving node immediately registers
> > > > some update INFO messages that appear to have been stuck somewhere
> > > > in the update operation.
> > > >
> > > > Another thing I noticed is that the cluster hangs more frequently
> > > > when the collection has replication.
> > >
> > > Just to clarify, you are talking about a replicationFactor higher than
> > > one, not old-style master-slave replication, correct? I'm pretty sure
> > > that's the case; I'm just trying to keep this topic from getting derailed.
> > >
> > > Thanks,
> > > Shawn
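Erick's question (whether a stack trace of the server shows a bunch of threads stuck at the same place) can be checked mechanically once a thread dump has been saved. The following is a minimal sketch, assuming the dump was taken on the hung node with the JDK's `jstack <pid>` and saved to a text file. `StuckThreadFinder` and `findThreads` are hypothetical names, and the default frame `SolrCmdDistributor.checkResponses` is taken from the stack trace in this message, not from the fragment Erick posted.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans a thread dump (e.g. the output of `jstack <pid>`)
// and lists the threads whose stack contains a given frame substring.
public class StuckThreadFinder {

    // Returns the names of threads whose stack trace contains `frame`.
    public static List<String> findThreads(String dump, String frame) {
        List<String> matches = new ArrayList<>();
        // jstack separates thread entries with blank lines.
        for (String block : dump.split("\\R\\s*\\R")) {
            String trimmed = block.strip();
            // A thread entry starts with the quoted thread name; skip other blocks.
            if (!trimmed.startsWith("\"") || !trimmed.contains(frame)) {
                continue;
            }
            int end = trimmed.indexOf('"', 1);
            if (end > 0) {
                matches.add(trimmed.substring(1, end));
            }
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java StuckThreadFinder <dump-file> [frame]");
            System.exit(1);
        }
        String dump = Files.readString(Path.of(args[0]));
        // Default frame is an assumption based on the trace above.
        String frame = args.length > 1 ? args[1] : "SolrCmdDistributor.checkResponses";
        List<String> stuck = findThreads(dump, frame);
        System.out.println(stuck.size() + " thread(s) in " + frame + ":");
        stuck.forEach(name -> System.out.println("  " + name));
    }
}
```

For example, `java StuckThreadFinder dump.txt` (with `dump.txt` being whatever file the dump was saved to) would list the request threads sitting in `checkResponses`. If most of the update threads show up there, that would be consistent with the JIRA Erick mentioned, though the dump itself remains the thing to post to the list.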