I didn't see any OOM errors in the logs on either node. I saw a GC pause of 1 second on the box that was throwing the error, but nothing on the other node. Any other recommendations? Thanks
Thanks
Jay Potharaju

On Mon, May 7, 2018 at 9:48 AM, Jay Potharaju <jspothar...@gmail.com> wrote:

> Ah thanks for explaining that!
>
> Thanks
> Jay Potharaju
>
> On Mon, May 7, 2018 at 9:45 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
>
>> Node A receives batch of documents to index. It forwards documents to
>> shards that are on the node B. Node B is having issues with GC so it takes
>> a while to respond. Node A sees it as read timeout and reports it in logs.
>> So the issue is on node B not node A.
>>
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>> > On 7 May 2018, at 18:39, Jay Potharaju <jspothar...@gmail.com> wrote:
>> >
>> > Yes, the nodes are well balanced. I am just using these boxes for indexing
>> > the data and is not serving any traffic at this time. The error indicates
>> > it is having issues errors on the shards that are hosted on the box and not
>> > on the other box.
>> > I will check GC logs to see if there were any issues.
>> > thanks
>> >
>> > Thanks
>> > Jay Potharaju
>> >
>> > On Mon, May 7, 2018 at 9:34 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
>> >
>> >> Hi Jay,
>> >> My first guess would be that there was some major GC on other box so it
>> >> did not respond on time. Are your nodes well balanced - do they serve equal
>> >> amount of data?
>> >>
>> >> Thanks,
>> >> Emir
>> >> --
>> >> Monitoring - Log Management - Alerting - Anomaly Detection
>> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> >>
>> >>> On 7 May 2018, at 18:11, Jay Potharaju <jspothar...@gmail.com> wrote:
>> >>>
>> >>> Hi,
>> >>> I am seeing the following lines in the error log. My setup has 2 nodes in
>> >>> the solrcloud cluster, each node has 3 shards with no replication. From the
>> >>> error log it seems like all the shards on this box are throwing async
>> >>> exception errors. Other node in the cluster does not have any errors in the
>> >>> logs. Any suggestions on how to tackle this error?
>> >>>
>> >>> Solr setup
>> >>> Solr:6.6.3
>> >>> 2Nodes: 3 shards each
>> >>>
>> >>> ERROR org.apache.solr.servlet.HttpSolrCall [test_shard3_replica1] ?
>> >>> null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: Async exception during distributed update: Read timed out
>> >>> at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:972)
>> >>> at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1911)
>> >>> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
>> >>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
>> >>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
>> >>> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
>> >>> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
>> >>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
>> >>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
>> >>> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
>> >>> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>> >>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>> >>> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>> >>> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>> >>> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>> >>> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>> >>> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>> >>> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>> >>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>> >>> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>> >>> at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>> >>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>> >>> at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>> >>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>> >>> at org.eclipse.jetty.server.Server.handle(Server.java:534)
>> >>> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>> >>> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>> >>> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
>> >>> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>> >>> at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>> >>> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>> >>> at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>> >>> at java.lang.Thread.run(Unknown Source)
>> >>>
>> >>> Thanks
>> >>> Jay
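
For anyone following up on the GC angle in this thread, here is a rough sketch of how one might scan a node's GC log for long stop-the-world pauses. It is not something posted by the participants. It assumes a JDK 8 style GC log written with -XX:+PrintGCApplicationStoppedTime enabled; the function name `long_pauses`, the 1 second default threshold, and the log path passed on the command line are all made up for illustration.

```python
import re
import sys

# Assumption: JDK 8 log lines produced by -XX:+PrintGCApplicationStoppedTime, e.g.
# "...: Total time for which application threads were stopped: 1.0234567 seconds, ..."
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"([0-9]+\.[0-9]+) seconds"
)


def long_pauses(lines, threshold_seconds=1.0):
    """Return (line_number, pause_seconds) for every pause >= threshold_seconds."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED.search(line)
        if match is None:
            continue
        seconds = float(match.group(1))
        if seconds >= threshold_seconds:
            hits.append((number, seconds))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python gc_pauses.py /path/to/solr_gc.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for number, seconds in long_pauses(log):
            print(f"line {number}: application threads stopped for {seconds:.3f}s")
```

Running this against the GC log of the node that hosts the target shards (node B in Emir's explanation) and lining up the pause timestamps with the "Read timed out" entries on the other node would show whether the two actually coincide.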