Is the network OK between the nodes? What about usage? Is swap disabled? Is the swappiness value 0?
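For the swap questions above, here is a minimal sketch of one way to check them on a Linux host. It assumes the standard `/proc/meminfo` and `/proc/sys/vm/swappiness` files are readable; the function names (`parse_meminfo`, `swap_report`, `read_local_swap_report`) are illustrative and not part of Solr or any library.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> [unit]" lines.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_report(meminfo_text, swappiness_text):
    """Summarize swap state from the contents of the two /proc files."""
    info = parse_meminfo(meminfo_text)
    total_kb = info.get("SwapTotal", 0)
    free_kb = info.get("SwapFree", 0)
    return {
        "swap_enabled": total_kb > 0,          # no swap configured means SwapTotal is 0
        "swap_used_kb": total_kb - free_kb,    # non-zero means something has been swapped out
        "swappiness": int(swappiness_text.strip()),
    }


def read_local_swap_report():
    """Read the live values from /proc (Linux only, an assumption of this sketch)."""
    return swap_report(
        Path("/proc/meminfo").read_text(),
        Path("/proc/sys/vm/swappiness").read_text(),
    )
```

Running `print(read_local_swap_report())` on each Solr node would show whether swap is configured, how much of it is in use, and the current swappiness setting.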

On Sat, Jul 4, 2020 at 15:19, Tran Van Hoan <tranvanhoan...@yahoo.com.invalid> wrote:

> I use physical servers, and I/O wait is low :(
> I saw that iptables dropped all ACK messages from clients (not only
> from Solr clients; Prometheus scrapes of the exporter metrics were
> dropped too). When I checked netstat -anp|grep 8983, all sockets were
> in the TIME_WAIT state. The incident was resolved only by restarting
> the Solr instances. Total load is around 2.5k requests per second per
> node.
>
> On Sunday, July 5, 2020, 1:11:38 AM GMT+7, Rodrigo Oliveira <adamantina.rodr...@gmail.com> wrote:
>
> Hi,
>
> I had this problem. In my case it was I/O wait in the VM. I migrated my
> environment to another place and that solved it.
>
> Actually, it is a problem with I/O wait on the physical host (even the
> backup over Veeam is a problem).
>
> Regards
>
> On Sat, Jul 4, 2020 at 12:30, Tran Van Hoan <tranvanhoan...@yahoo.com.invalid> wrote:
>
> > The problem has recurred repeatedly in recent days.
> > Today I tried to dump the heap and the threads. Only the thread dump
> > worked; the heap dump failed because the Solr instance was hung.
> > Almost all threads were blocked.
> >
> > On Tuesday, June 23, 2020, 10:42:36 PM GMT+7, Tran Van Hoan <tranvanhoan...@yahoo.com.invalid> wrote:
> >
> > I checked the node exporter metrics and saw no network problem.
> >
> > On Tuesday, June 23, 2020, 8:37:41 PM GMT+7, Tran Van Hoan <tranvanhoan...@yahoo.com> wrote:
> >
> > I checked node exporter; there is no problem with the OS, hardware, or network.
> > I attached images of the Solr metrics for 7 days and 12 hours.
> >
> > On Tuesday, June 23, 2020, 2:23:05 PM GMT+7, Dario Rigolin <dario.rigo...@comperio.it> wrote:
> >
> > What about a network issue?
> >
> > On Tue, Jun 23, 2020 at 01:37, Tran Van Hoan <tranvanhoan...@yahoo.com.invalid> wrote:
> >
> > > Dear all,
> > >
> > > I have a SolrCloud 8.2.0 cluster with 6 instances on 6 servers (64 GB RAM);
> > > each instance has xmx = xms = 30G.
> > >
> > > Today almost all nodes in the SolrCloud cluster died twice: at 8:00 AM (5/6
> > > nodes were down) and at 1:00 PM (2/6 nodes were down). Yesterday, one node
> > > was down. Almost no metrics increased much except threads.
> > >
> > > Performance one week ago: [image]
> > >
> > > Performance 12 hours ago: [image]
> > >
> > > In the admin UI, some nodes are dead and some take too long to respond.
> > > The log files are growing very fast (log level warning). Here are the
> > > log entries that appear in the SolrCloud cluster:
> > >
> > > Logs before servers 4 and 6 went down
> > >
> > > - Server 4 before it died:
> > >
> > > + o.a.s.h.RequestHandlerBase java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 120000/120000 ms
> > >
> > > + org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://server6:8983/solr/mycollection_shard3_replica_n5/select
> > >     at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:406)
> > >     at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:746)
> > >     at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1274)
> > >     at org.apache.solr.handler.component.HttpShardHandler.request(HttpShardHandler.java:238)
> > >     at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:199)
> > >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > >     at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
> > >     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > >     ... 1 more
> > > Caused by: java.util.concurrent.TimeoutException
> > >     at org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216)
> > >     at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:397)
> > >     ... 12 more
> > >
> > > + o.a.s.s.HttpSolrCall invalid return code: -1
> > >
> > > + o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp: 1592803662746 , received timestamp: 1592803796152 , TTL: 120000
> > >
> > > + o.a.s.s.PKIAuthenticationPlugin Decryption failed , key must be wrong => java.security.InvalidKeyException: No installed provider supports this key: (null)
> > >
> > > + o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling SolrCmdDistributor$Req: cmd=delete{,commitWithin=-1}; node=ForwardNode: http://server6:8983/solr/mycollection_shard3_replica_n5/ to http://server6:8983/solr/mycollection_shard3_replica_n5/ => java.util.concurrent.TimeoutException
> > >
> > > + o.a.s.s.HttpSolrCall null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: Async exception during distributed update: null
> > >
> > > - Server 2:
> > >
> > > + Max requests queued per destination 3000 exceeded for HttpDestination[http://server4:8983]@7d7ec93c,queue=3000,pool=MultiplexConnectionPool@73b938e3[c=4/4,b=4,m=0,i=0]
> > >
> > > + Max requests queued per destination 3000 exceeded for HttpDestination[http://server5:8983]@7d7ec93c,queue=3000,pool=MultiplexConnectionPool@73b938e3[c=4/4,b=4,m=0,i=0]
> > >
> > > + Timeout occured while waiting response from server at: http://server4:8983/solr/mycollection_shard6_replica_n23/select
> > >
> > > + Timeout occured while waiting response from server at: http://server6:8983/solr/mycollection_shard2_replica_n15/select
> > >
> > > + o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: null
> > > Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: null
> > > Caused by: java.nio.channels.ClosedChannelException
> > >
> > > - Server 6:
> > >
> > > + org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://server6:8983/solr/mycollection_shard2_replica_n15/select
> > >
> > > + org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://server4:8983/mycollection_shard6_replica_n23/select
> > >
> > > I tried searching Google but didn't find any clue :( Could you help me
> > > find the cause? Thank you!
> >
> > --
> >
> > Dario Rigolin
> > Comperio srl - CTO
> > Mobile: +39 347 7232652 - Office: +39 0425 471482
> > Skype: dario.rigolin