Hi Dominique,

Our issues are similar to the one discussed here:
https://github.com/eclipse/jetty.project/issues/4105
Your views on this.

Thanks,
Mohandoss.

On Tue, Aug 11, 2020 at 7:06 AM Doss <itsmed...@gmail.com> wrote:

> Hi Dominique,
>
> Thanks for the response.
>
> I don't think I would use a JVM version 14. OpenJDK 11 in my opinion is
> the best choice for an LTS version.
>
> >> We will try changing it.
>
> You changed a lot of default values. Any specific reasons? It seems very
> aggressive!
>
> >> Our product team wants data to be reflected in near real time.
> >> mergePolicyFactory, mergeScheduler - this is based on our oldest SOLR
> >> cluster, where tweaking these parameters gave good results.
>
> You have to analyze GC on all nodes!
>
> >> I checked the other nodes' GC and found no issues. I shared the GC of
> >> the node which gets into trouble very frequently.
>
> Your heap is very big. Given the full GC frequency, I don't think you
> really need such a big heap for indexing only. Maybe when you start
> running queries.
>
> >> Heap sizing is based on the select requests we are expecting. We expect
> >> around 10 to 15 million per day. We plan to increase CPU before routing
> >> select traffic.
>
> Did you check your network performance?
>
> >> We did check the sar reports but were unable to find an issue; we use a
> >> 10 Gbps connection. Is there any SOLR metrics API which will give
> >> network-related information? Please suggest other ways to dig into this
> >> further.
>
> Did you check the Zookeeper logs?
>
> >> We never looked at the Zookeeper logs; we will check and share. Is
> >> there any kind of information to watch out for?
>
> Regards,
> Doss
>
> On Monday, August 10, 2020, Dominique Bejean <dominique.bej...@eolya.fr>
> wrote:
>
>> Doss,
>>
>> See below.
>>
>> Dominique
>>
>> On Mon, Aug 10, 2020 at 17:41, Doss <itsmed...@gmail.com> wrote:
>>
>>> Hi Dominique,
>>>
>>> Thanks for your response. Find below the details; please let me know
>>> if I missed anything.
>>>
>>> *- hardware architecture and sizing*
>>
>> CentOS 7, VMs, 4 CPUs, 66GB RAM, 16GB heap, 250GB SSD
>>>
>>> *- JVM version / settings*
>>
>> Red Hat, Inc. OpenJDK 64-Bit Server VM, version "14.0.1 14.0.1+7" -
>> default settings, including GC
>>
>> I don't think I would use a JVM version 14. OpenJDK 11 in my opinion is
>> the best choice for an LTS version.
>>
>>> *- Solr settings*
>>
>> softCommit: 15000 (15 sec), autoCommit: 300000 (5 min)
>>
>>> <mergePolicyFactory
>>>     class="org.apache.solr.index.TieredMergePolicyFactory">
>>>   <int name="maxMergeAtOnce">30</int>
>>>   <int name="maxMergeAtOnceExplicit">100</int>
>>>   <double name="segmentsPerTier">30.0</double>
>>> </mergePolicyFactory>
>>>
>>> <mergeScheduler
>>>     class="org.apache.lucene.index.ConcurrentMergeScheduler">
>>>   <int name="maxMergeCount">18</int>
>>>   <int name="maxThreadCount">6</int>
>>> </mergeScheduler>
>>
>> You changed a lot of default values. Any specific reasons? It seems very
>> aggressive!
>>
>>> *- collections and queries information*
>>
>> One collection, with 4 shards, 3 replicas, 3.5 million records, 150
>> columns, mostly integer fields; average doc size is 350kb. 0.5 million
>> inserts/updates spread across the whole day (peak time being 6PM to
>> 10PM); selects not yet started. Once daily we do a delta import of
>> certain multivalued fields with a good amount of data.
>>
>>> *- gc logs or gceasy results*
>>>
>>> The GCeasy report says GC health is good; one server's GC report:
>>> https://drive.google.com/file/d/1C2SqEn0iMbUOXnTNlYi46Gq9kF_CmWss/view?usp=sharing
>>> CPU load pattern:
>>> https://drive.google.com/file/d/1rjRMWv5ritf5QxgbFxDa0kPzVlXdbySe/view?usp=sharing
>>
>> You have to analyze GC on all nodes!
>> Your heap is very big. Given the full GC frequency, I don't think you
>> really need such a big heap for indexing only. Maybe when you start
>> running queries.
>>
>> Did you check your network performance?
>> Did you check the Zookeeper logs?
>>
>>> Thanks,
>>> Doss.
>>>
>>> On Mon, Aug 10, 2020 at 7:39 PM Dominique Bejean
>>> <dominique.bej...@eolya.fr> wrote:
>>>
>>>> Hi Doss,
>>>>
>>>> Seeing a lot of TIMED_WAITING connections occurs with high-TCP-traffic
>>>> infrastructure, as in a LAMP solution when the Apache server can no
>>>> longer connect to the MySQL/MariaDB database.
>>>> In this case, tweaking net.ipv4.tcp_tw_reuse is a possible solution
>>>> (but never net.ipv4.tcp_tw_recycle, as you suggested in your previous
>>>> post). This is well explained in this great article:
>>>> https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux
>>>>
>>>> However, in general and more specifically in your case, I would
>>>> investigate the root cause of your issue and not try to find a
>>>> workaround.
>>>>
>>>> Can you provide more information about your use case (we know: 3-node
>>>> SOLR (8.3.1 NRT) + 3-node Zookeeper ensemble)?
>>>>
>>>> - hardware architecture and sizing
>>>> - JVM version / settings
>>>> - Solr settings
>>>> - collections and queries information
>>>> - gc logs or gceasy results
>>>>
>>>> Regards
>>>>
>>>> Dominique
>>>>
>>>> On Mon, Aug 10, 2020 at 15:43, Doss <itsmed...@gmail.com> wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > In the Solr 8.3.1 source, I see the following, which I assume could
>>>> > be the reason for the issue "Max requests queued per destination 3000
>>>> > exceeded for HttpDestination":
>>>> >
>>>> > solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>>>> >   private static final int MAX_OUTSTANDING_REQUESTS = 1000;
>>>> > solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>>>> >   available = new Semaphore(MAX_OUTSTANDING_REQUESTS, false);
>>>> > solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>>>> >   return MAX_OUTSTANDING_REQUESTS * 3;
>>>> >
>>>> > How can I increase this?
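[Editor's note] The grepped Http2SolrClient lines above suggest a non-fair Semaphore gating outstanding requests, with a queue bound of three times the permit count. The following is a minimal, self-contained sketch of that throttling pattern, not Solr's actual code; the class and method names here are invented for illustration.

```java
import java.util.concurrent.Semaphore;

// Sketch of the semaphore-based throttle implied by the quoted snippet:
// each in-flight request holds a permit; once permits and the bounded
// queue are exhausted, further requests are rejected, which is the shape
// of the "Max requests queued per destination ... exceeded" failure.
public class RequestThrottle {
    private final Semaphore available;
    private final int maxQueued;

    public RequestThrottle(int maxOutstanding) {
        // non-fair semaphore, matching "new Semaphore(..., false)" above
        this.available = new Semaphore(maxOutstanding, false);
        // matching "return MAX_OUTSTANDING_REQUESTS * 3" above
        this.maxQueued = maxOutstanding * 3;
    }

    /** Try to admit a request; false means the outstanding limit is hit. */
    public boolean tryAdmit() {
        return available.tryAcquire();
    }

    /** Release a permit when a request completes. */
    public void release() {
        available.release();
    }

    public int maxQueued() {
        return maxQueued;
    }

    public static void main(String[] args) {
        RequestThrottle t = new RequestThrottle(2);
        System.out.println(t.tryAdmit()); // true: first permit
        System.out.println(t.tryAdmit()); // true: second permit
        System.out.println(t.tryAdmit()); // false: both permits held
        t.release();
        System.out.println(t.tryAdmit()); // true: permit returned
        System.out.println(t.maxQueued());// 6
    }
}
```

Since MAX_OUTSTANDING_REQUESTS is a private static final constant in the quoted source, raising it would mean patching and rebuilding SolrJ unless the version in use exposes a configuration hook for it.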
>>>> >
>>>> > On Mon, Aug 10, 2020 at 12:01 AM Doss <itsmed...@gmail.com> wrote:
>>>> >
>>>> > > Hi,
>>>> > >
>>>> > > We have a 3-node SOLR (8.3.1 NRT) + 3-node Zookeeper ensemble, and
>>>> > > now and then we are facing "Max requests queued per destination
>>>> > > 3000 exceeded for HttpDestination".
>>>> > >
>>>> > > After a restart everything starts working fine until the next
>>>> > > problem. Once a problem occurs we see very many TIMED_WAITING
>>>> > > threads:
>>>> > >
>>>> > > Server 1: *7722* threads are in TIMED_WAITING
>>>> > > ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@151d5f2f")
>>>> > > Server 2: *4046* threads are in TIMED_WAITING
>>>> > > ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e0205c3")
>>>> > > Server 3: *4210* threads are in TIMED_WAITING
>>>> > > ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5ee792c0")
>>>> > >
>>>> > > Please suggest whether net.ipv4.tcp_tw_reuse=1 will help, or how we
>>>> > > can increase the 3000 limit?
>>>> > >
>>>> > > Sorry, since I haven't got any response to my previous query, I am
>>>> > > creating this as a new one.
>>>> > >
>>>> > > Thanks,
>>>> > > Mohandoss.
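[Editor's note] Thread counts like the "7722 threads in TIMED_WAITING" above come from thread dumps. The same figure can also be read from inside a running JVM via the standard java.lang.management API; the sketch below (class name invented) shows the idea, parking one thread in Thread.sleep() so the count is at least 1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Count live threads in a given state inside this JVM, the same
// information the posters extracted from their thread dumps.
public class ThreadStateCount {
    public static long countInState(Thread.State wanted) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long n = 0;
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            // info can be null if a thread died between the two calls
            if (info != null && info.getThreadState() == wanted) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        // Park one thread in TIMED_WAITING (Thread.sleep) for the demo.
        Thread sleeper = new Thread(() -> {
            try { Thread.sleep(5_000); } catch (InterruptedException ignored) {}
        });
        sleeper.start();
        Thread.sleep(300); // give it time to enter sleep()
        System.out.println("TIMED_WAITING threads: "
                + countInState(Thread.State.TIMED_WAITING));
        sleeper.interrupt();
    }
}
```

Tracking this count over time (e.g. from a metrics endpoint) would show whether the TIMED_WAITING pile-up grows steadily toward the failure, rather than only inspecting it after the fact with a dump.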