There is no firewall and the CLOSE_WAITs are between Solr-to-Solr nodes (the origin and destination IP:PORT belong to Solr).
Also, note that the same test runs fine on 5.4.1, even though there are still a few hundred CLOSE_WAITs. I'm looking at what has changed in the code between 5.4.1 and 5.5.1. It's also only reproducible when Solr is run in SSL mode, so the problem might lie in HttpClient/Jetty too.

Shai

On Fri, Jul 8, 2016 at 11:59 AM Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> Is there a firewall between a client and a server by any chance?
>
> CLOSE_WAIT is not a leak, but a standard TCP step at the end. So the question
> is why sockets are reopened that often or why the other side does not
> acknowledge the TCP termination packet quickly.
>
> I would run Ethereal to troubleshoot that. And truss/strace.
>
> Regards,
> Alex
>
> On 8 Jul 2016 4:56 PM, "Mads Tomasgård Bjørgan" <m...@dips.no> wrote:
>
> FYI - we're using Solr-6.1.0, and the leak seems to be consistent (occurs
> every single time when running with SSL).
>
> -----Original Message-----
> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
> Sent: Thursday, 7 July 2016 18:14
> To: solr-user@lucene.apache.org
> Subject: Re: File Descriptor/Memory Leak
>
> I've created a JIRA to track this:
> https://issues.apache.org/jira/browse/SOLR-9290
>
> On Thu, Jul 7, 2016 at 8:00 AM, Shai Erera <ser...@gmail.com> wrote:
>
> > Shalin, we're seeing that issue too (and actually actively debugging
> > it these days). So far I can confirm the following (on a 2-node cluster):
> >
> > 1) It consistently reproduces on 5.5.1, but *does not* reproduce on
> > 5.4.1
> > 2) It does not reproduce when SSL is disabled
> > 3) After restarting the Solr process (sometimes both need to be
> > restarted), the count drops to 0, but if indexing continues, it climbs
> > up again
> >
> > When it does happen, Solr seems stuck. The leader cannot talk to the
> > replica, or vice versa, the replica is usually put in DOWN state and
> > there's no way to fix it besides restarting the JVM.
> >
> > Reviewing the changes from 5.4.1 to 5.5.1 I tried reverting some that
> > looked suspicious (SOLR-8451 and SOLR-8578), even though the changes
> > look legit. That did not help, and honestly I did that before we
> > suspected it might be the SSL. Therefore I think those are "safe", but
> > just FYI.
> >
> > When it does happen, the number of CLOSE_WAITs climbs very high, to the
> > order of 30K+ entries in 'netstat'.
> >
> > When I say it does not reproduce on 5.4.1 I really mean the numbers
> > don't go as high as they do in 5.5.1. Meaning, when running without
> > SSL, the number of CLOSE_WAITs is smallish, usually less than 10 (I
> > would separately like to understand why we have any in that state at
> > all). When running with SSL and 5.4.1, they stay low, on the order of
> > hundreds at most.
> >
> > Unfortunately running without SSL is not an option for us. We will
> > likely roll back to 5.4.1, even if the problem exists there, but to a
> > lesser degree.
> >
> > I will post back here when/if we have more info about this.
> >
> > Shai
> >
> > On Thu, Jul 7, 2016 at 5:32 PM Shalin Shekhar Mangar <
> > shalinman...@gmail.com>
> > wrote:
> >
> > > I have myself seen this CLOSE_WAIT issue at a customer. I am running
> > > some tests with different versions trying to pinpoint the cause of this
> > > leak.
> > > Once I have some more information and a reproducible test, I'll open
> > > a jira issue. I'll keep you posted.
> > >
> > > On Thu, Jul 7, 2016 at 5:13 PM, Mads Tomasgård Bjørgan <m...@dips.no>
> > > wrote:
> > >
> > > > Hello there,
> > > > Our SolrCloud is experiencing a FD leak while running with SSL.
> > > > This is occurring on the one machine that our program is sending
> > > > data to. We have a total of three servers running as an ensemble.
> > > >
> > > > While running without SSL, the FD count remains quite constant
> > > > at around 180 while indexing.
> > > > Performing a garbage collection also
> > > > clears almost the entire JVM memory.
> > > >
> > > > However, when indexing with SSL the FD count grows polynomially. The
> > > > count increases by a few hundred every five seconds or so, but easily
> > > > reaches 50 000 within three to four minutes. Performing a GC clears most
> > > > of the memory on the two machines our program isn't transmitting
> > > > the data directly to. The last machine is unaffected by the GC, and
> > > > neither memory nor FD count resets before Solr is restarted on that
> > > > machine.
> > > >
> > > > Performing a netstat reveals that the FD count mostly consists of
> > > > TCP connections in the state of "CLOSE_WAIT".
> > >
> > >
> > > --
> > > Regards,
> > > Shalin Shekhar Mangar.
> >
>
> --
> Anshum Gupta
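
For anyone who wants to track the CLOSE_WAIT numbers mentioned in this thread over time, here is a minimal sketch that tallies TCP states, and CLOSE_WAIT sockets per remote endpoint, from `netstat`-style text. It assumes the Linux `netstat -tan` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function names and the sample addresses are hypothetical and are not taken from anyone's actual cluster.

```python
from collections import Counter


def _tcp_rows(netstat_output: str):
    """Yield the whitespace-split fields of each TCP row, skipping headers."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            yield fields


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state (ESTABLISHED, CLOSE_WAIT, ...)."""
    return Counter(fields[5] for fields in _tcp_rows(netstat_output))


def close_wait_by_peer(netstat_output: str) -> Counter:
    """Count CLOSE_WAIT sockets grouped by remote IP:PORT."""
    return Counter(
        fields[4]
        for fields in _tcp_rows(netstat_output)
        if fields[5] == "CLOSE_WAIT"
    )


# Made-up sample in the assumed `netstat -tan` format.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8983            0.0.0.0:*               LISTEN
tcp        1      0 10.0.0.1:45012          10.0.0.2:8983           CLOSE_WAIT
tcp        1      0 10.0.0.1:45013          10.0.0.2:8983           CLOSE_WAIT
tcp        0      0 10.0.0.1:8983           10.0.0.2:51234          ESTABLISHED
"""

if __name__ == "__main__":
    print(count_tcp_states(SAMPLE))
    print(close_wait_by_peer(SAMPLE))
```

To use it on a live node, the output of `netstat -tan` could be captured (for example with `subprocess.run(["netstat", "-tan"], capture_output=True, text=True).stdout`) and passed to the same functions. Grouping by remote endpoint is only one way to check whether the sockets really are Solr-to-Solr connections.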