Re: File Descriptor/Memory Leak

Anshum Gupta Thu, 07 Jul 2016 09:15:17 -0700

I've created a JIRA to track this:
https://issues.apache.org/jira/browse/SOLR-9290


On Thu, Jul 7, 2016 at 8:00 AM, Shai Erera <ser...@gmail.com> wrote:

> Shalin, we're seeing that issue too (and actually actively debugging it
> these days). So far I can confirm the following (on a 2-node cluster):
>
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) After restarting the Solr process (sometimes both nodes need to be
> restarted), the count drops to 0, but if indexing continues, it climbs again
>
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa; the replica is usually put in the DOWN state, and
> there's no way to fix it other than restarting the JVM.
>
> While reviewing the changes from 5.4.1 to 5.5.1, I tried reverting some
> that looked suspicious (SOLR-8451 and SOLR-8578), even though the changes
> look legit. That did not help, and honestly I did that before we suspected
> it might be SSL. Therefore I think those are "safe", but just FYI.
>
> When it does happen, the number of CLOSE_WAITs climbs very high, on the
> order of 30K+ entries in 'netstat'.
>
> When I say it does not reproduce on 5.4.1, I really mean the numbers don't
> go as high as they do on 5.5.1. When running without SSL, the number of
> CLOSE_WAITs is small, usually fewer than 10 (I would separately like to
> understand why there are any in that state at all). When running with SSL
> on 5.4.1, they stay low, on the order of hundreds at most.
>
> Unfortunately, running without SSL is not an option for us. We will likely
> roll back to 5.4.1, even though the problem exists there too, albeit to a
> lesser degree.
>
> I will post back here when/if we have more info about this.
>
> Shai
>
> On Thu, Jul 7, 2016 at 5:32 PM Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>
> > I have seen this CLOSE_WAIT issue myself at a customer. I am running some
> > tests with different versions to try to pinpoint the cause of this leak.
> > Once I have more information and a reproducible test, I'll open a JIRA
> > issue. I'll keep you posted.
> >
> > On Thu, Jul 7, 2016 at 5:13 PM, Mads Tomasgård Bjørgan <m...@dips.no> wrote:
> >
> > > Hello there,
> > > Our SolrCloud is experiencing an FD leak while running with SSL. This is
> > > occurring on the one machine that our program is sending data to. We
> > > have a total of three servers running as an ensemble.
> > >
> > > When running without SSL, the FD count remains fairly constant at
> > > around 180 while indexing. Performing a garbage collection also clears
> > > almost the entire JVM memory.
> > >
> > > However, when indexing with SSL, the FD count grows polynomially. The
> > > count increases by a few hundred every five seconds or so, and easily
> > > reaches 50,000 within three to four minutes. Performing a GC clears most
> > > of the memory on the two machines our program isn't sending data to
> > > directly. The last machine is unaffected by GC: neither memory nor the
> > > FD count resets until Solr is restarted on that machine.
> > >
> > > Running netstat reveals that the FDs are mostly TCP connections in the
> > > CLOSE_WAIT state.
> > >
> > >
> > >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>
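For anyone trying to reproduce the numbers reported above (open FD count per Solr process, number of sockets in CLOSE_WAIT), here is a minimal, hypothetical helper sketch in Python. It assumes a Linux host where /proc is available; it reads /proc/net/tcp and /proc/net/tcp6 (covering the current network namespace, similar to tallying `netstat -tan` output) and /proc/<pid>/fd. It is not a Solr tool, and the function names are illustrative only.

```python
import os
import sys
from collections import Counter

# Linux TCP state codes as they appear (two-digit hex) in /proc/net/tcp{,6}.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def parse_tcp_states(text):
    """Count sockets per TCP state from the text of a /proc/net/tcp-style table."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) > 3:
            # The 4th column ("st") holds the connection state as hex.
            code = fields[3].upper()
            counts[TCP_STATES.get(code, code)] += 1
    return counts


def tcp_state_counts(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Socket counts per TCP state for IPv4 and IPv6 in this network namespace."""
    counts = Counter()
    for path in paths:
        try:
            with open(path) as f:
                counts += parse_tcp_states(f.read())
        except FileNotFoundError:
            pass  # e.g. IPv6 disabled, or not running on Linux
    return counts


def open_fd_count(pid):
    """Number of open file descriptors for a process (needs read access to /proc/<pid>/fd)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    states = tcp_state_counts()
    print("CLOSE_WAIT sockets:", states["CLOSE_WAIT"])
    if len(sys.argv) > 1:
        # Optional argument: the PID of the Solr JVM to inspect.
        print("open FDs:", open_fd_count(int(sys.argv[1])))
```

Running it every few seconds against each node's Solr PID would show whether the FD count and the CLOSE_WAIT count climb together, as described in the thread.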



-- 
Anshum Gupta