That's interesting. Someone ran some tests on a project I'm working on and 
also reported high memory usage (even for plain .txt files).
I have not dug into the issue yet, so I don't know whether this is related, 
but I thought I'd share it here: https://github.com/dadoonet/fscrawler/issues/566


On 7 Aug 2018 at 14:36 +0200, Tim Allison <[email protected]> wrote:
> Thomas,
>    Thank you for raising this on the Solr list. Please let us know if we can 
> help you help us figure out what’s going on...or if you’ve already figured it 
> out!
>     Thank you!
>
>     Best,
>        Tim
>
> > ---------- Forwarded message ---------
> > From: Thomas Scheffler <[email protected]>
> > Date: Thu, Aug 2, 2018 at 6:06 AM
> > Subject: Memory Leak in 7.3 to 7.4
> > To: [email protected] <[email protected]>
> >
> >
> > Hi,
> >
> > we noticed a memory leak in a rather small setup: 40,000 metadata documents 
> > with nearly as many files carrying "literal.*" fields. While 7.2.1 brought 
> > some Tika issues (due to a beta version), the real problems started with 
> > version 7.3.0 and are currently unresolved in 7.4.0. Memory consumption has 
> > gone through the roof: where 512 MB of heap was previously enough, now 6 GB 
> > is not enough to index all files.
> > I have now tracked this down to the libraries in 
> > solr-7.4.0/contrib/extraction/lib/. If I replace them all with the 
> > libraries shipped with 7.2.1, the problem disappears. As most files are PDF 
> > documents, I tried updating PDFBox to 2.0.11 and Tika to 1.18, which did 
> > not solve the problem. Next I will downgrade these individual libraries 
> > back to 2.0.6 and 1.16 to see whether they are the source of the memory leak.
> >
> > In the meantime, I would like to know whether anybody else has experienced 
> > the same problem.
> >
> > Kind regards,
> >
> > Thomas
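
The jar-bisection approach Thomas describes can be sketched as a small shell 
script. This is a hedged, dry-run sketch only: the install paths, and the 
choice of `pdfbox` and `tika-core` as the libraries to bisect first, are 
assumptions based on the message above, not commands from the thread.

```shell
#!/bin/sh
# Sketch of bisecting the extraction libs: swap individual 7.4.0 jars
# for their 7.2.1 counterparts and re-run the indexing workload.
# Paths are assumptions; adjust to your actual Solr installs.
SOLR_NEW="solr-7.4.0/contrib/extraction/lib"
SOLR_OLD="solr-7.2.1/contrib/extraction/lib"

# Dry run: print which jars would be swapped back, one library at a time.
for jar in pdfbox tika-core; do
  echo "would replace $SOLR_NEW/$jar-*.jar with $SOLR_OLD/$jar-*.jar"
done
```

After each swap, re-index the same document set and watch heap usage (e.g. 
with jstat or a heap dump) to see which single library reintroduces the growth.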
