Thank you, David! It would be helpful to know whether downgrading to 1.16
solves the problems with .txt files, as it apparently does with
PDFs.
On Tue, Aug 7, 2018 at 9:10 AM David Pilato <[email protected]> wrote:
>
> That's interesting. Someone ran some tests on a project I'm working on and
> likewise reported very high memory usage (even for plain .txt files).
> I haven't dug into the issue yet, so I don't know whether it is related,
> but I thought I'd share it here:
> https://github.com/dadoonet/fscrawler/issues/566
>
>
> On 7 Aug 2018 at 14:36 +0200, Tim Allison <[email protected]> wrote:
>
> Thomas,
> Thank you for raising this on the Solr list. Please let us know if we can
> help you help us figure out what's going on… or if you've already figured it
> out!
> Thank you!
>
> Best,
> Tim
>
> ---------- Forwarded message ---------
> From: Thomas Scheffler <[email protected]>
> Date: Thu, Aug 2, 2018 at 6:06 AM
> Subject: Memory Leak in 7.3 to 7.4
> To: [email protected] <[email protected]>
>
>
> Hi,
>
> we noticed a memory leak in a rather small setup: 40,000 metadata documents,
> with nearly as many associated files that carry „literal.*“ fields. While
> 7.2.1 had some Tika issues (due to a beta version), the real problems started
> with version 7.3.0 and are still unresolved in 7.4.0. Memory consumption has
> gone through the roof: where 512 MB of heap used to be enough, 6 GB is now
> not enough to index all the files.
> I have now tracked this down to the libraries in
> solr-7.4.0/contrib/extraction/lib/. If I replace them all with the libraries
> shipped with 7.2.1, the problem disappears. As most files are PDF documents,
> I tried updating PDFBox to 2.0.11 and Tika to 1.18, which did not solve the
> problem. Next I will downgrade just these two libraries to 2.0.6 and 1.16 to
> see whether they are the source of the memory leak.
>
> In the meantime, I would like to know whether anybody else has experienced
> the same problems.
>
> Kind regards,
>
> Thomas
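For readers following along: the bisection Thomas describes, swapping jars between solr-7.2.1's and solr-7.4.0's contrib/extraction/lib/ directories, can be sketched as a small shell helper. This is a minimal sketch, not from the thread; the paths and jar names in the comments are illustrative assumptions.

```shell
# Sketch: list the jars that differ between two Solr extraction lib
# directories, to narrow down which upgraded library to bisect first.
# The directory paths you pass in are assumptions; point them at your installs.
changed_jars() {
  # diff's normal output marks removed lines with '<' (only in the old dir)
  # and added lines with '>' (only in the new dir); keep just those lines.
  diff <(ls "$1" | sort) <(ls "$2" | sort) | grep '^[<>]'
}

# Hypothetical usage:
# changed_jars solr-7.2.1/contrib/extraction/lib solr-7.4.0/contrib/extraction/lib
```

Separately, running the indexing job with `-XX:+HeapDumpOnOutOfMemoryError` would capture a heap dump for offline inspection the next time the 6 GB heap is exhausted.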