That's interesting. Someone ran some tests on a project I'm working on and also reported high memory usage (even for plain .txt files). I haven't dug into the issue yet, so I don't know whether it's related, but I thought I'd share it here: https://github.com/dadoonet/fscrawler/issues/566
On 7 August 2018 at 14:36 +0200, Tim Allison <[email protected]> wrote:
> Thomas,
> Thank you for raising this on the Solr list. Please let us know if we can
> help you help us figure out what's going on...or if you've already figured it
> out!
> Thank you!
>
> Best,
> Tim
>
> > ---------- Forwarded message ---------
> > From: Thomas Scheffler <[email protected]>
> > Date: Thu, Aug 2, 2018 at 6:06 AM
> > Subject: Memory Leak in 7.3 to 7.4
> > To: [email protected] <[email protected]>
> >
> > Hi,
> >
> > we noticed a memory leak in a rather small setup: 40,000 metadata documents,
> > with nearly as many files that have "literal.*" fields attached. While 7.2.1
> > had some Tika issues (due to a beta version), the real problems started to
> > appear with version 7.3.0, and they are currently unresolved in 7.4.0.
> > Memory consumption is through the roof: where 512 MB of heap was previously
> > enough, 6 GB is now not enough to index all the files.
> > I have now tracked this down to the libraries in
> > solr-7.4.0/contrib/extraction/lib/. If I replace them all with the libraries
> > shipped with 7.2.1, the problem disappears. As most files are PDF documents,
> > I tried updating PDFBox to 2.0.11 and Tika to 1.18, with no solution to the
> > problem. I will next try downgrading these two libraries back to 2.0.6
> > and 1.16 to see whether they are the source of the memory leak.
> >
> > In the meantime, I would like to know whether anybody else has experienced
> > the same problems?
> >
> > Kind regards,
> >
> > Thomas
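Not part of the original thread, but for anyone trying to reproduce the heap numbers above: a minimal sketch of logging JVM heap usage from the standard `java.lang.management` API, which can be run (or embedded) alongside an indexing run to watch whether consumption keeps climbing. The class name `HeapCheck` and the sampling loop are illustrative, not anything from Solr.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustrative helper: samples current JVM heap usage so growth during
// an indexing run can be observed over time.
public class HeapCheck {

    static String report() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long usedMb = heap.getUsed() / (1024 * 1024);
        long max = heap.getMax();
        // getMax() may return -1 when the maximum is undefined.
        String maxStr = (max < 0) ? "undefined" : (max / (1024 * 1024)) + " MB";
        return "heap used: " + usedMb + " MB of " + maxStr;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample a few times; in practice you would run this for the
        // duration of the indexing job.
        for (int i = 0; i < 3; i++) {
            System.out.println(report());
            Thread.sleep(1000);
        }
    }
}
```

A steadily rising "used" value across many samples (with full GCs in between) is what would distinguish a genuine leak from normal allocation churn.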
