Hi,

we noticed a memory leak in a rather small setup: 40,000 metadata documents with nearly as many files, each submitted with "literal.*" fields. While 7.2.1 brought some Tika issues (due to a beta version), the real problems started to appear with version 7.3.0 and remain unresolved in 7.4.0. Memory consumption is through the roof. Where 512 MB of heap was previously enough, 6 GB is now not enough to index all files.

I am now at the point where I can track this down to the libraries in solr-7.4.0/contrib/extraction/lib/. If I replace them all with the libraries shipped with 7.2.1, the problem disappears. As most files are PDF documents, I tried updating PDFBox to 2.0.11 and Tika to 1.18, but that did not solve the problem. Next I will try downgrading just these two libraries to 2.0.6 and 1.16 to see whether they are the source of the memory leak.
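For anyone who wants to check which jars actually differ between the two releases before swapping them one at a time, here is a minimal sketch in Python. The function names jar_versions and diff_libs are made up for illustration, and the script assumes that jars are named <artifact>-<version>.jar with a version that starts with a digit, which may not hold for every file in the directory.

```python
import re
from pathlib import Path

# Assumption: jars follow the pattern <artifact>-<version>.jar and the
# version starts with a digit, e.g. "tika-core-1.16.jar".
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def jar_versions(lib_dir):
    """Map artifact name -> version string for every jar in lib_dir."""
    versions = {}
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        match = JAR_NAME.match(jar.name)
        if match:
            versions[match.group("artifact")] = match.group("version")
        else:
            # The name does not fit the assumed pattern; keep it, version unknown.
            versions[jar.stem] = None
    return versions


def diff_libs(old_dir, new_dir):
    """Return {artifact: (old_version, new_version)} for every jar that differs.

    None means the jar is missing on that side or its version could not
    be parsed from the file name.
    """
    old, new = jar_versions(old_dir), jar_versions(new_dir)
    changes = {}
    for artifact in sorted(old.keys() | new.keys()):
        before, after = old.get(artifact), new.get(artifact)
        if before != after:
            changes[artifact] = (before, after)
    return changes
```

Calling diff_libs("solr-7.2.1/contrib/extraction/lib", "solr-7.4.0/contrib/extraction/lib") would then list the candidates to downgrade individually. The 7.2.1 path is assumed to mirror the 7.4.0 layout.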
In the meantime, I would like to know whether anybody else has experienced the same problems.

Kind regards,
Thomas