Hi,

We noticed a memory leak in a rather small setup: 40,000 metadata documents
with nearly as many files, each indexed with "literal.*" fields. While 7.2.1
brought some Tika issues (due to a beta version), the real problems started
with version 7.3.0 and are still unresolved in 7.4.0. Memory consumption is
through the roof: where 512 MB of heap used to be enough, 6 GB is now not
enough to index all files.
I have now tracked this down to the libraries in
solr-7.4.0/contrib/extraction/lib/. If I replace all of them with the libraries
shipped with 7.2.1, the problem disappears. As most files are PDF documents, I
tried updating PDFBox to 2.0.11 and Tika to 1.18, but that did not solve the
problem. Next I will try downgrading just these two libraries to 2.0.6 and
1.16 to see whether they are the source of the memory leak.

In the meantime, has anybody else experienced the same problem?

Kind regards,

Thomas
