Hello,
We are using the pdfbox-app-2.0.3.jar library in an application that
crawls a huge intranet, also reading PDF documents and extracting their
text content.
Over the past month we have been facing problems caused by out-of-memory
crashes of the JVM. We are running Java 1.8.0_65 on Linux with -Xms512 -Xmx1024.
The heap dump analysis reports: the class "java.lang.ref.Finalizer",
loaded by "<system class loader>", occupies 470.713.224 bytes (70,17%).
Among other things, the Memory Analyzer shows the following:
Class Name                                                                   | Shallow Heap | Retained Heap | Percentage
------------------------------------------------------------------------------------------------------------------------
class java.lang.ref.Finalizer @ 0xc0005768 System Class                      |           16 |   470.713.224 |     70,17%
|- java.lang.ref.Finalizer @ 0xed44ed10                                      |           40 |   470.713.192 |     70,17%
| |- java.lang.ref.Finalizer @ 0xed43c9b0                                    |           40 |   470.713.088 |     70,17%
| | |- java.lang.ref.Finalizer @ 0xed42b040                                  |           40 |   470.712.984 |     70,17%
| | | |- java.lang.ref.Finalizer @ 0xed419588                                |           40 |   470.712.880 |     70,17%
| | | | |- java.lang.ref.Finalizer @ 0xed407b10                              |           40 |   470.712.776 |     70,17%
| | | | | |- java.lang.ref.Finalizer @ 0xed3f6098                            |           40 |   470.712.672 |     70,17%
| | | | | | |- java.lang.ref.Finalizer @ 0xed3e4620                          |           40 |   470.712.568 |     70,17%
| | | | | | | |- java.lang.ref.Finalizer @ 0xed3d2b18                        |           40 |   470.712.464 |     70,17%
| | | | | | | | |- java.lang.ref.Finalizer @ 0xed3bda48                      |           40 |   470.712.360 |     70,17%
| | | | | | | | | |- java.lang.ref.Finalizer @ 0xed3abe48                    |           40 |   470.712.256 |     70,17%
| | | | | | | | | | |- java.lang.ref.Finalizer @ 0xed39a3d0                  |           40 |   470.712.152 |     70,17%
| | | | | | | | | | | |- java.lang.ref.Finalizer @ 0xed388798                |           40 |   470.712.048 |     70,17%
| | | | | | | | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed39a390 |           64 |            64 |      0,00%
| | | | | | | | | | | '- Total: 2 entries
| | | | | | | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3abe08   |           64 |            64 |      0,00%
| | | | | | | | | | '- Total: 2 entries
| | | | | | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3bda08     |           64 |            64 |      0,00%
| | | | | | | | | '- Total: 2 entries
| | | | | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3d2ad8       |           64 |            64 |      0,00%
| | | | | | | | '- Total: 2 entries
| | | | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3e45e0         |           64 |            64 |      0,00%
| | | | | | | '- Total: 2 entries
| | | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3f6058           |           64 |            64 |      0,00%
| | | | | | '- Total: 2 entries
| | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed407ad0             |           64 |            64 |      0,00%
| | | | | '- Total: 2 entries
| | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed419548               |           64 |            64 |      0,00%
| | | | '- Total: 2 entries
| | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed42b000                 |           64 |            64 |      0,00%
| | | '- Total: 2 entries
| | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed43c970                   |           64 |            64 |      0,00%
| | '- Total: 2 entries
| |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed44ecd0                     |           64 |            64 |      0,00%
| '- Total: 2 entries
|- java.lang.Object @ 0xc0005758                                             |           16 |            16 |      0,00%
'- Total: 2 entries
------------------------------------------------------------------------------------------------------------------------
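The tree above is a long chain of java.lang.ref.Finalizer entries, each one also holding an org.apache.pdfbox.io.ScratchFileBuffer. The JVM only creates such Finalizer entries for objects whose class has a finalize() method, and those objects cannot be reclaimed until the finalizer thread has run that method. A minimal sketch for watching (and, as a best-effort stopgap, draining) that backlog from inside the crawler relies only on the JDK's MemoryMXBean.getObjectPendingFinalizationCount() and System.runFinalization(). The class FinalizerBacklog, its methods and any threshold value are hypothetical names for illustration, not PDFBox API, and this is not a confirmed fix for the leak.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

/** Hypothetical helper (not part of PDFBox) for watching the JVM's finalization backlog. */
final class FinalizerBacklog {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    private FinalizerBacklog() {
    }

    /** Approximate number of objects that are waiting for their finalize() method to run. */
    static int pending() {
        return MEMORY.getObjectPendingFinalizationCount();
    }

    /**
     * Best-effort stopgap: if more than {@code threshold} objects are waiting for finalization,
     * ask the JVM to run pending finalizers before continuing. Returns true if it did so.
     */
    static boolean drainIfAbove(int threshold) {
        if (pending() <= threshold) {
            return false;
        }
        // When this call returns, the JVM has made a best effort to run outstanding finalizers.
        // Available on Java 8; deprecated for removal in newer JDKs (18+).
        System.runFinalization();
        return true;
    }
}
```

For example, a crawler could call FinalizerBacklog.drainIfAbove(10_000) after each processed document (the threshold is an arbitrary placeholder), and logging FinalizerBacklog.pending() periodically would show whether the backlog keeps growing before the crash.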
Excerpt from our code:
try {
    PDDocument doc = PDDocument.load(file);
    PDFTextStripper stripper = new PDFTextStripper();
    // ...
    textContent = stripper.getText(doc);
    doc.close();
    // ...
}
// (catch/finally block not shown in this excerpt)
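In the excerpt as shown, doc.close() is only reached when getText() returns normally; unless the code elided by "..." closes the document elsewhere, a PDF that makes extraction throw leaves its PDDocument open. A minimal, self-contained sketch of the try-with-resources pattern, which closes the document on every path, is below. PDDocument implements java.io.Closeable in PDFBox 2.0; to keep the example runnable without PDFBox on the classpath, FakeDocument and Extraction are hypothetical stand-ins, and the equivalent PDFBox call appears only in a comment.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-in for PDDocument, which implements Closeable in PDFBox 2.0. */
final class FakeDocument implements Closeable {
    /** Number of FakeDocument instances that have been opened but not yet closed. */
    static int openCount = 0;

    private final boolean damaged;

    FakeDocument(boolean damaged) {
        this.damaged = damaged;
        openCount++;
    }

    /** Simulates PDFTextStripper.getText(doc), which may throw on a damaged file. */
    String extractText() throws IOException {
        if (damaged) {
            throw new IOException("simulated damaged PDF");
        }
        return "text";
    }

    @Override
    public void close() {
        openCount--;
    }
}

final class Extraction {
    private Extraction() {
    }

    /** Returns the extracted text, or null if extraction failed; the document is closed either way. */
    static String extract(boolean damaged) {
        // Same shape with PDFBox 2.0:
        // try (PDDocument doc = PDDocument.load(file)) { return new PDFTextStripper().getText(doc); }
        try (FakeDocument doc = new FakeDocument(damaged)) {
            return doc.extractText();
        } catch (IOException e) {
            return null; // a real crawler would log the failure and move on to the next file
        }
    }
}
```

Whether this alone prevents the Finalizer build-up seen in the heap dump is untested; it only guarantees that close() runs even when extraction fails.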
I have seen that some similar bugs have been reported:
https://issues.apache.org/jira/browse/PDFBOX-3253
https://issues.apache.org/jira/browse/PDFBOX-3388
Nevertheless, do you have a quick fix or workaround for us?
Thanks
Tjard
---------------------------------------------------------------------
Deutsche Vermögensberatung Aktiengesellschaft DVAG
Münchener Straße 1
60329 Frankfurt am Main
Chairman of the Management Board: Andreas Pohl
Members of the Management Board: Dr. h.c. /HLU Udo Corts, Hans-Theo Franken,
Christian Glanz, Lars Knackstedt, Dr. Helge Lach, Robert Peil, Dr. Dirk Reiffenrath
Chairman of the Supervisory Board: Friedrich Bohl
Registered office: Frankfurt am Main
Commercial register: Frankfurt HRB 15511
VAT ID: DE 114 139 839
Licensing and supervisory authority under § 34c GewO: City of Frankfurt am Main,
Ordnungsamt, Kleyerstraße 86, 60326 Frankfurt am Main
Licensing and supervisory authority under § 34f GewO: IHK Frankfurt am Main,
Börsenplatz 4, 60313 Frankfurt am Main
Joint registration office for § 34d GewO and § 34f GewO:
Deutscher Industrie- und Handelskammertag (DIHK) e.V.
Breite Straße 29, 10178 Berlin, phone 0180 600585-0
(20 cents/call from the German landline network, at most 60 cents/call from
mobile networks)
www.vermittlerregister.info or www.vermittlerregister.org
Registration number under § 34d GewO: D-LYYB-BSPX5-17
Registration number under § 34f GewO: D-F-125-93J4-60
---------------------------------------------------------------------