On 17.07.2022 at 17:09, PGNet Dev wrote:
On 7/17/22 10:24 AM, Tilman Hausherr wrote:
That is in pdfbox, not in tika.
There's also a PDFParser.parse() in tika, which then calls
PDDocument.load(). However, I don't know whether it uses the
InputStream overload or the File overload. If it uses the File one,
check the length and content of that file (tika sometimes stores
streams in a temporary file).
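A minimal sketch of that check in plain Java (JDK only, no PDFBox or Tika calls): it reports a file's size and whether it begins with the `%PDF-` header that PDF files normally start with. The class and method names are hypothetical, and you would still have to find the path yourself (for example, the temporary file's path as seen in a debugger). A zero-length file or a missing header would point at the input rather than at the parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
public class PdfFileCheck {
    // PDF files normally begin with this header, e.g. "%PDF-1.7".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given leading bytes start with the PDF header. */
    static boolean startsWithPdfHeader(byte[] head) {
        return head.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, PDF_MAGIC.length), PDF_MAGIC);
    }

    /** Prints the file's size and whether it looks like a PDF; returns the size in bytes. */
    static long inspect(Path file) throws IOException {
        long length = Files.size(file);
        byte[] head = new byte[PDF_MAGIC.length];
        int read;
        try (InputStream in = Files.newInputStream(file)) {
            // Reads up to 5 bytes; fewer if the file is shorter (0 for an empty file).
            read = in.readNBytes(head, 0, head.length);
        }
        boolean pdf = startsWithPdfHeader(Arrays.copyOf(head, read));
        System.out.printf("%s: %d bytes, PDF header present: %b%n", file, length, pdf);
        return length;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            inspect(Path.of(arg));
        }
    }
}
```

Usage: `java PdfFileCheck /path/to/suspect.pdf` prints one line per file passed on the command line.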
I see the same results (i.e., nothing) with an explicit stop in
PDFParser.parse.
Re the failed build: remove the ossindex-maven-plugin segment from the
parent pom.xml. That plugin (or rather, the company behind it) has gone
crazy; we've partly disabled it in the current trunk.
No idea what specifically to do there.
Building 'main' with those partial disables, rather than '2.4.1',
also fails.
I'll add some logging in debug mode; maybe this will help in the
future. I still believe the error is on your side, but debugging would
help "prove" this.
https://issues.apache.org/jira/browse/TIKA-3819
This will show the filename and length, but only if logging is at DEBUG
level. The modified version will appear at
https://repository.apache.org/content/groups/snapshots/org/apache/tika/
in a few hours.
Tilman
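To illustrate what "only at DEBUG level" means in practice, here is a sketch that uses `java.util.logging` as a stand-in; Tika itself logs through SLF4J, and the actual TIKA-3819 change may look quite different. The message with the file name and length is built and emitted only when the debug-like `FINE` level is enabled. The class name, logger name and message text are all hypothetical.

```java
import java.nio.file.Path;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical class and logger name, for illustration only.
public class DebugFileLogging {
    private static final Logger LOG = Logger.getLogger("example.pdf.debug");

    /** Builds the debug message: file name and length in bytes. */
    static String describe(Path file, long length) {
        return "parsing file " + file.getFileName() + ", length " + length;
    }

    /** Logs at FINE only when that level is enabled; returns whether it logged. */
    static boolean logIfDebug(Path file, long length) {
        if (!LOG.isLoggable(Level.FINE)) {
            return false; // the message is never even built at INFO and above
        }
        LOG.fine(describe(file, length));
        return true;
    }

    public static void main(String[] args) {
        Path p = Path.of("/tmp/sample.pdf");
        System.out.println(logIfDebug(p, 1234)); // logger inherits the configured root level
        LOG.setLevel(Level.FINE);
        System.out.println(logIfDebug(p, 1234)); // logger now accepts FINE records
    }
}
```

Note that with the JDK's default logging configuration the console handler also filters at INFO, so actually seeing the message requires lowering the handler's level as well as the logger's.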