On 17.07.2022 at 17:09, PGNet Dev wrote:
On 7/17/22 10:24 AM, Tilman Hausherr wrote:
That is in pdfbox, not in tika.

There's also a PDFParser.parse() in tika, which then calls PDDocument.load(). However, I don't know whether this will use the InputStream overload or the one with File. If it uses the one with File, then check the length and content of the file (tika sometimes stores streams in a temporary file).
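
To illustrate the "check the length and content of the file" step, here is a minimal sketch using only the Java standard library. The class name PdfFileCheck and the method looksLikePdf are hypothetical and not part of tika or pdfbox; the check is deliberately strict (it expects "%PDF-" at offset 0), so a file that fails it is worth inspecting by hand rather than being treated as proven broken.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of tika or pdfbox.
final class PdfFileCheck {

    /** Prints length and leading bytes; returns true if the file is non-empty and starts with "%PDF-". */
    public static boolean looksLikePdf(Path file) throws IOException {
        long length = Files.size(file);
        byte[] head = new byte[5];
        int read;
        try (InputStream in = Files.newInputStream(file)) {
            // Reads up to 5 bytes; returns fewer only if the file is shorter.
            read = in.readNBytes(head, 0, head.length);
        }
        String header = new String(head, 0, read, StandardCharsets.US_ASCII);
        System.out.println(file + ": length=" + length + ", header=\"" + header + "\"");
        return length > 0 && header.equals("%PDF-");
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java PdfFileCheck <file>...");
            return;
        }
        for (String arg : args) {
            System.out.println(looksLikePdf(Paths.get(arg)) ? "  looks like a PDF" : "  does NOT look like a PDF");
        }
    }
}
```

Running this against the temporary file (or the original input) shows quickly whether an empty or truncated file is reaching the parser.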

i see the same results -- i.e., nada -- with explicit stop in PDFParser.parse

Re the failed build: remove the segment with ossindex-maven-plugin from the parent pom.xml. That plugin (or rather, the company behind it) has gone crazy; we've partly disabled it in the current trunk.

no idea what specifically to do there.

trying to build 'main' with those partial disables, rather than '2.4.1', also fails.


I'll add some logging in debug mode; maybe this will help in the future. I still believe the error is on your side, but debugging would help "prove" this.

https://issues.apache.org/jira/browse/TIKA-3819

This will show the filename and length, but only if logging is at DEBUG level. The modified version will appear at

https://repository.apache.org/content/groups/snapshots/org/apache/tika/

in a few hours.

Tilman
