On 16.07.2022 at 18:43, PGNet Dev wrote:
> i don't get any more useful info on failure, --> https://pastebin.com/raw/DsrLxbeg
You didn't get the exception I mentioned, so set a breakpoint at parse() and check the value of fileLen. The current error messages suggest that bytes have been changed or lost.
IIRC Tika saves the PDF to a file in the temp directory before parsing. Maybe look there while the parse is running and compare that file's length and content with your own copy.
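If comparing by hand is tedious, a minimal Java sketch along these lines could do it. The class name PdfCompare and the method firstMismatch are hypothetical helpers, not part of Tika or PDFBox, and the two paths are whatever your original PDF and the temp copy turn out to be:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: compares an original PDF with the copy found in the temp directory.
public class PdfCompare {

    /**
     * Returns the offset of the first differing byte, or -1 if both files
     * are byte-for-byte identical. If one file is a prefix of the other,
     * the length of the shorter file is returned.
     */
    public static long firstMismatch(Path a, Path b) throws IOException {
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            long offset = 0;
            while (true) {
                int byteA = inA.read();
                int byteB = inB.read();
                if (byteA != byteB) {
                    return offset;   // differing byte, or one file ended early
                }
                if (byteA == -1) {
                    return -1;       // both files ended at the same time: identical
                }
                offset++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PdfCompare <original.pdf> <temp-copy>");
            return;
        }
        Path original = Paths.get(args[0]);
        Path tempCopy = Paths.get(args[1]);

        // A length difference points to lost or added bytes.
        System.out.println("original length:  " + Files.size(original));
        System.out.println("temp copy length: " + Files.size(tempCopy));

        // A mismatch with equal lengths points to changed bytes.
        long offset = firstMismatch(original, tempCopy);
        System.out.println(offset < 0
                ? "files are identical"
                : "first difference at byte offset " + offset);
    }
}
```

If the lengths already differ, bytes were lost or added somewhere before parsing. If the lengths match but an offset is reported, the content was changed in transit.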
Tilman
