I'm also here 😂

You can also put a breakpoint in PDFBox, at
org.apache.pdfbox.pdfparser.PDFParser.parse()
and when it stops there (it definitely passes that point!), look in your /tmp directory for the file mentioned in the Tika debug output and copy it somewhere else.
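While the JVM is paused at that breakpoint, an ordinary cp of the file is enough. In jdb, the breakpoint itself would be set with "stop in org.apache.pdfbox.pdfparser.PDFParser.parse". For anyone who'd rather do the copy in Java, here is a minimal sketch. It assumes JDK 11+, which the OpenJDK 18 mentioned below satisfies. The class CopyTikaTempFile and its methods are hypothetical names made up for illustration and are not part of Tika or PDFBox. The "%PDF-" header check is only a rough heuristic, but it can hint whether the attachment reached Tika mangled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical helper (not part of Tika or PDFBox): run it while the JVM
// running tika-server is paused at the PDFParser.parse() breakpoint, so the
// temp file still exists and can be preserved before it is cleaned up.
final class CopyTikaTempFile {

    // Copies source into destDir under the same file name; returns the copy's path.
    static Path preserve(Path source, Path destDir) throws IOException {
        Files.createDirectories(destDir);
        Path target = destDir.resolve(source.getFileName());
        // REPLACE_EXISTING: rerunning overwrites an older capture.
        // COPY_ATTRIBUTES: keeps the original timestamps for comparison.
        return Files.copy(source, target,
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.COPY_ATTRIBUTES);
    }

    // Rough heuristic: PDF files normally begin with the "%PDF-" header.
    static boolean looksLikePdf(Path file) throws IOException {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = Files.newInputStream(file)) {
            // readNBytes returns fewer bytes for short files, so equals() is then false
            return Arrays.equals(in.readNBytes(magic.length), magic);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CopyTikaTempFile.java <temp-file> <dest-dir>");
            System.exit(2);
        }
        Path copy = preserve(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("copied to " + copy
                + " (PDF header present: " + looksLikePdf(copy) + ")");
    }
}
```

Usage, with the real path taken from the Tika debug output in place of the placeholder: java CopyTikaTempFile.java <temp-file-from-debug-output> ~/captured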

Tilman

On 20.07.2022 at 00:45, PGNet Dev wrote:
hi,

i'm debugging a problem with email attachment scanning by tika-server.

dovecot imap server receives email+attachment, then hands off the attachment (modified or unmodified, dunno yet) to tika-server via its 'fts-tika' plugin.

with

    dovecot 2.3.19.1
    tika 2.4.2-snapshot
    openjdk version "18.0.1" 2022-04-19

this used to work with earlier versions (haven't bisected the problem yet).

with that^ version mix, it's failing.

it appears to be failing at or near PDFParser.

i've been trying to debug in this thread,

    https://lists.apache.org/thread/pztsq8tb8xqz3s4kmjpnt9p3zt07y05k

but have hit a current (temporary?) impasse.

on both the Tika & Dovecot mailing lists, it's been suggested to capture the /tmp file at the point of failure.

to do so, i've (per instruction) set a jdb breakpoint at

    org.apache.tika.parser.pdf.PDFParser

but on exec, the errant file isn't persisted.

one suggestion as to why not is,

    "If the debugger didn't stop, then the breakpoint was at the wrong place. Or it's not possible to debug."

seems *really* odd that it can't be debugged ... thought best to ask _here_ first.

Q:

    IS it possible to debug?

    what's the RIGHT breakpoint to set to make sure to halt, & catch the tmp file?


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org
