On 22.08.2025 at 14:38, Tim Allison wrote:
Unfortunately, there's no way via configuration to tell Tika to avoid
parsing XFA.

I've been trying to research this, but I somehow messed up my IDE while working on TIKA-4470, so I can't test properly right now. I was wondering whether disabling AcroForm extraction (setExtractAcroFormContent(false)) would work, although we'd lose the classic form content as well, or whether we could exclude the XMP parser. (There are two occurrences of XFA usage in AbstractPDF2XHTML.java.)

Another solution would be to check PDF files with PDFBox (easy), and also check for attachments (less easy because there are two types of attachments).
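
As a rough illustration of the pre-check idea only: the sketch below is not the PDFBox-based check, but a dependency-free stand-in that scans the raw file bytes for the `/XFA` name. The class and method names (`XfaPrecheck`, `mentionsXfa`) are hypothetical. It assumes the AcroForm dictionary is not stored in a compressed object stream. If it is, the scan misses it, and text that merely contains `/XFA` triggers a false hit, so a dependable check would still go through PDFBox (for example `PDAcroForm.hasXfa()`).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a crude byte-level pre-filter, not a PDF parser.
final class XfaPrecheck {
    private static final byte[] KEY = "/XFA".getBytes(StandardCharsets.US_ASCII);

    // True if the bytes contain the name token /XFA, i.e. "/XFA" followed
    // by the end of the data or by a character that terminates a PDF name.
    static boolean mentionsXfa(byte[] pdf) {
        outer:
        for (int i = 0; i + KEY.length <= pdf.length; i++) {
            for (int j = 0; j < KEY.length; j++) {
                if (pdf[i + j] != KEY[j]) {
                    continue outer;
                }
            }
            int next = i + KEY.length;
            if (next == pdf.length || endsName(pdf[next])) {
                return true;
            }
        }
        return false;
    }

    // Treats space, control characters, and PDF delimiters as name terminators.
    private static boolean endsName(byte b) {
        int c = b & 0xFF; // avoid sign extension for bytes >= 0x80
        return c <= 0x20 || "()<>[]{}/%".indexOf(c) >= 0;
    }
}
```

One way to use it would be `XfaPrecheck.mentionsXfa(Files.readAllBytes(path))`, treating a hit as "possibly XFA, inspect further" rather than as a final answer.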

Tilman
