Hello,

Thanks for all the details. We tried testPDF_XFA_govdocs1_258578.pdf and confirmed that the XFA part is parsed and indexed on our side. Ideally, however, we would like not to lose the indexing of the XFA part, and we still doubt that the XXE affects us. One reason is that when we tried to build a customized version of Tika 2.x with the patches applied, we got errors suggesting that Woodstox is involved in parsing the XML, and Woodstox apparently claims not to be affected by this kind of XXE.

The problem is that we are still trying to craft a PDF that exposes the XXE, and so far we have failed to do so. I know it's a bit sensitive, but would it be possible to send us such a PDF, probably not publicly on the mailing list but perhaps directly to secur...@xwiki.org?

Thanks again for all the information and the work here,

Simon

On 23/08/2025 at 05:17, Tilman Hausherr wrote:
To check that it works, test with the file testPDF_XFA_govdocs1_258578.pdf from the tika source code. If "Abraham Lincoln" is part of the output then it didn't work.

Tilman
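
The check described above can be scripted. What follows is only a minimal sketch, not an official Tika test. It assumes a Tika app jar is available locally (the jar path is passed as an argument and is hypothetical) and that the jar's `--text` option is used to get plain-text output. The class name `XfaCheck` and its methods are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class XfaCheck {
    // Marker string from the message above: if it shows up in the output,
    // the XFA part of the test PDF was still extracted.
    static final String MARKER = "Abraham Lincoln";

    // Returns true if the extracted text still contains the XFA marker.
    static boolean xfaWasExtracted(String extractedText) {
        return extractedText.contains(MARKER);
    }

    // Assumption: runs a locally available Tika app jar in plain-text mode
    // and returns whatever it writes to standard output.
    static String extractText(Path tikaAppJar, Path pdf)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(
                "java", "-jar", tikaAppJar.toString(), "--text", pdf.toString())
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        p.waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java XfaCheck <tika-app.jar> <test.pdf>");
            System.exit(2);
        }
        String text = extractText(Path.of(args[0]), Path.of(args[1]));
        System.out.println(xfaWasExtracted(text)
                ? "XFA content is still extracted"
                : "XFA content is not extracted");
    }
}
```

Run it with the path to the jar and to testPDF_XFA_govdocs1_258578.pdf. In a setup where XFA parsing has been disabled, the second message is the expected one.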
