Hello,
Thanks for all the details. We tried
testPDF_XFA_govdocs1_258578.pdf and confirmed that the XFA part is
parsed and indexed on our side. Ideally, however, we would like not to
lose the indexing of the XFA part, and we still doubt that the XXE
affects us. This is partly because we tried to build a customized
version of Tika 2.x with the patches applied, and the errors we got
suggest that Woodstox is involved in parsing the XML; Woodstox
apparently claims not to be affected by this kind of XXE.
The problem is that we are still trying to craft a PDF that triggers
the XXE, and so far we are failing to do so. I know this is a bit
sensitive, but would it be possible to send us such a PDF, probably not
publicly on the mailing list but directly to
secur...@xwiki.org?
Thanks again for all the info and work here,
Simon
On 23/08/2025 at 05:17, Tilman Hausherr wrote:
To check that it works, test with the file
testPDF_XFA_govdocs1_258578.pdf from the tika source code. If "Abraham
Lincoln" is part of the output then it didn't work.
Tilman
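
The check described above can be scripted. What follows is only a
minimal sketch in Python, not part of Tika or XWiki: the function name
xfa_was_extracted is hypothetical, and it assumes you already have the
extracted text of testPDF_XFA_govdocs1_258578.pdf as a string, however
your setup produces it. It also assumes that "Abraham Lincoln" only
shows up when the XFA part has been parsed, which is how we read the
message above.

```python
# Marker taken from the message above; assumed to appear in the output
# only when the XFA part of the test PDF has been extracted.
MARKER = "Abraham Lincoln"


def xfa_was_extracted(extracted_text: str, marker: str = MARKER) -> bool:
    """Return True if the marker appears in the extracted text."""
    # Collapse runs of whitespace so that a line break between the two
    # words does not hide the marker.
    normalized = " ".join(extracted_text.split())
    return marker in normalized
```

A True result means the XFA content is still being extracted, a False
result means it is not. The function only inspects text, it does not
run the parser itself.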