Simon,
  Sorry for the delay. I'm hesitant to share the triggering PDF, even
off-list.

  I just added unit tests that confirm the fix for StAX processing:
https://github.com/apache/tika/pull/2318. The StAX tests failed before
the fix. Would those be of any use to you?

  Also, I can confirm that, before the fix, I was able to trigger
Jazzer's XXE/SSRF sanitizer using a custom PDFParser harness against our
2.x code. The vulnerability was real.
  I'm sorry that I can't help more on this.

      Best,

             Tim

On Wed, Aug 27, 2025 at 5:44 AM Simon Urli <simon.u...@xwiki.com> wrote:
>
> Hello,
>
> Thanks for all the details. We tried using
> testPDF_XFA_govdocs1_258578.pdf and confirmed that the XFA part is
> parsed and indexed on our side. However, ideally we'd like not to lose
> the indexing of the XFA part, and we still doubt that the XXE actually
> affects us. This is partly because we tried to build a customized
> version of Tika 2.x with the patches applied, and the errors we got
> suggest that Woodstox is used to parse the XML; Woodstox apparently
> claims not to be affected by this kind of XXE.
>
> The problem is that we're still trying to craft a PDF that exposes the
> XXE, and so far we haven't managed to. I know it's a bit sensitive, but
> would it be possible to send us such a PDF? Not publicly on the mailing
> list, but perhaps directly to secur...@xwiki.org?
>
> Thanks again for all info and work here,
>
> Simon
>
> On 23/08/2025 at 05:17, Tilman Hausherr wrote:
> > To check that it works, test with the file
> > testPDF_XFA_govdocs1_258578.pdf from the Tika source code. If "Abraham
> > Lincoln" appears in the output, then it didn't work.
> >
> > Tilman
> >