> We've managed to produce a triggering PDF ourselves
W00t! I should have mentioned that there will be some data published
soon that will have a triggering PDF for those still interested.

> This is because Woodstox actually uses the IGNORING_STAX_ENTITY_RESOLVER, it
> supports the String return type and also wouldn't ignore it even if the
> return type wasn't supported as long as the return value isn't null.
<face_palm/>

> Unfortunately, we also noticed that the fix in Tika breaks parsing PDFs
> with XFA with Woodstox as Woodstox doesn't support the
> XMLConstants.ACCESS_EXTERNAL_DTD property

Some days it feels like there is just no winning. This is directed at
XML parsers in the Java ecosystem... not you!

This means that tika-server and other modules that use Woodstox
probably aren't vulnerable. However, I just confirmed that tika-server
throws an exception on PDFs with XFA:
https://issues.apache.org/jira/browse/TIKA-4482

> Would it be possible to fix this...
Yes, we can do so in 3.x and main. We can also cherry-pick into 2.x if
you're building your own?
https://issues.apache.org/jira/browse/TIKA-4482
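
For illustration only, here is a minimal sketch of the tolerant approach
suggested in the quoted message: treat XMLInputFactory.SUPPORT_DTD as
mandatory and skip properties that the StAX implementation rejects. The
class and helper names (StaxHardeningSketch, trySetProperty,
newHardenedFactory) are hypothetical, and this is an assumption about how
such a fix could look, not the actual Tika change.

```java
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;

final class StaxHardeningSketch {

    // Hypothetical helper: sets a property and reports whether the StAX
    // implementation accepted it. XMLInputFactory.setProperty throws
    // IllegalArgumentException for properties it does not support.
    static boolean trySetProperty(XMLInputFactory factory, String key, Object value) {
        try {
            factory.setProperty(key, value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Hypothetical factory method: fails only if DTD support cannot be
    // disabled; every other hardening property is best effort.
    static XMLInputFactory newHardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        if (!trySetProperty(factory, XMLInputFactory.SUPPORT_DTD, Boolean.FALSE)) {
            throw new IllegalStateException("Could not disable DTD support");
        }
        trySetProperty(factory, XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        // Reported in this thread as rejected by Woodstox, so a failure
        // here is tolerated instead of propagated.
        trySetProperty(factory, XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return factory;
    }
}
```

Failing hard only on SUPPORT_DTD follows the expectation stated in the
quoted message; whether that is sufficient protection for every StAX
implementation is not something this sketch verifies.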

On Wed, Sep 10, 2025 at 7:29 AM Michael Hamann <michael.ham...@xwiki.com> wrote:
>
> Hi Tim,
>
> as Simon is currently not available, I took over the handling of this on
> the XWiki side.
>
> On 2025/09/08 14:15:41 Tim Allison wrote:
> > Simon,
> >   I'm sorry for my delay. I'm hesitant to share the triggering PDF
> > even offline.
> >
> >   I just added unit tests that confirm the fix for StAX processing:
> > https://github.com/apache/tika/pull/2318 . Will that be of any use to
> > you? The stax tests failed before the fix.
>
> We've managed to produce a triggering PDF ourselves that exposes both
> URL and file contents when extracting its text with Tika, so no need to
> share anything. What we found with this PDF and also with the linked
> test (thank you very much!) is that the vulnerability doesn't reproduce
> with Woodstox as Stax XML API implementation. This is because Woodstox
> actually uses the IGNORING_STAX_ENTITY_RESOLVER, it supports the String
> return type and also wouldn't ignore it even if the return type wasn't
> supported as long as the return value isn't null. The corresponding code
> in Woodstox is
> https://github.com/FasterXML/woodstox/blob/bfde796d30f074e51960cc681e8ab478bcbbedd3/src/main/java/com/ctc/wstx/io/DefaultInputResolver.java#L150-L158
>
> So for now, based on this analysis, we assume that XWiki (which uses
> Woodstox) isn't affected by CVE-2025-54988.
>
> Unfortunately, we also noticed that the fix in Tika breaks parsing PDFs
> with XFA with Woodstox as Woodstox doesn't support the
> XMLConstants.ACCESS_EXTERNAL_DTD property - see also
> https://github.com/FasterXML/woodstox/issues/162.
>
> This can be reproduced with the mentioned unit tests, they fail with
>
> java.lang.IllegalArgumentException: Unrecognized property
> 'http://javax.xml.XMLConstants/property/accessExternalDTD'
>
> when Woodstox is added as a dependency.
>
> Would it be possible to fix this, e.g., by catching this exception as it
> is the case for all other properties? From what I understand from
> https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html#xmlinputfactory-a-stax-parser,
> XMLInputFactory.SUPPORT_DTD should be sufficient as protection, so I
> think Tika shouldn't fail if XMLConstants.ACCESS_EXTERNAL_DTD isn't
> supported - I would have rather expected it to fail if setting
> XMLInputFactory.SUPPORT_DTD didn't work.
>
> Thank you very much!
>
> Michael
