> We've managed to produce a triggering PDF ourselves

W00t! I should have mentioned that there will be some data published soon that will have a triggering PDF for those still interested.
> This is because Woodstox actually uses the IGNORING_STAX_ENTITY_RESOLVER, it
> supports the String return type and also wouldn't ignore it even if the return type wasn't supported as long as the return value isn't null.

<face_palm/>

> Unfortunately, we also noticed that the fix in Tika breaks parsing PDFs with XFA with Woodstox as Woodstox doesn't support the XMLConstants.ACCESS_EXTERNAL_DTD property

Some days it feels like there is just no winning. This is directed at XML parsers in the Java ecosystem... not you!

This means that tika-server and other modules that use Woodstox probably aren't vulnerable. However, I just confirmed that tika-server throws an exception on PDFs with XFA: https://issues.apache.org/jira/browse/TIKA-4482

> Would it be possible to fix this...

Yes, we can do so in 3.x and main. We can also cherry-pick into 2.x if you're building your own???

https://issues.apache.org/jira/browse/TIKA-4482

On Wed, Sep 10, 2025 at 7:29 AM Michael Hamann <michael.ham...@xwiki.com> wrote:
>
> Hi Tim,
>
> as Simon is currently not available, I took over the handling of this on
> the XWiki side.
>
> On 2025/09/08 14:15:41 Tim Allison wrote:
> > Simon,
> >
> > I'm sorry for my delay. I'm hesitant to share the triggering PDF
> > even offline.
> >
> > I just added unit tests that confirm the fix for StAX processing:
> > https://github.com/apache/tika/pull/2318 . Will that be of any use to
> > you? The stax tests failed before the fix.
>
> We've managed to produce a triggering PDF ourselves that exposes both
> URL and file contents when extracting its text with Tika, so no need to
> share anything. What we found with this PDF and also with the linked
> test (thank you very much!) is that the vulnerability doesn't reproduce
> with Woodstox as Stax XML API implementation.
> This is because Woodstox actually uses the IGNORING_STAX_ENTITY_RESOLVER, it supports the String
> return type and also wouldn't ignore it even if the return type wasn't
> supported as long as the return value isn't null. The corresponding code
> in Woodstox is
> https://github.com/FasterXML/woodstox/blob/bfde796d30f074e51960cc681e8ab478bcbbedd3/src/main/java/com/ctc/wstx/io/DefaultInputResolver.java#L150-L158
>
> So for now, based on this analysis, we assume that XWiki (which uses
> Woodstox) isn't affected by CVE-2025-54988.
>
> Unfortunately, we also noticed that the fix in Tika breaks parsing PDFs
> with XFA with Woodstox as Woodstox doesn't support the
> XMLConstants.ACCESS_EXTERNAL_DTD property - see also
> https://github.com/FasterXML/woodstox/issues/162.
>
> This can be reproduced with the mentioned unit tests, they fail with
>
> java.lang.IllegalArgumentException: Unrecognized property
> 'http://javax.xml.XMLConstants/property/accessExternalDTD'
>
> when Woodstox is added as a dependency.
>
> Would it be possible to fix this, e.g., by catching this exception as it
> is the case for all other properties? From what I understand from
> https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html#xmlinputfactory-a-stax-parser,
> XMLInputFactory.SUPPORT_DTD should be sufficient as protection, so I
> think Tika shouldn't fail if XMLConstants.ACCESS_EXTERNAL_DTD isn't
> supported - I would have rather expected it to fail if setting
> XMLInputFactory.SUPPORT_DTD didn't work.
>
> Thank you very much!
>
> Michael
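For anyone following along, a minimal Java sketch of the change Michael proposes, under stated assumptions: XMLInputFactory.SUPPORT_DTD is treated as mandatory (if setProperty throws IllegalArgumentException, no factory is returned), while IS_SUPPORTING_EXTERNAL_ENTITIES and XMLConstants.ACCESS_EXTERNAL_DTD are best effort, so Woodstox's "Unrecognized property" error is swallowed. The names HardenedStax, newFactory, trySet and EMPTY_RESOLVER are hypothetical; this is not Tika's actual code. The resolver returns an empty InputStream rather than a String, because the javax.xml.stream.XMLResolver contract lists InputStream, XMLStreamReader and XMLEventReader as return types (and treats null as "use the default mechanism"), so an implementation is not required to honor a String.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of Tika.
public final class HardenedStax {

    // Never fetches anything: returns an empty InputStream, one of the return
    // types the XMLResolver contract lists. Returning null would let the
    // parser fall back to its default (network/file) resolution.
    static final XMLResolver EMPTY_RESOLVER =
            (publicId, systemId, baseUri, namespace) -> new ByteArrayInputStream(new byte[0]);

    private HardenedStax() {
    }

    // Builds a StAX factory with DTD processing disabled.
    public static XMLInputFactory newFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Mandatory: an IllegalArgumentException here propagates to the caller,
        // because without this setting the factory is not considered safe.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        // Best effort (defense in depth): some implementations, e.g. Woodstox,
        // reject ACCESS_EXTERNAL_DTD as an unrecognized property.
        trySet(factory, XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        trySet(factory, XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setXMLResolver(EMPTY_RESOLVER);
        return factory;
    }

    // Sets a property, ignoring IllegalArgumentException for unsupported names.
    static void trySet(XMLInputFactory factory, String name, Object value) {
        try {
            factory.setProperty(name, value);
        } catch (IllegalArgumentException e) {
            // Not supported by this StAX implementation; skip it.
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        XMLStreamReader reader =
                newFactory().createXMLStreamReader(new StringReader("<root>ok</root>"));
        reader.nextTag(); // advance from START_DOCUMENT to <root>
        System.out.println(reader.getElementText());
        reader.close();
    }
}
```

Whether best effort is acceptable for ACCESS_EXTERNAL_DTD rests on the OWASP cheat sheet guidance cited in the thread, i.e. that disabling SUPPORT_DTD is sufficient protection on its own.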