Since we made the announcement about CVE-2025-54988, we've learned
that there are some mitigations available if you need to stay with the
Tika 2.x branch. We regret that 2.x reached EOL in April 2025 (see:
https://tika.apache.org/), and the Tika project has no plans to make
another 2.x release.

Many thanks to Simon and Michael at XWiki for their research and
collaboration on these mitigations. We would not have known about the
woodstox option without their investigation.

Any of these mitigations should work, but you'll need to test them in
your own environment, with whatever dependencies you have on your
classpath and with your version and implementation of Java. We've
included XXE and billion laughs examples in:

https://github.com/apache/tika/blob/main/tika-core/src/test/java/org/apache/tika/utils/XMLReaderUtilsTest.java
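
For a quick check outside of Tika, something like the sketch below can
show whether a StAX factory in your environment expands an external
entity. This is not the Tika test linked above, only a minimal
illustration: the class name XxeSelfCheck, the marker string, and the
hardenedFactory() helper are made up for this example, and it uses only
the standard StAX properties rather than Tika's own XMLReaderUtils
settings. To verify Tika itself, run an XXE sample through your actual
Tika setup.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XxeSelfCheck {

    /** Builds a classic XXE payload that tries to pull in the given file. */
    public static String xxePayload(Path target) {
        return "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE doc [<!ENTITY xxe SYSTEM \"" + target.toUri() + "\">]>\n"
            + "<doc>&xxe;</doc>";
    }

    /** Returns true if the marker text shows up in the parsed character data. */
    public static boolean leaks(XMLInputFactory factory, String xml, String marker) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
        } catch (XMLStreamException e) {
            // A parser that rejects the document outright has not leaked anything
            // beyond what was already collected before the failure.
        }
        return text.indexOf(marker) >= 0;
    }

    /** Hypothetical helper: a factory hardened with standard StAX properties only. */
    public static XMLInputFactory hardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        return factory;
    }

    public static void main(String[] args) throws Exception {
        String marker = "XXE-MARKER-12345";
        Path secret = Files.createTempFile("xxe-marker", ".txt");
        Files.write(secret, marker.getBytes(StandardCharsets.UTF_8));
        try {
            boolean leaked = leaks(hardenedFactory(), xxePayload(secret), marker);
            System.out.println(leaked
                ? "VULNERABLE: external entity was expanded"
                : "OK: external entity was not expanded");
        } finally {
            Files.deleteIfExists(secret);
        }
    }
}
```

Run it with the same classpath and JVM you use in production, since the
factory returned by XMLInputFactory.newInstance() depends on both.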

Potential mitigations, from easier to harder:

1) Add woodstox to your classpath [0]. Woodstox accepts the string
return value from the IGNORING_STAX_ENTITY_RESOLVER, which Java's
built-in StAX implementation silently did not. Note that woodstox is
already on the classpath with tika-server.

2) Turn off XFA parsing in PDFs. See [1] at the bottom for how to
configure this.

3) Build your own tika-2.x from our branch_2x branch. We've added the
fixes in that branch.

4) Upgrade to Tika 3.2.3 when it is released. Tika 3.2.2 fixed this
vulnerability, but Tika 3.2.3 contains an important bugfix for those
using tika-server and for those with woodstox on their classpath (see:
https://issues.apache.org/jira/browse/TIKA-4482).

[0]
<dependency>
  <groupId>com.fasterxml.woodstox</groupId>
  <artifactId>woodstox-core</artifactId>
  <version>7.1.1</version>
</dependency>
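
After adding the dependency in [0], it can be worth confirming that
woodstox is actually the StAX implementation your JVM picks up. The
sketch below is one way to do that. It assumes the code in question
obtains its factory through the standard XMLInputFactory lookup; the
class name StaxImplCheck is made up for this example. Woodstox's
factory lives in the com.ctc.wstx package.

```java
import javax.xml.stream.XMLInputFactory;

public class StaxImplCheck {

    /** Returns the class name of the StAX input factory the standard lookup resolves to. */
    public static String inputFactoryClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    /** True if the given factory class comes from Woodstox (package com.ctc.wstx). */
    public static boolean isWoodstox(String className) {
        return className.startsWith("com.ctc.wstx.");
    }

    public static void main(String[] args) {
        String name = inputFactoryClassName();
        System.out.println("XMLInputFactory implementation: " + name);
        System.out.println(isWoodstox(name)
            ? "Woodstox is in use"
            : "Woodstox is NOT in use");
    }
}
```

If another dependency or a javax.xml.stream.XMLInputFactory system
property selects a different implementation, this check will show it.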

[1]

<properties>
  <service-loader initializableProblemHandler="throw"/>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <!-- this is the one that matters -->
        <param name="extractAcroFormContent" type="bool">false</param>
        <!-- this can override the above. make absolutely sure this is false -->
        <param name="ifXFAExtractOnlyXFA" type="bool">false</param>
      </params>
    </parser>
  </parsers>
</properties>
