As you read, the vulnerability is an XXE via a PDF with a crafted XFA (XML) file embedded.
Generally speaking, the worst-case scenario is that Tika is running as a user with a high level of permissions, parsing untrusted files and returning the results to an attacker. So, maybe a jobs site runs Tika as root, parses a resume submitted by an attacker and shows the results to the attacker, er, job applicant. The exploit is that the XFA may contain an external entity that would read the contents of e.g. "/etc/passwd" or pull content from "https://our-local-sharepoint.com/super-secret.html" and return that to the attacker as "this is the text we extracted from your resume:xyz". Data exfiltration.

A not-great scenario is that the attacker drops a million such PDFs into your site, and you now have a million network calls to an internal http site on your network or even a public site. This is a denial of service.

The minimal fix is in this commit: https://github.com/apache/tika/commit/bfee6d5569fe9197c4ea947a96e212825184ca33

I made some slight updates here: https://github.com/apache/tika/commit/fd2016ffe4a892c06da097b50deeecf8c9d5813a

The root cause of the vuln is that I thought that our IGNORING_STAX_ENTITY_RESOLVER was preventing calls to external entities. However, it returned a String, which was not the correct object type, and Java was silently ignoring that problem and backing off to default behavior, which allows external entities.

Unfortunately, there's no way via configuration to tell Tika to avoid parsing XFA.

One solution would be to refactor your code to use tika-server, which would put all the dependencies into a separate JVM, and you wouldn't have jar hell with Jakarta etc. That's a heavy lift, I realize.

2.x is EOL, and I'd really personally rather not make another release, but I can see from your note that there is a need.
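
To make the resolver problem described above a bit more concrete, here is a minimal sketch of StAX hardening using only the JDK's javax.xml.stream API. This is not the actual Tika patch (see the commits above for that); the class and member names (StaxXxeSketch, EMPTY_RESOLVER, hardenedFactory, extractText) are made up for illustration. The point is that the resolver returns an InputStream, one of the return types XMLResolver documents, instead of a String, and that DTD and external entity support are switched off as well.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative sketch only; all names here are hypothetical, not Tika's.
public class StaxXxeSketch {

    // Resolves every entity to an empty stream. XMLResolver.resolveEntity may
    // return an XMLStreamReader, XMLEventReader, or InputStream; a String is
    // not one of the documented return types.
    static final XMLResolver EMPTY_RESOLVER =
            (publicId, systemId, baseUri, namespace) -> new ByteArrayInputStream(new byte[0]);

    // Builds a factory with DTDs and external entities disabled, plus the
    // empty resolver as a second line of defense.
    static XMLInputFactory hardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        factory.setXMLResolver(EMPTY_RESOLVER);
        return factory;
    }

    // Concatenates all character data in the document, roughly what a text
    // extractor would hand back to the caller.
    static String extractText(String xml) throws XMLStreamException {
        XMLStreamReader reader = hardenedFactory().createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }
}
```

Calling extractText on a document that declares an entity with a SYSTEM identifier pointing at a local file should, with this configuration, either throw an XMLStreamException or return text without the file's contents, depending on the StAX implementation on the classpath. Either way, the file contents should not show up in the extracted text.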
My major concern with a 2.x release is that a number of other dependencies probably now have vulns in their JDK 8 versions, and figuring out which of them we can update within the JDK 8 limitations could take a lot of time.

Fellow devs, what do you think?

Best,
Tim

On Fri, Aug 22, 2025 at 5:09 AM Simon Urli <simon.u...@xwiki.com> wrote:
>
> Hello,
>
> I'm one of the core contributors of the XWiki platform
> (https://www.xwiki.org) which relies on Tika.
>
> We got informed this morning through our automated checks about the
> publication of CVE-2025-54988. We still haven't managed to finish our
> migration to Tika 3.x because of the complex migration to Jakarta of all
> the subsequent dependencies (see
> https://jira.xwiki.org/browse/XWIKI-22595) meaning that we depend on
> Tika 2.x which is affected by the CVE, apparently without any easy
> workaround and without plan for releasing a bug fix if I understand
> correctly what's been announced regarding the 2.x EOL.
>
> So at this point we're trying to understand how much we're possibly
> affected by this CVE: we're currently using the tika-parser-pdf-module
> mainly in that class:
> https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AbstractSolrMetadataExtractor.java#L520-L543,
> where we use it to perform indexing of PDF documents.
>
> I've tried to look in the recent commits in
> https://github.com/apache/tika/commits/3.2.2/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module
> to understand a bit better the vulnerability but I'm failing to see it,
> and I haven't found any more information in JIRA when browsing the
> tickets fixed in 3.2.2.
>
> So would that be possible to get more information about this
> vulnerability, like a possible scenario of an exploit so that we can
> check quickly if we're impacted or not?
>
> Thanks,
>
> Simon Urli.