Hi,
thanks a lot for the fast answer and the details.
On 22/08/2025 at 14:38, Tim Allison wrote:
As you read, the vulnerability is an XXE via a PDF with a crafted XFA
(xml) file embedded.
Generally speaking, the worst case scenario is that a user is running
Tika as a user with a high level of permissions, parsing untrusted
files and returning the results to an attacker. So, maybe a jobs site
runs Tika as root, parses a resume submitted by an attacker and shows
the results to the attacker, er, job applicant. The exploit is that
the XFA may contain an external entity that would read the contents of
e.g. "/etc/passwd" or pull content from
"https://our-local-sharepoint.com/super-secret.html" and return that
to the attacker as "this is the text we extracted from your
resume:xyz". Data exfiltration.
OK, in our case XWiki should never be running as a user with such a
level of permissions, but it could nevertheless allow an attacker to
access some configuration files of XWiki itself that shouldn't be
readable by any user.
A not great scenario is that the attacker drops a million such PDFs
into your site, and you now have a million network calls to an
internal http site on your network or even a public site. This is a
denial of service.
I think it would be less of a problem for us as the parsing is only done
once for the indexing in a queue, and I think we have some measures to
prevent uploading too many files at once.
The minimal fix is in this commit:
https://github.com/apache/tika/commit/bfee6d5569fe9197c4ea947a96e212825184ca33
I made some slight updates here:
https://github.com/apache/tika/commit/fd2016ffe4a892c06da097b50deeecf8c9d5813a
The root cause of the vuln is that I thought that our
IGNORING_STAX_ENTITY_RESOLVER was preventing calls to external
entities. However, it returned a String, which was not the correct
object type, and Java was silently ignoring that problem and backing
off to default behavior which allows external entities.
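
To illustrate the root cause described above, here is a minimal Java
sketch of a StAX factory hardened against external entities. It is not
the actual Tika patch (the commits linked above are the reference); the
names SafeStaxSketch, EMPTY_RESOLVER, newHardenedFactory and
extractTextOrEmpty are made up for this example. The point it shows is
that javax.xml.stream.XMLResolver is documented to return an
InputStream, an XMLStreamReader or an XMLEventReader, so the resolver
here returns an empty InputStream rather than a String, and the factory
properties additionally switch off DTDs and external entities.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only (not Tika code).
public class SafeStaxSketch {

    // Resolves every entity to an empty InputStream, so nothing is
    // read from disk or fetched over the network.
    static final XMLResolver EMPTY_RESOLVER =
            (publicId, systemId, baseUri, namespace) ->
                    new ByteArrayInputStream(new byte[0]);

    static XMLInputFactory newHardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Standard StAX properties: no DTD support, no external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(
                XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        // Extra safety net with a correctly typed return value.
        factory.setXMLResolver(EMPTY_RESOLVER);
        return factory;
    }

    // Concatenates the character data of the document; returns an empty
    // string if the parser rejects the input.
    static String extractTextOrEmpty(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader = newHardenedFactory()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            return "";
        }
        return text.toString();
    }
}
```

With such a factory, a document declaring an external entity that
points at a local file should either be rejected or be parsed without
the file's contents ever reaching the extracted text.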
Unfortunately, there's no way via configuration to tell Tika to avoid
parsing XFA.
That would have been indeed a good option.
One solution would be to refactor your code to use tika-server, which
would put all the dependencies into a separate jvm and you wouldn't
have jar hell with jakarta etc. That's a heavy lift, I realize.
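
As a rough idea of what the tika-server route could look like from the
caller's side, here is a minimal Java 11+ sketch that sends a document
to a separately running tika-server and reads back plain text. The
class name TikaServerClientSketch, the 30-second timeout and the
http://localhost:9998 base URL in the usage note are assumptions for
this example; the only server feature relied on is tika-server's
"PUT /tika" endpoint with an "Accept: text/plain" header.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical client, for illustration only.
public class TikaServerClientSketch {

    // Builds the PUT request asking tika-server for plain-text output.
    static HttpRequest buildExtractRequest(URI serverBase, byte[] document) {
        return HttpRequest.newBuilder(serverBase.resolve("/tika"))
                .timeout(Duration.ofSeconds(30)) // assumed timeout
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    // Sends the document and returns the extracted text.
    static String extractText(URI serverBase, byte[] document)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                buildExtractRequest(serverBase, document),
                HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException(
                    "tika-server returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

A caller would use it as
extractText(URI.create("http://localhost:9998"), pdfBytes). The client
then needs no Tika jars on its own classpath, but the server itself
would still have to run a Tika version that contains the fix.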
Yeah well, we need to perform the jakarta migration anyway for other
libraries too, it's just that we're lagging behind on the topic...
2.x is EOL, and I'd really personally rather not make another release,
but I can see from your note that there is a need. My major concern
with a 2.x release is that there are probably a number of other
dependencies that now have vulns in their jdk 8 versions, and the
amount of time spent figuring out which other dependencies we can
update within the jdk8 limitations causes me concern.
Fellow devs, what do you think?
So clearly that would be ideal for us: right now we're internally
discussing forking Tika 2.x, applying your changes, and deploying a
custom version in our own repo to get the fix. An official release
would be better, for sure.
Thanks again,
Simon.
Best,
Tim
On Fri, Aug 22, 2025 at 5:09 AM Simon Urli <simon.u...@xwiki.com> wrote:
Hello,
I'm one of the core contributors of the XWiki platform
(https://www.xwiki.org) which relies on Tika.
We were informed this morning through our automated checks of the
publication of CVE-2025-54988. We still haven't managed to finish our
migration to Tika 3.x because of the complex migration to Jakarta of all
the subsequent dependencies (see
https://jira.xwiki.org/browse/XWIKI-22595), meaning that we depend on
Tika 2.x, which is affected by the CVE, apparently without any easy
workaround and without any plan to release a bug fix, if I understand
correctly what's been announced regarding the 2.x EOL.
So at this point we're trying to understand how much we're possibly
affected by this CVE: we're currently using the tika-parser-pdf-module
mainly in this class:
https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AbstractSolrMetadataExtractor.java#L520-L543,
where we use it to perform indexing of PDF documents.
I've tried to look through the recent commits in
https://github.com/apache/tika/commits/3.2.2/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module
to understand the vulnerability a bit better, but I'm failing to see it,
and I haven't found any more information in JIRA when browsing the
tickets fixed in 3.2.2.
Would it be possible to get more information about this
vulnerability, such as a possible exploit scenario, so that we can
quickly check whether we're impacted?
Thanks,
Simon Urli.