Hi,
we have a problem running the Tika server. We use Tika 3.1.0 on
Ubuntu with Java 21.
Previously we used Tika 2.4.x, where we did not observe this problem.
We run a *lot* of text-extraction requests. After a few hours (8-10 h)
Tika is no longer able to restart its worker processes.
Tika runs under systemd, and journalctl shows the following output:
-- journalctl.start
May 28 04:39:39 dss-index java[350084]: INFO [pool-2-thread-1] 04:39:39,752 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 3
May 28 04:39:40 dss-index java[376963]: May 28, 2025 4:39:40 AM org.apache.cxf.endpoint.ServerImpl initDestination
May 28 04:39:40 dss-index java[376963]: INFO: Setting the server's publish address to be http://localhost:9998/
May 28 05:35:32 dss-index java[350084]: INFO [pool-2-thread-1] 05:35:32,896 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 2
May 28 05:35:34 dss-index java[377213]: May 28, 2025 5:35:34 AM org.apache.cxf.endpoint.ServerImpl initDestination
May 28 05:35:34 dss-index java[377213]: INFO: Setting the server's publish address to be http://localhost:9998/
-- journalctl.end
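To see how often the watchdog restarts the forked process, and with which exit values, the journal output can be summarized with a small script. The following is only a sketch in Python: the sample lines are taken from the excerpt above, joined back onto single lines, and the function name `watchdog_exits` is made up for illustration.

```python
import re
from collections import Counter

# Matches the watchdog message as it appears in the journal excerpt.
EXIT_RE = re.compile(
    r"TikaServerWatchDog forked process exited with exit value (-?\d+)"
)

def watchdog_exits(lines):
    """Return the exit values (as ints) reported by the watchdog, in log order."""
    exits = []
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            exits.append(int(match.group(1)))
    return exits

# Sample lines copied from the journal excerpt, one log record per string.
SAMPLE = [
    "May 28 04:39:39 dss-index java[350084]: INFO [pool-2-thread-1] 04:39:39,752 "
    "org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 3",
    "May 28 04:39:40 dss-index java[376963]: INFO: Setting the server's "
    "publish address to be http://localhost:9998/",
    "May 28 05:35:32 dss-index java[350084]: INFO [pool-2-thread-1] 05:35:32,896 "
    "org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 2",
]

if __name__ == "__main__":
    values = watchdog_exits(SAMPLE)
    print(values)           # exit values in the order they were logged
    print(Counter(values))  # how often each exit value occurred
```

In practice the lines would come from the journalctl output for the Tika unit (for example read from `sys.stdin`) rather than from a hard-coded list.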
After these messages the Tika server no longer responds to requests.
Restarting the Tika parent process is the only thing that helps.
The messages are emitted at TikaServerWatchDog:161, but I do not
understand what is going wrong here. The exit values are probably OS
error codes; perror gives the following output for them:
OS error code 2: No such file or directory
OS error code 3: No such process
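For context on the lookup above, the sketch below (Python, not Tika code) does two things: it repeats the perror-style translation with `os.strerror`, and it shows that a parent process simply receives whatever number its child passed to exit. Whether Tika's exit values 2 and 3 are really meant as OS error codes is an assumption; they may equally be application-defined codes. The helper names `describe_errno` and `child_exit_value` are made up for this illustration.

```python
import errno
import os
import subprocess
import sys

def describe_errno(code):
    """Return the OS error text for an errno number, similar to what perror prints."""
    return os.strerror(code)

def child_exit_value(code):
    """Start a child Python process that exits with `code` and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
    )
    return result.returncode

if __name__ == "__main__":
    for code in (2, 3):
        # Symbolic errno name and message for the number, if read as an OS error code.
        print(code, errno.errorcode.get(code), describe_errno(code))
    # The parent sees exactly the value the child chose, whatever its meaning.
    print(child_exit_value(3))
```

The point of the second function is that an exit value only has the meaning the exiting program gives it, so the errno reading is one possible interpretation among others.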
Still, it is unclear to me what is happening. Below you'll find the tika.config.
As far as I understand the situation, this seems to be a bug that was
introduced sometime between versions 2.4.x and 3.1.0.
I hope someone has an idea of what is going on and how this can be
remedied.
Tino
-- tika.config.start
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
    </parser>
  </parsers>
  <server>
    <params>
      <port>9998</port>
      <host>localhost</host>
      <digest>sha256</digest>
      <digestMarkLimit>1000000</digestMarkLimit>
      <id></id>
      <cors>NONE</cors>
      <logLevel>info</logLevel>
      <returnStackTrace>false</returnStackTrace>
      <noFork>false</noFork>
      <taskTimeoutMillis>300000</taskTimeoutMillis>
      <maxForkedStartupMillis>120000</maxForkedStartupMillis>
      <maxRestarts>-1</maxRestarts>
      <maxFiles>25000</maxFiles>
      <javaPath>java</javaPath>
      <forkedJvmArgs>
        <arg>-Xms4g</arg>
        <arg>-Xmx4g</arg>
        <arg>-Dlog4j.configurationFile=tika-forked-log4j2.xml</arg>
      </forkedJvmArgs>
      <enableUnsecureFeatures>false</enableUnsecureFeatures>
      <endpoints>
        <endpoint>status</endpoint>
        <endpoint>tika</endpoint>
        <endpoint>rmeta</endpoint>
        <endpoint>language</endpoint>
      </endpoints>
    </params>
  </server>
</properties>
-- tika.config.stop
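Since restarting the parent process is the only remedy so far, an external probe could at least detect the hang. The sketch below is Python and not part of Tika. It assumes that the `status` endpoint enabled in the config above is reachable at `http://localhost:9998/status`; the function name `tika_is_responsive` and the 5-second timeout are arbitrary choices.

```python
import http.client
import urllib.error
import urllib.request

def tika_is_responsive(url="http://localhost:9998/status", timeout=5.0):
    """Return True if an HTTP GET on `url` answers with status 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, HTTP error responses, timeouts,
        # and malformed replies.
        return False

if __name__ == "__main__":
    print("responsive" if tika_is_responsive() else "not responsive")
```

Run periodically, for example from a systemd timer, a failing probe could trigger a restart of the service. That would only be a workaround, not a fix for the underlying problem.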
--
Tino Schöllhorn