Hi,

We have a problem running the Tika server. We use Tika 3.1.0 on Ubuntu with Java 21.
Previously we used Tika 2.4.x, where we did not observe this problem.

We run a *lot* of text-extraction requests. After a few hours (8-10h) Tika is no longer able to restart its worker processes.
Tika runs under systemd, and journalctl shows the following output:

-- journalctl.start
May 28 04:39:39 dss-index java[350084]: INFO  [pool-2-thread-1] 04:39:39,752 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 3
May 28 04:39:40 dss-index java[376963]: May 28, 2025 4:39:40 AM org.apache.cxf.endpoint.ServerImpl initDestination
May 28 04:39:40 dss-index java[376963]: INFO: Setting the server's publish address to be http://localhost:9998/
May 28 05:35:32 dss-index java[350084]: INFO  [pool-2-thread-1] 05:35:32,896 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 2
May 28 05:35:34 dss-index java[377213]: May 28, 2025 5:35:34 AM org.apache.cxf.endpoint.ServerImpl initDestination
May 28 05:35:34 dss-index java[377213]: INFO: Setting the server's publish address to be http://localhost:9998/
-- journalctl.end
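
To follow the restarts over a longer period, the watchdog lines can be filtered out of the journal. The following is only a minimal sketch: the regular expression is fitted to the line format shown above and may need adjusting for other logging setups, and the function name watchdog_exits is made up for illustration.

```python
import re

# Sample lines copied from the journal output above.
JOURNAL = """\
May 28 04:39:39 dss-index java[350084]: INFO  [pool-2-thread-1] 04:39:39,752 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 3
May 28 04:39:40 dss-index java[376963]: INFO: Setting the server's publish address to be http://localhost:9998/
May 28 05:35:32 dss-index java[350084]: INFO  [pool-2-thread-1] 05:35:32,896 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 2
"""

# Matches only the watchdog's "forked process exited" lines.
EXIT_LINE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ java\[(?P<pid>\d+)\]: "
    r".*TikaServerWatchDog forked process exited with exit value (?P<code>\d+)$"
)


def watchdog_exits(journal_text):
    """Return (timestamp, pid of the logging process, exit value) per exit line."""
    exits = []
    for line in journal_text.splitlines():
        match = EXIT_LINE.match(line.strip())
        if match:
            exits.append((match["stamp"], int(match["pid"]), int(match["code"])))
    return exits


for stamp, pid, code in watchdog_exits(JOURNAL):
    print(f"{stamp}  logged by pid {pid}  exit value {code}")
```

In a real run the text would come from the journal itself rather than from an embedded sample.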

After these messages the Tika server no longer responds to requests. Restarting the Tika parent process is the only thing that helps. The messages are emitted at TikaServerWatchDog:161, but I do not understand what is going wrong here. The exit values are probably error codes from the OS; perror gives the following output:

OS error code   2:  No such file or directory
OS error code   3:  No such process
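
The same lookup can be reproduced without perror. This is a small sketch and rests on the assumption made above, namely that the exit values are errno numbers. Nothing in the log confirms that: an exit value is chosen by the exiting program and does not have to be an errno. The helper name describe is made up for illustration.

```python
import errno
import os


def describe(code):
    """Return (symbolic errno name, OS message) for an errno number."""
    return errno.errorcode[code], os.strerror(code)


for code in (2, 3):
    name, message = describe(code)
    print(f"OS error code {code}: {name}: {message}")
```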

Still, it is unclear to me what is happening. The tika.config is below.

As far as I understand the situation, this seems to be a bug that was introduced somewhere between versions 2.4.x and 3.1.0.

I hope someone has an idea of what is going on and how it can be remedied.
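
Until the cause is known, a stopgap could be an external probe that restarts the parent process when the server stops answering. The sketch below is only an illustration under several assumptions: that the status endpoint enabled in the tika.config answers at http://localhost:9998/status with HTTP 200 while the server is healthy, that the service runs as a systemd unit called tika.service (a hypothetical name), and that the probe runs with permission to restart it. The function names are made up as well.

```python
import http.client
import subprocess
import urllib.request

STATUS_URL = "http://localhost:9998/status"  # host and port taken from the tika.config
UNIT_NAME = "tika.service"  # hypothetical systemd unit name


def is_responsive(url=STATUS_URL, timeout=10.0):
    """Return True if the status request is answered with HTTP 200 in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status or a broken response.
        return False


def restart_unit(unit=UNIT_NAME):
    """Restart the parent process through systemd."""
    subprocess.run(["systemctl", "restart", unit], check=True)


def check_once(probe=is_responsive, restart=restart_unit):
    """Probe once and restart on failure; return True if a restart was triggered."""
    if probe():
        return False
    restart()
    return True
```

Calling check_once() from a systemd timer or a cron job every minute or so would be one way to use it. It only hides the symptom and does not explain why the forked process fails to come back.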

Tino


-- tika.config.start
<?xml version="1.0" encoding="UTF-8"?>
<properties>
   <parsers>
      <parser class="org.apache.tika.parser.DefaultParser">
      </parser>
   </parsers>
   <server>
    <params>
      <port>9998</port>
      <host>localhost</host>
      <digest>sha256</digest>
      <digestMarkLimit>1000000</digestMarkLimit>
      <id></id>
      <cors>NONE</cors>
      <logLevel>info</logLevel>
      <returnStackTrace>false</returnStackTrace>
      <noFork>false</noFork>
      <taskTimeoutMillis>300000</taskTimeoutMillis>
      <maxForkedStartupMillis>120000</maxForkedStartupMillis>
      <maxRestarts>-1</maxRestarts>
      <maxFiles>25000</maxFiles>
      <javaPath>java</javaPath>
      <forkedJvmArgs>
        <arg>-Xms4g</arg>
        <arg>-Xmx4g</arg>
        <arg>-Dlog4j.configurationFile=tika-forked-log4j2.xml</arg>
      </forkedJvmArgs>

      <enableUnsecureFeatures>false</enableUnsecureFeatures>

      <endpoints>
        <endpoint>status</endpoint>
        <endpoint>tika</endpoint>
        <endpoint>rmeta</endpoint>
        <endpoint>language</endpoint>
      </endpoints>
    </params>
  </server>
</properties>
-- tika.config.stop
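
For a quick overview, the parameters whose names suggest a connection to timeouts and restarts can be read out of the config programmatically. This is a minimal sketch: it embeds an excerpt of the config above instead of reading the real file, and the function name server_params is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Excerpt of the <server><params> block from the tika.config above.
CONFIG = """\
<properties>
  <server>
    <params>
      <taskTimeoutMillis>300000</taskTimeoutMillis>
      <maxForkedStartupMillis>120000</maxForkedStartupMillis>
      <maxRestarts>-1</maxRestarts>
      <maxFiles>25000</maxFiles>
    </params>
  </server>
</properties>
"""


def server_params(config_xml):
    """Return the text-only children of <server><params> as a dict of strings."""
    root = ET.fromstring(config_xml)
    params = root.find("./server/params")
    # Skip elements that have children of their own, such as forkedJvmArgs.
    return {child.tag: (child.text or "").strip() for child in params if len(child) == 0}


params = server_params(CONFIG)
print(f"taskTimeoutMillis:      {int(params['taskTimeoutMillis']) // 1000} s")
print(f"maxForkedStartupMillis: {int(params['maxForkedStartupMillis']) // 1000} s")
print(f"maxRestarts:            {params['maxRestarts']}")
print(f"maxFiles:               {params['maxFiles']}")
```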


--
Tino Schöllhorn
Diplom Wirtschaftsinformatiker
Managing Director
Plattform GmbH
Gabelsbergerstr. 5
68165 Mannheim
Tel: 0621-58679312
E-Mail: t.schoellh...@plattform-gmbh.de
Internet: http://www.plattform-gmbh.de

Register court: Amtsgericht Mannheim, HRB 9955
Managing directors: Olaf Kellermeier, Tino Schöllhorn
