Hi Tim,

Thanks for your answer!



On Tue, Mar 4, 2025 at 5:44 PM Tim Allison <talli...@apache.org> wrote:

> I'm deeply puzzled. I agree with your assessments.
> 1) ERROR should only be a status if there was an OOM, and you should be
> seeing that elsewhere in your logs. Further, the chances that you'd see an
> ERROR should be fairly slim... that status should trigger a restart fairly
> quickly, but it is definitely possible to see that.
>

So when running in forked mode, does the watchdog process check for the
ERROR status and terminate the child process?

What happens when there is an OutOfMemoryError but the server keeps
running? Does the JVM reclaim the heap and carry on, or is it running in an
undefined state? I can see that it keeps working and recovers from this
state, but there may be some gotchas...


> 2) The "SEVERE" warning level is chosen by cxf, and out of Tika's control.
> I've seen that before when the client closes the connection before reading
> all the data...I think.
>

OK, so in this case it is not what causes the ERROR state.


>
> Questions/assumptions:
> 1) tika 3.1.0?
>
Yes.

> 2) you are running in default mode, you aren't running in {{nofork}}
>

I'm running with --no-fork and a custom watchdog. However, the watchdog only
takes care of starting a new instance. It does not check that the health
status is OPERATING; it only checks the HTTP code from the /status endpoint.
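
Checking only the HTTP code may not be enough. The sample /status response
quoted at the bottom of this thread reports "status" : "ERROR" in its body.
Here is a minimal Python sketch of reading that body instead. It assumes the
field names seen in that sample, which I have not checked against the Tika
docs, so other versions may differ. `TikaStatus` and `parse_status` are
hypothetical names of my own:

```python
import json
from dataclasses import dataclass


@dataclass
class TikaStatus:
    # Field names copied from the sample /status response in this thread
    # (an assumption; not verified against the Tika documentation).
    server_id: str
    status: str
    millis_since_last_parse_started: int
    files_processed: int
    num_restarts: int


def parse_status(body: str) -> TikaStatus:
    """Parse the JSON body of tika-server's /status endpoint."""
    data = json.loads(body)
    return TikaStatus(
        server_id=data.get("server_id", ""),
        # Missing field -> "UNKNOWN" so callers can decide how to treat it.
        status=data.get("status", "UNKNOWN"),
        millis_since_last_parse_started=int(
            data.get("millis_since_last_parse_started", -1)),
        files_processed=int(data.get("files_processed", 0)),
        num_restarts=int(data.get("num_restarts", 0)),
    )
```

Feeding it the sample response gives a `TikaStatus` whose `status` is
"ERROR", even though the request itself succeeded.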


> 3) what are the other error entries?!
>

Only this one, which I am debugging:
- "package":"org.apache.pdfbox.contentstream.PDFStreamEngine",
"message":"Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image
I/O Tools are not installed"}
But normally there could be ERROR entries reported, for instance when
parsing encrypted docs. I just wanted to double-check that such errors do
not affect the status of the service.


>
> On the larger question, when you're running tika-server 2.x and greater,
> it should restart on its own (unless you're running in {{nofork}}). You
> shouldn't have to have a watcher to restart the processes. If you do want
> to take over that responsibility, you should run in {{nofork}} mode, maybe?
>
Indeed, I'm running in no-fork mode and taking on the responsibility of
restarting; generally one can rely on k8s and health probes for restarts.
So my take-away is that the health check should most likely verify that the
status is not ERROR, depending on your answer to the question above.
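
Here is a rough Python sketch of such a check, usable from a k8s exec probe
or a custom watchdog. It assumes tika-server listens on its default port
9998 and that /status returns JSON with a "status" field, as in the sample
in my first message below. `is_healthy` and `probe` are hypothetical
helpers. Only ERROR is treated as unhealthy, so other statuses do not
trigger restarts, which avoids the false positives I was worried about:

```python
import json
import urllib.request

# Statuses that should trigger a restart. Only ERROR, per the take-away
# above; OPERATING or any other value is treated as healthy.
UNHEALTHY_STATUSES = {"ERROR"}


def is_healthy(http_code: int, body: str) -> bool:
    """Decide health from the HTTP code AND the JSON "status" field."""
    if http_code != 200:
        return False
    try:
        status = json.loads(body).get("status")
    except (ValueError, AttributeError):
        # Body is not a JSON object: treat as unhealthy (a design choice).
        return False
    return status not in UNHEALTHY_STATUSES


def probe(url: str = "http://localhost:9998/status",
          timeout: float = 5.0) -> bool:
    """Fetch /status and apply is_healthy; network errors count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.status, resp.read().decode("utf-8"))
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

An exec probe could then call something like `sys.exit(0 if probe() else 1)`.
Treating a non-200 code or an unparseable body as unhealthy is a judgment
call; a more lenient watchdog might retry a few times before restarting.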

Thanks,
Cristi


>
> On Tue, Mar 4, 2025 at 9:46 AM Cristian Zamfir <cri...@cyberhaven.com>
> wrote:
>
>> Hello,
>>
>> What is the meaning of the status ERROR in tika server? I noticed that
>> some operational servers respond with ERROR instead of OPERATING, e.g.,
>> { "server_id" : "2c38a628-a37d-401f-99cd-f22d933e60c1", "status" :
>> "ERROR", "millis_since_last_parse_started" : 24072, "files_processed" :
>> 9003, "num_restarts" : 0 }
>>
>> In the code it looks like ERROR is only set in OOM situations, though I
>> do not see this in the logs.
>> I see some ERROR entries that do not look like they should influence the
>> status of the server, and this SEVERE entry:
>>
>> SEVERE: Problem with writing the data, class org.apache.tika.server.core.resource.TikaResource$$Lambda/0x0000788572302f00, ContentType: text/plain
>> Mar 04, 2025 11:34:52 AM org.apache.cxf.phase.PhaseInterceptorChain doDefaultLogging
>> WARNING: Interceptor for {http://resource.core.server.tika.apache.org/}TikaResource has thrown exception, unwinding now
>> org.apache.cxf.interceptor.Fault: Could not send Message.
>>     at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:67)
>>     at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>>     at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
>>     at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>>     at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
>>     at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
>>     at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:244)
>>     at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:80)
>>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>>     at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
>>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1381)
>>     at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:178)
>>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1303)
>>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
>>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149)
>>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>>     at org.eclipse.jetty.server.Server.handle(Server.java:563)
>>     at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
>>     at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
>>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
>>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:287)
>>     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
>>     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
>>     at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
>>     at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:421)
>>     at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:390)
>>     at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:277)
>>     at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:199)
>>     at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411)
>>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
>>     at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
>>     at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
>>
>>
>> Please let me know if any of this could set the status of the server to
>> ERROR. My goal was to look for OPERATING status as a health indication
>> and restart in case of ERROR, but I would like to avoid false positives.
>>
>> Thanks,
>> Cristi
>>
>>
