Hi Tim,

Thanks for your answer!

On Tue, Mar 4, 2025 at 5:44 PM Tim Allison <talli...@apache.org> wrote:

> I'm deeply puzzled. I agree with your assessments.
> 1) ERROR should only be a status if there was an OOM, and you should be
> seeing that elsewhere in your logs. Further, the chances that you'd see an
> ERROR should be fairly slim... that status should trigger a restart fairly
> quickly, but it is definitely possible to see that.

So when running in forked mode, the watchdog process would query the status, see ERROR, and terminate the process?
What happens when an OutOfMemoryError occurs but the server continues to run: does the JVM reclaim the heap and carry on, or is it running in an undefined state? I can see that it is working and can recover from this state, but maybe there are some gotchas ...

> 2) The "SEVERE" warning level is chosen by cxf, and out of Tika's control.
> I've seen that before when the client closes the connection before reading
> all the data...I think.

OK, so in this case it is not what determines the ERROR status.

> Questions/assumptions:
> 1) tika 3.1.0?

Yes.

> 2) you are running in default mode, you aren't running in {{nofork}}

Running with --no-fork and a custom watchdog. However, the watchdog only takes care of starting a new instance. It does not check that the health status is OPERATING; it only checks the HTTP code from the /status endpoint.

> 3) what are the other error entries?!

Only this one, which I am debugging:
"package":"org.apache.pdfbox.contentstream.PDFStreamEngine", "message":"Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed"}
But normally there could be ERRORs reported, for instance when parsing encrypted docs, etc. I just wanted to double check that such errors do not impact the status of the service.

> On the larger question, when you're running tika-server 2.x and greater,
> it should restart on its own (unless you're running in {{nofork}}. You
> shouldn't have to have a watcher to restart the processes. If you do want
> to take over that responsibility, you should run in {{nofork}} mode, maybe?

Indeed, I am running in no-fork mode and taking responsibility for restarting. Generally one can rely on k8s and health probes for restarts.
So my takeaway is that the health check should most likely verify that the status is not ERROR, depending on your answer to the question above.

Thanks,
Cristi

> On Tue, Mar 4, 2025 at 9:46 AM Cristian Zamfir <cri...@cyberhaven.com>
> wrote:
>
>> Hello,
>>
>> What is the meaning of the status ERROR in tika server? I noticed that
>> some operational servers respond to ERROR instead of OPERATING, e.g.,
>> { "server_id" : "2c38a628-a37d-401f-99cd-f22d933e60c1", "status" :
>> "ERROR", "millis_since_last_parse_started" : 24072, "files_processed" :
>> 9003, "num_restarts" : 0 }
>>
>> In the code it looks like ERROR is only set in OOM situations, though I
>> do not see this in the logs.
>> I see some ERROR entries that do not look like they should influence the
>> status of the server + this SEVERE entry:
>>
>> SEVERE: Problem with writing the data, class
>> org.apache.tika.server.core.resource.TikaResource$$Lambda/0x0000788572302f00,
>> ContentType: text/plain
>> Mar 04, 2025 11:34:52 AM org.apache.cxf.phase.PhaseInterceptorChain
>> doDefaultLogging
>> WARNING: Interceptor for {
>> http://resource.core.server.tika.apache.org/}TikaResource has thrown
>> exception, unwinding now
>> org.apache.cxf.interceptor.Fault: Could not send Message.
>> at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:67)
>> at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>> at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
>> at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>> at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
>> at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
>> at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:244)
>> at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:80)
>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
>> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1381)
>> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:178)
>> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1303)
>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
>> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149)
>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>> at org.eclipse.jetty.server.Server.handle(Server.java:563)
>> at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
>> at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
>> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
>> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:287)
>> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
>> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
>> at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
>> at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:421)
>> at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:390)
>> at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:277)
>> at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:199)
>> at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411)
>> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
>> at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
>> at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
>>
>> Please let me know if any of this would be setting the status of the
>> server to ERROR. My goal was to look for OPERATING status as a health
>> indication and restart in case of ERROR, but I would like to avoid false
>> positives.
>>
>> Thanks,
>> Cristi
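
P.S. For reference, the check discussed in this thread (inspect the "status" field of the /status response rather than only the HTTP code) could be sketched roughly as below. This is only a minimal illustration, not Tika's own watchdog logic: the function names `needs_restart` and `check_server` are hypothetical, the URL assumes a server listening on localhost:9998 with the /status endpoint enabled, and treating an unreachable or unparseable response as unhealthy is my own assumption.

```python
import json
import urllib.request

# Assumed address of the tika-server instance; adjust to your deployment.
STATUS_URL = "http://localhost:9998/status"


def needs_restart(body: str) -> bool:
    """Return True when the /status JSON reports ERROR or cannot be parsed."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Assumption: a malformed body is treated as unhealthy.
        return True
    if not isinstance(payload, dict):
        return True
    return payload.get("status") == "ERROR"


def check_server(url: str = STATUS_URL, timeout: float = 5.0) -> bool:
    """Fetch /status and return True when the instance should be restarted."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
    except OSError:
        # Connection refused, timeout, or HTTP error: treat as unhealthy.
        return True
    return needs_restart(body)
```

A k8s liveness probe or a custom watchdog could call `check_server()` periodically and restart the instance when it returns True. Whether ERROR is the only status worth restarting on depends on the answer to the OOM question above.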