Hi,
I'm seeing the errors below when using the apache/tika:latest Docker
image.
I'm using Tika solely as a backend for Apache Solr. I haven't done any
kind of configuration in Tika; I've just told Solr to load its
'extraction' module, and it's finding Tika on the default port.
I also tried installing Tika from tika-server-standard-3.2.3-bin.tgz and
running it via systemd with the following in the service file:
Environment=TIKA_INCLUDE=/etc/default/tika.in.sh
ExecStart=/usr/bin/java -jar /opt/tika/tika-server.jar
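For context, a complete unit file built around those two lines might look like the sketch below. Only the Environment= and ExecStart= lines come from the setup described here; the description, the "tika" user account, the restart policy, and the install target are assumptions added for illustration, not a recommended configuration.

```ini
[Unit]
Description=Apache Tika Server (illustrative sketch)
After=network.target

[Service]
# These two lines are the ones quoted from the actual service file.
Environment=TIKA_INCLUDE=/etc/default/tika.in.sh
ExecStart=/usr/bin/java -jar /opt/tika/tika-server.jar
# Assumption: a dedicated unprivileged account named "tika" exists.
User=tika
# Assumption: restart the server if the JVM exits abnormally.
Restart=on-failure

[Install]
WantedBy=multi-user.target
```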
I've seen the same error messages running Tika both ways, and with both
Solr 9.10.1 and 10.0.
A quick web search suggests it might be an issue with non-thread-safe
code, but I'm not familiar with Java, so that's just a guess.
Is there any configuration I need to do for Tika that will help resolve
this, or any other suggestions?
Many thanks,
Carl
WARN [qtp2128961136-26] 10:35:10,759 org.apache.tika.server.core.resource.TikaResource tika: Text extraction failed (4th year transition block slides - prereadingv1.pptx)
org.apache.tika.exception.TikaException: TIKA-237: Illegal SAXException from org.apache.tika.parser.DefaultParser@58dce8f3
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:310)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
    at org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:365)
    at org.apache.tika.server.core.resource.TikaResource.lambda$produceOutput$2(TikaResource.java:659)
    at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:176)
    at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1651)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:249)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:122)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:84)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
    at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)
    at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:244)
    at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:80)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1381)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:178)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1303)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
    at org.eclipse.jetty.server.Server.handle(Server.java:563)
    at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:287)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
    at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.apache.tika.sax.TaggedSAXException: org.eclipse.jetty.io.EofException
    at org.apache.tika.sax.TaggedContentHandler.handleException(TaggedContentHandler.java:113)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:134)
    at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:201)
    at org.apache.tika.sax.XHTMLContentHandler.endElement(XHTMLContentHandler.java:257)
    at org.apache.tika.sax.XHTMLContentHandler.endElement(XHTMLContentHandler.java:290)
    at org.apache.tika.parser.csv.TextAndCSVParser.handleText(TextAndCSVParser.java:135)
    at org.apache.tika.parser.csv.TextAndCSVParser.handleText(TextAndCSVParser.java:258)
    at org.apache.tika.parser.csv.TextAndCSVParser.parse(TextAndCSVParser.java:179)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    ... 35 more
Caused by: org.apache.tika.sax.TaggedSAXException: org.eclipse.jetty.io.EofException
    at org.apache.tika.sax.TaggedContentHandler.handleException(TaggedContentHandler.java:113)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:134)
    ... 44 more
Caused by: org.xml.sax.SAXException: org.eclipse.jetty.io.EofException
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.outputCharacters(ToStream.java:1523)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream$CharacterBuffer$1.flush(ToStream.java:3417)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream$CharacterBuffer.flush(ToStream.java:3506)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.flushCharactersBuffer(ToStream.java:1559)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.endElement(ToStream.java:2092)
    at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endElement(TransformerHandlerImpl.java:282)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:134)
    at org.apache.tika.sax.ExpandedTitleContentHandler.endElement(ExpandedTitleContentHandler.java:69)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:134)
    at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:241)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:134)
    ... 45 more
Caused by: org.eclipse.jetty.io.EofException
    at org.eclipse.jetty.io.SocketChannelEndPoint.flush(SocketChannelEndPoint.java:116)
    at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
    at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:275)
    at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:254)
    at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:386)
    at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:843)
    at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:243)
    at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:224)
    at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:600)
    at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:1051)
    at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:1123)
    at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:271)
    at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:255)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:859)
    at org.apache.cxf.transport.http_jetty.JettyHTTPDestination$JettyOutputStream.write(JettyHTTPDestination.java:319)
    at org.apache.cxf.io.AbstractWrappedOutputStream.write(AbstractWrappedOutputStream.java:51)
    at java.base/java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:284)
    at java.base/java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:232)
    at java.base/java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:148)
    at org.apache.cxf.io.AbstractWrappedOutputStream.write(AbstractWrappedOutputStream.java:51)
    at org.apache.cxf.io.AbstractThresholdOutputStream.write(AbstractThresholdOutputStream.java:69)
    at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:309)
    at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:381)
    at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:357)
    at java.base/sun.nio.cs.StreamEncoder.lockedWrite(StreamEncoder.java:158)
    at java.base/sun.nio.cs.StreamEncoder.write(StreamEncoder.java:139)
    at java.base/java.io.OutputStreamWriter.write(OutputStreamWriter.java:219)
    at java.xml/com.sun.org.apache.xml.internal.serializer.ToStream.outputCharacters(ToStream.java:1515)
    ... 55 more
Caused by: java.io.IOException: Broken pipe
    at java.base/sun.nio.ch.SocketDispatcher.writev0(Native Method)
    at java.base/sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:66)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:227)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:158)
    at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:574)
    at java.base/java.nio.channels.SocketChannel.write(SocketChannel.java:660)
    at org.eclipse.jetty.io.SocketChannelEndPoint.flush(SocketChannelEndPoint.java:110)
    ... 82 more
ERROR [qtp2128961136-26] 10:35:10,769 org.apache.cxf.jaxrs.utils.JAXRSUtils Problem with writing the data, class org.apache.tika.server.core.resource.TikaResource$$Lambda/0x00007f71fc2a6208, ContentType: text/xml
WARN [qtp2128961136-26] 10:35:10,770 org.apache.cxf.phase.PhaseInterceptorChain Interceptor for {http://resource.core.server.tika.apache.org/}MetadataResource has thrown exception, unwinding now
org.apache.cxf.interceptor.Fault: Could not send Message.
    at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:67)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
    at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)
    at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:244)
    at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:80)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1381)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:178)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1303)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
    at org.eclipse.jetty.server.Server.handle(Server.java:563)
    at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:287)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
    at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.eclipse.jetty.io.EofException: Closed
    at org.eclipse.jetty.server.HttpOutput.checkWritable(HttpOutput.java:757)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:781)
    at org.apache.cxf.transport.http_jetty.JettyHTTPDestination$JettyOutputStream.write(JettyHTTPDestination.java:319)
    at org.apache.cxf.io.AbstractWrappedOutputStream.write(AbstractWrappedOutputStream.java:51)
    at java.base/java.util.zip.GZIPOutputStream.finish(GZIPOutputStream.java:172)
    at java.base/java.util.zip.DeflaterOutputStream.close(DeflaterOutputStream.java:267)
    at org.apache.cxf.io.AbstractWrappedOutputStream.close(AbstractWrappedOutputStream.java:77)
    at org.apache.cxf.io.AbstractThresholdOutputStream.close(AbstractThresholdOutputStream.java:102)
    at org.apache.cxf.transport.AbstractConduit.close(AbstractConduit.java:56)
    at org.apache.cxf.transport.http.AbstractHTTPDestination$BackChannelConduit.close(AbstractHTTPDestination.java:766)
    at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:63)
    ... 27 more
WARN [qtp2128961136-26] 10:35:10,771 org.apache.cxf.phase.PhaseInterceptorChain Interceptor for {http://resource.core.server.tika.apache.org/}MetadataResource has thrown exception, unwinding now
java.lang.NullPointerException: Deflater has been closed
    at java.base/java.util.zip.Deflater.ensureOpen(Deflater.java:902)
    at java.base/java.util.zip.Deflater.deflate(Deflater.java:564)
    at java.base/java.util.zip.Deflater.deflate(Deflater.java:464)
    at java.base/java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:282)
    at java.base/java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:232)
    at java.base/java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:148)
    at org.apache.cxf.io.AbstractWrappedOutputStream.write(AbstractWrappedOutputStream.java:51)
    at org.apache.cxf.io.AbstractThresholdOutputStream.write(AbstractThresholdOutputStream.java:69)
    at com.ctc.wstx.io.UTF8Writer.flush(UTF8Writer.java:100)
    at com.ctc.wstx.sw.BufferingXmlWriter.flush(BufferingXmlWriter.java:242)
    at com.ctc.wstx.sw.BaseStreamWriter.flush(BaseStreamWriter.java:260)
    at org.apache.cxf.jaxrs.interceptor.JAXRSDefaultFaultOutInterceptor.handleMessage(JAXRSDefaultFaultOutInterceptor.java:104)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.interceptor.AbstractFaultChainInitiatorObserver.onMessage(AbstractFaultChainInitiatorObserver.java:112)
    at org.apache.cxf.phase.PhaseInterceptorChain.wrapExceptionAsFault(PhaseInterceptorChain.java:376)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:334)
    at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
    at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)
    at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:244)
    at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:80)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1381)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:178)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1303)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
    at org.eclipse.jetty.server.Server.handle(Server.java:563)
    at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:287)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
    at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
    at java.base/java.lang.Thread.run(Thread.java:1583)