Hey,
I have an MP4 file that is 132 MB. When I send it to the Tika server, I get
back the data successfully.
However, when I wrap the same file in a RAR archive, I get the following error:
org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pkg.UnrarParser@36d35f86
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:304) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:195) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:352) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.tika.server.core.resource.UnpackerResource.process(UnpackerResource.java:145) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.tika.server.core.resource.UnpackerResource.unpackAll(UnpackerResource.java:109) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
    at org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:179) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:201) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:104) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: java.io.IOException: org.apache.tika.exception.TikaMemoryLimitException: Tried to allocate 104857601 bytes, but 104857600 is the maximum allowed. Please open an issue https://issues.apache.org/jira/projects/TIKA if you believe this file is not corrupt.
    at org.apache.tika.server.core.resource.UnpackerResource$MyEmbeddedDocumentExtractor.parseEmbedded(UnpackerResource.java:184) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.tika.parser.pkg.UnrarParser.processFile(UnrarParser.java:136) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.tika.parser.pkg.UnrarParser.processDirectory(UnrarParser.java:121) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.tika.parser.pkg.UnrarParser.parse(UnrarParser.java:105) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[tika-server-standard-2.7.0.jar:2.7.0]
    ... 39 more
Caused by: org.apache.tika.exception.TikaMemoryLimitException: Tried to allocate 104857601 bytes, but 104857600 is the maximum allowed. Please open an issue https://issues.apache.org/jira/projects/TIKA if you believe this file is not corrupt.
    at org.apache.tika.server.core.resource.UnpackerResource$MyEmbeddedDocumentExtractor.parseEmbedded(UnpackerResource.java:184) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.tika.parser.pkg.UnrarParser.processFile(UnrarParser.java:136) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.tika.parser.pkg.UnrarParser.processDirectory(UnrarParser.java:121) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.tika.parser.pkg.UnrarParser.parse(UnrarParser.java:105) ~[tika-server-standard-2.7.0.jar:2.7.0]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[tika-server-standard-2.7.0.jar:2.7.0]
I attached my tika-config.xml and would like ideas on what I
should do to solve this issue.
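
For reference, here is a quick sanity check of the numbers in the exception message. This is only a sketch of the arithmetic: the two byte counts are copied from the message, while the names (`LIMIT_BYTES`, `to_mib`, and so on) are made up for illustration, and treating "132 MB" as 132 * 1024 * 1024 bytes is an assumption about how the file size was reported.

```python
# Values copied from the TikaMemoryLimitException message.
LIMIT_BYTES = 104857600      # "is the maximum allowed"
ATTEMPTED_BYTES = 104857601  # "Tried to allocate"

MIB = 1024 * 1024


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


# The limit works out to exactly 100 MiB.
limit_mib = to_mib(LIMIT_BYTES)

# The attempted allocation is one byte over the limit.
over_by = ATTEMPTED_BYTES - LIMIT_BYTES

# Assumption: the 132 MB MP4 is roughly 132 MiB once unpacked from the archive.
approx_file_bytes = 132 * MIB
exceeds_limit = approx_file_bytes > LIMIT_BYTES

print(f"limit = {limit_mib} MiB, over by {over_by} byte(s), "
      f"file exceeds limit: {exceeds_limit}")
```

So the limit that is hit appears to be a flat 100 MiB, and the embedded MP4 is larger than that, even though byteArrayMaxOverride is set to 1000000000 in my config.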
Thx,
Shay.
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
      <parser-exclude class="org.apache.tika.parser.pkg.RarParser"/>
    </parser>
    <parser class="org.apache.tika.parser.microsoft.OfficeParser">
      <params>
        <param name="byteArrayMaxOverride" type="int">1000000000</param>
      </params>
    </parser>
    <parser class="org.apache.tika.parser.pkg.UnrarParser">
      <params>
        <param name="byteArrayMaxOverride" type="int">1000000000</param>
      </params>
    </parser>
  </parsers>
</properties>