Hi Tim
Yeah, it does look odd - it only happens in the scope of a write
provider dealing with a StreamingOutput callback, which starts parsing
only when it is asked to write. Perhaps a different model can be
introduced. In the meantime this can indeed be managed at the level of a
custom exception mapper; for example, a custom WebApplicationExceptionMapper
can be registered.
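To illustrate the point about the callback parsing only when it is asked to write, here is a minimal plain-Java sketch. It has no JAX-RS dependency: `BodyCallback` is a hypothetical stand-in with the same shape as `StreamingOutput`, and `respond` is an invented name, not the actual TikaResource code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in with the same shape as JAX-RS StreamingOutput.
interface BodyCallback {
    void write(OutputStream out) throws IOException;
}

class LazyWriteSketch {
    // Returns immediately; the "parse" inside the lambda has not run yet.
    static BodyCallback respond(String doc) {
        return out -> {
            // Assumed failure condition, standing in for a parser error.
            if (doc.contains("encrypted")) {
                throw new IOException("Unable to extract PDF content");
            }
            out.write(doc.getBytes(StandardCharsets.UTF_8));
        };
    }

    public static void main(String[] args) {
        // Building the callback succeeds even for a document that cannot be parsed.
        BodyCallback body = respond("encrypted.pdf");
        try {
            // The failure only surfaces here, in what the runtime sees as its write phase.
            body.write(new ByteArrayOutputStream());
        } catch (IOException e) {
            System.out.println("failed during write: " + e.getMessage());
        }
    }
}
```

Because the resource method has already returned by the time the parse fails, the runtime can only attribute the error to writing the message body.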
Thanks, Sergey
On 27/02/15 18:53, Allison, Timothy B. wrote:
Hi Sergey,
Thank you for responding so quickly. It seems odd to get a "write
exception" in addition to the parse exception. I recently centralized _nearly_ all
calls to parse and added a custom ExceptionMapper. We could handle it there, if we
wanted.
However, if you're not batting an eye at the warning, I'm happy to ignore
the logs. Thank you!
Best,
Tim
-----Original Message-----
From: Sergey Beryozkin [mailto:[email protected]]
Sent: Friday, February 27, 2015 10:23 AM
To: [email protected]
Subject: Re: JAX-RS: SEVERE Problem with writing the data when parser hits exception?
Hi Tim,
The problem appears to happen during the write phase, when a
JAX-RS runtime provider delegates back to the JAX-RS StreamingOutput
implementation in TikaResource.
I presume this is what causes the exception to be reported.
Do you think it should not be reported/logged? This can easily be done:
if the parser throws an exception, it can be propagated
(wrapped if it is not a RuntimeException) and caught with a custom
exception mapper, with the logging blocked there...
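The wrap-and-map idea can be sketched roughly as follows. This is a plain-Java illustration only, with no JAX-RS dependency: `ParseFailure`, `parse` and `toStatus` are hypothetical names, `toStatus` stands in for the `toResponse` method of a JAX-RS `ExceptionMapper`, and the 422 status is an arbitrary choice for the example.

```java
import java.io.IOException;

// Hypothetical unchecked wrapper, so that a checked parse error can
// propagate out of the write callback without being declared.
class ParseFailure extends RuntimeException {
    ParseFailure(Throwable cause) {
        super("parse failed: " + cause.getMessage(), cause);
    }
}

class ParseErrorMappingSketch {
    // Stand-in for the parser call; fails for "encrypted" input.
    static String parse(String doc) {
        try {
            if (doc.contains("encrypted")) {
                throw new IOException("Unable to extract PDF content");
            }
            return "<html>" + doc + "</html>";
        } catch (IOException e) {
            // IOException is checked, so wrap it before propagating.
            throw new ParseFailure(e);
        }
    }

    // Stand-in for ExceptionMapper.toResponse: one central place that
    // decides the HTTP status and whether anything gets logged.
    static int toStatus(RuntimeException e) {
        return (e instanceof ParseFailure) ? 422 : 500;
    }

    public static void main(String[] args) {
        try {
            parse("encrypted.pdf");
        } catch (RuntimeException e) {
            System.out.println("status " + toStatus(e));
        }
    }
}
```

In a real deployment the mapper would implement `javax.ws.rs.ext.ExceptionMapper` and be registered as a provider with the JAX-RS runtime, so that it is consulted instead of the default mapper.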
Cheers, Sergey
On 27/02/15 15:05, Allison, Timothy B. wrote:
All,
I recently noticed that I'm getting this message logged when there is an
exception during parsing:
SEVERE: Problem with writing the data, class org.apache.tika.server.TikaResource$5, ContentType: text/html
We didn't get this message with Tika 1.6, but we are getting this with
Tika 1.7 and trunk.
Is this to be expected?
Full stack trace is below. The test document that triggered this is an
encrypted PDF document.
WARNING: tika: Text extraction failed
org.apache.tika.exception.TikaException: Unable to extract PDF content
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:150)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:146)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:117)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.server.TikaResource$5.write(TikaResource.java:368)
at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1363)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:244)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:117)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:80)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:109)
at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:379)
at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:134)
... 35 more
Caused by: java.util.zip.DataFormatException: incorrect header check
at java.util.zip.Inflater.inflateBytes(Native Method)
at java.util.zip.Inflater.inflate(Unknown Source)
at java.util.zip.Inflater.inflate(Unknown Source)
at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:128)
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:101)
... 46 more
Feb 27, 2015 9:27:33 AM org.apache.cxf.jaxrs.utils.JAXRSUtils logMessageHandlerProblem
SEVERE: Problem with writing the data, class org.apache.tika.server.TikaResource$5, ContentType: text/html
Feb 27, 2015 9:27:33 AM org.apache.cxf.jaxrs.impl.WebApplicationExceptionMapper toResponse
WARNING: javax.ws.rs.WebApplicationException: HTTP 500 Internal Server Error
at org.apache.tika.server.TikaResource$5.write(TikaResource.java:397)
at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1363)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:244)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:117)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:80)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
Best,
Tim