R: External Tika Server

2018-12-04 Thread Bisonti Mario

In my tika server, I added:
-spawnChild -taskTimeoutMillis 100
To bypass the timeout problem

Mario


Da: Furkan KAMACI 
Inviato: martedì 4 dicembre 2018 10:16
A: user@manifoldcf.apache.org; Rafa Haro 
Oggetto: Re: External Tika Server

Hi Rafa,

I can parse same document via HTTP URL of Tika Server. I thought that there 
maybe a timeout parameter within ManifoldCF while communicating with Tika 
Server :)

Kind Regards,
Furkan KAMACI

On Tue, Dec 4, 2018 at 12:13 PM Rafa Haro 
mailto:rh...@apache.org>> wrote:
Hi Furkan,

You seem to be getting a Timeout from Tesseract. This might be happening with 
large documents (too many pages). Maybe there is some configuration parameter 
for increasing timeouts that you can use at Tika side

Rafa

On Tue, Dec 4, 2018 at 9:58 AM Furkan KAMACI 
mailto:furkankam...@gmail.com>> wrote:
Hi,

I try to test external OCR capabilities of Tika Server with ManifoldCF 2.11. 
Documents are parsed when I curl documents into Tika Server directly. However, 
when I try to parse them via Tika Server I get that error at most of the 
documents (not all of them):

INFO  meta (application/msword)
WARN  meta: Text extraction failed
org.apache.tika.exception.TikaException: Unable to extract PDF content
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:139)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:402)
at 
org.apache.tika.server.resource.MetadataResource.parseMetadata(MetadataResource.java:126)
at 
org.apache.tika.server.resource.MetadataResource.getMetadata(MetadataResource.java:60)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:179)
at 
org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:193)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:103)
at 
org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
at 
org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at 
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at 
org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:205)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:531)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
at java.lang.Thread.run(Thread.java:748)
Caused by: 

Re: External Tika Server

2018-12-04 Thread Furkan KAMACI
Hi Rafa,

I can parse same document via HTTP URL of Tika Server. I thought that there
maybe a timeout parameter within ManifoldCF while communicating with Tika
Server :)

Kind Regards,
Furkan KAMACI

On Tue, Dec 4, 2018 at 12:13 PM Rafa Haro  wrote:

> Hi Furkan,
>
> You seem to be getting a Timeout from Tesseract. This might be happening
> with large documents (too many pages). Maybe there is some configuration
> parameter for increasing timeouts that you can use at Tika side
>
> Rafa
>
> On Tue, Dec 4, 2018 at 9:58 AM Furkan KAMACI 
> wrote:
>
>> Hi,
>>
>> I try to test external OCR capabilities of Tika Server with ManifoldCF
>> 2.11. Documents are parsed when I curl documents into Tika Server directly.
>> However, when I try to parse them via Tika Server I get that error at
>> *most* of the documents (not all of them):
>>
>> INFO  meta (application/msword)
>> WARN  meta: Text extraction failed
>> org.apache.tika.exception.TikaException: Unable to extract PDF content
>> at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:139)
>> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172)
>> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>> at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
>> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>> at
>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>> at
>> org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:402)
>> at
>> org.apache.tika.server.resource.MetadataResource.parseMetadata(MetadataResource.java:126)
>> at
>> org.apache.tika.server.resource.MetadataResource.getMetadata(MetadataResource.java:60)
>> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at
>> org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:179)
>> at
>> org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)
>> at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:193)
>> at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:103)
>> at
>> org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
>> at
>> org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
>> at
>> org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
>> at
>> org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
>> at
>> org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)
>> at
>> org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
>> at
>> org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
>> at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:205)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>> at
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
>> at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>> at org.eclipse.jetty.server.Server.handle(Server.java:531)
>> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
>> at
>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
>> at
>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
>> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
>> at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
>> at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>> at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>> at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>> at
>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>> at
>> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.commons.io.IOExceptionWithCause: Unable to end a
>> page
>> at
>> 

Re: External Tika Server

2018-12-04 Thread Rafa Haro
Hi Furkan,

You seem to be getting a Timeout from Tesseract. This might be happening
with large documents (too many pages). Maybe there is some configuration
parameter for increasing timeouts that you can use at Tika side

Rafa

On Tue, Dec 4, 2018 at 9:58 AM Furkan KAMACI  wrote:

> Hi,
>
> I try to test external OCR capabilities of Tika Server with ManifoldCF
> 2.11. Documents are parsed when I curl documents into Tika Server directly.
> However, when I try to parse them via Tika Server I get that error at
> *most* of the documents (not all of them):
>
> INFO  meta (application/msword)
> WARN  meta: Text extraction failed
> org.apache.tika.exception.TikaException: Unable to extract PDF content
> at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:139)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at
> org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:402)
> at
> org.apache.tika.server.resource.MetadataResource.parseMetadata(MetadataResource.java:126)
> at
> org.apache.tika.server.resource.MetadataResource.getMetadata(MetadataResource.java:60)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:179)
> at
> org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)
> at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:193)
> at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:103)
> at
> org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
> at
> org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
> at
> org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
> at
> org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
> at
> org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)
> at
> org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
> at
> org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:205)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.eclipse.jetty.server.Server.handle(Server.java:531)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
> at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
> at
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.commons.io.IOExceptionWithCause: Unable to end a page
> at
> org.apache.tika.parser.pdf.AbstractPDF2XHTML.endPage(AbstractPDF2XHTML.java:428)
> at org.apache.tika.parser.pdf.PDF2XHTML.endPage(PDF2XHTML.java:162)
> at
> org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:393)
> at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
> at
> 

External Tika Server

2018-12-04 Thread Furkan KAMACI
Hi,

I try to test external OCR capabilities of Tika Server with ManifoldCF
2.11. Documents are parsed when I curl documents into Tika Server directly.
However, when I try to parse them via Tika Server I get that error at *most*
of the documents (not all of them):

INFO  meta (application/msword)
WARN  meta: Text extraction failed
org.apache.tika.exception.TikaException: Unable to extract PDF content
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:139)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:402)
at
org.apache.tika.server.resource.MetadataResource.parseMetadata(MetadataResource.java:126)
at
org.apache.tika.server.resource.MetadataResource.getMetadata(MetadataResource.java:60)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:179)
at
org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:193)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:103)
at
org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
at
org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
at
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at
org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)
at
org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
at
org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:205)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:531)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.commons.io.IOExceptionWithCause: Unable to end a page
at
org.apache.tika.parser.pdf.AbstractPDF2XHTML.endPage(AbstractPDF2XHTML.java:428)
at org.apache.tika.parser.pdf.PDF2XHTML.endPage(PDF2XHTML.java:162)
at
org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:393)
at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
at
org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
at
org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
... 44 more
Caused by: org.apache.tika.exception.TikaException: TesseractOCRParser
timeout
at
org.apache.tika.parser.ocr.TesseractOCRParser.doOCR(TesseractOCRParser.java:562)
at
org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:434)
at