Forwarding to the correct PDFBox address. Sorry for the noise.

---------- Forwarded message ---------
From: Tim Allison <[email protected]>
Date: Wed, Jan 30, 2019 at 10:29 AM
Subject: Re: Memory Errors with PDFBOX
To: <[email protected]>, Jim <[email protected]>, <[email protected]>


@PDFBox colleagues,
  Any thoughts/recommendations?

On Wed, Jan 30, 2019 at 9:43 AM Jim <[email protected]> wrote:
>
> I have a simple Tika REST service that accepts a Base64-encoded string (for
> testing, the string is an encoded PDF file).
>
> The service Base64-decodes the string and passes the resulting binary PDF
> content to Tika for text extraction.
>
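
For illustration, the decode step described above could be done as a stream rather than by materializing the whole decoded byte array first. This is only a sketch using the JDK's java.util.Base64; the class and method names (StreamingBase64, decoding, countDecodedBytes) are hypothetical and not taken from the service in question. The byte-counting loop merely stands in for whatever consumes the stream. Note that the stack trace shows the failure inside PDFBox's parser, so it is an open question whether this changes anything for the error reported here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: decodes Base64 text on the fly instead of
// building the full decoded byte[] up front.
final class StreamingBase64 {

    /** Wraps a stream of Base64 text so that reads return decoded bytes. */
    static InputStream decoding(InputStream base64Text) {
        return Base64.getDecoder().wrap(base64Text);
    }

    /**
     * Reads the decoded stream with a small fixed buffer and counts the bytes.
     * This stands in for the real consumer of the stream (for example, a parser).
     */
    static long countDecodedBytes(InputStream base64Text) {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (InputStream decoded = decoding(base64Text)) {
            int n;
            while ((n = decoded.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // Tiny stand-in payload; a real request body would be far larger.
        byte[] payload = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);
        String encoded = Base64.getEncoder().encodeToString(payload);
        InputStream in = new ByteArrayInputStream(encoded.getBytes(StandardCharsets.US_ASCII));
        System.out.println("decoded bytes: " + countDecodedBytes(in));
    }
}
```

The InputStream returned by decoding(...) could then be handed to the parser in place of a fully decoded array, assuming the request body is itself available as a stream.
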
> Locally, on an iMac with 16 GB of RAM, all of this works fine, even with a
> 150 MB PDF. No errors at all.
>
> Yet on an AWS Windows 2008 server (t3.xlarge), also with 16 GB of RAM, I get
> the stack trace below.
>
> I've tried increasing the memory available to Tomcat (via the CATALINA_OPTS
> environment variable on the AWS Windows server), but on the iMac I don't do
> anything special for it to work. Both machines run the same version of the
> service with the Tika 1.20 libraries.
>
> Would appreciate any advice or suggestions.
>
> Thanks very much.
>
> ERROR STACK:
>
> "java.lang.OutOfMemoryError: GC overhead limit exceeded
> at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:115)
> at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:949)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:632)
> at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:876)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:152)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:279)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
> at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:864)
> at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:88)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:993)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:879)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:793)
> at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:753)
> at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1200)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1173)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:153)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at com.alias.ws.service.TextExtractionService.extractText(TextExtractionService.java:40)
> at com.alias.ws.controllers.TextExtractionController.extractText(TextExtractionController.java:40)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:209)
> at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
> at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
> at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:877)
> at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:783)
>
>
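
Since the same build behaves differently on the two machines, one thing that may be worth comparing is the heap each JVM actually ends up with. The sketch below uses only standard JDK calls (Runtime.maxMemory and the RuntimeMXBean input arguments); the class name HeapReport and its helpers are made up for illustration. It assumes the check is run inside the same JVM as the service (for example, logged at startup), because a separate command-line run would report that other process's settings. If Tomcat runs as a Windows service, it typically takes its Java options from the service configuration rather than from CATALINA_OPTS, so the printed arguments may not match what was set in the environment.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic: reports the effective heap limit and startup
// arguments of the JVM it runs in.
final class HeapReport {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Upper bound on the heap this JVM will try to use (reflects -Xmx or the default). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Arguments the JVM was actually started with, such as -Xmx if it was applied. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + toMiB(maxHeapBytes()));
        System.out.println("JVM arguments:  " + jvmArguments());
        System.out.println("java.version:   " + System.getProperty("java.version"));
        System.out.println("os.arch:        " + System.getProperty("os.arch"));
    }
}
```

Comparing this output from the iMac and the AWS instance would show whether the two JVMs have the same heap ceiling and whether the intended options were picked up; it does not by itself explain the OutOfMemoryError.
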
