forwarding to the correct pdfbox address... sorry for the noise...

---------- Forwarded message ---------
From: Tim Allison <[email protected]>
Date: Wed, Jan 30, 2019 at 10:29 AM
Subject: Re: Memory Errors with PDFBOX
To: <[email protected]>, Jim <[email protected]>, <[email protected]>

@PDFBox colleagues,

Any thoughts/recommendations?

On Wed, Jan 30, 2019 at 9:43 AM Jim <[email protected]> wrote:
>
> I have a simple Tika REST service that accepts a Base64Encoded String (which
> for testing is a PDF File in this case).
>
> The REST service that receives the string Base64-decodes the string and
> passes it to Tika for file text extraction (from the binary PDF content after
> Base64 Decode).
>
> Locally, on an iMac with 16 GB, all this works fine. Even with a PDF that's
> 150 MB! No errors at all.
>
> Yet, using an AWS Windows 2008 server also with 16 GB RAM (t3.xlarge), I get
> the error stack below.
>
> I've tried upping the memory used by Tomcat (CATALINA_OPTS environment
> variable in Windows on AWS), but locally on the iMac, I don't do anything
> special at all for all to work. Both the working iMac and Windows have the
> same version of the service with Tika 1.20 libs.
>
> Would appreciate any advice or suggestions.
>
> Thanks very much.
>
> ERROR STACK:
>
> "java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:115)
>   at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:949)
>   at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:632)
>   at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:876)
>   at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:152)
>   at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:279)
>   at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
>   at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:864)
>   at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:88)
>   at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:993)
>   at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:879)
>   at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:793)
>   at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:753)
>   at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
>   at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
>   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1200)
>   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1173)
>   at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:153)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>   at com.alias.ws.service.TextExtractionService.extractText(TextExtractionService.java:40)
>   at com.alias.ws.controllers.TextExtractionController.extractText(TextExtractionController.java:40)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:209)
>   at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
>   at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
>   at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:877)
>   at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:783)
>
>
>
> Sent from ProtonMail, Swiss-based encrypted email.
>
>
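
For anyone comparing the two machines described above, a minimal sketch of two checks that might help narrow this down. It assumes nothing about the actual service code: the class `HeapCheckSketch` and its methods are hypothetical names, and only JDK calls are used. The first method reports the maximum heap the running JVM will use, which could be logged inside the Tomcat webapp to confirm whether the CATALINA_OPTS change actually reached the JVM. The second decodes Base64 as a stream rather than building a second full-size byte array before handing the content to a parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of Tika, PDFBox, or the original service.
class HeapCheckSketch {

    // Maximum heap the running JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Wraps a stream of Base64 text so bytes are decoded lazily as they are read.
    static InputStream decodingStream(InputStream base64Text) {
        return Base64.getDecoder().wrap(base64Text);
    }

    // Drains the decoding stream into a byte array (used here only to demonstrate the stream).
    static byte[] decodeToBytes(byte[] base64Text) {
        try (InputStream in = decodingStream(new ByteArrayInputStream(base64Text))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Log this from inside the webapp on both machines and compare.
        System.out.println("Max heap (MiB): " + maxHeapMiB());

        // Stand-in payload: the first bytes of a PDF header, Base64-encoded.
        byte[] encoded = Base64.getEncoder()
                .encode("%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        byte[] decoded = decodeToBytes(encoded);
        System.out.println(new String(decoded, StandardCharsets.US_ASCII));
    }
}
```

In a real service the stream returned by `decodingStream` would be passed to the parser directly instead of being drained into memory; whether that alone avoids the error on the Windows host is not something this sketch can establish.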
