I have a simple Tika REST service that accepts a Base64-encoded string (for 
testing, the encoded content is a PDF file).

The service Base64-decodes the string and passes the resulting binary PDF 
content to Tika for text extraction.
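
As a rough illustration of the decode step (not the service's actual code, which isn't shown here), the sketch below decodes a Base64 stream straight to a temporary file using only the JDK, so the decoded bytes don't have to sit on the heap next to the encoded string. The names `Base64ToTempFile` and `decodeToTempFile` are made up for this example, and handing the resulting file to Tika is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;

public class Base64ToTempFile {

    // Hypothetical helper: streams Base64 text from the request body into a
    // temp file, decoding on the fly instead of building a large byte[].
    public static Path decodeToTempFile(InputStream base64Body) {
        try {
            Path tmp = Files.createTempFile("upload-", ".bin");
            // wrap() returns a stream that decodes as it is read.
            try (InputStream decoded = Base64.getDecoder().wrap(base64Body)) {
                // createTempFile already created the file, so allow overwrite.
                Files.copy(decoded, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny stand-in payload; a real request would carry the encoded PDF.
        String encoded = Base64.getEncoder()
                .encodeToString("%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII));
        Path tmp = decodeToTempFile(
                new ByteArrayInputStream(encoded.getBytes(StandardCharsets.US_ASCII)));
        System.out.println(Files.size(tmp) + " bytes written to " + tmp);
        Files.delete(tmp);
    }
}
```

Whether this helps depends on where the memory actually goes; in the trace below the error is raised while PDFBox is parsing the document, not during decoding.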

Locally, on an iMac with 16 GB of RAM, all of this works fine, even with a 
150 MB PDF. No errors at all.

Yet, using an AWS Windows 2008 server also with 16 GB RAM (t3.xlarge), I get 
the error stack below.

I've tried increasing the memory available to Tomcat (via the CATALINA_OPTS 
environment variable on the AWS Windows server), whereas on the iMac I don't 
set anything special and everything works. Both machines run the same version 
of the service with the Tika 1.20 libraries.
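
To confirm what heap each JVM actually ends up with, a small check like the one below could be run on both machines (or its output logged from inside the webapp). It is only a sketch: `HeapCheck` is an illustrative name, and `sun.arch.data.model` is a HotSpot-specific property that may be null on other JVMs.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Arguments the JVM was actually started with (e.g. -Xmx...),
    // which shows whether settings from CATALINA_OPTS arrived.
    public static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("java.version        = " + System.getProperty("java.version"));
        System.out.println("os.arch             = " + System.getProperty("os.arch"));
        // HotSpot-specific; reports 32 or 64 when present.
        System.out.println("sun.arch.data.model = " + System.getProperty("sun.arch.data.model"));
        System.out.println("max heap (MiB)      = " + maxHeapMiB());
        System.out.println("JVM arguments       = " + jvmArguments());
    }
}
```

If the Windows side reports a much smaller max heap than the iMac, or the values set in CATALINA_OPTS don't appear among the JVM arguments, that would suggest the setting isn't reaching the JVM (which, as far as I know, can happen when Tomcat is run as a Windows service rather than started via catalina.bat).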

Would appreciate any advice or suggestions.

Thanks very much.

ERROR STACK:

"java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:115)
at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:949)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:632)
at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:876)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:152)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:279)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:864)
at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:88)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:993)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:879)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:793)
at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:753)
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1200)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1173)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:153)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at com.alias.ws.service.TextExtractionService.extractText(TextExtractionService.java:40)
at com.alias.ws.controllers.TextExtractionController.extractText(TextExtractionController.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:209)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:877)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:783)
