I have a simple Tika REST service that accepts a Base64-encoded string (for testing, the string is an encoded PDF file).
The service Base64-decodes the string and passes the resulting binary PDF content to Tika for text extraction.

Locally, on an iMac with 16 GB of RAM, all of this works fine, even with a 150 MB PDF. No errors at all. Yet on an AWS Windows Server 2008 instance (t3.xlarge), also with 16 GB of RAM, I get the error stack below. I've tried increasing the memory available to Tomcat (through the CATALINA_OPTS environment variable in Windows on AWS), but on the iMac I don't do anything special for it to work. Both the working iMac and the Windows server run the same version of the service with the Tika 1.20 libs.

Would appreciate any advice or suggestions. Thanks very much.

ERROR STACK:

    java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:115)
        at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:949)
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:632)
        at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:876)
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:152)
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:279)
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
        at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:864)
        at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:88)
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:993)
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:879)
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:793)
        at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:753)
        at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1200)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1173)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:153)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
        at com.alias.ws.service.TextExtractionService.extractText(TextExtractionService.java:40)
        at com.alias.ws.controllers.TextExtractionController.extractText(TextExtractionController.java:40)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:209)
        at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
        at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:877)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:783)
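
Since the same service behaves differently on the two machines, one thing that might be worth ruling out is whether the CATALINA_OPTS setting actually reaches the JVM on the Windows server. The following is only a minimal sketch using the standard `java.lang.Runtime` API. The class name `HeapReport` and its methods are hypothetical and not part of the service. Calling `describe()` once at startup on both machines would show the heap limit each JVM is really running with.

```java
import java.util.Locale;

// Hypothetical helper (not part of the original service): reports the heap
// limits of the JVM it runs in, so two deployments can be compared.
final class HeapReport {
    private static final long MIB = 1024L * 1024L;

    private HeapReport() {}

    /** Maximum heap the JVM will attempt to use (from -Xmx or the JVM default), in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** One-line summary, suitable for a log statement at service startup. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory() / MIB;                  // heap currently reserved
        long used = (rt.totalMemory() - rt.freeMemory()) / MIB;   // heap currently in use
        return String.format(Locale.ROOT,
                "heap max=%d MiB, committed=%d MiB, used=%d MiB",
                maxHeapMiB(), committed, used);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the reported maximum on the Windows server does not change after editing CATALINA_OPTS, the option is probably not being picked up by the process that runs Tomcat there.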
