[ https://issues.apache.org/jira/browse/PDFBOX-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670654#action_12670654 ]

Lars Torunski commented on PDFBOX-227:
--------------------------------------

I got the same problem:

java.lang.ArrayIndexOutOfBoundsException: 4
    at org.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:294)
    at org.fontbox.cmap.CMapParser.parse(CMapParser.java:103)
    at org.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:535)
    at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:387)
    at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:325)
    at org.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:80)
    at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
    at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
    at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
    at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
    at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
    at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)

Also, no special security is applied and the pdf opens fine.

> ArrayIndexOutOfBoundsException:4
> --------------------------------
>
>                 Key: PDFBOX-227
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-227
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1610268
> Originally submitted by fotb on 2006-12-06 09:04.
> Does anyone know if there has been any resolution to the
> ArrayIndexOutOfBoundsException:4 problem?
> I have extracted text from 300 pdfs using 0.7.3. All extractions were
> successful except 3. I received this error message when trying to extract
> text from these 3 pdfs (java.lang.ArrayIndexOutOfBoundsException:4). I am
> able to open the pdfs fine and they have no special security applied to
> them. Any ideas as to why PDFBox 0.7.3 is hiccuping while trying to
> process these files?
> I am not able to send the pdf over the internet because
> it is government property.
> [comment on SourceForge]
> Originally sent by fotb.
> Logged In: YES
> user_id=1662347
> Originator: YES
> I loaded PDFBox-0.7.2 and the problem went away. The 3 pdfs that were
> raising errors with PDFBox-0.7.3 are now being successfully processed with
> text being extracted from them. Something in PDFBox-0.7.3 is causing the
> problem. If anyone else is getting the ArrayIndexOutOfBoundsException, I
> would suggest loading PDFBox-0.7.2 to see if you still get the error.
> Thanks Ben for your time. Be well.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.