[ https://issues.apache.org/jira/browse/PDFBOX-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696232#action_12696232 ]
Justin LeFebvre commented on PDFBOX-372:
----------------------------------------

Ran this through the trunk version of Pdfbox and had no issues extracting the text. I believe that the changes Brian and I made to the Parser fixed this issue.

> java.io.IOException: Error: expected hex character and not :32
> ---------------------------------------------------------------
>
>                 Key: PDFBOX-372
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-372
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>   Affects Versions: 0.7.3
>         Environment: Solaris OS JDK 6
>            Reporter: DURGA DEEP
>         Attachments: Webmail02.pdf
>
>
> Unable to parse the following PDF Attachment.
> java.io.IOException: Error: expected hex character and not :32
>     at org.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:283)
>     at org.fontbox.cmap.CMapParser.parse(CMapParser.java:105)
>     at org.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:535)
>     at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:387)
>     at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:325)
>     at org.pdfbox.util.operator.ShowText.process(ShowText.java:64)
>     at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
>     at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
>     at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
>     at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
>     at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
>     at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
>     at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.