[ https://issues.apache.org/jira/browse/PDFBOX-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-390:
--------------------------------------

    Attachment: ASCIIHexFilter_390-Patch.diff

I've created a patch with the changes Mathias suggested. Does anyone have a sample document to test this feature?

> org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
> ---------------------------------------------------------
>
>                 Key: PDFBOX-390
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-390
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 0.8.0-incubator
>            Reporter: Mathias Bosch
>             Fix For: 0.8.0-incubator
>
>         Attachments: ASCIIHexFilter_390-Patch.diff
>
>
> org.pdfbox.filter.ASCIIHexFilter does not skip whitespace.
> According to the specification (pdf_reference_1-7.pdf), all whitespace
> characters between the ASCII-hex values have to be skipped (see 3.3.1
> ASCIIHexDecode Filter).
> The 0.8.0-incubator source decodes (or attempts to decode) those whitespace
> characters, and as a result the byte values are wrong (all characters that
> are not [0-9a-f] result in -1, but processing continues).
> This produces an invalid byte stream.
> The ASCIIHexDecode Filter section also defines '>' as the EOD (end-of-data)
> character of the byte stream, which might ease the parsing of inline images.
> (The EI operator should follow the EOD in the case of an inline image.)
> Example of an ASCII-hex encoded value, copied from the spec:
> FF CE A3 7C 5B 3F 28 16 0A 02 00 02 0A 16 28 3F 5B 7C A3 CE FF >
>
> I fixed the problem locally so that I could continue with my work.
> I am pasting the changed code here as a hint that might help to fix the bug.
> public class ASCIIHexFilter implements Filter
> {
>     /**
>      * Whitespace
>      *  0  0x00  Null (NUL)
>      *  9  0x09  Tab (HT)
>      * 10  0x0A  Line feed (LF)
>      * 12  0x0C  Form feed (FF)
>      * 13  0x0D  Carriage return (CR)
>      * 32  0x20  Space (SP)
>      */
>     protected boolean isWhitespace(int c) {
>         return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
>     }
>
>     protected boolean isEOD(int c) {
>         return (c == 62); // '>' - EOD
>     }
>
>     /**
>      * [EMAIL PROTECTED]
>      */
>     public void decode(InputStream compressedData, OutputStream result,
>             COSDictionary options, int filterIndex) throws IOException {
>         int value = 0;
>         int firstByte = 0;
>         int secondByte = 0;
>         while ((firstByte = compressedData.read()) != -1) {
>
>             // skip whitespace in front of the first hex digit
>             while (isWhitespace(firstByte))
>                 firstByte = compressedData.read();
>             // stop at EOD, or at end of stream (read() returned -1 while skipping)
>             if (firstByte == -1 || isEOD(firstByte))
>                 break;
>
>             if (REVERSE_HEX[firstByte] == -1)
>                 System.out.println("Invalid Hex Code; int: " + firstByte
>                         + " char: " + (char) firstByte);
>             value = REVERSE_HEX[firstByte] * 16;
>
>             // whitespace may also separate the two digits of one byte
>             secondByte = compressedData.read();
>             while (isWhitespace(secondByte))
>                 secondByte = compressedData.read();
>
>             if (isEOD(secondByte)) {
>                 // second value behaves like 0 in case of EOD
>                 result.write(value);
>                 break;
>             }
>             if (secondByte >= 0) {
>                 if (REVERSE_HEX[secondByte] == -1)
>                     System.out.println("Invalid Hex Code; int: " + secondByte
>                             + " char: " + (char) secondByte);
>                 value += REVERSE_HEX[secondByte];
>             }
>             result.write(value);
>         }
>
>         result.flush();
>     }
>     // .....................................................
>     // other code remains unchanged

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
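
The decoding rules described in the issue (skip whitespace anywhere between hex digits, stop at '>', treat a missing final digit as 0) can be tried out without PDFBox using a small standalone sketch. This is not the PDFBox filter and not the attached patch: the class name AsciiHexSketch and the helpers nextSignificant, hexValue and decodeString are hypothetical, and throwing on an invalid digit (instead of printing a message, as the code in the issue does) is an assumption made to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical standalone sketch; not the PDFBox ASCIIHexFilter class. */
class AsciiHexSketch {

    /** Whitespace as listed in the issue: NUL, HT, LF, FF, CR, SP. */
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    /** Returns the next byte that is not whitespace, or -1 at end of stream. */
    static int nextSignificant(InputStream in) throws IOException {
        int c;
        do {
            c = in.read();
        } while (c != -1 && isWhitespace(c));
        return c;
    }

    /** Maps one hex digit to 0..15. Assumption: anything else is an error. */
    static int hexValue(int c) throws IOException {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new IOException("Invalid hex digit, int: " + c);
    }

    /** Decodes hex digit pairs until '>' (EOD) or end of stream. */
    static void decode(InputStream in, OutputStream out) throws IOException {
        while (true) {
            int first = nextSignificant(in);
            if (first == -1 || first == '>') {
                break;
            }
            int value = hexValue(first) * 16;
            int second = nextSignificant(in);
            boolean ended = (second == -1 || second == '>');
            if (!ended) {
                value += hexValue(second);
            }
            // With an odd digit count, the missing low digit counts as 0.
            out.write(value);
            if (ended) {
                break;
            }
        }
        out.flush();
    }

    /** Convenience wrapper for experiments; input is read as ISO-8859-1 bytes. */
    static byte[] decodeString(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            decode(new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

For example, AsciiHexSketch.decodeString("FF CE A3 7C >") yields the four bytes FF CE A3 7C, and anything after the '>' is ignored. Routing both digits through the same whitespace-skipping read keeps the end-of-stream check in one place.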