[jira] Updated: (PDFBOX-390) org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace

JIRA Sat, 29 Nov 2008 06:09:08 -0800

     [ 
https://issues.apache.org/jira/browse/PDFBOX-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Andreas Lehmkühler updated PDFBOX-390:
--------------------------------------

    Attachment: ASCIIHexFilter_390-Patch.diff

I've created a patch with the changes Mathias suggested. Does anyone have a 
sample document to test this feature?

> org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
> ---------------------------------------------------------
>
>                 Key: PDFBOX-390
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-390
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 0.8.0-incubator
>            Reporter: Mathias Bosch
>             Fix For: 0.8.0-incubator
>
>         Attachments: ASCIIHexFilter_390-Patch.diff
>
>
> org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
> According to the Specification (pdf_reference_1-7.pdf) all Whitespace
> Characters between the ASCII-Hex values have to be skipped (see 3.3.1
> ASCIIHexDecode Filter).
> The 0.8.0-incubator source decodes (or attempts to decode) those Whitespace
> Characters and as a result the byte values are wrong (all characters that
> are not [0-9a-f] result in -1, but processing does continue).
> This causes an invalid byte Stream.
> The ASCIIHexDecode Filter Section also defines the EOD (end-of-data)
> Character of the Byte Stream as '>', which might ease the parsing of inline
> Images. (The EI Operator should follow the EOD in case of an inline Image.)
> Example for ASCII-Hex encoded value, copied from the Spec:
> FF CE A3 7C 5B 3F 28 16 0A 02 00 02 0A 16 28 3F 5B 7C A3 CE FF >
> I fixed the problem locally so that I could continue with my work.
> I am pasting the changed code here as a hint that might help to fix the bug.
> public class ASCIIHexFilter
>   implements Filter
> {
>  /**
>   * Whitespace
>   *   0  0x00  Null (NUL)
>   *   9  0x09  Tab (HT)
>   *  10  0x0A  Line feed (LF)
>   *  12  0x0C  Form feed (FF)
>   *  13  0x0D  Carriage return (CR)
>   *  32  0x20  Space (SP)  
>   */
>   protected boolean isWhitespace(int c) {
>     return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
>   }
>   
>   protected boolean isEOD(int c) {
>     return (c == 62); // '>' - EOD
>   }
>   /**
>    * {@inheritDoc}
>    */
>   public void decode(InputStream compressedData, OutputStream result,
>       COSDictionary options, int filterIndex) throws IOException {
>     int value = 0;
>     int firstByte = 0;
>     int secondByte = 0;
>     while ((firstByte = compressedData.read()) != -1) {
>
>       // skip whitespace before the first hex digit
>       while (isWhitespace(firstByte))
>         firstByte = compressedData.read();
>       // stop at EOD, or if the stream ended while skipping whitespace
>       if (firstByte == -1 || isEOD(firstByte))
>         break;
>
>       if (REVERSE_HEX[firstByte] == -1)
>         System.out.println("Invalid Hex Code; int: " + firstByte
>             + " char: " + (char) firstByte);
>       value = REVERSE_HEX[firstByte] * 16;
>
>       // whitespace may also appear between the two hex digits
>       secondByte = compressedData.read();
>       while (isWhitespace(secondByte))
>         secondByte = compressedData.read();
>
>       if (isEOD(secondByte)) {
>         // second value behaves like 0 in case of EOD
>         result.write(value);
>         break;
>       }
>       if (secondByte >= 0) {
>         if (REVERSE_HEX[secondByte] == -1)
>           System.out.println("Invalid Hex Code; int: " + secondByte
>               + " char: " + (char) secondByte);
>         value += REVERSE_HEX[secondByte];
>       }
>       result.write(value);
>     }
>     
>     result.flush();
>   }
> // .....................................................
> // other code remains unchanged
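
For anyone who wants to try the rules described in the issue without a PDFBox
checkout, the following is a minimal stand-alone sketch. It is not the
attached patch and not the PDFBox class: the name AsciiHexSketch and its
helper methods are hypothetical and exist only for illustration. It assumes
the behaviour quoted above (whitespace is skipped everywhere, '>' ends the
data, and a missing final digit counts as 0) and rejects any other character
instead of continuing with -1.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; not the PDFBox ASCIIHexFilter class.
final class AsciiHexSketch {

    // PDF whitespace: NUL, HT, LF, FF, CR, SP.
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // Returns the next non-whitespace byte, or -1 at end of stream.
    static int nextSignificant(InputStream in) throws IOException {
        int c = in.read();
        while (isWhitespace(c)) {
            c = in.read();
        }
        return c;
    }

    // Converts one hex digit to its value; anything else is an error.
    static int hexValue(int c) throws IOException {
        int v = Character.digit(c, 16);
        if (v < 0) {
            throw new IOException("Invalid hex digit, int: " + c);
        }
        return v;
    }

    // Decodes until '>' (EOD) or end of stream.
    static void decode(InputStream in, OutputStream out) throws IOException {
        while (true) {
            int first = nextSignificant(in);
            if (first == -1 || first == '>') {
                break;
            }
            int high = hexValue(first);
            int second = nextSignificant(in);
            if (second == -1 || second == '>') {
                // Odd number of digits: the missing digit behaves like 0.
                out.write(high << 4);
                break;
            }
            out.write((high << 4) | hexValue(second));
        }
        out.flush();
    }

    // Convenience wrapper for quick experiments with string input.
    static byte[] decode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
        try {
            decode(new ByteArrayInputStream(raw), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Calling AsciiHexSketch.decode with the example line from the spec quoted in
the issue should give 21 bytes, starting with 0xFF 0xCE and ending with 0xFF.
Whether invalid characters should raise an error or just be logged, as in the
pasted code, is a design choice this sketch does not settle.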

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
