BaseParser incorrectly handling stream, exhibiting IOException
--------------------------------------------------------------

                 Key: PDFBOX-383
                 URL: https://issues.apache.org/jira/browse/PDFBOX-383
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 0.7.3
         Environment: PDFBox 0.7.3 with Java 5 running on Windows
            Reporter: Son


When loading a PDF file that contains a file attachment annotation, errors can 
occur when two conditions hold:
- the Length value in the dictionary of the F stream is an indirect reference 
to an integer value
- the content of the filtered stream contains the word 'endstream'

Typically this occurs when the PDF file contains a stream description such as 
the following:

12 0 obj
<< /Length 16 0 R
/Filter /FlateDecode
>>
stream
{content}
endstream
endobj
...
16 0 obj
{length}
endobj
....

and the filtered {content} contains the byte sequence "endstream"
(see line 3700 of the attachment).

The problem lies in the way stream content is (always) read by the method 
readUntilEndStream(), which stops at the first 'endstream' sequence it finds.
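
To illustrate the failure mode, here is a minimal, self-contained sketch. It 
is not the PDFBox code: the class EndstreamScanDemo and its methods are 
hypothetical names, and the stream body is uncompressed sample text chosen so 
that it contains the marker. It compares a scan for the first 'endstream' 
with a read driven by the declared length.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of PDFBox.
class EndstreamScanDemo {
    static final byte[] MARKER = "endstream".getBytes(StandardCharsets.ISO_8859_1);

    // Index of the first occurrence of MARKER at or after 'from', or -1 if absent.
    static int indexOfMarker(byte[] data, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - MARKER.length; i++) {
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Body length as seen by a scan that stops at the first marker (-1 if none).
    static int naiveLength(byte[] file, int start) {
        int end = indexOfMarker(file, start);
        return end < 0 ? -1 : end - start;
    }

    // Body obtained by trusting the declared Length value.
    static byte[] readByLength(byte[] file, int start, int length) {
        return Arrays.copyOfRange(file, start, start + length);
    }

    public static void main(String[] args) {
        String body = "AAendstreamBB"; // 13 bytes, contains the marker
        byte[] file = ("stream\n" + body + "\nendstream\nendobj\n")
                .getBytes(StandardCharsets.ISO_8859_1);
        int start = "stream\n".length();

        // The scan stops inside the body, right after "AA".
        System.out.println("naive length: " + naiveLength(file, start)); // 2
        // The declared length recovers the whole body.
        byte[] full = readByLength(file, start, body.length());
        System.out.println("declared length: " + full.length); // 13
    }
}
```

With a direct Length value the second approach is available immediately. With 
an indirect reference (16 0 R above) the value may not have been parsed yet 
when the stream is read, which is when the scan is used and truncates the 
data.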

A (partial) fix was made that reads the stream content in three different ways:
- if the Length is known (that is, it is a direct object), {length} bytes are 
read and written to the stream's FilteredStream
- if the Length is unknown and the filter is FlateFilter, the code decodes the 
data (the FlateDecode algorithm does not require the length of the encoded 
data to be known ahead of time) and attaches the result as the stream's 
unfiltered stream
- otherwise, the current behavior is kept
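
The second case can be sketched with the standard java.util.zip.Inflater, 
which stops by itself at the end of the compressed data. This is only an 
illustration under stated assumptions, not the actual patch: the class 
FlateUnknownLengthDemo, its Decoded holder and both methods are hypothetical 
names, and the whole file is assumed to be in memory as a byte array.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class, not part of PDFBox.
class FlateUnknownLengthDemo {
    // Decoded bytes plus the number of compressed bytes that were consumed.
    static final class Decoded {
        final byte[] data;
        final int consumed;

        Decoded(byte[] data, int consumed) {
            this.data = data;
            this.consumed = consumed;
        }
    }

    // Inflates zlib data starting at 'offset' without knowing its length.
    static Decoded inflateUnknownLength(byte[] buf, int offset) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(buf, offset, buf.length - offset);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported Flate data");
                }
                out.write(chunk, 0, n);
            }
            // getTotalIn() reports how many input bytes the compressed data occupied.
            return new Decoded(out.toByteArray(), inflater.getTotalIn());
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid Flate data", e);
        } finally {
            inflater.end();
        }
    }

    // Test helper: compresses 'raw' in zlib format.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            out.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "sample content".getBytes(StandardCharsets.ISO_8859_1);
        byte[] compressed = deflate(raw);
        byte[] tail = "\nendstream\nendobj\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] file = new byte[compressed.length + tail.length];
        System.arraycopy(compressed, 0, file, 0, compressed.length);
        System.arraycopy(tail, 0, file, compressed.length, tail.length);

        Decoded d = inflateUnknownLength(file, 0);
        System.out.println(new String(d.data, StandardCharsets.ISO_8859_1));
        System.out.println(d.consumed == compressed.length);
    }
}
```

Because the decoder reports how many input bytes it consumed, a parser could 
resume after the compressed data and expect the real 'endstream' keyword 
there, whatever bytes the compressed data itself contains.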

Running the modified code on the files that exhibited the errors fixed the 
problems that were encountered.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
