On Sun, Mar 9, 2008 at 9:39 AM, Ralph Giles <[EMAIL PROTECTED]> wrote:
> On 8-Mar-08, at 2:00 PM, Cirilo Bernardo wrote:
>
> > I don't understand this scheme - a stream tells you how long its input
> > data segment is.
>
> Irrespective of your other objections, I suggest that you should
> design for the occasional possibility that the Length entry in the
> stream dictionary is missing or wrong and the length of the stream
> has to be determined by looking for an EOD marker, hitting the end of
> the file, or even something that looks like a different kind of data.
> Such files are of course invalid, but they nevertheless exist.
>
> -r
>
Yes, such streams are invalid, they may exist, and they should probably
not be supported. The safest thing to do is reject that stream
altogether, and this is what is stated in the PDF specification.

If there is a bad PDF writer in existence somewhere, people should not
waste time working around its faults. If you try to accommodate such
problems then you are deliberately ignoring the specification and
inviting bad coders to produce more bad code, because they know someone
else will put in all the effort to work around it. In the rare cases
where a file is partially corrupted and a copy cannot be obtained
elsewhere, recovering information should be left to specialists, not to
the PDF library.

Even if, in the future, someone complains "But you can't view files
created by FaultyPDFWriter 2.0 and 90% of people use that software",
that is no excuse to cater to bad software. The question then is: do
you want to trust your documents to software which is not producing
correct output?

So although I am guessing that your intent is to produce a more robust
library, the simplest solution (drop the stream) is in fact the best and
most robust, even if end-users may scream and say "oh, but if they only
accepted '~' for EOD rather than requiring '~>' as in the standards,
then I could see this picture that is meant to display on page 2".

Personally, I think supporting bad streams is simply circumventing the
PDF design for reliability.

- Cirilo
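
P.S. To make the "drop the stream" policy concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions,
not the library's API: the names read_stream_strict and
ascii85_has_strict_eod are hypothetical, and the caller is assumed to
have already parsed the stream dictionary and located the first data
byte after the "stream" keyword. The stream is accepted only if the
declared Length lands exactly on an optional end-of-line followed by
"endstream"; anything else is dropped rather than repaired.

```python
from typing import Optional

# Hypothetical helper names; nothing here is an existing library API.
EOL_MARKERS = (b"\r\n", b"\n", b"\r")  # CRLF is tried first


def read_stream_strict(buf: bytes, data_start: int,
                       declared_length: int) -> Optional[bytes]:
    """Return the stream data if Length is consistent, else None (drop)."""
    if declared_length < 0:
        return None
    data_end = data_start + declared_length
    if data_end > len(buf):
        return None  # Length runs past the end of the file
    pos = data_end
    # An end-of-line before "endstream" is not counted in Length.
    for eol in EOL_MARKERS:
        if buf.startswith(eol, pos):
            pos += len(eol)
            break
    if not buf.startswith(b"endstream", pos):
        return None  # Length is wrong: no scanning, no guessing
    return buf[data_start:data_end]


def ascii85_has_strict_eod(data: bytes) -> bool:
    """Accept only the two-character EOD marker '~>', never a bare '~'."""
    return data.rstrip(b" \t\r\n\f\0").endswith(b"~>")
```

With buf = b"stream\nHello\nendstream" and data_start = 7, a declared
Length of 5 is accepted, while a Length of 4 or 100 causes the stream to
be dropped instead of triggering a search for "endstream".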
