On Thursday 22 November 2007, Craig Ringer wrote:
> Hi folks
>
> I'm having a bit of trouble with some PDF code, and was hoping for a
> helping hand. The following code is the end of the content stream
> for PDF Reference 1.5 (v5) page 89:
>
> q
> 232.5 281.5 147.5 245.975 re
> W n
> 0.5 w
> 232.75 527.225 147 -245.475 re
> S
> EMC
> Q
>
> ... and as far as I can tell it's not valid, since the EMC operator for
> closing a BMC/BDC/EMC scope appears inside a q/Q scope. If I understand
> correctly, the EMC operator should appear after the Q operator, not
> before it. There is an open BDC context for the EMC to close, so if the
> ordering of those two operators were reversed the content stream would
> be fine.
>
> The same issue appears on page 641 of the PDF 1.6 reference.
>
> It seems unlikely that the PDF Reference would contain bad PDF, so I'm
> sure I'm missing something. If anyone sees an obvious answer to what
> that is, I'd love to hear about it.
>
>
> I'm also seeing lots of issues in other files where PdfContentsTokenizer
> claims that there are indirect references in the content stream. I
> haven't seen this when checking the PDF references, but it turns up in a
> lot of other files. I haven't verified that it's actually being parsed
> correctly yet, but presuming it is are there any circumstances in which
> that might be legal after all?

I am not sure whether that is legal, but it does not look very clean. I
cannot think of any case where I would use indirect references in a
content stream. After all, that's why we have a resource dictionary, isn't
it?

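Regarding the EMC/Q ordering in the quoted question: the rule Craig describes can be made concrete with a small nesting check over the operator sequence of a content stream. This is only a sketch that assumes, as Craig does, that marked-content sequences and q/Q pairs must nest strictly. The function name OperatorsNestStrictly is hypothetical (not PoDoFo API), and extracting the operator keywords (e.g. with PdfContentsTokenizer) is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of PoDoFo): checks that graphics-state
// pairs (q ... Q) and marked-content pairs (BMC/BDC ... EMC) nest strictly,
// i.e. each closing operator closes the most recently opened scope.
// Input is the sequence of operator keywords only; operands are dropped.
bool OperatorsNestStrictly(const std::vector<std::string>& ops) {
    std::vector<char> open;  // 'q' = graphics state, 'm' = marked content
    for (const std::string& op : ops) {
        if (op == "q") {
            open.push_back('q');
        } else if (op == "BMC" || op == "BDC") {
            open.push_back('m');
        } else if (op == "Q" || op == "EMC") {
            const char expected = (op == "Q") ? 'q' : 'm';
            if (open.empty() || open.back() != expected)
                return false;  // closes the wrong scope, or nothing is open
            open.pop_back();
        }
        // All other operators (re, W, n, w, S, ...) do not affect nesting.
    }
    return open.empty();  // every opened scope must also be closed
}
```

For the quoted excerpt, with the earlier BDC prepended, the sequence `BDC q re W n w re S EMC Q` is rejected by this check, while swapping the final EMC and Q makes it pass.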
>
>
> Finally, on a side note ... opening a PDF that contains object streams
> is *really* slow. I haven't looked into why, but if there is anything
> you could point me at that I might be able to do about that, I'd like to
> hear about it.
You are right. This is one area of the parser that needs optimization. You
might want to look at:
void PdfParser::ReadObjectFromStream( int nObjNo, int )
Current issues are:
- We read all objects from a stream into memory immediately. (I think this is
ok, because that way we have to decompress the stream only once.)
- At the beginning of ReadObjectFromStream we search a NON-SORTED list to see
whether we have already read this stream. Using binary search on a sorted list
could be an improvement here.
- Maybe we can find a general way to reduce the number of calls to
ReadObjectFromStream by optimizing PdfParser::ReadObjectsInternal().
- Another point is that the object containing the stream currently stays in
memory after we have read it. I think it could even be deleted safely, since it
should not be referenced from anywhere else.
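The binary-search idea from the list above could look roughly like the following C++ sketch. The class name ParsedStreamSet and its methods are hypothetical, not existing PoDoFo API; the class only tracks which object-stream numbers have already been read, kept in a sorted std::vector so that the "already read?" lookup uses std::binary_search instead of a linear scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of PoDoFo): remembers which object streams
// have already been decompressed and parsed. The vector is kept sorted and
// free of duplicates, so lookups are O(log n) binary searches.
class ParsedStreamSet {
public:
    // Returns true if the object stream with number nObjNo was already read.
    bool Contains(int nObjNo) const {
        return std::binary_search(m_streams.begin(), m_streams.end(), nObjNo);
    }

    // Records nObjNo as read, inserting it at its sorted position.
    void Insert(int nObjNo) {
        auto it = std::lower_bound(m_streams.begin(), m_streams.end(), nObjNo);
        if (it == m_streams.end() || *it != nObjNo)
            m_streams.insert(it, nObjNo);
    }

    std::size_t Size() const { return m_streams.size(); }

private:
    std::vector<int> m_streams;  // sorted object numbers of parsed streams
};
```

Inside ReadObjectFromStream one would check Contains() for the stream's object number before decompressing, and call Insert() afterwards. Inserting into a sorted vector is still linear because elements shift, so std::set or std::unordered_set would be an equally reasonable choice if very many streams are read.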
best regards,
Dom
--
**********************************************************************
Dominik Seichter - [EMAIL PROTECTED]
KRename - http://www.krename.net - Powerful batch renamer for KDE
KBarcode - http://www.kbarcode.net - Barcode and label printing
PoDoFo - http://podofo.sf.net - PDF generation and parsing library
SchafKopf - http://schafkopf.berlios.de - Schafkopf, a card game, for KDE
Alan - http://alan.sf.net - A Turing Machine in Java
**********************************************************************
_______________________________________________ Podofo-users mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/podofo-users
