On Thursday 22 November 2007, Craig Ringer wrote:
> Hi folks
>
> I'm having a bit of trouble with some PDF code, and was hoping for a
> helping hand. The following code is the end of the content stream
> for PDF Reference 1.5 (v5) page 89:
>
> q
> 232.5 281.5 147.5 245.975 re
> W n
> 0.5 w
> 232.75 527.225 147 -245.475 re
> S
> EMC
> Q
>
> ... and as far as I can tell it's not valid, since the EMC operator that
> closes a BMC/BDC ... EMC sequence appears inside a q/Q block, so the two
> scopes overlap instead of nesting. If I understand correctly, the EMC
> operator should appear after the Q operator, not before it. There is an
> open BDC context for the EMC to close, so if the ordering of those two
> operators were reversed the content stream would be fine.
>
> The same issue appears on page 641 of the PDF 1.6 reference.
>
> It seems unlikely that the PDF Reference would contain bad PDF, so I'm
> sure I'm missing something. If anyone sees an obvious answer to what
> that is, I'd love to hear about it.
>
>
> I'm also seeing lots of issues in other files where PdfContentsTokenizer
> claims that there are indirect references in the content stream. I
> haven't seen this when checking the PDF references, but it turns up in a
> lot of other files. I haven't verified that it's actually being parsed
> correctly yet, but presuming it is, are there any circumstances in which
> that might be legal after all?
I am not sure whether the above code is legal, but it does not look very
clean. Currently I cannot think of any case where I would have used indirect
references in a content stream. After all, that's what the resource
dictionary is for, isn't it?

>
>
> Finally, on a side note ... opening a PDF that contains object streams
> is *really* slow. I haven't looked into why, but if there is anything
> you could point me at that I might be able to do about that, I'd like to
> hear about it.
You are right. This is one area of the parser that needs optimization.
You might want to look at:

void PdfParser::ReadObjectFromStream( int nObjNo, int )

Current issues are:
- We read all objects from a stream into memory immediately (I think this is
ok, because that way we only have to decompress the stream once).
- At the beginning of ReadObjectFromStream we search a NON-SORTED list to see
whether we have already read this stream. Using binary search on a sorted
list could be an improvement here.
- Maybe we can find a general way to reduce the number of calls to
ReadObjectFromStream by optimizing PdfParser::ReadObjectsInternal().
- Another point is that the object containing the stream currently stays in
memory after we have read it. I think it could even be deleted safely, since
it should not be referenced from anywhere else.

best regards,
        Dom

-- 
**********************************************************************
Dominik Seichter - [EMAIL PROTECTED]
KRename  - http://www.krename.net  - Powerful batch renamer for KDE
KBarcode - http://www.kbarcode.net - Barcode and label printing
PoDoFo - http://podofo.sf.net - PDF generation and parsing library
SchafKopf - http://schafkopf.berlios.de - Schafkopf, a card game,  for KDE
Alan - http://alan.sf.net - A Turing Machine in Java
**********************************************************************


_______________________________________________
Podofo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/podofo-users
