Hi,

Several places in podofo expect only direct objects, although the values
there may also be indirect. As stated at the end of section 7.3.10
"Indirect Objects" of
https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf,
any object value may be either direct or an indirect reference, except in
the few cases where the specification explicitly requires one or the other.
For example, keys of dictionaries, string values in the encryption
dictionary and some keys in the cross-reference stream dictionary must be
direct objects.

For example, the PDF at
https://courtselfhelp.idaho.gov/docs/forms/CAO_NCA_1-2.pdf contains a
"listbox" acroform field whose "Opt" key is an indirect reference, but
podofo can handle only a direct object here, as can be seen in the source
file PdfField.cpp, for example in the function PdfListField::GetItemCount.
Other PDF files use an indirect reference for the "Rect" entry of a page or
acroform field. Another example PDF is attached; it contains an indirect
"Filter" key in a stream. It is a perfectly valid PDF, but podofo has
problems with it: for example, when trying to draw on page 0, it cannot
correctly decode the old content in this stream.

The attached patch fixes cases where podofo expects only direct objects but
they can also be indirect. This is mostly done by replacing
"GetDictionary().GetKey(...)" with GetIndirectKey or MustGetIndirectKey.
This change should be backward compatible. The only additional cases in
which GetIndirectKey or MustGetIndirectKey throw are those where the key
value is a reference and either the referenced object does not exist or the
dictionary does not have an owner. In those cases the old code would return
the reference itself, and the subsequent code, typically of the form
"obj->Get[Name/Array/String/...etc]", would throw as well. In the other
cases the new calls avoid continuing with a value that would be invalid and
would fail later anyway.
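
To illustrate the difference, here is a minimal sketch using a toy model, not
the real podofo classes. The names Object, Dictionary, ObjectTable, getKey and
getIndirectKey are made up for this example, and resolving only one level of
reference is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Toy model only: none of these types are PoDoFo classes.
enum class Kind { Number, Name, Reference };

struct Object {
    Kind kind;
    long number;       // value for Number, object number for Reference
    std::string name;  // value for Name
};

Object makeNumber(long n) { return Object{Kind::Number, n, ""}; }
Object makeReference(long objectNumber) { return Object{Kind::Reference, objectNumber, ""}; }

// Mirrors the "obj->GetNumber()" style of accessor: throws on a type mismatch.
long getNumber(const Object& obj) {
    if (obj.kind != Kind::Number) throw std::runtime_error("not a number");
    return obj.number;
}

// Stands in for the document's table of indirect objects (the "owner").
typedef std::map<long, Object> ObjectTable;

struct Dictionary {
    std::map<std::string, Object> entries;
    const ObjectTable* owner = nullptr;  // null models a dictionary without owner

    // Old access pattern: returns the stored value as-is, possibly a reference.
    const Object* getKey(const std::string& key) const {
        std::map<std::string, Object>::const_iterator it = entries.find(key);
        return it == entries.end() ? nullptr : &it->second;
    }

    // New access pattern: follows a reference through the owner (one level only).
    const Object* getIndirectKey(const std::string& key) const {
        const Object* value = getKey(key);
        if (value == nullptr || value->kind != Kind::Reference) return value;
        if (owner == nullptr) throw std::runtime_error("dictionary has no owner");
        ObjectTable::const_iterator it = owner->find(value->number);
        if (it == owner->end()) throw std::runtime_error("referenced object does not exist");
        return &it->second;
    }
};
```

With "Rotate" stored as a reference to an object holding 90, calling
getNumber on the result of getKey throws in this model, while the result of
getIndirectKey yields 90. A dangling reference throws in both paths, only
at a different point, which is the backward compatibility argument above.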

There are also variants GetKeyAsName, GetKeyAsLong, GetKeyAsReal and
GetKeyAsBool for which I added indirect counterparts like
"GetIndirectKeyAs...".

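A rough sketch of why the typed getters need indirect counterparts, again as
a toy model and not podofo code. It assumes that a typed getter falls back to
the supplied default when the stored value is not of the requested type, so a
reference is silently treated as "missing". The function names keyAsLong and
indirectKeyAsLong are hypothetical, and only numbers are modelled.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Toy model only: none of these types or functions are PoDoFo API.
struct Object {
    bool isReference;
    long number;  // the value itself, or the object number when isReference is true
};

typedef std::map<std::string, Object> Dictionary;
typedef std::map<long, Object> ObjectTable;  // indirect objects by object number

// Direct-only getter: a reference is "not a number", so the default comes back.
long keyAsLong(const Dictionary& dict, const std::string& key, long fallback) {
    Dictionary::const_iterator it = dict.find(key);
    if (it == dict.end() || it->second.isReference) return fallback;
    return it->second.number;
}

// Indirect counterpart: resolve a reference first, then apply the same rule.
long indirectKeyAsLong(const Dictionary& dict, const ObjectTable& objects,
                       const std::string& key, long fallback) {
    Dictionary::const_iterator it = dict.find(key);
    if (it == dict.end()) return fallback;
    const Object* value = &it->second;
    if (value->isReference) {
        ObjectTable::const_iterator target = objects.find(value->number);
        if (target == objects.end()) throw std::runtime_error("referenced object does not exist");
        value = &target->second;
    }
    return value->isReference ? fallback : value->number;
}
```

In this model, a "MaxLen" entry stored as a reference to an object holding 25
reads as the default through keyAsLong and as 25 through indirectKeyAsLong,
while directly stored values behave the same in both.
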
This patch does not fix bugs where podofo expects direct objects in arrays
or during certain enumerations, where the values can also be indirect
references. There are a few hundred uses of PdfArray indexing, and they are
harder to find than the dictionary accesses (GetKey). Fixing these as well
will be much easier after automatic object ownership is merged, because
"FindAt" can then be used.
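
The array case can be sketched the same way. This is a toy model under the
assumption that an array knows its owner and offers a resolving lookup; the
types and the findAt signature below are invented for illustration and are
not the actual "FindAt" from the ownership work.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// Toy model only: none of these types are PoDoFo classes.
struct Element {
    bool isReference;
    double number;      // meaningful when isReference is false
    long objectNumber;  // meaningful when isReference is true
};

Element makeNumber(double n) { return Element{false, n, 0}; }
Element makeReference(long objectNumber) { return Element{true, 0.0, objectNumber}; }

typedef std::map<long, Element> ObjectTable;

struct Array {
    std::vector<Element> items;
    const ObjectTable* owner = nullptr;

    // Plain indexing: hands back whatever is stored, possibly a reference.
    const Element& getAt(std::size_t index) const { return items.at(index); }

    // Resolving lookup in the spirit of "FindAt" (hypothetical signature).
    const Element& findAt(std::size_t index) const {
        const Element& element = items.at(index);
        if (!element.isReference) return element;
        if (owner == nullptr) throw std::runtime_error("array has no owner");
        ObjectTable::const_iterator it = owner->find(element.objectNumber);
        if (it == owner->end()) throw std::runtime_error("referenced object does not exist");
        return it->second;
    }
};

// Width of a rectangle stored as [llx lly urx ury].
double rectWidth(const Array& rect) {
    return rect.findAt(2).number - rect.findAt(0).number;
}
```

For a rectangle written as "[0 0 5 0 R 792]" where object 5 holds 612,
rectWidth returns 612 in this model, whereas code that indexes the array
directly would see a reference in the third slot.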

This patch does not fix the stream dictionary key "DecodeParms" and its
child objects (like Predictor, Colors, BitsPerComponent, Columns,
EarlyChange), as this will also be easier after automatic object ownership
is merged, when an indirect key value can be retrieved using "FindKey"
called on a PdfDictionary. Currently it is not possible to dereference
indirect objects from a PdfDictionary, and the interfaces which accept
decode params all use this type as their parameter. I think it is better to
fix this using "FindKey" rather than to change these interfaces to accept
PdfObject.

This patch does not fix problems with the encryption dictionary, because
podofo needs to parse it before parsing all other objects and expects all
of its key values to be direct. But the PDF reference states that only
string values within the encryption dictionary must be direct objects, so
all other values (names, numbers and so on) can also be indirect. So fixing
this is not so easy.

Here is a summary of the dictionary keys fixed by this patch, all of which
can also be indirect:
- Filter in stream
- Kids in page tree node
- Kids in name tree node
- Names in name tree node
- Limits in name tree node
- MediaBox in page object
- CropBox in page object
- TrimBox in page object
- BleedBox in page object
- ArtBox in page object
- Rotate in page object
- Resources in page object
- Type in page object
- JS in JavaScript action
- BaseEncoding in encoding
- F in file specification
- UF in file specification
- Type in font
- Subtype in font
- MissingWidth in font
- MissingWidth in font descriptor
- D in go-to action
- Width in image
- Height in image
- * in document information
- Version in catalog
- ColorSpace in resource
- AS in annotation
- H in annotation
- F in annotation
- MK in annotation
- AP in annotation
- Rect in annotation
- Contents in annotation
- C in annotation
- Dest in link annotation
- Open in pop-up annotation
- QuadPoints in text markup annotations
- N in appearance
- FT in field
- Ff in field
- V in field
- RV in field
- TM in field
- TU in field
- T in field
- AA in field
- MaxLen in text field
- Opt in choice fields
- AC in appearance characteristics
- RC in appearance characteristics
- CA in appearance characteristics
- NeedAppearances in interactive form
- S in action
- Subtype in annotation
- Type in element
- Flags in font descriptor
- FirstChar in font
- LastChar in font
- DW in font
- FontWeight in font descriptor
- ItalicAngle in font descriptor
- Ascent in font descriptor
- Descent in font descriptor
- Subtype in xobject

Attachment: indirect_filter.pdf
Description: Adobe PDF document

Attachment: dict_indirect_objects.patch
Description: Binary data

_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users
