Hi,

There are places in podofo that expect only direct objects, although these objects may also be indirect. As stated at the end of section 7.3.10 "Indirect Objects" of the PDF reference (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf), any object value may be either a direct object or an indirect reference, except in the few cases where the reference explicitly requires one or the other. For example, keys of dictionaries, string values in the encryption dictionary and some keys in the cross-reference stream dictionary must be direct objects.
For example, the PDF at https://courtselfhelp.idaho.gov/docs/forms/CAO_NCA_1-2.pdf contains a "listbox" AcroForm field whose "Opt" key is an indirect reference, but podofo can handle only a direct object here, as can be seen in PdfField.cpp, for example in the function PdfListField::GetItemCount. Other PDF files use an indirect reference for the "Rect" key of a page or of an AcroForm field. Another example PDF is attached; it contains an indirect "Filter" key in a stream. It is a perfectly valid PDF, but podofo has problems with it: for example, when trying to draw on page 0, it cannot correctly decode the old content in this stream.

The attached patch fixes cases where podofo expects only direct objects although they can also be indirect. This is mostly done by replacing "GetDictionary().GetKey(...)" with GetIndirectKey or MustGetIndirectKey. This change should be backward compatible. In the cases where GetIndirectKey or MustGetIndirectKey additionally throws (the key value is a reference and either the referenced object does not exist or the dictionary does not have an owner), the old code would have returned the reference, and the subsequent code, typically of the form "obj->Get[Name/Array/String/...etc]", would have thrown as well. In the other cases, the new code avoids an operation that would have been invalid and would have failed later anyway. There are also the variants GetKeyAsName, GetKeyAsLong, GetKeyAsReal and GetKeyAsBool, for which I added indirect counterparts named "GetIndirectKeyAs...".

This patch does not fix bugs where podofo expects a direct object in arrays or during certain enumerations, where these values can also be indirect references. There are a few hundred uses of PdfArray indexing, and they are harder to find than the dictionary case (GetKey). Fixing these will be much easier once automatic object ownership is merged, because "FindAt" can then be used.
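The difference between a plain key lookup and a reference-following lookup can be sketched with a small self-contained toy model. This is only an illustration under stated assumptions: the types Ref, Value, Dict and Document below are hypothetical stand-ins and not podofo's real classes, only the method names GetKey and GetIndirectKey are borrowed from this mail, and only one level of indirection is followed.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>

// Toy model, NOT podofo's real classes: a value is either a name (direct)
// or a reference to another object identified by its object number.
struct Ref { int objNum; };
using Value = std::variant<std::string, Ref>;

struct Document;

struct Dict {
    std::map<std::string, Value> entries;
    const Document* owner = nullptr;  // needed to resolve references

    // GetKey-style lookup: returns the stored value as-is, reference or not.
    const Value* GetKey(const std::string& key) const {
        auto it = entries.find(key);
        return it == entries.end() ? nullptr : &it->second;
    }

    // GetIndirectKey-style lookup: follows a reference if the value is one.
    const Value* GetIndirectKey(const std::string& key) const;
};

struct Document {
    std::map<int, Value> objects;  // object number -> object value
};

const Value* Dict::GetIndirectKey(const std::string& key) const {
    const Value* v = GetKey(key);
    if (!v) return nullptr;
    if (const Ref* r = std::get_if<Ref>(v)) {
        // These are the two extra failure cases of the indirect lookup.
        if (!owner) throw std::runtime_error("dictionary has no owner");
        auto it = owner->objects.find(r->objNum);
        if (it == owner->objects.end())
            throw std::runtime_error("referenced object does not exist");
        return &it->second;
    }
    return v;  // already a direct value
}

// Stand-in for calls such as obj->GetName(): throws if the value is not a name.
const std::string& GetName(const Value& v) {
    return std::get<std::string>(v);  // std::bad_variant_access on a Ref
}
```

In this model, a stream dictionary whose "Filter" entry is stored as Ref{7} yields the name only through GetIndirectKey. Passing the result of GetKey to GetName throws instead, which mirrors the backward-compatibility argument: code that now fails inside the indirect lookup would previously have failed one step later.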
This patch does not fix the stream dictionary key "DecodeParms" and its child objects (such as Predictor, Colors, BitsPerComponent, Columns and EarlyChange). This will also be easier after automatic object ownership is merged, because the indirect key value can then be retrieved using "FindKey" called on the PdfDictionary. Currently it is not possible to dereference indirect objects from a PdfDictionary, and the interfaces that accept decode parameters all take this type as their parameter. I think it is better to fix this using "FindKey" than to change these interfaces to accept a PdfObject.

This patch does not fix problems with the encryption dictionary either, because podofo needs to parse it before parsing all other objects and expects all of its key values to be direct. However, the PDF reference states that only string values within the encryption dictionary must be direct objects, so all other values (names, numbers and so on) can also be indirect. Fixing this is therefore not so easy.

Here is a summary of the dictionary keys that this patch fixes, all of which can also be indirect:

- Filter in stream
- Kids in page tree node
- Kids in name tree node
- Names in name tree node
- Limits in name tree node
- MediaBox in page object
- CropBox in page object
- TrimBox in page object
- BleedBox in page object
- ArtBox in page object
- Rotate in page object
- Resources in page object
- Type in page object
- JS in JavaScript action
- BaseEncoding in encoding
- F in file specification
- UF in file specification
- Type in font
- Subtype in font
- MissingWidth in font
- MissingWidth in font descriptor
- D in go-to action
- Width in image
- Height in image
- * in document information
- Version in catalog
- ColorSpace in resource
- AS in annotation
- H in annotation
- F in annotation
- MK in annotation
- AP in annotation
- Rect in annotation
- Contents in annotation
- C in annotation
- Dest in link annotation
- Open in pop-up annotation
- QuadPoints in text markup annotations
- N in appearance
- FT in field
- Ff in field
- V in field
- RV in field
- TM in field
- TU in field
- T in field
- AA in field
- MaxLen in text field
- Opt in choice fields
- AC in appearance characteristics
- RC in appearance characteristics
- CA in appearance characteristics
- NeedAppearances in interactive form
- S in action
- Subtype in annotation
- Type in element
- Flags in font descriptor
- FirstChar in font
- LastChar in font
- DW in font
- FontWeight in font descriptor
- ItalicAngle in font descriptor
- Ascent in font descriptor
- Descent in font descriptor
- Subtype in xobject
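To illustrate why the "DecodeParms" case becomes simpler once a dictionary knows its owner, here is a rough self-contained sketch. Everything in it is an assumption made for illustration: Ref, Value, ObjectTable, Dict, FindKeyAsLong, PredictorParams and ReadDecodeParms are hypothetical names, not podofo's API, and the FindKey shown here only imitates the idea of the lookup mentioned in this mail. The fallback values are the defaults the PDF reference gives for the predictor parameters.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Hypothetical toy types, not podofo's API.
struct Ref { int objNum; };
using Value = std::variant<long long, Ref>;
using ObjectTable = std::map<int, Value>;

struct Dict {
    std::map<std::string, Value> entries;
    const ObjectTable* owner = nullptr;  // assumed to be set by the ownership mechanism

    // FindKey-style lookup: resolves one level of indirection through the owner.
    // Returns nullptr if the key is absent or the reference cannot be resolved.
    const Value* FindKey(const std::string& key) const {
        auto it = entries.find(key);
        if (it == entries.end()) return nullptr;
        if (const Ref* r = std::get_if<Ref>(&it->second)) {
            if (!owner) return nullptr;
            auto target = owner->find(r->objNum);
            return target == owner->end() ? nullptr : &target->second;
        }
        return &it->second;
    }

    // Integer lookup with a fallback, in the spirit of GetIndirectKeyAsLong.
    long long FindKeyAsLong(const std::string& key, long long fallback) const {
        const Value* v = FindKey(key);
        if (!v) return fallback;
        const long long* n = std::get_if<long long>(v);
        return n ? *n : fallback;
    }
};

struct PredictorParams {
    long long predictor;
    long long colors;
    long long bitsPerComponent;
    long long columns;
};

// The decoder-side interface still receives only a dictionary, as described
// in the mail; indirect values are resolved inside the dictionary itself.
PredictorParams ReadDecodeParms(const Dict& parms) {
    return PredictorParams{
        parms.FindKeyAsLong("Predictor", 1),
        parms.FindKeyAsLong("Colors", 1),
        parms.FindKeyAsLong("BitsPerComponent", 8),
        parms.FindKeyAsLong("Columns", 1),
    };
}
```

The point of the sketch is the signature of ReadDecodeParms: it does not have to change to accept a full object, because the dictionary can resolve references on its own. Returning the fallback for an unresolvable reference is a choice made for this sketch only; throwing would be an equally plausible policy.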
indirect_filter.pdf
Description: Adobe PDF document
dict_indirect_objects.patch
Description: Binary data
_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users