Hi,

But you added it to all ParseFile variants ;)
Yes, it makes sense. There are PDF files containing broken objects that PoDoFo cannot open, although many other PDF tools have no problem with them.

For example, I saw a password-encrypted PDF that had been edited by some tool which created a new encryption dictionary but did not delete the object holding the old one. So there were two: one referenced from the xref object by the "Encrypt" key, and the old one, referenced by nothing but still listed in the cross-reference table. In an encrypted PDF all strings are encrypted except those in the encryption dictionary, so PoDoFo tried to decrypt the strings in the old encryption dictionary (which were unencrypted), and that failed because of padding (not every byte sequence is valid encrypted data). It seems impossible to distinguish an encryption dictionary from any other dictionary when it is not referenced from the "Encrypt" key in the xref, because the only required key is "Filter", which can be "Standard" or anything else. So the only good solution seems to be to ignore broken objects. With this patch PoDoFo can open this PDF.

Now, the "Load" functions open a PDF by filename from disk. "LoadFromBuffer" is for opening a PDF from a memory buffer. "LoadFromDevice" opens a PDF from C++ streams, which is flexible because one can use an fstream or another implementation that reads, for example, from memory or some other source. So I see no reason why the flag would not make sense there, as PDF files from any source can contain broken objects.

I think ignoring broken objects should be the default behaviour, as it is in many other PDF tools and viewers. If an end user wants to get PDF info with the podofopdfinfo tool, what is the point of showing an error message saying that something is wrong with the PDF when it opens fine in a PDF viewer? There is also a strict mode in the PoDoFo parser, which is turned off by default to allow opening some broken PDF files.

On Wed, Feb 19, 2020 at 3:54 PM John Senneker <john.senne...@arup.com> wrote:
> Hi Michal,
>
> Good catch!
> There’s no reason in principle why the flag shouldn’t be added
> to the other Load functions as well. I didn’t add them there because I
> don’t use those functions, and I’m not sure whether or not the flag makes
> sense there. Your patch looks good to me, but because I don’t use those
> APIs I can’t test it.
>
> *From:* Michal Sudolsky <sudols...@gmail.com>
> *Sent:* Monday, February 17, 2020 6:42 PM
> *To:* John Senneker <john.senne...@arup.com>
> *Cc:* podofo-users@lists.sourceforge.net
> *Subject:* [External] Re: [Podofo-users] Patch for ignoring broken objects
>
> I am sending updated patch which covers also these two functions.
>
> On Mon, Feb 17, 2020 at 11:41 PM Michal Sudolsky <sudols...@gmail.com> wrote:
>
> Hi,
>
> There are also other "load" functions like LoadFromBuffer and
> LoadFromDevice. Why are these not covered?
>
> On Thu, Feb 13, 2020 at 6:08 PM John Senneker <john.senne...@arup.com> wrote:
>
> Hello,
>
> The patch I sent you had a merge mistake, which caused the new
> bIgnoreBrokenObjects parameter not to be passed to the parser from
> PdfMemDocument in one of the variants of PdfMemDocument::Load(). The
> updated patch attached to this email has fixed this error.
>
> --
> John Senneker
>
> *From:* John Senneker
> *Sent:* Monday, February 3, 2020 2:50 PM
> *To:* podofo-users@lists.sourceforge.net
> *Subject:* Patch for ignoring broken objects
>
> Hi,
>
> I submitted this patch a while ago, but didn’t get a response. In case it
> slipped through the cracks, I’ve pasted the text of the email below, and
> re-attached the patch file.
>
> An issue I’ve run into is full failure of parsing when objects are
> referred to that don’t exist. Often this is due to XRef stream entries that
> point to non-existent object streams or non-existent indices within those
> streams. This should be a recoverable error.
>
> There’s a bool member `m_bIgnoreBrokenObjects` in `PdfParser`, and that
> member is exposed through `SetIgnoreBrokenObjects()`, but it’s set to false
> in `Init()`, which is called by `Clear()`, which is called at the top of
> `ParseFile()`, so user-defined values are overwritten. Following the
> pattern used for the `m_bLoadOnDemand` flag, I’ve modified the
> `ParseFile()` functions to allow the user to pass in an optional
> `bIgnoreBrokenObjects` parameter (default false) to get around this issue.
> I’ve attached a patch to show what this looks like, and which also allows
> passing the `bIgnoreBrokenObjects` flag to `PdfMemDocument` constructors
> and `Load()` functions. It also checks the flag when reading objects from a
> stream.
>
> Is this the right way to go about this, or is there a better way to allow
> graceful failure and continuation of parsing?
>
> --
> John Senneker
>
> ____________________________________________________________
> Electronic mail messages entering and leaving Arup business systems are
> scanned for viruses and acceptability of content.
>
> _______________________________________________
> Podofo-users mailing list
> Podofo-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/podofo-users
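The flag-reset problem described in the original message can be illustrated with a small self-contained model. This is only a sketch: `ToyParser`, `ParseFileOld` and `ParseFileNew` are hypothetical names invented for illustration and are not PoDoFo's real classes or signatures. Only the `Init()`/`Clear()` call chain and the setter follow the description in the thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a parser; this is NOT PoDoFo's PdfParser.
class ToyParser {
public:
    ToyParser() { Init(); }

    void SetIgnoreBrokenObjects(bool b) { m_ignoreBroken = b; }
    bool GetIgnoreBrokenObjects() const { return m_ignoreBroken; }

    // Old shape: Clear() resets every flag at the top of the call,
    // so a value set earlier through the setter is silently lost.
    void ParseFileOld(const std::string& /*filename*/) {
        Clear();
        // ... actual parsing would happen here ...
    }

    // Patched shape: the flag is an optional argument (default false)
    // that is applied after Clear(), so it survives the reset.
    void ParseFileNew(const std::string& /*filename*/,
                      bool ignoreBroken = false) {
        Clear();
        m_ignoreBroken = ignoreBroken;
        // ... actual parsing would happen here ...
    }

private:
    void Init() { m_ignoreBroken = false; }  // resets user-visible state
    void Clear() { Init(); }

    bool m_ignoreBroken;
};
```

In this model, calling `SetIgnoreBrokenObjects(true)` before `ParseFileOld()` has no effect, while `ParseFileNew(name, true)` leaves the flag set for the duration of the parse.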
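As a rough illustration of what "ignoring broken objects" means in the discussion above, the following sketch loads a list of objects and either aborts on the first failure or skips the failing object and continues. Everything here is an assumption made for illustration: `ObjectLoader` and `LoadObjects` are hypothetical, and the real parser works on cross-reference entries rather than callables.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical model: each entry stands for one cross-reference entry.
// Calling it either "parses" the object (returns a value) or throws.
using ObjectLoader = std::function<int()>;

// Returns the objects that could be read. With ignoreBroken == false the
// first failure propagates to the caller; with true the broken object is
// skipped and counted, and loading continues with the next entry.
std::vector<int> LoadObjects(const std::vector<ObjectLoader>& entries,
                             bool ignoreBroken,
                             std::size_t* skipped = nullptr) {
    std::vector<int> loaded;
    std::size_t nSkipped = 0;
    for (const ObjectLoader& load : entries) {
        try {
            loaded.push_back(load());
        } catch (const std::runtime_error&) {
            if (!ignoreBroken)
                throw;      // strict behaviour: abort the whole load
            ++nSkipped;     // lenient behaviour: drop this object only
        }
    }
    if (skipped)
        *skipped = nSkipped;
    return loaded;
}
```

With a list in which one entry throws (for example, an unreferenced old encryption dictionary whose strings fail to decrypt), the lenient call returns the remaining objects and reports one skipped entry, while the strict call rethrows the error.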