Hi,

But you added it to all ParseFile variants ;)

Yes, it makes sense. There are pdf files which contain broken objects that
podofo cannot open, although many other pdf tools have no problem with them.
For example, I saw a password-encrypted pdf which had been edited by some
tool that created a new encryption dictionary but did not delete the object
holding the old one. So there were two: one referenced from the xref object
by the "Encrypt" key, and the old one, not referenced by anything but still
listed in the cross-reference table. In an encrypted pdf all strings are
encrypted except those in the encryption dictionary, so podofo tried to
decrypt the strings in this old encryption dictionary (which were
unencrypted), and that failed because of padding (not every byte sequence is
valid encrypted data). It seems impossible to distinguish an encryption
dictionary from any other dictionary when it is not referenced from the
"Encrypt" key in the xref, because the only required key is "Filter", which
can be "Standard" or anything else. So the only good solution seems to be to
ignore broken objects. With this patch podofo can open this pdf.
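
To illustrate what "ignore broken objects" means here, below is a minimal
sketch of the idea, not podofo code. RawObject, ParseObject and LoadObjects
are hypothetical names invented for this example; the real parser is more
involved. An object that fails to decode is skipped when the flag is set
and aborts the whole load otherwise.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an object listed in the cross-reference table;
// this is not a podofo type.
struct RawObject {
    int number;
    bool corrupt;  // simulates contents that cannot be decoded
};

// Hypothetical per-object parse step: throws when the object is unusable.
void ParseObject(const RawObject& obj) {
    if (obj.corrupt)
        throw std::runtime_error("cannot decode object " +
                                 std::to_string(obj.number));
}

// Returns the numbers of the objects that were loaded successfully.
std::vector<int> LoadObjects(const std::vector<RawObject>& objects,
                             bool ignoreBroken) {
    std::vector<int> loaded;
    for (const RawObject& obj : objects) {
        try {
            ParseObject(obj);
            loaded.push_back(obj.number);
        } catch (const std::runtime_error& e) {
            if (!ignoreBroken)
                throw;  // without the flag, one bad object aborts the load
            std::cerr << "skipping: " << e.what() << '\n';
        }
    }
    return loaded;
}
```

With the flag set, a stale and unreferenced object like the old encryption
dictionary would simply be left out instead of making the file unreadable.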

Now, the "Load" functions open a pdf by filename from disk.
"LoadFromBuffer" is for opening a pdf from a memory buffer.
"LoadFromDevice" can be used to open a pdf from C++ streams, which is
flexible because one can use fstream or another implementation that reads,
for example, from memory or some other source. So I see no reason why the
flag would not make sense there, as pdf files from any source can contain
broken objects.
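
A small sketch of why the source does not matter: code written against
std::istream behaves the same whether the bytes come from disk or from
memory. LooksLikePdf is a hypothetical helper made up for this example and
is not part of podofo; it only checks for the "%PDF-" header.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not a podofo function: reports whether the stream
// starts with the "%PDF-" header. It only sees a std::istream, so it cannot
// tell (and does not care) where the data comes from.
bool LooksLikePdf(std::istream& in) {
    char header[5] = {};
    in.read(header, sizeof(header));
    return in.gcount() == 5 && std::string(header, 5) == "%PDF-";
}
```

The same call works with a std::ifstream opened on a file or with a
std::istringstream wrapping a memory buffer, so a broken object can reach
the parser through any of these paths.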

I think ignoring broken objects should be the default behaviour, as it is in
many other pdf tools and viewers. If an end user wants to get pdf info
using the podofopdfinfo tool, what is the point of showing an error message
saying that something is wrong with the pdf when it opens fine in a pdf
viewer? There is also a strict mode in the podofo parser, which is turned
off by default to allow opening some broken pdf files.

On Wed, Feb 19, 2020 at 3:54 PM John Senneker <john.senne...@arup.com>
wrote:

> Hi Michal,
>
> Good catch! There’s no reason in principle why the flag shouldn’t be added
> to the other Load functions as well. I didn’t add them there because I
> don’t use those functions, and I’m not sure whether or not the flag makes
> sense there. Your patch looks good to me, but because I don’t use those
> APIs I can’t test it.
>
>
>
> *From:* Michal Sudolsky <sudols...@gmail.com>
> *Sent:* Monday, February 17, 2020 6:42 PM
> *To:* John Senneker <john.senne...@arup.com>
> *Cc:* podofo-users@lists.sourceforge.net
> *Subject:* [External] Re: [Podofo-users] Patch for ignoring broken objects
>
>
>
> I am sending updated patch which covers also these two functions.
>
>
>
>
>
> On Mon, Feb 17, 2020 at 11:41 PM Michal Sudolsky <sudols...@gmail.com>
> wrote:
>
> Hi,
>
>
>
> There are also other "load" functions like LoadFromBuffer and
> LoadFromDevice. Why are these not covered?
>
>
>
> On Thu, Feb 13, 2020 at 6:08 PM John Senneker <john.senne...@arup.com>
> wrote:
>
> Hello,
>
> The patch I sent you had a merge mistake, which caused the new
> bIgnoreBrokenObjects parameter not to be passed to the parser from
> PdfMemDocument in one of the variants of PdfMemDocument::Load(). The
> updated patch attached to this email has fixed this error.
>
> --
>
> John Senneker
>
>
>
> *From:* John Senneker
> *Sent:* Monday, February 3, 2020 2:50 PM
> *To:* podofo-users@lists.sourceforge.net
> *Subject:* Patch for ignoring broken objects
>
>
>
> Hi,
>
> I submitted this patch a while ago, but didn’t get a response. In case it
> slipped through the cracks, I’ve pasted the text of the email below, and
> re-attached the patch file.
>
>
>
> An issue I’ve run into is full failure of parsing when objects are
> referred to that don’t exist. Often this is due to XRef stream entries that
> point to non-existent object streams or non-existent indices within those
> streams. This should be a recoverable error.
>
>
>
> There’s a bool member `m_bIgnoreBrokenObjects` in `PdfParser`, and that
> member is exposed through `SetIgnoreBrokenObjects()`, but it’s set to false
> in `Init()`, which is called by `Clear()`, which is called at the top of
> `ParseFile()`, so user-defined values are overwritten. Following the
> pattern used for the `m_bLoadOnDemand` flag, I’ve modified the
> `ParseFile()` functions to allow the user to pass in an optional
> `bIgnoreBrokenObjects` parameter (default false) to get around this issue.
> I’ve attached a patch to show what this looks like, and which also allows
> passing the `bIgnoreBrokenObjects` flag to `PdfMemDocument` constructors
> and `Load()` functions. It also checks the flag when reading objects from a
> stream.
>
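
The reset problem described above can be shown in miniature. This is only a
sketch: MiniParser, ParseFileOld and ParseFileNew are hypothetical names
invented here, and only the member and setter names are borrowed from the
description. It is not the actual podofo source or the attached patch.

```cpp
// Hypothetical miniature of the pattern described above; not podofo code.
class MiniParser {
public:
    MiniParser() { Init(); }

    void SetIgnoreBrokenObjects(bool b) { m_bIgnoreBrokenObjects = b; }

    // Old behaviour: Clear() calls Init(), which resets the flag, so a
    // value set through the setter is lost before parsing starts.
    // Returns the value the parse would actually use.
    bool ParseFileOld() {
        Clear();
        return m_bIgnoreBrokenObjects;
    }

    // Behaviour as described for the patch: the flag is an optional
    // parameter that is stored after Clear() has run.
    bool ParseFileNew(bool bIgnoreBrokenObjects = false) {
        Clear();
        m_bIgnoreBrokenObjects = bIgnoreBrokenObjects;
        return m_bIgnoreBrokenObjects;
    }

private:
    void Init() { m_bIgnoreBrokenObjects = false; }
    void Clear() { Init(); }

    bool m_bIgnoreBrokenObjects;
};
```

Calling SetIgnoreBrokenObjects(true) and then ParseFileOld() still parses
with the flag off, while ParseFileNew(true) keeps it on.
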
>
>
> Is this the right way to go about this, or is there a better way to allow
> graceful failure and continuation of parsing?
>
> --
>
> John Senneker
>
>
> _______________________________________________
> Podofo-users mailing list
> Podofo-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/podofo-users
>
>