Hello,
Reading a PDF file at the level of its COS objects will teach you *a lot*
about PDF internals. There are some tools out there to do that, or you can
write your own. I used to have a plug-in to do that, but I've mislaid the
source and it no longer works with the later versions of Acrobat :-(
Regar
adriano,
adriano wrote
> I think we are starting to converge to a common understanding of my point!
> :)
Well, you want to analyze all data in a PDF file, be it reachable from the
root or not. This can be a valid requirement, e.g. for repairing files with
broken cross references or for analyzing
adriano,
adriano wrote
> From what you wrote, I seem to understand that whatever "garbage" is found
> in an incoming PDF document, even if not referenced in its xref table, can
> be regarded as *safe*: did I get you right?
What exactly do you imply by saying it is "safe"?
I would formulate
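To make "not referenced in its xref table" concrete, here is a rough sketch in plain Python (not iText, and not taken from this thread). It assumes a classic, uncompressed xref table, pools the in-use entries of all xref sections together, and reports every `N G obj` header whose byte offset is not listed. The names `OBJ_HEADER`, `XREF_ENTRY`, `offsets_in_xref` and `unlisted_objects` are made up for illustration.

```python
import re

# "N G obj" header: object number, generation number, keyword "obj".
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")

# In-use entry of a classic xref table: 10-digit offset, 5-digit generation, "n".
# Assumption: no cross-reference streams, which this pattern cannot see.
XREF_ENTRY = re.compile(rb"(\d{10}) (\d{5}) n")


def offsets_in_xref(data: bytes) -> set:
    """Collect the byte offsets named by in-use entries of every xref section."""
    return {int(m.group(1)) for m in XREF_ENTRY.finditer(data)}


def unlisted_objects(data: bytes) -> list:
    """Return (number, generation, offset) for headers no xref entry points at."""
    listed = offsets_in_xref(data)
    return [
        (int(m.group(1)), int(m.group(2)), m.start())
        for m in OBJ_HEADER.finditer(data)
        if m.start() not in listed
    ]
```

Because the entries of all xref sections are pooled, an object superseded by a later incremental update still counts as listed. Text inside a stream or string that happens to look like an object header or an xref entry will confuse this sketch, so its output is a list of candidates to inspect, not a verdict.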
adriano,
adriano wrote
> I am not referring to regular PDF documents, but intentionally altered
> ones made up by the bad guys in order to try and cause problems to some
> application. I am aware that a PDF document may have more than one
> /Catalog if it has been revised.
> So what I was asking
adriano,
adriano wrote
> but then I would like to ask the question in more practical and
> limited-scoped terms: is it possible, by using iText (preferably) or
> another piece of Java software, to scan a PDF and at least detect the
> presence (if not retrieve) of duplicate objects?
I doubt that g
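Independently of any particular library, the detection part of the question can be sketched as a raw byte scan that ignores the xref table entirely. The following is a minimal illustration in plain Python, not an iText feature; `OBJ_HEADER`, `scan_indirect_objects` and `find_duplicates` are hypothetical names. It looks for `N G obj` headers and groups their offsets by object number and generation.

```python
import re
from collections import defaultdict

# "N G obj" header: object number, generation number, keyword "obj".
# The lookbehind keeps a match from starting in the middle of a longer number.
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")


def scan_indirect_objects(data: bytes) -> dict:
    """Map (object number, generation) to every offset where such a header occurs."""
    found = defaultdict(list)
    for m in OBJ_HEADER.finditer(data):
        found[(int(m.group(1)), int(m.group(2)))].append(m.start())
    return dict(found)


def find_duplicates(data: bytes) -> dict:
    """Keep only the object identifiers that are defined more than once."""
    return {key: offs for key, offs in scan_indirect_objects(data).items()
            if len(offs) > 1}
```

Calling `find_duplicates(open("some.pdf", "rb").read())` (the file name is a placeholder) returns a dictionary of suspect identifiers and their offsets. A hit is not proof of tampering: an incrementally updated file legitimately redefines object numbers, objects stored in compressed object streams are invisible to this scan, and bytes inside streams or strings can mimic a header.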
adriano,
adriano wrote
> is there a way to retrieve/list all indirect objects found in a PDF,
> regardless of the information contained in the xref table? I mean, without
> relying on the Xref Table to be correct.
>
> The final goal is to detect duplicate indirect objects (that is, objects
> with no