RE: [PDFdev] Re: Question about unknown PDF trailer tag

Mark Storer Thu, 29 Apr 2004 12:42:52 -0700

I'm guessing here, but it sounds like some third party's addition to the trailer. What are the producer and creator in those documents' info dictionaries?

What are you trying to remove, anyway?

--Mark Storer
Senior Software Engineer
Verity, Inc.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]On Behalf Of intro intro
Sent: Thursday, April 29, 2004 10:50 AM
To: [EMAIL PROTECTED]
Subject: Re: [PDFdev] Re: Question about unknown PDF trailer tag

   Dan-Ari,

   I appreciate your reply, but you misunderstood my question. Perhaps I did not phrase it correctly so I will try again.

   I do know what the xref table is: it's the object cross-reference table. It is divided into subsections, and each subsection lists the byte offset, generation number and status (in use or free) for each object in a consecutively numbered range. At least, that's my understanding, and I have already completed the code that parses this xref table and loads all the entries.
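For illustration, the classic xref layout described above can be parsed roughly as follows. This is a minimal sketch, not the poster's code: the function name `parse_xref_section` and the sample data are hypothetical, and it assumes a classic (non-stream) table with well-formed entries of the form "offset generation n|f".

```python
import re

def parse_xref_section(text):
    """Parse a classic xref section (from 'xref' up to, not including, 'trailer').

    Returns {object_number: (offset, generation, in_use)}.
    Assumption: plain-text table, no cross-reference streams.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or lines[0] != "xref":
        raise ValueError("expected 'xref' keyword")
    entries = {}
    i = 1
    while i < len(lines):
        # Subsection header: first object number and entry count.
        header = re.fullmatch(r"(\d+) (\d+)", lines[i])
        if header is None:
            break  # reached 'trailer' or other non-table content
        first, count = int(header.group(1)), int(header.group(2))
        i += 1
        if i + count > len(lines):
            raise ValueError("subsection is shorter than its declared count")
        for n in range(count):
            m = re.fullmatch(r"(\d{10}) (\d{5}) ([nf])", lines[i + n])
            if m is None:
                raise ValueError("malformed xref entry: %r" % lines[i + n])
            entries[first + n] = (int(m.group(1)), int(m.group(2)), m.group(3) == "n")
        i += count
    return entries

# Hypothetical sample with two subsections.
SAMPLE_XREF = """xref
0 3
0000000000 65535 f 
0000000017 00000 n 
0000000081 00000 n 
5 1
0000000300 00002 n 
trailer
"""

table = parse_xref_section(SAMPLE_XREF)
```

Entries marked `f` are free objects; in this sketch they are kept in the result with `in_use` set to `False` rather than dropped.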

   But that's not the xref I am referring to.

   In the trailer section of the PDF (after xref and before startxref, i.e. "trailer << ... >>"), where the trailer entries live, such as the object links ("/Root", et al.) and "/Size", there is an entry that I could not find in the specification. It's called "/XRefs". Its value is an object reference, and that object contains some data that looks like this: "<< /XRef [[123 7 0 6 0 ][456 8 0 7 0][ ... ]] >>" (not an identical copy).
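One way to spot entries like this across a batch of files is to list the top-level keys of the trailer dictionary and flag any that the specification does not define. The sketch below is an illustration under stated assumptions: the helper names and the sample trailer (including its object numbers) are hypothetical, the tokenizer is deliberately simplistic (no escaped or nested parentheses in strings, no comments), and the set of standard keys is the one listed for the file trailer in the PDF 1.5 reference.

```python
import re

# Trailer keys defined by the PDF 1.5 reference (XRefStm appears in hybrid files).
STANDARD_TRAILER_KEYS = {"Size", "Prev", "Root", "Encrypt", "Info", "ID", "XRefStm"}

# Simplified tokenizer: delimiters, hex strings, names, literal strings, other tokens.
TOKEN = re.compile(
    r"<<|>>|\[|\]|<[0-9A-Fa-f\s]*>|/[^\s/<>\[\]()]*|\([^)]*\)|[^\s/<>\[\]()]+"
)

def top_level_keys(trailer_dict):
    """Return the keys at the top level of a '<< ... >>' dictionary string."""
    depth = 0
    awaiting_value = False
    keys = []
    for tok in TOKEN.findall(trailer_dict):
        if tok in ("<<", "["):
            depth += 1
            continue
        if tok in (">>", "]"):
            depth -= 1
            if depth == 1:
                awaiting_value = False  # a nested value just ended
            continue
        if depth != 1:
            continue  # inside a nested array or dictionary
        if tok.startswith("/") and not awaiting_value:
            keys.append(tok[1:])
            awaiting_value = True
        else:
            awaiting_value = False  # part of a value such as "12 0 R"
    return keys

def unknown_trailer_keys(trailer_dict):
    """Return top-level keys that are not in STANDARD_TRAILER_KEYS."""
    return [k for k in top_level_keys(trailer_dict) if k not in STANDARD_TRAILER_KEYS]

# Hypothetical trailer resembling the one described above.
SAMPLE_TRAILER = "<< /Size 22 /Root 1 0 R /Info 2 0 R /ID [<0a1b> <2c3d>] /XRefs 20 0 R >>"
extra = unknown_trailer_keys(SAMPLE_TRAILER)
```

A key reported here is only "not in the standard list"; the sketch says nothing about which producer wrote it or whether it is safe to remove.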

   I don't know what this object is used for. It seems to contain some object references (to objects that have a 0 offset in the xref table, i.e. not really in use), and they appear to be linked in some way. However, if I delete that object and its reference in the trailer, it doesn't seem to affect the document. It loads, rebuilds and displays just the same. I just want to understand what it is before I commit to deleting it from every document. Unfortunately I have not been able to find any information about it anywhere.

   As for the PDF modification, I'm writing a small library with limited use for a specific purpose. I need to remove an offending tag and define a font. After the modifications are done, the library will rewrite the PDF from scratch, including calculating a new xref table, so the rebuild step that Acrobat performs will no longer be necessary after the file has been modified. Yes, there are plenty of libraries out there that can probably handle this, but every one I've investigated has had a significant drawback. If I had found something that fit I would have bought it, but I couldn't, and I need to get this thing done, so I decided to write it myself. No big deal, I just need to know what that aforementioned object is for. If you or anyone else has any information about it, that would be greatly appreciated.
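The "rewrite from scratch and recalculate the xref table" step can be sketched as follows. This is an illustrative outline, not the poster's library: `build_pdf` is a hypothetical name, the object bodies are assumed to be already serialized, objects are assumed to be renumbered contiguously from 1 with generation 0 (which avoids maintaining a free list), and the trailer carries only /Size and /Root.

```python
def build_pdf(objects, root_num):
    """Serialize objects and emit a fresh classic xref table.

    objects: {object_number: body_bytes}, numbered contiguously 1..N (assumption).
    root_num: object number of the document catalog.
    """
    if sorted(objects) != list(range(1, len(objects) + 1)):
        raise ValueError("this sketch expects objects numbered 1..N")
    out = bytearray(b"%PDF-1.4\n")
    offsets = {}
    for num in sorted(objects):
        offsets[num] = len(out)  # byte offset where this object starts
        out += b"%d 0 obj\n" % num + objects[num] + b"\nendobj\n"
    size = len(objects) + 1      # highest object number plus one
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % size
    # Object 0 is always the head of the free list. Each entry is 20 bytes.
    out += b"0000000000 65535 f \n"
    for num in range(1, size):
        out += b"%010d 00000 n \n" % offsets[num]
    out += b"trailer\n<< /Size %d /Root %d 0 R >>\n" % (size, root_num)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

# Hypothetical two-object document (a catalog and an empty page tree).
pdf_bytes = build_pdf(
    {
        1: b"<< /Type /Catalog /Pages 2 0 R >>",
        2: b"<< /Type /Pages /Kids [] /Count 0 >>",
    },
    root_num=1,
)
```

Because every offset is recorded as the object is written, the table is correct by construction and no viewer-side rebuild is needed. A real writer would also need to carry over /Info, /ID and /Encrypt where present.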

   Thanks.

Dan-Ari Feinberg listreader <[EMAIL PROTECTED]> wrote:

PDFdev is a service provided by PDFzone.com | http://www.pdfzone.com
_____________________________________________________________

Introvert,

You wrote that your goal is "modification of one object and rebuilding all
the indexes". What indexes are you referring to?

If the indexes that you want to rebuild are the indexes that indicate where
each object exists in the file, then that *is* the XRef table.

The XRef table is carefully documented in both the 1.3 and 1.5 specs. If
it is damaged or not present, Acrobat will try to scan the file and rebuild
the XRef table, but this is not something you should rely upon, because
that scanning/rebuilding will not always work. Some people modify an
object directly without rebuilding the XRef table, and this poor man's
solution seems to work great, and they don't know why others warned about
the complexity of PDF files. But then some files are produced for which
Acrobat cannot rebuild the XRef table by scanning, and the solution fails.

There are multiple libraries that can modify a PDF file, and --please
forgive me for being harsh-- if you cannot find a reference to the XRef
table in the PDF Spec, then you are quite likely getting in over your head
to try to write a library yourself to manipulate a PDF file. You may be
able to write a simple solution to modify a PDF object, and it might work
all of the time. Or it might work some of the time, and then you will have
trouble.

PDFZone and PlanetPDF both list many libraries for manipulating PDF. If
you tell us more about what you are trying to do, someone could recommend a
particular solution. Tell us: Platform and language and what you are
actually trying to do. Do you want to change text on a page? Move a form
field? The more specific you are, the more we can help.

Regards,
Dan-Ari
