[PHP] Re: PHP class or functions to manipulate PDF metadata?

Peter Ford Tue, 07 Apr 2009 01:03:33 -0700

O. Lavell wrote:
> Peter Ford wrote:
> 
>> O. Lavell wrote:
> 
> [..]
> 
>>> Any and all suggestions are welcome. Thank you in advance.
>>>
>> So many people ask about manipulating, editing and generally processing
>> PDF files. In my experience, PDF is a write-once format - any
>> manipulation should have been done in whatever source generated the PDF.
>> I think of a PDF as being a piece of paper: if you want to change the
>> content of a piece of paper it is usually best to chuck it away and
>> start again...
>>
>> Even more so, this would apply to the PDF metadata: metadata is supposed
>> to describe the nature of the document: its author, creation time etc.
>> That sort of data should be maintained with the document and ideally not
>> changed throughout the document's lifetime (like the footer, or
>> end-papers in a physical book)
> 
> Thank you very much for your reply. And it's not that I don't agree with 
> you. Because I do, completely.
> 
> However...
> 
> PDFs often come from sources that can't be bothered to fill in the 
> relevant fields correctly, completely, or at all. For those cases I would 
> like the users of my application to be able to correct the values found 
> in the metadata. Upload the PDF, get a nice little HTML form with 4 or 5 
> values to review or edit. That sort of thing.
> 
>> I do accept that the metadata should be machine-readable: that part of
>> your project is reasonable and I'm fairly sure that ought to be possible
>> with something simple. The best bet I found so far is PDFTK
>> (http://www.pdfhacks.com/pdftk/) which is a command-line tool that you
>> could presumably call with exec or whatever...
> 
> Like I said, this is what I am already doing with the pdfinfo utility 
> from xpdf.


Sorry - I guess I didn't read that bit carefully enough...
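
For anyone reading this in the archive, the pdfinfo approach amounts to roughly the following. This is only a sketch: the function names are made up for illustration, it assumes pdfinfo is on the PATH, and it assumes the usual "Key:   value" line layout that pdfinfo prints.

```php
<?php
// Parse pdfinfo's "Key:   value" lines into an associative array.
// (Hypothetical helper name, not part of xpdf or PHP.)
function parse_pdfinfo_output(array $lines): array
{
    $info = [];
    foreach ($lines as $line) {
        // Split on the first colon only; values such as dates contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $info[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $info;
}

// Run pdfinfo on $path; returns null if the command exits non-zero.
// Assumes the pdfinfo binary is installed and on the PATH.
function read_pdf_metadata(string $path): ?array
{
    exec('pdfinfo ' . escapeshellarg($path), $lines, $status);
    return $status === 0 ? parse_pdfinfo_output($lines) : null;
}
```

The parsing is kept separate from the exec call so it can be checked against canned output without having xpdf installed.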

> 
> But now that you mentioned pdftk... I just tried it and it does seem to 
> come close to what I want. It is capable of writing a new PDF with the 
> contents of an existing one, with new metadata fed as a text file. So it 
> shouldn't be very hard to write a little PHP around that process.
> 
> Now I need to think a bit more about this approach. Perhaps it can be 
> implemented using only pure PHP, after all. But for the time being, pdftk 
> will do.
> 
> So thank you again for pushing me in that direction, even if 
> unintentionally and despite the fact that what I am doing goes against 
> your judgement ;)
> 
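
In case it helps, the pdftk round trip described above could be wrapped along these lines. Treat it as a sketch under a few assumptions: pdftk is on the PATH, the function names are invented for this example, and the "InfoBegin" marker matches what newer pdftk versions print from dump_data (older versions list only the InfoKey/InfoValue pairs, so compare against your own dump_data output first).

```php
<?php
// Build the text file that pdftk's update_info operation reads.
// $fields is an array such as ['Title' => '...', 'Author' => '...'].
// (Hypothetical helper; the InfoBegin line is an assumption, see above.)
function build_pdftk_info(array $fields): string
{
    $out = '';
    foreach ($fields as $key => $value) {
        // Keep each value on one line so form input cannot add extra records.
        $value = str_replace(["\r", "\n"], ' ', (string) $value);
        $out .= "InfoBegin\nInfoKey: {$key}\nInfoValue: {$value}\n";
    }
    return $out;
}

// Write a copy of $src to $dst with the given Info fields replaced.
// Returns true when pdftk exits with status 0.
function update_pdf_metadata(string $src, string $dst, array $fields): bool
{
    $infoFile = tempnam(sys_get_temp_dir(), 'pdfinfo');
    if ($infoFile === false) {
        return false;
    }
    file_put_contents($infoFile, build_pdftk_info($fields));

    // pdftk <in> update_info <info file> output <out>
    $cmd = sprintf(
        'pdftk %s update_info %s output %s',
        escapeshellarg($src),
        escapeshellarg($infoFile),
        escapeshellarg($dst)
    );
    exec($cmd, $output, $status);
    unlink($infoFile);

    return $status === 0;
}
```

The keys should come from a fixed list in the application (Title, Author, Subject, Keywords and so on) rather than from the form itself, and values with non-ASCII characters may need extra encoding depending on the pdftk version.
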

As I know only too well, you can't always choose your customers (especially if
they choose you...) and you certainly can't control all of the sources of data
you have to deal with!
I have spent many hours/days/possibly longer hacking through files that are in
one form to get data into another, and PDF is the one that always makes me
nervous :(
My judgement is certainly not final, or even particularly important: if I had
time I would also look into at least getting the metadata with pure PHP.

Good luck...

-- 
Peter Ford                              phone: 01580 893333
Developer                               fax:   01580 893399
Justcroft International Ltd., Staplehurst, Kent

