[PHP] Re: PHP class or functions to manipulate PDF metadata?
Peter Ford wrote:
> O. Lavell wrote:
>> Peter Ford wrote:

[..]

>>> I do accept that the metadata should be machine-readable: that part of
>>> your project is reasonable and I'm fairly sure that ought to be
>>> possible with something simple. The best bet I found so far is PDFTK
>>> (http://www.pdfhacks.com/pdftk/) which is a command-line tool that you
>>> could presumably call with exec or whatever...
>>
>> Like I said, this is what I am already doing with the pdfinfo utility
>> from xpdf.
>
> Sorry - I guess I didn't read that bit carefully enough...

No problem at all, I was really glad someone wanted to share their
thoughts anyway after it first seemed that no one was interested.

[..]

>> So thank you again for pushing me in that direction, even if
>> unintentionally and despite the fact that what I am doing goes against
>> your judgement ;)
>>
>
> As I know only too well, you can't always choose your customers
> (especially if they choose you...) and you certainly can't control all
> of the sources of data you have to deal with!

Exactly.

> I have spent many hours/days/possibly longer hacking through files that
> are in one form to get data into another, and PDF is the one that always
> makes me nervous :(

So far you, Tedd and I agree on this. The so-called portable document
format is a rather convoluted thing.

> My judgement is certainly not final, or even particularly important: if
> I had time I would also look into at least getting the metadata with
> pure PHP.
>
> Good luck...

Thank you. If I did have the time (to spare) I would feel almost obliged
to try to figure it out. Perhaps in a week or two...

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
[PHP] Re: PHP class or functions to manipulate PDF metadata?
O. Lavell wrote:
> Peter Ford wrote:
>
>> O. Lavell wrote:
>
> [..]
>
>>> Any and all suggestions are welcome. Thank you in advance.
>>>
>> So many people ask about manipulating, editing and generally processing
>> PDF files. In my experience, PDF is a write-once format - any
>> manipulation should have been done in whatever source generated the PDF.
>> I think of a PDF as being a piece of paper: if you want to change the
>> content of a piece of paper it is usually best to chuck it away and
>> start again...
>>
>> Even more so, this would apply to the PDF metadata: metadata is supposed
>> to describe the nature of the document: its author, creation time etc.
>> That sort of data should be maintained with the document and ideally not
>> changed throughout the document's lifetime (like the footer, or
>> end-papers in a physical book)
>
> Thank you very much for your reply. And it's not that I don't agree with
> you. Because I do, completely.
>
> However...
>
> PDFs often come from sources that can't be bothered to fill in the
> relevant fields correctly, completely, or at all. For those cases I would
> like the users of my application to be able to correct the values found
> in the metadata. Upload the PDF, get a nice little HTML form with 4 or 5
> values to review or edit. That sort of thing.
>
>> I do accept that the metadata should be machine-readable: that part of
>> your project is reasonable and I'm fairly sure that ought to be possible
>> with something simple. The best bet I found so far is PDFTK
>> (http://www.pdfhacks.com/pdftk/) which is a command-line tool that you
>> could presumably call with exec or whatever...
>
> Like I said, this is what I am already doing with the pdfinfo utility
> from xpdf.

Sorry - I guess I didn't read that bit carefully enough...

>
> But now that you mentioned pdftk... I just tried it and it does seem to
> come close to what I want. It is capable of writing a new PDF with the
> contents of an existing one, with new metadata fed as a text file. So it
> shouldn't be very hard to write a little PHP around that process.
>
> Now I need to think a bit more about this approach. Perhaps it can be
> implemented using only pure PHP, after all. But for the time being, pdftk
> will do.
>
> So thank you again for pushing me in that direction, even if
> unintentionally and despite the fact that what I am doing goes against
> your judgement ;)
>

As I know only too well, you can't always choose your customers
(especially if they choose you...) and you certainly can't control all of
the sources of data you have to deal with!

I have spent many hours/days/possibly longer hacking through files that
are in one form to get data into another, and PDF is the one that always
makes me nervous :(

My judgement is certainly not final, or even particularly important: if I
had time I would also look into at least getting the metadata with pure
PHP.

Good luck...

--
Peter Ford phone: 01580 89
Developer fax: 01580 893399
Justcroft International Ltd., Staplehurst, Kent
Re: [PHP] Re: PHP class or functions to manipulate PDF metadata?
tedd wrote:

[..]

> All the attempts I have made at opening up a PDF file and then trying
> to make sense of it and put it back together with something changed have
> been absolute failures.
>
> The algorithm used to make a PDF file reminds me of a replacement-type
> compression technique -- it's not easy to understand what was done.

It's definitely voodoo. And I'm not averse to a little voodoo myself, but
someone else's voodoo in which you aren't initiated always seems to be so
much more impenetrable...
[PHP] Re: PHP class or functions to manipulate PDF metadata?
Peter Ford wrote:
> O. Lavell wrote:

[..]

>> Any and all suggestions are welcome. Thank you in advance.
>>
> So many people ask about manipulating, editing and generally processing
> PDF files. In my experience, PDF is a write-once format - any
> manipulation should have been done in whatever source generated the PDF.
> I think of a PDF as being a piece of paper: if you want to change the
> content of a piece of paper it is usually best to chuck it away and
> start again...
>
> Even more so, this would apply to the PDF metadata: metadata is supposed
> to describe the nature of the document: its author, creation time etc.
> That sort of data should be maintained with the document and ideally not
> changed throughout the document's lifetime (like the footer, or
> end-papers in a physical book)

Thank you very much for your reply. And it's not that I don't agree with
you. Because I do, completely.

However...

PDFs often come from sources that can't be bothered to fill in the
relevant fields correctly, completely, or at all. For those cases I would
like the users of my application to be able to correct the values found
in the metadata. Upload the PDF, get a nice little HTML form with 4 or 5
values to review or edit. That sort of thing.

> I do accept that the metadata should be machine-readable: that part of
> your project is reasonable and I'm fairly sure that ought to be possible
> with something simple. The best bet I found so far is PDFTK
> (http://www.pdfhacks.com/pdftk/) which is a command-line tool that you
> could presumably call with exec or whatever...

Like I said, this is what I am already doing with the pdfinfo utility
from xpdf.

But now that you mentioned pdftk... I just tried it and it does seem to
come close to what I want. It is capable of writing a new PDF with the
contents of an existing one, with new metadata fed as a text file. So it
shouldn't be very hard to write a little PHP around that process.

Now I need to think a bit more about this approach. Perhaps it can be
implemented using only pure PHP, after all. But for the time being, pdftk
will do.

So thank you again for pushing me in that direction, even if
unintentionally and despite the fact that what I am doing goes against
your judgement ;)
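For anyone reading the archive later, the "little PHP around that process" could look roughly like the sketch below. It is only an illustration under assumptions: the function names are made up, and the text format is assumed to be the InfoKey/InfoValue pairs that `pdftk in.pdf dump_data` prints (some pdftk versions also print an `InfoBegin` line before each pair, so check the output of your own version first). The operation used is pdftk's `update_info`.

```php
<?php
// Hypothetical helper: turn edited form values into the text that is
// fed to pdftk. The InfoKey/InfoValue layout is an assumption; compare
// it with what `pdftk in.pdf dump_data` prints on your system.
function buildPdftkInfo(array $fields, bool $withInfoBegin = true): string
{
    $out = '';
    foreach ($fields as $key => $value) {
        // The format is line-oriented, so collapse whitespace and newlines.
        $value = trim(preg_replace('/\s+/', ' ', (string) $value));
        if ($withInfoBegin) {
            $out .= "InfoBegin\n";
        }
        $out .= "InfoKey: {$key}\n";
        $out .= "InfoValue: {$value}\n";
    }
    return $out;
}

// Hypothetical helper: build the shell command, escaping every path.
function buildPdftkCommand(string $in, string $infoFile, string $out): string
{
    return sprintf(
        'pdftk %s update_info %s output %s',
        escapeshellarg($in),
        escapeshellarg($infoFile),
        escapeshellarg($out)
    );
}
```

In use, one would write the result of buildPdftkInfo() to a temporary file, run the command with exec($cmd, $output, $status), and only replace the original PDF with the new one when $status is 0.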
[PHP] Re: PHP class or functions to manipulate PDF metadata?
At 10:06 AM +0100 4/6/09, Peter Ford wrote:
> O. Lavell wrote:
>> Any and all suggestions are welcome. Thank you in advance.
>
> So many people ask about manipulating, editing and generally processing
> PDF files. In my experience, PDF is a write-once format - any
> manipulation should have been done in whatever source generated the PDF.
> I think of a PDF as being a piece of paper: if you want to change the
> content of a piece of paper it is usually best to chuck it away and
> start again...

That's a good way to put it.

All the attempts I have made at opening up a PDF file and then trying to
make sense of it and put it back together with something changed have
been absolute failures.

The algorithm used to make a PDF file reminds me of a replacement-type
compression technique -- it's not easy to understand what was done.

Cheers,

tedd

--
---
http://sperling.com http://ancientstones.com http://earthstones.com
[PHP] Re: PHP class or functions to manipulate PDF metadata?
O. Lavell wrote:
> Hi group,
>
> I am looking for an easy way to manipulate (read, write) the metadata
> (title, subject, keywords, author) in PDF files through PHP.
>
> Most PHP/PDF solutions I have found so far (through Google) are aimed at
> constructing PDFs from text and graphics, with lots of fancy features,
> but most of them omit metadata functions altogether.
>
> I would also prefer something extremely lightweight that I could just
> include_once() into my script, i.e. not a module or external program. I
> am currently using pdfinfo from xpdf-utils, but it has to go.
>
> My use case is I want to build a database with the metadata of a bunch
> (many hundreds, perhaps thousands) of PDF files in a directory on the
> server for easy search, statistics and retrieval. I also want users to be
> able to make edits to any PDF's metadata from the web.
>
> If it can be at all avoided, I would rather not have to invent the wheel
> myself here. I have looked at the Adobe PDF specification a bit and it
> looks quite... challenging. Or should I say daunting.
>
> Any and all suggestions are welcome. Thank you in advance.
>

So many people ask about manipulating, editing and generally processing
PDF files. In my experience, PDF is a write-once format - any manipulation
should have been done in whatever source generated the PDF. I think of a
PDF as being a piece of paper: if you want to change the content of a
piece of paper it is usually best to chuck it away and start again...

Even more so, this would apply to the PDF metadata: metadata is supposed
to describe the nature of the document: its author, creation time etc.
That sort of data should be maintained with the document and ideally not
changed throughout the document's lifetime (like the footer, or end-papers
in a physical book)

I do accept that the metadata should be machine-readable: that part of
your project is reasonable and I'm fairly sure that ought to be possible
with something simple. The best bet I found so far is PDFTK
(http://www.pdfhacks.com/pdftk/) which is a command-line tool that you
could presumably call with exec or whatever...

--
Peter Ford phone: 01580 89
Developer fax: 01580 893399
Justcroft International Ltd., Staplehurst, Kent
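As an illustration of the "call it with exec" route for the read side, here is a minimal sketch of turning the output of a tool such as pdfinfo into a PHP array for the database described above. The function name is hypothetical, and the sketch assumes the tool prints one "Key: value" pair per line, which is how pdfinfo formats its report.

```php
<?php
// Hypothetical helper: convert "Key:   value" lines (for example the
// lines collected by exec() from pdfinfo) into an associative array.
function parsePdfinfoOutput(array $lines): array
{
    $meta = [];
    foreach ($lines as $line) {
        // Split on the first colon only, since values such as dates
        // contain colons themselves.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $meta[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $meta;
}
```

A call might look like exec('pdfinfo ' . escapeshellarg($path), $lines, $status); followed by parsePdfinfoOutput($lines) when $status is 0. Lines without a colon are simply skipped.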