Re: [PHP] reading PDF's
Ben Ramsey wrote:
> Another, easy way to create PDFs with PHP is to use PDML: http://pdml.sourceforge.net/
>
> As for reading the text from a PDF, maybe there's some sort of OCR library for PHP out there, but I don't know about it. It'd be a great thing to see, though.

You wouldn't need OCR in most cases, as the text is stored as real text, not as images of text, in the PDF. Surely there must be a PDF-to-text utility out there somewhere, because there are plenty of open-source PDF reading utils around...

Jasper

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] reading PDF's
> Is it possible to read text from a PDF file with PHP? How?

> There may be a free one, or even an OpenSource one, but I've never heard of it, possibly because they'd have to pay a license to Adobe (Macromedia this week?) to be legal...

> Free (as in beer): http://sourceforge.net/projects/pdfcreator/
> It's built on top of Ghostscript... which AFAIK does most of the heavy lifting. Several licensing options too.

This doesn't appear to read text from a PDF but, rather, create the PDF from text.

Another, easy way to create PDFs with PHP is to use PDML: http://pdml.sourceforge.net/

As for reading the text from a PDF, maybe there's some sort of OCR library for PHP out there, but I don't know about it. It'd be a great thing to see, though.

-- 
Ben Ramsey
http://benramsey.com/
Re: [PHP] reading PDF's
Richard Lynch wrote:
> On Fri, June 24, 2005 12:10 pm, Jon said:
> > Is it possible to read text from a PDF file with PHP? How?
> ...
> There may be a free one, or even an OpenSource one, but I've never heard of it, possibly because they'd have to pay a license to Adobe (Macromedia this week?) to be legal...

Free (as in beer): http://sourceforge.net/projects/pdfcreator/
It's built on top of Ghostscript... which AFAIK does most of the heavy lifting. Several licensing options too.

> ...
> You don't want to get to launch and find out 90% of the real PDFs simply don't work. :-(

I've been using it for about 3 months with very few problems. In fact, I can't think of any problems that I've had with the library (but I don't use it with PHP... I just know that bindings are there for you to go do it yourself).

-- 
NEW? | http://www.catb.org/~esr/faqs/smart-questions.html
STFA | http://marc.theaimsgroup.com/?l=php-general&w=2
STFM | http://php.net/manual/en/index.php
STFW | http://www.google.com/search?q=php
Re: [PHP] reading PDF's
On Fri, June 24, 2005 12:10 pm, Jon said:
> Is it possible to read text from a PDF file with PHP? How?

At the crudest level, you can fopen/fread a PDF and dump it out, and pick out the plain-text readable bits with your eyes. :-)

After that, there are definitely some commercial command-line tools to convert PDF to text (or HTML or whatever) that you can Google for.

There may be a free one, or even an OpenSource one, but I've never heard of it, possibly because they'd have to pay a license to Adobe (Macromedia this week?) to be legal...

Note that a PDF can have its text encrypted, or be password-protected, or the text could have been rendered into an image which was embedded in the PDF (ugh!). At that point, you can maybe get the image out and use some kind of OCR software like OmniPage to "read" it.

Over the years and versions the PDF format has changed a lot, so be sure to have a representative sample of PDFs to throw at your testing. You don't want to get to launch and find out 90% of the real PDFs simply don't work. :-(

-- 
Like Music?
http://l-i-e.com/artists.htm
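The "crudest level" approach above can be sketched roughly as follows. This is only an illustration, not anything from the thread: `extractPrintableRuns()` is a hypothetical helper name, `sample.pdf` is a placeholder path, and the minimum run length of 4 is an arbitrary choice. It reads the raw bytes with fopen/fread and keeps runs of printable ASCII, much like the Unix `strings` tool. Content streams in most PDFs are compressed, so expect this to surface metadata and uncompressed fragments rather than the full page text.

```php
<?php
// Hypothetical helper: return every run of printable ASCII characters
// (space through tilde) that is at least $minLength bytes long.
function extractPrintableRuns(string $bytes, int $minLength = 4): array
{
    $pattern = '/[\x20-\x7E]{' . $minLength . ',}/';
    preg_match_all($pattern, $bytes, $matches);
    return $matches[0];
}

// Usage sketch: 'sample.pdf' is a placeholder, not a file from the thread.
$path = 'sample.pdf';
if (is_readable($path)) {
    // Read the file in binary mode, chunk by chunk, as the post suggests.
    $handle = fopen($path, 'rb');
    $raw = '';
    while (!feof($handle)) {
        $raw .= fread($handle, 8192);
    }
    fclose($handle);

    // Dump the readable bits, one run per line.
    foreach (extractPrintableRuns($raw) as $run) {
        echo $run, PHP_EOL;
    }
}
```

Anything shorter than the minimum length is dropped, which filters out most binary noise at the cost of losing very short words.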
Re: [PHP] reading PDF's
Jon,

I'm not sure there is a way for you to do this from within PHP, but then again I didn't think it was possible for PHP to generate a pdf without any extra libs either ;-)

You might want to start with the pdf2* command line programs. I think there is in fact one that will output the pdf as text (pdf2txt).

Good luck!

Jon wrote:
> Is it possible to read text from a PDF file with PHP? How?
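Calling a command-line converter from PHP, as suggested above, might look something like the sketch below. It rests on assumptions the thread does not confirm: that a converter is installed on the server, and that it is the `pdftotext` program shipped with Xpdf/Poppler (the post names pdf2txt from memory, so the tool name is left as a parameter). Both function names are hypothetical. The trailing `-` is pdftotext's way of asking for output on stdout; another tool may need different arguments.

```php
<?php
// Hypothetical helper: build the shell command for a PDF-to-text converter.
// Assumes a pdftotext-style interface: <tool> <input.pdf> -   (text to stdout)
function buildPdfToTextCommand(string $pdfPath, string $tool = 'pdftotext'): string
{
    // escapeshellarg() quotes the path so spaces and shell metacharacters
    // in file names cannot break or hijack the command.
    return escapeshellcmd($tool) . ' ' . escapeshellarg($pdfPath) . ' -';
}

// Hypothetical helper: run the converter and return its output,
// or null if the file is unreadable or the command produced nothing.
function pdfToText(string $pdfPath, string $tool = 'pdftotext'): ?string
{
    if (!is_readable($pdfPath)) {
        return null;
    }
    $output = shell_exec(buildPdfToTextCommand($pdfPath, $tool));
    return is_string($output) ? $output : null;
}
```

A call such as `pdfToText('/path/to/file.pdf')` would then return the extracted text, subject to the caveats elsewhere in the thread about encrypted, password-protected, or image-only PDFs.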
[PHP] reading PDF's
Is it possible to read text from a PDF file with PHP? How?