Re: [PHP] reading PDF's

2005-07-01 Thread Jasper Bryant-Greene

Ben Ramsey wrote:
> Another, easy way to create PDFs with PHP is to use PDML:
> http://pdml.sourceforge.net/
>
> As for reading the text from a PDF, maybe there's some sort of OCR
> library for PHP out there, but I don't know about it. It'd be a great
> thing to see, though.


You wouldn't need OCR in most cases, as the text is stored as real text, 
not as images of text, in the PDF.


Surely there must be a PDF-to-text utility out there somewhere, because 
there are plenty of open-source PDF reading utils around...


Jasper

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] reading PDF's

2005-07-01 Thread Ben Ramsey

>>> Is it possible to read text from a PDF file with PHP? How?
>>
>> There may be a free one, or even an OpenSource one, but I've never heard
>> of it, possibly because they'd have to pay a license to Adobe (Macromedia
>> this week?) to be legal...
>
> Free (as in beer):
> http://sourceforge.net/projects/pdfcreator/
>
> It's built on top of Ghostscript... which AFAIK does most of the heavy
> lifting.  Several licensing options too.


This doesn't appear to read text from a PDF but, rather, create the PDF 
from text.


Another, easy way to create PDFs with PHP is to use PDML: 
http://pdml.sourceforge.net/


As for reading the text from a PDF, maybe there's some sort of OCR 
library for PHP out there, but I don't know about it. It'd be a great 
thing to see, though.


--
Ben Ramsey
http://benramsey.com/




Re: [PHP] reading PDF's

2005-07-01 Thread Jason Barnett

Richard Lynch wrote:
> On Fri, June 24, 2005 12:10 pm, Jon said:
>> Is it possible to read text from a PDF file with PHP? How?
>
> ...
>
> There may be a free one, or even an OpenSource one, but I've never heard
> of it, possibly because they'd have to pay a license to Adobe (Macromedia
> this week?) to be legal...



Free (as in beer):
http://sourceforge.net/projects/pdfcreator/

It's built on top of Ghostscript... which AFAIK does most of the heavy 
lifting.  Several licensing options too.


...


> You don't want to get to launch and find out 90% of the real PDFs simply
> don't work. :-(



I've been using it for about 3 months with very few problems.  In fact, 
I can't think of any problems that I've had with the library (but I 
don't use it with PHP... I just know that bindings are there for you to 
go do it yourself).


--
NEW? | http://www.catb.org/~esr/faqs/smart-questions.html
STFA | http://marc.theaimsgroup.com/?l=php-general&w=2
STFM | http://php.net/manual/en/index.php
STFW | http://www.google.com/search?q=php




Re: [PHP] reading PDF's

2005-06-30 Thread Richard Lynch
On Fri, June 24, 2005 12:10 pm, Jon said:
> Is it possible to read text from a PDF file with PHP? How?

At the crudest level, you can fopen/fread a PDF and dump it out, and pick
out the plain-text readable bits with your eyes. :-)
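
A rough sketch of that crude approach, roughly what the Unix `strings` tool does: read the raw bytes and keep only the runs of printable ASCII. The function name, the `$minLength` parameter, and the `sample.pdf` path are all made up for illustration. Expect mostly PDF structure (object headers, font names) in the output, since page content is often stored compressed.

```php
<?php
// Crude "strings"-style dump: return the runs of printable ASCII found in
// raw bytes. $minLength is an illustrative knob, not part of any PDF rule.
function printable_runs(string $bytes, int $minLength = 4): array
{
    // Printable ASCII is space (0x20) through tilde (0x7E).
    $pattern = '/[\x20-\x7E]{' . $minLength . ',}/';
    preg_match_all($pattern, $bytes, $matches);
    return $matches[0];
}

// Hypothetical usage; 'sample.pdf' is a placeholder path.
// $fh = fopen('sample.pdf', 'rb');
// $bytes = fread($fh, filesize('sample.pdf'));
// fclose($fh);
// print_r(printable_runs($bytes));
```

Raising `$minLength` cuts down on noise from short accidental matches inside binary data.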

After that, there are definitely some commercial command-line tools to
convert PDF to text (or HTML or whatever) that you can Google for.

There may be a free one, or even an OpenSource one, but I've never heard
of it, possibly because they'd have to pay a license to Adobe (Macromedia
this week?) to be legal...

Note that a PDF can have its text encrypted or be password-protected,
or the text could have been rendered into an image which was embedded in
the PDF (ugh!).
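
For the encrypted case, a cheap first check is possible. As far as I know, an encrypted PDF carries an /Encrypt entry in its trailer (or cross-reference stream) dictionary, so you can look for that name in the raw bytes. This is a heuristic sketch with a made-up function name, not a parser: it can be fooled by a file that merely contains that string, and it says nothing about text that was rendered into an image.

```php
<?php
// Heuristic only: report whether the raw bytes contain an /Encrypt name.
// The \b keeps longer names such as /EncryptMetadata from matching.
function looks_encrypted(string $bytes): bool
{
    return preg_match('/\/Encrypt\b/', $bytes) === 1;
}

// Hypothetical usage; 'sample.pdf' is a placeholder path.
// var_dump(looks_encrypted(file_get_contents('sample.pdf')));
```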

At that point, you can maybe get the image out and use some kind of OCR
software like OmniPage to "read" it.

Over the years and versions, the PDF format has changed a lot, so be sure
to have a representative sample of PDFs to throw at your testing.
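
One way to see how representative a sample is: tally the version each file declares in its header line (for example "%PDF-1.4"). A minimal sketch, with made-up function names and placeholder paths. Treat the result as a rough guide, because newer files can override the header version inside the document itself.

```php
<?php
// Read the version from a PDF header such as "%PDF-1.4".
// Returns null when no header is found near the start of the data.
function pdf_header_version(string $bytes): ?string
{
    // Some files have junk before the header, so search the first
    // 1024 bytes instead of insisting on offset 0.
    $head = substr($bytes, 0, 1024);
    if (preg_match('/%PDF-(\d+\.\d+)/', $head, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Count how many files declare each version; unreadable or non-PDF
// files are counted under 'unknown'.
function tally_versions(array $paths): array
{
    $counts = [];
    foreach ($paths as $path) {
        // Only the first 1024 bytes are needed; @ hides "file not found".
        $bytes = (string) @file_get_contents($path, false, null, 0, 1024);
        $version = pdf_header_version($bytes) ?? 'unknown';
        $counts[$version] = ($counts[$version] ?? 0) + 1;
    }
    return $counts;
}

// Hypothetical usage; the glob pattern is a placeholder.
// print_r(tally_versions(glob('samples/*.pdf')));
```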

You don't want to get to launch and find out 90% of the real PDFs simply
don't work. :-(

-- 
Like Music?
http://l-i-e.com/artists.htm




Re: [PHP] reading PDF's

2005-06-24 Thread Joe Wollard

Jon,

I'm not sure there is a way for you to do this from within PHP, but then 
again I didn't think it was possible for PHP to generate a PDF without 
any extra libs either ;-)
You might want to start with the pdf2* command-line programs. I think 
there is in fact one that will output the PDF as text (pdf2txt).
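
Calling such a converter from PHP might look roughly like the sketch below. The tool name is an assumption on my part: Xpdf ships a program called `pdftotext`, where "-" as the output file means standard output, and that may be the one meant here. The function names are made up, and this only works if the tool is installed and PHP is allowed to run `exec()`.

```php
<?php
// Build a shell command that asks a PDF-to-text converter to write the
// text to stdout. Assumption: the converter is Xpdf's `pdftotext`.
// Pass a different $tool if your system has something else.
function pdf_text_command(string $pdfPath, string $tool = 'pdftotext'): string
{
    // Escape both parts so odd file names cannot break out of the command.
    return escapeshellcmd($tool) . ' ' . escapeshellarg($pdfPath) . ' -';
}

// Run the converter and return its output, or null if it exits non-zero
// (tool missing, unreadable file, and so on).
function pdf_to_text(string $pdfPath): ?string
{
    exec(pdf_text_command($pdfPath), $lines, $status);
    return $status === 0 ? implode("\n", $lines) : null;
}

// Hypothetical usage; 'sample.pdf' is a placeholder path.
// $text = pdf_to_text('sample.pdf');
// echo $text === null ? "conversion failed\n" : $text;
```

Note that `exec()` drops trailing whitespace from each output line, which is usually harmless for text extraction.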


Good luck!

Jon wrote:
> Is it possible to read text from a PDF file with PHP? How?




[PHP] reading PDF's

2005-06-24 Thread Jon
Is it possible to read text from a PDF file with PHP? How?
