Re: [PHP] SCanning text of PDF documents

Frank Arensmeier Thu, 15 May 2008 02:42:29 -0700

A reliable solution depends partly on the PDF document itself. Consider whether your PDF contains rotated text or text that spans several different blocks/pages. My experience with ps2ascii and other Ghostscript-related tools is that sometimes they work quite well, and sometimes the output is rather messy.
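
A rough sketch of the Ghostscript route, in case it helps. It assumes ps2ascii is installed and on the PATH of the user running PHP; the function names buildPs2asciiCommand() and extractPdfText() are made up for illustration and are not part of any library. Treat it as a starting point, not a finished solution:

```php
<?php
// Sketch only: assumes Ghostscript's ps2ascii is installed and on the PATH.
// Both function names below are hypothetical helpers.

// Build the shell command that prints the PDF's text to stdout.
function buildPs2asciiCommand(string $pdfPath): string
{
    // escapeshellarg() guards against shell metacharacters in the file name.
    return 'ps2ascii ' . escapeshellarg($pdfPath);
}

// Return the extracted text, or null if the file is unreadable,
// the tool fails, or no text comes back (e.g. a scanned, image-only PDF).
function extractPdfText(string $pdfPath): ?string
{
    if (!is_readable($pdfPath)) {
        return null;
    }

    $lines = [];
    $status = 0;
    exec(buildPs2asciiCommand($pdfPath) . ' 2>/dev/null', $lines, $status);
    if ($status !== 0) {
        return null;
    }

    $text = trim(implode("\n", $lines));
    return $text === '' ? null : $text;
}
```

You would call extractPdfText() on the uploaded file's temporary path and, if the result is not null, store the string in a text column with a prepared statement. Expect to clean up the output for documents with rotated text or multi-column layouts.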

The most reliable way of extracting text from a PDF is (I think) a product called PDFlib TET from PDFlib GmbH. Yes, a license costs some money, but you are then able to get almost everything out of the PDF.


http://www.pdflib.com/products/tet/

Maybe some magic with OpenOffice could do the trick as well?

//frank

On 15 May 2008 at 10:19, Angelo Zanetti wrote:

Hi All.

This is a quick question.

A client of ours wants a solution where, when a PDF document is uploaded, we use PHP to scan the document's contents and save them in a DB.

I know you can do this with normal text documents using the file commands and functions.

Is it possible with PDF documents?

My feeling is NO, but perhaps someone will prove me wrong.

Thanks in advance.

Angelo

Web: http://www.elemental.co.za



--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php




Frank Arensmeier

................................................................................................

Webmaster & IT Development

NIKE Hydraulics AB
Box 1107
631 80 Eskilstuna
Sweden

phone +46 - (0)16 16 82 34
fax +46 - (0)16 13 93 16
[EMAIL PROTECTED]
www.nikehydraulics.se

................................................................................................
