Re: [PHP] SCanning text of PDF documents

2008-05-15 Thread David Otton
2008/5/15 Angelo Zanetti [EMAIL PROTECTED]: A client of ours wants a solution where, when a PDF document is uploaded, we use PHP to scan the document's contents and save them in a DB. I know you can do this with normal text documents using the file commands and functions. Is it

Re: [PHP] SCanning text of PDF documents

2008-05-15 Thread Frank Arensmeier
A reliable solution depends partly on the PDF document itself. Consider whether your PDF document contains rotated text or text that spans several different blocks/pages. My experience with ps2ascii and other Ghostscript-related tools is that sometimes it works quite well, sometimes the

Re: [PHP] SCanning text of PDF documents

2008-05-15 Thread Eric Butera
On Thu, May 15, 2008 at 4:19 AM, Angelo Zanetti [EMAIL PROTECTED] wrote: Hi All. This is a quick question. A client of ours wants a solution where, when a PDF document is uploaded, we use PHP to scan the document's contents and save them in a DB. I know you can do this with normal text

Re: [PHP] SCanning text of PDF documents

2008-05-15 Thread Ray Hauge
Angelo Zanetti wrote: Hi All. This is a quick question. A client of ours wants a solution where, when a PDF document is uploaded, we use PHP to scan the document's contents and save them in a DB. I know you can do this with normal text documents using the file commands and functions. Is it

Re: [PHP] SCanning text of PDF documents

2008-05-15 Thread Robert Cummings
On Thu, 2008-05-15 at 20:17 -0500, Ray Hauge wrote: One thing you'll have to watch is that if the PDF was created by a scanner, then the text on the PDF is actually just an image and cannot be read without OCR. I got stumped on that one for a while when I was doing something similar :)
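
A minimal sketch of how the scanned-PDF case might be caught before saving to the DB, assuming the text has already been extracted by some external tool. The function name `looksLikeScannedPdf` and the 50-characters-per-page threshold are hypothetical and not from the thread. It is only a heuristic: a document with almost no extractable text is probably image-only and would need OCR.

```php
<?php
// Hypothetical helper (not from the thread): guesses whether text extracted
// from a PDF is too sparse to be real content, which would suggest a scanned,
// image-only document that cannot be read without OCR.
function looksLikeScannedPdf(string $extractedText, int $pageCount, int $minCharsPerPage = 50): bool
{
    // Remove all whitespace so blank lines and form feeds do not count as text.
    $visible = (string) preg_replace('/\s+/', '', $extractedText);

    // Guard against a zero or negative page count.
    $pages = max(1, $pageCount);

    // Fewer visible characters per page than the (assumed) threshold: likely scanned.
    return (strlen($visible) / $pages) < $minCharsPerPage;
}
```

If this returns true for an upload, the file could be flagged for OCR or manual review instead of storing an empty text field. The threshold is a guess and would need tuning against real documents.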