Re: [PHP] SCanning text of PDF documents

2008-05-15 Thread Robert Cummings
On Thu, 2008-05-15 at 20:17 -0500, Ray Hauge wrote:
>
> One thing you'll have to watch is that if the PDF was created by a
> scanner, then the "text" on the PDF is actually just an image and cannot
> be read without OCR. I got stumped on that one for a while when I was
> doing something simila

Re: [PHP] SCanning text of PDF documents

2008-05-15 Thread Ray Hauge
Angelo Zanetti wrote:

Hi All.

This is a quick question.

A client of ours wants a solution that when a PDF document is uploaded that we use PHP to scan the documents contents and save it in a DB.

I know you can do this with normal text documents using the file commands and functions.

Is it pos

Re: [PHP] SCanning text of PDF documents

2008-05-15 Thread Eric Butera
On Thu, May 15, 2008 at 4:19 AM, Angelo Zanetti <[EMAIL PROTECTED]> wrote:
> Hi All.
>
> This is a quick question.
>
> A client of ours wants a solution that when a PDF document is uploaded that
> we use PHP to scan the documents contents and save it in a DB.
>
> I know you can do this with normal

Re: [PHP] SCanning text of PDF documents

2008-05-15 Thread Frank Arensmeier
A reliable solution depends partly on the PDF document itself. Consider whether your PDF document contains rotated text or text that spans several different blocks/pages. My experience with ps2ascii and other Ghostscript-related tools is that sometimes it works quite well, sometimes the outp

Re: [PHP] SCanning text of PDF documents

2008-05-15 Thread David Otton
2008/5/15 Angelo Zanetti <[EMAIL PROTECTED]>:
> A client of ours wants a solution that when a PDF document is uploaded that
> we use PHP to scan the documents contents and save it in a DB.
>
> I know you can do this with normal text documents using the file commands
> and functions.
>
> Is it

[PHP] SCanning text of PDF documents

2008-05-15 Thread Angelo Zanetti
Hi All.

This is a quick question.

A client of ours wants a solution that when a PDF document is uploaded that we use PHP to scan the documents contents and save it in a DB.

I know you can do this with normal text documents using the file commands and functions.

Is it possible with PDF documents