RE: [PHP] extract text from pdf

2006-05-11 Thread George Pitcher
Have a look at the iText Java library. I use it in conjunction with PHP for file splitting and concatenation, but it has a whole host of other features. It's available via SourceForge or from the author at www.lowagie.com/iText/. Hope it helps. George -Original Message- From: cajbecu

Re: [PHP] extract text from pdf

2006-05-11 Thread Rory Browne
I use TWiki. TWiki's search sucks. Someone wrote a Plucene-based search engine and wanted it to be able to search attachments, including PDF files. They used ... something out of xpdf: pdf2text or pdftotext. On 5/11/06, George Pitcher [EMAIL PROTECTED] wrote: Have a look at the iText

RE: [PHP] extract text from pdf

2006-05-11 Thread ray . hauge
If this is on a *nix box, I would suggest calling the pdftotext command (from xpdf) via shell_exec. It should work as long as the PDF isn't a scanned image. Obviously it won't get text off the images, and you'd want to make sure that any input used in filenames (if they're dynamic) is verified and scrubbed
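
For what it's worth, the shell_exec approach described above might look something like the rough sketch below. It assumes a *nix box with xpdf's pdftotext binary on the PATH (pdftotext is the name xpdf ships the tool under). The function names build_pdftotext_command and extract_pdf_text are made up for illustration, and the ".pdf" extension check is just one possible way of verifying the input, not something from this thread.

```php
<?php
// Hypothetical helper: builds the pdftotext command line for one PDF path.
function build_pdftotext_command(string $pdfPath): string
{
    // escapeshellarg() quotes the path so shell metacharacters in a
    // dynamic filename are passed as literal text, not interpreted.
    // A "-" as the output file makes pdftotext write the text to stdout.
    return 'pdftotext ' . escapeshellarg($pdfPath) . ' -';
}

// Hypothetical helper: returns the extracted text, or null on failure.
function extract_pdf_text(string $pdfPath): ?string
{
    // Verify the input first: it must resolve to an existing regular
    // file with a .pdf extension (an assumed, deliberately simple check).
    $real = realpath($pdfPath);
    if ($real === false || !is_file($real)) {
        return null;
    }
    if (strtolower(pathinfo($real, PATHINFO_EXTENSION)) !== 'pdf') {
        return null;
    }

    // shell_exec() returns the command's output, or null/false on failure.
    $output = shell_exec(build_pdftotext_command($real));
    return is_string($output) ? $output : null;
}
```

Calling extract_pdf_text('/path/to/file.pdf') would then give back the text layer, or null if the path fails the checks. As noted above, a scanned PDF has no text layer, so the result would be empty or close to it.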