ps2ascii looks like an option that might work. However, some of the files may be password protected (I have the password), and in that case I couldn't get ps2ascii to work. The other downside is that I would have to install ghostscript. :-(
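For the password-protected case, one thing that may be worth trying is calling gs directly instead of going through the ps2ascii wrapper, since gs accepts a -sPDFPassword option. Below is a rough, untested sketch. It assumes a Ghostscript build that includes the txtwrite output device (newer releases do, older ones may not), and pdf_to_text is just a hypothetical helper name, not something Ghostscript ships.

```shell
# Sketch only: extract plain text from a PDF with Ghostscript.
# Assumes gs is on PATH and was built with the txtwrite device.
# pdf_to_text is a hypothetical helper name, not part of Ghostscript.
pdf_to_text() {
    in_pdf=$1          # input PDF file
    out_txt=$2         # output text file
    password=${3:-}    # optional user password for protected PDFs

    # Options must come before the input file on the gs command line.
    set -- -q -dNOPAUSE -dBATCH -sDEVICE=txtwrite "-sOutputFile=$out_txt"
    if [ -n "$password" ]; then
        set -- "$@" "-sPDFPassword=$password"
    fi
    gs "$@" "$in_pdf"
}
```

Usage would be something like pdf_to_text pdffile.pdf pdffile.txt 'thepassword', or without the third argument for unprotected files. This still requires installing ghostscript, so it only addresses the password problem.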
Thanks for the suggestion.

Owen

On Wed, 2006-02-01 at 13:26 -0500, Christopher J. Knowles wrote:
> On Wednesday 01 February 2006 12:01, Ian Kilgore wrote:
> > Owen Berry wrote:
> > | Anyone know of a command line utility for extracting text from a pdf
> > | file, other than the one included in xpdf (pdftotext)? pdftotext does
> > | exactly what I want, but I would like to avoid pulling in the rest of
> > | xpdf, if possible, as this is for a server.
> > |
> > | BTW, I'm using it combined with the perlfect search engine, so the text
> > | does not need to be formatted nicely or anything.
> > |
> > | Thanks,
> > | Owen
> >
> > You can pipe pdf2ps | ps2ascii
>
> Or, according to ps2ascii manpage (and some quick experimentation, you can
> just "ps2ascii pdffile.pdf > pdffile.txt"
>
> (When I just tried the pdf2ps | ps2ascii, it gave me a blank... while just
> running it through ps2ascii seems to work.)
>
> CJK
--
TriLUG mailing list : http://www.trilug.org/mailman/listinfo/trilug
TriLUG Organizational FAQ : http://trilug.org/faq/
TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
