David Mitchell wrote:
> This might be helpful also: http://sourceforge.net/projects/pdfbox/
>
> I tried the batch utility that comes with it and it seems to do a decent
> job of text extraction.

On Linux, you can use pdftotext. For finer control, you need to convert the PDF file to PostScript (PS) and use better tools there. The major issues are font encoding, text order, accents (and other constructed characters), and ligatures.

Best wishes,

John

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
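
P.S. A minimal sketch of the pdftotext route, in case it helps. It assumes the pdftotext command-line tool (from poppler or xpdf) is installed and on the PATH. The function names below are hypothetical, made up for this illustration. The cleanup step uses Unicode NFKC normalization, which is only one possible way to handle ligatures and combining accents; it does nothing for font-encoding or text-order problems, and it also folds other compatibility characters (superscripts, for instance), so it may not suit every document.

```python
import shutil
import subprocess
import unicodedata


def pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build a pdftotext command line (hypothetical helper for this sketch)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]  # ask for UTF-8 output
    if keep_layout:
        cmd.append("-layout")  # try to preserve the physical page layout
    cmd += [pdf_path, txt_path]
    return cmd


def clean_extracted_text(text):
    """Fold ligatures (e.g. U+FB01) and compose letter + combining accent pairs."""
    return unicodedata.normalize("NFKC", text)


def extract_text(pdf_path, txt_path):
    """Run pdftotext, then return the normalized contents of the output file."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
    with open(txt_path, encoding="utf-8") as fh:
        return clean_extracted_text(fh.read())
```

Calling `extract_text("paper.pdf", "paper.txt")` (file names are placeholders) would write the raw extraction to `paper.txt` and return the cleaned text. Passing `keep_layout=False` to `pdftotext_command` drops the `-layout` flag, which sometimes gives a more usable reading order for multi-column pages.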
