Hi,

Sent: Thu, 02 Sep 2010
From: Kevin<[email protected]>

> I am working on a script that takes a vendor generated PDF document and
> converts it to a text file that I can use to update our database. I have
> tried multiple scripts and have yet to find one that works on this
> particular PDF document. One script I had in PERL and one in PHP and both
> worked on other PDF's but not this one.

Whether text extraction works depends on the PDF and its contents, so the
problem is probably the PDF itself rather than the tool.

> Hopefully PDFBox can do what the others couldn't. Anyway I am having trouble
> using this library, I'm a newbie when it comes to installing libraries and
> especially Java libraries for use with PHP.
>
> When I am all finished I would like to be able to save the vendor generated
> PDF into a directory and when the script runs, if it finds it it will
> convert to text and then start the input into the SQL. First things first, I
> need to setup this library just to see if it will convert this PDF because
> so many others have failed.
>
> Any help with the installation/setup would be appreciated.

Download the precompiled standalone jar from [1]. Use the following command to
extract the text:

java -jar pdfbox-app-x.y.z.jar ExtractText -sort [OPTIONS] <pdf> [textfile]

Have a look at [2] for a complete list of all command line parameters.

BR
Andreas Lehmkühler

[1] http://pdfbox.apache.org/download.html
[2] http://pdfbox.apache.org/commandlineutilities/ExtractText.html
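
For the "save the PDF into a directory and convert it when the script runs"
workflow described in the question, a wrapper around the ExtractText command
could look roughly like the sketch below. It is only an illustration, not
something shipped with PDFBox: the function name convert_new_pdfs, the
PDFBOX_JAR variable and the "skip if a .txt already exists" rule are
assumptions, and the jar name keeps the x.y.z placeholder, which has to be
replaced with the version actually downloaded.

```shell
#!/bin/sh
# Sketch only: the names below are hypothetical, not part of PDFBox.
# PDFBOX_JAR keeps the placeholder version from the command above; point it
# at the jar you actually downloaded.
PDFBOX_JAR="${PDFBOX_JAR:-pdfbox-app-x.y.z.jar}"

# Convert every PDF in directory $1 that has no matching .txt file yet.
convert_new_pdfs() {
    dir="$1"
    for pdf in "$dir"/*.pdf; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        if [ ! -e "$pdf" ]; then
            continue
        fi
        txt="${pdf%.pdf}.txt"
        # Assumption: an existing .txt means the PDF was already converted.
        if [ -e "$txt" ]; then
            continue
        fi
        if ! java -jar "$PDFBOX_JAR" ExtractText -sort "$pdf" "$txt"; then
            echo "extraction failed: $pdf" >&2
        fi
    done
}
```

After sourcing the file, calling convert_new_pdfs /path/to/incoming (a
made-up directory) would leave one .txt next to each new PDF. A PHP script
could then read those text files and do the database import, so PHP never has
to load a Java library directly.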

