Hello,
I have been using PDFBox to extract text from PDFs and validate some of it.
Recently I have had problems parsing the PDFs; more precisely, I get a
java.io.IOException. I use the following code to get the text from a PDF:
    // imports assume the PDFBox 1.x package layout
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import org.apache.pdfbox.pdfparser.PDFParser;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.util.PDFTextStripper;

    public String getTextFromPDF(URL url, int readTimeout, int connectTimeout)
            throws IOException {
        // open connection
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        PDDocument pdoc = null;
        try {
            // disable caching and set the read and connect timeouts
            conn.setUseCaches(false);
            conn.setReadTimeout(readTimeout);
            conn.setConnectTimeout(connectTimeout);
            // get input stream from connection
            InputStream fileToParse = conn.getInputStream();
            System.out.println(fileToParse.toString());
            // parser object
            PDFParser parser = new PDFParser(fileToParse, null, true);
            // do parse
            parser.parse();
            // get document
            pdoc = parser.getPDDocument();
            // get stripper object and extract the text
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(pdoc);
        } finally {
            // close the document (if one was created) and disconnect
            if (pdoc != null) {
                pdoc.close();
            }
            conn.disconnect();
        }
    }
The error message I get is this (line 51 is where I call parser.parse() above):
[inline image attachment showing the error message; not available as text]
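
One thing that might be worth ruling out is whether the bytes coming back over the connection are a PDF at all (for example, the server could be returning an HTML error page). The following is only a minimal sketch under that assumption, not a diagnosis: the class name PdfHeaderCheck and both method names are hypothetical and not part of PDFBox. It buffers the stream into memory and checks for the "%PDF-" marker at the very start of the data, which is a strict check at offset 0.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: inspects downloaded bytes before parsing.
public class PdfHeaderCheck {

    // A PDF file begins with the ASCII marker "%PDF-" followed by a version number.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Reads the whole stream into a byte array so it can be inspected and parsed later.
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Returns true only if the data starts with the "%PDF-" marker at offset 0.
    public static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

If the check passes, the buffered bytes could be wrapped in a java.io.ByteArrayInputStream and handed to the parser in place of the connection's stream; if it fails, printing the first few bytes should show what the server actually sent.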
I would appreciate any tips or help you can provide. Many thanks in advance,
Miran Damjanovic