Hello Group,
I recently downloaded PDFBox 1.6.0. I am using it to parse PDF files fetched
by URL in a multi-threaded environment (4 threads at most). It works fine for
roughly 200 files and then reports the following error:
org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream
I am running PDFBox on Mac OS X Lion, with the following code:

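import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// filePath holds the URL of the PDF as a string
URL url = new URL(filePath);
URLConnection urlConn = url.openConnection();
InputStream inStream = urlConn.getInputStream();
PDDocument document = null;
try {
    PDFParser pdfParser = new PDFParser(inStream);
    pdfParser.parse();
    document = new PDDocument(pdfParser.getDocument());
    PDFTextStripper stripper = new PDFTextStripper();
    String str = stripper.getText(document);
} finally {
    // release everything even if parsing throws
    if (document != null) document.close();
    inStream.close();
    output.close(); // output is declared elsewhere in my code (not shown)
}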
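
In case the threading setup matters, here is a minimal sketch of how up to 4 worker threads can be driven. It is not my exact code: the class name ParallelExtract, the method runAll, and the extract function are placeholders for illustration, and extract stands in for the per-file parsing code above. It uses only the standard java.util.concurrent classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical driver: names here are illustrative, not part of PDFBox.
class ParallelExtract {

    // Applies `extract` to every input on a pool of at most `threads`
    // workers and returns the results in the same order as the inputs.
    static List<String> runAll(List<String> inputs,
                               Function<String, String> extract,
                               int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String input : inputs) {
                // Each task owns its own input; nothing is shared between tasks.
                Callable<String> task = () -> extract.apply(input);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                // get() blocks until the task finishes and rethrows its failure
                // wrapped in an ExecutionException.
                results.add(future.get());
            }
            return results;
        } finally {
            // Stop accepting new tasks; already submitted tasks still complete.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real extraction: here it just echoes the URL string.
        List<String> urls = Arrays.asList("file-1.pdf", "file-2.pdf", "file-3.pdf");
        List<String> texts = runAll(urls, url -> "text of " + url, 4);
        System.out.println(texts.size() + " files processed");
    }
}
```

In this sketch every task would create its own stream, parser, and document inside extract, so no PDFBox object is shared between threads.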

In addition to the above error, I also get
ERROR org.apache.pdfbox.pdmodel.font.PDCIDFont - Error: Could not parse
predefined CMAP file for 'Adobe--UCS2'
but that does not stop the parser from extracting text, so I am ignoring it.
Please suggest a workaround.

regards,
RB
