Forgot to cc users@
---------- Original message ----------
From: "Andreas Lehmkühler" <[email protected]>
To: Cool The Breezer <[email protected]>
Date: 15 March 2012 at 21:12
Subject: Re: Exception :org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream

Hi,

Cool The Breezer <[email protected]> wrote on 15 March 2012 at 07:38:
> Hello Group,
> I recently downloaded PDFBox 1.6.0. I am using it to parse
> PDF files from URLs in a multi-threaded environment (max. 4 threads). It works fine
> for ~200 odd files and then displays the following exception:
> org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream
> I am using PDFBox on Mac OS X Lion. I am using the following code:
>
> URL url = new URL( filePath );
> URLConnection urlConn = url.openConnection();
> InputStream inStream = urlConn.getInputStream();
> PDFParser pdfParser = new PDFParser(inStream);
> pdfParser.parse();
> document = new PDDocument(pdfParser.getDocument());
> PDFTextStripper stripper = new PDFTextStripper();
> String str = stripper.getText(document);
>
> inStream.close();
> output.close();
> document.close();

There may be a couple of different reasons for that. The version you are using swallows the original exception.

- one of your PDFs may be corrupt; try to find out whether the exception always occurs when processing the very same document
- you ran into an issue which was resolved in the current trunk [1]
- an OutOfMemoryError

>
> In addition to the above error, I am getting ERROR
> org.apache.pdfbox.pdmodel.font.PDCIDFont - Error: Could not parse predefined
> CMAP file for 'Adobe--UCS2', but that does not stop the parser from extracting
> text, so I am ignoring this error. Please suggest a workaround.
>
> regards,
> RB

BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-1232
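Since the quoted 1.6.0 code swallows the original exception, one way to follow the first suggestion in a 4-thread setup is to run each file as an independent task and record every failure against the file that caused it. The sketch below assumes Java 8+ and does not call PDFBox at all: the class `PerFileExtraction`, its `Result` type and the `extractor` function are hypothetical names for illustration, and the actual PDFBox parsing (opening the URL, parsing, closing the document in a `finally` block) would go inside `extractor`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: isolates per-file failures in a multi-threaded run.
final class PerFileExtraction {

    // Outcome for one source: either the extracted text or the error it caused.
    static final class Result {
        final String source;
        final String text;     // null if extraction failed
        final Throwable error; // null if extraction succeeded

        Result(String source, String text, Throwable error) {
            this.source = source;
            this.text = text;
            this.error = error;
        }

        boolean ok() {
            return error == null;
        }
    }

    // Runs `extractor` on every source using a fixed pool of `threads` workers.
    // Each source is an independent task, so one failure never aborts the others.
    static List<Result> extractAll(List<String> sources,
                                   Function<String, String> extractor,
                                   int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Result>> futures = new ArrayList<>();
            for (String source : sources) {
                futures.add(pool.submit(() -> {
                    try {
                        return new Result(source, extractor.apply(source), null);
                    } catch (RuntimeException e) {
                        return new Result(source, null, e);
                    }
                }));
            }
            // Collect results in input order.
            List<Result> results = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                try {
                    results.add(futures.get(i).get());
                } catch (ExecutionException e) {
                    // Errors such as OutOfMemoryError are not caught inside the task;
                    // they surface here and are still attributed to their source.
                    results.add(new Result(sources.get(i), null, e.getCause()));
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in extractor for illustration only; a real one would open the URL,
        // parse it with PDFBox and close the document in a finally block.
        Function<String, String> fake = s -> {
            if (s.contains("corrupt")) {
                throw new IllegalStateException("Stop reading corrupt stream");
            }
            return "text of " + s;
        };
        List<Result> results = extractAll(
                Arrays.asList("a.pdf", "corrupt.pdf", "b.pdf"), fake, 4);
        for (Result r : results) {
            System.out.println(r.source + ": " + (r.ok() ? "ok" : "FAILED (" + r.error + ")"));
        }
    }
}
```

Printing one line per file makes it easy to check whether the failures always come from the same document (a corrupt PDF) or move around between runs, which would point more towards a resource problem such as running out of memory.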

