I'm using PDFBox 0.7.4 together with Aperture.
 
While crawling, I often get the following warning and exception:
 
"[Aug 14 07:17:34] WARN  (PdfExtractor.java:119) - IOException while extracting full-text of file:////De-fs003/projects/Active/EC305479%20-%20EUTELSAT%20W2M%20&%20I3K/SC200129%20-%20EUTELSAT%20W2M%20&%20I3K/00%20-%20Additional%20Info/Old%20Eutelsat%20Project/eutelsat/Project/018%20-%20System%20Status/05%20Sysmgt/q7wnb-te.pdf
java.io.StreamCorruptedException: Error: data is null
 at org.pdfbox.filter.LZWFilter.decode(LZWFilter.java:95)
 at org.pdfbox.cos.COSStream.doDecode(COSStream.java:313)
 at org.pdfbox.cos.COSStream.doDecode(COSStream.java:243)
 at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
 at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
 at org.pdfbox.cos.COSStream.getStreamTokens(COSStream.java:132)
 at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:205)
 at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:177)
 at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:339)
 at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:263)
 at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:219)
 at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:152)
 at org.semanticdesktop.aperture.extractor.pdf.PdfExtractor.extractFullText(PdfExtractor.java:112)
 at org.semanticdesktop.aperture.extractor.pdf.PdfExtractor.processDocument(PdfExtractor.java:100)
 at org.semanticdesktop.aperture.extractor.pdf.PdfExtractor.extract(PdfExtractor.java:62)"
 
I have not been able to find a description of this problem, or a solution to
it, anywhere else. Is this a known problem, or have I done something wrong?
 
Thanks,
Gert.

 
 

