I'm using PDFBox 0.7.4 together with Aperture. While crawling, I often get the following warning and exception:

[Aug 14 07:17:34] WARN (PdfExtractor.java:119) - IOException while extracting full-text of file:////De-fs003/projects/Active/EC305479%20-%20EUTELSAT%20W2M%20&%20I3K/SC200129%20-%20EUTELSAT%20W2M%20&%20I3K/00%20-%20Additional%20Info/Old%20Eutelsat%20Project/eutelsat/Project/018%20-%20System%20Status/05%20Sysmgt/q7wnb-te.pdf
java.io.StreamCorruptedException: Error: data is null
    at org.pdfbox.filter.LZWFilter.decode(LZWFilter.java:95)
    at org.pdfbox.cos.COSStream.doDecode(COSStream.java:313)
    at org.pdfbox.cos.COSStream.doDecode(COSStream.java:243)
    at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
    at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
    at org.pdfbox.cos.COSStream.getStreamTokens(COSStream.java:132)
    at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:205)
    at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:177)
    at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:339)
    at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:263)
    at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:219)
    at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:152)
    at org.semanticdesktop.aperture.extractor.pdf.PdfExtractor.extractFullText(PdfExtractor.java:112)
    at org.semanticdesktop.aperture.extractor.pdf.PdfExtractor.processDocument(PdfExtractor.java:100)
    at org.semanticdesktop.aperture.extractor.pdf.PdfExtractor.extract(PdfExtractor.java:62)

I have not been able to find a description of this problem, or a solution to it, elsewhere. Is this a known problem, or have I done something wrong?

Thanks,
Gert