[jira] [Updated] (PDFBOX-2424) ClassCastException in getMetaData if no real meta data
[ https://issues.apache.org/jira/browse/PDFBOX-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson updated PDFBOX-2424:
    Fix Version/s: 2.0.0

> ClassCastException in getMetaData if no real meta data
> ------------------------------------------------------
>
> Key: PDFBOX-2424
> URL: https://issues.apache.org/jira/browse/PDFBOX-2424
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.8.7, 1.8.8, 2.0.0
> Reporter: Tilman Hausherr
> Fix For: 2.0.0
> Attachments: 333472.pdf
>
>
> Here's an exception from the latest Tika test run by [~talli...@apache.org] with the attached file (too lazy to test it myself; the cause is obvious):
> {code}
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:249)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:137)
>     at org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:120)
>     at org.apache.tika.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:153)
>     at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:96)
>     at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:38)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:724)
> Caused by: java.lang.ClassCastException: org.apache.pdfbox.cos.COSDictionary cannot be cast to org.apache.pdfbox.cos.COSStream
>     at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getMetadata(PDDocumentCatalog.java:312)
>     at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:181)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
>     ... 13 more
> {code}
> Here's the excerpt in the PDF:
> {code}
> 241 0 obj << /Type /Metadata /Subtype /XML >> endobj
> {code}
> The current code is:
> {code}
> COSStream stream = (COSStream)root.getDictionaryObject( COSName.METADATA );
> {code}
> Shall we keep it that way, or rather log a warning and return null if the metadata is not a stream? Adobe Reader does nothing when looking for the properties.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
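The description asks whether to keep the hard cast or to warn and return null when /Metadata is not a stream. A small Java sketch can illustrate the second option.

Object 241 in the attached file is a bare dictionary with no stream data, so the parser returns a COSDictionary and the unconditional cast to COSStream fails. The sketch checks the type with `instanceof` before casting and logs a warning otherwise.

This is not PDFBox code. `CosBase`, `CosDictionary`, `CosStream` and `getMetadataStream` are hypothetical stand-ins so the example compiles without the library, and `java.util.logging` is an assumed choice of logger. As in PDFBox, the stream class extends the dictionary class, so the check has to test for the stream type specifically.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

public class MetadataLookupSketch {
    private static final Logger LOG = Logger.getLogger(MetadataLookupSketch.class.getName());

    // Hypothetical stand-ins for PDFBox's COS object model (NOT the real classes).
    static class CosBase { }

    static class CosDictionary extends CosBase {
        final Map<String, CosBase> entries = new HashMap<>();

        CosBase getDictionaryObject(String key) {
            return entries.get(key);
        }
    }

    // Mirrors PDFBox, where the stream type is a subclass of the dictionary type.
    static class CosStream extends CosDictionary { }

    /**
     * Returns the catalog's /Metadata stream, or null if the entry is missing
     * or is not a stream (e.g. a bare dictionary like object 241 in 333472.pdf).
     */
    static CosStream getMetadataStream(CosDictionary root) {
        CosBase value = root.getDictionaryObject("Metadata");
        if (value instanceof CosStream) {
            return (CosStream) value; // safe: type checked before the cast
        }
        if (value != null) {
            LOG.warning("Expected a stream for /Metadata but got "
                    + value.getClass().getSimpleName() + "; ignoring it");
        }
        return null;
    }

    public static void main(String[] args) {
        CosDictionary root = new CosDictionary();

        // Malformed case: /Metadata is a plain dictionary instead of a stream.
        root.entries.put("Metadata", new CosDictionary());
        System.out.println(getMetadataStream(root)); // prints "null" and logs a warning

        // Well-formed case: /Metadata is a stream.
        CosStream stream = new CosStream();
        root.entries.put("Metadata", stream);
        System.out.println(getMetadataStream(root) == stream); // prints "true"
    }
}
```

In this sketch, a missing entry and a malformed entry both yield null, so a caller needs only one null check. The trade-off is that the malformed entry is skipped with nothing more than a log line.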
[jira] [Updated] (PDFBOX-2424) ClassCastException in getMetaData if no real meta data
[ https://issues.apache.org/jira/browse/PDFBOX-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-2424:
    Attachment: 333472.pdf
[jira] [Updated] (PDFBOX-2424) ClassCastException in getMetaData if no real meta data
[ https://issues.apache.org/jira/browse/PDFBOX-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-2424:
    Affects Version/s: 2.0.0, 1.8.8, 1.8.7
[jira] [Updated] (PDFBOX-2424) ClassCastException in getMetaData if no real meta data
[ https://issues.apache.org/jira/browse/PDFBOX-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-2424:
    Component/s: Parsing