Seva Alekseyev created TIKA-2203:
------------------------------------

             Summary: InvalidOperationException on a valid Word file
                 Key: TIKA-2203
                 URL: https://issues.apache.org/jira/browse/TIKA-2203
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.14
         Environment: Windows 7 x64, JVM 1.8.0_101
            Reporter: Seva Alekseyev
         Attachments: OPCCompliance_DerivedPartNameFAIL.docx

The attached Word file, which opens in Word, errors out in Tika:

org.apache.tika.exception.TikaException: Error creating OOXML extractor
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse:123
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse:87
    at gov.nih.niaid.fscanner.Extract.ExtractContents:69
Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: You can't add a part with a part name derived from another part ! [M1.11]
    at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl:338
    at org.apache.poi.openxml4j.opc.OPCPackage.getParts:774
    at org.apache.poi.openxml4j.opc.OPCPackage.open:268
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse:69
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse:87
Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException: You can't add a part with a part name derived from another part ! [M1.11]
    at org.apache.poi.openxml4j.opc.PackagePartCollection.put:66
    at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl:336
    at org.apache.poi.openxml4j.opc.OPCPackage.getParts:774
    at org.apache.poi.openxml4j.opc.OPCPackage.open:268
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse:69
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse:87



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
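
For context, the [M1.11] tag in the exception message appears to refer to the Open Packaging Conventions rule that a part name must not be formed by appending segments to the name of another part. The sketch below is not POI's implementation. It is a minimal, self-contained illustration of that kind of check, and the class name, method name, and sample part names are all hypothetical. It also assumes part names are compared case-insensitively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only; not part of POI or Tika.
final class DerivedPartNames {

    /**
     * Returns one "derived <- base" entry for every part name that is
     * another part name plus one or more extra "/segment" pieces.
     */
    static List<String> findDerived(List<String> partNames) {
        List<String> hits = new ArrayList<>();
        for (String base : partNames) {
            // Assumption: names are compared without regard to case.
            String prefix = base.toLowerCase(Locale.ROOT) + "/";
            for (String candidate : partNames) {
                if (candidate.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                    hits.add(candidate + " <- " + base);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up part names; the second one is derived from the first.
        List<String> names = Arrays.asList(
                "/word/document.xml",
                "/word/document.xml/extra.xml",
                "/word/styles.xml");
        for (String hit : findDerived(names)) {
            System.out.println(hit);
        }
    }
}
```

A package containing such a pair of names would be rejected by a strict OPC reader even if Word itself opens the file, which would be consistent with the behaviour reported above.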