[ https://issues.apache.org/jira/browse/TIKA-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-2136.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.15
                   2.0

> External file links in PPTX misparsed
> -------------------------------------
>
>                 Key: TIKA-2136
>                 URL: https://issues.apache.org/jira/browse/TIKA-2136
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.13
>         Environment: Windows 7 x64, JVM 1.8.0_101
>            Reporter: Seva Alekseyev
>             Fix For: 2.0, 1.15
>
>         Attachments: 81809 lab presentation.pptx
>
>
> The attached document contains links to external files. Trying to parse it
> with the Tika parser throws the following error:
> java.lang.NullPointerException
>     at org.apache.poi.openxml4j.opc.PackagePartName.throwExceptionIfEmptyURI(PackagePartName.java:204)
>     at org.apache.poi.openxml4j.opc.PackagePartName.throwExceptionIfInvalidPartUri(PackagePartName.java:174)
>     at org.apache.poi.openxml4j.opc.PackagePartName.<init>(PackagePartName.java:85)
>     at org.apache.poi.openxml4j.opc.PackagingURIHelper.createPartName(PackagingURIHelper.java:493)
>     at org.apache.poi.openxml4j.opc.PackagePart.getRelatedPart(PackagePart.java:485)
>     at org.apache.poi.xslf.usermodel.XSLFSlideShow.<init>(XSLFSlideShow.java:86)
>     at org.apache.poi.xslf.extractor.XSLFPowerPointExtractor.<init>(XSLFPowerPointExtractor.java:62)
>     at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:244)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
> The error happens in the URI validator, but not because the URI fails
> validation; the function fails because partURI.getPath() returns a null and
> there's no null check. The link in the file may not be valid, but it's not
> malformed. And it definitely shouldn't prevent text extraction from the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
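
The null from getPath() described in the report can be reproduced with
java.net.URI alone. The sketch below is only an illustration, assuming the
external link resolves to an opaque URI (an absolute URI whose
scheme-specific part does not start with "/"). The sample URIs, the class
name NullPathDemo, and the helper hasUsablePath are hypothetical; they are
not taken from the attachment and are not POI or Tika code.

```java
import java.net.URI;

public class NullPathDemo {

    // Hypothetical helper, not part of POI or Tika: returns true only when
    // the URI carries a non-empty path component. The null check is the
    // guard that the report says is missing.
    public static boolean hasUsablePath(URI uri) {
        String path = uri.getPath(); // null for opaque URIs
        return path != null && !path.isEmpty();
    }

    public static void main(String[] args) {
        // A package-internal part name: hierarchical, so getPath() is non-null.
        URI internal = URI.create("/ppt/slides/slide1.xml");

        // Made-up external link: the scheme-specific part does not start
        // with "/", so the URI is opaque and getPath() returns null.
        URI external = URI.create("file:C:/data/results.xlsx");

        System.out.println("internal path: " + internal.getPath());
        System.out.println("external path: " + external.getPath());

        // Calling external.getPath().length() here would throw a
        // NullPointerException; the guarded helper does not.
        System.out.println("internal usable: " + hasUsablePath(internal));
        System.out.println("external usable: " + hasUsablePath(external));
    }
}
```

A caller that gets false from such a check could skip the relationship or
record the link as plain text, so extraction of the rest of the file can
continue. Whether the actual fix works this way is not stated in this
message.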