[ https://issues.apache.org/jira/browse/TIKA-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-2136.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.15
                   2.0

> External file links in PPTX misparsed
> -------------------------------------
>
>                 Key: TIKA-2136
>                 URL: https://issues.apache.org/jira/browse/TIKA-2136
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.13
>         Environment: Windows 7 x64, JVM 1.8.0_101
>            Reporter: Seva Alekseyev
>             Fix For: 2.0, 1.15
>
>         Attachments: 81809 lab presentation.pptx
>
>
> The attached document contains links to external files. Trying to parse it 
> with the Tika parser throws the following error:
> java.lang.NullPointerException
>       at org.apache.poi.openxml4j.opc.PackagePartName.throwExceptionIfEmptyURI(PackagePartName.java:204)
>       at org.apache.poi.openxml4j.opc.PackagePartName.throwExceptionIfInvalidPartUri(PackagePartName.java:174)
>       at org.apache.poi.openxml4j.opc.PackagePartName.<init>(PackagePartName.java:85)
>       at org.apache.poi.openxml4j.opc.PackagingURIHelper.createPartName(PackagingURIHelper.java:493)
>       at org.apache.poi.openxml4j.opc.PackagePart.getRelatedPart(PackagePart.java:485)
>       at org.apache.poi.xslf.usermodel.XSLFSlideShow.<init>(XSLFSlideShow.java:86)
>       at org.apache.poi.xslf.extractor.XSLFPowerPointExtractor.<init>(XSLFPowerPointExtractor.java:62)
>       at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:244)
>       at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
>       at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>
> The error happens in the URI validator, but not because the URI fails 
> validation; the function fails because partURI.getPath() returns null and 
> there is no null check. The link in the file may not be valid, but it is not 
> malformed. And it definitely shouldn't prevent text extraction from the file.
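
In `java.net.URI`, `getPath()` returns `null` when the path is undefined, which is the case for *opaque* URIs: absolute URIs whose scheme-specific part does not begin with `/`. The sketch below is a minimal illustration of that behavior, assuming the external link in the attachment resolves to such an opaque URI (the attachment itself is not inspected here). The example URI `file:data.xlsx` and the helper `hasEmptyPath` are hypothetical; `hasEmptyPath` is not POI's `throwExceptionIfEmptyURI` code, only a null-safe way to perform the same kind of emptiness check.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class OpaqueUriDemo {

    // Hypothetical null-safe emptiness check (not POI's implementation).
    // Treats an undefined path (null, as for opaque URIs) the same as an empty one
    // instead of dereferencing it.
    static boolean hasEmptyPath(URI partUri) {
        String path = partUri.getPath(); // null when the URI is opaque
        return path == null || path.isEmpty();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Relative hierarchical URI, like an internal package part name: path is defined.
        URI internalPart = new URI("/ppt/slides/slide1.xml");

        // Hypothetical external target: absolute, and the scheme-specific part
        // does not start with '/', so the URI is opaque and getPath() is null.
        URI externalLink = new URI("file:data.xlsx");

        System.out.println(internalPart.isOpaque() + " " + internalPart.getPath());
        System.out.println(externalLink.isOpaque() + " " + externalLink.getPath());
        System.out.println(hasEmptyPath(externalLink));
    }
}
```

Running `main` prints `false /ppt/slides/slide1.xml`, then `true null`, then `true`. A caller that calls `getPath().length()` on the second URI would throw a `NullPointerException` like the one in the trace.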



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
