[ https://issues.apache.org/jira/browse/TIKA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting updated TIKA-316:
-------------------------------

    Component/s:     (was: cli)
                 parser

Looks like this is caused by some underlying POI issue, i.e. the HDGF code in POI fails to interpret this file correctly.

It would be great if someone could report this issue upstream to POI and add a reference to that issue here.

> Parsing Visio diagrams with tika-app causes TikaException (Found a chunk with a negative length)
> ------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-316
>                 URL: https://issues.apache.org/jira/browse/TIKA-316
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.4, 0.5
>         Environment: Windows Server 2003 SP2, JRE 1.6.0_16, tika-app, Visio 2003
>            Reporter: Mike Hays
>         Attachments: repro-TIKA-316.vsd
>
>
> tika-app (0.4 and 0.5 nightly) return the following when attempting to parse a Visio 2003 file (other versions may be affected):
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.officepar...@145e044
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:123)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:103)
>         at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:176)
>         at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:63)
> Caused by: java.lang.IllegalArgumentException: Found a chunk with a negative length, which isn't allowed
>         at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:120)
>         at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:59)
>         at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:93)
>         at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
>         at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
>         at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:95)
>         at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:52)
>         at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:49)
>         at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:118)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
>         ... 3 more

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
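
For anyone triaging a trace like the one quoted in this issue, the following is a minimal sketch of how the "Caused by" chain can be walked programmatically to see which class actually threw. The class name RootCauseFinder and both helper methods are hypothetical, and the exceptions built in main are plain JDK stand-ins for the TikaException and IllegalArgumentException in the report, not calls into Tika or POI.

```java
// Hypothetical helper, not part of Tika or POI: finds the innermost
// exception in a cause chain and the class whose code threw it.
class RootCauseFinder {

    /** Follows getCause() links until the innermost exception is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; also guard against a self-referencing cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** Class name of the top stack frame of the root cause, or "" if no frames are recorded. */
    static String throwingClass(Throwable t) {
        StackTraceElement[] frames = rootCause(t).getStackTrace();
        return frames.length == 0 ? "" : frames[0].getClassName();
    }

    public static void main(String[] args) {
        // Stand-ins for the wrapped exception pair shown in the report.
        Throwable inner = new IllegalArgumentException(
                "Found a chunk with a negative length, which isn't allowed");
        Throwable outer = new RuntimeException("Unexpected RuntimeException from parser", inner);

        System.out.println("Root cause: " + rootCause(outer));
        System.out.println("Thrown in:  " + throwingClass(outer));
    }
}
```

Applied to the exception in this report, the top frame of the root cause would be expected to name a class under org.apache.poi.hdgf, which matches the assessment that the failure originates in POI and not in Tika.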