Hello,

When converting a batch of Microsoft Word documents with the command

    java -jar tika-app-1.1-SNAPSHOT.jar -v -t

I get the following exception:

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@5d3ac0
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:130)
    at org.apache.tika.cli.TikaCLI$TikaServer$1.run(TikaCLI.java:735)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 487
    at org.apache.poi.hwpf.sprm.SprmOperation.initSize(SprmOperation.java:174)
    at org.apache.poi.hwpf.sprm.SprmOperation.<init>(SprmOperation.java:80)
    at org.apache.poi.hwpf.sprm.SprmIterator.next(SprmIterator.java:48)
    at org.apache.poi.hwpf.sprm.ParagraphSprmUncompressor.uncompressPAP(ParagraphSprmUncompressor.java:67)
    at org.apache.poi.hwpf.usermodel.Paragraph.newParagraph(Paragraph.java:103)
    at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:943)
    at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:146)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:97)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 4 more




Any idea how to avoid getting this error?

Because these are internal business documents, I may not be able to share them, so I would greatly appreciate a fix or a workaround.

I noticed that with 'tika-app-1.0.jar' an even greater number of files failed to convert, so things definitely seem to have improved with version 1.1.

Regards,
/HS

