On Wed, 7 Mar 2012, Harry Simons wrote:
When converting a bunch of Microsoft Word documents using the command
    java -jar tika-app-1.1-SNAPSHOT.jar -v -t
I'm getting the following exception:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 487
    at org.apache.poi.hwpf.sprm.SprmOperation.initSize(SprmOperation.java:174)
    at org.apache.poi.hwpf.sprm.SprmOperation.<init>(SprmOperation.java:80)
This looks like a POI bug.
Because these are internal business documents, I may not be able to share
them with you guys so would greatly appreciate a fix or a workaround.
That's going to make fixing it much trickier. You'll need to raise a POI
bug, and be willing to do a lot of the investigating yourself.
It may also be worth running the Binary File Format Validator
<http://poi.apache.org/faq.html#faq-N10109> against the file, to check
that it's valid and not corrupted.
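
As a rough first pass before the validator, something like the sketch below
could flag files that are obviously not binary Word documents at all. It only
checks the 8-byte signature that every OLE2 compound file (which includes
.doc) starts with, so it is no substitute for the validator and would not
catch damage deeper in the file. The class and method names
(Ole2HeaderCheck, hasOle2Header) are made up for illustration and are not
part of POI or Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of POI or Tika.
final class Ole2HeaderCheck {
    // Signature found at the start of every OLE2 compound file (.doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the given bytes begin with the OLE2 signature.
    static boolean hasOle2Header(byte[] data) {
        return data.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(data, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Reads only the first 8 bytes of the file and checks them.
    static boolean hasOle2Header(Path file) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                read += n;
            }
        }
        return read == header.length && hasOle2Header(header);
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a path to a file that is expected to be a .doc.
        for (String arg : args) {
            Path file = Paths.get(arg);
            String verdict = hasOle2Header(file) ? "OLE2 header present" : "NOT an OLE2 file";
            System.out.println(file + ": " + verdict);
        }
    }
}
```

Any file reported as "NOT an OLE2 file" is probably truncated, mislabelled
(for example an RTF or .docx saved with a .doc extension), or otherwise not
something HWPF can be expected to read. Files that pass still need the
validator.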
Nick