Nick, thanks for your response.
I tried the BFF Validator, and it is indeed failing!
However, the file was created by MS Word itself, and I doubt it's 'corrupt',
since both MS Word and LibreOffice load it fine without any errors or
warnings of any kind. Everything seems normal with these apps. I can
even use LibreOffice 3.5 to convert it to PDF or to a .zip of XML files.
> This looks like a POI bug
Do you/others still feel it could be addressed by a POI upgrade?
Also, I thought Tika uses POI and would ship it as a .jar. But looking in the
Tika sources, I could find only *POI*.java files, with no *POI*.jar or
*poi*.jar file(s).
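In case it helps anyone checking the same thing, here is a minimal sketch for listing what a jar bundles under a given package path, using only java.util.jar. The class and method names (JarPackageCheck, entriesWithPrefix) are made up for illustration and are not part of Tika or POI. The org/apache/poi/ prefix is my assumption, taken from the package names in the stack trace quoted below.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Tika or POI.
class JarPackageCheck {

    // Returns the names of all entries in the jar whose path starts with the given prefix.
    static List<String> entriesWithPrefix(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                      .map(JarEntry::getName)
                      .filter(name -> name.startsWith(prefix))
                      .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: java JarPackageCheck tika-app-1.1-SNAPSHOT.jar
        List<String> hits = entriesWithPrefix(args[0], "org/apache/poi/");
        System.out.println(hits.size() + " entries under org/apache/poi/");
        // Print only the first few names to keep the output short.
        hits.stream().limit(10).forEach(System.out::println);
    }
}
```

If this reports a non-zero count for the tika-app jar, the POI classes are packed inside that jar rather than shipped as a separate *poi*.jar.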
/HS
On 03/07/2012 06:08 PM, Nick Burch wrote:
> On Wed, 7 Mar 2012, Harry Simons wrote:
>> When converting a bunch of Microsoft Word documents using the command
>>   java -jar tika-app-1.1-SNAPSHOT.jar -v -t
>> I'm getting the following exception:
>> Caused by: java.lang.ArrayIndexOutOfBoundsException: 487
>>   at org.apache.poi.hwpf.sprm.SprmOperation.initSize(SprmOperation.java:174)
>>   at org.apache.poi.hwpf.sprm.SprmOperation.<init>(SprmOperation.java:80)
> This looks like a POI bug
>> Because these are internal business documents, I may not be able to share
>> them with you guys so would greatly appreciate a fix or a workaround.
> That's going to make fixing it much trickier. You'll need to raise a POI bug,
> and be willing to do lots of investigating
> It may also be worth running the Binary File Format Validator
> <http://poi.apache.org/faq.html#faq-N10109> against the file, to check it's
> valid and not corrupted
> Nick