Nick, thanks for your response.

I tried the BFF Validator, and it is indeed failing!

However, the file was created by MS Word itself, and I doubt it's 'corrupt', since both MS Word and LibreOffice load it without any errors or warnings of any kind; everything seems normal in both apps. I can even use LibreOffice 3.5 to convert it to PDF or to a .zip of XML files.

> This looks like a POI bug
Do you/others still feel it could be addressed by a POI upgrade?

Also, I thought Tika used POI as a .jar. But looking through the Tika sources, I could find only *POI*.java files and no *POI*.jar or *poi*.jar files.
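
A quick way to see where the POI classes actually come from at runtime is to ask the JVM for a class's code source. The following is only a minimal sketch, assuming the tika-app jar is on the classpath; ClassLocator and locate are hypothetical names used for illustration, and the POI class name is the one from the stack trace in this thread.

```java
import java.security.CodeSource;

public class ClassLocator {

    // Describes where the named class was loaded from, if it can be found at all.
    public static String locate(String className) {
        try {
            // initialize=false: only look the class up, do not run its static initializers.
            Class<?> cls = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // Typically the case for classes that ship with the JDK itself.
                return className + " has no code source";
            }
            return className + " loaded from " + src.getLocation();
        } catch (ClassNotFoundException e) {
            return className + " is not on the classpath";
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace quoted in this thread.
        System.out.println(locate("org.apache.poi.hwpf.sprm.SprmOperation"));
    }
}
```

Compiled next to the jar, this could be run with something like "java -cp tika-app-1.1-SNAPSHOT.jar:. ClassLocator" (on Windows, use ";" instead of ":"). If the POI classes are bundled inside the app jar, the reported location should point at that jar rather than at a separate poi jar.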

/HS

On 03/07/2012 06:08 PM, Nick Burch wrote:
> On Wed, 7 Mar 2012, Harry Simons wrote:
> > When converting a bunch of Microsoft Word documents using the command,
> >
> >    java -jar tika-app-1.1-SNAPSHOT.jar -v -t
> >
> > , I'm getting the following exception.
> >
> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 487
> >    at org.apache.poi.hwpf.sprm.SprmOperation.initSize(SprmOperation.java:174)
> >    at org.apache.poi.hwpf.sprm.SprmOperation.<init>(SprmOperation.java:80)
>
> This looks like a POI bug
>
> > Because these are internal business documents, I may not be able to share them with you guys so would greatly appreciate a fix or a workaround.
>
> That's going to make fixing it much trickier. You'll need to raise a POI bug, and be willing to do lots of investigating
>
> It may also be worth running the Binary File Format Validator <http://poi.apache.org/faq.html#faq-N10109> against the file, to check it's valid and not corrupted
>
> Nick
