On Wed, 15 Feb 2012, Van Tassell, Kristian wrote:
> Thanks for the idea to try it from tika-app. Having come from Solr, I was unaware of this (great) tool. I opened a vsd file I knew was parsing fine and it loaded fine, as expected. I then opened the problem file and finally saw the stack trace occurring. As you suspected, it looks more like POI is the offender.
Yup, looks like a POI bug. It seems that, for some reason, it's expecting to find a string that runs beyond the length of the chunk. Your best bet is to open a bug in the POI Bugzilla and upload the file.
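To illustrate the kind of failure I mean, here's a made-up sketch. It is not the actual POI code: the one-byte length prefix, the class name and both method names are assumptions purely for illustration. A reader that trusts the declared string length falls off the end of the chunk, while one that clamps the length to the bytes actually available does not.

```java
import java.nio.charset.StandardCharsets;

public class ChunkStringDemo {
    // Hypothetical layout: one length byte at `offset`, then that many ASCII bytes.
    // Trusts the declared length, so it fails when the chunk is shorter than claimed.
    static String readStringUnchecked(byte[] chunk, int offset) {
        int declared = chunk[offset] & 0xFF;
        byte[] raw = new byte[declared];
        // Throws an IndexOutOfBoundsException if the copy would run past the chunk.
        System.arraycopy(chunk, offset + 1, raw, 0, declared);
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // Same layout, but the declared length is clamped to the bytes that exist.
    static String readStringChecked(byte[] chunk, int offset) {
        int declared = chunk[offset] & 0xFF;
        int available = chunk.length - (offset + 1);
        int usable = Math.min(declared, available);
        return new String(chunk, offset + 1, usable, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The length byte claims 10 bytes, but only 5 follow it.
        byte[] chunk = {10, 'V', 'i', 's', 'i', 'o'};
        System.out.println("checked read: " + readStringChecked(chunk, 0));
        try {
            readStringUnchecked(chunk, 0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked read failed: " + e);
        }
    }
}
```

Whether clamping or failing loudly is the right behaviour for a real parser is a judgement call; the sketch only shows why a bad length field ends up as an exception.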
That said, I'm not sure why you're not getting the failure and exception when you call Tika from your code. That might be a second issue.
Nick
