On Thu, 24 Jun 2010, Mango wrote:
> From what I can tell, the rest of the .rtf files are parsed without a problem. Only those with embedded Visio diagrams cause problems. The weird thing is that I just tried parsing a .doc document with an embedded .vsd, and it was parsed without a problem.
One test would be to extract the Visio file and see if Tika can parse just that on its own?
> I did try to build Tika with POI 3.7 beta 1, but the compilation failed because of missing constructors and incompatible types, all in hsmf. I read the thread where you mention that some changes should be made in Tika's hsmf classes.
Yesterday I upgraded Tika to use POI 3.7 beta 1 and applied the outstanding patches to svn so that Tika works nicely with it. Try doing an svn up and rebuilding.
Nick
