I will try again later with the new version from svn. There seems to be a problem accessing ulisse.pin.unifi.it: I get a connection timeout during the build.
Interestingly, when I open the problem file and save it again (as .rtf), indexing works fine, even though the file still contains the embedded .vsd files. So it probably is a problem with the RTFParser after all. I will try again once I manage to run the new build. Is there anything I should try in the meantime? Were there documented problems with certain .rtf files before?

On Thu, Jun 24, 2010 at 1:30 PM, Nick Burch <[email protected]> wrote:
> On Thu, 24 Jun 2010, Mango wrote:
>>
>> From what I can tell, the rest of the .rtf files are parsed without a
>> problem. Only those with embedded Visio diagrams cause problems. The weird
>> thing is that I just tried parsing a .doc document with an embedded .vsd,
>> and it was parsed without a problem.
>
> One test would be to try extracting the Visio file, and see if Tika can
> parse just that?
>
>> I did try to build Tika with POI 3.7 beta 1, but the compilation failed
>> because of missing constructors and incompatible types, all in HSMF. I read
>> the thread where you mention that some changes should be made in Tika's
>> HSMF classes.
>
> Yesterday I upgraded Tika to use POI 3.7 beta 1, applying the outstanding
> patches to svn so that Tika works nicely with 3.7 beta 1. Try doing an svn
> up and rebuilding.
>
> Nick
>
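To follow up on Nick's suggestion of extracting the Visio file and parsing it on its own, here is a rough, stdlib-only Java sketch. It assumes the embedded objects are stored as hex text inside `{\*\objdata ...}` groups (raw `\bin` data is not handled) and that each object's OLE1 native data is preceded by a 4-byte little-endian length. The class name `RtfObjectExtractor` and the `embedded-N.ole2` output names are made up for illustration. It only cuts out the OLE2 container (the format .vsd files use) and does not interpret the Visio content itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: dumps OLE2 payloads embedded in an RTF file. */
public class RtfObjectExtractor {

    // Signature of an OLE2 compound document (the container format of .vsd files).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Hex-decodes the top-level content of every {\*\objdata ...} group. */
    public static List<byte[]> extractObjData(String rtf) {
        List<byte[]> result = new ArrayList<>();
        String marker = "\\objdata";
        int pos = 0;
        while ((pos = rtf.indexOf(marker, pos)) >= 0) {
            int i = pos + marker.length();
            int depth = 1;  // we start inside the group that holds \objdata
            int high = -1;  // pending high nibble, -1 if none
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            while (i < rtf.length() && depth > 0) {
                char c = rtf.charAt(i++);
                if (c == '{') {
                    depth++;  // nested groups are skipped
                } else if (c == '}') {
                    depth--;
                } else if (depth == 1) {
                    int d = Character.digit(c, 16);  // whitespace and newlines give -1
                    if (d < 0) continue;
                    if (high < 0) {
                        high = d;
                    } else {
                        out.write((high << 4) | d);
                        high = -1;
                    }
                }
            }
            result.add(out.toByteArray());
            pos = i;
        }
        return result;
    }

    /** Returns the offset of the OLE2 signature in data, or -1 if absent. */
    public static int findOle2Start(byte[] data) {
        outer:
        for (int i = 0; i + OLE2_MAGIC.length <= data.length; i++) {
            for (int j = 0; j < OLE2_MAGIC.length; j++) {
                if (data[i + j] != OLE2_MAGIC[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    /** Cuts the OLE2 payload out of an objdata blob, or returns null if there is none. */
    public static byte[] extractOle2(byte[] data) {
        int start = findOle2Start(data);
        if (start < 0) return null;
        int end = data.length;
        if (start >= 4) {
            // Assumption: OLE1 native data is preceded by its little-endian 32-bit length.
            int size = (data[start - 4] & 0xFF)
                    | (data[start - 3] & 0xFF) << 8
                    | (data[start - 2] & 0xFF) << 16
                    | (data[start - 1] & 0xFF) << 24;
            if (size > 0 && size <= data.length - start) end = start + size;
        }
        return Arrays.copyOfRange(data, start, end);
    }

    public static void main(String[] args) throws IOException {
        String rtf = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        List<byte[]> blobs = extractObjData(rtf);
        for (int n = 0; n < blobs.size(); n++) {
            byte[] ole2 = extractOle2(blobs.get(n));
            if (ole2 == null) continue;  // not an OLE2 payload (e.g. a picture)
            Files.write(Paths.get("embedded-" + n + ".ole2"), ole2);
            System.out.println("Wrote embedded-" + n + ".ole2 (" + ole2.length + " bytes)");
        }
    }
}
```

Each written file can then be run through Tika on its own (for example with the tika-app jar). If the extracted file parses cleanly, the problem is more likely in how the RTF handling deals with the embedded object than in the Visio parsing itself.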
