I'll try the new version from svn later. There seems to be a
problem accessing ulisse.pin.unifi.it: I get a connection timeout
during the build.

Interestingly, when I open the problem file and save it again
(still as .rtf), indexing works fine even though it still contains
the embedded .vsd files. So it probably is a problem with the
RTFParser after all. I will try again once I manage to run the new
build.

Is there anything I should try in the meantime? Have problems with
certain .rtf files been reported before?

On Thu, Jun 24, 2010 at 1:30 PM, Nick Burch <[email protected]> wrote:
> On Thu, 24 Jun 2010, Mango wrote:
>>
>> From what I can tell, the rest of the .rtf files are parsed without a
>> problem. Only those with embedded Visio diagrams cause problems. The weird
>> thing is that I just tried parsing a .doc document with an embedded .vsd and
>> it was parsed without a problem.
>
> One test would be to extract the Visio file and see whether Tika can
> parse it on its own.
>
>> I did try to build Tika with POI 3.7 beta 1, but compilation failed
>> because of missing constructors and incompatible types, all in HSMF. I read
>> the thread where you mentioned that some changes were needed in Tika's HSMF
>> classes.
>
> Yesterday I upgraded Tika to POI 3.7 beta 1 and applied the outstanding
> patches in svn so that Tika works properly with 3.7 beta 1. Try doing an
> svn up and rebuilding.
>
> Nick
>
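A rough way to try Nick's suggestion without opening Word or Visio is sketched below in Python. It is a minimal sketch under several assumptions: that each embedded drawing sits in an RTF {\*\objdata ...} group as plain hex, that the decoded bytes form an OLE1 EmbeddedObject whose 4-byte little-endian NativeDataSize field directly precedes an OLE2 compound file, and that the group contains no \bin binary data or nested groups. The script locates the compound-file signature (D0 CF 11 E0 A1 B1 1A E1) and writes each payload to its own file. The function names and output filenames (embedded_0.ole, ...) are hypothetical.

```python
import re
import sys

# Signature at the start of every OLE2 / Compound File Binary container.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def iter_objdata_hex(rtf_text):
    """Yield the raw text following each \\objdata control word, up to the next '}'.

    Assumption: the group holds only hex digits and whitespace (no \\bin data,
    no nested groups).
    """
    for match in re.finditer(r"\\objdata\b", rtf_text):
        start = match.end()
        end = rtf_text.find("}", start)
        yield rtf_text[start:end if end != -1 else len(rtf_text)]


def decode_hex(payload):
    """Drop whitespace and other non-hex characters, then decode to bytes."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", payload)
    if len(digits) % 2:
        digits = digits[:-1]  # ignore a dangling half-byte
    return bytes.fromhex(digits)


def extract_embedded_cfb(rtf_text):
    """Return the OLE2 payload of each embedded object that contains one."""
    blobs = []
    for payload in iter_objdata_hex(rtf_text):
        data = decode_hex(payload)
        offset = data.find(CFB_MAGIC)
        if offset == -1:
            continue  # not an OLE2-based object, or not recognised
        end = len(data)
        if offset >= 4:
            # Assumed OLE1 layout: NativeDataSize (4 bytes, little-endian)
            # sits immediately before the native data.
            size = int.from_bytes(data[offset - 4:offset], "little")
            if 0 < size <= len(data) - offset:
                end = offset + size
        blobs.append(data[offset:end])
    return blobs


def main(path):
    # RTF is 7-bit text in practice; latin-1 reads any byte without errors.
    with open(path, "r", encoding="latin-1") as fh:
        rtf_text = fh.read()
    for i, blob in enumerate(extract_embedded_cfb(rtf_text)):
        out = f"embedded_{i}.ole"  # hypothetical output name
        with open(out, "wb") as fh:
            fh.write(blob)
        print(f"wrote {out} ({len(blob)} bytes)")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage: save it under any name (extract_rtf_objects.py, say) and run it with the problem .rtf as the only argument, then feed each written file to the Tika app jar (for example with its --text option) to see whether the Visio payload on its own reproduces the failure. Taking the size from the OLE1 header, rather than reading to the end of the group, avoids appending the trailing presentation data that usually follows the native object.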
