Hello,
I'm looking at using Tika for extracting information from
office documents (among others) and using this information to build a
Lucene index. However, I need to be able to extract information from
OLE2 office docs and OOXML office docs. Looking at the website, there
is a comment that OOXML support is awaiting a 3.5 release; however, on the
ticket it looks like it is already working on the head. Is this the case? If
so then I'd be keen to take the head and 'give it a go' (reporting any
problems I find back to the dev group of course).
Thanks for your time.
Cheers,
Neil