Hi,

On 08.02.2013 00:55, Nick Burch wrote:
> One thing that I think might be interesting is seeing if you could write
> a shim to allow the ODFToolkit .ods support to implement the POI
> spreadsheet interfaces. If it could, that could open up quite a few more
> users, as people who have code to target .xls and .xlsx could then also
> output .ods with only a few lines of code changed

While talking to Yegor at ApacheCon Europe, I noticed that this might indeed be a huge opportunity. There is a lot of code out there that uses POI and could immediately benefit from .ods support.

I have no idea how complicated this will be (it's been a while since I last used POI), but I would definitely be interested in trying it. At first this could be part of the ODFToolkit and probably be moved to POI later.
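
To make the shim idea a bit more concrete, here is a minimal sketch of the adapter pattern it would rely on. Everything in it is a hypothetical stand-in: `SheetLike`, `RowLike` and `CellLike` only mimic the shape of spreadsheet interfaces that client code is written against, and `OdsTableStub` is an invented in-memory backend that addresses cells by (column, row). None of these are actual POI or ODFToolkit types, and a real shim would have to cover far more of the interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the spreadsheet interfaces that client code targets.
interface SheetLike {
    RowLike createRow(int rowIndex);
}

interface RowLike {
    CellLike createCell(int columnIndex);
}

interface CellLike {
    void setCellValue(String value);
    String getStringCellValue();
}

// Hypothetical stand-in for an .ods table backend that addresses cells by (column, row).
class OdsTableStub {
    private final Map<String, String> cells = new HashMap<>();

    void setText(int column, int row, String text) {
        cells.put(column + ":" + row, text);
    }

    String getText(int column, int row) {
        return cells.getOrDefault(column + ":" + row, "");
    }
}

// The shim: implements the client-facing interfaces by delegating to the backend.
class OdsSheetShim implements SheetLike {
    private final OdsTableStub table;

    OdsSheetShim(OdsTableStub table) {
        this.table = table;
    }

    @Override
    public RowLike createRow(int rowIndex) {
        // Rows and cells are thin views; all state lives in the backend table.
        return columnIndex -> new CellLike() {
            @Override
            public void setCellValue(String value) {
                // Translate (row, column) addressing into the backend's (column, row).
                table.setText(columnIndex, rowIndex, value);
            }

            @Override
            public String getStringCellValue() {
                return table.getText(columnIndex, rowIndex);
            }
        };
    }
}

class ShimDemo {
    // Client code that only knows the interfaces, as existing spreadsheet code would.
    static void writeHeader(SheetLike sheet) {
        RowLike row = sheet.createRow(0);
        row.createCell(0).setCellValue("Name");
        row.createCell(1).setCellValue("Amount");
    }

    public static void main(String[] args) {
        OdsTableStub table = new OdsTableStub();
        writeHeader(new OdsSheetShim(table));
        System.out.println(table.getText(1, 0));
    }
}
```

The point of the sketch is that `writeHeader` never learns which backend it writes to, which is why existing code would need only a few changed lines.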


> Another piece of integration that would be good is with Apache Tika.
> Tika currently has a slightly hacky piece of xml processing code to turn
> ODF files into XHTML. It's low memory and fast, but not easy to
> maintain. It'd be great to use some streaming support from ODFToolkit to
> make the Tika code easier and more feature-ful, but again I think there
> are some feature gaps. It should drive a lot more use though, if
> anyone's interested and able to spend a bit of time on it?

From my understanding of the project, this is far more difficult because there is currently no streaming support; all access is DOM-based. Judging from my experience with the Solr extractor, memory efficiency is a very important issue for Tika.

As the Tika API is rather simple (at least as far as I know it), it wouldn't be that hard to write a DOM-based version, but I doubt that this is the right approach.
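
For illustration, a DOM-based version could look roughly like the sketch below. It relies on the fact that Tika parsers report their output as SAX events to a `ContentHandler`. Everything else is an assumption: the JDK's own DOM parser stands in for the ODFToolkit document model, the class and method names are hypothetical, and the input is plain XML rather than a real ODF package. It also shows the memory concern: the whole tree is built before the first event is emitted.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika or the ODFToolkit.
class DomTextExtractor {
    // Loads the whole XML into a DOM tree, then replays its text nodes as SAX events.
    static void extract(String xml, ContentHandler handler) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        handler.startDocument();
        walk(doc.getDocumentElement(), handler);
        handler.endDocument();
    }

    // Depth-first walk in document order; only text nodes produce output.
    private static void walk(Node node, ContentHandler handler) throws SAXException {
        if (node.getNodeType() == Node.TEXT_NODE) {
            char[] text = node.getNodeValue().toCharArray();
            handler.characters(text, 0, text.length);
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, handler);
        }
    }

    // Convenience for trying it out: collects all character events into a string.
    static String extractToString(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            extract(xml, new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    out.append(ch, start, length);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract text", e);
        }
        return out.toString();
    }
}
```

Calling `DomTextExtractor.extractToString("<doc><p>Hello </p><p>ODF</p></doc>")` collects the text of both paragraphs in document order. A streaming implementation would emit the same events without ever holding the full tree in memory.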

Regards
Florian


--
Florian Hopf
Freelance Software Developer

http://blog.florian-hopf.de
