Hi,

On 08.02.2013 00:55, Nick Burch wrote:
> One thing that I think might be interesting is seeing if you could write a shim to allow the ODFToolkit .ods support to implement the POI spreadsheet interfaces. If it could, that could open up quite a few more users, as people who have code to target .xls and .xlsx could then also output .ods with only a few lines of code changed
While talking to Yegor at ApacheCon Europe, I noticed that this might indeed be a huge opportunity. There is lots of code out there that uses POI and could immediately benefit from .ods support.
I have no idea how complicated this will be (it's been a while since I have used POI), but I would definitely be interested in trying this. This could at first be part of the ODFToolkit and later possibly be moved to POI.
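To make the shim idea a bit more concrete, here is a rough sketch of the adapter shape it might take. Everything in it is an assumption for illustration: `SimpleSheet`, `SimpleRow` and `SimpleCell` are heavily reduced, hypothetical stand-ins for the much larger POI spreadsheet interfaces, and `OdfTableStandIn` is a hypothetical placeholder for whatever table model the ODFToolkit side would provide. None of these names are real POI or ODFToolkit API.

```java
import java.util.HashMap;
import java.util.Map;

class SpreadsheetShimSketch {

    // Hypothetical, heavily reduced stand-ins for POI-style interfaces.
    // The real POI interfaces are much larger; these names are NOT POI's API.
    interface SimpleCell {
        void setCellValue(String value);
        String getStringCellValue();
    }

    interface SimpleRow {
        SimpleCell createCell(int column);
    }

    interface SimpleSheet {
        SimpleRow createRow(int rowIndex);
    }

    // Hypothetical stand-in for an ODF-side table model; NOT the ODFToolkit API.
    static class OdfTableStandIn {
        private final Map<String, String> cells = new HashMap<>();

        void setText(int row, int column, String text) {
            cells.put(row + ":" + column, text);
        }

        String getText(int row, int column) {
            return cells.getOrDefault(row + ":" + column, "");
        }
    }

    // The shim: implements the POI-style interface and delegates every
    // call to the ODF-side model.
    static class OdsSheetShim implements SimpleSheet {
        private final OdfTableStandIn table;

        OdsSheetShim(OdfTableStandIn table) {
            this.table = table;
        }

        @Override
        public SimpleRow createRow(int rowIndex) {
            return column -> new SimpleCell() {
                @Override
                public void setCellValue(String value) {
                    table.setText(rowIndex, column, value);
                }

                @Override
                public String getStringCellValue() {
                    return table.getText(rowIndex, column);
                }
            };
        }
    }

    // Caller code written only against the interface, as existing
    // spreadsheet-generating code would be.
    static void fillHeader(SimpleSheet sheet) {
        SimpleRow row = sheet.createRow(0);
        row.createCell(0).setCellValue("Name");
        row.createCell(1).setCellValue("Total");
    }

    public static void main(String[] args) {
        OdfTableStandIn table = new OdfTableStandIn();
        fillHeader(new OdsSheetShim(table));
        System.out.println(table.getText(0, 0) + " | " + table.getText(0, 1));
    }
}
```

The point of the sketch is only that `fillHeader` never learns which backend it writes to, which is what would let existing code switch to .ods by changing the line that creates the sheet.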
> Another piece of integration that would be good is with Apache Tika. Tika currently has a slightly hacky piece of xml processing code to turn ODF files into XHTML. It's low memory and fast, but not easy to maintain. It'd be great to use some streaming support from ODFToolkit to make the Tika code easier and more feature-ful, but again I think there are some feature gaps. It should drive a lot more use though, if anyone's interested and able to spend a bit of time on it?
From my understanding of the project, this is far more difficult, as there is currently no streaming support; all access is DOM-based. Judging from my experience with the Solr extractor, memory efficiency is a very important issue for Tika.
As the Tika API is rather simple (at least as far as I know it), it wouldn't be that hard to write a DOM-based version, but I doubt that this is the right approach.
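For comparison, this is roughly what the streaming style looks like, as a minimal sketch using only the SAX parser that ships with the JDK. It is not Tika's actual code and uses no ODFToolkit API; the class name, the method name and the inline sample (standing in for the `content.xml` of an ODF package) are made up for illustration. Text is emitted as the parser walks the document, so no DOM tree is ever held in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class StreamingOdfTextSketch {

    // Namespace of the ODF text elements (text:p, text:span, ...).
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Tiny inline sample standing in for the content.xml of an ODF package.
    static final String SAMPLE =
        "<office:document-content"
        + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
        + " xmlns:text=\"" + TEXT_NS + "\">"
        + "<office:body><office:text>"
        + "<text:p>Hello</text:p>"
        + "<text:p>streaming <text:span>world</text:span></text:p>"
        + "</office:text></office:body>"
        + "</office:document-content>";

    /** Collects the character data of every text:p element, one line per paragraph. */
    static String extractParagraphs(InputStream contentXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        StringBuilder out = new StringBuilder();

        parser.parse(contentXml, new DefaultHandler() {
            private int depth = 0; // greater than 0 while inside a text:p element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (TEXT_NS.equals(uri) && "p".equals(localName)) {
                    depth++;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (depth > 0) {
                    out.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (TEXT_NS.equals(uri) && "p".equals(localName)) {
                    depth--;
                    if (depth == 0) {
                        out.append('\n'); // outermost paragraph closed
                    }
                }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
        System.out.print(extractParagraphs(in));
    }
}
```

A DOM-based version would produce the same text but would first build the whole tree, which is the memory cost that matters for large documents.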
Regards
Florian

--
Florian Hopf
Freelance Software Developer
http://blog.florian-hopf.de
