Tika fixed this issue [1] in Tika 1.0 [2]. However, we still need to keep an eye on Tika and on the memory-optimized streaming API for read-only, single-pass access.
Anyway, let's speed up the initial release. BTW: would anyone volunteer to do some preliminary work on the streaming API?

[1] https://issues.apache.org/jira/browse/TIKA-736
[2] http://tika.apache.org/1.0/index.html

2011/10/24 Devin Han <[email protected]>

> I saw this issue in Tika: "OpenOffice parser: master footer text isn't
> extracted" https://issues.apache.org/jira/browse/TIKA-736
>
> Tika's current ODF parser doesn't touch the styles part or embedded
> documents, only meta and content. The Tika developers are waiting for the
> first ODF Toolkit incubating release, and will then switch to a
> full-featured parser, much as they have for the POI-powered ones.
>
> The first release is coming and we will have no code updates before it. So
> I suggest we start discussing how to use ODF Toolkit to implement this,
> based on the snapshot.
>
> This feature concerns ODFDOM and the Simple ODF API. We already cover text
> extraction in the cookbook and demo, see:
>
> http://incubator.apache.org/odftoolkit/simple/document/cookbook/TextExtractor.html
> http://incubator.apache.org/odftoolkit/simple/demo/demo2.html
>
> The questions we need to answer:
> (1) What are Tika's detailed requirements?
> (2) Can the existing features of ODF Toolkit cover Tika's requirements?
> (3) How do we use ODF Toolkit to implement it?
>
> CC'ing the Tika dev list, in case people on that list are interested in
> this issue.
>
> --
> -Devin

--
-Devin
