Jukka Zitting wrote:
Hi,
On Fri, Mar 13, 2009 at 9:55 AM, steven wrote:
I wonder whether text extraction support for MS Office 2007 documents
is available yet?
Not currently, but there's a patch for that in
https://issues.apache.org/jira/browse/JCR-1887. In the sandbox we also
have an experimental generic text extractor component based on Apache
Tika that can extract text from a wide range of document formats,
including Office 2007. Both depend on the 3.5 beta releases from
Apache POI.
We will most likely include the JCR-1887 patch in Jackrabbit 1.6 and
aim to replace our custom text extractors with Apache Tika in
Jackrabbit 2.0.
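For readers curious about what an Office 2007 text extractor has to do: a .docx file is a ZIP package, and its body text sits in `<w:t>` elements inside `word/document.xml`. The sketch that follows is a minimal illustration using only the Python standard library. It is not the JCR-1887 patch and not the Tika or POI API. `extract_docx_text` and `make_sample_docx` are hypothetical names. It also ignores headers, footers, footnotes, comments and embedded objects, all of which a real extractor would handle.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by .docx files (Office Open XML).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_text(data: bytes) -> str:
    """Hypothetical helper: return plain text from the raw bytes of a .docx.

    Minimal sketch: reads only word/document.xml and joins the <w:t> text
    runs of each <w:p> paragraph, one paragraph per output line.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text is split across one or more <w:t> runs.
        runs = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def make_sample_docx() -> bytes:
    """Hypothetical helper: build a tiny in-memory ZIP that mimics a .docx.

    Only word/document.xml is included, so Word itself would not open it;
    it is just enough for extract_docx_text to read.
    """
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r>"
        '<w:r><w:t xml:space="preserve"> Office 2007</w:t></w:r></w:p>'
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        package.writestr("word/document.xml", xml)
    return buf.getvalue()


if __name__ == "__main__":
    print(extract_docx_text(make_sample_docx()))
```

A real extractor would read the file from a stream, use the package relationships instead of a hard-coded path, and cover the other Office 2007 formats (.xlsx, .pptx). That breadth is the reason for delegating the work to POI or Tika.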
I see, thanks for the clarification.
steven