Hi,

On Fri, Mar 13, 2009 at 9:55 AM, steven <[email protected]> wrote:
> Wonder if the text extracting support of MS Office 2007 documents is
> available already?

Not currently, but there's a patch for that in
https://issues.apache.org/jira/browse/JCR-1887. In the sandbox we also
have an experimental generic text extractor component based on Apache
Tika that can extract text from a wide range of document formats,
including Office 2007. Both depend on the 3.5 beta releases of
Apache POI.

We will most likely include the JCR-1887 patch in Jackrabbit 1.6 and
aim to replace our custom text extractors with Apache Tika in
Jackrabbit 2.0.

BR,

Jukka Zitting
