Re: Text extractor for MS Office 2007 documents

Jukka Zitting Fri, 13 Mar 2009 02:25:13 -0700

Hi,

On Fri, Mar 13, 2009 at 9:55 AM, steven <[email protected]> wrote:
> Wonder if the text extracting support of MS Office 2007 documents is
> available already?


Not currently, but there's a patch for that in
https://issues.apache.org/jira/browse/JCR-1887. In the sandbox we also
have an experimental generic text extractor component based on Apache
Tika that can extract text from a wide range of document formats,
including Office 2007. Both depend on the 3.5 beta releases of
Apache POI.

We will most likely include the JCR-1887 patch in Jackrabbit 1.6 and
aim to replace our custom text extractors with Apache Tika in
Jackrabbit 2.0.

BR,

Jukka Zitting
