Re: Text extractor for MS Office 2007 documents

steven Fri, 13 Mar 2009 02:44:16 -0700

Jukka Zitting wrote:

Hi,


On Fri, Mar 13, 2009 at 9:55 AM, steven <[email protected]> wrote:

I wonder whether text extraction support for MS Office 2007 documents is
available already?


Not currently, but there's a patch for that in
https://issues.apache.org/jira/browse/JCR-1887. In the sandbox we also
have an experimental generic text extractor component based on Apache
Tika that can extract text from a wide range of document formats,
including Office 2007. Both depend on the 3.5 beta releases from
Apache POI.

We will most likely include the JCR-1887 patch in Jackrabbit 1.6 and
aim to replace our custom text extractors with Apache Tika in
Jackrabbit 2.0.

I see, thanks for the clarification.

        steven
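
For readers who want a feel for what extracting text from an Office 2007 document involves, here is a minimal sketch using only the Java standard library. It is not the JCR-1887 patch and it does not use the POI or Tika APIs mentioned above. It relies only on the fact that a .docx file is a ZIP package whose main body is stored in word/document.xml, with visible text inside w:t elements. The class name DocxTextSketch and the helpers extractText and sampleDocx are hypothetical names chosen for this illustration, and the sketch ignores headers, footers, footnotes, tables layout and the other Office 2007 formats (.xlsx, .pptx).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; not part of Jackrabbit, POI, or Tika.
class DocxTextSketch {

    // Namespace of WordprocessingML elements such as w:p and w:t.
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text of a .docx stream, one line per paragraph,
    // or an empty string if the package has no word/document.xml part.
    static String extractText(InputStream docx)
            throws IOException, XMLStreamException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // Copy the entry into memory before parsing it.
                    ByteArrayOutputStream part = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        part.write(buffer, 0, n);
                    }
                    return parseBody(new ByteArrayInputStream(part.toByteArray()));
                }
            }
        }
        return "";
    }

    // Collects character data inside w:t and ends a line at each closing w:p.
    private static String parseBody(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        StringBuilder out = new StringBuilder();
        boolean inText = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (W_NS.equals(reader.getNamespaceURI())
                        && "t".equals(reader.getLocalName())) {
                    inText = true;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (W_NS.equals(reader.getNamespaceURI())) {
                    if ("t".equals(reader.getLocalName())) {
                        inText = false;
                    } else if ("p".equals(reader.getLocalName())) {
                        out.append('\n');
                    }
                }
            } else if (event == XMLStreamConstants.CHARACTERS && inText) {
                out.append(reader.getText());
            }
        }
        reader.close();
        return out.toString();
    }

    // Builds a tiny in-memory package containing only word/document.xml.
    // Enough for this sketch; a real .docx has more parts than this.
    static byte[] sampleDocx() throws IOException {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body><w:p>"
            + "<w:r><w:t>Hello </w:t></w:r><w:r><w:t>Office 2007</w:t></w:r>"
            + "</w:p></w:body></w:document>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }
}
```

Calling DocxTextSketch.extractText on a stream opened over a .docx file returns its paragraphs separated by newlines. For anything beyond a quick experiment, a maintained library such as Apache POI or Apache Tika is the more sensible choice, since they handle the many parts and formats this sketch skips.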
