Hi all,

 Has anyone successfully used Nutch to index Office 2007 documents? I know that
this question has already been asked, but considering the number of e-mails
asking the same question, it looks like Nutch does not support Office 2007
documents.

 Best,

 Adilson

On Wed, Dec 9, 2009 at 2:27 PM, Joe Bell <[email protected]> wrote:

> Hi,
>
>
>
> I'm also curious as to whether anyone has had success with Nutch and
> parsing Office 2007 documents (.pptx, .xlsx, .docx) - I get the same
> errors as seen here -
> http://old.nabble.com/How-to-successfully-crawl-and-index-office-2007-documents-in-Nutch-1.0-td26640949.html#a26640949
>
>
>
> Is a separate plugin required to parse these documents (i.e., will
> parse-msexcel, parse-mspowerpoint, etc. *not* work)?
>
>
>
> I noticed the comment on the above thread - "docx should be parsed, A
> plugin can be used to Parsed docx file. you get some help info from
> parse-html plugin and so on." - but didn't find it really helpful.
>
>
>
> Regards,
>
> Joe
