Nutch 1.0 and Office 2007 documents

2009-12-09 Thread Joe Bell
Hi, I'm also curious as to whether anyone has had success with Nutch and parsing Office 2007 documents (.pptx, .xlsx, .docx) - I get the same errors as seen here - http://old.nabble.com/How-to-successfully-crawl-and-index-office-2007-documents-in-Nutch-1.0-td26640949.html#a26640949 Is a

Nutch 1.0 ms-powerpoint plugin

2009-12-06 Thread Joe Bell
Hi - this is my first post to the Nutch mailing list, so please let me know if I commit any list protocol errors. I'm currently using Nutch 1.0 with the PowerPoint plugin enabled and can verify that Nutch is indeed pulling in the entire file for passing off to the parser (i.e., I've set the