Parsing .ppt, .xls, .rtf and .doc

nachonieto3 Thu, 29 Apr 2010 10:34:03 -0700

Hello everyone,

I'm using Nutch v0.9. I'm able to crawl, fetch and parse HTML and .pdf files.
With .ppt, .xls, .rtf and .doc files the crawl itself runs without errors, but
when I use SegmentReader to get the information for each URL, I don't find any
parse text for these formats. I have configured the plugins and enabled them.
This is the result I get for an .xls file:
http://n3.nabble.com/forum/FileDownload.jtp?type=n&id=765912&name=untitled2.bmp
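For context, enabling parser plugins in Nutch 0.9 is normally done by
overriding the plugin.includes property (a regular expression matched against
plugin directory names) in conf/nutch-site.xml. The fragment below is a minimal
sketch, assuming the stock parse-msword, parse-mspowerpoint, parse-msexcel and
parse-rtf plugins. The other entries in the value are illustrative and should
match whatever your existing configuration already enables.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Hypothetical nutch-site.xml override: adds the Office and RTF parsers
       next to the text/HTML/JS/PDF parsers. Entries outside parse-(...) are
       illustrative; keep the ones your own setup already uses. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html|js|pdf|msword|mspowerpoint|msexcel|rtf)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```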


Any suggestions about what I'm doing wrong? How can I check whether the
plugins are actually parsing these files?

Thank you in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Parsing-ppt-xls-rtf-and-doc-tp765912p765912.html
Sent from the Nutch - User mailing list archive at Nabble.com.