Hello everyone, I'm using Nutch v0.9. I'm able to crawl, fetch and parse HTML and .pdf files. When I try .ppt, .xls, .rtf and .doc files, the crawl runs without any errors, but when I use SegmentReader to get the information for each URL, I don't find any parse text for these formats. I have configured the plugins and enabled them. This is the result I get when I try with an .xls file: http://n3.nabble.com/forum/FileDownload.jtp?type=n&id=765912&name=untitled2.bmp
Any suggestions about what I'm doing wrong? How can I check whether the plugins are actually parsing? Thank you in advance.