I finally solved it. The problem was with the URLs I was trying to analyze:
I was trying to crawl and parse links that contain spaces, that is, links
like this one: http://nutch user/nutch.doc.
I solved the problem by changing some of the rules in the URL filter.
Thanks, by the way.
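
For anyone who hits the same thing: the post does not say which filter rules were changed, so the following is not that fix. It is only a minimal Python sketch (not Nutch code) of one general way to make a link with spaces well-formed, by percent-encoding the path before the URL is handed to a crawler. The function name encode_spaces and the example.com URL are made up for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_spaces(url: str) -> str:
    """Percent-encode spaces (and other unsafe characters) in the URL path.

    Hypothetical helper for illustration; it is not part of Nutch.
    """
    parts = urlsplit(url)
    # Keep "/" as the path separator and "%" so that sequences which are
    # already percent-encoded are not encoded a second time.
    safe_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(encode_spaces("http://example.com/my docs/nutch user.doc"))
```

Only the path is touched here; the host, query string and fragment are passed through unchanged, so a space in the host name (as in the example link above) would still need to be dealt with separately.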
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Parsing-ppt-xls-rtf-and-doc-tp765912p776476.html
Sent from the Nutch - User mailing list archive at Nabble.com.
