I finally solved it. The problem was with the URLs I was trying to analyze: I was trying to crawl and parse links that contain spaces, for example http://nutch user/nutch.doc. I fixed it by changing some settings in the URL filter. Thanks, by the way.
--
View this message in context: http://lucene.472066.n3.nabble.com/Parsing-ppt-xls-rtf-and-doc-tp765912p776476.html
Sent from the Nutch - User mailing list archive at Nabble.com.
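For anyone who hits the same issue: the message above does not say which URL filter settings were changed, so the following is only a rough sketch of why links with spaces are troublesome and one way to percent-encode them in plain Java. The class name, the method names, and the host example.com are hypothetical and are not part of Nutch.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch: shows how a space in a URL path
// is rejected by a strict parser and how it can be percent-encoded.
class SpaceUrlSketch {

    // Returns true if the raw string parses as a URI without any escaping.
    static boolean isStrictlyValid(String raw) {
        try {
            new URI(raw);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Builds a URI from its parts; the multi-argument constructor
    // percent-encodes characters that are illegal in a path, such as spaces.
    static String encodePath(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI", e);
        }
    }

    public static void main(String[] args) {
        // A raw link with a space fails strict parsing.
        System.out.println(isStrictlyValid("http://example.com/nutch user/nutch.doc"));
        // The same path after encoding.
        System.out.println(encodePath("http", "example.com", "/nutch user/nutch.doc"));
    }
}
```

The encoded form (with %20 in place of the space) is what a strict URL parser accepts. Whether the crawler then keeps such a link still depends on the URL filter and normalizer configuration in use.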
- Parsing .ppt, .xls, .rtf and .doc nachonieto3