Hello,
 
I am a new Nutch user.
I have configured my nutch-site.xml to index several file formats from my
local file system:
<value>nutch-extensionpoints|protocol-file|urlfilter-regex|parse-(msword|xml|text|html|js|pdf)|index-basic|query-(basic|site|url)</value>
When I test with a ".txt" file, it works,
but when I crawl a ".pdf" or ".doc" file, the fetch fails with this
error:
fetch of file:///C:/doc/test.pdf failed with: java.lang.Exception: org.apache.nutch.protocol.file.FileError: File Error: 404
 
Could someone help me?
Aïcha 
