Hi,

I am trying to configure a recent Nutch (0.8+) to fetch directly from the file system instead of over HTTP, which is fairly slow. The fetcher hits a 404 (file not found) error; see the log below. When I copy the file:/// URL into lynx, the file is found without any problems.

2006-09-15 10:29:57,739 INFO fetcher.Fetcher - fetching file:///mnt/smbfs/hollywood/projects/Telstra/Keystone\ -\ Leapfrog/Keystone/Architecture/Archives/info.txt
2006-09-15 10:29:57,746 INFO fetcher.Fetcher - fetch of file:///mnt/smbfs/hollywood/projects/Telstra/Keystone\ -\ Leapfrog/Keystone/Architecture/Archives/info.txt failed with: org.apache.nutch.protocol.file.FileError: File Error: 404

Has anybody had a similar problem or, better, found a resolution?

Cheers,
Bruno
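
P.S. One thing I have not ruled out is that the backslashes before the spaces in the logged URL (they look like shell-style escaping) are being taken as literal characters in the path. Below is a minimal sketch of how the URL could be rebuilt with percent-encoded spaces instead. The class and method names (FileUrlCheck, unescapeShell, toFileUrl) are made up for illustration, and whether Nutch's file protocol plugin wants the URL in this form is an assumption on my part, not something I have confirmed.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch: rebuilds a file: URL from a
// path that contains shell-style escaped spaces.
public class FileUrlCheck {

    // Turn shell-style "\ " sequences back into plain spaces.
    static String unescapeShell(String path) {
        return path.replace("\\ ", " ");
    }

    // Build a file: URL from an absolute path. The multi-argument URI
    // constructor percent-encodes characters that are illegal in a URI
    // path, so a space becomes %20.
    static String toFileUrl(String absolutePath) {
        try {
            return new URI("file", null, absolutePath, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable path: " + absolutePath, e);
        }
    }

    public static void main(String[] args) {
        // Path exactly as it appears in the fetcher log, backslashes included.
        String logged = "/mnt/smbfs/hollywood/projects/Telstra/Keystone\\ -\\ Leapfrog"
                + "/Keystone/Architecture/Archives/info.txt";

        String plain = unescapeShell(logged);
        System.out.println("plain path: " + plain);
        System.out.println("file URL:   " + toFileUrl(plain));
    }
}
```

If the percent-encoded URL fetches fine while the backslash version still gives the 404, that would point at the escaping in my URL list rather than at the smbfs mount.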
