Paolo Mazzoni wrote:
Following the earlier error (see my post "Local filesystem crawl problem"
for the config files), I now get this error:
...
fetching file:///cygdrive/c/Temp
org.apache.nutch.protocol.file.FileError: File Error: 404
    at org.apache.nutch.protocol.file.File.getProtocolOutput(File.java:100)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)
fetch of file:///cygdrive/c/Temp failed with: org.apache.nutch.protocol.file.FileError: File Error: 404
Fetcher: done
....
I configured a text file, urls/urls.tct, to crawl the directory
file:///cygdrive/c/Temp
The folder exists, but the crawler does not seem to find it. I expected
the crawler to find the files inside it and fetch them, but it doesn't.
This is not a real path, but a virtual mount point under Cygwin. Java is
completely unaware of the Cygwin layer and uses the Windows file system
API directly. You should change your seed URL to this:
file:///c:/Temp
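
If there are many seed URLs to fix, the same rewrite can be done mechanically. The following is only a rough sketch: the class name CygwinSeedUrl and the method toWindowsFileUrl are hypothetical helpers, not part of Nutch, and it assumes the default /cygdrive prefix followed by a single drive letter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Nutch): rewrites a Cygwin-style file URL
// into the Windows-style form that the JVM can resolve.
public class CygwinSeedUrl {

    // Assumes the default Cygwin prefix: file:///cygdrive/<drive-letter>[/<rest>]
    private static final Pattern CYGDRIVE =
            Pattern.compile("^file:///cygdrive/([A-Za-z])(/.*)?$");

    public static String toWindowsFileUrl(String url) {
        Matcher m = CYGDRIVE.matcher(url);
        if (!m.matches()) {
            // Not a /cygdrive URL: leave it untouched.
            return url;
        }
        String drive = m.group(1);
        // A bare drive such as file:///cygdrive/c maps to the drive root.
        String rest = (m.group(2) == null) ? "/" : m.group(2);
        return "file:///" + drive + ":" + rest;
    }

    public static void main(String[] args) {
        System.out.println(toWindowsFileUrl("file:///cygdrive/c/Temp"));
    }
}
```

URLs that do not start with file:///cygdrive/ are returned unchanged, so the helper could be run over a whole seed list without touching http or already-correct file URLs.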
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com