Paolo Mazzoni wrote:
Following up on my earlier error (see the post "Local filesystem crawl problem" for the config files),
I now get this error:

...
fetching file:///cygdrive/c/Temp
org.apache.nutch.protocol.file.FileError: File Error: 404
        at org.apache.nutch.protocol.file.File.getProtocolOutput(File.java:100)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)
fetch of file:///cygdrive/c/Temp failed with: org.apache.nutch.protocol.file.FileError: File Error: 404
Fetcher: done
....

In the text file urls/urls.tct I configured the crawl to start at the directory file:///cygdrive/c/Temp

The folder exists, but the crawler doesn't seem to find it. I expected the crawler to find the files inside it
and fetch them, but it doesn't.

/cygdrive/c/Temp is not a real path, but a virtual mount point under Cygwin. Java is completely unaware of the Cygwin layer, and uses your Windows file system API. You should change your seed URL to this:

        file:///c:/Temp
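
If you have many seed URLs written the Cygwin way, the rewrite can be scripted. The following is only a minimal sketch: the class name CygwinSeedUrl and the method toWindowsFileUrl are made up for illustration and are not part of Nutch, and it assumes the default /cygdrive prefix is in use.

```java
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: rewrites Cygwin-style file URLs
// into the Windows form that the JVM can actually resolve.
public class CygwinSeedUrl {

    // Matches file:///cygdrive/<drive letter> followed by an optional path.
    // Assumes the default Cygwin "/cygdrive" prefix.
    private static final Pattern CYGDRIVE =
            Pattern.compile("^file:///cygdrive/([A-Za-z])(/.*)?$");

    public static String toWindowsFileUrl(String url) {
        Matcher m = CYGDRIVE.matcher(url);
        if (!m.matches()) {
            return url; // not a /cygdrive URL, leave it unchanged
        }
        String rest = (m.group(2) == null) ? "/" : m.group(2);
        return "file:///" + m.group(1) + ":" + rest;
    }

    public static void main(String[] args) {
        System.out.println(toWindowsFileUrl("file:///cygdrive/c/Temp"));
        // prints: file:///c:/Temp

        // The JVM asks the operating system directly, so the Cygwin mount
        // point is just an ordinary (and usually nonexistent) path to it.
        System.out.println(new File("/cygdrive/c/Temp").exists());
    }
}
```

URLs that do not start with file:///cygdrive/ are returned unchanged, so the helper can be run over a whole seed list.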


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
