The most common problem is not setting the agent name in the nutch-site.xml file. First, check the log files for the task to see whether any errors are occurring. It would also help to see more of your configuration for crawl-urlfilter and nutch-site.
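
As a rough sketch of those two checks: the snippet below looks for a non-empty http.agent.name value in nutch-site.xml and then scans the logs for errors. The helper name agent_name_set, the conf/nutch-site.xml path, and the logs/ directory are assumptions for illustration, so adjust them to your installation. It also assumes the <name> and <value> elements sit on adjacent lines, as they do in the stock configuration files.

```shell
# Hypothetical helper: succeeds if http.agent.name has a non-empty value.
# Assumes <name> and <value> are on adjacent lines in the config file.
agent_name_set() {
  grep -A 2 '<name>http.agent.name</name>' "$1" 2>/dev/null | grep -q '<value>[^<]'
}

conf_file="conf/nutch-site.xml"   # assumed path, relative to the Nutch install directory
if agent_name_set "$conf_file"; then
  echo "http.agent.name is set"
else
  echo "http.agent.name is missing or empty in $conf_file"
fi

# Show the most recent lines mentioning errors or exceptions
# (logs/ is an assumed location; task logs may live elsewhere on your cluster).
grep -ri -E 'error|exception' logs/ 2>/dev/null | tail -n 20
```

If the first check reports a missing or empty value, add an http.agent.name property to nutch-site.xml and rerun the crawl.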

Dennis

Volkan Ebil wrote:
Hi,

I have set up Nutch and Hadoop successfully.

start.sh and stop.sh run without problems.

I created a directory named urls containing a text file with the seed URLs.

Then I ran the command
bin/hadoop dfs -put urls urls

It works. I checked the listing with the command

bin/hadoop dfs -ls
After that I edited crawl-urlfilter.txt, nutch-site.xml, hadoop-site.xml
and the other configuration files.

Finally I ran the bin/nutch crawl command, but it gives the error

"No urls to fetch check your filter and seed list"

I have inspected the contents of the webdb with the readdb -stats command.
There is no problem with generate or inject.

I am sure there is no problem in crawl-urlfilter or the other
configuration XML files.

Does anyone know what the problem might be?

Thanks in advance.

