Hello,

The problem is partially solved, but I'm writing it up anyway :)

Using the individual bin/nutch commands (inject, generate, fetch, etc.) works.

Only bin/nutch crawl does not.
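
For reference, the step-by-step sequence that works looks roughly like this (the seed dir name 'urls' and the exact segment path are placeholders from my setup, not anything special):

bin/nutch inject crawled/crawldb urls                          # 'urls' = seed list dir on DFS (placeholder name)
bin/nutch generate crawled/crawldb crawled/segments
bin/nutch fetch crawled/segments/&lt;segment&gt;                     # &lt;segment&gt; = the timestamped dir generate created
bin/nutch updatedb crawled/crawldb crawled/segments/&lt;segment&gt;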

--------------------------------------------------------------------------
I have successfully set up a Hadoop cluster on 6 nodes (1 namenode + jobtracker, 5 datanodes).

Everything looks OK: putting files to DFS, watching jobs from the jobtracker, etc.
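
For example, a typical round-trip works fine, roughly like this (the paths are just placeholders from my setup):

bin/hadoop dfs -put urls urls        # copy a local dir (e.g. the seed list) to DFS
bin/hadoop dfs -ls urls              # verify it arrived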

But when running a crawl I get:
Generator: 0 records selected for fetching, exiting ...
2009-05-14 08:13:44,420 INFO crawl.Crawl - Stopping at depth=0 - no more URLs to fetch.
2009-05-14 08:13:44,420 WARN crawl.Crawl - No URLs to fetch - check your seed list and URL filters.
2009-05-14 08:13:44,420 INFO  crawl.Crawl - crawl finished: crawled


But everything should be OK; the same seed list and crawl-urlfilter work fine on a local box (it's only 1 URL).
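
For what it's worth, the filter side is the usual setup: the seed file contains the single URL and crawl-urlfilter.txt has a matching accept rule, roughly like this (example.com stands in for the real domain):

# crawl-urlfilter.txt - example.com is a placeholder for the actual seed domain
+^http://([a-z0-9]*\.)*example.com/
# skip everything else
-.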

One thing I noticed is that I can't find the segments directory.

Generator: starting
Generator: segment: crawled/segments/20090514081315

The job is successful:
generate: select crawled/segments/20090514081315

But it's not on DFS:
/nutch/search$ bin/hadoop dfs -ls crawled
Found 1 items
drwxr-xr-x - nutch supergroup 0 2009-05-14 08:13 /user/nutch/crawled/crawldb

I am using the default Nutch installation with the crawl command and the Nutch Hadoop tutorial from the wiki.
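
Concretely, the crawl invocation follows the tutorial, roughly like this (the seed dir name and the depth/topN values are just my setup):

bin/nutch crawl urls -dir crawled -depth 3 -topN 50    # 'urls' and the numbers are placeholders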



Thanks,
Bartosz


