Hello,
The problem is partially solved, but I'm still writing it up :)
Running the individual bin/nutch commands (inject, generate, fetch, etc.) works fine.
Only bin/nutch crawl does not.
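For reference, the step-by-step sequence that works looks roughly like this (the paths and the <segment> name are only illustrative):

bin/nutch inject crawled/crawldb urls
bin/nutch generate crawled/crawldb crawled/segments
bin/nutch fetch crawled/segments/<segment>
bin/nutch updatedb crawled/crawldb crawled/segments/<segment>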
--------------------------------------------------------------------------
I have successfully set up a Hadoop cluster on 6 nodes (1
namenode+jobtracker, 5 datanodes).
Everything looks OK: putting files into DFS works, jobs show up on the jobtracker, etc.
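The crawl is launched with something like this (depth and topN are just the values I happen to be testing with):

bin/nutch crawl urls -dir crawled -depth 3 -topN 50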
But when I run it I get:
Generator: 0 records selected for fetching, exiting ...
2009-05-14 08:13:44,420 INFO crawl.Crawl - Stopping at depth=0 - no more URLs to fetch.
2009-05-14 08:13:44,420 WARN crawl.Crawl - No URLs to fetch - check your seed list and URL filters.
2009-05-14 08:13:44,420 INFO crawl.Crawl - crawl finished: crawled
But everything should be OK: the same seed list and crawl-urlfilter work fine on my local box (the seed list contains only 1 URL).
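The filter itself is nothing unusual, just a single allow rule along these lines (example.com stands in for the real host):

+^http://([a-z0-9]*\.)*example.com/

and the seed file contains the corresponding URL, e.g. http://www.example.com/.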
One thing I noticed is that I can't find the segments dir. The generator reports:
Generator: starting
Generator: segment: crawled/segments/20090514081315
The job is successful.
generate: select crawled/segments/20090514081315
But it's not on DFS:
/nutch/search$ bin/hadoop dfs -ls crawled
Found 1 items
drwxr-xr-x - nutch supergroup 0 2009-05-14 08:13 /user/nutch/crawled/crawldb
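Since the warning mentions the seed list, I suppose it can also be double-checked directly on DFS with something like this (urls/seed.txt is just an illustrative path):

bin/hadoop dfs -ls urls
bin/hadoop dfs -cat urls/seed.txt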
I am using a default Nutch installation with the crawl command, following the Nutch/Hadoop tutorial from the wiki.
Thanks,
Bartosz