Hello,
The problem is partially solved, but I'm still writing it up :)
Running the individual bin/nutch commands (inject, generate, fetch, etc.) works fine.
Only bin/nutch crawl does not.
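For reference, the step-by-step sequence that works looks roughly like this (the paths and the <segment> name are only illustrative):

bin/nutch inject crawled/crawldb urls
bin/nutch generate crawled/crawldb crawled/segments
bin/nutch fetch crawled/segments/<segment>
bin/nutch updatedb crawled/crawldb crawled/segments/<segment>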
--------------------------------------------------------------------------
I have successfully set up a Hadoop cluster on 6 nodes (1
namenode+jobtracker, 5 datanodes).
Everything looks OK: putting files into DFS works, jobs show up on the jobtracker, etc.
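The crawl is launched with something like this (depth and topN are just the values I happen to be testing with):

bin/nutch crawl urls -dir crawled -depth 3 -topN 50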
But when I run it I get:
Generator: 0 records selected for fetching, exiting ...
2009-05-14 08:13:44,420 INFO crawl.Crawl - Stopping at depth=0 - no more URLs to fetch.
2009-05-14 08:13:44,420 WARN crawl.Crawl - No URLs to fetch - check your seed list and URL filters.
2009-05-14 08:13:44,420 INFO crawl.Crawl - crawl finished: crawled
But everything should be OK: the same seed list and crawl-urlfilter work fine on my local box (the seed list contains only 1 URL).
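The filter itself is nothing unusual, just a single allow rule along these lines (example.com stands in for the real host):

+^http://([a-z0-9]*\.)*example.com/

and the seed file contains the corresponding URL, e.g. http://www.example.com/.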
One thing I noticed is that I can't find the segments dir. The generator reports:
Generator: starting
Generator: segment: crawled/segments/20090514081315
The job is successful.
generate: select crawled/segments/20090514081315
But it's not on DFS:
/nutch/search$ bin/hadoop dfs -ls crawled
Found 1 items
drwxr-xr-x - nutch supergroup 0 2009-05-14 08:13 /user/nutch/crawled/crawldb
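Since the warning mentions the seed list, I suppose it can also be double-checked directly on DFS with something like this (urls/seed.txt is just an illustrative path):

bin/hadoop dfs -ls urls
bin/hadoop dfs -cat urls/seed.txt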
I am using a default Nutch installation with the crawl command, following the Nutch/Hadoop tutorial from the wiki.
Thanks,
Bartosz