Andrzej Bialecki wrote:
Uroš Gruber wrote:
Hi,

I've made some changes to CrawlDbReader so that it can read the fetchlist produced by the generate command. At first I thought the problem was in my script, because some of the injected URLs were missing. Then I tested with only 6 URLs. I manually checked the files produced by inject and by generate, and generate put only 3 URLs in the fetch list.

I don't quite understand this. As far as I understand it, the generate command collects URLs from the crawldb, sorts them by score, and writes them to the crawl_generate directory.
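To make that understanding concrete, here is a rough stand-alone sketch of the behaviour I expected. It is only an illustration, not the actual Generator code: the Entry class, the generate method and the topN parameter are hypothetical names, and it ignores anything else the real job may take into account.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GenerateSketch {
    // Hypothetical stand-in for a crawldb record: a URL and its score.
    static final class Entry {
        final String url;
        final float score;

        Entry(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    // Sort a copy of the entries by descending score and keep at most topN URLs.
    static List<String> generate(List<Entry> crawlDb, int topN) {
        List<Entry> sorted = new ArrayList<>(crawlDb);
        sorted.sort(Comparator.comparingDouble((Entry e) -> e.score).reversed());
        List<String> fetchList = new ArrayList<>();
        for (Entry e : sorted) {
            if (fetchList.size() >= topN) {
                break;
            }
            fetchList.add(e.url);
        }
        return fetchList;
    }

    public static void main(String[] args) {
        // Six made-up URLs with made-up scores, mirroring the size of my test.
        List<Entry> db = new ArrayList<>();
        for (int i = 1; i <= 6; i++) {
            db.add(new Entry("http://example.com/page" + i, i * 0.1f));
        }
        System.out.println(generate(db, 6).size() + " URLs in fetch list");
    }
}
```

Under these assumptions, six injected URLs and a limit of six or more would give six URLs in the fetch list, which is why getting only 3 surprised me.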

Are you running in local mode, or in map-reduce mode with several tasktrackers? What is the number of reduce tasks in this "generate" job?

I'm running in local mode, with mapred.reduce.tasks at its default (1) and 2 map tasks.

regards

Uros
