Andrzej Bialecki wrote:
Uroš Gruber wrote:
Hi,
I've made some changes in CrawlDbReader to read from the fetchlist made
by the generate command. At first I thought I had a problem with this
script, because some URLs from inject were missing. Then I tested with
only 6 URLs. I manually checked the files produced by inject and by
generate, and generate put only 3 URLs in the fetch list.
I don't quite understand this. As far as I understand the generate
command, it collects URLs from the crawldb, sorts them by score, and
puts them into the crawl_generate directory.
Are you running in local mode, or in map-reduce mode with several
tasktrackers? What is the number of reduce tasks in this "generate" job?
I'm running in local mode, with the default mapred.reduce.tasks (1) and
2 map tasks.
regards
Uros