Uroš Gruber wrote:
> Hi,
> I've made some changes to CrawlDbReader so that it can read the fetchlist
> produced by the generate command. At first I thought the problem was in
> my script, because some URLs from inject were missing. Then I tested with
> only 6 URLs. I manually checked the files produced by inject and by
> generate, and generate put only 3 URLs in the fetchlist.
> I don't quite understand this. As far as I understand the generate
> command, it collects URLs from the crawldb, sorts them by score and puts
> them into the crawl_generate directory.

Are you running in local mode, or in map-reduce mode with several
tasktrackers? What is the number of reduce tasks in this "generate" job?
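To show why the number of reduce tasks matters, here is a toy Python model. It is not Nutch's Generator code. It assumes that generate sorts the crawldb entries by score, keeps the top N, and spreads them over one output file per reduce task (`part-00000`, `part-00001`, ...), partitioned by host. The function `generate`, its `top_n` and `num_reduces` parameters, and the example URLs are all hypothetical. With more than one reduce task, a reader that opens only one part file sees just a subset of the generated URLs.

```python
import zlib
from urllib.parse import urlparse


def generate(crawldb, top_n, num_reduces):
    """Hypothetical model of a generate step (not the real Nutch API).

    Sorts URLs by score (highest first), keeps the top_n, then assigns each
    URL to one of num_reduces partitions by hashing its host name. Each
    partition stands in for one part-NNNNN output file of a reduce task.
    """
    selected = sorted(crawldb.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    parts = [[] for _ in range(num_reduces)]
    for url, _score in selected:
        host = urlparse(url).hostname or ""
        # crc32 gives a stable hash across runs (Python's hash() is salted).
        parts[zlib.crc32(host.encode("utf-8")) % num_reduces].append(url)
    return parts


# Six example URLs with distinct (made-up) scores.
crawldb = {
    "http://a.example.com/": 1.0,
    "http://b.example.com/": 0.9,
    "http://c.example.com/": 0.8,
    "http://d.example.com/": 0.7,
    "http://e.example.com/": 0.6,
    "http://f.example.com/": 0.5,
}

parts = generate(crawldb, top_n=10, num_reduces=2)
for i, part in enumerate(parts):
    print(f"part-{i:05d}: {len(part)} urls")

# Only the union of all part files contains the full fetchlist.
all_urls = [url for part in parts for url in part]
print(f"total: {len(all_urls)} urls")
```

With `num_reduces=1` everything lands in a single partition, so in this model reading one file is enough only when the job ran with one reduce task.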
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com