Uroš Gruber wrote:
Hi,

I've made some changes in CrawlDbReader to read from the fetchlist made by the generate command. At first I thought there was a problem with this script, because some URLs from inject were missing. Then I tested with only 6 URLs. I manually checked the files produced by inject and by generate, and generate put only 3 URLs in the fetchlist.

I don't quite understand this. As far as I understand the generate command, it collects URLs from the crawldb, sorts them by score and puts them into the crawl_generate directory.

Are you running in local mode, or in map-reduce mode with several tasktrackers? What is the number of reduce tasks in this "generate" job?
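One reason the number of reduce tasks can matter: in map-reduce, each reduce task writes its own output file (part-00000, part-00001, ...), so a reader that opens only one of them sees only a subset of the generated URLs. The sketch below is a simplified, hypothetical model of the generate step as described above (sort by score, keep the top N, split across reduce tasks). It is not Nutch's actual Generator code; the function name `generate`, the `top_n` parameter and the CRC32 partitioner are assumptions for illustration only.

```python
import zlib
from collections import defaultdict


def generate(crawldb, top_n, num_reduce_tasks):
    """Hypothetical model of a generate step (not Nutch's real implementation).

    crawldb: dict mapping URL -> score.
    Returns one list of URLs per reduce task, like part-00000, part-00001, ...
    """
    # Sort by score, highest first; ties broken by URL so the result is deterministic.
    selected = sorted(crawldb.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

    parts = defaultdict(list)
    for url, _score in selected:
        # Hypothetical partitioner: stable hash of the URL modulo the reducer count.
        part = zlib.crc32(url.encode("utf-8")) % num_reduce_tasks
        parts[part].append(url)

    # Every reduce task gets an entry, even if its partition is empty.
    return [parts[i] for i in range(num_reduce_tasks)]


if __name__ == "__main__":
    db = {f"http://example.com/{i}": float(i) for i in range(6)}
    parts = generate(db, top_n=6, num_reduce_tasks=2)
    for i, urls in enumerate(parts):
        print(f"part-{i:05d}: {len(urls)} URLs")
    print("total:", sum(len(p) for p in parts))
```

With more than one reduce task, all 6 URLs are still generated, but they are spread over several part files, so counting the URLs in a single file can undercount the fetchlist.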

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

