AJ Chen wrote:
Any idea why Nutch (0.9-dev) does not try to fetch every URL generated?
For example, if the Generator generates 200,000 URLs, maybe fewer than
100,000 will be fetched, whether they succeed or fail. This is a big
difference, and it is obvious from checking the number of URLs in the
log or running readseg -list. What causes such a large number of URLs
to be thrown out by the Fetcher?

Please see rev. 469660 (trunk) and rev. 469667 (branch-0.8) for a
possible fix.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com