(Please don't cross-post to multiple lists)

Emmanuel wrote:
I've been through the code of the CrawlDbReader class. I discovered the
method "processTopNJob", which uses the classes CrawlDbTopNMapper and
CrawlDbTopNReducer.
I'm wondering why we have this function. Is it an old implementation that
was used before the Generator to get the TopN links to fetch, or is it
something else?
I would appreciate it if you could give me your thoughts.

It's not an old method; it's in use. See the synopsis in CrawlDbReader.main(). The purpose of this option is to dump the top-scoring URLs together with their scores. This is useful for monitoring the CrawlDb for potential scoring problems.
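
To illustrate the idea of picking the top-scoring URLs, here is a minimal sketch in plain Java. It is not the Nutch implementation: the class TopNSketch, the method topN, and the in-memory map of URL scores are all hypothetical, and the real job runs as a MapReduce job over the CrawlDb rather than over a map in memory.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Nutch: selects the n highest-scoring URLs.
class TopNSketch {

    // Assumption: scores maps each URL to a non-null score.
    static List<Map.Entry<String, Float>> topN(Map<String, Float> scores, int n) {
        List<Map.Entry<String, Float>> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap ordered by score: the lowest-scoring kept entry is at the head,
        // so it is the one evicted when the heap grows beyond n entries.
        PriorityQueue<Map.Entry<String, Float>> heap =
            new PriorityQueue<>(Map.Entry.<String, Float>comparingByValue());
        for (Map.Entry<String, Float> e : scores.entrySet()) {
            heap.offer(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            if (heap.size() > n) {
                heap.poll();
            }
        }
        result.addAll(heap);
        // Highest score first, as one would want in a dump of top-scoring URLs.
        result.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
        return result;
    }
}
```

Calling TopNSketch.topN(scores, 10) on such a map would return at most ten entries, ordered from the highest score down, which is the kind of listing that makes scoring anomalies easy to spot.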


I also found a class that is not used: "CrawlDbDumpReducer" is defined
but never used or instantiated.
Don't you think we can remove it from the source code?


Yes, we can remove this class. It's equivalent to IdentityReducer, which this job uses implicitly. The class is a leftover from the time when it also contained some filtering code.
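
For readers unfamiliar with the term, an identity reducer simply passes every value through unchanged under its key. A minimal sketch follows, deliberately written without the Hadoop API: the class IdentityReduceSketch and the BiConsumer used as the output collector are hypothetical stand-ins, not the actual IdentityReducer signature.

```java
import java.util.function.BiConsumer;

// Hypothetical stand-in for a reducer; it does not use the Hadoop API.
class IdentityReduceSketch {

    // Emits each (key, value) pair unchanged. A reducer that does only this
    // adds nothing over the framework's identity reducer.
    static <K, V> void reduce(K key, Iterable<V> values, BiConsumer<K, V> output) {
        for (V value : values) {
            output.accept(key, value);
        }
    }
}
```

A class whose reduce step looks like this, with no filtering left in it, can be dropped without changing the job's output.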


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
