Hi all,

I am trying to better understand the counters and logging of the fetch
MapReduce job that is executed when crawling.

When looking at the job counters in the MapReduce web UI, I note the
following counters and values:

*Map input records             162,080*
moved                              345
robots_denied                    4,441
robots_denied_maxcrawldelay        259
*hitByTimeLimit                  7,493*
exception                        3,801
notmodified                      2
gone                                48
access_denied                        1
*success                        93,583*
temp_moved                       3,068
notfound                         1,490

Summing all of the status counters gives 114,531, which does not equal the
total map input of 162,080.
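As a sanity check, the counter arithmetic can be reproduced with a short Python sketch. The values are copied from the counter list above; the variable names are only illustrative and are not part of Nutch or Hadoop.

```python
# Fetch status counters as shown in the MapReduce web UI (values from this post).
status_counters = {
    "moved": 345,
    "robots_denied": 4441,
    "robots_denied_maxcrawldelay": 259,
    "hitByTimeLimit": 7493,
    "exception": 3801,
    "notmodified": 2,
    "gone": 48,
    "access_denied": 1,
    "success": 93583,
    "temp_moved": 3068,
    "notfound": 1490,
}
map_input_records = 162080

# Total of all status counters, and how many input records they leave unexplained.
counted = sum(status_counters.values())
unaccounted = map_input_records - counted

print(f"sum of status counters: {counted:,}")    # 114,531
print(f"unaccounted records:    {unaccounted:,}")  # 47,549
```

So the status counters fall 47,549 records short of the map input.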

But, when I go to the map task logs, at the end of each log there is a line
stating:

QueueFeeder finished: total *36651* records + hit by time limit :*20975*
QueueFeeder finished: total *30248* records + hit by time limit :*25492*
QueueFeeder finished: total *44257* records + hit by time limit :*4460*
Summing all of these numbers gives 162,083, which matches the total map
input to within 3 records. I also note that the total hit by time limit here
is 50,927, but the job counters show only 7,493.
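The log totals can be checked the same way. The Python sketch below parses the three QueueFeeder lines quoted above and sums both fields. It assumes the emphasis asterisks have been stripped from the log text; the regular expression and the helper name `sum_queue_feeder` are hypothetical and only match the line format shown in this post.

```python
import re

# QueueFeeder lines from the three map task logs, with emphasis asterisks removed.
log_lines = [
    "QueueFeeder finished: total 36651 records + hit by time limit :20975",
    "QueueFeeder finished: total 30248 records + hit by time limit :25492",
    "QueueFeeder finished: total 44257 records + hit by time limit :4460",
]

# Captures the "total N records" value and the "hit by time limit :M" value.
PATTERN = re.compile(
    r"total\s+(\d+)\s+records\s+\+\s+hit by time limit\s*:\s*(\d+)"
)

def sum_queue_feeder(lines):
    """Return (sum of 'total' values, sum of 'hit by time limit' values)."""
    total_fed = 0
    total_hit = 0
    for line in lines:
        match = PATTERN.search(line)
        if match:
            total_fed += int(match.group(1))
            total_hit += int(match.group(2))
    return total_fed, total_hit

fed, hit = sum_queue_feeder(log_lines)
print(fed, hit, fed + hit)  # 111156 50927 162083
```

The combined figure of 162,083 is 3 more than the 162,080 reported as Map input records, and the 50,927 hit by time limit in the logs is far above the 7,493 shown by the hitByTimeLimit job counter.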

Can anyone elaborate?

Thanks,
Amit.
