Hi Amit,

<<
I also note
that the total hit by time limit here is 50927 but the job counters show
7493.
>>

Both time limits are set by the same fetcher time limit property
(fetcher.timelimit.mins, I believe). One check is in the QueueFeeder class:
once the current time is past the time limit, the QueueFeeder stops loading
data into the queues and only counts the remaining input records. That is
where the total "hit by time limit" of 50927 comes from. The other check is
in the FetchItemQueues class: if the current time is past the time limit and
the feeder has stopped, the queues are emptied, and the number of items
dropped from the queues is what the job counter reports, here 7493. So the
two numbers count different things and are not expected to be equal.
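
As a rough illustration of the two checks described above, here is a small
self-contained sketch. It is not the Nutch source: the class and method names
(TimeLimitSketch, feederHitByTimeLimit, queuesHitByTimeLimit) are made up for
this example, and the clock is passed in as a plain number so the result is
predictable. It only shows why the feeder-side total and the queue-side job
counter can differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the actual Nutch classes.
public class TimeLimitSketch {

    // Feeder-side check: once the deadline has passed, the remaining input
    // records are read but never queued, and each one is counted.
    // A timeLimit of -1 is assumed to mean "no time limit".
    static int feederHitByTimeLimit(int remainingInput, long now, long timeLimit) {
        if (timeLimit != -1 && now >= timeLimit) {
            return remainingInput;
        }
        return 0;
    }

    // Queue-side check: only once the feeder has stopped and the deadline has
    // passed are the queues emptied; the dropped items feed the job counter.
    static int queuesHitByTimeLimit(Deque<String> queue, boolean feederAlive,
                                    long now, long timeLimit) {
        if (feederAlive || timeLimit == -1 || now < timeLimit) {
            return 0;
        }
        int dropped = queue.size();
        queue.clear();
        return dropped;
    }

    public static void main(String[] args) {
        long timeLimit = 1000L; // made-up deadline
        long now = 1500L;       // made-up "current time", past the deadline

        Deque<String> queue = new ArrayDeque<>();
        queue.add("http://example.com/a");
        queue.add("http://example.com/b");

        int skippedByFeeder = feederHitByTimeLimit(5, now, timeLimit);
        int droppedFromQueues = queuesHitByTimeLimit(queue, false, now, timeLimit);

        // The two numbers count different sets of records.
        System.out.println("feeder: " + skippedByFeeder
                + ", queues: " + droppedFromQueues);
    }
}
```

With these made-up inputs the feeder skips 5 records that never reached a
queue, while only the 2 items already queued are dropped and would show up in
the job counter.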


<<
Summing all of these numbers does equal the total map input.
>>

Did you set the "fetcher.follow.outlinks.depth" property? When
fetcher.parse is true and this value is greater than 0, the fetcher will
extract outlinks and follow them until the desired depth is reached.

Another reason is redirects: when a page redirects to another page, the
fetcher adds the redirect target to the fetch queues. So the map input
does not have to equal the sum of all the counters.






On Sat, Mar 16, 2013 at 8:03 AM, Lewis John Mcgibbney <
[email protected]> wrote:

> Hi Amit,
>
> I know this thread is a bit old now, however it is also something which
> bugged me when I was looking into something else (InjectorJob counters).
>
> On Tue, Mar 5, 2013 at 3:16 AM, Amit Sela <[email protected]> wrote:
>
> >
> > And summing all counters does not equal the total map input...
> >
> > Summing all of these numbers does equal the total map input. I also note
> > that the total hit by time limit here is 50927 but the job counters show
> > 7493.
> >
> >
> Basically, the easiest way to see and generally understand counters is to
> run the Nutch application within your Hadoop cluster (if no cluster
> available then use pseudo-distributed mode) and use the web application interface to
> Hadoop. You will clearly see all counters associated with the job and you
> can take it from there.
> I like the notion of creating custom counters to obtain specific metrics
> but this is solely driven by user requirements.
> Do you want to learn more about counters? Look into the code.
> Do you want to know more about Nutch counters, or make the counters more
> explicit? Then consider opening a Jira issue and we can discuss this in
> more detail.
> With regards to the Fetcher, there are many possible areas where counters
> are (and could be) really useful... as I said though this is only driven by
> user requirements.
>



-- 
Don't Grow Old, Grow Up... :-)
