By the fetcher.limit property, do you mean fetcher.timelimit.mins? I ask
because I have it set to the default (-1), which means no time limit.
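
For what it's worth, here is a minimal sketch of the two checks as they are
described in the quoted reply below. It is a simplified model, not the actual
Nutch code: the class and method names (TimeLimitSketch, feed,
emptyIfExpired) and the explicit timestamps are made up for illustration. It
only tries to show why the two "hit by time limit" numbers count different
things, and it assumes a deadline of -1 switches both checks off.

```java
import java.util.Deque;
import java.util.List;

// Hypothetical model of the two time-limit checks; not Nutch's real classes.
public class TimeLimitSketch {

    // Feeder side: entries read at or after the deadline are not queued,
    // they are only counted. readTimes[i] is the (made-up) time at which
    // input entry i is read. A deadline of -1 means "no time limit".
    static int feed(List<String> input, long[] readTimes,
                    Deque<String> queue, long deadline) {
        int skippedByTimeLimit = 0;
        for (int i = 0; i < input.size(); i++) {
            if (deadline != -1 && readTimes[i] >= deadline) {
                skippedByTimeLimit++;
            } else {
                queue.add(input.get(i));
            }
        }
        return skippedByTimeLimit;
    }

    // Queue side: once the deadline has passed (and the feeder is done),
    // whatever is still waiting in the queue is dropped and counted.
    static int emptyIfExpired(Deque<String> queue, long now, long deadline) {
        if (deadline == -1 || now < deadline) {
            return 0;
        }
        int dropped = queue.size();
        queue.clear();
        return dropped;
    }
}
```

In this model, with six input entries of which two are read after the
deadline, feed returns 2. If one queued item is fetched before the deadline
check runs, emptyIfExpired returns 3. The two counts cover different items,
so there is no reason for them to match.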


On Sat, Mar 16, 2013 at 5:12 PM, feng lu <[email protected]> wrote:

> Hi Amit
>
> <<
> I also note
> that the total hit by time limit here is 50927 but the job counters show
> 7493.
> >>
>
> These two time-limit counts are both governed by the fetcher.limit
> property. One is used in the QueueFeeder class: the QueueFeeder should stop
> loading data once the current time is later than the time limit. That is
> where the total hit by time limit of 50927 comes from. The other is used in
> the FetchItemQueues class: it checks whether the current time is later than
> the time limit and the feeder has stopped, and if so it empties the queues.
> That is where the job counter value of 7493 comes from. The two numbers are
> not expected to be equal.
>
>
> <<
> Summing all of these numbers does equal the total map input.
> >>
>
> Have you set the "fetcher.follow.outlinks.depth" property? When
> fetcher.parse is true and this value is greater than 0, the fetcher will
> extract outlinks and follow them until the desired depth is reached.
>
> Another reason is that when a page redirects to another page, the fetcher
> adds the redirect target to the fetch queues, so the map input will not
> equal the sum of all the counters.
>
>
>
>
>
>
> On Sat, Mar 16, 2013 at 8:03 AM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
> > Hi Amit,
> >
> > I know this thread is a bit old now, however it is also something which
> > bugged me when I was looking into something else (InjectorJob counters).
> >
> > On Tue, Mar 5, 2013 at 3:16 AM, Amit Sela <[email protected]> wrote:
> >
> > >
> > > And summing all counters does not equal the total map input...
> > >
> > > Summing all of these numbers does equal the total map input. I also
> > > note that the total hit by time limit here is 50927 but the job
> > > counters show 7493.
> > >
> > >
> > Basically, the easiest way to see and generally understand counters is to
> > run the Nutch application within your Hadoop cluster (if no cluster is
> > available then use pseudo-distributed mode) and use the web application
> > interface to
> > Hadoop. You will clearly see all counters associated with the job and you
> > can take it from there.
> > I like the notion of creating custom counters to obtain specific metrics
> > but this is solely driven by user requirements.
> > Do you want to learn more about counters? Look into the code.
> > Do you want to know more about Nutch counters, or make the counters more
> > explicit? Then consider opening a Jira issue and we can discuss this in
> > more detail.
> > With regard to the Fetcher, there are many possible areas where counters
> > are (and could be) really useful... as I said though, this is only driven
> > by user requirements.
> >
>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>
