As I understand it... each task has its own counters, which are updated 
independently. As the tasks report back to the JobTracker (JT), they update the 
counters' status, and the JT then aggregates them. 
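To make that model concrete, here's a minimal sketch in plain Java. All class 
and method names are invented for illustration; this is not Hadoop's actual 
implementation, just the idea of per-task counters summed at aggregation time.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each task keeps its own counter map and updates it
// independently; the JT sums the maps when tasks report in.
public class CounterSketch {
    // One counter map per task.
    static Map<String, Long> newTaskCounters() {
        return new HashMap<>();
    }

    // The JT-side aggregation: sum matching counter names across tasks.
    static Map<String, Long> aggregate(Iterable<Map<String, Long>> perTask) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> counters : perTask) {
            counters.forEach((name, value) -> total.merge(name, value, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> task1 = newTaskCounters();
        Map<String, Long> task2 = newTaskCounters();
        task1.merge("RECORDS_SEEN", 3L, Long::sum); // task 1 increments its own copy
        task2.merge("RECORDS_SEEN", 5L, Long::sum); // task 2 increments its own copy
        Map<String, Long> total = aggregate(List.of(task1, task2));
        System.out.println(total.get("RECORDS_SEEN")); // 8
    }
}
```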

In terms of performance, counters take up some memory in the JT, so while it's 
OK to use them, if you abuse them you can run into issues. 
As to limits... I guess that will depend on the amount of memory on the JT 
machine, the size of the cluster (number of TaskTrackers), and the number of 
counters. 

In terms of global accessibility... Maybe.

The reason I say maybe is that I'm not sure what you mean by "globally 
accessible." 
If a task creates and increments a dynamic counter... I know that it will 
eventually be reflected in the JT. However, I do not believe that a separate 
task could connect to the JT to check whether the counter exists, or get its 
value (let alone an accurate value), since the updates are asynchronous. Not to 
mention that I don't believe the counters are aggregated until the job ends. It 
would make sense for the JT to maintain a separate counter for each task until 
the tasks complete. (If a task fails, the JT would have to delete that task's 
counters so that the correct count is maintained when the task is restarted.) 
Note: I haven't looked at the source code, so I may well be wrong. 
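To illustrate the failure scenario, here's a hedged plain-Java sketch (invented 
names, not the Hadoop source): the JT keeps one counter set per task attempt, 
drops a failed attempt's counters, and only sums the surviving attempts, so a 
restarted task can't double-count.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a JT holding counters keyed per task attempt.
// All identifiers here are illustrative assumptions.
public class PerTaskCounters {
    private final Map<String, Map<String, Long>> byTask = new HashMap<>();

    // A task attempt reports an increment for one of its counters.
    void report(String taskId, String counter, long delta) {
        byTask.computeIfAbsent(taskId, k -> new HashMap<>())
              .merge(counter, delta, Long::sum);
    }

    // On failure, discard that attempt's counters so a restart starts clean.
    void taskFailed(String taskId) {
        byTask.remove(taskId);
    }

    // Final aggregation across the surviving attempts.
    long total(String counter) {
        return byTask.values().stream()
                .mapToLong(m -> m.getOrDefault(counter, 0L))
                .sum();
    }

    public static void main(String[] args) {
        PerTaskCounters jt = new PerTaskCounters();
        jt.report("attempt_1", "ROWS", 10);
        jt.report("attempt_2", "ROWS", 4);
        jt.taskFailed("attempt_1");          // attempt 1 dies; its 10 is dropped
        jt.report("attempt_1b", "ROWS", 10); // restarted attempt re-counts from zero
        System.out.println(jt.total("ROWS")); // 14, not 24
    }
}
```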

HTH
Mike
On Oct 19, 2012, at 5:50 AM, Lin Ma <[email protected]> wrote:

> Hi guys,
> 
> I have some quick questions regarding to Hadoop counter,
> 
> Is a Hadoop counter (custom-defined) globally accessible (for both read and 
> write) to all Mappers and Reducers in a job?
> What are the performance implications and best practices of using Hadoop 
> counters? I am not sure whether using Hadoop counters too heavily will 
> degrade the performance of the whole job.
> regards,
> Lin
