Re: Hadoop counter

Michael Segel Fri, 19 Oct 2012 09:43:10 -0700

On Oct 19, 2012, at 11:27 AM, Lin Ma <[email protected]> wrote:

> Hi Mike,
> 
> Thanks for the detailed reply. Two quick questions/comments:
> 
> 1. By "task", do you mean a specific mapper instance or a specific reducer 
> instance?


Either. 

> 2. "However, I do not believe that a separate Task could connect with the JT 
> and see if the counter exists or if it could get a value or even an accurate 
> value since the updates are asynchronous." -- do you mean that if one mapper is 
> updating custom counter ABC and another mapper is updating the same custom 
> counter ABC, their counter values are updated independently by the different 
> mappers and will not be published (aggregated) externally until the job 
> completes successfully?
> 
I meant that if a Task created and updated a counter, a different Task does 
not have access to that counter.

To give you an example, if I want to count the number of quality errors and 
then fail the job once there have been X errors, I can't use global counters 
to do this.
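A minimal, self-contained Java sketch of the usual workaround: each task keeps its own local error count and fails itself once that count passes a threshold. This is a plain-Java simulation, not the Hadoop API; `QualityCheckingTask`, `MAX_ERRORS` and treating an empty record as a "quality error" are all hypothetical. It also shows the limitation described above: with a limit of 5 and two tasks that each see 3 bad records, neither task fails, even though the job as a whole has 6 errors.

```java
import java.util.List;

// Hypothetical simulation of per-task error limiting (not Hadoop code).
public class QualityCheckDemo {

    // Hypothetical per-task limit on bad records.
    static final int MAX_ERRORS = 5;

    // Thrown when a single task sees more than MAX_ERRORS bad records.
    static class TooManyErrorsException extends RuntimeException {
        TooManyErrorsException(String msg) { super(msg); }
    }

    // One task: counts bad records locally and fails itself past the limit.
    static class QualityCheckingTask {
        private int localErrors = 0; // visible only to this task

        void process(List<String> records) {
            for (String record : records) {
                if (record.isEmpty()) { // hypothetical "quality error"
                    localErrors++;
                    if (localErrors > MAX_ERRORS) {
                        throw new TooManyErrorsException(
                            "task saw " + localErrors + " bad records");
                    }
                }
            }
        }

        int errors() { return localErrors; }
    }

    public static void main(String[] args) {
        List<String> split = List.of("a", "", "b", "", "", "c"); // 3 bad records
        QualityCheckingTask t1 = new QualityCheckingTask();
        QualityCheckingTask t2 = new QualityCheckingTask();
        t1.process(split);
        t2.process(split);
        // Neither task exceeds MAX_ERRORS on its own, although the job total is 6.
        System.out.println("task1=" + t1.errors() + " task2=" + t2.errors()
            + " total=" + (t1.errors() + t2.errors()));
    }
}
```

Running `main` prints `task1=3 task2=3 total=6`. In a real mapper you could still increment a counter (e.g. via `context.getCounter(group, name).increment(1)`) for reporting; only the decision to fail would rest on the task-local count, so the limit is per task rather than per job.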

> regards,
> Lin
> 
> On Fri, Oct 19, 2012 at 10:35 PM, Michael Segel <[email protected]> 
> wrote:
> As I understand it... each Task has its own counters, which are updated 
> independently. As the tasks report back to the JT, they update the counters' 
> status. The JT then aggregates them. 
> 
> In terms of performance, Counters take up some memory in the JT, so while it's 
> OK to use them, if you abuse them, you can run into issues. 
> As to limits... I guess that will depend on the amount of memory on the JT 
> machine, the size of the cluster (number of TTs) and the number of counters. 
> 
> In terms of global accessibility... Maybe.
> 
> The reason I say maybe is that I'm not sure what you mean by globally 
> accessible. 
> If a task creates and implements a dynamic counter... I know that it will 
> eventually be reflected in the JT. However, I do not believe that a separate 
> Task could connect with the JT and see if the counter exists or if it could 
> get a value or even an accurate value since the updates are asynchronous.  
> Not to mention that I don't believe that the counters are aggregated until 
> the job ends. It would make sense that the JT maintains a unique counter for 
> each task until the tasks complete. (If a task fails, it would have to delete 
> the counters so that when the task is restarted the correct count is 
> maintained.) Note: I haven't looked at the source code, so I am probably 
> wrong. 
> 
> HTH
> Mike
> On Oct 19, 2012, at 5:50 AM, Lin Ma <[email protected]> wrote:
> 
>> Hi guys,
>> 
>> I have some quick questions regarding Hadoop counters:
>> 
>> Is a Hadoop counter (custom-defined) globally accessible (for both read and 
>> write) to all Mappers and Reducers in a job?
>> What are the performance implications and best practices of using Hadoop 
>> counters? If Hadoop counters are used too heavily, will the performance of 
>> the whole job degrade?
>> regards,
>> Lin
> 
>
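A toy model, in plain Java rather than Hadoop code, of the behavior Mike describes in the quoted reply: each task attempt keeps its own counters and reports cumulative values asynchronously, the JT holds one entry per attempt, and a failed attempt's counters are dropped so a re-run is not double-counted. `JobTrackerModel`, `report`, `markFailed` and the attempt IDs are hypothetical; this illustrates the idea and is not how the JobTracker is actually implemented. The job-level value is just a sum over whatever per-attempt values the JT holds at that moment, which is also why a task reading it mid-job could see a partial number.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-attempt counters aggregated by a central tracker.
// All names are hypothetical; this is not Hadoop's JobTracker code.
public class CounterAggregationDemo {

    static class JobTrackerModel {
        // Latest counter snapshot reported by each task attempt.
        private final Map<String, Map<String, Long>> latestByAttempt = new HashMap<>();

        // Attempts report cumulative values; a later report replaces an earlier one.
        void report(String attemptId, String counter, long cumulativeValue) {
            latestByAttempt.computeIfAbsent(attemptId, k -> new HashMap<>())
                           .put(counter, cumulativeValue);
        }

        // A failed attempt's counters are dropped so a retry is not double-counted.
        void markFailed(String attemptId) {
            latestByAttempt.remove(attemptId);
        }

        // Job-level value: sum of the counter over all attempts currently held.
        long total(String counter) {
            long sum = 0;
            for (Map<String, Long> counters : latestByAttempt.values()) {
                sum += counters.getOrDefault(counter, 0L);
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        JobTrackerModel jt = new JobTrackerModel();
        jt.report("map_0_attempt_0", "ABC", 4);
        jt.report("map_1_attempt_0", "ABC", 2);
        jt.report("map_1_attempt_0", "ABC", 5); // later, cumulative update
        jt.markFailed("map_0_attempt_0");       // map 0 fails...
        jt.report("map_0_attempt_1", "ABC", 4); // ...and is re-run
        System.out.println("ABC = " + jt.total("ABC"));
    }
}
```

Running `main` prints `ABC = 9`: 5 from map 1 plus 4 from the successful re-run of map 0, with the failed attempt's 4 discarded.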
