On Oct 19, 2012, at 11:27 AM, Lin Ma <[email protected]> wrote:

> Hi Mike,
>
> Thanks for the detailed reply. Two quick questions/comments,
>
> 1. For "task", you mean a specific mapper instance, or a specific reducer
> instance?

Either.

> 2. "However, I do not believe that a separate Task could connect with the JT
> and see if the counter exists or if it could get a value or even an accurate
> value since the updates are asynchronous." -- do you mean if a mapper is
> updating custom counter ABC, and another mapper is updating the same custom
> counter ABC, their counter values are updated independently by different
> mappers, and will not be published (aggregated) externally until the job
> completes successfully?
>

I meant that if a Task created and updated a counter, a different Task does
not have access to that counter. To give you an example, if I want to count
the number of quality errors and then fail after X number of errors, I can't
use Global counters to do this.

> regards,
> Lin
>
> On Fri, Oct 19, 2012 at 10:35 PM, Michael Segel <[email protected]>
> wrote:
> As I understand it... each Task has its own counters, which are updated
> independently. As they report back to the JT, they update the counter(s)'
> status. The JT then will aggregate them.
>
> In terms of performance, Counters take up some memory in the JT so while it's
> OK to use them, if you abuse them, you can run into issues.
> As to limits... I guess that will depend on the amount of memory on the JT
> machine, the size of the cluster (number of TTs) and the number of counters.
>
> In terms of global accessibility... Maybe.
>
> The reason I say maybe is that I'm not sure what you mean by globally
> accessible.
> If a task creates and implements a dynamic counter... I know that it will
> eventually be reflected in the JT. However, I do not believe that a separate
> Task could connect with the JT and see if the counter exists or if it could
> get a value or even an accurate value since the updates are asynchronous.
> Not to mention that I don't believe that the counters are aggregated until
> the job ends. It would make sense that the JT maintains a unique counter for
> each task until the tasks complete.
> (If a task fails, it would have to delete the counters so that when the
> task is restarted the correct count is maintained.) Note, I haven't looked
> at the source code so I am probably wrong.
>
> HTH
> Mike
>
> On Oct 19, 2012, at 5:50 AM, Lin Ma <[email protected]> wrote:
>
>> Hi guys,
>>
>> I have some quick questions regarding Hadoop counters,
>>
>> Is a Hadoop counter (custom defined) globally accessible (for both read and
>> write) to all Mappers and Reducers in a job?
>> What are the performance implications and best practices of using Hadoop
>> counters? I am not sure whether using Hadoop counters too heavily will
>> degrade the performance of the whole job?
>>
>> regards,
>> Lin
>
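
To make the model described in this thread concrete, here is a minimal sketch in plain Java. It is a toy simulation and not Hadoop code: the names CounterSketch, TaskAttempt, Tracker, report, discard and total are all hypothetical, and the behavior it encodes (per-task counters, asynchronous reports, dropping a failed attempt's counts) follows Mike's understanding above, which he notes was not checked against the source.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: no Hadoop classes are used and all names are hypothetical.
class CounterSketch {

    /** Counters private to one task attempt; other attempts cannot read them. */
    static class TaskAttempt {
        final String id;
        final Map<String, Long> counters = new HashMap<>();

        TaskAttempt(String id) { this.id = id; }

        // Local update; nothing is sent to the tracker here.
        void increment(String name, long delta) {
            counters.merge(name, delta, Long::sum);
        }
    }

    /** Stand-in for the JobTracker: keeps the latest report per attempt. */
    static class Tracker {
        private final Map<String, Map<String, Long>> reports = new HashMap<>();

        // Simulates a periodic status report: stores a snapshot of the counters.
        void report(TaskAttempt attempt) {
            reports.put(attempt.id, new HashMap<>(attempt.counters));
        }

        // Drops a failed attempt's counters so a retry does not double count.
        void discard(String attemptId) {
            reports.remove(attemptId);
        }

        // Job-level value: sum over the latest snapshot of every attempt.
        long total(String name) {
            long sum = 0;
            for (Map<String, Long> snapshot : reports.values()) {
                sum += snapshot.getOrDefault(name, 0L);
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        Tracker jt = new Tracker();
        TaskAttempt m1 = new TaskAttempt("map_1_attempt_0");
        TaskAttempt m2 = new TaskAttempt("map_2_attempt_0");

        m1.increment("BAD_RECORDS", 3);
        m2.increment("BAD_RECORDS", 2);
        jt.report(m1);
        jt.report(m2);

        // Not yet reported, so the tracker's view lags behind the task's view.
        m2.increment("BAD_RECORDS", 4);
        System.out.println(jt.total("BAD_RECORDS")); // prints 5

        // m2 fails: its counts are dropped, then a retry reports fresh counts.
        jt.discard(m2.id);
        TaskAttempt retry = new TaskAttempt("map_2_attempt_1");
        retry.increment("BAD_RECORDS", 2);
        jt.report(retry);
        System.out.println(jt.total("BAD_RECORDS")); // prints 5
    }
}
```

In this sketch a task only ever sees its own map, and the tracker's total is only as fresh as the last report, which is why a "fail after X errors across all tasks" check cannot be built reliably on it from inside a task.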
