Hi Mike,

Sorry, I am a bit lost, as you are thinking faster than me. :-P

Your statement "It would make sense that the JT maintains a unique counter for each task until the tasks complete." seems to say that tasks cannot see each other's counters, since the JT maintains a unique counter for each task.

But your comment "I meant that if a Task created and updated a counter, a different Task has access to that counter." seems to say that different tasks can share and access the same counter.

I would appreciate it if you could clarify which of the two you meant.

regards,
Lin

On Sat, Oct 20, 2012 at 12:42 AM, Michael Segel <[email protected]> wrote:

>
> On Oct 19, 2012, at 11:27 AM, Lin Ma <[email protected]> wrote:
>
> Hi Mike,
>
> Thanks for the detailed reply. Two quick questions/comments,
>
> 1. For "task", you mean a specific mapper instance, or a specific reducer
> instance?
>
> Either.
>
> 2. "However, I do not believe that a separate Task could connect with the
> JT and see if the counter exists or if it could get a value or even an
> accurate value since the updates are asynchronous." -- do you mean if a
> mapper is updating custom counter ABC, and another mapper is updating the
> same customer counter ABC, their counter values are updated independently
> by different mappers, and will not published (aggregated) externally until
> job completed successfully?
>
> I meant that if a Task created and updated a counter, a different Task has
> access to that counter.
>
> To give you an example, if I want to count the number of quality errors
> and then fail after X number of errors, I can't use Global counters to do
> this.
>
> regards,
> Lin
>
> On Fri, Oct 19, 2012 at 10:35 PM, Michael Segel <[email protected]> wrote:
>
>> As I understand it... each Task has its own counters and are
>> independently updated. As they report back to the JT, they update the
>> counter(s)' status.
>> The JT then will aggregate them.
>>
>> In terms of performance, Counters take up some memory in the JT so while
>> its OK to use them, if you abuse them, you can run in to issues.
>> As to limits... I guess that will depend on the amount of memory on the
>> JT machine, the size of the cluster (Number of TT) and the number of
>> counters.
>>
>> In terms of global accessibility... Maybe.
>>
>> The reason I say maybe is that I'm not sure by what you mean by globally
>> accessible.
>> If a task creates and implements a dynamic counter... I know that it will
>> eventually be reflected in the JT. However, I do not believe that a
>> separate Task could connect with the JT and see if the counter exists or if
>> it could get a value or even an accurate value since the updates are
>> asynchronous. Not to mention that I don't believe that the counters are
>> aggregated until the job ends. It would make sense that the JT maintains a
>> unique counter for each task until the tasks complete. (If a task fails, it
>> would have to delete the counters so that when the task is restarted the
>> correct count is maintained. ) Note, I haven't looked at the source code
>> so I am probably wrong.
>>
>> HTH
>> Mike
>>
>> On Oct 19, 2012, at 5:50 AM, Lin Ma <[email protected]> wrote:
>>
>> Hi guys,
>>
>> I have some quick questions regarding to Hadoop counter,
>>
>>
>> - Hadoop counter (customer defined) is global accessible (for both
>> read and write) for all Mappers and Reducers in a job?
>> - What is the performance and best practices of using Hadoop
>> counters? I am not sure if using Hadoop counters too heavy, there will be
>> performance downgrade to the whole job?
>>
>> regards,
>> Lin
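
P.S. To make the "unique counter per task" reading concrete, here is a toy sketch of the model described in the quoted text above: each task updates only its own counters, reports them to the JT asynchronously, and the JT sums the per-task snapshots (dropping those of a failed task). This is a standalone simulation, not Hadoop code, and it does not use the Hadoop API. The names ToyJobTracker, ToyTask and run_demo are made up for illustration, and the behaviour is only the model as guessed at in this thread, which may not match the real source.

```python
from collections import Counter


class ToyJobTracker:
    """Toy stand-in for the JT: keeps one counter snapshot per task."""

    def __init__(self):
        self.per_task = {}  # task id -> Counter last reported by that task

    def report(self, task_id, counters):
        # A status report replaces the previous snapshot for that task.
        self.per_task[task_id] = Counter(counters)

    def discard(self, task_id):
        # A failed task's counts are dropped so a retry starts from zero.
        self.per_task.pop(task_id, None)

    def aggregate(self):
        # Job-level totals are the sum over the per-task snapshots.
        total = Counter()
        for snapshot in self.per_task.values():
            total.update(snapshot)
        return total


class ToyTask:
    """Toy stand-in for a map or reduce task with task-local counters."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.counters = Counter()

    def increment(self, name, amount=1):
        # Only this task's private copy changes; nothing is sent yet.
        self.counters[name] += amount

    def heartbeat(self, tracker):
        # Hypothetical reporting step: push the current snapshot to the JT.
        tracker.report(self.task_id, self.counters)


def run_demo():
    jt = ToyJobTracker()
    m1, m2 = ToyTask("map_1"), ToyTask("map_2")
    m1.increment("QUALITY_ERRORS", 3)
    m2.increment("QUALITY_ERRORS", 2)

    m1.heartbeat(jt)  # map_2 has not reported yet
    partial = jt.aggregate()["QUALITY_ERRORS"]

    m2.heartbeat(jt)
    full = jt.aggregate()["QUALITY_ERRORS"]

    jt.discard("map_2")  # pretend map_2 failed and will be retried
    after_failure = jt.aggregate()["QUALITY_ERRORS"]
    return partial, full, after_failure
```

In this toy model the total seen at the JT lags behind the tasks (partial versus full), and neither task can read the other's count. That is why, under this reading, a "fail after X quality errors" check could not rely on a global counter from inside a task.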
