While using System.out inside a Mapper or Reducer is fine as an aid to learning, be careful: accidentally leaving those calls in (or not moving to something like log4j) and then running the job for real can mean writing millions of lines of log output on a tasktracker, filling up disks and making jobs needlessly slow.
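To make that concrete, here is a rough sketch of the pattern I have in mind: a counter for the overall statistic, and a level-guarded log call for the per-record detail. It is not Hadoop code. RecordChecker, process() and the "MALFORMED" name are made up for illustration, the map only plays the role of a Hadoop counter, and java.util.logging stands in for log4j so that the snippet compiles with nothing but the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the per-record logic of a Mapper; not Hadoop API.
class RecordChecker {
    private static final Logger LOG = Logger.getLogger(RecordChecker.class.getName());

    // Plays the role of a counter: one number per name, cheap to update.
    private final Map<String, Long> counters = new HashMap<>();

    // Returns true if the record has exactly expectedFields comma-separated fields.
    boolean process(String record, int expectedFields) {
        boolean ok = record.split(",", -1).length == expectedFields;
        if (!ok) {
            // Overall statistic: bump a counter instead of printing a line.
            counters.merge("MALFORMED", 1L, Long::sum);
            // Per-record detail: only built and written when FINE is enabled.
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("malformed record: " + record);
            }
        }
        return ok;
    }

    // Current value of the named counter, or 0 if it was never incremented.
    long counter(String name) {
        return counters.getOrDefault(name, 0L);
    }

    public static void main(String[] args) {
        RecordChecker checker = new RecordChecker();
        for (String r : new String[] {"a,b,c", "a,b", "x,y,z", ""}) {
            checker.process(r, 3);
        }
        System.out.println("MALFORMED=" + checker.counter("MALFORMED"));
    }
}
```

With the default logging configuration the FINE messages are dropped, so a full run writes one summary line rather than one line per bad record. In a real job the map would be replaced by the framework's counters and the logger by whatever logging setup your cluster uses.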
Paul

On 27 March 2013 10:38, zheyi rong <[email protected]> wrote:

> Hello,
>
> Q1.
> Depends on your need. If you would like overall statistics, for
> example, the number of the malformed records in your datasets,
> use counters. If you just want to know what is going on inside a mapper or
> reducer, use System.out.println;
> since mappers do not know each other, you cannot get overall statistics
> of your job by using System.out.println().
> The output of System.out.println() will finally appear in the tasklog.
>
> Q2.
> In a distributed environment, mappers do not know each other. Imagine that
> mapper A is running on a machine, and mapper B is running on another
> machine, so in mapper A, you cannot get the internal state of mapper B
> simply by System.out.println().
>
> Q3.
> Harsh J answered it.
>
> Zheyi.
>
> 2013/3/27 Sai Sai <[email protected]>
>
>> Q1. Is it right to assume the System.out.println statements are used only
>> in eclipse environment and
>> In a multi node cluster environment we need to use counters.
>>
>> Q2. I am slightly confused as it appears like using System.out.println
>> statements
>> we are able to get detailed info at every line of code in eclipse and
>> counters just give few lines and not as detailed as System.out.println
>> statements do so what should we do in a multi node cluster environment.
>>
>> Q3. Also when they say the limit of counters is 120 does that mean that
>> in the output if we use:
>> context.getCounters("TestGroup1","TestName1").increment(1);
>> more than 120 times it will not print it. or does it refer to 120 options
>> of counters in an enum that we can define.
>>
>> Any help is really appreciated.
>> Thanks
>> Sai
