Just try passing -Dmapred.tasktracker.map.tasks.maximum=1 on the command line and check how many map tasks it runs. Also set this in mapred-site.xml and check again.
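
To see why mapred.map.tasks matters here (the point Harsh makes in the quoted reply below), here is a rough Python sketch of the split arithmetic as I understand it from the old-API org.apache.hadoop.mapred.FileInputFormat. It is not Hadoop code: the function name, the parameter names, the 80-byte file size and the 64 MB block size are all assumptions made for illustration.

```python
# Illustrative model of old-API FileInputFormat split sizing (not Hadoop code).
# Assumption: split_size = max(min_size, min(total_size / hinted_maps, block_size)),
# and the last split may exceed split_size by up to 10% (the "slop" factor).

SPLIT_SLOP = 1.1


def compute_splits(file_size, num_map_tasks_hint=2,
                   block_size=64 * 1024 * 1024, min_split_size=1):
    """Return a list of (offset, length) splits for a single file.

    num_map_tasks_hint plays the role of mapred.map.tasks.
    Zero-length files are not modelled here.
    """
    goal_size = file_size // max(num_map_tasks_hint, 1)
    split_size = max(min_split_size, min(goal_size, block_size))

    splits = []
    offset = 0
    remaining = file_size
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((offset, split_size))
        offset += split_size
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining:
        splits.append((offset, remaining))
    return splits


# Hypothetical 80-byte input file, far smaller than one block.
print(compute_splits(80, num_map_tasks_hint=2))  # [(0, 40), (40, 40)]
print(compute_splits(80, num_map_tasks_hint=1))  # [(0, 80)]
```

With the hint at 2 the tiny file is cut into two splits, hence two map tasks; with the hint at 1 it stays as a single split.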

*Thanks & Regards*

∞
Shashwat Shriparv


On Thu, Sep 26, 2013 at 5:24 PM, Harsh J <[email protected]> wrote:

> Hi Sai,
>
> What Viji indicated is that the default Apache Hadoop setting for any
> input is 2 maps. If the input is larger than one block, regular
> policies of splitting such as those stated by Shekhar would apply. But
> for smaller inputs, just for an out-of-box "parallelism experience",
> Hadoop ships with a 2-maps forced splitting default
> (mapred.map.tasks=2).
>
> This means your 5 lines is probably divided as 2:3 or other ratios and
> is processed by 2 different Tasks. As Viji also indicated, to turn off
> this behavior, you can set the mapred.map.tasks to 1 in your configs
> and then you'll see only one map task process all 5 lines.
>
> On Thu, Sep 26, 2013 at 4:59 PM, Sai Sai <[email protected]> wrote:
> > Thanks Viji.
> > I am confused a little when the data is small y would there b 2 tasks.
> > U will use the min as 2 if u need it but in this case it is not needed due
> > to size of the data being small
> > so y would 2 map tasks exec.
> > Since it results in 1 block with 5 lines of data in it
> > i am assuming this results in 5 map computations 1 per each line
> > and all of em in 1 process/node since i m using a pseudo vm.
> > Where is the second task coming from.
> > The 5 computations of map on each line is 1 task.
> > Is this right.
> > Please help.
> > Thanks
> >
> >
> > ________________________________
> > From: Viji R <[email protected]>
> > To: [email protected]; Sai Sai <[email protected]>
> > Sent: Thursday, 26 September 2013 5:09 PM
> > Subject: Re: 2 Map tasks running for a small input file
> >
> > Hi,
> >
> > Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
> > avoid this.
> >
> > Regards,
> > Viji
> >
> > On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <[email protected]> wrote:
> >> Hi
> >> Here is the input file for the wordcount job:
> >> ******************
> >> Hi This is a simple test.
> >> Hi Hadoop how r u.
> >> Hello Hello.
> >> Hi Hi.
> >> Hadoop Hadoop Welcome.
> >> ******************
> >>
> >> After running the wordcount successfully
> >> here r the counters info:
> >>
> >> ***************
> >> Job Counters SLOTS_MILLIS_MAPS 0 0 8,386
> >> Launched reduce tasks 0 0 1
> >> Total time spent by all reduces waiting after reserving slots (ms) 0 0 0
> >> Total time spent by all maps waiting after reserving slots (ms) 0 0 0
> >> Launched map tasks 0 0 2
> >> Data-local map tasks 0 0 2
> >> SLOTS_MILLIS_REDUCES 0 0 9,199
> >> ***************
> >> My question why r there 2 launched map tasks when i have only a small
> >> file.
> >> Per my understanding it is only 1 block.
> >> and should be only 1 split.
> >> Then for each line a map computation should occur
> >> but it shows 2 map tasks.
> >> Please let me know.
> >> Thanks
> >> Sai
> >>
> >
> >
> >
>
> --
> Harsh J
>
