Hello,

According to the Hadoop documentation, mapper output is not saved in HDFS. It is written to a temporary directory on the local disk of the node running the map task, and that location can be changed with mapred.local.dir in the mapred-site.xml file. I was able to see the mapper output in this directory while the job was running. Once the job finished, the data was deleted from this temporary directory because it is no longer needed.

My question was how to get the size of this mapper output. I have figured it out now: while the MapReduce job is running, executing "du -ah" in that directory shows the size of every file and subdirectory under it.
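The same measurement can be scripted. Below is a minimal Python sketch, not part of Hadoop (the function names local_dirs_from_config and apparent_size are hypothetical), that reads mapred.local.dir from a mapred-site.xml file and totals the bytes of every file under each listed directory. It assumes the property value is a plain comma-separated list of paths with no ${...} variables to expand. It reports apparent file sizes, so its figures correspond to "du -sb" rather than the block-based numbers "du -ah" prints, and the total includes everything else kept in that directory (job files, logs), so it is an upper bound on map output.

```python
import os
import xml.etree.ElementTree as ET


def local_dirs_from_config(config_path, prop="mapred.local.dir"):
    """Return the directories listed for `prop` in a Hadoop *-site.xml file.

    The value may be a comma-separated list of paths. Returns [] when the
    property is not set in this file; Hadoop's built-in default and any
    ${...} variable expansion are NOT resolved by this sketch.
    """
    root = ET.parse(config_path).getroot()
    for prop_elem in root.iter("property"):
        if prop_elem.findtext("name", "").strip() == prop:
            value = prop_elem.findtext("value", "")
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


def apparent_size(path):
    """Total st_size of every file under `path` (comparable to `du -sb`)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                # lstat: count a symlink itself, never the file it points to
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except FileNotFoundError:
                # Running tasks create and delete spill files constantly, so
                # a file listed by os.walk may be gone by the time we stat it.
                continue
    return total
```

Run periodically on each node during the job, calling apparent_size() on every directory returned by local_dirs_from_config() gives a rough per-node figure; the path to mapred-site.xml depends on the installation.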
Thanks & Regards,
Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX

From: [email protected]
Date: Tuesday, December 16, 2014 at 2:12 AM
To: user
Subject: Re: Re: Where the output of mappers are saved ?

Thanks Susheel! Understood.

[email protected]

> From: Susheel Kumar Gadalay
> Date: 2014-12-16 15:27
> To: user
> Subject: Re: Re: Where the output of mappers are saved ?
>
> I don't think so. It will be a single output file per reducer.
>
> If you want multiple smaller output files, specify a larger number of
> reducers in the job configuration.
>
> On 12/16/14, [email protected] wrote:
>> Thanks Susheel!!
>> One more question: if part-r-XXXX is extremely large, say 2 GB, will
>> the file be split into more files under the output directory? That is,
>> can one reducer produce more than one file?
>>
>> [email protected]
>>
>> From: Susheel Kumar Gadalay
>> Date: 2014-12-16 14:17
>> To: user
>> Subject: Re: Re: Where the output of mappers are saved ?
>>
>> Yes, the map outputs are cleaned up on job completion.
>>
>> If you want to see the map outputs, set the number of reducers to zero
>> and check the files part-m-0000, part-m-0001, ...
>>
>> On 12/16/14, [email protected] wrote:
>>> Do they only exist during the MapReduce process, and are they removed
>>> after the job finishes?
>>>
>>> After the reduce finished, I only saw part-m-0000, part-m-0001, ...,
>>> which are reduce results.
>>>
>>> [email protected]
>>>
>>> From: Susheel Kumar Gadalay
>>> Date: 2014-12-16 13:05
>>> To: user
>>> Subject: Re: Where the output of mappers are saved ?
>>>
>>> Map outputs will be in HDFS under your user name and output directory.
>>>
>>> They will have names like part-m-0000, part-m-0001, ...
>>>
>>> On 12/16/14, Abdul Navaz wrote:
>>>> Hello,
>>>>
>>>> Second try!
>>>>
>>>> I have created a directory to store this mapper output, as below:
>>>>
>>>> <property>
>>>>   <name>mapred.local.dir</name>
>>>>   <value>/app/hadoop/tmp/myoutput</value>
>>>> </property>
>>>>
>>>> and I looked at it:
>>>>
>>>> hduser@dn4:/app/hadoop/tmp/myoutput$ ls -lrt
>>>> total 16
>>>> drwxr-xr-x 2 hduser hadoop 4096 Dec 12 10:50 tt_log_tmp
>>>> drwx------ 3 hduser hadoop 4096 Dec 12 10:53 ttprivate
>>>> drwxr-xr-x 3 hduser hadoop 4096 Dec 12 10:53 taskTracker
>>>> drwxr-xr-x 4 hduser hadoop 4096 Dec 12 13:25 userlogs
>>>>
>>>> but I could not find anything here when I ran the MapReduce job.
>>>> Where is mapper output saved by default, and how can I get the size
>>>> of the mapper output in bytes?
>>>>
>>>> Thanks.
>>>>
>>>> From: Abdul Navaz
>>>> Date: Friday, December 12, 2014 at 12:36 AM
>>>> To: [email protected]
>>>> Subject: Where the output of mappers are saved ?
>>>>
>>>> Hello,
>>>>
>>>> I am interested in managing Hadoop shuffle traffic efficiently and
>>>> using network bandwidth effectively. To do this, I want to know how
>>>> much shuffle traffic each DataNode generates. Shuffle traffic is
>>>> essentially the output of the mappers. So where is this mapper output
>>>> saved? How can I get the size of the mapper output from each DataNode
>>>> in real time?
>>>>
>>>> I would appreciate your help.
>>>>
>>>> Thanks & Regards,
>>>>
>>>> Abdul Navaz
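Susheel's point that each reducer writes exactly one output file, so the number of reducers rather than the data size decides how many files appear, can be illustrated with a small simulation. This is a hypothetical Python sketch, not Hadoop code: it routes keys the way Hadoop's default HashPartitioner does, (hashCode & Integer.MAX_VALUE) % numReduceTasks, but uses zlib.crc32 as a stand-in for Java's hashCode(), so the reducer chosen for a given key will not match a real job. It also ignores the sorting and grouping a real reducer performs.

```python
import zlib


def partition(key, num_reducers):
    """Choose a reducer index for `key`, HashPartitioner-style.

    Hadoop's default HashPartitioner computes
    (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
    zlib.crc32 is a stand-in for Java's hashCode(), so the index
    chosen for a particular key will differ from a real job.
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def output_files(map_outputs, num_reducers):
    """Map each output file name to the (key, value) records it would hold.

    map_outputs has one list of (key, value) pairs per map task.
    """
    if num_reducers == 0:
        # Map-only job: each map task writes its own part-m file directly.
        return {f"part-m-{i:05d}": list(pairs) for i, pairs in enumerate(map_outputs)}

    # One file per reducer; by default a reducer that receives no keys
    # still creates its (empty) part-r file.
    files = {f"part-r-{r:05d}": [] for r in range(num_reducers)}
    for pairs in map_outputs:
        for key, value in pairs:
            files[f"part-r-{partition(key, num_reducers):05d}"].append((key, value))
    return files
```

With one reducer, every record lands in a single part-r file regardless of its size; raising the reducer count spreads the same records over more, smaller files, and all records sharing a key always go to the same file. With zero reducers there is no shuffle, and each map task's output appears directly as its own part-m file.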
