Look into mapred.max.split.size and mapred.min.split.size in mapred-site.xml; they control the input split size and hence the number of mappers.
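For instance (a minimal sketch using the pre-YARN mapred.* property names mentioned above; the 64 MB value is only illustrative), the split sizes could be set in mapred-site.xml like this:

  <!-- mapred-site.xml (illustrative values): 67108864 bytes = 64 MB -->
  <property>
    <name>mapred.min.split.size</name>
    <value>67108864</value>
  </property>
  <property>
    <name>mapred.max.split.size</name>
    <value>67108864</value>
  </property>

Note that the stock FileInputFormat still creates at least one split per file, so raising the minimum split size alone may not merge many small files into one map; an input format that packs several files into a single split (e.g. CombineFileInputFormat) may also be worth a look.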
Thanks & Regards
∞ Shashwat Shriparv

On Mon, May 13, 2013 at 12:50 PM, Agarwal, Nikhil <[email protected]> wrote:

> Hi,
>
> I have a 3-node cluster, with the JobTracker running on one machine and
> TaskTrackers on the other two. Instead of using HDFS, I have written my own
> FileSystem implementation. As an experiment, I kept 1000 text files (all of
> the same size) on both the slave nodes and ran a simple Wordcount MR job. It
> took around 50 mins to complete the task. Afterwards, I concatenated all
> the 1000 files into a single file and ran a Wordcount MR job again; it took
> 35 secs. From the JobTracker UI I could make out that the problem is
> the number of mappers that the JobTracker is creating. For 1000 files it
> creates 1000 maps, and for 1 file it creates 1 map (irrespective of
> file size).
>
> Thus, is there a way to reduce the number of mappers, i.e. can I control
> the number of mappers through some configuration parameter so that Hadoop
> would club the files together until it reaches some specified size (say, 64 MB)
> and then make 1 map per 64 MB block?
>
> Also, I wanted to know how to see which file is being submitted to which
> TaskTracker, or if that is not possible, how to check whether any data
> transfer is happening between my slave nodes during an MR job?
>
> Sorry for so many questions, and thank you for your time.
>
> Regards,
> Nikhil
