A map task is created for each input split in your dataset. By default, an input split corresponds to one block in HDFS, i.e. if a file consists of 1 HDFS block, then 1 map task will be started; if a file consists of N blocks, then N map tasks will be started for that file (assuming default settings, of course).
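As a quick way to check this yourself, here is a minimal sketch (using the old org.apache.hadoop.mapred API from that era; the class name and the input path /user/demo/input are hypothetical) that asks the InputFormat how many splits, and therefore map tasks, an input directory would produce with default settings:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class SplitCount {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SplitCount.class);
    conf.setInputFormat(TextInputFormat.class);
    // Hypothetical input path - point this at your own data in HDFS.
    FileInputFormat.setInputPaths(conf, new Path("/user/demo/input"));
    // With default settings each HDFS block of the input becomes one split,
    // and the framework starts one map task per split.
    InputSplit[] splits = conf.getInputFormat().getSplits(conf, 1);
    System.out.println("Splits (= map tasks by default): " + splits.length);
  }
}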
PiEstimator generates input files for itself. When you submit a PiEstimator job, you specify how many map tasks you want to run; before the job is submitted to the cluster, it generates that number of input files in HDFS, and one map task is started for each file. What is interesting is that each file contains only a single record. You can see the code here (lines 278-292):
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-examples/0.20.2-cdh3u1/org/apache/hadoop/examples/PiEstimator.java#278

//generate an input file for each map task
for(int i=0; i < numMaps; ++i) {
  final Path file = new Path(inDir, "part"+i);
  final LongWritable offset = new LongWritable(i * numPoints);
  final LongWritable size = new LongWritable(numPoints);
  final SequenceFile.Writer writer = SequenceFile.createWriter(
      fs, jobConf, file,
      LongWritable.class, LongWritable.class, CompressionType.NONE);
  try {
    writer.append(offset, size);
  } finally {
    writer.close();
  }
  System.out.println("Wrote input for Map #"+i);
}

2013/12/18 - <[email protected]>

> How does the PI example can determine the number of mappers?
> I thought the only way to determine number of mappers is via the amount of
> filesplits you have in the input file...
> So for instance if the input size is 100MB and filesplit size is 20MB then
> I would expect to have 100/20 = 5 map tasks.
>
> Thanks
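For reference, the number of maps is simply the first command-line argument to the example. Assuming the stock examples jar shipped with your distribution (the jar name varies between versions), an invocation like

  hadoop jar hadoop-examples.jar pi 10 1000000

asks for 10 map tasks with 1,000,000 samples each; that first argument becomes the numMaps used in the loop shown earlier.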
