Below are my simple mapper and partitioner classes, the input file, and the
console output (at the end of this message).
My question is about the keys printed in the console output (the lines that
start with "key = ").
Here are the first few lines of the console output:
...
13/03/27 02:20:57 INFO mapred.MapTask: data buffer = 79691776/99614720
13/03/27 02:20:57 INFO mapred.MapTask: record buffer = 262144/327680
key = 0 value = 10 10
token[0] = 10 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 10-10
key = 6 value = 20 200
token[0] = 20 token[1] = 200
Printing Result in Partitioner = 0
IntPair in Mapper = 20-200
Q1: I am confused about how and where these key values (key = 0, key = 6, and
so on) are calculated or obtained.
Q2: After the first two lines of output for each record, the output from the
partitioner class is printed:
Printing Result in Partitioner = 0
Is this because the mapper and the partitioner are running in parallel?
I would really appreciate it if someone could take a quick look and shed some
light on this.
**** Mapper Class ***
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SecondarySortMapper extends Mapper<LongWritable, Text, IntPair, IntWritable> {
    private String[] tokens = null;
    // Reused output value; despite the name, it is set to the second token of each line.
    private IntWritable ONE = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (value != null) {
            System.out.println("key = " + key.toString() + " value = " + value.toString());
            tokens = value.toString().split("\\s+");
            System.out.println("token[0] = " + tokens[0] + " token[1] = " + tokens[1]);
            ONE.set(Integer.parseInt(tokens[1]));
            IntPair ip = new IntPair(Integer.parseInt(tokens[0]), Integer.parseInt(tokens[1]));
            context.write(ip, ONE);
            System.out.println("IntPair in Mapper = " + ip.toString());
        }
    }
}
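On the ordering of the printed lines: the console output is consistent with the partitioner being called synchronously from inside context.write(), on the same thread as the mapper, rather than in parallel. That would explain why "Printing Result in Partitioner" always appears between the "token[0]" line and the "IntPair in Mapper" line. The sketch below only imitates that call order; SimplePartitioner and SimpleContext are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

class WriteOrderSketch {
    // Hypothetical stand-in for a partitioner (not the Hadoop Partitioner class).
    interface SimplePartitioner {
        int getPartition(int first, int numPartitions);
    }

    // Hypothetical stand-in for the object behind context.write().
    static class SimpleContext {
        private final SimplePartitioner partitioner;
        private final int numPartitions;
        private final List<String> log;

        SimpleContext(SimplePartitioner partitioner, int numPartitions, List<String> log) {
            this.partitioner = partitioner;
            this.numPartitions = numPartitions;
            this.log = log;
        }

        void write(int first, int second) {
            // The partition is computed before write() returns to the caller.
            int partition = partitioner.getPartition(first, numPartitions);
            log.add("partitioner: " + partition);
        }
    }

    // Imitates one map() call and records the order of events.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        SimpleContext context =
                new SimpleContext((first, n) -> (first & Integer.MAX_VALUE) % n, 1, log);
        log.add("map: before write");
        context.write(10, 10);
        log.add("map: after write");
        return log;
    }

    public static void main(String[] args) {
        // prints [map: before write, partitioner: 0, map: after write]
        System.out.println(run());
    }
}
```

In this sketch the partitioner line lands between the two mapper lines purely because of the nested call, with no threads involved.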
**** Partitioner class ***
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class SecondarySortPartitioner extends Partitioner<IntPair, IntWritable> {
    @Override
    public int getPartition(IntPair key, IntWritable value, int numOfPartitions) {
        // Mask off the sign bit so a negative hash code cannot yield a negative partition.
        int result = (key.getFirst().hashCode() & Integer.MAX_VALUE) % numOfPartitions;
        System.out.println("Printing Result in Partitioner = " + result);
        return result;
    }
}
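On the partition value itself: every record in the console output lands in partition 0, which is what the modulo in getPartition produces whenever numOfPartitions is 1. The driver is not shown, so a single reduce task is an assumption here. The sketch below works through the arithmetic with a hypothetical helper, assuming the hash code of the first field equals its integer value (true for java.lang.Integer; IntPair is not shown, so this is also an assumption).

```java
class PartitionSketch {
    // Same arithmetic as a hash-modulo partitioner, with the sign bit masked
    // so that negative hash codes still map to a valid partition index.
    static int partition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int[] firsts = {10, 20, 30, 40, 50, 60};
        for (int first : firsts) {
            int hash = Integer.valueOf(first).hashCode(); // equals the int value
            System.out.println("first=" + first
                    + " partitions=1 -> " + partition(hash, 1)
                    + " partitions=3 -> " + partition(hash, 3));
        }
        // Without the mask, a negative hash gives a negative remainder in Java.
        System.out.println(-7 % 3); // prints -1
    }
}
```

With one partition every key maps to 0; with three partitions the same keys would spread over 0, 1 and 2.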
*** input file ***
10 10
20 200
30 2500
40 400
50 500
60 1
10 10
30 2500
50 500
10 100
20 2000
30 25000
40 4000
50 5000
60 10
10 100
30 25000
50 5000
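On the key values: the keys in the console output are consistent with the byte offset at which each input line starts, which is what Hadoop's default TextInputFormat passes to the mapper as the LongWritable key. The driver is not shown, so the input format is an assumption, as are Unix line endings (one '\n' byte per line). A minimal sketch of the arithmetic, using a hypothetical helper class that is not part of the job:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineOffsets {
    // Returns the byte offset at which each line starts, assuming every
    // line is terminated by a single '\n' byte.
    static List<Long> startOffsets(List<String> lines) {
        List<Long> offsets = new ArrayList<>();
        long position = 0;
        for (String line : lines) {
            offsets.add(position);
            // Advance past the line's bytes plus one byte for the newline.
            position += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // First six lines of the input file above.
        List<String> lines =
                Arrays.asList("10 10", "20 200", "30 2500", "40 400", "50 500", "60 1");
        System.out.println(startOffsets(lines)); // prints [0, 6, 13, 21, 28, 35]
    }
}
```

"10 10" plus its newline occupies 6 bytes, so the second line starts at offset 6, and so on; the seventh line would start at offset 40, matching "key = 40" in the console output.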
*** console output ***
...
13/03/27 02:20:57 INFO mapred.MapTask: data buffer = 79691776/99614720
13/03/27 02:20:57 INFO mapred.MapTask: record buffer = 262144/327680
key = 0 value = 10 10
token[0] = 10 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 10-10
key = 6 value = 20 200
token[0] = 20 token[1] = 200
Printing Result in Partitioner = 0
IntPair in Mapper = 20-200
key = 13 value = 30 2500
token[0] = 30 token[1] = 2500
Printing Result in Partitioner = 0
IntPair in Mapper = 30-2500
key = 21 value = 40 400
token[0] = 40 token[1] = 400
Printing Result in Partitioner = 0
IntPair in Mapper = 40-400
key = 28 value = 50 500
token[0] = 50 token[1] = 500
Printing Result in Partitioner = 0
IntPair in Mapper = 50-500
key = 35 value = 60 1
token[0] = 60 token[1] = 1
Printing Result in Partitioner = 0
IntPair in Mapper = 60-1
key = 40 value = 10 10
token[0] = 10 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 10-10
key = 46 value = 30 2500
token[0] = 30 token[1] = 2500
Printing Result in Partitioner = 0
IntPair in Mapper = 30-2500
key = 54 value = 50 500
token[0] = 50 token[1] = 500
Printing Result in Partitioner = 0
IntPair in Mapper = 50-500
key = 61 value = 10 100
token[0] = 10 token[1] = 100
Printing Result in Partitioner = 0
IntPair in Mapper = 10-100
key = 68 value = 20 2000
token[0] = 20 token[1] = 2000
Printing Result in Partitioner = 0
IntPair in Mapper = 20-2000
key = 76 value = 30 25000
token[0] = 30 token[1] = 25000
Printing Result in Partitioner = 0
IntPair in Mapper = 30-25000
key = 85 value = 40 4000
token[0] = 40 token[1] = 4000
Printing Result in Partitioner = 0
IntPair in Mapper = 40-4000
key = 93 value = 50 5000
token[0] = 50 token[1] = 5000
Printing Result in Partitioner = 0
IntPair in Mapper = 50-5000
key = 101 value = 60 10
token[0] = 60 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 60-10
key = 107 value = 10 100
token[0] = 10 token[1] = 100
Printing Result in Partitioner = 0
IntPair in Mapper = 10-100
key = 114 value = 30 25000
token[0] = 30 token[1] = 25000
Printing Result in Partitioner = 0
IntPair in Mapper = 30-25000
key = 123 value = 50 5000
token[0] = 50 token[1] = 5000
Printing Result in Partitioner = 0
IntPair in Mapper = 50-5000
Thanks
Sai