Hi,
I am trying to understand how the Hadoop map method compares to Spark's map, and I
noticed that Spark's map involves only three types: 1) input value, 2) output
key, 3) output value. In Hadoop, however, map has four: 1) input key, 2)
input value, 3) output key, 4) output value. Is there any reason it was
designed this way? Just trying to understand:
Hadoop:
public void map(K key, V val,
                OutputCollector<K, V> output, Reporter reporter)
--------
Spark:
// Count each word in each batch
JavaPairDStream<String, Integer> pairs = words.mapToPair(
    new PairFunction<String, String, Integer>() {
        @Override
        public Tuple2<String, Integer> call(String s) throws Exception {
            return new Tuple2<String, Integer>(s, 1);
        }
    });
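For what it's worth, one way to look at the difference: in Hadoop the framework always hands the mapper a key-value pair, so the input key is a separate parameter, while in Spark an RDD/DStream is just a collection of elements, and when those elements happen to be pairs, the key and value travel together as a single Tuple2 argument. A minimal plain-Java sketch (no Spark required, using Map.Entry as a stand-in for Tuple2) illustrating that one parameter is enough to carry both:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map.Entry;
import java.util.stream.Collectors;

public class MapShapes {
    // Spark-style map function: a single parameter, the whole element.
    // Here the element is itself a key-value pair (standing in for
    // Spark's Tuple2<K, V>), so the "input key" is still accessible.
    static Entry<String, Integer> sparkStyle(Entry<String, Integer> kv) {
        // Both the key and the value are available inside the function.
        return new SimpleEntry<>(kv.getKey().toUpperCase(), kv.getValue() + 1);
    }

    public static void main(String[] args) {
        List<Entry<String, Integer>> input = List.of(
                new SimpleEntry<>("a", 1),
                new SimpleEntry<>("b", 2));
        List<Entry<String, Integer>> out = input.stream()
                .map(MapShapes::sparkStyle)
                .collect(Collectors.toList());
        System.out.println(out);
    }
}
```

So Spark doesn't lose the input key; it just packages it inside the element instead of passing it as a separate argument (in your words.mapToPair example the input elements are plain Strings, so there is no input key at all).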