Hello, I came across this post on Stack Overflow:
http://stackoverflow.com/questions/30079822/part-of-key-changes-when-iterating-through-values-when-using-composite-key-had

My question is about some Hadoop internals.

Basically, suppose that we have a list of (YEAR, TEMPERATURE) records and we want to do the following:

    SELECT MAX(TEMPERATURE), MIN(TEMPERATURE) GROUP BY YEAR

Using a secondary sort with the composite key <Year, Temperature> will do the trick. The key point is that the "Temperature" part of the key changes while iterating over the values (which are NullWritable), because we group only on the "Year" part of the key.

    protected void reduce(CompositeKey key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        for (NullWritable value : values) {
            // The "Temperature" part of the key changes here while iterating over values
        }
    }

I don't exactly understand the underlying mechanism. Do you have a pointer to the code that explains why:

- the GroupingComparator class results in a single "reduce" call per YEAR
- the value of the key (via the key reference) changes while iterating over the values

Thanks,
Thomas
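
For illustration, the behaviour described above can be reproduced without Hadoop. The sketch below is a plain-Java simulation, not Hadoop's actual code. It assumes that the framework keeps one key instance per reducer, overwrites it each time the value iterator advances, and uses the grouping comparator only to decide where a group (and therefore a reduce call) ends. All names here (KeyReuseSketch, GROUPING, run, and the fields of this CompositeKey) are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class KeyReuseSketch {

    // Hypothetical stand-in for the composite key <Year, Temperature>.
    static final class CompositeKey {
        int year;
        int temperature;

        void set(int year, int temperature) {
            this.year = year;
            this.temperature = temperature;
        }
    }

    // Stand-in for the grouping comparator: only the year decides group membership.
    static final Comparator<int[]> GROUPING = Comparator.comparingInt(r -> r[0]);

    // Simulates the reduce phase over records already sorted by (year, temperature).
    // Returns one "year min max" line per simulated reduce() call.
    static List<String> run(int[][] sorted) {
        List<String> out = new ArrayList<>();
        CompositeKey key = new CompositeKey(); // a single instance, reused for every record
        int[] pos = {0};                       // shared cursor into the sorted records
        while (pos[0] < sorted.length) {
            final int[] first = sorted[pos[0]];
            Iterable<Object> values = () -> new Iterator<Object>() {
                public boolean hasNext() {
                    // The group continues while the grouping comparator reports equality.
                    return pos[0] < sorted.length
                            && GROUPING.compare(first, sorted[pos[0]]) == 0;
                }

                public Object next() {
                    int[] r = sorted[pos[0]++];
                    key.set(r[0], r[1]); // advancing the values overwrites the shared key
                    return null;         // placeholder for NullWritable
                }
            };
            out.add(reduce(key, values)); // one reduce call per group
        }
        return out;
    }

    // Reads the temperature from the key on every iteration, as in the question.
    static String reduce(CompositeKey key, Iterable<Object> values) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (Object ignored : values) {
            min = Math.min(min, key.temperature);
            max = Math.max(max, key.temperature);
        }
        return key.year + " " + min + " " + max;
    }

    public static void main(String[] args) {
        int[][] sorted = { {1949, 78}, {1949, 111}, {1950, -11}, {1950, 0}, {1950, 22} };
        run(sorted).forEach(System.out::println);
    }
}
```

In this simulation the key reference passed to reduce never changes, only the fields behind it do, which is why reading key.temperature inside the loop yields a different value on each iteration. Whether Hadoop does exactly this internally is the assumption being asked about.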
