> I know I could just write a script that examines the output from the > reducers and picks out the value with the highest temperature. >
In general, I think that this is an acceptable solution. However, if you have a very large number of reducer (let's say, large hundreds or thousands), then reading the content of thousands of files from HDFS can be slow. I would slightly optimize it by using MutlipleOutputs class ( http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html), and configure it in such a way, that each reducer is writing empty output to a file named max-temp_reducer-id.part. Then the last file in your output directory is the one that contains the maximal temperature in first part of filename (alternatively, you can use a name convention <MAX-LONG-max-temp>_reducer-id.part, so that the file with the maximal temp will be the first one in your output directory -> might it be faster?). Obviously to minimize the number of reduce tasks, you can use Combiner class in this case, so that reducers will have less data to process. An alternative way that I can imagine: If you have a couple of reducers only, each of them can publish maximal temperature as a counter, and when the job finishes, you can get counters from Job object and find the one with the highest temperature. Please not that counters are "expensive" (at least in MRv1), so that you should not have many of them (up to tens, but rather not hundreds http://developer.yahoo.com/blogs/hadoop/apache-hadoop-best-practices-anti-patterns-465.html )
