Hi Bryan, yep, that same advice is in the HBase book: http://hbase.apache.org/book.html#mapreduce.specex
That's a good suggestion, and perhaps moving that config to TableMapReduceUtil would be beneficial.

On 9/10/11 4:22 PM, "Bryan Keller" <[email protected]> wrote:

>I believe there is a problem with Hadoop's speculative execution (which
>is on by default), and HBase's TableOutputFormat. If I understand
>correctly, speculative execution can launch the same task on multiple
>nodes, but only "commit" the one that finishes first. The other tasks
>that didn't complete are killed.
>
>I encountered some strange behavior with speculative execution and
>TableOutputFormat. It looks like context.write() will submit the rows to
>HBase (when the write buffer is full). But there is no "rollback" if the
>task that submitted the rows did not finish first and is later killed.
>The rows remain submitted.
>
>My particular job uses a partitioner so one node will process all records
>that match the partition. The reducer selects among the records and
>persists these to HBase. With speculative execution turned on, the
>reducer for the partition is actually run on 2 nodes, and both end up
>inserting into HBase, even though the second reducer is eventually
>killed. The results were not what I wanted.
>
>Turning off speculative execution resolved my issue. I think this should
>be set off by default when using TableOutputFormat, unless there is a
>better way to handle this.
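For anyone hitting the same thing, a minimal sketch of what the book's advice looks like in job-setup code. This is a config fragment, not a complete job; the property names are the Hadoop 1.x-era ones, and the "mytable" table name is just a placeholder:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.mapreduce.Job;

public class DisableSpecExSketch {
    public static Job createJob() throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Turn off speculative execution for both map and reduce tasks,
        // so a killed duplicate attempt can't have already flushed rows
        // to HBase via TableOutputFormat's write buffer.
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);

        // "mytable" is a placeholder target table for this sketch.
        conf.set(TableOutputFormat.OUTPUT_TABLE, "mytable");

        Job job = new Job(conf, "hbase-insert");
        job.setOutputFormatClass(TableOutputFormat.class);
        // ... set mapper/reducer, partitioner, etc. as usual ...
        return job;
    }
}
```

If TableMapReduceUtil grew this config, callers of initTableReducerJob() would get it for free instead of each job having to remember it.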
