Hi Bryan, yep, that same advice is in the HBase book.

http://hbase.apache.org/book.html#mapreduce.specex

That's a good suggestion, and perhaps moving that config to
TableMapReduceUtil would be beneficial.
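In the meantime, speculative execution can be disabled per job before submission. A minimal sketch using the org.apache.hadoop.mapreduce API (the class name, job name, and table name here are illustrative, not from this thread):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.mapreduce.Job;

public class NoSpecExJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hbase-load");

    // Writes to HBase are not transactional: a killed speculative
    // attempt may already have flushed Puts from its write buffer.
    // Disable both map- and reduce-side speculation for safety.
    job.setMapSpeculativeExecution(false);
    job.setReduceSpeculativeExecution(false);

    job.setOutputFormatClass(TableOutputFormat.class);
    job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "my_table");
    // ... set mapper/reducer classes and input format as usual ...
  }
}
```

Having TableMapReduceUtil.initTableReducerJob() set these two flags itself would make the safe behavior the default for jobs that write through TableOutputFormat.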




On 9/10/11 4:22 PM, "Bryan Keller" <[email protected]> wrote:

>I believe there is a problem with Hadoop's speculative execution (which
>is on by default), and HBase's TableOutputFormat. If I understand
>correctly, speculative execution can launch the same task on multiple
>nodes, but only "commit" the one that finishes first. The other tasks
>that didn't complete are killed.
>
>I encountered some strange behavior with speculative execution and
>TableOutputFormat. It looks like context.write() submits rows to HBase
>whenever the write buffer fills, but there is no "rollback" if the task
>that submitted them does not finish first and is later killed. The rows
>remain in the table.
>
>My particular job uses a partitioner so one node will process all records
>that match the partition. The reducer selects among the records and
>persists these to HBase. With speculative execution turned on, the
>reducer for the partition is actually run on 2 nodes, and both end up
>inserting into HBase, even though the second reducer is eventually
>killed. The results were not what I wanted.
>
>Turning off speculative execution resolved my issue. I think it should
>be turned off by default when using TableOutputFormat, unless there is a
>better way to handle this.
