Deepika,

It is up to you to decide the granularity of transactions, whether at the reduce level or at the M/R job level. Your application just needs to be able to (efficiently) rerun the transaction in case of an abort. The transaction feature provides your application with two good properties: (i) ignoring the partial changes made by failed clients, and (ii) providing isolation between concurrent transactions. I guess what you need from transactions in an M/R job is the former. This is because when the write sets of transactions are large, the probability of a write-write conflict between two transactions becomes high, and it is hard to make progress with so many aborts. If you are planning to run long transactions (with large write sets) in parallel, write-write conflicts should be avoided at the application layer by having the concurrent transactions write to different data elements. In this case, Omid could also be optimized by disabling the submission of the write-set ids to the status oracle.
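
To illustrate the rerun-on-abort pattern and why disjoint write sets help, here is a minimal sketch in Python. It uses a toy in-memory oracle, not Omid's actual API: the names ToyOracle, Abort, and run_with_retry are hypothetical and exist only for this example. The oracle aborts a transaction if any row in its write set was committed after the transaction's snapshot was taken.

```python
class Abort(Exception):
    """Raised when a transaction loses a write-write conflict."""


class ToyOracle:
    """In-memory stand-in for a status oracle (hypothetical, NOT Omid's API)."""

    def __init__(self):
        self.clock = 0
        self.last_commit = {}  # row key -> commit timestamp of its last writer

    def begin(self):
        # The start timestamp identifies the snapshot the transaction reads.
        self.clock += 1
        return self.clock

    def commit(self, start_ts, write_set):
        # Abort if any row was committed after this transaction's snapshot.
        for row in write_set:
            if self.last_commit.get(row, 0) > start_ts:
                raise Abort(row)
        self.clock += 1
        for row in write_set:
            self.last_commit[row] = self.clock
        return self.clock


def run_with_retry(oracle, body, max_attempts=5):
    """Rerun body (which returns its write set) until it commits.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        start_ts = oracle.begin()
        write_set = body(start_ts)
        try:
            oracle.commit(start_ts, write_set)
            return attempt
        except Abort:
            continue  # partial work is discarded; rerun on a fresh snapshot
    raise RuntimeError("transaction kept aborting")


if __name__ == "__main__":
    oracle = ToyOracle()

    # Overlapping write sets: the second committer is aborted.
    t1, t2 = oracle.begin(), oracle.begin()
    oracle.commit(t1, {"a", "b"})
    try:
        oracle.commit(t2, {"b", "c"})
    except Abort as conflict:
        print("aborted on row", conflict)

    # Disjoint write sets (e.g. one key range per reducer): both commit.
    t3, t4 = oracle.begin(), oracle.begin()
    oracle.commit(t3, {"x"})
    oracle.commit(t4, {"y"})
```

In this toy model, the larger the overlapping write sets, the more often commit raises Abort and the more work run_with_retry has to redo, which is why partitioning the rows among concurrent transactions matters for long-running jobs.
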
Cheers - Maysam Yabandeh

On Mar 20, 2012, at 2:13 AM, Deepika Khera wrote:

Thanks Maysam. I am trying out Omid to see if it will fit my needs. As I told you, I am writing to HBase from map reduce jobs. If my commit and rollback are around a reducer task, then it will be quite straightforward. But if the commit should happen only if all tasks of the M/R job succeed (which is what I would want, because if some reducer tasks succeed and some fail, it will not be possible to rerun partial data), it gets tricky. Am I on the wrong track?

Thanks,
Deepika

On Mon, 2012-03-19 at 11:44 -0700, Maysam Yabandeh wrote:

Hi Deepika,

Omid provides Snapshot Isolation (SI), which is a well-known isolation guarantee in database systems such as Oracle. In short, each transaction reads from a consistent snapshot that does not include partial changes by concurrent (or failed) transactions. SI also prevents write-write conflicts between concurrent transactions. The overhead of Omid on HBase is negligible, and it does not require any changes to HBase, with the only exception of the HBase garbage collection algorithm, which is replaced via a coprocessor. hbase-trx, on the other hand, does not provide read snapshots and is not safe with client failures. You can find a more detailed comparison in the Omid wiki page: https://github.com/yahoo/omid/wiki

Cheers - Maysam Yabandeh

On Mar 19, 2012, at 6:49 PM, Deepika Khera wrote:

Hi,

I have some map reduce jobs that write to HBase. I am trying to pick a library that could provide transactional support for HBase. I looked at Omid and hbase-trx. Could you please provide me with a comparison between the two so I can make the right choice? Are there any other ways to do this?

Thanks,
Deepika
