Deepika,

It is up to you to decide the granularity of transactions, whether at the
reduce level or at the M/R job level. Your application just needs to be able to
(efficiently) rerun the transaction in case of an abort. The transaction
feature provides your application with two useful properties: (i) ignoring the
partial changes made by failed clients, and (ii) providing isolation between
concurrent transactions. I guess what you need from transactions in an M/R job
is the former. This is because when the write sets of transactions are large,
the probability of a write-write conflict between two transactions becomes
high, and it is hard to make progress with so many aborts. If you are planning
to run long transactions (with large write sets) in parallel, write-write
conflicts should be avoided at the application layer by having the concurrent
transactions write to different data elements. In this case, Omid could also
be optimized by disabling the submission of the write-set ids to the status
oracle.
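
As a rough illustration of the two application-side pieces mentioned above,
here is a minimal Python sketch: a loop that reruns a transaction from scratch
when it aborts, and a deterministic key-to-task mapping that keeps the write
sets of parallel tasks disjoint. All names here (Aborted, run_transaction,
owner, and the begin/commit/rollback methods) are hypothetical placeholders
and not Omid's client API.

```python
import zlib


class Aborted(Exception):
    """Hypothetical signal that the transaction manager rejected a commit."""


def run_transaction(begin, body, max_attempts=3):
    """Run body(txn) in a transaction, rerunning it from scratch on abort.

    `begin` is any callable returning an object with commit() and rollback().
    Returns the number of attempts it took to commit.
    """
    for attempt in range(1, max_attempts + 1):
        txn = begin()
        try:
            body(txn)       # must be safe to rerun: no side effects outside txn
            txn.commit()
            return attempt
        except Aborted:
            txn.rollback()  # discard partial changes, then try again
    raise RuntimeError(f"transaction aborted {max_attempts} times")


def owner(row_key, num_tasks):
    """Map a row key to exactly one task so parallel write sets never overlap.

    crc32 is used instead of hash() because hash() of a str can differ
    between Python processes.
    """
    return zlib.crc32(row_key.encode("utf-8")) % num_tasks
```

If every task only writes rows for which owner(row_key, num_tasks) equals its
own task index, two concurrent transactions cannot touch the same row, so the
retry loop should rarely need more than one attempt.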

Cheers
- Maysam Yabandeh

On Mar 20, 2012, at 2:13 AM, Deepika Khera wrote:

Thanks Maysam. I am trying out Omid to see if it will fit my needs.

As I told you, I am writing to HBase from a map reduce job. If my commit
and rollback are around a reducer task, then it will be quite
straightforward. But if the commit should happen only if all tasks of the
M/R job succeed (which is what I would want, because if some reducer tasks
succeed and some fail, it will not be possible to rerun partial data),
it gets tricky.
Am I on the wrong track?

Thanks,
Deepika

On Mon, 2012-03-19 at 11:44 -0700, Maysam Yabandeh wrote:
Hi Deepika,

Omid provides Snapshot Isolation (SI), which is a well-known isolation
guarantee in database systems such as Oracle. In short, each transaction reads
from a consistent snapshot that does not include partial changes by concurrent
(or failed) transactions. SI also prevents write-write conflicts between
concurrent transactions. The overhead of Omid on HBase is negligible, and it
does not require any changes to HBase, with the sole exception of HBase's
garbage collection algorithm, which is replaced via a coprocessor. hbase-trx,
on the other hand, does not provide read snapshots and is not safe in the
presence of client failures. You can find a more detailed comparison on the
Omid wiki page:
https://github.com/yahoo/omid/wiki
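
To make the SI description above concrete, here is a toy, in-memory Python
model of snapshot reads and write-write conflict detection. It is only a
sketch of the general technique under simplifying assumptions (a single
process, a logical clock, buffered writes); the class names ToyStore, Txn and
Conflict are made up for illustration and this is not how Omid is implemented.

```python
class Conflict(Exception):
    """Raised when a commit is rejected because of a write-write conflict."""


class ToyStore:
    """In-memory multi-version store with a central commit check."""

    def __init__(self):
        self.clock = 0         # logical timestamp source
        self.versions = {}     # key -> list of (commit_ts, value), ascending
        self.last_commit = {}  # key -> commit_ts of the latest committed write

    def begin(self):
        self.clock += 1
        return Txn(self, self.clock)


class Txn:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}       # buffered writes, invisible to others until commit

    def read(self, key):
        if key in self.writes:  # a transaction sees its own writes
            return self.writes[key]
        # Otherwise return the newest version committed before this snapshot.
        history = self.store.versions.get(key, [])
        visible = [value for ts, value in history if ts < self.start_ts]
        return visible[-1] if visible else None

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        store = self.store
        # Abort if any key in the write set was committed after our snapshot.
        for key in self.writes:
            if store.last_commit.get(key, 0) > self.start_ts:
                raise Conflict(key)
        store.clock += 1
        for key, value in self.writes.items():
            store.versions.setdefault(key, []).append((store.clock, value))
            store.last_commit[key] = store.clock
        return store.clock
```

In this model, if two transactions begin concurrently and both write the same
key, the first to commit succeeds and the second raises Conflict. A client
that fails before calling commit() leaves nothing behind that later
transactions can read.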

Cheers
- Maysam Yabandeh

On Mar 19, 2012, at 6:49 PM, Deepika Khera wrote:

Hi,

I have some map reduce jobs that write to HBase. I am trying to pick a
library that could provide transactional support for HBase. I looked at
Omid and hbase-trx.

Could you please provide me with a comparison between the two so I can
make the right choice?
Are there any other ways to do this?

Thanks,
Deepika







