Re: Omid: Transactional Support for HBase

Daniel Gómez Ferro Tue, 08 Nov 2011 01:49:10 -0800

Hi Jignesh

On Nov 7, 2011, at 21:44 , Jignesh Patel wrote:


> Looks like this transaction is limited for one row. Is that correct?
> 

No, it's not. Transactions can span multiple rows.
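
To illustrate why several rows are not a problem, here is a minimal sketch of commit-time write-write conflict detection over a multi-row write set. It is a toy model under my own assumptions, not Omid's actual code: the class StatusOracle and the methods begin() and try_commit() are hypothetical names, and everything is kept in memory in one process.

```python
class StatusOracle:
    """Toy stand-in for a status oracle (hypothetical, not Omid's API)."""

    def __init__(self):
        self._clock = 0
        self._last_commit = {}  # row key -> commit timestamp of its last writer

    def _next_ts(self):
        self._clock += 1
        return self._clock

    def begin(self):
        """Hand out a start timestamp for a new transaction."""
        return self._next_ts()

    def try_commit(self, start_ts, written_rows):
        """Return a commit timestamp, or None if the transaction must abort."""
        # Write-write conflict: some row in the write set was committed by
        # another transaction after this transaction started.
        for row in written_rows:
            if self._last_commit.get(row, 0) > start_ts:
                return None
        commit_ts = self._next_ts()
        for row in written_rows:
            self._last_commit[row] = commit_ts
        return commit_ts


oracle = StatusOracle()
t1 = oracle.begin()
t2 = oracle.begin()
c1 = oracle.try_commit(t1, {"row-a", "row-b"})  # commits, touches two rows
c2 = oracle.try_commit(t2, {"row-b", "row-c"})  # aborts: row-b changed after t2 began
```

The write set is just a set of row keys, so the check does not care how many rows a transaction touches.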

> Another thing: I don't have ZooKeeper installed, as I am running in
> pseudo-distributed mode. The document doesn't say anything about
> integrating in pseudo-distributed mode.
> 

Currently Omid requires both ZooKeeper and BookKeeper to operate, but we 
provide some scripts to launch them locally if you just want to try it. I've 
just pushed a change so you don't need to install anything manually: just 
download or check out Omid, run 'mvn package', and follow the instructions to 
run the benchmark locally.

If people still find it cumbersome or difficult to run ZK/BK, we could provide 
an option to disable the replication to the WAL.

Daniel

> -Jignesh
> 
> 2011/11/7 Daniel Gómez Ferro <[email protected]>:
>> 
>> On Nov 6, 2011, at 21:53 , lars hofhansl wrote:
>> 
>>> Another question: I assume this will not work out of the box with deletes?
>> 
>> Hi,
>> 
>> Our current approach does support deletes (i.e., user requested deletes). 
>> Right now we use empty values as delete marks: when the user calls 
>> TransactionalTable.delete() we insert empty values at the specified 
>> timestamp. At filtering time, we keep track of these delete marks and we 
>> can discard the ones that are uncommitted or fall outside our time range of 
>> interest. When a transaction aborts, the cleanup procedure deletes only the 
>> specific values inserted by that transaction (rather than all versions). 
>> This way we don't insert delete tombstones that mask previous values.
>> 
>> The drawbacks of this approach are that (i) we give a special meaning to the 
>> empty values, and (ii) to delete the whole column family (in contrast with a 
>> column) we have to perform a get beforehand to obtain the column qualifiers.
>> 
>>> 
>>> Deletes always cover all key values in the past (from their timestamps on 
>>> backwards), so once a delete marker is placed there is no way to get back 
>>> any of the puts it affects.
>>> 
>>> HBase trunk has HBASE-4536 to allow time-range scans to work with deleted 
>>> rows (but needs to be enabled for a column family - I still think it should 
>>> be the default, but anyway).
>>> 
>> 
>> I think this feature would be very useful, and it enables a cleaner 
>> implementation. It would be great if the flag were enabled by default, since 
>> we want users to change their setup as little as possible, but it's not a 
>> big deal.
>> 
>>> -- Lars
>>> 
>>> ________________________________
>>> From: Flavio Junqueira <[email protected]>
>>> To: Daniel Gómez Ferro <[email protected]>
>>> Cc: "[email protected]" <[email protected]>; lars hofhansl 
>>> <[email protected]>; "[email protected]" <[email protected]>; 
>>> Maysam Yabandeh <[email protected]>; Benjamin Reed 
>>> <[email protected]>; Ivan Kelly <[email protected]>
>>> Sent: Sunday, November 6, 2011 7:14 AM
>>> Subject: Re: Omid: Transactional Support for HBase
>>> 
>>> 
>>> A quick note on Omid for the ones following on github: the repository we 
>>> will be working with is the fork under the Yahoo! account:
>>> 
>>> 
>>> https://github.com/yahoo/omid/
>>> 
>>> -Flavio
>>> 
>>> 
>>> On Nov 5, 2011, at 9:36 PM, Daniel Gómez Ferro wrote:
>>> 
>>> 
>>>> 
>>>> On Nov 5, 2011, at 05:37 , lars hofhansl wrote:
>>>> 
>>>>> Cool stuff Daniel,
>>>>> 
>>>> 
>>>> Hi Lars,
>>>> 
>>>> Thanks for the good points.
>>>> 
>>>> 
>>>> 
>>>>> Was looking through the code a bit. Seems like you make a best effort to 
>>>>> push as much of the filtering of KVs of uncommitted transactions to HBase 
>>>>> and then do some filtering on the client. Not a bad approach. (I hope I 
>>>>> didn't misunderstand the approach; I only looked through the code for 
>>>>> 1/2 hour or so.)
>>>>> 
>>>> 
>>>> To put it more accurately, the uncommitted KVs are stored in HBase, but 
>>>> it is the client's job to filter them using the commit information that it 
>>>> has received from the status oracle. According to the snapshot isolation 
>>>> guarantee, all versions inserted with a timestamp larger than the 
>>>> transaction's start timestamp must be ignored, which is done by setting 
>>>> the time range on the client's get request sent to HBase. Since the 
>>>> uncommitted changes of aborted transactions are eventually removed from 
>>>> HBase, the client rarely needs to fetch more than one version to reach a 
>>>> KV that was committed before the transaction started (the first property 
>>>> of snapshot isolation).
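
The read rule described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual client code: visible_value() and commit_ts_of are hypothetical names, each version is assumed to be stored at its writer's start timestamp, and the transaction's own writes are not handled.

```python
def visible_value(versions, start_ts, commit_ts_of):
    """Pick the value a transaction starting at start_ts should read.

    versions:     list of (write_ts, value) pairs for one cell
    commit_ts_of: dict mapping a writer's start timestamp to its commit
                  timestamp; writers missing from it are uncommitted
    """
    # Time range: ignore versions written after our start timestamp
    # (in the thread this is done by the time range on the get request).
    candidates = [(ts, value) for ts, value in versions if ts <= start_ts]

    # Client-side filtering: walk from newest to oldest and return the first
    # version whose writer committed before we started.
    for ts, value in sorted(candidates, key=lambda pair: pair[0], reverse=True):
        commit_ts = commit_ts_of.get(ts)
        if commit_ts is not None and commit_ts < start_ts:
            return value
    return None  # nothing committed is visible in this snapshot


cell = [(2, "old"), (5, "pending"), (9, "future")]
snapshot_read = visible_value(cell, start_ts=7, commit_ts_of={2: 3})
```

Walking by write timestamp is enough here because two committed writers of the same cell cannot overlap in time, which is what write-write conflict detection guarantees.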
>>>> 
>>>> 
>>>>> 
>>>>> One thing I was wondering: Why bookkeeper? Why not store the WAL itself 
>>>>> in HBase? That way
>>>>> you might not even need a separate server.
>>>>> 
>>>>> Did you see: HBaseSI (http://www.cs.uwaterloo.ca/~c15zhang/HBaseSI.pdf), 
>>>>> they also do MVCC
>>>>> on top of unaltered HBase/schema, although from reading that paper I get 
>>>>> the impression that it
>>>>> would not scale to scans touching many rows (which is where your client 
>>>>> side filtering comes in).
>>>>> 
>>>> 
>>>> 
>>>> Thanks for the link. We had seen the other paper by the same authors 
>>>> (Grid2010), which shares the same bottlenecks as the recent work.
>>>> As you correctly pointed out, the question is about performance. You can 
>>>> see the scalability bottleneck of 400 TPS in the evaluation section of 
>>>> this paper. Our approach, however, provides snapshot isolation with a 
>>>> negligible overhead on region servers, and could scale up to tens of 
>>>> thousands of write transactions per second. If you are interested, a 
>>>> summary of the techniques that we used to achieve this performance was 
>>>> published in the SOSP'11 poster section:
>>>> http://sigops.org/sosp/sosp11/posters/summaries/sosp11-final12.pdf
>>>> 
>>>> 
>>>>> -- Lars
>>>>> 
>>>>> 
>>>>> ----- Original Message -----
>>>>> From: Daniel Gómez Ferro <[email protected]>
>>>>> To: "[email protected]" <[email protected]>; 
>>>>> "[email protected]" <[email protected]>
>>>>> Cc: Maysam Yabandeh <[email protected]>; Flavio Junqueira 
>>>>> <[email protected]>; Benjamin Reed <[email protected]>; Ivan Kelly 
>>>>> <[email protected]>
>>>>> Sent: Friday, November 4, 2011 4:24 AM
>>>>> Subject: Omid: Transactional Support for HBase
>>>>> 
>>>>> (I apologize for resending but I forgot to add the user list.)
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> It is my pleasure to announce the open source release of Omid, a project 
>>>>> whose goal is to add lock-free transactional support on top of HBase. The 
>>>>> current release includes CrSO, a client-replicated status oracle that 
>>>>> detects the write-write conflicts to provide Snapshot Isolation. CrSO has 
>>>>> the following appealing properties:
>>>>> 
>>>>> 1) It does not require any modification to the HBase code or the table 
>>>>> schema.
>>>>> 2) The overhead on HBase DataNodes is negligible (only after an abort).
>>>>> 3) It scales up to 50,000 write transactions per second (TPS) and a 
>>>>> thousand client connections.
>>>>> 
>>>>> We have set up a GitHub project: https://github.com/dgomezferro/omid
>>>>> 
>>>>> More information is available at the wiki: 
>>>>> https://github.com/dgomezferro/omid/wiki
>>>>> 
>>>>> If you are interested, installation and running instructions are 
>>>>> available on the README: 
>>>>> https://github.com/dgomezferro/omid/blob/master/README.md
>>>>> 
>>>>> Please do not hesitate to contact us if you have any questions.
>>>>> 
>>>>> Best Regards,
>>>>> Daniel Gómez Ferro
>>>>> 
>>>>> 
>>>> 
>>> 
>>> flavio
>>> junqueira
>>> 
>>> research scientist
>>> 
>>> [email protected]
>>> direct +34 93-183-8828
>>> 
>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>>> phone (408) 349 3300    fax (408) 349 3301
>> 
>>
