On Jul 16, 2010, at 6:41 PM, Michael Segel wrote:


Thanks for the response.
(You don't need to include the cc ...)

With respect to the row level locking ...
I was interested in when the lock is actually acquired, how long the lock 
persists, and when the lock is released.
From your response, the lock is only held while updating the row, that is, 
while the data is being written to the memory cache, which is later written to 
disk. (Note: This row level locking is different from transactional row level 
locking.)

Now that I've had some caffeine I think I can clarify... :-)

Some of my developers complained that they were having trouble with two 
different processes trying to update the same table.
Not sure why they were having the problem, so I wanted to have a good fix. The 
simple fix was to have them call close() on the HTable connection, which 
forces any resources that they acquired to be released.


It would help to know what the exact problem was. Normally I wouldn't expect 
any problems.


In looking at the problem... it's possible that they didn't have AutoFlush set 
to true, so the write was still in the buffer and hadn't gotten flushed.

If the lock only persists for the duration of the write to memory and is then 
released, then the issue could have been that the record written was in the 
buffer and not yet flushed to disk.


At the region server level HBase will use the cache for both reads and writes. 
This happens transparently for the user. Once something is written in the 
cache, all other clients will read from the same cache. No need to worry if the 
cache has been flushed.
Lars George has a good article about the hbase storage architecture 
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

I'm also assuming that when you run a scan() against a region that any 
information written to buffer but not yet written to disk will be missed.


When you do puts into HBase you use HTable. The HTable instance lives on the 
client. HTable keeps its own write buffer, and if autoFlush is false it only 
flushes when you call flushCommits(), when the buffer reaches its size limit, 
or when you close the table. With autoFlush set to true it flushes on every put.
This buffer is on the client. Only when the data is actually flushed does it 
reach the region server, where it goes into the region server cache and the WAL.
Until a client flushes a put, no other client can see the data, because it 
still resides only on that client. Depending on what you need, you can use 
autoFlush true if you are doing many small writes that others need to see 
immediately. Otherwise you can use autoFlush false and either call 
flushCommits() yourself or rely on the buffer limit.
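
To make the visibility point concrete, here is a minimal toy model in plain 
Java. It is not the HBase client API: ToyServer, ToyTable and the 
count-based bufferLimit are hypothetical names invented for illustration (the 
real buffer limit is a size, not a count). It only shows that a buffered put 
stays invisible to a second client until it is flushed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the region server: one shared in-memory store.
class ToyServer {
    private final Map<String, String> rows = new ConcurrentHashMap<>();
    void write(String row, String value) { rows.put(row, value); }
    String read(String row) { return rows.get(row); }
}

// Hypothetical stand-in for a client-side table handle with a write buffer.
class ToyTable implements AutoCloseable {
    private final ToyServer server;
    private final boolean autoFlush;
    private final int bufferLimit;                 // assumption: counted in puts
    private final List<String[]> buffer = new ArrayList<>();

    ToyTable(ToyServer server, boolean autoFlush, int bufferLimit) {
        this.server = server;
        this.autoFlush = autoFlush;
        this.bufferLimit = bufferLimit;
    }

    // Buffers the edit locally; sends it only if autoFlush is on or the buffer is full.
    void put(String row, String value) {
        buffer.add(new String[] {row, value});
        if (autoFlush || buffer.size() >= bufferLimit) {
            flushCommits();
        }
    }

    // Pushes every buffered edit to the server and empties the buffer.
    void flushCommits() {
        for (String[] edit : buffer) {
            server.write(edit[0], edit[1]);
        }
        buffer.clear();
    }

    // Reads always go to the server, never to another client's buffer.
    String get(String row) { return server.read(row); }

    @Override
    public void close() { flushCommits(); }
}

class WriteBufferSketch {
    public static void main(String[] args) {
        ToyServer server = new ToyServer();
        ToyTable writer = new ToyTable(server, false, 100);
        ToyTable reader = new ToyTable(server, true, 100);

        writer.put("row1", "v1");
        System.out.println(reader.get("row1")); // null: still in the writer's buffer
        writer.flushCommits();
        System.out.println(reader.get("row1")); // v1: now on the server
    }
}
```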

So I guess the question isn't so much the issue of a lock, but that we need to 
make sure that data written to the buffer should be flushed ASAP unless we know 
that we're going to be writing a lot of data in the m/r job.


Usually when you write heavily from the reducer, it is better to use the 
buffer with autoFlush off to get good performance.
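
A rough way to see why, as a sketch only: count how many flushes a run of 
puts triggers. The class and method names and the count-based buffer limit 
are assumptions made for this example, and "one flush = one round trip" is a 
simplification.

```java
class FlushCountSketch {
    // Returns how many flushes the given number of puts triggers in this toy model.
    static int flushes(int puts, boolean autoFlush, int bufferLimit) {
        int buffered = 0;
        int flushes = 0;
        for (int i = 0; i < puts; i++) {
            buffered++;
            if (autoFlush || buffered >= bufferLimit) {
                flushes++;
                buffered = 0;
            }
        }
        if (buffered > 0) {
            flushes++; // leftover edits go out in a final flush, e.g. on close()
        }
        return flushes;
    }

    public static void main(String[] args) {
        System.out.println(flushes(10_000, true, 500));  // 10000
        System.out.println(flushes(10_000, false, 500)); // 20
    }
}
```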

Cosmin


Thx

-Mike



From: [email protected]
To: [email protected]
CC: [email protected]
Date: Fri, 16 Jul 2010 12:34:36 +0100
Subject: Re: Row level locking?

Currently a row is part of a region and there's a single region server serving 
that region at a particular moment.
So when that row is updated a lock is acquired for that row until the actual 
data is updated in memory (note that a put will be written to cache on the 
region server and also persisted in the write-ahead log - WAL). Subsequent puts 
to that row will have to wait for that lock.
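
As an illustration of that behaviour (not HBase's actual implementation), 
here is a toy region in plain Java with one lock per row. ToyRegion, its 
in-memory map and its list-based log are hypothetical, and the order of the 
log and memory writes inside the lock is an assumption of the sketch. The 
point is only that the lock covers the in-memory update and is released right 
after, so concurrent writers to the same row are serialized.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical toy region: per-row locks, an in-memory store, and a simple log.
class ToyRegion {
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    private final Map<String, String> memstore = new ConcurrentHashMap<>();
    private final List<String> wal = Collections.synchronizedList(new ArrayList<>());

    void put(String row, String value) {
        ReentrantLock lock = rowLocks.computeIfAbsent(row, k -> new ReentrantLock());
        lock.lock();                       // a second writer to the same row waits here
        try {
            wal.add(row + "=" + value);    // record the edit in the log
            memstore.put(row, value);      // apply it in memory
        } finally {
            lock.unlock();                 // released once the in-memory update is done
        }
    }

    String get(String row) { return memstore.get(row); }

    int walSize() { return wal.size(); }

    // Last logged value for a row, or null if the row was never written.
    String lastLogged(String row) {
        String prefix = row + "=";
        String last = null;
        synchronized (wal) {
            for (String entry : wal) {
                if (entry.startsWith(prefix)) {
                    last = entry.substring(prefix.length());
                }
            }
        }
        return last;
    }
}

class RowLockSketch {
    // Starts several writer threads that all put to the same row, then waits for them.
    static ToyRegion runWriters(int writers, int putsEach) {
        ToyRegion region = new ToyRegion();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < writers; w++) {
            final int id = w;
            Thread t = new Thread(() -> {
                for (int i = 0; i < putsEach; i++) {
                    region.put("row1", "w" + id + "-" + i);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return region;
    }

    public static void main(String[] args) {
        ToyRegion region = runWriters(4, 1000);
        System.out.println(region.walSize());
        // Because log append and memory update happen under one row lock,
        // the value in memory matches the last logged edit for that row.
        System.out.println(region.get("row1").equals(region.lastLogged("row1")));
    }
}
```

Writers to different rows take different locks in this model, so they do not 
wait on each other.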

HBase is fully consistent. This being said all the locking takes place at row 
level only, so when you scan you have to take that into account as there's no 
range locking.

I'm not sure I understand the resource releasing issue. HTable.close() flushes 
the current write buffer (you have a write buffer if you set autoFlush to 
false).
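
On the question of closing explicitly versus letting the object go out of 
scope: Java does not call close() when an object becomes unreachable, so 
anything still sitting in a client-side buffer is never sent. The sketch below 
uses a hypothetical BufferedHandle class (not the HBase API) to show the 
difference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handle: buffers edits locally and pushes them to a sink on close().
class BufferedHandle implements AutoCloseable {
    private final List<String> sink;
    private final List<String> buffer = new ArrayList<>();

    BufferedHandle(List<String> sink) { this.sink = sink; }

    void put(String edit) { buffer.add(edit); }

    @Override
    public void close() {
        sink.addAll(buffer);   // flush whatever is still buffered
        buffer.clear();
    }
}

class CloseSketch {
    // The handle is closed explicitly (here via try-with-resources), so the edit is flushed.
    static List<String> writeWithClose() {
        List<String> sink = new ArrayList<>();
        try (BufferedHandle handle = new BufferedHandle(sink)) {
            handle.put("row1=v1");
        }
        return sink;
    }

    // The handle just goes out of scope; close() is never called and nothing is flushed.
    static List<String> writeWithoutClose() {
        List<String> sink = new ArrayList<>();
        BufferedHandle handle = new BufferedHandle(sink);
        handle.put("row1=v1");
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(writeWithClose());    // [row1=v1]
        System.out.println(writeWithoutClose()); // []
    }
}
```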

Cosmin


On Jul 16, 2010, at 1:33 PM, Michael Segel wrote:


Ok,

First, I'm writing this before I've had my first cup of coffee so I am 
apologizing in advance if the question is a brain dead question....

Going from a relational background, some of these questions may not make sense 
in the HBase world.


When does HBase acquire a lock on a row and how long does it persist? Does the 
lock only hit the current row, or does it also lock the adjacent rows too?
Does HBase support the concept of 'dirty reads'?

The issue is what happens when you have two jobs trying to hit the same table 
at the same time and update/read the rows at the same time.

A developer came across a problem and the fix was to use the HTable.close() 
method to release any resources.

I am wondering if you explicitly have to clean up or can a lazy developer let 
the object just go out of scope and get GC'd.

Thx

-Mike

