thanks Ryan! (I was about to look for performance numbers)

Just another question -- slightly related to locks. Will HBase 0.90
include HTable.checkAndPut receiving more than one value to check? I'm
eager to help, if possible.
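
To make the question concrete, here is a toy in-memory sketch of the semantics I mean. It is not HBase code: RowSketch and its methods are made-up names, and a column is just a "family:qualifier" string. As far as I know, the existing client call, HTable.checkAndPut(row, family, qualifier, value, put), compares a single cell; the sketch compares several cells and applies the update only if all of them match.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of one row; NOT the HBase API. All names are hypothetical.
class RowSketch {
    private final Map<String, String> cells = new HashMap<>();

    // Applies `updates` only if every column in `expected` currently holds
    // the expected value. A null expected value means "column must be absent".
    // The method is synchronized so the check and the put happen atomically.
    synchronized boolean checkAndPut(Map<String, String> expected,
                                     Map<String, String> updates) {
        for (Map.Entry<String, String> e : expected.entrySet()) {
            String current = cells.get(e.getKey());
            boolean matches = (e.getValue() == null)
                    ? current == null
                    : e.getValue().equals(current);
            if (!matches) {
                return false; // at least one check failed: nothing is written
            }
        }
        cells.putAll(updates);
        return true;
    }

    synchronized String get(String column) {
        return cells.get(column);
    }
}
```

With one entry in `expected` this behaves like a single-value check; with several it is the multi-value variant I am asking about.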

On Fri, Jul 16, 2010 at 5:58 PM, Guilherme Germoglio
<[email protected]> wrote:
>
> thanks Ryan! (I was about to look for performance numbers)
> Just another question -- slightly related to locks. Will HBase 0.90 include 
> HTable.checkAndPut receiving more than one value to check? I'm eager to help, 
> if possible.
> On Fri, Jul 16, 2010 at 5:24 PM, Ryan Rawson <[email protected]> wrote:
>>
>> Explicit locks with zookeeper would be (a) slow and (b) completely out
>> of band and ultimately up to you.  I wouldn't exactly be eager to do
>> our row locking in zookeeper (since the minimum operation time is
>> between 2-10ms).
>>
>> You could do application advisory locks, but that is true no matter
>> what datastore you use...
>>
>> On Fri, Jul 16, 2010 at 1:13 PM, Guilherme Germoglio
>> <[email protected]> wrote:
>> > What about implementing explicit row locks using ZooKeeper? I'm planning
>> > to do this sometime in the near future. Does anyone have any comments
>> > against this approach?
>> >
>> > (or maybe it was already implemented by someone :-)
>> >
>> > On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson <[email protected]> wrote:
>> >
>> >> HTable.close does very little:
>> >>
>> >>  public void close() throws IOException{
>> >>    flushCommits();
>> >>  }
>> >>
>> >>
>> >> None of which involves row locks.
>> >>
>> >> One thing to watch out for is to remember to close your scanners -
>> >> they continue to use server-side resources until you close them or 60
>> >> seconds pass and they get timed out.  Also be very wary of using any
>> >> of the explicit row locking calls; they are generally trouble for more
>> >> or less everyone.  There was a proposal to remove them, but I don't
>> >> think that went through.
>> >>
>> >>
>> >> On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene <[email protected]> wrote:
>> >> >
>> >> > On Jul 16, 2010, at 6:41 PM, Michael Segel wrote:
>> >> >
>> >> >
>> >> >
>> >> > Thanks for the response.
>> >> > (You don't need to include the cc ...)
>> >> >
>> >> > With respect to the row level locking ...
>> >> > I was interested in when the lock is actually acquired, how long the
>> >> > lock persists and when the lock is released.
>> >> > From your response, the lock is only held while updating the row, and
>> >> > while the data is being written to the memory cache which is then written
>> >> > to disk. (Note: This row level locking is different from transactional
>> >> > row level locking.)
>> >> >
>> >> > Now that I've had some caffeine I think I can clarify... :-)
>> >> >
>> >> > Some of my developers complained that they were having trouble with two
>> >> > different processes trying to update the same table.
>> >> > Not sure why they were having the problem, so I wanted to have a good
>> >> > fix. The simple fix was to have them call close() on the HTable
>> >> > connection, which forces any resources that they acquired to be released.
>> >> >
>> >> >
>> >> > It would help to know what the exact problem was. Normally I wouldn't
>> >> > see any problems.
>> >> >
>> >> >
>> >> > In looking at the problem... it's possible that they didn't have
>> >> > AutoFlush set to true, so the write was still in the buffer and hadn't
>> >> > gotten flushed.
>> >> >
>> >> > If the lock only persists for the duration of the write to memory and is
>> >> > then released, then the issue could have been that the record written was
>> >> > in the buffer and not yet flushed to disk.
>> >> >
>> >> >
>> >> > At the region server level HBase will use the cache for both reads and
>> >> > writes. This happens transparently for the user. Once something is written
>> >> > in the cache, all other clients will read from the same cache. No need to
>> >> > worry if the cache has been flushed.
>> >> > Lars George has a good article about the HBase storage architecture:
>> >> > http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
>> >> >
>> >> > I'm also assuming that when you run a scan() against a region, any
>> >> > information written to buffer but not yet written to disk will be missed.
>> >> >
>> >> >
>> >> > When you do puts into HBase you'll use HTable. The HTable instance is on
>> >> > the client. HTable keeps a buffer as well, and if autoFlush is false it
>> >> > only flushes when you call flushCommits(), when it reaches the buffer
>> >> > limit, or when you close the table. With autoFlush set to true it will
>> >> > flush on every put.
>> >> > This buffer is on the client. So when data is actually flushed it reaches
>> >> > the region server, where it goes into the region server cache and the WAL.
>> >> > Unless a client flushes the put, no other client can see the data because
>> >> > it still resides on the client only. Depending on what you need to do, you
>> >> > can use autoFlush true if you are doing many small writes that need to be
>> >> > seen immediately by others. You can use autoFlush false and issue
>> >> > flushCommits() yourself, or you can rely on the buffer limit for that.
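
A rough way to picture the client-side buffer described here is the toy model below. It is only a sketch, not the HBase client: WriteBufferSketch and visibleToOthers() are made-up names, the "server" is just a list, and the buffer limit is counted in puts for simplicity, whereas the real write buffer limit is a size in bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a client-side write buffer; NOT the HBase client.
// All names here are hypothetical.
class WriteBufferSketch {
    private final List<String> clientBuffer = new ArrayList<>(); // puts still on the client
    private final List<String> server = new ArrayList<>();       // what other clients can read
    private final boolean autoFlush;
    private final int bufferLimit;                                // flush threshold, in puts

    WriteBufferSketch(boolean autoFlush, int bufferLimit) {
        this.autoFlush = autoFlush;
        this.bufferLimit = bufferLimit;
    }

    void put(String cell) {
        clientBuffer.add(cell);
        // autoFlush sends every put right away; otherwise wait for the limit.
        if (autoFlush || clientBuffer.size() >= bufferLimit) {
            flushCommits();
        }
    }

    void flushCommits() {
        server.addAll(clientBuffer);
        clientBuffer.clear();
    }

    void close() {
        flushCommits(); // same shape as the close() shown in Ryan's reply
    }

    List<String> visibleToOthers() {
        return new ArrayList<>(server);
    }
}
```

With autoFlush false and a limit of 3, the first two puts stay invisible to other clients until a third put, an explicit flushCommits(), or close() pushes them out.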
>> >> >
>> >> > So I guess the question isn't so much the issue of a lock, but that we
>> >> > need to make sure that data written to the buffer is flushed ASAP
>> >> > unless we know that we're going to be writing a lot of data in the m/r
>> >> > job.
>> >> >
>> >> >
>> >> > Usually when you write from the reducer (heavy), it is better to use a
>> >> > buffer and not autoFlush, for good performance.
>> >> >
>> >> > Cosmin
>> >> >
>> >> >
>> >> > Thx
>> >> >
>> >> > -Mike
>> >> >
>> >> >
>> >> >
>> >> > From: [email protected]<mailto:[email protected]>
>> >> > To: [email protected]<mailto:[email protected]>
>> >> > CC: [email protected]<mailto:[email protected]>
>> >> > Date: Fri, 16 Jul 2010 12:34:36 +0100
>> >> > Subject: Re: Row level locking?
>> >> >
>> >> > Currently a row is part of a region and there's a single region server
>> >> > serving that region at a particular moment.
>> >> > So when that row is updated a lock is acquired for that row until the
>> >> > actual data is updated in memory (note that a put will be written to cache
>> >> > on the region server and also persisted in the write-ahead log - WAL).
>> >> > Subsequent puts to that row will have to wait for that lock.
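
The per-row behaviour described in this paragraph can be pictured with the toy model below. It is an illustration under simplifying assumptions, not HBase internals: RowLockSketch is a made-up name, there is one lock object per row key, the lock is held only for the in-memory update, and the WAL is left out entirely.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Toy model of per-row locking on a single server; NOT HBase internals.
// All names here are hypothetical.
class RowLockSketch {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Long> memory = new ConcurrentHashMap<>();

    // Read-modify-write on one row. Writers to the same row queue up on the
    // row's lock; writers to different rows do not block each other.
    long increment(String row) {
        ReentrantLock lock = locks.computeIfAbsent(row, r -> new ReentrantLock());
        lock.lock();
        try {
            long next = memory.getOrDefault(row, 0L) + 1;
            memory.put(row, next);
            return next;
        } finally {
            lock.unlock(); // released as soon as the in-memory update is done
        }
    }

    long get(String row) {
        return memory.getOrDefault(row, 0L);
    }
}
```

Several threads calling increment("row1") concurrently will each see a distinct value, because the second writer waits until the first has released the row's lock.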
>> >> >
>> >> > HBase is fully consistent. That said, all the locking takes place at
>> >> > row level only, so when you scan you have to take that into account as
>> >> > there's no range locking.
>> >> >
>> >> > I'm not sure I understand the resource releasing issue. HTable.close()
>> >> > flushes the current write buffer (you can have a write buffer if you set
>> >> > autoFlush to false).
>> >> >
>> >> > Cosmin
>> >> >
>> >> >
>> >> > On Jul 16, 2010, at 1:33 PM, Michael Segel wrote:
>> >> >
>> >> >
>> >> > Ok,
>> >> >
>> >> > First, I'm writing this before I've had my first cup of coffee so I am
>> >> > apologizing in advance if the question is a brain dead question....
>> >> >
>> >> > Coming from a relational background, some of these questions may not make
>> >> > sense in the HBase world.
>> >> >
>> >> >
>> >> > When does HBase acquire a lock on a row and how long does it persist?
>> >> > Does the lock only hit the current row, or does it also lock the adjacent
>> >> > rows?
>> >> > Does HBase support the concept of 'dirty reads'?
>> >> >
>> >> > The issue is what happens when you have two jobs trying to hit the same
>> >> > table at the same time and update/read the rows at the same time.
>> >> >
>> >> > A developer came across a problem and the fix was to use the
>> >> > HTable.close() method to release any resources.
>> >> >
>> >> > I am wondering if you explicitly have to clean up or can a lazy
>> >> > developer let the object just go out of scope and get GC'd.
>> >> >
>> >> > Thx
>> >> >
>> >> > -Mike
>> >> >
>> >> >
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Guilherme
>> >
>> > msn: [email protected]
>> > homepage: http://sites.google.com/site/germoglio/
>> >
>
>
>



--
Guilherme

msn: [email protected]
homepage: http://sites.google.com/site/germoglio/
