thanks Ryan! (I was about to look for performance numbers) Just another question -- slightly related to locks. Will HBase 0.90 include HTable.checkAndPut receiving more than one value to check? I'm eager to help, if possible.
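
For context on the question: the HTable.checkAndPut available around this time checks a single cell before applying the Put. A minimal sketch of that single-cell form, assuming a 0.90-era client API; the table, family, qualifier, and values are invented for illustration:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndPutSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // hypothetical table name
    try {
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("balance"), Bytes.toBytes("90"));

      // Atomically apply the Put only if cf:balance currently equals "100".
      // The question above is about checking more than one cell per call,
      // which this signature does not allow.
      boolean applied = table.checkAndPut(
          Bytes.toBytes("row1"),     // row that is checked and mutated
          Bytes.toBytes("cf"),       // family of the checked cell
          Bytes.toBytes("balance"),  // qualifier of the checked cell
          Bytes.toBytes("100"),      // expected current value
          put);
      System.out.println("checkAndPut applied: " + applied);
    } finally {
      table.close();
    }
  }
}
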
On Fri, Jul 16, 2010 at 5:58 PM, Guilherme Germoglio <[email protected]> wrote:
>
> thanks Ryan! (I was about to look for performance numbers)
>
> Just another question -- slightly related to locks. Will HBase 0.90 include HTable.checkAndPut receiving more than one value to check? I'm eager to help, if possible.
>
> On Fri, Jul 16, 2010 at 5:24 PM, Ryan Rawson <[email protected]> wrote:
>>
>> Explicit locks with zookeeper would be (a) slow and (b) completely out of band and ultimately up to you. I wouldn't exactly be eager to do our row locking in zookeeper (since the minimum operation time is between 2-10ms).
>>
>> You could do application advisory locks, but that is true no matter what datastore you use...
>>
>> On Fri, Jul 16, 2010 at 1:13 PM, Guilherme Germoglio <[email protected]> wrote:
>>>
>>> What about implementing explicit row locks using the zookeeper? I'm planning to do this sometime in the near future. Does anyone have any comments against this approach?
>>>
>>> (or maybe it was already implemented by someone :-)
>>>
>>> On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson <[email protected]> wrote:
>>>>
>>>> HTable.close does very little:
>>>>
>>>>   public void close() throws IOException {
>>>>     flushCommits();
>>>>   }
>>>>
>>>> None of which involves row locks.
>>>>
>>>> One thing to watch out for is to remember to close your scanners - they continue to use server-side resources until you close them or 60 seconds passes and they get timed out. Also be very wary of using any of the explicit row locking calls, they are generally trouble for more or less everyone. There was a proposal to remove them, but I don't think that went through.
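
A minimal sketch of the scanner cleanup Ryan recommends above, assuming a 0.90-era client API; the table name and column family are invented for illustration:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScannerCleanupSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // hypothetical table name
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));          // hypothetical column family

    ResultScanner scanner = table.getScanner(scan);
    try {
      // Iterate the scan results; the scanner holds server-side resources
      // until it is closed or its lease times out.
      for (Result row : scanner) {
        System.out.println(Bytes.toString(row.getRow()));
      }
    } finally {
      scanner.close();   // release the server-side resources promptly
      table.close();     // flushes any buffered writes (see Ryan's snippet above)
    }
  }
}

The try/finally ensures the scanner is released even if iteration fails, rather than waiting for the timeout Ryan mentions.
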
>>>>
>>>> On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene <[email protected]> wrote:
>>>>>
>>>>> On Jul 16, 2010, at 6:41 PM, Michael Segel wrote:
>>>>>
>>>>> Thanks for the response. (You don't need to include the cc ...)
>>>>>
>>>>> With respect to the row level locking ... I was interested in when the lock is actually acquired, how long the lock persists, and when the lock is released. From your response, the lock is only held on updating the row, while the data is being written to the memory cache, which is then written to disk. (Note: this row-level locking is different from transactional row-level locking.)
>>>>>
>>>>> Now that I've had some caffeine I think I can clarify... :-)
>>>>>
>>>>> Some of my developers complained that they were having trouble with two different processes trying to update the same table. Not sure why they were having the problem, so I wanted to have a good fix. The simple fix was to have them call close() on the HTable connection, which forces any resources they acquired to be released.
>>>>>
>>>>> It would help to know what the exact problem was. Normally I wouldn't see any problems.
>>>>>
>>>>> In looking at the problem... it's possible that they didn't have autoFlush set to true, so the write was still in the buffer and hadn't gotten flushed. If the lock only persists for the duration of the write to memory and is then released, then the issue could have been that the record written was in the buffer and not yet flushed to disk.
>>>>>
>>>>> At the region server level HBase will use the cache for both reads and writes. This happens transparently for the user. Once something is written to the cache, all other clients will read from that same cache. No need to worry about whether the cache has been flushed. Lars George has a good article about the HBase storage architecture: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
>>>>>
>>>>> I'm also assuming that when you run a scan() against a region, any information written to the buffer but not yet written to disk will be missed.
>>>>>
>>>>> When you do puts into HBase you'll use HTable. The HTable instance is on the client. HTable keeps a buffer as well, and if autoFlush is false it only flushes when you call flushCommits(), when it reaches the buffer limit, or when you close the table. With autoFlush set to true it will flush for every put. This buffer is on the client, so when data is actually flushed it goes to the region server, where it ends up in the region server cache and the WAL. Unless a client flushes the put, no other client can see the data, because it still resides only on the client. Depending on what you need, you can use autoFlush true if you are doing many small writes that need to be seen immediately by others, or you can use autoFlush false and issue flushCommits() yourself, or rely on the buffer limit for that.
>>>>>
>>>>> So I guess the question isn't so much the issue of a lock, but that we need to make sure that data written to the buffer is flushed ASAP unless we know that we're going to be writing a lot of data in the m/r job.
>>>>>
>>>>> Usually when you write from the reducer (heavy writes) it is better to use the buffer and not autoFlush, to get good performance.
>>>>>
>>>>> Cosmin
>>>>>
>>>>> Thx
>>>>>
>>>>> -Mike
>>>>>
>>>>> From: [email protected]
>>>>> To: [email protected]
>>>>> CC: [email protected]
>>>>> Date: Fri, 16 Jul 2010 12:34:36 +0100
>>>>> Subject: Re: Row level locking?
>>>>>
>>>>> Currently a row is part of a region, and there's a single region server serving that region at a particular moment. So when that row is updated, a lock is acquired for that row until the actual data is updated in memory (note that a put will be written to the cache on the region server and also persisted in the write-ahead log - WAL). Subsequent puts to that row will have to wait for that lock.
>>>>>
>>>>> HBase is fully consistent. That being said, all the locking takes place at row level only, so when you scan you have to take that into account, as there's no range locking.
>>>>>
>>>>> I'm not sure I understand the resource-releasing issue. HTable.close() flushes the current write buffer (you can have a write buffer if you set autoFlush to false).
>>>>>
>>>>> Cosmin
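
A rough sketch of the client-side write buffer behaviour described above (autoFlush, flushCommits(), and close()), assuming the 0.90-era HTable API; the table name, column family, row keys, and buffer size are invented for illustration:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteBufferSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // hypothetical table name

    // Buffer puts on the client: nothing reaches the region server (memstore
    // and WAL) until the buffer fills, flushCommits() is called, or the
    // table is closed. Other clients cannot see these rows yet.
    table.setAutoFlush(false);
    table.setWriteBufferSize(2 * 1024 * 1024);    // 2 MB client-side buffer

    for (int i = 0; i < 10000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
      table.put(put);                             // buffered locally
    }

    table.flushCommits();   // ship the buffered puts; now visible to readers

    // With autoFlush left at its default (true), every put() is sent
    // immediately, which suits small writes that others must see right away.
    table.close();          // close() also calls flushCommits()
  }
}
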
>>>>>
>>>>> On Jul 16, 2010, at 1:33 PM, Michael Segel wrote:
>>>>>
>>>>> Ok,
>>>>>
>>>>> First, I'm writing this before I've had my first cup of coffee so I am apologizing in advance if the question is a brain dead question....
>>>>>
>>>>> Going from a relational background, some of these questions may not make sense in the HBase world.
>>>>>
>>>>> When does HBase acquire a lock on a row and how long does it persist? Does the lock only hit the current row, or does it also lock the adjacent rows too? Does HBase support the concept of 'dirty reads'?
>>>>>
>>>>> The issue is what happens when you have two jobs trying to hit the same table at the same time and update/read the rows at the same time.
>>>>>
>>>>> A developer came across a problem and the fix was to use the HTable.close() method to release any resources.
>>>>>
>>>>> I am wondering if you explicitly have to clean up or can a lazy developer let the object just go out of scope and get GC'd.
>>>>>
>>>>> Thx
>>>>>
>>>>> -Mike

--
Guilherme

msn: [email protected]
homepage: http://sites.google.com/site/germoglio/
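
As an aside on the visibility question raised in the thread (whether a read can miss data that has not yet been flushed to disk): once a put has been flushed from the client, it sits in the region server's memstore and WAL, and reads are served from that memstore, so no flush to HDFS is needed for other clients to see it. A small sketch under those assumptions, with an invented table name and column family:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadAfterWriteSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();

    // Writer: autoFlush is true by default, so the put is sent to the
    // region server immediately and lands in the memstore and the WAL.
    HTable writer = new HTable(conf, "mytable");   // hypothetical table name
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("hello"));
    writer.put(put);

    // Reader: a separate HTable instance (as another process would have)
    // reads the value straight from the region server, memstore included.
    // No memstore flush to disk is required for the read to see it.
    HTable reader = new HTable(conf, "mytable");
    Result result = reader.get(new Get(Bytes.toBytes("row1")));
    System.out.println(Bytes.toString(
        result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"))));

    writer.close();
    reader.close();
  }
}
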
