Explicit locks with zookeeper would be (a) slow and (b) completely out of band and ultimately up to you. I wouldn't exactly be eager to do our row locking in zookeeper (since the minimum operation time is between 2-10ms).
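(To make that cost concrete, here is a rough sketch of what an explicit, out-of-band row lock on ZooKeeper could look like. The /locks/<table>/<row> znode layout and the class name are invented for illustration, the parent znodes are assumed to already exist, and a real lock recipe also needs watches instead of polling plus handling for session expiry; every tryLock/unlock below is one of the 2-10 ms round trips mentioned above.)

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkAdvisoryRowLock {
  private final ZooKeeper zk;

  public ZkAdvisoryRowLock(ZooKeeper zk) {
    this.zk = zk;
  }

  // Try to take the lock for a row by creating an ephemeral znode.
  // Returns false if another client already holds it. The parent path
  // (/locks/<table>) is assumed to exist; the layout is hypothetical.
  public boolean tryLock(String table, String row)
      throws KeeperException, InterruptedException {
    String path = "/locks/" + table + "/" + row;
    try {
      zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL);          // dies with our session
      return true;
    } catch (KeeperException.NodeExistsException e) {
      return false;                             // someone else holds the lock
    }
  }

  public void unlock(String table, String row)
      throws KeeperException, InterruptedException {
    zk.delete("/locks/" + table + "/" + row, -1);  // -1 = any version
  }
}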
You could do application advisory locks, but that is true no matter what datastore you use...

On Fri, Jul 16, 2010 at 1:13 PM, Guilherme Germoglio <[email protected]> wrote:
> What about implementing explicit row locks using ZooKeeper? I'm planning
> to do this sometime in the near future. Does anyone have any comments
> against this approach?
>
> (or maybe it was already implemented by someone :-)
>
> On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson <[email protected]> wrote:
>
>> HTable.close does very little:
>>
>>   public void close() throws IOException {
>>     flushCommits();
>>   }
>>
>> None of which involves row locks.
>>
>> One thing to watch out for is to remember to close your scanners - they
>> continue to use server-side resources until you close them or 60 seconds
>> pass and they get timed out. Also be very wary of using any of the
>> explicit row locking calls; they are generally trouble for more or less
>> everyone. There was a proposal to remove them, but I don't think that
>> went through.
>>
>> On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene <[email protected]> wrote:
>> >
>> > On Jul 16, 2010, at 6:41 PM, Michael Segel wrote:
>> >
>> > Thanks for the response.
>> > (You don't need to include the cc ...)
>> >
>> > With respect to the row level locking ... I was interested in when the
>> > lock is actually acquired, how long the lock persists, and when the
>> > lock is released.
>> > From your response, the lock is only held while updating the row, while
>> > the data is being written to the memory cache, which is then written to
>> > disk. (Note: this row level locking is different from transactional row
>> > level locking.)
>> >
>> > Now that I've had some caffeine I think I can clarify... :-)
>> >
>> > Some of my developers complained that they were having trouble with two
>> > different processes trying to update the same table.
>> > Not sure why they were having the problem, so I wanted to have a good
>> > fix. The simple fix was to have them close() the HTable connection,
>> > which forces any resources they acquired to be released.
>> >
>> > It would help to know what the exact problem was. Normally I wouldn't
>> > see any problems.
>> >
>> > In looking at the problem... it's possible that they didn't have
>> > autoFlush set to true, so the write was still in the buffer and hadn't
>> > gotten flushed.
>> > If the lock only persists for the duration of the write to memory and
>> > is then released, then the issue could have been that the record
>> > written was in the buffer and not yet flushed to disk.
>> >
>> > At the region server level HBase will use the cache for both reads and
>> > writes. This happens transparently for the user. Once something is
>> > written in the cache, all other clients will read from the same cache.
>> > No need to worry about whether the cache has been flushed.
>> > Lars George has a good article about the HBase storage architecture:
>> > http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
>> >
>> > I'm also assuming that when you run a scan() against a region, any
>> > information written to the buffer but not yet written to disk will be
>> > missed.
>> >
>> > When you do puts into HBase you'll use HTable. The HTable instance is
>> > on the client. HTable keeps a buffer as well, and if autoFlush is false
>> > it only flushes when you do flushCommits(), when it reaches the buffer
>> > limit, or when you close the table. With autoFlush set to true it will
>> > flush for every put. This buffer is on the client.
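(A rough sketch of what that client-side buffering looks like in code; the table, family, and column names are made up, and the HTable API of that era is assumed:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWriteExample {
  public static void writeBatch(Configuration conf) throws Exception {
    HTable table = new HTable(conf, "mytable");   // hypothetical table name
    table.setAutoFlush(false);                    // buffer puts on the client
    table.setWriteBufferSize(2 * 1024 * 1024);    // auto-flush once ~2 MB is buffered
    try {
      for (int i = 0; i < 10000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value-" + i));
        table.put(put);                           // goes into the client-side buffer only
      }
      table.flushCommits();                       // now the puts reach the region server
    } finally {
      table.close();                              // close() also flushes the buffer
    }
  }
}

(With autoFlush left at true, every table.put() above would instead trigger an immediate RPC to the region server.)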
>> > So when the data is actually flushed it gets to the region server,
>> > where it goes into the region server cache and the WAL.
>> > Unless a client flushes the put, no other client can see the data,
>> > because it still resides on the client only. Depending on what you
>> > need to do, you can use autoFlush true if you are doing many small
>> > writes that need to be seen immediately by others. You can use
>> > autoFlush false and issue flushCommits() yourself, or you can rely on
>> > the buffer limit for that.
>> >
>> > So I guess the question isn't so much the issue of a lock, but that we
>> > need to make sure that data written to the buffer is flushed ASAP,
>> > unless we know that we're going to be writing a lot of data in the m/r
>> > job.
>> >
>> > Usually when you write from the reducer (heavy writes) it's better to
>> > use the buffer and not autoFlush, to get good performance.
>> >
>> > Cosmin
>> >
>> > Thx
>> >
>> > -Mike
>> >
>> > From: [email protected]
>> > To: [email protected]
>> > CC: [email protected]
>> > Date: Fri, 16 Jul 2010 12:34:36 +0100
>> > Subject: Re: Row level locking?
>> >
>> > Currently a row is part of a region, and there's a single region server
>> > serving that region at a particular moment.
>> > So when that row is updated, a lock is acquired for that row until the
>> > actual data is updated in memory (note that a put will be written to
>> > the cache on the region server and also persisted in the write-ahead
>> > log - WAL). Subsequent puts to that row will have to wait for that
>> > lock.
>> >
>> > HBase is fully consistent. That being said, all the locking takes place
>> > at row level only, so when you scan you have to take into account that
>> > there's no range locking.
>> >
>> > I'm not sure I understand the resource releasing issue. HTable.close()
>> > flushes the current write buffer (you can have a write buffer if you
>> > use autoFlush set to false).
>> >
>> > Cosmin
>> >
>> > On Jul 16, 2010, at 1:33 PM, Michael Segel wrote:
>> >
>> > Ok,
>> >
>> > First, I'm writing this before I've had my first cup of coffee, so I am
>> > apologizing in advance if the question is a brain dead question....
>> > Coming from a relational background, some of these questions may not
>> > make sense in the HBase world.
>> >
>> > When does HBase acquire a lock on a row and how long does it persist?
>> > Does the lock only hit the current row, or does it also lock the
>> > adjacent rows? Does HBase support the concept of 'dirty reads'?
>> >
>> > The issue is what happens when you have two jobs hitting the same table
>> > at the same time and updating/reading the same rows at the same time.
>> >
>> > A developer came across a problem and the fix was to use the
>> > HTable.close() method to release any resources.
>> > I am wondering if you explicitly have to clean up, or whether a lazy
>> > developer can just let the object go out of scope and get GC'd.
>> >
>> > Thx
>> >
>> > -Mike
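(On the clean-up question, the usual pattern is to release things explicitly rather than wait for GC - scanners in particular, per Ryan's note above about the server-side resources they hold until closed or timed out after 60 seconds. A rough sketch, with the table and family names invented:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanCleanupExample {
  public static void scanAll(Configuration conf) throws Exception {
    HTable table = new HTable(conf, "mytable");   // hypothetical table name
    try {
      Scan scan = new Scan();
      scan.addFamily(Bytes.toBytes("cf"));
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          // process r ...
        }
      } finally {
        scanner.close();   // frees the server-side scanner lease right away
      }
    } finally {
      table.close();       // flushes any buffered writes and releases client resources
    }
  }
}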
> --
> Guilherme
>
> msn: [email protected]
> homepage: http://sites.google.com/site/germoglio/
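(For completeness, the explicit row locking calls Ryan warns about above looked roughly like this in the 0.20-era client API; they hold the lock on the region server across client calls, which is why they cause trouble, and they were later deprecated and removed. Table and column names here are invented, and this is shown only to illustrate what to avoid:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RowLock;
import org.apache.hadoop.hbase.util.Bytes;

public class ExplicitRowLockExample {
  public static void updateUnderLock(Configuration conf) throws Exception {
    HTable table = new HTable(conf, "mytable");   // hypothetical table name
    byte[] row = Bytes.toBytes("row-1");
    RowLock lock = table.lockRow(row);            // blocks other writers to this row
    try {
      Put put = new Put(row, lock);               // the put carries the held lock
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
      table.put(put);
    } finally {
      table.unlockRow(lock);                      // always release, or the region server
                                                  // holds the lock until it times out
    }
  }
}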
