For the second part of the question: if you have 10 Puts, it's more efficient to send a single RegionServer (RS) message containing 10 Puts than to send 10 RS messages with 1 Put apiece. Be careful with the words "always" and "never", though, because there is an exception: if you are using the client writeBuffer and each of those 10 Puts is going to a different RegionServer, you still end up sending 10 messages, so you haven't really gained much.
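To make the message counting concrete, here is a toy model in plain Java. It does not call the HBase API: the class name BatchRpcModel, the region start keys, and the server names rs0 through rs9 are all made up for illustration. It assumes that flushing the buffer results in one message per RegionServer that has at least one Put destined for it. Row keys are compared as Java Strings, which matches HBase's byte ordering only for plain ASCII keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: it counts client-to-RegionServer messages and never touches HBase.
final class BatchRpcModel {
    // Hypothetical layout: region start key -> name of the server hosting that region.
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    BatchRpcModel(Map<String, String> layout) {
        regionStartToServer.putAll(layout);
    }

    // A row belongs to the region with the greatest start key <= the row key.
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = regionStartToServer.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalArgumentException("no region covers row " + rowKey);
        }
        return region.getValue();
    }

    // Groups buffered row keys by destination server (assumed: one message per group).
    Map<String, List<String>> groupByServer(List<String> rowKeys) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String row : rowKeys) {
            groups.computeIfAbsent(serverFor(row), s -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    int messagesWhenBatched(List<String> rowKeys) {
        return groupByServer(rowKeys).size();
    }

    // Without batching, every Put travels in its own message.
    static int messagesWhenUnbatched(List<String> rowKeys) {
        return rowKeys.size();
    }

    public static void main(String[] args) {
        Map<String, String> layout = new HashMap<>();
        layout.put("", "rs0"); // the first region starts at the empty key
        for (int i = 1; i <= 9; i++) {
            layout.put(String.valueOf(i), "rs" + i);
        }
        BatchRpcModel model = new BatchRpcModel(layout);

        List<String> sameRegion = new ArrayList<>();
        List<String> spreadOut = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            sameRegion.add("5-" + i); // all ten rows sort into the region starting at "5"
            spreadOut.add(i + "-x");  // each row sorts into a different region
        }

        System.out.println("same region: batched=" + model.messagesWhenBatched(sameRegion)
                + " unbatched=" + messagesWhenUnbatched(sameRegion));
        System.out.println("spread out:  batched=" + model.messagesWhenBatched(spreadOut)
                + " unbatched=" + messagesWhenUnbatched(spreadOut));
    }
}
```

With this made-up layout, ten Puts that all fall in one region collapse into a single message, while ten Puts spread over ten servers still cost ten messages, which is the exception described above.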
To answer the next question of how you know where the Puts are going, see this method…

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[],%20boolean%29

Because the HBase client talks directly to each RS, it has to know the region boundaries.

From: Lin Ma <[email protected]>
Date: Thursday, September 6, 2012 11:54 AM
To: "[email protected]" <[email protected]>, Doug Meil <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: batch update question

Thank you Doug,

Very effective reply. :-)

- why batch update could resolve contention issue on the same row? Could you elaborate a bit more or show me an example?
- Batch update always have good performance compared to single update (when we measure total throughput)?

regards,
Lin

On Thu, Sep 6, 2012 at 12:59 AM, Doug Meil <[email protected]> wrote:

Hi there, if you look in the source code for HTable there is a list of Put objects. That's the buffer, and it's a client-side buffer.

On 9/5/12 12:04 PM, "Lin Ma" <[email protected]> wrote:

>Thank you Stack for the details directions!
>
>1. You are right, I have not met with any real row contention issues. My
>purpose is understanding the issue in advance, and also from this issue to
>understand HBase generals better;
>2. For the comments from API Url page you referred -- "If
>isAutoFlush<http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableInterface.html#isAutoFlush%28%29>
>is false, the update is buffered until the internal buffer is full.", I am
>confused what is the buffer? Buffer at client side or buffer in region
>server? Is there a way to configure its size to hold until flushing?
>3. Why batch could resolve contention on the same row issue in theory,
>compared to non-batch operation? Besides preparation the solution in my
>mind in advance, I want to learn a bit about why. :-)
>
>regards,
>Lin
>
>On Wed, Sep 5, 2012 at 4:00 AM, Stack <[email protected]> wrote:
>
>> On Sun, Sep 2, 2012 at 2:13 AM, Lin Ma <[email protected]> wrote:
>> > Hello guys,
>> >
>> > I am reading the book "HBase, the definitive guide", at the beginning of
>> > chapter 3, it is mentioned in order to reduce performance impact for
>> > clients to update the same row (lock contention issues for automatic
>> > write), batch update is preferred. My questions is, for MR job, what are
>> > the batch update methods we could leverage to resolve the issue? And for
>> > API client, what are the batch update methods we could leverage to resolve
>> > the issue?
>> >
>>
>> Do you actually have a problem where there is contention on a single row?
>>
>> Use methods like
>>
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#put(java.util.List)
>> or the batch methods listed earlier in the API. You should set
>> autoflush to false too:
>>
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableInterface.html#isAutoFlush()
>>
>> Even batching, a highly contended row might hold up inserts... but for
>> sure you actually have this problem in the first place?
>>
>> St.Ack
>>
