Re: batch update question

Doug Meil Thu, 06 Sep 2012 11:27:29 -0700

For the second part of the question: if you have 10 Puts, it's more efficient 
to send a single RS message with 10 Puts than to send 10 RS messages with 1 Put 
apiece.  There are two words to be careful with, "always" and "never", because 
there is an exception: if you are using the client writeBuffer and each of 
those 10 Puts is going to a different RegionServer, then you haven't really 
gained much.


To answer the next question of how you know where the Puts are going, see this 
method…

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[],%20boolean%29

Because the HBase client talks directly to each RS, it has to know the region 
boundaries.
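
To make the "how many RS messages" point concrete, here is a minimal sketch in 
plain Java. It is a toy model, not the HBase client: the class and method names 
(RegionRoutingSketch, addRegion, serverFor, groupByServer) are made up for 
illustration, and String row keys stand in for HBase's byte[] keys. It assumes 
only that each region is identified by its start key and that a row belongs to 
the region with the greatest start key not larger than the row key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these names are HBase classes or methods.
final class RegionRoutingSketch {

    // Region start key -> server hosting that region.
    // The empty string is the start key of the first region.
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionStartToServer.put(startKey, server);
    }

    // A row lives in the region whose start key is the greatest key <= rowKey.
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = regionStartToServer.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers row " + rowKey);
        }
        return region.getValue();
    }

    // Group row keys by destination server. Each group corresponds to one
    // message to one RegionServer.
    Map<String, List<String>> groupByServer(List<String> rowKeys) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String row : rowKeys) {
            groups.computeIfAbsent(serverFor(row), s -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        RegionRoutingSketch table = new RegionRoutingSketch();
        table.addRegion("", "rs1");   // rows before "m"
        table.addRegion("m", "rs2");  // rows from "m" onward
        Map<String, List<String>> groups =
            table.groupByServer(Arrays.asList("apple", "banana", "melon", "zebra"));
        System.out.println(groups.size() + " RS messages for 4 rows: " + groups);
    }
}
```

If every row in a batch maps to a different server, the number of groups equals 
the number of Puts, which is the case where batching gains little.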



From: Lin Ma <[email protected]>
Date: Thursday, September 6, 2012 11:54 AM
To: "[email protected]" <[email protected]>, Doug Meil <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: batch update question

Thank you Doug,

Very effective reply. :-)

- Why could batch update resolve the contention issue on the same row? Could 
you elaborate a bit more or show me an example?
- Does batch update always perform better than single updates (when we 
measure total throughput)?

regards,
Lin

On Thu, Sep 6, 2012 at 12:59 AM, Doug Meil <[email protected]> wrote:

Hi there, if you look in the source code for HTable there is a list of Put
objects.  That's the buffer, and it's a client-side buffer.
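
As a rough illustration of what a client-side buffer like that does, here is a 
small plain-Java sketch. It is not the HTable source: WriteBufferSketch and its 
methods are hypothetical names, payloads are raw byte arrays rather than Put 
objects, and the "flush when the buffered bytes exceed a limit" rule is a 
simplified assumption about the behavior. In the real client, the related calls 
are HTable.setAutoFlush(false), HTable.setWriteBufferSize(long) and 
HTable.flushCommits().

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a client-side write buffer. Not HBase code.
final class WriteBufferSketch {

    private final long maxBufferedBytes;               // size limit before an automatic flush
    private final List<byte[]> buffered = new ArrayList<>();
    private long bufferedBytes = 0;
    private int flushCount = 0;                         // how many batches were "sent"

    WriteBufferSketch(long maxBufferedBytes) {
        this.maxBufferedBytes = maxBufferedBytes;
    }

    // Queue one update locally; nothing is sent unless the buffer grows past the limit.
    void put(byte[] payload) {
        buffered.add(payload);
        bufferedBytes += payload.length;
        if (bufferedBytes > maxBufferedBytes) {
            flush();
        }
    }

    // Send everything queued so far as one batch (here we only count the batch).
    void flush() {
        if (buffered.isEmpty()) {
            return;
        }
        buffered.clear();
        bufferedBytes = 0;
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    int pending() {
        return buffered.size();
    }

    public static void main(String[] args) {
        WriteBufferSketch buffer = new WriteBufferSketch(10);
        for (int i = 0; i < 4; i++) {
            buffer.put(new byte[4]);
        }
        buffer.flush(); // push out whatever is still queued
        System.out.println("batches sent: " + buffer.flushCount());
    }
}
```

The final explicit flush matters: anything still sitting in the buffer has not 
left the client yet.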





On 9/5/12 12:04 PM, "Lin Ma" <[email protected]> wrote:

>Thank you Stack for the detailed directions!
>
>1. You are right, I have not met any real row contention issues. My
>purpose is to understand the issue in advance, and through it to
>understand HBase in general better;
>2. The API page you referred to says "If isAutoFlush
><http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableInterface.html#isAutoFlush%28%29>
>is false, the update is buffered until the internal buffer is full."
>What is the buffer? Is it a buffer on the client side or in the region
>server? Is there a way to configure how much it holds before flushing?
>3. Why could batching resolve contention on the same row in theory,
>compared to non-batch operation? Besides preparing the solution in my
>mind in advance, I want to learn a bit about why. :-)
>
>regards,
>Lin
>
>On Wed, Sep 5, 2012 at 4:00 AM, Stack <[email protected]> wrote:
>
>> On Sun, Sep 2, 2012 at 2:13 AM, Lin Ma <[email protected]> wrote:
>> > Hello guys,
>> >
>> > I am reading the book "HBase, the definitive guide". At the beginning of
>> > chapter 3, it is mentioned that, in order to reduce the performance
>> > impact of clients updating the same row (lock contention issues for
>> > automatic write), batch update is preferred. My question is: for an MR
>> > job, what batch update methods could we leverage to resolve the issue?
>> > And for an API client, what batch update methods could we leverage to
>> > resolve the issue?
>> >
>>
>> Do you actually have a problem where there is contention on a single
>> row?
>>
>> Use methods like
>>
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#put(java.util.List)
>>
>> or the batch methods listed earlier in the API.  You should set
>> autoflush to false too:
>>
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableInterface.html#isAutoFlush()
>>
>> Even when batching, a highly contended row might hold up inserts... but
>> are you sure you actually have this problem in the first place?
>>
>> St.Ack
>>
