Re: HTable.put(List puts) perform batch insert?

Jim X Mon, 31 Jan 2011 17:14:00 -0800

Does HTable.getWriteBuffer() do a rollback?

Jim


On Mon, Jan 31, 2011 at 8:04 PM, Ryan Rawson <[email protected]> wrote:
> When you are using the buffer, you also need to flush it:
>
> htable.flushCommits();
>
> If the call succeeds, the edits were persisted.  If at any point you
> get an exception, the unfinished edits are left in the write buffer and
> htable.getWriteBuffer() returns them.
>
> -ryan
>
> On Mon, Jan 31, 2011 at 10:48 AM, Sean Bigdatafun
> <[email protected]> wrote:
>> On Fri, Jan 14, 2011 at 10:51 PM, tsuna <[email protected]> wrote:
>>
>>> On Fri, Jan 14, 2011 at 4:06 PM, Sean Bigdatafun
>>> <[email protected]> wrote:
>>> > But how can the client understand which k-v belongs to an individual RS?
>>> > Does it need to scan the .META. table? (If so, it's an expensive op.) On
>>> > the RegionServer side, is it like processing multiple requests in a batch
>>> > per RPC?
>>>
>>> The client has to figure out which region each edit has to go to.  The
>>> client maintains a local cache of the META table, so when you
>>> frequently use the same working set of regions (which is common for
>>> most applications), the lookups are essentially free.
>>>
>>> The worst case is a client that does random-writes to all the regions
>>> in a huge table.  In this case, the client will end up discovering the
>>> location of all the regions of that table and keep this in its
>>> in-memory cache.  But regions move around, are split etc.  This does
>>> cause extra META lookups, but the latency for a META lookup is
>>> typically very small (even though the penalty incurred by the client
>>> compared to cache hits in its local META cache is huge, comparatively
>>> speaking).  Note that right now neither HTable nor asynchbase
>>> proactively evicts unused entries from the local META cache to save
>>> memory.  I don't think anyone is running HBase at a scale where this
>>> optimization would be useful.
>>>
>>> If you have a write-heavy application, you're always going to get
>>> significantly higher throughput when you send your edits in batch to
>>> the server.  The downside to this is that when your client application
>>> dies, you lose all the edits in the un-committed batch.  Unlike
>>> HTable, asynchbase puts an upper bound on the amount of time an edit
>>> is allowed to remain in the client's buffer, which helps limit
>>> data-loss when a client crashes (OpenTSDB sets this to 1s by default,
>>> so when it dies, you know you lost at most 1s worth of datapoints).
>>>
>> htable.setWriteBufferSize(1024*1024*10); // 10 MB
>> htable.setAutoFlush(false);
>>
>> for (int i = 0; i < N; i++) {
>>   list.add(putitem[i]);
>> }
>>
>> htable.put(list);
>>
>>
>> For the above pseudo code (using put(List) to commit updates to HBase), can I
>> get a "batch transaction" success notification?
>>       * I.e., how can I know that all the items have been successfully
>> committed? It seems that I can't get such information and everything is
>> best-effort. If I knew that some commits had failed, I could do an
>> application-level retry.
>>       * setAutoFlush(true); does not seem to give us any more reliable
>> operation either.
>>
>>
>>
>>
>>
>>>
>>> --
>>> Benoit "tsuna" Sigoure
>>> Software Engineer @ www.StumbleUpon.com
>>>
>>
>>
>>
>> --
>> --Sean
>>
>
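
For reference, the flush-and-retry behaviour Ryan describes above can be
sketched roughly as follows. This is a self-contained simulation, not HBase
code: the class WriteBufferSketch, its Sink interface and flushWithRetry()
are hypothetical names made up for illustration, and the in-order,
one-edit-at-a-time sending is a simplifying assumption. The point it tries
to show is that, under this model, nothing is rolled back: edits that
already went through stay persisted, and the buffer holds only the
unfinished ones.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a client-side write buffer; NOT the HBase API. */
class WriteBufferSketch {

    /** Hypothetical server call: throws to simulate a failed edit. */
    interface Sink {
        void send(String edit) throws Exception;
    }

    private final List<String> buffer = new ArrayList<>();
    private final Sink sink;

    WriteBufferSketch(Sink sink) {
        this.sink = sink;
    }

    /** Buffers edits locally, similar in spirit to put(List) with auto-flush off. */
    void put(List<String> edits) {
        buffer.addAll(edits);
    }

    /** Sends buffered edits in order; an edit leaves the buffer only after it was sent. */
    void flushCommits() throws Exception {
        while (!buffer.isEmpty()) {
            sink.send(buffer.get(0)); // on failure the edit stays buffered
            buffer.remove(0);
        }
    }

    /** Returns the edits not yet persisted; nothing is rolled back. */
    List<String> getWriteBuffer() {
        return buffer;
    }

    /** Application-level retry: flush up to maxAttempts times. */
    boolean flushWithRetry(int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                flushCommits();
                return true;
            } catch (Exception e) {
                // unfinished edits are still in the buffer; try again
            }
        }
        return buffer.isEmpty();
    }
}
```

With a real HTable the equivalent check would be to catch the exception
from flushCommits() and inspect the write buffer before deciding whether to
retry; the details depend on the HBase version in use.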
