Re: HTable.put(List puts) perform batch insert?

Ryan Rawson Mon, 31 Jan 2011 17:04:34 -0800

When you are using the buffer, you also need to flush it:

htable.flushCommits();


If the call returns without throwing, the edits were persisted.  If you
get an exception at any point, the unfinished edits are left in the write
buffer, and htable.getWriteBuffer() returns them.
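
To make the flush-and-retry semantics concrete, here is a minimal sketch
of an application-level retry loop.  It is a toy model, not HBase code:
BufferedTable, EditSink, and flushWithRetry are hypothetical names, and
only the method names put, flushCommits, and getWriteBuffer are borrowed
from HTable.  It assumes the behaviour described above, namely that edits
which were not sent stay in the buffer when a flush throws.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the server side; not part of HBase. */
interface EditSink {
    void write(String edit) throws IOException;
}

/** Toy client-side write buffer. Method names mirror HTable, but this is not the HBase class. */
class BufferedTable {
    private final List<String> writeBuffer = new ArrayList<>();
    private final EditSink sink;

    BufferedTable(EditSink sink) {
        this.sink = sink;
    }

    /** Buffers the edits locally; nothing is sent until flushCommits(). */
    void put(List<String> edits) {
        writeBuffer.addAll(edits);
    }

    /** Sends buffered edits in order. An edit is removed only after it was written. */
    void flushCommits() throws IOException {
        while (!writeBuffer.isEmpty()) {
            sink.write(writeBuffer.get(0)); // on failure the edit stays buffered
            writeBuffer.remove(0);
        }
    }

    /** Returns the edits that have not been persisted yet. */
    List<String> getWriteBuffer() {
        return writeBuffer;
    }
}

class WriteBufferSketch {
    /** Retries the flush up to maxAttempts times; true means the buffer was fully persisted. */
    static boolean flushWithRetry(BufferedTable table, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                table.flushCommits();
                return true;
            } catch (IOException e) {
                System.err.println("attempt " + attempt + " failed, "
                        + table.getWriteBuffer().size() + " edits still buffered");
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> persisted = new ArrayList<>();
        int[] calls = {0};
        // Simulated server that fails exactly once, on the second write.
        EditSink flaky = edit -> {
            if (++calls[0] == 2) {
                throw new IOException("simulated region server error");
            }
            persisted.add(edit);
        };

        BufferedTable table = new BufferedTable(flaky);
        table.put(Arrays.asList("row1", "row2", "row3"));
        boolean ok = flushWithRetry(table, 3);
        System.out.println("all persisted: " + ok + ", edits: " + persisted);
    }
}
```

With the real client the same pattern would wrap htable.flushCommits() in
the try block and inspect htable.getWriteBuffer() in the catch block to
decide whether to retry or give up.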

-ryan

On Mon, Jan 31, 2011 at 10:48 AM, Sean Bigdatafun
<[email protected]> wrote:
> On Fri, Jan 14, 2011 at 10:51 PM, tsuna <[email protected]> wrote:
>
>> On Fri, Jan 14, 2011 at 4:06 PM, Sean Bigdatafun
>> <[email protected]> wrote:
>> > But how can the client understand which k-v belongs to an individual RS?
>> > Does it need to scan the .META. table? (if so, it's an expensive op). On the
>> > RegionServer side, is it like processing multiple requests in a batch per
>> > RPC?
>>
>> The client has to figure out which region each edit has to go to.  The
>> client maintains a local cache of the META table, so when you
>> frequently use the same working set of regions (which is common for
>> most applications), the lookups are essentially free.
>>
>> The worst case is a client that does random-writes to all the regions
>> in a huge table.  In this case, the client will end up discovering the
>> location of all the regions of that table and keep this in its
>> in-memory cache.  But regions move around, are split etc.  This does
>> cause extra META lookups, but the latency for a META lookup is
>> typically very small (even though the penalty incurred by the client
>> compared to cache hits in its local META cache is huge, comparatively
>> speaking).  Note that right now neither HTable nor asynchbase
>> proactively evicts unused entries from the local META cache to save
>> memory.  I don't think anyone is running HBase at a scale where this
>> optimization would be useful.
>>
>> If you have a write-heavy application, you're always going to get
>> significantly higher throughput when you send your edits in batch to
>> the server.  The downside to this is that when your client application
>> dies, you lose all the edits in the un-committed batch.  Unlike
>> HTable, asynchbase puts an upper bound on the amount of time an edit
>> is allowed to remain in the client's buffer, which helps limit
>> data-loss when a client crashes (OpenTSDB sets this to 1s by default,
>> so when it dies, you know you lost at most 1s worth of datapoints).
>>
> htable.setWriteBufferSize(1024*1024*10); // 10MB
> htable.setAutoFlush(false);
>
> for (i = 0; i < N; i++) {
>   list.add(putitem[i]);
> }
>
> htable.put(list);
>
>
> For the above pseudo code (using put(List) to commit updates to HBase), can I
> get a "batch transaction" success notification?
>       * i.e., how can I know that all the items have been successfully
> committed? It seems that I can't get such information; everything is
> best-effort. If I knew that some commits had failed, I could do an
> application-level retry.
>       * setAutoFlush(true); does not seem to give us any more reliable
> operation either.
>
>
>
>
>
>>
>> --
>> Benoit "tsuna" Sigoure
>> Software Engineer @ www.StumbleUpon.com
>>
>
>
>
> --
> --Sean
>
