Re: HTable.put(List puts) perform batch insert?

Sean Bigdatafun Mon, 31 Jan 2011 17:17:20 -0800

On Mon, Jan 31, 2011 at 5:13 PM, Jim X <[email protected]> wrote:

> Does HTable.getWriteBuffer() do a rollback?
>
>
I don't think so. It only tells you which edits have not been successfully
committed to the server once you catch the exception.


Correct me if I am wrong.
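
To make the "no rollback" point concrete, here is a toy model of the behaviour
Ryan describes. It is only a sketch under my own assumptions: the classes
WriteBufferSketch, BufferedClient and FlakyServer are hypothetical and are NOT
the HBase client API, edits are plain strings, and the model sends edits strictly
in order, which the real client does not promise.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT the HBase client API.
class WriteBufferSketch {

    /** Stand-in for a region server that can reject an edit. */
    interface Server {
        void apply(String edit) throws Exception;
    }

    /** Client that buffers edits locally, similar in spirit to auto-flush being off. */
    static class BufferedClient {
        private final List<String> writeBuffer = new ArrayList<>();
        private final Server server;

        BufferedClient(Server server) {
            this.server = server;
        }

        /** Queue edits locally; nothing is sent yet. */
        void put(List<String> edits) {
            writeBuffer.addAll(edits);
        }

        /** Send edits in order; an edit leaves the buffer only once the server accepts it. */
        void flushCommits() throws Exception {
            while (!writeBuffer.isEmpty()) {
                server.apply(writeBuffer.get(0)); // may throw; the edit then stays buffered
                writeBuffer.remove(0);
            }
        }

        /** Snapshot of the edits that have not been committed yet. */
        List<String> getWriteBuffer() {
            return new ArrayList<>(writeBuffer);
        }
    }

    /** Server that fails on "row3" until it is marked healthy. */
    static class FlakyServer implements Server {
        final List<String> committed = new ArrayList<>();
        boolean healthy = false;

        @Override
        public void apply(String edit) throws Exception {
            if (!healthy && edit.equals("row3")) {
                throw new Exception("simulated server failure on " + edit);
            }
            committed.add(edit);
        }
    }

    public static void main(String[] args) throws Exception {
        FlakyServer server = new FlakyServer();
        BufferedClient client = new BufferedClient(server);
        client.put(List.of("row1", "row2", "row3", "row4"));
        try {
            client.flushCommits();
        } catch (Exception e) {
            // No rollback: earlier edits stay committed, the rest stay buffered.
            System.out.println("committed:    " + server.committed);
            System.out.println("still queued: " + client.getWriteBuffer());
        }
        server.healthy = true;
        client.flushCommits(); // application-level retry of the leftovers
        System.out.println("after retry:  " + server.committed);
    }
}
```

In this model the edits accepted before the failure stay on the server, and the
buffer only tells the application what is left to retry.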

Sean


> Jim
>
> On Mon, Jan 31, 2011 at 8:04 PM, Ryan Rawson <[email protected]> wrote:
> > When you are using the buffer, you also need to flush it:
> >
> > htable.flushCommits();
> >
> > If the call succeeds, the edits were persisted.  If you get an
> > exception at any point, the unfinished edits are left in the write
> > buffer, and htable.getWriteBuffer() returns them.
> >
> > -ryan
> >
> > On Mon, Jan 31, 2011 at 10:48 AM, Sean Bigdatafun
> > <[email protected]> wrote:
> >> On Fri, Jan 14, 2011 at 10:51 PM, tsuna <[email protected]> wrote:
> >>
> >>> On Fri, Jan 14, 2011 at 4:06 PM, Sean Bigdatafun
> >>> <[email protected]> wrote:
> >>> > But how can the client understand which k-v belongs to an individual RS?
> >>> > Does it need to scan the .META. table? (if so, it's an expensive op). On
> >>> > the RegionServer side, is it like processing multiple requests in a batch
> >>> > per RPC?
> >>>
> >>> The client has to figure out which region each edit has to go to.  The
> >>> client maintains a local cache of the META table, so when you
> >>> frequently use the same working set of regions (which is common for
> >>> most applications), the lookups are essentially free.
> >>>
> >>> The worst case is a client that does random-writes to all the regions
> >>> in a huge table.  In this case, the client will end up discovering the
> >>> location of all the regions of that table and keep this in its
> >>> in-memory cache.  But regions move around, are split etc.  This does
> >>> cause extra META lookups, but the latency for a META lookup is
> >>> typically very small (even though the penalty incurred by the client
> >>> compared to cache hits in its local META cache is huge, comparatively
> >>> speaking).  Note that right now neither HTable nor asynchbase
> >>> proactively evicts unused entries from the local META cache to save
> >>> memory.  I don't think anyone is running HBase at a scale where this
> >>> optimization would be useful.
> >>>
> >>> If you have a write-heavy application, you're always going to get
> >>> significantly higher throughput when you send your edits in batch to
> >>> the server.  The downside to this is that when your client application
> >>> dies, you lose all the edits in the un-committed batch.  Unlike
> >>> HTable, asynchbase puts an upper bound on the amount of time an edit
> >>> is allowed to remain in the client's buffer, which helps limit
> >>> data-loss when a client crashes (OpenTSDB sets this to 1s by default,
> >>> so when it dies, you know you lost at most 1s worth of datapoints).
> >>>
> >> htable.setWriteBufferSize(1024 * 1024 * 10);  // 10 MB
> >> htable.setAutoFlush(false);                   // buffer puts on the client
> >>
> >> for (int i = 0; i < N; i++) {
> >>   list.add(putitem[i]);
> >> }
> >>
> >> htable.put(list);
> >>
> >>
> >> For the above pseudo code (using put(List) to commit updates in HBase), can I
> >> get a "batch transaction" success notification?
> >>       * i.e., how can I know that all the items have been successfully
> >> committed? It seems that I can't get such information; everything is
> >> best-effort. If I knew that some commits had failed, I could do an
> >> application-level retry.
> >>       * setAutoFlush(true); does not seem to give us any more reliable
> >> operation either.
> >>
> >>
> >>
> >>
> >>
> >>>
> >>> --
> >>> Benoit "tsuna" Sigoure
> >>> Software Engineer @ www.StumbleUpon.com
> >>>
> >>
> >>
> >>
> >> --
> >> --Sean
> >>
> >
>



-- 
--Sean
