On Fri, Jan 14, 2011 at 10:51 PM, tsuna <[email protected]> wrote:
> On Fri, Jan 14, 2011 at 4:06 PM, Sean Bigdatafun
> <[email protected]> wrote:
> > But how can the client understand which k-v belongs to an individual RS?
> > Does it need to scan the .META. table? (if so, it's an expensive op). On the
> > RegionServer side, is it like processing multiple requests in a batch per
> > RPC?
>
> The client has to figure out which region each edit has to go to. The
> client maintains a local cache of the META table, so when you
> frequently use the same working set of regions (which is common for
> most applications), the lookups are essentially free.
>
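A minimal sketch of the kind of client-side location cache described above, assuming regions are keyed by start key and a row belongs to the region with the greatest start key not larger than the row. RegionLocationCache, Region and metaLookup are made-up names for illustration; this is not the HTable or asynchbase code.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical client-side cache of region locations, sorted by region start key. */
class RegionLocationCache {
    /** One cached region covering [startKey, endKey); an empty endKey means "last region". */
    static final class Region {
        final String startKey, endKey, server;
        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }
        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    private final TreeMap<String, Region> byStartKey = new TreeMap<>();
    private final Function<String, Region> metaLookup; // stands in for the META lookup RPC
    int metaLookups = 0;                               // counts cache misses

    RegionLocationCache(Function<String, Region> metaLookup) {
        this.metaLookup = metaLookup;
    }

    /** Returns the region hosting the row, going to META only on a cache miss. */
    Region locate(String row) {
        Map.Entry<String, Region> e = byStartKey.floorEntry(row);
        if (e != null && e.getValue().contains(row)) {
            return e.getValue();                       // cache hit, no RPC
        }
        metaLookups++;
        Region r = metaLookup.apply(row);
        byStartKey.put(r.startKey, r);
        return r;
    }

    /** Drops a stale entry, e.g. after the region moved or was split. */
    void invalidate(String startKey) {
        byStartKey.remove(startKey);
    }
}
```

With a stable working set, every locate() after the first one per region is a sorted-map lookup in memory, which is why the lookups are essentially free.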
> The worst case is a client that does random-writes to all the regions
> in a huge table. In this case, the client will end up discovering the
> location of all the regions of that table and keep this in its
> in-memory cache. But regions move around, are split etc. This does
> cause extra META lookups, but the latency for a META lookup is
> typically very small (even though the penalty incurred by the client
> compared to cache hits in its local META cache is huge, comparatively
> speaking). Note that right now neither HTable nor asynchbase
> proactively evicts unused entries from the local META cache to save
> memory. I don't think anyone is running HBase at a scale where this
> optimization would be useful.
>
> If you have a write-heavy application, you're always going to get
> significantly higher throughput when you send your edits in batch to
> the server. The downside to this is that when your client application
> dies, you lose all the edits in the un-committed batch. Unlike
> HTable, asynchbase puts an upper bound on the amount of time an edit
> is allowed to remain in the client's buffer, which helps limit
> data-loss when a client crashes (OpenTSDB sets this to 1s by default,
> so when it dies, you know you lost at most 1s worth of datapoints).
>
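A minimal sketch of the time-bounded buffering described above, to make the trade-off concrete. This is not asynchbase's implementation: TimedWriteBuffer, tick and the sink callback are hypothetical names, and the clock is passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical buffer that flushes when full or when its oldest edit is too old. */
class TimedWriteBuffer<T> {
    private final int maxEdits;
    private final long maxAgeMillis;
    private final Consumer<List<T>> sink;          // stands in for the batched RPC
    private final List<T> pending = new ArrayList<>();
    private long oldestMillis;                     // arrival time of the oldest pending edit

    TimedWriteBuffer(int maxEdits, long maxAgeMillis, Consumer<List<T>> sink) {
        this.maxEdits = maxEdits;
        this.maxAgeMillis = maxAgeMillis;
        this.sink = sink;
    }

    /** Buffers one edit and flushes if the buffer is now full. */
    void add(T edit, long nowMillis) {
        if (pending.isEmpty()) {
            oldestMillis = nowMillis;
        }
        pending.add(edit);
        if (pending.size() >= maxEdits) {
            flush();
        }
    }

    /** Meant to be called periodically; flushes if the oldest edit has waited too long. */
    void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - oldestMillis >= maxAgeMillis) {
            flush();
        }
    }

    /** Sends everything buffered so far as one batch. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        sink.accept(batch);
    }

    int pendingCount() {
        return pending.size();
    }
}
```

With maxAgeMillis set to 1000 and tick() called often enough, a crash loses roughly one second of edits at most, at the cost of smaller batches when the write rate is low.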
htable.setWriteBufferSize(1024 * 1024 * 10);  // 10 MB
htable.setAutoFlush(false);
List<Put> list = new ArrayList<Put>();
for (int i = 0; i < N; i++) {
    list.add(putitem[i]);
}
htable.put(list);

For the above pseudo code (using put(List) to commit updates to HBase), can I
get a "batch transaction" success notification?

i.e., how can I know that all the items have been successfully committed?
It seems that I can't get this information, and everything is best-effort.
If I knew that some commits had failed, I could do an application-level
retry. setAutoFlush(true) does not seem to give us any more reliable
operation either.
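The application-level retry mentioned above would look roughly like this. It assumes a batch write that reports which items it could not commit, which is exactly the piece of information in question; BatchWriter and RetryingCommitter are hypothetical names, not HBase classes, and re-sending an item is assumed to be harmless.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: a batch write that returns the items it failed to commit. */
interface BatchWriter<T> {
    List<T> write(List<T> batch);
}

class RetryingCommitter {
    /**
     * Writes the batch, re-sending only the failed items, for at most maxAttempts rounds.
     * Returns whatever is still uncommitted; an empty list means every item went through.
     */
    static <T> List<T> commit(BatchWriter<T> writer, List<T> batch, int maxAttempts) {
        List<T> remaining = new ArrayList<>(batch);
        for (int attempt = 0; attempt < maxAttempts && !remaining.isEmpty(); attempt++) {
            remaining = new ArrayList<>(writer.write(remaining));
        }
        return remaining;
    }
}
```

An empty return value from commit() is the "batch transaction" success notification; a non-empty one lists exactly the items the application still has to deal with.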
>
> --
> Benoit "tsuna" Sigoure
> Software Engineer @ www.StumbleUpon.com
>
--
--Sean