On Fri, Jan 14, 2011 at 4:06 PM, Sean Bigdatafun wrote:
> But how can the client understand which k-v belongs to an individual RS?
> Does it need to scan the .META. table? (if so, it's an expensive op). On the
> RegionServer side, is it like processing multiple requests in a batch per
> RPC?

The client has to figure out which region each edit goes to.  The
client maintains a local cache of the .META. table, so when you
repeatedly use the same working set of regions (which is the case for
most applications), the lookups are essentially free.
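To illustrate why a warm cache makes lookups nearly free, here is a
minimal sketch of such a cache.  It is not HTable's or asynchbase's
actual code.  Assumptions: row keys are modeled as Strings (real row
keys are byte arrays compared as unsigned bytes), and RegionLocation,
RegionCache and metaLookup are hypothetical names, with metaLookup
standing in for the RPC that reads .META.  Each cached region is keyed
by its start key, so finding the region for a row is a single
floorEntry() call on a sorted map.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical cached location of one region covering rows in [startKey, endKey). */
final class RegionLocation {
    final String startKey; // inclusive; "" means start of table
    final String endKey;   // exclusive; "" means end of table
    final String server;   // region server currently hosting the region

    RegionLocation(String startKey, String endKey, String server) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.server = server;
    }

    boolean contains(String row) {
        return row.compareTo(startKey) >= 0
            && (endKey.isEmpty() || row.compareTo(endKey) < 0);
    }
}

/** Sketch of a client-side META cache: region locations sorted by start key. */
final class RegionCache {
    private final TreeMap<String, RegionLocation> byStartKey = new TreeMap<>();
    private final Function<String, RegionLocation> metaLookup; // hypothetical stand-in for a .META. RPC
    int metaLookups = 0; // counts cache misses, for illustration only

    RegionCache(Function<String, RegionLocation> metaLookup) {
        this.metaLookup = metaLookup;
    }

    RegionLocation locate(String row) {
        // The only candidate is the cached region with the greatest start key <= row.
        Map.Entry<String, RegionLocation> e = byStartKey.floorEntry(row);
        if (e != null && e.getValue().contains(row)) {
            return e.getValue(); // cache hit: no network round trip
        }
        metaLookups++; // cache miss: ask .META. and remember the answer
        RegionLocation loc = metaLookup.apply(row);
        byStartKey.put(loc.startKey, loc);
        return loc;
    }
}
```

Once every region in the working set has been seen once, locate()
never leaves the client.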

The worst case is a client that does random writes to all the regions
in a huge table.  In this case, the client will end up discovering the
location of every region of that table and keeping it in its
in-memory cache.  But regions move around, get split, etc., which
causes extra META lookups.  A META lookup is typically fast in
absolute terms, even though it is very expensive compared to a hit in
the client's local META cache.  Note that right now neither HTable nor
asynchbase proactively evicts unused entries from the local META cache
to save memory.  I don't think anyone is running HBase at a scale where
this optimization would be useful.
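A rough sketch of where those extra META lookups come from when a
cached location goes stale after a region moves.  Assumptions:
StaleLocationException, RegionRpc and RetryingWriter are hypothetical
names (not HBase classes), the cache is keyed by region name to keep
it short, and metaLookup stands in for a .META. read.  When a server
rejects an edit, the stale entry is dropped and the next attempt
re-reads .META.  A split invalidates entries the same way, although a
real client also has to re-resolve which of the new regions the row
falls in.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical error a region server returns when it no longer hosts a region. */
final class StaleLocationException extends RuntimeException {
    StaleLocationException(String msg) { super(msg); }
}

/** Hypothetical RPC that sends one edit for `region` to `server`. */
interface RegionRpc {
    void send(String server, String region, String edit);
}

final class RetryingWriter {
    private final Map<String, String> cache = new HashMap<>(); // region name -> server
    private final Function<String, String> metaLookup;          // hypothetical .META. read
    private final RegionRpc rpc;
    int metaLookups = 0; // counts trips to .META., for illustration only

    RetryingWriter(Function<String, String> metaLookup, RegionRpc rpc) {
        this.metaLookup = metaLookup;
        this.rpc = rpc;
    }

    void write(String region, String edit, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            String server = cache.get(region);
            if (server == null) { // not cached (or just evicted): consult .META.
                metaLookups++;
                server = metaLookup.apply(region);
                cache.put(region, server);
            }
            try {
                rpc.send(server, region, edit);
                return;
            } catch (StaleLocationException e) {
                cache.remove(region); // stale entry: force a fresh lookup on retry
                if (attempt >= maxAttempts) throw e;
            }
        }
    }
}
```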

If you have a write-heavy application, you're always going to get
significantly higher throughput when you send your edits in batch to
the server.  The downside is that when your client application
dies, you lose all the edits in the uncommitted batch.  Unlike
HTable, asynchbase puts an upper bound on the amount of time an edit
is allowed to remain in the client's buffer, which helps limit
data loss when a client crashes (OpenTSDB sets this to 1s by default,
so when it dies, you know you lost at most 1s worth of datapoints).
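A minimal sketch of a buffer that groups edits into one batch per
region server and also bounds how long an edit can sit in memory.  It
follows the behavior described above but is not asynchbase's code.
Hypothetical: EditBuffer, serverFor (maps a row to its region server,
e.g. backed by a META cache) and sendBatch (stands in for one
multi-edit RPC per server).  maxEdits and maxAgeMillis are
illustrative; 1000 ms mirrors the 1s OpenTSDB default.  Time is passed
in explicitly, and tick() is assumed to be driven by a periodic timer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

final class EditBuffer {
    private final int maxEdits;                               // flush when this many edits are pending
    private final long maxAgeMillis;                          // flush when the oldest edit is this old
    private final Function<String, String> serverFor;         // hypothetical row -> region server
    private final BiConsumer<String, List<String>> sendBatch; // hypothetical batched RPC to one server
    private final List<String> pending = new ArrayList<>();
    private long oldestEditMillis;

    EditBuffer(int maxEdits, long maxAgeMillis,
               Function<String, String> serverFor,
               BiConsumer<String, List<String>> sendBatch) {
        this.maxEdits = maxEdits;
        this.maxAgeMillis = maxAgeMillis;
        this.serverFor = serverFor;
        this.sendBatch = sendBatch;
    }

    void add(String row, long nowMillis) {
        if (pending.isEmpty()) oldestEditMillis = nowMillis;
        pending.add(row);
        if (pending.size() >= maxEdits) flush(); // size-based flush
    }

    /** Called periodically; caps how long an edit can stay unsent. */
    void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - oldestEditMillis >= maxAgeMillis) flush();
    }

    void flush() {
        // Group pending edits by destination so each server gets one RPC.
        Map<String, List<String>> byServer = new TreeMap<>();
        for (String row : pending) {
            byServer.computeIfAbsent(serverFor.apply(row), s -> new ArrayList<>()).add(row);
        }
        pending.clear();
        byServer.forEach(sendBatch);
    }

    int pendingCount() { return pending.size(); }
}
```

Anything still pending when the process dies is lost, so the window of
lost edits is roughly maxAgeMillis plus the tick interval, as long as
the timer keeps calling tick().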

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
