On Mon, Jan 10, 2011 at 11:12 AM, Weishung Chung <[email protected]> wrote:
> Multiple batches of 10k *new/updated* rows at any time to different tables
> by different clients simultaneously. I want these multiple batches of
> insertions to be done super fast. At the same time, I would like to be able
> to scale up to 100k rows at a time (the goal). Now, I am building a cluster
> of size 6 to 7 nodes.

If you're writing a multi-threaded client and you're going to have many
clients like this writing to HBase continuously, I recommend writing your
application with asynchbase (http://github.com/stumbleupon/asynchbase)
instead. It's an alternate HBase client library I wrote, and in my
application it significantly increased write throughput.

It can easily push 150k updates per second to a 20-node cluster, and at
that point it's the local machine that's CPU bound, not the HBase cluster
(the local machine is a very slow VM, so it doesn't have a lot of
horsepower). This client is especially good for throughput-oriented
workloads and was written to be thread-safe from the ground up (unlike
HTable).

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
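
For illustration only, the general pattern of a single thread-safe client shared by many writer threads, which buffers updates and sends them in batches, can be sketched with the JDK alone. This is not asynchbase's API: `BatchingWriter`, `put`, `flush`, and the `sink` callback are hypothetical names, and the `sink` merely stands in for whatever actually ships a batch to the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch of a shared, thread-safe batching writer (not asynchbase). */
final class BatchingWriter {
    private final int batchSize;
    // Assumption: stands in for the network call; must itself be thread-safe.
    private final Consumer<List<String>> sink;
    private final Object lock = new Object();
    private List<String> buffer = new ArrayList<>();
    private List<CompletableFuture<Void>> waiters = new ArrayList<>();
    /** Number of batches handed to the sink so far. */
    final AtomicInteger flushes = new AtomicInteger();

    BatchingWriter(int batchSize, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Safe to call from any thread. The future completes once the row's batch is sent. */
    CompletableFuture<Void> put(String row) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        List<String> rows = null;
        List<CompletableFuture<Void>> toComplete = null;
        synchronized (lock) {
            buffer.add(row);
            waiters.add(done);
            if (buffer.size() >= batchSize) {
                // Swap the full buffer out under the lock, send it outside the lock.
                rows = buffer;
                toComplete = waiters;
                buffer = new ArrayList<>();
                waiters = new ArrayList<>();
            }
        }
        if (rows != null) {
            send(rows, toComplete);
        }
        return done;
    }

    /** Sends whatever is currently buffered, even if the batch is not full. */
    void flush() {
        List<String> rows;
        List<CompletableFuture<Void>> toComplete;
        synchronized (lock) {
            if (buffer.isEmpty()) {
                return;
            }
            rows = buffer;
            toComplete = waiters;
            buffer = new ArrayList<>();
            waiters = new ArrayList<>();
        }
        send(rows, toComplete);
    }

    private void send(List<String> rows, List<CompletableFuture<Void>> toComplete) {
        sink.accept(rows);
        flushes.incrementAndGet();
        for (CompletableFuture<Void> f : toComplete) {
            f.complete(null);
        }
    }
}
```

In this sketch the thread that fills a batch is the one that sends it, so the lock is held only long enough to swap buffers. A real client would more likely hand batches to dedicated I/O threads and flush on a timer as well.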
