This is a huge email to drop on Xmas eve; you won't likely get a comprehensive answer until January. I can offer a few tidbits, though...
First off, you have to moderate your expectations in terms of datagrams and RPC calls. Wanting to do 10k RPC calls/sec/node is a bit of a hard sell, especially since there is a flow of data from client -> thrift -> regionserver -> datanode. There is shared infrastructure there, and you can get weird pauses as hidden dependencies are revealed, e.g. 3 regionservers hammer 1 datanode and things choke a bit.

So, back to performance-land: the key here is batching if at all possible. If you want to do 10k inserts/sec with good performance, batching is key. I think if you use the batch put calls in Thrift, that will do what is expected.

As for the read side, I think your read goals are achievable. We get low read latencies using PHP and Thrift. Interestingly, running through Thrift amortizes the client-side cache across short-lived scripts/programs, since the gateway stays warm.

-ryan

On Fri, Dec 24, 2010 at 5:09 AM, Wayne <[email protected]> wrote:

> We are in the process of evaluating HBase in an effort to switch from a
> different NoSQL solution. Performance is of course an important part of
> our evaluation. We are a Python shop, and we are very worried that we
> cannot get any real performance out of HBase using Thrift (and must drop
> down to Java). We are aware of the various lower-level options for bulk
> insert, or Java-based inserts with the WAL turned off, etc., but none of
> these are available to us in Python, so they are not part of our
> evaluation. We have a 10-node cluster (24GB RAM, 6 x 1TB disks, 16 cores)
> that we are setting up as data/region nodes, and we are looking for
> suggestions on configuration as well as benchmarks in terms of
> performance expectations. Below are some specific questions. I realize
> there are a million factors that help determine specific performance
> numbers, so any examples from running clusters would be great as
> examples of what can be done. Again, Thrift seems to be our "problem",
> so non-Java-based solutions are preferred (do any non-Java shops run
> large-scale HBase clusters?).
> Our total production cluster size is estimated to be 50TB.
>
> Our data model is 3 CFs: one primary and 2 secondary indexes. All writes
> go to all 3 CFs and are grouped as a batch of row mutations, which
> should avoid row-locking issues.
>
> What heap size is recommended for the master and for region servers
> (24GB RAM)?
> What other settings can/should be tweaked in HBase to optimize
> performance (we have looked at the wiki page)?
> What is a good batch size for writes? We will start with 10k
> values/batch.
> How many concurrent writers/readers can a single data node handle with
> evenly distributed load? Are there settings specific to this?
> What is "very good" read/write latency for a single put/get in HBase
> using Thrift?
> What is "very good" read/write throughput per node in HBase using
> Thrift?
>
> We are looking to get performance numbers in the range of 10k aggregate
> inserts/sec/node and read latency < 30ms/read with 3-4 concurrent
> readers/node. Can our expectations be met with HBase through Thrift?
> Can they be met with HBase through Java?
>
> Thanks in advance for any help, examples, or recommendations that you
> can provide!
>
> Wayne
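To make the batching suggestion concrete, here is a rough Python sketch of buffering row mutations client-side and shipping them in chunks via the Thrift mutateRows call. The Mutation/BatchMutation classes below only mimic the shape of the gen-py structs generated from Hbase.thrift, and the client is a stub rather than a real connection, so treat this as an illustration of the batching pattern, not working cluster code.

```python
# Sketch: buffer per-row mutations (spanning all 3 CFs) and flush them
# to the Thrift gateway in fixed-size chunks via mutateRows().

class Mutation:
    """Stand-in for the Thrift Mutation struct (column, value)."""
    def __init__(self, column, value):
        self.column = column
        self.value = value

class BatchMutation:
    """Stand-in for the Thrift BatchMutation struct (row, mutations)."""
    def __init__(self, row, mutations):
        self.row = row
        self.mutations = mutations

class BatchingWriter:
    """Buffers rows and sends one mutateRows() RPC per batch_size rows."""
    def __init__(self, client, table, batch_size=10000):
        self.client = client
        self.table = table
        self.batch_size = batch_size
        self.buffer = []

    def put(self, row, cells):
        # cells: dict mapping "family:qualifier" -> value
        muts = [Mutation(col, val) for col, val in cells.items()]
        self.buffer.append(BatchMutation(row, muts))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.client.mutateRows(self.table, self.buffer)
            self.buffer = []

class StubClient:
    """Records batch sizes instead of talking to a real Thrift gateway."""
    def __init__(self):
        self.calls = []
    def mutateRows(self, table, batches):
        self.calls.append(len(batches))

client = StubClient()
writer = BatchingWriter(client, "mytable", batch_size=1000)
for i in range(2500):
    writer.put("row%08d" % i, {"primary:data": "v",
                               "idx1:ref": "v",
                               "idx2:ref": "v"})
writer.flush()  # ship the final partial batch
# 2500 rows at batch_size=1000 -> three RPCs of 1000, 1000, and 500 rows
```

With a real gen-py client in place of the stub, the same loop turns 2500 single-row puts into three RPCs, which is the kind of amortization Ryan is pointing at.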
