We are in the process of evaluating HBase as a replacement for a different
NoSQL solution. Performance is, of course, an important part of our
evaluation. We are a Python shop, and we are very worried that we cannot get
acceptable performance out of HBase using Thrift (and will have to drop down
to Java). We are aware of the various lower-level options for bulk inserts,
or Java-based inserts with the WAL turned off, etc., but none of these are
available to us from Python, so they are not part of our evaluation. We have a
10-node cluster (24 GB RAM, 6 x 1 TB disks, 16 cores per node) that we are
setting up as data/region nodes. We are looking for configuration
suggestions, as well as benchmarks to calibrate our performance expectations.
Below are some specific questions. I realize countless factors determine
specific performance numbers, so any figures from running clusters would be
very helpful as examples of what can be done. Again, Thrift seems to be our
"problem", so non-Java solutions are preferred (do any non-Java shops run
large-scale HBase clusters?). Our total production cluster size is estimated
at 50 TB.

Our data model uses three column families (CFs): one primary and two
secondary indexes. All writes go to all three CFs and are grouped into a
batch of row mutations, which should avoid row-locking issues.
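To make the data model concrete, here is a minimal Python sketch of how one logical write could be collected into a single row mutation covering all three CFs. It assumes all three CFs live in the same row. The CF names are hypothetical. The Mutation and BatchMutation classes are local stand-ins loosely modeled on the structs of the same name in HBase's Thrift interface. They are not the generated Thrift classes, whose fields vary by version.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Hypothetical CF names: the primary CF plus the two secondary-index CFs.
PRIMARY_CF = "data"
INDEX_CFS = ("idx_a", "idx_b")
ALL_CFS = (PRIMARY_CF,) + INDEX_CFS


@dataclass
class Mutation:
    """Local stand-in for one cell write; column is "family:qualifier"."""
    column: str
    value: bytes


@dataclass
class BatchMutation:
    """Local stand-in for every mutation applied to a single row."""
    row: bytes
    mutations: List[Mutation] = field(default_factory=list)


def build_row_batch(row: bytes,
                    values_by_cf: Dict[str, Dict[str, bytes]]) -> BatchMutation:
    """Collect one logical write to all three CFs into a single row mutation."""
    # Every write is expected to touch exactly the three CFs, no more, no less.
    if set(values_by_cf) != set(ALL_CFS):
        raise ValueError(f"expected CFs {ALL_CFS}, got {sorted(values_by_cf)}")
    batch = BatchMutation(row)
    for cf in ALL_CFS:
        for qualifier in sorted(values_by_cf[cf]):
            batch.mutations.append(
                Mutation(f"{cf}:{qualifier}", values_by_cf[cf][qualifier]))
    return batch


def build_batch(
    records: Iterable[Tuple[bytes, Dict[str, Dict[str, bytes]]]],
) -> List[BatchMutation]:
    """Turn (row, values_by_cf) records into the list sent as one batch."""
    return [build_row_batch(row, values) for row, values in records]
```

The list returned by build_batch is what would be handed to the Thrift client in a single call (for example, mutateRows in the Thrift interface), so each row's three CF updates are submitted together.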

What heap size is recommended for the master and for the region servers (24 GB RAM)?
What other settings can or should be tuned in HBase to optimize performance (we have looked at the wiki page)?
What is a good batch size for writes? We will start with 10k values per batch.
How many concurrent writers/readers can a single data node handle with evenly distributed load? Are there settings specific to this?
What is "very good" read/write latency for a single put/get in HBase using Thrift?
What is "very good" read/write throughput per node in HBase using Thrift?
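Since the right batch size is something we will most likely have to find empirically, here is a minimal Python sketch of a sweep harness. It splits the writes into fixed-size batches (10k is just our starting point) and measures values/sec for each candidate size. The send callable is hypothetical: it stands for whatever function writes one batch, such as a thin wrapper around a Thrift client call.

```python
import time
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def sweep_batch_sizes(send: Callable[[List[T]], None], items: Sequence[T],
                      sizes: Iterable[int]) -> Dict[int, float]:
    """Write all `items` once per batch size; return values/sec per size.

    `send` is a hypothetical callable that writes one batch; it is not
    part of any real HBase or Thrift API.
    """
    results: Dict[int, float] = {}
    for size in sizes:
        start = time.perf_counter()
        for chunk in batched(items, size):
            send(chunk)
        # Guard against a zero interval on very fast no-op senders.
        elapsed = max(time.perf_counter() - start, 1e-9)
        results[size] = len(items) / elapsed
    return results
```

For example, sweep_batch_sizes(send, rows, [1_000, 5_000, 10_000, 20_000]) would compare our 10k starting point against smaller and larger batches. Because every pass rewrites the same rows, later passes overwrite earlier cells.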

We are aiming for roughly 10k aggregate inserts/sec per node and a read
latency under 30 ms per read with 3-4 concurrent readers per node. Can these
expectations be met with HBase through Thrift? Can they be met with HBase
through Java?
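As a rough sanity check on what these targets imply, here is a small Python sketch of the arithmetic. It assumes each reader issues its next read as soon as the previous one returns, with no think time. Under that assumption, Little's law (concurrency = throughput x latency) gives the read rate each node would be serving. The function names are hypothetical.

```python
def cluster_insert_rate(per_node_rate: float, nodes: int) -> float:
    """Aggregate inserts/sec across the cluster for a per-node target."""
    return per_node_rate * nodes


def required_read_rate(concurrent_readers: int, latency_s: float) -> float:
    """Little's law (L = lambda * W) solved for lambda.

    Assumes closed-loop readers with no think time: at a mean latency of
    latency_s, the node serves concurrent_readers / latency_s reads/sec.
    """
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return concurrent_readers / latency_s


if __name__ == "__main__":
    print(f"cluster inserts/sec: {cluster_insert_rate(10_000, 10):,.0f}")
    for readers in (3, 4):
        print(f"{readers} readers @ 30 ms -> "
              f"{required_read_rate(readers, 0.030):.0f} reads/sec/node")
```

Under these assumptions, the targets work out to 100k inserts/sec across the 10-node cluster and roughly 100 to 133 reads/sec per node, served alongside the insert load.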

Thanks in advance for any help, examples, or recommendations that you can
provide!

Wayne
