In response to a comment from P. Taylor Goetz on another thread: "I can
personally verify that it is possible to process 1.2+ million (relatively
small) messages per second with a 10-15 node cluster — and that includes
writing to HBase, and other components (I don’t have the hardware specs
handy, but can probably dig them up)."

I would like to know what special knobs people are tuning in both Storm and
HBase to achieve this level of throughput. In particular, I am interested in
HBase cluster sizes, whether the cluster is shared with MapReduce load, bolt
parallelism, and any other settings people have adjusted to get this level
of write throughput to HBase from Storm.

Maybe this isn't the right group, but we are struggling to get more than
about 2000 tuples/sec when writing to HBase. I think I know some of the
bottlenecks, but would love to know what others in the community are tuning
to get this level of performance.

Our messages are roughly 300-500k each, and we are running a 6-node Storm
cluster on virtual machines (our first bottleneck, which we will be
replacing with 10 relatively beefy physical nodes), with a parallelism of 40
for our storage bolt.

Any hints on HBase or Storm optimizations that would help increase write
throughput to HBase would be greatly appreciated.

Thanks
Justin
