In response to a comment from P. Taylor Goetz on another thread: "I can personally verify that it is possible to process 1.2+ million (relatively small) messages per second with a 10-15 node cluster - and that includes writing to HBase, and other components (I don’t have the hardware specs handy, but can probably dig them up)."
I would like to know what knobs people are tuning in both Storm and HBase to achieve this level of throughput. In particular, I am interested in HBase cluster sizes, whether the cluster is shared with MapReduce load, bolt parallelism, and any other settings people have adjusted to get this level of write throughput to HBase from Storm.

Maybe this isn't the right group, but we are struggling to get more than about 2000 tuples/sec writing to HBase. I think I know some of the bottlenecks, but would love to know what others in the community are tuning to get this level of performance. Our messages are roughly 300-500k, and we are running a 6-node Storm cluster on virtual machines (our first bottleneck, which we will be replacing with 10 relatively beefy physical nodes), with a parallelism of 40 for our storage bolt.

Any hints on HBase or Storm optimizations that would help increase the throughput to HBase would be greatly appreciated.

Thanks,
Justin
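
For context, here is a rough back-of-envelope sketch of what the figures above imply. It only does arithmetic on the numbers already given. Two things are assumptions and not stated in the post: that "300-500k" means 300-500 kilobytes per message (with 1 KB = 1000 bytes), and that a parallelism of 40 means 40 executors sharing the write load evenly. The function names are hypothetical helpers for illustration.

```python
# Back-of-envelope arithmetic on the figures quoted in the post.
# ASSUMPTION: "300-500k" means 300-500 kilobytes per message (1 KB = 1000 bytes).
# ASSUMPTION: parallelism of 40 means 40 executors sharing the load evenly.

TUPLES_PER_SEC = 2000              # observed write rate to HBase
BOLT_PARALLELISM = 40              # storage bolt parallelism
MSG_KB_LOW, MSG_KB_HIGH = 300, 500  # assumed message size range in KB


def per_executor_rate(total_rate, parallelism):
    """Tuples per second each executor handles if load is spread evenly."""
    return total_rate / parallelism


def bandwidth_mb_per_sec(rate, msg_kb):
    """Aggregate payload volume in MB/s for a tuple rate and message size in KB."""
    return rate * msg_kb / 1000


if __name__ == "__main__":
    print("tuples/sec per executor:",
          per_executor_rate(TUPLES_PER_SEC, BOLT_PARALLELISM))
    print("payload MB/s (low): ",
          bandwidth_mb_per_sec(TUPLES_PER_SEC, MSG_KB_LOW))
    print("payload MB/s (high):",
          bandwidth_mb_per_sec(TUPLES_PER_SEC, MSG_KB_HIGH))
```

If the kilobyte assumption holds, 2000 tuples/sec already corresponds to roughly 600-1000 MB/s of payload, so a tuple count alone may not compare directly with the 1.2+ million "relatively small" messages per second quoted above; bytes per second may be the more useful figure when comparing setups.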
