Are your HBase bolts actually saturated? If not, you may want to increase the number of pending tuples (topology.max.spout.pending); too low a value will artificially throttle the topology before the bolts are even busy.
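For reference, this is roughly where that knob lives. A minimal sketch, assuming a pre-1.0 (backtype.storm) API since that matches the time frame; the class name, topology name, worker count, and the value of 5000 are placeholders to tune for your own setup:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class HBaseIngestTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // builder.setSpout(...) and builder.setBolt(...) for your spout and
            // HBase bolt go here; omitted because they are specific to your topology.

            Config conf = new Config();
            // Maximum number of un-acked tuples allowed in flight per spout task.
            // Too low and the spout is throttled before the HBase bolts are kept busy;
            // too high and bolt queues back up and tuples start timing out.
            conf.setMaxSpoutPending(5000);   // placeholder value; tune empirically
            conf.setNumWorkers(6);           // placeholder; match your cluster

            StormSubmitter.submitTopology("hbase-ingest", conf, builder.createTopology());
        }
    }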
You should also try attaching a profiler to your bolt to see what's holding it up. Are you doing batched puts (or committing puts on a separate thread)? That can also make a substantial difference; there is a rough sketch of a batching bolt below the quoted message.

On Mon, Jun 9, 2014 at 8:11 PM, Justin Workman <[email protected]> wrote:

> In response to a comment from P. Taylor Goetz on another thread... "I can
> personally verify that it is possible to process 1.2+ million (relatively
> small) messages per second with a 10-15 node cluster — and that includes
> writing to HBase, and other components (I don’t have the hardware specs
> handy, but can probably dig them up)."
>
> I would like to know what special knobs people are tuning in both Storm
> and HBase to achieve this level of throughput. Things I would be interested
> in are HBase cluster sizes, whether the cluster is shared with MapReduce
> load as well, bolt parallelism, and any other knobs people have adjusted to
> get this level of write throughput to HBase from Storm.
>
> Maybe this isn't the right group, but we are struggling to get more than
> about 2000 tuples/sec written to HBase. I think I know some of the
> bottlenecks, but would love to know what others in the community are tuning
> to get this level of performance.
>
> Our messages are roughly 300-500k and we are running on a 6-node Storm
> cluster running on virtual machines (our first bottleneck, which we will be
> replacing with 10 relatively beefy physical nodes), with a parallelism of 40
> for our storage bolt.
>
> Any hints on HBase or Storm optimizations that can be done to help
> increase the throughput to HBase would be greatly appreciated.
>
> Thanks
> Justin
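For what it's worth, here is a rough sketch of the kind of batching bolt I had in mind. It is not drop-in code: the table name ("events"), column family ("d"), tuple field names ("rowkey", "body"), batch size, and write buffer size are all made-up placeholders, and it targets the older HTable API (setAutoFlush/flushCommits) that is current as of this writing:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Tuple;

    // Sketch of a bolt that buffers Puts and writes them to HBase in batches
    // instead of issuing one put per tuple.
    public class BatchingHBaseBolt extends BaseRichBolt {
        private static final int BATCH_SIZE = 1000;   // placeholder; tune for your message size

        private transient OutputCollector collector;
        private transient HTable table;
        private transient List<Put> batch;
        private transient List<Tuple> pendingTuples;

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            this.batch = new ArrayList<Put>(BATCH_SIZE);
            this.pendingTuples = new ArrayList<Tuple>(BATCH_SIZE);
            try {
                Configuration hbaseConf = HBaseConfiguration.create();
                table = new HTable(hbaseConf, "events");      // placeholder table name
                table.setAutoFlush(false);                    // buffer writes client-side
                table.setWriteBufferSize(8 * 1024 * 1024);    // example: 8 MB write buffer
            } catch (Exception e) {
                throw new RuntimeException("Failed to open HBase table", e);
            }
        }

        @Override
        public void execute(Tuple tuple) {
            // Field names are placeholders for whatever your upstream bolt emits.
            Put put = new Put(Bytes.toBytes(tuple.getStringByField("rowkey")));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("body"),
                    Bytes.toBytes(tuple.getStringByField("body")));
            batch.add(put);
            pendingTuples.add(tuple);
            if (batch.size() >= BATCH_SIZE) {
                flush();
            }
        }

        private void flush() {
            try {
                table.put(batch);        // queue the whole batch in the client write buffer
                table.flushCommits();    // push buffered mutations to the region servers
                for (Tuple t : pendingTuples) {
                    collector.ack(t);    // ack only after the batch has been written
                }
            } catch (Exception e) {
                for (Tuple t : pendingTuples) {
                    collector.fail(t);   // let the spout replay the whole batch
                }
            } finally {
                batch.clear();
                pendingTuples.clear();
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt -- nothing to declare
        }
    }

One caveat with this sketch: a partially filled batch sits in memory until BATCH_SIZE is reached, so in practice you would also want a time-based flush (for example on tick tuples) so tuples in a small batch don't hit the topology message timeout.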
