+1 for YourKit. I'll try to follow up with more information and specs, but for now here are a few details:
- it is a network monitoring use case dealing with network packets, so the message size is small
- the queue was Apache Kafka 0.8.x
- the spout was the storm-kafka spout that is now part of the Storm project and will be bundled with the next release
- the topologies were core Storm, not Trident
- the HBase integration used https://github.com/ptgoetz/storm-hbase (full disclosure: I'm the author. It ships with Hortonworks Data Platform (HDP) 2.1. I hope to include it in the Storm distribution if there is enough community interest/support.)
- Elasticsearch was also a component

There was a lot of tuning... Storm, HBase, etc. I'll follow up with additional details when I get a chance. (Rough code sketches are at the bottom of this mail.)

-Taylor

> On Jun 9, 2014, at 9:06 PM, Justin Workman <[email protected]> wrote:
>
> That is my understanding of capacity as well. I am certain I am looking at the right metric. I will forward a screenshot of the UI as soon as I get in front of a computer.
>
> I will also give YourKit a try.
>
> Sent from my iPhone
>
>> On Jun 9, 2014, at 6:58 PM, Jon Logan <[email protected]> wrote:
>>
>> Are you sure you are looking at the right figure? Capacity should not be > 1. High values indicate that you may want to increase parallelism on that step. Low values indicate something else is probably bottlenecking your topology. If you could send a screenshot of the Storm UI, that could be helpful.
>>
>> I've had good luck with YourKit... just remotely attach to a running worker.
>>
>>> On Mon, Jun 9, 2014 at 8:53 PM, Justin Workman <[email protected]> wrote:
>>>
>>> The capacity indicates they are being utilized. Capacity hovers around 0.800 and bursts to 1.6 or so when we see spikes of tuples or restart the topology.
>>>
>>> Recommendations on profilers?
>>>
>>> Sent from my iPhone
>>>
>>>> On Jun 9, 2014, at 6:50 PM, Jon Logan <[email protected]> wrote:
>>>>
>>>> Are your HBase bolts being saturated? If not, you may want to increase the number of pending tuples, as that could cause things to be artificially throttled.
>>>>
>>>> You should also try attaching a profiler to your bolt to see what's holding it up. Are you doing batched puts (or puts committed on a separate thread)? That could also bring substantial improvements.
>>>>
>>>>> On Mon, Jun 9, 2014 at 8:11 PM, Justin Workman <[email protected]> wrote:
>>>>>
>>>>> In response to a comment from P. Taylor Goetz on another thread... "I can personally verify that it is possible to process 1.2+ million (relatively small) messages per second with a 10-15 node cluster — and that includes writing to HBase, and other components (I don't have the hardware specs handy, but can probably dig them up)."
>>>>>
>>>>> I would like to know what special knobs people are tuning in both Storm and HBase to achieve this level of throughput. I would be interested in HBase cluster sizes, whether the cluster is shared with MapReduce load as well, bolt parallelism, and any other knobs people have adjusted to get this level of write throughput to HBase from Storm.
>>>>>
>>>>> Maybe this isn't the right group, but we are struggling to get more than about 2000 tuples/sec written to HBase. I think I know some of the bottlenecks, but would love to know what others in the community are tuning to get this level of performance.
>>>>> Our messages are roughly 300-500k, and we are running a 6-node Storm cluster on virtual machines (our first bottleneck, which we will be replacing with 10 relatively beefy physical nodes), with a parallelism of 40 for our storage bolt.
>>>>>
>>>>> Any hints on HBase or Storm optimizations that could help increase the throughput to HBase would be greatly appreciated.
>>>>>
>>>>> Thanks
>>>>> Justin
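For anyone who wants a concrete starting point, here is a minimal sketch of the kind of wiring described above: storm-kafka spout, a parse bolt, and a storm-hbase bolt, with the two knobs Jon mentioned (parallelism and max spout pending) set in the topology config. The ZooKeeper hosts, topic, table, field names, and all the numbers are placeholders, not the settings from the cluster I referred to, and the storm-hbase package/setter names are from the version being folded into Apache Storm, so the standalone release may differ slightly.

```java
import java.util.HashMap;
import java.util.UUID;

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;
import org.apache.storm.hbase.bolt.HBaseBolt;
import org.apache.storm.hbase.bolt.mapper.SimpleHBaseMapper;

public class PacketTopology {

    // Placeholder parser: in a real topology this is where the raw Kafka
    // message (a network packet) gets decoded into the fields the HBase
    // mapper expects. The field names and values here are made up.
    public static class ParsePacketBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            byte[] raw = input.getBinary(0); // default storm-kafka scheme emits raw bytes
            collector.emit(new Values(UUID.randomUUID().toString(), // row key
                    "10.0.0.1", "10.0.0.2", String.valueOf(raw.length)));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("packetId", "srcIp", "dstIp", "size"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Kafka spout: placeholder ZooKeeper connect string, topic and consumer id.
        SpoutConfig spoutConfig = new SpoutConfig(
                new ZkHosts("zk1:2181,zk2:2181,zk3:2181"),
                "packets",          // topic
                "/kafka-packets",   // ZK root where the spout stores offsets
                "packet-topology"); // consumer id
        KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

        // Map tuple fields onto an HBase row: row key + column family/columns.
        SimpleHBaseMapper mapper = new SimpleHBaseMapper()
                .withRowKeyField("packetId")
                .withColumnFields(new Fields("srcIp", "dstIp", "size"))
                .withColumnFamily("d");

        // storm-hbase bolt writing to a placeholder table; "hbase.conf" names a
        // map of HBase client overrides in the topology config (empty here, so
        // the client falls back to hbase-site.xml on the worker classpath).
        HBaseBolt hbaseBolt = new HBaseBolt("packets", mapper)
                .withConfigKey("hbase.conf");

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", kafkaSpout, 6);  // roughly one task per Kafka partition
        builder.setBolt("parse", new ParsePacketBolt(), 12)
               .shuffleGrouping("kafka-spout");
        builder.setBolt("hbase", hbaseBolt, 40)          // storage-bolt parallelism
               .shuffleGrouping("parse");

        Config conf = new Config();
        conf.setNumWorkers(6);
        // Cap on un-acked tuples per spout task. Too low throttles throughput,
        // too high risks timeouts and memory pressure; tune it empirically.
        conf.setMaxSpoutPending(2000);
        conf.put("hbase.conf", new HashMap<String, Object>());

        StormSubmitter.submitTopology("packet-topology", conf, builder.createTopology());
    }
}
```

The capacity figure on the HBase bolt in the Storm UI is what tells you whether those 40 executors are enough; if it stays near 1.0, add executors or batch the writes harder.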

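And on Jon's point about batched puts: if you are writing your own HBase bolt instead of using storm-hbase, the biggest single win is usually disabling auto-flush so puts are buffered client-side and sent in batches. Below is a rough sketch against the HBase 0.94/0.96-era client API; the table name, column family, field names, and flush threshold are all placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Buffered HBase writer (sketch). Puts are buffered client-side and flushed
// every FLUSH_COUNT tuples; acks are deferred until the flush so un-flushed
// tuples get replayed if the worker dies.
public class BufferedHBaseBolt extends BaseRichBolt {

    private static final int FLUSH_COUNT = 1000;          // placeholder batch size
    private static final byte[] CF = Bytes.toBytes("d");  // placeholder column family

    private OutputCollector collector;
    private HTable table;
    private final List<Tuple> pending = new ArrayList<Tuple>();

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        try {
            Configuration hbConf = HBaseConfiguration.create();  // reads hbase-site.xml
            table = new HTable(hbConf, "packets");               // placeholder table
            table.setAutoFlush(false);                           // buffer puts client-side
            table.setWriteBufferSize(8 * 1024 * 1024);           // 8 MB write buffer
        } catch (Exception e) {
            throw new RuntimeException("failed to open HBase table", e);
        }
    }

    @Override
    public void execute(Tuple tuple) {
        try {
            pending.add(tuple);
            Put put = new Put(Bytes.toBytes(tuple.getStringByField("packetId")));
            put.add(CF, Bytes.toBytes("srcIp"), Bytes.toBytes(tuple.getStringByField("srcIp")));
            put.add(CF, Bytes.toBytes("dstIp"), Bytes.toBytes(tuple.getStringByField("dstIp")));
            table.put(put);               // buffered locally, not yet sent

            if (pending.size() >= FLUSH_COUNT) {
                table.flushCommits();     // one round trip for the whole batch
                for (Tuple t : pending) {
                    collector.ack(t);
                }
                pending.clear();
            }
        } catch (Exception e) {
            for (Tuple t : pending) {
                collector.fail(t);        // let Storm replay the un-flushed batch
            }
            pending.clear();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt: nothing to declare
    }
}
```

A production version would also flush on a tick tuple or a small timer so a quiet stream doesn't sit in the buffer past topology.message.timeout.secs, and topology.max.spout.pending should be kept comfortably above the batch size so the bolt can actually fill a batch.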