+1 for YourKit.

I'll try to follow up with more information and specs, but for now here are a 
few details:

- it was a network monitoring use case dealing with network packets, so the 
message size was small.
- the queue was Apache Kafka 0.8.x
- the spout was the storm-kafka spout that is now part of the Storm project and 
will be bundled with the next release
- the topologies were core storm, not Trident
- the HBase integration used https://github.com/ptgoetz/storm-hbase (full 
disclosure: I'm the author). It ships with Hortonworks Data Platform (HDP) 2.1, 
and I hope to include it in the Storm distribution if there is enough community 
interest/support.
- Elasticsearch was also a component.

There was a lot of tuning involved: Storm, HBase, etc. 
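For what it's worth, most of the Storm-side knobs live in the topology Config. The values below are hypothetical starting points, not the actual settings from that deployment:

```java
import backtype.storm.Config;  // org.apache.storm.Config in later releases

Config conf = new Config();
// Hypothetical baselines -- tune against your own cluster and workload.
conf.setNumWorkers(10);        // often one worker per node as a starting point
conf.setMaxSpoutPending(5000); // cap on un-acked tuples in flight per spout task
conf.setNumAckers(10);         // acker parallelism for at-least-once topologies
```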

I'll follow up with additional details when I get a chance.

-Taylor

> On Jun 9, 2014, at 9:06 PM, Justin Workman <[email protected]> wrote:
> 
> That is my understanding of capacity as well. I am certain I am looking at 
> the right metric. I will forward a screen shot of the ui as soon as I get in 
> front of a computer. 
> 
> I will also give YourKit a try. 
> 
> Sent from my iPhone
> 
>> On Jun 9, 2014, at 6:58 PM, Jon Logan <[email protected]> wrote:
>> 
>> Are you sure you are looking at the right figure? Capacity should not be > 
>> 1. High values indicate that you may want to increase parallelism on that 
>> step. Low values indicate something else is probably bottlenecking your 
>> topology. If you could send a screenshot of the Storm UI that could be 
>> helpful.
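For context on why values near 1 mean saturation: the UI's capacity figure is roughly the fraction of the metrics window the executor spent executing tuples. A self-contained sketch of that arithmetic (illustrative only, not Storm's actual code):

```java
public class CapacityExample {
    // capacity ~ (tuples executed * average execute latency) / window length,
    // i.e. the share of the window the executor spent busy.
    static double capacity(long executed, double executeLatencyMs, long windowMs) {
        return (executed * executeLatencyMs) / windowMs;
    }

    public static void main(String[] args) {
        // 5,000 tuples at 2 ms each over a 10-second window: busy the whole
        // window, so capacity is 1.0 (saturated).
        System.out.println(capacity(5_000, 2.0, 10_000));
    }
}
```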
>> 
>> 
>> I've had good luck with YourKit...just remotely attach to a running worker.
>> 
>> 
>>> On Mon, Jun 9, 2014 at 8:53 PM, Justin Workman <[email protected]> 
>>> wrote:
>>> The capacity metric indicates they are being utilized. Capacity hovers 
>>> around 0.800 and bursts to 1.6 or so when we see spikes of tuples or 
>>> restart the topology. 
>>> 
>>> Recommendations on profilers?
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Jun 9, 2014, at 6:50 PM, Jon Logan <[email protected]> wrote:
>>>> 
>>>> Are your HBase bolts being saturated? If not, you may want to increase the 
>>>> number of pending tuples, as that could cause things to be artificially 
>>>> throttled.
>>>> 
>>>> You should also try attaching a profiler to your bolt to see what's 
>>>> holding it up. Are you doing batched puts (or puts committed on a 
>>>> separate thread)? That could also yield substantial improvements.
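To make the batching suggestion concrete, here's a generic sketch of the pattern (illustrative only; BatchWriter is a made-up helper, not the storm-hbase bolt's implementation): buffer writes and flush them in a single round trip once the batch fills.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative batching pattern: accumulate writes, flush them in one call.
class BatchWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> flusher;  // e.g. a wrapper around HTable.put(List<Put>)
    private final List<T> buffer = new ArrayList<>();

    BatchWriter(int batchSize, Consumer<List<T>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) flush();
    }

    void flush() {
        if (!buffer.isEmpty()) {
            flusher.accept(new ArrayList<>(buffer)); // one round trip per batch
            buffer.clear();
        }
    }
}
```

In a real bolt you would also flush on a tick tuple so small batches don't sit in the buffer indefinitely.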
>>>> 
>>>> 
>>>>> On Mon, Jun 9, 2014 at 8:11 PM, Justin Workman <[email protected]> 
>>>>> wrote:
>>>>> In response to a comment from P. Taylor Goetz on another thread..."I can 
>>>>> personally verify that it is possible to process 1.2+ million (relatively 
>>>>> small) messages per second with a 10-15 node cluster — and that includes 
>>>>> writing to HBase, and other components (I don’t have the hardware specs 
>>>>> handy, but can probably dig them up)."
>>>>> 
>>>>> I would like to know what special knobs people are tuning in both Storm 
>>>>> and HBase to achieve this level of throughput. Things I would be 
>>>>> interested in: HBase cluster sizes, whether the cluster is shared with 
>>>>> MapReduce load as well, bolt parallelism, and any other knobs people have 
>>>>> adjusted to get this level of write throughput to HBase from Storm.
>>>>> 
>>>>> Maybe this isn't the right group, but we are struggling to get more than 
>>>>> about 2,000 tuples/sec written to HBase. I think I know some of the 
>>>>> bottlenecks, but would love to know what others in the community are 
>>>>> tuning to get this level of performance.
>>>>> 
>>>>> Our messages are roughly 300-500k, and we are running a 6-node Storm 
>>>>> cluster on virtual machines (our first bottleneck, which we will be 
>>>>> replacing with 10 relatively beefy physical nodes) with a parallelism of 
>>>>> 40 for our storage bolt. 
>>>>> 
>>>>> Any hints on HBase or Storm optimizations that could help increase the 
>>>>> throughput to HBase would be greatly appreciated.
>>>>> 
>>>>> Thanks
>>>>> Justin
