You should post a screenshot of your topology in Storm UI for us to analyze.
The issue may be any one, or a combination, of:

* Hardware and OS environment the cluster runs on
* Storm and topology settings (maxSpoutPending, numWorkers, Java or Kryo serialization, worker JVM settings, etc.)
* Topology structure, i.e. the number, type and parallelism of your components, the type of bolt groupings, and usage of Trident (which incurs a performance hit compared to the plain Storm API).

The choice of groupings is very important, as others already mentioned. You should strive to minimize inter-worker tuple traffic whenever possible. First reduce your data, and then route it to the next bolt. Partition data as much as possible. LocalOrShuffle grouping is very useful here, as are CombinerAggregators in Trident, for example.

Try to measure the network throughput of your cluster to see if the network is saturated, and monitor your CPU and memory usage. Monitor for JVM GC pauses and other parameters too. Just tuning a few parameters can give you an order of magnitude performance boost, but you should first identify the bottleneck to know which parameter to tune.

As for Kryo serialization, set Config.setFallBackOnJavaSerialization to 'false' to disable falling back to Java serialization if Kryo can't be used. This way you'll know if Kryo is being used and, if not, the reason why (check the logs).

Danijel

On Thursday, June 26, 2014, <[email protected]> wrote:

> Hi,
> Yes, you're correct. After my adjustment, it can process 5500 tuples/s of
> whole topology. And I do a simple experiment:
> a. CountingBolt only: 11500 tuples/sec
> b. CountingBolt+parseDataBolt: 6000 tuples/sec
>
> These two bolts are both connected from kafkaSpout, so I think
> parseDataBolt is the bottleneck!!
>
> BUT!!! I try to increase the parallelism hint of parseDataBolt from 20 to
> 50, and it almost has no effect of throughput. What's the problems? If I
> need to process more tuples in the future, what's the solution ?
>
> Best regards,
> James
>
> Danijel Schiavuzzi <[email protected]> wrote on 2014/6/26 at 2:50 PM:
>
> > At your current throughput rate, the choice of Java or Kryo serialization
> > doesn't matter much. The bottleneck seems to be somewhere else.

--
Danijel Schiavuzzi
E: [email protected]
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7
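
P.S. To illustrate the "first reduce your data, then route it" advice, here is a minimal plain-Java sketch. It does not use the Storm or Trident APIs; the class name, method names and sample data are made up for illustration. It only counts how many tuples would have to leave each worker with and without a local pre-aggregation step, which is roughly the idea behind a combiner-style aggregation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: simulates workers as in-memory lists of words.
class LocalCombineSketch {

    // Pre-aggregate one worker's raw words into partial counts (the "reduce first" step).
    static Map<String, Long> combineLocally(List<String> words) {
        Map<String, Long> partial = new HashMap<>();
        for (String word : words) {
            partial.merge(word, 1L, Long::sum);
        }
        return partial;
    }

    // Merge the partial counts received from all workers into final totals.
    static Map<String, Long> mergePartials(List<Map<String, Long>> partials) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            for (Map.Entry<String, Long> e : partial.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    // Tuples leaving the workers if every raw word is forwarded as-is.
    static int tuplesSentWithoutCombine(List<List<String>> perWorker) {
        int sent = 0;
        for (List<String> words : perWorker) {
            sent += words.size();
        }
        return sent;
    }

    // Tuples leaving the workers if each worker forwards one partial count per distinct word.
    static int tuplesSentWithCombine(List<List<String>> perWorker) {
        int sent = 0;
        for (List<String> words : perWorker) {
            sent += combineLocally(words).size();
        }
        return sent;
    }

    public static void main(String[] args) {
        List<List<String>> perWorker = Arrays.asList(
                Arrays.asList("a", "a", "b"),
                Arrays.asList("a", "b", "b", "b"));

        System.out.println("sent without combine: " + tuplesSentWithoutCombine(perWorker));
        System.out.println("sent with combine:    " + tuplesSentWithCombine(perWorker));
    }
}
```

The saving depends entirely on how many duplicate keys each worker sees, so treat this as a way to reason about the traffic, not as a prediction for your topology.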
