Hi,

We are upgrading our Storm production cluster from 1.2.1 to 2.2.0, and we are observing that the performance of our topologies has gotten worse compared to the 1.2.1 cluster.
Just to give some insight into the performance impact: the processing QPS per worker used to be ~400 and is now below 100 on the 2.2.0 cluster. The topology code hasn't been changed, except for some type-casting changes (the code is in Scala and the build was failing without them). The storm.yaml config is also mostly the same as on the 1.2.1 cluster; we have only updated the following:

- topology.spout.wait.strategy: from *SleepSpoutWaitStrategy* to *WaitStrategyProgressive*
- Added *topology.disable.loadaware.messaging: true* (from this Stack Overflow post: <https://stackoverflow.com/questions/64689408/apache-strom-upgrade-from-1-0-3-to-2-2-0-and-not-all-workers-are-used>)

The parallelism hints and the number of workers for the different bolts are also the same as they were on the 1.2.1 cluster, and the machines the workers run on are identical (cores, memory, etc.).

We have spent the whole week trying to figure this out. Finally, we tried tweaking the batch configs (topology.transfer.batch.size and topology.producer.batch.size), but with those the topologies get stuck. We will spend more time on this, but ideally the topologies should scale as well as they did on the 1.2.1 cluster, so we need some other direction here. Please suggest potential issues you have observed or other configs we can tweak (a sketch of our current overrides is at the end of this mail).

Some of our observations:

- Even after increasing the parallelism hints for bolts/spouts, the overall QPS remains the same. With 3 workers (and reduced parallelism) we get ~400 qps / slot => ~1200 qps overall, but with 9 workers (increased parallelism) we only reach ~140 qps / slot => again ~1200 qps overall.
- We have checked the logs (application-generated, nimbus, supervisor and worker) and don't see any issues there. Let us know if there are particular logs we should look for.
- The load average has increased on the new cluster. Earlier it used to vary from 1 to 2.5 per machine; now it varies from 1 to 10 per machine. Given that these are 12-core machines, we figured it shouldn't be an issue.

Thanks

--
Regards,
Shubham Kumar Jha
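P.S. For reference, here is a sketch of the storm.yaml overrides on the 2.2.0 cluster that differ from our 1.2.1 config. The numeric batch values are illustrative only (the topologies get stuck once we enable batching, so they are shown commented out), and the fully qualified class name is what we believe is correct for 2.2.0:

  # changed from SleepSpoutWaitStrategy (our 1.2.1 setting)
  topology.spout.wait.strategy: "org.apache.storm.policy.WaitStrategyProgressive"
  # per the Stack Overflow post linked above
  topology.disable.loadaware.messaging: true
  # batch configs we have been tweaking (example values only; topologies get stuck with these enabled)
  # topology.producer.batch.size: 100
  # topology.transfer.batch.size: 100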
