To follow up, we've looked at Storm UI and it shows that tuples are not evenly distributed across the bolt executors. We know distribution is governed by the grouping; we use shuffle grouping (and have even disabled load-aware messaging), yet tuples are still not evenly distributed.
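For reference, this is roughly how the topology is wired. The component names, classes, and parallelism values below are placeholders rather than our real code (our actual spout reads from Kinesis):

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class ShuffleWiringSketch {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Placeholder spout/bolt classes and parallelism values.
        builder.setSpout("kinesis-spout", new MyKinesisSpout(), 4);

        // Shuffle grouping is supposed to distribute tuples evenly
        // (round-robin) across the bolt's executors.
        builder.setBolt("process-bolt", new MyProcessBolt(), 8)
               .shuffleGrouping("kinesis-spout");

        Config conf = new Config();
        conf.setNumWorkers(2);
        // Fall back to plain shuffle instead of load-aware shuffle.
        conf.put("topology.disable.loadaware.messaging", true);

        StormSubmitter.submitTopology("shuffle-sketch", conf, builder.createTopology());
    }
}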
It seems like others have had the same experience when upgrading from Storm 1.2 to Storm 2.2. Here are a few of those posts; we sense a theme here and are surprised there is no documentation or response addressing it:

1. https://www.mail-archive.com/[email protected]/msg10070.html
2. https://www.mail-archive.com/[email protected]/msg10013.html
3. https://www.mail-archive.com/[email protected]/msg10114.html

Also, in our case, when we add other bolts to the topology, we have even seen tuples not reaching those bolts at all, at least according to Storm UI (no tuples executed or acked, and bolt capacity always zero).

Please advise us on how to troubleshoot and fix this. Any help is greatly appreciated. Thanks!

From: Le, Binh T.
Sent: Monday, November 8, 2021 2:32 PM
To: '[email protected]' <[email protected]>
Subject: Storm 2 Spout Not Acking, Failing Tuples

Hi,

We are upgrading Storm from 1.2.1 to 2.2.0 and are experiencing an issue similar to this one <https://www.mail-archive.com/[email protected]/msg10013.html>, where all bolts are acking but the spout is not, causing latency to be high and increasing. FYI, we anchor tuples (see the simplified bolt sketch below). We can see that tuples are consistently timing out, causing them to fail, be retried over and over, and eventually be dropped once they exceed the configured max retries.

In Storm 1.2.1, the same set of Storm configs works fine; it is only after upgrading that we see this behavior. We have tried a number of things, none of which helped. They include, but are not limited to, the following (a sketch of how these settings are applied is included below):

1. Increasing the topology message timeout
2. Increasing max spout pending
3. Increasing the number of workers
4. Increasing the executor send and transfer buffer sizes
5. Extending the backpressure check interval (since backpressure can no longer be disabled, as it could in Storm 1)
6. Disabling load-aware messaging

Can you please let us know how we can go about troubleshooting this issue, finding the root cause / bottleneck, and possibly a fix? In case it matters, our Storm topologies read from AWS Kinesis Data Streams.

Thanks,
Binh
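For reference, this is roughly what we mean by anchoring in our bolts. The bolt below is a simplified placeholder, not our actual code:

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Simplified placeholder bolt: the output tuple is emitted with the input
// tuple as its anchor, then the input is acked, so the tuple tree only
// completes at the spout once every anchored downstream tuple is acked.
public class AnchoringSketchBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Anchored emit: ties the new tuple to the input tuple's tree.
        collector.emit(input, new Values(input.getString(0).toUpperCase()));
        // Ack the input so this hop of the tree is marked complete.
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}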

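For completeness, this is roughly how the configuration changes listed in the quoted message are applied. The numbers are hypothetical examples rather than our actual settings, and the string keys should be verified against the defaults.yaml that ships with Storm 2.2.0:

import org.apache.storm.Config;

public class TuningSketch {

    // Hypothetical example values, not our production settings.
    public static Config buildTopologyConf() {
        Config conf = new Config();

        // 1. Topology message timeout
        conf.setMessageTimeoutSecs(120);

        // 2. Max spout pending
        conf.setMaxSpoutPending(5000);

        // 3. Number of workers
        conf.setNumWorkers(8);

        // 4. Executor/transfer queue sizes (powers of two)
        conf.put("topology.executor.receive.buffer.size", 32768);
        conf.put("topology.transfer.buffer.size", 2048);

        // 5. Backpressure check interval (backpressure itself cannot be
        //    switched off in Storm 2 the way it could in Storm 1)
        conf.put("topology.backpressure.check.millis", 100);

        // 6. Disable load-aware messaging, falling back to plain shuffle
        conf.put("topology.disable.loadaware.messaging", true);

        return conf;
    }
}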