To follow up, we have looked at the Storm UI, and it shows that the tuples are not 
evenly distributed. We believe this is related to the grouping: we use shuffle 
grouping (and have even disabled load-aware messaging), but the tuples are still 
not evenly distributed.
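
For reference, here is a stripped-down sketch of how we wire the grouping and 
disable load-aware messaging. The spout and bolt below are stand-ins 
(TestWordSpout and a no-op bolt), not our actual components, and the parallelism 
numbers are only illustrative:

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.testing.TestWordSpout;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class ShuffleGroupingSketch {

    // No-op bolt used only to observe how shuffle grouping spreads tuples in the UI.
    public static class NoopBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            // BaseBasicBolt acks automatically when execute() returns
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // emits nothing
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word-spout", new TestWordSpout(), 2);
        builder.setBolt("noop-bolt", new NoopBolt(), 8)
               .shuffleGrouping("word-spout");   // we expect a roughly even spread over the 8 executors

        Config conf = new Config();
        // Storm 2 defaults to load-aware shuffle; this switches back to plain shuffle.
        conf.put(Config.TOPOLOGY_DISABLE_LOADAWARE_MESSAGING, true);

        try (LocalCluster cluster = new LocalCluster()) {
            cluster.submitTopology("shuffle-test", conf, builder.createTopology());
            Thread.sleep(60_000);   // let it run, then check executed counts per executor
        }
    }
}

Even with plain shuffle forced like this, the per-executor executed counts we see 
in the UI are noticeably skewed.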

It seems that others have had the same experience when upgrading from Storm 1.2 to 
Storm 2.2. Here are a few of those posts. We see a recurring theme with this issue 
and are surprised there is no documentation or response addressing it.


  1.  https://www.mail-archive.com/[email protected]/msg10070.html
  2.  https://www.mail-archive.com/[email protected]/msg10013.html
  3.  https://www.mail-archive.com/[email protected]/msg10114.html

Also, for us, when we add other bolts to the topology, we have even seen tuples 
not reaching those bolts at all, at least according to the Storm UI (no tuples 
executed or acked, and bolt capacity is always zero).

Please advise us on how to troubleshoot and fix this. Any help is greatly 
appreciated.

Thanks!

From: Le, Binh T.
Sent: Monday, November 8, 2021 2:32 PM
To: '[email protected]' <[email protected]>
Subject: Storm 2 Spout Not Acking, Failing Tuples

Hi,

We are upgrading Storm from 1.2.1 to 2.2.0 and are experiencing an issue similar 
to this one (https://www.mail-archive.com/[email protected]/msg10013.html), 
where all bolts ack but the spout does not, causing latency to be high and keep 
increasing. For reference, we do anchor tuples. We can see that tuples 
consistently time out, fail, and are retried over and over until they are 
eventually "dropped" for exceeding the configured maximum retries.
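
To be concrete about what we mean by anchoring, our bolts follow the standard 
anchor-then-ack pattern sketched below (the field access and processing logic are 
just placeholders, not our real code):

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class AnchoringBoltSketch extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        try {
            String record = input.getString(0);
            // Emit downstream anchored to the input tuple so it stays in the tuple tree.
            collector.emit(input, new Values(record.toUpperCase()));
            // Ack only after the emit; the spout should see the ack once every
            // anchored tuple in the tree has been acked.
            collector.ack(input);
        } catch (Exception e) {
            // A fail() (or a missed ack that times out) sends the tuple back to the spout for replay.
            collector.fail(input);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("processed"));
    }
}

Despite following this pattern, complete latency keeps climbing and the tuples 
eventually time out at the spout.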

In Storm 1.2.1, the same set of Storm configs works fine; it is only after 
upgrading that we see this behavior. We have tried a number of things, none of 
which helped. They include, but are not limited to, the following (a condensed 
config sketch follows the list):

  1.  Increasing the topology message timeout
  2.  Increasing max spout pending
  3.  Increasing number of workers
  4.  Increasing the executor send and transfer buffer sizes
  5.  Extending the back-pressure check interval (since back pressure can no 
longer be disabled, as it could be in Storm 1)
  6.  Disabling load-aware messaging
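
For reference, a condensed sketch of the kinds of overrides we experimented with 
is below. The key names are the standard Storm Config constants (assuming they 
carry over unchanged to 2.2.0); the values shown are illustrative, not our 
production settings:

import org.apache.storm.Config;

public class TuningOverridesSketch {
    // Config overrides corresponding to items 1-6 above (illustrative values only).
    public static Config tuningOverrides() {
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 120);            // 1. message timeout (default is 30s)
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 5000);              // 2. max spout pending
        conf.put(Config.TOPOLOGY_WORKERS, 8);                           // 3. number of workers
        conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 32768);  // 4. executor receive buffer
        conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 2048);           //    worker transfer buffer (the Storm 1 executor send buffer key may not carry over to Storm 2)
        conf.put(Config.TOPOLOGY_BACKPRESSURE_CHECK_MILLIS, 1000);      // 5. back-pressure check interval
        conf.put(Config.TOPOLOGY_DISABLE_LOADAWARE_MESSAGING, true);    // 6. load-aware messaging off
        return conf;
    }
}

None of these combinations changed the acking behavior we are seeing.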

Can you please let us know how we can go about troubleshooting this issue, 
locating the root cause or bottleneck, and possibly fixing it? In case it matters, 
our Storm topologies read from AWS Kinesis Data Streams.

Thanks,
Binh

