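We have seen failures like this for two reasons:
1) The bolt doesn't ack each tuple immediately. Instead, it collects a batch and at some point acks them all. If the batch is bigger than topology.max.spout.pending, some tuples fail: the spout stops emitting once that many tuples are un-acked, so the bolt waits for a batch that cannot fill until those tuples have timed out.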
2) The bolt doesn't ack some tuples at all. Make sure the bolt acks or fails every tuple, without exceptions.

On Wed, Jul 27, 2016 at 10:22 PM, Kevin Peek <[email protected]> wrote:
> We have a topology that is experiencing massive amounts of spout failures
> without corresponding bolt failures. We have been interpreting these as
> tuple timeouts, but we seem to be getting more of these failures than we
> understand to be possible with timeouts.
>
> Our topology uses a Kafka spout and the topology is configured with:
> topology.message.timeout.secs = 300
> topology.max.spout.pending = 2500
>
> Based on these settings, I would expect the topology to experience a
> maximum of 2500 tuple timeouts per 300 seconds. But from the Storm UI, we
> see that after running for about 10 minutes, the topology will show about
> 50K spout failures and zero bolt failures.
>
> Am I misunderstanding something that would allow more tuples to time out,
> or is there another source of spout failures?
>
> Thanks in advance,
> Kevin Peek
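The two causes in the reply above can be illustrated with a toy discrete-time model. This is only a sketch under simplifying assumptions, not Storm's actual acker: `simulate` is a hypothetical helper, time is measured in abstract ticks, a single spout task feeds a single bolt, and a timed-out tuple is replayed as a fresh emit. `max_pending` plays the role of `topology.max.spout.pending` (which Storm enforces per spout task, not per topology) and `timeout` plays the role of `topology.message.timeout.secs`.

```python
import math


def simulate(max_pending, batch_size, timeout, steps):
    """Toy model (hypothetical) of one spout task feeding one batching bolt.

    max_pending -- cap on un-acked tuples (like topology.max.spout.pending)
    batch_size  -- the bolt acks its buffer only once it holds this many
                   tuples; math.inf models a bolt that never acks
    timeout     -- ticks before an un-acked tuple is failed back to the spout
    Returns (acked, failed) counts after `steps` ticks.
    """
    pending = {}   # msg_id -> tick it was emitted (tracked by the spout)
    buffer = []    # msg_ids the bolt has received but not yet acked
    next_id = 0
    acked = failed = 0
    for t in range(steps):
        # Spout: emit (or replay) tuples until max_pending are in flight.
        while len(pending) < max_pending:
            pending[next_id] = t
            buffer.append(next_id)
            next_id += 1
        # Bolt: ack the whole buffer, but only once the batch is full.
        # Acks for tuples that have already timed out are ignored.
        if len(buffer) >= batch_size:
            for msg_id in buffer:
                if pending.pop(msg_id, None) is not None:
                    acked += 1
            buffer.clear()
        # Timeout: fail every tuple that has been pending for `timeout` ticks.
        expired = [m for m, emitted_at in pending.items()
                   if t - emitted_at >= timeout]
        for msg_id in expired:
            del pending[msg_id]
            failed += 1
    return acked, failed


print(simulate(4, 4, 3, 10))         # batch <= max_pending -> (40, 0)
print(simulate(4, 6, 3, 10))         # batch > max_pending  -> (8, 8)
print(simulate(4, math.inf, 3, 10))  # bolt never acks      -> (0, 8)
```

With a batch no larger than `max_pending`, every tuple is acked. With a larger batch, the buffer only fills once stale, already-timed-out tuples are counted in it, so half of the tuples in this run fail. A bolt that never acks fails everything. In all three cases the failures appear only on the spout side, because the bolt itself never calls fail.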
