Hi Karthick,

please avoid cross-posting to multiple lists.

Short version: your instinct is right (commit-on-progress, treat close() as 
best-effort), but a few premises need fixing.

Accurate
- storm kill is graceful; the default wait really is TOPOLOGY_MESSAGE_TIMEOUT_SECS 
(Nimbus KILL_TRANSITION). It's overridable per kill (command after this list).
- close() is best-effort. Javadoc is actually stronger than you quoted: "There 
is no guarantee that close will be called, because the supervisor kill -9's 
worker processes." Only local mode is guaranteed.
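
For reference, overriding the wait at kill time looks like this (topology name 
is a placeholder):

    storm kill my-topology -w 300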

Corrections
1. nextTuple() stops at deactivate(). The executor routes to inactiveExecute() 
and sleeps; it never calls nextTuple() again (SpoutExecutor.java), so your 
"keep draining the staging queue via nextTuple()" step won't run. Acks for 
already-emitted tuples still flow, so watermarks advance, but anything 
buffered-but-not-yet-emitted at deactivate time will replay on restart.
2. topology.kill.delay.secs doesn't exist. The wait knob is storm kill -w N, 
defaulting to TOPOLOGY_MESSAGE_TIMEOUT_SECS.
3. supervisor.worker.shutdown.sleep.secs defaults to 3s: far too tight for a 
blocking commitSync. Raise it if you want (snippet after this list), but still 
don't depend on it.
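
If you do raise knob 3, it's a supervisor-side setting; the value here is 
illustrative only:

    # storm.yaml on each supervisor host
    supervisor.worker.shutdown.sleep.secs: 30

Treat it as a last-resort buffer, not something the design leans on.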

Pattern
storm-kafka-client doesn't commit on ack (afaik): it records on ack and commits 
on a periodic timer, with a best-effort flush in close(). Design rule:

    kill-wait > periodic-commit-interval > worst-case ack latency

If that ordering holds, close() succeeding is gravy, not load-bearing. Shutdown 
hooks aren't a good fit here: they race SIGKILL.
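
In spout terms, the shape is roughly this. A minimal sketch, not the 
storm-kafka-client code: OffsetTracker and KafkaMsgId are hypothetical helpers, 
with the tracker folding acked offsets into a contiguous per-partition 
watermark.

    import java.util.Map;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;

    public class DrainingSpout extends BaseRichSpout {
        private static final long COMMIT_INTERVAL_MS = 2_000; // keep well under the kill-wait
        private transient KafkaConsumer<String, String> consumer;
        private transient OffsetTracker tracker; // hypothetical: acked offsets -> watermarks
        private long lastCommitMs;

        @Override
        public void open(Map<String, Object> conf, TopologyContext ctx,
                         SpoutOutputCollector collector) {
            // consumer + tracker setup elided
        }

        @Override
        public void nextTuple() {
            // poll + emit elided
            maybeCommit(false);
        }

        @Override
        public void ack(Object msgId) {
            tracker.markAcked((KafkaMsgId) msgId); // record only, no commit per ack
            // nextTuple() stops at deactivate() but acks keep arriving, so the
            // timer has to be checked here too or nothing commits during the drain.
            maybeCommit(false);
        }

        @Override
        public void deactivate() {
            maybeCommit(true); // graceful kill: flush before the kill-wait starts
        }

        @Override
        public void close() {
            maybeCommit(true); // best-effort; may never run
        }

        private void maybeCommit(boolean force) {
            long now = System.currentTimeMillis();
            if (!force && now - lastCommitMs < COMMIT_INTERVAL_MS) {
                return;
            }
            Map<TopicPartition, OffsetAndMetadata> watermarks = tracker.ackedWatermarks();
            if (!watermarks.isEmpty()) {
                consumer.commitSync(watermarks); // same executor thread as nextTuple/ack
            }
            lastCommitMs = now;
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // stream fields elided
        }
    }

With the default 30s message timeout and a 2s commit interval, the timer gets 
many chances to land before teardown; deactivate() flushes whatever it hasn't 
caught, and close() is gravy.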

Read external/storm-kafka-client/.../KafkaSpout.java - ack(), 
commitOffsetsForAckedTuples(), the commitTimer, and shutdown(). More useful 
than the JIRAs.


Regards
Richard

> On 11.05.2026 at 10:36, Karthick <[email protected]> wrote:
> 
> Hi,
> Apologies for the cold email — I'd appreciate a sanity check on a Storm spout 
> shutdown design before we ship it.
> 
> Setup: 6 worker JVMs across 3 hosts, custom Kafka spout (not 
> storm-kafka-client), non-idempotent bolts. On `storm kill` during deploys, we 
> need every in-flight tuple acked/failed and its offset committed before exit; 
> otherwise the uncommitted window replays and we get duplicates.
> 
> Our current shutdown path:
> - `deactivate()`: pause consumer + emission, keep draining the staging queue 
> via `nextTuple()`, advance watermarks as acks arrive.
> - `close()`: stop consumer loop, `commitSync(watermarks)` via 
> `onPartitionsRevoked`.
> 
> What I think I know (please correct):
> - `storm kill` *is* graceful: spouts get `deactivate()`, then Storm waits 
> `TOPOLOGY_MESSAGE_TIMEOUT_SECS` before tearing workers down.
> - `close()` is explicitly best-effort — the ISpout Javadoc says so, and 
> `supervisor.worker.shutdown.sleep.secs` caps how long the worker has before 
> force-kill.
> 
> If that's right, putting the final `commitSync` in `close()` is structurally 
> fragile — it may never run, or get cut off mid-call. Two questions:
> 
> 1. Is the recommended pattern "commit-on-ack from `nextTuple()`, treat 
> `close()` as best-effort" — so the committed offset already reflects all 
> fully-acked tuples by the time the JVM dies? That seems to be roughly how 
> storm-kafka-client does it.
> 
> 2. Beyond `TOPOLOGY_MESSAGE_TIMEOUT_SECS` and 
> `supervisor.worker.shutdown.sleep.secs`, are there other knobs 
> (`topology.kill.delay.secs`, shutdown hooks) we should tune to widen the 
> drain window?
> 
> Already read: ISpout Javadoc, Running-topologies-on-a-production-cluster.md, 
> STORM-794, STORM-2176, STORM-3355. A pointer to anything I've missed is 
> plenty; I don't want to take up your time re-explaining things.
> 
> Please guide me on the best approach.
> 
> Thanks and regards,
> Karthick.
