Hi Karthick,
please avoid cross-posting to multiple lists.
Short version: your instinct is right (commit-on-progress, treat close() as
best-effort), but a few premises need fixing.
Accurate
- storm kill is graceful; default wait really is TOPOLOGY_MESSAGE_TIMEOUT_SECS
(Nimbus KILL_TRANSITION).
- close() is best-effort. Javadoc is actually stronger than you quoted: "There
is no guarantee that close will be called, because the supervisor kill -9's
worker processes." Only local mode is guaranteed.
Corrections
1. nextTuple() stops at deactivate(). The executor routes to inactiveExecute()
and sleeps; it does not call nextTuple() (SpoutExecutor.java). Acks still flow,
so watermarks advance — but anything buffered-but-not-yet-emitted at deactivate
time will replay on restart.
2. topology.kill.delay.secs doesn't exist. The wait knob is storm kill -w N,
defaulting to TOPOLOGY_MESSAGE_TIMEOUT_SECS.
3. supervisor.worker.shutdown.sleep.secs defaults to 3s. Way too tight for a
blocking commitSync. Raise it if you want, but still don't depend on it.
Pattern
storm-kafka-client doesn't commit on ack (afaik): it records on ack and commits
on a periodic timer, with a best-effort flush in close(). Design rule:
kill-wait > periodic-commit-interval > worst-case ack latency
If that ordering holds, close() succeeding is gravy, not load-bearing. Shutdown
hooks aren't a good fit here: they race SIGKILL.
Read external/storm-kafka-client/.../KafkaSpout.java - ack(),
commitOffsetsForAckedTuples(), the commitTimer, and shutdown(). More useful
than the JIRAs.
Regards
Richard
> On 11.05.2026 at 10:36, Karthick <[email protected]> wrote:
>
> Hi
> Apologies for the cold email — I'd appreciate a sanity check on a Storm spout
> shutdown design before we ship it.
>
> Setup: 6 worker JVMs across 3 hosts, custom Kafka spout (not
> storm-kafka-client), non-idempotent bolts. On `storm kill` during deploys, we
> need every in-flight tuple acked/failed and its offset committed before exit;
> otherwise the uncommitted window replays and we get duplicates.
>
> Our current shutdown path:
> - `deactivate()`: pause consumer + emission, keep draining the staging queue
> via `nextTuple()`, advance watermarks as acks arrive.
> - `close()`: stop consumer loop, `commitSync(watermarks)` via
> `onPartitionsRevoked`.
>
>
> What I think I know (please correct):
> - `storm kill` *is* graceful: spouts get `deactivate()`, then Storm waits
> `TOPOLOGY_MESSAGE_TIMEOUT_SECS` before tearing workers down.
> - `close()` is explicitly best-effort — the ISpout Javadoc says so, and
> `supervisor.worker.shutdown.sleep.secs` caps how long the worker has before
> force-kill.
>
> If that's right, putting the final `commitSync` in `close()` is structurally
> fragile — it may never run, or get cut off mid-call. Two questions:
>
> 1. Is the recommended pattern "commit-on-ack from `nextTuple()`, treat
> `close()` as best-effort" — so the committed offset already reflects all
> fully-acked tuples by the time the JVM dies? That seems to be roughly how
> storm-kafka-client does it.
>
> 2. Beyond `TOPOLOGY_MESSAGE_TIMEOUT_SECS` and
> `supervisor.worker.shutdown.sleep.secs`, are there other knobs
> (`topology.kill.delay.secs`, shutdown hooks) we should tune to widen the
> drain window?
>
> Already read: ISpout Javadoc, Running-topologies-on-a-production-cluster.md,
> STORM-794, STORM-2176, STORM-3355. A pointer to anything I've missed is
> plenty; I don't want to take up your time re-explaining things.
>
> Please guide me on the best approach.
>
> Thanks and regards,
> Karthick.