Hi,
Apologies for the cold email — I'd appreciate a sanity check on a Storm
spout shutdown design before we ship it.

Setup: 6 worker JVMs across 3 hosts, custom Kafka spout (not
storm-kafka-client), non-idempotent bolts. On `storm kill` during deploys,
we need every in-flight tuple acked/failed and its offset committed before
exit; otherwise the uncommitted window replays and we get duplicates.

Our current shutdown path:
- `deactivate()`: pause consumer + emission, keep draining the staging
queue via `nextTuple()`, advance watermarks as acks arrive.
- `close()`: stop consumer loop, `commitSync(watermarks)` via
`onPartitionsRevoked`.
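
For concreteness, the watermark bookkeeping behind both steps is roughly
the following. This is a simplified sketch rather than our production code:
`OffsetWatermark` is an illustrative name, there is one instance per
partition, and it assumes offsets are dense (no compaction gaps or
transaction markers).

```java
import java.util.HashSet;
import java.util.Set;

/** Simplified per-partition watermark: the next offset that is safe to commit. */
final class OffsetWatermark {
    // Acked offsets at or above the watermark that are not yet contiguous with it.
    private final Set<Long> acked = new HashSet<>();
    // Invariant: every offset below this value has been acked.
    private long committable;

    OffsetWatermark(long startOffset) {
        this.committable = startOffset;
    }

    /** Record an ack and advance over any contiguous run of acked offsets. */
    void ack(long offset) {
        if (offset < committable) {
            return; // stale or duplicate ack, already covered by the watermark
        }
        acked.add(offset);
        while (acked.remove(committable)) {
            committable++;
        }
    }

    /** Offset to commit; Kafka expects "last processed + 1", which this already is. */
    long committable() {
        return committable;
    }
}
```

The value from `committable()` is what ends up in the `commitSync` map for
that partition; a failed tuple simply never advances it.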


What I think I know (please correct):
- `storm kill` *is* graceful: spouts get `deactivate()`, then Storm waits
`TOPOLOGY_MESSAGE_TIMEOUT_SECS` (or the `-w` value passed to `storm kill`)
before tearing workers down.
- `close()` is explicitly best-effort — the ISpout Javadoc says so, and
`supervisor.worker.shutdown.sleep.secs` caps how long the worker has before
force-kill.

If that's right, putting the final `commitSync` in `close()` is
structurally fragile — it may never run, or get cut off mid-call. Two
questions:

1. Is the recommended pattern "commit-on-ack from `nextTuple()`, treat
`close()` as best-effort" — so the committed offset already reflects all
fully-acked tuples by the time the JVM dies? That seems to be roughly how
storm-kafka-client does it.
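
To make question 1 concrete, this is the shape I have in mind for the
commit-on-ack path. It is only a sketch under my own assumptions:
`AckDrivenCommitter` and `commitFn` are hypothetical names, `commitFn`
stands in for a `commitSync` call on the real consumer, and the clock is
passed in explicitly to keep the logic testable.

```java
import java.util.function.LongConsumer;

/** Commits the watermark from the spout thread whenever it has advanced. */
final class AckDrivenCommitter {
    private final LongConsumer commitFn; // hypothetical stand-in for consumer.commitSync(...)
    private final long minIntervalMs;    // throttle; 0 means commit on every advance
    private long lastCommitted;
    private long lastCommitAtMs;         // 0 until the first commit

    AckDrivenCommitter(LongConsumer commitFn, long startOffset, long minIntervalMs) {
        this.commitFn = commitFn;
        this.lastCommitted = startOffset;
        this.minIntervalMs = minIntervalMs;
    }

    /** Intended to be called from nextTuple(); returns true if a commit was issued. */
    boolean maybeCommit(long watermark, long nowMs) {
        if (watermark <= lastCommitted) {
            return false; // nothing new has been fully acked
        }
        if (nowMs - lastCommitAtMs < minIntervalMs) {
            return false; // too soon since the previous commit
        }
        commitFn.accept(watermark);
        lastCommitted = watermark;
        lastCommitAtMs = nowMs;
        return true;
    }
}
```

With this, the worst-case replay window after a hard kill would be bounded
by `minIntervalMs` worth of acks rather than by whether `close()` ran; after
`deactivate()` we could drop the interval to 0 so every advance is committed.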

2. Beyond `TOPOLOGY_MESSAGE_TIMEOUT_SECS` and
`supervisor.worker.shutdown.sleep.secs`, are there other knobs
(`topology.kill.delay.secs`, shutdown hooks) we should tune to widen the
drain window?

Already read: ISpout Javadoc,
Running-topologies-on-a-production-cluster.md, STORM-794, STORM-2176,
STORM-3355. A pointer to anything I've missed is plenty; I don't want to
take up your time re-explaining things.

Please guide me on the best approach.

Thanks and regards,
Karthick.
