Hi all, Suppose I have a WindowedBolt subscribing to a KafkaSpout stream. The WindowedBolt is a sliding window of size 2 with a sliding interval of 1. The stream from KafkaSpout to WindowedBolt is partitioned by field (Field Grouping). To keep things simple let's assume the stream gets partitioned in two groups only, A and B in the context of two supervisors with one worker process each (setNumWorkers(2)):
By default the number of tasks of a spout/bolt is equal to the parallelism hint, correct ? If the WindowedBolt has a parallelism hint of 2 and 2 tasks (setNumTasks(2) or not specified), one thread (executor) running exactly 1 task will execute on each worker/supervisor, correct ? Each executor/task will receive tuples from only one of the partitions, for instance executor/task 1 gets tuples from substream A and executor/task 2 get tuples from substream B, correct ? Assuming the latter is correct, if a node/supervisor fails, will the remaining task begin to receive tuples from the other half of the stream partition ? That is, if executor/task 2 (on supervisor 2) disappears, will the tuples from A and B be interleaved in the sliding window of executor/task 1 ? Or will the substream B just "hang" until another task is started to handle it ? All of this makes sense and my use case might look counterproductive but I need to be sure of what's happening, my workflow is inherently sequential and I don't want "streams" to interleave. Thanks !
