Max parallelism and reactive mode

Alexis Sarda-Espinosa Thu, 03 Mar 2022 14:34:50 -0800

Hi everyone,

I have some questions regarding max parallelism and how it interacts with 
deployment modes. The documentation states that max parallelism should be "set 
on a per-job and per-operator granularity" but doesn't provide more details. Is 
it possible to have different values of max parallelism in different operators? 
I did a test in which my source had a max parallelism of 3, whereas a 
downstream operator had a (non-max) parallelism explicitly set to 4, and the 
job could not be started. Could this be related to operator chaining? Or maybe the 
whole job ended up with a max parallelism of 3 because I didn't set it and it 
took the value from the source?


Additionally, the documentation states that, in reactive mode, only max 
parallelism is taken into account, so if I want to limit the number of parallel 
instances of my sources and sinks, I'd have to set their max parallelism, and 
that would be different from that of the rest of the operators.

Moreover, is it possible to switch a job from non-reactive to reactive mode via 
savepoints? What happens if my max parallelism settings change during the 
switch, for example to limit my sink to a single instance?

In summary, for a hypothetical pipeline that basically does something like: 
source (parallelism between 1 & 3) -> stateful operator (parallelism between 1 
& 32) -> sink (parallelism exactly 1 always)
what should I do regarding max parallelism (both for the execution env and 
operators) in normal mode, what should I do in reactive mode, and can I switch 
between modes with savepoints?

Regards,
Alexis.
