What are you trying to achieve by setting the parallelism?
On Sat, Apr 15, 2023 at 5:13 PM Jeff Zhang wrote:
> Thanks Reuven, what I mean is setting the parallelism at the operator
> level. The input size of an operator is unknown at compile time unless it
> is a source operator.
Thanks Reuven, what I mean is setting the parallelism at the operator level.
The input size of an operator is unknown at compile time unless it is a
source operator.
Here's an example from Flink
The maximum parallelism is always determined by the parallelism of your
data. If you do a GroupByKey for example, the number of keys in your data
determines the maximum parallelism.
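
To make the GroupByKey point concrete, here is a small sketch in plain Python.
It is not Beam code: the function names and the crc32-based partitioning are
assumptions made for illustration, and real runners shard keys in their own
ways. It only shows that the number of workers with anything to do can never
exceed the number of distinct keys, however many workers are requested.

```python
import zlib
from collections import defaultdict


def assign_keys_to_workers(records, num_workers):
    """Group (key, value) records by key and hash-partition the keys over workers.

    Hypothetical helper for illustration; not part of Beam, Flink, or Spark.
    """
    partitions = defaultdict(lambda: defaultdict(list))
    for key, value in records:
        # crc32 is used instead of hash() so the assignment is stable across runs.
        worker = zlib.crc32(key.encode("utf-8")) % num_workers
        partitions[worker][key].append(value)
    return partitions


def busy_workers(records, num_workers):
    """Count the workers that received at least one key."""
    return len(assign_keys_to_workers(records, num_workers))


# Three distinct keys: "a", "b", "c".
records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

for requested in (2, 16, 1000):
    print(requested, "workers requested ->", busy_workers(records, requested), "busy")
```

With only three distinct keys, at most three workers are ever busy, so asking
for more parallelism on the grouping step would not help.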
Beyond the limitations in your data, it depends on your execution engine.
If you're using Dataflow, Dataflow is
Besides the global parallelism of a Beam job, is there any way to set the
parallelism for individual operators such as group by and join? I
understand that the parallelism setting depends on the underlying execution
engine, but it is very common to set parallelism for operators like group by
and join in both Spark and Flink.