date:20230415

Re: Is there any way to set the parallelism of operators like group by, join?

2023-04-15 Thread Robert Bradshaw via user

What are you trying to achieve by setting the parallelism?

On Sat, Apr 15, 2023 at 5:13 PM Jeff Zhang wrote:
> Thanks Reuven, what I mean is to set the parallelism in operator level.
> And the input size of the operator is unknown at compiling stage if it is
> not a source operator,

Re: Is there any way to set the parallelism of operators like group by, join?

2023-04-15 Thread Jeff Zhang

Thanks Reuven. What I mean is setting the parallelism at the operator level. The input size of an operator is unknown at compile time unless it is a source operator. Here's an example of Flink

Re: Is there any way to set the parallelism of operators like group by, join?

2023-04-15 Thread Reuven Lax via user

The maximum parallelism is always determined by the parallelism of your data. If you do a GroupByKey for example, the number of keys in your data determines the maximum parallelism. Beyond the limitations in your data, it depends on your execution engine. If you're using Dataflow, Dataflow is
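
To illustrate the point about keys bounding parallelism, here is a minimal sketch in plain Python. It does not use the Beam API or any runner; `assign_keys_to_workers` is a hypothetical helper, and hash partitioning with `zlib.crc32` is only an assumption about how a grouping step might route keys. It shows that, however many workers are available, a grouping step cannot keep more workers busy than there are distinct keys.

```python
import zlib
from collections import defaultdict


def assign_keys_to_workers(records, num_workers):
    """Hash-partition (key, value) records across workers.

    Hypothetical helper for illustration only; not part of Beam or any runner.
    """
    partitions = defaultdict(list)
    for key, value in records:
        # crc32 is stable across runs, unlike the built-in hash() for str.
        worker = zlib.crc32(key.encode("utf-8")) % num_workers
        partitions[worker].append((key, value))
    return dict(partitions)


# Six records but only three distinct keys.
records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]
num_workers = 8

partitions = assign_keys_to_workers(records, num_workers)
distinct_keys = {key for key, _ in records}
busy_workers = len(partitions)

print(f"{len(distinct_keys)} distinct keys, {num_workers} workers, "
      f"{busy_workers} workers with data")
```

Since all values for a key must land on the same worker, the number of workers with data never exceeds the number of distinct keys, so adding workers beyond that count would not speed up the grouping.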

Is there any way to set the parallelism of operators like group by, join?

2023-04-15 Thread Jeff Zhang

Besides the global parallelism of a Beam job, is there any way to set the parallelism of individual operators like group by and join? I understand that the parallelism setting depends on the underlying execution engine, but it is very common to set the parallelism of operators like group by and join in both Spark and Flink.