Hello there Sambhav!

> I tried using setParallelism() here but I think it doesn't comply with
> flink schema and is using default parallelism instead

So are you disabling chaining
<https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/overview/#disable-chaining>
to set a per-operator parallelism? Otherwise Flink will apply the
pipeline-wide parallelism to the whole chain. Also, depending on how you
handle your deployment, the pipeline may have a fixed parallelism that you
cannot control, e.g. a pre-defined number of TaskManagers and tasks per
TaskManager.


> How can I increase the parallelism of this keyBy and window combination
> here?

I think disabling chaining before the keyBy and then increasing the
parallelism on the output of the keyed stream would work.
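For reference, a rough sketch of what I mean, using the Java DataStream
API. This is untested against your job: `Db2Record`, `JoinedRecord`,
`MyKeySelector`, and `FullOuterJoinFn` are hypothetical stand-ins for the
names in your code, and the parallelism value is illustrative.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.assigners.EndOfStreamWindows;

// Break the chain on the operator right before the keyBy, so its
// explicit parallelism is not overridden by the chained pipeline default.
SingleOutputStreamOperator<Db2Record> preKeyed = db2Stream
        .map(record -> record)   // any operator preceding the keyBy
        .disableChaining()
        .setParallelism(8);      // illustrative value

// The coGroup/window/apply combination then runs downstream of the
// re-parallelized operator.
DataStream<JoinedRecord> joined = preKeyed
        .coGroup(kafkaInput)
        .where(new MyKeySelector())
        .equalTo(new MyKeySelector())
        .window(EndOfStreamWindows.get())
        .apply(new FullOuterJoinFn());
```

Keep in mind that the useful parallelism of a keyed window is also bounded
by the number of distinct keys, and by the task slots available in your
deployment.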





Att,
Pedro Mázala
Be awesome


On Wed, 28 May 2025 at 10:42, Sambhav Gupta <sambhavwor...@gmail.com> wrote:

> Hi all,
>
> Following up on request
>
> Thanks
>
> On Mon, 26 May 2025, 15:18 Sambhav Gupta, <sambhavwor...@gmail.com> wrote:
>
>> Hey,
>>
>> There is no error with the parallelism. I want to increase it for this
>> function, as it is creating a disk-space bottleneck, but I have not been
>> able to do so.
>>
>> I tried using setParallelism() here but I think it doesn't comply with
>> flink schema and is using default parallelism instead
>>
>> Can you please help me with this?
>> How can I increase the parallelism of this keyBy and window combination
>> here?
>>
>> Thanks,
>> Sambhav Gupta
>>
>> On Mon, 26 May 2025, 14:21 Pedro Mázala, <pedroh.maz...@gmail.com> wrote:
>>
>>> What is the error on the parallelism you're facing?
>>>
>>>
>>>
>>> Att,
>>> Pedro Mázala
>>> Be awesome
>>>
>>>
>>> On Mon, 26 May 2025 at 10:13, Sambhav Gupta <sambhavwor...@gmail.com>
>>> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> We are migrating our Flink codebase to version 2.1. We were using
>>>> DataSet jobs, which we now need to migrate to the DataStream API, and
>>>> while doing this we ran into a parallelism issue with the keyBy and
>>>> window functions in our full outer join, which is creating a
>>>> disk-storage and compute bottleneck for us.
>>>>
>>>> The code structure
>>>>
>>>> We have 2 inputs db2stream and kafka input on which we perform
>>>> outerjoin function
>>>>
>>>> Db2Stream.keyBy(<key selector>)
>>>>     .coGroup(kafka input)
>>>>     .where(<key>)
>>>>     .equalTo(<key>)
>>>>     .window(endOfStreamWindow.get)
>>>>     .apply(<join function>)
>>>>
>>>>
>>>> Can you please help me with increasing the parallelism of this
>>>> function in any way, so that we can remove our bottleneck while
>>>> migrating from DataSet to DataStream?
>>>>
>>>> Thanks,
>>>> Sambhav Gupta
>>>>
>>>>
>>>>
>>>>
