Hi Pedro,

I tried disabling chaining, but it is not working.


Providing a code snippet for better understanding:

```
// Full outer join between the DB2 stream and the Kafka stream on the same key.
DataStream<ProducerRecord<Long, String>> discrepancy =
        db2input.keyBy(keySelector)
                .coGroup(kafkaInput)
                .where(keySelector)
                .equalTo(keySelector)
                .window(EndOfStreamWindows.get())
                .apply(new FullOuterJoinCoGroupFunction());
```



Can you please let me know what improvements I can make here to increase the
parallelism of these operators?

I have a default parallelism of 1, and these operators are stuck at that
value, which is creating a disk-space bottleneck for me.
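
For reference, this is how I understood the disable-chaining suggestion, as a
minimal self-contained sketch rather than our actual job (the fromSequence
source, the no-op map, and the parallelism value 4 are placeholders):

```
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source standing in for our DB2 input stream.
        DataStream<Long> db2Input = env.fromSequence(0, 100);

        // The pattern as I understood it: break operator chaining, then set a
        // per-operator parallelism on the operator feeding the keyed co-group.
        SingleOutputStreamOperator<Long> rescaled =
                db2Input
                        .map(new MapFunction<Long, Long>() {
                            @Override
                            public Long map(Long value) {
                                return value; // no-op, only here to have an operator to configure
                            }
                        })
                        .disableChaining()
                        .setParallelism(4); // placeholder value; in our real job this still runs at 1

        rescaled.print();
        env.execute("parallelism-sketch");
    }
}
```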

Thanks,
Sambhav





On Wed, 28 May 2025, 15:18 Pedro Mázala, <pedroh.maz...@gmail.com> wrote:

> Hello there Sambhav!
>
> > I tried using setParallelism() here, but I think it doesn't comply with
> > the Flink schema, and the job is using the default parallelism instead.
> So are you disabling chaining
> <https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/overview/#disable-chaining>
> to set a per-operator parallelism? Otherwise Flink will apply the
> pipeline-wide parallelism to the whole job. Also, depending on how you
> handle your deployment, the pipeline may have a fixed parallelism that you
> cannot control, for example if you have a pre-defined number of TMs and
> tasks per TM.
>
>
> > How can I increase the parallelism of this keyBy and window combination
> > here?
>
> I think disabling chaining before the keyBy and then increasing the
> parallelism of the output of the keyed stream would work.
>
>
>
>
>
> Att,
> Pedro Mázala
> Be awesome
>
>
> On Wed, 28 May 2025 at 10:42, Sambhav Gupta <sambhavwor...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> Following up on my request.
>>
>> Thanks
>>
>> On Mon, 26 May 2025, 15:18 Sambhav Gupta, <sambhavwor...@gmail.com>
>> wrote:
>>
>>> Hey,
>>>
>>> There is no error with the parallelism. I want to increase it for this
>>> function, since it is creating a disk-space bottleneck, but I have not
>>> been able to do so.
>>>
>>> I tried using setParallelism() here, but I think it doesn't comply with
>>> the Flink schema, and the job is using the default parallelism instead.
>>>
>>> Can you please help me with this?
>>> How can I increase the parallelism of this keyBy and window combination
>>> here?
>>>
>>> Thanks,
>>> Sambhav Gupta
>>>
>>> On Mon, 26 May 2025, 14:21 Pedro Mázala, <pedroh.maz...@gmail.com>
>>> wrote:
>>>
>>>> What error are you facing with the parallelism?
>>>>
>>>>
>>>>
>>>> Att,
>>>> Pedro Mázala
>>>> Be awesome
>>>>
>>>>
>>>> On Mon, 26 May 2025 at 10:13, Sambhav Gupta <sambhavwor...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> We are migrating our Flink codebase to version 2.1. We were using
>>>>> DataSet jobs, which we now need to migrate to DataStream, and while
>>>>> doing this we ran into a parallelism issue with the keyBy and window
>>>>> functions in our full outer join, which is creating a bottleneck for us
>>>>> in terms of disk storage and compute.
>>>>>
>>>>> The code structure:
>>>>>
>>>>> We have two inputs, a DB2 stream and a Kafka input, on which we perform
>>>>> a full outer join:
>>>>>
>>>>> db2Stream.keyBy(keySelector)
>>>>>     .coGroup(kafkaInput)
>>>>>     .where(keySelector)
>>>>>     .equalTo(keySelector)
>>>>>     .window(EndOfStreamWindows.get())
>>>>>     .apply(<join function>)
>>>>>
>>>>>
>>>>> Can you please help me with increasing the parallelism of this function
>>>>> in any way, so that we can remove this bottleneck while migrating from
>>>>> DataSet to DataStream?
>>>>>
>>>>> Thanks,
>>>>> Sambhav Gupta
>>>>>
>>>>>
>>>>>
>>>>>
