Hello there Sambhav!

> I tried using setParallelism() here but I think it doesn't comply with
> flink schema and is using default parallelism instead

Are you disabling chaining
<https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/overview/#disable-chaining>
in order to set a per-operator parallelism? Otherwise Flink will apply the
pipeline-wide parallelism to the whole chain. Also, depending on how you
handle your deployment, the pipeline may have a fixed parallelism that you
cannot control, for example if you have a pre-defined number of
TaskManagers and task slots per TaskManager.
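To illustrate what I mean, here is a minimal sketch (the source and the map function are placeholders, not your actual operators, and this assumes the Java DataStream API):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PerOperatorParallelism {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(2); // pipeline-wide default parallelism

        // Placeholder source standing in for your real input.
        DataStream<String> source = env.fromData("a", "b", "c");

        source
            .map(String::toUpperCase)
            .disableChaining()  // break operator chaining at this point
            .setParallelism(8)  // per-operator parallelism, overriding the pipeline default
            .print();

        env.execute("per-operator parallelism sketch");
    }
}
```

Without disableChaining(), the operator may be fused into a chain that runs at the chain's parallelism, which is why a lone setParallelism() call can look like it is being ignored.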
> How can I increase the parallelism of this keyBy and window combination
> here.

I think disabling chaining before the keyBy and then increasing the
parallelism on the output of the keyed stream should work.

Att,
Pedro Mázala
Be awesome

On Wed, 28 May 2025 at 10:42, Sambhav Gupta <sambhavwor...@gmail.com> wrote:

> Hi all,
>
> Following up on this request.
>
> Thanks
>
> On Mon, 26 May 2025, 15:18 Sambhav Gupta, <sambhavwor...@gmail.com> wrote:
>
>> Hey,
>>
>> There is no error with the parallelism. I want to increase it for this
>> function, as it is creating a bottleneck for disk space, and I am not
>> able to do so.
>>
>> I tried using setParallelism() here, but I think it doesn't comply with
>> the Flink schema and is using the default parallelism instead.
>>
>> Can you please help me with this? How can I increase the parallelism of
>> this keyBy and window combination?
>>
>> Thanks,
>> Sambhav Gupta
>>
>> On Mon, 26 May 2025, 14:21 Pedro Mázala, <pedroh.maz...@gmail.com> wrote:
>>
>>> What is the error on the parallelism you're facing?
>>>
>>> Att,
>>> Pedro Mázala
>>> Be awesome
>>>
>>> On Mon, 26 May 2025 at 10:13, Sambhav Gupta <sambhavwor...@gmail.com> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> We are migrating our Flink codebase to v2.1.
>>>> Here we were using DataSet jobs, which we now need to migrate to
>>>> DataStream. While doing this we hit a parallelism issue with the keyBy
>>>> and window function in our full outer join, which is creating a
>>>> bottleneck for us in terms of disk storage and compute.
>>>>
>>>> The code structure:
>>>>
>>>> We have two inputs, a DB2 stream and a Kafka input, on which we
>>>> perform the outer join:
>>>>
>>>> db2Stream.keyBy(<key selector>)
>>>>     .coGroup(kafkaInput)
>>>>     .where(<key>)
>>>>     .equalTo(<key>)
>>>>     .window(endOfStreamWindow.get)
>>>>     .apply(<join function>)
>>>>
>>>> Can you please help us increase the parallelism of this function in
>>>> any way, so that we can remove this bottleneck while migrating from
>>>> DataSet to DataStream?
>>>>
>>>> Thanks,
>>>> Sambhav Gupta
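PS: in case it helps, here is a minimal, self-contained sketch of setting the parallelism directly on the coGroup/window operator by calling setParallelism() right after apply(). The sources, key extraction, and the end-of-stream window assigner below are placeholders standing in for your DB2/Kafka inputs and your endOfStreamWindow, not your actual job:

```java
import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
import org.apache.flink.util.Collector;

public class CoGroupParallelismSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder sources standing in for the DB2 stream and the Kafka input.
        DataStream<String> db2Stream = env.fromData("k1,a", "k2,b");
        DataStream<String> kafkaInput = env.fromData("k1,x", "k3,y");

        DataStream<String> joined = db2Stream
            .coGroup(kafkaInput)
            .where(v -> v.split(",")[0])
            .equalTo(v -> v.split(",")[0])
            // A global window fired at end of stream, standing in for your
            // endOfStreamWindow; swap in your actual window assigner.
            .window(GlobalWindows.createWithEndOfStreamTrigger())
            .apply(new CoGroupFunction<String, String, String>() {
                @Override
                public void coGroup(Iterable<String> left, Iterable<String> right,
                                    Collector<String> out) {
                    // Emit both sides so unmatched keys survive (full outer join shape).
                    for (String l : left)  out.collect("L:" + l);
                    for (String r : right) out.collect("R:" + r);
                }
            })
            // apply() returns the DataStream produced by the coGroup/window
            // operator, so setParallelism() here targets that operator.
            .setParallelism(16);

        joined.print();
        env.execute("coGroup parallelism sketch");
    }
}
```

The key point is that setParallelism() must come immediately after apply() to affect the windowed coGroup operator itself rather than a downstream operator.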