> ------------------ Original ------------------
> From: Tony Wei <tony19920...@gmail.com>
> Date: Tue, Mar 14, 2023 1:11 PM
> To: David Anderson <dander...@apache.org>
> Cc: Hangxiang Yu <master...@gmail.com>, user <user@flink.apache.org>
> Subject: Re: is there any detrimental side-effect if i set the max parallelism as 32768
>
> Hi Hangxiang, David,
>
> Thank you for your replies. Your responses are very helpful.
>
> Best regards,
> Tony Wei
>
> David Anderson <dander...@apache.org> wrote on Tue, Mar 14, 2023 at 12:12 PM:
> I believe there is some noticeable overhead if you are using the
> heap-based state backend, but with RocksDB I think the difference is
> negligible.
>
> David
>
> On Tue, Mar 7, 2023 at 11:10 PM Hangxiang Yu <master...@gmail.com> wrote:
> >
> > Hi, Tony.
> > "be detrimental to performance" means that the extra space taken by the
> > key-group field may affect performance.
> > As we know, Flink writes the key group as a prefix of the key to
> > speed up rescaling.
> > So the format looks like: key group | key len | key | ......
> > The relationship between max parallelism and the number of bytes used
> > for the key group is as follows:
> > ------------------------------------------
> > max parallelism      bytes of key group
> > 128                  1
> > 32768                2
> > ------------------------------------------
> > So I think the cost will be very small if the real key length is much
> > larger than 2 bytes.
> >
> > On Wed, Mar 8, 2023 at 1:06 PM Tony Wei <tony19920...@gmail.com> wrote:
> >>
> >> Hi experts,
> >>
> >>> Setting the maximum parallelism to a very large value can be detrimental
> >>> to performance because some state backends have to keep internal data
> >>> structures that scale with the number of key-groups (which are the
> >>> internal implementation mechanism for rescalable state).
> >>>
> >>> Changing the maximum parallelism explicitly when recovery from original
> >>> job will lead to state incompatibility.
> >>
> >> I read the section above in the official Flink documentation [1], and I'm
> >> wondering what the details of this side effect are.
> >>
> >> Suppose that I have a Flink SQL job with large state and large parallelism,
> >> using RocksDB as my state backend.
> >> I would like to set the max parallelism to 32768, so that I don't have to
> >> worry about whether the max parallelism is divisible by the parallelism
> >> whenever I want to scale my job,
> >> because the number of key groups will not differ much between subtasks.
> >>
> >> I'm wondering if this is a good practice, because according to the official
> >> documentation it is actually not recommended.
> >> If possible, I would like to know the details of this side effect. Which
> >> state backends have this issue, and why?
> >> Any advice would be appreciated. Thanks in advance.
> >>
> >> Best regards,
> >> Tony Wei
> >>
> >> [1]
> >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/parallel/#setting-the-maximum-parallelism
> >
> > --
> > Best,
> > Hangxiang.
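
The trade-off discussed in this thread can be made concrete with a small Python sketch. It is not Flink code. The 1-byte/2-byte threshold is taken from the table in Hangxiang's reply (the assumption here is that every max parallelism above 128 uses 2 bytes), and the range formula is an assumption about how contiguous key-group ranges are split across subtasks. The function names `key_group_prefix_bytes` and `key_groups_per_subtask` are hypothetical.

```python
def key_group_prefix_bytes(max_parallelism: int) -> int:
    """Bytes assumed for the key-group prefix (threshold from the table above)."""
    return 1 if max_parallelism <= 128 else 2


def key_groups_per_subtask(max_parallelism: int, parallelism: int) -> list[int]:
    """Number of key groups each subtask owns under a contiguous range split.

    Assumption: subtask i owns the key groups from ceil(i * M / p) up to
    ceil((i + 1) * M / p) - 1, where M is the max parallelism and p the
    parallelism.
    """
    if not 0 < parallelism <= max_parallelism:
        raise ValueError("parallelism must be in 1..max_parallelism")
    counts = []
    for i in range(parallelism):
        start = (i * max_parallelism + parallelism - 1) // parallelism
        end = ((i + 1) * max_parallelism - 1) // parallelism
        counts.append(end - start + 1)
    return counts


if __name__ == "__main__":
    # Compare a small and a large max parallelism at parallelism 100.
    for max_parallelism in (128, 32768):
        counts = key_groups_per_subtask(max_parallelism, 100)
        print(
            f"max parallelism {max_parallelism}: "
            f"prefix bytes = {key_group_prefix_bytes(max_parallelism)}, "
            f"key groups per subtask = {min(counts)}..{max(counts)}"
        )
```

Under these assumptions, a parallelism of 100 with max parallelism 128 gives some subtasks twice as many key groups as others (1 versus 2), while max parallelism 32768 gives 327 or 328 per subtask. That is the evenness Tony is after, paid for with one extra prefix byte per key.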