Re: is there any detrimental side-effect if i set the max parallelism as 32768

Leonard Xu Wed, 15 Mar 2023 02:01:34 -0700

> 
> Unsubscribe
To unsubscribe from the user@flink.apache.org mailing list, send any email to 
user-unsubscr...@flink.apache.org. Sending mail to user@flink.apache.org will 
not unsubscribe you.



> Sent from my iPhone
> 
> 
> ------------------ Original ------------------
> From: Tony Wei <tony19920...@gmail.com>
> Date: Tue,Mar 14,2023 1:11 PM
> To: David Anderson <dander...@apache.org>
> Cc: Hangxiang Yu <master...@gmail.com>, user <user@flink.apache.org>
> Subject: Re: is there any detrimental side-effect if i set the max 
> parallelism as 32768
> 
> Hi Hangxiang, David,
> 
> Thank you for your replies. Your responses are very helpful.
> 
> Best regards,
> Tony Wei
> 
> David Anderson <dander...@apache.org> wrote on Tue, Mar 14, 2023 at 12:12 PM:
> I believe there is some noticeable overhead if you are using the
> heap-based state backend, but with RocksDB I think the difference is
> negligible.
> 
> David
> 
> On Tue, Mar 7, 2023 at 11:10 PM Hangxiang Yu <master...@gmail.com> wrote:
> >
> > Hi, Tony.
> > "Detrimental to performance" here means that the extra space taken by the 
> > key-group field may affect performance.
> > Flink writes the key group as a prefix of each key to speed up rescaling, 
> > so the format looks like: key group | key len | key | ......
> > The relationship between max parallelism and the size of the key-group 
> > prefix is as follows:
> > ------------------------------------------
> > max parallelism   bytes of key group
> >       128                  1
> >     32768                  2
> > ------------------------------------------
> > So I think the cost will be very small if the real key length is much 
> > larger than 2 bytes.
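
To make the table above concrete, here is a rough Java sketch of the prefix-size arithmetic. It assumes the cutoff sits exactly at 128 key groups (1 byte up to 128, 2 bytes beyond that, up to Flink's upper bound of 32768), which is consistent with the table but not stated in it. The class and method names are made up for illustration and are not Flink APIs.

```java
// Hypothetical helper, not part of Flink: illustrates the per-entry
// space cost of the key-group prefix described above.
class KeyGroupPrefixOverhead {

    // Assumption: one prefix byte covers up to 128 key groups,
    // two bytes cover anything larger (up to 32768).
    static int prefixBytes(int maxParallelism) {
        if (maxParallelism < 1 || maxParallelism > 32768) {
            throw new IllegalArgumentException("max parallelism must be in [1, 32768]");
        }
        return maxParallelism > 128 ? 2 : 1;
    }

    // Extra bytes per state entry when moving from one max parallelism to another.
    static int extraBytesPerEntry(int fromMaxParallelism, int toMaxParallelism) {
        return prefixBytes(toMaxParallelism) - prefixBytes(fromMaxParallelism);
    }

    // Fraction of the stored key (prefix + user key bytes) taken by the prefix.
    static double prefixShare(int maxParallelism, int userKeyBytes) {
        int prefix = prefixBytes(maxParallelism);
        return prefix / (double) (prefix + userKeyBytes);
    }

    public static void main(String[] args) {
        System.out.println("extra bytes per entry, 128 -> 32768: "
                + extraBytesPerEntry(128, 32768));
        for (int keyBytes : new int[] {4, 16, 64}) {
            System.out.printf("user key %d bytes: prefix share %.4f%n",
                    keyBytes, prefixShare(32768, keyBytes));
        }
    }
}
```

Under these assumptions the extra cost is a single byte per entry, so its share shrinks as the user key grows, which matches the "key length >> 2 bytes" remark.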
> >
> > On Wed, Mar 8, 2023 at 1:06 PM Tony Wei <tony19920...@gmail.com> wrote:
> >>
> >> Hi experts,
> >>
> >>> Setting the maximum parallelism to a very large value can be detrimental 
> >>> to performance because some state backends have to keep internal data 
> >>> structures that scale with the number of key-groups (which are the 
> >>> internal implementation mechanism for rescalable state).
> >>>
> >>> Changing the maximum parallelism explicitly when recovery from original 
> >>> job will lead to state incompatibility.
> >>
> >>
> >> I read the section above in the official Flink documentation [1], and I 
> >> would like to understand this side effect in more detail.
> >>
> >> Suppose I have a Flink SQL job with large state and large parallelism, 
> >> using RocksDB as the state backend.
> >> I would like to set the max parallelism to 32768 so that I don't have to 
> >> worry about whether the max parallelism is divisible by the parallelism 
> >> whenever I rescale the job, because the number of key groups per subtask 
> >> will then differ very little.
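
The point about key groups per subtask can be checked with a small Java sketch. It assumes key groups are split into contiguous, near-even ranges per subtask, which is believed to match the arithmetic Flink uses but is written here independently. The class and method names are hypothetical.

```java
// Hypothetical helper, not a Flink API: counts how many key groups each
// subtask receives under a contiguous, near-even split.
class KeyGroupSpread {

    // Key groups assigned to subtask `index` (0-based) out of `parallelism`.
    // Assumption: subtask i owns [ceil(i*M/p), ceil((i+1)*M/p) - 1].
    static int keyGroupsFor(int maxParallelism, int parallelism, int index) {
        long start = ((long) index * maxParallelism + parallelism - 1) / parallelism;
        long end = (((long) index + 1) * maxParallelism - 1) / parallelism;
        return (int) (end - start + 1);
    }

    // Returns {min, max} key groups per subtask for the given settings.
    static int[] minMax(int maxParallelism, int parallelism) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < parallelism; i++) {
            int n = keyGroupsFor(maxParallelism, parallelism, i);
            min = Math.min(min, n);
            max = Math.max(max, n);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        for (int maxParallelism : new int[] {128, 32768}) {
            int[] mm = minMax(maxParallelism, 100);
            System.out.printf("max parallelism %d, parallelism 100: %d to %d key groups per subtask%n",
                    maxParallelism, mm[0], mm[1]);
        }
    }
}
```

With parallelism 100, a max parallelism of 128 leaves some subtasks with twice as many key groups as others, while 32768 keeps the difference to one key group out of several hundred.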
> >>
> >> I'm wondering whether this is good practice, because the official 
> >> documentation advises against it.
> >> If possible, I would like to know the details of this side effect: which 
> >> state backends are affected, and why?
> >> Any advice would be appreciated. Thanks in advance.
> >>
> >> Best regards,
> >> Tony Wei
> >>
> >> [1] 
> >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/parallel/#setting-the-maximum-parallelism
> >
> >
> >
> > --
> > Best,
> > Hangxiang.
