Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-22 Thread Lijie Wang
understand a lot of internal implementation details). Maybe it could be a future improvement (if it's worthw

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-21 Thread Jing Ge
be scheduled in topological order, so the large table side can only be scheduled after the Ru

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-20 Thread liu ron
relevant experience on it, welcome to give some suggestions.
> What's the representation of the runtime filter node in

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-20 Thread Jing Ge
> Schedule the TableSource(dim) first.
How does it know to schedule the TableSource(dim) first? IMO, In

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-20 Thread Lijie Wang
Nice to see this valuable feature. After reading the FLIP I have some questions below:

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-19 Thread liu ron
value. The same to table.optimizer.runtime-filter.max-build-data-size
> the runtime filter can be pushed down along the probe side, as

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-19 Thread Jing Ge
it would be reasonable to also support "pipeline shuffle" if possible. "p

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-19 Thread Lijie Wang
ern that "Even if the RuntimeFilter becomes running before the RuntimeFilterBuilder finished, it will not process any data and will

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-19 Thread Stefan Richter
[1] https://issues.apache.org/jira/browse/FLINK-25318

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-19 Thread Lijie Wang
Lijie Wang <wangdachui9...@gmail.com> wrote on Thu, Jun 15, 2023 at 14:18:

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-19 Thread Stefan Richter
between RuntimeFilterBuilder and RuntimeFilter to be BLOCKING (regardless of which BatchShuffleMode is set). Because the RuntimeFilter really doesn’t need to

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-19 Thread Lijie Wang
Lijie Wang <wangdachui9...@gmail.com> wrote on Thu, Jun 15, 2023 at 09:48:
> Hi Yuxia,

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-19 Thread Stefan Richter
available, we will not inject a runtime filter (as you said, we can hardly evaluate the benefits). Besides, AFAIK, the estimated data size of build side is also based on

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-19 Thread Lijie Wang
phase) -> No filter
Estimated data size meets the requirement (in planner optimization phase), but the real data size does not meet the requirement (in execution phase) ->

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-16 Thread Stefan Richter
.@alumni.sjtu.edu.cn> wrote on Wed, Jun 14, 2023 at 20:37:
> Thanks Lijie for starting this discussion. Excited to see runtime filter is to be implemented in Flink.
> I have few ques

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-15 Thread Lijie Wang
unt instead`. So, does the row count come from the statistics of the underlying table? What if the statistics are also unavailable, considering users may not always remember to generate statistics in production? I'

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-15 Thread Aitozi
benefits of runtime-filter.
2: The FLIP said: "We will inject the runtime filters only if the following requirements are met:xxx", but it also said, "Once this limit is exceed

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-15 Thread Benchao Li
We will inject the runtime filters only if the following requirements are met:xxx", but it also said, "Once this limit is exceeded, it will output a fake filter (which always returns true)" in the `RuntimeFilterBuilderOperator` part; Seems they ar

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-15 Thread Lijie Wang
3: Does it also mean runtime-filter can only take effect in blocking shuffle?
Best regards,
Yuxia
----- Original Message -----
From: "ron9 liu"
To: "dev"
Sent: Wednesday, June 14, 2023

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-14 Thread Lijie Wang
3: Does it also mean runtime-filter can only take effect in blocking shuffle?
Best regards,
Yuxia
----- Original Message -----
From: "ron9 liu"
To: "dev"
Sent: Wednesday, June 14, 2023, 5:29:28 PM
Subject: Re: [DISCUSS] FLIP-324: Introduce

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-14 Thread yuxia
'm wondering what's the real behavior: will no filter be injected, or a fake filter?
3: Does it also mean runtime-filter can only take effect in blocking shuffle?
Best regards,
Yuxia
----- Original Message -----
From: "ron9 liu"
To: "dev"
Sent: Wednesday, June 14, 2023, 5:29:28 PM
Subject: Re: [DIS

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-14 Thread liu ron
Thanks Lijie for starting this discussion. Runtime Filter is a common optimization for improving join performance that has been adopted by many computing engines such as Spark, Doris, etc. Flink is a unified stream and batch computing engine, and we are continuously optimizing its batch performance.

[DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-14 Thread Lijie Wang
Hi devs,
Ron Liu, Gen Luo and I would like to start a discussion about FLIP-324: Introduce Runtime Filter for Flink Batch Jobs [1].
Runtime Filter is a common optimization to improve join performance. It is designed to dynamically generate filter conditions for certain Join queries at runtime to
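
For readers new to the idea, here is a minimal Python sketch of how a runtime filter for a join can work in general. It is not the FLIP's design or Flink code. All names (BloomFilter, AlwaysTrueFilter, build_runtime_filter, filtered_join, max_build_rows) are hypothetical, and a simple row-count limit stands in, as an assumption, for a size limit such as table.optimizer.runtime-filter.max-build-data-size. The builder collects join keys from the small (build) side into a Bloom filter, and the probe side drops rows whose key cannot match before the join runs. If the build side exceeds the limit, the builder emits a filter that always returns true, as the thread describes for the "fake filter".

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class AlwaysTrueFilter:
    """Fallback ("fake") filter: lets every probe row through."""

    def might_contain(self, key):
        return True


def build_runtime_filter(build_keys, max_build_rows):
    # Hypothetical builder: gives up once the build side is too large.
    bloom = BloomFilter()
    for count, key in enumerate(build_keys, start=1):
        if count > max_build_rows:
            return AlwaysTrueFilter()
        bloom.add(key)
    return bloom


def filtered_join(build_rows, probe_rows, max_build_rows=1000):
    # 1) Build the filter from the small side's join keys.
    rf = build_runtime_filter((k for k, _ in build_rows), max_build_rows)
    # 2) Drop probe rows that cannot match before the join runs.
    survivors = [(k, v) for k, v in probe_rows if rf.might_contain(k)]
    # 3) Ordinary hash join on the surviving rows.
    lookup = {}
    for k, v in build_rows:
        lookup.setdefault(k, []).append(v)
    return [(k, pv, bv) for k, pv in survivors for bv in lookup.get(k, [])]
```

Because the filter only removes rows that would not have joined anyway, the join result is the same with a real filter, with false positives, or with the always-true fallback; only the amount of probe-side data processed changes.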