Re: [VOTE] FLIP-457: Improve Table/SQL Configuration for Flink 2.0

2024-05-26 Thread Benchao Li
+1 (binding)

Yubin Li wrote on Sat, May 25, 2024 at 12:26:
>
> +1 (non-binding)
>
> Best,
> Yubin
>
> On Fri, May 24, 2024 at 2:04 PM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > +1(binding)
> >
> > Best,
> > Rui
> >
> > On Fri, May 24, 2024 at 1:45 PM Leonard Xu  wrote:
> >
> > > +1
> > >
> > > Best,
> > > Leonard
> > >
> > > > On May 24, 2024 at 1:27 PM, weijie guo wrote:
> > > >
> > > > +1(binding)
> > > >
> > > > Best regards,
> > > >
> > > > Weijie
> > > >
> > > >
> > > > Lincoln Lee wrote on Fri, May 24, 2024 at 12:20:
> > > >
> > > >> +1(binding)
> > > >>
> > > >> Best,
> > > >> Lincoln Lee
> > > >>
> > > >>
> > > >> Jane Chan wrote on Fri, May 24, 2024 at 09:52:
> > > >>
> > > >>> Hi all,
> > > >>>
> > > >>> I'd like to start a vote on FLIP-457[1] after reaching a consensus
> > > >> through
> > > >>> the discussion thread[2].
> > > >>>
> > > >>> The vote will be open for at least 72 hours unless there is an
> > > objection
> > > >> or
> > > >>> insufficient votes.
> > > >>>
> > > >>>
> > > >>> [1]
> > > >>>
> > > >>
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=307136992
> > > >>> [2] https://lists.apache.org/thread/1sthbv6q00sq52pp04n2p26d70w4fqj1
> > > >>>
> > > >>> Best,
> > > >>> Jane
> > > >>>
> > > >>
> > >
> > >



-- 

Best,
Benchao Li


Re: [DISCUSSION] FLIP-457: Improve Table/SQL Configuration for Flink 2.0

2024-05-19 Thread Benchao Li
I agree with Lincoln about the experimental features.

Some of these configurations do not even have proper implementation,
take 'table.exec.range-sort.enabled' as an example, there was a
discussion[1] about it before.

[1] https://lists.apache.org/thread/q5h3obx36pf9po28r0jzmwnmvtyjmwdr

Lincoln Lee wrote on Mon, May 20, 2024 at 12:01:
>
> Hi Jane,
>
> Thanks for the proposal!
>
> +1 for the changes except for these annotated as experimental ones.
>
> For the options annotated as experimental,
>
> +1 for the moving of IncrementalAggregateRule & RelNodeBlock.
>
> For the rest of the options, there are some suggestions:
>
> 1. for the batch related parameters, it's recommended to either delete
> them (leaving the necessary default values in place) or leave them as they
> are. Including:
> FlinkRelMdRowCount
> FlinkRexUtil
> BatchPhysicalSortRule
> JoinDeriveNullFilterRule
> BatchPhysicalJoinRuleBase
> BatchPhysicalSortMergeJoinRule
>
> What I understand about the history of these options is that they were once
> used for fine
> tuning for tpc testing, and the current flink planner no longer relies on
> these internal
> options when testing tpc[1]. In addition, these options are too obscure for
> SQL users,
> and some of them are actually magic numbers.
>
> 2. Regarding the options in HashAggCodeGenerator, since this new feature
> has gone
> through a couple of release cycles and could be considered for
> PublicEvolving now,
> cc @Ron Liu   WDYT?
>
> 3. Regarding WindowEmitStrategy, IIUC it is currently unsupported on TVF
> window, so
> it's recommended to keep it untouched for now and follow up in
> FLINK-29692[2]. cc @Xuyang 
>
> [1]
> https://github.com/ververica/flink-sql-benchmark/blob/master/tools/common/flink-conf.yaml
> [2] https://issues.apache.org/jira/browse/FLINK-29692
>
>
> Best,
> Lincoln Lee
>
>
> Yubin Li wrote on Fri, May 17, 2024 at 10:49:
>
> > Hi Jane,
> >
> > Thanks, Jane, for driving this proposal!
> >
> > This makes sense for users, +1 for that.
> >
> > Best,
> > Yubin
> >
> > On Thu, May 16, 2024 at 3:17 PM Jark Wu  wrote:
> > >
> > > Hi Jane,
> > >
> > > Thanks for the proposal. +1 from my side.
> > >
> > >
> > > Best,
> > > Jark
> > >
> > > On Thu, 16 May 2024 at 10:28, Xuannan Su  wrote:
> > >
> > > > Hi Jane,
> > > >
> > > > Thanks for driving this effort! And +1 for the proposed changes.
> > > >
> > > > I have one comment on the migration plan.
> > > >
> > > > For options to be moved to another module/package, I think we have to
> > > > mark the old option deprecated in 1.20 for it to be removed in 2.0,
> > > > according to the API compatibility guarantees[1]. We can introduce the
> > > > new option in 1.20 with the same option key in the intended class.
> > > > WDYT?
> > > >
> > > > Best,
> > > > Xuannan
> > > >
> > > > [1]
> > > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/upgrading/#api-compatibility-guarantees
> > > >
> > > >
> > > >
> > > > On Wed, May 15, 2024 at 6:20 PM Jane Chan 
> > wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to start a discussion on FLIP-457: Improve Table/SQL
> > > > Configuration
> > > > > for Flink 2.0 [1]. This FLIP revisited all Table/SQL configurations
> > to
> > > > > improve user-friendliness and maintainability as Flink moves toward
> > 2.0.
> > > > >
> > > > > I am looking forward to your feedback.
> > > > >
> > > > > Best regards,
> > > > > Jane
> > > > >
> > > > > [1]
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=307136992
> > > >
> >
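Xuannan's migration point above — deprecate the old option in 1.20 and introduce a new option that still resolves the old key — follows a common fallback-resolution pattern. Below is a minimal stand-alone sketch of that pattern with hypothetical keys and class names; in real Flink this is handled by `org.apache.flink.configuration.ConfigOptions`, where an option can declare fallback/deprecated keys.

```java
import java.util.List;
import java.util.Map;

// Simplified stand-in for a config option with deprecated keys; not Flink's
// actual ConfigOption class, just an illustration of the resolution order.
class OptionWithFallback {
    final String key;
    final List<String> deprecatedKeys;
    final String defaultValue;

    OptionWithFallback(String key, List<String> deprecatedKeys, String defaultValue) {
        this.key = key;
        this.deprecatedKeys = deprecatedKeys;
        this.defaultValue = defaultValue;
    }

    // Resolution order: current key first, then any deprecated key, then the default.
    String resolve(Map<String, String> conf) {
        if (conf.containsKey(key)) {
            return conf.get(key);
        }
        for (String old : deprecatedKeys) {
            if (conf.containsKey(old)) {
                return conf.get(old);
            }
        }
        return defaultValue;
    }
}
```

With this shape, jobs still setting the old key keep working through 1.20, while the new key wins whenever both are set.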



-- 

Best,
Benchao Li


Re: [VOTE] FLIP-452: Allow Skipping Invocation of Function Calls While Constant-folding

2024-05-08 Thread Benchao Li
+1 (binding)

Timo Walther wrote on Wed, May 8, 2024 at 17:15:
>
> +1 (binding)
>
> Thanks,
> Timo
>
> On 08.05.24 11:10, Stefan Richter wrote:
> > Hi Alan,
> >
> > Thanks for this proposal, the ability to exclude functions from constant 
> > folding makes sense to me.
> >
> > +1 (binding)
> >
> > Best,
> > Stefan
> >
> >> On 8. May 2024, at 02:01, Alan Sheinberg  
> >> wrote:
> >>
> >> Hi everyone,
> >>
> >> I'd like to start a vote on FLIP-452 [1]. It covers adding a new method
> >> FunctionDefinition.supportsConstantFolding() as part of the Flink Table/SQL
> >> API to allow skipping invocation of functions while constant-folding. It
> >> has been discussed in this thread [2].
> >>
> >> I would like to start a vote.  The vote will be open for at least 72 hours
> >> unless there is an objection or insufficient votes.
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-452%3A+Allow+Skipping+Invocation+of+Function+Calls+While+Constant-folding
> >>
> >> [2] 
> >> https://lists.apache.org/thread/ko5ndv5kr87nm011psll2hzzd0nn3ztz
> >>
> >> Thanks,
> >> Alan
> >
>
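FLIP-452's core addition is a single opt-out hook on `FunctionDefinition`. The sketch below uses simplified stand-in types — not Flink's actual classes — to show how a planner can consult `supportsConstantFolding()` before pre-evaluating a call at plan time; the function names are illustrative.

```java
// Simplified stand-in for Flink's FunctionDefinition; FLIP-452 proposes a
// supportsConstantFolding() method defaulting to true.
interface FunctionDefinition {
    default boolean supportsConstantFolding() {
        return true;
    }
}

// A function with side effects (e.g. a remote lookup) opts out of being
// invoked during constant folding.
class RemoteLookupFunction implements FunctionDefinition {
    @Override
    public boolean supportsConstantFolding() {
        return false;
    }
}

// An ordinary pure function keeps the default and may be folded.
class PlainUpperFunction implements FunctionDefinition {}

class ConstantFolder {
    // The planner would only pre-evaluate a call when folding is allowed.
    static boolean canFoldCall(FunctionDefinition fn) {
        return fn.supportsConstantFolding();
    }
}
```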


-- 

Best,
Benchao Li


Re: [ANNOUNCE] New Apache Flink PMC Member - Lincoln Lee

2024-04-14 Thread Benchao Li
Congratulations, Lincoln. Well deserved!

Leonard Xu wrote on Sun, Apr 14, 2024 at 21:19:
>
> Congratulations, Lincoln~
>
> Best,
> Leonard
>
>
>
> > On Apr 12, 2024 at 4:40 PM, Yuepeng Pan wrote:
> >
> > Congratulations, Lincoln!
> >
> > Best,
> > Yuepeng Pan
> > At 2024-04-12 16:24:01, "Yun Tang"  wrote:
> >> Congratulations, Lincoln!
> >>
> >>
> >> Best
> >> Yun Tang
> >> 
> >> From: Jark Wu 
> >> Sent: Friday, April 12, 2024 15:59
> >> To: dev 
> >> Cc: Lincoln Lee 
> >> Subject: [ANNOUNCE] New Apache Flink PMC Member - Lincoln Lee
> >>
> >> Hi everyone,
> >>
> >> On behalf of the PMC, I'm very happy to announce that Lincoln Lee has
> >> joined the Flink PMC!
> >>
> >> Lincoln has been an active member of the Apache Flink community for
> >> many years. He mainly works on Flink SQL component and has driven
> >> /pushed many FLIPs around SQL, including FLIP-282/373/415/435 in
> >> the recent versions. He has a great technical vision of Flink SQL and
> >> participated in plenty of discussions in the dev mailing list. Besides
> >> that,
> >> he is community-minded, such as being the release manager of 1.19,
> >> verifying releases, managing release syncs, writing the release
> >> announcement etc.
> >>
> >> Congratulations and welcome Lincoln!
> >>
> >> Best,
> >> Jark (on behalf of the Flink PMC)
>


-- 

Best,
Benchao Li


Re: [ANNOUNCE] New Apache Flink PMC Member - Jing Ge

2024-04-14 Thread Benchao Li
Congratulations, Jing. Well deserved!

Leonard Xu wrote on Sun, Apr 14, 2024 at 21:18:
>
> Congratulations, Jing~
>
> Best,
> Leonard
>
> > On Apr 14, 2024 at 4:23 PM, Xia Sun wrote:
> >
> > Congratulations, Jing!
> >
> > Best,
> > Xia
> >
> > Ferenc Csaky wrote on Sat, Apr 13, 2024 at 00:50:
> >
> >> Congratulations, Jing!
> >>
> >> Best,
> >> Ferenc
> >>
> >>
> >>
> >> On Friday, April 12th, 2024 at 13:54, Ron liu  wrote:
> >>
> >>>
> >>>
> >>> Congratulations, Jing!
> >>>
> >>> Best,
> >>> Ron
> >>>
> >>>> Junrui Lee jrlee@gmail.com wrote on Fri, Apr 12, 2024 at 18:54:
> >>>
> >>>> Congratulations, Jing!
> >>>>
> >>>> Best,
> >>>> Junrui
> >>>>
> >>>>> Aleksandr Pilipenko z3d...@gmail.com wrote on Fri, Apr 12, 2024 at 18:28:
> >>>>
> >>>>> Congratulations, Jing!
> >>>>>
> >>>>> Best Regards,
> >>>>> Aleksandr
> >>
>


-- 

Best,
Benchao Li


Re: Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread Benchao Li
Congratulations!

Zakelly Lan wrote on Fri, Mar 29, 2024 at 10:25:
>
> Congratulations!
>
>
> Best,
> Zakelly
>
> On Thu, Mar 28, 2024 at 10:13 PM Jing Ge  wrote:
>
> > Congrats!
> >
> > Best regards,
> > Jing
> >
> > On Thu, Mar 28, 2024 at 1:27 PM Feifan Wang  wrote:
> >
> > > Congratulations!
> > >
> > > Best regards,
> > >
> > > Feifan Wang
> > >
> > >
> > >
> > >
> > > At 2024-03-28 20:02:43, "Yanfei Lei"  wrote:
> > > >Congratulations!
> > > >
> > > >Best,
> > > >Yanfei
> > > >
> > > >Zhanghao Chen wrote on Thu, Mar 28, 2024 at 19:59:
> > > >>
> > > >> Congratulations!
> > > >>
> > > >> Best,
> > > >> Zhanghao Chen
> > > >> 
> > > >> From: Yu Li 
> > > >> Sent: Thursday, March 28, 2024 15:55
> > > >> To: d...@paimon.apache.org 
> > > >> Cc: dev ; user 
> > > >> Subject: Re: [ANNOUNCE] Apache Paimon is graduated to Top Level
> > Project
> > > >>
> > > >> CC the Flink user and dev mailing list.
> > > >>
> > > >> Paimon originated within the Flink community, initially known as Flink
> > > >> Table Store, and all our incubating mentors are members of the Flink
> > > >> Project Management Committee. I am confident that the bonds of
> > > >> enduring friendship and close collaboration will continue to unite the
> > > >> two communities.
> > > >>
> > > >> And congratulations all!
> > > >>
> > > >> Best Regards,
> > > >> Yu
> > > >>
> > > >> On Wed, 27 Mar 2024 at 20:35, Guojun Li 
> > > wrote:
> > > >> >
> > > >> > Congratulations!
> > > >> >
> > > >> > Best,
> > > >> > Guojun
> > > >> >
> > > >> > On Wed, Mar 27, 2024 at 5:24 PM wulin  wrote:
> > > >> >
> > > >> > > Congratulations~
> > > >> > >
> > > >> > > > On Mar 27, 2024 at 15:54, 王刚 wrote:
> > > >> > > >
> > > >> > > > Congratulations~
> > > >> > > >
> > > >> > > >> On Mar 26, 2024 at 10:25, Jingsong Li wrote:
> > > >> > > >>
> > > >> > > >> Hi Paimon community,
> > > >> > > >>
> > > >> > > >> I’m glad to announce that the ASF board has approved a
> > > resolution to
> > > >> > > >> graduate Paimon into a full Top Level Project. Thanks to
> > > everyone for
> > > >> > > >> your help to get to this point.
> > > >> > > >>
> > > >> > > >> I just created an issue to track the things we need to modify
> > > [2],
> > > >> > > >> please comment on it if you feel that something is missing. You
> > > can
> > > >> > > >> refer to apache documentation [1] too.
> > > >> > > >>
> > > >> > > >> And, we already completed the GitHub repo migration [3], please
> > > update
> > > >> > > >> your local git repo to track the new repo [4].
> > > >> > > >>
> > > >> > > >> You can run the following command to complete the remote repo
> > > tracking
> > > >> > > >> migration.
> > > >> > > >>
> > > >> > > >> git remote set-url origin https://github.com/apache/paimon.git
> > > >> > > >>
> > > >> > > >> If you have a different name, please change the 'origin' to
> > your
> > > remote
> > > >> > > name.
> > > >> > > >>
> > > >> > > >> Please join me in celebrating!
> > > >> > > >>
> > > >> > > >> [1]
> > > >> > >
> > >
> > https://incubator.apache.org/guides/transferring.html#life_after_graduation
> > > >> > > >> [2] https://github.com/apache/paimon/issues/3091
> > > >> > > >> [3] https://issues.apache.org/jira/browse/INFRA-25630
> > > >> > > >> [4] https://github.com/apache/paimon
> > > >> > > >>
> > > >> > > >> Best,
> > > >> > > >> Jingsong Lee
> > > >> > >
> > > >> > >
> > >
> >



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-434: Support optimizations for pre-partitioned data sources

2024-03-21 Thread Benchao Li
Jeyhun,

Sorry for the delay. And thanks for the explanation, it sounds good to me!

Jeyhun Karimov wrote on Sat, Mar 16, 2024 at 05:09:
>
> Hi Benchao,
>
> Thanks for your comments.
>
> 1. What the parallelism would you take? E.g., 128 + 256 => 128? What
> > if we cannot have a good greatest common divisor, like 127 + 128,
> > could we just utilize one side's pre-partitioned attribute, and let
> > another side just do the shuffle?
>
>
> There are two cases we need to consider:
>
> 1. Static Partition (no partitions are added during the query execution) is
> enabled AND both sources implement "SupportsPartitionPushdown"
>
> In this case, we are sure that no new partitions will be added at runtime.
> So, we have a chance to equalize both sources' partitions and parallelism, IFF
> both sources implement "SupportsPartitionPushdown" interface.
> To achieve so, first we will fetch the existing partitions from source1
> (say p_s1) and from source2 (say p_s2).
> Then, we find the intersection of these two partition sets (say
> p_intersect) and pushdown these partitions:
>
> SupportsPartitionPushDown::applyPartitions(p_intersect) // make sure that
> only specific partitions are read
> SupportsPartitioning::applyPartitionedRead(p_intersect) // partitioned read
> with filtered partitions
>
> Lastly, we need to change the parallelism of 1) source1, 2) source2, and 3)
> all of their downstream operators until (and including) their first common
> ancestor (e.g., join) to be equal to the number of partitions (size of
> p_intersect).
>
> 2. All other cases
>
> In all other cases, the parallelism of both sources and their downstream
> operators until their common ancestor would be equal to the MIN(p_s1,
> p_s2).
> That is, minimum of the partition size of source1 and partition size of
> source2 will be selected as the parallelism.
> Coming back to your example, if source1 parallelism is 127 and source2
> parallelism is 128, then we will first check the partition size of source1
> and source2.
> Say partition size of source1 is 100 and partition size of source2 is 90.
> Then, we would set the parallelism for source1, source2, and all of their
> downstream operators until (and including) the join operator
> to 90 (min(100, 90)).
> We also plan to implement a cost based decision instead of the rule-based
> one (the ones explained above - MIN rule).
> One  possible result of the cost based estimation is to keep the partitions
> on one side and perform the shuffling on another source.
>
>
>
> 2. In our current shuffle remove design (FlinkExpandConversionRule),
> > we don't consider parallelism, we just remove unnecessary shuffles
> > according to the distribution columns. After this FLIP, the
> > parallelism may be bundled with source's partitions, then how will
> > this optimization accommodate with FlinkExpandConversionRule, will you
> > also change downstream operator's parallelisms if we want to also
> > remove subsequent shuffles?
>
>
>
> - From my understanding of FlinkExpandConversionRule, its removal logic is
> agnostic to operator parallelism.
> So, if FlinkExpandConversionRule decides to remove a shuffle operation,
> then this FLIP will search another possible shuffle (the one closest to the
> source) to remove.
> If there is such an opportunity, this FLIP will remove the shuffle. So,
> from my understanding FlinkExpandConversionRule and this optimization rule
> can work together safely.
> Please correct me if I misunderstood your question.
>
>
>
> Regarding the new optimization rule, have you also considered to allow
> > some non-strict mode like FlinkRelDistribution#requireStrict? For
> > example, source is pre-partitioned by a, b columns, if we are
> > consuming this source, and do a aggregate on a, b, c, can we utilize
> > this optimization?
>
>
> - Good point. Yes, there are some cases that non-strict mode will apply.
> For example:
>
> - pre-partitioned columns and aggregate columns are the same but have
> different order (e.g., source pre-partitioned  w.r.t. a,b and aggregate has
> a GROUP BY b,a)
> > - columns in the Exchange operator are a list-prefix of the pre-partitioned
> columns of source (e.g., source is pre-partitioned w.r.t. a,b,c and
> Exchange's partition columns are a,b)
>
> Please let me know if the above answers your questions or if you have any
> other comments.
>
> Regards,
> Jeyhun
>
> On Thu, Mar 14, 2024 at 12:48 PM Benchao Li  wrote:
>
> > Thanks Jeyhun for bringing up this discussion, it is really exciting,
> > +1 for the general idea.
> >
> > We also introduced a similar concept in Flink Batch internally to cope
> 
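The three rules Jeyhun describes above — partition intersection for the static-partition case, the MIN rule for parallelism, and prefix matching for shuffle removal — can be sketched as plain helpers. The names below are illustrative only, not Flink planner API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative helpers for the planning rules discussed in the thread.
class PrePartitionPlanning {
    // Static-partition case: both sources read only the common partitions
    // (p_intersect), and sources plus downstream operators up to the join
    // run with parallelism |p_intersect|.
    static Set<String> partitionIntersection(Set<String> pS1, Set<String> pS2) {
        Set<String> result = new HashSet<>(pS1);
        result.retainAll(pS2);
        return result;
    }

    // General case (MIN rule): parallelism = min of the two partition counts.
    static int minRuleParallelism(int partitionsS1, int partitionsS2) {
        return Math.min(partitionsS1, partitionsS2);
    }

    // Non-strict prefix case: the shuffle is a removal candidate when the
    // Exchange's partition columns are a list-prefix of the source's
    // pre-partitioned columns (e.g. exchange on [a, b], source on [a, b, c]).
    static boolean isPrefixOfSourcePartitioning(
            List<String> exchangeCols, List<String> sourceCols) {
        return exchangeCols.size() <= sourceCols.size()
                && exchangeCols.equals(sourceCols.subList(0, exchangeCols.size()));
    }
}
```

For Jeyhun's 100-vs-90 example, `minRuleParallelism(100, 90)` yields 90, which would be applied to both sources and their downstream operators up to the join.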

Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-21 Thread Benchao Li
Congratulations, and thanks for the great work!

Yuan Mei wrote on Thu, Mar 21, 2024 at 18:31:
>
> Thanks for driving these efforts!
>
> Congratulations
>
> Best
> Yuan
>
> On Thu, Mar 21, 2024 at 4:35 PM Yu Li  wrote:
>
> > Congratulations and look forward to its further development!
> >
> > Best Regards,
> > Yu
> >
> > On Thu, 21 Mar 2024 at 15:54, ConradJam  wrote:
> > >
> > > Congratulations!
> > >
> > > Leonard Xu wrote on Wed, Mar 20, 2024 at 21:36:
> > >
> > > > Hi devs and users,
> > > >
> > > > We are thrilled to announce that the donation of Flink CDC as a
> > > > sub-project of Apache Flink has completed. We invite you to explore
> > the new
> > > > resources available:
> > > >
> > > > - GitHub Repository: https://github.com/apache/flink-cdc
> > > > - Flink CDC Documentation:
> > > > https://nightlies.apache.org/flink/flink-cdc-docs-stable
> > > >
> > > > After Flink community accepted this donation[1], we have completed
> > > > software copyright signing, code repo migration, code cleanup, website
> > > > migration, CI migration and github issues migration etc.
> > > > Here I am particularly grateful to Hang Ruan, Zhongqiang Gong,
> > Qingsheng
> > > > Ren, Jiabao Sun, LvYanquan, loserwang1024 and other contributors for
> > their
> > > > contributions and help during this process!
> > > >
> > > >
> > > > For all previous contributors: The contribution process has slightly
> > > > changed to align with the main Flink project. To report bugs or
> > suggest new
> > > > features, please open tickets in
> > > > Apache Jira (https://issues.apache.org/jira).  Note that we will no
> > > > longer accept GitHub issues for these purposes.
> > > >
> > > >
> > > > Welcome to explore the new repository and documentation. Your feedback
> > and
> > > > contributions are invaluable as we continue to improve Flink CDC.
> > > >
> > > > Thanks everyone for your support and happy exploring Flink CDC!
> > > >
> > > > Best,
> > > > Leonard
> > > > [1] https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
> > > >
> > > >
> > >
> > > --
> > > Best
> > >
> > > ConradJam
> >



-- 

Best,
Benchao Li


Re: [VOTE] FLIP-436: Introduce Catalog-related Syntax

2024-03-20 Thread Benchao Li
+1 (binding)

gongzhongqiang wrote on Wed, Mar 20, 2024 at 11:40:
>
> +1 (non-binding)
>
> Best,
> Zhongqiang Gong
>
> Yubin Li wrote on Tue, Mar 19, 2024 at 18:03:
>
> > Hi everyone,
> >
> > Thanks for all the feedback, I'd like to start a vote on the FLIP-436:
> > Introduce Catalog-related Syntax [1]. The discussion thread is here
> > [2].
> >
> > The vote will be open for at least 72 hours unless there is an
> > objection or insufficient votes.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> > [2] https://lists.apache.org/thread/10k1bjb4sngyjwhmfqfky28lyoo7sv0z
> >
> > Best regards,
> > Yubin
> >



-- 

Best,
Benchao Li


Re: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-18 Thread Benchao Li
Congratulations! And thanks to all release managers and everyone
involved in this release!

Yubin Li wrote on Mon, Mar 18, 2024 at 18:11:
>
> Congratulations!
>
> Thanks to release managers and everyone involved.
>
> On Mon, Mar 18, 2024 at 5:55 PM Hangxiang Yu  wrote:
> >
> > Congratulations!
> > Thanks release managers and all involved!
> >
> > On Mon, Mar 18, 2024 at 5:23 PM Hang Ruan  wrote:
> >
> > > Congratulations!
> > >
> > > Best,
> > > Hang
> > >
> > > Paul Lam wrote on Mon, Mar 18, 2024 at 17:18:
> > >
> > > > Congrats! Thanks to everyone involved!
> > > >
> > > > Best,
> > > > Paul Lam
> > > >
> > > > > On Mar 18, 2024 at 16:37, Samrat Deb wrote:
> > > > >
> > > > > Congratulations !
> > > > >
> > > > > On Mon, 18 Mar 2024 at 2:07 PM, Jingsong Li 
> > > > wrote:
> > > > >
> > > > >> Congratulations!
> > > > >>
> > > > >> On Mon, Mar 18, 2024 at 4:30 PM Rui Fan <1996fan...@gmail.com> wrote:
> > > > >>>
> > > > >>> Congratulations, thanks for the great work!
> > > > >>>
> > > > >>> Best,
> > > > >>> Rui
> > > > >>>
> > > > >>> On Mon, Mar 18, 2024 at 4:26 PM Lincoln Lee 
> > > > >> wrote:
> > > > >>>>
> > > > >>>> The Apache Flink community is very happy to announce the release of
> > > > >> Apache Flink 1.19.0, which is the first release for the Apache Flink
> > > > 1.19
> > > > >> series.
> > > > >>>>
> > > > >>>> Apache Flink® is an open-source stream processing framework for
> > > > >> distributed, high-performing, always-available, and accurate data
> > > > streaming
> > > > >> applications.
> > > > >>>>
> > > > >>>> The release is available for download at:
> > > > >>>> https://flink.apache.org/downloads.html
> > > > >>>>
> > > > >>>> Please check out the release blog post for an overview of the
> > > > >> improvements in this release:
> > > > >>>>
> > > > >>
> > > >
> > > https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/
> > > > >>>>
> > > > >>>> The full release notes are available in Jira:
> > > > >>>>
> > > > >>
> > > >
> > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353282
> > > > >>>>
> > > > >>>> We would like to thank all contributors of the Apache Flink
> > > community
> > > > >> who made this release possible!
> > > > >>>>
> > > > >>>>
> > > > >>>> Best,
> > > > >>>> Yun, Jing, Martijn and Lincoln
> > > > >>
> > > >
> > > >
> > >
> >
> >
> > --
> > Best,
> > Hangxiang.



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-434: Support optimizations for pre-partitioned data sources

2024-03-14 Thread Benchao Li
> > @Override
> > public void apply(DynamicTableSource tableSource, SourceAbilityContext
> > context) {
> > if (tableSource instanceof SupportsPartitioning) {
> > ((SupportsPartitioning) tableSource).applyPartitionedRead();
> > } else {
> > throw new TableException(
> > String.format(
> > "%s does not support SupportsPartitioning.",
> > tableSource.getClass().getName()));
> > }
> > }
> >   // some code here
> > }
> >
> > 
> > SourceAbilitySpec class
> > 
> > @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include =
> > JsonTypeInfo.As.PROPERTY, property = "type")
> > @JsonSubTypes({
> > @JsonSubTypes.Type(value = FilterPushDownSpec.class),
> > @JsonSubTypes.Type(value = LimitPushDownSpec.class),
> > @JsonSubTypes.Type(value = PartitionPushDownSpec.class),
> > @JsonSubTypes.Type(value = ProjectPushDownSpec.class),
> > @JsonSubTypes.Type(value = ReadingMetadataSpec.class),
> > @JsonSubTypes.Type(value = WatermarkPushDownSpec.class),
> > @JsonSubTypes.Type(value = SourceWatermarkSpec.class),
> > @JsonSubTypes.Type(value = AggregatePushDownSpec.class),
> > +  @JsonSubTypes.Type(value = PartitioningSpec.class)   // newly added
> >
> >
> >
> > Please let me know if that answers your questions or if you have other
> > comments.
> >
> > Regards,
> > Jeyhun
> >
> >
> > On Tue, Mar 12, 2024 at 8:56 AM Jane Chan  wrote:
> >
> > > Hi Jeyhun,
> > >
> > > Thank you for leading the discussion. I'm generally +1 with this
> > proposal,
> > > along with some questions. Please see my comments below.
> > >
> > > 1. Concerning the `sourcePartitions()` method, the partition information
> > > returned during the optimization phase may not be the same as the
> > partition
> > > information during runtime execution. For long-running jobs, partitions
> > may
> > > be continuously created. Is this FLIP equipped to handle scenarios?
> > >
> > > 2. Regarding the `RemoveRedundantShuffleRule` optimization rule, I
> > > understand that it is also necessary to verify whether the hash key
> > within
> > > the Exchange node is consistent with the partition key defined in the
> > table
> > > source that implements `SupportsPartitioning`.
> > >
> > > 3. Could you elaborate on the desired physical plan and integration with
> > > `CompiledPlan` to enhance the overall functionality?
> > >
> > > Best,
> > > Jane
> > >
> > > On Tue, Mar 12, 2024 at 11:11 AM Jim Hughes  > >
> > > wrote:
> > >
> > > > Hi Jeyhun,
> > > >
> > > > I like the idea!  Given FLIP-376[1], I wonder if it'd make sense to
> > > > generalize FLIP-434 to be about "pre-divided" data to cover "buckets"
> > and
> > > > "partitions" (and maybe even situations where a data source is
> > > partitioned
> > > > and bucketed).
> > > >
> > > > Separate from that, the page mentions TPC-H Q1 as an example.  For a
> > > join,
> > > > any two tables joined on the same bucket key should provide a concrete
> > > > example of a join.  Systems like Kafka Streams/ksqlDB call this
> > > > "co-partitioning"; for those systems, it is a requirement placed on the
> > > > input sources.  For Flink, with FLIP-434, the proposed planner rule
> > > > could remove the shuffle.
> > > >
> > > > Definitely a fun idea; I look forward to hearing more!
> > > >
> > > > Cheers,
> > > >
> > > > Jim
> > > >
> > > >
> > > > 1.
> > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause
> > > > 2.
> > > >
> > > >
> > >
> > https://docs.ksqldb.io/en/latest/developer-guide/joins/partition-data/#co-partitioning-requirements
> > > >
> > > > On Sun, Mar 10, 2024 at 3:38 PM Jeyhun Karimov 
> > > > wrote:
> > > >
> > > > > Hi devs,
> > > > >
> > > > > I’d like to start a discussion on FLIP-434: Support optimizations for
> > > > > pre-partitioned data sources [1].
> > > > >
> > > > > The FLIP introduces taking advantage of pre-partitioned data sources
> > > for
> > > > > SQL/Table API (it is already supported as experimental feature in
> > > > > DataStream API [2]).
> > > > >
> > > > >
> > > > > Please find more details in the FLIP wiki document [1].
> > > > > Looking forward to your feedback.
> > > > >
> > > > > Regards,
> > > > > Jeyhun
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-434%3A+Support+optimizations+for+pre-partitioned+data+sources
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/experimental/
> > > > >
> > > >
> > >
> >



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-436: Introduce "SHOW CREATE CATALOG" Syntax

2024-03-14 Thread Benchao Li
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Wed, Mar 13, 2024 at 5:09 PM Yubin Li 
> > wrote:
> > > > > > >
> > > > > > > Hi devs,
> > > > > > >
> > > > > > > I'd like to start a discussion about FLIP-436: Introduce "SHOW
> > > CREATE
> > > > > > > CATALOG" Syntax [1].
> > > > > > >
> > > > > > > At present, the `SHOW CREATE TABLE` statement provides strong
> > > support
> > > > > for
> > > > > > > users to easily
> > > > > > > reuse created tables. However, despite the increasing importance
> > > of the
> > > > > > > `Catalog` in users'
> > > > > > > business, there is no similar statement for users to use.
> > > > > > >
> > > > > > > According to the online discussion in FLINK-24939 [2] with Jark
> > Wu
> > > and
> > > > > > Feng
> > > > > > > Jin, since `CatalogStore`
> > > > > > > has been introduced in FLIP-295 [3], we could use this component
> > to
> > > > > > > implement such a long-awaited
> > > > > > > feature, Please refer to the document [1] for implementation
> > > details.
> > > > > > >
> > > > > > > examples as follows:
> > > > > > >
> > > > > > > Flink SQL> create catalog cat2 WITH ('type'='generic_in_memory',
> > > > > > > > 'default-database'='db');
> > > > > > > > [INFO] Execute statement succeeded.
> > > > > > > > Flink SQL> show create catalog cat2;
> > > > > > > > +------------------------------+
> > > > > > > > | result                       |
> > > > > > > > +------------------------------+
> > > > > > > > | CREATE CATALOG `cat2` WITH (
> > > > > > > >   'default-database' = 'db',
> > > > > > > >   'type' = 'generic_in_memory'
> > > > > > > > )
> > > > > > > >  |
> > > > > > > > +------------------------------+
> > > > > > > > 1 row in set
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Looking forward to hearing from you, thanks!
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Yubin
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > > >
> > >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=296290756
> > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-24939
> > > > > > > [3]
> > > > > > >
> > > > > >
> > > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-295%3A+Support+lazy+initialization+of+catalogs+and+persistence+of+catalog+configurations
> > > > > >
> > > > >
> > >
> >
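The proposed statement essentially renders a catalog's stored options back into DDL. Below is a hypothetical sketch of that rendering step — not the actual Flink implementation, which would read the options persisted via the `CatalogStore` introduced in FLIP-295.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical renderer turning catalog options into a CREATE CATALOG statement,
// mirroring the example output shown in the thread.
class ShowCreateCatalog {
    static String render(String catalogName, Map<String, String> options) {
        // Sort options by key for deterministic output, as in the example.
        String withClause = new TreeMap<>(options).entrySet().stream()
                .map(e -> "  '" + e.getKey() + "' = '" + e.getValue() + "'")
                .collect(Collectors.joining(",\n"));
        return "CREATE CATALOG `" + catalogName + "` WITH (\n" + withClause + "\n)";
    }
}
```

Calling `render("cat2", Map.of("type", "generic_in_memory", "default-database", "db"))` reproduces the shape of the `show create catalog cat2` result above.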



-- 

Best,
Benchao Li


Re: [VOTE] Release 1.19.0, release candidate #2

2024-03-12 Thread Benchao Li
> > +--+
> > > > > > > > > > > 1 row in set
> > > > > > > > > > >
> > > > > > > > > > > Flink SQL> DESCRIBE DATABASE default_database;
> > > > > > > > > > > [ERROR] Could not execute SQL statement. Reason:
> > > > > > > > > > > org.apache.calcite.sql.validate.SqlValidatorException:
> > > Column
> > > > > > > > > > > 'default_database' not found in
> > > > > > > > > > > any table
> > > > > > > > > > > ```
> > > > > > > > > > >
> > > > > > > > > > > Is this an error in the release notes, or my mistake in
> > > > > > > interpreting
> > > > > > > > > > them?
> > > > > > > > > > >
> > > > > > > > > > > thanks, Robin.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353282
> > > > > > > > > > >
> > > > > > > > > > > On Thu, 7 Mar 2024 at 10:01, Lincoln Lee <
> > > > > lincoln.8...@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > Please review and vote on the release candidate #2 for
> > > the
> > > > > > > version
> > > > > > > > > > > 1.19.0,
> > > > > > > > > > > > as follows:
> > > > > > > > > > > > [ ] +1, Approve the release
> > > > > > > > > > > > [ ] -1, Do not approve the release (please provide
> > > specific
> > > > > > > > comments)
> > > > > > > > > > > >
> > > > > > > > > > > > The complete staging area is available for your review,
> > > which
> > > > > > > > > includes:
> > > > > > > > > > > >
> > > > > > > > > > > > * JIRA release notes [1], and the pull request adding
> > > release
> > > > > > > note
> > > > > > > > > for
> > > > > > > > > > > > users [2]
> > > > > > > > > > > > * the official Apache source release and binary
> > > convenience
> > > > > > > > releases
> > > > > > > > > to
> > > > > > > > > > > be
> > > > > > > > > > > > deployed to dist.apache.org [3], which are signed with
> > > the
> > > > > key
> > > > > > > > with
> > > > > > > > > > > > fingerprint E57D30ABEE75CA06  [4],
> > > > > > > > > > > > * all artifacts to be deployed to the Maven Central
> > > > > Repository
> > > > > > > [5],
> > > > > > > > > > > > * source code tag "release-1.19.0-rc2" [6],
> > > > > > > > > > > > * website pull request listing the new release and
> > adding
> > > > > > > > > announcement
> > > > > > > > > > > blog
> > > > > > > > > > > > post [7].
> > > > > > > > > > > >
> > > > > > > > > > > > The vote will be open for at least 72 hours. It is
> > > adopted by
> > > > > > > > > majority
> > > > > > > > > > > > approval, with at least 3 PMC affirmative votes.
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353282
> > > > > > > > > > > > [2] https://github.com/apache/flink/pull/24394
> > > > > > > > > > > > [3]
> > > > > > > https://dist.apache.org/repos/dist/dev/flink/flink-1.19.0-rc2/
> > > > > > > > > > > > [4]
> > > https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > > > > > > > > > [5]
> > > > > > > > > > >
> > > > > > > >
> > > > >
> > https://repository.apache.org/content/repositories/orgapacheflink-1709
> > > > > > > > > > > > [6]
> > > > > > > >
> > https://github.com/apache/flink/releases/tag/release-1.19.0-rc2
> > > > > > > > > > > > [7] https://github.com/apache/flink-web/pull/721
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Yun, Jing, Martijn and Lincoln
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >



-- 

Best,
Benchao Li


Re: [ANNOUNCE] New Apache Flink Committer - Jiabao Sun

2024-02-19 Thread Benchao Li
Congrats, Jiabao!

Zhanghao Chen wrote on Mon, Feb 19, 2024 at 18:42:
>
> Congrats, Jiabao!
>
> Best,
> Zhanghao Chen
> 
> From: Qingsheng Ren 
> Sent: Monday, February 19, 2024 17:53
> To: dev ; jiabao...@apache.org 
> Subject: [ANNOUNCE] New Apache Flink Committer - Jiabao Sun
>
> Hi everyone,
>
> On behalf of the PMC, I'm happy to announce Jiabao Sun as a new Flink
> Committer.
>
> Jiabao began contributing in August 2022 and has contributed 60+ commits
> for Flink main repo and various connectors. His most notable contribution
> is being the core author and maintainer of MongoDB connector, which is
> fully functional in DataStream and Table/SQL APIs. Jiabao is also the
> author of FLIP-377 and the main contributor of JUnit 5 migration in runtime
> and table planner modules.
>
> Beyond his technical contributions, Jiabao is an active member of our
> community, participating in the mailing list and consistently volunteering
> for release verifications and code reviews with enthusiasm.
>
> Please join me in congratulating Jiabao for becoming an Apache Flink
> committer!
>
> Best,
> Qingsheng (on behalf of the Flink PMC)



-- 

Best,
Benchao Li


Re: [DISCUSS] Alternative way of posting FLIPs

2024-02-11 Thread Benchao Li
 > Lincoln Lee
> > > > > >
> > > > > >
> > > > > > Martijn Visser wrote on Wed, Feb 7, 2024 at 21:51:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > ASF Infra has confirmed to me that only ASF committers can access
> > > the
> > > > > > > ASF Confluence site since a recent change. One of the results of
> > > this
> > > > > > > decision is that users can't signup and access Confluence, so
> > only
> > > > > > > committers+ can create FLIPs.
> > > > > > >
> > > > > > > ASF Infra hopes to improve this situation when they move to the
> > > Cloud
> > > > > > > shortly (as in: some months), but they haven't committed on an
> > > actual
> > > > > > > date. The idea would be that we find a temporary solution until
> > > > anyone
> > > > > > > can request access to Confluence.
> > > > > > >
> > > > > > > There are a couple of ways we could resolve this situation:
> > > > > > > 1. Contributors create a Google Doc and make that view-only, and
> > > post
> > > > > > > that Google Doc to the mailing list for a discussion thread. When
> > > the
> > > > > > > discussions have been resolved, the contributor ask on the Dev
> > > > mailing
> > > > > > > list to a committer/PMC to copy the contents from the Google Doc,
> > > and
> > > > > > > create a FLIP number for them. The contributor can then use that
> > > FLIP
> > > > > > > to actually have a VOTE thread.
> > > > > > > 2. We could consider moving FLIPs to "Discussions" on Github,
> > like
> > > > > > > Airflow does at https://github.com/apache/airflow/discussions
> > > > > > > 3. Perhaps someone else has another good idea.
> > > > > > >
> > > > > > > Looking forward to your thoughts.
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Martijn
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best regards,
> > Sergey
> >



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP 411: Chaining-agnostic Operator ID generation for improved state compatibility on parallelism change

2024-02-11 Thread Benchao Li
> >>>> Thanks for driving this, that’s really painful for us when we need to
> >>> switch config `pipeline.operator-chaining`.
> >>>> But I have a concern: according to the FLIP description, modifying
> >>> `isChainable`-related code in `StreamGraphHasherV2` will cause the
> >>> generated operator IDs to change, which will result in users being unable
> >>> to recover from the old state (old and new Operator IDs can't be mapped).
> >>>> Therefore switching the hasher strategy (V2->V3 or V3->V2) will lead to
> >>> an incompatibility; is there any relevant compatibility design considered?
> >>>> Best,
> >>>> Yu Chen
> >>>>
> >>>> On Jan 10, 2024, 10:25, Zhanghao Chen <zhanghao.c...@outlook.com> wrote:
> >>>> Hi David,
> >>>>
> >>>> Thanks for the comments. AFAIK, unaligned checkpoints are disabled for
> >>> pointwise connections according to [1]; let's wait for Piotr's confirmation.
> >>> The issue itself is not directly related to this proposal as well. If a
> >>> user manually specifies UIDs for each of the chained operators and has
> >>> unaligned checkpoints enabled, we will encounter the same issue if they
> >>> decide to break the chain on a later restart and try to recover from a
> >>> retained cp.
> >>>> [1]
> >>> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/
> >>>>
> >>>> Best,
> >>>> Zhanghao Chen
> >>>> 
> >>>> From: David Morávek <d...@apache.org>
> >>>> Sent: Wednesday, January 10, 2024 6:26
> >>>> To: dev@flink.apache.org; Piotr Nowojski <piotr.nowoj...@gmail.com>
> >>>> Subject: Re: [DISCUSS] FLIP 411: Chaining-agnostic Operator ID
> >>> generation for improved state compatibility on parallelism change
> >>>> Hi Zhanghao,
> >>>>
> >>>> Thanks for the FLIP. What you're proposing makes a lot of sense +1
> >>>>
> >>>> Have you thought about how this works with unaligned checkpoints in case
> >>>> you go from unchained to chained? I think it should be fine because this
> >>>> scenario should only apply to forward/rebalance scenarios where we, as
> >>> far
> >>>> as I recall, force alignment anyway, so there should be no exchanges to
> >>>> snapshot. It might just work, but something to double-check. Maybe @Piotr
> >>>> Nowojski <piotr.nowoj...@gmail.com>
> >>> could confirm it.
> >>>> Best,
> >>>> D.
> >>>>
> >>>> On Wed, Jan 3, 2024 at 7:10 AM Zhanghao Chen <zhanghao.c...@outlook.com>
> >>>> wrote:
> >>>>
> >>>> Dear Flink devs,
> >>>>
> >>>> I'd like to start a discussion on FLIP 411: Chaining-agnostic Operator ID
> >>>> generation for improved state compatibility on parallelism change [1].
> >>>>
> >>>> Currently, when user does not explicitly set operator UIDs, the chaining
> >>>> behavior will still affect state compatibility, as the generation of the
> >>>> Operator ID is dependent on its chained output nodes. For example, a
> >>> simple
> >>>> source->sink DAG with source and sink chained together is state
> >>>> incompatible with an otherwise identical DAG with source and sink
> >>> unchained
> >>>> (either because the parallelisms of the two ops are changed to be unequal
> >>>> or chaining is disabled). This greatly limits the flexibility to perform
> >>>> chain-breaking/building for performance tuning.
> >>>>
> >>>> The dependency on chained output nodes for Operator ID generation can be
> >>>> traced back to Flink 1.2. It is unclear at this point on why chained
> >>> output
> >>>> nodes are involved in the algorithm, but the following history background
> >>>> might be related: prior to Flink 1.3, Flink runtime takes the snapshots
> >>> by
> >>>> the operator ID of the first vertex in a chain, so it somewhat makes
> >>> sense
> >>>> to include chained output nodes into the algorithm as
> >>>> chain-breaking/building is expected to break state-compatibility anyway.
> >>>>
> >>>> Given that operator-level state recovery within a chain has long been
> >>>> supported since Flink 1.3, I propose to introduce StreamGraphHasherV3
> >>> that
> >>>> is agnostic of the chaining behavior of operators, so that users are free
> >>>> to tune the parallelism of individual operators without worrying about
> >>>> state incompatibility. We can make the V3 hasher an optional choice in
> >>>> Flink 1.19, and make it the default hasher in 2.0 for backwards
> >>>> compatibility.
> >>>>
> >>>> Looking forward to your suggestions on it, thanks~
> >>>>
> >>>> [1]
> >>>>
> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-411%3A+Chaining-agnostic+Operator+ID+generation+for+improved+state+compatibility+on+parallelism+change
> >>>> Best,
> >>>> Zhanghao Chen
> >>>>
> >>>
>


-- 

Best,
Benchao Li
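[Editorial note] The state-compatibility issue debated in the thread above can be illustrated with a toy model. This is a hypothetical Python sketch, not Flink's actual StreamGraphHasherV2/V3 code: a "V2-style" hash mixes the chained output nodes into the hash input, so breaking or building a chain changes the operator ID, while a "V3-style" hash depends only on the node itself.

```python
# Hypothetical illustration of chaining-dependent vs. chaining-agnostic
# operator ID generation (not Flink's real hasher implementation).
import hashlib

def hash_v2_style(node, chained_outputs):
    # V2-style: the hash input includes chained output nodes, so
    # chain-breaking/building changes the generated operator ID.
    payload = node + "|" + ",".join(sorted(chained_outputs))
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

def hash_v3_style(node, chained_outputs):
    # V3-style (as proposed): chained outputs are ignored, so the ID
    # stays stable when the chaining behavior changes.
    return hashlib.sha256(node.encode()).hexdigest()[:16]

# source->sink chained vs. unchained
chained = hash_v2_style("source", ["sink"])
unchained = hash_v2_style("source", [])
assert chained != unchained  # V2: state-incompatible after a chain break

# V3: same ID regardless of chaining
assert hash_v3_style("source", ["sink"]) == hash_v3_style("source", [])
```

As Yu Chen's reply points out, switching hasher strategies still changes every ID at once, which is the migration concern raised in the thread.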


[jira] [Created] (FLINK-34403) GHA misc test cannot pass due to OOM

2024-02-06 Thread Benchao Li (Jira)
Benchao Li created FLINK-34403:
--

 Summary: GHA misc test cannot pass due to OOM
 Key: FLINK-34403
 URL: https://issues.apache.org/jira/browse/FLINK-34403
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20
Reporter: Benchao Li


After FLINK-33611 was merged, the misc test on GHA cannot pass due to an
out-of-memory error, throwing the following exceptions:

{code:java}
Error: 05:43:21 05:43:21.768 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 40.98 s <<< FAILURE! -- in 
org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest
Error: 05:43:21 05:43:21.773 [ERROR] 
org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple -- Time 
elapsed: 40.97 s <<< ERROR!
Feb 07 05:43:21 org.apache.flink.util.FlinkRuntimeException: Error in 
serialization.
Feb 07 05:43:21 at 
org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:327)
Feb 07 05:43:21 at 
org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:162)
Feb 07 05:43:21 at 
org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1007)
Feb 07 05:43:21 at 
org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56)
Feb 07 05:43:21 at 
org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45)
Feb 07 05:43:21 at 
org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61)
Feb 07 05:43:21 at 
org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104)
Feb 07 05:43:21 at 
org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
Feb 07 05:43:21 at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2440)
Feb 07 05:43:21 at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2421)
Feb 07 05:43:21 at 
org.apache.flink.streaming.api.datastream.DataStream.executeAndCollectWithClient(DataStream.java:1495)
Feb 07 05:43:21 at 
org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1382)
Feb 07 05:43:21 at 
org.apache.flink.streaming.api.datastream.DataStream.executeAndCollect(DataStream.java:1367)
Feb 07 05:43:21 at 
org.apache.flink.formats.protobuf.ProtobufTestHelper.validateRow(ProtobufTestHelper.java:66)
Feb 07 05:43:21 at 
org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:89)
Feb 07 05:43:21 at 
org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:76)
Feb 07 05:43:21 at 
org.apache.flink.formats.protobuf.ProtobufTestHelper.rowToPbBytes(ProtobufTestHelper.java:71)
Feb 07 05:43:21 at 
org.apache.flink.formats.protobuf.VeryBigPbRowToProtoTest.testSimple(VeryBigPbRowToProtoTest.java:37)
Feb 07 05:43:21 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 07 05:43:21 Caused by: java.util.concurrent.ExecutionException: 
java.lang.IllegalArgumentException: Self-suppression not permitted
Feb 07 05:43:21 at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
Feb 07 05:43:21 at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
Feb 07 05:43:21 at 
org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:323)
Feb 07 05:43:21 ... 18 more
Feb 07 05:43:21 Caused by: java.lang.IllegalArgumentException: Self-suppression 
not permitted
Feb 07 05:43:21 at 
java.lang.Throwable.addSuppressed(Throwable.java:1072)
Feb 07 05:43:21 at 
org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:556)
Feb 07 05:43:21 at 
org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:486)
Feb 07 05:43:21 at 
org.apache.flink.streaming.api.graph.StreamConfig.lambda$triggerSerializationAndReturnFuture$0(StreamConfig.java:182)
Feb 07 05:43:21 at 
java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
Feb 07 05:43:21 at 
java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646)
Feb 07 05:43:21 at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
Feb 07 05:43:21 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Feb 07 05:43:21 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:62

[NOTICE] master branch cannot compile for now

2024-01-25 Thread Benchao Li
Hi devs,

I merged FLINK-33263[1] this morning (10:16 +8:00), and it was based on
an old commit which uses an older guava version, so currently the master
branch cannot compile.

Zhanghao has discovered this in FLINK-33264[2], and the hotfix commit
has been proposed in the same PR, hopefully we can merge it after CI
passes (it may take a few hours).

Sorry for the inconvenience.

[1] https://github.com/apache/flink/pull/24128
[2] https://github.com/apache/flink/pull/24133

-- 

Best,
Benchao Li


Re: [VOTE] FLIP-415: Introduce a new join operator to support minibatch

2024-01-18 Thread Benchao Li
+1 (binding)

shuai xu wrote on Fri, Jan 19, 2024 at 12:58:

> Dear Flink Developers,
>
> Thank you for providing feedback on FLIP-415: Introduce a new join
> operator to support minibatch[1]. I'd like to start a vote on this FLIP.
> Here is the discussion thread[2].
>
> After the discussion, this FLIP will not introduce any new Option. The
> minibatch join will default to compacting the changelog. As for the option
> to control compaction within the minibatch that was mentioned in the
> discussion, it could be discussed in a future FLIP.
>
> The vote will be open for at least 72 hours unless there is an objection or
> insufficient votes.
>
> Best,
> Xu Shuai
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch
> [2]
> https://lists.apache.org/thread/pd9gzslq20dtzzfphxqvwhc43hrzo2y1
>
>

-- 

Best,
Benchao Li


Re: [DISCUSS] Support detecting table schema from external files.

2024-01-16 Thread Benchao Li
Thanks Yisha for bringing up this discussion. Schema inference is a
very interesting and useful feature, especially when it comes to
formats with well defined schemas such as Protobuf/Parquet. I'm
looking forward to the FLIP.

Yisha Zhou wrote on Mon, Jan 15, 2024 at 16:29:
>
> Hi dev,
>
> Currently, we are used to creating a table by listing all physical columns
> or using the LIKE syntax to reuse the table schema in Catalogs.
> However, in our company there are many cases that the messages in the 
> external systems are with very complex schema. The worst
> case is that some protobuf data has even thousands of fields in it.
>
> In these cases, listing fields in the DDL will be very hard work. Creating
> and updating such complex schemas in Catalogs will also be costly.
> Therefore, I’d like to introduce an ability for detecting table schema from 
> external files in DDL.
>
> A good precedent from Snowflake[1] works like below:
>
> CREATE TABLE mytable
>   USING TEMPLATE (
> SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
>   FROM TABLE(
> INFER_SCHEMA(
>   LOCATION=>'@mystage/json/',
>   FILE_FORMAT=>'my_json_format'
> )
>   ));
>
> INFER_SCHEMA is a table function that 'automatically detects the file
> metadata schema in a set of staged data files that contain
> semi-structured data and retrieves the column definitions.’ The files can be 
> in Parquet, Avro, ORC, JSON, and CSV.
>
> We don’t need to follow the syntax, but the functionality is exactly what I 
> want. In addition, the file can be more than just a semi-structured data
> file. It can be a metadata file, for example a .proto file or a .thrift file.
>
> As it will be a big feature, it deserves a FLIP to describe it in detail.
> I'm looking forward to your feedback and suggestions before I start on it.
>
> Best,
> Yisha
>
> [1]https://docs.snowflake.com/en/sql-reference/functions/infer_schema 
> <https://docs.snowflake.com/en/sql-reference/functions/infer_schema>



-- 

Best,
Benchao Li
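[Editorial note] To make the kind of schema detection discussed above concrete, here is a minimal, hypothetical Python sketch that infers column types from a few semi-structured records. A real implementation (such as Snowflake's INFER_SCHEMA, or a Flink equivalent) would also handle files, nested fields, and nullability; this only shows the core idea of unifying per-record types.

```python
# Minimal, illustrative schema inference over semi-structured records.
# Not any real engine's algorithm; type conflicts fall back to "string".
def infer_schema(records):
    schema = {}
    for rec in records:
        for key, value in rec.items():
            t = type(value).__name__
            prev = schema.get(key)
            if prev is None:
                schema[key] = t
            elif prev != t:
                schema[key] = "string"  # widen on type conflict
    return schema

rows = [
    {"id": 1, "name": "a", "price": 9.5},
    {"id": 2, "name": "b", "price": 10},  # int vs. float -> conflict
]
print(infer_schema(rows))
```

Running this prints a column-to-type map in which `price` was widened to "string" because the sampled records disagreed on its type.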


Re: [DISCUSS] FLIP-415: Introduce a new join operator to support minibatch

2024-01-16 Thread Benchao Li
shuai,

Thanks for the explanations, I understand the scenario you described
now. IIUC, this will be a rather rare case that needs to disable
"compaction" when mini-batch is enabled, so I won't be against
introducing it. However, I would suggest enabling the "compaction" by
default (if mini-batch is enabled), which will benefit most use cases.
For others that have special requirements about the changelog semantic
(no compaction), they can disable compaction by themselves. WDYT?

> This is a relatively large optimization that may pose a significant
> risk of bugs, so I like to keep it from being enabled by default for
> now.
@Jingsong has raised an interesting point that for large optimization
or new features, we want to have an option for it and disable it by
default in case of the risk of bugs. I agree with it, mostly.
Currently there is no standard about whether a change is major or not,
which means we may run into a situation debating whether a change is
major or not. Anyway, it's an orthogonal topic to this discussion.

shuai xu wrote on Tue, Jan 16, 2024 at 13:14:
>
> Hi Benchao,
>
> Do you have any other questions about this issue?  Also, I would appreciate 
> your thoughts on the proposal to introduce the new option 
> 'table.exec.mini-batch.compact-changes-enabled'. I’m looking forward your 
> feedback.
>
> > On Jan 12, 2024, 15:01, shuai xu wrote:
> >
> > Suppose we currently have a job that joins two CDC sources after 
> > de-duplicating them and the output is available for audit analysis, and the 
> > user turns off the parameter 
> > "table.exec.deduplicate.mini-batch.compact-changes-enabled" to ensure that 
> > it does not lose update details. If we don't introduce this parameter, 
> > after the user upgrades the version, some update details may be lost due to 
> > the mini-batch connection being enabled by default, resulting in distorted 
> > audit results.
> >
> >> On Jan 11, 2024, 16:19, Benchao Li wrote:
> >>
> >>> the change might not be supposed for the downstream of the job which 
> >>> requires details of changelog
> >>
> >> Could you elaborate on this a bit? I've never met such kinds of
> >> requirements before, I'm curious what is the scenario that requires
> >> this.
> >>
> >> shuai xu wrote on Thu, Jan 11, 2024 at 13:08:
> >>>
> >>> Thanks for your response, Benchao.
> >>>
> >>> Here is my thought on the newly added option.
> >>> Users' current jobs are running on a version without minibatch join. If 
> >>> the existing option to enable minibatch join is utilized, then when 
> >>> users' jobs are migrated to the new version, the internal behavior of the 
> >>> join operation within the jobs will change. Although the semantic of 
> >>> changelog emitted by the Join operator is eventual consistency, the 
> >>> change might not be supposed for the downstream of the job which requires 
> >>> details of changelog. This newly added option also refers to 
> >>> 'table.exec.deduplicate.mini-batch.compact-changes-enabled'.
> >>>
> >>> As for the implementation, the new operator shares the state of the
> >>> original operator and merely has an additional minibatch for storing
> >>> records to do some optimization. The storage remains consistent, and
> >>> there is only a minor modification to the computational logic.
> >>>
> >>> Best,
> >>> Xu Shuai
> >>>
> >>>> On Jan 10, 2024, 22:56, Benchao Li wrote:
> >>>>
> >>>> Thanks shuai for driving this, mini-batch Join is a very useful
> >>>> optimization, +1 for the general idea.
> >>>>
> >>>> Regarding the configuration
> >>>> "table.exec.stream.join.mini-batch-enabled", I'm not sure it's really
> >>>> necessary. The semantic of changelog emitted by the Join operator is
> >>>> eventual consistency, so there is not much difference between original
> >>>> Join and mini-batch Join from this aspect. Besides, introducing more
> >>>> options would make it more complex for users, harder to understand and
> >>>> maintain, which we should be careful about.
> >>>>
> >>>> One thing about the implementation, could you make the new operator
> >>>> share the same state definition with the original one?
> >>>>
> >>>> shuai xu wrote on Wed, Jan 10, 2024 at 21:23:
> >>>>>
> >>>>> Hi devs,
> >>>>>
> >>>>> I’d like to start a d

Re: [DISCUSS] FLIP-415: Introduce a new join operator to support minibatch

2024-01-11 Thread Benchao Li
> the change might not be supposed for the downstream of the job which requires 
> details of changelog

Could you elaborate on this a bit? I've never met such kinds of
requirements before, I'm curious what is the scenario that requires
this.

shuai xu wrote on Thu, Jan 11, 2024 at 13:08:
>
> Thanks for your response, Benchao.
>
> Here is my thought on the newly added option.
> Users' current jobs are running on a version without minibatch join. If the 
> existing option to enable minibatch join is utilized, then when users' jobs 
> are migrated to the new version, the internal behavior of the join operation 
> within the jobs will change. Although the semantic of changelog emitted by 
> the Join operator is eventual consistency, the change might not be supposed 
> for the downstream of the job which requires details of changelog. This newly 
> added option also refers to 
> 'table.exec.deduplicate.mini-batch.compact-changes-enabled'.
>
> As for the implementation, the new operator shares the state of the original
> operator and merely has an additional minibatch for storing records to do
> some optimization. The storage remains consistent, and there is only a minor
> modification to the computational logic.
>
> Best,
> Xu Shuai
>
> > On Jan 10, 2024, 22:56, Benchao Li wrote:
> >
> > Thanks shuai for driving this, mini-batch Join is a very useful
> > optimization, +1 for the general idea.
> >
> > Regarding the configuration
> > "table.exec.stream.join.mini-batch-enabled", I'm not sure it's really
> > necessary. The semantic of changelog emitted by the Join operator is
> > eventual consistency, so there is not much difference between original
> > Join and mini-batch Join from this aspect. Besides, introducing more
> > options would make it more complex for users, harder to understand and
> > maintain, which we should be careful about.
> >
> > One thing about the implementation, could you make the new operator
> > share the same state definition with the original one?
> >
> > shuai xu wrote on Wed, Jan 10, 2024 at 21:23:
> >>
> >> Hi devs,
> >>
> >> I’d like to start a discussion on FLIP-415: Introduce a new join operator 
> >> to support minibatch[1].
> >>
> >> Currently, when performing cascading joins in Flink, there is a pain
> >> point of record amplification. Every record the join operator receives
> >> triggers the join process. However, if a +I and a -D record match, they
> >> could be folded to save two join passes. Besides, a -U/+U pair might
> >> output 4 records, two of which are redundant, in the case of outer join.
> >>
> >> To address this issue, this FLIP introduces a new
> >> MiniBatchStreamingJoinOperator that achieves batch processing, which
> >> could reduce the number of redundant output messages and avoid
> >> unnecessary join processing. A new option is added to control the
> >> operator to avoid influencing existing jobs.
> >>
> >> Please find more details in the FLIP wiki document [1]. Looking
> >> forward to your feedback.
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch
> >>
> >> Best,
> >> Xu Shuai
> >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
>


-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-415: Introduce a new join operator to support minibatch

2024-01-10 Thread Benchao Li
Thanks shuai for driving this, mini-batch Join is a very useful
optimization, +1 for the general idea.

Regarding the configuration
"table.exec.stream.join.mini-batch-enabled", I'm not sure it's really
necessary. The semantic of changelog emitted by the Join operator is
eventual consistency, so there is not much difference between original
Join and mini-batch Join from this aspect. Besides, introducing more
options would make it more complex for users, harder to understand and
maintain, which we should be careful about.

One thing about the implementation, could you make the new operator
share the same state definition with the original one?

shuai xu wrote on Wed, Jan 10, 2024 at 21:23:
>
> Hi devs,
>
> I’d like to start a discussion on FLIP-415: Introduce a new join operator to 
> support minibatch[1].
>
> Currently, when performing cascading joins in Flink, there is a pain
> point of record amplification. Every record the join operator receives
> triggers the join process. However, if a +I and a -D record match, they
> could be folded to save two join passes. Besides, a -U/+U pair might
> output 4 records, two of which are redundant, in the case of outer join.
>
> To address this issue, this FLIP introduces a new
> MiniBatchStreamingJoinOperator that achieves batch processing, which could
> reduce the number of redundant output messages and avoid unnecessary join
> processing. A new option is added to control the operator to avoid
> influencing existing jobs.
>
> Please find more details in the FLIP wiki document [1]. Looking
> forward to your feedback.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch
>
> Best,
> Xu Shuai



-- 

Best,
Benchao Li
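[Editorial note] The +I/-D folding discussed in this thread can be sketched conceptually. This is a hypothetical Python illustration of buffering a mini-batch and cancelling matching insert/delete records before they reach the join; it is not the actual MiniBatchStreamingJoinOperator logic.

```python
# Conceptual mini-batch folding: a pending +I cancelled by a -D on the
# same key within one batch never triggers the join process at all.
def fold_minibatch(records):
    # records: list of (kind, key, value) changelog entries.
    buffer = {}
    out = []
    for kind, key, value in records:
        if kind == "+I":
            buffer[key] = value
        elif kind == "-D" and key in buffer:
            del buffer[key]  # fold: insert and delete cancel out
        else:
            out.append((kind, key, value))  # pass through unmatched records
    out.extend(("+I", k, v) for k, v in buffer.items())
    return out

batch = [("+I", "k1", 1), ("+I", "k2", 2), ("-D", "k1", 1)]
print(fold_minibatch(batch))  # only k2's insert survives the batch
```

In this sketch the folded batch emits a single +I for k2, saving two join passes for k1, which is the amplification reduction the FLIP describes.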


Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-08 Thread Benchao Li
+1 (non-binding)

Feng Wang wrote on Tue, Jan 9, 2024 at 15:29:
>
> +1 non-binding
> Regards,
> Feng
>
> On Tue, Jan 9, 2024 at 3:05 PM Leonard Xu  wrote:
>
> > Hello all,
> >
> > This is the official vote whether to accept the Flink CDC code contribution
> >  to Apache Flink.
> >
> > The current Flink CDC code, documentation, and website can be
> > found here:
> > code: https://github.com/ververica/flink-cdc-connectors <
> > https://github.com/ververica/flink-cdc-connectors>
> > docs: https://ververica.github.io/flink-cdc-connectors/ <
> > https://ververica.github.io/flink-cdc-connectors/>
> >
> > This vote should capture whether the Apache Flink community is interested
> > in accepting, maintaining, and evolving Flink CDC.
> >
> > Regarding my original proposal[1] in the dev mailing list, I firmly believe
> > that this initiative aligns perfectly with Flink. For the Flink community,
> > it represents an opportunity to bolster Flink's competitive edge in
> > streaming
> > data integration, fostering the robust growth and prosperity of the Apache
> > Flink
> > ecosystem. For the Flink CDC project, becoming a sub-project of Apache
> > Flink
> > means becoming an integral part of a neutral open-source community,
> > capable of
> > attracting a more diverse pool of contributors.
> >
> > All Flink CDC maintainers are dedicated to continuously contributing to
> > achieve
> > seamless integration with Flink. Additionally, PMC members like Jark,
> > Qingsheng,
> > and I are willing to facilitate the expansion of contributors and
> > committers to
> > effectively maintain this new sub-project.
> >
> > This is an "Adoption of a new Codebase" vote as per the Flink bylaws [2].
> > Only PMC votes are binding. The vote will be open at least 7 days
> > (excluding weekends), meaning until Thursday January 18 12:00 UTC, or
> > until we
> > achieve the 2/3rd majority. We will follow the instructions in the Flink
> > Bylaws
> > in the case of insufficient active binding voters:
> >
> > > 1. Wait until the minimum length of the voting passes.
> > > 2. Publicly reach out via personal email to the remaining binding voters
> > in the
> > voting mail thread for at least 2 attempts with at least 7 days between
> > two attempts.
> > > 3. If the binding voter being contacted still failed to respond after
> > all the attempts,
> > the binding voter will be considered as inactive for the purpose of this
> > particular voting.
> >
> > Welcome voting !
> >
> > Best,
> > Leonard
> > [1] https://lists.apache.org/thread/o7klnbsotmmql999bnwmdgo56b6kxx9l
> > [2]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026



-- 

Best,
Benchao Li


Re: [VOTE] FLIP-387: Support named parameters for functions and call procedures

2024-01-04 Thread Benchao Li
+1 (binding)

Feng Jin wrote on Fri, Jan 5, 2024 at 10:49:
>
> Hi everyone
>
> Thanks for all the feedback about the FLIP-387: Support named parameters
> for functions and call procedures [1] [2] .
>
> I'd like to start a vote for it. The vote will be open for at least 72
> hours (excluding weekends, until Jan 10, 12:00 AM GMT) unless there is an
> objection or an insufficient number of votes.
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-387%3A+Support+named+parameters+for+functions+and+call+procedures
> [2] https://lists.apache.org/thread/bto7mpjvcx7d7k86owb00dwrm65jx8cn
>
>
> Best,
> Feng Jin



-- 

Best,
Benchao Li
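[Editorial note] At its core, the named-parameter support voted on above amounts to mapping named arguments onto a function's declared parameter positions, with defaults filling the gaps. A hypothetical Python sketch of that resolution step (not the FLIP-387 planner code):

```python
# Illustrative resolution of named arguments to a positional call, e.g.
# CALL my_proc(timeout => 30, path => '/tmp') in any argument order.
def resolve_named_args(param_names, named_args, defaults=None):
    defaults = defaults or {}
    positional = []
    for name in param_names:
        if name in named_args:
            positional.append(named_args[name])
        elif name in defaults:
            positional.append(defaults[name])
        else:
            raise ValueError(f"missing argument: {name}")
    return positional

args = resolve_named_args(
    ["path", "timeout", "retries"],        # declared parameter order
    {"timeout": 30, "path": "/tmp"},       # caller's named arguments
    defaults={"retries": 3},
)
assert args == ["/tmp", 30, 3]
```

The caller can thus supply arguments in any order and omit parameters that have defaults, which is the usability gain the FLIP targets.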


Re: [VOTE] FLIP-397: Add config options for administrator JVM options

2024-01-03 Thread Benchao Li
+1 (binding)

Zhanghao Chen wrote on Thu, Jan 4, 2024 at 10:30:
>
> Hi everyone,
>
> Thanks for all the feedback on FLIP-397 [1], which proposes to add a set of
> default JVM options for administrator use that prepends the user-set extra 
> JVM options for easier platform-wide JVM pre-tuning. It has been discussed in 
> [2].
>
> I'd like to start a vote. The vote will be open for at least 72 hours (until 
> January 8th 12:00 GMT) unless there is an objection or insufficient votes.
>
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-397%3A+Add+config+options+for+administrator+JVM+options
> [2] https://lists.apache.org/thread/cflonyrfd1ftmyrpztzj3ywckbq41jzg
>
> Best,
> Zhanghao Chen



-- 

Best,
Benchao Li


Re: [ANNOUNCE] New Apache Flink Committer - Alexander Fedulov

2024-01-02 Thread Benchao Li
Congratulations, Alex!

Yuepeng Pan wrote on Wed, Jan 3, 2024 at 10:10:
>
> Congrats, Alex!
>
> Best,
> Yuepeng Pan
> At 2024-01-02 20:15:08, "Maximilian Michels"  wrote:
> >Happy New Year everyone,
> >
> >I'd like to start the year off by announcing Alexander Fedulov as a
> >new Flink committer.
> >
> >Alex has been active in the Flink community since 2019. He has
> >contributed more than 100 commits to Flink, its Kubernetes operator,
> >and various connectors [1][2].
> >
> >Especially noteworthy are his contributions on deprecating and
> >migrating the old Source API functions and test harnesses, the
> >enhancement to flame graphs, the dynamic rescale time computation in
> >Flink Autoscaling, as well as all the small enhancements Alex has
> >contributed which make a huge difference.
> >
> >Beyond code contributions, Alex has been an active community member
> >with his activity on the mailing lists [3][4], as well as various
> >talks and blog posts about Apache Flink [5][6].
> >
> >Congratulations Alex! The Flink community is proud to have you.
> >
> >Best,
> >The Flink PMC
> >
> >[1] https://github.com/search?q=author%3Aafedulov+org%3Aapache&type=commits
> >[2] 
> >https://issues.apache.org/jira/browse/FLINK-28229?jql=status%20in%20(Resolved%2C%20Closed)%20AND%20assignee%20in%20(afedulov)%20ORDER%20BY%20resolved%20DESC%2C%20created%20DESC
> >[3] https://lists.apache.org/list?dev@flink.apache.org:lte=100M:Fedulov
> >[4] https://lists.apache.org/list?u...@flink.apache.org:lte=100M:Fedulov
> >[5] 
> >https://flink.apache.org/2020/01/15/advanced-flink-application-patterns-vol.1-case-study-of-a-fraud-detection-system/
> >[6] 
> >https://www.ververica.com/blog/presenting-our-streaming-concepts-introduction-to-flink-video-series



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-397: Add config options for administrator JVM options

2023-12-22 Thread Benchao Li
+1 from my side,

I have also met scenarios where I wanted to set some JVM options by
default for all Flink jobs, such as
'-XX:-DontCompileHugeMethods'; without it, some big generated methods
won't be optimized by the JVM C2 compiler, leading to poor performance.
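To make this concrete, a sketch of what the proposal would enable in flink-conf.yaml — note that the `env.java.default-opts.*` key names below are taken from the FLIP draft and should be treated as tentative, not settled API:

```yaml
# Illustrative sketch only; the env.java.default-opts.* keys are the names
# proposed in FLIP-397 and may change before they are released.

# Set once by the platform administrator for all jobs on the platform:
env.java.default-opts.all: "-XX:-DontCompileHugeMethods -Xlog:gc*:file=/tmp/gc.log"

# Set per job by the user; this no longer clobbers the administrator
# defaults, because the default opts are prepended to these extra opts:
env.java.opts.all: "-XX:+UseZGC"
```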

Zhanghao Chen wrote on Mon, Nov 27, 2023 at 20:04:
>
> Hi devs,
>
> I'd like to start a discussion on FLIP-397: Add config options for 
> administrator JVM options [1].
>
> In production environments, users typically develop and operate their Flink 
> jobs through a managed platform. Users may need to add JVM options to their 
> Flink applications (e.g. to tune GC options). They typically use the 
> env.java.opts.x series of options to do so. Platform administrators also have 
> a set of JVM options to apply by default, e.g. to use JVM 17, enable GC 
> logging, or apply pretuned GC options, etc. Both use cases will need to set 
> the same series of options and will clobber one another. Similar issues have 
> been described in SPARK-23472 [2].
>
> Therefore, I propose adding a set of default JVM options for administrator 
> use that prepends the user-set extra JVM options.
>
> Looking forward to hearing from you.
>
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-397%3A+Add+config+options+for+administrator+JVM+options
> [2] https://issues.apache.org/jira/browse/SPARK-23472
>
> Best,
> Zhanghao Chen



-- 

Best,
Benchao Li


Re: Re: [DISCUSS] FLIP-387: Support named parameters for functions and call procedures

2023-12-21 Thread Benchao Li
I'm glad to hear that this is in your plan. Sorry that I overlooked
the PoC link in the FLIP previously, I'll go over the code of PoC, and
post here if there are any more concerns.
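As a side note for readers of this thread, the calling semantics being proposed — named arguments that can be reordered, with omitted optional arguments defaulting to NULL — can be mimicked with a short Python analogy. This is purely illustrative on my part and is not Flink or Calcite API:

```python
# Analogy only: Python keyword arguments behave much like the named
# parameters proposed in FLIP-387. None stands in for SQL NULL, the only
# default currently possible for an omitted optional parameter (see
# CALCITE-947 for why other default values cannot be specified yet).
def my_named_procedure(in1, in2=None, in3=None):
    return f"{in1}, {in2},{in3}"

# Named arguments can be passed out of order...
print(my_named_procedure(in2="value2", in1="value1", in3="value3"))  # value1, value2,value3
# ...and optional ones can be omitted, falling back to the NULL-like default.
print(my_named_procedure(in1="value1"))  # value1, None,None
```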

Xuyang wrote on Thu, Dec 21, 2023 at 10:39:

>
> Hi, Benchao.
>
>
> When Feng Jin and I tried the poc together, we found that when using a udaf, 
> Calcite directly uses the function's input parameters from 
> SqlCall#getOperandList. But in fact, these input parameters may use named 
> arguments, the order of the parameters may be wrong, and they may not include 
> optional parameters that need default values to be set. Actually, we should use 
> new SqlCallBinding(this, scope, call).operands() to let this method correct 
> the order and add default values. (You can see the modification in 
> SqlToRelConverter in the poc branch[1].)
>
>
> We have not reported this bug to the Calcite community yet. Our original plan 
> was to report it during the process of doing this FLIP, and to fix it 
> separately in Flink's own copy of the Calcite file, because the time for 
> Calcite to release a new version is uncertain, and the time to upgrade Flink 
> to the latest Calcite version is also unknown.
>
>
> The link to the poc code is at the bottom of the FLIP[2]. I'm posting it here 
> again[1].
>
>
> [1] 
> https://github.com/apache/flink/compare/master...hackergin:flink:poc_named_argument
> [2] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-387%3A+Support+named+parameters+for+functions+and+call+procedures
>
>
>
> --
>
> Best!
> Xuyang
>
>
>
>
>
> On 2023-12-20 13:31:26, "Benchao Li" wrote:
> >I didn't see your POC code, so I assumed that you'll need to add
> >SqlStdOperatorTable#DEFAULT and
> >SqlStdOperatorTable#ARGUMENT_ASSIGNMENT to FlinkSqlOperatorTable, am I
> >right?
> >
> >If yes, this would enable many builtin functions to allow default and
> >optional arguments, for example, `select md5(DEFAULT)`, I guess this
> >is not what we want to support right? If so, I would suggest to throw
> >proper errors for these unexpected usages.
> >
> >Benchao Li wrote on Wed, Dec 20, 2023 at 13:16:
> >>
> >> Thanks Feng for driving this, it's a very useful feature.
> >>
> >> In the FLIP, you mentioned that
> >> > During POC verification, bugs were discovered in Calcite that caused 
> >> > issues during the validation phase. We need to modify the 
> >> > SqlValidatorImpl and SqlToRelConverter to address these problems.
> >>
> >> Could you log bugs in Calcite, and reference the corresponding Jira
> >> number in your code. We want to upgrade Calcite version to latest as
> >> much as possible, and maintaining many bugfixes in Flink will add
> >> additional burdens for upgrading Calcite. By adding corresponding
> >> issue numbers, we can easily make sure that we can remove these Flink
> >> hosted bugfixes when we upgrade to a version that already contains the
> >> fix.
> >>
> >> Feng Jin wrote on Thu, Dec 14, 2023 at 19:30:
> >> >
> >> > Hi Timo
> >> > Thanks for your reply.
> >> >
> >> > >  1) ArgumentNames annotation
> >> >
> >> > I'm sorry for my incorrect expression. argumentNames is a method of
> >> > FunctionHints. We should introduce a new arguments method to replace this
> >> > method and return Argument[].
> >> > I updated the FLIP doc about this part.
> >> >
> >> > >  2) Evolution of FunctionHint
> >> >
> >> > +1 define DataTypeHint as part of ArgumentHint. I'll update the FLIP doc.
> >> >
> >> > > 3)  Semantical correctness
> >> >
> >> > I realized that I forgot to submit the latest modification of the FLIP
> >> > document. Xuyang and I had a prior discussion before starting this 
> >> > discussion.
> >> > Let's restrict it to supporting only one eval() function, which will
> >> > simplify the overall design.
> >> >
> >> > Therefore, I also concur with not permitting overloaded named parameters.
> >> >
> >> >
> >> > Best,
> >> > Feng
> >> >
> >> > On Thu, Dec 14, 2023 at 6:15 PM Timo Walther  wrote:
> >> >
> >> > > Hi Feng,
> >> > >
> >> > > thank you for proposing this FLIP. This nicely completes FLIP-65 which
> >> > > is great for usability.
> >> > >
> >> > > I read the FLIP and have some feedback:
> >> > >

Re: [DISCUSS] FLIP-387: Support named parameters for functions and call procedures

2023-12-19 Thread Benchao Li
I didn't see your POC code, so I assumed that you'll need to add
SqlStdOperatorTable#DEFAULT and
SqlStdOperatorTable#ARGUMENT_ASSIGNMENT to FlinkSqlOperatorTable, am I
right?

If yes, this would enable many builtin functions to allow default and
optional arguments, for example, `select md5(DEFAULT)`, I guess this
is not what we want to support right? If so, I would suggest to throw
proper errors for these unexpected usages.

Benchao Li wrote on Wed, Dec 20, 2023 at 13:16:
>
> Thanks Feng for driving this, it's a very useful feature.
>
> In the FLIP, you mentioned that
> > During POC verification, bugs were discovered in Calcite that caused issues 
> > during the validation phase. We need to modify the SqlValidatorImpl and 
> > SqlToRelConverter to address these problems.
>
> Could you log bugs in Calcite, and reference the corresponding Jira
> number in your code. We want to upgrade Calcite version to latest as
> much as possible, and maintaining many bugfixes in Flink will add
> additional burdens for upgrading Calcite. By adding corresponding
> issue numbers, we can easily make sure that we can remove these Flink
> hosted bugfixes when we upgrade to a version that already contains the
> fix.
>
> Feng Jin wrote on Thu, Dec 14, 2023 at 19:30:
> >
> > Hi Timo
> > Thanks for your reply.
> >
> > >  1) ArgumentNames annotation
> >
> > I'm sorry for my incorrect expression. argumentNames is a method of
> > FunctionHints. We should introduce a new arguments method to replace this
> > method and return Argument[].
> > I updated the FLIP doc about this part.
> >
> > >  2) Evolution of FunctionHint
> >
> > +1 define DataTypeHint as part of ArgumentHint. I'll update the FLIP doc.
> >
> > > 3)  Semantical correctness
> >
> > I realized that I forgot to submit the latest modification of the FLIP
> > document. Xuyang and I had a prior discussion before starting this discussion.
> > Let's restrict it to supporting only one eval() function, which will
> > simplify the overall design.
> >
> > Therefore, I also concur with not permitting overloaded named parameters.
> >
> >
> > Best,
> > Feng
> >
> > On Thu, Dec 14, 2023 at 6:15 PM Timo Walther  wrote:
> >
> > > Hi Feng,
> > >
> > > thank you for proposing this FLIP. This nicely completes FLIP-65 which
> > > is great for usability.
> > >
> > > I read the FLIP and have some feedback:
> > >
> > >
> > > 1) ArgumentNames annotation
> > >
> > >  > Deprecate the ArgumentNames annotation as it is not user-friendly for
> > > specifying argument names with optional configuration.
> > >
> > > Which annotation does the FLIP reference here? I cannot find it in the
> > > Flink code base.
> > >
> > > 2) Evolution of FunctionHint
> > >
> > > Introducing @ArgumentHint makes a lot of sense to me. However, using it
> > > within @FunctionHint looks complex, because there is both `input=` and
> > > `arguments=`. Ideally, the @DataTypeHint can be defined inline as part
> > > of the @ArgumentHint. It could even be the `value` such that
> > > `@ArgumentHint(@DataTypeHint("INT"))` is valid on its own.
> > >
> > > We could deprecate `input=`. Or let both `input` and `arguments=`
> > > coexist but never be defined at the same time.
> > >
> > > 3) Semantical correctness
> > >
> > > As you can see in the `TypeInference` class, named parameters are
> > > prepared in the stack already. However, we need to watch out between
> > > helpful explanation (see `InputTypeStrategy#getExpectedSignatures`) and
> > > named parameters (see `TypeInference.Builder#namedArguments`) that can
> > > be used in SQL.
> > >
> > > If I remember correctly, named parameters can be reordered and don't
> > > allow overloading of signatures. Thus, only a single eval() should have
> > > named parameters. Looking at the FLIP it seems you would like to support
> > > multiple parameter lists. What changes are you planning to TypeInference
> > > (which is also declared as @PublicEvoving)? This should also be
> > > documented as the annotations should compile into this class.
> > >
> > > In general, I would prefer to keep it simple and don't allow overloading
> > > named parameters. With the optionality, users can add an arbitrary
> > > number of parameters to the signature of the same eval method.
> > >
> > > Regards,
> > > Timo
> > >

Re: [DISCUSS] FLIP-387: Support named parameters for functions and call procedures

2023-12-19 Thread Benchao Li
, param2 => ‘value2’’) FROM
> > []
> > >
> > > -- for table function
> > > SELECT  *  FROM TABLE(my_table_function(param1 => 'value1', param2 =>
> > 'value2'))
> > >
> > > -- for agg function
> > > SELECT my_agg_function(param1 => 'value1', param2 => 'value2') FROM []
> > >
> > > -- for call procedure
> > > CALL  procedure_name(param1 => ‘value1’, param2 => ‘value2’)
> > > ```
> > >
> > > For UDX and Call procedure developers, we introduce a new annotation
> > > to specify the parameter name, indicate if it is optional, and
> > > potentially support specifying default values in the future
> > >
> > > ```
> > > public @interface ArgumentHint {
> > >  /**
> > >   * The name of the parameter, default is an empty string.
> > >   */
> > >  String name() default "";
> > >
> > >  /**
> > >   * Whether the parameter is optional, default is true.
> > >   */
> > >  boolean isOptional() default true;
> > > }
> > > ```
> > >
> > > ```
> > > // Call Procedure Development
> > >
> > > public static class NamedArgumentsProcedure implements Procedure {
> > >
> > > // Example usage: CALL myNamedProcedure(in1 => 'value1', in2 =>
> > 'value2')
> > >
> > > // Example usage: CALL myNamedProcedure(in1 => 'value1', in2 =>
> > > 'value2', in3 => 'value3')
> > >
> > > @ProcedureHint(
> > > input = {@DataTypeHint(value = "STRING"),
> > > @DataTypeHint(value = "STRING"), @DataTypeHint(value = "STRING")},
> > > output = @DataTypeHint("STRING"),
> > >  arguments = {
> > >  @ArgumentHint(name = "in1", isOptional = false),
> > >  @ArgumentHint(name = "in2", isOptional = true),
> > >  @ArgumentHint(name = "in3", isOptional = true)})
> > > public String[] call(ProcedureContext procedureContext, String
> > > arg1, String arg2, String arg3) {
> > >     return new String[]{arg1 + ", " + arg2 + "," + arg3};
> > > }
> > > }
> > > ```
> > >
> > >
> > > Currently, we offer support for two scenarios when calling a function
> > > or procedure:
> > >
> > > 1. The corresponding parameters can be specified using the parameter
> > > name, without a specific order.
> > > 2. Unnecessary parameters can be omitted.
> > >
> > >
> > > There are still some limitations when using Named parameters:
> > > 1. Named parameters do not support variable arguments.
> > > 2. UDX or procedure classes that support named parameters can only
> > > have one eval method.
> > > 3. Due to the current limitations of Calcite-947[2], we cannot specify
> > > a default value for omitted parameters, which is Null by default.
> > >
> > >
> > >
> > > Also, thanks very much for the suggestions and help provided by Zelin
> > > and Lincoln.
> > >
> > >
> > >
> > >
> > > 1.
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-387%3A+Support+named+parameters+for+functions+and+call+procedures
> > .
> > >
> > > 2. https://issues.apache.org/jira/browse/CALCITE-947
> > >
> > >
> > >
> > > Best,
> > >
> > > Feng
> > >
> >
> >



-- 

Best,
Benchao Li


Re: Question on lookup joins

2023-12-18 Thread Benchao Li
I don't see a problem in the result. Since you are using LEFT JOIN,
the NULLs are expected where there is no matching result in the right
table.
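To spell out that padding behaviour with a toy sketch (illustrative Python I'm adding here, not Flink code): a LEFT JOIN emits every probe-side row, and fills the right-side columns with NULLs whenever the lookup finds no match under the join condition.

```python
# Minimal sketch of LEFT JOIN padding: unmatched probe rows keep their
# left-side fields and get None (standing in for SQL NULL) on the right.
probe_ips = ["10.10.10.10", "20.20.20.20", "30.30.30.30"]
# ip -> matching right-side rows (type, age, height, weight) that survive
# the join condition e.type = 2:
lookup = {"10.10.10.10": [(2, 40, 90, 110), (2, 50, 80, 120)]}

result = []
for ip in probe_ips:
    # On no match, fall back to a single all-None row: the NULL padding.
    for match in lookup.get(ip, [(None, None, None, None)]):
        result.append((ip,) + match)

for row in result:
    print(row)
```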

Hang Ruan wrote on Mon, Dec 18, 2023 at 09:39:
>
> Hi, David.
>
> The FLIP-377[1] is about this part. You could take a look at it.
>
> Best,
> Hang
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>
>
> Hang Ruan wrote on Sun, Dec 17, 2023 at 20:56:
>
> > Hi, David.
> >
> > I think you are right that the value with NULL should not be returned if
> > filter pushdown is disabled.
> >
> > Maybe you should EXPLAIN this SQL to make sure this filter is not pushed
> > down to the lookup source.
> >
> > I see the configuration
> > 'table.optimizer.source.predicate-pushdown-enabled' relies on the class
> > FilterableTableSource, which is deprecated.
> > I am not sure whether this configuration is still useful for jdbc
> > connector, which is using the SupportsFilterPushDown.
> >
> > Maybe the jdbc connector should read this configuration and return an
> > empty 'acceptedFilters' in the method 'applyFilters'.
> >
> > Best,
> > Hang
> >
> > David Radley wrote on Sat, Dec 16, 2023 at 01:47:
> >
> >> Hi,
> >> I am working on FLINK-33365, which is related to JDBC predicate pushdown. I
> >> want to ensure that the same results occur with predicate pushdown as
> >> without, so I am asking this question outside the PR / issue.
> >>
> >> I notice the following behaviour for lookup joins without predicate
> >> pushdown. I was not expecting all the <NULL>s when there is no
> >> matching join key. 'a' is a table in Paimon and 'db' is a relational
> >> database.
> >>
> >>
> >>
> >> Flink SQL> select * from a;
> >>
> >> +----+-------------+-------------------------+
> >> | op |          ip |                proctime |
> >> +----+-------------+-------------------------+
> >> | +I | 10.10.10.10 | 2023-12-15 17:36:10.028 |
> >> | +I | 20.20.20.20 | 2023-12-15 17:36:10.030 |
> >> | +I | 30.30.30.30 | 2023-12-15 17:36:10.031 |
> >> +----+-------------+-------------------------+
> >>
> >> ^CQuery terminated, received a total of 3 rows
> >>
> >>
> >>
> >> Flink SQL> select * from  db_catalog.menagerie.e;
> >>
> >>
> >> +----+-------------+------+-----+--------+--------+
> >> | op |          ip | type | age | height | weight |
> >> +----+-------------+------+-----+--------+--------+
> >> | +I | 10.10.10.10 |    1 |  30 |    100 |    100 |
> >> | +I | 10.10.10.10 |    2 |  40 |     90 |    110 |
> >> | +I | 10.10.10.10 |    2 |  50 |     80 |    120 |
> >> | +I | 10.10.10.10 |    3 |  50 |     70 |     40 |
> >> | +I | 20.20.20.20 |    3 |  30 |     80 |     90 |
> >> +----+-------------+------+-----+--------+--------+
> >>
> >> Received a total of 5 rows
> >>
> >>
> >>
> >> Flink SQL> set table.optimizer.source.predicate-pushdown-enabled=false;
> >>
> >> [INFO] Execute statement succeed.
> >>
> >>
> >>
> >> Flink SQL> SELECT * FROM a left join mariadb_catalog.menagerie.e FOR
> >> SYSTEM_TIME AS OF a.proctime on e.type = 2 and a.ip = e.ip;
> >>
> >>
> >> +----+-------------+-------------------------+-------------+--------+--------+--------+--------+
> >> | op |          ip |                proctime |         ip0 |   type |    age | height | weight |
> >> +----+-------------+-------------------------+-------------+--------+--------+--------+--------+
> >> | +I | 10.10.10.10 | 2023-12-15 17:38:05.169 | 10.10.10.10 |      2 |     40 |     90 |    110 |
> >> | +I | 10.10.10.10 | 2023-12-15 17:38:05.169 | 10.10.10.10 |      2 |     50 |     80 |    120 |
> >> | +I | 20.20.20.20 | 2023-12-15 17:38:05.170 |      <NULL> | <NULL> | <NULL> | <NULL> | <NULL> |
> >> | +I | 30.30.30.30 | 2023-12-15 17:38:05.172 |      <NULL> | <NULL> | <NULL> | <NULL> | <NULL> |
> >> +----+-------------+-------------------------+-------------+--------+--------+--------+--------+
> >>
> >> Unless otherwise stated above:
> >>
> >> IBM United Kingdom Limited
> >> Registered in England and Wales with number 741598
> >> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >>
> >



-- 

Best,
Benchao Li


Re: [VOTE] FLIP-396: Trial to test GitHub Actions as an alternative for Flink's current Azure CI infrastructure

2023-12-13 Thread Benchao Li
+1 (binding)

Thanks Matthias for driving it!

Etienne Chauchot wrote on Wed, Dec 13, 2023 at 21:35:
>
> Thanks Matthias for your hard work !
>
> +1 (binding)
>
> Best
>
> Etienne
>
> Le 12/12/2023 à 11:23, Lincoln Lee a écrit :
> > +1 (binding)
> >
> > Thanks for driving this!
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Yun Tang wrote on Tue, Dec 12, 2023 at 17:52:
> >
> >> Thanks for Matthias driving this work!
> >>
> >> +1 (binding)
> >>
> >> Best
> >> Yun Tang
> >> 
> >> From: Yangze Guo
> >> Sent: Tuesday, December 12, 2023 16:12
> >> To:dev@flink.apache.org  
> >> Subject: Re: [VOTE] FLIP-396: Trial to test GitHub Actions as an
> >> alternative for Flink's current Azure CI infrastructure
> >>
> >> +1 (binding)
> >>
> >> Best,
> >> Yangze Guo
> >>
> >> On Tue, Dec 12, 2023 at 3:51 PM Yuxin Tan  wrote:
> >>> +1 (non binding)
> >>> Thanks for the effort.
> >>>
> >>> Best,
> >>> Yuxin
> >>>
> >>>
> >>> Samrat Deb wrote on Tue, Dec 12, 2023 at 15:25:
> >>>
> >>>> +1 (non binding)
> >>>> Thanks for driving
> >>>>
> >>>> On Tue, 12 Dec 2023 at 11:59 AM, Sergey Nuyanzin
> >>>> wrote:
> >>>>
> >>>>> +1 (binding)
> >>>>>
> >>>>> Thanks for driving this
> >>>>>
> >>>>> On Tue, Dec 12, 2023, 07:22 Rui Fan<1996fan...@gmail.com>  wrote:
> >>>>>
> >>>>>> +1(binding)
> >>>>>>
> >>>>>> Best,
> >>>>>> Rui
> >>>>>>
> >>>>>> On Tue, Dec 12, 2023 at 11:58 AM weijie guo <
> >> guoweijieres...@gmail.com
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks Matthias for this efforts.
> >>>>>>>
> >>>>>>> +1(binding)
> >>>>>>>
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>>
> >>>>>>> Weijie
> >>>>>>>
> >>>>>>>
> >>>>>>>> Matthias Pohl wrote on Mon, Dec 11, 2023 at 21:51:
> >>>>>>>> Hi everyone,
> >>>>>>>> I'd like to start a vote on FLIP-396 [1]. It covers enabling
> >> GitHub
> >>>>>>> Actions
> >>>>>>>> (GHA) in Apache Flink. This means that GHA workflows will run
> >> aside
> >>>>>> from
> >>>>>>>> the usual Azure CI workflows in a trial phase (which ends
> >> earliest
> >>>>> with
> >>>>>>> the
> >>>>>>>> release of Flink 1.19). Azure CI will still serve as the
> >> project's
> >>>>>> ground
> >>>>>>>> of truth until the community decides in a final vote to switch
> >> to
> >>>> GHA
> >>>>>> or
> >>>>>>>> stick to Azure CI.
> >>>>>>>>
> >>>>>>>> The related discussion thread can be found in [2].
> >>>>>>>>
> >>>>>>>> The vote will remain open for at least 72 hours and only
> >> concluded
> >>>> if
> >>>>>>> there
> >>>>>>>> are no objections and enough (i.e. at least 3) binding votes.
> >>>>>>>>
> >>>>>>>> Matthias
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>>
> >>>>>>>>
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-396%3A+Trial+to+test+GitHub+Actions+as+an+alternative+for+Flink%27s+current+Azure+CI+infrastructure
> >>>>>>>> [2]
> >>>> https://lists.apache.org/thread/h4cmv7l3y8mxx2t435dmq4ltco4sbrgb
> >>>>>>>> --
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> *Matthias Pohl*
> >>>>>>>> Opensource Software Engineer, *Aiven*
> >>>>>>>> matthias.p...@aiven.io  |  +49 170 9869525
> >>>>>>>> *Aiven Deutschland GmbH*
> >>>>>>>> Alexanderufer 3-7, 10117 Berlin
> >>>>>>>> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >>>>>>>> Amtsgericht Charlottenburg, HRB 209739 B
> >>>>>>>>



-- 

Best,
Benchao Li


Re: [DISCUSS] Release Flink 1.18.1

2023-12-08 Thread Benchao Li
I've merged FLINK-33313 to release-1.18 branch.

Péter Váry wrote on Fri, Dec 8, 2023 at 16:56:
>
> Hi Jing,
> Thanks for taking care of this!
> +1 (non-binding)
> Peter
>
> Sergey Nuyanzin wrote on Fri, Dec 8, 2023 at 9:36:
>
> > Thanks Jing driving it
> > +1
> >
> > also +1 to include FLINK-33313 mentioned by Benchao Li
> >
> > On Fri, Dec 8, 2023 at 9:17 AM Benchao Li  wrote:
> >
> > > Thanks Jing for driving 1.18.1 releasing.
> > >
> > > I would like to include FLINK-33313[1] in 1.18.1, it's just a bugfix,
> > > not a blocker, but it's already merged into master, I plan to merge it
> > > to 1.18/1.17 branches today after the CI passes.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-33313
> > >
> > > Jing Ge wrote on Fri, Dec 8, 2023 at 16:06:
> > > >
> > > > Hi all,
> > > >
> > > > I would like to discuss creating a new 1.18 patch release (1.18.1). The
> > > > last 1.18 release is nearly two months old, and since then, 37 tickets
> > > have
> > > > been closed [1], of which 6 are blocker/critical [2].  Some of them are
> > > > quite important, such as FLINK-33598 [3]
> > > >
> > > > Most urgent and important one is FLINK-33523 [4] and according to the
> > > > discussion thread[5] on the ML, 1.18.1 should/must be released asap
> > after
> > > > the breaking change commit has been reverted.
> > > >
> > > > I am not aware of any other unresolved blockers and there are no
> > > in-progress
> > > > tickets [6].
> > > > Please let me know if there are any issues you'd like to be included in
> > > > this release but still not merged.
> > > >
> > > > If the community agrees to create this new patch release, I could
> > > > volunteer as the release manager.
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > [1]
> > > >
> > >
> > https://issues.apache.org/jira/browse/FLINK-33567?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.18.1%20%20and%20resolution%20%20!%3D%20%20Unresolved%20order%20by%20priority%20DESC
> > > > [2]
> > > >
> > >
> > https://issues.apache.org/jira/browse/FLINK-33693?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.18.1%20and%20resolution%20%20!%3D%20Unresolved%20%20and%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20by%20priority%20%20DESC
> > > > [3] https://issues.apache.org/jira/browse/FLINK-33598
> > > > [4] https://issues.apache.org/jira/browse/FLINK-33523
> > > > [5] https://lists.apache.org/thread/m4c879y8mb7hbn2kkjh9h3d8g1jphh3j
> > > > [6] https://issues.apache.org/jira/projects/FLINK/versions/12353640
> > > > Thanks,
> > >
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
> >
> >
> > --
> > Best regards,
> > Sergey
> >



-- 

Best,
Benchao Li


Re: [DISCUSS] Release Flink 1.18.1

2023-12-08 Thread Benchao Li
Thanks Jing for driving 1.18.1 releasing.

I would like to include FLINK-33313[1] in 1.18.1, it's just a bugfix,
not a blocker, but it's already merged into master, I plan to merge it
to 1.18/1.17 branches today after the CI passes.

[1] https://issues.apache.org/jira/browse/FLINK-33313

Jing Ge wrote on Fri, Dec 8, 2023 at 16:06:
>
> Hi all,
>
> I would like to discuss creating a new 1.18 patch release (1.18.1). The
> last 1.18 release is nearly two months old, and since then, 37 tickets have
> been closed [1], of which 6 are blocker/critical [2].  Some of them are
> quite important, such as FLINK-33598 [3]
>
> Most urgent and important one is FLINK-33523 [4] and according to the
> discussion thread[5] on the ML, 1.18.1 should/must be released asap after
> the breaking change commit has been reverted.
>
> I am not aware of any other unresolved blockers and there are no in-progress
> tickets [6].
> Please let me know if there are any issues you'd like to be included in
> this release but still not merged.
>
> If the community agrees to create this new patch release, I could
> volunteer as the release manager.
>
> Best regards,
> Jing
>
> [1]
> https://issues.apache.org/jira/browse/FLINK-33567?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.18.1%20%20and%20resolution%20%20!%3D%20%20Unresolved%20order%20by%20priority%20DESC
> [2]
> https://issues.apache.org/jira/browse/FLINK-33693?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.18.1%20and%20resolution%20%20!%3D%20Unresolved%20%20and%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20by%20priority%20%20DESC
> [3] https://issues.apache.org/jira/browse/FLINK-33598
> [4] https://issues.apache.org/jira/browse/FLINK-33523
> [5] https://lists.apache.org/thread/m4c879y8mb7hbn2kkjh9h3d8g1jphh3j
> [6] https://issues.apache.org/jira/projects/FLINK/versions/12353640
> Thanks,



-- 

Best,
Benchao Li


Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink

2023-12-06 Thread Benchao Li
Thank you, Leonard and all the Flink CDC maintainers.

Big big +1 from me. As a heavy user of both Flink and Flink CDC, I've
already taken them as a whole project.

Leonard Xu wrote on Thu, Dec 7, 2023 at 11:25:
>
> Dear Flink devs,
>
> As you may have heard, we at Alibaba (Ververica) are planning to donate CDC 
> Connectors for the Apache Flink project[1] to the Apache Flink community.
>
> CDC Connectors for Apache Flink comprise a collection of source connectors 
> designed specifically for Apache Flink. These connectors[2] enable the 
> ingestion of changes from various databases using Change Data Capture (CDC), 
> most of these CDC connectors are powered by Debezium[3]. They support both 
> the DataStream API and the Table/SQL API, facilitating the reading of 
> database snapshots and continuous reading of transaction logs with 
> exactly-once processing, even in the event of failures.
>
>
> Additionally, in the latest version 3.0, we have introduced many long-awaited 
> features. Starting from CDC version 3.0, we've built a Streaming ELT 
> Framework available for streaming data integration. This framework allows 
> users to write their data synchronization logic in a simple YAML file, which 
> will automatically be translated into a Flink DataStreaming job. It 
> emphasizes optimizing the task submission process and offers advanced 
> functionalities such as whole database synchronization, merging sharded 
> tables, and schema evolution[4].
>
>
> I believe this initiative is a perfect match for both sides. For the Flink 
> community, it presents an opportunity to enhance Flink's competitive 
> advantage in streaming data integration, promoting the healthy growth and 
> prosperity of the Apache Flink ecosystem. For the CDC Connectors project, 
> becoming a sub-project of Apache Flink means being part of a neutral 
> open-source community, which can attract a more diverse pool of contributors.
>
> Please note that the aforementioned points represent only some of our 
> motivations and vision for this donation. Specific future operations need to 
> be further discussed in this thread. For example, the sub-project name after 
> the donation; we hope to name it Flink-CDC, aiming at streaming data 
> integration through Apache Flink, following the naming convention of 
> Flink-ML; And this project is managed by a total of 8 maintainers, including 
> 3 Flink PMC members and 1 Flink Committer. The remaining 4 maintainers are 
> also highly active contributors to the Flink community, donating this project 
> to the Flink community implies that their permissions might be reduced. 
> Therefore, we may need to bring up this topic for further discussion within 
> the Flink PMC. Additionally, we need to discuss how to migrate existing users 
> and documents. We have a user group of nearly 10,000 people and a 
> multi-version documentation site need to migrate. We also need to plan for 
> the migration of CI/CD processes and other specifics.
>
>
> While there are many intricate details that require implementation, we are 
> committed to progressing and finalizing this donation process.
>
>
> Despite being Flink’s most active ecosystem project (as evaluated by GitHub 
> metrics), it also boasts a significant user base. However, I believe it's 
> essential to commence discussions on future operations only after the 
> community reaches a consensus on whether they desire this donation.
>
>
> Really looking forward to hear what you think!
>
>
> Best,
> Leonard (on behalf of the Flink CDC Connectors project maintainers)
>
> [1] https://github.com/ververica/flink-cdc-connectors
> [2] 
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-connectors.html
> [3] https://debezium.io
> [4] 
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-pipeline.html



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-395: Deprecate Global Aggregator Manager

2023-11-26 Thread Benchao Li
+1 for the idea.

Currently OperatorCoordinator is still marked as @Internal, shouldn't
it be a public API already?

Besides, GlobalAggregatorManager supports coordination between
different operators, but OperatorCoordinator only supports
coordination within one operator. And CoordinatorStore introduced in
FLINK-24439 opens the door for multi operators. Again, should it also
be a public API too?

Weihua Hu wrote on Mon, Nov 27, 2023 at 11:05:
>
> Thanks Zhanghao for driving this FLIP.
>
> +1 for this.
>
> Best,
> Weihua
>
>
> On Mon, Nov 20, 2023 at 5:49 PM Zhanghao Chen 
> wrote:
>
> > Hi all,
> >
> > I'd like to start a discussion of FLIP-395: Deprecate Global Aggregator
> > Manager [1].
> >
> > Global Aggregate Manager was introduced in [2] to support event time
> > synchronization across sources and more generally, coordination of parallel
> > tasks. AFAIK, this was only used in the Kinesis source for an early version
> > of watermark alignment. Operator Coordinator, introduced in FLIP-27,
> > provides a more powerful and elegant solution for that need and is part of
> > the new source API standard. FLIP-217 further provides a complete solution
> > for watermark alignment of source splits on top of the Operator Coordinator
> > mechanism. Furthermore, Global Aggregate Manager manages state in JobMaster
> > object, causing problems for adaptive parallelism changes [3].
> >
> > Therefore, I propose to deprecate the use of Global Aggregate Manager,
> > which can improve the maintainability of the Flink codebase without
> > compromising its functionality.
> >
> > Looking forward to your feedbacks, thanks.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-395%3A+Deprecate+Global+Aggregator+Manager
> > [2] https://issues.apache.org/jira/browse/FLINK-10886
> > [3] https://issues.apache.org/jira/browse/FLINK-31245
> >
> > Best,
> > Zhanghao Chen
> >



-- 

Best,
Benchao Li


Re: [VOTE] FLIP-393: Make QueryOperations SQL serializable

2023-11-21 Thread Benchao Li
+1 (binding)

Dawid Wysakowicz  于2023年11月21日周二 18:56写道:
>
> Hi everyone,
>
> Thank you to everyone for the feedback on FLIP-393: Make QueryOperations
> SQL serializable[1]
> which has been discussed in this thread [2].
>
> I would like to start a vote for it. The vote will be open for at least 72
> hours unless there is an objection or not enough votes.
>
> [1] https://cwiki.apache.org/confluence/x/vQ2ZE
> [2] https://lists.apache.org/thread/ztyk68brsbmwwo66o1nvk3f6fqqhdxgk



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-393: Make QueryOperations SQL serializable

2023-11-20 Thread Benchao Li
The FLIP looks good to me now, let's start the vote.

Dawid Wysakowicz  于2023年11月20日周一 22:36写道:
>
> @Benchao I added an example to the page.
>
> If there are no further comments, I'll start a vote on the FLIP tomorrow or
> the next day.
>
> Best,
> Dawid
>
> On Fri, 17 Nov 2023 at 12:20, xiangyu feng  wrote:
>
> > >After this FLIP is done, FLINK-25015() can utilize this ability to set
> > > job name for queries.
> >
> > +1 for this. Currently, when users submit sql jobs through table api, we
> > can't see the complete SQL string on flink ui. It would be easy for us to
> > finish this feature if we can get serialized sql from QueryOperation
> > directly.
> >
> > So +1 for the overall proposal.
> >
> > Regards,
> > Xiangyu
> >
> > Benchao Li  于2023年11月17日周五 19:07写道:
> >
> > > That sounds good to me, I'm looking forward to it!
> > >
> > > After this FLIP is done, FLINK-25015 can utilize this ability to set
> > > job name for queries.
> > >
> > > Dawid Wysakowicz  于2023年11月16日周四 21:16写道:
> > > >
> > > > Yes, the idea is to convert the QueryOperation tree into a
> > > > proper/compilable query. To be honest I didn't think it could be done
> > > > differently, sorry if I wasn't clear enough. Yes, it is very much like
> > > > SqlNode#toSqlString you mentioned. I'll add an example of a single
> > > > QueryOperation tree to the FLIP.
> > > >
> > > > I tried to focus only on the public contracts, not on the
> > implementation
> > > > details. I mentioned Expressions, because this requires changing
> > > > semi-public interfaces in BuiltinFunctionDefinitions.
> > > >
> > > > Hope this makes it clearer.
> > > >
> > > > Regards,
> > > > Dawid
> > > >
> > > > On Thu, 16 Nov 2023 at 12:12, Benchao Li  wrote:
> > > >
> > > > > Sorry that I wasn't expressing it clearly.
> > > > >
> > > > > Since the FLIP talks about two things: ResolvedExpression and
> > > > > QueryOperation, and you have illustrated how to serialize
> > > > > ResolvedExpression into SQL string. I'm wondering how you'll gonna to
> > > > > convert QueryOperation into SQL string.
> > > > >
> > > > > I was thinking that you proposed to convert the QueryOperation tree
> > > > > into a "complete runnable SQL statement", e.g.
> > > > >
> > > > >
> > >
> > ProjectQueryOperation(x,y)->FilterQueryOperation(z>10)->TableSourceQueryOperation(T),
> > > > > we'll get "SELECT x, y FROM T WHERE z > 10".
> > > > > Maybe I misread it, maybe you just meant to convert each
> > > > > QueryOperation into a row-level SQL string instead the whole tree
> > into
> > > > > a complete SQL statement.
> > > > >
> > > > > The idea of translating whole QueryOperation tree into SQL statement
> > > > > may come from my experience of Apache Calcite, there is a
> > > > > SqlImplementor[1] which convert a RelNode tree into SqlNode, and
> > > > > further we can use  SqlNode#toSqlString to unparse it into SQL
> > string.
> > > > > I would assume that most of our QueryOperations are much like the
> > > > > abstraction of Calcite's RelNode, with some exceptions such as
> > > > > PlannerQueryOperation.
> > > > >
> > > > > [1]
> > > > >
> > >
> > https://github.com/apache/calcite/blob/153796f8994831ad015af4b9036aa01ebf78/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L141
> > > > >
> > > > > Dawid Wysakowicz  于2023年11月16日周四
> > 16:24写道:
> > > > > >
> > > > > > I think the FLIP covers all public contracts that are necessary to
> > be
> > > > > > discussed at that level.
> > > > > >
> > > > > > If you meant you could not find a method that would be called to
> > > trigger
> > > > > > the translation then it is already there. It's just not implemented
> > > yet:
> > > > > > QueryOperation#asSerializableString[1]. As I mentioned this is
> > > mostly a
> > > > > > follow up to previous work.
> > > > > >
> > > > > > Regards,
> > > > > > Dawid
> > > 

Re: [DISCUSS] FLIP-393: Make QueryOperations SQL serializable

2023-11-17 Thread Benchao Li
That sounds good to me, I'm looking forward to it!

After this FLIP is done, FLINK-25015 can utilize this ability to set
job name for queries.

Dawid Wysakowicz  于2023年11月16日周四 21:16写道:
>
> Yes, the idea is to convert the QueryOperation tree into a
> proper/compilable query. To be honest I didn't think it could be done
> differently, sorry if I wasn't clear enough. Yes, it is very much like
> SqlNode#toSqlString you mentioned. I'll add an example of a single
> QueryOperation tree to the FLIP.
>
> I tried to focus only on the public contracts, not on the implementation
> details. I mentioned Expressions, because this requires changing
> semi-public interfaces in BuiltinFunctionDefinitions.
>
> Hope this makes it clearer.
>
> Regards,
> Dawid
>
> On Thu, 16 Nov 2023 at 12:12, Benchao Li  wrote:
>
> > Sorry that I wasn't expressing it clearly.
> >
> > Since the FLIP talks about two things: ResolvedExpression and
> > QueryOperation, and you have illustrated how to serialize
> > ResolvedExpression into SQL string. I'm wondering how you'll gonna to
> > convert QueryOperation into SQL string.
> >
> > I was thinking that you proposed to convert the QueryOperation tree
> > into a "complete runnable SQL statement", e.g.
> >
> > ProjectQueryOperation(x,y)->FilterQueryOperation(z>10)->TableSourceQueryOperation(T),
> > we'll get "SELECT x, y FROM T WHERE z > 10".
> > Maybe I misread it, maybe you just meant to convert each
> > QueryOperation into a row-level SQL string instead the whole tree into
> > a complete SQL statement.
> >
> > The idea of translating whole QueryOperation tree into SQL statement
> > may come from my experience of Apache Calcite, there is a
> > SqlImplementor[1] which convert a RelNode tree into SqlNode, and
> > further we can use  SqlNode#toSqlString to unparse it into SQL string.
> > I would assume that most of our QueryOperations are much like the
> > abstraction of Calcite's RelNode, with some exceptions such as
> > PlannerQueryOperation.
> >
> > [1]
> > https://github.com/apache/calcite/blob/153796f8994831ad015af4b9036aa01ebf78/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L141
> >
> > Dawid Wysakowicz  于2023年11月16日周四 16:24写道:
> > >
> > > I think the FLIP covers all public contracts that are necessary to be
> > > discussed at that level.
> > >
> > > If you meant you could not find a method that would be called to trigger
> > > the translation then it is already there. It's just not implemented yet:
> > > QueryOperation#asSerializableString[1]. As I mentioned this is mostly a
> > > follow up to previous work.
> > >
> > > Regards,
> > > Dawid
> > >
> > > [1]
> > >
> > https://github.com/apache/flink/blob/d18a4bfe596fc580f8280750fa3bfa22007671d9/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/operations/QueryOperation.java#L46
> > >
> > > On Wed, 15 Nov 2023 at 16:36, Benchao Li  wrote:
> > >
> > > > +1 for the idea of choosing SQL as the serialization format for
> > > > QueryOperation, thanks for Dawid for driving this FLIP.
> > > >
> > > > Regarding the implementation, I didn't see the proposal for how to
> > > > translate QueryOperation to SQL yet, am I missing something? Or the
> > > > FLIP is still in preliminary state, you just want to gather ideas
> > > > about whether to use SQL or something else as the serialization format
> > > > for QueryOperation?
> > > >
> > > > Dawid Wysakowicz  于2023年11月15日周三 19:34写道:
> > > > >
> > > > > Hi,
> > > > > I would like to propose a follow-up improvement to some of the work
> > that
> > > > > has been done over the years to the Table API. I posted the proposed
> > > > > changes in the FLIP-393. I'd like to get to know what others think of
> > > > > choosing SQL as the serialization format for QueryOperations.
> > > > > Regards,
> > > > > Dawid Wysakowicz
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/x/vQ2ZE
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best,
> > > > Benchao Li
> > > >
> >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-393: Make QueryOperations SQL serializable

2023-11-16 Thread Benchao Li
Sorry that I wasn't expressing it clearly.

The FLIP talks about two things, ResolvedExpression and
QueryOperation, and you have illustrated how to serialize a
ResolvedExpression into a SQL string. I'm wondering how you are going
to convert a QueryOperation into a SQL string.

I was thinking that you proposed to convert the QueryOperation tree
into a complete runnable SQL statement, e.g. for
ProjectQueryOperation(x,y)->FilterQueryOperation(z>10)->TableSourceQueryOperation(T)
we'd get "SELECT x, y FROM T WHERE z > 10".
Maybe I misread it, and you just meant to convert each QueryOperation
into a row-level SQL string instead of the whole tree into a complete
SQL statement.

The idea of translating the whole QueryOperation tree into a SQL
statement may come from my experience with Apache Calcite: there is a
SqlImplementor[1] which converts a RelNode tree into a SqlNode, and we
can then use SqlNode#toSqlString to unparse it into a SQL string.
I would assume that most of our QueryOperations are much like the
abstraction of Calcite's RelNode, with some exceptions such as
PlannerQueryOperation.

[1] 
https://github.com/apache/calcite/blob/153796f8994831ad015af4b9036aa01ebf78/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L141
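
As a small illustration of the tree-to-SQL unparsing being discussed (tree shape, column and table names are hypothetical), the expectation would be roughly:

```sql
-- Hypothetical QueryOperation tree:
--   ProjectQueryOperation(x, y)
--     FilterQueryOperation(z > 10)
--       TableSourceQueryOperation(T)
--
-- Expected result of QueryOperation#asSerializableString:
SELECT x, y
FROM T
WHERE z > 10
```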

Dawid Wysakowicz  于2023年11月16日周四 16:24写道:
>
> I think the FLIP covers all public contracts that are necessary to be
> discussed at that level.
>
> If you meant you could not find a method that would be called to trigger
> the translation then it is already there. It's just not implemented yet:
> QueryOperation#asSerializableString[1]. As I mentioned this is mostly a
> follow up to previous work.
>
> Regards,
> Dawid
>
> [1]
> https://github.com/apache/flink/blob/d18a4bfe596fc580f8280750fa3bfa22007671d9/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/operations/QueryOperation.java#L46
>
> On Wed, 15 Nov 2023 at 16:36, Benchao Li  wrote:
>
> > +1 for the idea of choosing SQL as the serialization format for
> > QueryOperation, thanks for Dawid for driving this FLIP.
> >
> > Regarding the implementation, I didn't see the proposal for how to
> > translate QueryOperation to SQL yet, am I missing something? Or the
> > FLIP is still in preliminary state, you just want to gather ideas
> > about whether to use SQL or something else as the serialization format
> > for QueryOperation?
> >
> > Dawid Wysakowicz  于2023年11月15日周三 19:34写道:
> > >
> > > Hi,
> > > I would like to propose a follow-up improvement to some of the work that
> > > has been done over the years to the Table API. I posted the proposed
> > > changes in the FLIP-393. I'd like to get to know what others think of
> > > choosing SQL as the serialization format for QueryOperations.
> > > Regards,
> > > Dawid Wysakowicz
> > >
> > > [1] https://cwiki.apache.org/confluence/x/vQ2ZE
> >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-393: Make QueryOperations SQL serializable

2023-11-15 Thread Benchao Li
+1 for the idea of choosing SQL as the serialization format for
QueryOperation; thanks to Dawid for driving this FLIP.

Regarding the implementation, I didn't see a proposal for how to
translate a QueryOperation to SQL yet; am I missing something? Or is
the FLIP still in a preliminary state, and you just want to gather
ideas about whether to use SQL or something else as the serialization
format for QueryOperation?

Dawid Wysakowicz  于2023年11月15日周三 19:34写道:
>
> Hi,
> I would like to propose a follow-up improvement to some of the work that
> has been done over the years to the Table API. I posted the proposed
> changes in the FLIP-393. I'd like to get to know what others think of
> choosing SQL as the serialization format for QueryOperations.
> Regards,
> Dawid Wysakowicz
>
> [1] https://cwiki.apache.org/confluence/x/vQ2ZE



-- 

Best,
Benchao Li


[jira] [Created] (FLINK-33533) Add documentation about performance for huge ProtoBuf definitions

2023-11-12 Thread Benchao Li (Jira)
Benchao Li created FLINK-33533:
--

 Summary: Add documentation about performance for huge ProtoBuf 
definitions
 Key: FLINK-33533
 URL: https://issues.apache.org/jira/browse/FLINK-33533
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Formats (JSON, Avro, Parquet, ORC, 
SequenceFile)
Reporter: Benchao Li






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-376: Add DISTRIBUTED BY clause

2023-11-06 Thread Benchao Li
+1 (binding)

Lincoln Lee  于2023年11月7日周二 08:50写道:
>
> +1 (binding)
>
> Best,
> Lincoln Lee
>
>
> Martijn Visser  于2023年11月7日周二 01:02写道:
>
> > +1 (binding)
> >
> > Thanks for driving this Timo
> >
> > On Mon, Nov 6, 2023 at 12:40 PM Timo Walther  wrote:
> > >
> > > Hi everyone,
> > >
> > > I'd like to start a vote on FLIP-376: Add DISTRIBUTED BY clause[1] which
> > > has been discussed in this thread [2].
> > >
> > > The vote will be open for at least 72 hours unless there is an objection
> > > or not enough votes.
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause
> > > [2] https://lists.apache.org/thread/z1p4f38x9ohv29y4nlq8g1wdxwfjwxd1
> > >
> > > Cheers,
> > > Timo
> >



-- 

Best,
Benchao Li


Re: Re: [DISCUSS] FLIP-376: Add DISTRIBUTED BY clause

2023-10-28 Thread Benchao Li
Thanks Timo for preparing the FLIP.

Regarding "By default, DISTRIBUTED BY assumes a list of columns for an
implicit hash partitioning":
Do you think it would be useful to add some extensibility for the hash
strategy? One scenario I can foresee: if we write bucketed data into
Hive and Flink's hash strategy is different from Hive/Spark's, Hive
and Spark cannot utilize the bucketed data written by Flink. I have
already met this case in production, and there may be more cases like
it that need a customized hash strategy to accommodate existing
systems.
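
For readers following along, a sketch of the DDL under discussion. The implicit form is taken from the FLIP's own example; the explicit HASH form is only mentioned for the Java API in this thread, so its DDL spelling here is an assumption about how a named (and potentially pluggable) strategy could surface:

```sql
-- Implicit hash partitioning over the listed columns (FLIP-376 example):
CREATE TABLE MyTable (
  uid BIGINT,
  name STRING
)
DISTRIBUTED BY (uid) INTO 6 BUCKETS
WITH ('connector' = 'kafka');

-- Hypothetical explicit form, where an extensible hash strategy could be named:
CREATE TABLE MyHiveTable (
  uid BIGINT,
  name STRING
)
DISTRIBUTED BY HASH(uid) INTO 6 BUCKETS
WITH ('connector' = 'hive');
```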

yunfan zhang  于2023年10月27日周五 19:06写道:
>
> Distribute by in DML is also supported by Hive.
> And it is also useful for flink.
> Users can use this ability to increase cache hit rate in lookup join.
> And users can use "distribute by key, rand(1, 10)” to avoid data skew problem.
> And I think it is another way to solve this Flip204[1]
> There is already has some people required this feature[2]
>
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> [2] https://issues.apache.org/jira/browse/FLINK-27541
>
> On 2023/10/27 08:20:25 Jark Wu wrote:
> > Hi Timo,
> >
> > Thanks for starting this discussion. I really like it!
> > The FLIP is already in good shape, I only have some minor comments.
> >
> > 1. Could we also support HASH and RANGE distribution kind on the DDL
> > syntax?
> > I noticed that HASH and UNKNOWN are introduced in the Java API, but not in
> > the syntax.
> >
> > 2. Can we make "INTO n BUCKETS" optional in CREATE TABLE and ALTER TABLE?
> > Some storage engines support automatically determining the bucket number
> > based on
> > the cluster resources and data size of the table. For example, StarRocks[1]
> > and Paimon[2].
> >
> > Best,
> > Jark
> >
> > [1]:
> > https://docs.starrocks.io/en-us/latest/table_design/Data_distribution#determine-the-number-of-buckets
> > [2]:
> > https://paimon.apache.org/docs/0.5/concepts/primary-key-table/#dynamic-bucket
> >
> > On Thu, 26 Oct 2023 at 18:26, Jingsong Li  wrote:
> >
> > > Very thanks Timo for starting this discussion.
> > >
> > > Big +1 for this.
> > >
> > > The design looks good to me!
> > >
> > > We can add some documentation for connector developers. For example:
> > > for sink, If there needs some keyby, please finish the keyby by the
> > > connector itself. SupportsBucketing is just a marker interface.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Thu, Oct 26, 2023 at 5:00 PM Timo Walther  wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I would like to start a discussion on FLIP-376: Add DISTRIBUTED BY
> > > > clause [1].
> > > >
> > > > Many SQL vendors expose the concepts of Partitioning, Bucketing, and
> > > > Clustering. This FLIP continues the work of previous FLIPs and would
> > > > like to introduce the concept of "Bucketing" to Flink.
> > > >
> > > > This is a pure connector characteristic and helps both Apache Kafka and
> > > > Apache Paimon connectors in avoiding a complex WITH clause by providing
> > > > improved syntax.
> > > >
> > > > Here is an example:
> > > >
> > > > CREATE TABLE MyTable
> > > >(
> > > >  uid BIGINT,
> > > >  name STRING
> > > >)
> > > >DISTRIBUTED BY (uid) INTO 6 BUCKETS
> > > >WITH (
> > > >  'connector' = 'kafka'
> > > >)
> > > >
> > > > The full syntax specification can be found in the document. The clause
> > > > should be optional and fully backwards compatible.
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > > [1]
> > > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause
> > >
> >



-- 

Best,
Benchao Li


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Benchao Li
Great work, thanks everyone involved!

Rui Fan <1996fan...@gmail.com> 于2023年10月27日周五 10:16写道:
>
> Thanks for the great work!
>
> Best,
> Rui
>
> On Fri, Oct 27, 2023 at 10:03 AM Paul Lam  wrote:
>
> > Finally! Thanks to all!
> >
> > Best,
> > Paul Lam
> >
> > > 2023年10月27日 03:58,Alexander Fedulov  写道:
> > >
> > > Great work, thanks everyone!
> > >
> > > Best,
> > > Alexander
> > >
> > > On Thu, 26 Oct 2023 at 21:15, Martijn Visser 
> > > wrote:
> > >
> > >> Thank you all who have contributed!
> > >>
> > >> Op do 26 okt 2023 om 18:41 schreef Feng Jin 
> > >>
> > >>> Thanks for the great work! Congratulations
> > >>>
> > >>>
> > >>> Best,
> > >>> Feng Jin
> > >>>
> > >>> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu  wrote:
> > >>>
> > >>>> Congratulations, Well done!
> > >>>>
> > >>>> Best,
> > >>>> Leonard
> > >>>>
> > >>>> On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee 
> > >>>> wrote:
> > >>>>
> > >>>>> Thanks for the great work! Congrats all!
> > >>>>>
> > >>>>> Best,
> > >>>>> Lincoln Lee
> > >>>>>
> > >>>>>
> > >>>>> Jing Ge  于2023年10月27日周五 00:16写道:
> > >>>>>
> > >>>>>> The Apache Flink community is very happy to announce the release of
> > >>>>> Apache
> > >>>>>> Flink 1.18.0, which is the first release for the Apache Flink 1.18
> > >>>>> series.
> > >>>>>>
> > >>>>>> Apache Flink® is an open-source unified stream and batch data
> > >>>> processing
> > >>>>>> framework for distributed, high-performing, always-available, and
> > >>>>> accurate
> > >>>>>> data applications.
> > >>>>>>
> > >>>>>> The release is available for download at:
> > >>>>>> https://flink.apache.org/downloads.html
> > >>>>>>
> > >>>>>> Please check out the release blog post for an overview of the
> > >>>>> improvements
> > >>>>>> for this release:
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> > https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> > >>>>>>
> > >>>>>> The full release notes are available in Jira:
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> > >>>>>>
> > >>>>>> We would like to thank all contributors of the Apache Flink
> > >> community
> > >>>> who
> > >>>>>> made this release possible!
> > >>>>>>
> > >>>>>> Best regards,
> > >>>>>> Konstantin, Qingsheng, Sergey, and Jing
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> >



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-25 Thread Benchao Li
> >> …" to false.
> >>
> >> Regarding the priority of these two configurations, I believe that
> >> "table.optimizer.source.predicate"
> >> takes precedence over "scan.filter-push-down.enabled" and it exhibits the
> >> following behavior.
> >>
> >> 1. "table.optimizer.source.predicate" = "true" and
> >> "scan.filter-push-down.enabled" = "true"
> >> This is the default behavior, allowing filter pushdown for sources.
> >>
> >> 2. "table.optimizer.source.predicate" = "true" and
> >> "scan.filter-push-down.enabled" = "false"
> >> Allow the planner to perform predicate pushdown, but individual sources do
> >> not enable filter pushdown.
> >>
> >> 3. "table.optimizer.source.predicate" = "false"
> >> Predicate pushdown is not allowed for the planner.
> >> Regardless of the value of the "scan.filter-push-down.enabled"
> >> configuration, filter pushdown is disabled.
> >> In this scenario, the behavior remains consistent with the old version as
> >> well.
> >>
> >>
> >> From an implementation perspective, setting the priority of
> >> "scan.filter-push-down.enabled" higher than
> >> "table.optimizer.source.predicate" is difficult to achieve now.
> >> Because the PushFilterIntoSourceScanRuleBase at the planner level takes
> >> precedence over the source-level FilterPushDownSpec.
> >> Only when the PushFilterIntoSourceScanRuleBase is enabled, will the
> >> Source-level filter pushdown be performed.
> >>
> >> Additionally, in my opinion, there doesn't seem to be much benefit in
> >> setting a higher priority for "scan.filter-push-down.enabled".
> >> It may instead affect compatibility and increase implementation complexity.
> >>
> >> WDYT?
> >>
> >> Best,
> >> Jiabao
> >>
> >>
> >>> 2023年10月25日 11:56,Benchao Li  写道:
> >>>
> >>> I agree with Jane that fine-grained configurations should have higher
> >>> priority than job level configurations.
> >>>
> >>> For current proposal, we can achieve that:
> >>> - Set "table.optimizer.source.predicate" = "true" to enable by
> >>> default, and set ""scan.filter-push-down.enabled" = "false" to disable
> >>> it per table source
> >>> - Set "table.optimizer.source.predicate" = "false" to disable by
> >>> default, and set ""scan.filter-push-down.enabled" = "true" to enable
> >>> it per table source
> >>>
> >>> Jane Chan  于2023年10月24日周二 23:55写道:
> >>>>
> >>>>>
> >>>>> I believe that the configuration "table.optimizer.source.predicate"
> >> has a
> >>>>> higher priority at the planner level than the configuration at the
> >> source
> >>>>> level,
> >>>>> and it seems easy to implement now.
> >>>>>
> >>>>
> >>>> Correct me if I'm wrong, but I think the fine-grained configuration
> >>>> "scan.filter-push-down.enabled" should have a higher priority because
> >> the
> >>>> default value of "table.optimizer.source.predicate" is true. As a
> >> result,
> >>>> turning off filter push-down for a specific source will not take effect
> >>>> unless the default value of "table.optimizer.source.predicate" is
> >> changed
> >>>> to false, or, alternatively, let users manually set
> >>>> "table.optimizer.source.predicate" to false first and then selectively
> >>>> enable filter push-down for the desired sources, which is less
> >> intuitive.
> >>>> WDYT?
> >>>>
> >>>> Best,
> >>>> Jane
> >>>>
> >>>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun  >> .invalid>
> >>>> wrote:
> >>>>
> >>>>> Thanks Jane,
> >>>>>
> >>>>> I believe that the configuration "table.optimizer.source.predicate"
> >> has a
> >>>>> higher priority at the planner level than the configuration at the
> >> source
> >>>>> level,
> >>>>> and it seems easy

Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Benchao Li
> > >> I would like to start a discussion on FLIP-377: support configuration
> > >> to
> > >>>> disable filter pushdown for Table/SQL Sources[1].
> > >>>>>
> > >>>>> Currently, Flink Table/SQL does not expose fine-grained control for
> > >>>> users to enable or disable filter pushdown.
> > >>>>> However, filter pushdown has some side effects, such as additional
> > >>>> computational pressure on external systems.
> > >>>>> Moreover, Improper queries can lead to issues such as full table
> > scans,
> > >>>> which in turn can impact the stability of external systems.
> > >>>>>
> > >>>>> Suppose we have an SQL query with two sources: Kafka and a database.
> > >>>>> The database is sensitive to pressure, and we want to configure it to
> > >>>> not perform filter pushdown to the database source.
> > >>>>> However, we still want to perform filter pushdown to the Kafka source
> > >> to
> > >>>> decrease network IO.
> > >>>>>
> > >>>>> I propose to support configuration to disable filter push down for
> > >>>> Table/SQL sources to let user decide whether to perform filter
> > pushdown.
> > >>>>>
> > >>>>> Looking forward to your feedback.
> > >>>>>
> > >>>>> [1]
> > >>>>
> > >>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> > >>>>>
> > >>>>> Best,
> > >>>>> Jiabao
> > >>>>
> > >>>>
> > >>
> > >>
> >
> >



-- 

Best,
Benchao Li
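

For context, a sketch of how the two levels of configuration debated in this thread would combine. The option names ('table.optimizer.source.predicate' and 'scan.filter-push-down.enabled') are the ones discussed above; the table definition is hypothetical:

```sql
-- Job level: planner-wide predicate pushdown switch (default 'true').
SET 'table.optimizer.source.predicate' = 'true';

-- Per-table: opt a pressure-sensitive source out of filter pushdown,
-- while other sources (e.g. Kafka) keep pushing filters down.
CREATE TABLE db_source (
  id BIGINT,
  name STRING
) WITH (
  'connector' = 'jdbc',
  'scan.filter-push-down.enabled' = 'false'
);
```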


Re: [VOTE] FLIP-373: Support Configuring Different State TTLs using SQL Hint

2023-10-22 Thread Benchao Li
+1 (binding)

Feng Jin  于2023年10月23日周一 13:07写道:
>
> +1(non-binding)
>
>
> Best,
> Feng
>
>
> On Mon, Oct 23, 2023 at 11:58 AM Xuyang  wrote:
>
> > +1(non-binding)
> >
> >
> >
> >
> > --
> >
> > Best!
> > Xuyang
> >
> >
> >
> >
> >
> > At 2023-10-23 11:38:15, "Jane Chan"  wrote:
> > >Hi developers,
> > >
> > >Thanks for all the feedback on FLIP-373: Support Configuring Different
> > >State TTLs using SQL Hint [1].
> > >Based on the discussion [2], we have reached a consensus, so I'd like to
> > >start a vote.
> > >
> > >The vote will last for at least 72 hours (Oct. 26th at 10:00 A.M. GMT)
> > >unless there is an objection or insufficient votes.
> > >
> > >[1]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-373%3A+Support+Configuring+Different+State+TTLs+using+SQL+Hint
> > >[2] https://lists.apache.org/thread/3s69dhv3rp4s0kysnslqbvyqo3qf7zq5
> > >
> > >Best,
> > >Jane
> >



-- 

Best,
Benchao Li


Re: Re: [DISCUSS] FLIP-373: Support Configuring Different State TTLs using SQL Hint

2023-10-22 Thread Benchao Li
> > > > > > > > > Hi, Jane.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think this syntax will be easier for users to set operator
> > > > ttl. So
> > > > > > > big
> > > > > > > > > +1. I left some minor comments here.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I notice that using STATE_TTL hints wrongly will not throw
> > any
> > > > > > > exceptions.
> > > > > > > > > But it seems that in the current join hint scenario,
> > > > > > > > > if user uses an unknown table name as the chosen side, a
> > > > validation
> > > > > > > > > exception will be thrown.
> > > > > > > > > Maybe we should distinguish which exceptions need to be
> > thrown
> > > > > > > explicitly.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Best!
> > > > > > > > > Xuyang
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > At 2023-10-10 18:23:55, "Jane Chan" 
> > > > wrote:
> > > > > > > > > >Hi Feng,
> > > > > > > > > >
> > > > > > > > > >Thank you for your valuable comments. The reason for not
> > > > including
> > > > > > the
> > > > > > > > > >scenarios above is as follows:
> > > > > > > > > >
> > > > > > > > > >For <1>, the automatically inferred stateful operators are
> > not
> > > > > > easily
> > > > > > > > > >expressible in SQL. This issue was discussed in FLIP-292,
> > and
> > > > > > besides
> > > > > > > > > >ChangelogNormalize, SinkUpsertMateralizer also faces the
> > same
> > > > > > problem.
> > > > > > > > > >
> > > > > > > > > >For <2> and <3>, the challenge lies in internal
> > implementation.
> > > > > > > During the
> > > > > > > > > >default_rewrite phase, the row_number expression in
> > > > LogicalProject
> > > > > > is
> > > > > > > > > >transformed into LogicalWindow by Calcite's
> > > > > > > > > >CoreRules#PROJECT_TO_LOGICAL_PROJECT_AND_WINDOW. However,
> > > > > > > CalcRelSplitter
> > > > > > > > > >does not pass the hints as an input argument when creating
> > > > > > > LogicalWindow,
> > > > > > > > > >resulting in the loss of the hint at this step. To support
> > > > this, we
> > > > > > > may
> > > > > > > > > >need to rewrite some optimization rules in Calcite, which
> > could
> > > > be a
> > > > > > > > > >follow-up work if required.
> > > > > > > > > >
> > > > > > > > > >Best,
> > > > > > > > > >Jane
> > > > > > > > > >
> > > > > > > > > >On Tue, Oct 10, 2023 at 1:40 AM Feng Jin <
> > jinfeng1...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Hi Jane,
> > > > > > > > > >>
> > > > > > > > > >> Thank you for proposing this FLIP.
> > > > > > > > > >>
> > > > > > > > > >> I believe that this FLIP will greatly enhance the
> > flexibility
> > > > of
> > > > > > > setting
> > > > > > > > > >> state, and by setting different operators' TTL, it can
> > also
> > > > > > > increase job
> > > > > > > > > >> stability, especially in regular join scenarios.
> > > > > > > > > >> The parameter design is very concise, big +1 for this,
> > and it
> > > > is
> > > > > > > also
> > > > > > > > > >> relatively easy to use for users.
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> I have a small question: in the FLIP, it only mentions
> > join
> > > > and
> > > > > > > group.
> > > > > > > > > >> Should we also consider other scenarios?
> > > > > > > > > >>
> > > > > > > > > >> 1. the auto generated deduplicate operator[1].
> > > > > > > > > >> 2. the deduplicate query[2].
> > > > > > > > > >> 3. the topN query[3].
> > > > > > > > > >>
> > > > > > > > > >> [1]
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-exec-source-cdc-events-duplicate
> > > > > > > > > >> [2]
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/deduplication/
> > > > > > > > > >> [3]
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/topn/
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> Best,
> > > > > > > > > >> Feng
> > > > > > > > > >>
> > > > > > > > > >> On Sun, Oct 8, 2023 at 5:53 PM Jane Chan <
> > > > qingyue@gmail.com>
> > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi devs,
> > > > > > > > > >> >
> > > > > > > > > >> > I'd like to initiate a discussion on FLIP-373: Support
> > > > > > Configuring
> > > > > > > > > >> > Different State TTLs using SQL Hint [1]. This proposal
> > is
> > > > on top
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > >> > FLIP-292 [2] to address typical scenarios with
> > unambiguous
> > > > > > > semantics
> > > > > > > > > and
> > > > > > > > > >> > hint propagation.
> > > > > > > > > >> >
> > > > > > > > > >> > I'm looking forward to your opinions!
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > [1]
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-373%3A+Support+Configuring+Different+State+TTLs+using+SQL+Hint
> > > > > > > > > >> > [2]
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-292%3A+Enhance+COMPILED+PLAN+to+support+operator-level+state+TTL+configuration
> > > > > > > > > >> >
> > > > > > > > > >> > Best,
> > > > > > > > > >> > Jane
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best
> > > > > >
> > > > > > ConradJam
> > > > > >
> > > >
> >



-- 

Best,
Benchao Li
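
A minimal sketch of the hint discussed in this thread, based on the FLIP-373 proposal (illustrative syntax only; the table names `Orders` and `Customers` are hypothetical):

```sql
-- Sketch of the STATE_TTL hint proposed in FLIP-373. Per-input TTLs
-- override the global table.exec.state.ttl for the stateful operator
-- produced by the query.

-- Different TTLs for the two inputs of a regular join:
SELECT /*+ STATE_TTL('Orders' = '1d', 'Customers' = '7d') */ *
FROM Orders
LEFT JOIN Customers ON Orders.customer_id = Customers.id;

-- A shorter TTL for a group aggregation:
SELECT /*+ STATE_TTL('Orders' = '12h') */ customer_id, COUNT(*) AS cnt
FROM Orders
GROUP BY customer_id;
```

As the thread notes, whether the hint should also cover deduplication and Top-N state was still under discussion.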


Re: [ANNOUNCE] New Apache Flink Committer - Ron Liu

2023-10-16 Thread Benchao Li
Congrats, Ron! Well deserved!

On Mon, Oct 16, 2023 at 2:19 PM Leonard Xu  wrote:
>
> Congratulations, Ron!
>
> Best,
> Leonard
>
> > 2023年10月16日 下午2:10,Zakelly Lan  写道:
> >
> > Congratulations, Ron!
> >
> > Best,
> > Zakelly
> >
> > On Mon, Oct 16, 2023 at 12:04 PM Weihua Hu  wrote:
> >>
> >> Congrats, Ron!
> >>
> >> Best,
> >> Weihua
> >>
> >>
> >> On Mon, Oct 16, 2023 at 11:50 AM Yangze Guo  wrote:
> >>
> >>> Congrats, Ron!
> >>>
> >>> Best,
> >>> Yangze Guo
> >>>
> >>> On Mon, Oct 16, 2023 at 11:48 AM Matt Wang  wrote:
> >>>>
> >>>> Congratulations, Ron!
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> Best,
> >>>> Matt Wang
> >>>>
> >>>>
> >>>>  Replied Message 
> >>>> | From | Feng Jin |
> >>>> | Date | 10/16/2023 11:29 |
> >>>> | To |  |
> >>>> | Subject | Re: [ANNOUNCE] New Apache Flink Committer - Ron Liu |
> >>>> Congratulations, Ron!
> >>>>
> >>>> Best,
> >>>> Feng
> >>>>
> >>>> On Mon, Oct 16, 2023 at 11:22 AM yh z  wrote:
> >>>>
> >>>> Congratulations, Ron!
> >>>>
> >>>> Best,
> >>>> Yunhong (SwuferHong)
> >>>>
> >>>> On Mon, Oct 16, 2023 at 11:12 AM Yuxin Tan  wrote:
> >>>>
> >>>> Congratulations, Ron!
> >>>>
> >>>> Best,
> >>>> Yuxin
> >>>>
> >>>>
> >>>> On Mon, Oct 16, 2023 at 10:24 AM Junrui Lee  wrote:
> >>>>
> >>>> Congratulations Ron !
> >>>>
> >>>> Best,
> >>>> Junrui
> >>>>
> >>>> On Mon, Oct 16, 2023 at 10:22 AM Yun Tang  wrote:
> >>>>
> >>>> Congratulations, Ron!
> >>>>
> >>>> Best
> >>>> Yun Tang
> >>>> 
> >>>> From: yu zelin 
> >>>> Sent: Monday, October 16, 2023 10:16
> >>>> To: dev@flink.apache.org 
> >>>> Cc: ron9@gmail.com 
> >>>> Subject: Re: [ANNOUNCE] New Apache Flink Committer - Ron Liu
> >>>>
> >>>> Congratulations!
> >>>>
> >>>> Best,
> >>>> Yu Zelin
> >>>>
> >>>> On Oct 16, 2023, at 9:56 AM, Jark Wu  wrote:
> >>>>
> >>>> Hi, everyone
> >>>>
> >>>> On behalf of the PMC, I'm very happy to announce Ron Liu as a new
> >>>> Flink
> >>>> Committer.
> >>>>
> >>>> Ron has been continuously contributing to the Flink project for
> >>>> many
> >>>> years,
> >>>> authored and reviewed a lot of code. He mainly works on Flink SQL
> >>>> parts
> >>>> and drove several important FLIPs, e.g., USING JAR (FLIP-214),
> >>>> Operator
> >>>> Fusion CodeGen (FLIP-315), Runtime Filter (FLIP-324). He has a
> >>>> great
> >>>> knowledge of the Batch SQL and improved a lot of batch performance
> >>>> in
> >>>> the
> >>>> past several releases. He is also quite active in mailing lists,
> >>>> participating in discussions and answering user questions.
> >>>>
> >>>> Please join me in congratulating Ron Liu for becoming a Flink
> >>>> Committer!
> >>>>
> >>>> Best,
> >>>> Jark Wu (on behalf of the Flink PMC)
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
>


-- 

Best,
Benchao Li


Re: [ANNOUNCE] New Apache Flink Committer - Jane Chan

2023-10-16 Thread Benchao Li
Congrats, Jane! Well deserved~

On Mon, Oct 16, 2023 at 2:19 PM Leonard Xu  wrote:
>
>
> > Congrats! I noticed Jane has been around for a while; well-deserved.
> >
> > Best,
> > tison.
>
> +1
>
> Best,
> Leonard
>
>
>
> >
> >
> > On Mon, Oct 16, 2023 at 9:58 AM Jark Wu  wrote:
> >
> >> Hi, everyone
> >>
> >> On behalf of the PMC, I'm very happy to announce Jane Chan as a new Flink
> >> Committer.
> >>
> >> Jane started code contribution in Jan 2021 and has been active in the Flink
> >> community since. She authored more than 60 PRs and reviewed more than 40
> >> PRs. Her contribution mainly revolves around Flink SQL, including Plan
> >> Advice (FLIP-280), operator-level state TTL (FLIP-292), and ALTER TABLE
> >> statements (FLINK-21634). Jane participated deeply in development
> >> discussions and also helped answer user question emails. Jane was also a
> >> core contributor of Flink Table Store (now Paimon) when the project was in
> >> the early days.
> >>
> >> Please join me in congratulating Jane Chan for becoming a Flink Committer!
> >>
> >> Best,
> >> Jark Wu (on behalf of the Flink PMC)
> >>
>


-- 

Best,
Benchao Li


Re: [VOTE] FLIP-374: Adding a separate configuration for specifying Java Options of the SQL Gateway

2023-10-10 Thread Benchao Li
+1 (binding)

On Wed, Oct 11, 2023 at 10:17 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> +1(binding)
>
> Best,
> Rui
>
> On Wed, Oct 11, 2023 at 10:07 AM Yangze Guo  wrote:
>
> > Hi everyone,
> >
> > I'd like to start the vote of FLIP-374 [1]. This FLIP is discussed in
> > the thread [2].
> >
> > The vote will be open for at least 72 hours. Unless there is an
> > objection, I'll try to close it by October 16, 2023 if we have
> > received sufficient votes.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-374%3A+Adding+a+separate+configuration+for+specifying+Java+Options+of+the+SQL+Gateway
> > [2] https://lists.apache.org/thread/g4vl8mgnwgl7vjyvjy6zrc8w54b2lthv
> >
> > Best,
> > Yangze Guo
> >



-- 

Best,
Benchao Li


Re: [VOTE] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-10-08 Thread Benchao Li
+1 (binding)

On Mon, Oct 9, 2023 at 10:20 AM Zhanghao Chen  wrote:
>
> Hi All,
>
> Thanks for all the feedback on FLIP-367: Support Setting Parallelism for 
> Table/SQL Sources [1][2].
>
> I'd like to start a vote for FLIP-367. The vote will be open until Oct 12th 
> 12:00 PM GMT) unless there is an objection or insufficient votes.
>
> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150
> [2] https://lists.apache.org/thread/gtpswl42jzv0c9o3clwqskpllnw8rh87
>
> Best,
> Zhanghao Chen



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-374: Adding a separate configuration for specifying Java Options of the SQL Gateway

2023-10-07 Thread Benchao Li
Thanks Yangze for preparing this FLIP. It's good to have this ability
for the gateway since, as Rui mentioned, we already have it for the other
JVM processes (client/JM/TM).

On Sat, Oct 7, 2023 at 6:02 PM Rui Fan <1996fan...@gmail.com> wrote:
>
> Thanks to Yangze driving this proposal.
>
> `env.java.opts.xxx` is already supported for client, historyserver,
> jobmanager and taskmanager. And it's very useful for troubleshooting.
> So +1 for `env.java.opts.sql-gateway`.
>
> I have a minor question: doesn't the `env.java.opts.all` support
> sql-gateway?
> If yes, it's fine. If no, it's better to consider it to be the subtask of
> this FLIP.
>
> Best,
> Rui
>
>
> On Sat, Oct 7, 2023 at 4:35 PM xiangyu feng  wrote:
>
> > Thanks for initiating this discussion. With the development toward
> > streaming warehouses, the SQL Gateway will become more and more important.
> > Big +1 for specifying Java options separately for the SQL Gateway.
> >
> > Regards,
> > Xiangyu
> >
> > On Sat, Oct 7, 2023 at 3:24 PM Yangze Guo  wrote:
> >
> > > Hi, there,
> > >
> > > We'd like to start a discussion thread on "FLIP-374: Adding a separate
> > > configuration for specifying Java Options of the SQL Gateway"[1],
> > > where we propose adding a separate configuration option to specify the
> > > Java options for the SQL Gateway. This would allow users to fine-tune
> > > the memory settings, garbage collection behavior, and other relevant
> > > Java parameters specific to the SQL Gateway, ensuring optimal
> > > performance and stability in production environments.
> > >
> > > Looking forward to your feedback.
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-374%3A+Adding+a+separate+configuration+for+specifying+Java+Options+of+the+SQL+Gateway
> > >
> > > Best,
> > > Yangze Guo
> > >
> >



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-09-19 Thread Benchao Li
> > > To: dev@flink.apache.org 
> > > Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for
> > Table/SQL
> > > Sources
> > >
> > > + 1 Thanks for the FLIP and the discussion. I would like to ask whether
> > to
> > > use SQL Hint syntax to set this parallelism?
> > >
> > > On Fri, Sep 15, 2023 at 8:52 PM Martijn Visser  wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Thanks for the FLIP and the discussion. I find it exciting. Thanks for
> > > > pushing for this.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Fri, Sep 15, 2023 at 2:25 PM Chen Zhanghao <
> > zhanghao.c...@outlook.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi Jane,
> > > > >
> > > > > Thanks for the valuable suggestions.
> > > > >
> > > > > For Q1, it's indeed an issue. Some possible ideas include
> > introducing a
> > > > > fake transformation after the source that takes the global default
> > > > > parallelism, or simply make exec nodes to take the global default
> > > > > parallelism, but both ways prevent potential chaining opportunity and
> > > I'm
> > > > > not sure if that's good to go. We'll need to give deeper thoughts in
> > it
> > > > and
> > > > > polish our proposal. We're also more than glad to hear your inputs on
> > > it.
> > > > >
> > > > > For Q2, scan.parallelism will take high precedence, as the more
> > > specific
> > > > > config should take higher precedence.
> > > > >
> > > > > Best,
> > > > > Zhanghao Chen
> > > > > 
> > > > > From: Jane Chan 
> > > > > Sent: September 15, 2023, 11:56 AM
> > > > > To: dev@flink.apache.org 
> > > > > Cc: dewe...@outlook.com 
> > > > > Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL
> > > > > Sources
> > > > >
> > > > > Hi, Zhanghao, Dewei,
> > > > >
> > > > > Thanks for initiating this discussion. This feature is valuable in
> > > > > providing more flexibility for performance tuning for SQL pipelines.
> > > > >
> > > > > Here are my two cents,
> > > > >
> > > > > 1. In the FLIP, you mentioned concerns about the parallelism of the
> > > calc
> > > > > node and concluded to "leave the behavior unchanged for now."  This
> > > means
> > > > > that the calc node will use the parallelism of the source operator,
> > > > > regardless of whether the source parallelism is configured or not.
> > If I
> > > > > understand correctly, currently, except for the sink exec node (which
> > > has
> > > > > the ability to configure its own parallelism), the rest of the exec
> > > nodes
> > > > > accept its input parallelism. From the design, I didn't see the
> > details
> > > > > about coping with input and default parallelism for the rest of the
> > > exec
> > > > > nodes. Can you elaborate more about the details?
> > > > >
> > > > > 2. Does the configuration `table.exec.resource.default-parallelism`
> > > take
> > > > > precedence over `scan.parallelism`?
> > > > >
> > > > > Best,
> > > > > Jane
> > > > >
> > > > > On Fri, Sep 15, 2023 at 10:43 AM Yun Tang  wrote:
> > > > >
> > > > > > Thanks for creating this FLIP,
> > > > > >
> > > > > > Many users have demands to configure the source parallelism just as
> > > > > > configuring the sink parallelism via DDL. Look forward for this
> > > > feature.
> > > > > >
> > > > > > BTW, I think setting parallelism for each operator should also be
> > > > > > valuable. And this shall work with compiled plan [1] instead of
> > SQL's
> > > > > DDL.
> > > > > >
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-292%3A+Enhance+COMPILED+PLAN+to+support+operator-level+state+T

Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-09-14 Thread Benchao Li
Thanks Zhanghao, Dewei for preparing the FLIP,

I think this is a long-awaited feature, and I appreciate your effort,
especially the "Other concerns" part you listed.

Regarding the parallelism of transformations following the source
transformation, it's indeed a problem that we initially wanted to solve
when we introduced this feature internally. I'd like to hear more
opinions on this. Personally I'm OK with leaving it out of this FLIP for
the time being.

On Thu, Sep 14, 2023 at 2:46 PM Chen Zhanghao  wrote:
>
> Hi Devs,
>
> Dewei (cced) and I would like to start a discussion on FLIP-367: Support 
> Setting Parallelism for Table/SQL Sources [1].
>
> Currently, Flink Table/SQL jobs do not expose fine-grained control of 
> operator parallelism to users. FLIP-146 [2] brings us support for setting 
> parallelism for sinks, but except for that, one can only set a default global 
> parallelism and all other operators share the same parallelism. However, in 
> many cases, setting parallelism for sources individually is preferable:
>
> - Many connectors have an upper bound on parallelism to efficiently ingest data. 
> For example, the parallelism of a Kafka source is bound by the number of 
> partitions, any extra tasks would be idle.
> - Other operators may involve intensive computation and need a larger 
> parallelism.
>
> We propose to improve the current situation by extending the current table 
> source API to support setting parallelism for Table/SQL sources via connector 
> options.
>
> Looking forward to your feedback.
>
> [1] FLIP-367: Support Setting Parallelism for Table/SQL Sources - Apache 
> Flink - Apache Software 
> Foundation<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150>
> [2] FLIP-146: Improve new TableSource and TableSink interfaces - Apache Flink 
> - Apache Software 
> Foundation<https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces>
>
> Best,
> Zhanghao Chen



-- 

Best,
Benchao Li
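
The proposal reduces to a per-table connector option; a minimal DDL sketch (the `scan.parallelism` option name follows FLIP-367, while all other table properties are hypothetical):

```sql
-- Illustrative DDL per FLIP-367: cap the source at the Kafka partition
-- count while the rest of the pipeline keeps the default parallelism.
CREATE TABLE orders (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DECIMAL(10, 2)
) WITH (
    'connector' = 'kafka',
    'topic' = 'orders',
    'properties.bootstrap.servers' = 'localhost:9092',
    'format' = 'json',
    -- proposed source-specific parallelism, independent of the global default
    'scan.parallelism' = '4'
);
```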


Re: [VOTE] FLIP-348: Make expanding behavior of virtual metadata columns configurable

2023-08-31 Thread Benchao Li
+1 (binding)

On Thu, Aug 31, 2023 at 3:24 PM Martijn Visser  wrote:
>
> +1 (binding)
>
> On Thu, Aug 31, 2023 at 9:09 AM Timo Walther  wrote:
>
> > Hi everyone,
> >
> > I'd like to start a vote on FLIP-348: Make expanding behavior of virtual
> > metadata columns configurable [1] which has been discussed in this
> > thread [2].
> >
> > The vote will be open for at least 72 hours unless there is an objection
> > or not enough votes.
> >
> > [1] https://cwiki.apache.org/confluence/x/_o6zDw
> > [2] https://lists.apache.org/thread/zon967w7synby8z6m1s7dj71dhkh9ccy
> >
> > Cheers,
> > Timo
> >



-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API

2023-08-20 Thread Benchao Li
> >>>>>> SELECT *
> >>>>>> FROM (SELECT * FROM employees ORDER BY employee_id)
> >>>>>> WHERE ROWNUM < 11;
> >>>>>>
> >>>>>> However, IIUC, the proposed "$rowtime" pseudo-column can only be got
> >>>>>> from
> >>>>>> the physical table
> >>>>>> and can't be got from queries even if the query propagates the
> rowtime
> >>>>>> attribute. There was also
> >>>>>> a discussion about adding a pseudo-column "_proctime" [2] to make
> >>>>>> lookup
> >>>>>> join easier to use
> >>>>>> which can be got from arbitrary queries. That "_proctime" may
> conflict
> >>>>> with
> >>>>>> the proposed
> >>>>>> pseudo-column concept.
> >>>>>>
> >>>>>> Did you consider making it as a built-in defined pseudo-column
> >>>>>> "$rowtime"
> >>>>>> which returns the
> >>>>>> time attribute value (if exists) or null (if non-exists) for every
> >>>>>> table/query, and pseudo-column
> >>>>>> "$proctime" always returns PROCTIME() value for each table/query. In
> >>>>>> this
> >>>>>> way, catalogs only need
> >>>>>> to provide a default rowtime attribute and users can get it in the
> >> same
> >>>>>> way. And we don't need
> >>>>>> to introduce the contract interface of "Metadata Key Prefix
> >> Constraint"
> >>>>>> which is still a little complex
> >>>>>> for users and devs to understand.
> >>>>>>
> >>>>>> Best,
> >>>>>> Jark
> >>>>>>
> >>>>>> [1]:
> >>>>>>
> >>>>>
> >>
> https://docs.oracle.com/cd/E11882_01/server.112/e41084/pseudocolumns009.htm#SQLRF00255
> >>>>>> [2]:
> https://lists.apache.org/thread/7ln106qxyw8sp7ljq40hs2p1lb1gdwj5
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Fri, 28 Jul 2023 at 06:18, Alexey Leonov-Vendrovskiy <
> >>>>>> vendrov...@gmail.com> wrote:
> >>>>>>
> >>>>>>>>
> >>>>>>>> `SELECT * FROM (SELECT $rowtime, * FROM t);`
> >>>>>>>> Am I right that it will show `$rowtime` in output ?
> >>>>>>>
> >>>>>>>
> >>>>>>> Yes, all explicitly selected columns become a part of the result
> (and
> >>>>>>> intermediate) schema, and hence propagate.
> >>>>>>>
> >>>>>>> On Thu, Jul 27, 2023 at 2:40 PM Alexey Leonov-Vendrovskiy <
> >>>>>>> vendrov...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Thank you, Timo, for starting this FLIP!
> >>>>>>>>
> >>>>>>>> I propose the following change:
> >>>>>>>>
> >>>>>>>> Remove the requirement that DESCRIBE need to show system columns.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Some concrete vendor specific catalog implementations might prefer
> >>>>>>>> this
> >>>>>>>> approach.
> >>>>>>>> Usually the same system columns are available on all (or family)
> of
> >>>>>>>> tables, and it can be easily captured in the documentation.
> >>>>>>>>
> >>>>>>>> For example, BigQuery does exactly this: there, pseudo-columns do
> >> not
> >>>>>>> show
> >>>>>>>> up in the table schema in any place, but can be accessed via
> >>>>>>>> reference.
> >>>>>>>>
> >>>>>>>> So I propose we:
> >>>>>>>> a) Either we say that DESCRIBE doesn't show system columns,
> >>>>>>>> b) Or leave this vendor-specific / or configurable via flag (if
> >>>>> needed).
> >>>>>>>>
> >>>>>>>> Regards,

Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei

2023-08-09 Thread Benchao Li
Yanfei Lei
> > > > > > >
> > > > > > > Congratulations and welcome, Yanfei!
> > > > > > >
> > > > > > > Best,
> > > > > > > Qingsheng
> > > > > > >
> > > > > > > On Mon, Aug 7, 2023 at 4:19 PM Matthias Pohl <
> > matthias.p...@aiven.io
> > > > > > > .invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > Congratulations, Yanfei! :)
> > > > > > >
> > > > > > > On Mon, Aug 7, 2023 at 10:00 AM Junrui Lee <
> jrlee@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > Congratulations Yanfei!
> > > > > > >
> > > > > > > Best,
> > > > > > > Junrui
> > > > > > >
> > > > > > > On Mon, Aug 7, 2023 at 3:19 PM Yun Tang  wrote:
> > > > > > >
> > > > > > > Congratulations, Yanfei!
> > > > > > >
> > > > > > > Best
> > > > > > > Yun Tang
> > > > > > > 
> > > > > > > From: Danny Cranmer 
> > > > > > > Sent: Monday, August 7, 2023 15:10
> > > > > > > To: dev 
> > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei
> > > > > > >
> > > > > > > Congrats Yanfei! Welcome to the team.
> > > > > > >
> > > > > > > Danny
> > > > > > >
> > > > > > > On Mon, 7 Aug 2023, 08:03 Rui Fan, <1996fan...@gmail.com>
> wrote:
> > > > > > >
> > > > > > > Congratulations Yanfei!
> > > > > > >
> > > > > > > Best,
> > > > > > > Rui
> > > > > > >
> > > > > > > On Mon, Aug 7, 2023 at 2:56 PM Yuan Mei <
> yuanmei.w...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > On behalf of the PMC, I'm happy to announce Yanfei Lei as a new
> > > > > > > Flink
> > > > > > > Committer.
> > > > > > >
> > > > > > > Yanfei has been active in the Flink community for almost two
> > > > > > > years
> > > > > > > and
> > > > > > > has
> > > > > > > played an important role in developing and maintaining State
> > > > > > > and
> > > > > > > Checkpoint
> > > > > > > related features/components, including RocksDB Rescaling
> > > > > > > Performance
> > > > > > > Improvement and Generic Incremental Checkpoints.
> > > > > > >
> > > > > > > Yanfei also helps improve community infrastructure in many
> > > > > > > ways,
> > > > > > > including
> > > > > > > migrating the Flink Daily performance benchmark to the Apache
> > > > > > > Flink
> > > > > > > slack
> > > > > > > channel. She is the maintainer of the benchmark and has
> > > > > > > improved
> > > > > > > its
> > > > > > > detection stability significantly. She is also one of the major
> > > > > > > maintainers
> > > > > > > of the FrocksDB Repo and released FRocksDB 6.20.3 (part of
> > > > > > > Flink
> > > > > > > 1.17
> > > > > > > release). Yanfei is a very active community member, supporting
> > > > > > > users
> > > > > > > and
> > > > > > > participating
> > > > > > > in tons of discussions on the mailing lists.
> > > > > > >
> > > > > > > Please join me in congratulating Yanfei for becoming a Flink
> > > > > > > Committer!
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Yuan Mei (on behalf of the Flink PMC)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > >
> >
>


-- 

Best,
Benchao Li


Re: [ANNOUNCE] New Apache Flink Committer - Hangxiang Yu

2023-08-09 Thread Benchao Li
Congrats, Hangxiang!

On Tue, Aug 8, 2023 at 5:44 PM Jing Ge  wrote:

> Congrats, Hangxiang!
>
> Best regards,
> Jing
>
> On Tue, Aug 8, 2023 at 3:04 PM Yangze Guo  wrote:
>
> > Congrats, Hangxiang!
> >
> > Best,
> > Yangze Guo
> >
> > On Tue, Aug 8, 2023 at 11:28 AM yh z  wrote:
> > >
> > > Congratulations, Hangxiang !
> > >
> > >
> > > Best,
> > > Yunhong Zheng (Swuferhong)
> > >
> > > On Tue, Aug 8, 2023 at 9:20 AM yuxia  wrote:
> > >
> > > > Congratulations, Hangxiang !
> > > >
> > > > Best regards,
> > > > Yuxia
> > > >
> > > > ----- Original Message -----
> > > > From: "Wencong Liu" 
> > > > To: "dev" 
> > > > Sent: Monday, August 7, 2023, 11:55:24 PM
> > > > Subject: Re:[ANNOUNCE] New Apache Flink Committer - Hangxiang Yu
> > > >
> > > > Congratulations, Hangxiang !
> > > >
> > > >
> > > > Best,
> > > > Wencong
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > At 2023-08-07 14:57:49, "Yuan Mei"  wrote:
> > > > >On behalf of the PMC, I'm happy to announce Hangxiang Yu as a new
> > Flink
> > > > >Committer.
> > > > >
> > > > >Hangxiang has been active in the Flink community for more than 1.5
> > years
> > > > >and has played an important role in developing and maintaining State
> > and
> > > > >Checkpoint related features/components, including Generic
> Incremental
> > > > >Checkpoints (take great efforts to make the feature prod-ready).
> > Hangxiang
> > > > >is also the main driver of the FLIP-263: Resolving schema
> > compatibility.
> > > > >
> > > > >Hangxiang is passionate about the Flink community. Besides the
> > technical
> > > > >contribution above, he is also actively promoting Flink: talks about
> > > > Generic
> > > > >Incremental Checkpoints in Flink Forward and Meet-up. Hangxiang also
> > spent
> > > > >a good amount of time supporting users, participating in
> Jira/mailing
> > list
> > > > >discussions, and reviewing code.
> > > > >
> > > > >Please join me in congratulating Hangxiang for becoming a Flink
> > Committer!
> > > > >
> > > > >Thanks,
> > > > >Yuan Mei (on behalf of the Flink PMC)
> > > >
> >
>


-- 

Best,
Benchao Li


Re: [ANNOUNCE] New Apache Flink PMC Member - Matthias Pohl

2023-08-09 Thread Benchao Li
Congratulations, Matthias!

On Tue, Aug 8, 2023 at 5:54 PM Maximilian Michels  wrote:

> Congrats, well done, and welcome to the PMC Matthias!
>
> -Max
>
> On Tue, Aug 8, 2023 at 8:36 AM yh z  wrote:
> >
> > Congratulations, Matthias!
> >
> > Best,
> > Yunhong Zheng (Swuferhong)
> >
> > On Mon, Aug 7, 2023 at 9:39 PM Ryan Skraba  wrote:
> >
> > > Congratulations Matthias -- very well-deserved, the community is lucky
> to
> > > have you <3
> > >
> > > All my best, Ryan
> > >
> > > On Mon, Aug 7, 2023 at 3:04 PM Lincoln Lee 
> wrote:
> > >
> > > > Congratulations!
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > On Mon, Aug 7, 2023 at 8:13 PM Feifan Wang  wrote:
> > > >
> > > > > Congrats Matthias!
> > > > >
> > > > >
> > > > >
> > > > > ——
> > > > > Name: Feifan Wang
> > > > > Email: zoltar9...@163.com
> > > > >
> > > > >
> > > > >  Replied Message 
> > > > > | From | Matthias Pohl |
> > > > > | Date | 08/7/2023 16:16 |
> > > > > | To |  |
> > > > > | Subject | Re: [ANNOUNCE] New Apache Flink PMC Member - Matthias
> Pohl
> > > |
> > > > > Thanks everyone. :)
> > > > >
> > > > > On Mon, Aug 7, 2023 at 3:18 AM Andriy Redko 
> wrote:
> > > > >
> > > > > Congrats Matthias, well deserved!!
> > > > >
> > > > > DC> Congrats Matthias!
> > > > >
> > > > > DC> Very well deserved, thankyou for your continuous, consistent
> > > > > contributions.
> > > > > DC> Welcome.
> > > > >
> > > > > DC> Thanks,
> > > > > DC> Danny
> > > > >
> > > > > DC> On Fri, Aug 4, 2023 at 9:30 AM Feng Jin  >
> > > > wrote:
> > > > >
> > > > > Congratulations, Matthias!
> > > > >
> > > > > Best regards
> > > > >
> > > > > Feng
> > > > >
> > > > > On Fri, Aug 4, 2023 at 4:29 PM weijie guo <
> guoweijieres...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > Congratulations, Matthias!
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Weijie
> > > > >
> > > > >
> > > > > On Fri, Aug 4, 2023 at 3:50 PM Wencong Liu  wrote:
> > > > >
> > > > > Congratulations, Matthias!
> > > > >
> > > > > Best,
> > > > > Wencong Liu
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > At 2023-08-04 11:18:00, "Xintong Song" 
> > > > > wrote:
> > > > > Hi everyone,
> > > > >
> > > > > On behalf of the PMC, I'm very happy to announce that Matthias Pohl
> > > > > has
> > > > > joined the Flink PMC!
> > > > >
> > > > > Matthias has been consistently contributing to the project since
> > > > > Sep
> > > > > 2020,
> > > > > and became a committer in Dec 2021. He mainly works in Flink's
> > > > > distributed
> > > > > coordination and high availability areas. He has worked on many
> > > > > FLIPs
> > > > > including FLIP195/270/285. He helped a lot with the release
> > > > > management,
> > > > > being one of the Flink 1.17 release managers and also very active
> > > > > in
> > > > > Flink
> > > > > 1.18 / 2.0 efforts. He also contributed a lot to improving the
> > > > > build
> > > > > stability.
> > > > >
> > > > > Please join me in congratulating Matthias!
> > > > >
> > > > > Best,
> > > > >
> > > > > Xintong (on behalf of the Apache Flink PMC)
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
>


-- 

Best,
Benchao Li


Re: [ANNOUNCE] New Apache Flink PMC Member - Matthias Pohl

2023-08-03 Thread Benchao Li
Congratulations, Matthias!

On Fri, Aug 4, 2023 at 12:35 PM Jing Ge  wrote:

> Congrats! Matthias!
>
> Best regards,
> Jing
>
> On Fri, Aug 4, 2023 at 12:09 PM Yangze Guo  wrote:
>
> > Congrats, Matthias!
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Aug 4, 2023 at 11:44 AM Qingsheng Ren  wrote:
> > >
> > > Congratulations, Matthias! This is absolutely well deserved.
> > >
> > > Best,
> > > Qingsheng
> > >
> > > On Fri, Aug 4, 2023 at 11:31 AM Rui Fan <1996fan...@gmail.com> wrote:
> > >
> > > > Congratulations Matthias, well deserved!
> > > >
> > > > Best,
> > > > Rui Fan
> > > >
> > > > On Fri, Aug 4, 2023 at 11:30 AM Leonard Xu 
> wrote:
> > > >
> > > > > Congratulations,  Matthias.
> > > > >
> > > > > Well deserved ^_^
> > > > >
> > > > > Best,
> > > > > Leonard
> > > > >
> > > > >
> > > > > > On Aug 4, 2023, at 11:18 AM, Xintong Song  >
> > > > wrote:
> > > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > On behalf of the PMC, I'm very happy to announce that Matthias
> > Pohl has
> > > > > > joined the Flink PMC!
> > > > > >
> > > > > > Matthias has been consistently contributing to the project since
> > Sep
> > > > > 2020,
> > > > > > and became a committer in Dec 2021. He mainly works in Flink's
> > > > > distributed
> > > > > > coordination and high availability areas. He has worked on many
> > FLIPs
> > > > > > including FLIP195/270/285. He helped a lot with the release
> > management,
> > > > > > being one of the Flink 1.17 release managers and also very active
> > in
> > > > > Flink
> > > > > > 1.18 / 2.0 efforts. He also contributed a lot to improving the
> > build
> > > > > > stability.
> > > > > >
> > > > > > Please join me in congratulating Matthias!
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Xintong (on behalf of the Apache Flink PMC)
> > > > >
> > > > >
> > > >
> >
>


-- 

Best,
Benchao Li


Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu

2023-08-03 Thread Benchao Li
Congratulations, Weihua!

On Fri, Aug 4, 2023 at 11:30 AM Leonard Xu  wrote:

> Congratulations Weihua!
>
>
> Best,
> Leonard
>
> > On Aug 4, 2023, at 11:18 AM, Xintong Song  wrote:
> >
> > Hi everyone,
> >
> > On behalf of the PMC, I'm very happy to announce Weihua Hu as a new Flink
> > Committer!
> >
> > Weihua has been consistently contributing to the project since May 2022.
> He
> > mainly works in Flink's distributed coordination areas. He is the main
> > contributor of FLIP-298 and many other improvements in large-scale job
> > scheduling. He is also quite active in mailing lists,
> > participating in discussions and answering user questions.
> >
> > Please join me in congratulating Weihua!
> >
> > Best,
> >
> > Xintong (on behalf of the Apache Flink PMC)
>
>

-- 

Best,
Benchao Li


Re: [ANNOUNCE] New Apache Flink Committer - Hong Teoh

2023-08-03 Thread Benchao Li
Congratulations, Hong!

On Fri, Aug 4, 2023 at 9:23 AM yuxia  wrote:

> Congratulations, Hong Teoh!
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Matthias Pohl" 
> To: "dev" 
> Sent: Thursday, August 3, 2023, 11:24:39 PM
> Subject: Re: [ANNOUNCE] New Apache Flink Committer - Hong Teoh
>
> Congratulations, Hong! :)
>
> On Thu, Aug 3, 2023 at 3:39 PM Leonard Xu  wrote:
>
> > Congratulations, Hong!
> >
> >
> > Best,
> > Leonard
> >
> > > On Aug 3, 2023, at 8:45 PM, Jiabao Sun  .INVALID>
> > wrote:
> > >
> > > Congratulations, Hong Teoh!
> > >
> > > Best,
> > > Jiabao Sun
> > >
> > >> On Aug 3, 2023, at 7:32 PM, Danny Cranmer  wrote:
> > >>
> > >> On behalf of the PMC, I'm very happy to announce Hong Teoh as a new
> > Flink
> > >> Committer.
> > >>
> > >> Hong has been active in the Flink community for over 1 year and has
> > played
> > >> a key role in developing and maintaining AWS integrations, core
> > connector
> > >> APIs and more recently, improvements to the Flink REST API.
> > Additionally,
> > >> Hong is a very active community member, supporting users and
> > participating
> > >> in discussions on the mailing lists, Flink slack channels and speaking
> > at
> > >> conferences.
> > >>
> > >> Please join me in congratulating Hong for becoming a Flink Committer!
> > >>
> > >> Thanks,
> > >> Danny Cranmer (on behalf of the Flink PMC)
> > >
> >
> >
>


-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API

2023-07-26 Thread Benchao Li
Hi Timo,

Thanks for the FLIP, I also like the idea and option 3 sounds good to me.

I would like to discuss a case that is not mentioned in the current FLIP:
how are "system columns" expressed in intermediate results, e.g. a Join?
For `SELECT * FROM t1 JOIN t2`, I guess the result should not include "system
columns" from t1 and t2, as you proposed, while `SELECT t1.$rowtime, *
FROM t1 JOIN t2` should still be valid.
Then the question is how you plan to implement "system columns": do
we need to add them at the `RelNode` level, or do we just need to handle them
in the parsing/validating phase?
I'm not sure that Calcite's "system column" feature is fully ready for this,
since the code for that part was imported from the earlier project before
it entered Apache, and has not received much attention in later
development.
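For illustration, the star-expansion behavior under discussion can be sketched in plain Python (this is not Flink or Calcite code; the column names and the `$rowtime` marker are assumptions for the example). System columns are skipped when `*` is expanded, but stay resolvable when referenced explicitly:

```python
def expand_star(columns, system_columns):
    """SELECT * expansion that skips system columns: they never appear
    implicitly, so a catalog can add one without breaking existing queries."""
    return [c for c in columns if c not in system_columns]

def resolve_select(select_list, columns, system_columns):
    """Resolve a SELECT list against a table's columns."""
    resolved = []
    for item in select_list:
        if item == "*":
            resolved.extend(expand_star(columns, system_columns))
        elif item in columns:
            # Explicit references work even for system columns.
            resolved.append(item)
        else:
            raise ValueError(f"unknown column: {item}")
    return resolved

cols = ["id", "name", "$rowtime"]   # "$rowtime" plays the system column
sys_cols = {"$rowtime"}
assert resolve_select(["*"], cols, sys_cols) == ["id", "name"]
assert resolve_select(["$rowtime", "*"], cols, sys_cols) == ["$rowtime", "id", "name"]
```

The same rule would apply to a join's intermediate schema: the expansion simply never emits the system columns of either input.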


On Wed, Jul 26, 2023 at 00:01, Jing Ge  wrote:

> Hi Timo,
>
> Thanks for your proposal. It is a very pragmatic feature. Among all options
> in the FLIP, option 3 is one I prefer too and I'd like to ask some
> questions to understand your thoughts.
>
> 1. I did some research on pseudo columns, just out of curiosity, do you
> know why most SQL systems do not need any prefix with their pseudo column?
> 2. Some platform providers will use ${variable_name} to define their own
> configurations and allow them to be embedded into SQL scripts. Will there
> be any conflict with option 3?
>
> Best regards,
> Jing
>
> On Tue, Jul 25, 2023 at 7:00 PM Konstantin Knauf 
> wrote:
>
> > Hi Timo,
> >
> > this makes sense to me. Option 3 seems reasonable, too.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Tue, Jul 25, 2023 at 12:53 PM, Timo Walther <
> > twal...@apache.org
> > >:
> >
> > > Hi everyone,
> > >
> > > I would like to start a discussion about introducing the concept of
> > > "System Columns" in SQL and Table API.
> > >
> > > The subject sounds bigger than it actually is. Luckily, Flink SQL
> > > already exposes the concept of metadata columns. And this proposal is
> > > just a slight adjustment for how metadata columns can be used as system
> > > columns.
> > >
> > > The biggest problem of metadata columns currently is that a catalog
> > > implementation can't provide them by default because they would affect
> > > `SELECT *` when adding another one.
> > >
> > > Looking forward to your feedback on FLIP-348:
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-348%3A+Support+System+Columns+in+SQL+and+Table+API
> > >
> > > Thanks,
> > > Timo
> > >
> >
> >
> > --
> > https://twitter.com/snntrable
> > https://github.com/knaufk
> >
>


-- 

Best,
Benchao Li


Re: [DISCUSSION] Revival of FLIP-154 (SQL Implicit Type Coercion)

2023-07-25 Thread Benchao Li
Hi Sergey,

There was a discussion[1] about this FLIP before.

I like this feature, thanks for driving this. I hope it does not need too
much customization (if there is any improvement that would benefit both
Calcite and Flink, it would be great to push it to the upstream
project as well, and I would like to help with that).

[1] https://lists.apache.org/thread/mglop6mgd36rrbjt7ojyd5sj8tyoc3c2

On Wed, Jul 19, 2023 at 06:18, Sergey Nuyanzin  wrote:

> Hello everyone
>
> I'd like to revive FLIP-154[1] a bit.
>
> I failed with finding any discussion/vote thread about it (please point me
> to that if it is present somewhere)
>
> Also FLIP itself looks abandoned (no activity for a couple of years),
> however it seems to be useful.
>
> I did a bit investigation about that
>
> From one side Calcite provides its own coercion... However, sometimes it
> behaves strangely and is not ready to use in Flink.
> for instance
> 1) it can not handle cases with `with` subqueries and fails with NPE (even
> if it's fixed it will come not earlier than with 1.36.0)
> 2) under the hood it uses hardcoded `cast`, especially it is an issue for
> equals where `cast` is invoked without fallback to `null`. In addition it
> tries to cast `string` to `date`. All this leads to failure for such
> queries like `select uuid() = null;` where it tries to cast the result of
> `uuid()` to date without a fallback option.
>
> The good thing is that Calcite also provides a custom TypeCoercionFactory
> which could be used during coercion (in case it is enabled). This could
> allow for instance to use `try_cast` instead of `cast`, embed the fix for
> aforementioned NPE, enable only required coercion rules. Also it will
> enable coercions rule by rule instead of big bang.
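The `try_cast`-with-fallback-to-null behavior mentioned above can be sketched in plain Python (a hypothetical illustration, not Calcite or Flink code): instead of failing the whole query when a value cannot be coerced, the cast yields SQL NULL.

```python
def try_cast_to_int(value):
    """Coerce a value to int, falling back to None (SQL NULL) instead of
    raising, which is the try_cast-style behavior discussed above."""
    if value is None:
        return None
    try:
        return int(str(value).strip())
    except ValueError:
        return None

assert try_cast_to_int("42") == 42
assert try_cast_to_int("not-a-number") is None  # a plain cast would fail here
assert try_cast_to_int(None) is None
```

This is why `try_cast` is attractive for implicit coercion: a query like `SELECT uuid() = null` can degrade to NULL rather than erroring out mid-cast.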
>
> I would volunteer to update the FLIP page with usage of a custom factory if
> there are no objections and come back with a discussion thread to revive
> the work on it.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-154%3A+SQL+Implicit+Type+Coercion
> --
> Best regards,
> Sergey
>


-- 

Best,
Benchao Li


Re: [DISCUSS] Connectors, Formats, and even User Code should also be pluggable.

2023-07-25 Thread Benchao Li
I agree with Jing that the current doc is quite preliminary, and needs to
be turned into a FLIP.

I'm very interested in this FLIP and looking forward to it. We've
suffered from reshading/relocating heavy connector dependencies for a long
time.

Besides, we've implemented a plugin mechanism internally to load Hive UDFs in
our batch SQL scenario; it saves us a lot of effort in handling
dependency conflicts. In our experience, the most challenging part is the
(de)serialization of the classes loaded through plugins. (I noticed you have
already considered this.)

On Fri, Jul 21, 2023 at 06:44, Jing Ge  wrote:

> Hi Zhiqiang,
>
> Thanks for your proposal. The idea is very interesting! Since almost all
> connectors are or will be externalized, the pluggable design you suggested
> could help reduce the development effort.
>
> As you mentioned, the attached doc contains only your preliminary idea. I
> would suggest that you turn it into a FLIP with more details and do
> the following:
>
> 1. Describe a real conflict case that will benefit from the pluggable
> design
> 2. Describe where you want to modify the code, or provide a POC branch
> 3. Guideline to tell users how to use it, i.e. where (the plugins dir)
> should the connector jar be located, what it looks like with an example,
> etc.
>
> WDYT?
>
> Best regards,
> Jing
>
>
> On Fri, Jul 14, 2023 at 3:00 PM zhiqiang li 
> wrote:
>
> > Hi devs,
> >
> > I have observed that in [1], connectors and formats are pluggable,
> > allowing user code to be easily integrated. The advantages of having
> > pluggable connectors are evident, as it helps avoid conflicts between
> > different versions of jar packages. If classloader isolation is not used,
> > shading becomes necessary to resolve conflicts, resulting in a
> significant
> > waste of development time. I understand that implementing this change may
> > require numerous API modifications, so I would like to discuss in this
> > email.
> >
> > > Plugins cannot access classes from other plugins or from Flink that
> have
> > not been specifically whitelisted.
> > > This strict isolation allows plugins to contain conflicting versions of
> > the same library without the need to relocate classes or to converge to
> > common versions.
> > > Currently, file systems and metric reporters are pluggable but, in the
> > future, connectors, formats, and even user code should also be pluggable.
> >
> > [2] It is my preliminary idea.
> >
> > [1]
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/overview/
> > [2]
> >
> https://docs.google.com/document/d/1XP2fBpcntK0YIdQ_Ax7JV2MhNdebvkFxSiNJRp6WQ24/edit?usp=sharing
> >
> >
> > Best,
> > Zhiqiang
> >
> >
>


-- 

Best,
Benchao Li


Re: [ANNOUNCE] New Apache Flink Committer - Yong Fang

2023-07-23 Thread Benchao Li
Congratulations, Yong! Well deserved!

On Mon, Jul 24, 2023 at 12:16, Yangze Guo  wrote:

> Congrats, Yong!
>
> Best,
> Yangze Guo
>
> On Mon, Jul 24, 2023 at 12:02 PM xiangyu feng 
> wrote:
> >
> > Congratulations, Yong!
> >
> > Best,
> > Xiangyu
> >
> > On Mon, Jul 24, 2023 at 11:48, liu ron  wrote:
> >
> > > Congratulations,
> > >
> > > Best,
> > > Ron
> > >
> > > On Mon, Jul 24, 2023 at 11:18, Qingsheng Ren  wrote:
> > >
> > > > Congratulations and welcome aboard, Yong!
> > > >
> > > > Best,
> > > > Qingsheng
> > > >
> > > > On Mon, Jul 24, 2023 at 11:14 AM Chen Zhanghao <
> > > zhanghao.c...@outlook.com>
> > > > wrote:
> > > >
> > > > > Congrats, Shammon!
> > > > >
> > > > > Best,
> > > > > Zhanghao Chen
> > > > > 
> > > > > From: Weihua Hu 
> > > > > Sent: July 24, 2023, 11:11
> > > > > To: dev@flink.apache.org 
> > > > > Cc: Shammon FY 
> > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yong Fang
> > > > >
> > > > > Congratulations!
> > > > >
> > > > > Best,
> > > > > Weihua
> > > > >
> > > > >
> > > > > On Mon, Jul 24, 2023 at 11:04 AM Paul Lam 
> > > wrote:
> > > > >
> > > > > > Congrats, Shammon!
> > > > > >
> > > > > > Best,
> > > > > > Paul Lam
> > > > > >
> > > > > > > On Jul 24, 2023, at 10:56, Jingsong Li  wrote:
> > > > > > >
> > > > > > > Shammon
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
>


-- 

Best,
Benchao Li


[jira] [Created] (FLINK-32547) Add missing doc for Timestamp support in ProtoBuf format

2023-07-05 Thread Benchao Li (Jira)
Benchao Li created FLINK-32547:
--

 Summary: Add missing doc for Timestamp support in ProtoBuf format
 Key: FLINK-32547
 URL: https://issues.apache.org/jira/browse/FLINK-32547
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.17.1
Reporter: Benchao Li


In FLINK-30093, we added support for the {{Timestamp}} type and added the doc for it,
but missed updating the English version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Feature requests for Flink protobuf deserialization

2023-07-05 Thread Benchao Li
Thanks for starting the discussion,

1. I'm +1 for this.
2. We have already supported this in [1]
3. I'm not sure about this; could you give more examples beyond cases
1 & 2?
4&5. I think we have already considered this with the option
'protobuf.read-default-values' [2]; is this what you want?

[1] https://issues.apache.org/jira/browse/FLINK-30093
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/table/formats/protobuf/#protobuf-read-default-values



On Wed, Jun 28, 2023 at 10:16, Adam Richardson  wrote:

> Hi there,
>
> My company is in the process of rebuilding some of our batch Spark-based
> ETL pipelines in Flink. We use protobuf to define our schemas. One major
> challenge is that Flink protobuf deserialization has some semantic
> differences with the ScalaPB encoders we use in our Spark systems. This
> poses a serious barrier for adoption as moving any given dataset from Spark
> to Flink will potentially break all downstream consumers. I have a long
> list of feature requests in this area:
>
>1. Support for mapping protobuf optional wrapper types (StringValue,
>IntValue, and friends) to nullable primitive types rather than RowTypes
>2. Support for mapping the protobuf Timestamp type to a real timestamp
>rather than RowType
>3. A way of defining custom mappings from specific proto types to custom
>Flink types (the previous two feature requests could be implemented on
> top
>of this lower-level feature)
>4. Support for nullability semantics for message types (in the status
>quo, an unset message is treated as equivalent to a message with default
>values for all fields, which is a confusing user experience)
>5. Support for nullability semantics for primitives types (in many of
>our services, the default value for a field of primitive type is
> treated as
>being equivalent to unset or null, so it would be good to offer this as
> a
>capability in the data warehouse)
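As a hypothetical illustration of items 1 and 4 above (this is not flink-protobuf code; the dict-based message model and field names are assumptions), presence-aware unwrapping of a protobuf wrapper field into a nullable value could look like:

```python
# Sketch: a deserialized proto message modeled as a dict, where an unset
# wrapper field (StringValue, Int32Value, ...) is simply absent (None).
def unwrap_wrapper(message_fields, field_name):
    """Return the wrapped primitive if the wrapper message is set,
    otherwise None (SQL NULL), instead of a nested row."""
    wrapper = message_fields.get(field_name)  # None means the field is unset
    if wrapper is None:
        return None
    # Well-known wrapper types hold a single 'value' field; 0 is the
    # proto3 default for an Int32Value whose value field was never written.
    return wrapper.get("value", 0)

row = {"age": {"value": 30}}                    # Int32Value that was set
assert unwrap_wrapper(row, "age") == 30
assert unwrap_wrapper(row, "nickname") is None  # unset wrapper -> NULL
```

The point of the sketch is the distinction items 1 and 4 ask for: "unset" maps to NULL, while "set to the default value" stays distinguishable from NULL.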
>
> Would Flink accept patches for any or all of these feature requests? We're
> contemplating forking flink-protobuf internally, but it would be better if
> we could just upstream the relevant changes. (To my mind, 1, 2, and 4 are
> broadly applicable features that are definitely worthy of upstream support.
> 3 and 5 may be somewhat more specific to our use case.)
>
> Thanks,
> Adam Richardson
>


-- 

Best,
Benchao Li


Re: Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-04 Thread Benchao Li
> On Mon, Jul 3, 2023 at 3:21 PM yuxia <
> > > > > > > > luoyu...@alumni.sjtu.edu.cn
> > > > > > > > > > >> <mailto:luoyu...@alumni.sjtu.edu.cn>> wrote:
> > > > > > > > > > >> >>> Congratulations!
> > > > > > > > > > >> >>>
> > > > > > > > > > >> >>> Best regards,
> > > > > > > > > > >> >>> Yuxia
> > > > > > > > > > >> >>>
> > > > > > > > > > >> >>> From: "Pushpa Ramakrishnan" <pushpa.ramakrish...@icloud.com>
> > > > > > > > > > >> >>> To: "Xintong Song" <tonysong...@gmail.com>
> > > > > > > > > > >> >>> Cc: "dev" <dev@flink.apache.org>, "User" <u...@flink.apache.org>
> > > > > > > > > > >> >>> Sent: Monday, July 3, 2023, 8:36:30 PM
> > > > > > > > > > >> >>> Subject: Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award
> > > > > > > > > > >> >>>
> > > > > > > > > > >> >>> Congratulations 🥳
> > > > > > > > > > >> >>>
> > > > > > > > > > >> >>> On 03-Jul-2023, at 3:30 PM, Xintong Song <
> > > > > > tonysong...@gmail.com
> > > > > > > > > > >> <mailto:tonysong...@gmail.com>> wrote:
> > > > > > > > > > >> >>>
> > > > > > > > > > >> >>> 
> > > > > > > > > > >> >>> Dear Community,
> > > > > > > > > > >> >>>
> > > > > > > > > > >> >>> I'm pleased to share this good news with everyone.
> > As
> > > > some
> > > > > > of
> > > > > > > > you
> > > > > > > > > > may
> > > > > > > > > > >> have already heard, Apache Flink has won the 2023
> SIGMOD
> > > > Systems
> > > > > > > > Award
> > > > > > > > > > [1].
> > > > > > > > > > >> >>>
> > > > > > > > > > >> >>> "Apache Flink greatly expanded the use of stream
> > > > > > > > data-processing."
> > > > > > > > > > --
> > > > > > > > > > >> SIGMOD Awards Committee
> > > > > > > > > > >> >>>
> > > > > > > > > > >> >>> SIGMOD is one of the most influential data
> > management
> > > > > > research
> > > > > > > > > > >> conferences in the world. The Systems Award is awarded
> > to
> > > an
> > > > > > > > individual
> > > > > > > > > > or
> > > > > > > > > > >> set of individuals to recognize the development of a
> > > > software or
> > > > > > > > > > hardware
> > > > > > > > > > >> system whose technical contributions have had
> > significant
> > > > > > impact on
> > > > > > > > the
> > > > > > > > > > >> theory or practice of large-scale data management
> > systems.
> > > > > > Winning
> > > > > > > > of
> > > > > > > > > > the
> > > > > > > > > > >> award indicates the high recognition of Flink's
> > > > technological
> > > > > > > > > > advancement
> > > > > > > > > > >> and industry influence from academia.
> > > > > > > > > > >> >>>
> > > > > > > > > > >> >>> As an open-source project, Flink wouldn't have
> come
> > > > this far
> > > > > > > > without
> > > > > > > > > > >> the wide, active and supportive community behind it.
> > Kudos
> > > > to
> > > > > > all
> > > > > > > > of us
> > > > > > > > > > who
> > > > > > > > > > >> helped make this happen, including the over 1,400
> > > > contributors
> > > > > > and
> > > > > > > > many
> > > > > > > > > > >> others who contributed in ways beyond code.
> > > > > > > > > > >> >>>
> > > > > > > > > > >> >>> Best,
> > > > > > > > > > >> >>> Xintong (on behalf of the Flink PMC)
> > > > > > > > > > >> >>>
> > > > > > > > > > >> >>> [1] https://sigmod.org/2023-sigmod-systems-award/
> > > > > > > > > > >> >>>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > >
> >
>


-- 

Best,
Benchao Li


Re: [VOTE] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-24 Thread Benchao Li
+1 (binding)

On Sun, Jun 25, 2023 at 12:27, Yuxin Tan  wrote:

> +1 (non-binding)
>
> Best,
> Yuxin
>
>
> On Sun, Jun 25, 2023 at 12:21, Yangze Guo  wrote:
>
> > +1 (binding)
> >
> > Best,
> > Yangze Guo
> >
> > On Sun, Jun 25, 2023 at 11:41 AM Jark Wu  wrote:
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Jark
> > >
> > > > On Jun 25, 2023, at 10:04, Xia Sun  wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Best Regards,
> > > >
> > > > Xia
> > > >
> > > > On Sun, Jun 25, 2023 at 09:23, yuxia  wrote:
> > > >
> > > >> +1 (binding)
> > > >> Thanks Lijie driving it.
> > > >>
> > > >> Best regards,
> > > >> Yuxia
> > > >>
> > > >> ----- Original Message -----
> > > >> From: "Yuepeng Pan" 
> > > >> To: "dev" 
> > > >> Sent: Saturday, June 24, 2023, 9:06:53 PM
> > > >> Subject: Re: [VOTE] FLIP-324: Introduce Runtime Filter for Flink Batch
> Jobs
> > > >>
> > > >> +1 (non-binding)
> > > >>
> > > >> Thanks,
> > > >> Yuepeng Pan
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> At 2023-06-23 23:49:53, "Lijie Wang" 
> > wrote:
> > > >>> Hi all,
> > > >>>
> > > >>> Thanks for all the feedback about the FLIP-324: Introduce Runtime
> > Filter
> > > >>> for Flink Batch Jobs[1]. This FLIP was discussed in [2].
> > > >>>
> > > >>> I'd like to start a vote for it. The vote will be open for at least
> > 72
> > > >>> hours (until June 29th 12:00 GMT) unless there is an objection or
> > > >>> insufficient votes.
> > > >>>
> > > >>> [1]
> > > >>>
> > > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs
> > > >>> [2]
> https://lists.apache.org/thread/mm0o8fv7x7k13z11htt88zhy7lo8npmg
> > > >>>
> > > >>> Best,
> > > >>> Lijie
> > > >>
> > >
> >
>


-- 

Best,
Benchao Li


Re: [VOTE] FLIP-308: Support Time Travel

2023-06-19 Thread Benchao Li
+1 (binding)

On Mon, Jun 19, 2023 at 19:40, Lincoln Lee  wrote:

> +1 (binding)
>
> Best,
> Lincoln Lee
>
>
> On Mon, Jun 19, 2023 at 19:30, yuxia  wrote:
>
> > +1 (binding)
> > Thanks Feng driving it.
> >
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "Feng Jin" 
> > To: "dev" 
> > Sent: Monday, June 19, 2023, 7:22:06 PM
> > Subject: [VOTE] FLIP-308: Support Time Travel
> >
> > Hi everyone
> >
> > Thanks for all the feedback about the FLIP-308: Support Time Travel[1].
> > [2] is the discussion thread.
> >
> >
> > I'd like to start a vote for it. The vote will be open for at least 72
> > hours (excluding weekends, until June 22, 12:00 AM GMT) unless there is an
> > objection or an insufficient number of votes.
> >
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel
> > [2]https://lists.apache.org/thread/rpozdlf7469jmc7q7vc0s08pjnmscz00
> >
> >
> > Best,
> > Feng
> >
>


-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-15 Thread Benchao Li
n adopted by many computing
> >> engines such as Spark, Doris, etc... Flink is a streaming batch
> computing
> >> engine, and we are continuously optimizing the performance of batches.
> >> Runtime filter is a general performance optimization technique that can
> >> improve the performance of Flink batch jobs, so we are introducing it on
> >> batch as well.
> >>
> >> Looking forward to all feedback.
> >>
> >> Best,
> >> Ron
> >>
> >> > On Wed, Jun 14, 2023 at 17:17, Lijie Wang  wrote:
> >>
> >> > Hi devs
> >> >
> >> > Ron Liu, Gen Luo and I would like to start a discussion about
> FLIP-324:
> >> > Introduce Runtime Filter for Flink Batch Jobs[1]
> >> >
> >> > Runtime Filter is a common optimization to improve join performance. It
> >> > is designed to dynamically generate filter conditions for certain join
> >> > queries at runtime to reduce the amount of scanned or shuffled data, avoid
> >> > unnecessary I/O and network transmission, and speed up the query. Its
> >> > working principle is to first build a filter (e.g. a bloom filter) based on
> >> > the data on the small table side (build side), then pass this filter to the
> >> > large table side (probe side) to filter out the irrelevant data there, which
> >> > can reduce the data reaching the join and improve performance.
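The working principle described above can be sketched in plain Python (a toy illustration, not Flink's implementation; table contents are made up):

```python
import hashlib

class BloomFilter:
    """A minimal bloom filter: k hash probes over a fixed-size bit array.
    May report false positives, but never false negatives."""
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

# Build side (small table): collect the join keys into the filter.
build_rows = [(1, "a"), (2, "b"), (3, "c")]
bf = BloomFilter()
for key, _ in build_rows:
    bf.add(key)

# Probe side (large table): rows whose key definitely isn't on the build
# side are dropped before they ever reach the join operator.
probe_rows = [(k, f"v{k}") for k in range(1000)]
filtered = [row for row in probe_rows if bf.might_contain(row[0])]

# No false negatives: every actually-joinable key survives the filter.
assert {1, 2, 3} <= {k for k, _ in filtered}
```

In the real feature the filter is built by one operator and shipped to the scan on the probe side; the sketch only shows why shipping it pays off: most probe rows are discarded before shuffle and join.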
> >> >
> >> > You can find more details in the FLIP-324[1]. Looking forward to your
> >> > feedback.
> >> >
> >> > [1]
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs
> >> >
> >> > Best,
> >> > Ron & Gen & Lijie
> >> >
> >>
> >
>


-- 

Best,
Benchao Li


Re: [VOTE] FLIP-294: Support Customized Catalog Modification Listener

2023-06-14 Thread Benchao Li
+1 (binding)

On Wed, Jun 14, 2023 at 19:52, Shammon FY  wrote:

> Hi all:
>
> Thanks for all the feedback for FLIP-294: Support Customized Catalog
> Modification Listener [1]. I would like to start a vote for it according to
> the discussion in thread [2].
>
> The vote will be open for at least 72 hours (excluding weekends, until June
> 19, 19:00 GMT) unless there is an objection or an insufficient number of
> votes.
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-294%3A+Support+Customized+Catalog+Modification+Listener
> [2] https://lists.apache.org/thread/185mbcwnpokfop4xcb22r9bgfp2m68fx
>
>
> Best,
> Shammon FY
>


-- 

Best,
Benchao Li


Re: [VOTE] FLIP-311: Support Call Stored Procedure

2023-06-12 Thread Benchao Li
+1 (binding)

On Mon, Jun 12, 2023 at 17:58, yuxia  wrote:

> Hi everyone,
> Thanks for all the feedback about FLIP-311: Support Call Stored
> Procedure[1]. Based on the discussion [2], we have come to a consensus, so
> I would like to start a vote.
> The vote will be open for at least 72 hours (until June 15th, 10:00AM GMT)
> unless there is an objection or an insufficient number of votes.
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-311%3A+Support+Call+Stored+Procedure
> [2] https://lists.apache.org/thread/k6s50gcgznon9v1oylyh396gb5kgrwmd
>
> Best regards,
> Yuxia
>


-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

2023-06-09 Thread Benchao Li
Thanks Leonard for the input; the "implicit type conversion" approach sounds good
to me.
I also agree that this should be done in the planner instead of the connector;
it'll be a lot easier for connector development.

On Fri, Jun 9, 2023 at 20:11, Leonard Xu  wrote:

> About the semantics consideration, I have some new input after rethinking.
>
> 1. We can support both TIMESTAMP and TIMESTAMP_LTZ expression following
> the syntax  `SELECT [column_name(s)] FROM [table_name] FOR SYSTEM_TIME AS
> OF `
>
> 2. For the TIMESTAMP_LTZ type, giving a long instant value to CatalogTable is
> pretty intuitive; for the TIMESTAMP type, it will be implicitly cast to
> TIMESTAMP_LTZ by the planner using the session timezone and then passed to
> CatalogTable. This case can be considered as a function AsOfSnapshot(Table
> t, TIMESTAMP_LTZ arg), which takes an arg of TIMESTAMP_LTZ type; but our
> framework supports implicit type conversion, so users can also pass an arg
> of TIMESTAMP type. Hint: Spark[1] did the implicit type conversion too.
>
> 3. I also considered handing over the implicit type conversion to the
> connector instead of the planner, such as passing a TIMESTAMP literal and
> having the connector use the session timezone to perform the type conversion,
> but this is more complicated than handling it in the planner, and it's not
> friendly to connector developers.
>
> 4. The last point: TIMESTAMP_LTZ '1970-01-01 00:00:04.001' should be an
> invalid expression, as you cannot define an instant point (i.e.,
> TIMESTAMP_LTZ semantics in SQL) from a timestamp literal without a timezone.
> You can use an explicit type conversion like `cast(ts_ntz as TIMESTAMP_LTZ)`
> after `FOR SYSTEM_TIME AS OF ` if you want to use a
> timestamp type/expression/literal without a timezone.
>
> 5. The very last point: the TIMESTAMP_LTZ type of Flink SQL supports DST
> time[2] well, which will help users avoid many corner cases.
>
>
> Best,
> Leonard
>
> [1]
> https://github.com/apache/spark/blob/0ed48feab65f2d86f5dda3e16bd53f2f795f5bc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala#L56
> [2]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/timezone/#daylight-saving-time-support
>
>
>
>
> > On Jun 9, 2023, at 1:13 PM, Benchao Li  wrote:
> >
> > As you can see that you must use `UNIX_TIMESTAMP` to do this work, that's
> > where the time zone happens.
> >
> > What I'm talking about is casting timestamp/timestamp_ltz to long
> directly,
> > that's why the semantic is tricky when you are casting timestamp to long
> > using time zone.
> >
> > For other systems, such as SQL Server[1], they actually use a string
> > instead of timestamp literal `FOR SYSTEM_TIME AS OF '2021-01-01
> > 00:00:00.000'`, I'm not sure whether they convert the string
> implicitly
> > to TIMESTAMP_LTZ, or they just have a different definition of the syntax.
> >
> > But for us, we are definitely using a timestamp/timestamp_ltz literal here,
> > that's why it is special, and we must highlight this behavior that we are
> > converting a timestamp without time zone literal to long using the
> session
> > time zone.
> >
> > [1]
> >
> https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-table-usage-scenarios?view=sql-server-ver16
> >
> > On Thu, Jun 8, 2023 at 11:35, Feng Jin  wrote:
> >
> >> Hi all,
> >>
> >> thanks for your input
> >>
> >>
> >> @Benchao
> >>
> >>> The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
> >> WITHOUT TIME ZONE", converting it to unix timestamp would use UTC
> timezone,
> >> which is not usually expected by users.
> >>
> >> It was indeed the case before Flink 1.13, but now my understanding is
> that
> >> there have been some slight changes in the definition of TIMESTAMP.
> >>
> >> TIMESTAMP is currently used to specify the year, month, day, hour,
> minute
> >> and second. We recommend that users use
> *UNIX_TIMESTAMP(CAST(timestamp_col
> >> AS STRING))* to convert *TIMESTAMP values* and *long values*. The
> >> *UNIX_TIMESTAMP* function will use the *LOCAL TIME ZONE*. Therefore,
> >> whether converting TIMESTAMP or TIMESTAMP_LTZ to Long values will
> involve
> >> using the *LOCAL TIME ZONE*.
> >>
> >>
> >> Here is a test:
> >>
> >> Flink SQL> SET 'table.local-time-zone' = 'UTC';
> >> Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01 00:00:00' as
> >> STRING)) as `timestamp`;
> >> ---
> >> timestamp
> >>

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

2023-06-08 Thread Benchao Li
As you can see, you must use `UNIX_TIMESTAMP` to do this conversion; that's
where the time zone comes into play.

What I'm talking about is casting timestamp/timestamp_ltz to long directly;
that's why the semantics are tricky when casting a timestamp to long
using a time zone.

For other systems, such as SQL Server[1], they actually use a string
instead of a timestamp literal: `FOR SYSTEM_TIME AS OF '2021-01-01
00:00:00.000'`. I'm not sure whether they convert the string implicitly
to TIMESTAMP_LTZ, or whether they just have a different definition of the syntax.

But for us, we are definitely using a timestamp/timestamp_ltz literal here;
that's why it is special, and we must highlight the behavior that we are
converting a timestamp-without-time-zone literal to long using the session
time zone.

[1]
https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-table-usage-scenarios?view=sql-server-ver16

On Thu, Jun 8, 2023 at 11:35, Feng Jin  wrote:

> Hi all,
>
> thanks for your input
>
>
> @Benchao
>
> >  The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
> WITHOUT TIME ZONE", converting it to unix timestamp would use UTC timezone,
> which is not usually expected by users.
>
> It was indeed the case before Flink 1.13, but now my understanding is that
> there have been some slight changes in the definition of TIMESTAMP.
>
> TIMESTAMP is currently used to specify the year, month, day, hour, minute,
> and second. We recommend that users use *UNIX_TIMESTAMP(CAST(timestamp_col
> AS STRING))* to convert between *TIMESTAMP values* and *long values*. The
> *UNIX_TIMESTAMP* function will use the *LOCAL TIME ZONE*. Therefore,
> converting either TIMESTAMP or TIMESTAMP_LTZ to long values will involve
> using the *LOCAL TIME ZONE*.
>
>
> Here is a test:
>
> Flink SQL> SET 'table.local-time-zone' = 'UTC';
> Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01 00:00:00' as
> STRING)) as `timestamp`;
> ---
>  timestamp
>  --
>  0
>
> Flink SQL> SET 'table.local-time-zone' = 'Asia/Shanghai';
> Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01 00:00:00' as
> STRING)) as `timestamp`;
> ---
>  timestamp
>  --
>  -28800
>
> Therefore, the current conversion method exposed to users is also using
> LOCAL TIME ZONE.
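For reference, the same session-time-zone interpretation can be reproduced outside Flink in plain Python (an illustration, not Flink code; requires Python 3.9+ for `zoneinfo`). It matches the two `UNIX_TIMESTAMP` results quoted in the test: 0 under UTC and -28800 under Asia/Shanghai:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def to_epoch_seconds(naive_ts: datetime, session_tz: str) -> int:
    """Interpret a TIMESTAMP-without-time-zone value in the session time
    zone (cf. Flink's 'table.local-time-zone') and convert it to epoch
    seconds, mirroring UNIX_TIMESTAMP(CAST(ts AS STRING))."""
    return int(naive_ts.replace(tzinfo=ZoneInfo(session_tz)).timestamp())

ts = datetime(1970, 1, 1, 0, 0, 0)  # TIMESTAMP '1970-01-01 00:00:00'
print(to_epoch_seconds(ts, "UTC"))            # 0
print(to_epoch_seconds(ts, "Asia/Shanghai"))  # -28800
```

The same naive literal maps to different epoch values depending on the session time zone, which is exactly the behavior being discussed for the time-travel syntax.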
>
>
> @yuxia
>
> Thank you very much for providing the list of behaviors of TIMESTAMP in
> other systems.
>
> > I think we can align them to avoid the inconsistency to other engines and
> provide convenience for the external connectors while integrating Flink's
> time travel API.
>
> +1 for this.
>
> > Regarding the inconsistency, I think we can consider time travel as a
> special case, and we do need to highlight this in the FLIP.
> As for "violate the restriction outlined in FLINK-21978[1]", since we cast
> timestamp to epochMillis only for the internal use, and won't expose it to
> users, I don't think it will violate the restriction.
> Btw, please add a brief description to explain the meaning of the parameter
> `timestamp` in the method `CatalogBaseTable getTable(ObjectPath tablePath, long
> timestamp)`. Maybe something like "timestamp of the table snapshot, which is
> milliseconds since 1970-01-01 00:00:00 UTC".
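The as-of lookup that such a `getTable(tablePath, timestamp)` parameter implies, resolving the latest snapshot committed at or before a point in time, can be sketched in plain Python (a hypothetical illustration; the snapshot-history model is an assumption, not any catalog's actual API):

```python
import bisect

def snapshot_as_of(snapshots, query_ts_millis):
    """Given (timestamp_millis, snapshot_id) pairs sorted by timestamp,
    return the id of the latest snapshot committed at or before the
    queried point in time, or None if the table did not exist yet."""
    times = [t for t, _ in snapshots]
    idx = bisect.bisect_right(times, query_ts_millis) - 1
    return snapshots[idx][1] if idx >= 0 else None

history = [(1000, "s1"), (2000, "s2"), (3000, "s3")]
assert snapshot_as_of(history, 2500) == "s2"   # between commits -> earlier one
assert snapshot_as_of(history, 3000) == "s3"   # exact commit time is included
assert snapshot_as_of(history, 500) is None    # before the table existed
```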
>
> Thank you for the suggestions regarding the document. I will add them to
> FLIP.
>
>
> Best,
> Feng
>
>
> On Wed, Jun 7, 2023 at 12:18 PM Benchao Li  wrote:
>
> > I also share the concern about the timezone problem.
> >
> > The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
> WITHOUT
> > TIME ZONE", converting it to unix timestamp would use UTC timezone, which
> > is not usually expected by users.
> >
> > If we want to keep consistent with the standard, we probably should use
> > "TIMESTAMP WITH LOCAL ZONE '2023-04-27 00:00:00'", which type is
> "TIMESTAMP
> > WITH LOCAL TIME ZONE", and converting it to unix timestamp will consider
> > the session timezone, which is the expected result. But it's inconvenient
> > for users.
> >
> > Taking this as a special case and converting "TIMESTAMP '2023-04-27
> > 00:00:00'" to a unix timestamp with the session timezone will be convenient
> > for users, but will break the standard. I will +0.5 for this choice.
> >
> > yuxia  于2023年6月7日周三 12:06写道:
> >
> > > Hi, Feng Jin.
> > > I think the concern of Leonard may be the inconsistency of the behavior
> > of
> > > TIMESTAMP '2023-04-27 00:00:00' beween timetravel and other sql
> > statement.
> > >
> > > For the normal sql:
> > > `SELECT TIM

Re: [VOTE] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-07 Thread Benchao Li
+1, binding

On Wed, Jun 7, 2023 at 14:44, Jark Wu  wrote:

> +1 (binding)
>
> Best,
> Jark
>
> > On Jun 7, 2023, at 14:20, liu ron  wrote:
> >
> > Hi everyone,
> >
> > Thanks for all the feedback about FLIP-315: Support Operator Fusion
> Codegen
> > for Flink SQL[1].
> > [2] is the discussion thread.
> >
> > I'd like to start a vote for it. The vote will be open for at least 72
> > hours (until June 12th, 12:00AM GMT) unless there is an objection or an
> > insufficient number of votes.
> >
> > [1]:
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
> > [2]: https://lists.apache.org/thread/9cnqhsld4nzdr77s2fwf00o9cb2g9fmw
> >
> > Best,
> > Ron
>
>

-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

2023-06-06 Thread Benchao Li
y property, it should not
> >> >>>> be coupled with the Catalog or table definition. In my opinion this is
> >> >>>> something that DynamicTableFactory only has to know about. I would rather
> >> >>>> see this feature as it is - a SQL syntax enhancement - but delegate clearly to
> >> >>>> DynamicTableFactory.
> >> >>>>
> >> >>>> I've implemented timetravel feature for Delta Connector  [1]  using
> >> >>>> current Flink API.
> >> >>>> Docs are pending code review, but you can find them here [2] and
> >> >> examples
> >> >>>> are available here [3]
> >> >>>>
> >> >>>> The timetravel feature that I've implemented is based on Flink
> Query
> >> >>>> hints.
> >> >>>> "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
> >> >>>>
> >> >>>> The " versionAsOf" (we also have 'timestampAsOf') parameter is
> >> handled
> >> >>> not
> >> >>>> by Catalog but by the DynamicTableFactory implementation for the Delta
> >> >>> connector.
> >> >>>> The value of this property is passed to Delta standalone lib API
> that
> >> >>>> returns table view for given version.
> >> >>>>
> >> >>>> I'm not sure how/if proposed change could benefit Delta connector
> >> >>>> implementation for this feature.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Krzysztof
> >> >>>>
> >> >>>> [1]
> >> >>>>
> >> >>>
> >> >>
> >>
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
> >> >>>> [2]
> >> https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
> >> >>>> [3]
> >> >>>>
> >> >>>
> >> >>
> >>
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
> >> >>>>
> >> >>>> On Wed, May 31, 2023 at 06:03 liu ron wrote:
> >> >>>>
> >> >>>>> Hi, Feng
> >> >>>>>
> >> >>>>> Thanks for driving this FLIP, Time travel is very useful for Flink
> >> >>>>> integrate with data lake system. I have one question why the
> >> >>>>> implementation
> >> >>>>> of TimeTravel is delegated to Catalog? Assuming that we use Flink
> to
> >> >>> query
> >> >>>>> Hudi table with the time travel syntax, but we don't use the
> >> >>> HudiCatalog,
> >> >>>>> instead, we register the hudi table to InMemoryCatalog,  can we
> >> >> support
> >> >>>>> time travel for Hudi table in this case?
> >> >>>>> In contrast, I think time travel should bind to connector instead
> of
> >> >>>>> Catalog, so the rejected alternative should be considered.
> >> >>>>>
> >> >>>>> Best,
> >> >>>>> Ron
> >> >>>>>
> >> >>>>>> On Tue, May 30, 2023 at 09:40 yuxia wrote:
> >> >>>>>
> >> >>>>>> Hi, Feng.
> >> >>>>>> Notice this FLIP only support batch mode for time travel.  Would
> it
> >> >>> also
> >> >>>>>> make sense to support stream mode to read a snapshot of the
> table
> >> >>> as a
> >> >>>>>> bounded stream?
> >> >>>>>>
> >> >>>>>> Best regards,
> >> >>>>>> Yuxia
> >> >>>>>>
> >> >>>>>> - Original Message -
> >> >>>>>> From: "Benchao Li" 
> >> >>>>>> To: "dev" 
> >> >>>>>> Sent: Monday, May 29, 2023 at 6:04:53 PM
> >> >>>>>> Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> >> >>>>>>
> >> >>>>>> # Can Calcite sup

Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-05 Thread Benchao Li
t;
> > > > > > > Hi, Jingsong
> > > > > > >
> > > > > > > Thanks for your review. We have tested it in the TPC-DS case, and
> > > > > > > got a 12% gain overall when supporting only the Calc operator. In
> > > > > > > some queries, we even get more than a 30% gain; it looks like an
> > > > > > > effective way.
> > > > > > >
> > > > > > > Best,
> > > > > > > Ron
> > > > > > >
> > > > > > > > On Mon, May 29, 2023 at 14:33 Jingsong Li wrote:
> > > > > > >
> > > > > > > > Thanks Ron for the proposal.
> > > > > > > >
> > > > > > > > Do you have some benchmark results for the performance
> > > > improvement? I
> > > > > > > > am more concerned about the improvement on Flink than the
> data in
> > > > > > > > other papers.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Jingsong
> > > > > > > >
> > > > > > > > On Mon, May 29, 2023 at 2:16 PM liu ron 
> > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi, dev
> > > > > > > > >
> > > > > > > > > I'd like to start a discussion about FLIP-315: Support
> Operator
> > > > > > Fusion
> > > > > > > > > Codegen for Flink SQL[1]
> > > > > > > > >
> > > > > > > > > As main memory grows, query performance is more and more
> > > > determined
> > > > > > by
> > > > > > > > the
> > > > > > > > > raw CPU costs of query processing itself. This is because
> > > > > > > > > query processing techniques based on interpreted execution
> > > > > > > > > show poor performance
> > > > > > > > > on modern CPUs due to lack of locality and frequent
> instruction
> > > > > > > > > mis-prediction. Therefore, the industry is also
> researching how
> > > > to
> > > > > > > > improve
> > > > > > > > > engine performance by increasing operator execution
> efficiency.
> > > > In
> > > > > > > > > addition, during the process of optimizing Flink's
> performance
> > > > for
> > > > > > TPC-DS
> > > > > > > > > queries, we found that a significant amount of CPU time was
> > > spent
> > > > > on
> > > > > > > > > virtual function calls, framework collector calls, and
> invalid
> > > > > > > > > calculations, which can be optimized to improve the overall
> > > > engine
> > > > > > > > > performance. After some investigation, we found Operator
> Fusion
> > > > > > Codegen
> > > > > > > > > which is proposed by Thomas Neumann in the paper[2] can
> address
> > > > > these
> > > > > > > > > problems. I have finished a PoC[3] to verify its
> feasibility
> > > and
> > > > > > > > validity.
> > > > > > > > >
> > > > > > > > > Looking forward to your feedback.
> > > > > > > > >
> > > > > > > > > [1]:
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
> > > > > > > > > [2]: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
> > > > > > > > > [3]: https://github.com/lsyldliu/flink/tree/OFCG
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Ron
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>


-- 

Best,
Benchao Li
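The CPU-overhead argument in this thread (a virtual call per operator per row under interpreted execution, versus one tight generated loop after fusion) can be illustrated with a toy sketch. Note this is not FLIP-315 code: the `FusionSketch` class, its `Operator` interface, and the filter/map pipeline are invented purely for illustration.

```java
// Toy, non-Flink illustration of why operator fusion codegen helps.
import java.util.ArrayList;
import java.util.List;

public class FusionSketch {
    // A row either passes through transformed, or is filtered out (null).
    interface Operator { Integer process(Integer row); }

    // Interpreted pipeline: one virtual `process` call per operator per row.
    static List<Integer> interpreted(List<Integer> rows, List<Operator> ops) {
        List<Integer> out = new ArrayList<>();
        for (Integer row : rows) {
            Integer v = row;
            for (Operator op : ops) {
                v = op.process(v);
                if (v == null) break; // row filtered out
            }
            if (v != null) out.add(v);
        }
        return out;
    }

    // "Fused" version: the same filter(x > 2) + map(x * 10) logic collapsed
    // into a single loop, as generated code would be after fusion.
    static List<Integer> fused(List<Integer> rows) {
        List<Integer> out = new ArrayList<>();
        for (int x : rows) {
            if (x > 2) out.add(x * 10);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(1, 2, 3, 4);
        List<Operator> ops = List.of(x -> x > 2 ? x : null, x -> x * 10);
        System.out.println(interpreted(rows, ops)); // prints [30, 40]
        System.out.println(fused(rows));            // prints [30, 40]
    }
}
```

Both produce the same result; the fused loop simply avoids the per-row virtual dispatch and intermediate hand-off that the thread identifies as the dominant cost.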


Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

2023-06-03 Thread Benchao Li
Thanks Yuxia for the explanation, it makes sense to me. It would be great
if you also add this to the FLIP doc.

On Thu, Jun 1, 2023 at 17:11 yuxia wrote:

> Hi, Benchao.
> Thanks for your attention.
>
> Initially, I also wanted to pass `TableEnvironment` to the procedure. But
> according to my investigation and an offline discussion with Jingsong, the
> really important thing for procedure devs is the ability to build a Flink
> datastream. But we can't get the `StreamExecutionEnvironment`, which is the
> entrypoint for building a datastream, from it. That is to say, we would lose
> the ability to build a datastream if we just passed `TableEnvironment`.
>
> Of course, we can also pass `TableEnvironment` along with
> `StreamExecutionEnvironment` to Procedure. But I intend to be cautious
> about exposing too much too early to procedure devs. If someday we find we
> need `TableEnvironment` to customize a procedure, we can then add a
> method like `getTableEnvironment()` to `ProcedureContext`.
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "Benchao Li" 
> To: "dev" 
> Sent: Thursday, June 1, 2023 at 12:58:08 PM
> Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
>
> Thanks Yuxia for opening this discussion,
>
> The general idea looks good to me, I only have one question about the
> `ProcedureContext#getExecutionEnvironment`. Why are you proposing to return
> a `StreamExecutionEnvironment` instead of `TableEnvironment`, could you
> elaborate a little more on this?
>
> On Tue, May 30, 2023 at 17:58 Jingsong Li wrote:
>
> > Thanks for your explanation.
> >
> > We can support Iterable in future. Current design looks good to me.
> >
> > Best,
> > Jingsong
> >
> > On Tue, May 30, 2023 at 4:56 PM yuxia 
> wrote:
> > >
> > > Hi, Jingsong.
> > > Thanks for your feedback.
> > >
> > > > Does this need to be a function call? Do you have some example?
> > > I think it'll be useful to support function calls when users call a
> > procedure.
> > > The following example is from iceberg:[1]
> > > CALL catalog_name.system.migrate('spark_catalog.db.sample', map('foo',
> > 'bar'));
> > >
> > > It allows user to use `map('foo', 'bar')` to pass a map data to
> > procedure.
> > >
> > > Another case that I can imagine may be rollback a table to the snapshot
> > of one week ago.
> > > Then, with a function call, a user may call `rollback(table_name, now() -
> > INTERVAL '7' DAY)` to achieve such a purpose.
> > >
> > > Although it can be function call, the eventual parameter got by the
> > procedure will always be the literal evaluated.
> > >
> > >
> > > > Procedure looks like a TableFunction, do you consider using Collector
> > > something like TableFunction? (Supports large amount of data)
> > >
> > > Yes, I had considered it. But returning T[] is for simplicity.
> > >
> > > First, regarding how to return the calling result of a procedure, it
> > looks more intuitive to me to use the return result of the `call` method
> > instead of calling something like collector#collect.
> > > Introducing a collector would increase complexity unnecessarily.
> > >
> > > Second, regarding supporting large amounts of data, according to my
> > investigation, I haven't seen a requirement for returning large
> > amounts of data.
> > > Iceberg also returns an array. [2] If you do think we should support
> > large amounts of data, I think we can change the return type from T[] to
> > Iterable.
> > >
> > > [1]: https://iceberg.apache.org/docs/latest/spark-procedures/#migrate
> > > [2]:
> >
> https://github.com/apache/iceberg/blob/601c5af9b6abded79dabeba177331310d5487f43/spark/v3.2/spark/src/main/java/org/apache/spark/sql/connector/iceberg/catalog/Procedure.java#L44
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > - Original Message -
> > > From: "Jingsong Li" 
> > > To: "dev" 
> > > Sent: Monday, May 29, 2023 at 2:42:04 PM
> > > Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> > >
> > > Thanks Yuxia for the proposal.
> > >
> > > > CALL [catalog_name.][database_name.]procedure_name ([ expression [,
> > expression]* ] )
> > >
> > > The expression can be a function call. Does this need to be a function
> > > call? Do you have some example?
> > >
> > > > Procedure returns T[]
> > >
> > > Procedure looks like a TableFunction, do you consider using Collector
> > > something like Ta

Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

2023-05-31 Thread Benchao Li
Thanks Yuxia for opening this discussion,

The general idea looks good to me, I only have one question about the
`ProcedureContext#getExecutionEnvironment`. Why are you proposing to return
a `StreamExecutionEnvironment` instead of `TableEnvironment`, could you
elaborate a little more on this?

On Tue, May 30, 2023 at 17:58 Jingsong Li wrote:

> Thanks for your explanation.
>
> We can support Iterable in future. Current design looks good to me.
>
> Best,
> Jingsong
>
> On Tue, May 30, 2023 at 4:56 PM yuxia  wrote:
> >
> > Hi, Jingsong.
> > Thanks for your feedback.
> >
> > > Does this need to be a function call? Do you have some example?
> > I think it'll be useful to support function calls when users call a
> procedure.
> > The following example is from iceberg:[1]
> > CALL catalog_name.system.migrate('spark_catalog.db.sample', map('foo',
> 'bar'));
> >
> > It allows user to use `map('foo', 'bar')` to pass a map data to
> procedure.
> >
> > Another case that I can imagine may be rollback a table to the snapshot
> of one week ago.
> > Then, with a function call, a user may call `rollback(table_name, now() -
> INTERVAL '7' DAY)` to achieve such a purpose.
> >
> > Although it can be function call, the eventual parameter got by the
> procedure will always be the literal evaluated.
> >
> >
> > > Procedure looks like a TableFunction, do you consider using Collector
> > something like TableFunction? (Supports large amount of data)
> >
> > Yes, I had considered it. But returning T[] is for simplicity.
> >
> > First, regarding how to return the calling result of a procedure, it
> looks more intuitive to me to use the return result of the `call` method
> instead of calling something like collector#collect.
> > Introducing a collector would increase complexity unnecessarily.
> >
> > Second, regarding supporting large amounts of data, according to my
> investigation, I haven't seen a requirement for returning large
> amounts of data.
> > Iceberg also returns an array. [2] If you do think we should support large
> amounts of data, I think we can change the return type from T[] to Iterable.
> >
> > [1]: https://iceberg.apache.org/docs/latest/spark-procedures/#migrate
> > [2]:
> https://github.com/apache/iceberg/blob/601c5af9b6abded79dabeba177331310d5487f43/spark/v3.2/spark/src/main/java/org/apache/spark/sql/connector/iceberg/catalog/Procedure.java#L44
> >
> > Best regards,
> > Yuxia
> >
> > - Original Message -
> > From: "Jingsong Li" 
> > To: "dev" 
> > Sent: Monday, May 29, 2023 at 2:42:04 PM
> > Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> >
> > Thanks Yuxia for the proposal.
> >
> > > CALL [catalog_name.][database_name.]procedure_name ([ expression [,
> expression]* ] )
> >
> > The expression can be a function call. Does this need to be a function
> > call? Do you have some example?
> >
> > > Procedure returns T[]
> >
> > Procedure looks like a TableFunction, do you consider using Collector
> > something like TableFunction? (Supports large amount of data)
> >
> > Best,
> > Jingsong
> >
> > On Mon, May 29, 2023 at 2:33 PM yuxia 
> wrote:
> > >
> > > Hi, everyone.
> > >
> > > I’d like to start a discussion about FLIP-311: Support Call Stored
> Procedure [1]
> > >
> > > Stored procedure provides a convenient way to encapsulate complex
> logic to perform data manipulation or administrative tasks in external
> storage systems. It's widely used in traditional databases and popular
> compute engines like Trino for its convenience. Therefore, we propose
> adding support for calling stored procedures in Flink to enable better
> integration with external storage systems.
> > >
> > > With this FLIP, Flink will allow connector developers to develop their
> own built-in stored procedures, and then enable users to call these
> predefined stored procedures.
> > >
> > > Looking forward to your feedback.
> > >
> > > [1]:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-311%3A+Support+Call+Stored+Procedure
> > >
> > > Best regards,
> > > Yuxia
>


-- 

Best,
Benchao Li
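For readers skimming the thread, the syntax under discussion looks like the following. The catalog, database, and `rollback` procedure names below are illustrative inventions; the `migrate` line is Iceberg's example quoted earlier in the thread:

```sql
-- Proposed form: CALL [catalog_name.][database_name.]procedure_name ([ expression [, expression]* ])
CALL catalog_name.system.migrate('spark_catalog.db.sample', map('foo', 'bar'));
-- A hypothetical rollback where one argument is itself a function call,
-- evaluated to a literal before the procedure receives it:
CALL my_catalog.sys.rollback('db.sample', now() - INTERVAL '7' DAY);
```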


Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

2023-05-29 Thread Benchao Li
# Can Calcite support this syntax ` VERSION AS OF`  ?

This also depends on whether this is defined in standard or any known
databases that have implemented this. If not, it would be hard to push it
to Calcite.

# getTable(ObjectPath object, long timestamp)

Then we again come to the problem of "casting between timestamp and
numeric", which has been disabled in FLINK-21978[1]. If you're gonna use
this, then we need to clarify that problem first.

[1] https://issues.apache.org/jira/browse/FLINK-21978


On Mon, May 29, 2023 at 15:57 Feng Jin wrote:

> hi, thanks for your reply.
>
> @Benchao
> > did you consider the pushdown abilities compatible
>
> In the current design, the implementation of TimeTravel is delegated to
> Catalog. We have added a function called getTable(ObjectPath tablePath,
> long timestamp) to obtain the corresponding CatalogBaseTable at a specific
> time.  Therefore, I think it will not have any impact on the original
> pushdown abilities.
>
>
> >   I see there is a rejected  design for adding SupportsTimeTravel, but I
> didn't see the alternative in  the FLIP doc
>
> Sorry, the document description is not very clear.  Regarding whether to
> support SupportTimeTravel, I have discussed it with yuxia. Since we have
> already passed the corresponding time in getTable(ObjectPath, long
> timestamp) of Catalog, SupportTimeTravel may not be necessary.
>
> In getTable(ObjectPath object, long timestamp), we can obtain the schema of
> the corresponding time point and put the SNAPSHOT that needs to be consumed
> into options.
>
>
> @Shammon
> > Could we support this in Flink too?
>
> I personally think it's possible, but limited by Calcite's syntax
> restrictions. I believe we should first support this syntax in Calcite.
> Currently, I think it may not be easy  to support this syntax in Flink's
> parser. @Benchao, what do you think? Can Calcite support this syntax
> ` VERSION AS OF`  ?
>
>
> Best,
> Feng.
>
>
> On Fri, May 26, 2023 at 2:55 PM Shammon FY  wrote:
>
> > Thanks Feng, the feature of time travel sounds great!
> >
> > In addition to SYSTEM_TIME, lake houses such as paimon and iceberg
> support
> > snapshot or version. For example, users can query snapshot 1 for paimon
> by
> > the following statement
> > SELECT * FROM t VERSION AS OF 1
> >
> > Could we support this in Flink too?
> >
> > Best,
> > Shammon FY
> >
> > On Fri, May 26, 2023 at 1:20 PM Benchao Li  wrote:
> >
> > > Regarding the implementation, did you consider the pushdown abilities
> > > compatible, e.g., projection pushdown, filter pushdown, partition
> > pushdown.
> > > Since `Snapshot` is not handled much in existing rules, I have a
> concern
> > > about this. Of course, it depends on your implementation detail, what
> is
> > > important is that we'd better add some cross tests for these.
> > >
> > > Regarding the interface exposed to Connector, I see there is a rejected
> > > design for adding SupportsTimeTravel, but I didn't see the alternative
> in
> > > the FLIP doc. IMO, this is an important thing we need to clarify
> because
> > we
> > > need to know whether the Connector supports this, and what
> > column/metadata
> > > corresponds to 'system_time'.
> > >
> > > On Thu, May 25, 2023 at 22:50 Feng Jin wrote:
> > >
> > > > Thanks for your reply
> > > >
> > > > @Timo @BenChao @yuxia
> > > >
> > > > Sorry for the mistake. Currently, Calcite only supports the `FOR
> > > SYSTEM_TIME
> > > > AS OF` syntax, so we can only support `FOR SYSTEM_TIME AS OF`. I've
> > > > updated the syntax part of the FLIP.
> > > >
> > > >
> > > > @Timo
> > > >
> > > > > We will convert it to TIMESTAMP_LTZ?
> > > >
> > > > Yes, I think we need to convert TIMESTAMP to TIMESTAMP_LTZ and then
> > > convert
> > > > it into a long value.
> > > >
> > > > > How do we want to query the most recent version of a table
> > > >
> > > > I think we can use `AS OF CURRENT_TIMESTAMP`, but it does cause
> > > > inconsistency with the real-time concept.
> > > > However, from my personal understanding, the scope of `AS OF
> > > > CURRENT_TIMESTAMP` is the table itself, not the table record. So, I
> > > think
> > > > using CURRENT_TIMESTAMP should also be reasonable.
> > > > Additionally, if no version is specified, the latest version should
> be
> > > used
> > > >

Re: [DISCUSS] Hive dialect shouldn't fall back to Flink's default dialect

2023-05-29 Thread Benchao Li
Big +1 on this, thanks yuxia for driving this!

On Mon, May 29, 2023 at 14:55 yuxia wrote:

> Hi, community.
>
> I want to start the discussion about Hive dialect shouldn't fall back to
> Flink's default dialect.
>
> Currently, when the HiveParser fail to parse the sql in Hive dialect,
> it'll fall back to Flink's default parser[1] to handle flink-specific
> statements like "CREATE CATALOG xx with (xx);".
>
> As I'm involved with the Hive dialect and have had some communication with
> community users who use the Hive dialect recently, I'm thinking of throwing an
> exception directly instead of falling back to Flink's default dialect when we
> fail to parse the sql in Hive dialect.
>
> Here're some reasons:
>
> First of all, it hides some errors with the Hive dialect. For example, we
> found we couldn't use the Hive dialect any more with the Flink sql client in
> the release validation phase [2]. We eventually found a modification in the
> Flink sql client caused it, but our test case couldn't catch it earlier:
> although HiveParser failed to parse the statement, it fell back to the
> default parser and the test case passed successfully.
>
> Second, conceptually, the Hive dialect should have nothing to do with Flink's
> default dialect. They are two totally different dialects. If we do need a
> dialect mixing the Hive dialect and the default dialect, maybe we need to
> propose a new hybrid dialect and announce the hybrid behavior to users.
> Also, the fallback behavior has confused some users; in fact, community users
> have asked me about it. Throwing an exception directly when we fail to parse
> a sql statement in Hive dialect will be more intuitive.
>
> Last but not least, it's important to decouple Hive from the Flink planner [3]
> before we can externalize the Hive connector [4]. If we still fall back to
> Flink's default dialect, then we will need to depend on `ParserImpl` in the
> Flink planner, which will block us from removing the provided dependency of
> the Hive dialect as well as externalizing the Hive connector.
>
> Although we never announced the fallback behavior, some users may implicitly
> depend on this behavior in their sql jobs. So, I hereby open the discussion
> about abandoning the fallback behavior to make the Hive dialect clear and
> isolated.
> Please remember it won't break Hive syntax, but syntax specific to Flink may
> fail afterwards. For the failed sql, you can use `SET
> table.sql-dialect=default;` to switch to the Flink dialect.
> If there are some flink-specific statements we find should be included in the
> Hive dialect for ease of use, I think we can still add them as special cases
> to the Hive dialect.
>
> Looking forward to your feedback. I'd love to hear the feedback from the
> community before taking the next steps.
>
> [1]:
> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/planner/delegation/hive/HiveParser.java#L348
> [2]:https://issues.apache.org/jira/browse/FLINK-26681
> [3]:https://issues.apache.org/jira/browse/FLINK-31413
> [4]:https://issues.apache.org/jira/browse/FLINK-30064
>
>
>
> Best regards,
> Yuxia
>


-- 

Best,
Benchao Li
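A sketch of the switch described above, assuming the fallback is removed (the catalog statement is illustrative):

```sql
-- Hive-syntax statements parse under the Hive dialect; with the fallback
-- removed, Flink-specific statements here would fail fast instead of being
-- silently re-parsed by the default parser.
SET table.sql-dialect=hive;

-- Switch back explicitly before running Flink-specific statements:
SET table.sql-dialect=default;
CREATE CATALOG my_catalog WITH ('type' = 'generic_in_memory');
```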


Re: [VOTE] Release flink-connector-jdbc v3.1.1, release candidate #1

2023-05-28 Thread Benchao Li
Thanks Martijn,

- checked signature/checksum [OK]
- downloaded src, compiled from source [OK]
- diffed src and tag, no binary files [OK]
- gone through nexus staging area, looks good [OK]
- run with flink 1.17.1 [OK]

One thing I spotted is that the version in `docs/data/jdbc.yml` is still
3.1.0, I'm not sure whether this should be a blocker.


On Thu, May 25, 2023 at 02:55 Martijn Visser wrote:

> Hi everyone,
> Please review and vote on the release candidate #1 for the version 3.1.1,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which are signed with the key with fingerprint
> A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag v3.1.1-rc1 [5],
> * website pull request listing the new release [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12353281
> [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.1.1-rc1
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1636/
> [5] https://github.com/apache/flink-connector-jdbc/releases/tag/v3.1.1-rc1
> [6] https://github.com/apache/flink-web/pull/654
>


-- 

Best,
Benchao Li


Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

2023-05-25 Thread Benchao Li
Regarding the implementation, did you consider the pushdown abilities
compatible, e.g., projection pushdown, filter pushdown, partition pushdown.
Since `Snapshot` is not handled much in existing rules, I have a concern
about this. Of course, it depends on your implementation detail, what is
important is that we'd better add some cross tests for these.

Regarding the interface exposed to Connector, I see there is a rejected
design for adding SupportsTimeTravel, but I didn't see the alternative in
the FLIP doc. IMO, this is an important thing we need to clarify because we
need to know whether the Connector supports this, and what column/metadata
corresponds to 'system_time'.

On Thu, May 25, 2023 at 22:50 Feng Jin wrote:

> Thanks for your reply
>
> @Timo @BenChao @yuxia
>
> Sorry for the mistake. Currently, Calcite only supports the `FOR SYSTEM_TIME
> AS OF` syntax, so we can only support `FOR SYSTEM_TIME AS OF`. I've
> updated the syntax part of the FLIP.
>
>
> @Timo
>
> > We will convert it to TIMESTAMP_LTZ?
>
> Yes, I think we need to convert TIMESTAMP to TIMESTAMP_LTZ and then convert
> it into a long value.
>
> > How do we want to query the most recent version of a table
>
> I think we can use `AS OF CURRENT_TIMESTAMP`, but it does cause
> inconsistency with the real-time concept.
> However, from my personal understanding, the scope of `AS OF
> CURRENT_TIMESTAMP` is the table itself, not the table record. So, I think
> using CURRENT_TIMESTAMP should also be reasonable.
> Additionally, if no version is specified, the latest version should be used
> by default.
>
>
>
> Best,
> Feng
>
>
> On Thu, May 25, 2023 at 7:47 PM yuxia  wrote:
>
> > Thanks Feng for bringing this up. It'll be great to introduce time travel
> > to Flink to have better integration with external data sources.
> >
> > I also share same concern about the syntax.
> > I see in the `Whether to support other syntax implementations` part of
> > this FLIP, it seems the syntax in Calcite should be `FOR SYSTEM_TIME AS OF`,
> > right?
> > But in the syntax part of this FLIP, it seems to be `AS OF TIMESTAMP`
> > instead of `FOR SYSTEM_TIME AS OF`. Is it just a mistake or by design?
> >
> >
> > Best regards,
> > Yuxia
> >
> > - Original Message -
> > From: "Benchao Li" 
> > To: "dev" 
> > Sent: Thursday, May 25, 2023 at 7:27:17 PM
> > Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> >
> > Thanks Feng, it's exciting to have this ability.
> >
> > Regarding the syntax section, are you proposing `AS OF` instead of `FOR
> > SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is in the SQL
> standard
> > and has been supported in some database vendors such as SQL Server. About
> > `AS OF`, is it in the standard or any database vendor supports this, if
> > yes, I think it's worth to add this support to Calcite, and I would give
> a
> > hand in Calcite side. Otherwise, I think we'd better to use `FOR SYSTEM
> AS
> > OF`.
> >
> > On Thu, May 25, 2023 at 19:02 Timo Walther wrote:
> >
> > > Also: How do we want to query the most recent version of a table?
> > >
> > > `AS OF CURRENT_TIMESTAMP` would be ideal, but according to the docs
> both
> > > the type is TIMESTAMP_LTZ and what is even more concerning is that it
> > > actually is evaluated row-based:
> > >
> > >  > Returns the current SQL timestamp in the local time zone, the return
> > > type is TIMESTAMP_LTZ(3). It is evaluated for each record in streaming
> > > mode. But in batch mode, it is evaluated once as the query starts and
> > > uses the same result for every row.
> > >
> > > This could make it difficult to explain in a join scenario of multiple
> > > snapshotted tables.
> > >
> > > Regards,
> > > Timo
> > >
> > >
> > > On 25.05.23 12:29, Timo Walther wrote:
> > > > Hi Feng,
> > > >
> > > > thanks for proposing this FLIP. It makes a lot of sense to finally
> > > > support querying tables at a specific point in time or hopefully also
> > > > ranges soon. Following time-versioned tables.
> > > >
> > > > Here is some feedback from my side:
> > > >
> > > > 1. Syntax
> > > >
> > > > Can you elaborate a bit on the Calcite restrictions?
> > > >
> > > > Does Calcite currently support `AS OF` syntax for this but not `FOR
> > > > SYSTEM_TIME AS OF`?
> > > >
> > > > It would be great to support `AS OF` also for time-ver

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

2023-05-25 Thread Benchao Li
Thanks Feng, it's exciting to have this ability.

Regarding the syntax section, are you proposing `AS OF` instead of `FOR
SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is in the SQL standard
and has been supported in some database vendors such as SQL Server. About
`AS OF`, is it in the standard or any database vendor supports this, if
yes, I think it's worth to add this support to Calcite, and I would give a
hand in Calcite side. Otherwise, I think we'd better to use `FOR SYSTEM AS
OF`.

On Thu, May 25, 2023 at 19:02 Timo Walther wrote:

> Also: How do we want to query the most recent version of a table?
>
> `AS OF CURRENT_TIMESTAMP` would be ideal, but according to the docs both
> the type is TIMESTAMP_LTZ and what is even more concerning is that it
> actually is evaluated row-based:
>
>  > Returns the current SQL timestamp in the local time zone, the return
> type is TIMESTAMP_LTZ(3). It is evaluated for each record in streaming
> mode. But in batch mode, it is evaluated once as the query starts and
> uses the same result for every row.
>
> This could make it difficult to explain in a join scenario of multiple
> snapshotted tables.
>
> Regards,
> Timo
>
>
> On 25.05.23 12:29, Timo Walther wrote:
> > Hi Feng,
> >
> > thanks for proposing this FLIP. It makes a lot of sense to finally
> > support querying tables at a specific point in time or hopefully also
> > ranges soon. Following time-versioned tables.
> >
> > Here is some feedback from my side:
> >
> > 1. Syntax
> >
> > Can you elaborate a bit on the Calcite restrictions?
> >
> > Does Calcite currently support `AS OF` syntax for this but not `FOR
> > SYSTEM_TIME AS OF`?
> >
> > It would be great to support `AS OF` also for time-versioned joins and
> > have a unified and short syntax.
> >
> > Once a fix is merged in Calcite for this, we can make this available in
> > Flink earlier by copying the corresponding classes until the next
> > Calcite upgrade is performed.
> >
> > 2. Semantics
> >
> > How do we interpret the timestamp? In Flink we have 2 timestamp types
> > (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF TIMESTAMP
> > '2023-04-27 00:00:00', in which timezone will the timestamp be? We will
> > convert it to TIMESTAMP_LTZ?
> >
> > We definitely need to clarify this because the past has shown that
> > daylight saving times make our lives hard.
> >
> > Thanks,
> > Timo
> >
> > On 25.05.23 10:57, Feng Jin wrote:
> >> Hi, everyone.
> >>
> >> I’d like to start a discussion about FLIP-308: Support Time Travel In
> >> Batch
> >> Mode [1]
> >>
> >>
> >> Time travel is a SQL syntax used to query historical versions of data.
> It
> >> allows users to specify a point in time and retrieve the data and
> >> schema of
> >> a table as it appeared at that time. With time travel, users can easily
> >> analyze and compare historical versions of data.
> >>
> >>
> >> With the widespread use of data lake systems such as Paimon, Iceberg,
> and
> >> Hudi, time travel can provide more convenience for users' data analysis.
> >>
> >>
> >> Looking forward to your opinions, any suggestions are welcomed.
> >>
> >>
> >>
> >> 1.
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
> >>
> >>
> >>
> >> Best.
> >>
> >> Feng
> >>
> >
>
>

-- 

Best,
Benchao Li
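Putting the syntax variants from this thread side by side may help; the table name and version below are made up, and the thread converged on `FOR SYSTEM_TIME AS OF`:

```sql
-- SQL-standard / SQL Server style supported by Calcite, proposed in the FLIP:
SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP '2023-04-27 00:00:00';
-- Shorter `AS OF` form discussed, pending support in Calcite:
-- SELECT * FROM t AS OF TIMESTAMP '2023-04-27 00:00:00';
-- Snapshot/version style used by lake houses such as Paimon and Iceberg:
-- SELECT * FROM t VERSION AS OF 1;
```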


Re: [VOTE] Release 1.17.1, release candidate #1

2023-05-23 Thread Benchao Li
; > >
> > > > > > fingerprint 8D56AE6E7082699A4870750EA4E8C4C05EE6861F [3],
> > > > > >
> > > > > > * all artifacts to be deployed to the Maven Central Repository
> [4],
> > > > > >
> > > > > > * source code tag "release-1.17.1-rc1" [5],
> > > > > >
> > > > > > * website pull request listing the new release and adding
> > > announcement
> > > > > blog
> > > > > >
> > > > > > post [6].
> > > > > >
> > > > > >
> > > > > > The vote will be open for at least 72 hours. It is adopted by
> > > majority
> > > > > >
> > > > > > approval, with at least 3 PMC affirmative votes.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Release Manager
> > > > > >
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12352886
> > > > > >
> > > > > > [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-1.17.1-rc1/
> > > > > >
> > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > > >
> > > > > > [4]
> > > > > >
> > > >
> > https://repository.apache.org/content/repositories/orgapacheflink-1635/
> > > > > >
> > > > > > [5]
> > https://github.com/apache/flink/releases/tag/release-1.17.1-rc1
> > > > > >
> > > > > > [6] https://github.com/apache/flink-web/pull/650
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 

Best,
Benchao Li


Re: [ANNOUNCE] New Apache Flink PMC Member - Leonard Xu

2023-04-22 Thread Benchao Li
Congratulations, Leonard!

Lincoln Lee  wrote on Sun, Apr 23, 2023 at 09:12:

> Congratulations, Leonard!
>
> Best,
> Lincoln Lee
>
>
> Yuxin Tan  wrote on Sat, Apr 22, 2023 at 11:57:
>
> > Congratulations, Leonard!
> >
> > Best,
> > Yuxin
> >
> >
> > Panagiotis Garefalakis  wrote on Sat, Apr 22, 2023 at 08:08:
> >
> > > Congrats Leonard!
> > >
> > > On Fri, Apr 21, 2023 at 11:19 AM Ahmed Hamdy 
> > wrote:
> > >
> > > > Congratulations Leonard.
> > > > Best Regards
> > > > Ahmed
> > > >
> > > > On Fri, 21 Apr 2023 at 17:23, Samrat Deb 
> > wrote:
> > > >
> > > > > congratulations
> > > > >
> > > > > On Fri, 21 Apr 2023 at 9:44 PM, David Morávek 
> > wrote:
> > > > >
> > > > > > Congratulations, Leonard, well deserved!
> > > > > >
> > > > > > Best,
> > > > > > D.
> > > > > >
> > > > > > On Fri 21. 4. 2023 at 16:40, Feng Jin 
> > wrote:
> > > > > >
> > > > > > > Congratulations, Leonard
> > > > > > >
> > > > > > >
> > > > > > > 
> > > > > > > Best,
> > > > > > > Feng Jin
> > > > > > >
> > > > > > > On Fri, Apr 21, 2023 at 8:38 PM Mang Zhang  >
> > > > wrote:
> > > > > > >
> > > > > > > > Congratulations, Leonard.
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Mang Zhang
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > At 2023-04-21 19:47:52, "Jark Wu"  wrote:
> > > > > > > > >Hi everyone,
> > > > > > > > >
> > > > > > > > >We are thrilled to announce that Leonard Xu has joined the
> > Flink
> > > > > PMC!
> > > > > > > > >
> > > > > > > > >Leonard has been an active member of the Apache Flink
> > community
> > > > for
> > > > > > many
> > > > > > > > >years and became a committer in Nov 2021. He has been
> involved
> > > in
> > > > > > > various
> > > > > > > > >areas of the project, from code contributions to community
> > > > building.
> > > > > > His
> > > > > > > > >contributions are mainly focused on Flink SQL and
> connectors,
> > > > > > especially
> > > > > > > > >leading the flink-cdc-connectors project to receive 3.8+K
> > GitHub
> > > > > > stars.
> > > > > > > He
> > > > > > > > >authored 150+ PRs, and reviewed 250+ PRs, and drove several
> > > FLIPs
> > > > > > (e.g.,
> > > > > > > > >FLIP-132, FLIP-162). He has participated in plenty of
> > > discussions
> > > > in
> > > > > > the
> > > > > > > > >dev mailing list, answering questions about 500+ threads in
> > the
> > > > > > > > >user/user-zh mailing list. Besides that, he is community
> > minded,
> > > > > such
> > > > > > as
> > > > > > > > >being the release manager of 1.17, verifying releases,
> > managing
> > > > > > release
> > > > > > > > >syncs, etc.
> > > > > > > > >
> > > > > > > > >Congratulations and welcome Leonard!
> > > > > > > > >
> > > > > > > > >Best,
> > > > > > > > >Jark (on behalf of the Flink PMC)
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 

Best,
Benchao Li


Re: [ANNOUNCE] New Apache Flink PMC Member - Qingsheng Ren

2023-04-22 Thread Benchao Li
Congratulations, Qingsheng!

yuxia  wrote on Sun, Apr 23, 2023 at 09:24:

> Congratulations, Qingsheng!
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "lincoln 86xy" 
> To: "dev" 
> Sent: Sunday, Apr 23, 2023, 9:11:07 AM
> Subject: Re: [ANNOUNCE] New Apache Flink PMC Member - Qingsheng Ren
>
> Congratulations, Qingsheng!
>
> Best,
> Lincoln Lee
>
>
> Yuxin Tan  wrote on Sat, Apr 22, 2023 at 11:57:
>
> > Congratulations, Qingsheng!
> >
> > Best,
> > Yuxin
> >
> >
> > Ahmed Hamdy  wrote on Sat, Apr 22, 2023 at 02:20:
> >
> > > Congratulations Qingsheng.
> > > Best regards
> > > Ahmed
> > >
> > > On Fri, 21 Apr 2023 at 17:22, Samrat Deb 
> wrote:
> > >
> > > > congratulations !
> > > >
> > > > On Fri, 21 Apr 2023 at 9:45 PM, David Morávek 
> wrote:
> > > >
> > > > > Congratulations, Qingsheng, well deserved!
> > > > >
> > > > > Best,
> > > > > D.
> > > > >
> > > > > On Fri 21. 4. 2023 at 16:41, Feng Jin 
> wrote:
> > > > >
> > > > > > Congratulations, Qingsheng
> > > > > >
> > > > > >
> > > > > > 
> > > > > > Best,
> > > > > > Feng Jin
> > > > > >
> > > > > > On Fri, Apr 21, 2023 at 8:39 PM Mang Zhang 
> > > wrote:
> > > > > >
> > > > > > > Congratulations, Qingsheng.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Mang Zhang
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > At 2023-04-21 19:50:02, "Jark Wu"  wrote:
> > > > > > > >Hi everyone,
> > > > > > > >
> > > > > > > >We are thrilled to announce that Qingsheng Ren has joined the
> > > Flink
> > > > > PMC!
> > > > > > > >
> > > > > > > >Qingsheng has been contributing to Apache Flink for a long
> time.
> > > He
> > > > is
> > > > > > the
> > > > > > > >core contributor and maintainer of the Kafka connector and
> > > > > > > >flink-cdc-connectors, bringing users stability and ease of use
> > in
> > > > both
> > > > > > > >projects. He drove discussions and implementations in
> FLIP-221,
> > > > > > FLIP-288,
> > > > > > > >and the connector testing framework. He is continuously
> helping
> > > with
> > > > > the
> > > > > > > >expansion of the Flink community and has given several talks
> > about
> > > > > Flink
> > > > > > > >connectors at many conferences, such as Flink Forward Global
> and
> > > > Flink
> > > > > > > >Forward Asia. Besides that, he is willing to help a lot in the
> > > > > community
> > > > > > > >work, such as being the release manager for both 1.17 and
> 1.18,
> > > > > > verifying
> > > > > > > >releases, and answering questions on the mailing list.
> > > > > > > >
> > > > > > > >Congratulations and welcome Qingsheng!
> > > > > > > >
> > > > > > > >Best,
> > > > > > > >Jark (on behalf of the Flink PMC)
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 

Best,
Benchao Li


Re: Re: [VOTE] FLIP-302: Support TRUNCATE TABLE statement in batch mode

2023-04-18 Thread Benchao Li
+1 (binding)

Hang Ruan  wrote on Tue, Apr 18, 2023 at 14:03:

> +1 (no-binding)
>
> Best,
> Hang
>
> Shammon FY  wrote on Tue, Apr 18, 2023 at 13:33:
>
> > +1 (no-binding)
> >
> > Best,
> > Shammon FY
> >
> > On Tue, Apr 18, 2023 at 12:56 PM Jacky Lau  wrote:
> >
> > > +1 (no-binding)
> > >
> > > Best,
> > > Jacky Lau
> > >
> > > Jingsong Li  wrote on Tue, Apr 18, 2023 at 11:57:
> > >
> > > > +1
> > > >
> > > > On Tue, Apr 18, 2023 at 9:39 AM Aitozi  wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > Best,
> > > > > Aitozi
> > > > >
> > > > > ron  wrote on Tue, Apr 18, 2023 at 09:18:
> > > > > >
> > > > > > +1
> > > > > >
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: "Lincoln Lee" 
> > > > > > > Sent: 2023-04-18 09:08:08 (Tuesday)
> > > > > > > To: dev@flink.apache.org
> > > > > > > Cc:
> > > > > > > Subject: Re: [VOTE] FLIP-302: Support TRUNCATE TABLE statement in
> > batch
> > > > mode
> > > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > Best,
> > > > > > > Lincoln Lee
> > > > > > >
> > > > > > >
> > > > > > > yuxia  wrote on Mon, Apr 17, 2023 at 23:54:
> > > > > > >
> > > > > > > > Hi all.
> > > > > > > >
> > > > > > > > Thanks for all the feedback on FLIP-302: Support TRUNCATE
> TABLE
> > > > statement
> > > > > > > > in batch mode [1].
> > > > > > > > Based on the discussion [2], we have come to a consensus, so
> I
> > > > would like
> > > > > > > > to start a vote.
> > > > > > > >
> > > > > > > > The vote will last for at least 72 hours unless there is an
> > > > objection or
> > > > > > > > insufficient votes.
> > > > > > > >
> > > > > > > > [1]:
> > > > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement+in+batch+mode
> > > > > > > > [2]: [
> > > > https://lists.apache.org/thread/m4r3wrd7p96wdst3nz3ncqzog6kf51cf |
> > > > > > > >
> > https://lists.apache.org/thread/m4r3wrd7p96wdst3nz3ncqzog6kf51cf
> > > ]
> > > > > > > >
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Yuxia
> > > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Ron
> > > >
> > >
> >
>


-- 

Best,
Benchao Li
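
For context on the statement being voted on above, a hedged usage sketch in Flink SQL (table name is illustrative; per FLIP-302 the statement is supported in batch mode only):

```sql
-- Removes all rows from the target table while keeping its definition,
-- unlike DROP TABLE; available only in batch execution mode per FLIP-302.
TRUNCATE TABLE orders;
```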


Re: [VOTE] Release flink-connector-jdbc v3.1.0, release candidate #1

2023-04-16 Thread Benchao Li
And I've filed a hotfix PR[1] to fix the copyright issue.

[1] https://github.com/apache/flink-connector-jdbc/pull/41

Benchao Li  wrote on Sun, Apr 16, 2023 at 17:51:

> * Verified hashes, signatures (OK)
> * Download, build and test source code (OK)
> * Checked files in Nexus staging (OK)
> * Start a Flink local cluster (1.17.0), run sql-client with jdbc
> connector jar with Mysql 8.0.28 (OK)
> * Go through the release note (OK)
>
> One thing I spotted is that when I checked the NOTICE file, I found that
> we have not updated the copyright year to 2023, I'm not sure whether we
> should take this as a blocker issue. If not, my vote is +1.
>
> And I left my comments in the PR for website updating.
>
> Teoh, Hong  wrote on Fri, Apr 14, 2023 at 15:54:
>
>> Thanks Danny
>>
>> +1 (non-binding)
>>
>> * Hashes and Signatures look good
>> * All required files on dist.apache.org
>> * Tag is present in Github
>> * Verified source archive does not contain any binary files
>> * Source archive builds using maven
>>
>>
>> Cheers,
>> Hong
>>
>>
>> > On 14 Apr 2023, at 05:21, Elphas Toringepi 
>> wrote:
>> >
>> > CAUTION: This email originated from outside of the organization. Do not
>> click links or open attachments unless you can confirm the sender and know
>> the content is safe.
>> >
>> >
>> >
>> > +1 (non-binding)
>> >
>> > * JIRA release notes look good
>> > * Verified signature and checksum for Apache source
>> > * Checked source code tag and confirmed Github and Apache source
>> releases
>> > are identical
>> > * Reviewed website pull request and left comment
>> >
>> > Kind regards,
>> > Elphas
>> >
>> >
>> > On Thu, Apr 13, 2023 at 3:42 PM Danny Cranmer 
>> > wrote:
>> >
>> >> Hi everyone,
>> >> Please review and vote on the release candidate #1 for the version
>> v3.1.0,
>> >> as follows:
>> >> [ ] +1, Approve the release
>> >> [ ] -1, Do not approve the release (please provide specific comments)
>> >>
>> >> This version supports both Flink 1.16.x and Flink 1.17.x
>> >>
>> >> The complete staging area is available for your review, which includes:
>> >> * JIRA release notes [1],
>> >> * the official Apache source release to be deployed to dist.apache.org
>> >> [2],
>> >> which are signed with the key with fingerprint
>> >> 0F79F2AFB2351BC29678544591F9C1EC125FD8DB [3],
>> >> * all artifacts to be deployed to the Maven Central Repository [4],
>> >> * source code tag v3.1.0-rc1 [5],
>> >> * website pull request listing the new release [6].
>> >>
>> >> The vote will be open for at least 72 hours. It is adopted by majority
>> >> approval, with at least 3 PMC affirmative votes.
>> >>
>> >> Thanks,
>> >> Danny
>> >>
>> >> [1]
>> >>
>> >>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352678
>> >> [2]
>> >>
>> https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.1.0-rc1
>> >> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> >> [4]
>> >>
>> https://repository.apache.org/content/repositories/orgapacheflink-1614/
>> >> [5]
>> https://github.com/apache/flink-connector-jdbc/releases/tag/v3.1.0-rc1
>> >> [6] https://github.com/apache/flink-web/pull/638
>> >>
>>
>>
>
> --
>
> Best,
> Benchao Li
>


-- 

Best,
Benchao Li


Re: [VOTE] Release flink-connector-jdbc v3.1.0, release candidate #1

2023-04-16 Thread Benchao Li
* Verified hashes, signatures (OK)
* Download, build and test source code (OK)
* Checked files in Nexus staging (OK)
* Start a Flink local cluster (1.17.0), run sql-client with jdbc
connector jar with Mysql 8.0.28 (OK)
* Go through the release note (OK)

One thing I spotted is that when I checked the NOTICE file, I found that we
have not updated the copyright year to 2023, I'm not sure whether we should
take this as a blocker issue. If not, my vote is +1.

And I left my comments in the PR for website updating.

Teoh, Hong  wrote on Fri, Apr 14, 2023 at 15:54:

> Thanks Danny
>
> +1 (non-binding)
>
> * Hashes and Signatures look good
> * All required files on dist.apache.org
> * Tag is present in Github
> * Verified source archive does not contain any binary files
> * Source archive builds using maven
>
>
> Cheers,
> Hong
>
>
> > On 14 Apr 2023, at 05:21, Elphas Toringepi  wrote:
> >
> > CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
> >
> >
> >
> > +1 (non-binding)
> >
> > * JIRA release notes look good
> > * Verified signature and checksum for Apache source
> > * Checked source code tag and confirmed Github and Apache source releases
> > are identical
> > * Reviewed website pull request and left comment
> >
> > Kind regards,
> > Elphas
> >
> >
> > On Thu, Apr 13, 2023 at 3:42 PM Danny Cranmer 
> > wrote:
> >
> >> Hi everyone,
> >> Please review and vote on the release candidate #1 for the version
> v3.1.0,
> >> as follows:
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >> This version supports both Flink 1.16.x and Flink 1.17.x
> >>
> >> The complete staging area is available for your review, which includes:
> >> * JIRA release notes [1],
> >> * the official Apache source release to be deployed to dist.apache.org
> >> [2],
> >> which are signed with the key with fingerprint
> >> 0F79F2AFB2351BC29678544591F9C1EC125FD8DB [3],
> >> * all artifacts to be deployed to the Maven Central Repository [4],
> >> * source code tag v3.1.0-rc1 [5],
> >> * website pull request listing the new release [6].
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> >> approval, with at least 3 PMC affirmative votes.
> >>
> >> Thanks,
> >> Danny
> >>
> >> [1]
> >>
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352678
> >> [2]
> >>
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.1.0-rc1
> >> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >> [4]
> >> https://repository.apache.org/content/repositories/orgapacheflink-1614/
> >> [5]
> https://github.com/apache/flink-connector-jdbc/releases/tag/v3.1.0-rc1
> >> [6] https://github.com/apache/flink-web/pull/638
> >>
>
>

-- 

Best,
Benchao Li


Re: [VOTE] FLIP-292: Enhance COMPILED PLAN to support operator-level state TTL configuration

2023-04-10 Thread Benchao Li
+1 (binding)

Shuo Cheng  wrote on Tue, Apr 11, 2023 at 10:02:

> + 1 (non-binding)
>
> Best Regards,
> Shuo
>
> On Mon, Apr 10, 2023 at 6:06 PM Jane Chan  wrote:
>
> > Hi developers,
> >
> > Thanks for all the feedback on FLIP-292: Enhance COMPILED PLAN to support
> > operator-level state TTL configuration [1].
> > Based on the discussion [2], we have come to a consensus, so I would like
> > to start a vote.
> >
> > The vote will last for at least 72 hours (Apr. 13th at 10:00 A.M. GMT)
> > unless there is an objection or insufficient votes.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240883951
> > [2] https://lists.apache.org/thread/ffmc96gv8ofoskbxlhtm7w8oxv8nqzct
> >
> > Best,
> > Jane Chan
> >
>


-- 

Best,
Benchao Li


[jira] [Created] (FLINK-31673) Add E2E tests for flink jdbc driver

2023-03-30 Thread Benchao Li (Jira)
Benchao Li created FLINK-31673:
--

 Summary: Add E2E tests for flink jdbc driver
 Key: FLINK-31673
 URL: https://issues.apache.org/jira/browse/FLINK-31673
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / JDBC, Tests
Reporter: Benchao Li


Since the jdbc driver will be used by third-party projects, and we've introduced a
bundled jar in flink-sql-jdbc-driver-bundle, we'd better have e2e tests to
verify that it works correctly (in particular the dependency management).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-292: Support configuring state TTL at operator level for Table API & SQL programs

2023-03-25 Thread Benchao Li
> > > scope is more coarse and not operator-level*. Maybe it deserves
> another
> > > > FLIP to discuss whether we need a multiple-level state TTL
> > configuration
> > > > mechanism and how to properly implement it.
> > > >
> > > > @Shammon
> > > > > Generally, Flink jobs support two types
> > > > of submission: SQL and jar. If users want to use `TTL on Operator`
> for
> > > SQL
> > > > jobs, they need to edit the json file which is not supported by
> general
> > > job
> > > > submission systems such as flink sql-client, apache kyuubi, apache
> > > > streampark and .etc. Users need to download the file and edit it
> > > manually,
> > > > but they may not have the permissions to the storage system such as
> > HDFS
> > > in
> > > > a real production environment. From this perspective, I think it is
> > > > necessary to provide a way similar to
> > > > hits that users can configure the `TTL on Operator` in their sqls
> which
> > > > help users to use it conveniently.
> > > >
> > > > IIUC, SQL client supports the statement "EXECUTE PLAN
> > > > 'file:/foo/bar/example.json'". While I think there is not much
> evidence
> > > to
> > > > say we should choose to use hints, just because users cannot touch
> > their
> > > > development environment. As a reply to @Shuo,  the TTL set through
> hint
> > > way
> > > > is not at the operator level. And whether it is really "convenient"
> > needs
> > > > more discussion.
> > > >
> > > > > I agree with @Shuo's idea that for complex cases, users can combine
> > > hits
> > > > and `json plan` to configure `TTL on Operator` better.
> > > >
> > > > Suppose users can configure TTL through
> > > > <1> SET 'table.exec.state.ttl' = 'foo';
> > > > <2> Modify the compiled JSON plan;
> > > > <3> Use hints (personally I'm strongly against this way, but let's
> take
> > > it
> > > > into consideration).
> > > > IMO if the user can configure the same parameter in so many ways,
> then
> > > the
> > > > complex case only makes things worse. Who has higher priority and who
> > > > overrides who?
> > > >
> > > > Best,
> > > > Jane
> > > >
> > > >
> > > > On Fri, Mar 24, 2023 at 11:00 AM Shammon FY 
> wrote:
> > > >
> > > > > Hi jane
> > > > >
> > > > > Thanks for initializing this discussion. Configure TTL per operator
> > can
> > > > > help users manage state more effectively.
> > > > >
> > > > > I think the `compiled json plan` proposal may need to consider the
> > > impact
> > > > > on the user's submission workflow. Generally, Flink jobs support
> two
> > > > types
> > > > > of submission: SQL and jar. If users want to use `TTL on Operator`
> > for
> > > > SQL
> > > > > jobs, they need to edit the json file which is not supported by
> > general
> > > > job
> > > > > submission systems such as flink sql-client, apache kyuubi, apache
> > > > > streampark and .etc. Users need to download the file and edit it
> > > > manually,
> > > > > but they may not have the permissions to the storage system such as
> > > HDFS
> > > > in
> > > > > a real production environment.
> > > > >
> > > > > From this perspective, I think it is necessary to provide a way
> > similar
> > > > to
> > > > > hits that users can configure the `TTL on Operator` in their sqls
> > which
> > > > > help users to use it conveniently. At the same time, I agree with
> > > @Shuo's
> > > > > idea that for complex cases, users can combine hits and `json plan`
> > to
> > > > > configure `TTL on Operator` better. What do you think? Thanks
> > > > >
> > > > >
> > > > > Best,
> > > > > Shammon FY
> > > > >
> > > > >
> > > > > On Thu, Mar 23, 2023 at 9:58 PM Shuo Cheng 
> > wrote:
> > > > >
> > > > > > Correction: “users can set 'scan.startup.mode' for kafka
> connector”
> > > ->
> > > > > > “users
> > > > > > can set 'scan.startup.mode' for kafka connector by dynamic table
> > > > option”
> > > > > >
> > > > > > Shuo Cheng wrote on Thu, Mar 23, 2023 at 21:50:
> > > > > >
> > > > > > > Hi Jane,
> > > > > > > Thanks for driving this, operator level state ttl is
> absolutely a
> > > > > desired
> > > > > > > feature. I would share my opinion as following:
> > > > > > >
> > > > > > > If the scope of this proposal is limited as an enhancement for
> > > > compiled
> > > > > > > json plan, it makes sense. I think it does not conflict with
> > > > > configuring
> > > > > > > state ttl
> > > > > > > in other ways, e.g., SQL HINT or something else, because they
> > just
> > > > work
> > > > > > in
> > > > > > > different level, SQL Hint works in the exact entrance of SQL
> API,
> > > > while
> > > > > > > compiled json plan is the intermediate results for SQL.
> > > > > > > I think the final shape of state ttl configuring may like the
> > that,
> > > > > users
> > > > > > > can define operator state ttl using SQL HINT (assumption...),
> but
> > > it
> > > > > may
> > > > > > > affects more than one stateful operators inside the same query
> > > block,
> > > > > > then
> > > > > > > users can further configure a specific one by modifying the
> > > compiled
> > > > > json
> > > > > > > plan...
> > > > > > >
> > > > > > > In a word, this proposal is in good shape as an enhancement for
> > > > > compiled
> > > > > > > json plan, and it's orthogonal with other ways like SQL Hint
> > which
> > > > > works
> > > > > > in
> > > > > > > a higher level.
> > > > > > >
> > > > > > >
> > > > > > > Nips:
> > > > > > >
> > > > > > > > "From the SQL semantic perspective, hints cannot intervene in
> > the
> > > > > > > calculation of data results."
> > > > > > > I think it's more properly to say that hint does not affect the
> > > > > > > equivalence of execution plans (hash agg vs sort agg), not the
> > > > > > equivalence
> > > > > > > of execution results, e.g., users can set 'scan.startup.mode'
> for
> > > > kafka
> > > > > > > connector, which also "intervene in the calculation of data
> > > results".
> > > > > > >
> > > > > > > Sincerely,
> > > > > > > Shuo
> > > > > > >
> > > > > > > On Tue, Mar 21, 2023 at 7:52 PM Jane Chan <
> qingyue@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > >> Hi devs,
> > > > > > >>
> > > > > > >> I'd like to start a discussion on FLIP-292: Support
> configuring
> > > > state
> > > > > > TTL
> > > > > > >> at operator level for Table API & SQL programs [1].
> > > > > > >>
> > > > > > >> Currently, we only support job-level state TTL configuration
> via
> > > > > > >> 'table.exec.state.ttl'. However, users may expect a
> fine-grained
> > > > state
> > > > > > TTL
> > > > > > >> control to optimize state usage. Hence we propose to
> > > > > > serialize/deserialize
> > > > > > >> the state TTL as metadata of the operator's state to/from the
> > > > compiled
> > > > > > >> JSON
> > > > > > >> plan, to achieve the goal that specifying different state TTL
> > when
> > > > > > >> transforming the exec node to stateful operators.
> > > > > > >>
> > > > > > >> Look forward to your opinions!
> > > > > > >>
> > > > > > >> [1]
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240883951
> > > > > > >>
> > > > > > >> Best Regards,
> > > > > > >> Jane Chan
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 

Best,
Benchao Li
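
The workflow debated in the thread above (option <2>, editing the compiled JSON plan) can be sketched with existing Flink SQL statements; file paths and table names are illustrative, and the exact JSON shape of the TTL metadata is defined by FLIP-292 itself:

```sql
-- 1. Compile the statement to a JSON plan instead of executing it.
COMPILE PLAN 'file:///tmp/example.json' FOR
INSERT INTO sink_t SELECT a, COUNT(*) FROM source_t GROUP BY a;

-- 2. Edit the state TTL metadata of individual stateful operators in
--    /tmp/example.json by hand (per-operator TTL is what FLIP-292 adds).

-- 3. Submit the modified plan.
EXECUTE PLAN 'file:///tmp/example.json';
```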


Re: [VOTE] FLIP-300: Add targetColumns to DynamicTableSink#Context to solve the null overwrite problem of partial-insert

2023-03-15 Thread Benchao Li
+1 (binding)

Dong Lin  wrote on Wed, Mar 15, 2023 at 23:24:

> +1 (binding)
>
> On Mon, Mar 13, 2023 at 8:18 PM Lincoln Lee 
> wrote:
>
> > Dear Flink developers,
> >
> > Thanks for all your feedback for FLIP-300: Add targetColumns to
> > DynamicTableSink#Context to solve the null overwrite problem of
> > partial-insert[1] on the discussion thread[2].
> >
> > I'd like to start a vote for it. The vote will be open for at least 72
> > hours unless there is an objection or not enough votes.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240885081
> > [2] https://lists.apache.org/thread/bk8x0nqg4oc62jqryj9ntzzlpj062wd9
> >
> >
> > Best,
> > Lincoln Lee
> >
>


-- 

Best,
Benchao Li
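
For background, the "null overwrite" problem that FLIP-300 addresses arises from partial inserts; a hedged sketch (table and column names are illustrative):

```sql
-- Only columns a and c are specified. Without the target-column information
-- that FLIP-300 adds to DynamicTableSink#Context, a sink connector cannot
-- distinguish "column b was not written" from "write NULL into column b".
INSERT INTO sink_t (a, c) SELECT a, c FROM source_t;
```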


Re: [VOTE] FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway

2023-03-13 Thread Benchao Li
+1 (binding)

Shammon FY  wrote on Mon, Mar 13, 2023 at 13:47:

> Hi Devs,
>
> I'd like to start the vote on FLIP-293: Introduce Flink Jdbc Driver For Sql
> Gateway [1].
>
> The FLIP was discussed in thread [2], and it aims to introduce Flink Jdbc
> Driver module in Flink.
>
> The vote will last for at least 72 hours (03/16, 15:00 UTC+8) unless there
> is an objection or insufficient vote. Thank you all.
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-293%3A+Introduce+Flink+Jdbc+Driver+For+Sql+Gateway
> [2] https://lists.apache.org/thread/d1owrg8zh77v0xygcpb93fxt0jpjdkb3
>
>
> Best,
> Shammon.FY
>


-- 

Best,
Benchao Li

