Re: Re: [VOTE] FLIP-415: Introduce a new join operator to support minibatch

2024-01-18 Thread liu ron
+1(binding)

Best,
Ron

Xuyang  于2024年1月19日周五 13:58写道:

> +1 (non-binding)
>
> Best!
> Xuyang
>
>
>
>
>
> 在 2024-01-19 13:28:52,"Lincoln Lee"  写道:
> >+1 (binding)
> >
> >Best,
> >Lincoln Lee
> >
> >
> >Benchao Li  于2024年1月19日周五 13:15写道:
> >
> >> +1 (binding)
> >>
> >> shuai xu  于2024年1月19日周五 12:58写道:
> >>
> >>> Dear Flink Developers,
> >>>
> >>> Thank you for providing feedback on FLIP-415: Introduce a new join
> >>> operator to support minibatch[1]. I'd like to start a vote on this
> FLIP.
> >>> Here is the discussion thread[2].
> >>>
> >>> After the discussion, this FLIP will not introduce any new Option. The
> >>> minibatch join will default to compacting the changelog. As for the
> option
> >>> to control compaction within the minibatch that was mentioned in the
> >>> discussion, it could be discussed in a future FLIP.
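For reference, mini-batch execution in Flink SQL is controlled by the
existing table.exec.mini-batch.* options; as described above, the FLIP adds
no new option, and the minibatch join simply compacts the buffered changelog
(for example folding a -U/+U pair on the same key) before emitting it. A
minimal sketch, assuming the SQL client and illustrative table names:

SET 'table.exec.mini-batch.enabled' = 'true';
SET 'table.exec.mini-batch.allow-latency' = '2s';
SET 'table.exec.mini-batch.size' = '5000';

-- A regular streaming join then buffers input records per mini-batch and
-- compacts redundant changelog entries before sending them downstream.
SELECT o.order_id, p.product_name
FROM Orders AS o
JOIN Products AS p ON o.product_id = p.id;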
> >>>
> >>> The vote will be open for at least 72 hours unless there is an
> objection
> >>> or
> >>> insufficient votes.
> >>>
> >>> Best,
> >>> Xu Shuai
> >>>
> >>> [1]
> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch
> >>> [2]
> >>> lists.apache.org
> >>>
> >>
> >> --
> >>
> >> Best,
> >> Benchao Li
> >>
>


Re: Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-10 Thread liu ron
+1 non-binding

Best
Ron

Matthias Pohl  于2024年1月10日周三 23:05写道:

> +1 (binding)
>
> On Wed, Jan 10, 2024 at 3:35 PM ConradJam  wrote:
>
> > +1 non-binding
> >
> > Dawid Wysakowicz  于2024年1月10日周三 21:06写道:
> >
> > > +1 (binding)
> > > Best,
> > > Dawid
> > >
> > > On Wed, 10 Jan 2024 at 11:54, Piotr Nowojski 
> > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > śr., 10 sty 2024 o 11:25 Martijn Visser 
> > > > napisał(a):
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Wed, Jan 10, 2024 at 4:43 AM Xingbo Huang 
> > > wrote:
> > > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > Best,
> > > > > > Xingbo
> > > > > >
> > > > > > Dian Fu  于2024年1月10日周三 11:35写道:
> > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > Regards,
> > > > > > > Dian
> > > > > > >
> > > > > > > On Wed, Jan 10, 2024 at 5:09 AM Sharath  >
> > > > wrote:
> > > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Sharath
> > > > > > > >
> > > > > > > > On Tue, Jan 9, 2024 at 1:02 PM Venkata Sanath Muppalla <
> > > > > > > sanath...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 (non-binding)
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Sanath
> > > > > > > > >
> > > > > > > > > On Tue, Jan 9, 2024 at 11:16 AM Peter Huang <
> > > > > > > huangzhenqiu0...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1 (non-binding)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Best Regards
> > > > > > > > > > Peter Huang
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Jan 9, 2024 at 5:26 AM Jane Chan <
> > > > qingyue@gmail.com>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Jane
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jan 9, 2024 at 8:41 PM Lijie Wang <
> > > > > > > wangdachui9...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Lijie
> > > > > > > > > > > >
> > > > > > > > > > > > Jiabao Sun 
> > > 于2024年1月9日周二
> > > > > > > 19:28写道:
> > > > > > > > > > > >
> > > > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Jiabao
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 2024/01/09 09:58:04 xiangyu feng wrote:
> > > > > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > Xiangyu Feng
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Danny Cranmer  于2024年1月9日周二
> > > 17:50写道:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +1 (binding)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Danny
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Jan 9, 2024 at 9:31 AM Feng Jin <
> > > > > ji...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Feng Jin
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Jan 9, 2024 at 5:29 PM Yuxin Tan <
> > > > > > > ta...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Yuxin
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Márton Balassi 
> > 于2024年1月9日周二
> > > > > 17:25写道:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > +1 (binding)
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Jan 9, 2024 at 10:15 AM Leonard
> Xu
> > <
> > > > > > > > > > xb...@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > +1(binding)
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > Leonard
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 2024年1月9日 下午5:08,Yangze Guo <
> > > > ka...@gmail.com
> > > > > >
> > > > > > > 写道:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Tue, Jan 9, 2024 at 5:06 PM Robert
> > > > > Metzger <
> > > > > > > > > > > > > > > rmetz...@apache.org
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > wrote:

Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink

2023-12-10 Thread liu ron
+1

Best,
Ron

Yunqing Mo  于2023年12月11日周一 12:01写道:

> So cool, Big +1 for this exciting work.
>
> On 2023/12/07 03:24:59 Leonard Xu wrote:
> > Dear Flink devs,
> >
> > As you may have heard, we at Alibaba (Ververica) are planning to donate
> CDC Connectors for the Apache Flink project[1] to the Apache Flink
> community.
> >
> > CDC Connectors for Apache Flink comprise a collection of source
> connectors designed specifically for Apache Flink. These connectors[2]
> enable the ingestion of changes from various databases using Change Data
> Capture (CDC), most of these CDC connectors are powered by Debezium[3].
> They support both the DataStream API and the Table/SQL API, facilitating
> the reading of database snapshots and continuous reading of transaction
> logs with exactly-once processing, even in the event of failures.
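As a rough illustration of the Table/SQL API usage described above, a CDC
source is declared like any other table; the sketch below assumes the MySQL
CDC connector, and the host, credentials and table names are illustrative:

-- Sketch only: option values are placeholders for a real MySQL instance.
CREATE TABLE orders_cdc (
  order_id BIGINT,
  customer_id BIGINT,
  order_status STRING,
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'mysql.example.com',
  'port' = '3306',
  'username' = 'flink_user',
  'password' = '******',
  'database-name' = 'shop',
  'table-name' = 'orders'
);

-- The snapshot plus transaction log is then exposed as a changelog stream.
SELECT order_status, COUNT(*) AS cnt FROM orders_cdc GROUP BY order_status;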
> >
> >
> > Additionally, in the latest version 3.0, we have introduced many
> long-awaited features. Starting from CDC version 3.0, we've built a
> Streaming ELT Framework available for streaming data integration. This
> framework allows users to write their data synchronization logic in a
> simple YAML file, which will automatically be translated into a Flink
> DataStream job. It emphasizes optimizing the task submission process and
> offers advanced functionalities such as whole database synchronization,
> merging sharded tables, and schema evolution[4].
> >
> >
> > I believe this initiative is a perfect match for both sides. For the
> Flink community, it presents an opportunity to enhance Flink's competitive
> advantage in streaming data integration, promoting the healthy growth and
> prosperity of the Apache Flink ecosystem. For the CDC Connectors project,
> becoming a sub-project of Apache Flink means being part of a neutral
> open-source community, which can attract a more diverse pool of
> contributors.
> >
> > Please note that the aforementioned points represent only some of our
> motivations and vision for this donation. Specific future operations need
> to be further discussed in this thread. For example, the sub-project name
> after the donation; we hope to name it Flink-CDC, aiming at streaming data
> integration through Apache Flink, following the naming convention of
> Flink-ML. Also, this project is managed by a total of 8 maintainers,
> including 3 Flink PMC members and 1 Flink Committer. The remaining 4
> maintainers are also highly active contributors to the Flink community,
> donating this project to the Flink community implies that their permissions
> might be reduced. Therefore, we may need to bring up this topic for further
> discussion within the Flink PMC. Additionally, we need to discuss how to
> migrate existing users and documents. We have a user group of nearly 10,000
> people and a multi-version documentation site that need to migrate. We also need
> to plan for the migration of CI/CD processes and other specifics.
> >
> >
> > While there are many intricate details that require implementation, we
> are committed to progressing and finalizing this donation process.
> >
> >
> > Despite being Flink’s most active ecosystem project (as evaluated by
> GitHub metrics), it also boasts a significant user base. However, I believe
> it's essential to commence discussions on future operations only after the
> community reaches a consensus on whether they desire this donation.
> >
> >
> > Really looking forward to hearing what you think!
> >
> >
> > Best,
> > Leonard (on behalf of the Flink CDC Connectors project maintainers)
> >
> > [1] https://github.com/ververica/flink-cdc-connectors
> > [2]
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-connectors.html
> > [3] https://debezium.io
> > [4]
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-pipeline.html
>


Re: Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-04 Thread liu ron
Hi, Xuyang

Thanks for starting this FLIP discussion. Currently there are two types of
window aggregation in Flink SQL, namely the legacy group window aggregation
and the window TVF aggregation. These two types are not fully aligned in
behavior, which brings a lot of confusion to users, so there is a need to
unify and align them. I think the ideal end state is a single window TVF
aggregation that supports TUMBLE, HOP, CUMULATE and SESSION windows,
supports consuming CDC data streams, and also supports configuring
early-fire and late-fire.
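To make the two syntaxes concrete, here is a minimal sketch of the same
tumbling-window count written both ways (table and column names are
illustrative):

-- Legacy group window aggregation (the syntax to be deprecated):
SELECT user_id,
       TUMBLE_START(ts, INTERVAL '10' MINUTE) AS window_start,
       COUNT(*) AS cnt
FROM Orders
GROUP BY user_id, TUMBLE(ts, INTERVAL '10' MINUTE);

-- Window TVF aggregation (the unified target syntax):
SELECT user_id, window_start, window_end, COUNT(*) AS cnt
FROM TABLE(TUMBLE(TABLE Orders, DESCRIPTOR(ts), INTERVAL '10' MINUTES))
GROUP BY user_id, window_start, window_end;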

This FLIP is a continuation of FLIP-145, and it also lets a legacy group
window aggregation migrate transparently to the new window TVF aggregation,
which is very useful, especially for the support of CDC streams, a pain
point that users often report. Big +1 for this FLIP.

Best,
Ron

Xuyang  于2023年12月5日周二 11:11写道:

> Hi, Feng and David.
>
>
> Thank you very much for sharing your thoughts.
>
>
> This FLIP does not include officially exposing these experimental
> configurations to users, so there is no detailed description of that part.
> However, given that some technical users may have added these
> experimental configurations to actual production jobs, this FLIP also
> covers how these configurations are handled under the window TVF syntax.
>
>
> Overall, the behavior of using these experimental parameters is no
> different from before, and I think we should preserve compatibility for
> jobs that already use these experimental configurations.
>
>
> Looking forward to your thoughts.
>
>
>
>
> --
>
> Best!
> Xuyang
>
>
>
>
>
> At 2023-12-05 09:17:49, "David Anderson"  wrote:
> >The current situation (where we have both the legacy windows and the
> >TVF-based windows) is confusing for users, and I'd like to see us move
> >forward as rapidly as possible.
> >
> >Since the early fire, late fire, and allowed lateness features were never
> >documented or exposed to users, I don't feel that we need to provide
> >replacements for these internal, experimental features before officially
> >deprecating the legacy group window aggregations, and I'd rather not wait.
> >
> >However, I'd be delighted to see a proposal for what that might look like.
> >
> >Best,
> >David
> >
> >On Mon, Dec 4, 2023 at 12:45 PM Feng Jin  wrote:
> >
> >> Hi xuyang,
> >>
> >> Thank you for initiating this proposal.
> >>
> >> I'm glad to see that TVF's functionality can be fully supported.
> >>
> >> Regarding the early fire, late fire, and allow lateness features, how
> will
> >> they be provided to users? The documentation doesn't seem to provide a
> >> detailed description of this part.
> >>
> >> Since this FLIP will also involve a lot of feature development, I am
> more
> >> than willing to help, including development and code review.
> >>
> >> Best,
> >> Feng
> >>
> >> On Tue, Nov 28, 2023 at 8:31 PM Xuyang  wrote:
> >>
> >> > Hi all.
> >> > I'd like to start a discussion of FLIP-392: Deprecate the Legacy Group
> >> > Window Aggregation.
> >> >
> >> >
> >> > Although the current Flink SQL Window Aggregation documentation[1]
> >> > indicates that the legacy Group Window Aggregation
> >> > syntax has been deprecated, the new Window TVF Aggregation syntax has
> not
> >> > fully covered all of the features of the legacy one.
> >> >
> >> >
> >> > Compared to Group Window Aggregation, Window TVF Aggregation has
> several
> >> > advantages, such as two-stage optimization,
> >> > support for standard GROUPING SET syntax, and so on. However, it
> needs to
> >> > supplement and enrich the following features.
> >> >
> >> >
> >> > 1. Support for SESSION Window TVF Aggregation
> >> > 2. Support for consuming CDC stream
> >> > 3. Support for HOP window size with non-integer step length
> >> > 4. Support for configurations such as early fire, late fire and allow
> >> > lateness
> >> > (which are internal experimental configurations in Group Window
> >> > Aggregation and not public to users yet.)
> >> > 5. Unification of the Window TVF Aggregation operator in runtime at
> the
> >> > implementation layer
> >> > (In the long term, the cost to maintain the operators about Window TVF
> >> > Aggregation and Group Window Aggregation is too expensive.)
> >> >
> >> >
> >> > This flip aims to continue the unfinished work in FLIP-145[2], which
> is
> >> to
> >> > fully enable the capabilities of Window TVF Aggregation
> >> >  and officially deprecate the legacy syntax Group Window Aggregation,
> to
> >> > prepare for the removal of the legacy one in Flink 2.0.
> >> >
> >> >
> >> > I have already done some preliminary POC to validate the feasibility
> of
> >> > the related work in this flip as follows.
> >> > 1. POC for SESSION Window TVF Aggregation [3]
> >> > 2. POC for CUMULATE in Group Window Aggregation operator [4]
> >> > 3. POC for consuming CDC stream in Window Aggregation operator [5]
> >> >
> >> >
> >> > Looking forward to your feedback and thoughts!
> >> >
> >> >
> >> >
> >> > [1]
> >> >
> >>
> 

Re: [DISCUSS] Release 1.17.2

2023-11-06 Thread liu ron
+1

Best,
Ron

Jing Ge  于2023年11月6日周一 22:07写道:

> +1
> Thanks for your effort!
>
> Best regards,
> Jing
>
> On Mon, Nov 6, 2023 at 1:15 AM Konstantin Knauf  wrote:
>
> > Thank you for picking it up! +1
> >
> > Cheers,
> >
> > Konstantin
> >
> > Am Mo., 6. Nov. 2023 um 03:48 Uhr schrieb Yun Tang :
> >
> > > Hi all,
> > >
> > > I would like to discuss creating a new 1.17 patch release (1.17.2). The
> > > last 1.17 release is nearly half a year old, and since then, 79 tickets
> > have
> > > been closed [1], of which 15 are blocker/critical [2]. Some
> > > of them are quite important, such as FLINK-32758 [3], FLINK-32296 [4],
> > > FLINK-32548 [5]
> > > and FLINK-33010[6].
> > >
> > > In addition to this, FLINK-33149 [7] is important to bump snappy-java
> to
> > > 1.1.10.4.
> > > Although FLINK-33149 itself is still unresolved, the corresponding work for 1.17.2 has been done.
> > >
> > > I am not aware of any unresolved blockers and there are no in-progress
> > > tickets [8]. Please let me know if there are any issues you'd like to
> be
> > > included in this release but still not merged.
> > >
> > > If the community agrees to create this new patch release, I could
> > > volunteer as the release manager with Yu Chen.
> > >
> > > Since there will be another flink-1.16.3 release request during the
> same
> > > time, we will work with Rui Fan since many issues will be fixed in both
> > > releases.
> > >
> > > [1]
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.17.2%20%20and%20resolution%20%20!%3D%20%20Unresolved%20order%20by%20priority%20DESC
> > > [2]
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.17.2%20and%20resolution%20%20!%3D%20Unresolved%20%20and%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20by%20priority%20%20DESC
> > > [3] https://issues.apache.org/jira/browse/FLINK-32758
> > > [4] https://issues.apache.org/jira/browse/FLINK-32296
> > > [5] https://issues.apache.org/jira/browse/FLINK-32548
> > > [6] https://issues.apache.org/jira/browse/FLINK-33010
> > > [7] https://issues.apache.org/jira/browse/FLINK-33149
> > > [8] https://issues.apache.org/jira/projects/FLINK/versions/12353260
> > >
> > > Best
> > > Yun Tang
> > >
> >
> >
> > --
> > https://twitter.com/snntrable
> > https://github.com/knaufk
> >
>


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread liu ron
Great work, thanks everyone!

Best,
Ron

Alexander Fedulov  于2023年10月27日周五 04:00写道:

> Great work, thanks everyone!
>
> Best,
> Alexander
>
> On Thu, 26 Oct 2023 at 21:15, Martijn Visser 
> wrote:
>
> > Thank you all who have contributed!
> >
> > Op do 26 okt 2023 om 18:41 schreef Feng Jin 
> >
> > > Thanks for the great work! Congratulations
> > >
> > >
> > > Best,
> > > Feng Jin
> > >
> > > On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu  wrote:
> > >
> > > > Congratulations, Well done!
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > > On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee  >
> > > > wrote:
> > > >
> > > > > Thanks for the great work! Congrats all!
> > > > >
> > > > > Best,
> > > > > Lincoln Lee
> > > > >
> > > > >
> > > > > Jing Ge  于2023年10月27日周五 00:16写道:
> > > > >
> > > > > > The Apache Flink community is very happy to announce the release
> of
> > > > > Apache
> > > > > > Flink 1.18.0, which is the first release for the Apache Flink
> 1.18
> > > > > series.
> > > > > >
> > > > > > Apache Flink® is an open-source unified stream and batch data
> > > > processing
> > > > > > framework for distributed, high-performing, always-available, and
> > > > > accurate
> > > > > > data applications.
> > > > > >
> > > > > > The release is available for download at:
> > > > > > https://flink.apache.org/downloads.html
> > > > > >
> > > > > > Please check out the release blog post for an overview of the
> > > > > improvements
> > > > > > for this release:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> > > > > >
> > > > > > The full release notes are available in Jira:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> > > > > >
> > > > > > We would like to thank all contributors of the Apache Flink
> > community
> > > > who
> > > > > > made this release possible!
> > > > > >
> > > > > > Best regards,
> > > > > > Konstantin, Qingsheng, Sergey, and Jing
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] FLIP-373: Support Configuring Different State TTLs using SQL Hint

2023-10-25 Thread liu ron
+1(binding)

Best,
Ron
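For readers who did not follow the discussion thread: FLIP-373 proposes a
STATE_TTL query hint so that state retention can be set per table within a
single query. A rough sketch of the intended usage (table names and TTL
values are illustrative; see the FLIP linked below for the exact syntax):

SELECT /*+ STATE_TTL('Orders' = '1d', 'Customers' = '7d') */
       o.order_id, c.customer_name
FROM Orders AS o
JOIN Customers AS c ON o.customer_id = c.id;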

Jark Wu  于2023年10月25日周三 19:52写道:

> +1 (binding)
>
> Best,
> Jark
>
> On Wed, 25 Oct 2023 at 16:27, Jiabao Sun 
> wrote:
>
> > Thanks Jane for driving this.
> >
> > +1 (non-binding)
> >
> > Best,
> > Jiabao
> >
> >
> > > 2023年10月25日 16:22,Lincoln Lee  写道:
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Zakelly Lan  于2023年10月23日周一 14:15写道:
> > >
> > >> +1(non-binding)
> > >>
> > >> Best,
> > >> Zakelly
> > >>
> > >> On Mon, Oct 23, 2023 at 1:15 PM Benchao Li 
> > wrote:
> > >>>
> > >>> +1 (binding)
> > >>>
> > >>> Feng Jin  于2023年10月23日周一 13:07写道:
> > 
> >  +1(non-binding)
> > 
> > 
> >  Best,
> >  Feng
> > 
> > 
> >  On Mon, Oct 23, 2023 at 11:58 AM Xuyang  wrote:
> > 
> > > +1(non-binding)
> > >
> > >
> > >
> > >
> > > --
> > >
> > >Best!
> > >Xuyang
> > >
> > >
> > >
> > >
> > >
> > > At 2023-10-23 11:38:15, "Jane Chan"  wrote:
> > >> Hi developers,
> > >>
> > >> Thanks for all the feedback on FLIP-373: Support Configuring
> > >> Different
> > >> State TTLs using SQL Hint [1].
> > >> Based on the discussion [2], we have reached a consensus, so I'd
> > >> like to
> > >> start a vote.
> > >>
> > >> The vote will last for at least 72 hours (Oct. 26th at 10:00 A.M.
> > >> GMT)
> > >> unless there is an objection or insufficient votes.
> > >>
> > >> [1]
> > >>
> > >
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-373%3A+Support+Configuring+Different+State+TTLs+using+SQL+Hint
> > >> [2]
> > >> https://lists.apache.org/thread/3s69dhv3rp4s0kysnslqbvyqo3qf7zq5
> > >>
> > >> Best,
> > >> Jane
> > >
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>> Best,
> > >>> Benchao Li
> > >>
> >
> >
>


Re: [DISCUSS] Implementing SQL remote functions

2023-09-18 Thread liu ron
Hi, Alan

Thanks for driving this proposal. It sounds interesting.
Regarding implementing the remote function, can you go into more detail
about your idea: how we should support it and how users would use it, from
API design to semantics? And how does the remote function help to solve
your problem?

I understand that your core pain point is the performance impact of making
too many RPC calls. Regarding the three solutions you have explored, and
the lookup join cons in particular:

>> *Lookup Joins:*
Pros:
- Part of the Flink codebase
- High throughput
Cons:
- Unintuitive syntax
- Harder to do multiple remote calls per input row

I think one solution is to support a mini-batch lookup join at the
framework layer, issuing one RPC call per batch of input rows, which can
improve throughput.
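As a point of reference, Flink's lookup join can already run lookups
asynchronously when the connector implements an async lookup function, which
addresses part of the throughput concern; the mini-batching described above
would be a framework-level addition on top of that. A sketch reusing Alan's
table names and the LOOKUP join hint:

SELECT /*+ LOOKUP('table' = 'RemoteTable', 'async' = 'true',
                  'output-mode' = 'allow_unordered', 'capacity' = '200') */
       i.table_lookup_key, r.resp
FROM Inputs AS i
JOIN RemoteTable FOR SYSTEM_TIME AS OF i.proc_time AS r
  ON i.table_lookup_key = r.table_lookup_key;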

Best,
Ron

Alan Sheinberg  于2023年9月19日周二 07:34写道:

> Hello all,
>
> We want to implement a custom function that sends HTTP requests to a remote
> endpoint using Flink SQL. Even though the function will behave like a
> normal UDF, the runtime would issue calls asynchronously to achieve high
> throughput for these remote (potentially high latency) calls. What is the
> community's take on implementing greater support for such functions? Any
> feedback is appreciated.
>
> What we have explored so far:
>
> 1.  Using a lookup join
> <
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#lookup-join
> >.
> For example:
> create TEMPORARY TABLE RemoteTable(table_lookup_key string, resp string,
> PRIMARY KEY (table_lookup_key) NOT ENFORCED) with ('connector' =
> 'remote_call');
> SELECT i.table_lookup_key, resp FROM Inputs as i JOIN RemoteTable r FOR
> SYSTEM_TIME AS OF i.proc_time as a ON i.table_lookup_key = r.
> table_lookup_key;
>
> 2.  Using a polymorphic table function. Partially supported already for
> window
> functions
> <
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-tvf/
> >.
> For example:
> SELECT * FROM TABLE (REMOTE_CALL (Input => Table(TableToLookup) as d, Col
> => DESCRIPTOR("table_lookup_key")));
>
> 3.  Using an AsyncScalarFunction. Scalar functions are usually used as
> below (thus support for an async version of ScalarFunction required):
> SELECT REMOTE_CALL(t.table_lookup_key) FROM TableToLookup t;
>
> Some pros and cons for each approach:
>
> *Lookup Joins:*
> Pros:
> - Part of the Flink codebase
> - High throughput
> Cons:
> - Unintuitive syntax
> - Harder to do multiple remote calls per input row
>
> *PTFs:*
> Pros:
> - More intuitive syntax
> Cons:
> - Need to add more support in Flink. It may exist for specialized built-in
> functions, but not for user defined ones
>
> *AsyncScalarFunction:*
> Pros:
> - Most intuitive syntax
> - Easy to do as many calls per row input as desired
> Cons:
> - Need to add support in Flink, including a new interface with an async
> eval method
> - Out of order results could pose issues with SQL semantics. If we output
> in order, the throughput performance may suffer
>
> Thanks,
> Alan
>


Re: [VOTE] FLIP-334: Decoupling autoscaler and kubernetes and support the Standalone Autoscaler

2023-09-13 Thread liu ron
+1(non-binding)

Best,
Ron

Dong Lin  于2023年9月14日周四 09:01写道:

> Thank you Rui for the proposal.
>
> +1 (binding)
>
> On Wed, Sep 13, 2023 at 10:52 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> > Hi all,
> >
> > Thanks for all the feedback about the FLIP-334:
> > Decoupling autoscaler and kubernetes and
> > support the Standalone Autoscaler[1].
> > This FLIP was discussed in [2].
> >
> > I'd like to start a vote for it. The vote will be open for at least 72
> > hours (until Sep 16th 11:00 UTC+8) unless there is an objection or
> > insufficient votes.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-334+%3A+Decoupling+autoscaler+and+kubernetes+and+support+the+Standalone+Autoscaler
> > [2] https://lists.apache.org/thread/kmm03gls1vw4x6vk1ypr9ny9q9522495
> >
> > Best,
> > Rui
> >
>


Re: [VOTE] FLIP-323: Support Attached Execution on Flink Application Completion for Batch Jobs

2023-09-12 Thread liu ron
+1(non-binding)

Best,
Ron

ConradJam  于2023年9月12日周二 20:43写道:

> +1 (non-binding)
>
> Matt Wang  于2023年9月12日周二 19:29写道:
>
> > +1 (non-binding)
> >
> >
> > --
> >
> > Best,
> > Matt Wang
> >
> >
> >  Replied Message 
> > | From | Samrat Deb |
> > | Date | 09/12/2023 18:26 |
> > | To |  |
> > | Subject | Re: [VOTE] FLIP-323: Support Attached Execution on Flink
> > Application Completion for Batch Jobs |
> > Thank you for driving this FLIP,
> >
> > +1 (non-binding)
> >
> > Bests,
> > Samrat
> >
> > On Tue, Sep 12, 2023 at 2:27 PM Ahmed Hamdy 
> wrote:
> >
> > Hi Allison
> > Thanks for the proposal.
> +1 (non-binding)
> > Best Regards
> > Ahmed Hamdy
> >
> >
> > On Mon, 11 Sept 2023 at 03:48, Weihua Hu  wrote:
> >
> > +1 (binding)
> >
> > Best,
> > Weihua
> >
> >
> > On Mon, Sep 11, 2023 at 3:16 AM Jing Ge 
> > wrote:
> >
> > +1(binding)
> >
> > Best Regards,
> > Jing
> >
> > On Sun, Sep 10, 2023 at 10:17 AM Dong Lin  wrote:
> >
> > Thanks Allison for proposing the FLIP.
> >
> > +1 (binding)
> >
> > On Fri, Sep 8, 2023 at 4:21 AM Allison Chang
> >  >
> > wrote:
> >
> > Hi everyone,
> >
> > Would like to start the VOTE for FLIP-323<
> >
> >
> >
> >
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-323%3A+Support+Attached+Execution+on+Flink+Application+Completion+for+Batch+Jobs
> >
> > which proposes to introduce attached execution for batch jobs. The
> > discussion thread can be found here<
> > https://lists.apache.org/thread/d3toldk6qqjh2fnbmqthlfkj9rc6lwgl>:
> >
> >
> > Best,
> >
> > Allison Chang
> >
> >
> >
> >
> >
> >
> >
>
> --
> Best
>
> ConradJam
>


Re: 退订 (Unsubscribe)

2023-09-07 Thread liu ron
Hi,

Please send an email to dev-unsubscr...@flink.apache.org if you want to
unsubscribe from dev@flink.apache.org, and you can refer to [1][2]
for more details.

[1]
https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8
[2] https://flink.apache.org/community.html#mailing-lists

Best,
Ron


喻凯  于2023年9月7日周四 19:38写道:

> 退订 (Unsubscribe)


Re: Proposal for Implementing Keyed Watermarks in Apache Flink

2023-09-06 Thread liu ron
Hi Tawfik,

In distributed scenarios, skew between fast and slow sub-streams causes the
watermark to advance too quickly, which leads to dropped data and is a
headache in Flink. I can't wait to read your research paper!
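For context, in Flink SQL today the watermark strategy is declared once per
table and advances based on the whole stream rather than per key; a minimal
sketch of such a declaration (the connector and column names are
illustrative):

CREATE TABLE Events (
  user_id STRING,
  payload STRING,
  ts TIMESTAMP(3),
  -- A single watermark for the whole stream: a fast key can advance it
  -- past events that a slow key has not yet produced.
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
  'connector' = 'datagen'
);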

Best,
Ron

Yun Tang  于2023年9月6日周三 14:46写道:

> Hi Tawfik,
>
> Thanks for offering such a proposal, looking forward to your research
> paper!
>
> You could also ask for edit permission on the Flink Improvement Proposals
> wiki [1] to create a new proposal if you want to contribute this to the
> community yourself.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>
> Best
> Yun Tang
> 
> From: yuxia 
> Sent: Wednesday, September 6, 2023 12:31
> To: dev 
> Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink
>
> Hi, Tawfik Yasser.
> Thanks for the proposal.
> It sounds exciting. I can't wait to read the research paper for more details.
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "David Morávek" 
> To: "dev" 
> Sent: Tuesday, September 5, 2023, 4:36:51 PM
> Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink
>
> Hi Tawfik,
>
> It's exciting to see any ongoing research that tries to push Flink forward!
>
> To get the discussion started, can you please share your paper with the
> community? Assessing the proposal without further context is tough.
>
> Best,
> D.
>
> On Mon, Sep 4, 2023 at 4:42 PM Tawfek Yasser Tawfek 
> wrote:
>
> > Dear Apache Flink Development Team,
> >
> > I hope this email finds you well. I am writing to propose an exciting new
> > feature for Apache Flink that has the potential to significantly enhance
> > its capabilities in handling unbounded streams of events, particularly in
> > the context of event-time windowing.
> >
> > As you may be aware, Apache Flink has been at the forefront of Big Data
> > Stream processing engines, leveraging windowing techniques to manage
> > unbounded event streams effectively. The accuracy of the results obtained
> > from these streams relies heavily on the ability to gather all relevant
> > input within a window. At the core of this process are watermarks, which
> > serve as unique timestamps marking the progression of events in time.
> >
> > However, our analysis has revealed a critical issue with the current
> > watermark generation method in Apache Flink. This method, which operates
> at
> > the input stream level, exhibits a bias towards faster sub-streams,
> > resulting in the unfortunate consequence of dropped events from slower
> > sub-streams. Our investigations showed that Apache Flink's conventional
> > watermark generation approach led to an alarming data loss of
> approximately
> > 33% when 50% of the keys around the median experienced delays. This loss
> > further escalated to over 37% when 50% of random keys were delayed.
> >
> > In response to this issue, we have authored a research paper outlining a
> > novel strategy named "keyed watermarks" to address data loss and
> > substantially enhance data processing accuracy, achieving at least 99%
> > accuracy in most scenarios.
> >
> > Moreover, we have conducted comprehensive comparative studies to evaluate
> > the effectiveness of our strategy against the conventional watermark
> > generation method, specifically in terms of event-time tracking accuracy.
> >
> > We believe that implementing keyed watermarks in Apache Flink can greatly
> > enhance its performance and reliability, making it an even more valuable
> > tool for organizations dealing with complex, high-throughput data
> > processing tasks.
> >
> > We kindly request your consideration of this proposal. We would be eager
> > to discuss further details, provide the full research paper, or
> collaborate
> > closely to facilitate the integration of this feature into Apache Flink.
> >
> > Thank you for your time and attention to this proposal. We look forward
> to
> > the opportunity to contribute to the continued success and evolution of
> > Apache Flink.
> >
> > Best Regards,
> >
> > Tawfik Yasser
> > Senior Teaching Assistant @ Nile University, Egypt
> > Email: tyas...@nu.edu.eg
> > LinkedIn: https://www.linkedin.com/in/tawfikyasser/
> >
>


Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints

2023-09-06 Thread liu ron
Hi, Bonnie

I'm with Jark in questioning why disabling query hints is needed if they
don't affect security. If users don't need hints, they won't care about
them, and I don't think they will be a nuisance. On top of that, the lookup
join hint is very useful for streaming jobs, and disabling query hints
would mean users can no longer use it.
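To illustrate the kind of query hint that would become unusable if hints
were disabled wholesale, here is a sketch of a lookup join using the LOOKUP
hint with retry options (table and column names are illustrative):

SELECT /*+ LOOKUP('table' = 'Customers',
                  'retry-predicate' = 'lookup_miss',
                  'retry-strategy' = 'fixed_delay',
                  'fixed-delay' = '5s',
                  'max-attempts' = '3') */
       o.order_id, c.address
FROM Orders AS o
JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
  ON o.customer_id = c.id;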

Best,
Ron

Bonnie Arogyam Varghese  于2023年9月6日周三
23:52写道:

> Hi Liu Ron,
>  To answer your question,
>Security might not be the main reason for disabling this option but
> other arguments brought forward by Timo. Let me know if you have any
> further questions or concerns.
>
> On Tue, Sep 5, 2023 at 9:35 PM Bonnie Arogyam Varghese <
> bvargh...@confluent.io> wrote:
>
> > It looks like it will be nice to have a config to disable hints. Any
> other
> > thoughts/concerns before we can close this discussion?
> >
> > On Fri, Aug 18, 2023 at 7:43 AM Timo Walther  wrote:
> >
> >>  > lots of the streaming SQL syntax are extensions of SQL standard
> >>
> >> That is true. But hints are kind of a special case because they are not
> >> even "part of Flink SQL" that's why they are written in a comment
> syntax.
> >>
> >> Anyway, I feel hints could be sometimes confusing for users because most
> >> of them have no effect for streaming and long-term we could also set
> >> some hints via the CompiledPlan. And if you have multiple teams,
> >> non-skilled users should not play around with hints and leave the
> >> decision to the system that might become smarter over time.
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> On 17.08.23 18:47, liu ron wrote:
> >> > Hi, Bonnie
> >> >
> >> >> Options hints could be a security concern since users can override
> >> > settings.
> >> >
> >> > I think this still doesn't answer my question
> >> >
> >> > Best,
> >> > Ron
> >> >
> >> > Jark Wu  于2023年8月17日周四 19:51写道:
> >> >
> >> >> Sorry, I still don't understand why we need to disable the query
> hint.
> >> >> It doesn't have the security problems as options hint. Bonnie said it
> >> >> could affect performance, but that depends on users using it
> >> explicitly.
> >> >> If there is any performance problem, users can remove the hint.
> >> >>
> >> >> If we want to disable query hint just because it's an extension to
> SQL
> >> >> standard.
> >> >> I'm afraid we have to introduce a bunch of configuration, because
> lots
> >> of
> >> >> the streaming SQL syntax are extensions of SQL standard.
> >> >>
> >> >> Best,
> >> >> Jark
> >> >>
> >> >> On Thu, 17 Aug 2023 at 15:43, Timo Walther 
> wrote:
> >> >>
> >> >>> +1 for this proposal.
> >> >>>
> >> >>> Not every data team would like to enable hints. Also because they
> are
> >> an
> >> >>> extension to the SQL standard. It might also be the case that custom
> >> >>> rules would be overwritten otherwise. Setting hints could also be
> the
> >> >>> exclusive task of a DevOp team.
> >> >>>
> >> >>> Regards,
> >> >>> Timo
> >> >>>
> >> >>>
> >> >>> On 17.08.23 09:30, Konstantin Knauf wrote:
> >> >>>> Hi Bonnie,
> >> >>>>
> >> >>>> this makes sense to me, in particular, given that we already have
> >> this
> >> >>>> toggle for a different type of hints.
> >> >>>>
> >> >>>> Best,
> >> >>>>
> >> >>>> Konstantin
> >> >>>>
> >> >>>> Am Mi., 16. Aug. 2023 um 19:38 Uhr schrieb Bonnie Arogyam Varghese
> >> >>>> :
> >> >>>>
> >> >>>>> Hi Liu,
> >> >>>>>Options hints could be a security concern since users can
> >> override
> >> >>>>> settings. However, query hints specifically could affect
> >> performance.
> >> >>>>> Since we have a config to disable Options hint, I'm suggesting we
> >> also
> >> >>> have
> >> >>>>> a config to disable Query hints.
> >> >>>>>
> >> >>>>

Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-06 Thread liu ron
Hi, Gyula

Thanks for driving this proposal. GC-related metrics are very helpful for
profiling and finding root causes. +1 for this proposal.

Best,
Ron

Matt Wang  于2023年9月6日周三 15:24写道:

> Hi Gyula,
>
> +1 for this proposal.
>
> Do we need to add a metric to record the count of different
> collectors? Now there is only a total count. For example,
> for G1, there is no way to distinguish whether it is the
> young generation or the old generation.
>
>
>
> --
>
> Best,
> Matt Wang
>
>
>  Replied Message 
> | From | Gyula Fóra |
> | Date | 09/6/2023 15:03 |
> | To |  |
> | Subject | Re: [DISCUSS] FLIP-361: Improve GC Metrics |
> Thanks Xintong!
>
> Just so I understand correctly, do you suggest adding a metric for
> delta(Time) / delta(Count) since the last reporting ?
> .TimePerGc or .AverageTime would make sense.
> AverageTime may be a bit nicer :)
>
> My only concern is how useful this will be in reality. If there are only
> (or several) long pauses then the msPerSec metrics will show it already,
> and if there is a single long pause that may not be shown at all if there
> are several shorter pauses as well with this metric.
>
> Gyula
>
> On Wed, Sep 6, 2023 at 8:46 AM Xintong Song  wrote:
>
> Thanks for bringing this up, Gyula.
>
> The proposed changes make sense to me. +1 for them.
>
> In addition to the proposed changes, I wonder if we should also add
> something like timePerGc? This would help understand whether there are long
> pauses, due to GC STW, that may lead to rpc unresponsiveness and heartbeat
> timeouts. Ideally, we'd like to understand the max pause time per STW in a
> recent time window. However, I don't see an easy way to separate the pause
> time of each STW. Deriving the overall time per GC from the existing
> metrics (time-increment / count-increment) seems to be a good alternative.
> WDYT?
>
> Best,
>
> Xintong
>
>
>
> On Wed, Sep 6, 2023 at 2:16 PM Rui Fan <1996fan...@gmail.com> wrote:
>
> Thanks for the clarification!
>
> By default the meterview measures for 1 minute sounds good to me!
>
> +1 for this proposal.
>
> Best,
> Rui
>
> On Wed, Sep 6, 2023 at 1:27 PM Gyula Fóra  wrote:
>
> Thanks for the feedback Rui,
>
> The rates would be computed using the MeterView class (like for any
> other
> rate metric), just because we report the value per second it doesn't
> mean
> that we measure in a second granularity.
> By default the meterview measures for 1 minute and then we calculate
> the
> per second rates, but we can increase the timespan if necessary.
>
> So I don't think we run into this problem in practice and we can keep
> the
> metric aligned with other time rate metrics like busyTimeMsPerSec etc.
>
> Cheers,
> Gyula
>
> On Wed, Sep 6, 2023 at 4:55 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> Hi Gyula,
>
> +1 for this proposal. The current GC metric is really unfriendly.
>
> I have a concern with your proposed rate metric: the rate is
> perSecond
> instead of per minute. I'm unsure whether it's suitable for GC
> metric.
>
> There are two reasons why I suspect perSecond may not be well
> compatible with GC metric:
>
> 1. GCs are usually infrequent and may only occur for a small number
> of time periods within a minute.
>
> Metrics are collected periodically, for example, reported every
> minute.
> If the result reported by the GC metric is 1s/perSecond, it does not
> mean that the GC of the TM is serious, because there may be no GC
> in the remaining 59s.
>
> On the contrary, the GC metric reports 0s/perSecond, which does not
> mean that the GC of the TM is not serious, and the GC may be very
> serious in the remaining 59s.
>
> 2. Stop-the-world may cause the metric to fail(delay) to report
>
> The TM will stop the world during GC, especially full GC. It means
> the metric cannot be collected or reported during full GC.
>
> So the collected GC metric may never be 1s/perSecond. This metric
> may always be good because the metric will only be reported when
> the GC is not severe.
>
>
> If these concerns make sense, how about updating the GC rate
> at minute level?
>
> We can define the type as a Gauge for TimeMsPerMinute, and update
> this Gauge every second, i.e.:
> GC total time at the current time - GC total time one minute ago.
>
> Best,
> Rui
>
> On Tue, Sep 5, 2023 at 11:05 PM Maximilian Michels 
> wrote:
>
> Hi Gyula,
>
> +1 The proposed changes make sense and are in line with what is
> available for other metrics, e.g. number of records processed.
>
> -Max
>
> On Tue, Sep 5, 2023 at 2:43 PM Gyula Fóra 
> wrote:
>
> Hi Devs,
>
> I would like to start a discussion on FLIP-361: Improve GC
> Metrics
> [1].
>
> The current Flink GC metrics [2] are not very useful for
> monitoring
> purposes as they require post processing logic that is also
> dependent
> on
> the current runtime environment.
>
> Problems:
> - Total time is not very relevant for long running applications,
> only
> the
> rate of change (msPerSec)
> - In most cases it's best to simply aggregate the time/count
> 

Re: 退订 (Unsubscribe)

2023-08-30 Thread liu ron
Please send an email to user-unsubscr...@flink.apache.org if you want to
unsubscribe from u...@flink.apache.org, and you can refer to [1][2]
for more details.

[1]
https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8
[2] https://flink.apache.org/community.html#mailing-lists

Best,
Ron

喻凯  于2023年8月30日周三 14:17写道:

>
>


Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints

2023-08-17 Thread liu ron
Hi, Bonnie

> Options hints could be a security concern since users can override
settings.

I think this still doesn't answer my question

Best,
Ron

Jark Wu  于2023年8月17日周四 19:51写道:

> Sorry, I still don't understand why we need to disable the query hint.
> It doesn't have the security problems as options hint. Bonnie said it
> could affect performance, but that depends on users using it explicitly.
> If there is any performance problem, users can remove the hint.
>
> If we want to disable query hint just because it's an extension to SQL
> standard.
> I'm afraid we have to introduce a bunch of configuration, because lots of
> the streaming SQL syntax are extensions of SQL standard.
>
> Best,
> Jark
>
> On Thu, 17 Aug 2023 at 15:43, Timo Walther  wrote:
>
> > +1 for this proposal.
> >
> > Not every data team would like to enable hints. Also because they are an
> > extension to the SQL standard. It might also be the case that custom
> > rules would be overwritten otherwise. Setting hints could also be the
> > exclusive task of a DevOp team.
> >
> > Regards,
> > Timo
> >
> >
> > On 17.08.23 09:30, Konstantin Knauf wrote:
> > > Hi Bonnie,
> > >
> > > this makes sense to me, in particular, given that we already have this
> > > toggle for a different type of hints.
> > >
> > > Best,
> > >
> > > Konstantin
> > >
> > > Am Mi., 16. Aug. 2023 um 19:38 Uhr schrieb Bonnie Arogyam Varghese
> > > :
> > >
> > >> Hi Liu,
> > >>   Options hints could be a security concern since users can override
> > >> settings. However, query hints specifically could affect performance.
> > >> Since we have a config to disable Options hint, I'm suggesting we also
> > have
> > >> a config to disable Query hints.
> > >>
> > >> On Wed, Aug 16, 2023 at 9:41 AM liu ron  wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> Thanks for driving this proposal.
> > >>>
> > >>> Can you explain why you would need to disable query hints because of
> > >>> security issues? I don't really understand why query hints affects
> > >>> security.
> > >>>
> > >>> Best,
> > >>> Ron
> > >>>
> > >>> Bonnie Arogyam Varghese 
> 于2023年8月16日周三
> > >>> 23:59写道:
> > >>>
> > >>>> Platform providers may want to disable hints completely for security
> > >>>> reasons.
> > >>>>
> > >>>> Currently, there is a configuration to disable OPTIONS hint -
> > >>>>
> > >>>>
> > >>>
> > >>
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-dynamic-table-options-enabled
> > >>>>
> > >>>> However, there is no configuration available to disable QUERY hints
> -
> > >>>>
> > >>>>
> > >>>
> > >>
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sql/queries/hints/#query-hints
> > >>>>
> > >>>> The proposal is to add a new configuration:
> > >>>>
> > >>>> Name: table.query-options.enabled
> > >>>> Description: Enable or disable the QUERY hint, if disabled, an
> > >>>> exception would be thrown if any QUERY hints are specified
> > >>>> Note: The default value will be set to true.
> > >>>>
> > >>>
> > >>
> > >
> > >
> >
> >
>


Re: [DISCUSS] FLIP-323: Support Attached Execution on Flink Application Completion for Batch Jobs

2023-08-16 Thread liu ron
Hi, Jiangjie

Thanks for your detailed explanation; I get your point. If
execution.attached is currently only used by the client, removing it also
makes sense to me.

Best,
Ron

Becket Qin  于2023年8月17日周四 07:37写道:

> Hi Ron,
>
> Isn't the cluster (session or per job) only using the execution.attached to
> determine whether the client is attached? If so, the client can always
> include the information of whether it's an attached client or not in the
> JobSubmissoinRequestBody, right? For a shared session cluster, there could
> be multiple clients submitting jobs to it. These clients may or may not be
> attached. A static execution.attached configuration for the session cluster
> does not work in this case, right?
>
> The current problem of execution.attached is that it is not always honored.
> For example, if a session cluster was started with execution.attached set
> to false. And a client submits a job later to that session cluster with
> execution.attached set to true. In this case, the cluster won't (and
> shouldn't) shutdown after the job finishes or the attached client loses
> connection. So, in fact, the execution.attached configuration is only
> honored by the client, but not the cluster. Therefore, I think removing it
> makes sense.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Aug 17, 2023 at 12:31 AM liu ron  wrote:
>
> > Hi, Jiangjie
> >
> > Sorry for late reply. Thank you for such a detailed response. As you say,
> > there are three behaviours here for users and I agree with you. The goal
> of
> > this FLIP is to clarify the behaviour of the client side, which I also
> > agree with. However, as weihua said, the config execution.attached is not
> > only for per-job mode, but also for session mode, but the FLIP says that
> > this is only for per-job mode, and this config will be removed in the
> > future because the per-job mode has been deprecated. I don't think this
> is
> > correct and we should change the description in the corresponding section
> > of the FLIP. Since execution.attached is used in session mode, there is a
> > compatibility issue here if we change it directly to
> > client.attached.after.submission, and I think we should make this clear
> in
> > the FLIP.
> >
> > Best,
> > Ron
> >
> > Becket Qin  于2023年8月14日周一 20:33写道:
> >
> > > Hi Ron and Weihua,
> > >
> > > Thanks for the feedback.
> > >
> > > There seem three user sensible behaviors that we are talking about:
> > >
> > > 1. The behavior on the client side, i.e. whether blocking until the job
> > > finishes or not.
> > >
> > > 2. The behavior of the submitted job, whether stop the job execution if
> > the
> > > client is detached from the Flink cluster, i.e. whether bind the
> > lifecycle
> > > of the job with the connection status of the attached client. For
> > example,
> > > one might want to keep a batch job running until finish even after the
> > > client connection is lost. But it makes sense to stop the job upon
> client
> > > connection lost if the job invokes collect() on a streaming job.
> > >
> > > 3. The behavior of the Flink cluster (JM and TMs), whether shutdown the
> > > Flink cluster if the client is detached from the Flink cluster, i.e.
> > > whether bind the cluster lifecycle with the job lifecycle. For
> dedicated
> > > clusters (application cluster or dedicated session clusters), the
> > lifecycle
> > > of the cluster should be bound with the job lifecycle. But for shared
> > > session clusters, the lifecycle of the Flink cluster should be
> > independent
> > > of the jobs running in it.
> > >
> > > As we can see, these three behaviors are sort of independent, the
> current
> > > configurations fail to support all the combination of wanted behaviors.
> > > Ideally there should be three separate configurations, for example:
> > > - client.attached.after.submission and client.heartbeat.timeout control
> > the
> > > behavior on the client side.
> > > - jobmanager.cancel-on-attached-client-exit controls the behavior of
> the
> > > job when an attached client lost connection. The client heartbeat
> timeout
> > > and attach-ness will be also passed to the JM upon job submission.
> > > - cluster.shutdown-on-first-job-finishes *(*or
> > > jobmanager.shutdown-cluster-after-job-finishes) controls the cluster
> > > behavior after the job finishes normally / abnormally. This is a
> cluster
> > > level setting instead of a j

Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints

2023-08-16 Thread liu ron
Hi,

Thanks for driving this proposal.

Can you explain why you would need to disable query hints because of
security issues? I don't really understand why query hints affects security.

Best,
Ron

Bonnie Arogyam Varghese  于2023年8月16日周三
23:59写道:

> Platform providers may want to disable hints completely for security
> reasons.
>
> Currently, there is a configuration to disable OPTIONS hint -
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-dynamic-table-options-enabled
>
> However, there is no configuration available to disable QUERY hints -
>
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sql/queries/hints/#query-hints
>
> The proposal is to add a new configuration:
>
> Name: table.query-options.enabled
> Description: Enable or disable the QUERY hint, if disabled, an
> exception would be thrown if any QUERY hints are specified
> Note: The default value will be set to true.
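To make the distinction concrete, here is a sketch of the two kinds of hints
involved: the OPTIONS hint that is already gated by
table.dynamic-table-options.enabled, and a query hint that the proposed flag
would gate. The connector option assumes a Kafka-like source, and the table
names are illustrative:

-- Dynamic table options hint (already has an enable/disable switch):
SELECT * FROM Orders /*+ OPTIONS('scan.startup.mode' = 'latest-offset') */;

-- Query hint, e.g. a batch join strategy hint (what the new flag would gate):
SELECT /*+ BROADCAST(Products) */ o.order_id, p.product_name
FROM Orders AS o
JOIN Products AS p ON o.product_id = p.id;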
>


Re: [DISCUSS] FLIP-323: Support Attached Execution on Flink Application Completion for Batch Jobs

2023-08-16 Thread liu ron
Hi, Jiangjie

Sorry for the late reply, and thank you for such a detailed response. As
you say, there are three user-visible behaviours here, and I agree with
you. The goal of this FLIP is to clarify the behaviour of the client side,
which I also agree with. However, as Weihua said, the config
execution.attached is used not only in per-job mode but also in session
mode, whereas the FLIP says it is only for per-job mode and will be removed
in the future because per-job mode has been deprecated. I don't think this
is correct, and we should change the description in the corresponding
section of the FLIP. Since execution.attached is used in session mode,
there is a compatibility issue if we change it directly to
client.attached.after.submission, and I think we should make this clear in
the FLIP.

Best,
Ron

Becket Qin  于2023年8月14日周一 20:33写道:

> Hi Ron and Weihua,
>
> Thanks for the feedback.
>
> There seem three user sensible behaviors that we are talking about:
>
> 1. The behavior on the client side, i.e. whether blocking until the job
> finishes or not.
>
> 2. The behavior of the submitted job, whether stop the job execution if the
> client is detached from the Flink cluster, i.e. whether bind the lifecycle
> of the job with the connection status of the attached client. For example,
> one might want to keep a batch job running until finish even after the
> client connection is lost. But it makes sense to stop the job upon client
> connection lost if the job invokes collect() on a streaming job.
>
> 3. The behavior of the Flink cluster (JM and TMs), whether shutdown the
> Flink cluster if the client is detached from the Flink cluster, i.e.
> whether bind the cluster lifecycle with the job lifecycle. For dedicated
> clusters (application cluster or dedicated session clusters), the lifecycle
> of the cluster should be bound with the job lifecycle. But for shared
> session clusters, the lifecycle of the Flink cluster should be independent
> of the jobs running in it.
>
> As we can see, these three behaviors are sort of independent, the current
> configurations fail to support all the combination of wanted behaviors.
> Ideally there should be three separate configurations, for example:
> - client.attached.after.submission and client.heartbeat.timeout control the
> behavior on the client side.
> - jobmanager.cancel-on-attached-client-exit controls the behavior of the
> job when an attached client lost connection. The client heartbeat timeout
> and attach-ness will be also passed to the JM upon job submission.
> - cluster.shutdown-on-first-job-finishes *(*or
> jobmanager.shutdown-cluster-after-job-finishes) controls the cluster
> behavior after the job finishes normally / abnormally. This is a cluster
> level setting instead of a job level setting. Therefore it can only be set
> when launching the cluster.
>
> The current code sort of combines config 2 and 3 into
> execution.shutdown-on-attach-exit.
> This assumes the the life cycle of the cluster is the same as the job when
> the client is attached. This FLIP does not intend to change that. but using
> the execution.attached config for the client behavior control looks
> misleading. So this FLIP proposes to replace it with a more intuitive
> config of client.attached.after.submission. This makes it clear that it is
> a configuration controlling the client side behavior, instead of the
> execution of the job.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
>
>
> On Thu, Aug 10, 2023 at 10:34 PM Weihua Hu  wrote:
>
> > Hi Allison
> >
> > Thanks for driving this FLIP. It's a valuable feature for batch jobs.
> > This helps keep "Drop Per-Job Mode [1]" going.
> >
> > +1 for this proposal.
> >
> > However, it seems that the change in this FLIP is not detailed enough.
> > I have a few questions.
> >
> > 1. The config 'execution.attached' is not only used in per-job mode,
> > but also in session mode to shutdown the cluster. IMHO, it's better to
> > keep this option name.
> >
> > 2. This FLIP only mentions YARN mode. I believe this feature should
> > work in both YARN and Kubernetes mode.
> >
> > 3. Within the attach mode, we support two features:
> > execution.shutdown-on-attached-exit
> > and client.heartbeat.timeout. These should also be taken into account.
> >
> > 4. The Application Mode will shut down once the job has been completed.
> > So, if we use the flink client to poll job status via REST API for attach
> > mode,
> > there is a chance that the client will not be able to retrieve the job
> > finish status.
> > Perhaps FLINK-24113[3] will help with this.
> >
> >
> > [1]https://issues.apac

Re: Plans for Schema Evolution in Table API

2023-08-15 Thread liu ron
Hi, Ashish

As Timo said, the community currently has no concrete plans to support
schema evolution in the Table API.

Best,
Ron

Timo Walther  于2023年8月15日周二 23:29写道:

> Hi Ashish,
>
> sorry for the late reply. There are currently no concrete plans to
> support schema evolution in Table API. Until recently, Flink version
> evolution was the biggest topic. In the near future we can rediscuss
> query and state evolution in more detail.
>
> Personally, I think we will need either some kind of more flexible data
> type (similar like the JSON type in Postgres) or user-defined types
> (UDT) to ensure a smooth experience.
>
> For now, warming up the state is the only viable solution until internal
> serializers are more flexible.
>
> Regards,
> Timo
>
> On 14.08.23 16:55, Ashish Khatkar wrote:
> > Bumping the thread.
> >
> > On Fri, Aug 4, 2023 at 12:51 PM Ashish Khatkar 
> wrote:
> >
> >> Hi all,
> >>
> >> We are using flink-1.17.0 table API and RocksDB as backend to provide a
> >> service to our users to run sql queries. The tables are created using
> the
> >> avro schema and when the schema is changed in a compatible manner i.e
> >> adding a field with default, we are unable to recover the job from the
> >> savepoint. This is mentioned in the flink doc on evolution [1] as well.
> >>
> >> Are there any plans to support schema evolution in the table API? Our
> >> current approach involves rebuilding the entire state by discarding the
> >> output and then utilizing that state in the actual job. This is already
> >> done for table-store [2]
> >>
> >> [1]
> >>
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/concepts/overview/#stateful-upgrades-and-evolution
> >> [2]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-226%3A+Introduce+Schema+Evolution+on+Table+Store
> >>
> >>
> >>
> >
>
>


Re: request for jira

2023-08-14 Thread liu ron
Hi, Lambda

If you're interested in a ticket that doesn't currently have an owner, you
can comment directly on the ticket and a committer will assign it to you;
no special permission is needed for that.

Best,
Ron

lambda ch  于2023年8月14日周一 22:42写道:

> Hi,
> I want to contribute to Apache Flink.
> Would you please give me the contributor permission?
> My JIRA ID is CHlambda
>


Re: FLINK-20767 - Support for nested fields filter push down

2023-08-13 Thread liu ron
ecial case. This avoids doing
> > the
> > > >> same
> > > >> > thing in different ways which is also a confusion to the users. To
> > me,
> > > >> the
> > > >> > int[][] format would become kind of a technical debt after we
> extend
> > > the
> > > >> > FieldReferenceExpression. Although we don't have to address it
> right
> > > >> away
> > > >> > in the same FLIP, this kind of debt accumulates over time and
> makes
> > > the
> > > >> > project harder to learn and maintain. So, personally I prefer to
> > > address
> > > >> > these technical debts as soon as possible.
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Jiangjie (Becket) Qin
> > > >> >
> > > >> > On Wed, Aug 2, 2023 at 8:19 PM Jark Wu  wrote:
> > > >> >
> > > >> > > Hi,
> > > >> > >
> > > >> > > I agree with Becket that we may need to extend
> > > >> FieldReferenceExpression
> > > >> > to
> > > >> > > support nested field access (or maybe a new
> > > >> > > NestedFieldReferenceExpression).
> > > >> > > But I have some concerns about evolving the
> > > >> > > SupportsProjectionPushDown.applyProjection.
> > > >> > > A projection is much simpler than Filter Expression which only
> > needs
> > > >> to
> > > >> > > represent the field indexes.
> > > >> > > If we evolve `applyProjection` to accept
> > > >> `List
> > > >> > > projectedFields`,
> > > >> > > users have to convert the `List` back
> to
> > > >> > int[][]
> > > >> > > which is an overhead for users.
> > > >> > > Field indexes (int[][]) is required to project schemas with the
> > > >> > > utility org.apache.flink.table.connector.Projection.
> > > >> > >
> > > >> > >
> > > >> > > Best,
> > > >> > > Jark
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Wed, 2 Aug 2023 at 07:40, Venkatakrishnan Sowrirajan <
> > > >> > vsowr...@asu.edu>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Thanks Becket for the suggestion. That makes sense. Let me try
> > it
> > > >> out
> > > >> > and
> > > >> > > > get back to you.
> > > >> > > >
> > > >> > > > Regards
> > > >> > > > Venkata krishnan
> > > >> > > >
> > > >> > > >
> > > >> > > > On Tue, Aug 1, 2023 at 9:04 AM Becket Qin <
> becket@gmail.com
> > >
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > This is a very useful feature in practice.
> > > >> > > > >
> > > >> > > > > It looks to me that the key issue here is that Flink
> > > >> > ResolvedExpression
> > > >> > > > > does not have necessary abstraction for nested field access.
> > So
> > > >> the
> > > >> > > > Calcite
> > > >> > > > > RexFieldAccess does not have a counterpart in the
> > > >> ResolvedExpression.
> > > >> > > The
> > > >> > > > > FieldReferenceExpression only supports direct access to the
> > > >> fields,
> > > >> > not
> > > >> > > > > nested access.
> > > >> > > > >
> > > >> > > > > Theoretically speaking, this nested field reference is also
> > > >> required
> > > >> > by
> > > >> > > > > projection pushdown. However, we addressed that by using an
> > > >> int[][]
> > > >> > in
> > > >> > > > the
> > > >> > > > > SupportsProjectionPushDown interface. Maybe we can do the
> > > >> following:
> > > >> > > > >
> > > >> > > > > 1. Extend the FieldRefer

Re: [DISCUSS] Connectors, Formats, and even User Code should also be pluggable.

2023-08-10 Thread liu ron
Hi, zhiqiang

Thanks for driving this proposal, it looks very interesting.

As we all know, Flink has a very powerful connector ecosystem, and as Jing
said, all the connectors are currently being externalized. It would be very
useful to design a pluggable mechanism for connectors: it could solve the
potential conflicts between multiple versions of a connector without
requiring users to relocate classes. In our company, we also have a rich
set of connectors internally, and we need to handle them carefully to avoid
class-loading conflicts.

I also agree with Jing that this should be turned into a FLIP, so that the
whole design can be laid out completely.

Best,
Ron


Thor God  于2023年8月10日周四 14:47写道:

> I'm interested in this, we often have linker dependency conflicts and it
> takes a lot of work to resolve dependency conflicts.
>
> Benchao Li  于2023年7月25日周二 20:51写道:
>
> > I agree with Jing that the current doc is quite preliminary, and needs to
> > be turned into a FLIP.
> >
> > I'll be very interested in this FLIP, and looking forward to it. We've
> > suffered from reshading/relocating heavy connector dependencies for a
> long
> > time.
> >
> > Besides, we've implemented a plugin mechanism to load hive udf in our
> batch
> > SQL scenario internally, it saves us a lot of effort to handle the
> > dependency conflicts. The most challenging part would be the
> > (de)serialization of those classes loaded through plugins according to
> our
> > experience. (I noticed you have already considered this)
> >
> > Jing Ge  于2023年7月21日周五 06:44写道:
> >
> > > Hi Zhiqiang,
> > >
> > > Thanks for your proposal. The idea is very interesting! Since almost
> all
> > > connectors are or will be externalized, the pluggable design you
> > suggested
> > > could help reduce the development effort.
> > >
> > > As you mentioned, the attached doc contains only your preliminary
> idea. I
> > > would suggest that you could turn it into a FLIP with more details and
> do
> > > the followings:
> > >
> > > 1. Describe a real conflict case that will benefit from the pluggable
> > > design
> > > 2. Describe where you want to modify the code, or provide a POC branch
> > > 3. Guideline to tell users how to use it, i.e. where (the plugins dir)
> > > should the connector jar be located, how does it look like with an
> > example,
> > > etc.
> > >
> > > WDYT?
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Fri, Jul 14, 2023 at 3:00 PM zhiqiang li 
> > > wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I have observed that in [1], connectors and formats are pluggable,
> > > > allowing user code to be easily integrated. The advantages of having
> > > > pluggable connectors are evident, as it helps avoid conflicts between
> > > > different versions of jar packages. If classloader isolation is not
> > used,
> > > > shading becomes necessary to resolve conflicts, resulting in a
> > > significant
> > > > waste of development time. I understand that implementing this change
> > may
> > > > require numerous API modifications, so I would like to discuss in
> this
> > > > email.
> > > >
> > > > > Plugins cannot access classes from other plugins or from Flink that
> > > have
> > > > not been specifically whitelisted.
> > > > > This strict isolation allows plugins to contain conflicting
> versions
> > of
> > > > the same library without the need to relocate classes or to converge
> to
> > > > common versions.
> > > > > Currently, file systems and metric reporters are pluggable but, in
> > the
> > > > future, connectors, formats, and even user code should also be
> > pluggable.
> > > >
> > > > [2] It is my preliminary idea.
> > > >
> > > > [1]
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/overview/
> > > > [2]
> > > >
> > >
> >
> https://docs.google.com/document/d/1XP2fBpcntK0YIdQ_Ax7JV2MhNdebvkFxSiNJRp6WQ24/edit?usp=sharing
> > > >
> > > >
> > > > Best,
> > > > Zhiqiang
> > > >
> > > >
> > >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >
>


Re: [DISCUSS] FLIP-323: Support Attached Execution on Flink Application Completion for Batch Jobs

2023-08-09 Thread liu ron
Hi, Allison

Thanks for driving this proposal, it looks cool for batch jobs under
application mode. But after reading your FLIP document and [1], I have a
question. Why do you want to rename the execution.attached configuration to
client.attached.after.submission and at the same time deprecate
execution.attached? Based on your design, I understand that the role of these
two options is the same. Introducing a new option would increase the cost of
understanding and usage for users, so why not follow the idea discussed in
FLINK-25495 and make application mode support attached execution?

[1] https://issues.apache.org/jira/browse/FLINK-25495

Best,
Ron

Venkatakrishnan Sowrirajan  于2023年8月9日周三 02:07写道:

> This is definitely a useful feature especially for the flink batch
> execution workloads using flow orchestrators like Airflow, Azkaban, Oozie
> etc. Thanks for reviving this issue and starting a FLIP.
>
> Regards
> Venkata krishnan
>
>
> On Mon, Aug 7, 2023 at 4:09 PM Allison Chang  >
> wrote:
>
> > Hi all,
> >
> > I am opening this thread to discuss this proposal to support attached
> > execution on Flink Application Completion for Batch Jobs. The link to the
> > FLIP proposal is here:
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-323*3A*Support*Attached*Execution*on*Flink*Application*Completion*for*Batch*Jobs__;JSsrKysrKysrKys!!IKRxdwAv5BmarQ!friFO6bJub5FKSLhPIzA6kv-7uffv-zXlv9ZLMKqj_xMcmZl62HhsgvwDXSCS5hfSeyHZgoAVSFg3fk7ChaAFNKi$
> >
> > This FLIP proposes adding back attached execution for Application Mode.
> In
> > the past attached execution was supported for the per-job mode, which
> will
> > be deprecated and we want to include this feature back into Application
> > mode.
> >
> > Please reply to this email thread and share your thoughts/opinions.
> >
> > Thank you!
> >
> > Allison Chang
> >
>


Re: [Question] How to assign uid to existing stream

2023-08-08 Thread liu ron
Hi, Dennis

You should send this question to the u...@flink.apache.org mailing list;
the dev list is used to discuss development-related issues.

Back to your question:

- AFAIK, uid will be set up as a random value if not defined. How can I
find the current uid?

Regarding the purpose of uid, you can see [1] and [2] for more detail. The
framework generates UIDs automatically if they are not set manually; the
generated value depends on the job topology and may change if you modify the
job. There is currently no way to retrieve the generated UID of each
operator directly.

- What will happen if I set up a new uid in the code above?

If you set a new uid and restore from a savepoint, the restore may fail
because the state registered under the old UID can no longer be matched
(the Kafka source is stateful).

- When I want to migrate 'FlinkKafkaConsumer' to 'KafkaSource', is it okay
not to set up 'uid'?

I think it is okay.


[1]
https://github.com/apache/flink/blob/dfb9cb851dc1f0908ea6c3ce1230dd8ca2b48733/flink-core/src/main/java/org/apache/flink/configuration/PipelineOptions.java#L70
[2]
https://github.com/apache/flink/blob/dfb9cb851dc1f0908ea6c3ce1230dd8ca2b48733/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L168
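
To make the last answer concrete, here is a minimal sketch of the KafkaSource
migration with an explicit uid on the source (the broker address, topic and
group id below are made up). Setting the uid is not strictly required, but it
keeps the operator's state addressable if the job topology changes later:

```
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaSourceUidSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source =
                KafkaSource.<String>builder()
                        .setBootstrapServers("kafka:9092")   // made-up address
                        .setTopics("events")                 // made-up topic
                        .setGroupId("uid-demo")              // made-up group id
                        .setStartingOffsets(OffsetsInitializer.earliest())
                        .setValueOnlyDeserializer(new SimpleStringSchema())
                        .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "flink-kafka-source")
                .uid("kafka-source") // explicit uid: state stays matchable across job edits
                .print();

        env.execute("kafka-source-uid-sketch");
    }
}
```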

Best,
Ron

Dennis Jung  于2023年8月8日周二 10:44写道:

> Hello people,
> As suggested in the following, it seems it is recommended to set 'uid' for
> operators.
>
>
> https://ververica.zendesk.com/hc/en-us/articles/360010248879-Should-I-call-uid-after-addSource-or-addSink-
>
> Currently there is an existing personal project which uses
> 'FlinkKafkaConsumer'(it is a deprecated feature) to consume data from
> Kafka, which 'uid' is not configured.
>
> '''
> ...
> FlinkKafkaConsumer kc = ... ;
> DataStream ds = env.addSource(kc, "flink-kafka-source");
> '''
>
> - AFAIK, uid will be set up as a random value if not defined. How can I
> find the current uid?
> - What will happen if I set up a new uid in the code above?
> - When I want to migrate 'FlinkKafkaConsumer' to 'KafkaSource', is it okay
> not to set up 'uid'?
>
> Thanks.
>


Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei

2023-08-07 Thread liu ron
Congratulations Yanfei!

Best,
Ron

Zakelly Lan  于2023年8月7日周一 23:15写道:

> Congratulations, Yanfei!
>
> Best regards,
> Zakelly
>
> On Mon, Aug 7, 2023 at 9:04 PM Lincoln Lee  wrote:
> >
> > Congratulations, Yanfei!
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Weihua Hu  于2023年8月7日周一 20:43写道:
> >
> > > Congratulations Yanfei!
> > >
> > > Best,
> > > Weihua
> > >
> > >
> > > On Mon, Aug 7, 2023 at 8:08 PM Feifan Wang  wrote:
> > >
> > > > Congratulations Yanfei! :)
> > > >
> > > >
> > > >
> > > > ——
> > > > Name: Feifan Wang
> > > > Email: zoltar9...@163.com
> > > >
> > > >
> > > >  Replied Message 
> > > > | From | Matt Wang |
> > > > | Date | 08/7/2023 19:40 |
> > > > | To | dev@flink.apache.org |
> > > > | Subject | Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei |
> > > > Congratulations Yanfei!
> > > >
> > > >
> > > > --
> > > >
> > > > Best,
> > > > Matt Wang
> > > >
> > > >
> > > >  Replied Message 
> > > > | From | Mang Zhang |
> > > > | Date | 08/7/2023 18:56 |
> > > > | To |  |
> > > > | Subject | Re:Re: [ANNOUNCE] New Apache Flink Committer - Yanfei
> Lei |
> > > > Congratulations--
> > > >
> > > > Best regards,
> > > > Mang Zhang
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > 在 2023-08-07 18:17:58,"Yuxin Tan"  写道:
> > > > Congrats, Yanfei!
> > > >
> > > > Best,
> > > > Yuxin
> > > >
> > > >
> > > > weijie guo  于2023年8月7日周一 17:59写道:
> > > >
> > > > Congrats, Yanfei!
> > > >
> > > > Best regards,
> > > >
> > > > Weijie
> > > >
> > > >
> > > > Biao Geng  于2023年8月7日周一 17:03写道:
> > > >
> > > > Congrats, Yanfei!
> > > > Best,
> > > > Biao Geng
> > > >
> > > > 发送自 Outlook for iOS
> > > > 
> > > > 发件人: Qingsheng Ren 
> > > > 发送时间: Monday, August 7, 2023 4:23:52 PM
> > > > 收件人: dev@flink.apache.org 
> > > > 主题: Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei
> > > >
> > > > Congratulations and welcome, Yanfei!
> > > >
> > > > Best,
> > > > Qingsheng
> > > >
> > > > On Mon, Aug 7, 2023 at 4:19 PM Matthias Pohl  > > > .invalid>
> > > > wrote:
> > > >
> > > > Congratulations, Yanfei! :)
> > > >
> > > > On Mon, Aug 7, 2023 at 10:00 AM Junrui Lee 
> > > > wrote:
> > > >
> > > > Congratulations Yanfei!
> > > >
> > > > Best,
> > > > Junrui
> > > >
> > > > Yun Tang  于2023年8月7日周一 15:19写道:
> > > >
> > > > Congratulations, Yanfei!
> > > >
> > > > Best
> > > > Yun Tang
> > > > 
> > > > From: Danny Cranmer 
> > > > Sent: Monday, August 7, 2023 15:10
> > > > To: dev 
> > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei
> > > >
> > > > Congrats Yanfei! Welcome to the team.
> > > >
> > > > Danny
> > > >
> > > > On Mon, 7 Aug 2023, 08:03 Rui Fan, <1996fan...@gmail.com> wrote:
> > > >
> > > > Congratulations Yanfei!
> > > >
> > > > Best,
> > > > Rui
> > > >
> > > > On Mon, Aug 7, 2023 at 2:56 PM Yuan Mei 
> > > > wrote:
> > > >
> > > > On behalf of the PMC, I'm happy to announce Yanfei Lei as a new
> > > > Flink
> > > > Committer.
> > > >
> > > > Yanfei has been active in the Flink community for almost two
> > > > years
> > > > and
> > > > has
> > > > played an important role in developing and maintaining State
> > > > and
> > > > Checkpoint
> > > > related features/components, including RocksDB Rescaling
> > > > Performance
> > > > Improvement and Generic Incremental Checkpoints.
> > > >
> > > > Yanfei also helps improve community infrastructure in many
> > > > ways,
> > > > including
> > > > migrating the Flink Daily performance benchmark to the Apache
> > > > Flink
> > > > slack
> > > > channel. She is the maintainer of the benchmark and has
> > > > improved
> > > > its
> > > > detection stability significantly. She is also one of the major
> > > > maintainers
> > > > of the FrocksDB Repo and released FRocksDB 6.20.3 (part of
> > > > Flink
> > > > 1.17
> > > > release). Yanfei is a very active community member, supporting
> > > > users
> > > > and
> > > > participating
> > > > in tons of discussions on the mailing lists.
> > > >
> > > > Please join me in congratulating Yanfei for becoming a Flink
> > > > Committer!
> > > >
> > > > Thanks,
> > > > Yuan Mei (on behalf of the Flink PMC)
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
>


Re: [ANNOUNCE] New Apache Flink Committer - Hangxiang Yu

2023-08-07 Thread liu ron
Congratulations, Hangxiang!

Best,
Ron

Zakelly Lan  于2023年8月7日周一 23:15写道:

> Congratulations, Hangxiang!
>
> Best regards,
> Zakelly
>
> On Mon, Aug 7, 2023 at 9:03 PM Lincoln Lee  wrote:
> >
> > Congratulations Hangxiang!
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Feifan Wang  于2023年8月7日周一 20:09写道:
> >
> > > Congratulations Hangxiang! :)
> > >
> > >
> > >
> > >
> > >
> > > ——
> > > Name: Feifan Wang
> > > Email: zoltar9...@163.com
> > >
> > >
> > >  Replied Message 
> > > | From | Mang Zhang |
> > > | Date | 08/7/2023 18:56 |
> > > | To |  |
> > > | Subject | Re:Re: [ANNOUNCE] New Apache Flink Committer - Hangxiang
> Yu |
> > > Congratulations--
> > >
> > > Best regards,
> > > Mang Zhang
> > >
> > >
> > >
> > >
> > >
> > > 在 2023-08-07 18:18:08,"Yuxin Tan"  写道:
> > > Congrats, Hangxiang!
> > >
> > > Best,
> > > Yuxin
> > >
> > >
> > > weijie guo  于2023年8月7日周一 17:59写道:
> > >
> > > Congrats, Hangxiang!
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > Biao Geng  于2023年8月7日周一 17:04写道:
> > >
> > > Congrats, Hangxiang!
> > > Best,
> > > Biao Geng
> > >
> > >
> > > 发送自 Outlook for iOS
> > > 
> > > 发件人: Qingsheng Ren 
> > > 发送时间: Monday, August 7, 2023 4:23:11 PM
> > > 收件人: dev@flink.apache.org 
> > > 主题: Re: [ANNOUNCE] New Apache Flink Committer - Hangxiang Yu
> > >
> > > Congratulations and welcome aboard, Hangxiang!
> > >
> > > Best,
> > > Qingsheng
> > >
> > > On Mon, Aug 7, 2023 at 4:19 PM Matthias Pohl  > > .invalid>
> > > wrote:
> > >
> > > Congratulations, Hangxiang! :)
> > >
> > > On Mon, Aug 7, 2023 at 10:01 AM Junrui Lee 
> > > wrote:
> > >
> > > Congratulations, Hangxiang!
> > >
> > > Best,
> > > Junrui
> > >
> > > Yun Tang  于2023年8月7日周一 15:19写道:
> > >
> > > Congratulations, Hangxiang!
> > >
> > > Best
> > > Yun Tang
> > > 
> > > From: Danny Cranmer 
> > > Sent: Monday, August 7, 2023 15:11
> > > To: dev 
> > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Hangxiang Yu
> > >
> > > Congrats Hangxiang! Welcome to the team.
> > >
> > > Danny.
> > >
> > > On Mon, 7 Aug 2023, 08:04 Rui Fan, <1996fan...@gmail.com> wrote:
> > >
> > > Congratulations Hangxiang!
> > >
> > > Best,
> > > Rui
> > >
> > > On Mon, Aug 7, 2023 at 2:58 PM Yuan Mei 
> > > wrote:
> > >
> > > On behalf of the PMC, I'm happy to announce Hangxiang Yu as a
> > > new
> > > Flink
> > > Committer.
> > >
> > > Hangxiang has been active in the Flink community for more than
> > > 1.5
> > > years
> > > and has played an important role in developing and maintaining
> > > State
> > > and
> > > Checkpoint related features/components, including Generic
> > > Incremental
> > > Checkpoints (take great efforts to make the feature
> > > prod-ready).
> > > Hangxiang
> > > is also the main driver of the FLIP-263: Resolving schema
> > > compatibility.
> > >
> > > Hangxiang is passionate about the Flink community. Besides the
> > > technical
> > > contribution above, he is also actively promoting Flink: talks
> > > about
> > > Generic
> > > Incremental Checkpoints in Flink Forward and Meet-up. Hangxiang
> > > also
> > > spent
> > > a good amount of time supporting users, participating in
> > > Jira/mailing
> > > list
> > > discussions, and reviewing code.
> > >
> > > Please join me in congratulating Hangxiang for becoming a Flink
> > > Committer!
> > >
> > > Thanks,
> > > Yuan Mei (on behalf of the Flink PMC)
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
>


Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and kubernetes

2023-08-06 Thread liu ron
Hi, Rui

Thanks for driving the FLIP.

Tuning streaming jobs with the autoscaler is very important. Although the
mainstream trend now is cloud-native, many companies still run their Flink
jobs on YARN for historical reasons. If we can decouple the autoscaler from
Kubernetes and turn it into a common tool that supports other resource
management frameworks such as YARN, it will be very valuable.
+1 for this proposal.
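
As a purely illustrative aside on the pluggability point: assuming the
AutoScalerStateStore shape sketched in Gyula's reply quoted below, a
non-Kubernetes backend could be as small as the in-memory variant here (all
names are made up and none of this is released Flink code):

```
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Interface shape copied from the sketch quoted below; not a released Flink API. */
interface AutoScalerStateStore<KEY> {
    Map<String, String> getState(KEY jobKey);
    void updateState(KEY jobKey, Map<String, String> state);
}

/** Illustrative non-Kubernetes backend that keeps autoscaler state on the heap. */
public class InMemoryAutoScalerStateStore<KEY> implements AutoScalerStateStore<KEY> {

    private final Map<KEY, Map<String, String>> stateByJob = new HashMap<>();

    @Override
    public Map<String, String> getState(KEY jobKey) {
        return Collections.unmodifiableMap(
                stateByJob.getOrDefault(jobKey, Collections.emptyMap()));
    }

    @Override
    public void updateState(KEY jobKey, Map<String, String> state) {
        stateByJob.put(jobKey, new HashMap<>(state));
    }
}
```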

Best,
Ron


Gyula Fóra  于2023年8月5日周六 15:03写道:

> Hi Rui!
>
> Thanks for the proposal.
>
> I agree with Max that the state store abstractions could be improved and
> be more specific as we know what goes into the state. It could simply be
>
> public interface AutoScalerStateStore<KEY> {
>     Map<String, String> getState(KEY jobKey);
>     void updateState(KEY jobKey, Map<String, String> state);
> }
>
>
> We could also remove the entire recommended parallelism logic from the
> interface and make it internal to the implementation somehow because it's
> not very nice in the current form.
>
> Cheers,
> Gyula
>
> On Fri, Aug 4, 2023 at 7:05 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> > Hi Max,
> >
> > After careful consideration, I prefer to keep the AutoScalerStateStore
> > instead of AutoScalerEventHandler taking over the work of
> > AutoScalerStateStore. And the following are some reasons:
> >
> > 1. Keeping the AutoScalerStateStore to make StateStore easy to plug in.
> >
> > Currently, the kubernetes-operator-autoscaler uses the ConfigMap as the
> > state store. However, users may use a different state store for
> > yarn-autoscaler or generic autoscaler. Such as: MySQL StateStore,
> > Heaped StateStore or PostgreSQL StateStore, etc.
> >
> > Of course, kubernetes autoscaler can also use the MySQL StateStore.
> > If the AutoScalerEventHandler is responsible for recording events,
> > scaling job and accessing state, whenever users or community want to
> > create a new state store, they must also implement the new
> > AutoScalerEventHandler, it includes recording events and scaling job.
> >
> > If we decouple AutoScalerEventHandler and AutoScalerStateStore,
> > it's easy to implement a new state store.
> >
> > 2. AutoScalerEventHandler isn't suitable for access state.
> >
> > Sometimes the generic autoscaler needs to query state,
> > AutoScalerEventHandler can update the state during handling events.
> > However, it's weird to provide a series of interfaces to query state.
> >
> > What do you think?
> >
> > And looking forward to more thoughts from the community, thanks!
> >
> > Best,
> > Rui Fan
> >
> > On Tue, Aug 1, 2023 at 11:47 PM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Hi Max,
> > >
> > > Thanks for your quick response!
> > >
> > > > 1. Handle state in the AutoScalerEventHandler which will receive
> > > > all related scaling and metric collection events, and can keep
> > > > track of any state.
> > >
> > > If I understand correctly, you mean that updating state is just part of
> > > handling events, right?
> > >
> > > If yes, that sounds reasonable. However, I have some concerns:
> > >
> > > - Currently, we have 3 key-values that need to be stored. And the
> > > autoscaler needs to get them first, then update them, and sometimes
> > > remove them. If we use AutoScalerEventHandler, we need to provide
> > > 9 methods, right? Every key has 3 methods.
> > > - Do we add the persistState interface for AutoScalerEventHandler to
> > > persist in-memory state to kubernetes?
> > >
> > > > 2. In the long run, the autoscaling logic can move to a
> > > > separate repository, although this will complicate the release
> > > > process, so I would defer this unless there is strong interest.
> > >
> > > I also agree to leave it in flink-k8s-operator for now. Unless moving
> it
> > > out of flink-k8s-operator is necessary in the future.
> > >
> > > > 3. The proposal mentions some removal of tests.
> > >
> > > Sorry, I didn't express clearly in FLIP. POC just check whether these
> > > interfaces work well. It will take more time if I develop all the tests
> > > during POC. So I removed these tests in my POC.
> > >
> > > These tests will be completed in the final PR, and the test is very
> > useful
> > > for less bugs.
> > >
> > > Best,
> > > Rui Fan
> > >
> > > On Tue, Aug 1, 2023 at 10:10 PM Maximilian Michels 
> > wrote:
> > >
> > >> Hi Rui,
> > >>
> > >> Thanks for the proposal. I think it makes a lot of sense to decouple
> > >> the autoscaler from Kubernetes-related dependencies. A couple of notes
> > >> when I read the proposal:
> > >>
> > >> 1. You propose AutoScalerEventHandler, AutoScalerStateStore,
> > >> AutoScalerStateStoreFactory, and AutoScalerEventHandler.
> > >> AutoscalerStateStore is a generic key/value database (methods:
> > >> "get"/"put"/"delete"). I would propose to refine this interface and
> > >> make it less general purpose, e.g. add a method for persisting scaling
> > >> decisions as well as any metrics gathered for the current metric
> > >> window. For simplicity, I'd even go so far to remove the state store
> > 

Re: Projection pushdown for Avro files seems to be buggy

2023-08-06 Thread liu ron
Hi, Xingcan

After a deep dive into the source code, I also think it is a bug.

Best,
Ron

Xingcan Cui  于2023年8月5日周六 23:27写道:

> Hi all,
>
> We tried to read some Avro files with the Flink SQL (1.16.1) and noticed
> that the projection pushdown seems to be buggy.
>
> The Avro schema we used has 4 fields, namely f1, f2, f3 and f4. When using
> "SELECT *" or SELECT the first n fields (e.g., SELECT f1 or SELECT f1, f2)
> to read the table, it works fine. However, if we query an arbitrary field
> other than f1, a data & converter mismatch exception will show.
>
> After digging into the code, I figured out that `AvroFileFormatFactory`
> generates a `producedDataType` with projection push down. When generating
> AvroToRowDataConverters, it only considers the selected fields and
> generates converters for them. However, the records read by the
> DataFileReader contain all fields.
>
> Specifically, for the following code snippet from AvroToRowDataConverters,
> the fieldConverters contains only the converters for the selected fields
> but the record contains all fields which leads to a converters & data
> fields mismatch problem. That also explains why selecting the first n
> fields works (It's because the converters & data fields happen to match).
>
> ```
> return avroObject -> {
>   IndexedRecord record = (IndexedRecord) avroObject;
>   GenericRowData row = new GenericRowData(arity);
>   for (int i = 0; i < arity; ++i) {
> // avro always deserialize successfully even though the type isn't
> matched
> // so no need to throw exception about which field can't be
> deserialized
> row.setField(i, fieldConverters[i].convert(record.get(i)));
>   }
>   return row;
> };
> ```
>
> Not sure if any of you hit this before. If it's confirmed to be a bug, I'll
> file a ticket and try to fix it.
>
> Best,
> Xingcan
>


Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu

2023-08-03 Thread liu ron
Congratulations Weihua!

Best,
Ron

Shammon FY  于2023年8月4日周五 13:25写道:

> Congratulations, Weihua!
>
> Best,
> Shammon FY
>
> On Fri, Aug 4, 2023 at 12:33 PM Jing Ge 
> wrote:
>
> > congrats! Weihua!
> >
> > Best regards,
> > Jing
> >
> > On Fri, Aug 4, 2023 at 12:20 PM Matt Wang  wrote:
> >
> > > Congratulations Weihua ~
> > >
> > >
> > > --
> > >
> > > Best,
> > > Matt Wang
> > >
> > >
> > >  Replied Message 
> > > | From | Rui Fan<1996fan...@gmail.com> |
> > > | Date | 08/4/2023 11:29 |
> > > | To |  |
> > > | Cc | Weihua Hu |
> > > | Subject | Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu |
> > > Congratulations Weihua, well deserved!
> > >
> > > Best,
> > > Rui Fan
> > >
> > > On Fri, Aug 4, 2023 at 11:19 AM Xintong Song 
> > > wrote:
> > >
> > > Hi everyone,
> > >
> > > On behalf of the PMC, I'm very happy to announce Weihua Hu as a new
> Flink
> > > Committer!
> > >
> > > Weihua has been consistently contributing to the project since May
> 2022.
> > He
> > > mainly works in Flink's distributed coordination areas. He is the main
> > > contributor of FLIP-298 and many other improvements in large-scale job
> > > scheduling and improvements. He is also quite active in mailing lists,
> > > participating discussions and answering user questions.
> > >
> > > Please join me in congratulating Weihua!
> > >
> > > Best,
> > >
> > > Xintong (on behalf of the Apache Flink PMC)
> > >
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Matthias Pohl

2023-08-03 Thread liu ron
Congrats, Matthias!

Best,
Ron

Shammon FY  于2023年8月4日周五 13:24写道:

> Congratulations, Matthias!
>
> On Fri, Aug 4, 2023 at 1:13 PM Samrat Deb  wrote:
>
> > Congrats, Matthias!
> >
> >
> > On Fri, 4 Aug 2023 at 10:13 AM, Benchao Li  wrote:
> >
> > > Congratulations, Matthias!
> > >
> > > Jing Ge  于2023年8月4日周五 12:35写道:
> > >
> > > > Congrats! Matthias!
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > On Fri, Aug 4, 2023 at 12:09 PM Yangze Guo 
> wrote:
> > > >
> > > > > Congrats, Matthias!
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Fri, Aug 4, 2023 at 11:44 AM Qingsheng Ren 
> > > wrote:
> > > > > >
> > > > > > Congratulations, Matthias! This is absolutely well deserved.
> > > > > >
> > > > > > Best,
> > > > > > Qingsheng
> > > > > >
> > > > > > On Fri, Aug 4, 2023 at 11:31 AM Rui Fan <1996fan...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Congratulations Matthias, well deserved!
> > > > > > >
> > > > > > > Best,
> > > > > > > Rui Fan
> > > > > > >
> > > > > > > On Fri, Aug 4, 2023 at 11:30 AM Leonard Xu 
> > > > wrote:
> > > > > > >
> > > > > > > > Congratulations,  Matthias.
> > > > > > > >
> > > > > > > > Well deserved ^_^
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Leonard
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Aug 4, 2023, at 11:18 AM, Xintong Song <
> > > tonysong...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > On behalf of the PMC, I'm very happy to announce that
> > Matthias
> > > > > Pohl has
> > > > > > > > > joined the Flink PMC!
> > > > > > > > >
> > > > > > > > > Matthias has been consistently contributing to the project
> > > since
> > > > > Sep
> > > > > > > > 2020,
> > > > > > > > > and became a committer in Dec 2021. He mainly works in
> > Flink's
> > > > > > > > distributed
> > > > > > > > > coordination and high availability areas. He has worked on
> > many
> > > > > FLIPs
> > > > > > > > > including FLIP195/270/285. He helped a lot with the release
> > > > > management,
> > > > > > > > > being one of the Flink 1.17 release managers and also very
> > > active
> > > > > in
> > > > > > > > Flink
> > > > > > > > > 1.18 / 2.0 efforts. He also contributed a lot to improving
> > the
> > > > > build
> > > > > > > > > stability.
> > > > > > > > >
> > > > > > > > > Please join me in congratulating Matthias!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > Xintong (on behalf of the Apache Flink PMC)
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
> >
>


Re: [VOTE] FLIP-333: Redesign Apache Flink website

2023-08-03 Thread liu ron
+1

Best,
Ron

Xintong Song  于2023年8月4日周五 09:35写道:

> +1 (binding)
>
> Best,
>
> Xintong
>
>
>
> On Fri, Aug 4, 2023 at 3:05 AM Hong Liang  wrote:
>
> > +1 (binding)
> >
> > Thanks Deepthi!
> >
> >
> >
> > On Thu, Aug 3, 2023 at 7:44 PM Danny Cranmer 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > Thanks Deepthi
> > >
> > >
> > > On Thu, 3 Aug 2023, 12:03 Rui Fan, <1996fan...@gmail.com> wrote:
> > >
> > > > +1(binding), thanks for driving this proposal, it's cool !
> > > >
> > > > Best,
> > > > Rui Fan
> > > >
> > > > On Thu, Aug 3, 2023 at 6:06 PM Jing Ge 
> > > wrote:
> > > >
> > > > > +1, thanks for driving it!
> > > > >
> > > > > Best regards,
> > > > > Jing
> > > > >
> > > > > On Thu, Aug 3, 2023 at 4:49 AM Mohan, Deepthi
> > >  > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Thank you all for your feedback on FLIP-333. I’d like to start a
> > > vote.
> > > > > >
> > > > > > Discussion thread:
> > > > > > https://lists.apache.org/thread/z9j0rqt61ftgbkr37gzwbjg0n4fl1hsf
> > > > > > FLIP:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-333%3A+Redesign+Apache+Flink+website
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Deepthi
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Hong Teoh

2023-08-03 Thread liu ron
Congratulations, Hong!

Best,
Ron

Samrat Deb  于2023年8月4日周五 10:49写道:

> Congratulations, Hong Teoh
>
> On Fri, 4 Aug 2023 at 7:52 AM, Benchao Li  wrote:
>
> > Congratulations, Hong!
> >
> > yuxia  于2023年8月4日周五 09:23写道:
> >
> > > Congratulations, Hong Teoh!
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > - 原始邮件 -
> > > 发件人: "Matthias Pohl" 
> > > 收件人: "dev" 
> > > 发送时间: 星期四, 2023年 8 月 03日 下午 11:24:39
> > > 主题: Re: [ANNOUNCE] New Apache Flink Committer - Hong Teoh
> > >
> > > Congratulations, Hong! :)
> > >
> > > On Thu, Aug 3, 2023 at 3:39 PM Leonard Xu  wrote:
> > >
> > > > Congratulations, Hong!
> > > >
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > > > On Aug 3, 2023, at 8:45 PM, Jiabao Sun  > > .INVALID>
> > > > wrote:
> > > > >
> > > > > Congratulations, Hong Teoh!
> > > > >
> > > > > Best,
> > > > > Jiabao Sun
> > > > >
> > > > >> 2023年8月3日 下午7:32,Danny Cranmer  写道:
> > > > >>
> > > > >> On behalf of the PMC, I'm very happy to announce Hong Teoh as a
> new
> > > > Flink
> > > > >> Committer.
> > > > >>
> > > > >> Hong has been active in the Flink community for over 1 year and
> has
> > > > played
> > > > >> a key role in developing and maintaining AWS integrations, core
> > > > connector
> > > > >> APIs and more recently, improvements to the Flink REST API.
> > > > Additionally,
> > > > >> Hong is a very active community member, supporting users and
> > > > participating
> > > > >> in discussions on the mailing lists, Flink slack channels and
> > speaking
> > > > at
> > > > >> conferences.
> > > > >>
> > > > >> Please join me in congratulating Hong for becoming a Flink
> > Committer!
> > > > >>
> > > > >> Thanks,
> > > > >> Danny Cranmer (on behalf of the Flink PMC)
> > > > >
> > > >
> > > >
> > >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >
>


Re: FLINK-20767 - Support for nested fields filter push down

2023-08-01 Thread liu ron
Hi, Venkata

Thanks for reporting this issue. Currently, Flink doesn't support nested
fields filter pushdown. I also think this optimization would be useful,
especially for jobs that need to read a lot of data from Parquet or ORC
files. We haven't moved forward with it so far only for prioritization reasons.

Regarding your three questions, I will respond later, after my on-call
rotation is finished, because I need to dive into the source code first.
About your commit, I don't think it's the right solution as it stands:
FieldReferenceExpression doesn't currently support nested field references,
so we may need to extend it.

You can also keep looking into possible solutions, and we can discuss them
in more detail later on.
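
To make the gap concrete, here is a minimal sketch of the existing
SupportsFilterPushDown contract (the class name and the capability check are
made up; a real connector mixes this ability into its DynamicTableSource). Flat
predicates arrive as CallExpressions over FieldReferenceExpressions, while
nested accesses never reach applyFilters today because the planner cannot
translate RexFieldAccess into a ResolvedExpression:

```
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown;
import org.apache.flink.table.expressions.CallExpression;
import org.apache.flink.table.expressions.FieldReferenceExpression;
import org.apache.flink.table.expressions.ResolvedExpression;

/** Sketch of the push-down contract only; not a complete table source. */
public class FilterPushDownSketch implements SupportsFilterPushDown {

    @Override
    public Result applyFilters(List<ResolvedExpression> filters) {
        List<ResolvedExpression> accepted = new ArrayList<>();
        List<ResolvedExpression> remaining = new ArrayList<>();
        for (ResolvedExpression filter : filters) {
            // A flat predicate such as `f1 = 10` arrives as a CallExpression whose
            // children contain a FieldReferenceExpression for `f1`. A nested
            // predicate such as `f2.inner = 10` currently never arrives here at all.
            if (supportedByThisConnector(filter)) {
                accepted.add(filter);
            } else {
                remaining.add(filter); // Flink evaluates these after the scan
            }
        }
        return Result.of(accepted, remaining);
    }

    /** Made-up capability check: only calls over direct field references and literals. */
    private boolean supportedByThisConnector(ResolvedExpression expr) {
        if (expr instanceof FieldReferenceExpression) {
            return true;
        }
        if (expr instanceof CallExpression) {
            return expr.getResolvedChildren().stream().allMatch(this::supportedByThisConnector);
        }
        return true; // literals etc.
    }
}
```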

Best,
Ron


Venkatakrishnan Sowrirajan  于2023年7月29日周六 03:31写道:

> Hi all,
>
> Currently, I am working on adding support for nested fields filter push
> down. In our use case running Flink on Batch, we found nested fields filter
> push down is key - without it, it is significantly slow. Note: Spark SQL
> supports nested fields filter push down.
>
> While debugging the code using IcebergTableSource as the table source,
> narrowed down the issue to missing support for
> RexNodeExtractor#RexNodeToExpressionConverter#visitFieldAccess.
> As part of fixing it, I made changes by returning an
> Option(FieldReferenceExpression)
> with appropriate reference to the parent index and the child index for the
> nested field with the data type info.
>
> But this new ResolvedExpression cannot be converted to RexNode which
> happens in PushFilterIntoSourceScanRuleBase
> <
> https://github.com/apache/flink/blob/3f63e03e83144e9857834f8db1895637d2aa218a/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/logical/PushFilterIntoSourceScanRuleBase.java#L104
> >
> .
>
> Few questions
>
> 1. Does FieldReferenceExpression support nested fields currently or should
> it be extended to support nested fields? I couldn't figure this out from
> the PushProjectIntoTableScanRule that supports nested column projection
> push down.
> 2. ExpressionConverter
> <
> https://github.com/apache/flink/blob/3f63e03e83144e9857834f8db1895637d2aa218a/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/expressions/converter/ExpressionConverter.java#L197
> >
> converts ResolvedExpression -> RexNode but the new FieldReferenceExpression
> with the nested field cannot be converted to RexNode. This is why the
> answer to the 1st question is key.
> 3. Anything else that I'm missing here? or is there an even easier way to
> add support for nested fields filter push down?
>
> Partially working changes - Commit
> <
> https://github.com/venkata91/flink/commit/00cdf34ecf9be3ba669a97baaed4b69b85cd26f9
> >
> Please
> feel free to leave a comment directly in the commit.
>
> Any pointers here would be much appreciated! Thanks in advance.
>
> Disclaimer: Relatively new to Flink code base especially Table planner :-).
>
> Regards
> Venkata krishnan
>


Re: [DISCUSSION] Revival of FLIP-154 (SQL Implicit Type Coercion)

2023-07-31 Thread liu ron
Hi, Sergey

Thanks for reviving this FLIP. Implicit type coercion is very useful; in
our company we have already made this feature available behind an option.
At one point I was going to push this part of the work forward, but I
couldn't follow up on it because of other commitments. Feel free to start
this work.
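
As a quick illustration of what implicit coercion would buy users (table and
column names are made up, and the exact behavior today depends on the planner):

```
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ImplicitCoercionSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        tEnv.executeSql(
                "CREATE TABLE orders (id INT, amount INT) "
                        + "WITH ('connector' = 'datagen', 'number-of-rows' = '10')");

        // Today a mixed-type comparison is typically rejected, so users write an explicit CAST:
        tEnv.executeSql("SELECT * FROM orders WHERE amount = CAST('42' AS INT)").print();

        // With implicit coercion (the goal of this FLIP), the planner would insert the
        // cast automatically and the following query would be accepted as-is:
        // tEnv.executeSql("SELECT * FROM orders WHERE amount = '42'").print();
    }
}
```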

Best,
Ron

Benchao Li  于2023年7月25日周二 21:06写道:

> Hi Sergey,
>
> There is a discussion[1] for this FLIP before.
>
> I like this feature, thanks for driving this. I hope it does not need too
> much customization (if there is any improvement which would benefit both
> Calcite and Flink, it would be great to push those also to upstream
> project, and I would like to help on that)
>
> [1] https://lists.apache.org/thread/mglop6mgd36rrbjt7ojyd5sj8tyoc3c2
>
> Sergey Nuyanzin  于2023年7月19日周三 06:18写道:
>
> > Hello everyone
> >
> > I'd like to revive FLIP-154[1] a bit.
> >
> > I failed with finding any discussion/vote thread about it (please point
> me
> > to that if it is present somewhere)
> >
> > Also FLIP itself looks abandoned (no activity for a couple of years),
> > however it seems to be useful.
> >
> > I did a bit investigation about that
> >
> > From one side Calcite provides its own coercion... However, sometimes it
> > behaves strangely and is not ready to use in Flink.
> > for instance
> > 1) it can not handle cases with `with` subqueries and fails with NPE
> (even
> > if it's fixed it will come not earlier than with 1.36.0)
> > 2) under the hood it uses hardcoded `cast`, especially it is an issue for
> > equals where `cast` is invoked without fallback to `null`. In addition it
> > tries to cast `string` to `date`. All this leads to failure for such
> > queries like `select uuid() = null;` where it tries to cast the result of
> > `uuid()` to date without a fallback option.
> >
> > The good thing is that Calcite also provides a custom TypeCoercionFactory
> > which could be used during coercion (in case it is enabled). This could
> > allow for instance to use `try_cast` instead of `cast`, embed the fix for
> > aforementioned NPE, enable only required coercion rules. Also it will
> > enable coercions rule by rule instead of big bang.
> >
> > I would volunteer to update the FLIP page with usage of a custom factory
> if
> > there are no objections and come back with a discussion thread to revive
> > the work on it.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-154%3A+SQL+Implicit+Type+Coercion
> > --
> > Best regards,
> > Sergey
> >
>
>
> --
>
> Best,
> Benchao Li
>


Re: [DISCUSS][2.0] FLIP-349: Move RocksDB statebackend classes to o.a.f.state.rocksdb package

2023-07-31 Thread liu ron
+1,

Best,
Ron

Yun Tang  于2023年7月26日周三 14:19写道:

> +1 (binding)
>
> Best
> Yun Tang
> 
> From: Yu Li 
> Sent: Wednesday, July 26, 2023 14:10
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS][2.0] FLIP-349: Move RocksDB statebackend classes to
> o.a.f.state.rocksdb package
>
> +1
>
> Best Regards,
> Yu
>
>
> On Wed, 26 Jul 2023 at 10:47, Yanfei Lei  wrote:
>
> > +1 for moving all classes in the state-backend-rocksdb module under
> > the classes to o.a.f.state.rocksdb package.
> >
> > I have always been curious about the relationship between
> > o.a.f.contrib.xx and the flink-contrib module. :)
> >
> > Best,
> > Yanfei
> >
> > Jing Ge  于2023年7月25日周二 17:50写道:
> > >
> > > make sense.
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Tue, Jul 25, 2023 at 4:29 PM Stefan Richter
> > >  wrote:
> > >
> > > >
> > > > +1
> > > >
> > > >
> > > >
> > > > > On 24. Jul 2023, at 12:25, Chesnay Schepler 
> > wrote:
> > > > >
> > > > > To properly reflect the state of the rocksdb statebackend I propose
> > to
> > > > move all classes in the state-backend-rocksdb module under the
> classes
> > to
> > > > o.a.f.state.rocksdb package.
> > > > >
> > > > >
> > > > >
> > > >
> >
> https://www.google.com/url?q=https://cwiki.apache.org/confluence/display/FLINK/FLIP-349%253A%2BMove%2BRocksDB%2Bstatebackend%2Bclasses%2Bto%2Bo.a.f.state.rocksdb%2Bpackage=gmail-imap=169079912800=AOvVaw3OiTwgsLEiTcJpNTVM-Y8y
> > > >
> > > >
> >
>


Re: [VOTE][2.0] FLIP-344: Remove parameter in RichFunction#open

2023-07-31 Thread liu ron
+1 (non-binding)

Best,
Ron

Jing Ge  于2023年7月26日周三 21:47写道:

> +1 (binding)
>
> On Wed, Jul 26, 2023 at 4:21 PM weijie guo 
> wrote:
>
> > +1 (binding)
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Yuxin Tan  于2023年7月26日周三 16:11写道:
> >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Yuxin
> > >
> > >
> > > Xintong Song  于2023年7月26日周三 16:09写道:
> > >
> > > > +1 (binding)
> > > >
> > > > Best,
> > > >
> > > > Xintong
> > > >
> > > >
> > > >
> > > > On Wed, Jul 26, 2023 at 2:26 PM Wencong Liu 
> > > wrote:
> > > >
> > > > > Hi dev,
> > > > >
> > > > >
> > > > > I'd like to start a vote on FLIP-344.
> > > > >
> > > > >
> > > > > Discussion thread:
> > > > > https://lists.apache.org/thread/5lyjrrdtwkngkol2t541r4xwoh7133km
> > > > > FLIP:
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231
> > > > >
> > > > >
> > > > > Best regards,
> > > > > Wencong Liu
> > > >
> > >
> >
>


Re: [DISCUSS] Add missing visibility annotation for Table APIs

2023-07-30 Thread liu ron
Hi, Jane

Thanks for driving this discussion. I think this would be very useful for
developers, +1.
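
For readers less familiar with the two markers being discussed, a tiny made-up
example of how they split the API surface (both annotations live in
org.apache.flink.annotation):

```
import org.apache.flink.annotation.Internal;
import org.apache.flink.annotation.PublicEvolving;

/** User-facing API: users may rely on it, though it may still evolve between releases. */
@PublicEvolving
interface GreetingProvider {
    String greet(String name);
}

/** Implementation detail: users should not depend on it, and it may change at any time. */
@Internal
class DefaultGreetingProvider implements GreetingProvider {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}
```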

Best,
Ron

Jane Chan  于2023年7月31日周一 10:57写道:

> Hi Jing,
>
> According to your proposal, does it mean that there will be no public class
> > that has no annotation?
>
>
> Yes, every class should be marked with the visibility annotation, but this
> is restricted only to the classes in the table-api-java,
> table-api-java-bridge, and table-common modules, as they are exposed to
> users.
>
>
> In the end, every public class must have one annotation and all classes
> > with @Internal will have the same meaning for developers as they had no
> > annotation before. Developers will ignore the annotation and continue
> > depending on them.
> >
>
> The Java doc for `@Internal` clearly states that it is internal to Flink.
> Suppose users continue to use it and encounter compatibility issues in
> subsequent versions, we can respond that it is because they are using an
> internal class that should not be relied on. However, if a class is
> unmarked, it is undefined and could be either `@PublicEvolving` or
> `@Internal`, resulting in higher interpretation costs. If that's the case,
> we'd better explain the purpose of these classes to users from the
> beginning. IMO this is necessary unless someday we can extract away all the
> classes that users should not rely upon from these three modules.
>
> Another point is that if we allow classes without visibility annotations,
> Flink developers may easily forget to mark some APIs that should have been
> marked as `@PublicEvolving` during development. This phenomenon can also be
> seen from the sheet I've collected. Moreover, it is difficult to check this
> situation through the automated rules.
>
> Best,
> Jane
>
> On Sat, Jul 29, 2023 at 10:59 PM Jing Ge 
> wrote:
>
> > Hi Jane,
> >
> > Thanks for the clarification. Commonly speaking, it is a good idea to
> show
> > the clear information that a class is only used internally, i.e. please
> > don't rely on it. However, the rule of using @Internal in the community
> is
> > not clearly defined, at least it is not as clear as
> > @Public/@PublicEvolving/@Experimental
> > are. According to your proposal, does it mean that there will be no
> public
> > class that has no annotation? In the end, every public class must have
> one
> > annotation and all classes with @Internal will have the same meaning for
> > developers as they had no annotation before. Developers will ignore
> > the annotation and continue depending on them. It looks like a typical
> > involution: we will come back to the position where we were but with
> > extra @Internal maintenance effort.
> >
> > Best regards,
> > Jing
> >
> > On Sat, Jul 29, 2023 at 3:01 PM Jane Chan  wrote:
> >
> > > Hi all,
> > >
> > > Thanks for your valuable feedback. Here are my thoughts.
> > >
> > > To Jark
> > > I agree with your suggestions, and I've updated the sheet accordingly.
> > >
> > > To Lincoln
> > >
> > > > For the `DynamicFilteringEvent`, tend to keep it 'internal' since
> it's
> > a
> > > > concreate implementation of `SourceEvent`(and other two implementers
> > are
> > > > not public ones) .
> > >
> > >
> > > I have checked the corresponding FLIP [1], and according to the design
> > doc,
> > > it suggested being marked as `@PublicEvolving`. At the same time, the
> > class
> > > `DynamicFilteringData`, which was introduced with it, was also marked
> as
> > > `@PublicEvolving`. I think we can cc Gen Luo  to
> > help
> > > clarify.
> > >
> > >
> > > For the `LookupOptions` and `Trigger`s, because they're all in the
> public
> > > > interfaces of the flip[2], I'm fine with making them all public ones
> or
> > > > just excluding the Trigger implemantors, cc @Qingsheng can you also
> > help
> > > to
> > > > check this?
> > > >
> > >
> > > I'm fine with both. Let's wait for Qingsheng's opinion.
> > >
> > >
> > > For the `BuiltInFunctionDefinitions$Builder`, I think it should be
> > > > `BuiltInFunctionDefinition$Builder`
> > >
> > >
> > > Yes, this is a typo, and I've corrected it.
> > >
> > >
> > > To Jing
> > >
> > > do we really need to
> > > > mark some many classes as @Internal? What is the exactly different
> > > between
> > > > a public class with no annotation and with the @Internal?
> > >
> > >
> > > IMO it is still necessary. From a user's perspective, marking a class
> as
> > > @Internal has a clear directionality, indicating that this is an
> internal
> > > class, and I should not rely on it. However, facing an unmarked class,
> I
> > > wonder whether it is safe to depend on it in my code. From a
> developer's
> > > perspective, marking a class as @Internal also helps us to be more
> > > confident when iterating and updating interfaces. We can be sure that
> > this
> > > proposed change will not have unexpected behavior (because we have
> > informed
> > > users that it is internal and cannot promise the same compatibility
> > > guarantee as public APIs).
> > >
> > > [1]

Re: Re: Re: Re: [DISCUSS][2.0] FLIP-347: Remove IOReadableWritable serialization in Path

2023-07-30 Thread liu ron
+1, makes sense.

Best,
Ron

Wencong Liu  于2023年7月27日周四 09:47写道:

> Hi Matthias,
>
> Thanks for your reply. Due to my busy schedule, I would like to focus
> only on
> the `Path` class in FLIP-347 for now. As for the implementation of other
> modules,
> I will review them when I have available time later on.
>
>
> Best regards,
> Wencong Liu
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> At 2023-07-26 18:35:21, "Matthias Pohl" 
> wrote:
> >Correct. I don't have the intention to block this FLIP if it's too much
> >effort to expand it. Sorry if that's the message that came across.
> >
> >On Wed, Jul 26, 2023 at 12:17 PM Xintong Song 
> wrote:
> >
> >> I think it worth looking into all implementations of
> IOReadeableWritable.
> >> However, I would not consider that as a concern of this FLIP.
> >>
> >> An important convention of the open-source community is volunteer work.
> If
> >> Wencong only wants to work on the `Path` case, I think he should not be
> >> asked to investigate all other cases.
> >>
> >> I believe it's not Matthias's intention to put more workload on Wencong.
> >> It's just sometimes such requests are not easy to say no.
> >>
> >> Best,
> >>
> >> Xintong
> >>
> >>
> >>
> >> On Wed, Jul 26, 2023 at 4:14 PM Matthias Pohl
> >>  wrote:
> >>
> >> > Is the time constraint driven by the fact that you wanted to have that
> >> > effort being included in 1.18? If so, it looks like that's not
> possible
> >> > based on the decision being made for 1.18 to only allow document
> changes
> >> > [1]. So, there would be actually time to look into it. WDYT?
> >> >
> >> > [1] https://lists.apache.org/thread/7l1c9ybqgyc1mx7t7tk4wkc1cm8481o9
> >> >
> >> > On Tue, Jul 25, 2023 at 12:04 PM Junrui Lee 
> wrote:
> >> >
> >> > > +1
> >> > >
> >> > > Best,
> >> > > Junrui
> >> > >
> >> > > Jing Ge  于2023年7月24日周一 23:28写道:
> >> > >
> >> > > > agree, since we want to try our best to deprecate APIs in 1.18, it
> >> > makes
> >> > > > sense.
> >> > > >
> >> > > >
> >> > > > Best regards,
> >> > > > Jing
> >> > > >
> >> > > > On Mon, Jul 24, 2023 at 12:11 PM Wencong Liu <
> liuwencle...@163.com>
> >> > > wrote:
> >> > > >
> >> > > > > Hi Jing and Matthias,
> >> > > > >
> >> > > > >
> >> > > > > I believe it is reasonable to examine all classes that
> >> implement
> >> > > the
> >> > > > > IOReadableWritable
> >> > > > > interface and summarize their actual usage. However, due to time
> >> > > > > constraints, I suggest
> >> > > > > we minimize the scope of this FLIP to focus on the Path class.
> As
> >> for
> >> > > > > other components
> >> > > > > that implement IOReadableWritable, we can make an effort to
> >> > investigate
> >> > > > > them
> >> > > > > in the future. WDYT?
> >> > > > >
> >> > > > >
> >> > > > > Best regards,
> >> > > > > Wencong Liu
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > At 2023-07-22 00:46:45, "Jing Ge" 
> >> > wrote:
> >> > > > > >Hi Wencong,
> >> > > > > >
> >> > > > > >Thanks for the clarification. I got your point. It makes sense.
> >> > > > > >
> >> > > > > >Wrt IOReadableWritable, the suggestion was to check all classes
> >> that
> >> > > > > >implemented it, e.g. BlockInfo, Value, Configuration, etc. Not
> >> > limited
> >> > > > to
> >> > > > > >the Path.
> >> > > > > >
> >> > > > > >Best regards,
> >> > > > > >Jing
> >> > > > > >
> >> > > > > >On Fri, Jul 21, 2023 at 4:31 PM Wencong Liu <
> liuwencle...@163.com
> >> >
> >> > > > wrote:
> >> > > > > >
> >> > > > > >> Hello Jing,
> >> > > > > >>
> >> > > > > >>
> >> > > > > >> Thanks for your reply. The URI field should be final and the
> >> > > > > >> Path will be immutable. The static method
> >> > > deserializeFromDataInputView
> >> > > > > >> will create a new Path object instead of replacing the URI
> field
> >> > > > > >> in a existed Path Object.
> >> > > > > >>
> >> > > > > >>
> >> > > > > >> For the crossing multiple modules issue, I've explained it in
> >> the
> >> > > > reply
> >> > > > > >> to Matthias.
> >> > > > > >>
> >> > > > > >>
> >> > > > > >> Best regards,
> >> > > > > >>
> >> > > > > >>
> >> > > > > >> Wencong Liu
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >> At 2023-07-21 18:05:26, "Jing Ge"  >
> >> > > wrote:
> >> > > > > >> >Hi Wencong,
> >> > > > > >> >
> >> > > > > >> >Just out of curiosity, will the newly introduced
> >> > > > > >> >deserializeFromDataInputView() method make the Path mutable
> >> > again?
> >> > > > > >> >
> >> > > > > >> >What Matthias suggested makes sense, although the extension
> >> might
> >> > > > make
> >> > > > > >> this
> >> > > > > >> >FLIP cross multiple modules.
> >> > > > > >> >
> >> > > > > >> >Best regards,
> >> > 

Re: [VOTE][2.0] FLIP-343: Remove parameter in WindowAssigner#getDefaultTrigger()

2023-07-30 Thread liu ron
+1

Best,
Ron

Xintong Song  于2023年7月27日周四 09:59写道:

> I think we should add the release version when the FLIP is actually
> released, in case of inconsistency.
>
>
> Best,
>
> Xintong
>
>
>
> On Wed, Jul 26, 2023 at 9:48 PM Jing Ge 
> wrote:
>
> > +1 (binding)
> >
> > Please don't forget to update the release version to 2.0 in the FLIP.
> >
> > Best regards,
> > Jing
> >
> > On Wed, Jul 26, 2023 at 5:49 PM Matthias Pohl
> >  wrote:
> >
> > > +1 (binding)
> > >
> > > On Wed, Jul 26, 2023 at 10:11 AM Yuxin Tan 
> > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Best,
> > > > Yuxin
> > > >
> > > >
> > > > Xintong Song  于2023年7月26日周三 16:08写道:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Best,
> > > > >
> > > > > Xintong
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jul 26, 2023 at 3:35 PM Yuepeng Pan 
> wrote:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > Yuepeng Pan.
> > > > > > At 2023-07-26 14:26:04, "Wencong Liu" 
> > wrote:
> > > > > > >Hi dev,
> > > > > > >
> > > > > > >
> > > > > > >I'd like to start a vote on FLIP-343.
> > > > > > >
> > > > > > >
> > > > > > >Discussion thread:
> > > > > > https://lists.apache.org/thread/zn11f460x70nn7f2ckqph41bvx416wxc
> > > > > > >FLIP:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425229
> > > > > > >
> > > > > > >
> > > > > > >Best regards,
> > > > > > >Wencong Liu
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE][2.0] FLIP-347: Remove IOReadableWritable serialization in Path

2023-07-27 Thread liu ron
+1

Best,
Ron

Jing Ge  于2023年7月26日周三 23:39写道:

> +1(binding).
>
> Theoretically, I agree with Matthias. But in the practices, afaic, those
> touched classes are low level APIs. We'd better have enough time to
> evaluate them with care. More time might be needed than we could have
> before the 1.18 release. A follow-up ticket might solve the concern. WDYT?
>
> Best regards,
> Jing
>
> On Wed, Jul 26, 2023 at 4:25 PM Matthias Pohl
>  wrote:
>
> > -0 (binding)
> >
> > I'm not going to block this effort because it's still possible to cover
> > other occurrences in a subsequent FLIP if it's considered useful (I'm not
> > in the position to judge how important the IOReadableWritable interface
> is
> > in the context of removing the DataSet API).
> >
> > The recent discussion around what is allowed to be added to 1.18 after
> the
> > feature freeze [1] prevents us from moving forward with FLIP-347 in 1.18,
> > anyway. Therefore, we could use the time to extend the FLIP to become a
> > larger effort around making IOReadableWritable implementations immutable
> as
> > suggested in FLINK-4758 [2].
> >
> > I mentioned this in the FLIP's discussion thread as well.
> >
> > [1] https://lists.apache.org/thread/7l1c9ybqgyc1mx7t7tk4wkc1cm8481o9
> > [2] https://issues.apache.org/jira/browse/FLINK-4758
> >
> > On Wed, Jul 26, 2023 at 10:21 AM weijie guo 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > Yuxin Tan  于2023年7月26日周三 16:11写道:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Best,
> > > > Yuxin
> > > >
> > > >
> > > > Xintong Song  于2023年7月26日周三 16:09写道:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Best,
> > > > >
> > > > > Xintong
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jul 26, 2023 at 2:29 PM Wencong Liu 
> > > > wrote:
> > > > >
> > > > > > Hi dev,
> > > > > >
> > > > > >
> > > > > > I'd like to start a vote on FLIP-347.
> > > > > >
> > > > > >
> > > > > > Discussion thread:
> > > > > > https://lists.apache.org/thread/3gcxhnqpsvb85golnlxf9tv5p43xkjgj
> > > > > > FLIP:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-347%3A+Remove+IOReadableWritable+serialization+in+Path
> > > > > >
> > > > > >
> > > > > > Best regards,
> > > > > > Wencong Liu
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Yong Fang

2023-07-23 Thread liu ron
Congratulations,

Best,
Ron

Qingsheng Ren  于2023年7月24日周一 11:18写道:

> Congratulations and welcome aboard, Yong!
>
> Best,
> Qingsheng
>
> On Mon, Jul 24, 2023 at 11:14 AM Chen Zhanghao 
> wrote:
>
> > Congrats, Shammon!
> >
> > Best,
> > Zhanghao Chen
> > 
> > From: Weihua Hu 
> > Sent: July 24, 2023 11:11
> > To: dev@flink.apache.org 
> > Cc: Shammon FY 
> > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yong Fang
> >
> > Congratulations!
> >
> > Best,
> > Weihua
> >
> >
> > On Mon, Jul 24, 2023 at 11:04 AM Paul Lam  wrote:
> >
> > > Congrats, Shammon!
> > >
> > > Best,
> > > Paul Lam
> > >
> > > > 2023年7月24日 10:56,Jingsong Li  写道:
> > > >
> > > > Shammon
> > >
> > >
> >
>


Re: [DISCUSS] FLIP 333 - Redesign Apache Flink website

2023-07-23 Thread liu ron
+1,

The dark mode looks very cool.

Best,
Ron

Matthias Pohl  于2023年7月20日周四 15:45写道:

> I think Martijn and Markos brought up a few good points:
>
> - We shouldn't degrade the accessibility but ideally improve it as part of
> the redesign. The current proposal doesn't look like we're doing changes in
> a way that it lowers the accessibility (considering that the menu structure
> stays the same and there are no plans to replace text with images). But
> nonetheless, it would be good to have this topic explicitly covered in the
> FLIP.
>
> - I see Markos' concern about the white vs black background (Flink
> documentation vs Flink website): Does it make a big difference to change to
> a white background design? The switch to dark mode could happen after a
> similar (or the same) theme is supported by the documentation. Or are there
> other reasons why the dark background is favorable?
>
> Best,
> Matthias
>
> On Wed, Jul 19, 2023 at 12:28 PM Markos Sfikas
>  wrote:
>
> > +1 Thanks for proposing this FLIP, Deepthi.
> >
> > The designs on FLIP-333 [1] look fresh and modern and I feel they achieve
> > the goal in general.
> >
> > A couple of suggestions from my side could be the following:
> >
> > [a] Assuming that no changes are implemented to the Flink documentation,
> I
> > would like to see a visual with a 'white background' instead of the 'dark
> > mode'. This is primarily for two reasons: Firstly, it provides a more
> > consistent experience for the website visitor going from the home page to
> > the documentation (instead of switching from dark to white mode on the
> > website) and secondly, from an accessibility and inclusivity perspective
> > that was mentioned earlier, we should give the option to either switch
> > between dark and white mode or have something that is universally easy to
> > read and consume (not everyone is comfortable reading white text on dark
> > background).
> >
> > [b] Regarding structuring the home page, right now the Flink website has
> > use cases blending with what seems to be Flink's 'technical
> > characteristics' (i.e. the sections that talk about 'Guaranteed
> > correctness', 'Layered APIs', 'Operational Focus', etc.). As someone new
> to
> > Flink and considering using the technology, I would like to understand
> > firstly the use cases and secondly dive into the characteristics that
> make
> > Flink stand out. I would suggest having one 'Use Cases' section above the
> > 'technical characteristics' to have a separation between the two and
> easily
> > navigate to the Flink use cases pages [2] directly from this section.\
> >
> > Thank you
> > Markos
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-333%3A+Redesign+Apache+Flink+website
> > [2] https://flink.apache.org/use-cases/
> >
> > On Wed, Jul 19, 2023 at 10:46 AM Martijn Visser <
> martijnvis...@apache.org>
> > wrote:
> >
> > > +0. I think it has to grow on me. A couple of things from my end:
> > >
> > > - Have we evaluated if these new designs are an improvement on W3C's
> > > Accessibility, Usability & Inclusion? [1]. It is something that the ASF
> > > rightfully emphasises.
> > > - "there is general consensus in the community that the Flink
> > documentation
> > > is very well-organized and easily searchable." -> That's actually not
> the
> > > case, there are numerous FLIPs on this topic [2] [3] which haven't been
> > > concluded/implemented.
> > > - I don't think we should put the links to the Github repo and the blog
> > > posts in the footer: these are some of the most read/visited links.
> > >
> > > Thanks,
> > >
> > > Martijn
> > >
> > > [1]
> > https://www.w3.org/WAI/fundamentals/accessibility-usability-inclusion/
> > > [2]
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127405685
> > > [3]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-42%3A+Rework+Flink+Documentation
> > >
> > >
> > > On Mon, Jul 17, 2023 at 2:31 PM Maximilian Michels 
> > wrote:
> > >
> > > > +1
> > > >
> > > > On Mon, Jul 17, 2023 at 10:45 AM Chesnay Schepler <
> ches...@apache.org>
> > > > wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > On 16/07/2023 08:10, Mohan, Deepthi wrote:
> > > > > > @Chesnay
> > > > > >
> > > > > > Thank you for your feedback.
> > > > > >
> > > > > > An important takeaway from the previous discussion [1] and your
> > > > feedback was to keep the design and text/diagram changes separate as
> > each
> > > > change for text and diagrams likely require deeper discussion.
> > Therefore,
> > > > as a first step I am proposing only UX changes with minimal text
> > changes
> > > > for the pages mentioned in the FLIP.
> > > > > >
> > > > > > The feedback we received from customers cover both aesthetics and
> > > > functional aspects of the website. Note that most feedback is focused
> > > only
> > > > on the main Flink website [2].
> > > > > >
> > > > > > 1) New customers who are considering Flink have said about the
> > > website
> > 

Re: [VOTE] FLIP-346: Deprecate ManagedTable related APIs

2023-07-23 Thread liu ron
+1

Best,
Ron

Lincoln Lee  于2023年7月21日周五 16:09写道:

> +1
>
> Best,
> Lincoln Lee
>
>
> Leonard Xu  于2023年7月21日周五 16:07写道:
>
> > +1
> >
> > Best,
> > Leonard
> >
> > > On Jul 21, 2023, at 4:02 PM, yuxia 
> wrote:
> > >
> > > +1 (binding)
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > - Original Message -
> > > From: "Jane Chan" 
> > > To: "dev" 
> > > Sent: Friday, July 21, 2023 3:41:11 PM
> > > Subject: [VOTE] FLIP-346: Deprecate ManagedTable related APIs
> > >
> > > Hi developers,
> > >
> > > Thanks for all the feedback on FLIP-346: Deprecate ManagedTable related
> > > APIs[1].
> > > Based on the discussion[2], we have reached a consensus, so I would
> like
> > to
> > > start a vote.
> > >
> > > The vote will last for at least 72 hours unless there is an objection
> or
> > > insufficient votes.
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-346%3A+Deprecate+ManagedTable+related+APIs
> > > [2] https://lists.apache.org/thread/5dvqyhqp5fbtm54944xohts71modwd99
> > >
> > > Best,
> > > Jane
> >
> >
>


Re: [DISCUSS][2.0] FLIP-343: Remove parameter in WindowAssigner#getDefaultTrigger()

2023-07-23 Thread liu ron
+1

Best,
Ron

Yuxin Tan  于2023年7月21日周五 16:21写道:

> +1
>
> Best,
> Yuxin
>
>
> Jing Ge  于2023年7月21日周五 15:41写道:
>
> > +1
> >
> > NIT: the release in the FLIP is still empty, it should be 2.0
> >
> > Best regards,
> > Jing
> >
> > On Fri, Jul 21, 2023 at 6:03 AM Xintong Song 
> > wrote:
> >
> > > +1
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Fri, Jul 21, 2023 at 10:53 AM Wencong Liu 
> > wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I would like to start a discussion on FLIP-343: Remove parameter in
> > > > WindowAssigner#getDefaultTrigger() [1].
> > > >
> > > >
> > > > The method getDefaultTrigger() in WindowAssigner takes a
> > > > StreamExecutionEnvironment
> > > > parameter, but this parameter is not actually used for any subclasses
> > of
> > > > WindowAssigner.
> > > > Therefore, it is unnecessary to include this parameter.
> > > > As such I propose to remove the StreamExecutionEnvironment field from
> > > > WindowAssigner#getDefaultTrigger(StreamExecutionEnvironment env).
> > > > Looking forward to your feedback.
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425229
> > > > Best regards,
> > > >
> > > >
> > > > Wencong Liu
> > >
> >
>
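For context, here is a sketch of the signature change discussed in this thread. The types below are simplified stand-ins so the snippet compiles on its own; they are not Flink's actual classes, and the exact signatures should be checked against the FLIP and the source.

// Simplified stand-ins, not Flink's real Trigger/StreamExecutionEnvironment/WindowAssigner.
interface Trigger<T, W> {}

final class StreamExecutionEnvironment {}

// Before: getDefaultTrigger() takes a StreamExecutionEnvironment that no subclass actually uses.
abstract class WindowAssignerBefore<T, W> {
    public abstract Trigger<T, W> getDefaultTrigger(StreamExecutionEnvironment env);
}

// After (FLIP-343, targeted at release 2.0): the unused parameter is removed.
abstract class WindowAssignerAfter<T, W> {
    public abstract Trigger<T, W> getDefaultTrigger();
}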


Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-03 Thread liu ron
Congrats everyone

Best,
Ron

Jark Wu  于2023年7月3日周一 22:48写道:

> Congrats everyone!
>
> Best,
> Jark
>
> > 2023年7月3日 22:37,Yuval Itzchakov  写道:
> >
> > Congrats team!
> >
> > On Mon, Jul 3, 2023, 17:28 Jing Ge via user  > wrote:
> >> Congratulations!
> >>
> >> Best regards,
> >> Jing
> >>
> >>
> >> On Mon, Jul 3, 2023 at 3:21 PM yuxia  > wrote:
> >>> Congratulations!
> >>>
> >>> Best regards,
> >>> Yuxia
> >>>
> >>> From: "Pushpa Ramakrishnan" <pushpa.ramakrish...@icloud.com>
> >>> To: "Xintong Song" <tonysong...@gmail.com>
> >>> Cc: "dev" <dev@flink.apache.org>,
> "User" <u...@flink.apache.org>
> >>> Sent: Monday, July 3, 2023 8:36:30 PM
> >>> Subject: Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award
> >>>
> >>> Congratulations 🥳
> >>>
> >>> On 03-Jul-2023, at 3:30 PM, Xintong Song  > wrote:
> >>>
> >>> 
> >>> Dear Community,
> >>>
> >>> I'm pleased to share this good news with everyone. As some of you may
> have already heard, Apache Flink has won the 2023 SIGMOD Systems Award [1].
> >>>
> >>> "Apache Flink greatly expanded the use of stream data-processing." --
> SIGMOD Awards Committee
> >>>
> >>> SIGMOD is one of the most influential data management research
> conferences in the world. The Systems Award is awarded to an individual or
> set of individuals to recognize the development of a software or hardware
> system whose technical contributions have had significant impact on the
> theory or practice of large-scale data management systems. Winning of the
> award indicates the high recognition of Flink's technological advancement
> and industry influence from academia.
> >>>
> >>> As an open-source project, Flink wouldn't have come this far without
> the wide, active and supportive community behind it. Kudos to all of us who
> helped make this happen, including the over 1,400 contributors and many
> others who contributed in ways beyond code.
> >>>
> >>> Best,
> >>> Xintong (on behalf of the Flink PMC)
> >>>
> >>> [1] https://sigmod.org/2023-sigmod-systems-award/
> >>>
>
>


Re: [RESULT][VOTE] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-29 Thread liu ron
Thanks all for participating!

Best,
Ron

Lijie Wang  于2023年6月29日周四 21:38写道:

> Hi, all
> Happy to announce that FLIP-324 [1] has been approved unanimously.
> According to the vote thread[2], there are 10 approving votes, out of
> which 7 are binding:
>
> - Jing Ge (binding)
> - Rui Fan (binding)
> - Zhu Zhu (binding)
> - Yuepeng Pan (non-binding)
> - Yuxia (binding)
> - Xia Sun (non-binding)
> - Jark Wu (binding)
> - Yangze Guo (binding)
> - Yuxin Tan (non-binding)
> - Benchao Li (binding)
>
> And no disapproving ones.
>
> Thanks all for participating!
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs
> [2] https://lists.apache.org/thread/60c0obrgxrcxb7qv9pqywzxvtt5phnhy
>
> Best,
> Lijie.
>


Re: [VOTE] FLIP-303: Support REPLACE TABLE AS SELECT statement

2023-06-28 Thread liu ron
+1 (non-binding)

Best,
Ron

Feng Jin  于2023年6月29日周四 00:22写道:

> +1 (no-binding)
>
>
> Best
> Feng
>
> On Wed, Jun 28, 2023 at 11:03 PM Jing Ge 
> wrote:
>
> > +1(binding)
> >
> > On Wed, Jun 28, 2023 at 1:51 PM Mang Zhang  wrote:
> >
> > > +1 (no-binding)
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Mang Zhang
> > >
> > >
> > >
> > >
> > >
> > > At 2023-06-28 17:48:15, "yuxia"  wrote:
> > > >Hi everyone,
> > > >Thanks for all the feedback about FLIP-303: Support REPLACE TABLE AS
> > > SELECT statement[1]. Based on the discussion [2], we have come to a
> > > consensus, so I would like to start a vote.
> > > >The vote will be open for at least 72 hours (until July 3th, 10:00AM
> > GMT)
> > > unless there is an objection or an insufficient number of votes.
> > > >
> > > >
> > > >[1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-303%3A+Support+REPLACE+TABLE+AS+SELECT+statement
> > > >[2] https://lists.apache.org/thread/39mwckdsdgck48tzsdfm66hhnxorjtz3
> > > >
> > > >
> > > >Best regards,
> > > >Yuxia
> > >
> >
>


Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-20 Thread liu ron
Hi, Jing

The default value for this ratio follows other systems, such as Hive. As
long as the Runtime Filter can filter out more than half of the data, we
benefit from it. Normally, as long as statistics are available, ndv is
present and rowCount is only needed as a fallback, so I think the formula
is valid in most cases. The formula is also borrowed from other systems,
such as Alibaba Cloud's PolarDB. Your concern is valuable for this FLIP,
but currently we don't know how to adjust the formula in a more principled
way, and making it too complex may also make it hard for users to
understand, so I think we should start with the simple approach and
optimize gradually. As a first step, we can verify the reasonableness of
the current formula with the TPC-DS queries.
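As a rough sketch of the kind of check I mean (the class and parameter
names and the exact formula below are illustrative only, not copied from
FLIP-324):

// Hypothetical sketch of a ratio-based injection check; not the FLIP's exact formula.
public final class RuntimeFilterInjectionSketch {

    static boolean shouldInjectRuntimeFilter(
            long buildDataSizeBytes,   // estimated size of the build (small) side
            long probeDataSizeBytes,   // estimated size of the probe (large) side
            double buildNdv,           // ndv of the build-side join key, or row count as a fallback
            double probeNdv,           // ndv of the probe-side join key, or row count as a fallback
            long maxBuildDataSize,     // e.g. 10 MB
            long minProbeDataSize,     // e.g. 10 GB
            double minFilterRatio) {   // e.g. 0.5

        if (buildDataSizeBytes > maxBuildDataSize || probeDataSizeBytes < minProbeDataSize) {
            // Build side too large to build a cheap filter, or probe side too small to matter.
            return false;
        }
        // Estimated fraction of probe-side rows that the filter would drop.
        double estimatedRatio = 1.0 - Math.min(1.0, buildNdv / probeNdv);
        return estimatedRatio >= minFilterRatio;
    }
}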

Best,
Ron

Jing Ge  于2023年6月20日周二 19:46写道:

> Hi Ron,
>
> Thanks for the clarification. That answered my questions.
>
> Regarding the ratio, since my gut feeling is that any value less than 0.8
> or 0.9 won't help too much(I might be wrong). I was thinking of adapting
> the formula to somehow map the current 0.9-1 to 0-1, i.e. if user config
> 0.5, it will be mapped to e.g. 0.95 (or e.g. 0.85, the real number
> needs more calculation) for the current formula described in the FLIP. But
> I am not sure it is a feasible solution. It deserves more discussion. Maybe
> some real performance tests could give us some hints.
>
> Best regards,
> Jing
>
> On Tue, Jun 20, 2023 at 5:19 AM liu ron  wrote:
>
> > Hi, Jing
> >
> > Thanks for your feedback.
> >
> > > Afaiu, the runtime Filter will only be Injected when the gap between
> the
> > build data size and prob data size is big enough. Let's make an extreme
> > example. If the small table(build side) has one row and the large
> > table(probe side) contains tens of billions of rows. This will be the
> ideal
> > use case for the runtime filter and the improvement will be significant.
> Is
> > this correct?
> >
> > Yes, you are right.
> >
> > > Speaking of the "Conditions of injecting Runtime Filter" in the FLIP,
> > will
> > the value of max-build-data-size and min-prob-data-size depend on the
> > parallelism config? I.e. with the same data-size setting, is it possible
> to
> > inject or don't inject runtime filters by adjusting the parallelism?
> >
> > First, let me clarify two points. The first is that RuntimeFilter decides
> > whether to inject or not in the optimization phase, but we do not
> consider
> > operator parallelism in the SQL optimization phase currently, which is
> set
> > at the ExecNode level. The second is that in batch mode, the default
> > AdaptiveBatchScheduler[1] is now used, which will derive the parallelism
> of
> > the downstream operator based on the amount of data produced by the
> > upstream operator, that is, the parallelism is determined by runtime
> > adaptation. In the above case, we cannot decide whether to inject
> > BloomFilter in the optimization stage based on parallelism.
> > A more important point is that the purpose of Runtime Filter is to reduce
> > the amount of data for shuffle, and thus the amount of data processed by
> > the downstream join operator. Therefore, I understand that regardless of
> > the parallelism of the probe, the amount of data in the shuffle must be
> > reduced after inserting the Runtime Filter, which is beneficial to the
> join
> > operator, so whether to insert the RuntimeFilter or not is not dependent
> on
> > the parallelism.
> >
> > > Does it make sense to reconsider the formula of ratio
> > calculation to help users easily control the filter injection?
> >
> > Only when ndv does not exist will row count be considered. when size uses
> > the default value and ndv cannot be taken, it is true that this condition
> > may always hold, but this does not seem to affect anything, and the user
> is
> > also likely to change the value of the size. One question, how do you
> think
> > we should make it easier for users to control the  filter injection?
> >
> >
> > [1]:
> >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/elastic_scaling/#adaptive-batch-scheduler
> >
> > Best,
> > Ron
> >
> > Jing Ge  于2023年6月20日周二 07:11写道:
> >
> > > Hi Lijie,
> > >
> > > Thanks for your proposal. It is a really nice feature. I'd like to ask
> a
> > > few questions to understand your thoughts.
> > >
> > > Afaiu, the runtime Filter will only be Injected when the gap between
> the
> > > build data size and prob data size is big enough. Let's make a

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-19 Thread liu ron
Hi, Jing

Thanks for your feedback.

> Afaiu, the runtime Filter will only be Injected when the gap between the
build data size and prob data size is big enough. Let's make an extreme
example. If the small table(build side) has one row and the large
table(probe side) contains tens of billions of rows. This will be the ideal
use case for the runtime filter and the improvement will be significant. Is
this correct?

Yes, you are right.

> Speaking of the "Conditions of injecting Runtime Filter" in the FLIP, will
the value of max-build-data-size and min-prob-data-size depend on the
parallelism config? I.e. with the same data-size setting, is it possible to
inject or don't inject runtime filters by adjusting the parallelism?

First, let me clarify two points. The first is that RuntimeFilter decides
whether to inject or not in the optimization phase, but we do not consider
operator parallelism in the SQL optimization phase currently, which is set
at the ExecNode level. The second is that in batch mode, the default
AdaptiveBatchScheduler[1] is now used, which will derive the parallelism of
the downstream operator based on the amount of data produced by the
upstream operator, that is, the parallelism is determined by runtime
adaptation. In the above case, we cannot decide whether to inject
BloomFilter in the optimization stage based on parallelism.
A more important point is that the purpose of Runtime Filter is to reduce
the amount of data for shuffle, and thus the amount of data processed by
the downstream join operator. Therefore, I understand that regardless of
the parallelism of the probe, the amount of data in the shuffle must be
reduced after inserting the Runtime Filter, which is beneficial to the join
operator, so whether to insert the RuntimeFilter or not is not dependent on
the parallelism.

> Does it make sense to reconsider the formula of ratio
calculation to help users easily control the filter injection?

Row count is considered only when ndv is not available. When size uses the
default value and ndv cannot be obtained, it is true that this condition
may always hold, but that does not seem to cause any harm, and the user is
also likely to change the size values. One question: how do you think we
should make it easier for users to control the filter injection?


[1]:
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/elastic_scaling/#adaptive-batch-scheduler

Best,
Ron

Jing Ge  于2023年6月20日周二 07:11写道:

> Hi Lijie,
>
> Thanks for your proposal. It is a really nice feature. I'd like to ask a
> few questions to understand your thoughts.
>
> Afaiu, the runtime Filter will only be Injected when the gap between the
> build data size and prob data size is big enough. Let's make an extreme
> example. If the small table(build side) has one row and the large
> table(probe side) contains tens of billions of rows. This will be the ideal
> use case for the runtime filter and the improvement will be significant. Is
> this correct?
>
> Speaking of the "Conditions of injecting Runtime Filter" in the FLIP, will
> the value of max-build-data-size and min-prob-data-size depend on the
> parallelism config? I.e. with the same data-size setting, is it possible to
> inject or don't inject runtime filters by adjusting the parallelism?
>
> In the FLIP, there are default values for the new configuration parameters
> that will be used to check the injection condition. If ndv cannot be
> estimated, row count will be used. Given the max-build-data-size is 10MB
> and the min-prob-data-size is 10GB, in the worst case, the min-filter-ratio
> will be 0.999, i.e. the probeNdv is 1000 times buildNdv . If we consider
> the duplication and the fact that the large table might have more columns
> than the small table, the probeNdv should still be 100 or 10 times
> buildNdv, which ends up with a min-filter-ratio equals to 0.99 or 0.9. Both
> are bigger than the default value 0.5 in the FLIP. If I am not mistaken,
> commonly, a min-filter-ratio less than 0.99 will always allow injecting the
> runtime filter. Does it make sense to reconsider the formula of ratio
> calculation to help users easily control the filter injection?
>
> Best regards,
> Jing
>
> On Mon, Jun 19, 2023 at 4:42 PM Lijie Wang 
> wrote:
>
> > Hi Stefan,
> >
> > >> bypassing the dataflow
> > I believe it's a possible solution, but it may require more coordination
> > and extra conditions (such as DFS), I do think it should be excluded from
> > the first version. I'll put it in Future+Improvements as a potential
> > improvement.
> >
> > Thanks again for your quick reply :)
> >
> > Best,
> > Lijie
> >
> > Stefan Richter  于2023年6月19日周一 20:51写道:
> >
> > >
> > > Hi Lijie,
> > >
> > > I think you understood me correctly. But I would not consider this a
> true
> > > cyclic dependency in the dataflow because I would not suggest to send
> the
> > > filter through an edge in the job graph from join to scan. I’d rather
> > > bypass the stream graph for 

Re: [VOTE] FLIP-294: Support Customized Catalog Modification Listener

2023-06-15 Thread liu ron
+1 (non-binding)

Best,
Ron

yuxia  于2023年6月15日周四 20:15写道:

> +1 (binding)
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "Leonard Xu" 
> To: "dev" 
> Sent: Thursday, June 15, 2023 8:09:00 PM
> Subject: Re: [VOTE] FLIP-294: Support Customized Catalog Modification Listener
>
> +1 (binding)
>
>
> Best,
> Leonard
>
> > On Jun 14, 2023, at 11:21 PM, Jing Ge 
> wrote:
> >
> > +1 (binding)
> >
> > Best Regards,
> > Jing
> >
> > On Wed, Jun 14, 2023 at 4:07 PM Benchao Li  wrote:
> >
> >> +1 (binding)
> >>
> >> Shammon FY  于2023年6月14日周三 19:52写道:
> >>
> >>> Hi all:
> >>>
> >>> Thanks for all the feedback for FLIP-294: Support Customized Catalog
> >>> Modification Listener [1]. I would like to start a vote for it
> according
> >> to
> >>> the discussion in thread [2].
> >>>
> >>> The vote will be open for at least 72 hours(excluding weekends, until
> >> June
> >>> 19, 19:00 PM GMT) unless there is an objection or an insufficient
> number
> >> of
> >>> votes.
> >>>
> >>>
> >>> [1]
> >>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-294%3A+Support+Customized+Catalog+Modification+Listener
> >>> [2] https://lists.apache.org/thread/185mbcwnpokfop4xcb22r9bgfp2m68fx
> >>>
> >>>
> >>> Best,
> >>> Shammon FY
> >>>
> >>
> >>
> >> --
> >>
> >> Best,
> >> Benchao Li
> >>
>


Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-14 Thread liu ron
Thanks Lijie for starting this discussion. Runtime Filter is a common
optimization to improve join performance and has been adopted by many
computing engines such as Spark and Doris. Flink is a unified stream and
batch computing engine, and we are continuously optimizing batch
performance. Since Runtime Filter is a general optimization technique that
can improve the performance of Flink batch jobs, we are introducing it for
batch as well.
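For readers who are new to the technique, here is a hand-written sketch of
the build/probe flow the proposal describes (this is not Flink's
implementation; all class and method names are made up for illustration):

import java.util.BitSet;
import java.util.List;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

// Hand-written illustration of the build/probe flow behind a runtime filter.
public final class RuntimeFilterFlowSketch {

    /** A tiny Bloom-filter-like structure built from the join keys of the small (build) side. */
    static final class KeyFilter {
        private static final int SIZE = 1 << 16;
        private final BitSet bits = new BitSet(SIZE);

        void add(int keyHash) {
            bits.set(Math.floorMod(keyHash, SIZE));
        }

        /** May return false positives, never false negatives. */
        boolean mightContain(int keyHash) {
            return bits.get(Math.floorMod(keyHash, SIZE));
        }
    }

    /** Build phase: hash every build-side join key into the filter. */
    static <T> KeyFilter buildFilter(List<T> buildRows, ToIntFunction<T> keyHash) {
        KeyFilter filter = new KeyFilter();
        buildRows.forEach(row -> filter.add(keyHash.applyAsInt(row)));
        return filter;
    }

    /** Probe phase: drop probe-side rows that cannot match before they are shuffled to the join. */
    static <T> List<T> filterProbeSide(List<T> probeRows, KeyFilter filter, ToIntFunction<T> keyHash) {
        return probeRows.stream()
                .filter(row -> filter.mightContain(keyHash.applyAsInt(row)))
                .collect(Collectors.toList());
    }
}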

Looking forward to all feedback.

Best,
Ron

Lijie Wang  于2023年6月14日周三 17:17写道:

> Hi devs
>
> Ron Liu, Gen Luo and I would like to start a discussion about FLIP-324:
> Introduce Runtime Filter for Flink Batch Jobs[1]
>
> Runtime Filter is a common optimization to improve join performance. It is
> designed to dynamically generate filter conditions for certain Join queries
> at runtime to reduce the amount of scanned or shuffled data, avoid
> unnecessary I/O and network transmission, and speed up the query. Its
> working principle is building a filter(e.g. bloom filter) based on the data
> on the small table side(build side) first, then pass this filter to the
> large table side(probe side) to filter the irrelevant data on it, this can
> reduce the data reaching the join and improve performance.
>
> You can find more details in the FLIP-324[1]. Looking forward to your
> feedback.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs
>
> Best,
> Ron & Gen & Lijie
>


Re: [VOTE] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS) statement

2023-06-12 Thread liu ron
+1 (non-binding)

Best,
Ron

Jing Ge  于2023年6月12日周一 19:33写道:

> +1(binding) Thanks!
>
> Best regards,
> Jing
>
> On Mon, Jun 12, 2023 at 12:01 PM yuxia 
> wrote:
>
> > +1 (binding)
> > Thanks Mang driving it.
> >
> > Best regards,
> > Yuxia
> >
> > - Original Message -
> > From: "zhangmang1" 
> > To: "dev" 
> > Sent: Monday, June 12, 2023 5:31:10 PM
> > Subject: [VOTE] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS)
> > statement
> >
> > Hi everyone,
> >
> > Thanks for all the feedback about FLIP-305: Support atomic for CREATE
> > TABLE AS SELECT(CTAS) statement[1].
> > [2] is the discussion thread.
> >
> > I'd like to start a vote for it. The vote will be open for at least 72
> > hours (until June 15th, 10:00AM GMT) unless there is an objection or an
> > insufficient number of votes.[1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
> > [2]https://lists.apache.org/thread/n6nsvbwhs5kwlj5kjgv24by2tk5mh9xd
> >
> >
> >
> >
> >
> >
> >
> > --
> >
> > Best regards,
> > Mang Zhang
> >
>


Re: [VOTE] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-12 Thread liu ron
Hi, all.
FLIP-315 [1] has been accepted.
There are 5 binding votes, 1 non-binding votes:
- Jark Wu(binding)
- Jingsong Li (binding)
- Benchao Li (binding)
- Weijie Guo(binding)
- Jing Ge(binding)

- Aitozi (non-binding)
Thanks everyone.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL

Best,
Ron

Aitozi  于2023年6月8日周四 13:30写道:

> +1
>
> Looking forward to this feature.
>
> Best,
> Aitozi.
>
> Jing Ge  于2023年6月8日周四 04:44写道:
>
> > +1
> >
> > Best Regards,
> > Jing
> >
> > On Wed, Jun 7, 2023 at 10:52 AM weijie guo 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > Jingsong Li  于2023年6月7日周三 15:59写道:
> > >
> > > > +1
> > > >
> > > > On Wed, Jun 7, 2023 at 3:03 PM Benchao Li 
> > wrote:
> > > > >
> > > > > +1, binding
> > > > >
> > > > > Jark Wu  于2023年6月7日周三 14:44写道:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > > > 2023年6月7日 14:20,liu ron  写道:
> > > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > Thanks for all the feedback about FLIP-315: Support Operator
> > Fusion
> > > > > > Codegen
> > > > > > > for Flink SQL[1].
> > > > > > > [2] is the discussion thread.
> > > > > > >
> > > > > > > I'd like to start a vote for it. The vote will be open for at
> > least
> > > > 72
> > > > > > > hours (until June 12th, 12:00AM GMT) unless there is an
> objection
> > > or
> > > > an
> > > > > > > insufficient number of votes.
> > > > > > >
> > > > > > > [1]:
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
> > > > > > > [2]:
> > > > https://lists.apache.org/thread/9cnqhsld4nzdr77s2fwf00o9cb2g9fmw
> > > > > > >
> > > > > > > Best,
> > > > > > > Ron
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best,
> > > > > Benchao Li
> > > >
> > >
> >
>


Re: Re: [DISCUSS] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS) statement

2023-06-09 Thread liu ron
Hi, Mang

Thanks for your update, the FLIP looks good to me now.

Best,
Ron

Mang Zhang  于2023年6月9日周五 12:08写道:

> Hi Ron,
> Thanks for your reply!
> After our offline discussion, at present, there may be many of flink jobs
> using non-atomic CTAS functions, especially Stream jobs,
> If we only infer whether atomic CTAS is supported based on whether
> DynamicTableSink implements the SupportsStaging interface,
> then after the user upgrades to a new version, flink's behavior will
> change, which is not production friendly.
> in order to ensure the consistency of flink behavior, and to give the user
> maximum flexibility,
> in time DynamicTableSink implements the SupportsStaging interface, users
> can still choose non-atomic implementation according to business needs.
>
> I have updated FLIP-305[1].
>
> Looking forward to more feedback, if there is no other feedback, I will
> launch a vote next Monday(2023-06-12).
> Thanks again!
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
>
>
> --
>
> Best regards,
>
> Mang Zhang
>
>
>
> At 2023-06-09 10:23:21, "liu ron"  wrote:
> >Hi, Mang
> >
> >In FLIP-214, we have discussed that atomicity is not needed in streaming
> >mode, so we have implemented the initial version that doesn't support
> >atomicity. In addition, we introduce the option
> >"table.ctas.atomicity-enabled" to enable the atomic ability. According to
> >your FLIP-315 description, Once the DynamicTableSink implements the
> >SupportsStaging interface, the atomicity is the default behavior whether in
> >stream mode or batch mode, and the user can't change it, I think this is
> >not feasible for streaming mode, the atomicity should can be controlled by
> >user. So I think we should clear the atomicity behavior combine the option
> >and SuppportsStage interface in FLIP. Only the DynamicTableSink implements
> >the SupportsStaging and option is enabled, only atomicity is enabled. WDYT?
> >
> >Best,
> >Ron
> >
> >Jark Wu  于2023年6月8日周四 16:30写道:
> >
> >> Thank you for the great work, Mang! The updated proposal looks good to me.
> >>
> >> Best,
> >> Jark
> >>
> >> > 2023年6月8日 11:49,Jingsong Li  写道:
> >> >
> >> > Thanks Mang for updating!
> >> >
> >> > Looks good to me!
> >> >
> >> > Best,
> >> > Jingsong
> >> >
> >> > On Wed, Jun 7, 2023 at 2:31 PM Mang Zhang  wrote:
> >> >>
> >> >> Hi Jingsong,
> >> >>
> >> >>> I have some doubts about the `TwoPhaseCatalogTable`. Generally, our
> >> >>> Flink design places execution in the TableFactory or directly in the
> >> >>> Catalog, so introducing an executable table makes me feel a bit
> >> >>> strange. (Spark is this style, but Flink may not be)
> >> >> On this issue, we introduce the executable logic commit/abort a bit of
> >> strange on CatalogTable.
> >> >> After an offline discussion with yuxia, I tweaked the FLIP-305[1]
> >> scenario.
> >> >> The new solution is similar to the implementation of SupportsOverwrite,
> >> >> which introduces the SupportsStaging interface and infers whether
> >> DynamicTableSink supports atomic ctas based on whether it implements the
> >> SupportsStaging interface,
> >> >> and if so, it will get the StagedTable object from DynamicTableSink.
> >> >>
> >> >> For more implementation details, please see the FLIP-305 document.
> >> >>
> >> >> This is my poc commits
> >> https://github.com/Tartarus0zm/flink/commit/025b30ad8f1a03e7738e9bb534e6e491c31990fa
> >> >>
> >> >>
> >> >> [1]
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >> Best regards,
> >> >>
> >> >> Mang Zhang
> >> >>
> >> >>
> >> >>
> >> >> At 2023-05-12 13:02:14, "Jingsong Li"  wrote:
> >> >>> Hi Mang,
> >> >>>
> >> >>> Thanks for starting this FLIP.
> >> >>>
> >> >>> I have some doubts about the `TwoPhaseCatalogTable`. Generally, our
> >> >>> Flink de

Re: [DISCUSS] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS) statement

2023-06-08 Thread liu ron
Hi, Mang

In FLIP-214, we discussed that atomicity is not needed in streaming mode,
so the initial version was implemented without atomicity support. In
addition, we introduced the option "table.ctas.atomicity-enabled" to enable
the atomic behavior. According to your FLIP-305 description, once the
DynamicTableSink implements the SupportsStaging interface, atomicity
becomes the default behavior in both stream and batch mode and the user
can't change it. I think this is not feasible for streaming mode; the
atomicity should be controllable by the user. So I think the FLIP should
clarify how the atomicity behavior combines the option and the
SupportsStaging interface: atomicity is enabled only when the
DynamicTableSink implements SupportsStaging and the option is enabled. WDYT?
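To make the proposal concrete, here is a rough sketch of the decision logic
I have in mind (illustrative only; the nested interfaces and method shapes
below are placeholders for the shapes discussed in this thread, not Flink's
actual API):

public final class AtomicCtasDecisionSketch {

    interface StagedTable {
        void commit();
        void abort();
    }

    interface SupportsStaging {
        StagedTable applyStaging();
    }

    interface DynamicTableSink {}

    static void executeCtas(DynamicTableSink sink, boolean atomicityEnabled, Runnable runJob) {
        // Atomic CTAS only when the sink opts in via SupportsStaging AND the option is enabled.
        if (atomicityEnabled && sink instanceof SupportsStaging) {
            StagedTable staged = ((SupportsStaging) sink).applyStaging();
            try {
                runJob.run();    // write data into the staged (not yet visible) table
                staged.commit(); // make the table visible atomically
            } catch (RuntimeException e) {
                staged.abort();  // clean up so no partial table is left behind
                throw e;
            }
        } else {
            runJob.run();        // existing non-atomic CTAS behavior is preserved
        }
    }
}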

Best,
Ron

Jark Wu  于2023年6月8日周四 16:30写道:

> Thank you for the great work, Mang! The updated proposal looks good to me.
>
> Best,
> Jark
>
> > 2023年6月8日 11:49,Jingsong Li  写道:
> >
> > Thanks Mang for updating!
> >
> > Looks good to me!
> >
> > Best,
> > Jingsong
> >
> > On Wed, Jun 7, 2023 at 2:31 PM Mang Zhang  wrote:
> >>
> >> Hi Jingsong,
> >>
> >>> I have some doubts about the `TwoPhaseCatalogTable`. Generally, our
> >>> Flink design places execution in the TableFactory or directly in the
> >>> Catalog, so introducing an executable table makes me feel a bit
> >>> strange. (Spark is this style, but Flink may not be)
> >> On this issue, we introduce the executable logic commit/abort a bit of
> strange on CatalogTable.
> >> After an offline discussion with yuxia, I tweaked the FLIP-305[1]
> scenario.
> >> The new solution is similar to the implementation of SupportsOverwrite,
> >> which introduces the SupportsStaging interface and infers whether
> DynamicTableSink supports atomic ctas based on whether it implements the
> SupportsStaging interface,
> >> and if so, it will get the StagedTable object from DynamicTableSink.
> >>
> >> For more implementation details, please see the FLIP-305 document.
> >>
> >> This is my poc commits
> https://github.com/Tartarus0zm/flink/commit/025b30ad8f1a03e7738e9bb534e6e491c31990fa
> >>
> >>
> >> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
> >>
> >>
> >> --
> >>
> >> Best regards,
> >>
> >> Mang Zhang
> >>
> >>
> >>
> >> At 2023-05-12 13:02:14, "Jingsong Li"  wrote:
> >>> Hi Mang,
> >>>
> >>> Thanks for starting this FLIP.
> >>>
> >>> I have some doubts about the `TwoPhaseCatalogTable`. Generally, our
> >>> Flink design places execution in the TableFactory or directly in the
> >>> Catalog, so introducing an executable table makes me feel a bit
> >>> strange. (Spark is this style, but Flink may not be)
> >>>
> >>> And for `TwoPhase`, maybe `StagedXXX` like Spark is better?
> >>>
> >>> Best,
> >>> Jingsong
> >>>
> >>> On Wed, May 10, 2023 at 9:29 PM Mang Zhang  wrote:
> >>>>
> >>>> Hi Ron,
> >>>>
> >>>>
> >>>> First of all, thank you for your reply!
> >>>> After our offline communication, what you said is mainly in the
> compilePlan scenario, but currently compilePlanSql does not support non
> INSERT statements, otherwise it will throw an exception.
> >>>>> Unsupported SQL query! compilePlanSql() only accepts a single SQL
> statement of type INSERT
> >>>> But it's a good point that I will seriously consider.
> >>>> Non-atomic CTAS can be supported relatively easily;
> >>>> But atomic CTAS needs more adaptation work, so I'm going to leave it
> as is and follow up with a separate issue to implement CTAS support for
> compilePlanSql.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> Best regards,
> >>>> Mang Zhang
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> At 2023-04-23 17:52:07, "liu ron"  wrote:
> >>>>> Hi, Mang
> >>>>>
> >>>>> I have a question about the implementation details. For the
> atomicity case,
> >>>>> since the target table is not cr

Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-07 Thread liu ron
Hi, Jing

Thanks for your valuable input about scala free.

First, to answer your question: implementing codegen in Java is possible,
but we would need additional tooling. The first alternative is to upgrade
our JDK version to 13, whose text block feature [1] makes string formatting
easier and improves the readability and writability of multi-line strings.
However, we will not upgrade the JDK version to 13 in the foreseeable
future. The second alternative is to use a third-party template library
such as FreeMarker or StringTemplate, but that is not easy either: we would
need to introduce an extra dependency into the table planner, and it would
make our implementation more complicated.
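For illustration only (the text-block form requires JDK 13 in preview mode
or JDK 15+, which is exactly why it is not an option for us today, and the
template content is made up), here is how a hand-written codegen template
would compare:

public final class TextBlockCodegenSketch {

    // With text blocks (JEP 378), a generated method template stays readable:
    static final String WITH_TEXT_BLOCK = """
            public void processElement(Object value) {
                // generated projection logic goes here
                collector.collect(value);
            }
            """;

    // On JDK 8/11, the same template must be concatenated line by line:
    static final String WITH_CONCATENATION =
            "public void processElement(Object value) {\n"
          + "    // generated projection logic goes here\n"
          + "    collector.collect(value);\n"
          + "}\n";
}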

We use a lot of Scala code in the planner module, and one of the main
reasons is that Scala is friendlier for codegen; many operators are also
implemented through codegen. In the foreseeable future we do not have the
time and manpower to remove the Scala code from the planner module, so
making it Scala-free is unlikely. From the point of view of development
friendliness and development cost, Scala is currently the better option for
codegen. If we need to completely rewrite the planner module in Java in the
future, I think it is better to decide at that time which tools should
support codegen in a unified way; I can't name a suitable tool at the
moment.

In summary, I don't think it is feasible to implement this FLIP Scala-free
at this time.

[1]: https://openjdk.org/jeps/378

Best,
Ron


liu ron  于2023年6月8日周四 10:51写道:

> Hi, Atiozi
>
> Thanks for your feedback.
>
> > Traverse the ExecNode DAG and create a FusionExecNode  for physical
> operators that can be fused together.
> which kind of operators can be fused together ? are the operators in an
> operator chain? Is this optimization aligned to spark's whole stage codegen
> ?
> In theory, all kinds of operators can be fused together, our final goal is
> to support all operators in batch mode, OperatorChain is just one case. Due
> to this work effort is relatively large, so we need to complete it step by
> step. Our OFCG not only achieves the ability of spark's whole stage
> codegen, but also do more better than them.
>
> > does the "support codegen" means fusion codegen? but why we generate a
> FusionTransformation when the member operator does not support codegen, IMO
> it should
> fallback to the current behavior.
>
> yes, it means the fusion codegen. In FLIP, I propose two operator fusion
> mechanisms, one is like OperatorChain for single input operator, another is
> MultipleInput fusion. For the former, our design mechanism is to fuse all
> operators together at the ExecNode layer only if they all support fusion
> codegen, or else go over the default OperatorChain. For the latter, in
> order not to break the existing MultipleInput optimization purpose, so when
> there are member operators that do not support fusion codegen,  we will
> fall back to the current behavior[1], which means that a
> FusionTransformation is created. here FusionTransformation is just a
> surrogate for MultipleInput case, it actually means
> MultipleInputTransformation, which fuses multiple physical operators.
> Sorry, the description in the flow is not very clear and caused your
> confusion.
>
> > In the end, I share the same idea with Lincoln about performance
> benchmark.
> Currently flink community's flink-benchmark only covers like schedule,
> state, datastream operator's performance.
> A good benchmark harness for sql operator will benefit the sql optimizer
> topic and observation
>
> For the performance benchmark, I agree with you. As I stated earlier, I
> think this is a new scope of work, we should design it separately, we can
> introduce this improvement in the future.
>
> [1]
> https://github.com/apache/flink/blob/77214f138cf759a3ee5466c9b2379e717227a0ae/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/batch/BatchExecMultipleInput.java#L123
>
> Best,
> Ron
>
> Jing Ge  于2023年6月8日周四 04:28写道:
>
>> Hi Ron,
>>
>> Thanks for raising the proposal. It is a very attractive idea! Since the
>> FLIP is a relatively complex one which contains three papers and a design
>> doc. It deserves more time for the discussion to make sure everyone is on
>> the same page. I have a NIT question which will not block your voting
>> process. Previously, it took the community a lot of effort to make Flink
>> kinds of scala free. Since the code base of the table module is too big,
>> instead of porting to Java, all scala code has been hidden. Furthermore,
>> there are ongoing efforts to remove Scala code from Flink. As you can see,
>> the community tries to limit (i.e. get rid of) scala code as much as
>> possible. I was 

Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-07 Thread liu ron
Hi, Atiozi

Thanks for your feedback.

> Traverse the ExecNode DAG and create a FusionExecNode  for physical
operators that can be fused together.
which kind of operators can be fused together ? are the operators in an
operator chain? Is this optimization aligned to spark's whole stage codegen
?
In theory, all kinds of operators can be fused together; our final goal is
to support all operators in batch mode, and the operator chain is just one
case. Since the overall effort is relatively large, we need to complete it
step by step. Our OFCG not only achieves what Spark's whole-stage codegen
does, but also goes a step further.

> does the "support codegen" means fusion codegen? but why we generate a
FusionTransformation when the member operator does not support codegen, IMO
it should
fallback to the current behavior.

Yes, it means fusion codegen. In the FLIP I propose two operator fusion
mechanisms: one is like OperatorChain for single-input operators, the other
is MultipleInput fusion. For the former, the design fuses all operators
together at the ExecNode layer only if they all support fusion codegen;
otherwise we go through the default OperatorChain. For the latter, in order
not to break the purpose of the existing MultipleInput optimization, when
there are member operators that do not support fusion codegen we fall back
to the current behavior [1], which means a FusionTransformation is created.
Here FusionTransformation is just a stand-in for the MultipleInput case; it
actually means MultipleInputTransformation, which fuses multiple physical
operators. Sorry, the description of the flow in the FLIP is not very clear
and caused your confusion.
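To illustrate what fusion means at the code level, here is a hand-written
conceptual example (this is not the planner's generated code and uses no
Flink APIs; the operators and row format are made up):

import java.util.function.Consumer;

// Conceptual contrast between a chained operator pair and a fused operator.
public final class OperatorFusionSketch {

    /** Un-fused: a Calc-like operator hands each produced row to the next operator via a callback. */
    static final class CalcOperator {
        private final Consumer<long[]> downstream;

        CalcOperator(Consumer<long[]> downstream) {
            this.downstream = downstream;
        }

        void processElement(long[] row) {
            if (row[0] > 0) {                               // filter
                downstream.accept(new long[] {row[0] * 2}); // projection creates an intermediate row
            }
        }
    }

    /** Fused: the same filter + projection + aggregation inlined into one processElement. */
    static final class FusedCalcAggOperator {
        private long sum;

        void processElement(long[] row) {
            if (row[0] > 0) {                 // Calc: filter
                long projected = row[0] * 2;  // Calc: projection kept in a local variable
                sum += projected;             // Agg: accumulate directly, no intermediate row or virtual call
            }
        }

        long result() {
            return sum;
        }
    }
}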

> In the end, I share the same idea with Lincoln about performance
benchmark.
Currently flink community's flink-benchmark only covers like schedule,
state, datastream operator's performance.
A good benchmark harness for sql operator will benefit the sql optimizer
topic and observation

For the performance benchmark, I agree with you. As I stated earlier, I
think this is a new scope of work, we should design it separately, we can
introduce this improvement in the future.

[1]
https://github.com/apache/flink/blob/77214f138cf759a3ee5466c9b2379e717227a0ae/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/batch/BatchExecMultipleInput.java#L123

Best,
Ron

Jing Ge  于2023年6月8日周四 04:28写道:

> Hi Ron,
>
> Thanks for raising the proposal. It is a very attractive idea! Since the
> FLIP is a relatively complex one which contains three papers and a design
> doc. It deserves more time for the discussion to make sure everyone is on
> the same page. I have a NIT question which will not block your voting
> process. Previously, it took the community a lot of effort to make Flink
> kinds of scala free. Since the code base of the table module is too big,
> instead of porting to Java, all scala code has been hidden. Furthermore,
> there are ongoing efforts to remove Scala code from Flink. As you can see,
> the community tries to limit (i.e. get rid of) scala code as much as
> possible. I was wondering if it is possible for you to implement the FLIP
> with scala free code?
>
> Best regards,
> Jing
>
> [1] https://flink.apache.org/2022/02/22/scala-free-in-one-fifteen/
>
> On Wed, Jun 7, 2023 at 5:33 PM Aitozi  wrote:
>
> > Hi Ron:
> > Sorry for the late reply after the voting process. I just want to ask
> >
> > > Traverse the ExecNode DAG and create a FusionExecNode  for physical
> > operators that can be fused together.
> > which kind of operators can be fused together ? are the operators in an
> > operator chain? Is this optimization aligned to spark's whole stage
> codegen
> > ?
> >
> > > If any member operator does not support codegen, generate a
> > Transformation DAG based on the topological relationship of member
> ExecNode
> >  and jump to step 8.
> > step8: Generate a FusionTransformation, setting the parallelism and
> managed
> > memory for the fused operator.
> >
> > does the "support codegen" means fusion codegen? but why we generate a
> > FusionTransformation when the member operator does not support codegen,
> IMO
> > it should
> > fallback to the current behavior.
> >
> > In the end, I share the same idea with Lincoln about performance
> benchmark.
> > Currently flink community's flink-benchmark only covers like schedule,
> > state, datastream operator's performance.
> > A good benchmark harness for sql operator will benefit the sql optimizer
> > topic and observation
> >
> > Thanks,
> > Atiozi.
> >
> >
> > liu ron  于2023年6月6日周二 19:30写道:
> >
> > > Hi dev
> > >
> > > Thanks for all the feedback, it se

[VOTE] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-07 Thread liu ron
Hi everyone,

Thanks for all the feedback about FLIP-315: Support Operator Fusion Codegen
for Flink SQL[1].
[2] is the discussion thread.

I'd like to start a vote for it. The vote will be open for at least 72
hours (until June 12th, 12:00AM GMT) unless there is an objection or an
insufficient number of votes.

[1]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
[2]: https://lists.apache.org/thread/9cnqhsld4nzdr77s2fwf00o9cb2g9fmw

Best,
Ron


Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-06 Thread liu ron
Hi dev

Thanks for all the feedback. It seems that there are no more comments, so I
will start a vote on FLIP-315 [1] later. Thanks again.

[1]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL

Best,
Ron

liu ron  于2023年6月5日周一 16:01写道:

> Hi, Yun, Jinsong, Benchao
>
> Thanks for your valuable input about this FLIP.
>
> First of all, let me emphasize that from the technical implementation
> point of view, this design is feasible in both stream and batch scenarios,
> so I consider both stream and batch mode in FLIP. In the stream scenario,
> for stateful operator, according to our business experience, basically the
> bottleneck is on the state access, so the optimization effect of OFCG for
> the stream will not be particularly obvious, so we will not give priority
> to support it currently. On the contrary, in the batch scenario, where CPU
> is the bottleneck, this optimization is gainful.
>
> Taking the above into account, we are able to support both stream and
> batch mode optimization in this design, but we will give priority to
> supporting batch operators. As benchao said, when we find a suitable
> streaming business scenario in the future, we can consider doing this
> optimization. Back to Yun issue, the design will break state compatibility
> in stream mode as[1] and the version upgrade will not support this OFCG. As
> mentioned earlier, we will not support this feature in stream mode in the
> short term.
>
> Also thanks to Benchao's suggestion, I will state the current goal of that
> optimization in the FLIP, scoped to batch mode.
>
> Best,
> Ron
>
> liu ron  于2023年6月5日周一 15:04写道:
>
>> Hi, Lincoln
>>
>> Thanks for your appreciation of this design. Regarding your question:
>>
>> > do we consider adding a benchmark for the operators to intuitively
>> understand the improvement brought by each improvement?
>>
>> I think it makes sense to add a benchmark, Spark also has this benchmark
>> framework. But I think it is another story to introduce a benchmark
>> framework in Flink, we need to start a new discussion to this work.
>>
>> > for the implementation plan, mentioned in the FLIP that 1.18 will
>> support Calc, HashJoin and HashAgg, then what will be the next step? and
>> which operators do we ultimately expect to cover (all or specific ones)?
>>
>> Our ultimate goal is to support all operators in batch mode, but we
>> prioritize them according to their usage. Operators like Calc, HashJoin,
>> HashAgg, etc. are more commonly used, so we will support them first. Later
>> we support the rest of the operators step by step. Considering the time
>> factor and the development workload, so we can only support  Calc,
>> HashJoin, HashAgg in 1.18. In 1.19 or 1.20, we will complete the rest work.
>> I will make this clear in FLIP
>>
>> Best,
>> Ron
>>
>> Jingsong Li  于2023年6月5日周一 14:15写道:
>>
>>> > For the state compatibility session, it seems that the checkpoint
>>> compatibility would be broken just like [1] did. Could FLIP-190 [2] still
>>> be helpful in this case for SQL version upgrades?
>>>
>>> I guess this is only for batch processing. Streaming should be another
>>> story?
>>>
>>> Best,
>>> Jingsong
>>>
>>> On Mon, Jun 5, 2023 at 2:07 PM Yun Tang  wrote:
>>> >
>>> > Hi Ron,
>>> >
>>> > I think this FLIP would help to improve the performance, looking
>>> forward to its completion in Flink!
>>> >
>>> > For the state compatibility session, it seems that the checkpoint
>>> compatibility would be broken just like [1] did. Could FLIP-190 [2] still
>>> be helpful in this case for SQL version upgrades?
>>> >
>>> >
>>> > [1]
>>> https://docs.google.com/document/d/1qKVohV12qn-bM51cBZ8Hcgp31ntwClxjoiNBUOqVHsI/edit#heading=h.fri5rtpte0si
>>> > [2]
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489
>>> >
>>> > Best
>>> > Yun Tang
>>> >
>>> > 
>>> > From: Lincoln Lee 
>>> > Sent: Monday, June 5, 2023 10:56
>>> > To: dev@flink.apache.org 
>>> > Subject: Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for
>>> Flink SQL
>>> >
>>> > Hi Ron
>>> >
>>> > OFGC looks like an exciting optimization, looking forward to its
>>> completion
>>> > in Flink!
>>> > A small question, do we 

Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-05 Thread liu ron
Hi, Yun, Jinsong, Benchao

Thanks for your valuable input about this FLIP.

First of all, let me emphasize that from a technical point of view this
design is feasible in both stream and batch scenarios, which is why the
FLIP considers both modes. In the stream scenario, for stateful operators,
our production experience is that the bottleneck is basically state access,
so the benefit of OFCG for streaming will not be particularly obvious and
we will not prioritize it for now. In contrast, in the batch scenario,
where CPU is the bottleneck, this optimization pays off.

Taking the above into account, the design can support the optimization in
both stream and batch mode, but we will give priority to batch operators.
As Benchao said, when we find a suitable streaming business scenario in the
future, we can consider applying the optimization there. Back to Yun's
question: the design would break state compatibility in stream mode, as [1]
did, and version upgrades would not support OFCG. As mentioned earlier, we
will not support this feature in stream mode in the short term.

Thanks also for Benchao's suggestion; I will state in the FLIP that the
current goal of this optimization is scoped to batch mode.

Best,
Ron

liu ron  于2023年6月5日周一 15:04写道:

> Hi, Lincoln
>
> Thanks for your appreciation of this design. Regarding your question:
>
> > do we consider adding a benchmark for the operators to intuitively
> understand the improvement brought by each improvement?
>
> I think it makes sense to add a benchmark, Spark also has this benchmark
> framework. But I think it is another story to introduce a benchmark
> framework in Flink, we need to start a new discussion to this work.
>
> > for the implementation plan, mentioned in the FLIP that 1.18 will
> support Calc, HashJoin and HashAgg, then what will be the next step? and
> which operators do we ultimately expect to cover (all or specific ones)?
>
> Our ultimate goal is to support all operators in batch mode, but we
> prioritize them according to their usage. Operators like Calc, HashJoin,
> HashAgg, etc. are more commonly used, so we will support them first. Later
> we support the rest of the operators step by step. Considering the time
> factor and the development workload, so we can only support  Calc,
> HashJoin, HashAgg in 1.18. In 1.19 or 1.20, we will complete the rest work.
> I will make this clear in FLIP
>
> Best,
> Ron
>
> Jingsong Li  于2023年6月5日周一 14:15写道:
>
>> > For the state compatibility session, it seems that the checkpoint
>> compatibility would be broken just like [1] did. Could FLIP-190 [2] still
>> be helpful in this case for SQL version upgrades?
>>
>> I guess this is only for batch processing. Streaming should be another
>> story?
>>
>> Best,
>> Jingsong
>>
>> On Mon, Jun 5, 2023 at 2:07 PM Yun Tang  wrote:
>> >
>> > Hi Ron,
>> >
>> > I think this FLIP would help to improve the performance, looking
>> forward to its completion in Flink!
>> >
>> > For the state compatibility session, it seems that the checkpoint
>> compatibility would be broken just like [1] did. Could FLIP-190 [2] still
>> be helpful in this case for SQL version upgrades?
>> >
>> >
>> > [1]
>> https://docs.google.com/document/d/1qKVohV12qn-bM51cBZ8Hcgp31ntwClxjoiNBUOqVHsI/edit#heading=h.fri5rtpte0si
>> > [2]
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489
>> >
>> > Best
>> > Yun Tang
>> >
>> > 
>> > From: Lincoln Lee 
>> > Sent: Monday, June 5, 2023 10:56
>> > To: dev@flink.apache.org 
>> > Subject: Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for
>> Flink SQL
>> >
>> > Hi Ron
>> >
>> > OFGC looks like an exciting optimization, looking forward to its
>> completion
>> > in Flink!
>> > A small question, do we consider adding a benchmark for the operators to
>> > intuitively understand the improvement brought by each improvement?
>> > In addition, for the implementation plan, mentioned in the FLIP that
>> 1.18
>> > will support Calc, HashJoin and HashAgg, then what will be the next
>> step?
>> > and which operators do we ultimately expect to cover (all or specific
>> ones)?
>> >
>> > Best,
>> > Lincoln Lee
>> >
>> >
>> > liu ron  于2023年6月5日周一 09:40写道:
>> >
>> > > Hi, Jark
>> > >
>> > > Thanks for your feedback, according to my initial assessment, the work
>> > > e

Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-05 Thread liu ron
Hi, Lincoln

Thanks for your appreciation of this design. Regarding your question:

> do we consider adding a benchmark for the operators to intuitively
understand the improvement brought by each improvement?

I think it makes sense to add a benchmark; Spark has such a benchmark
framework as well. But introducing a benchmark framework in Flink is
another story, and we would need to start a separate discussion for that
work.

> for the implementation plan, mentioned in the FLIP that 1.18 will support
Calc, HashJoin and HashAgg, then what will be the next step? and which
operators do we ultimately expect to cover (all or specific ones)?

Our ultimate goal is to support all operators in batch mode, but we
prioritize them by usage. Operators like Calc, HashJoin, and HashAgg are
the most commonly used, so we will support them first and then cover the
remaining operators step by step. Considering the time and development
workload, we can only support Calc, HashJoin, and HashAgg in 1.18; the rest
will be completed in 1.19 or 1.20. I will make this clear in the FLIP.

Best,
Ron

Jingsong Li  于2023年6月5日周一 14:15写道:

> > For the state compatibility session, it seems that the checkpoint
> compatibility would be broken just like [1] did. Could FLIP-190 [2] still
> be helpful in this case for SQL version upgrades?
>
> I guess this is only for batch processing. Streaming should be another
> story?
>
> Best,
> Jingsong
>
> On Mon, Jun 5, 2023 at 2:07 PM Yun Tang  wrote:
> >
> > Hi Ron,
> >
> > I think this FLIP would help to improve the performance, looking forward
> to its completion in Flink!
> >
> > For the state compatibility session, it seems that the checkpoint
> compatibility would be broken just like [1] did. Could FLIP-190 [2] still
> be helpful in this case for SQL version upgrades?
> >
> >
> > [1]
> https://docs.google.com/document/d/1qKVohV12qn-bM51cBZ8Hcgp31ntwClxjoiNBUOqVHsI/edit#heading=h.fri5rtpte0si
> > [2]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489
> >
> > Best
> > Yun Tang
> >
> > 
> > From: Lincoln Lee 
> > Sent: Monday, June 5, 2023 10:56
> > To: dev@flink.apache.org 
> > Subject: Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for
> Flink SQL
> >
> > Hi Ron
> >
> > OFGC looks like an exciting optimization, looking forward to its
> completion
> > in Flink!
> > A small question, do we consider adding a benchmark for the operators to
> > intuitively understand the improvement brought by each improvement?
> > In addition, for the implementation plan, mentioned in the FLIP that 1.18
> > will support Calc, HashJoin and HashAgg, then what will be the next step?
> > and which operators do we ultimately expect to cover (all or specific
> ones)?
> >
> > Best,
> > Lincoln Lee
> >
> >
> > liu ron  于2023年6月5日周一 09:40写道:
> >
> > > Hi, Jark
> > >
> > > Thanks for your feedback, according to my initial assessment, the work
> > > effort is relatively large.
> > >
> > > Moreover, I will add a test result of all queries to the FLIP.
> > >
> > > Best,
> > > Ron
> > >
> > > Jark Wu  于2023年6月1日周四 20:45写道:
> > >
> > > > Hi Ron,
> > > >
> > > > Thanks a lot for the great proposal. The FLIP looks good to me in
> > > general.
> > > > It looks like not an easy work but the performance sounds promising.
> So I
> > > > think it's worth doing.
> > > >
> > > > Besides, if there is a complete test graph with all TPC-DS queries,
> the
> > > > effect of this FLIP will be more intuitive.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > >
> > > > On Wed, 31 May 2023 at 14:27, liu ron  wrote:
> > > >
> > > > > Hi, Jinsong
> > > > >
> > > > > Thanks for your valuable suggestions.
> > > > >
> > > > > Best,
> > > > > Ron
> > > > >
> > > > > Jingsong Li  于2023年5月30日周二 13:22写道:
> > > > >
> > > > > > Thanks Ron for your information.
> > > > > >
> > > > > > I suggest that it can be written in the Motivation of FLIP.
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Tue, May 30, 2023 at 9:57 AM liu ron 
> wrote:
> > > > > &

Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-04 Thread liu ron
Hi, Jark

Thanks for your feedback. According to my initial assessment, the work
effort is relatively large.

Moreover, I will add the test results for all queries to the FLIP.

Best,
Ron

Jark Wu  于2023年6月1日周四 20:45写道:

> Hi Ron,
>
> Thanks a lot for the great proposal. The FLIP looks good to me in general.
> It looks like not an easy work but the performance sounds promising. So I
> think it's worth doing.
>
> Besides, if there is a complete test graph with all TPC-DS queries, the
> effect of this FLIP will be more intuitive.
>
> Best,
> Jark
>
>
>
> On Wed, 31 May 2023 at 14:27, liu ron  wrote:
>
> > Hi, Jinsong
> >
> > Thanks for your valuable suggestions.
> >
> > Best,
> > Ron
> >
> > Jingsong Li  于2023年5月30日周二 13:22写道:
> >
> > > Thanks Ron for your information.
> > >
> > > I suggest that it can be written in the Motivation of FLIP.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Tue, May 30, 2023 at 9:57 AM liu ron  wrote:
> > > >
> > > > Hi, Jingsong
> > > >
> > > > Thanks for your review. We have tested it in TPC-DS case, and got a
> 12%
> > > > gain overall when only supporting only Calc
> operator.
> > In
> > > > some queries, we even get more than 30% gain, it looks like  an
> > effective
> > > > way.
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > > Jingsong Li  于2023年5月29日周一 14:33写道:
> > > >
> > > > > Thanks Ron for the proposal.
> > > > >
> > > > > Do you have some benchmark results for the performance
> improvement? I
> > > > > am more concerned about the improvement on Flink than the data in
> > > > > other papers.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Mon, May 29, 2023 at 2:16 PM liu ron 
> wrote:
> > > > > >
> > > > > > Hi, dev
> > > > > >
> > > > > > I'd like to start a discussion about FLIP-315: Support Operator
> > > Fusion
> > > > > > Codegen for Flink SQL[1]
> > > > > >
> > > > > > As main memory grows, query performance is more and more
> determined
> > > by
> > > > > the
> > > > > > raw CPU costs of query processing itself, this is due to the
> query
> > > > > > processing techniques based on interpreted execution shows poor
> > > > > performance
> > > > > > on modern CPUs due to lack of locality and frequent instruction
> > > > > > mis-prediction. Therefore, the industry is also researching how
> to
> > > > > improve
> > > > > > engine performance by increasing operator execution efficiency.
> In
> > > > > > addition, during the process of optimizing Flink's performance
> for
> > > TPC-DS
> > > > > > queries, we found that a significant amount of CPU time was spent
> > on
> > > > > > virtual function calls, framework collector calls, and invalid
> > > > > > calculations, which can be optimized to improve the overall
> engine
> > > > > > performance. After some investigation, we found Operator Fusion
> > > Codegen
> > > > > > which is proposed by Thomas Neumann in the paper[2] can address
> > these
> > > > > > problems. I have finished a PoC[3] to verify its feasibility and
> > > > > validity.
> > > > > >
> > > > > > Looking forward to your feedback.
> > > > > >
> > > > > > [1]:
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
> > > > > > [2]: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
> > > > > > [3]: https://github.com/lsyldliu/flink/tree/OFCG
> > > > > >
> > > > > > Best,
> > > > > > Ron
> > > > >
> > >
> >
>


Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-05-31 Thread liu ron
Hi, Jingsong

Thanks for your valuable suggestions.

Best,
Ron

Jingsong Li  于2023年5月30日周二 13:22写道:

> Thanks Ron for your information.
>
> I suggest that it can be written in the Motivation of FLIP.
>
> Best,
> Jingsong
>
> On Tue, May 30, 2023 at 9:57 AM liu ron  wrote:
> >
> > Hi, Jingsong
> >
> > Thanks for your review. We have tested it in TPC-DS case, and got a 12%
> > gain overall when only supporting only Calc operator. In
> > some queries, we even get more than 30% gain, it looks like  an effective
> > way.
> >
> > Best,
> > Ron
> >
> > Jingsong Li  于2023年5月29日周一 14:33写道:
> >
> > > Thanks Ron for the proposal.
> > >
> > > Do you have some benchmark results for the performance improvement? I
> > > am more concerned about the improvement on Flink than the data in
> > > other papers.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, May 29, 2023 at 2:16 PM liu ron  wrote:
> > > >
> > > > Hi, dev
> > > >
> > > > I'd like to start a discussion about FLIP-315: Support Operator
> Fusion
> > > > Codegen for Flink SQL[1]
> > > >
> > > > As main memory grows, query performance is more and more determined
> by
> > > the
> > > > raw CPU costs of query processing itself, this is due to the query
> > > > processing techniques based on interpreted execution shows poor
> > > performance
> > > > on modern CPUs due to lack of locality and frequent instruction
> > > > mis-prediction. Therefore, the industry is also researching how to
> > > improve
> > > > engine performance by increasing operator execution efficiency. In
> > > > addition, during the process of optimizing Flink's performance for
> TPC-DS
> > > > queries, we found that a significant amount of CPU time was spent on
> > > > virtual function calls, framework collector calls, and invalid
> > > > calculations, which can be optimized to improve the overall engine
> > > > performance. After some investigation, we found Operator Fusion
> Codegen
> > > > which is proposed by Thomas Neumann in the paper[2] can address these
> > > > problems. I have finished a PoC[3] to verify its feasibility and
> > > validity.
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > [1]:
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
> > > > [2]: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
> > > > [3]: https://github.com/lsyldliu/flink/tree/OFCG
> > > >
> > > > Best,
> > > > Ron
> > >
>


Re: [DISCUSS] FLIP 295: Support persistence of Catalog configuration and asynchronous registration

2023-05-31 Thread liu ron
Hi, Feng

Thanks for driving this FLIP; this proposal is very useful for catalog
management.
I have some small questions:

1. Regarding the CatalogStoreFactory#createCatalogStore method, do we need
to provide a default implementation?
2. If we get a Catalog from the CatalogStore, do we put it back into the
catalogs map in CatalogManager again after initializing it?
3. Regarding the options `sql.catalog.store.type` and
`sql.catalog.store.file.path`, how about renaming them to
`catalog.store.type` and `catalog.store.path`?

Best,
Ron
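
To make questions 1 and 3 a bit more concrete, here is a minimal Java sketch
of what a descriptor-based catalog store and its factory could look like. The
names (CatalogDescriptor, CatalogStore, CatalogStoreFactory) follow the
discussion below, but everything here is a simplified illustration of the
idea, not the final Flink API.

    import java.util.Map;
    import java.util.Optional;
    import java.util.concurrent.ConcurrentHashMap;

    // Simplified stand-ins for the interfaces discussed in FLIP-295 (illustration only).
    final class CatalogDescriptor {
        final String name;
        final Map<String, String> options; // e.g. {"type": "jdbc", "base-url": "..."}

        CatalogDescriptor(String name, Map<String, String> options) {
            this.name = name;
            this.options = options;
        }
    }

    interface CatalogStore {
        // Persists only the configuration, not an initialized Catalog instance.
        void storeCatalog(String catalogName, CatalogDescriptor descriptor);

        Optional<CatalogDescriptor> getCatalog(String catalogName);
    }

    // A possible default implementation (question 1): keep descriptors in memory;
    // a file-based variant would serialize them under `catalog.store.path` (question 3).
    final class InMemoryCatalogStore implements CatalogStore {
        private final Map<String, CatalogDescriptor> store = new ConcurrentHashMap<>();

        @Override
        public void storeCatalog(String catalogName, CatalogDescriptor descriptor) {
            store.put(catalogName, descriptor);
        }

        @Override
        public Optional<CatalogDescriptor> getCatalog(String catalogName) {
            return Optional.ofNullable(store.get(catalogName));
        }
    }

    interface CatalogStoreFactory {
        CatalogStore createCatalogStore(Map<String, String> options);
    }

With a descriptor-based store like this, lazy initialization falls out
naturally: the CatalogManager only needs to re-create a Catalog from the
stored options when getCatalog is called.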

Feng Jin  于2023年5月29日周一 21:19写道:

> Hi yuxia
>
>  > But from the code in Proposed Changes, once we register the Catalog, we
> initialize it and open it. right?
>
> Yes, In order to avoid inconsistent semantics of the original CREATE
> CATALOG DDL, Catalog will be directly initialized in registerCatalog so
> that parameter validation can be performed.
>
> In the current design, lazy initialization is mainly reflected in
> getCatalog. If CatalogStore has already saved some catalog configurations,
> only initialization is required in getCatalog.
>
>
> Best,
> Feng
>
> On Mon, May 29, 2023 at 8:27 PM yuxia  wrote:
>
> > Hi, Feng.
> > I'm trying to understanding the meaning of *lazy initialization*. If i'm
> > wrong, please correct me.
> >
> > IIUC, lazy initialization means only you need to access the catalog, then
> > you initialize it. But from the code in Proposed Changes, once we
> register
> > the Catalog,
> > we initialize it and open it. right?
> >
> > Best regards,
> > Yuxia
> >
> > - 原始邮件 -
> > 发件人: "Jing Ge" 
> > 收件人: "dev" 
> > 发送时间: 星期一, 2023年 5 月 29日 下午 5:12:46
> > 主题: Re: [DISCUSS] FLIP 295: Support persistence of Catalog configuration
> > and asynchronous registration
> >
> > Hi Feng,
> >
> > Thanks for your effort! +1 for the proposal.
> >
> > One of the major changes is that current design will provide
> > Map catalogs as a snapshot instead of a cache, which
> means
> > once it has been initialized, any changes done by other sessions will not
> > affect it. Point 6 described follow-up options for further improvement.
> >
> > Best regards,
> > Jing
> >
> > On Mon, May 29, 2023 at 5:31 AM Feng Jin  wrote:
> >
> > > Hi all, I would like to update you on the latest progress of the FLIP.
> > >
> > >
> > > Last week, Leonard Xu, HangRuan, Jing Ge, Shammon FY, ShengKai Fang
> and I
> > > had an offline discussion regarding the overall solution for Flink
> > > CatalogStore. We have reached a consensus and I have updated the final
> > > solution in FLIP.
> > >
> > > Next, let me briefly describe the entire design:
> > >
> > >1.
> > >
> > >Introduce CatalogDescriptor to store catalog configuration similar
> to
> > >TableDescriptor.
> > >2.
> > >
> > >The two key functions of CatalogStore - void storeCatalog(String
> > >catalogName, CatalogDescriptor) and CatalogDescriptor
> > getCatalog(String)
> > >will both use CatalogDescriptor instead of Catalog instance. This
> way,
> > >CatalogStore will only be responsible for saving and retrieving
> > catalog
> > >configurations without having to initialize catalogs.
> > >3.
> > >
> > >The default registerCatalog(String catalogName, Catalog catalog)
> > >function in CatalogManager will be marked as deprecated.
> > >4.
> > >
> > >A new function registerCatalog(String catalogName, CatalogDescriptor
> > >catalog) will be added to serve as the default registration function
> > for
> > >catalogs in CatalogManager.
> > >5.
> > >
> > >Map catalogs in CataloManager will remain unchanged
> > and
> > >save initialized catalogs.This means that deletion operations from
> one
> > >session won't synchronize with other sessions.
> > >6.
> > >
> > >To support multi-session synchronization scenarios for deletions
> later
> > >on we should make Mapcatalogs configurable.There may
> > be
> > >three possible situations:
> > >
> > >a.Default caching of all initialized catalogs
> > >
> > >b.Introduction of LRU cache logic which can automatically clear
> > >long-unused catalogs.
> > >
> > >c.No caching of any instances; each call to getCatalog creates a new
> > >instance.
> > >
> > >
> > > This is the document for discussion:
> > >
> > >
> >
> https://docs.google.com/document/d/1HRJNd4_id7i6cUxGnAybmYZIwl5g1SmZCOzGdUz-6lU/edit
> > >
> > > This is the final proposal document:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-295%3A+Support+lazy+initialization+of+catalogs+and+persistence+of+catalog+configurations
> > >
> > >
> > > Thank you very much for your attention and suggestions on this FLIP.  A
> > > special thanks to Hang Ruan for his suggestions on the entire design
> and
> > > organizing offline discussions.
> > >
> > > If you have any further suggestions or feedback about this FLIP please
> > feel
> > > free to share.
> > >
> > >
> > > Best,
> > >
> > > Feng
> > >
> > > On Sat, May 6, 2023 at 8:32 PM Jing Ge 
> > wrote:
> > >
> > > 

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

2023-05-30 Thread liu ron
Hi, Feng

Thanks for driving this FLIP. Time travel is very useful for integrating
Flink with data lake systems. I have one question: why is the implementation
of time travel delegated to the Catalog? Assume we use Flink to query a Hudi
table with the time travel syntax, but we don't use the HudiCatalog and
instead register the Hudi table in the InMemoryCatalog; can we support time
travel for the Hudi table in this case?
In contrast, I think time travel should be bound to the connector instead of
the Catalog, so the rejected alternative should be reconsidered.

Best,
Ron
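
For reference, this is roughly the catalog-side hook the FLIP describes, i.e.
a getTable(ObjectPath, long timestamp) variant; the classes below are
simplified stand-ins for illustration only. The point of the question above
is that only a catalog which knows the table's snapshot history (e.g. a
HudiCatalog) can implement the time-based overload, while a plain
InMemoryCatalog registration cannot.

    import java.time.Instant;
    import java.util.Map;

    // Simplified stand-ins for illustration only; not the real Flink classes.
    final class ObjectPath {
        final String databaseName;
        final String objectName;

        ObjectPath(String databaseName, String objectName) {
            this.databaseName = databaseName;
            this.objectName = objectName;
        }
    }

    final class CatalogBaseTable {
        // The resolved snapshot/version is carried to the connector via table options.
        final Map<String, String> options;

        CatalogBaseTable(Map<String, String> options) {
            this.options = options;
        }
    }

    interface TimeTravelCatalog {
        /** Regular lookup: the latest version of the table. */
        CatalogBaseTable getTable(ObjectPath path);

        /**
         * Time-travel lookup, e.g. for SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP ...:
         * resolve the schema and snapshot that were valid at the given point in time.
         */
        CatalogBaseTable getTable(ObjectPath path, long timestampMillis);
    }

    final class TimeTravelExample {
        static CatalogBaseTable resolveAsOf(TimeTravelCatalog catalog, String isoInstant) {
            long timestampMillis = Instant.parse(isoInstant).toEpochMilli();
            return catalog.getTable(new ObjectPath("db", "orders"), timestampMillis);
        }
    }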

yuxia  于2023年5月30日周二 09:40写道:

> Hi, Feng.
> Notice this FLIP only support batch mode for time travel.  Would it also
> make sense to support stream mode to a read a snapshot of the table as a
> bounded stream?
>
> Best regards,
> Yuxia
>
> - 原始邮件 -
> 发件人: "Benchao Li" 
> 收件人: "dev" 
> 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
>
> # Can Calcite support this syntax ` VERSION AS OF`  ?
>
> This also depends on whether this is defined in standard or any known
> databases that have implemented this. If not, it would be hard to push it
> to Calcite.
>
> # getTable(ObjectPath object, long timestamp)
>
> Then we again come to the problem of "casting between timestamp and
> numeric", which has been disabled in FLINK-21978[1]. If you're gonna use
> this, then we need to clarify that problem first.
>
> [1] https://issues.apache.org/jira/browse/FLINK-21978
>
>
> Feng Jin  于2023年5月29日周一 15:57写道:
>
> > hi, thanks for your reply.
> >
> > @Benchao
> > > did you consider the pushdown abilities compatible
> >
> > In the current design, the implementation of TimeTravel is delegated to
> > Catalog. We have added a function called getTable(ObjectPath tablePath,
> > long timestamp) to obtain the corresponding CatalogBaseTable at a
> specific
> > time.  Therefore, I think it will not have any impact on the original
> > pushdown abilities.
> >
> >
> > >   I see there is a rejected  design for adding SupportsTimeTravel, but
> I
> > didn't see the alternative in  the FLIP doc
> >
> > Sorry, the document description is not very clear.  Regarding whether to
> > support SupportTimeTravel, I have discussed it with yuxia. Since we have
> > already passed the corresponding time in getTable(ObjectPath, long
> > timestamp) of Catalog, SupportTimeTravel may not be necessary.
> >
> > In getTable(ObjectPath object, long timestamp), we can obtain the schema
> of
> > the corresponding time point and put the SNAPSHOT that needs to be
> consumed
> > into options.
> >
> >
> > @Shammon
> > > Could we support this in Flink too?
> >
> > I personally think it's possible, but limited by Calcite's syntax
> > restrictions. I believe we should first support this syntax in Calcite.
> > Currently, I think it may not be easy  to support this syntax in Flink's
> > parser. @Benchao, what do you think? Can Calcite support this syntax
> > ` VERSION AS OF`  ?
> >
> >
> > Best,
> > Feng.
> >
> >
> > On Fri, May 26, 2023 at 2:55 PM Shammon FY  wrote:
> >
> > > Thanks Feng, the feature of time travel sounds great!
> > >
> > > In addition to SYSTEM_TIME, lake houses such as paimon and iceberg
> > support
> > > snapshot or version. For example, users can query snapshot 1 for paimon
> > by
> > > the following statement
> > > SELECT * FROM t VERSION AS OF 1
> > >
> > > Could we support this in Flink too?
> > >
> > > Best,
> > > Shammon FY
> > >
> > > On Fri, May 26, 2023 at 1:20 PM Benchao Li 
> wrote:
> > >
> > > > Regarding the implementation, did you consider the pushdown abilities
> > > > compatible, e.g., projection pushdown, filter pushdown, partition
> > > pushdown.
> > > > Since `Snapshot` is not handled much in existing rules, I have a
> > concern
> > > > about this. Of course, it depends on your implementation detail, what
> > is
> > > > important is that we'd better add some cross tests for these.
> > > >
> > > > Regarding the interface exposed to Connector, I see there is a
> rejected
> > > > design for adding SupportsTimeTravel, but I didn't see the
> alternative
> > in
> > > > the FLIP doc. IMO, this is an important thing we need to clarify
> > because
> > > we
> > > > need to know whether the Connector supports this, and what
> > > column/metadata
> > > > corresponds to 'system_time'.
> > > >
> > > > Feng Jin  于2023年5月25日周四 22:50写道:
> > > >
> > > > > Thanks for your reply
> > > > >
> > > > > @Timo @BenChao @yuxia
> > > > >
> > > > > Sorry for the mistake,  Currently , calcite only supports  `FOR
> > > > SYSTEM_TIME
> > > > > AS OF `  syntax.  We can only support `FOR SYSTEM_TIME AS OF` .
> I've
> > > > > updated the syntax part of the FLIP.
> > > > >
> > > > >
> > > > > @Timo
> > > > >
> > > > > > We will convert it to TIMESTAMP_LTZ?
> > > > >
> > > > > Yes, I think we need to convert TIMESTAMP to TIMESTAMP_LTZ and then
> > > > convert
> > > > > it into a long value.
> > > > >
> > > > > > How do we want to query the most recent version of a 

Re: [DISCUSS] FLIP-294: Support Customized Job Meta Data Listener

2023-05-30 Thread liu ron
Hi, Shammon

Thanks for driving this FLIP. It will strengthen Flink's metadata
capabilities from a platform-building perspective. The overall design looks
good to me; I just have a few small questions:
1. Regarding the CatalogModificationListenerFactory#createListener method, I
think it would be better to pass a Context as its parameter instead of two
specific objects. In this way, we can easily extend it in the future without
compatibility problems. Refer to
https://github.com/apache/flink/blob/9880ba5324d4a1252d6ae1a3f0f061e4469a05ac/flink-table/flink-table-common/src/main/java/org/apache/flink/table/factories/DynamicTableFactory.java#L81
2. In the FLIP, you mentioned that multiple Flink tables may refer to the
same physical table; in that case, does the listener report the physical
table repeatedly?
3. When registering a listener object, will it connect to an external system
such as DataHub? If the listener registration times out due to permission
issues, it will affect the execution of all subsequent SQL statements; what
should we do in this case?

Best,
Ron
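
As an illustration of question 1, a Context-based factory could look roughly
like the sketch below (hypothetical names, not the final FLIP-294
interfaces). The benefit is that new fields can be added to the Context later
without breaking existing listener factories.

    import java.util.Map;

    // Hypothetical sketch of the factory discussed above (illustration only).
    interface CatalogModificationEvent {
        String catalogName();
    }

    interface CatalogModificationListener {
        void onEvent(CatalogModificationEvent event);
    }

    interface CatalogModificationListenerFactory {

        /** Bundles everything the factory may need, similar to DynamicTableFactory.Context. */
        interface Context {
            Map<String, String> configuration(); // job / cluster configuration

            ClassLoader userClassLoader();       // for loading user-provided classes
        }

        CatalogModificationListener createListener(Context context);
    }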

Shammon FY  于2023年5月31日周三 08:53写道:

> Thanks Feng, the catalog modification listener is only used to report
> read-only ddl information to other components or systems.
>
> > 1. Will an exception thrown by the listener affect the normal execution
> process?
>
> Users need to handle the exception in the listener themselves. Many DDLs
> such as drop tables and alter tables cannot be rolled back, Flink cannot
> handle these exceptions for the listener. It will cause the operation to
> exit if an exception is thrown, but the executed DDL will be successful.
>
> > 2. What is the order of execution? Is the listener executed first or are
> specific operations executed first?  If I want to perform DDL permission
> verification(such as integrating with Ranger based on the listener) , is
> that possible?
>
> The listener will be notified to report catalog modification after DDLs are
> successful, so you can not do permission verification for DDL in the
> listener. As mentioned above, Flink will not roll back the DDL even when
> the listener throws an exception. I think permission verification is
> another issue and can be discussed separately.
>
>
> Best,
> Shammon FY
>
> On Tue, May 30, 2023 at 1:07 AM Feng Jin  wrote:
>
> > Hi, Shammon
> >
> > Thanks for driving this Flip, [Support Customized Job Meta Data Listener]
> > will  make it easier for Flink to collect lineage information.
> > I fully agree with the overall solution and have a small question:
> >
> > 1. Will an exception thrown by the listener affect the normal execution
> > process?
> >
> > 2. What is the order of execution? Is the listener executed first or are
> > specific operations executed first?  If I want to perform DDL permission
> > verification(such as integrating with Ranger based on the listener) , is
> > that possible?
> >
> >
> > Best,
> > Feng
> >
> > On Fri, May 26, 2023 at 4:09 PM Shammon FY  wrote:
> >
> > > Hi devs,
> > >
> > > We would like to bring up a discussion about FLIP-294: Support
> Customized
> > > Job Meta Data Listener[1]. We have had several discussions with Jark
> Wu,
> > > Leonard Xu, Dong Lin, Qingsheng Ren and Poorvank about the functions
> and
> > > interfaces, and thanks for their valuable advice.
> > > The overall job and connector information is divided into metadata and
> > > lineage, this FLIP focuses on metadata and lineage will be discussed in
> > > another FLIP in the future. In this FLIP we want to add a customized
> > > listener in Flink to report catalog modifications to external metadata
> > > systems such as datahub[2] or atlas[3]. Users can view the specific
> > > information of connectors such as source and sink for Flink jobs in
> these
> > > systems, including fields, watermarks, partitions, etc.
> > >
> > > Looking forward to hearing from you, thanks.
> > >
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-294%3A+Support+Customized+Job+Meta+Data+Listener
> > > [2] https://datahub.io/
> > > [3] https://atlas.apache.org/#/
> > >
> >
>


Re: [DISCUSS] FLIP-307: Flink connector Redshift

2023-05-30 Thread liu ron
Hi, Samrat

Thanks for driving this FLIP. It looks like supporting
flink-connector-redshift would be very useful for Flink. I have two questions:
1. Regarding `read.mode` and `write.mode`, you say the connector provides two
modes, namely jdbc and `unload or copy`. What are the default values for
`read.mode` and `write.mode`?
2. For the source, does it support both batch and streaming reads?


Best,
Ron

Samrat Deb  于2023年5月30日周二 17:15写道:

> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
>
> [note] Missed the trailing link for previous mail
>
>
>
> On Tue, May 30, 2023 at 2:43 PM Samrat Deb  wrote:
>
> > Hi Leonard,
> >
> > > and I’m glad to help review the design as well as the code review.
> > Thank you so much. It would be really great and helpful to bring
> > flink-connector-redshift for flink users :) .
> >
> > I have divided the implementation in 3 phases in the `Scope` Section[1].
> > 1st phase is to
> >
> >- Integrate with Flink Sink API (*FLIP-171*
> ><
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink>
> >)
> >
> >
> > > About the implementation phases, How about prioritizing support for the
> > Datastream Sink API and TableSink API in the first phase?
> > I can completely resonate with you to prioritize support for Datastream
> > Sink API and TableSink API in the first phase.
> > I will update the FLIP[1] as you have suggested.
> >
> > > It seems that the primary use cases for the Redshift connector are
> > acting as a sink for processed data by Flink.
> > Yes, majority ask and requirement for Redshift connector is sink for
> > processed data by Flink.
> >
> > Bests,
> > Samrat
> >
> > On Tue, May 30, 2023 at 12:35 PM Leonard Xu  wrote:
> >
> >> Thanks @Samrat for bringing this discussion.
> >>
> >> It makes sense to me to introduce AWS Redshift connector for Apache
> >> Flink, and I’m glad to help review the design as well as the code
> review.
> >>
> >> About the implementation phases, How about prioritizing support for the
> >> Datastream Sink API and TableSink API in the first phase? It seems that
> the
> >> primary use cases for the Redshift connector are acting as a sink for
> >> processed data by Flink.
> >>
> >> Best,
> >> Leonard
> >>
> >>
> >> > On May 29, 2023, at 12:51 PM, Samrat Deb 
> wrote:
> >> >
> >> > Hello all ,
> >> >
> >> > Context:
> >> > Amazon Redshift [1] is a fully managed, petabyte-scale data warehouse
> >> > service in the cloud. It allows analyzing data without all of the
> >> > configurations of a provisioned data warehouse. Resources are
> >> automatically
> >> > provisioned and data warehouse capacity is intelligently scaled to
> >> deliver
> >> > fast performance for even the most demanding and unpredictable
> >> workloads.
> >> > Redshift is one of the widely used warehouse solutions in the current
> >> > market.
> >> >
> >> > Building flink connector redshift will allow flink users to have
> source
> >> and
> >> > sink directly to redshift. It will help flink to expand the scope to
> >> > redshift as a new connector in the ecosystem.
> >> >
> >> > I would like to start a discussion on the FLIP-307: Flink connector
> >> > redshift [2].
> >> > Looking forward to comments, feedbacks and suggestions from the
> >> community
> >> > on the proposal.
> >> >
> >> > [1] https://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html
> >> > [2]
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
> >> >
> >> >
> >> >
> >> > Bests,
> >> > Samrat
> >>
> >>
>


Re: [DISCUSS] Hive dialect shouldn't fall back to Flink's default dialect

2023-05-30 Thread liu ron
Thanks for your proposal. I hadn't even noticed this fallback behavior. +1.

Best,
Ron

Jingsong Li  于2023年5月30日周二 15:23写道:

> +1, the fallback looks weird now, it is outdated.
>
> But, it is good to provide an option. I don't know if there are some
> users who depend on this fallback.
>
> Best,
> Jingsong
>
> On Tue, May 30, 2023 at 1:47 PM Rui Li  wrote:
> >
> > +1, the fallback was just intended as a temporary workaround to run
> catalog/module related statements with hive dialect.
> >
> > On Mon, May 29, 2023 at 3:59 PM Benchao Li  wrote:
> >>
> >> Big +1 on this, thanks yuxia for driving this!
> >>
> >> yuxia  于2023年5月29日周一 14:55写道:
> >>
> >> > Hi, community.
> >> >
> >> > I want to start the discussion about Hive dialect shouldn't fall back
> to
> >> > Flink's default dialect.
> >> >
> >> > Currently, when the HiveParser fail to parse the sql in Hive dialect,
> >> > it'll fall back to Flink's default parser[1] to handle flink-specific
> >> > statements like "CREATE CATALOG xx with (xx);".
> >> >
> >> > As I‘m involving with Hive dialect and have some communication with
> >> > community users who use Hive dialectrecently,  I'm thinking throw
> exception
> >> > directly instead of falling back to Flink's default dialect when fail
> to
> >> > parse the sql in Hive dialect
> >> >
> >> > Here're some reasons:
> >> >
> >> > First of all, it'll hide some error with Hive dialect. For example, we
> >> > found we can't use Hive dialect any more with Flink sql client in
> release
> >> > validation phase[2], finally we find a modification in Flink sql
> client
> >> > cause it, but our test case can't find it earlier for although
> HiveParser
> >> > faill to parse it but then it'll fall back to default parser and pass
> test
> >> > case successfully.
> >> >
> >> > Second, conceptually, Hive dialect should be do nothing with Flink's
> >> > default dialect. They are two totally different dialect. If we do
> need a
> >> > dialect mixing Hive dialect and default dialect , may be we need to
> propose
> >> > a new hybrid dialect and announce the hybrid behavior to users.
> >> > Also, It made some users confused for the fallback behavior. The fact
> >> > comes from I had been ask by community users. Throw an excpetioin
> directly
> >> > when fail to parse the sql statement in Hive dialect will be more
> intuitive.
> >> >
> >> > Last but not least, it's import to decouple Hive with Flink planner[3]
> >> > before we can externalize Hive connector[4]. If we still fall back to
> Flink
> >> > default dialct, then we will need depend on `ParserImpl` in Flink
> planner,
> >> > which will block us removing the provided dependency of Hive dialect
> as
> >> > well as externalizing Hive connector.
> >> >
> >> > Although we hadn't announced the fall back behavior ever, but some
> users
> >> > may implicitly depend on this behavior in theirs sql jobs. So, I
> hereby
> >> > open the dicussion about abandoning the fall back behavior to make
> Hive
> >> > dialect clear and isoloted.
> >> > Please remember it won't break the Hive synatax but the syntax
> specified
> >> > to Flink may fail after then. But for the failed sql, you can use `SET
> >> > table.sql-dialect=default;` to switch to Flink dialect.
> >> > If there's some flink-specific statements we found should be included
> in
> >> > Hive dialect to be easy to use, I think we can still add them as
> specific
> >> > cases to Hive dialect.
> >> >
> >> > Look forwards to your feedback. I'd love to listen the feedback from
> >> > community to take the next steps.
> >> >
> >> > [1]:
> >> >
> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/planner/delegation/hive/HiveParser.java#L348
> >> > [2]:https://issues.apache.org/jira/browse/FLINK-26681
> >> > [3]:https://issues.apache.org/jira/browse/FLINK-31413
> >> > [4]:https://issues.apache.org/jira/browse/FLINK-30064
> >> >
> >> >
> >> >
> >> > Best regards,
> >> > Yuxia
> >> >
> >>
> >>
> >> --
> >>
> >> Best,
> >> Benchao Li
> >
> >
> >
> > --
> > Best regards!
> > Rui Li
>


Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-05-29 Thread liu ron
Hi, Jingsong

Thanks for your review. We have tested it on the TPC-DS benchmark and got a
12% overall gain when supporting only the Calc operator. In some queries, we
even got a gain of more than 30%, so it looks like an effective approach.

Best,
Ron

Jingsong Li  于2023年5月29日周一 14:33写道:

> Thanks Ron for the proposal.
>
> Do you have some benchmark results for the performance improvement? I
> am more concerned about the improvement on Flink than the data in
> other papers.
>
> Best,
> Jingsong
>
> On Mon, May 29, 2023 at 2:16 PM liu ron  wrote:
> >
> > Hi, dev
> >
> > I'd like to start a discussion about FLIP-315: Support Operator Fusion
> > Codegen for Flink SQL[1]
> >
> > As main memory grows, query performance is more and more determined by
> the
> > raw CPU costs of query processing itself, this is due to the query
> > processing techniques based on interpreted execution shows poor
> performance
> > on modern CPUs due to lack of locality and frequent instruction
> > mis-prediction. Therefore, the industry is also researching how to
> improve
> > engine performance by increasing operator execution efficiency. In
> > addition, during the process of optimizing Flink's performance for TPC-DS
> > queries, we found that a significant amount of CPU time was spent on
> > virtual function calls, framework collector calls, and invalid
> > calculations, which can be optimized to improve the overall engine
> > performance. After some investigation, we found Operator Fusion Codegen
> > which is proposed by Thomas Neumann in the paper[2] can address these
> > problems. I have finished a PoC[3] to verify its feasibility and
> validity.
> >
> > Looking forward to your feedback.
> >
> > [1]:
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
> > [2]: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
> > [3]: https://github.com/lsyldliu/flink/tree/OFCG
> >
> > Best,
> > Ron
>


[DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-05-29 Thread liu ron
Hi, dev

I'd like to start a discussion about FLIP-315: Support Operator Fusion
Codegen for Flink SQL[1]

As main memory grows, query performance is more and more determined by the
raw CPU cost of query processing itself. This is because query processing
techniques based on interpreted execution show poor performance on modern
CPUs due to a lack of locality and frequent instruction mis-prediction.
Therefore, the industry is also researching how to improve engine performance
by increasing operator execution efficiency. In addition, while optimizing
Flink's performance for TPC-DS queries, we found that a significant amount of
CPU time was spent on virtual function calls, framework collector calls, and
unnecessary computation, which can be optimized to improve the overall engine
performance. After some investigation, we found that Operator Fusion Codegen,
which was proposed by Thomas Neumann in the paper[2], can address these
problems. I have finished a PoC[3] to verify its feasibility and validity.

Looking forward to your feedback.

[1]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
[2]: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
[3]: https://github.com/lsyldliu/flink/tree/OFCG

Best,
Ron
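
To give an intuition for the overhead described above, here is a toy,
self-contained Java example (it is deliberately not Flink code): the
"interpreted" pipeline pays a virtual call and a collector hop per operator
per record, while the "fused" form is the kind of tight loop that generated
code collapses the same filter-plus-projection logic into.

    import java.util.ArrayList;
    import java.util.List;

    // Toy comparison only; this is not the FLIP-315 implementation.
    public class OperatorFusionSketch {

        /** Interpreted style: every record crosses a virtual call per operator. */
        interface Operator {
            void process(long value, List<Long> out);
        }

        static final Operator PROJECT = (value, out) -> out.add(value * 2);
        static final Operator FILTER =
                (value, out) -> { if (value % 2 == 0) PROJECT.process(value, out); };

        /** Fused style: one tight loop, no per-record virtual dispatch or collector hops. */
        static List<Long> fused(long[] input) {
            List<Long> out = new ArrayList<>();
            for (long value : input) {
                if (value % 2 == 0) {   // filter
                    out.add(value * 2); // projection
                }
            }
            return out;
        }

        public static void main(String[] args) {
            long[] input = {1, 2, 3, 4};
            List<Long> interpreted = new ArrayList<>();
            for (long value : input) {
                FILTER.process(value, interpreted);
            }
            // Both print [4, 8]; the fused form does the same work with fewer calls.
            System.out.println(interpreted + " vs " + fused(input));
        }
    }

Both variants produce the same result; the difference is purely in how many
calls and cache misses each record costs, which is the kind of overhead the
FLIP aims to remove.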


Re: [DISCUSS] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS) statement

2023-04-23 Thread liu ron
ingMode will be used by the Catalog to determine whether to
> >support atomic or not. With this selector argument, there will be two
> >different logics built within one method and it is hard to follow without
> >reading the code or the doc carefully(another concern is to keep the doc
> >and code alway be consistent) i.e. sometimes there will be no difference
> by
> >using true/false isStreamingMode, sometimes they are quite different -
> >atomic vs. non-atomic. Another question is, before we call
> >Catalog#twoPhaseCreateTable(...), we have to know the value of
> >isStreamingMode. In case only non-atomic is supported for streaming mode,
> >we could just follow FLIP-218 instead of (twistedly) calling
> >Catalog#twoPhaseCreateTable(...) with a false isStreamingMode. Did I miss
> >anything here?
> >
> >Best regards,
> >Jing
> >
> >On Fri, Apr 14, 2023 at 1:55 PM yuxia 
> wrote:
> >
> >> Hi, Mang.
> >> +1 for completing the support for atomicity of CTAS, this is very useful
> >> in batch scenarios and integrate with the data lake which support
> >> transcation.
> >>
> >> I just have one question, IIUC, the DynamiacTableSink will need to know
> >> it's for normal case or the atomicity with CTAS as well as neccessary
> >> context.
> >> Take jdbc catalog as an example, if it's CTAS with atomicity supports,
> the
> >> jdbc DynamiacTableSink will write the temp table defined in the
> >> TwoPhaseCatalogTable which is different from normal case.
> >>
> >> How can the DynamiacTableSink can get it? Could you give some
> explanation
> >> or example in this FLIP?
> >>
> >>
> >> Best regards,
> >> Yuxia
> >>
> >> - 原始邮件 -
> >> 发件人: "zhangmang1" 
> >> 收件人: "dev" , "ron9 liu" ,
> >> "lincoln 86xy" 
> >> 发送时间: 星期五, 2023年 4 月 14日 下午 2:50:40
> >> 主题: Re:Re: [DISCUSS] FLIP-305: Support atomic for CREATE TABLE AS
> >> SELECT(CTAS) statement
> >>
> >> Hi, Lincoln and Ron
> >>
> >>
> >> Thank you for your reply.
> >> On the naming wise I think OK, the future expansion of new features more
> >> uniform. I have updated the FLIP.
> >>
> >>
> >> About Hive support atomicity CTAS, Hive is rich in usage scenarios and
> can
> >> be divided into three scenarios: 1. writing Hive tables 2. writing Hive
> >> tables with speculative execution 3. writing Hive table with small file
> >> merge
> >>
> >>
> >> The main purpose of FLIP-305 is to implement support for CTAS atomicity
> in
> >> the Flink framework,
> >> so I only poc to verify the first scenario of writing to the Hive table,
> >> and we can subsequently split the sub-task to support the other two
> >> scenarios.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >>
> >> Best regards,
> >> Mang Zhang
> >>
> >>
> >>
> >>
> >>
> >> At 2023-04-13 12:27:24, "Lincoln Lee"  wrote:
> >> >Hi, Mang
> >> >
> >> >+1 for completing the support for atomicity of CTAS, this is very
> useful
> >> in
> >> >batch scenarios.
> >> >
> >> >I have two questions:
> >> >1. naming wise:
> >> >  a) can we rename the `Catalog#getTwoPhaseCommitCreateTable` to
> >> >`Catalog#twoPhaseCreateTable` (and we may add
> >> >twoPhaseReplaceTable/twoPhaseCreateOrReplaceTable later)
> >> >  b) for the `TwoPhaseCommitCatalogTable`, may it be better using
> >> >`TwoPhaseCatalogTable`?
> >> >  c) `TwoPhaseCommitCatalogTable#beginTransaction`, the word
> 'transaction'
> >> >in the method name, which may remind users of the relevance of
> transaction
> >> >support (however, it is not strictly so), so I suggest changing it to
> >> >`begin`
> >> >2. Has this design been validated by any relevant Poc on hive or other
> >> >catalogs?
> >> >
> >> >Best,
> >> >Lincoln Lee
> >> >
> >> >
> >> >liu ron  于2023年4月13日周四 10:17写道:
> >> >
> >> >> Hi, Mang
> >> >> Atomicity is very important for CTAS, especially for batch jobs. This
> >> FLIP
> >> >> is a continuation of FLIP-218, which is valuable for CTAS.
> >> >> I just have one question, in the Motivation part of FLIP-218, we
> >> mentioned
> >> >> three levels of atomicity semantics, can this current design do the
> >> same as
> >> >> Spark's DataSource V2, which can guarantee both atomicity and
> isolation,
> >> >> for example, can it be done by writing to Hive tables using CTAS?
> >> >>
> >> >> Best,
> >> >> Ron
> >> >>
> >> >> Mang Zhang  于2023年4月10日周一 11:03写道:
> >> >>
> >> >> > Hi, everyone
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > I'd like to start a discussion about FLIP-305: Support atomic for
> >> CREATE
> >> >> > TABLE AS SELECT(CTAS) statement [1].
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > CREATE TABLE AS SELECT(CTAS) statement has been support, but it's
> not
> >> >> > atomic. It will create the table first before job running. If the
> job
> >> >> > execution fails, or is cancelled, the table will not be dropped.
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > So I want Flink to support atomic CTAS, where only the table is
> >> created
> >> >> > when the Job succeeds. Improve user experience.
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > Looking forward to your feedback.
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > [1]
> >> >> >
> >> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> >
> >> >> > Best regards,
> >> >> > Mang Zhang
> >> >>
> >>
>
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Qingsheng Ren

2023-04-23 Thread liu ron
Congratulations, Qingsheng.

Best,
Ron

Zhanghao Chen  于2023年4月23日周日 17:32写道:

> Congratulations, Qingsheng!
>
> Best,
> Zhanghao Chen
> 
> From: Shammon FY 
> Sent: Sunday, April 23, 2023 17:22
> To: dev@flink.apache.org 
> Subject: Re: [ANNOUNCE] New Apache Flink PMC Member - Qingsheng Ren
>
> Congratulations, Qingsheng!
>
> Best,
> Shammon FY
>
> On Sun, Apr 23, 2023 at 4:40 PM Weihua Hu  wrote:
>
> > Congratulations, Qingsheng!
> >
> > Best,
> > Weihua
> >
> >
> > On Sun, Apr 23, 2023 at 3:53 PM Yun Tang  wrote:
> >
> > > Congratulations, Qingsheng!
> > >
> > > Best
> > > Yun Tang
> > > 
> > > From: weijie guo 
> > > Sent: Sunday, April 23, 2023 14:50
> > > To: dev@flink.apache.org 
> > > Subject: Re: [ANNOUNCE] New Apache Flink PMC Member - Qingsheng Ren
> > >
> > > Congratulations, Qingsheng!
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > Geng Biao  于2023年4月23日周日 14:29写道:
> > >
> > > > Congrats, Qingsheng!
> > > > Best,
> > > > Biao Geng
> > > >
> > > > 获取 Outlook for iOS
> > > > 
> > > > 发件人: Wencong Liu 
> > > > 发送时间: Sunday, April 23, 2023 11:06:39 AM
> > > > 收件人: dev@flink.apache.org 
> > > > 主题: Re:[ANNOUNCE] New Apache Flink PMC Member - Qingsheng Ren
> > > >
> > > > Congratulations, Qingsheng!
> > > >
> > > > Best,
> > > > Wencong LIu
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > At 2023-04-21 19:47:52, "Jark Wu"  wrote:
> > > > >Hi everyone,
> > > > >
> > > > >We are thrilled to announce that Leonard Xu has joined the Flink
> PMC!
> > > > >
> > > > >Leonard has been an active member of the Apache Flink community for
> > many
> > > > >years and became a committer in Nov 2021. He has been involved in
> > > various
> > > > >areas of the project, from code contributions to community building.
> > His
> > > > >contributions are mainly focused on Flink SQL and connectors,
> > especially
> > > > >leading the flink-cdc-connectors project to receive 3.8+K GitHub
> > stars.
> > > He
> > > > >authored 150+ PRs, and reviewed 250+ PRs, and drove several FLIPs
> > (e.g.,
> > > > >FLIP-132, FLIP-162). He has participated in plenty of discussions in
> > the
> > > > >dev mailing list, answering questions about 500+ threads in the
> > > > >user/user-zh mailing list. Besides that, he is community minded,
> such
> > as
> > > > >being the release manager of 1.17, verifying releases, managing
> > release
> > > > >syncs, etc.
> > > > >
> > > > >Congratulations and welcome Leonard!
> > > > >
> > > > >Best,
> > > > >Jark (on behalf of the Flink PMC)
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > At 2023-04-21 19:50:02, "Jark Wu"  wrote:
> > > > >Hi everyone,
> > > > >
> > > > >We are thrilled to announce that Qingsheng Ren has joined the Flink
> > PMC!
> > > > >
> > > > >Qingsheng has been contributing to Apache Flink for a long time. He
> is
> > > the
> > > > >core contributor and maintainer of the Kafka connector and
> > > > >flink-cdc-connectors, bringing users stability and ease of use in
> both
> > > > >projects. He drove discussions and implementations in FLIP-221,
> > > FLIP-288,
> > > > >and the connector testing framework. He is continuously helping with
> > the
> > > > >expansion of the Flink community and has given several talks about
> > Flink
> > > > >connectors at many conferences, such as Flink Forward Global and
> Flink
> > > > >Forward Asia. Besides that, he is willing to help a lot in the
> > community
> > > > >work, such as being the release manager for both 1.17 and 1.18,
> > > verifying
> > > > >releases, and answering questions on the mailing list.
> > > > >
> > > > >Congratulations and welcome Qingsheng!
> > > > >
> > > > >Best,
> > > > >Jark (on behalf of the Flink PMC)
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Leonard Xu

2023-04-23 Thread liu ron
Congratulations, Leonard.

Best,
Ron

Zhanghao Chen  于2023年4月23日周日 17:33写道:

> Congratulations, Leonard!
>
>
> Best,
> Zhanghao Chen
> 
> From: Shammon FY 
> Sent: Sunday, April 23, 2023 17:22
> To: dev@flink.apache.org 
> Subject: Re: [ANNOUNCE] New Apache Flink PMC Member - Leonard Xu
>
> Congratulations, Leonard!
>
> Best,
> Shammon FY
>
> On Sun, Apr 23, 2023 at 5:07 PM Xianxun Ye 
> wrote:
>
> > Congratulations, Leonard!
> >
> > Best regards,
> >
> > Xianxun
> >
> > > 2023年4月23日 09:10,Lincoln Lee  写道:
> > >
> > > Congratulations, Leonard!
> >
> >
>


Re: [DISCUSS FLINKSQL PARALLELISM]

2023-04-17 Thread liu ron
Hi, Green

Thanks for driving this discussion. In batch mode we have the Adaptive Batch
Scheduler, which automatically derives operator parallelism based on data
volume at runtime, so we don't need to care about parallelism there. However,
in streaming mode, Flink SQL can currently only set parallelism globally for
the whole job; many users would like to set the parallelism of individual
operators, which is a pain point at the moment, so it would make sense to
support setting parallelism at operator granularity. Do you have any idea
about a solution for this problem?

Best,
Ron
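
For context, the sketch below shows the only parallelism knob a SQL job has
today: the global option table.exec.resource.default-parallelism, which
applies uniformly to every operator of the pipeline. The API calls are the
existing TableEnvironment/TableConfig ones as far as I know; the point is
simply that there is no per-operator override.

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class GlobalSqlParallelism {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.inStreamingMode());

            // Global setting only: every operator of every SQL pipeline gets this value.
            tEnv.getConfig().getConfiguration()
                    .setString("table.exec.resource.default-parallelism", "128");
        }
    }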


GREEN <1286649...@qq.com.invalid> 于2023年4月14日周五 16:03写道:

> Problem:
>
>
> Currently, FlinkSQL can set a unified parallelism in the job,it
> cannot set parallelism for each operator.
> This can cause resource waste On the occasion of high
> parallelism and small data volume.there may also be too many small
> file for writing HDFS Scene.
>
>
> Solution:
> I can modify FlinkSQL to support operator parallelism.Is it meaningful to
> do this?Let's discuss.


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-12 Thread liu ron
Hi, xia

Thanks for your explanation. For the first question, given the current
status, I think we can provide the generic interface in the future if we need
it. For the second question, it makes sense to me if we can support the table
cache at the framework level.

Best,
Ron

yuxia  于2023年4月11日周二 16:12写道:

> Hi, ron.
>
> 1: Considering for deleting rows, Flink will also write delete record to
> achive purpose of deleting data, it may not as so strange for connector
> devs to make DynamicTableSink implement SupportsTruncate to support
> truncate the table. Based on the assume that DynamicTableSink is used for
> inserting/updating/deleting, I think it's reasonable for DynamicTableSink
> to implement SupportsTruncate. But I think it sounds reasonable to add a
> generic interface like DynamicTable to differentiate DynamicTableSource &
> DynamicTableSink. But it will definitely requires much design and
> discussion which deserves a dedicated FLIP. I perfer not to do that in this
> FLIP to avoid overdesign and I think it's not a must for this FLIP. Maybe
> we can discuss it if some day if we do need the new generic table interface.
>
> 2: Considering various catalogs and tables, it's hard for Flink to do the
> unified follow-up actions after truncating table. But still the external
> connector can do such follow-up actions in method `executeTruncation`.
> Btw, in Spark, for the newly truncate table interface[1], Spark only
> recaches the table after truncating table[2] which I think if Flink
> supports table cache in framework-level,
> we can also recache in framework-level for truncate table statement.
>
> [1]
> https://github.com/apache/spark/blob/1a42aa5bd44e7524bb55463bbd85bea782715834/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
> [2]
> https://github.com/apache/spark/blob/06c09a79b371c5ac3e4ebad1118ed94b460f48d1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/TruncateTableExec.scala
>
>
> I think the external catalog can implemnet such logic in method
> `executeTruncation`.
>
> Best regards,
> Yuxia
>
> - 原始邮件 -
> 发件人: "liu ron" 
> 收件人: "dev" 
> 发送时间: 星期二, 2023年 4 月 11日 上午 10:51:36
> 主题: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
>
> Hi, xia
> It's a nice improvement to support TRUNCATE TABLE statement, making Flink
> more feature-rich.
> I think the truncate syntax is a command that will be executed in the
> client's process, rather than pulling up a Flink job to execute on the
> cluster. So on the user-facing exposed interface, I think we should not let
> users implement the SupportsTruncate interface on the DynamicTableSink
> interface. This seems a bit strange and also confuses users, as hang said,
> why Source table does not support truncate. It would be nice if we could
> come up with a generic interface that supports truncate instead of binding
> it to the DynamicTableSink interface, and maybe in the future we will
> support more commands like truncate command.
>
> In addition, after truncating data, we may also need to update the metadata
> of the table, such as Hive table, we need to update the statistics, as well
> as clear the cache in the metastore, I think we should also consider these
> capabilities, Sparky has considered these, refer to
>
> https://github.com/apache/spark/blob/69dd20b5e45c7e3533efbfdc1974f59931c1b781/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L573
> .
>
> Best,
>
> Ron
>
> Jim Hughes  于2023年4月11日周二 02:15写道:
>
> > Hi Yuxia,
> >
> > On Mon, Apr 10, 2023 at 10:35 AM yuxia 
> > wrote:
> >
> > > Hi, Jim.
> > >
> > > 1: I'm expecting all DynamicTableSinks to support. But it's hard to
> > > support all at one shot. For the DynamicTableSinks that haven't
> > implemented
> > > SupportsTruncate interface, we'll throw exception
> > > like 'The truncate statement for the table is not supported as it
> hasn't
> > > implemented the interface SupportsTruncate'. Also, for some sinks that
> > > doesn't support deleting data, it can also implements it but throw more
> > > concrete exception like "xxx donesn't support to truncate a table as
> > delete
> > > is impossible for xxx". It depends on the external connector's
> > > implementation.
> > > Thanks for your advice, I updated it to the FLIP.
> > >
> >
> > Makes sense.
> >
> >
> > > 2: What do you mean by saying "truncate an input to a streaming query"?
> > > This FLIP is aimed to support TRUNCATE TABLE statement which is for
> > > truncating a table. In whic

Re: [DISCUSS] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS) statement

2023-04-12 Thread liu ron
Hi, Mang
Atomicity is very important for CTAS, especially for batch jobs. This FLIP
is a continuation of FLIP-218 and is valuable for CTAS.
I just have one question: in the Motivation part of FLIP-218, we mentioned
three levels of atomicity semantics. Can the current design do the same as
Spark's DataSource V2, which can guarantee both atomicity and isolation, for
example when writing to Hive tables using CTAS?

Best,
Ron
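
For illustration, the atomic behaviour under discussion boils down to a
two-phase lifecycle like the sketch below; the names follow the
TwoPhaseCatalogTable idea from the quoted thread but are only an assumption
of mine, not the proposed API.

    // Illustration only: the table becomes visible only if the job succeeds.
    interface TwoPhaseCatalogTable {
        void begin();  // e.g. stage the table without exposing it in the catalog
        void commit(); // atomically make the table visible once the job succeeds
        void abort();  // clean up the staged table on failure or cancellation
    }

    final class AtomicCtasRunner {
        static void run(TwoPhaseCatalogTable table, Runnable insertJob) {
            table.begin();
            try {
                insertJob.run(); // the SELECT/INSERT part of CREATE TABLE AS SELECT
                table.commit();
            } catch (RuntimeException e) {
                table.abort();
                throw e;
            }
        }
    }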

Mang Zhang  于2023年4月10日周一 11:03写道:

> Hi, everyone
>
>
>
>
> I'd like to start a discussion about FLIP-305: Support atomic for CREATE
> TABLE AS SELECT(CTAS) statement [1].
>
>
>
>
> CREATE TABLE AS SELECT(CTAS) statement has been support, but it's not
> atomic. It will create the table first before job running. If the job
> execution fails, or is cancelled, the table will not be dropped.
>
>
>
>
> So I want Flink to support atomic CTAS, where only the table is created
> when the Job succeeds. Improve user experience.
>
>
>
>
> Looking forward to your feedback.
>
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
>
>
>
>
>
>
>
>
>
>
> --
>
> Best regards,
> Mang Zhang


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-10 Thread liu ron
Hi, xia
It's a nice improvement to support the TRUNCATE TABLE statement, making
Flink more feature-rich.
I think the TRUNCATE statement is a command that will be executed in the
client's process rather than launching a Flink job on the cluster. So,
regarding the user-facing interface, I think we should not let users
implement the SupportsTruncate interface on the DynamicTableSink interface.
This seems a bit strange and also confuses users; as Hang said, why doesn't a
source table support truncate? It would be nice if we could come up with a
generic interface that supports truncation instead of binding it to the
DynamicTableSink interface, and maybe in the future we can support more
commands like TRUNCATE through it.

In addition, after truncating data, we may also need to update the table's
metadata. For a Hive table, for example, we need to update the statistics as
well as clear the cache in the metastore. I think we should also consider
these capabilities; Spark has already considered them, refer to
https://github.com/apache/spark/blob/69dd20b5e45c7e3533efbfdc1974f59931c1b781/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L573
.

Best,

Ron
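
To make the "generic interface" suggestion concrete, a rough, hypothetical
sketch is shown below; all names are made up for discussion only. The
follow-up actions mentioned above (refreshing statistics, clearing metastore
caches) could then hang off the same hook instead of the sink interface.

    // Hypothetical sketch: a table-level ability, independent of source/sink interfaces.
    interface SupportsTruncate {
        /**
         * Deletes all rows of the table. Implementations may also refresh statistics
         * and invalidate metadata caches as part of the same call.
         */
        void executeTruncation();
    }

    final class TruncateTableExecutor {
        /** Executed in the client process; no Flink job is submitted. */
        static void truncate(Object table, String tableName) {
            if (table instanceof SupportsTruncate) {
                ((SupportsTruncate) table).executeTruncation();
            } else {
                throw new UnsupportedOperationException(
                        "TRUNCATE TABLE is not supported for table " + tableName);
            }
        }
    }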

Jim Hughes  于2023年4月11日周二 02:15写道:

> Hi Yuxia,
>
> On Mon, Apr 10, 2023 at 10:35 AM yuxia 
> wrote:
>
> > Hi, Jim.
> >
> > 1: I'm expecting all DynamicTableSinks to support. But it's hard to
> > support all at one shot. For the DynamicTableSinks that haven't
> implemented
> > SupportsTruncate interface, we'll throw exception
> > like 'The truncate statement for the table is not supported as it hasn't
> > implemented the interface SupportsTruncate'. Also, for some sinks that
> > doesn't support deleting data, it can also implements it but throw more
> > concrete exception like "xxx donesn't support to truncate a table as
> delete
> > is impossible for xxx". It depends on the external connector's
> > implementation.
> > Thanks for your advice, I updated it to the FLIP.
> >
>
> Makes sense.
>
>
> > 2: What do you mean by saying "truncate an input to a streaming query"?
> > This FLIP is aimed to support TRUNCATE TABLE statement which is for
> > truncating a table. In which case it will inoperates with streaming
> queries?
> >
>
> Let's take a source like Kafka as an example.  Suppose I have an input
> topic Foo, and query which uses it as an input.
>
> When Foo is truncated, if the truncation works as a delete and create, then
> the connector may need to be made aware (otherwise it may try to use
> offsets from the previous topic).  On the other hand, one may have to ask
> Kafka to delete records up to a certain point.
>
> Also, savepoints for the query may contain information from the truncated
> table.  Should this FLIP involve invalidating that information in some
> manner?  Or does truncating a source table for a query cause undefined
> behavior on that query?
>
> Basically, I'm trying to think through the implementations of a truncate
> operation to streaming sources and queries.
>
> Cheers,
>
> Jim
>
>
> > Best regards,
> > Yuxia
> >
> > - 原始邮件 -
> > 发件人: "Jim Hughes" 
> > 收件人: "dev" 
> > 发送时间: 星期一, 2023年 4 月 10日 下午 9:32:28
> > 主题: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> >
> > Hi Yuxia,
> >
> > Two questions:
> >
> > 1.  Are you expecting all DynamicTableSinks to support Truncate?  The
> FLIP
> > could use some explanation for what supporting and not supporting the
> > operation means.
> >
> > 2.  How will truncate inoperate with streaming queries?  That is, if I
> > truncate an input to a streaming query, is there any defined behavior?
> >
> > Cheers,
> >
> > Jim
> >
> > On Wed, Mar 22, 2023 at 9:13 AM yuxia 
> wrote:
> >
> > > Hi, devs.
> > >
> > > I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE
> > > statement [1].
> > >
> > > The TRUNCATE TABLE statement is a SQL command that allows users to
> > quickly
> > > and efficiently delete all rows from a table without dropping the table
> > > itself. This statement is commonly used in data warehouse, where large
> > data
> > > sets are frequently loaded and unloaded from tables.
> > > So, this FLIP is meant to support TRUNCATE TABLE statement. M ore
> > exactly,
> > > this FLIP will bring Flink the TRUNCATE TABLE syntax and an interface
> > with
> > > which the coresponding connectors can implement their own logic for
> > > truncating table.
> > >
> > > Looking forwards to your feedback.
> > >
> > > [1]: [
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > > |
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > > ]
> > >
> > >
> > > Best regards,
> > > Yuxia
> > >
> >
>


Re: [DISCUSS] FLIP-301: Hybrid Shuffle supports Remote Storage

2023-03-16 Thread liu ron
Hi, Yuxin,

Thanks for creating this FLIP. Adding a remote storage capability to Flink's
Hybrid Shuffle is a significant improvement that addresses the issue of local
disk storage limitations; it can also improve the stability of Flink batch
jobs.
I just have one question: can Hybrid Shuffle replace the RSS in the future?
Since Hybrid Shuffle has remote storage ability, I think maybe we don't need
to maintain a standalone RSS, which would simplify our operational work.


Re: [ANNOUNCE] New Apache Flink Committer - Lincoln Lee

2023-01-11 Thread liu ron
Congratulations, Lincoln!

Best

Yu Li  于2023年1月12日周四 09:22写道:

> Congratulations, Lincoln!
>
> Best Regards,
> Yu
>
>
> On Wed, 11 Jan 2023 at 21:17, Martijn Visser 
> wrote:
>
> > Congratulations Lincoln, happy to have you on board!
> >
> > Best regards, Martijn
> >
> >
> > On Wed, Jan 11, 2023 at 1:49 PM Dong Lin  wrote:
> >
> > > Congratulations, Lincoln!
> > >
> > > Cheers,
> > > Dong
> > >
> > > On Tue, Jan 10, 2023 at 11:52 AM Jark Wu  wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > On behalf of the PMC, I'm very happy to announce Lincoln Lee as a new
> > > Flink
> > > > committer.
> > > >
> > > > Lincoln Lee has been a long-term Flink contributor since 2017. He mainly
> > > > works on the Flink SQL parts and has driven several important FLIPs,
> > > > e.g., FLIP-232 (Retry Async I/O), FLIP-234 (Retryable Lookup Join), and
> > > > FLIP-260 (TableFunction Finish). Besides, he has also contributed a lot
> > > > to streaming semantics, including the non-determinism problem and the
> > > > message ordering problem.
> > > >
> > > > Please join me in congratulating Lincoln for becoming a Flink
> > committer!
> > > >
> > > > Cheers,
> > > > Jark Wu
> > > >
> > >
> >
>


Re: [blog article] Howto migrate a real-life batch pipeline from the DataSet API to the DataStream API

2022-11-07 Thread liu ron
Thanks for your post. It looks very good to me and should also be helpful for other developers.

Best,
Liudalong

yuxia  于2022年11月8日周二 09:11写道:

> Wow, cool!  Thanks for your work.
> It'll definitely be helpful for users who want to migrate their batch
> jobs from the DataSet API to the DataStream API.
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "Etienne Chauchot" 
> To: "dev" , "User" 
> Sent: Monday, November 7, 2022, 10:29:54 PM
> Subject: [blog article] Howto migrate a real-life batch pipeline from the
> DataSet API to the DataStream API
>
> Hi everyone,
>
> In case some of you are interested, I just posted a blog article about
> migrating a real-life batch pipeline from the DataSet API to the
> DataStream API:
>
>
> https://echauchot.blogspot.com/2022/11/flink-howto-migrate-real-life-batch.html
>
> Best
>
> Etienne
>


Re: [ANNOUNCE] Apache Flink 1.16.0 released

2022-10-31 Thread liu ron
Congratulations!
Best,
Liudalong

Sergey Nuyanzin  于2022年10月31日周一 23:25写道:

> Congratulations!
> Thanks everyone involved!
>
> On Mon, Oct 31, 2022 at 3:15 PM Matthias Pohl
>  wrote:
>
> > Thanks everyone for making this release happen.
> >
> > Best,
> > Matthias
> >
> > On Mon, Oct 31, 2022 at 10:47 AM Yuan Mei 
> wrote:
> >
> > > Congrats! Thanks everyone who is making this release happen!
> > >
> > > Best
> > > Yuan
> > >
> > > On Mon, Oct 31, 2022 at 5:18 PM Danny Cranmer  >
> > > wrote:
> > >
> > > > Nice work everyone!
> > > >
> > > > Congratulations to all involved :D
> > > >
> > > > Danny,
> > > >
> > > > On Mon, Oct 31, 2022 at 4:07 AM Paul Lam 
> > wrote:
> > > >
> > > > > Congrats! Finally!
> > > > >
> > > > > Best,
> > > > > Paul Lam
> > > > >
> > > > > > 2022年10月31日 11:58,Yuxin Tan  写道:
> > > > > >
> > > > > > Congrats! Glad to hear that.
> > > > > >
> > > > > > Best,
> > > > > > Yuxin
> > > > > >
> > > > > >
> > > > > > weijie guo  于2022年10月31日周一 11:51写道:
> > > > > >
> > > > > >> Congratulations, this is a version with many new features and
> > thanks
> > > > to
> > > > > >> everyone involved!
> > > > > >>
> > > > > >> Best regards,
> > > > > >>
> > > > > >> Weijie
> > > > > >>
> > > > > >>
> > > > > >> Yun Tang  于2022年10月31日周一 11:32写道:
> > > > > >>
> > > > > >>> Congratulations, and thanks to everyone involved!
> > > > > >>>
> > > > > >>>
> > > > > >>> Best
> > > > > >>> Yun Tang
> > > > > >>> 
> > > > > >>> From: Zakelly Lan 
> > > > > >>> Sent: Monday, October 31, 2022 10:44
> > > > > >>> To: dev@flink.apache.org 
> > > > > >>> Subject: Re: [ANNOUNCE] Apache Flink 1.16.0 released
> > > > > >>>
> > > > > >>> Good to know & Congrats!
> > > > > >>>
> > > > > >>> Thanks everyone involved!
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Zakelly
> > > > > >>>
> > > > > >>> On Mon, Oct 31, 2022 at 10:40 AM Becket Qin <
> > becket@gmail.com>
> > > > > >> wrote:
> > > > > 
> > > > >  Hooray!! Congratulations to the team!
> > > > > 
> > > > >  Cheers,
> > > > > 
> > > > >  Jiangjie (Becket) Qin
> > > > > 
> > > > >  On Mon, Oct 31, 2022 at 9:57 AM Hang Ruan <
> > ruanhang1...@gmail.com
> > > >
> > > > > >>> wrote:
> > > > > 
> > > > > > Congratulations!
> > > > > >
> > > > > > Best,
> > > > > > Hang
> > > > > >
> > > > > > Shengkai Fang  于2022年10月31日周一 09:40写道:
> > > > > >
> > > > > >> Congratulations!
> > > > > >>
> > > > > >> Best,
> > > > > >> Shengkai
> > > > > >>
> > > > > >> Hangxiang Yu  于2022年10月31日周一 09:38写道:
> > > > > >>
> > > > > >>> Congratulations!
> > > > > >>> Thanks Chesnay, Martijn, Godfrey & Xingbo for managing the
> > > > > >> release.
> > > > > >>>
> > > > > >>> On Fri, Oct 28, 2022 at 7:35 PM Jing Ge <
> j...@ververica.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > >  Congrats!
> > > > > 
> > > > >  On Fri, Oct 28, 2022 at 1:22 PM 任庆盛 
> > > > > >> wrote:
> > > > > 
> > > > > > Congratulations and a big thanks to Chesnay, Martijn,
> > Godfrey
> > > > > >>> and
> > > > > >> Xingbo
> > > > > > for the awesome work for 1.16!
> > > > > >
> > > > > > Best regards,
> > > > > > Qingsheng Ren
> > > > > >
> > > > > >> On Oct 28, 2022, at 14:46, Xingbo Huang  >
> > > > > >>> wrote:
> > > > > >>
> > > > > >> The Apache Flink community is very happy to announce
> the
> > > > > >>> release
> > > > > > of
> > > > > > Apache
> > > > > >> Flink 1.16.0, which is the first release for the Apache
> > > > > >> Flink
> > > > > >>> 1.16
> > > > > > series.
> > > > > >>
> > > > > >> Apache Flink® is an open-source stream processing
> > framework
> > > > > >>> for
> > > > > >> distributed, high-performing, always-available, and
> > accurate
> > > > > >>> data
> > > > > > streaming
> > > > > >> applications.
> > > > > >>
> > > > > >> The release is available for download at:
> > > > > >> https://flink.apache.org/downloads.html
> > > > > >>
> > > > > >> Please check out the release blog post for an overview
> of
> > > > > >> the
> > > > > >> improvements for this release:
> > > > > >>
> > > > > >>>
> https://flink.apache.org/news/2022/10/28/1.16-announcement.html
> > > > > >>
> > > > > >> The full release notes are available in Jira:
> > > > > >>
> > > > > >
> > > > > >>>
> > > > > >>
> > > > > >
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351275
> > > > > >>
> > > > > >> We would like to thank all contributors of the Apache
> > Flink
> > > > > >> community
> > > > > >> who made this release possible!
> > > > > >>
> > > > > >> Regards,
> > > > >