Re: [VOTE] Release 1.19.1, release candidate #1

2024-06-08 Thread weijie guo
Thanks Hong!

+1(binding)

- Verified gpg signature
- Verified sha512 hash
- Checked gh release tag
- Checked all artifacts deployed to maven repo
- Ran a simple wordcount job on local standalone cluster
- Compiled from source code with JDK 1.8.0_291.

Best regards,

Weijie


Xiqian YU  于2024年6月7日周五 18:23写道:

> +1 (non-binding)
>
>
>   *   Checked download links & release tags
>   *   Verified that package checksums matched
>   *   Compiled Flink from source code with JDK 8 / 11
>   *   Ran E2e data integration test jobs on local cluster
>
> Regards,
> yux
>
> De : Rui Fan <1996fan...@gmail.com>
> Date : vendredi, 7 juin 2024 à 17:14
> À : dev@flink.apache.org 
> Objet : Re: [VOTE] Release 1.19.1, release candidate #1
> +1(binding)
>
> - Reviewed the flink-web PR (Left some comments)
> - Checked Github release tag
> - Verified signatures
> - Verified sha512 (hashsums)
> - The source archives do not contain any binaries
> - Build the source with Maven 3 and java8 (Checked the license as well)
> - Start the cluster locally with jdk8, and run the StateMachineExample job,
> it works fine.
>
> Best,
> Rui
>
> On Thu, Jun 6, 2024 at 11:39 PM Hong Liang  wrote:
>
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the flink v1.19.1,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release and binary convenience releases to
> be
> > deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint B78A5EA1 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "release-1.19.1-rc1" [5],
> > * website pull request listing the new release and adding announcement
> blog
> > post [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Hong
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12354399
> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.19.1-rc1/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1736/
> > [5] https://github.com/apache/flink/releases/tag/release-1.19.1-rc1
> > [6] https://github.com/apache/flink-web/pull/745
> >
>


Re: [VOTE] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-07 Thread weijie guo
+1 (binding)

Best regards,

Weijie


Zhu Zhu  于2024年6月7日周五 16:48写道:

> +1 (binding)
>
> Thanks,
> Zhu
>
> Xintong Song  于2024年6月7日周五 16:08写道:
>
> > +1 (binding)
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Fri, Jun 7, 2024 at 4:03 PM Yuxin Tan  wrote:
> >
> > > Hi everyone,
> > >
> > > Thanks for all the feedback about the FLIP-459 Support Flink
> > > hybrid shuffle integration with Apache Celeborn[1].
> > > The discussion thread is here [2].
> > >
> > > I'd like to start a vote for it. The vote will be open for at least
> > > 72 hours unless there is an objection or insufficient votes.
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> > > [2] https://lists.apache.org/thread/gy7sm7qrf7yrv1rl5f4vtk5fo463ts33
> > >
> > > Best,
> > > Yuxin
> > >
> >
>


Re: [VOTE] Release flink-connector-cassandra v3.2.0, release candidate #1

2024-06-06 Thread weijie guo
Thanks Danny!

+1(binding)

- Verified signatures and hash sum
- Checked the CI build from tag
- Build from source
- Reviewed flink-web PR

Best regards,

Weijie


Rui Fan <1996fan...@gmail.com> 于2024年6月7日周五 11:01写道:

> Thanks Danny for the hard work!
>
> +1(binding)
>
> - Verified signatures
> - Verified sha512 (hashsums)
> - The source archives do not contain any binaries
> - Build the source with Maven 3 and java8 (Checked the license as well)
> - Checked Github release tag
> - Reviewed the flink-web PR
>
> Best,
> Rui
>
> On Wed, May 22, 2024 at 8:01 PM Leonard Xu  wrote:
>
> > +1 (binding)
> >
> > - verified signatures
> > - verified hashsums
> > - built from source code with java 1.8 succeeded
> > - checked Github release tag
> > - checked release notes status which only left one issue is used for
> > release tracking
> > - reviewed the web PR
> >
> > Best,
> > Leonard
> >
> > > 2024年5月22日 下午6:10,weijie guo  写道:
> > >
> > > +1(non-binding)
> > >
> > > -Validated checksum hash
> > > -Verified signature
> > > -Build from source
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > Hang Ruan  于2024年5月22日周三 10:12写道:
> > >
> > >> +1 (non-binding)
> > >>
> > >> - Validated checksum hash
> > >> - Verified signature
> > >> - Verified that no binaries exist in the source archive
> > >> - Build the source with Maven and jdk8
> > >> - Verified web PR
> > >> - Check that the jar is built by jdk8
> > >>
> > >> Best,
> > >> Hang
> > >>
> > >> Muhammet Orazov  于2024年5月22日周三
> 04:15写道:
> > >>
> > >>> Hey all,
> > >>>
> > >>> Could we please get some more votes to proceed with the release?
> > >>>
> > >>> Thanks and best,
> > >>> Muhammet
> > >>>
> > >>> On 2024-04-22 13:04, Danny Cranmer wrote:
> > >>>> Hi everyone,
> > >>>>
> > >>>> Please review and vote on release candidate #1 for
> > >>>> flink-connector-cassandra v3.2.0, as follows:
> > >>>> [ ] +1, Approve the release
> > >>>> [ ] -1, Do not approve the release (please provide specific
> comments)
> > >>>>
> > >>>> This release supports Flink 1.18 and 1.19.
> > >>>>
> > >>>> The complete staging area is available for your review, which
> > includes:
> > >>>> * JIRA release notes [1],
> > >>>> * the official Apache source release to be deployed to
> > dist.apache.org
> > >>>> [2],
> > >>>> which are signed with the key with fingerprint 125FD8DB [3],
> > >>>> * all artifacts to be deployed to the Maven Central Repository [4],
> > >>>> * source code tag v3.2.0-rc1 [5],
> > >>>> * website pull request listing the new release [6].
> > >>>> * CI build of the tag [7].
> > >>>>
> > >>>> The vote will be open for at least 72 hours. It is adopted by
> majority
> > >>>> approval, with at least 3 PMC affirmative votes.
> > >>>>
> > >>>> Thanks,
> > >>>> Danny
> > >>>>
> > >>>> [1]
> > >>>>
> > >>>
> > >>
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12353148
> > >>>> [2]
> > >>>>
> > >>>
> > >>
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-cassandra-3.2.0-rc1
> > >>>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > >>>> [4]
> > >>>>
> > https://repository.apache.org/content/repositories/orgapacheflink-1722
> > >>>> [5]
> > >>>>
> > >>>
> > >>
> >
> https://github.com/apache/flink-connector-cassandra/releases/tag/v3.2.0-rc1
> > >>>> [6] https://github.com/apache/flink-web/pull/737
> > >>>> [7]
> > >>>>
> > >>>
> > >>
> >
> https://github.com/apache/flink-connector-cassandra/actions/runs/8784310241
> > >>>
> > >>
> >
> >
>


Re: [VOTE] Release flink-connector-jdbc v3.2.0, release candidate #2

2024-06-06 Thread weijie guo
Thanks Danny!

+1(binding)
- Verified signatures and hash sums
- Checked the CI build
- Checked the release note
- Reviewed the flink-web PR
- Build from source.

Best regards,

Weijie


Rui Fan <1996fan...@gmail.com> 于2024年6月7日周五 11:08写道:

> Thanks Danny for the hard work!
>
> +1(binding)
>
> - Verified signatures
> - Verified sha512 (hashsums)
> - The source archives do not contain any binaries
> - Build the source with Maven 3 and java8 (Checked the license as well)
> - Checked Github release tag
> - Reviewed the flink-web PR
>
> Best,
> Rui
>
> On Tue, Jun 4, 2024 at 1:31 PM gongzhongqiang 
> wrote:
>
> > +1 (non-binding)
> >
> > - Validated checksum hash and signature.
> > - Confirmed that no binaries exist in the source archive.
> > - Built the source with JDK 8.
> > - Verified the web PR.
> > - Ensured the JAR is built by JDK 8.
> >
> > Best,
> > Zhongqiang Gong
> >
> > Danny Cranmer  于2024年4月18日周四 18:20写道:
> >
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #1 for the version
> 3.2.0,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > This release supports Flink 1.18 and 1.19.
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [2],
> > > which are signed with the key with fingerprint 125FD8DB [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag v3.2.0-rc1 [5],
> > > * website pull request listing the new release [6].
> > > * CI run of tag [7].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Danny
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12353143
> > > [2]
> > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.2.0-rc2
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1718/
> > > [5]
> > https://github.com/apache/flink-connector-jdbc/releases/tag/v3.2.0-rc2
> > > [6] https://github.com/apache/flink-web/pull/734
> > > [7]
> > https://github.com/apache/flink-connector-jdbc/actions/runs/8736019099
> > >
> >
>


Re: [VOTE] Release flink-connector-aws v4.3.0, release candidate #2

2024-06-06 Thread weijie guo
Thanks Danny!

+1(binding)

- Verified signatures and hashsums
- Build from source
- Checked release tag
- Reviewed the flink-web PR
- Checked the CI build.

Best regards,

Weijie


Rui Fan <1996fan...@gmail.com> 于2024年6月7日周五 11:00写道:

> Thanks Danny for the hard work!
>
> +1(binding)
>
> - Verified signatures
> - Verified sha512 (hashsums)
> - The source archives do not contain any binaries
> - Build the source with Maven 3 and java8 (Checked the license as well)
> - Checked Github release tag
> - Reviewed the flink-web PR
>
> Best,
> Rui
>
> On Fri, May 31, 2024 at 11:47 AM gongzhongqiang  >
> wrote:
>
> > +1 (non-binding)
> >
> > - Validated the checksum hash and signature.
> > - No binaries exist in the source archive.
> > - Built the source with JDK 8 succeed.
> > - Verified the flink-web PR.
> > - Ensured the JAR is built by JDK 8.
> >
> > Best,
> > Zhongqiang Gong
> >
> > Danny Cranmer  于2024年4月19日周五 18:08写道:
> >
> > > Hi everyone,
> > >
> > > Please review and vote on release candidate #2 for flink-connector-aws
> > > v4.3.0, as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > This version supports Flink 1.18 and 1.19.
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [2],
> > > which are signed with the key with fingerprint 125FD8DB [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag v4.3.0-rc2 [5],
> > > * website pull request listing the new release [6].
> > > * CI build of the tag [7].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Release Manager
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12353793
> > > [2]
> > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-aws-4.3.0-rc2
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1721/
> > > [5]
> > https://github.com/apache/flink-connector-aws/releases/tag/v4.3.0-rc2
> > > [6] https://github.com/apache/flink-web/pull/733
> > > [7]
> > https://github.com/apache/flink-connector-aws/actions/runs/8751694197
> > >
> >
>


[DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-06 Thread weijie guo
Hi devs,


I'd like to start a discussion about FLIP-462[1]: Support Custom Data
Distribution for Input Stream of Lookup Join.


Lookup Join is an important feature in Flink, It is typically used to
enrich a table with data that is queried from an external system.
If we interact with the external systems for each incoming record, we
incur significant network IO and RPC overhead.

Therefore, most connectors introduce caching to reduce the per-record
level query overhead. However, because the data distribution of Lookup
Join's input stream is arbitrary, the cache hit rate is sometimes
unsatisfactory.


We want to introduce a mechanism for the connector to tell the Flink
planner its desired input stream data distribution or partitioning
strategy. This can significantly reduce the amount of cached data and
improve performance of Lookup Join.


You can find more details in this FLIP[1]. Looking forward to hearing
from you, thanks!


Best regards,

Weijie


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-462+Support+Custom+Data+Distribution+for+Input+Stream+of+Lookup+Join


Re: [ANNOUNCE] New Apache Flink PMC Member - Fan Rui

2024-06-05 Thread weijie guo
Congratulations, Rui. Well-deserved!

Best regards,

Weijie


Zakelly Lan  于2024年6月5日周三 18:05写道:

> Congratulations, Rui!
>
> Best,
> Zakelly
>
> On Wed, Jun 5, 2024 at 6:02 PM Piotr Nowojski 
> wrote:
>
> > Hi everyone,
> >
> > On behalf of the PMC, I'm very happy to announce another new Apache Flink
> > PMC Member - Fan Rui.
> >
> > Rui has been active in the community since August 2019. During this time
> he
> > has contributed a lot of new features. Among others:
> >   - Decoupling Autoscaler from Kubernetes Operator, and supporting
> > Standalone Autoscaler
> >   - Improvements to checkpointing, flamegraphs, restart strategies,
> > watermark alignment, network shuffles
> >   - Optimizing the memory and CPU usage of large operators, greatly
> > reducing the risk and probability of TaskManager OOM
> >
> > He reviewed a significant amount of PRs and has been active both on the
> > mailing lists and in Jira helping to both maintain and grow Apache
> Flink's
> > community. He is also our current Flink 1.20 release manager.
> >
> > In the last 12 months, Rui has been the most active contributor in the
> > Flink Kubernetes Operator project, while being the 2nd most active Flink
> > contributor at the same time.
> >
> > Please join me in welcoming and congratulating Fan Rui!
> >
> > Best,
> > Piotrek (on behalf of the Flink PMC)
> >
>


Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-04 Thread weijie guo
Thanks Yuxin for the proposal!

When we first proposed Hybrid Shuffle, I wanted to support pluggable
storage tier in the future. However, limited by the architecture of the
legacy Hybrid Shuffle at that time, this idea has not been realized. The
new architecture abstracts the tier nicely, and now it's time to introduce
support for external storage.

Big +1 for this one!

Best regards,

Weijie


rexxiong  于2024年6月5日周三 00:08写道:

> Thanks Yuxin for the proposal. +1,  as a member of the Apache Celeborn
> community, I am very excited about the integration of Flink's Hybrid
> Shuffle with Apache Celeborn. The whole design of CIP-6 looks good to me. I
> am looking forward to this integration.
>
> Thanks,
> Jiashu Xiong
>
> Ethan Feng  于2024年6月4日周二 16:47写道:
>
> > +1 for this proposal.
> >
> > After internally reviewing the prototype of CIP-6, this would improve
> > performance and stability for Flink users using Celeborn.
> >
> > Expect to see this feature come out to the community.
> >
> > As I come from the Celeborn community, I hope more users can try to
> > use Celeborn when there are Flink batch jobs.
> >
> > Thanks,
> > Ethan Feng
> >
> > Yuxin Tan  于2024年6月4日周二 16:34写道:
> > >
> > > Hi, Venkatakrishnan,
> > >
> > > Thanks for joining the discussion. We appreciate your interest
> > > in contributing to the work. Once the FLIP and CIP proposals
> > > have been approved, we will create some JIRA tickets in Flink
> > > and Celeborn projects. Please feel free to take a look at the
> > > tickets and select any that resonate with your interests.
> > >
> > > Best,
> > > Yuxin
> > >
> > >
> > > Venkatakrishnan Sowrirajan  于2024年5月31日周五 23:11写道:
> > >
> > > > Thanks for this FLIP. We are also interested in learning/contributing
> > to
> > > > the hybrid shuffle integration with celeborn for batch executions.
> > > >
> > > > On Tue, May 28, 2024, 7:07 PM Yuxin Tan 
> > wrote:
> > > >
> > > > > Hi, Xintong,
> > > > >
> > > > > >  I think we can also publish the prototype codes so the
> > > > > community can better understand and help with it.
> > > > >
> > > > > Ok, I agree on the point. I will prepare and publish the code
> > > > > recently.
> > > > >
> > > > > Rui,
> > > > >
> > > > > > Kindly reminder: the image of CIP-6[1] cannot be loaded.
> > > > >
> > > > > Thanks for the reminder. I've updated the images.
> > > > >
> > > > >
> > > > > Best,
> > > > > Yuxin
> > > > >
> > > > >
> > > > > Rui Fan <1996fan...@gmail.com> 于2024年5月29日周三 09:33写道:
> > > > >
> > > > > > Thanks Yuxin for driving this proposal!
> > > > > >
> > > > > > Kindly reminder: the image of CIP-6[1] cannot be loaded.
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6*Support*Flink*hybrid*shuffle*integration*with*Apache*Celeborn__;KysrKysrKys!!IKRxdwAv5BmarQ!ZRTc1aUSYMDBazuIwlet1Dzk2_DD9qKTgoDLH9jSwAVLgwplcuId_8JoXkH0i7AeWxKWXkL0sxM3AeW-H9OJ6v9uGw$
> > > > > >
> > > > > > Best,
> > > > > > Rui
> > > > > >
> > > > > > On Wed, May 29, 2024 at 9:03 AM Xintong Song <
> > tonysong...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1 for this proposal.
> > > > > > >
> > > > > > > We have been prototyping this feature internally at Alibaba
> for a
> > > > > couple
> > > > > > of
> > > > > > > months. Yuxin, I think we can also publish the prototype codes
> > so the
> > > > > > > community can better understand and help with it.
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Xintong
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, May 28, 2024 at 8:34 PM Yuxin Tan <
> > tanyuxinw...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I would like to start a discussion on FLIP-459 Support Flink
> > hybrid
> > > > > > > shuffle
> > > > > > > > integration with
> > > > > > > > Apache Celeborn[1]. Flink hybrid shuffle supports transitions
> > > > between
> > > > > > > > memory, disk, and
> > > > > > > > remote storage to improve performance and job stability.
> > > > > Concurrently,
> > > > > > > > Apache Celeborn
> > > > > > > > provides a stable, performant, scalable remote shuffle
> service.
> > > > This
> > > > > > > > integration proposal is to
> > > > > > > > harness the benefits from both hybrid shuffle and Celeborn
> > > > > > > simultaneously.
> > > > > > > >
> > > > > > > > Note that this proposal has two parts.
> > > > > > > > 1. The Flink-side modifications are in FLIP-459[1].
> > > > > > > > 2. The Celeborn-side changes are in CIP-6[2].
> > > > > > > >
> > > > > > > > Looking forward to everyone's feedback and suggestions. Thank
> > you!
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> 

[jira] [Created] (FLINK-35520) Nightly build can't compile as problems were detected from NoticeFileChecker

2024-06-04 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35520:
--

 Summary: Nightly build can't compile as problems were detected 
from NoticeFileChecker
 Key: FLINK-35520
 URL: https://issues.apache.org/jira/browse/FLINK-35520
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[June 15 Feature Freeze][SUMMARY] Flink 1.20 Release Sync 06/04/2024

2024-06-04 Thread weijie guo
Dear devs,


This is the fifth meeting for Flink 1.20 release cycle.


I'd like to share the information synced in the meeting.


- Feature Freeze


  It is worth noting that there are only 2 weeks left until the
feature freeze time(June 15, 2024),
and developers need to pay attention to the feature freeze time.


- Features:


  So far we've had 13 flips/features, there are two FLIPs that
unlikely to be done in 1.20, and the status of the other is good.

  It is encouraged to continuously updating
the 1.20 wiki page[1] for contributors.


- Blockers:


   [open] FLINK-35423 - ARRAY_EXCEPT should support set semantics

   -
   We need someone familiar with SQL part to review the pull request attached.


 By the way, we have closed two performance regression
blockers(FLINK-35040 and FLINK-35215), thanks to everyone involved!


- Notice


  CI pipeline triggered by pull request seems sort of unstable,
sometimes it doesn't trigger properly. I have create a ticket to track
it:
https://issues.apache.org/jira/browse/FLINK-35517


- Sync meeting[2]:


 The next meeting is 06/11/2024 10am (UTC+2) and 4pm (UTC+8), please
feel free to join us.
Lastly,

we encourage attendees to fill out the topics to be discussed at
the bottom of 1.20 wiki page[1] a day in advance,

to make it easier for
everyone to understand the background of the topics, thanks!


[1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release

[2] https://meet.google.com/mtj-huez-apu


Best,

Robert, Rui, Ufuk, Weijie


[jira] [Created] (FLINK-35517) CI pipeline triggered by pull request seems unstable

2024-06-04 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35517:
--

 Summary: CI pipeline triggered by pull request seems unstable
 Key: FLINK-35517
 URL: https://issues.apache.org/jira/browse/FLINK-35517
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo


Flink CI pipeline triggered by pull request seems sort of unstable. 

For example, https://github.com/apache/flink/pull/24883 was filed 15 hours ago, 
but CI report is UNKNOWN.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35505) RegionFailoverITCase.testMultiRegionFailover has never ever restored state

2024-06-02 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35505:
--

 Summary: RegionFailoverITCase.testMultiRegionFailover has never 
ever restored state
 Key: FLINK-35505
 URL: https://issues.apache.org/jira/browse/FLINK-35505
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Flink 1.18.2 release date

2024-05-30 Thread weijie guo
Hi Yang

IIRC, 1.18.2 has not been kicked off yet.

Best regards,

Weijie


Yang LI  于2024年5月30日周四 22:33写道:

> Dear Flink Community,
>
> Anyone know about the release date for 1.18.2?
>
> Thanks very much,
> Yang
>


Re: [DISCUSS] Connector releases for Flink 1.19

2024-05-30 Thread weijie guo
Hi Jing

> Do we have an umbrella ticket for Flink 1.19 connectors release?

FYI: https://issues.apache.org/jira/browse/FLINK-35131 :)

Best regards,

Weijie


Jing Ge  于2024年5月28日周二 20:29写道:

> Hi,
>
> Thanks Danny for driving it! Do we have an umbrella ticket for Flink 1.19
> connectors release?
>
> @Sergei
> Thanks for the hint wrt JDBC connector. Where could users know that it
> already supports 1.19?
>
> Best regards,
> Jing
>
> On Fri, May 17, 2024 at 4:07 AM Sergey Nuyanzin 
> wrote:
>
> > >, it looks like opensearch-2.0.0 has been created now, all good.
> > yep, thanks to Martijn
> >
> > I've created RCs for Opensearch connector
> >
> > On Tue, May 14, 2024 at 12:38 PM Danny Cranmer 
> > wrote:
> >
> > > Hello,
> > >
> > > @Sergey Nuyanzin , it looks like opensearch-2.0.0
> > > has been created now, all good.
> > >
> > > @Hongshun Wang, thanks, since the CDC connectors are not yet released I
> > > had omitted them from this task. But happy to include them, thanks for
> > the
> > > support.
> > >
> > > Thanks,
> > > Danny
> > >
> > > On Mon, May 13, 2024 at 3:40 AM Hongshun Wang  >
> > > wrote:
> > >
> > >> Hello Danny,
> > >> Thanks for pushing this forward.  I am available to assist with the
> CDC
> > >> connector[1].
> > >>
> > >> [1] https://github.com/apache/flink-cdc
> > >>
> > >> Best
> > >> Hongshun
> > >>
> > >> On Sun, May 12, 2024 at 8:48 PM Sergey Nuyanzin 
> > >> wrote:
> > >>
> > >> > I'm in a process of preparation of RC for OpenSearch connector
> > >> >
> > >> > however it seems I need PMC help: need to create opensearch-2.0.0 on
> > >> jira
> > >> > since as it was proposed in another ML[1] to have 1.x for OpenSearch
> > >> > v1 and 2.x for OpenSearch v2
> > >> >
> > >> > would be great if someone from PMC could help here
> > >> >
> > >> > [1]
> https://lists.apache.org/thread/3w1rnjp5y612xy5k9yv44hy37zm9ph15
> > >> >
> > >> > On Wed, Apr 17, 2024 at 12:42 PM Ferenc Csaky
> > >> >  wrote:
> > >> > >
> > >> > > Thank you Danny and Sergey for pushing this!
> > >> > >
> > >> > > I can help with the HBase connector if necessary, will comment the
> > >> > > details to the relevant Jira ticket.
> > >> > >
> > >> > > Best,
> > >> > > Ferenc
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Wednesday, April 17th, 2024 at 11:17, Danny Cranmer <
> > >> > dannycran...@apache.org> wrote:
> > >> > >
> > >> > > >
> > >> > > >
> > >> > > > Hello all,
> > >> > > >
> > >> > > > I have created a parent Jira to cover the releases [1]. I have
> > >> > assigned AWS
> > >> > > > and MongoDB to myself and OpenSearch to Sergey. Please assign
> the
> > >> > > > relevant issue to yourself as you pick up the tasks.
> > >> > > >
> > >> > > > Thanks!
> > >> > > >
> > >> > > > [1] https://issues.apache.org/jira/browse/FLINK-35131
> > >> > > >
> > >> > > > On Tue, Apr 16, 2024 at 2:41 PM Muhammet Orazov
> > >> > > > mor+fl...@morazow.com.invalid wrote:
> > >> > > >
> > >> > > > > Thanks Sergey and Danny for clarifying, indeed it
> > >> > > > > requires committer to go through the process.
> > >> > > > >
> > >> > > > > Anyway, please let me know if I can be any help.
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Muhammet
> > >> > > > >
> > >> > > > > On 2024-04-16 11:19, Danny Cranmer wrote:
> > >> > > > >
> > >> > > > > > Hello,
> > >> > > > > >
> > >> > > > > > I have opened the VOTE thread for the AWS connectors release
> > >> [1].
> > >> > > > > >
> > >> > > > > > > If I'm not mistaking (please correct me if I'm wrong) this
> > >> > request is
> > >> > > > > > > not
> > >> > > > > > > about version update it is about new releases for
> connectors
> > >> > > > > >
> > >> > > > > > Yes, correct. If there are any other code changes required
> > then
> > >> > help
> > >> > > > > > would be appreciated.
> > >> > > > > >
> > >> > > > > > > Are you going to create an umbrella issue for it?
> > >> > > > > >
> > >> > > > > > We do not usually create JIRA issues for releases. That
> being
> > >> said
> > >> > it
> > >> > > > > > sounds like a good idea to have one place to track the
> status
> > of
> > >> > the
> > >> > > > > > connector releases and pre-requisite code changes.
> > >> > > > > >
> > >> > > > > > > I would like to work on this task, thanks for initiating
> it!
> > >> > > > > >
> > >> > > > > > The actual release needs to be performed by a committer.
> > >> However,
> > >> > help
> > >> > > > > > getting the connectors building against Flink 1.19 and
> testing
> > >> the
> > >> > RC
> > >> > > > > > is
> > >> > > > > > appreciated.
> > >> > > > > >
> > >> > > > > > Thanks,
> > >> > > > > > Danny
> > >> > > > > >
> > >> > > > > > [1]
> > >> > https://lists.apache.org/thread/0nw9smt23crx4gwkf6p1dd4jwvp1g5s0
> > >> > > > > >
> > >> > > > > > On Tue, Apr 16, 2024 at 6:34 AM Sergey Nuyanzin
> > >> > snuyan...@gmail.com
> > >> > > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Thanks for volunteering Muhammet!
> > >> > > > > > > And thanks Danny for starting the activity.
> > >> > > > > > >
> > 

Re: [VOTE] Release flink-connector-opensearch v1.2.0, release candidate #1

2024-05-30 Thread weijie guo
Thanks Sergey for driving this release!

+1(non-binding)

1. Verified signatures and hash sums
2. Build from source with 1.8.0_291 succeeded
3. Checked RN.

Best regards,

Weijie


Yuepeng Pan  于2024年5月30日周四 10:08写道:

> +1 (non-binding)
>
> - Built from source code with JDK 1.8 on MaxOS- Run examples locally.-
> Checked release notes Best, Yuepeng Pan
>
>
> At 2024-05-28 22:53:10, "gongzhongqiang" 
> wrote:
> >+1(non-binding)
> >
> >- Verified signatures and hash sums
> >- Reviewed the web PR
> >- Built from source code with JDK 1.8 on Ubuntu 22.04
> >- Checked release notes
> >
> >Best,
> >Zhongqiang Gong
> >
> >
> >Sergey Nuyanzin  于2024年5月16日周四 06:03写道:
> >
> >> Hi everyone,
> >> Please review and vote on release candidate #1 for
> >> flink-connector-opensearch v1.2.0, as follows:
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >>
> >> The complete staging area is available for your review, which includes:
> >> * JIRA release notes [1],
> >> * the official Apache source release to be deployed to dist.apache.org
> >> [2],
> >> which are signed with the key with fingerprint
> >> F7529FAE24811A5C0DF3CA741596BBF0726835D8 [3],
> >> * all artifacts to be deployed to the Maven Central Repository [4],
> >> * source code tag v1.2.0-rc1 [5],
> >> * website pull request listing the new release [6].
> >> * CI build of the tag [7].
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> >> approval, with at least 3 PMC affirmative votes.
> >>
> >> Note that this release is for Opensearch v1.x
> >>
> >> Thanks,
> >> Release Manager
> >>
> >> [1] https://issues.apache.org/jira/projects/FLINK/versions/12353812
> >> [2]
> >>
> >>
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-opensearch-1.2.0-rc1
> >> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1734
> >> [5]
> >>
> >>
> https://github.com/apache/flink-connector-opensearch/releases/tag/v1.2.0-rc1
> >> [6] https://github.com/apache/flink-web/pull/740
> >> [7]
> >>
> >>
> https://github.com/apache/flink-connector-opensearch/actions/runs/9102334125
> >>
>


[jira] [Created] (FLINK-35487) ContinuousFileProcessingCheckpointITCase crashed as process exit with code 127

2024-05-29 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35487:
--

 Summary: ContinuousFileProcessingCheckpointITCase crashed as 
process exit with code 127
 Key: FLINK-35487
 URL: https://issues.apache.org/jira/browse/FLINK-35487
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35483) BatchJobRecoveryTest.testRecoverFromJMFailover produced no output for 900 second

2024-05-29 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35483:
--

 Summary: BatchJobRecoveryTest.testRecoverFromJMFailover produced 
no output for 900 second
 Key: FLINK-35483
 URL: https://issues.apache.org/jira/browse/FLINK-35483
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Flink 1.19.1 release

2024-05-27 Thread weijie guo
+1 for releasing 1.19.1 and Hong as RM.

Thank you for volunteering!

Best regards,

Weijie


Rui Fan <1996fan...@gmail.com> 于2024年5月27日周一 11:45写道:

> +1 for release 1.19.1 and Hong as release manager.
>
> We are really looking forward to this version, thank you.
>
> Best,
> Rui
>
> On Mon, May 27, 2024 at 11:05 AM Xintong Song 
> wrote:
>
> > +1 for the 1.19.1 release and +1 for Hong as the release manager.
> >
> > Hong, if you need help on PMC-related steps and Danny is not available,
> > please feel free to reach out to me on slack.
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Mon, May 27, 2024 at 9:44 AM Leonard Xu  wrote:
> >
> > > +1 for the 1.19.1 release and +1 for Hong as release manager.
> > >
> > > Best,
> > > Leonard
> > >
> > > > 2024年5月25日 上午2:55,Danny Cranmer  写道:
> > > >
> > > > +1 for the 1.19.1 release and +1 for Hong as release manager.
> > >
> > >
> >
>


[jira] [Created] (FLINK-35457) EventTimeWindowCheckpointingITCase fails on AZP as NPE

2024-05-26 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35457:
--

 Summary: EventTimeWindowCheckpointingITCase fails on AZP as NPE
 Key: FLINK-35457
 URL: https://issues.apache.org/jira/browse/FLINK-35457
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35456) WindowAggregateITCase.testEventTimeHopWindow fails on AZP as NPE

2024-05-26 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35456:
--

 Summary: WindowAggregateITCase.testEventTimeHopWindow fails on AZP 
as NPE
 Key: FLINK-35456
 URL: https://issues.apache.org/jira/browse/FLINK-35456
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35443) ChangelogLocalRecoveryITCase failed fatally with 239 exit code

2024-05-23 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35443:
--

 Summary: ChangelogLocalRecoveryITCase failed fatally with 239 exit 
code
 Key: FLINK-35443
 URL: https://issues.apache.org/jira/browse/FLINK-35443
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-443: Interruptible watermark processing

2024-05-23 Thread weijie guo
+1(binding)

Thanks for driving this!

Best regards,

Weijie


Rui Fan <1996fan...@gmail.com> 于2024年5月24日周五 13:03写道:

> +1(binding)
>
> Best,
> Rui
>
> On Fri, May 24, 2024 at 12:01 PM Yanfei Lei  wrote:
>
> > Thanks for driving this!
> >
> > +1 (binding)
> >
> > Best,
> > Yanfei
> >
> > Zakelly Lan  于2024年5月24日周五 10:13写道:
> >
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Zakelly
> > >
> > > On Thu, May 23, 2024 at 8:21 PM Piotr Nowojski 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > After reaching what looks like a consensus in the discussion thread
> > [1], I
> > > > would like to put FLIP-443 [2] to the vote.
> > > >
> > > > The vote will be open for at least 72 hours unless there is an
> > objection or
> > > > insufficient votes.
> > > >
> > > > [1] https://lists.apache.org/thread/flxm7rphvfgqdn2gq2z0bb7kl007olpz
> > > > [2] https://cwiki.apache.org/confluence/x/qgn9EQ
> > > >
> > > > Bets,
> > > > Piotrek
> > > >
> >
>


Re: [VOTE] FLIP-457: Improve Table/SQL Configuration for Flink 2.0

2024-05-23 Thread weijie guo
+1(binding)

Best regards,

Weijie


Lincoln Lee  于2024年5月24日周五 12:20写道:

> +1(binding)
>
> Best,
> Lincoln Lee
>
>
> Jane Chan  于2024年5月24日周五 09:52写道:
>
> > Hi all,
> >
> > I'd like to start a vote on FLIP-457[1] after reaching a consensus
> through
> > the discussion thread[2].
> >
> > The vote will be open for at least 72 hours unless there is an objection
> or
> > insufficient votes.
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=307136992
> > [2] https://lists.apache.org/thread/1sthbv6q00sq52pp04n2p26d70w4fqj1
> >
> > Best,
> > Jane
> >
>


[jira] [Created] (FLINK-35429) We don't need introduce getFlinkConfigurationOptions for SqlGatewayRestEndpointFactory#Context

2024-05-23 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35429:
--

 Summary: We don't need introduce getFlinkConfigurationOptions for 
SqlGatewayRestEndpointFactory#Context
 Key: FLINK-35429
 URL: https://issues.apache.org/jira/browse/FLINK-35429
 Project: Flink
  Issue Type: Bug
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35428) WindowJoinITCase#testInnerJoin failed on AZP as NPE

2024-05-23 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35428:
--

 Summary: WindowJoinITCase#testInnerJoin failed on AZP as NPE
 Key: FLINK-35428
 URL: https://issues.apache.org/jira/browse/FLINK-35428
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59751=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11944



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-451: Introduce timeout configuration to AsyncSink

2024-05-22 Thread weijie guo
Thanks Ahmed for introducing this FLIP, nice improvement.

+1(binding)

Best regards,

Weijie


Leonard Xu  于2024年5月22日周三 17:25写道:

>
> After discuss with Ahmed, the updated FLIP looks good to me.
>
> +1(binding)
>
>
> Best,
> Leonard
>
> > 2024年5月21日 下午6:12,Hong Liang  写道:
> >
> > +1 (binding)
> >
> > Thanks Ahmed
> >
> > On Tue, May 14, 2024 at 11:51 AM David Radley 
> > wrote:
> >
> >> Thanks for the clarification Ahmed
> >>
> >> +1 (non-binding)
> >>
> >> From: Ahmed Hamdy 
> >> Date: Monday, 13 May 2024 at 19:58
> >> To: dev@flink.apache.org 
> >> Subject: [EXTERNAL] Re: [VOTE] FLIP-451: Introduce timeout configuration
> >> to AsyncSink
> >> Thanks David,
> >> I have replied to your question in the discussion thread.
> >> Best Regards
> >> Ahmed Hamdy
> >>
> >>
> >> On Mon, 13 May 2024 at 16:21, David Radley 
> >> wrote:
> >>
> >>> Hi,
> >>> I raised a question on the discussion thread, around retriable errors,
> as
> >>> a possible alternative,
> >>>  Kind regards, David.
> >>>
> >>>
> >>> From: Aleksandr Pilipenko 
> >>> Date: Monday, 13 May 2024 at 16:07
> >>> To: dev@flink.apache.org 
> >>> Subject: [EXTERNAL] Re: [VOTE] FLIP-451: Introduce timeout
> configuration
> >>> to AsyncSink
> >>> Thanks for driving this!
> >>>
> >>> +1 (non-binding)
> >>>
> >>> Thanks,
> >>> Aleksandr
> >>>
> >>> On Mon, 13 May 2024 at 14:08, 
> >>> wrote:
> >>>
>  Thanks Ahmed!
> 
>  +1 non binding
>  On May 13, 2024 at 12:40 +0200, Jeyhun Karimov  >,
>  wrote:
> > Thanks for driving this Ahmed.
> >
> > +1 (non-binding)
> >
> > Regards,
> > Jeyhun
> >
> > On Mon, May 13, 2024 at 12:37 PM Muhammet Orazov
> >  wrote:
> >
> >> Thanks Ahmed, +1 (non-binding)
> >>
> >> Best,
> >> Muhammet
> >>
> >> On 2024-05-13 09:50, Ahmed Hamdy wrote:
>  Hi all,
> 
>  Thanks for the feedback on the discussion thread[1], I would
> >> like
>  to
>  start
>  a vote on FLIP-451[2]: Introduce timeout configuration to
> >>> AsyncSink
> 
>  The vote will be open for at least 72 hours unless there is an
>  objection or
>  insufficient votes.
> 
>  1-
> >>> https://lists.apache.org/thread/ft7wcw7kyftvww25n5fm4l925tlgdfg0
>  2-
> 
> >>
> 
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-451%3A+Introduce+timeout+configuration+to+AsyncSink+API
>  Best Regards
>  Ahmed Hamdy
> >>
> 
> >>>
> >>> Unless otherwise stated above:
> >>>
> >>> IBM United Kingdom Limited
> >>> Registered in England and Wales with number 741598
> >>> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >>>
> >>
> >> Unless otherwise stated above:
> >>
> >> IBM United Kingdom Limited
> >> Registered in England and Wales with number 741598
> >> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >>
>
>


Re: [VOTE] Release flink-connector-cassandra v3.2.0, release candidate #1

2024-05-22 Thread weijie guo
+1(non-binding)

-Validated checksum hash
-Verified signature
-Build from source

Best regards,

Weijie


Hang Ruan  于2024年5月22日周三 10:12写道:

> +1 (non-binding)
>
> - Validated checksum hash
> - Verified signature
> - Verified that no binaries exist in the source archive
> - Build the source with Maven and jdk8
> - Verified web PR
> - Check that the jar is built by jdk8
>
> Best,
> Hang
>
> Muhammet Orazov  于2024年5月22日周三 04:15写道:
>
> > Hey all,
> >
> > Could we please get some more votes to proceed with the release?
> >
> > Thanks and best,
> > Muhammet
> >
> > On 2024-04-22 13:04, Danny Cranmer wrote:
> > > Hi everyone,
> > >
> > > Please review and vote on release candidate #1 for
> > > flink-connector-cassandra v3.2.0, as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > This release supports Flink 1.18 and 1.19.
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [2],
> > > which are signed with the key with fingerprint 125FD8DB [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag v3.2.0-rc1 [5],
> > > * website pull request listing the new release [6].
> > > * CI build of the tag [7].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Danny
> > >
> > > [1]
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12353148
> > > [2]
> > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-cassandra-3.2.0-rc1
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > > https://repository.apache.org/content/repositories/orgapacheflink-1722
> > > [5]
> > >
> >
> https://github.com/apache/flink-connector-cassandra/releases/tag/v3.2.0-rc1
> > > [6] https://github.com/apache/flink-web/pull/737
> > > [7]
> > >
> >
> https://github.com/apache/flink-connector-cassandra/actions/runs/8784310241
> >
>


Re: [DISCUSS] The performance of serializerHeavyString starts regress since April 3

2024-05-22 Thread weijie guo
Thanks Piotr and Zakelly for your advice.

I have also tested this in my local machine, and I can not reproduce
it even if switch to JDK11. This must have something to do with the
benchmark's testing environment.

> I will update FLINK-35040[1] to "Won't fix" and update the Priority from
Blocker to Major if there are no objections before next Monday.

+1 from my side and thanks for your thorough investigation!


Best regards,

Weijie


Rui Fan <1996fan...@gmail.com> 于2024年5月22日周三 14:52写道:

> Hi Piotr and Zakelly,
>
> Thanks a lot for providing these historical JIRAs related to
> serializerHeavyString performance. It seems this similar issue
> has happened multiple times.
>
> And thanks for clarifying the principle of benchmark. I have
> encountered the semi-stable state for other benchmarks as well.
>
> > Available information indicates this issue is environment- and
> JDK-specific, and I also failed to reproduce it in my Mac.
>
> Thanks for your check on your Mac as well.
>
> > Considering the historical context of this test provided by Piotr, I
> vote a "Won't fix" for this problem.
>
> I will update FLINK-35040[1] to "Won't fix" and update the
> Priority from Blocker to Major if there are no objections
> before next Monday.
>
> Of course, it can be picked up or reopened if we have any
> new clues.
>
> Best,
> Rui
>
> [1] https://issues.apache.org/jira/browse/FLINK-35040
>
> On Wed, May 22, 2024 at 12:41 PM Zakelly Lan 
> wrote:
>
>> Hi Rui and RMs of Flink 1.20,
>>
>> Thanks for driving this!
>>
>> Available information indicates this issue is environment- and
>> JDK-specific, and I also failed to reproduce it in my Mac. Thus I guess it
>> is caused by JIT behavior, which is unpredictable and vulnerable to
>> disturbance of the codebase. Considering the historical context of this
>> test provided by Piotr, I vote a "Won't fix" for this problem.
>>
>> And I can offer some help if anyone wants to investigate the benchmark
>> environment, please reach out to me. JDK version info:
>>
>>> openjdk version "11.0.19" 2023-04-18 LTS
>>> OpenJDK Runtime Environment (Red_Hat-11.0.19.0.7-2) (build 11.0.19+7-LTS)
>>> OpenJDK 64-Bit Server VM (Red_Hat-11.0.19.0.7-2) (build 11.0.19+7-LTS,
>>> mixed mode, sharing)
>>
>> The OS version is Alibaba Cloud Linux 3.2104 LTS 64-bit[1]. The linux
>> kernel version is 5.10.134-15.al8.x86_64.
>>
>>
>> Best,
>> Zakelly
>>
>> [1]
>> https://www.alibabacloud.com/help/en/alinux/product-overview/release-notes-for-alibaba-cloud-linux
>> (See: Alibaba Cloud Linux 3.2104 U8, image id:
>> aliyun_3_x64_20G_alibase_20230727.vhd)
>>
>> On Tue, May 21, 2024 at 8:15 PM Piotr Nowojski 
>> wrote:
>>
>>> Hi,
>>>
>>> Given what you wrote, that you have investigated the issue and couldn't
>>> find any easy explanation, I would suggest closing this ticket as "Won't
>>> do" or "Can not reproduce" and ignoring the problem.
>>>
>>> In the past there have been quite a bit of cases where some benchmark
>>> detected a performance regression. Sometimes those can not be reproduced,
>>> other times (as it's the case here), some seemingly unrelated change is
>>> causing the regression. The same thing happened in this benchmark many
>>> times in the past [1], [2], [3], [4]. Generally speaking this benchmark has
>>> been in the spotlight a couple of times [5].
>>>
>>> Note that there have been cases where this benchmark did detect a
>>> performance regression :)
>>>
>>> My personal suspicion is that after that commons-io version bump,
>>> something poked JVM/JIT to compile the code a bit differently for string
>>> serialization causing this regression. We have a couple of benchmarks that
>>> seem to be prone to such semi intermittent issues. For example the same
>>> benchmark was subject to this annoying pattern [6], that I've spotted in
>>> quite a bit of benchmarks over the years [6]:
>>>
>>> [image: image.png]
>>> (https://imgur.com/a/AoygmWS)
>>>
>>> Where benchmark results are very stable within a single JVM fork. But
>>> between two forks, they can reach two different "stable" levels. Here it
>>> looks like 50% of the chance of getting stable "200 records/ms" and 50%
>>> chances of "250 records/ms".
>>>
>>> A small interlude. Each of our benchmarks run in 3 different JVM forks,
>>> 10 warm up iterations and 10 measurement iterations. Each iteration
>>> lasts/invokes the benchmarking method at least for one second. So by "very
>>> stable" results, I mean that for example after the 2nd or 3rd warm up
>>> iteration, the results stabilize < +/-1%, and stay on that level for the
>>> whole duration of the fork.
>>>
>>> Given that we are repeating the same benchmark in 3 different forks, we
>>> can have by pure chance:
>>> - 3 slow fork - total average 200 records/ms
>>> - 2 slow fork, 1 fast fork - average 216 r/ms
>>> - 1 slow fork, 2 fast forks - average 233 r/ms
>>> - 3 fast forks - average 250 r/ms
>>>
>>> So this benchmark is susceptible to enter some different semi stable
>>> states. As I wrote above, I 

[SUMMARY] Flink 1.20 Release Sync 05/21/2024

2024-05-22 Thread weijie guo
Dear devs,


This is the fourth meeting for Flink 1.20 release cycle.


I'd like to share the information synced in the meeting.



- Feature Freeze


  It is worth noting that there are only 4 weeks left until the
feature freeze time(June 15, 2024),
and developers need to pay attention to the feature freeze time.



- Features:


  So far we've had 10 flips/features, there are two FLIPs that
may not be completed in 1.20, and the status of the other
eight FLIPs is good.

  It is encouraged to continuously updating
the 1.20 wiki page[1] for contributors.



- Blockers:


  - [Closed] FLINK-35041
IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed


  - [Doing] FLINK-35040 The performance of serializerHeavyString
regresses since April 3

- we have started a discussion[2] about this to dev mail list to
collect some valuable suggestion. We will accept this regression if we
don't have any suggestion.


  - [In Verification] FLINK-35215 The performance of serializerKryo
and serializerKryoWithoutRegistration are regressed

- Rui provided an alternative change without performance
regression, we will follow the benchmark result in the following days,
and close this JIRA
after the performance is recovered.



- Sync meeting[3]:


 The next meeting is 06/04/2024 10am (UTC+2) and 4pm (UTC+8), please
feel free to join us. After this, the release sync will be adjusted to
weekly as we approaching the feature freeze date.


 Lastly, we encourage attendees to fill out the topics to be discussed at
the bottom of 1.20 wiki page[1] a day in advance, to make it easier for
everyone to understand the background of the topics, thanks!



[1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release

[2] https://lists.apache.org/thread/moo2b67qk77ysoofs9j1ojzjk0rhrr9h

[3] https://meet.google.com/mtj-huez-apu



Best,

Robert, Rui, Ufuk, Weijie


[jira] [Created] (FLINK-35416) Weekly CI for ElasticSearch connector failed to compile

2024-05-21 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35416:
--

 Summary: Weekly CI for ElasticSearch connector failed to compile
 Key: FLINK-35416
 URL: https://issues.apache.org/jira/browse/FLINK-35416
 Project: Flink
  Issue Type: Bug
  Components: Connectors / ElasticSearch
Reporter: Weijie Guo
Assignee: Weijie Guo


ElasticsearchSinkBaseITCase.java:[31,65] package 
org.apache.flink.shaded.guava30.com.google.common.collect does not exist



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35398) SqlGatewayE2ECase failed to pull image

2024-05-19 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35398:
--

 Summary: SqlGatewayE2ECase failed to pull image
 Key: FLINK-35398
 URL: https://issues.apache.org/jira/browse/FLINK-35398
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.18.1
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-450: Improve Runtime Configuration for Flink 2.0

2024-05-15 Thread weijie guo
+1(binding)

Best regards,

Weijie


Rui Fan <1996fan...@gmail.com> 于2024年5月15日周三 17:50写道:

> +1(binding)
>
> Best,
> Rui
>
> On Wed, May 15, 2024 at 5:01 PM Xuannan Su  wrote:
>
> > Hi everyone,
> >
> > Thanks for all the feedback about the FLIP-450: Improve Runtime
> > Configuration for Flink 2.0 [1] [2].
> >
> > I'd like to start a vote for it. The vote will be open for at least 72
> > hours(excluding weekends,until MAY 20, 12:00AM GMT) unless there is an
> > objection or an insufficient number of votes.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-450%3A+Improve+Runtime+Configuration+for+Flink+2.0
> > [2] https://lists.apache.org/thread/20mkzd31607posls793hxy7mht40xp2x
> >
> >
> > Best regards,
> > Xuannan
> >
>


Re: [VOTE] FLIP-453: Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-05-14 Thread weijie guo
Thanks Martijn for the effort!

+1(binding)

Best regards,

Weijie


Martijn Visser  于2024年5月14日周二 14:45写道:

> Hi everyone,
>
> With no more discussions being open in the thread [1] I would like to start
> a vote on FLIP-453: Promote Unified Sink API V2 to Public and Deprecate
> SinkFunction [2]
>
> The vote will be open for at least 72 hours unless there is an objection or
> insufficient votes.
>
> Best regards,
>
> Martijn
>
> [1] https://lists.apache.org/thread/hod6bg421bzwhbfv60lwsck7r81dvo59
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-453%3A+Promote+Unified+Sink+API+V2+to+Public+and+Deprecate+SinkFunction
>


[jira] [Created] (FLINK-35254) build_wheels_on_macos failed

2024-04-27 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35254:
--

 Summary: build_wheels_on_macos failed
 Key: FLINK-35254
 URL: https://issues.apache.org/jira/browse/FLINK-35254
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35246) SqlClientSSLTest.testGatewayMode failed in AZP

2024-04-26 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35246:
--

 Summary: SqlClientSSLTest.testGatewayMode failed in AZP
 Key: FLINK-35246
 URL: https://issues.apache.org/jira/browse/FLINK-35246
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35238) AZP is not working since some ci agent unhealthy

2024-04-25 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35238:
--

 Summary: AZP is not working since some ci agent unhealthy
 Key: FLINK-35238
 URL: https://issues.apache.org/jira/browse/FLINK-35238
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [SUMMARY] Flink 1.20 Release Sync 04/23/2024

2024-04-23 Thread weijie guo
Thanks Rui for the summary!

Best regards,

Weijie


Rui Fan <1996fan...@gmail.com> 于2024年4月23日周二 17:13写道:

> Dear devs,
>
> Today is the second meeting for
> Flink 1.20 release cycle. I'd like to share
> the information synced in the meeting.
>
> - Features:
>   There're 8 weeks until the feature freeze date, so far we've had 6
>   flips/features and the status looks good. It is encouraged to
> continuously
>   updating the 1.20 wiki page[1] for contributors.
>
> - Blockers:
>   - [Closed] Change on getTransitivePredecessors breaks connectors
> FLINK-35009
>   - [Doing] 2 tests fail FLINK-35041, FLINK-34997
>   - [Doing] 2 benchmarks performance regression FLINK-35040, FLINK-35215
>
> - Sync meeting[2]:
>  The next meeting is 05/07/2024, please feel free to join us.
>
> Lastly, we encourage attendees to fill out the topics to be discussed at
> the bottom of 1.20 wiki page[1] a day in advance, to make it easier for
> everyone to understand the background of the topics.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
> [2] https://meet.google.com/mtj-huez-apu
>
> Best,
>
> Weijie, Ufuk, Robert and Rui
>


[jira] [Created] (FLINK-35185) Resuming Externalized Checkpoint(rocks, incremental, no parallelism change) end-to-end test failed

2024-04-21 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35185:
--

 Summary: Resuming Externalized Checkpoint(rocks, incremental, no 
parallelism change) end-to-end test failed 
 Key: FLINK-35185
 URL: https://issues.apache.org/jira/browse/FLINK-35185
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New Apache Flink Committer - Zakelly Lan

2024-04-14 Thread weijie guo
Congratulations, Zakelly!

Best regards,

Weijie


Rui Fan <1996fan...@gmail.com> 于2024年4月15日周一 11:06写道:

> Congratulations!
>
> Best,
> Rui
>
> On Mon, Apr 15, 2024 at 11:01 AM Congxian Qiu 
> wrote:
>
> > Congratulations!
> >
> > Best,
> > Congxian
> >
> >
> > Ron liu  于2024年4月15日周一 11:00写道:
> >
> > > Congratulations!
> > >
> > > Best,
> > > Ron
> > >
> > > Yuan Mei  于2024年4月15日周一 10:51写道:
> > >
> > > > Hi everyone,
> > > >
> > > > On behalf of the PMC, I'm happy to let you know that Zakelly Lan has
> > > become
> > > > a new Flink Committer!
> > > >
> > > > Zakelly has been continuously contributing to the Flink project since
> > > 2020,
> > > > with a focus area on Checkpointing, State as well as frocksdb (the
> > > default
> > > > on-disk state db).
> > > >
> > > > He leads several FLIPs to improve checkpoints and state APIs,
> including
> > > > File Merging for Checkpoints and configuration/API reorganizations.
> He
> > is
> > > > also one of the main contributors to the recent efforts of
> > "disaggregated
> > > > state management for Flink 2.0" and drives the entire discussion in
> the
> > > > mailing thread, demonstrating outstanding technical depth and breadth
> > of
> > > > knowledge.
> > > >
> > > > Beyond his technical contributions, Zakelly is passionate about
> helping
> > > the
> > > > community in numerous ways. He spent quite some time setting up the
> > Flink
> > > > Speed Center and rebuilding the benchmark pipeline after the original
> > one
> > > > was out of lease. He helps build frocksdb and tests for the upcoming
> > > > frocksdb release (bump rocksdb from 6.20.3->8.10).
> > > >
> > > > Please join me in congratulating Zakelly for becoming an Apache Flink
> > > > committer!
> > > >
> > > > Best,
> > > > Yuan (on behalf of the Flink PMC)
> > > >
> > >
> >
>


[jira] [Created] (FLINK-35100) Quickstarts Java nightly end-to-end test failed on Azure

2024-04-14 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35100:
--

 Summary: Quickstarts Java nightly end-to-end test failed on Azure
 Key: FLINK-35100
 URL: https://issues.apache.org/jira/browse/FLINK-35100
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New Apache Flink PMC Member - Jing Ge

2024-04-12 Thread weijie guo
Congratulations, Jing!

Best regards,

Weijie


Feng Jin  于2024年4月12日周五 17:24写道:

> Congratulations, Jing
>
> Best,
> Feng Jin
>
> On Fri, Apr 12, 2024 at 4:46 PM Samrat Deb  wrote:
>
> > Congratulations, Jing!
> >
> >
> > On Fri, 12 Apr 2024 at 2:15 PM, Jiabao Sun  wrote:
> >
> > > Congratulations, Jing!
> > >
> > > Best,
> > > Jiabao
> > >
> > > Sergey Nuyanzin  于2024年4月12日周五 16:41写道:
> > >
> > > > Congratulations, Jing!
> > > >
> > > > On Fri, Apr 12, 2024 at 10:29 AM Rui Fan <1996fan...@gmail.com>
> wrote:
> > > >
> > > > > Congratulations, Jing~
> > > > >
> > > > > Best,
> > > > > Rui
> > > > >
> > > > > On Fri, Apr 12, 2024 at 4:28 PM Yun Tang  wrote:
> > > > >
> > > > > > Congratulations, Jing!
> > > > > >
> > > > > > Best
> > > > > > Yun Tang
> > > > > > 
> > > > > > From: Jark Wu 
> > > > > > Sent: Friday, April 12, 2024 16:02
> > > > > > To: dev 
> > > > > > Cc: gej...@gmail.com 
> > > > > > Subject: [ANNOUNCE] New Apache Flink PMC Member - Jing Ge
> > > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > On behalf of the PMC, I'm very happy to announce that Jing Ge has
> > > > > > joined the Flink PMC!
> > > > > >
> > > > > > Jing has been contributing to Apache Flink for a long time. He
> > > > > continuously
> > > > > > works on SQL, connectors, Source, and Sink APIs, test, and
> document
> > > > > > modules while contributing lots of code and insightful
> discussions.
> > > He
> > > > is
> > > > > > one of the maintainers of Flink CI infra. He is also willing to
> > help
> > > a
> > > > > lot
> > > > > > in the
> > > > > > community work, such as being the release manager for both 1.18
> and
> > > > 1.19,
> > > > > > verifying releases, and answering questions on the mailing list.
> > > > Besides
> > > > > > that,
> > > > > > he is continuously helping with the expansion of the Flink
> > community
> > > > and
> > > > > > has
> > > > > > given several talks about Flink at many conferences, such as
> Flink
> > > > > Forward
> > > > > > 2022 and 2023.
> > > > > >
> > > > > > Congratulations and welcome Jing!
> > > > > >
> > > > > > Best,
> > > > > > Jark (on behalf of the Flink PMC)
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Sergey
> > > >
> > >
> >
>


Re: Re: [ANNOUNCE] New Apache Flink PMC Member - Lincoln Lee

2024-04-12 Thread weijie guo
Congratulations, well deserved.

Best regards,

Weijie


Yuepeng Pan  于2024年4月12日周五 16:41写道:

> Congratulations, Lincoln!
>
> Best,Yuepeng Pan
> At 2024-04-12 16:24:01, "Yun Tang"  wrote:
> >Congratulations, Lincoln!
> >
> >
> >Best
> >Yun Tang
> >
> >From: Jark Wu 
> >Sent: Friday, April 12, 2024 15:59
> >To: dev 
> >Cc: Lincoln Lee 
> >Subject: [ANNOUNCE] New Apache Flink PMC Member - Lincoln Lee
> >
> >Hi everyone,
> >
> >On behalf of the PMC, I'm very happy to announce that Lincoln Lee has
> >joined the Flink PMC!
> >
> >Lincoln has been an active member of the Apache Flink community for
> >many years. He mainly works on Flink SQL component and has driven
> >/pushed many FLIPs around SQL, including FLIP-282/373/415/435 in
> >the recent versions. He has a great technical vision of Flink SQL and
> >participated in plenty of discussions in the dev mailing list. Besides
> >that,
> >he is community-minded, such as being the release manager of 1.19,
> >verifying releases, managing release syncs, writing the release
> >announcement etc.
> >
> >Congratulations and welcome Lincoln!
> >
> >Best,
> >Jark (on behalf of the Flink PMC)
>


[SUMMARY] Flink 1.20 Release Sync 04/09/2024

2024-04-12 Thread weijie guo
Dear devs,


A few days ago(April 9) we had our first
meeting for Flink 1.20 release cycle. I'd like to share the info
synced in the meeting.


1. Feature freezing and release:

   - Feature Freeze
   - June 15, 2024, 00:00 CEST(UTC+2)
  - Release
   - End of July 2024

2. Daily work divisions:


In general, every release manager will be working on all daily issues.
For some major tasks, in order to make sure there will at least always
be someone to take care of them, they have been assigned to specific
release managers[1]. If you need support in each of these areas,
please don't hesitate to contact us.


3. Update 1.20 Release page:


Flink 1.20 release has been kicked off. We'd like to invite you to
update your development plan on the release page[2].


4. Blocker Issues


There are currently 3 blocker issues. All of them have already been
assigned now. One worth highlighting is a performance regression
issue: FLINK-35040[3]. Rui is helping with the investigation.

The next release sync up meeting will be on April 23, 2024. Please
feel free to join us[4]!


Best regards,

Weijie, Rui, Ufuk and Robert


[1]
https://cwiki.apache.org/confluence/display/FLINK/1.20+Release#id-1.20Release-04/09/24

[2]
https://cwiki.apache.org/confluence/display/FLINK/1.20+Release#id-1.20Release-List

[3] https://issues.apache.org/jira/browse/FLINK-35040

[3] https://meet.google.com/mtj-huez-apu


[jira] [Created] (FLINK-35042) Streaming File Sink s3 end-to-end test failed as TM lost

2024-04-07 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35042:
--

 Summary: Streaming File Sink s3 end-to-end test failed as TM lost
 Key: FLINK-35042
 URL: https://issues.apache.org/jira/browse/FLINK-35042
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35041) IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed

2024-04-07 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35041:
--

 Summary: 
IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed
 Key: FLINK-35041
 URL: https://issues.apache.org/jira/browse/FLINK-35041
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35023) YARNApplicationITCase failed on Azure

2024-04-06 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35023:
--

 Summary: YARNApplicationITCase failed on Azure
 Key: FLINK-35023
 URL: https://issues.apache.org/jira/browse/FLINK-35023
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo


1. 
YARNApplicationITCase.testApplicationClusterWithLocalUserJarAndDisableUserJarInclusion

{code:java}
Apr 06 02:19:44 02:19:44.063 [ERROR] 
org.apache.flink.yarn.YARNApplicationITCase.testApplicationClusterWithLocalUserJarAndDisableUserJarInclusion
 -- Time elapsed: 9.727 s <<< FAILURE!
Apr 06 02:19:44 java.lang.AssertionError: Application became FAILED or KILLED 
while expecting FINISHED
Apr 06 02:19:44 at 
org.apache.flink.yarn.YarnTestBase.waitApplicationFinishedElseKillIt(YarnTestBase.java:1282)
Apr 06 02:19:44 at 
org.apache.flink.yarn.YARNApplicationITCase.deployApplication(YARNApplicationITCase.java:116)
Apr 06 02:19:44 at 
org.apache.flink.yarn.YARNApplicationITCase.lambda$testApplicationClusterWithLocalUserJarAndDisableUserJarInclusion$1(YARNApplicationITCase.java:72)
Apr 06 02:19:44 at 
org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:303)
Apr 06 02:19:44 at 
org.apache.flink.yarn.YARNApplicationITCase.testApplicationClusterWithLocalUserJarAndDisableUserJarInclusion(YARNApplicationITCase.java:70)
Apr 06 02:19:44 at 
java.base/java.lang.reflect.Method.invoke(Method.java:568)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)

{code}


2. YARNApplicationITCase.testApplicationClusterWithRemoteUserJar


{code:java}
Apr 06 02:19:44 java.lang.AssertionError: Application became FAILED or KILLED 
while expecting FINISHED
Apr 06 02:19:44 at 
org.apache.flink.yarn.YarnTestBase.waitApplicationFinishedElseKillIt(YarnTestBase.java:1282)
Apr 06 02:19:44 at 
org.apache.flink.yarn.YARNApplicationITCase.deployApplication(YARNApplicationITCase.java:116)
Apr 06 02:19:44 at 
org.apache.flink.yarn.YARNApplicationITCase.lambda$testApplicationClusterWithRemoteUserJar$2(YARNApplicationITCase.java:86)
Apr 06 02:19:44 at 
org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:303)
Apr 06 02:19:44 at 
org.apache.flink.yarn.YARNApplicationITCase.testApplicationClusterWithRemoteUserJar(YARNApplicationITCase.java:84)
Apr 06 02:19:44 at 
java.base/java.lang.reflect.Method.invoke(Method.java:568)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
{code}



3. 
YARNApplicationITCase.testApplicationClusterWithLocalUserJarAndFirstUserJarInclusion


{code:java}
Apr 06 02:19:44 java.lang.AssertionError: Application became FAILED or KILLED 
while expecting FINISHED
Apr 06 02:19:44 at 
org.apache.flink.yarn.YarnTestBase.waitApplicationFinishedElseKillIt(YarnTestBase.java:1282)
Apr 06 02:19:44 at 
org.apache.flink.yarn.YARNApplicationITCase.deployApplication(YARNApplicationITCase.java:116)
Apr 06 02:19:44 at 
org.apache.flink.yarn.YARNApplicationITCase.lambda$testApplicationClusterWithLocalUserJarAndFirstUserJarInclusion$0(YARNApplicationITCase.java:62)
Apr 06 02:19:44 at 
org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:303)
Apr 06 02:19:44 at 
org.apache.flink.yarn.YARNApplicationITCase.testApplicationClusterWithLocalUserJarAndFirstUserJarInclusion(YARNApplicationITCase.java:60)
Apr 06 02:19:44 at 
java.base/java.lang.reflect.Method.invoke(Method.java:568)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
Apr 06 02:19:44 at 
java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
Apr 06 02:19:44 at 

Re: [FYI] The Azure CI for PRs is currently not triggered

2024-04-06 Thread weijie guo
Thanks all!

Best regards,

Weijie


lorenzo affetti  于2024年4月4日周四
20:04写道:

> The CI is back to normal since this morning.
> Thanks to the collaboration with Robert and Mathias.
>
> For more information, see the issue link in Matthias message.
>
>
>
> On Thu, Apr 4, 2024, 08:23 Matthias Pohl 
> wrote:
>
> > Hi everyone,
> > just for your information: The Azure CI for PRs is currently not working.
> > This started to happen on Tuesday (April 2 at around 7pm (CEST)).
> > FLINK-34999 [1] covers the issue.
> >
> > We're expecting the issue to be gone by today. But in the meantime, these
> > are the things you can do:
> > 1. Wait for FLINK-34999 to be fixed before merging your PR.
> > 2. Check the GHA workflow run for the PR and commit in your fork (and
> share
> > the link in your PR for documentation).
> > 3. Azure Pipelines CI is still triggered for pushes to master and the
> > release branches [2], i.e. if you decide to merge, monitor these builds
> > closely.
> >
> > That said, option (1), i.e. waiting for FLINK-34999 to be fixed, is still
> > the preferred way considering that we have Azure Pipelines defined as our
> > ground of truth for now and that the issue is going to be fixed today,
> > hopefully. Additionally, merging a change without a CI run isn't the best
> > option, either. But I still want to be transparent about your options.
> >
> > Matthias
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-34999
> > [2]
> >
> >
> https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=1&_a=summary
> >
> > --
> >
> > [image: Aiven] 
> >
> > *Matthias Pohl*
> > Opensource Software Engineer, *Aiven*
> > matthias.p...@aiven.io|  +49 170 9869525
> > aiven.io    |   <
> https://www.facebook.com/aivencloud
> > >
> >      <
> > https://twitter.com/aiven_io>
> > *Aiven Deutschland GmbH*
> > Alexanderufer 3-7, 10117 Berlin
> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
>


[jira] [Created] (FLINK-34998) Wordcount on Docker test failed on azure

2024-04-03 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-34998:
--

 Summary: Wordcount on Docker test failed on azure
 Key: FLINK-34998
 URL: https://issues.apache.org/jira/browse/FLINK-34998
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo


/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_docker_embedded_job.sh:
 line 65: docker-compose: command not found
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_docker_embedded_job.sh:
 line 66: docker-compose: command not found
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_docker_embedded_job.sh:
 line 67: docker-compose: command not found
sort: cannot read: 
'/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-24250435151/out/docker_wc_out*':
 No such file or directory
Apr 03 02:08:14 FAIL WordCount: Output hash mismatch.  Got 
d41d8cd98f00b204e9800998ecf8427e, expected 0e5bd0a3dd7d5a7110aa85ff70adb54b.
Apr 03 02:08:14 head hexdump of actual:
head: cannot open 
'/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-24250435151/out/docker_wc_out*'
 for reading: No such file or directory
Apr 03 02:08:14 Stopping job timeout watchdog (with pid=244913)
Apr 03 02:08:14 [FAIL] Test script contains errors.


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58709=logs=e9d3d34f-3d15-59f4-0e3e-35067d100dfe=5d91035e-8022-55f2-2d4f-ab121508bf7e=6043



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34997) PyFlink YARN per-job on Docker test failed on azure

2024-04-03 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-34997:
--

 Summary: PyFlink YARN per-job on Docker test failed on azure
 Key: FLINK-34997
 URL: https://issues.apache.org/jira/browse/FLINK-34997
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo


Apr 03 03:12:37 
==
Apr 03 03:12:37 Running 'PyFlink YARN per-job on Docker test'
Apr 03 03:12:37 
==
Apr 03 03:12:37 TEST_DATA_DIR: 
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-37046085202
Apr 03 03:12:37 Flink dist directory: 
/home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT
Apr 03 03:12:38 Flink dist directory: 
/home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT
Apr 03 03:12:38 Docker version 24.0.9, build 2936816
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: line 
24: docker-compose: command not found
Apr 03 03:12:38 [FAIL] Test script contains errors.
Apr 03 03:12:38 Checking of logs skipped.
Apr 03 03:12:38 
Apr 03 03:12:38 [FAIL] 'PyFlink YARN per-job on Docker test' failed after 0 
minutes and 1 seconds! Test exited with exit code 1




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34977) FLIP-433: State Access on DataStream API V2

2024-03-31 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-34977:
--

 Summary: FLIP-433: State Access on DataStream API V2
 Key: FLINK-34977
 URL: https://issues.apache.org/jira/browse/FLINK-34977
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataStream, API / State Processor
Affects Versions: 1.20.0
Reporter: Weijie Guo
Assignee: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-31 Thread weijie guo
Congratulations!

Best regards,

Weijie


Hang Ruan  于2024年4月1日周一 09:49写道:

> Congratulations!
>
> Best,
> Hang
>
> Lincoln Lee  于2024年3月31日周日 00:10写道:
>
> > Congratulations!
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Jark Wu  于2024年3月30日周六 22:13写道:
> >
> > > Congratulations!
> > >
> > > Best,
> > > Jark
> > >
> > > On Fri, 29 Mar 2024 at 12:08, Yun Tang  wrote:
> > >
> > > > Congratulations to all Paimon guys!
> > > >
> > > > Glad to see a Flink sub-project has been graduated to an Apache
> > top-level
> > > > project.
> > > >
> > > > Best
> > > > Yun Tang
> > > >
> > > > 
> > > > From: Hangxiang Yu 
> > > > Sent: Friday, March 29, 2024 10:32
> > > > To: dev@flink.apache.org 
> > > > Subject: Re: Re: [ANNOUNCE] Apache Paimon is graduated to Top Level
> > > Project
> > > >
> > > > Congratulations!
> > > >
> > > > On Fri, Mar 29, 2024 at 10:27 AM Benchao Li 
> > > wrote:
> > > >
> > > > > Congratulations!
> > > > >
> > > > > Zakelly Lan  于2024年3月29日周五 10:25写道:
> > > > > >
> > > > > > Congratulations!
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Zakelly
> > > > > >
> > > > > > On Thu, Mar 28, 2024 at 10:13 PM Jing Ge
> >  > > >
> > > > > wrote:
> > > > > >
> > > > > > > Congrats!
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Jing
> > > > > > >
> > > > > > > On Thu, Mar 28, 2024 at 1:27 PM Feifan Wang <
> zoltar9...@163.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Congratulations!——
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > >
> > > > > > > > Feifan Wang
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > At 2024-03-28 20:02:43, "Yanfei Lei" 
> > > wrote:
> > > > > > > > >Congratulations!
> > > > > > > > >
> > > > > > > > >Best,
> > > > > > > > >Yanfei
> > > > > > > > >
> > > > > > > > >Zhanghao Chen  于2024年3月28日周四
> > > 19:59写道:
> > > > > > > > >>
> > > > > > > > >> Congratulations!
> > > > > > > > >>
> > > > > > > > >> Best,
> > > > > > > > >> Zhanghao Chen
> > > > > > > > >> 
> > > > > > > > >> From: Yu Li 
> > > > > > > > >> Sent: Thursday, March 28, 2024 15:55
> > > > > > > > >> To: d...@paimon.apache.org 
> > > > > > > > >> Cc: dev ; user <
> u...@flink.apache.org
> > >
> > > > > > > > >> Subject: Re: [ANNOUNCE] Apache Paimon is graduated to Top
> > > Level
> > > > > > > Project
> > > > > > > > >>
> > > > > > > > >> CC the Flink user and dev mailing list.
> > > > > > > > >>
> > > > > > > > >> Paimon originated within the Flink community, initially
> > known
> > > as
> > > > > Flink
> > > > > > > > >> Table Store, and all our incubating mentors are members of
> > the
> > > > > Flink
> > > > > > > > >> Project Management Committee. I am confident that the
> bonds
> > of
> > > > > > > > >> enduring friendship and close collaboration will continue
> to
> > > > > unite the
> > > > > > > > >> two communities.
> > > > > > > > >>
> > > > > > > > >> And congratulations all!
> > > > > > > > >>
> > > > > > > > >> Best Regards,
> > > > > > > > >> Yu
> > > > > > > > >>
> > > > > > > > >> On Wed, 27 Mar 2024 at 20:35, Guojun Li <
> > > > gjli.schna...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >> >
> > > > > > > > >> > Congratulations!
> > > > > > > > >> >
> > > > > > > > >> > Best,
> > > > > > > > >> > Guojun
> > > > > > > > >> >
> > > > > > > > >> > On Wed, Mar 27, 2024 at 5:24 PM wulin <
> > ouyangwu...@163.com>
> > > > > wrote:
> > > > > > > > >> >
> > > > > > > > >> > > Congratulations~
> > > > > > > > >> > >
> > > > > > > > >> > > > 2024年3月27日 15:54,王刚  .INVALID>
> > > 写道:
> > > > > > > > >> > > >
> > > > > > > > >> > > > Congratulations~
> > > > > > > > >> > > >
> > > > > > > > >> > > >> 2024年3月26日 10:25,Jingsong Li <
> jingsongl...@gmail.com
> > >
> > > > 写道:
> > > > > > > > >> > > >>
> > > > > > > > >> > > >> Hi Paimon community,
> > > > > > > > >> > > >>
> > > > > > > > >> > > >> I’m glad to announce that the ASF board has
> approved
> > a
> > > > > > > > resolution to
> > > > > > > > >> > > >> graduate Paimon into a full Top Level Project.
> Thanks
> > > to
> > > > > > > > everyone for
> > > > > > > > >> > > >> your help to get to this point.
> > > > > > > > >> > > >>
> > > > > > > > >> > > >> I just created an issue to track the things we need
> > to
> > > > > modify
> > > > > > > > [2],
> > > > > > > > >> > > >> please comment on it if you feel that something is
> > > > > missing. You
> > > > > > > > can
> > > > > > > > >> > > >> refer to apache documentation [1] too.
> > > > > > > > >> > > >>
> > > > > > > > >> > > >> And, we already completed the GitHub repo migration
> > > [3],
> > > > > please
> > > > > > > > update
> > > > > > > > >> > > >> your local git repo to track the new repo [4].
> > > > > > > > >> > > >>
> > > > > > > > >> > > >> You can run the following command to complete the
> > > remote
> > > > > repo
> > > > > > > > tracking
> > > > > > > > >> > > >> migration.
> > > > > > > > >> > > >>
> > > > > > > > >> > > >> git remote set-url 

Re: [DISCUSS] Planning Flink 1.20

2024-03-28 Thread weijie guo
Hi everyone,

The discussion has been going on for a long time and there are currently 4
RM candidates.

According to the voting results, the final release manager of 1.20 are:
Weijie Guo, Rui Fan, Ufuk Celebi and Robert Metzger. We will do the first
release sync on April 9, 2024, at 10am (UTC+2) and 4pm (UTC+8). Welcome to
the meeting!

Best regards,

Weijie


weijie guo  于2024年3月26日周二 09:53写道:

> Thanks for joining, Ufuk & Robert :)
>
> Best regards,
>
> Weijie
>
>
> Rui Fan <1996fan...@gmail.com> 于2024年3月25日周一 23:06写道:
>
>> Welcome Ufuk and Robert join the release manager group!
>>
>> Best,
>> Rui
>>
>> Márton Balassi 于2024年3月25日 周一20:04写道:
>>
>> > Thanks for kicking this off, folks.
>> >
>> > +1 for the timeline and the release manager candidates (Weijie, Rui
>> > Fan,Ufuk/Robert).
>> >
>> > On Mon, Mar 25, 2024 at 12:54 PM Leonard Xu  wrote:
>> >
>> > > Wow, happy to see Ufuk and Robert join the release managers group.
>> > >
>> > > +1 for the release manager candidates(Weijie, Rui Fan,Ufuk and Robert)
>> > > from my side.
>> > >
>> > >
>> > > Best,
>> > > Leonard
>> > >
>> > >
>> > >
>> > > > 2024年3月25日 下午6:09,Robert Metzger  写道:
>> > > >
>> > > > Hi, thanks for starting the discussion.
>> > > >
>> > > > +1 for the proposed timeline and the three proposed release
>> managers.
>> > > >
>> > > > I'm happy to join the release managers group as well, as a backup
>> for
>> > > Ufuk
>> > > > (unless there are objections about the number of release managers)
>> > > >
>> > > > On Mon, Mar 25, 2024 at 11:04 AM Ufuk Celebi 
>> wrote:
>> > > >
>> > > >> Hey all,
>> > > >>
>> > > >> I'd like to join the release managers for 1.20 as well. I'm looking
>> > > >> forward to getting more actively involved again.
>> > > >>
>> > > >> Cheers,
>> > > >>
>> > > >> Ufuk
>> > > >>
>> > > >> On Sun, Mar 24, 2024, at 11:27 AM, Ahmed Hamdy wrote:
>> > > >>> +1 for the proposed timeline and release managers.
>> > > >>> Best Regards
>> > > >>> Ahmed Hamdy
>> > > >>>
>> > > >>>
>> > > >>> On Fri, 22 Mar 2024 at 07:41, Xintong Song > >
>> > > >> wrote:
>> > > >>>
>> > > >>>> +1 for the proposed timeline and Weijie & Rui as the release
>> > managers.
>> > > >>>>
>> > > >>>> I think it would be welcomed if another 1-2 volunteers join as
>> the
>> > > >> release
>> > > >>>> managers, but that's not a must. We used to have only 1-2 release
>> > > >> managers
>> > > >>>> for each release,
>> > > >>>>
>> > > >>>> Best,
>> > > >>>>
>> > > >>>> Xintong
>> > > >>>>
>> > > >>>>
>> > > >>>>
>> > > >>>> On Fri, Mar 22, 2024 at 2:55 PM Jark Wu 
>> wrote:
>> > > >>>>
>> > > >>>>> Thanks for kicking this off.
>> > > >>>>>
>> > > >>>>> +1 for the volunteered release managers (Weijie Guo, Rui Fan)
>> and
>> > the
>> > > >>>>> targeting date (feature freeze: June 15).
>> > > >>>>>
>> > > >>>>> Best,
>> > > >>>>> Jark
>> > > >>>>>
>> > > >>>>>
>> > > >>>>>
>> > > >>>>>
>> > > >>>>>
>> > > >>>>> On Fri, 22 Mar 2024 at 14:00, Rui Fan <1996fan...@gmail.com>
>> > wrote:
>> > > >>>>>
>> > > >>>>>> Thanks Leonard for this feedback and help!
>> > > >>>>>>
>> > > >>>>>> Best,
>> > > >>>>>> Rui
>> > > >>>>>>
>> > > >>>>>> On Fri, Mar 22, 2024 at 12:36 PM weijie guo <
>> > &g

[RESULT][VOTE] FLIP-433: State Access on DataStream API V2

2024-03-25 Thread weijie guo
Hi devs,


I'm happy to announce that FLIP-433: State Access on DataStream API V2
[1] has been accepted with 9 approving votes (4 binding) [2]:


- Xintong Song (binding)

- Hangxiang Yu (binding)

- Rui Fan (binding)

- Gyula Fóra (binding)

- Zakelly Lan (non-binding)

- Jinzhong Li (non-binding)

- Ahmed Hamdy (non-binding)

- Samrat Deb (non-binding)

- Yunfeng Zhou (non-binding)


There are no disapproving votes. Thanks to everyone who participated
in the discussion and voting.


Best regards,

Weijie


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-433%3A+State+Access+on+DataStream+API+V2

[2] https://lists.apache.org/thread/hwxwfd8fcqo6olg5p9w41jv6rflzw53o


Re: [DISCUSS] Planning Flink 1.20

2024-03-25 Thread weijie guo
Thanks for joining, Ufuk & Robert :)

Best regards,

Weijie


Rui Fan <1996fan...@gmail.com> 于2024年3月25日周一 23:06写道:

> Welcome Ufuk and Robert join the release manager group!
>
> Best,
> Rui
>
> Márton Balassi 于2024年3月25日 周一20:04写道:
>
> > Thanks for kicking this off, folks.
> >
> > +1 for the timeline and the release manager candidates (Weijie, Rui
> > Fan,Ufuk/Robert).
> >
> > On Mon, Mar 25, 2024 at 12:54 PM Leonard Xu  wrote:
> >
> > > Wow, happy to see Ufuk and Robert join the release managers group.
> > >
> > > +1 for the release manager candidates(Weijie, Rui Fan,Ufuk and Robert)
> > > from my side.
> > >
> > >
> > > Best,
> > > Leonard
> > >
> > >
> > >
> > > > 2024年3月25日 下午6:09,Robert Metzger  写道:
> > > >
> > > > Hi, thanks for starting the discussion.
> > > >
> > > > +1 for the proposed timeline and the three proposed release managers.
> > > >
> > > > I'm happy to join the release managers group as well, as a backup for
> > > Ufuk
> > > > (unless there are objections about the number of release managers)
> > > >
> > > > On Mon, Mar 25, 2024 at 11:04 AM Ufuk Celebi  wrote:
> > > >
> > > >> Hey all,
> > > >>
> > > >> I'd like to join the release managers for 1.20 as well. I'm looking
> > > >> forward to getting more actively involved again.
> > > >>
> > > >> Cheers,
> > > >>
> > > >> Ufuk
> > > >>
> > > >> On Sun, Mar 24, 2024, at 11:27 AM, Ahmed Hamdy wrote:
> > > >>> +1 for the proposed timeline and release managers.
> > > >>> Best Regards
> > > >>> Ahmed Hamdy
> > > >>>
> > > >>>
> > > >>> On Fri, 22 Mar 2024 at 07:41, Xintong Song 
> > > >> wrote:
> > > >>>
> > > >>>> +1 for the proposed timeline and Weijie & Rui as the release
> > managers.
> > > >>>>
> > > >>>> I think it would be welcomed if another 1-2 volunteers join as the
> > > >> release
> > > >>>> managers, but that's not a must. We used to have only 1-2 release
> > > >> managers
> > > >>>> for each release,
> > > >>>>
> > > >>>> Best,
> > > >>>>
> > > >>>> Xintong
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Fri, Mar 22, 2024 at 2:55 PM Jark Wu  wrote:
> > > >>>>
> > > >>>>> Thanks for kicking this off.
> > > >>>>>
> > > >>>>> +1 for the volunteered release managers (Weijie Guo, Rui Fan) and
> > the
> > > >>>>> targeting date (feature freeze: June 15).
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Jark
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> On Fri, 22 Mar 2024 at 14:00, Rui Fan <1996fan...@gmail.com>
> > wrote:
> > > >>>>>
> > > >>>>>> Thanks Leonard for this feedback and help!
> > > >>>>>>
> > > >>>>>> Best,
> > > >>>>>> Rui
> > > >>>>>>
> > > >>>>>> On Fri, Mar 22, 2024 at 12:36 PM weijie guo <
> > > >> guoweijieres...@gmail.com
> > > >>>>>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> Thanks Leonard!
> > > >>>>>>>
> > > >>>>>>>> I'd like to help you if you need some help like permissions
> > > >> from
> > > >>>> PMC
> > > >>>>>>> side, please feel free to ping me.
> > > >>>>>>>
> > > >>>>>>> Nice to know. It'll help a lot!
> > > >>>>>>>
> > > >>>>>>> Best regards,
> > > >>>>>>>
> > > >>>>>>> Weijie
> > > >>>>>>>
>

Re: [DISCUSS] Planning Flink 1.20

2024-03-21 Thread weijie guo
Thanks Leonard!

> I'd like to help you if you need some help like permissions from PMC
side, please feel free to ping me.

Nice to know. It'll help a lot!

Best regards,

Weijie


Leonard Xu  于2024年3月22日周五 12:09写道:

> +1 for the proposed release managers (Weijie Guo, Rui Fan), both the two
> candidates are pretty active committers thus I believe they know the
> community development process well. The recent releases have four release
> managers, and I am also looking forward to having other volunteers
>  join the management of Flink 1.20.
>
> +1 for targeting date (feature freeze: June 15, 2024), referring to the
> release cycle of recent versions, release cycle of 4 months makes sense to
> me.
>
>
> I'd like to help you if you need some help like permissions from PMC side,
> please feel free to ping me.
>
> Best,
> Leonard
>
>
> > 2024年3月19日 下午5:35,Rui Fan <1996fan...@gmail.com> 写道:
> >
> > Hi Weijie,
> >
> > Thanks for kicking off 1.20! I'd like to join you and participate in the
> > 1.20 release.
> >
> > Best,
> > Rui
> >
> > On Tue, Mar 19, 2024 at 5:30 PM weijie guo 
> > wrote:
> >
> >> Hi everyone,
> >>
> >> With the release announcement of Flink 1.19, it's a good time to kick
> off
> >> discussion of the next release 1.20.
> >>
> >>
> >> - Release managers
> >>
> >>
> >> I'd like to volunteer as one of the release managers this time. It has
> been
> >> good practice to have a team of release managers from different
> >> backgrounds, so please raise you hand if you'd like to volunteer and get
> >> involved.
> >>
> >>
> >>
> >> - Timeline
> >>
> >>
> >> Flink 1.19 has been released. With a target release cycle of 4 months,
> >> we propose a feature freeze date of *June 15, 2024*.
> >>
> >>
> >>
> >> - Collecting features
> >>
> >>
> >> As usual, we've created a wiki page[1] for collecting new features in
> 1.20.
> >>
> >>
> >> In addition, we already have a number of FLIPs that have been voted or
> are
> >> in the process, including pre-works for version 2.0.
> >>
> >>
> >> In the meantime, the release management team will be finalized in the
> next
> >> few days, and we'll continue to create Jira Boards and Sync meetings
> >> to make it easy
> >> for everyone to get an overview and track progress.
> >>
> >>
> >>
> >> Best regards,
> >>
> >> Weijie
> >>
> >>
> >>
> >> [1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
> >>
>
>


Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread weijie guo
Congratulations! Well done.


Best regards,

Weijie


Feng Jin  于2024年3月21日周四 11:40写道:

> Congratulations!
>
>
> Best,
> Feng
>
>
> On Thu, Mar 21, 2024 at 11:37 AM Ron liu  wrote:
>
> > Congratulations!
> >
> > Best,
> > Ron
> >
> > Jark Wu  于2024年3月21日周四 10:46写道:
> >
> > > Congratulations and welcome!
> > >
> > > Best,
> > > Jark
> > >
> > > On Thu, 21 Mar 2024 at 10:35, Rui Fan <1996fan...@gmail.com> wrote:
> > >
> > > > Congratulations!
> > > >
> > > > Best,
> > > > Rui
> > > >
> > > > On Thu, Mar 21, 2024 at 10:25 AM Hang Ruan 
> > > wrote:
> > > >
> > > > > Congrattulations!
> > > > >
> > > > > Best,
> > > > > Hang
> > > > >
> > > > > Lincoln Lee  于2024年3月21日周四 09:54写道:
> > > > >
> > > > >>
> > > > >> Congrats, thanks for the great work!
> > > > >>
> > > > >>
> > > > >> Best,
> > > > >> Lincoln Lee
> > > > >>
> > > > >>
> > > > >> Peter Huang  于2024年3月20日周三 22:48写道:
> > > > >>
> > > > >>> Congratulations
> > > > >>>
> > > > >>>
> > > > >>> Best Regards
> > > > >>> Peter Huang
> > > > >>>
> > > > >>> On Wed, Mar 20, 2024 at 6:56 AM Huajie Wang 
> > > > wrote:
> > > > >>>
> > > > 
> > > >  Congratulations
> > > > 
> > > > 
> > > > 
> > > >  Best,
> > > >  Huajie Wang
> > > > 
> > > > 
> > > > 
> > > >  Leonard Xu  于2024年3月20日周三 21:36写道:
> > > > 
> > > > > Hi devs and users,
> > > > >
> > > > > We are thrilled to announce that the donation of Flink CDC as a
> > > > > sub-project of Apache Flink has completed. We invite you to
> > explore
> > > > the new
> > > > > resources available:
> > > > >
> > > > > - GitHub Repository: https://github.com/apache/flink-cdc
> > > > > - Flink CDC Documentation:
> > > > > https://nightlies.apache.org/flink/flink-cdc-docs-stable
> > > > >
> > > > > After Flink community accepted this donation[1], we have
> > completed
> > > > > software copyright signing, code repo migration, code cleanup,
> > > > website
> > > > > migration, CI migration and github issues migration etc.
> > > > > Here I am particularly grateful to Hang Ruan, Zhongqaing Gong,
> > > > > Qingsheng Ren, Jiabao Sun, LvYanquan, loserwang1024 and other
> > > > contributors
> > > > > for their contributions and help during this process!
> > > > >
> > > > >
> > > > > For all previous contributors: The contribution process has
> > > slightly
> > > > > changed to align with the main Flink project. To report bugs or
> > > > suggest new
> > > > > features, please open tickets
> > > > > Apache Jira (https://issues.apache.org/jira).  Note that we
> will
> > > no
> > > > > longer accept GitHub issues for these purposes.
> > > > >
> > > > >
> > > > > Welcome to explore the new repository and documentation. Your
> > > > feedback
> > > > > and contributions are invaluable as we continue to improve
> Flink
> > > CDC.
> > > > >
> > > > > Thanks everyone for your support and happy exploring Flink CDC!
> > > > >
> > > > > Best,
> > > > > Leonard
> > > > > [1]
> > > https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
> > > > >
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (FLINK-34900) Check compatibility for classes in flink-core-api that skip japicmp

2024-03-20 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-34900:
--

 Summary: Check compatibility for classes in flink-core-api that 
skip japicmp
 Key: FLINK-34900
 URL: https://issues.apache.org/jira/browse/FLINK-34900
 Project: Flink
  Issue Type: Sub-task
  Components: API / Core
Reporter: Weijie Guo
Assignee: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34899) Remove all classes that skip the japicmp check for flink-core-api

2024-03-20 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-34899:
--

 Summary: Remove all classes that skip the japicmp check for 
flink-core-api
 Key: FLINK-34899
 URL: https://issues.apache.org/jira/browse/FLINK-34899
 Project: Flink
  Issue Type: Sub-task
  Components: API / Core
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] FLIP-433: State Access on DataStream API V2

2024-03-20 Thread weijie guo
Hi everyone,


Thanks for all the feedback about the FLIP-433: State Access on
DataStream API V2 [1]. The discussion thread is here [2].


The vote will be open for at least 72 hours unless there is an
objection or insufficient votes.


Best regards,

Weijie


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-433%3A+State+Access+on+DataStream+API+V2

[2] https://lists.apache.org/thread/8ghzqkvt99p4k7hjqxzwhqny7zb7xrwo


Re: [DISCUSS] FLIP-433: State Access on DataStream API V2

2024-03-19 Thread weijie guo
Hi everyone,

Thanks for your discussion and feedback!

Our discussions have been going on for a while and there have been no new
concerns for several days. So I would like to start voting recently.

Best regards,

Weijie


Zakelly Lan  于2024年3月12日周二 17:40写道:

> Hi Weijie,
>
> Thanks for your reply!
>
> Overall I'd be fine with the builder pattern, but it is a little bit long
> when carrying explicit 'build()' and declaring the builder. Keeping the
> StateDeclaration immutable is OK, but it is a little bit inconvenient for
> overriding the undefined options by job configuration at runtime. I'd
> suggest providing some methods responsible for rebuilding a new
> StateDeclaration with new configurable options, just like the
> ConfigOptions#defaultValue does. Well, this is just a suggestion, I'm not
> going to insist on it.
>
>
> Best,
> Zakelly
>
> On Tue, Mar 12, 2024 at 2:07 PM weijie guo 
> wrote:
>
> > Hi Zakelly,
> >
> > > But still, from a user's point of view,  state can be characterized
> along
> > two relatively independent dimensions, how states redistribute and the
> data
> > structure. Thus I still suggest a chained-like configuration API that
> > configures one aspect on each call.
> >
> >
> > I think the chained-like style is a good suggestion. But I'm not going to
> > introduce any mutable-like API to StateDeclaration (even though we can
> > achieve immutability by returning a new object). For this reason, I
> decided
> > to use the builder pattern, which also has the benefit of chaining calls
> > and allows us to support further configurations such as setTTL in the
> > future. For ease of use, we'll also provide some shortcuts to avoid
> having
> > to go through a long build chain each time. Of course, I have updated the
> > the FLIP about this part.
> >
> >
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > weijie guo  于2024年3月12日周二 14:00写道:
> >
> > > Hi Hangxiang,
> > >
> > > > So these operators only define all states they may use which could be
> > > explained by the caller, right ?
> > >
> > > Yes, you're right.
> > >
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > weijie guo  于2024年3月12日周二 13:59写道:
> > >
> > >> Hi Max,
> > >>
> > >> > In this thread it looks like the plan is to remove the old state
> > >> declaration API. I think we should consider keeping the old APIs to
> > >> avoid breaking too many jobs.
> > >>
> > >> We're not plan to remove any old apis, which means that changes made
> in
> > >> V2 won't affect any V1 DataStream jobs. But V2 is limited to the new
> > state
> > >> declaration API, and users who want to migrate to DataStream V2 will
> > need
> > >> to rewrite their jobs anyway.
> > >>
> > >> Best regards,
> > >>
> > >> Weijie
> > >>
> > >>
> > >> Hangxiang Yu  于2024年3月12日周二 10:26写道:
> > >>
> > >>> Hi, Weijie.
> > >>> Thanks for your answer!
> > >>>
> > >>> > No, Introducing and declaring new state
> > >>> > at runtime is something we want to explicitly disallow.
> > >>>
> > >>> I just thinked about how some operators define their useState() when
> > >>> their
> > >>> real used states may be changed at runtime, e.g. different state
> types
> > >>> for
> > >>> different state sizes.
> > >>> So these operators only define all states they may use which could be
> > >>> explained by the caller, right ?
> > >>>
> > >>> On Mon, Mar 11, 2024 at 10:57 PM Maximilian Michels 
> > >>> wrote:
> > >>>
> > >>> > The FLIP mentions: "The contents described in this FLIP are all new
> > >>> > APIs and do not involve compatibility issues."
> > >>> >
> > >>> > In this thread it looks like the plan is to remove the old state
> > >>> > declaration API. I think we should consider keeping the old APIs to
> > >>> > avoid breaking too many jobs. The new APIs will still be beneficial
> > >>> > for new jobs, e.g. for SQL jobs.
> > >>> >
> > >>> > -Max
> > >>> >
> > >>> > On Fri, Mar 8, 2024 at 4:39 AM Zakelly Lan 
> > 

[DISCUSS] Planning Flink 1.20

2024-03-19 Thread weijie guo
Hi everyone,

With the release announcement of Flink 1.19, it's a good time to kick off
discussion of the next release 1.20.


- Release managers


I'd like to volunteer as one of the release managers this time. It has been
good practice to have a team of release managers from different
backgrounds, so please raise you hand if you'd like to volunteer and get
involved.



- Timeline


Flink 1.19 has been released. With a target release cycle of 4 months,
we propose a feature freeze date of *June 15, 2024*.



- Collecting features


As usual, we've created a wiki page[1] for collecting new features in 1.20.


In addition, we already have a number of FLIPs that have been voted or are
in the process, including pre-works for version 2.0.


In the meantime, the release management team will be finalized in the next
few days, and we'll continue to create Jira Boards and Sync meetings
to make it easy
for everyone to get an overview and track progress.



Best regards,

Weijie



[1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release


Re: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-18 Thread weijie guo
Congratulations!

Thanks release managers and all the contributors involved.

Best regards,

Weijie


Leonard Xu  于2024年3月18日周一 16:45写道:

> Congratulations, thanks release managers and all involved for the great
> work!
>
>
> Best,
> Leonard
>
> > 2024年3月18日 下午4:32,Jingsong Li  写道:
> >
> > Congratulations!
> >
> > On Mon, Mar 18, 2024 at 4:30 PM Rui Fan <1996fan...@gmail.com> wrote:
> >>
> >> Congratulations, thanks for the great work!
> >>
> >> Best,
> >> Rui
> >>
> >> On Mon, Mar 18, 2024 at 4:26 PM Lincoln Lee 
> wrote:
> >>>
> >>> The Apache Flink community is very happy to announce the release of
> Apache Flink 1.19.0, which is the fisrt release for the Apache Flink 1.19
> series.
> >>>
> >>> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
> >>>
> >>> The release is available for download at:
> >>> https://flink.apache.org/downloads.html
> >>>
> >>> Please check out the release blog post for an overview of the
> improvements for this bugfix release:
> >>>
> https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/
> >>>
> >>> The full release notes are available in Jira:
> >>>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12353282
> >>>
> >>> We would like to thank all contributors of the Apache Flink community
> who made this release possible!
> >>>
> >>>
> >>> Best,
> >>> Yun, Jing, Martijn and Lincoln
>
>


Re: [DISCUSS] FLIP-424: Asynchronous State APIs

2024-03-13 Thread weijie guo
Okay, sorry, I'm not looking at the latest version of the FLIP. You've
answered my question in updated FLIP. :)

Best regards,

Weijie


weijie guo  于2024年3月13日周三 14:56写道:

> Hi Zakelly,
>
> Thanks for the proposal! I like this idea and I can see the performance
> improvements it brings.
>
> In the previous reply you mentioned “these APIs are in some newly
> introduced classes, which are located in a different package name with the
> original one”. I can see the benefits of this. To be honest, there is a lot
> of historical burdens with the old state API, maybe this is a chance to
> break free. If I understand you correctly, the new State(V2) interface will
> still support synchronous API, right? But I didn't see that in the FLIP.
>
>
>
> Best regards,
>
> Weijie
>
>
> Zakelly Lan  于2024年3月13日周三 13:03写道:
>
>> Hi Jing,
>>
>> The deprecation and removal of original APIs is beyond the scope of
>> current
>> FLIP, but I do add/highlight such information under "Compatibility,
>> Deprecation, and Migration Plan" section.
>>
>>
>> Best,
>> Zakelly
>>
>> On Wed, Mar 13, 2024 at 9:18 AM Yunfeng Zhou > >
>> wrote:
>>
>> > Hi Zakelly,
>> >
>> > Thanks for your responses. I agree with it that we can keep the design
>> > as it is for now and see if others have any better ideas for these
>> > questions.
>> >
>> > Best,
>> > Yunfeng
>> >
>> > On Tue, Mar 12, 2024 at 5:23 PM Zakelly Lan 
>> wrote:
>> > >
>> > > Hi Xuannan,
>> > >
>> > > Thanks for your comments, I modified the FLIP accordingly.
>> > >
>> > > Hi Yunfeng,
>> > >
>> > > Thanks for sharing your opinions!
>> > >
>> > >> Could you provide some hint on use cases where users need to mix sync
>> > >> and async state operations in spite of the performance regression?
>> > >> This information might help address our concerns on design. If the
>> > >> mixed usage is simply something not recommended, I would prefer to
>> > >> prohibit such usage from API.
>> > >
>> > > In fact, there is no scenario where users MUST use the sync APIs, but
>> it
>> > is much easier to use for those who are not familiar with asynchronous
>> > programming. If they want to migrate their job from Flink 1.x to 2.0
>> > leveraging some benefits from asynchronous APIs, they may try the mixed
>> > usage. It is not user-friendly to directly throw exceptions at runtime,
>> I
>> > think our better approach is to warn users and recommend avoiding this.
>> I
>> > added an example in this FLIP.
>> > >
>> > > Well, I do not insist on allowing mixed usage of APIs if others reach
>> an
>> > agreement that we won't support that . I think the most important is to
>> > keep the API easy to use and understand, thus I propose a unified state
>> > declaration and explicit meaning in method name. WDYT?
>> > >
>> > >> Sorry I missed the new sink API. I do still think that it would be
>> > >> better to make the package name more informative, and ".v2." does not
>> > >> contain information for new Flink users who did not know the v1 of
>> > >> state API. Unlike internal implementation and performance
>> > >> optimization, API will hardly be compromised for now and updated in
>> > >> future, so I still suggest we improve the package name now if
>> > >> possible. But given the existing practice of sink v2 and
>> > >> AbstractStreamOperatorV2, the current package name would be
>> acceptable
>> > >> to me if other reviewers of this FLIP agrees on it.
>> > >
>> > > Actually, I don't like 'v2' either. So if there is another good name,
>> > I'd be happy to apply. This is a compromise to the current situation.
>> Maybe
>> > we could refine this after the retirement of original state APIs.
>> > >
>> > >
>> > > Thanks & Best,
>> > > Zakelly
>> > >
>> > >
>> > > On Tue, Mar 12, 2024 at 1:42 PM Yunfeng Zhou <
>> > flink.zhouyunf...@gmail.com> wrote:
>> > >>
>> > >> Hi Zakelly,
>> > >>
>> > >> Thanks for the quick response!
>> > >>
>> > >> > Actually splitting APIs into two sets ... warn them in runtime.
>> > >>
>> &

Re: [DISCUSS] FLIP-424: Asynchronous State APIs

2024-03-13 Thread weijie guo
uot; does not
> > >> contain information for new Flink users who did not know the v1 of
> > >> state API. Unlike internal implementation and performance
> > >> optimization, API will hardly be compromised for now and updated in
> > >> future, so I still suggest we improve the package name now if
> > >> possible. But given the existing practice of sink v2 and
> > >> AbstractStreamOperatorV2, the current package name would be acceptable
> > >> to me if other reviewers of this FLIP agrees on it.
> > >>
> > >> Best,
> > >> Yunfeng
> > >>
> > >> On Mon, Mar 11, 2024 at 5:27 PM Zakelly Lan 
> > wrote:
> > >> >
> > >> > Hi Yunfeng,
> > >> >
> > >> > Thanks for your comments!
> > >> >
> > >> > +1 for JingGe's suggestion to introduce an AsyncState API, instead
> of
> > >> > > having both get() and asyncGet() in the same State class. As a
> > >> > > supplement to its benefits, this design could help avoid having
> > users
> > >> > > to use sync and async API in a mixed way (unless they create both
> a
> > >> > > State and an AsyncState from the same state descriptor), which is
> > >> > > supposed to bring suboptimal performance according to the FLIP's
> > >> > > description.
> > >> >
> > >> >
> > >> > Actually splitting APIs into two sets of classes also brings some
> > >> > difficulties. In this case, users must explicitly define their usage
> > before
> > >> > actually doing state access. It is a little strange that the user
> can
> > >> > define a sync and an async version of State with the same name,
> while
> > they
> > >> > cannot allocate two async States with the same name.
> > >> > Another reason for distinguishing API by their method name instead
> of
> > class
> > >> > name is that users typically use the State instances to access state
> > but
> > >> > forget their type/class. For example:
> > >> > ```
> > >> > SyncState a = getState(xxx);
> > >> > AsyncState b = getAsyncState(xxx);
> > >> > //...
> > >> > a.update(1);
> > >> > b.update(1);
> > >> > ```
> > >> > Users are likely to think there is no difference between the
> > `a.update(1)`
> > >> > and `b.update(1)`, since they may forget the type for `a` and `b`.
> > Thus I
> > >> > proposed to distinguish the behavior in method names.
> > >> > As for the suboptimal performance with mixed usage of sync and
> async,
> > my
> > >> > proposal is to warn them in runtime.
> > >> >
> > >> > I noticed that the FLIP proposes to place the newly introduced API
> in
> > >> > > the package "org.apache.flink.api.common.state.v2", which seems a
> > >> > > little strange to me as there has not been such a naming pattern
> > >> > > ".v2." for packages in Flink.
> > >> >
> > >> >
> > >> > In fact, there are some similar existing patterns, like
> > >> > `org.apache.flink.streaming.api.functions.sink.v2` and
> > >> > `org.apache.flink.streaming.api.connector.sink2`.
> > >> >
> > >> >  I would suggest discussing this topic
> > >> > > with the main authors of Datastream V2, like Weijie Guo, so that
> the
> > >> > > newly introduced APIs from both sides comply with a unified naming
> > >> > > style.
> > >> >
> > >> > I'm afraid we are facing a different situation with the Datastream
> > V2. For
> > >> > total reconstruction of Datastream API, it is big enough to build a
> > >> > seperate module and keep good package names. While for state APIs,
> we
> > >> > should stay in the flink-core(-api) module alongside with other
> > >> > apis, currently I tend to compromise at the expense of naming style.
> > >> >
> > >> >
> > >> > Looking forward to hearing from you again!
> > >> >
> > >> > Thanks & Best,
> > >> > Zakelly
> > >> >
> > >> > On Mon, Mar 11, 2024 at 4:20 PM Yunfeng Zhou <
> > flink.zhouyunf...@gmail.com>
> > >> > wrote:
> >

Re: [VOTE] Release 1.19.0, release candidate #2

2024-03-12 Thread weijie guo
+1 (non-binding)

- Verified signature and checksum
- Verified source distribution does not contains binaries
- Build from source code and submit a word-count job successfully


Best regards,

Weijie


Jane Chan  于2024年3月12日周二 16:38写道:

> +1 (non-binding)
>
> - Verify that the source distributions do not contain any binaries;
> - Build the source distribution to ensure all source files have Apache
> headers;
> - Verify checksum and GPG signatures;
>
> Best,
> Jane
>
> On Tue, Mar 12, 2024 at 4:08 PM Xuannan Su  wrote:
>
> > +1 (non-binding)
> >
> > - Verified signature and checksum
> > - Verified that source distribution does not contain binaries
> > - Built from source code successfully
> > - Reviewed the release announcement PR
> >
> > Best regards,
> > Xuannan
> >
> > On Tue, Mar 12, 2024 at 2:18 PM Hang Ruan 
> wrote:
> > >
> > > +1 (non-binding)
> > >
> > > - Verified signatures and checksums
> > > - Verified that source does not contain binaries
> > > - Build source code successfully
> > > - Reviewed the release note and left a comment
> > >
> > > Best,
> > > Hang
> > >
> > > Feng Jin  于2024年3月12日周二 11:23写道:
> > >
> > > > +1 (non-binding)
> > > >
> > > > - Verified signatures and checksums
> > > > - Verified that source does not contain binaries
> > > > - Build source code successfully
> > > > - Run a simple sql query successfully
> > > >
> > > > Best,
> > > > Feng Jin
> > > >
> > > >
> > > > On Tue, Mar 12, 2024 at 11:09 AM Ron liu  wrote:
> > > >
> > > > > +1 (non binding)
> > > > >
> > > > > quickly verified:
> > > > > - verified that source distribution does not contain binaries
> > > > > - verified checksums
> > > > > - built source code successfully
> > > > >
> > > > >
> > > > > Best,
> > > > > Ron
> > > > >
> > > > > Jeyhun Karimov  于2024年3月12日周二 01:00写道:
> > > > >
> > > > > > +1 (non binding)
> > > > > >
> > > > > > - verified that source distribution does not contain binaries
> > > > > > - verified signatures and checksums
> > > > > > - built source code successfully
> > > > > >
> > > > > > Regards,
> > > > > > Jeyhun
> > > > > >
> > > > > >
> > > > > > On Mon, Mar 11, 2024 at 3:08 PM Samrat Deb <
> decordea...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > +1 (non binding)
> > > > > > >
> > > > > > > - verified signatures and checksums
> > > > > > > - ASF headers are present in all expected file
> > > > > > > - No unexpected binaries files found in the source
> > > > > > > - Build successful locally
> > > > > > > - tested basic word count example
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Bests,
> > > > > > > Samrat
> > > > > > >
> > > > > > > On Mon, 11 Mar 2024 at 7:33 PM, Ahmed Hamdy <
> > hamdy10...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Lincoln
> > > > > > > > +1 (non-binding) from me
> > > > > > > >
> > > > > > > > - Verified Checksums & Signatures
> > > > > > > > - Verified Source dists don't contain binaries
> > > > > > > > - Built source successfully
> > > > > > > > - reviewed web PR
> > > > > > > >
> > > > > > > >
> > > > > > > > Best Regards
> > > > > > > > Ahmed Hamdy
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, 11 Mar 2024 at 15:18, Lincoln Lee <
> > lincoln.8...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Robin,
> > > > > > > > >
> > > > > > > > > Thanks for helping verifying the release note[1],
> FLINK-14879
> > > > > should
> > > > > > > not
> > > > > > > > > have been included, after confirming this
> > > > > > > > > I moved all unresolved non-blocker issues left over from
> > 1.19.0
> > > > to
> > > > > > > 1.20.0
> > > > > > > > > and reconfigured the release note [1].
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Lincoln Lee
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12353282
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Robin Moffatt  于2024年3月11日周一
> > > > 19:36写道:
> > > > > > > > >
> > > > > > > > > > Looking at the release notes [1] it lists `DESCRIBE
> > DATABASE`
> > > > > > > > > (FLINK-14879)
> > > > > > > > > > and `DESCRIBE CATALOG` (FLINK-14690).
> > > > > > > > > > When I try these in 1.19 RC2 the behaviour is as in
> 1.18.1,
> > > > i.e.
> > > > > it
> > > > > > > is
> > > > > > > > > not
> > > > > > > > > > supported:
> > > > > > > > > >
> > > > > > > > > > ```
> > > > > > > > > > [INFO] Execute statement succeed.
> > > > > > > > > >
> > > > > > > > > > Flink SQL> show catalogs;
> > > > > > > > > > +-+
> > > > > > > > > > |catalog name |
> > > > > > > > > > +-+
> > > > > > > > > > |   c_new |
> > > > > > > > > > | default_catalog |
> > > > > > > > > > +-+
> > > > > > > > > > 2 rows in set
> > > > > > > > > >
> > > > > > > > > > Flink SQL> DESCRIBE CATALOG c_new;
> > > > > > > > > > [ERROR] Could not execute SQL 

Re: [DISCUSS] FLIP-433: State Access on DataStream API V2

2024-03-12 Thread weijie guo
Hi Zakelly,

> But still, from a user's point of view,  state can be characterized along
two relatively independent dimensions, how states redistribute and the data
structure. Thus I still suggest a chained-like configuration API that
configures one aspect on each call.


I think the chained-like style is a good suggestion. But I'm not going to
introduce any mutable-like API to StateDeclaration (even though we can
achieve immutability by returning a new object). For this reason, I decided
to use the builder pattern, which also has the benefit of chaining calls
and allows us to support further configurations such as setTTL in the
future. For ease of use, we'll also provide some shortcuts to avoid having
to go through a long build chain each time. Of course, I have updated the
the FLIP about this part.



Best regards,

Weijie


weijie guo  于2024年3月12日周二 14:00写道:

> Hi Hangxiang,
>
> > So these operators only define all states they may use which could be
> explained by the caller, right ?
>
> Yes, you're right.
>
>
> Best regards,
>
> Weijie
>
>
> weijie guo  于2024年3月12日周二 13:59写道:
>
>> Hi Max,
>>
>> > In this thread it looks like the plan is to remove the old state
>> declaration API. I think we should consider keeping the old APIs to
>> avoid breaking too many jobs.
>>
>> We're not plan to remove any old apis, which means that changes made in
>> V2 won't affect any V1 DataStream jobs. But V2 is limited to the new state
>> declaration API, and users who want to migrate to DataStream V2 will need
>> to rewrite their jobs anyway.
>>
>> Best regards,
>>
>> Weijie
>>
>>
>> Hangxiang Yu  于2024年3月12日周二 10:26写道:
>>
>>> Hi, Weijie.
>>> Thanks for your answer!
>>>
>>> > No, Introducing and declaring new state
>>> > at runtime is something we want to explicitly disallow.
>>>
>>> I just thinked about how some operators define their useState() when
>>> their
>>> real used states may be changed at runtime, e.g. different state types
>>> for
>>> different state sizes.
>>> So these operators only define all states they may use which could be
>>> explained by the caller, right ?
>>>
>>> On Mon, Mar 11, 2024 at 10:57 PM Maximilian Michels 
>>> wrote:
>>>
>>> > The FLIP mentions: "The contents described in this FLIP are all new
>>> > APIs and do not involve compatibility issues."
>>> >
>>> > In this thread it looks like the plan is to remove the old state
>>> > declaration API. I think we should consider keeping the old APIs to
>>> > avoid breaking too many jobs. The new APIs will still be beneficial
>>> > for new jobs, e.g. for SQL jobs.
>>> >
>>> > -Max
>>> >
>>> > On Fri, Mar 8, 2024 at 4:39 AM Zakelly Lan 
>>> wrote:
>>> > >
>>> > > Hi Weijie,
>>> > >
>>> > > Thanks for your answer! Well I get your point. Since partitions are
>>> > > first-class citizens, and redistribution means how states migrate
>>> when
>>> > > partitions change, I'd be fine with deemphasizing the concept of
>>> > > keyed/operator state if we highlight the definition of partition in
>>> the
>>> > > document. Keeping `RedistributionMode` under `StateDeclaration` is
>>> also
>>> > > fine with me, as I guess it is only for internal usage.
>>> > > But still, from a user's point of view,  state can be characterized
>>> along
>>> > > two relatively independent dimensions, how states redistribute and
>>> the
>>> > data
>>> > > structure. Thus I still suggest a chained-like configuration API that
>>> > > configures one aspect on each call, such as:
>>> > > ```
>>> > > # Keyed stream, no redistribution mode specified, the state will go
>>> with
>>> > > partition (no redistribution).  Keyed state
>>> > > StateDeclaration a = States.declare(name).listState(type);
>>> > >
>>> > > # Keyed stream, redistribution strategy specified, the state follows
>>> the
>>> > > specified redistribute strategy.   Operator state
>>> > > StateDeclaration b =
>>> > > States.declare(name).listState(type).redistributeBy(strategy);
>>> > >
>>> > > # Non-keyed stream, redistribution strategy *must be* specified.
>>> > > StateDeclaration c =
>>> > > 

Re: [DISCUSS] FLIP-433: State Access on DataStream API V2

2024-03-12 Thread weijie guo
Hi Hangxiang,

> So these operators only define all states they may use which could be
explained by the caller, right ?

Yes, you're right.


Best regards,

Weijie


weijie guo  于2024年3月12日周二 13:59写道:

> Hi Max,
>
> > In this thread it looks like the plan is to remove the old state
> declaration API. I think we should consider keeping the old APIs to
> avoid breaking too many jobs.
>
> We're not plan to remove any old apis, which means that changes made in V2
> won't affect any V1 DataStream jobs. But V2 is limited to the new state
> declaration API, and users who want to migrate to DataStream V2 will need
> to rewrite their jobs anyway.
>
> Best regards,
>
> Weijie
>
>
> Hangxiang Yu  于2024年3月12日周二 10:26写道:
>
>> Hi, Weijie.
>> Thanks for your answer!
>>
>> > No, Introducing and declaring new state
>> > at runtime is something we want to explicitly disallow.
>>
>> I just thinked about how some operators define their useState() when their
>> real used states may be changed at runtime, e.g. different state types for
>> different state sizes.
>> So these operators only define all states they may use which could be
>> explained by the caller, right ?
>>
>> On Mon, Mar 11, 2024 at 10:57 PM Maximilian Michels 
>> wrote:
>>
>> > The FLIP mentions: "The contents described in this FLIP are all new
>> > APIs and do not involve compatibility issues."
>> >
>> > In this thread it looks like the plan is to remove the old state
>> > declaration API. I think we should consider keeping the old APIs to
>> > avoid breaking too many jobs. The new APIs will still be beneficial
>> > for new jobs, e.g. for SQL jobs.
>> >
>> > -Max
>> >
>> > On Fri, Mar 8, 2024 at 4:39 AM Zakelly Lan 
>> wrote:
>> > >
>> > > Hi Weijie,
>> > >
>> > > Thanks for your answer! Well I get your point. Since partitions are
>> > > first-class citizens, and redistribution means how states migrate when
>> > > partitions change, I'd be fine with deemphasizing the concept of
>> > > keyed/operator state if we highlight the definition of partition in
>> the
>> > > document. Keeping `RedistributionMode` under `StateDeclaration` is
>> also
>> > > fine with me, as I guess it is only for internal usage.
>> > > But still, from a user's point of view,  state can be characterized
>> along
>> > > two relatively independent dimensions, how states redistribute and the
>> > data
>> > > structure. Thus I still suggest a chained-like configuration API that
>> > > configures one aspect on each call, such as:
>> > > ```
>> > > # Keyed stream, no redistribution mode specified, the state will go
>> with
>> > > partition (no redistribution).  Keyed state
>> > > StateDeclaration a = States.declare(name).listState(type);
>> > >
>> > > # Keyed stream, redistribution strategy specified, the state follows
>> the
>> > > specified redistribute strategy.   Operator state
>> > > StateDeclaration b =
>> > > States.declare(name).listState(type).redistributeBy(strategy);
>> > >
>> > > # Non-keyed stream, redistribution strategy *must be* specified.
>> > > StateDeclaration c =
>> > > States.declare(name).listState(type).redistributeBy(strategy);
>> > >
>> > > # Broadcast stream and state
>> > > StateDeclaration d = States.declare(name).mapState(typeK,
>> > > typeV).broadcast();
>> > > ```
>> > > It can drive users to think about redistribution issues when needed.
>> And
>> > it
>> > > also provides more flexibility to add more combinations such as
>> > > broadcasting list state, or chain more configurable aspects such as
>> > adding
>> > > `withTtl()` in future. WDYT?
>> > >
>> > >
>> > > Best,
>> > > Zakelly
>> > >
>> > > On Thu, Mar 7, 2024 at 6:04 PM weijie guo 
>> > wrote:
>> > >
>> > > > Hi Jinzhong,
>> > > >
>> > > > Thanks for the reply!
>> > > >
>> > > > > Overall, I think that the “Eager State Declaration” is a good
>> > proposal,
>> > > > which can enhance Flink's state management capabilities and provide
>> > > > possibilities for subsequent state optimizations.
>> > > >
>> > > > It's nic

Re: [DISCUSS] FLIP-433: State Access on DataStream API V2

2024-03-12 Thread weijie guo
Hi Max,

> In this thread it looks like the plan is to remove the old state
declaration API. I think we should consider keeping the old APIs to
avoid breaking too many jobs.

We're not plan to remove any old apis, which means that changes made in V2
won't affect any V1 DataStream jobs. But V2 is limited to the new state
declaration API, and users who want to migrate to DataStream V2 will need
to rewrite their jobs anyway.

Best regards,

Weijie


Hangxiang Yu  于2024年3月12日周二 10:26写道:

> Hi, Weijie.
> Thanks for your answer!
>
> > No, Introducing and declaring new state
> > at runtime is something we want to explicitly disallow.
>
> I just thinked about how some operators define their useState() when their
> real used states may be changed at runtime, e.g. different state types for
> different state sizes.
> So these operators only define all states they may use which could be
> explained by the caller, right ?
>
> On Mon, Mar 11, 2024 at 10:57 PM Maximilian Michels 
> wrote:
>
> > The FLIP mentions: "The contents described in this FLIP are all new
> > APIs and do not involve compatibility issues."
> >
> > In this thread it looks like the plan is to remove the old state
> > declaration API. I think we should consider keeping the old APIs to
> > avoid breaking too many jobs. The new APIs will still be beneficial
> > for new jobs, e.g. for SQL jobs.
> >
> > -Max
> >
> > On Fri, Mar 8, 2024 at 4:39 AM Zakelly Lan 
> wrote:
> > >
> > > Hi Weijie,
> > >
> > > Thanks for your answer! Well I get your point. Since partitions are
> > > first-class citizens, and redistribution means how states migrate when
> > > partitions change, I'd be fine with deemphasizing the concept of
> > > keyed/operator state if we highlight the definition of partition in the
> > > document. Keeping `RedistributionMode` under `StateDeclaration` is also
> > > fine with me, as I guess it is only for internal usage.
> > > But still, from a user's point of view,  state can be characterized
> along
> > > two relatively independent dimensions, how states redistribute and the
> > data
> > > structure. Thus I still suggest a chained-like configuration API that
> > > configures one aspect on each call, such as:
> > > ```
> > > # Keyed stream, no redistribution mode specified, the state will go
> with
> > > partition (no redistribution).  Keyed state
> > > StateDeclaration a = States.declare(name).listState(type);
> > >
> > > # Keyed stream, redistribution strategy specified, the state follows
> the
> > > specified redistribute strategy.   Operator state
> > > StateDeclaration b =
> > > States.declare(name).listState(type).redistributeBy(strategy);
> > >
> > > # Non-keyed stream, redistribution strategy *must be* specified.
> > > StateDeclaration c =
> > > States.declare(name).listState(type).redistributeBy(strategy);
> > >
> > > # Broadcast stream and state
> > > StateDeclaration d = States.declare(name).mapState(typeK,
> > > typeV).broadcast();
> > > ```
> > > It can drive users to think about redistribution issues when needed.
> And
> > it
> > > also provides more flexibility to add more combinations such as
> > > broadcasting list state, or chain more configurable aspects such as
> > adding
> > > `withTtl()` in future. WDYT?
> > >
> > >
> > > Best,
> > > Zakelly
> > >
> > > On Thu, Mar 7, 2024 at 6:04 PM weijie guo 
> > wrote:
> > >
> > > > Hi Jinzhong,
> > > >
> > > > Thanks for the reply!
> > > >
> > > > > Overall, I think that the “Eager State Declaration” is a good
> > proposal,
> > > > which can enhance Flink's state management capabilities and provide
> > > > possibilities for subsequent state optimizations.
> > > >
> > > > It's nice to see that people who are familiar with the state stuff
> like
> > > > this proposal. :)
> > > >
> > > > >  When the user attempts to access an undeclared state at runtime,
> it
> > is
> > > > more reasonable to throw an exception rather than returning
> > Option#empty,
> > > > as Gyula mentioned above.
> > > >
> > > > Yes, I agree that this is better then a confused empty, and I have
> > modified
> > > > the corresponding part of this FLIP.
> > > >
> > > > > In addition, I'm not quite sure whether all

Re: [DISCUSS] FLIP-433: State Access on DataStream API V2

2024-03-07 Thread weijie guo
Hi Jinzhong,

Thanks for the reply!

> Overall, I think that the “Eager State Declaration” is a good proposal,
which can enhance Flink's state management capabilities and provide
possibilities for subsequent state optimizations.

It's nice to see that people who are familiar with the state stuff like
this proposal. :)

>  When the user attempts to access an undeclared state at runtime, it is
more reasonable to throw an exception rather than returning Option#empty,
as Gyula mentioned above.

Yes, I agree that this is better then a confused empty, and I have modified
the corresponding part of this FLIP.

> In addition, I'm not quite sure whether all of the existing usage in
which states are registered at runtime dynamically can be migrated to the
"Eager State Declaration" style with minimal cost?

I think for most user functions, this is fairly straightforward to migrate.
But states whose declarations depend on runtime information(e.g.
RuntimeContext) are, in principle, not supported in the new API. Anyway,
the old and new apis are completely incompatible, so rewriting jobs is
inevitable. User can think about how to write a good process function that
conforms to the eager declaration style.

> For state TTL, should StateDeclaration also provide interfaces for users
to declare state ttl?

Of course, We can and we need to provide this one. But whether or not it's
in this FLIP isn't very important for me, because we're mainly talking
about the general principles and ways of declaring and accessing state in
this FLIP. I promise we won't leave it out in the end D).



Best regards,

Weijie


Jinzhong Li  于2024年3月7日周四 17:34写道:

> Hi Weijie,
>
> Thanks for driving this!
>
> 1. Overall, I think that the “Eager State Declaration” is a good proposal,
> which can enhance Flink's state management capabilities and provide
> possibilities for subsequent state optimizations.
>
> 2. When the user attempts to access an undeclared state at runtime, it is
> more reasonable to throw an exception rather than returning Option#empty,
> as Gyula mentioned above.
> In addition, I'm not quite sure whether all of the existing usage in which
> states are registered at runtime dynamically can be migrated to the "Eager
> State Declaration" style with minimal cost?
>
> 3. For state TTL, should StateDeclaration also provide interfaces for users
> to declare state ttl?
>
> Best,
> Jinzhong Li
>
>
> On Thu, Mar 7, 2024 at 5:08 PM weijie guo 
> wrote:
>
> > Hi Hangxiang,
> >
> > Thanks for your reply!
> >
> > > We have also discussed in FLIP-359/FLINK-32658 about limiting the user
> > operation to avoid creating state when processElement. Could current
> > interfaces also help this?
> >
> > I think so. It is illegal to create state at runtime in our proposal.
> >
> >
> > > Could you provide more examples about how useStates() works ? Since
> some
> > operations may change their used states at runtime, the value this method
> > returns will be modified at runtime, right?
> >
> >
> > No, Introducing and declaring new state
> > at runtime is something we want to explicitly disallow. You can simply
> > assume that useState is only called when the JobGraph is generated,
> > and any future changes to it are invalid and illegal.
> >
> >
> > > IIUC, RedistributionMode/Strategy should not be used by users, right ?
> > If so, I'm +1 to move them to inner interfaces which seems a bit
> confusing
> > to users.
> >
> >
> > As for this question, I think my answer to Zakelly should be helpful.
> >
> >
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > weijie guo  于2024年3月7日周四 16:58写道:
> >
> > > Hi Zakelly,
> > >
> > > Thanks for your reply!
> > >
> > > > My advice would be to conceal RedistributionMode/Strategy from the
> > > standard user interface, particularly within the helper class 'State'.
> > But
> > > I'm OK to keep it in `StateDeclaration` since its interfaces are
> > basically
> > > used by the framework.
> > >
> > > I'm sorry, I didn't mention some of the details/concepts introduced by
> > the Umbrella FLIP and FLIP-409 in this FLIP. This might make it hard to
> > understand the motivation behind
> > > RedistributionMode, I'll add more context in FLIP then.
> > >
> > >
> > >
> > > Briefly, in V2, we explicitly define the concept of partition. In the
> > case of KeyedPartitionStream, one key corresponds to one partition.For
> > NonKeyedPartitionStream one parallelism/subtask corresponds to one
> > partition. All states are co

Re: [DISCUSS] FLIP-433: State Access on DataStream API V2

2024-03-07 Thread weijie guo
Hi Hangxiang,

Thanks for your reply!

> We have also discussed in FLIP-359/FLINK-32658 about limiting the user
operation to avoid creating state when processElement. Could current
interfaces also help this?

I think so. It is illegal to create state at runtime in our proposal.


> Could you provide more examples about how useStates() works ? Since some
operations may change their used states at runtime, the value this method
returns will be modified at runtime, right?


No, Introducing and declaring new state
at runtime is something we want to explicitly disallow. You can simply
assume that useState is only called when the JobGraph is generated,
and any future changes to it are invalid and illegal.


> IIUC, RedistributionMode/Strategy should not be used by users, right ?
If so, I'm +1 to move them to inner interfaces which seems a bit confusing
to users.


As for this question, I think my answer to Zakelly should be helpful.



Best regards,

Weijie


weijie guo  于2024年3月7日周四 16:58写道:

> Hi Zakelly,
>
> Thanks for your reply!
>
> > My advice would be to conceal RedistributionMode/Strategy from the
> standard user interface, particularly within the helper class 'State'. But
> I'm OK to keep it in `StateDeclaration` since its interfaces are basically
> used by the framework.
>
> I'm sorry, I didn't mention some of the details/concepts introduced by the 
> Umbrella FLIP and FLIP-409 in this FLIP. This might make it hard to 
> understand the motivation behind
> RedistributionMode, I'll add more context in FLIP then.
>
>
>
> Briefly, in V2, we explicitly define the concept of partition. In the case of 
> KeyedPartitionStream, one key corresponds to one partition.For 
> NonKeyedPartitionStream one parallelism/subtask corresponds to one partition. 
> All states are considered to be confined within the partition. On this basis, 
> an obvious question is whether and how the state should be redistribution 
> when the partition changes? So we divide the state into three categories:
>
>- Don't need to redistribute states when the partition changes.
>- Has to decide how to distribute states when the partition changes.
>- Always has the same state across different partitions.
>
> After introducing the concept of partition, the redistribution pattern/mode 
> of state is the more essential difference between states. For this reason, we 
> don't want to emphasize keyed/operator state in the V2 API
> any
> more. Keep in mind, partition are first-class citizens. And, even in V1, we 
> have to let the user know that split/union are two different strategies for 
> list state.
>
>
>
> As for whether or not to expose RedistributionMode to users, I have an open 
> mind. But as I said just now, we still can't avoid this problem in the 
> splitRedistributionListState and unionRedistributionListState. IMO, it's 
> better to explain it in the API level instead of avoiding it. WDTY?
>
> Best regards,
>
> Weijie
>
>
> weijie guo  于2024年3月7日周四 16:39写道:
>
>> Hi Gyula,
>>
>>
>> Thanks for your reply!
>>
>>
>> Let me answer these questions:
>>
>>
>> > What is the semantics of the usesStates method? When is it called? Can
>> the used state change dynamically at runtime? Can the logic depend on 
>> something computed in open(..) for example?
>>
>>
>>
>> useStates is used to predefine all the states that the process function 
>> needs to access. In other words, we want to avoid declaring the state 
>> dynamically at runtime and this allows the SQL planner and JM to optimize 
>> the job better. As a result, this logic must be fully available at compile 
>> time (when the JobGraph is generated), so it can't rely on computations that 
>> are executed after deploy to TM.
>>
>>
>> >
>> Currently state access is pretty dynamic in Flink and I would assume many 
>> jobs create states on the fly based on some required logic. Are we planning 
>> to address these use-cases?
>>
>>
>> It depends on what type of context we need. If the type and number of states 
>> depend on runtime context, that's something we want to avoid. If it only 
>> depended on information available at compile time, I think we could support
>> it.
>>
>>
>> >
>> Are we planning to support deleting/dropping states that are not required 
>> anymore?
>>
>>
>>
>> We really don't want the user to be able to dynamically declare/delete a 
>> state at runtime, but if you just want to clear/clean the value of state, 
>> the new API works the same as the old API.
>>
>>
>> > I think if a state is not declared or otherwise cannot be 

Re: [DISCUSS] FLIP-433: State Access on DataStream API V2

2024-03-07 Thread weijie guo
Hi Zakelly,

Thanks for your reply!

> My advice would be to conceal RedistributionMode/Strategy from the
standard user interface, particularly within the helper class 'State'. But
I'm OK to keep it in `StateDeclaration` since its interfaces are basically
used by the framework.

I'm sorry, I didn't mention some of the details/concepts introduced by
the Umbrella FLIP and FLIP-409 in this FLIP. This might make it hard
to understand the motivation behind
RedistributionMode, I'll add more context in FLIP then.


Briefly, in V2, we explicitly define the concept of partition. In the
case of KeyedPartitionStream, one key corresponds to one partition.For
NonKeyedPartitionStream one parallelism/subtask corresponds to one
partition. All states are considered to be confined within the
partition. On this basis, an obvious question is whether and how the
state should be redistribution when the partition changes? So we
divide the state into three categories:

   - Don't need to redistribute states when the partition changes.
   - Has to decide how to distribute states when the partition changes.
   - Always has the same state across different partitions.

After introducing the concept of partition, the redistribution
pattern/mode of state is the more essential difference between states.
For this reason, we don't want to emphasize keyed/operator state in
the V2 API
any
more. Keep in mind, partition are first-class citizens. And, even in
V1, we have to let the user know that split/union are two different
strategies for list state.


As for whether or not to expose RedistributionMode to users, I have an
open mind. But as I said just now, we still can't avoid this problem
in the splitRedistributionListState and unionRedistributionListState.
IMO, it's better to explain it in the API level instead of avoiding
it. WDTY?

Best regards,

Weijie


weijie guo  于2024年3月7日周四 16:39写道:

> Hi Gyula,
>
>
> Thanks for your reply!
>
>
> Let me answer these questions:
>
>
> > What is the semantics of the usesStates method? When is it called? Can
> the used state change dynamically at runtime? Can the logic depend on 
> something computed in open(..) for example?
>
>
>
> useStates is used to predefine all the states that the process function needs 
> to access. In other words, we want to avoid declaring the state dynamically 
> at runtime and this allows the SQL planner and JM to optimize the job better. 
> As a result, this logic must be fully available at compile time (when the 
> JobGraph is generated), so it can't rely on computations that are executed 
> after deploy to TM.
>
>
> >
> Currently state access is pretty dynamic in Flink and I would assume many 
> jobs create states on the fly based on some required logic. Are we planning 
> to address these use-cases?
>
>
> It depends on what type of context we need. If the type and number of states 
> depend on runtime context, that's something we want to avoid. If it only 
> depended on information available at compile time, I think we could support
> it.
>
>
> >
> Are we planning to support deleting/dropping states that are not required 
> anymore?
>
>
>
> We really don't want the user to be able to dynamically declare/delete a 
> state at runtime, but if you just want to clear/clean the value of state, the 
> new API works the same as the old API.
>
>
> > I think if a state is not declared or otherwise cannot be accessed, an
> exceptions must be thrown. We cannot confuse empty value with something
> inaccessible.
>
>
> After thinking about it a bit more, I think you have a point!
> It's important to make a clear distinction between an empty state and illegal 
> access, especially since flink currently discourage setting a non-null 
> default value for the state.
> I will modify the proposal as you suggested then :)
>
>
> > The RedistributionMode enum sounds a bit strange to me, as it doesn't
> actually specify a mode of redistribution. It feels more like a flag. Can
> we simply have an Optional instead?
>
>
> We actually define three types RedistributionMode instead of two because
> we don't want to think of IDENTICAL as a redistribution strategy, it's just
> an invariant: the State of that type is always the same across partitions.
> If it only has None and REDISTRIBUTABLE, I think your proposal is
> feasible then. But we don't want to confuse these three semantics/modes.
>
>
> > BroadcastStates are currently very limited by only Map-like states, and
> the new interface also enforces that. Can we remove this limitation? If
> not, should broadcastState declaration extend mapstate declaration?
>
>
>
> Personally, I don't want to make this restriction. This is also why the 
> method in StateManager to get BroadcastState has the pa

Re: [DISCUSS] FLIP-433: State Access on DataStream API V2

2024-03-07 Thread weijie guo
butionMode/Strategy
> > from the standard user interface, particularly within the helper class
> > 'State'. But I'm OK to keep it in `StateDeclaration` since its interfaces
> > are basically used by the framework. My preferred syntax would be:
> > ```
> > StateDeclaration a = State.declare(name).keyed().listState(type);
> > StateDeclaration b = State.declare(name).broadcast().mapState(typeK,
> > typeV);
> > StateDeclaration c = State.declare(name).keyed().aggregatingState(type,
> > function);
> > ```
> > WDYT?
> >
> >
> > Best,
> > Zakelly
> >
> > On Wed, Mar 6, 2024 at 11:04 PM Gyula Fóra  wrote:
> >
> > > Hi Weijie!
> > >
> > > Thank you for the proposal.
> > >
> > > I have some initial questions to start the discussion:
> > >
> > > 1. What is the semantics of the usesStates method? When is it called?
> Can
> > > the used state change dynamically at runtime? Can the logic depend on
> > > something computed in open(..) for example?
> > >
> > > Currently state access is pretty dynamic in Flink and I would assume
> many
> > > jobs create states on the fly based on some required logic. Are we
> > planning
> > > to address these use-cases?
> > >
> > > Are we planning to support deleting/dropping states that are not
> required
> > > anymore?
> > >
> > > 2. Get state now returns an optional, but you mention that:
> > > " If you want to get a state that is not declared or has no access,
> > > Option#empty is returned."
> > >
> > > I think if a state is not declared or otherwise cannot be accessed, an
> > > exceptions must be thrown. We cannot confuse empty value with something
> > > inaccessible.
> > >
> > > 3. The RedistributionMode enum sounds a bit strange to me, as it
> doesn't
> > > actually specify a mode of redistribution. It feels more like a flag.
> Can
> > > we simply have an Optional instead?
> > >
> > > 4. BroadcastStates are currently very limited by only Map-like states,
> > and
> > > the new interface also enforces that.
> > > Can we remove this limitation? If not, should broadcastState
> declaration
> > > extend mapstate declaration?
> > >
> > > Cheers,
> > > Gyula
> > >
> > > Cheers
> > > Gyuka
> > >
> > > On Wed, Mar 6, 2024 at 11:18 AM weijie guo 
> > > wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I'd like to start a discussion about FLIP-433: State Access on
> > > > DataStream API V2
> > > > [1]. This is the third sub-FLIP of DataStream API V2.
> > > >
> > > >
> > > > After FLIP-410 [2], we can already write a simple stateless job using
> > the
> > > > DataStream V2 API.  But as we all know, stateful computing is Flink's
> > > trump
> > > > card. In this FLIP, we will discuss how to declare and access state
> on
> > > > DataStream API V2 and we manage to avoid some of the shortcomings of
> V1
> > > in
> > > > this regard.
> > > >
> > > > You can find more details in this FLIP. Its relationship with other
> > > > sub-FLIPs can be found in the umbrella FLIP
> > > > [3]. Looking forward to hearing from you, thanks!
> > > >
> > > >
> > > > Best regards,
> > > >
> > > > Weijie
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-433%3A+State+Access+on+DataStream+API+V2
> > > >
> > > > [2]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-410%3A++Config%2C+Context+and+Processing+Timer+Service+of+DataStream+API+V2
> > > >
> > > > [3]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2
> > > >
> > >
> >
>
>
> --
> Best,
> Hangxiang.
>


[DISCUSS] FLIP-433: State Access on DataStream API V2

2024-03-06 Thread weijie guo
Hi devs,

I'd like to start a discussion about FLIP-433: State Access on
DataStream API V2
[1]. This is the third sub-FLIP of DataStream API V2.


After FLIP-410 [2], we can already write a simple stateless job using the
DataStream V2 API.  But as we all know, stateful computing is Flink's trump
card. In this FLIP, we will discuss how to declare and access state on
DataStream API V2 and we manage to avoid some of the shortcomings of V1 in
this regard.

You can find more details in this FLIP. Its relationship with other
sub-FLIPs can be found in the umbrella FLIP
[3]. Looking forward to hearing from you, thanks!


Best regards,

Weijie


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-433%3A+State+Access+on+DataStream+API+V2

[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-410%3A++Config%2C+Context+and+Processing+Timer+Service+of+DataStream+API+V2

[3]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2


[jira] [Created] (FLINK-34549) FLIP-410: Config, Context and Processing Timer Service of DataStream API V2

2024-02-29 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-34549:
--

 Summary: FLIP-410: Config, Context and Processing Timer Service of 
DataStream API V2
 Key: FLINK-34549
 URL: https://issues.apache.org/jira/browse/FLINK-34549
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataStream
Reporter: Weijie Guo
Assignee: Weijie Guo


This is the umbrella ticket for FLIP-410: Config, Context and Processing Timer 
Service of DataStream API V2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34548) FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-02-29 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-34548:
--

 Summary: FLIP-409: DataStream V2 Building Blocks: DataStream, 
Partitioning and ProcessFunction
 Key: FLINK-34548
 URL: https://issues.apache.org/jira/browse/FLINK-34548
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataStream
Reporter: Weijie Guo
Assignee: Weijie Guo


This is the umbrella ticket for FLIP-409: DataStream V2 Building Blocks: 
DataStream, Partitioning and ProcessFunction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34547) Flink DataStream API V2

2024-02-29 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-34547:
--

 Summary: Flink DataStream API V2
 Key: FLINK-34547
 URL: https://issues.apache.org/jira/browse/FLINK-34547
 Project: Flink
  Issue Type: New Feature
  Components: API / DataStream
Reporter: Weijie Guo
Assignee: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[RESULT][VOTE] FLIP-410: Config, Context and Processing Timer Service of DataStream API V2

2024-02-29 Thread weijie guo
Hi devs,

I'm happy to announce that FLIP-410: Config, Context and Processing
Timer Service of DataStream API V2 [1] has been accepted with 5
approving votes (4 binding) [2]:


- Xintong Song (binding)

- Weijie Guo (binding)

- Rui Fan (binding)

- Xuannan Su (non-binding)

- Guowei Ma (binding)


There are no disapproving votes. Thanks to everyone who participated
in the discussion and voting.


Best regards,

Weijie


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-410%3A++Config%2C+Context+and+Processing+Timer+Service+of+DataStream+API+V2

[2] https://lists.apache.org/thread/wrwbv80q21oo3jsndpw7lnw7xgs0hfpg


[RESULT][VOTE] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-02-29 Thread weijie guo
Hi devs,


I'm happy to announce that FLIP-409: DataStream V2 Building Blocks:
DataStream, Partitioning and ProcessFunction [1] has been accepted
with 5 approving votes (4 binding) [2]:


- Xintong Song (binding)

- Weijie Guo (binding)

- Rui Fan (binding)

- Guowei Ma (binding)

- Xuannan Su (non-binding)


There are no disapproving votes. Thanks to everyone who participated
in the discussion and voting.


Best regards,

Weijie


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction

[2] https://lists.apache.org/thread/yovqgkv33r8773nx9gcc7bl5hwt8y867


[RESULT][VOTE] FLIP-408: [Umbrella] Introduce DataStream API V2

2024-02-29 Thread weijie guo
Hi devs,


I'm happy to announce that FLIP-408: [Umbrella] Introduce DataStream
API V2 [1] has been accepted with 5 approving votes (4 binding) [2]:


- Xintong Song (binding)

- Weijie Guo (binding)

- Xuannan Su (non-binding)

- Rui Fan (binding)

- Guowei Ma (binding)


There are no disapproving votes. Thanks to everyone who participated
in the discussion and voting.


Best regards,

Weijie


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2

[2] https://lists.apache.org/thread/pl9s1s1k1kfqcs9fh4qs79zmryzlnpro


Re: [VOTE] FLIP-314: Support Customized Job Lineage Listener

2024-02-28 Thread weijie guo
+1 (binding)

Best regards,

Weijie


Feng Jin  于2024年2月29日周四 09:37写道:

> +1 (non-binding)
>
> Best,
> Feng Jin
>
> On Thu, Feb 29, 2024 at 4:41 AM Márton Balassi 
> wrote:
>
> > +1 (binding)
> >
> > Marton
> >
> > On Wed, Feb 28, 2024 at 5:14 PM Gyula Fóra  wrote:
> >
> > > +1 (binding)
> > >
> > > Gyula
> > >
> > > On Wed, Feb 28, 2024 at 11:10 AM Maciej Obuchowski <
> > mobuchow...@apache.org
> > > >
> > > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Best,
> > > > Maciej Obuchowski
> > > >
> > > > śr., 28 lut 2024 o 10:29 Zhanghao Chen 
> > > > napisał(a):
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > Best,
> > > > > Zhanghao Chen
> > > > > 
> > > > > From: Yong Fang 
> > > > > Sent: Wednesday, February 28, 2024 10:12
> > > > > To: dev 
> > > > > Subject: [VOTE] FLIP-314: Support Customized Job Lineage Listener
> > > > >
> > > > > Hi devs,
> > > > >
> > > > > I would like to restart a vote about FLIP-314: Support Customized
> Job
> > > > > Lineage Listener[1].
> > > > >
> > > > > Previously, we added lineage related interfaces in FLIP-314. Before
> > the
> > > > > interfaces were developed and merged into the master, @Maciej and
> > > > > @Zhenqiu provided valuable suggestions for the interface from the
> > > > > perspective of the lineage system. So we updated the interfaces of
> > > > FLIP-314
> > > > > and discussed them again in the discussion thread [2].
> > > > >
> > > > > So I am here to initiate a new vote on FLIP-314, the vote will be
> > open
> > > > for
> > > > > at least 72 hours unless there is an objection or insufficient
> votes
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
> > > > > [2]
> https://lists.apache.org/thread/wopprvp3ww243mtw23nj59p57cghh7mc
> > > > >
> > > > > Best,
> > > > > Fang Yong
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] FLIP-410: Config, Context and Processing Timer Service of DataStream API V2

2024-02-26 Thread weijie guo
+1(binding)

Best regards,

Weijie


Xintong Song  于2024年2月27日周二 09:37写道:

> +1 (binding)
>
> Best,
>
> Xintong
>
>
>
> On Mon, Feb 26, 2024 at 6:10 PM weijie guo 
> wrote:
>
> > Hi everyone,
> >
> >
> > Thanks for all the feedback about the FLIP-410: Config, Context and
> > Processing Timer Service of DataStream API V2 [1]. The discussion
> > thread is here [2].
> >
> >
> > The vote will be open for at least 72 hours unless there is an
> > objection or insufficient votes.
> >
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-410%3A++Config%2C+Context+and+Processing+Timer+Service+of+DataStream+API+V2
> >
> > [2] https://lists.apache.org/thread/70gf028c5gsdb9qhsgpht0chzyp9nogc
> >
> >
> > Best regards,
> >
> > Weijie
> >
>


Re: [VOTE] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-02-26 Thread weijie guo
+1(binding)

Best regards,

Weijie


Xintong Song  于2024年2月27日周二 09:38写道:

> +1 (binding)
>
> Best,
>
> Xintong
>
>
>
> On Mon, Feb 26, 2024 at 6:09 PM weijie guo 
> wrote:
>
> > Hi everyone,
> >
> >
> > Thanks for all the feedback about the FLIP-409: DataStream V2 Building
> > Blocks: DataStream, Partitioning and ProcessFunction [1]. The
> > discussion thread is here [2].
> >
> >
> > The vote will be open for at least 72 hours unless there is an
> > objection or insufficient votes.
> >
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction
> >
> > [2] https://lists.apache.org/thread/cwds0bwbgy3lfdgnlqbfhm6lfvx2qbrv
> >
> >
> > Best regards,
> >
> > Weijie
> >
>


Re: [VOTE] FLIP-408: [Umbrella] Introduce DataStream API V2

2024-02-26 Thread weijie guo
+1(binding)

Best regards,

Weijie


Xintong Song  于2024年2月27日周二 09:36写道:

> +1 (binding)
>
> Best,
>
> Xintong
>
>
>
> On Mon, Feb 26, 2024 at 6:08 PM weijie guo 
> wrote:
>
> > Hi everyone,
> >
> >
> > Thanks for all the feedback about the FLIP-408: [Umbrella] Introduce
> > DataStream API V2 [1]. The discussion thread is here [2].
> >
> >
> > The vote will be open for at least 72 hours unless there is an
> > objection or insufficient votes.
> >
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2
> >
> > [2] https://lists.apache.org/thread/w8olky9s7fo5h8fl3nj3qbym307zk2l0
> >
> > Best regards,
> >
> > Weijie
> >
>


[VOTE] FLIP-410: Config, Context and Processing Timer Service of DataStream API V2

2024-02-26 Thread weijie guo
Hi everyone,


Thanks for all the feedback about the FLIP-410: Config, Context and
Processing Timer Service of DataStream API V2 [1]. The discussion
thread is here [2].


The vote will be open for at least 72 hours unless there is an
objection or insufficient votes.


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-410%3A++Config%2C+Context+and+Processing+Timer+Service+of+DataStream+API+V2

[2] https://lists.apache.org/thread/70gf028c5gsdb9qhsgpht0chzyp9nogc


Best regards,

Weijie


[VOTE] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-02-26 Thread weijie guo
Hi everyone,


Thanks for all the feedback about the FLIP-409: DataStream V2 Building
Blocks: DataStream, Partitioning and ProcessFunction [1]. The
discussion thread is here [2].


The vote will be open for at least 72 hours unless there is an
objection or insufficient votes.


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction

[2] https://lists.apache.org/thread/cwds0bwbgy3lfdgnlqbfhm6lfvx2qbrv


Best regards,

Weijie


[VOTE] FLIP-408: [Umbrella] Introduce DataStream API V2

2024-02-26 Thread weijie guo
Hi everyone,


Thanks for all the feedback about the FLIP-408: [Umbrella] Introduce
DataStream API V2 [1]. The discussion thread is here [2].


The vote will be open for at least 72 hours unless there is an
objection or insufficient votes.


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2

[2] https://lists.apache.org/thread/w8olky9s7fo5h8fl3nj3qbym307zk2l0

Best regards,

Weijie


Re: [DISCUSS] FLIP-408: [Umbrella] Introduce DataStream API V2

2024-02-25 Thread weijie guo
Hi Guowei,

thanks for your reply!

> Do connectors and SQL currently have similar problems?

- Connectors:
The APIs for FLIP-27 based source and Sink-V2 are currently in flink-core,
and we will gradually move them to flink-core-api. Anyway, connector should
be free of this problem.

- SQL/Table:

For pure SQL job, the user does not need to have any dependencies when
writing the SQL query string.

But for the Table API, it relies on the original DataStream(i.e. DataStream
V1), so the implementation is not decoupled from the API.

In the future, we will consider building the Table API on top of DataStream
API V2, which solves this problem also.


Best regards,

Weijie


Guowei Ma  于2024年2月26日周一 14:37写道:

> Hi,weijie
>
> Thank you very much to Weijie for proposing this series of improvements,
> especially the complete decoupling of user interface and implementation.
> This part is actually a very serious problem that disturbs downstream users
> in the community. I hope this problem can be completely solved in the
> future.
>
> However, regarding the API decoupling part, I have a question: Do
> connectors and SQL currently have similar problems? If so, will similar
> methods be used to solve them?
>
> Best,
> Guowei
>
>
> On Tue, Feb 20, 2024 at 3:10 PM weijie guo 
> wrote:
>
> > Hi All,
> >
> > Thanks for all the feedback.
> >
> > If there are no more comments, I would like to start the vote thread,
> > thanks again!
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Xintong Song  于2024年1月30日周二 11:04写道:
> >
> > > Thanks for working on this, Weijie.
> > >
> > > The design flaws of the current DataStream API (i.e., V1) have been a
> > pain
> > > for a long time. It's great to see efforts going on trying to resolve
> > them.
> > >
> > > Significant changes to such an important and comprehensive set of
> public
> > > APIs deserves caution. From that perspective, the ideas of introducing
> a
> > > new set of APIs that gradually replace the current one, splitting the
> > > introducing of the new APIs into many separate FLIPs, and making
> > > intermediate APIs @Experiemental until all of them are completed make
> > > great sense to me.
> > >
> > > Besides, the ideas of generalized watermark, execution hints sound
> quite
> > > interesting. Looking forward to more detailed discussions in the
> > > corresponding sub-FLIPs.
> > >
> > > +1 for the roadmap.
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Tue, Jan 30, 2024 at 11:00 AM weijie guo  >
> > > wrote:
> > >
> > > > Hi Wencong:
> > > >
> > > > > The Processing TimerService is currently
> > > > defined as one of the basic primitives, partly because it's
> understood
> > > that
> > > > you have to choose between processing time and event time.
> > > > The other part of the reason is that it needs to work based on the
> > task's
> > > > mailbox thread model to avoid concurrency issues. Could you clarify
> the
> > > > second
> > > > part of the reason?
> > > >
> > > > Since the processing logic of the operators takes place in the
> mailbox
> > > > thread, the processing timer's callback function must also be
> executed
> > in
> > > > the mailbox to ensure thread safety.
> > > > If we do not define the Processing TimerService as primitive, there
> is
> > no
> > > > way for the user to dispatch custom logic to the mailbox thread.
> > > >
> > > >
> > > > Best regards,
> > > >
> > > > Weijie
> > > >
> > > >
> > > > Xuannan Su  于2024年1月29日周一 17:12写道:
> > > >
> > > > > Hi Weijie,
> > > > >
> > > > > Thanks for driving the work! There are indeed many pain points in
> the
> > > > > current DataStream API, which are challenging to resolve with its
> > > > > existing design. It is a great opportunity to propose a new
> > DataStream
> > > > > API that tackles these issues. I like the way we've divided the
> FLIP
> > > > > into multiple sub-FLIPs; the roadmap is clear and comprehensible.
> +1
> > > > > for the umbrella FLIP. I am eager to see the sub-FLIPs!
> > > > >
> > > > > Best regards,
> > > > > Xuannan
> > > > >
> > > >

Re: [DISCUSS] FLIP-408: [Umbrella] Introduce DataStream API V2

2024-02-19 Thread weijie guo
Hi All,

Thanks for all the feedback.

If there are no more comments, I would like to start the vote thread,
thanks again!

Best regards,

Weijie


Xintong Song  于2024年1月30日周二 11:04写道:

> Thanks for working on this, Weijie.
>
> The design flaws of the current DataStream API (i.e., V1) have been a pain
> for a long time. It's great to see efforts going on trying to resolve them.
>
> Significant changes to such an important and comprehensive set of public
> APIs deserves caution. From that perspective, the ideas of introducing a
> new set of APIs that gradually replace the current one, splitting the
> introducing of the new APIs into many separate FLIPs, and making
> intermediate APIs @Experiemental until all of them are completed make
> great sense to me.
>
> Besides, the ideas of generalized watermark, execution hints sound quite
> interesting. Looking forward to more detailed discussions in the
> corresponding sub-FLIPs.
>
> +1 for the roadmap.
>
> Best,
>
> Xintong
>
>
>
> On Tue, Jan 30, 2024 at 11:00 AM weijie guo 
> wrote:
>
> > Hi Wencong:
> >
> > > The Processing TimerService is currently
> > defined as one of the basic primitives, partly because it's understood
> that
> > you have to choose between processing time and event time.
> > The other part of the reason is that it needs to work based on the task's
> > mailbox thread model to avoid concurrency issues. Could you clarify the
> > second
> > part of the reason?
> >
> > Since the processing logic of the operators takes place in the mailbox
> > thread, the processing timer's callback function must also be executed in
> > the mailbox to ensure thread safety.
> > If we do not define the Processing TimerService as primitive, there is no
> > way for the user to dispatch custom logic to the mailbox thread.
> >
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Xuannan Su  于2024年1月29日周一 17:12写道:
> >
> > > Hi Weijie,
> > >
> > > Thanks for driving the work! There are indeed many pain points in the
> > > current DataStream API, which are challenging to resolve with its
> > > existing design. It is a great opportunity to propose a new DataStream
> > > API that tackles these issues. I like the way we've divided the FLIP
> > > into multiple sub-FLIPs; the roadmap is clear and comprehensible. +1
> > > for the umbrella FLIP. I am eager to see the sub-FLIPs!
> > >
> > > Best regards,
> > > Xuannan
> > >
> > >
> > >
> > >
> > > On Wed, Jan 24, 2024 at 8:55 PM Wencong Liu 
> > wrote:
> > > >
> > > > Hi Weijie,
> > > >
> > > >
> > > > Thank you for the effort you've put into the DataStream API ! By
> > > reorganizing and
> > > > redesigning the DataStream API, as well as addressing some of the
> > > unreasonable
> > > > designs within it, we can enhance the efficiency of job development
> for
> > > developers.
> > > > It also allows developers to design more flexible Flink jobs to meet
> > > business requirements.
> > > >
> > > >
> > > > I have conducted a comprehensive review of the DataStream API design
> in
> > > versions
> > > > 1.18 and 1.19. I found quite a few functional defects in the
> DataStream
> > > API, such as the
> > > > lack of corresponding APIs in batch processing scenarios. In the
> > > upcoming 1.20 version,
> > > > I will further improve the DataStream API in batch computing
> scenarios.
> > > >
> > > >
> > > > The issues existing in the old DataStream API (which can be referred
> to
> > > as V1) can be
> > > > addressed from a design perspective in the initial version of V2. I
> > hope
> > > to also have the
> > > >  opportunity to participate in the development of DataStream V2 and
> > make
> > > my contribution.
> > > >
> > > >
> > > > Regarding FLIP-408, I have a question: The Processing TimerService is
> > > currently
> > > > defined as one of the basic primitives, partly because it's
> understood
> > > that
> > > > you have to choose between processing time and event time.
> > > > The other part of the reason is that it needs to work based on the
> > task's
> > > > mailbox thread model to avoid concurrency issues. Could you clarify
> the
> > > second
> > > > part of the reason?
> > >

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-02-19 Thread weijie guo
Hi All,

Thanks for all the feedback.

If there are no more comments, I would like to start the vote thread,
thanks again!

Best regards,

Weijie


Xintong Song  于2024年2月20日周二 14:17写道:

> Thanks for the updates. LGTM.
>
> Best,
>
> Xintong
>
>
>
> On Mon, Feb 19, 2024 at 10:51 AM weijie guo 
> wrote:
>
> > Thanks for the reply, Xintong.
> >
> > Based on your comments, I made the following changes to this FLIP:
> >
> > 1. Renaming `TwoInputStreamProcessFunction` and
> > `BroadcastTwoInputStreamProcessFunction` to
> > `TwoInputNonBroadcastStreamProcessFunction` and
> > `TwoInputBroadcastStreamProcessFunction`, respectively.
> >
> > 2. Making `NonPartitionedContext` extend `RuntimeContext`.
> >
> > > Some of these changes also affect FLIP-410. I noticed that FLIP-410 is
> > also updated accordingly. It would be nice to also mention those changes
> in
> > the FLIP-410 discussion thread.
> >
> > Yes, I've now mentioned those updates in the FLIP-410 discussion thread.
> >
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Xintong Song  于2024年2月5日周一 10:58写道:
> >
> > > Thanks for updating the FLIP, Weijie.
> > >
> > > I think separating the TwoInputProcessFunction according to whether the
> > > input stream contains BroadcastStream makes sense.
> > >
> > > I have a few more comments.
> > > 1. I'd suggest the names `TwoInputNonBroadcastStreamProcessFunction`
> and
> > > `TwoInputBroadcastStreamProcessFunction` for the separated methods.
> > > 2. I'd suggest making `NonPartitionedContext` extend `RuntimeContext`.
> > > Otherwise, for all the functionalities that `RuntimeContext` provides,
> we
> > > need to duplicate them for `NonPartitionedContext`.
> > > 3. Some of these changes also affect FLIP-410. I noticed that FLIP-410
> is
> > > also updated accordingly. It would be nice to also mention those
> changes
> > in
> > > the FLIP-410 discussion thread.
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Sun, Feb 4, 2024 at 11:23 AM weijie guo 
> > > wrote:
> > >
> > > > Hi Xuannan and Xintong,
> > > >
> > > > Good point! After further consideration, I feel that we should make
> the
> > > > Broadcast + NonKeyed/Keyed process function different from the normal
> > > > TwoInputProcessFunction. Because the record from the broadcast input
> > > indeed
> > > > correspond to all partitions, while the record from the non-broadcast
> > > edge
> > > > have explicit partitions.
> > > >
> > > > When we consider the data of broadcast input, it is only valid to do
> > > > something on all the partitions at once, such as things like
> > > > `applyToKeyedState`. Similarly, other operations(e.g, endOfInput)
> that
> > do
> > > > not determine the current partition should also only be allowed to
> > > perform
> > > > on all partitions. This FLIP has been updated.
> > > >
> > > > Best regards,
> > > >
> > > > Weijie
> > > >
> > > >
> > > > Xintong Song  于2024年2月1日周四 11:31写道:
> > > >
> > > > > OK, I see your point.
> > > > >
> > > > > I think the demand for updating states and emitting outputs upon
> > > > receiving
> > > > > a broadcast record makes sense. However, the way
> > > > > `KeyedBroadcastProcessFunction` supports this may not be optimal.
> > E.g.,
> > > > if
> > > > > `Collector#collect` is called in `processBroadcastElement` but
> > outside
> > > of
> > > > > `Context#applyToKeyedState`, the result can be undefined.
> > > > >
> > > > > Currently in this FLIP, a `TwoInputStreamProcessFunction` is not
> > aware
> > > of
> > > > > which input is KeyedStream and which is BroadcastStream, which
> makes
> > > > > supporting things like `applyToKeyedState` difficult. I think we
> can
> > > > > provide a built-in function similar to
> > `KeyedBroadcastProcessFunction`
> > > on
> > > > > top of `TwoInputStreamProcessFunction` to address this demand.
> > > > >
> > > > > WDYT?
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Xintong

Re: [DISCUSS] FLIP-410: Config, Context and Processing Timer Service of DataStream API V2

2024-02-19 Thread weijie guo
Hi All,

Thanks for all the feedback.

If there are no more comments, I would like to start the vote thread,
thanks again!

Best regards,

Weijie


Xintong Song  于2024年2月20日周二 14:17写道:

> LGTM
>
> Best,
>
> Xintong
>
>
>
> On Mon, Feb 19, 2024 at 10:48 AM weijie guo 
> wrote:
>
> > Hi All,
> >
> > Based on the discussion thread of FLIP-409, I did a synchronized update
> to
> > this one. In simple terms, added `TwoInputBroadcastStreamProcessFunction`
> > related content.
> >
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > weijie guo  于2024年1月31日周三 15:00写道:
> >
> > > Hi Xintong,
> > >
> > > Thanks for the quick reply.
> > >
> > > > Why introduce a new `MetricManager` rather than just return
> > `MetricGroup`
> > > from `RuntimeContext`?
> > >
> > > This is to facilitate possible future extensions. But I thought it
> > > through, MetricGroup itself also plays the role of a manager.
> > > So I think you are right, I will add a `getMetricGroup` method directly
> > in
> > > `RuntimeContext`.
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > Xintong Song  于2024年1月31日周三 14:02写道:
> > >
> > >> >
> > >> > > How can users define custom metrics within the `ProcessFunction`?
> > >> > Will there be a method like `getMetricGroup` available in the
> > >> > `RuntimeContext`?
> > >> >
> > >> > I think this is a reasonable request. For extensibility, I have
> added
> > >> the
> > >> > getMetricManager instead of getMetricGroup to RuntimeContext, we can
> > >> use it
> > >> > to get the MetricGroup.
> > >> >
> > >>
> > >> Why introduce a new `MetricManager` rather than just return
> > `MetricGroup`
> > >> from `RuntimeContext`?
> > >>
> > >> > Q2. The FLIP describes the interface for handling processing
> > >> >  timers (ProcessingTimeManager), but it does not mention
> > >> > how to delete or update an existing timer. V1 API provides
> TimeService
> > >> > that could delete a timer. Does this mean that
> > >> >  once a timer is registered, it cannot be changed?
> > >> >
> > >> > I think we do need to introduce a method to delete the timer, but
> I'm
> > >> kind
> > >> > of curious why we need to update the timer instead of registering a
> > new
> > >> > one. Anyway, I have updated the FLIP to support delete the timer.
> > >> >
> > >>
> > >> Registering a new timer does not mean the old timer should be removed.
> > >> There could be multiple timers.
> > >>
> > >> If we don't support deleting timers, developers can still decide to do
> > >> nothing upon the timer is triggered. In that case, they will need
> > >> additional logic to decide whether the timer should be skipped or not
> in
> > >> `onProcessingTimer`. Besides, there could also be additional
> performance
> > >> overhead in frequent calling and skipping the callback.
> > >>
> > >> Best,
> > >>
> > >> Xintong
> > >>
> > >>
> > >>
> > >> On Tue, Jan 30, 2024 at 3:26 PM weijie guo  >
> > >> wrote:
> > >>
> > >> > Hi Wencong,
> > >> >
> > >> > > Q1. In the "Configuration" section, it is mentioned that
> > >> > configurations can be set continuously using the withXXX methods.
> > >> > Are these configuration options the same as those provided by
> > DataStream
> > >> > V1,
> > >> > or might there be different options compared to V1?
> > >> >
> > >> > I haven't considered options that don't exist in V1 yet, but we may
> > have
> > >> > some new options as we continue to develop.
> > >> >
> > >> > > Q2. The FLIP describes the interface for handling processing
> > >> >  timers (ProcessingTimeManager), but it does not mention
> > >> > how to delete or update an existing timer. V1 API provides
> TimeService
> > >> > that could delete a timer. Does this mean that
> > >> >  once a timer is registered, it cannot be changed?
> > >> >
> > >> > I thin

Re: [ANNOUNCE] New Apache Flink Committer - Jiabao Sun

2024-02-19 Thread weijie guo
Congratulations, Jiabao :)

Best regards,

Weijie


Hang Ruan  于2024年2月19日周一 18:04写道:

> Congratulations, Jiabao!
>
> Best,
> Hang
>
> Qingsheng Ren  于2024年2月19日周一 17:53写道:
>
> > Hi everyone,
> >
> > On behalf of the PMC, I'm happy to announce Jiabao Sun as a new Flink
> > Committer.
> >
> > Jiabao began contributing in August 2022 and has contributed 60+ commits
> > for Flink main repo and various connectors. His most notable contribution
> > is being the core author and maintainer of MongoDB connector, which is
> > fully functional in DataStream and Table/SQL APIs. Jiabao is also the
> > author of FLIP-377 and the main contributor of JUnit 5 migration in
> runtime
> > and table planner modules.
> >
> > Beyond his technical contributions, Jiabao is an active member of our
> > community, participating in the mailing list and consistently
> volunteering
> > for release verifications and code reviews with enthusiasm.
> >
> > Please join me in congratulating Jiabao for becoming an Apache Flink
> > committer!
> >
> > Best,
> > Qingsheng (on behalf of the Flink PMC)
> >
>


Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-02-18 Thread weijie guo
Thanks for the reply, Xintong.

Based on your comments, I made the following changes to this FLIP:

1. Renaming `TwoInputStreamProcessFunction` and
`BroadcastTwoInputStreamProcessFunction` to
`TwoInputNonBroadcastStreamProcessFunction` and
`TwoInputBroadcastStreamProcessFunction`, respectively.

2. Making `NonPartitionedContext` extend `RuntimeContext`.

> Some of these changes also affect FLIP-410. I noticed that FLIP-410 is
also updated accordingly. It would be nice to also mention those changes in
the FLIP-410 discussion thread.

Yes, I've now mentioned those updates in the FLIP-410 discussion thread.


Best regards,

Weijie


Xintong Song  于2024年2月5日周一 10:58写道:

> Thanks for updating the FLIP, Weijie.
>
> I think separating the TwoInputProcessFunction according to whether the
> input stream contains BroadcastStream makes sense.
>
> I have a few more comments.
> 1. I'd suggest the names `TwoInputNonBroadcastStreamProcessFunction` and
> `TwoInputBroadcastStreamProcessFunction` for the separated methods.
> 2. I'd suggest making `NonPartitionedContext` extend `RuntimeContext`.
> Otherwise, for all the functionalities that `RuntimeContext` provides, we
> need to duplicate them for `NonPartitionedContext`.
> 3. Some of these changes also affect FLIP-410. I noticed that FLIP-410 is
> also updated accordingly. It would be nice to also mention those changes in
> the FLIP-410 discussion thread.
>
> Best,
>
> Xintong
>
>
>
> On Sun, Feb 4, 2024 at 11:23 AM weijie guo 
> wrote:
>
> > Hi Xuannan and Xintong,
> >
> > Good point! After further consideration, I feel that we should make the
> > Broadcast + NonKeyed/Keyed process function different from the normal
> > TwoInputProcessFunction. Because the record from the broadcast input
> indeed
> > correspond to all partitions, while the record from the non-broadcast
> edge
> > have explicit partitions.
> >
> > When we consider the data of broadcast input, it is only valid to do
> > something on all the partitions at once, such as things like
> > `applyToKeyedState`. Similarly, other operations(e.g, endOfInput) that do
> > not determine the current partition should also only be allowed to
> perform
> > on all partitions. This FLIP has been updated.
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Xintong Song  于2024年2月1日周四 11:31写道:
> >
> > > OK, I see your point.
> > >
> > > I think the demand for updating states and emitting outputs upon
> > receiving
> > > a broadcast record makes sense. However, the way
> > > `KeyedBroadcastProcessFunction` supports this may not be optimal. E.g.,
> > if
> > > `Collector#collect` is called in `processBroadcastElement` but outside
> of
> > > `Context#applyToKeyedState`, the result can be undefined.
> > >
> > > Currently in this FLIP, a `TwoInputStreamProcessFunction` is not aware
> of
> > > which input is KeyedStream and which is BroadcastStream, which makes
> > > supporting things like `applyToKeyedState` difficult. I think we can
> > > provide a built-in function similar to `KeyedBroadcastProcessFunction`
> on
> > > top of `TwoInputStreamProcessFunction` to address this demand.
> > >
> > > WDYT?
> > >
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Thu, Feb 1, 2024 at 10:41 AM Xuannan Su 
> > wrote:
> > >
> > > > Hi Weijie and Xingtong,
> > > >
> > > > Thanks for the reply! Please see my comments below.
> > > >
> > > > > Does this mean if we want to support (KeyedStream, BroadcastStream)
> > ->
> > > > > (KeyedStream), we must make sure that no data can be output upon
> > > > processing
> > > > > records from the input BroadcastStream? That's probably a
> reasonable
> > > > > limitation.
> > > >
> > > > I don't think that the requirement for supporting (KeyedStream,
> > > > BroadcastStream) -> (KeyedStream) is that no data can be output upon
> > > > processing the BroadcastStream. For instance, in the current
> > > > `KeyedBroadcastProcessFunction`, we use Context#applyToKeyedState to
> > > > produce output results, which can be keyed in the same manner as the
> > > > keyed input stream, upon processing data from the BroadcastStream.
> > > > Therefore, I believe it only requires that the user must ensure that
> > > > the output is keyed in the same way as the input, in this case, the
> > > > same way as the k

Re: [DISCUSS] FLIP-410: Config, Context and Processing Timer Service of DataStream API V2

2024-02-18 Thread weijie guo
Hi All,

Based on the discussion thread of FLIP-409, I did a synchronized update to
this one. In simple terms, added `TwoInputBroadcastStreamProcessFunction`
related content.


Best regards,

Weijie


weijie guo  于2024年1月31日周三 15:00写道:

> Hi Xintong,
>
> Thanks for the quick reply.
>
> > Why introduce a new `MetricManager` rather than just return `MetricGroup`
> from `RuntimeContext`?
>
> This is to facilitate possible future extensions. But I thought it
> through, MetricGroup itself also plays the role of a manager.
> So I think you are right, I will add a `getMetricGroup` method directly in
> `RuntimeContext`.
>
> Best regards,
>
> Weijie
>
>
> Xintong Song  于2024年1月31日周三 14:02写道:
>
>> >
>> > > How can users define custom metrics within the `ProcessFunction`?
>> > Will there be a method like `getMetricGroup` available in the
>> > `RuntimeContext`?
>> >
>> > I think this is a reasonable request. For extensibility, I have added
>> the
>> > getMetricManager instead of getMetricGroup to RuntimeContext, we can
>> use it
>> > to get the MetricGroup.
>> >
>>
>> Why introduce a new `MetricManager` rather than just return `MetricGroup`
>> from `RuntimeContext`?
>>
>> > Q2. The FLIP describes the interface for handling processing
>> >  timers (ProcessingTimeManager), but it does not mention
>> > how to delete or update an existing timer. V1 API provides TimeService
>> > that could delete a timer. Does this mean that
>> >  once a timer is registered, it cannot be changed?
>> >
>> > I think we do need to introduce a method to delete the timer, but I'm
>> kind
>> > of curious why we need to update the timer instead of registering a new
>> > one. Anyway, I have updated the FLIP to support delete the timer.
>> >
>>
>> Registering a new timer does not mean the old timer should be removed.
>> There could be multiple timers.
>>
>> If we don't support deleting timers, developers can still decide to do
>> nothing upon the timer is triggered. In that case, they will need
>> additional logic to decide whether the timer should be skipped or not in
>> `onProcessingTimer`. Besides, there could also be additional performance
>> overhead in frequent calling and skipping the callback.
>>
>> Best,
>>
>> Xintong
>>
>>
>>
>> On Tue, Jan 30, 2024 at 3:26 PM weijie guo 
>> wrote:
>>
>> > Hi Wencong,
>> >
>> > > Q1. In the "Configuration" section, it is mentioned that
>> > configurations can be set continuously using the withXXX methods.
>> > Are these configuration options the same as those provided by DataStream
>> > V1,
>> > or might there be different options compared to V1?
>> >
>> > I haven't considered options that don't exist in V1 yet, but we may have
>> > some new options as we continue to develop.
>> >
>> > > Q2. The FLIP describes the interface for handling processing
>> >  timers (ProcessingTimeManager), but it does not mention
>> > how to delete or update an existing timer. V1 API provides TimeService
>> > that could delete a timer. Does this mean that
>> >  once a timer is registered, it cannot be changed?
>> >
>> > I think we do need to introduce a method to delete the timer, but I'm
>> kind
>> > of curious why we need to update the timer instead of registering a new
>> > one. Anyway, I have updated the FLIP to support delete the timer.
>> >
>> >
>> >
>> > Best regards,
>> >
>> > Weijie
>> >
>> >
>> > weijie guo  于2024年1月30日周二 14:35写道:
>> >
>> > > Hi Xuannan,
>> > >
>> > > > 1. +1 to only use XXXParititionStream if users only need to use the
>> > > configurable PartitionStream.  If there are use cases for both,
>> > > perhaps we could use `ProcessConfigurableNonKeyedPartitionStream` or
>> > > `ConfigurableNonKeyedPartitionStream` for simplicity.
>> > >
>> > > As for why we need both, you can refer to my reply to Yunfeng's first
>> > > question. As for the name, I can accept
>> > > ProcessConfigurableNonKeyedPartitionStream or keep the status quo.
>> But I
>> > > don't want to change it to ConfigurableNonKeyedPartitionStream, the
>> > reason
>> > > is the same, because the configuration is applied to the Process
>> rather
>> > > than the Stream.
>> > >
>

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-02-03 Thread weijie guo
Hi Xuannan and Xintong,

Good point! After further consideration, I feel that we should make the
Broadcast + NonKeyed/Keyed process function different from the normal
TwoInputProcessFunction. Because the record from the broadcast input indeed
correspond to all partitions, while the record from the non-broadcast edge
have explicit partitions.

When we consider the data of broadcast input, it is only valid to do
something on all the partitions at once, such as things like
`applyToKeyedState`. Similarly, other operations(e.g, endOfInput) that do
not determine the current partition should also only be allowed to perform
on all partitions. This FLIP has been updated.

Best regards,

Weijie


Xintong Song  于2024年2月1日周四 11:31写道:

> OK, I see your point.
>
> I think the demand for updating states and emitting outputs upon receiving
> a broadcast record makes sense. However, the way
> `KeyedBroadcastProcessFunction` supports this may not be optimal. E.g., if
> `Collector#collect` is called in `processBroadcastElement` but outside of
> `Context#applyToKeyedState`, the result can be undefined.
>
> Currently in this FLIP, a `TwoInputStreamProcessFunction` is not aware of
> which input is KeyedStream and which is BroadcastStream, which makes
> supporting things like `applyToKeyedState` difficult. I think we can
> provide a built-in function similar to `KeyedBroadcastProcessFunction` on
> top of `TwoInputStreamProcessFunction` to address this demand.
>
> WDYT?
>
>
> Best,
>
> Xintong
>
>
>
> On Thu, Feb 1, 2024 at 10:41 AM Xuannan Su  wrote:
>
> > Hi Weijie and Xingtong,
> >
> > Thanks for the reply! Please see my comments below.
> >
> > > Does this mean if we want to support (KeyedStream, BroadcastStream) ->
> > > (KeyedStream), we must make sure that no data can be output upon
> > processing
> > > records from the input BroadcastStream? That's probably a reasonable
> > > limitation.
> >
> > I don't think that the requirement for supporting (KeyedStream,
> > BroadcastStream) -> (KeyedStream) is that no data can be output upon
> > processing the BroadcastStream. For instance, in the current
> > `KeyedBroadcastProcessFunction`, we use Context#applyToKeyedState to
> > produce output results, which can be keyed in the same manner as the
> > keyed input stream, upon processing data from the BroadcastStream.
> > Therefore, I believe it only requires that the user must ensure that
> > the output is keyed in the same way as the input, in this case, the
> > same way as the keyed input stream. I think this requirement is
> > consistent with that of (KeyedStream, KeyedStream) -> (KeyedStream).
> > Thus, I believe that supporting (KeyedStream, BroadcastStream) ->
> > (KeyedStream) will not introduce complexity for the users. WDYT?
> >
> > Best regards,
> > Xuannan
> >
> >
> > On Tue, Jan 30, 2024 at 3:12 PM weijie guo 
> > wrote:
> > >
> > > Hi Xintong,
> > >
> > > Thanks for your reply.
> > >
> > > > Does this mean if we want to support (KeyedStream, BroadcastStream)
> ->
> > > (KeyedStream), we must make sure that no data can be output upon
> > processing
> > > records from the input BroadcastStream? That's probably a reasonable
> > > limitation.
> > >
> > > I think so, this is the restriction that has to be imposed in order to
> > > avoid re-partition(i.e. shuffle).
> > > If one just want to get a keyed-stream and don't care about the data
> > > distribution, then explicit KeyBy partitioning works as expected.
> > >
> > > > The problem is would this limitation be too implicit for the users to
> > > understand.
> > >
> > > Since we can't check for this limitation at compile time, if we were to
> > add
> > > support for this case, we would have to introduce additional runtime
> > checks
> > > to ensure program correctness. For now, I'm inclined not to support it,
> > as
> > > it's hard for users to understand this restriction unless we have
> > something
> > > better. And we can always add it later if we do realize there's a
> strong
> > > demand for it.
> > >
> > > > 1. I'd suggest renaming the method with timestamp to something like
> > > `collectAndOverwriteTimestamp`. That might help users understand that
> > they
> > > don't always need to call this method, unless they explicitly want to
> > > overwrite the timestamp.
> > >
> > > Make sense, I have updated this FLIP toward this new method name.
> > >
> > > &g

Re: [DISCUSS] FLIP-410: Config, Context and Processing Timer Service of DataStream API V2

2024-01-30 Thread weijie guo
Hi Xintong,

Thanks for the quick reply.

> Why introduce a new `MetricManager` rather than just return `MetricGroup`
from `RuntimeContext`?

This is to facilitate possible future extensions. But I thought it through,
MetricGroup itself also plays the role of a manager.
So I think you are right, I will add a `getMetricGroup` method directly in
`RuntimeContext`.

Best regards,

Weijie


Xintong Song  于2024年1月31日周三 14:02写道:

> >
> > > How can users define custom metrics within the `ProcessFunction`?
> > Will there be a method like `getMetricGroup` available in the
> > `RuntimeContext`?
> >
> > I think this is a reasonable request. For extensibility, I have added the
> > getMetricManager instead of getMetricGroup to RuntimeContext, we can use
> it
> > to get the MetricGroup.
> >
>
> Why introduce a new `MetricManager` rather than just return `MetricGroup`
> from `RuntimeContext`?
>
> > Q2. The FLIP describes the interface for handling processing
> >  timers (ProcessingTimeManager), but it does not mention
> > how to delete or update an existing timer. V1 API provides TimeService
> > that could delete a timer. Does this mean that
> >  once a timer is registered, it cannot be changed?
> >
> > I think we do need to introduce a method to delete the timer, but I'm
> kind
> > of curious why we need to update the timer instead of registering a new
> > one. Anyway, I have updated the FLIP to support delete the timer.
> >
>
> Registering a new timer does not mean the old timer should be removed.
> There could be multiple timers.
>
> If we don't support deleting timers, developers can still decide to do
> nothing upon the timer is triggered. In that case, they will need
> additional logic to decide whether the timer should be skipped or not in
> `onProcessingTimer`. Besides, there could also be additional performance
> overhead in frequent calling and skipping the callback.
>
> Best,
>
> Xintong
>
>
>
> On Tue, Jan 30, 2024 at 3:26 PM weijie guo 
> wrote:
>
> > Hi Wencong,
> >
> > > Q1. In the "Configuration" section, it is mentioned that
> > configurations can be set continuously using the withXXX methods.
> > Are these configuration options the same as those provided by DataStream
> > V1,
> > or might there be different options compared to V1?
> >
> > I haven't considered options that don't exist in V1 yet, but we may have
> > some new options as we continue to develop.
> >
> > > Q2. The FLIP describes the interface for handling processing
> >  timers (ProcessingTimeManager), but it does not mention
> > how to delete or update an existing timer. V1 API provides TimeService
> > that could delete a timer. Does this mean that
> >  once a timer is registered, it cannot be changed?
> >
> > I think we do need to introduce a method to delete the timer, but I'm
> kind
> > of curious why we need to update the timer instead of registering a new
> > one. Anyway, I have updated the FLIP to support delete the timer.
> >
> >
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > weijie guo  于2024年1月30日周二 14:35写道:
> >
> > > Hi Xuannan,
> > >
> > > > 1. +1 to only use XXXParititionStream if users only need to use the
> > > configurable PartitionStream.  If there are use cases for both,
> > > perhaps we could use `ProcessConfigurableNonKeyedPartitionStream` or
> > > `ConfigurableNonKeyedPartitionStream` for simplicity.
> > >
> > > As for why we need both, you can refer to my reply to Yunfeng's first
> > > question. As for the name, I can accept
> > > ProcessConfigurableNonKeyedPartitionStream or keep the status quo. But
> I
> > > don't want to change it to ConfigurableNonKeyedPartitionStream, the
> > reason
> > > is the same, because the configuration is applied to the Process rather
> > > than the Stream.
> > >
> > > > Should we allow users to set custom configurations through the
> > > `ProcessConfigurable` interface and access these configurations in the
> > > `ProcessFunction` via `RuntimeContext`? I believe it would be useful
> > > for process function developers to be able to define custom
> > > configurations.
> > >
> > > If I understand you correctly, you want to set custom properties for
> > > processing. The current configurations are mostly for the runtime
> engine,
> > > such as determining the underlying operator 's parallelism and SSG. But
> > I'm
> > > not aware of the need to pass in a custom valu

Re: [VOTE] FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment

2024-01-30 Thread weijie guo
+1(binding)


Best regards,

Weijie


Rui Fan <1996fan...@gmail.com> 于2024年1月31日周三 12:51写道:

> +1(binding)
>
> Best,
> Rui
>
> On Wed, Jan 31, 2024 at 12:46 PM Xintong Song 
> wrote:
>
> > +1
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Wed, Jan 31, 2024 at 11:41 AM Xuannan Su 
> wrote:
> >
> > > Hi everyone,
> > >
> > > Thanks for all the feedback about the FLIP-331: Support
> > > EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute
> > > to optimize task deployment [1] [2].
> > >
> > > I'd like to start a vote for it. The vote will be open for at least 72
> > > hours(excluding weekends,until Feb 5, 12:00AM GMT) unless there is an
> > > objection or an insufficient number of votes.
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-331%3A+Support+EndOfStreamTrigger+and+isOutputOnlyAfterEndOfStream+operator+attribute+to+optimize+task+deployment
> > > [2] https://lists.apache.org/thread/qq39rmg3f23ysx5m094s4c4cq0m4tdj5
> > >
> > >
> > > Best,
> > > Xuannan
> > >
> >
>


Re: Re: [DISCUSS] FLIP-331: Support EndOfStreamWindows and isOutputOnEOF operator attribute to optimize task deployment

2024-01-29 Thread weijie guo
Thanks Dong and Xuannan,

The updated FLIP LGTM.


Best regards,

Weijie


Xuannan Su  于2024年1月30日周二 11:10写道:

> Hi all,
>
> Thanks for the comments and suggestions. If there are no further
> comments, I will open the voting thread tomorrow.
>
> Best regards,
> Xuannan
> On Fri, Jan 26, 2024 at 3:16 PM Dong Lin  wrote:
> >
> > Thanks Xuannan for the update!
> >
> > +1 (binding)
> >
> > On Wed, Jan 10, 2024 at 5:54 PM Xuannan Su 
> wrote:
> >
> > > Hi all,
> > >
> > > After several rounds of offline discussions with Xingtong and Jinhao,
> > > we have decided to narrow the scope of the FLIP. It will now focus on
> > > introducing OperatorAttributes that indicate whether an operator emits
> > > records only after inputs have ended. We will also use the attribute
> > > to optimize task scheduling for better resource utilization. Setting
> > > the backlog status and optimizing the operator implementation during
> > > the backlog will be deferred to future work.
> > >
> > > In addition to the change above, we also make the following changes to
> > > the FLIP to address the problems mentioned by Dong:
> > > - Public interfaces are updated to reuse the GlobalWindows.
> > > - Instead of making all outputs of the upstream operators of the
> > > "isOutputOnlyAfterEndOfStream=true" operator blocking, we only make
> > > the output of the operator with "isOutputOnlyAfterEndOfStream=true"
> > > blocking. This can prevent the second problem Dong mentioned. In the
> > > future, we may introduce an extra OperatorAttributes to indicate if an
> > > operator has any side output.
> > >
> > > I would greatly appreciate any comment or feedback you may have on the
> > > updated FLIP.
> > >
> > > Best regards,
> > > Xuannan
> > >
> > > On Tue, Sep 26, 2023 at 11:24 AM Dong Lin  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Thanks for the review!
> > > >
> > > > Becket and I discussed this FLIP offline and we agreed on several
> things
> > > > that need to be improved with this FLIP. I will summarize our
> discussion
> > > > with the problems and TODOs. We will update the FLIP and let you know
> > > once
> > > > the FLIP is ready for review again.
> > > >
> > > > 1) Investigate whether it is possible to update the existing
> > > GlobalWindows
> > > > in a backward-compatible way and re-use it for the same purpose
> > > > as EndOfStreamWindows, without introducing EndOfStreamWindows as a
> new
> > > > class.
> > > >
> > > > Note that GlobalWindows#getDefaultTrigger returns a NeverTrigger
> instance
> > > > which will not trigger window's computation even on end-of-inputs. We
> > > will
> > > > need to investigate its existing usage and see if we can re-use it
> in a
> > > > backward-compatible way.
> > > >
> > > > 2) Let JM know whether any operator in the upstream of the operator
> with
> > > > "isOutputOnEOF=true" will emit output via any side channel. The FLIP
> > > should
> > > > update the execution mode of those operators *only if* all outputs
> from
> > > > those operators are emitted only at the end of input.
> > > >
> > > > More specifically, the upstream operator might involve a user-defined
> > > > operator that might emit output directly to an external service,
> where
> > > the
> > > > emission operation is not explicitly expressed as an operator's
> output
> > > edge
> > > > and thus not visible to JM. Similarly, it is also possible for the
> > > > user-defined operator to register a timer
> > > > via InternalTimerService#registerEventTimeTimer and emit output to an
> > > > external service inside Triggerable#onEventTime. There is a chance
> that
> > > > users still need related logic to output data in real-time, even if
> the
> > > > downstream operators have isOutputOnEOF=true.
> > > >
> > > > One possible solution to address this problem is to add an extra
> > > > OperatorAttribute to specify whether this operator might output
> records
> > > in
> > > > such a way that does not go through operator's output (e.g. side
> output).
> > > > Then the JM can safely enable the runtime optimization currently
> > > described
> > > > in the FLIP when there is no such operator.
> > > >
> > > > 3) Create a follow-up FLIP that allows users to specify whether a
> source
> > > > with Boundedness=bounded should have isProcessingBacklog=true.
> > > >
> > > > This capability would effectively introduce a 3rd strategy to set
> backlog
> > > > status (in addition to FLIP-309 and FLIP-328). It might be useful to
> note
> > > > that, even though the data in bounded sources are backlog data in
> most
> > > > practical use-cases, it is not necessarily true. For example, users
> might
> > > > want to start a Flink job to consume real-time data from a Kafka
> topic
> > > and
> > > > specify that the job stops after 24 hours, which means the source is
> > > > technically bounded while the data is fresh/real-time.
> > > >
> > > > This capability is more generic and can cover more use-case than
> > > > EndOfStreamWindows. On the other hand, 

Re: [DISCUSS] FLIP-410: Config, Context and Processing Timer Service of DataStream API V2

2024-01-29 Thread weijie guo
Hi Wencong,

> Q1. In the "Configuration" section, it is mentioned that
configurations can be set continuously using the withXXX methods.
Are these configuration options the same as those provided by DataStream V1,
or might there be different options compared to V1?

I haven't considered options that don't exist in V1 yet, but we may have
some new options as we continue to develop.

> Q2. The FLIP describes the interface for handling processing
 timers (ProcessingTimeManager), but it does not mention
how to delete or update an existing timer. V1 API provides TimeService
that could delete a timer. Does this mean that
 once a timer is registered, it cannot be changed?

I think we do need to introduce a method to delete the timer, but I'm kind
of curious why we need to update the timer instead of registering a new
one. Anyway, I have updated the FLIP to support delete the timer.



Best regards,

Weijie


weijie guo  于2024年1月30日周二 14:35写道:

> Hi Xuannan,
>
> > 1. +1 to only use XXXParititionStream if users only need to use the
> configurable PartitionStream.  If there are use cases for both,
> perhaps we could use `ProcessConfigurableNonKeyedPartitionStream` or
> `ConfigurableNonKeyedPartitionStream` for simplicity.
>
> As for why we need both, you can refer to my reply to Yunfeng's first
> question. As for the name, I can accept
> ProcessConfigurableNonKeyedPartitionStream or keep the status quo. But I
> don't want to change it to ConfigurableNonKeyedPartitionStream, the reason
> is the same, because the configuration is applied to the Process rather
> than the Stream.
>
> > Should we allow users to set custom configurations through the
> `ProcessConfigurable` interface and access these configurations in the
> `ProcessFunction` via `RuntimeContext`? I believe it would be useful
> for process function developers to be able to define custom
> configurations.
>
> If I understand you correctly, you want to set custom properties for
> processing. The current configurations are mostly for the runtime engine,
> such as determining the underlying operator 's parallelism and SSG. But I'm
> not aware of the need to pass in a custom value(independent of the
> framework itself) and then get it at runtime from RuntimeContext. Could
> you give some examples?
>
> > How can users define custom metrics within the `ProcessFunction`?
> Will there be a method like `getMetricGroup` available in the
> `RuntimeContext`?
>
> I think this is a reasonable request. For extensibility, I have added the
> getMetricManager instead of getMetricGroup to RuntimeContext, we can use
> it to get the MetricGroup.
>
>
> Best regards,
>
> Weijie
>
>
> weijie guo  于2024年1月30日周二 13:45写道:
>
>> Thanks Yunfeng,
>>
>> Let me try to answer your question :)
>>
>> > 1. Would it be better to have all XXXPartitionStream classes implement
>> ProcessConfigurable, instead of defining both XXXPartitionStream and
>> ProcessConfigurableAndXXXPartitionStream? I wonder whether users would
>> need to operate on a non-configurable PartitionStream.
>>
>> I thought about this for a while and decided to separate DataStream from
>> ProcessConfigurable. At the core of this is that streams and c
>> onfigurations are completely orthogonal concepts, and configuration is
>> only responsible for the `Process`, not the `Stream`. This is why only
>> the `process/connectAndProcess` returns configurable stream, but
>> partitioning like `KeyBy` returns a pure DataStream. This may also answer
>> your second question in passing.
>>
>>
>> > Apart from the detailed withConfigFoo(foo)/withConfigBar(bar)
>> methods, would it be better to also add a general
>> withConfig(configKey, configValue) method to the ProcessConfigurable
>> interface? Adding a method for each configuration might harm the
>> readability and compatibility of configurations.
>>
>> Sorry, I may not fully understand this question. ProcessConfigurable
>> simply refers to the configuration of the Process, which can have the name,
>> parallelism, etc of the process. It's not actually the 
>> Configuratiion(Contains
>> a lot of ConfigOptions) that we usually talk about, but more like
>> `SingleOutputStreamOperator` in DataStream V1.
>>
>> Best regards,
>>
>> Weijie
>>
>>
>> Xuannan Su  于2024年1月29日周一 18:45写道:
>>
>>> Hi Weijie,
>>>
>>> Thanks for the FLIP! I have a few questions regarding the FLIP.
>>>
>>> 1. +1 to only use XXXParititionStream if users only need to use the
>>> configurable PartitionStream.  If there are use cases for both,
>>> perhaps we could use `ProcessConfig

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-01-29 Thread weijie guo
Hi Xintong,

Thanks for your reply.

> Does this mean if we want to support (KeyedStream, BroadcastStream) ->
(KeyedStream), we must make sure that no data can be output upon processing
records from the input BroadcastStream? That's probably a reasonable
limitation.

I think so, this is the restriction that has to be imposed in order to
avoid re-partition(i.e. shuffle).
If one just want to get a keyed-stream and don't care about the data
distribution, then explicit KeyBy partitioning works as expected.

> The problem is would this limitation be too implicit for the users to
understand.

Since we can't check for this limitation at compile time, if we were to add
support for this case, we would have to introduce additional runtime checks
to ensure program correctness. For now, I'm inclined not to support it, as
it's hard for users to understand this restriction unless we have something
better. And we can always add it later if we do realize there's a strong
demand for it.

> 1. I'd suggest renaming the method with timestamp to something like
`collectAndOverwriteTimestamp`. That might help users understand that they
don't always need to call this method, unless they explicitly want to
overwrite the timestamp.

Make sense, I have updated this FLIP toward this new method name.

> 2. While this method provides a way to set timestamps, how would users
read
timestamps from the records?

Ah, good point. I will introduce a new method to get the timestamp of the
current record in RuntimeContext.


Best regards,

Weijie


Xintong Song  于2024年1月30日周二 14:04写道:

> Just trying to understand.
>
> > Is there a particular reason we do not support a
> > `TwoInputProcessFunction` to combine a KeyedStream with a
> > BroadcastStream to result in a KeyedStream? There seems to be a valid
> > use case where a KeyedStream is enriched with a BroadcastStream and
> > returns a Stream that is partitioned in the same way.
>
>
> > The key point here is that if the returned stream is a KeyedStream, we
> > require that the partition of  input and output be the same. As for the
> > data on the broadcast edge, it will be broadcast to all parallelism, we
> > cannot keep the data partition consistent. For example, if a specific
> > record is sent to both SubTask1 and SubTask2, after processing, the
> > partition index calculated by the new KeySelector is `1`, then the data
> > distribution of SubTask2 has obviously changed.
>
>
> Does this mean if we want to support (KeyedStream, BroadcastStream) ->
> (KeyedStream), we must make sure that no data can be output upon processing
> records from the input BroadcastStream? That's probably a reasonable
> limitation. The problem is would this limitation be too implicit for the
> users to understand.
>
>
> > I noticed that there are two `collect` methods in the Collector,
> >
> > one with a timestamp and one without. Could you elaborate on the
> > differences between them? Additionally, in what use case would one use
> > the method that includes the timestamp?
> >
> >
> > That's a good question, and it's mostly used with time-related operators
> > such as Window. First, we want to give the process function the ability
> to
> > reset timestamps, which makes it more flexible than the original
> > API. Second, we don't want to take the timestamp extraction
> > operator/function as a base primitive, it's more like a high-level
> > extension. Therefore, the framework must provide this functionality.
> >
> >
> 1. I'd suggest renaming the method with timestamp to something like
> `collectAndOverwriteTimestamp`. That might help users understand that they
> don't always need to call this method, unless they explicitly want to
> overwrite the timestamp.
>
> 2. While this method provides a way to set timestamps, how would users read
> timestamps from the records?
>
>
> Best,
>
> Xintong
>
>
>
> On Tue, Jan 30, 2024 at 12:45 PM weijie guo 
> wrote:
>
> > Hi Xuannan,
> >
> > Thank you for your attention.
> >
> > > In the partitioning section, it says that "broadcast can only be
> > used as a side-input of other Inputs." Could you clarify what is meant
> > by "side-input"? If I understand correctly, it refer to one of the
> > inputs of the `TwoInputStreamProcessFunction`. If that is the case,
> > the term "side-input" may not be accurate.
> >
> > Yes, you got it right! I have rewrote this sentence to avoid
> > misunderstanding.
> >
> > > Is there a particular reason we do not support a
> > `TwoInputProcessFunction` to combine a KeyedStream with a
> > BroadcastStream to result in a KeyedStream? There seems t

Re: [DISCUSS] FLIP-410: Config, Context and Processing Timer Service of DataStream API V2

2024-01-29 Thread weijie guo
Hi Xuannan,

> 1. +1 to only use XXXParititionStream if users only need to use the
configurable PartitionStream.  If there are use cases for both,
perhaps we could use `ProcessConfigurableNonKeyedPartitionStream` or
`ConfigurableNonKeyedPartitionStream` for simplicity.

As for why we need both, you can refer to my reply to Yunfeng's first
question. As for the name, I can accept
ProcessConfigurableNonKeyedPartitionStream or keep the status quo. But I
don't want to change it to ConfigurableNonKeyedPartitionStream, the reason
is the same, because the configuration is applied to the Process rather
than the Stream.

> Should we allow users to set custom configurations through the
`ProcessConfigurable` interface and access these configurations in the
`ProcessFunction` via `RuntimeContext`? I believe it would be useful
for process function developers to be able to define custom
configurations.

If I understand you correctly, you want to set custom properties for
processing. The
current configurations are mostly for the runtime engine, such as
determining the underlying operator 's parallelism and SSG. But I'm not
aware of the need to pass in a custom value(independent of the framework
itself) and then get it at runtime from RuntimeContext. Could you give some
examples?

> How can users define custom metrics within the `ProcessFunction`?
Will there be a method like `getMetricGroup` available in the
`RuntimeContext`?

I think this is a reasonable request. For extensibility, I have added the
getMetricManager instead of getMetricGroup to RuntimeContext, we can use it
to get the MetricGroup.


Best regards,

Weijie


weijie guo  于2024年1月30日周二 13:45写道:

> Thanks Yunfeng,
>
> Let me try to answer your question :)
>
> > 1. Would it be better to have all XXXPartitionStream classes implement
> ProcessConfigurable, instead of defining both XXXPartitionStream and
> ProcessConfigurableAndXXXPartitionStream? I wonder whether users would
> need to operate on a non-configurable PartitionStream.
>
> I thought about this for a while and decided to separate DataStream from
> ProcessConfigurable. At the core of this is that streams and configuration
> s are completely orthogonal concepts, and configuration is only
> responsible for the `Process`, not the `Stream`. This is why only the `
> process/connectAndProcess` returns configurable stream, but partitioning
> like `KeyBy` returns a pure DataStream. This may also answer your second
> question in passing.
>
>
> > Apart from the detailed withConfigFoo(foo)/withConfigBar(bar)
> methods, would it be better to also add a general
> withConfig(configKey, configValue) method to the ProcessConfigurable
> interface? Adding a method for each configuration might harm the
> readability and compatibility of configurations.
>
> Sorry, I may not fully understand this question. ProcessConfigurable
> simply refers to the configuration of the Process, which can have the name,
> parallelism, etc of the process. It's not actually the Configuratiion(Contains
> a lot of ConfigOptions) that we usually talk about, but more like
> `SingleOutputStreamOperator` in DataStream V1.
>
> Best regards,
>
> Weijie
>
>
> Xuannan Su  于2024年1月29日周一 18:45写道:
>
>> Hi Weijie,
>>
>> Thanks for the FLIP! I have a few questions regarding the FLIP.
>>
>> 1. +1 to only use XXXParititionStream if users only need to use the
>> configurable PartitionStream.  If there are use cases for both,
>> perhaps we could use `ProcessConfigurableNonKeyedPartitionStream` or
>> `ConfigurableNonKeyedPartitionStream` for simplicity.
>>
>> 2. Should we allow users to set custom configurations through the
>> `ProcessConfigurable` interface and access these configurations in the
>> `ProcessFunction` via `RuntimeContext`? I believe it would be useful
>> for process function developers to be able to define custom
>> configurations.
>>
>> 3. How can users define custom metrics within the `ProcessFunction`?
>> Will there be a method like `getMetricGroup` available in the
>> `RuntimeContext`?
>>
>> Best,
>> Xuannan
>>
>>
>> On Fri, Jan 26, 2024 at 2:38 PM Yunfeng Zhou
>>  wrote:
>> >
>> > Hi Weijie,
>> >
>> > Thanks for introducing this FLIP! I have a few questions about the
>> > designs proposed.
>> >
>> > 1. Would it be better to have all XXXPartitionStream classes implement
>> > ProcessConfigurable, instead of defining both XXXPartitionStream and
>> > ProcessConfigurableAndXXXPartitionStream? I wonder whether users would
>> > need to operate on a non-configurable PartitionStream.
>> >
>> > 2. The name "ProcessConfigurable" seems a little ambiguous to me. Wil

  1   2   3   4   >