Re: [ANNOUNCE] Apache Flink 1.18.1 released

2024-01-25 Thread Jing Ge via user
Hi folks,

The bug has been fixed and the PR at docker-library/official-images has
been merged. The official images are available now.

Best regards,
Jing

On Mon, Jan 22, 2024 at 11:39 AM Jing Ge  wrote:

> Hi folks,
>
> I am still working on the official images because of the issue
> https://issues.apache.org/jira/browse/FLINK-34165. Images under
> apache/flink are available.
>
> Best regards,
> Jing
>
> On Sun, Jan 21, 2024 at 11:06 PM Jing Ge  wrote:
>
>> Thanks Leonard for the feedback! Also thanks @Jark Wu @Chesnay
>> Schepler and each and every one who worked closely
>> with me for this release. We made it together!
>>
>> Best regards,
>> Jing
>>
>> On Sun, Jan 21, 2024 at 9:25 AM Leonard Xu  wrote:
>>
>>> Thanks Jing for driving the release, nice work!
>>>
>>> Thanks to all who were involved in this release!
>>>
>>> Best,
>>> Leonard
>>>
>>> > On Jan 20, 2024 at 12:01 AM, Jing Ge wrote:
>>> >
>>> > The Apache Flink community is very happy to announce the release of
>>> Apache
>>> > Flink 1.18.1, which is the first bugfix release for the Apache Flink
>>> 1.18
>>> > series.
>>> >
>>> > Apache Flink® is an open-source stream processing framework for
>>> > distributed, high-performing, always-available, and accurate data
>>> streaming
>>> > applications.
>>> >
>>> > The release is available for download at:
>>> > https://flink.apache.org/downloads.html
>>> >
>>> > Please check out the release blog post for an overview of the
>>> improvements
>>> > for this bugfix release:
>>> >
>>> https://flink.apache.org/2024/01/19/apache-flink-1.18.1-release-announcement/
>>> >
>>> > Please note: Users that have enabled state compression should not
>>> > migrate to 1.18.1 (nor 1.18.0) due to a critical bug that could lead
>>> > to data loss. Please refer to FLINK-34063 for more information.
>>> >
>>> > The full release notes are available in Jira:
>>> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353640
>>> >
>>> > We would like to thank all contributors of the Apache Flink community
>>> > who made this release possible! Special thanks to @Qingsheng Ren
>>> > @Leonard Xu @Xintong Song @Matthias Pohl @Martijn Visser for the
>>> > support during this release.
>>> >
>>> > A Jira task series based on the Flink release wiki has been created
>>> > for the 1.18.1 release. Tasks that need to be done by the PMC have
>>> > been explicitly created separately, which makes it convenient for the
>>> > release manager to reach out to the PMC for those tasks. Any future
>>> > patch release could consider cloning it and following the standard
>>> > release process.
>>> > https://issues.apache.org/jira/browse/FLINK-33824
>>> >
>>> > Feel free to reach out to the release managers (or respond to this
>>> > thread) with feedback on the release process. Our goal is to
>>> > constantly improve the release process. Feedback on what could be
>>> > improved or things that didn't go so well is appreciated.
>>> >
>>> > Regards,
>>> > Jing
>>>
>>>


Re: [ANNOUNCE] Apache Flink 1.18.1 released

2024-01-22 Thread Jing Ge
Hi folks,

I am still working on the official images because of the issue
https://issues.apache.org/jira/browse/FLINK-34165. Images under
apache/flink are available.

Best regards,
Jing

On Sun, Jan 21, 2024 at 11:06 PM Jing Ge  wrote:

> Thanks Leonard for the feedback! Also thanks @Jark Wu @Chesnay
> Schepler and each and every one who worked closely
> with me for this release. We made it together!
>
> Best regards,
> Jing
>
> On Sun, Jan 21, 2024 at 9:25 AM Leonard Xu  wrote:
>
>> Thanks Jing for driving the release, nice work!
>>
>> Thanks to all who were involved in this release!
>>
>> Best,
>> Leonard
>>
>> > On Jan 20, 2024 at 12:01 AM, Jing Ge wrote:
>> >
>> > The Apache Flink community is very happy to announce the release of
>> Apache
>> > Flink 1.18.1, which is the first bugfix release for the Apache Flink
>> 1.18
>> > series.
>> >
>> > Apache Flink® is an open-source stream processing framework for
>> > distributed, high-performing, always-available, and accurate data
>> streaming
>> > applications.
>> >
>> > The release is available for download at:
>> > https://flink.apache.org/downloads.html
>> >
>> > Please check out the release blog post for an overview of the
>> improvements
>> > for this bugfix release:
>> >
>> https://flink.apache.org/2024/01/19/apache-flink-1.18.1-release-announcement/
>> >
>> > Please note: Users that have enabled state compression should not
>> > migrate to 1.18.1 (nor 1.18.0) due to a critical bug that could lead
>> > to data loss. Please refer to FLINK-34063 for more information.
>> >
>> > The full release notes are available in Jira:
>> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353640
>> >
>> > We would like to thank all contributors of the Apache Flink community
>> > who made this release possible! Special thanks to @Qingsheng Ren
>> > @Leonard Xu @Xintong Song @Matthias Pohl @Martijn Visser for the
>> > support during this release.
>> >
>> > A Jira task series based on the Flink release wiki has been created
>> > for the 1.18.1 release. Tasks that need to be done by the PMC have
>> > been explicitly created separately, which makes it convenient for the
>> > release manager to reach out to the PMC for those tasks. Any future
>> > patch release could consider cloning it and following the standard
>> > release process.
>> > https://issues.apache.org/jira/browse/FLINK-33824
>> >
>> > Feel free to reach out to the release managers (or respond to this
>> > thread) with feedback on the release process. Our goal is to
>> > constantly improve the release process. Feedback on what could be
>> > improved or things that didn't go so well is appreciated.
>> >
>> > Regards,
>> > Jing
>>
>>


Re: [ANNOUNCE] Apache Flink 1.18.1 released

2024-01-21 Thread Jing Ge via user
Thanks Leonard for the feedback! Also thanks @Jark Wu @Chesnay
Schepler and each and every one who worked closely with
me for this release. We made it together!

Best regards,
Jing

On Sun, Jan 21, 2024 at 9:25 AM Leonard Xu  wrote:

> Thanks Jing for driving the release, nice work!
>
> Thanks to all who were involved in this release!
>
> Best,
> Leonard
>
> > On Jan 20, 2024 at 12:01 AM, Jing Ge wrote:
> >
> > The Apache Flink community is very happy to announce the release of
> Apache
> > Flink 1.18.1, which is the first bugfix release for the Apache Flink 1.18
> > series.
> >
> > Apache Flink® is an open-source stream processing framework for
> > distributed, high-performing, always-available, and accurate data
> streaming
> > applications.
> >
> > The release is available for download at:
> > https://flink.apache.org/downloads.html
> >
> > Please check out the release blog post for an overview of the
> improvements
> > for this bugfix release:
> >
> https://flink.apache.org/2024/01/19/apache-flink-1.18.1-release-announcement/
> >
> > Please note: Users that have enabled state compression should not
> > migrate to 1.18.1 (nor 1.18.0) due to a critical bug that could lead
> > to data loss. Please refer to FLINK-34063 for more information.
> >
> > The full release notes are available in Jira:
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353640
> >
> > We would like to thank all contributors of the Apache Flink community
> > who made this release possible! Special thanks to @Qingsheng Ren
> > @Leonard Xu @Xintong Song @Matthias Pohl @Martijn Visser for the
> > support during this release.
> >
> > A Jira task series based on the Flink release wiki has been created
> > for the 1.18.1 release. Tasks that need to be done by the PMC have
> > been explicitly created separately, which makes it convenient for the
> > release manager to reach out to the PMC for those tasks. Any future
> > patch release could consider cloning it and following the standard
> > release process.
> > https://issues.apache.org/jira/browse/FLINK-33824
> >
> > Feel free to reach out to the release managers (or respond to this
> > thread) with feedback on the release process. Our goal is to
> > constantly improve the release process. Feedback on what could be
> > improved or things that didn't go so well is appreciated.
> >
> > Regards,
> > Jing
>
>


[ANNOUNCE] Apache Flink 1.18.1 released

2024-01-19 Thread Jing Ge
The Apache Flink community is very happy to announce the release of Apache
Flink 1.18.1, which is the first bugfix release for the Apache Flink 1.18
series.

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the improvements
for this bugfix release:
https://flink.apache.org/2024/01/19/apache-flink-1.18.1-release-announcement/

Please note: Users that have enabled state compression should not migrate
to 1.18.1 (nor 1.18.0) due to a critical bug that could lead to data loss.
Please refer to FLINK-34063 for more information.

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353640

We would like to thank all contributors of the Apache Flink community who
made this release possible! Special thanks to @Qingsheng Ren @Leonard Xu
@Xintong Song @Matthias Pohl @Martijn Visser for the support during this
release.

A Jira task series based on the Flink release wiki has been created for the
1.18.1 release. Tasks that need to be done by the PMC have been explicitly
created separately, which makes it convenient for the release manager to
reach out to the PMC for those tasks. Any future patch release could
consider cloning it and following the standard release process.
https://issues.apache.org/jira/browse/FLINK-33824

Feel free to reach out to the release managers (or respond to this thread)
with feedback on the release process. Our goal is to constantly improve the
release process. Feedback on what could be improved or things that didn't
go so well is appreciated.

Regards,
Jing
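
Editor's aside: for readers unsure whether the FLINK-34063 warning applies
to their jobs, the following minimal Java sketch (not part of the original
announcement) shows how to inspect the relevant setting before upgrading.
It assumes only the standard ExecutionConfig snapshot-compression API.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CompressionCheck {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Snapshot compression is off by default; jobs usually enable it via
        // ExecutionConfig#setUseSnapshotCompression(true) or the
        // execution.checkpointing.snapshot-compression option.
        if (env.getConfig().isUseSnapshotCompression()) {
            System.err.println(
                    "State compression is enabled: do not migrate this job "
                            + "to 1.18.0/1.18.1 yet; see FLINK-34063.");
        }
    }
}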


Re: [DISCUSS] FLIP-391: Deprecate RuntimeContext#getExecutionConfig

2023-11-20 Thread Jing Ge via user
Hi Junrui,

Thanks for the clarification. On one hand, adding more methods flat into
the RuntimeContext will increase the effort for users who use
RuntimeContext, but that impact is limited; it is fine. On the other hand,
the big impact is on users who want to focus on the execution config: they
will need to find the needle in the haystack.

I just shared my thoughts and tried to help you look at the issue from many
different angles and I am open to learning opinions from other
contributors. Please feel free to proceed if there are no other objections.

Best regards,
Jing

On Mon, Nov 20, 2023 at 6:50 AM Junrui Lee  wrote:

> Hi Jing,
>
> Thank you for your feedback. I understand your concerns regarding putting
> all methods into the RuntimeContext flat.
>
> I would like to share some of my thoughts on this matter.
> Firstly, this FLIP only proposes the addition of three additional methods,
> which should not impose too much extra burden on users. Secondly, I agree
> that it is important to make it clearer for users to use the
> RuntimeContext. However, reorganizing the RuntimeContext to achieve this
> requires further discussion. We should focus on a more specific and unified
> reorganization of the RuntimeContext interface in future work, rather than
> implementing a temporary solution now. Therefore, I prefer not to add a
> separate abstraction layer for these three methods in this FLIP.
>
> Please feel free to share any further thoughts.
>
> Best regards,
> Junrui
>
> Jing Ge wrote on Mon, Nov 20, 2023 at 05:46:
>
>> Hi Junrui,
>>
>> Thanks for bringing this to our attention. First of all, it makes sense
>> to deprecate RuntimeContext#getExecutionConfig.
>>
>> Afaic, this is an issue of how we design APIs with clean concepts/aspects.
>> There are two issues mentioned in the FLIP:
>>
>> 1. lack of user-facing abstraction - we exposed ExecutionConfig, which
>> mixes methods intended for users with methods that should only be used
>> internally.
>> 2. mutable vs immutable - do we want users to be able to modify configs
>> during job execution?
>>
>> An immutable user-facing abstraction design can solve both issues. All
>> execution-related configs would still be consolidated into the
>> abstraction class and easy to access. There is another design decision
>> here: flat vs. hierarchical. The current FLIP removes the execution
>> config abstraction and puts all methods into the RuntimeContext flat,
>> which will end up with more than 30 methods offered flat by the
>> RuntimeContext. I am not sure if this could help users find the right
>> method in the context of execution config better than before.
>>
>> I might miss something and look forward to your thoughts. Thanks!
>>
>> Best regards,
>> Jing
>>
>> On Sat, Nov 18, 2023 at 11:21 AM Junrui Lee  wrote:
>>
>>> Hello Wencong,
>>>
>>> Thank you for your valuable feedback and suggestions. I want to clarify
>>> that reviewing existing methods in the ExecutionConfig is not directly
>>> related to the proposal in this FLIP. The main focus of this FLIP is to
>>> deprecate the specific method RuntimeContext#getExecutionConfig(). I
>>> believe it is important to keep the scope of this FLIP limited. However,
>>> your suggestion can certainly be considered as a separate FLIP in the
>>> future.
>>>
>>> Best regards,
>>> Junrui
>>>
>>> Wencong Liu wrote on Fri, Nov 17, 2023 at 22:08:
>>>
>>>> Hello Junrui,
>>>>
>>>>
>>>> Thanks for the effort. I agree with the proposal to deprecate the
>>>> getExecutionConfig() method in the RuntimeContext class. Exposing
>>>> the complex ExecutionConfig to user-defined functions can lead to
>>>> unnecessary complexity and risks.
>>>>
>>>>
>>>> I also have a suggestion. We could consider reviewing the existing
>>>> methods in ExecutionConfig. If there are methods that are defined
>>>> in ExecutionConfig but currently have no callers, we could consider
>>>> annotating them as @Internal or directly removing them. Since
>>>> users are no longer able to access and invoke these methods,
>>>> it would be beneficial to clean up the codebase.
>>>>
>>>>
>>>> +1 (non-binding).
>>>>
>>>>
>>>> Best,
>>>> Wencong
>>>>
>>>

Re: [DISCUSS] FLIP-391: Deprecate RuntimeContext#getExecutionConfig

2023-11-19 Thread Jing Ge via user
Hi Junrui,

Thanks for bringing this to our attention. First of all, it makes sense to
deprecate RuntimeContext#getExecutionConfig.

Afaic, this is an issue of how we design APIs with clean concepts/aspects.
There are two issues mentioned in the FLIP:

1. lack of user-facing abstraction - we exposed ExecutionConfig, which mixes
methods intended for users with methods that should only be used internally.
2. mutable vs immutable - do we want users to be able to modify configs
during job execution?

An immutable user-facing abstraction design can solve both issues. All
execution-related configs would still be consolidated into the abstraction
class and easy to access. There is another design decision here: flat vs.
hierarchical. The current FLIP removes the execution config abstraction and
puts all methods into the RuntimeContext flat, which will end up with more
than 30 methods offered flat by the RuntimeContext. I am not sure if this
could help users find the right method in the context of execution config
better than before.

I might miss something and look forward to your thoughts. Thanks!

Best regards,
Jing

On Sat, Nov 18, 2023 at 11:21 AM Junrui Lee  wrote:

> Hello Wencong,
>
> Thank you for your valuable feedback and suggestions. I want to clarify
> that reviewing existing methods in the ExecutionConfig is not directly
> related to the proposal in this FLIP. The main focus of this FLIP is to
> deprecate the specific method RuntimeContext#getExecutionConfig(). I
> believe it is important to keep the scope of this FLIP limited. However,
> your suggestion can certainly be considered as a separate FLIP in the
> future.
>
> Best regards,
> Junrui
>
> Wencong Liu wrote on Fri, Nov 17, 2023 at 22:08:
>
>> Hello Junrui,
>>
>>
>> Thanks for the effort. I agree with the proposal to deprecate the
>> getExecutionConfig() method in the RuntimeContext class. Exposing
>> the complex ExecutionConfig to user-defined functions can lead to
>> unnecessary complexity and risks.
>>
>>
>> I also have a suggestion. We could consider reviewing the existing
>> methods in ExecutionConfig. If there are methods that are defined
>> in ExecutionConfig but currently have no callers, we could consider
>> annotating them as @Internal or directly removing them. Since
>> users are no longer able to access and invoke these methods,
>> it would be beneficial to clean up the codebase.
>>
>>
>> +1 (non-binding).
>>
>>
>> Best,
>> Wencong
>>
>> At 2023-11-15 16:51:15, "Junrui Lee"  wrote:
>> >Hi all,
>> >
>> >I'd like to start a discussion of FLIP-391: Deprecate
>> >RuntimeContext#getExecutionConfig[1].
>> >
>> >Currently, the FLINK RuntimeContext is important for connecting user
>> >functions to the underlying runtime details. It provides users with
>> >necessary runtime information during job execution.
>> >However, the current implementation of the FLINK RuntimeContext exposes
>> the
>> >ExecutionConfig to users, resulting in two issues:
>> >Firstly, the ExecutionConfig contains much unrelated information that can
>> >confuse users and complicate management.
>> >Secondly, exposing the ExecutionConfig allows users to modify it during
>> job
>> >execution, which can cause inconsistencies and problems, especially with
>> >operator chaining.
>> >
>> >Therefore, we propose deprecating the RuntimeContext#getExecutionConfig
>> in
>> >the FLINK RuntimeContext. In the upcoming FLINK-2.0 version, we plan to
>> >completely remove the RuntimeContext#getExecutionConfig method. Instead,
>> we
>> >will introduce alternative getter methods that enable users to access
>> >specific information without exposing unnecessary runtime details. These
>> >getter methods will include:
>> >
>> >1. @PublicEvolving <T> TypeSerializer<T>
>> >createSerializer(TypeInformation<T> typeInformation);
>> >2. @PublicEvolving Map<String, String> getGlobalJobParameters();
>> >3. @PublicEvolving boolean isObjectReuseEnabled();
>> >
>> >Looking forward to your feedback and suggestions, thanks.
>> >
>> >[1]
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278465937
>> >
>> >Best regards,
>> >Junrui
>>
>
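
Editor's aside: to make the proposal concrete, here is a minimal Java
sketch (not from the original thread) of the migration that FLIP-391
implies inside a rich function. The replacement getter is the one listed
in the FLIP quote above; it was still a proposal at the time of this
thread, so the final API may differ.

import org.apache.flink.api.common.functions.RichMapFunction;

public class UppercaseFn extends RichMapFunction<String, String> {
    @Override
    public String map(String value) {
        // Before: reach through the full ExecutionConfig, which FLIP-391
        // proposes to deprecate.
        boolean reuse =
                getRuntimeContext().getExecutionConfig().isObjectReuseEnabled();

        // After (as proposed): ask the RuntimeContext directly, without
        // exposing the rest of the ExecutionConfig.
        // boolean reuse2 = getRuntimeContext().isObjectReuseEnabled();

        return value.toUpperCase();
    }
}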


[ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Jing Ge via user
The Apache Flink community is very happy to announce the release of Apache
Flink 1.18.0, which is the first release for the Apache Flink 1.18 series.

Apache Flink® is an open-source unified stream and batch data processing
framework for distributed, high-performing, always-available, and accurate
data applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the improvements
for this release:
https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885

We would like to thank all contributors of the Apache Flink community who
made this release possible!

Best regards,
Konstantin, Qingsheng, Sergey, and Jing


Re: When will 1.18 be released? I'd like to try JDK 17

2023-10-16 Thread Jing Ge
Soon, voting has already started :-))

On Sun, Oct 15, 2023 at 5:55 AM kcz <573693...@qq.com.invalid> wrote:

>


Re: Flink 1.17.2 planned?

2023-08-23 Thread Jing Ge via user
Hi Christian,

Thanks for your understanding. We will take a look at 1.17.2 once the 1.18
release is done. In the meantime, there might be someone in the community
who volunteers to be the 1.17.2 release manager. You will see related email
threads on the dev mailing list. Please stay tuned :-)

Best regards,
Jing

On Wed, Aug 23, 2023 at 9:27 AM Christian Lorenz 
wrote:

> Hi Jing,
>
>
>
> thanks for the answer. I have no idea what kind of work is needed for
> being a release manager. I think we’ll have to wait for the release then
> (if really urgent, the blocking bug can also be patched by us).
>
>
>
> Kind regards,
>
> Christian
>
>
>
> *From: *Jing Ge via user
> *Date: *Tuesday, August 22, 2023 at 11:40
> *To: *liu ron
> *Cc: *user@flink.apache.org
> *Subject: *Re: Flink 1.17.2 planned?
>
> This email has reached Mapp via an external source
>
>
>
> Hi Christian,
>
>
>
> Thanks for reaching out. As Ron pointed out, the community is
> focusing on the 1.18 release. If you are facing urgent issues, would you
> like to volunteer as the release manager of 1.17.2 and drive the release?
> Theoretically, everyone could be the release manager of a bugfix release.
>
>
>
> Best regards,
>
> Jing
>
>
>
> On Tue, Aug 22, 2023 at 3:41 AM liu ron  wrote:
>
> Hi, Christian
>
>
>
> We released 1.17.1 [1] in May, and the main focus of the community is
> currently on the 1.18 release, so 1.17.2 should be planned for after the
> 1.18 release!
>
>
>
> [1]
> https://flink.apache.org/2023/05/25/apache-flink-1.17.1-release-announcement/
>
>
>
>
>
> Best,
>
> Ron
>
>
>
> Christian Lorenz via user wrote on Mon, Aug 21, 2023 at 17:33:
>
> Hi team,
>
>
>
> Is there any information available about a bugfix release 1.17.2? E.g.
> will there be another bugfix release of 1.17, and what is the approximate
> timing?
>
> We are hit by https://issues.apache.org/jira/browse/FLINK-32296 which
> leads to wrong SQL responses in some circumstances.
>
>
>
> Kind regards,
>
> Christian
>
> This e-mail is from Mapp Digital Group and its international legal
> entities and may contain information that is confidential.
> If you are not the intended recipient, do not read, copy or distribute the
> e-mail or any attachments. Instead, please notify the sender and delete the
> e-mail and any attachments.
>


Fwd: [Discussion] Slack Channel

2023-08-23 Thread Jing Ge via user
Hi devs,

Thanks Giannis for your suggestion. It seems that the last email wasn't
sent to the dev ML. It is also an interesting topic for devs and user-zh.

Best regards,
Jing

-- Forwarded message -
From: Giannis Polyzos 
Date: Tue, Aug 22, 2023 at 11:11 AM
Subject: [Discussion] Slack Channel
To: user , 


Hello folks,
considering how apache flink gains more and more popularity and seeing how
other open-source projects use Slack, I wanted to start this thread to see
how we can grow the community.
First of all, one thing I have noticed: people involved with Flink often
only notice late that there is actually an open-source Slack channel.
Maybe we can help somehow raise awareness around that? Like inviting people
to join.
The other part is around the channels.
Currently, there is only one channel #troubleshooting for people to ask
questions.
I believe this creates a few limitations, as there are quite a few
questions daily, but it's hard to differentiate between topics, and people
with context on specific parts can't identify them easily.
I'm thinking it would be nice to create more channels like:
#flink-cdc
#flink-paimon
#flink-datastream
#pyflink
#flink-sql
#flink-statebackends
#flink-monitoring
#flink-k8s
#job-board etc.
to help people have more visibility on different topics, make it easier to
find answers to similar questions, and search for things of interest.
This can be a first step towards growing the community.

Best,
Giannis


Re: Flink 1.17.2 planned?

2023-08-22 Thread Jing Ge via user
Hi Christian,

Thanks for reaching out. As Ron pointed out, the community is
focusing on the 1.18 release. If you are facing urgent issues, would you
like to volunteer as the release manager of 1.17.2 and drive the release?
Theoretically, everyone could be the release manager of a bugfix release.

Best regards,
Jing

On Tue, Aug 22, 2023 at 3:41 AM liu ron  wrote:

> Hi, Christian
>
> We released 1.17.1 [1] in May, and the main focus of the community is
> currently on the 1.18 release, so 1.17.2 should be planned for after the
> 1.18 release!
>
> [1]
> https://flink.apache.org/2023/05/25/apache-flink-1.17.1-release-announcement/
>
>
> Best,
> Ron
>
> Christian Lorenz via user wrote on Mon, Aug 21, 2023 at 17:33:
>
>> Hi team,
>>
>>
>>
>> Is there any information available about a bugfix release 1.17.2? E.g.
>> will there be another bugfix release of 1.17, and what is the approximate
>> timing?
>>
>> We are hit by https://issues.apache.org/jira/browse/FLINK-32296 which
>> leads to wrong SQL responses in some circumstances.
>>
>>
>>
>> Kind regards,
>>
>> Christian
>>
>> This e-mail is from Mapp Digital Group and its international legal
>> entities and may contain information that is confidential.
>> If you are not the intended recipient, do not read, copy or distribute
>> the e-mail or any attachments. Instead, please notify the sender and delete
>> the e-mail and any attachments.
>>
>


Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-03 Thread Jing Ge via user
Congratulations!

Best regards,
Jing


On Mon, Jul 3, 2023 at 3:21 PM yuxia  wrote:

> Congratulations!
>
> Best regards,
> Yuxia
>
> --
> *From: *"Pushpa Ramakrishnan"
> *To: *"Xintong Song"
> *Cc: *"dev", "User"
> *Sent: *Monday, July 3, 2023, 8:36:30 PM
> *Subject: *Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award
>
> Congratulations 🥳
>
> On 03-Jul-2023, at 3:30 PM, Xintong Song  wrote:
>
> 
> Dear Community,
>
> I'm pleased to share this good news with everyone. As some of you may have
> already heard, Apache Flink has won the 2023 SIGMOD Systems Award [1].
>
> "Apache Flink greatly expanded the use of stream data-processing." --
> SIGMOD Awards Committee
>
> SIGMOD is one of the most influential data management research conferences
> in the world. The Systems Award is awarded to an individual or set of
> individuals to recognize the development of a software or hardware system
> whose technical contributions have had significant impact on the theory or
> practice of large-scale data management systems. Winning of the award
> indicates the high recognition of Flink's technological advancement and
> industry influence from academia.
>
> As an open-source project, Flink wouldn't have come this far without the
> wide, active and supportive community behind it. Kudos to all of us who
> helped make this happen, including the over 1,400 contributors and many
> others who contributed in ways beyond code.
>
> Best,
>
> Xintong (on behalf of the Flink PMC)
>
>
> [1] https://sigmod.org/2023-sigmod-systems-award/
>
>
>


Call for Presentations: Flink Forward Seattle 2023

2023-06-08 Thread Jing Ge via user
Dear Flink developers & users,

We hope this email finds you well. We are excited to announce the Call for
Presentations for the upcoming Flink Forward Seattle 2023, the premier
event dedicated to Apache Flink and stream processing technologies. As a
prominent figure in the field, we invite you to submit your innovative
research, insightful experiences, and cutting-edge use cases for
consideration as a speaker at the conference.

Flink Forward Conference 2023 Details:
Date: November 6-7 (training), November 8 (conference)
Location: Seattle United States

Flink Forward is a conference dedicated to the Apache Flink® community. In
2023 we shall have a full conference day following a two-day training
session. The conference gathers an international audience of CTOs/CIOs,
developers, data architects, data scientists, Apache Flink® core
committers, and the stream processing community, to share experiences,
exchange ideas and knowledge, and receive hands-on training sessions led by
Flink experts. We are seeking compelling presentations and
thought-provoking talks that cover a broad range of topics related to
Apache Flink, including but not limited to:

Flink architecture and internals
Flink performance optimization
Advanced Flink features and enhancements
Real-world use cases and success stories
Flink ecosystem and integrations
Stream processing at scale
Best practices for Flink application development

If you have an inspiring story, valuable insights, real-world application,
research breakthroughs, use case, best practice, or compelling vision of
the future for Flink, we encourage you to present it to a highly skilled
and enthusiastic community. We welcome submissions from both industry
professionals and academic researchers.

To submit your proposal, please visit the Flink Forward Conference website
at https://www.flink-forward.org/seattle-2023/call-for-presentations. The
submission form will require you to provide an abstract of your talk, along
with a brief biography and any supporting materials. The deadline for
submissions is July 12th 11:59 pm PDT.

We believe your contribution will greatly enrich the Flink Forward
Conference and provide invaluable insights to our attendees. This is an
excellent opportunity to connect with a diverse community of Flink
enthusiasts, network with industry experts, and gain recognition for your
expertise. We look forward to receiving your submission and welcoming you
as a speaker at the Flink Forward Conference.

Thank you for your time and consideration.

Best regards,

-- 

Jing Ge | Head of Engineering

j...@ververica.com

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference - Tickets on SALE now!
<https://eu.eventscloud.com/ereg/newreg.php?eventid=200259741>


Re: apache-flink java question

2023-06-01 Thread Jing Ge via user
Hi Chris,

Not yet, and we are working on it [1].

Best regards,
Jing

[1]
https://issues.apache.org/jira/browse/FLINK-15736?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel=17697544#comment-17697544

On Thu, Jun 1, 2023 at 11:40 PM Joseph, Chris S 
wrote:

> Hi,
>
>
>
> Does the Apache Flink Table API work with Java 17?
>
>
>
> Thanks,
>
> Chris Joseph
>
>
>


Re: [SUMMARY] Flink 1.18 Release Sync 05/30/2023

2023-05-30 Thread Jing Ge via user
Thanks Qingsheng for driving it!

@Devs
As you might already be aware of, there are many externalizations and new
releases of Flink connectors. Once a connector has been externalized
successfully, i.e. the related module has been removed in the Flink repo,
we will not set a priority higher than major to tasks related to those
connectors.

Best regards,
Jing

On Tue, May 30, 2023 at 11:48 AM Qingsheng Ren  wrote:

> Hi devs and users,
>
> I'd like to share some highlights from the release sync of 1.18 on May 30.
>
> 1. @developers please update the progress of your features on 1.18 release
> wiki page [1] ! That will help us a lot to have an overview of the entire
> release cycle.
>
> 2. We found a JIRA issue (FLINK-18356) [2] that doesn't have an assignee,
> which is a CI instability of the flink-table-planner module. It'll be nice
> if someone in the community could pick it up and make some investigations
> :-)
>
> There are 6 weeks before the feature freeze date (Jul 11). The next release
> sync will be on Jun 13, 2023. Welcome to join us [3]!
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/1.18+Release
> [2] https://issues.apache.org/jira/browse/FLINK-18356
> [3] Zoom meeting:
> https://us04web.zoom.us/j/79158702091?pwd=8CXPqxMzbabWkma5b0qFXI1IcLbxBh.1
>
> Best regards,
> Jing, Konstantin, Sergey and Qingsheng
>


Re: [ANNOUNCE] Apache Flink 1.16.2 released

2023-05-29 Thread Jing Ge
Hi Weijie,

Thanks for your contribution and feedback! In case there are reasons that
do not allow us to upgrade them, we can still leverage virtualenv or pipenv
to create a dedicated environment for the Flink release. WDYT?

cc Dian Fu

@Dian
I was wondering if you know the reason. Thanks!

Best regards,
Jing




On Mon, May 29, 2023 at 6:27 AM weijie guo 
wrote:

> Hi Jing,
>
> Thank you for caring about the releasing process. It has to be said that
> the entire process went smoothly. We have very comprehensive
> documentation[1] to guide my work, thanks to the contribution of previous
> release managers and the community.
>
> Regarding the obstacles, I actually only have one minor problem: We used an
> older twine (1.12.0) to deploy Python artifacts to PyPI, and its compatible
> dependencies (such as urllib3) are also older. When I tried twine upload,
> the process couldn't work as expected, as the version of urllib3 installed
> on my machine was newer. In order to solve this, I had to proactively
> downgrade the version of some dependencies. I added a notice in the cwiki
> page [1] to prevent future release managers from encountering the same
> problem. It seems that this is a known issue (see comments in [2]), which
> has been resolved in newer versions of twine. I wonder if we can upgrade
> the version of twine? Does anyone remember the reason why we pinned such
> an old version (1.12.0)?
>
> Best regards,
>
> Weijie
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
>
> [2] https://github.com/pypa/twine/issues/997
>
>
> Jing Ge wrote on Sat, May 27, 2023 at 00:15:
>
> > Hi Weijie,
> >
> > Thanks again for your effort. I was wondering if there were any obstacles
> > you had to overcome while releasing 1.16.2 and 1.17.1 that could lead us
> to
> > any improvement wrt the release process and management?
> >
> > Best regards,
> > Jing
> >
> > On Fri, May 26, 2023 at 4:41 PM Martijn Visser  >
> > wrote:
> >
> > > Thank you Weijie and those who helped with testing!
> > >
> > > On Fri, May 26, 2023 at 1:06 PM weijie guo 
> > > wrote:
> > >
> > > > The Apache Flink community is very happy to announce the release of
> > > > Apache Flink 1.16.2, which is the second bugfix release for the
> Apache
> > > > Flink 1.16 series.
> > > >
> > > >
> > > >
> > > > Apache Flink® is an open-source stream processing framework for
> > > > distributed, high-performing, always-available, and accurate data
> > > > streaming applications.
> > > >
> > > >
> > > >
> > > > The release is available for download at:
> > > >
> > > > https://flink.apache.org/downloads.html
> > > >
> > > >
> > > >
> > > > Please check out the release blog post for an overview of the
> > > > improvements for this bugfix release:
> > > >
> > > > https://flink.apache.org/news/2023/05/25/release-1.16.2.html
> > > >
> > > >
> > > >
> > > > The full release notes are available in Jira:
> > > >
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352765
> > > >
> > > >
> > > >
> > > > We would like to thank all contributors of the Apache Flink community
> > > > who made this release possible!
> > > >
> > > >
> > > >
> > > > Feel free to reach out to the release managers (or respond to this
> > > > thread) with feedback on the release process. Our goal is to
> > > > constantly improve the release process. Feedback on what could be
> > > > improved or things that didn't go so well is appreciated.
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Release Manager
> > > >
> > >
> >
>


Re: [ANNOUNCE] Apache Flink 1.16.2 released

2023-05-26 Thread Jing Ge
Hi Weijie,

Thanks again for your effort. I was wondering if there were any obstacles
you had to overcome while releasing 1.16.2 and 1.17.1 that could lead us to
any improvement wrt the release process and management?

Best regards,
Jing

On Fri, May 26, 2023 at 4:41 PM Martijn Visser 
wrote:

> Thank you Weijie and those who helped with testing!
>
> On Fri, May 26, 2023 at 1:06 PM weijie guo 
> wrote:
>
> > The Apache Flink community is very happy to announce the release of
> > Apache Flink 1.16.2, which is the second bugfix release for the Apache
> > Flink 1.16 series.
> >
> >
> >
> > Apache Flink® is an open-source stream processing framework for
> > distributed, high-performing, always-available, and accurate data
> > streaming applications.
> >
> >
> >
> > The release is available for download at:
> >
> > https://flink.apache.org/downloads.html
> >
> >
> >
> > Please check out the release blog post for an overview of the
> > improvements for this bugfix release:
> >
> > https://flink.apache.org/news/2023/05/25/release-1.16.2.html
> >
> >
> >
> > The full release notes are available in Jira:
> >
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352765
> >
> >
> >
> > We would like to thank all contributors of the Apache Flink community
> > who made this release possible!
> >
> >
> >
> > Feel free to reach out to the release managers (or respond to this
> > thread) with feedback on the release process. Our goal is to
> > constantly improve the release process. Feedback on what could be
> > improved or things that didn't go so well is appreciated.
> >
> >
> >
> > Regards,
> >
> > Release Manager
> >
>


Re: [ANNOUNCE] Apache Flink 1.17.1 released

2023-05-26 Thread Jing Ge via user
Hi Weijie,

That is earlier than I expected! Thank you so much for your effort!

Best regards,
Jing

On Fri, May 26, 2023 at 4:44 PM Martijn Visser 
wrote:

> Same here as with Flink 1.16.2, thank you Weijie and those who helped with
> testing!
>
> On Fri, May 26, 2023 at 1:08 PM weijie guo 
> wrote:
>
>>
>> The Apache Flink community is very happy to announce the release of Apache 
>> Flink 1.17.1, which is the first bugfix release for the Apache Flink 1.17 
>> series.
>>
>>
>>
>>
>> Apache Flink® is an open-source stream processing framework for distributed, 
>> high-performing, always-available, and accurate data streaming applications.
>>
>>
>>
>> The release is available for download at:
>>
>> https://flink.apache.org/downloads.html
>>
>>
>>
>>
>> Please check out the release blog post for an overview of the improvements 
>> for this bugfix release:
>>
>> https://flink.apache.org/news/2023/05/25/release-1.17.1.html
>>
>>
>>
>> The full release notes are available in Jira:
>>
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352886
>>
>>
>>
>>
>> We would like to thank all contributors of the Apache Flink community who 
>> made this release possible!
>>
>>
>>
>>
>> Feel free to reach out to the release managers (or respond to this thread) 
>> with feedback on the release process. Our goal is to constantly improve the 
>> release process. Feedback on what could be improved or things that didn't go 
>> so well is appreciated.
>>
>>
>>
>> Regards,
>>
>> Release Manager
>>
>


Re: Maven plugin to detect issues early on

2023-05-22 Thread Jing Ge via user
cc user ML to get more attention, since the plugin will be used by Flink
application developers.

Best regards,
Jing

On Mon, May 22, 2023 at 3:32 PM Jing Ge  wrote:

> Hi Emre,
>
> Thanks for clarifying it. Afaiac, it is a quite interesting proposal,
> especially for Flink job developers who are heavily using the Datastream
> API. Publishing the plugin in your Github would be a feasible way for the
> first move. As I mentioned previously, in order to help the community
> understand the plugin, you might want to describe all those attractive
> features you mentioned in e.g. the readme.md with more technical details. I
> personally was wondering how those connector compatibility rules will be
> defined and maintained, given that almost all connectors have been
> externalized.
>
> Best regards,
> Jing
>
> On Mon, May 22, 2023 at 11:24 AM Kartoglu, Emre
>  wrote:
>
>> Hi Jing,
>>
>> The proposed plugin would be used by Flink application developers, when
>> they are writing their Flink job. It would trigger during
>> compilation/packaging and would look for known incompatibilities, bad
>> practices, or bugs.
>> For instance one cause of frustration for our customers is connector
>> incompatibilities (specifically Kafka and Kinesis) with certain Flink
>> versions. This plugin would be a quick way to update a list of known
>> incompatibilities, bugs, bad practices, so customers get errors during
>> compilation/packaging and not after they've deployed their Flink job.
>>
>> From what you're saying, the FLIP route might not be the best way to go.
>> We might publish this plugin in our own GitHub namespace/group first, and
>> then get community acknowledgement/support for it. I believe working with
>> the Flink community on this is key as we'd need their support/opinion to do
>> this the right way and reach more Flink users.
>>
>> Thanks
>> Emre
>>
>> On 21/05/2023, 16:48, "Jing Ge" <j...@ververica.com.INVALID> wrote:
>>
>>
>> CAUTION: This email originated from outside of the organization. Do not
>> click links or open attachments unless you can confirm the sender and know
>> the content is safe.
>>
>>
>>
>>
>>
>>
>> Hi Emre,
>>
>>
>> Thanks for your proposal. It looks very interesting! Please pay attention
>> that most connectors have been externalized. Will your proposed plugin be
>> used for building Flink Connectors or Flink itself? Furthermore, it would
>> be great if you could elaborate features wrt best practices so that we
>> could understand how the plugin will help us.
>>
>>
>> Afaik, FLIP is recommended for improvement ideas that will change public
>> APIs. I am not sure if a new maven plugin belongs to it.
>>
>>
>> Best regards,
>> Jing
>>
>>
>> On Tue, May 16, 2023 at 11:29 AM Kartoglu, Emre <kar...@amazon.co.uk.INVALID>
>> wrote:
>>
>>
>> > Hello all,
>> >
>> > Myself and 2 colleagues developed a Maven plugin (no support for Gradle
>> or
>> > other build tools yet) that we use internally to detect potential
>> issues in
>> > Flink apps at compilation/packaging stage:
>> >
>> >
>> > * Known connector version incompatibilities – so far covering Kafka
>> > and Kinesis
>> > * Best practices e.g. setting operator IDs
>> >
>> > We’d like to make this open-source. Ideally with the Flink community’s
>> > support/mention of it on the Flink website, so more people use it.
>> >
>> > Going forward, I believe we have at least the following options:
>> >
>> > * Get community support: Create a FLIP to discuss where the plugin
>> > should live, what kind of problems it should detect etc.
>> > * We still open-source it but without the community support (if the
>> > community has objections to officially supporting it for instance).
>> >
>> > Just wanted to gauge the feeling/thoughts towards this tool from the
>> > community before going ahead.
>> >
>> > Thanks,
>> > Emre
>> >
>> >
>>
>>
>>
>>
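One of the best practices the plugin checks for, assigning explicit operator IDs, can be illustrated with a minimal sketch; the class name and uid strings are hypothetical. Stable uids keep savepoint state mappable when the job graph changes:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class OperatorIdExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> upper = env
                .fromElements("a", "b", "c")
                .uid("source-letters")    // explicit ID instead of an auto-generated one
                .map(String::toUpperCase)
                .uid("map-to-upper");     // set an ID on every stateful operator

        upper.print();
        env.execute("operator-id-example");
    }
}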


Re: [Discussion] - Release major Flink version to support JDK 17 (LTS)

2023-04-27 Thread Jing Ge via user
Thanks Tamir for the information. According to the latest comment of the
task FLINK-24998, this bug should be gone when using the latest JDK 17. I
was wondering whether that means there are no more issues stopping us from
releasing a major Flink version to support Java 17? Did I miss something?

Best regards,
Jing

On Thu, Apr 27, 2023 at 8:18 AM Tamir Sagi 
wrote:

> More details about the JDK bug here
> https://bugs.openjdk.org/browse/JDK-8277529
>
> Related Jira ticket
> https://issues.apache.org/jira/browse/FLINK-24998
>
> ------
> *From:* Jing Ge via user 
> *Sent:* Monday, April 24, 2023 11:15 PM
> *To:* Chesnay Schepler 
> *Cc:* Piotr Nowojski ; Alexis Sarda-Espinosa <
> sarda.espin...@gmail.com>; Martijn Visser ;
> d...@flink.apache.org ; user 
> *Subject:* Re: [Discussion] - Release major Flink version to support JDK
> 17 (LTS)
>
>
> *EXTERNAL EMAIL*
>
>
> Thanks Chesnay for working on this. Would you like to share more info
> about the JDK bug?
>
> Best regards,
> Jing
>
> On Mon, Apr 24, 2023 at 11:39 AM Chesnay Schepler 
> wrote:
>
> As it turns out Kryo isn't a blocker; we ran into a JDK bug.
>
> On 31/03/2023 08:57, Chesnay Schepler wrote:
>
>
> https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5#migration-guide
>
> Kryo themselves state that v5 likely can't read v2 data.
>
> However, both versions can be on the classpath without conflict, as v5
> offers a versioned artifact that includes the version in the package.
>
> It probably wouldn't be difficult to migrate a savepoint to Kryo v5,
> purely from a read/write perspective.
>
> The bigger question is how we expose this new Kryo version in the API. If
> we stick to the versioned jar we need to either duplicate all current
> Kryo-related APIs or find a better way to integrate other serialization
> stacks.
> On 30/03/2023 17:50, Piotr Nowojski wrote:
>
> Hey,
>
> > 1. The Flink community agrees that we upgrade Kryo to a later version,
> which means breaking all checkpoint/savepoint compatibility and releasing a
> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
> dropped. This is probably the quickest way, but would still mean that we
> expose Kryo in the Flink APIs, which is the main reason why we haven't been
> able to upgrade Kryo at all.
>
> This sounds pretty bad to me.
>
> Has anyone looked into what it would take to provide a smooth migration
> from Kryo2 -> Kryo5?
>
> Best,
> Piotrek
>
> czw., 30 mar 2023 o 16:54 Alexis Sarda-Espinosa 
> napisał(a):
>
> Hi Martijn,
>
> just to be sure, if all state-related classes use a POJO serializer, Kryo
> will never come into play, right? Given FLINK-16686 [1], I wonder how many
> users actually have jobs with Kryo and RocksDB, but even if there aren't
> many, that still leaves those who don't use RocksDB for
> checkpoints/savepoints.
>
> If Kryo were to stay in the Flink APIs in v1.X, is it impossible to let
> users choose between v2/v5 jars by separating them like log4j2 jars?
>
> [1] https://issues.apache.org/jira/browse/FLINK-16686
>
> Regards,
> Alexis.
>
> Am Do., 30. März 2023 um 14:26 Uhr schrieb Martijn Visser <
> martijnvis...@apache.org>:
>
> Hi all,
>
> I also saw a thread on this topic from Clayton Wohl [1] on this topic,
> which I'm including in this discussion thread to avoid that it gets lost.
>
> From my perspective, there's two main ways to get to Java 17:
>
> 1. The Flink community agrees that we upgrade Kryo to a later version,
> which means breaking all checkpoint/savepoint compatibility and releasing a
> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
> dropped. This is probably the quickest way, but would still mean that we
> expose Kryo in the Flink APIs, which is the main reason why we haven't been
> able to upgrade Kryo at all.
> 2. There's a contributor who makes a contribution that bumps Kryo, but
> either a) automagically reads in all old checkpoints/savepoints in using
> Kryo v2 and writes them to new snapshots using Kryo v5 (like is mentioned
> in the Kryo migration guide [2][3] or b) provides an offline tool that
> allows users that are interested in migrating their snapshots manually
> before starting from a newer version. That potentially could prevent the
> need to introduce a new Flink major version. In both scenarios, ideally the
> contributor would also help with avoiding the exposure of Kryo so that we
> will be in a better shape in the future.
>
> It would be good to get the opinion of the community for either of these
> two options, or potentially for another one that I haven't mentioned. If it
> appears that there's an overall agreement on the direction, I would propose
> that a FLIP gets created which describes the entire process.
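To make the offline-migration idea in option 2(b) a bit more concrete, a rough sketch of reading a value with Kryo v2 and re-writing it with Kryo v5 could look like the following. It assumes the relocated kryo5 artifact mentioned in the migration guide above (the exact Maven coordinates are an assumption to verify); the state class is a placeholder, and a real savepoint migration tool would additionally have to understand Flink's snapshot format:

public class KryoMigrationSketch {

    // Hypothetical state class standing in for whatever was serialized with Kryo v2.
    public static class MyState {
        public int count;
        public String name;
    }

    public static byte[] migrate(byte[] v2Bytes) {
        // Read with Kryo v2, the version Flink 1.x embeds.
        com.esotericsoftware.kryo.Kryo kryoV2 = new com.esotericsoftware.kryo.Kryo();
        MyState state = kryoV2.readObject(
                new com.esotericsoftware.kryo.io.Input(v2Bytes), MyState.class);

        // Re-write with Kryo v5; the versioned kryo5 artifact relocates its
        // packages, so both versions can live on the same classpath.
        com.esotericsoftware.kryo.kryo5.Kryo kryoV5 = new com.esotericsoftware.kryo.kryo5.Kryo();
        kryoV5.register(MyState.class); // v5 requires registration by default
        com.esotericsoftware.kryo.kryo5.io.Output out =
                new com.esotericsoftware.kryo.kryo5.io.Output(4096, -1);
        kryoV5.writeObject(out, state);
        out.close();
        return out.toBytes();
    }
}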

Re: About the Apache Flink source code contribution process

2023-04-24 Thread Jing Ge
Hi tanjialiang,

Is this [1] the English discussion thread you mentioned? When future discussions reference an email, please include a link so that everyone can follow the context.

Based on the replies in that thread and in Jira, you can write up a FLIP [2] first and then start a new discussion.

Best regards,
Jing

[1] https://lists.apache.org/thread/3yzvo6mynj637v2z10s895t7hhmv4rjd
[2]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

On Mon, Apr 24, 2023 at 11:10 AM tanjialiang  wrote:

> Hi, Jing Ge
> Thanks for your reply.
> I started an English discussion thread about two weeks ago, but so far only one developer has replied. In this situation, how should I plan the follow-up work for my code contribution?
>
>
> Best regards,
> tanjialiang.
> ---- Original message ----
> | From | Jing Ge |
> | Date | Apr 24, 2023, 16:30 |
> | To |  |
> | Subject | Re: About the Apache Flink source code contribution process |
> Hi,
>
If the change adds public APIs, it is recommended to start an English discussion thread first, then see whether a FLIP needs to be created, and then start a more concrete technical discussion based on the FLIP.
>
> On Mon, Apr 24, 2023 at 10:06 AM tanjialiang  wrote:
>
> Hello, everyone.
> I would like to contribute source code to Apache Flink. Since fixing this issue requires adding some new APIs, the process requires starting a mailing list discussion, but this topic has only drawn the attention of one developer. How should I proceed with the rest of the process in this situation? I hope developers familiar with contributing to the Flink source code can help.
>
>
> issue: https://issues.apache.org/jira/browse/FLINK-31686
> discussion thread subject: EncodingFormat and DecondingFormat provide copy API
>
>
> Best regards
> tanjialiang.
>


Re: [Discussion] - Release major Flink version to support JDK 17 (LTS)

2023-04-24 Thread Jing Ge via user
Thanks Chesnay for working on this. Would you like to share more info about
the JDK bug?

Best regards,
Jing

On Mon, Apr 24, 2023 at 11:39 AM Chesnay Schepler 
wrote:

> As it turns out Kryo isn't a blocker; we ran into a JDK bug.
>
> On 31/03/2023 08:57, Chesnay Schepler wrote:
>
>
> https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5#migration-guide
>
> Kryo themselves state that v5 likely can't read v2 data.
>
> However, both versions can be on the classpath without conflict, as v5
> offers a versioned artifact that includes the version in the package.
>
> It probably wouldn't be difficult to migrate a savepoint to Kryo v5,
> purely from a read/write perspective.
>
> The bigger question is how we expose this new Kryo version in the API. If
> we stick to the versioned jar we need to either duplicate all current
> Kryo-related APIs or find a better way to integrate other serialization
> stacks.
> On 30/03/2023 17:50, Piotr Nowojski wrote:
>
> Hey,
>
> > 1. The Flink community agrees that we upgrade Kryo to a later version,
> which means breaking all checkpoint/savepoint compatibility and releasing a
> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
> dropped. This is probably the quickest way, but would still mean that we
> expose Kryo in the Flink APIs, which is the main reason why we haven't been
> able to upgrade Kryo at all.
>
> This sounds pretty bad to me.
>
> Has anyone looked into what it would take to provide a smooth migration
> from Kryo2 -> Kryo5?
>
> Best,
> Piotrek
>
> czw., 30 mar 2023 o 16:54 Alexis Sarda-Espinosa 
> napisał(a):
>
>> Hi Martijn,
>>
>> just to be sure, if all state-related classes use a POJO serializer, Kryo
>> will never come into play, right? Given FLINK-16686 [1], I wonder how many
>> users actually have jobs with Kryo and RocksDB, but even if there aren't
>> many, that still leaves those who don't use RocksDB for
>> checkpoints/savepoints.
>>
>> If Kryo were to stay in the Flink APIs in v1.X, is it impossible to let
>> users choose between v2/v5 jars by separating them like log4j2 jars?
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-16686
>>
>> Regards,
>> Alexis.
>>
>> Am Do., 30. März 2023 um 14:26 Uhr schrieb Martijn Visser <
>> martijnvis...@apache.org>:
>>
>>> Hi all,
>>>
>>> I also saw a thread on this topic from Clayton Wohl [1] on this topic,
>>> which I'm including in this discussion thread to avoid that it gets lost.
>>>
>>> From my perspective, there's two main ways to get to Java 17:
>>>
>>> 1. The Flink community agrees that we upgrade Kryo to a later version,
>>> which means breaking all checkpoint/savepoint compatibility and releasing a
>>> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
>>> dropped. This is probably the quickest way, but would still mean that we
>>> expose Kryo in the Flink APIs, which is the main reason why we haven't been
>>> able to upgrade Kryo at all.
>>> 2. There's a contributor who makes a contribution that bumps Kryo, but
>>> either a) automagically reads in all old checkpoints/savepoints in using
>>> Kryo v2 and writes them to new snapshots using Kryo v5 (like is mentioned
>>> in the Kryo migration guide [2][3] or b) provides an offline tool that
>>> allows users that are interested in migrating their snapshots manually
>>> before starting from a newer version. That potentially could prevent the
>>> need to introduce a new Flink major version. In both scenarios, ideally the
>>> contributor would also help with avoiding the exposure of Kryo so that we
>>> will be in a better shape in the future.
>>>
>>> It would be good to get the opinion of the community for either of these
>>> two options, or potentially for another one that I haven't mentioned. If it
>>> appears that there's an overall agreement on the direction, I would propose
>>> that a FLIP gets created which describes the entire process.
>>>
>>> Looking forward to the thoughts of others, including the Users
>>> (therefore including the User ML).
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> [1]  https://lists.apache.org/thread/qcw8wy9dv8szxx9bh49nz7jnth22p1v2
>>> [2] https://lists.apache.org/thread/gv49jfkhmbshxdvzzozh017ntkst3sgq
>>> [3] https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5
>>>
>>> On Sun, Mar 19, 2023 at 8:16 AM Tamir Sagi 
>>> wrote:
>>>
 I agree, there are several options to mitigate the migration from v2 to
 v5.
 Yet Oracle's roadmap is to end JDK 11 support in September this year.



 
 From: ConradJam 
 Sent: Thursday, March 16, 2023 4:36 AM
 To: d...@flink.apache.org 
 Subject: Re: [Discussion] - Release major Flink version to support JDK
 17 (LTS)

 EXTERNAL EMAIL



 Thanks for starting this discussion.


 I have been tracking this problem for a long time, until I saw a
 conversation in an issue a few days ago and learned that the Kryo version
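Picking up Alexis's earlier point that Kryo never comes into play when all state classes use the POJO serializer: this can even be enforced at job-construction time, so an accidental fallback to Kryo fails fast. A minimal sketch, with a hypothetical Event class:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class NoKryoExample {

    // A valid Flink POJO: public class, public no-arg constructor, and public
    // fields (or getters/setters), so the PojoSerializer is used instead of Kryo.
    public static class Event {
        public long id;
        public String payload;
        public Event() {}
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Throw at job construction if any type would fall back to Kryo.
        env.getConfig().disableGenericTypes();

        env.fromElements(new Event()).print();
        env.execute("no-kryo-example");
    }
}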
 

Re: About the Apache Flink source code contribution process

2023-04-24 Thread Jing Ge
Hi,

If the change adds public APIs, it is recommended to start an English discussion thread first, then see whether a FLIP needs to be created, and then start a more concrete technical discussion based on the FLIP.

On Mon, Apr 24, 2023 at 10:06 AM tanjialiang  wrote:

> Hello, everyone.
> I would like to contribute source code to Apache Flink. Since fixing this issue requires adding some new APIs, the process requires starting a mailing list discussion, but this topic has only drawn the attention of one developer. How should I proceed with the rest of the process in this situation? I hope developers familiar with contributing to the Flink source code can help.
>
>
> issue: https://issues.apache.org/jira/browse/FLINK-31686
> discussion thread subject: EncodingFormat and DecondingFormat provide copy API
>
>
> Best regards
> tanjialiang.


[SUMMARY] Flink 1.18 Release Sync 4/4/2023

2023-04-04 Thread Jing Ge via user
Dear devs and users,

Today was the kickoff meeting for Flink 1.18 release cycle. I'd like to
share the info synced in the meeting.

Meeting schedule:

Zoom will be used, with a 40-minute limit for each meeting. That should be
fine for now. We will check it again if we have an issue with the time
limit later.

Release cycle will start bi-weekly and switch to weekly after the feature
freeze.

Feature freezing date:

July 11, 2023

Retrospective of 1.17 release:

There are many valuable thoughts and suggestions from previous release
managers [1]. Some of them are summarized as follows:

- [Attention] Backports/merges without PRs will break master/release
branches. Kindly remind, every Flink developer, please pay attention to
avoid doing it.
- It is encouraged to create release testing tasks in advance, label them
properly, and finish them earlier, not necessarily wait to do it at the end
of the release cycle.
- A non-votable rc0 will be released in the future for developers to
validate the release.
- Some jira tickets have been created for 1.17 release that could be used
as the starting point to build a standard release pipeline. The release
process documented on the wiki page could be turned into a list of Jira
tasks (Jira template) in the future.

Daily work divisions:

In general, every release manager will be working on all daily issues. For
some major tasks, in order to make sure there will at least always be
someone to take care of them, they have been assigned to specific release
managers[2]. If you need support in each of these areas, please don't
hesitate to contact us.

1.18 Release page:

Flink 1.18 release has been kicked off today. We'd like to invite you to
update your development plan on the release page[2].

The next release sync up meeting will be on April 18, 2023. Please feel
free to join us!

Zoom meeting:
https://us04web.zoom.us/j/79158702091?pwd=8CXPqxMzbabWkma5b0qFXI1IcLbxBh.1

Best regards,
Konstantin, Sergey, Qingsheng, and Jing

[1] https://cwiki.apache.org/confluence/display/FLINK/1.17+Release
[2] https://cwiki.apache.org/confluence/display/FLINK/1.18+Release


Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Jing Ge
Congrats!

Best regards,
Jing

On Mon, Mar 27, 2023 at 2:32 PM Leonard Xu  wrote:

> Congratulations!
>
>
> Best,
> Leonard
>
> On Mar 27, 2023, at 5:23 PM, Yu Li  wrote:
>
> Dear Flinkers,
>
>
>
> As you may have noticed, we are pleased to announce that Flink Table Store 
> has joined the Apache Incubator as a separate project called Apache 
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
> streaming data lake platform for high-speed data ingestion, change data 
> tracking and efficient real-time analytics, with the vision of supporting a 
> larger ecosystem and establishing a vibrant and neutral open source community.
>
>
>
> We would like to thank everyone for their great support and efforts for the 
> Flink Table Store project, and warmly welcome everyone to join the 
> development and activities of the new project. Apache Flink will continue to 
> be one of the first-class citizens supported by Paimon, and we believe that 
> the Flink and Paimon communities will maintain close cooperation.
>
>
> Best Regards,
> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>
> [1] https://paimon.apache.org/
> [2] https://github.com/apache/incubator-paimon
> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>
>
>


Re: [ANNOUNCE] Apache Flink 1.17.0 released

2023-03-23 Thread Jing Ge via user
Excellent work! Congratulations! Appreciate the hard work and contributions
of everyone in the Apache Flink community who helped make this release
possible. Looking forward to those new features. Cheers!

Best regards,
Jing

On Thu, Mar 23, 2023 at 10:24 AM Leonard Xu  wrote:

> The Apache Flink community is very happy to announce the release of Apache 
> Flink
> 1.17.0, which is the first release for the Apache Flink 1.17 series.
>
> Apache Flink® is an open-source unified stream and batch data processing 
> framework
> for distributed, high-performing, always-available, and accurate data
> applications.
>
> The release is available for download at:
>
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this release:
>
> https://flink.apache.org/2023/03/23/announcing-the-release-of-apache-flink-1.17/
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351585
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Best regards,
> Qingsheng, Martijn, Matthias and Leonard
>


Re: KafkaSink handling message size produce errors

2023-02-17 Thread Jing Ge via user
ticket created: https://issues.apache.org/jira/browse/FLINK-31121

On Fri, Feb 17, 2023 at 9:59 AM Hatem Mostafa  wrote:

> Thanks for adding this to your backlog, I think it's definitely a very
> useful feature.
>
> Can you provide an example for how to extend KafkaSink to
> add this error handling? I have tried to do so but did not find it
> straightforward, since errors are thrown in the deliveryCallback of KafkaWriter and
> KafkaSink is not extendable since all its members are private and the
> constructor is package private.
>
> On Fri, Feb 17, 2023 at 8:17 AM Shammon FY  wrote:
>
>> Hi jing,
>>
>> It sounds good to me, we can add an option for it
>>
>> Best,
>> Shammon
>>
>>
>> On Fri, Feb 17, 2023 at 3:13 PM Jing Ge  wrote:
>>
>>> Hi,
>>>
>>> It makes sense to offer this feature of catching and ignoring exp with
>>> config on/off, when we put ourselves in users' shoes. WDYT? I will create a
>>> ticket if most of you consider it as a good feature to help users.
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Fri, Feb 17, 2023 at 6:01 AM Shammon FY  wrote:
>>>
>>>> Hi Hatem
>>>>
>>>> As mentioned above, you can extend the KafkaSink or create a udf and
>>>> process the record before sink
>>>>
>>>> Best,
>>>> Shammon
>>>>
>>>> On Fri, Feb 17, 2023 at 9:54 AM yuxia 
>>>> wrote:
>>>>
>>>>> Hi, Hatem.
>>>>> I think there is no way to catch the exception and then ignore it in
>>>>> current implementation for KafkaSink.  You may also need to extend the
>>>>> KafkaSink.
>>>>>
>>>>> Best regards,
>>>>> Yuxia
>>>>>
>>>>> --
>>>>> *发件人: *"Hatem Mostafa" 
>>>>> *收件人: *"User" 
>>>>> *发送时间: *星期四, 2023年 2 月 16日 下午 9:32:44
>>>>> *主题: *KafkaSink handling message size produce errors
>>>>>
>>>>> Hello,
>>>>> I am writing a flink job that reads and writes into kafka, it is using
>>>>> a window operator and eventually writing the result of the window into a
>>>>> kafka topic. The accumulated data can exceed the maximum message size 
>>>>> after
>>>>> compression on the producer level. I want to be able to catch the 
>>>>> exception
>>>>> coming from the producer and ignore this window. I could not find a way to
>>>>> do that in KafkaSink
>>>>> <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/#kafka-sink>,
>>>>> is there a way to do so?
>>>>>
>>>>> I attached here an example of an error that I would like to handle
>>>>> gracefully.
>>>>>
>>>>> [image: image.png]
>>>>>
>>>>>
>>>>> This question is similar to one that was asked on stackoverflow here
>>>>> <https://stackoverflow.com/questions/52308911/how-to-handle-exceptions-in-kafka-sink>
>>>>>  but
>>>>> the answer is relevant for older versions of flink.
>>>>>
>>>>> Regards,
>>>>> Hatem
>>>>>
>>>>>
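Until such a config option exists, the pre-filtering workaround mentioned above can be sketched roughly as below. The byte limit is a placeholder and only approximates the real constraint, since the producer applies batching and compression after this check:

import java.nio.charset.StandardCharsets;
import org.apache.flink.streaming.api.datastream.DataStream;

public class DropOversizedRecords {

    // Placeholder limit, a bit below Kafka's default 1 MB max.message.bytes.
    private static final int MAX_BYTES = 900_000;

    // Filter after the window operator and before the sink, so the producer
    // never sees an oversized record.
    public static DataStream<String> dropOversized(DataStream<String> records) {
        return records.filter(value ->
                value.getBytes(StandardCharsets.UTF_8).length <= MAX_BYTES);
    }
}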


Re: KafkaSink handling message size produce errors

2023-02-16 Thread Jing Ge via user
Hi,

It makes sense to offer this feature of catching and ignoring exp with
config on/off, when we put ourselves in users' shoes. WDYT? I will create a
ticket if most of you consider it as a good feature to help users.

Best regards,
Jing

On Fri, Feb 17, 2023 at 6:01 AM Shammon FY  wrote:

> Hi Hatem
>
> As mentioned above, you can extend the KafkaSink or create a udf and
> process the record before sink
>
> Best,
> Shammon
>
> On Fri, Feb 17, 2023 at 9:54 AM yuxia  wrote:
>
>> Hi, Hatem.
>> I think there is no way to catch the exception and then ignore it in
>> current implementation for KafkaSink.  You may also need to extend the
>> KafkaSink.
>>
>> Best regards,
>> Yuxia
>>
>> --
>> *发件人: *"Hatem Mostafa" 
>> *收件人: *"User" 
>> *发送时间: *星期四, 2023年 2 月 16日 下午 9:32:44
>> *主题: *KafkaSink handling message size produce errors
>>
>> Hello,
>> I am writing a flink job that reads and writes into kafka, it is using a
>> window operator and eventually writing the result of the window into a
>> kafka topic. The accumulated data can exceed the maximum message size after
>> compression on the producer level. I want to be able to catch the exception
>> coming from the producer and ignore this window. I could not find a way to
>> do that in KafkaSink
>> ,
>> is there a way to do so?
>>
>> I attached here an example of an error that I would like to handle
>> gracefully.
>>
>> [image: image.png]
>>
>>
>> This question is similar to one that was asked on stackoverflow here
>> 
>>  but
>> the answer is relevant for older versions of flink.
>>
>> Regards,
>> Hatem
>>
>>


Re: [ANNOUNCE] FRocksDB 6.20.3-ververica-2.0 released

2023-01-30 Thread Jing Ge
Hi Yanfei,

Thanks for your effort. Looking forward to checking it.

Best regards,
Jing

On Mon, Jan 30, 2023 at 1:42 PM Yanfei Lei  wrote:

> We are very happy to announce the release of FRocksDB 6.20.3-ververica-2.0.
>
> Compiled files for Linux x86, Linux arm, Linux ppc64le, MacOS x86,
> MacOS arm, and Windows are included in the FRocksDB 6.20.3-ververica-2.0
> jar, and the FRocksDB in Flink 1.17 will be updated to
> 6.20.3-ververica-2.0.
>
> Release highlights:
> - [FLINK-30457] Add periodic_compaction_seconds option to RocksJava[1].
> - [FLINK-30321] Upgrade ZLIB of FRocksDB to 1.2.13[2].
> - Avoid expensive ToString() call when not in debug[3].
> - [FLINK-24932] Support build FRocksDB Java on Apple silicon[4].
>
> Maven artifacts for FRocksDB can be found at:
> https://mvnrepository.com/artifact/com.ververica/frocksdbjni
>
> We would like to thank all efforts from the Apache Flink community
> that made this release possible!
>
> [1] https://issues.apache.org/jira/browse/FLINK-30457
> [2] https://issues.apache.org/jira/browse/FLINK-30321
> [3] https://github.com/ververica/frocksdb/pull/55
> [4] https://issues.apache.org/jira/browse/FLINK-24932
>
> Best regards,
> Yanfei
> Ververica(Alibaba)
>
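For anyone who wants to try the new periodic_compaction_seconds option from a Flink job, a minimal sketch of an options factory might look like the following; the 30-day value is an arbitrary example, and the setter name should be verified against the FRocksDB RocksJava API:

import java.util.Collection;
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class PeriodicCompactionOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Ask RocksDB to rewrite SST files that have not been compacted for 30 days.
        return currentOptions.setPeriodicCompactionSeconds(30L * 24 * 60 * 60);
    }
}

The factory would then be referenced via state.backend.rocksdb.options-factory in the Flink configuration.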


Re: Reading Parquet file with array of structs cause error

2022-11-16 Thread Jing Ge
Hi Michael,

yeah, it will be addressed in Flink-28867.

Best regards,
Jing


On Wed, Nov 16, 2022 at 2:58 AM liu ron  wrote:

> It will be addressed in FLINK-28867.
>
> Best,
> Ron
>
> Benenson, Michael via user  于2022年11月16日周三 08:47写道:
>
>> Thanks, Jing
>>
>>
>>
>> Do you know, if this problem will be addressed in FLINK-28867
>> <https://issues.apache.org/jira/browse/FLINK-28867> or  it deserve a
>> separate Jira?
>>
>>
>>
>>
>>
>> *From: *Jing Ge 
>> *Date: *Tuesday, November 15, 2022 at 3:39 PM
>> *To: *Benenson, Michael 
>> *Cc: *user@flink.apache.org , Deshpande, Omkar <
>> omkar_deshpa...@intuit.com>, Vora, Jainik 
>> *Subject: *Re: Reading Parquet file with array of structs cause error
>>
>> This email is from an external sender.
>>
>>
>>
>> Hi Michael,
>>
>>
>>
>> Currently, ParquetColumnarRowInputFormat does not support schemas with
>> nested columns. If your parquet file stores Avro records. You might want to
>> try e.g. Avro Generic record[1].
>>
>>
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/formats/parquet/#generic-record
>>
>>
>>
>> Best regards,
>>
>> Jing
>>
>>
>>
>>
>>
>> On Tue, Nov 15, 2022 at 8:52 PM Benenson, Michael via user <
>> user@flink.apache.org> wrote:
>>
>> Hi, folks
>>
>>
>>
>> I’m using flink 1.16.0, and I would like to read Parquet file (attached),
>> that has schema [1].
>>
>>
>>
>> I could read this file with Spark, but when I try to read it with Flink
>> 1.16.0 (program attached) using schema [2]
>>
>> I got IndexOutOfBoundsException [3]
>>
>>
>>
>> My code, and parquet file are attached. Is it:
>>
>> · the problem, described in FLINK-28867
>> <https://issues.apache.org/jira/browse/FLINK-28867> or
>>
>> · something new, that deserve a separate Jira, or
>>
>> · something wrong with my code?
>>
>>
>>
>> [1]: Parquet Schema
>>
>>
>>
>> root
>>
>> |-- amount: decimal(38,9) (nullable = true)
>>
>> |-- connectionAccountId: string (nullable = true)
>>
>> |-- sourceEntity: struct (nullable = true)
>>
>> |    |-- extendedProperties: array (nullable = true)
>>
>> |    |    |-- element: struct (containsNull = true)
>>
>> |    |    |    |-- key: string (nullable = true)
>>
>> |    |    |    |-- value: string (nullable = true)
>>
>> |    |-- sourceAccountId: string (nullable = true)
>>
>> |    |-- sourceEntityId: string (nullable = true)
>>
>> |    |-- sourceEntityType: string (nullable = true)
>>
>> |    |-- sourceSystem: string (nullable = true)
>>
>>
>>
>>
>>
>> [2]: Schema used in Flink:
>>
>>
>>
>> static RowType getSchema()
>>
>> {
>>
>> RowType elementType = RowType.of(
>>
>> new LogicalType[] {
>>
>> new VarCharType(VarCharType.MAX_LENGTH),
>>
>> new VarCharType(VarCharType.MAX_LENGTH)
>>
>> },
>>
>> new String[] {
>>
>> "key",
>>
>> "value"
>>
>> }
>>
>> );
>>
>>
>>
>> RowType element = RowType.of(
>>
>> new LogicalType[] { elementType },
>>
>> new String[] { "element" }
>>
>> );
>>
>>
>>
>> RowType sourceEntity = RowType.of(
>>
>> new LogicalType[] {
>>
>> new ArrayType(element),
>>
>> new VarCharType(),
>>
>> new VarCharType(),
>>
>> new VarCharType(),
>>
>> new VarCharType(),
>>
>> },
>>
>> new String[] {
>>
>> "extendedProperties",
>>
>> "sourceAccountId",
>>
>> "sourceEntityId",
>>
>> "sourceEntityType",
>>
>> "sourceSystem"
>>
>> }
>>
>> );
>>
>>
>>
>> return  RowType

Re: Reading Parquet file with array of structs cause error

2022-11-15 Thread Jing Ge
Hi Michael,

Currently, ParquetColumnarRowInputFormat does not support schemas with
nested columns. If your parquet file stores Avro records. You might want to
try e.g. Avro Generic record[1].

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/formats/parquet/#generic-record

Best regards,
Jing


On Tue, Nov 15, 2022 at 8:52 PM Benenson, Michael via user <
user@flink.apache.org> wrote:

> Hi, folks
>
>
>
> I’m using flink 1.16.0, and I would like to read Parquet file (attached),
> that has schema [1].
>
>
>
> I could read this file with Spark, but when I try to read it with Flink
> 1.16.0 (program attached) using schema [2]
>
> I got IndexOutOfBoundsException [3]
>
>
>
> My code, and parquet file are attached. Is it:
>
> · the problem, described in FLINK-28867
>  or
>
> · something new, that deserve a separate Jira, or
>
> · something wrong with my code?
>
>
>
> [1]: Parquet Schema
>
>
>
> root
>
> |-- amount: decimal(38,9) (nullable = true)
>
> |-- connectionAccountId: string (nullable = true)
>
> |-- sourceEntity: struct (nullable = true)
>
> |    |-- extendedProperties: array (nullable = true)
>
> |    |    |-- element: struct (containsNull = true)
>
> |    |    |    |-- key: string (nullable = true)
>
> |    |    |    |-- value: string (nullable = true)
>
> |    |-- sourceAccountId: string (nullable = true)
>
> |    |-- sourceEntityId: string (nullable = true)
>
> |    |-- sourceEntityType: string (nullable = true)
>
> |    |-- sourceSystem: string (nullable = true)
>
>
>
>
>
> [2]: Schema used in Flink:
>
>
>
> static RowType getSchema()
>
> {
>
> RowType elementType = RowType.of(
>
> new LogicalType[] {
>
> new VarCharType(VarCharType.MAX_LENGTH),
>
> new VarCharType(VarCharType.MAX_LENGTH)
>
> },
>
> new String[] {
>
> "key",
>
> "value"
>
> }
>
> );
>
>
>
> RowType element = RowType.of(
>
> new LogicalType[] { elementType },
>
> new String[] { "element" }
>
> );
>
>
>
> RowType sourceEntity = RowType.of(
>
> new LogicalType[] {
>
> new ArrayType(element),
>
> new VarCharType(),
>
> new VarCharType(),
>
> new VarCharType(),
>
> new VarCharType(),
>
> },
>
> new String[] {
>
> "extendedProperties",
>
> "sourceAccountId",
>
> "sourceEntityId",
>
> "sourceEntityType",
>
> "sourceSystem"
>
> }
>
> );
>
>
>
> return  RowType.of(
>
> new LogicalType[] {
>
> new DecimalType(),
>
> new VarCharType(),
>
> sourceEntity
>
> },
>
> new String[] {
>
> "amount",
>
> "connectionAccountId",
>
> "sourceEntity",
>
> });
>
> }
>
>
>
> [3]:  Execution Exception:
>
>
> 2022/11/15 11:39:58.657 ERROR o.a.f.c.b.s.r.f.SplitFetcherManager -
> Received uncaught exception.
>
> java.lang.RuntimeException: SplitFetcher thread 0 received unexpected
> exception while polling the records
>
> at
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:150)
>
> ...
>
> Caused by: java.lang.IndexOutOfBoundsException: Index 1 out of bounds for
> length 1
>
> at
> java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
>
> at
> java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
>
> at
> java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
>
> at java.base/java.util.Objects.checkIndex(Objects.java:372)
>
> at java.base/java.util.ArrayList.get(ArrayList.java:459)
>
> at org.apache.parquet.schema.GroupType.getType(GroupType.java:216)
>
> at
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:536)
>
> at
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:533)
>
> at
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:503)
>
> at
> org.apache.flink.formats.parquet.vector.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:533)
>
> at
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createWritableVectors(ParquetVectorizedInputFormat.java:281)
>
> at
> org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReaderBatch(ParquetVectorizedInputFormat.java:270)
>
> at
> 
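As a concrete starting point for the GenericRecord route suggested in [1], a minimal sketch could look like the following; the file path is a placeholder, and the Avro schema must match the Parquet file's structure:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetReaders;

public class ParquetGenericRecordExample {

    // Build a FileSource that reads each Parquet row as an Avro GenericRecord,
    // which supports nested structs and arrays.
    public static FileSource<GenericRecord> buildSource(Schema avroSchema) {
        return FileSource
                .forRecordStreamFormat(
                        AvroParquetReaders.forGenericRecord(avroSchema),
                        new Path("file:///tmp/example.parquet"))
                .build();
    }
}

The source would then be attached with env.fromSource(...), and the nested fields read from each GenericRecord.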

Re: [blog article] Howto migrate a real-life batch pipeline from the DataSet API to the DataStream API

2022-11-10 Thread Jing Ge
Hi Etienne,

Nice blog! Thanks for sharing!

Best regards,
Jing


On Wed, Nov 9, 2022 at 5:49 PM Etienne Chauchot 
wrote:

> Hi Yun Gao,
>
> FYI I just updated the article after your review:
> https://echauchot.blogspot.com/2022/11/flink-howto-migrate-real-life-batch.html
>
> Best
>
> Etienne
> Le 09/11/2022 à 10:04, Etienne Chauchot a écrit :
>
> Hi Yun Gao,
>
> thanks for your email and your review !
>
> My comments are inline
> Le 08/11/2022 à 06:51, Yun Gao a écrit :
>
> Hi Etienne,
>
> Many thanks for the article! Flink is indeed continuing to increase its
> ability for unified batch / stream processing with the same API, and it's a
> great
> pleasure that more and more users are trying this functionality. But I also
> have some questions regarding some details.
>
> First IMO, as a whole for the long run Flink will have two unified APIs,
> namely Table / SQL
> API and DataStream API. Users could express the computation logic with
> these two APIs
> for both bounded and unbounded data processing.
>
>
> Yes that is what I understood also throughout the discussions and jiras.
> And I also think IMHO that reducing the number of APIs to 2 was the good
> move.
>
>
> Underlying Flink provides two
> execution modes:  the streaming mode works with both bounded and unbounded
> data,
> and it executes in a way of incremental processing based on state; the
> batch mode works
> only with bounded data, and it executes in a ways level-by-level similar
> to the traditional
> batch processing frameworks. Users could switch the execution mode via
> EnvironmentSettings.inBatchMode() for
> StreamExecutionEnvironment.setRuntimeMode().
>
> As recommended in the Flink docs (1) I have enabled the batch mode as I thought
> it would be more efficient on my bounded pipeline but as a matter of fact
> the streaming mode seems to be more efficient on my use case. I'll test
> with higher volumes to confirm.
>
>
>
> Specially for DataStream, as implemented in FLIP-140, currently all the
> existing DataStream
> operation supports the batch execution mode in a unified way[1]:  data
> will be sorted for the
> keyBy() edges according to the key, then the following operations like
> reduce() could receive
> all the data belonging to the same key consecutively, and then they could
> directly reduce the records
> of the same key without maintaining intermediate state. In this way
> users could write the
> same code for both streaming and batch processing.
>
>
> Yes I have no doubt that my resulting Query3ViaFlinkRowDatastream pipeline
> will work with no modification if I plug an unbounded source to it.
>
>
>
> # Regarding the migration of Join / Reduce
>
> First I think Reduce is always supported and users could write
> dataStream.keyBy().reduce(xx)
> directly, and if batch execution mode is set, the reduce will not be
> executed in an incremental way;
> instead it acts much like sort-based aggregation in a traditional
> batch processing framework.
>
> Regarding Join, although the issue of FLINK-22587 indeed exists: current
> join has to be bound
> to a window and the GlobalWindow does not work properly, but with some
> more effort it currently
> does not require users to re-write the whole join from scratch: Users
> could write a dedicated
> window assigner that assigns all the records to the same window instance
> and returns
> EventTimeTrigger.create() as the default event-time trigger [2]. Then it
> works:
>
> source1.join(source2)
> .where(a -> a.f0)
> .equalTo(b -> b.f0)
> .window(new EndOfStreamWindows())
> .apply();
>
> It does not require records to have event time attached, since the trigger of
> the window relies only
> on the time range of the window, and the assignment does not need
> event time either.
>
> The behavior of the join is also similar to sort-based join if batch mode
> is enabled.
>
> Of course requiring users to apply this workaround is not ideal, and we'll
> try to fix this issue in 1.17.
>
>
> Yes, this is a better workaround than the manual state-based join that I
> proposed. I tried it and it works perfectly with similar performance.
> Thanks.
>
>
> # Regarding support of Sort / Limit
>
> Currently these two operators are indeed not supported in the DataStream
> API directly. One initial
> thought for these two operations is that users may convert the DataStream
> to the Table API and use
> Table API for these two operators:
>
> DataStream xx = ... // Keeps the customized logic in DataStream
> Table tableXX = tableEnv.fromDataStream(xx);
> tableXX.orderBy($("a").asc());
>
>
> Yes I knew that workaround but I decided not to use it because I have a
> special SQL based implementation (for comparison reasons) so I did not want
> to mix SQL and DataStream APIs in the same pipeline.
>
>
> How do you think about this option? We are also assessing if the
> combination of DataStream
> API / Table API is sufficient for all the batch 
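For reference, the dedicated window assigner described in the join workaround above can be sketched as follows; this is a minimal illustration under the stated assumptions, not the exact class used in the benchmark:

import java.util.Collection;
import java.util.Collections;
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.WindowAssigner;
import org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

// Assigns every record to one global time window that only fires when the
// watermark reaches Long.MAX_VALUE, i.e. at the end of the bounded input.
public class EndOfStreamWindows extends WindowAssigner<Object, TimeWindow> {

    private static final TimeWindow WINDOW = new TimeWindow(Long.MIN_VALUE, Long.MAX_VALUE);

    @Override
    public Collection<TimeWindow> assignWindows(
            Object element, long timestamp, WindowAssignerContext context) {
        return Collections.singletonList(WINDOW);
    }

    @Override
    public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return EventTimeTrigger.create();
    }

    @Override
    public TypeSerializer<TimeWindow> getWindowSerializer(ExecutionConfig executionConfig) {
        return new TimeWindow.Serializer();
    }

    @Override
    public boolean isEventTime() {
        return true;
    }
}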

Re: [ANNOUNCE] Apache Flink 1.16.0 released

2022-10-28 Thread Jing Ge
Congrats!

On Fri, Oct 28, 2022 at 1:22 PM 任庆盛  wrote:

> Congratulations and a big thanks to Chesnay, Martijn, Godfrey and Xingbo
> for the awesome work for 1.16!
>
> Best regards,
> Qingsheng Ren
>
> > On Oct 28, 2022, at 14:46, Xingbo Huang  wrote:
> >
> > The Apache Flink community is very happy to announce the release of
> Apache
> > Flink 1.16.0, which is the first release for the Apache Flink 1.16
> series.
> >
> > Apache Flink® is an open-source stream processing framework for
> > distributed, high-performing, always-available, and accurate data
> streaming
> > applications.
> >
> > The release is available for download at:
> > https://flink.apache.org/downloads.html
> >
> > Please check out the release blog post for an overview of the
> > improvements for this release:
> > https://flink.apache.org/news/2022/10/28/1.16-announcement.html
> >
> > The full release notes are available in Jira:
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351275
> >
> > We would like to thank all contributors of the Apache Flink community
> > who made this release possible!
> >
> > Regards,
> > Chesnay, Martijn, Godfrey & Xingbo
>


Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-13 Thread Jing Ge
Hi Qingsheng,

Thanks for the clarification. +1, I like the idea. Pointing both numXXXOut
and numXXXSend to the same external data transfer metric does not really
break the new SinkV2 design, since there was no requirement to monitor the
internal traffic. So, I think both developers and users can live with it. It
might not be the perfect solution, but it is indeed the best current
trade-off after considering backward compatibility. I would
suggest filing a follow-up ticket after the PR to take care of the new
metric for the internal traffic in the future.

Best regards,
Jing


On Thu, Oct 13, 2022 at 3:08 PM Qingsheng Ren  wrote:

> Hi Jing,
>
> Thanks for the reply!
>
> Let me rephrase my proposal: we’d like to use numXXXOut registered on
> SinkWriterOperator to reflect the traffic to the external system for
> compatibility with old versions before 1.15, and make numXXXSend have the
> same value as numXXXOut for compatibility within 1.15. That means both
> numXXXOut and numXXXSend are used for external data transfers, which end
> users care more about. As for the internal traffic within the sink, we
> could name a new metric for it because this is a _new_ feature in the _new_
> sink, and end users usually don’t pay attention to internal implementation.
> The name of the new metric could be discussed later after 1.16 release.
>
> > but it might end up with monitoring unexpected metrics, which is even
> worse for users, i.e. I didn't change anything, but something has been
> broken since the last update.
>
> Yeah this is exactly what we are trying to fix with this proposal. I
> believe users are more concerned with the output to the external system
> than the internal data delivery in the sink, so I think we’ll have more
> cases reporting like “I set up a panel on numRecordsOut in sink to monitor
> the output of the job, but after upgrading to 1.15 this value is extremely
> low and I didn’t change anything” if we stick to the current situation. I
> think only a few end users care about the number of committables sending to
> downstream as most of them don’t care how the sink works.
>
> We do need a re-design to fully distinguish the internal and external
> traffic on metrics, not only in sink but in all operators as it’s quite
> common for operators to make IO. This needs time to design, discuss, adjust
> and vote, but considering this is blocking 1.16, maybe it’s better to
> rescue the compatibility for now, and leave the huge reconstruction to
> future versions (maybe 2.0).
>
> Best,
> Qingsheng
>
> On Wed, Oct 12, 2022 at 7:21 PM Jing Ge  wrote:
>
>> Hi Qingsheng,
>>
>> Just want to make sure we are on the same page. Are you suggesting
>> switching the naming between "numXXXSend" and "numXXXOut" or reverting all
>> the changes we did with FLINK-26126 and FLINK-26492?
>>
>> For the naming switch, please pay attention that the behaviour has been
>> changed since we introduced SinkV2[1]. So, please be aware of different
>> numbers(behaviour change) even with the same metrics name. Sticking with
>> the old name with the new behaviour (very bad idea, IMHO) might seem like
>> saving the effort in the first place, but it might end up with monitoring
>> unexpected metrics, which is even worse for users, i.e. I didn't change
>> anything, but something has been broken since the last update.
>>
>> For reverting, I am not sure how to fix the issue mentioned in
>> FLINK-26126 after reverting all changes. Like Chesnay has already pointed
>> out, with SinkV2 we have two different output lines - one with the external
>> system and the other with the downstream operator. In this case,
>> "numXXXSend" is rather a new metric than a replacement of "numXXXOut". The
>> "numXXXOut" metric can still be used, depending on what the user wants to
>> monitor.
>>
>>
>> Best regards,
>> Jing
>>
>> [1]
>> https://github.com/apache/flink/blob/51fc20db30d001a95de95b3b9993eeb06f558f6c/flink-metrics/flink-metrics-core/src/main/java/org/apache/flink/metrics/groups/SinkWriterMetricGroup.java#L48
>>
>>
>> On Wed, Oct 12, 2022 at 12:48 PM Qingsheng Ren  wrote:
>>
>>> As a supplement, considering it could be a big reconstruction
>>> redefining internal and external traffic and touching metric names in
>>> almost all operators, this requires a lot of discussions and we might
>>> do it finally in Flink 2.0. I think compatibility is a bigger blocker
>>> in front of us, as the output of sink is a metric that users care a
>>> lot about.
>>>
>>> Thanks,
>>> Qingsheng
>>>
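To illustrate where these counters live for connector authors, below is a rough sketch of a SinkV2 writer reporting its external output; the writer is a toy that just prints, and getNumRecordsSendCounter is assumed from the SinkWriterMetricGroup interface referenced in this discussion:

import java.io.IOException;
import org.apache.flink.api.connector.sink2.Sink;
import org.apache.flink.api.connector.sink2.SinkWriter;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.groups.SinkWriterMetricGroup;

public class LoggingSinkWriter implements SinkWriter<String> {

    private final Counter numRecordsSend;

    public LoggingSinkWriter(Sink.InitContext context) {
        SinkWriterMetricGroup metrics = context.metricGroup();
        // Counts records written to the external system (numRecordsSend in 1.15).
        this.numRecordsSend = metrics.getNumRecordsSendCounter();
    }

    @Override
    public void write(String element, Context context) throws IOException {
        System.out.println(element); // stand-in for a real external write
        numRecordsSend.inc();
    }

    @Override
    public void flush(boolean endOfInput) {}

    @Override
    public void close() {}
}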

Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-12 Thread Jing Ge
Hi Qingsheng,

Just want to make sure we are on the same page. Are you suggesting
switching the naming between "numXXXSend" and "numXXXOut" or reverting all
the changes we did with FLINK-26126 and FLINK-26492?

For the naming switch, please pay attention that the behaviour has been
changed since we introduced SinkV2 [1]. So, please be aware of different
numbers (behaviour change) even with the same metric name. Sticking with
the old name with the new behaviour (very bad idea, IMHO) might seem like
saving the effort in the first place, but it might end up with monitoring
unexpected metrics, which is even worse for users, i.e. I didn't change
anything, but something has been broken since the last update.

For reverting, I am not sure how to fix the issue mentioned in FLINK-26126
after reverting all changes. Like Chesnay has already pointed out, with
SinkV2 we have two different output lines - one with the external system
and the other with the downstream operator. In this case, "numXXXSend" is
rather a new metric than a replacement of "numXXXOut". The "numXXXOut"
metric can still be used, depending on what the user wants to monitor.


Best regards,
Jing

[1]
https://github.com/apache/flink/blob/51fc20db30d001a95de95b3b9993eeb06f558f6c/flink-metrics/flink-metrics-core/src/main/java/org/apache/flink/metrics/groups/SinkWriterMetricGroup.java#L48


On Wed, Oct 12, 2022 at 12:48 PM Qingsheng Ren  wrote:

> As a supplement, considering it could be a big reconstruction
> redefining internal and external traffic and touching metric names in
> almost all operators, this requires a lot of discussions and we might
> do it finally in Flink 2.0. I think compatibility is a bigger blocker
> in front of us, as the output of sink is a metric that users care a
> lot about.
>
> Thanks,
> Qingsheng
>
> On Wed, Oct 12, 2022 at 6:20 PM Qingsheng Ren  wrote:
> >
> > Thanks Chesnay for the reply. +1 for making a unified and clearer
> > metric definition distinguishing internal and external data transfers.
> > As you described, having IO in operators is quite common such as
> > dimension tables in Table/SQL API. This definitely deserves a FLIP and
> > an overall design.
> >
> > However I think it's necessary to change the metric back to
> > numRecordsOut instead of sticking with numRecordsSend in 1.15 and
> > 1.16. The most important argument is for compatibility as I mentioned
> > in my previous email, otherwise all users have to modify their configs
> > of metric systems after upgrading to Flink 1.15+, and all custom
> > connectors have to change their implementations to migrate to the new
> > metric name. I believe other ones participating and approving this
> > proposal share the same concern about compatibility too. Also
> > considering this issue is blocking the release of 1.16, maybe we could
> > fix this asap, and as for defining a new metric for internal data
> > transfers we can have an in-depth discussion later. WDYT?
> >
> > Best,
> > Qingsheng
> >
> > On Tue, Oct 11, 2022 at 6:06 PM Chesnay Schepler 
> wrote:
> > >
> > > Currently I think that would be a mistake.
> > >
> > > Ultimately what we have here is the culmination of us never really
> considering how the numRecordsOut metric should behave for operators that
> emit data to other operators _and_ external systems. This goes beyond sinks.
> > > This even applies to numRecordsIn, for cases where functions
> query/write data from/to the outside, (e.g., Async IO).
> > >
> > > Having 2 separate metrics for that, 1 exclusively for internal data
> transfers, and 1 exclusively for external data transfers, is the only way
> to get a consistent metric definition in the long-run.
> > > We can jump back-and-forth now or just commit to it.
> > >
> > > I don't think we can really judge this based on FLIP-33. It was IIRC
> written before the two phase sinks were added, which heavily blurred the
> lines of what a sink even is. Because it definitely is _not_ the last
> operator in a chain anymore.
> > >
> > > What I would suggest is to stick with what we got (although I despise
> the name numRecordsSend), and alias the numRecordsOut metric for all
> non-TwoPhaseCommittingSink.
> > >
> > > On 11/10/2022 05:54, Qingsheng Ren wrote:
> > >
> > > Thanks for the details Chesnay!
> > >
> > > By “alias” I mean to respect the original definition made in FLIP-33
> for numRecordsOut, which is the number of records written to the external
> system, and keep numRecordsSend as the same value as numRecordsOut for
> compatibility.
> > >
> > > I think keeping numRecordsOut for the output to the external system is
> more intuitive to end users because in most cases the metric of data flow
> output is more essential. I agree with you that a new metric is required,
> but considering compatibility and users’ intuition I prefer to keep the
> initial definition of numRecordsOut in FLIP-33 and name a new metric for
> sink writer’s output to downstream operators. This might be against
> consistency with 

Re: [DISCUSS] FLIP-265 Deprecate and remove Scala API support

2022-10-04 Thread Jing Ge
Hi Martijn,

Thanks for bringing this up. It is generally a great idea, so +1.

Both Scala extension projects mentioned in the FLIP are still very young,
and I don't think they will attract more Scala developers than Flink itself
could, precisely because they are external projects. Rewriting large
codebases will be a big issue for the affected users. Those users should be
aware of the effort from now on; they had better not count on those Scala
extension projects, and should prepare their migration plan before Flink 2.0.

Best regards,
Jing


On Tue, Oct 4, 2022 at 1:59 PM Martijn Visser 
wrote:

> Hi Marton,
>
> You're making a good point, I originally wanted to include already the
> User mailing list to get their feedback but forgot to do so. I'll do some
> more outreach via other channels as well.
>
> @Users of Flink, I've made a proposal to deprecate and remove Scala API
> support in a future version of Flink. Your feedback on this topic is very
> much appreciated.
>
> Regarding the large Scala codebase for Flink, a potential alternative
> could be to have a wrapper for all Java APIs that makes them available as
> Scala APIs. However, this still requires Scala maintainers and I don't
> think that we currently have those in our community. The easiest solution
> for them would be to use the Java APIs directly. Yes it would involve work,
> but we won't actually be able to remove the Scala APIs until Flink 2.0 so
> there's still time for that :)
>
> Best regards,
>
> Martijn
>
> On Tue, Oct 4, 2022 at 1:26 AM Márton Balassi 
> wrote:
>
>> Hi Martijn,
>>
>> Thanks for compiling the FLIP. I agree with the sentiment that Scala poses
>> considerable maintenance overhead and that key improvements (like 2.13 or
>> 2.12.8 support) are hanging stale. That said, before we make this move we
>> should attempt to understand the affected user base.
>> A quick Slack and user mailing list search does return quite a few results
>> for Scala (admittedly, a cursory look at them suggests that many have to do
>> with features missing from the Scala API that exist in Java, or with Scala
>> versions). I would love to see some polls on this topic; we could also use
>> the Flink Twitter handle to ask the community about this.
>>
>> I am aware of users having large existing Scala codebases for Flink. This
>> move would pose a very large effort on them, as they would need to rewrite
>> much of their existing code. What are the alternatives in your opinion,
>> Martijn?
>>
>> On Tue, Oct 4, 2022 at 6:22 AM Martijn Visser 
>> wrote:
>>
>> > Hi everyone,
>> >
>> > I would like to open a discussion thread on FLIP-265 Deprecate and
>> remove
>> > Scala API support. Please take a look at
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-265+Deprecate+and+remove+Scala+API+support
>> > and provide your feedback.
>> >
>> > Best regards,
>> >
>> > Martijn
>> > https://twitter.com/MartijnVisser82
>> > https://github.com/MartijnVisser
>> >
>>
>


Re: DataStream and DataStreamSource

2022-09-14 Thread Jing Ge
Hi,

Welcome to the Flink community!

A DataStreamSource is a DataStream. It is normally the starting point of a
DataStream program. All related methods in StreamExecutionEnvironment that
create a DataStream actually return a DataStreamSource, because that is
where a DataStream starts.

Commonly, you don't need to care about DataStreamSource; just use DataStream
even though methods in StreamExecutionEnvironment return a DataStreamSource
[1]. A DataStreamSource created by those methods uses a built-in
SourceTransformation. If you want to modify the configuration of that
transformation, you can specifically use the DataStreamSource type (instead
of DataStream), which provides some setter methods inherited from
SingleOutputStreamOperator.
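To make the difference concrete, here is a small sketch; the elements and
names are illustrative:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DataStreamSourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Usually the concrete type does not matter: assign to DataStream.
        DataStream<String> words = env.fromElements("hello", "flink");

        // Keep the DataStreamSource type only when the setters inherited
        // from SingleOutputStreamOperator are needed to tweak the source.
        DataStreamSource<String> source = env.fromElements("hello", "flink");
        source.setParallelism(1).name("bounded-word-source");

        words.print();
        source.map(String::toUpperCase).print();

        env.execute("DataStreamSource demo");
    }
}
```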

Best regards,
Jing

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/overview/

On Wed, Sep 14, 2022 at 3:23 PM Noel OConnor  wrote:

> Hi,
> I'm new to flink and I'm trying to integrate it with apache pulsar.
> I've gone through the demos and I get how they work but one aspect
> that I can't figure out is what's the difference between a DataStream
> and a DataStreamSource.
> When would you use one over the other?
>
> cheers
> Noel
>


Re: Re: Questions about Flink Table Store

2022-09-09 Thread Jing Ge
My understanding is that the overall goal is unified stream and batch
processing. The design essentially abstracts the storage layer; from a
certain angle it can be seen as storage virtualization, which leaves much
more room for future possibilities. Iceberg and Hudi can serve as concrete
implementations plugged in underneath.

On Fri, Sep 9, 2022 at 2:44 PM Xuyang  wrote:

> Hi, I understand Flink Table Store mainly has the following advantages:
> 1. Less architectural complexity: no extra components need to be introduced
> 2. Flink computation can directly use Flink Table Store's storage
> 3. Millisecond-level streaming queries and OLAP capability
>
>
>
>
> --
>
> Best!
> Xuyang
>
>
>
>
>
> On 2022-09-08 16:09:39, "r pp" wrote:
> >It should be for unified stream and batch processing, without losing data.
> >
> >Kyle Zhang wrote on Thu, Sep 8, 2022 at 08:37:
> >
> >> Hi all,
> >>   The Table Store introduction is also about data lake storage and
> >> real-time streaming reads. How does it differ in positioning from
> >> projects like Iceberg and Hudi, and why develop another project?
> >>
> >> Best.
> >>
> >
> >
> >--
> >Best,
> >  pp
>


Re: Flink upgrade path

2022-09-07 Thread Jing Ge
Hi,

I would recommend checking the release notes of 1.14 [1] and 1.15 [2]. If
your Flink jobs use features that saw big improvements in these two
releases, it would be better to upgrade step by step without skipping
1.14.x.

In general, depending on how complicated your jobs are, upgrading Flink
while skipping versions is always a big challenge, i.e. it is recommended
to upgrade regularly, following the Flink release cadence.

[1]
https://nightlies.apache.org/flink/flink-docs-master/release-notes/flink-1.14/
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.15/release-notes/flink-1.15/

Best regards,
Jing

On Wed, Sep 7, 2022 at 11:14 AM Congxian Qiu  wrote:

> In addition to the state compatibility mentioned above, the interfaces
> provided by Flink are stable if they have public annotation[1]
>
> [1]
> https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/annotation/Public.java
>
> Best,
> Congxian
>
>
> Hangxiang Yu wrote on Wed, Sep 7, 2022 at 10:31:
>
>> Hi, Alexey.
>> You could check the state compatibility in the compatibility table.
>> The page includes how to upgrade and whether it is compatible among
>> different versions.
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/upgrading/#compatibility-table
>>
>> On Wed, Sep 7, 2022 at 7:04 AM Alexey Trenikhun  wrote:
>>
>>> Hello,
>>> Can I upgade from 1.13.6 directly to 1.15.2 skipping 1.14.x ?
>>>
>>> Thanks,
>>> Alexey
>>>
>>>
>>
>> --
>> Best,
>> Hangxiang.
>>
>


Re: [ANNOUNCE] Apache Flink 1.15.2 released

2022-08-24 Thread Jing Ge
Thanks Danny for your effort!

Best regards,
Jing

On Wed, Aug 24, 2022 at 11:43 PM Danny Cranmer 
wrote:

> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.15.2, which is the second bugfix release for the Apache Flink 1.15
> series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2022/08/25/release-1.15.2.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351829
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Regards,
> Danny Cranmer
>


Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.1.0 released

2022-07-25 Thread Jing Ge
Congrats! Thank you all!

Best regards,
Jing

On Mon, Jul 25, 2022 at 7:51 AM Px New <15701181132mr@gmail.com> wrote:

> 
>
> Yang Wang wrote on Mon, Jul 25, 2022 at 10:55:
>
> > Congrats! Thanks Gyula for driving this release, and thanks to all
> > contributors!
> >
> >
> > Best,
> > Yang
> >
> > Gyula Fóra wrote on Mon, Jul 25, 2022 at 10:44:
> >
> > > The Apache Flink community is very happy to announce the release of
> > Apache
> > > Flink Kubernetes Operator 1.1.0.
> > >
> > > The Flink Kubernetes Operator allows users to manage their Apache Flink
> > > applications and their lifecycle through native k8s tooling like
> kubectl.
> > >
> > > Please check out the release blog post for an overview of the release:
> > >
> > >
> >
> https://flink.apache.org/news/2022/07/25/release-kubernetes-operator-1.1.0.html
> > >
> > > The release is available for download at:
> > > https://flink.apache.org/downloads.html
> > >
> > > Maven artifacts for Flink Kubernetes Operator can be found at:
> > >
> > >
> >
> https://search.maven.org/artifact/org.apache.flink/flink-kubernetes-operator
> > >
> > > Official Docker image for the Flink Kubernetes Operator can be found
> at:
> > > https://hub.docker.com/r/apache/flink-kubernetes-operator
> > >
> > > The full release notes are available in Jira:
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351723
> > >
> > > We would like to thank all contributors of the Apache Flink community
> who
> > > made this release possible!
> > >
> > > Regards,
> > > Gyula Fora
> > >
> >
>


Re: Using RocksDBStateBackend and SSD to store states, application runs slower..

2022-07-21 Thread Jing Ge
Hi,

using FLASH_SSD_OPTIMIZED already sets the number of RocksDB background
threads to 4. This optimization can improve the source throughput and
reduce the delayed write rate.

If this optimization didn't fix the back pressure, could you share more
information about your job? Could you check the metric of the back
pressured operator, e.g. check if it is caused by write-heavy or read-heavy
tasks? You could try tuning rocksdb.writebuffer for write-heavy tasks.
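For reference, a flink-conf.yaml sketch along those lines; the values are
illustrative starting points to validate against your workload, not
recommendations:

```
state.backend.rocksdb.predefined-options: FLASH_SSD_OPTIMIZED
# FLASH_SSD_OPTIMIZED already raises the background thread count to 4,
# so the explicit setting below is optional:
state.backend.rocksdb.thread.num: 4
# For write-heavy tasks, try larger and/or more write buffers:
state.backend.rocksdb.writebuffer.size: 128mb
state.backend.rocksdb.writebuffer.count: 4
```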

On Thu, Jul 21, 2022 at 5:59 PM Yaroslav Tkachenko 
wrote:

> Hi!
>
> I'd try re-running the SSD test with the following config options:
>
> state.backend.rocksdb.thread.num: 4
> state.backend.rocksdb.predefined-options: FLASH_SSD_OPTIMIZED
>
>
> On Thu, Jul 21, 2022 at 4:11 AM vtygoss  wrote:
>
>> Hi, community!
>>
>>
>> I am doing some performance tests based on my scene.
>>
>>
>> 1. Environment
>>
>> - Flink: 1.13.5
>>
>> - StateBackend: RocksDB, incremental
>>
>> - use case: a complex SQL job containing 7 joins and 2 aggregations; input
>> 30,000,000 records, output 60,000,000 records (about 80 GB).
>>
>> - resource: flink on yarn. JM 2G, one TM 24G(8G on-heap, 16G off-heap). 3
>> slots per TM
>>
>> - only difference: different config 'state.backend.rocksdb.localdir', one
>> SATA disk or one SSD disk.
>>
>>
>> 2. rand write performance difference between SATA and SSD
>>
>>    4.8M/s is achieved using SATA, while 48.2M/s using SSD.
>>
>>```
>>
>>fio -direct=1 -iodepth 64 -thread -rw=randwrite -ioengine=sync
>>  -fsync=1 -runtime=300 -group_reporting -name=xxx -size=100G
>> --allow_mounted_write=1 -bs=8k  -numjobs=64 -filename=/mnt/disk11/xx
>>
>>```
>>
>>
>> 3. In my use case, the Flink SQL application finished in 41 minutes using
>> SATA, while taking 45 minutes using SSD.
>>
>>
>> Does this comparison suggest that the way to improve RocksDB performance
>> by using SSD is not effective?
>>
>> The direct downstream of the back-pressured operator is the HDFS sink; does
>> that mean the best target for improving application performance is HDFS?
>>
>>
>> Thanks for your any replies or suggestions.
>>
>>
>> Best Regards!
>>
>>
>>
>>
>>
>>
>>
>>
>>


Re: Unit test have Error "could not find implicit value for evidence parameter"

2022-07-13 Thread Jing Ge
Hi,

you don't have to do that. Next time you can try "Invalidate Caches..."
under the File menu in IntelliJ IDEA.

Best regards,
Jing

On Wed, Jul 13, 2022 at 7:21 PM Min Tu via user 
wrote:

> Thanks a lot !! I have removed the .idea folder and the unit test works.
>
> On Mon, Jul 11, 2022 at 2:44 PM Alexander Fedulov 
> wrote:
>
>> Hi Min Tu,
>>
>> try clean install to make sure the build starts from scratch. Refresh
>> maven modules in IntelliJ after the build. If that doesn't work, try
>> invalidating IntelliJ caches and/or reimporting the project (remove .idea
>> folder).
>>
>> Best,
>> Alexander Fedulov
>>
>> On Sun, Jul 10, 2022 at 12:59 AM Hemanga Borah 
>> wrote:
>>
>>> What version of scala are you using?
>>>
>>> Depending on the build you are using you may be using 2.11 or 2.12. The
>>> version on your maven build files needs to be consistent across your system.
>>>
>>> On Fri, Jul 8, 2022 at 10:00 PM Min Tu via user 
>>> wrote:
>>>
 Hi,

 I have downloaded the flink code and the build works fine
 with following command

 mvnw install -DskipTests -Dcheckstyle.skip

 Then I try to run the unit test code in IntelliJ, but got following
 error:


 /Users/mintu/ApacheProjects/flink/flink-scala/src/test/scala/org/apache/flink/api/scala/DeltaIterationSanityCheckTest.scala:34:41
 *could not find implicit value for evidence parameter of type*
 org.apache.flink.api.common.typeinfo.TypeInformation[(Int, String)]
 val solutionInput = env.fromElements((1, "1"))

 Please advise.

 Thanks in advance

>>>


Re: 退订/unsubscribe

2022-06-19 Thread Jing Ge
To unsubscribe, please send an email (any content) to
user-unsubscr...@flink.apache.org

Thanks

Best regards,
Jing


From: liangzai 
Date: Sun, Jun 19, 2022 at 4:37 AM
Subject: Re: New KafkaSource API: Change in default behavior regarding
starting offset
To: bastien dine 
Cc: Martijn Visser , Jing Ge ,
user 


How do I unsubscribe from this mailing list?


 Replied Message 
From bastien dine 
Date 06/15/2022 17:50
To Martijn Visser 
Cc Jing Ge ,
user  
Subject Re: New KafkaSource API : Change in default behavior regarding
starting offset
Hello Martijn,

Thanks for the link to the release note, especially :
"When resuming from the savepoint, please use
setStartingOffsets(OffsetsInitializer.committedOffsets()) in the new
KafkaSourceBuilder to transfer the offsets to the new source."
So earliest is the new default
We use for sure  .committedOffsets - we have it by default in our custom
KafkaSource builder to be sure we do not read all the previous data
(earliest)

What bothers me is just this change in the default starting offset behavior
from FlinkKafkaConsumer to KafkaSource (this can lead to mistakes).
In fact, it happens that we drop some of our Kafka source state to read
again from the Kafka committed offsets, but maybe nobody does that ^^

Anyway thanks for the focus on the release note !

Best Regards,

--

Bastien DINE
Data Architect / Software Engineer / Sysadmin
bastiendine.io


On Wed, Jun 15, 2022 at 10:58, Martijn Visser wrote:

> Hi Bastien,
>
> When the FlinkKafkaConsumer was deprecated in 1.14.0, the release notes
> included the instructions on how to migrate from FlinkKafkaConsumer to
> KafkaSource [1]. Looking at the Kafka documentation [2], there is a
> section on how to upgrade to the latest connector version that I think is
> outdated. I'm leaning towards copying the migration instructions to the
> generic documentation. Do you think that would have sufficed?
>
> Best regards,
>
> Martijn
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/release-notes/flink-1.14/#deprecate-flinkkafkaconsumer
> [2]
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/#upgrading-to-the-latest-connector-version
>
> On Wed, Jun 15, 2022 at 09:22, bastien dine wrote:
>
>> Hello Jing,
>>
>> This was the previous method in the old Kafka consumer API; it has been
>> removed in 1.15, so the source code is not in master anymore.
>> Yes, I know that with the new offsets initializer, committed offsets with
>> earliest as fallback can be used to get the same behavior as before.
>> I just wanted to know whether this is a changed behavior or whether I am
>> missing something.
>>
>>
>>
>> Bastien DINE
>> Freelance
>> Data Architect / Software Engineer / Sysadmin
>> http://bastiendine.io
>>
>>
>>
>> On Tue, Jun 14, 2022 at 23:08, Jing Ge wrote:
>>
>>> Hi Bastien,
>>>
>>> Thanks for asking. I didn't find any call of setStartFromGroupOffsets() 
>>> within
>>> Flink in the master branch. Could you please point out the code that
>>> committed offset is used as default?
>>>
>>> W.r.t. the new KafkaSource, if OffsetsInitializer.committedOffsets()
>>> is used, an exception will be thrown at runtime in case there is no
>>> committed offset, which is useful if the user intends to read from the
>>> committed offset but something is wrong. It might feel weird if it were used
>>> as the default, because an exception would be thrown when users start new jobs
>>> with default settings.
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Tue, Jun 14, 2022 at 4:15 PM bastien dine 
>>> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> Does someone know why the starting offset behaviour has changed in the
>>>> new Kafka Source ?
>>>>
>>>> This is now from earliest (code in KafkaSourceBuilder), doc says :
>>>> "If offsets initializer is not specified, OffsetsInitializer.earliest() 
>>>> will
>>>> be used by default." from :
>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/#starting-offset
>>>>
>>>> Before in old FlinkKafkaConsumer it was from committed offset (i.e : 
>>>> setStartFromGroupOffsets()
>>>> method)
>>>>
>>>> which match with this behaviour in new KafkaSource :   :
>>>> OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST
>>>>
>>>> This change can lead to big trouble if users pay no attention to this
>>>> point when migrating from the old KafkaConsumer to the new KafkaSource,
>>>>
>>>> Regards,
>>>> Bastien
>>>>
>>>> --
>>>>
>>>> Bastien DINE
>>>> Data Architect / Software Engineer / Sysadmin
>>>> bastiendine.io
>>>>
>>>


Re: Metrics for FileSource

2022-06-18 Thread Jing Ge
Hi Meghajit,

Thanks for the feedback. I have fired a ticket:
https://issues.apache.org/jira/browse/FLINK-28117

Best regards,
Jing

On Mon, Jun 13, 2022 at 7:23 AM Meghajit Mazumdar <
meghajit.mazum...@gojek.com> wrote:

> Hi folks,
>
> Thanks for the reply.
> We have implemented our own SplitAssigner, FileReaderFormat and
> FileReaderFormat.Reader implementations. Hence, we plan to add custom
> metrics such as these:
> 1. No. of splits SplitAssigner is initialized with, number of splits
> re-added back to the SplitAssigner
> 2. Readers created per unit time
> 3. Time taken to create a reader
> 4. Time taken for the Reader to produce a single Row
> 5. Readers closed per unit time
> ... and some more
>
> However, since we haven't implemented our own FileSource or
> SplitEnumerator, we don't have visibility into the metrics of these
> components. We would ideally like to measure these:
> 1. Number of rows emitted by the source per unit time
> 2. Time taken by the enumerator to discover the splits
> 3. Total splits discovered
>
>
> Regards,
> Meghajit
>
>
> On Fri, Jun 10, 2022 at 10:04 PM Jing Ge  wrote:
>
>> Hi Meghajit,
>>
>> I think it makes sense to extend the current metrics. Could you list all
>> metrics you need? Thanks!
>>
>> Best regards,
>> Jing
>>
>> On Fri, Jun 10, 2022 at 5:06 PM Lijie Wang 
>> wrote:
>>
>>> Hi Meghajit,
>>>
>>> As far as I know, currently, the FileSource does not have the metrics
>>> you need.  You can implement your own source, and register custom metrics
>>> via `SplitEnumeratorContext#metricGroup` and
>>> `SourceReaderContext#metricGroup`.
>>>
>>> Best,
>>> Lijie
>>>
>>> Meghajit Mazumdar wrote on Fri, Jun 10, 2022 at 16:36:
>>>
>>>> Hello,
>>>>
>>>> We are working on a Flink project which uses FileSource to discover and
>>>> read Parquet Files from GCS. ( using Flink 1.14)
>>>>
>>>> As part of this, we wanted to implement some health metrics around the
>>>> code.
>>>> I wanted to know whether Flink gathers some metrics by itself around
>>>> FileSource, e.g., number of files discovered by the SplitEnumerator, number
>>>> of files added back to SplitAssigner, time taken to process per split, etc 
>>>> ?
>>>>
>>>> I checked in the official documentation
>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/filesystem/>
>>>> but there doesn't appear to be. Is the solution then to implement
>>>> custom metrics like this
>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/metrics/>
>>>> ?
>>>>
>>>>
>>>> *Regards,*
>>>> *Meghajit*
>>>>
>>>
>
> --
> *Regards,*
> *Meghajit*
>


Re: Flink config driven tool ?

2022-06-15 Thread Jing Ge
Hi,

Just like Shengkai mentioned, I would strongly suggest trying SQL for your
ETL DAG. If you find anything for which SQL does not work for you, please
share your requirements with us. We can then check whether it makes sense
to build new features in Flink to support them.
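To give an idea, a complete source -> transform -> sink pipeline can be
declared end to end in Flink SQL; all connector options below are
illustrative placeholders:

```sql
-- Source: declared, not coded (options are placeholders).
CREATE TABLE src (
  id BIGINT,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'input',
  'properties.bootstrap.servers' = 'broker:9092',
  'format' = 'json'
);

-- Sink: likewise fully declarative.
CREATE TABLE dst (
  id BIGINT,
  payload STRING
) WITH (
  'connector' = 'filesystem',
  'path' = 's3://bucket/output',
  'format' = 'parquet'
);

-- The transformation itself.
INSERT INTO dst SELECT id, UPPER(payload) FROM src;
```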

Best regards,
Jing


On Wed, Jun 15, 2022 at 11:22 AM Rakshit Ramesh <
rakshit.ram...@datakaveri.org> wrote:

> I'm working on such a thing.
> It's in early stages and needs a lot more work.
> I'm open to collaborating.
> https://github.com/datakaveri/iudx-adaptor-framework
>
> On Tue, 7 Jun 2022 at 23:49, sri hari kali charan Tummala <
> kali.tumm...@gmail.com> wrote:
>
>> Hi Flink Community,
>>
>> can someone point me to good config-driven Flink data movement tool
>> GitHub repos? Imagine building an ETL DAG connecting source -->
>> transformations --> target using just a config file.
>>
>> below are a few spark examples:-
>> https://github.com/mvrpl/big-shipper
>> https://github.com/BitwiseInc/Hydrograph
>>
>> Thanks & Regards
>> Sri Tummala
>>
>>


Re: New KafkaSource API : Change in default behavior regarding starting offset

2022-06-14 Thread Jing Ge
Hi Bastien,

Thanks for asking. I didn't find any call of setStartFromGroupOffsets()
within Flink in the master branch. Could you please point out the code
where the committed offset is used as the default?

W.r.t. the new KafkaSource, if OffsetsInitializer.committedOffsets()
is used, an exception will be thrown at runtime in case there is no
committed offset, which is useful if the user intends to read from the
committed offset but something is wrong. It might feel weird if it were used
as the default, because an exception would be thrown when users start new
jobs with default settings.
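For migration purposes, the old default can be reproduced explicitly when
building the new source; the broker address, topic, and group id below are
illustrative:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;

public class CommittedOffsetsExample {
    public static KafkaSource<String> build() {
        // Same behavior as the old FlinkKafkaConsumer default: committed
        // group offsets, with earliest as the fallback reset strategy.
        return KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")   // illustrative
                .setTopics("input-topic")             // illustrative
                .setGroupId("my-consumer-group")      // illustrative
                .setStartingOffsets(OffsetsInitializer.committedOffsets(
                        OffsetResetStrategy.EARLIEST))
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
    }
}
```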

Best regards,
Jing

On Tue, Jun 14, 2022 at 4:15 PM bastien dine  wrote:

> Hello everyone,
>
> Does someone know why the starting offset behaviour has changed in the new
> Kafka Source ?
>
> This is now from earliest (code in KafkaSourceBuilder), doc says :
> "If offsets initializer is not specified, OffsetsInitializer.earliest() will
> be used by default." from :
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/#starting-offset
>
> Before in old FlinkKafkaConsumer it was from committed offset (i.e : 
> setStartFromGroupOffsets()
> method)
>
> which match with this behaviour in new KafkaSource :   :
> OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST
>
> This change can lead to big trouble if users pay no attention to this
> point when migrating from the old KafkaConsumer to the new KafkaSource,
>
> Regards,
> Bastien
>
> --
>
> Bastien DINE
> Data Architect / Software Engineer / Sysadmin
> bastiendine.io
>


Re: Apache Flink - Reading data from Scylla DB

2022-06-14 Thread Jing Ge
Hi,

Please be aware that SourceFunction will be deprecated soon [1]. It is
recommended to build a new source connector based on the new Source API
designed in FLIP-27 [2]. You might take the Kafka connector as the
reference implementation.
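In case it helps to get started, below is a skeleton of what a FLIP-27
source for Scylla could look like; every type name and method body is a
placeholder to fill in, not an existing connector:

```java
import org.apache.flink.api.connector.source.Boundedness;
import org.apache.flink.api.connector.source.Source;
import org.apache.flink.api.connector.source.SourceReader;
import org.apache.flink.api.connector.source.SourceReaderContext;
import org.apache.flink.api.connector.source.SourceSplit;
import org.apache.flink.api.connector.source.SplitEnumerator;
import org.apache.flink.api.connector.source.SplitEnumeratorContext;
import org.apache.flink.core.io.SimpleVersionedSerializer;

/** Skeleton of a FLIP-27 source for Scylla; all bodies are placeholders. */
public class ScyllaSource implements Source<String, ScyllaSource.ScyllaSplit, Void> {

    /** A split could map to a Scylla token range (placeholder). */
    public static class ScyllaSplit implements SourceSplit {
        @Override
        public String splitId() {
            return "token-range-0"; // placeholder id
        }
    }

    @Override
    public Boundedness getBoundedness() {
        return Boundedness.BOUNDED; // or CONTINUOUS_UNBOUNDED for CDC-style reads
    }

    @Override
    public SourceReader<String, ScyllaSplit> createReader(SourceReaderContext ctx) {
        throw new UnsupportedOperationException("read rows for assigned splits here");
    }

    @Override
    public SplitEnumerator<ScyllaSplit, Void> createEnumerator(
            SplitEnumeratorContext<ScyllaSplit> ctx) {
        throw new UnsupportedOperationException("discover token ranges here");
    }

    @Override
    public SplitEnumerator<ScyllaSplit, Void> restoreEnumerator(
            SplitEnumeratorContext<ScyllaSplit> ctx, Void checkpoint) {
        throw new UnsupportedOperationException("restore enumerator state here");
    }

    @Override
    public SimpleVersionedSerializer<ScyllaSplit> getSplitSerializer() {
        throw new UnsupportedOperationException("serialize splits here");
    }

    @Override
    public SimpleVersionedSerializer<Void> getEnumeratorCheckpointSerializer() {
        throw new UnsupportedOperationException("serialize enumerator state here");
    }
}
```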

Best regards,
Jing

[1] https://lists.apache.org/thread/d6cwqw9b3105wcpdkwq7rr4s7x4ywqr9
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface



On Tue, Jun 14, 2022 at 6:30 AM yuxia  wrote:

> Seems you may need to implement a custom connector for Scylla DB, as I
> haven't found an existing connector at hand.
> Hope the doc[1][2] can help you implement your own connector.
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/
> [2] https://flink.apache.org/2021/09/07/connector-table-sql-api-part1.html
>
> Best regards,
> Yuxia
>
> --
> *From: *"Himanshu Sareen"
> *To: *"User"
> *Sent: *Tuesday, June 14, 2022, 11:29:38 AM
> *Subject: *Apache Flink - Reading data from Scylla DB
>
> Team,
>
> I'm looking for a solution to Consume/Read data from Scylla DB into Apache
> Flink.
>
> If anyone can guide me or share pointers it will be helpful.
>
> Regards,
> Himanshu
>
>


Re: Suspecting a method in the source code is never-reached code

2022-06-14 Thread Jing Ge
Hi,

A friendly reminder: when opening the ticket and for follow-up discussions
on the dev list, please remember to use English.

Best,
Jing


On Tue, Jun 14, 2022 at 3:23 PM Yun Tang  wrote:

> Hi Yufeng,
>
> I think your analysis is correct. You can create a ticket to fix this
> issue. Also, for detailed discussion of the code implementation, I suggest
> using the dev mailing list.
>
> Best,
> Yun Tang
> 
> From: 朱育锋 
> Sent: Tuesday, June 14, 2022 19:33
> To: user-zh@flink.apache.org 
> Subject: Suspecting a method in the source code is never-reached code
>
> Hello Everyone
>
>
> While reading the code of the ProcessMemoryUtils class, I suspect that the
> main logic of the sanityCheckTotalProcessMemory method [1] will never
> execute:
>
> 1. The deriveJvmMetaspaceAndOverheadFromTotalFlinkMemory method checks
> whether TotalProcessMemory is explicitly configured [2].
> 2. The false branch (i.e., TotalProcessMemory is not explicitly configured)
> calls sanityCheckTotalProcessMemory, but the main logic of
> sanityCheckTotalProcessMemory only executes when TotalProcessMemory is
> explicitly configured [3]. Therefore the main logic of
> sanityCheckTotalProcessMemory should never execute.
>
> Referring to the call sites [4][5] of sanityCheckTotalFlinkMemory in the
> TaskExecutorFlinkMemoryUtils class (that method is similar in logic to
> sanityCheckTotalProcessMemory: both compare the derived memory size with
> the explicitly configured one), I wonder whether
> sanityCheckTotalProcessMemory should be placed after the if branch in
> deriveJvmMetaspaceAndOverheadFromTotalFlinkMemory rather than inside it.
>
> It may also be that I haven't understood this code deeply enough and
> missed the intent behind it; I'd appreciate it if someone could confirm.
>
> [1]
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/util/config/memory/ProcessMemoryUtils.java#L239
> [2]
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/util/config/memory/ProcessMemoryUtils.java#L170
> [3]
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/util/config/memory/ProcessMemoryUtils.java#L247
> [4]
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/util/config/memory/taskmanager/TaskExecutorFlinkMemoryUtils.java#L101
> [5]
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/util/config/memory/taskmanager/TaskExecutorFlinkMemoryUtils.java#L192
>
> Best regards
> YuFeng
>


Re: Metrics for FileSource

2022-06-10 Thread Jing Ge
Hi Meghajit,

I think it makes sense to extend the current metrics. Could you list all
metrics you need? Thanks!
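Until such metrics ship out of the box, a custom source can register its
own counters through the context metric groups mentioned below, e.g. (the
names and wiring are illustrative):

```java
import org.apache.flink.api.connector.source.SplitEnumeratorContext;
import org.apache.flink.metrics.Counter;

/** Illustrative helper a custom enumerator could own for its metrics. */
public class EnumeratorMetrics {
    private final Counter splitsDiscovered;
    private final Counter splitsReAdded;

    public EnumeratorMetrics(SplitEnumeratorContext<?> context) {
        // Counters become visible like any other operator metric.
        this.splitsDiscovered = context.metricGroup().counter("splitsDiscovered");
        this.splitsReAdded = context.metricGroup().counter("splitsReAdded");
    }

    public void onSplitDiscovered() {
        splitsDiscovered.inc();
    }

    public void onSplitReAdded() {
        splitsReAdded.inc();
    }
}
```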

Best regards,
Jing

On Fri, Jun 10, 2022 at 5:06 PM Lijie Wang  wrote:

> Hi Meghajit,
>
> As far as I know, currently, the FileSource does not have the metrics you
> need.  You can implement your own source, and register custom metrics via
> `SplitEnumeratorContext#metricGroup` and `SourceReaderContext#metricGroup`.
>
> Best,
> Lijie
>
> Meghajit Mazumdar wrote on Fri, Jun 10, 2022 at 16:36:
>
>> Hello,
>>
>> We are working on a Flink project which uses FileSource to discover and
>> read Parquet Files from GCS. ( using Flink 1.14)
>>
>> As part of this, we wanted to implement some health metrics around the
>> code.
>> I wanted to know whether Flink gathers some metrics by itself around
>> FileSource, e.g., number of files discovered by the SplitEnumerator, number
>> of files added back to SplitAssigner, time taken to process per split, etc ?
>>
>> I checked in the official documentation
>> 
>> but there doesn't appear to be. Is the solution then to implement
>> custom metrics like this
>> 
>> ?
>>
>>
>> *Regards,*
>> *Meghajit*
>>
>


Re: [External] Re: Source vs SourceFunction and testing

2022-06-09 Thread Jing Ge
Hi Carlos,

You might want to join the discussion about FLIP-238[1] to share your
thoughts with us. Thanks!
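In the meantime, a bounded FLIP-27 source that already ships with Flink,
such as NumberSequenceSource, can often stand in as a test source by
mapping the numbers to test records. A sketch, with the record mapping as
an illustrative placeholder:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.connector.source.lib.NumberSequenceSource;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class Flip27TestSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Bounded FLIP-27 source shipped with Flink: emits 0..9 as Longs.
        NumberSequenceSource numbers = new NumberSequenceSource(0L, 9L);

        DataStream<Long> stream = env.fromSource(
                numbers, WatermarkStrategy.noWatermarks(), "test-source");

        // Map the sequence to whatever test records the pipeline expects.
        stream.map(i -> "test-record-" + i).print();

        env.execute("FLIP-27 test source sketch");
    }
}
```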

Best regards,
Jing

[1] https://lists.apache.org/thread/7gjxto1rmkpff4kl54j8nlg5db2rqhkt


On Thu, Jun 9, 2022 at 2:13 PM Sanabria, Carlos <
carlos.sanab...@accenture.com> wrote:

> Thanks for your quick response!
>
> Yes, this is exactly what we were looking for!
> Seems like a really nice feature. Even better than the FromElementsSource
> we were asking for, because it allows to generate the events dynamically.
>
> Is there any way we can vote for the FLIP-238 to be accepted?
>
> -Original Message-
> From: Qingsheng Ren 
> Sent: jueves, 9 de junio de 2022 12:16
> To: Sanabria, Carlos 
> Cc: user 
> Subject: Re: [External] Re: Source vs SourceFunction and testing
>
> Hi Carlos,
>
> FLIP-238 [1] is proposing a FLIP-27-based data generator source and I
> think this is what you are looking for. This FLIP was created just days ago
> so it may take some time to get accepted and released.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-238%3A+Introduce+FLIP-27-based+Data+Generator+Source
>
> Cheers,
>
> Qingsheng
>
> On Thu, Jun 9, 2022 at 6:05 PM Sanabria, Carlos <
> carlos.sanab...@accenture.com> wrote:
> >
> > Hi everyone!
> >
> > Sorry for reopening the thread, but I am having some problems related to
> this case while migrating our code from Flink 1.12 to Flink 1.15.
> >
> > We have a base project that encapsulates a ton of common code and
> configurations. One of the abstractions we have is an AbstractDataStreamJob
> class that has generic Sources and Sinks. We implemented it like this since
> Flink 1.8, following the recommendations of the Flink documentation [1]:
> >
> > "Apache Flink provides a JUnit rule called MiniClusterWithClientResource
> for testing complete jobs against a local, embedded mini cluster.
> > ...
> > A few remarks on integration testing with MiniClusterWithClientResource:
> > - In order not to copy your whole pipeline code from production to test,
> make sources and sinks pluggable in your production code and inject special
> test sources and test sinks in your tests.
> > ..."
> >
> > This way, we can create the real Kafka Sources and Sinks in the Main
> class of the job, and also create the test Sources and Sinks in the Junit
> tests, and inject them in the AbstractDataStreamJob class.
> >
> > The problem comes with the new Source interface and the end to end tests
> against the local embedded mini cluster. Prior to Flink 1.15, we used the
> FromElementsFunction to create the test SourceFunction. Now that we changed
> the code to use the new Source interface, we cannot use the
> FromElementsFunction anymore, and we haven't found an equivalent
> FromElementsSource class with the same functionality but implemented using
> the new Source API.
> >
> > We want to keep the same structure in the AbstractDataStreamJob class
> (with generic and pluggable sources and sinks), as we think it is the most
> elegant and generic solution.
> >
> > Is it planned to implement a FromElementsSource class that extends the
> new Source API? Is there any other alternative that may serve as a
> workaround for the moment?
> >
> > We have tried to implement a custom Source for this use case, but it
> seems like an overwhelming task and we do not want to reinvent the wheel
> either. If it is planned to implement the FromElementsSource we'd rather
> prefer to wait for it.
> >
> > Thanks!
> > Carlos
> >
> > [1]
> > https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/testing/#junit-rule-miniclusterwithclientresource
> >
> > -Original Message-
> > From: Qingsheng Ren 
> > Sent: miércoles, 25 de mayo de 2022 12:10
> > To: Piotr Domagalski 
> > Cc: user@flink.apache.org
> > Subject: [External] Re: Source vs SourceFunction and testing
> >
> >
> > Glad to see you have resolved the issue!
> >
> > If you want to learn more about the Source API, the Flink documentation [1]
> has a detailed description of it. The original proposal FLIP-27 [2] is
> also a good reference.
> >
> > [1]
> > https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/sources/
> > 

Re: SourceFunction

2022-06-08 Thread Jing Ge
Hi Alexey,

There is a thread[1] discussing this issue right now. It would be great if
you could share some thoughts about your experience. Thanks!

Best regards,
Jing

[1]https://lists.apache.org/thread/d6cwqw9b3105wcpdkwq7rr4s7x4ywqr9

On Wed, Jun 8, 2022 at 4:42 PM Alexey Trenikhun  wrote:

> Hello,
> Is there plan to deprecate SourceFunction in favor of Source API? We have
> custom SourceFunction based source,  do we need to plan to rewrite it using
> new Source API ?
>
> Thanks,
> Alexey
>


Re: Add me to slack

2022-06-05 Thread Jing Ge
Hi Xiao,

Just done, please check. Thanks!

Best regards,
Jing


On Mon, Jun 6, 2022 at 3:59 AM Xiao Ma  wrote:

> Hi Jing,
>
> Could you please add me to the slack channel also?
>
> Thank you.
>
>
> Best,
> Mark Ma
>
> On Sun, Jun 5, 2022 at 9:57 PM Jing Ge  wrote:
>
>> Hi Raghunadh,
>>
>> Just did, please check your email. Thanks!
>>
>> Best regards,
>> Jing
>>
>> On Mon, Jun 6, 2022 at 3:51 AM Raghunadh Nittala 
>> wrote:
>>
>>> Team, Kindly add me to the slack channel.
>>>
>>> Best Regards.
>>>
>> --
> Xiao Ma
> Geotab
> Software Developer, Data Engineering | B.Sc, M.Sc
> Direct +1 (416) 836 - 3541
> Toll-free  +1 (877) 436 - 8221
> Visit   www.geotab.com
> Twitter | Facebook | YouTube | LinkedIn
>


Re: Flink source Code Explanation

2022-06-05 Thread Jing Ge
Hi Sri,

Flink is very well documented. You can find it under e.g.
https://nightlies.apache.org/flink/flink-docs-master/

Best regards,
Jing

On Mon, Jun 6, 2022 at 3:39 AM sri hari kali charan Tummala <
kali.tumm...@gmail.com> wrote:

> Hi Flink Community,
>
> I want to go through the Flink source code in my free time. Is there a
> document that explains where to start? Other than the Javadoc, is there
> anything else to help start my reverse engineering?
>
> Thanks & Regards
> Sri Tummala
>
>


Re: Add me to slack

2022-06-05 Thread Jing Ge
Hi Raghunadh,

Just did, please check your email. Thanks!

Best regards,
Jing

On Mon, Jun 6, 2022 at 3:51 AM Raghunadh Nittala 
wrote:

> Team, Kindly add me to the slack channel.
>
> Best Regards.
>


Re: Need help to join Apache Flink community on.Slack

2022-06-05 Thread Jing Ge
Hi Sucheth,

Just invited you, please check. Thanks!

Best Regards,
Jing

On Sun, Jun 5, 2022 at 6:06 PM Sucheth S  wrote:

> Hello Jing,
>
> Can you please add me - suchet...@gmail.com
>
>
> On Sun, Jun 5, 2022 at 9:02 AM sri hari kali charan Tummala <
> kali.tumm...@gmail.com> wrote:
>
>> Hi Jing,
>>
>> Please add me kali.tumm...@gmail.com.
>>
>> Thanks
>> Sri
>>
>> On Sat, Jun 4, 2022 at 4:47 PM Jing Ge  wrote:
>>
>>> Hi Santhosh,
>>>
>>> just invited you. Please check your email. Looking forward to knowing
>>> your story! Thanks!
>>>
>>> To anyone else who wants to join, please send an email to
>>> user@flink.apache.org, you might have a better chance to get the
>>> invite. Thanks.
>>>
>>> Regards,
>>> Jing
>>>
>>> On Sat, Jun 4, 2022 at 10:37 PM santhosh venkat <
>>> santhoshvenkat1...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Can you please invite me to join the apache flink slack community
>>>> channel. We have adopted apache flink and would like to participate in the
>>>> community forum.
>>>>
>>>> Thank you.
>>>>
>>>> Regards
>>>>
>>>
>>
>> --
>> Thanks & Regards
>> Sri Tummala
>>
>> --
> Regards,
> Sucheth Shivakumar
> website: https://sucheths.com
> Mobile : +1(650)-576-8050
> San Mateo, United States
>


Re: Need help to join Apache Flink community on.Slack

2022-06-05 Thread Jing Ge
done, please check. Thanks

Best regards,
Jing

On Sun, Jun 5, 2022 at 6:05 PM Deepak Sharma  wrote:

> I need the invite as well .
> kdq...@gmail.com
>
> On Sun, 5 Jun 2022 at 9:32 PM, sri hari kali charan Tummala <
> kali.tumm...@gmail.com> wrote:
>
>> Hi Jing,
>>
>> Please add me kali.tumm...@gmail.com.
>>
>> Thanks
>> Sri
>>
>> On Sat, Jun 4, 2022 at 4:47 PM Jing Ge  wrote:
>>
>>> Hi Santhosh,
>>>
>>> just invited you. Please check your email. Looking forward to knowing
>>> your story! Thanks!
>>>
>>> To anyone else who wants to join, please send an email to
>>> user@flink.apache.org, you might have a better chance to get the
>>> invite. Thanks.
>>>
>>> Regards,
>>> Jing
>>>
>>> On Sat, Jun 4, 2022 at 10:37 PM santhosh venkat <
>>> santhoshvenkat1...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Can you please invite me to join the apache flink slack community
>>>> channel. We have adopted apache flink and would like to participate in the
>>>> community forum.
>>>>
>>>> Thank you.
>>>>
>>>> Regards
>>>>
>>>
>>
>> --
>> Thanks & Regards
>> Sri Tummala
>>
>> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
>


Re: Need help to join Apache Flink community on.Slack

2022-06-05 Thread Jing Ge
Hi Sri,

I have invited you, please check. Thanks!

Best regards,
Jing

On Sun, Jun 5, 2022 at 6:02 PM sri hari kali charan Tummala <
kali.tumm...@gmail.com> wrote:

> Hi Jing,
>
> Please add me kali.tumm...@gmail.com.
>
> Thanks
> Sri
>
> On Sat, Jun 4, 2022 at 4:47 PM Jing Ge  wrote:
>
>> Hi Santhosh,
>>
>> just invited you. Please check your email. Looking forward to knowing
>> your story! Thanks!
>>
>> To anyone else who wants to join, please send an email to
>> user@flink.apache.org, you might have a better chance to get the invite.
>> Thanks.
>>
>> Regards,
>> Jing
>>
>> On Sat, Jun 4, 2022 at 10:37 PM santhosh venkat <
>> santhoshvenkat1...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Can you please invite me to join the apache flink slack community
>>> channel. We have adopted apache flink and would like to participate in the
>>> community forum.
>>>
>>> Thank you.
>>>
>>> Regards
>>>
>>
>
> --
> Thanks & Regards
> Sri Tummala
>
>


Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.0.0 released

2022-06-05 Thread Jing Ge
Amazing! Thanks Yang for driving this! Thanks all for your effort!

Best regards,
Jing

On Sun, Jun 5, 2022 at 11:30 AM tison  wrote:

> Congrats! Thank you all for making this release happen.
>
> Best,
> tison.
>
>
> rui fan <1996fan...@gmail.com> wrote on Sun, Jun 5, 2022 at 17:19:
>
>> Thanks Yang for driving the release, and thanks to
>> all contributors for making this release happen!
>>
>> Best wishes
>> Rui Fan
>>
>> On Sun, Jun 5, 2022 at 4:14 PM Yang Wang  wrote:
>>
>> > The Apache Flink community is very happy to announce the release of
>> Apache
>> > Flink Kubernetes Operator 1.0.0.
>> >
>> > The Flink Kubernetes Operator allows users to manage their Apache Flink
>> > applications and their lifecycle through native k8s tooling like
>> kubectl.
>> > This is the first production ready release and brings numerous
>> > improvements and new features to almost every aspect of the operator.
>> >
>> > Please check out the release blog post for an overview of the release:
>> >
>> >
>> https://flink.apache.org/news/2022/06/05/release-kubernetes-operator-1.0.0.html
>> >
>> > The release is available for download at:
>> > https://flink.apache.org/downloads.html
>> >
>> > Maven artifacts for Flink Kubernetes Operator can be found at:
>> >
>> >
>> https://search.maven.org/artifact/org.apache.flink/flink-kubernetes-operator
>> >
>> > Official Docker image for Flink Kubernetes Operator applications can be
>> > found at:
>> > https://hub.docker.com/r/apache/flink-kubernetes-operator
>> >
>> > The full release notes are available in Jira:
>> >
>> >
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351500
>> >
>> > We would like to thank all contributors of the Apache Flink community
>> who
>> > made this release possible!
>> >
>> > Regards,
>> > Gyula & Yang
>> >
>>
>


Re: slack invite link

2022-06-05 Thread Jing Ge
Hi,

Invites have been sent. Please check your emails. Thanks!

Regards,
JIng

On Sun, Jun 5, 2022 at 12:38 PM Jay Ghiya  wrote:

> Request community to share invite link for me at ghiya6...@gmail.com and
> jay.gh...@ge.com
>


Re: Need help to join Apache Flink community on.Slack

2022-06-04 Thread Jing Ge
Hi Santhosh,

just invited you. Please check your email. Looking forward to knowing your
story! Thanks!

To anyone else who wants to join, please send an email to
user@flink.apache.org, you might have a better chance to get the invite.
Thanks.

Regards,
Jing

On Sat, Jun 4, 2022 at 10:37 PM santhosh venkat <
santhoshvenkat1...@gmail.com> wrote:

> Hi,
>
> Can you please invite me to join the apache flink slack community channel.
> We have adopted apache flink and would like to participate in the
> community forum.
>
> Thank you.
>
> Regards
>


Re: [ANNOUNCE] Welcome to join the Apache Flink community on Slack

2022-06-04 Thread Jing Ge
Hi Xingtong,

While inviting new members, there are two options: "From another company"
vs "Your coworker". In this case, we should always choose "Your coworker"
to add new members to the Apache Flink workspace, right?

Best regards,
Jing

On Fri, Jun 3, 2022 at 1:10 PM Yuan Mei  wrote:

> Thanks, Xintong and Jark the great effort driving this, and everyone for
> making this possible.
>
> I've also Twittered this announcement on our Apache Flink Twitter account.
>
> Best
>
> Yuan
>
>
>
> On Fri, Jun 3, 2022 at 12:54 AM Jing Ge  wrote:
>
>> Thanks everyone for your effort!
>>
>> Best regards,
>> Jing
>>
>> On Thu, Jun 2, 2022 at 4:17 PM Martijn Visser 
>> wrote:
>>
>>> Thanks everyone for joining! It's good to see so many have joined in
>>> such a short time already. I've just refreshed the link which you can
>>> always find on the project website [1]
>>>
>>> Best regards, Martijn
>>>
>>> [1] https://flink.apache.org/community.html#slack
>>>
>>> On Thu, Jun 2, 2022 at 11:42, Jingsong Li wrote:
>>>
>>>> Thanks Xingtong, Jark, Martijn and Robert for making this possible!
>>>>
>>>> Best,
>>>> Jingsong
>>>>
>>>>
>>>> On Thu, Jun 2, 2022 at 5:32 PM Jark Wu  wrote:
>>>>
>>>>> Thank Xingtong for making this possible!
>>>>>
>>>>> Cheers,
>>>>> Jark Wu
>>>>>
>>>>> On Thu, 2 Jun 2022 at 15:31, Xintong Song 
>>>>> wrote:
>>>>>
>>>>> > Hi everyone,
>>>>> >
>>>>> > I'm very happy to announce that the Apache Flink community has
>>>>> created a
>>>>> > dedicated Slack workspace [1]. Welcome to join us on Slack.
>>>>> >
>>>>> > ## Join the Slack workspace
>>>>> >
>>>>> > You can join the Slack workspace by either of the following two ways:
>>>>> > 1. Click the invitation link posted on the project website [2].
>>>>> > 2. Ask anyone who already joined the Slack workspace to invite you.
>>>>> >
>>>>> > We recommend 2), if available. Due to Slack limitations, the
>>>>> invitation
>>>>> > link in 1) expires and needs manual updates after every 100 invites.
>>>>> If it
>>>>> > is expired, please reach out to the dev / user mailing lists.
>>>>> >
>>>>> > ## Community rules
>>>>> >
>>>>> > When using the community Slack workspace, please follow these
>>>>> community
>>>>> > rules:
>>>>> > * *Be respectful* - This is the most important rule!
>>>>> > * All important decisions and conclusions *must be reflected back to
>>>>> the
>>>>> > mailing lists*. "If it didn’t happen on a mailing list, it didn’t
>>>>> happen."
>>>>> > - The Apache Mottos [3]
>>>>> > * Use *Slack threads* to keep parallel conversations from
>>>>> overwhelming a
>>>>> > channel.
>>>>> > * Please *do not direct message* people for troubleshooting, Jira
>>>>> assigning
>>>>> > and PR review. These should be picked-up voluntarily.
>>>>> >
>>>>> >
>>>>> > ## Maintenance
>>>>> >
>>>>> >
>>>>> > Committers can refer to this wiki page [4] for information needed for
>>>>> > maintaining the Slack workspace.
>>>>> >
>>>>> >
>>>>> > Thanks Jark, Martijn and Robert for helping setting up the Slack
>>>>> workspace.
>>>>> >
>>>>> >
>>>>> > Best,
>>>>> >
>>>>> > Xintong
>>>>> >
>>>>> >
>>>>> > [1] https://apache-flink.slack.com/
>>>>> >
>>>>> > [2] https://flink.apache.org/community.html#slack
>>>>> >
>>>>> > [3] http://theapacheway.com/on-list/
>>>>> >
>>>>> > [4]
>>>>> https://cwiki.apache.org/confluence/display/FLINK/Slack+Management
>>>>> >
>>>>>
>>>>


Re: Flink/Scala contract positions ?

2022-06-03 Thread Jing Ge
Hi,

Currently, the Flink Scala API is not in good shape. Would you like to
start from there?

Best regards,
Jing

On Fri, Jun 3, 2022 at 4:29 PM sri hari kali charan Tummala <
kali.tumm...@gmail.com> wrote:

> Hi Folks,
>
> Is anyone hiring for Flink or Scala Akka contract corp-to-corp positions?
> I am on the market looking for work in the Scala Spark, Flink Scala, or
> Scala Akka world.
>
> Thanks
> Sri
>


Re: flink-ml algorithms

2022-06-03 Thread Jing Ge
Hi,

It seems like an evaluation with a small dataset. In this case, would you
like to share your data sample and code? In addition, have you tried KMeans
with the same dataset and got inconsistent results too?

Best regards,
Jing

On Fri, Jun 3, 2022 at 4:29 AM Natia Chachkhiani <
natia.chachkhia...@gmail.com> wrote:

> Hi,
>
> I am running OnlineKmeans from flink-ml repo on a small dataset. I've
> noticed that I don't get consistent results, assignments to clusters,
> across different runs. I have set both parallelism and globalBatchSize to 1.
> I am doing simple fit and transform on each data point ingested. Is the
> order of processing not guaranteed? Or am I missing something?
>
> Thanks,
> Natia
>


Re: [ANNOUNCE] Welcome to join the Apache Flink community on Slack

2022-06-02 Thread Jing Ge
Thanks everyone for your effort!

Best regards,
Jing

On Thu, Jun 2, 2022 at 4:17 PM Martijn Visser 
wrote:

> Thanks everyone for joining! It's good to see so many have joined in such
> a short time already. I've just refreshed the link which you can always
> find on the project website [1]
>
> Best regards, Martijn
>
> [1] https://flink.apache.org/community.html#slack
>
> On Thu, Jun 2, 2022 at 11:42, Jingsong Li wrote:
>
>> Thanks Xingtong, Jark, Martijn and Robert for making this possible!
>>
>> Best,
>> Jingsong
>>
>>
>> On Thu, Jun 2, 2022 at 5:32 PM Jark Wu  wrote:
>>
>>> Thank Xingtong for making this possible!
>>>
>>> Cheers,
>>> Jark Wu
>>>
>>> On Thu, 2 Jun 2022 at 15:31, Xintong Song  wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > I'm very happy to announce that the Apache Flink community has created
>>> a
>>> > dedicated Slack workspace [1]. Welcome to join us on Slack.
>>> >
>>> > ## Join the Slack workspace
>>> >
>>> > You can join the Slack workspace by either of the following two ways:
>>> > 1. Click the invitation link posted on the project website [2].
>>> > 2. Ask anyone who already joined the Slack workspace to invite you.
>>> >
>>> > We recommend 2), if available. Due to Slack limitations, the invitation
>>> > link in 1) expires and needs manual updates after every 100 invites.
>>> If it
>>> > is expired, please reach out to the dev / user mailing lists.
>>> >
>>> > ## Community rules
>>> >
>>> > When using the community Slack workspace, please follow these community
>>> > rules:
>>> > * *Be respectful* - This is the most important rule!
>>> > * All important decisions and conclusions *must be reflected back to
>>> the
>>> > mailing lists*. "If it didn’t happen on a mailing list, it didn’t
>>> happen."
>>> > - The Apache Mottos [3]
>>> > * Use *Slack threads* to keep parallel conversations from overwhelming
>>> a
>>> > channel.
>>> > * Please *do not direct message* people for troubleshooting, Jira
>>> assigning
>>> > and PR review. These should be picked-up voluntarily.
>>> >
>>> >
>>> > ## Maintenance
>>> >
>>> >
>>> > Committers can refer to this wiki page [4] for information needed for
>>> > maintaining the Slack workspace.
>>> >
>>> >
>>> > Thanks Jark, Martijn and Robert for helping setting up the Slack
>>> workspace.
>>> >
>>> >
>>> > Best,
>>> >
>>> > Xintong
>>> >
>>> >
>>> > [1] https://apache-flink.slack.com/
>>> >
>>> > [2] https://flink.apache.org/community.html#slack
>>> >
>>> > [3] http://theapacheway.com/on-list/
>>> >
>>> > [4] https://cwiki.apache.org/confluence/display/FLINK/Slack+Management
>>> >
>>>
>>



Re: Is there an HA solution to run flink job with multiple source

2022-06-01 Thread Jing Ge
Hi Bariša,

Could you share the reason why your data processing pipeline should keep
running when one Kafka source is down?
It seems like any one among the multiple Kafka sources is optional for the
data processing logic, because any Kafka source could be the one that is
down.

Best regards,
Jing

On Wed, Jun 1, 2022 at 5:59 PM Xuyang  wrote:

> I think you can try to use a custom source so that, even when one of the
> Kafka sources is down, the operator keeps running (it just does nothing).
> The only trouble is that you need to manage the checkpointing and a few
> other things yourself. But the good news is that you can copy the
> implementation of the existing Kafka source and change a little code quite
> conveniently.
>
> At 2022-06-01 22:38:39, "Bariša Obradović"  wrote:
>
> Hi,
> we are running a Flink job with multiple Kafka sources connected to
> different Kafka servers.
>
> The problem we are facing is that when one of the Kafkas is down, the
> Flink job starts restarting.
> Is there any way for Flink to pause processing of the Kafka that is down,
> and yet continue processing from the other sources?
>
> Cheers,
> Barisa
>
>


Re: Can we resume a job from a savepoint from Java api?

2022-06-01 Thread Jing Ge
Hi,

yuxia has already pointed out the correct direction. The exact line for
using the savepoint path to resume the job from a savepoint is at line 1326
[1]
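For a quick sketch of what that looks like via configuration (note that the
referenced test wires the path through the JobGraph's
SavepointRestoreSettings instead; the savepoint path below is illustrative):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.jobgraph.SavepointConfigOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ResumeFromSavepoint {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Illustrative path: point this at the savepoint you took earlier.
        conf.set(SavepointConfigOptions.SAVEPOINT_PATH,
                "file:///tmp/savepoints/savepoint-xyz");

        // The environment picks up the restore path from the configuration.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);

        // ... build the same pipeline, then env.execute() restores from it.
    }
}
```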

[1]
https://github.com/apache/flink/blob/586715f23ef49939ab74e4736c58d71c643a64ba/flink-tests/src/test/java/org/apache/flink/test/checkpointing/SavepointITCase.java#L1326


On Wed, Jun 1, 2022 at 3:13 PM yuxia  wrote:

> Hope the unit test
> SavepointITCase#testCanRestoreWithModifiedStatelessOperators [1] in the
> Flink repo can help you.
>
>
> [1]
> https://github.com/apache/flink/blob/master/flink-tests/src/test/java/org/apache/flink/test/checkpointing/SavepointITCase.java#L1228
>
> Best regards,
> Yuxia
>
> --
> *From: *"Qing Lim"
> *To: *"User"
> *Sent: *Wednesday, June 1, 2022, 7:46:59 PM
> *Subject: *Can we resume a job from a savepoint from Java api?
>
> Hi, is it possible to resume a job from a savepoint in Java code?
>
>
>
> I wish to test failure recovery in my test code, I am thinking to simulate
> failure recovery by saving state to a save point and the recover from it,
> is this possible with local MiniCluster setup?
>
>
>
> Kind regards
>
>
>
> This e-mail and any attachments are confidential to the addressee(s) and
> may contain information that is legally privileged and/or confidential. If
> you are not the intended recipient of this e-mail you are hereby notified
> that any dissemination, distribution, or copying of its content is strictly
> prohibited. If you have received this message in error, please notify the
> sender by return e-mail and destroy the message and all copies in your
> possession.
>
>
> To find out more details about how we may collect, use and share your
> personal information, please see https://www.mwam.com/privacy-policy.
> This includes details of how calls you make to us may be recorded in order
> for us to comply with our legal and regulatory obligations.
>
>
> To the extent that the contents of this email constitutes a financial
> promotion, please note that it is issued only to and/or directed only at
> persons who are professional clients or eligible counterparties as defined
> in the FCA Rules. Any investment products or services described in this
> email are available only to professional clients and eligible
> counterparties. Persons who are not professional clients or eligible
> counterparties should not rely or act on the contents of this email.
>
>
> Marshall Wace LLP is authorised and regulated by the Financial Conduct
> Authority. Marshall Wace LLP is a limited liability partnership registered
> in England and Wales with registered number OC302228 and registered office
> at George House, 131 Sloane Street, London, SW1X 9AT. If you are receiving
> this e-mail as a client, or an investor in an investment vehicle, managed
> or advised by Marshall Wace North America L.P., the sender of this e-mail
> is communicating with you in the sender's capacity as an associated or
> related person of Marshall Wace North America L.P. ("MWNA"), which is
> registered with the US Securities and Exchange Commission ("SEC") as an
> investment adviser.  Registration with the SEC does not imply that MWNA or
> its employees possess a certain level of skill or training.
>
>


Re: Status of File Sink Common (flink-file-sink-common)

2022-05-31 Thread Jing Ge
Hi,

Afaik, there are still a lot of unit tests depending on it. I don't think
we can drop it before dropping all of these unit tests.

Best regards,
Jing

On Tue, May 31, 2022 at 8:10 AM Yun Gao  wrote:

> Hi Jun,
>
> I think the release notes should only include the issues that cause changes
> visible to users. Also I think by design flink-file-sink-common should not
> be used directly by users; it only serves as a module shared by the legacy
> StreamingFileSink and the new FileSink.
>
> Best,
> Yun
>
>
> --
> From: yuxia
> Send Time: 2022 May 31 (Tue.) 09:14
> To: Jun Qin
> Cc: User
> Subject:Re: Status of File Sink Common (flink-file-sink-common)
>
> I'm afraid not. I can still find it in main repository[1].
> [1]
> https://github.com/apache/flink/tree/master/flink-connectors/flink-file-sink-common
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "Jun Qin"
> To: "User"
> Sent: Tuesday, May 31, 2022, 5:24:10 AM
> Subject: Status of File Sink Common (flink-file-sink-common)
>
> Hi,
>
> Has File Sink Common (flink-file-sink-common) been dropped? If so, since
> which version? I do not seem to find anything related in the release notes
> of 1.13.x, 1.14.x and 1.15.0.
>
> Thanks
> Jun
>
>
>


Re: GlobalCommitter in Flink's two-phase commit

2022-05-29 Thread Jing Ge
Hi,

1. What are the general usage scenarios of GlobalCommitter?
- GlobalCommitter is used for creating and committing an aggregated
committable. It is part of a 2-phase-commit protocol. One use case is the
compaction of small files.

2. Why should GlobalCommitter be removed in the new version of the api?
- As FLIP-191 described, there are many different requirements from
different downstream systems, e.g. Iceberg, Delta Lake, Hive. One
GlobalCommitter could not cover all of them. If you take a look at the
SinkV1Adapter source code, you will see that
StandardSinkTopologies#addGlobalCommitter, which is recommended to replace
the usage of GlobalCommitter, is used to take care of the post-commit
topology.
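
To make the "aggregated committable" idea concrete, here is a minimal
sketch of a Sink V2 Committer that commits all committables of a
checkpoint in one step, roughly what a GlobalCommitter used to do. It
assumes the Flink 1.15 org.apache.flink.api.connector.sink2.Committer
interface; compactAndPublish is a hypothetical placeholder.

import org.apache.flink.api.connector.sink2.Committer;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class CompactingCommitter implements Committer<String> {

    @Override
    public void commit(Collection<CommitRequest<String>> requests)
            throws IOException, InterruptedException {
        // Collect all pending committables of this checkpoint...
        List<String> pendingFiles = new ArrayList<>();
        for (CommitRequest<String> request : requests) {
            pendingFiles.add(request.getCommittable());
        }
        // ...and commit them as one aggregated unit, e.g. by compacting
        // many small files into a single one.
        compactAndPublish(pendingFiles);
    }

    private void compactAndPublish(List<String> files) {
        // placeholder for the actual compaction / publish logic
    }

    @Override
    public void close() {}
}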

Best regards,
Jing

On Tue, May 24, 2022 at 9:11 AM di wu <676366...@qq.com> wrote:

> Hello
> Regarding the GlobalCommitter in Flink's two-phase commit,
> I see it was introduced in FLIP-143, but it seems to have been removed
> again in FLP-191 and marked as Deprecated in the source code.
> I haven't found any relevant information about the use of GlobalCommitter.
>
> There are two questions I would like to ask:
> 1. What are the general usage scenarios of GlobalCommitter?
> 2. Why should GlobalCommitter be removed in the new version of the api?
>
> Thanks && Regards,
> di.wu
>
>


Re: Missing metrics in Flink v 1.15.0 rc-0

2022-04-07 Thread Jing Ge
Hi,

Flink 1.15 introduced a new feature to support sink pre- and
post-topologies [1]. New metrics, e.g. numRecordsSend, have been added to
measure records sent to the external system. Metrics like "Bytes Sent" and
"Records Sent" measure records sent to the next task. So, in your case, it
is expected that they are 0.

There are some further improvements that need to be done on the WebUI.
A task [2] has been created.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-191%3A+Extend+unified+Sink+interface+to+support+small+file+compaction
[2] https://issues.apache.org/jira/browse/FLINK-27112

Best regards
Jing

On Thu, Apr 7, 2022 at 4:03 AM Xintong Song  wrote:

> Hi Peter,
>
> Have you compared the DAG topologies in 1.15 / 1.14?
>
> I think it's expected that "Records Received", "Bytes Sent" and "Records
> Sent" are 0. These metrics trace the internal data exchanges between Flink
> tasks. External data exchanges, i.e., source reading / sink writing data
> from / to external systems, are not counted. In your case, there's only 1
> vertex in the DAG, thus no internal data exchanges.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Apr 6, 2022 at 11:21 PM Peter Schrott 
> wrote:
>
>> Hi there,
>>
>> I just successfully upgraded our Flink cluster to 1.15.0 rc0 - also the
>> corresponding job is running on this version. Looks great so far!
>>
>> In the Web UI I noticed some metrics are missing, especially "Records
>> Received", "Bytes Sent" and "Records Sent". Those were shown in v 1.14.4.
>> See attached screenshot.
>>
>> Other than that I noticed, when using
>> org.apache.flink.metrics.prometheus.PrometheusReporter, the taskmanager
>> on which the job is running does not report the metrics on the configured
>> port. Rather it returns:
>>
>> ➜  ~ curl http://flink-taskmanager-xx:/
>> curl: (52) Empty reply from server
>>
>> The other taskmanager reports metrics.
>>
>> The exporter is configured as followed:
>>
>> # Prometheus metrics
>> metrics.reporters: prom
>> metrics.reporter.prom.class: 
>> org.apache.flink.metrics.prometheus.PrometheusReporter
>> metrics.reporter.prom.port: xx
>>
>> Is this a known issue with flink 1.15 rc0?
>>
>> Best, Peter
>>
>> [image: missingmetricsinui.png]
>>
>


Re: Flink SQL 1.12 How to implement query Hbase table on secondary index

2022-03-18 Thread Jing Ge
HI WuKong,

Afaiu, technically, you are not using an HBase secondary index (coprocessor).
What you are trying to do is to store the synced dim table in Elasticsearch
and query it to get the rowkeys, and then use the rowkeys to fetch the dim
table rows from HBase. In this way, a (full) table scan in HBase will be
avoided. Please correct me if I am wrong.

Would you like to share more information about your requirements in detail?
like DDLs of Kafka fact table, Hbase and ES dim table and the join logic
you want to achieve. So we could give you more feasible suggestions. Thanks.
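
For reference, a minimal sketch of the rowkey-based lookup join that Flink
SQL supports out of the box; tEnv and all table/column names here are
assumptions for illustration. The limitation discussed in this thread is
that the equality condition must be the HBase rowkey, which is exactly why
a secondary index becomes necessary:

// `proc_time` is assumed to be a processing time attribute of the fact table.
tEnv.executeSql(
    "SELECT f.order_id, d.cf.user_name\n"
        + "FROM kafka_fact AS f\n"
        + "JOIN hbase_dim FOR SYSTEM_TIME AS OF f.proc_time AS d\n"
        + "ON f.user_id = d.rowkey");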

Best regards,
Jing

On Thu, Mar 17, 2022 at 3:24 AM WuKong  wrote:

> Hi,
> My data is stored in HBase and I want to use Flink to implement a kafka
> table temporal join against an HBase table, but the join condition is not
> the rowkey. I have built an HBase secondary index. How can I implement
> this with Flink SQL, so that it first queries the secondary index (such as
> ES) and then uses the rowkey to query the HBase table? (Must I implement
> my own HBase secondary-index connector?)
>
> --
> ---
> Best,
> WuKong
>


Re: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-07 Thread Jing Ge
Hi,

Thanks Martijn for driving this discussion. Your concerns are very
rational.

We should do our best to keep the Flink development on the right track. I
would suggest discussing it in a vision/goal oriented way. Since Flink has
a clear vision of unified batch and stream processing, supporting batch
jobs will be one of the critical core features to help us reach the vision
and let Flink have an even bigger impact in the industry. I fully agree
with you that we should not focus on the Hive query syntax. Instead of it,
we should build a plan/schedule to support batch query syntax for the
vision. If there is any conflict between Hive query syntax and common batch
query syntax, we should stick with the common batch query syntax. For any
Hive specific query syntax, which is not supported as a common case by
other batch process engines, we should think very carefully and implement
it as a dialect extension like you suggested, but only when it is a
critical business requirement and has broad impact on many use cases. Last
but not least, from architecture's perspective, it is good to have the
capability to support arbitrary syntax via dialect/extension/plugin. But it
will also require a lot of effort to make it happen. Trade-off is always
the key. Currently, I have to agree with you again, we should focus more on
the common (batch) cases.
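
For context, the dialect/pluggable option mentioned above already has a
user-facing switch today. A minimal sketch (assuming a recent Table API
and that a HiveCatalog is registered where needed):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.SqlDialect;
import org.apache.flink.table.api.TableEnvironment;

public class DialectSwitchExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Statements executed from here on are parsed with the Hive dialect.
        tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);

        // Switch back to the default Flink dialect at any time.
        tEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);
    }
}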


Best regards,
Jing

On Mon, Mar 7, 2022 at 1:53 PM Jing Zhang  wrote:

> Hi Martijn,
>
> Thanks for driving this discussion.
>
> +1 on efforts on more hive syntax compatibility.
>
> With the efforts on batch processing in recent versions(1.10~1.15), many
> users have run batch processing jobs based on Flink.
> In our team, we are trying to migrate most of the existing online batch
> jobs from Hive/Spark to Flink. We hope this migration does not require
> users to modify their sql.
> Although Hive is not as popular as it used to be, Hive SQL is still alive
> because many users still use Hive SQL to run spark jobs.
> Therefore, compatibility with more HIVE syntax is critical to this
> migration work.
>
> Best,
> Jing Zhang
>
>
>
> Martijn Visser wrote on Mon, Mar 7, 2022 at 19:23:
>
>> Hi everyone,
>>
>> Flink currently has 4 APIs with multiple language support which can be
>> used
>> to develop applications:
>>
>> * DataStream API, both Java and Scala
>> * Table API, both Java and Scala
>> * Flink SQL, both in Flink query syntax and Hive query syntax (partially)
>> * Python API
>>
>> Since FLIP-152 [1] the Flink SQL support has been extended to also support
>> the Hive query syntax. There is now a follow-up FLINK-26360 [2] to address
>> more syntax compatibility issues.
>>
>> I would like to open a discussion on Flink directly supporting the Hive
>> query syntax. I have some concerns if having a 100% Hive query syntax is
>> indeed something that we should aim for in Flink.
>>
>> I can understand that having Hive query syntax support in Flink could help
>> users due to interoperability and being able to migrate. However:
>>
>> - Adding full Hive query syntax support will mean that we go from 6 fully
>> supported API/language combinations to 7. I think we are currently already
>> struggling with maintaining the existing combinations, let another one
>> more.
>> - Apache Hive is/appears to be a project that's not that actively
>> developed
>> anymore. The last release was made in January 2021. Its popularity is
>> rapidly declining in Europe and the United States, also due to Hadoop
>> becoming less popular.
>> - Related to the previous topic, other software like Snowflake,
>> Trino/Presto, Databricks are becoming more and more popular. If we add
>> full
>> support for the Hive query syntax, then why not add support for Snowflake
>> and the others?
>> - We are supporting Hive versions that are no longer supported by the Hive
>> community and that have known security vulnerabilities. This also makes
>> Flink vulnerable to those types of vulnerabilities.
>> - The current Hive implementation is done by using a lot of internals of
>> Flink, making Flink hard to maintain, with lots of tech debt and making
>> things overly complex.
>>
>> From my perspective, I think it would be better to not have Hive query
>> syntax compatibility directly in Flink itself. Of course we should have a
>> proper Hive connector and a proper Hive catalog to make connectivity with
>> Hive (the versions that are still supported by the Hive community) itself
>> possible. Alternatively, if Hive query syntax is so important, it should
>> not rely on internals but be available as a dialect/pluggable option. That
>> could also open up the possibility to add more syntax support for others
>> in
>> the future, but I really think we should just focus on Flink SQL itself.
>> That's already hard enough to maintain and improve on.
>>
>> I'm looking forward to the thoughts of both Developers and Users, so I'm
>> cross-posting to both mailing lists.
>>
>> Best regards,
>>
>> Martijn Visser
>> https://twitter.com/MartijnVisser82
>>
>> [1]
>> 

Re: How to proper hashCode() for keys.

2022-02-07 Thread Jing Ge
Hi John,

your getKey() implementation shows that it is not deterministic, since
calling it with the same click instance multiple times will return
different keys. For example, a call at 12:01:59.950 and a call at
12:02:00.050 with the same click instance will return two different keys:

2022-04-07T12:01:00.000Z|cnn.com|some-article-name
2022-04-07T12:02:00.000Z|cnn.com|some-article-name
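
A minimal sketch of the deterministic alternative: key only on the stable
fields and let a processing-time window provide the per-minute grouping
(the `clicks` stream and the MyEvent accessors are assumptions based on
the snippets in this thread):

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Count clicks per (domain, path) per processing-time minute.
DataStream<Long> counts =
        clicks.keyBy(click -> click.getDomain() + "|" + click.getPath())
              .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
              .aggregate(new AggregateFunction<MyEvent, Long, Long>() {
                  @Override public Long createAccumulator() { return 0L; }
                  @Override public Long add(MyEvent e, Long acc) { return acc + 1; }
                  @Override public Long getResult(Long acc) { return acc; }
                  @Override public Long merge(Long a, Long b) { return a + b; }
              });

If the rounded minute (previously baked into the key) is needed in the
output, it can be recovered from the window metadata with a
ProcessWindowFunction.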

best regards
Jing

On Mon, Feb 7, 2022 at 5:07 PM John Smith  wrote:

> Maybe there's a misunderstanding. But basically I want to do clickstream
> count for a given "url" and for simplicity and accuracy of the count base
> it on processing time (event time doesn't matter as long as I get a total
> of clicks at that given processing time)
>
> So regardless of the event time. I want all clicks for the current
> processing time rounded to the minute per link.
>
> So, if now was 2022-04-07T12:01:00.000Z
>
> Then I would want the following result...
>
> 2022-04-07T12:01:00.000Z|cnn.com|some-article-name count = 10
> 2022-04-07T12:01:00.000Z|cnn.com|some-other-article count = 2
> 2022-04-07T12:01:00.000Z|cnn.com|another-article count = 15
> 
> 2022-04-07T12:02:00.000Z|cnn.com|some-article-name count = 30
> 2022-04-07T12:02:00.000Z|cnn.com|some-other-article count = 1
> 2022-04-07T12:02:00.000Z|cnn.com|another-article count = 10
> And so on...
>
> @Override
> public MyEventCountKey getKey(final MyEvent click) throws Exception
> {
> MyEventCountKey key = new MyEventCountKey(
> Instant.from(roundFloor(Instant.now().atZone(ZoneId.of("UTC")),
> ChronoField.MINUTE_OF_HOUR, windowSizeMins)).toString(),
> click.getDomain(), // cnn.com
> click.getPath(), // /some-article-name
> );
> return key;
> }
>
>
>
> On Mon, Feb 7, 2022 at 10:48 AM David Morávek  wrote:
>
>> The key selector works.
>>
>>
>> No it does not ;) It depends on the system time so it's not deterministic
>> (you can get different keys for the very same element).
>>
>> How do you key a count based on the time. I have taken this from samples
>>> online.
>>>
>>
>> This is what the windowing is for. You basically want to group / combine
>> elements per key and event time window [1].
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/operators/windows/
>>
>> Best,
>> D.
>>
>> On Mon, Feb 7, 2022 at 3:44 PM John Smith  wrote:
>>
>>> The key selector works. It only causes an issue if there are too many keys
>>> produced in one shot. For example, if 100 "same" keys are produced for that
>>> 1 minute it's ok. But if 101 are produced the error happens.
>>>
>>>
>>> If you look at the reproducer, at least that's what's happening.
>>>
>>> How do you key a count based on the time. I have taken this from samples
>>> online.
>>>
>>> The key is that particular time for that particular URL path.
>>>
>>> So cnn.com/article1 was clicked 10 times at 2022-01-01T10:01:00
>>>
>>> On Mon., Feb. 7, 2022, 8:57 a.m. Chesnay Schepler, 
>>> wrote:
>>>
 Your Key selector doesn't need to implement hashCode, but given the
 same object it has to return the same key.
 In your reproducer the returned key will have different timestamps, and
 since the timestamp is included in the hashCode, they will be different
 each time.

 On 07/02/2022 14:50, John Smith wrote:

 I don't get it. I provided the reproducer. I implemented the KeySelector
 interface; it needs hashCode and equals as well?

 I'm attempting to do click stream. So the key is based on processing
 date/time rounded to the minute + domain name + path

 So these should be valid below?

 2022-01-01T10:02:00 + cnn.com + /article1
 2022-01-01T10:02:00 + cnn.com + /article1
 2022-01-01T10:02:00 + cnn.com + /article1

 2022-01-01T10:02:00 + cnn.com + /article2

 2022-01-01T10:03:00 + cnn.com + /article1
 2022-01-01T10:03:00 + cnn.com + /article1

 2022-01-01T10:03:00 + cnn.com + /article3
 2022-01-01T10:03:00 + cnn.com + /article3

 On Mon., Feb. 7, 2022, 2:53 a.m. Chesnay Schepler, 
 wrote:

> Don't KeySelectors also need to be deterministic?
>
> The {@link KeySelector} allows to use deterministic objects for
> operations such as reduce, reduceGroup, join, coGroup, etc. If invoked
> multiple times on the same object, the returned key must be the same.
>
>
> On 04/02/2022 18:25, John Smith wrote:
>
> Hi Francesco,  here is the reproducer:
> https://github.com/javadevmtl/flink-key-reproducer
>
> So, essentially it looks like when there's a high influx of records
> produced from the source that the Exception is thrown.
>
> The key is generated by 3 values: date/time rounded to the minute and
> 2 strings.
> So you will see keys as follows...
> 2022-02-04T17:20:00Z|foo|bar
> 2022-02-04T17:21:00Z|foo|bar
> 2022-02-04T17:22:00Z|foo|bar
>
> The reproducer has a custom source that basically produces 

Re: [DISCUSS] Deprecate/remove Twitter connector

2022-01-31 Thread Jing Ge
Thanks @Martijn for driving this! +1 for deprecating and removing it. All
the concerns mentioned previously are valid. It is good to know that the
upcoming connector template/archetype will help users get started.
Beyond that, speaking of using a real connector as a sample: since Flink is
heading towards unified batch and stream processing, IMHO, it would be
nice to pick a connector that fits this trend, so that users get a
sample close to their actual use cases.

Best regards
Jing

On Mon, Jan 31, 2022 at 3:07 PM Andrew Otto  wrote:

> Shameless plug: maybe the Wikipedia EventStreams SSE API would make for a
> great connector example in Flink?
>
> :D
>
> On Mon, Jan 31, 2022 at 5:41 AM Martijn Visser 
> wrote:
>
>> Hi all,
>>
>> Thanks for your feedback. It's not about having this connector in the
>> main repo, that has been voted on already. This is strictly about the
>> connector itself, since it's not maintained and most probably also can't be
>> used due to changes in Twitter's API that aren't reflected in our connector
>> implementation. Therefore I propose to remove it.
>>
>> Fully agree on the template part, what's good to know is that a connector
>> template/archetype is part of the goals for the external
>> connector repository.
>>
>> Best regards,
>>
>> Martijn
>>
>> On Mon, 31 Jan 2022 at 11:32, Francesco Guardiani <
>> france...@ververica.com> wrote:
>>
>>> Hi,
>>>
>>> I agree with the concern about having this connector in the main repo.
>>> But I think in general it doesn't harm to have a sample connector to show
>>> how to develop a custom connector, and I think that the Twitter connector
>>> can be a good candidate for such a template. It needs rework for sure, as
>>> it has evident issues, notably it doesn't work with table.
>>>
>>> So i understand if we wanna remove what we have right now, but I think
>>> we should have some replacement for a "connector template", which is both
>>> ready to use and easy to hack to build your own connector starting from it.
>>> Twitter API is a good example for such a template, as it's both "related"
>>> to the known common use cases of Flink and because it is quite simple to get
>>> started with.
>>>
>>> FG
>>>
>>> On Sun, Jan 30, 2022 at 12:31 PM David Anderson 
>>> wrote:
>>>
 I agree.

 The Twitter connector is used in a few (unofficial) tutorials, so if we
 remove it that will make it more difficult for those tutorials to be
 maintained. On the other hand, if I recall correctly, that connector uses
 V1 of the Twitter API, which has been deprecated, so it's really not very
 useful even for that purpose.

 David



 On Fri, Jan 21, 2022 at 9:34 AM Martijn Visser 
 wrote:

> Hi everyone,
>
> I would like to discuss deprecating Flinks' Twitter connector [1].
> This was one of the first connectors that was added to Flink, which could
> be used to access the tweets from Twitter. Given the evolution of Flink
> over Twitter, I don't think that:
>
> * Users are still using this connector at all
> * That the code for this connector should be in the main Flink
> codebase.
>
> Given the circumstances, I would propose to deprecate and remove this
> connector. I'm looking forward to your thoughts. If you agree, please also
> let me know if you think we should first deprecate it in Flink 1.15 and
> remove it in a version after that, or if you think we can remove it
> directly.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/twitter/
>
>


Re: RowType for complex types in Parquet File

2022-01-07 Thread Jing Ge
Hi Meghajit,

As the exception describes, Parquet schemas with nested columns are
currently not supported. It is on our todo list with high priority.

Best regards
Jing

On Fri, Jan 7, 2022 at 6:12 AM Meghajit Mazumdar <
meghajit.mazum...@gojek.com> wrote:

> Hello,
>
> Flink documentation mentions this
> 
> as to how to create a FileSource for reading Parquet files.
> For primitive parquet types like BINARY and BOOLEAN, I am able to create a
> RowType and read the fields.
>
> However, I have some nested fields in my parquet schema also like this
> which I want to read :
>
>   optional group location = 11 {
> optional double latitude = 1;
> optional double longitude = 2;
>   }
>
> How can I create a RowType for this ? I did something like this below, but
> I got an exception `Caused by: java.lang.UnsupportedOperationException:
> Complex types not supported`
>
> RowType nestedRowType = RowType.of(new LogicalType[] {new
> DoubleType(), new DoubleType()}, new String[]{"latitude", "longitude"});
> final LogicalType[] fieldTypes = new
> LogicalType[]{nestedRowType};
> final ParquetColumnarRowInputFormat format =
> new ParquetColumnarRowInputFormat<>(
> new Configuration(),
> RowType.of(fieldTypes, new
> String[]{"location"}),
> 500,
> false,
> true);
>


Re: Converting parquet MessageType to flink RowType

2022-01-06 Thread Jing Ge
Hi Meghajit,

good catch! Thanks for correcting me. The question is about how to use a
column-oriented storage format like Parquet. What I tried to explain was
that the original MessageType has been used to build a projected
MessageType, since only the required columns should be read. Without input
from the user, there is no way to build the projected schema except to read
all columns. Even if we could convert the MessageType to RowType, we would
still need the user's input. The fieldTypes are therefore (mandatorily)
required with the current implementation because, when the given fields
cannot be found by the ParquetVectorizedInputFormat in the parquet footer,
a type info is still needed to build the projected schema.

Best regards
Jing

On Thu, Jan 6, 2022 at 12:38 PM Meghajit Mazumdar <
meghajit.mazum...@gojek.com> wrote:

> Hi Jing,
>
> Thanks for the reply.
> Had 2 doubts related to your answer :
>
> 1. There was a conversion from Flink GroupType to Parquet MessageType. It
> might be possible to build the conversion the other way around.
> -> Both GroupType and MessageType are parquet data structures I believe,
> present in the org.apache.parquet.schema package. I am actually looking if
> it is possible to convert it into a Flink data type, such as RowType.
>
> 2. The fieldTypes are required in case the given fields could not be found
> in the parquet footer, like for example typo.
> -> Does this mean that fieldTypes are not required to be given during the
> construction of RowType ? I tried leaving it empty as below, but it gave an
> exception *Caused by: java.lang.ClassNotFoundException:
> org.apache.flink.table.data.vector.ColumnVector*
>
> final ParquetColumnarRowInputFormat format =
> new ParquetColumnarRowInputFormat<>(
> new Configuration(),
> RowType.of(new LogicalType[]{}, new
> String[]{"field_name_1", "field_name_2"}),
> 500,
>         false,
> true);
>
> Regards,
> Meghajit
>
> On Thu, Jan 6, 2022 at 3:43 PM Jing Ge  wrote:
>
>> Hi Meghajit,
>>
>> thanks for asking. If you took a look at the source code
>> https://github.com/apache/flink/blob/9bbadb9b105b233b7565af120020ebd8dce69a4f/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetVectorizedInputFormat.java#L174,
>> you should see Parquet MessageType has been read from the footer and used.
>> There was a conversion from Flink GroupType to Parquet MessageType. It
>> might be possible to build the conversion the other way around. But the
>> question is about the performance, because only the required columns should
>> be read, therefore the column names should be given by the user. The
>> fieldTypes are required in case the given fields could not be found in the
>> parquet footer, like for example typo.
>>
>> Best regards
>> Jing
>>
>> On Thu, Jan 6, 2022 at 7:01 AM Meghajit Mazumdar <
>> meghajit.mazum...@gojek.com> wrote:
>>
>>> Hello,
>>>
>>> We want to read and process Parquet Files using a FileSource and the 
>>> DataStream API.
>>>
>>>
>>> Currently, as referenced from the documentation 
>>> <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/formats/parquet/>,
>>>  this is the way in which a FileSource for Parquet is created. As can be 
>>> seen, it requires the construction of a RowType like this
>>>
>>>
>>> RowType.of(fieldTypes, new String[] {"f7", "f4", "f99"});
>>>
>>>
>>> where fieldTypes is created like this:
>>>
>>>
>>> final LogicalType[] fieldTypes =
>>>   new LogicalType[] {
>>>     new DoubleType(), new IntType(), new VarCharType()
>>>   };
>>>
>>>
>>> Ideally, instead of specifying the column names( f7, f99,...) and their 
>>> data types(DoubleType, VarCharType, ...), we would like to use the schema 
>>> of the Parquet File itself to create a RowType.
>>>
>>> The schema is present in the footer of the Parquet file, inside the 
>>> metadata.

Re: Converting parquet MessageType to flink RowType

2022-01-06 Thread Jing Ge
Hi Meghajit,

thanks for asking. If you took a look at the source code
https://github.com/apache/flink/blob/9bbadb9b105b233b7565af120020ebd8dce69a4f/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetVectorizedInputFormat.java#L174,
you should see Parquet MessageType has been read from the footer and used.
There was a conversion from Flink GroupType to Parquet MessageType. It
might be possible to build the conversion the other way around. But the
question is about the performance, because only the required columns should
be read, therefore the column names should be given by the user. The
fieldTypes are required in case the given fields could not be found in the
parquet footer, for example due to a typo.
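
A minimal sketch of the conversion idea for flat schemas, in case someone
wants to derive the RowType from the footer instead of spelling it out.
Assumptions: only a handful of primitive Parquet types are mapped, and
nested groups are rejected:

import org.apache.flink.table.types.logical.BigIntType;
import org.apache.flink.table.types.logical.BooleanType;
import org.apache.flink.table.types.logical.DoubleType;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.Type;

import java.util.List;

public static RowType toRowType(MessageType schema) {
    List<Type> fields = schema.getFields();
    LogicalType[] types = new LogicalType[fields.size()];
    String[] names = new String[fields.size()];
    for (int i = 0; i < fields.size(); i++) {
        Type field = fields.get(i);
        if (!field.isPrimitive()) {
            throw new UnsupportedOperationException("Nested field: " + field.getName());
        }
        names[i] = field.getName();
        // Map a few common primitive Parquet types to Flink logical types.
        switch (field.asPrimitiveType().getPrimitiveTypeName()) {
            case INT32:   types[i] = new IntType(); break;
            case INT64:   types[i] = new BigIntType(); break;
            case DOUBLE:  types[i] = new DoubleType(); break;
            case BOOLEAN: types[i] = new BooleanType(); break;
            case BINARY:  types[i] = new VarCharType(VarCharType.MAX_LENGTH); break;
            default:
                throw new UnsupportedOperationException(field.toString());
        }
    }
    return RowType.of(types, names);
}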

Best regards
Jing

On Thu, Jan 6, 2022 at 7:01 AM Meghajit Mazumdar <
meghajit.mazum...@gojek.com> wrote:

> Hello,
>
> We want to read and process Parquet Files using a FileSource and the 
> DataStream API.
>
>
> Currently, as referenced from the documentation
> <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/formats/parquet/>,
>  this is the way in which a FileSource for Parquet is created. As can be 
> seen, it requires the construction of a RowType like this
>
>
> RowType.of(fieldTypes, new String[] {"f7", "f4", "f99"});
>
>
> where fieldTypes is created like this:
>
>
> final LogicalType[] fieldTypes =
>   new LogicalType[] {
>     new DoubleType(), new IntType(), new VarCharType()
>   };
>
>
> Ideally, instead of specifying the column names( f7, f99,...) and their data 
> types(DoubleType, VarCharType, ...), we would like to use the schema of the 
> Parquet File itself to create a RowType.
>
> The schema is present in the footer of the Parquet file, inside the metadata.
>
> We wanted to know if there is an easy way by which we can convert a
> parquet schema, i.e., MessageType, into a Flink RowType directly?
>
> The parquet schema of the file can be easily obtained by using 
> org.apache.parquet.hadoop.ParquetFileReader as follows:
>
>
> ParquetFileReader reader = 
> ParquetFileReader.open(HadoopInputFile.fromPath(path, conf));
>
> MessageType schema = reader.getFileMetaData().getSchema(); // this schema has 
> the field names as well as the data types of the parquet records
>
>
> As of now, because we couldn’t find a way to convert the schema into a 
> RowType directly, we resorted to writing our own custom parser to parse a 
> Parquet SimpleGroup into a Flink Row like this:
>
>
> ParquetFileReader reader = 
> ParquetFileReader.open(HadoopInputFile.fromPath(path, conf));
>
> PageReadStore nextPage = reader.readNextRowGroup();
>
> Row row = parseToRow(SimpleGroup g); // custom parser function
>
>
> Looking forward to an answer from the community. Thanks !
>
>
> Regards,
>
> Meghajit
>
>
>


Re: [DISCUSS] Drop Gelly

2022-01-04 Thread Jing Ge
Hi,

thanks Martijn for bringing it up for discussion. I think we could make the
discussion a little bit clearer by splitting it into two questions:

1. should Flink drop Gelly?
2. should Flink drop the graph computing?

The answer to the first question could be yes, since there have been no
changes for years. +1 for dropping Gelly.

But for the second question, I would suggest answering it with no, or with
a strategic yes and a commitment to support it again in the near future,
because there are many use cases that could be solved by streaming/batch +
graph in a more elegant way. Afaik fintech companies have a lot of those
use cases [1]. It would be great if we could find a way to drop Gelly but
keep the graph computing ability within Flink's ecosystem.

Best regards
Jing

[1]
https://california18.com/the-ant-graph-calculation-is-upgraded-to-tugraph-and-it-won-the-2021-world-internet-leading-scientific-and-technological-achievement-award/2117052021/


On Tue, Jan 4, 2022 at 2:02 PM Zhipeng Zhang 
wrote:

> Hi Martijin,
>
> Thanks for the feedback. I am not proposing  to bundle the new graph
> library with Alink. I am +1 for dropping the DataSet-based Gelly library,
> but we probably need a new graph library in Flink for the possible
> migration.
>
> We haven't decided what to do yet and probably need more discussion. There
> are some possible solutions:
> 1. We include a new DataStream-based graph library in FlinkML[1], given
> that graphs and machine learning algorithms are more often used together
> [2][3][4]. To achieve this, we could reuse the `AlgoOperator` interface in
> FlinkML.
> 2. We include a new DataStream-based graph library as a separate
> module/repo. This is consistent with existing libraries like Spark [5].
>
> What do you think?
>
>
> [1] https://github.com/apache/flink-ml
> [2] https://arxiv.org/abs/1403.6652
> [3] https://arxiv.org/abs/1503.03578
> [4] https://github.com/apache/spark
>
> Best,
> Zhipeng
>
>> Martijn Visser wrote on Tue, Jan 4, 2022 at 15:27:
>
>> Hi Zhipeng,
>>
>> Good that you've reached out, I wasn't aware that Gelly is being used in
>> Alink. Are you proposing to write a new graph library as a successor of
>> Gelly and bundle that with Alink?
>>
>> Best regards,
>>
>> Martijn
>>
>> On Tue, 4 Jan 2022 at 02:57, Zhipeng Zhang 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Thanks for starting the discussion :)
>>>
>>> We (Alink team [1]) are actually using part of the Gelly library to
>>> support graph algorithms (connected component, single source shortest path,
>>> etc.) for users in Alibaba Inc.
>>>
>>> As DataSet API is going to be dropped, shall we also provide a new graph
>>> library based on DataStream runtime (similar as we did for machine
>>> learning)?
>>>
>>> [1] https://github.com/Alibaba/alink
>>>
>>> David Anderson  于2022年1月4日周二 00:01写道:
>>>
 Most of the inquiries I've had about Gelly in recent memory have been
 from folks looking for a streaming solution, and it's only been a handful.

 +1 for dropping Gelly

 David

 On Mon, Jan 3, 2022 at 2:41 PM Till Rohrmann 
 wrote:

> I haven't seen any changes or requests to/for Gelly in ages. Hence, I
> would assume that it is not really used and can be removed.
>
> +1 for dropping Gelly.
>
> Cheers,
> Till
>
> On Mon, Jan 3, 2022 at 2:20 PM Martijn Visser 
> wrote:
>
>> Hi everyone,
>>
>> Flink is bundled with Gelly, a Graph API library [1]. This has been
>> marked as approaching end-of-life for quite some time [2].
>>
>> Gelly is built on top of Flink's DataSet API, which is deprecated and
>> slowly being phased out [3]. It only works on batch jobs. Based on the
>> activity in the Dev and User mailing lists, I don't see a lot of 
>> questions
>> popping up regarding the usage of Gelly. Removing Gelly would reduce CI
>> time and resources because we won't need to run tests for this anymore.
>>
>> I'm cross-posting this to the User mailing list to see if there are
>> any users of Gelly at the moment.
>>
>> Let me know your thoughts.
>>
>> Martijn Visser | Product Manager
>>
>> mart...@ververica.com
>>
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-stable/docs/libs/gelly/overview/
>>
>> [2] https://flink.apache.org/roadmap.html
>>
>> [3] https://lists.apache.org/thread/b2y3xx3thbcbtzdphoct5wvzwogs9sqz
>>
>> 
>>
>>
>> Follow us @VervericaData
>>
>> --
>>
>> Join Flink Forward  - The Apache Flink
>> Conference
>>
>> Stream Processing | Event Driven | Real Time
>>
>>
>>>
>>> --
>>> best,
>>> Zhipeng
>>>
>>>
>
> --
> best,
> Zhipeng
>
>


Re: flinksql await()

2021-12-21 Thread Jing Ge
Hi 陈卓宇,

By default, all submitted DML statements are executed asynchronously; see
the Javadoc of TableEnvironment.executeSql(String statement). The
difference between calling .await() and not calling it is that await()
blocks until the asynchronous query returns its first result row (as a
side note, mind the difference between INSERT and SELECT); see the Javadoc
of TableResult.await(). The concrete code is in
TableResultImpl.awaitInternal(long timeout, TimeUnit unit): since the
timeout argument is -1 in this case, future.get() is called, which forces
waiting until resultProvider.isFirstRowReady() returns true.
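
A minimal sketch of the difference (hypothetical pipeline, exception
handling omitted):

// Without await(): executeSql() submits the INSERT job asynchronously and
// returns immediately.
TableResult result = tEnv.executeSql("INSERT INTO kafka SELECT ...");

// With await(): blocks until the first result row is ready, i.e. until the
// asynchronous job has made its first result available.
result.await();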

Best regards

On Tue, Dec 21, 2021 at 10:00 AM 陈卓宇 <2572805...@qq.com.invalid> wrote:

> Hello community,
>
> String initialValues =
> "INSERT INTO kafka\n"
> + "SELECT CAST(price AS DECIMAL(10, 2)), currency, "
> + " CAST(d AS DATE), CAST(t AS TIME(0)), CAST(ts AS
> TIMESTAMP(3))\n"
> + "FROM (VALUES (2.02,'Euro','2019-12-12', '00:00:01',
> '2019-12-12 00:00:01.001001'), \n"
> + "  (1.11,'US Dollar','2019-12-12', '00:00:02',
> '2019-12-12 00:00:02.002001'), \n"
> + "  (50,'Yen','2019-12-12', '00:00:03', '2019-12-12
> 00:00:03.004001'), \n"
> + "  (3.1,'Euro','2019-12-12', '00:00:04', '2019-12-12
> 00:00:04.005001'), \n"
> + "  (5.33,'US Dollar','2019-12-12', '00:00:05',
> '2019-12-12 00:00:05.006001'), \n"
> + "  (0,'DUMMY','2019-12-12', '00:00:10', '2019-12-12
> 00:00:10'))\n"
> + "  AS orders (price, currency, d, t, ts)";
> tEnv.executeSql(initialValues).await();
> I have read the Javadoc of await() but I still don't quite understand
> what it does. What is the difference between calling .await() and not
> calling it?
> 陈卓宇
>
>
>