date:20230327

[jira] [Created] (FLINK-31634) FLIP-301: Hybrid Shuffle supports Remote Storage

2023-03-27 Thread Yuxin Tan (Jira)

Yuxin Tan created FLINK-31634:
-

 Summary: FLIP-301: Hybrid Shuffle supports Remote Storage
 Key: FLINK-31634
 URL: https://issues.apache.org/jira/browse/FLINK-31634
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Network
Affects Versions: 1.18.0
Reporter: Yuxin Tan


This is an umbrella ticket for 
[FLIP-301|https://cwiki.apache.org/confluence/display/FLINK/FLIP-301%3A+Hybrid+Shuffle+supports+Remote+Storage].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-27 Thread Shammon FY

Hi hongshun

Thanks for updating, the FLIP's pretty good and looks good to me!

Best,
Shammon FY


On Tue, Mar 28, 2023 at 11:16 AM Hongshun Wang 
wrote:

> Hi Shammon,
>
>
> Thanks a lot for your advise. I agree with your opinion now. It seems that
> I forgot to consider that it may be at a certain point in the future.
>
>
> I will modify OffsetsInitializer to provide a different strategy for later
> discovered partitions, by which users can also customize strategies for new
> and old partitions.
>
>  WDYT?
>
>
> Yours
>
> Hongshun
>
> On Tue, Mar 28, 2023 at 9:00 AM Shammon FY  wrote:
>
> > Hi Hongshun
> >
> > Thanks for your answer.
> >
> > I think the startup offset of Kafka such as timestamp or
> > specific_offset has no relationship with `Window Operator`. Users can
> > freely set the starting position according to their needs, it may be
> before
> > the latest Kafka data, or it may be at a certain point in the future.
> >
> > The offsets set by users in Kafka can be divided into four types at the
> > moment: EARLIEST, LATEST, TIMESTAMP, SPECIFIC_OFFSET. The new discovered
> > partitions may need to be handled with different strategies for these
> four
> > types:
> >
> > 1. EARLIEST, use EARLIEST for the new discovered partitions
> > 2. LATEST, use EARLIEST for the new discovered partitions
> > 3. TIMESTAMP, use TIMESTAMP for the new discovered partitions
> > 4. SPECIFIC_OFFSET, use SPECIFIC_OFFSET for the new discovered partitions
> >
> > From above, it seems that we only need to do special processing for
> > EARLIEST. What do you think of it?
> >
> > Best,
> > Shammon FY
> >
> >
> > On Fri, Mar 24, 2023 at 11:23 AM Hongshun Wang 
> > wrote:
> >
> > > "If all new messages in old partitions should be consumed, all new
> > messages
> > > in new partitions should also be consumed."
> > >
> > > Sorry, I wrote the last sentence incorrectly.
> > >
> > > On Fri, Mar 24, 2023 at 11:15 AM Hongshun Wang <
> loserwang1...@gmail.com>
> > > wrote:
> > >
> > > > Hi Shammon,
> > > >
> > > > Thanks for your advise!  I learn a lot about
> TIMESTAMP/SPECIFIC_OFFSET.
> > > > That's interesting.
> > > >
> > > > However, I have a different opinion.
> > > >
> > > > If a user employs the SPECIFIC_OFFSET strategy and enables
> > > auto-discovery,
> > > > they will be able to find new partitions beyond the specified offset.
> > > > Otherwise, enabling auto-discovery is no sense.
> > > >
> > > > When it comes to the TIMESTAMP strategy, it seems to be trivial. I
> > > > understand your concern, however, it’s the role of time window rather
> > > than
> > > > partition discovery. The TIMESTAMP strategy means that the consumer
> > > starts
> > > > from the first record whose timestamp is greater than or equal to a
> > given
> > > > timestamp, rather than only consuming all records whose timestamp is
> > > > greater than or equal to the given timestamp. *Thus, even disable
> auto
> > > > discovery or discover new partitions with TIMESTAMP strategy, same
> > > problems
> > > > still occur.*
> > > >
> > > > Above all , why use EARLIEST strategy? I believe that the strategy
> > > > specified by the startup should be the strategy at the moment of
> > > startup. *So
> > > > there is no difference between new partitions and new messages in old
> > > > partitions.* Therefore, all the new partition issues that you care
> > about
> > > > will still appear even if you disable the partition, as new messages
> in
> > > old
> > > > partitions. If all new messages in old partitions should be consume,
> > all
> > > > new messages in old partitions should also be consume.
> > > >
> > > >
> > > > Best,
> > > > Hongshun
> > > >
> > > > On Thu, Mar 23, 2023 at 8:34 PM Shammon FY 
> wrote:
> > > >
> > > >> Hi Hongshun
> > > >>
> > > >> Thanks for driving this discussion. Automatically discovering
> > partitions
> > > >> without losing data sounds great!
> > > >>
> > > >> Currently flink supports kafka source with different startup modes,
> > such
> > > >> as
> > > >> EARLIEST, LATEST, TIMESTAMP, SPECIFIC_OFFSETS and GROUP_OFFSET.
> > > >>
> > > >> If I understand correctly, you will set the offset of new partitions
> > > with
> > > >> EARLIEST? Please correct me if I'm wrong, I think the EARLIEST
> startup
> > > >> mode
> > > >> for new partitions is not suitable if users set
> > > TIMESTAMP/SPECIFIC_OFFSET
> > > >> for kafka in their jobs.
> > > >>
> > > >> For an extreme example, the current time is 2023-03-23 15:00:00 and
> > > users
> > > >> set the TIMESTAMP with 2023-03-23 16:00:00 for their jobs. If a
> > > partition
> > > >> is added during this period, jobs will generate “surprising” data.
> > What
> > > do
> > > >> you think of it?
> > > >>
> > > >>
> > > >> Best,
> > > >> Shammon FY
> > > >>
> > > >>
> > > >> On Tue, Mar 21, 2023 at 6:58 PM Hongshun Wang <
> > loserwang1...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Hi, Hang,
> > > >> >
> > > >> > Thanks for your advice.
> > > >> >
> > > >> > When the second case will occur?

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Yuxin Tan

Congratulations！

Best,
Yuxin


Guanghui Zhang  于2023年3月28日周二 11:06写道：

> Congratulations！
>
> Best,
> Zhang Guanghui
>
> Hang Ruan  于2023年3月28日周二 10:29写道：
>
> > Congratulations!
> >
> > Best,
> > Hang
> >
> > yu zelin  于2023年3月28日周二 10:27写道：
> >
> >> Congratulations!
> >>
> >> Best,
> >> Yu Zelin
> >>
> >> 2023年3月27日 17:23，Yu Li  写道：
> >>
> >> Dear Flinkers,
> >>
> >>
> >>
> >> As you may have noticed, we are pleased to announce that Flink Table
> Store has joined the Apache Incubator as a separate project called Apache
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a
> streaming data lake platform for high-speed data ingestion, change data
> tracking and efficient real-time analytics, with the vision of supporting a
> larger ecosystem and establishing a vibrant and neutral open source
> community.
> >>
> >>
> >>
> >> We would like to thank everyone for their great support and efforts for
> the Flink Table Store project, and warmly welcome everyone to join the
> development and activities of the new project. Apache Flink will continue
> to be one of the first-class citizens supported by Paimon, and we believe
> that the Flink and Paimon communities will maintain close cooperation.
> >>
> >>
> >> 亲爱的Flinkers，
> >>
> >>
> >> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
> >> 孵化器独立孵化 [1] [2] [3]。新项目的名字是
> >> Apache
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
> >>
> >>
> >> 在这里我们要感谢大家对 Flink Table Store
> >> 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink 将继续作为 Paimon
> 支持的主力计算引擎之一，我们也相信
> >> Flink 和 Paimon 社区将继续保持密切合作。
> >>
> >>
> >> Best Regards,
> >> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
> >>
> >> 致礼，
> >> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
> >>
> >> [1] https://paimon.apache.org/
> >> [2] https://github.com/apache/incubator-paimon
> >> [3]
> https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
> >>
> >>
> >>
>

[jira] [Created] (FLINK-31633) Hive dialect can't order by an unselected column

2023-03-27 Thread luoyuxia (Jira)

luoyuxia created FLINK-31633:


 Summary: Hive dialect can't order by an unselected column
 Key: FLINK-31633
 URL: https://issues.apache.org/jira/browse/FLINK-31633
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive
Reporter: luoyuxia


With hive dialect, the following sql will throw exception:
{code:java}
create table (a int, b int);
select a from t order by b; {code}
It has been fixed by HIVE-15160. We can apply this patch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-27 Thread Hongshun Wang

Hi Shammon,


Thanks a lot for your advise. I agree with your opinion now. It seems that
I forgot to consider that it may be at a certain point in the future.


I will modify OffsetsInitializer to provide a different strategy for later
discovered partitions, by which users can also customize strategies for new
and old partitions.

 WDYT?


Yours

Hongshun

On Tue, Mar 28, 2023 at 9:00 AM Shammon FY  wrote:

> Hi Hongshun
>
> Thanks for your answer.
>
> I think the startup offset of Kafka such as timestamp or
> specific_offset has no relationship with `Window Operator`. Users can
> freely set the starting position according to their needs, it may be before
> the latest Kafka data, or it may be at a certain point in the future.
>
> The offsets set by users in Kafka can be divided into four types at the
> moment: EARLIEST, LATEST, TIMESTAMP, SPECIFIC_OFFSET. The new discovered
> partitions may need to be handled with different strategies for these four
> types:
>
> 1. EARLIEST, use EARLIEST for the new discovered partitions
> 2. LATEST, use EARLIEST for the new discovered partitions
> 3. TIMESTAMP, use TIMESTAMP for the new discovered partitions
> 4. SPECIFIC_OFFSET, use SPECIFIC_OFFSET for the new discovered partitions
>
> From above, it seems that we only need to do special processing for
> EARLIEST. What do you think of it?
>
> Best,
> Shammon FY
>
>
> On Fri, Mar 24, 2023 at 11:23 AM Hongshun Wang 
> wrote:
>
> > "If all new messages in old partitions should be consumed, all new
> messages
> > in new partitions should also be consumed."
> >
> > Sorry, I wrote the last sentence incorrectly.
> >
> > On Fri, Mar 24, 2023 at 11:15 AM Hongshun Wang 
> > wrote:
> >
> > > Hi Shammon,
> > >
> > > Thanks for your advise!  I learn a lot about TIMESTAMP/SPECIFIC_OFFSET.
> > > That's interesting.
> > >
> > > However, I have a different opinion.
> > >
> > > If a user employs the SPECIFIC_OFFSET strategy and enables
> > auto-discovery,
> > > they will be able to find new partitions beyond the specified offset.
> > > Otherwise, enabling auto-discovery is no sense.
> > >
> > > When it comes to the TIMESTAMP strategy, it seems to be trivial. I
> > > understand your concern, however, it’s the role of time window rather
> > than
> > > partition discovery. The TIMESTAMP strategy means that the consumer
> > starts
> > > from the first record whose timestamp is greater than or equal to a
> given
> > > timestamp, rather than only consuming all records whose timestamp is
> > > greater than or equal to the given timestamp. *Thus, even disable auto
> > > discovery or discover new partitions with TIMESTAMP strategy, same
> > problems
> > > still occur.*
> > >
> > > Above all , why use EARLIEST strategy? I believe that the strategy
> > > specified by the startup should be the strategy at the moment of
> > startup. *So
> > > there is no difference between new partitions and new messages in old
> > > partitions.* Therefore, all the new partition issues that you care
> about
> > > will still appear even if you disable the partition, as new messages in
> > old
> > > partitions. If all new messages in old partitions should be consume,
> all
> > > new messages in old partitions should also be consume.
> > >
> > >
> > > Best,
> > > Hongshun
> > >
> > > On Thu, Mar 23, 2023 at 8:34 PM Shammon FY  wrote:
> > >
> > >> Hi Hongshun
> > >>
> > >> Thanks for driving this discussion. Automatically discovering
> partitions
> > >> without losing data sounds great!
> > >>
> > >> Currently flink supports kafka source with different startup modes,
> such
> > >> as
> > >> EARLIEST, LATEST, TIMESTAMP, SPECIFIC_OFFSETS and GROUP_OFFSET.
> > >>
> > >> If I understand correctly, you will set the offset of new partitions
> > with
> > >> EARLIEST? Please correct me if I'm wrong, I think the EARLIEST startup
> > >> mode
> > >> for new partitions is not suitable if users set
> > TIMESTAMP/SPECIFIC_OFFSET
> > >> for kafka in their jobs.
> > >>
> > >> For an extreme example, the current time is 2023-03-23 15:00:00 and
> > users
> > >> set the TIMESTAMP with 2023-03-23 16:00:00 for their jobs. If a
> > partition
> > >> is added during this period, jobs will generate “surprising” data.
> What
> > do
> > >> you think of it?
> > >>
> > >>
> > >> Best,
> > >> Shammon FY
> > >>
> > >>
> > >> On Tue, Mar 21, 2023 at 6:58 PM Hongshun Wang <
> loserwang1...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi, Hang,
> > >> >
> > >> > Thanks for your advice.
> > >> >
> > >> > When the second case will occur? Currently, there are three ways to
> > >> specify
> > >> > partitions in Kafka: by topic, by partition, and by matching the
> topic
> > >> with
> > >> > a regular expression. Currently, if the initial partition number is
> 0,
> > >> an
> > >> > error will occur for the first two methods. However, when using a
> > >> regular
> > >> > expression to match topics, it is allowed to have 0 matched topics.
> > >> >
> > >> > > I don't know when the second case will occur
> > >> >
> >

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Guanghui Zhang

Congratulations！

Best,
Zhang Guanghui

Hang Ruan  于2023年3月28日周二 10:29写道：

> Congratulations!
>
> Best,
> Hang
>
> yu zelin  于2023年3月28日周二 10:27写道：
>
>> Congratulations!
>>
>> Best,
>> Yu Zelin
>>
>> 2023年3月27日 17:23，Yu Li  写道：
>>
>> Dear Flinkers,
>>
>>
>>
>> As you may have noticed, we are pleased to announce that Flink Table Store 
>> has joined the Apache Incubator as a separate project called Apache 
>> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
>> streaming data lake platform for high-speed data ingestion, change data 
>> tracking and efficient real-time analytics, with the vision of supporting a 
>> larger ecosystem and establishing a vibrant and neutral open source 
>> community.
>>
>>
>>
>> We would like to thank everyone for their great support and efforts for the 
>> Flink Table Store project, and warmly welcome everyone to join the 
>> development and activities of the new project. Apache Flink will continue to 
>> be one of the first-class citizens supported by Paimon, and we believe that 
>> the Flink and Paimon communities will maintain close cooperation.
>>
>>
>> 亲爱的Flinkers，
>>
>>
>> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
>> 孵化器独立孵化 [1] [2] [3]。新项目的名字是
>> Apache 
>> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
>>
>>
>> 在这里我们要感谢大家对 Flink Table Store
>> 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信
>> Flink 和 Paimon 社区将继续保持密切合作。
>>
>>
>> Best Regards,
>> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>>
>> 致礼，
>> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
>>
>> [1] https://paimon.apache.org/
>> [2] https://github.com/apache/incubator-paimon
>> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>>
>>
>>

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Yun Tang

Congratulations!
Unlike other data-lakes, Paimon might be the first one to act as a stream-first 
(not batch-first) data-lake.

Best
Yun Tang

From: Xianxun Ye 
Sent: Tuesday, March 28, 2023 10:52
To: dev@flink.apache.org 
Cc: Yu Li ; user ; user-zh 
; dev@flink.apache.org 
Subject: Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache 
Paimon(incubating)

Congratulations!

Best regards,

Xianxun

On 03/27/2023 22:51，Samrat 
Deb wrote：
congratulations

Bests,
Samrat

On Mon, Mar 27, 2023 at 7:19 PM Yanfei Lei  wrote:

Congratulations!

Best Regards,
Yanfei

ramkrishna vasudevan  于2023年3月27日周一 21:46写道：

Congratulations !!!

On Mon, Mar 27, 2023 at 2:54 PM Yu Li  wrote:

Dear Flinkers,

As you may have noticed, we are pleased to announce that Flink Table
Store has joined the Apache Incubator as a separate project called Apache
Paimon(incubating) [1] [2] [3]. The new project still aims at building a
streaming data lake platform for high-speed data ingestion, change data
tracking and efficient real-time analytics, with the vision of supporting a
larger ecosystem and establishing a vibrant and neutral open source
community.

We would like to thank everyone for their great support and efforts for
the Flink Table Store project, and warmly welcome everyone to join the
development and activities of the new project. Apache Flink will continue
to be one of the first-class citizens supported by Paimon, and we believe
that the Flink and Paimon communities will maintain close cooperation.

亲爱的Flinkers，

正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache 孵化器独立孵化 [1] [2]
[3]。新项目的名字是 Apache
Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。

在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache
Flink 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。

Best Regards,

Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)

致礼，

李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）

[1] https://paimon.apache.org/

[2] https://github.com/apache/incubator-paimon

[3]
https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Biao Liu

Congrats!

Thanks,
Biao /'bɪ.aʊ/



On Tue, 28 Mar 2023 at 10:29, Hang Ruan  wrote:

> Congratulations!
>
> Best,
> Hang
>
> yu zelin  于2023年3月28日周二 10:27写道：
>
>> Congratulations!
>>
>> Best,
>> Yu Zelin
>>
>> 2023年3月27日 17:23，Yu Li  写道：
>>
>> Dear Flinkers,
>>
>>
>>
>> As you may have noticed, we are pleased to announce that Flink Table Store 
>> has joined the Apache Incubator as a separate project called Apache 
>> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
>> streaming data lake platform for high-speed data ingestion, change data 
>> tracking and efficient real-time analytics, with the vision of supporting a 
>> larger ecosystem and establishing a vibrant and neutral open source 
>> community.
>>
>>
>>
>> We would like to thank everyone for their great support and efforts for the 
>> Flink Table Store project, and warmly welcome everyone to join the 
>> development and activities of the new project. Apache Flink will continue to 
>> be one of the first-class citizens supported by Paimon, and we believe that 
>> the Flink and Paimon communities will maintain close cooperation.
>>
>>
>> 亲爱的Flinkers，
>>
>>
>> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
>> 孵化器独立孵化 [1] [2] [3]。新项目的名字是
>> Apache 
>> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
>>
>>
>> 在这里我们要感谢大家对 Flink Table Store
>> 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信
>> Flink 和 Paimon 社区将继续保持密切合作。
>>
>>
>> Best Regards,
>> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>>
>> 致礼，
>> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
>>
>> [1] https://paimon.apache.org/
>> [2] https://github.com/apache/incubator-paimon
>> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>>
>>
>>

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Hang Ruan

Congratulations!

Best,
Hang

yu zelin  于2023年3月28日周二 10:27写道：

> Congratulations!
>
> Best,
> Yu Zelin
>
> 2023年3月27日 17:23，Yu Li  写道：
>
> Dear Flinkers,
>
>
>
> As you may have noticed, we are pleased to announce that Flink Table Store 
> has joined the Apache Incubator as a separate project called Apache 
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
> streaming data lake platform for high-speed data ingestion, change data 
> tracking and efficient real-time analytics, with the vision of supporting a 
> larger ecosystem and establishing a vibrant and neutral open source community.
>
>
>
> We would like to thank everyone for their great support and efforts for the 
> Flink Table Store project, and warmly welcome everyone to join the 
> development and activities of the new project. Apache Flink will continue to 
> be one of the first-class citizens supported by Paimon, and we believe that 
> the Flink and Paimon communities will maintain close cooperation.
>
>
> 亲爱的Flinkers，
>
>
> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
> 孵化器独立孵化 [1] [2] [3]。新项目的名字是
> Apache 
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
>
>
> 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink
> 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
>
>
> Best Regards,
> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>
> 致礼，
> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
>
> [1] https://paimon.apache.org/
> [2] https://github.com/apache/incubator-paimon
> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>
>
>

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread yu zelin

Congratulations!

Best,
Yu Zelin

> 2023年3月27日 17:23，Yu Li  写道：
> 
> Dear Flinkers,
> 
> As you may have noticed, we are pleased to announce that Flink Table Store 
> has joined the Apache Incubator as a separate project called Apache 
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
> streaming data lake platform for high-speed data ingestion, change data 
> tracking and efficient real-time analytics, with the vision of supporting a 
> larger ecosystem and establishing a vibrant and neutral open source community.
> 
> We would like to thank everyone for their great support and efforts for the 
> Flink Table Store project, and warmly welcome everyone to join the 
> development and activities of the new project. Apache Flink will continue to 
> be one of the first-class citizens supported by Paimon, and we believe that 
> the Flink and Paimon communities will maintain close cooperation.
> 
> 亲爱的Flinkers，
> 
> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache 孵化器独立孵化 [1] [2] 
> [3]。新项目的名字是 Apache 
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
> 
> 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink 
> 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
> 
> Best Regards,
> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
> 
> 致礼，
> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
> 
> [1] https://paimon.apache.org/
> [2] https://github.com/apache/incubator-paimon
> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread ConradJam

Congratulations!

Yu Li  于2023年3月27日周一 17:24写道：
>
> Dear Flinkers,
>
>
> As you may have noticed, we are pleased to announce that Flink Table
> Store has joined the Apache Incubator as a separate project called
> Apache Paimon(incubating) [1] [2] [3]. The new project still aims at
> building a streaming data lake platform for high-speed data ingestion,
> change data tracking and efficient real-time analytics, with the
> vision of supporting a larger ecosystem and establishing a vibrant and
> neutral open source community.
>
>
> We would like to thank everyone for their great support and efforts
> for the Flink Table Store project, and warmly welcome everyone to join
> the development and activities of the new project. Apache Flink will
> continue to be one of the first-class citizens supported by Paimon,
> and we believe that the Flink and Paimon communities will maintain
> close cooperation.
>
>
> 亲爱的Flinkers，
>
>
> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
> 孵化器独立孵化 [1] [2] [3]。新项目的名字是
> Apache 
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
>
>
> 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink
> 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
>
>
> Best Regards,
>
> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>
>
> 致礼，
>
> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
>
>
> [1] https://paimon.apache.org/
>
> [2] https://github.com/apache/incubator-paimon
>
> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal

-- 
Best

ConradJam

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Jingsong Li

Congratulations!

I believe Paimon will work with Flink to build the best streaming data
warehouse.

Best,
Jingsong

On Tue, Mar 28, 2023 at 8:27 AM Shammon FY  wrote:
>
> Congratulations!
>
>
> Best,
> Shammon FY
>
> On Mon, Mar 27, 2023 at 11:37 PM Samrat Deb  wrote:
>
> > congratulations
> >
> > Bests,
> > Samrat
> > On Mon, 27 Mar 2023 at 8:24 PM, Alexander Fedulov <
> > alexander.fedu...@gmail.com> wrote:
> >
> > > Great to see this, congratulations!
> > >
> > > Best,
> > > Alex
> > >
> > > On Mon, 27 Mar 2023 at 11:24, Yu Li  wrote:
> > >
> > > > Dear Flinkers,
> > > >
> > > >
> > > >
> > > > As you may have noticed, we are pleased to announce that Flink Table
> > > Store has joined the Apache Incubator as a separate project called Apache
> > > Paimon(incubating) [1] [2] [3]. The new project still aims at building a
> > > streaming data lake platform for high-speed data ingestion, change data
> > > tracking and efficient real-time analytics, with the vision of
> > supporting a
> > > larger ecosystem and establishing a vibrant and neutral open source
> > > community.
> > > >
> > > >
> > > >
> > > > We would like to thank everyone for their great support and efforts for
> > > the Flink Table Store project, and warmly welcome everyone to join the
> > > development and activities of the new project. Apache Flink will continue
> > > to be one of the first-class citizens supported by Paimon, and we believe
> > > that the Flink and Paimon communities will maintain close cooperation.
> > > >
> > > >
> > > > 亲爱的Flinkers，
> > > >
> > > >
> > > > 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
> > > > 孵化器独立孵化 [1] [2] [3]。新项目的名字是
> > > > Apache
> > >
> > Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
> > > >
> > > >
> > > > 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache
> > > Flink
> > > > 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
> > > >
> > > >
> > > > Best Regards,
> > > >
> > > > Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
> > > >
> > > >
> > > > 致礼，
> > > >
> > > > 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
> > > >
> > > >
> > > > [1] https://paimon.apache.org/
> > > >
> > > > [2] https://github.com/apache/incubator-paimon
> > > >
> > > > [3]
> > https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
> > > >
> > >
> >

[jira] [Created] (FLINK-31632) watermark aligned idle source can't resume

2023-03-27 Thread haishui (Jira)

haishui created FLINK-31632:
---

 Summary: watermark aligned idle source can't resume
 Key: FLINK-31632
 URL: https://issues.apache.org/jira/browse/FLINK-31632
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.15.4, 1.16.1, 1.17.0
Reporter: haishui


 
{code:java}
WatermarkStrategy watermarkStrategy = WatermarkStrategy
.forBoundedOutOfOrderness(Duration.ofMillis(0))
.withTimestampAssigner((element, recordTimestamp) -> 
Long.parseLong(element))
.withWatermarkAlignment("group", Duration.ofMillis(10), 
Duration.ofSeconds(2))
.withIdleness(Duration.ofSeconds(10)); 
DataStreamSource s1 = env.fromSource(kafkaSource("topic1"), 
watermarkStrategy, "S1");
DataStreamSource s2 = env.fromSource(kafkaSource("topic2"), 
watermarkStrategy, "S2");{code}
send "0" to kafka topic1 and topic2

 

After 10s, source1 and source2 is idle，and logs are

 
{code:java}
09:44:30,403 DEBUG 
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - New reported 
watermark=Watermark @ -1 (1970-01-01 07:59:59.999) from subTaskId=0
09:44:30,404 DEBUG 
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - New reported 
watermark=Watermark @ -1 (1970-01-01 07:59:59.999) from subTaskId=0
09:44:32,019 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Distributing 
maxAllowedWatermark=9 to subTaskIds=[0]
09:44:32,019 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Distributing 
maxAllowedWatermark=9 to subTaskIds=[0]
09:44:32,417 DEBUG 
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - New reported 
watermark=Watermark @ -1 (1970-01-01 07:59:59.999) from subTaskId=0
09:44:32,418 DEBUG 
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - New reported 
watermark=Watermark @ -1 (1970-01-01 07:59:59.999) from subTaskId=0
09:44:34,028 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Distributing 
maxAllowedWatermark=9 to subTaskIds=[0]
09:44:34,028 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Distributing 
maxAllowedWatermark=9 to subTaskIds=[0]
09:44:34,423 DEBUG 
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - New reported 
watermark=Watermark @ 9223372036854775807 (292278994-08-17 15:12:55.807) from 
subTaskId=0
09:44:34,424 DEBUG 
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - New reported 
watermark=Watermark @ 9223372036854775807 (292278994-08-17 15:12:55.807) from 
subTaskId=0
09:44:36,023 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Distributing 
maxAllowedWatermark=-9223372036854775799 to subTaskIds=[0]
09:44:36,023 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Distributing 
maxAllowedWatermark=-9223372036854775799 to subTaskIds=[0]
09:44:36,433 DEBUG 
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - New reported 
watermark=Watermark @ 9223372036854775807 (292278994-08-17 15:12:55.807) from 
subTaskId=0
09:44:36,433 DEBUG 
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - New reported 
watermark=Watermark @ 9223372036854775807 (292278994-08-17 15:12:55.807) from 
subTaskId=0 {code}
send message to topic1 or topic2 now, the message can't be consumed。

 

the reason is: 

when a source is marked idle, the lastEmittedWatermark = Long.MAX_VALUE and 
currentMaxDesiredWatermark = Long.MAX_VALUE + maxAllowedWatermarkDrift in 
org.apache.flink.streaming.api.operators.SourceOperator.
currentMaxDesiredWatermark is negative and always less than lastEmittedWatermark
operatingMode always is WAITING_FOR_ALIGNMENT



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-27 Thread Shammon FY

Hi Hongshun

Thanks for your answer.

I think the startup offset of Kafka such as timestamp or
specific_offset has no relationship with `Window Operator`. Users can
freely set the starting position according to their needs, it may be before
the latest Kafka data, or it may be at a certain point in the future.

The offsets set by users in Kafka can be divided into four types at the
moment: EARLIEST, LATEST, TIMESTAMP, SPECIFIC_OFFSET. The new discovered
partitions may need to be handled with different strategies for these four
types:

1. EARLIEST, use EARLIEST for the new discovered partitions
2. LATEST, use EARLIEST for the new discovered partitions
3. TIMESTAMP, use TIMESTAMP for the new discovered partitions
4. SPECIFIC_OFFSET, use SPECIFIC_OFFSET for the new discovered partitions

>From above, it seems that we only need to do special processing for
EARLIEST. What do you think of it?

Best,
Shammon FY


On Fri, Mar 24, 2023 at 11:23 AM Hongshun Wang 
wrote:

> "If all new messages in old partitions should be consumed, all new messages
> in new partitions should also be consumed."
>
> Sorry, I wrote the last sentence incorrectly.
>
> On Fri, Mar 24, 2023 at 11:15 AM Hongshun Wang 
> wrote:
>
> > Hi Shammon,
> >
> > Thanks for your advise!  I learn a lot about TIMESTAMP/SPECIFIC_OFFSET.
> > That's interesting.
> >
> > However, I have a different opinion.
> >
> > If a user employs the SPECIFIC_OFFSET strategy and enables
> auto-discovery,
> > they will be able to find new partitions beyond the specified offset.
> > Otherwise, enabling auto-discovery is no sense.
> >
> > When it comes to the TIMESTAMP strategy, it seems to be trivial. I
> > understand your concern, however, it’s the role of time window rather
> than
> > partition discovery. The TIMESTAMP strategy means that the consumer
> starts
> > from the first record whose timestamp is greater than or equal to a given
> > timestamp, rather than only consuming all records whose timestamp is
> > greater than or equal to the given timestamp. *Thus, even disable auto
> > discovery or discover new partitions with TIMESTAMP strategy, same
> problems
> > still occur.*
> >
> > Above all , why use EARLIEST strategy? I believe that the strategy
> > specified by the startup should be the strategy at the moment of
> startup. *So
> > there is no difference between new partitions and new messages in old
> > partitions.* Therefore, all the new partition issues that you care about
> > will still appear even if you disable the partition, as new messages in
> old
> > partitions. If all new messages in old partitions should be consume, all
> > new messages in old partitions should also be consume.
> >
> >
> > Best,
> > Hongshun
> >
> > On Thu, Mar 23, 2023 at 8:34 PM Shammon FY  wrote:
> >
> >> Hi Hongshun
> >>
> >> Thanks for driving this discussion. Automatically discovering partitions
> >> without losing data sounds great!
> >>
> >> Currently flink supports kafka source with different startup modes, such
> >> as
> >> EARLIEST, LATEST, TIMESTAMP, SPECIFIC_OFFSETS and GROUP_OFFSET.
> >>
> >> If I understand correctly, you will set the offset of new partitions
> with
> >> EARLIEST? Please correct me if I'm wrong, I think the EARLIEST startup
> >> mode
> >> for new partitions is not suitable if users set
> TIMESTAMP/SPECIFIC_OFFSET
> >> for kafka in their jobs.
> >>
> >> For an extreme example, the current time is 2023-03-23 15:00:00 and
> users
> >> set the TIMESTAMP with 2023-03-23 16:00:00 for their jobs. If a
> partition
> >> is added during this period, jobs will generate “surprising” data. What
> do
> >> you think of it?
> >>
> >>
> >> Best,
> >> Shammon FY
> >>
> >>
> >> On Tue, Mar 21, 2023 at 6:58 PM Hongshun Wang 
> >> wrote:
> >>
> >> > Hi, Hang,
> >> >
> >> > Thanks for your advice.
> >> >
> >> > When the second case will occur? Currently, there are three ways to
> >> specify
> >> > partitions in Kafka: by topic, by partition, and by matching the topic
> >> with
> >> > a regular expression. Currently, if the initial partition number is 0,
> >> an
> >> > error will occur for the first two methods. However, when using a
> >> regular
> >> > expression to match topics, it is allowed to have 0 matched topics.
> >> >
> >> > > I don't know when the second case will occur
> >> >
> >> >
> >> > Why prefer the field `firstDiscoveryDone`? When a regular expression
> >> > initially matches 0 topics, it should consume all messages of the new
> >> > topic. If unassignedInitialPartitons and unassignedTopLevelPartitions
> >> are
> >> > used instead of firstDiscoveryDone, any new topics created during (5
> >> > minutes discovery + job restart time) will be treated as the first
> >> > discovery, causing data loss.
> >> >
> >> > > Then when will we get the empty partition list? I think it should be
> >> > treated as the first initial discovery if both
> >> `unassignedInitialPartitons`
> >> > and `assignedPartitons` are empty without `firstDiscoveryDone`.
> >> >
> >> > Best
> >> >

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Shammon FY

Congratulations!


Best,
Shammon FY

On Mon, Mar 27, 2023 at 11:37 PM Samrat Deb  wrote:

> congratulations
>
> Bests,
> Samrat
> On Mon, 27 Mar 2023 at 8:24 PM, Alexander Fedulov <
> alexander.fedu...@gmail.com> wrote:
>
> > Great to see this, congratulations!
> >
> > Best,
> > Alex
> >
> > On Mon, 27 Mar 2023 at 11:24, Yu Li  wrote:
> >
> > > Dear Flinkers,
> > >
> > >
> > >
> > > As you may have noticed, we are pleased to announce that Flink Table
> > Store has joined the Apache Incubator as a separate project called Apache
> > Paimon(incubating) [1] [2] [3]. The new project still aims at building a
> > streaming data lake platform for high-speed data ingestion, change data
> > tracking and efficient real-time analytics, with the vision of
> supporting a
> > larger ecosystem and establishing a vibrant and neutral open source
> > community.
> > >
> > >
> > >
> > > We would like to thank everyone for their great support and efforts for
> > the Flink Table Store project, and warmly welcome everyone to join the
> > development and activities of the new project. Apache Flink will continue
> > to be one of the first-class citizens supported by Paimon, and we believe
> > that the Flink and Paimon communities will maintain close cooperation.
> > >
> > >
> > > 亲爱的Flinkers，
> > >
> > >
> > > 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
> > > 孵化器独立孵化 [1] [2] [3]。新项目的名字是
> > > Apache
> >
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
> > >
> > >
> > > 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache
> > Flink
> > > 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
> > >
> > >
> > > Best Regards,
> > >
> > > Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
> > >
> > >
> > > 致礼，
> > >
> > > 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
> > >
> > >
> > > [1] https://paimon.apache.org/
> > >
> > > [2] https://github.com/apache/incubator-paimon
> > >
> > > [3]
> https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
> > >
> >
>

Re: [DISCUSS] FLIP-292: Support configuring state TTL at operator level for Table API & SQL programs

2023-03-27 Thread Jing Ge

Hi Jane,

Thanks for clarifying it. As far as I am concerned, the issue is where to
keep the user's job metadata, i.e. SQL script (to make the discussion
easier, let's ignore config). As long as FLIP-190 is only used for
migration/upgrade, SQL is the single source of truth. Once the compiled
plan has been modified, in this case ttls, the user's job metadata will be
distributed into two different places. Each time when the SQL needs
changes, extra effort will be required to take care of the modification in
the compiled plan.

Examples:

1. If we try to start the same SQL with a new Flink cluster (one type of
"restart") without knowing the modified compiled plan. The old
performance issue will rise again. This might happen when multiple users
are working on the same project who run a working SQL job, get performance
issues, and have no clue since nothing has been changed. Or one user is
working on many SQL jobs who might lose the overview of which SQL job has
modified plans or not.
2. If a SQL has been changed in a backwards compatible way and (re)start
with a given savepoint(NO_CLAIM), the version2 json plan has to be made
based on version1, as I mentioned previously, which means each time when
the SQL got changed, the related compiled plan need modification too.
Beyond that , it would also be easily forgotten to do it if there were no
connection between the SQL and the related modified compiled plan. The SQL
job will have the performance issue again after the change.
3. Another scenario would be running a backwards compatible SQL job with an
upgraded FLink version, additional upgrade logic or guideline should be
developed for e.g. ttl modification in the compiled plan, because upgraded
Flink engine underneath might lead to a different ttl setting.
4. The last scenario is just like you described that SQL has been changed
significantly so that the compiled operators will be changed too. The easy
way is to start a fresh new tuning. But since there was a tuning for the
last SQL. User has to compare both compiled plans and copy/paste some ttls
that might still work.

A virtualization tool could help but might not reduce those efforts
significantly, since the user behaviour is changed enormously.

I was aware that the json string might be large. Doing(EXECUTE PLAN 'json
plan as string') is intended to avoid dealing with files for most common
cases where the json string has common length.

Anyway, it should be fine, if it is only recommended for advanced use cases
where users are aware of those efforts.

Best regards,
Jing

On Sat, Mar 25, 2023 at 3:54 PM Jane Chan  wrote:

> Hi Leonard, Jing and Shengkai,
>
> Thanks so much for your insightful comments. Here are my thoughts
>
> @Shengkai
> > 1. How the Gateway users use this feature? As far as I know, the EXEUCTE
> PLAN only supports local file right now. Is it possible to extend this
> syntax to allow for reading plan files from remote file systems?
>
> Nice catch! Currently, the "COMPILE PLAN" and "EXECUTE PLAN" statements
> only support a local file path without the scheme (see
> TableEnvironmentImpl.java#L773
> ).
> It's reasonable to extend the support to Flink's FileSystem. Besides, the
> JSON plan should also be added to the resource cleaning mechanism for the
> Gateway mode, just like we do with the "ADD JAR" operation, cleaning it up
> when the session ends. I will take your suggestion and make changes to FLIP.
>
> > 2. I would like to inquire if there are any limitations on this feature?
> I have encountered several instances where the data did not expire in the
> upstream operator, but it expired in the downstream operator, resulting in
> abnormal calculation results or direct exceptions thrown by the operator
> (e.g. rank operator). Can we limit that the expiration time of downstream
> operator data should be greater than or equal to the expiration time of
> upstream operator data?
>
> This is an excellent point. In fact, the current state TTL is based on the
> initialization time of each operator, which is inherently unaligned. The
> probability of such unalignment is magnified now that fine-grained
> operator-level TTL is supported. While on the other hand, this FLIP is not
> the root cause of this issue. To systematically solve the problem of TTL
> unalignment between operators, I understand that we need a larger FLIP to
> accomplish this. And I'll mention this point in the FLIP doc. WDYT?
>
> Back to your suggestions, in most scenarios, the TTL between multiple
> state operators should be non-monotonically decreasing, but there may be
> some exceptions, such as the SinkUpsertMaterializer introduced to solve the
> changelog disorder problem. It may not be appropriate if we block it at the
> implementation level. But it does happen that the users

Re: Subtask distribution in Flink

2023-03-27 Thread santhosh venkat

Hi,

Thank you so much for taking time to answer my questions and pointing me to
relevant documentation. Really appreciate it.

When the task failover happens, are there internal metrics in Flink at a
job level to track the new execution attempt?  Is there a way for the
application owner to figure out how many task failovers have happened in a
job execution and get the current execution attempt.

Thanks.

On Mon, Mar 27, 2023 at 2:55 AM Weihua Hu  wrote:

> Hi,
>
> 1. Does this mean that  each  task slot will contain an entire pipeline in
> > the job?
>
> not exactly, each slot will run a subtask of each task. If the job is so
> simple that
> there is no keyby logic and we do not enable rebalance shuffle type, each
> slot
> could run all the pipeline. But if not we need to shuffle data to other
> subtasks.
> You can get some examples from [1].
>
> 2. Upon a TM pod failure and after K8s brings back the TM pod, would flink
> > assign the same subtasks back to restarted TM  again? Or will they be
> > distributed to different TaskManagers?
>
> If there is no shuffle data in your job (described in 1), only tasks on
> failure pods
>  will be restarted, and they will be assigned to the new TM again.
> But if not, all the related tasks will be restarted. When these tasks
> re-scheduled,
> there are some strategy to assign slots. They will try to assign the task
> to previous
> slot to reduce the recovery time, But there is no guarantee.
> You can read [2] to get more information about failure recovery.
>
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/flink-architecture/#tasks-and-operator-chains
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/
>
> Best,
> Weihua
>
>
> On Mon, Mar 27, 2023 at 3:22 PM santhosh venkat <
> santhoshvenkat1...@gmail.com> wrote:
>
> > Hi,
> >
> > I am trying to understand how subtask distribution works in Flink. Let's
> > assume a setup of a Flink cluster with a fixed number of TaskManagers in
> a
> > kubernetes cluster.
> >
> > Let's say I have a flink job with all the operators having the same
> > parallelism and with the same Slot sharing group. The operator
> parallelism
> > is computed as the number of task managers multiplied by number of task
> > slots per TM.
> >
> > 1. Does this mean that  each  task slot will contain an entire pipeline
> in
> > the job?
> > 2. Upon a TM pod failure and after K8s brings back the TM pod, would
> flink
> > assign the same subtasks back to restarted TM  again? Or will they be
> > distributed to different TaskManagers?
> >
> > It would be great if someone can answer this question.
> >
> > Thanks.
> >
>

[jira] [Created] (FLINK-31631) Upgrade GCS connector to 2.2.11.

2023-03-27 Thread Chris Nauroth (Jira)

Chris Nauroth created FLINK-31631:
-

 Summary: Upgrade GCS connector to 2.2.11.
 Key: FLINK-31631
 URL: https://issues.apache.org/jira/browse/FLINK-31631
 Project: Flink
  Issue Type: Improvement
  Components: FileSystems
Affects Versions: 1.17.0
Reporter: Chris Nauroth


Upgrade the [GCS 
Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
 bundled in the Flink distro from version 2.2.3 to 2.2.11. The new release 
contains multiple bug fixes and enhancements discussed in the [Release 
Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
 Notable changes include:
 * Improved socket timeout handling.
 * Trace logging capabilities.
 * Fix bug that prevented usage of GCS as a [Hadoop Credential 
Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
 * Dependency upgrades.
 * Support OAuth2 based client authentication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-31630) Support max-age for checkpoints during last-state upgrades

2023-03-27 Thread Gyula Fora (Jira)

Gyula Fora created FLINK-31630:
--

 Summary: Support max-age for checkpoints during last-state upgrades
 Key: FLINK-31630
 URL: https://issues.apache.org/jira/browse/FLINK-31630
 Project: Flink
  Issue Type: New Feature
  Components: Kubernetes Operator
Reporter: Gyula Fora
Assignee: Gyula Fora
 Fix For: kubernetes-operator-1.5.0


The kubernetes operator currently does not consider the age of the last 
completed checkpoint before using it for upgrades.

We should have a configurable max age beyond which the operator will use 
savepoint or wait for in-progress checkpoints to complete.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Samrat Deb

congratulations

Bests,
Samrat
On Mon, 27 Mar 2023 at 8:24 PM, Alexander Fedulov <
alexander.fedu...@gmail.com> wrote:

> Great to see this, congratulations!
>
> Best,
> Alex
>
> On Mon, 27 Mar 2023 at 11:24, Yu Li  wrote:
>
> > Dear Flinkers,
> >
> >
> >
> > As you may have noticed, we are pleased to announce that Flink Table
> Store has joined the Apache Incubator as a separate project called Apache
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a
> streaming data lake platform for high-speed data ingestion, change data
> tracking and efficient real-time analytics, with the vision of supporting a
> larger ecosystem and establishing a vibrant and neutral open source
> community.
> >
> >
> >
> > We would like to thank everyone for their great support and efforts for
> the Flink Table Store project, and warmly welcome everyone to join the
> development and activities of the new project. Apache Flink will continue
> to be one of the first-class citizens supported by Paimon, and we believe
> that the Flink and Paimon communities will maintain close cooperation.
> >
> >
> > 亲爱的Flinkers，
> >
> >
> > 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
> > 孵化器独立孵化 [1] [2] [3]。新项目的名字是
> > Apache
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
> >
> >
> > 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache
> Flink
> > 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
> >
> >
> > Best Regards,
> >
> > Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
> >
> >
> > 致礼，
> >
> > 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
> >
> >
> > [1] https://paimon.apache.org/
> >
> > [2] https://github.com/apache/incubator-paimon
> >
> > [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
> >
>

Re: [ANNOUNCE] Kafka Connector Code Removal from apache/flink:main branch and code freezing

2023-03-27 Thread Tzu-Li (Gordon) Tai

Thanks for the updates.

So far the above mentioned issues seem to all have PRs against
apache/flink-connector-kafka now.

To be clear, this notice isn't about discussing _what_ PRs we will be
merging or not merging - we should try to review all of them eventually.
The only reason I've made a list of PRs in the original notice is just to
make it visible which PRs we need to reopen against
apache/flink-connector-kafka due to the code removal.

Thanks,
Gordon

On Sun, Mar 26, 2023 at 7:07 PM Jacky Lau  wrote:

> Hi Gordon. https://issues.apache.org/jira/browse/FLINK-31006, which is
> also
> a critical bug in kafka. it will not exit after all partitions consumed
> when jobmanager failover in pipeline mode running unbounded source. and i
> talked with   @PatrickRen  offline, don't
> have a suitable way to fix it before. and we will solved it in this week
>
> Shammon FY  于2023年3月25日周六 13:13写道：
>
> > Thanks Jing and Gordon, I have closed the pr
> > https://github.com/apache/flink/pull/21965 and will open a new one for
> > kafka connector
> >
> >
> > Best,
> > shammon FY
> >
> >
> > On Saturday, March 25, 2023, Ran Tao  wrote:
> >
> > > Thank you Gordon and all the people who have worked on the externalized
> > > kafka implementation.
> > > I have another pr related to Kafka[1]. I will be very appreciative if
> you
> > > can help me review it in your free time.
> > >
> > > [1] https://github.com/apache/flink-connector-kafka/pull/10
> > >
> > > Best Regards,
> > > Ran Tao
> > >
> > >
> > > Tzu-Li (Gordon) Tai  于2023年3月24日周五 23:21写道：
> > >
> > > > Thanks Jing! I missed https://github.com/apache/flink/pull/21965
> > indeed.
> > > >
> > > > Please let us know if anything else was overlooked.
> > > >
> > > > On Fri, Mar 24, 2023 at 8:13 AM Jing Ge 
> > > > wrote:
> > > >
> > > > > Thanks Gordon for driving this! There is another PR related to
> Kafka
> > > > > connector: https://github.com/apache/flink/pull/21965
> > > > >
> > > > > Best regards,
> > > > > Jing
> > > > >
> > > > > On Fri, Mar 24, 2023 at 4:06 PM Tzu-Li (Gordon) Tai <
> > > tzuli...@apache.org
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Now that Flink 1.17 has been released, and given that we've
> already
> > > > > synced
> > > > > > the latest Kafka connector code up to Flink 1.17 to the
> > > > > > apache/flink-connector-kafka repo (thanks to Mason and Martijn
> for
> > > most
> > > > > of
> > > > > > the effort!), we're now in the final step of completely removing
> > the
> > > > > Kafka
> > > > > > connector code from apache/flink:main branch, tracked by
> > FLINK-30859
> > > > [1].
> > > > > >
> > > > > > As such, we'd like to ask that no more Kafka connector changes
> gets
> > > > > merged
> > > > > > to apache/flink:main, effective now. Going forward, all Kafka
> > > connector
> > > > > PRs
> > > > > > should be opened directly against the
> apache/flink-connector-kafka:
> > > main
> > > > > > branch.
> > > > > >
> > > > > > Meanwhile, there's a couple of "dangling" Kafka connector PRs
> over
> > > the
> > > > > last
> > > > > > 2 months that is opened against apache/flink:main:
> > > > > >
> > > > > >1. [FLINK-31305] Propagate producer exceptions outside of
> > mailbox
> > > > > >executor [2]
> > > > > >2. [FLINK-31049] Add support for Kafka record headers to
> > KafkaSink
> > > > [3]
> > > > > >3. [FLINK-31262] Move kafka sql connector fat jar test to
> > > > > >SmokeKafkaITCase [4 ]
> > > > > >4. [hotfix] Add writeTimestamp option to
> > > > > >KafkaRecordSerializationSchemaBuilder [5]
> > > > > >
> > > > > > Apart from 1. [FLINK-31305] which is a critical bug and is
> already
> > in
> > > > > > review closed to being merged, for the rest we will be reaching
> out
> > > on
> > > > > the
> > > > > > PRs to ask the authors to close the PR and reopen them against
> > > > > > apache/flink-connector-kafka:main.
> > > > > >
> > > > > > Thanks,
> > > > > > Gordon
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/FLINK-30859
> > > > > > [2] https://github.com/apache/flink/pull/22150
> > > > > > [3] https://github.com/apache/flink/pull/8
> > > > > > [4] https://github.com/apache/flink/pull/22060
> > > > > > [5] https://github.com/apache/flink/pull/22037
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Alexander Fedulov

Great to see this, congratulations!

Best,
Alex

On Mon, 27 Mar 2023 at 11:24, Yu Li  wrote:

> Dear Flinkers,
>
>
>
> As you may have noticed, we are pleased to announce that Flink Table Store 
> has joined the Apache Incubator as a separate project called Apache 
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
> streaming data lake platform for high-speed data ingestion, change data 
> tracking and efficient real-time analytics, with the vision of supporting a 
> larger ecosystem and establishing a vibrant and neutral open source community.
>
>
>
> We would like to thank everyone for their great support and efforts for the 
> Flink Table Store project, and warmly welcome everyone to join the 
> development and activities of the new project. Apache Flink will continue to 
> be one of the first-class citizens supported by Paimon, and we believe that 
> the Flink and Paimon communities will maintain close cooperation.
>
>
> 亲爱的Flinkers，
>
>
> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
> 孵化器独立孵化 [1] [2] [3]。新项目的名字是
> Apache 
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
>
>
> 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink
> 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
>
>
> Best Regards,
>
> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>
>
> 致礼，
>
> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
>
>
> [1] https://paimon.apache.org/
>
> [2] https://github.com/apache/incubator-paimon
>
> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Samrat Deb

congratulations

Bests,
Samrat

On Mon, Mar 27, 2023 at 7:19 PM Yanfei Lei  wrote:

> Congratulations!
>
> Best Regards,
> Yanfei
>
> ramkrishna vasudevan  于2023年3月27日周一 21:46写道：
> >
> > Congratulations !!!
> >
> > On Mon, Mar 27, 2023 at 2:54 PM Yu Li  wrote:
> >>
> >> Dear Flinkers,
> >>
> >>
> >> As you may have noticed, we are pleased to announce that Flink Table
> Store has joined the Apache Incubator as a separate project called Apache
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a
> streaming data lake platform for high-speed data ingestion, change data
> tracking and efficient real-time analytics, with the vision of supporting a
> larger ecosystem and establishing a vibrant and neutral open source
> community.
> >>
> >>
> >> We would like to thank everyone for their great support and efforts for
> the Flink Table Store project, and warmly welcome everyone to join the
> development and activities of the new project. Apache Flink will continue
> to be one of the first-class citizens supported by Paimon, and we believe
> that the Flink and Paimon communities will maintain close cooperation.
> >>
> >>
> >> 亲爱的Flinkers，
> >>
> >>
> >> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache 孵化器独立孵化 [1] [2]
> [3]。新项目的名字是 Apache
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
> >>
> >>
> >> 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache
> Flink 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
> >>
> >>
> >> Best Regards,
> >>
> >> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
> >>
> >>
> >> 致礼，
> >>
> >> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
> >>
> >>
> >> [1] https://paimon.apache.org/
> >>
> >> [2] https://github.com/apache/incubator-paimon
> >>
> >> [3]
> https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>

Re: [NOTICE] Website now has a staging environment

2023-03-27 Thread Chesnay Schepler


It's just another tool in your belt. You can use it, but don't have to.

We used it for the Hugo migration, where we could only test some things 
(like the rewrite rules) in a production-like environment.


I'm +0 on whether to move the docs to the repo or not.

On 27/03/2023 14:46, Jing Ge wrote:

Wow, that is good news! Thanks Chesnay for your effort.

I'd like to ask the same question as what Matthias asked. Speaking of the
workflow, should we first push changes to the asf-staging branch and check
it at  https://flink.staged.apache.org and then cp them to the asf-site?
Should there always be two separate commits, one for changes and a direct
follow up one for rebuilding the site?

Best regards,
Jing

On Mon, Mar 27, 2023 at 1:39 PM Matthias Pohl
 wrote:


Thanks for sharing that information.

Is there a specific workflow expected for pushing changes? Shall we always
go through the staging environment first or is it more like a nice-to-have
feature?

I'm also wondering whether we should move the documentation from the
Confluent page into the README file in apache/flink-web [1]. The wiki feels
to be too far away from the code here. WDYT?

Matthias

[1] https://github.com/apache/flink-web#readme

On Mon, Feb 6, 2023 at 5:09 PM Chesnay Schepler 
wrote:


Hello,

Just so more people are aware of it, I recently enabled a staging
environment for the Flink Website.

You can push the rebuilt website to the asf-staging branch in flink-web,
and the changes will be visible at https://flink.staged.apache.org
shortly.

This has been documented at
https://cwiki.apache.org/confluence/display/FLINK/Website.


Currently a Hugo-based version of the Website (FLINK-22922) can be seen
in the staging environment.


Cheers

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Yanfei Lei

Congratulations!

Best Regards,
Yanfei

ramkrishna vasudevan  于2023年3月27日周一 21:46写道：
>
> Congratulations !!!
>
> On Mon, Mar 27, 2023 at 2:54 PM Yu Li  wrote:
>>
>> Dear Flinkers,
>>
>>
>> As you may have noticed, we are pleased to announce that Flink Table Store 
>> has joined the Apache Incubator as a separate project called Apache 
>> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
>> streaming data lake platform for high-speed data ingestion, change data 
>> tracking and efficient real-time analytics, with the vision of supporting a 
>> larger ecosystem and establishing a vibrant and neutral open source 
>> community.
>>
>>
>> We would like to thank everyone for their great support and efforts for the 
>> Flink Table Store project, and warmly welcome everyone to join the 
>> development and activities of the new project. Apache Flink will continue to 
>> be one of the first-class citizens supported by Paimon, and we believe that 
>> the Flink and Paimon communities will maintain close cooperation.
>>
>>
>> 亲爱的Flinkers，
>>
>>
>> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache 孵化器独立孵化 [1] [2] 
>> [3]。新项目的名字是 Apache 
>> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
>>
>>
>> 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink 
>> 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
>>
>>
>> Best Regards,
>>
>> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>>
>>
>> 致礼，
>>
>> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
>>
>>
>> [1] https://paimon.apache.org/
>>
>> [2] https://github.com/apache/incubator-paimon
>>
>> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread ramkrishna vasudevan

Congratulations !!!

On Mon, Mar 27, 2023 at 2:54 PM Yu Li  wrote:

> Dear Flinkers,
>
>
>
> As you may have noticed, we are pleased to announce that Flink Table Store 
> has joined the Apache Incubator as a separate project called Apache 
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
> streaming data lake platform for high-speed data ingestion, change data 
> tracking and efficient real-time analytics, with the vision of supporting a 
> larger ecosystem and establishing a vibrant and neutral open source community.
>
>
>
> We would like to thank everyone for their great support and efforts for the 
> Flink Table Store project, and warmly welcome everyone to join the 
> development and activities of the new project. Apache Flink will continue to 
> be one of the first-class citizens supported by Paimon, and we believe that 
> the Flink and Paimon communities will maintain close cooperation.
>
>
> 亲爱的Flinkers，
>
>
> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
> 孵化器独立孵化 [1] [2] [3]。新项目的名字是
> Apache 
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
>
>
> 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink
> 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
>
>
> Best Regards,
>
> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>
>
> 致礼，
>
> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
>
>
> [1] https://paimon.apache.org/
>
> [2] https://github.com/apache/incubator-paimon
>
> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread feng xiangyu

Congrats Yu! Looking forward to contributing to Paimon!

Best Regards,
Xiangyu

yuxia  于2023年3月27日周一 21:01写道：

> congratulations!
>
> Best regards,
> Yuxia
>
>
> 发件人: "Andrew Otto" 
> 收件人: "Matthias Pohl" 
> 抄送: "Jing Ge" , "Leonard Xu" , "Yu
> Li" , "dev" , "User" <
> u...@flink.apache.org>, "user-zh" 
> 发送时间: 星期一, 2023年 3 月 27日 下午 8:57:50
> 主题: Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache
> Paimon(incubating)
>
> Exciting!
>
> If this ends up working well, Wikimedia Foundation would love to try it
> out!
>
> On Mon, Mar 27, 2023 at 8:39 AM Matthias Pohl via user < [ mailto:
> u...@flink.apache.org | u...@flink.apache.org ] > wrote:
>
>
>
> Congratulations and good luck with pushing the project forward.
>
> On Mon, Mar 27, 2023 at 2:35 PM Jing Ge via user < [ mailto:
> u...@flink.apache.org | u...@flink.apache.org ] > wrote:
>
> BQ_BEGIN
>
> Congrats!
> Best regards,
> Jing
>
> On Mon, Mar 27, 2023 at 2:32 PM Leonard Xu < [ mailto:xbjt...@gmail.com |
> xbjt...@gmail.com ] > wrote:
>
> BQ_BEGIN
>
> Congratulations!
>
> Best,
> Leonard
>
>
> BQ_BEGIN
>
> On Mar 27, 2023, at 5:23 PM, Yu Li < [ mailto:car...@gmail.com |
> car...@gmail.com ] > wrote:
>
> Dear Flinkers,
>
>
>
>
> As you may have noticed, we are pleased to announce that Flink Table Store
> has joined the Apache Incubator as a separate project called Apache
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a
> streaming data lake platform for high-speed data ingestion, change data
> tracking and efficient real-time analytics, with the vision of supporting a
> larger ecosystem and establishing a vibrant and neutral open source
> community.
>
>
>
>
> We would like to thank everyone for their great support and efforts for
> the Flink Table Store project, and warmly welcome everyone to join the
> development and activities of the new project. Apache Flink will continue
> to be one of the first-class citizens supported by Paimon, and we believe
> that the Flink and Paimon communities will maintain close cooperation.
>
>
>
>
> 亲爱的Flinkers，
>
>
>
>
> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache 孵化器独立孵化 [1] [2]
> [3]。新项目的名字是 Apache
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
>
>
>
>
>
> 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink
> 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
>
>
>
>
> Best Regards,
> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>
> 致礼，
> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
>
>
>
>
> [1] [ https://paimon.apache.org/ | https://paimon.apache.org/ ]
> [2] [ https://github.com/apache/incubator-paimon |
> https://github.com/apache/incubator-paimon ]
> [3] [ https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
> | https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal ]
>
>
>
>
>
> BQ_END
>
>
> BQ_END
>
>
> BQ_END
>
>
>

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread yuxia

congratulations! 

Best regards, 
Yuxia 


发件人: "Andrew Otto"  
收件人: "Matthias Pohl"  
抄送: "Jing Ge" , "Leonard Xu" , "Yu Li" 
, "dev" , "User" 
, "user-zh"  
发送时间: 星期一, 2023年 3 月 27日 下午 8:57:50 
主题: Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache 
Paimon(incubating) 

Exciting! 

If this ends up working well, Wikimedia Foundation would love to try it out! 

On Mon, Mar 27, 2023 at 8:39 AM Matthias Pohl via user < [ 
mailto:u...@flink.apache.org | u...@flink.apache.org ] > wrote: 



Congratulations and good luck with pushing the project forward. 

On Mon, Mar 27, 2023 at 2:35 PM Jing Ge via user < [ 
mailto:u...@flink.apache.org | u...@flink.apache.org ] > wrote: 

BQ_BEGIN

Congrats! 
Best regards, 
Jing 

On Mon, Mar 27, 2023 at 2:32 PM Leonard Xu < [ mailto:xbjt...@gmail.com | 
xbjt...@gmail.com ] > wrote: 

BQ_BEGIN

Congratulations! 

Best, 
Leonard 


BQ_BEGIN

On Mar 27, 2023, at 5:23 PM, Yu Li < [ mailto:car...@gmail.com | 
car...@gmail.com ] > wrote: 

Dear Flinkers, 




As you may have noticed, we are pleased to announce that Flink Table Store has 
joined the Apache Incubator as a separate project called Apache 
Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
streaming data lake platform for high-speed data ingestion, change data 
tracking and efficient real-time analytics, with the vision of supporting a 
larger ecosystem and establishing a vibrant and neutral open source community. 




We would like to thank everyone for their great support and efforts for the 
Flink Table Store project, and warmly welcome everyone to join the development 
and activities of the new project. Apache Flink will continue to be one of the 
first-class citizens supported by Paimon, and we believe that the Flink and 
Paimon communities will maintain close cooperation. 




亲爱的Flinkers， 




正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache 孵化器独立孵化 [1] [2] 
[3]。新项目的名字是 Apache 
Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
 




在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink 
将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。 




Best Regards, 
Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC) 

致礼， 
李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC） 




[1] [ https://paimon.apache.org/ | https://paimon.apache.org/ ] 
[2] [ https://github.com/apache/incubator-paimon | 
https://github.com/apache/incubator-paimon ] 
[3] [ https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal | 
https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal ] 





BQ_END


BQ_END


BQ_END

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Samrat Deb

congratulations 拾

On Mon, 27 Mar 2023 at 6:02 PM, Leonard Xu  wrote:

> Congratulations!
>
>
> Best,
> Leonard
>
> > On Mar 27, 2023, at 5:23 PM, Yu Li  wrote:
> >
> > Dear Flinkers,
> >
> > As you may have noticed, we are pleased to announce that Flink Table
> Store has joined the Apache Incubator as a separate project called Apache
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a
> streaming data lake platform for high-speed data ingestion, change data
> tracking and efficient real-time analytics, with the vision of supporting a
> larger ecosystem and establishing a vibrant and neutral open source
> community.
> >
> > We would like to thank everyone for their great support and efforts for
> the Flink Table Store project, and warmly welcome everyone to join the
> development and activities of the new project. Apache Flink will continue
> to be one of the first-class citizens supported by Paimon, and we believe
> that the Flink and Paimon communities will maintain close cooperation.
> >
> > 亲爱的Flinkers，
> >
> > 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache 孵化器独立孵化 [1] [2]
> [3]。新项目的名字是 Apache
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
> >
> > 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache
> Flink 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
> >
> > Best Regards,
> > Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
> >
> > 致礼，
> > 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
> >
> > [1] https://paimon.apache.org/ 
> > [2] https://github.com/apache/incubator-paimon <
> https://github.com/apache/incubator-paimon>
> > [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
> 
>

Re: [NOTICE] Website now has a staging environment

2023-03-27 Thread Jing Ge

Wow, that is good news! Thanks Chesnay for your effort.

I'd like to ask the same question as what Matthias asked. Speaking of the
workflow, should we first push changes to the asf-staging branch and check
it at  https://flink.staged.apache.org and then cp them to the asf-site?
Should there always be two separate commits, one for changes and a direct
follow up one for rebuilding the site?

Best regards,
Jing

On Mon, Mar 27, 2023 at 1:39 PM Matthias Pohl
 wrote:

> Thanks for sharing that information.
>
> Is there a specific workflow expected for pushing changes? Shall we always
> go through the staging environment first or is it more like a nice-to-have
> feature?
>
> I'm also wondering whether we should move the documentation from the
> Confluent page into the README file in apache/flink-web [1]. The wiki feels
> to be too far away from the code here. WDYT?
>
> Matthias
>
> [1] https://github.com/apache/flink-web#readme
>
> On Mon, Feb 6, 2023 at 5:09 PM Chesnay Schepler 
> wrote:
>
> > Hello,
> >
> > Just so more people are aware of it, I recently enabled a staging
> > environment for the Flink Website.
> >
> > You can push the rebuilt website to the asf-staging branch in flink-web,
> > and the changes will be visible at https://flink.staged.apache.org
> > shortly.
> >
> > This has been documented at
> > https://cwiki.apache.org/confluence/display/FLINK/Website.
> >
> >
> > Currently a Hugo-based version of the Website (FLINK-22922) can be seen
> > in the staging environment.
> >
> >
> > Cheers
> >
> >
>

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Matthias Pohl

Congratulations and good luck with pushing the project forward.

On Mon, Mar 27, 2023 at 2:35 PM Jing Ge via user 
wrote:

> Congrats!
>
> Best regards,
> Jing
>
> On Mon, Mar 27, 2023 at 2:32 PM Leonard Xu  wrote:
>
>> Congratulations!
>>
>>
>> Best,
>> Leonard
>>
>> On Mar 27, 2023, at 5:23 PM, Yu Li  wrote:
>>
>> Dear Flinkers,
>>
>>
>>
>> As you may have noticed, we are pleased to announce that Flink Table Store 
>> has joined the Apache Incubator as a separate project called Apache 
>> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
>> streaming data lake platform for high-speed data ingestion, change data 
>> tracking and efficient real-time analytics, with the vision of supporting a 
>> larger ecosystem and establishing a vibrant and neutral open source 
>> community.
>>
>>
>>
>> We would like to thank everyone for their great support and efforts for the 
>> Flink Table Store project, and warmly welcome everyone to join the 
>> development and activities of the new project. Apache Flink will continue to 
>> be one of the first-class citizens supported by Paimon, and we believe that 
>> the Flink and Paimon communities will maintain close cooperation.
>>
>>
>> 亲爱的Flinkers，
>>
>>
>> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
>> 孵化器独立孵化 [1] [2] [3]。新项目的名字是
>> Apache 
>> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
>>
>>
>> 在这里我们要感谢大家对 Flink Table Store
>> 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信
>> Flink 和 Paimon 社区将继续保持密切合作。
>>
>>
>> Best Regards,
>> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>>
>> 致礼，
>> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
>>
>> [1] https://paimon.apache.org/
>> [2] https://github.com/apache/incubator-paimon
>> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>>
>>
>>

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Konstantin Knauf

Congrats!

Am Mo., 27. März 2023 um 14:32 Uhr schrieb Leonard Xu :

> Congratulations!
>
>
> Best,
> Leonard
>
> On Mar 27, 2023, at 5:23 PM, Yu Li  wrote:
>
> Dear Flinkers,
>
>
>
> As you may have noticed, we are pleased to announce that Flink Table Store 
> has joined the Apache Incubator as a separate project called Apache 
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
> streaming data lake platform for high-speed data ingestion, change data 
> tracking and efficient real-time analytics, with the vision of supporting a 
> larger ecosystem and establishing a vibrant and neutral open source community.
>
>
>
> We would like to thank everyone for their great support and efforts for the 
> Flink Table Store project, and warmly welcome everyone to join the 
> development and activities of the new project. Apache Flink will continue to 
> be one of the first-class citizens supported by Paimon, and we believe that 
> the Flink and Paimon communities will maintain close cooperation.
>
>
> 亲爱的Flinkers，
>
>
> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
> 孵化器独立孵化 [1] [2] [3]。新项目的名字是
> Apache 
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
>
>
> 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink
> 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
>
>
> Best Regards,
> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>
> 致礼，
> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
>
> [1] https://paimon.apache.org/
> [2] https://github.com/apache/incubator-paimon
> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>
>
>

-- 
https://twitter.com/snntrable
https://github.com/knaufk

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Jing Ge

Congrats!

Best regards,
Jing

On Mon, Mar 27, 2023 at 2:32 PM Leonard Xu  wrote:

> Congratulations!
>
>
> Best,
> Leonard
>
> On Mar 27, 2023, at 5:23 PM, Yu Li  wrote:
>
> Dear Flinkers,
>
>
>
> As you may have noticed, we are pleased to announce that Flink Table Store 
> has joined the Apache Incubator as a separate project called Apache 
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
> streaming data lake platform for high-speed data ingestion, change data 
> tracking and efficient real-time analytics, with the vision of supporting a 
> larger ecosystem and establishing a vibrant and neutral open source community.
>
>
>
> We would like to thank everyone for their great support and efforts for the 
> Flink Table Store project, and warmly welcome everyone to join the 
> development and activities of the new project. Apache Flink will continue to 
> be one of the first-class citizens supported by Paimon, and we believe that 
> the Flink and Paimon communities will maintain close cooperation.
>
>
> 亲爱的Flinkers，
>
>
> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
> 孵化器独立孵化 [1] [2] [3]。新项目的名字是
> Apache 
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
>
>
> 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink
> 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
>
>
> Best Regards,
> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>
> 致礼，
> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
>
> [1] https://paimon.apache.org/
> [2] https://github.com/apache/incubator-paimon
> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>
>
>

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Leonard Xu

Congratulations!


Best,
Leonard

> On Mar 27, 2023, at 5:23 PM, Yu Li  wrote:
> 
> Dear Flinkers,
> 
> As you may have noticed, we are pleased to announce that Flink Table Store 
> has joined the Apache Incubator as a separate project called Apache 
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
> streaming data lake platform for high-speed data ingestion, change data 
> tracking and efficient real-time analytics, with the vision of supporting a 
> larger ecosystem and establishing a vibrant and neutral open source community.
> 
> We would like to thank everyone for their great support and efforts for the 
> Flink Table Store project, and warmly welcome everyone to join the 
> development and activities of the new project. Apache Flink will continue to 
> be one of the first-class citizens supported by Paimon, and we believe that 
> the Flink and Paimon communities will maintain close cooperation.
> 
> 亲爱的Flinkers，
> 
> 正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache 孵化器独立孵化 [1] [2] 
> [3]。新项目的名字是 Apache 
> Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。
> 
> 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink 
> 将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
> 
> Best Regards,
> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
> 
> 致礼，
> 李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）
> 
> [1] https://paimon.apache.org/ 
> [2] https://github.com/apache/incubator-paimon 
> 
> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal 
>

Re: [NOTICE] Website now has a staging environment

2023-03-27 Thread Matthias Pohl

Thanks for sharing that information.

Is there a specific workflow expected for pushing changes? Shall we always
go through the staging environment first or is it more like a nice-to-have
feature?

I'm also wondering whether we should move the documentation from the
Confluent page into the README file in apache/flink-web [1]. The wiki feels
to be too far away from the code here. WDYT?

Matthias

[1] https://github.com/apache/flink-web#readme

On Mon, Feb 6, 2023 at 5:09 PM Chesnay Schepler  wrote:

> Hello,
>
> Just so more people are aware of it, I recently enabled a staging
> environment for the Flink Website.
>
> You can push the rebuilt website to the asf-staging branch in flink-web,
> and the changes will be visible at https://flink.staged.apache.org
> shortly.
>
> This has been documented at
> https://cwiki.apache.org/confluence/display/FLINK/Website.
>
>
> Currently a Hugo-based version of the Website (FLINK-22922) can be seen
> in the staging environment.
>
>
> Cheers
>
>

[jira] [Created] (FLINK-31629) Trying to access closed classloader when submit query to restSqlGateway via SqlClient

2023-03-27 Thread Weijie Guo (Jira)

Weijie Guo created FLINK-31629:
--

 Summary: Trying to access closed classloader when submit query to 
restSqlGateway via SqlClient
 Key: FLINK-31629
 URL: https://issues.apache.org/jira/browse/FLINK-31629
 Project: Flink
  Issue Type: Bug
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: Subtask distribution in Flink

2023-03-27 Thread Weihua Hu

Hi,

1. Does this mean that  each  task slot will contain an entire pipeline in
> the job?

not exactly, each slot will run a subtask of each task. If the job is so
simple that
there is no keyby logic and we do not enable rebalance shuffle type, each
slot
could run all the pipeline. But if not we need to shuffle data to other
subtasks.
You can get some examples from [1].

2. Upon a TM pod failure and after K8s brings back the TM pod, would flink
> assign the same subtasks back to restarted TM  again? Or will they be
> distributed to different TaskManagers?

If there is no shuffle data in your job (described in 1), only tasks on
failure pods
 will be restarted, and they will be assigned to the new TM again.
But if not, all the related tasks will be restarted. When these tasks
re-scheduled,
there are some strategy to assign slots. They will try to assign the task
to previous
slot to reduce the recovery time, But there is no guarantee.
You can read [2] to get more information about failure recovery.

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/flink-architecture/#tasks-and-operator-chains
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/

Best,
Weihua

On Mon, Mar 27, 2023 at 3:22 PM santhosh venkat <
santhoshvenkat1...@gmail.com> wrote:

> Hi,
>
> I am trying to understand how subtask distribution works in Flink. Let's
> assume a setup of a Flink cluster with a fixed number of TaskManagers in a
> kubernetes cluster.
>
> Let's say I have a flink job with all the operators having the same
> parallelism and with the same Slot sharing group. The operator parallelism
> is computed as the number of task managers multiplied by number of task
> slots per TM.
>
> 1. Does this mean that  each  task slot will contain an entire pipeline in
> the job?
> 2. Upon a TM pod failure and after K8s brings back the TM pod, would flink
> assign the same subtasks back to restarted TM  again? Or will they be
> distributed to different TaskManagers?
>
> It would be great if someone can answer this question.
>
> Thanks.
>

[jira] [Created] (FLINK-31628) ArrayIndexOutOfBoundsException in watermark processing

2023-03-27 Thread Michael Helmling (Jira)

Michael Helmling created FLINK-31628:


 Summary: ArrayIndexOutOfBoundsException in watermark processing
 Key: FLINK-31628
 URL: https://issues.apache.org/jira/browse/FLINK-31628
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Task
Affects Versions: 1.17.0
 Environment: Kubernetes with Flink operator 1.4.0.
Reporter: Michael Helmling


After upgrading a job from Flink 1.16.1 to 1.17.0, my task managers throw the 
following exception:

 

 
{code:java}
java.lang.ArrayIndexOutOfBoundsException: Index -2147483648 out of bounds for 
length 5
at 
org.apache.flink.streaming.runtime.watermarkstatus.HeapPriorityQueue.removeInternal(HeapPriorityQueue.java:155)
at 
org.apache.flink.streaming.runtime.watermarkstatus.HeapPriorityQueue.remove(HeapPriorityQueue.java:100)
at 
org.apache.flink.streaming.runtime.watermarkstatus.StatusWatermarkValve$InputChannelStatus.removeFrom(StatusWatermarkValve.java:300)
at 
org.apache.flink.streaming.runtime.watermarkstatus.StatusWatermarkValve$InputChannelStatus.access$200(StatusWatermarkValve.java:266)
at 
org.apache.flink.streaming.runtime.watermarkstatus.StatusWatermarkValve.markWatermarkUnaligned(StatusWatermarkValve.java:222)
at 
org.apache.flink.streaming.runtime.watermarkstatus.StatusWatermarkValve.inputWatermarkStatus(StatusWatermarkValve.java:140)
at 
org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:153)
at 
org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110)
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at 
org.apache.flink.streaming.runtime.io.StreamMultipleInputProcessor.processInput(StreamMultipleInputProcessor.java:85)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:550)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:839)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:788)
at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952)
at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
at java.base/java.lang.Thread.run(Unknown Source){code}
I never saw this before. The job has multiple Kafka inputs, but doesn't use 
watermark alignment.

 

 

Initially reported [on 
Slack|https://apache-flink.slack.com/archives/C03G7LJTS2G/p1679908171461309], 
where a relation to FLINK-28853 was suspected.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Yu Li

Dear Flinkers,


As you may have noticed, we are pleased to announce that Flink Table
Store has joined the Apache Incubator as a separate project called
Apache Paimon(incubating) [1] [2] [3]. The new project still aims at
building a streaming data lake platform for high-speed data ingestion,
change data tracking and efficient real-time analytics, with the
vision of supporting a larger ecosystem and establishing a vibrant and
neutral open source community.


We would like to thank everyone for their great support and efforts
for the Flink Table Store project, and warmly welcome everyone to join
the development and activities of the new project. Apache Flink will
continue to be one of the first-class citizens supported by Paimon,
and we believe that the Flink and Paimon communities will maintain
close cooperation.


亲爱的Flinkers，


正如您可能已经注意到的，我们很高兴地宣布，Flink Table Store 已经正式加入 Apache
孵化器独立孵化 [1] [2] [3]。新项目的名字是
Apache 
Paimon(incubating)，仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外，新项目将支持更加丰富的生态，并建立一个充满活力和中立的开源社区。


在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入，并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink
将继续作为 Paimon 支持的主力计算引擎之一，我们也相信 Flink 和 Paimon 社区将继续保持密切合作。


Best Regards,

Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)


致礼，

李钰（谨代表 Apache Flink PMC 和 Apache Paimon PPMC）


[1] https://paimon.apache.org/

[2] https://github.com/apache/incubator-paimon

[3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal

[jira] [Created] (FLINK-31627) docker-build.sh build fails on Linux machines

2023-03-27 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-31627:
-

 Summary: docker-build.sh build fails on Linux machines
 Key: FLINK-31627
 URL: https://issues.apache.org/jira/browse/FLINK-31627
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Matthias Pohl


Building the Flink website on Linux fails due to how Docker is used as a Daemon 
running under root in Linux (see [this 
blog|https://jtreminio.com/blog/running-docker-containers-as-current-host-user/#ok-so-what-actually-works]
 for more details).

Building the website will fail when copying the artifacts because they are 
owned by the root user. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-31626) HsSubpartitionFileReaderImpl should recycle skipped read buffers.

2023-03-27 Thread Weijie Guo (Jira)

Weijie Guo created FLINK-31626:
--

 Summary: HsSubpartitionFileReaderImpl should recycle skipped read 
buffers.
 Key: FLINK-31626
 URL: https://issues.apache.org/jira/browse/FLINK-31626
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.17.1
Reporter: Weijie Guo
Assignee: Weijie Guo


In FLINK-30189, {{HsSubpartitionFileReaderImpl}} will skip the buffer has been 
consumed from memory to avoid double-consumption. But these buffers were not 
returned to the {{BatchShuffleReadBufferPool}}, resulting in read buffer leaks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-31625) Possbile OOM in KBinsDiscretizer

2023-03-27 Thread Fan Hong (Jira)

Fan Hong created FLINK-31625:


 Summary: Possbile OOM in KBinsDiscretizer
 Key: FLINK-31625
 URL: https://issues.apache.org/jira/browse/FLINK-31625
 Project: Flink
  Issue Type: Bug
  Components: Library / Machine Learning
Reporter: Fan Hong


In KBinsDiscretizer, the main computation `findBinEdgesWithXXXStrategy` is put 
into a single subtask. While data sampling is used to decrease memory usage, 
the memory overhead can still be prohibitive for large input vectors, 
potentially resulting in OOM errors.

A potential solution is to implement parallel computation, distributing the 
data evenly among all workers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-31624) [Connectors/Jdbc] Update flink to 1.17.0

2023-03-27 Thread Jira

João Boto created FLINK-31624:
-

 Summary: [Connectors/Jdbc] Update flink to 1.17.0
 Key: FLINK-31624
 URL: https://issues.apache.org/jira/browse/FLINK-31624
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Reporter: João Boto


Bump Flink version to 1.17.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-31623) Improvements on DataStreamUtils#sample

2023-03-27 Thread Fan Hong (Jira)

Fan Hong created FLINK-31623:


 Summary: Improvements on DataStreamUtils#sample
 Key: FLINK-31623
 URL: https://issues.apache.org/jira/browse/FLINK-31623
 Project: Flink
  Issue Type: Improvement
  Components: Library / Machine Learning
Reporter: Fan Hong


Current implementation employs two-level sampling method, which could encounter 
two issues:
 # 
One subtask has a memory footprint twice as large as the other subtasks, which 
could cause unexpected OOM in some situations.
 
 # When data instances are imbalanced distributed among partitions (subtasks), 
the probabilities of instances to be sampled are different in different 
partitions (subtasks).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Subtask distribution in Flink

2023-03-27 Thread santhosh venkat

Hi,

I am trying to understand how subtask distribution works in Flink. Let's
assume a setup of a Flink cluster with a fixed number of TaskManagers in a
kubernetes cluster.

Let's say I have a flink job with all the operators having the same
parallelism and with the same Slot sharing group. The operator parallelism
is computed as the number of task managers multiplied by number of task
slots per TM.

1. Does this mean that  each  task slot will contain an entire pipeline in
the job?
2. Upon a TM pod failure and after K8s brings back the TM pod, would flink
assign the same subtasks back to restarted TM  again? Or will they be
distributed to different TaskManagers?

It would be great if someone can answer this question.

Thanks.

Re: [External] [DISCUSS] FLIP-292: Support configuring state TTL at operator level for Table API & SQL programs

2023-03-27 Thread Yisha Zhou

Hi guys

Thanks for your enlightening opinions which have benefited me greatly. And I 
have some experiences about configuring something on graphical IDE to share 
with you. Hopefully, these experiences can give you some useful information.

At the FFA 2021, we proposed a solution for state migration on Flink SQL[1]. 
And the key point of it is allowing users to configure operator id for 
execution plan on graphical IDE.  The graphical interface is shown as in the 
following figure.



I believe if we want to improve usability, this kind of tools is necessary.  
However the main problem is that we can't treat different versions of a SQL job 
(which consist of several SQL statements and an execution plan) as independent 
jobs. 
That means an execution plan containing user-configured TTL should be kept all 
the way and migrated after users changing their SQL statements. From the user's 
point of view, although they can understand the compiled plan, 
they can't accept the cost of manually migrating these configuration between 
versions of their SQL jobs.

In our solution, we apply configs from the old execution plan to the new one by 
operator name mapping, which is a rough version that requires enhancement.  And 
till now this feature(editing execution plan) has only been used 
in only about 100 SQL jobs (We have 10k+ SQL jobs in our company).  A truth 
that can not be ignored is that users usually tend to give up editing TTL(or 
operator ID in our case) instead of migrating this configuration between their
versions of one given job.

Therefore, before we introduce compiled plan editing into the pipeline of SQL 
jobs development, maybe we should have a discussion of how to migrate these 
plans for users. Looking forward to your opinions.

Best regards,
Yisha

[1]https://my.oschina.net/u/5941630/blog/7112726

> 2023年3月25日 22:53，Jane Chan  写道：
> 
> Hi Leonard, Jing and Shengkai,
> 
> Thanks so much for your insightful comments. Here are my thoughts
> 
> @Shengkai <>
> > 1. How the Gateway users use this feature? As far as I know, the EXEUCTE 
> > PLAN only supports local file right now. Is it possible to extend this 
> > syntax to allow for reading plan files from remote file systems?
> 
> Nice catch! Currently, the "COMPILE PLAN" and "EXECUTE PLAN" statements only 
> support a local file path without the scheme (see 
> TableEnvironmentImpl.java#L773 
> ).
>  It's reasonable to extend the support to Flink's FileSystem. Besides, the 
> JSON plan should also be added to the resource cleaning mechanism for the 
> Gateway mode, just like we do with the "ADD JAR" operation, cleaning it up 
> when the session ends. I will take your suggestion and make changes to FLIP.
> 
> > 2. I would like to inquire if there are any limitations on this feature? I 
> > have encountered several instances where the data did not expire in the 
> > upstream operator, but it expired in the downstream operator, resulting in 
> > abnormal calculation results or direct exceptions thrown by the operator 
> > (e.g. rank operator). Can we limit that the expiration time of downstream 
> > operator data should be greater than or equal to the expiration time of 
> > upstream operator data?
> 
> This is an excellent point. In fact, the current state TTL is based on the 
> initialization time of each operator, which is inherently unaligned. The 
> probability of such unalignment is magnified now that fine-grained 
> operator-level TTL is supported. While on the other hand, this FLIP is not 
> the root cause of this issue. To systematically solve the problem of TTL 
> unalignment between operators, I understand that we need a larger FLIP to 
> accomplish this. And I'll mention this point in the FLIP doc. WDYT?
> 
> Back to your suggestions, in most scenarios, the TTL between multiple state 
> operators should be non-monotonically decreasing, but there may be some 
> exceptions, such as the SinkUpsertMaterializer introduced to solve the 
> changelog disorder problem. It may not be appropriate if we block it at the 
> implementation level. But it does happen that the users misconfigure the TTL, 
> so in this case, my idea is that, since FLIP-280 
> 
>  introduces an experimental feature "EXPLAIN PLAN_ADVICE", and FLIP-190 
> 
>  also introduces a new syntax "EXPLAIN PLAN FOR '/foo/bar/sql.json'", what if 
> we add a new plan analyzer, which will analyze the compiled plan to perform 
> detection. The analyzer gives a warning attached to the optimized physical 
> plan when the TTL of the predecessor is larger than the TTL of the

Re: [DISCUSS] Planning Flink 1.18

2023-03-27 Thread Matthias Pohl

Thanks Konstantin for driving this.
I have no objections in regards to the feature freeze happening in July.

Thanks everyone for volunteering to drive 1.18. +1 on the people who
offered to help

PS: I shared learnings from the 1.17 release in the other thread [1] to not
spam this thread with 1.17-related stuff.

[1] https://lists.apache.org/thread/1mdbpdo42df4642wtmf91b26p8v2qtbf


On Sat, Mar 25, 2023 at 4:00 AM Yun Tang  wrote:

> Thanks for kick off the discussion.
>
> I'm not sure whether I am a bit late here. I had experience on releasing
> minor version of Flink-1.13.2 and also be interested in helping to release
> the next major Flink-1.18 version.
>
>
> Best
> Yun Tang
> 
> From: Sergey Nuyanzin 
> Sent: Saturday, March 25, 2023 9:46
> To: dev@flink.apache.org 
> Cc: Jing Ge ; Konstantin Knauf 
> Subject: Re: [DISCUSS] Planning Flink 1.18
>
> Thanks for starting the discussion about Flink 1.18.
> I would be interested in helping out around the release as well.
>
> On Sat, Mar 25, 2023 at 2:20 AM Qingsheng Ren  wrote:
>
> > Hi Konstantin,
> >
> > Thanks for kicking off 1.18! As mentioned by Jing we’d like to
> participate
> > in the new cycle, and hope my experience in 1.17 could help.
> >
> > About the feature freezing date, I’m also thinking to set it around mid
> > July. 1.18 will be a larger release with more features to work with
> > compared to 1.17. In 1.17 we have ~3 months for development so I think
> 3.5
> > months for 1.18 would be reasonable.
> >
> > Best,
> > Qingsheng
> >
> > On Sat, Mar 25, 2023 at 07:37 Jing Ge 
> wrote:
> >
> > > Hi Konstantin,
> > >
> > > Qingsheng and I would love to participate and join you as release
> > managers
> > > for 1.18.
> > >
> > > Speaking of the timeline, since we will have a big feature release with
> > > 1.18, a feature freeze in the middle of July would be better. It is
> also
> > > not far from your proposal.
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Fri, Mar 24, 2023 at 7:35 PM Konstantin Knauf 
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > "Nach dem Spiel ist vor dem Spiel" [1] aka "There is always a next
> > > > release". With the announcement of Flink 1.17, we have entered the
> next
> > > > release cycle and with it comes the usual routine:
> > > >
> > > > - Release Managers
> > > >
> > > > I'd like to volunteer as one of the release managers this time. It
> has
> > > been
> > > > good practice to have a team of release managers from different
> > > > backgrounds, so please raise you hand if you'd like to volunteer and
> > get
> > > > involved.
> > > >
> > > > - Timeline
> > > >
> > > > Flink 1.17 was released yesterday. With a target release cycle of 4
> > > months,
> > > > we should aim for a feature freeze around the end of June.
> > > >
> > > > - Collecting Features
> > > >
> > > > While we are assembling the release management team over the next
> days
> > > and
> > > > without anticipating the outcome of any process changes the release
> > > > management team might want to try out, I think, we can already start
> > > > collecting features as usual on
> > > > https://cwiki.apache.org/confluence/display/FLINK/1.18+Release.
> > > >
> > > > -  Feedback
> > > >
> > > > As Matthias already mentioned in the release announcement: if there
> is
> > > any
> > > > feedback on the release process or wishes for the future, please
> share
> > it
> > > > with any of the Flink 1.17 release managers (Matthias, Martijn,
> Lenard
> > > and
> > > > Qingsheng) or simply in this thread here.
> > > >
> > > > Cheers,
> > > >
> > > > Konstantin
> > > >
> > > > --
> > > > https://twitter.com/snntrable
> > > > https://github.com/knaufk
> > > >
> > >
> >
>
>
> --
> Best regards,
> Sergey
>

Re: [ANNOUNCE] Apache Flink 1.17.0 released

2023-03-27 Thread Matthias Pohl

Here are a few things I noticed from the 1.17 release retrospectively which
I want to share (other release managers might have a different view or
might disagree):

- Google Meet might not be the best choice for the release sync. We need to
be able to invite attendees even if the creator of the meeting isn't
available (maybe try Zoom or jitsi instead?)

- Release sync every 2 weeks and a switch to weekly after feature freeze
felt reasonable

- Slack worked well as a collaboration tool to document the monitoring
tasks (#builds [1], #flink-dev-benchmarks [2]) in a team with multiple
release managers

- The Slack Azure Pipeline bot seems to be buggy. It swallows some build
failures. It's not a severe issue, though. We created #builds-debug [3] to
monitor whether it's happening consistently. The issue is covered in
FLINK-30733 [4].

- Having dedicated people for monitoring the build failures helps getting a
more consistent picture of test instabilities

- We experienced occasional issues in the manual steps of the release
creation in the past (e.g. japicmp config was not properly pushed).
Creating Jira issues for the release helped to make the release creation
more transparent and made the steps more reviewable [5][6][7][8].
Additionally, it helped to distribute subtasks to different people with
Jira being the tool for documentation and synchronization. That's
especially helpful when there is more than one person in charge of creating
the release.

- We had backports/merges without PRs happening by committers occasionally
during the 1.17 release which broke master/release branches (probably,
changes were done locally before merging which were not part of the PR to
have a faster backport experience). It might make sense to remind everyone
that this should be avoided. Not sure whether we want/can restrict that.

- We observed a good response on fixing test instabilities by the end of
the release cycle but had some long running issues earlier in the cycle
which caused extra efforts on the release managers due to reoccurring test
failures.

- Release testing picked up “slowly”: Initially, we planned 2 weeks for
release testing. But there was not really any progress (tickets being
created and worked on) in the first week. In the end, we had to extend the
phase by another week resulting in 3 instead of 2 weeks of release testing.
I guess we could encourage the community to create release testing tasks
earlier and label them properly to be able to monitor the effort. That
would even enable us to do release testing for a certain feature after the
feature is done and not necessarily only at the end of the release cycle.

- Manual test data generation is tedious (FLINK-31593 [9]). But this should
be fixed in 1.18 with FLINK-27518 [10] being almost done.

- We started creating documentation for release management [11]. The goal
is to collect what tasks are there to help support a Flink release to
encourage newcomers to pick up the task.

I'm going to add these to the Flink 1.17 release documentation [12] as
feedback as well.

Best,
Matthias

[1] https://apache-flink.slack.com/archives/C03MR1HQHK2
[2] https://apache-flink.slack.com/archives/C0471S0DFJ9
[3] https://apache-flink.slack.com/archives/C04LZM3EE9E
[4] https://issues.apache.org/jira/browse/FLINK-30733
[5] https://issues.apache.org/jira/browse/FLINK-31146
[6] https://issues.apache.org/jira/browse/FLINK-31154
[7] https://issues.apache.org/jira/browse/FLINK-31562
[8] https://issues.apache.org/jira/browse/FLINK-31567
[9] https://issues.apache.org/jira/browse/FLINK-31593
[10] https://issues.apache.org/jira/browse/FLINK-27518
[11]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Release+Management
[12] https://cwiki.apache.org/confluence/display/FLINK/1.17+Release

On Sat, Mar 25, 2023 at 8:29 AM Hang Ruan  wrote:

> Thanks for the great work ! Congrats all!
>
> Best,
> Hang
>
> Panagiotis Garefalakis  于2023年3月25日周六 03:22写道：
>
>> Congrats all! Well done!
>>
>> Cheers,
>> Panagiotis
>>
>> On Fri, Mar 24, 2023 at 2:46 AM Qingsheng Ren  wrote:
>>
>> > I'd like to say thank you to all contributors of Flink 1.17. Your
>> support
>> > and great work together make this giant step forward!
>> >
>> > Also like Matthias mentioned, feel free to leave us any suggestions and
>> > let's improve the releasing procedure together.
>> >
>> > Cheers,
>> > Qingsheng
>> >
>> > On Fri, Mar 24, 2023 at 5:00 PM Etienne Chauchot 
>> > wrote:
>> >
>> >> Congrats to all the people involved!
>> >>
>> >> Best
>> >>
>> >> Etienne
>> >>
>> >> Le 23/03/2023 à 10:19, Leonard Xu a écrit :
>> >> > The Apache Flink community is very happy to announce the release of
>> >> Apache Flink 1.17.0, which is the first release for the Apache Flink
>> 1.17
>> >> series.
>> >> >
>> >> > Apache Flink® is an open-source unified stream and batch data
>> >> processing framework for distributed, high-performing,
>> always-available,
>> >> and accurate data applications.
>> >> >
>> >> > The release is available

Re: [VOTE] FLIP-284: Making AsyncSinkWriter Flush triggers adjustable

2023-03-27 Thread Danny Cranmer

Thanks for driving this Ahmed. This change will benefit future and current
sink implementations by giving the developer more flexibility. A good
addition following configurable rate limiting strategies [1].

+1 (binding)

Thanks,
Danny

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-242%3A+Introduce+configurable+RateLimitingStrategy+for+Async+Sink

On Sun, Mar 26, 2023 at 2:27 PM Ahmed Hamdy  wrote:

> Dear Flink developers,
>
> I want to move discussion on this FLIP[1] to voting. Discussion thread[2]
> has been inactive for more than 2 months. Please feel free to add your
> feedback below.
>
> The vote will be open for at least 72
> hours unless there is an objection or not enough votes.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-284+%3A+Making+AsyncSinkWriter+Flush+triggers+adjustable
> [2] https://lists.apache.org/thread/hmh2xoy34jm4r81txm3x1wv27d6vnpkw
>

Re: [DISCUSS] FLIP-304: Pluggable failure handling for Apache Flink

2023-03-27 Thread Gen Luo

Thanks for the summary!

Also +1 to support custom restart strategies in a different FLIP,
as long as we can make sure that the plugin interface won't be
changed when the restart strategy interface is introduced.

To achieve this, maybe we should think well how the handler
would cooperate with the restart strategy, like would it executes b
efore the strategy (e.g. some strategy may use the tag), or after
it (e.g. some metric reporting handler may use the handling result).
Though we can implement in one way, and extend if the other is
really necessary by someone.

Besides, instead of using either of the names, shall we just make
them two subclasses named FailureEnricher and FailureListener?
The former executes synchronously and can modify the context,
while the latter executes asynchronously and has a read-only view
of context. In this way we can make sure a handler behaves in
the expected way.


On Thu, Mar 23, 2023 at 5:19 PM Zhu Zhu  wrote:

> +1 to support custom restart strategies in a different FLIP.
>
> It's fine to have a different plugin for custom restart strategy.
> If so, since we do not treat the FLIP-304 plugin as a common failure
> handler, but instead mainly targets to add labels to errors, I would
> +1 for the name `FailureEnricher`.
>
> Thanks,
> Zhu
>
> David Morávek  于2023年3月23日周四 15:51写道：
> >
> > >
> > > One additional remark on introducing it as an async operation: We would
> > > need a new configuration parameter to define the timeout for such a
> > > listener call, wouldn't we?
> > >
> >
> > This could be left up to the implementor to handle.
> >
> > What about adding an extra method getNamespace() to the Listener
> interface
> > > which returns an Optional.
> > >
> >
> > I'd avoid mixing an additional concept into this. We can simply have a
> new
> > method that returns a set of keys the listener can output. We can
> validate
> > this at the JM startup time and fail fast (since it's a configuration
> > error) if there is an overlap. If the listener outputs the key that is
> not
> > allowed to, I wouldn't be afraid to call into a fatal error handler since
> > it's an invalid implementation.
> >
> > Best,
> > D.
> >
> > On Thu, Mar 23, 2023 at 8:34 AM Matthias Pohl
> >  wrote:
> >
> > > Sounds good. Two points I want to add:
> > >
> > >- Listener execution should be independent — however we need a way
> to
> > > > enforce a Label key/key-prefix is only assigned to a single Listener,
> > > > thinking of a validation step both at Listener init and runtime
> stages
> > > >
> > > What about adding an extra method getNamespace() to the Listener
> interface
> > > which returns an Optional. Therefore, the implementation/the
> user
> > > can decide depending on the use case whether it's necessary to have
> > > separate namespaces for the key/value pairs or not. On the Flink side,
> we
> > > would just merge the different maps considering their namespaces.
> > >
> > > A flaw of this approach is that if a user decides to use the same
> namespace
> > > for multiple listeners, how is an error in one of the listeners
> represented
> > > in the outcome? We would have to overwrite either the successful
> listener's
> > > result or the failed ones. I wanted to share it, anyway.
> > >
> > > One additional remark on introducing it as an async operation: We would
> > > need a new configuration parameter to define the timeout for such a
> > > listener call, wouldn't we?
> > >
> > > Matthias
> > >
> > > On Wed, Mar 22, 2023 at 4:56 PM Panagiotis Garefalakis <
> pga...@apache.org>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > >
> > > > Thanks for the valuable comments!
> > > > Excited to see this is an area of interest for the community!
> > > >
> > > > Summarizing some of the main points raised along with my thoughts:
> > > >
> > > >- Labels (Key/Value) pairs are more expressive than Tags
> (Strings) so
> > > >using the former is a good idea — I am also debating if we want to
> > > > return
> > > >multiple KV pairs per Listener (one could argue that we could
> split
> > > the
> > > >logic in multiple Listeners to support that)
> > > >- An immutable context along with data returned using the
> interface
> > > >method implementations is a better approach than a mutable
> Collection
> > > >- Listener execution should be independent — however we need a
> way to
> > > >enforce a Label key/key-prefix is only assigned to a single
> Listener,
> > > >thinking of a validation step both at Listener init and runtime
> stages
> > > >- We want to perform async Listener operations as sync could
> block the
> > > >main thread — exposing an ioExecutor pool through the context
> could be
> > > > an
> > > >elegant solution here
> > > >- Make sure Listener errors are not failing jobs — make sure to
> log
> > > and
> > > >keep the job alive
> > > >- We need better naming / public interface separation/description
> > > >
> > > > -  Even though

Re: [DISCUSS] FLIP-292: Support configuring state TTL at operator level for Table API & SQL programs

2023-03-27 Thread Jane Chan

Hi Shuo,

Thanks for your comments.

> The title of this FLIP is "For Table API & SQL", but this proposal
actually only covers SQL jobs via compiled json plan

Please correct me if I'm wrong, but I'm thinking that the scope of this
FLIP should be consistent with the FLIP-190. Here is an example to
illustrate the Table API usage.

tableEnv.createTemporaryTable("SourceTable",

TableDescriptor

.forConnector("datagen")
   .schema(

Schema.newBuilder()

.column("f0", DataTypes.STRING()).build())
  .option(DataGenConnectorOptions.ROWS_PER_SECOND, 100L)
  .build());
Table t = tableEnv.sqlQuery("select * from SourceTable");
tableEnv.createTemporaryTable("SinkTable",

TableDescriptor

.forConnector("print")

   .schema(Schema.newBuilder()

.column("f0", DataTypes.STRING()).build())
   .build());
t.insertInto("SinkTable")
.compilePlan()
.printJsonString(); // or writeToFile

As a result, I think the current title, "Support configuring state TTL at
operator level for Table API & SQL programs", is self-contained.
1. The proposed approach is applicable to both SQL and Table API programs
2. The configuration granularity is at the operator level

And I'm not quite sure if the FLIP title has a convention that requires the
proposed approach to be specified because I see that FLIP-190 doesn't do
that either.

Best,
Jane


On Mon, Mar 27, 2023 at 1:29 PM Shuo Cheng  wrote:

> Hi, Jane
>
> Thanks for your detailed explanation.
>
> > Maybe it deserves another FLIP to discuss whether we need a
> multiple-level state TTL configuration mechanism and how to properly
> implement it.
>
> +1, and this is also what I want to emphasize. The title of this FLIP is
> "For Table API & SQL", but this proposal actually only covers SQL jobs via
> compiled json plan, SQL users who don't want or cannot use the compiled
> json plan could not use the feature. If the FLIP title can state the scope
> more precisely , e.g., "Enhance EXECUTE PLAN to support operator-level
> state TTL configuration", it would be much more rigorous and
> self-consistent, and many concerns in the mailing list  may not exist.
> WDYT?
>
> Sincerely,
> Shuo
>
> On Fri, Mar 24, 2023 at 11:58 AM Jane Chan  wrote:
>
> > Hi Shammon and Shuo,
> >
> > Thanks for your valuable comments!
> >
> > Some thoughts:
> >
> > @Shuo
> > > I think it's more properly to say that hint does not affect the
> > equivalenceof execution plans (hash agg vs sort agg), not the equivalence
> > of execution
> > results, e.g., users can set 'scan.startup.mode' for kafka connector by
> > dynamic table option, which
> > also "intervene in the calculation of data results".
> >
> > IMO, the statement that "hint should not interfere with the calculation
> > results", means it should not interfere with internal computation. On the
> > other hand, 'scan.startup.mode' interferes with the ingestion of the
> data.
> > I think these two concepts are different, but of course, this is just my
> > opinion and welcome other views.
> >
> > > I think the final shape of state ttl configuring may like the that,
> > userscan define operator state ttl using SQL HINT (assumption...), but it
> > may
> > affects more than one stateful operators inside the same query block,
> then
> > users can further configure a specific one by modifying the compiled json
> > plan...
> >
> > Setting aside the issue of semantics, setting TTL from a higher level
> seems
> > to be attractive. This means that users only need to configure
> > 'table.exec.state.ttl' through the existing hint syntax to achieve the
> > effect. Everything is a familiar formula. But is it really the case?
> Hints
> > apply to a very broad range. Let me give an example.
> >
> > Suppose a user wants to set different TTLs for the two streams in a
> stream
> > join query. Where should the hints be written?
> >
> > -- the original query before configuring state TTL
> > create temporary view view1 as select  from my_table_1;
> > create temporary view view2 as select  from my_table_2;
> > create temporary view joined_view as
> > select view1.*, view2.* from my_view_1 a join my_view_2 b on a.join_key =
> > b.join_key;
> >
> > Option 1: declaring hints at the very beginning of the table scan
> >
> > -- should he or she write hints when declaring the first temporary view?
> > create temporary view view1 as select  from my_table_1
> > /*+(OPTIONS('table.exec.state.ttl'
> > = 'foo'))*/;
> > create temporary view view2 as select  from my_table_2
> > /*+(OPTIONS('table.exec.state.ttl'
> > = 'bar'))*/;
> > create temporary view joined_view as
> > select view1.*, view2.* from my_view_1 a join my_view_2 b on a.join_key =
> > b.join_key;
> >
> > Option 2: declaring hints when performing the join
> >
> > -- or should he or she write hints when declaring the join temporary
> view?
> > create temporary view view1 as select  from my_table_1;
> > create temporary view view2 as select  from my_table_2;
> > create

51 matches

Mail list logo