Re: [VOTE] FLIP-387: Support named parameters for functions and call procedures

2024-01-04 Thread Timo Walther
Thanks, for starting the VOTE thread and thanks for considering my 
feedback. One last comment before I'm also happy to give my +1 to this:


Why is ArgumentHint's default isOptinal=true? Shouldn't it be false by 
default? Many function implementers will forget to set this to false and 
suddenly get NULLs passed to their functions. Marking an argument as 
optional should be an explicit decision of an implementer.


Regards,
Timo


On 05.01.24 05:06, Lincoln Lee wrote:

+1 (binding)

Best,
Lincoln Lee


Benchao Li  于2024年1月5日周五 11:46写道:


+1 (binding)

Feng Jin  于2024年1月5日周五 10:49写道:


Hi everyone

Thanks for all the feedback about the FLIP-387: Support named parameters
for functions and call procedures [1] [2] .

I'd like to start a vote for it. The vote will be open for at least 72
hours(excluding weekends,until Jan 10, 12:00AM GMT) unless there is an
objection or an insufficient number of votes.



[1]


https://cwiki.apache.org/confluence/display/FLINK/FLIP-387%3A+Support+named+parameters+for+functions+and+call+procedures

[2] https://lists.apache.org/thread/bto7mpjvcx7d7k86owb00dwrm65jx8cn


Best,
Feng Jin




--

Best,
Benchao Li







Re: FW: [ANNOUNCE] New Apache Flink Committer - Alexander Fedulov

2024-01-04 Thread Sergey Nuyanzin
Congratulations, Alex!

On Fri, Jan 5, 2024, 05:12 Lincoln Lee  wrote:

> Congratulations, Alex!
>
> Best,
> Lincoln Lee
>
>
> Alexander Fedulov  于2024年1月4日周四 19:08写道:
>
> > Thanks, everyone! It is great to be part of such an active and
> > collaborative community!
> >
> > Best,
> > Alex
> >
> > On Thu, 4 Jan 2024 at 10:10, Etienne Chauchot 
> > wrote:
> >
> > > Congrats! Welcome onboard.
> > >
> > > Best
> > >
> > > Etienne
> > >
> > > Le 04/01/2024 à 03:14, Jane Chan a écrit :
> > > > Congratulations, Alex!
> > > >
> > > > Best,
> > > > Jane
> > > >
> > > > On Thu, Jan 4, 2024 at 10:03 AM Junrui Lee
> > wrote:
> > > >
> > > >> Congratulations, Alex!
> > > >>
> > > >> Best,
> > > >> Junrui
> > > >>
> > > >> weijie guo  于2024年1月4日周四 09:57写道:
> > > >>
> > > >>> Congratulations, Alex!
> > > >>>
> > > >>> Best regards,
> > > >>>
> > > >>> Weijie
> > > >>>
> > > >>>
> > > >>> Steven Wu  于2024年1月4日周四 02:07写道:
> > > >>>
> > >  Congra, Alex! Well deserved!
> > > 
> > >  On Wed, Jan 3, 2024 at 2:31 AM David Radley<
> david_rad...@uk.ibm.com
> > >
> > >  wrote:
> > > 
> > > > Sorry for my typo.
> > > >
> > > > Many congratulations Alex!
> > > >
> > > > From: David Radley
> > > > Date: Wednesday, 3 January 2024 at 10:23
> > > > To: David Anderson
> > > > Cc:dev@flink.apache.org  
> > > > Subject: Re: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer -
> > > >>> Alexander
> > > > Fedulov
> > > > Many Congratulations David .
> > > >
> > > > From: Maximilian Michels
> > > > Date: Tuesday, 2 January 2024 at 12:16
> > > > To: dev
> > > > Cc: Alexander Fedulov
> > > > Subject: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer -
> > Alexander
> > > > Fedulov
> > > > Happy New Year everyone,
> > > >
> > > > I'd like to start the year off by announcing Alexander Fedulov
> as a
> > > > new Flink committer.
> > > >
> > > > Alex has been active in the Flink community since 2019. He has
> > > > contributed more than 100 commits to Flink, its Kubernetes
> > operator,
> > > > and various connectors [1][2].
> > > >
> > > > Especially noteworthy are his contributions on deprecating and
> > > > migrating the old Source API functions and test harnesses, the
> > > > enhancement to flame graphs, the dynamic rescale time computation
> > in
> > > > Flink Autoscaling, as well as all the small enhancements Alex has
> > > > contributed which make a huge difference.
> > > >
> > > > Beyond code contributions, Alex has been an active community
> member
> > > > with his activity on the mailing lists [3][4], as well as various
> > > > talks and blog posts about Apache Flink [5][6].
> > > >
> > > > Congratulations Alex! The Flink community is proud to have you.
> > > >
> > > > Best,
> > > > The Flink PMC
> > > >
> > > > [1]
> > > >
> > > >>>
> > >
> https://github.com/search?type=commits=author%3Aafedulov+org%3Aapache
> > > > [2]
> > > >
> > > >>
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-28229?jql=status%20in%20(Resolved%2C%20Closed)%20AND%20assignee%20in%20(afedulov)%20ORDER%20BY%20resolved%20DESC%2C%20created%20DESC
> > > > [3]
> > > >>>
> https://lists.apache.org/list?dev@flink.apache.org:lte=100M:Fedulov
> > > > [4]
> > > >>>
> https://lists.apache.org/list?u...@flink.apache.org:lte=100M:Fedulov
> > > > [5]
> > > >
> > > >>
> > >
> >
> https://flink.apache.org/2020/01/15/advanced-flink-application-patterns-vol.1-case-study-of-a-fraud-detection-system/
> > > > [6]
> > > >
> > > >>
> > >
> >
> https://www.ververica.com/blog/presenting-our-streaming-concepts-introduction-to-flink-video-series
> > > > Unless otherwise stated above:
> > > >
> > > > IBM United Kingdom Limited
> > > > Registered in England and Wales with number 741598
> > > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants.
> PO6
> > > >> 3AU
> >
>


Re: [DISCUSS] FLIP-329: Add operator attribute to specify support for object-reuse

2024-01-04 Thread Dong Lin
Hi Lu,

I am not actively working on Flink and this JIRA recently. If Xuannan does
not plan to work on this anytime soon, I personally think it will be great
if you can help work on this FLIP. Maybe we can start the voting thread if
there is no further comment on this FLIP.

Xuannan, what do you think?

Thanks,
Dong


On Fri, Jan 5, 2024 at 2:03 AM Lu Niu  wrote:

> Hi,
>
> Is this still under active development? I notice
> https://issues.apache.org/jira/browse/FLINK-32476 is labeled as
> deprioritized. If this is the case, would it be acceptable for us to take
> on the task?
>
> Best
> Lu
>
>
>
> On Thu, Oct 19, 2023 at 4:26 PM Ken Krugler 
> wrote:
>
>> Hi Dong,
>>
>> Sorry for not seeing this initially. I did have one question about the
>> description of the issue in the FLIP:
>>
>> > However, in cases where the upstream and downstream operators do not
>> store or access references to the input or output records, this deep-copy
>> overhead becomes unnecessary
>>
>> I was interested in getting clarification as to what you meant by “or
>> access references…”, to see if it covered this situation:
>>
>> StreamX —forward--> operator1
>> StreamX —forward--> operator2
>>
>> If operator1 modifies the record, and object re-use is enabled, then
>> operator2 will see the modified version, right?
>>
>> Thanks,
>>
>> — Ken
>>
>> > On Jul 2, 2023, at 7:24 PM, Xuannan Su  wrote:
>> >
>> > Hi all,
>> >
>> > Dong(cc'ed) and I are opening this thread to discuss our proposal to
>> > add operator attribute to allow operator to specify support for
>> > object-reuse [1].
>> >
>> > Currently, the default configuration for pipeline.object-reuse is set
>> > to false to avoid data corruption, which can result in suboptimal
>> > performance. We propose adding APIs that operators can utilize to
>> > inform the Flink runtime whether it is safe to reuse the emitted
>> > records. This enhancement would enable Flink to maximize its
>> > performance using the default configuration.
>> >
>> > Please refer to the FLIP document for more details about the proposed
>> > design and implementation. We welcome any feedback and opinions on
>> > this proposal.
>> >
>> > Best regards,
>> >
>> > Dong and Xuannan
>> >
>> > [1]
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255073749
>>
>> --
>> Ken Krugler
>> http://www.scaleunlimited.com
>> Custom big data solutions
>> Flink & Pinot
>>
>>
>>
>>


Re: [DISCUSS] FLIP-329: Add operator attribute to specify support for object-reuse

2024-01-04 Thread Dong Lin
Hi Ken,

Sorry for the late reply. I didn't notice this email from you until now.

In this scenario you described above, I don't think operator2 will see the
result modified by operato1. Note that object re-use applies only to the
transmission of data between operators in the same operator chain. But
Flink won't put StreamX, operator1 and operator2 in the same operator chain
when both operator1 and operator2 reads the same output from StreamX.

Would this answer your question?

Thanks,
Dong



On Fri, Oct 20, 2023 at 7:26 AM Ken Krugler 
wrote:

> Hi Dong,
>
> Sorry for not seeing this initially. I did have one question about the
> description of the issue in the FLIP:
>
> However, in cases where the upstream and downstream operators do not store
> or access references to the input or output records, this deep-copy
> overhead becomes unnecessary
>
>
> I was interested in getting clarification as to what you meant by “or
> access references…”, to see if it covered this situation:
>
> StreamX —forward--> operator1
> StreamX —forward--> operator2
>
> If operator1 modifies the record, and object re-use is enabled, then
> operator2 will see the modified version, right?
>
> Thanks,
>
> — Ken
>
> On Jul 2, 2023, at 7:24 PM, Xuannan Su  wrote:
>
> Hi all,
>
> Dong(cc'ed) and I are opening this thread to discuss our proposal to
> add operator attribute to allow operator to specify support for
> object-reuse [1].
>
> Currently, the default configuration for pipeline.object-reuse is set
> to false to avoid data corruption, which can result in suboptimal
> performance. We propose adding APIs that operators can utilize to
> inform the Flink runtime whether it is safe to reuse the emitted
> records. This enhancement would enable Flink to maximize its
> performance using the default configuration.
>
> Please refer to the FLIP document for more details about the proposed
> design and implementation. We welcome any feedback and opinions on
> this proposal.
>
> Best regards,
>
> Dong and Xuannan
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255073749
>
>
> --
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink & Pinot
>
>
>
>


Re: FW: [ANNOUNCE] New Apache Flink Committer - Alexander Fedulov

2024-01-04 Thread Lincoln Lee
Congratulations, Alex!

Best,
Lincoln Lee


Alexander Fedulov  于2024年1月4日周四 19:08写道:

> Thanks, everyone! It is great to be part of such an active and
> collaborative community!
>
> Best,
> Alex
>
> On Thu, 4 Jan 2024 at 10:10, Etienne Chauchot 
> wrote:
>
> > Congrats! Welcome onboard.
> >
> > Best
> >
> > Etienne
> >
> > Le 04/01/2024 à 03:14, Jane Chan a écrit :
> > > Congratulations, Alex!
> > >
> > > Best,
> > > Jane
> > >
> > > On Thu, Jan 4, 2024 at 10:03 AM Junrui Lee
> wrote:
> > >
> > >> Congratulations, Alex!
> > >>
> > >> Best,
> > >> Junrui
> > >>
> > >> weijie guo  于2024年1月4日周四 09:57写道:
> > >>
> > >>> Congratulations, Alex!
> > >>>
> > >>> Best regards,
> > >>>
> > >>> Weijie
> > >>>
> > >>>
> > >>> Steven Wu  于2024年1月4日周四 02:07写道:
> > >>>
> >  Congra, Alex! Well deserved!
> > 
> >  On Wed, Jan 3, 2024 at 2:31 AM David Radley >
> >  wrote:
> > 
> > > Sorry for my typo.
> > >
> > > Many congratulations Alex!
> > >
> > > From: David Radley
> > > Date: Wednesday, 3 January 2024 at 10:23
> > > To: David Anderson
> > > Cc:dev@flink.apache.org  
> > > Subject: Re: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer -
> > >>> Alexander
> > > Fedulov
> > > Many Congratulations David .
> > >
> > > From: Maximilian Michels
> > > Date: Tuesday, 2 January 2024 at 12:16
> > > To: dev
> > > Cc: Alexander Fedulov
> > > Subject: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer -
> Alexander
> > > Fedulov
> > > Happy New Year everyone,
> > >
> > > I'd like to start the year off by announcing Alexander Fedulov as a
> > > new Flink committer.
> > >
> > > Alex has been active in the Flink community since 2019. He has
> > > contributed more than 100 commits to Flink, its Kubernetes
> operator,
> > > and various connectors [1][2].
> > >
> > > Especially noteworthy are his contributions on deprecating and
> > > migrating the old Source API functions and test harnesses, the
> > > enhancement to flame graphs, the dynamic rescale time computation
> in
> > > Flink Autoscaling, as well as all the small enhancements Alex has
> > > contributed which make a huge difference.
> > >
> > > Beyond code contributions, Alex has been an active community member
> > > with his activity on the mailing lists [3][4], as well as various
> > > talks and blog posts about Apache Flink [5][6].
> > >
> > > Congratulations Alex! The Flink community is proud to have you.
> > >
> > > Best,
> > > The Flink PMC
> > >
> > > [1]
> > >
> > >>>
> > https://github.com/search?type=commits=author%3Aafedulov+org%3Aapache
> > > [2]
> > >
> > >>
> >
> https://issues.apache.org/jira/browse/FLINK-28229?jql=status%20in%20(Resolved%2C%20Closed)%20AND%20assignee%20in%20(afedulov)%20ORDER%20BY%20resolved%20DESC%2C%20created%20DESC
> > > [3]
> > >>> https://lists.apache.org/list?dev@flink.apache.org:lte=100M:Fedulov
> > > [4]
> > >>> https://lists.apache.org/list?u...@flink.apache.org:lte=100M:Fedulov
> > > [5]
> > >
> > >>
> >
> https://flink.apache.org/2020/01/15/advanced-flink-application-patterns-vol.1-case-study-of-a-fraud-detection-system/
> > > [6]
> > >
> > >>
> >
> https://www.ververica.com/blog/presenting-our-streaming-concepts-introduction-to-flink-video-series
> > > Unless otherwise stated above:
> > >
> > > IBM United Kingdom Limited
> > > Registered in England and Wales with number 741598
> > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6
> > >> 3AU
>


Re: [VOTE] FLIP-387: Support named parameters for functions and call procedures

2024-01-04 Thread Lincoln Lee
+1 (binding)

Best,
Lincoln Lee


Benchao Li  于2024年1月5日周五 11:46写道:

> +1 (binding)
>
> Feng Jin  于2024年1月5日周五 10:49写道:
> >
> > Hi everyone
> >
> > Thanks for all the feedback about the FLIP-387: Support named parameters
> > for functions and call procedures [1] [2] .
> >
> > I'd like to start a vote for it. The vote will be open for at least 72
> > hours(excluding weekends,until Jan 10, 12:00AM GMT) unless there is an
> > objection or an insufficient number of votes.
> >
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-387%3A+Support+named+parameters+for+functions+and+call+procedures
> > [2] https://lists.apache.org/thread/bto7mpjvcx7d7k86owb00dwrm65jx8cn
> >
> >
> > Best,
> > Feng Jin
>
>
>
> --
>
> Best,
> Benchao Li
>


[jira] [Created] (FLINK-34000) Implement restore tests for IncrementalGroupAggregate node

2024-01-04 Thread Bonnie Varghese (Jira)
Bonnie Varghese created FLINK-34000:
---

 Summary: Implement restore tests for IncrementalGroupAggregate node
 Key: FLINK-34000
 URL: https://issues.apache.org/jira/browse/FLINK-34000
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Bonnie Varghese
Assignee: Bonnie Varghese






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-387: Support named parameters for functions and call procedures

2024-01-04 Thread Benchao Li
+1 (binding)

Feng Jin  于2024年1月5日周五 10:49写道:
>
> Hi everyone
>
> Thanks for all the feedback about the FLIP-387: Support named parameters
> for functions and call procedures [1] [2] .
>
> I'd like to start a vote for it. The vote will be open for at least 72
> hours(excluding weekends,until Jan 10, 12:00AM GMT) unless there is an
> objection or an insufficient number of votes.
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-387%3A+Support+named+parameters+for+functions+and+call+procedures
> [2] https://lists.apache.org/thread/bto7mpjvcx7d7k86owb00dwrm65jx8cn
>
>
> Best,
> Feng Jin



-- 

Best,
Benchao Li


[jira] [Created] (FLINK-33999) FLIP-412: Add the time-consuming span of each stage when starting the Flink job to TraceReporter

2024-01-04 Thread Rui Fan (Jira)
Rui Fan created FLINK-33999:
---

 Summary: FLIP-412: Add the time-consuming span of each stage when 
starting the Flink job to TraceReporter
 Key: FLINK-33999
 URL: https://issues.apache.org/jira/browse/FLINK-33999
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Coordination, Runtime / Metrics
Reporter: Rui Fan
Assignee: junzhong qin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-397: Add config options for administrator JVM options

2024-01-04 Thread Yong Fang
+1 (binding)

Best,
Fang Yong

On Thu, Jan 4, 2024 at 1:14 PM xiangyu feng  wrote:

> +1 (non-binding)
>
> Regards,
> Xiangyu Feng
>
> Rui Fan <1996fan...@gmail.com> 于2024年1月4日周四 13:03写道:
>
> > +1 (binding)
> >
> > Best,
> > Rui
> >
> > On Thu, Jan 4, 2024 at 11:45 AM Benchao Li  wrote:
> >
> > > +1 (binding)
> > >
> > > Zhanghao Chen  于2024年1月4日周四 10:30写道:
> > > >
> > > > Hi everyone,
> > > >
> > > > Thanks for all the feedbacks on FLIP-397 [1], which proposes to add a
> > > set of default JVM options for administrator use that prepends the
> > user-set
> > > extra JVM options for easier platform-wide JVM pre-tuning. It has been
> > > discussed in [2].
> > > >
> > > > I'd like to start a vote. The vote will be open for at least 72 hours
> > > (until January 8th 12:00 GMT) unless there is an objection or
> > insufficient
> > > votes.
> > > >
> > > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-397%3A+Add+config+options+for+administrator+JVM+options
> > > > [2] https://lists.apache.org/thread/cflonyrfd1ftmyrpztzj3ywckbq41jzg
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > >
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
> >
>


[VOTE] FLIP-387: Support named parameters for functions and call procedures

2024-01-04 Thread Feng Jin
Hi everyone

Thanks for all the feedback about the FLIP-387: Support named parameters
for functions and call procedures [1] [2] .

I'd like to start a vote for it. The vote will be open for at least 72
hours(excluding weekends,until Jan 10, 12:00AM GMT) unless there is an
objection or an insufficient number of votes.



[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-387%3A+Support+named+parameters+for+functions+and+call+procedures
[2] https://lists.apache.org/thread/bto7mpjvcx7d7k86owb00dwrm65jx8cn


Best,
Feng Jin


Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2024-01-04 Thread Yong Fang
Hi Ken,

Sorry for the late reply. After discussing with @Xintong, we think it is
better to keep the method names in the FLIP mainly for the following
reasons:

1. This FLIP is mainly to support the configurable serializer while keeping
consistent with Flink at the semantic layer. Keeping the existing naming
rules can facilitate user understanding.

2. In the future, if Flink can choose Fury as the generic serializer, we
can update the corresponding methods in that FLIP after the discussion of
Fury is completed. This will be a minor modification, and we can avoid
over-design in the current FLIP.

Thanks for your feedback!

Best,
Fang Yong

On Fri, Dec 29, 2023 at 12:38 PM Ken Krugler 
wrote:

> Hi Xintong,
>
> I agree that decoupling from Kryo is a bigger topic, well beyond the scope
> of this FLIP.
>
> The reason I’d brought up Fury is that this increases my confidence that
> Flink will want to decouple from Kryo sooner rather than later.
>
> So I feel it would be worth investing in a (minor) name change now, to
> improve that migration path in the future. Thus my suggestion for avoiding
> the explicit use of Kryo in method names.
>
> Regards,
>
> — Ken
>
>
>
>
> > On Dec 17, 2023, at 7:16 PM, Xintong Song  wrote:
> >
> > Hi Ken,
> >
> > I think the main purpose of this FLIP is to change how users interact
> with
> > the knobs for customizing the serialization behaviors, from requiring
> code
> > changes to working with pure configurations. Redesigning the knobs (i.e.,
> > names, semantics, etc.), on the other hand, is not the purpose of this
> > FLIP. Preserving the existing names and semantics should also help
> minimize
> > the migration cost for existing users. Therefore, I'm in favor of not
> > changing them.
> >
> > Concerning decoupling from Kryo, and introducing other serialization
> > frameworks like Fury, I think that's a bigger topic that is worth further
> > discussion. At the moment, I'm not aware of any community consensus on
> > doing so. And even if in the future we decide to do so, the changes
> needed
> > should be the same w/ or w/o this FLIP. So I'd suggest not to block this
> > FLIP on these issues.
> >
> > WDYT?
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Fri, Dec 15, 2023 at 1:40 AM Ken Krugler  >
> > wrote:
> >
> >> Hi Yong,
> >>
> >> Looks good, thanks for creating this.
> >>
> >> One comment - related to my recent email about Fury, I would love to see
> >> the v2 serialization decoupled from Kryo.
> >>
> >> As part of that, instead of using xxxKryo in methods, call them
> xxxGeneric.
> >>
> >> A more extreme change would be to totally rely on Fury (so no more POJO
> >> serializer). Fury is faster than the POJO serializer in my tests, but
> this
> >> would be a much bigger change.
> >>
> >> Though it could dramatically simplify the Flink serialization support.
> >>
> >> — Ken
> >>
> >> PS - a separate issue is how to migrate state from Kryo to something
> like
> >> Fury, which supports schema evolution. I think this might be possible,
> by
> >> having a smarter deserializer that identifies state as being created by
> >> Kryo, and using (shaded) Kryo to deserialize, while still writing as
> Fury.
> >>
> >>> On Dec 6, 2023, at 6:35 PM, Yong Fang  wrote:
> >>>
> >>> Hi devs,
> >>>
> >>> I'd like to start a discussion about FLIP-398: Improve Serialization
> >>> Configuration And Usage In Flink [1].
> >>>
> >>> Currently, users can register custom data types and serializers in
> Flink
> >>> jobs through various methods, including registration in code,
> >>> configuration, and annotations. These lead to difficulties in upgrading
> >>> Flink jobs and priority issues.
> >>>
> >>> In flink-2.0 we would like to manage job data types and serializers
> >> through
> >>> configurations. This FLIP will introduce a unified option for data type
> >> and
> >>> serializer and users can configure all custom data types and
> >>> pojo/kryo/custom serializers. In addition, this FLIP will add more
> >> built-in
> >>> serializers for complex data types such as List and Map, and optimize
> the
> >>> management of Avro Serializers.
> >>>
> >>> Looking forward to hearing from you, thanks!
> >>>
> >>> [1]
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
> >>>
> >>> Best,
> >>> Fang Yong
> >>
> >> --
> >> Ken Krugler
> >> http://www.scaleunlimited.com
> >> Custom big data solutions
> >> Flink & Pinot
> >>
> >>
> >>
> >>
>
>
>
> --
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink & Pinot
>
>
>
>


[jira] [Created] (FLINK-33998) Flink Job Manager restarted after kube-apiserver connection intermittent

2024-01-04 Thread Xiangyan (Jira)
Xiangyan created FLINK-33998:


 Summary: Flink Job Manager restarted after kube-apiserver 
connection intermittent
 Key: FLINK-33998
 URL: https://issues.apache.org/jira/browse/FLINK-33998
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Affects Versions: 1.13.6
 Environment: Kubernetes 1.24

Flink Operator 1.4

Flink 1.13.6
Reporter: Xiangyan
 Attachments: audit-log-no-restart.txt, audit-log-restart.txt, 
connection timeout.png, jm-no-restart4.log, jm-restart4.log

We are running Flink on AWS EKS and experienced Job Manager restart issue when 
EKS control plane scaled up/in.

I can reproduce this issue in my local environment too.

Since I have no control of EKS kube-apiserver, I built a Kubernetes cluster by 
my own with below setup:
 * Two kube-apiserver, only one is running at a time;
 * Deploy multiple Flink clusters (with Flink Operator 1.4 and Flink 1.13);
 * Enable Flink Job Manager HA;
 * Configure Job Manager leader election timeout;

high-availability.kubernetes.leader-election.lease-duration: "60s"
high-availability.kubernetes.leader-election.renew-deadline: "60s"
 
For testing, I switch the running kube-apiserver from one instance to another 
each time. When the kube-apiserver is switching, I can see that some Job 
Managers restart, but some are still running normally.

Here is an example. When kube-apiserver swatched over at 05:{{{}*53*{}}}:08, 
both JM lost connection to kube-apiserver. But there is no more connection 
error within a few seconds. I guess the connection recovered by retry.

However, one of the JM (the 2nd one in the attached screen shot) reported 
"leadership revoked" error after the leader election timeout (at 
05:{{{}*54*{}}}:08) and then restarted itself. While the other JM was still 
running normally.

>From kube-apiserver audit logs, the normal JM was able to renew leader lease 
>after the interruption. But there is no any lease renew request from the 
>failed JM until it restarted.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release flink-shaded 18.0, release candidate #1

2024-01-04 Thread Sergey Nuyanzin
Bubble up, we need more votes, especially from PMC members.

On Thu, Dec 28, 2023 at 1:29 PM Martijn Visser 
wrote:

> Hi,
>
> +1 (binding)
>
> - Validated hashes
> - Verified signature
> - Verified that no binaries exist in the source archive
> - Build the source with Maven
> - Verified licenses
> - Verified web PRs
>
> Best regards,
>
> Martijn
>
> On Mon, Dec 11, 2023 at 12:11 AM Sergey Nuyanzin 
> wrote:
> >
> > Hey everyone,
> >
> > The vote for flink-shaded 18.0 is still open. Please test and vote for
> > rc1, so that we can release it.
> >
> > On Thu, Nov 30, 2023 at 4:03 PM Jing Ge 
> wrote:
> >
> > > +1(not binding)
> > >
> > > - validate checksum
> > > - validate hash
> > > - checked the release notes
> > > - verified that no binaries exist in the source archive
> > > - build the source with Maven 3.8.6 and jdk11
> > > - checked repo
> > > - checked tag
> > > - verified web PR
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Thu, Nov 30, 2023 at 11:39 AM Sergey Nuyanzin 
> > > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > - Downloaded all the resources
> > > > - Validated checksum hash
> > > > - Build the source with Maven and jdk8
> > > > - Build Flink master with new flink-shaded and check that all the
> tests
> > > are
> > > > passing
> > > >
> > > > one minor thing that I noticed during releasing: for ci it uses maven
> > > 3.8.6
> > > > at the same time for release profile there is an enforcement plugin
> to
> > > > check that maven version is less than 3.3
> > > > I created a jira issue[1] for that
> > > > i made the release with 3.2.5 maven version (I suppose previous
> version
> > > was
> > > > also done with 3.2.5 because of same issue)
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-33703
> > > >
> > > > On Wed, Nov 29, 2023 at 11:41 AM Matthias Pohl <
> matthias.p...@aiven.io>
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > * Downloaded all resources
> > > > > * Extracts sources and compilation on these sources
> > > > > * Diff of git tag checkout with downloaded sources
> > > > > * Verifies SHA512 checksums & GPG certification
> > > > > * Checks that all POMs have the right expected version
> > > > > * Generated diffs to compare pom file changes with NOTICE files:
> > > Nothing
> > > > > suspicious found except for a minor (non-blocking) typo [1]
> > > > >
> > > > > Thanks for driving this effort, Sergey. :)
> > > > >
> > > > > [1]
> https://github.com/apache/flink-shaded/pull/126/files#r1409080162
> > > > >
> > > > > On Wed, Nov 29, 2023 at 10:25 AM Rui Fan <1996fan...@gmail.com>
> wrote:
> > > > >
> > > > >> Sorry, it's non-binding.
> > > > >>
> > > > >> On Wed, Nov 29, 2023 at 5:19 PM Rui Fan <1996fan...@gmail.com>
> wrote:
> > > > >>
> > > > >> > Thanks Matthias for the clarification!
> > > > >> >
> > > > >> > After I import the latest KEYS, it works fine.
> > > > >> >
> > > > >> > +1(binding)
> > > > >> >
> > > > >> > - Validated checksum hash
> > > > >> > - Verified signature
> > > > >> > - Verified that no binaries exist in the source archive
> > > > >> > - Build the source with Maven and jdk8
> > > > >> > - Verified licenses
> > > > >> > - Verified web PRs, and left a comment
> > > > >> >
> > > > >> > Best,
> > > > >> > Rui
> > > > >> >
> > > > >> > On Wed, Nov 29, 2023 at 5:05 PM Matthias Pohl
> > > > >> >  wrote:
> > > > >> >
> > > > >> >> The key is the last key in the KEYS file. It's just having a
> > > > different
> > > > >> >> format with spaces being added (due to different gpg
> versions?):
> > > F752
> > > > >> 9FAE
> > > > >> >> 2481 1A5C 0DF3  CA74 1596 BBF0 7268 35D8
> > > > >> >>
> > > > >> >> On Wed, Nov 29, 2023 at 9:41 AM Rui Fan <1996fan...@gmail.com>
> > > > wrote:
> > > > >> >>
> > > > >> >> > Hey Sergey,
> > > > >> >> >
> > > > >> >> > Thank you for driving this release.
> > > > >> >> >
> > > > >> >> > I try to check this signature, the whole key is
> > > > >> >> > F7529FAE24811A5C0DF3CA741596BBF0726835D8,
> > > > >> >> > it matches your 1596BBF0726835D8, but I cannot
> > > > >> >> > find it from the Flink KEYS[1].
> > > > >> >> >
> > > > >> >> > Please correct me if my operation is wrong, thanks~
> > > > >> >> >
> > > > >> >> > [1] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > >> >> >
> > > > >> >> > Best,
> > > > >> >> > Rui
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > On Wed, Nov 29, 2023 at 6:09 AM Sergey Nuyanzin <
> > > > snuyan...@gmail.com
> > > > >> >
> > > > >> >> > wrote:
> > > > >> >> >
> > > > >> >> > > Hi everyone,
> > > > >> >> > > Please review and vote on the release candidate #1 for the
> > > > version
> > > > >> >> 18.0,
> > > > >> >> > as
> > > > >> >> > > follows:
> > > > >> >> > > [ ] +1, Approve the release
> > > > >> >> > > [ ] -1, Do not approve the release (please provide specific
> > > > >> comments)
> > > > >> >> > >
> > > > >> >> > >
> > > > >> >> > > The complete staging area is available for your review,
> which
> > > > >> >> includes:
> > > > >> >> > > * JIRA 

Re: [NOTICE] Hive connector externalization

2024-01-04 Thread Sergey Nuyanzin
Hi Martijn
thanks for reminding
yep, I think you are right we need a release for that.

IIRC so far there is no volunteers for that, so I would volunteer

On Thu, Dec 28, 2023 at 1:27 PM Martijn Visser 
wrote:

> Hi Sergey,
>
> Is the next step that we need to generate a release of the
> externalized code? Did someone already volunteer for that?
>
> Best regards,
>
> Martijn
>
> On Mon, Dec 11, 2023 at 3:00 AM yuxia  wrote:
> >
> > Thanks Sergey for the work. Happy to see we can externalize Hive
> connector finally.
> >
> > Best regards,
> > Yuxia
> >
> > - 原始邮件 -
> > 发件人: "snuyanzin" 
> > 收件人: "dev" 
> > 发送时间: 星期六, 2023年 12 月 09日 上午 6:24:35
> > 主题: [NOTICE] Hive connector externalization
> >
> > Hi everyone
> >
> > We are getting close to the externalization of Hive connector[1].
> > Since currently externalized version is already passing tests against
> > release-1.18 and release-1.19 then I'm going to remove Hive connector
> code
> > from Flink main repo[2]. For that reason I would kindly ask to avoid
> > merging of Hive connector related changes to Flink main repo (master
> > branch) in order to make this smoother. Instead it would be better to
> > create/merge  prs to connector's repo[3]
> >
> > Also huge shoutout to Yuxia Luo, Martijn Visser, Ryan Skraba for the
> review
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-30064
> > [2] https://issues.apache.org/jira/browse/FLINK-33786
> > [3] https://github.com/apache/flink-connector-hive
> >
> > --
> > Best regards,
> > Sergey
>


-- 
Best regards,
Sergey


[jira] [Created] (FLINK-33997) Typo in the doc `classloader.parent-first-patterns-additional`

2024-01-04 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-33997:
-

 Summary: Typo in the doc 
`classloader.parent-first-patterns-additional`
 Key: FLINK-33997
 URL: https://issues.apache.org/jira/browse/FLINK-33997
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.18.0
Reporter: Matyas Orhidi


Typo in the doc:
[https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/debugging/debugging_classloading/#unloading-of-dynamically-loaded-classes-in-user-code]

classloader.parent-first-patterns-additional -> 
classloader.parent-first-patterns.additional



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-329: Add operator attribute to specify support for object-reuse

2024-01-04 Thread Lu Niu
Hi,

Is this still under active development? I notice
https://issues.apache.org/jira/browse/FLINK-32476 is labeled as
deprioritized. If this is the case, would it be acceptable for us to take
on the task?

Best
Lu



On Thu, Oct 19, 2023 at 4:26 PM Ken Krugler 
wrote:

> Hi Dong,
>
> Sorry for not seeing this initially. I did have one question about the
> description of the issue in the FLIP:
>
> > However, in cases where the upstream and downstream operators do not
> store or access references to the input or output records, this deep-copy
> overhead becomes unnecessary
>
> I was interested in getting clarification as to what you meant by “or
> access references…”, to see if it covered this situation:
>
> StreamX —forward--> operator1
> StreamX —forward--> operator2
>
> If operator1 modifies the record, and object re-use is enabled, then
> operator2 will see the modified version, right?
>
> Thanks,
>
> — Ken
>
> > On Jul 2, 2023, at 7:24 PM, Xuannan Su  wrote:
> >
> > Hi all,
> >
> > Dong(cc'ed) and I are opening this thread to discuss our proposal to
> > add operator attribute to allow operator to specify support for
> > object-reuse [1].
> >
> > Currently, the default configuration for pipeline.object-reuse is set
> > to false to avoid data corruption, which can result in suboptimal
> > performance. We propose adding APIs that operators can utilize to
> > inform the Flink runtime whether it is safe to reuse the emitted
> > records. This enhancement would enable Flink to maximize its
> > performance using the default configuration.
> >
> > Please refer to the FLIP document for more details about the proposed
> > design and implementation. We welcome any feedback and opinions on
> > this proposal.
> >
> > Best regards,
> >
> > Dong and Xuannan
> >
> > [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255073749
>
> --
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink & Pinot
>
>
>
>


[jira] [Created] (FLINK-33996) Support disabling project rewrite when multiple exprs in the project reference the same project.

2024-01-04 Thread Feng Jin (Jira)
Feng Jin created FLINK-33996:


 Summary: Support disabling project rewrite when multiple exprs in 
the project reference the same project.
 Key: FLINK-33996
 URL: https://issues.apache.org/jira/browse/FLINK-33996
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Runtime
Affects Versions: 1.18.0
Reporter: Feng Jin


When multiple top projects reference the same bottom project, project rewrite 
rules may result in complex projects being calculated multiple times.

Take the following SQL as an example:

{code:sql}
create table test_source(a varchar) with ('connector'='datagen');

explan plan for select a || 'a' as a, a || 'b' as b FROM (select 
REGEXP_REPLACE(a, 'aaa', 'bbb') as a FROM test_source);
{code}


The final SQL plan is as follows:


{code:sql}
== Abstract Syntax Tree ==
LogicalProject(a=[||($0, _UTF-16LE'a')], b=[||($0, _UTF-16LE'b')])
+- LogicalProject(a=[REGEXP_REPLACE($0, _UTF-16LE'aaa', _UTF-16LE'bbb')])
   +- LogicalTableScan(table=[[default_catalog, default_database, test_source]])

== Optimized Physical Plan ==
Calc(select=[||(REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb'), 
_UTF-16LE'a') AS a, ||(REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb'), 
_UTF-16LE'b') AS b])
+- TableSourceScan(table=[[default_catalog, default_database, test_source]], 
fields=[a])

== Optimized Execution Plan ==
Calc(select=[||(REGEXP_REPLACE(a, 'aaa', 'bbb'), 'a') AS a, 
||(REGEXP_REPLACE(a, 'aaa', 'bbb'), 'b') AS b])
+- TableSourceScan(table=[[default_catalog, default_database, test_source]], 
fields=[a])
{code}

It can be observed that after project write, regex_place is calculated twice. 
Generally speaking, regular expression matching is a time-consuming operation 
and we usually do not want it to be calculated multiple times. Therefore, for 
this scenario, we can support disabling project rewrite.

After disabling some rules, the final plan we obtained is as follows:


{code:sql}
== Abstract Syntax Tree ==
LogicalProject(a=[||($0, _UTF-16LE'a')], b=[||($0, _UTF-16LE'b')])
+- LogicalProject(a=[REGEXP_REPLACE($0, _UTF-16LE'aaa', _UTF-16LE'bbb')])
   +- LogicalTableScan(table=[[default_catalog, default_database, test_source]])

== Optimized Physical Plan ==
Calc(select=[||(a, _UTF-16LE'a') AS a, ||(a, _UTF-16LE'b') AS b])
+- Calc(select=[REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb') AS a])
   +- TableSourceScan(table=[[default_catalog, default_database, test_source]], 
fields=[a])

== Optimized Execution Plan ==
Calc(select=[||(a, 'a') AS a, ||(a, 'b') AS b])
+- Calc(select=[REGEXP_REPLACE(a, 'aaa', 'bbb') AS a])
   +- TableSourceScan(table=[[default_catalog, default_database, test_source]], 
fields=[a])
{code}


After testing, we probably need to modify these few rules:

org.apache.flink.table.planner.plan.rules.logical.FlinkProjectMergeRule

org.apache.flink.table.planner.plan.rules.logical.FlinkCalcMergeRule

org.apache.flink.table.planner.plan.rules.logical.FlinkProjectMergeRule








--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33995) Add test in test_file_sink.sh s3 StreamingFileSink for csv

2024-01-04 Thread Samrat Deb (Jira)
Samrat Deb created FLINK-33995:
--

 Summary: Add test in test_file_sink.sh s3 StreamingFileSink for 
csv 
 Key: FLINK-33995
 URL: https://issues.apache.org/jira/browse/FLINK-33995
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / FileSystem
Reporter: Samrat Deb


test_file_sink.sh s3 StreamingFileSink doesnt have coverage for csv format . 

this task will add new test case to cover when format is `csv`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33994) Use Datadog api key from environment variables if not set in conf

2024-01-04 Thread Sweta Kalakuntla (Jira)
Sweta Kalakuntla created FLINK-33994:


 Summary: Use Datadog api key from environment variables if not set 
in conf
 Key: FLINK-33994
 URL: https://issues.apache.org/jira/browse/FLINK-33994
 Project: Flink
  Issue Type: Improvement
  Components: flink-contrib, Runtime / Metrics
Reporter: Sweta Kalakuntla


Add a way to set Datadog API key from the environment variables. This way 
during deployment, there is way to set the value from secrets/vault instead of 
hardcoding key into code.

Someone has created PR :

[https://github.com/apache/flink/pull/19684/files] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33993) Ineffective scaling detection events are misleading

2024-01-04 Thread Maximilian Michels (Jira)
Maximilian Michels created FLINK-33993:
--

 Summary: Ineffective scaling detection events are misleading
 Key: FLINK-33993
 URL: https://issues.apache.org/jira/browse/FLINK-33993
 Project: Flink
  Issue Type: Bug
  Components: Autoscaler, Kubernetes Operator
Affects Versions: kubernetes-operator-1.7.0
Reporter: Maximilian Michels
Assignee: Maximilian Michels
 Fix For: kubernetes-operator-1.8.0


When the ineffective scaling decision feature is turned off, events are 
regenerated which look like this:

{noformat}
Skipping further scale up after ineffective previous scale up for 
65c763af14a952c064c400d516c25529
{noformat}

This is misleading because no action will be taken. It is fair to inform users 
about ineffective scale up even when the feature is disabled but a different 
message should be printed to convey that no action will be taken.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-406: Reorganize State & Checkpointing & Recovery Configuration

2024-01-04 Thread Piotr Nowojski
Hi,

Thanks for trying to clean this up! I don't have strong opinions on the
topics discussed here, so generally speaking +1 from my side!

Best,
Piotrek

śr., 3 sty 2024 o 04:16 Rui Fan <1996fan...@gmail.com> napisał(a):

> Thanks for the feedback!
>
> Using the `execution.checkpointing.incremental.enabled`,
> and enabling it by default sounds good to me.
>
> Best,
> Rui
>
> On Wed, Jan 3, 2024 at 11:10 AM Zakelly Lan  wrote:
>
> > Hi Rui,
> >
> > Thanks for your comments!
> >
> > IMO, given that the state backend can be plugably loaded (as you can
> > specify a state backend factory), I prefer not providing state backend
> > specified options in the framework.
> >
> > Secondly, the incremental checkpoint is actually a sharing file strategy
> > across checkpoints, which means the state backend *could* reuse files
> from
> > previous cp but not *must* do so. When the state backend could not reuse
> > the files, it is reasonable to fallback to a full checkpoint.
> >
> > Thus, I suggest we make it `execution.checkpointing.incremental` and
> enable
> > it by default. For those state backends not supporting this, they perform
> > full checkpoints and print a warning to inform users. Users do not need
> to
> > pay special attention to different options to control this across
> different
> > state backends. This is more user-friendly in my opinion. WDYT?
> >
> > On Tue, Jan 2, 2024 at 10:49 AM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Hi Zakelly,
> > >
> > > I'm not sure whether we could add the state backend type in the
> > > new key name of state.backend.incremental. It means we use
> > > `execution.checkpointing.rocksdb-incremental` or
> > > `execution.checkpointing.rocksdb-incremental.enabled`.
> > >
> > > So far, state.backend.incremental only works for rocksdb state backend.
> > > And this feature or optimization is very valuable and huge for large
> > > state flink jobs. I believe it's enabled for most production flink jobs
> > > with large rocksdb state.
> > >
> > > If this option isn't generic for all state backend types, I guess we
> > > can enable `execution.checkpointing.rocksdb-incremental.enabled`
> > > by default in Flink 2.0.
> > >
> > > But if it works for all state backends, it's hard to enable it by
> > default.
> > > Enabling great and valuable features or improvements are useful
> > > for users, especially a lot of new flink users. Out-of-the-box options
> > > are good for users.
> > >
> > > WDYT?
> > >
> > > Best,
> > > Rui
> > >
> > > On Fri, Dec 29, 2023 at 1:45 PM Zakelly Lan 
> > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Thanks all for your comments!
> > > >
> > > > As many of you have questions about the names for boolean options, I
> > > > suggest we make a naming rule for them. For now I could think of
> three
> > > > options:
> > > >
> > > > Option 1: Use enumeration options if possible. But this may cause
> some
> > > name
> > > > collisions or confusion as we discussed and we should unify the
> > statement
> > > > everywhere.
> > > > Option 2: Use boolean options and add 'enabled' as the suffix.
> > > > Option 3: Use boolean options and ONLY add 'enabled' when there are
> > more
> > > > detailed configurations under the same prefix, to prevent one name
> from
> > > > serving as a prefix to another.
> > > >
> > > > I am slightly inclined to Option 3, since it is more in line with
> > current
> > > > practice and friendly for existing users. Also It reduces the length
> of
> > > > configuration names as much as possible. I really want to hear your
> > > > opinions.
> > > >
> > > >
> > > > @Xuannan
> > > >
> > > > I agree with your comments 1 and 3.
> > > >
> > > > For 2, If we decide to change the name, maybe
> > > > `execution.checkpointing.parallel-cleaner` is better? And as for
> > whether
> > > to
> > > > add 'enabled' I suggest we discuss the rule above. WDYT?
> > > > Thanks!
> > > >
> > > >
> > > > Best,
> > > > Zakelly
> > > >
> > > > On Fri, Dec 29, 2023 at 12:02 PM Xuannan Su 
> > > wrote:
> > > >
> > > > > Hi Zakelly,
> > > > >
> > > > > Thanks for driving this! The organization of the configuration
> option
> > > > > in the FLIP looks much cleaner and easier to understand. +1 to the
> > > > > FLIP.
> > > > >
> > > > > Just some questions from me.
> > > > >
> > > > > 1. I think the change to the ConfigOptions should be put in the
> > > > > `Public Interface` section, instead of `Proposed Changed`, as those
> > > > > configuration options are public interface.
> > > > >
> > > > > 2. The key `state.checkpoint.cleaner.parallel-mode` seems
> confusing.
> > > > > It feels like it is used to choose different modes. In fact, it is
> a
> > > > > boolean flag to indicate whether to enable parallel clean. How
> about
> > > > > making it `state.checkpoint.cleaner.parallel-mode.enabled`?
> > > > >
> > > > > 3. The `execution.checkpointing.write-buffer` may better be
> > > > > `execution.checkpointing.write-buffer-size` so that we know it is
> > > > > configuring the 

[jira] [Created] (FLINK-33992) Add option to fetch the jar from private repository in FlinkSessionJob

2024-01-04 Thread Sweta Kalakuntla (Jira)
Sweta Kalakuntla created FLINK-33992:


 Summary: Add option to fetch the jar from private repository in 
FlinkSessionJob
 Key: FLINK-33992
 URL: https://issues.apache.org/jira/browse/FLINK-33992
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Sweta Kalakuntla


FlinkSessionJob spec does not have a capability to download job jar from remote 
private repository. It can currently only download from public repositories. 

Adding capability to supply credentials  to the *spec.job.jarURI* in 
FlinkSessionJob, will solve that problem.

If I use initContainer to download the jar in FlinkDeployment and try to access 
that in FlinkSessionJob, the operator is unable to find the jar in the defined 
path.
---
apiVersion: flink.apache.org/v1beta1
kind: FlinkSessionJob
metadata:
  name: job1
spec:
  deploymentName: session-cluster
  job:
jarURI: file:///opt/flink/job.jar
parallelism: 4
upgradeMode: savepoint
(edited)
caused by: java.io.FileNotFoundException: /opt/flink/job.jar (No such file or 
directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(Unknown Source)
at java.base/java.io.FileInputStream.(Unknown Source)
at 
org.apache.flink.core.fs.local.LocalDataInputStream.(LocalDataInputStream.java:50)
at org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:134)
at 
org.apache.flink.kubernetes.operator.artifact.FileSystemBasedArtifactFetcher.fetch(FileSystemBasedArtifactFetcher.java:44)
at 
org.apache.flink.kubernetes.operator.artifact.ArtifactManager.fetch(ArtifactManager.java:63)
at 
org.apache.flink.kubernetes.operator.service.AbstractFlinkService.uploadJar(AbstractFlinkService.java:707)
at 
org.apache.flink.kubernetes.operator.service.AbstractFlinkService.submitJobToSessionCluster(AbstractFlinkService.java:212)
at 
org.apache.flink.kubernetes.operator.reconciler.sessionjob.SessionJobReconciler.deploy(SessionJobReconciler.java:73)
at 
org.apache.flink.kubernetes.operator.reconciler.sessionjob.SessionJobReconciler.deploy(SessionJobReconciler.java:44)
at 
org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.reconcile(AbstractFlinkResourceReconciler.java:120)
at 
org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:109)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33991) Custom Error Handling for Kinesis Polling Consumer

2024-01-04 Thread Emre Kartoglu (Jira)
Emre Kartoglu created FLINK-33991:
-

 Summary: Custom Error Handling for Kinesis Polling Consumer 
 Key: FLINK-33991
 URL: https://issues.apache.org/jira/browse/FLINK-33991
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kinesis
Affects Versions: aws-connector-4.2.0
Reporter: Emre Kartoglu


We introduced custom error handling for the Kinesis EFO Consumer as part of 
https://issues.apache.org/jira/browse/FLINK-33260

PR for the EFO consumer: https://github.com/apache/flink-connector-aws/pull/110

 

This ticket is to apply the same logic to the Kinesis Polling Consumer in the 
same codebase.

Current configuration for the EFO consumer looks like:

{{}}
{code:java}
flink.shard.consumer.error.recoverable[0].exception=java.net.UnknownHostException
flink.shard.consumer.error.recoverable[1].exception=java.net.SocketTimeoutException
 {code}
{{}}

We should re-use the same code for the polling consumer.{{{}{}}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [Discuss][Flink-31326] Flink autoscaler code

2024-01-04 Thread Maximilian Michels
We discussed in the PR that it's actually a feature, but thanks Yang
for bringing it up and improving the docs around this piece of code!

-Max

On Tue, Jan 2, 2024 at 10:06 PM Yang LI  wrote:
>
> Hello Rui,
>
> Here is the jira ticket https://issues.apache.org/jira/browse/FLINK-33966, I 
> have pushed a tiny pr for this ticket.
>
> Regards,
> Yang
>
> On Tue, 2 Jan 2024 at 16:15, Rui Fan <1996fan...@gmail.com> wrote:
>>
>> Thanks Yang for reporting this issue!
>>
>> You are right, these 2 conditions are indeed the same. It's unexpected IIUC.
>> Would you like to fix it?
>>
>> Feel free to create a FLINK JIRA to fix it if you would like to, and I'm
>> happy to
>> review!
>>
>> And cc @Maximilian Michels 
>>
>> Best,
>> Rui
>>
>> On Tue, Jan 2, 2024 at 11:03 PM Yang LI  wrote:
>>
>> > Hello,
>> >
>> > I see we have 2 times the same condition check in the
>> > function getNumRecordsInPerSecond (L220
>> > <
>> > https://github.com/apache/flink-kubernetes-operator/blob/main/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/metrics/ScalingMetrics.java#L220
>> > >
>> > and
>> > L224
>> > <
>> > https://github.com/apache/flink-kubernetes-operator/blob/main/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/metrics/ScalingMetrics.java#L224
>> > >).
>> > I imagine you want to use SOURCE_TASK_NUM_RECORDS_OUT_PER_SEC when the
>> > operator is not the source. Can you confirm this and if we have a FIP
>> > ticket to fix this?
>> >
>> > Regards,
>> > Yang LI
>> >


[jira] [Created] (FLINK-33990) Use default classloader in TaskManager when there are no user jars for job

2024-01-04 Thread Dan Zou (Jira)
Dan Zou created FLINK-33990:
---

 Summary: Use default classloader in TaskManager when there are no 
user jars for job
 Key: FLINK-33990
 URL: https://issues.apache.org/jira/browse/FLINK-33990
 Project: Flink
  Issue Type: Sub-task
Reporter: Dan Zou


TaskManager will create a new class loader for each flink job even when it has 
no user jars, which may cause metaspace increasing. Flink can use system 
classloader for the jobs without jars. A similar optimization has been made in 
JM, it make sense to optimize it in TM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33989) Insert Statement With Filter Operation Generates Extra Tombstone in Kafka

2024-01-04 Thread Flaviu Cicio (Jira)
Flaviu Cicio created FLINK-33989:


 Summary: Insert Statement With Filter Operation Generates Extra 
Tombstone in Kafka
 Key: FLINK-33989
 URL: https://issues.apache.org/jira/browse/FLINK-33989
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.17.2
Reporter: Flaviu Cicio


Given the following Flink SQL tables:
{code:sql}
CREATE TABLE input (
  id STRING NOT NULL, 
  current_value STRING NOT NULL, 
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka', 'topic' = 'input', 
  'key.format' = 'raw', 'properties.bootstrap.servers' = 'sn-kafka:29092', 
  'properties.group.id' = 'your_group_id', 
  'value.format' = 'json'
);

CREATE TABLE output (
  id STRING NOT NULL, 
  current_value STRING NOT NULL, 
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka', 'topic' = 'output', 
  'key.format' = 'raw', 'properties.bootstrap.servers' = 'sn-kafka:29092', 
  'properties.group.id' = 'your_group_id', 
  'value.format' = 'json'
); {code}
And, the following entries are present in the input Kafka topic:
{code:json}
[
  {
    "id": "1",
    "current_value": "abc"
  },
  {
    "id": "1",
    "current_value": "abcd"
  }
]{code}
If we execute the following statement:
{code:sql}
INSERT INTO output SELECT id, current_value FROM input; {code}
The following entries are published to the output Kafka topic:
{code:json}
[
  {
    "id": "1",
    "current_value": "abc"
  },
  {
    "id": "1",
    "current_value": "abcd"
  }
]{code}
 

But, if we execute the following statement:
{code:sql}
INSERT INTO output SELECT id, current_value FROM input WHERE id IN ('1'); {code}
The following entries are published:
{code:json}
[
  {
    "id": "1",
    "current_value": "abc"
  },
  null,
  {
    "id": "1",
    "current_value": "abcd"
  }
]{code}
We would expect the result to be the same for both insert statements.

As we can see, there is an extra tombstone generated as a result of the second 
statement.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: FW: [ANNOUNCE] New Apache Flink Committer - Alexander Fedulov

2024-01-04 Thread Alexander Fedulov
Thanks, everyone! It is great to be part of such an active and
collaborative community!

Best,
Alex

On Thu, 4 Jan 2024 at 10:10, Etienne Chauchot  wrote:

> Congrats! Welcome onboard.
>
> Best
>
> Etienne
>
> Le 04/01/2024 à 03:14, Jane Chan a écrit :
> > Congratulations, Alex!
> >
> > Best,
> > Jane
> >
> > On Thu, Jan 4, 2024 at 10:03 AM Junrui Lee  wrote:
> >
> >> Congratulations, Alex!
> >>
> >> Best,
> >> Junrui
> >>
> >> weijie guo  于2024年1月4日周四 09:57写道:
> >>
> >>> Congratulations, Alex!
> >>>
> >>> Best regards,
> >>>
> >>> Weijie
> >>>
> >>>
> >>> Steven Wu  于2024年1月4日周四 02:07写道:
> >>>
>  Congra, Alex! Well deserved!
> 
>  On Wed, Jan 3, 2024 at 2:31 AM David Radley
>  wrote:
> 
> > Sorry for my typo.
> >
> > Many congratulations Alex!
> >
> > From: David Radley
> > Date: Wednesday, 3 January 2024 at 10:23
> > To: David Anderson
> > Cc:dev@flink.apache.org  
> > Subject: Re: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer -
> >>> Alexander
> > Fedulov
> > Many Congratulations David .
> >
> > From: Maximilian Michels
> > Date: Tuesday, 2 January 2024 at 12:16
> > To: dev
> > Cc: Alexander Fedulov
> > Subject: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer - Alexander
> > Fedulov
> > Happy New Year everyone,
> >
> > I'd like to start the year off by announcing Alexander Fedulov as a
> > new Flink committer.
> >
> > Alex has been active in the Flink community since 2019. He has
> > contributed more than 100 commits to Flink, its Kubernetes operator,
> > and various connectors [1][2].
> >
> > Especially noteworthy are his contributions on deprecating and
> > migrating the old Source API functions and test harnesses, the
> > enhancement to flame graphs, the dynamic rescale time computation in
> > Flink Autoscaling, as well as all the small enhancements Alex has
> > contributed which make a huge difference.
> >
> > Beyond code contributions, Alex has been an active community member
> > with his activity on the mailing lists [3][4], as well as various
> > talks and blog posts about Apache Flink [5][6].
> >
> > Congratulations Alex! The Flink community is proud to have you.
> >
> > Best,
> > The Flink PMC
> >
> > [1]
> >
> >>>
> https://github.com/search?type=commits=author%3Aafedulov+org%3Aapache
> > [2]
> >
> >>
> https://issues.apache.org/jira/browse/FLINK-28229?jql=status%20in%20(Resolved%2C%20Closed)%20AND%20assignee%20in%20(afedulov)%20ORDER%20BY%20resolved%20DESC%2C%20created%20DESC
> > [3]
> >>> https://lists.apache.org/list?dev@flink.apache.org:lte=100M:Fedulov
> > [4]
> >>> https://lists.apache.org/list?u...@flink.apache.org:lte=100M:Fedulov
> > [5]
> >
> >>
> https://flink.apache.org/2020/01/15/advanced-flink-application-patterns-vol.1-case-study-of-a-fraud-detection-system/
> > [6]
> >
> >>
> https://www.ververica.com/blog/presenting-our-streaming-concepts-introduction-to-flink-video-series
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6
> >> 3AU


[jira] [Created] (FLINK-33988) Invalid configuration when using initialized root logger level on yarn application mode

2024-01-04 Thread RocMarshal (Jira)
RocMarshal created FLINK-33988:
--

 Summary: Invalid configuration when using initialized root logger 
level on yarn application mode
 Key: FLINK-33988
 URL: https://issues.apache.org/jira/browse/FLINK-33988
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Configuration
Affects Versions: 1.17.2, 1.18.0
Reporter: RocMarshal






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: FW: [ANNOUNCE] New Apache Flink Committer - Alexander Fedulov

2024-01-04 Thread Etienne Chauchot

Congrats! Welcome onboard.

Best

Etienne

Le 04/01/2024 à 03:14, Jane Chan a écrit :

Congratulations, Alex!

Best,
Jane

On Thu, Jan 4, 2024 at 10:03 AM Junrui Lee  wrote:


Congratulations, Alex!

Best,
Junrui

weijie guo  于2024年1月4日周四 09:57写道:


Congratulations, Alex!

Best regards,

Weijie


Steven Wu  于2024年1月4日周四 02:07写道:


Congra, Alex! Well deserved!

On Wed, Jan 3, 2024 at 2:31 AM David Radley
wrote:


Sorry for my typo.

Many congratulations Alex!

From: David Radley
Date: Wednesday, 3 January 2024 at 10:23
To: David Anderson
Cc:dev@flink.apache.org  
Subject: Re: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer -

Alexander

Fedulov
Many Congratulations David .

From: Maximilian Michels
Date: Tuesday, 2 January 2024 at 12:16
To: dev
Cc: Alexander Fedulov
Subject: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer - Alexander
Fedulov
Happy New Year everyone,

I'd like to start the year off by announcing Alexander Fedulov as a
new Flink committer.

Alex has been active in the Flink community since 2019. He has
contributed more than 100 commits to Flink, its Kubernetes operator,
and various connectors [1][2].

Especially noteworthy are his contributions on deprecating and
migrating the old Source API functions and test harnesses, the
enhancement to flame graphs, the dynamic rescale time computation in
Flink Autoscaling, as well as all the small enhancements Alex has
contributed which make a huge difference.

Beyond code contributions, Alex has been an active community member
with his activity on the mailing lists [3][4], as well as various
talks and blog posts about Apache Flink [5][6].

Congratulations Alex! The Flink community is proud to have you.

Best,
The Flink PMC

[1]


https://github.com/search?type=commits=author%3Aafedulov+org%3Aapache

[2]


https://issues.apache.org/jira/browse/FLINK-28229?jql=status%20in%20(Resolved%2C%20Closed)%20AND%20assignee%20in%20(afedulov)%20ORDER%20BY%20resolved%20DESC%2C%20created%20DESC

[3]

https://lists.apache.org/list?dev@flink.apache.org:lte=100M:Fedulov

[4]

https://lists.apache.org/list?u...@flink.apache.org:lte=100M:Fedulov

[5]


https://flink.apache.org/2020/01/15/advanced-flink-application-patterns-vol.1-case-study-of-a-fraud-detection-system/

[6]


https://www.ververica.com/blog/presenting-our-streaming-concepts-introduction-to-flink-video-series

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6

3AU