Re:Re: [ANNOUNCE] New Apache Flink PMC Member - Weijie Guo

2024-06-04 Thread Wencong Liu
Congratulations, Weijie!




Best,

Wencong




At 2024-06-04 20:46:52, "Lijie Wang"  wrote:
>Congratulations, Weijie!
>
>Best,
>Lijie
>
>Zakelly Lan  wrote on Tue, Jun 4, 2024 at 20:45:
>
>> Congratulations, Weijie!
>>
>> Best,
>> Zakelly
>>
>> On Tue, Jun 4, 2024 at 7:49 PM Sergey Nuyanzin 
>> wrote:
>>
>> > Congratulations Weijie Guo!
>> >
>> > On Tue, Jun 4, 2024, 13:45 Jark Wu  wrote:
>> >
>> > > Congratulations, Weijie!
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > On Tue, 4 Jun 2024 at 19:10, spoon_lz  wrote:
>> > >
>> > > > Congratulations, Weijie!
>> > > >
>> > > >
>> > > >
>> > > > Regards,
>> > > > Zhuo.
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >  Replied Message 
>> > > > | From | Aleksandr Pilipenko |
>> > > > | Date | 06/4/2024 18:59 |
>> > > > | To |  |
>> > > > | Subject | Re: [ANNOUNCE] New Apache Flink PMC Member - Weijie Guo |
>> > > > Congratulations, Weijie!
>> > > >
>> > > > Best,
>> > > > Aleksandr
>> > > >
>> > > > On Tue, 4 Jun 2024 at 11:42, Abdulquddus Babatunde Ekemode <
>> > > > abdulqud...@aligence.io> wrote:
>> > > >
>> > > > Congratulations! I wish you all the best.
>> > > >
>> > > > Best Regards,
>> > > > Abdulquddus
>> > > >
>> > > > On Tue, 4 Jun 2024 at 13:14, Ahmed Hamdy 
>> wrote:
>> > > >
>> > > > Congratulations Weijie
>> > > > Best Regards
>> > > > Ahmed Hamdy
>> > > >
>> > > >
>> > > > On Tue, 4 Jun 2024 at 10:51, Matthias Pohl 
>> wrote:
>> > > >
>> > > > Congratulations, Weijie!
>> > > >
>> > > > Matthias
>> > > >
>> > > > On Tue, Jun 4, 2024 at 11:12 AM Guowei Ma 
>> > > > wrote:
>> > > >
>> > > > Congratulations!
>> > > >
>> > > > Best,
>> > > > Guowei
>> > > >
>> > > >
>> > > > On Tue, Jun 4, 2024 at 4:55 PM gongzhongqiang <
>> > > > gongzhongqi...@apache.org
>> > > >
>> > > > wrote:
>> > > >
>> > > > Congratulations Weijie! Best,
>> > > > Zhongqiang Gong
>> > > >
>> > > > Xintong Song  wrote on Tue, Jun 4, 2024 at 14:46:
>> > > >
>> > > > Hi everyone,
>> > > >
>> > > > On behalf of the PMC, I'm very happy to announce that Weijie Guo
>> > > > has
>> > > > joined
>> > > > the Flink PMC!
>> > > >
>> > > > Weijie has been an active member of the Apache Flink community
>> > > > for
>> > > > many
>> > > > years. He has made significant contributions in many components,
>> > > > including
>> > > > runtime, shuffle, sdk, connectors, etc. He has driven /
>> > > > participated
>> > > > in
>> > > > many FLIPs, authored and reviewed hundreds of PRs, been
>> > > > consistently
>> > > > active
>> > > > on mailing lists, and also helped with release management of 1.20
>> > > > and
>> > > > several other bugfix releases.
>> > > >
>> > > > Congratulations and welcome Weijie!
>> > > >
>> > > > Best,
>> > > >
>> > > > Xintong (on behalf of the Flink PMC)
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > >
>> >
>>


[jira] [Created] (FLINK-35221) Support SQL 2011 reserved keywords as identifiers in Flink HiveParser

2024-04-24 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-35221:
---

 Summary: Support SQL 2011 reserved keywords as identifiers in 
Flink HiveParser 
 Key: FLINK-35221
 URL: https://issues.apache.org/jira/browse/FLINK-35221
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Affects Versions: 1.20.0
Reporter: Wencong Liu


According to the Hive user documentation[1], starting from version 0.13.0, Hive 
prohibits the use of reserved keywords as identifiers. However, versions 2.1.0 
and earlier allow using SQL11 reserved keywords as identifiers by setting 
{{hive.support.sql11.reserved.keywords=false}} in hive-site.xml. This 
compatibility switch facilitates jobs that use keywords as identifiers.

HiveParser in Flink, which is based on Hive 2.3.9, lacks such an option to treat 
SQL11 reserved keywords as identifiers. This poses a challenge for users 
migrating SQL from Hive 1.x to Flink SQL, as they might encounter scenarios 
where keywords are used as identifiers. Addressing this issue is necessary to 
support such cases.
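
For illustration, a minimal sketch of the kind of statement that migrates poorly 
(the table and column names are made up, and the required HiveCatalog 
registration is omitted): "date" is a SQL11 reserved keyword used unquoted as an 
identifier, which the old Hive configuration accepted but Flink's HiveParser 
currently rejects.

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.SqlDialect;
import org.apache.flink.table.api.TableEnvironment;

public class ReservedKeywordExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
        // Using the Hive dialect requires flink-connector-hive and a registered
        // HiveCatalog (omitted here for brevity).
        tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
        // "date" is a SQL11 reserved keyword used unquoted as a column identifier.
        // Hive 1.x (or Hive <= 2.1.0 with hive.support.sql11.reserved.keywords=false)
        // accepts this; HiveParser, based on Hive 2.3.9, currently rejects it.
        tEnv.executeSql("SELECT date, cnt FROM legacy_table").print();
    }
}
{code}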



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34543) Support Full Partition Processing On Non-keyed DataStream

2024-02-28 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-34543:
---

 Summary: Support Full Partition Processing On Non-keyed DataStream
 Key: FLINK-34543
 URL: https://issues.apache.org/jira/browse/FLINK-34543
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream
Affects Versions: 1.20.0
Reporter: Wencong Liu
 Fix For: 1.20.0


1. Introduce MapPartition, SortPartition, Aggregate, Reduce APIs in DataStream.
2. Introduce SortPartition API in KeyedStream.

The related FLIP can be found in 
[FLIP-380|https://cwiki.apache.org/confluence/display/FLINK/FLIP-380%3A+Support+Full+Partition+Processing+On+Non-keyed+DataStream].
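
A minimal usage sketch (the fullWindowPartition() entry point and the exact 
signatures follow FLIP-380's proposal and are not part of a released API yet):

{code:java}
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.functions.MapPartitionFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class FullPartitionSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        DataStream<Long> source = env.fromSequence(1, 1_000);

        // Collect all records of each (non-keyed) subtask and emit one sum per
        // subtask at the end of input.
        source.fullWindowPartition()
                .mapPartition(new MapPartitionFunction<Long, Long>() {
                    @Override
                    public void mapPartition(Iterable<Long> values, Collector<Long> out) {
                        long sum = 0;
                        for (Long v : values) {
                            sum += v;
                        }
                        out.collect(sum);
                    }
                })
                .print();

        env.execute("full-partition-sketch");
    }
}
{code}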



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re:[DISCUSS] FLIP-410: Config, Context and Processing Timer Service of DataStream API V2

2024-01-24 Thread Wencong Liu
Hi Weijie,

Regarding FLIP-410, I have the following questions:


Q1. In the "Configuration" section, it is mentioned that 
configurations can be set continuously using the withXXX methods. 
Are these configuration options the same as those provided by DataStream V1, 
or might there be different options compared to V1?


Q2. The FLIP describes the interface for handling processing timers
(ProcessingTimeManager), but it does not mention how to delete or update
an existing timer. The V1 API provides a TimerService that can delete a timer.
Does this mean that once a timer is registered, it cannot be changed?
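
For reference, a minimal sketch of the V1 capability I am referring to
(a made-up function, only to show register/delete on the TimerService):

import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TimerSketch extends KeyedProcessFunction<String, String, String> {
    @Override
    public void processElement(String value, Context ctx, Collector<String> out) {
        long fireAt = ctx.timerService().currentProcessingTime() + 60_000L;
        ctx.timerService().registerProcessingTimeTimer(fireAt);
        // In V1, a registered timer can be removed again if we know its timestamp:
        ctx.timerService().deleteProcessingTimeTimer(fireAt);
        out.collect(value);
    }
}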

I hope to receive answers when you have time, thank you!

Best,
Wencong Liu

















At 2023-12-26 14:46:24, "weijie guo"  wrote:
>Hi devs,
>
>
>I'd like to start a discussion about FLIP-410: Config, Context and
>Processing Timer Service of DataStream API V2 [1]. This is the second
>sub-FLIP of DataStream API V2.
>
>
>In FLIP-409 [2], we have defined the most basic primitive of
>DataStream V2. On this basis, this FLIP will further answer several
>important questions closely related to it:
>
>   1.
>   How to configure the processing over the datastreams, such as
>setting the parallelism.
>   2.
>   How to get access to the runtime contextual information and
>services from inside the process functions.
>   3. How to work with processing-time timers.
>
>You can find more details in this FLIP. Its relationship with other
>sub-FLIPs can be found in the umbrella FLIP
>[3].
>
>
>Looking forward to hearing from you, thanks!
>
>
>Best regards,
>
>Weijie
>
>
>[1]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-410%3A++Config%2C+Context+and+Processing+Timer+Service+of+DataStream+API+V2
>
>[2]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction
>
>[3]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2


Re:[DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-01-24 Thread Wencong Liu
Hi Weijie,

Regarding FLIP-409, I have the following questions:


Q1. Other DataStream types are converted into Non-Keyed DataStreams
by using a "shuffle" operation that turns the input into the output.
Does this "shuffle" cover the various repartition operations
(rebalance/rescale/shuffle) from DataStream V1?
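
For context, a minimal sketch (with a made-up source) of the V1 repartition
operations I am referring to:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RepartitionSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> input = env.fromElements("a", "b", "c");

        input.rebalance().print();  // round-robin across all downstream subtasks
        input.rescale().print();    // round-robin within a local group of subtasks
        input.shuffle().print();    // uniformly random target subtask

        env.execute("repartition-sketch");
    }
}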


Q2. Why does the design of TwoOutputStreamProcessFunction, when dealing
with a KeyedStream, only allow output combinations of (Keyed + Keyed)
and (Non-Keyed + Non-Keyed)?


I hope to receive answers when you have time, thank you!


Best,
Wencong Liu














At 2023-12-26 14:43:47, "weijie guo"  wrote:
>Hi devs,
>
>
>I'd like to start a discussion about FLIP-409: DataStream V2 Building
>Blocks: DataStream, Partitioning and ProcessFunction [1].
>
>
>As the first sub-FLIP for DataStream API V2, we'd like to discuss and
>try to answer some of the most fundamental questions in stream
>processing:
>
>   1. What kinds of data streams do we have?
>   2. How to partition data over the streams?
>   3. How to define a processing on the data stream?
>
>The answer to these questions involve three core concepts: DataStream,
>Partitioning and ProcessFunction. In this FLIP, we will discuss the
>definitions and related API primitives of these concepts in detail.
>
>
>You can find more details in FLIP-409 [1]. This sub-FLIP is at the
>heart of the entire DataStream API V2, and its relationship with other
>sub-FLIPs can be found in the umbrella FLIP [2].
>
>
>Looking forward to hearing from you, thanks!
>
>
>Best regards,
>
>Weijie
>
>
>
>[1]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction
>
>[2]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2


Re:[DISCUSS] FLIP-408: [Umbrella] Introduce DataStream API V2

2024-01-24 Thread Wencong Liu
Hi Weijie,


Thank you for the effort you've put into the DataStream API! By reorganizing and 
redesigning the DataStream API, as well as addressing some of the unreasonable 
designs within it, we can improve the efficiency of job development for 
developers. It also allows developers to design more flexible Flink jobs to meet 
business requirements.


I have conducted a comprehensive review of the DataStream API design in versions 
1.18 and 1.19. I found quite a few functional defects in the DataStream API, 
such as the lack of corresponding APIs in batch processing scenarios. In the 
upcoming 1.20 version, I will further improve the DataStream API in batch 
computing scenarios.


The issues existing in the old DataStream API (which can be referred to as V1) 
can be addressed from a design perspective in the initial version of V2. I hope 
to also have the opportunity to participate in the development of DataStream V2 
and make my contribution.


Regarding FLIP-408, I have a question: The Processing TimerService is currently 
defined as one of the basic primitives, partly because it's understood that 
you have to choose between processing time and event time. 
The other part of the reason is that it needs to work based on the task's
mailbox thread model to avoid concurrency issues. Could you clarify the second
part of the reason?

Best,
Wencong Liu














At 2023-12-26 14:42:20, "weijie guo"  wrote:
>Hi devs,
>
>
>I'd like to start a discussion about FLIP-408: [Umbrella] Introduce
>DataStream API V2 [1].
>
>
>The DataStream API is one of the two main APIs that Flink provides for
>writing data processing programs. As an API that was introduced
>practically since day-1 of the project and has been evolved for nearly
>a decade, we are observing more and more problems of it. Improvements
>on these problems require significant breaking changes, which makes
>in-place refactor impractical. Therefore, we propose to introduce a
>new set of APIs, the DataStream API V2, to gradually replace the
>original DataStream API.
>
>
>The proposal to introduce a whole set new API is complex and includes
>massive changes. We are planning  to break it down into multiple
>sub-FLIPs for incremental discussion. This FLIP is only used as an
>umbrella, mainly focusing on motivation, goals, and overall planning.
>That is to say, more design and implementation details  will be
>discussed in other FLIPs.
>
>
>Given that it's hard to imagine the detailed design of the new API if
>we're just talking about this umbrella FLIP, and we probably won't be
>able to give an opinion on it. Therefore, I have prepared two
>sub-FLIPs [2][3] at the same time, and the discussion of them will be
>posted later in separate threads.
>
>
>Looking forward to hearing from you, thanks!
>
>
>Best regards,
>
>Weijie
>
>
>
>[1]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2
>
>[2]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction
>
>
>[3]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-410%3A++Config%2C+Context+and+Processing+Timer+Service+of+DataStream+API+V2


[jira] [Created] (FLINK-33949) METHOD_ABSTRACT_NOW_DEFAULT should be both source compatible and binary compatible

2023-12-26 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-33949:
---

 Summary: METHOD_ABSTRACT_NOW_DEFAULT should be both source 
compatible and binary compatible
 Key: FLINK-33949
 URL: https://issues.apache.org/jira/browse/FLINK-33949
 Project: Flink
  Issue Type: Bug
  Components: Test Infrastructure
Affects Versions: 1.19.0
Reporter: Wencong Liu
 Fix For: 1.19.0


Currently, I'm trying to refactor some APIs annotated by @Public in [FLIP-382: 
Unify the Provision of Diverse Metadata for Context-like APIs - Apache Flink - 
Apache Software 
Foundation|https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs].
 When an abstract method is changed into a default method, the japicmp maven 
plugin names this change METHOD_ABSTRACT_NOW_DEFAULT and considers it both source 
incompatible and binary incompatible.

The reason may be that if the abstract method becomes default, the logic in the 
default method will be ignored by the previous implementations.
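
A minimal sketch (a hypothetical interface, not a real Flink API) of the change 
and of why old code never executes the new default body:

{code:java}
// Before (old released API):
//     public interface SourceContext {
//         String currentTaskName();
//     }

// After: the abstract method gains a default body. japicmp reports this change
// as METHOD_ABSTRACT_NOW_DEFAULT.
public interface SourceContext {
    default String currentTaskName() {
        return "unknown";
    }
}

// An implementation compiled against the old interface still links and runs
// against the new one: it keeps its own override, so the new default body is
// simply never executed for that implementation.
class MySourceContext implements SourceContext {
    @Override
    public String currentTaskName() {
        return "my-task";
    }
}
{code}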

I created a test case in which a job is compiled against the newly changed default 
method and submitted to the previous version. There is no exception thrown. 
Therefore, METHOD_ABSTRACT_NOW_DEFAULT shouldn't be considered incompatible for 
either source or binary.

By the way, the master branch currently checks both source compatibility and 
binary compatibility between minor versions. According to Flink's API 
compatibility constraints, the master branch shouldn't check binary 
compatibility. There is already a Jira 
([FLINK-33009|https://issues.apache.org/jira/browse/FLINK-33009]: 
tools/release/update_japicmp_configuration.sh should only enable binary 
compatibility checks in the release branch) to track it, and we should fix it as 
soon as possible.

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33905) FLIP-382: Unify the Provision of Diverse Metadata for Context-like APIs

2023-12-20 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-33905:
---

 Summary: FLIP-382: Unify the Provision of Diverse Metadata for 
Context-like APIs
 Key: FLINK-33905
 URL: https://issues.apache.org/jira/browse/FLINK-33905
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Affects Versions: 1.19.0
Reporter: Wencong Liu


This ticket is proposed for 
[FLIP-382|https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[RESULT][VOTE] FLIP-382: Unify the Provision of Diverse Metadata for Context-like APIs

2023-12-20 Thread Wencong Liu
The vote for FLIP-382 [1]: Unify the Provision of Diverse Metadata for 
Context-like APIs
 (discussion thread [2]) has concluded, and the vote thread [3] is now closed.

There are 3 binding votes and 1 non-binding vote:

Xintong Song (binding)
Lijie Wang (binding)
Weijie Guo (binding)
Yuxin Tan (non-binding)

There were no -1 votes. Therefore, FLIP-382 was accepted. I will prepare
the necessary changes for Flink 1.19.

Thanks everyone for the discussion!

Wencong Liu

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs
[2] https://lists.apache.org/thread/3mgsc31odtpmzzl32s4oqbhlhxd0mn6b
[3] https://lists.apache.org/thread/5vlf3klv131x8oj45qohvg9c53qkd87c

[RESULT][VOTE] FLIP-380: Support Full Partition Processing On Non-keyed DataStream

2023-12-20 Thread Wencong Liu
The vote for FLIP-380 [1]: Support Full Partition Processing On Non-keyed 
DataStream
 (discussion thread [2]) has concluded, and the vote thread [3] is now closed.

There are 3 binding votes and 1 non-binding vote:

Xintong Song (binding)
YingJie Cao (binding)
Weijie Guo (binding)
Yuxin Tan (non-binding)

There were no -1 votes. Therefore, FLIP-380 was accepted. I will prepare
the necessary changes for Flink 1.19.

Thanks everyone for the discussion!

Wencong Liu

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-380%3A+Support+Full+Partition+Processing+On+Non-keyed+DataStream
[2] https://lists.apache.org/thread/nn7myj7vsvytbkdrnbvj5h0homsjrn1h
[3] https://lists.apache.org/thread/ns1my6ydctjfl9w89hm8gvldh00lqtq3

[VOTE] FLIP-380: Support Full Partition Processing On Non-keyed DataStream

2023-12-14 Thread Wencong Liu
Hi dev,

I'd like to start a vote on FLIP-380.

Discussion thread: 
https://lists.apache.org/thread/nn7myj7vsvytbkdrnbvj5h0homsjrn1h
FLIP: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-380%3A+Support+Full+Partition+Processing+On+Non-keyed+DataStream

Best regards,
Wencong Liu

[VOTE] FLIP-382: Unify the Provision of Diverse Metadata for Context-like APIs

2023-12-14 Thread Wencong Liu
Hi dev,

I'd like to start a vote on FLIP-382.

Discussion thread: 
https://lists.apache.org/thread/3mgsc31odtpmzzl32s4oqbhlhxd0mn6b
FLIP: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs

Best regards,
Wencong Liu

Re:Re: [DISCUSS][2.0] FLIP-382: Unify the Provision of Diverse Metadata for Context-like APIs

2023-12-13 Thread Wencong Liu
Hi Lijie,

Thank you for the response and for supporting the proposal.

Regarding your suggestions:

1. While the getTaskNameWithIndexAndAttemptNumber method may seem
redundant initially, this concatenated string, for example
"MyTask (3/6)#1", serves as an identifier used to denote
a specific Task across multiple components including
TaskExecutor, Task, ChangelogStateBackend, StreamTask, etc.
Hence, offering a dedicated method for this representation
is quite practical (see the small sketch after these points). WDYT?

2. Good point. I will indicate the replacement methods in the
note of deprecated methods to provide clearer guidance.
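
Regarding point 1, here is a tiny sketch (with made-up values) of how that
identifier is composed from the individual getters:

public class TaskNameSketch {
    public static void main(String[] args) {
        String taskName = "MyTask";
        int subtaskIndex = 2;   // zero-based, so printed as 3
        int parallelism = 6;
        int attemptNumber = 1;
        // prints "MyTask (3/6)#1"
        System.out.println(String.format(
                "%s (%d/%d)#%d", taskName, subtaskIndex + 1, parallelism, attemptNumber));
    }
}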

Please let me know if you have any further questions.

Best regards,
Wencong

















At 2023-12-12 14:57:53, "Lijie Wang"  wrote:
>Hi Wencong
>
>Thanks for driving the discussion, +1 for the proposal. I left two minor
>questions/suggestions:
>
>1. Is the getTaskNameWithIndexAndAttemptNumber method a bit redundant? It
>can be replaced by getTaskName + getTaskIndex + getAttemptNumber.
>2. I think it would be better if we can explicitly specify the alternative
>(based on TaskInfo/JobInfo) for each deprecated method
>
>Best,
>Lijie
>
>Wencong Liu  wrote on Thu, Nov 30, 2023 at 14:50:
>
>> Hi devs,
>>
>> I would like to start a discussion on FLIP-382: Unify the Provision
>> of Diverse Metadata for Context-like APIs [1].
>>
>> In the Flink project, the context-like APIs are interfaces annotated by
>> @Public and supply runtime metadata and functionalities to its modules and
>> components. RuntimeContext is such an interface with 27 methods for
>> accessing metadata and framework functionalities. Currently, any
>> addition of metadata requires updating the RuntimeContext interface
>> and all 12 of its implementation classes, leading to high code
>> maintenance costs. To improve this, we propose to categorize all
>> metadata into metadata classes and provide it through dedicated
>> methods in context-like APIs. Newly provided metadata in a context-like
>> API will then only require updating the metadata classes, not the
>> context-like API itself or its implementations.
>>
>> Looking forward to your feedback.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs
>>
>> Best regards,
>> Wencong Liu


Re:Re: [DISCUSS] FLIP-380: Support Full Partition Processing On Non-keyed DataStream

2023-12-12 Thread Wencong Liu
Dear Weijie,

Thanks for your feedback and the +1 on this feature.

In response to your query, you are indeed correct. The PartitionWindowedStream's
upstream operator must utilize a Forward-type partitioner. Consequently, 
the framework will apply a POINTWISE edge connection pattern between 
the upstream operator and the operators within the PartitionWindowedStream. 
As the parallelism of the upstream operator and the operators in the 
PartitionWindowedStream is the same, the semantics of the 'xxxPartition' 
operations can be guaranteed.

Additionally, I'll ensure that this explanation is updated in the FLIP [1].

I hope this clarification addresses your concern, and I'm glad to assist
if you have any more questions or insights to share.

Best regards,

Wencong
[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-380%3A+Support+Full+Partition+Processing+On+Non-keyed+DataStream
















At 2023-12-12 15:08:31, "weijie guo"  wrote:
>Thanks Wencong for driving this!
>
>I believe this is a useful feature, so +1 from my side.
>
>I only have one minor question about the exchange mode of `xxxPartition`
>method. Does this means the window operator must be connected to the
>upstream operator in forward edge (otherwise the concept of mapPartition is
>a bit far-fetched).
>
>Best regards,
>
>Weijie
>
>
>Wencong Liu  wrote on Fri, Dec 1, 2023 at 14:04:
>
>> Hi devs,
>>
>> I'm excited to propose a new FLIP[1] aimed at enhancing the DataStream API
>>
>> to support full window processing on non-keyed streams. This feature
>> addresses
>> the current limitation where non-keyed DataStreams cannot accumulate
>> records
>> per subtask for collective processing at the end of input.
>>
>> Key proposals include:
>>
>>
>> 1. Introduction of PartitionWindowedStream allowing non-keyed DataStreams
>> to
>> be transformed for full window processing per subtask.
>>
>> 2. Addition of four new APIs - mapPartition, sortPartition, aggregate, and
>> reduce
>> - to enable powerful operations on PartitionWindowedStream.
>>
>> This initiative seeks to fill the gap left by the deprecation of the
>> DataSet API,
>> marrying its partition processing strengths with the dynamic capabilities
>> of the DataStream API.
>>
>> Looking forward to your feedback on this FLIP.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-380%3A+Support+Full+Partition+Processing+On+Non-keyed+DataStream
>>
>> Best regards,
>> Wencong Liu


Re:Re: [DISCUSS] FLIP-380: Support Full Partition Processing On Non-keyed DataStream

2023-12-12 Thread Wencong Liu
Dear Yuxin,


Thank you for your support and for raising those insightful
queries.


Indeed, full window processing aligns more naturally 
with batch scenarios. The primary reason we've limited 
it to batch processing is state management 
considerations in a streaming context. If we were to extend 
support to streaming mode, the state backend would have to 
maintain all the data received from the start to the finish 
of a task's computation. Such an approach would lead to reduced 
computational efficiency and could impact job stability. 
In scenarios where full window processing is applied, the user is 
typically unconcerned with processing latency. Consequently, batch 
mode is often favored for executing full window operations to get 
higher throughput, as it eliminates the need for frequent checkpoints.


Additionally, the full window operators can perform failover 
similarly to the current batch mode execution. In this setup, 
intermediate results produced by an operator can be persisted 
in the shuffle service. In the event of a failover, 
downstream operators can then re-read the data from 
the shuffle service.


I appreciate your feedback on the API implementation 
section formatting. I will ensure that the text is updated.

Best regards,
Wencong














At 2023-12-12 15:48:23, "Yuxin Tan"  wrote:
>Thanks Wencong for driving this FLIP.
>
>+1 from my side. It appears to significantly improve the handling
>of full-window data within the DataStream API. However, I do
>have a small question regarding the current limitation to batch
>processing: does this stem from performance-related considerations?
>Additionally, is there any possibility that support for streaming
>in the future?
>
>In addition, some format of the section `API implementation` is
>not right (some lines have exceeded the text box), maybe we
>can update and fix it.
>
>Best,
>Yuxin
>
>
>weijie guo  wrote on Tue, Dec 12, 2023 at 15:09:
>
>> Thanks Wencong for driving this!
>>
>> I believe this is a useful feature, so +1 from my side.
>>
>> I only have one minor question about the exchange mode of `xxxPartition`
>> method. Does this means the window operator must be connected to the
>> upstream operator in forward edge (otherwise the concept of mapPartition is
>> a bit far-fetched).
>>
>> Best regards,
>>
>> Weijie
>>
>>
>> Wencong Liu  wrote on Fri, Dec 1, 2023 at 14:04:
>>
>> > Hi devs,
>> >
>> > I'm excited to propose a new FLIP[1] aimed at enhancing the DataStream
>> API
>> >
>> > to support full window processing on non-keyed streams. This feature
>> > addresses
>> > the current limitation where non-keyed DataStreams cannot accumulate
>> > records
>> > per subtask for collective processing at the end of input.
>> >
>> > Key proposals include:
>> >
>> >
>> > 1. Introduction of PartitionWindowedStream allowing non-keyed DataStreams
>> > to
>> > be transformed for full window processing per subtask.
>> >
>> > 2. Addition of four new APIs - mapPartition, sortPartition, aggregate,
>> and
>> > reduce
>> > - to enable powerful operations on PartitionWindowedStream.
>> >
>> > This initiative seeks to fill the gap left by the deprecation of the
>> > DataSet API,
>> > marrying its partition processing strengths with the dynamic capabilities
>> > of the DataStream API.
>> >
>> > Looking forward to your feedback on this FLIP.
>> >
>> > [1]
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-380%3A+Support+Full+Partition+Processing+On+Non-keyed+DataStream
>> >
>> > Best regards,
>> > Wencong Liu
>>


[DISCUSS] FLIP-380: Support Full Partition Processing On Non-keyed DataStream

2023-11-30 Thread Wencong Liu
Hi devs,

I'm excited to propose a new FLIP[1] aimed at enhancing the DataStream API

to support full window processing on non-keyed streams. This feature addresses
the current limitation where non-keyed DataStreams cannot accumulate records
per subtask for collective processing at the end of input.

Key proposals include:


1. Introduction of PartitionWindowedStream allowing non-keyed DataStreams to
be transformed for full window processing per subtask.

2. Addition of four new APIs - mapPartition, sortPartition, aggregate, and 
reduce
- to enable powerful operations on PartitionWindowedStream.

This initiative seeks to fill the gap left by the deprecation of the DataSet 
API,
marrying its partition processing strengths with the dynamic capabilities
of the DataStream API.

Looking forward to your feedback on this FLIP.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-380%3A+Support+Full+Partition+Processing+On+Non-keyed+DataStream

Best regards,
Wencong Liu

[DISCUSS][2.0] FLIP-382: Unify the Provision of Diverse Metadata for Context-like APIs

2023-11-29 Thread Wencong Liu
Hi devs,

I would like to start a discussion on FLIP-382: Unify the Provision
of Diverse Metadata for Context-like APIs [1].

In the Flink project, the context-like APIs are interfaces annotated by
@Public and supply runtime metadata and functionalities to its modules and
components. RuntimeContext is such an interface with 27 methods for
accessing metadata and framework functionalities. Currently, any
addition of metadata requires updating the RuntimeContext interface
and all 12 of its implementation classes, leading to high code
maintenance costs. To improve this, we propose to categorize all
metadata into metadata classes and provide it through dedicated
methods in context-like APIs. Newly provided metadata in a context-like
API will then only require updating the metadata classes, not the
context-like API itself or its implementations.
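
To make the proposed pattern concrete, here is a minimal sketch (the interface
and method names below are illustrative only; the exact classes are defined in
the FLIP):

// Today: one getter per metadata field, so every new field touches the
// context interface and all of its implementations, e.g.
//     int getIndexOfThisSubtask();
//     int getAttemptNumber();

// Proposed direction: group the metadata into dedicated classes, so new
// fields only touch the metadata classes.
interface TaskInfoSketch {
    String getTaskName();
    int getIndexOfThisSubtask();
    int getAttemptNumber();
}

interface JobInfoSketch {
    String getJobName();
}

interface RuntimeContextSketch {
    TaskInfoSketch getTaskInfo();
    JobInfoSketch getJobInfo();
}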

Looking forward to your feedback.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs

Best regards,
Wencong Liu

Re:[DISCUSS] FLIP-391: Deprecate RuntimeContext#getExecutionConfig

2023-11-17 Thread Wencong Liu
Hello Junrui,


Thanks for the effort. I agree with the proposal to deprecate the 
getExecutionConfig() method in the RuntimeContext class. Exposing
the complex ExecutionConfig to user-defined functions can lead to 
unnecessary complexity and risks.
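
For instance, here is a minimal sketch of what a user function could look like
after the change, assuming the three replacement getters listed in your
proposal below (the names are taken from that list, not from released code):

import java.util.Map;
import org.apache.flink.api.common.functions.RichMapFunction;

public class MyMapper extends RichMapFunction<String, String> {
    @Override
    public String map(String value) {
        // Before: getRuntimeContext().getExecutionConfig().isObjectReuseEnabled()
        // After (proposed getters, per FLIP-391; not in released Flink yet):
        boolean objectReuse = getRuntimeContext().isObjectReuseEnabled();
        Map<String, String> params = getRuntimeContext().getGlobalJobParameters();
        return value + "|reuse=" + objectReuse + "|params=" + params.size();
    }
}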


I also have a suggestion. We could consider reviewing the existing
methods in ExecutionConfig. If there are methods that are defined
in ExecutionConfig but currently have no callers, we could consider
annotating them as @Internal or directly removing them. Since
users are no longer able to access and invoke these methods,
it would be beneficial to clean up the codebase.


+1 (non-binding).


Best,
Wencong



















At 2023-11-15 16:51:15, "Junrui Lee"  wrote:
>Hi all,
>
>I'd like to start a discussion of FLIP-391: Deprecate
>RuntimeContext#getExecutionConfig[1].
>
>Currently, the FLINK RuntimeContext is important for connecting user
>functions to the underlying runtime details. It provides users with
>necessary runtime information during job execution.
>However, he current implementation of the FLINK RuntimeContext exposes the
>ExecutionConfig to users, resulting in two issues:
>Firstly, the ExecutionConfig contains much unrelated information that can
>confuse users and complicate management.
>Secondly, exposing the ExecutionConfig allows users to modify it during job
>execution, which can cause inconsistencies and problems, especially with
>operator chaining.
>
>Therefore, we propose deprecating the RuntimeContext#getExecutionConfig in
>the FLINK RuntimeContext. In the upcoming FLINK-2.0 version, we plan to
>completely remove the RuntimeContext#getExecutionConfig method. Instead, we
>will introduce alternative getter methods that enable users to access
>specific information without exposing unnecessary runtime details. These
>getter methods will include:
>
>1. @PublicEvolving  TypeSerializer
>createSerializer(TypeInformation typeInformation);
>2. @PublicEvolving Map getGlobalJobParameters();
>3. @PublicEvolving boolean isObjectReuseEnabled();
>
>Looking forward to your feedback and suggestions, thanks.
>
>[1]
>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278465937
>
>Best regards,
>Junrui


Re:[DISCUSS] FLIP-381: Deprecate configuration getters/setters that return/set complex Java objects

2023-11-03 Thread Wencong Liu
Thanks Junrui for your effort!

Making all configuration code paths lead to ConfigOption is a more standardized 
approach to configuring Flink applications.
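
For example, a minimal sketch (the option keys are chosen only for illustration)
of the single-Configuration, ConfigOption-based style this FLIP pushes towards,
instead of per-object setters:

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.ExecutionOptions;
import org.apache.flink.configuration.PipelineOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ConfigOptionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Instead of env.getConfig().enableObjectReuse() and similar setters,
        // everything flows through ConfigOption keys in one Configuration.
        conf.set(ExecutionOptions.RUNTIME_MODE, RuntimeExecutionMode.BATCH);
        conf.set(PipelineOptions.OBJECT_REUSE, true);

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        env.fromElements(1, 2, 3).print();
        env.execute("config-option-sketch");
    }
}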

+1 for this proposal.

Best,
Wencong Liu














At 2023-11-02 10:10:14, "Junrui Lee"  wrote:
>Hi devs,
>
>I would like to start a discussion on FLIP-381: Deprecate configuration
>getters/setters that return/set complex Java objects[1].
>
>Currently, the job configuration in FLINK is spread out across different
>components, which leads to inconsistencies and confusion. To address this
>issue, it is necessary to migrate non-ConfigOption complex Java objects to
>use ConfigOption and adopt a single Configuration object to host all the
>configuration.
>However, there is a significant blocker in implementing this solution.
>These complex Java objects in StreamExecutionEnvironment, CheckpointConfig,
>and ExecutionConfig have already been exposed through the public API,
>making it challenging to modify the existing implementation.
>
>Therefore, I propose to deprecate these Java objects and their
>corresponding getter/setter interfaces, ultimately removing them in
>FLINK-2.0.
>
>Your feedback and thoughts on this proposal are highly appreciated.
>
>Best regards,
>Junrui Lee
>
>[1]
>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278464992


[jira] [Created] (FLINK-33445) Translate DataSet migration guideline to Chinese

2023-11-02 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-33445:
---

 Summary: Translate DataSet migration guideline to Chinese
 Key: FLINK-33445
 URL: https://issues.apache.org/jira/browse/FLINK-33445
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.19.0
Reporter: Wencong Liu
 Fix For: 1.19.0


The [FLINK-33041|https://issues.apache.org/jira/browse/FLINK-33041] ticket about 
adding an introduction on how to migrate the DataSet API to DataStream has been 
merged into the master branch. Here is the link on the Flink website: [How to 
Migrate from DataSet to DataStream | Apache 
Flink|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/dataset_migration/]

According to the [contribution 
guidelines|https://flink.apache.org/how-to-contribute/contribute-documentation/#chinese-documentation-translation],
 we should add an identical markdown file in {{content.zh/}} and translate it 
to Chinese. Any community volunteers are welcome to take this task.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re:[VOTE] FLIP-366: Support standard YAML for FLINK configuration

2023-10-12 Thread Wencong Liu
+1(non-binding)

Best,
Wencong Liu















At 2023-10-13 10:12:06, "Junrui Lee"  wrote:
>Hi all,
>
>Thank you to everyone for the feedback on FLIP-366[1]: Support standard
>YAML for FLINK configuration in the discussion thread [2].
>I would like to start a vote for it. The vote will be open for at least 72
>hours (excluding weekends, unless there is an objection or an insufficient
>number of votes).
>
>Thanks,
>Junrui
>
>[1]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-366%3A+Support+standard+YAML+for+FLINK+configuration
>[2]https://lists.apache.org/thread/qfhcm7h8r5xkv38rtxwkghkrcxg0q7k5


[jira] [Created] (FLINK-33144) Deprecate Iteration API in DataStream

2023-09-24 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-33144:
---

 Summary: Deprecate Iteration API in DataStream
 Key: FLINK-33144
 URL: https://issues.apache.org/jira/browse/FLINK-33144
 Project: Flink
  Issue Type: Technical Debt
  Components: API / DataStream
Affects Versions: 1.19.0
Reporter: Wencong Liu
 Fix For: 1.19.0


Currently, the Iteration API of DataStream is incomplete. For instance, it 
lacks support for iteration in sync mode and exactly-once semantics. 
Additionally, it does not offer the ability to set iteration termination 
conditions. As a result, it's hard for developers to build an iteration 
pipeline with DataStream in practical applications such as machine learning.

[FLIP-176: Unified Iteration to Support 
Algorithms|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=184615300]
 has introduced a unified iteration library in the Flink ML repository. This 
library addresses all the issues present in the Iteration API of DataStream and 
could provide solution for all the iteration use-cases. However, maintaining 
two separate implementations of iteration in both the Flink repository and the 
Flink ML repository would introduce unnecessary complexity and make it 
difficult to maintain the Iteration API.

FLIP-357 has decided to deprecate the Iteration API of DataStream and remove it 
completely in the next major version. In the future, if other modules in the 
Flink repository require the use of the Iteration API, we can consider 
extracting all Iteration implementations from the Flink ML repository into an 
independent module.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[RESULT][VOTE] FLIP-357: Deprecate Iteration API of DataStream

2023-09-24 Thread Wencong Liu
Hi all,

Thanks everyone for your review and the votes!

I am happy to announce that FLIP-357: Deprecate Iteration 
API of DataStream [1] has been accepted.
There are 4 binding votes and 1 non-binding vote [2]:
- Dong Lin (binding)
- Jing Ge (binding)
- Xintong Song (binding)
- Yangze Guo (binding)
- Yuxin Tan (non-binding)

There is no disapproving vote.
Best,
Wencong

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-357%3A+Deprecate+Iteration+API+of+DataStream
[2] https://lists.apache.org/list?dev@flink.apache.org:lte=1M:Deprecate

Re:Re: [ANNOUNCE] Release 1.18.0, release candidate #0

2023-09-20 Thread Wencong Liu
Hi, dear community and release managers,

Thanks for bringing this up.

When testing release candidate #0 for the batch scenario, I found an issue of
frequent flushing in Hybrid shuffle. It is a new bug introduced in 1.18 and may
significantly impact the performance of shuffle writing.

The fix ticket FLINK-33044 [1] has been reviewed and approved. To prevent any
performance issues, I would like to backport the fix to version 1.18 unless
there are any objections.

Glad to hear your suggestions.

[1] https://issues.apache.org/jira/browse/FLINK-33044

Best,
Yuxin Tan








At 2023-09-19 17:23:02, "Zakelly Lan"  wrote:
>Hi Yuan and Jing,
>
>Thank you for sharing your thoughts. I completely agree that it is our
>top priority to ensure that there are no regressions from the last
>commit the previous benchmark pipeline covered to the final commit of
>this release. I will try to get this result first.
>
>
>Best,
>Zakelly
>
>On Tue, Sep 19, 2023 at 4:55 PM Jing Ge  wrote:
>>
>> Hi
>>
>> Thanks Zakelly and Yuan for your effort and update. Since we changed the
>> hardware, IMHO, if we are able to reach a consensus in the community that
>> there is no regression with the benchmarks, we could consider releasing rc1
>> without waiting for the new baseline scores which might take days.
>>
>> Best regards,
>> Jing
>>
>> On Tue, Sep 19, 2023 at 10:42 AM Yuan Mei  wrote:
>>
>> > Hey Zakelly,
>> >
>> > Thanks very much for the efforts to re-build the entire benchmark
>> > environment.
>> >
>> > As long as we have
>> > 1) the pipeline set up and ready (no need for the entire portal ready),
>> > 2) get benchmark comparison numbers (comparing with the commit just before
>> > the benchmark pipeline is down) and
>> > 3) confirmed no-regression, it should be good enough.
>> >
>> > Thanks again!
>> >
>> > Best
>> > Yuan
>> >
>> > On Tue, Sep 19, 2023 at 4:26 PM Zakelly Lan  wrote:
>> >
>> > > Hi everyone,
>> > >
>> > > I am working on rebuilding the benchmark pipeline and it's almost
>> > > done. However, due to the change in machines for benchmarking, I will
>> > > need a few more days to run tests and gather the baseline scores for
>> > > further comparison. Once the pipeline is fully ready, we will proceed
>> > > with the performance test for release 1.18.0.
>> > >
>> > > Please let me know if you have any concerns. Thank you all for your
>> > > patience.
>> > >
>> > > Best,
>> > > Zakelly
>> > >
>> > > On Mon, Sep 18, 2023 at 6:57 PM Jing Ge 
>> > > wrote:
>> > > >
>> > > > Hi everyone,
>> > > >
>> > > > The RC0 for Apache Flink 1.18.0 has been created. This RC is currently
>> > > for
>> > > > preview only to facilitate the integrated testing since the benchmark
>> > > tests
>> > > > are not available yet[1] and the release announcement is still under
>> > > > review. The RC1 will be released after all benchmarks tests are passed.
>> > > The
>> > > > related voting process will be triggered once the announcement is
>> > ready.
>> > > > The RC0 has all the artifacts that we would typically have for a
>> > release,
>> > > > except for the release note and the website pull request for the
>> > release
>> > > > announcement.
>> > > >
>> > > > The following contents are available for your review:
>> > > >
>> > > > - The preview source release and binary convenience releases [2], which
>> > > > are signed with the key with fingerprint 96AE0E32CBE6E0753CE6 [3].
>> > > > - all artifacts that would normally be deployed to the Maven
>> > > > Central Repository [4].
>> > > > - source code tag "release-1.18.0-rc0" [5]
>> > > >
>> > > > Your help testing the release will be greatly appreciated! And we'll
>> > > > create the rc1 release and the voting thread as soon as all the efforts
>> > > are
>> > > > finished.
>> > > >
>> > > > [1] https://issues.apache.org/jira/browse/FLINK-33052
>> > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.18.0-rc0/
>> > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> > > > [4]
>> > > https://repository.apache.org/content/repositories/orgapacheflink-1656/
>> > > > [5] https://github.com/apache/flink/releases/tag/release-1.18.0-rc0
>> > > >
>> > > > Best regards,
>> > > > Qingsheng, Sergei, Konstantin and Jing
>> > >
>> >


Re:Re: [DISCUSS] FLIP-331: Support EndOfStreamWindows and isOutputOnEOF operator attribute to optimize task deployment

2023-09-16 Thread Wencong Liu
Hi Dong & Jinhao,

Thanks for your clarification! +1

Best regards,
Wencong

















At 2023-09-15 11:26:16, "Dong Lin"  wrote:
>Hi Wencong,
>
>Thanks for your comments! Please see my reply inline.
>
>On Thu, Sep 14, 2023 at 12:30 PM Wencong Liu  wrote:
>
>> Dear Dong,
>>
>> I have thoroughly reviewed the proposal for FLIP-331 and believe it would
>> be
>> a valuable addition to Flink. However, I do have a few questions that I
>> would
>> like to discuss:
>>
>>
>> 1. The FLIP-331 proposed the EndOfStreamWindows that is implemented by
>> TimeWindow with maxTimestamp = (Long.MAX_VALUE - 1), which naturally
>> supports WindowedStream and AllWindowedStream to process all records
>> belonging to a key in a 'global' window under both STREAMING and BATCH
>> runtime execution mode.
>>
>>
>> However, besides coGroup and keyBy().aggregate(), other operators on
>> WindowedStream and AllWindowedStream, such as join/reduce, etc, currently
>> are still implemented based on WindowOperator.
>>
>>
>> In fact, these operators can also be implemented without using
>> WindowOperator
>> to prevent additional WindowAssigner#assignWindows or
>> triggerContext#onElement
>> invocation cost. Will there be plans to support these operators in the
>> future?
>>
>
>You are right. The EndOfStreamWindows proposed in this FLIP can potentially
>benefit any DataStream API that takes WindowAssigner as parameters. This
>can involve more operations than aggregate and co-group.
>
>And yes, we have plans to take advantage of this API to optimize these
>operators in the future. This FLIP focuses on the introduction of the
>public APIs and uses aggregate/co-group as the first two examples to
>show-case the performance benefits.
>
>I have added a "Analysis of APIs affected by this FLIP" to list the
>DataStream APIs that can benefit from this FLIP. Would this answer your
>question?
>
>
>>
>> 2. When using EndOfStreamWindows, upstream operators no longer support
>> checkpointing. This limit may be too strict, especially when dealing with
>> bounded data in streaming runtime execution mode, where checkpointing
>> can still be useful.
>>
>
>I am not sure we have a good way to support checkpoint while still
>achieving the performance improves targeted by this FLIP.
>
>The issue here is that if we support checkpoint, then we can not take
>advantage of algorithms (e.g. sorting inputs using ExternalSorter) that are
>not compatible with checkpoints. These algorithms (which do not support
>checkpoint) are the main reasons why batch mode currently significantly
>outperforms stream mode in doing aggregation/cogroup etc.
>
>In most cases where the user does not care about processing latency, it is
>generally preferred to use batch mode to perform aggregation operations
>(which should be 10X faster than the existing stream mode performance)
>instead of doing checkpoint.
>
>Also note that we can still let operators perform failover in the same as
>the existing batch mode execution, where the intermediate results (produced
>by one operator) can be persisted in shuffle service and downstream
>operators can re-read those data from shuffle service after failover.
>
>
>>
>> 3. The proposal mentions that if a transformation has isOutputOnEOF ==
>> true, the
>> operator as well as its upstream operators will be executed in 'batch
>> mode' with
>> checkpointing disabled. I would like to understand the specific
>> implications of this
>> 'batch mode' and if there are any other changes associated with it?
>
>
>Good point. We should explicitly mention the changes. I have updated the
>FLIP to clarify this.
>
>More specifically, the checkpoint is disabled when these operators are
>running, such that these operators can do operations not compatible with
>checkpoints (e.g. sorting inputs). And operators should re-read the data
>from the upstream blocking edge or sources after failover.
>
>Would this answer your question?
>
>
>>
>> Additionally, I am curious to know if this 'batch mode' conflicts with the
>> 'mix mode'
>>
>> described in FLIP-327. While the coGroup and keyBy().aggregate() operators
>> on
>> EndOfStreamWindows have the attribute 'isInternalSorterSupported' set to
>> true,
>> indicating support for the 'mixed mode', they also have isOutputOnEOF set
>> to true,
>> which suggests that the upstream operators should be executed in 'batch
>> mode'.
>> Will the 'mixed mode' be ignored when in 'batch mode'? I would appreciate
>> any
>> c

[VOTE] FLIP-357: Deprecate Iteration API of DataStream

2023-09-13 Thread Wencong Liu
Hi dev, 


I'd like to start a vote on FLIP-357.


Discussion thread: 
https://lists.apache.org/thread/shf77phc0wzlbj06jsfj3nclxnm2mrv5
FLIP: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-357%3A+Deprecate+Iteration+API+of+DataStream


Best regards,
Wencong Liu

Re:[DISCUSS] FLIP-331: Support EndOfStreamWindows and isOutputOnEOF operator attribute to optimize task deployment

2023-09-13 Thread Wencong Liu
Dear Dong,

I have thoroughly reviewed the proposal for FLIP-331 and believe it would be 
a valuable addition to Flink. However, I do have a few questions that I would 
like to discuss:


1. FLIP-331 proposes EndOfStreamWindows, which is implemented as a TimeWindow 
with maxTimestamp = (Long.MAX_VALUE - 1) and naturally allows WindowedStream 
and AllWindowedStream to process all records belonging to a key in a 'global' 
window under both STREAMING and BATCH runtime execution modes.


However, besides coGroup and keyBy().aggregate(), other operators on 
WindowedStream and AllWindowedStream, such as join/reduce, etc., are currently 
still implemented based on WindowOperator.


In fact, these operators could also be implemented without WindowOperator to 
avoid the additional WindowAssigner#assignWindows or triggerContext#onElement 
invocation cost. Are there plans to support these operators in the future?
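
To make the EndOfStreamWindows behavior above concrete, here is a purely
illustrative sketch (not the FLIP's actual class): every record is assigned to
a single TimeWindow whose maxTimestamp is Long.MAX_VALUE - 1, so an event-time
trigger only fires once the final watermark arrives at end of input.

import java.util.Collection;
import java.util.Collections;
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.WindowAssigner;
import org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class EndOfStreamWindowsSketch extends WindowAssigner<Object, TimeWindow> {

    // maxTimestamp() of this window is Long.MAX_VALUE - 1.
    private static final TimeWindow GLOBAL =
            new TimeWindow(Long.MIN_VALUE, Long.MAX_VALUE);

    @Override
    public Collection<TimeWindow> assignWindows(
            Object element, long timestamp, WindowAssignerContext context) {
        return Collections.singletonList(GLOBAL);
    }

    @Override
    public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return EventTimeTrigger.create();
    }

    @Override
    public TypeSerializer<TimeWindow> getWindowSerializer(ExecutionConfig executionConfig) {
        return new TimeWindow.Serializer();
    }

    @Override
    public boolean isEventTime() {
        return true;
    }
}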


2. When using EndOfStreamWindows, upstream operators no longer support 
checkpointing. This limit may be too strict, especially when dealing with 
bounded data in streaming runtime execution mode, where checkpointing 
can still be useful.

3. The proposal mentions that if a transformation has isOutputOnEOF == true, 
the operator as well as its upstream operators will be executed in 'batch mode' 
with checkpointing disabled. I would like to understand the specific 
implications of this 'batch mode', and whether there are any other changes 
associated with it.

Additionally, I am curious to know whether this 'batch mode' conflicts with the 
'mixed mode' described in FLIP-327. While the coGroup and keyBy().aggregate() 
operators on EndOfStreamWindows have the attribute 'isInternalSorterSupported' 
set to true, indicating support for the 'mixed mode', they also have 
isOutputOnEOF set to true, which suggests that the upstream operators should be 
executed in 'batch mode'. Will the 'mixed mode' be ignored when in 'batch mode'? 
I would appreciate any clarification on this matter.

Thank you for taking the time to consider my feedback. I eagerly await your 
response.

Best regards,

Wencong Liu











At 2023-09-01 11:21:47, "Dong Lin"  wrote:
>Hi all,
>
>Jinhao (cc'ed) and I are opening this thread to discuss FLIP-331: Support
>EndOfStreamWindows and isOutputOnEOF operator attribute to optimize task
>deployment. The design doc can be found at
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-331%3A+Support+EndOfStreamWindows++and+isOutputOnEOF+operator+attribute+to+optimize+task+deployment
>.
>
>This FLIP introduces isOutputOnEOF operator attribute that JobManager can
>use to optimize task deployment and resource utilization. In addition, it
>also adds EndOfStreamWindows that can be used with the DataStream APIs
>(e.g. cogroup, aggregate) to significantly increase throughput and reduce
>resource utilization.
>
>We would greatly appreciate any comment or feedback you may have on this
>proposal.
>
>Cheers,
>Dong


[jira] [Created] (FLINK-33041) Add an introduction about how to migrate DataSet API to DataStream

2023-09-05 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-33041:
---

 Summary: Add an introduction about how to migrate DataSet API to 
DataStream
 Key: FLINK-33041
 URL: https://issues.apache.org/jira/browse/FLINK-33041
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.18.0
Reporter: Wencong Liu
 Fix For: 1.18.0


The DataSet API has been formally deprecated and will no longer receive active 
maintenance and support. It will be removed in the Flink 2.0 version. Flink 
users are recommended to migrate from the DataSet API to the DataStream API, 
Table API and SQL for their data processing requirements.

Most of the DataSet operators can be implemented using the DataStream API. 
However, we believe it would be beneficial to have an introductory article on 
the Flink website that guides users in migrating their DataSet jobs to 
DataStream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re:Re: [DISCUSS] FLIP-357: Deprecate Iteration API of DataStream

2023-09-01 Thread Wencong Liu
Hi Jing,


Thanks for your reply!


> Or the "independent module extraction" mentioned in the FLIP does mean an
independent module in Flink?


Yes. If there are submodules in the Flink repository that need the iteration 
library (currently there are none), we could consider extracting it into a new 
submodule of Flink.


> users will have to add one more dependency of Flink ML. If iteration is the
only feature they need, it will look a little bit weird.


If users only need to execute iteration jobs, they can simply remove the Flink 
dependency and add the necessary dependencies related to Flink ML. However, 
they can still utilize the DataStream API as it is also a dependency of Flink 
ML.


Keeping an iteration submodule in the Flink repository and making Flink ML 
depend on it is another possible solution. But the current implementation of 
Iteration in DataStream should definitely be removed due to its incompleteness.


The placement of the Iteration API in the repository is a topic that has 
multiple 
potential solutions. WDYT?


Best,
Wencong











At 2023-09-01 17:59:34, "Jing Ge"  wrote:
>Hi Wencong,
>
>Thanks for the proposal!
>
>"The Iteration API in DataStream is planned be deprecated in Flink 1.19 and
>then finally removed in Flink 2.0. For the users that rely on the Iteration
>API in DataStream, they will have to migrate to Flink ML."
>- Does it make sense to migrate the iteration module into Flink directly?
>Or the "independent module extraction" mentioned in the FLIP does mean an
>independent module in Flink? Since the iteration will be removed in Flink,
>users will have to add one more dependency of Flink ML. If iteration is the
>only feature they need, it will look a little bit weird.
>
>
>Best regards,
>Jing
>
>On Fri, Sep 1, 2023 at 11:05 AM weijie guo 
>wrote:
>
>> Thanks, +1 for this.
>>
>> Best regards,
>>
>> Weijie
>>
>>
>> Yangze Guo  wrote on Fri, Sep 1, 2023 at 14:29:
>>
>> > +1
>> >
>> > Thanks for driving this.
>> >
>> > Best,
>> > Yangze Guo
>> >
>> > On Fri, Sep 1, 2023 at 2:00 PM Xintong Song 
>> wrote:
>> > >
>> > > +1
>> > >
>> > > Best,
>> > >
>> > > Xintong
>> > >
>> > >
>> > >
>> > > On Fri, Sep 1, 2023 at 1:11 PM Dong Lin  wrote:
>> > >
>> > > > Thanks Wencong for initiating the discussion.
>> > > >
>> > > > +1 for the proposal.
>> > > >
>> > > > On Fri, Sep 1, 2023 at 12:00 PM Wencong Liu 
>> > wrote:
>> > > >
>> > > > > Hi devs,
>> > > > >
>> > > > > I would like to start a discussion on FLIP-357: Deprecate Iteration
>> > API
>> > > > of
>> > > > > DataStream [1].
>> > > > >
>> > > > > Currently, the Iteration API of DataStream is incomplete. For
>> > instance,
>> > > > it
>> > > > > lacks support
>> > > > > for iteration in sync mode and exactly once semantics.
>> Additionally,
>> > it
>> > > > > does not offer the
>> > > > > ability to set iteration termination conditions. As a result, it's
>> > hard
>> > > > > for developers to
>> > > > > build an iteration pipeline by DataStream in the practical
>> > applications
>> > > > > such as machine learning.
>> > > > >
>> > > > > FLIP-176: Unified Iteration to Support Algorithms [2] has
>> introduced
>> > a
>> > > > > unified iteration library
>> > > > > in the Flink ML repository. This library addresses all the issues
>> > present
>> > > > > in the Iteration API of
>> > > > > DataStream and could provide solution for all the iteration
>> > use-cases.
>> > > > > However, maintaining two
>> > > > > separate implementations of iteration in both the Flink repository
>> > and
>> > > > the
>> > > > > Flink ML repository
>> > > > > would introduce unnecessary complexity and make it difficult to
>> > maintain
>> > > > > the Iteration API.
>> > > > >
>> > > > > As such I propose deprecating the Iteration API of DataStream and
>> > > > removing
>> > > > > it completely in the next
>> > > > > major version. In the future, if other modules in the Flink
>> > repository
>> > > > > require the use of the
>> > > > > Iteration API, we can consider extracting all Iteration
>> > implementations
>> > > > > from the Flink ML repository
>> > > > > into an independent module.
>> > > > >
>> > > > > Looking forward to your feedback.
>> > > > >
>> > > > >
>> > > > > [1]
>> > > > >
>> > > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-357%3A+Deprecate+Iteration+API+of+DataStream
>> > > > > [2]
>> > > > >
>> > > >
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=184615300
>> > > > >
>> > > > > Best regards,
>> > > > >
>> > > > > Wencong Liu
>> > > >
>> >
>>


[DISCUSS] FLIP-357: Deprecate Iteration API of DataStream

2023-08-31 Thread Wencong Liu
Hi devs,

I would like to start a discussion on FLIP-357: Deprecate Iteration API of 
DataStream [1].

Currently, the Iteration API of DataStream is incomplete. For instance, it
lacks support for iteration in sync mode and exactly-once semantics.
Additionally, it does not offer the ability to set iteration termination
conditions. As a result, it's hard for developers to build an iteration
pipeline with DataStream in practical applications such as machine learning.
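
For context, a minimal sketch (a made-up job) of the Iteration API in question;
note that there is no sync mode, no exactly-once guarantee, and no termination
condition beyond routing records out of the feedback edge:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.IterativeStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IterationSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> source = env.fromSequence(1, 10);

        IterativeStream<Long> iteration = source.iterate();
        DataStream<Long> minusOne = iteration.map(v -> v - 1).returns(Types.LONG);

        // Records > 0 are fed back into the loop; records == 0 leave it.
        iteration.closeWith(minusOne.filter(v -> v > 0));
        minusOne.filter(v -> v <= 0).print();

        env.execute("iteration-sketch");
    }
}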

FLIP-176: Unified Iteration to Support Algorithms [2] has introduced a unified 
iteration library
in the Flink ML repository. This library addresses all the issues present in 
the Iteration API of
DataStream and could provide solution for all the iteration use-cases. However, 
maintaining two
separate implementations of iteration in both the Flink repository and the 
Flink ML repository
would introduce unnecessary complexity and make it difficult to maintain the 
Iteration API.

As such I propose deprecating the Iteration API of DataStream and removing it 
completely in the next
major version. In the future, if other modules in the Flink repository require 
the use of the
Iteration API, we can consider extracting all Iteration implementations from 
the Flink ML repository
into an independent module.

Looking forward to your feedback.


[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-357%3A+Deprecate+Iteration+API+of+DataStream
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=184615300

Best regards,

Wencong Liu

[jira] [Created] (FLINK-32979) Deprecate WindowAssigner#getDefaultTrigger(StreamExecutionEnvironment env)

2023-08-28 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-32979:
---

 Summary: Deprecate 
WindowAssigner#getDefaultTrigger(StreamExecutionEnvironment env)
 Key: FLINK-32979
 URL: https://issues.apache.org/jira/browse/FLINK-32979
 Project: Flink
  Issue Type: Technical Debt
  Components: API / Core
Affects Versions: 1.19.0
Reporter: Wencong Liu
 Fix For: 1.19.0


The 
[FLIP-343|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425229]
 has decided that the parameter in WindowAssigner#getDefaultTrigger() will be 
removed in the next major version. We should deprecate it now and remove it in 
Flink 2.0. The removal will be tracked in 
[FLINK-4675|https://issues.apache.org/jira/browse/FLINK-4675].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32978) Deprecate RichFunction#open(Configuration parameters)

2023-08-28 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-32978:
---

 Summary: Deprecate RichFunction#open(Configuration parameters)
 Key: FLINK-32978
 URL: https://issues.apache.org/jira/browse/FLINK-32978
 Project: Flink
  Issue Type: Technical Debt
  Components: API / Core
Affects Versions: 1.19.0
Reporter: Wencong Liu


The 
[FLIP-344|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231]
 has decided that the parameter in RichFunction#open will be removed in the 
next major version. We should deprecate it now and remove it in Flink 2.0. The 
removal will be tracked in 
[FLINK-6912|https://issues.apache.org/jira/browse/FLINK-6912].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re:[ANNOUNCE] New Apache Flink Committer - Hangxiang Yu

2023-08-07 Thread Wencong Liu
Congratulations, Hangxiang !


Best,
Wencong

















At 2023-08-07 14:57:49, "Yuan Mei"  wrote:
>On behalf of the PMC, I'm happy to announce Hangxiang Yu as a new Flink
>Committer.
>
>Hangxiang has been active in the Flink community for more than 1.5 years
>and has played an important role in developing and maintaining State and
>Checkpoint related features/components, including Generic Incremental
>Checkpoints (take great efforts to make the feature prod-ready). Hangxiang
>is also the main driver of the FLIP-263: Resolving schema compatibility.
>
>Hangxiang is passionate about the Flink community. Besides the technical
>contribution above, he is also actively promoting Flink: talks about Generic
>Incremental Checkpoints in Flink Forward and Meet-up. Hangxiang also spent
>a good amount of time supporting users, participating in Jira/mailing list
>discussions, and reviewing code.
>
>Please join me in congratulating Hangxiang for becoming a Flink Committer!
>
>Thanks,
>Yuan Mei (on behalf of the Flink PMC)


Re:[ANNOUNCE] New Apache Flink Committer - Yanfei Lei

2023-08-07 Thread Wencong Liu
Congratulations, Yanfei !


Best,
Wencong

















At 2023-08-07 14:56:21, "Yuan Mei"  wrote:
>On behalf of the PMC, I'm happy to announce Yanfei Lei as a new Flink
>Committer.
>
>Yanfei has been active in the Flink community for almost two years and has
>played an important role in developing and maintaining State and Checkpoint
>related features/components, including RocksDB Rescaling Performance
>Improvement and Generic Incremental Checkpoints.
>
>Yanfei also helps improve community infrastructure in many ways, including
>migrating the Flink Daily performance benchmark to the Apache Flink slack
>channel. She is the maintainer of the benchmark and has improved its
>detection stability significantly. She is also one of the major maintainers
>of the FrocksDB Repo and released FRocksDB 6.20.3 (part of Flink 1.17
>release). Yanfei is a very active community member, supporting users and
>participating
>in tons of discussions on the mailing lists.
>
>Please join me in congratulating Yanfei for becoming a Flink Committer!
>
>Thanks,
>Yuan Mei (on behalf of the Flink PMC)


[jira] [Created] (FLINK-32770) Fix the inaccurate backlog number of Hybrid Shuffle

2023-08-07 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-32770:
---

 Summary: Fix the inaccurate backlog number of Hybrid Shuffle
 Key: FLINK-32770
 URL: https://issues.apache.org/jira/browse/FLINK-32770
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.18.0
Reporter: Wencong Liu
 Fix For: 1.18.0


The backlog count is inaccurate in both the memory and disk tiers. We should fix it to 
prevent redundant memory usage on the reader side.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re:[ANNOUNCE] New Apache Flink PMC Member - Matthias Pohl

2023-08-04 Thread Wencong Liu
Congratulations, Matthias!

Best,
Wencong Liu

















At 2023-08-04 11:18:00, "Xintong Song"  wrote:
>Hi everyone,
>
>On behalf of the PMC, I'm very happy to announce that Matthias Pohl has
>joined the Flink PMC!
>
>Matthias has been consistently contributing to the project since Sep 2020,
>and became a committer in Dec 2021. He mainly works in Flink's distributed
>coordination and high availability areas. He has worked on many FLIPs
>including FLIP195/270/285. He helped a lot with the release management,
>being one of the Flink 1.17 release managers and also very active in Flink
>1.18 / 2.0 efforts. He also contributed a lot to improving the build
>stability.
>
>Please join me in congratulating Matthias!
>
>Best,
>
>Xintong (on behalf of the Apache Flink PMC)


[jira] [Created] (FLINK-32742) Remove flink-examples-batch module

2023-08-03 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-32742:
---

 Summary: Remove flink-examples-batch module
 Key: FLINK-32742
 URL: https://issues.apache.org/jira/browse/FLINK-32742
 Project: Flink
  Issue Type: Technical Debt
  Components: Examples
Affects Versions: 2.0.0
Reporter: Wencong Liu
 Fix For: 2.0.0


All DataSet APIs will be deprecated in [FLINK-32558], and the examples in the 
flink-examples-batch module should no longer be included in flink-dist. This 
change aims to prevent developers from continuing to use the DataSet API. 

However, it is important to note that the module is still utilized by many 
end-to-end tests. Therefore, we should explore options to remove the examples 
from flink-dist before removing the DataSet API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32741) Remove DataSet related descriptions in doc

2023-08-03 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-32741:
---

 Summary: Remove DataSet related descriptions in doc
 Key: FLINK-32741
 URL: https://issues.apache.org/jira/browse/FLINK-32741
 Project: Flink
  Issue Type: Technical Debt
  Components: Documentation
Affects Versions: 2.0.0
Reporter: Wencong Liu
 Fix For: 2.0.0


Since all DataSet APIs will be deprecated in [FLINK-32558] and we no longer 
recommend that developers use the DataSet API, the DataSet-related descriptions 
should be removed from the documentation after [FLINK-32558].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[RESULT][VOTE] FLIP-347: Remove IOReadableWritable serialization in Path

2023-08-02 Thread Wencong Liu
Hi all,

Thanks everyone for your review and the votes!

I am happy to announce that FLIP-347: Remove
IOReadableWritable serialization in Path [1] has been accepted.

There are 4 binding votes and 2 non-binding votes [2]:
- Weijie Guo (binding)
- Xintong Song (binding)
- Matthias Pohl (binding)
- Jing Ge (binding)
- Yuxin Tan
- Ron Liu

There is no disapproving vote.

Best,
Wencong

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-347%3A+Remove+IOReadableWritable+serialization+in+Path
[2] https://lists.apache.org/thread/99cw4y50xhvc1h9z7v07j5v1krqcxr27

[RESULT][VOTE] FLIP-344: Remove parameter in RichFunction#open

2023-08-02 Thread Wencong Liu
Hi all,

Thanks everyone for your review and the votes!

I am happy to announce that FLIP-344: Remove 
parameter in RichFunction#open [1] has been accepted.

There are 4 binding votes and 2 non-binding votes [2]:
- Weijie Guo (binding)
- Xintong Song (binding)
- Jing Ge (binding)
- Yuxin Tan
- Ron Liu

There is no disapproving vote.

Best,
Wencong

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231
[2] https://lists.apache.org/thread/owq9cwxooty237phbc55c3ko43rw9lww

[RESULT][VOTE] FLIP-343: Remove parameter in WindowAssigner#getDefaultTrigger()

2023-08-02 Thread Wencong Liu
Hi all,

Thanks everyone for your review and the votes!

I am happy to announce that FLIP-343: Remove parameter in
WindowAssigner#getDefaultTrigger() [1] has been accepted.

There are 4 binding votes and 2 non-binding votes [2]:
- Weijie Guo (binding)
- Xintong Song (binding)
- Matthias Pohl (binding)
- Jing Ge (binding)
- Yuxin Tan
- Ron Liu

There is no disapproving vote.

Best,
Wencong

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425229
[2] https://lists.apache.org/thread/gsd79x38chz4zs9v3hsxj4hv70b7rkqj

[jira] [Created] (FLINK-32708) Fix the write logic in remote tier of hybrid shuffle

2023-07-27 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-32708:
---

 Summary: Fix the write logic in remote tier of hybrid shuffle
 Key: FLINK-32708
 URL: https://issues.apache.org/jira/browse/FLINK-32708
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.18.0
Reporter: Wencong Liu
 Fix For: 1.18.0


Currently, on the writer side of the remote tier, the flag file indicating the 
latest segment id is updated first, followed by the creation of the data file. 
This results in an incorrect order of file creation, which we should fix.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re:Re: Re: Re: [DISCUSS][2.0] FLIP-347: Remove IOReadableWritable serialization in Path

2023-07-26 Thread Wencong Liu
Hi Matthias,

Thanks for your reply. Due to my busy schedule, I would like to focus only on 
the `Path` class in FLIP-347 for now. As for the implementations in other 
modules, I will review them later when I have time.


Best regards,
Wencong Liu














At 2023-07-26 18:35:21, "Matthias Pohl"  wrote:
>Correct. I don't have the intention to block this FLIP if it's too much
>effort to expand it. Sorry if that's the message that came across.
>
>On Wed, Jul 26, 2023 at 12:17 PM Xintong Song  wrote:
>
>> I think it worth looking into all implementations of IOReadeableWritable.
>> However, I would not consider that as a concern of this FLIP.
>>
>> An important convention of the open-source community is volunteer work. If
>> Wencong only wants to work on the `Path` case, I think he should not be
>> asked to investigate all other cases.
>>
>> I believe it's not Matthias's intention to put more workload on Wencong.
>> It's just sometimes such requests are not easy to say no.
>>
>> Best,
>>
>> Xintong
>>
>>
>>
>> On Wed, Jul 26, 2023 at 4:14 PM Matthias Pohl
>>  wrote:
>>
>> > Is the time constraint driven by the fact that you wanted to have that
>> > effort being included in 1.18? If so, it looks like that's not possible
>> > based on the decision being made for 1.18 to only allow document changes
>> > [1]. So, there would be actually time to look into it. WDYT?
>> >
>> > [1] https://lists.apache.org/thread/7l1c9ybqgyc1mx7t7tk4wkc1cm8481o9
>> >
>> > On Tue, Jul 25, 2023 at 12:04 PM Junrui Lee  wrote:
>> >
>> > > +1
>> > >
>> > > Best,
>> > > Junrui
>> > >
>> > > Jing Ge  于2023年7月24日周一 23:28写道:
>> > >
>> > > > agree, since we want to try our best to deprecate APIs in 1.18, it
>> > makes
>> > > > sense.
>> > > >
>> > > >
>> > > > Best regards,
>> > > > Jing
>> > > >
>> > > > On Mon, Jul 24, 2023 at 12:11 PM Wencong Liu 
>> > > wrote:
>> > > >
>> > > > > Hi Jing and Matthias,
>> > > > >
>> > > > >
>> > > > > I believe it is reasonable to examine all classes that
>> implement
>> > > the
>> > > > > IOReadableWritable
>> > > > > interface and summarize their actual usage. However, due to time
>> > > > > constraints, I suggest
>> > > > > we minimize the scope of this FLIP to focus on the Path class. As
>> for
>> > > > > other components
>> > > > > that implement IOReadableWritable, we can make an effort to
>> > investigate
>> > > > > them
>> > > > > in the future. WDYT?
>> > > > >
>> > > > >
>> > > > > Best regards,
>> > > > > Wencong Liu
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > At 2023-07-22 00:46:45, "Jing Ge" 
>> > wrote:
>> > > > > >Hi Wencong,
>> > > > > >
>> > > > > >Thanks for the clarification. I got your point. It makes sense.
>> > > > > >
>> > > > > >Wrt IOReadableWritable, the suggestion was to check all classes
>> that
>> > > > > >implemented it, e.g. BlockInfo, Value, Configuration, etc. Not
>> > limited
>> > > > to
>> > > > > >the Path.
>> > > > > >
>> > > > > >Best regards,
>> > > > > >Jing
>> > > > > >
>> > > > > >On Fri, Jul 21, 2023 at 4:31 PM Wencong Liu > >
>> > > > wrote:
>> > > > > >
>> > > > > >> Hello Jing,
>> > > > > >>
>> > > > > >>
>> > > > > >> Thanks for your reply. The URI field should be final and the
>> > > > > >> Path will be immutable.The static method
>> > > deserializeFromDataInputView
>> > > > > >> will create a new Path object instead of replacing the URI field
>> > > > > >>

[VOTE][2.0] FLIP-344: Remove parameter in RichFunction#open

2023-07-26 Thread Wencong Liu
Hi dev, 


I'd like to start a vote on FLIP-344.


Discussion thread: 
https://lists.apache.org/thread/5lyjrrdtwkngkol2t541r4xwoh7133km
FLIP: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231


Best regards, 
Wencong Liu

[VOTE][2.0] FLIP-347: Remove IOReadableWritable serialization in Path

2023-07-26 Thread Wencong Liu
Hi dev, 


I'd like to start a vote on FLIP-347.


Discussion thread: 
https://lists.apache.org/thread/3gcxhnqpsvb85golnlxf9tv5p43xkjgj
FLIP: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-347%3A+Remove+IOReadableWritable+serialization+in+Path


Best regards, 
Wencong Liu

[VOTE][2.0] FLIP-343: Remove parameter in WindowAssigner#getDefaultTrigger()

2023-07-26 Thread Wencong Liu
Hi dev, 


I'd like to start a vote on FLIP-343.


Discussion thread: 
https://lists.apache.org/thread/zn11f460x70nn7f2ckqph41bvx416wxc
FLIP: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425229


Best regards, 
Wencong Liu

Re: [ANNOUNCE] New Apache Flink Committer - Yong Fang

2023-07-24 Thread Wencong Liu
Congratulations!

Best,
Wencong Liu















在 2023-07-24 11:03:30,"Paul Lam"  写道:
>Congrats, Shammon!
>
>Best,
>Paul Lam
>
>> 2023年7月24日 10:56,Jingsong Li  写道:
>> 
>> Shammon
>


Re:Re: [DISCUSS][2.0] FLIP-344: Remove parameter in RichFunction#open

2023-07-24 Thread Wencong Liu
Hi Timo,


Thanks for your reply. I think adding an empty OpenContext to keep the signature 
stable is reasonable. I'll update the FLIP later.


Best,
Wencong Liu

















At 2023-07-24 17:11:44, "Timo Walther"  wrote:
>+1
>
>But instead we should add a OpenContext there to keep the signature 
>stable but still be able to add parameters.
>
>Regards,
>Timo
>
>On 21.07.23 12:24, Jing Ge wrote:
>> +1
>> 
>> On Fri, Jul 21, 2023 at 10:22 AM Yuxin Tan  wrote:
>> 
>>> +1
>>>
>>> Best,
>>> Yuxin
>>>
>>>
>>> Xintong Song  于2023年7月21日周五 12:04写道:
>>>
>>>> +1
>>>>
>>>> Best,
>>>>
>>>> Xintong
>>>>
>>>>
>>>>
>>>> On Fri, Jul 21, 2023 at 10:52 AM Wencong Liu 
>>> wrote:
>>>>
>>>>> Hi devs,
>>>>>
>>>>> I would like to start a discussion on FLIP-344: Remove parameter in
>>>>> RichFunction#open [1].
>>>>>
>>>>> The open() method in RichFunction requires a Configuration instance as
>>> an
>>>>> argument,
>>>>> which is always passed as a new instance without any configuration
>>>>> parameters in
>>>>> AbstractUdfStreamOperator#open. Thus, it is unnecessary to include this
>>>>> parameter
>>>>> in the open() method.
>>>>> As such I propose to remove the Configuration field from
>>>>> RichFunction#open(Configuration parameters).
>>>>> Looking forward to your feedback.
>>>>> [1]
>>>>>
>>>>
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231
>>>>> Best regards,
>>>>>
>>>>>
>>>>> Wencong Liu
>>>>
>>>
>> 


Re:Re: Re: [DISCUSS][2.0] FLIP-347: Remove IOReadableWritable serialization in Path

2023-07-23 Thread Wencong Liu
Hi Jing and Matthias,


I believe it is reasonable to examine all classes that implement the 
IOReadableWritable 
interface and summarize their actual usage. However, due to time constraints, I 
suggest
we minimize the scope of this FLIP to focus on the Path class. As for other 
components
that implement IOReadableWritable, we can make an effort to investigate them
in the future. WDYT?


Best regards,
Wencong Liu











At 2023-07-22 00:46:45, "Jing Ge"  wrote:
>Hi Wencong,
>
>Thanks for the clarification. I got your point. It makes sense.
>
>Wrt IOReadableWritable, the suggestion was to check all classes that
>implemented it, e.g. BlockInfo, Value, Configuration, etc. Not limited to
>the Path.
>
>Best regards,
>Jing
>
>On Fri, Jul 21, 2023 at 4:31 PM Wencong Liu  wrote:
>
>> Hello Jing,
>>
>>
>> Thanks for your reply. The URI field should be final and the
>> Path will be immutable.The static method deserializeFromDataInputView
>> will create a new Path object instead of replacing the URI field
>> in a existed Path Object.
>>
>>
>> For the crossing multiple modules issue, I've explained it in the reply
>> to Matthias.
>>
>>
>> Best regards,
>>
>>
>> Wencong Liu
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> At 2023-07-21 18:05:26, "Jing Ge"  wrote:
>> >Hi Wencong,
>> >
>> >Just out of curiosity, will the newly introduced
>> >deserializeFromDataInputView() method make the Path mutable again?
>> >
>> >What Matthias suggested makes sense, although the extension might make
>> this
>> >FLIP cross multiple modules.
>> >
>> >Best regards,
>> >Jing
>> >
>> >On Fri, Jul 21, 2023 at 10:23 AM Matthias Pohl
>> > wrote:
>> >
>> >> There's a kind-of-related issue FLINK-4758 [1] that proposes removing
>> the
>> >> IOReadableWritable interface from more classes. It was briefly
>> mentioned in
>> >> the must-have work items discussion [2].
>> >>
>> >> I'm not too sure about the usage of IOReadableWritable: ...whether it
>> would
>> >> go away with the removal of the DataSet API in general (the Jira issue
>> has
>> >> DataSet as a component), anyway.
>> >>
>> >> Otherwise, might it make sense to extend the scope of this FLIP?
>> >>
>> >> [1] https://issues.apache.org/jira/browse/FLINK-4758
>> >> [2] https://lists.apache.org/thread/gf0h4gh3xfsj78cpdsxsnj70nhzcmv9r
>> >>
>> >> On Fri, Jul 21, 2023 at 6:04 AM Xintong Song 
>> >> wrote:
>> >>
>> >> > +1
>> >> >
>> >> > Best,
>> >> >
>> >> > Xintong
>> >> >
>> >> >
>> >> >
>> >> > On Fri, Jul 21, 2023 at 10:54 AM Wencong Liu 
>> >> wrote:
>> >> >
>> >> > > Hi devs,
>> >> > >
>> >> > > I would like to start a discussion on FLIP-347: Remove
>> >> IOReadableWritable
>> >> > > serialization in Path [1].
>> >> > >
>> >> > >
>> >> > > The Path class is currently mutable to support IOReadableWritable
>> >> > > serialization. However, many parts
>> >> > > of the code assume that the Path is immutable. By making the Path
>> class
>> >> > > immutable, we can ensure
>> >> > > that paths are stored correctly without the possibility of mutation
>> and
>> >> > > eliminate the occurrence of subtle errors.
>> >> > > As such I propose to modify the Path class to no longer implement
>> the
>> >> > > IOReadableWritable interface.
>> >> > > Looking forward to your feedback.
>> >> > > [1]
>> >> > >
>> >> >
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-347%3A+Remove+IOReadableWritable+serialization+in+Path
>> >> > > Best regards,
>> >> > >
>> >> > >
>> >> > > Wencong Liu
>> >> >
>> >>
>>


Re:Re: [DISCUSS][2.0] FLIP-347: Remove IOReadableWritable serialization in Path

2023-07-21 Thread Wencong Liu
Hello Jing,


Thanks for your reply. The URI field should be final, and the Path will then
be immutable. The static method deserializeFromDataInputView
will create a new Path object instead of replacing the URI field
in an existing Path object.
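
Roughly, the idea could look like the following sketch (illustrative only; the
exact method names, signatures, and null handling are up to the FLIP and may
differ from the final implementation):

    public class Path implements Serializable {

        // final URI -> a Path instance can no longer be mutated after creation
        private final URI uri;

        // Static factory replacing the mutable IOReadableWritable#read():
        // it builds and returns a new Path instead of overwriting this.uri.
        public static Path deserializeFromDataInputView(DataInputView in) throws IOException {
            return in.readBoolean() ? new Path(URI.create(in.readUTF())) : null;
        }

        public static void serializeToDataOutputView(Path path, DataOutputView out) throws IOException {
            out.writeBoolean(path != null);
            if (path != null) {
                out.writeUTF(path.toUri().toString());
            }
        }
    }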


As for the issue of this FLIP crossing multiple modules, I've explained it in my reply
to Matthias.


Best regards,


Wencong Liu



















At 2023-07-21 18:05:26, "Jing Ge"  wrote:
>Hi Wencong,
>
>Just out of curiosity, will the newly introduced
>deserializeFromDataInputView() method make the Path mutable again?
>
>What Matthias suggested makes sense, although the extension might make this
>FLIP cross multiple modules.
>
>Best regards,
>Jing
>
>On Fri, Jul 21, 2023 at 10:23 AM Matthias Pohl
> wrote:
>
>> There's a kind-of-related issue FLINK-4758 [1] that proposes removing the
>> IOReadableWritable interface from more classes. It was briefly mentioned in
>> the must-have work items discussion [2].
>>
>> I'm not too sure about the usage of IOReadableWritable: ...whether it would
>> go away with the removal of the DataSet API in general (the Jira issue has
>> DataSet as a component), anyway.
>>
>> Otherwise, might it make sense to extend the scope of this FLIP?
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-4758
>> [2] https://lists.apache.org/thread/gf0h4gh3xfsj78cpdsxsnj70nhzcmv9r
>>
>> On Fri, Jul 21, 2023 at 6:04 AM Xintong Song 
>> wrote:
>>
>> > +1
>> >
>> > Best,
>> >
>> > Xintong
>> >
>> >
>> >
>> > On Fri, Jul 21, 2023 at 10:54 AM Wencong Liu 
>> wrote:
>> >
>> > > Hi devs,
>> > >
>> > > I would like to start a discussion on FLIP-347: Remove
>> IOReadableWritable
>> > > serialization in Path [1].
>> > >
>> > >
>> > > The Path class is currently mutable to support IOReadableWritable
>> > > serialization. However, many parts
>> > > of the code assume that the Path is immutable. By making the Path class
>> > > immutable, we can ensure
>> > > that paths are stored correctly without the possibility of mutation and
>> > > eliminate the occurrence of subtle errors.
>> > > As such I propose to modify the Path class to no longer implement the
>> > > IOReadableWritable interface.
>> > > Looking forward to your feedback.
>> > > [1]
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-347%3A+Remove+IOReadableWritable+serialization+in+Path
>> > > Best regards,
>> > >
>> > >
>> > > Wencong Liu
>> >
>>


Re:Re: [DISCUSS][2.0] FLIP-347: Remove IOReadableWritable serialization in Path

2023-07-21 Thread Wencong Liu
Hello Matthias,


Thanks for your response. As described in FLIP-347, I have thoroughly reviewed 
all instances where the IOReadableWritable serialization of the Path class is 
used, 
and I have identified three classes that still utilize it for de/serializing 
the Path. 


1. FileSourceSplitSerializer: This class de/serializes the Path 
during the process of de/serializing the FileSourceSplit.
2. Two tests: TestManagedSinkCommittableSerializer,
TestManagedFileSourceSplitSerializer.


For 1, it can be modified so that it no longer serializes the Path via IOReadableWritable.
For 2, these two tests will be deprecated before Flink 2.0.
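
To illustrate the change for 1: one possible direction (a sketch under my
assumptions, not the final implementation) is to write the Path explicitly as
its URI string instead of going through the IOReadableWritable methods:

    // Hypothetical helpers inside the split serializer.
    private static void writePath(DataOutputView out, Path path) throws IOException {
        out.writeUTF(path.toUri().toString());
    }

    private static Path readPath(DataInputView in) throws IOException {
        return new Path(in.readUTF());
    }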


BTW, in my POC of this FLIP, I removed the IOReadableWritable usage in 1 and 2, 
and it has passed the CI on Azure. It seems that the IOReadableWritable 
serialization of Path is 
not used by the DataSet API or other components.


I have documented these findings in FLINK-5336 [1].


Best regards,
Wencong Liu


[1] https://issues.apache.org/jira/browse/FLINK-5336











At 2023-07-21 16:22:34, "Matthias Pohl"  wrote:
>There's a kind-of-related issue FLINK-4758 [1] that proposes removing the
>IOReadableWritable interface from more classes. It was briefly mentioned in
>the must-have work items discussion [2].
>
>I'm not too sure about the usage of IOReadableWritable: ...whether it would
>go away with the removal of the DataSet API in general (the Jira issue has
>DataSet as a component), anyway.
>
>Otherwise, might it make sense to extend the scope of this FLIP?
>
>[1] https://issues.apache.org/jira/browse/FLINK-4758
>[2] https://lists.apache.org/thread/gf0h4gh3xfsj78cpdsxsnj70nhzcmv9r
>
>On Fri, Jul 21, 2023 at 6:04 AM Xintong Song  wrote:
>
>> +1
>>
>> Best,
>>
>> Xintong
>>
>>
>>
>> On Fri, Jul 21, 2023 at 10:54 AM Wencong Liu  wrote:
>>
>> > Hi devs,
>> >
>> > I would like to start a discussion on FLIP-347: Remove IOReadableWritable
>> > serialization in Path [1].
>> >
>> >
>> > The Path class is currently mutable to support IOReadableWritable
>> > serialization. However, many parts
>> > of the code assume that the Path is immutable. By making the Path class
>> > immutable, we can ensure
>> > that paths are stored correctly without the possibility of mutation and
>> > eliminate the occurrence of subtle errors.
>> > As such I propose to modify the Path class to no longer implement the
>> > IOReadableWritable interface.
>> > Looking forward to your feedback.
>> > [1]
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-347%3A+Remove+IOReadableWritable+serialization+in+Path
>> > Best regards,
>> >
>> >
>> > Wencong Liu
>>


[DISCUSS][2.0] FLIP-347: Remove IOReadableWritable serialization in Path

2023-07-20 Thread Wencong Liu
Hi devs,

I would like to start a discussion on FLIP-347: Remove IOReadableWritable 
serialization in Path [1].


The Path class is currently mutable to support IOReadableWritable 
serialization. However, many parts 
of the code assume that the Path is immutable. By making the Path class 
immutable, we can ensure 
that paths are stored correctly without the possibility of mutation and 
eliminate the occurrence of subtle errors.
As such I propose to modify the Path class to no longer implement the 
IOReadableWritable interface.
Looking forward to your feedback.
[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-347%3A+Remove+IOReadableWritable+serialization+in+Path
Best regards,


Wencong Liu

[DISCUSS][2.0] FLIP-343: Remove parameter in WindowAssigner#getDefaultTrigger()

2023-07-20 Thread Wencong Liu
Hi devs,

I would like to start a discussion on FLIP-343: Remove parameter in 
WindowAssigner#getDefaultTrigger() [1].


The getDefaultTrigger() method in WindowAssigner takes a StreamExecutionEnvironment 
parameter, but this parameter is not actually used by any subclass of 
WindowAssigner. 
Therefore, it is unnecessary to include this parameter.
As such, I propose to remove the StreamExecutionEnvironment parameter from 
WindowAssigner#getDefaultTrigger(StreamExecutionEnvironment env).
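In other words, the change boils down to the following (a sketch; the exact
deprecation and migration steps will be spelled out in the FLIP):

    // Current signature (the env argument is ignored by all built-in assigners):
    public abstract Trigger<T, W> getDefaultTrigger(StreamExecutionEnvironment env);

    // Proposed signature:
    public abstract Trigger<T, W> getDefaultTrigger();
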
Looking forward to your feedback.
[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425229
Best regards,


Wencong Liu

[DISCUSS][2.0] FLIP-344: Remove parameter in RichFunction#open

2023-07-20 Thread Wencong Liu
Hi devs,

I would like to start a discussion on FLIP-344: Remove parameter in 
RichFunction#open [1].

The open() method in RichFunction requires a Configuration instance as an 
argument, 
which is always passed as a new, empty instance without any configuration 
parameters 
in AbstractUdfStreamOperator#open. Thus, it is unnecessary to include this 
parameter 
in the open() method.
As such, I propose to remove the Configuration parameter from 
RichFunction#open(Configuration parameters).
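Concretely, the signature change would look roughly like this (a sketch; the
exact deprecation path, e.g. whether a context object replaces the argument,
is up for discussion):

    // Today: the Configuration handed in by the runtime is always empty.
    void open(Configuration parameters) throws Exception;

    // Proposed: drop the unused parameter.
    void open() throws Exception;
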
Looking forward to your feedback.
[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231
Best regards,


Wencong Liu

Re:Re: [DISCUSS] Release 2.0 Work Items

2023-07-18 Thread Wencong Liu
Hi Chesnay,
Thanks for the reply. I think it is reasonable to remove the configuration 
argument 
passed in AbstractUdfStreamOperator#open if it is consistently empty. I'll propose a 
discussion 
about the specific actions for FLINK-6912 later.


Best,
Wencong Liu











At 2023-07-18 16:38:59, "Chesnay Schepler"  wrote:
>On 18/07/2023 10:33, Wencong Liu wrote:
>> For FLINK-6912:
>>
>>  There are three implementations of RichFunction that actually use
>> the Configuration parameter in RichFunction#open:
>>  1. ContinuousFileMonitoringFunction#open: It uses the configuration
>> to configure the FileInputFormat. [1]
>>  2. OutputFormatSinkFunction#open: It uses the configuration
>> to configure the OutputFormat. [2]
>>  3. InputFormatSourceFunction#open: It uses the configuration
>>   to configure the InputFormat. [3]
>
>And none of them should have any effect since the configuration is empty.
>
>See org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator#open.


Re:Re: [DISCUSS] Release 2.0 Work Items

2023-07-18 Thread Wencong Liu
Thanks Xintong Song and Matthias for the insightful discussion!


I have double-checked the Jira tickets that belong to the 
"Need action in 1.18" section and have some input to share.

For FLINK-4675:

The argument StreamExecutionEnvironment in 
WindowAssigner.getDefaultTrigger() 
is not used by any implementation of WindowAssigner and is no longer needed.

For FLINK-6912:

There are three implementations of RichFunction that actually use 
the Configuration parameter in RichFunction#open:
1. ContinuousFileMonitoringFunction#open: It uses the configuration 
to configure the FileInputFormat. [1]
2. OutputFormatSinkFunction#open: It uses the configuration 
to configure the OutputFormat. [2]
3. InputFormatSourceFunction#open: It uses the configuration
 to configure the InputFormat. [3]
I think RichFunction#open should still take a Configuration 
instance as an argument.

For FLINK-5336:

There are three classes that de/serialize the Path through the IOReadableWritable 
interface:
1. FileSourceSplitSerializer: It de/serializes the Path during the process 
of de/serializing FileSourceSplit. [4]
2. TestManagedSinkCommittableSerializer: It de/serializes the Path during 
the process of de/serializing TestManagedCommittable. [5]
3. TestManagedFileSourceSplitSerializer: It de/serializes the Path during 
the process of de/serializing TestManagedIterableSourceSplit. [6]
I think the Path should still implement the IOReadableWritable interface.


I plan to propose a discussion about removing the argument in FLINK-4675 and 
to comment with the conclusions on FLINK-6912 and FLINK-5336. WDYT?

[1] 
https://github.com/apache/flink/blob/9c3c8afbd9325b5df8291bd831da2d9f8785b30a/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileMonitoringFunction.java#L199
[2] 
https://github.com/apache/flink/blob/9c3c8afbd9325b5df8291bd831da2d9f8785b30a/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/OutputFormatSinkFunction.java#L63
[3] 
https://github.com/apache/flink/blob/9c3c8afbd9325b5df8291bd831da2d9f8785b30a/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/InputFormatSourceFunction.java#L64C2-L64C2
[4] 
https://github.com/apache/flink/blob/9c3c8afbd9325b5df8291bd831da2d9f8785b30a/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/FileSourceSplitSerializer.java#L67

[5] 
https://github.com/apache/flink/blob/9c3c8afbd9325b5df8291bd831da2d9f8785b30a/flink-table/flink-table-common/src/test/java/org/apache/flink/table/connector/sink/TestManagedSinkCommittableSerializer.java#L113
[6] 
https://github.com/apache/flink/blob/9c3c8afbd9325b5df8291bd831da2d9f8785b30a/flink-table/flink-table-common/src/test/java/org/apache/flink/table/connector/source/TestManagedFileSourceSplitSerializer.java#L56

















At 2023-07-17 12:23:51, "Xintong Song"  wrote:
>Hi Matthias,
>
>How's it going with the summary of existing 2.0.0 jira tickets?
>
>I have gone through everything listed under FLINK-3957[1], and will
>continue with other Jira tickets whose fix-version is 2.0.0.
>
>Here are my 2-cents on the FLINK-3975 subtasks. Hope this helps on your
>summary.
>
>I'd suggest going ahead with the following tickets.
>
>   - Need action in 1.18
>  - FLINK-4675: Double-check whether the argument is indeed not used.
>  Introduce a new non-argument API, and mark the original one as
>  `@Deprecated`. FLIP needed.
>  - FLINK-6912: Double-check whether the argument is indeed not used.
>  Introduce a new non-argument API, and mark the original one as
>  `@Deprecated`. FLIP needed.
>  - FLINK-5336: Double-check whether `IOReadableWritable` is indeed not
>  needed for `Path`. Mark methods from `IOReadableWritable` as
>`@Deprecated`
>  in `Path`. FLIP needed.
>   - Need no action in 1.18
>  - FLINK-4602/14068: Already listed in the release 2.0 wiki [2]
>  - FLINK-3986/3991/3992/4367/5130/7691: Subsumed by "Deprecated
>  methods/fields/classes in DataStream" in the release 2.0 wiki [2]
>  - FLINK-6375: Change the hashCode behavior of `LongValue` (and other
>  numeric types).
>
>I'd suggest not doing the following tickets.
>
>   - FLINK-4147/4330/9529/14658: These changes are non-trivial for both
>   developers and users. Also, we are taking them into consideration designing
>   the new ProcessFunction API. I'd be in favor of letting users migrate to
>   the ProcessFunction API directly once it's ready, rather than forcing users
>   to adapt to the breaking changes twice.
>   - FLINK-3610: Only affects Scala API, which will soon be removed.
>
>I don't have strong opinions on whether to work on the following tickets or
>not. Some of them are not very clear to me based on the description and
>conversation on the ticket, others may require further investigation and
>evaluation to decide. Unless someone volunteers to look into them, I'd be
>slightly 

[jira] [Created] (FLINK-32121) Avro Confluent Schema Registry nightly end-to-end test failed due to timeout

2023-05-17 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-32121:
---

 Summary: Avro Confluent Schema Registry nightly end-to-end test 
failed due to timeout
 Key: FLINK-32121
 URL: https://issues.apache.org/jira/browse/FLINK-32121
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.18.0
 Environment: !temp2.jpg!
Reporter: Wencong Liu
 Attachments: temp1.jpg, temp2.jpg

[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=49102=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=0f3adb59-eefa-51c6-2858-3654d9e0749d#:~:text=%5BFAIL%5D%20%27Avro%20Confluent%20Schema%20Registry]

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32066) Flink Ci service on Azure stops responding to pull requests

2023-05-12 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-32066:
---

 Summary: Flink Ci service on Azure stops responding to pull 
requests
 Key: FLINK-32066
 URL: https://issues.apache.org/jira/browse/FLINK-32066
 Project: Flink
  Issue Type: Bug
  Components: Build System / Azure Pipelines
Affects Versions: 1.18.0
Reporter: Wencong Liu
 Attachments: 20230512152023.jpg

As of the time when this issue was created, Flink's CI service on Azure could 
no longer be triggered by new pull requests.
!20230512152023.jpg!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re:Scala Compilation Errors with Flink code in IntelliJ. Builds fine with Maven command line.

2023-05-11 Thread Wencong Liu



Hi Brandon Wright,


I think you could try the following actions in the IntelliJ IDE:
First, execute the command "mvn clean install -Dfast -DskipTests=true 
-Dscala-2.12" in the terminal.
Second, go to "File -> Invalidate Caches", select all options, and restart the IDE.
Finally, click "Reload All Maven Projects" in the Maven tool window, and wait 
until the reloading process is finished.
If it does not work after these actions, you may need to repeat them.


Best,


Wencong Liu











At 2023-05-12 07:16:09, "Brandon Wright"  
wrote:
>I clone the Flink git repository, master branch, I configure a Java 8 JDK, and 
>I can build the flink project successfully with:
>
>mvn clean package -DskipTests
>
>However, when I load the project into IntelliJ, and try to compile the project 
>and run the Scala tests in the IDE I get a lot of compilation errors with the 
>existing Scala code like:
>
>./flink/flink-scala/src/test/scala/org/apache/flink/api/scala/DeltaIterationSanityCheckTest.scala:33:41
>could not find implicit value for evidence parameter of type 
>org.apache.flink.api.common.typeinfo.TypeInformation[(Int, String)]
>val solutionInput = env.fromElements((1, "1"))
>
>and
>
>./flink/flink-table/flink-table-api-scala/src/test/scala/org/apache/flink/table/types/extraction/DataTypeExtractorScalaTest.scala:39:7
>overloaded method value assertThatThrownBy with alternatives:
>(x$1: org.assertj.core.api.ThrowableAssert.ThrowingCallable,x$2: String,x$3: 
>Object*)org.assertj.core.api.AbstractThrowableAssert[_, _ <: Throwable] 
>(x$1: 
>org.assertj.core.api.ThrowableAssert.ThrowingCallable)org.assertj.core.api.AbstractThrowableAssert[_,
> _ <: Throwable]
>cannot be applied to (() => Unit)
>assertThatThrownBy(() => runExtraction(testSpec))
>
>Clearly, the same code is compiling when using the Maven build via command 
>line, so this must be some kind of environment/config issue. I'd to get the 
>code building within IntelliJ so I can use the debugger and step through unit 
>tests. I don't want to make source changes quite yet. I'd like to just step 
>through the code as it is.
>
>My first guess is the IntelliJ IDE is using the wrong version of the Scala 
>compiler. In IntelliJ, in "Project Structure" -> "Platform Settings" -> 
>"Global Libraries", I have "scala-sdk-2.12.7" configured and nothing else. I 
>believe that's the specific version of Scala that the Flink code is intended 
>to compile with. I've checked all the project settings and preferences and I 
>don't see any other places I can configure or even verify which version of 
>Scala is being used.
>
>Additional points:
>
>- I can run/debug Java unit tests via the IntelliJ IDE, but not Scala unit 
>tests.
>- If I do "Build" -> "Rebuild Project", I get Scala compilation errors as 
>mentioned above, but no Java errors. The Java code seems to compile 
>successfully.
>- I'm using the current version of IntelliJ 2023.1.1 Ultimate with the Scala 
>plugin installed.
>- I've read and followed the instructions on 
>https://nightlies.apache.org/flink/flink-docs-master/docs/flinkdev/ide_setup/. 
>These docs don't mention specifying the version of the Scala compiler at all.
>- This is a clean repo on "master" branch with absolutely zero changes.
>- In IntelliJ, in "Project Structure" -> "Project Settings" -> "Project", I've 
>chosen a Java 8 JDK, which I presume is the best choice for building Flink 
>code today
>
>Thanks for any help!


Re:[ANNOUNCE] New Apache Flink PMC Member - Qingsheng Ren

2023-04-22 Thread Wencong Liu
Congratulations, Qingsheng!

Best,
Wencong Liu















At 2023-04-21 19:47:52, "Jark Wu"  wrote:
>Hi everyone,
>
>We are thrilled to announce that Leonard Xu has joined the Flink PMC!
>
>Leonard has been an active member of the Apache Flink community for many
>years and became a committer in Nov 2021. He has been involved in various
>areas of the project, from code contributions to community building. His
>contributions are mainly focused on Flink SQL and connectors, especially
>leading the flink-cdc-connectors project to receive 3.8+K GitHub stars. He
>authored 150+ PRs, and reviewed 250+ PRs, and drove several FLIPs (e.g.,
>FLIP-132, FLIP-162). He has participated in plenty of discussions in the
>dev mailing list, answering questions about 500+ threads in the
>user/user-zh mailing list. Besides that, he is community minded, such as
>being the release manager of 1.17, verifying releases, managing release
>syncs, etc.
>
>Congratulations and welcome Leonard!
>
>Best,
>Jark (on behalf of the Flink PMC)














At 2023-04-21 19:50:02, "Jark Wu"  wrote:
>Hi everyone,
>
>We are thrilled to announce that Qingsheng Ren has joined the Flink PMC!
>
>Qingsheng has been contributing to Apache Flink for a long time. He is the
>core contributor and maintainer of the Kafka connector and
>flink-cdc-connectors, bringing users stability and ease of use in both
>projects. He drove discussions and implementations in FLIP-221, FLIP-288,
>and the connector testing framework. He is continuously helping with the
>expansion of the Flink community and has given several talks about Flink
>connectors at many conferences, such as Flink Forward Global and Flink
>Forward Asia. Besides that, he is willing to help a lot in the community
>work, such as being the release manager for both 1.17 and 1.18, verifying
>releases, and answering questions on the mailing list.
>
>Congratulations and welcome Qingsheng!
>
>Best,
>Jark (on behalf of the Flink PMC)


Re:[ANNOUNCE] New Apache Flink PMC Member - Leonard Xu

2023-04-22 Thread Wencong Liu
Congratulations, Leonard!

Best,
Wencong Liu















At 2023-04-21 19:47:52, "Jark Wu"  wrote:
>Hi everyone,
>
>We are thrilled to announce that Leonard Xu has joined the Flink PMC!
>
>Leonard has been an active member of the Apache Flink community for many
>years and became a committer in Nov 2021. He has been involved in various
>areas of the project, from code contributions to community building. His
>contributions are mainly focused on Flink SQL and connectors, especially
>leading the flink-cdc-connectors project to receive 3.8+K GitHub stars. He
>authored 150+ PRs, and reviewed 250+ PRs, and drove several FLIPs (e.g.,
>FLIP-132, FLIP-162). He has participated in plenty of discussions in the
>dev mailing list, answering questions about 500+ threads in the
>user/user-zh mailing list. Besides that, he is community minded, such as
>being the release manager of 1.17, verifying releases, managing release
>syncs, etc.
>
>Congratulations and welcome Leonard!
>
>Best,
>Jark (on behalf of the Flink PMC)


[jira] [Created] (FLINK-31841) Redundant local variables in AllWindowedStream#reduce

2023-04-18 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-31841:
---

 Summary: Redundant local variables in AllWindowedStream#reduce
 Key: FLINK-31841
 URL: https://issues.apache.org/jira/browse/FLINK-31841
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream
Affects Versions: 1.18.0
Reporter: Wencong Liu
 Fix For: 1.18.0


Currently, there are two redundant local variables in AllWindowedStream#reduce.
{code:java}
public SingleOutputStreamOperator<T> reduce(ReduceFunction<T> function) {
    if (function instanceof RichFunction) {
        throw new UnsupportedOperationException(
                "ReduceFunction of reduce can not be a RichFunction. "
                        + "Please use reduce(ReduceFunction, WindowFunction) instead.");
    }

    // clean the closure
    function = input.getExecutionEnvironment().clean(function);

    String callLocation = Utils.getCallLocationName();
    String udfName = "AllWindowedStream." + callLocation;

    return reduce(function, new PassThroughAllWindowFunction<W, T>());
} {code}
`callLocation` and `udfName` are not used.
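
A possible cleanup (a sketch, pending review) is simply to drop the two unused locals:
{code:java}
public SingleOutputStreamOperator<T> reduce(ReduceFunction<T> function) {
    if (function instanceof RichFunction) {
        throw new UnsupportedOperationException(
                "ReduceFunction of reduce can not be a RichFunction. "
                        + "Please use reduce(ReduceFunction, WindowFunction) instead.");
    }

    // clean the closure
    function = input.getExecutionEnvironment().clean(function);

    return reduce(function, new PassThroughAllWindowFunction<W, T>());
}
{code}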



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re:[VOTE] FLIP-301: Hybrid Shuffle supports Remote Storage

2023-03-19 Thread Wencong Liu
+1 (non-binding)

Best regards,

Wencong Liu















At 2023-03-20 12:05:47, "Yuxin Tan"  wrote:
>Hi, everyone,
>
>Thanks for all your feedback for FLIP-301: Hybrid Shuffle
>supports Remote Storage[1] on the discussion thread[2].
>
>I'd like to start a vote for it. The vote will be open for at
>least 72 hours (03/23, 13:00 UTC+8) unless there is an
>objection or not enough votes.
>
>[1]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-301%3A+Hybrid+Shuffle+supports+Remote+Storage
>[2] https://lists.apache.org/thread/nwrqd5jtqwks89tbxpcrgto6r2bhdhno
>
>Best,
>Yuxin


Re:Re: [ANNOUNCE] New Apache Flink Committer - Yuxia Luo

2023-03-12 Thread Wencong Liu
Congratulations, Yuxia!

Best,
Wencong Liu


At 2023-03-13 11:20:21, "Qingsheng Ren"  wrote:
>Congratulations, Yuxia!
>
>Best,
>Qingsheng
>
>On Mon, Mar 13, 2023 at 10:27 AM Jark Wu  wrote:
>
>> Hi, everyone
>>
>> On behalf of the PMC, I'm very happy to announce Yuxia Luo as a new Flink
>> Committer.
>>
>> Yuxia has been continuously contributing to the Flink project for almost
>> two
>> years, authored and reviewed hundreds of PRs over this time. He is
>> currently
>> the core maintainer of the Hive component, where he contributed many
>> valuable
>> features, including the Hive dialect with 95% compatibility and small file
>> compaction.
>> In addition, Yuxia driven FLIP-282 (DELETE & UPDATE API) to better
>> integrate
>> Flink with data lakes. He actively participated in dev discussions and
>> answered
>> many questions on the user mailing list.
>>
>> Please join me in congratulating Yuxia Luo for becoming a Flink Committer!
>>
>> Best,
>> Jark Wu (on behalf of the Flink PMC)
>>


Re:[DISCUSS] FLIP-301: Hybrid Shuffle supports Remote Storage

2023-03-07 Thread Wencong Liu
Hello Yuxin,


Thanks for your proposal! Adding remote storage capability to Flink's 
Hybrid Shuffle is a significant improvement that addresses the issue of local 
disk storage limitations. This enhancement not only ensures uninterrupted 
Shuffle, but also enables Flink to handle larger workloads and more complex 
data processing tasks. With the ability to seamlessly shift between local and 
remote storage, Flink's Hybrid Shuffle will be more versatile and scalable, 
making it an ideal choice for organizations looking to build distributed data 
processing applications with ease.
Besides, I have a small question about the segment size in different storage 
tiers. According to the FLIP, the segment size may be fixed for each Storage 
Tier, but I think a fixed size may affect the shuffle performance. 
For example, a smaller segment size will improve the utilization of the Memory 
Storage Tier, but it may bring extra cost to the Disk Storage Tier or the Remote 
Storage Tier. Deciding the segment size dynamically would be helpful.
 
Best,


Wencong Liu



















At 2023-03-06 13:51:21, "Yuxin Tan"  wrote:
>Hi everyone,
>
>I would like to start a discussion on FLIP-301: Hybrid Shuffle supports
>Remote Storage[1].
>
>In the cloud-native environment, it is difficult to determine the
>appropriate
>disk space for Batch shuffle, which will affect job stability.
>
>This FLIP is to support Remote Storage for Hybrid Shuffle to improve the
>Batch job stability in the cloud-native environment.
>
>The goals of this FLIP are as follows.
>1. By default, use the local memory and disk to ensure high shuffle
>performance if the local storage space is sufficient.
>2. When the local storage space is insufficient, use remote storage as
>a supplement to avoid large-scale Batch job failure.
>
>Looking forward to hearing from you.
>
>[1]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-301%3A+Hybrid+Shuffle+supports+Remote+Storage
>
>Best,
>Yuxin


Re:[ANNOUNCE] New Apache Flink PMC Member - Dong Lin

2023-02-16 Thread Wencong Liu
Congratulations Dong!


Bast,
Wencong Liu

















At 2023-02-16 14:19:08, "Guowei Ma"  wrote:
>Hi, everyone
>
>On behalf of the PMC, I'm very happy to announce Dong Lin as a new
>Flink PMC.
>
>Dong is currently the main driver of Flink ML. He reviewed a large
>number of Flink ML related PRs and also participated in many Flink ML
>improvements, such as "FLIP-173","FLIP-174" etc. At the same time, he made
>a lot of evangelism events contributions for the Flink ML ecosystem.
>In fact, in addition to the Flink machine learning field, Dong has also
>participated in many other improvements in Flink, such as "FLIP-205",
>"FLIP-266","FLIP-269","FLIP-274" etc.
>Please join me in congratulating Dong Lin for becoming a Flink PMC!
>
>Best,
>Guowei(on behalf of the Flink PMC)


Re:Re: [ANNOUNCE] New Apache Flink Committer - Weijie Guo

2023-02-14 Thread Wencong Liu
Congratulations! Weijie!

Best,
Wencong Liu















在 2023-02-15 10:52:38,"Shengkai Fang"  写道:
>Congratulations!
>
>Best,
>Shengkai
>
>Feng Jin  于2023年2月14日周二 14:21写道:
>
>> Congratulations! Weijie
>>
>> Best,
>> Feng
>>
>> On Tue, Feb 14, 2023 at 2:19 PM Leonard Xu  wrote:
>> >
>> > Congratulations! Weijie
>> >
>> > Best,
>> > Leonard
>>


Re:Re: [ANNOUNCE] New Apache Flink Committer - Jing Ge

2023-02-14 Thread Wencong Liu
Congratulations! Jing Ge!

Best,
Wencong Liu
















在 2023-02-15 10:52:16,"Shengkai Fang"  写道:
>Congratulations!
>
>Best,
>Shengkai
>
>Yanfei Lei  于2023年2月15日周三 10:08写道:
>
>> Congratulations, Jing Ge !
>>
>> Best regards,
>> Yanfei
>>
>> yuxia  于2023年2月15日周三 09:34写道:
>> >
>> > Congratulations, Jing Ge !
>> >
>> > Best regards,
>> > Yuxia
>> >
>> > - 原始邮件 -
>> > 发件人: "Jark Wu" 
>> > 收件人: "dev" 
>> > 发送时间: 星期二, 2023年 2 月 14日 下午 11:32:30
>> > 主题: Re: [ANNOUNCE] New Apache Flink Committer - Jing Ge
>> >
>> > Welcome and congratulations, Jing!
>> >
>> > Best,
>> > Jark
>> >
>> > > 2023年2月14日 23:14,Lijie Wang  写道:
>> > >
>> > > Congratulations, Jing Ge !
>> > >
>> > > Best,
>> > > Lijie
>> > >
>> > > Sergey Nuyanzin  于2023年2月14日周二 21:47写道:
>> > >
>> > >> Congratulations, Jing Ge!
>> > >>
>> > >> On Tue, Feb 14, 2023 at 2:47 PM Rui Fan  wrote:
>> > >>
>> > >>> Congratulations, Jing!
>> > >>>
>> > >>> Best,
>> > >>> Rui Fan
>> > >>>
>> > >>> On Tue, Feb 14, 2023 at 19:36 Yuepeng Pan  wrote:
>> > >>>
>> > >>>> Congratulations, Jing Ge !
>> > >>>>
>> > >>>> Best,Yuepeng Pan
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> At 2023-02-14 15:49:24, "godfrey he"  wrote:
>> > >>>>> Hi everyone,
>> > >>>>>
>> > >>>>> On behalf of the PMC, I'm very happy to announce Jing Ge as a new
>> > >> Flink
>> > >>>>> committer.
>> > >>>>>
>> > >>>>> Jing has been consistently contributing to the project for over 1
>> > >> year.
>> > >>>>> He authored more than 50 PRs and reviewed more than 40 PRs
>> > >>>>> with mainly focus on connector, test, and document modules.
>> > >>>>> He was very active on the mailing list (more than 90 threads) last
>> > >> year,
>> > >>>>> which includes participating in a lot of dev discussions (30+),
>> > >>>>> providing many effective suggestions for FLIPs and answering
>> > >>>>> many user questions. He was the Flink Forward 2022 keynote speaker
>> > >>>>> to help promote Flink and  a trainer for Flink troubleshooting and
>> > >>>> performance
>> > >>>>> tuning of Flink Forward 2022 training program.
>> > >>>>>
>> > >>>>> Please join me in congratulating Jing for becoming a Flink
>> committer!
>> > >>>>>
>> > >>>>> Best,
>> > >>>>> Godfrey
>> > >>>>
>> > >>>
>> > >>
>> > >>
>> > >> --
>> > >> Best regards,
>> > >> Sergey
>> > >>
>>


Re:Re: Re: [DISCUSS] Allow source readers extending SourceReaderBase to override numRecordsIn report logic

2023-01-10 Thread Wencong Liu
 can define its own metric again.
>> With
>> > > how I read FLIP-33, the purpose was exactly to avoid that. So you can
>> > have
>> > > a connector that has its own definition for numRecordsIn, which it
>> > > submits/sends as numRecordsIn, which basically breaks it being a
>> default
>> > > metric.
>> > >
>> > > Best regards,
>> > >
>> > > Martijn
>> > >
>> > > On Sun, Jan 8, 2023 at 3:26 PM Dong Lin  wrote:
>> > >
>> > > > Hi Martijn,
>> > > >
>> > > > I think the change proposed in this thread would *not* break the idea
>> > in
>> > > > FLIP-33. FLIP-33 standardized metrics such as numRecordsIn, by
>> > specifying
>> > > > the name/type/semantics of those metrics so that all connectors can
>> > > follow.
>> > > > The name/type/semantics of numRecordsIn metric would be the same
>> before
>> > > and
>> > > > after the proposed change.
>> > > >
>> > > > Does this make sense? Or could you explain which part of FLIP-33
>> would
>> > be
>> > > > broken by the proposed change?
>> > > >
>> > > > Regards,
>> > > > Dong
>> > > >
>> > > >
>> > > > On Sun, Jan 8, 2023 at 9:33 PM Martijn Visser <
>> > martijnvis...@apache.org>
>> > > > wrote:
>> > > >
>> > > > > Hi all,
>> > > > >
>> > > > > This feels like we purposely want to abandon the idea that was
>> > > introduced
>> > > > > with FLIP-33 on standardizing metrics [1]. From this proposal, I
>> > don't
>> > > > see
>> > > > > why we want to abandon that idea. Next to that, if you override
>> > > > > the numRecordsIn logic, you also touch on the other metrics that
>> rely
>> > > on
>> > > > > this value.
>> > > > >
>> > > > > Best regards,
>> > > > >
>> > > > > Martijn
>> > > > >
>> > > > > [1]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
>> > > > >
>> > > > > On Sun, Jan 8, 2023 at 12:43 PM Wencong Liu 
>> > > > wrote:
>> > > > >
>> > > > > > Thanks for the discussion, Dong and John
>> > > > > >
>> > > > > >
>> > > > > > Considering the extra cost of method calls, the approach of
>> adding
>> > an
>> > > > > > extra SourceReaderBase constructor
>> > > > > > should be a better choice. If there is no further discussion, I
>> > will
>> > > > > > follow this plan.
>> > > > > >
>> > > > > >
>> > > > > > Best,
>> > > > > > Wencong
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > At 2023-01-08 01:05:07, "John Roesler" 
>> > wrote:
>> > > > > > >Thanks, for the discussion, Dong.
>> > > > > > >
>> > > > > > >To answer your question: I was unclear if the desire was to
>> > > deprecate
>> > > > > the
>> > > > > > metric itself, to be replaced with some other metric, or whether
>> we
>> > > > just
>> > > > > > wanted to deprecate the way the superclass manages it. It’s clear
>> > now
>> > > > > that
>> > > > > > we want to keep the metric and only change the superclass logic.
>> > > > > > >
>> > > > > > >I think we’re on the same page now.
>> > > > > > >

Re:Re: [ANNOUNCE] New Apache Flink Committer - Lincoln Lee

2023-01-09 Thread Wencong Liu
Congratulations, Lincoln!

Best regards,
Wencong














在 2023-01-10 13:25:09,"Yanfei Lei"  写道:
>Congratulations, well deserved!
>
>Best,
>Yanfei
>
>Yuan Mei  于2023年1月10日周二 13:16写道:
>
>> Congratulations, Lincoln!
>>
>> Best,
>> Yuan
>>
>> On Tue, Jan 10, 2023 at 12:23 PM Lijie Wang 
>> wrote:
>>
>> > Congratulations, Lincoln!
>> >
>> > Best,
>> > Lijie
>> >
>> > Jingsong Li  于2023年1月10日周二 12:07写道:
>> >
>> > > Congratulations, Lincoln!
>> > >
>> > > Best,
>> > > Jingsong
>> > >
>> > > On Tue, Jan 10, 2023 at 11:56 AM Leonard Xu  wrote:
>> > > >
>> > > > Congratulations, Lincoln!
>> > > >
>> > > > Impressive work in streaming semantics, well deserved!
>> > > >
>> > > >
>> > > > Best,
>> > > > Leonard
>> > > >
>> > > >
>> > > > > On Jan 10, 2023, at 11:52 AM, Jark Wu  wrote:
>> > > > >
>> > > > > Hi everyone,
>> > > > >
>> > > > > On behalf of the PMC, I'm very happy to announce Lincoln Lee as a
>> new
>> > > Flink
>> > > > > committer.
>> > > > >
>> > > > > Lincoln Lee has been a long-term Flink contributor since 2017. He
>> > > mainly
>> > > > > works on Flink
>> > > > > SQL parts and drives several important FLIPs, e.g., FLIP-232 (Retry
>> > > Async
>> > > > > I/O), FLIP-234 (
>> > > > > Retryable Lookup Join), FLIP-260 (TableFunction Finish). Besides,
>> He
>> > > also
>> > > > > contributed
>> > > > > much to Streaming Semantics, including the non-determinism problem
>> > and
>> > > the
>> > > > > message
>> > > > > ordering problem.
>> > > > >
>> > > > > Please join me in congratulating Lincoln for becoming a Flink
>> > > committer!
>> > > > >
>> > > > > Cheers,
>> > > > > Jark Wu
>> > > >
>> > >
>> >
>>


Re:Re: Re: [DISCUSS] Allow source readers extending SourceReaderBase to override numRecordsIn report logic

2023-01-09 Thread Wencong Liu
 FLIP-33 would
>> be
>> > > broken by the proposed change?
>> > >
>> > > Regards,
>> > > Dong
>> > >
>> > >
>> > > On Sun, Jan 8, 2023 at 9:33 PM Martijn Visser <
>> martijnvis...@apache.org>
>> > > wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > This feels like we purposely want to abandon the idea that was
>> > introduced
>> > > > with FLIP-33 on standardizing metrics [1]. From this proposal, I
>> don't
>> > > see
>> > > > why we want to abandon that idea. Next to that, if you override
>> > > > the numRecordsIn logic, you also touch on the other metrics that rely
>> > on
>> > > > this value.
>> > > >
>> > > > Best regards,
>> > > >
>> > > > Martijn
>> > > >
>> > > > [1]
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
>> > > >
>> > > > On Sun, Jan 8, 2023 at 12:43 PM Wencong Liu 
>> > > wrote:
>> > > >
>> > > > > Thanks for the discussion, Dong and John
>> > > > >
>> > > > >
>> > > > > Considering the extra cost of method calls, the approach of adding
>> an
>> > > > > extra SourceReaderBase constructor
>> > > > > should be a better choice. If there is no further discussion, I
>> will
>> > > > > follow this plan.
>> > > > >
>> > > > >
>> > > > > Best,
>> > > > > Wencong
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > At 2023-01-08 01:05:07, "John Roesler" 
>> wrote:
>> > > > > >Thanks, for the discussion, Dong.
>> > > > > >
>> > > > > >To answer your question: I was unclear if the desire was to
>> > deprecate
>> > > > the
>> > > > > metric itself, to be replaced with some other metric, or whether we
>> > > just
>> > > > > wanted to deprecate the way the superclass manages it. It’s clear
>> now
>> > > > that
>> > > > > we want to keep the metric and only change the superclass logic.
>> > > > > >
>> > > > > >I think we’re on the same page now.
>> > > > > >
>> > > > > >Thanks!
>> > > > > >-John
>> > > > > >
>> > > > > >On Sat, Jan 7, 2023, at 07:21, Dong Lin wrote:
>> > > > > >> Hi John,
>> > > > > >>
>> > > > > >> Not sure if I understand the difference between "deprecate that
>> > > > metric"
>> > > > > and
>> > > > > >> "deprecate the private counter mechanism". I think what we want
>> is
>> > > to
>> > > > > >> update SourceReaderBase's implementation so that it no longer
>> > > > explicitly
>> > > > > >> increments this metric. But we still need to expose this metric
>> to
>> > > > user.
>> > > > > >> And methods such as RecordEmitter#emitRecord (which can be
>> invoked
>> > > by
>> > > > > >> SourceReaderBase#pollNext(...)) can potentially increment this
>> > > metric.
>> > > > > >>
>> > > > > >> I like your approach of adding an extra SourceReaderBase
>> > > constructor.
>> > > > > That
>> > > > > >> appears simpler than adding a deprecated config. We can mark the
>> > > > > existing
>> > > > > >> SourceReaderBase constructor as @deprecated.
>> > > > > SourceReaderBase#pollNext(...)
>> > > > > >> will not increment the counter if a subclass uses the newly
>> added
>

Re:Re: [DISCUSS] Allow source readers extending SourceReaderBase to override numRecordsIn report logic

2023-01-08 Thread Wencong Liu
Thanks for the discussion, Dong and John.


Considering the extra cost of virtual method calls, the approach of adding an extra 
SourceReaderBase constructor should be the better choice. If there is no further 
discussion, I will follow this plan.
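
To make the idea concrete, here is a minimal standalone sketch of the pattern 
(an illustration only, not the real SourceReaderBase API; all names are assumptions):

    // Standalone sketch of the "extra constructor" idea, not the actual Flink classes.
    public abstract class BaseReader {

        private long numRecordsIn; // stands in for the numRecordsIn counter
        private final boolean countRecordsInBase;

        // existing behaviour: the base class owns and increments the counter
        protected BaseReader() {
            this(true);
        }

        // new constructor: a subclass passes false to report the metric by itself
        protected BaseReader(boolean countRecordsInBase) {
            this.countRecordsInBase = countRecordsInBase;
        }

        public final void pollNext() {
            Object record = fetchNext();
            if (record != null && countRecordsInBase) {
                numRecordsIn++; // incremented only when the base class is responsible
            }
            if (record != null) {
                emit(record);
            }
        }

        protected abstract Object fetchNext();

        protected abstract void emit(Object record);
    }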


Best,
Wencong


At 2023-01-08 01:05:07, "John Roesler"  wrote:
>Thanks, for the discussion, Dong. 
>
>To answer your question: I was unclear if the desire was to deprecate the 
>metric itself, to be replaced with some other metric, or whether we just 
>wanted to deprecate the way the superclass manages it. It’s clear now that we 
>want to keep the metric and only change the superclass logic.
>
>I think we’re on the same page now. 
>
>Thanks!
>-John
>
>On Sat, Jan 7, 2023, at 07:21, Dong Lin wrote:
>> Hi John,
>>
>> Not sure if I understand the difference between "deprecate that metric" and
>> "deprecate the private counter mechanism". I think what we want is to
>> update SourceReaderBase's implementation so that it no longer explicitly
>> increments this metric. But we still need to expose this metric to user.
>> And methods such as RecordEmitter#emitRecord (which can be invoked by
>> SourceReaderBase#pollNext(...)) can potentially increment this metric.
>>
>> I like your approach of adding an extra SourceReaderBase constructor. That
>> appears simpler than adding a deprecated config. We can mark the existing
>> SourceReaderBase constructor as @deprecated. SourceReaderBase#pollNext(...)
>> will not increment the counter if a subclass uses the newly added
>> constructor.
>>
>> Cheers,
>> Dong
>>
>>
>> On Sat, Jan 7, 2023 at 4:47 AM John Roesler  wrote:
>>
>>> Thanks for the replies, Dong and Wencong!
>>>
>>> That’s a good point about the overhead of the extra method.
>>>
>>> Is the desire to actually deprecate that metric in a user-facing way, or
>>> just to deprecate the private counter mechanism?
>>>
>>> It seems like if the desire is to deprecate the existing private counter,
>>> we can accomplish it by deprecating the current constructor and offering
>>> another that is documented not to track the metric. This seems better than
>>> the config option, since the concern is purely about the division of
>>> responsibilities between the sub- and super-class.
>>>
>>> Another option, which might be better if we wish to keep a uniformly named
>>> metric, would be to simply make the counter protected. That would be better
>>> if we’re generally happy with the metric and counter, but a few special
>>> source connectors need to emit records in other ways.
>>>
>>> And finally, if we really want to get rid of the metric itself, then I
>>> agree, a config is the way to do it.
>>>
>>> Thanks,
>>> John
>>>
>>> On Fri, Jan 6, 2023, at 00:55, Dong Lin wrote:
>>> > Hi John and Wencong,
>>> >
>>> > Thanks for the reply!
>>> >
>>> > It is nice that optional-2 can address the problem without affecting the
>>> > existing source connectors as far as functionality is concerned. One
>>> > potential concern with this approach is that it might increase the Flink
>>> > runtime overhead by adding one more virtual functional call to the
>>> > per-record runtime call stack.
>>> >
>>> > Since Java's default MaxInlineLevel is 12-18, I believe it is easy for an
>>> > operator chain of 5+ operators to exceed this limit. In this case. And
>>> > option-2 would incur one more virtual table lookup to produce each
>>> record.
>>> > It is not clear how much this overhead would show up for jobs with a
>>> chain
>>> > of lightweight operators. I am recently working on FLINK-30531
>>> > <https://issues.apache.org/jira/browse/FLINK-30531> to reduce runtime
>>> > overhead which might be related to this discussion.
>>> >
>>> > In comparison to option-2, the option-3 provided in my earlier email
>>> would
>>> > not add this extra overhead. I think it might be worthwhile to invest in
>>> > the long-term performance (and simpler runtime infra) and pay for the
>>> > short-term cost of deprecating this metric in SourceOperatorBase. What do
>>> > you think?
>>> >
>>> > Regards,
>>> > Dong
>>> >
>>> >
>>> > On Thu, Jan 5, 2023 at 10:10 PM Wencong Liu 
>>> wrote:
>>> >
>>> >> Hi, All
>>> >>
>>> >>
>>> >> Thanks for the reply!
>>> >>
>>> >>
>>> >> I think both John and Dong's opinions are reasonable. John's Suggestion
>>> 2
>>> >> is a good implementation.
>>> >> It does not affect the existing source connectors, but also provides
>>> >> support
>>> >> for custom counter in the future implementation.
>>> >>
>>> >>
>>> >> WDYT?
>>> >>
>>> >>
>>> >> Best,
>>> >>
>>> >> Wencong Liu
>>>


Re:Re: [DISCUSS] Allow source readers extending SourceReaderBase to override numRecordsIn report logic

2023-01-05 Thread Wencong Liu
Hi, All


Thanks for the reply!


I think both John's and Dong's opinions are reasonable. John's Suggestion 2 is a 
good implementation: it does not affect the existing source connectors, and it 
still allows future implementations to provide a custom counter.


WDYT?


Best,

Wencong Liu

[DISCUSS] Allow source readers extending SourceReaderBase to override numRecordsIn report logic

2023-01-02 Thread Wencong Liu
Hi devs,  


I'd like to start a discussion about FLINK-30234: SourceReaderBase should 
provide an option to disable numRecordsIn metric registration [1].


As FLINK-30234 describes, the numRecordsIn metric is currently pre-registered for 
all sources in SourceReaderBase. Depending on the source reader implementation, 
the definition of "record" might differ from the one we use in SourceReaderBase, 
hence numRecordsIn might be inaccurate.


We could introduce a public option in SourceReaderOptions that is used by 
SourceReaderBase:


source.reader.metric.num_records_in.override: false


By default, the source reader will use the numRecordsIn metric in 
SourceReaderBase. If a source reader wants to report the metric by itself, it can 
set source.reader.metric.num_records_in.override to true, which disables the 
registration of numRecordsIn in SourceReaderBase and lets the actual 
implementation report the metric instead.
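
For illustration, the option could be declared with Flink's ConfigOptions builder 
roughly as follows (a sketch only; the class and field names here are assumptions, 
not the final API):

    import org.apache.flink.configuration.ConfigOption;
    import org.apache.flink.configuration.ConfigOptions;

    public class SourceReaderOptionsSketch {
        // Sketch of the proposed option; it defaults to false so existing readers
        // keep the numRecordsIn metric registered by SourceReaderBase.
        public static final ConfigOption<Boolean> NUM_RECORDS_IN_OVERRIDE =
                ConfigOptions.key("source.reader.metric.num_records_in.override")
                        .booleanType()
                        .defaultValue(false)
                        .withDescription(
                                "If true, SourceReaderBase does not register/increment numRecordsIn "
                                        + "and the concrete source reader reports the metric instead.");
    }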


Any thoughts on this?


[1]  https://issues.apache.org/jira/browse/FLINK-30234?jql=project%20%3D%20FLINK


Best, Wencong Liu

[jira] [Created] (FLINK-29097) Moving json se/deserializers from sql-gateway-api to sql-gateway

2022-08-24 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-29097:
---

 Summary: Moving json se/deserializers from sql-gateway-api to 
sql-gateway
 Key: FLINK-29097
 URL: https://issues.apache.org/jira/browse/FLINK-29097
 Project: Flink
  Issue Type: Technical Debt
  Components: Table SQL / Gateway
Affects Versions: 1.16.0
Reporter: Wencong Liu


Considering that the current JSON serialization/deserialization rules for results 
returned by SqlGateway are only used by the REST endpoint, we will move the 
serialization-related tools from the flink-sql-gateway-api package to the 
flink-sql-gateway package.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28974) Add doc for the API and Option of sql gateway rest endpoint

2022-08-15 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-28974:
---

 Summary: Add doc for the API and Option of sql gateway rest 
endpoint
 Key: FLINK-28974
 URL: https://issues.apache.org/jira/browse/FLINK-28974
 Project: Flink
  Issue Type: Technical Debt
  Components: Table SQL / Gateway
Affects Versions: 1.16.0
Reporter: Wencong Liu


Add documentation for the APIs and options of the SQL Gateway REST endpoint.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28963) Add API compatibility test for Sql Gateway Rest Endpoint

2022-08-14 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-28963:
---

 Summary: Add API compatibility test for Sql Gateway Rest Endpoint
 Key: FLINK-28963
 URL: https://issues.apache.org/jira/browse/FLINK-28963
 Project: Flink
  Issue Type: Technical Debt
  Components: Table SQL / Gateway
Affects Versions: 1.16.0
Reporter: Wencong Liu


Under the flink-runtime-web package, RestAPIStabilityTest performs 
compatibility checks on the REST API based on a series of CompatibilityRoutines. 
The SQL Gateway REST endpoint needs to reuse the same rules to verify API 
compatibility, so as to ensure that all modifications to the SQL Gateway 
REST API remain compatible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28796) Add Statement Completion API for sql gateway rest endpoint

2022-08-03 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-28796:
---

 Summary: Add Statement Completion API for sql gateway rest endpoint
 Key: FLINK-28796
 URL: https://issues.apache.org/jira/browse/FLINK-28796
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Gateway
Reporter: Wencong Liu


SQL Gateway supports various clients: SQL Client, REST, HiveServer2, etc. Given 
the 1.16 feature freeze date, we won't be able to finish all the endpoints. 
Thus, we will exclude one of the REST APIs (tracked by this ticket) from 
FLINK-28163, since it is only needed by the SQL Client, and still try to 
complete the remaining ones.

In other words, we expect the SQL Gateway to support the REST and HiveServer2 
endpoints in 1.16, and the SQL Client in 1.17.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28777) Add configure session API for sql gateway rest endpoint

2022-08-02 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-28777:
---

 Summary: Add configure session API for sql gateway rest endpoint
 Key: FLINK-28777
 URL: https://issues.apache.org/jira/browse/FLINK-28777
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Gateway
Reporter: Wencong Liu


During the development of version 1.16, we will temporarily skip the configure 
session API in the SQL Gateway REST endpoint. Compatibility between the SQL 
Client and the SQL Gateway is temporarily ignored for now, so the relevant 
development work will be carried out during the 1.17 development cycle.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)