[RESULT] [VOTE] FLIP-314: Support Customized Job Lineage Listener

2024-03-04 Thread Yong Fang
Hi devs,

I'm happy to announce that FLIP-314: Support Customized Job Lineage
Listener [1] has been accepted with 9 approving votes (5 binding) [2]:

- Peter Huang (non-binding)
- Yangze Guo (binding)
- Zhanghao Chen (non-binding)
- Maciej Obuchowski (non-binding)
- Gyula Fóra (binding)
- Márton Balassi (binding)
- Feng Jin (binding)
- weijie guo (binding)
- Hang Ruan (non-binding)

There are no disapproving votes. Thanks to everyone who participated in the
discussion and voting.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
[2] https://lists.apache.org/thread/btbdclytd3gj70frosmlkrl5plhr9dl8

Best,
Fang Yong


[VOTE] FLIP-314: Support Customized Job Lineage Listener

2024-02-27 Thread Yong Fang
Hi devs,

I would like to restart the vote on FLIP-314: Support Customized Job
Lineage Listener [1].

Previously, we added lineage-related interfaces in FLIP-314. Before the
interfaces were developed and merged into master, @Maciej and
@Zhenqiu provided valuable suggestions for the interfaces from the
perspective of the lineage system. We therefore updated the interfaces of
FLIP-314 and discussed them again in the discussion thread [2].

I am therefore initiating a new vote on FLIP-314. The vote will be open for
at least 72 hours unless there is an objection or insufficient votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
[2] https://lists.apache.org/thread/wopprvp3ww243mtw23nj59p57cghh7mc

Best,
Fang Yong


Re: FW: RE: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2024-02-22 Thread Yong Fang
Hi Martijn,

If there are no more comments, I will start a vote on this. Thanks.

Best,
Fang Yong

On Tue, Feb 20, 2024 at 4:53 PM Yong Fang  wrote:

> Hi Martijn,
>
> Thank you for your attention. Let me first explain the specific situation
> of FLIP-314. FLIP-314 is currently in an accepted state, but actual code
> development has not yet begun, and the interface-related PR has not been
> merged into master. So it may not be necessary for us to create a separate
> FLIP. My current idea is to update the interfaces on FLIP-314 directly,
> but to start a separate thread with the context so that we can vote there.
>
> What do you think? Thanks
>
> Best,
> Fang Yong
>
> On Mon, Feb 19, 2024 at 8:27 PM Martijn Visser 
> wrote:
>
>> I'm a bit confused: did we add new interfaces after FLIP-314 was
>> accepted? If so, please move the new interfaces to a new FLIP and
>> start a separate vote. We can't retrospectively change an accepted
>> FLIP with new interfaces and a new vote.
>>
>> On Mon, Feb 19, 2024 at 3:22 AM Yong Fang  wrote:
>> >
>> > Hi all,
>> >
>> > If there is no more feedback, I will start a vote on the new interfaces
>> > tomorrow. Thanks.
>> >
>> > Best,
>> > Fang Yong
>> >
>> > On Thu, Feb 8, 2024 at 1:30 PM Yong Fang  wrote:
>> >
>> > > Hi devs,
>> > >
>> > > According to the online discussion in FLINK-3127 [1] and the offline
>> > > discussion with Maciej Obuchowski and Zhenqiu Huang, we would like to
>> > > update the lineage-vertex-related interfaces in FLIP-314 [2] as follows:
>> > >
>> > > 1. Introduce `LineageDataset`, which represents a source or sink in
>> > > `LineageVertex`. The fields in `LineageDataset` are as follows:
>> > >     /* Name for this particular dataset. */
>> > >     String name;
>> > >     /* Unique name for this dataset's storage, for example, the URL for a
>> > >     JDBC connector or the location for a lakehouse connector. */
>> > >     String namespace;
>> > >     /* Facets for the lineage vertex to describe particular information
>> > >     about the dataset, such as schema and config. */
>> > >     Map<String, LineageDatasetFacet> facets;
>> > >
>> > > 2. There may be multiple datasets in one `LineageVertex`, for example, a
>> > > Kafka source or a hybrid source. So users can get the dataset list from
>> > > `LineageVertex`:
>> > >     /** Get datasets from the lineage vertex. */
>> > >     List<LineageDataset> datasets();
>> > >
>> > > 3. There will be built-in facets for config and schema. To describe
>> > > columns in Table/SQL jobs and DataStream jobs, we introduce
>> > > `DatasetSchemaField`.
>> > >     /** Built-in config facet for dataset. */
>> > >     @PublicEvolving
>> > >     public interface DatasetConfigFacet extends LineageDatasetFacet {
>> > >         Map<String, String> config();
>> > >     }
>> > >
>> > >     /** Field for schema in dataset. */
>> > >     public interface DatasetSchemaField<T> {
>> > >         /** The name of the field. */
>> > >         String name();
>> > >         /** The type of the field. */
>> > >         T type();
>> > >     }
>> > >
>> > > Thanks to @Maciej and @Zhenqiu for their valuable input. Looking forward
>> > > to your feedback.
>> > >
>> > > Best,
>> > > Fang Yong
>> > >
>> > > On Mon, Sep 25, 2023 at 1:18 PM Shammon FY  wrote:
>> > >
>> > >> Hi David,
>> > >>
>> > >> Do you want the detailed topology of a Flink job? You can get
>> > >> `JobDetailsInfo` from `RestClusterClient` with the submitted job ID; it
>> > >> has `String jsonPlan`. You can parse the JSON plan to get all steps and
>> > >> the relations between them in a Flink job. Hope this helps, thanks!
>> > >>
>> > >> Best,
>> > >> Shammon FY
>> > >>
>> > >> On Tue, Sep 19, 2023 at 11:46 PM David Radley <
>> david_rad...@uk.ibm.com>
>> > >> wrote:
>> > >>
>> > >>> Hi there,
>> > >>> I am looking at the interfaces. If I am reading it correctly, there is
>> > >>> one relationship between the source and sink, and this relationship
>> > >>> rep

Re: FW: RE: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2024-02-20 Thread Yong Fang
Hi Martijn,

Thank you for your attention. Let me first explain the specific situation
of FLIP-314. FLIP-314 is currently in an accepted state, but actual code
development has not yet begun, and the interface-related PR has not been
merged into master. So it may not be necessary for us to create a separate
FLIP. My current idea is to update the interfaces on FLIP-314 directly,
but to start a separate thread with the context so that we can vote there.

What do you think? Thanks

Best,
Fang Yong

On Mon, Feb 19, 2024 at 8:27 PM Martijn Visser 
wrote:

> I'm a bit confused: did we add new interfaces after FLIP-314 was
> accepted? If so, please move the new interfaces to a new FLIP and
> start a separate vote. We can't retrospectively change an accepted
> FLIP with new interfaces and a new vote.
>
> On Mon, Feb 19, 2024 at 3:22 AM Yong Fang  wrote:
> >
> > Hi all,
> >
> > If there is no more feedback, I will start a vote on the new interfaces
> > tomorrow. Thanks.
> >
> > Best,
> > Fang Yong
> >
> > On Thu, Feb 8, 2024 at 1:30 PM Yong Fang  wrote:
> >
> > > Hi devs,
> > >
> > > > According to the online discussion in FLINK-3127 [1] and the offline
> > > > discussion with Maciej Obuchowski and Zhenqiu Huang, we would like to
> > > > update the lineage-vertex-related interfaces in FLIP-314 [2] as follows:
> > > >
> > > > 1. Introduce `LineageDataset`, which represents a source or sink in
> > > > `LineageVertex`. The fields in `LineageDataset` are as follows:
> > > >     /* Name for this particular dataset. */
> > > >     String name;
> > > >     /* Unique name for this dataset's storage, for example, the URL for a
> > > >     JDBC connector or the location for a lakehouse connector. */
> > > >     String namespace;
> > > >     /* Facets for the lineage vertex to describe particular information
> > > >     about the dataset, such as schema and config. */
> > > >     Map<String, LineageDatasetFacet> facets;
> > > >
> > > > 2. There may be multiple datasets in one `LineageVertex`, for example, a
> > > > Kafka source or a hybrid source. So users can get the dataset list from
> > > > `LineageVertex`:
> > > >     /** Get datasets from the lineage vertex. */
> > > >     List<LineageDataset> datasets();
> > > >
> > > > 3. There will be built-in facets for config and schema. To describe
> > > > columns in Table/SQL jobs and DataStream jobs, we introduce
> > > > `DatasetSchemaField`.
> > > >     /** Built-in config facet for dataset. */
> > > >     @PublicEvolving
> > > >     public interface DatasetConfigFacet extends LineageDatasetFacet {
> > > >         Map<String, String> config();
> > > >     }
> > > >
> > > >     /** Field for schema in dataset. */
> > > >     public interface DatasetSchemaField<T> {
> > > >         /** The name of the field. */
> > > >         String name();
> > > >         /** The type of the field. */
> > > >         T type();
> > > >     }
> > > >
> > > > Thanks to @Maciej and @Zhenqiu for their valuable input. Looking forward
> > > > to your feedback.
> > >
> > > Best,
> > > Fang Yong
> > >
> > > On Mon, Sep 25, 2023 at 1:18 PM Shammon FY  wrote:
> > >
> > >> Hi David,
> > >>
> > > >> Do you want the detailed topology of a Flink job? You can get
> > > >> `JobDetailsInfo` from `RestClusterClient` with the submitted job ID; it
> > > >> has `String jsonPlan`. You can parse the JSON plan to get all steps and
> > > >> the relations between them in a Flink job. Hope this helps, thanks!
> > >>
> > >> Best,
> > >> Shammon FY
> > >>
> > >> On Tue, Sep 19, 2023 at 11:46 PM David Radley <
> david_rad...@uk.ibm.com>
> > >> wrote:
> > >>
> > >>> Hi there,
> > > >>> I am looking at the interfaces. If I am reading it correctly, there is
> > > >>> one relationship between the source and sink, and this relationship
> > > >>> represents the operational lineage. Lineage is usually represented as
> > > >>> asset -> process -> asset; see for example
> > > >>> https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph
> > > >>>
> > > >>> Maybe I am missing it, but it seems to me that it would be useful to
> > > >>> store the process in the lineage graph.
> > >>>
> > >>> It is useful to have the top level lineage as source -> Flink job ->
> > >>> sink. Where the Flink job is the p

Re: FW: RE: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2024-02-18 Thread Yong Fang
Hi all,

If there is no more feedback, I will start a vote on the new interfaces
tomorrow. Thanks.

Best,
Fang Yong

On Thu, Feb 8, 2024 at 1:30 PM Yong Fang  wrote:

> Hi devs,
>
> According to the online discussion in FLINK-3127 [1] and the offline
> discussion with Maciej Obuchowski and Zhenqiu Huang, we would like to
> update the lineage-vertex-related interfaces in FLIP-314 [2] as follows:
>
> 1. Introduce `LineageDataset`, which represents a source or sink in
> `LineageVertex`. The fields in `LineageDataset` are as follows:
>     /* Name for this particular dataset. */
>     String name;
>     /* Unique name for this dataset's storage, for example, the URL for a
>     JDBC connector or the location for a lakehouse connector. */
>     String namespace;
>     /* Facets for the lineage vertex to describe particular information
>     about the dataset, such as schema and config. */
>     Map<String, LineageDatasetFacet> facets;
>
> 2. There may be multiple datasets in one `LineageVertex`, for example, a
> Kafka source or a hybrid source. So users can get the dataset list from
> `LineageVertex`:
>     /** Get datasets from the lineage vertex. */
>     List<LineageDataset> datasets();
>
> 3. There will be built-in facets for config and schema. To describe
> columns in Table/SQL jobs and DataStream jobs, we introduce
> `DatasetSchemaField`.
>     /** Built-in config facet for dataset. */
>     @PublicEvolving
>     public interface DatasetConfigFacet extends LineageDatasetFacet {
>         Map<String, String> config();
>     }
>
>     /** Field for schema in dataset. */
>     public interface DatasetSchemaField<T> {
>         /** The name of the field. */
>         String name();
>         /** The type of the field. */
>         T type();
>     }
>
> Thanks to @Maciej and @Zhenqiu for their valuable input. Looking forward
> to your feedback.
>
> Best,
> Fang Yong
>
> On Mon, Sep 25, 2023 at 1:18 PM Shammon FY  wrote:
>
>> Hi David,
>>
>> Do you want the detailed topology of a Flink job? You can get
>> `JobDetailsInfo` from `RestClusterClient` with the submitted job ID; it has
>> `String jsonPlan`. You can parse the JSON plan to get all steps and the
>> relations between them in a Flink job. Hope this helps, thanks!
>>
>> Best,
>> Shammon FY
>>
>> On Tue, Sep 19, 2023 at 11:46 PM David Radley 
>> wrote:
>>
>>> Hi there,
>>> I am looking at the interfaces. If I am reading it correctly, there is
>>> one relationship between the source and sink, and this relationship
>>> represents the operational lineage. Lineage is usually represented as
>>> asset -> process -> asset; see for example
>>> https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph
>>>
>>> Maybe I am missing it, but it seems to me that it would be useful to
>>> store the process in the lineage graph.
>>>
>>> It is useful to have the top-level lineage as source -> Flink job ->
>>> sink, where the Flink job is the process, but also to have this asset ->
>>> process -> asset pattern for each of the steps in the job. If this is
>>> present, please could you point me to it?
>>>
>>>   Kind regards, David.
>>>
>>>
>>>
>>>
>>>
>>> From: David Radley 
>>> Date: Tuesday, 19 September 2023 at 16:11
>>> To: dev@flink.apache.org 
>>> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-314: Support Customized Job
>>> Lineage Listener
>>> Hi,
>>> I notice that there is an experimental lineage integration for Flink
>>> with OpenLineage (https://openlineage.io/docs/integrations/flink). I
>>> think this feature would allow for a superior Flink OpenLineage integration.
>>> Kind regards, David.
>>>
>>> From: XTransfer 
>>> Date: Tuesday, 19 September 2023 at 15:47
>>> To: dev@flink.apache.org 
>>> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-314: Support Customized Job
>>> Lineage Listener
>>> Thanks Shammon for this proposal.
>>>
>>> That’s helpful for collecting the lineage of Flink tasks.
>>> Looking forward to its implementation.
>>>
>>> Best,
>>> Jiabao
>>>
>>>
>>> > On Sep 18, 2023, at 20:56, Leonard Xu wrote:
>>> >
>>> > Thanks Shammon for the information; the comment makes the lifecycle
>>> > clearer.
>>> > +1
>>> >
>>> >
>>> > Best,
>>> > Leonard
>>> >
>>> >
>>> >> On Sep 18, 2023, at 7:54 PM, Shammon FY  wrote:
>>> >>
>>> >

Re: FW: RE: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2024-02-07 Thread Yong Fang
Hi devs,

According to the online discussion in FLINK-3127 [1] and the offline
discussion with Maciej Obuchowski and Zhenqiu Huang, we would like to
update the lineage-vertex-related interfaces in FLIP-314 [2] as follows:

1. Introduce `LineageDataset`, which represents a source or sink in
`LineageVertex`. The fields in `LineageDataset` are as follows:
    /* Name for this particular dataset. */
    String name;
    /* Unique name for this dataset's storage, for example, the URL for a
    JDBC connector or the location for a lakehouse connector. */
    String namespace;
    /* Facets for the lineage vertex to describe particular information
    about the dataset, such as schema and config. */
    Map<String, LineageDatasetFacet> facets;

2. There may be multiple datasets in one `LineageVertex`, for example, a
Kafka source or a hybrid source. So users can get the dataset list from
`LineageVertex`:
    /** Get datasets from the lineage vertex. */
    List<LineageDataset> datasets();

3. There will be built-in facets for config and schema. To describe columns
in Table/SQL jobs and DataStream jobs, we introduce `DatasetSchemaField`.
    /** Built-in config facet for dataset. */
    @PublicEvolving
    public interface DatasetConfigFacet extends LineageDatasetFacet {
        Map<String, String> config();
    }

    /** Field for schema in dataset. */
    public interface DatasetSchemaField<T> {
        /** The name of the field. */
        String name();
        /** The type of the field. */
        T type();
    }
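
To make the three points above concrete, here is a minimal, self-contained Java sketch of how a connector might expose several datasets from one lineage vertex. It is an illustration only: the interfaces are redeclared locally rather than imported from Flink, their method-style signatures and the `name()` method on `LineageDatasetFacet` are assumptions, and `dataset(...)` and `kafkaSourceVertex(...)` are hypothetical helpers that are not part of the proposal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Illustrative sketch; these local interfaces only mirror the names used in the proposal. */
public class LineageDatasetSketch {

    /** Assumed minimal facet contract: each facet is identified by a name. */
    interface LineageDatasetFacet {
        String name();
    }

    /** Config facet as described in point 3. */
    interface DatasetConfigFacet extends LineageDatasetFacet {
        Map<String, String> config();
    }

    /** Dataset as described in point 1 (methods instead of fields is an assumption). */
    interface LineageDataset {
        String name();
        String namespace();
        Map<String, LineageDatasetFacet> facets();
    }

    /** Vertex as described in point 2: one vertex may expose several datasets. */
    interface LineageVertex {
        List<LineageDataset> datasets();
    }

    /** Hypothetical helper: builds a dataset carrying a single config facet. */
    static LineageDataset dataset(String name, String namespace, Map<String, String> config) {
        DatasetConfigFacet configFacet = new DatasetConfigFacet() {
            @Override public String name() { return "config"; }
            @Override public Map<String, String> config() { return config; }
        };
        Map<String, LineageDatasetFacet> facets = Map.of(configFacet.name(), configFacet);
        return new LineageDataset() {
            @Override public String name() { return name; }
            @Override public String namespace() { return namespace; }
            @Override public Map<String, LineageDatasetFacet> facets() { return facets; }
        };
    }

    /** Hypothetical helper: a Kafka source reading several topics yields one dataset per topic. */
    static LineageVertex kafkaSourceVertex(String namespace, List<String> topics) {
        List<LineageDataset> datasets = new ArrayList<>();
        for (String topic : topics) {
            datasets.add(dataset(topic, namespace, Map.of("topic", topic)));
        }
        List<LineageDataset> view = Collections.unmodifiableList(datasets);
        return () -> view;
    }

    public static void main(String[] args) {
        LineageVertex vertex =
                kafkaSourceVertex("kafka://broker:9092", List.of("orders", "payments"));
        for (LineageDataset d : vertex.datasets()) {
            System.out.println(d.namespace() + "/" + d.name() + " facets=" + d.facets().keySet());
        }
    }
}
```

A lineage listener could then iterate over `datasets()` and map each `namespace`/`name` pair to an entry in an external lineage system, which is the use case Maciej and Zhenqiu raised.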

Thanks to @Maciej and @Zhenqiu for their valuable input. Looking forward
to your feedback.

Best,
Fang Yong

On Mon, Sep 25, 2023 at 1:18 PM Shammon FY  wrote:

> Hi David,
>
> Do you want the detailed topology of a Flink job? You can get
> `JobDetailsInfo` from `RestClusterClient` with the submitted job ID; it has
> `String jsonPlan`. You can parse the JSON plan to get all steps and the
> relations between them in a Flink job. Hope this helps, thanks!
>
> Best,
> Shammon FY
>
> On Tue, Sep 19, 2023 at 11:46 PM David Radley 
> wrote:
>
>> Hi there,
>> I am looking at the interfaces. If I am reading it correctly, there is one
>> relationship between the source and sink, and this relationship represents
>> the operational lineage. Lineage is usually represented as asset -> process
>> -> asset; see for example
>> https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph
>>
>> Maybe I am missing it, but it seems to me that it would be useful to
>> store the process in the lineage graph.
>>
>> It is useful to have the top-level lineage as source -> Flink job ->
>> sink, where the Flink job is the process, but also to have this asset ->
>> process -> asset pattern for each of the steps in the job. If this is
>> present, please could you point me to it?
>>
>>   Kind regards, David.
>>
>>
>>
>>
>>
>> From: David Radley 
>> Date: Tuesday, 19 September 2023 at 16:11
>> To: dev@flink.apache.org 
>> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-314: Support Customized Job
>> Lineage Listener
>> Hi,
>> I notice that there is an experimental lineage integration for Flink with
>> OpenLineage (https://openlineage.io/docs/integrations/flink). I think
>> this feature would allow for a superior Flink OpenLineage integration.
>> Kind regards, David.
>>
>> From: XTransfer 
>> Date: Tuesday, 19 September 2023 at 15:47
>> To: dev@flink.apache.org 
>> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-314: Support Customized Job
>> Lineage Listener
>> Thanks Shammon for this proposal.
>>
>> That’s helpful for collecting the lineage of Flink tasks.
>> Looking forward to its implementation.
>>
>> Best,
>> Jiabao
>>
>>
>> > On Sep 18, 2023, at 20:56, Leonard Xu wrote:
>> >
>> > Thanks Shammon for the information; the comment makes the lifecycle
>> > clearer.
>> > +1
>> >
>> >
>> > Best,
>> > Leonard
>> >
>> >
>> >> On Sep 18, 2023, at 7:54 PM, Shammon FY  wrote:
>> >>
>> >> Hi devs,
>> >>
>> >> After discussing with @Qingsheng, I fixed a minor issue with the lineage
>> >> lifecycle in `StreamExecutionEnvironment`. I have added a comment to
>> >> explain that the lineage information in `StreamExecutionEnvironment` will
>> >> be consistent with that of the transformations. When users clear the
>> >> existing transformations, the added lineage information will also be deleted.
>> >>
>> >> Please help to review it again. If there are no more concerns
>> >> about FLIP-314 [1], I would like to start voting later, thanks. cc @Leonard
>> >>
>> >> Best,
>> >> Shammon FY
>> >>
>> >> On Mon, Jul 17, 2023 at 3:43 PM Shammon FY <zjur...@gmail.com> wrote:
>> >> Hi devs,
>> >>
>> >> Thanks for all the valuable feedback. If there are no more concerns
>> >> about FLIP-314 [1], I would like to start voting later, thanks.
>> >>
>> >>
>> >> [1]
>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
>> >>
>> >> Best,
>> >> Shammon FY
>> >>
>> >>
>> >> On Wed, Jul 12, 2023 at 11:18 

[RESULT] [VOTE] FLIP-398: Improve Serialization Configuration And Usage In Flink

2024-01-08 Thread Yong Fang
Hi devs,

I'm happy to announce that FLIP-398: Improve Serialization Configuration
And Usage In Flink[1] has been accepted with 4 approving votes (3 binding)
[2]:

 - Xintong Song (binding)
 - Zhanghao Chen (non-binding)
 - Zhu Zhu (binding)
 - weijie guo (binding)

There are no disapproving votes.

Thanks again to everyone who participated in the discussion and voting.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
[2] https://lists.apache.org/thread/2xmcxs67xxzwool554fglrnklyvw348h

Best,
Fang Yong


Re: [Discuss] FLIP-407: Improve Flink Client performance in interactive scenarios

2024-01-07 Thread Yong Fang
I agree with @Rui that the current configuration for Flink Client is a
little complex. Can we just provide one strategy with fewer configuration
items for all scenarios?

Best,
Fang Yong

On Mon, Jan 8, 2024 at 11:19 AM Rui Fan <1996fan...@gmail.com> wrote:

> Thanks xiangyu for driving this proposal! And sorry for the
> late reply.
>
> Overall looks good to me, I only have some minor questions:
>
> 1. Do we need to introduce 3 collect strategies in the first version?
>
> Large and comprehensive configuration items will bring
> additional learning costs and usage costs to users. I tend to
> provide users with out-of-the-box parameters and 2 collect
> strategies may be enough for users.
>
> IIUC, there is no big difference between exponential-delay and
> incremental-delay, especially with the default parameters provided.
> I wonder whether we could provide a multiplier for the exponential-delay
> strategy and remove the incremental-delay strategy.
>
> Of course, if you think the multiplier option is not needed based on
> your production experience, that's totally fine with me. Simple is better.
>
> 2. Which strategy do you think is best in mass production?
>
> I'm working on FLIP-364[1], it's related to Flink failover restart
> strategy. IIUC, when one cluster only has a few flink jobs,
> fixed-delay is fine. It guarantees minimal latency without too
> much stress. But if one cluster has too many jobs, fixed-delay
> may not be stable.
>
> Do you think exponential-delay is better than fixed delay in this
> scenario? And which strategy is used in your production for now?
> Would you mind sharing it?
>
> Looking forward to your opinion.
>
> Best,
> Rui
>
> On Sat, Jan 6, 2024 at 5:54 PM xiangyu feng  wrote:
>
> > Hi all,
> >
> > Thanks for the comments.
> >
> > If there is no further comment, we will open the voting thread next week.
> >
> > Regards,
> > Xiangyu
> >
> > On Wed, Jan 3, 2024 at 16:46, Zhanghao Chen wrote:
> >
> > > Thanks for driving this effort on improving the interactive use experience
> > > of Flink. The proposal overall looks good to me.
> > >
> > > Best,
> > > Zhanghao Chen
> > > 
> > > From: xiangyu feng 
> > > Sent: Tuesday, December 26, 2023 16:51
> > > To: dev@flink.apache.org 
> > > Subject: [Discuss] FLIP-407: Improve Flink Client performance in
> > > interactive scenarios
> > >
> > > Hi devs,
> > >
> > > I'm opening this thread to discuss FLIP-407: Improve Flink Client
> > > performance in interactive scenarios. The POC test results and design doc
> > > can be found at: FLIP-407
> > > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+when+interacting+with+dedicated+Flink+Session+Clusters>.
> > >
> > > Currently, Flink Client is mainly designed for one-time interaction with
> > > the Flink Cluster. All the resources (HTTP connections, threads, HA
> > > services) and instances (ClusterDescriptor, ClusterClient, RestClient) are
> > > created and recycled for each interaction. This works well when users do
> > > not need to interact frequently with the Flink Cluster and also saves
> > > resource usage, since resources are recycled immediately after each use.
> > >
> > > However, in OLAP or StreamingWarehouse scenarios, users might submit
> > > interactive jobs to a dedicated Flink Session Cluster very often. In this
> > > case, we find that short queries that can finish in less than 1s in the
> > > Flink Cluster still have an E2E latency greater than 2s. Hence, we
> > > propose this FLIP to improve the Flink Client performance in this scenario.
> > > This could also improve the user experience when using session debug mode.
> > >
> > > The major change in this FLIP is a newly introduced option,
> > > *'execution.interactive-client'*. When this option is enabled, Flink
> > > Client will reuse all the necessary resources to improve interactive
> > > performance, including HA services, HTTP connections, threads and all
> > > kinds of instances related to a long-running Flink Cluster. The default
> > > value of this option will be false, in which case Flink Client will behave
> > > as before.
> > >
> > > Also, this FLIP proposes a configurable RetryStrategy for when the client
> > > fetches results from the Flink Cluster. In interactive scenarios, this can
> > > save more than 15% of TM CPU usage without performance degradation.
> > >
> > > Looking forward to your feedback, thanks.
> > >
> > > Best regards,
> > > Xiangyu
> > >
> >
>
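
For readers comparing the collect strategies named in this thread (fixed-delay, incremental-delay, exponential-delay), the following Java sketch shows one plausible way each could compute the wait before the next result fetch. It is not taken from the FLIP: the method names, parameters, and default values below are assumptions chosen only to illustrate Rui's point that incremental and exponential delays differ mainly in how fast they grow towards the cap.

```java
/** Illustrative sketch; parameter names and values are assumptions, not FLIP-407 options. */
public class CollectRetryDelays {

    /** Fixed delay: the same wait before every fetch attempt. */
    static long fixedDelay(long initialMs, int attempt) {
        return initialMs;
    }

    /** Incremental delay: grows linearly with the attempt number, capped at maxMs. */
    static long incrementalDelay(long initialMs, long incrementMs, long maxMs, int attempt) {
        return Math.min(maxMs, initialMs + incrementMs * attempt);
    }

    /** Exponential delay: grows by a constant multiplier per attempt, capped at maxMs. */
    static long exponentialDelay(long initialMs, double multiplier, long maxMs, int attempt) {
        double delay = initialMs * Math.pow(multiplier, attempt);
        return (long) Math.min((double) maxMs, delay);
    }

    public static void main(String[] args) {
        // Attempt 0 is the first retry; all three strategies start from the same initial delay.
        for (int attempt = 0; attempt <= 5; attempt++) {
            System.out.printf(
                    "attempt=%d fixed=%dms incremental=%dms exponential=%dms%n",
                    attempt,
                    fixedDelay(100, attempt),
                    incrementalDelay(100, 100, 2000, attempt),
                    exponentialDelay(100, 2.0, 2000, attempt));
        }
    }
}
```

With a multiplier close to 1, the exponential variant stays close to the incremental one over the first few attempts, which is why a single strategy with a configurable multiplier may be enough.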


Re: [DISCUSS] FLIP-403: High Availability Services for OLAP Scenarios

2024-01-07 Thread Yong Fang
Thanks Yangze for starting this discussion. I have one comment: why do we
need to abstract two services as `LeaderServices` and
`PersistenceServices`?

From the content, the purpose of this FLIP is to make job failover more
lightweight, so it would be more appropriate to abstract two services as
`ClusterHighAvailabilityService` and `JobHighAvailabilityService` instead
of `LeaderServices` and `PersistenceServices` based on leader and store. In
this way, we can create a `JobHighAvailabilityService` that has a leader
service and store for the job that meets the requirements based on the
configuration in the zk/k8s high availability service.
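
As a rough Java sketch of the split suggested above (an illustration, not the FLIP's design): `ClusterHighAvailabilityService` and `JobHighAvailabilityService` are the names proposed in this message, while `LeaderService`, `JobStore`, the two store implementations, and `jobHaService(...)` are hypothetical stand-ins introduced here.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch; all types here are local stand-ins, not Flink classes. */
public class HaServiceSketch {

    /** Hypothetical stand-in for leader election/retrieval. */
    interface LeaderService {
        String leaderAddress();
    }

    /** Hypothetical stand-in for job metadata persistence. */
    interface JobStore {
        void put(String jobId, String metadata);
        Optional<String> get(String jobId);
    }

    /** Cluster-level HA: only leader election/retrieval for cluster components. */
    interface ClusterHighAvailabilityService {
        LeaderService clusterLeaderService();
    }

    /** Job-level HA: bundles the leader service and the store used for a job. */
    interface JobHighAvailabilityService {
        LeaderService jobLeaderService();
        JobStore jobStore();
    }

    /** Store that keeps job metadata (in memory here, ZK/K8s in a real setup). */
    static final class InMemoryJobStore implements JobStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        @Override public void put(String jobId, String metadata) { data.put(jobId, metadata); }
        @Override public Optional<String> get(String jobId) {
            return Optional.ofNullable(data.get(jobId));
        }
    }

    /** Store that drops everything, for OLAP jobs that are never recovered. */
    static final class NoOpJobStore implements JobStore {
        @Override public void put(String jobId, String metadata) { }
        @Override public Optional<String> get(String jobId) { return Optional.empty(); }
    }

    /** Hypothetical factory: picks the job store based on configuration. */
    static JobHighAvailabilityService jobHaService(boolean persistJobMetadata, LeaderService leader) {
        JobStore store = persistJobMetadata ? new InMemoryJobStore() : new NoOpJobStore();
        return new JobHighAvailabilityService() {
            @Override public LeaderService jobLeaderService() { return leader; }
            @Override public JobStore jobStore() { return store; }
        };
    }

    public static void main(String[] args) {
        LeaderService leader = () -> "jobmanager-0:6123";
        JobHighAvailabilityService olap = jobHaService(false, leader);
        olap.jobStore().put("job-1", "job graph bytes");
        System.out.println("leader=" + olap.jobLeaderService().leaderAddress());
        System.out.println("metadata persisted=" + olap.jobStore().get("job-1").isPresent());
    }
}
```

Under this shape, an OLAP deployment would keep leader election for the cluster while the job-level service skips persistence entirely.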

WDYT?

Best,
Fang Yong

On Fri, Dec 29, 2023 at 8:10 PM xiangyu feng  wrote:

> Thanks Yangze for restarting this discussion.
>
> +1 for the overall idea. By splitting the HighAvailabilityServices into
> LeaderServices and PersistenceServices, we may support configuring
> different storage behind them in the future.
>
> We did run into real problems in production where too much job metadata was
> being stored on ZK, causing system instability.
>
>
> On Fri, Dec 29, 2023 at 10:21, Yangze Guo wrote:
>
> > Thanks for the response, Zhanghao.
> >
> > PersistenceServices sounds good to me.
> >
> > Best,
> > Yangze Guo
> >
> > On Wed, Dec 27, 2023 at 11:30 AM Zhanghao Chen
> >  wrote:
> > >
> > > Thanks for driving this effort, Yangze! The proposal overall LGTM. Apart
> > > from the throughput enhancement in the OLAP scenario, the separation of
> > > leader election/discovery services and the metadata persistence services
> > > will also make the HA impl clearer and easier to maintain. Just a minor
> > > comment on naming: would it be better to rename PersistentServices to
> > > PersistenceServices, as usually we put a noun before Services?
> > >
> > > Best,
> > > Zhanghao Chen
> > > 
> > > From: Yangze Guo 
> > > Sent: Tuesday, December 19, 2023 17:33
> > > To: dev 
> > > Subject: [DISCUSS] FLIP-403: High Availability Services for OLAP
> > > Scenarios
> > >
> > > Hi, there,
> > >
> > > We would like to start a discussion thread on "FLIP-403: High
> > > Availability Services for OLAP Scenarios"[1].
> > >
> > > Currently, Flink's high availability service consists of two
> > > mechanisms: leader election/retrieval services for JobManager and
> > > persistent services for job metadata. However, these mechanisms are
> > > set up in an "all or nothing" manner. In OLAP scenarios, we typically
> > > only require leader election/retrieval services for JobManager
> > > components since jobs usually do not have a restart strategy.
> > > Additionally, the persistence of job states can negatively impact the
> > > cluster's throughput, especially for short query jobs.
> > >
> > > To address these issues, this FLIP proposes splitting the
> > > HighAvailabilityServices into LeaderServices and PersistentServices,
> > > and enabling users to independently configure the high availability
> > > strategies specifically related to jobs.
> > >
> > > Please find more details in the FLIP wiki document [1]. Looking
> > > forward to your feedback.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-403+High+Availability+Services+for+OLAP+Scenarios
> > >
> > > Best,
> > > Yangze Guo
> >
>


Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2024-01-07 Thread Yong Fang
Hi Ken,

I think the main reason is that currently Kryo is the only generic
serializer in Flink. I'm looking forward to your FLIP of Fury, and we can
continue to discuss this issue there.

If there are no other questions, I will close the voting for this FLIP.
Thank you again.

Best,
Fang Yong

On Sat, Jan 6, 2024 at 2:27 AM Ken Krugler 
wrote:

> Hi Fang Yong,
>
> Thanks for the response, and I understand the desire to limit the impact
> of this FLIP.
>
> I guess I should spend the time to start a new FLIP on switching to Fury,
> which could include cleaning up method names.
>
> In the context of “facilitate user understanding”, one aspect of this
> cleanup is the current ExecutionConfig.enable/disable/hasGenericTypes()
> methods.
>
> These are inconsistent with the current xxxKryo() methods, and cause
> confusion whenever I’m teaching a Flink course :)
>
> Regards,
>
> — Ken
>
>
>
>
> On Jan 4, 2024, at 6:40 PM, Yong Fang  wrote:
>
> Hi Ken,
>
> Sorry for the late reply. After discussing with @Xintong, we think it is
> better to keep the method names in the FLIP mainly for the following
> reasons:
>
> 1. This FLIP is mainly to support the configurable serializer while
> keeping consistent with Flink at the semantic layer. Keeping the existing
> naming rules can facilitate user understanding.
>
> 2. In the future, if Flink can choose Fury as the generic serializer, we
> can update the corresponding methods in that FLIP after the discussion of
> Fury is completed. This will be a minor modification, and we can avoid
> over-design in the current FLIP.
>
> Thanks for your feedback!
>
> Best,
> Fang Yong
>
> On Fri, Dec 29, 2023 at 12:38 PM Ken Krugler 
> wrote:
>
>> Hi Xintong,
>>
>> I agree that decoupling from Kryo is a bigger topic, well beyond the
>> scope of this FLIP.
>>
>> The reason I’d brought up Fury is that this increases my confidence that
>> Flink will want to decouple from Kryo sooner rather than later.
>>
>> So I feel it would be worth investing in a (minor) name change now, to
>> improve that migration path in the future. Thus my suggestion for avoiding
>> the explicit use of Kryo in method names.
>>
>> Regards,
>>
>> — Ken
>>
>>
>>
>>
>> > On Dec 17, 2023, at 7:16 PM, Xintong Song 
>> wrote:
>> >
>> > Hi Ken,
>> >
>> > I think the main purpose of this FLIP is to change how users interact with
>> > the knobs for customizing the serialization behaviors, from requiring code
>> > changes to working with pure configurations. Redesigning the knobs (i.e.,
>> > names, semantics, etc.), on the other hand, is not the purpose of this
>> > FLIP. Preserving the existing names and semantics should also help minimize
>> > the migration cost for existing users. Therefore, I'm in favor of not
>> > changing them.
>> >
>> > Concerning decoupling from Kryo, and introducing other serialization
>> > frameworks like Fury, I think that's a bigger topic that is worth further
>> > discussion. At the moment, I'm not aware of any community consensus on
>> > doing so. And even if in the future we decide to do so, the changes needed
>> > should be the same w/ or w/o this FLIP. So I'd suggest not to block this
>> > FLIP on these issues.
>> >
>> > WDYT?
>> >
>> > Best,
>> >
>> > Xintong
>> >
>> >
>> >
>> > On Fri, Dec 15, 2023 at 1:40 AM Ken Krugler <
>> kkrugler_li...@transpac.com>
>> > wrote:
>> >
>> >> Hi Yong,
>> >>
>> >> Looks good, thanks for creating this.
>> >>
>> >> One comment - related to my recent email about Fury, I would love to see
>> >> the v2 serialization decoupled from Kryo.
>> >>
>> >> As part of that, instead of using xxxKryo in methods, call them xxxGeneric.
>> >>
>> >> A more extreme change would be to totally rely on Fury (so no more POJO
>> >> serializer). Fury is faster than the POJO serializer in my tests, but this
>> >> would be a much bigger change.
>> >>
>> >> Though it could dramatically simplify the Flink serialization support.
>> >>
>> >> — Ken
>> >>
>> >> PS - a separate issue is how to migrate state from Kryo to something like
>> >> Fury, which supports schema evolution. I think this might be possible, by
>> >> having a sma

Re: [VOTE] FLIP-397: Add config options for administrator JVM options

2024-01-04 Thread Yong Fang
+1 (binding)

Best,
Fang Yong

On Thu, Jan 4, 2024 at 1:14 PM xiangyu feng  wrote:

> +1 (non-binding)
>
> Regards,
> Xiangyu Feng
>
> On Thu, Jan 4, 2024 at 13:03, Rui Fan <1996fan...@gmail.com> wrote:
>
> > +1 (binding)
> >
> > Best,
> > Rui
> >
> > On Thu, Jan 4, 2024 at 11:45 AM Benchao Li  wrote:
> >
> > > +1 (binding)
> > >
> > > On Thu, Jan 4, 2024 at 10:30 AM Zhanghao Chen wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > Thanks for all the feedback on FLIP-397 [1], which proposes to add a
> > > > set of default JVM options for administrator use that are prepended to
> > > > the user-set extra JVM options for easier platform-wide JVM pre-tuning.
> > > > It has been discussed in [2].
> > > >
> > > > I'd like to start a vote. The vote will be open for at least 72 hours
> > > > (until January 8th 12:00 GMT) unless there is an objection or
> > > > insufficient votes.
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-397%3A+Add+config+options+for+administrator+JVM+options
> > > > [2] https://lists.apache.org/thread/cflonyrfd1ftmyrpztzj3ywckbq41jzg
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > >
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
> >
>


Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2024-01-04 Thread Yong Fang
Hi Ken,

Sorry for the late reply. After discussing with @Xintong, we think it is
better to keep the method names in the FLIP mainly for the following
reasons:

1. This FLIP mainly aims to make serializers configurable while staying
consistent with Flink's existing semantics. Keeping the existing naming
conventions makes it easier for users to understand.

2. In the future, if Flink can choose Fury as the generic serializer, we
can update the corresponding methods in that FLIP after the discussion of
Fury is completed. This will be a minor modification, and we can avoid
over-design in the current FLIP.

Thanks for your feedback!

Best,
Fang Yong

On Fri, Dec 29, 2023 at 12:38 PM Ken Krugler 
wrote:

> Hi Xintong,
>
> I agree that decoupling from Kryo is a bigger topic, well beyond the scope
> of this FLIP.
>
> The reason I’d brought up Fury is that this increases my confidence that
> Flink will want to decouple from Kryo sooner rather than later.
>
> So I feel it would be worth investing in a (minor) name change now, to
> improve that migration path in the future. Thus my suggestion for avoiding
> the explicit use of Kryo in method names.
>
> Regards,
>
> — Ken
>
>
>
>
> > On Dec 17, 2023, at 7:16 PM, Xintong Song  wrote:
> >
> > Hi Ken,
> >
> > I think the main purpose of this FLIP is to change how users interact
> with
> > the knobs for customizing the serialization behaviors, from requiring
> code
> > changes to working with pure configurations. Redesigning the knobs (i.e.,
> > names, semantics, etc.), on the other hand, is not the purpose of this
> > FLIP. Preserving the existing names and semantics should also help
> minimize
> > the migration cost for existing users. Therefore, I'm in favor of not
> > changing them.
> >
> > Concerning decoupling from Kryo, and introducing other serialization
> > frameworks like Fury, I think that's a bigger topic that is worth further
> > discussion. At the moment, I'm not aware of any community consensus on
> > doing so. And even if in the future we decide to do so, the changes
> needed
> > should be the same w/ or w/o this FLIP. So I'd suggest not to block this
> > FLIP on these issues.
> >
> > WDYT?
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Fri, Dec 15, 2023 at 1:40 AM Ken Krugler wrote:
> >
> >> Hi Yong,
> >>
> >> Looks good, thanks for creating this.
> >>
> >> One comment - related to my recent email about Fury, I would love to see
> >> the v2 serialization decoupled from Kryo.
> >>
> >> As part of that, instead of using xxxKryo in methods, call them
> xxxGeneric.
> >>
> >> A more extreme change would be to totally rely on Fury (so no more POJO
> >> serializer). Fury is faster than the POJO serializer in my tests, but
> this
> >> would be a much bigger change.
> >>
> >> Though it could dramatically simplify the Flink serialization support.
> >>
> >> — Ken
> >>
> >> PS - a separate issue is how to migrate state from Kryo to something
> like
> >> Fury, which supports schema evolution. I think this might be possible,
> by
> >> having a smarter deserializer that identifies state as being created by
> >> Kryo, and using (shaded) Kryo to deserialize, while still writing as
> Fury.
> >>
> >>> On Dec 6, 2023, at 6:35 PM, Yong Fang  wrote:
> >>>
> >>> Hi devs,
> >>>
> >>> I'd like to start a discussion about FLIP-398: Improve Serialization
> >>> Configuration And Usage In Flink [1].
> >>>
> >>> Currently, users can register custom data types and serializers in
> Flink
> >>> jobs through various methods, including registration in code,
> >>> configuration, and annotations. These lead to difficulties in upgrading
> >>> Flink jobs and priority issues.
> >>>
> >>> In flink-2.0 we would like to manage job data types and serializers
> >> through
> >>> configurations. This FLIP will introduce a unified option for data type
> >> and
> >>> serializer and users can configure all custom data types and
> >>> pojo/kryo/custom serializers. In addition, this FLIP will add more
> >> built-in
> >>> serializers for complex data types such as List and Map, and optimize
> the
> >>> management of Avro Serializers.
> >>>
> >>> Looking forward to hearing from you, thanks!
> >>>
> >>> [1]
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
> >>>
> >>> Best,
> >>> Fang Yong
> >>
> >> --
> >> Ken Krugler
> >> http://www.scaleunlimited.com
> >> Custom big data solutions
> >> Flink & Pinot
> >>
> >>
> >>
> >>
>
>
>
> --
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink & Pinot
>
>
>
>
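
The state migration idea in Ken's postscript quoted above (recognize state
written in the old format, read it with the old deserializer, and always write
the new format) can be illustrated with a toy sketch. Nothing here is Kryo,
Fury, or Flink API: the class, the method names, and the one-byte tag are
hypothetical, and plain UTF-8 strings stand in for both formats. The tag 0xFF
is chosen only because it never occurs in valid UTF-8, so detection is
unambiguous in this toy; real formats would need their own reliable marker.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: read legacy or new bytes, always write the new format.
class MigratingCodecSketch {
    // Marker for the new format. 0xFF cannot appear in valid UTF-8, so
    // legacy payloads in this toy never start with it.
    static final byte NEW_FORMAT_TAG = (byte) 0xFF;

    // New format: tag byte followed by the UTF-8 payload.
    static byte[] writeNew(String value) {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[payload.length + 1];
        out[0] = NEW_FORMAT_TAG;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }

    // Legacy format: the raw UTF-8 payload without any tag.
    static String readLegacy(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Dispatches on the tag: new format if present, legacy reader otherwise.
    static String readAny(byte[] bytes) {
        if (bytes.length > 0 && bytes[0] == NEW_FORMAT_TAG) {
            return new String(bytes, 1, bytes.length - 1, StandardCharsets.UTF_8);
        }
        return readLegacy(bytes);
    }

    public static void main(String[] args) {
        byte[] oldState = "state-v1".getBytes(StandardCharsets.UTF_8);
        String restored = readAny(oldState);      // read via the legacy path
        byte[] rewritten = writeNew(restored);    // persist in the new format
        System.out.println(readAny(rewritten).equals(restored));
    }
}
```

With this shape, state is upgraded lazily: each value is converted the first
time it is read and written back, so no separate offline migration step is
needed in the sketch.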


Re: [DISCUSS] FLIP-397: Add config options for administrator JVM options

2023-12-26 Thread Yong Fang
+1 for this. We have encountered jobs that need GC policies different from
the default ones to improve performance. Separating the default options from
the user-set ones will help us manage them better.

Best,
Fang Yong

On Fri, Dec 22, 2023 at 9:18 PM Benchao Li  wrote:

> +1 from my side,
>
> I have also run into scenarios where I wanted to set some JVM options by
> default for all Flink jobs, such as '-XX:-DontCompileHugeMethods'. Without
> it, some large generated methods won't be optimized by the JVM C2 compiler,
> leading to poor performance.
>
> On Mon, Nov 27, 2023 at 8:04 PM Zhanghao Chen wrote:
> >
> > Hi devs,
> >
> > I'd like to start a discussion on FLIP-397: Add config options for
> administrator JVM options [1].
> >
> > In production environments, users typically develop and operate their
> Flink jobs through a managed platform. Users may need to add JVM options to
> their Flink applications (e.g. to tune GC options). They typically use the
> env.java.opts.x series of options to do so. Platform administrators also
> have a set of JVM options to apply by default, e.g. to use JVM 17, enable
> GC logging, or apply pretuned GC options, etc. Both use cases will need to
> set the same series of options and will clobber one another. Similar issues
> have been described in SPARK-23472 [2].
> >
> > Therefore, I propose adding a set of default JVM options for
> > administrator use that are prepended to the user-set extra JVM options.
> >
> > Looking forward to hearing from you.
> >
> > [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-397%3A+Add+config+options+for+administrator+JVM+options
> > [2] https://issues.apache.org/jira/browse/SPARK-23472
> >
> > Best,
> > Zhanghao Chen
>
>
>
> --
>
> Best,
> Benchao Li
>
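
The prepend behaviour proposed in the quoted FLIP-397 message can be sketched
as follows. This is a minimal illustration, not Flink's implementation: the
class and the mergeJvmOpts helper are hypothetical names, and the only thing
taken from the thread is that administrator defaults come first and user-set
options follow. The point of that ordering, as far as I understand HotSpot, is
that when a value-style flag is repeated the later occurrence usually takes
effect, so a user setting can override a platform default without clobbering
the rest of it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: administrator defaults are placed before user options.
class JvmOptsSketch {
    // Joins the two option strings with a single space, defaults first.
    // Null or blank inputs are skipped so no stray spaces are produced.
    static String mergeJvmOpts(String adminDefaultOpts, String userOpts) {
        List<String> parts = new ArrayList<>();
        if (adminDefaultOpts != null && !adminDefaultOpts.trim().isEmpty()) {
            parts.add(adminDefaultOpts.trim());
        }
        if (userOpts != null && !userOpts.trim().isEmpty()) {
            parts.add(userOpts.trim());
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Platform-wide defaults chosen by the administrator (example values).
        String admin = "-XX:-DontCompileHugeMethods -XX:MaxGCPauseMillis=200";
        // Per-job tuning supplied by the user (example value).
        String user = "-XX:MaxGCPauseMillis=50";
        System.out.println(mergeJvmOpts(admin, user));
    }
}
```

Keeping the two sources in separate options means neither side has to copy
the other's flags in order to add its own.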


[VOTE] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-26 Thread Yong Fang
Hi devs,

Thanks for all the feedback on FLIP-398: Improve Serialization
Configuration And Usage In Flink [1], which has been discussed in [2].

I'd like to start a vote for it. The vote will be open for at least 72
hours unless there is an objection or insufficient votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
[2] https://lists.apache.org/thread/m67s4qfrh660lktpq7yqf9docvvf5o9l

Best,
Fang Yong


Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-24 Thread Yong Fang
Hi devs,

Thanks for all the feedback. If there are no more comments, I would like to
start a vote for this FLIP, thanks again!

Best,
Fang Yong

On Wed, Dec 20, 2023 at 9:12 PM Yong Fang  wrote:

> Hi Ken,
>
> Thanks for your feedback. The purpose of this FLIP is to improve how
> serialization is used: making serializers configurable for users, providing
> serializers for composite data types, addressing Kryo being enabled by
> default, and so on. Introducing a better serialization framework would be a
> great help for Flink's performance, and it's great to see your tests on
> Fury. However, as @Xintong mentioned, this could be a huge amount of work
> and is beyond the scope of this FLIP. If you're interested, I think we could
> create a new FLIP for it and discuss it further. What do you think? Thanks.
>
> Best,
> Fang Yong
>
> On Mon, Dec 18, 2023 at 11:16 AM Xintong Song 
> wrote:
>
>> Hi Ken,
>>
>> I think the main purpose of this FLIP is to change how users interact with
>> the knobs for customizing the serialization behaviors, from requiring code
>> changes to working with pure configurations. Redesigning the knobs (i.e.,
>> names, semantics, etc.), on the other hand, is not the purpose of this
>> FLIP. Preserving the existing names and semantics should also help
>> minimize
>> the migration cost for existing users. Therefore, I'm in favor of not
>> changing them.
>>
>> Concerning decoupling from Kryo, and introducing other serialization
>> frameworks like Fury, I think that's a bigger topic that is worth further
>> discussion. At the moment, I'm not aware of any community consensus on
>> doing so. And even if in the future we decide to do so, the changes needed
>> should be the same w/ or w/o this FLIP. So I'd suggest not to block this
>> FLIP on these issues.
>>
>> WDYT?
>>
>> Best,
>>
>> Xintong
>>
>>
>>
>> On Fri, Dec 15, 2023 at 1:40 AM Ken Krugler 
>> wrote:
>>
>> > Hi Yong,
>> >
>> > Looks good, thanks for creating this.
>> >
>> > One comment - related to my recent email about Fury, I would love to see
>> > the v2 serialization decoupled from Kryo.
>> >
>> > As part of that, instead of using xxxKryo in methods, call them
>> xxxGeneric.
>> >
>> > A more extreme change would be to totally rely on Fury (so no more POJO
>> > serializer). Fury is faster than the POJO serializer in my tests, but
>> this
>> > would be a much bigger change.
>> >
>> > Though it could dramatically simplify the Flink serialization support.
>> >
>> > — Ken
>> >
>> > PS - a separate issue is how to migrate state from Kryo to something
>> like
>> > Fury, which supports schema evolution. I think this might be possible,
>> by
>> > having a smarter deserializer that identifies state as being created by
>> > Kryo, and using (shaded) Kryo to deserialize, while still writing as
>> Fury.
>> >
>> > > On Dec 6, 2023, at 6:35 PM, Yong Fang  wrote:
>> > >
>> > > Hi devs,
>> > >
>> > > I'd like to start a discussion about FLIP-398: Improve Serialization
>> > > Configuration And Usage In Flink [1].
>> > >
>> > > Currently, users can register custom data types and serializers in
>> Flink
>> > > jobs through various methods, including registration in code,
>> > > configuration, and annotations. These lead to difficulties in
>> upgrading
>> > > Flink jobs and priority issues.
>> > >
>> > > In flink-2.0 we would like to manage job data types and serializers
>> > through
>> > > configurations. This FLIP will introduce a unified option for data
>> type
>> > and
>> > > serializer and users can configure all custom data types and
>> > > pojo/kryo/custom serializers. In addition, this FLIP will add more
>> > built-in
>> > > serializers for complex data types such as List and Map, and optimize
>> the
>> > > management of Avro Serializers.
>> > >
>> > > Looking forward to hearing from you, thanks!
>> > >
>> > > [1]
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
>> > >
>> > > Best,
>> > > Fang Yong
>> >
>> > --
>> > Ken Krugler
>> > http://www.scaleunlimited.com
>> > Custom big data solutions
>> > Flink & Pinot
>> >
>> >
>> >
>> >
>>
>


Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-20 Thread Yong Fang
Hi Ken,

Thanks for your feedback. The purpose of this FLIP is to improve how
serialization is used: making serializers configurable for users, providing
serializers for composite data types, addressing Kryo being enabled by
default, and so on. Introducing a better serialization framework would be a
great help for Flink's performance, and it's great to see your tests on
Fury. However, as @Xintong mentioned, this could be a huge amount of work
and is beyond the scope of this FLIP. If you're interested, I think we could
create a new FLIP for it and discuss it further. What do you think? Thanks.

Best,
Fang Yong

On Mon, Dec 18, 2023 at 11:16 AM Xintong Song  wrote:

> Hi Ken,
>
> I think the main purpose of this FLIP is to change how users interact with
> the knobs for customizing the serialization behaviors, from requiring code
> changes to working with pure configurations. Redesigning the knobs (i.e.,
> names, semantics, etc.), on the other hand, is not the purpose of this
> FLIP. Preserving the existing names and semantics should also help minimize
> the migration cost for existing users. Therefore, I'm in favor of not
> changing them.
>
> Concerning decoupling from Kryo, and introducing other serialization
> frameworks like Fury, I think that's a bigger topic that is worth further
> discussion. At the moment, I'm not aware of any community consensus on
> doing so. And even if in the future we decide to do so, the changes needed
> should be the same w/ or w/o this FLIP. So I'd suggest not to block this
> FLIP on these issues.
>
> WDYT?
>
> Best,
>
> Xintong
>
>
>
> On Fri, Dec 15, 2023 at 1:40 AM Ken Krugler 
> wrote:
>
> > Hi Yong,
> >
> > Looks good, thanks for creating this.
> >
> > One comment - related to my recent email about Fury, I would love to see
> > the v2 serialization decoupled from Kryo.
> >
> > As part of that, instead of using xxxKryo in methods, call them
> xxxGeneric.
> >
> > A more extreme change would be to totally rely on Fury (so no more POJO
> > serializer). Fury is faster than the POJO serializer in my tests, but
> this
> > would be a much bigger change.
> >
> > Though it could dramatically simplify the Flink serialization support.
> >
> > — Ken
> >
> > PS - a separate issue is how to migrate state from Kryo to something like
> > Fury, which supports schema evolution. I think this might be possible, by
> > having a smarter deserializer that identifies state as being created by
> > Kryo, and using (shaded) Kryo to deserialize, while still writing as
> Fury.
> >
> > > On Dec 6, 2023, at 6:35 PM, Yong Fang  wrote:
> > >
> > > Hi devs,
> > >
> > > I'd like to start a discussion about FLIP-398: Improve Serialization
> > > Configuration And Usage In Flink [1].
> > >
> > > Currently, users can register custom data types and serializers in
> Flink
> > > jobs through various methods, including registration in code,
> > > configuration, and annotations. These lead to difficulties in upgrading
> > > Flink jobs and priority issues.
> > >
> > > In flink-2.0 we would like to manage job data types and serializers
> > through
> > > configurations. This FLIP will introduce a unified option for data type
> > and
> > > serializer and users can configure all custom data types and
> > > pojo/kryo/custom serializers. In addition, this FLIP will add more
> > built-in
> > > serializers for complex data types such as List and Map, and optimize
> the
> > > management of Avro Serializers.
> > >
> > > Looking forward to hearing from you, thanks!
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
> > >
> > > Best,
> > > Fang Yong
> >
> > --
> > Ken Krugler
> > http://www.scaleunlimited.com
> > Custom big data solutions
> > Flink & Pinot
> >
> >
> >
> >
>


[DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-06 Thread Yong Fang
Hi devs,

I'd like to start a discussion about FLIP-398: Improve Serialization
Configuration And Usage In Flink [1].

Currently, users can register custom data types and serializers in Flink
jobs through various methods, including registration in code,
configuration, and annotations. These lead to difficulties in upgrading
Flink jobs and priority issues.

In flink-2.0 we would like to manage job data types and serializers through
configurations. This FLIP will introduce a unified option for data type and
serializer and users can configure all custom data types and
pojo/kryo/custom serializers. In addition, this FLIP will add more built-in
serializers for complex data types such as List and Map, and optimize the
management of Avro Serializers.

Looking forward to hearing from you, thanks!

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink

Best,
Fang Yong
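
To make the "single configuration entry for data types and serializers" idea
above more concrete, here is a toy sketch. It is not the option or syntax
defined by FLIP-398: the "type=kind;type=kind" string format, the class, and
the Kind enum are all hypothetical and exist only for illustration. What it
shows is the design point from the message: when every type-to-serializer
mapping lives in one place, duplicates can be rejected up front instead of
being resolved by implicit priority between code, configuration, and
annotations.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of one unified type-to-serializer configuration.
class SerializationConfigSketch {
    enum Kind { POJO, KRYO, CUSTOM }

    private final Map<String, Kind> kindByType = new LinkedHashMap<>();

    // Parses a made-up format such as "com.example.Order=pojo;com.example.Blob=kryo".
    static SerializationConfigSketch parse(String value) {
        SerializationConfigSketch config = new SerializationConfigSketch();
        for (String entry : value.split(";")) {
            if (entry.trim().isEmpty()) {
                continue; // tolerate empty segments such as a trailing ';'
            }
            String[] pair = entry.split("=", 2);
            if (pair.length != 2) {
                throw new IllegalArgumentException("Expected type=kind but got: " + entry);
            }
            String typeName = pair[0].trim();
            // valueOf throws IllegalArgumentException for an unknown kind.
            Kind kind = Kind.valueOf(pair[1].trim().toUpperCase(Locale.ROOT));
            if (config.kindByType.putIfAbsent(typeName, kind) != null) {
                throw new IllegalArgumentException("Duplicate entry for type: " + typeName);
            }
        }
        return config;
    }

    // Returns the configured kind, or the given fallback for unlisted types.
    Kind kindFor(String typeName, Kind fallback) {
        return kindByType.getOrDefault(typeName, fallback);
    }

    public static void main(String[] args) {
        SerializationConfigSketch config =
                parse("com.example.Order=pojo; com.example.Blob=kryo");
        System.out.println(config.kindFor("com.example.Order", Kind.KRYO));
        System.out.println(config.kindFor("com.example.Unknown", Kind.KRYO));
    }
}
```

Because the mapping is plain text, changing a serializer for a type becomes a
configuration change rather than a code change, which is the upgrade benefit
the proposal is after.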


Re: [DISCUSS] Remove legacy Paimon (TableStore) doc link from Flink web navigation

2023-10-17 Thread Yong Fang
+1

On Tue, Oct 17, 2023 at 4:52 PM Leonard Xu  wrote:

> +1
>
>
> > On Oct 17, 2023, at 4:50 PM, Martijn Visser wrote:
> >
> > +1
> >
> > On Tue, Oct 17, 2023 at 10:34 AM Jingsong Li 
> wrote:
> >>
> >> Hi Márton,
> >>
> >> Thanks for driving. +1
> >>
> >> There is a PR to remove legacy Paimon
> >> https://github.com/apache/flink-web/pull/665 , but it hasn't been
> >> updated for a long time.
> >>
> >> Best,
> >> Jingsong
> >>
> >> On Tue, Oct 17, 2023 at 4:28 PM Márton Balassi 
> wrote:
> >>>
> >>> Hi Flink & Paimon devs,
> >>>
> >>> The Flink webpage documentation navigation section still lists the
> outdated TableStore 0.3 and master docs as subproject docs (see
> attachment). I am all for advertising Paimon as a sister project of Flink,
> but the current state is misleading in multiple ways.
> >>>
> >>> I would like to remove these obsolete links if the communities agree.
> >>>
> >>> Best,
> >>> Marton
>
>