Re: FW: RE: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2024-02-22 Thread Yong Fang
Hi Martijn,

If there're no more comments, I will start a vote for this, thanks

Best,
Fang Yong

On Tue, Feb 20, 2024 at 4:53 PM Yong Fang  wrote:

> Hi Martijn,
>
> Thank you for your attention. Let me first explain the specific situation
> of FLIP-314. FLIP-314 is currently in an accepted state, but actual code
> development has not yet begun, and interface related PR has not been merged
> into the master. So it may not be necessary for us to create a separate
> FLIP. Currently, my idea is to directly update the interface on FLIP-314,
> but to initiate a separate thread with the context and we can vote there.
>
> What do you think? Thanks
>
> Best,
> Fang Yong
>
> On Mon, Feb 19, 2024 at 8:27 PM Martijn Visser 
> wrote:
>
>> I'm a bit confused: did we add new interfaces after FLIP-314 was
>> accepted? If so, please move the new interfaces to a new FLIP and
>> start a separate vote. We can't retrospectively change an accepted
>> FLIP with new interfaces and a new vote.
>>
>> On Mon, Feb 19, 2024 at 3:22 AM Yong Fang  wrote:
>> >
>> > Hi all,
>> >
>> > If there are no more feedbacks, I will start a vote for the new
>> interfaces
>> > in the next day, thanks
>> >
>> > Best,
>> > Fang Yong
>> >
>> > On Thu, Feb 8, 2024 at 1:30 PM Yong Fang  wrote:
>> >
>> > > Hi devs,
>> > >
>> > > According to the online-discussion in FLINK-3127 [1] and
>> > > offline-discussion with Maciej Obuchowski and Zhenqiu Huang, we would
>> like
>> > > to update the lineage vertex relevant interfaces in FLIP-314 [2] as
>> follows:
>> > >
>> > > 1. Introduce `LineageDataset` which represents source and sink in
>> > > `LineageVertex`. The fields in `LineageDataset` are as follows:
>> > > /* Name for this particular dataset. */
>> > > String name;
>> > > /* Unique name for this dataset's storage, for example, url for
>> jdbc
>> > > connector and location for lakehouse connector. */
>> > > String namespace;
>> > > /* Facets for the lineage vertex to describe the particular
>> > > information of dataset, such as schema and config. */
>> > > Map facets;
>> > >
>> > > 2. There may be multiple datasets in one `LineageVertex`, for example,
>> > > kafka source or hybrid source. So users can get dataset list from
>> > > `LineageVertex`:
>> > > /** Get datasets from the lineage vertex. */
>> > > List datasets();
>> > >
>> > > 3. There will be built in facets for config and schema. To describe
>> > > columns in table/sql jobs and datastream jobs, we introduce
>> > > `DatasetSchemaField`.
>> > > /** Builtin config facet for dataset. */
>> > > @PublicEvolving
>> > > public interface DatasetConfigFacet extends LineageDatasetFacet {
>> > > Map config();
>> > > }
>> > >
>> > > /** Field for schema in dataset. */
>> > > public interface DatasetSchemaField {
>> > > /** The name of the field. */
>> > > String name();
>> > > /** The type of the field. */
>> > > T type();
>> > > }
>> > >
>> > > Thanks for valuable inputs from @Maciej and @Zhenqiu. And looking
>> forward
>> > > to your feedback, thanks
>> > >
>> > > Best,
>> > > Fang Yong
>> > >
>> > > On Mon, Sep 25, 2023 at 1:18 PM Shammon FY  wrote:
>> > >
>> > >> Hi David,
>> > >>
>> > >> Do you want the detailed topology for Flink job? You can get
>> > >> `JobDetailsInfo` in `RestCusterClient` with the submitted job id, it
>> has
>> > >> `String jsonPlan`. You can parse the json plan to get all steps and
>> > >> relations between them in a Flink job. Hope this can help you,
>> thanks!
>> > >>
>> > >> Best,
>> > >> Shammon FY
>> > >>
>> > >> On Tue, Sep 19, 2023 at 11:46 PM David Radley <
>> david_rad...@uk.ibm.com>
>> > >> wrote:
>> > >>
>> > >>> Hi there,
>> > >>> I am looking at the interfaces. If I am reading it correctly,there
>> is
>> > >>> one relationship between the source and sink and this relationship
>> > >>> represents the operational lineage. Lineage is usually represented
>> as asset
>> > >>> -> process - > asset – see for example
>> > >>>
>> https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph
>> > >>>
>> > >>> Maybe I am missing it, but it seems to be that it would be useful to
>> > >>> store the process in the lineage graph.
>> > >>>
>> > >>> It is useful to have the top level lineage as source -> Flink job ->
>> > >>> sink. Where the Flink job is the process, but also to have this
>> asset ->
>> > >>> process -> asset pattern for each of the steps in the job. If this
>> is
>> > >>> present, please could you point me to it,
>> > >>>
>> > >>>   Kind regards, David.
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> From: David Radley 
>> > >>> Date: Tuesday, 19 September 2023 at 16:11
>> > >>> To: dev@flink.apache.org 
>> > >>> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-314: Support Customized Job
>> > >>> Lineage Listener
>> > >>> Hi,
>> > >>> I notice that there is an experimental lineage integration for Flink
>> > >>> with OpenLineage 

Re: FW: RE: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2024-02-20 Thread Yong Fang
Hi Martijn,

Thank you for your attention. Let me first explain the specific situation
of FLIP-314. FLIP-314 is currently in an accepted state, but actual code
development has not yet begun, and interface related PR has not been merged
into the master. So it may not be necessary for us to create a separate
FLIP. Currently, my idea is to directly update the interface on FLIP-314,
but to initiate a separate thread with the context and we can vote there.

What do you think? Thanks

Best,
Fang Yong

On Mon, Feb 19, 2024 at 8:27 PM Martijn Visser 
wrote:

> I'm a bit confused: did we add new interfaces after FLIP-314 was
> accepted? If so, please move the new interfaces to a new FLIP and
> start a separate vote. We can't retrospectively change an accepted
> FLIP with new interfaces and a new vote.
>
> On Mon, Feb 19, 2024 at 3:22 AM Yong Fang  wrote:
> >
> > Hi all,
> >
> > If there are no more feedbacks, I will start a vote for the new
> interfaces
> > in the next day, thanks
> >
> > Best,
> > Fang Yong
> >
> > On Thu, Feb 8, 2024 at 1:30 PM Yong Fang  wrote:
> >
> > > Hi devs,
> > >
> > > According to the online-discussion in FLINK-3127 [1] and
> > > offline-discussion with Maciej Obuchowski and Zhenqiu Huang, we would
> like
> > > to update the lineage vertex relevant interfaces in FLIP-314 [2] as
> follows:
> > >
> > > 1. Introduce `LineageDataset` which represents source and sink in
> > > `LineageVertex`. The fields in `LineageDataset` are as follows:
> > > /* Name for this particular dataset. */
> > > String name;
> > > /* Unique name for this dataset's storage, for example, url for
> jdbc
> > > connector and location for lakehouse connector. */
> > > String namespace;
> > > /* Facets for the lineage vertex to describe the particular
> > > information of dataset, such as schema and config. */
> > > Map facets;
> > >
> > > 2. There may be multiple datasets in one `LineageVertex`, for example,
> > > kafka source or hybrid source. So users can get dataset list from
> > > `LineageVertex`:
> > > /** Get datasets from the lineage vertex. */
> > > List datasets();
> > >
> > > 3. There will be built in facets for config and schema. To describe
> > > columns in table/sql jobs and datastream jobs, we introduce
> > > `DatasetSchemaField`.
> > > /** Builtin config facet for dataset. */
> > > @PublicEvolving
> > > public interface DatasetConfigFacet extends LineageDatasetFacet {
> > > Map config();
> > > }
> > >
> > > /** Field for schema in dataset. */
> > > public interface DatasetSchemaField {
> > > /** The name of the field. */
> > > String name();
> > > /** The type of the field. */
> > > T type();
> > > }
> > >
> > > Thanks for valuable inputs from @Maciej and @Zhenqiu. And looking
> forward
> > > to your feedback, thanks
> > >
> > > Best,
> > > Fang Yong
> > >
> > > On Mon, Sep 25, 2023 at 1:18 PM Shammon FY  wrote:
> > >
> > >> Hi David,
> > >>
> > >> Do you want the detailed topology for Flink job? You can get
> > >> `JobDetailsInfo` in `RestCusterClient` with the submitted job id, it
> has
> > >> `String jsonPlan`. You can parse the json plan to get all steps and
> > >> relations between them in a Flink job. Hope this can help you, thanks!
> > >>
> > >> Best,
> > >> Shammon FY
> > >>
> > >> On Tue, Sep 19, 2023 at 11:46 PM David Radley <
> david_rad...@uk.ibm.com>
> > >> wrote:
> > >>
> > >>> Hi there,
> > >>> I am looking at the interfaces. If I am reading it correctly,there is
> > >>> one relationship between the source and sink and this relationship
> > >>> represents the operational lineage. Lineage is usually represented
> as asset
> > >>> -> process - > asset – see for example
> > >>>
> https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph
> > >>>
> > >>> Maybe I am missing it, but it seems to be that it would be useful to
> > >>> store the process in the lineage graph.
> > >>>
> > >>> It is useful to have the top level lineage as source -> Flink job ->
> > >>> sink. Where the Flink job is the process, but also to have this
> asset ->
> > >>> process -> asset pattern for each of the steps in the job. If this is
> > >>> present, please could you point me to it,
> > >>>
> > >>>   Kind regards, David.
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> From: David Radley 
> > >>> Date: Tuesday, 19 September 2023 at 16:11
> > >>> To: dev@flink.apache.org 
> > >>> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-314: Support Customized Job
> > >>> Lineage Listener
> > >>> Hi,
> > >>> I notice that there is an experimental lineage integration for Flink
> > >>> with OpenLineage https://openlineage.io/docs/integrations/flink  . I
> > >>> think this feature would allow for a superior Flink OpenLineage
> integration,
> > >>> Kind regards, David.
> > >>>
> > >>> From: XTransfer 
> > >>> Date: Tuesday, 19 September 2023 at 15:47
> > >>> To: dev@flink.apache.org 
> > >>> Subject: 

Re: FW: RE: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2024-02-19 Thread Martijn Visser
I'm a bit confused: did we add new interfaces after FLIP-314 was
accepted? If so, please move the new interfaces to a new FLIP and
start a separate vote. We can't retrospectively change an accepted
FLIP with new interfaces and a new vote.

On Mon, Feb 19, 2024 at 3:22 AM Yong Fang  wrote:
>
> Hi all,
>
> If there are no more feedbacks, I will start a vote for the new interfaces
> in the next day, thanks
>
> Best,
> Fang Yong
>
> On Thu, Feb 8, 2024 at 1:30 PM Yong Fang  wrote:
>
> > Hi devs,
> >
> > According to the online-discussion in FLINK-3127 [1] and
> > offline-discussion with Maciej Obuchowski and Zhenqiu Huang, we would like
> > to update the lineage vertex relevant interfaces in FLIP-314 [2] as follows:
> >
> > 1. Introduce `LineageDataset` which represents source and sink in
> > `LineageVertex`. The fields in `LineageDataset` are as follows:
> > /* Name for this particular dataset. */
> > String name;
> > /* Unique name for this dataset's storage, for example, url for jdbc
> > connector and location for lakehouse connector. */
> > String namespace;
> > /* Facets for the lineage vertex to describe the particular
> > information of dataset, such as schema and config. */
> > Map facets;
> >
> > 2. There may be multiple datasets in one `LineageVertex`, for example,
> > kafka source or hybrid source. So users can get dataset list from
> > `LineageVertex`:
> > /** Get datasets from the lineage vertex. */
> > List datasets();
> >
> > 3. There will be built in facets for config and schema. To describe
> > columns in table/sql jobs and datastream jobs, we introduce
> > `DatasetSchemaField`.
> > /** Builtin config facet for dataset. */
> > @PublicEvolving
> > public interface DatasetConfigFacet extends LineageDatasetFacet {
> > Map config();
> > }
> >
> > /** Field for schema in dataset. */
> > public interface DatasetSchemaField {
> > /** The name of the field. */
> > String name();
> > /** The type of the field. */
> > T type();
> > }
> >
> > Thanks for valuable inputs from @Maciej and @Zhenqiu. And looking forward
> > to your feedback, thanks
> >
> > Best,
> > Fang Yong
> >
> > On Mon, Sep 25, 2023 at 1:18 PM Shammon FY  wrote:
> >
> >> Hi David,
> >>
> >> Do you want the detailed topology for Flink job? You can get
> >> `JobDetailsInfo` in `RestCusterClient` with the submitted job id, it has
> >> `String jsonPlan`. You can parse the json plan to get all steps and
> >> relations between them in a Flink job. Hope this can help you, thanks!
> >>
> >> Best,
> >> Shammon FY
> >>
> >> On Tue, Sep 19, 2023 at 11:46 PM David Radley 
> >> wrote:
> >>
> >>> Hi there,
> >>> I am looking at the interfaces. If I am reading it correctly,there is
> >>> one relationship between the source and sink and this relationship
> >>> represents the operational lineage. Lineage is usually represented as 
> >>> asset
> >>> -> process - > asset – see for example
> >>> https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph
> >>>
> >>> Maybe I am missing it, but it seems to be that it would be useful to
> >>> store the process in the lineage graph.
> >>>
> >>> It is useful to have the top level lineage as source -> Flink job ->
> >>> sink. Where the Flink job is the process, but also to have this asset ->
> >>> process -> asset pattern for each of the steps in the job. If this is
> >>> present, please could you point me to it,
> >>>
> >>>   Kind regards, David.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> From: David Radley 
> >>> Date: Tuesday, 19 September 2023 at 16:11
> >>> To: dev@flink.apache.org 
> >>> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-314: Support Customized Job
> >>> Lineage Listener
> >>> Hi,
> >>> I notice that there is an experimental lineage integration for Flink
> >>> with OpenLineage https://openlineage.io/docs/integrations/flink  . I
> >>> think this feature would allow for a superior Flink OpenLineage 
> >>> integration,
> >>> Kind regards, David.
> >>>
> >>> From: XTransfer 
> >>> Date: Tuesday, 19 September 2023 at 15:47
> >>> To: dev@flink.apache.org 
> >>> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-314: Support Customized Job
> >>> Lineage Listener
> >>> Thanks Shammon for this proposal.
> >>>
> >>> That’s helpful for collecting the lineage of Flink tasks.
> >>> Looking forward to its implementation.
> >>>
> >>> Best,
> >>> Jiabao
> >>>
> >>>
> >>> > 2023年9月18日 20:56,Leonard Xu  写道:
> >>> >
> >>> > Thanks Shammon for the informations, the comment makes the lifecycle
> >>> clearer.
> >>> > +1
> >>> >
> >>> >
> >>> > Best,
> >>> > Leonard
> >>> >
> >>> >
> >>> >> On Sep 18, 2023, at 7:54 PM, Shammon FY  wrote:
> >>> >>
> >>> >> Hi devs,
> >>> >>
> >>> >> After discussing with @Qingsheng, I fixed a minor issue of the
> >>> lineage lifecycle in `StreamExecutionEnvironment`. I have added the 
> >>> comment
> >>> to explain that the lineage information in `StreamExecutionEnvironment`
> 

Re: FW: RE: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2024-02-18 Thread Yong Fang
Hi all,

If there are no more feedbacks, I will start a vote for the new interfaces
in the next day, thanks

Best,
Fang Yong

On Thu, Feb 8, 2024 at 1:30 PM Yong Fang  wrote:

> Hi devs,
>
> According to the online-discussion in FLINK-3127 [1] and
> offline-discussion with Maciej Obuchowski and Zhenqiu Huang, we would like
> to update the lineage vertex relevant interfaces in FLIP-314 [2] as follows:
>
> 1. Introduce `LineageDataset` which represents source and sink in
> `LineageVertex`. The fields in `LineageDataset` are as follows:
> /* Name for this particular dataset. */
> String name;
> /* Unique name for this dataset's storage, for example, url for jdbc
> connector and location for lakehouse connector. */
> String namespace;
> /* Facets for the lineage vertex to describe the particular
> information of dataset, such as schema and config. */
> Map facets;
>
> 2. There may be multiple datasets in one `LineageVertex`, for example,
> kafka source or hybrid source. So users can get dataset list from
> `LineageVertex`:
> /** Get datasets from the lineage vertex. */
> List datasets();
>
> 3. There will be built in facets for config and schema. To describe
> columns in table/sql jobs and datastream jobs, we introduce
> `DatasetSchemaField`.
> /** Builtin config facet for dataset. */
> @PublicEvolving
> public interface DatasetConfigFacet extends LineageDatasetFacet {
> Map config();
> }
>
> /** Field for schema in dataset. */
> public interface DatasetSchemaField {
> /** The name of the field. */
> String name();
> /** The type of the field. */
> T type();
> }
>
> Thanks for valuable inputs from @Maciej and @Zhenqiu. And looking forward
> to your feedback, thanks
>
> Best,
> Fang Yong
>
> On Mon, Sep 25, 2023 at 1:18 PM Shammon FY  wrote:
>
>> Hi David,
>>
>> Do you want the detailed topology for Flink job? You can get
>> `JobDetailsInfo` in `RestCusterClient` with the submitted job id, it has
>> `String jsonPlan`. You can parse the json plan to get all steps and
>> relations between them in a Flink job. Hope this can help you, thanks!
>>
>> Best,
>> Shammon FY
>>
>> On Tue, Sep 19, 2023 at 11:46 PM David Radley 
>> wrote:
>>
>>> Hi there,
>>> I am looking at the interfaces. If I am reading it correctly,there is
>>> one relationship between the source and sink and this relationship
>>> represents the operational lineage. Lineage is usually represented as asset
>>> -> process - > asset – see for example
>>> https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph
>>>
>>> Maybe I am missing it, but it seems to be that it would be useful to
>>> store the process in the lineage graph.
>>>
>>> It is useful to have the top level lineage as source -> Flink job ->
>>> sink. Where the Flink job is the process, but also to have this asset ->
>>> process -> asset pattern for each of the steps in the job. If this is
>>> present, please could you point me to it,
>>>
>>>   Kind regards, David.
>>>
>>>
>>>
>>>
>>>
>>> From: David Radley 
>>> Date: Tuesday, 19 September 2023 at 16:11
>>> To: dev@flink.apache.org 
>>> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-314: Support Customized Job
>>> Lineage Listener
>>> Hi,
>>> I notice that there is an experimental lineage integration for Flink
>>> with OpenLineage https://openlineage.io/docs/integrations/flink  . I
>>> think this feature would allow for a superior Flink OpenLineage integration,
>>> Kind regards, David.
>>>
>>> From: XTransfer 
>>> Date: Tuesday, 19 September 2023 at 15:47
>>> To: dev@flink.apache.org 
>>> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-314: Support Customized Job
>>> Lineage Listener
>>> Thanks Shammon for this proposal.
>>>
>>> That’s helpful for collecting the lineage of Flink tasks.
>>> Looking forward to its implementation.
>>>
>>> Best,
>>> Jiabao
>>>
>>>
>>> > 2023年9月18日 20:56,Leonard Xu  写道:
>>> >
>>> > Thanks Shammon for the informations, the comment makes the lifecycle
>>> clearer.
>>> > +1
>>> >
>>> >
>>> > Best,
>>> > Leonard
>>> >
>>> >
>>> >> On Sep 18, 2023, at 7:54 PM, Shammon FY  wrote:
>>> >>
>>> >> Hi devs,
>>> >>
>>> >> After discussing with @Qingsheng, I fixed a minor issue of the
>>> lineage lifecycle in `StreamExecutionEnvironment`. I have added the comment
>>> to explain that the lineage information in `StreamExecutionEnvironment`
>>> will be consistent with that of transformations. When users clear the
>>> existing transformations, the added lineage information will also be
>>> deleted.
>>> >>
>>> >> Please help to review it again, and If there are no more concerns
>>> about FLIP-314[1], I would like to start voting later, thanks. cc @
>>> <>Leonard
>>> >>
>>> >> Best,
>>> >> Shammon FY
>>> >>
>>> >> On Mon, Jul 17, 2023 at 3:43 PM Shammon FY >> > wrote:
>>> >> Hi devs,
>>> >>
>>> >> Thanks for all the valuable feedback. If there are no more concerns
>>> about FLIP-314[1], I 

Re: FW: RE: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2024-02-07 Thread Yong Fang
Hi devs,

According to the online-discussion in FLINK-3127 [1] and offline-discussion
with Maciej Obuchowski and Zhenqiu Huang, we would like to update the
lineage vertex relevant interfaces in FLIP-314 [2] as follows:

1. Introduce `LineageDataset` which represents source and sink in
`LineageVertex`. The fields in `LineageDataset` are as follows:
/* Name for this particular dataset. */
String name;
/* Unique name for this dataset's storage, for example, url for jdbc
connector and location for lakehouse connector. */
String namespace;
/* Facets for the lineage vertex to describe the particular information
of dataset, such as schema and config. */
Map facets;

2. There may be multiple datasets in one `LineageVertex`, for example,
kafka source or hybrid source. So users can get dataset list from
`LineageVertex`:
/** Get datasets from the lineage vertex. */
List datasets();

3. There will be built in facets for config and schema. To describe columns
in table/sql jobs and datastream jobs, we introduce `DatasetSchemaField`.
/** Builtin config facet for dataset. */
@PublicEvolving
public interface DatasetConfigFacet extends LineageDatasetFacet {
Map config();
}

/** Field for schema in dataset. */
public interface DatasetSchemaField {
/** The name of the field. */
String name();
/** The type of the field. */
T type();
}

Thanks for valuable inputs from @Maciej and @Zhenqiu. And looking forward
to your feedback, thanks

Best,
Fang Yong

On Mon, Sep 25, 2023 at 1:18 PM Shammon FY  wrote:

> Hi David,
>
> Do you want the detailed topology for Flink job? You can get
> `JobDetailsInfo` in `RestCusterClient` with the submitted job id, it has
> `String jsonPlan`. You can parse the json plan to get all steps and
> relations between them in a Flink job. Hope this can help you, thanks!
>
> Best,
> Shammon FY
>
> On Tue, Sep 19, 2023 at 11:46 PM David Radley 
> wrote:
>
>> Hi there,
>> I am looking at the interfaces. If I am reading it correctly,there is one
>> relationship between the source and sink and this relationship represents
>> the operational lineage. Lineage is usually represented as asset -> process
>> - > asset – see for example
>> https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph
>>
>> Maybe I am missing it, but it seems to be that it would be useful to
>> store the process in the lineage graph.
>>
>> It is useful to have the top level lineage as source -> Flink job ->
>> sink. Where the Flink job is the process, but also to have this asset ->
>> process -> asset pattern for each of the steps in the job. If this is
>> present, please could you point me to it,
>>
>>   Kind regards, David.
>>
>>
>>
>>
>>
>> From: David Radley 
>> Date: Tuesday, 19 September 2023 at 16:11
>> To: dev@flink.apache.org 
>> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-314: Support Customized Job
>> Lineage Listener
>> Hi,
>> I notice that there is an experimental lineage integration for Flink with
>> OpenLineage https://openlineage.io/docs/integrations/flink  . I think
>> this feature would allow for a superior Flink OpenLineage integration,
>> Kind regards, David.
>>
>> From: XTransfer 
>> Date: Tuesday, 19 September 2023 at 15:47
>> To: dev@flink.apache.org 
>> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-314: Support Customized Job
>> Lineage Listener
>> Thanks Shammon for this proposal.
>>
>> That’s helpful for collecting the lineage of Flink tasks.
>> Looking forward to its implementation.
>>
>> Best,
>> Jiabao
>>
>>
>> > 2023年9月18日 20:56,Leonard Xu  写道:
>> >
>> > Thanks Shammon for the informations, the comment makes the lifecycle
>> clearer.
>> > +1
>> >
>> >
>> > Best,
>> > Leonard
>> >
>> >
>> >> On Sep 18, 2023, at 7:54 PM, Shammon FY  wrote:
>> >>
>> >> Hi devs,
>> >>
>> >> After discussing with @Qingsheng, I fixed a minor issue of the lineage
>> lifecycle in `StreamExecutionEnvironment`. I have added the comment to
>> explain that the lineage information in `StreamExecutionEnvironment` will
>> be consistent with that of transformations. When users clear the existing
>> transformations, the added lineage information will also be deleted.
>> >>
>> >> Please help to review it again, and If there are no more concerns
>> about FLIP-314[1], I would like to start voting later, thanks. cc @
>> <>Leonard
>> >>
>> >> Best,
>> >> Shammon FY
>> >>
>> >> On Mon, Jul 17, 2023 at 3:43 PM Shammon FY > zjur...@gmail.com>> wrote:
>> >> Hi devs,
>> >>
>> >> Thanks for all the valuable feedback. If there are no more concerns
>> about FLIP-314[1], I would like to start voting later, thanks.
>> >>
>> >>
>> >> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
>>  <
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
>> >
>> >>
>> >> Best,
>> >> Shammon FY
>> >>
>> >>
>> >> On Wed, Jul 12, 2023 at 11:18 

Re: FW: RE: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2023-09-24 Thread Shammon FY
Hi David,

Do you want the detailed topology for Flink job? You can get
`JobDetailsInfo` in `RestCusterClient` with the submitted job id, it has
`String jsonPlan`. You can parse the json plan to get all steps and
relations between them in a Flink job. Hope this can help you, thanks!

Best,
Shammon FY

On Tue, Sep 19, 2023 at 11:46 PM David Radley 
wrote:

> Hi there,
> I am looking at the interfaces. If I am reading it correctly,there is one
> relationship between the source and sink and this relationship represents
> the operational lineage. Lineage is usually represented as asset -> process
> - > asset – see for example
> https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph
>
> Maybe I am missing it, but it seems to be that it would be useful to store
> the process in the lineage graph.
>
> It is useful to have the top level lineage as source -> Flink job -> sink.
> Where the Flink job is the process, but also to have this asset -> process
> -> asset pattern for each of the steps in the job. If this is present,
> please could you point me to it,
>
>   Kind regards, David.
>
>
>
>
>
> From: David Radley 
> Date: Tuesday, 19 September 2023 at 16:11
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-314: Support Customized Job Lineage
> Listener
> Hi,
> I notice that there is an experimental lineage integration for Flink with
> OpenLineage https://openlineage.io/docs/integrations/flink  . I think
> this feature would allow for a superior Flink OpenLineage integration,
> Kind regards, David.
>
> From: XTransfer 
> Date: Tuesday, 19 September 2023 at 15:47
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-314: Support Customized Job Lineage
> Listener
> Thanks Shammon for this proposal.
>
> That’s helpful for collecting the lineage of Flink tasks.
> Looking forward to its implementation.
>
> Best,
> Jiabao
>
>
> > 2023年9月18日 20:56,Leonard Xu  写道:
> >
> > Thanks Shammon for the informations, the comment makes the lifecycle
> clearer.
> > +1
> >
> >
> > Best,
> > Leonard
> >
> >
> >> On Sep 18, 2023, at 7:54 PM, Shammon FY  wrote:
> >>
> >> Hi devs,
> >>
> >> After discussing with @Qingsheng, I fixed a minor issue of the lineage
> lifecycle in `StreamExecutionEnvironment`. I have added the comment to
> explain that the lineage information in `StreamExecutionEnvironment` will
> be consistent with that of transformations. When users clear the existing
> transformations, the added lineage information will also be deleted.
> >>
> >> Please help to review it again, and If there are no more concerns about
> FLIP-314[1], I would like to start voting later, thanks. cc @ <>Leonard
> >>
> >> Best,
> >> Shammon FY
> >>
> >> On Mon, Jul 17, 2023 at 3:43 PM Shammon FY  zjur...@gmail.com>> wrote:
> >> Hi devs,
> >>
> >> Thanks for all the valuable feedback. If there are no more concerns
> about FLIP-314[1], I would like to start voting later, thanks.
> >>
> >>
> >> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
>  <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
> >
> >>
> >> Best,
> >> Shammon FY
> >>
> >>
> >> On Wed, Jul 12, 2023 at 11:18 AM Shammon FY  zjur...@gmail.com>> wrote:
> >> Thanks for the valuable feedback, Leonard.
> >>
> >> I have discussed with Leonard off-line. We have reached some
> conclusions about these issues and I have updated the FLIP as follows:
> >>
> >> 1. Simplify the `LineageEdge` interface by creating an edge from one
> source vertex to sink vertex.
> >> 2. Remove the `TableColumnSourceLineageVertex` interface and update
> `TableColumnLineageEdge` to create an edge from columns in one source to
> each sink column.
> >> 3. Rename `SupportsLineageVertex` to `LineageVertexProvider`
> >> 4. Add method `addLineageEdges(LineageEdge ... edges)` in
> `StreamExecutionEnviroment` for datastream job and remove previous methods
> in `DataStreamSource` and `DataStreamSink`.
> >>
> >> Looking forward to your feedback, thanks.
> >>
> >> Best,
> >> Shammon FY
> >
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>


FW: RE: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2023-09-19 Thread David Radley
Hi there,
I am looking at the interfaces. If I am reading it correctly,there is one 
relationship between the source and sink and this relationship represents the 
operational lineage. Lineage is usually represented as asset -> process - > 
asset – see for example 
https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph

Maybe I am missing it, but it seems to be that it would be useful to store the 
process in the lineage graph.

It is useful to have the top level lineage as source -> Flink job -> sink. 
Where the Flink job is the process, but also to have this asset -> process -> 
asset pattern for each of the steps in the job. If this is present, please 
could you point me to it,

  Kind regards, David.





From: David Radley 
Date: Tuesday, 19 September 2023 at 16:11
To: dev@flink.apache.org 
Subject: [EXTERNAL] RE: [DISCUSS] FLIP-314: Support Customized Job Lineage 
Listener
Hi,
I notice that there is an experimental lineage integration for Flink with 
OpenLineage https://openlineage.io/docs/integrations/flink  . I think this 
feature would allow for a superior Flink OpenLineage integration,
Kind regards, David.

From: XTransfer 
Date: Tuesday, 19 September 2023 at 15:47
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-314: Support Customized Job Lineage 
Listener
Thanks Shammon for this proposal.

That’s helpful for collecting the lineage of Flink tasks.
Looking forward to its implementation.

Best,
Jiabao


> 2023年9月18日 20:56,Leonard Xu  写道:
>
> Thanks Shammon for the informations, the comment makes the lifecycle clearer.
> +1
>
>
> Best,
> Leonard
>
>
>> On Sep 18, 2023, at 7:54 PM, Shammon FY  wrote:
>>
>> Hi devs,
>>
>> After discussing with @Qingsheng, I fixed a minor issue of the lineage 
>> lifecycle in `StreamExecutionEnvironment`. I have added the comment to 
>> explain that the lineage information in `StreamExecutionEnvironment` will be 
>> consistent with that of transformations. When users clear the existing 
>> transformations, the added lineage information will also be deleted.
>>
>> Please help to review it again, and If there are no more concerns about 
>> FLIP-314[1], I would like to start voting later, thanks. cc @ <>Leonard
>>
>> Best,
>> Shammon FY
>>
>> On Mon, Jul 17, 2023 at 3:43 PM Shammon FY > > wrote:
>> Hi devs,
>>
>> Thanks for all the valuable feedback. If there are no more concerns about 
>> FLIP-314[1], I would like to start voting later, thanks.
>>
>>
>> [1] 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
>>
>> >   >
>>
>> Best,
>> Shammon FY
>>
>>
>> On Wed, Jul 12, 2023 at 11:18 AM Shammon FY > > wrote:
>> Thanks for the valuable feedback, Leonard.
>>
>> I have discussed with Leonard off-line. We have reached some conclusions 
>> about these issues and I have updated the FLIP as follows:
>>
>> 1. Simplify the `LineageEdge` interface by creating an edge from one source 
>> vertex to sink vertex.
>> 2. Remove the `TableColumnSourceLineageVertex` interface and update 
>> `TableColumnLineageEdge` to create an edge from columns in one source to 
>> each sink column.
>> 3. Rename `SupportsLineageVertex` to `LineageVertexProvider`
>> 4. Add method `addLineageEdges(LineageEdge ... edges)` in 
>> `StreamExecutionEnviroment` for datastream job and remove previous methods 
>> in `DataStreamSource` and `DataStreamSink`.
>>
>> Looking forward to your feedback, thanks.
>>
>> Best,
>> Shammon FY
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU