RE: Re: [DISCUSS] Contribution of Multi Cluster Kafka Source

2022-09-02 Thread Ryan van Huuksloot via user
Hi Mason,

First off, thanks for putting this FLIP together! Sorry for the delay. Full
disclosure Mason and I chatted a little bit at Flink Forward 2022 but I
have tried to capture the questions I had for him then.

I'll start the conversation with a few questions:

1. The concept of streamIds is not clear to me in the proposal and could
use some more information. If I understand correctly, they will be used in
the MetadataService to link KafkaClusters to ones you want to use? If you
assign stream ids using `setStreamIds`, how can you dynamically increase
the number of clusters you consume if the list of StreamIds is static? I am
basing this off of your example .setStreamIds(List.of("my-stream-1",
"my-stream-2")) so I could be off base with my assumption. If you don't
mind clearing up the intention, that would be great!

2. How would offsets work if you wanted to use this MultiClusterKafkaSource
with a file based backfill? In the case I am thinking of, you have a bucket
backed archive of Kafka data per cluster. and you want to pick up from the
last offset in the archived system, how would you set OffsetInitializers
"per cluster" potentially as a function or are you limited to setting an
OffsetInitializer for the entire Source?

3. Just to make sure - because this system will layer on top of Flink-27
and use KafkaSource for some aspects under the hood, the watermark
alignment that was introduced in FLIP-182 / Flink 1.15 would be possible
across multiple clusters if you assign them to the same alignment group?

Thanks!
Ryan

On 2022/06/28 07:21:15 Mason Chen wrote:
> Hi all,
>
> Thanks for the feedback! I'm adding the users, who responded in the user
> mailing list, to this thread.
>
> @Qingsheng - Yes, I would prefer to reuse the existing Kafka connector
> module. It makes a lot of sense since the dependencies are the same and
the
> implementation can also extend and improve some of the test utilities you
> have been working on for the FLIP 27 Kafka Source. I will enumerate the
> migration steps in the FLIP template.
>
> @Ryan - I don't have a public branch available yet, but I would appreciate
> your review on the FLIP design! When the FLIP design is approved by devs
> and the community, I can start to commit our implementation to a fork.
>
> @Andrew - Yup, one of the requirements of the connector is to read
multiple
> clusters within a single source, so it should be able to work well with
> your use case.
>
> @Devs - what do I need to get started on the FLIP design? I see the FLIP
> template and I have an account (mason6345), but I don't have access to
> create a page.
>
> Best,
> Mason
>
>
>
>
> On Sun, Jun 26, 2022 at 8:08 PM Qingsheng Ren  wrote:
>
> > Hi Mason,
> >
> > It sounds like an exciting enhancement to the Kafka source and will
> > benefit a lot of users I believe.
> >
> > Would you prefer to reuse the existing flink-connector-kafka module or
> > create a new one for the new multi-cluster feature? Personally I prefer
the
> > former one because users won’t need to introduce another dependency
module
> > to their projects in order to use the feature.
> >
> > Thanks for the effort on this and looking forward to your FLIP!
> >
> > Best,
> > Qingsheng
> >
> > > On Jun 24, 2022, at 09:43, Mason Chen  wrote:
> > >
> > > Hi community,
> > >
> > > We have been working on a Multi Cluster Kafka Source and are looking
to
> > > contribute it upstream. I've given a talk about the features and
design
> > at
> > > a Flink meetup: https://youtu.be/H1SYOuLcUTI.
> > >
> > > The main features that it provides is:
> > > 1. Reading multiple Kafka clusters within a single source.
> > > 2. Adjusting the clusters and topics the source consumes from
> > dynamically,
> > > without Flink job restart.
> > >
> > > Some of the challenging use cases that these features solve are:
> > > 1. Transparent Kafka cluster migration without Flink job restart.
> > > 2. Transparent Kafka topic migration without Flink job restart.
> > > 3. Direct integration with Hybrid Source.
> > >
> > > In addition, this is designed with wrapping and managing the existing
> > > KafkaSource components to enable these features, so it can continue to
> > > benefit from KafkaSource improvements and bug fixes. It can be
considered
> > > as a form of a composite source.
> > >
> > > I think the contribution of this source could benefit a lot of users
who
> > > have asked in the mailing list about Flink handling Kafka migrations
and
> > > removing topics in the past. I would love to hear and address your
> > thoughts
> > > and feedback, and if possible drive a FLIP!
> > >
> > > Best,
> > > Mason
> >
> >
>


Re: [DISCUSS] Contribution of Multi Cluster Kafka Source

2022-07-14 Thread Mason Chen
Hi all,

Circling back on this--I have created a first draft document in confluence:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-246%3A+Multi+Cluster+Kafka+Source
.

Looking forward to hear all your feedback in this email thread!

Best,
Mason

On Thu, Jun 30, 2022 at 6:57 AM Thomas Weise  wrote:

> Hi Mason,
>
> I added mason6345 to the Flink confluence space, you should be able to
> add a FLIP now.
>
> Looking forward to the contribution!
>
> Thomas
>
> On Thu, Jun 30, 2022 at 9:25 AM Martijn Visser 
> wrote:
> >
> > Hi Mason,
> >
> > I'm sure there's a PMC (*hint*) out there who can grant you access to
> > create a FLIP. Looking forward to it, this sounds like an improvement
> that
> > users are looking forward to.
> >
> > Best regards,
> >
> > Martijn
> >
> > Op di 28 jun. 2022 om 09:21 schreef Mason Chen :
> >
> > > Hi all,
> > >
> > > Thanks for the feedback! I'm adding the users, who responded in the
> user
> > > mailing list, to this thread.
> > >
> > > @Qingsheng - Yes, I would prefer to reuse the existing Kafka connector
> > > module. It makes a lot of sense since the dependencies are the same
> and the
> > > implementation can also extend and improve some of the test utilities
> you
> > > have been working on for the FLIP 27 Kafka Source. I will enumerate the
> > > migration steps in the FLIP template.
> > >
> > > @Ryan - I don't have a public branch available yet, but I would
> appreciate
> > > your review on the FLIP design! When the FLIP design is approved by
> devs
> > > and the community, I can start to commit our implementation to a fork.
> > >
> > > @Andrew - Yup, one of the requirements of the connector is to read
> > > multiple clusters within a single source, so it should be able to work
> well
> > > with your use case.
> > >
> > > @Devs - what do I need to get started on the FLIP design? I see the
> FLIP
> > > template and I have an account (mason6345), but I don't have access to
> > > create a page.
> > >
> > > Best,
> > > Mason
> > >
> > >
> > >
> > >
> > > On Sun, Jun 26, 2022 at 8:08 PM Qingsheng Ren 
> wrote:
> > >
> > >> Hi Mason,
> > >>
> > >> It sounds like an exciting enhancement to the Kafka source and will
> > >> benefit a lot of users I believe.
> > >>
> > >> Would you prefer to reuse the existing flink-connector-kafka module or
> > >> create a new one for the new multi-cluster feature? Personally I
> prefer the
> > >> former one because users won’t need to introduce another dependency
> module
> > >> to their projects in order to use the feature.
> > >>
> > >> Thanks for the effort on this and looking forward to your FLIP!
> > >>
> > >> Best,
> > >> Qingsheng
> > >>
> > >> > On Jun 24, 2022, at 09:43, Mason Chen 
> wrote:
> > >> >
> > >> > Hi community,
> > >> >
> > >> > We have been working on a Multi Cluster Kafka Source and are
> looking to
> > >> > contribute it upstream. I've given a talk about the features and
> design
> > >> at
> > >> > a Flink meetup: https://youtu.be/H1SYOuLcUTI.
> > >> >
> > >> > The main features that it provides is:
> > >> > 1. Reading multiple Kafka clusters within a single source.
> > >> > 2. Adjusting the clusters and topics the source consumes from
> > >> dynamically,
> > >> > without Flink job restart.
> > >> >
> > >> > Some of the challenging use cases that these features solve are:
> > >> > 1. Transparent Kafka cluster migration without Flink job restart.
> > >> > 2. Transparent Kafka topic migration without Flink job restart.
> > >> > 3. Direct integration with Hybrid Source.
> > >> >
> > >> > In addition, this is designed with wrapping and managing the
> existing
> > >> > KafkaSource components to enable these features, so it can continue
> to
> > >> > benefit from KafkaSource improvements and bug fixes. It can be
> > >> considered
> > >> > as a form of a composite source.
> > >> >
> > >> > I think the contribution of this source could benefit a lot of
> users who
> > >> > have asked in the mailing list about Flink handling Kafka
> migrations and
> > >> > removing topics in the past. I would love to hear and address your
> > >> thoughts
> > >> > and feedback, and if possible drive a FLIP!
> > >> >
> > >> > Best,
> > >> > Mason
> > >>
> > >>
>


Re: [DISCUSS] Contribution of Multi Cluster Kafka Source

2022-06-30 Thread Thomas Weise
Hi Mason,

I added mason6345 to the Flink confluence space, you should be able to
add a FLIP now.

Looking forward to the contribution!

Thomas

On Thu, Jun 30, 2022 at 9:25 AM Martijn Visser  wrote:
>
> Hi Mason,
>
> I'm sure there's a PMC (*hint*) out there who can grant you access to
> create a FLIP. Looking forward to it, this sounds like an improvement that
> users are looking forward to.
>
> Best regards,
>
> Martijn
>
> Op di 28 jun. 2022 om 09:21 schreef Mason Chen :
>
> > Hi all,
> >
> > Thanks for the feedback! I'm adding the users, who responded in the user
> > mailing list, to this thread.
> >
> > @Qingsheng - Yes, I would prefer to reuse the existing Kafka connector
> > module. It makes a lot of sense since the dependencies are the same and the
> > implementation can also extend and improve some of the test utilities you
> > have been working on for the FLIP 27 Kafka Source. I will enumerate the
> > migration steps in the FLIP template.
> >
> > @Ryan - I don't have a public branch available yet, but I would appreciate
> > your review on the FLIP design! When the FLIP design is approved by devs
> > and the community, I can start to commit our implementation to a fork.
> >
> > @Andrew - Yup, one of the requirements of the connector is to read
> > multiple clusters within a single source, so it should be able to work well
> > with your use case.
> >
> > @Devs - what do I need to get started on the FLIP design? I see the FLIP
> > template and I have an account (mason6345), but I don't have access to
> > create a page.
> >
> > Best,
> > Mason
> >
> >
> >
> >
> > On Sun, Jun 26, 2022 at 8:08 PM Qingsheng Ren  wrote:
> >
> >> Hi Mason,
> >>
> >> It sounds like an exciting enhancement to the Kafka source and will
> >> benefit a lot of users I believe.
> >>
> >> Would you prefer to reuse the existing flink-connector-kafka module or
> >> create a new one for the new multi-cluster feature? Personally I prefer the
> >> former one because users won’t need to introduce another dependency module
> >> to their projects in order to use the feature.
> >>
> >> Thanks for the effort on this and looking forward to your FLIP!
> >>
> >> Best,
> >> Qingsheng
> >>
> >> > On Jun 24, 2022, at 09:43, Mason Chen  wrote:
> >> >
> >> > Hi community,
> >> >
> >> > We have been working on a Multi Cluster Kafka Source and are looking to
> >> > contribute it upstream. I've given a talk about the features and design
> >> at
> >> > a Flink meetup: https://youtu.be/H1SYOuLcUTI.
> >> >
> >> > The main features that it provides is:
> >> > 1. Reading multiple Kafka clusters within a single source.
> >> > 2. Adjusting the clusters and topics the source consumes from
> >> dynamically,
> >> > without Flink job restart.
> >> >
> >> > Some of the challenging use cases that these features solve are:
> >> > 1. Transparent Kafka cluster migration without Flink job restart.
> >> > 2. Transparent Kafka topic migration without Flink job restart.
> >> > 3. Direct integration with Hybrid Source.
> >> >
> >> > In addition, this is designed with wrapping and managing the existing
> >> > KafkaSource components to enable these features, so it can continue to
> >> > benefit from KafkaSource improvements and bug fixes. It can be
> >> considered
> >> > as a form of a composite source.
> >> >
> >> > I think the contribution of this source could benefit a lot of users who
> >> > have asked in the mailing list about Flink handling Kafka migrations and
> >> > removing topics in the past. I would love to hear and address your
> >> thoughts
> >> > and feedback, and if possible drive a FLIP!
> >> >
> >> > Best,
> >> > Mason
> >>
> >>


Re: [DISCUSS] Contribution of Multi Cluster Kafka Source

2022-06-30 Thread Martijn Visser
Hi Mason,

I'm sure there's a PMC (*hint*) out there who can grant you access to
create a FLIP. Looking forward to it, this sounds like an improvement that
users are looking forward to.

Best regards,

Martijn

Op di 28 jun. 2022 om 09:21 schreef Mason Chen :

> Hi all,
>
> Thanks for the feedback! I'm adding the users, who responded in the user
> mailing list, to this thread.
>
> @Qingsheng - Yes, I would prefer to reuse the existing Kafka connector
> module. It makes a lot of sense since the dependencies are the same and the
> implementation can also extend and improve some of the test utilities you
> have been working on for the FLIP 27 Kafka Source. I will enumerate the
> migration steps in the FLIP template.
>
> @Ryan - I don't have a public branch available yet, but I would appreciate
> your review on the FLIP design! When the FLIP design is approved by devs
> and the community, I can start to commit our implementation to a fork.
>
> @Andrew - Yup, one of the requirements of the connector is to read
> multiple clusters within a single source, so it should be able to work well
> with your use case.
>
> @Devs - what do I need to get started on the FLIP design? I see the FLIP
> template and I have an account (mason6345), but I don't have access to
> create a page.
>
> Best,
> Mason
>
>
>
>
> On Sun, Jun 26, 2022 at 8:08 PM Qingsheng Ren  wrote:
>
>> Hi Mason,
>>
>> It sounds like an exciting enhancement to the Kafka source and will
>> benefit a lot of users I believe.
>>
>> Would you prefer to reuse the existing flink-connector-kafka module or
>> create a new one for the new multi-cluster feature? Personally I prefer the
>> former one because users won’t need to introduce another dependency module
>> to their projects in order to use the feature.
>>
>> Thanks for the effort on this and looking forward to your FLIP!
>>
>> Best,
>> Qingsheng
>>
>> > On Jun 24, 2022, at 09:43, Mason Chen  wrote:
>> >
>> > Hi community,
>> >
>> > We have been working on a Multi Cluster Kafka Source and are looking to
>> > contribute it upstream. I've given a talk about the features and design
>> at
>> > a Flink meetup: https://youtu.be/H1SYOuLcUTI.
>> >
>> > The main features that it provides is:
>> > 1. Reading multiple Kafka clusters within a single source.
>> > 2. Adjusting the clusters and topics the source consumes from
>> dynamically,
>> > without Flink job restart.
>> >
>> > Some of the challenging use cases that these features solve are:
>> > 1. Transparent Kafka cluster migration without Flink job restart.
>> > 2. Transparent Kafka topic migration without Flink job restart.
>> > 3. Direct integration with Hybrid Source.
>> >
>> > In addition, this is designed with wrapping and managing the existing
>> > KafkaSource components to enable these features, so it can continue to
>> > benefit from KafkaSource improvements and bug fixes. It can be
>> considered
>> > as a form of a composite source.
>> >
>> > I think the contribution of this source could benefit a lot of users who
>> > have asked in the mailing list about Flink handling Kafka migrations and
>> > removing topics in the past. I would love to hear and address your
>> thoughts
>> > and feedback, and if possible drive a FLIP!
>> >
>> > Best,
>> > Mason
>>
>>


Re: [DISCUSS] Contribution of Multi Cluster Kafka Source

2022-06-28 Thread Mason Chen
Hi all,

Thanks for the feedback! I'm adding the users, who responded in the user
mailing list, to this thread.

@Qingsheng - Yes, I would prefer to reuse the existing Kafka connector
module. It makes a lot of sense since the dependencies are the same and the
implementation can also extend and improve some of the test utilities you
have been working on for the FLIP 27 Kafka Source. I will enumerate the
migration steps in the FLIP template.

@Ryan - I don't have a public branch available yet, but I would appreciate
your review on the FLIP design! When the FLIP design is approved by devs
and the community, I can start to commit our implementation to a fork.

@Andrew - Yup, one of the requirements of the connector is to read multiple
clusters within a single source, so it should be able to work well with
your use case.

@Devs - what do I need to get started on the FLIP design? I see the FLIP
template and I have an account (mason6345), but I don't have access to
create a page.

Best,
Mason




On Sun, Jun 26, 2022 at 8:08 PM Qingsheng Ren  wrote:

> Hi Mason,
>
> It sounds like an exciting enhancement to the Kafka source and will
> benefit a lot of users I believe.
>
> Would you prefer to reuse the existing flink-connector-kafka module or
> create a new one for the new multi-cluster feature? Personally I prefer the
> former one because users won’t need to introduce another dependency module
> to their projects in order to use the feature.
>
> Thanks for the effort on this and looking forward to your FLIP!
>
> Best,
> Qingsheng
>
> > On Jun 24, 2022, at 09:43, Mason Chen  wrote:
> >
> > Hi community,
> >
> > We have been working on a Multi Cluster Kafka Source and are looking to
> > contribute it upstream. I've given a talk about the features and design
> at
> > a Flink meetup: https://youtu.be/H1SYOuLcUTI.
> >
> > The main features that it provides is:
> > 1. Reading multiple Kafka clusters within a single source.
> > 2. Adjusting the clusters and topics the source consumes from
> dynamically,
> > without Flink job restart.
> >
> > Some of the challenging use cases that these features solve are:
> > 1. Transparent Kafka cluster migration without Flink job restart.
> > 2. Transparent Kafka topic migration without Flink job restart.
> > 3. Direct integration with Hybrid Source.
> >
> > In addition, this is designed with wrapping and managing the existing
> > KafkaSource components to enable these features, so it can continue to
> > benefit from KafkaSource improvements and bug fixes. It can be considered
> > as a form of a composite source.
> >
> > I think the contribution of this source could benefit a lot of users who
> > have asked in the mailing list about Flink handling Kafka migrations and
> > removing topics in the past. I would love to hear and address your
> thoughts
> > and feedback, and if possible drive a FLIP!
> >
> > Best,
> > Mason
>
>


Re: Re: [DISCUSS] Contribution of Multi Cluster Kafka Source

2022-06-27 Thread Andrew Otto
This sounds very useful!  Another potential use case:

- Consuming from multiple kafka clusters in different datacenters/regions.

I'm not sure if we would ultimately want to do this, but having it as an
option when you need events from multiple kafka clusters to get the full
history of changes (instead of relying on MirrorMaker) could be nice.






On Mon, Jun 27, 2022 at 1:02 PM Ryan van Huuksloot <
ryan.vanhuuksl...@shopify.com> wrote:

> Hi Mason,
>
> Thanks for starting this discussion! The proposed Source sounds awesome
> and we would be interested in taking a look at the source code and
> evaluating our use cases. We can provide information and review on a
> potential FLIP based on other use cases.
>
> Do you have a fork/branch that you are working with that is public? Could
> you attach that so we can start looking at it?
>
> Let us know if you need anything from us to help move this forward.
>
> Thanks!
> Ryan
>
> On 2022/06/27 03:08:13 Qingsheng Ren wrote:
> > Hi Mason,
> >
> > It sounds like an exciting enhancement to the Kafka source and will
> benefit a lot of users I believe.
> >
> > Would you prefer to reuse the existing flink-connector-kafka module or
> create a new one for the new multi-cluster feature? Personally I prefer the
> former one because users won’t need to introduce another dependency module
> to their projects in order to use the feature.
> >
> > Thanks for the effort on this and looking forward to your FLIP!
> >
> > Best,
> > Qingsheng
> >
> > > On Jun 24, 2022, at 09:43, Mason Chen  wrote:
> > >
> > > Hi community,
> > >
> > > We have been working on a Multi Cluster Kafka Source and are looking to
> > > contribute it upstream. I've given a talk about the features and
> design at
> > > a Flink meetup: https://youtu.be/H1SYOuLcUTI.
> > >
> > > The main features that it provides is:
> > > 1. Reading multiple Kafka clusters within a single source.
> > > 2. Adjusting the clusters and topics the source consumes from
> dynamically,
> > > without Flink job restart.
> > >
> > > Some of the challenging use cases that these features solve are:
> > > 1. Transparent Kafka cluster migration without Flink job restart.
> > > 2. Transparent Kafka topic migration without Flink job restart.
> > > 3. Direct integration with Hybrid Source.
> > >
> > > In addition, this is designed with wrapping and managing the existing
> > > KafkaSource components to enable these features, so it can continue to
> > > benefit from KafkaSource improvements and bug fixes. It can be
> considered
> > > as a form of a composite source.
> > >
> > > I think the contribution of this source could benefit a lot of users
> who
> > > have asked in the mailing list about Flink handling Kafka migrations
> and
> > > removing topics in the past. I would love to hear and address your
> thoughts
> > > and feedback, and if possible drive a FLIP!
> > >
> > > Best,
> > > Mason
> >
> >
>


RE: Re: [DISCUSS] Contribution of Multi Cluster Kafka Source

2022-06-27 Thread Ryan van Huuksloot
Hi Mason,

Thanks for starting this discussion! The proposed Source sounds awesome and
we would be interested in taking a look at the source code and evaluating
our use cases. We can provide information and review on a potential FLIP
based on other use cases.

Do you have a fork/branch that you are working with that is public? Could
you attach that so we can start looking at it?

Let us know if you need anything from us to help move this forward.

Thanks!
Ryan

On 2022/06/27 03:08:13 Qingsheng Ren wrote:
> Hi Mason,
>
> It sounds like an exciting enhancement to the Kafka source and will
benefit a lot of users I believe.
>
> Would you prefer to reuse the existing flink-connector-kafka module or
create a new one for the new multi-cluster feature? Personally I prefer the
former one because users won’t need to introduce another dependency module
to their projects in order to use the feature.
>
> Thanks for the effort on this and looking forward to your FLIP!
>
> Best,
> Qingsheng
>
> > On Jun 24, 2022, at 09:43, Mason Chen  wrote:
> >
> > Hi community,
> >
> > We have been working on a Multi Cluster Kafka Source and are looking to
> > contribute it upstream. I've given a talk about the features and design
at
> > a Flink meetup: https://youtu.be/H1SYOuLcUTI.
> >
> > The main features that it provides is:
> > 1. Reading multiple Kafka clusters within a single source.
> > 2. Adjusting the clusters and topics the source consumes from
dynamically,
> > without Flink job restart.
> >
> > Some of the challenging use cases that these features solve are:
> > 1. Transparent Kafka cluster migration without Flink job restart.
> > 2. Transparent Kafka topic migration without Flink job restart.
> > 3. Direct integration with Hybrid Source.
> >
> > In addition, this is designed with wrapping and managing the existing
> > KafkaSource components to enable these features, so it can continue to
> > benefit from KafkaSource improvements and bug fixes. It can be
considered
> > as a form of a composite source.
> >
> > I think the contribution of this source could benefit a lot of users who
> > have asked in the mailing list about Flink handling Kafka migrations and
> > removing topics in the past. I would love to hear and address your
thoughts
> > and feedback, and if possible drive a FLIP!
> >
> > Best,
> > Mason
>
>