Re: Re: KafkaSource consumer group

2023-03-31 Thread Andrew Otto
Hi,

FWIW, I asked a similar question here:
https://lists.apache.org/thread/1f01zo1lqcmhvosptpjlm6k3mgx0sv1m

:)


On Fri, Mar 31, 2023 at 3:57 AM Roberts, Ben (Senior Developer) via user <
user@flink.apache.org> wrote:

> Hi Gordon,
>
> Thanks for the reply!
> I think that makes sense.
>
> The reason for investigating is that we generally run our production
> workloads across two Kubernetes clusters (each in a different cloud region)
> for availability. For instance, requests to web apps are load-balanced
> between servers in both clusters, and pub/sub apps have consumers running
> in both clusters in the same consumer group (or the non-Kafka equivalent).
>
> We recently deployed our first production Flink workload, using the
> flink-kubernetes-operator and running the job(s) in HA mode, but we
> discovered that the same job running in each Kubernetes cluster was
> processing the same messages, which was not what we had expected.
> It sounds like this is intentional from Flink’s point of view, though.
>
> I don’t suppose you’re aware of a feature that would allow us to run a
> Flink job across two clusters? Otherwise, I guess we’ll need to run it in
> a single cluster and accept the risk of losing that cluster.
>
> Thanks,
> Ben

RE: Re: KafkaSource consumer group

2023-03-31 Thread Roberts, Ben (Senior Developer) via user
Hi Gordon,

Thanks for the reply!
I think that makes sense.

The reason for investigating is that we generally run our production workloads
across two Kubernetes clusters (each in a different cloud region) for
availability. For instance, requests to web apps are load-balanced between
servers in both clusters, and pub/sub apps have consumers running in both
clusters in the same consumer group (or the non-Kafka equivalent).

We recently deployed our first production Flink workload, using the
flink-kubernetes-operator and running the job(s) in HA mode, but we discovered
that the same job running in each Kubernetes cluster was processing the same
messages, which was not what we had expected.
It sounds like this is intentional from Flink’s point of view, though.

I don’t suppose you’re aware of a feature that would allow us to run a Flink
job across two clusters? Otherwise, I guess we’ll need to run it in a single
cluster and accept the risk of losing that cluster.

Thanks,
Ben

On 2023/03/30 16:52:31 "Tzu-Li (Gordon) Tai" wrote:
> Hi Robert,
>
> This is a design choice. Flink's KafkaSource doesn't rely on consumer
> groups for assigning partitions / rebalancing / offset tracking. It
> manually assigns whatever partitions are in the specified topic across its
> consumer instances, and rebalances only when the Flink job / KafkaSource is
> rescaled.
>
> Is there a specific reason that you need two Flink jobs for this? I believe
> the Flink way of doing this would be to have one job read the topic and
> then do a stream split if you want two different branches of processing
> business logic.
>
> Thanks,
> Gordon
>
Information in this email including any attachments may be privileged, 
confidential and is intended exclusively for the addressee. The views expressed 
may not be official policy, but the personal views of the originator. If you 
have received it in error, please notify the sender by return e-mail and delete 
it from your system. You should not reproduce, distribute, store, retransmit, 
use or disclose its contents to anyone. Please note we reserve the right to 
monitor all e-mail communication through our internal and external networks. 
SKY and the SKY marks are trademarks of Sky Limited and Sky International AG 
and are used under licence.

Sky UK Limited (Registration No. 2906991), Sky-In-Home Service Limited 
(Registration No. 2067075), Sky Subscribers Services Limited (Registration No. 
2340150) and Sky CP Limited (Registration No. 9513259) are direct or indirect 
subsidiaries of Sky Limited (Registration No. 2247735). All of the companies 
mentioned in this paragraph are incorporated in England and Wales and share the 
same registered office at Grant Way, Isleworth, Middlesex TW7 5QD


Re: KafkaSource consumer group

2023-03-30 Thread Tzu-Li (Gordon) Tai
Sorry, I meant to say "Hi Ben" :-)

On Thu, Mar 30, 2023 at 9:52 AM Tzu-Li (Gordon) Tai wrote:

> Hi Robert,
>
> This is a design choice. Flink's KafkaSource doesn't rely on consumer
> groups for assigning partitions / rebalancing / offset tracking. It
> manually assigns whatever partitions are in the specified topic across its
> consumer instances, and rebalances only when the Flink job / KafkaSource is
> rescaled.
>
> Is there a specific reason that you need two Flink jobs for this? I
> believe the Flink way of doing this would be to have one job read the
> topic and then do a stream split if you want two different branches of
> processing business logic.
>
> Thanks,
> Gordon


Re: KafkaSource consumer group

2023-03-30 Thread Tzu-Li (Gordon) Tai
Hi Robert,

This is a design choice. Flink's KafkaSource doesn't rely on consumer
groups for assigning partitions / rebalancing / offset tracking. It
manually assigns whatever partitions are in the specified topic across its
consumer instances, and rebalances only when the Flink job / KafkaSource is
rescaled.
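
To make the behaviour described above concrete, here is a toy Python
simulation (not Flink code). It assumes a simple round-robin spread of
partitions over subtasks, which is only a stand-in for whatever scheme
KafkaSource really uses; assign_partitions, partitions_read_by_job and the
partition numbers are hypothetical names invented for this illustration. The
point is that each job assigns the whole topic to its own subtasks, so two
jobs both read every partition regardless of group.id.

```python
# Toy model only: none of these names come from Flink or Kafka.

def assign_partitions(partitions, parallelism):
    """Spread every partition of the topic across one job's subtasks.

    Round-robin is an assumption made for this sketch; the important part
    is that the job never negotiates with a Kafka consumer group.
    """
    assignment = {subtask: [] for subtask in range(parallelism)}
    for index, partition in enumerate(partitions):
        assignment[index % parallelism].append(partition)
    return assignment


def partitions_read_by_job(assignment):
    """Return the sorted set of partitions a job reads across all subtasks."""
    return sorted(p for owned in assignment.values() for p in owned)


topic_partitions = [0, 1, 2, 3, 4, 5]

# Two independent jobs, e.g. one per Kubernetes cluster, same group.id.
job_a = assign_partitions(topic_partitions, parallelism=2)
job_b = assign_partitions(topic_partitions, parallelism=3)

# Rescaling a job simply recomputes its own assignment.
job_a_rescaled = assign_partitions(topic_partitions, parallelism=4)

print("job A reads:", partitions_read_by_job(job_a))
print("job B reads:", partitions_read_by_job(job_b))
```

Both jobs report the full partition list, which matches the duplicate
processing seen when the same job runs in two clusters.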

Is there a specific reason that you need two Flink jobs for this? I believe
the Flink way of doing this would be to have one job read the topic and
then do a stream split if you want two different branches of processing
business logic.
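
A plain-Python sketch of the "read once, then split" idea, assuming records
are routed to a branch by some key. In a real Flink job this role would be
played by DataStream operators; split_stream, the record fields and the
routing rule below are hypothetical and exist only for illustration.

```python
# Conceptual stand-in for a stream split; not Flink API.

def split_stream(records, route):
    """Consume each record exactly once and hand it to one named branch."""
    branches = {}
    for record in records:
        branches.setdefault(route(record), []).append(record)
    return branches


# Hypothetical records, as if read once from the topic by a single job.
records = [
    {"type": "order", "id": 1},
    {"type": "refund", "id": 2},
    {"type": "order", "id": 3},
]

# Hypothetical routing rule: branch on the record's "type" field.
branches = split_stream(records, route=lambda record: record["type"])

for name, branch in branches.items():
    print(name, [record["id"] for record in branch])
```

Each record is read from the source once, and the branching happens inside
the single job rather than across two jobs sharing a group.id.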

Thanks,
Gordon

On Thu, Mar 30, 2023 at 9:34 AM Roberts, Ben (Senior Developer) via user <
user@flink.apache.org> wrote:

> Hi,
>
> Is there a way to run multiple Flink jobs with the same Kafka group.id
> and have them join the same consumer group?
>
> It seems that setting the group.id using
> KafkaSource.builder().set_group_id() does not have the effect of creating
> an actual consumer group in Kafka.
>
> Running the same Flink job twice with the same group.id, consuming from
> the same topic, results in both jobs receiving the same messages from the
> topic, rather than each message going to only one of the jobs (as would
> normally be expected for consumers in a Kafka consumer group).
>
> Is this a design choice, and is there a way to configure it so messages
> can be split across two jobs using the same “group.id”?
>
> Thanks in advance,
>
> Ben