Re: Replicas not equally distributed within rack

2024-03-27 Thread Abhishek Singla
Yes, it’s similar.

Replicas are evenly distributed among racks, but not among brokers within a
rack, even though the number of brokers is the same in all racks.

Is there a workaround for this?
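As far as I know there is no single config for this. One workaround is to take
replica placement into your own hands for the affected topics: pre-create them
with an explicit replica assignment instead of relying on auto-creation. Below
is a minimal sketch using the Java AdminClient; the bootstrap server, topic
name and the broker-per-rack layout are assumptions for illustration, not
taken from this thread.

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicWithExplicitAssignment {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            // Hypothetical layout: 6 partitions, RF=3, brokers 1-4 in rack A,
            // 5-8 in rack B, 9-12 in rack C. Every partition keeps one replica
            // per rack, and replicas are rotated round-robin within each rack,
            // giving the 2,2,1,1 pattern per rack.
            Map<Integer, List<Integer>> assignment = Map.of(
                    0, List.of(1, 5, 9),
                    1, List.of(2, 6, 10),
                    2, List.of(3, 7, 11),
                    3, List.of(4, 8, 12),
                    4, List.of(1, 6, 11),
                    5, List.of(2, 7, 12));
            // NewTopic(name, replicasAssignments) creates the topic with exactly
            // this placement instead of the broker-side assignor.
            admin.createTopics(List.of(new NewTopic("my-topic", assignment))).all().get();
        }
    }
}

The same placement can also be expressed as a JSON file and applied with the
kafka-reassign-partitions.sh tool, but that only helps for topics you create
or reassign yourself; auto-created topics still use the broker's built-in
assignment.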

On Wed, 27 Mar 2024 at 5:36 PM, Chia-Ping Tsai  wrote:

> hi Abhishek
>
> Is this issue similar to the unbalance you had met?
>
> https://issues.apache.org/jira/browse/KAFKA-10368
>
> best,
> chia-ping
>
> On 2024/03/23 21:06:59 Abhishek Singla wrote:
> > Hi Team,
> >
> > Kafka version: 2_2.12-2.6.0
> > Zookeeper version: 3.8.x
> >
> > We have a Kafka Cluster of 12 brokers spread equally across 3 racks.
> Topic
> > gets auto created with default num.partitions=6 and replication_factor=3.
> > It is observed that replicas are equally distributed over racks but
> within
> > the rack the replicas are randomly distributed, e.g. sometimes 3,3,0,0,
> > sometimes 3,2,1, or sometimes 2,2,1,1.
> >
> > Is there a configuration to evenly distribute replicas across brokers
> > within a rack, maybe some sort of round robin strategy 2,2,1,1?
> >
> > It is also observed that over time one broker ends up having far more
> > replicas across topics than the other brokers in the same rack. Is there a
> > config for even distribution of replicas across topics as well?
> >
> > Regards,
> > Abhishek Singla
> >
>


Re: Replicas not equally distributed within rack

2024-03-27 Thread Chia-Ping Tsai
hi Abhishek

Is this issue similar to the unbalance you had met?

https://issues.apache.org/jira/browse/KAFKA-10368

best,
chia-ping

On 2024/03/23 21:06:59 Abhishek Singla wrote:
> Hi Team,
> 
> Kafka version: 2_2.12-2.6.0
> Zookeeper version: 3.8.x
> 
> We have a Kafka Cluster of 12 brokers spread equally across 3 racks. Topic
> gets auto created with default num.partitions=6 and replication_factor=3.
> It is observed that replicas are equally distributed over racks but within
> the rack the replicas are randomly distributed, e.g. sometimes 3,3,0,0,
> sometimes 3,2,1, or sometimes 2,2,1,1.
> 
> Is there a configuration to evenly distribute replicas across brokers
> within a rack, maybe some sort of round robin strategy 2,2,1,1?
> 
> It is also observed that over time one broker ends up having far more
> replicas across topics than the other brokers in the same rack. Is there a
> config for even distribution of replicas across topics as well?
> 
> Regards,
> Abhishek Singla
> 


Re: Messages disappearing from Kafka Streams topology

2024-03-27 Thread mangat rai
Hey Karsten,

You don't need to do any other configuration to enable EOS. See here -
https://docs.confluent.io/platform/current/streams/concepts.html#processing-guarantees
It mentions that the producer will be idempotent. That also means acks=all
will be used. Note that if you have set any other acks value in the config, it
will be ignored in favour of exactly-once.
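
For reference, a minimal sketch of the configuration this implies (placeholder
application id and bootstrap server; exactly_once_v2 assumes a kafka-streams
client at 3.0 or newer and brokers at 2.5 or newer):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class EosStreamsConfigSketch {
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");   // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092"); // placeholder
        // The only switch needed for EOS. With it, the embedded producers are made
        // idempotent/transactional, which implies acks=all; an explicit acks
        // override in the producer configs is ignored.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }
}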

Do let me know if that solves your problem. I am curious: if it does, then I
would ask you to create an issue.

Regards,
Mangat

On Wed, Mar 27, 2024 at 10:49 AM Karsten Stöckmann <
karsten.stoeckm...@gmail.com> wrote:

> Hi Mangat,
>
> thanks for clarification. So to my knowledge exactly-once is configured
> using the 'processing.guarantee=exactly_once_v2' setting? Is the
> configuration setting 'acks=all' somehow related and would you advise
> setting that as well?
>
> Best wishes
> Karsten
>
>
> mangat rai  wrote on Tue, Mar 26, 2024, 15:44:
>
> > Hey Karsten,
> >
> > So if a topic has not been created yet, the Streams app will keep the data
> > in memory and write it later once the topic is available. If your app is
> > restarted (or a thread is killed), you may lose data, depending on whether
> > the app has committed offsets on the source topics. If there are no errors,
> > it should be persisted eventually.
> >
> > However, overall exactly-once provides much tighter and better commit
> > control. If you don't have scaling issues, I would strongly advise you to
> > use EOS.
> >
> > Thanks,
> > Mangat
> >
> >
> > On Tue, Mar 26, 2024 at 3:33 PM Karsten Stöckmann <
> > karsten.stoeckm...@gmail.com> wrote:
> >
> > > Hi Mangat,
> > >
> > > thanks for your thoughts. I had actually considered exactly-once semantics
> > > already, was unsure whether it would help, and left it aside for the time
> > > being. I'll try it immediately when I get back to work.
> > >
> > > About snapshots and deserialization - I doubt that the issue is caused by
> > > deserialization failures, because when taking another snapshot (i.e. at a
> > > later point in time) of the exact same data, all messages fed into the
> > > input topic pass the pipeline as expected.
> > >
> > > Logs of both Kafka and Kafka Streams show no signs of notable issues as
> > far
> > > as I can tell, apart from these (when initially starting up,
> intermediate
> > > topics not existing yet):
> > >
> > > 2024-03-22 22:36:11,386 WARN [org.apa.kaf.cli.NetworkClient]
> > >
> > >
> >
> (kstreams-folder-aggregator-a38397c2-d30a-437e-9817-baa605d49e23-StreamThread-4)
> > > [Consumer
> > >
> > >
> >
> clientId=kstreams-folder-aggregator-a38397c2-d30a-437e-9817-baa605d49e23-StreamThread-4-consumer,
> > > groupId=kstreams-folder-aggregator] Error while fetching metadata with
> > > correlation id 69 :
> > >
> > >
> >
> {kstreams-folder-aggregator-folder-to-agency-subscription-response-topic=UNKNOWN_TOPIC_OR_PARTITION,
> > > }
> > >
> > > Best wishes
> > > Karsten
> > >
> > >
> > >
> > > mangat rai  wrote on Tue, Mar 26, 2024, 11:06:
> > >
> > > > Hey Karsten,
> > > >
> > > > There could be several reasons this could happen.
> > > > 1. Did you check the error logs? There are several reasons why the
> > Kafka
> > > > stream app may drop incoming messages. Use exactly-once semantics to
> > > limit
> > > > such cases.
> > > > 2. Are you sure there was no error when deserializing the records from
> > > > `folderTopicName`? You mentioned that it happens only when you start
> > > > processing and the other table snapshot works fine. This gives me a
> > > feeling
> > > > that the first records in the topic might not be deserialized
> properly.
> > > >
> > > > Regards,
> > > > Mangat
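
Regarding point 2 above: if deserialization errors are a suspect, making the
handler explicit ensures bad records fail loudly instead of being skipped
silently. A sketch only - LogAndFailExceptionHandler is already the Streams
default, so this mainly guards against a LogAndContinueExceptionHandler
override hiding dropped records:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.LogAndFailExceptionHandler;

public class DeserializationHandlerSketch {
    static Properties withStrictDeserialization(Properties props) {
        // Fail fast (and log) when a record cannot be deserialized,
        // rather than skipping it and moving on.
        props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
                LogAndFailExceptionHandler.class);
        return props;
    }
}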
> > > >
> > > > On Tue, Mar 26, 2024 at 8:45 AM Karsten Stöckmann <
> > > > karsten.stoeckm...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > thanks for getting back. I'll try and illustrate the issue.
> > > > >
> > > > > I've got an input topic 'folderTopicName' fed by a database CDC
> > system.
> > > > > Messages then pass a series of FK left joins and are eventually
> sent
> > to
> > > > an
> > > > > output topic like this ('agencies' and 'documents' being KTables):
> > > > >
> > > > >
> > > > > streamsBuilder //
> > > > > .table( //
> > > > > folderTopicName, //
> > > > > Consumed.with( //
> > > > > folderKeySerde, //
> > > > > folderSerde)) //
> > > > > .leftJoin( //
> > > > > agencies, //
> > > > > Folder::agencyIdValue, //
> > > > > AggregateFolder::new, //
> > > > > TableJoined.as("folder-to-agency"), //
> > > > > Materializer //
> > > > > .<..., AggregateFolder>named("folder-to-agency-materialized") //
> > > > > 

Re: Messages disappearing from Kafka Streams topology

2024-03-27 Thread Karsten Stöckmann
Hi Mangat,

thanks for clarification. So to my knowledge exactly-once is configured
using the 'processing.guarantee=exactly_once_v2' setting? Is the
configuration setting 'acks=all' somehow related and would you advise
setting that as well?

Best wishes
Karsten


mangat rai  wrote on Tue, Mar 26, 2024, 15:44:

> Hey Karsten,
>
> So if a topic has not been created yet, the Streams app will keep the data in
> memory and write it later once the topic is available. If your app is
> restarted (or a thread is killed), you may lose data, depending on whether the
> app has committed offsets on the source topics. If there are no errors, it
> should be persisted eventually.
>
> However, overall exactly-once provides much tighter and better commit
> control. If you don't have scaling issues, I would strongly advise you to use
> EOS.
>
> Thanks,
> Mangat
>
>
> On Tue, Mar 26, 2024 at 3:33 PM Karsten Stöckmann <
> karsten.stoeckm...@gmail.com> wrote:
>
> > Hi Mangat,
> >
> > thanks for your thoughts. I had actually considered exactly-once semantics
> > already, was unsure whether it would help, and left it aside for the time
> > being. I'll try it immediately when I get back to work.
> >
> > About snapshots and deserialization - I doubt that the issue is caused by
> > deserialization failures, because when taking another snapshot (i.e. at a
> > later point in time) of the exact same data, all messages fed into the
> > input topic pass the pipeline as expected.
> >
> > Logs of both Kafka and Kafka Streams show no signs of notable issues as
> far
> > as I can tell, apart from these (when initially starting up, intermediate
> > topics not existing yet):
> >
> > 2024-03-22 22:36:11,386 WARN [org.apa.kaf.cli.NetworkClient]
> >
> >
> (kstreams-folder-aggregator-a38397c2-d30a-437e-9817-baa605d49e23-StreamThread-4)
> > [Consumer
> >
> >
> clientId=kstreams-folder-aggregator-a38397c2-d30a-437e-9817-baa605d49e23-StreamThread-4-consumer,
> > groupId=kstreams-folder-aggregator] Error while fetching metadata with
> > correlation id 69 :
> >
> >
> {kstreams-folder-aggregator-folder-to-agency-subscription-response-topic=UNKNOWN_TOPIC_OR_PARTITION,
> > }
> >
> > Best wishes
> > Karsten
> >
> >
> >
> > mangat rai  wrote on Tue, Mar 26, 2024, 11:06:
> >
> > > Hey Karsten,
> > >
> > > There could be several reasons this could happen.
> > > 1. Did you check the error logs? There are several reasons why the
> Kafka
> > > stream app may drop incoming messages. Use exactly-once semantics to
> > limit
> > > such cases.
> > > 2. Are you sure there was no error when deserializing the records from
> > > `folderTopicName`? You mentioned that it happens only when you start
> > > processing and the other table snapshot works fine. This gives me a
> > feeling
> > > that the first records in the topic might not be deserialized properly.
> > >
> > > Regards,
> > > Mangat
> > >
> > > On Tue, Mar 26, 2024 at 8:45 AM Karsten Stöckmann <
> > > karsten.stoeckm...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > thanks for getting back. I'll try and illustrate the issue.
> > > >
> > > > I've got an input topic 'folderTopicName' fed by a database CDC
> system.
> > > > Messages then pass a series of FK left joins and are eventually sent
> to
> > > an
> > > > output topic like this ('agencies' and 'documents' being KTables):
> > > >
> > > >
> > > > streamsBuilder //
> > > > .table( //
> > > > folderTopicName, //
> > > > Consumed.with( //
> > > > folderKeySerde, //
> > > > folderSerde)) //
> > > > .leftJoin( //
> > > > agencies, //
> > > > Folder::agencyIdValue, //
> > > > AggregateFolder::new, //
> > > > TableJoined.as("folder-to-agency"), //
> > > > Materializer //
> > > > .<..., AggregateFolder>named("folder-to-agency-materialized") //
> > > > .withKeySerde(folderKeySerde) //
> > > > .withValueSerde(aggregateFolderSerde)) //
> > > > .leftJoin( //
> > > > documents, //
> > > > .toStream(...
> > > > .to(...
> > > >
> > > > ...
> > > >
> > > > As far as I understand, left join semantics should be similar to those of
> > > > relational databases, i.e. the left hand value always passes the join with
> > > > the right hand value set as null if not present. Whereas what I am
> > > > observing is this: lots of messages on the input topic are even absent on
> > > > the first left join changelog topic
> > > > ('folder-to-agency-materialized-changelog'). But: this seems to happen only
> > > > in case the Streams application is fired up for the first time, i.e.
> > > > 
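
A minimal sketch (hypothetical, simplified types - not the actual classes from
the topology quoted above) of the left-join contract described in that
message: for a KTable foreign-key leftJoin, the joiner is also invoked when no
matching right-hand record exists, with the right-hand value passed as null,
so a result row is still expected on the changelog:

import org.apache.kafka.streams.kstream.ValueJoiner;

public class FkLeftJoinSketch {
    // Hypothetical, simplified stand-ins for the real domain classes.
    record Folder(long id, Long agencyId) {}
    record Agency(long id, String name) {}
    record AggregateFolder(Folder folder, Agency agency) {}

    // In a KTable foreign-key leftJoin, the joiner is also called when no
    // matching right-hand record exists; the right-hand value is then null.
    static final ValueJoiner<Folder, Agency, AggregateFolder> JOINER =
            (folder, agency) -> new AggregateFolder(folder, agency);

    public static void main(String[] args) {
        // Left-only case: a join result is still expected, with agency == null.
        System.out.println(JOINER.apply(new Folder(1L, 42L), null));
    }
}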

Community Over Code NA 2024 Travel Assistance Applications now open!

2024-03-27 Thread Gavin McDonald
Hello to all users, contributors and Committers!

[ You are receiving this email as a subscriber to one or more ASF project
  dev or user mailing lists; it is not being sent to you directly. It is
  important that we reach all of our users and contributors/committers so
  that they may get a chance to benefit from this. We apologise in advance
  if this doesn't interest you, but it is on topic for the mailing lists of
  the Apache Software Foundation; please do not mark this as spam in your
  email client. Thank You! ]

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code NA 2024 are now
open!

We will be supporting Community over Code NA in Denver, Colorado, from
October 7th to the 10th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Monday 6th May, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request); this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those who will need a visa to enter the country - we advise you to apply
now so that you have enough time in case of interview delays. So do not
wait until you know whether you have been accepted or not.

We look forward to greeting many of you in Denver, Colorado, in October 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Re: Replicas not equally distributed within rack

2024-03-27 Thread Abhishek Singla
Hi Team,

Could someone help me with how to distribute Kafka topic replicas evenly
across brokers to avoid data skew (uneven disk utilisation)?

Regards,
Abhishek Singla
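
For existing topics that have already become skewed, one option is to move
replicas explicitly via the partition-reassignment API (the
kafka-reassign-partitions.sh tool or a tool such as Cruise Control can
automate the placement). A minimal sketch below; the bootstrap server, topic
name, partition number and target broker IDs are assumptions for illustration:

import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.common.TopicPartition;

public class RebalanceReplicasSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            // Hypothetical move: put partition 3 of "my-topic" onto less loaded
            // brokers 4, 8 and 12 while still keeping one replica in each rack.
            Map<TopicPartition, Optional<NewPartitionReassignment>> reassignment = Map.of(
                    new TopicPartition("my-topic", 3),
                    Optional.of(new NewPartitionReassignment(List.of(4, 8, 12))));
            admin.alterPartitionReassignments(reassignment).all().get();
        }
    }
}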

On Sun, Mar 24, 2024 at 2:36 AM Abhishek Singla 
wrote:

> Hi Team,
>
> Kafka version: 2_2.12-2.6.0
> Zookeeper version: 3.8.x
>
> We have a Kafka Cluster of 12 brokers spread equally across 3 racks. Topic
> gets auto created with default num.partitions=6 and replication_factor=3.
> It is observed that replicas are equally distributed over racks but within
> the rack the replicas are randomly distributed, e.g. sometimes 3,3,0,0,
> sometimes 3,2,1, or sometimes 2,2,1,1.
>
> Is there a configuration to evenly distribute replicas across brokers
> within a rack, maybe some sort of round robin strategy 2,2,1,1?
>
> It is also observed that over time one broker ends up having far more
> replicas across topics than the other brokers in the same rack. Is there a
> config for even distribution of replicas across topics as well?
>
> Regards,
> Abhishek Singla
>


Re: [ANNOUNCE] New committer: Christo Lolov

2024-03-27 Thread Matthias J. Sax

Congrats!

On 3/26/24 9:39 PM, Christo Lolov wrote:

Thank you everyone!

It wouldn't have been possible without quite a lot of reviews and extremely
helpful inputs from you and the rest of the community! I am looking forward
to working more closely with you going forward :)

On Tue, 26 Mar 2024 at 14:31, Kirk True  wrote:


Congratulations Christo!


On Mar 26, 2024, at 7:27 AM, Satish Duggana 

wrote:


Congratulations Christo!

On Tue, 26 Mar 2024 at 19:20, Ivan Yurchenko  wrote:


Congrats!

On Tue, Mar 26, 2024, at 14:48, Lucas Brutschy wrote:

Congrats!

On Tue, Mar 26, 2024 at 2:44 PM Federico Valeri 

wrote:


Congrats!

On Tue, Mar 26, 2024 at 2:27 PM Mickael Maison <

mickael.mai...@gmail.com> wrote:


Congratulations Christo!

On Tue, Mar 26, 2024 at 2:26 PM Chia-Ping Tsai 

wrote:


Congrats Christo!

Chia-Ping