Yes, I realized that, and later read that the Java Kafka consumer uses a single thread, which is why this behavior does not arise there. Maybe I need to restrict my application to a single consumer thread :( in order to achieve that. I need to ask Magnus Edenhill, the librdkafka expert. Thanks for your time, Hans.
Abhi

On Sun, Oct 9, 2016 at 10:08 PM, Hans Jespersen <h...@confluent.io> wrote:

> I'm pretty sure Jun was talking about the Java API in the quoted blog
> text, not librdkafka. There is only one thread in the new Java consumer,
> so you wouldn't see this behavior. I do not think that librdkafka makes
> any such guarantee to dispatch unique keys to each thread, but I'm not an
> expert in librdkafka, so others may be able to help you better on that.
>
> //h...@confluent.io
>
> -------- Original message --------
> From: Abhit Kalsotra <abhit...@gmail.com>
> Date: 10/9/16 3:58 AM (GMT-08:00)
> To: users@kafka.apache.org
> Subject: Re: Regarding Kafka
>
> I did that, but I am getting confusing results.
>
> e.g.
>
> I have created 4 Kafka consumer threads for doing data analytics; these
> threads just wait to consume Kafka messages, and
> I have provided a key when I produce, which means that all messages with
> that key should go to one single partition, ref:
> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
> "*On the consumer side, Kafka always gives a single partition's data to
> one consumer thread.*"
>
> If you look at my application logs, all 4 of my consumer application
> threads are calling consume(). Shouldn't all messages for a particular
> ID be consumed by a single application thread?
> [2016-10-08 23:37:07.498] AxThreadId 23516 -> ID:4495 offset: 74 [ID date: 2016-09-28 20:07:32.000]
> [2016-10-08 23:37:07.498] AxThreadId 2208  -> ID:4496 offset: 80 [ID date: 2016-09-28 20:07:39.000]
> [2016-10-08 23:37:07.498] AxThreadId 2208  -> ID:4495 offset: 77 [ID date: 2016-09-28 20:07:35.000]
> [2016-10-08 23:37:07.498] AxThreadId 23516 -> ID:4495 offset: 76 [ID date: 2016-09-28 20:07:34.000]
> [2016-10-08 23:37:07.498] AxThreadId 9540  -> ID:4495 offset: 75 [ID date: 2016-09-28 20:07:33.000]
> [2016-10-08 23:37:07.499] AxThreadId 23516 -> ID:4495 offset: 78 [ID date: 2016-09-28 20:07:36.000]
> [2016-10-08 23:37:07.499] AxThreadId 2208  -> ID:4495 offset: 79 [ID date: 2016-09-28 20:07:37.000]
> [2016-10-08 23:37:07.499] AxThreadId 9540  -> ID:4495 offset: 80 [ID date: 2016-09-28 20:07:38.000]
> [2016-10-08 23:37:07.500] AxThreadId 23516 -> ID:4495 offset: 81 [ID date: 2016-09-28 20:07:39.000]
>
> On Sun, Oct 9, 2016 at 1:31 PM, Hans Jespersen <h...@confluent.io> wrote:
>
> > Then publish with the user ID as the key, and all messages for the same
> > key will be guaranteed to go to the same partition and therefore be in
> > order for whichever consumer gets that partition.
> >
> > //h...@confluent.io
> >
> > -------- Original message --------
> > From: Abhit Kalsotra <abhit...@gmail.com>
> > Date: 10/9/16 12:39 AM (GMT-08:00)
> > To: users@kafka.apache.org
> > Subject: Re: Regarding Kafka
> >
> > What about the order in which messages are received if I don't specify
> > the partition?
> >
> > Let's say I have user ID 4456 and have to do some analytics at the
> > Kafka consumer end; if messages are not consumed in the order I sent
> > them, my analytics will go haywire.
> > Abhi
> >
> > On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <h...@confluent.io> wrote:
> >
> > > You don't even have to do that, because the default partitioner will
> > > spread the data you publish to the topic over the available partitions
> > > for you. Just try it out to see: publish multiple messages to the
> > > topic without using keys, and without specifying a partition, and
> > > observe that they are automatically distributed over the available
> > > partitions.
> > >
> > > //h...@confluent.io
> > >
> > > -------- Original message --------
> > > From: Abhit Kalsotra <abhit...@gmail.com>
> > > Date: 10/8/16 11:19 PM (GMT-08:00)
> > > To: users@kafka.apache.org
> > > Subject: Re: Regarding Kafka
> > >
> > > Hans
> > >
> > > Thanks for the response. Yes, you could say I am treating topics like
> > > partitions, because my current logic for producing to a topic goes
> > > something like this:
> > >
> > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic],
> > >                                                    partition,
> > >                                                    RdKafka::Producer::RK_MSG_COPY,
> > >                                                    ptr,
> > >                                                    size,
> > >                                                    &partitionKey,
> > >                                                    NULL);
> > >
> > > where partitionKey is a unique number or user ID. What I am doing
> > > currently is computing partitionKey % 10, and whatever the remainder
> > > is, I produce to that topic.
> > >
> > > But as per your suggestion, let me create close to 40-50 partitions
> > > for a single topic, and when I produce I do something like this:
> > >
> > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic,
> > >                                                    partition % 50,
> > >                                                    RdKafka::Producer::RK_MSG_COPY,
> > >                                                    ptr,
> > >                                                    size,
> > >                                                    &partitionKey,
> > >                                                    NULL);
> > >
> > > Abhi
> > >
> > > On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen <h...@confluent.io> wrote:
> > >
> > > > Why do you have 10 topics?
> > > > It seems like you are treating topics like partitions, and it's
> > > > unclear why you don't just have 1 topic with 10, 20, or even 30
> > > > partitions. Ordering is only guaranteed at a partition level.
> > > >
> > > > In general, if you want to capacity plan for partitions, you
> > > > benchmark a single partition and then divide your peak estimated
> > > > throughput by the result of the single-partition benchmark.
> > > >
> > > > If you expect the peak throughput to increase over time, then
> > > > double your partition count to allow room to grow the number of
> > > > consumers without having to repartition.
> > > >
> > > > Sizing can be a bit more tricky if you are using keys, but it
> > > > doesn't sound like you are if today you are publishing to topics
> > > > the way you describe.
> > > >
> > > > -hans
> > > >
> > > > > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra <abhit...@gmail.com> wrote:
> > > > >
> > > > > Guys, any views?
> > > > >
> > > > > Abhi
> > > > >
> > > > >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra <abhit...@gmail.com> wrote:
> > > > >>
> > > > >> Hello
> > > > >>
> > > > >> I am using the librdkafka C++ library for my application.
> > > > >>
> > > > >> *My Kafka cluster setup*
> > > > >> 2 Kafka ZooKeeper instances running on 2 different machines
> > > > >> 7 Kafka brokers, 4 running on 1 machine and 3 running on the
> > > > >> other machine
> > > > >> 10 topics in total, each with a partition count of 3 and a
> > > > >> replication factor of 3.
> > > > >>
> > > > >> Now in my case I need to be very specific about the *message
> > > > >> order* when I am consuming the messages. I know that if all the
> > > > >> messages get produced to the same partition, they are always
> > > > >> consumed in the same order.
> > > > >>
> > > > >> I need expert opinions on the ideal partition count I should
> > > > >> consider without affecting performance (I am looking for close
> > > > >> to 100,000 messages per second).
> > > > >> The topics are numbered 0 to 9, and when I produce messages I
> > > > >> compute uniqueUserId % 10 and then produce to the corresponding
> > > > >> topic, like 0 || 1 || 2, etc.
> > > > >>
> > > > >> Abhi
> > > > >>
> > > > >> --
> > > > >> If you can't succeed, call it version 1.0

--
If you can't succeed, call it version 1.0