Yes, I realized that, and later read that the Java Kafka consumer uses a single thread, which is why this behavior does not arise there. Maybe I need to restrict my application to a single consumer thread :( in order to achieve that. I need to ask Magnus Edenhill, the librdkafka expert. Thanks for your time, Hans.
Abhi

On Sun, Oct 9, 2016 at 10:08 PM, Hans Jespersen <h...@confluent.io> wrote:

> I'm pretty sure Jun was talking about the Java API in the quoted blog
> text, not librdkafka. There is only one thread in the new Java consumer,
> so you wouldn't see this behavior. I do not think that librdkafka makes
> any such guarantee to dispatch unique keys to each thread, but I'm not an
> expert in librdkafka, so others may be able to help you better on that.
>
> //h...@confluent.io
>
> -------- Original message --------
> From: Abhit Kalsotra <abhit...@gmail.com>
> Date: 10/9/16 3:58 AM (GMT-08:00)
> To: users@kafka.apache.org
> Subject: Re: Regarding Kafka
>
> I did that, but I am getting confusing results.
>
> e.g.
>
> I have created 4 Kafka consumer threads for doing data analytics; these
> threads just wait to consume Kafka messages, and
> I have provided a key when I produce, which means that all messages with
> that key should go to one single partition, ref:
> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
> "*On the consumer side, Kafka always gives a single partition's data to
> one consumer thread.*"
>
> If you look at my application logs, all 4 of my consumer application
> threads are calling consume(). Shouldn't all messages for a particular
> ID be consumed by a single application thread?
> [2016-10-08 23:37:07.498] AxThreadId 23516 -> ID:4495 offset: 74 [ID date: 2016-09-28 20:07:32.000]
> [2016-10-08 23:37:07.498] AxThreadId 2208  -> ID:4496 offset: 80 [ID date: 2016-09-28 20:07:39.000]
> [2016-10-08 23:37:07.498] AxThreadId 2208  -> ID:4495 offset: 77 [ID date: 2016-09-28 20:07:35.000]
> [2016-10-08 23:37:07.498] AxThreadId 23516 -> ID:4495 offset: 76 [ID date: 2016-09-28 20:07:34.000]
> [2016-10-08 23:37:07.498] AxThreadId 9540  -> ID:4495 offset: 75 [ID date: 2016-09-28 20:07:33.000]
> [2016-10-08 23:37:07.499] AxThreadId 23516 -> ID:4495 offset: 78 [ID date: 2016-09-28 20:07:36.000]
> [2016-10-08 23:37:07.499] AxThreadId 2208  -> ID:4495 offset: 79 [ID date: 2016-09-28 20:07:37.000]
> [2016-10-08 23:37:07.499] AxThreadId 9540  -> ID:4495 offset: 80 [ID date: 2016-09-28 20:07:38.000]
> [2016-10-08 23:37:07.500] AxThreadId 23516 -> ID:4495 offset: 81 [ID date: 2016-09-28 20:07:39.000]
>
> On Sun, Oct 9, 2016 at 1:31 PM, Hans Jespersen <h...@confluent.io> wrote:
>
> > Then publish with the user ID as the key, and all messages for the same
> > key will be guaranteed to go to the same partition and therefore be in
> > order for whichever consumer gets that partition.
> >
> > //h...@confluent.io
> >
> > -------- Original message --------
> > From: Abhit Kalsotra <abhit...@gmail.com>
> > Date: 10/9/16 12:39 AM (GMT-08:00)
> > To: users@kafka.apache.org
> > Subject: Re: Regarding Kafka
> >
> > What about the order in which messages are received if I don't specify
> > the partition?
> >
> > Let's say I have user ID 4456 and have to do some analytics at the
> > Kafka consumer end; if messages are not consumed in the order I sent
> > them, my analytics will go haywire.
> > Abhi
> >
> > On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <h...@confluent.io> wrote:
> >
> > > You don't even have to do that, because the default partitioner will
> > > spread the data you publish to the topic over the available partitions
> > > for you. Just try it out to see: publish multiple messages to the
> > > topic without using keys, and without specifying a partition, and
> > > observe that they are automatically distributed over the available
> > > partitions.
> > >
> > > //h...@confluent.io
> > >
> > > -------- Original message --------
> > > From: Abhit Kalsotra <abhit...@gmail.com>
> > > Date: 10/8/16 11:19 PM (GMT-08:00)
> > > To: users@kafka.apache.org
> > > Subject: Re: Regarding Kafka
> > >
> > > Hans
> > >
> > > Thanks for the response. Yes, you could say I am treating topics like
> > > partitions, because my current logic for producing to a topic goes
> > > something like this:
> > >
> > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic],
> > >                                                    partition,
> > >                                                    RdKafka::Producer::RK_MSG_COPY,
> > >                                                    ptr,
> > >                                                    size,
> > >                                                    &partitionKey,
> > >                                                    NULL);
> > >
> > > where partitionKey is a unique number or user ID. What I am doing
> > > currently is computing partitionKey % 10, and whatever the remainder
> > > is, I produce to that topic.
> > >
> > > But as per your suggestion, let me create close to 40-50 partitions
> > > for a single topic, and when I produce I do something like this:
> > >
> > > RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic,
> > >                                                    partition % 50,
> > >                                                    RdKafka::Producer::RK_MSG_COPY,
> > >                                                    ptr,
> > >                                                    size,
> > >                                                    &partitionKey,
> > >                                                    NULL);
> > >
> > > Abhi
> > >
> > > On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen <h...@confluent.io> wrote:
> > >
> > > > Why do you have 10 topics?
> > > > It seems like you are treating topics like partitions, and it's
> > > > unclear why you don't just have 1 topic with 10, 20, or even 30
> > > > partitions. Ordering is only guaranteed at a partition level.
> > > >
> > > > In general, if you want to capacity plan for partitions, you
> > > > benchmark a single partition and then divide your peak estimated
> > > > throughput by the result of the single-partition benchmark.
> > > >
> > > > If you expect the peak throughput to increase over time, then
> > > > double your partition count to allow room to grow the number of
> > > > consumers without having to repartition.
> > > >
> > > > Sizing can be a bit more tricky if you are using keys, but it
> > > > doesn't sound like you are if today you are publishing to topics
> > > > the way you describe.
> > > >
> > > > -hans
> > > >
> > > > > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra <abhit...@gmail.com> wrote:
> > > > >
> > > > > Guys, any views?
> > > > >
> > > > > Abhi
> > > > >
> > > > >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra <abhit...@gmail.com> wrote:
> > > > >>
> > > > >> Hello
> > > > >>
> > > > >> I am using the librdkafka C++ library for my application.
> > > > >>
> > > > >> *My Kafka cluster setup*
> > > > >> 2 Kafka ZooKeeper instances running on 2 different machines
> > > > >> 7 Kafka brokers, 4 running on 1 machine and 3 running on the
> > > > >> other machine
> > > > >> 10 topics in total, each with a partition count of 3 and a
> > > > >> replication factor of 3.
> > > > >>
> > > > >> Now in my case I need to be very specific about the *message
> > > > >> order* when I am consuming the messages. I know that if all the
> > > > >> messages get produced to the same partition, they are always
> > > > >> consumed in the same order.
> > > > >>
> > > > >> I need expert opinions on the ideal partition count I should
> > > > >> consider without affecting performance (I am looking for close
> > > > >> to 100,000 messages per second).
> > > > >> The topics are numbered 0 to 9, and when I produce messages I
> > > > >> compute uniqueUserId % 10 and then produce to the corresponding
> > > > >> topic, like 0 || 1 || 2, etc.
> > > > >>
> > > > >> Abhi
> > > > >>
> > > > >> --
> > > > >> If you can't succeed, call it version 1.0

--
If you can't succeed, call it version 1.0