Re: Horizontally Scaling Kafka Consumers
The Go Kafka Client also supports offset storage in ZK and Kafka https://github.com/stealthly/go_kafka_client/blob/master/docs/offset_storage.md and has two other strategies for partition ownership with a consensus server (it currently uses ZooKeeper; Consul will be implemented in the near future).

~ Joestein

On Thu, Apr 30, 2015 at 2:15 AM, Nimi Wariboko Jr n...@channelmeter.com wrote:

> My mistake, it seems the Java drivers are a lot more advanced than Shopify's Kafka driver (or I am missing something) - and I haven't used Kafka before. With the Go driver it seems you have to manage offsets and partitions within the application code, while with the Scala driver it seems you have the option of simply subscribing to a topic, and someone else will manage that part.
>
> After digging around a bit more, I found there is another library - https://github.com/wvanbergen/kafka - that speaks the consumergroup API and accomplishes what I was looking for; I assume it is implemented by keeping track of memberships with ZooKeeper.
>
> Thank you for the information - it really helped clear up what I was failing to understand about Kafka.
>
> Nimi
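The partition-ownership strategies mentioned above rely on a consensus server to guarantee that each partition has exactly one owner at a time. A minimal sketch of that idea follows. It is an assumption-laden illustration, not the Go Kafka Client's implementation: `OwnershipRegistry` and `claim_up_to` are hypothetical names, and the in-memory dictionary only stands in for what ZooKeeper ephemeral nodes (or Consul) would provide.

```python
import threading


class OwnershipRegistry:
    """In-memory stand-in for a consensus server (hypothetical, not a real client)."""

    def __init__(self):
        self._owners = {}  # partition -> consumer id
        self._lock = threading.Lock()

    def claim(self, partition, consumer_id):
        """Atomically claim a partition; only the first claimant gets True."""
        with self._lock:
            if partition in self._owners:
                return False
            self._owners[partition] = consumer_id
            return True

    def release_all(self, consumer_id):
        """Drop every claim held by a consumer, as if its session had expired."""
        with self._lock:
            released = [p for p, c in self._owners.items() if c == consumer_id]
            for p in released:
                del self._owners[p]
            return sorted(released)


def claim_up_to(registry, consumer_id, partitions, limit):
    """Try to claim at most `limit` partitions; return the ones actually won."""
    won = []
    for p in partitions:
        if len(won) >= limit:
            break
        if registry.claim(p, consumer_id):
            won.append(p)
    return won


if __name__ == "__main__":
    registry = OwnershipRegistry()
    partitions = list(range(6))
    print(claim_up_to(registry, "c1", partitions, 3))  # [0, 1, 2]
    print(claim_up_to(registry, "c2", partitions, 3))  # [3, 4, 5]
    print(registry.release_all("c1"))                  # [0, 1, 2]
    print(claim_up_to(registry, "c3", partitions, 3))  # [0, 1, 2]
```

When a consumer disappears, its claims are released and a newly started consumer can pick them up, which is the behaviour a scale-up or failover depends on.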
Re: Horizontally Scaling Kafka Consumers
On Thu, Apr 30, 2015 at 2:15 AM, Nimi Wariboko Jr n...@channelmeter.com wrote:

> My mistake, it seems the Java drivers are a lot more advanced than Shopify's Kafka driver (or I am missing something) - and I haven't used Kafka before. With the Go driver it seems you have to manage offsets and partitions within the application code, while with the Scala driver it seems you have the option of simply subscribing to a topic, and someone else will manage that part.
>
> After digging around a bit more, I found there is another library - https://github.com/wvanbergen/kafka - that speaks the consumergroup API and accomplishes what I was looking for; I assume it is implemented by keeping track of memberships with ZooKeeper.

Yes. That library is built on top of Sarama (Shopify's Go Kafka driver), and it's on our roadmap to integrate it properly. As far as I know, this is the only major area where Sarama is lagging behind the JVM client.

> Thank you for the information - it really helped clear up what I was failing to understand about Kafka.
>
> Nimi
Re: Horizontally Scaling Kafka Consumers
You first need to decide what conditions must be met before you scale to 50 consumers. These can be as simple as the consumer lag. Look at the console offset checker tool and see if any of those numbers make sense. Your existing consumers could also produce some metrics, based on which another process decides when to spawn new consumers.

-- Sharninder

On Wed, Apr 29, 2015 at 11:58 PM, Nimi Wariboko Jr n...@channelmeter.com wrote:

> Hi,
>
> I was wondering what options there are/what other people are doing for horizontally scaling Kafka consumers? Basically, if I have 100 partitions and 10 consumers, and want to temporarily scale up to 50 consumers, what can I do?
>
> So far I've thought of just simply tracking consumer membership somehow (either through ZooKeeper's ephemeral nodes or maybe using gossip) on the consumers to achieve consensus on who consumes what. Another option would be having a router, possibly using something like nsq (I understand that they are similar pieces of software, but what we are going for is a persistent distributed queue (sharding), which is why I'm looking into Kafka)?
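The lag-based scaling decision suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the offsets are made-up numbers that would in practice come from a tool such as the offset checker, and `per_consumer_capacity` (how much lag one consumer is expected to absorb) is a hypothetical tuning parameter, not anything Kafka defines.

```python
import math


def total_lag(log_end_offsets, committed_offsets):
    """Sum of (log end offset - committed offset) over all partitions."""
    lag = 0
    for partition, end in log_end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        lag += max(0, end - committed)
    return lag


def desired_consumers(lag, per_consumer_capacity, min_consumers, partition_count):
    """Consumers needed for the given lag, never below the floor and never
    above the partition count (extra consumers in a group would sit idle)."""
    needed = math.ceil(lag / per_consumer_capacity)
    return max(min_consumers, min(needed, partition_count))


if __name__ == "__main__":
    # Hypothetical snapshot: 100 partitions, each 500 messages behind.
    ends = {p: 10_000 for p in range(100)}
    committed = {p: 9_500 for p in range(100)}
    lag = total_lag(ends, committed)
    print(lag)                                   # 50000
    print(desired_consumers(lag, 1_000, 10, 100))  # 50
```

A separate process could evaluate this periodically and start or stop consumer instances accordingly; the cap at `partition_count` reflects the limit on active consumers per group discussed elsewhere in this thread.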
Re: Horizontally Scaling Kafka Consumers
My mistake, it seems the Java drivers are a lot more advanced than Shopify's Kafka driver (or I am missing something) - and I haven't used Kafka before. With the Go driver it seems you have to manage offsets and partitions within the application code, while with the Scala driver it seems you have the option of simply subscribing to a topic, and someone else will manage that part.

After digging around a bit more, I found there is another library - https://github.com/wvanbergen/kafka - that speaks the consumergroup API and accomplishes what I was looking for; I assume it is implemented by keeping track of memberships with ZooKeeper.

Thank you for the information - it really helped clear up what I was failing to understand about Kafka.

Nimi

On Wed, Apr 29, 2015 at 10:10 PM, Joe Stein joe.st...@stealth.ly wrote:

> You can do this with the existing Kafka Consumer https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/consumer/SimpleConsumer.scala#L106 and probably any other Kafka client too (maybe with minor/major rework to do the offset management).
>
> The new consumer approach is more transparent on Subscribing To Specific Partitions https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L200-L234 .
>
> Here is a Dockerfile (** pull request pending **) for wrapping Kafka consumers (it doesn't have to be the Go client; I need to abstract that out some more after more testing) https://github.com/stealthly/go_kafka_client/blob/mesos-marathon/consumers/Dockerfile
>
> Also a VM (** pull request pending **) to build the container, push it to a local Docker repository and launch it on Apache Mesos https://github.com/stealthly/go_kafka_client/tree/mesos-marathon/mesos/vagrant as a working example of how to do it.
>
> All of this could be done without the Docker container and still work on Mesos ... or even without Mesos and on YARN.
>
> You might also want to check out how Samza integrates with Execution Frameworks http://samza.apache.org/learn/documentation/0.9/comparisons/introduction.html which has a Mesos patch https://issues.apache.org/jira/browse/SAMZA-375 and built-in YARN support.
>
> ~ Joe Stein
> http://www.stealth.ly
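To make "managing offsets within the application code" concrete, here is a minimal sketch. It is not Sarama's or any real client's API: `OffsetStore` and `consume` are hypothetical names, the partition is modelled as a plain Python list, and the store is in memory where a real one would be ZooKeeper or Kafka.

```python
class OffsetStore:
    """In-memory stand-in for durable offset storage (hypothetical)."""

    def __init__(self):
        self._committed = {}  # (topic, partition) -> next offset to read

    def commit(self, topic, partition, next_offset):
        self._committed[(topic, partition)] = next_offset

    def fetch(self, topic, partition):
        # A partition with no committed offset starts from the beginning.
        return self._committed.get((topic, partition), 0)


def consume(log, store, topic, partition, handler, max_messages):
    """Process up to max_messages from `log`, committing after each message."""
    offset = store.fetch(topic, partition)
    processed = 0
    while offset < len(log) and processed < max_messages:
        handler(log[offset])
        offset += 1
        # Commit only after the handler returned, so a crash before this
        # line causes the message to be processed again on restart.
        store.commit(topic, partition, offset)
        processed += 1
    return processed


if __name__ == "__main__":
    log = ["a", "b", "c", "d", "e"]
    store = OffsetStore()
    seen = []
    consume(log, store, "events", 0, seen.append, 3)   # first run handles a, b, c
    consume(log, store, "events", 0, seen.append, 10)  # restart resumes at d
    print(seen)  # ['a', 'b', 'c', 'd', 'e']
```

Committing after processing gives at-least-once delivery; committing before processing would instead risk losing a message on a crash. A consumer-group library takes this bookkeeping, plus the partition assignment, off the application's hands.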
Re: Horizontally Scaling Kafka Consumers
If the 100 partitions are all for the same topic, you can have up to 100 consumers working as part of a single consumer group for that topic. You cannot have more consumers than there are partitions within a given consumer group.

On 29 April 2015 at 08:41, Nimi Wariboko Jr n...@channelmeter.com wrote:

> Hi,
>
> I was wondering what options there are for horizontally scaling Kafka consumers? Basically, if I have 100 partitions and 10 consumers, and want to temporarily scale up to 50 consumers, what options do I have?
>
> So far I've thought of just simply tracking consumer membership somehow (either through Raft or ZooKeeper's znodes) on the consumers.
Re: Horizontally Scaling Kafka Consumers
Please correct me if I'm wrong, but I think it is not really a hard constraint that one cannot have more consumers (from the same group) than partitions on a single topic. The surplus consumers will not be assigned any partition, but they can be there, and as soon as one active consumer from the same group goes offline (its connection to ZK is dropped), the consumers in the group will be rebalanced so that one passively waiting consumer becomes active.

Kind regards,
Stevo Slavic.

On Wed, Apr 29, 2015 at 2:25 PM, David Corley davidcor...@gmail.com wrote:

> If the 100 partitions are all for the same topic, you can have up to 100 consumers working as part of a single consumer group for that topic. You cannot have more consumers than there are partitions within a given consumer group.
Re: Horizontally Scaling Kafka Consumers
You're right Stevo, I should rephrase to say that there can be no more _active_ consumers than there are partitions (within a single consumer group). I'm guessing that's what Nimi is asking about, but perhaps he can elaborate on whether he's using consumer groups and/or whether the 100 partitions are all for a single topic or for multiple topics.

On 29 April 2015 at 13:38, Stevo Slavić ssla...@gmail.com wrote:

> Please correct me if I'm wrong, but I think it is not really a hard constraint that one cannot have more consumers (from the same group) than partitions on a single topic. The surplus consumers will not be assigned any partition, but they can be there, and as soon as one active consumer from the same group goes offline (its connection to ZK is dropped), the consumers in the group will be rebalanced so that one passively waiting consumer becomes active.
>
> Kind regards,
> Stevo Slavic.
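The active-versus-surplus distinction can be illustrated with a small simulation. This is a simplified, range-style assignment written for this example; the function name `assign` and the consumer ids are hypothetical, and Kafka's real assignment logic differs in detail.

```python
def assign(partitions, consumers):
    """Spread partitions over sorted consumers in contiguous ranges.

    Consumers beyond the partition count receive an empty list, i.e. they
    are group members on standby rather than active consumers.
    """
    members = sorted(consumers)
    result = {c: [] for c in members}
    if not members:
        return result
    parts = sorted(partitions)
    base, extra = divmod(len(parts), len(members))
    start = 0
    for i, c in enumerate(members):
        size = base + (1 if i < extra else 0)
        result[c] = parts[start:start + size]
        start += size
    return result


if __name__ == "__main__":
    # 3 partitions, 5 consumers: c3 and c4 are idle.
    print(assign(range(3), ["c0", "c1", "c2", "c3", "c4"]))
    # c1 goes offline; after the rebalance c3 becomes active.
    print(assign(range(3), ["c0", "c2", "c3", "c4"]))
    # The scenario from the question: 100 partitions over 10, then 50 consumers.
    ten = assign(range(100), [f"c{i:03d}" for i in range(10)])
    fifty = assign(range(100), [f"c{i:03d}" for i in range(50)])
    print(len(ten["c000"]), len(fifty["c000"]))  # 10 2
```

Scaling from 10 to 50 consumers therefore changes each consumer's share from 10 partitions to 2, and anything beyond 100 consumers would only add standby capacity.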
Re: Horizontally Scaling Kafka Consumers
You can do this with the existing Kafka Consumer https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/consumer/SimpleConsumer.scala#L106 and probably any other Kafka client too (maybe with minor/major rework to do the offset management).

The new consumer approach is more transparent on Subscribing To Specific Partitions https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L200-L234 .

Here is a Dockerfile (** pull request pending **) for wrapping Kafka consumers (it doesn't have to be the Go client; I need to abstract that out some more after more testing) https://github.com/stealthly/go_kafka_client/blob/mesos-marathon/consumers/Dockerfile

Also a VM (** pull request pending **) to build the container, push it to a local Docker repository and launch it on Apache Mesos https://github.com/stealthly/go_kafka_client/tree/mesos-marathon/mesos/vagrant as a working example of how to do it.

All of this could be done without the Docker container and still work on Mesos ... or even without Mesos and on YARN.

You might also want to check out how Samza integrates with Execution Frameworks http://samza.apache.org/learn/documentation/0.9/comparisons/introduction.html which has a Mesos patch https://issues.apache.org/jira/browse/SAMZA-375 and built-in YARN support.

~ Joe Stein
- - - - - - - - - - - - - - - - -
http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Wed, Apr 29, 2015 at 8:56 AM, David Corley davidcor...@gmail.com wrote:

> You're right Stevo, I should rephrase to say that there can be no more _active_ consumers than there are partitions (within a single consumer group). I'm guessing that's what Nimi is asking about, but perhaps he can elaborate on whether he's using consumer groups and/or whether the 100 partitions are all for a single topic or for multiple topics.
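When consumers subscribe to specific partitions and are launched by a scheduler, something has to tell each instance which partitions are its own. One simple possibility is sketched below, assuming each instance is given its index and the total instance count at start-up. How those two values reach the process (environment variables, command-line flags) is left open here, and `partitions_for_instance` is a hypothetical helper, not part of any Kafka client.

```python
def partitions_for_instance(instance_index, instance_count, partition_count):
    """Partitions a given instance should read under modulo placement."""
    if not 0 <= instance_index < instance_count:
        raise ValueError("instance_index must be in [0, instance_count)")
    return [p for p in range(partition_count)
            if p % instance_count == instance_index]


if __name__ == "__main__":
    # 100 partitions over 10 instances: instance 0 reads every tenth partition.
    print(partitions_for_instance(0, 10, 100))
    # After scaling to 50 instances, instance 0 keeps only two partitions.
    print(partitions_for_instance(0, 50, 100))  # [0, 50]
```

The trade-off of static placement is that there is no automatic failover: if an instance dies, its partitions go unread until the scheduler restarts it, whereas a consumer group rebalances on its own.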