Re: Horizontally Scaling Kafka Consumers

2015-04-30 Thread Joe Stein
The Go Kafka Client also supports offset storage in ZK and Kafka
https://github.com/stealthly/go_kafka_client/blob/master/docs/offset_storage.md
and has two other strategies for partition ownership with a consensus
server (it currently uses Zookeeper; Consul support will be implemented in
the near future).

~ Joestein

On Thu, Apr 30, 2015 at 2:15 AM, Nimi Wariboko Jr n...@channelmeter.com
wrote:

 My mistake, it seems the Java drivers are a lot more advanced than
 Shopify's Kafka driver (or I am missing something) - and I haven't used
 Kafka before.

 With the Go driver, it seems you have to manage offsets and partitions
 within the application code, while with the Scala driver you have the
 option of simply subscribing to a topic, and someone else will manage that
 part.

 After digging around a bit more, I found there is another library -
 https://github.com/wvanbergen/kafka - that speaks the consumergroup API and
 accomplishes what I was looking for; I assume it is implemented by keeping
 track of memberships w/ Zookeeper.

 Thank you for the information - it really helped clear up what I was
 failing to understand with Kafka.

 Nimi



Re: Horizontally Scaling Kafka Consumers

2015-04-30 Thread Evan Huus
On Thu, Apr 30, 2015 at 2:15 AM, Nimi Wariboko Jr n...@channelmeter.com
wrote:

 My mistake, it seems the Java drivers are a lot more advanced than
 Shopify's Kafka driver (or I am missing something) - and I haven't used
 Kafka before.

 With the Go driver, it seems you have to manage offsets and partitions
 within the application code, while with the Scala driver you have the
 option of simply subscribing to a topic, and someone else will manage that
 part.

 After digging around a bit more, I found there is another library -
 https://github.com/wvanbergen/kafka - that speaks the consumergroup API and
 accomplishes what I was looking for; I assume it is implemented by keeping
 track of memberships w/ Zookeeper.


Yes. That library is built on top of Sarama (Shopify's Go Kafka driver),
and it's on our roadmap to integrate it properly. As far as I know, this is
the only major area where Sarama is lagging behind the JVM client.



 Thank you for the information - it really helped clear up what I was
 failing to understand with Kafka.

 Nimi



Re: Horizontally Scaling Kafka Consumers

2015-04-30 Thread Sharninder
You need to first decide the conditions that need to be met for you to
scale to 50 consumers. These can be as simple as the consumer lag. Look at
the console offset checker tool and see if any of those numbers make sense.
Your existing consumers could also produce some metrics based on which
another process will decide when to spawn new consumers.
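
As a rough illustration of the lag-based trigger described above, here is a
minimal Python sketch. The offset numbers, the lag_per_consumer threshold and
the 10/50 bounds are made up for the example; in practice the offsets would
come from the offset checker tool or from your own metrics, and the upper
bound should not exceed the partition count.

```python
import math


def total_lag(log_end_offsets, committed_offsets):
    """Sum of (log end offset - committed offset) over all partitions."""
    return sum(
        max(0, end - committed_offsets.get(partition, 0))
        for partition, end in log_end_offsets.items()
    )


def desired_consumers(lag, lag_per_consumer, min_consumers, max_consumers):
    """Consumer count that keeps the backlog per consumer under the threshold,
    clamped to the [min_consumers, max_consumers] range."""
    wanted = math.ceil(lag / lag_per_consumer)
    return max(min_consumers, min(max_consumers, wanted))


# Hypothetical offsets for three partitions (partition id -> offset).
log_end = {0: 1500, 1: 1200, 2: 900}
committed = {0: 1000, 1: 1100, 2: 900}

lag = total_lag(log_end, committed)
print(lag, desired_consumers(lag, lag_per_consumer=50,
                             min_consumers=10, max_consumers=50))
```

A separate process could run something like this periodically and start or
stop consumer instances when the desired count changes.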

--
Sharninder


On Wed, Apr 29, 2015 at 11:58 PM, Nimi Wariboko Jr n...@channelmeter.com
wrote:

 Hi,

 I was wondering what options there are/what other people are doing for
 horizontally scaling kafka consumers? Basically if I have 100 partitions
 and 10 consumers, and want to temporarily scale up to 50 consumers, what
 can I do?

 So far I've thought of just simply tracking consumer membership somehow
 (either through zookeeper's ephemeral nodes or maybe using gossip) on the
 consumers to achieve consensus on who consumes what. Another option would
 be having a router, possibly using something like nsq (I understand that
 they are similar pieces of software, but what we are going for is a
 persistent distributed queue (sharding) which is why I'm looking into
 Kafka)?






Re: Horizontally Scaling Kafka Consumers

2015-04-30 Thread Nimi Wariboko Jr
My mistake, it seems the Java drivers are a lot more advanced than
Shopify's Kafka driver (or I am missing something) - and I haven't used
Kafka before.

With the Go driver, it seems you have to manage offsets and partitions
within the application code, while with the Scala driver you have the
option of simply subscribing to a topic, and someone else will manage that
part.
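
To make the difference concrete, here is a minimal Python sketch of what
"managing offsets in the application" amounts to. It uses in-memory lists and
a dict as stand-ins for partitions and offset storage; none of the names below
come from Sarama or any other Kafka client.

```python
def consume(partition_log, offsets, partition, handle, batch=2):
    """Process up to `batch` messages from one partition, then record
    the next offset to read so a later call can resume from there."""
    start = offsets.get(partition, 0)
    for message in partition_log[start:start + batch]:
        handle(message)
    offsets[partition] = min(start + batch, len(partition_log))
    return offsets[partition]


# Stand-in for a topic: partition id -> ordered messages.
logs = {0: ["a", "b", "c"], 1: ["x", "y"]}
# Stand-in for offset storage (ZK or Kafka in a real deployment).
offsets = {}
seen = []

consume(logs[0], offsets, 0, seen.append)  # reads "a" and "b"
consume(logs[0], offsets, 0, seen.append)  # resumes at offset 2, reads "c"
print(seen, offsets)
```

With a group-aware client, both the choice of partition and the offset
bookkeeping shown here would be handled for you.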

After digging around a bit more, I found there is another library -
https://github.com/wvanbergen/kafka - that speaks the consumergroup API and
accomplishes what I was looking for; I assume it is implemented by keeping
track of memberships w/ Zookeeper.

Thank you for the information - it really helped clear up what I was
failing to understand with Kafka.

Nimi

On Wed, Apr 29, 2015 at 10:10 PM, Joe Stein joe.st...@stealth.ly wrote:

 You can do this with the existing Kafka Consumer

 https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/consumer/SimpleConsumer.scala#L106
 and probably any other Kafka client too (maybe with minor/major rework
 to do the offset management).

 The new consumer approach is more transparent on Subscribing To Specific
 Partitions

 https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L200-L234
 .

 Here is a Dockerfile (** pull request pending **) for wrapping Kafka
 consumers (doesn't have to be the Go client; need to abstract that out some
 more after more testing)

 https://github.com/stealthly/go_kafka_client/blob/mesos-marathon/consumers/Dockerfile


 Also a VM (** pull request pending **) to build the container, push it to a
 local Docker repository and launch it on Apache Mesos

 https://github.com/stealthly/go_kafka_client/tree/mesos-marathon/mesos/vagrant
 as a working example of how to do it.

 All of this could be done without the Docker container and still work on
 Mesos ... or even without Mesos and on YARN.

 You might also want to check out how Samza integrates with Execution
 Frameworks

 http://samza.apache.org/learn/documentation/0.9/comparisons/introduction.html
 which has a Mesos patch https://issues.apache.org/jira/browse/SAMZA-375 and
 built-in YARN support.

 ~ Joe Stein
 - - - - - - - - - - - - - - - - -

   http://www.stealth.ly
 - - - - - - - - - - - - - - - - -




Re: Horizontally Scaling Kafka Consumers

2015-04-29 Thread David Corley
If the 100 partitions are all for the same topic, you can have up to 100
consumers working as part of a single consumer group for that topic.
You cannot have more consumers than there are partitions within a given
consumer group.
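
For the numbers in the question, the per-consumer share works out as follows.
This is just arithmetic in a hypothetical helper, assuming partitions are
spread as evenly as possible; it is not how any particular client computes
its assignment.

```python
def partitions_per_consumer(num_partitions, num_consumers):
    """Number of partitions each consumer would own under an even split.
    The first `extra` consumers take one additional partition."""
    base, extra = divmod(num_partitions, num_consumers)
    return [base + 1 if i < extra else base for i in range(num_consumers)]


for consumers in (10, 30, 50):
    counts = partitions_per_consumer(100, consumers)
    print(consumers, "consumers ->", min(counts), "to", max(counts),
          "partitions each")
```

So going from 10 to 50 consumers on 100 partitions takes each consumer from
10 partitions down to 2.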

On 29 April 2015 at 08:41, Nimi Wariboko Jr n...@channelmeter.com wrote:

 Hi,

 I was wondering what options there are for horizontally scaling kafka
 consumers? Basically if I have 100 partitions and 10 consumers, and want to
 temporarily scale up to 50 consumers, what options do I have?

 So far I've thought of just simply tracking consumer membership somehow
 (either through Raft or zookeeper's znodes) on the consumers.



Re: Horizontally Scaling Kafka Consumers

2015-04-29 Thread Stevo Slavić
Please correct me if I'm wrong, but I think it is not really a hard
constraint that one cannot have more consumers (from the same group) than
partitions on a single topic. The surplus consumers will not be assigned
any partition, but they can be there, and as soon as one active consumer
from the same group goes offline (its connection to ZK is dropped), the
consumers in the group will be rebalanced so that one passively waiting
consumer becomes active.
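
A toy simulation of the behaviour described above, in Python. The `assign`
function is a simplified round-robin over sorted member names, which is an
assumption made for illustration and not Kafka's actual rebalance algorithm;
the consumer names are hypothetical.

```python
def assign(partitions, members):
    """Hand out partitions round-robin over the sorted member list.
    With more members than partitions, the surplus members own nothing."""
    ordered = sorted(members)
    owned = {member: [] for member in ordered}
    for i, partition in enumerate(sorted(partitions)):
        owned[ordered[i % len(ordered)]].append(partition)
    return owned


partitions = [0, 1, 2]
group = {"c1", "c2", "c3", "c4"}   # four consumers, three partitions

before = assign(partitions, group)
print(before)                       # c4 is idle: it owns no partition

group.discard("c2")                 # c2 goes offline
after = assign(partitions, group)
print(after)                        # c4 now owns a partition
```

The idle consumer costs little while waiting and acts as a warm standby.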

Kind regards,
Stevo Slavic.

On Wed, Apr 29, 2015 at 2:25 PM, David Corley davidcor...@gmail.com wrote:

 If the 100 partitions are all for the same topic, you can have up to 100
 consumers working as part of a single consumer group for that topic.
 You cannot have more consumers than there are partitions within a given
 consumer group.




Re: Horizontally Scaling Kafka Consumers

2015-04-29 Thread David Corley
You're right Stevo, I should re-phrase to say that there can be no more
_active_ consumers than there are partitions (within a single consumer
group).
I'm guessing that's what Nimi is asking about, but perhaps he can
elaborate on whether he's using consumer groups and/or whether the 100
partitions are all for a single topic, or multiple topics.

On 29 April 2015 at 13:38, Stevo Slavić ssla...@gmail.com wrote:

 Please correct me if I'm wrong, but I think it is not really a hard
 constraint that one cannot have more consumers (from the same group) than
 partitions on a single topic. The surplus consumers will not be assigned
 any partition, but they can be there, and as soon as one active consumer
 from the same group goes offline (its connection to ZK is dropped), the
 consumers in the group will be rebalanced so that one passively waiting
 consumer becomes active.

 Kind regards,
 Stevo Slavic.




Re: Horizontally Scaling Kafka Consumers

2015-04-29 Thread Joe Stein
You can do this with the existing Kafka Consumer
https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/consumer/SimpleConsumer.scala#L106
and probably any other Kafka client too (maybe with minor/major rework
to do the offset management).

The new consumer approach is more transparent on Subscribing To Specific
Partitions
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L200-L234
.

Here is a Dockerfile (** pull request pending **) for wrapping Kafka
consumers (doesn't have to be the Go client; need to abstract that out some
more after more testing)
https://github.com/stealthly/go_kafka_client/blob/mesos-marathon/consumers/Dockerfile


Also a VM (** pull request pending **) to build the container, push it to a
local Docker repository and launch it on Apache Mesos
https://github.com/stealthly/go_kafka_client/tree/mesos-marathon/mesos/vagrant
as a working example of how to do it.

All of this could be done without the Docker container and still work on
Mesos ... or even without Mesos and on YARN.

You might also want to check out how Samza integrates with Execution
Frameworks
http://samza.apache.org/learn/documentation/0.9/comparisons/introduction.html
which has a Mesos patch https://issues.apache.org/jira/browse/SAMZA-375 and
built-in YARN support.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Wed, Apr 29, 2015 at 8:56 AM, David Corley davidcor...@gmail.com wrote:

 You're right Stevo, I should re-phrase to say that there can be no more
 _active_ consumers than there are partitions (within a single consumer
 group).
 I'm guessing that's what Nimi is asking about, but perhaps he can
 elaborate on whether he's using consumer groups and/or whether the 100
 partitions are all for a single topic, or multiple topics.
