Thanks Joe!

Yeah, the new Kafka Consumer really hits the sweet spot with the option to 
manually manage which partitions a consumer subscribes to, without having to 
pay the rebalancing cost when the consumer processes scale horizontally. And 
all this without the headaches of implementing a SimpleConsumer. Awesome 
indeed!

Here is a summary of the use case I have in mind. It is pretty typical for any 
internet event ingestion scenario with low-latency, high-throughput 
requirements:

  *   Kafka used as a durable shock-absorber. (Brokers will scale elastically)
  *   Consumers should be able to scale horizontally based on a somewhat 
unpredictable load.
  *   Consumers process messages from Kafka, raise new events, and persist a 
(large) subset of processed messages to Cassandra.
  *   Consumers have an in-memory cache of metadata related to the aggregates 
the messages belong to. Populating the in-memory cache is fairly expensive. 
Semantic partitioning is done on the producer side, so one consumer always gets 
the messages for the same aggregate (given no rebalancing).
  *   On scale out/in, the consumer partition assignment strategy should move 
as few partitions as possible to limit the cost of re-populating the in-memory 
cache for those partitions. (Only the "range" and "roundrobin" assignment 
strategies are possible in Kafka today, but it looks like there are plans to 
support user-defined strategies.)
  *   The unaffected consumers should continue reading messages to keep the 
overall latency as low as possible.
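
To make the "move as few partitions as possible" requirement concrete, here is 
a minimal sketch in plain Python. It is not Kafka code and not an existing 
assignment strategy; the function name rebalance_minimal and the dict-based 
representation of the assignment are assumptions made up for illustration. 
Consumers that survive a scale out/in keep what they already own, up to a 
balanced quota, and only the leftover partitions change hands.

```python
def rebalance_minimal(current, consumers):
    """Return a new {partition: consumer} map, moving few partitions.

    current   -- existing {partition: consumer} map (may name dead consumers)
    consumers -- consumers that are alive after the scale out/in
    (Hypothetical helper for illustration; not part of Kafka.)
    """
    live = sorted(set(consumers))
    if not live:
        raise ValueError("at least one live consumer is required")

    # Split partitions into those still owned by a live consumer and orphans.
    owned = {c: [] for c in live}
    pool = []
    for p in sorted(current):
        owner = current[p]
        if owner in owned:
            owned[owner].append(p)
        else:
            pool.append(p)

    # Balanced quotas: everyone gets `base`, and `extra` consumers get one more.
    # The larger quota goes to consumers that already own the most partitions.
    base, extra = divmod(len(current), len(live))
    by_load = sorted(live, key=lambda c: -len(owned[c]))
    quota = {c: base + (1 if i < extra else 0) for i, c in enumerate(by_load)}

    # Keep up to the quota on each consumer; the surplus goes back to the pool.
    kept = {}
    for c in live:
        kept[c] = owned[c][:quota[c]]
        pool.extend(owned[c][quota[c]:])

    # Consumers below their quota take partitions from the pool.
    for c in live:
        while len(kept[c]) < quota[c]:
            kept[c].append(pool.pop())

    return {p: c for c, parts in kept.items() for p in parts}


before = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
after = rebalance_minimal(before, ["a", "b", "c"])  # consumer "c" joins
moved = [p for p in before if before[p] != after[p]]
print("moved partitions:", sorted(moved))
```

Only the consumers that gain or lose a partition would need to touch their 
caches; the others could keep reading without interruption.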

Letting Kafka handle the consumer rebalancing by itself cannot solve this 
use case, as it requires knowledge of:

  *   Consumer location relative to the other systems it depends on for reads 
and writes. That is, the consumer partition assignment should be location aware 
to reduce latency and increase throughput. (E.g. same rack as the related 
Cassandra shard, Postgres shard, etc.)
  *   That every partition is served by exactly one consumer at any given time, 
without depending on the Kafka consumer rebalancing protocol, as the partition 
ownership will now be managed outside Kafka.
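
As a rough illustration of the two points above, the sketch below assigns each 
partition to exactly one consumer and prefers a consumer in the same rack as 
the data the partition depends on. It is plain Python, not Kafka or Mesos 
code; the function name assign_by_rack, the rack labels, and the two input 
maps are assumptions invented for this example.

```python
def assign_by_rack(partition_rack, consumer_rack):
    """Return {partition: consumer}, preferring consumers in the same rack.

    partition_rack -- {partition: rack of the shard it mostly talks to}
    consumer_rack  -- {consumer: rack the consumer runs in}
    (Hypothetical inputs; a real coordinator would get them from the
    resource manager and the storage topology.)
    """
    if not consumer_rack:
        raise ValueError("at least one consumer is required")

    load = {c: 0 for c in consumer_rack}
    assignment = {}
    for p in sorted(partition_rack):
        # Prefer consumers co-located with the partition's downstream shard.
        local = [c for c in consumer_rack if consumer_rack[c] == partition_rack[p]]
        # Fall back to any consumer if the rack has none running.
        candidates = local or list(consumer_rack)
        # Pick the least-loaded candidate; the name breaks ties deterministically.
        chosen = min(candidates, key=lambda c: (load[c], c))
        assignment[p] = chosen  # a dict key has one value: one owner per partition
        load[chosen] += 1
    return assignment


partitions = {0: "r1", 1: "r1", 2: "r2", 3: "r3"}
consumers = {"c1": "r1", "c2": "r1", "c3": "r2"}
print(assign_by_rack(partitions, consumers))
```

Because the result is a single partition-to-consumer map, the "exactly one 
owner" property holds by construction, without relying on the Kafka 
rebalancing protocol.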

In short, what is needed is a Kafka ConsumerCoordinator running inside a cloud 
resource management system, such as Mesos.


From: Joe Stein <[email protected]>
Reply-To: [email protected]
Date: Thursday, 11 June 2015 15:26
To: [email protected]
Subject: Re: Mesos-Kafka, any plans to support Consumers?

Hello Olof, the new consumer on trunk has a feature for subscribing to a 
specific partition 
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L200
 that is useful when launching on Mesos. In this case the "rebalancing" doesn't 
happen since every instance is pinned to a partition: if you have 100 
partitions you would have 100 instances running, or fewer with some basic 
business logic (e.g. 10 instances that each own 10 partitions). This is great 
because if you lose 1 instance, the other instances don't have to rebalance; 
they just keep running. Awesome!
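
The "10 instances each own 10 partitions" idea can be sketched as a fixed 
mapping from an instance index to the partitions it owns. This is plain 
Python for illustration only; the function name owned_partitions and the 
striped layout are assumptions, and a real consumer would pass the resulting 
partition numbers to the consumer's manual partition subscription.

```python
def owned_partitions(instance, num_instances, num_partitions):
    """Return the partition numbers a given instance owns (striped layout).

    instance       -- zero-based index of this consumer instance
    num_instances  -- total number of consumer instances
    num_partitions -- total number of partitions in the topic
    (Hypothetical helper; the layout is one of many possible choices.)
    """
    if not 0 <= instance < num_instances:
        raise ValueError("instance index out of range")
    # Instance i owns partitions i, i + n, i + 2n, ... so no two instances overlap.
    return list(range(instance, num_partitions, num_instances))


# 100 partitions spread over 10 instances: each instance owns 10 partitions.
for i in range(10):
    print(i, owned_partitions(i, 10, 100))
```

Since the mapping depends only on the instance index, losing one instance 
leaves every other instance's partition list unchanged.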

Having mesos/kafka also provide the ability to launch consumers is an 
interesting idea. Since each consumer is a custom application, things like 
configuration management would be specific and different, so there are a lot of 
nuances to consider. I am not saying it can't be done, but we need to make sure 
the implementation is useful and generic enough for this. Perhaps we could 
provide the ability to supply the tgz for folks' custom consumers so we can 
launch them (via the kafka scheduler) for you. I think, though, that to do this 
there would be a lot more than just scheduler changes: we would likely have to 
write a new executor and some API so your consumer can get data from the 
executor, or something. We can noodle on it.

It would be great to hear more about what use cases you (or anyone else) have 
in mind so we can see how it might work based on other implementations we work 
on, see, and know about. What we do now on Mesos (for producers and consumers) 
is either run them as custom frameworks (because they do special things, e.g. 
Storm, Spark) or run them via Marathon.


~ Joe Stein
- - - - - - - - - - - - - - - - -
  
  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Jun 11, 2015 at 12:54 PM, Johansson, Olof <[email protected]> wrote:
Hi,

I just started to look into the New Kafka Consumer, where it's easier to assign 
TopicPartitions to individual consumer instances. Ideally we would like the 
assignment of TopicPartitions to be "smart". That is, be able to apply 
scheduling, operations, and constraints similar to those implemented by 
Mesos-Kafka today when it manages Kafka Brokers, and be able to assign 
partitions based on the state of the consumers and consumer groups in the 
cluster.

Are there any plans to extend mesos-kafka to manage Consumers and not only 
Brokers? If so, have the goals of such an effort been discussed?

(I understand that due to what's currently available in Kafka it's not easy to 
do. But it doesn't hurt to ask. :-) )

Thanks!
Olof
