Thanks Joe! Yeah, the new Kafka Consumer really hits the sweet spot with the option to perform manual management of which partitions a consumer should subscribe to without having to pay the rebalancing cost when the consumer processes scale horizontally. And all this without the head-aches of implementing a Simple consumer. Awesome indeed!
A summary of the use-case I have in mind is pretty typical for any internet event ingestion scenario with low latency, high throughput requirements: * Kafka used as a durable shock-absorber. (Brokers will scale elastically) * Consumers should be able to scale horizontally based on a somewhat un-predictable load. * Consumers process messages from Kafka, raises new events, and persist a (large) subset of processed messages to Cassandra. * Consumers have in-memory cache of meta-data related to the aggregates the messages are related to. Populating the in-memory cache is fairly expensive. Semantic partitioning is done on the producer side so one consumer always gets messages for the same aggregate (given no rebalancing). * On scale out/in, the consumer partition assignment strategy should move as few partitions as possible to limit the cost of re-populating the in-memory cache for those partitions. (Only "range" and "roundrobin" assignment strategies are possible in Kafka today, but looks like they plan to add the possibility to add user defined strategies.) * The non-affected consumers should still continue reading messages to keep the overall latency as low as possible. By letting Kafka handle the consumer rebalancing itself can not solve this use-case as it requires knowledge of: * Consumer location relative to other systems it depends on for read and writes. That is, the consumer partition assignment should be location aware to reduce latency and increase throughput. (E.g. same rack as the related Cassandra shard, Postgres shard, etc.) * That all partitions are served by one consumer at any given time, without depending on the Kafka consumer rebalancing protocol as the partition ownership will now be managed outside. In short, a Kafka ConsumerCoordinator running inside a cloud resource management system, such as Mesos. From: Joe Stein <[email protected]<mailto:[email protected]>> Reply-To: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>> Date: Thursday, 11 June 2015 15:26 To: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>> Subject: Re: Mesos-Kafka, any plans to support Consumers? Hello Olof, the new consumer on trunk has a feature for subscribing to a specific partition https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L200 for use by launching on Mesos. In this case the "rebalancing" doesn't happen since every instance is on a partition (so if you have 100 partitions you would have 100 instances running (or less with some basic business logic (e.g. 10 instances each would own 10 partitions, etc). This is great because if you lose 1 instance EVERY instance doesn't have to rebalance and they just keep running. Awesome! Making mesos/kafka provide the ability for consumers to launch also is an interesting idea. Since each consumer is a custom application things like configuration management would be specific and different so there are a lot of nuances to consider. I am not saying it can't be done but we need to make sure the implementation is useful and generic enough for this. Perhaps we could provide the ability to supply the tgz for folks custom consumer so we can launch it (via the kafka scheduler) for you. I think though to-do this there would be a lot more than just scheduler changes we would likely have to right a new executor and some API so your consumer can get data from the executor or something. We can noodle on it. It would be great to hear more about what uses cases you (or anyone else) have in mind so we can see how it might work based on other implementations we work, see and know about. For what we do now on Mesos (for producers and consumers) is either run them as custom frameworks (because they do special things (e.g. Storm, Spark)) or via Marathon. ~ Joe Stein - - - - - - - - - - - - - - - - - [https://docs.google.com/uc?export=download&id=0B3rS2kftp470b19EQXp0Q2JheVE&revid=0B3rS2kftp470aFhGdzZqMnUwT3M0MTlsZU8zZjZobGFuNFdrPQ] http://www.stealth.ly - - - - - - - - - - - - - - - - - On Thu, Jun 11, 2015 at 12:54 PM, Johansson, Olof <[email protected]<mailto:[email protected]>> wrote: Hi, I just started to look into the New Kafka Consumer where it's easier to assign TopicPartitions to individual consumer instances. Ideally we would like the assignment of TopicPartitions to be "smart". That is, be able to apply similar scheduling, operations, and constraints as is implemented by Mesos-Kafka today when it manages Kafka Brokers, and be able to assign partitions based on the state of the consumers and consumer groups in the cluster. Are there any plans to extend mesos-kafka to manage Consumers and not only Brokers? If so, have the goals of such an effort been discussed? (I understand that due to what's currently available in Kafka it's not easy to do. But it doesn't hurt to ask. :-) ) Thanks! Olof

