Re: Consistency and Availability on Node Failures

2014-10-16 Thread Gwen Shapira
Just note that this is not a universal solution. Many use-cases care about which partition you end up writing to since partitions are used to... well, partition logical entities such as customers and users. On Wed, Oct 15, 2014 at 9:03 PM, Jun Rao jun...@gmail.com wrote: Kyle, What you

Re: Consistency and Availability on Node Failures

2014-10-16 Thread Kyle Banker
I didn't realize that anyone used partitions to logically divide a topic. When would that be preferable to simply having a separate topic? Isn't this a minority case? On Thu, Oct 16, 2014 at 7:28 AM, Gwen Shapira gshap...@cloudera.com wrote: Just note that this is not a universal solution.

Re: Consistency and Availability on Node Failures

2014-10-16 Thread gshapira
It may be a minority, I can't tell yet. But in some apps we need to know that a consumer, who is assigned a single partition, will get all data about a subset of users. This is way more flexible than multiple topics since we still have the benefits of partition reassignment, load balancing

Re: Consistency and Availability on Node Failures

2014-10-16 Thread cac...@gmail.com
Knowing that the partitioning is consistent for a given key means that (apart from other benefits) a given consumer only deals with a partition of the keyspace. So if you are in a system with tens of millions of users each consumer only has to store state on a small number of them with

Re: Consistency and Availability on Node Failures

2014-10-15 Thread Kyle Banker
As I understand it, regardless of whether the partitioning strategy is random or hashed, a producer will eventually try to write a message to one of the partitions that is unavailable (as defined by our acks=-1 and min.isr=2 settings). I suppose that the random partitioner, with enough retries,

Re: Consistency and Availability on Node Failures

2014-10-15 Thread Jun Rao
Kyle, What you wanted is not supported out of box. You can achieve this using the new java producer. The new java producer allows you to pick an arbitrary partition when sending a message. If you receive NotEnoughReplicasException when sending a message, you can resend it to another partition.

Consistency and Availability on Node Failures

2014-10-14 Thread Kyle Banker
Consider a 12-node Kafka cluster with a 200-parition topic having a replication factor of 3. Let's assume, in addition, that we're running Kafka v0.8.2, we've disabled unclean leader election, acks is -1, and min.isr is 2. Now suppose we lose 2 nodes. In this case, there's a good chance that 2/3

Re: Consistency and Availability on Node Failures

2014-10-14 Thread cac...@gmail.com
Wouldn't this work only for producers using random partitioning? On Tue, Oct 14, 2014 at 1:51 PM, Kyle Banker kyleban...@gmail.com wrote: Consider a 12-node Kafka cluster with a 200-parition topic having a replication factor of 3. Let's assume, in addition, that we're running Kafka v0.8.2,