Re: Kafka partition key

2015-03-26 Thread Milinda Pathirage
Hi Shekar, Please refer to [1]. You can set a custom partitioner through the producer config. You will have to implement your own partitioner based on your application and partitioning strategy. Thanks Milinda [1] https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example On Thu,

Re: Kafka partition key

2015-03-26 Thread Shekar Tippur
So if I have a feed with {user_id:12345, ethnicity: asian, location: "cerritos, ca", Height:"5.9", weight: "150 lbs"} I am referring to https://kafka.apache.org/081/ops.html#topic-config How do I map the 3 columns (user_id, ethnicity, and location) to a partition id? If I map it this way and s

Re: Kafka partition key

2015-03-26 Thread Roger Hoover
Hi Richard, You can also partition by a key like "user_id" so that all messages for a given user would end up in the same partition. This can be useful for calculating user-specific aggregations or doing a distributed join where the local state is also partitioned on user_id. Cheers, Roger On
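
A minimal sketch of the idea above, in plain Python with no Kafka client involved. The names partition_for and aggregate_by_partition are hypothetical, and zlib.crc32 is only a stand-in for whatever hash the real partitioner uses. It shows that keying on user_id sends every event for a user to the same partition, so a per-user count can be kept in partition-local state:

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition number with a stable hash.

    Assumption: crc32 stands in for the partitioner's hash function.
    It is deterministic across processes, unlike Python's hash() for str.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def aggregate_by_partition(events, num_partitions):
    """Count events per user, keeping the counts in per-partition state."""
    # partition number -> user_id -> number of events seen
    state = defaultdict(lambda: defaultdict(int))
    for event in events:
        p = partition_for(event["user_id"], num_partitions)
        state[p][event["user_id"]] += 1
    return state


# Hypothetical events; only user_id is used as the partition key.
events = [
    {"user_id": 12345, "action": "login"},
    {"user_id": 67890, "action": "login"},
    {"user_id": 12345, "action": "purchase"},
]
state = aggregate_by_partition(events, num_partitions=4)
```

Because the partition depends only on user_id, both events for user 12345 are counted in the same partition's state, which is what makes the local aggregation or join possible.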

Re: Kafka partition key

2015-03-26 Thread Richard Lee
Is there a typo below? Are all of these actually in the same topic, just different partitions? Partitioning, AFAIK, is mainly done for parallelism & throughput reasons. What is the reason for partitioning your dataset by ‘columns’? https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Ho

Re: Kafka partition key

2015-03-26 Thread Dotan Patrich
Hi Shekar, Each Kafka partition is basically just a number; you would need to specify what partitioner strategy to use when mapping your event key to the partition number. You can take the 4 columns you have in the event and map them to a partition number; the partitioner in that case would be a func
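
One way such a mapping could look, sketched in plain Python rather than against the Kafka producer API. The helper names composite_key and partition_for_event, the "|" separator, and the use of zlib.crc32 are all assumptions made for illustration. The sample event is the feed quoted earlier in this thread, keyed on user_id, ethnicity, and location:

```python
import zlib


def composite_key(event, columns):
    """Join the chosen column values into one key string.

    Assumption: "|" does not occur in the values; pick another
    separator or an escaping scheme if it can.
    """
    return "|".join(str(event[c]) for c in columns)


def partition_for_event(event, columns, num_partitions):
    """Map the selected columns of an event to a partition number."""
    key = composite_key(event, columns)
    # crc32 is a stand-in for the partitioner's hash; it is stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


event = {
    "user_id": 12345,
    "ethnicity": "asian",
    "location": "cerritos, ca",
    "Height": "5.9",
    "weight": "150 lbs",
}
key_columns = ["user_id", "ethnicity", "location"]
partition = partition_for_event(event, key_columns, num_partitions=8)
```

Columns left out of key_columns (Height and weight here) do not affect the result, so two events that agree on the key columns always land in the same partition.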

Kafka partition key

2015-03-26 Thread Shekar Tippur
Hello, Want to confirm a basic understanding of Kafka. If I have a dataset that needs to be partitioned by 4 columns, then the progression is {topic1:partition_key1} -> {Group by samza on partition_key1} -> {topic2:partition_key2} -> {Group by samza on partition_key2} -> {topic3:partition_key3} -