Hi Shekar,
Please refer to [1]. You can set a custom partitioner through the producer
config. You will have to implement your own partitioner based on your
application and partitioning strategy.
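
As a rough illustration of what such a partitioner might look like, here is a
plain-Java sketch with no Kafka dependency. The class name HashKeyPartitioner
is hypothetical. The method shape (a key plus the partition count, returning a
partition number) follows the example at [1], where, as far as I recall, the
real class implements Kafka's Partitioner interface and is registered through
the `partitioner.class` producer property. Please check [1] for the exact
interface in your Kafka version.

```java
// Sketch only: mirrors the shape of a producer-side partitioner,
// but does not implement any Kafka interface.
final class HashKeyPartitioner {

    /** Maps a non-null message key to a partition in [0, numPartitions). */
    static int partition(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition.
        System.out.println(partition("12345", 8));
        System.out.println(partition("12345", 8));
    }
}
```

Any deterministic function of the key works here; hash-then-modulo is just the
simplest choice that spreads keys roughly evenly.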
Thanks
Milinda
[1] https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
On Thu,
So if I have a feed with
{user_id:12345,
ethnicity: asian,
location: "cerritos, ca",
Height:"5.9",
weight: "150 lbs"}
I am referring to https://kafka.apache.org/081/ops.html#topic-config
How do I map the 3 columns (user_id, ethnicity, and location) to a
partition id? If I map it this way and s
Hi Richard,
You can also partition by a key like "user_id" so that all messages for a
given user would end up in the same partition. This can be useful for
calculating user-specific aggregations or doing a distributed join where
the local state is also partitioned on user_id.
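
To make the point about user-specific aggregations concrete, here is a small
simulation in plain Java (no Kafka or Samza involved). The class and method
names are made up for illustration, and hash-modulo routing is an assumption
standing in for whatever partitioner you configure. It routes events by
user_id and keeps a per-partition count, so each user's total lives in exactly
one partition's local state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UserAggregationSketch {

    /** Assumed routing rule: hash of user_id modulo the partition count. */
    static int partitionFor(String userId, int numPartitions) {
        return Math.floorMod(userId.hashCode(), numPartitions);
    }

    /** Counts events per user, with one independent map per partition. */
    static List<Map<String, Integer>> countPerPartition(List<String> userIds, int numPartitions) {
        List<Map<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
        for (String userId : userIds) {
            // Each event only touches the local state of its own partition.
            partitions.get(partitionFor(userId, numPartitions)).merge(userId, 1, Integer::sum);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("12345", "67890", "12345", "12345", "67890");
        List<Map<String, Integer>> counts = countPerPartition(events, 4);
        for (int p = 0; p < counts.size(); p++) {
            System.out.println("partition " + p + " -> " + counts.get(p));
        }
    }
}
```

Because no user is split across partitions, each task can compute its
aggregate without coordinating with the others.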
Cheers,
Roger
On
Is there a typo below? Are all of these actually in the same topic, just
different partitions? Partitioning, AFAIK, is mainly done for parallelism &
throughput reasons. What is the reason for partitioning your dataset by
‘columns’?
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Ho
Hi Shekar,
Each Kafka partition is basically just a number; you would need to specify
which partitioner strategy to use when mapping your event key to the
partition number.
You can take the 4 columns you have in the event and map them to a partition
number, the partitioner in that case would be a func
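
One possible reading of the suggestion above, sketched in plain Java: combine
the chosen columns into a single hash and reduce it to a partition number. The
class name CompositeKeySketch and the use of Objects.hash are my own
assumptions, not something from the thread. The column values are taken from
the sample feed quoted earlier.

```java
import java.util.Objects;

final class CompositeKeySketch {

    /** Maps an ordered list of column values to a partition in [0, numPartitions). */
    static int partitionFor(int numPartitions, Object... columns) {
        // Objects.hash combines the values in order, so column order matters.
        return Math.floorMod(Objects.hash(columns), numPartitions);
    }

    public static void main(String[] args) {
        // Events that agree on all of the chosen columns share a partition.
        System.out.println(partitionFor(8, "12345", "asian", "cerritos, ca"));
    }
}
```

Note that events only land together if they agree on every column in the key,
so a composite key groups more narrowly than any single column would.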
Hello,
Want to confirm a basic understanding of Kafka.
If I have a dataset that needs to be partitioned by 4 columns, then the
progression is
{topic1:partition_key1} -> {Group by samza on partition_key1}
->
{topic2:partition_key2} -> {Group by samza on partition_key2}
->
{topic3:partition_key3} -