I understand Kafka supports keyed messages (I am using 0.8.1.1) and that it is possible to de-duplicate messages based on the message key.
(The log compaction section of the online documentation describes how that works.) I am using a code example that comes with Kafka (namely KafkaConsumerProducerDemo) and running it against a local Kafka instance. I write a set of messages with the same String key and then have a consumer consume the data. The consumer starts consuming *only* after the producer has produced all of its messages. I would expect the consumer to retrieve only the latest message (since all messages share the same key), but it retrieves every message the producer emitted.

I have also turned on these properties on the Kafka broker:

- log.cleaner.enable=true
- log.cleanup.policy=dedupe

My questions:

- Is de-duplication of messages guaranteed to take effect only after compaction has run?
- I have tried to "force" compaction by setting log.cleaner.backoff.ms and log.cleaner.min.cleanable.ratio to very low values, but I still observe the same behavior.

Any ideas or pointers? Thanks.
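For reference, the producer side of my test looks roughly like the sketch below, using the old 0.8 producer API that the demo is built on (the topic name, key, and broker address here are placeholders, not the exact values from my setup):

```java
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class SameKeyProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("key.serializer.class", "kafka.serializer.StringEncoder");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

        // Every message uses the same key, so they all land in the same
        // partition; after compaction I would expect only the last value
        // for "the-key" to remain in the log.
        for (int i = 0; i < 10; i++) {
            producer.send(new KeyedMessage<String, String>(
                    "compacted-topic", "the-key", "value-" + i));
        }
        producer.close();
    }
}
```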