I understand Kafka supports keyed messages (I am using 0.8.1.1) and that it
is possible to de-duplicate messages based on the message key.

(The log compaction section of the online documentation describes how this
works.)

I am using a code example that comes with Kafka (namely
KafkaConsumerProducerDemo), running against a local Kafka instance. I write
a set of messages with the same String key and then have a consumer that
consumes the data.
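
For reference, the producing side boils down to roughly the following (a
minimal sketch adapted from the demo against the 0.8 producer API; the
topic name, key, and message count are placeholders I picked for the test):

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class SameKeyProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "localhost:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

            // Every message carries the same key, so after compaction I
            // would expect only the last value for "the-one-key" to survive.
            for (int i = 0; i < 100; i++) {
                producer.send(new KeyedMessage<String, String>(
                        "my-topic", "the-one-key", "message-" + i));
            }
            producer.close();
        }
    }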

The consumer consumes messages *only* after the producer has produced all
its messages.

I would expect the consumer to retrieve only the latest message (since all
messages share the same key), but it retrieves every message the producer
has emitted.
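
For completeness, the consuming side is essentially the old high-level
consumer, along these lines (again a sketch; the group id and ZooKeeper
address are placeholders):

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.MessageAndMetadata;

    public class SameKeyConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181");
            props.put("group.id", "dedupe-test");
            props.put("auto.offset.reset", "smallest");

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(
                        Collections.singletonMap("my-topic", 1));

            // I expected (roughly) a single surviving message here, but
            // this loop prints every message the producer sent.
            for (MessageAndMetadata<byte[], byte[]> m :
                    streams.get("my-topic").get(0)) {
                System.out.println(new String(m.message()));
            }
        }
    }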

I have also set these properties on the Kafka broker:

log.cleaner.enable=true
log.cleanup.policy=dedupe

- Is de-duplication of messages guaranteed to take effect only after
compaction has run?

- I have tried to "force" compaction by setting "log.cleaner.backoff.ms"
and "log.cleaner.min.cleanable.ratio" to very low values (see the
server.properties snippet after this list), but I still observe the same
behavior.
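
For reference, this is roughly the relevant slice of my server.properties
(the two low values below are just what I picked for the experiment):

    log.cleaner.enable=true
    log.cleanup.policy=dedupe
    # very low values, to try to trigger compaction quickly
    log.cleaner.backoff.ms=1000
    log.cleaner.min.cleanable.ratio=0.01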

Any ideas or pointers?

Thanks.
