And to share my experience of doing something similar: certain messages on our
system must not be duplicated, but as they are bounced back to us from
third parties, duplication is inevitable. So I deduplicate them with Spark
Structured Streaming's flatMapGroupsWithState, keyed on a business key
derived from the message.
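
A minimal sketch of what that looks like, in case it's useful (the Message
case class, field names, and the 24-hour state timeout are illustrative,
not our exact production code):

import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

object DedupeJob {
  case class Message(businessKey: String, payload: String)

  // Emit a message only the first time its business key is seen; keep a
  // boolean per key as state, and expire it so the store doesn't grow forever.
  def dedupe(key: String, msgs: Iterator[Message],
             state: GroupState[Boolean]): Iterator[Message] =
    if (state.hasTimedOut) { state.remove(); Iterator.empty }
    else if (state.exists) Iterator.empty      // key seen before: drop duplicates
    else {
      state.update(true)                       // remember this key
      state.setTimeoutDuration("24 hours")
      msgs.take(1)                             // emit the first occurrence only
    }

  // messages would be a streaming Dataset[Message] read from Kafka
  def deduped(spark: SparkSession, messages: Dataset[Message]): Dataset[Message] = {
    import spark.implicits._
    messages
      .groupByKey(_.businessKey)
      .flatMapGroupsWithState(OutputMode.Append,
        GroupStateTimeout.ProcessingTimeTimeout)(dedupe _)
  }
}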

Kind regards,

Liam Clarke

On Thu, Apr 4, 2019 at 4:09 AM Hans Jespersen <h...@confluent.io> wrote:

> Ok, what you are describing is different from accidental duplicate message
> pruning, which is what the idempotent publish feature does.
>
> You are describing a situation where multiple independent messages just
> happen to have the same contents (both key and value).
>
> Removing those messages is an application-specific function, as you can
> imagine applications which would not want independent but identical
> messages to be removed (for example temperature sensor readings, heartbeat
> messages, or other telemetry data that has repeated but independent values).
>
> Your best bet is to write a simple intermediate processor that implements
> your message pruning algorithm of choice and republishes (or not) to
> another topic that your consumers read from. It's a stateful app because it
> needs to remember 1 or more past messages, but that can be done using the
> Kafka Streams Processor API and the embedded RocksDB state store that comes
> with Kafka Streams (or as a UDF in KSQL).
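>
> Something like this would be the skeleton of that processor (a rough
> sketch in Scala using the Transformer API; the store and topic names are
> placeholders, and default serdes are assumed to be set in the config):
>
> import org.apache.kafka.common.serialization.Serdes
> import org.apache.kafka.streams.{KeyValue, StreamsBuilder}
> import org.apache.kafka.streams.kstream.Transformer
> import org.apache.kafka.streams.processor.ProcessorContext
> import org.apache.kafka.streams.state.{KeyValueStore, Stores}
>
> // Forward a record only if its key has not been seen before; past keys
> // are remembered in a persistent (RocksDB-backed) state store.
> class DedupeTransformer
>     extends Transformer[String, String, KeyValue[String, String]] {
>   private var store: KeyValueStore[String, String] = _
>
>   override def init(context: ProcessorContext): Unit = {
>     store = context.getStateStore("dedupe-store")
>       .asInstanceOf[KeyValueStore[String, String]]
>   }
>
>   override def transform(key: String, value: String): KeyValue[String, String] =
>     if (store.get(key) != null) null      // seen before: emit nothing
>     else { store.put(key, value); KeyValue.pair(key, value) }
>
>   override def close(): Unit = ()
> }
>
> val builder = new StreamsBuilder()
> builder.addStateStore(Stores.keyValueStoreBuilder(
>   Stores.persistentKeyValueStore("dedupe-store"),   // RocksDB under the hood
>   Serdes.String(), Serdes.String()))
> builder.stream[String, String]("raw-topic")
>   .transform(() => new DedupeTransformer, "dedupe-store")
>   .to("deduped-topic")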
>
> You can alternatively write your consuming apps to implement similar
> message pruning functionality themselves and avoid one extra component in
> the end-to-end architecture.
>
> -hans
>
> > On Apr 2, 2019, at 7:28 PM, jim.me...@concept-solutions.com <
> jim.me...@concept-solutions.com> wrote:
> >
> >
> >
> >> On 2019/04/02 22:43:31, jim.me...@concept-solutions.com <
> jim.me...@concept-solutions.com> wrote:
> >>
> >>
> >>> On 2019/04/02 22:25:16, jim.me...@concept-solutions.com <
> jim.me...@concept-solutions.com> wrote:
> >>>
> >>>
> >>>> On 2019/04/02 21:59:21, Hans Jespersen <h...@confluent.io> wrote:
> >>>> Yes. Idempotent publish uses a unique message ID to discard potential
> duplicate messages caused by failure conditions when publishing.
> >>>>
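> >>>> For example, from a Java or Scala producer it's just a config flag
> >>>> (a minimal sketch; the broker address and topic name are placeholders):
> >>>>
> >>>> import java.util.Properties
> >>>> import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
> >>>> import org.apache.kafka.common.serialization.StringSerializer
> >>>>
> >>>> val props = new Properties()
> >>>> props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
> >>>> props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")
> >>>> props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
> >>>> props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
> >>>>
> >>>> val producer = new KafkaProducer[String, String](props)
> >>>> // Broker-side dedup applies to retries of this one send (producer id +
> >>>> // sequence number); two separate send() calls are NOT deduplicated.
> >>>> producer.send(new ProducerRecord("my-topic", "key1", "value1"))
> >>>> producer.close()
> >>>>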
> >>>> -hans
> >>>>
> >>>>> On Apr 1, 2019, at 9:49 PM, jim.me...@concept-solutions.com <
> jim.me...@concept-solutions.com> wrote:
> >>>>>
> >>>>> Does Kafka have something that behaves like a unique key so a
> producer can’t write the same value to a topic twice?
> >>>
> >>> Hi Hans,
> >>>
> >>>    Is there some documentation or an example with source code where I
> can learn more about this feature and how it is implemented?
> >>>
> >>> Thanks,
> >>> Jim
> >>
> >> By the way I tried this...
> >> echo "key1:value1" | ~/kafka/bin/kafka-console-producer.sh
> --broker-list localhost:9092 --topic TestTopic --property "parse.key=true"
> --property "key.separator=:" --property "enable.idempotence=true" >
> /dev/null
> >>
> >> And... that didn't seem to do the trick: after running that command
> multiple times, I received key1:value1 once for each time I had run it.
> >>
> >> Maybe it is the way I am setting the flags...
> >> Recently I saw that someone did this...
> >> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> --producer-property enable.idempotence=true --request-required-acks -1
> >
> > Also... the reason for my question is that we are going to have two JMS
> topics with nearly redundant data in them, and the UNION of the two will be
> written to Kafka for further processing.
> >
>
