Joe,

Are you using a consumer group ID? A consumer group lets Kafka coordinate all 
consumers for the same application - which topics and partitions are assigned 
to which consumer, and which offsets have been committed - and so avoid 
conflicts and duplication. If the nodes do not share a consumer group, then 
having every node in the cluster consume the same data is in fact correct 
behaviour from a Kafka viewpoint.
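
To illustrate the point about group IDs, here is a toy model in Python of how 
partitions end up shared out. It is only a sketch and not the Kafka client API: 
assign_partitions, the node names and the group names are all hypothetical, and 
the round-robin split is just one possible assignment strategy.

```python
from collections import defaultdict

def assign_partitions(consumers, partitions):
    """Toy model: consumers is a list of (name, group_id) pairs.

    Within each group the partitions are split round-robin, so every
    partition has exactly one owner per group. Returns {name: [partitions]}.
    """
    groups = defaultdict(list)
    for name, group_id in consumers:
        groups[group_id].append(name)

    assignment = {name: [] for name, _ in consumers}
    for members in groups.values():
        for i, partition in enumerate(partitions):
            assignment[members[i % len(members)]].append(partition)
    return assignment

partitions = [0, 1, 2, 3]

# Two nodes sharing one group ID: the partitions are divided between them.
shared = assign_partitions([("node1", "nifi"), ("node2", "nifi")], partitions)

# Two nodes with different group IDs: each node reads every partition.
separate = assign_partitions([("node1", "g1"), ("node2", "g2")], partitions)

print(shared)
print(separate)
```

In the second case both nodes receive all four partitions, which is the 
"every node consumes the same data" situation.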

Steve Hindmarch

From: Joe Obernberger <[email protected]>
Sent: 19 November 2022 18:33
To: [email protected]; Aian Cantabrana <[email protected]>; Joe Witt 
<[email protected]>
Subject: Re: Exacly once from NiFi to Kafka


Are you by chance using a clustered NiFi?  I'm seeing duplicate messages if I 
run the consumer on multiple NiFi nodes, so I've started running the consumer 
only on the parent.  This seems to correct the issue, but leads to other 
problems.  I'd love a solution.

-Joe
On 11/16/2022 3:50 AM, Aian Cantabrana wrote:
Hi Joe,

Thanks for the reply. The actual flow sends data from the ConsumeAMQP 
processor to two different PublishKafka processors, one with idempotence 
enabled and the other with the default config. Each one sends the same data to 
its own topic, and comparing the two topics is how I check for duplicates. They 
seem to be random: sometimes they appear in the "normal" processor's topic and 
other times in the "idempotence" one. I did not find any pattern.
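
A check of this kind could be sketched as below. This is a minimal 
illustration, not the procedure actually used in this thread: find_duplicates 
and the sample payloads are hypothetical, and it assumes every message is 
unique at the source, otherwise legitimate repeats would be counted too.

```python
from collections import Counter

def find_duplicates(messages):
    """Return {message: count} for every message seen more than once."""
    counts = Counter(messages)
    return {msg: n for msg, n in counts.items() if n > 1}

# Hypothetical contents of the two topics after publishing the same input.
normal_topic = ['{"id": 1}', '{"id": 2}', '{"id": 2}', '{"id": 3}']
idempotent_topic = ['{"id": 1}', '{"id": 2}', '{"id": 3}']

print(find_duplicates(normal_topic))
print(find_duplicates(idempotent_topic))
```

Running the same check against what ConsumeAMQP delivered would show whether a 
duplicate was already present before the data reached Kafka.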

I will upgrade to NiFi 1.18.0 and try again.

In any case, the messages are in JSON format (one JSON per flowfile), but 
since I am sending and storing them in Kafka as plain text I am using the 
non-record-oriented Kafka publisher. Is PublishKafkaRecord more reliable? 
Would it be better to use it?

Thanks,

Aian

________________________________
From: "Joe Witt" <[email protected]>
To: "users" <[email protected]>
Sent: Tuesday, 15 November 2022 17:31:54
Subject: Re: Exacly once from NiFi to Kafka

Aian,
How can you tell there are duplicates in Kafka and are you certain that no 
duplicates exist in the source topic?

Given NiFi's data provenance capabilities you should be able to pinpoint a 
given duplicate and figure out whether it happened at the source, in NiFi, or 
otherwise.

Note that much has changed and improved since the 1.12.x line of NiFi, so we 
now have more Kafka components and record-oriented mechanisms. But I am still 
pretty sure that even in your version we should not be duplicating data unless 
the flow is configured such that it would happen.

Thanks

On Tue, Nov 15, 2022 at 9:25 AM Aian Cantabrana <[email protected]> wrote:
Hi,

I am having some difficulty getting exactly-once semantics while preserving 
data order from NiFi to Kafka. I have read the Kafka documentation, and it 
should be quite straightforward using an idempotent producer from NiFi and a 
Kafka topic with a single partition, but I am still getting some duplicated 
messages in Kafka.
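
For context, the idempotent producer de-duplicates on a producer ID plus a 
per-partition sequence number, so it protects against retries of the same 
send. The toy model below sketches that idea only. ToyBroker is hypothetical 
and much simpler than a real broker, which tracks sequences per batch and per 
partition, and it is not a diagnosis of the flow described here.

```python
class ToyBroker:
    """Toy single-partition log that de-duplicates on (producer_id, sequence)."""

    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> highest sequence number written

    def append(self, producer_id, seq, payload):
        # A retry re-sends an already written sequence number: drop it.
        if seq <= self.last_seq.get(producer_id, -1):
            return False
        self.last_seq[producer_id] = seq
        self.log.append(payload)
        return True

broker = ToyBroker()
broker.append("p1", 0, "a")
broker.append("p1", 0, "a")  # retry of the same send: dropped
broker.append("p1", 1, "b")
broker.append("p1", 2, "b")  # same payload sent again as a new record: kept

print(broker.log)  # ['a', 'b', 'b']
```

The last call shows the limit of the mechanism: if the application hands the 
same payload to the producer a second time, it gets a new sequence number and 
is written again.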

NiFi version: 1.12.1
Kafka version: 2.7.0

NiFi flow:
[image: NiFi flow]
(Both queues shown use the FIFO prioritizer)

PublishKafka_2_6 configuration:
[images: PublishKafka_2_6 configuration]

As I said, the target Kafka topic has just one partition to ensure data order.

Incoming flowfiles are small, 60-byte messages.

I have been working on this for a while, so any suggestion is very welcome.

Thanks in advance,

Aian

