Are you by chance using a clustered NiFi? I see duplicate messages
when I run the consumer on multiple NiFi nodes, so I've started running
the consumer only on the primary node. This seems to correct the issue,
but leads to other problems. I'd love a solution.
-Joe
On 11/16/2022 3:50 AM, Aian Cantabrana wrote:
Hi Joe,
Thanks for the reply. The actual flow sends data from the
ConsumeAMQP processor to two different PublishKafka processors, one
with idempotence enabled and the other with the default config. Each
of them sends the same data to its own topic, and comparing the two
topics is how I check for duplicates. It seems to be random: sometimes
they appear in the "normal" processor's topic and other times in the
"idempotence" one. I have not found any pattern.
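For reference, the topic comparison described above could be sketched roughly as follows. This is only an illustration: it assumes both topics have already been dumped into Python lists of payload strings (the consumer code is omitted), and that every source message is unique, so a repeated payload really is a duplicate. The names `find_duplicates` and `compare_topics` and the sample data are hypothetical.

```python
from collections import Counter


def find_duplicates(payloads):
    """Return {payload: count} for every payload seen more than once."""
    counts = Counter(payloads)
    return {p: n for p, n in counts.items() if n > 1}


def compare_topics(topic_a, topic_b):
    """Report duplicates per topic and payloads present on only one side."""
    return {
        "dupes_a": find_duplicates(topic_a),
        "dupes_b": find_duplicates(topic_b),
        "only_in_a": sorted(set(topic_a) - set(topic_b)),
        "only_in_b": sorted(set(topic_b) - set(topic_a)),
    }


# Hypothetical dumps of the "normal" and "idempotence" topics.
normal = ['{"id": 1}', '{"id": 2}', '{"id": 2}', '{"id": 3}']
idempotent = ['{"id": 1}', '{"id": 2}', '{"id": 3}']

report = compare_topics(normal, idempotent)
print(report)
```

If the source can legitimately emit identical payloads, a unique message key would be needed instead of the raw payload.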
I will upgrade to NiFi 1.18.0 and try again.
In any case, the messages are in JSON format (one JSON document per
flowfile), but since I am sending and storing them in Kafka as plain
text I am using the non-record-oriented Kafka publisher. Is
PublishKafkaRecord more reliable? Would it be better to use it?
Thanks,
Aian
------------------------------------------------------------------------
From: "Joe Witt" <[email protected]>
To: "users" <[email protected]>
Sent: Tuesday, November 15, 2022 17:31:54
Subject: Re: Exacly once from NiFi to Kafka
Aian,
How can you tell there are duplicates in Kafka, and are you certain
that no duplicates exist in the source topic?
Given NiFi's data provenance capabilities, you should be able to
pinpoint a given duplicate and figure out whether it happened at the
source, in NiFi, or elsewhere.
Note that much has changed/improved since the 1.12.x line of NiFi, so
we have more Kafka components and record-oriented mechanisms. Still, I
am pretty sure that even in your version we should not be duplicating
data unless the flow is configured such that it would happen.
Thanks
On Tue, Nov 15, 2022 at 9:25 AM Aian Cantabrana <[email protected]>
wrote:
Hi,
I am having some difficulty getting exactly-once semantics while
preserving data order from NiFi to Kafka. I have read the Kafka
documentation, and it should be quite straightforward using an
idempotent producer from NiFi and a Kafka topic with a single
partition, but I am still getting some duplicated messages in
Kafka.
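One possible explanation worth checking: Kafka's idempotent producer deduplicates retries of the same batch within a single producer session, using a producer ID plus per-partition sequence numbers. It does not deduplicate a record that the application sends a second time (for example after an upstream redelivery or a replayed flowfile), because that arrives with a new sequence number. The toy model below illustrates the distinction. It is not a Kafka client and simplifies the broker's real behaviour; `ToyPartition` and its fields are hypothetical names.

```python
class ToyPartition:
    """Toy model of one partition's idempotence check (not real Kafka)."""

    def __init__(self):
        self.log = []        # payloads actually appended
        self.last_seq = {}   # producer_id -> highest sequence appended

    def append(self, producer_id, seq, payload):
        """Append unless this producer already wrote this sequence number."""
        if seq <= self.last_seq.get(producer_id, -1):
            return False     # retry of an already-written record: dropped
        self.last_seq[producer_id] = seq
        self.log.append(payload)
        return True


partition = ToyPartition()
first = partition.append(7, 0, "m1")    # original send
retry = partition.append(7, 0, "m1")    # network retry, same sequence
resend = partition.append(7, 1, "m1")   # same payload sent as a new record
restart = partition.append(8, 0, "m1")  # new producer session, new ID

print(first, retry, resend, restart)
print(partition.log)
```

Only the retry is dropped; the application-level resend and the send from a new producer session both land in the log, which is why idempotence alone does not give end-to-end exactly-once delivery.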
NiFi version: 1.12.1
Kafka version: 2.7.0
NiFi flow:
(Both queues shown use the FIFO prioritizer)
PublishKafka_2_6 configuration:
As I said, target Kafka topic has just one partition to ensure
data order.
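As a rough checklist, the Kafka producer settings that idempotence depends on can be validated as sketched below. The property names are standard Kafka producer configs; the helper `check_idempotent_config` and the sample dictionary are hypothetical, and how these properties map onto the PublishKafka_2_6 processor settings is not shown here.

```python
def check_idempotent_config(config):
    """Return a list of problems that conflict with an idempotent producer."""
    problems = []
    if str(config.get("enable.idempotence", "false")).lower() != "true":
        problems.append("enable.idempotence must be true")
    if str(config.get("acks", "")) not in ("all", "-1"):
        problems.append("acks must be all")
    if int(config.get("max.in.flight.requests.per.connection", 5)) > 5:
        problems.append("max.in.flight.requests.per.connection must be <= 5")
    if int(config.get("retries", 1)) <= 0:
        problems.append("retries must be greater than 0")
    return problems


# Hypothetical producer settings for illustration.
producer_config = {
    "enable.idempotence": "true",
    "acks": "1",
    "max.in.flight.requests.per.connection": 5,
    "retries": 3,
}

issues = check_idempotent_config(producer_config)
print(issues)
```

With these constraints met, ordering within the single partition is preserved across retries; the check says nothing about duplicates introduced before the producer.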
Incoming flowfiles are small, 60-byte messages.
I have been working on this for a while, so any suggestion is very
welcome.
Thanks in advance,
Aian