Hi all,

I notice that if the NiFi instance gets terminated while a processor is
processing a flow file, that processor starts processing the flow file
again from the beginning when NiFi is restarted.
I'm using the PutKudu and PutParquet processors to write data to Kudu
and to Parquet files. Due to the above behaviour,

   1. PutKudu shows primary key violation errors after a restart. I'm using
   the INSERT operation and can't switch to INSERT_IGNORE or UPSERT, since I
   need to be notified when the incoming data contains duplicates (a small
   sketch of this failure mode follows the list).
   2. Since I need to write the data in a single flow file into multiple
   Parquet files (by specifying the row group size), it is possible for the
   PutParquet processor to generate multiple Parquet files with the same
   content after a restart (i.e. the data can be duplicated).
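
For illustration, here is roughly how the replay manifests against Kudu
(a minimal sketch using the Kudu Java client; the master address, table
name, and column names are placeholders made up for this example, not my
actual setup):

import org.apache.kudu.client.*;

public class ReplayDemo {
  public static void main(String[] args) throws KuduException {
    // Placeholder master address and table name.
    KuduClient client =
        new KuduClient.KuduClientBuilder("kudu-master:7051").build();
    try {
      KuduTable table = client.openTable("events");
      KuduSession session = client.newSession(); // AUTO_FLUSH_SYNC by default

      // First run: the row is inserted successfully.
      Insert first = table.newInsert();
      first.getRow().addLong("id", 1L); // "id" is the primary key
      first.getRow().addString("payload", "value");
      session.apply(first);

      // After a NiFi restart the same flow file is replayed, so the
      // identical INSERT is applied again and Kudu rejects it.
      Insert replayed = table.newInsert();
      replayed.getRow().addLong("id", 1L);
      replayed.getRow().addString("payload", "value");
      OperationResponse response = session.apply(replayed);
      if (response.hasRowError()) {
        // Reports a "key already present" row error, which is what
        // PutKudu surfaces as a primary key violation after the restart.
        System.err.println(response.getRowError());
      }
    } finally {
      client.shutdown();
    }
  }
}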

I would be grateful if you could suggest a way to overcome this problem.

Thanks & Regards

*Vibhath Ileperuma*
