Hello Vikram,

Welcome to NiFi and the community :)
Could you elaborate on your data flow, and which NiFi version you are using?
For example, can you share some sample input data extracted from Oracle? I
wonder why you need to remove duplicate records when the PostgreSQL table
doesn't have a primary key constraint, or why such records exist in the
first place.

The current PutSQL does not report the cause of a batch update failure well.
That behavior has been improved, and you can see the cause if you use NiFi
1.4.0-SNAPSHOT (you need to build NiFi from source code to try it).
https://issues.apache.org/jira/browse/NIFI-4162

Please refer to the NiFi README.md for how to build and run NiFi from source
code.
https://github.com/apache/nifi

Also, to put Avro data into an RDBMS, NiFi now has the PutDatabaseRecord
processor. It can work more efficiently because you don't need the
'split avro -> avrotojson -> jsontosql' part; PutDatabaseRecord can execute
DML statements directly from an Avro dataset.
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.3.0/org.apache.nifi.processors.standard.PutDatabaseRecord/index.html

Thanks,
Koji

On Tue, Sep 19, 2017 at 9:21 AM, Vikram More <[email protected]> wrote:
> Hi Everyone,
>
> I am new to NiFi and the community :)
>
> I am trying to build a NiFi flow which will pull from an Oracle table and
> load into a Postgres table. My select query has two columns, and I need to
> remove duplicates based on these two columns. Can I remove duplicates in
> NiFi based on two column values? My flow is like below:
> ExecuteSQL -> split avro -> avrotojson -> jsontosql -> PutSQL
>
> PutSQL question: the Oracle table has ~4 million records, and when PutSQL
> was running it gave several similar errors:
>
> "Failed to update database due to failed batch update. There were total of 1
> FlowFiles that failed, 5 that successful, and 9 that were not execute and
> will be routed to retry"
>
> What might be wrong in PutSQL? I have kept the PutSQL batch size at 1000 and
> don't have any primary key constraint on the Postgres table.
> (Should I create a primary key on those two columns so that duplicate
> records are rejected while loading? But will it reject the complete batch
> rather than just the duplicates?)
>
> It would be great if someone could provide insight into this scenario.
>
> Thanks,
> Vikram
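
One way to avoid a separate de-duplication step entirely is to remove the
duplicates in the ExecuteSQL query itself. A minimal sketch, assuming
hypothetical table and column names (source_table, col_a, col_b):

    -- Hypothetical table/column names; replace with the real Oracle schema.
    -- SELECT DISTINCT collapses rows that repeat the same two-column pair,
    -- so duplicates never enter the flow in the first place.
    SELECT DISTINCT col_a, col_b
    FROM source_table;

With duplicates removed at the source, the downstream part of the flow
(whether split avro -> avrotojson -> jsontosql -> PutSQL or PutDatabaseRecord)
only ever sees unique rows.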
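
On the primary key question: with a plain INSERT, a single constraint
violation will typically abort the transaction, so the entire batch is routed
to failure rather than just the duplicate rows. If duplicates cannot be
removed upstream, one SQL-level workaround is a unique constraint on the two
columns combined with PostgreSQL's ON CONFLICT clause (available since 9.5).
A sketch, again with hypothetical table and column names:

    -- Hypothetical target table/column names.
    ALTER TABLE target_table
      ADD CONSTRAINT target_table_col_a_col_b_uniq UNIQUE (col_a, col_b);

    -- ON CONFLICT DO NOTHING skips rows that would violate the unique
    -- constraint instead of failing, so the rest of the batch still loads.
    INSERT INTO target_table (col_a, col_b)
    VALUES (?, ?)
    ON CONFLICT (col_a, col_b) DO NOTHING;

Note that the INSERT statements generated by jsontosql would need to carry
the ON CONFLICT clause, so this is only a sketch of the SQL-level idea, not a
ready-made NiFi configuration.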
