Hi Koji,

Thanks for the response and the helpful links!

NiFi version: 1.1.0.2.1.2.0-10
I am trying to move data from an operational system (Oracle DB) to an analytical system (Postgres DB). The Postgres table was modeled and designed by us, so we can add a primary key. The data from Oracle looks like the sample below (I need to remove duplicate records for the combination of Col A and Col B):

Col A   Col B
C1      item 1
C1      item 2
*C2*    *item 3*
*C2*    *item 4*
*C2*    *item 3*
C3      item 1
C4      null
C5      item 5
C6      item 7

I will explore the PutDatabaseRecord processor and see whether I can achieve the desired result with it.

Thanks,
Vikram

On Mon, Sep 18, 2017 at 9:59 PM, Koji Kawamura <[email protected]> wrote:

> Hello Vikram,
>
> Welcome to NiFi and the community :)
>
> Would you elaborate on your data flow? And which version are you using?
> For example, can you share some input data extracted from Oracle? I
> wonder why you need to remove duplicate records while PostgreSQL
> doesn't have a primary key constraint, or why you have such records in
> the beginning.
>
> Current PutSQL does not report the cause of a batch update failure well.
> But that behavior has been improved, and you can see what the cause is
> if you can use NiFi 1.4.0-SNAPSHOT (you need to build NiFi from source
> code to try it).
> https://issues.apache.org/jira/browse/NIFI-4162
>
> Please refer to the NiFi README.md for how to build and run NiFi from
> source code.
> https://github.com/apache/nifi
>
> Also, in order to put Avro data into an RDBMS, NiFi has the
> PutDatabaseRecord processor today, which can work more efficiently
> because you don't have to use the 'split avro -> avrotojson -> jsontosql'
> part. PutDatabaseRecord can directly execute DML statements from an Avro
> dataset.
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.3.0/org.apache.nifi.processors.standard.PutDatabaseRecord/index.html
>
> Thanks,
> Koji
>
> On Tue, Sep 19, 2017 at 9:21 AM, Vikram More <[email protected]> wrote:
> > Hi Everyone,
> >
> > I am new to NiFi and the community :)
> >
> > I am trying to build a NiFi flow which will pull from an Oracle table
> > and load into a Postgres table. My select query has two columns and I
> > need to remove duplicates based on these two columns. Can I remove
> > duplicates in NiFi based on the values of two columns? My flow is like
> > below:
> > ExecuteSQL -> split avro -> avrotojson -> jsontosql -> PutSQL
> >
> > PutSQL question: the Oracle table has ~4 million records, and when
> > PutSQL was running, it gave several similar errors:
> >
> > "Failed to update database due to failed batch update. There were total
> > of 1 FlowFiles that failed, 5 that successful, and 9 that were not
> > execute and will be routed to retry"
> >
> > What might be wrong in PutSQL? I have kept the PutSQL batch size at 1000
> > and don't have any primary key constraint on the Postgres table.
> > (Should I create a primary key with those two columns, so that it can
> > reject duplicate records while loading? But will it reject the complete
> > batch rather than just the duplicates?)
> >
> > It would be great if someone could provide insight into this scenario.
> >
> > Thanks,
> > Vikram
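
For reference, the de-duplication on the (Col A, Col B) combination discussed in this thread can be sketched outside NiFi as follows. This is only a minimal Python illustration of the intended result on the sample rows, not a NiFi processor or configuration; the function name `dedupe_rows` and the in-memory list of tuples are assumptions made for the example.

```python
from typing import Iterable, List, Optional, Tuple

# One row is a (col_a, col_b) pair; col_b may be null (None), as in the sample.
Row = Tuple[str, Optional[str]]


def dedupe_rows(rows: Iterable[Row]) -> List[Row]:
    """Keep the first occurrence of each (col_a, col_b) pair, preserving order.

    Hypothetical helper for illustration only; it is not part of NiFi.
    """
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Sample rows copied from the message above.
rows = [
    ("C1", "item 1"),
    ("C1", "item 2"),
    ("C2", "item 3"),
    ("C2", "item 4"),
    ("C2", "item 3"),  # duplicate of the third row
    ("C3", "item 1"),
    ("C4", None),
    ("C5", "item 5"),
    ("C6", "item 7"),
]

unique_rows = dedupe_rows(rows)
for col_a, col_b in unique_rows:
    print(col_a, col_b)
```

On the sample data this drops only the second ("C2", "item 3") row. One point to keep in mind if the same rule is enforced with a primary key on the two columns instead: primary key columns cannot be null, so a row like (C4, null) would need separate handling.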
