Hi Koji,
Thanks for the response and the helpful links!

NiFi version: 1.1.0.2.1.2.0-10

I am trying to move data from an operational system (Oracle DB) to an
analytical system (PostgreSQL DB). The Postgres table was modeled/designed
by us, so we can add a primary key. The data from Oracle looks like the
sample below (I need to remove duplicate records for the combination of
Col A and Col B):

    Col A   Col B
    C1      item 1
    C1      item 2
    C2      item 3
    C2      item 4
    C2      item 3
    C3      item 1
    C4      null
    C5      item 5
    C6      item 7

(The C2 / item 3 pair appears twice; that is the kind of duplicate to remove.)
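To pin down the de-duplication rule, here is a minimal Python sketch that keeps the first occurrence of each (Col A, Col B) pair and preserves the original row order. It is illustrative only: it is not a NiFi processor, and the function name `dedupe` is hypothetical. `None` stands in for the null in Col B.

```python
# Hypothetical sketch: drop duplicate (Col A, Col B) pairs, keeping the
# first occurrence and the original order. Not NiFi code.
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, Optional[str]]  # (Col A, Col B); None models SQL null


def dedupe(rows: Iterable[Row]) -> List[Row]:
    seen = set()
    out: List[Row] = []
    for row in rows:
        key = (row[0], row[1])  # composite key: Col A + Col B
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


# The sample rows from the table above.
sample: List[Row] = [
    ("C1", "item 1"),
    ("C1", "item 2"),
    ("C2", "item 3"),
    ("C2", "item 4"),
    ("C2", "item 3"),
    ("C3", "item 1"),
    ("C4", None),
    ("C5", "item 5"),
    ("C6", "item 7"),
]

result = dedupe(sample)
print(len(sample), "->", len(result))  # 9 -> 8
```

Keeping a `set` of seen keys makes this a single pass over the data, so it scales to the ~4 million rows mentioned in the original post as long as the distinct pairs fit in memory.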
I will explore the PutDatabaseRecord processor and see whether I can
achieve the desired result with it.
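On the primary-key question in the original post (whether a constraint rejects the whole batch or only the duplicates), here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for PostgreSQL. It assumes the batch runs inside one transaction; it says nothing about how PutSQL or PutDatabaseRecord actually group statements. `INSERT OR IGNORE` is SQLite syntax; the PostgreSQL counterpart is `INSERT ... ON CONFLICT DO NOTHING` (9.5 and later). The table name `target` and the helper functions are hypothetical.

```python
# Sketch only: SQLite stands in for PostgreSQL. It compares a batch that
# hits a composite primary key violation with one that skips conflicts.
import sqlite3

rows = [("C2", "item 3"), ("C2", "item 4"), ("C2", "item 3"), ("C5", "item 5")]


def fresh() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # NOT NULL is explicit because SQLite, unlike PostgreSQL, does not
    # imply it for PRIMARY KEY columns.
    conn.execute(
        "CREATE TABLE target (col_a TEXT NOT NULL, col_b TEXT NOT NULL, "
        "PRIMARY KEY (col_a, col_b))"
    )
    return conn


def load(conn: sqlite3.Connection, sql: str, batch) -> int:
    """Insert the batch in one transaction; return the stored row count."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.executemany(sql, batch)
    except sqlite3.IntegrityError as exc:
        print("batch rolled back:", exc)
    return conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]


plain = load(fresh(), "INSERT INTO target VALUES (?, ?)", rows)
skipping = load(fresh(), "INSERT OR IGNORE INTO target VALUES (?, ?)", rows)
print(plain, skipping)  # 0 3
```

With a plain INSERT, the one duplicate rolls back all four rows; when conflicts are skipped, only the repeated pair is dropped. Also worth checking before adding a primary key on (Col A, Col B): in PostgreSQL, primary-key columns are implicitly NOT NULL, so a row such as C4 / null would be rejected, whereas a UNIQUE constraint accepts it (nulls are not considered equal by default).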

Thanks,
Vikram

On Mon, Sep 18, 2017 at 9:59 PM, Koji Kawamura <[email protected]>
wrote:

> Hello Vikram,
>
> Welcome to NiFi and the community :)
>
> Would you elaborate on your data flow? And which version are you using?
> For example, can you share some input data extracted from Oracle? I
> wonder why you need to remove duplicate records, given that the
> PostgreSQL table doesn't have a primary key constraint, or why you have
> such records in the first place.
>
> The current PutSQL does not report the cause of a batch update failure
> well. That behavior has been improved, and you can see the cause if you
> use NiFi 1.4.0-SNAPSHOT (you need to build NiFi from source code to try
> it).
> https://issues.apache.org/jira/browse/NIFI-4162
>
> Please refer to the NiFi README.md for how to build and run NiFi from source code.
> https://github.com/apache/nifi
>
> Also, to put Avro data into an RDBMS, NiFi now has the PutDatabaseRecord
> processor. It can work more efficiently because you don't need the
> 'split avro -> avrotojson -> jsontosql' part: PutDatabaseRecord can
> execute DML statements directly from an Avro dataset.
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.3.0/org.apache.nifi.processors.standard.PutDatabaseRecord/index.html
>
> Thanks,
> Koji
>
> On Tue, Sep 19, 2017 at 9:21 AM, Vikram More <[email protected]>
> wrote:
> > Hi Everyone,
> >
> > I am new to NiFi and the community :)
> >
> > I am trying to build a NiFi flow that pulls from an Oracle table and
> > loads into a Postgres table. My select query has two columns, and I need
> > to remove duplicates based on these two columns. Can I remove duplicates
> > in NiFi based on the values of two columns? My flow is as follows:
> > ExecuteSQL -> split avro -> avrotojson -> jsontosql -> PutSQL
> >
> >
> > PutSQL question: the Oracle table has ~4 million records, and while
> > PutSQL was running, it gave several similar errors:
> >
> > "Failed to update database due to failed batch update. There were total
> > of 1 FlowFiles that failed, 5 that successful, and 9 that were not
> > execute and will be routed to retry"
> >
> > What might be wrong with PutSQL? I have set the PutSQL batch size to
> > 1000, and there is no primary key constraint on the Postgres table.
> > (Should I create a primary key on those two columns so that loading
> > rejects duplicate records? But will it reject the complete batch rather
> > than just the duplicates?)
> >
> > It would be great if someone could provide insight into this scenario.
> >
> > Thanks,
> > Vikram
>
