Hi,

What's the configuration of the GenerateTableFetch (GTF) processor? Is data being written to the source table while the workflow is executing? How do you check for duplicate rows in Hive?

Thanks

2018-07-20 15:12 GMT+02:00 Mohit <[email protected]>:

> Hi all,
>
> I am fetching data from Netezza using GenerateTableFetch -> RPG ->
> ExecuteSQL -> PutHDFS. It is working fine for most of the time, but for
> some tables with more than a million rows, it fetches duplicate rows.
>
> Partition Size varies from 3 million to 30 million with respect to table
> size. For table with ~300 million rows, size is 30 million and likewise.
>
> For Example –
>
> Table : abc
> Netezza count - 3265421
> Hive Count - 3265421
> Duplicate rows in Hive - 97070
>
> Is this the expected behaviour while fetching from Netezza?
>
> Regards,
> Mohit
