Hi,

What's the configuration of the GenerateTableFetch (GTF) processor? Is data
written to the source table while the workflow is running?
How do you check for duplicate rows in Hive?
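For reference, a common check is to group on every column and keep the groups with more than one row (in HiveQL, GROUP BY over all columns with HAVING COUNT(*) > 1). A duplicate figure can mean two different things: the number of distinct rows that repeat, or the number of surplus copies. The two can differ, so it matters which one is reported. The Python sketch below computes both for rows already loaded into memory; `duplicate_stats` is a hypothetical helper, not part of NiFi or Hive.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def duplicate_stats(rows: Iterable[Tuple[Hashable, ...]]) -> Tuple[int, int]:
    """Return (distinct rows appearing more than once, surplus copies).

    Rows are compared on all columns, like GROUP BY on every column
    followed by HAVING COUNT(*) > 1.
    """
    counts = Counter(rows)
    duplicated = sum(1 for c in counts.values() if c > 1)
    surplus = sum(c - 1 for c in counts.values() if c > 1)
    return duplicated, surplus


rows = [(1, "a"), (2, "b"), (2, "b"), (3, "c"), (3, "c"), (3, "c")]
print(duplicate_stats(rows))  # (2, 3)
```

Here two distinct rows repeat, but there are three surplus copies in total.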

Thanks

2018-07-20 15:12 GMT+02:00 Mohit <[email protected]>:

> Hi all,
>
> I am fetching data from Netezza using GenerateTableFetch -> RPG ->
> ExecuteSQL -> PutHDFS. It works fine most of the time, but for some
> tables with more than a million rows it fetches duplicate rows.
>
> The partition size varies from 3 million to 30 million depending on the
> table size. For a table with ~300 million rows the partition size is
> 30 million, and it scales similarly for the other tables.
>
> For example:
>
> Table: abc
> Netezza count: 3265421
> Hive count: 3265421
> Duplicate rows in Hive: 97070
>
> Is this the expected behaviour when fetching from Netezza?
>
> Regards,
>
> Mohit
>
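One hypothesis consistent with the numbers above (identical counts, yet duplicates), not confirmed in this thread, is that the partitions overlap. Suppose each generated query pages through the table with LIMIT/OFFSET, but the database does not return rows in the same order from one query to the next, which may happen if the ORDER BY column is not unique or there is none. Then some rows land in two pages and others in none. The Python sketch below simulates that effect; `paged_fetch` and its parameters are hypothetical and do not reproduce the SQL that GenerateTableFetch actually generates.

```python
import random
from typing import List


def paged_fetch(total_rows: int, partition_size: int,
                shuffle: bool = True, seed: int = 0) -> List[int]:
    """Simulate fetching a table in LIMIT/OFFSET pages, one query per page.

    Row ids 0..total_rows-1 stand in for source rows. With shuffle=True each
    query sees the rows in a fresh random order, modelling a database that
    does not guarantee a stable order between queries (an assumption).
    """
    rng = random.Random(seed)
    fetched: List[int] = []
    for offset in range(0, total_rows, partition_size):
        order = list(range(total_rows))
        if shuffle:
            rng.shuffle(order)  # order differs from one query to the next
        fetched.extend(order[offset:offset + partition_size])
    return fetched


source_count = 1000
for shuffle in (False, True):
    rows = paged_fetch(source_count, partition_size=100, shuffle=shuffle)
    distinct = len(set(rows))
    # The total fetched always equals the source count; only the mix changes.
    print(f"shuffle={shuffle}: fetched={len(rows)}, "
          f"surplus copies={len(rows) - distinct}, "
          f"never fetched={source_count - distinct}")
```

Under this model the surplus copies always equal the number of rows never fetched, so matching counts alone do not rule out missing data. Whether it applies here depends on the GenerateTableFetch configuration, which is what the reply asks about.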
