Hi, I need some advice on how to implement the following use case.
I read a dataset that is 1+ TB in size and has 1000+ columns. Only 3 of these columns contain PII, and I need to call the Google DLP API on them. I want to select just those 3 columns and submit only them to the DLP API. Once I get the results back from DLP, I want to replace those 3 columns in my original dataset. I don't have a UUID for each row, so I cannot join the original data (1000+ columns) with the processed data (3 columns). Any suggestions on how to implement this?

Thanks,
Aniruddh
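
P.S. To make the shape of the problem concrete, here is a rough Python sketch of one direction I am considering: process the data in batches, send only the 3 PII cells of each row to the service, and write the results back by position within the same batch, so no join key is needed. Everything in it is an assumption for illustration: the column names are made up, `fake_deidentify` is a hypothetical stand-in and not a real DLP client call, and the approach only works if the service returns the rows of a batch in the same order they were sent.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]

# Hypothetical names for the 3 PII columns (not from the real dataset).
PII_COLUMNS = ["name", "email", "phone"]


def fake_deidentify(cells: List[List[str]]) -> List[List[str]]:
    """Stand-in for the DLP request: same shape and row order in and out."""
    return [["***" for _ in row] for row in cells]


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group an iterable of rows into lists of at most `size` rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def mask_pii(
    rows: Iterable[Row],
    deidentify: Callable[[List[List[str]]], List[List[str]]],
    batch_size: int = 2,
) -> Iterator[Row]:
    """Yield copies of the rows with only the PII columns replaced."""
    for batch in batches(rows, batch_size):
        # Send only the PII cells; the other columns never leave the batch.
        pii_cells = [[row[col] for col in PII_COLUMNS] for row in batch]
        masked = deidentify(pii_cells)
        if len(masked) != len(batch):
            raise ValueError("service returned a different number of rows")
        # Positional alignment inside the batch replaces the missing join key.
        for row, new_values in zip(batch, masked):
            updated = dict(row)
            updated.update(zip(PII_COLUMNS, new_values))
            yield updated
```

The full rows stay together the whole time and only the 3 values travel to the service, so nothing has to be joined afterwards. I am not sure whether this is a sensible pattern at 1+ TB, or whether adding a generated row ID and joining would be better.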
