Hi, I have around 2 million records stored as a Parquet file in S3. The structure is roughly:

id  data
1   abc
2   cdf
3   fas

Now I want to filter the records and keep only those whose id matches one of my required ids.
val requiredDataId = Array(1, 2)  // might grow to a few hundred ids

df.filter(col("id").isin(requiredDataId: _*))

This is my use case. What is the best way to do this in Spark 2.0.1 so that the filter is also pushed down to Parquet? Thanks and regards, Rahul
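For context, here is a minimal, self-contained sketch of what I am attempting. The S3 path, app name, and column name are placeholders; only isin and the general shape of the job reflect my actual code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object FilterByIds {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FilterByIds")  // placeholder name
      .getOrCreate()

    // Placeholder path; the real bucket and key differ.
    val df = spark.read.parquet("s3a://my-bucket/path/to/data/")

    val requiredDataId = Array(1, 2)  // might grow to a few hundred ids

    // isin builds an In predicate over the id column.
    val filtered = df.filter(col("id").isin(requiredDataId: _*))

    filtered.show()
  }
}

I understand spark.sql.parquet.filterPushdown is enabled by default, but I am not sure whether an In predicate with hundreds of values actually gets pushed down to the Parquet reader in 2.0.1, or whether there is a better-performing approach.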