Hi,
I have around 2 million records stored as a Parquet file in S3. The structure
is roughly:
id   data
1    abc
2    cdf
3    fas
Now I want to filter out only the records whose id matches one of my
required ids.

import org.apache.spark.sql.functions.col

val requiredDataId = Array(1, 2) // might go up to hundreds of ids

// isin tests the column values; contains("id") would only check the array for the string "id"
df.filter(col("id").isin(requiredDataId: _*))

This is my use case.

What is the best way to do this in Spark 2.0.1 so that the filter is also
pushed down to Parquet?
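
For reference, here is a minimal sketch of what I have in mind, assuming a
hypothetical S3 path and the schema above; explain() should list the
predicate under PushedFilters, though whether Parquet can actually use an In
filter to skip row groups may depend on the Spark version:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("FilterByIds").getOrCreate()

// Hypothetical path; the real bucket/prefix goes here.
val df = spark.read.parquet("s3a://my-bucket/my-data/")

val requiredDataId = Array(1, 2)

// isin translates to an In source filter on the id column.
val filtered = df.filter(col("id").isin(requiredDataId: _*))

// The physical plan should show something like PushedFilters: [In(id, ...)].
filtered.explain()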



Thanks and Regards,
Rahul
