Re: How to PushDown ParquetFilter Spark 2.0.1 dataframe

2017-03-30 Thread Hanumath Rao Maduri
Hello Rahul,

Please try to use df.filter(df("id").isin(1,2))

Thanks,

On Thu, Mar 30, 2017 at 10:45 PM, Rahul Nandi wrote:
> Hi,
> I have around 2 million data as parquet file in s3. The file structure is
> somewhat like
> id data
> 1 abc
> 2 cdf
> 3 fas
> Now I want to filter and take the reco
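The isin suggestion above can be extended to an ID list held in a collection, which is the situation described in the question (an array that may grow to hundreds of values). The following is a minimal sketch, not a definitive implementation: the S3 path, the object name, and the local master setting are hypothetical placeholders, and the only column assumed is the integer id column from the question. A Column expression such as isin is visible to the Catalyst optimizer, whereas a Scala closure passed to filter is opaque to it. Whether the resulting predicate is actually applied inside the Parquet reader depends on the Spark version, so the sketch prints the physical plan so the pushed filters can be inspected rather than assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical object name, used only for this sketch.
object IsinFilterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("isin-filter-sketch")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()

    // Hypothetical location; replace with the real Parquet path.
    val df = spark.read.parquet("s3a://example-bucket/path/to/parquet")

    // IDs to keep; in the question this may grow to hundreds of values.
    val requiredDataId = Array(1, 2)

    // Expand the array into the varargs expected by Column.isin.
    val filtered = df.filter(col("id").isin(requiredDataId: _*))

    // Print the physical plan; look at the Parquet scan node to see
    // which filters, if any, are reported as pushed down.
    filtered.explain()

    spark.stop()
  }
}
```

If the plan shows no pushed filter for the id predicate, it may also be worth checking that spark.sql.parquet.filterPushdown has not been disabled in the session configuration.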

How to PushDown ParquetFilter Spark 2.0.1 dataframe

2017-03-30 Thread Rahul Nandi
Hi,

I have around 2 million records stored as a Parquet file in S3. The file structure is somewhat like

id data
1 abc
2 cdf
3 fas

Now I want to filter and take the records where the id matches one of my required IDs.

val requiredDataId = Array(1,2) // Might go up to 100s of records.
df.filter(requiredDataId.conta