Re: How to PushDown ParquetFilter Spark 2.0.1 dataframe

2017-03-31 Thread Hanumath Rao Maduri
Hello Rahul, please try df.filter(df("id").isin(1, 2)). Thanks. On Thu, Mar 30, 2017 at 10:45 PM, Rahul Nandi wrote: > Hi, I have around 2 million records stored as a Parquet file in S3. The file structure is somewhat like: id | data, e.g. 1 | abc, 2 | cdf, 3 | fas. Now I want
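
A minimal sketch of the suggestion above, assuming Spark 2.0.1 with a SparkSession; the S3 path is a placeholder, not a value from the thread:

import org.apache.spark.sql.SparkSession

// Hypothetical session and input path, for illustration only.
val spark = SparkSession.builder().appName("IsinPushdownSketch").getOrCreate()
val df = spark.read.parquet("s3a://my-bucket/data.parquet")

// isin builds an In predicate; depending on the Spark version it may be pushed
// down to the Parquet reader or applied after the scan.
val filtered = df.filter(df("id").isin(1, 2))
filtered.show()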

Predicate not getting pushed down to PrunedFilteredScan

2017-03-31 Thread Hanumath Rao Maduri
Hello all, I am working on creating a new PrunedFilteredScan operator which has the ability to execute the predicates pushed to this operator. However, what I observed is that if a column deep in the hierarchy is used, the predicate is not getting pushed down. SELECT tom._id, tom.address.city from
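
For context, a bare-bones sketch of the kind of relation described above, assuming the Spark 2.x external data source API (org.apache.spark.sql.sources); the schema and scan body are placeholders. As far as I know, Spark's filter translation in this API only covers predicates on top-level columns, which would explain why a predicate on a nested field such as address.city never reaches buildScan:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types._

// Illustrative relation only; a real implementation would scan the actual source.
class ExampleRelation(override val sqlContext: SQLContext)
  extends BaseRelation with PrunedFilteredScan {

  override def schema: StructType = StructType(Seq(
    StructField("_id", StringType),
    StructField("address", StructType(Seq(
      StructField("city", StringType))))))

  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] = {
    // Log which predicates were actually pushed down by Spark.
    filters.foreach(f => println(s"pushed filter: $f"))
    sqlContext.sparkContext.emptyRDD[Row]
  }
}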

Predicate not getting pushed down to PrunedFilteredScan

2017-03-30 Thread Hanumath Rao Maduri
Hello all, I am working on creating a new PrunedFilteredScan operator which has the ability to execute the predicates pushed to this operator. However, what I observed is that if a column deep in the hierarchy is used, the predicate is not getting pushed down. SELECT tom._id, tom.address.city from

java.lang.RuntimeException: Stream '/jars/' not found

2016-12-16 Thread Hanumath Rao Maduri
Hello all, I am trying to test an application on a standalone cluster. Here is my scenario: I started a Spark master on node A and also one worker on the same node A. I am trying to run the application from node B (which, I think, means node B acts as the driver). I have added jars to the SparkConf using
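
For reference, a sketch of registering application jars on the SparkConf in this kind of setup; the master URL and jar path are placeholders, not values from the thread. Executors fetch the listed jars from the driver's file server (the '/jars/...' URLs in the error), so the jar has to exist at the given path on the driver machine and the workers must be able to reach the driver:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("JarDistributionSketch")
  .setMaster("spark://nodeA:7077")          // placeholder master URL
  .setJars(Seq("/path/on/nodeB/myapp.jar")) // must exist on the driver (node B)

val sc = new SparkContext(conf)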

Re: Spark RDD and Memory

2016-09-22 Thread Hanumath Rao Maduri
Hello Aditya, after an intermediate action has been applied you might want to call rdd.unpersist() to let Spark know that this RDD is no longer required. Thanks, -Hanu On Thu, Sep 22, 2016 at 7:54 AM, Aditya wrote: > Hi, suppose I have two RDDs: val
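
A small sketch of the suggestion, assuming plain RDDs with made-up data and names:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("UnpersistSketch").setMaster("local[*]"))

// Cache an intermediate RDD that several actions will reuse.
val rdd1 = sc.parallelize(1 to 1000000)
val rdd2 = rdd1.map(_ * 2).persist(StorageLevel.MEMORY_ONLY)

val total = rdd2.sum()
val count = rdd2.count()

// Once the intermediate result is no longer needed, release the cached blocks.
rdd2.unpersist()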