I am running the following (connecting to an external Hive Metastore) /a/shark/spark/bin/spark-shell --master spark://ip:7077 --conf *spark.sql.parquet.filterPushdown=true*
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) and then ran two queries: sqlContext.sql("select count(*) from table where partition='blah' ") andsqlContext.sql("select count(*) from table where partition='blah' and epoch=1415561604") According to the Input tab in the UI both scan about 140G of data which is the size of my whole partition. So I have two questions -- 1. is there a way to tell from the plan if a predicate pushdown is supposed to happen? I see this for the second query res0: org.apache.spark.sql.SchemaRDD = SchemaRDD[0] at RDD at SchemaRDD.scala:108 == Query Plan == == Physical Plan == Aggregate false, [], [Coalesce(SUM(PartialCount#49L),0) AS _c0#0L] Exchange SinglePartition Aggregate true, [], [COUNT(1) AS PartialCount#49L] OutputFaker [] Project [] ParquetTableScan [epoch#139L], (ParquetRelation <list of hdfs files> 2. am I doing something obviously wrong that this is not working? (Im guessing it's not woring because the input size for the second query shows unchanged and the execution time is almost 2x as long) thanks in advance for any insights