Quoting Michael:
Predicate push down into the input format is turned off by default because 
there is a bug in the current Parquet library that throws a null pointer 
exception when there are row groups that are entirely null.

https://issues.apache.org/jira/browse/SPARK-4258

You can turn it on if you want: 
http://spark.apache.org/docs/latest/sql-programming-guide.html#configuration
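
For example, here is a minimal sketch in the Spark shell (Spark 1.2-era 
SQLContext API; the Parquet path below is just a placeholder) that enables the 
flag before running the filtered query:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)  // sc: the existing SparkContext
    // Enable Parquet filter pushdown (off by default because of SPARK-4258).
    sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")

    // Register the Parquet data (placeholder path) and run the filtered query;
    // the groupId predicate can then be pushed into the Parquet reader so that
    // row groups whose statistics rule it out are skipped.
    sqlContext.parquetFile("/path/to/ad.parquet").registerTempTable("ad")
    sqlContext.sql("select adId, adTitle from ad where groupId = 10113000").collect()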

Daniel

> On Jan 7, 2015, at 08:18, Xuelin Cao <xuelin...@yahoo.com.INVALID> wrote:
> 
> 
> Hi,
> 
>        I'm testing the Parquet file format, and predicate pushdown is a very 
> useful feature for us.
> 
>        However, it looks like predicate pushdown doesn't take effect after I 
> run 
>        sqlContext.sql("SET spark.sql.parquet.filterPushdown=true")
>  
>        Here is my SQL:
>        sqlContext.sql("select adId, adTitle from ad where 
> groupId=10113000").collect
> 
>        Then, I checked the amount of input data on the Web UI. The amount of 
> input data is ALWAYS 80.2M, regardless of whether I turn the 
> spark.sql.parquet.filterPushdown flag on or off.
> 
>        I'm not sure if there is anything I must do when generating the 
> parquet file to make predicate pushdown available. (For example, with ORC 
> files, I need to explicitly sort the field that will be used for predicate 
> pushdown when creating the file.)
> 
>        Anyone have any idea?
> 
>        And, does anyone know the internal mechanism of Parquet predicate 
> pushdown?
> 
>        Thanks
> 
>  
