spark.sql.parquet.filterPushdown=true has been turned on. But after setting spark.sql.hive.convertMetastoreParquet to false, the first parameter no longer takes effect!
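For reference, a minimal spark-shell sketch (Spark SQL 1.2; the table name "test" is from the thread below, the WHERE value is made up) of how the two flags interact: filterPushdown only applies to the native ParquetTableScan path, so once convertMetastoreParquet is false and the table goes through HiveTableScan, there is nothing left for it to act on.

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)  // sc = the spark-shell SparkContext

    // Only affects the native Parquet path (ParquetTableScan):
    hiveContext.setConf("spark.sql.parquet.filterPushdown", "true")

    // Forces the table back onto the Hive path (HiveTableScan + custom InputFormat),
    // so the Parquet-specific flag above no longer applies:
    hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")

    // Check which scan the planner picks:
    hiveContext.sql("EXPLAIN SELECT * FROM test WHERE id = '1'").collect().foreach(println)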
2015-01-20 6:52 GMT+08:00 Yana Kadiyska <yana.kadiy...@gmail.com>:

> If you're talking about filter pushdowns for parquet files, this also has
> to be turned on explicitly. Try spark.sql.parquet.filterPushdown=true.
> It's off by default.
>
> On Mon, Jan 19, 2015 at 3:46 AM, Xiaoyu Wang <wangxy...@gmail.com> wrote:
>
>> Yes, it works!
>> But the filter can't be pushed down!
>>
>> Should a custom Parquet InputFormat implement only the data source API?
>>
>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala
>>
>> 2015-01-16 21:51 GMT+08:00 Xiaoyu Wang <wangxy...@gmail.com>:
>>
>>> Thanks Yana!
>>> I will try it!
>>>
>>> On Jan 16, 2015, at 20:51, yana <yana.kadiy...@gmail.com> wrote:
>>>
>>> I think you might need to set
>>> spark.sql.hive.convertMetastoreParquet to false, if I understand that
>>> flag correctly.
>>>
>>> -------- Original message --------
>>> From: Xiaoyu Wang
>>> Date: 01/16/2015 5:09 AM (GMT-05:00)
>>> To: user@spark.apache.org
>>> Subject: Why does a custom parquet format hive table execute the "ParquetTableScan"
>>> physical plan, not "HiveTableScan"?
>>>
>>> Hi all!
>>>
>>> In Spark SQL 1.2.0,
>>> I created a Hive table with a custom Parquet InputFormat and OutputFormat,
>>> like this:
>>>
>>> CREATE TABLE test(
>>>   id string,
>>>   msg string)
>>> CLUSTERED BY (
>>>   id)
>>> SORTED BY (
>>>   id ASC)
>>> INTO 10 BUCKETS
>>> ROW FORMAT SERDE
>>>   'com.a.MyParquetHiveSerDe'
>>> STORED AS INPUTFORMAT
>>>   'com.a.MyParquetInputFormat'
>>> OUTPUTFORMAT
>>>   'com.a.MyParquetOutputFormat';
>>>
>>> And in the spark shell, the plan of "select * from test" is:
>>>
>>> [== Physical Plan ==]
>>> [!OutputFaker [id#5,msg#6]]
>>> [ ParquetTableScan [id#12,msg#13], (ParquetRelation
>>> hdfs://hadoop/user/hive/warehouse/test.db/test, Some(Configuration:
>>> core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
>>> yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml),
>>> org.apache.spark.sql.hive.HiveContext@6d15a113, []), []]
>>>
>>> Not HiveTableScan!
>>> So it doesn't execute my custom InputFormat!
>>> Why? How can I make it execute my custom InputFormat?
>>>
>>> Thanks!
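On the data source API question above: the interfaces.scala file linked in the thread defines, in Spark 1.2, the PrunedFilteredScan contract through which Spark hands pushed-down column lists and filters to a relation. A rough, illustrative sketch follows; all class names, the "path" option, and the hard-coded schema are assumptions for the example, not anything from the thread.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql._
    import org.apache.spark.sql.sources._

    // Entry point that the data source API calls to build the relation.
    class MyParquetSource extends RelationProvider {
      override def createRelation(
          sqlContext: SQLContext,
          parameters: Map[String, String]): BaseRelation =
        MyParquetRelation(parameters("path"))(sqlContext)
    }

    // In Spark 1.2, PrunedFilteredScan is an abstract class extending BaseRelation.
    case class MyParquetRelation(path: String)(@transient val sqlContext: SQLContext)
      extends PrunedFilteredScan {

      // Schema of the underlying files; hard-coded here to keep the sketch short.
      override def schema: StructType = StructType(Seq(
        StructField("id", StringType, nullable = true),
        StructField("msg", StringType, nullable = true)))

      // Spark passes the required columns and the filters it can push down
      // (e.g. EqualTo("id", "1")); translate them into Parquet predicates here.
      override def buildScan(requiredColumns: Array[String],
                             filters: Array[Filter]): RDD[Row] = {
        // ... read `path` with the custom reader, applying `filters` ...
        sqlContext.sparkContext.parallelize(Seq.empty[Row])
      }
    }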