When I run spark.read.orc("hdfs://test").filter("conv_date = 20181025").count with spark.sql.orc.filterPushdown=true, I see the following in the executor logs, so predicate pushdown is happening:
18/11/01 17:31:17 INFO OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL conv_date) leaf-1 = (EQUALS conv_date 20181025) expr = (and (not leaf-0) leaf-1)

But when I run the same query through a Hive ORC table in Spark:

spark.sql("select * from test where conv_date = 20181025").count

I see the logs below. No pushdown predicate is logged, and the reader scans the file for its full length:

18/11/01 17:37:57 INFO HadoopRDD: Input split: hdfs://test/test1.orc:0+34568
18/11/01 17:37:57 INFO OrcRawRecordMerger: min key = null, max key = null
18/11/01 17:37:57 INFO ReaderImpl: Reading ORC rows from hdfs://test/test1.orc with {include: [true, false, false, false, true, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false], offset: 0, length: 9223372036854775807}
18/11/01 17:37:57 INFO Executor: Finished task 224.0 in stage 0.0 (TID 33). 1662 bytes result sent to driver
18/11/01 17:37:57 INFO CoarseGrainedExecutorBackend: Got assigned task 40
18/11/01 17:37:57 INFO Executor: Running task 956.0 in stage 0.0 (TID 40)
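For what it's worth, a minimal sketch of the configuration I am experimenting with. This is an assumption on my part, not a confirmed fix: the OrcRawRecordMerger/ReaderImpl lines suggest the Hive table is being read through the Hive SerDe path, which does not honor spark.sql.orc.filterPushdown, whereas spark.sql.hive.convertMetastoreOrc=true should route the read through Spark's native ORC data source (the same path spark.read.orc uses):

// Sketch only -- assumes Spark 2.3+, where spark.sql.orc.impl and
// spark.sql.hive.convertMetastoreOrc are available.
val spark = org.apache.spark.sql.SparkSession.builder()
  .appName("orc-pushdown-test")
  .enableHiveSupport()
  // Enable ORC predicate pushdown (works for the native reader).
  .config("spark.sql.orc.filterPushdown", "true")
  // Use the native ORC reader rather than the Hive 1.2.1 one.
  .config("spark.sql.orc.impl", "native")
  // Convert Hive metastore ORC tables to the native data source,
  // so filterPushdown also applies to spark.sql(...) on Hive tables.
  .config("spark.sql.hive.convertMetastoreOrc", "true")
  .getOrCreate()

spark.sql("select * from test where conv_date = 20181025").count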