Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20265#discussion_r161431971

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala ---
    @@ -483,6 +484,64 @@ object OrcReadBenchmark {
         }
       }

    +  def filterPushDownBenchmark(values: Int, width: Int): Unit = {
    +    val benchmark = new Benchmark(s"Filter Pushdown", values)
    +
    +    withTempPath { dir =>
    +      withTempTable("t1", "nativeOrcTable", "hiveOrcTable") {
    +        import spark.implicits._
    +        val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
    +        val whereExpr = (1 to width).map(i => s"NOT c$i LIKE '%not%exist%'").mkString(" AND ")
    --- End diff --

    Oh sorry, I missed the `uniqueID` part. So the `LIKE` operation is just there to make the difference larger? We don't need to do this; a simple predicate like `col = 1` or `col < 1` is enough to show how much predicate pushdown normally improves performance.
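As a minimal sketch of the reviewer's suggestion, the `whereExpr` from the diff could be built from a simple comparison predicate instead of the `LIKE` pattern. The object and method names below are hypothetical (not from the PR); only the `c$i` column naming and the `width` parameter come from the diff above:

```scala
// Hypothetical sketch: replace the `NOT c$i LIKE '%not%exist%'` predicates
// with the simple comparison the reviewer suggests (e.g. `col < 1`).
object PredicateSketch {
  // Builds a WHERE clause over columns c1..cN using a trivial predicate,
  // which is enough to exercise ORC predicate pushdown in a benchmark.
  def simpleWhereExpr(width: Int): String =
    (1 to width).map(i => s"c$i < 1").mkString(" AND ")

  def main(args: Array[String]): Unit = {
    // For width = 3 this prints: c1 < 1 AND c2 < 1 AND c3 < 1
    println(simpleWhereExpr(3))
  }
}
```

The resulting string would be passed to the benchmark's SQL query in place of the `LIKE`-based `whereExpr` shown in the diff.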