cloud-fan commented on a change in pull request #24068: [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion
URL: https://github.com/apache/spark/pull/24068#discussion_r293209902
########## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ##########

@@ -362,6 +394,13 @@ object FilterPushdownBenchmark extends BenchmarkBase with SQLHelper {
     }

     runBenchmark(s"Pushdown benchmark with many filters") {
+      // This benchmark and the next one are similar in that they both test predicate pushdown
+      // where the filter itself is very large. There have been cases where the filter conversion
+      // would take minutes to hours for large filters due to it being implemented with exponential
+      // complexity in the height of the filter tree.
+      // The difference between these two benchmarks is that this one benchmarks pushdown with a
+      // large string filter (`a AND b AND c ...`), whereas the next one benchmarks pushdown with
+      // a large Column-based filter (`col(a) || (col(b) || (col(c)...))`).

Review comment:
   I still can't get it. Both the string filter and the column-based filter become an `Expression` in the `Filter` operator. The differences I see are:
   1. the new benchmark builds a larger filter
   2. the new benchmark uses `Or` instead of `And`
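For context on the exponential-complexity claim in the benchmark comment: SPARK-27105 describes filter conversion that re-walks each subtree once to check convertibility and again to build the result, which is exponential in the height of the filter tree. A minimal, Spark-free Scala sketch of that cost pattern (all names here are illustrative, not Spark's actual API):

```scala
// Illustrative only: models the cost of converting a predicate tree such as
// `a AND b AND c ...` when each node triggers a trial pass plus a build pass.
sealed trait Pred
case class Leaf(name: String) extends Pred
case class And(left: Pred, right: Pred) extends Pred

object ConversionCost {
  // Naive scheme: at every And node, first do a trial conversion of both
  // children (to check convertibility), then convert them again for real.
  // Each level doubles the work, so leaf visits grow as O(2^height).
  def naiveVisits(p: Pred): Long = p match {
    case Leaf(_) => 1L
    case And(l, r) =>
      val trial = naiveVisits(l) + naiveVisits(r) // convertibility check
      val build = naiveVisits(l) + naiveVisits(r) // actual conversion
      trial + build
  }

  // Single-pass scheme: convert each subtree exactly once, O(n) leaf visits.
  def singlePassVisits(p: Pred): Long = p match {
    case Leaf(_)   => 1L
    case And(l, r) => singlePassVisits(l) + singlePassVisits(r)
  }

  // Build a left-deep chain `p1 AND p2 AND ... AND pn`, height n - 1.
  def chain(n: Int): Pred =
    (2 to n).foldLeft(Leaf("p1"): Pred)((acc, i) => And(acc, Leaf(s"p$i")))
}
```

On a 20-leaf chain, `naiveVisits` already does over a million leaf visits while `singlePassVisits` does 20, which matches the "minutes to hours" behavior the benchmark comment describes for very large filters.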