IvanVergiliev commented on issue #24068: [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion
URL: https://github.com/apache/spark/pull/24068#issuecomment-503021377

@cloud-fan I reverted the benchmark changes and updated the benchmark in the PR description so it only runs the filter conversion. It's even nicer to look at now: the old version very visibly doubles on each iteration, while the new one takes roughly 0 ms for each size.

I don't remember whether we reached a conclusion on two options. One is to bring this PR back to a state where it has separate `filter` and `build` methods, so the follow-up changes are easier. The other is to merge it as is and do all further changes in follow-ups. Let me know what you'd like me to do.

There's one minor caveat with the benchmark code: the `OrcFilters` class is marked `private[sql]`, so I needed to do some Scala gymnastics to let the benchmark call it directly. This is the full version of the code:

```scala
// Paste this into spark-shell using the `-raw` flag so it gets interpreted as a Scala file,
// and so that we can trick spark-shell into thinking our class is actually in the `sql`
// package and can thus access `OrcFilters`.
// After pasting, press Ctrl-D to end paste mode, then run:
//   org.apache.spark.sql.OrcFiltersTest.foo()
:paste -raw
package org.apache.spark.sql

import org.apache.spark.sql.execution.datasources.orc._
import org.apache.spark.sql.types._
import org.apache.spark.sql.sources._

object OrcFiltersTest {
  def foo(): Unit = {
    val schema = StructType.fromDDL("col INT")
    (20 to 30).foreach { width =>
      // Left-deep chain: Or(Or(Or(col = 1, col = 2), col = 3), ...)
      val whereFilter = (1 to width).map(i => EqualTo("col", i)).reduceLeft(Or)
      val start = System.currentTimeMillis()
      OrcFilters.createFilter(schema, Seq(whereFilter))
      println(s"With $width filters, conversion takes ${System.currentTimeMillis() - start} ms")
    }
  }
}
```
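To see why the old numbers double with every extra filter, here is a simplified cost model (a sketch, not Spark code). It assumes the old conversion handles each `Or` node by converting both children once just to check that they are convertible, and then converting them again to build the result. It also assumes the input has the left-deep shape that `reduceLeft(Or)` produces in the benchmark. The names `Pred`, `Leaf`, `OrNode`, `convertTwice` and `convertOnce` are hypothetical. Instead of building ORC search arguments, the model only counts node visits.

```scala
// Hypothetical toy model, not Spark's OrcFilters API: it only counts how many
// node visits a recursive converter makes on a left-deep Or tree.
sealed trait Pred
final case class Leaf(value: Int) extends Pred
final case class OrNode(left: Pred, right: Pred) extends Pred

object ConversionCost {
  // Same left-deep shape as `(1 to width).map(...).reduceLeft(Or)` in the benchmark.
  def leftDeepOr(width: Int): Pred =
    (1 to width).map(i => Leaf(i): Pred).reduceLeft[Pred]((l, r) => OrNode(l, r))

  // Assumed "check, then build" scheme: each child is converted twice,
  // once to test convertibility and once to produce the result.
  def convertTwice(p: Pred): Long = p match {
    case Leaf(_) => 1L
    case OrNode(l, r) =>
      val check = convertTwice(l) + convertTwice(r)
      val build = convertTwice(l) + convertTwice(r)
      1L + check + build
  }

  // Single-pass scheme: each child is converted exactly once.
  def convertOnce(p: Pred): Long = p match {
    case Leaf(_)      => 1L
    case OrNode(l, r) => 1L + convertOnce(l) + convertOnce(r)
  }

  def main(args: Array[String]): Unit = {
    (20 to 24).foreach { width =>
      val tree = leftDeepOr(width)
      println(s"width=$width twice=${convertTwice(tree)} once=${convertOnce(tree)}")
    }
  }
}
```

In this model, a left-deep tree with `width` leaves costs 2^(width+1) - 3 visits under `convertTwice` but only 2 * width - 1 under `convertOnce`. Each extra filter therefore roughly doubles the work in the first scheme, which is consistent with the 2x-per-iteration growth the benchmark shows for the old version.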
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
