IvanVergiliev edited a comment on issue #24068: [SPARK-27105][SQL] Optimize 
away exponential complexity in ORC predicate conversion
URL: https://github.com/apache/spark/pull/24068#issuecomment-503021377
 
 
   @cloud-fan I reverted the benchmark changes and updated the benchmark in 
the PR description so that it only runs the filter conversion. The result is 
even clearer now: the old version's time visibly doubles with each additional 
filter, while the new one takes roughly 0 ms at every size.
   
   I don't remember if we came to a conclusion on whether I should bring this 
PR back into a state where it has separate `filter` and `build` methods so the 
followup changes are easier, or if we should merge it as is and do all changes 
in followups. Let me know what you'd like me to do.
   
   There's one minor caveat with the benchmark code. The `OrcFilters` class is 
marked `private[sql]`, so I needed to do some Scala gymnastics to let the 
benchmark call it directly. This is the full version of the code:
   
   ```scala
   // Paste this into spark-shell using the `-raw` flag so it gets interpreted
   // as a Scala file, which tricks spark-shell into thinking our class is
   // actually in the `sql` package and can thus access `OrcFilters`.
   :paste -raw
   
   package org.apache.spark.sql
   
   import org.apache.spark.sql.execution.datasources.orc._
   import org.apache.spark.sql.types._
   import org.apache.spark.sql.sources._
   object OrcFiltersTest {
     def foo(): Unit = {
       val schema = StructType.fromDDL("col INT")
       (20 to 30).foreach { width =>
         val whereFilter = (1 to width).map(i => EqualTo("col", i)).reduceLeft(Or)
         val start = System.currentTimeMillis()
         OrcFilters.createFilter(schema, Seq(whereFilter))
         println(s"With $width filters, conversion takes ${System.currentTimeMillis() - start} ms")
       }
     }
   }
   ```
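
As a rough illustration of why the old timings double, here is a toy model that only counts node visits. It assumes the doubling comes from converting each child twice (once to check that it is convertible, once to actually build it); that is an assumption for illustration, not a copy of the `OrcFilters` code. `Pred`, `Leaf`, `OrNode`, and `ConversionCost` are hypothetical names and are not Spark or ORC APIs.

```scala
// Toy model only: none of these types exist in Spark or ORC.
sealed trait Pred
final case class Leaf(value: Int) extends Pred
final case class OrNode(left: Pred, right: Pred) extends Pred

object ConversionCost {
  // Assumed naive scheme: each child is converted once as a convertibility
  // probe and then again to build the result. Returns the number of visits.
  def naiveVisits(p: Pred): Long = p match {
    case Leaf(_) => 1L
    case OrNode(l, r) =>
      val probe = naiveVisits(l) + naiveVisits(r) // "can this be converted?"
      val build = naiveVisits(l) + naiveVisits(r) // actual construction
      1L + probe + build
  }

  // Single-pass scheme: every node is visited exactly once.
  def singlePassVisits(p: Pred): Long = p match {
    case Leaf(_) => 1L
    case OrNode(l, r) => 1L + singlePassVisits(l) + singlePassVisits(r)
  }

  // Same shape as the benchmark filter: a left-deep chain of `width` leaves.
  def leftDeepOr(width: Int): Pred =
    (1 to width).map(i => Leaf(i): Pred).reduceLeft[Pred]((a, b) => OrNode(a, b))

  def main(args: Array[String]): Unit =
    (1 to 10).foreach { width =>
      val tree = leftDeepOr(width)
      println(s"width=$width naive=${naiveVisits(tree)} singlePass=${singlePassVisits(tree)}")
    }
}
```

In this model the naive count on a left-deep chain follows T(w) = 2·T(w-1) + 3, so it roughly doubles for every extra leaf, while the single-pass count grows linearly as 2w - 1.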

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
