GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20265#discussion_r161431971
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala ---
    @@ -483,6 +484,64 @@ object OrcReadBenchmark {
         }
       }
     
    +  def filterPushDownBenchmark(values: Int, width: Int): Unit = {
    +    val benchmark = new Benchmark(s"Filter Pushdown", values)
    +
    +    withTempPath { dir =>
    +      withTempTable("t1", "nativeOrcTable", "hiveOrcTable") {
    +        import spark.implicits._
    +        val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
    +        val whereExpr = (1 to width).map(i => s"NOT c$i LIKE '%not%exist%'").mkString(" AND ")
    --- End diff --
    
    Oh, sorry, I missed the `uniqueID` part. So the `LIKE` operation is just there
    to make the difference larger? We don't need to do that. A simple predicate
    such as `col = 1` or `col < 1` is enough to show how much predicate pushdown
    (PPD) normally improves performance.
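    To illustrate why a plain comparison is enough, here is a toy model of
    min/max-based skipping. It is a hypothetical sketch, not ORC's reader or
    Spark's `OrcFilters`: each stripe keeps the min and max of `col`, and a reader
    can skip any stripe whose range rules out `col = 1` or `col < 1`. The names
    `StripeStats`, `EqualTo`, `LessThan`, `stripesToRead` and `sequentialStripes`
    are made up for this example.

```scala
// Hypothetical model of ORC-style stripe statistics: each stripe records the
// min and max of column `col` over the rows it contains.
final case class StripeStats(min: Int, max: Int)

object StripeSkippingSketch {
  // Simple comparison predicates that can be checked against min/max stats.
  sealed trait Predicate
  final case class EqualTo(value: Int) extends Predicate
  final case class LessThan(value: Int) extends Predicate

  // True if the stripe *might* contain a matching row; false if the min/max
  // statistics prove that no row can match, so the stripe can be skipped.
  def mightMatch(stats: StripeStats, p: Predicate): Boolean = p match {
    case EqualTo(v)  => stats.min <= v && v <= stats.max
    case LessThan(v) => stats.min < v
  }

  // Indices of the stripes that still have to be read after pushdown.
  def stripesToRead(stripes: Seq[StripeStats], p: Predicate): Seq[Int] =
    stripes.indices.filter(i => mightMatch(stripes(i), p))

  // Stats for `numRows` sequential values (0, 1, 2, ...) split into stripes
  // of `rowsPerStripe` rows; the last stripe may be shorter.
  def sequentialStripes(numRows: Int, rowsPerStripe: Int): Seq[StripeStats] =
    (0 until numRows by rowsPerStripe).map { start =>
      StripeStats(start, math.min(start + rowsPerStripe, numRows) - 1)
    }
}
```

    With 1,000 sequential values in stripes of 100 rows, `col = 1` leaves only
    the first of ten stripes to read, so the measured speedup comes from skipped
    data rather than from an artificially expensive filter expression.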

