[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

mshtelma Sat, 21 Apr 2018 03:31:46 -0700

Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183206628
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +             FROM tbl t1
    +             JOIN tbl t2 on t1.id=t2.id
    +             WHERE  t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    +        df2.createTempView("TBL2")
    +        spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
    --- End diff --
    
    @wzhfy suggested calling explain() to trigger query optimization and, with it, a call to the FilterEstimation.evaluateInSet method.
    I could call collect() instead, but I think explain() is sufficient for this test.


