[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183826745
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
 ---
@@ -392,6 +392,10 @@ case class FilterEstimation(plan: Filter) extends Logging {
 val dataType = attr.dataType
 var newNdv = ndv
 
+if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty)  {
--- End diff --

Yeah, we need to correct it in the next PR


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183685527
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
 ---
@@ -392,6 +392,10 @@ case class FilterEstimation(plan: Filter) extends Logging {
 val dataType = attr.dataType
 var newNdv = ndv
 
+if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty)  {
--- End diff --

Why does `colStat.min.isEmpty || colStat.max.isEmpty` mean empty output? String columns never have a max/min.
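
To illustrate the point, here is a minimal stand-alone sketch (`ColumnStat` is simplified here, and `estimatesEmptyOutput` is a hypothetical helper, not Spark's actual code): a string column with many rows still has `min = max = None`, so a guard that treats missing min/max as "empty output" would mis-estimate it, while checking only the distinct count would not.

```scala
// Simplified stand-in for Spark's ColumnStat: min/max are Options because
// some types (strings in particular) never collect them.
case class ColumnStat(distinctCount: BigInt, min: Option[Any], max: Option[Any])

// Hypothetical safer guard: flag "empty output" only when the column has
// no distinct values, regardless of whether min/max were collected.
def estimatesEmptyOutput(stat: ColumnStat): Boolean = stat.distinctCount == 0

val stringCol = ColumnStat(BigInt(1000), None, None) // populated string column
val emptyCol  = ColumnStat(BigInt(0), None, None)    // genuinely empty column

assert(!estimatesEmptyOutput(stringCol)) // not flagged empty despite no min/max
assert(estimatesEmptyOutput(emptyCol))
```

A min/max-based check would have flagged `stringCol` as empty even though it holds 1000 distinct values.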


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21052


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-21 Thread mshtelma
Github user mshtelma commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183220650
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,32 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+  withTable("TBL1", "TBL") {
+import org.apache.spark.sql.functions._
+val df = spark.range(1000L).select('id,
+  'id * 2 as "FLD1",
+  'id * 12 as "FLD2",
+  lit("aaa") + 'id as "fld3")
+df.write
+  .mode(SaveMode.Overwrite)
+  .bucketBy(10, "id", "FLD1", "FLD2")
+  .sortBy("id", "FLD1", "FLD2")
+  .saveAsTable("TBL")
+sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+val df2 = spark.sql(
+  """
+ SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+ FROM tbl t1
+ JOIN tbl t2 on t1.id=t2.id
+ WHERE  t1.fld3 IN (-123.23,321.23)
+  """.stripMargin)
+df2.createTempView("TBL2")
+sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
--- End diff --

done


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-21 Thread mshtelma
Github user mshtelma commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183220647
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,32 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+  withTable("TBL1", "TBL") {
+import org.apache.spark.sql.functions._
+val df = spark.range(1000L).select('id,
+  'id * 2 as "FLD1",
+  'id * 12 as "FLD2",
+  lit("aaa") + 'id as "fld3")
+df.write
+  .mode(SaveMode.Overwrite)
+  .bucketBy(10, "id", "FLD1", "FLD2")
+  .sortBy("id", "FLD1", "FLD2")
+  .saveAsTable("TBL")
+sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+val df2 = spark.sql(
+  """
+ SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+ FROM tbl t1
+ JOIN tbl t2 on t1.id=t2.id
+ WHERE  t1.fld3 IN (-123.23,321.23)
+  """.stripMargin)
--- End diff --

done


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183219812
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,32 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+  withTable("TBL1", "TBL") {
+import org.apache.spark.sql.functions._
+val df = spark.range(1000L).select('id,
+  'id * 2 as "FLD1",
+  'id * 12 as "FLD2",
+  lit("aaa") + 'id as "fld3")
+df.write
+  .mode(SaveMode.Overwrite)
+  .bucketBy(10, "id", "FLD1", "FLD2")
+  .sortBy("id", "FLD1", "FLD2")
+  .saveAsTable("TBL")
+sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+val df2 = spark.sql(
+  """
+ SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+ FROM tbl t1
+ JOIN tbl t2 on t1.id=t2.id
+ WHERE  t1.fld3 IN (-123.23,321.23)
+  """.stripMargin)
--- End diff --

Nit:
```Scala
  """
|SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
|FROM tbl t1
|JOIN tbl t2 on t1.id=t2.id
|WHERE  t1.fld3 IN (-123.23,321.23)
  """.stripMargin)
```


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183219803
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,32 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+  withTable("TBL1", "TBL") {
+import org.apache.spark.sql.functions._
+val df = spark.range(1000L).select('id,
+  'id * 2 as "FLD1",
+  'id * 12 as "FLD2",
+  lit("aaa") + 'id as "fld3")
+df.write
+  .mode(SaveMode.Overwrite)
+  .bucketBy(10, "id", "FLD1", "FLD2")
+  .sortBy("id", "FLD1", "FLD2")
+  .saveAsTable("TBL")
+sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+val df2 = spark.sql(
+  """
+ SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+ FROM tbl t1
+ JOIN tbl t2 on t1.id=t2.id
+ WHERE  t1.fld3 IN (-123.23,321.23)
+  """.stripMargin)
+df2.createTempView("TBL2")
+sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
--- End diff --

Please do not use `explain()`. It will output the strings to the console. 
You can just do this:
```
sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')").queryExecution.executedPlan
```


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-21 Thread mshtelma
Github user mshtelma commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183206908
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(("spark.sql.cbo.enabled", "true")) {
+  withTable("TBL1", "TBL") {
+import org.apache.spark.sql.functions._
+val df = spark.range(1000L).select('id,
+  'id * 2 as "FLD1",
+  'id * 12 as "FLD2",
+  lit("aaa") + 'id as "fld3")
+df.write
+  .mode(SaveMode.Overwrite)
+  .bucketBy(10, "id", "FLD1", "FLD2")
+  .sortBy("id", "FLD1", "FLD2")
+  .saveAsTable("TBL")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
--- End diff --

done


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-21 Thread mshtelma
Github user mshtelma commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183206916
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(("spark.sql.cbo.enabled", "true")) {
+  withTable("TBL1", "TBL") {
+import org.apache.spark.sql.functions._
+val df = spark.range(1000L).select('id,
+  'id * 2 as "FLD1",
+  'id * 12 as "FLD2",
+  lit("aaa") + 'id as "fld3")
+df.write
+  .mode(SaveMode.Overwrite)
+  .bucketBy(10, "id", "FLD1", "FLD2")
+  .sortBy("id", "FLD1", "FLD2")
+  .saveAsTable("TBL")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+val df2 = spark.sql(
+  """
+ SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+ FROM tbl t1
+ JOIN tbl t2 on t1.id=t2.id
+ WHERE  t1.fld3 IN (-123.23,321.23)
+  """.stripMargin)
+df2.createTempView("TBL2")
+spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
+  }
+}
+
--- End diff --

done


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-21 Thread mshtelma
Github user mshtelma commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183206919
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(("spark.sql.cbo.enabled", "true")) {
--- End diff --

done


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-21 Thread mshtelma
Github user mshtelma commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183206913
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(("spark.sql.cbo.enabled", "true")) {
+  withTable("TBL1", "TBL") {
+import org.apache.spark.sql.functions._
+val df = spark.range(1000L).select('id,
+  'id * 2 as "FLD1",
+  'id * 12 as "FLD2",
+  lit("aaa") + 'id as "fld3")
+df.write
+  .mode(SaveMode.Overwrite)
+  .bucketBy(10, "id", "FLD1", "FLD2")
+  .sortBy("id", "FLD1", "FLD2")
+  .saveAsTable("TBL")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+val df2 = spark.sql(
+  """
+ SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+ FROM tbl t1
+ JOIN tbl t2 on t1.id=t2.id
+ WHERE  t1.fld3 IN (-123.23,321.23)
+  """.stripMargin)
+df2.createTempView("TBL2")
+spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
+  }
+}
+
+  }
+
--- End diff --

done


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-21 Thread mshtelma
Github user mshtelma commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183206628
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(("spark.sql.cbo.enabled", "true")) {
+  withTable("TBL1", "TBL") {
+import org.apache.spark.sql.functions._
+val df = spark.range(1000L).select('id,
+  'id * 2 as "FLD1",
+  'id * 12 as "FLD2",
+  lit("aaa") + 'id as "fld3")
+df.write
+  .mode(SaveMode.Overwrite)
+  .bucketBy(10, "id", "FLD1", "FLD2")
+  .sortBy("id", "FLD1", "FLD2")
+  .saveAsTable("TBL")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+val df2 = spark.sql(
+  """
+ SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+ FROM tbl t1
+ JOIN tbl t2 on t1.id=t2.id
+ WHERE  t1.fld3 IN (-123.23,321.23)
+  """.stripMargin)
+df2.createTempView("TBL2")
+spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
--- End diff --

@wzhfy suggested calling explain() in order to trigger query optimization and hence the FilterEstimation.evaluateInSet method.
I could call collect() instead, but I think explain() is sufficient for this test.


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-20 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183190383
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(("spark.sql.cbo.enabled", "true")) {
+  withTable("TBL1", "TBL") {
+import org.apache.spark.sql.functions._
+val df = spark.range(1000L).select('id,
+  'id * 2 as "FLD1",
+  'id * 12 as "FLD2",
+  lit("aaa") + 'id as "fld3")
+df.write
+  .mode(SaveMode.Overwrite)
+  .bucketBy(10, "id", "FLD1", "FLD2")
+  .sortBy("id", "FLD1", "FLD2")
+  .saveAsTable("TBL")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
--- End diff --

nit: you don't need the `spark.` prefix


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-20 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183190432
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(("spark.sql.cbo.enabled", "true")) {
+  withTable("TBL1", "TBL") {
+import org.apache.spark.sql.functions._
+val df = spark.range(1000L).select('id,
+  'id * 2 as "FLD1",
+  'id * 12 as "FLD2",
+  lit("aaa") + 'id as "fld3")
+df.write
+  .mode(SaveMode.Overwrite)
+  .bucketBy(10, "id", "FLD1", "FLD2")
+  .sortBy("id", "FLD1", "FLD2")
+  .saveAsTable("TBL")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+val df2 = spark.sql(
+  """
+ SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+ FROM tbl t1
+ JOIN tbl t2 on t1.id=t2.id
+ WHERE  t1.fld3 IN (-123.23,321.23)
+  """.stripMargin)
+df2.createTempView("TBL2")
+spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
--- End diff --

Why is this `explain()` called?


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-20 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183190234
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(("spark.sql.cbo.enabled", "true")) {
+  withTable("TBL1", "TBL") {
+import org.apache.spark.sql.functions._
+val df = spark.range(1000L).select('id,
+  'id * 2 as "FLD1",
+  'id * 12 as "FLD2",
+  lit("aaa") + 'id as "fld3")
+df.write
+  .mode(SaveMode.Overwrite)
+  .bucketBy(10, "id", "FLD1", "FLD2")
+  .sortBy("id", "FLD1", "FLD2")
+  .saveAsTable("TBL")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+val df2 = spark.sql(
+  """
+ SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+ FROM tbl t1
+ JOIN tbl t2 on t1.id=t2.id
+ WHERE  t1.fld3 IN (-123.23,321.23)
+  """.stripMargin)
+df2.createTempView("TBL2")
+spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
+  }
+}
+
+  }
+
--- End diff --

ditto


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-20 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183190221
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(("spark.sql.cbo.enabled", "true")) {
+  withTable("TBL1", "TBL") {
+import org.apache.spark.sql.functions._
+val df = spark.range(1000L).select('id,
+  'id * 2 as "FLD1",
+  'id * 12 as "FLD2",
+  lit("aaa") + 'id as "fld3")
+df.write
+  .mode(SaveMode.Overwrite)
+  .bucketBy(10, "id", "FLD1", "FLD2")
+  .sortBy("id", "FLD1", "FLD2")
+  .saveAsTable("TBL")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+val df2 = spark.sql(
+  """
+ SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+ FROM tbl t1
+ JOIN tbl t2 on t1.id=t2.id
+ WHERE  t1.fld3 IN (-123.23,321.23)
+  """.stripMargin)
+df2.createTempView("TBL2")
+spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')  ").explain()
+  }
+}
+
--- End diff --

nit: drop this line


---




[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...

2018-04-20 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21052#discussion_r183190136
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+withSQLConf(("spark.sql.cbo.enabled", "true")) {
--- End diff --

nit: `withSQLConf(SQLConf.CBO_ENABLED.key -> "true")`


---
