[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-10 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r240148392
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-    // Assume the execution plan is
-    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-    val ds = spark.range(10).sort('id)
-    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+    // Assume the execution plan with node id is
+    // Sort(nodeId = 0)
+    //   Exchange(nodeId = 1)
+    //     LocalTableScan(nodeId = 2)
+    val df = Seq(1, 3, 2).toDF("id").sort('id)
+    testSparkPlanMetrics(df, 2, Map.empty)
--- End diff --

This makes sense! Let me try. 


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r240146706
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-    // Assume the execution plan is
-    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-    val ds = spark.range(10).sort('id)
-    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+    // Assume the execution plan with node id is
+    // Sort(nodeId = 0)
+    //   Exchange(nodeId = 1)
+    //     LocalTableScan(nodeId = 2)
+    val df = Seq(1, 3, 2).toDF("id").sort('id)
+    testSparkPlanMetrics(df, 2, Map.empty)
--- End diff --

+1 for @cloud-fan's suggestion. I mean, if we cannot check their exact values, we should at least check that they exist and have reasonable values. Otherwise this UT is useless.


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r240090371
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-    // Assume the execution plan is
-    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-    val ds = spark.range(10).sort('id)
-    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+    // Assume the execution plan with node id is
+    // Sort(nodeId = 0)
+    //   Exchange(nodeId = 1)
+    //     LocalTableScan(nodeId = 2)
+    val df = Seq(1, 3, 2).toDF("id").sort('id)
+    testSparkPlanMetrics(df, 2, Map.empty)
--- End diff --

can we just check something like `sortTime > 0`?
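
For concreteness, a minimal sketch of that check (illustrative only; it assumes the test's `df` from the diff above and runs inside this ScalaTest suite, so `fail` is available):

```scala
import org.apache.spark.sql.execution.SortExec

df.collect()  // metric values are only populated once the query has executed
val sortTime = df.queryExecution.executedPlan
  .collectFirst { case s: SortExec => s.metrics("sortTime").value }
  .getOrElse(fail("no SortExec in the executed plan"))
// sortTime can legitimately be 0 ms for tiny inputs, so >= 0 is the safer bound
assert(sortTime >= 0)
```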


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-09 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r240038550
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-    // Assume the execution plan is
-    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-    val ds = spark.range(10).sort('id)
-    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+    // Assume the execution plan with node id is
+    // Sort(nodeId = 0)
+    //   Exchange(nodeId = 1)
+    //     LocalTableScan(nodeId = 2)
+    val df = Seq(1, 3, 2).toDF("id").sort('id)
+    testSparkPlanMetrics(df, 2, Map.empty)
--- End diff --

@cloud-fan This case tries to check the metrics of `SortExec`; however, these metrics (`sortTime`, `peakMemory`, `spillSize`) change each time the query is executed, so they are not fixed. So far, what I did is check whether `SortExec` exists. Do you mean we should further check whether these metric names exist, even though we can't know their values beforehand?
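
A name-presence check along those lines could look like this (sketch only; it avoids depending on the run-to-run values entirely):

```scala
import org.apache.spark.sql.execution.SortExec

val sortMetrics = df.queryExecution.executedPlan
  .collectFirst { case s: SortExec => s.metrics }
  .getOrElse(fail("no SortExec in the executed plan"))
// assert only that the metric names are registered, not their values
assert(Seq("sortTime", "peakMemory", "spillSize").forall(sortMetrics.contains))
```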


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r240026727
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-    // Assume the execution plan is
-    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-    val ds = spark.range(10).sort('id)
-    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+    // Assume the execution plan with node id is
+    // Sort(nodeId = 0)
+    //   Exchange(nodeId = 1)
+    //     LocalTableScan(nodeId = 2)
+    val df = Seq(1, 3, 2).toDF("id").sort('id)
+    testSparkPlanMetrics(df, 2, Map.empty)
--- End diff --

can we check the metrics of `SortExec` here?


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-08 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r240024723
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-    // Assume the execution plan is
-    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-    val ds = spark.range(10).sort('id)
-    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+    // Assume the execution plan with node id is
+    // Sort(nodeId = 0)
+    //   Exchange(nodeId = 1)
+    //     LocalTableScan(nodeId = 2)
+    val df = Seq(1, 3, 2).toDF("id").sort('id)
+    testSparkPlanMetrics(df, 2, Map.empty)
--- End diff --

@mgaido91 This case tries to check the metrics of `Sort` (nodeId = 0), rather than `LocalTableScan`. The second parameter (`2`) of `testSparkPlanMetrics(df, 2, Map.empty)` means `expectedNumOfJobs`, not `nodeId`. The third parameter, `expectedMetrics`, passes each `nodeId` together with its corresponding expected metrics. Because the metrics of the Sort node (including `sortTime`, `peakMemory`, `spillSize`) may change with each execution, unlike metrics such as `numOutputRows`, we have no way to check their exact values.
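
Spelled out, the helper's shape is roughly the following (signature as in `SQLMetricsTestUtils`; the commented-out `Filter` call and `filterDf` are hypothetical, shown only to illustrate a deterministic check):

```scala
// expectedMetrics: Map[nodeId -> (node name, Map[metric name -> expected value])]
// Sort's metric values vary per run, hence Map.empty in this test:
testSparkPlanMetrics(df, 2, Map.empty)

// A deterministic operator could pin exact values, e.g.:
// testSparkPlanMetrics(filterDf, 1, Map(
//   0L -> (("Filter", Map("number of output rows" -> 1L)))))
```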


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-08 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r240003036
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-    // Assume the execution plan is
-    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-    val ds = spark.range(10).sort('id)
-    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+    // Assume the execution plan with node id is
+    // Sort(nodeId = 0)
+    //   Exchange(nodeId = 1)
+    //     LocalTableScan(nodeId = 2)
+    val df = Seq(1, 3, 2).toDF("id").sort('id)
+    testSparkPlanMetrics(df, 2, Map.empty)
--- End diff --

Thanks for pinging me, @maropu. What is the point of checking that `LocalTableScan` contains no metrics?

I checked the original PR which introduced this UT, by @sameeragarwal, who can maybe help us state the goal of the test here (unless someone else can answer, because I have not understood it). It doesn't even seem related to the Sort operator to me. Maybe I am missing something.


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-07 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r239995952
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-    // Assume the execution plan is
-    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-    val ds = spark.range(10).sort('id)
-    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+    // Assume the execution plan with node id is
+    // Sort(nodeId = 0)
+    //   Exchange(nodeId = 1)
+    //     Range(nodeId = 2)
+    val df = spark.range(9, -1, -1).sort('id).toDF()
--- End diff --

Using `Seq` instead of `Range` makes things simpler and more readable. I'll change it to use `Seq`.


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-07 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r239995901
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-    // Assume the execution plan is
-    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-    val ds = spark.range(10).sort('id)
-    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+    // Assume the execution plan with node id is
+    // Sort(nodeId = 0)
+    //   Exchange(nodeId = 1)
+    //     Range(nodeId = 2)
+    val df = spark.range(9, -1, -1).sort('id).toDF()
+    testSparkPlanMetrics(df, 2, Map.empty)
+
+    df.queryExecution.executedPlan.find(_.isInstanceOf[SortExec]).getOrElse(assert(false))
--- End diff --

OK, I think this is more readable. Fixed.


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-07 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r239995882
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -26,7 +26,7 @@ import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.expressions.aggregate.{Final, Partial}
 import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
-import org.apache.spark.sql.execution.{FilterExec, RangeExec, SparkPlan, WholeStageCodegenExec}
+import org.apache.spark.sql.execution._
--- End diff --

Ok, unfolded.


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r239993927
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-    // Assume the execution plan is
-    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-    val ds = spark.range(10).sort('id)
-    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+    // Assume the execution plan with node id is
+    // Sort(nodeId = 0)
+    //   Exchange(nodeId = 1)
+    //     Range(nodeId = 2)
+    val df = spark.range(9, -1, -1).sort('id).toDF()
--- End diff --

Either is fine to me, as we now add an assert to make sure the Sort node exists.


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-07 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r239993889
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-    // Assume the execution plan is
-    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-    val ds = spark.range(10).sort('id)
-    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+    // Assume the execution plan with node id is
+    // Sort(nodeId = 0)
+    //   Exchange(nodeId = 1)
+    //     Range(nodeId = 2)
+    val df = spark.range(9, -1, -1).sort('id).toDF()
--- End diff --

Do we need to use `range` here? How about just writing `Seq(1, 3, 2, ...).toDF("id")`?


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-07 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r239993718
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-    // Assume the execution plan is
-    // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-    val ds = spark.range(10).sort('id)
-    testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+    // Assume the execution plan with node id is
+    // Sort(nodeId = 0)
+    //   Exchange(nodeId = 1)
+    //     Range(nodeId = 2)
+    val df = spark.range(9, -1, -1).sort('id).toDF()
+    testSparkPlanMetrics(df, 2, Map.empty)
+
+    df.queryExecution.executedPlan.find(_.isInstanceOf[SortExec]).getOrElse(assert(false))
--- End diff --


`assert(df.queryExecution.executedPlan.find(_.isInstanceOf[SortExec]).isDefined)`?


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-07 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r239993561
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -26,7 +26,7 @@ import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.expressions.aggregate.{Final, Partial}
 import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
-import org.apache.spark.sql.execution.{FilterExec, RangeExec, SparkPlan, WholeStageCodegenExec}
+import org.apache.spark.sql.execution._
--- End diff --

nit: unfold?


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-07 Thread seancxmao
GitHub user seancxmao opened a pull request:

https://github.com/apache/spark/pull/23258

[SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metrics while Sort is missing

## What changes were proposed in this pull request?
#20560/[SPARK-23375](https://issues.apache.org/jira/browse/SPARK-23375) introduced an optimizer rule to eliminate redundant Sorts. The test case named "Sort metrics" in `SQLMetricsSuite` uses a range, which is already sorted, so the Sort is removed by `RemoveRedundantSorts`, making the test case meaningless.

This PR modifies the query used to test Sort metrics and checks that a Sort node exists in the plan.
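
The difference is visible directly in the physical plans (a sketch, assuming a `SparkSession` named `spark` with implicits imported):

```scala
import org.apache.spark.sql.execution.SortExec
import spark.implicits._

// range output is already sorted on id, so RemoveRedundantSorts drops the
// Sort and no SortExec appears in the physical plan
val sorted = spark.range(10).sort('id)
assert(sorted.queryExecution.executedPlan.find(_.isInstanceOf[SortExec]).isEmpty)

// an unsorted local relation keeps the Sort, so its metrics can be tested
val unsorted = Seq(1, 3, 2).toDF("id").sort('id)
assert(unsorted.queryExecution.executedPlan.find(_.isInstanceOf[SortExec]).isDefined)
```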

## How was this patch tested?
Modified the existing test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/seancxmao/spark sort-metrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23258.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23258


commit 408ccf88325c23c6f2f6119cd6ba99c5056f95a2
Author: seancxmao 
Date:   2018-12-08T03:58:21Z

[SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metrics while Sort is missing




---
