[GitHub] spark pull request #23224: [SPARK-26277][SQL][TEST] WholeStageCodegen metric...

2018-12-07 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/23224#discussion_r239992863
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -80,8 +80,10 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
 // Assume the execution plan is
 // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Filter(nodeId = 1))
 // TODO: update metrics in generated operators
-val ds = spark.range(10).filter('id < 5)
-testSparkPlanMetrics(ds.toDF(), 1, Map.empty)
+val df = spark.range(10).filter('id < 5).toDF()
+testSparkPlanMetrics(df, 1, Map.empty, true)
+df.queryExecution.executedPlan.find(_.isInstanceOf[WholeStageCodegenExec])
+  .getOrElse(assert(false))
--- End diff --

Thank you @viirya. Very good suggestions.

Besides the whole-stage codegen issue, my investigation turned up a second one.
#20560/[SPARK-23375](https://issues.apache.org/jira/browse/SPARK-23375)
introduced an optimizer rule that eliminates redundant sorts. In the test case
"Sort metrics" in `SQLMetricsSuite`, the range output is already sorted, so
`RemoveRedundantSorts` removes the sort. The executed plan then contains no
Sort node, which makes the test case meaningless. This is a separate issue, so
I opened a new PR. See #23258 for details.
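
To illustrate why the sort disappears, here is a minimal, self-contained sketch. It is a toy model only: `Plan`, `RangeSource`, `Unordered`, `Sort`, `isSorted`, `removeRedundantSorts`, and `containsSort` are hypothetical names made up for this example, not Spark's classes, and the rule is a simplified stand-in rather than the actual `RemoveRedundantSorts` implementation.

```scala
// Toy model only: these types are hypothetical and are NOT Spark's plan classes.
object RedundantSortSketch {
  sealed trait Plan { def children: Seq[Plan] }

  // A range source emits ids in ascending order, so its output is already sorted.
  case class RangeSource(end: Long) extends Plan { def children: Seq[Plan] = Nil }

  // A source with no ordering guarantee (assumed here for contrast).
  case class Unordered(rows: Long) extends Plan { def children: Seq[Plan] = Nil }

  case class Sort(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }

  // Whether a plan's output is known to be sorted.
  def isSorted(plan: Plan): Boolean = plan match {
    case _: RangeSource => true
    case _: Sort        => true
    case _: Unordered   => false
  }

  // Simplified stand-in for a "remove redundant sort" rule:
  // drop a Sort whose (already rewritten) child is sorted.
  def removeRedundantSorts(plan: Plan): Plan = plan match {
    case Sort(child) =>
      val rewritten = removeRedundantSorts(child)
      if (isSorted(rewritten)) rewritten else Sort(rewritten)
    case other => other
  }

  // Pre-order search over the plan tree, similar in spirit to the
  // `executedPlan.find(...)` call in the diff above.
  def find(plan: Plan)(p: Plan => Boolean): Option[Plan] =
    if (p(plan)) Some(plan)
    else plan.children.iterator.map(c => find(c)(p)).collectFirst { case Some(x) => x }

  def containsSort(plan: Plan): Boolean = find(plan)(_.isInstanceOf[Sort]).isDefined
}
```

In this model, `containsSort(removeRedundantSorts(Sort(RangeSource(10))))` is `false`, while the same check on `Sort(Unordered(10))` is `true`. A test that only sorts a range therefore never runs a sort, and an explicit check on the plan is what would catch that.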


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23224: [SPARK-26277][SQL][TEST] WholeStageCodegen metric...

2018-12-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23224#discussion_r239837818
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -80,8 +80,10 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
 // Assume the execution plan is
 // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Filter(nodeId = 1))
 // TODO: update metrics in generated operators
-val ds = spark.range(10).filter('id < 5)
-testSparkPlanMetrics(ds.toDF(), 1, Map.empty)
+val df = spark.range(10).filter('id < 5).toDF()
+testSparkPlanMetrics(df, 1, Map.empty, true)
+df.queryExecution.executedPlan.find(_.isInstanceOf[WholeStageCodegenExec])
+  .getOrElse(assert(false))
--- End diff --

The test `Sort metrics` seems to have a similar issue:

```scala
test("Sort metrics") {
  // Assume the execution plan is
  // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
  val ds = spark.range(10).sort('id)
  testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
}
```

