seancxmao commented on a change in pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metrics while Sort is missing
URL: https://github.com/apache/spark/pull/23258#discussion_r244346086
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala
 ##########
 @@ -185,19 +190,105 @@ trait SQLMetricsTestUtils extends SQLTestUtils {
       df: DataFrame,
       expectedNumOfJobs: Int,
       expectedMetrics: Map[Long, (String, Map[String, Any])]): Unit = {
-    val optActualMetrics = getSparkPlanMetrics(df, expectedNumOfJobs, expectedMetrics.keySet)
+    val expectedMetricsPredicates = expectedMetrics.mapValues { case (nodeName, nodeMetrics) =>
+      (nodeName, nodeMetrics.mapValues(expectedMetricValue =>
+        (actualMetricValue: Any) => expectedMetricValue.toString === actualMetricValue)
+    )}
+    testSparkPlanMetricsWithPredicates(df, expectedNumOfJobs, expectedMetricsPredicates)
+  }
+
+  /**
+   * Call `df.collect()` and verify if the collected metrics satisfy the specified predicates.
+   * @param df `DataFrame` to run
+   * @param expectedNumOfJobs number of jobs that will run
+   * @param expectedMetricsPredicates the expected metrics predicates. The format is
 
 Review comment:
   Metric values are usually numbers, so predicates are a more natural fit for 
them than regular expressions, which are better suited to text matching. Simple 
metric values need no helper functions. However, timing and size metric values 
are a little more complex:
   
   * timing metric value example: "\n2.0 ms (1.0 ms, 1.0 ms, 1.0 ms)"
   * size metric value example: "\n96.2 MB (32.1 MB, 32.1 MB, 32.1 MB)"
   
   With helper functions, we first extract the stats (via the 
`timingMetricStats` or `sizeMetricStats` method) and then apply predicates to 
check them, either all stats or any single one. `timingMetricAllStatsShould` 
and `sizeMetricAllStatsShould` are not strictly required; they are syntactic 
sugar that removes boilerplate, since timing and size metrics are used 
frequently. To check a single value (e.g. sum >= 0), we can provide a predicate 
like the one below:
   ```
   timingMetricStats(_)(0)._1 >= 0
   ```
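
   To make the extraction step concrete, here is a minimal sketch of how such 
a value string could be turned into (value, unit) pairs. `MetricStatsSketch` 
and `parseMetricStats` are hypothetical names used only for illustration; this 
is not the implementation of `timingMetricStats` or `sizeMetricStats` in this 
PR, and it assumes every stat is written as a number followed by a unit.

   ```scala
   object MetricStatsSketch {
     // Matches one "<number> <unit>" pair, e.g. "2.0 ms" or "96.2 MB".
     private val ValueWithUnit = """(\d+(?:\.\d+)?)\s*([A-Za-z]+)""".r

     /** Hypothetical helper: (value, unit) pairs in order of appearance. */
     def parseMetricStats(metricStr: String): Seq[(Double, String)] =
       ValueWithUnit
         .findAllMatchIn(metricStr)
         .map(m => (m.group(1).toDouble, m.group(2)))
         .toList

     def main(args: Array[String]): Unit = {
       val stats = parseMetricStats("\n2.0 ms (1.0 ms, 1.0 ms, 1.0 ms)")
       // The first entry is the leading total; the rest are the
       // values that appear inside the parentheses.
       assert(stats.head._1 >= 0)
       assert(stats.forall { case (value, unit) => value >= 0 && unit == "ms" })
     }
   }
   ```

   With the stats in this form, a predicate on a single value or on all values 
is an ordinary function over numbers, with no text matching involved.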
   
   BTW, maybe timing and size metric values should be stored in a more 
structured way rather than as plain text (even with "\n" in the values).
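
   As a rough illustration of what "more structured" could mean, the sketch 
below models one metric as a small case class. The names 
`StructuredMetricSketch`, `MetricStats`, `total`, `others`, and 
`allStatsShould` are all hypothetical; nothing like this exists in Spark as far 
as this comment is concerned, and the field layout is only an assumption.

   ```scala
   object StructuredMetricSketch {
     // Hypothetical structured form of a timing or size metric: the total
     // plus the values shown in parentheses, all expressed in one unit.
     final case class MetricStats(total: Double, others: Seq[Double], unit: String) {
       /** True if the predicate holds for the total and every other value. */
       def allStatsShould(p: Double => Boolean): Boolean =
         (total +: others).forall(p)
     }

     def main(args: Array[String]): Unit = {
       val timing = MetricStats(2.0, Seq(1.0, 1.0, 1.0), "ms")
       assert(timing.total >= 0)          // check a single value
       assert(timing.allStatsShould(_ >= 0)) // check every value
     }
   }
   ```

   With a representation along these lines, tests could apply predicates 
directly to the fields and would not need to parse display strings at all.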
