HyukjinKwon commented on a change in pull request #26127: [SPARK-29348][SQL] Add observable Metrics for Streaming queries
URL: https://github.com/apache/spark/pull/26127#discussion_r361886972
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
 ##########
 @@ -106,6 +106,9 @@ class QueryExecution(
   lazy val toRdd: RDD[InternalRow] = new SQLExecutionRDD(
     executedPlan.execute(), sparkSession.sessionState.conf)
 
+  /** Get the metrics observed during the execution of the query plan. */
 +  def observedMetrics: Map[String, Row] = CollectMetricsExec.collect(executedPlan)
 
 Review comment:
   I think `queryExecution` is exposed in `Dataset` as an unstable API, and the comments in the class imply that as well:
   
   >  * The primary workflow for executing relational queries using Spark. Designed to allow easy
   >  * access to the intermediate phases of query execution for developers.
   >  *
   >  * While this is not a public class, we should avoid changing the function names for the sake of
   >  * changing them, because a lot of developers use the feature for debugging.
   
   I agree that using methods from this class here is discouraged, though. Maybe we have to mark `Dataset.observe` as an unstable API or a developer API for now, too, if it is difficult to avoid adding and using an API here.
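
   For reference, here is a minimal sketch of how the new API could be exercised end to end, assuming `Dataset.observe` and `QueryExecution.observedMetrics` land as shown in the diff above; the object name and the metric/observation names are just illustrative.

   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.{count, lit, sum}

   object ObservedMetricsSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[*]")
         .appName("ObservedMetricsSketch")
         .getOrCreate()
       import spark.implicits._

       // Register a named observation on the Dataset; the aggregate expressions
       // are evaluated over the rows that flow through this point of the plan.
       val observed = Seq(1L, 2L, 3L, 4L).toDF("value")
         .observe("my_event", count(lit(1)).as("rows"), sum($"value").as("total"))

       // The metrics only exist after the query has actually run.
       observed.collect()

       // observedMetrics (the method added in this diff) collects the result rows
       // from the CollectMetricsExec nodes in the executed plan, keyed by name.
       observed.queryExecution.observedMetrics.get("my_event").foreach { row =>
         println(s"rows=${row.getAs[Long]("rows")}, total=${row.getAs[Long]("total")}")
       }

       spark.stop()
     }
   }
   ```

   For streaming queries the same values would presumably surface through listener events rather than by calling this method directly, but the batch path above is enough to show the wiring.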
