HeartSaVioR commented on a change in pull request #26127: [SPARK-29348][SQL]
Add observable Metrics for Streaming queries
URL: https://github.com/apache/spark/pull/26127#discussion_r361956851
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
##########
@@ -106,6 +106,9 @@ class QueryExecution(
lazy val toRdd: RDD[InternalRow] = new SQLExecutionRDD(
executedPlan.execute(), sparkSession.sessionState.conf)
+ /** Get the metrics observed during the execution of the query plan. */
+ def observedMetrics: Map[String, Row] = CollectMetricsExec.collect(executedPlan)
Review comment:
The batch listener is no longer marked as experimental in 3.0: there have been some slight
modifications over time (enough to justify `@Evolving`), but it hasn't changed majorly over its
roughly four years of life - see #25558. If we're concerned, it could be rolled back to
Experimental/Unstable, though I know a couple of projects in the Spark ecosystem already
leverage it, which shows its use isn't restricted to debugging purposes.
The major feature of the batch listener is that (unlike the streaming query listener, which
summarizes information and stores it in a new data structure) it exposes the various plans,
both logical and physical, and I don't imagine these plans will be used for execution
purposes. They're mostly read-only. It may not be unrealistic to provide these plans as
read-only (cloned, not executable), but yeah, it may not be as easy as it seems (I'm not
familiar enough with that part).
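To make the read-only consumption concrete, here's a rough sketch (the class and object names are
mine, not from this PR) of a `QueryExecutionListener` that only inspects the plans and the observed
metrics on the `QueryExecution` it receives, assuming the `Dataset.observe` API this PR introduces
together with the `observedMetrics` accessor shown in the diff above:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.functions.{count, lit, sum}
import org.apache.spark.sql.util.QueryExecutionListener

// A listener that only reads from the exposed QueryExecution: it prints the
// logical/physical plans and the observed metrics, but never re-executes anything.
class ReadOnlyMetricsListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    // Plans are inspected as strings only; nothing here triggers execution.
    println(qe.optimizedPlan.treeString)
    println(qe.executedPlan.treeString)
    // Metrics collected by CollectMetricsExec, keyed by the observation name.
    qe.observedMetrics.foreach { case (name, row) =>
      println(s"observed metrics '$name': $row")
    }
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
}

object ReadOnlyMetricsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("observed-metrics").getOrCreate()
    import spark.implicits._

    spark.listenerManager.register(new ReadOnlyMetricsListener)

    // observe() attaches a named CollectMetrics node; after execution the metrics surface
    // both through the listener above and via df.queryExecution.observedMetrics.
    val df = Seq(1L, 2L, 3L).toDF("value")
      .observe("stats", count(lit(1)).as("rows"), sum($"value").as("total"))
    df.collect()

    spark.stop()
  }
}
```

Nothing in `onSuccess` calls `execute()` on the plans, which is the read-only usage I have in mind;
the open question is whether we can actually enforce that (e.g. by handing out cloned,
non-executable plans).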