cloud-fan commented on pull request #31398: URL: https://github.com/apache/spark/pull/31398#issuecomment-772246886
Yea, creating a separate PR SGTM. Let's have a high-level discussion first (I haven't read this PR yet).

From my understanding, metrics in batch execution can be done as:
1. The data source returns custom metrics in each read/write task (via a task completion event or heartbeat event).
2. The data source aggregates the custom metrics from all tasks.

For microbatch execution, we just repeat the batch execution steps for each microbatch, and we update the metrics in the UI after every microbatch.

For continuous execution, it's like an endless batch execution, so we can only use heartbeat events to collect metrics, and we update the metrics in the UI after every epoch.

The problem here is how to integrate this with Spark SQL. One idea is to use `AccumulatorV2`, which is already a public API and is very flexible, but we need to figure out how to make it work with the SQL UI. The other idea is to use `SQLMetrics`, which is private, so we'd need some API design to map a public API to `SQLMetrics`; it also limits the ways metrics can be aggregated.
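To make the two-step flow concrete, here is a minimal sketch of the aggregation pattern being discussed. This is hypothetical illustration code, not Spark's actual API: the `MetricsAccumulator` class and the metric names are invented, but its `add`/`merge`/`value` shape mirrors the `AccumulatorV2` contract mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

public class MetricsDemo {
    // Hypothetical accumulator-style holder for named counters,
    // shaped like AccumulatorV2's add/merge/value contract.
    static final class MetricsAccumulator {
        private final Map<String, Long> counters = new HashMap<>();

        // Executor side: a task records a metric increment.
        void add(String name, long delta) {
            counters.merge(name, delta, Long::sum);
        }

        // Driver side: fold in the metrics of a finished task
        // (delivered via a task completion or heartbeat event).
        void merge(MetricsAccumulator other) {
            other.counters.forEach(this::add);
        }

        Map<String, Long> value() {
            return counters;
        }
    }

    public static void main(String[] args) {
        // Step 1: each read/write task accumulates its own custom metrics.
        MetricsAccumulator task1 = new MetricsAccumulator();
        task1.add("bytesRead", 100L);
        task1.add("rowsRead", 10L);

        MetricsAccumulator task2 = new MetricsAccumulator();
        task2.add("bytesRead", 250L);
        task2.add("rowsRead", 25L);

        // Step 2: the driver aggregates metrics from all tasks.
        MetricsAccumulator driver = new MetricsAccumulator();
        driver.merge(task1);
        driver.merge(task2);

        System.out.println(driver.value().get("bytesRead")); // 350
        System.out.println(driver.value().get("rowsRead"));  // 35
    }
}
```

For microbatch execution, this aggregation would be rerun per microbatch; for continuous execution, the driver-side merge would be driven by heartbeat events and surfaced per epoch.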
