cloud-fan commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-785170297


   I think it's too much work to support arbitrary accumulators in 
`SQLMetrics`. During these years, it seems sufficient to support various 
metrics (size, time, average, etc.) with simple long values. So I agree with 
the design here that the v2 metrics API should leverage `SQLMetrics` under the 
hood.
   
   However, I share the same concern from @zsxwing that using enum to indicate 
the metrics merging logic is too limited. I don't think it's hard to design an 
API to merge a bunch of long values.
   
   First, let's look at the task side. The read/write task needs to update the 
value of executor-side `SQLMetrics`, and Spark will send back the metrics 
update to the driver via heartbeat event or task complete event. The current PR 
adds a new method to `PartitionReader` to report the current metrics values, 
which looks good. Spark needs to call the new method at the end of task (at the 
end of each epoch for continuous mode), get the metrics value, find the 
corresponding `SQLMetrics` and update its value. We can polish the API from the 
current PR a little bit:
   ```
   interface CustomTaskMetrics {
     def name: String
     def value: Long
   }
   
   interface PartitionReader {
     ...
     def currentMetrics: Array[CustomTaskMetrics] = Nil
   }
   ```
   
   Then, let's look at the driver side. The driver side needs to aggregate the 
task metrics and produce a string that will be displayed in the UI. The current 
PR adds a new API `CustomMetrics` which is returned by `Scan`. This looks good 
and we just need to replace the metrics type with a real merge method
   ```
   interface CustomMetrics {
     def name: String
     ...
     def aggregateTaskMetrics(taskMetrics: Array[Long]): String
   }
   ```
   
   Internally, we can add a new `V2_CUSTOM` metrics type in `SQLMetrics`. For 
this metrics type, we delegate the aggregating logic to the corresponding 
`CustomMetrics`.
   
   What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to