monkeyboy123 commented on code in PR #46705:
URL: https://github.com/apache/spark/pull/46705#discussion_r1614941974
##########
core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala:
##########
@@ -328,16 +328,15 @@ private[spark] object TaskMetrics extends Logging {
*/
def fromAccumulators(accums: Seq[AccumulatorV2[_, _]]): TaskMetrics = {
val tm = new TaskMetrics
- for (acc <- accums) {
+ val (innerAccums, externalAccums) = accums.
+ partition(t => t.name.isDefined && tm.nameToAccums.contains(t.name.get))
Review Comment:
> Do we do a lot of intermediate collections allocations in this
`.partition` operation itself? e.g. under the hood, would we be resizing an
array builder or doing a bunch of link
Good suggestion. I found that this accounts for significant time in the
`fromAccumulators` function in some scenarios (as flame graphs show).

Also, we ran an offline test comparing `tm._externalAccums.addAll` with
`tm._externalAccums.add`; the former speeds up the computation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]