JoshRosen commented on code in PR #46705:
URL: https://github.com/apache/spark/pull/46705#discussion_r1613896716


##########
core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala:
##########
@@ -328,16 +328,15 @@ private[spark] object TaskMetrics extends Logging {
    */
   def fromAccumulators(accums: Seq[AccumulatorV2[_, _]]): TaskMetrics = {
     val tm = new TaskMetrics
-    for (acc <- accums) {
+    val (innerAccums, externalAccums) = accums.

Review Comment:
   > tm.nameToAccums is a fixed size LinkedHashMap, why do we need to create another local variable?
   
   I think the concern was that each call to `.nameToAccums` goes through a `lazy val` accessor under the hood, which is slightly more expensive than reading an ordinary field.
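   
   For context, here is a rough sketch of why that matters, assuming the standard Scala 2 `lazy val` encoding (illustrative desugaring with made-up names and payload, not the actual `TaskMetrics` bytecode): every read pays an initialization-flag check, and the first read synchronizes.
   
   ```scala
   // Illustrative only: roughly what the Scala 2 compiler generates for a
   // lazy val (the names and the Map payload are invented for this sketch).
   class LazyValDesugared {
     @volatile private var bitmap0: Boolean = false
     private var cached: Map[String, Long] = _
   
     private def lzycompute(): Map[String, Long] = synchronized {
       if (!bitmap0) {
         cached = Map("internal.metrics.executorRunTime" -> 0L) // the lazy val's RHS
         bitmap0 = true
       }
       cached
     }
   
     // Every access goes through this guard rather than a plain field load.
     def nameToAccums: Map[String, Long] =
       if (bitmap0) cached else lzycompute()
   }
   ```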
   
   On the other hand, I think the old and new code are performing the same 
total number of `nameToAccums` accesses:
   
   - Before, we'd call `tm.nameToAccums` once per named accumulator to check whether it was a task metric, then a second time for each task metric to retrieve the actual accumulator.
   - The new code performs the same calls in a different order: all of the membership checks now happen in one batched pass, and the "retrieve the actual accumulator" calls happen in a second pass (sketched below).
   
   Given that we're performing the same total number of calls, I don't think the new code represents a performance regression with respect to those accesses. The suggestion of storing `val nameToAccums = tm.nameToAccums` would therefore be a small performance optimization rather than a regression fix (also sketched below).
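   
   For completeness, the suggested hoist would look something like this (reusing the same hypothetical `Tm`/`Acc` stand-ins as in the sketch above):
   
   ```scala
   // Pay the lazy-val guard once up front; every subsequent access is a
   // plain local read. A micro-optimization, not a regression fix.
   def fromAccumsHoisted(tm: Tm, accums: Seq[Acc]): Unit = {
     val nameToAccums = tm.nameToAccums  // single lazy-val access
     val (inner, external) =
       accums.partition(a => a.name.isDefined && nameToAccums.contains(a.name.get))
     inner.foreach(acc => nameToAccums(acc.name.get).value += acc.value)
     tm.externalAccums ++= external
   }
   ```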


