JoshRosen commented on code in PR #47192:
URL: https://github.com/apache/spark/pull/47192#discussion_r1667098111
##########
core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java:
##########
@@ -507,4 +525,19 @@ public long getMemoryConsumptionForThisTask() {
public MemoryMode getTungstenMemoryMode() {
return tungstenMemoryMode;
}
+
+  /**
+   * Returns the peak task-level on-heap execution memory usage in bytes.
+   */
+ public long getPeakOnHeapExecutionMemory() {
+ return peakOnHeapMemory;
+ }
Review Comment:
Brainstorming:
Do we need to worry about synchronization for the field accesses here (and
in the other getter)? Right now the writes to these fields are guarded by
`synchronized (this)`, but the reads are unsynchronized.
If we did synchronize these getters, would we have to worry about deadlocks?
I don't think so: `TaskMemoryManager` is internal to Spark, and right now the
only call to these getters occurs once at the end of the task, during final
metrics emission, while no other `TaskMemoryManager` / `MemoryManager` locks
are held. Also, `TaskMemoryManager` is not a `DeveloperApi`, so we wouldn't
expect third-party or user code to call these getters during task execution.
If we want to sidestep any potential deadlock concerns, though, we could make
both longs `volatile`, similar to the existing `private volatile
long acquiredButNotUsed` (see https://github.com/apache/spark/pull/12681).
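
For concreteness, here is a minimal, self-contained sketch of the `volatile` option. The `PeakTracker` class and its `updatePeak` method are hypothetical illustrations of the pattern, not the actual `TaskMemoryManager` code:

```java
// Hypothetical, simplified illustration of the volatile-publication pattern
// discussed above; field and getter names mirror the diff, the rest is assumed.
class PeakTracker {
  // `volatile` guarantees that an unsynchronized reader always sees the most
  // recent write, without the reader having to acquire any lock.
  private volatile long peakOnHeapMemory = 0L;

  // Writers still synchronize so that the read-modify-write max() update
  // stays atomic when multiple threads race to record a new peak.
  synchronized void updatePeak(long currentUsageBytes) {
    peakOnHeapMemory = Math.max(peakOnHeapMemory, currentUsageBytes);
  }

  // The getter takes no lock, so it can never participate in a lock-ordering
  // cycle: the deadlock concern is sidestepped entirely.
  long getPeakOnHeapExecutionMemory() {
    return peakOnHeapMemory;
  }
}
```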