zhouyejoe commented on code in PR #37638:
URL: https://github.com/apache/spark/pull/37638#discussion_r1065234001
##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -1904,4 +1950,52 @@ long getPos() {
return pos;
}
}
+
+ /**
+ * A class that wraps all the push-based shuffle service metrics.
+ */
+ static class PushMergeMetrics implements MetricSet {
+ // blockAppendCollisions tracks how many times a shuffle block collided
because
+ // of another block for the same reduce partition was being written
+ static final String BLOCK_APPEND_COLLISIONS_METRIC =
"blockAppendCollisions";
+ // lateBlockPushes tracks how many times a shuffle block push request is
too late
Review Comment:
Nit: "how many times a shuffle block push request is too late" -----> "the
number of shuffle push blocks that are received in shuffle service after the
specific shuffle merge has been finalized". Please also update the wording in
monitoring.md. Same for other wording suggestions below
##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -1904,4 +1950,52 @@ long getPos() {
return pos;
}
}
+
+ /**
+ * A class that wraps all the push-based shuffle service metrics.
+ */
+ static class PushMergeMetrics implements MetricSet {
+ // blockAppendCollisions tracks how many times a shuffle block collided
because
+ // of another block for the same reduce partition was being written
Review Comment:
Nit: "how many times a shuffle block collided because of another block for
the same reduce partition was being written" to "the number of shuffle push
blocks collided in shuffle services as blocks for the same reduce partitions
were being written".
##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -1904,4 +1950,52 @@ long getPos() {
return pos;
}
}
+
+ /**
+ * A class that wraps all the push-based shuffle service metrics.
+ */
+ static class PushMergeMetrics implements MetricSet {
+ // blockAppendCollisions tracks how many times a shuffle block collided
because
+ // of another block for the same reduce partition was being written
+ static final String BLOCK_APPEND_COLLISIONS_METRIC =
"blockAppendCollisions";
+ // lateBlockPushes tracks how many times a shuffle block push request is
too late
+ static final String LATE_BLOCK_PUSHES_METRIC = "lateBlockPushes";
+ // blockBytesWritten tracks the length of the pushed block data written to
file in bytes
+ static final String BLOCK_BYTES_WRITTEN_METRIC = "blockBytesWritten";
+ // deferredBlockBytes tracks the size of the current deferred block parts
buffered in memory
+ static final String DEFERRED_BLOCK_BYTES_METRIC = "deferredBlockBytes";
+ // deferredBlocks tracks the number of the current deferred block parts
buffered in memory
+ static final String DEFERRED_BLOCKS_METRIC = "deferredBlocks";
+ // staleBlockPushes tracks how many times a shuffle block push request it
stale
Review Comment:
Nit: "how many times a shuffle block push request it stale" to "the number
of stale shuffle block push requests"
##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -1904,4 +1950,52 @@ long getPos() {
return pos;
}
}
+
+ /**
+ * A class that wraps all the push-based shuffle service metrics.
+ */
+ static class PushMergeMetrics implements MetricSet {
+ // blockAppendCollisions tracks how many times a shuffle block collided
because
+ // of another block for the same reduce partition was being written
+ static final String BLOCK_APPEND_COLLISIONS_METRIC =
"blockAppendCollisions";
+ // lateBlockPushes tracks how many times a shuffle block push request is
too late
+ static final String LATE_BLOCK_PUSHES_METRIC = "lateBlockPushes";
+ // blockBytesWritten tracks the length of the pushed block data written to
file in bytes
Review Comment:
Nit: "the length" to "the size"
##########
docs/monitoring.md:
##########
@@ -1403,6 +1403,15 @@ Note: applies to the shuffle service
- shuffle-server.usedDirectMemory
- shuffle-server.usedHeapMemory
+Below shuffle service server-side metrics are specific to the Push-Based
Shuffle
+
Review Comment:
The format here is problematic.
If it is note, we need to just use a single line start with "- **notes:** ".
We also need to mention the configuration needed to enable these metrics.
```suggestion
- shuffle-server.usedDirectMemory
- shuffle-server.usedHeapMemory
- **notes:**
- The metrics below are specific to push-based shuffle and only emitted
when
`spark.shuffle.push.server.mergedShuffleFileManagerImpl` is configured as
"org.apache.spark.network.shuffle.RemoteBlockPushResolver".
- blockAppendCollisions ...
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]