zhouyejoe commented on code in PR #37638:
URL: https://github.com/apache/spark/pull/37638#discussion_r1065234001


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -1904,4 +1950,52 @@ long getPos() {
       return pos;
     }
   }
+
+  /**
+   * A class that wraps all the push-based shuffle service metrics.
+   */
+  static class PushMergeMetrics implements MetricSet {
+    // blockAppendCollisions tracks how many times a shuffle block collided 
because
+    // of another block for the same reduce partition was being written
+    static final String BLOCK_APPEND_COLLISIONS_METRIC = 
"blockAppendCollisions";
+    // lateBlockPushes tracks how many times a shuffle block push request is 
too late

Review Comment:
   Nit: "how many times a shuffle block push request is too late" -----> "the 
number of shuffle push blocks that are received in shuffle service after the 
specific shuffle merge has been finalized". Please also update the wording in 
monitoring.md. Same for other wording suggestions below



##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -1904,4 +1950,52 @@ long getPos() {
       return pos;
     }
   }
+
+  /**
+   * A class that wraps all the push-based shuffle service metrics.
+   */
+  static class PushMergeMetrics implements MetricSet {
+    // blockAppendCollisions tracks how many times a shuffle block collided 
because
+    // of another block for the same reduce partition was being written

Review Comment:
   Nit: "how many times a shuffle block collided because of another block for 
the same reduce partition was being written" to "the number of shuffle push 
blocks collided in shuffle services as blocks for the same reduce partitions 
were being written".



##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -1904,4 +1950,52 @@ long getPos() {
       return pos;
     }
   }
+
+  /**
+   * A class that wraps all the push-based shuffle service metrics.
+   */
+  static class PushMergeMetrics implements MetricSet {
+    // blockAppendCollisions tracks how many times a shuffle block collided 
because
+    // of another block for the same reduce partition was being written
+    static final String BLOCK_APPEND_COLLISIONS_METRIC = 
"blockAppendCollisions";
+    // lateBlockPushes tracks how many times a shuffle block push request is 
too late
+    static final String LATE_BLOCK_PUSHES_METRIC = "lateBlockPushes";
+    // blockBytesWritten tracks the length of the pushed block data written to 
file in bytes
+    static final String BLOCK_BYTES_WRITTEN_METRIC = "blockBytesWritten";
+    // deferredBlockBytes tracks the size of the current deferred block parts 
buffered in memory
+    static final String DEFERRED_BLOCK_BYTES_METRIC = "deferredBlockBytes";
+    // deferredBlocks tracks the number of the current deferred block parts 
buffered in memory
+    static final String DEFERRED_BLOCKS_METRIC = "deferredBlocks";
+    // staleBlockPushes tracks how many times a shuffle block push request it 
stale

Review Comment:
   Nit: "how many times a shuffle block push request it stale" to "the number 
of stale shuffle block push requests"



##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -1904,4 +1950,52 @@ long getPos() {
       return pos;
     }
   }
+
+  /**
+   * A class that wraps all the push-based shuffle service metrics.
+   */
+  static class PushMergeMetrics implements MetricSet {
+    // blockAppendCollisions tracks how many times a shuffle block collided 
because
+    // of another block for the same reduce partition was being written
+    static final String BLOCK_APPEND_COLLISIONS_METRIC = 
"blockAppendCollisions";
+    // lateBlockPushes tracks how many times a shuffle block push request is 
too late
+    static final String LATE_BLOCK_PUSHES_METRIC = "lateBlockPushes";
+    // blockBytesWritten tracks the length of the pushed block data written to 
file in bytes

Review Comment:
   Nit: "the length" to "the size"



##########
docs/monitoring.md:
##########
@@ -1403,6 +1403,15 @@ Note: applies to the shuffle service
 - shuffle-server.usedDirectMemory
 - shuffle-server.usedHeapMemory
 
+Below shuffle service server-side metrics are specific to the Push-Based 
Shuffle
+

Review Comment:
   The format here is problematic.
   If it is note, we need to just use a single line start with "- **notes:** ". 
 We also need to mention the configuration needed to enable these metrics.
   ```suggestion
   - shuffle-server.usedDirectMemory
   - shuffle-server.usedHeapMemory
   - **notes:** 
     - The metrics below are specific to push-based shuffle and only emitted 
when 
     `spark.shuffle.push.server.mergedShuffleFileManagerImpl` is configured as 
     "org.apache.spark.network.shuffle.RemoteBlockPushResolver".
   - blockAppendCollisions ...
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to