xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r643248522



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       This is a great point, thanks Dongjoon!
   
   Currently, a single batch fetch is counted as one block fetch by this 
metric, regardless of how many blocks are fetched in the batch. For our 
purposes, this is what we're interested in, since a big part of what we want to 
understand with this is the number of random reads we're submitting.
   
   However, I see value in also breaking it out as the actual number of 
blocks. Perhaps we can have two metrics, `blockTransferRate` and 
`blockFetchRequestRate`. For non-batch fetches they will be the same, and with 
batch fetches the `blockTransferRate` will be higher than the 
`blockFetchRequestRate`. WDYT?
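   
   To make the distinction concrete, here is a rough sketch of how the two 
counts would diverge. `BlockFetchCounters` and `onFetchRequest` are 
hypothetical names used only for illustration, not code from this PR, and plain 
counters stand in for whatever rate metric type the handler would actually use:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper, not part of ExternalBlockHandler; names are illustrative only.
final class BlockFetchCounters {
  // Incremented once per fetch request, batch or not (a proxy for random reads).
  private final LongAdder fetchRequests = new LongAdder();
  // Incremented by the number of blocks covered by each request.
  private final LongAdder blocksTransferred = new LongAdder();

  // numBlocksInRequest is assumed to be 1 for a non-batch fetch
  // and N for a batch fetch covering N blocks.
  void onFetchRequest(int numBlocksInRequest) {
    if (numBlocksInRequest < 1) {
      throw new IllegalArgumentException("numBlocksInRequest must be >= 1");
    }
    fetchRequests.increment();
    blocksTransferred.add(numBlocksInRequest);
  }

  long fetchRequests() {
    return fetchRequests.sum();
  }

  long blocksTransferred() {
    return blocksTransferred.sum();
  }
}
```

   With one non-batch fetch and one batch fetch of five blocks, this would 
record two fetch requests and six blocks transferred, so the two numbers only 
differ when batch fetches are in play.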




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



