mridulm commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r655778959
##########
File path:
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
private final Timer registerExecutorRequestLatencyMillis = new Timer();
// Latency for processing a finalize shuffle merge request, in ms
private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+ // Block transfer rate in blocks per second
Review comment:
On the Spark read side, we treat a single read of a `ShuffleBlockBatchId` as a
single block read (from a metrics point of view).
For context, see a [recent
discussion](https://github.com/apache/spark/pull/32140#discussion_r652733447)
of this in push-based shuffle (see `ShuffleBlockBatchId` in
`ShuffleBlockFetcherIterator`).
Given that, I am fine with treating a batch read as a single read, since it
would be contiguous (effectively similar to reading one 'large' block).
Thoughts @dongjoon-hyun ?
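
To make the metric semantics concrete, here is a minimal, self-contained sketch (not Spark code; the class and method names are hypothetical, and a tiny stand-in meter replaces the Dropwizard `Timer`/`Meter` the handler actually uses) of counting a batch fetch as a single event on a blocks-per-second rate metric, matching the read-side convention described above:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Spark code: illustrates counting a batch fetch
// (one ShuffleBlockBatchId spanning N contiguous blocks) as ONE event on a
// blocks-per-second metric, mirroring the read-side metric semantics.
public class BlockRateSketch {

  /** Minimal stand-in for a metrics Meter: counts events over elapsed time. */
  static final class SimpleMeter {
    private final AtomicLong count = new AtomicLong();
    private final long startNanos = System.nanoTime();

    void mark() { count.incrementAndGet(); }

    long getCount() { return count.get(); }

    double ratePerSecond() {
      double elapsedSec = (System.nanoTime() - startNanos) / 1e9;
      return elapsedSec > 0 ? count.get() / elapsedSec : 0.0;
    }
  }

  // Block transfer rate in blocks per second (name is illustrative).
  final SimpleMeter blockTransferRate = new SimpleMeter();

  /** A single-block fetch counts as one event. */
  void onBlockFetch() { blockTransferRate.mark(); }

  /**
   * A batch fetch counts as ONE event regardless of how many blocks the
   * batch spans, because the read is contiguous (like one 'large' block).
   */
  void onBatchFetch(int blocksInBatch) { blockTransferRate.mark(); }

  public static void main(String[] args) {
    BlockRateSketch handler = new BlockRateSketch();
    handler.onBlockFetch();    // counts as 1 event
    handler.onBatchFetch(10);  // still only 1 more event
    System.out.println(handler.blockTransferRate.getCount()); // prints 2
  }
}
```

Under this convention the meter's rate under-reports the number of underlying blocks moved during batch reads, but it stays consistent with the existing read-side accounting.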
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]