viirya commented on a change in pull request #29441:
URL: https://github.com/apache/spark/pull/29441#discussion_r471133887
##########
File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala
##########
@@ -388,11 +388,9 @@ abstract class RDD[T: ClassTag](
// Block hit.
case Left(blockResult) =>
if (readCachedBlock) {
- val existingMetrics = context.taskMetrics().inputMetrics
- existingMetrics.incBytesRead(blockResult.bytes)
Review comment:
From the doc, I think `inputMetrics` counts the total input size the task
processes, not just the input read from disk. Even when the block is cached
and no disk read is needed, it still adds to the amount of input the task is
going to process, so we should keep incrementing it here.
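To illustrate the semantics I mean, here is a minimal, self-contained sketch (not Spark's actual `TaskMetrics` classes; the `InputMetrics` class and its methods below are simplified stand-ins) showing why a cache hit should still bump the task's input metrics:

```scala
// Simplified stand-in for a task's input metrics accumulator.
class InputMetrics {
  private var _bytesRead: Long = 0L
  def bytesRead: Long = _bytesRead
  def incBytesRead(v: Long): Unit = _bytesRead += v
}

object InputMetricsDemo {
  def main(args: Array[String]): Unit = {
    val metrics = new InputMetrics
    // Block read from disk: clearly counts as input.
    metrics.incBytesRead(1024L)
    // Block served from the cache: no disk I/O happens, but the task
    // still consumes these bytes, so they count toward total input.
    metrics.incBytesRead(512L)
    // Total input processed by the task, regardless of where it came from.
    println(metrics.bytesRead)
  }
}
```

Under this reading, dropping the `incBytesRead` call on the cache-hit path would under-report the task's input.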
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]