Github user ksakellis commented on a diff in the pull request:
https://github.com/apache/spark/pull/3120#discussion_r22807972
--- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala ---
@@ -153,34 +157,19 @@ class NewHadoopRDD[K, V](
throw new java.util.NoSuchElementException("End of stream")
}
havePair = false
-
- // Update bytes read metric every few records
- if (recordsSinceMetricsUpdate ==
HadoopRDD.RECORDS_BETWEEN_BYTES_READ_METRIC_UPDATES
--- End diff ---
@pwendell There is a long thread in this PR between @sryza and
@kayousterhout about why we need to add the callback to the input metrics. The
reason is to prevent clobbering between different HadoopRDDs that read within
the same task. For example CartesianRDD - this is why there is a specific unit
test for that case. I don't think we can report the bytes-read metric correctly
without the callbacks in the inputMetrics.
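To make the clobbering concern concrete, here is a minimal sketch (hypothetical names, not Spark's actual API) of why per-input callbacks combine correctly where direct assignment would not: when two Hadoop reads share one task, as in CartesianRDD, each input registers its own bytes-read callback and the metric sums them.

```scala
// Hypothetical sketch of callback-based input metrics; the class and
// method names below are illustrative, not Spark's real ones.
class InputMetrics {
  private var bytesRead = 0L
  private var callbacks = List.empty[() => Long]

  // Register one callback per underlying Hadoop input stream.
  def registerBytesReadCallback(f: () => Long): Unit =
    callbacks = f :: callbacks

  // Summing the callbacks means neither input overwrites the other.
  def updateBytesRead(): Long = {
    bytesRead = callbacks.map(_()).sum
    bytesRead
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val metrics = new InputMetrics
    var left = 0L   // bytes read from the first Hadoop input
    var right = 0L  // bytes read from the second Hadoop input
    metrics.registerBytesReadCallback(() => left)
    metrics.registerBytesReadCallback(() => right)

    left = 100L  // first input reads 100 bytes
    right = 40L  // second input reads 40 bytes
    // With a naive "bytesRead = lastValue" update, the second input would
    // clobber the first; with callbacks the totals combine to 140.
    println(metrics.updateBytesRead())
  }
}
```

The key design point is that each input stream owns its counter and the metric only ever reads through the registered closures, so concurrent or interleaved reads cannot overwrite one another's totals.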