Github user ksakellis commented on a diff in the pull request:
https://github.com/apache/spark/pull/3120#discussion_r22807972
--- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala ---
@@ -153,34 +157,19 @@ class NewHadoopRDD[K, V](
throw new java.util.NoSuchElementException("End of stream")
}
havePair = false
-
- // Update bytes read metric every few records
- if (recordsSinceMetricsUpdate ==
HadoopRDD.RECORDS_BETWEEN_BYTES_READ_METRIC_UPDATES
--- End diff ---
@pwendell There is a long thread in this PR between @sryza and
@kayousterhout about why we need to add the callback to the input metrics. The
reason is to prevent clobbering between different HadoopRDDs that read within
the same task. For example CartesianRDD - this is why there is a specific unit
test for that case. I don't think we can report the bytes-read metric correctly
without the callbacks in the inputMetrics.
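To make the clobbering concern concrete, here is a minimal sketch (hypothetical names, not Spark's actual API) of why per-input callbacks combine correctly where direct assignment would not: when two Hadoop reads share one task, as in CartesianRDD, each input registers its own bytes-read callback and the metric sums them.

```scala
// Hypothetical sketch of callback-based input metrics; the class and
// method names below are illustrative, not Spark's real ones.
class InputMetrics {
  private var bytesRead = 0L
  private var callbacks = List.empty[() => Long]

  // Register one callback per underlying Hadoop input stream.
  def registerBytesReadCallback(f: () => Long): Unit =
    callbacks = f :: callbacks

  // Summing the callbacks means neither input overwrites the other.
  def updateBytesRead(): Long = {
    bytesRead = callbacks.map(_()).sum
    bytesRead
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val metrics = new InputMetrics
    var left = 0L   // bytes read from the first Hadoop input
    var right = 0L  // bytes read from the second Hadoop input
    metrics.registerBytesReadCallback(() => left)
    metrics.registerBytesReadCallback(() => right)

    left = 100L  // first input reads 100 bytes
    right = 40L  // second input reads 40 bytes
    // With a naive "bytesRead = lastValue" update, the second input would
    // clobber the first; with callbacks the totals combine to 140.
    println(metrics.updateBytesRead())
  }
}
```

The key design point is that each input stream owns its counter and the metric only ever reads through the registered closures, so concurrent or interleaved reads cannot overwrite one another's totals.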