Github user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/126#discussion_r10729469
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -259,23 +299,24 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf)
bytes
}
- protected override def cleanup(cleanupTime: Long) {
- super.cleanup(cleanupTime)
- cachedSerializedStatuses.clearOldValues(cleanupTime)
- }
-
override def stop() {
super.stop()
+ metadataCleaner.cancel()
cachedSerializedStatuses.clear()
}
- override def updateEpoch(newEpoch: Long) {
- // This might be called on the MapOutputTrackerMaster if we're running in local mode.
+ protected def cleanup(cleanupTime: Long) {
+ mapStatuses.clearOldValues(cleanupTime)
+ cachedSerializedStatuses.clearOldValues(cleanupTime)
}
+}
- def has(shuffleId: Int): Boolean = {
- cachedSerializedStatuses.get(shuffleId).isDefined || mapStatuses.contains(shuffleId)
- }
+/**
+ * MapOutputTracker for the workers, which fetches map output information from the driver's
+ * MapOutputTrackerMaster.
+ */
+private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutputTracker(conf) {
--- End diff --
@pwendell Given that MapOutputTrackerWorker only changes the type of HashMap used relative to MapOutputTracker, does it make sense to define two separate classes, MapOutputTrackerMaster and MapOutputTrackerWorker? Or should we keep it as before: MapOutputTracker (for workers) and MapOutputTrackerMaster (for the driver)? I think that since we instantiate different classes in the driver and the worker, it's best to name them accordingly.
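
To illustrate the naming question, here is a minimal sketch (not the actual Spark code; the `conf` parameter and `TimeStampedHashMap` are simplified away) of the class split under discussion, where the two subclasses differ mainly in the concrete map backing `mapStatuses`:

```scala
import scala.collection.mutable

// Shared base: holds the status map and the logic common to driver and workers.
private[spark] abstract class MapOutputTracker {
  // Subclasses choose the backing map: a plain HashMap on workers, and on the
  // driver a time-stamped map whose stale entries can be cleaned periodically.
  protected val mapStatuses: mutable.Map[Int, Array[Byte]]

  def stop(): Unit = mapStatuses.clear()
}

// Driver-side tracker: the authoritative source of map output locations.
private[spark] class MapOutputTrackerMaster extends MapOutputTracker {
  // Stand-in for TimeStampedHashMap in this sketch.
  protected val mapStatuses = new mutable.HashMap[Int, Array[Byte]]

  protected def cleanup(cleanupTime: Long): Unit = {
    // In the real class: mapStatuses.clearOldValues(cleanupTime), etc.
  }
}

// Worker-side tracker: caches statuses fetched from the driver's master tracker.
private[spark] class MapOutputTrackerWorker extends MapOutputTracker {
  protected val mapStatuses = new mutable.HashMap[Int, Array[Byte]]
}
```

Naming the classes after where they are instantiated (driver vs. worker) makes the split explicit at the call site, even though the worker subclass adds little beyond its map type.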