Github user wulei-bj-cn commented on a diff in the pull request:
https://github.com/apache/spark/pull/8533#discussion_r38819729
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -190,11 +197,15 @@ private[spark] class TaskSetManager(
}
for (loc <- tasks(index).preferredLocations) {
+
+ val locHost: String = if( sparkLocalHostname != None ) loc.host
--- End diff ---
Yeah, right, I'll update it to follow the Spark/Scala style conventions.
The reason why it depends on SPARK_LOCAL_HOSTNAME is:
a) If SPARK_LOCAL_HOSTNAME is not set manually, the 'Spark side', i.e. the
Spark task scheduler, uses IP addresses by default (as controlled from
Utils.scala), while the 'Hadoop side', i.e. HadoopRDD, still reports host
names. In that case we need to resolve HadoopRDD's host names to IP addresses
so that they match the 'Spark side'.
b) If SPARK_LOCAL_HOSTNAME is set manually, the 'Spark side' uses host names
instead (again, as controlled within Utils.scala). Since the 'Hadoop side'
always uses host names, we must make sure we do not convert the 'Hadoop side'
host names to IP addresses in this case, because otherwise they would not
match the 'Spark side' host names.
That's why we first need to check whether SPARK_LOCAL_HOSTNAME is set
manually: whether the host names (of HadoopRDDs) need to be resolved to IP
addresses depends on this setting.
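To make the matching rule concrete, here is a minimal, self-contained sketch
(not the actual patch; the object name LocalityHostSketch and the method
normalizeHost are made up for illustration) of how a Hadoop-side host string
would be normalized depending on SPARK_LOCAL_HOSTNAME:

    import java.net.InetAddress

    object LocalityHostSketch {
      // Hypothetical stand-in for the value TaskSetManager would consult;
      // in Spark this behavior is governed by Utils.scala.
      val sparkLocalHostname: Option[String] = sys.env.get("SPARK_LOCAL_HOSTNAME")

      // Normalize a HadoopRDD-provided host name so it matches whatever
      // form the Spark task scheduler uses for locality comparison.
      def normalizeHost(host: String): String = sparkLocalHostname match {
        // Case b) SPARK_LOCAL_HOSTNAME is set: Spark uses host names,
        // so keep the Hadoop-side host name untouched.
        case Some(_) => host
        // Case a) SPARK_LOCAL_HOSTNAME is unset: Spark uses IP addresses
        // by default, so resolve the Hadoop-side host name to its IP.
        case None => InetAddress.getByName(host).getHostAddress
      }
    }

With SPARK_LOCAL_HOSTNAME unset, normalizeHost("datanode1") would return that
node's IP address (e.g. something like "10.0.0.5"); with it set, the string
"datanode1" passes through unchanged, matching the host names used on the
'Spark side'.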
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]