Github user wulei-bj-cn commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8533#discussion_r38819729
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -190,11 +197,15 @@ private[spark] class TaskSetManager(
         }
     
         for (loc <- tasks(index).preferredLocations) {
    +
    +      val locHost: String = if( sparkLocalHostname != None ) loc.host
    --- End diff ---
    
    Yeah, right, I'll update it accordingly to follow the Spark/Scala style.
    
    The reason it depends on SPARK_LOCAL_HOSTNAME is:
    a) If SPARK_LOCAL_HOSTNAME is not set manually, the 'Spark side', i.e. the Spark task scheduler, uses IP addresses by default (as controlled in Utils.scala), while the 'Hadoop side', i.e. HadoopRDD, still reports host names. In that case we need to resolve HadoopRDD's host names to IP addresses so that they match the 'Spark side'.
    b) If SPARK_LOCAL_HOSTNAME is set manually, the 'Spark side' uses host names instead (again, as controlled in Utils.scala). Since the 'Hadoop side' always uses host names, we must not convert the 'Hadoop side' host names to IP addresses in this case; otherwise they would not match the 'Spark side' host names.
    
    That's why we first need to check whether SPARK_LOCAL_HOSTNAME is set manually: whether the host names (of HadoopRDDs) need to be resolved to IP addresses depends on this setting. A minimal sketch of this logic follows below.
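    
    For illustration only, here is a minimal sketch of the intended behavior, not the actual patch. The helper name `schedulerLocation` is hypothetical, `sparkLocalHostname` stands in for the value of SPARK_LOCAL_HOSTNAME as read on the 'Spark side', and resolution uses plain java.net.InetAddress:
    
        import java.net.InetAddress
        
        // Hypothetical sketch: choose the location key so that a HadoopRDD
        // preferred location matches what the Spark task scheduler tracks.
        // `sparkLocalHostname` models sys.env.get("SPARK_LOCAL_HOSTNAME").
        def schedulerLocation(hadoopHost: String,
                              sparkLocalHostname: Option[String]): String =
          sparkLocalHostname match {
            // Case b): SPARK_LOCAL_HOSTNAME is set, so the scheduler tracks
            // host names; keep the Hadoop host name as-is.
            case Some(_) => hadoopHost
            // Case a): not set, so the scheduler tracks IP addresses; resolve
            // the Hadoop host name so the two sides match.
            case None => InetAddress.getByName(hadoopHost).getHostAddress
          }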

