Github user wulei-bj-cn commented on the pull request:

    https://github.com/apache/spark/pull/8533#issuecomment-136570529
  
    Dear Owen, thanks for checking my updates. I'm not saying the issue of the 
locality level always being ANY is caused by your code. It actually lies in this 
code in org.apache.spark.scheduler.TaskSetManager:
    
    // Check for node-local tasks
    if (TaskLocality.isAllowed(locality, TaskLocality.NODE_LOCAL)) {
      for (index <- speculatableTasks if canRunOnHost(index)) {
        val locations = tasks(index).preferredLocations.map(_.host)
        if (locations.contains(host)) {
          speculatableTasks -= index
          return Some((index, TaskLocality.NODE_LOCAL))
        }
      }
    }
    
    The variable "locations" is hostnames of HDFS splits, which is from 
InetAddress.getHostName.
    The variable "host" is IP address of an executor, which is from 
InetAddress.getHostAddress.
    
    And this "host" variable's value is read from 
    org.apache.spark.deploy.worker.WorkerArguments
    where var host = Utils.localHostName()
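
    To make the mismatch concrete, here is a minimal standalone sketch (not 
Spark code; what InetAddress returns depends on your /etc/hosts and DNS setup) 
of why the contains check above fails once one side is a hostname and the other 
an IP address:
    
    import java.net.InetAddress
    
    object LocalityMismatchDemo {
      def main(args: Array[String]): Unit = {
        val addr = InetAddress.getLocalHost
        // What HDFS reports for a split location, e.g. "node1.example.com"
        val splitLocation = addr.getHostName
        // What the executor registers with, e.g. "10.0.0.5"
        val executorHost = addr.getHostAddress
    
        // Same shape as the check in TaskSetManager above: a list of
        // hostnames never contains the raw IP string, so node locality is
        // never detected and the task falls through to the ANY level.
        val locations = Seq(splitLocation)
        println(locations.contains(executorHost)) // false whenever they differ
      }
    }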
    
    That is what leads me to Utils.scala. I'm not saying we have to update 
Utils.scala to make things work; we could probably update code elsewhere to 
make this ANY go away too. I just thought a small change within Utils.scala 
would be an easier way to do it. Or perhaps I'm wrong :)
    
    Your solution of giving the end user a "SPARK_LOCAL_HOSTNAME" option works 
fine, based on the tests I ran with and without it on a 4-node Spark cluster. No 
offense, but this kind of setting is not typical in popular distributed 
computing systems. When it comes to deployment and maintenance, the 
configuration files (in our case, the files under $SPARK_HOME/conf) should be 
identical on all cluster nodes, whereas "SPARK_LOCAL_HOSTNAME" necessarily takes 
a different value on each node. That's why I'd like to introduce a new setting, 
"SPARK_USE_HOSTNAME", whose value can be the same on all cluster nodes, i.e. 
either "true" or "false". A sketch of what I have in mind follows.
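
    For illustration only, here is roughly how I picture the switch inside 
Utils.localHostName() (a sketch, not the actual patch; I'm assuming the setting 
is read from the environment via sys.env, and I've elided the existing 
SPARK_LOCAL_IP/SPARK_LOCAL_HOSTNAME handling):
    
    // Hypothetical sketch of org.apache.spark.util.Utils.localHostName()
    // with the proposed SPARK_USE_HOSTNAME switch; existing overrides elided.
    def localHostName(): String = {
      val addr = java.net.InetAddress.getLocalHost
      if (sys.env.get("SPARK_USE_HOSTNAME").exists(_ == "true")) {
        addr.getHostName    // matches what HDFS reports for split locations
      } else {
        addr.getHostAddress // current behavior: report the IP address
      }
    }
    
    Since every node reads the same flag, the files under $SPARK_HOME/conf can 
stay identical across the cluster.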
     
    As for the multiple NICs you mentioned, I think that is a concern the OS 
should handle rather than Spark.

