Anton Kirillov created SPARK-28778:
--------------------------------------

             Summary: [MESOS] Shuffle jobs fail due to incorrect advertised 
address when running in virtual network
                 Key: SPARK-28778
                 URL: https://issues.apache.org/jira/browse/SPARK-28778
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 2.4.3, 2.3.0, 2.2.3
            Reporter: Anton Kirillov


When shuffle jobs are launched by Mesos in a virtual network, the Mesos 
scheduler sets the executor {{--hostname}} parameter to {{0.0.0.0}} whenever 
{{spark.mesos.network.name}} is provided. Executors then use {{0.0.0.0}} as 
their advertised address, and in the presence of shuffle they fail to fetch 
shuffle blocks from each other because the blocks' origin is reported as 
{{0.0.0.0}}. In a virtual network the hostname or IP address is not known 
upfront; it is assigned to the container at start time, so the executor 
process needs to advertise the correct, dynamically assigned address to be 
reachable by other executors.
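
The address selection described above could look roughly like the following 
Python sketch. It is an illustration only, not Spark's code and not a proposed 
patch: the function names {{is_wildcard}}, {{resolve_container_address}} and 
{{advertised_address}} are hypothetical, and resolving the container's own 
hostname is just one assumed way to discover the dynamically assigned address.

```python
import ipaddress
import socket
from typing import Callable


def is_wildcard(host: str) -> bool:
    """Return True if host is an unspecified IP literal such as 0.0.0.0 or ::."""
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Not an IP literal at all, so treat it as an ordinary hostname.
        return False


def resolve_container_address() -> str:
    """Assumed lookup: resolve this container's own hostname at start time."""
    return socket.gethostbyname(socket.gethostname())


def advertised_address(
    configured_host: str,
    resolve_local: Callable[[], str] = resolve_container_address,
) -> str:
    """Pick the address an executor should advertise to its peers.

    A wildcard address is fine for binding but is not reachable by other
    executors, so in that case fall back to the address that was assigned
    to the container when it started.
    """
    if is_wildcard(configured_host):
        return resolve_local()
    return configured_host
```

With this shape, an executor started with {{--hostname 0.0.0.0}} would 
advertise whatever address the container actually received, while an 
explicitly configured hostname would be passed through unchanged.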

The bug described above prevents Mesos users from running any job that 
involves a shuffle when a virtual network is used: executors advertise an 
incorrect address and therefore cannot fetch shuffle blocks from each other.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

