Thomas Graves created SPARK-21714:
-------------------------------------

             Summary: SparkSubmit in Yarn Client mode downloads remote files 
and then reuploads them again
                 Key: SPARK-21714
                 URL: https://issues.apache.org/jira/browse/SPARK-21714
             Project: Spark
          Issue Type: Bug
          Components: Spark Submit
    Affects Versions: 2.2.0
            Reporter: Thomas Graves
            Priority: Critical


SPARK-10643 added the ability for spark-submit to download remote files in 
client mode.

However, in yarn mode this introduced a bug: spark-submit downloads the files 
for the client, but the yarn client then just reuploads them to HDFS and uses 
them from there. This should not happen when the remote file is already on 
HDFS. It wastes resources, and it defeats the distributed cache, because if the 
original object was public it would have been shared by many users. Once we 
download and reupload it, it becomes private.
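
To make the expected behavior concrete, here is a minimal sketch of the 
decision described above. It is not Spark's actual code: the function name 
should_download_to_client, the YARN_READABLE_SCHEMES set, and the example URIs 
are all hypothetical, and the set of schemes yarn can read directly is an 
assumption.

```python
from urllib.parse import urlparse

# Assumption for illustration: schemes the yarn distributed cache can read
# in place, so the client has no reason to fetch and reupload them.
YARN_READABLE_SCHEMES = {"hdfs"}


def should_download_to_client(uri: str, master: str, deploy_mode: str) -> bool:
    """Return True if a submit tool should fetch `uri` to the local client.

    Hypothetical helper; it only mirrors the behavior the report asks for.
    """
    # A bare path such as /tmp/lib.jar has no scheme; treat it as local.
    scheme = urlparse(uri).scheme or "file"

    if scheme in ("file", "local"):
        return False  # already local, nothing to download
    if deploy_mode != "client":
        return False  # the download step discussed here is client-mode only
    if master == "yarn" and scheme in YARN_READABLE_SCHEMES:
        return False  # leave the file on HDFS so a public object stays shared
    return True
```

With these assumptions, an hdfs:// URI submitted to yarn in client mode is 
left where it is, while an https:// URI is still downloaded for the client.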



