Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13527#discussion_r65934091
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -444,6 +444,7 @@ object SparkSubmit {
           OptionAssigner(args.deployMode, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
             sysProp = "spark.submit.deployMode"),
           OptionAssigner(args.name, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES, sysProp = "spark.app.name"),
    +      OptionAssigner(args.jars, ALL_CLUSTER_MGRS, CLIENT, sysProp = "spark.jars"),
    --- End diff --
    
    I'm pretty sure this is not what you want to do. If you look further down
    in the same list of assigners, you have these lines:
    
    ```
    OptionAssigner(args.jars, YARN, ALL_DEPLOY_MODES, sysProp = "spark.yarn.dist.jars"),
    OptionAssigner(args.jars, LOCAL, CLIENT, sysProp = "spark.jars"),
    OptionAssigner(args.jars, STANDALONE | MESOS, ALL_DEPLOY_MODES, sysProp = "spark.jars"),
    ```
    
    So I see how the earlier change caused an issue with the REPL in
    yarn-client mode. But your fix would cause issues with the YARN backend:
    in yarn-client mode both your new rule and the YARN rule match, so the same
    jars would be distributed twice (via `spark.jars` and
    `spark.yarn.dist.jars`).
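
    To make the overlap concrete, here is a minimal, self-contained sketch of
    how bit-flag rules like these could be matched. It is a simplified model,
    not the SparkSubmit code: the flag values, the four-field `OptionAssigner`,
    and the `resolve` and `rulesFor` helpers are assumptions made for
    illustration.

    ```scala
    object OptionAssignerSketch {
      // Bit flags for cluster managers; the concrete values are assumptions.
      val YARN = 1
      val STANDALONE = 2
      val MESOS = 4
      val LOCAL = 8
      val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL

      // Bit flags for deploy modes; the concrete values are assumptions.
      val CLIENT = 1
      val CLUSTER = 2
      val ALL_DEPLOY_MODES = CLIENT | CLUSTER

      // Simplified stand-in for the OptionAssigner used in the diff.
      case class OptionAssigner(
          value: String,
          clusterManager: Int,
          deployMode: Int,
          sysProp: String)

      // Hypothetical helper: collects every property whose rule matches the
      // given cluster manager and deploy mode.
      def resolve(
          rules: Seq[OptionAssigner],
          clusterManager: Int,
          deployMode: Int): Map[String, String] =
        rules.collect {
          case r if r.value != null &&
              (r.clusterManager & clusterManager) != 0 &&
              (r.deployMode & deployMode) != 0 =>
            r.sysProp -> r.value
        }.toMap

      // The rule proposed in the PR followed by the three existing rules.
      def rulesFor(jars: String): Seq[OptionAssigner] = Seq(
        OptionAssigner(jars, ALL_CLUSTER_MGRS, CLIENT, "spark.jars"),
        OptionAssigner(jars, YARN, ALL_DEPLOY_MODES, "spark.yarn.dist.jars"),
        OptionAssigner(jars, LOCAL, CLIENT, "spark.jars"),
        OptionAssigner(jars, STANDALONE | MESOS, ALL_DEPLOY_MODES, "spark.jars")
      )
    }
    ```

    Under these assumptions, resolving the rules for `YARN` and `CLIENT` sets
    both `spark.jars` and `spark.yarn.dist.jars` to the same jar list, while
    `YARN` and `CLUSTER` sets only the YARN-specific property.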
    
    But it seems like the issue you found is not restricted to the shell
    itself; SparkContext also seems to look only at `spark.jars` and not the
    YARN-specific `spark.yarn.dist.jars`. So jars specified on the command line
    would not show up in the driver's class loader in yarn-client mode.
    
    I see two options here:
    - revert the original change that used the distributed cache in
      yarn-client mode; this seems like the safer bet for Spark 2. That would
      mean adding this line you're adding, and also removing the others.
    - make SparkContext and other places that only process `spark.jars` also
      process `spark.yarn.dist.jars`. This seems like something that we have to
      do to properly support `spark.yarn.dist.jars` anyway.
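
    As a rough illustration of the second option, the sketch below merges the
    two properties into a single jar list. It uses a plain `Map` in place of a
    real Spark configuration object, and `splitJars` and `effectiveJars` are
    hypothetical helpers, not existing Spark methods.

    ```scala
    object JarListSketch {
      // Splits a comma-separated property value, trimming whitespace and
      // dropping empty entries. A missing property yields an empty list.
      def splitJars(value: Option[String]): Seq[String] =
        value.toSeq.flatMap(_.split(",").toSeq).map(_.trim).filter(_.nonEmpty)

      // Hypothetical helper: union of both properties, keeping first-seen
      // order and removing duplicates so that no jar is handled twice.
      def effectiveJars(conf: Map[String, String]): Seq[String] =
        (splitJars(conf.get("spark.jars")) ++
          splitJars(conf.get("spark.yarn.dist.jars"))).distinct
    }
    ```

    A caller that currently reads only `spark.jars` could, under this
    assumption, read the merged list instead, so jars passed on the command
    line are visible whichever property they were assigned to.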
    
    /cc @jerryshao since he made the original changes.

