xkrogen opened a new pull request #34120:
URL: https://github.com/apache/spark/pull/34120


   ### What changes were proposed in this pull request?
   Refactor the logic for constructing the user classpath out of `yarn.ApplicationMaster` and into `yarn.Client` so that it can be leveraged on the executor side as well, instead of having the driver construct it and pass it to the executors via command-line arguments. A new method, `getUserClassPath`, is added to `CoarseGrainedExecutorBackend`; it defaults to `Nil`, consistent with the existing behavior in which non-YARN resource managers do not configure a user classpath. `YarnCoarseGrainedExecutorBackend` overrides it to construct the user classpath from the existing `APP_JAR` and `SECONDARY_JARS` configs. Within `yarn.Client`, environment variables in the configured paths are resolved before the classpath is constructed.
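
   A minimal sketch of the shape of this change, assuming simplified class names and stand-in config keys (the real code reads the `APP_JAR` and `SECONDARY_JARS` config entries; nothing below is the exact implementation):

   ```scala
   import java.io.File
   import java.net.URL
   import org.apache.spark.SparkConf

   // Base behavior: non-YARN resource managers configure no user classpath.
   trait ExecutorBackendSketch {
     def getUserClassPath: Seq[URL] = Nil
   }

   // YARN-specific override: rebuild the user classpath on the executor from
   // the same configs the driver populated, instead of receiving one
   // --user-class-path flag per JAR on the command line.
   class YarnExecutorBackendSketch(conf: SparkConf) extends ExecutorBackendSketch {
     override def getUserClassPath: Seq[URL] = {
       val appJar = conf.getOption("spark.yarn.user.jar")        // stand-in for APP_JAR
       val secondaryJars = conf
         .getOption("spark.yarn.secondary.jars")                 // stand-in for SECONDARY_JARS
         .toSeq
         .flatMap(_.split(","))
       (appJar.toSeq ++ secondaryJars).map(p => new File(p).toURI.toURL)
     }
   }
   ```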
   
   Please note that this is a re-submission of #32810, which was reverted in #34082 due to the issues described in [this comment](https://issues.apache.org/jira/browse/SPARK-35672?focusedCommentId=17419285&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17419285). This PR additionally includes the changes described in #34084 to resolve the issue, though unlike #34084, this PR properly handles escaped strings.
   
   ### Why are the changes needed?
   User-provided JARs are made available to executors through a custom classloader, so they do not appear on the standard Java classpath. Instead, they are passed as a list to the executor, which then creates a classloader out of the URLs. Currently, in the case of YARN, this list of JARs is constructed by the driver (in `ExecutorRunnable`), which passes the information to the executors (`CoarseGrainedExecutorBackend`) by specifying each JAR on the executor command line as `--user-class-path /path/to/myjar.jar`. With many JARs this produces an extremely long argument list that can exceed the OS argument-length limit, typically manifesting as the error message:
   
   > /bin/bash: Argument list too long
   
   A [Google search](https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22&oq=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22) indicates that this is not a theoretical problem and afflicts real users, including ours. Passing this list via the configuration instead resolves the issue, as sketched below.
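
   A minimal illustration of the failure mode and the fix, using a hypothetical JAR list and a stand-in config key:

   ```scala
   // Sketch only: contrasts per-JAR command-line flags with a single config entry.
   object ArgLengthSketch extends App {
     // Hypothetical JAR list, sized to make the point.
     val jars = (1 to 5000).map(i => s"/data/app/lib/dependency-$i.jar")

     // Old approach: one --user-class-path flag per JAR is appended to the
     // executor launch command, so the command grows linearly with the JAR
     // count and can eventually exceed the OS limit (ARG_MAX on Linux).
     val argStyle: Seq[String] = jars.flatMap(j => Seq("--user-class-path", j))
     println(s"extra command-line characters: ${argStyle.mkString(" ").length}")

     // New approach: the whole list travels as one Spark configuration entry
     // (key name is a stand-in), so the executor command line stays a fixed size.
     val confStyle: (String, String) = "spark.yarn.secondary.jars" -> jars.mkString(",")
     println(s"characters carried in config instead: ${confStyle._2.length}")
   }
   ```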
   
   ### Does this PR introduce _any_ user-facing change?
   No, aside from the bug fix itself, which allows larger JAR lists to be passed successfully. JAR configuration is identical to before, and substitution of environment variables in `spark.jars` or `spark.yarn.config.replacementPath` continues to work as expected.
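
   For illustration, a hedged sketch of environment-variable resolution in a configured path. The regex and lookup below are simplified assumptions, not the PR's exact implementation (which also handles escaped strings):

   ```scala
   import scala.util.matching.Regex

   object EnvSubstSketch {
     // Matches $VAR or ${VAR} references in a path string.
     private val envVar: Regex = """\$(\w+)|\$\{(\w+)\}""".r

     // Replace each reference with its value from the supplied environment,
     // leaving unknown variables untouched.
     def resolve(path: String, env: Map[String, String]): String =
       envVar.replaceAllIn(path, m => {
         val name = Option(m.group(1)).getOrElse(m.group(2))
         Regex.quoteReplacement(env.getOrElse(name, m.matched))
       })

     def main(args: Array[String]): Unit = {
       val env = Map("PWD" -> "/container/work")
       println(resolve("$PWD/__app__.jar", env)) // prints /container/work/__app__.jar
     }
   }
   ```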
   
   ### How was this patch tested?
   New unit tests were added in `YarnClusterSuite`. Also, we have been running 
a similar fix internally for 4 months with great success.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


