vanzin opened a new pull request #23793: [SPARK-24736][k8s] Let spark-submit handle dependency resolution.
URL: https://github.com/apache/spark/pull/23793

Before this change, the k8s backend contained code to resolve dependencies and make them available to the Spark application. It turns out that none of that code is necessary, since spark-submit already handles all of that for applications started in client mode, such as the k8s driver that runs inside a Spark-created pod.

For that reason, and specifically for pyspark, there is no need for the k8s backend to deal with PYTHONPATH or, in general, to change the URIs provided by the user at all. spark-submit takes care of that.

For testing, I created a pyspark script that depends on another module shipped with --py-files. Then I used:

- --py-files http://.../dep.py http://.../test.py
- --py-files local:/.../dep.py local:/.../test.py

In both cases the driver now sees all the needed files, whereas before it would not see the dependency in the http case.

In the first case the application completes successfully after this patch, although only because k8s apps currently download files to the working directory, which lets the pyspark app load them without PYTHONPATH tricks.

In the second case the app itself did not work before this change and still does not. That is because there is no code in Spark to properly make local: files available in the executor's PYTHONPATH. I'm leaving that as a separate issue.

I also tested a Scala app using the main jar from an http server.
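The two test invocations above differ only in the URI scheme of the files passed to --py-files. As a rough illustration of that distinction, here is a minimal Python sketch that sorts a URI into the two cases described. It is not code from this patch or from Spark: the function name `classify_py_file`, the category labels, and the example paths are all hypothetical, and the set of remote schemes is an assumption.

```python
from urllib.parse import urlparse

# Assumption: schemes treated as "remote" for the purpose of this sketch.
REMOTE_SCHEMES = {"http", "https", "ftp"}


def classify_py_file(uri: str) -> str:
    """Classify a --py-files URI by scheme (illustrative only).

    "downloaded": a remote file, like the http case in the description.
    "in-image":   a local: URI, like the second case in the description.
    "other":      anything this sketch does not model.
    """
    scheme = urlparse(uri).scheme
    if scheme == "local":
        return "in-image"
    if scheme in REMOTE_SCHEMES:
        return "downloaded"
    return "other"


if __name__ == "__main__":
    # Hypothetical URIs; the real ones in the description are elided.
    for uri in ("http://example.invalid/dep.py", "local:/opt/app/dep.py"):
        print(uri, "->", classify_py_file(uri))
```

Only the "downloaded" case corresponds to the run that succeeds after this patch. The "in-image" case corresponds to the one that still fails on the executor side.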
