Github user rvesse commented on the issue: https://github.com/apache/spark/pull/13599

@holdenk What we're doing in some of our products currently is requiring that users create their Python environments up front and store them on a file system accessible to all physical nodes. This is partly for performance and partly because our compute nodes have no external network connectivity. When we spin up containers, we volume-mount the appropriate file system into them, and logic in our entry point scripts activates the relevant environment before starting Spark, Dask Distributed, or whatever Python job we're actually launching. We're doing this with Spark standalone clusters currently, but I expect much the same approach would work for Kubernetes and other resource managers.
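For illustration, here is a minimal sketch of what such an entry point could look like, written in Python rather than shell. The mount path `/mnt/envs/myenv`, the `ENV_ROOT` variable, and the `activate` helper are all hypothetical, not taken from the comment; the only Spark-specific pieces are the standard `PYSPARK_PYTHON`/`PYSPARK_DRIVER_PYTHON` environment variables.

```python
#!/usr/bin/env python3
"""Hypothetical container entry point: activate a pre-built Python
environment from a volume-mounted path, then exec the real job.
Paths and layout are assumptions for illustration only."""
import os
import sys

# The shared file system with pre-built environments is assumed to be
# volume-mounted into the container at /mnt/envs (hypothetical path).
ENV_ROOT = os.environ.get("ENV_ROOT", "/mnt/envs/myenv")


def activate(env_root):
    """Roughly what `source <env>/bin/activate` does, done in-process."""
    bin_dir = os.path.join(env_root, "bin")
    os.environ["VIRTUAL_ENV"] = env_root
    os.environ["PATH"] = bin_dir + os.pathsep + os.environ.get("PATH", "")
    # Point PySpark's driver and workers at the environment's interpreter.
    os.environ["PYSPARK_PYTHON"] = os.path.join(bin_dir, "python")
    os.environ["PYSPARK_DRIVER_PYTHON"] = os.path.join(bin_dir, "python")


if __name__ == "__main__":
    activate(ENV_ROOT)
    # Replace this process with the actual workload: spark-submit,
    # dask-worker, or whatever command was passed as arguments.
    os.execvp(sys.argv[1], sys.argv[1:])
```

Usage would be along the lines of `entrypoint.py spark-submit my_job.py`, with the container runtime responsible for mounting the environment file system before the entry point runs.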