Github user rvesse commented on the issue:

    https://github.com/apache/spark/pull/13599
  
    @holdenk What we're doing in some of our products currently is requiring 
users to create their Python environments up front and store them on a file 
system that is accessible to all physical nodes.  This is partly for 
performance and partly because our compute nodes don't have external network 
connectivity.
    
    Then, when we spin up containers, we volume-mount the appropriate file 
system into the containers and have logic in our entry point scripts that 
activates the relevant environment prior to starting Spark, Dask Distributed, 
or whatever Python job we're actually launching.
    
    We're doing this with Spark standalone clusters currently, but I expect 
much the same approach would work for Kubernetes and other resource managers.
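    
    For illustration, a minimal sketch of what the launch side might look like 
once the entry point has mounted the shared file system (the environment path 
and master URL below are hypothetical placeholders, not from any particular 
product):
    
    ```python
    # Minimal sketch: point PySpark at a pre-built environment on a shared,
    # mounted file system instead of shipping or building one at runtime.
    # The path and master URL are placeholders.
    import os
    from pyspark.sql import SparkSession
    
    # Interpreter inside the pre-built environment, visible at the same path
    # on every node/container via the shared mount.
    os.environ["PYSPARK_PYTHON"] = "/shared/envs/myenv/bin/python"
    
    spark = (
        SparkSession.builder
        .master("spark://spark-master:7077")  # standalone cluster master
        .appName("shared-env-example")
        .getOrCreate()
    )
    ```
    
    The same effect can be had by setting `spark.pyspark.python` in the Spark 
configuration; the key point is that the interpreter path resolves identically 
on the driver and every executor.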

