Github user kokes commented on the issue:
https://github.com/apache/spark/pull/13599
Hi, thanks for all the work on this! I see requirements.txt mentioned here
and there, and from browsing this and other JIRAs it seems to be the proposed
way to specify dependencies in PySpark. As you probably know, the community
has rallied around [Pipfile](https://github.com/pypa/pipfile)s as a replacement
for requirements.txt.
This has a few upsides (including a lock file), the main one being that the
reference implementation ([Pipenv](http://pipenv.org/)) allows installing
packages into a new virtualenv directly, without having to activate it or run
any other commands. That combines dependency management, reproducibility, and
environment isolation in one tool.
(Also, if one doesn't want those packages installed in a virtualenv, there's
a flag to install them system-wide.)
I'm not proposing this PR gets extended to support Pipfiles; I just wanted
to ask whether this has been considered and is on the roadmap, since Pipfile
seems to be the successor to requirements.txt.
(We stumbled upon this as we were thinking of moving to Kubernetes and
didn't know how dependencies were handled there [they aren't, yet, see #21092].
We could install dependencies in our target Docker images using Pipfiles, but
submitting a Pipfile with our individual jobs would be a much cleaner solution.)
Thanks!