Github user kokes commented on the issue:
https://github.com/apache/spark/pull/13599
Hi, thanks for all the work on this! I see requirements.txt mentioned here
and there, and from browsing this and other JIRAs it seems to be the proposed
way to specify dependencies in PySpark. As you probably know, the community
has rallied around [Pipfile](https://github.com/pypa/pipfile)s as a replacement
for requirements.txt.
This has a few upsides (including a lock file), the main one being that the
reference implementation ([Pipenv](http://pipenv.org/)) allows installing
packages into a new virtualenv directly, without having to activate it or run
any other commands. That combines dependency management, reproducibility, and
environment isolation in one tool.
(Also, if one doesn't want those packages installed in a virtualenv, there's
a flag to install them system-wide.)
I'm not proposing this PR gets extended to support Pipfiles; I just wanted
to ask whether this has been considered and is on the roadmap, since Pipfile
seems to be the successor to requirements.txt.
(We stumbled upon this as we were thinking of moving to Kubernetes and
didn't know how dependencies were handled there [they aren't, yet, see #21092].
We could install dependencies in our target Docker images using Pipfiles, but
submitting a Pipfile with our individual jobs would be a much cleaner solution.)
Thanks!