Github user Stibbons commented on the issue:
https://github.com/apache/spark/pull/14963
I would love to have a bit more feedback on this matter but it does not
seem to interest core developers, sadly :(
It's a bit disappointing: Python support in Spark is great, and being able to deploy a job as easily as with Java (i.e., developers prepare the job and describe all its dependencies independently) would be so useful for Spark. For the moment, we have to ask our IT team to install a given Python module on each executor whenever someone needs a new one. We found a workaround using an NFS share, but it is not convenient. Automatic virtualenv creation, with pip installing all the dependencies described in a requirements.txt, plus support for wheels and Python distribution packages (sdist or bdist), would be so useful and scalable. Each job could use different libraries, or even the same library with different versions, just like what already happens for Java jars with the --packages argument.
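Just to illustrate the idea (a rough sketch only, not what this PR implements; the function name and the temporary-directory layout are made up for the example), per-job isolation on an executor could boil down to something like:

```python
import os
import subprocess
import sys
import tempfile

def bootstrap_job_virtualenv(requirements_path):
    """Create a throwaway virtualenv for one job and install its dependencies.

    Each job gets its own environment, so two jobs can pin different
    versions of the same library without touching the system Python.
    """
    env_dir = tempfile.mkdtemp(prefix="job-venv-")
    # Create the virtualenv with the interpreter running this script.
    subprocess.check_call([sys.executable, "-m", "venv", env_dir])
    pip = os.path.join(env_dir, "bin", "pip")
    # Install everything listed in requirements.txt (wheels or sdists alike).
    subprocess.check_call([pip, "install", "-r", requirements_path])
    # Return the isolated interpreter the job should run under.
    return os.path.join(env_dir, "bin", "python")

if __name__ == "__main__":
    print("Isolated interpreter:", bootstrap_job_virtualenv("requirements.txt"))
```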
I am thinking of maintaining a fork of Spark, kept in sync with the latest source code, that I would call "Python Friendly Spark".