[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user minrk commented on the issue: https://github.com/apache/spark/pull/15659 Awesome, thanks @holdenk! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user minrk commented on the issue: https://github.com/apache/spark/pull/15659 Sounds great!
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user minrk commented on the issue: https://github.com/apache/spark/pull/15659

> Any ideas on how we could follow up and support that install pattern?

To go all the way to the two commands I expected to work based on Python tradition:

```bash
./build/mvn -DskipTests clean package
cd python && pip install .
```

I believe you would need to copy (or hardlink, or symlink to an absolute path) the jars into the Python directory during `build/mvn`, since `pip` copies the whole `python` directory to a temp location before running your setup.py.

One step short of this would be to allow `stage-jars` to be an explicit step, rather than a temporary staging in setup.py that cleans up after itself. Then you would have one small extra step and the full command would be:

```bash
./build/mvn
cd python
python setup.py stage_jars  # ./stage-jars.sh, whatever you prefer
pip install .
```

and the error message when jars are missing could point to the extra step.

Yet another option, and the least work from what you have now, could be to put the full sequence in your missing-jars error message:

> If you are installing pyspark from the spark source, you must build spark and run sdist first:
> ```
> ./build/mvn -DskipTests clean package
> cd python
> python setup.py sdist
> pip install dist/*.tar.gz
> ```

I think most people wouldn't discover that without very explicit help, because they would need to know about both the temporary symlinks and pip's tempdir to work it out.

Not being able to do `pip install .` is a bit odd for a Python package, as that is the standard command for installing any Python package from source (it's not `python setup.py install` anymore). But if you provide something copy/pasteable when people do try it, you are in pretty good shape.
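As a rough illustration of that last option, here is a minimal sketch of a check that returns the copy/pasteable instructions when no jars are found. The names `MISSING_JARS_HELP` and `missing_jars_error`, and the idea of passing in a single jars directory, are assumptions made for this example; they are not taken from the PR's setup.py.

```python
import glob
import os
import tempfile

# Hypothetical message text, built from the command sequence suggested above.
MISSING_JARS_HELP = """\
If you are installing pyspark from the spark source, you must build spark and run sdist first:

    ./build/mvn -DskipTests clean package
    cd python
    python setup.py sdist
    pip install dist/*.tar.gz
"""


def missing_jars_error(jars_dir):
    """Return the help text if jars_dir holds no .jar files, else None."""
    jars = glob.glob(os.path.join(jars_dir, "*.jar"))
    return None if jars else MISSING_JARS_HELP


if __name__ == "__main__":
    # An empty temp directory stands in for a checkout that was never built.
    with tempfile.TemporaryDirectory() as empty_dir:
        print(missing_jars_error(empty_dir))
```

A setup.py could call something like this early and exit with the returned text, so that a failed `pip install .` at least tells people what to run instead.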
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user minrk commented on the issue: https://github.com/apache/spark/pull/15659

I downloaded this branch and ran this to test (macOS, Python 3.5):

```bash
build/mvn -DskipTests clean package
cd python
python setup.py sdist
pip install dist/*.tar.gz
```

and everything worked great. Yay!

What didn't work (which I tried first) was:

    pip install .

which is what I'm used to for installing a Python package from source, because pip copies the python package directory to a temp location before running the installation. Since this happens before symlinking the jars, they don't get found.

I'm not sure how big of a deal that is, since it won't affect stable-release installation, but it is idiomatic for Python projects.
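To make the failure mode concrete, here is a small sketch that imitates the copy-to-temp step described above and checks whether jars located outside the `python` directory are still reachable afterwards. The directory layout (`jars/` next to `python/`) and all function names are stand-ins invented for this example; the real Spark build tree and pip's internals are more involved.

```python
import os
import shutil
import tempfile


def jars_reachable(python_dir):
    """True if a sibling 'jars' directory exists next to python_dir."""
    return os.path.isdir(os.path.join(python_dir, os.pardir, "jars"))


def jars_reachable_after_copy(python_dir):
    """Copy python_dir to a temp location first, then look for the jars."""
    with tempfile.TemporaryDirectory() as tmp:
        copied = os.path.join(tmp, "python")
        shutil.copytree(python_dir, copied)
        # A setup.py running inside the copy can only see paths relative to the copy.
        return jars_reachable(copied)


def make_fake_checkout(root):
    """Build a stand-in source tree: root/jars/example.jar and root/python/setup.py."""
    os.makedirs(os.path.join(root, "jars"))
    os.makedirs(os.path.join(root, "python"))
    open(os.path.join(root, "jars", "example.jar"), "w").close()
    open(os.path.join(root, "python", "setup.py"), "w").close()
    return os.path.join(root, "python")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        python_dir = make_fake_checkout(root)
        print("in place:  ", jars_reachable(python_dir))
        print("after copy:", jars_reachable_after_copy(python_dir))
```

Run in place, the relative lookup finds the jars; run from the temp copy, it does not, which matches the behaviour reported for `pip install .`.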