[GitHub] spark pull request #20089: [SPARK-22324][SQL][PYTHON][FOLLOW-UP] Update setu...
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20089

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20089#discussion_r158799303

--- Diff: python/README.md ---
```diff
@@ -29,4 +29,4 @@ The Python packaging for Spark is not intended to replace all of the other use c
 
 ## Python Requirements
 
-At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages have their own requirements (including numpy and pandas).
+At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages might have their own requirements declared as "Extras" (including numpy, pandas, and pyarrow). You can install the requirements by specifying their extra names.
```
--- End diff --

Let's use the simple one you suggested and leave the detailed description for future PRs.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20089#discussion_r158797489

--- Diff: python/README.md ---
```diff
@@ -29,4 +29,4 @@ The Python packaging for Spark is not intended to replace all of the other use c
 
 ## Python Requirements
 
-At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages have their own requirements (including numpy and pandas).
+At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages might have their own requirements declared as "Extras" (including numpy, pandas, and pyarrow). You can install the requirements by specifying their extra names.
```
--- End diff --

Not a big deal anyway. I am actually fine as is too if you prefer, @ueshin.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20089#discussion_r158796151

--- Diff: python/README.md ---
```diff
@@ -29,4 +29,4 @@ The Python packaging for Spark is not intended to replace all of the other use c
 
 ## Python Requirements
 
-At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages have their own requirements (including numpy and pandas).
+At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages might have their own requirements declared as "Extras" (including numpy, pandas, and pyarrow). You can install the requirements by specifying their extra names.
```
--- End diff --

Ah, I see. How about simply:

```
At its core PySpark depends on Py4J (currently version 0.10.6), but some additional sub-packages have their own extra requirements for some features (including numpy, pandas, and pyarrow).
```

for now? I just noticed we are a bit unclear on this (e.g., I have actually been under the impression that NumPy is required for ML/MLlib so far), but I think this roughly describes it correctly and is good enough. I will maybe try to make a follow-up to describe it fully later. This PR targets PyArrow anyway.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20089#discussion_r158789947

--- Diff: python/README.md ---
```diff
@@ -29,4 +29,4 @@ The Python packaging for Spark is not intended to replace all of the other use c
 
 ## Python Requirements
 
-At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages have their own requirements (including numpy and pandas).
+At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages have their own requirements (including numpy, pandas, and pyarrow).
```
--- End diff --

I added some more details. WDYT?
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20089#discussion_r158774077

--- Diff: python/README.md ---
```diff
@@ -29,4 +29,4 @@ The Python packaging for Spark is not intended to replace all of the other use c
 
 ## Python Requirements
 
-At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages have their own requirements (including numpy and pandas).
+At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages have their own requirements (including numpy, pandas, and pyarrow).
```
--- End diff --

Yea, Pandas and PyArrow are optional. Maybe it's nicer if we have some more details here too.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20089#discussion_r158773775

--- Diff: python/setup.py ---
```diff
@@ -201,7 +201,7 @@ def _supports_symlinks():
     extras_require={
         'ml': ['numpy>=1.7'],
         'mllib': ['numpy>=1.7'],
-        'sql': ['pandas>=0.19.2']
+        'sql': ['pandas>=0.19.2', 'pyarrow>=0.8.0']
```
--- End diff --

Nope, `extras_require` does not do anything in normal cases, but they can be installed together with a dev option via pip, IIRC.
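To illustrate the point above: `extras_require` entries are inert during a plain install and only pull in their requirements when the extra is requested by name. A minimal sketch of the mapping from the diff (the `pyspark[...]` pip spellings in the comments assume the published distribution name):

```python
# Sketch of the extras_require mapping from the setup.py diff above.
# A plain `pip install pyspark` installs only the core dependency (Py4J);
# each extra below is resolved only when named explicitly, e.g.:
#   pip install pyspark[sql]      # also installs pandas and pyarrow
#   pip install pyspark[ml,sql]   # multiple extras at once
extras_require = {
    "ml": ["numpy>=1.7"],
    "mllib": ["numpy>=1.7"],
    "sql": ["pandas>=0.19.2", "pyarrow>=0.8.0"],
}

for extra, reqs in sorted(extras_require.items()):
    print(f"pyspark[{extra}] -> {', '.join(reqs)}")
```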
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20089#discussion_r158773551

--- Diff: python/setup.py ---
```diff
@@ -201,7 +201,7 @@ def _supports_symlinks():
    extras_require={
        'ml': ['numpy>=1.7'],
        'mllib': ['numpy>=1.7'],
-       'sql': ['pandas>=0.19.2']
+       'sql': ['pandas>=0.19.2', 'pyarrow>=0.8.0']
```
--- End diff --

If no pyarrow is installed, will setup force users to install it?
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20089#discussion_r158773507

--- Diff: python/README.md ---
```diff
@@ -29,4 +29,4 @@ The Python packaging for Spark is not intended to replace all of the other use c
 
 ## Python Requirements
 
-At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages have their own requirements (including numpy and pandas).
+At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages have their own requirements (including numpy, pandas, and pyarrow).
```
--- End diff --

This sounds mandatory, but I think pyarrow is still an optional choice. Right?
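Since pyarrow stays optional at runtime (as the comment above notes), code that uses Arrow-backed paths typically guards the import rather than failing outright. A minimal sketch of that pattern (the helper name `have_arrow` is illustrative, not PySpark's actual internal helper):

```python
# Sketch of guarding an optional dependency at runtime.
# If pyarrow is absent, Arrow-based code paths are simply disabled
# instead of raising ImportError at module import time.
def have_arrow():
    try:
        import pyarrow  # noqa: F401
        return True
    except ImportError:
        return False

# Callers branch on the check rather than assuming the package exists.
use_arrow_path = have_arrow()
print("Arrow optimizations enabled:", use_arrow_path)
```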
GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/20089

[SPARK-22324][SQL][PYTHON][FOLLOW-UP] Update setup.py file.

## What changes were proposed in this pull request?

This is a follow-up PR of #19884, updating the setup.py file to add the pyarrow dependency.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-22324/fup1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20089.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20089

commit 36614af4d8e00bb9564ef834a341859a0e96dfe4
Author: Takuya UESHIN
Date: 2017-12-27T04:33:59Z

    Add pyarrow to setup.py.