[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20962 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20962#discussion_r179664682 --- Diff: python/pyspark/sql/column.py --- @@ -450,21 +450,69 @@ def isin(self, *cols): Returns a sort expression based on the ascending order of the given column name >>> from pyspark.sql import Row ->>> df = spark.createDataFrame([Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]) +>>> df = spark.createDataFrame([('Tom', 80), ('Alice', None)], ["name", "height"]) >>> df.select(df.name).orderBy(df.name.asc()).collect() [Row(name=u'Alice'), Row(name=u'Tom')] """ +_asc_nulls_first_doc = """ +Returns a sort expression based on the ascending order of the given column name and null values --- End diff -- very small nit: shall we exactly match the doc with Scala side? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20962#discussion_r179664441 --- Diff: python/pyspark/sql/column.py --- @@ -450,21 +450,69 @@ def isin(self, *cols): Returns a sort expression based on the ascending order of the given column name >>> from pyspark.sql import Row ->>> df = spark.createDataFrame([Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]) +>>> df = spark.createDataFrame([('Tom', 80), ('Alice', None)], ["name", "height"]) --- End diff -- very small nit `"` -> `'` or `'` -> `"`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20962#discussion_r179292870 --- Diff: python/pyspark/sql/functions.py --- @@ -87,7 +87,15 @@ def _(): 'col': 'Returns a :class:`Column` based on the given column name.', 'column': 'Returns a :class:`Column` based on the given column name.', 'asc': 'Returns a sort expression based on the ascending order of the given column name.', +'asc_nulls_first': 'Returns a sort expression based on the ascending order of the given' + + ' column name, and null values return before non-null values.', +'asc_nulls_last': 'Returns a sort expression based on the ascending order of the given' + + ' column name, and null values appear after non-null values.', 'desc': 'Returns a sort expression based on the descending order of the given column name.', +'desc_nulls_first': 'Returns a sort expression based on the descending order of the given' + +' column name, and null values appear before non-null values.', +'desc_nulls_last': 'Returns a sort expression based on the descending order of the given' + + ' column name, and null values appear after non-null values', --- End diff -- Yes, I manually tested the changes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20962#discussion_r179043803 --- Diff: python/pyspark/sql/functions.py --- @@ -87,7 +87,15 @@ def _(): 'col': 'Returns a :class:`Column` based on the given column name.', 'column': 'Returns a :class:`Column` based on the given column name.', 'asc': 'Returns a sort expression based on the ascending order of the given column name.', +'asc_nulls_first': 'Returns a sort expression based on the ascending order of the given' + + ' column name, and null values return before non-null values.', +'asc_nulls_last': 'Returns a sort expression based on the ascending order of the given' + + ' column name, and null values appear after non-null values.', 'desc': 'Returns a sort expression based on the descending order of the given column name.', +'desc_nulls_first': 'Returns a sort expression based on the descending order of the given' + +' column name, and null values appear before non-null values.', +'desc_nulls_last': 'Returns a sort expression based on the descending order of the given' + + ' column name, and null values appear after non-null values', --- End diff -- I think you can make another ``` _functions_2_4 = { ... ``` and ``` for _name, _doc in _functions_2_4.items(): globals()[_name] = since(2.4)(_create_function(_name, _doc)) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20962#discussion_r179043081 --- Diff: python/pyspark/sql/column.py --- @@ -454,6 +454,32 @@ def isin(self, *cols): >>> df.select(df.name).orderBy(df.name.asc()).collect() [Row(name=u'Alice'), Row(name=u'Tom')] """ +_asc_nulls_first_doc = """ +Returns a sort expression based on the ascending order of the given column name and null values +return before non-null values + +>>> from pyspark.sql import Row +>>> df = spark.createDataFrame([ +... Row(name=u'Tom', height=80), +... Row(name=None, height=None), +... Row(name=u'Alice', height=None) +... ]) +>>> df.select(df.name).orderBy(df.name.asc_nulls_first()).collect() +[Row(name=None), Row(name=u'Alice'), Row(name=u'Tom')] +""" --- End diff -- Can we add `.. versionadded:: 2.4.0` at the end, or do `since(2.4)(ignore_unicode_prefix(_unary_op("asc_nulls_first", _asc_nulls_first_doc)))` below? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20962#discussion_r179044620 --- Diff: python/pyspark/sql/functions.py --- @@ -87,7 +87,15 @@ def _(): 'col': 'Returns a :class:`Column` based on the given column name.', 'column': 'Returns a :class:`Column` based on the given column name.', 'asc': 'Returns a sort expression based on the ascending order of the given column name.', +'asc_nulls_first': 'Returns a sort expression based on the ascending order of the given' + + ' column name, and null values return before non-null values.', +'asc_nulls_last': 'Returns a sort expression based on the ascending order of the given' + + ' column name, and null values appear after non-null values.', 'desc': 'Returns a sort expression based on the descending order of the given column name.', +'desc_nulls_first': 'Returns a sort expression based on the descending order of the given' + +' column name, and null values appear before non-null values.', +'desc_nulls_last': 'Returns a sort expression based on the descending order of the given' + + ' column name, and null values appear after non-null values', --- End diff -- Just for sure, it's manually tested, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20962#discussion_r179044376 --- Diff: python/pyspark/sql/column.py --- @@ -462,9 +488,39 @@ def isin(self, *cols): >>> df.select(df.name).orderBy(df.name.desc()).collect() [Row(name=u'Tom'), Row(name=u'Alice')] """ +_desc_nulls_first_doc = """ +Returns a sort expression based on the descending order of the given column name and null values +return before non-null values + +>>> from pyspark.sql import Row +>>> df = spark.createDataFrame([ +... Row(name=u'Tom', height=80), +... Row(name=None, height=None), +... Row(name=u'Alice', height=None) +... ]) --- End diff -- I think, things like `spark.createDataFrame([('Alice', 1)], ['name', 'height'])` would be simpler. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20962#discussion_r178957855 --- Diff: python/pyspark/sql/column.py --- @@ -454,6 +454,32 @@ def isin(self, *cols): >>> df.select(df.name).orderBy(df.name.asc()).collect() [Row(name=u'Alice'), Row(name=u'Tom')] """ +_asc_nulls_first_doc = """ +Returns a sort expression based on the ascending order of the given column name and null values +return before non-null values + +>>> from pyspark.sql import Row --- End diff -- @HyukjinKwon Thank you very much for reviewing my changes! I verified the doctest using ``` SPARK_TESTING=1 ./bin/pyspark pyspark.sql.column ``` It ran OK --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20962#discussion_r178692018 --- Diff: python/pyspark/sql/column.py --- @@ -454,6 +454,32 @@ def isin(self, *cols): >>> df.select(df.name).orderBy(df.name.asc()).collect() [Row(name=u'Alice'), Row(name=u'Tom')] """ +_asc_nulls_first_doc = """ +Returns a sort expression based on the ascending order of the given column name and null values +return before non-null values + +>>> from pyspark.sql import Row --- End diff -- @huaxingao, btw, mind if I ask to double check if this doctest is actually ran? I remember I checked this several times and feel sure it runs but you know .. just in case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20962#discussion_r178691892 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala --- @@ -1083,10 +1083,10 @@ class Column(val expr: Expression) extends Logging { * and null values return before non-null values. * {{{ * // Scala: sort a DataFrame by age column in ascending order and null values appearing first. - * df.sort(df("age").asc_nulls_last) + * df.sort(df("age").asc_nulls_first) * * // Java - * df.sort(df.col("age").asc_nulls_last()); + * df.sort(df.col("age").asc_nulls_first()); --- End diff -- nice --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/20962 [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, asc_nulls_last to PySpark ## What changes were proposed in this pull request? Column.scala and Functions.scala have asc_nulls_first, asc_nulls_last, desc_nulls_first and desc_nulls_last. Add the corresponding python APIs in column.py and functions.py ## How was this patch tested? Add doctest You can merge this pull request into a Git repository by running: $ git pull https://github.com/huaxingao/spark spark-23847 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20962.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20962 commit 8007902c68f3cb36c1b625a96eda4232d03f8eb2 Author: Huaxin Gao Date: 2018-04-02T18:08:04Z [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, asc_nulls_last to PySpark --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org