[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20962


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20962#discussion_r179664682
  
--- Diff: python/pyspark/sql/column.py ---
@@ -450,21 +450,69 @@ def isin(self, *cols):
 Returns a sort expression based on the ascending order of the given 
column name
 
 >>> from pyspark.sql import Row
->>> df = spark.createDataFrame([Row(name=u'Tom', height=80), 
Row(name=u'Alice', height=None)])
+>>> df = spark.createDataFrame([('Tom', 80), ('Alice', None)], 
["name", "height"])
 >>> df.select(df.name).orderBy(df.name.asc()).collect()
 [Row(name=u'Alice'), Row(name=u'Tom')]
 """
+_asc_nulls_first_doc = """
+Returns a sort expression based on the ascending order of the given 
column name and null values
--- End diff --

very small nit: shall we exactly match the doc with Scala side?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20962#discussion_r179664441
  
--- Diff: python/pyspark/sql/column.py ---
@@ -450,21 +450,69 @@ def isin(self, *cols):
 Returns a sort expression based on the ascending order of the given 
column name
 
 >>> from pyspark.sql import Row
->>> df = spark.createDataFrame([Row(name=u'Tom', height=80), 
Row(name=u'Alice', height=None)])
+>>> df = spark.createDataFrame([('Tom', 80), ('Alice', None)], 
["name", "height"])
--- End diff --

very small nit `"` -> `'` or `'` -> `"`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-04 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20962#discussion_r179292870
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -87,7 +87,15 @@ def _():
 'col': 'Returns a :class:`Column` based on the given column name.',
 'column': 'Returns a :class:`Column` based on the given column name.',
 'asc': 'Returns a sort expression based on the ascending order of the 
given column name.',
+'asc_nulls_first': 'Returns a sort expression based on the ascending 
order of the given' +
+   ' column name, and null values return before 
non-null values.',
+'asc_nulls_last': 'Returns a sort expression based on the ascending 
order of the given' +
+  ' column name, and null values appear after non-null 
values.',
 'desc': 'Returns a sort expression based on the descending order of 
the given column name.',
+'desc_nulls_first': 'Returns a sort expression based on the descending 
order of the given' +
+' column name, and null values appear before 
non-null values.',
+'desc_nulls_last': 'Returns a sort expression based on the descending 
order of the given' +
+   ' column name, and null values appear after 
non-null values',
--- End diff --

Yes, I manually tested the changes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20962#discussion_r179043803
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -87,7 +87,15 @@ def _():
 'col': 'Returns a :class:`Column` based on the given column name.',
 'column': 'Returns a :class:`Column` based on the given column name.',
 'asc': 'Returns a sort expression based on the ascending order of the 
given column name.',
+'asc_nulls_first': 'Returns a sort expression based on the ascending 
order of the given' +
+   ' column name, and null values return before 
non-null values.',
+'asc_nulls_last': 'Returns a sort expression based on the ascending 
order of the given' +
+  ' column name, and null values appear after non-null 
values.',
 'desc': 'Returns a sort expression based on the descending order of 
the given column name.',
+'desc_nulls_first': 'Returns a sort expression based on the descending 
order of the given' +
+' column name, and null values appear before 
non-null values.',
+'desc_nulls_last': 'Returns a sort expression based on the descending 
order of the given' +
+   ' column name, and null values appear after 
non-null values',
--- End diff --

I think you can make another

```
_functions_2_4 = {
...
```

and

```
 for _name, _doc in _functions_2_4.items():
 globals()[_name] = since(2.4)(_create_function(_name, _doc))
```



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20962#discussion_r179043081
  
--- Diff: python/pyspark/sql/column.py ---
@@ -454,6 +454,32 @@ def isin(self, *cols):
 >>> df.select(df.name).orderBy(df.name.asc()).collect()
 [Row(name=u'Alice'), Row(name=u'Tom')]
 """
+_asc_nulls_first_doc = """
+Returns a sort expression based on the ascending order of the given 
column name and null values
+return before non-null values
+
+>>> from pyspark.sql import Row
+>>> df = spark.createDataFrame([
+... Row(name=u'Tom', height=80),
+... Row(name=None, height=None),
+... Row(name=u'Alice', height=None)
+... ])
+>>> df.select(df.name).orderBy(df.name.asc_nulls_first()).collect()
+[Row(name=None), Row(name=u'Alice'), Row(name=u'Tom')]
+"""
--- End diff --

Can we add `.. versionadded:: 2.4.0` at the end, or do 
`since(2.4)(ignore_unicode_prefix(_unary_op("asc_nulls_first", 
_asc_nulls_first_doc)))` below?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20962#discussion_r179044620
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -87,7 +87,15 @@ def _():
 'col': 'Returns a :class:`Column` based on the given column name.',
 'column': 'Returns a :class:`Column` based on the given column name.',
 'asc': 'Returns a sort expression based on the ascending order of the 
given column name.',
+'asc_nulls_first': 'Returns a sort expression based on the ascending 
order of the given' +
+   ' column name, and null values return before 
non-null values.',
+'asc_nulls_last': 'Returns a sort expression based on the ascending 
order of the given' +
+  ' column name, and null values appear after non-null 
values.',
 'desc': 'Returns a sort expression based on the descending order of 
the given column name.',
+'desc_nulls_first': 'Returns a sort expression based on the descending 
order of the given' +
+' column name, and null values appear before 
non-null values.',
+'desc_nulls_last': 'Returns a sort expression based on the descending 
order of the given' +
+   ' column name, and null values appear after 
non-null values',
--- End diff --

Just for sure, it's manually tested, right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20962#discussion_r179044376
  
--- Diff: python/pyspark/sql/column.py ---
@@ -462,9 +488,39 @@ def isin(self, *cols):
 >>> df.select(df.name).orderBy(df.name.desc()).collect()
 [Row(name=u'Tom'), Row(name=u'Alice')]
 """
+_desc_nulls_first_doc = """
+Returns a sort expression based on the descending order of the given 
column name and null values
+return before non-null values
+
+>>> from pyspark.sql import Row
+>>> df = spark.createDataFrame([
+... Row(name=u'Tom', height=80),
+... Row(name=None, height=None),
+... Row(name=u'Alice', height=None)
+... ])
--- End diff --

I think, things like `spark.createDataFrame([('Alice', 1)], ['name', 
'height'])` would be simpler.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-03 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20962#discussion_r178957855
  
--- Diff: python/pyspark/sql/column.py ---
@@ -454,6 +454,32 @@ def isin(self, *cols):
 >>> df.select(df.name).orderBy(df.name.asc()).collect()
 [Row(name=u'Alice'), Row(name=u'Tom')]
 """
+_asc_nulls_first_doc = """
+Returns a sort expression based on the ascending order of the given 
column name and null values
+return before non-null values
+
+>>> from pyspark.sql import Row
--- End diff --

@HyukjinKwon Thank you very much for reviewing my changes! I verified the 
doctest using 
```
SPARK_TESTING=1 ./bin/pyspark pyspark.sql.column
```
It ran OK 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20962#discussion_r178692018
  
--- Diff: python/pyspark/sql/column.py ---
@@ -454,6 +454,32 @@ def isin(self, *cols):
 >>> df.select(df.name).orderBy(df.name.asc()).collect()
 [Row(name=u'Alice'), Row(name=u'Tom')]
 """
+_asc_nulls_first_doc = """
+Returns a sort expression based on the ascending order of the given 
column name and null values
+return before non-null values
+
+>>> from pyspark.sql import Row
--- End diff --

@huaxingao, btw, mind if I ask to double check if this doctest is actually 
ran? I remember I checked this several times and feel sure it runs but you know 
.. just in case.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20962#discussion_r178691892
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -1083,10 +1083,10 @@ class Column(val expr: Expression) extends Logging {
* and null values return before non-null values.
* {{{
*   // Scala: sort a DataFrame by age column in ascending order and 
null values appearing first.
-   *   df.sort(df("age").asc_nulls_last)
+   *   df.sort(df("age").asc_nulls_first)
*
*   // Java
-   *   df.sort(df.col("age").asc_nulls_last());
+   *   df.sort(df.col("age").asc_nulls_first());
--- End diff --

nice


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-02 Thread huaxingao
GitHub user huaxingao opened a pull request:

https://github.com/apache/spark/pull/20962

[SPARK-23847][PYTHON][SQL]Add asc_nulls_first, asc_nulls_last to PySpark

## What changes were proposed in this pull request?

Column.scala and Functions.scala have asc_nulls_first, asc_nulls_last,  
desc_nulls_first and desc_nulls_last. Add the corresponding python APIs in 
column.py and functions.py

## How was this patch tested?
Add doctest


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark spark-23847

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20962.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20962


commit 8007902c68f3cb36c1b625a96eda4232d03f8eb2
Author: Huaxin Gao 
Date:   2018-04-02T18:08:04Z

[SPARK-23847][PYTHON][SQL]Add asc_nulls_first, asc_nulls_last to PySpark




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org