[GitHub] spark pull request #20254: [SPARK-23062][SQL] Improve EXCEPT documentation

2018-01-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20254


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20254: [SPARK-23062][SQL] Improve EXCEPT documentation

2018-01-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161377477
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1364,7 +1364,9 @@ def subtract(self, other):
 """ Return a new :class:`DataFrame` containing rows in this frame
 but not in another frame.
 
-This is equivalent to `EXCEPT` in SQL.
+This is equivalent to `EXCEPT DISTINCT` in SQL.
+
+(Note: Before Spark 2.0, the behavior was equivalent to `EXCEPT ALL` in SQL.)
--- End diff --

Actually, before 2.0, it is not equivalent to EXCEPT ALL. For details, see 
the PR: https://github.com/apache/spark/pull/12736


---




[GitHub] spark pull request #20254: [SPARK-23062][SQL] Improve EXCEPT documentation

2018-01-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161377488
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2873,6 +2873,7 @@ setMethod("intersect",
 #' @rdname except
 #' @export
 #' @note except since 1.4.0
+#' @note behaviour changed from \code{EXCEPT ALL} to \code{EXCEPT DISTINCT} in 2.0.
--- End diff --

Nit: This is wrong. Before 2.0, the behavior was not equivalent to `EXCEPT ALL`.


---




[GitHub] spark pull request #20254: [SPARK-23062][SQL] Improve EXCEPT documentation

2018-01-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161365422
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1364,7 +1364,9 @@ def subtract(self, other):
 """ Return a new :class:`DataFrame` containing rows in this frame
 but not in another frame.
 
-This is equivalent to `EXCEPT` in SQL.
+This is equivalent to `EXCEPT DISTINCT` in SQL.
+
+(Note: Before Spark 2.0, the behavior was equivalent to `EXCEPT ALL` in SQL.)
--- End diff --

nit: `2.0` to `2.0.0`


---




[GitHub] spark pull request #20254: [SPARK-23062][SQL] Improve EXCEPT documentation

2018-01-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161365371
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2873,6 +2873,7 @@ setMethod("intersect",
 #' @rdname except
 #' @export
 #' @note except since 1.4.0
+#' @note behaviour changed from \code{EXCEPT ALL} to \code{EXCEPT DISTINCT} in 2.0.
--- End diff --

I don't mind it either way, but to note:
- R doc order and whitespace are significant. If you use `#' Note:`, you must
put it after L2856. If you add an extra `#'` (i.e. an empty line), the text
becomes part of the `Details` section, which might be the right place; see
http://spark.apache.org/docs/latest/api/R/awaitTermination.html


---




[GitHub] spark pull request #20254: [SPARK-23062][SQL] Improve EXCEPT documentation

2018-01-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161365416
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2873,6 +2873,7 @@ setMethod("intersect",
 #' @rdname except
 #' @export
 #' @note except since 1.4.0
+#' @note behaviour changed from \code{EXCEPT ALL} to \code{EXCEPT DISTINCT} in 2.0.
--- End diff --

i.e.
```
#' but not in another SparkDataFrame. This is equivalent to \code{EXCEPT DISTINCT} in SQL.
#'
#' Note: Before Spark 2.0.0, the behavior was equivalent to `EXCEPT ALL` in SQL.
#'
#' @param x a SparkDataFrame.
```


---




[GitHub] spark pull request #20254: [SPARK-23062][SQL] Improve EXCEPT documentation

2018-01-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161358664
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1364,7 +1364,9 @@ def subtract(self, other):
 """ Return a new :class:`DataFrame` containing rows in this frame
 but not in another frame.
 
-This is equivalent to `EXCEPT` in SQL.
+This is equivalent to `EXCEPT DISTINCT` in SQL.
+
+(Note: Before Spark 2.0, the behavior was equivalent to `EXCEPT ALL` in SQL.)
--- End diff --

In PySpark, we can use `.. note:: `. This makes the doc pretty :).
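
For illustration, a minimal sketch of how a `.. note::` directive sits inside a docstring. `DataFrameSketch` is a hypothetical stand-in class, not the real PySpark `DataFrame`, and the note text is placeholder wording rather than what the PR ended up with:

```python
class DataFrameSketch:
    """Hypothetical stand-in class; only the docstring layout matters here."""

    def subtract(self, other):
        """Return a new :class:`DataFrame` containing rows in this frame
        but not in another frame.

        This is equivalent to `EXCEPT DISTINCT` in SQL.

        .. note:: Duplicate rows are removed from the result.
        """
        # The body is deliberately left unimplemented: this sketch only
        # demonstrates where the Sphinx directive goes in the docstring.
        raise NotImplementedError("docstring layout example only")
```

Sphinx renders the directive as a highlighted note box, which is what makes the generated page stand out compared with a plain parenthetical.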


---




[GitHub] spark pull request #20254: [SPARK-23062][SQL] Improve EXCEPT documentation

2018-01-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161358593
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2873,6 +2873,7 @@ setMethod("intersect",
 #' @rdname except
 #' @export
 #' @note except since 1.4.0
+#' @note behaviour changed from \code{EXCEPT ALL} to \code{EXCEPT DISTINCT} in 2.0.
--- End diff --

Hm, I think we use `@note` for version specification in SparkR. Just
adding

```
Note: blabla
```
should be fine, as in other places.



---




[GitHub] spark pull request #20254: [SPARK-23062][SQL] Improve EXCEPT documentation

2018-01-12 Thread henryr
GitHub user henryr opened a pull request:

https://github.com/apache/spark/pull/20254

[SPARK-23062][SQL] Improve EXCEPT documentation

## What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more
explicit in the documentation, and call out the change in behavior
from 1.x.
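
To make the distinction concrete, here is a rough plain-Python model of the standard SQL semantics of `EXCEPT DISTINCT` versus `EXCEPT ALL`. It is not Spark's implementation and says nothing about what Spark 1.x actually did; the helper names `except_distinct` and `except_all` are made up for this sketch, and rows are modelled as simple hashable, sortable values:

```python
from collections import Counter


def except_distinct(left, right):
    """Distinct rows of `left` that do not occur in `right` at all."""
    return sorted(set(left) - set(right))


def except_all(left, right):
    """Multiset difference: each row in `right` cancels one copy in `left`."""
    remaining = Counter(left) - Counter(right)  # negative counts are dropped
    return sorted(remaining.elements())


left = [1, 1, 2, 2, 3]
right = [1, 3]

print(except_distinct(left, right))  # [2]
print(except_all(left, right))       # [1, 2, 2]
```

With `EXCEPT DISTINCT`, a single matching row on the right removes every copy on the left and the survivors are de-duplicated; with `EXCEPT ALL`, duplicates are counted and cancelled one for one.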


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/henryr/spark spark-23062

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20254.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20254


commit 9fe57074b496ad95411c4ce5a43b0c43dd6246af
Author: Henry Robinson 
Date:   2018-01-13T00:17:00Z

[SPARK-23062][SQL] Improve EXCEPT documentation

## What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more
explicit in the documentation, and call out the change in behavior
from 1.x.




---
