[GitHub] spark pull request #20254: [SPARK-23062][SQL] Improve EXCEPT documentation
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20254 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161377477

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1364,7 +1364,9 @@ def subtract(self, other):
""" Return a new :class:`DataFrame` containing rows in this frame but not in another frame.
-This is equivalent to `EXCEPT` in SQL.
+This is equivalent to `EXCEPT DISTINCT` in SQL.
+
+(Note: Before Spark 2.0, the behavior was equivalent to `EXCEPT ALL` in SQL.)
--- End diff --

Actually, before 2.0, it is not equivalent to EXCEPT ALL. For details, see the PR: https://github.com/apache/spark/pull/12736
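The exact label in the note matters because `EXCEPT ALL` and `EXCEPT DISTINCT` return different results whenever the inputs contain duplicate rows. The following pure-Python sketch models rows as tuples and implements the standard SQL semantics of each operator. The helper names `except_distinct` and `except_all` are hypothetical, not Spark APIs, and the sketch says nothing about how Spark itself executes `subtract` in any version.

```python
from collections import Counter


def except_distinct(left, right):
    """EXCEPT DISTINCT: each distinct row of `left` that never appears in `right`."""
    right_rows = set(right)
    seen = set()
    result = []
    for row in left:
        if row not in right_rows and row not in seen:
            seen.add(row)
            result.append(row)
    return result


def except_all(left, right):
    """EXCEPT ALL: multiset difference. A row occurring m times in `left`
    and n times in `right` is kept max(m - n, 0) times."""
    remaining = Counter(right)
    result = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1  # cancel one matching copy from `right`
        else:
            result.append(row)
    return result


left = [("a",), ("a",), ("a",), ("b",), ("c",)]
right = [("a",), ("c",)]

print(except_distinct(left, right))  # [('b',)]
print(except_all(left, right))       # [('a',), ('a',), ('b',)]
```

With these inputs the two operators disagree only on the duplicated `'a'` rows, which is exactly where a documentation note naming the wrong variant would mislead readers.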
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161377488

--- Diff: R/pkg/R/DataFrame.R ---
@@ -2873,6 +2873,7 @@ setMethod("intersect",
#' @rdname except
#' @export
#' @note except since 1.4.0
+#' @note behaviour changed from \code{EXCEPT ALL} to \code{EXCEPT DISTINCT} in 2.0.
--- End diff --

Nit: This is wrong.
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161365422

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1364,7 +1364,9 @@ def subtract(self, other):
""" Return a new :class:`DataFrame` containing rows in this frame but not in another frame.
-This is equivalent to `EXCEPT` in SQL.
+This is equivalent to `EXCEPT DISTINCT` in SQL.
+
+(Note: Before Spark 2.0, the behavior was equivalent to `EXCEPT ALL` in SQL.)
--- End diff --

nit: `2.0` to `2.0.0`
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161365371

--- Diff: R/pkg/R/DataFrame.R ---
@@ -2873,6 +2873,7 @@ setMethod("intersect",
#' @rdname except
#' @export
#' @note except since 1.4.0
+#' @note behaviour changed from \code{EXCEPT ALL} to \code{EXCEPT DISTINCT} in 2.0.
--- End diff --

I don't mind it either way, but to note:
- R doc order and whitespace are significant. If you use `#' Note:`, you must put it after L2856. If you add an extra `#'` (i.e. an empty line), it becomes the `Details` section, which might be the right place; see http://spark.apache.org/docs/latest/api/R/awaitTermination.html
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161365416

--- Diff: R/pkg/R/DataFrame.R ---
@@ -2873,6 +2873,7 @@ setMethod("intersect",
#' @rdname except
#' @export
#' @note except since 1.4.0
+#' @note behaviour changed from \code{EXCEPT ALL} to \code{EXCEPT DISTINCT} in 2.0.
--- End diff --

i.e.:

```
#' but not in another SparkDataFrame. This is equivalent to \code{EXCEPT DISTINCT} in SQL.
#'
#' Note: Before Spark 2.0.0, the behavior was equivalent to `EXCEPT ALL` in SQL.
#'
#' @param x a SparkDataFrame.
```
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161358664

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1364,7 +1364,9 @@ def subtract(self, other):
""" Return a new :class:`DataFrame` containing rows in this frame but not in another frame.
-This is equivalent to `EXCEPT` in SQL.
+This is equivalent to `EXCEPT DISTINCT` in SQL.
+
+(Note: Before Spark 2.0, the behavior was equivalent to `EXCEPT ALL` in SQL.)
--- End diff --

In PySpark, we can use `.. note:: `. This makes the doc pretty :).
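A minimal sketch of what the `.. note::` suggestion looks like inside a docstring, assuming standard reStructuredText rules: the directive needs a blank line before it, and any continuation text is indented under it. `DataFrameSketch` is a hypothetical stand-in class, not PySpark's `DataFrame`, and the note body is deliberately placeholder text, since reviewers disputed the wording of the behavior-change note itself.

```python
import inspect


class DataFrameSketch(object):
    """Hypothetical stand-in class; only the docstring layout matters here."""

    def subtract(self, other):
        """Return a new :class:`DataFrame` containing rows in this frame
        but not in another frame.

        This is equivalent to `EXCEPT DISTINCT` in SQL.

        .. note:: Placeholder text. A blank line must separate the directive
            from the preceding paragraph, and continuation lines are indented
            under it.
        """
        raise NotImplementedError("docstring layout sketch only")


# Print the docstring with its indentation normalized by inspect.cleandoc.
print(inspect.cleandoc(DataFrameSketch.subtract.__doc__))
```

Sphinx renders such a directive as a "Note" admonition rather than as an inline parenthetical, which is the visual difference the comment refers to.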
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20254#discussion_r161358593

--- Diff: R/pkg/R/DataFrame.R ---
@@ -2873,6 +2873,7 @@ setMethod("intersect",
#' @rdname except
#' @export
#' @note except since 1.4.0
+#' @note behaviour changed from \code{EXCEPT ALL} to \code{EXCEPT DISTINCT} in 2.0.
--- End diff --

Ur.. I think we use `@note` for version specification in SparkR. Just adding

```
Note: blabla
```

should be fine, as in other places.
GitHub user henryr opened a pull request:

https://github.com/apache/spark/pull/20254

[SPARK-23062][SQL] Improve EXCEPT documentation

## What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more explicit in the documentation, and call out the change in behavior from 1.x.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/henryr/spark spark-23062

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20254.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20254

commit 9fe57074b496ad95411c4ce5a43b0c43dd6246af
Author: Henry Robinson
Date: 2018-01-13T00:17:00Z

[SPARK-23062][SQL] Improve EXCEPT documentation

## What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more explicit in the documentation, and call out the change in behavior from 1.x.
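The PR's premise, that a bare `EXCEPT` means `EXCEPT DISTINCT`, matches the SQL standard default and can be checked against any engine that follows it. The snippet below uses SQLite only because it ships with Python's `sqlite3` module; it is not Spark, and SQLite's documented compound operators include `EXCEPT` but not `EXCEPT ALL`. The table names `l` and `r` are arbitrary.

```python
import sqlite3

# In-memory database; two single-column tables stand in for two DataFrames.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE l (x TEXT);
    CREATE TABLE r (x TEXT);
    INSERT INTO l VALUES ('a'), ('b'), ('b'), ('c');
    INSERT INTO r VALUES ('a');
    """
)

# Bare EXCEPT, with no ALL/DISTINCT keyword: the two 'b' rows collapse to one.
rows = conn.execute("SELECT x FROM l EXCEPT SELECT x FROM r ORDER BY x").fetchall()
conn.close()

print(rows)  # [('b',), ('c',)]
```

Under `EXCEPT ALL` semantics, the same inputs would instead keep both copies of `'b'`.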