spark git commit: [SPARK-21042][SQL] Document Dataset.union is resolution by position

Repository: spark
Updated Branches:
  refs/heads/branch-2.2 869af5bcb -> 815a0820b

[SPARK-21042][SQL] Document Dataset.union is resolution by position

## What changes were proposed in this pull request?
Document Dataset.union is resolution by position, not by name, since this has been a confusing point for a lot of users.

## How was this patch tested?
N/A - doc only change.

Author: Reynold Xin

Closes #18256 from rxin/SPARK-21042.

(cherry picked from commit b78e3849b20d0d09b7146efd7ce8f203ef67b890)
Signed-off-by: Reynold Xin

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/815a0820
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/815a0820
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/815a0820

Branch: refs/heads/branch-2.2
Commit: 815a0820b1808118ae198a44f4aa0f0f2b6511e6
Parents: 869af5b
Author: Reynold Xin
Authored: Fri Jun 9 18:29:33 2017 -0700
Committer: Reynold Xin
Committed: Fri Jun 9 18:29:39 2017 -0700

--
 R/pkg/R/DataFrame.R | 1 +
 python/pyspark/sql/dataframe.py | 13 +
 .../src/main/scala/org/apache/spark/sql/Dataset.scala | 14 --
 3 files changed, 18 insertions(+), 10 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/815a0820/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index a7b1e3b..b606f1f 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2642,6 +2642,7 @@ generateAliasesForIntersectedCols <- function (x, intersectedColNames, suffix) {
 #' Input SparkDataFrames can have different schemas (names and data types).
 #'
 #' Note: This does not remove duplicate rows across the two SparkDataFrames.
+#' Also as standard in SQL, this function resolves columns by position (not by name).
 #'
 #' @param x A SparkDataFrame
 #' @param y A SparkDataFrame

http://git-wip-us.apache.org/repos/asf/spark/blob/815a0820/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index b1eb80e..d1b336d 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -1166,18 +1166,23 @@ class DataFrame(object):
 
 @since(2.0)
 def union(self, other):
-""" Return a new :class:`DataFrame` containing union of rows in this
-frame and another frame.
+""" Return a new :class:`DataFrame` containing union of rows in this and another frame.
 
 This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
 (that does deduplication of elements), use this function followed by a distinct.
+
+Also as standard in SQL, this function resolves columns by position (not by name).
 """
 return DataFrame(self._jdf.union(other._jdf), self.sql_ctx)
 
 @since(1.3)
 def unionAll(self, other):
-""" Return a new :class:`DataFrame` containing union of rows in this
-frame and another frame.
+""" Return a new :class:`DataFrame` containing union of rows in this and another frame.
+
+This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
+(that does deduplication of elements), use this function followed by a distinct.
+
+Also as standard in SQL, this function resolves columns by position (not by name).
 
 .. note:: Deprecated in 2.0, use union instead.
 """

http://git-wip-us.apache.org/repos/asf/spark/blob/815a0820/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index f37d433..3658890 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1630,10 +1630,11 @@ class Dataset[T] private[sql](
 
 /**
 * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
- * This is equivalent to `UNION ALL` in SQL.
 *
- * To do a SQL-style set union (that does deduplication of elements), use this function followed
- * by a [[distinct]].
+ * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does
+ * deduplication of elements), use this function followed by a [[distinct]].
+ *
+ * Also as standard in SQL, this function resolves columns by position (not by name).
 *
 * @group typedrel
 * @since 2.0.0
@@ -1643,10 +1644,11 @@ class Dataset[T]
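
Illustration (not part of the commit): the new doc text says union resolves columns by position "as standard in SQL". The sketch below shows that SQL behaviour with Python's built-in sqlite3 module rather than Spark, so it only mirrors the semantics the docstring describes. The table names left_t and right_t and their contents are made up for the example.

```python
import sqlite3

# Two hypothetical tables with the same column names in a different order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_t  (name TEXT, city TEXT);
    CREATE TABLE right_t (city TEXT, name TEXT);
    INSERT INTO left_t  VALUES ('alice', 'paris');
    INSERT INTO right_t VALUES ('tokyo', 'bob');
""")

# UNION ALL pairs columns by position: the first column of right_t (city)
# ends up under the first column of left_t (name).
cur = conn.execute("SELECT * FROM left_t UNION ALL SELECT * FROM right_t")
by_position_cols = [d[0] for d in cur.description]
by_position = cur.fetchall()

# To line columns up by name, select them explicitly in the same order.
cur = conn.execute(
    "SELECT name, city FROM left_t UNION ALL SELECT name, city FROM right_t"
)
by_name = cur.fetchall()

# UNION ALL keeps duplicate rows; UNION (the set union) removes them,
# which is what "union followed by a distinct" corresponds to.
with_duplicates = conn.execute(
    "SELECT name, city FROM left_t UNION ALL SELECT name, city FROM left_t"
).fetchall()
deduplicated = conn.execute(
    "SELECT name, city FROM left_t UNION SELECT name, city FROM left_t"
).fetchall()

conn.close()

print(by_position_cols, sorted(by_position))
print(sorted(by_name))
print(len(with_duplicates), len(deduplicated))
```

In the positional result the row from right_t carries 'tokyo' in the name column, which is the kind of silent mix-up the doc change warns about. Selecting the columns in a matching order before the union avoids it.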
spark git commit: [SPARK-21042][SQL] Document Dataset.union is resolution by position

Repository: spark
Updated Branches:
  refs/heads/master 571635488 -> b78e3849b

[SPARK-21042][SQL] Document Dataset.union is resolution by position

## What changes were proposed in this pull request?
Document Dataset.union is resolution by position, not by name, since this has been a confusing point for a lot of users.

## How was this patch tested?
N/A - doc only change.

Author: Reynold Xin

Closes #18256 from rxin/SPARK-21042.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b78e3849
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b78e3849
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b78e3849

Branch: refs/heads/master
Commit: b78e3849b20d0d09b7146efd7ce8f203ef67b890
Parents: 5716354
Author: Reynold Xin
Authored: Fri Jun 9 18:29:33 2017 -0700
Committer: Reynold Xin
Committed: Fri Jun 9 18:29:33 2017 -0700

--
 R/pkg/R/DataFrame.R | 1 +
 python/pyspark/sql/dataframe.py | 13 +
 .../src/main/scala/org/apache/spark/sql/Dataset.scala | 14 --
 3 files changed, 18 insertions(+), 10 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 166b398..3b9d42d 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2646,6 +2646,7 @@ generateAliasesForIntersectedCols <- function (x, intersectedColNames, suffix) {
 #' Input SparkDataFrames can have different schemas (names and data types).
 #'
 #' Note: This does not remove duplicate rows across the two SparkDataFrames.
+#' Also as standard in SQL, this function resolves columns by position (not by name).
 #'
 #' @param x A SparkDataFrame
 #' @param y A SparkDataFrame

http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 99abfcc..8541403 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -1175,18 +1175,23 @@ class DataFrame(object):
 
 @since(2.0)
 def union(self, other):
-""" Return a new :class:`DataFrame` containing union of rows in this
-frame and another frame.
+""" Return a new :class:`DataFrame` containing union of rows in this and another frame.
 
 This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
 (that does deduplication of elements), use this function followed by a distinct.
+
+Also as standard in SQL, this function resolves columns by position (not by name).
 """
 return DataFrame(self._jdf.union(other._jdf), self.sql_ctx)
 
 @since(1.3)
 def unionAll(self, other):
-""" Return a new :class:`DataFrame` containing union of rows in this
-frame and another frame.
+""" Return a new :class:`DataFrame` containing union of rows in this and another frame.
+
+This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
+(that does deduplication of elements), use this function followed by a distinct.
+
+Also as standard in SQL, this function resolves columns by position (not by name).
 
 .. note:: Deprecated in 2.0, use union instead.
 """

http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index f7637e0..d28ff78 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1734,10 +1734,11 @@ class Dataset[T] private[sql](
 
 /**
 * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
- * This is equivalent to `UNION ALL` in SQL.
 *
- * To do a SQL-style set union (that does deduplication of elements), use this function followed
- * by a [[distinct]].
+ * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does
+ * deduplication of elements), use this function followed by a [[distinct]].
+ *
+ * Also as standard in SQL, this function resolves columns by position (not by name).
 *
 * @group typedrel
 * @since 2.0.0
@@ -1747,10 +1748,11 @@ class Dataset[T] private[sql](
 
 /**
 * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
- * This is