spark git commit: [SPARK-21042][SQL] Document Dataset.union is resolution by position

2017-06-09 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 869af5bcb -> 815a0820b


[SPARK-21042][SQL] Document Dataset.union is resolution by position

## What changes were proposed in this pull request?
Document that Dataset.union resolves columns by position, not by name, since
this has been a confusing point for a lot of users.
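
To illustrate the pitfall, here is a minimal plain-Python sketch that models a position-based union on (columns, rows) pairs. It does not call Spark: `union_by_position` and `select` are hypothetical helpers invented for this illustration, and the sample tables are made up.

```python
# Plain-Python model of a position-based UNION ALL; this is NOT the Spark API.
# Tables are modelled as (column_names, rows) pairs, an assumption made only
# for this illustration.

def union_by_position(left, right):
    """Append right's rows to left's rows, matching columns by position."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if len(left_cols) != len(right_cols):
        raise ValueError("union requires the same number of columns")
    # The result keeps the left side's column names. Right-hand column names
    # are ignored, and duplicate rows are kept (UNION ALL behaviour).
    return list(left_cols), list(left_rows) + list(right_rows)


def select(table, names):
    """Reorder a table's columns by name (hypothetical helper)."""
    cols, rows = table
    idx = [cols.index(n) for n in names]
    return list(names), [tuple(row[i] for i in idx) for row in rows]


people = (["name", "city"], [("Ann", "Oslo")])
swapped = (["city", "name"], [("Rome", "Bob")])

# Same column names in a different order: values land in the wrong columns.
naive = union_by_position(people, swapped)

# Reordering the right side by name first lines the columns up.
aligned = union_by_position(people, select(swapped, people[0]))

print(naive)
print(aligned)
```

In this model the naive union puts "Rome" under `name`, while reordering the second table by the first table's column names before the union keeps each value under its intended column.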

## How was this patch tested?
N/A - doc only change.

Author: Reynold Xin 

Closes #18256 from rxin/SPARK-21042.

(cherry picked from commit b78e3849b20d0d09b7146efd7ce8f203ef67b890)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/815a0820
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/815a0820
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/815a0820

Branch: refs/heads/branch-2.2
Commit: 815a0820b1808118ae198a44f4aa0f0f2b6511e6
Parents: 869af5b
Author: Reynold Xin 
Authored: Fri Jun 9 18:29:33 2017 -0700
Committer: Reynold Xin 
Committed: Fri Jun 9 18:29:39 2017 -0700

--
 R/pkg/R/DataFrame.R   |  1 +
 python/pyspark/sql/dataframe.py   | 13 +
 .../src/main/scala/org/apache/spark/sql/Dataset.scala | 14 --
 3 files changed, 18 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/815a0820/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index a7b1e3b..b606f1f 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2642,6 +2642,7 @@ generateAliasesForIntersectedCols <- function (x, intersectedColNames, suffix) {
 #' Input SparkDataFrames can have different schemas (names and data types).
 #'
 #' Note: This does not remove duplicate rows across the two SparkDataFrames.
+#' Also as standard in SQL, this function resolves columns by position (not by name).
 #'
 #' @param x A SparkDataFrame
 #' @param y A SparkDataFrame

http://git-wip-us.apache.org/repos/asf/spark/blob/815a0820/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index b1eb80e..d1b336d 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -1166,18 +1166,23 @@ class DataFrame(object):
 
     @since(2.0)
     def union(self, other):
-        """ Return a new :class:`DataFrame` containing union of rows in this
-        frame and another frame.
+        """ Return a new :class:`DataFrame` containing union of rows in this and another frame.
 
         This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
         (that does deduplication of elements), use this function followed by a distinct.
+
+        Also as standard in SQL, this function resolves columns by position (not by name).
         """
         return DataFrame(self._jdf.union(other._jdf), self.sql_ctx)
 
     @since(1.3)
     def unionAll(self, other):
-        """ Return a new :class:`DataFrame` containing union of rows in this
-        frame and another frame.
+        """ Return a new :class:`DataFrame` containing union of rows in this and another frame.
+
+        This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
+        (that does deduplication of elements), use this function followed by a distinct.
+
+        Also as standard in SQL, this function resolves columns by position (not by name).
 
         .. note:: Deprecated in 2.0, use union instead.
         """

http://git-wip-us.apache.org/repos/asf/spark/blob/815a0820/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index f37d433..3658890 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1630,10 +1630,11 @@ class Dataset[T] private[sql](
 
   /**
    * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
-   * This is equivalent to `UNION ALL` in SQL.
    *
-   * To do a SQL-style set union (that does deduplication of elements), use this function followed
-   * by a [[distinct]].
+   * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does
+   * deduplication of elements), use this function followed by a [[distinct]].
+   *
+   * Also as standard in SQL, this function resolves columns by position (not by name).
    *
    * @group typedrel
    * @since 2.0.0
@@ -1643,10 +1644,11 @@ class Dataset[T] 

spark git commit: [SPARK-21042][SQL] Document Dataset.union is resolution by position

2017-06-09 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 571635488 -> b78e3849b


[SPARK-21042][SQL] Document Dataset.union is resolution by position

## What changes were proposed in this pull request?
Document that Dataset.union resolves columns by position, not by name, since
this has been a confusing point for a lot of users.

## How was this patch tested?
N/A - doc only change.

Author: Reynold Xin 

Closes #18256 from rxin/SPARK-21042.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b78e3849
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b78e3849
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b78e3849

Branch: refs/heads/master
Commit: b78e3849b20d0d09b7146efd7ce8f203ef67b890
Parents: 5716354
Author: Reynold Xin 
Authored: Fri Jun 9 18:29:33 2017 -0700
Committer: Reynold Xin 
Committed: Fri Jun 9 18:29:33 2017 -0700

--
 R/pkg/R/DataFrame.R   |  1 +
 python/pyspark/sql/dataframe.py   | 13 +
 .../src/main/scala/org/apache/spark/sql/Dataset.scala | 14 --
 3 files changed, 18 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 166b398..3b9d42d 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2646,6 +2646,7 @@ generateAliasesForIntersectedCols <- function (x, intersectedColNames, suffix) {
 #' Input SparkDataFrames can have different schemas (names and data types).
 #'
 #' Note: This does not remove duplicate rows across the two SparkDataFrames.
+#' Also as standard in SQL, this function resolves columns by position (not by name).
 #'
 #' @param x A SparkDataFrame
 #' @param y A SparkDataFrame

http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 99abfcc..8541403 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -1175,18 +1175,23 @@ class DataFrame(object):
 
     @since(2.0)
     def union(self, other):
-        """ Return a new :class:`DataFrame` containing union of rows in this
-        frame and another frame.
+        """ Return a new :class:`DataFrame` containing union of rows in this and another frame.
 
         This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
         (that does deduplication of elements), use this function followed by a distinct.
+
+        Also as standard in SQL, this function resolves columns by position (not by name).
         """
         return DataFrame(self._jdf.union(other._jdf), self.sql_ctx)
 
     @since(1.3)
     def unionAll(self, other):
-        """ Return a new :class:`DataFrame` containing union of rows in this
-        frame and another frame.
+        """ Return a new :class:`DataFrame` containing union of rows in this and another frame.
+
+        This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
+        (that does deduplication of elements), use this function followed by a distinct.
+
+        Also as standard in SQL, this function resolves columns by position (not by name).
 
         .. note:: Deprecated in 2.0, use union instead.
         """

http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index f7637e0..d28ff78 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1734,10 +1734,11 @@ class Dataset[T] private[sql](
 
   /**
    * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
-   * This is equivalent to `UNION ALL` in SQL.
    *
-   * To do a SQL-style set union (that does deduplication of elements), use this function followed
-   * by a [[distinct]].
+   * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does
+   * deduplication of elements), use this function followed by a [[distinct]].
+   *
+   * Also as standard in SQL, this function resolves columns by position (not by name).
    *
    * @group typedrel
    * @since 2.0.0
@@ -1747,10 +1748,11 @@ class Dataset[T] private[sql](
 
   /**
    * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
-   * This is