Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19570#discussion_r147464434
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1753,6 +1753,27 @@ class Dataset[T] private[sql](
*
* Also as standard in SQL, this function resolves columns by position (not by name).
*
+ * Notice that the column positions in the schema aren't necessarily matched with the
+ * fields in the typed objects in a Dataset. This function resolves columns by their positions
+ * in the schema, not the fields in the typed objects, as this Scala example shows:
+ *
+ * {{{
+ * case class Test(a: String, b: String)
+ * val ds1 = Seq(("a", "b")).toDF("a", "b").as[Test] // ds1's schema: [a: String, b: String]
+ * val ds2 = Seq(("b", "a")).toDF("b", "a").as[Test] // ds2's schema: [b: String, a: String]
+ * ds1.union(ds2).show
+ *
+ * // output:
+ * // +---+---+
+ * // |  a|  b|
+ * // +---+---+
+ * // |  a|  b|
+ * // |  b|  a|
+ * // +---+---+
--- End diff --
Please use the same example as `union`. You just need to add a comment
explaining that it also applies to strongly-typed JVM objects.
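
A minimal sketch of what that could look like, assuming a local `SparkSession` and a hypothetical case class `Cols` that is not part of the PR. It reuses the positional-union idea from the diff with typed Datasets: both sides are converted with `as[Cols]`, but `union` still matches columns by their position in the schema, not by field name.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class for illustration only; not taken from the PR.
case class Cols(col0: Int, col1: Int, col2: Int)

object UnionByPositionSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session is acceptable for the example.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("union-by-position-sketch")
      .getOrCreate()
    import spark.implicits._

    // ds1's schema: [col0, col1, col2]
    val ds1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2").as[Cols]
    // ds2's schema: [col1, col2, col0]. The typed view maps fields by name,
    // but the underlying column order stays as declared here.
    val ds2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0").as[Cols]

    // union resolves columns by schema position, so this also applies to
    // strongly-typed JVM objects: the second row is (4, 5, 6), not (6, 4, 5).
    ds1.union(ds2).show()

    // output:
    // +----+----+----+
    // |col0|col1|col2|
    // +----+----+----+
    // |   1|   2|   3|
    // |   4|   5|   6|
    // +----+----+----+

    spark.stop()
  }
}
```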