[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

gatorsmile Fri, 27 Oct 2017 10:05:58 -0700

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19570#discussion_r147464434
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql](
        *
        * Also as standard in SQL, this function resolves columns by position (not by name).
        *
    +   * Notice that the column positions in the schema aren't necessarily matched with the
    +   * fields in the typed objects in a Dataset. This function resolves columns by their positions
    +   * in the schema, not the fields in the typed objects, as this Scala example shows:
    +   *
    +   * {{{
    +   *   case class Test(a: String, b: String)
    +   *   val ds1 = Seq(("a", "b")).toDF("a", "b").as[Test] // ds1's schema: 
[a: String, b: String]
    +   *   val ds2 = Seq(("b", "a")).toDF("b", "a").as[Test] // ds2's schema: 
[b: String, a: String]
    +   *   ds1.union(ds2).show
    +   *
    +   *   // output:
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // |  b|  a|
    +   *   // +---+---+
    --- End diff --
    
    Please use the same example as `union`. We just need to add a comment explaining that it also applies to strongly-typed JVM objects.
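
    For reference, the positional behavior under discussion can be reproduced with a minimal standalone sketch. It reuses the `Test` case class and the two Datasets from the diff. The object name `UnionByPositionSketch`, the helper names, and the local-mode session settings are assumptions made for illustration and are not part of the pull request. The second helper reorders the columns with `select` before the union, which is one possible way to line the schemas up by name.

```scala
import org.apache.spark.sql.SparkSession

// Defined at top level so that Spark can derive an encoder for it.
case class Test(a: String, b: String)

// Hypothetical wrapper object, used only for this sketch.
object UnionByPositionSketch {

  // Unions the two Datasets from the diff as-is: columns are paired by
  // their position in the schema, not by the field names of Test.
  def positionalUnion(spark: SparkSession): Array[Test] = {
    import spark.implicits._
    val ds1 = Seq(("a", "b")).toDF("a", "b").as[Test] // schema: [a, b]
    val ds2 = Seq(("b", "a")).toDF("b", "a").as[Test] // schema: [b, a]
    ds1.union(ds2).collect()
  }

  // Reorders ds2's columns to [a, b] first, so that position and name agree.
  def alignedUnion(spark: SparkSession): Array[Test] = {
    import spark.implicits._
    val ds1 = Seq(("a", "b")).toDF("a", "b").as[Test]
    val ds2 = Seq(("b", "a")).toDF("b", "a").as[Test]
    ds1.union(ds2.select("a", "b").as[Test]).collect()
  }

  def main(args: Array[String]): Unit = {
    // Local single-threaded session; the settings are assumptions for the sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("union-by-position-sketch")
      .getOrCreate()

    positionalUnion(spark).foreach(println)
    alignedUnion(spark).foreach(println)

    spark.stop()
  }
}
```

    In the first helper the second row keeps the values in ds2's schema order, which matches the `b, a` row in the output quoted in the diff. In the second helper both rows are built from columns in `a, b` order.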


