GitHub user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47871420
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
    @@ -48,11 +48,13 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
     
         checkAnswer(
           df.join(df2, Seq("int", "str"), "left"),
    -      Row(1, 2, "1", null) :: Row(2, 3, "2", null) :: Row(3, 4, "3", null) :: Nil)
    +      Row(1, 2, "1", null, null, null) :: Row(2, 3, "2", null, null, null) ::
    +        Row(3, 4, "3", null, null, null) :: Nil)
     
         checkAnswer(
           df.join(df2, Seq("int", "str"), "right"),
    -      Row(null, null, null, 2) :: Row(null, null, null, 3) :: Row(null, null, null, 4) :: Nil)
    +      Row(null, null, null, 1, 2, "2") :: Row(null, null, null, 2, 3, "3") ::
    +        Row(null, null, null, 3, 4, "4") :: Nil)
    --- End diff --
    
    We probably shouldn't show join keys multiple times in the result set. For
    `LEFT/RIGHT JOIN USING` queries, both PostgreSQL and MySQL show the join keys
    only once. The ScalaDoc of this overloaded `DataFrame.join` method describes
    the same behavior:
    
    ```scala
    
      /**
       * Equi-join with another [[DataFrame]] using the given columns.
       *
       * Different from other join functions, the join columns will only appear once in the output,
       * i.e. similar to SQL's `JOIN USING` syntax.
       ...
       */
    ```
    
    The following example comes from [PostgreSQL docs][1] (section 7.2.1.1):
    
    ```sql
    CREATE TABLE t1 (num INT, name TEXT);
    INSERT INTO t1 VALUES (1, 'a');
    INSERT INTO t1 VALUES (2, 'b');
    INSERT INTO t1 VALUES (3, 'c');
    
    CREATE TABLE t2 (num INT, value TEXT);
    INSERT INTO t2 VALUES (1, 'xxx');
    INSERT INTO t2 VALUES (3, 'yyy');
    INSERT INTO t2 VALUES (5, 'zzz');
    
    SELECT * FROM t1 LEFT JOIN t2 USING (num);
    ```
    
    PostgreSQL returns:
    
    ```
     num | name | value
    -----+------+-------
       1 | a    | xxx
       2 | b    |
       3 | c    | yyy
    (3 rows)
    ```
    
    and MySQL returns:
    
    ```
    +------+------+-------+
    | num  | name | value |
    +------+------+-------+
    |    1 | a    | xxx   |
    |    2 | b    | NULL  |
    |    3 | c    | yyy   |
    +------+------+-------+
    3 rows in set (0.01 sec)
    ```
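    
    To make the expected shape concrete, here is a minimal plain-Scala sketch of
    `LEFT JOIN ... USING (num)` over the same `t1`/`t2` data. It does not use
    Spark, and the names `LeftJoinUsingSketch`, `T1`, `T2`, `Joined`, and
    `leftJoinUsingNum` are hypothetical, introduced only for illustration. It
    models the semantics the two databases show above, not how `DataFrame.join`
    is implemented.
    
    ```scala
    // Plain-Scala model of `SELECT * FROM t1 LEFT JOIN t2 USING (num)`.
    // All names here are illustrative assumptions, not Spark APIs.
    object LeftJoinUsingSketch {
      final case class T1(num: Int, name: String)
      final case class T2(num: Int, value: String)
    
      // Output row: the join key appears once, followed by the non-key
      // columns of the left side and then of the right side.
      final case class Joined(num: Int, name: String, value: Option[String])
    
      def leftJoinUsingNum(left: Seq[T1], right: Seq[T2]): Seq[Joined] = {
        val rightByNum: Map[Int, Seq[T2]] = right.groupBy(_.num)
        left.flatMap { l =>
          rightByNum.get(l.num) match {
            // One output row per matching right row.
            case Some(matches) => matches.map(r => Joined(l.num, l.name, Some(r.value)))
            // Unmatched left row: right-side columns are NULL (None here).
            case None => Seq(Joined(l.num, l.name, None))
          }
        }
      }
    
      val t1: Seq[T1] = Seq(T1(1, "a"), T1(2, "b"), T1(3, "c"))
      val t2: Seq[T2] = Seq(T2(1, "xxx"), T2(3, "yyy"), T2(5, "zzz"))
    
      def main(args: Array[String]): Unit =
        leftJoinUsingNum(t1, t2).foreach(println)
    }
    ```
    
    Each output row has three fields (`num`, `name`, `value`), matching the
    PostgreSQL and MySQL results above, rather than carrying `num` once per
    side.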
    
    [1]: http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html

