Github user nsyca commented on the issue:

    https://github.com/apache/spark/pull/14580
  
    
    @dongjoon-hyun, my apologies for the terse comment I posted earlier. 
There is nothing wrong with ```full outer join``` with ```using```. What I 
tried to explain is that ```using``` is syntactic sugar for a regular 
```full outer join```. Using your example:
    
    ```
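    // Two small DataFrames that share the join column "a"
    val a = Seq((1,2),(2,3)).toDF("a","b")
    val b = Seq((2,5),(3,4)).toDF("a","c")
    // Full outer join with USING semantics on column "a"
    val ab = a.join(b, Seq("a"), "fullouter")
    // Print the logical plans; the output follows
    ab.explain(true)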
    == Parsed Logical Plan ==
    'Join UsingJoin(FullOuter,List('a))
    :- Project [_1#186 AS a#189, _2#187 AS b#190]
    :  +- LocalRelation [_1#186, _2#187]
    +- Project [_1#196 AS a#199, _2#197 AS c#200]
       +- LocalRelation [_1#196, _2#197]
    
    == Analyzed Logical Plan ==
    a: int, b: int, c: int
    Project [coalesce(a#189, a#199) AS a#210, b#190, c#200]
    +- Join FullOuter, (a#189 = a#199)
       :- Project [_1#186 AS a#189, _2#187 AS b#190]
       :  +- LocalRelation [_1#186, _2#187]
       +- Project [_1#196 AS a#199, _2#197 AS c#200]
          +- LocalRelation [_1#196, _2#197]
    ...
    ```
    
    @gatorsmile, you can see here that the ```UsingJoin(...)``` above is 
interpreted as a regular ```full outer join``` in which the join column in the 
output (the SELECT list) is replaced by the expression 
```COALESCE(<first-table>.<join-col>, <second-table>.<join-col>)```. The 
```UsingJoin``` node no longer exists after the analysis phase. 
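    
    As a rough sketch of that rewrite (my own illustration, not taken from the 
analyzer source), the same result can be produced by hand with an explicit join 
condition plus ```coalesce```. The names ```spark```, ```usingJoin``` and 
```explicitJoin``` are assumptions made for this example; in ```spark-shell``` 
the session already exists and ```getOrCreate()``` simply returns it:
    
    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.coalesce

    // Assumption: a local session; in spark-shell getOrCreate() reuses the existing one.
    val spark = SparkSession.builder().master("local[*]").appName("using-join-sketch").getOrCreate()
    import spark.implicits._

    val a = Seq((1, 2), (2, 3)).toDF("a", "b")
    val b = Seq((2, 5), (3, 4)).toDF("a", "c")

    // USING form: the join column "a" appears once in the output.
    val usingJoin = a.join(b, Seq("a"), "fullouter")

    // Hand-written equivalent: explicit equality condition, then coalesce the two "a" columns.
    val explicitJoin = a.join(b, a("a") === b("a"), "fullouter")
      .select(coalesce(a("a"), b("a")).as("a"), a("b"), b("c"))

    // Both forms expose the same columns and the same rows (row order is not guaranteed).
    assert(usingJoin.columns.sameElements(explicitJoin.columns))
    assert(usingJoin.collect().toSet == explicitJoin.collect().toSet)
    ```
    
    If the two assertions pass, the ```using``` form and the hand-written form 
are interchangeable for this input, which is what the analyzed plan suggests.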
    
    I found that Oracle supports the ```USING``` syntax, but it is not clear 
to me how Oracle interprets the output column(s) named in the USING clause. 
Here is what I found on [Oracle's 
website](http://docs.oracle.com/javadb/10.10.1.2/ref/rrefsqljusing.html):
    
    > When a USING clause is specified, an asterisk (*) in the select list of 
the query will be expanded to the following list of columns (in this order):
    > 
    > - All the columns in the USING clause
    > - All the columns of the first (left) table that are not specified in the 
USING clause
    > - All the columns of the second (right) table that are not specified in the 
USING clause
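    
    To make the quoted ordering rule concrete, here is a minimal sketch over 
plain column-name lists. ```expandStar``` is a hypothetical helper written for 
this comment, not a Spark or Oracle API. For the example above it yields the 
same order as the analyzed plan (```a, b, c```):
    
    ```scala
    // Hypothetical helper: applies the quoted expansion rule to column-name lists.
    def expandStar(usingCols: Seq[String], leftCols: Seq[String], rightCols: Seq[String]): Seq[String] = {
      // Columns of each side that are not named in the USING clause, in their original order.
      val leftRest = leftCols.filterNot(c => usingCols.contains(c))
      val rightRest = rightCols.filterNot(c => usingCols.contains(c))
      // USING columns first, then the left remainder, then the right remainder.
      usingCols ++ leftRest ++ rightRest
    }

    // Column lists from the example: a(a, b) joined with b(a, c) USING (a).
    val expanded = expandStar(Seq("a"), Seq("a", "b"), Seq("a", "c"))
    assert(expanded == Seq("a", "b", "c"))
    ```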
    
    I am trying to verify whether @dongjoon-hyun's PR is too restrictive. 
I understand that this PR fixes the problem reported here, but I want to make 
sure it is the right fix. I do agree with @dongjoon-hyun that his fix is in 
the right place. I will post an update on my findings later.

