GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/10477

    [SPARK-12520] [PySpark] Correct Descriptions and Add Use Cases in Equi-Join

    After reading the JIRA issue https://issues.apache.org/jira/browse/SPARK-12520,
    I double-checked the code.
    
    For example, users can write an equi-join with an explicit join type like this:
      ```df.join(df2, 'name', 'outer').select('name', 'height').collect()```
    - There is a bug in 1.4 and 1.5: the code ignores the third parameter
      (the join type) that users pass. The join is always executed as `Inner`,
      even if the user specifies another type (e.g., `Outer`).
    - Since the PR https://github.com/apache/spark/pull/8600, 1.6 no longer
      has this issue, but the description has not been updated.
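
    To make the difference visible, here is a minimal plain-Python sketch of
    what an inner versus an outer equi-join on `name` returns. It does not use
    Spark and is not how Spark implements joins; the helper `equi_join` and the
    sample rows are hypothetical and only illustrate which rows a silently
    ignored `outer` would drop.

```python
def equi_join(left, right, key, how="inner"):
    """Join two lists of dicts on a shared column `key` (illustration only).

    Note: unlike SQL, rows whose key is None would match each other here.
    """
    if how not in ("inner", "outer"):
        raise ValueError("unsupported join type: %s" % how)
    left_cols = sorted({c for row in left for c in row} - {key})
    right_cols = sorted({c for row in right for c in row} - {key})
    result = []
    matched_right = set()
    for l in left:
        hits = [i for i, r in enumerate(right) if r[key] == l[key]]
        for i in hits:
            matched_right.add(i)
            result.append({**l, **right[i]})
        if not hits and how == "outer":
            # Keep the unmatched left row, padding right-side columns with None.
            result.append({**l, **{c: None for c in right_cols}})
    if how == "outer":
        for i, r in enumerate(right):
            if i not in matched_right:
                # Keep the unmatched right row, padding left-side columns with None.
                result.append({**{c: None for c in left_cols}, **r})
    return result


# Hypothetical sample data, chosen only for this example.
people = [{"name": "Alice", "age": 2}, {"name": "Bob", "age": 5}]
heights = [{"name": "Tom", "height": 80}, {"name": "Bob", "height": 85}]

inner = [(r["name"], r["height"]) for r in equi_join(people, heights, "name", "inner")]
outer = [(r["name"], r["height"]) for r in equi_join(people, heights, "name", "outer")]
print(inner)  # [('Bob', 85)]
print(outer)  # [('Alice', None), ('Bob', 85), ('Tom', 80)]
```

    With data like this, a join type that is silently treated as `Inner`
    returns one row where the user asked for three.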
    
    I plan to submit another PR to fix 1.5 and to issue an error message if
    users specify a non-inner join type when using an equi-join.
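
    As a rough sketch of the kind of check meant here (not the actual patch),
    the validation could look like the following. The function name
    `check_equi_join_type`, its signature, and the error text are all
    hypothetical, and it assumes that an equi-join is recognized by `on` being
    a column name or a list of column names.

```python
def check_equi_join_type(on, how=None):
    """Reject non-inner join types when joining by column name(s).

    Hypothetical helper for illustration; not part of PySpark.
    """
    names = [on] if isinstance(on, str) else on
    by_name = (
        isinstance(names, (list, tuple))
        and len(names) > 0
        and all(isinstance(c, str) for c in names)
    )
    if by_name and how is not None and how.lower() != "inner":
        raise ValueError(
            "join type %r is not supported when joining on column names; "
            "only 'inner' is supported" % how
        )


check_equi_join_type("name")           # passes: no join type given
check_equi_join_type("name", "inner")  # passes: inner is allowed
try:
    check_equi_join_type("name", "outer")
except ValueError as exc:
    print("rejected:", exc)
```

    Failing loudly in this way would keep users from getting inner-join
    results while believing they ran an outer join.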
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark pyOuterJoin

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10477.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10477
    
----
commit 29231828d47ff74c190fae782ae08bfe89861958
Author: gatorsmile <[email protected]>
Date:   2015-12-25T18:35:19Z

    equi-join with the other join type.

----


