GitHub user bkpathak opened a pull request:

    https://github.com/apache/spark/pull/15409

    [SPARK-14761][SQL] Reject invalid join methods when join columns are not 
specified in PySpark DataFrame join.

    ## What changes were proposed in this pull request?
    
    In PySpark, an invalid join type does not raise an error for the following 
join:
    ```df1.join(df2, how='not-a-valid-join-type')```
    
    The signature of the join is:
    ```def join(self, other, on=None, how=None):```
    The existing code ignores the `how` parameter entirely when `on` is 
`None`. This patch processes the arguments passed to `join` and forwards them to 
the JVM Spark SQL analyzer, which validates the join type.
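
    The effect of the change can be sketched in plain Python. This is a 
hypothetical helper, not the patch itself: the real validation happens in the 
JVM analyzer's `JoinType`, and the exact set of accepted join-type strings shown 
here is illustrative.

```python
# Hypothetical sketch of the validation this patch delegates to the JVM
# analyzer. The helper name and the accepted-type set are illustrative,
# not Spark's actual implementation.
_VALID_JOIN_TYPES = {
    "inner", "outer", "full", "fullouter",
    "left", "leftouter", "right", "rightouter",
    "leftsemi", "leftanti", "cross",
}

def validate_join_type(how=None):
    """Return a normalized join type, or raise for an unsupported one."""
    if how is None:
        return "inner"  # Spark defaults to an inner join
    normalized = how.lower().replace("_", "")
    if normalized not in _VALID_JOIN_TYPES:
        raise ValueError(
            "Unsupported join type '%s'; supported types: %s"
            % (how, ", ".join(sorted(_VALID_JOIN_TYPES))))
    return normalized
```

    With a check like this in the call path, 
`df1.join(df2, how='not-a-valid-join-type')` fails fast instead of silently 
falling back to an inner join.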
    
    ## How was this patch tested?
    Tested manually and with the existing test suites.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bkpathak/spark SPARK-14761

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15409.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15409
    
----
commit cec8ec48de5f51f40ff4b929da0c0496fcc0a662
Author: Bijay Pathak <[email protected]>
Date:   2016-10-09T23:58:33Z

    reject invalid join methods when join columns are not specified in pyspark 
DataFrame joins

----


