GitHub user bkpathak opened a pull request:
https://github.com/apache/spark/pull/12691
[Spark-14761][SQL][WIP] Reject invalid join methods even when join columns
are not specified in PySpark DataFrame join.
## What changes were proposed in this pull request?
In PySpark, an invalid join type does not raise an error for the following
join:
```df1.join(df2, how='not-a-valid-join-type')```
The signature of the join is:
```def join(self, other, on=None, how=None):```
The existing code silently ignores the `how` parameter when `on` is
`None`. This patch processes the arguments passed to `join` and forwards them to
the JVM Spark SQL Analyzer, which validates the join type.
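To illustrate the intended behavior, here is a minimal pure-Python sketch (not the actual Spark source) of a join wrapper that always normalizes and validates `how` instead of dropping it when `on` is `None`. The set of valid join types and the `VALID_JOIN_TYPES` name are assumptions for the sketch, not the exact list used by Spark's Analyzer.

```python
# Assumed join-type list for illustration only; Spark's Analyzer
# maintains its own set.
VALID_JOIN_TYPES = {"inner", "outer", "left_outer", "right_outer", "leftsemi"}

def join(other, on=None, how=None):
    # Old behavior: when `on` was None, `how` was silently ignored,
    # so an invalid join type produced no error.
    # New behavior: normalize and validate `how` in every case.
    how = how or "inner"
    if how not in VALID_JOIN_TYPES:
        raise ValueError(
            "Unsupported join type %r. Supported types: %s"
            % (how, sorted(VALID_JOIN_TYPES))
        )
    # Stand-in for handing the arguments to the JVM-side planner.
    return ("join", other, on, how)

join("df2")                                   # defaults to an inner join
try:
    join("df2", how="not-a-valid-join-type")  # now rejected
except ValueError as e:
    print("rejected:", e)
```

With this shape, the invalid-type check fires whether or not join columns are given, which is the gap the patch closes.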
## How was this patch tested?
Tested manually and with the existing test suites.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bkpathak/spark spark-14761
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12691.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12691
----
commit ab6518468d67063e92667f3a8f6b563fea5b00f8
Author: Bijay Pathak <[email protected]>
Date: 2016-04-26T05:07:30Z
refactored join so it always passes complete arguments to JVM api
commit 66746964427cd8028250963cc26b0397e8141bc4
Author: Bijay Pathak <[email protected]>
Date: 2016-04-26T06:00:07Z
updated to handle condition when on is None
commit db36befc3dd969d5b5ade5398c6d3aa0a93c7fbd
Author: Bijay Pathak <[email protected]>
Date: 2016-04-26T06:38:18Z
fixed the style error
----