GitHub user coreywoodfield opened a pull request:
https://github.com/apache/spark/pull/18462
Removed invalid joinTypes from javadoc of Dataset#joinWith
## What changes were proposed in this pull request?
Two invalid join types were mistakenly listed in the javadoc for joinWith in
the Dataset class. I presume they were copied from the javadoc of join, but
since joinWith returns a Dataset\<Tuple2\>, left_semi and left_anti are
invalid: they return values from only one of the two datasets rather than from
both.
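The type argument can be illustrated without Spark at all: a joinWith-style operation must produce a (left, right) pair for every result row, while a semi join by definition yields rows from only the left side, so there is no right-hand element to put in the tuple. A minimal plain-Java sketch of the two shapes (hypothetical helper names, not Spark API):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;
import java.util.function.BiPredicate;

public class JoinSketch {
    // joinWith-style inner join: every result row is a (left, right) pair,
    // so both sides must be present in each output element.
    static <L, R> List<Entry<L, R>> joinWith(List<L> left, List<R> right,
                                             BiPredicate<L, R> cond) {
        List<Entry<L, R>> out = new ArrayList<>();
        for (L l : left)
            for (R r : right)
                if (cond.test(l, r)) out.add(new SimpleEntry<>(l, r));
        return out;
    }

    // left_semi join: keeps only the left rows that have a match. The result
    // contains no right-hand values, so it cannot be expressed as pairs.
    static <L, R> List<L> leftSemi(List<L> left, List<R> right,
                                   BiPredicate<L, R> cond) {
        List<L> out = new ArrayList<>();
        for (L l : left)
            if (right.stream().anyMatch(r -> cond.test(l, r))) out.add(l);
        return out;
    }
}
```

Note the return types differ (`List<Entry<L, R>>` versus `List<L>`), which mirrors why joinWith, whose contract is to produce a Dataset of tuples, cannot support left_semi or left_anti.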
## How was this patch tested?
I ran the following code:
```java
public static void main(String[] args) {
  SparkSession spark = new SparkSession(new SparkContext("local[*]", "Test"));
  Dataset<Row> one = spark.createDataFrame(Arrays.asList(
      new Bean(1), new Bean(2), new Bean(3), new Bean(4), new Bean(5)), Bean.class);
  Dataset<Row> two = spark.createDataFrame(Arrays.asList(
      new Bean(4), new Bean(5), new Bean(6), new Bean(7), new Bean(8), new Bean(9)), Bean.class);
  String[] joinTypes = {"inner", "cross", "outer", "full", "full_outer",
      "left", "left_outer", "right", "right_outer", "left_semi", "left_anti"};
  for (String joinType : joinTypes) {
    try {
      two.joinWith(one, one.col("x").equalTo(two.col("x")), joinType).show();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
```
which exercises all the different join types; the last two (left_semi and
left_anti) threw exceptions, while the same code using join instead of
joinWith ran without errors. The Bean class was just a Java bean with a single
int field, x.
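For reference, a Bean class matching that description (a serializable Java bean with a single int field x, following the getter/setter convention that Spark's bean encoder expects) might look like this sketch:

```java
import java.io.Serializable;

// A minimal Java bean with a single int field, as described above.
public class Bean implements Serializable {
    private int x;

    // No-arg constructor required by the bean convention.
    public Bean() {}

    public Bean(int x) { this.x = x; }

    public int getX() { return x; }

    public void setX(int x) { this.x = x; }
}
```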
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/coreywoodfield/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18462.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18462
----
commit a643ef2f60a45c202d71e39d0cdaa32fb1350890
Author: Corey Woodfield <[email protected]>
Date: 2017-06-29T00:11:26Z
Removed invalid joinTypes from javadoc of Dataset#joinWith
----