GitHub user coreywoodfield opened a pull request:
https://github.com/apache/spark/pull/18462
Removed invalid joinTypes from javadoc of Dataset#joinWith
## What changes were proposed in this pull request?
Two invalid join types were mistakenly listed in the javadoc for joinWith in
the Dataset class. I presume they were copied from the javadoc of join, but
since joinWith returns a Dataset\<Tuple2\>, left_semi and left_anti are
invalid: they return values from only one of the two datasets rather than from
both.
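The type argument can be illustrated without Spark at all: a joinWith-style operation must produce a (left, right) pair for every result row, while a semi join by definition yields rows from only the left side, so there is no right-hand element to put in the tuple. A minimal plain-Java sketch of the two shapes (hypothetical helper names, not Spark API):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;
import java.util.function.BiPredicate;

public class JoinSketch {
    // joinWith-style inner join: every result row is a (left, right) pair,
    // so both sides must be present in each output element.
    static <L, R> List<Entry<L, R>> joinWith(List<L> left, List<R> right,
                                             BiPredicate<L, R> cond) {
        List<Entry<L, R>> out = new ArrayList<>();
        for (L l : left)
            for (R r : right)
                if (cond.test(l, r)) out.add(new SimpleEntry<>(l, r));
        return out;
    }

    // left_semi join: keeps only the left rows that have a match. The result
    // contains no right-hand values, so it cannot be expressed as pairs.
    static <L, R> List<L> leftSemi(List<L> left, List<R> right,
                                   BiPredicate<L, R> cond) {
        List<L> out = new ArrayList<>();
        for (L l : left)
            if (right.stream().anyMatch(r -> cond.test(l, r))) out.add(l);
        return out;
    }
}
```

Note the return types differ (`List<Entry<L, R>>` versus `List<L>`), which mirrors why joinWith, whose contract is to produce a Dataset of tuples, cannot support left_semi or left_anti.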
## How was this patch tested?
I ran the following code:
```java
public static void main(String[] args) {
  SparkSession spark = new SparkSession(new SparkContext("local[*]", "Test"));
  Dataset<Row> one = spark.createDataFrame(Arrays.asList(
      new Bean(1), new Bean(2), new Bean(3), new Bean(4), new Bean(5)), Bean.class);
  Dataset<Row> two = spark.createDataFrame(Arrays.asList(
      new Bean(4), new Bean(5), new Bean(6), new Bean(7), new Bean(8), new Bean(9)), Bean.class);
  String[] joinTypes = {"inner", "cross", "outer", "full", "full_outer",
      "left", "left_outer", "right", "right_outer", "left_semi", "left_anti"};
  for (String joinType : joinTypes) {
    try {
      two.joinWith(one, one.col("x").equalTo(two.col("x")), joinType).show();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
```
which exercises all the different join types; the last two (left_semi and
left_anti) threw exceptions, while the same code using join instead of
joinWith ran without errors. The Bean class was just a Java bean with a single
int field, x.
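For reference, a Bean class matching that description (a serializable Java bean with a single int field x, following the getter/setter convention that Spark's bean encoder expects) might look like this sketch:

```java
import java.io.Serializable;

// A minimal Java bean with a single int field, as described above.
public class Bean implements Serializable {
    private int x;

    // No-arg constructor required by the bean convention.
    public Bean() {}

    public Bean(int x) { this.x = x; }

    public int getX() { return x; }

    public void setX(int x) { this.x = x; }
}
```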
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/coreywoodfield/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18462.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18462
----
commit a643ef2f60a45c202d71e39d0cdaa32fb1350890
Author: Corey Woodfield <[email protected]>
Date: 2017-06-29T00:11:26Z
Removed invalid joinTypes from javadoc of Dataset#joinWith
----