brandondahler opened a new pull request #33323: URL: https://github.com/apache/spark/pull/33323
### What changes were proposed in this pull request?

Adds three new syntactic-sugar overloads to `Dataset`'s `join` method, as proposed in [SPARK-35739](https://issues.apache.org/jira/browse/SPARK-35739).

### Why are the changes needed?

Improves the development experience for Spark SQL users, especially those writing Java.

Before these changes, the `Seq` overloads forced Java developers to call lesser-known Java-to-Scala converter methods, which made code less readable. Two of the new overloads accept a Java array and perform that conversion internally. The third adds a single-column overload that takes an explicit join type, which is useful from both Java and Scala.

### Does this PR introduce _any_ user-facing change?

Yes. The three new overloads technically constitute an API change to the `Dataset` class. They are net-new and are documented with comments consistent with the existing `join` methods.

### How was this patch tested?

Test cases were not added because it is unclear to me where or how syntactic-sugar overloads fit into the test suites (if at all). I'm happy to add them if someone can point me in the right direction.

* Changes were tested in Scala via spark-shell.
* Changes were tested in Java by modifying an example:

```
diff --git a/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java b/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java
index 86a9045d8a..342810c1e6 100644
--- a/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java
+++ b/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java
@@ -124,6 +124,10 @@ public class JavaSparkSQLExample {
     // |-- age: long (nullable = true)
     // |-- name: string (nullable = true)
 
+    df.join(df, new String[] {"age"}).show();
+    df.join(df, "age", "left").show();
+    df.join(df, new String[] {"age"}, "left").show();
+
     // Select only the "name" column
     df.select("name").show();
     // +-------+
```
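To show the shape of the change without needing Spark on the classpath, here is a minimal, hypothetical sketch. `JoinOverloadSketch` and `ToyDataset` are made-up names, not Spark classes, and the toy only records the join it was asked for instead of executing one. It mirrors the three call shapes from the Java example: each convenience overload adapts its argument (array or single column) to a list once and delegates to one canonical method that stands in for the existing `Seq`-based `join(right, usingColumns, joinType)`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, NOT Spark's Dataset API: a toy class showing how
// array/single-column convenience overloads can convert their arguments once
// and delegate to a single canonical join(right, List<String>, String) method.
public class JoinOverloadSketch {

    // Toy stand-in for a Dataset; it records the requested join as a string
    // instead of executing it.
    static final class ToyDataset {
        final String name;

        ToyDataset(String name) {
            this.name = name;
        }

        // Canonical method: plays the role of the existing
        // join(right, usingColumns: Seq[String], joinType: String) overload.
        ToyDataset join(ToyDataset right, List<String> usingColumns, String joinType) {
            return new ToyDataset(
                joinType + "(" + name + ", " + right.name + ", using=" + usingColumns + ")");
        }

        // Array overload with the default "inner" join type; the
        // array-to-list conversion happens here, not in caller code.
        ToyDataset join(ToyDataset right, String[] usingColumns) {
            return join(right, usingColumns, "inner");
        }

        // Single-column overload with an explicit join type.
        ToyDataset join(ToyDataset right, String usingColumn, String joinType) {
            return join(right, Collections.singletonList(usingColumn), joinType);
        }

        // Array overload with an explicit join type.
        ToyDataset join(ToyDataset right, String[] usingColumns, String joinType) {
            return join(right, Arrays.asList(usingColumns), joinType);
        }
    }

    public static void main(String[] args) {
        ToyDataset df = new ToyDataset("df");
        // Same call shapes as the JavaSparkSQLExample change in this PR.
        System.out.println(df.join(df, new String[] {"age"}).name);       // inner(df, df, using=[age])
        System.out.println(df.join(df, "age", "left").name);              // left(df, df, using=[age])
        System.out.println(df.join(df, new String[] {"age"}, "left").name); // left(df, df, using=[age])
    }
}
```

The design point the sketch illustrates is that the overloads only adapt argument types, so join semantics live in a single method. Defaulting the two-argument array form to `"inner"` mirrors the existing two-argument `usingColumns` join.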
