jeff303 commented on a change in pull request #26286: [SPARK-26739][SQL]
Standardized Join Types for DataFrames
URL: https://github.com/apache/spark/pull/26286#discussion_r340823270
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -980,21 +980,48 @@ class Dataset[T] private[sql](
* `DataFrame`s, you will NOT be able to reference any columns after the join, since
* there is no way to disambiguate which side of the join you would like to reference.
*
+ * @deprecated Use
+ * [[Dataset.join(Dataset[_], Seq[String], JoinType): DataFrame* this version]] instead
+ *
* @group untypedrel
* @since 2.0.0
*/
+ @deprecated("Use [[Dataset#join(Dataset[_], Seq[String], JoinType): DataFrame* this]]", "3.0.0")
def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame = {
+ join(right, usingColumns, JoinType(joinType))
+ }
+
+ /**
+ * Equi-join with another `DataFrame` using the given columns. A cross join with a predicate
+ * is specified as an inner join. If you would explicitly like to perform a cross join use the
+ * `crossJoin` method.
+ *
+ * Different from other join functions, the join columns will only appear once in the output,
+ * i.e. similar to SQL's `JOIN USING` syntax.
+ *
+ * @param right Right side of the join operation.
+ * @param usingColumns Names of the columns to join on. These columns must exist on both sides.
+ * @param joinType Type of join to perform (instance of [[JoinType!]]). Default [[Inner]].
+ *
+ * @note If you perform a self-join using this function without aliasing the input
+ * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+ * there is no way to disambiguate which side of the join you would like to reference.
+ *
+ * @group untypedrel
+ * @since 2.0.0
+ */
+ def join(right: Dataset[_], usingColumns: Seq[String], joinType: JoinType): DataFrame = {
Review comment:
OK, that change is made. To avoid name clashes with the new, user-facing
`JoinType` enumeration, I renamed the previous `JoinType` to
`CatalystJoinType`. That seemed appropriate since the old one, as discussed,
is not user-facing anyway. I also tried other names for the new public-facing
enum so that the existing `JoinType` would not need renaming (e.g. `Join`,
which clashes too much with existing names, and `JoinKind`, which works but
sounds clunky and could be confused with higher-kinded types), but none of
them seemed quite right.
Of course, I am open to feedback here, as always.
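For illustration, the split described above could look roughly like the
following. This is a minimal, hypothetical sketch and not the PR's actual code:
the `JoinTypeSketch` wrapper, the three enum members shown, the `name` field,
and the string-returning `join` stand-ins are all assumptions made so the
example runs on its own without Spark. The internal type renamed to
`CatalystJoinType` is left out entirely.

```scala
object JoinTypeSketch {
  // Stand-in for the new user-facing enumeration (members are illustrative).
  sealed abstract class JoinType(val name: String)

  object JoinType {
    case object Inner extends JoinType("inner")
    case object LeftOuter extends JoinType("left_outer")
    case object FullOuter extends JoinType("full_outer")

    private val values: Seq[JoinType] = Seq(Inner, LeftOuter, FullOuter)

    // Parses the legacy string form so the String overload can delegate.
    def apply(s: String): JoinType = {
      val key = s.toLowerCase(java.util.Locale.ROOT)
      values.find(_.name == key).getOrElse(
        throw new IllegalArgumentException(s"Unsupported join type: $s"))
    }
  }

  // Typed overload. It returns a description instead of a DataFrame
  // so that the sketch has no Spark dependency.
  def join(usingColumns: Seq[String], joinType: JoinType): String =
    s"${joinType.name} join using ${usingColumns.mkString(", ")}"

  // Legacy String overload: deprecated, and forwards to the typed overload.
  @deprecated("Use the JoinType overload", "3.0.0")
  def join(usingColumns: Seq[String], joinType: String): String =
    join(usingColumns, JoinType(joinType))
}
```

With this shape, `join(Seq("id"), "INNER")` and `join(Seq("id"), JoinType.Inner)`
go through the same typed path, and an unrecognized string fails immediately
with an `IllegalArgumentException`.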