jeff303 commented on a change in pull request #26286: [SPARK-26739][SQL] Standardized Join Types for DataFrames
URL: https://github.com/apache/spark/pull/26286#discussion_r340823270
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
 ##########
 @@ -980,21 +980,48 @@ class Dataset[T] private[sql](
    * `DataFrame`s, you will NOT be able to reference any columns after the join, since
    * there is no way to disambiguate which side of the join you would like to reference.
    *
+   * @deprecated Use
+   * [[Dataset.join(Dataset[_], Seq[String], JoinType): DataFrame* this version]] instead
+   *
    * @group untypedrel
    * @since 2.0.0
    */
+  @deprecated("Use [[Dataset#join(Dataset[_], Seq[String], JoinType): DataFrame* this]]", "3.0.0")
   def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame = {
+    join(right, usingColumns, JoinType(joinType))
+  }
+
+  /**
+   * Equi-join with another `DataFrame` using the given columns. A cross join with a predicate
+   * is specified as an inner join. If you would explicitly like to perform a cross join use the
+   * `crossJoin` method.
+   *
+   * Different from other join functions, the join columns will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumns Names of the columns to join on. These columns must exist on both sides.
+   * @param joinType Type of join to perform (instance of [[JoinType!]]). Default [[Inner]].
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 2.0.0
+   */
+  def join(right: Dataset[_], usingColumns: Seq[String], joinType: JoinType): DataFrame = {
 
 Review comment:
   OK, that change is made. To avoid a name clash with the new, user-facing 
`JoinType` enumeration, I renamed the existing `JoinType` to 
`CatalystJoinType`. That seemed appropriate since, as discussed, the old one 
is not user-facing anyway. I also tried other names for the new public-facing 
enum so that the existing `JoinType` would not need renaming (e.g. `Join`, 
which clashes with too many existing names, and `JoinKind`, which works but 
sounds clunky and may be confused with higher-kinded types), but none of them 
seemed quite right.
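
   For reference, here is a minimal, Spark-free sketch of the shape described 
above: a user-facing `JoinType` whose companion parses the legacy string 
form, plus a deprecated string overload that delegates to the typed one. 
Everything in it is an assumption for illustration, not the code in this PR: 
the `JoinTypeSketch` wrapper, the `sql` field, the member list beyond `Inner`, 
and the `String` return type standing in for `DataFrame`.

```scala
// Hypothetical sketch only; none of these definitions are copied from the PR.
object JoinTypeSketch {
  // User-facing join kinds. Member names other than Inner are assumed.
  sealed abstract class JoinType(val sql: String)
  case object Inner extends JoinType("inner")
  case object LeftOuter extends JoinType("left_outer")
  case object RightOuter extends JoinType("right_outer")
  case object FullOuter extends JoinType("full_outer")

  object JoinType {
    private val all: Seq[JoinType] = Seq(Inner, LeftOuter, RightOuter, FullOuter)

    // Parse the legacy string form, ignoring case.
    def apply(name: String): JoinType = {
      val normalized = name.toLowerCase(java.util.Locale.ROOT)
      all.find(_.sql == normalized).getOrElse(
        throw new IllegalArgumentException(s"Unsupported join type '$name'"))
    }
  }

  // Stand-in for the typed overload; returns a description instead of a DataFrame.
  def join(usingColumns: Seq[String], joinType: JoinType): String =
    s"${joinType.sql} join using (${usingColumns.mkString(", ")})"

  // Legacy overload: converts the string and delegates to the typed version.
  @deprecated("Use the JoinType overload instead", "3.0.0")
  def join(usingColumns: Seq[String], joinType: String): String =
    join(usingColumns, JoinType(joinType))
}
```

   With a sealed hierarchy like this, a misspelled join type fails at compile 
time for callers of the typed overload, while the string overload keeps 
working for existing code and fails only at runtime.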
   
   Of course I am open to feedback here, as always.

