Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9921#discussion_r45814672
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -520,6 +523,25 @@ class Dataset[T] private[sql](
         }
       }
     
    +  /**
    +   * Using an inner equi-join, joins this [[Dataset]] returning a [[Tuple2]] for each pair
    +   * where `condition` evaluates to true.
    +   *
    +   * @since 1.6.0
    +   */
    +  def joinWith[U](other: Dataset[U], condition: Column): Dataset[(T, U)] = {
    +    joinWith(other, condition, "inner")
    +  }
    +
    +  /**
    +   * Joins this [[Dataset]] returning a [[Tuple2]] for each pair, using a cartesian join.
    +   *
    +   * Note that cartesian joins are very expensive without an extra filter that can be pushed down.
    +   *
    +   * @since 1.6.0
    +   */
    +  def joinWith[U](other: Dataset[U]): Dataset[(T, U)] = joinWith(other, lit(true), "inner")
    --- End diff ---
    
    Actually, I'd maybe just remove this overload for now, since cartesian joins are too expensive.
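
    For context, here is a minimal sketch of how the two overloads would be used. It assumes a
    `SQLContext` available as `sqlContext`; the `Person` and `Dept` case classes are hypothetical
    and used only for illustration.

        import org.apache.spark.sql.Dataset
        import sqlContext.implicits._

        case class Person(name: String, deptId: Int)
        case class Dept(id: Int, title: String)

        // Alias each Dataset so the join condition can refer to its columns by name.
        val people = Seq(Person("alice", 1), Person("bob", 2)).toDS().as("p")
        val depts  = Seq(Dept(1, "eng"), Dept(2, "sales")).toDS().as("d")

        // Inner equi-join with an explicit condition: the result pairs up the full
        // objects from both sides wherever the condition evaluates to true.
        val joined: Dataset[(Person, Dept)] =
          people.joinWith(depts, $"p.deptId" === $"d.id")

        // The no-condition overload in this diff is joinWith(other, lit(true), "inner"),
        // i.e. a cartesian product of the two Datasets, which is why it is so expensive
        // on large inputs without a filter that can be pushed down.
        // val crossed: Dataset[(Person, Dept)] = people.joinWith(depts)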

