Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18692#discussion_r145498671
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala 
---
    @@ -152,3 +152,79 @@ object EliminateOuterJoin extends Rule[LogicalPlan] 
with PredicateHelper {
           if (j.joinType == newJoinType) f else Filter(condition, 
j.copy(joinType = newJoinType))
       }
     }
    +
    +/**
    + * A rule that uses propagated constraints to infer join conditions. The optimization applies
    + * only to CROSS joins. For other join types, adding inferred join conditions could introduce
    + * an extra shuffle, because a child's existing output partitioning may no longer satisfy the
    + * join node's distribution requirements, which it otherwise would have.
    + *
    + * For instance, given a CROSS join where the left relation has 'a = 1' and the right
    + * relation has 'b = 1', the rule infers 'a = b' as a join predicate.
    + */
    +object InferJoinConditionsFromConstraints extends Rule[LogicalPlan] with 
PredicateHelper {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = {
    +    if (SQLConf.get.constraintPropagationEnabled) {
    +      inferJoinConditions(plan)
    +    } else {
    +      plan
    +    }
    +  }
    +
    +  private def inferJoinConditions(plan: LogicalPlan): LogicalPlan = plan 
transform {
    +    case join @ Join(left, right, Cross, conditionOpt) =>
    +
    +      val rightEqualToPredicates = join.constraints.collect {
    --- End diff --
    
    I thought about improving the time complexity here via a hash map with semantic equals/hashCode. However, that would require a wrapper class, so I kept it as it is.
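    
    For reference, the wrapper idea could be sketched roughly as below. `SemanticKey` is a hypothetical name introduced here for illustration; `Expression.semanticEquals` and `Expression.semanticHash` are existing Catalyst methods, so the wrapper only redirects `equals`/`hashCode` to them to make expressions usable as hash-map keys:
    
    ```scala
    import org.apache.spark.sql.catalyst.expressions.Expression
    
    // Hypothetical wrapper (not part of this PR): keys a hash map by semantic
    // equality rather than structural/reference equality, so semantically
    // identical expressions collide into the same bucket.
    class SemanticKey(val expr: Expression) {
      override def hashCode(): Int = expr.semanticHash()
      override def equals(other: Any): Boolean = other match {
        case that: SemanticKey => expr.semanticEquals(that.expr)
        case _ => false
      }
    }
    ```
    
    With such a wrapper, the collected equality predicates could be grouped into a `Map[SemanticKey, ...]`, turning the pairwise matching into amortized O(1) lookups, at the cost of the extra class the comment mentions.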


---
