GitHub user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1301#discussion_r14565428
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -67,15 +68,18 @@ object ColumnPruning extends Rule[LogicalPlan] {
             projectList.flatMap(_.references).toSet ++ condition.map(_.references).getOrElse(Set.empty)
     
           /** Applies a projection only when the child is producing unnecessary attributes */
    -      def prunedChild(c: LogicalPlan) =
    -        if ((c.outputSet -- allReferences.filter(c.outputSet.contains)).nonEmpty) {
    -          Project(allReferences.filter(c.outputSet.contains).toSeq, c)
    -        } else {
    -          c
    -        }
    +      def prunedChild(c: LogicalPlan) = ColumnPruning.prunedChild(c, allReferences)
     
           Project(projectList, Join(prunedChild(left), prunedChild(right), joinType, condition))
     
    +    // Eliminate unneeded attributes from right side of a LeftSemiJoin.
    +    case Join(left, right, LeftSemi, condition) =>
    +      // Collect the list of off references required either above or to evaluate the condition.
    --- End diff --
    
    This comment is a bit misleading.  It says the code collects the 
references required above, but it actually collects only the attributes 
required to evaluate the condition.  Since you apply this only to the right 
side, I believe the code is correct, but it would be good to make the intent 
clearer.  Also, I think "off" should be "all"?
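
To illustrate why condition-only references are enough on the right side, here is a minimal sketch. It is a toy model, not the Catalyst API: `Relation`, `requiredFrom`, and the attribute names are all hypothetical, and attributes are plain strings rather than `Attribute` objects. The point is that a left semi join outputs only the left side's attributes, so nothing above the join can reference the right side.

```scala
// Toy model for illustration only; none of these names come from Catalyst.
object SemiJoinPruningSketch {
  /** A relation reduced to its name and the set of attributes it produces. */
  final case class Relation(name: String, output: Set[String])

  /** Attributes of `child` that are actually referenced and so must be kept. */
  def requiredFrom(child: Relation, references: Set[String]): Set[String] =
    child.output.intersect(references)

  def main(args: Array[String]): Unit = {
    val right = Relation("right", Set("r.id", "r.payload"))

    // References made by a hypothetical join condition `l.id = r.id`.
    val conditionRefs = Set("l.id", "r.id")

    // A left semi join never emits right-side attributes, so the condition
    // is the only consumer of the right side's output.
    val keepOnRight = requiredFrom(right, conditionRefs)

    println(s"Keep on right side: $keepOnRight")
  }
}
```

Under these assumptions only `r.id` survives on the right side, which is what a comment such as "Collect the list of all references required to evaluate the condition" would describe.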

