Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5044#discussion_r28199479
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -81,6 +81,39 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
         (plan.children, children).zipped.forall(_ sameResult _)
       }
     
    +  /**
    +   * Returns true when the given logical plan will return part of the results of this
    +   * logical plan.
    +   *
    +   * The given logical plan is considered part of this logical plan if all args of the
    +   * given plan (i.e. cleanArgs) are contained in the args of this logical plan.
    +   *
    +   * Since it is likely undecidable to generally determine whether the given plan will
    +   * produce part of the results of another plan, it is okay for this function to return
    +   * false even if the results are actually part of another. Such behavior will not affect
    +   * correctness, only the application of performance enhancements like caching. However,
    +   * it is not acceptable to return true if the results could possibly not be part of
    +   * another.
    +   *
    +   * By default this function performs a modified version of equality that is tolerant of
    +   * cosmetic differences like attribute naming and/or expression id differences. Logical
    +   * operators that can do better should override this function.
    +   */
    +  def partResult(plan: LogicalPlan): Boolean = {
    +    plan.getClass == this.getClass &&
    +    plan.children.size == children.size && {
    +      logDebug(s"[${plan.cleanArgs.mkString(", ")}] is part of [${cleanArgs.mkString(", ")}]")
    +      !plan.cleanArgs.zip(cleanArgs).exists {
    +        // If this arg is a sequence, we check if there is a sequence in cleanArgs of this
    +        // logical plan at the same index that contains all elements of this arg.
    +        case (s: Seq[_], ss: Seq[_]) =>
    --- End diff --
    
    This check does not guarantee that the queries are going to return the same answer. Consider grouping, for example:
    
    `SELECT a FROM table GROUP BY a, b`
    `SELECT a FROM table GROUP BY a`
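
    A minimal sketch of why these two queries break the check, assuming a hypothetical in-memory `table` with integer columns `a` and `b`. The `Row` class, the sample rows, and the `argsContained` helper are illustrative stand-ins and not Catalyst code; `argsContained` only approximates the sequence-containment test in the diff. The grouping list `[a]` is contained in `[a, b]`, yet the first query emits one row per distinct `(a, b)` pair while the second emits one row per distinct `a`, so the answers differ.

```scala
object GroupingCounterexample {
  // Hypothetical stand-in for a row of `table` (not a Spark class).
  final case class Row(a: Int, b: Int)

  // Illustrative sample data: a = 1 appears with two different values of b.
  val table: Seq[Row] = Seq(Row(1, 1), Row(1, 2), Row(2, 1))

  // SELECT a FROM table GROUP BY a, b: one output row per distinct (a, b) pair.
  def selectAGroupByAB(rows: Seq[Row]): Seq[Int] =
    rows.map(r => (r.a, r.b)).distinct.map(_._1).sorted

  // SELECT a FROM table GROUP BY a: one output row per distinct a.
  def selectAGroupByA(rows: Seq[Row]): Seq[Int] =
    rows.map(_.a).distinct.sorted

  // Assumed simplification of the check in the diff: every element of the
  // candidate plan's argument sequence appears in this plan's sequence.
  def argsContained(candidate: Seq[String], mine: Seq[String]): Boolean =
    candidate.forall(mine.contains)

  def main(args: Array[String]): Unit = {
    // The containment test passes for the two grouping lists...
    println(argsContained(Seq("a"), Seq("a", "b")))
    // ...but the two queries return a different number of rows.
    println(selectAGroupByAB(table))
    println(selectAGroupByA(table))
  }
}
```

    With the sample rows above, the `GROUP BY a, b` query yields three rows and the `GROUP BY a` query yields two, so serving one from the cached result of the other would return wrong data.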

