Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11416#discussion_r54352359
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -687,7 +687,8 @@ class Analyzer(
             resolved
           } else {
             plan match {
    -          case u: UnaryNode if !u.isInstanceOf[SubqueryAlias] =>
    +          case u: UnaryNode
    --- End diff --
    
    As I understand it, the criterion is that appending attributes to, or
    pruning attributes from, a node's `outputSet` must not change the results
    of the existing/remaining attributes. On that basis, we can categorize the
    existing `UnaryNode`s into the following groups:
    
    - **Group 1**: To add a new attribute to the `outputSet` of a node, we
      only need to add it to the `outputSet` of its child.
    
      - **Type 1.1**: Adding new attributes has no impact on the existing
        logic of this node. For example, `Filter` and `Sort`.
      - **Type 1.2**: Adding new attributes affects the parent nodes. For
        example, `SubqueryAlias` adds its `alias` to the qualifiers of the
        attributes in its `outputSet`.
    
    - **Group 2**: The `outputSet` of a node is fully or partially controlled
      by its class parameters.
    
      - **Type 2.1**: Adding new attributes has no impact on the existing
        logic of this node. For example, `Project` and `Window`.
      - **Type 2.2**: Adding new attributes is restricted by the other class
        parameters. For example, `Aggregate` and `Generate`. For `Aggregate`
        nodes, we can only add attributes that are part of
        `groupingExpressions`. Adding attributes to `groupingExpressions`
        changes the results instead of just appending new columns.
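    
    To make the difference between **Type 1.1** and **Type 2.2** concrete,
    here is a minimal sketch on a toy plan model. Everything in it (`Plan`,
    `Relation`, the simplified `Filter` and `Aggregate`, the sample rows) is a
    hypothetical stand-in written for this comment, not Catalyst code.

```scala
// Toy model only: these classes are hypothetical stand-ins, not Catalyst's.
object OutputSetSketch {
  type Row = Map[String, Int]

  sealed trait Plan {
    def output: Seq[String]
    def eval: Seq[Row]
  }

  // Leaf that exposes only the columns listed in `output`.
  final case class Relation(rows: Seq[Row], output: Seq[String]) extends Plan {
    def eval: Seq[Row] = rows.map(r => output.map(c => c -> r(c)).toMap)
  }

  // Type 1.1: the output is whatever the child outputs.
  final case class Filter(column: String, min: Int, child: Plan) extends Plan {
    def output: Seq[String] = child.output
    def eval: Seq[Row] = child.eval.filter(r => r(column) >= min)
  }

  // Type 2.2: the output is fixed by the grouping columns plus one sum.
  final case class Aggregate(grouping: Seq[String], sumOf: String, child: Plan) extends Plan {
    def output: Seq[String] = grouping :+ s"sum($sumOf)"
    def eval: Seq[Row] =
      child.eval.groupBy(r => grouping.map(c => r(c))).toSeq.map { case (key, rs) =>
        grouping.zip(key).toMap + (s"sum($sumOf)" -> rs.map(r => r(sumOf)).sum)
      }
  }

  val data: Seq[Row] = Seq(
    Map("a" -> 1, "b" -> 10, "c" -> 5),
    Map("a" -> 1, "b" -> 20, "c" -> 7),
    Map("a" -> 2, "b" -> 30, "c" -> 9)
  )

  // Type 1.1: widening the child's output adds column b to the Filter's
  // output and leaves the values of column a untouched.
  val narrow: Plan = Filter("a", 2, Relation(data, Seq("a")))
  val wide: Plan = Filter("a", 2, Relation(data, Seq("a", "b")))

  // Type 2.2: the only way to expose b is to group by it, which changes
  // the number of groups and the sums instead of appending a column.
  val all: Plan = Relation(data, Seq("a", "b", "c"))
  val byA: Plan = Aggregate(Seq("a"), "c", all)
  val byAB: Plan = Aggregate(Seq("a", "b"), "c", all)

  def main(args: Array[String]): Unit = {
    println(s"filter, narrow child: ${narrow.eval}")
    println(s"filter, wide child:   ${wide.eval}")
    println(s"groups by a:    ${byA.eval.size}")
    println(s"groups by a, b: ${byAB.eval.size}")
  }
}
```

    With this sample data, grouping by `a` yields two groups, while grouping
    by `a` and `b` yields three, so the per-group sums differ as well.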
    
    `ScriptTransformation`, `MapPartitions`, `AppendColumns` and `MapGroups`
    belong to **Type 2.2**: their `script` and `func` parameters prevent us
    from adding new attributes. Thus, I think we should put them into the
    blacklist.
    
    `EvaluatePython` belongs to **Type 1.1**. Its output is determined by its 
`child.output` and `resultAttribute`. It should be safe. 
    
    As I mentioned above, `GroupingSets` and `Pivot` are not visible to this
    rule. Thus, we do not need to add them to the blacklist.
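    
    The blacklist I have in mind could be sketched roughly as below. The node
    objects and `canAddMissingAttributes` are hypothetical names used only for
    illustration; they are not the classes or the match expression in
    `Analyzer.scala`.

```scala
// Toy model only: hypothetical stand-ins named after the operators
// discussed above, not Spark's classes.
object BlacklistSketch {
  sealed trait UnaryNode
  case object Filter extends UnaryNode
  case object Sort extends UnaryNode
  case object EvaluatePython extends UnaryNode
  case object ScriptTransformation extends UnaryNode
  case object MapPartitions extends UnaryNode
  case object AppendColumns extends UnaryNode
  case object MapGroups extends UnaryNode

  // Hypothetical helper: true when missing attributes may be added
  // through the node, false for the blacklisted Type 2.2 nodes.
  def canAddMissingAttributes(node: UnaryNode): Boolean = node match {
    case ScriptTransformation | MapPartitions | AppendColumns | MapGroups => false
    case _ => true
  }

  def main(args: Array[String]): Unit = {
    val nodes = Seq[UnaryNode](
      Filter, Sort, EvaluatePython,
      ScriptTransformation, MapPartitions, AppendColumns, MapGroups
    )
    nodes.foreach(n => println(s"$n -> ${canAddMissingAttributes(n)}"))
  }
}
```

    `GroupingSets` and `Pivot` are left out of the sketch on purpose, since
    they never reach this rule.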
    
    Please correct me if my understanding is wrong. @davies Thanks!

