Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/11416#discussion_r54352359
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -687,7 +687,8 @@ class Analyzer(
resolved
} else {
plan match {
- case u: UnaryNode if !u.isInstanceOf[SubqueryAlias] =>
+ case u: UnaryNode
--- End diff --
As I understand it, the criterion is that appending or pruning attributes in a node's `outputSet` must not change the results of its existing or remaining attributes. Based on this criterion, we can categorize the existing `UnaryNode`s into two groups:
- **Group 1**: To add a new attribute to a node's `outputSet`, we only need to add it to the `outputSet` of the node's child.
  - **Type 1.1**: Adding new attributes has no impact on the existing logic of the node. For example, `Filter` and `Sort`.
  - **Type 1.2**: Adding new attributes affects the parent nodes. For example, `SubqueryAlias` adds its `alias` as the qualifier of every attribute in its `outputSet`.
- **Group 2**: The `outputSet` of a node is fully or partially determined by its class parameters.
  - **Type 2.1**: Adding new attributes has no impact on the existing logic of the node. For example, `Project` and `Window`.
  - **Type 2.2**: Adding new attributes is restricted by the node's other class parameters. For example, `Aggregate` and `Generate`. For an `Aggregate` node, we can only add attributes that are part of `groupingExpressions`, and adding attributes to `groupingExpressions` changes the results instead of just appending new columns.
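To make the `Aggregate` case of **Type 2.2** concrete, here is a minimal sketch in plain Scala collections (no Spark involved), assuming a hypothetical table of `(dept, name, salary)` rows. It contrasts a `Filter`-like step, where exposing `name` only appends a column, with a `GROUP BY`-like step, where adding `name` to the grouping keys splits groups and changes the counts. The object, data, and function names are illustrative assumptions, not Spark code.

```scala
object GroupingDemo {
  // Hypothetical rows (dept, name, salary), for illustration only.
  final case class Row(dept: String, name: String, salary: Int)

  val rows: Seq[Row] = Seq(
    Row("eng", "ann", 100),
    Row("eng", "bob", 120),
    Row("ops", "cid", 90)
  )

  // Type 1.1 analogue: SELECT dept FROM rows WHERE salary >= 100
  def filteredDepts(rs: Seq[Row]): Seq[String] =
    rs.filter(_.salary >= 100).map(_.dept)

  // Same filter with `name` appended: the surviving rows and their `dept`
  // values are unchanged; we only gain a column.
  def filteredDeptsWithName(rs: Seq[Row]): Seq[(String, String)] =
    rs.filter(_.salary >= 100).map(r => (r.dept, r.name))

  // Type 2.2 analogue: SELECT dept, count(*) FROM rows GROUP BY dept
  def countByDept(rs: Seq[Row]): Map[String, Int] =
    rs.groupBy(_.dept).map { case (d, g) => d -> g.size }

  // Adding `name` to the grouping keys (GROUP BY dept, name): every
  // (dept, name) pair becomes its own group, so the counts themselves change.
  def countByDeptAndName(rs: Seq[Row]): Map[(String, String), Int] =
    rs.groupBy(r => (r.dept, r.name)).map { case (k, g) => k -> g.size }

  def main(args: Array[String]): Unit = {
    println(filteredDepts(rows))
    println(filteredDeptsWithName(rows))
    println(countByDept(rows))
    println(countByDeptAndName(rows))
  }
}
```

With this data, grouping by `dept` gives `eng` a count of 2, while grouping by `(dept, name)` yields three groups with a count of 1 each, so the original `eng` result is no longer present: the output changed rather than gaining a column.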
`ScriptTransformation`, `MapPartitions`, `AppendColumns` and `MapGroups` also belong to **Type 2.2**: their `script` and `func` parameters prevent us from adding new attributes. Thus, I think we should put them into the blacklist.
`EvaluatePython` belongs to **Type 1.1**. Its output is determined by its `child.output` and `resultAttribute`, so it should be safe.
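The classification above could be encoded as a pattern match in the rule. Below is a hedged sketch over hypothetical, heavily simplified stand-ins for the Catalyst nodes: `Plan`, `Relation`, the helper `addMissingAttr`, and the string-based attributes are all invented for illustration, and this is not the actual `Analyzer` code. A missing attribute is pushed down through `Filter` (Type 1.1) and `Project` (Type 2.1), while `SubqueryAlias` (Type 1.2) and `MapPartitions` (Type 2.2) are treated as blacklisted.

```scala
object MissingAttrRule {
  // Hypothetical, simplified stand-ins for Catalyst logical plan nodes.
  // Attributes are modeled as plain strings.
  sealed trait Plan { def output: Seq[String] }
  final case class Relation(output: Seq[String]) extends Plan

  sealed trait UnaryNode extends Plan { def child: Plan }

  // Type 1.1: output is exactly the child's output.
  final case class Filter(condition: String, child: Plan) extends UnaryNode {
    def output: Seq[String] = child.output
  }

  // Type 1.2: passes attributes through but re-qualifies them with `alias`.
  final case class SubqueryAlias(alias: String, child: Plan) extends UnaryNode {
    def output: Seq[String] = child.output.map(a => s"$alias.$a")
  }

  // Type 2.1: output is controlled by `projectList`, which can simply be extended.
  final case class Project(projectList: Seq[String], child: Plan) extends UnaryNode {
    def output: Seq[String] = projectList
  }

  // Type 2.2: output is produced by an opaque function, so it cannot be extended.
  final case class MapPartitions(outputAttrs: Seq[String], child: Plan) extends UnaryNode {
    def output: Seq[String] = outputAttrs
  }

  // Hypothetical helper: re-expose `attr` through nodes whose existing columns
  // are unaffected by it; return None when a blacklisted node is reached or
  // the attribute cannot be found in the underlying relation.
  def addMissingAttr(plan: Plan, attr: String): Option[Plan] = plan match {
    case r: Relation =>
      if (r.output.contains(attr)) Some(r) else None
    case _: SubqueryAlias | _: MapPartitions =>
      None // blacklisted (Type 1.2 / Type 2.2)
    case Filter(c, child) =>
      addMissingAttr(child, attr).map(Filter(c, _))
    case Project(pl, child) =>
      addMissingAttr(child, attr).map(newChild => Project(pl :+ attr, newChild))
  }

  def main(args: Array[String]): Unit = {
    val base = Relation(Seq("a", "b"))
    // Filter over a Project that pruned `b`: `b` can be re-added.
    println(addMissingAttr(Filter("a > 1", Project(Seq("a"), base)), "b"))
    // SubqueryAlias is blacklisted, so nothing is rewritten.
    println(addMissingAttr(SubqueryAlias("t", Project(Seq("a"), base)), "b"))
  }
}
```

Returning `None` for blacklisted nodes leaves the decision of what to do next (for example, reporting an unresolved attribute) to the caller; that split of responsibilities is an assumption of this sketch.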
As I mentioned above, `GroupingSets` and `Pivot` are not visible to this rule, so we do not need to add them to the blacklist.
Please correct me if my understanding is wrong. @davies Thanks!