GitHub user marmbrus opened a pull request:
https://github.com/apache/spark/pull/2109
[WIP][SPARK-3194][SQL] Add AttributeSet to fix bugs with invalid
comparisons of AttributeReferences
It is common to want to describe sets of attributes that are in various
parts of a query plan. However, the semantics of putting `AttributeReference`
objects into a standard Scala `Set` result in subtle bugs when references
differ cosmetically. For example, with case insensitive resolution it is
possible to have two references to the same attribute whose names are not
equal.
In this PR I introduce a new abstraction, an `AttributeSet`, which performs
all comparisons using the globally unique `ExpressionId` instead of case class
equality. (There is already a related class, `AttributeMap`.) This new type of
set is used to fix a bug in the optimizer where needed attributes were getting
projected away underneath join operators.
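
To illustrate the problem and the fix, here is a minimal, self-contained sketch. The classes below are simplified stand-ins written for this example, not the actual Catalyst definitions, and the `AttributeSet` shown here is only an assumption about the general shape (membership keyed by expression id), not the implementation in this PR.

```scala
// Simplified stand-ins for illustration; NOT the real Catalyst classes.
final case class ExprId(id: Long)

// Case class equality compares every field, including the name.
final case class AttributeReference(name: String, exprId: ExprId)

// Hypothetical minimal set whose membership depends only on the ExprId.
class AttributeSet private (private val byId: Map[ExprId, AttributeReference]) {
  def contains(a: AttributeReference): Boolean = byId.contains(a.exprId)
  def +(a: AttributeReference): AttributeSet =
    new AttributeSet(byId + (a.exprId -> a))
  def size: Int = byId.size
}

object AttributeSet {
  def apply(attrs: AttributeReference*): AttributeSet =
    new AttributeSet(attrs.map(a => a.exprId -> a).toMap)
}

object Demo {
  // Two references to the same attribute that differ only in name casing.
  val lower = AttributeReference("key", ExprId(1))
  val upper = AttributeReference("KEY", ExprId(1))

  // A standard Set uses case class equality, so `upper` is not found.
  val plainSet: Set[AttributeReference] = Set(lower)

  // The id-based set treats both references as the same attribute.
  val attrSet: AttributeSet = AttributeSet(lower)
}
```

With these definitions, `Demo.plainSet.contains(Demo.upper)` is false while `Demo.attrSet.contains(Demo.upper)` is true, which is the distinction that matters when deciding whether an attribute is still needed above a projection.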
I also took this opportunity to refactor the expression and query plan base
classes. In all but one instance the logic for computing the `references` of
an `Expression` was the same. Thus, I moved this logic into the base class.
For query plans the semantics of the `references` method were ill-defined
(is it the references that are output, or those used by expression evaluation,
or what?). As a result, this method wasn't really used very much. So, I removed it.
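
As a rough sketch of what moving the `references` logic into the base class can look like: the types below (`Expr`, `Attr`, `Literal`, `Add`) are hypothetical and exist only for this example. Which expression is the single exception in the real code is not stated here, so the override on the leaf attribute is an assumption.

```scala
// Illustrative stand-ins only; the real Catalyst classes are richer.
final case class ExprId(id: Long)

abstract class Expr {
  def children: Seq[Expr]

  // Shared default: an expression references whatever its children reference.
  def references: Set[ExprId] = children.flatMap(_.references).toSet
}

// Assumed exception for this sketch: a leaf attribute references itself.
final case class Attr(name: String, exprId: ExprId) extends Expr {
  def children: Seq[Expr] = Nil
  override def references: Set[ExprId] = Set(exprId)
}

// A leaf with no attribute references; it inherits the default (empty set).
final case class Literal(value: Int) extends Expr {
  def children: Seq[Expr] = Nil
}

// A binary expression that only needs to declare its children.
final case class Add(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}

object ReferencesDemo {
  val a = Attr("a", ExprId(1))
  val b = Attr("b", ExprId(2))

  // (a + 1) + b
  val sum: Expr = Add(Add(a, Literal(1)), b)

  // Collected through the base-class default, with no per-class logic.
  val refs: Set[ExprId] = sum.references
}
```

Here `ReferencesDemo.refs` contains exactly the ids of `a` and `b`; `Add` and `Literal` do not need to define `references` themselves.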
TODO:
- [ ] Finish Scaladoc for `AttributeSet`
- [ ] Scan the code for other instances of `Set[Attribute]` and refactor
them.
- [ ] Finish removing `references` from `QueryPlan`
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/marmbrus/spark attributeSets
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2109.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2109
----
commit 94e2cfc0a89a249dc5660e28e4f6a3c53ddbd83a
Author: Michael Armbrust <[email protected]>
Date: 2014-08-14T04:42:27Z
WIP
commit c41916565ed681dad4d11a4f5e50c458adc89078
Author: Michael Armbrust <[email protected]>
Date: 2014-08-14T17:29:46Z
WIP
commit 7a09400b6fd6f23edc30ee040a380e5e18399dc3
Author: Michael Armbrust <[email protected]>
Date: 2014-08-14T19:50:28Z
WIP
commit c27f33f133c5fe199307d891f3811599a555b4d3
Author: Michael Armbrust <[email protected]>
Date: 2014-08-24T19:43:05Z
Merge remote-tracking branch 'origin/master' into attributeSets
----