GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/2109

    [WIP][SPARK-3194][SQL] Add AttributeSet to fix bugs with invalid 
comparisons of AttributeReferences

    It is common to want to describe sets of attributes that are in various 
parts of a query plan.  However, the semantics of putting `AttributeReference` 
objects into a standard Scala `Set` result in subtle bugs when references 
differ cosmetically.  For example, with case-insensitive resolution it is 
possible to have two references to the same attribute whose names are not 
equal.  
    
    In this PR I introduce a new abstraction, an `AttributeSet`, which performs 
all comparisons using the globally unique `ExpressionId` instead of case class 
equality.  (There is already a related class, `AttributeMap`.)  This new type of 
set is used to fix a bug in the optimizer where needed attributes were getting 
projected away underneath join operators.
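
    A minimal sketch of the idea, using simplified stand-in types rather than 
the actual Catalyst classes (the field names, the `Map`-backed representation, 
and the `Demo` object are assumptions made for illustration only):

```scala
// Simplified stand-ins; the real Catalyst classes carry more fields.
final case class ExprId(id: Long)
final case class AttributeReference(name: String, exprId: ExprId)

// A set of attributes keyed by exprId, so that membership ignores
// cosmetic differences such as the capitalization of the name.
final class AttributeSet private (private val byId: Map[ExprId, AttributeReference]) {
  def contains(a: AttributeReference): Boolean = byId.contains(a.exprId)
  def +(a: AttributeReference): AttributeSet = new AttributeSet(byId + (a.exprId -> a))
  def ++(other: AttributeSet): AttributeSet = new AttributeSet(byId ++ other.byId)
  def subsetOf(other: AttributeSet): Boolean = byId.keySet.subsetOf(other.byId.keySet)
  def size: Int = byId.size
}

object AttributeSet {
  def apply(attrs: AttributeReference*): AttributeSet =
    new AttributeSet(attrs.map(a => a.exprId -> a).toMap)
}

object Demo {
  // Two references to the same attribute that differ only in name case.
  val lower = AttributeReference("key", ExprId(1))
  val upper = AttributeReference("KEY", ExprId(1))

  // Case class equality compares the name too, so a plain Set misses `upper`.
  val plainSet: Set[AttributeReference] = Set(lower)

  // The id-based set treats both references as the same attribute.
  val attrSet: AttributeSet = AttributeSet(lower)
}
```

    Here `Demo.plainSet.contains(Demo.upper)` is false while 
`Demo.attrSet.contains(Demo.upper)` is true, which is the distinction the new 
abstraction is meant to capture.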
    
    I also took this opportunity to refactor the expression and query plan base 
classes.  In all but one instance the logic for computing the `references` of 
an `Expression` was the same.  Thus, I moved this logic into the base class.
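
    As a rough illustration of that refactoring, assuming the shared logic is 
"the union of the children's references" and that the attribute leaf is the one 
exception (the types below are hypothetical and much simpler than the real 
expression classes):

```scala
// Hypothetical, simplified expression tree; not the real Catalyst API.
sealed trait Expr {
  def children: Seq[Expr]
  // Shared default in the base type: an expression references
  // whatever its children reference.
  def references: Set[Long] = children.flatMap(_.references).toSet
}

// The single special case: an attribute references itself.
final case class Attr(name: String, id: Long) extends Expr {
  val children: Seq[Expr] = Nil
  override def references: Set[Long] = Set(id)
}

// Ordinary expressions inherit the default and define only their children.
final case class Add(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}

final case class Literal(value: Int) extends Expr {
  val children: Seq[Expr] = Nil
}

object ReferencesDemo {
  // a + (b + 3)
  val expr: Expr = Add(Attr("a", 1L), Add(Attr("b", 2L), Literal(3)))
}
```

    With this shape, `ReferencesDemo.expr.references` collects the ids of `a` 
and `b` without `Add` or `Literal` having to override anything.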
    
    For query plans the semantics of the `references` method were ill-defined 
(does it mean the references that are output, those used by expression 
evaluation, or something else?).  As a result, this method wasn't really used 
very much.  So, I removed it.
    
    TODO:
     - [ ] Finish the Scaladoc for `AttributeSet`
     - [ ] Scan the code for other instances of `Set[Attribute]` and refactor 
them.
     - [ ] Finish removing `references` from `QueryPlan`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark attributeSets

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2109.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2109
    
----
commit 94e2cfc0a89a249dc5660e28e4f6a3c53ddbd83a
Author: Michael Armbrust <[email protected]>
Date:   2014-08-14T04:42:27Z

    WIP

commit c41916565ed681dad4d11a4f5e50c458adc89078
Author: Michael Armbrust <[email protected]>
Date:   2014-08-14T17:29:46Z

    WIP

commit 7a09400b6fd6f23edc30ee040a380e5e18399dc3
Author: Michael Armbrust <[email protected]>
Date:   2014-08-14T19:50:28Z

    WIP

commit c27f33f133c5fe199307d891f3811599a555b4d3
Author: Michael Armbrust <[email protected]>
Date:   2014-08-24T19:43:05Z

    Merge remote-tracking branch 'origin/master' into attributeSets

----


