Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1116#issuecomment-46507895
  
    I'm not sure this solution is really doing what we want.  The `sparkPlan` 
should not be used as the explain output since it won't contain `Exchange` 
operators.  The only reason this is still producing the correct answer is 
because we are using the `executePlan` of the `Explain` operator, so a pass is 
still done to add `Exchange` operators.
    
    Really, the core problem is that `AddExchange` is not idempotent because it 
is not correctly detecting when the inputPartitioning is already correct.  This 
is a more serious bug as it could significantly hurt performance.
    
    The root cause is related to the way that we bind attributes to specific 
ordinals in the input schema.  Right now, for some operators this is done by 
the `BindReferences` rule (the original solution) and for others its done 
through the creation of a Projection (the new, better solution that is faster 
and acts as the interface for codegen).  The issue with the old solution is 
that BoundAttributeReferences can't correctly handle equality, hence the issue 
with AddExchange.
    
    I propose we do this:
     - Revert this change and fix Exchange to use the new method for binding to 
an input schema [Solution Here](https://github.com/apache/spark/pull/1122).
     - In a follow up PR remove BindReferences.
    
    > As the codebase grows, should we consider using type-level constraints to 
enforce some of these constraints?
    
    I'm generally in favor of this approach, but I think in the instance of 
trees its going to be pretty hard to do.  This is because often you want to be 
able to create trees that start off semantically invalid, but are then `fixed` 
through resolution.  One thing we could add a set of invariant checks that run 
at various phases and let us know anytime something is amiss (adjacent 
`Exchange` operators for example).
    
    More docs are also always a good idea :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to