Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/1116#issuecomment-46507895
I'm not sure this solution is really doing what we want. The `sparkPlan`
should not be used as the explain output since it won't contain `Exchange`
operators. The only reason this is still producing the correct answer is
because we are using the `executePlan` of the `Explain` operator, so a pass is
still done to add `Exchange` operators.
Really, the core problem is that `AddExchange` is not idempotent because it
is not correctly detecting when the inputPartitioning is already correct. This
is a more serious bug as it could significantly hurt performance.
The root cause is related to the way that we bind attributes to specific
ordinals in the input schema. Right now, for some operators this is done by
the `BindReferences` rule (the original solution) and for others its done
through the creation of a Projection (the new, better solution that is faster
and acts as the interface for codegen). The issue with the old solution is
that BoundAttributeReferences can't correctly handle equality, hence the issue
with AddExchange.
I propose we do this:
- Revert this change and fix Exchange to use the new method for binding to
an input schema [Solution Here](https://github.com/apache/spark/pull/1122).
- In a follow up PR remove BindReferences.
> As the codebase grows, should we consider using type-level constraints to
enforce some of these constraints?
I'm generally in favor of this approach, but I think in the instance of
trees its going to be pretty hard to do. This is because often you want to be
able to create trees that start off semantically invalid, but are then `fixed`
through resolution. One thing we could add a set of invariant checks that run
at various phases and let us know anytime something is amiss (adjacent
`Exchange` operators for example).
More docs are also always a good idea :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---