GitHub user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/9038#issuecomment-147609673
To elaborate on the test failures:
All tests pass as of 8695e3f, when the Tungsten code path is not used for
imperative distinct aggregates.
As of a09c51bc6936cdd3bbbdd4a7b73735079f62896d, which enables the Tungsten code
path for imperative distinct aggregates, the "single distinct column set" query
fails by returning the wrong answer. I think the problem is that the attribute
bindings were being mixed up, which caused one of the projections to return the
wrong columns.
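The following is a minimal, hypothetical illustration of how shared mutable offset state can make a projection read the wrong column. The class `MutableSum` and its `bufferOffset` field are invented for this sketch and are not Spark's actual classes. The sketch also does not claim that this is the exact cause of the failure above. Two phases share one aggregate instance, and the second phase reconfigures the offset in place, so the first phase silently starts reading a different column.

```scala
// Hypothetical, heavily simplified model; NOT Spark's real ImperativeAggregate.
// The aggregate reads its single buffer slot at column `bufferOffset` of a flat row.
final class MutableSum {
  var bufferOffset: Int = 0 // mutable: every holder of this instance sees changes
  def result(row: Array[Long]): Long = row(bufferOffset)
}

val shared = new MutableSum

// Phase A lays out its row as [groupKey, sum], so the buffer is at column 1.
shared.bufferOffset = 1
val phaseARow = Array(100L, 7L)

// Phase B reuses the *same* instance with a different layout (buffer at column 0).
shared.bufferOffset = 0

// Phase A now reads column 0 (the group key, 100) instead of its sum (7).
val fromA = shared.result(phaseARow)
println(fromA)
```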
After trying to fix this by making copies when changing the buffer offsets, I
wound up getting binding errors. My current understanding of attribute binding
is that we shouldn't create new attributes on executors, because this could
lead to expression IDs being reused incorrectly. Given this, I think the
cleanest fix is the ImperativeAggregate mutable-part/immutable-part interface
refactoring that I proposed earlier.
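The following is a rough sketch of the idea behind that split, under stated assumptions. The names `ImmutableSum` and `withNewBufferOffset` are hypothetical and are not the interface from the proposal. The aggregate's description, including its buffer offset, is immutable, and changing the offset returns a new instance. The only mutable part is the buffer row passed in at evaluation time. The sketch models only offsets. It does not model Spark's attribute or expression-ID machinery, which is the part the binding concern above is about.

```scala
// Hypothetical sketch; not Spark's actual API.
// Immutable part: configuration (here just the buffer offset).
// Mutable part: the buffer row itself, owned by the caller.
final case class ImmutableSum(bufferOffset: Int = 0) {
  // Changing the offset yields a new instance; the original is untouched.
  def withNewBufferOffset(newOffset: Int): ImmutableSum = copy(bufferOffset = newOffset)

  // Only the caller-supplied buffer row is mutated.
  def update(row: Array[Long], value: Long): Unit = row(bufferOffset) += value
  def result(row: Array[Long]): Long = row(bufferOffset)
}

val base = ImmutableSum()
val phaseA = base.withNewBufferOffset(1) // row layout [groupKey, sum]
val phaseB = base.withNewBufferOffset(0) // row layout [sum]

val rowA = Array(100L, 0L)
phaseA.update(rowA, 7L)

// phaseB's different offset cannot affect phaseA, so phaseA reads its own sum.
val resA = phaseA.result(rowA)
println(resA)
```

With this design, each phase holds its own configured instance. The wrong-column sharing from a single mutable offset cannot arise, at the cost of allocating a small copy per configuration.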