[GitHub] spark pull request: [SPARK-11017] [SQL] Support ImperativeAggregat...

JoshRosen Mon, 12 Oct 2015 22:42:08 -0700

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/9038#issuecomment-147609673
  
    To elaborate on the test failures:
    
    All tests pass as of 8695e3f, when the Tungsten code path is not used for
    imperative distinct aggregates.
    
    As of a09c51bc6936cdd3bbbdd4a7b73735079f62896d, which enables the Tungsten
    code path for imperative distinct aggregates, the "single distinct column
    set" test fails because the query returns the wrong answer. I think the
    problem was that the attribute bindings were being mixed up, causing one
    of the projections to return the wrong columns.
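    The Python model below illustrates this failure mode and is not Spark code. Catalyst binds attribute references to input columns by expression id, so if two different attributes carry the same id, they cannot be told apart. In this model, binding takes the first match and a projection silently reads the wrong column. `Attribute`, `new_attribute`, `bind`, and `project` are hypothetical stand-ins, not Spark APIs. The module-level id counter is a simplified assumption about how ids are generated.

```python
from dataclasses import dataclass
from itertools import count
from typing import Sequence, Tuple

# Hypothetical, simplified model of a process-local expression-id counter.
_next_expr_id = count()


@dataclass(frozen=True)
class Attribute:
    name: str
    expr_id: int


def new_attribute(name: str) -> Attribute:
    """Create an attribute with a fresh id from this process's counter."""
    return Attribute(name, next(_next_expr_id))


def bind(ref: Attribute, input_schema: Sequence[Attribute]) -> int:
    """Resolve a reference to a column ordinal by expression id.

    In this model the first input attribute with a matching id wins, so two
    different attributes that share an id cannot be told apart.
    """
    for ordinal, attr in enumerate(input_schema):
        if attr.expr_id == ref.expr_id:
            return ordinal
    raise KeyError(f"cannot bind {ref.name}#{ref.expr_id}")


def project(refs: Sequence[Attribute], input_schema: Sequence[Attribute],
            row: Tuple) -> Tuple:
    """Bind each reference, then read the bound ordinals from the row."""
    return tuple(row[bind(r, input_schema)] for r in refs)


key = new_attribute("key")
cnt = new_attribute("count")
print(project([cnt], [key, cnt], ("a", 3)))       # (3,)

# Simulate an attribute created elsewhere whose id collides with `key`.
clash = Attribute("sum", key.expr_id)
print(project([clash], [key, clash], ("a", 10)))  # ('a',) instead of (10,)
```

    No error is raised in the colliding case. The projection simply returns a value from the wrong column, which matches the "wrong answer" symptom rather than a crash.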
    
    I tried to fix this by making copies when changing the buffer offsets, but
    that led to binding errors instead. My current understanding of attribute
    binding is that we shouldn't create new attributes on executors, because
    doing so could cause expression ids to be reused incorrectly. Given this, I
    think the cleanest fix is the ImperativeAggregate
    mutable-part/immutable-part interface refactoring that I proposed earlier.
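    The sketch below shows one way to separate an aggregate's fixed parts from the parts that change at execution time. It is an assumption and does not reproduce the refactoring proposed in this PR. The aggregate stays immutable, and changing its buffer offset produces a copy that carries the original attribute identity over, so no new expression ids are created. `CountAggregate`, `with_new_mutable_agg_buffer_offset`, and `buffer_expr_id` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class CountAggregate:
    """Hypothetical imperative COUNT aggregate.

    `buffer_expr_id` stands for the attribute identity fixed when the plan
    is built; `mutable_agg_buffer_offset` says which slot of a shared,
    mutable buffer this aggregate writes to. The instance itself is frozen.
    """
    buffer_expr_id: int
    mutable_agg_buffer_offset: int = 0

    def with_new_mutable_agg_buffer_offset(self, offset: int) -> "CountAggregate":
        # Copy with a new offset; the attribute identity is carried over,
        # so no new expression id is minted.
        return replace(self, mutable_agg_buffer_offset=offset)

    def initialize(self, buffer: List[Optional[int]]) -> None:
        buffer[self.mutable_agg_buffer_offset] = 0

    def update(self, buffer: List[Optional[int]], value: object) -> None:
        # COUNT(col) ignores nulls.
        if value is not None:
            buffer[self.mutable_agg_buffer_offset] += 1


shared = CountAggregate(buffer_expr_id=7)  # 7 is an arbitrary example id
partial = shared.with_new_mutable_agg_buffer_offset(1)
final = shared.with_new_mutable_agg_buffer_offset(2)

buf: List[Optional[int]] = [None, None, None]
partial.initialize(buf)
final.initialize(buf)
for v in ["x", None, "y"]:
    partial.update(buf, v)

print(buf)                                              # [None, 2, 0]
print(shared.mutable_agg_buffer_offset)                 # 0: original untouched
print(partial.buffer_expr_id == shared.buffer_expr_id)  # True
```

    The two copies write to different slots of the same buffer without disturbing each other or the original instance. This sketch only models offset handling and says nothing about how binding is actually performed.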




