[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

chenghao-intel Thu, 23 Oct 2014 21:32:08 -0700

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-60343575
  
    @rxin @marmbrus, I've uploaded a draft design doc to JIRA: 
https://issues.apache.org/jira/secure/attachment/12676811/grouping_set.pdf 
Sorry, it may not cover every detail; let me know if anything is unclear. 
    
    @marmbrus 
    >The creation of bit vectors seems like a very implementation focused 
physical concern. I'm curious if this could be restricted to the actual 
physical operator.
    
    Yeah, that's very reasonable; I was thinking about this as well. 
    However, the bit vectors don't rely on the physical execution engine, and 
this case is slightly different from Aggregate, which has the map-side 
aggregation optimization for Spark execution.
    
    Besides, the attribute references passed to the parent logical operator 
need to be set correctly during logical plan analysis. 
    
    Anyway, I will consider your suggestion. After all, we should keep the 
logical plan for describing "what to do", not "how to do it".
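    
    To make the bit vector idea concrete, here is a minimal sketch in plain 
Scala. It is not the code in this pull request: the names `bitmask` and 
`expand` are hypothetical, and it assumes a convention where bit `i` is set 
when the i-th GROUP BY expression participates in a grouping set. Nothing in 
it depends on an execution engine.

```scala
object GroupingSetBitmasks {
  // Hypothetical helper: encode one grouping set as a bitmask over the
  // GROUP BY expressions. Assumed convention: bit i is set when the i-th
  // expression is part of the grouping set.
  def bitmask(groupByExprs: Seq[String], groupingSet: Set[String]): Int =
    groupByExprs.zipWithIndex.foldLeft(0) { case (mask, (expr, i)) =>
      if (groupingSet.contains(expr)) mask | (1 << i) else mask
    }

  // Expand one input row into one output row per grouping set. Columns that
  // are absent from a grouping set become null, and the bitmask is returned
  // alongside the values so that rows from different sets stay distinct.
  def expand(
      groupByExprs: Seq[String],
      groupingSets: Seq[Set[String]],
      row: Map[String, Any]): Seq[(Seq[Any], Int)] =
    groupingSets.map { set =>
      val mask = bitmask(groupByExprs, set)
      val values = groupByExprs.zipWithIndex.map { case (expr, i) =>
        if ((mask & (1 << i)) != 0) row(expr) else null
      }
      (values, mask)
    }
}
```

    For GROUP BY a, b with grouping sets (a, b), (a) and (), `expand` turns 
each input row into three rows, which a regular aggregation could then group 
by the values plus the bitmask.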
    
    >Adding a new type of attribute reference for virtual columns might be a 
lot of overhead. Is this really necessary?
    
    A concrete `VirtualColumn` instance is very helpful for attribute 
referencing and pattern matching, and is probably better than a naming 
convention. Sorry, maybe I didn't understand what you meant; we can discuss 
that in the code review.
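    
    As a rough illustration of the pattern matching point, the sketch below 
uses simplified stand-in classes, not Catalyst's real expression hierarchy. 
The `Add` node, the helper names and the `grouping__id` column name are 
assumptions made for the example.

```scala
object VirtualColumnSketch {
  // Simplified stand-ins for expression nodes; these are NOT Spark's classes.
  sealed trait Expression
  case class AttributeReference(name: String) extends Expression
  case class VirtualColumn(name: String) extends Expression
  case class Add(left: Expression, right: Expression) extends Expression

  // With a dedicated type, virtual columns are found by matching on the type.
  def virtualColumns(e: Expression): Seq[VirtualColumn] = e match {
    case v: VirtualColumn      => Seq(v)
    case Add(l, r)             => virtualColumns(l) ++ virtualColumns(r)
    case _: AttributeReference => Nil
  }

  // With a naming convention, the same lookup has to compare strings, and a
  // user column that happens to share the reserved name is also matched.
  def virtualByName(e: Expression, reserved: Set[String]): Seq[AttributeReference] =
    e match {
      case a @ AttributeReference(n) if reserved.contains(n) => Seq(a)
      case Add(l, r) => virtualByName(l, reserved) ++ virtualByName(r, reserved)
      case _         => Nil
    }
}
```

    The type-based version cannot confuse a user column with a virtual one, 
at the cost of one more expression class to maintain.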


