[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

marmbrus Sat, 26 Jul 2014 13:33:28 -0700

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/993#issuecomment-50247741
  
    Hey @rxin,  thanks for the careful review!  I think I've addressed most of 
your comments.  Regarding the GeneratedAggregate code, I'm happy to sit down 
and explain in more detail at some point.  One thing to note is that its only 
used in a few circumstances at the moment, and its pretty easy to switch off if 
we find it causes problems in the future.  That said, I think there is the 
possibility of pretty huge speed up here.  Here's an example that demonstrates 
how the rewrite happens for a simple query:
    
    `hql("SELECT AVG(key) + 1 FROM src GROUP BY value")`
    
    Partial Aggregation:
    ```
    Initial values: 0,CAST(0, LongType)
    Grouping Projection: value#25
    Update Expressions: if (IS NOT NULL CAST(key#24, LongType)) 
(currentCount#30L + 1) else currentCount#30L,Coalesce((CAST(key#24, LongType) + 
currentSum#31L),currentSum#31L)
    Result Projection: value#25,currentCount#30L AS 
PartialCount#27L,currentSum#31L AS PartialSum#26L
    ```
    
    The updates compute the new currentSum and currentCount given an update 
buffer joined with the input row.
    
    Final aggregation:
    ```
    Initial values: CAST(0, LongType),CAST(0, LongType)
    Grouping Projection: value#25
    Update Expressions: Coalesce((PartialSum#26L + 
currentSum#28L),currentSum#28L),Coalesce((PartialCount#27L + 
currentSum#29L),currentSum#29L)
    Result Projection: ((CAST(currentSum#28L, DoubleType) / 
CAST(currentSum#29L, DoubleType)) + 1.0) AS c_0#22
    ```
    
    The updates calculate the sum of the counts and partial sums.  The result 
divides them and adds 1.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

Reply via email to