GitHub user viirya commented on the pull request:

    https://github.com/apache/spark/pull/9067#issuecomment-151793559
  
    @rxin I ran a simple performance measurement, as follows.
    
    Record count: 1333635318
    Record count after GROUP BY: 259200
    
    SQL query looks like: `SELECT SUM(a) AS a, SUM(b) AS b, SUM(c) AS c, SUM(d) AS d FROM table GROUP BY e`
    
    4 workers (8 cores), executor memory: 512 MB.
    
    With pre-aggregation enabled: 
    
        67720191 microseconds
        66424539 microseconds
        62959275 microseconds
    
    With pre-aggregation disabled: 
    
        69934956 microseconds
        70351959 microseconds
        68437353 microseconds
    
    So it looks like pre-aggregation gives roughly a 5-6% improvement on average.
    
    Not very significant, but the reduction factor is not high, so this is to be expected.
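
As a quick check of the averages quoted above, here is a minimal sketch that recomputes the mean runtime for each configuration and the relative improvement. The timings are copied from this comment; the variable names and the `mean` helper are only illustrative, and three runs per configuration is too few to say anything about variance.

```python
# Timings copied from the comment above, in microseconds.
enabled_us = [67720191, 66424539, 62959275]    # pre-aggregation enabled
disabled_us = [69934956, 70351959, 68437353]   # pre-aggregation disabled


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


mean_enabled = mean(enabled_us)
mean_disabled = mean(disabled_us)

# Relative improvement: fraction of the disabled runtime that is saved.
improvement = (mean_disabled - mean_enabled) / mean_disabled

print(f"enabled:     {mean_enabled / 1e6:.2f} s")
print(f"disabled:    {mean_disabled / 1e6:.2f} s")
print(f"improvement: {improvement:.1%}")
```

On these numbers the means come out to about 65.70 s and 69.57 s, an improvement of about 5.6%.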



