baibaichen commented on pull request #29695: URL: https://github.com/apache/spark/pull/29695#issuecomment-792516852
Thanks @huaxingao

We ran some tests on aggregate push down in a real production environment last month. Here are the results.

1. Dataset: 550M records
2. 4 ClickHouse nodes

Metric | 1 User | 10 Users | 20 Users | 60 Users
-- | -- | -- | -- | --
QPS | 2.76 | 6.1 | 4.43 | 4.45
90% (sec) | **0.4** | 2.1 | 7 | 17
slowest (sec) | 0.45 | 3.3 | 12 | 27

We didn't test without aggregate push down, because it is 10x slower than with push down.

However, the current PR has some limitations:

1. It doesn't support COUNT.
2. It doesn't support AVG when there are multiple shards.
3. It is not clear how to extend the implementation to support more aggregation cases, for example `sum(if())`.

Thanks
Chang
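To illustrate limitation 2: a minimal sketch, assuming each shard computes its own partial result and the caller merges them. Averaging the per-shard averages is only correct when every shard holds the same number of rows, so AVG would need to be pushed down as SUM and COUNT and recombined afterwards. The names `PartialAvg`, `shard_partial`, and `merge_avg` are hypothetical and are not taken from the PR, Spark, or ClickHouse.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartialAvg:
    """Hypothetical partial state a shard returns instead of a finished AVG."""
    total: float
    count: int


def shard_partial(values: List[float]) -> PartialAvg:
    # Each shard computes SUM and COUNT locally (assumed push down form of AVG).
    return PartialAvg(total=sum(values), count=len(values))


def merge_avg(partials: List[PartialAvg]) -> Optional[float]:
    # Combine the partial states, then divide once at the end.
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    return total / count if count else None


# Two shards with different row counts (made-up values for illustration).
shards = [[10.0, 20.0, 30.0], [100.0]]

# Naive approach: push AVG down as-is and average the shard results.
naive = sum(sum(s) / len(s) for s in shards) / len(shards)

# Decomposed approach: push down SUM and COUNT, merge, then divide.
correct = merge_avg([shard_partial(s) for s in shards])

print(naive, correct)
```

With these made-up values the naive result differs from the true average over all four rows, which is why a plain AVG cannot simply be pushed to each shard.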
