[GitHub] [spark] baibaichen commented on pull request #29695: [SPARK-22390][SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down

GitBox Sun, 07 Mar 2021 22:52:29 -0800


baibaichen commented on pull request #29695:
URL: https://github.com/apache/spark/pull/29695#issuecomment-792516852



   Thanks @huaxingao 
   
   We ran some tests of aggregate push down in a real production environment 
last month. Here are the results.
   
   1. Dataset: 550M records
   2. Cluster: 4 ClickHouse nodes
   
   Metric | 1 User | 10 Users | 20 Users | 60 Users
   -- | -- | -- | -- | --
   QPS | 2.76 | 6.1 | 4.43 | 4.45
   90th percentile (sec) | **0.4** | 2.1 | 7 | 17
   Slowest (sec) | 0.45 | 3.3 | 12 | 27
   
   We didn't run the same tests without aggregate push down, because that is 
10x slower than with push down.
   
   However, the current PR has some limitations:
   1. COUNT is not supported.
   2. AVG is not supported when the data is spread over multiple shards.
   3. It is unclear how to extend the implementation to support more 
aggregation cases, for example sum(if()).
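
   To illustrate the AVG limitation: averaging per-shard averages gives the 
wrong answer when shards hold different numbers of rows, so each shard has to 
return a partial SUM and COUNT that are merged afterwards. The sketch below is 
plain Python with made-up shard data and hypothetical function names. It is 
not the code in this PR, only a minimal model of the decomposition.

```python
from typing import Iterable, List, Tuple


def partial_avg(rows: Iterable[float]) -> Tuple[float, int]:
    """Partial aggregate a single shard would compute locally: (SUM, COUNT)."""
    values = list(rows)
    return sum(values), len(values)


def merge_avg(partials: Iterable[Tuple[float, int]]) -> float:
    """Final aggregate on the coordinator: total SUM divided by total COUNT."""
    total, count = 0.0, 0
    for shard_sum, shard_count in partials:
        total += shard_sum
        count += shard_count
    if count == 0:
        # SQL AVG over zero rows yields NULL; this sketch raises instead.
        raise ValueError("AVG over zero rows is undefined")
    return total / count


def naive_avg_of_avgs(shards: List[List[float]]) -> float:
    """Incorrect approach: push AVG to each shard, then average the results."""
    per_shard = [sum(s) / len(s) for s in shards if s]
    return sum(per_shard) / len(per_shard)


# Hypothetical data: two shards of unequal size.
shards = [[10.0, 20.0, 30.0], [100.0]]

correct = merge_avg(partial_avg(s) for s in shards)  # (60 + 100) / (3 + 1)
naive = naive_avg_of_avgs(shards)                    # (20 + 100) / 2

print(correct, naive)
```

   The same split also shows why COUNT support matters: without a pushed-down 
COUNT, the coordinator has no denominator with which to merge the partial sums.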
   
   Thanks
   Chang 
   


