[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox


jianxind commented on pull request #7607:
URL: https://github.com/apache/arrow/pull/7607#issuecomment-655383817


   > @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark
   
   The buildbot result: https://ci.ursalabs.org/#/builders/73/builds/93.
   The clang compiler generates better SIMD instructions.
   
   Below are the results for the two typical null proportions (0% and 0.01%).
   ```
       benchmark                   baseline        contender       change %  counters
   31  SumKernelInt32/1048576/0     6.576 GiB/sec  19.756 GiB/sec   200.444  {'run_name': 'SumKernelInt32/1048576/0', 'run_...
   14  SumKernelInt16/1048576/0     3.314 GiB/sec   8.518 GiB/sec   157.026  {'run_name': 'SumKernelInt16/1048576/0', 'run_...
   15  SumKernelFloat/1048576/0     8.647 GiB/sec  20.204 GiB/sec   133.656  {'run_name': 'SumKernelFloat/1048576/0', 'run_...
   34  SumKernelFloat/1048576/1     6.669 GiB/sec  14.262 GiB/sec   113.867  {'run_name': 'SumKernelFloat/1048576/1', '...
   24  SumKernelInt8/1048576/0      1.906 GiB/sec   3.794 GiB/sec    99.079  {'run_name': 'SumKernelInt8/1048576/0', 'run_t...
   23  SumKernelInt64/1048576/0    18.130 GiB/sec  25.094 GiB/sec    38.410  {'run_name': 'SumKernelInt64/1048576/0', 'run_...
   29  SumKernelDouble/1048576/0   18.296 GiB/sec  24.188 GiB/sec    32.204  {'run_name': 'SumKernelDouble/1048576/0', 'run...
   19  SumKernelInt32/1048576/1    10.897 GiB/sec  14.163 GiB/sec    29.967  {'run_name': 'SumKernelInt32/1048576/1', '...
    4  SumKernelInt64/1048576/1    16.745 GiB/sec  20.665 GiB/sec    23.411  {'run_name': 'SumKernelInt64/1048576/1', '...
   26  SumKernelDouble/1048576/1   16.439 GiB/sec  19.635 GiB/sec    19.441  {'run_name': 'SumKernelDouble/1048576/1', ...
    9  SumKernelInt16/1048576/1     5.870 GiB/sec   6.403 GiB/sec     9.069  {'run_name': 'SumKernelInt16/1048576/1', '...
   11  SumKernelInt8/1048576/1      2.958 GiB/sec   2.917 GiB/sec    -1.382  {'run_name': 'SumKernelInt8/1048576/1', 'r...
   ```
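
As a reading aid for the table above, the percentage column appears to be the relative change of the second throughput over the first. A minimal sketch of that arithmetic follows; `PercentChange` is a hypothetical helper written for this note, not part of the benchmark tooling.

```cpp
#include <cassert>

// Hypothetical helper (not part of Arrow or the buildbot): relative change
// of the contender throughput over the baseline throughput, in percent.
double PercentChange(double baseline, double contender) {
  assert(baseline > 0.0);  // a zero baseline has no meaningful relative change
  return (contender - baseline) / baseline * 100.0;
}
```

For the SumKernelInt32/1048576/0 row, `PercentChange(6.576, 19.756)` gives roughly 200.4, which agrees with the reported 200.444 up to the rounding of the printed throughputs.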
   
   For the other null cases (1%, 50%), there are some small regressions (20%),
which are expected because BitBlockCounter is used now.
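
One plausible reading of the trade-off is sketched below. This is not Arrow's BitBlockCounter implementation; the function name, the 64-value block size, and the bitmap layout are assumptions made for illustration. The idea is to count the set validity bits per block, take a dense loop when the whole block is valid, skip the block when it is entirely null, and fall back to a per-bit loop otherwise.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not Arrow's API. Bit i of validity[i / 64] set to 1
// means values[i] is non-null. Blocks of 64 values are an assumption.
int64_t SumWithValidity(const std::vector<int32_t>& values,
                        const std::vector<uint64_t>& validity) {
  const std::size_t n = values.size();
  assert(validity.size() * 64 >= n);  // one validity bit per value
  int64_t sum = 0;
  for (std::size_t base = 0; base < n; base += 64) {
    const std::size_t len = (n - base < 64) ? (n - base) : 64;
    uint64_t word = validity[base / 64];
    if (len < 64) word &= (uint64_t{1} << len) - 1;  // ignore padding bits
    const std::size_t set_bits = std::bitset<64>(word).count();
    if (set_bits == len) {
      // Fully valid block: branch-free dense loop.
      for (std::size_t i = 0; i < len; ++i) sum += values[base + i];
    } else if (set_bits > 0) {
      // Mixed block: test each validity bit individually.
      for (std::size_t i = 0; i < len; ++i) {
        if ((word >> i) & 1) sum += values[base + i];
      }
    }
    // set_bits == 0: the whole block is null, nothing to add.
  }
  return sum;
}
```

In a scheme like this, the dense path dominates when nulls are rare, while at higher null proportions more blocks take the per-bit path and also pay for the extra counting step, which would be consistent with the regressions reported above.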



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox


jianxind commented on pull request #7607:
URL: https://github.com/apache/arrow/pull/7607#issuecomment-655370919


   @wesm @emkornfield @pitrou 
   
   Ping: could you help review this approach when you are available?
I appreciate your help.







[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox


jianxind commented on pull request #7607:
URL: https://github.com/apache/arrow/pull/7607#issuecomment-655371150


   @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark







[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean

2020-07-01 Thread GitBox


jianxind commented on pull request #7607:
URL: https://github.com/apache/arrow/pull/7607#issuecomment-652431931


   @emkornfield This is the new version of the sum aggregate without intrinsics;
could you help review it?
   
   The dense part gets nearly the same scores as the intrinsics version for
AVX2 on clang; the gcc result is a little lower.
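
To illustrate what a dense sum without intrinsics can look like, here is a minimal sketch. It is not the code from this pull request; `DenseSum` and the lane count of 8 are assumptions. Several independent accumulators remove the single loop-carried dependency, which gives the compiler room to auto-vectorize. Whether it actually does depends on the compiler, its version, and the optimization flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical dense (no nulls) sum written in plain C++ with no intrinsics.
int64_t DenseSum(const int32_t* values, std::size_t length) {
  assert(values != nullptr || length == 0);
  constexpr std::size_t kLanes = 8;  // assumed lane count, tune per target
  int64_t acc[kLanes] = {0};
  std::size_t i = 0;
  // Main loop: each lane accumulates independently of the others.
  for (; i + kLanes <= length; i += kLanes) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      acc[lane] += values[i + lane];
    }
  }
  // Combine the lanes, then add the tail that did not fill a full group.
  int64_t sum = 0;
  for (std::size_t lane = 0; lane < kLanes; ++lane) sum += acc[lane];
  for (; i < length; ++i) sum += values[i];
  return sum;
}
```

Comparing the generated assembly from clang and gcc for a loop like this is one way to see why the two compilers can score differently.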
   
   Below are the benchmark results (null_percent 0 and 0.01%) on an AVX2
(i7-8700) device.
   Before
   ```
   Benchmark                       Time       CPU  Iterations  UserCounters...
   SumKernelFloat/1048576/1     97.5 us   97.3 us        7150  bytes_per_second=10.0336G/s null_percent=0.01 size=1048.58k
   SumKernelFloat/1048576/0     62.1 us   62.0 us       11292  bytes_per_second=15.7443G/s null_percent=0 size=1048.58k
   SumKernelDouble/1048576/1    35.4 us   35.4 us       19781  bytes_per_second=27.5977G/s null_percent=0.01 size=1048.58k
   SumKernelDouble/1048576/0    32.5 us   32.5 us       21534  bytes_per_second=30.0657G/s null_percent=0 size=1048.58k
   SumKernelInt8/1048576/1       183 us    183 us        3832  bytes_per_second=5.34627G/s null_percent=0.01 size=1048.58k
   SumKernelInt8/1048576/0       133 us    132 us        5285  bytes_per_second=7.37317G/s null_percent=0 size=1048.58k
   SumKernelInt16/1048576/1     93.3 us   93.2 us        7505  bytes_per_second=10.4762G/s null_percent=0.01 size=1048.58k
   SumKernelInt16/1048576/0     68.4 us   68.3 us       10249  bytes_per_second=14.2887G/s null_percent=0 size=1048.58k
   SumKernelInt32/1048576/1     49.2 us   49.1 us       14255  bytes_per_second=19.874G/s null_percent=0.01 size=1048.58k
   SumKernelInt32/1048576/0     40.3 us   40.2 us       17654  bytes_per_second=24.2688G/s null_percent=0 size=1048.58k
   SumKernelInt64/1048576/1     35.3 us   35.3 us       19870  bytes_per_second=27.6826G/s null_percent=0.01 size=1048.58k
   SumKernelInt64/1048576/0     32.4 us   32.3 us       21628  bytes_per_second=30.1902G/s null_percent=0 size=1048.58k
   ```
   
   After
   ```
   Benchmark                       Time       CPU  Iterations  UserCounters...
   SumKernelFloat/1048576/1     41.1 us   41.0 us       17004  bytes_per_second=23.7947G/s null_percent=0.01 size=1048.58k
   SumKernelFloat/1048576/0     25.1 us   25.0 us       27884  bytes_per_second=38.9922G/s null_percent=0 size=1048.58k
   SumKernelDouble/1048576/1    24.6 us   24.5 us       28423  bytes_per_second=39.8205G/s null_percent=0.01 size=1048.58k
   SumKernelDouble/1048576/0    17.1 us   17.1 us       40881  bytes_per_second=57.1186G/s null_percent=0 size=1048.58k
   SumKernelInt8/1048576/1       116 us    115 us        6073  bytes_per_second=8.46685G/s null_percent=0.01 size=1048.58k
   SumKernelInt8/1048576/0      61.0 us   60.9 us       11501  bytes_per_second=16.0293G/s null_percent=0 size=1048.58k
   SumKernelInt16/1048576/1     62.2 us   62.2 us       11250  bytes_per_second=15.7108G/s null_percent=0.01 size=1048.58k
   SumKernelInt16/1048576/0     37.0 us   37.0 us       19883  bytes_per_second=26.4204G/s null_percent=0 size=1048.58k
   SumKernelInt32/1048576/1     38.4 us   38.4 us       18217  bytes_per_second=25.4367G/s null_percent=0.01 size=1048.58k
   SumKernelInt32/1048576/0     24.6 us   24.5 us       28531  bytes_per_second=39.8216G/s null_percent=0 size=1048.58k
   SumKernelInt64/1048576/1     24.1 us   24.1 us       29069  bytes_per_second=40.5531G/s null_percent=0.01 size=1048.58k
   SumKernelInt64/1048576/0     16.7 us   16.7 us       41887  bytes_per_second=58.3943G/s null_percent=0 size=1048.58k
   ```
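
The two listings above can be cross-checked with a little arithmetic. The sketch below assumes that `size=1048.58k` means 1048576 bytes processed per iteration and that `G/s` means GiB per second; both assumptions are consistent with the printed numbers but are not stated in the output. The helper names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: throughput in GiB/s from the bytes processed per
// iteration and the per-iteration time in microseconds.
double GiBPerSecond(double bytes, double micros) {
  assert(micros > 0.0);
  return bytes / (micros * 1e-6) / (1024.0 * 1024.0 * 1024.0);
}

// Hypothetical helper: how many times faster the "after" run is.
double Speedup(double before_us, double after_us) {
  assert(after_us > 0.0);
  return before_us / after_us;
}
```

For SumKernelFloat/1048576/0, `GiBPerSecond(1048576.0, 62.0)` is about 15.75, close to the reported 15.7443G/s given the rounded CPU time, and `Speedup(62.0, 25.0)` is 2.48.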







[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean

2020-07-01 Thread GitBox


jianxind commented on pull request #7607:
URL: https://github.com/apache/arrow/pull/7607#issuecomment-652425725


   @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark


