[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch
jianxind commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-655383817

> @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark

The result of the buildbot: https://ci.ursalabs.org/#/builders/73/builds/93. The clang compiler can generate better SIMD instructions. Below are the results for the two typical null proportions (0% and 0.01%).

```
    benchmark                        baseline        contender   change %
31  SumKernelInt32/1048576/0     6.576 GiB/sec   19.756 GiB/sec    200.444
14  SumKernelInt16/1048576/0     3.314 GiB/sec    8.518 GiB/sec    157.026
15  SumKernelFloat/1048576/0     8.647 GiB/sec   20.204 GiB/sec    133.656
34  SumKernelFloat/1048576/1     6.669 GiB/sec   14.262 GiB/sec    113.867
24  SumKernelInt8/1048576/0      1.906 GiB/sec    3.794 GiB/sec     99.079
23  SumKernelInt64/1048576/0    18.130 GiB/sec   25.094 GiB/sec     38.410
29  SumKernelDouble/1048576/0   18.296 GiB/sec   24.188 GiB/sec     32.204
19  SumKernelInt32/1048576/1    10.897 GiB/sec   14.163 GiB/sec     29.967
4   SumKernelInt64/1048576/1    16.745 GiB/sec   20.665 GiB/sec     23.411
26  SumKernelDouble/1048576/1   16.439 GiB/sec   19.635 GiB/sec     19.441
9   SumKernelInt16/1048576/1     5.870 GiB/sec    6.403 GiB/sec      9.069
11  SumKernelInt8/1048576/1      2.958 GiB/sec    2.917 GiB/sec     -1.382
```

For the other null proportions (1% and 50%), there are some small regressions (about 20%), which are expected because BitBlockCounter is now used.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
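For context on the expected 1% and 50% regressions: counting validity bits block by block lets fully valid blocks take a dense, vectorizable path, while any block containing a null falls back to per-element checks. Assuming 64-value blocks and independently placed nulls, a block is fully valid with probability 0.9999^64 ≈ 0.994 at 0.01% nulls, but only 0.99^64 ≈ 0.53 at 1% nulls, and almost never at 50%. The C++ sketch below illustrates that idea only; `SumWithValidity`, `LoadWord`, and `IsValid` are hypothetical helpers, not Arrow's `BitBlockCounter` API (whose block size may differ), and the bitmap is assumed to use Arrow's LSB bit order.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not Arrow API): assemble 64 validity bits starting at
// value index `start` from an LSB-ordered bitmap, independent of endianness.
static uint64_t LoadWord(const uint8_t* bitmap, std::size_t start) {
  assert(start % 64 == 0);
  uint64_t word = 0;
  for (std::size_t b = 0; b < 8; ++b) {
    word |= static_cast<uint64_t>(bitmap[start / 8 + b]) << (8 * b);
  }
  return word;
}

// Hypothetical helper: bit i of the LSB-ordered validity bitmap.
static bool IsValid(const uint8_t* bitmap, std::size_t i) {
  return (bitmap[i / 8] >> (i % 8)) & 1;
}

// Sum the non-null values, classifying each 64-value block by its popcount.
template <typename T, typename Acc>
Acc SumWithValidity(const T* values, const uint8_t* bitmap, std::size_t length) {
  Acc sum = 0;
  std::size_t i = 0;
  for (; i + 64 <= length; i += 64) {
    const uint64_t word = LoadWord(bitmap, i);
    const std::size_t popcount = std::bitset<64>(word).count();
    if (popcount == 64) {
      // Fully valid block: a plain loop the compiler is free to vectorize.
      for (std::size_t j = i; j < i + 64; ++j) sum += values[j];
    } else if (popcount > 0) {
      // Mixed block: test every bit (the slower path).
      for (std::size_t j = 0; j < 64; ++j) {
        if ((word >> j) & 1) sum += values[i + j];
      }
    }
    // popcount == 0: the whole block is null, so it is skipped.
  }
  // Tail that does not fill a whole block.
  for (; i < length; ++i) {
    if (IsValid(bitmap, i)) sum += values[i];
  }
  return sum;
}
```

Calling `SumWithValidity<int32_t, int64_t>(values, bitmap, length)` accumulates into a wider 64-bit type, which keeps sums of small integer types from overflowing.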
[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch
jianxind commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-655370919

@wesm @emkornfield @pitrou Ping, could you help review this approach when you are available? I appreciate your help.
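The PR title refers to runtime dispatch: building a kernel for more than one instruction set and selecting one on the machine where the code runs, so a single binary can use AVX2 when it is available. The sketch below shows the general pattern under stated assumptions: it relies on the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports` plus the `target("avx2")` function attribute on x86, and falls back to the scalar kernel elsewhere. The names `SumScalar`, `SumAvx2`, `ResolveSum`, and `Sum` are hypothetical; this is not Arrow's dispatch code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Baseline kernel, compiled for whatever ISA the build targets.
static int64_t SumScalar(const int32_t* values, std::size_t length) {
  int64_t sum = 0;
  for (std::size_t i = 0; i < length; ++i) sum += values[i];
  return sum;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_AVX2_DISPATCH 1
// Same loop, but only this function is compiled with AVX2 enabled, so the
// compiler may emit 256-bit instructions. Call it only after a CPU check.
__attribute__((target("avx2")))
static int64_t SumAvx2(const int32_t* values, std::size_t length) {
  int64_t sum = 0;
  for (std::size_t i = 0; i < length; ++i) sum += values[i];
  return sum;
}
#endif

using SumFn = int64_t (*)(const int32_t*, std::size_t);

// Choose a kernel based on the CPU that is actually running the program.
static SumFn ResolveSum() {
#ifdef HAVE_AVX2_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) return SumAvx2;
#endif
  return SumScalar;
}

// Entry point: the choice is made once and cached in a function-local static
// (its initialization is thread-safe since C++11).
int64_t Sum(const int32_t* values, std::size_t length) {
  assert(values != nullptr || length == 0);
  static const SumFn kernel = ResolveSum();
  return kernel(values, length);
}
```

Caching the resolved pointer means the CPU feature check runs once rather than on every call.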
[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch
jianxind commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-655371150 @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark
[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean
jianxind commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-652431931

@emkornfield This is the new version of the sum aggregate without intrinsics; could you help review it? For the dense part, it achieves nearly the same scores as the intrinsic version for AVX2 on clang; the gcc results are a little lower. Below are the benchmark results (null_percent 0 and 0.01%) on an AVX2 (i7-8700) device.

Before

```
SumKernelFloat/1048576/1    97.5 us  97.3 us   7150 bytes_per_second=10.0336G/s null_percent=0.01 size=1048.58k
SumKernelFloat/1048576/0    62.1 us  62.0 us  11292 bytes_per_second=15.7443G/s null_percent=0 size=1048.58k
SumKernelDouble/1048576/1   35.4 us  35.4 us  19781 bytes_per_second=27.5977G/s null_percent=0.01 size=1048.58k
SumKernelDouble/1048576/0   32.5 us  32.5 us  21534 bytes_per_second=30.0657G/s null_percent=0 size=1048.58k
SumKernelInt8/1048576/1      183 us   183 us   3832 bytes_per_second=5.34627G/s null_percent=0.01 size=1048.58k
SumKernelInt8/1048576/0      133 us   132 us   5285 bytes_per_second=7.37317G/s null_percent=0 size=1048.58k
SumKernelInt16/1048576/1    93.3 us  93.2 us   7505 bytes_per_second=10.4762G/s null_percent=0.01 size=1048.58k
SumKernelInt16/1048576/0    68.4 us  68.3 us  10249 bytes_per_second=14.2887G/s null_percent=0 size=1048.58k
SumKernelInt32/1048576/1    49.2 us  49.1 us  14255 bytes_per_second=19.874G/s null_percent=0.01 size=1048.58k
SumKernelInt32/1048576/0    40.3 us  40.2 us  17654 bytes_per_second=24.2688G/s null_percent=0 size=1048.58k
SumKernelInt64/1048576/1    35.3 us  35.3 us  19870 bytes_per_second=27.6826G/s null_percent=0.01 size=1048.58k
SumKernelInt64/1048576/0    32.4 us  32.3 us  21628 bytes_per_second=30.1902G/s null_percent=0 size=1048.58k
```

After

```
SumKernelFloat/1048576/1    41.1 us  41.0 us  17004 bytes_per_second=23.7947G/s null_percent=0.01 size=1048.58k
SumKernelFloat/1048576/0    25.1 us  25.0 us  27884 bytes_per_second=38.9922G/s null_percent=0 size=1048.58k
SumKernelDouble/1048576/1   24.6 us  24.5 us  28423 bytes_per_second=39.8205G/s null_percent=0.01 size=1048.58k
SumKernelDouble/1048576/0   17.1 us  17.1 us  40881 bytes_per_second=57.1186G/s null_percent=0 size=1048.58k
SumKernelInt8/1048576/1      116 us   115 us   6073 bytes_per_second=8.46685G/s null_percent=0.01 size=1048.58k
SumKernelInt8/1048576/0     61.0 us  60.9 us  11501 bytes_per_second=16.0293G/s null_percent=0 size=1048.58k
SumKernelInt16/1048576/1    62.2 us  62.2 us  11250 bytes_per_second=15.7108G/s null_percent=0.01 size=1048.58k
SumKernelInt16/1048576/0    37.0 us  37.0 us  19883 bytes_per_second=26.4204G/s null_percent=0 size=1048.58k
SumKernelInt32/1048576/1    38.4 us  38.4 us  18217 bytes_per_second=25.4367G/s null_percent=0.01 size=1048.58k
SumKernelInt32/1048576/0    24.6 us  24.5 us  28531 bytes_per_second=39.8216G/s null_percent=0 size=1048.58k
SumKernelInt64/1048576/1    24.1 us  24.1 us  29069 bytes_per_second=40.5531G/s null_percent=0.01 size=1048.58k
SumKernelInt64/1048576/0    16.7 us  16.7 us  41887 bytes_per_second=58.3943G/s null_percent=0 size=1048.58k
```
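The comment above reports that the version without intrinsics reaches nearly the same dense-path throughput as the intrinsic version on clang. One common way to write such a kernel without intrinsics, shown here as an illustration rather than the PR's actual code, is to keep several independent partial sums. `DenseSum` and the choice of 8 lanes are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sum a dense (null-free) array with kLanes independent partial sums and no
// intrinsics. Each lane only adds to itself, so a compiler can keep the lanes
// in a SIMD register without reassociating a single running sum.
// Acc is the accumulator type (e.g. double for float, int64_t for int8_t).
template <typename Acc, typename T, std::size_t kLanes = 8>
Acc DenseSum(const T* values, std::size_t length) {
  assert(values != nullptr || length == 0);
  Acc lanes[kLanes] = {};
  std::size_t i = 0;
  for (; i + kLanes <= length; i += kLanes) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      lanes[lane] += values[i + lane];
    }
  }
  // Combine the partial sums, then add the leftover tail elements.
  Acc sum = 0;
  for (std::size_t lane = 0; lane < kLanes; ++lane) sum += lanes[lane];
  for (; i < length; ++i) sum += values[i];
  return sum;
}
```

The explicit lanes matter most for floating point: compilers generally will not split a single running `float` sum into vector lanes unless reassociation is permitted (for example with `-ffast-math`), because that changes the rounding. Writing the lanes in the source makes the reordering explicit, so results can differ in the last bits from a strictly sequential sum. Whether a given gcc or clang version actually vectorizes the loop still depends on the compiler version, optimization level, and target ISA.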
[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean
jianxind commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-652425725 @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark