[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-617227451 @wesm, actually I did use codebase's benchmark executable. The problem is I only focused on one case that's directly related to this change. But ignored other cases that look not relevant, and finally found impacted. Guess I can write a simple script to automate benchmark result checking as a temporary solution. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616978252 This change introduces severe branch misses in certain conditions. See perf logs below. I changed benchmark code to run only the problematic test case. Without this patch ```bash 807.415826 task-clock (msec) #0.979 CPUs utilized 83 context-switches #0.103 K/sec 0 cpu-migrations#0.000 K/sec 427 page-faults #0.529 K/sec 2,285,801,407 cycles#2.831 GHz (83.17%) 2,313,785 stalled-cycles-frontend #0.10% frontend cycles idle (83.16%) 915,631,177 stalled-cycles-backend# 40.06% backend cycles idle (82.93%) 9,997,208,858 instructions #4.37 insn per cycle #0.09 stalled cycles per insn (83.66%) 1,679,799,451 branches # 2080.464 M/sec (83.66%) 106,599 branch-misses #0.01% of all branches (83.41%) ``` With this patch ```bash 902.557236 task-clock (msec) #0.980 CPUs utilized 94 context-switches #0.104 K/sec 0 cpu-migrations#0.000 K/sec 427 page-faults #0.473 K/sec 2,567,879,767 cycles#2.845 GHz (83.17%) 88,266,680 stalled-cycles-frontend #3.44% frontend cycles idle (83.17%) 20,826,862 stalled-cycles-backend#0.81% backend cycles idle (83.03%) 2,518,949,193 instructions #0.98 insn per cycle #0.04 stalled cycles per insn (83.62%) 847,459,928 branches # 938.954 M/sec (83.61%) 75,187,208 branch-misses #8.87% of all branches (83.39%) ``` Absolute counts are not comparable as gtest runs different loops for each test. The point is branch-misses jumps from 0.01% to 8.87%, which causes high frontend stall(cpu wait for fetching code to execute), and ipc(instructions per cycle) drops from 4.37 to 0.98. I didn't figure out which branch is miss predicted and why. My haswell cpu is too old to support branch tracing. My guess is [this line](https://github.com/apache/arrow/blob/5093b809d63ac8db99aec9caa7ad7e723f277c46/cpp/src/arrow/util/bit_util.cc#L285), no concrete justification. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616921669 Opened a jira card https://issues.apache.org/jira/browse/ARROW-8537 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616915079 @pitrou @wesm Oops, I only checked case "BitmapReader" from benchmark [arrow-bit-util-benchmark](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/bit_util_benchmark.cc). Obviously it's not enough. I compared all cases just now and see huge performance drop from below 4 tests: Before this patch: ```bash BenchmarkBitmapAnd/32768/1 563496 ns 563260 ns 1243 bytes_per_second=55.4806M/s BenchmarkBitmapAnd/131072/1 2219810 ns 2218984 ns 318 bytes_per_second=56.3321M/s BenchmarkBitmapAnd/32768/2 561738 ns 561467 ns 1265 bytes_per_second=55.6577M/s BenchmarkBitmapAnd/131072/2 2246229 ns 2245119 ns 305 bytes_per_second=55.6763M/s ``` After this patch: ```bash BenchmarkBitmapAnd/32768/11653467 ns 1652680 ns 422 bytes_per_second=18.9087M/s BenchmarkBitmapAnd/131072/1 6665501 ns 6661561 ns 105 bytes_per_second=18.7644M/s BenchmarkBitmapAnd/32768/21670793 ns 1670246 ns 423 bytes_per_second=18.7098M/s BenchmarkBitmapAnd/131072/2 6702369 ns 6698957 ns 103 bytes_per_second=18.6596M/s ``` Before reverting this patch, I would like to understand why it happens. BTW: we definitely need continuous benchmark tools to detect these things early. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616294784 I forgot to add jira no in the first commit, modified later. Looks jira status is not synced with this PR. Shall I abandon and push a new PR? https://issues.apache.org/jira/browse/ARROW-8523 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org