[GitHub] [arrow] wesm edited a comment on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size
wesm edited a comment on pull request #7506:
URL: https://github.com/apache/arrow/pull/7506#issuecomment-647633470

Here's the benchmark comparison with clang-8

```
$ archery benchmark diff --cc=clang-8 --cxx=clang++-8 2db48b4 653817301 --benchmark-filter=Cast

    benchmark                                    baseline           contender  change %  regression
41  CastUInt32ToInt32Safe/262144/1     769.933m items/sec    4.474b items/sec   481.076  False
62  CastUInt32ToInt32Safe/262144/0     409.277m items/sec    2.189b items/sec   434.792  False
15  CastUInt32ToInt32Safe/32768/0      399.127m items/sec    2.089b items/sec   423.357  False
 7  CastUInt32ToInt32Safe/32768/1      742.226m items/sec    3.721b items/sec   401.307  False
18  CastInt64ToInt32Safe/262144/0      341.403m items/sec    1.569b items/sec   359.706  False
55  CastUInt32ToInt32Safe/262144/1000  351.335m items/sec    1.612b items/sec   358.917  False
16  CastUInt32ToInt32Safe/32768/1000   334.147m items/sec    1.484b items/sec   344.139  False
47  CastInt64ToInt32Safe/32768/0       328.491m items/sec    1.414b items/sec   330.412  False
30  CastInt64ToInt32Safe/262144/1      742.497m items/sec    3.131b items/sec   321.638  False
51  CastInt64ToInt32Safe/262144/1000   304.244m items/sec    1.226b items/sec   302.928  False
42  CastInt64ToInt32Safe/32768/1000    288.976m items/sec    1.101b items/sec   280.942  False
32  CastInt64ToInt32Safe/32768/1       706.339m items/sec    2.552b items/sec   261.273  False
45  CastDoubleToInt32Safe/262144/1     924.369m items/sec    2.997b items/sec   224.214  False
54  CastInt64ToDoubleSafe/262144/0     419.319m items/sec    1.324b items/sec   215.783  False
11  CastInt64ToDoubleSafe/32768/0      408.425m items/sec    1.216b items/sec   197.672  False
49  CastDoubleToInt32Safe/262144/2     207.799m items/sec  614.658m items/sec   195.795  False
 9  CastDoubleToInt32Safe/32768/2      202.480m items/sec  584.558m items/sec   188.699  False
23  CastInt64ToDoubleSafe/262144/1000  375.572m items/sec    1.078b items/sec   186.948  False
50  CastDoubleToInt32Safe/32768/1      869.447m items/sec    2.445b items/sec   181.248  False
21  CastInt64ToDoubleSafe/262144/1     790.625m items/sec    2.222b items/sec   181.101  False
59  CastDoubleToInt32Safe/262144/1000  360.792m items/sec    1.013b items/sec   180.714  False
44  CastInt64ToDoubleSafe/32768/1000   360.492m items/sec  988.897m items/sec   174.319  False
48  CastDoubleToInt32Safe/32768/1000   349.576m items/sec  932.771m items/sec   166.829  False
58  CastDoubleToInt32Safe/262144/0     407.159m items/sec    1.067b items/sec   162.086  False
63  CastInt64ToDoubleSafe/32768/1      746.561m items/sec    1.893b items/sec   153.520  False
 8  CastDoubleToInt32Safe/32768/0      395.857m items/sec  990.704m items/sec   150.268  False
67  CastDoubleToInt32Safe/262144/10    275.237m items/sec  612.002m items/sec   122.354  False
10  CastDoubleToInt32Safe/32768/10     266.596m items/sec  583.346m items/sec   118.813  False
69  CastInt64ToInt32Safe/32768/10      232.545m items/sec  449.883m items/sec    93.461  False
64  CastUInt32ToInt32Safe/32768/10     256.845m items/sec  482.636m items/sec    87.909  False
61  CastInt64ToInt32Safe/262144/10     243.012m items/sec  435.232m items/sec    79.099  False
 0  CastUInt32ToInt32Safe/262144/10    264.244m items/sec  466.981m items/sec    76.723  False
53  CastInt64ToDoubleSafe/32768/10     278.548m items/sec  441.752m items/sec    58.591  False
 1  CastInt64ToDoubleSafe/262144/10    283.181m items/sec  431.990m items/sec    52.549  False
14  CastInt64ToInt32Safe/32768/2       170.844m items/sec  224.195m items/sec    31.228  False
37  CastUInt32ToInt32Safe/32768/2      182.246m items/sec  238.051m items/sec    30.621  False
27  CastUInt32ToInt32Safe/262144/2     187.277m items/sec  231.385m items/sec    23.553  False
28  CastInt64ToInt32Safe/262144/2      175.893m items/sec  216.887m items/sec    23.306  False
26  CastInt64ToDoubleSafe/32768/2      189.465m items/sec  228.996m items/sec    20.864  False
 3  CastInt64ToDoubleSafe/262144/2     192.523m items/sec  219.324m items/sec    13.921  False
36  CastInt64ToInt32Unsafe/32768/0       2.993b items/sec    3.227b items/sec     7.800  False
35  CastInt64ToInt32Unsafe/32768/2       2.937b items/sec    3.154b items/sec     7.367  False
68  CastInt64ToInt32Unsafe/32768/1       2.966b items/sec    3.176b items/sec     7.088  False
43  CastInt64ToInt32Unsafe/32768/10      2.940b item
```
[GitHub] [arrow] wesm edited a comment on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size
wesm edited a comment on pull request #7506:
URL: https://github.com/apache/arrow/pull/7506#issuecomment-647233286

@emkornfield I agree. Realistically we're going to have to look at them both. FWIW, in this particular case it seems that the Clang performance is the most representative of how things behave across platforms. Code that autovectorizes well in gcc may not do much at all on MSVC. There's also the question of how `__builtin_expect` impacts optimizations. But in general I don't think we should be using the "ursabot benchmark" results (which use gcc) to draw conclusions about which perf optimizations are working.