On 29.11.23 18:15, Nathan Bossart wrote:
> Using the same benchmark as we did for the SSE2 linear searches in
> XidInMVCCSnapshot() (commit 37a6e5d) [1] [2], I see the following:
>
>    writers    sse2    avx2     %
>        256    1195    1188    -1
>        512     928    1054   +14
>       1024     633     716   +13
>       2048     332     420   +27
>       4096     162     203   +25
>       8192     162     182   +12

AFAICT, your patch merely provides an alternative AVX2 implementation wherever SSE2 is currently supported; it doesn't add any new API calls or new functionality. One might naively expect these to be just two different ways of invoking the same underlying CPU primitives, so these performance improvements surprise me. Or do the CPUs actually have completely separate machinery for SSE2 and AVX2, so that simply using the latter for the same work is faster?
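For concreteness, here is a simplified sketch of a linear search over 32-bit integers at both vector widths. It is not the code from the patch or from PostgreSQL's pg_lfind32(); the function names lfind32_scalar, lfind32_sse2, lfind32_avx2 and lfind32 are hypothetical. It assumes GCC or Clang on x86-64, uses the target("avx2") function attribute so no -mavx2 flag is needed, picks a variant at runtime with __builtin_cpu_supports(), and falls back to a plain loop on other platforms. The one mechanical difference it makes visible is width: each SSE2 compare checks 4 elements, each AVX2 compare checks 8, so the AVX2 loop runs about half as many iterations over the same array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_X86_SIMD 1
#endif

/* Plain loop: one element per comparison. */
static bool lfind32_scalar(uint32_t key, const uint32_t *base, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (base[i] == key)
            return true;
    return false;
}

#ifdef HAVE_X86_SIMD
/* SSE2 (baseline on x86-64): 128-bit registers, 4 x uint32 per compare. */
static bool lfind32_sse2(uint32_t key, const uint32_t *base, size_t n)
{
    const __m128i keys = _mm_set1_epi32((int) key);
    size_t i = 0;

    for (; i + 4 <= n; i += 4)
    {
        __m128i vals = _mm_loadu_si128((const __m128i *) &base[i]);
        /* A matching lane sets bits in the byte mask. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi32(keys, vals)) != 0)
            return true;
    }
    return lfind32_scalar(key, base + i, n - i);   /* last n % 4 elements */
}

/* AVX2: 256-bit registers, 8 x uint32 per compare.  The target attribute
 * lets this compile without -mavx2; it must only run on AVX2 CPUs. */
static __attribute__((target("avx2")))
bool lfind32_avx2(uint32_t key, const uint32_t *base, size_t n)
{
    const __m256i keys = _mm256_set1_epi32((int) key);
    size_t i = 0;

    for (; i + 8 <= n; i += 8)
    {
        __m256i vals = _mm256_loadu_si256((const __m256i *) &base[i]);
        if (_mm256_movemask_epi8(_mm256_cmpeq_epi32(keys, vals)) != 0)
            return true;
    }
    return lfind32_scalar(key, base + i, n - i);   /* last n % 8 elements */
}
#endif

/* Dispatch to the widest variant the running CPU supports. */
bool lfind32(uint32_t key, const uint32_t *base, size_t n)
{
    if (n == 0)
        return false;
    assert(base != NULL);
#ifdef HAVE_X86_SIMD
    if (__builtin_cpu_supports("avx2"))
        return lfind32_avx2(key, base, n);
    return lfind32_sse2(key, base, n);
#else
    return lfind32_scalar(key, base, n);
#endif
}
```

In this shape an array shorter than 8 elements never enters the AVX2 loop at all, so the wider registers only help once the array spans several vectors; the sketch says nothing about other microarchitectural effects.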


