> Note that our current implementation is highly optimized for low-cardinality
> inputs.
> This is needed for aggregate queries. I found this write-up of a couple
> scalar and
> vectorized sorts, and they show this library doing very poorly on very-low
> cardinality inputs. I would look into that before trying to get something in
> shape to
> share.
>
> https://github.com/Voultapher/sort-research-rs/blob/main/writeup/intel_avx512/text.md
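If you want to reproduce the very-low-cardinality case locally, here is a rough C benchmark sketch. It uses libc `qsort` as a stand-in baseline. The helper names (`fill_low_cardinality`, `time_sort`, `is_sorted_i32`) are hypothetical, and plugging in x86-simd-sort (a C++ library, so calling it from C would need a small `extern "C"` wrapper) is an assumption left to whoever runs it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Ascending int32_t comparator for qsort; avoids overflow from subtraction. */
static int cmp_i32(const void *a, const void *b)
{
    int32_t x = *(const int32_t *) a;
    int32_t y = *(const int32_t *) b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: fill arr with n values drawn from only
 * `cardinality` distinct keys (0 .. cardinality-1). cardinality must be > 0. */
static void fill_low_cardinality(int32_t *arr, size_t n, int32_t cardinality,
                                 unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        arr[i] = rand() % cardinality;
}

/* Return 1 if arr is sorted in ascending order, 0 otherwise. */
static int is_sorted_i32(const int32_t *arr, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (arr[i - 1] > arr[i])
            return 0;
    return 1;
}

/* Time one sort of a fresh low-cardinality array of n elements.
 * Returns processor time in seconds (via clock()), or -1.0 if allocation
 * fails or the result is not sorted. */
static double time_sort(size_t n, int32_t cardinality, unsigned seed)
{
    int32_t *arr = malloc(n * sizeof *arr);
    if (arr == NULL)
        return -1.0;

    fill_low_cardinality(arr, n, cardinality, seed);

    clock_t start = clock();
    /* Assumption: replace this call with a wrapper around the vectorized sort. */
    qsort(arr, n, sizeof *arr, cmp_i32);
    clock_t end = clock();

    int ok = is_sorted_i32(arr, n);
    free(arr);
    return ok ? (double) (end - start) / CLOCKS_PER_SEC : -1.0;
}
```

Sweeping `time_sort` over cardinalities such as 1, 2, 4, 16, and n (all distinct), on both an AVX-512 machine and an AVX2-only laptop, would cover both concerns raised in this thread.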
That write-up is fairly old, and those perf problems have since been fixed. See
https://github.com/intel/x86-simd-sort/pull/127 and
https://github.com/intel/x86-simd-sort/pull/168. I still suggest measuring perf
here for thoroughness.

> There is also the question of hardware support. It seems AVX-512 is not
> supported well on client side, where most developers work. And availability of
> any flavor is not guaranteed on server either.
> Something to keep in mind.

x86-simd-sort also works on AVX2, which is widely available. I would suggest
benchmarking on one of the client laptops as well to measure the perf there.

Raghuveer