> "Looks like"? I cannot find the reference, but I've read a while back that a well-known company from Redwood uses it for their in-memory columnar storage. That might have just been a rumor or might have been research only - not sure. It does not really matter anyways.
> SortTuples are currently 24 bytes, and supported vector registers are 16 > bytes, so not sure how you think that would work. The thought was to logically group multiple sort tuples together and then create a vectorized version of that group with just the primitive type sort key as well as a small-sized index/offset into that sort group to later swap the corresponding sort tuple referenced by that index/offset. The sorting network would allow us to do branch-less register based sorting for a particular sort run. I guess this idea is moot, given ... > More broadly, the more invasive a change is, the more risk is involved, and > the more effort to test and evaluate. If you're serious about trying to > improve insertion sort performance, the simple idea we discussed at the start > of the thread is a much more modest step that has a good chance of justifying > the time put into it. That is not to say it's easy, however, because testing > is a non-trivial amount of work. I absolutely agree. Let's concentrate on improving things incrementally. Please excuse me wondering given that you have contributed some of the recent vectorization stuff and seeing that you have obviously experimented a lot with the sort code, that you might already have tried something along those lines or researched the subject - it is definitely a very interesting topic. Cheers, Ben