On 9 Dec 2015 02:44, "Peter Geoghegan" <p...@heroku.com> wrote:
> I guess you mean insertion sort. What's the theoretical justification
> for the change?
Er, right. Insertion sort.
The sort networks I used here are optimal both in number of comparisons and
depth. I suspect modern CPUs actually manage to do some of the comparisons
in parallel even.
I was experimenting with using SIMD registers and did a non SIMD
implementation like this first and noticed it was doing 15% fewer
comparisons than insertion sort and ran faster. That was for sets of 8, I'm
not sure there's as much saving on smaller sets.