I wrote: > Aside from that, this seems like a good place to settle down, so I'm > going to create a CF entry for this. I'll start more rigorous > performance testing in the near future.
Here's the first systematic test results, with scripts. Overall, I'm very pleased. With extremely low cardinality, it's close enough to our B&M quicksort that any difference (a hair slower or faster) can be discarded as insignificant. It's massively faster with most other inputs, so I'll just highlight the exceptions: "ascending" - Our qsort runs a "presorted check" before every partitioning step, and I hadn't done this for radix sort yet because I wanted to see what the "natural" difference is. I'm inclined to put in a single precheck at the beginning (people have come to expect that to be there), but not one at every recursion because I don't think that's useful. (Aside: that precheck at every recursion should be replaced by something that detects ascending/descending runs at the very start, but that's a separate thread) "stagger" with multiplier = no. records / 2 - This seems to be a case where the qsort's presorted check happens to get lucky. As I said above, we should actually detect more sorted runs with something more comprehensive. "p5" - This is explicitly designed to favor the B&M qsort. The input is 95% zeros, 2.5% negative numbers, and 2.5% positive numbers. The first qsort pivot is pretty much guaranteed to be zero, and the first partitioning step completes very quickly. Radix sort must do a lot more work, but it not different than the amount of work it does with other patterns -- it's much less sensitive to the input distribution than qsort. In this case, there's a mix of negative and positive bigints. That defeats common prefix detection, and the first iteration deals into two piles: negative and non-negative. Then a few recursions happen where there is only a single distinct byte, so no useful work happens. I suppose I could try common prefix detection at every recursion, but I don't think that's widely beneficial for integers. Maybe the single-byte-plus comparator small qsort would help a little, and I'm considering adding that anyway. In one sense this is the most worrying, since there doesn't seem to be a widely-useful mitigation, but in another sense it's the least worrying, since this case is deliberately constructed to be at a disadvantage. -- John Naylor Amazon Web Services
BM-perf-test-rand-stag-sawtooth-20251120.sh
Description: application/shellscript
BM-perf-test-misc-20251120.sh
Description: application/shellscript
