Hello Yura,

However, I'm not enthusiastic about combining two methods depending on
the range: the function looks complex enough without that, so I would
suggest not taking this option. Also, the decision process adds to
the average cost, which is undesirable.

Given that 99.99% of cases will fall in the likely branch, the branch
predictor should eliminate the decision cost.

Hmmm. ISTM that a branch predictor would guess that "unknown < small"
is probably false, so a hint should be given that it is actually true.

And as Dean Rasheed correctly mentioned, the mask method will have a
really bad pattern for the branch predictor if the range is not just
below or equal to a power of two.

On average the bitmask is the better unbiased method, if the online
figures are to be trusted. Also, as already said, I do not really want
to add code complexity, especially to get lower average performance.
And especially not with code like "threshold = - range % range" where
both variables are unsigned: in N-bit unsigned arithmetic this computes
2^N mod range, i.e. the number of values that must be rejected to avoid
the modulo bias, but I get a headache just looking at it :-)
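
For the record, here is a minimal sketch of the bitmask method as I
understand it; next64(), the unlikely() macro and the function name
are illustrative assumptions (not actual pgbench code), and unsigned
long is assumed to be 64 bits for __builtin_clzl:

    #include <stdint.h>

    #define unlikely(x) __builtin_expect((x), 0)

    extern uint64_t next64(void);   /* e.g. one xoroshiro128ss step */

    /* return an unbiased value in [0, range), for range >= 1 */
    static uint64_t
    rand_bitmask(uint64_t range)
    {
        /* smallest mask of the form 2^k - 1 covering range - 1;
         * the "| 1" avoids __builtin_clzl(0), which is undefined */
        uint64_t mask = ~UINT64_C(0) >> __builtin_clzl((range - 1) | 1);
        uint64_t val;

        /* a draw is kept with probability range / (mask + 1) >= 1/2 */
        do
            val = next64() & mask;
        while (unlikely(val >= range));

        return val;
    }

The rejection branch is what Dean's remark is about: it is taken almost
never when the range is just below or equal to a power of two, but
close to half the time when it is just above one.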

And __builtin_clzl is not a free lunch either: it has a latency of
3-4 cycles on modern processors.

Well, % is not cheap either.

Well, it could probably run in parallel with some part of the
xoroshiro computation, but that depends on how the compiler optimizes
this function.

I would certainly select the unbiased multiply method if we want a u32
range variant.

There could be two functions.

Yep, but do we need them? Who is likely to want 32-bit pseudo-random
ints in a range? pgbench needs 64 bits.
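
For the archives, here is a hedged sketch of what I understand the
unbiased multiply method to be for a u32 range (Lemire's trick);
next32() and the function name are made up for the example:

    #include <stdint.h>

    extern uint32_t next32(void);   /* 32 bits from the underlying PRNG */

    /* return an unbiased value in [0, range), for range >= 1 */
    static uint32_t
    rand_mult(uint32_t range)
    {
        uint64_t m = (uint64_t) next32() * range;
        uint32_t low = (uint32_t) m;

        if (low < range)
        {
            /* -range % range == 2^32 mod range in unsigned arithmetic,
             * the number of "short slice" values to reject */
            uint32_t threshold = -range % range;

            while (low < threshold)
            {
                m = (uint64_t) next32() * range;
                low = (uint32_t) m;
            }
        }
        return (uint32_t) (m >> 32);   /* uniform over [0, range) */
    }

The point in its favor is that the costly % is only evaluated when
low < range, which is rare for small ranges.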

So I'm still inclined to just keep the faster-on-average bitmask
method, despite the fact that it may be slower for some ranges. The
worst case is a range just above a power of two, where each draw is
kept with probability just over one half, so the average number of
PRNG calls is, if I'm not mistaken:

  1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... ~ 2

which does not look too bad.
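
A quick standalone check of that figure, with the range set just above
a power of two (the constants and the crude use of random() for drawing
bits are just for the test, not a proposal):

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    int
    main(void)
    {
        /* worst case for the mask: range just above a power of two */
        uint64_t range = (UINT64_C(1) << 32) + 1;
        uint64_t mask = ~UINT64_C(0) >> __builtin_clzll(range - 1);
        uint64_t draws = 0, n = 1000000;

        srandom(54321);
        for (uint64_t i = 0; i < n; i++)
        {
            uint64_t val;

            do
            {
                /* random() yields 31 bits; two calls cover the
                 * 33-bit mask */
                val = (((uint64_t) random() << 31) |
                       (uint64_t) random()) & mask;
                draws++;
            } while (val >= range);
        }

        /* prints about 2.0, matching the series above */
        printf("average draws: %.3f\n", (double) draws / n);
        return 0;
    }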

--
Fabien.

