Hello Dean,

Thanks a lot for this work. This version looks much better than the previous one you sent, and has significant advantages over the one I sent, in particular avoiding the prime number stuff and large modular multiply.
So this looks good!

I'm happy that halves-xoring is back because it was an essential part of the design. ISTM that the previous version you sent only had linear/affine transforms which was a bad idea.

The linear transform is moved inside halves, why not, and the any-odd-number multiplication is prime with power-of-two trick is neat on these.

However I still have some reservations:

First, I have a thing against erand48. The erand 48 bits design looks like something that made sense when computer architectures where 16/32 bits and large integers were 32 bits, so a 16 bits margin looked good enough. Using a int16[3] type now is really depressing and probably costly on modern architectures. With 64 bits architecture, and given that we are manipulating 64 bits integers (well 63 really), ISTM that the minimal state size for a PRNG should be at least 64 bits. It there a good reason NOT to use a 64 bits state prn generator? I took Knuth's because it is cheap and 64 bits. I'd accept anything which is at least 64 bits. I looked at xor-shift things, but there was some controversy around these so I kept it simple and just shifted small values to get rid of the obvious cycles on Knuth's generator.

Second, I have a significant reservation about the very structure of the transformation in this version:

  loop 4 times :

    // FIRST HALF STEER
    m/r = pseudo randoms
    if v in first "half"
       v = ((v * m) ^ r) & mask;
       rotate1(v)

    // FULL SHIFT 1
    r = pseudo random
    v = (v + r) % size

    // SECOND HALF STEER
    m/r = pseudo randoms
    if v in second "half"
       same as previous on second half

    // FULL SHIFT 2
    r = pseudo random
    v = (v + r) % size

I'm really at odds with FULL SHIFT 1, because it means that up to 1/256 of values are kept out of STEERING. Whole chunks of values could be kept unshuffled because they would only have SHIFTS apply to them and each time fall in the not steered half. It should be an essential part of the design that at least one steer is applied on a value at each round, and if two are applied then fine, but certainly not zero. So basically I think that the design would be significantly improved by removing "FULL SHIFT 1".

Third, I think that the rotate code can be simplified, in particular the ?: should be avoided because it may induce branches quite damaging to processor performance.

I'll give it some more thoughts.

--
Fabien.


Reply via email to