[Issue 3738] MinstdRand0 and MinstdRand very poor performance
https://issues.dlang.org/show_bug.cgi?id=3738 Andrei Alexandrescu and...@erdani.com changed: What|Removed |Added Version|2.039 |D2 --
[Issue 3738] MinstdRand0 and MinstdRand very poor performance
http://d.puremagic.com/issues/show_bug.cgi?id=3738 Witold Baryluk bary...@smp.if.uj.edu.pl changed: What|Removed |Added Status|ASSIGNED|NEW --- Comment #3 from Witold Baryluk bary...@smp.if.uj.edu.pl 2010-01-24 08:35:08 PST --- One can observe that ?: line (branch2) _x = (y v || y = 2147483647) ? ((y+1) 0x7fff_u) : y; can be changed , first to not contain not needed yv, to (branch1) _x = (y = 2147483647) ? ((y+1) 0x7fff_u) : y; or even remove branch completly (branchlessA) _x = (y 0x7fff_) + (y 31); and possibly rearange terms (branchlessB) _x = (y + (y 31)) 0x7fff_; MinstdRnd0 standard: 44.65 s inline: 41.11 s standard with tricks (2 branches): 17.08 s inline with tricks (2 branches): 17.04 s standard with tricks (1 branch): 15.24 s inline with tricks (1 branch): 13.59 s standard with tricks (branchlesA): 16.50 s inline with tricks (branchelesA): 15.82 s standard with tricks (branchlesB): 17.74 s inline with tricks (branchelesB): 15.30 s MinstdRnd standard: 44.95 s inline: 41.17 s standard with tricks (2 branch): 17.04 s inline with tricks (2 branche): 17.03 s standard with tricks (1 branch): 15.29 s inline with tricks (1 branche): 13.54 s standard with tricks (branchlesA): 16.46 s inline with tricks (branchelessA): 15.88 s standard with tricks (branchlesB): 18.10 s inline with tricks (branchelessB): 15.31 s So fastest is currently branch1 version. Updated patch: void popFront() { static if (m) -_x = cast(UIntType) ((cast(ulong) a * _x + c) % m); +static if (is(UIntType == uint) m == 4294967295uL) { + const ulong x = (cast(ulong) a * _x + c); + const ulong v = x 32; + const ulong w = (x 0x_uL); + const uint y = cast(uint)(v+w); + _x = (y v || y = 4294967295u) ? (y+1) : y; +} else static if (is(UIntType == uint) m == 2147483647u) { + const ulong x = (cast(ulong) a * _x + c); + const ulong v = x 31; + const ulong w = (x 0x7fff_uL); + const uint y = cast(uint)(v+w); + _x = (y = 2147483647u) ? ((y+1) 0x7fff_u) : y; +} else { +_x = cast(UIntType) ((cast(ulong) a * _x + c) % m); +} else _x = a * _x + c; } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 3738] MinstdRand0 and MinstdRand very poor performance
http://d.puremagic.com/issues/show_bug.cgi?id=3738 --- Comment #5 from Witold Baryluk bary...@smp.if.uj.edu.pl 2010-01-24 12:03:20 PST --- All my tests pases on your version. 10_000_000_000 (more than all possibilities) passes (for MinstdRand0 and MinstdRand, it is possible that for other a it will fail, but i tested and done some calculation. and it should work for all a,cm. I also tested versions with slightly changed constants (like 30 instad of 31, or int.max-1, int.max-2, int.max-3, int.max+1 in multiple comibinations an allone), and all them fail. Shouldn't const uint y by immucatable y, just like in first static if? I think that our pretty optimised line, can be further optimalised: (1 branch sub) - _x = (y = int.max) ? ((y + 1) int.max) : y; + _x = (y = int.max) ? (y - int.max) : y; :) This gives: MinstdRand0: standard with tricks (1 branch sub): 15.25 inline with tricks (1 branch sub): 13.53 s MinstdRand: standard with tricks (1 branch sub): 15.26 inline with tricks (1 branch sub): 13.52 s It doesn't give much difference (maybe 2%). But i thinkg it is better to have one substraction, than one add and one binary and. I found few interesting materials about this m=2^^31-1 operation. One is http://www.firstpr.com.au/dsp/rand31/ I tested this Carta's versions: But they are only relevant for a=16807 (out Park-Miller's StdminRand0 generator), so pretty limited: Carta1 (original): uint lo = 16807*(_x 0xu); immutable uint hi = 16807*(_x 16); lo += (hi 0x7FFF) 16; lo += (hi 15); _x = (lo int.max) ? (lo - int.max) : lo; Carta3: uint lo = 16807*(_x 0xu); immutable uint hi = 16807*(_x 16); lo += ((hi 0x7FFF) 16) + (hi 15); _x = (lo int.max) ? (lo - int.max) : lo; Carta2: immutable uint hi = 16807*(_x 16), lo 16807*(_x 0xu) + ((hi 0x7FFF) 16) + (hi 15); _x = (lo int.max) ? (lo - int.max) : lo; Last lines can also be changed to branchles: _x = (lo int.max) ? (lo - int.max) : lo; Timings: Carta code1: 22.30 s Carta code1 branchless: 18.42 Carta code3: 21.84 s Carta code3 branchless: 18.19 s Carta code2: 23.84 s Carta code2 branchless: 19.84 s So it is slower in all cases, and it is only for speciall cases (c=0, a 2^^15). Of course answers are still correct. Performing uint*uint+uint - ulong is just faster than their tricks, and more general, and can be reused in many other places. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 3738] MinstdRand0 and MinstdRand very poor performance
http://d.puremagic.com/issues/show_bug.cgi?id=3738 --- Comment #1 from Witold Baryluk bary...@smp.if.uj.edu.pl 2010-01-23 18:29:30 PST --- Ok, sorry. Not all are better than Mt19937. But definitly better than MinstdRand{,0}. And few are acutally better (empirically). -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 3738] MinstdRand0 and MinstdRand very poor performance
http://d.puremagic.com/issues/show_bug.cgi?id=3738 Andrei Alexandrescu and...@metalanguage.com changed: What|Removed |Added Status|NEW |ASSIGNED CC||and...@metalanguage.com --- Comment #2 from Andrei Alexandrescu and...@metalanguage.com 2010-01-23 23:09:46 PST --- Thanks for the fix. I'll incorporate the patch into Phobos. Walter will decide whether he wants to pursue a compiler change. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---