On Sun, 15 May 2022, Steffen Nurpmeso wrote: > Stuart Henderson wrote in ... > |what's the perceived problem you're wanting to solve? and does that > |problem actually exist in the first place? > > The problem is that if have a low upper bound then modulo will "remove a > lot of randomization". For example if you have a program which > generates Lotto numbers (1..49), then using _uniform() as it is will > generate many duplicates.
Wut. The *WHOLE POINT* of arc4random_uniform() is that it has uniform distribution. Says right so in the manpage. If an implementation of that API fails to do that, it's a broken implementation. The OpenBSD implementation achieves that. NetBSD's implementation has the same core logic. Here's my quick lotto pick demo program, doing 4900000 picks, so they should all be somewhere near 100000: #include <stdlib.h> #include <stdio.h> #define LIMIT 49 int main(void) { unsigned counts[LIMIT] = { 0 }; int i; for (i = 0; i < 100000 * LIMIT; i++) counts[arc4random_uniform(LIMIT)]++; for (i = 0; i < LIMIT; i++) printf("%2d\t%7u\n", i+1, counts[i]); return 0; } And a sample run: : bleys; ./a.out 1 100100 2 100639 3 99965 4 99729 5 99641 6 99650 7 100299 8 100164 9 99791 10 100101 11 100657 12 100042 13 99661 14 99927 15 99426 16 99491 17 99646 18 100133 19 100013 20 99942 21 99873 22 99924 23 99567 24 100152 25 100688 26 100011 27 100481 28 99980 29 100406 30 99726 31 99808 32 99929 33 100050 34 99983 35 100048 36 99771 37 99906 38 100215 39 100261 40 100426 41 99847 42 99533 43 100368 44 99695 45 100041 46 100465 47 99875 48 100034 49 99920 : bleys; Looks pretty good to me, with repeated runs showing the values bouncing around. ... > I was a bit interested when i saw Luke's message, but i am no > mathematician and, like i said, i in fact never needed the > functionality. And the question i would have is how small > upper_bound has to be for the modulo problem to really kick in. If the implementation isn't broken, it's not a problem, period. > And even if, whether it matters. For example, if you have > a bitset of 1024 "in-flight things", and the small upper bound > hits a used slot, you could simply search forward until you find > a free one. I mean, with 1024 possible values the number of > possibilities is anyhow restricted. Well hey, let's give Luke's a try and see how uniform it is: #include <stdlib.h> #include <stdio.h> #define LIMIT 49 int main(void) { unsigned counts[LIMIT] = { 0 }; unsigned counts2[LIMIT] = { 0 }; int i; for (i = 0; i < 100000 * LIMIT; i++) { counts[arc4random_uniform(LIMIT)]++; counts2[arc4random_uniform_fast_simple(LIMIT)]++; } for (i = 0; i < LIMIT; i++) printf("%2d\t%7u\t%7u\n", i+1, counts[i], counts2[i]); return 0; } : bleys; ./a.out 1 100188 76502 2 99983 76602 3 100521 76522 4 100416 76682 5 100171 76387 6 100163 76759 7 100024 76336 8 100009 76703 9 99769 76237 10 99892 76532 11 100197 76730 12 100483 76398 13 99769 76310 14 100075 76474 15 99781 76599 16 99846 76439 17 99814 76430 18 100313 76648 19 100259 76813 20 99885 77068 21 100302 76546 22 99998 76698 23 99491 76678 24 100340 76324 25 99763 115263 26 99872 153008 27 100022 152979 28 99481 153793 29 100018 210714 30 99617 229286 31 100167 297003 32 100270 449664 33 100468 76790 34 99115 76452 35 99921 76392 36 99862 76140 37 100485 76607 38 100029 75885 39 99577 76498 40 99479 76727 41 100139 76746 42 100883 76698 43 100102 76474 44 99801 76592 45 100117 76124 46 99678 76417 47 99770 76639 48 99524 77034 49 100151 76658 : bleys; Wow, that last column is *bad*. Repeated runs show 25-32 are consistently high, 32 being *outrageously* off, and the others are low. Luke's implementation does not correctly implement the API. Doesn't matter if it's a million times faster when it doesn't deliver the goal. Philip Guenther