Re: [RFC resend v1 2/2] Use arc4random_range() instead of arc4random_uniform() when appropriate
On 1/1/23 01:08, Alejandro Colomar wrote:
> On 12/31/22 20:08, Alejandro Colomar wrote:
>> This makes the code much more readable and self-documented.  While doing
>> this, I noticed a few bugs, and other cases which may be bugs or not.
>> Switching to this specialized API makes it easier to spot such bugs, but
>> since I'm not familiar with the code, I kept some bugs unfixed.  The
>> most obvious ones (although I may be wrong) I fixed.  And in some cases
>> where it was very unclear, I didn't touch the old *_uniform() code.
>>
>> Below are the cases where I changed the behavior (I considered it a bug):
>>
>> * usr.bin/ssh/auth.c:
>>
>> -		*cp = hashchars[arc4random_uniform(sizeof(hashchars) - 1)];
>> +		*cp = hashchars[arc4random_range(0, sizeof(hashchars) - 1)];
>
> Reconsidering, this one is probably better just as
> arc4random_uniform(sizeof(hashchars)).

I was also wrong here.  I was confused by the implicit strlen()
calculation with sizeof()-1, whose -1 was cancelled by the +1.

--
<http://www.alejandro-colomar.es/>
Re: [RFC resend v1 2/2] Use arc4random_range() instead of arc4random_uniform() when appropriate
Hi Alejandro,

Alejandro Colomar wrote on Sun, Jan 01, 2023 at 01:08:17AM +0100:
> On 12/31/22 20:08, Alejandro Colomar wrote:
>> This makes the code much more readable and self-documented.  While doing
>> this, I noticed a few bugs, and other cases which may be bugs or not.
>> Switching to this specialized API makes it easier to spot such bugs, but
>> since I'm not familiar with the code, I kept some bugs unfixed.  The
>> most obvious ones (although I may be wrong) I fixed.  And in some cases
>> where it was very unclear, I didn't touch the old *_uniform() code.
>>
>> Below are the cases where I changed the behavior (I considered it a bug):
>>
>> * usr.bin/ssh/auth.c:
>>
>> -		*cp = hashchars[arc4random_uniform(sizeof(hashchars) - 1)];
>> +		*cp = hashchars[arc4random_range(0, sizeof(hashchars) - 1)];

> Reconsidering, this one is probably better just as
> arc4random_uniform(sizeof(hashchars)).

That seems to introduce exactly the same bug as your first try.
I already explained last year that the code is correct as-is.
We don't want NUL bytes in the password hash, hence the - 1.

Also, please avoid using MIME when you post to OpenBSD lists, with the
only exception of posting tarballs of new ports to ports@, in which
case attachments are OK.

Yours,
  Ingo
Re: [RFC resend v1 2/2] Use arc4random_range() instead of arc4random_uniform() when appropriate
On 12/31/22 20:08, Alejandro Colomar wrote:
> This makes the code much more readable and self-documented.  While doing
> this, I noticed a few bugs, and other cases which may be bugs or not.
> Switching to this specialized API makes it easier to spot such bugs, but
> since I'm not familiar with the code, I kept some bugs unfixed.  The
> most obvious ones (although I may be wrong) I fixed.  And in some cases
> where it was very unclear, I didn't touch the old *_uniform() code.
>
> Below are the cases where I changed the behavior (I considered it a bug):
>
> * usr.bin/ssh/auth.c:
>
> -		*cp = hashchars[arc4random_uniform(sizeof(hashchars) - 1)];
> +		*cp = hashchars[arc4random_range(0, sizeof(hashchars) - 1)];

Reconsidering, this one is probably better just as
arc4random_uniform(sizeof(hashchars)).

> * usr.sbin/ftp-proxy/ftp-proxy.c:
>
> -	return (IPPORT_HIFIRSTAUTO +
> -	    arc4random_uniform(IPPORT_HILASTAUTO - IPPORT_HIFIRSTAUTO));
> +	return arc4random_range(IPPORT_HIFIRSTAUTO, IPPORT_HILASTAUTO);
>
> * usr.sbin/rad/engine.c:
>
> -	tv.tv_sec = MIN_RTR_ADV_INTERVAL +
> -	    arc4random_uniform(MAX_RTR_ADV_INTERVAL - MIN_RTR_ADV_INTERVAL);
> +	tv.tv_sec = arc4random_range(MIN_RTR_ADV_INTERVAL, MAX_RTR_ADV_INTERVAL);
>
> In the following change, I didn't use the temporary variable 'num3'.
> AFAICS, this doesn't affect other uses of the variable in other places,
> because they set it before use.  But please check carefully; I may have
> missed something:
>
> * usr.sbin/cron/entry.c:
>
> -			/* get a random number in the interval [num1, num2]
> -			 */
> -			num3 = num1;
> -			num1 = arc4random_uniform(num2 - num3 + 1) + num3;
> +			num1 = arc4random_range(num1, num2);
>
> Signed-off-by: Alejandro Colomar

--
<http://www.alejandro-colomar.es/>
Re: [RFC resend v1 2/2] Use arc4random_range() instead of arc4random_uniform() when appropriate
Hi Alejandro,

Alejandro Colomar wrote on Sat, Dec 31, 2022 at 08:08:14PM +0100:

> This makes the code much more readable and self-documented.

I object.  It is a needless detour that invites confusion and bugs.

In C code, it is customary to deal with half-open intervals, and closed
intervals are rare.  For example, array indices run from 0 to
sizeof(array) - 1, index sizeof(array) refers to the storage location
beyond the last element, and C programmers are used to that.  So the
way arc4random_uniform(3) works feels familiar to C programmers,
whereas your proposal of arc4random_range(3) is idiosyncratic and
provokes bugs.

> While doing this, I noticed a few bugs,

I doubt it.  I think you fell into your own trap.

> * usr.bin/ssh/auth.c:
>
> -		*cp = hashchars[arc4random_uniform(sizeof(hashchars) - 1)];
> +		*cp = hashchars[arc4random_range(0, sizeof(hashchars) - 1)];

There is no bug here.  The code goes:

	const char hashchars[] = "./ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	    "abcdefghijklmnopqrstuvwxyz0123456789"; /* from bcrypt.c */
	char *cp;
	/* ... */
	for (cp = fake.pw_passwd + 7; *cp != '\0'; cp++)
		*cp = hashchars[arc4random_uniform(sizeof(hashchars) - 1)];

sizeof(hashchars) is the size of the char array, i.e. the string length
plus one for the terminating NUL byte.  So sizeof(hashchars) - 1 is the
number of useful characters in the string, which is the same as its
strlen(3).  So arc4random_uniform() returns a number in the half-open
interval [0, strlen).  The minimum index is 0 -> '.', the maximum index
is strlen - 1 -> '9'.  This is all perfectly fine.

Your patch introduces a bug: the terminating NUL character may now be
copied into fake.pw_passwd.  This is exactly one of the reasons why I
objected to your arc4random_range() proposal: it will cause confusion
and bugs.

That the first hunk of your patch introduces rather than fixes a bug
makes the patch, I think, unworthy of review.  It should be summarily
rejected.

Yours,
  Ingo
[RFC resend v1 2/2] Use arc4random_range() instead of arc4random_uniform() when appropriate
This makes the code much more readable and self-documented.  While doing
this, I noticed a few bugs, and other cases which may be bugs or not.
Switching to this specialized API makes it easier to spot such bugs, but
since I'm not familiar with the code, I kept some bugs unfixed.  The
most obvious ones (although I may be wrong) I fixed.  And in some cases
where it was very unclear, I didn't touch the old *_uniform() code.

Below are the cases where I changed the behavior (I considered it a bug):

* usr.bin/ssh/auth.c:

-		*cp = hashchars[arc4random_uniform(sizeof(hashchars) - 1)];
+		*cp = hashchars[arc4random_range(0, sizeof(hashchars) - 1)];

* usr.sbin/ftp-proxy/ftp-proxy.c:

-	return (IPPORT_HIFIRSTAUTO +
-	    arc4random_uniform(IPPORT_HILASTAUTO - IPPORT_HIFIRSTAUTO));
+	return arc4random_range(IPPORT_HIFIRSTAUTO, IPPORT_HILASTAUTO);

* usr.sbin/rad/engine.c:

-	tv.tv_sec = MIN_RTR_ADV_INTERVAL +
-	    arc4random_uniform(MAX_RTR_ADV_INTERVAL - MIN_RTR_ADV_INTERVAL);
+	tv.tv_sec = arc4random_range(MIN_RTR_ADV_INTERVAL, MAX_RTR_ADV_INTERVAL);

In the following change, I didn't use the temporary variable 'num3'.
AFAICS, this doesn't affect other uses of the variable in other places,
because they set it before use.
But please check carefully; I may have missed something:

* usr.sbin/cron/entry.c:

-			/* get a random number in the interval [num1, num2]
-			 */
-			num3 = num1;
-			num1 = arc4random_uniform(num2 - num3 + 1) + num3;
+			num1 = arc4random_range(num1, num2);

Signed-off-by: Alejandro Colomar
---
 games/boggle/boggle/bog.c                               | 2 +-
 games/canfield/canfield/canfield.c                      | 2 +-
 games/mille/init.c                                      | 2 +-
 gnu/gcc/gcc/cfgexpand.c                                 | 2 +-
 lib/libevent/select.c                                   | 2 +-
 regress/lib/libc/malloc/malloc_general/malloc_general.c | 2 +-
 regress/sys/sys/tree/rb/rb-test.c                       | 3 +--
 regress/sys/sys/tree/splay/splay-test.c                 | 3 +--
 sbin/iked/ikev2.c                                       | 2 +-
 sys/dev/pci/drm/drm_linux.c                             | 2 +-
 sys/dev/pci/drm/include/linux/random.h                  | 2 +-
 sys/kern/kern_fork.c                                    | 2 +-
 sys/net/if_spppsubr.c                                   | 7 ++-
 sys/net/pf.c                                            | 2 +-
 sys/net/pf_lb.c                                         | 4 ++--
 sys/netinet/ip_ipsp.c                                   | 2 +-
 usr.bin/nc/netcat.c                                     | 2 +-
 usr.bin/skeyinit/skeyinit.c                             | 2 +-
 usr.bin/ssh/auth.c                                      | 2 +-
 usr.sbin/cron/entry.c                                   | 5 +
 usr.sbin/ftp-proxy/ftp-proxy.c                          | 3 +--
 usr.sbin/pppd/chap.c                                    | 5 +
 usr.sbin/rad/engine.c                                   | 3 +--
 usr.sbin/relayd/shuffle.c                               | 2 +-
 24 files changed, 26 insertions(+), 39 deletions(-)

diff --git a/games/boggle/boggle/bog.c b/games/boggle/boggle/bog.c
index c0e19454a27..3ed4888fc43 100644
--- a/games/boggle/boggle/bog.c
+++ b/games/boggle/boggle/bog.c
@@ -607,7 +607,7 @@ newgame(char *b)
 	/* Shuffle the cubes using Fisher-Yates (aka Knuth P). */
 	p = ncubes;
 	while (--p) {
-		q = (int)arc4random_uniform(p + 1);
+		q = (int)arc4random_range(0, p);
 		tmp = cubes[p];
 		cubes[p] = cubes[q];
 		cubes[q] = tmp;
diff --git a/games/canfield/canfield/canfield.c b/games/canfield/canfield/canfield.c
index 346fd20a1d2..dec75f6531f 100644
--- a/games/canfield/canfield/canfield.c
+++ b/games/canfield/canfield/canfield.c
@@ -531,7 +531,7 @@ shuffle(struct cardtype *deck[])
 		deck[i]->paid = FALSE;
 	}
 	for (i = decksize - 1; i > 0; i--) {
-		j = arc4random_uniform(i + 1);
+		j = arc4random_range(0, i);
 		if (i != j) {
 			temp = deck[i];
 			deck[i] = deck[j];
diff --git a/games/mille/init.c b/games/mille/init.c
index a86157739dd..c0cc6ac1f02 100644
--- a/games/mille/init.c
+++ b/games/mille/init.c
@@ -90,7 +90,7 @@ shuffle(void)
 	CARD	temp;

 	for (i = DECK_SZ - 1; i > 0; i--) {
-		r = arc4random_uniform(i + 1);
+		r = arc4random_range(0, i);
 		temp = Deck[r];
 		Deck[r] = Deck[i];
 		Deck[i] = temp;
diff --git a/gnu/gcc/gcc/cfgexpand.c b/gnu/gcc/gcc/cfgexpand.c
index 17aff165f6d..0cb8a21289b 100644
--- a/gnu/gcc/gcc/cfgexpand.c
+++ b/gnu/gcc/gcc/cfgexpand.c
@@ -438,7 +438,7 @@ partition_stack
Re: Picky, but much more efficient arc4random_uniform!
We will not be using your code.  End of story.

Luke Small wrote:
> Marc, all you all have to do is say is that you all refuse to provide it.
Re: Picky, but much more efficient arc4random_uniform!
Marc, all you all have to do is say is that you all refuse to provide it.

I was asked to at least provide evidence for correctness.  I did so; and
I'd say I did a stellar job, aside from getting some kind of statistical
program.  The following has an attached source code for my test (along
with referencing another post) in case you missed it in this ginormous
thread:

https://marc.info/?l=openbsd-tech&m=165306234108572&w=2

Aside from multithreaded processes, if you provide this, folks might use
this instead of creating their own incorrect custom random number
generators to improve performance for smaller values.

That custom deterministic "arc4random()" earlier in the thread from
Damien was not very good, by the way.  I tested it.

> You're about one step away from getting into my spam rules.
>
> Whatever you say, the way you say it, *repeatedly* when all the people
> I know (me included) tell you to get your facts straight so you don't
> sound like a wacko, makes reading you a total waste of time.

--
-Luke
Re: Picky, but much more efficient arc4random_uniform!
On Sat, May 21, 2022 at 07:47:40AM -0500, Luke Small wrote:
> Perhaps it was rude sending off list stuff to the list.  Your email
> sounded "less than friendly" and more of a professional challenge that
> you were definitely in the works to produce; much like Damien Miller's
> challenge to prove correctness.  So, whatever.

You're about one step away from getting into my spam rules.

Whatever you say, the way you say it, *repeatedly* when all the people
I know (me included) tell you to get your facts straight so you don't
sound like a wacko, makes reading you a total waste of time.
Re: Picky, but much more efficient arc4random_uniform!
Perhaps it was rude sending off list stuff to the list.  Your email
sounded "less than friendly" and more of a professional challenge that
you were definitely in the works to produce; much like Damien Miller's
challenge to prove correctness.  So, whatever.

Aside from that unpleasantness: I worked in __builtin_clz(upperbound - 1)
mentioned earlier in the thread, instead of my binary search.  It made
the Knuth sort simulation run even faster.  arc4random_uniform now takes
130% of the time mine takes for a series of random numbers decreasing
from 65535 to 2.

"__builtin_clz((upperbound - 1) | 1)" is only needed if upperbound can
be 1, and that possibility is eliminated by first checking to see that
upperbound is >= 2.

static uint32_t
arc4random_uniform_small_unlocked(uint32_t upper_bound)
{
	static uint64_t rand_holder = 0;
	static size_t rand_bits = 0;
	static size_t upper_bound0 = 2;
	static size_t bitmask = 0x01;
	static size_t bits = 1;

	const size_t ub = upper_bound;
	size_t ret;

	if (ub != upper_bound0) {
		if (ub < 2) {
			/*
			 * reset static cache for if a process needs to fork()
			 * to make it manually fork-safe
			 */
			if (ub == 0) {
				rand_holder = 0;
				rand_bits = 0;
			}
			return 0;
		}

		bits = 32 - __builtin_clz(ub - 1);
		bitmask = ((size_t)1 << bits) - 1;
		upper_bound0 = ub;
	}

	do {
		if (rand_bits < bits) {
			rand_holder |= ((uint64_t)arc4random()) << rand_bits;
			/*
			 * rand_bits will be between 0 and 31 here
			 * so the 0x20 bit will be empty
			 * rand_bits += 32;
			 */
			rand_bits |= 32;
		}
		ret = rand_holder & bitmask;
		rand_holder >>= bits;
		rand_bits -= bits;
	} while (ret >= ub);

	return (uint32_t)ret;
}

> Luke,
>
> It's very bad etiquette to deliberately re-post a private, off-list
> comment to a public mailing list.

> Also, please fix your email client to respect the Mail-Followup-To:
> header, this is another lack of etiquette on your part.

I am either using gmail app on a phone or gmail.com, so I don't know if
I can help you there.
Re: Picky, but much more efficient arc4random_uniform!
On Fri, May 20, 2022 at 09:48:28PM -0500, Luke Small wrote:
> Crystal: You can prove that for random, repetitive, correct, database
> record name generation using small upperbounds, the demonstrated 1/3-1/2
> runtime isn't worth it for an upperbound like 26 - 92 in a business
> context that fights for every last millisecond?
>
> Bring it.
>
> Prove the correctness of whatever algorithm you're using while you're
> at it.

Luke,

It's very bad etiquette to deliberately re-post a private, off-list
comment to a public mailing list.

Just remember that you might need my help one day, and if your mail has
been sent to /dev/null or otherwise ignored then I will never see it.

It's also very rude to make demands when I've already told you that I
have other work of a much higher priority.

IF AND ONLY IF there is sufficient interest from OTHER people, (who can
indicate it to me OFF LIST), then I will consider writing up a
comprehensive article debunking your ideas and concepts.

_Quality_ work takes time, and I've only just finished writing up our
response to the claim that our asm optimisation in the rasops bit
shifting code wasn't beneficial over the C compiler output, (which of
course it was).

Also, please fix your email client to respect the Mail-Followup-To:
header, this is another lack of etiquette on your part.
Re: Picky, but much more efficient arc4random_uniform!
Crystal: You can prove that for random, repetitive, correct, database
record name generation using small upperbounds, the demonstrated 1/3-1/2
runtime isn't worth it for an upperbound like 26 - 92 in a business
context that fights for every last millisecond?

Bring it.

Prove the correctness of whatever algorithm you're using while you're
at it.

...unless a lot of those processes are threaded of course. :/

On Fri, May 20, 2022 at 7:27 PM Crystal Kolipe wrote:
>
> I've actually written a program to demonstrate how pointless your
> chacha bit saving routine is :).  I just haven't posted it yet because
> I'm too busy with other, (more useful), things to bother finishing it
> off.
>
> Your thread on -tech has already wasted more bits than your random
> number routine would save in a very long time, ha ha ha.

--
-Luke
Re: Picky, but much more efficient arc4random_uniform!
I appreciate your response, Damien.

I did do the bare minimum of correctness testing, and it was the post
right after Guenther was congratulated on proving incorrectness:

https://marc.info/?l=openbsd-tech&m=165259528425835&w=2

The post includes software to reproduce the results.  I wrote a highly
dynamically allocated program to test, at various sample sizes, the
degree to which the output remains random.

This is an example of some output:

testing arc4random_uniform(5) and arc4random_uniform_small_unlocked(5):
              256X    2048X   16384X  131072X 1048576X
0 original     246     2053    16400   131115  1048306
  mine         263     2042    16312   131258  1046989
1 original     234     2013    16337   130894  1047810
  mine         249     2022    16304   131161  1049511
2 original     236     2061    16367   130802  1047430
  mine         257     2117    16597   130718  1046375
3 original     284     2089    16444   131092  1050332
  mine         266     2058    16379   131190  1049877
4 original     280     2024    16372   131457  1049002
  mine         245     2001    16328   131033  1050128

max-min values:
original        50       76      107      655     2902
mine            21      116      293      540     3753

The program is set up to test values 2 through 50.  This will create
224KiB of output.  I suspected that you'd prefer that I didn't attach
it.  Some progress output has been relegated to stderr so that you can
easily pipe it to a file.  I added some getrlimit and setrlimit
functions to maximize memory limits if necessary.  In the case that I
created for you, this will not be needed.  It uses 1276K RAM in the
current configuration.

> Just to bring this back to where we came in: even putting thread-safety
> aside (which you absolutely can't): it doesn't matter how much faster
> it is, your replacement function isn't useful until you do the work to
> demonstrate correctness.
>
> You have done literally zero so far, not even the bare minimum of
> testing the output.  As a result your first version was demonstrated to
> be completely broken by literally the most basic of possible tests, a
> mere 10 lines of code.
>
> That you left this to others to do tells me that you fundamentally
> don't understand the environment in which you're trying to operate,
> and that you don't respect the time of the people you're trying to
> convince.
>
> Please stop wasting our time.
>
> -d

--
-Luke

#include
#include
#include
#include
#include
#include
#include

extern char *malloc_options;

/*
 cc arc4random_uniform_small.c -O2 -march=native -mtune=native -flto=full \
    -static -p -fno-inline && ./a.out && gprof a.out gmon.out

 cc arc4random_uniform_small.c -O2 && ./a.out
 */

/*
 * uint32_t
 * arc4random_uniform(uint32_t upper_bound);
 *
 * for which this function is named, provides a cryptographically secure:
 * uint32_t arc4random() % upper_bound;
 *
 * this function minimizes the waste of randomly generated bits,
 * for small upper bounds.
 *
 * 'bits' is the requested number of random bits and it needs to be
 * within the following parameters:
 *
 * 1 << bits >= upper_bound > (1 << bits) >> 1
 *
 * I make a possibly incorrect presumption that size_t is
 * at the very least a uint32_t, but probably a uint64_t for speed
 */
static uint32_t
arc4random_uniform_small_unlocked(uint32_t upper_bound)
{
	static uint64_t rand_holder = 0;
	static uint64_t rand_bits = 0;
	static uint64_t upper_bound0 = 2;
	static uint64_t bits0 = 1;

	uint64_t bits = 16, i = 1, n = 32, ret, ub = upper_bound;

	if (ub != upper_bound0) {
		if (ub < 2) {
			/*
			 * reset static cache for if a process needs to fork()
			 * to make it manually fork-safe
			 */
			if (ub == 0) {
				rand_holder = 0;
				rand_bits = 0;
			}
			return 0;
		}

		/*
		 * binary search for ideal size random bitstream selection
		 */
		for (;;) {
			if (ub > ((uint64_t)1 << bits)) {
				if (n - i == 1) {
					bits = n;
					break;
				}
				i = bits;
				bits = (i + n) / 2;
				continue;
			}
			if (ub <= ((uint64_t)1 << bits) >> 1) {
				if (n - i == 1) {
					bits = n;
					break;
				}
				n = bits;
				bits = (i + n) / 2;
				continue;
			}
			break;
		}
		upper_bound0 = ub;
		bits0 = bits;
	} else
		bits = bits0;

	const uint64_t bitmask = ((uint64_t)1 << bits) - 1;

	do {
		if (rand_bits < bits) {
			rand_ho
Re: Picky, but much more efficient arc4random_uniform!
Joerg Sonnenberger wrote in
 :

|Am Wed, May 18, 2022 at 12:49:21PM +0200 schrieb Steffen Nurpmeso:
|> What surprised me was that the Apple code requires more calls, and
|> that today divisions and multiplications still matter.  I think it
|> was the Cyrix 166+ (or was it Athlon 1600+) where +,-,<<,>> was
|> one cycle, * was ten cycles, and %,/ was forty cycles.  But
|> i think the Core i5 8th Gen that i have requires one cycle for all
|> of them.
|
|I dare you to find any CPU optimized for speed where division and modulo
|are as fast as multiplication.  What you have nowadays is a single cycle
|for basic operations like additions and shifts, about three cycles for
|multiplications (on x86 at least) and an order of magnitude more for
|division.  That's not really surprising either if you consider that

Thanks.  I really had not looked since about 2005.  (In fact i never
really looked, i think a c't magazine CD once came over with x86
interrupt lists, CPU cycle usage, and a (sketchy) RFC collection.)
Hm, it even seems the number of cycles increased for division .. i do
not know which numbers i had seen by then, did they include cache work
at all?  Well.  So this was that.

|integer multiplication has a fast circuit implementation whereas
|division is often implemented as cascading conditional subtraction.
|The difference matters enough that turning the remainder operation in
|the ELF hash into essentially (x - x * (1/size) * size) gave a 2% speed
|up for a large application like Firefox ten years ago.

Interesting.

Well you live in pkgsrc with all the mess that all those huge software
suites introduce, whereas i still "jerk" if i produce software which
needs more than EXEC (/ RODATA) and BSS, and "shudder" for all the CTOR
and DTOR and TLS (but ok: cool) and all the other stuff that is found
in ELF, and handled by the linker.  (I have ELF v1.2 and TLS-ABI, and
even glanced over them a really long time ago though.  But you know, it
is so far off!  And it makes no fun thinking about that :))

|Joerg

--End of 

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Re: Picky, but much more efficient arc4random_uniform!
Am Wed, May 18, 2022 at 12:49:21PM +0200 schrieb Steffen Nurpmeso:
> What surprised me was that the Apple code requires more calls, and
> that today divisions and multiplications still matter.  I think it
> was the Cyrix 166+ (or was it Athlon 1600+) where +,-,<<,>> was
> one cycle, * was ten cycles, and %,/ was forty cycles.  But
> i think the Core i5 8th Gen that i have requires one cycle for all
> of them.

I dare you to find any CPU optimized for speed where division and
modulo are as fast as multiplication.  What you have nowadays is a
single cycle for basic operations like additions and shifts, about
three cycles for multiplications (on x86 at least), and an order of
magnitude more for division.  That's not really surprising either if
you consider that integer multiplication has a fast circuit
implementation, whereas division is often implemented as cascading
conditional subtraction.

The difference matters enough that turning the remainder operation in
the ELF hash into essentially (x - x * (1/size) * size) gave a 2% speed
up for a large application like Firefox ten years ago.

Joerg
Re: Picky, but much more efficient arc4random_uniform!
Steffen Nurpmeso wrote in
 <20220518104921.sr0ht%stef...@sdaoden.eu>:

 ...
|The web site that has been linked from the man from the country
|that has an even much worse Earth Country Overshoot Day than
|Germany and is almost en par with Australia or even USA (April
|3rd, pooh; never again a Saab!  Cuba: Nov 25th, Jamaica Dec 20th)

Finland is March 31st.  Man is that dirty.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Re: Picky, but much more efficient arc4random_uniform!
Philip Guenther wrote in
 :

|On Tue, May 17, 2022 at 1:10 PM Steffen Nurpmeso wrote:
|> Joerg Sonnenberger wrote in
|> :
|>|Am Fri, May 13, 2022 at 09:43:26AM -0500 schrieb Luke Small:
|>|> I made a couple new versions of a new kind of arc4random_uniform-like
|> ...
|>|If your main use case is limiting the amount of cryptography when using
|>|small bounds, there is a much simpler approach to be taken here.  For
|>|boundaries below 256, use arc4random_buf to extract one byte if bound is
|>|a power of two, otherwise two.  This gives most of the performance ...
|>
|> You can use (really implemented) _buf() if you need a 8-bit or
|> 16-bit etc number.
|>
|> I find that _uniform() often makes no difference to a simple
|> modulo because like the comment in _uniform() says "p > 0.5 (worst
|> case, usually far better)", and usually RNGs sprinkle bits nicely,
|
|What does that statement mean?  You seem to be saying "modulo is uniform,
|except when it isn't, which could be almost half the time for some cases,
|but when it's uniform it's uniform, so why bother making it actually
|correct and dependable".
|
|I mean, what does that _mean_???  It's as if I said "my text handling

Well it means this thread was too long.

|program handles all characters uniformly, except those with accents, but
|that's less than 10% of the characters I type, so it handles all
|characters uniformly."  WTF, NO!

But it also means that

	// calculates 2**32 % range
	uint32_t t = (-range) % range;
	for (;;) {
		uint32_t r = rng();
		if (r >= t)

where range is a very small number results in a very, very low
probability that r >= t is not true.  For 16-bit 0xFFFF two zero bytes
had to occur in the upper 16-bit.  And worse for 64-bit RNG.  So this
is what i said.  It depends on the application.

This gets hypothetic and is anyway beyond my mathematical capabilities.
I mean i look at /etc/ssh/moduli, so much on cramping of random numbers.
The web site that has been linked from the man from the country that
has an even much worse Earth Country Overshoot Day than Germany and is
almost en par with Australia or even USA (April 3rd, pooh; never again
a Saab!  Cuba: Nov 25th, Jamaica Dec 20th) said the bias for the number
52 is 0.0121%.  And what i posted had ~0.008 that rand<0x01FFFFFF aka
32-bit high byte is 0, for 32-bit from getrandom(2) as well as mine (in
"strong" mode, SipHash on ARC4).  You need quite some samples to be
able to find a statistical frequency of that order.  And it depends on
the application whether that many samples exist.  And even TCP from
RFC 793 / September 1981 has a 32-bit sequence number.

But sure Philip, of course, yes, of course you are right: simply call
_uniform() and "milk the shit out of the random range" -- just use it
and forget about the problem.

What surprised me was that the Apple code requires more calls, and that
today divisions and multiplications still matter.  I think it was the
Cyrix 166+ (or was it Athlon 1600+) where +,-,<<,>> was one cycle, *
was ten cycles, and %,/ was forty cycles.  But i think the Core i5 8th
Gen that i have requires one cycle for all of them.  (And somewhere i
read that there are CPUs where the shift operators are now more
expensive than multiplication.)  I do not really track any of that
since 2005.  That is nice: you buy a laptop and suddenly have a NVME
SSD that can do 1.5GB/sec.  Wow.

|> 0 bytes "do not occur", so a 32-bit RNG value "is" >=0x01FFFFFF in
|> most cases for "my RNG" (of 10 803/759/793 NOT; 776/805/793
|> NOT for Linux getrandom(2)), which is a pretty high cut off.
...
|Where do these ideas come from, that "0 bytes 'do not occur'"??  If your
|rand generator doesn't provide zero bytes at the expected frequency, you
|know, 1 in 256, then you're using a garbage random number generator.
|Please stop making such suggestions here because THEY ARE NOT TRUE ABOUT
|OPENBSD.  Do ya'll not bother to test the claims that you make?
...
|	for (;;) {
|		u = arc4random();
|		if ((u & 0xff000000) == 0 ||
|		    (u & 0x00ff0000) == 0 ||
|		    (u & 0x0000ff00) == 0 ||
|		    (u & 0x000000ff) == 0)
|			break;
|	}

That is any-byte-0.

|00b82e5c58
|ab47880036

Seems random to me.  :)

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Re: Picky, but much more efficient arc4random_uniform!
On Wed, May 18, 2022 at 05:08:02PM +1000, Damien Miller wrote:
> On Wed, 18 May 2022, Otto Moerbeek wrote:
>
> > instrumenting the code to count the number of arc4random calls I see
> > this:
> >
> > openbsd;       elapsed = 2.835819; calls = 12340949
> > bitmask;       elapsed = 4.335576; calls = 17836216
> > bitmask+reuse; elapsed = 3.710277; calls = 15245337
> >
> > (this is a different number of trials on a slow machine).
> >
> > So the promise of less calls is not fulfilled.  Sounds like a bug.
>
> Well, I don't know whether the simple bitmasking approach will really
> result in fewer calls.  I *think* our current approach has a higher
> probability per loop of succeeding (at least for small upper_bounds)
> than mask-and-test, but my brain is too addled with a cold to
> calculate it :/

I *assumed* the speedups seen in the web page were because of less rnd
calls.  That was a mistake.

> What Theo said is 100% right - the cost is dominated by that of the
> underlying RNG.  If anyone wants a faster arc4random_uniform() then
> the first place to look is at arc4random().

One more reason to keep the current implementation of
arc4random_uniform().

> It's entirely possible that the speedups measured in that article
> are because they are using a omgoptimised (non-crypto) RNG and the
> improvements they saw were due solely to reduction the overhead inside
> the uniformity code even if it came at the cost of more RNG queries.

Right.

> Anyway, here's a tweaked up version of the test harness that fakes out
> arc4random() with a deterministic RNG that counts how often it is
> called in case anyone wants to play with it further.
-Otto
Re: Picky, but much more efficient arc4random_uniform!
On Wed, 18 May 2022, Otto Moerbeek wrote:

> instrumenting the code to count the number of arc4random calls I see this:
>
> openbsd; elapsed = 2.835819; calls = 12340949
> bitmask; elapsed = 4.335576; calls = 17836216
> bitmask+reuse; elapsed = 3.710277; calls = 15245337
>
> (this is a different number of trials on a slow machine).
>
> So the promise of less calls is not fulfilled. Sounds like a bug.

Well, I don't know whether the simple bitmasking approach will really
result in fewer calls. I *think* our current approach has a higher
probability per loop of succeeding (at least for small upper_bounds) than
mask-and-test, but my brain is too addled with a cold to calculate it :/

What Theo said is 100% right - the cost is dominated by that of the
underlying RNG. If anyone wants a faster arc4random_uniform() then the
first place to look is at arc4random().

It's entirely possible that the speedups measured in that article
are because they are using an omgoptimised (non-crypto) RNG and the
improvements they saw were due solely to reducing the overhead inside
the uniformity code even if it came at the cost of more RNG queries.

Anyway, here's a tweaked up version of the test harness that fakes out
arc4random() with a deterministic RNG that counts how often it is called
in case anyone wants to play with it further.
#include <sys/types.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static uint32_t nqueries;

/* deterministic, query-counting arc4random() replacement for benchmarking */
static uint32_t
fake_random(void)
{
	static uint32_t ready, remain, msb;
	uint32_t ret;

	if (!ready) {
		srand_deterministic(31337);
		ready = 1;
	}
	if (remain == 0) {
		msb = (uint32_t)rand();
		remain = 31; /* XXX makes assumption re RAND_MAX */
	}
	ret = (uint32_t)rand() | (msb << 31);
	msb >>= 1;
	remain--;
	nqueries++;
	return ret;
}

#define arc4random() fake_random()

#ifndef __has_builtin
#define __has_builtin(x) 0
#endif
#if __has_builtin(__builtin_clz)
#define arc4random_clz(x) __builtin_clz(x)
#else
#warning lacks __builtin_clz()
/* Count most-significant zero bits, like __builtin_clz() */
static int
arc4random_clz(unsigned int x)
{
	int ret = 0;
	unsigned int n;
	const int lut[16] = { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 };

	while (x != 0) {
		n = (x >> 28) & 0xf;
		if (n != 0)
			return ret + lut[n];
		x <<= 4;
		ret += 4;
	}
	return 32;
}
#endif

/* Current OpenBSD implementation */
static uint32_t
arc4random_uniform1(uint32_t upper_bound)
{
	uint32_t r, min;

	if (upper_bound < 2)
		return 0;

	/* 2**32 % x == (2**32 - x) % x */
	min = -upper_bound % upper_bound;

	/*
	 * This could theoretically loop forever but each retry has
	 * p > 0.5 (worst case, usually far better) of selecting a
	 * number inside the range we need, so it should rarely need
	 * to re-roll.
	 */
	for (;;) {
		r = arc4random();
		if (r >= min)
			break;
	}

	return r % upper_bound;
}

/*
 * Like "Bitmask with Rejection" implementation from
 * https://www.pcg-random.org/posts/bounded-rands.html
 */
static uint32_t
arc4random_uniform2(uint32_t upper_bound)
{
	uint32_t mask, r;

	if (upper_bound < 2)
		return 0;

	mask = 0xffffffff >> arc4random_clz((upper_bound - 1) | 1);
	for (;;) {
		r = arc4random() & mask;
		if (r < upper_bound)
			return r;
	}
	/* NOTREACHED */
}

/*
 * Like Apple's
 * https://opensource.apple.com/source/Libc/Libc-1158.50.2/gen/FreeBSD/arc4random.c
 */
static uint32_t
arc4random_uniform3(uint32_t upper_bound)
{
	int zeroes, bits, remain;
	uint32_t mask, r, rm;

	if (upper_bound < 2)
		return 0;

	zeroes = arc4random_clz((upper_bound - 1) | 1);
	bits = 32 - zeroes;
	mask = (uint32_t)-1 >> zeroes;

	for (;;) {
		r = arc4random();
		rm = r & mask;
		if (rm < upper_bound)
			return rm;
		for (remain = zeroes; remain >= bits; remain -= bits) {
			r >>= bits;
			rm = r & mask;
			if (rm < upper_bound)
				return rm;
		}
	}
	/* NOTREACHED */
}

#include <sys/time.h>

static uint32_t
fixture(const char *s, uint32_t (* const rnd)(uint32_t))
{
	const uint32_t trials = 20 * 1000 * 1000;
	const uint32_t max = 0x8000;
Re: Picky, but much more efficient arc4random_uniform!
On Wed, May 18, 2022 at 02:50:28PM +1000, Damien Miller wrote:

> On Tue, 17 May 2022, Raimo Niskanen wrote:
>
> > Why reinvent the wheel?
> >
> > Here is a pretty good walkthrough of established methods:
> >
> > https://www.pcg-random.org/posts/bounded-rands.html
> >
> > It sounds to me as if your suggested method essentially is
> > "Bitmask with Rejection -- Apple's Method" with the added twist
> > to keep the unused bits of the base generator's word and append
> > new, which might be an optimization for small ranges, but might
> > be just overhead for large ones.
> >
> > In that post one can see that there might be other suggested smaller
> > optimizations that can be applied to the OpenBSD method, and that
> > the bitmask method is effective in many cases but not a clear winner.
>
> Oh nice, I wasn't aware that Apple had an improved algorithm. I always
> thought the best avenue for a speedup was to consider only the bits that
> could satisfy the constraint, but it never occurred to me how to actually
> make use of this :)
>
> The toy benchmark below compares the existing implementation with
> reimplementations of both their version as well as something closer
> to Apple's actual method (which reuses the high bits).
>
> However, I don't see a speedup for either of the alternate approaches.
>
> [djm@djm ~]$ cc -O2 -Werror -Wall -Wextra -o rb rb.c && doas nice -n -20 ./rb
> openbsd; elapsed = 8.327954
> bitmask; elapsed = 13.306228
> bitmask+reuse; elapsed = 11.567389

instrumenting the code to count the number of arc4random calls I see this:

openbsd; elapsed = 2.835819; calls = 12340949
bitmask; elapsed = 4.335576; calls = 17836216
bitmask+reuse; elapsed = 3.710277; calls = 15245337

(this is a different number of trials on a slow machine).

So the promise of less calls is not fulfilled. Sounds like a bug.

-Otto
Re: Picky, but much more efficient arc4random_uniform!
> However, I don't see a speedup for either of the alternate approaches.

Or maybe, just maybe, the underlying function (arc4random, which in the
dominant case is just a small chacha step) is so inexpensive relative to the
proposed higher-level logic changes? So this is all tilting at windmills, and
also something about measuring before optimizing^Wbreaking...
Re: Picky, but much more efficient arc4random_uniform!
On Tue, 17 May 2022, Raimo Niskanen wrote:

> Why reinvent the wheel?
>
> Here is a pretty good walkthrough of established methods:
>
> https://www.pcg-random.org/posts/bounded-rands.html
>
> It sounds to me as if your suggested method essentially is
> "Bitmask with Rejection -- Apple's Method" with the added twist
> to keep the unused bits of the base generator's word and append
> new, which might be an optimization for small ranges, but might
> be just overhead for large ones.
>
> In that post one can see that there might be other suggested smaller
> optimizations that can be applied to the OpenBSD method, and that
> the bitmask method is effective in many cases but not a clear winner.

Oh nice, I wasn't aware that Apple had an improved algorithm. I always
thought the best avenue for a speedup was to consider only the bits that
could satisfy the constraint, but it never occurred to me how to actually
make use of this :)

The toy benchmark below compares the existing implementation with
reimplementations of both their version as well as something closer
to Apple's actual method (which reuses the high bits).

However, I don't see a speedup for either of the alternate approaches.

[djm@djm ~]$ cc -O2 -Werror -Wall -Wextra -o rb rb.c && doas nice -n -20 ./rb
openbsd; elapsed = 8.327954
bitmask; elapsed = 13.306228
bitmask+reuse; elapsed = 11.567389

Maybe my benchmark is crap or maybe I need dlg@ to omgoptimize it for me...
#include <sys/types.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef __has_builtin
#define __has_builtin(x) 0
#endif
#if __has_builtin(__builtin_clz)
#define arc4random_clz(x) __builtin_clz(x)
#else
#warning lacks __builtin_clz()
/* Count most-significant zero bits, like __builtin_clz() */
static int
arc4random_clz(unsigned int x)
{
	int ret = 0;
	unsigned int n;
	const int lut[16] = { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 };

	while (x != 0) {
		n = (x >> 28) & 0xf;
		if (n != 0)
			return ret + lut[n];
		x <<= 4;
		ret += 4;
	}
	return 32;
}
#endif

/* Current OpenBSD implementation */
static uint32_t
arc4random_uniform1(uint32_t upper_bound)
{
	uint32_t r, min;

	if (upper_bound < 2)
		return 0;

	/* 2**32 % x == (2**32 - x) % x */
	min = -upper_bound % upper_bound;

	/*
	 * This could theoretically loop forever but each retry has
	 * p > 0.5 (worst case, usually far better) of selecting a
	 * number inside the range we need, so it should rarely need
	 * to re-roll.
	 */
	for (;;) {
		r = arc4random();
		if (r >= min)
			break;
	}

	return r % upper_bound;
}

/*
 * Like "Bitmask with Rejection" implementation from
 * https://www.pcg-random.org/posts/bounded-rands.html
 */
static uint32_t
arc4random_uniform2(uint32_t upper_bound)
{
	uint32_t mask, r;

	if (upper_bound < 2)
		return 0;

	mask = (uint32_t)-1 >> arc4random_clz((upper_bound - 1) | 1);
	for (;;) {
		r = arc4random();
		if ((r & mask) < upper_bound)
			return r & mask;
	}
	/* NOTREACHED */
}

/*
 * Like Apple's
 * https://opensource.apple.com/source/Libc/Libc-1158.50.2/gen/FreeBSD/arc4random.c
 */
static uint32_t
arc4random_uniform3(uint32_t upper_bound)
{
	int zeroes, bits, remain = 32;
	uint32_t mask, r;

	if (upper_bound < 2)
		return 0;

	zeroes = arc4random_clz((upper_bound - 1) | 1);
	bits = 32 - zeroes;
	mask = (uint32_t)-1 >> zeroes;

	for (;;) {
		r = arc4random();
		if ((r & mask) < upper_bound)
			return r & mask;
		for (remain = zeroes; remain >= bits; remain -= bits) {
			r >>= bits;
			if ((r & mask) < upper_bound)
				return r & mask;
		}
	}
	/* NOTREACHED */
}

#include <sys/time.h>

static uint32_t
fixture(const char *s, uint32_t (* const rnd)(uint32_t))
{
	const uint32_t trials = 50 * 1000 * 1000;
	const uint32_t max = 0x8000;
	const uint32_t mul = 7;
	unsigned int v, i, r;
	struct timeval start, finish, delta;

	gettimeofday(&start, NULL);
	for (v = 3, r = 0; v < max; v *= mul) {
		/* printf("%u\n", v); */
		for (i = 0; i < trials; i++)
			r |= rnd(v);
	}
	gettimeofday(&finish, NULL);
	timersub(&finish, &start, &delta);
	printf("%s; elapsed = %lld.%06lld\n", s,
	    (long long)delta.tv_sec, (long long)delta.tv_usec);
	return r;
}

int
main(void)
{
	fixture("openbsd", arc4random_uniform1);
	fixture("bitmask", arc4random_uniform2);
	fixture("bitmask+reuse",
Re: Picky, but much more efficient arc4random_uniform!
On Tue, May 17, 2022 at 1:10 PM Steffen Nurpmeso wrote:

> Joerg Sonnenberger wrote in
> :
> |Am Fri, May 13, 2022 at 09:43:26AM -0500 schrieb Luke Small:
> |> I made a couple new versions of a new kind of arc4random_uniform-like
> ...
> |If your main use case is limiting the amount of cryptography when using
> |small bounds, there is a much simpler approach to be taken here. For
> |boundaries below 256, use arc4random_buf to extract one byte if bound is
> |a power of two, otherwise two. This gives most of the performance
> |benefit without complicating the algorithm. Extracting two bytes ensures
> |that the probability of success is > 99% and the double extracting
> |doesn't eat up the benefits.
>
> You can use (really implemented) _buf() if you need a 8-bit or
> 16-bit etc number.
>
> I find that _uniform() often makes no difference to a simple
> modulo because like the comment in _uniform() says "p > 0.5 (worst
> case, usually far better", and usually RNGs sprinkle bits nicely,

What does that statement mean? You seem to be saying "modulo is uniform,
except when it isn't, which could be almost half the time for some cases,
but when it's uniform it's uniform, so why bother making it actually
correct and dependable". I mean, what does that _mean_??? It's as if I said
"my text handling program handles all characters uniformly, except those
with accents, but that's less than 10% of the characters I type, so it
handles all characters uniformly." WTF, NO!

> 0 bytes "do not occur", so a 32-bit RNG value "is" >=0x01FF in
> most cases for "my RNG" (of 10 803/759/793 NOT; 776/805/793
> NOT for Linux getrandom(2)), which is a pretty high cut off.
> Using _uniform() just because of its name seems strange thus.

Where do these ideas come from, that "0 bytes 'do not occur'"?? If your
rand generator doesn't provide zero bytes at the expected frequency, you
know, 1 in 256, then you're using a garbage random number generator.
Please stop making such suggestions here because THEY ARE NOT TRUE ABOUT
OPENBSD. Do ya'll not bother to test the claims that you make?

: bleys; cat f.c
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint32_t u;
	long count;

	count = 0;
	while ((u = arc4random()) > 0x1ff)
		count++;
	printf("%08x\t%ld\n", u, count);

	count = 0;
	for (;;) {
		u = arc4random();
		if ((u & 0xff000000) == 0 ||
		    (u & 0x00ff0000) == 0 ||
		    (u & 0x0000ff00) == 0 ||
		    (u & 0x000000ff) == 0)
			break;
		count++;
	}
	printf("%08x\t%ld\n", u, count);
	return 0;
}
: bleys; cc f.c
: bleys; ./a.out
00b82e5c	58
ab478800	36
: bleys;

Philip Guenther
Re: Picky, but much more efficient arc4random_uniform!
Steffen Nurpmeso wrote in <20220517220924.xohqc%stef...@sdaoden.eu>:
|Joerg Sonnenberger wrote in
| :
||Am Fri, May 13, 2022 at 09:43:26AM -0500 schrieb Luke Small:
||> I made a couple new versions of a new kind of arc4random_uniform-like
...
|0 bytes "do not occur", so a 32-bit RNG value "is" >=0x01FF in
|most cases for "my RNG" (of 10 803/759/793 NOT; 776/805/793
|NOT for Linux getrandom(2)), which is a pretty high cut off.
...

/*@ lotto.cxx - output six random numbers */

#include
#include
#include

//#define NYDPROF_ENABLE
//#define NYD_ENABLE
//#define NYD2_ENABLE
#include

NSPC_USE(su)

static u32
bounded_rand(u32 range, u32 rv){
	for(u32 t = -range % range;;){
		if(rv >= t)
			return rv % range;
		log::write(log::emerg, "NOFIT");
	}
}

int
main(void){
	u32 uni[50], mod[50], rv;

	state::create_core(NIL, (state::debug | log::debug), state::err_nopass);

	su_mem_set(uni, 0, sizeof uni);
	su_mem_set(mod, 0, sizeof mod);

	u32 au=0;
	for(u32 i = 0; i < 10; ++i){
		random::builtin_generate(&rv, sizeof rv, state::err_nopass);
		//if(getrandom(&rv, sizeof rv, GRND_NONBLOCK) == -1)
		//	log::write(log::emerg, "getrandom(2)");
		if(rv < 0x01FF)
			log::write(log::info, "AU %u - 0x%X", ++au,rv);
		++mod[rv % NELEM(mod)];
		++uni[bounded_rand(NELEM(mod), rv)];
	}

	u32 unilo, modlo, unihi, modhi;
	unilo = modlo = max::u32;
	unihi = modhi = 0;
	for(u32 i = 0; i < NELEM(uni); ++i){
		unilo = get_min(unilo, uni[i]);
		modlo = get_min(modlo, mod[i]);
		unihi = get_max(unihi, uni[i]);
		modhi = get_max(modhi, mod[i]);
		log::write(log::info, "%u - %u / %u %s",
			i, uni[i], mod[i], (uni[i] != mod[i] ? "!!!" : "="));
	}
	log::write(log::info, "MIN %u / %u, MAX %u / %u - au %u",
		unilo, modlo, unihi, modhi,au);

	return 0;
}

#include // s-it-mode

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Re: Picky, but much more efficient arc4random_uniform!
Joerg Sonnenberger wrote in :
|Am Fri, May 13, 2022 at 09:43:26AM -0500 schrieb Luke Small:
|> I made a couple new versions of a new kind of arc4random_uniform-like
...
|If your main use case is limiting the amount of cryptography when using
|small bounds, there is a much simpler approach to be taken here. For
|boundaries below 256, use arc4random_buf to extract one byte if bound is
|a power of two, otherwise two. This gives most of the performance
|benefit without complicating the algorithm. Extracting two bytes ensures
|that the probability of success is > 99% and the double extracting
|doesn't eat up the benefits.

You can use (really implemented) _buf() if you need a 8-bit or
16-bit etc number.

I find that _uniform() often makes no difference to a simple
modulo because like the comment in _uniform() says "p > 0.5 (worst
case, usually far better", and usually RNGs sprinkle bits nicely,
0 bytes "do not occur", so a 32-bit RNG value "is" >=0x01FF in
most cases for "my RNG" (of 10 803/759/793 NOT; 776/805/793
NOT for Linux getrandom(2)), which is a pretty high cut off.
Using _uniform() just because of its name thus seems strange.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Re: Picky, but much more efficient arc4random_uniform!
Am Fri, May 13, 2022 at 09:43:26AM -0500 schrieb Luke Small:

> I made a couple new versions of a new kind of arc4random_uniform-like
> function and some example functions which use them. Instead of having a
> sufficiently large random number greater than the modulus, I pick a random
> number using arc4random() from a bitfield where the length of the bitfield
> is just below or slightly beyond the value of the modulus and returns the
> bitfield if it is less than the value of the modulus.

If your main use case is limiting the amount of cryptography when using
small bounds, there is a much simpler approach to be taken here. For
boundaries below 256, use arc4random_buf to extract one byte if bound is
a power of two, otherwise two. This gives most of the performance
benefit without complicating the algorithm. Extracting two bytes ensures
that the probability of success is > 99% and the double extracting
doesn't eat up the benefits.

Joerg
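A minimal sketch of the scheme Joerg describes, for bounds below 256. The names next8, next16, and small_uniform are invented here, and rand() merely stands in for arc4random_buf() so the snippet is self-contained; the stand-in is neither secure nor exactly uniform, and a real version would fill the bytes with arc4random_buf():

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in entropy sources: a real implementation would fill one or two
 * bytes with arc4random_buf().  rand() keeps the sketch self-contained. */
static uint8_t
next8(void)
{
	return (uint8_t)(rand() & 0xff);
}

static uint16_t
next16(void)
{
	return (uint16_t)(rand() & 0xffff);
}

/* Value uniform in [0, bound) for 2 <= bound < 256. */
static uint8_t
small_uniform(uint16_t bound)
{
	uint16_t r, limit;

	if (bound < 2)
		return 0;
	if ((bound & (bound - 1)) == 0)	/* power of two: mask one byte */
		return (uint8_t)(next8() & (bound - 1));

	/*
	 * Otherwise treat two bytes as one 16-bit value and reject the
	 * partial interval at the top.  At most 65536 % bound < 256
	 * values are rejected, so each two-byte draw succeeds with
	 * probability > 1 - 255/65536, i.e. > 99%.
	 */
	limit = (uint16_t)(65536UL - 65536UL % bound);
	for (;;) {
		r = next16();
		if (r < limit)
			return (uint8_t)(r % bound);
	}
}
```

With arc4random_buf() substituted for the stand-in source, the power-of-two path consumes a single byte of entropy and the general path almost always consumes exactly two, matching the trade-off described above.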
Re: Picky, but much more efficient arc4random_uniform!
On Mon, 16 May 2022, Luke Small wrote: > Yeah, I see your point. > > I suppose it depends on how conservative you want to be and whether > you want to supply options to people like getchar_unlocked when it > isn’t essential. > > It could be made manually fork-safe if I could make a simple feature > where “arc4random_uniform_unlocked(0);” with a 0 upperbound could > trigger a reset of the static variables rand_bits and rand_holder > which would be simple enough and could be added to a man page. I > certainly read the man pages. (I’m surprised “man getchar” doesn’t > “see also” to getchar_unlocked() by the way.) > > If you look at profiling with programs that call it a lot, > arc4random() inside arc4random_uniform() calls the expensive rekey > function which makes it take more time. That’s why I can get around > 2X-3X performance on a typical repetitive small upperbound loop and > an extra 20% improvement on a single 65536 Knuth shuffle, loop even > though my function repeats a binary search for ‘bit’ size every single > time and has misses which demands calling up more data. > > Otherwise my function would be particularly useful when there’s a loop > with small upperbound like: 26+26+10 which, if I recall correctly, is > in identd, which would call it frequently. Just to bring this back to where we came in: even putting thread-safety aside (which you absolutely can't): it doesn't matter how much faster it is, your replacement function isn't useful until you do the work to demonstrate correctness. You have done literally zero so far, not even the bare minimum of testing the output. As a result your first version was demonstrated to be completely broken by literally the most basic of possible tests, a mere 10 lines of code. That you left this to others to do tells me that you fundamentally don't understand the environment in which you're trying to operate, and that you don't respect the time of the people you're trying to convince. Please stop wasting our time. -d
Re: Picky, but much more efficient arc4random_uniform!
Yeah, I see your point. I suppose it depends on how conservative you want to be and whether you want to supply options to people like getchar_unlocked when it isn’t essential. It could be made manually fork-safe if I could make a simple feature where “arc4random_uniform_unlocked(0);” with a 0 upperbound could trigger a reset of the static variables rand_bits and rand_holder which would be simple enough and could be added to a man page. I certainly read the man pages. (I’m surprised “man getchar” doesn’t “see also” to getchar_unlocked() by the way.) If you look at profiling with programs that call it a lot, arc4random() inside arc4random_uniform() calls the expensive rekey function which makes it take more time. That’s why I can get around 2X-3X performance on a typical repetitive small upperbound loop and an extra 20% improvement on a single 65536 Knuth shuffle, loop even though my function repeats a binary search for ‘bit’ size every single time and has misses which demands calling up more data. Otherwise my function would be particularly useful when there’s a loop with small upperbound like: 26+26+10 which, if I recall correctly, is in identd, which would call it frequently. I have a project that I generate MANY random record names much like that and I use fork()/fork()-exec() on many processes well before calling randomizing functions. Maybe I’m not the only one. I don’t want to even try multithreading, especially as often as you guys say that it is too unpredictable. I believe you. Using lua scripts in multi-worker-process redis calls to avoid race conditions is awkward enough for me. But otherwise it’s certainly your prerogative to not have it. At least you can’t legitimately say that it adds too much extra code or is too complex. It’s pretty simple to understand if you’re familiar with bitwise stuff. 
On Mon, May 16, 2022 at 5:35 PM Stuart Henderson wrote: > On 2022/05/16 15:13, Luke Small wrote: > > If you’re not running a threaded program, my function wouldn’t be “less > > safe.” > > > > I’d imagine that 99% of programs aren’t multithreaded. > > code is reused in different places. non threaded programs are sometimes > turned into threaded programs and when that happens, sometimes > non-thread-safe calls are missed. so i'd argue that it is still less > safe. > > in some cases there might be benefits that would mean it's worth it, > especially if the failure modes would be obvious such that they can > be detected. really not seeing that here. (how often are you even > calling arc4random_uniform to consider it slow?!) > > if the consequence is not a crash but instead subtly broken randomness, > how long do you think it's going to take to notice and report/fix it? > even *very* broken randomness in widespread software distributions > has been known to sit around for a long time before it's noticed: > > - predictable rng in a popular os. *serious* bug. introduced 2006, > discovered/reported nearly 2 years later. > > - non-fork-safe rng in a popular ssl library, introduced sometime before > sept 2018, reported may 2019. > > -- -Luke
Re: Picky, but much more efficient arc4random_uniform!
On 2022/05/16 15:13, Luke Small wrote: > If you’re not running a threaded program, my function wouldn’t be “less > safe.” > > I’d imagine that 99% of programs aren’t multithreaded. code is reused in different places. non threaded programs are sometimes turned into threaded programs and when that happens, sometimes non-thread-safe calls are missed. so i'd argue that it is still less safe. in some cases there might be benefits that would mean it's worth it, especially if the failure modes would be obvious such that they can be detected. really not seeing that here. (how often are you even calling arc4random_uniform to consider it slow?!) if the consequence is not a crash but instead subtly broken randomness, how long do you think it's going to take to notice and report/fix it? even *very* broken randomness in widespread software distributions has been known to sit around for a long time before it's noticed: - predictable rng in a popular os. *serious* bug. introduced 2006, discovered/reported nearly 2 years later. - non-fork-safe rng in a popular ssl library, introduced sometime before sept 2018, reported may 2019.
Re: Picky, but much more efficient arc4random_uniform!
No...am I incorrect, especially on OpenBSD? Of course since you made such a remark, you seem like the kind of fellow that would put the nail in the coffin for spite. ...now I sound like an asshole. On Mon, May 16, 2022 at 4:00 PM Theo de Raadt wrote: > hey luke you know you sound like an asshole right? > > > Luke Small wrote: > > > If you’re not running a threaded program, my function wouldn’t be “less > > safe.” > > > > I’d imagine that 99% of programs aren’t multithreaded. > > > > On Mon, May 16, 2022 at 1:01 PM wrote: > > > > > > There is the specifically non-threadsafe call getchar_unlocked() on > > > OpenBSD > > > > which is presumably available for performance reasons alone, when > > > getchar() > > > > is a perfectly viable option and is even an ISO conforming function. > > > What I > > > > submitted could be such a higher performance non-threadsafe function. > > > > > > > > so, how about arc4random_uniform_unlocked() ?! > > > > > > getchar_unlocked is mandated by POSIX. > > > > > > OpenBSD has not yet invented an alternate function that only exists to > > > give away safety for performance. It has only gone in the opposite > > > direction if anything. > > > > > > -- > > -Luke >
Re: Picky, but much more efficient arc4random_uniform!
hey luke you know you sound like an asshole right? Luke Small wrote: > If you’re not running a threaded program, my function wouldn’t be “less > safe.” > > I’d imagine that 99% of programs aren’t multithreaded. > > On Mon, May 16, 2022 at 1:01 PM wrote: > > > > There is the specifically non-threadsafe call getchar_unlocked() on > > OpenBSD > > > which is presumably available for performance reasons alone, when > > getchar() > > > is a perfectly viable option and is even an ISO conforming function. > > What I > > > submitted could be such a higher performance non-threadsafe function. > > > > > > so, how about arc4random_uniform_unlocked() ?! > > > > getchar_unlocked is mandated by POSIX. > > > > OpenBSD has not yet invented an alternate function that only exists to > > give away safety for performance. It has only gone in the opposite > > direction if anything. > > > > -- > -Luke
Re: Picky, but much more efficient arc4random_uniform!
If you’re not running a threaded program, my function wouldn’t be “less safe.” I’d imagine that 99% of programs aren’t multithreaded. On Mon, May 16, 2022 at 1:01 PM wrote: > > There is the specifically non-threadsafe call getchar_unlocked() on > OpenBSD > > which is presumably available for performance reasons alone, when > getchar() > > is a perfectly viable option and is even an ISO conforming function. > What I > > submitted could be such a higher performance non-threadsafe function. > > > > so, how about arc4random_uniform_unlocked() ?! > > getchar_unlocked is mandated by POSIX. > > OpenBSD has not yet invented an alternate function that only exists to > give away safety for performance. It has only gone in the opposite > direction if anything. > > -- -Luke
Re: Picky, but much more efficient arc4random_uniform!
Philip Guenther wrote in :
|On Sun, 15 May 2022, Steffen Nurpmeso wrote:
|> Stuart Henderson wrote in
|...
|>|what's the perceived problem you're wanting to solve? and does that
|>|problem actually exist in the first place?
|>
|> The problem is that if you have a low upper bound then modulo will
|> "remove a lot of randomization". For example if you have a program
|> which generates Lotto numbers (1..49), then using _uniform() as it is
|> will generate many duplicates.
|
|Wut. The *WHOLE POINT* of arc4random_uniform() is that it has uniform
|distribution. Says right so in the manpage. If an implementation of that
|API fails to do that, it's a broken implementation.

I always admired its source code comments and have been left stunned
more than just once.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Re: Picky, but much more efficient arc4random_uniform!
Luke Small on Sun, May 15 2022:

> The current implementation is nothing more than a naive arc4random() %
> upper_bound which trashes initial arc4random() calls it doesn’t like,

Yes, that's exactly what it is.

> The whole transformation by modulus of perfectly decent random data
> seems so awkward. [...] it seems like an ugly HACK

Let me point out that this way of doing it is absolutely standard. It is
not awkward or a hack.

> to simply meet demand

This is probably a good way of describing the design goal for this
highly sensitive piece of libc internals.

-p

PS: I am not an OpenBSD developer, a tiny contributor at most. I speak as
a bystander who happens to have some domain knowledge.
Re: Picky, but much more efficient arc4random_uniform!
Yeah. It most likely won't go in. From past experience and advice, not necessarily just from a perceived lack of merit. However, many, if not all of the arguments are based upon non-facts and misconceptions from earlier submissions or just not understanding what the software is doing. The only real thing that makes it not quite as good as the standard is mine isn't threadsafe. If it could be accepted as a higher performance, non-threadsafe call, it would perform better in many typical cases and perhaps even give a safer return value, especially in large upper_bound edge cases, I suspect. There is the specifically non-threadsafe call getchar_unlocked() on OpenBSD which is presumably available for performance reasons alone, when getchar() is a perfectly viable option and is even an ISO conforming function. What I submitted could be such a higher performance non-threadsafe function. so, how about arc4random_uniform_unlocked() ?! ...other than making upper_bound a uint32_t instead of the submitted uint64_t. That'd be somewhat of a problem. -Luke > You've had several developers tell you this is not going to go in. I'd > suggest > "read the room". > > If you want this for your own use, just keep it as a local diff. Nobody > will > know (or likely care). > > -ml >
Re: Picky, but much more efficient arc4random_uniform!
I’m not trying to be rude, but you don’t realize what’s going on here: uuu is a bitmask. ‘uuu’ (that is, (1 << bits) - 1) in “ret = rand_holder & uuu;” only puts the lower ‘bits’ bits of rand_holder into ret; the loop then right-shifts rand_holder to trash them each time through once it’s done with them. So if bits is 8, uuu is going to be 0xff. > No, you aren't: > > > for (;;) { > > if (rand_bits < bits) { > > rand_holder |= ((uint64_t)arc4random()) << > > rand_bits; > > > > /* > > * rand_bits will be a number between 0 and 31 > here > > * so the 0x20 bit will be empty > > * rand_bits += 32; > > */ > > rand_bits |= 32; > > } > > > > ret = rand_holder & uuu; > > rand_holder >>= bits; > > rand_bits -= bits; > > > > if (ret < upper_bound) > > return ret; > > } > > This isn't rejection sampling. This is reusing part of the rejected > sample. -- -Luke
Re: Picky, but much more efficient arc4random_uniform!
On Sun, May 15, 2022 at 08:40:19PM -0500, Luke Small wrote: > https://marc.info/?l=openbsd-tech&m=165259528425835&w=2 > > This one (which is strongly based upon my first of two versions) which I > submitted after Guenther correctly trashed version 2, doesn’t reuse any > part of the sample. It picks up a clean new bitfield upon failure. > > I think Guenther didn’t, perhaps like yourself, realize I submitted this > later program. That’s why he said it wasn’t correct. It didn’t occur to me > at the time of responding to him: “correct correct correct.” > You've had several developers tell you this is not going to go in. I'd suggest "read the room". If you want this for your own use, just keep it as a local diff. Nobody will know (or likely care). -ml > On Sun, May 15, 2022 at 7:47 PM Damien Miller wrote: > > > On Sat, 14 May 2022, Luke Small wrote: > > > > > Look at my code. I don’t even use a modulus operator. I perform hit and > > > miss with a random bitstream. > > > > > > How can I have a bias of something I don’t do? I return a bitstream which > > > meets the parameters of being a value less than the upper bound. Much > > like > > > arc4random_buf(). > > > > > > If I use arc4random_uniform() repeatedly to create a random distribution > > of > > > say numbers less than 0x1000 or even something weird like 0x1300 will the > > > random distribution be better with arc4random_uniform() or with mine? For > > > 0x1000 mine will simply pluck 12 bits of random data straight from the > > > arc4random() (and preserve the remaining 20 bits for later) on the first > > > try, just like it’s arc4random_buf(). > > > > > > arc4random_uniform() will perform a modulus of a 32 bit number which adds > > > data to the bitstream. Does it make it better? Perhaps it makes it harder > > > to guess the source bits. > > > > > > I don’t know; and I’m not going to pretend to be a cryptologist. But I’m > > > looking at modulo bias. 
> > > > > > I didn’t know what it was, before, but I basically “rejection sample”: > > > > > > > > https://research.kudelskisecurity.com/2020/07/28/the-definitive-guide-to-modulo-bias-and-how-to-avoid-it/ > > > > No, you aren't: > > > > > for (;;) { > > > if (rand_bits < bits) { > > > rand_holder |= ((uint64_t)arc4random()) << > > > rand_bits; > > > > > > /* > > > * rand_bits will be a number between 0 and 31 > > here > > > * so the 0x20 bit will be empty > > > * rand_bits += 32; > > > */ > > > rand_bits |= 32; > > > } > > > > > > ret = rand_holder & uuu; > > > rand_holder >>= bits; > > > rand_bits -= bits; > > > > > > if (ret < upper_bound) > > > return ret; > > > } > > > > This isn't rejection sampling. This is reusing part of the rejected > > sample. > > > > Think of it like this: you want to uniformly generate a number in the > > range [2:10] by rolling 2x 6-sided dice. What do you do when you roll > > 11 or 12? You can't just reroll one of the dice because the other dice > > is constrained to be have rolled either 5 or 6, and so proceeding with > > it would force the output to be in the range [6:11] for these ~5.6% > > of initial rolls. Your output is no longer uniform. > > > > BTW the existing code already implements the prefered approach of the > > article you quoted. > > > > -d > > -- > -Luke
Re: Picky, but much more efficient arc4random_uniform!
https://marc.info/?l=openbsd-tech&m=165259528425835&w=2 This one (which is strongly based upon my first of two versions) which I submitted after Guenther correctly trashed version 2, doesn’t reuse any part of the sample. It picks up a clean new bitfield upon failure. I think Guenther didn’t, perhaps like yourself, realize I submitted this later program. That’s why he said it wasn’t correct. It didn’t occur to me at the time of responding to him: “correct correct correct.” On Sun, May 15, 2022 at 7:47 PM Damien Miller wrote: > On Sat, 14 May 2022, Luke Small wrote: > > > Look at my code. I don’t even use a modulus operator. I perform hit and > > miss with a random bitstream. > > > > How can I have a bias of something I don’t do? I return a bitstream which > > meets the parameters of being a value less than the upper bound. Much > like > > arc4random_buf(). > > > > If I use arc4random_uniform() repeatedly to create a random distribution > of > > say numbers less than 0x1000 or even something weird like 0x1300 will the > > random distribution be better with arc4random_uniform() or with mine? For > > 0x1000 mine will simply pluck 12 bits of random data straight from the > > arc4random() (and preserve the remaining 20 bits for later) on the first > > try, just like it’s arc4random_buf(). > > > > arc4random_uniform() will perform a modulus of a 32 bit number which adds > > data to the bitstream. Does it make it better? Perhaps it makes it harder > > to guess the source bits. > > > > I don’t know; and I’m not going to pretend to be a cryptologist. But I’m > > looking at modulo bias. 
> > > > I didn’t know what it was, before, but I basically “rejection sample”: > > > > > https://research.kudelskisecurity.com/2020/07/28/the-definitive-guide-to-modulo-bias-and-how-to-avoid-it/ > > No, you aren't: > > > for (;;) { > > if (rand_bits < bits) { > > rand_holder |= ((uint64_t)arc4random()) << > > rand_bits; > > > > /* > > * rand_bits will be a number between 0 and 31 > here > > * so the 0x20 bit will be empty > > * rand_bits += 32; > > */ > > rand_bits |= 32; > > } > > > > ret = rand_holder & uuu; > > rand_holder >>= bits; > > rand_bits -= bits; > > > > if (ret < upper_bound) > > return ret; > > } > > This isn't rejection sampling. This is reusing part of the rejected > sample. > > Think of it like this: you want to uniformly generate a number in the > range [2:10] by rolling 2x 6-sided dice. What do you do when you roll > 11 or 12? You can't just reroll one of the dice because the other dice > is constrained to be have rolled either 5 or 6, and so proceeding with > it would force the output to be in the range [6:11] for these ~5.6% > of initial rolls. Your output is no longer uniform. > > BTW the existing code already implements the prefered approach of the > article you quoted. > > -d -- -Luke
Re: Picky, but much more efficient arc4random_uniform!
On Sat, 14 May 2022, Luke Small wrote: > Look at my code. I don’t even use a modulus operator. I perform hit and > miss with a random bitstream. > > How can I have a bias of something I don’t do? I return a bitstream which > meets the parameters of being a value less than the upper bound. Much like > arc4random_buf(). > > If I use arc4random_uniform() repeatedly to create a random distribution of > say numbers less than 0x1000 or even something weird like 0x1300 will the > random distribution be better with arc4random_uniform() or with mine? For > 0x1000 mine will simply pluck 12 bits of random data straight from the > arc4random() (and preserve the remaining 20 bits for later) on the first > try, just like it’s arc4random_buf(). > > arc4random_uniform() will perform a modulus of a 32 bit number which adds > data to the bitstream. Does it make it better? Perhaps it makes it harder > to guess the source bits. > > I don’t know; and I’m not going to pretend to be a cryptologist. But I’m > looking at modulo bias. > > I didn’t know what it was, before, but I basically “rejection sample”: > > https://research.kudelskisecurity.com/2020/07/28/the-definitive-guide-to-modulo-bias-and-how-to-avoid-it/ No, you aren't: > for (;;) { > if (rand_bits < bits) { > rand_holder |= ((uint64_t)arc4random()) << > rand_bits; > > /* > * rand_bits will be a number between 0 and 31 here > * so the 0x20 bit will be empty > * rand_bits += 32; > */ > rand_bits |= 32; > } > > ret = rand_holder & uuu; > rand_holder >>= bits; > rand_bits -= bits; > > if (ret < upper_bound) > return ret; > } This isn't rejection sampling. This is reusing part of the rejected sample. Think of it like this: you want to uniformly generate a number in the range [2:10] by rolling 2x 6-sided dice. What do you do when you roll 11 or 12? 
You can't just reroll one of the dice because the other die is constrained to have rolled either 5 or 6, and so proceeding with it would force the output to be in the range [6:11] for these ~5.6% of initial rolls. Your output is no longer uniform. BTW the existing code already implements the preferred approach of the article you quoted. -d
Re: Picky, but much more efficient arc4random_uniform!
On Sun, 15 May 2022, Luke Small wrote: > Do I really have to use specific terminology to make a point? > > I'm not educated enough on chacha20 to know whether, like I > pointed out, choosing 5 bits from the middle of (or even from > the tail end of one and the beginning of another) 32-bit pseudorandom > cipher output is "correct." You don't need to understand chacha20 to understand arc4random_uniform(). I certainly didn't when I wrote it. The underlying CSPRNG is irrelevant to how arc4random_uniform() works. It is treated as an oracle that provides 32 random bits upon request. You could swap it out for 32 coin-tossing monkeys and the implementation wouldn't need to change. It requests another 32-bit random value for each attempt at satisfying the bounds check because they need to be independent - reusing parts of a previous attempt is highly likely to introduce biases. It's almost certainly possible to make this function faster, but it's also very easy to get it wrong (e.g. I made one stupid math error in its early implementation, forever immortalised by CVS). The existing code has the advantage of being very obvious in how it works and therefore has a very low risk of being wrong. If someone is proposing to move to something less obvious then it's incumbent upon them to do the work to prove that their alternative is just as correct. > ...correct correct correct. Did I use that buzzword enough? Highly experienced people are taking the time to give you detailed, critical feedback. This can be hard to receive, but if you ever want to improve then you should consider it and try to engage constructively. -d
Re: Picky, but much more efficient arc4random_uniform!
Do I really have to use specific terminology to make a point? I'm not educated enough on chacha20 to know whether, like I pointed out, choosing 5 bits from the middle of (or even from the tail end of one and the beginning of another) 32-bit pseudorandom cipher output is "correct." ...correct correct correct. Did I use that buzzword enough? -Luke On Sun, May 15, 2022 at 5:26 PM Philip Guenther wrote: > On Sun, 15 May 2022, Luke Small wrote: > > The current implementation is nothing more than a naive arc4random() % > > upper_bound which trashes initial arc4random() calls it doesn’t like, > then > > transforms over a desired modulus. The whole transformation by modulus of > > perfectly decent random data seems so awkward. It’s not like it is used > as > > some majestic artistry of RSA it seems like an ugly HACK to simply meet a > > demand lacking of something better. > > You fail to mention correctness at all or address the fact that your > version isn't while the current one is. Meanwhile, you talk about getting > only just enough random data as if there's some sort of limited supply > when there isn't. > > "My version may be wrong, but at least it doesn't look naive!" > > That is utterly the wrong attitude for OpenBSD code. > > > Best wishes. > > Philip Guenther >
Re: Picky, but much more efficient arc4random_uniform!
On Sun, 15 May 2022, Luke Small wrote: > The current implementation is nothing more than a naive arc4random() % > upper_bound which trashes initial arc4random() calls it doesn’t like, then > transforms over a desired modulus. The whole transformation by modulus of > perfectly decent random data seems so awkward. It’s not like it is used as > some majestic artistry of RSA it seems like an ugly HACK to simply meet a > demand lacking of something better. You fail to mention correctness at all or address the fact that your version isn't while the current one is. Meanwhile, you talk about getting only just enough random data as if there's some sort of limited supply when there isn't. "My version may be wrong, but at least it doesn't look naive!" That is utterly the wrong attitude for OpenBSD code. Best wishes. Philip Guenther
Re: Picky, but much more efficient arc4random_uniform!
The current implementation is nothing more than a naive arc4random() % upper_bound which trashes initial arc4random() calls it doesn’t like, then transforms over a desired modulus. The whole transformation by modulus of perfectly decent random data seems so awkward. It’s not like it is used as some majestic artistry of RSA; it seems like an ugly HACK to simply meet a demand, lacking something better. If you understand what I’ve done, it streams a bitfield into an integer type, like a buffer, of just enough or slightly more data to meet the demands of the upper bound, and if it exceeds upperbound-1, it is trashed and a completely new bitfield is read in to check. It relies on arc4random() supplying good random data regardless of how many bits are in the bitfield. If it does so, it should and seems to supply a good distribution across the length of the bitfield, which may often be something like 5 for a common 26+26+10 upper_bound in /usr/src. It seems to me that it should be a pretty good, if not superior, method; at least in the realm of cleaner results. Perhaps it’s confusing what I’ve done with all the bitwise operators, but it isn’t some random hacky thing I’ve cobbled together. Or does arc4random() only provide decent random data 32 bits at a time; or an even 8 bits at a time as arc4random_buf() would suggest? All I would have to prove is that chacha20 provides good or superior random bitfields regardless of how many bits are needed and regardless of whether they begin at the beginning of a byte. I don’t have the education for that, but “I got a ‘puter for Christmas.” lol. I can perhaps run simulations if I have nothing better to do. > I think I can say we know here uniformity is only *one* of the > desirable properties of a secure random generator. > > But I don't think anybody else except Luke was talking about > "improving". 
The sole purpose of arc4random_uniform() is to give a > good implementation of a random number function in a specific range > using arc4random() as the source. This is needed because the naive > implementation arc4random() % upper_bound is not uniform. > > -Otto > > > -- -Luke
Re: Picky, but much more efficient arc4random_uniform!
On Sun, May 15, 2022 at 08:57:09AM -0300, Crystal Kolipe wrote: > On Sun, May 15, 2022 at 11:44:29AM +0200, Otto Moerbeek wrote: > > On Sun, May 15, 2022 at 04:27:30AM -0500, Luke Small wrote: > > > How did someone prove the current implementation was cryptographically > > > sound? Did they run massive simulations which ran the gamut of the > > > uint32_t > > > range which demanded tight tolerances over varying length runs? > > > > > > How was rc4 cipher proven bad for pseudorandom numbers? I'd be willing > > > to > > > bet it wasn't from a mathematical proof; it was from bad data. > > > You miss the point completely. The point is: how do you derive a > > uniformly distributed random function for a smaller range, given a > > uniformly distributed random function over the range [0-2^32-1]. > > > > The current implementation does exactly that and has all the > > information in the comments to verify the uniformity claim. You only > > need to use basic mathematical properties of modulo arithmetic to > > do the verification. > > You do all realise that uniform distribution alone does not make a > good random number generator, don't you? > > For example: > > 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5... > > That's uniformly distributed, and also useless as a random number > stream. > > Furthermore, _cryptographically secure_ random number generation is > not the same as _mathematically good_ random number generation. > > There are plenty of random number generation formulas which are > considered good and useful from a mathematical basis, but which are > useless for cryptography. > > So random, (pun intended), hacks at the arc4random code are not > likely to 'improve' it from the general standpoint, (although if you > have a specific need for a specific private application, that's > different). I think Stuart has already more or less made that point. 
I think I can say we all know here that uniformity is only *one* of the desirable properties of a secure random generator. But I don't think anybody except Luke was talking about "improving". The sole purpose of arc4random_uniform() is to give a good implementation of a random number function in a specific range using arc4random() as the source. This is needed because the naive implementation arc4random() % upper_bound is not uniform. -Otto
Re: Picky, but much more efficient arc4random_uniform!
On Sun, May 15, 2022 at 11:44:29AM +0200, Otto Moerbeek wrote: > On Sun, May 15, 2022 at 04:27:30AM -0500, Luke Small wrote: > > How did someone prove the current implementation was cryptographically > > sound? Did they run massive simulations which ran the gamut of the uint32_t > > range which demanded tight tolerances over varying length runs? > > > > How was rc4 cipher proven bad for pseudorandom numbers? I'd be willing to > > bet it wasn't from a mathematical proof; it was from bad data. > You miss the point completely. The point is: how do you derive a > uniformly distributed random function for a smaller range, given a > uniformly distributed random function over the range [0-2^32-1]. > > The current implementation does exactly that and has all the > information in the comments to verify the uniformity claim. You only > need to use basic mathematical properties of modulo arithmetic to > do the verification. You do all realise that uniform distribution alone does not make a good random number generator, don't you? For example: 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5... That's uniformly distributed, and also useless as a random number stream. Furthermore, _cryptographically secure_ random number generation is not the same as _mathematically good_ random number generation. There are plenty of random number generation formulas which are considered good and useful from a mathematical basis, but which are useless for cryptography. So random, (pun intended), hacks at the arc4random code are not likely to 'improve' it from the general standpoint, (although if you have a specific need for a specific private application, that's different). I think Stuart has already more or less made that point.
Re: Picky, but much more efficient arc4random_uniform!
Random generators have been analyzed for years. Pick up "The Art of Computer Programming, Vol. 2: Seminumerical Algorithms" by Donald E. Knuth, read it, then come back with actual science.
Re: Picky, but much more efficient arc4random_uniform!
On Sun, May 15, 2022 at 04:27:30AM -0500, Luke Small wrote: > I have a feeling that making it threadsafe under any quantity of threads > and still perform is likely not feasible, but if I could; or if making a > nonthreadsafe variant was possible: > > Creating a mathematical proof that somehow is “simple and probably correct” > enough, would be an impossible task even for a PhD mathematician. > > How did someone prove the current implementation was cryptographically > sound? Did they run massive simulations which ran the gamut of the uint32_t > range which demanded tight tolerances over varying length runs? > > How was rc4 cipher proven bad for pseudorandom numbers? I’d be willing to > bet it wasn’t from a mathematical proof; it was from bad data. > > I’m guessing that large upperbounds approaching 2**32 don’t feed very > soundly in the current implementation using a modulus; although I suspect > that there isn’t much of a call for such things, I’m pretty sure I saw a > 3,000,000,000 upperbound in the src partition tonight. You miss the point completely. The point is: how do you derive a uniformly distributed random function for a smaller range, given a uniformly distributed random function over the range [0-2^32-1]. The current implementation does exactly that and has all the information in the comments to verify the uniformity claim. You only need to use basic mathematical properties of modulo arithmetic to do the verification. -Otto > > On Sun, May 15, 2022 at 3:15 AM Otto Moerbeek wrote: > > > On Sun, May 15, 2022 at 01:12:28AM -0500, Luke Small wrote: > > > > > This is version 1, which I was pretty sure was secure. > > > > > > I revamped it with a few features and implanted the binary search for > > 'bit' > > > > in most cases, which aren't intentionally worst-case, it's pretty darned > > > fast! 
> > > > > > This is a sample run of my program with your piece of code included: > > > > > > > > > 1 99319 100239 > > > 2 100353 99584 > > > 3 100032 99879 > > > 4 99704 100229 > > > 5 99530 99796 > > > 6 99617 100355 > > > 7 100490 100319 > > > 8 99793 100114 > > > 9 100426 99744 > > > 10 100315 100558 > > > 11 99279 99766 > > > 12 99572 99750 > > > 13 99955 100017 > > > 14 100413 15 > > > 15 100190 100052 > > > 16 101071 100195 > > > 17 100322 100224 > > > 18 99637 99540 > > > 19 100323 99251 > > > 20 99841 100177 > > > 21 99948 99504 > > > 22 100065 100031 > > > 23 100026 99827 > > > 24 99836 99818 > > > 25 100245 99822 > > > 26 100088 99678 > > > 27 99957 3 > > > 28 100065 99961 > > > 29 100701 100679 > > > 30 99756 99587 > > > 31 100220 100076 > > > 32 100067 15 > > > 33 99547 99984 > > > 34 100124 100031 > > > 35 99547 100661 > > > 36 99801 99963 > > > 37 100189 100230 > > > 38 99878 99579 > > > 39 99864 99442 > > > 40 99683 14 > > > 41 99907 100094 > > > 42 100291 99817 > > > 43 99908 99984 > > > 44 100044 100606 > > > 45 100065 100120 > > > 46 99358 100141 > > > 47 100152 100442 > > > 48 10 100279 > > > 49 100486 99848 > > > > Sadly a sample run cannot be used to proof your implementation is > > correct. It can only be used to show it is not correct, like Philip > > did. To show your implementation produces uniform results in all > > cases, you need to provide a solid argumentation that is easy to > > follow. So far you failed to do that and I do not see it coming, given > > the complexituy of your implementation. The current implementation > > has a straightforward argumentation as it uses relatively simple > > mathematical properties of modulo arithmetic. > > > > It is also clear your code (as it uses statics) is not thread safe. > > > > So to answer you original question clearly: no, we will not accept this. > > > > -Otto > > > -- > -Luke
Re: Picky, but much more efficient arc4random_uniform!
I have a feeling that making it threadsafe under any quantity of threads and still perform is likely not feasible, but if I could; or if making a nonthreadsafe variant was possible: Creating a mathematical proof that somehow is “simple and probably correct” enough, would be an impossible task even for a PhD mathematician. How did someone prove the current implementation was cryptographically sound? Did they run massive simulations which ran the gamut of the uint32_t range which demanded tight tolerances over varying length runs? How was rc4 cipher proven bad for pseudorandom numbers? I’d be willing to bet it wasn’t from a mathematical proof; it was from bad data. I’m guessing that large upperbounds approaching 2**32 don’t feed very soundly in the current implementation using a modulus; although I suspect that there isn’t much of a call for such things, I’m pretty sure I saw a 3,000,000,000 upperbound in the src partition tonight. On Sun, May 15, 2022 at 3:15 AM Otto Moerbeek wrote: > On Sun, May 15, 2022 at 01:12:28AM -0500, Luke Small wrote: > > > This is version 1, which I was pretty sure was secure. > > > > I revamped it with a few features and implanted the binary search for > 'bit' > > > > in most cases, which aren't intentionally worst-case, it's pretty darned > > fast! 
> > > > This is a sample run of my program with your piece of code included: > > > > > > 1 99319 100239 > > 2 100353 99584 > > 3 100032 99879 > > 4 99704 100229 > > 5 99530 99796 > > 6 99617 100355 > > 7 100490 100319 > > 8 99793 100114 > > 9 100426 99744 > > 10 100315 100558 > > 11 99279 99766 > > 12 99572 99750 > > 13 99955 100017 > > 14 100413 15 > > 15 100190 100052 > > 16 101071 100195 > > 17 100322 100224 > > 18 99637 99540 > > 19 100323 99251 > > 20 99841 100177 > > 21 99948 99504 > > 22 100065 100031 > > 23 100026 99827 > > 24 99836 99818 > > 25 100245 99822 > > 26 100088 99678 > > 27 99957 3 > > 28 100065 99961 > > 29 100701 100679 > > 30 99756 99587 > > 31 100220 100076 > > 32 100067 15 > > 33 99547 99984 > > 34 100124 100031 > > 35 99547 100661 > > 36 99801 99963 > > 37 100189 100230 > > 38 99878 99579 > > 39 99864 99442 > > 40 99683 14 > > 41 99907 100094 > > 42 100291 99817 > > 43 99908 99984 > > 44 100044 100606 > > 45 100065 100120 > > 46 99358 100141 > > 47 100152 100442 > > 48 10 100279 > > 49 100486 99848 > > Sadly a sample run cannot be used to proof your implementation is > correct. It can only be used to show it is not correct, like Philip > did. To show your implementation produces uniform results in all > cases, you need to provide a solid argumentation that is easy to > follow. So far you failed to do that and I do not see it coming, given > the complexituy of your implementation. The current implementation > has a straightforward argumentation as it uses relatively simple > mathematical properties of modulo arithmetic. > > It is also clear your code (as it uses statics) is not thread safe. > > So to answer you original question clearly: no, we will not accept this. > > -Otto > -- -Luke
Re: Picky, but much more efficient arc4random_uniform!
On Sun, May 15, 2022 at 01:12:28AM -0500, Luke Small wrote: > This is version 1, which I was pretty sure was secure. > > I revamped it with a few features and implanted the binary search for 'bit' > > in most cases, which aren't intentionally worst-case, it's pretty darned > fast! > > This is a sample run of my program with your piece of code included: > > > 1 99319 100239 > 2 100353 99584 > 3 100032 99879 > 4 99704 100229 > 5 99530 99796 > 6 99617 100355 > 7 100490 100319 > 8 99793 100114 > 9 100426 99744 > 10 100315 100558 > 11 99279 99766 > 12 99572 99750 > 13 99955 100017 > 14 100413 15 > 15 100190 100052 > 16 101071 100195 > 17 100322 100224 > 18 99637 99540 > 19 100323 99251 > 20 99841 100177 > 21 99948 99504 > 22 100065 100031 > 23 100026 99827 > 24 99836 99818 > 25 100245 99822 > 26 100088 99678 > 27 99957 3 > 28 100065 99961 > 29 100701 100679 > 30 99756 99587 > 31 100220 100076 > 32 100067 15 > 33 99547 99984 > 34 100124 100031 > 35 99547 100661 > 36 99801 99963 > 37 100189 100230 > 38 99878 99579 > 39 99864 99442 > 40 99683 14 > 41 99907 100094 > 42 100291 99817 > 43 99908 99984 > 44 100044 100606 > 45 100065 100120 > 46 99358 100141 > 47 100152 100442 > 48 10 100279 > 49 100486 99848 Sadly a sample run cannot be used to prove your implementation is correct. It can only be used to show it is not correct, like Philip did. To show your implementation produces uniform results in all cases, you need to provide a solid argument that is easy to follow. So far you failed to do that and I do not see it coming, given the complexity of your implementation. The current implementation has a straightforward argument as it uses relatively simple mathematical properties of modulo arithmetic. It is also clear your code (as it uses statics) is not thread safe. So to answer your original question clearly: no, we will not accept this. -Otto
Re: Picky, but much more efficient arc4random_uniform!
This is version 1, which I was pretty sure was secure. I revamped it with a few features and implanted the binary search for 'bit' in most cases, which aren't intentionally worst-case, it's pretty darned fast! This is a sample run of my program with your piece of code included: 1 99319 100239 2 100353 99584 3 100032 99879 4 99704 100229 5 99530 99796 6 99617 100355 7 100490 100319 8 99793 100114 9 100426 99744 10 100315 100558 11 99279 99766 12 99572 99750 13 99955 100017 14 100413 15 15 100190 100052 16 101071 100195 17 100322 100224 18 99637 99540 19 100323 99251 20 99841 100177 21 99948 99504 22 100065 100031 23 100026 99827 24 99836 99818 25 100245 99822 26 100088 99678 27 99957 3 28 100065 99961 29 100701 100679 30 99756 99587 31 100220 100076 32 100067 15 33 99547 99984 34 100124 100031 35 99547 100661 36 99801 99963 37 100189 100230 38 99878 99579 39 99864 99442 40 99683 14 41 99907 100094 42 100291 99817 43 99908 99984 44 100044 100606 45 100065 100120 46 99358 100141 47 100152 100442 48 10 100279 49 100486 99848 newdata time = 0.240530757 seconds newdata_fast time = 0.073868626 seconds The original takes 325.620% of the runtime of newdata_fast newdataTypableFilename time = 0.236748989 seconds newdataTypableFilename_fast time = 0.115842333 seconds The original takes 204.372% of the runtime of newdataTypableFilename_fast rand_ideal time = 0.235832820 seconds rand_ideal_fast time = 0.025300798 seconds The original takes 932.116% of the runtime of rand_ideal_fast rand_short_ideal time = 0.236828684 seconds rand_short_ideal_fast time = 0.124025922 seconds The original takes 190.951% of the runtime of rand_short_ideal_fast rand_short_worst time = 0.237142869 seconds rand_short_worst_fast time = 0.294156486 seconds The original takes 80.618% of the runtime of rand_short_worst_fast rand_worst time = 0.237002775 seconds rand_worst_fast time = 0.377148420 seconds The original takes 62.841% of the runtime of rand_worst_fast shuffle time = 0.003044472 seconds 
shuffle_fast time = 0.002590664 seconds The original takes 117.517% of the runtime of shuffle_fast If it crashed here, you are trying to profile. Turn off pledge at the beginning of main(). On Sat, May 14, 2022 at 7:03 PM Philip Guenther wrote: > On Sun, 15 May 2022, Steffen Nurpmeso wrote: > > Stuart Henderson wrote in > ... > > |what's the perceived problem you're wanting to solve? and does that > > |problem actually exist in the first place? > > > > The problem is that if you have a low upper bound then modulo will "remove a > > lot of randomization". For example if you have a program which > > generates Lotto numbers (1..49), then using _uniform() as it is will > > generate many duplicates. > > Wut. The *WHOLE POINT* of arc4random_uniform() is that it has uniform > distribution. Says right so in the manpage. If an implementation of that > API fails to do that, it's a broken implementation. > > The OpenBSD implementation achieves that. NetBSD's implementation has the > same core logic. Here's my quick lotto pick demo program, doing 490 > picks, so they should all be somewhere near 10: > > #include <stdio.h> > #include <stdlib.h> > #define LIMIT 49 > int > main(void) > { > unsigned counts[LIMIT] = { 0 }; > int i; > for (i = 0; i < 10 * LIMIT; i++) > counts[arc4random_uniform(LIMIT)]++; > for (i = 0; i < LIMIT; i++) > printf("%2d\t%7u\n", i+1, counts[i]); > return 0; > } > > And a sample run: > > : bleys; ./a.out > > 1 100100 > 2 100639 > 3 99965 > 4 99729 > 5 99641 > 6 99650 > 7 100299 > 8 100164 > 9 99791 > 10 100101 > 11 100657 > 12 100042 > 13 99661 > 14 99927 > 15 99426 > 16 99491 > 17 99646 > 18 100133 > 19 100013 > 20 99942 > 21 99873 > 22 99924 > 23 99567 > 24 100152 > 25 100688 > 26 100011 > 27 100481 > 28 99980 > 29 100406 > 30 99726 > 31 99808 > 32 99929 > 33 100050 > 34 99983 > 35 100048 > 36 99771 > 37 99906 > 38 100215 > 39 100261 > 40 100426 > 41 99847 > 42 99533 > 43 100368 > 44 99695 > 45 100041 > 46 100465 > 47 99875 > 48 100034 > 49 99920 > : bleys; > > Looks pretty good to me, with repeated runs showing the values bouncing > around.
Re: Picky, but much more efficient arc4random_uniform!
On Sat, May 14, 2022 at 05:03:17PM -0700, Philip Guenther wrote:
> On Sun, 15 May 2022, Steffen Nurpmeso wrote:
> > Stuart Henderson wrote in
> ...
> > |what's the perceived problem you're wanting to solve? and does that
> > |problem actually exist in the first place?
> >
> > The problem is that if you have a low upper bound then modulo will
> > "remove a lot of randomization". For example if you have a program which
> > generates Lotto numbers (1..49), then using _uniform() as it is will
> > generate many duplicates.
>
> Wut. The *WHOLE POINT* of arc4random_uniform() is that it has uniform
> distribution. Says right so in the manpage. If an implementation of that
> API fails to do that, it's a broken implementation.
>
> The OpenBSD implementation achieves that. NetBSD's implementation has the
> same core logic. Here's my quick lotto pick demo program, doing 4900000
> picks, so they should all be somewhere near 100000:
>
> #include <stdio.h>
> #include <stdlib.h>
> #define LIMIT 49
> int
> main(void)
> {
> 	unsigned counts[LIMIT] = { 0 };
> 	int i;
> 	for (i = 0; i < 100000 * LIMIT; i++)
> 		counts[arc4random_uniform(LIMIT)]++;
> 	for (i = 0; i < LIMIT; i++)
> 		printf("%2d\t%7u\n", i+1, counts[i]);
> 	return 0;
> }
>
> And a sample run:
>
> : bleys; ./a.out
> 1 100100
> 2 100639
> 3 99965
> 4 99729
> 5 99641
> 6 99650
> 7 100299
> 8 100164
> 9 99791
> 10 100101
> 11 100657
> 12 100042
> 13 99661
> 14 99927
> 15 99426
> 16 99491
> 17 99646
> 18 100133
> 19 100013
> 20 99942
> 21 99873
> 22 99924
> 23 99567
> 24 100152
> 25 100688
> 26 100011
> 27 100481
> 28 99980
> 29 100406
> 30 99726
> 31 99808
> 32 99929
> 33 100050
> 34 99983
> 35 100048
> 36 99771
> 37 99906
> 38 100215
> 39 100261
> 40 100426
> 41 99847
> 42 99533
> 43 100368
> 44 99695
> 45 100041
> 46 100465
> 47 99875
> 48 100034
> 49 99920
> : bleys;
>
> Looks pretty good to me, with repeated runs showing the values bouncing
> around.
>
> ...
> > I was a bit interested when i saw Luke's message, but i am no
> > mathematician and, like i said, i in fact never needed the
> > functionality. And the question i would have is how small
> > upper_bound has to be for the modulo problem to really kick in.
>
> If the implementation isn't broken, it's not a problem, period.
>
> > And even if, whether it matters. For example, if you have
> > a bitset of 1024 "in-flight things", and the small upper bound
> > hits a used slot, you could simply search forward until you find
> > a free one. I mean, with 1024 possible values the number of
> > possibilities is anyhow restricted.
>
> Well hey, let's give Luke's a try and see how uniform it is:
>
> #include <stdio.h>
> #include <stdlib.h>
> #define LIMIT 49
> int
> main(void)
> {
> 	unsigned counts[LIMIT] = { 0 };
> 	unsigned counts2[LIMIT] = { 0 };
> 	int i;
> 	for (i = 0; i < 100000 * LIMIT; i++) {
> 		counts[arc4random_uniform(LIMIT)]++;
> 		counts2[arc4random_uniform_fast_simple(LIMIT)]++;
> 	}
> 	for (i = 0; i < LIMIT; i++)
> 		printf("%2d\t%7u\t%7u\n", i+1, counts[i], counts2[i]);
> 	return 0;
> }
>
> : bleys; ./a.out
> 1 100188 76502
> 2 99983 76602
> 3 100521 76522
> 4 100416 76682
> 5 100171 76387
> 6 100163 76759
> 7 100024 76336
> 8 19 76703
> 9 99769 76237
> 10 99892 76532
> 11 100197 76730
> 12 100483 76398
> 13 99769 76310
> 14 100075 76474
> 15 99781 76599
> 16 99846 76439
> 17 99814 76430
> 18 100313 76648
> 19 100259 76813
> 20 99885 77068
> 21 100302 76546
> 228 76698
> 23 99491 76678
> 24 100340 76324
> 25 99763 115263
> 26 99872 153008
> 27 100022 1
Re: Picky, but much more efficient arc4random_uniform!
On Sun, 15 May 2022, Steffen Nurpmeso wrote:
> Stuart Henderson wrote in
> ...
> |what's the perceived problem you're wanting to solve? and does that
> |problem actually exist in the first place?
>
> The problem is that if you have a low upper bound then modulo will
> "remove a lot of randomization". For example if you have a program which
> generates Lotto numbers (1..49), then using _uniform() as it is will
> generate many duplicates.

Wut. The *WHOLE POINT* of arc4random_uniform() is that it has uniform
distribution. Says right so in the manpage. If an implementation of that
API fails to do that, it's a broken implementation.

The OpenBSD implementation achieves that. NetBSD's implementation has the
same core logic. Here's my quick lotto pick demo program, doing 4900000
picks, so they should all be somewhere near 100000:

#include <stdio.h>
#include <stdlib.h>
#define LIMIT 49
int
main(void)
{
	unsigned counts[LIMIT] = { 0 };
	int i;
	for (i = 0; i < 100000 * LIMIT; i++)
		counts[arc4random_uniform(LIMIT)]++;
	for (i = 0; i < LIMIT; i++)
		printf("%2d\t%7u\n", i+1, counts[i]);
	return 0;
}

And a sample run:

: bleys; ./a.out
 1 100100
 2 100639
 3 99965
 4 99729
 5 99641
 6 99650
 7 100299
 8 100164
 9 99791
10 100101
11 100657
12 100042
13 99661
14 99927
15 99426
16 99491
17 99646
18 100133
19 100013
20 99942
21 99873
22 99924
23 99567
24 100152
25 100688
26 100011
27 100481
28 99980
29 100406
30 99726
31 99808
32 99929
33 100050
34 99983
35 100048
36 99771
37 99906
38 100215
39 100261
40 100426
41 99847
42 99533
43 100368
44 99695
45 100041
46 100465
47 99875
48 100034
49 99920
: bleys;

Looks pretty good to me, with repeated runs showing the values bouncing
around.

...
> I was a bit interested when i saw Luke's message, but i am no
> mathematician and, like i said, i in fact never needed the
> functionality. And the question i would have is how small
> upper_bound has to be for the modulo problem to really kick in.

If the implementation isn't broken, it's not a problem, period.

> And even if, whether it matters. For example, if you have
> a bitset of 1024 "in-flight things", and the small upper bound
> hits a used slot, you could simply search forward until you find
> a free one. I mean, with 1024 possible values the number of
> possibilities is anyhow restricted.

Well hey, let's give Luke's a try and see how uniform it is:

#include <stdio.h>
#include <stdlib.h>
#define LIMIT 49
int
main(void)
{
	unsigned counts[LIMIT] = { 0 };
	unsigned counts2[LIMIT] = { 0 };
	int i;
	for (i = 0; i < 100000 * LIMIT; i++) {
		counts[arc4random_uniform(LIMIT)]++;
		counts2[arc4random_uniform_fast_simple(LIMIT)]++;
	}
	for (i = 0; i < LIMIT; i++)
		printf("%2d\t%7u\t%7u\n", i+1, counts[i], counts2[i]);
	return 0;
}

: bleys; ./a.out
 1 100188 76502
 2 99983 76602
 3 100521 76522
 4 100416 76682
 5 100171 76387
 6 100163 76759
 7 100024 76336
 8 19 76703
 9 99769 76237
10 99892 76532
11 100197 76730
12 100483 76398
13 99769 76310
14 100075 76474
15 99781 76599
16 99846 76439
17 99814 76430
18 100313 76648
19 100259 76813
20 99885 77068
21 100302 76546
228 76698
23 99491 76678
24 100340 76324
25 99763 115263
26 99872 153008
27 100022 152979
28 99481 153793
29 100018 210714
30 99617 229286
31 100167 297003
32 100270 449664
33 100468 76790
34 99115 76452
35 99921 76392
36 99862 76140
37 100485 76607
38 100029 75885
39 99577 76498
40 99479 76727
41 100139 76746
42 100883 76698
43 100102 76474
44 99801 76592
45 100117 76124
46 99678 76417
47 99770 76639
48 99524 77034
49 100151 76658
: bleys;

Wow, that last column is *bad*. Repeated runs show 25-32 are consistently
high, 32 being *outrageously* off, and the others are low.

Luke's implementation does not correctly implement the API. Doesn't matter
if it's a million times faster when it doesn't deliver the goal.

Philip Guenther
Re: Picky, but much more efficient arc4random_uniform!
Stuart Henderson wrote in:
|On 2022/05/14 06:56, Luke Small wrote:
|> If I use arc4random_uniform() repeatedly to create a random distribution of
|> say numbers less than 0x1000 or even something weird like 0x1300 will the
|> random distribution be better with arc4random_uniform() or with mine?
|
|there's no point to have a choice of different arc4random_uniform_xyz
|with different characteristics, how is somebody going to pick the
|"right" one?
|
|the bottom of netbsd's version of the arc4random(3) manual says it well:
|
|   One may be tempted to create new APIs to accommodate different
|   security models and performance constraints without unpleasant
|   surprises on different operating systems.  This should not be done
|   lightly, though, because there are already too many different
|   choices, and too many opportunities for programmers to reach for
|   one and pick the wrong one.
|
|what's the perceived problem you're wanting to solve? and does that
|problem actually exist in the first place?

The problem is that if you have a low upper bound then modulo will "remove
a lot of randomization". For example if you have a program which generates
Lotto numbers (1..49), then using _uniform() as it is will generate many
duplicates.
I used the approach i found in a TeX file over twenty years ago
("RANDOM.TEX, v.1 (Donald Arseneau) from TeTeX,
texmf/tex/generic/misc/random.tex"; and now that i look, he published
a v.2 not too long ago), it was named _clip() as it produces something
in between a minimum and a maximum:

  _max -= _min;
  ++_max;
  _min = Maximum::sir - 2; // m - 2 = 2^(32|64) - 3
  if (_max != 0)
      _min /= _max;

  for (;;) {
      uir i;

      i = _random;
      i += M1::uir;
      if (_min != 0)
          i /= _min;
      if (i >= _max && (i != 0 || _max != 0)) {
          _random *= 16807; // just MUL
          if (_random == 0)
              _random = 1;
          continue;
      }
      _random = i;
      break;
  }

  _random += origmin;
  _Assert(1 && _random >= origmin && _random <= origmax);

That is, the random number is multiplied with a, eh, non-prime, as long
as we do not fit. However i found this became a terrible mess with 64-bit
numbers, so i removed the function, aka did not "port it over" when
i implemented a random number generator for my own use cases. Maybe
i should have increased the multiplicator. It is really, really bad. :-)

I only ever used it for a Lotto number program anyhow, i think, and so
i thought it would be better to simply read a one byte random and skip
those that do not fit, instead of making them always fit with modulo 50.
That is, an application-specific workaround. Things are less harmful for
large numbers, anyway.

In fact i have to say i was astonished to see someone using this
function, not too long ago i saw a FreeBSD commit fly by which uses it!

I was a bit interested when i saw Luke's message, but i am no
mathematician and, like i said, i in fact never needed the
functionality. And the question i would have is how small
upper_bound has to be for the modulo problem to really kick in.

And even if, whether it matters. For example, if you have
a bitset of 1024 "in-flight things", and the small upper bound
hits a used slot, you could simply search forward until you find
a free one. I mean, with 1024 possible values the number of
possibilities is anyhow restricted.
--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
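Steffen's worry above — that a plain modulo on a small upper bound skews
results — is exactly what the rejection step in arc4random_uniform(3) is
there to prevent. Below is a rough sketch of that step; rand32() is a
hypothetical stand-in for arc4random() (rand() is not a cryptographic
source), and this illustrates the technique rather than quoting OpenBSD's
actual source:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* stand-in for arc4random(); any uniform 32-bit source illustrates the idea */
static uint32_t
rand32(void)
{
	return ((uint32_t)(rand() & 0xffff) << 16) | (uint32_t)(rand() & 0xffff);
}

/* uniform value in [0, upper_bound), rejection-sampled to avoid modulo bias */
uint32_t
uniform_sketch(uint32_t upper_bound)
{
	uint32_t r, min;

	if (upper_bound < 2)
		return 0;
	/* min = 2**32 % upper_bound, computed within 32 bits */
	min = -upper_bound % upper_bound;
	/*
	 * Values below min belong to the truncated final wrap of the
	 * 32-bit range; discarding them leaves a span whose size is an
	 * exact multiple of upper_bound, so the modulus is unbiased.
	 */
	do {
		r = rand32();
	} while (r < min);
	return r % upper_bound;
}
```

For upper_bound = 49 the rejection region is tiny (49 out of 2^32 values),
so the expected number of retries is essentially zero.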
Re: Picky, but much more efficient arc4random_uniform!
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	int results[3] = { 0, 0, 0 };
	for (int i = 0; i < 100000; i++) {
		results[arc4random_uniform_fast_simple(3)]++;
	}
	for (int i = 0; i < 3; i++)
		printf("%d: %d\n", i, results[i]);
	return 0;
}

% ./a.out
0: 24809
1: 50011
2: 25180

You can't reuse bits because they'll be biased.
Re: Picky, but much more efficient arc4random_uniform!
On 2022/05/14 06:56, Luke Small wrote:
> If I use arc4random_uniform() repeatedly to create a random distribution of
> say numbers less than 0x1000 or even something weird like 0x1300 will the
> random distribution be better with arc4random_uniform() or with mine?

there's no point to have a choice of different arc4random_uniform_xyz
with different characteristics, how is somebody going to pick the
"right" one?

the bottom of netbsd's version of the arc4random(3) manual says it well:

     One may be tempted to create new APIs to accommodate different
     security models and performance constraints without unpleasant
     surprises on different operating systems.  This should not be done
     lightly, though, because there are already too many different
     choices, and too many opportunities for programmers to reach for
     one and pick the wrong one.

what's the perceived problem you're wanting to solve? and does that
problem actually exist in the first place?
Re: Picky, but much more efficient arc4random_uniform!
Look at my code. I don’t even use a modulus operator. I perform hit and
miss with a random bitstream. How can I have a bias of something I don’t
do? I return a bitstream which meets the parameters of being a value less
than the upper bound. Much like arc4random_buf().

If I use arc4random_uniform() repeatedly to create a random distribution
of say numbers less than 0x1000 or even something weird like 0x1300, will
the random distribution be better with arc4random_uniform() or with mine?

For 0x1000 mine will simply pluck 12 bits of random data straight from
arc4random() (and preserve the remaining 20 bits for later) on the first
try, just like it’s arc4random_buf(). arc4random_uniform() will perform
a modulus of a 32 bit number which adds data to the bitstream.

Does it make it better? Perhaps it makes it harder to guess the source
bits. I don’t know; and I’m not going to pretend to be a cryptologist.
But I’m looking at modulo bias. I didn’t know what it was, before, but I
basically “rejection sample”:

https://research.kudelskisecurity.com/2020/07/28/the-definitive-guide-to-modulo-bias-and-how-to-avoid-it/

On Sat, May 14, 2022 at 6:14 AM Otto Moerbeek wrote:
> On Sat, May 14, 2022 at 05:48:10AM -0500, Luke Small wrote:
>
> > arc4random_uniform_fast2 that I made, streams in data from arc4random()
> > and uses the datastream directly and uses it as a bit by bit right
> > "sliding window" in the last loop. arc4random_uniform() uses a modulus
> > which is simple to implement, but I wonder how cryptographically sound
> > it is or even how evenly it distributes. Adding a modulus seems sloppy
> > without something better. I did make arc4random_fast_simple() which
> > merely takes an upperbound. I integrated
> > arc4random_uniform_fast_bitsearch() or whatever the top function was
> > into it, which binary searches to find the correct size bitfield
> > (return value) needed to barely fit the upperbound while also being
> > able to discover every possible value below the upperbound. It isn't
> > as fast as arc4random_uniform_fast2 if it were used repeatedly after a
> > single use of arc4random_uniform_fast_bitsearch(), but it does exactly
> > the same thing and appears faster than repeatedly using
> > arc4random_uniform() and its wasteful use of arc4random() and calling
> > the expensive rekeying function more often.
> >
> > It may be interesting to determine, even without looking at
> > performance, whether arc4random_fast_simple() creates a more superior,
> > secure use of the chacha20 stream than arc4random_uniform() with the
> > modulus. What exactly does all that extra data from the modulus do to
> > the random distribution?
> >
> > -Luke
>
> I don't follow you at all. Your blabbering does not even use the terms
> "uniform" and "modulo bias". I wonder even if you realize what they
> mean in this context.
>
> 	-Otto

-- 
-Luke
Re: Picky, but much more efficient arc4random_uniform!
On Sat, May 14, 2022 at 05:48:10AM -0500, Luke Small wrote:

> arc4random_uniform_fast2 that I made, streams in data from arc4random() and
> uses the datastream directly and uses it as a bit by bit right "sliding
> window" in the last loop. arc4random_uniform() uses a modulus which is
> simple to implement, but I wonder how cryptographically sound it is or how
> evenly it distributes. Adding a modulus seems sloppy without something
> better. I did make arc4random_fast_simple() which merely takes an
> upperbound. I integrated arc4random_uniform_fast_bitsearch() or whatever
> the top function was into it, which binary searches to find the correct
> size bitfield (return value) needed to barely fit the upperbound while
> also being able to discover every possible value below the upperbound. It
> isn't as fast as arc4random_uniform_fast2 if it were used repeatedly after
> a single use of arc4random_uniform_fast_bitsearch(), but it does exactly
> the same thing and appears faster than repeatedly using
> arc4random_uniform() and its wasteful use of arc4random() and calling the
> expensive rekeying function more often.
>
> It may be interesting to determine, even without looking at performance,
> whether arc4random_fast_simple() creates a more superior, secure use of the
> chacha20 stream than arc4random_uniform() with the modulus. What exactly
> does all that extra data from the modulus do to the random distribution?
>
> -Luke

I don't follow you at all. Your blabbering does not even use the terms
"uniform" and "modulo bias". I wonder even if you realize what they
mean in this context.

	-Otto
Re: Picky, but much more efficient arc4random_uniform!
arc4random_uniform_fast2 that I made streams in data from arc4random() and
uses the datastream directly, bit by bit, as a right "sliding window" in
the last loop. arc4random_uniform() uses a modulus which is simple to
implement, but I wonder how cryptographically sound it is or how evenly it
distributes. Adding a modulus seems sloppy without something better. I did
make arc4random_fast_simple() which merely takes an upperbound. I
integrated arc4random_uniform_fast_bitsearch() or whatever the top
function was into it, which binary searches to find the correct size
bitfield (return value) needed to barely fit the upperbound while also
being able to discover every possible value below the upperbound. It isn't
as fast as arc4random_uniform_fast2 if it were used repeatedly after a
single use of arc4random_uniform_fast_bitsearch(), but it does exactly the
same thing and appears faster than repeatedly using arc4random_uniform()
and its wasteful use of arc4random() and calling the expensive rekeying
function more often.

It may be interesting to determine, even without looking at performance,
whether arc4random_fast_simple() creates a more superior, secure use of
the chacha20 stream than arc4random_uniform() with the modulus. What
exactly does all that extra data from the modulus do to the random
distribution?

-Luke
Re: Picky, but much more efficient arc4random_uniform!
In general, wasting some arc4random bits is not a problem at all.

I see a lot of code with no demonstration, explanation or proof why it
would be correct (i.e. produces uniform results). You only talk about
speed. The current code might not be optimal, but at least it is very
clear it's correct.

	-Otto

On Fri, May 13, 2022 at 09:43:26AM -0500, Luke Small wrote:
> I made a couple new versions of a new kind of arc4random_uniform-like
> function and some example functions which use them. Instead of having a
> sufficiently large random number greater than the modulus, I pick a random
> number using arc4random() from a bitfield, where the length of the
> bitfield is just below or slightly beyond the value of the modulus, and
> return the bitfield if it is less than the value of the modulus.
>
> In both versions, I retrieve a bitfield from a static uint64_t which is
> refreshed from periodic arc4random() calls. The functions demand a bit
> length, but I provide a convenient bitsize search function:
> arc4random_uniform_fast_bitsearch()
>
> In the first version, if the chosen return value isn't less than the
> modulus, the entire bitfield is trashed and a completely new bitfield is
> refreshed from the cache. This can be very inefficient for certain
> upperbounds, since the functions demand that all returned values less
> than the upperbound are possible. This can perform worse than
> arc4random_uniform() even with more random number generation churn.
>
> In the second version, if the chosen return value isn't less than the
> modulus, the bitfield is shifted right, losing the least significant bit,
> and a new random-value bit from the least significant bit of the cache is
> put into the most significant bit of the bitfield and it tries again. For
> significant expenditures of effort, this version is always more efficient
> than arc4random_uniform() and slightly to much more efficient than my
> first version.
>
> The performance boost comes from less pseudorandom number generation: by
> not trashing perfectly good random data and preserving it, rekeying is
> performed less.
>
> I suspect that the first version may be secure, but I'm not sure what you
> think of the second version, for being secure. Is there a way to test how
> well distributed and secure it is?!
>
> Perhaps it's even better than the typical use case of arc4random_uniform,
> since it feeds it a bitstream instead of performing a modulo operation on
> it!
>
> I pledge("stdio") and unveil(NULL, NULL) at the beginning, so you know it
> doesn't do anything but malloc() an array, do stuff on the array and print
> stuff; if you don't profile, anyway.
>
> What do you think of having such a thing in OpenBSD?
>
> newdata() generates random a-z characters performing arc4random_uniform(26);
>
> newdataTypableFilename() generates random filename typable characters
> performing arc4random_uniform(92);
>
> I perform other tests including presumed worst-cases on large numbers and
> numbers which will have a lot of misses.
>
> -Luke
Picky, but much more efficient arc4random_uniform!
I made a couple new versions of a new kind of arc4random_uniform-like
function and some example functions which use them. Instead of having a
sufficiently large random number greater than the modulus, I pick a random
number using arc4random() from a bitfield, where the length of the
bitfield is just below or slightly beyond the value of the modulus, and
return the bitfield if it is less than the value of the modulus.

In both versions, I retrieve a bitfield from a static uint64_t which is
refreshed from periodic arc4random() calls. The functions demand a bit
length, but I provide a convenient bitsize search function:
arc4random_uniform_fast_bitsearch()

In the first version, if the chosen return value isn't less than the
modulus, the entire bitfield is trashed and a completely new bitfield is
refreshed from the cache. This can be very inefficient for certain
upperbounds, since the functions demand that all returned values less than
the upperbound are possible. This can perform worse than
arc4random_uniform() even with more random number generation churn.

In the second version, if the chosen return value isn't less than the
modulus, the bitfield is shifted right, losing the least significant bit,
and a new random-value bit from the least significant bit of the cache is
put into the most significant bit of the bitfield and it tries again. For
significant expenditures of effort, this version is always more efficient
than arc4random_uniform() and slightly to much more efficient than my
first version.

The performance boost comes from less pseudorandom number generation: by
not trashing perfectly good random data and preserving it, rekeying is
performed less.

I suspect that the first version may be secure, but I'm not sure what you
think of the second version, for being secure. Is there a way to test how
well distributed and secure it is?!

Perhaps it's even better than the typical use case of arc4random_uniform,
since it feeds it a bitstream instead of performing a modulo operation on
it!

I pledge("stdio") and unveil(NULL, NULL) at the beginning, so you know it
doesn't do anything but malloc() an array, do stuff on the array and print
stuff; if you don't profile, anyway.

What do you think of having such a thing in OpenBSD?

newdata() generates random a-z characters performing arc4random_uniform(26);

newdataTypableFilename() generates random filename typable characters
performing arc4random_uniform(92);

I perform other tests including presumed worst-cases on large numbers and
numbers which will have a lot of misses.

-Luke

arc4random_uniform_fast.c
Description: Binary data
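The first version described above amounts to the classic
bitmask-with-rejection technique: mask off just enough bits to cover the
bound, and throw the whole draw away on a miss. A sketch under that
reading — all names here are hypothetical, and rand() stands in for
arc4random() purely for illustration (it is not a cryptographic source):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* stand-in bit source; arc4random() would be the real one on OpenBSD */
static uint32_t
rand32(void)
{
	return ((uint32_t)(rand() & 0xffff) << 16) | (uint32_t)(rand() & 0xffff);
}

/* smallest width such that all values 0..upper_bound-1 fit */
static int
bit_width(uint32_t upper_bound)
{
	int bits = 0;

	for (uint32_t v = upper_bound - 1; v != 0; v >>= 1)
		bits++;
	return bits;
}

/*
 * Draw bit_width(upper_bound) fresh bits per attempt; a miss discards
 * them all.  Every accepted value is equally likely, so the result is
 * uniform -- the cost is that up to (almost) half the draws can miss.
 */
uint32_t
mask_reject_sketch(uint32_t upper_bound)
{
	uint32_t mask, r;

	if (upper_bound < 2)
		return 0;
	mask = (uint32_t)(((uint64_t)1 << bit_width(upper_bound)) - 1);
	do {
		r = rand32() & mask;
	} while (r >= upper_bound);
	return r;
}
```

Discarding the whole window on a miss is what keeps this uniform; the
second (shift-and-refill) version reuses bits that survived a rejection,
and those bits are no longer unconditionally uniform.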
Re: [patch] Convert modulus to arc4random_uniform
Theo Buehler wrote:
> I've now committed most of your diff, thanks once again.
>
> o I asked for further review on the kernel parts
> o I'm going to skip hack for now
>
> Here's a patch for libc, based on the previous discussion.
> I think this is easier to read and understand. No binary change on
> amd64.
>
> ok?

sure. the current code is safer for other values of randmax, but that
seems an unlikely problem.
Re: [patch] Convert modulus to arc4random_uniform
I've now committed most of your diff, thanks once again.

o I asked for further review on the kernel parts
o I'm going to skip hack for now

Here's a patch for libc, based on the previous discussion.
I think this is easier to read and understand. No binary change on
amd64.

ok?

Index: lib/libc/stdlib/rand.c
===================================================================
RCS file: /var/cvs/src/lib/libc/stdlib/rand.c,v
retrieving revision 1.15
diff -u -p -r1.15 rand.c
--- lib/libc/stdlib/rand.c	13 Sep 2015 08:31:47 -0000	1.15
+++ lib/libc/stdlib/rand.c	16 Dec 2015 08:02:41 -0000
@@ -37,7 +37,7 @@ int
 rand_r(u_int *seed)
 {
 	*seed = *seed * 1103515245 + 12345;
-	return (*seed % ((u_int)RAND_MAX + 1));
+	return (*seed & RAND_MAX);
 }
 DEF_WEAK(rand_r);
 
@@ -50,7 +50,7 @@ int
 rand(void)
 {
 	if (rand_deterministic == 0)
-		return (arc4random() % ((u_int)RAND_MAX + 1));
+		return (arc4random() & RAND_MAX);
 	return (rand_r(&next));
 }
Re: [patch] Convert modulus to arc4random_uniform
Matthew Martin writes:
> On Mon, Dec 07, 2015 at 09:33:47AM +0100, Theo Buehler wrote:
>> I think some of these are ok, but I'm unsure about some of the others.
>> Here are some of my concerns:
>>
>> - since arc4random_uniform can potentially loop indefinitely, it
>>   might interfere with predictable timing of some routines. I can't
>>   tell if this is harmless in all cases below. This applies in
>>   particular to the proposed changes in the kernel.
>
> I hadn't considered timing problems. I'll look at it again tonight, but
> someone more familiar with the code should certainly look at it before
> committing.
>
>> - changing random() to arc4random() in games might have undesired
>>   side-effects like interfering with the reproducibility of tests or
>>   games. I think this might apply to awk for the same reason as in the
>>   shells.
>
> The patch for awk was wrong.

If there's a concern about upstream for nsd/sqlite/libevent, then note
that awk has an upstream too.

[...]

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Re: [patch] Convert modulus to arc4random_uniform
> I'll look into hack tonight when I have more time.

Honestly, I would prefer to leave hack as it is right now since it will
take some work to repair it anyway. I would not want to add another layer
of (potential) complications.

> > > Index: lib/libc/stdlib/rand.c
> > > ===================================================================
>
> It's safe but takes a bit of thinking. I first had it as
> 	return (arc4random() & RAND_MAX);
> which to me is more obviously correct, but since it's safe as is, I have
> no strong opinion on this.

I have a mild preference for this version, and I think this version would
be preferable from the point of view of uniformity of usage, especially
after your patches have gone in.

> > > Index: usr.bin/awk/run.c
> > > ===================================================================
> >
> > Unsure about this one. I think deterministic sequences might be desired
> > in some circumstances (this one is deterministic when a seed was given).
>
> theo@ also pointed out that awk can be deterministic. Since RAND_MAX is
> 1 below a power of 2, & is safe. How about
>
> Index: run.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/awk/run.c,v
> retrieving revision 1.39
> diff -u -p -r1.39 run.c
> --- run.c	5 Sep 2015 22:07:10 -0000	1.39
> +++ run.c	7 Dec 2015 19:28:31 -0000
> @@ -1581,7 +1581,7 @@ Cell *bltin(Node **a, int n)	/* builtin
>  		u = (Awkfloat) system(getsval(x)) / 256; /* 256 is unix-dep */
>  		break;
>  	case FRAND:
> -		u = (Awkfloat) (random() % RAND_MAX) / RAND_MAX;
> +		u = (Awkfloat) (random() & RAND_MAX) / ((u_int)RAND_MAX + 1);
>  		break;
>  	case FSRAND:
>  		if (isrec(x)) {	/* no argument provided */

ok from me on this one.
Re: [patch] Convert modulus to arc4random_uniform
On Mon, Dec 07, 2015 at 09:33:47AM +0100, Theo Buehler wrote:
> I think some of these are ok, but I'm unsure about some of the others.
> Here are some of my concerns:
>
> - since arc4random_uniform can potentially loop indefinitely, it
>   might interfere with predictable timing of some routines. I can't
>   tell if this is harmless in all cases below. This applies in
>   particular to the proposed changes in the kernel.

I hadn't considered timing problems. I'll look at it again tonight, but
someone more familiar with the code should certainly look at it before
committing.

> - changing random() to arc4random() in games might have undesired
>   side-effects like interfering with the reproducibility of tests or
>   games. I think this might apply to awk for the same reason as in the
>   shells.

The patch for awk was wrong. Updated patch below. I'll look into hack
tonight when I have more time.

> > Index: lib/libc/stdlib/rand.c
> > ===================================================================
> > RCS file: /cvs/src/lib/libc/stdlib/rand.c,v
> > retrieving revision 1.15
> > diff -u -p -r1.15 rand.c
> > --- lib/libc/stdlib/rand.c	13 Sep 2015 08:31:47 -0000	1.15
> > +++ lib/libc/stdlib/rand.c	7 Dec 2015 06:42:17 -0000
> > @@ -50,7 +50,7 @@ int
> >  rand(void)
> >  {
> >  	if (rand_deterministic == 0)
> > -		return (arc4random() % ((u_int)RAND_MAX + 1));
> > +		return (arc4random_uniform((u_int)RAND_MAX + 1));
> >  	return (rand_r(&next));
> >  }
> >
>
> this is modulo 2^n, so I think this one is fine as it is.

It's safe but takes a bit of thinking. I first had it as
	return (arc4random() & RAND_MAX);
which to me is more obviously correct, but since it's safe as is, I have
no strong opinion on this.

> > Index: usr.bin/awk/run.c
> > ===================================================================
>
> Unsure about this one. I think deterministic sequences might be desired
> in some circumstances (this one is deterministic when a seed was given).

theo@ also pointed out that awk can be deterministic. Since RAND_MAX is
1 below a power of 2, & is safe. How about

Index: run.c
===================================================================
RCS file: /cvs/src/usr.bin/awk/run.c,v
retrieving revision 1.39
diff -u -p -r1.39 run.c
--- run.c	5 Sep 2015 22:07:10 -0000	1.39
+++ run.c	7 Dec 2015 19:28:31 -0000
@@ -1581,7 +1581,7 @@ Cell *bltin(Node **a, int n)	/* builtin
 		u = (Awkfloat) system(getsval(x)) / 256; /* 256 is unix-dep */
 		break;
 	case FRAND:
-		u = (Awkfloat) (random() % RAND_MAX) / RAND_MAX;
+		u = (Awkfloat) (random() & RAND_MAX) / ((u_int)RAND_MAX + 1);
 		break;
 	case FSRAND:
 		if (isrec(x)) {	/* no argument provided */
Re: [patch] Convert modulus to arc4random_uniform
On Mon, Dec 07, 2015 at 12:49:17AM -0600, Matthew Martin wrote: > > Theo's diff inspired me to look for other cases of modulo bias. The > following diff converts most modulus operations on a random number to > use arc4random_uniform or & as appropriate. I excluded > > lib/libsqlite3/src/resolve.c > regress/lib/libevent/test-time.c > usr.sbin/nsd/rrl.c > usr.sbin/nsd/util.c > usr.sbin/nsd/xfrd.c > > as they seem to have upstreams. The only other case is games/wump/wump.c > which has > > if (arc4random() % 2 == 1) > > This is safe and seems obvious enough to me. > > - Matthew Martin Thank you. I was going to do the same :) I think some of these are ok, but I'm unsure about some of the others. Here are some of my concerns: - since arc4random_uniform can potentially loop indefinitely, it might interfere with predictable timing of some routines. I can't tell if this is harmless in all cases below. This applies in particular to the proposed changes in the kernel. - changing random() to arc4random() in games might have undesired side-effects like interfering with the reproducibility of tests or games. I think this might apply to awk for the same reason as in the shells. > Index: games/atc/update.c > === ok > Index: games/hack/hack.mklev.c > === > Index: games/hack/rnd.c > === unsure about these two > Index: lib/libc/stdlib/rand.c > === > RCS file: /cvs/src/lib/libc/stdlib/rand.c,v > retrieving revision 1.15 > diff -u -p -r1.15 rand.c > --- lib/libc/stdlib/rand.c13 Sep 2015 08:31:47 - 1.15 > +++ lib/libc/stdlib/rand.c7 Dec 2015 06:42:17 - > @@ -50,7 +50,7 @@ int > rand(void) > { > if (rand_deterministic == 0) > - return (arc4random() % ((u_int)RAND_MAX + 1)); > + return (arc4random_uniform((u_int)RAND_MAX + 1)); > return (rand_r(&next)); > } > this is modulo 2^n, so I think this one is fine as it is. > Index: sbin/dhclient/dhclient.c > === I have already done this independently. 
> Index: sys/dev/ic/sili.c
> ===================================================================
> Index: sys/netinet6/nd6_rtr.c
> ===================================================================
> Index: sys/ufs/ffs/ffs_alloc.c
> ===================================================================

These must definitely be looked at by somebody else.

> Index: usr.bin/awk/run.c
> ===================================================================

Unsure about this one. I think deterministic sequences might be desired
in some circumstances (this one is deterministic when a seed was given).

> Index: usr.sbin/npppd/common/slist.c
> ===================================================================
> Index: usr.sbin/npppd/l2tp/l2tpd.c
> ===================================================================
> Index: usr.sbin/npppd/pppoe/pppoed.c
> ===================================================================
> Index: usr.sbin/npppd/pptp/pptpd.c
> ===================================================================

These four are ok with me.
[patch] Convert modulus to arc4random_uniform
Theo's diff inspired me to look for other cases of modulo bias. The
following diff converts most modulus operations on a random number to
use arc4random_uniform or & as appropriate. I excluded

lib/libsqlite3/src/resolve.c
regress/lib/libevent/test-time.c
usr.sbin/nsd/rrl.c
usr.sbin/nsd/util.c
usr.sbin/nsd/xfrd.c

as they seem to have upstreams. The only other case is games/wump/wump.c
which has

if (arc4random() % 2 == 1)

This is safe and seems obvious enough to me.

- Matthew Martin

Index: games/atc/update.c
===================================================================
RCS file: /cvs/src/games/atc/update.c,v
retrieving revision 1.16
diff -u -p -r1.16 update.c
--- games/atc/update.c	9 Dec 2014 05:01:14 -0000	1.16
+++ games/atc/update.c	7 Dec 2015 06:42:17 -0000
@@ -59,6 +59,15 @@ atcrandom()
 	return arc4random();
 }
 
+uint32_t
+atcrandom_uniform(uint32_t upper_bound)
+{
+	if (seeded)
+		return random() % upper_bound;
+	else
+		return arc4random_uniform(upper_bound);
+}
+
 void
 update(int dummy)
 {
@@ -212,7 +221,7 @@ update(int dummy)
 	 * Otherwise, prop jobs show up *on* entrance.  Remember that
 	 * we don't update props on odd updates.
 	 */
-	if ((atcrandom() % sp->newplane_time) == 0)
+	if (atcrandom_uniform(sp->newplane_time) == 0)
 		addplane();
 }
 
@@ -308,10 +317,10 @@ addplane(void)
 	memset(&p, 0, sizeof (p));
 
 	p.status = S_MARKED;
-	p.plane_type = atcrandom() % 2;
+	p.plane_type = atcrandom_uniform(2);
 
 	num_starts = sp->num_exits + sp->num_airports;
-	rnd = atcrandom() % num_starts;
+	rnd = atcrandom_uniform(num_starts);
 
 	if (rnd < sp->num_exits) {
 		p.dest_type = T_EXIT;
@@ -324,7 +333,7 @@ addplane(void)
 	/* loop until we get a plane not near another */
 	for (i = 0; i < num_starts; i++) {
 		/* loop till we get a different start point */
-		while ((rnd2 = atcrandom() % num_starts) == rnd)
+		while ((rnd2 = atcrandom_uniform(num_starts)) == rnd)
 			;
 		if (rnd2 < sp->num_exits) {
 			p.orig_type = T_EXIT;

Index: games/hack/hack.mklev.c
===================================================================
RCS file: /cvs/src/games/hack/hack.mklev.c,v
retrieving revision 1.7
diff -u -p -r1.7 hack.mklev.c
--- games/hack/hack.mklev.c	27 Oct 2009 23:59:25 -0000	1.7
+++ games/hack/hack.mklev.c	7 Dec 2015 06:42:17 -0000
@@ -66,8 +66,8 @@
 #include 
 #include "hack.h"
 
-#define somex() ((random()%(croom->hx-croom->lx+1))+croom->lx)
-#define somey() ((random()%(croom->hy-croom->ly+1))+croom->ly)
+#define somex() (arc4random_uniform(croom->hx-croom->lx+1)+croom->lx)
+#define somey() (arc4random_uniform(croom->hy-croom->ly+1)+croom->ly)
 
 #define	XLIM	4	/* define minimum required space around a room */
 #define	YLIM	3

Index: games/hack/rnd.c
===================================================================
RCS file: /cvs/src/games/hack/rnd.c,v
retrieving revision 1.5
diff -u -p -r1.5 rnd.c
--- games/hack/rnd.c	27 Oct 2009 23:59:25 -0000	1.5
+++ games/hack/rnd.c	7 Dec 2015 06:42:17 -0000
@@ -61,7 +61,7 @@
  * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#define RND(x)	((random()>>3) % x)
+#define RND(x)	(arc4random_uniform(x))
 
 #include 

Index: lib/libc/stdlib/rand.c
===================================================================
RCS file: /cvs/src/lib/libc/stdlib/rand.c,v
retrieving revision 1.15
diff -u -p -r1.15 rand.c
--- lib/libc/stdlib/rand.c	13 Sep 2015 08:31:47 -0000	1.15
+++ lib/libc/stdlib/rand.c	7 Dec 2015 06:42:17 -0000
@@ -50,7 +50,7 @@ int
 rand(void)
 {
 	if (rand_deterministic == 0)
-		return (arc4random() % ((u_int)RAND_MAX + 1));
+		return (arc4random_uniform((u_int)RAND_MAX + 1));
 	return (rand_r(&next));
 }

Index: sbin/dhclient/dhclient.c
===================================================================
RCS file: /cvs/src/sbin/dhclient/dhclient.c,v
retrieving revision 1.367
diff -u -p -r1.367 dhclient.c
--- sbin/dhclient/dhclient.c	5 Dec 2015 13:09:11 -0000	1.367
+++ sbin/dhclient/dhclient.c	7 Dec 2015 06:42:18 -0000
@@ -1285,15 +1285,13 @@ send_discover(void)
 	 */
 	if (!client->interval)
 		client->interval = config->initial_interval;
-	else {
-		client->interval += (arc4random() >> 2) %
-		    (2 * client->interval);
-	}
+	else
+		client->interval += arc4random_uniform(2 * client->interval);
 
 	/* Don't backoff past cutoff. */
 	if (client->interval > config->backoff_cutoff)
 		client->interval
Re: Arc4random_uniform
> I agree that simply "min = -upper_bound % upper_bound" should be
> sufficient in all cases, since u_int32_t arithmetic is defined as
> modulo 2**32 by the C standard, at least as of C99 and I think C89
> too. (Even if we supported any 1s-complement architectures, the
> compiler would still need to implement u_int32_t as modulo 2**32.)

Indeed. I was looking at it from a correctness point of view instead of
trying to determine if it would work in practice.

> I also think it makes sense to get rid of the LP64 test, because
> 64-bit division still takes more than twice as long as 32-bit division
> on most amd64 processors for example (according to
> http://gmplib.org/~tege/x86-timing.pdf).

And to reduce complexity, of course.

> Of course, the potential benefit here isn't that great, so I don't
> know whether this really makes sense to worry about.

Oh, there are certainly more important matters, but you know how these
things go. You see something that can be improved and it turns into an
itch that needs to be scratched. The quickest and best way to do so was
to send an email to this list. Then when I wrote the message, I started
thinking about whether this really was the best implementation or it
could be improved further. I freely admit that it doesn't make any
difference in the grand scheme of things, but there's also the minute
scheme of things. ;)
Re: Arc4random_uniform
[+djm, who wrote this code]

I agree that simply "min = -upper_bound % upper_bound" should be
sufficient in all cases, since u_int32_t arithmetic is defined as
modulo 2**32 by the C standard, at least as of C99 and I think C89
too. (Even if we supported any 1s-complement architectures, the
compiler would still need to implement u_int32_t as modulo 2**32.)

I also think it makes sense to get rid of the LP64 test, because
64-bit division still takes more than twice as long as 32-bit division
on most amd64 processors for example (according to
http://gmplib.org/~tege/x86-timing.pdf).

Of course, the potential benefit here isn't that great, so I don't
know whether this really makes sense to worry about.

On Fri, Jun 8, 2012 at 5:13 AM, Jorden Verwer wrote:
> Hello OpenBSD developers,
>
> Let me state in advance that I'm not entirely sure if I'm sending this
> to the right mailing list. Based on the descriptions, however, I do
> believe tech@ is the correct one instead of misc@. Furthermore, I'm not
> subscribed to any of the lists, so please CC me if you should want to
> reply. Finally, I've sent this message before from another account, but
> it didn't appear in any of the archives. If it did reach the list, I
> apologize. In the meantime, I did write some extra things that could be
> interesting.
>
> I've been looking at the way several (BSD) operating systems implement
> random numbers, in the context of an online election system (what to do
> with an equal number of votes per seat, etc.). My current implementation
> reads a byte from /dev/random and converts it to the required range,
> possibly reading more bytes to avoid the systematic bias a naive
> implementation has for ranges that are not a power of 2. It works
> correctly, but there are some considerations (transparency and chroot
> jail compatibility) that have led me to look at alternative
> implementations. This is where arc4random_uniform comes in. The way it
> avoids biased results is different from mine (my implementation uses the
> high-order bits instead), but the main idea of throwing away results
> that are outside the range that is a whole multiple of the desired range
> is exactly the same. It might be a viable alternative if I can convince
> myself (and others) that the values it generates do not favor one party
> over the other, no matter how slightly. Obviously, the criteria are
> different from those applied to cryptography, but I don't consider
> myself competent to express them formally.
>
> This is why I compared the function in lib/libc/crypt/arc4random.c to
> that in sys/dev/rnd.c. I was surprised to find that the 32-bit
> arithmetic in the libc version differs from that in the kernel version.
> Apparently, back in 2008 some change was only applied to the kernel, not
> to libc.
>
> This is what the kernel source code says:
> min = ((0xffffffff - upper_bound) + 1) % upper_bound;
>
> This is what the libc source code says:
> min = ((0xffffffff - (upper_bound * 2)) + 1) % upper_bound;
>
> Now, I'm not a math geek, but it seems pretty obvious to me that the
> version in the kernel source is to be preferred, given that it's simpler
> and does exactly what it has to do. I'm assuming this is also the reason
> the change was implemented, and that's what the commit log seems to say
> as well. So, I wonder why the libc source code was not changed. Was this
> an oversight?
>
> Secondly, the kernel code has the added advantage that it works for any
> possible value of upper_bound (you can check this for yourself if you
> don't believe me). This means that the upper_bound > 0x80000000 check
> can be removed, making the common case faster (upper_bound <= 0x80000000
> is definitely the common case in my book) and everything easier to
> understand and cleaner. Personally I don't see any real drawbacks, but
> feel free to disagree. ;)
>
> There are also other possible implementations that are even shorter,
> although they aren't necessarily easier to understand. In summary, the
> part between #else and #endif could be replaced by any of these lines:
> min = ((0xffffffff - upper_bound) + 1) % upper_bound;
> min = (1 + ~upper_bound) % upper_bound;
> min = (0 - upper_bound) % upper_bound;
> min = -upper_bound % upper_bound;
>
> The latter two implementations work correctly because the C standard
> defines modulo arithmetic for unsigned integers. Replace 0 by 2**32 and
> you'll see why it does what it's supposed to. I'm not entirely sure the
> last implementation would work on a machine that doesn't use two's
> complement. The standard is vague in its wording.
>
> Another option still would be to replace the entire #if block by this:
> min = (0x100000000 - upper_bound) % upper_bound;
>
> This will continue to work correctly even if int ever becomes 64-bit.
>
> Kind regards,
>
> Jorden Verwer
Arc4random_uniform
Hello OpenBSD developers,

Let me state in advance that I'm not entirely sure if I'm sending this
to the right mailing list. Based on the descriptions, however, I do
believe tech@ is the correct one instead of misc@. Furthermore, I'm not
subscribed to any of the lists, so please CC me if you should want to
reply. Finally, I've sent this message before from another account, but
it didn't appear in any of the archives. If it did reach the list, I
apologize. In the meantime, I did write some extra things that could be
interesting.

I've been looking at the way several (BSD) operating systems implement
random numbers, in the context of an online election system (what to do
with an equal number of votes per seat, etc.). My current implementation
reads a byte from /dev/random and converts it to the required range,
possibly reading more bytes to avoid the systematic bias a naive
implementation has for ranges that are not a power of 2. It works
correctly, but there are some considerations (transparency and chroot
jail compatibility) that have led me to look at alternative
implementations. This is where arc4random_uniform comes in. The way it
avoids biased results is different from mine (my implementation uses the
high-order bits instead), but the main idea of throwing away results
that are outside the range that is a whole multiple of the desired range
is exactly the same. It might be a viable alternative if I can convince
myself (and others) that the values it generates do not favor one party
over the other, no matter how slightly. Obviously, the criteria are
different from those applied to cryptography, but I don't consider
myself competent to express them formally.

This is why I compared the function in lib/libc/crypt/arc4random.c to
that in sys/dev/rnd.c. I was surprised to find that the 32-bit
arithmetic in the libc version differs from that in the kernel version.
Apparently, back in 2008 some change was only applied to the kernel, not
to libc.

This is what the kernel source code says:
min = ((0xffffffff - upper_bound) + 1) % upper_bound;

This is what the libc source code says:
min = ((0xffffffff - (upper_bound * 2)) + 1) % upper_bound;

Now, I'm not a math geek, but it seems pretty obvious to me that the
version in the kernel source is to be preferred, given that it's simpler
and does exactly what it has to do. I'm assuming this is also the reason
the change was implemented, and that's what the commit log seems to say
as well. So, I wonder why the libc source code was not changed. Was this
an oversight?

Secondly, the kernel code has the added advantage that it works for any
possible value of upper_bound (you can check this for yourself if you
don't believe me). This means that the upper_bound > 0x80000000 check
can be removed, making the common case faster (upper_bound <= 0x80000000
is definitely the common case in my book) and everything easier to
understand and cleaner. Personally I don't see any real drawbacks, but
feel free to disagree. ;)

There are also other possible implementations that are even shorter,
although they aren't necessarily easier to understand. In summary, the
part between #else and #endif could be replaced by any of these lines:
min = ((0xffffffff - upper_bound) + 1) % upper_bound;
min = (1 + ~upper_bound) % upper_bound;
min = (0 - upper_bound) % upper_bound;
min = -upper_bound % upper_bound;

The latter two implementations work correctly because the C standard
defines modulo arithmetic for unsigned integers. Replace 0 by 2**32 and
you'll see why it does what it's supposed to. I'm not entirely sure the
last implementation would work on a machine that doesn't use two's
complement. The standard is vague in its wording.

Another option still would be to replace the entire #if block by this:
min = (0x100000000 - upper_bound) % upper_bound;

This will continue to work correctly even if int ever becomes 64-bit.

Kind regards,

Jorden Verwer