Re: UBSAN error in lib/sh/random.c:79
On 1/10/23 11:43 AM, Greg Wooledge wrote: On Sat, Jan 07, 2023 at 01:42:20PM -0500, Greg Wooledge wrote: Or should the code do the multiplications with unsigned values, store them in unsigned variables, and then replace the subtraction with some kind of conditional that checks which of the two is greater? Here's a version that does just that: Thanks, I pushed what I think is a simpler fix that should work fine. Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: UBSAN error in lib/sh/random.c:79
On Sat, Jan 07, 2023 at 01:42:20PM -0500, Greg Wooledge wrote: > Or should the code do the multiplications with unsigned > values, store them in unsigned variables, and then replace the subtraction > with some kind of conditional that checks which of the two is greater? Here's a version that does just that: static u_bits32_t intrand32 (last) u_bits32_t last; { u_bits32_t h, l, t1, t2, ret; ret = (last == 0) ? 123459876 : last; h = ret / 127773; l = ret - (127773 * h); t1 = 16807 * l; t2 = 2836 * h; if (t1 < t2) ret = 0x7fff - t2 + t1; else ret = t1 - t2; return ret; } It passes the implementation test in the paper (checking the seed after 1 iterations), and I believe it's free from any kind of overflow. Do with it as you like. Feel free to replace the if/else with a ternary ?: operator if that's your preference.
Re: UBSAN error in lib/sh/random.c:79
On 1/7/23 1:45 PM, Martin Schulte wrote: Hello! Am Sat, 07 Jan 2023 19:08:06 +0100 schrieb Andreas Schwab : On Jan 07 2023, Greg Wooledge wrote: ... I think the original overflow can only happen if the argument of intrand32 is bigger than INT_MAX. Question might be if an overflow does any harm - or maybe even is intended... Positive-to-negative overflow doesn't hurt the algorithm, but since overflow is undefined behavior, anything can happen. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: UBSAN error in lib/sh/random.c:79
On 1/6/23 8:37 PM, Sam James wrote: Hi folks, I'm currently testing common Linux userland with UndefinedBehaviorSanitizer (UBSAN, -fsanitize=undefined). With Bash 5.2_p15, I get the following with this script: ``` $ cat /tmp/guess_suffix guess_suffix() { tmpdir="${TMPDIR}"/.ecompress$$.${RANDOM} } guess_suffix ``` It seems easier to trigger if I run it as an external script rather than as a function in an interactive shell. ``` $ export UBSAN_OPTIONS="print_stacktrace=1:halt_on_error=1" $ bash -x /tmp/guess_suffix + guess_suffix random.c:79:21: runtime error: signed integer overflow: 31789 * 127773 cannot be represented in type 'int' You must be extremely unlucky. I ran a script that generated 15 million random numbers and got three integer overflows. The easiest fix is to use the other variant algorithm in the paper and change the line in question to l = ret % 127773; That is equivalent mathemetically and won't overflow (the overflow isn't actually damaging, though). -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: UBSAN error in lib/sh/random.c:79
Andreas Schwab wrote in <871qo6f90g@igel.home>: |On Jan 07 2023, Greg Wooledge wrote: |> The variable l is a modulus, so its largest possible value is 127772. |> If the code simply said "l = ret % 127773;" then it wouldn't even be |> an issue. But the % was rewritten, presumably for efficiency. | |Presumably the assumption was that two divides are more costly than a |divide and a multiply (although nowadays, compilers will try to combine |the two divides if the target architecture has a divmod insn). Seems to me that even twenty years ago gcc turned many divisions into multiplications if it could. (Which always stunned me as a non-mathematician.) --steffen | |Der Kragenbaer,The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt)
Re: UBSAN error in lib/sh/random.c:79
On Jan 07 2023, Greg Wooledge wrote: > The variable l is a modulus, so its largest possible value is 127772. > If the code simply said "l = ret % 127773;" then it wouldn't even be > an issue. But the % was rewritten, presumably for efficiency. Presumably the assumption was that two divides are more costly than a divide and a multiply (although nowadays, compilers will try to combine the two divides if the target architecture has a divmod insn). -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: UBSAN error in lib/sh/random.c:79
On Jan 07 2023, Martin Schulte wrote: > Hello! > > Am Sat, 07 Jan 2023 19:08:06 +0100 schrieb Andreas Schwab > : > >> On Jan 07 2023, Greg Wooledge wrote: >> ... >> I think the original overflow can only happen if the argument of >> intrand32 is bigger than INT_MAX. > > Question might be if an overflow does any harm - or maybe even is intended... Signed overflow is never intended. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: UBSAN error in lib/sh/random.c:79
Hello! Am Sat, 07 Jan 2023 19:08:06 +0100 schrieb Andreas Schwab : > On Jan 07 2023, Greg Wooledge wrote: > ... > I think the original overflow can only happen if the argument of > intrand32 is bigger than INT_MAX. Question might be if an overflow does any harm - or maybe even is intended... Best regards, Martin
Re: UBSAN error in lib/sh/random.c:79
On Sat, Jan 07, 2023 at 07:08:06PM +0100, Andreas Schwab wrote: > On Jan 07 2023, Greg Wooledge wrote: > > > I think this patch might be correct: > > > > > > --- lib/sh/random.c.orig2023-01-07 12:26:09.049950519 -0500 > > +++ lib/sh/random.c 2023-01-07 12:26:27.469974730 -0500 > > @@ -70,8 +70,8 @@ > > There are lots of other combinations of constants to use; look at > > > > https://www.gnu.org/software/gsl/manual/html_node/Other-random-number-generators.html#Other-random-number-generators > > */ > > > > - bits32_t h, l, t; > > - u_bits32_t ret; > > + bits32_t t; > > + u_bits32_t h, l, ret; > > > >/* Can't seed with 0. */ > >ret = (last == 0) ? 123459876 : last; > > > > > > I tested it briefly, and it builds cleanly and produces the same random > > results as the unpatched version, at least on my system (compiled with > > gcc 10.2.1). > > The assignment t = 16807 * l - 2836 * h can still overflow, because if l > and h are unsigned, the computed value can never be negative, but it > becomes bigger than INT_MAX if 2836 * h is bigger than 16807 * l (the > unsigned result is computed modulo UINT_MAX+1). t is still signed. It has to be. The final check at the end of the function is (t < 0). My patch doesn't change that. > I think the original overflow can only happen if the argument of > intrand32 is bigger than INT_MAX. The original bug report complains about the calculation of l, not t. The variable l is a modulus, so its largest possible value is 127772. If the code simply said "l = ret % 127773;" then it wouldn't even be an issue. But the % was rewritten, presumably for efficiency. The only thing that matters in terms of this bug report is that the calculation of l's value is done using unsigned 32-bit integers at all points, up until the final assignment to the variable l. The original code has l as a signed variable, which is OK if the value is calculated using unsigned arithmetic for the intermediate steps, followed by an implicit cast to bits32_t at the end. The OP's bug report implies that they have some sort of compiler or runtime environment where the intermediate arithmetic is done using signed values. The goal of my patch is to force *any* compiler to use unsigned arithmetic at all points while calculating l. Now, you mentioned t... and yeah, I can see how the calculation of t might have similar issues, where the intermediate steps have to be done with unsigned values, but in this case, the final value *must* be held in a signed variable. I'll have to defer to C experts here. Is the code correct as written, and it's simply the OP's compiler or runtime environment that's in violation? Or should the code do the multiplications with unsigned values, store them in unsigned variables, and then replace the subtraction with some kind of conditional that checks which of the two is greater? Off the top of my head, I don't know.
Re: UBSAN error in lib/sh/random.c:79
On Jan 07 2023, Greg Wooledge wrote: > I think this patch might be correct: > > > --- lib/sh/random.c.orig 2023-01-07 12:26:09.049950519 -0500 > +++ lib/sh/random.c 2023-01-07 12:26:27.469974730 -0500 > @@ -70,8 +70,8 @@ > There are lots of other combinations of constants to use; look at > > https://www.gnu.org/software/gsl/manual/html_node/Other-random-number-generators.html#Other-random-number-generators > */ > > - bits32_t h, l, t; > - u_bits32_t ret; > + bits32_t t; > + u_bits32_t h, l, ret; > >/* Can't seed with 0. */ >ret = (last == 0) ? 123459876 : last; > > > I tested it briefly, and it builds cleanly and produces the same random > results as the unpatched version, at least on my system (compiled with > gcc 10.2.1). The assignment t = 16807 * l - 2836 * h can still overflow, because if l and h are unsigned, the computed value can never be negative, but it becomes bigger than INT_MAX if 2836 * h is bigger than 16807 * l (the unsigned result is computed modulo UINT_MAX+1). I think the original overflow can only happen if the argument of intrand32 is bigger than INT_MAX. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: UBSAN error in lib/sh/random.c:79
On Fri, Jan 06, 2023 at 09:00:30PM -0500, Greg Wooledge wrote: > On Sat, Jan 07, 2023 at 01:37:30AM +, Sam James wrote: > > random.c:79:21: runtime error: signed integer overflow: 31789 * 127773 > > cannot be represented in type 'int' > > #0 0x559791a301ce in intrand32 > > /usr/src/debug/app-shells/bash-5.2_p15/bash-5.2/lib/sh/random.c:79 > > Well, the code in question looks like this (with comments removed): > > static u_bits32_t > intrand32 (last) > u_bits32_t last; > { > bits32_t h, l, t; > u_bits32_t ret; > > ret = (last == 0) ? 123459876 : last; > h = ret / 127773; > l = ret - (127773 * h); > t = 16807 * l - 2836 * h; > ret = (t < 0) ? t + 0x7fff : t; > > return (ret); > } > /* Minimal Standard generator from > "Random number generators: good ones are hard to find", > Park and Miller, Communications of the ACM, vol. 31, no. 10, > October 1988, p. 1195. Filtered through FreeBSD. I've now read the first half of the paper (it's easily readable up until the section entitled "THEORY"). Bash's C implementation is derived from this code presented in the paper: function Random : real; const a = 16807; m = 2147483647; q = 127773; (* m div a *) r = 2836; (* m mod a *) var lo, hi, test : integer; begin hi := seed div q; lo := seed mod q; test := a * lo - r * hi; if test > 0 then seed := test else seed := test + m; Random := seed / m end; with the additional global variable "seed" of type integer. The paper promises that no value will ever overflow a 32-bit variable, and specifically mentions that the "test" variable can safely be a 32-bit signed value. However, it's pretty clear that "hi" and "lo" are not signed values. Even when rewriting the mod calculation as a multiplication and a subtraction, "lo" (aka "l") should never be negative. I think this patch might be correct: --- lib/sh/random.c.orig2023-01-07 12:26:09.049950519 -0500 +++ lib/sh/random.c 2023-01-07 12:26:27.469974730 -0500 @@ -70,8 +70,8 @@ There are lots of other combinations of constants to use; look at https://www.gnu.org/software/gsl/manual/html_node/Other-random-number-generators.html#Other-random-number-generators */ - bits32_t h, l, t; - u_bits32_t ret; + bits32_t t; + u_bits32_t h, l, ret; /* Can't seed with 0. */ ret = (last == 0) ? 123459876 : last; I tested it briefly, and it builds cleanly and produces the same random results as the unpatched version, at least on my system (compiled with gcc 10.2.1).
Re: UBSAN error in lib/sh/random.c:79
On Sat, Jan 07, 2023 at 01:37:30AM +, Sam James wrote: > $ cat /tmp/guess_suffix > guess_suffix() { > tmpdir="${TMPDIR}"/.ecompress$$.${RANDOM} > } > guess_suffix > $ export UBSAN_OPTIONS="print_stacktrace=1:halt_on_error=1" > $ bash -x /tmp/guess_suffix > + guess_suffix > random.c:79:21: runtime error: signed integer overflow: 31789 * 127773 cannot > be represented in type 'int' > #0 0x559791a301ce in intrand32 > /usr/src/debug/app-shells/bash-5.2_p15/bash-5.2/lib/sh/random.c:79 Well, the code in question looks like this (with comments removed): static u_bits32_t intrand32 (last) u_bits32_t last; { bits32_t h, l, t; u_bits32_t ret; ret = (last == 0) ? 123459876 : last; h = ret / 127773; l = ret - (127773 * h); t = 16807 * l - 2836 * h; ret = (t < 0) ? t + 0x7fff : t; return (ret); } The line your error refers to is "l = ..." where the multiplication occurs. Also of note, unicorn:/var/tmp/bash/bash-5.2$ grep bits32_t *.h config.h:#define bits32_t int config.h:#define u_bits32_t unsigned int externs.h:extern u_bits32_t get_urandom32 PARAMS((void)); Variables "h" and "l" are both of type int (with a fancy name) on my platform, and it seems on yours as well, based on your error message. It would not surprise me if this is a long-standing bug in this RNG, but I haven't analyzed the code well enough to understand why some of the variables are defined with a signed type, and some with an unsigned type. Here are the comments which accompany the code: /* Minimal Standard generator from "Random number generators: good ones are hard to find", Park and Miller, Communications of the ACM, vol. 31, no. 10, October 1988, p. 1195. Filtered through FreeBSD. x(n+1) = 16807 * x(n) mod (m). We split up the calculations to avoid overflow. h = last / q; l = x - h * q; t = a * l - h * r m = 2147483647, a = 16807, q = 127773, r = 2836 There are lots of other combinations of constants to use; look at https://www.gnu.org/software/gsl/manual/html_node/Other-random-number-gener ators.html#Other-random-number-generators */