Re: [PATCH] cleanup: fix possible overflow errors in binary search, part 2
On Thu, 13 Jun 2019 at 23:33, René Scharfe wrote: > > Am 13.06.19 um 21:42 schrieb Martin Ågren: > > On Thu, 13 Jun 2019 at 19:54, René Scharfe wrote: > >> Make sure the intermediate value stays within the boundaries instead, > >> like this: > >> > >> mid = first + ((last - first) >> 1); > >> > >> The loop condition of the binary search makes sure that 'last' is > >> always greater than 'first', so this is safe as long as 'first' is > >> not negative. And that can be verified easily using the pre-context > >> of each change, except for name-hash.c, so add an assertion to that > >> effect there. > > So all is well. But maybe we should write `(last - first) / 2` anyway. > > We could then drop the extra parenthesis, and we would keep future > > readers (and static analysis?) from wondering whether we might ever be > > shifting a signed value with the sign bit set. A few spots fewer to > > audit in the future... > > Yes, thought about that. When I saw Clang 8 emitting extra opcodes for > handling the sign for the version with division I decided to restrict > the patch to just do overflow prevention and leave the right shifts in > place.. Ah, ok, I should have known you were on top of it. I guess that means that clang isn't able to figure out we're guaranteed to be working with non-negative numbers. Which I guess means that it can't be sure the shift is safe, but with undefined behavior it has the leeway it needs, so... Martin
Re: [PATCH] cleanup: fix possible overflow errors in binary search, part 2
Am 13.06.19 um 21:42 schrieb Martin Ågren: > On Thu, 13 Jun 2019 at 19:54, René Scharfe wrote: >> >> Calculating the sum of two array indexes to find the midpoint between >> them can overflow, i.e. code like this is unsafe for big arrays: >> >> mid = (first + last) >> 1; >> >> Make sure the intermediate value stays within the boundaries instead, >> like this: >> >> mid = first + ((last - first) >> 1); >> >> The loop condition of the binary search makes sure that 'last' is >> always greater than 'first', so this is safe as long as 'first' is >> not negative. And that can be verified easily using the pre-context >> of each change, except for name-hash.c, so add an assertion to that >> effect there. > > Right, with "safe", one might mean something like "no undefined behavior > due to shifting a signed value with the high bit set". Especially since > we're worrying about overflows, we're obviously having large values in > mind, so we're right to consider the sign bit. But, we're fine as you > note. Because we subtract, and `last` doesn't have its sign bit set, > and `first` is non-negative and not greater than `last`, the sign bit of > `(last - first)` is always zero. > > So all is well. But maybe we should write `(last - first) / 2` anyway. > We could then drop the extra parenthesis, and we would keep future > readers (and static analysis?) from wondering whether we might ever be > shifting a signed value with the sign bit set. A few spots fewer to > audit in the future... Yes, thought about that. When I saw Clang 8 emitting extra opcodes for handling the sign for the version with division I decided to restrict the patch to just do overflow prevention and leave the right shifts in place.. René
Re: [PATCH] cleanup: fix possible overflow errors in binary search, part 2
On Thu, 13 Jun 2019 at 19:54, René Scharfe wrote: > > Calculating the sum of two array indexes to find the midpoint between > them can overflow, i.e. code like this is unsafe for big arrays: > > mid = (first + last) >> 1; > > Make sure the intermediate value stays within the boundaries instead, > like this: > > mid = first + ((last - first) >> 1); > > The loop condition of the binary search makes sure that 'last' is > always greater than 'first', so this is safe as long as 'first' is > not negative. And that can be verified easily using the pre-context > of each change, except for name-hash.c, so add an assertion to that > effect there. Right, with "safe", one might mean something like "no undefined behavior due to shifting a signed value with the high bit set". Especially since we're worrying about overflows, we're obviously having large values in mind, so we're right to consider the sign bit. But, we're fine as you note. Because we subtract, and `last` doesn't have its sign bit set, and `first` is non-negative and not greater than `last`, the sign bit of `(last - first)` is always zero. So all is well. But maybe we should write `(last - first) / 2` anyway. We could then drop the extra parenthesis, and we would keep future readers (and static analysis?) from wondering whether we might ever be shifting a signed value with the sign bit set. A few spots fewer to audit in the future... Martin
Re: [PATCH] cleanup: fix possible overflow errors in binary search, part 2
On 6/13/2019 1:51 PM, René Scharfe wrote: > Calculating the sum of two array indexes to find the midpoint between > them can overflow, i.e. code like this is unsafe for big arrays: > > mid = (first + last) >> 1; > > Make sure the intermediate value stays within the boundaries instead, > like this: > > mid = first + ((last - first) >> 1); > > The loop condition of the binary search makes sure that 'last' is > always greater than 'first', so this is safe as long as 'first' is > not negative. And that can be verified easily using the pre-context > of each change, except for name-hash.c, so add an assertion to that > effect there. > > The unsafe calculations were found with: > > git grep '(.*+.*) *>> *1' > > This is a continuation of 19716b21a4 (cleanup: fix possible overflow > errors in binary search, 2017-10-08). Thank you for finding these additional examples! LGTM. Pretty easy to see this is correct. -Stolee
Re: [PATCH] cleanup: fix possible overflow errors in binary search
On 10/6/2017 10:18 AM, Jeff King wrote: On Fri, Oct 06, 2017 at 09:52:31AM -0400, Derrick Stolee wrote: A common mistake when writing binary search is to allow possible integer overflow by using the simple average: mid = (min + max) / 2; Instead, use the overflow-safe version: mid = min + (max - min) / 2; Great, thank you for picking this up! The included changes were found using the following two greps: grep "/ 2;" *.c grep "/ 2;" */*.c grep "/2;" */*.c You can use[1]: git grep '/ 2;' '*.c' to have Git expand the wildcard. That catches a few extra cases in compat/regex/*.c. Even though it's imported code, it might be nice to cover those, too (since it's a possible bug, and also as a good example). [1] I'd actually write: git grep '/ *2;' '*.c' to do it all in one grep. :) Thanks for the grep lesson! I knew there would be a simpler way to do what I wanted. --- builtin/index-pack.c | 4 ++-- builtin/pack-objects.c | 2 +- builtin/unpack-objects.c | 2 +- cache-tree.c | 2 +- packfile.c | 2 +- sha1-lookup.c| 2 +- sha1_name.c | 2 +- string-list.c| 2 +- utf8.c | 2 +- xdiff/xpatience.c| 2 +- 10 files changed, 11 insertions(+), 11 deletions(-) These all look good to me (really the only way the conversion could be bad is if "min" was higher than "max", and each case is just inside a loop condition which makes sure that is not the case). -Peff I thought this should be simple enough. When we are all happy with the idea of this cleanup, I'll re-roll with the missed examples. Thanks, -Stolee
Re: [PATCH] cleanup: fix possible overflow errors in binary search
On Fri, Oct 06, 2017 at 09:52:31AM -0400, Derrick Stolee wrote: > A common mistake when writing binary search is to allow possible > integer overflow by using the simple average: > > mid = (min + max) / 2; > > Instead, use the overflow-safe version: > > mid = min + (max - min) / 2; Great, thank you for picking this up! > The included changes were found using the following two greps: > > grep "/ 2;" *.c > grep "/ 2;" */*.c > grep "/2;" */*.c You can use[1]: git grep '/ 2;' '*.c' to have Git expand the wildcard. That catches a few extra cases in compat/regex/*.c. Even though it's imported code, it might be nice to cover those, too (since it's a possible bug, and also as a good example). [1] I'd actually write: git grep '/ *2;' '*.c' to do it all in one grep. :) > --- > builtin/index-pack.c | 4 ++-- > builtin/pack-objects.c | 2 +- > builtin/unpack-objects.c | 2 +- > cache-tree.c | 2 +- > packfile.c | 2 +- > sha1-lookup.c| 2 +- > sha1_name.c | 2 +- > string-list.c| 2 +- > utf8.c | 2 +- > xdiff/xpatience.c| 2 +- > 10 files changed, 11 insertions(+), 11 deletions(-) These all look good to me (really the only way the conversion could be bad is if "min" was higher than "max", and each case is just inside a loop condition which makes sure that is not the case). -Peff