Re: [PATCH] cleanup: fix possible overflow errors in binary search, part 2

2019-06-13 Thread Martin Ågren
On Thu, 13 Jun 2019 at 23:33, René Scharfe  wrote:
>
> Am 13.06.19 um 21:42 schrieb Martin Ågren:
> > On Thu, 13 Jun 2019 at 19:54, René Scharfe  wrote:

> >> Make sure the intermediate value stays within the boundaries instead,
> >> like this:
> >>
> >> mid = first + ((last - first) >> 1);
> >>
> >> The loop condition of the binary search makes sure that 'last' is
> >> always greater than 'first', so this is safe as long as 'first' is
> >> not negative.  And that can be verified easily using the pre-context
> >> of each change, except for name-hash.c, so add an assertion to that
> >> effect there.

> > So all is well. But maybe we should write `(last - first) / 2` anyway.
> > We could then drop the extra parenthesis, and we would keep future
> > readers (and static analysis?) from wondering whether we might ever be
> > shifting a signed value with the sign bit set. A few spots fewer to
> > audit in the future...
>
> Yes, thought about that.  When I saw Clang 8 emitting extra opcodes for
> handling the sign for the version with division I decided to restrict
> the patch to just do overflow prevention and leave the right shifts in
> place..

Ah, ok, I should have known you were on top of it. I guess that means
that clang isn't able to figure out we're guaranteed to be working with
non-negative numbers. Which I guess means that it can't be sure the
shift is safe, but with undefined behavior it has the leeway it needs,
so...

Martin


Re: [PATCH] cleanup: fix possible overflow errors in binary search, part 2

2019-06-13 Thread René Scharfe
Am 13.06.19 um 21:42 schrieb Martin Ågren:
> On Thu, 13 Jun 2019 at 19:54, René Scharfe  wrote:
>>
>> Calculating the sum of two array indexes to find the midpoint between
>> them can overflow, i.e. code like this is unsafe for big arrays:
>>
>> mid = (first + last) >> 1;
>>
>> Make sure the intermediate value stays within the boundaries instead,
>> like this:
>>
>> mid = first + ((last - first) >> 1);
>>
>> The loop condition of the binary search makes sure that 'last' is
>> always greater than 'first', so this is safe as long as 'first' is
>> not negative.  And that can be verified easily using the pre-context
>> of each change, except for name-hash.c, so add an assertion to that
>> effect there.
>
> Right, with "safe", one might mean something like "no undefined behavior
> due to shifting a signed value with the high bit set". Especially since
> we're worrying about overflows, we're obviously having large values in
> mind, so we're right to consider the sign bit. But, we're fine as you
> note.  Because we subtract, and `last` doesn't have its sign bit set,
> and `first` is non-negative and not greater than `last`, the sign bit of
> `(last - first)` is always zero.
>
> So all is well. But maybe we should write `(last - first) / 2` anyway.
> We could then drop the extra parenthesis, and we would keep future
> readers (and static analysis?) from wondering whether we might ever be
> shifting a signed value with the sign bit set. A few spots fewer to
> audit in the future...

Yes, thought about that.  When I saw Clang 8 emitting extra opcodes for
handling the sign for the version with division I decided to restrict
the patch to just do overflow prevention and leave the right shifts in
place..

René


Re: [PATCH] cleanup: fix possible overflow errors in binary search, part 2

2019-06-13 Thread Martin Ågren
On Thu, 13 Jun 2019 at 19:54, René Scharfe  wrote:
>
> Calculating the sum of two array indexes to find the midpoint between
> them can overflow, i.e. code like this is unsafe for big arrays:
>
> mid = (first + last) >> 1;
>
> Make sure the intermediate value stays within the boundaries instead,
> like this:
>
> mid = first + ((last - first) >> 1);
>
> The loop condition of the binary search makes sure that 'last' is
> always greater than 'first', so this is safe as long as 'first' is
> not negative.  And that can be verified easily using the pre-context
> of each change, except for name-hash.c, so add an assertion to that
> effect there.

Right, with "safe", one might mean something like "no undefined behavior
due to shifting a signed value with the high bit set". Especially since
we're worrying about overflows, we're obviously having large values in
mind, so we're right to consider the sign bit. But, we're fine as you
note.  Because we subtract, and `last` doesn't have its sign bit set,
and `first` is non-negative and not greater than `last`, the sign bit of
`(last - first)` is always zero.

So all is well. But maybe we should write `(last - first) / 2` anyway.
We could then drop the extra parenthesis, and we would keep future
readers (and static analysis?) from wondering whether we might ever be
shifting a signed value with the sign bit set. A few spots fewer to
audit in the future...

Martin


Re: [PATCH] cleanup: fix possible overflow errors in binary search, part 2

2019-06-13 Thread Derrick Stolee
On 6/13/2019 1:51 PM, René Scharfe wrote:
> Calculating the sum of two array indexes to find the midpoint between
> them can overflow, i.e. code like this is unsafe for big arrays:
> 
>   mid = (first + last) >> 1;
> 
> Make sure the intermediate value stays within the boundaries instead,
> like this:
> 
>   mid = first + ((last - first) >> 1);
> 
> The loop condition of the binary search makes sure that 'last' is
> always greater than 'first', so this is safe as long as 'first' is
> not negative.  And that can be verified easily using the pre-context
> of each change, except for name-hash.c, so add an assertion to that
> effect there.
> 
> The unsafe calculations were found with:
> 
>   git grep '(.*+.*) *>> *1'
> 
> This is a continuation of 19716b21a4 (cleanup: fix possible overflow
> errors in binary search, 2017-10-08).

Thank you for finding these additional examples!

LGTM. Pretty easy to see this is correct.

-Stolee



Re: [PATCH] cleanup: fix possible overflow errors in binary search

2017-10-06 Thread Derrick Stolee

On 10/6/2017 10:18 AM, Jeff King wrote:

On Fri, Oct 06, 2017 at 09:52:31AM -0400, Derrick Stolee wrote:


A common mistake when writing binary search is to allow possible
integer overflow by using the simple average:

mid = (min + max) / 2;

Instead, use the overflow-safe version:

mid = min + (max - min) / 2;

Great, thank you for picking this up!


The included changes were found using the following two greps:

grep "/ 2;" *.c
grep "/ 2;" */*.c
grep "/2;" */*.c

You can use[1]:

   git grep '/ 2;' '*.c'

to have Git expand the wildcard. That catches a few extra cases in
compat/regex/*.c.  Even though it's imported code, it might be
nice to cover those, too (since it's a possible bug, and also as a good
example).

[1] I'd actually write:

   git grep '/ *2;' '*.c'

 to do it all in one grep. :)


Thanks for the grep lesson! I knew there would be a simpler way to do 
what I wanted.



---
  builtin/index-pack.c | 4 ++--
  builtin/pack-objects.c   | 2 +-
  builtin/unpack-objects.c | 2 +-
  cache-tree.c | 2 +-
  packfile.c   | 2 +-
  sha1-lookup.c| 2 +-
  sha1_name.c  | 2 +-
  string-list.c| 2 +-
  utf8.c   | 2 +-
  xdiff/xpatience.c| 2 +-
  10 files changed, 11 insertions(+), 11 deletions(-)

These all look good to me (really the only way the conversion could be
bad is if "min" was higher than "max", and each case is just inside a
loop condition which makes sure that is not the case).

-Peff
I thought this should be simple enough. When we are all happy with the 
idea of this cleanup, I'll re-roll with the missed examples.


Thanks,
-Stolee


Re: [PATCH] cleanup: fix possible overflow errors in binary search

2017-10-06 Thread Jeff King
On Fri, Oct 06, 2017 at 09:52:31AM -0400, Derrick Stolee wrote:

> A common mistake when writing binary search is to allow possible
> integer overflow by using the simple average:
> 
>   mid = (min + max) / 2;
> 
> Instead, use the overflow-safe version:
> 
>   mid = min + (max - min) / 2;

Great, thank you for picking this up!

> The included changes were found using the following two greps:
> 
>   grep "/ 2;" *.c
>   grep "/ 2;" */*.c
>   grep "/2;" */*.c

You can use[1]:

  git grep '/ 2;' '*.c'

to have Git expand the wildcard. That catches a few extra cases in
compat/regex/*.c.  Even though it's imported code, it might be
nice to cover those, too (since it's a possible bug, and also as a good
example).

[1] I'd actually write:

  git grep '/ *2;' '*.c'

to do it all in one grep. :)

> ---
>  builtin/index-pack.c | 4 ++--
>  builtin/pack-objects.c   | 2 +-
>  builtin/unpack-objects.c | 2 +-
>  cache-tree.c | 2 +-
>  packfile.c   | 2 +-
>  sha1-lookup.c| 2 +-
>  sha1_name.c  | 2 +-
>  string-list.c| 2 +-
>  utf8.c   | 2 +-
>  xdiff/xpatience.c| 2 +-
>  10 files changed, 11 insertions(+), 11 deletions(-)

These all look good to me (really the only way the conversion could be
bad is if "min" was higher than "max", and each case is just inside a
loop condition which makes sure that is not the case).

-Peff