Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-18 Thread Martin Sebor

On 12/14/2017 12:04 PM, Jeff Law wrote:

On 12/14/2017 11:55 AM, Jakub Jelinek wrote:

On Thu, Dec 14, 2017 at 11:51:26AM -0700, Martin Sebor wrote:

Well, it would be nice to get sanitizers diagnose this at runtime.  If we
know the array length at compile time, simply compare after the strlen
call the result and fail if it returns something above it.  Or replace
the strlen call with strnlen for the compile time known size and add
instrumentation if strnlen returns the second argument.


Sure, that sounds like a useful enhancement.  I'll look into
adding it as a follow-on patch unless you feel that it needs
to be part of the same package.


The problem is if we'll need changes to libubsan for that (which we'll
likely do), then those need to be upstreamed, and e.g. my attempts
to upstream simple patch to diagnose noreturn function returns is suspended
upstream because clang doesn't have that support (and I have no interest
in adding to to clang).

In theory we could have some GCC only file in there, but then we'd be ABI
incompatible with them.

So defer the sanitization side until Clang catches up?


I've committed the patch as is in r255790.  If I have some spare
cycles I'll see if the instrumentation is possible without changing
libubsan.

Martin



Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-16 Thread Martin Sebor

On 12/15/2017 10:29 AM, Martin Sebor wrote:

On 12/15/2017 09:17 AM, Richard Biener wrote:

On December 15, 2017 4:58:14 PM GMT+01:00, Martin Sebor
 wrote:

On 12/15/2017 01:48 AM, Richard Biener wrote:

On Thu, Dec 14, 2017 at 5:01 PM, Martin Sebor 

wrote:

On 12/14/2017 03:43 AM, Richard Biener wrote:


On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor 

wrote:


On 12/12/2017 05:35 PM, Jeff Law wrote:



On 12/12/2017 01:15 PM, Martin Sebor wrote:



Bug 83373 - False positive reported by -Wstringop-overflow, is
another example of warning triggered by a missed optimization
opportunity, this time in the strlen pass.  The optimization
is discussed in pr78450 - strlen(s) return value can be assumed
to be less than the size of s.  The gist of it is that the

result

of strlen(array) can be assumed to be less than the size of
the array (except in the corner case of last struct members).

To avoid the false positive the attached patch adds this
optimization to the strlen pass.  Although the patch passes
bootstrap and regression tests for all front-ends I'm not sure
the way it determines the upper bound of the range is 100%
correct for languages with arrays with a non-zero lower bound.
Maybe it's just not as tight as it could be.



What about something hideous like

struct fu {
  char x1[10];
  char x2[10];
  int avoid_trailing_array;
}

Where objects stored in x1 are not null terminated.  Are we in

the realm

of undefined behavior at that point (I hope so)?




Yes, this is undefined.  Pointer arithmetic (either direct or
via standard library functions) is only defined for pointers
to the same object or subobject.  So even something like

 memcpy (pfu->x1, pfu->x1 + 10, 10);

is undefined.



There's nothing undefined here - computing the pointer pointing
to one-after-the-last element of an array is valid (you are just
not allowed to dereference it).



Right, and memcpy dereferences it, so it's undefined.


That's interpretation of the standard that I don't share.


It's not an interpretation.  It's a basic rule of the languages
that the standards are explicit about.  In C11 you will find
this specified in detail in 6.5.6, paragraph 7 and 8 (of
particular relevance to your question below is p7: "a pointer
to an object that is not an element of an array behaves the same
as a pointer to the first element of an array of length one.")


I know.


Also, if I have struct f { int i; int j; };  and a int * that points
to the j member you say I have no standard conforming way
to get at a pointer to the i member from this, right?


Correct.  See above.


Because
the pointer points to an 'int' object.  But it also points within
a struct f object!  So at least maybe (int *)((char *)p - offsetof
(struct f, j))
should be valid?


No, not really.  It works in practice but it's not well-defined.
It doesn't matter how you get at the result.  What matters is
what you start with.  As Jeff said, to derive a pointer to
distinct suobjects of a larger object you need to start with
a pointer to the larger object and treat it as an array of
chars.


That's obviously not constraints people use C and C++ with so I see no
way to enforce this within gimple.


There's code out there that relies on all sorts of undefined
behavior.  It's a judgment call in each instance as to how much
of such code exists and how important it is.  In this case, I'd
expect it to be confined to low-level software like OS kernels
and such whose authors use C as a more convenient assembly
language to talk directly to the hardware.  Programmers in other
domains are usually more conscious of the requirements and limited
guarantees of the language and less willing to make assumptions
based on what this or that processor lets them get away with.

That being said, it certainly is possible to enforce this
constraint within GIMPLE.  My -Wrestrict patch does it to
an extent for the memory and string built-ins.  The -Warray-bounds
patch I submitted for offsets does it for all other expressions.
Neither patch exposed any such code in the Linux kernel, so it
doesn't look like abuses of this sort are common even in low-level
code.


Let me correct myself on this last point.  I forgot I had disabled
the offset checking for memcpy.  With it enabled, the enhanced
-Warray-bounds warning finds about a dozen distinct instances of
memcpy calls in the Linux kernel that cross the subobject boundary.

Most of them involve struct ieee80211_hdr and its adjacent unsigned
char array members.  I think it would be worthwhile to do a more
careful review and, if this turns out to be the only (or predominant)
use case, consider tightening up GCC checkers to allow only it but
not cases involving members of any other types (especially
non-arrays).  The most troublesome of those being variations on:

  struct S {
...
char array[N];
void (*pfunc)(void);
...
  };

  extern struct S *ps;

  memcpy (ps->array, data, N + X);


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-15 Thread Martin Sebor

On 12/15/2017 09:17 AM, Richard Biener wrote:

On December 15, 2017 4:58:14 PM GMT+01:00, Martin Sebor  
wrote:

On 12/15/2017 01:48 AM, Richard Biener wrote:

On Thu, Dec 14, 2017 at 5:01 PM, Martin Sebor 

wrote:

On 12/14/2017 03:43 AM, Richard Biener wrote:


On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor 

wrote:


On 12/12/2017 05:35 PM, Jeff Law wrote:



On 12/12/2017 01:15 PM, Martin Sebor wrote:



Bug 83373 - False positive reported by -Wstringop-overflow, is
another example of warning triggered by a missed optimization
opportunity, this time in the strlen pass.  The optimization
is discussed in pr78450 - strlen(s) return value can be assumed
to be less than the size of s.  The gist of it is that the

result

of strlen(array) can be assumed to be less than the size of
the array (except in the corner case of last struct members).

To avoid the false positive the attached patch adds this
optimization to the strlen pass.  Although the patch passes
bootstrap and regression tests for all front-ends I'm not sure
the way it determines the upper bound of the range is 100%
correct for languages with arrays with a non-zero lower bound.
Maybe it's just not as tight as it could be.



What about something hideous like

struct fu {
  char x1[10];
  char x2[10];
  int avoid_trailing_array;
}

Where objects stored in x1 are not null terminated.  Are we in

the realm

of undefined behavior at that point (I hope so)?




Yes, this is undefined.  Pointer arithmetic (either direct or
via standard library functions) is only defined for pointers
to the same object or subobject.  So even something like

 memcpy (pfu->x1, pfu->x1 + 10, 10);

is undefined.



There's nothing undefined here - computing the pointer pointing
to one-after-the-last element of an array is valid (you are just
not allowed to dereference it).



Right, and memcpy dereferences it, so it's undefined.


That's interpretation of the standard that I don't share.


It's not an interpretation.  It's a basic rule of the languages
that the standards are explicit about.  In C11 you will find
this specified in detail in 6.5.6, paragraph 7 and 8 (of
particular relevance to your question below is p7: "a pointer
to an object that is not an element of an array behaves the same
as a pointer to the first element of an array of length one.")


I know.


Also, if I have struct f { int i; int j; };  and a int * that points
to the j member you say I have no standard conforming way
to get at a pointer to the i member from this, right?


Correct.  See above.


Because
the pointer points to an 'int' object.  But it also points within
a struct f object!  So at least maybe (int *)((char *)p - offsetof
(struct f, j))
should be valid?


No, not really.  It works in practice but it's not well-defined.
It doesn't matter how you get at the result.  What matters is
what you start with.  As Jeff said, to derive a pointer to
distinct suobjects of a larger object you need to start with
a pointer to the larger object and treat it as an array of
chars.


That's obviously not constraints people use C and C++ with so I see no way to 
enforce this within gimple.


There's code out there that relies on all sorts of undefined
behavior.  It's a judgment call in each instance as to how much
of such code exists and how important it is.  In this case, I'd
expect it to be confined to low-level software like OS kernels
and such whose authors use C as a more convenient assembly
language to talk directly to the hardware.  Programmers in other
domains are usually more conscious of the requirements and limited
guarantees of the language and less willing to make assumptions
based on what this or that processor lets them get away with.

That being said, it certainly is possible to enforce this
constraint within GIMPLE.  My -Wrestrict patch does it to
an extent for the memory and string built-ins.  The -Warray-bounds
patch I submitted for offsets does it for all other expressions.
Neither patch exposed any such code in the Linux kernel, so it
doesn't look like abuses of this sort are common even in low-level
code.

Martin


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-15 Thread Richard Biener
On December 15, 2017 4:58:14 PM GMT+01:00, Martin Sebor  
wrote:
>On 12/15/2017 01:48 AM, Richard Biener wrote:
>> On Thu, Dec 14, 2017 at 5:01 PM, Martin Sebor 
>wrote:
>>> On 12/14/2017 03:43 AM, Richard Biener wrote:

 On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor 
>wrote:
>
> On 12/12/2017 05:35 PM, Jeff Law wrote:
>>
>>
>> On 12/12/2017 01:15 PM, Martin Sebor wrote:
>>>
>>>
>>> Bug 83373 - False positive reported by -Wstringop-overflow, is
>>> another example of warning triggered by a missed optimization
>>> opportunity, this time in the strlen pass.  The optimization
>>> is discussed in pr78450 - strlen(s) return value can be assumed
>>> to be less than the size of s.  The gist of it is that the
>result
>>> of strlen(array) can be assumed to be less than the size of
>>> the array (except in the corner case of last struct members).
>>>
>>> To avoid the false positive the attached patch adds this
>>> optimization to the strlen pass.  Although the patch passes
>>> bootstrap and regression tests for all front-ends I'm not sure
>>> the way it determines the upper bound of the range is 100%
>>> correct for languages with arrays with a non-zero lower bound.
>>> Maybe it's just not as tight as it could be.
>>
>>
>> What about something hideous like
>>
>> struct fu {
>>   char x1[10];
>>   char x2[10];
>>   int avoid_trailing_array;
>> }
>>
>> Where objects stored in x1 are not null terminated.  Are we in
>the realm
>> of undefined behavior at that point (I hope so)?
>
>
>
> Yes, this is undefined.  Pointer arithmetic (either direct or
> via standard library functions) is only defined for pointers
> to the same object or subobject.  So even something like
>
>  memcpy (pfu->x1, pfu->x1 + 10, 10);
>
> is undefined.


 There's nothing undefined here - computing the pointer pointing
 to one-after-the-last element of an array is valid (you are just
 not allowed to dereference it).
>>>
>>>
>>> Right, and memcpy dereferences it, so it's undefined.
>>
>> That's interpretation of the standard that I don't share.
>
>It's not an interpretation.  It's a basic rule of the languages
>that the standards are explicit about.  In C11 you will find
>this specified in detail in 6.5.6, paragraph 7 and 8 (of
>particular relevance to your question below is p7: "a pointer
>to an object that is not an element of an array behaves the same
>as a pointer to the first element of an array of length one.")

I know. 

>> Also, if I have struct f { int i; int j; };  and a int * that points
>> to the j member you say I have no standard conforming way
>> to get at a pointer to the i member from this, right?
>
>Correct.  See above.
>
>> Because
>> the pointer points to an 'int' object.  But it also points within
>> a struct f object!  So at least maybe (int *)((char *)p - offsetof
>> (struct f, j))
>> should be valid?
>
>No, not really.  It works in practice but it's not well-defined.
>It doesn't matter how you get at the result.  What matters is
>what you start with.  As Jeff said, to derive a pointer to
>distinct suobjects of a larger object you need to start with
>a pointer to the larger object and treat it as an array of
>chars.

That's obviously not constraints people use C and C++ with so I see no way to 
enforce this within gimple.

>> This means that pfu->x1 + 10 is a valid pointer
>> into *pfu no matter what you say and you can dereference it.
>
>No.
>
>As another hopefully more convincing example consider a multi-
>dimensional array A[2][2].  The value of the offset of A[i][j]
>is sizeof A[i] + j.  With that, the offset of A[1][0] is
>sizeof A[1] + 0, and so would be the offset of A[0][2]. But
>that doesn't make A[0][2] a valid reference to an element of
>A (because A[0] has only two elements, A[0][0] and A[0][1]),
>or [0] + 2 a derefernceable pointer.  It's a pointer that
>points just past the last element of the array A[0].  That
>there's another array right after A[0] (namely A[1]) is
>immaterial, same as in the struct f example above.

I know. Dependence analysis relies on this. We've had bugs in the past with gcc 
itself introducing such bogus references. 

Richard. 

>
>Martin



Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-15 Thread Martin Sebor

On 12/15/2017 01:48 AM, Richard Biener wrote:

On Thu, Dec 14, 2017 at 5:01 PM, Martin Sebor  wrote:

On 12/14/2017 03:43 AM, Richard Biener wrote:


On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor  wrote:


On 12/12/2017 05:35 PM, Jeff Law wrote:



On 12/12/2017 01:15 PM, Martin Sebor wrote:



Bug 83373 - False positive reported by -Wstringop-overflow, is
another example of warning triggered by a missed optimization
opportunity, this time in the strlen pass.  The optimization
is discussed in pr78450 - strlen(s) return value can be assumed
to be less than the size of s.  The gist of it is that the result
of strlen(array) can be assumed to be less than the size of
the array (except in the corner case of last struct members).

To avoid the false positive the attached patch adds this
optimization to the strlen pass.  Although the patch passes
bootstrap and regression tests for all front-ends I'm not sure
the way it determines the upper bound of the range is 100%
correct for languages with arrays with a non-zero lower bound.
Maybe it's just not as tight as it could be.



What about something hideous like

struct fu {
  char x1[10];
  char x2[10];
  int avoid_trailing_array;
}

Where objects stored in x1 are not null terminated.  Are we in the realm
of undefined behavior at that point (I hope so)?




Yes, this is undefined.  Pointer arithmetic (either direct or
via standard library functions) is only defined for pointers
to the same object or subobject.  So even something like

 memcpy (pfu->x1, pfu->x1 + 10, 10);

is undefined.



There's nothing undefined here - computing the pointer pointing
to one-after-the-last element of an array is valid (you are just
not allowed to dereference it).



Right, and memcpy dereferences it, so it's undefined.


That's interpretation of the standard that I don't share.


It's not an interpretation.  It's a basic rule of the languages
that the standards are explicit about.  In C11 you will find
this specified in detail in 6.5.6, paragraph 7 and 8 (of
particular relevance to your question below is p7: "a pointer
to an object that is not an element of an array behaves the same
as a pointer to the first element of an array of length one.")


Also, if I have struct f { int i; int j; };  and a int * that points
to the j member you say I have no standard conforming way
to get at a pointer to the i member from this, right?


Correct.  See above.


Because
the pointer points to an 'int' object.  But it also points within
a struct f object!  So at least maybe (int *)((char *)p - offsetof
(struct f, j))
should be valid?


No, not really.  It works in practice but it's not well-defined.
It doesn't matter how you get at the result.  What matters is
what you start with.  As Jeff said, to derive a pointer to
distinct suobjects of a larger object you need to start with
a pointer to the larger object and treat it as an array of
chars.


This means that pfu->x1 + 10 is a valid pointer
into *pfu no matter what you say and you can dereference it.


No.

As another hopefully more convincing example consider a multi-
dimensional array A[2][2].  The value of the offset of A[i][j]
is sizeof A[i] + j.  With that, the offset of A[1][0] is
sizeof A[1] + 0, and so would be the offset of A[0][2]. But
that doesn't make A[0][2] a valid reference to an element of
A (because A[0] has only two elements, A[0][0] and A[0][1]),
or [0] + 2 a derefernceable pointer.  It's a pointer that
points just past the last element of the array A[0].  That
there's another array right after A[0] (namely A[1]) is
immaterial, same as in the struct f example above.

Martin


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-15 Thread Richard Biener
On Thu, Dec 14, 2017 at 5:01 PM, Martin Sebor  wrote:
> On 12/14/2017 03:43 AM, Richard Biener wrote:
>>
>> On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor  wrote:
>>>
>>> On 12/12/2017 05:35 PM, Jeff Law wrote:


 On 12/12/2017 01:15 PM, Martin Sebor wrote:
>
>
> Bug 83373 - False positive reported by -Wstringop-overflow, is
> another example of warning triggered by a missed optimization
> opportunity, this time in the strlen pass.  The optimization
> is discussed in pr78450 - strlen(s) return value can be assumed
> to be less than the size of s.  The gist of it is that the result
> of strlen(array) can be assumed to be less than the size of
> the array (except in the corner case of last struct members).
>
> To avoid the false positive the attached patch adds this
> optimization to the strlen pass.  Although the patch passes
> bootstrap and regression tests for all front-ends I'm not sure
> the way it determines the upper bound of the range is 100%
> correct for languages with arrays with a non-zero lower bound.
> Maybe it's just not as tight as it could be.


 What about something hideous like

 struct fu {
   char x1[10];
   char x2[10];
   int avoid_trailing_array;
 }

 Where objects stored in x1 are not null terminated.  Are we in the realm
 of undefined behavior at that point (I hope so)?
>>>
>>>
>>>
>>> Yes, this is undefined.  Pointer arithmetic (either direct or
>>> via standard library functions) is only defined for pointers
>>> to the same object or subobject.  So even something like
>>>
>>>  memcpy (pfu->x1, pfu->x1 + 10, 10);
>>>
>>> is undefined.
>>
>>
>> There's nothing undefined here - computing the pointer pointing
>> to one-after-the-last element of an array is valid (you are just
>> not allowed to dereference it).
>
>
> Right, and memcpy dereferences it, so it's undefined.

That's interpretation of the standard that I don't share.

Also, if I have struct f { int i; int j; };  and a int * that points
to the j member you say I have no standard conforming way
to get at a pointer to the i member from this, right?  Because
the pointer points to an 'int' object.  But it also points within
a struct f object!  So at least maybe (int *)((char *)p - offsetof
(struct f, j))
should be valid?  This means that pfu->x1 + 10 is a valid pointer
into *pfu no matter what you say and you can dereference it.

Richard.

> Martin
>


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Jeff Law
On 12/14/2017 11:55 AM, Jakub Jelinek wrote:
> On Thu, Dec 14, 2017 at 11:51:26AM -0700, Martin Sebor wrote:
>>> Well, it would be nice to get sanitizers diagnose this at runtime.  If we
>>> know the array length at compile time, simply compare after the strlen
>>> call the result and fail if it returns something above it.  Or replace
>>> the strlen call with strnlen for the compile time known size and add
>>> instrumentation if strnlen returns the second argument.
>>
>> Sure, that sounds like a useful enhancement.  I'll look into
>> adding it as a follow-on patch unless you feel that it needs
>> to be part of the same package.
> 
> The problem is if we'll need changes to libubsan for that (which we'll
> likely do), then those need to be upstreamed, and e.g. my attempts
> to upstream simple patch to diagnose noreturn function returns is suspended
> upstream because clang doesn't have that support (and I have no interest
> in adding to to clang).
> 
> In theory we could have some GCC only file in there, but then we'd be ABI
> incompatible with them.
So defer the sanitization side until Clang catches up?

jeff


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Jakub Jelinek
On Thu, Dec 14, 2017 at 11:51:26AM -0700, Martin Sebor wrote:
> > Well, it would be nice to get sanitizers diagnose this at runtime.  If we
> > know the array length at compile time, simply compare after the strlen
> > call the result and fail if it returns something above it.  Or replace
> > the strlen call with strnlen for the compile time known size and add
> > instrumentation if strnlen returns the second argument.
> 
> Sure, that sounds like a useful enhancement.  I'll look into
> adding it as a follow-on patch unless you feel that it needs
> to be part of the same package.

The problem is if we'll need changes to libubsan for that (which we'll
likely do), then those need to be upstreamed, and e.g. my attempts
to upstream simple patch to diagnose noreturn function returns is suspended
upstream because clang doesn't have that support (and I have no interest
in adding to to clang).

In theory we could have some GCC only file in there, but then we'd be ABI
incompatible with them.

Jakub


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Martin Sebor

On 12/14/2017 09:18 AM, Jakub Jelinek wrote:

On Thu, Dec 14, 2017 at 09:13:21AM -0700, Jeff Law wrote:

Although I would prefer not to, I suppose if letting strlen cross
the boundaries of subobjects was considered an important use to
accommodate in limited cases the optimization could be disabled
for member arrays declared with the new nonstring attribute (while
still issuing a warning for it as GCC does today).

Another alternative (if the above use case is considered important
enough) might be to suppress the optimization for member character
arrays that are immediately followed by another such array.

History tells us that there will be someone out there that does this
kind of thing -- the question is how pervasive is it.  My suspicion is
that it is not common.

Given that I don't expect those uses to be common, the only thing that
should break is non-conforming code and we have a (new) warning for such
code my inclination is to go forward.

So I'm OK with the patch.  I'd give folks till Monday to chime in with
dissenting opinions.


Well, it would be nice to get sanitizers diagnose this at runtime.  If we
know the array length at compile time, simply compare after the strlen
call the result and fail if it returns something above it.  Or replace
the strlen call with strnlen for the compile time known size and add
instrumentation if strnlen returns the second argument.


Sure, that sounds like a useful enhancement.  I'll look into
adding it as a follow-on patch unless you feel that it needs
to be part of the same package.

Martin


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Jakub Jelinek
On Thu, Dec 14, 2017 at 09:13:21AM -0700, Jeff Law wrote:
> > Although I would prefer not to, I suppose if letting strlen cross
> > the boundaries of subobjects was considered an important use to
> > accommodate in limited cases the optimization could be disabled
> > for member arrays declared with the new nonstring attribute (while
> > still issuing a warning for it as GCC does today).
> > 
> > Another alternative (if the above use case is considered important
> > enough) might be to suppress the optimization for member character
> > arrays that are immediately followed by another such array.
> History tells us that there will be someone out there that does this
> kind of thing -- the question is how pervasive is it.  My suspicion is
> that it is not common.
> 
> Given that I don't expect those uses to be common, the only thing that
> should break is non-conforming code and we have a (new) warning for such
> code my inclination is to go forward.
> 
> So I'm OK with the patch.  I'd give folks till Monday to chime in with
> dissenting opinions.

Well, it would be nice to get sanitizers diagnose this at runtime.  If we
know the array length at compile time, simply compare after the strlen
call the result and fail if it returns something above it.  Or replace
the strlen call with strnlen for the compile time known size and add
instrumentation if strnlen returns the second argument.

Jakub


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Jeff Law
On 12/12/2017 08:47 PM, Martin Sebor wrote:
>>
>> struct fu {
>>   char x1[10];
>>   char x2[10];
>>   int avoid_trailing_array;
>> }
>>
>> Where objects stored in x1 are not null terminated.  Are we in the realm
>> of undefined behavior at that point (I hope so)?
> 
> Yes, this is undefined.  Pointer arithmetic (either direct or
> via standard library functions) is only defined for pointers
> to the same object or subobject.  So even something like
That's what I expected -- I just wanted to be sure there wasn't some
kind of escape clause that would allow one to compose an object of that
nature, then use pfu->x1 to read/write pfu->x2.


> 
>  memcpy (pfu->x1, pfu->x1 + 10, 10);
> 
> is undefined.  _FORTIFY_SOURCE allows it for raw memory functions
> because some low level code copies regions of the same object that
> span two or more members (I've seen an example or two in the Linux
> kernel but, IIRC, nowhere else).  With the patch, using memchr
> would be the only way to get away with it.
Presumably the right way to go about things in this situation is to use
a pointer to the outer object.

> 
> Other than that, the new -Wstringop-truncation warning is designed
> to prevent creating unterminated character arrays and the manual
> suggests using attribute nonstring when it's necessary.  The
> -Wrestrict warning I'm about to check in also complains about
> forming invalid pointers by built-ins with restrict-qualified
> arguments (although it won't diagnose the above).
Right.  The concern is that it's an new option and folks likely haven't
cleaned up their codebases for these kinds of issues.

> 
> Although I would prefer not to, I suppose if letting strlen cross
> the boundaries of subobjects was considered an important use to
> accommodate in limited cases the optimization could be disabled
> for member arrays declared with the new nonstring attribute (while
> still issuing a warning for it as GCC does today).
> 
> Another alternative (if the above use case is considered important
> enough) might be to suppress the optimization for member character
> arrays that are immediately followed by another such array.
History tells us that there will be someone out there that does this
kind of thing -- the question is how pervasive is it.  My suspicion is
that it is not common.

Given that I don't expect those uses to be common, the only thing that
should break is non-conforming code and we have a (new) warning for such
code my inclination is to go forward.

So I'm OK with the patch.  I'd give folks till Monday to chime in with
dissenting opinions.

Jeff


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Martin Sebor

On 12/14/2017 03:43 AM, Richard Biener wrote:

On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor  wrote:

On 12/12/2017 05:35 PM, Jeff Law wrote:


On 12/12/2017 01:15 PM, Martin Sebor wrote:


Bug 83373 - False positive reported by -Wstringop-overflow, is
another example of warning triggered by a missed optimization
opportunity, this time in the strlen pass.  The optimization
is discussed in pr78450 - strlen(s) return value can be assumed
to be less than the size of s.  The gist of it is that the result
of strlen(array) can be assumed to be less than the size of
the array (except in the corner case of last struct members).

To avoid the false positive the attached patch adds this
optimization to the strlen pass.  Although the patch passes
bootstrap and regression tests for all front-ends I'm not sure
the way it determines the upper bound of the range is 100%
correct for languages with arrays with a non-zero lower bound.
Maybe it's just not as tight as it could be.


What about something hideous like

struct fu {
  char x1[10];
  char x2[10];
  int avoid_trailing_array;
}

Where objects stored in x1 are not null terminated.  Are we in the realm
of undefined behavior at that point (I hope so)?



Yes, this is undefined.  Pointer arithmetic (either direct or
via standard library functions) is only defined for pointers
to the same object or subobject.  So even something like

 memcpy (pfu->x1, pfu->x1 + 10, 10);

is undefined.


There's nothing undefined here - computing the pointer pointing
to one-after-the-last element of an array is valid (you are just
not allowed to dereference it).


Right, and memcpy dereferences it, so it's undefined.

Martin



Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-14 Thread Richard Biener
On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor  wrote:
> On 12/12/2017 05:35 PM, Jeff Law wrote:
>>
>> On 12/12/2017 01:15 PM, Martin Sebor wrote:
>>>
>>> Bug 83373 - False positive reported by -Wstringop-overflow, is
>>> another example of warning triggered by a missed optimization
>>> opportunity, this time in the strlen pass.  The optimization
>>> is discussed in pr78450 - strlen(s) return value can be assumed
>>> to be less than the size of s.  The gist of it is that the result
>>> of strlen(array) can be assumed to be less than the size of
>>> the array (except in the corner case of last struct members).
>>>
>>> To avoid the false positive the attached patch adds this
>>> optimization to the strlen pass.  Although the patch passes
>>> bootstrap and regression tests for all front-ends I'm not sure
>>> the way it determines the upper bound of the range is 100%
>>> correct for languages with arrays with a non-zero lower bound.
>>> Maybe it's just not as tight as it could be.
>>
>> What about something hideous like
>>
>> struct fu {
>>   char x1[10];
>>   char x2[10];
>>   int avoid_trailing_array;
>> }
>>
>> Where objects stored in x1 are not null terminated.  Are we in the realm
>> of undefined behavior at that point (I hope so)?
>
>
> Yes, this is undefined.  Pointer arithmetic (either direct or
> via standard library functions) is only defined for pointers
> to the same object or subobject.  So even something like
>
>  memcpy (pfu->x1, pfu->x1 + 10, 10);
>
> is undefined.

There's nothing undefined here - computing the pointer pointing
to one-after-the-last element of an array is valid (you are just
not allowed to dereference it).

>  _FORTIFY_SOURCE allows it for raw memory functions
> because some low level code copies regions of the same object that
> span two or more members (I've seen an example or two in the Linux
> kernel but, IIRC, nowhere else).  With the patch, using memchr
> would be the only way to get away with it.
>
> Other than that, the new -Wstringop-truncation warning is designed
> to prevent creating unterminated character arrays and the manual
> suggests using attribute nonstring when it's necessary.  The
> -Wrestrict warning I'm about to check in also complains about
> forming invalid pointers by built-ins with restrict-qualified
> arguments (although it won't diagnose the above).
>
> Although I would prefer not to, I suppose if letting strlen cross
> the boundaries of subobjects was considered an important use to
> accommodate in limited cases the optimization could be disabled
> for member arrays declared with the new nonstring attribute (while
> still issuing a warning for it as GCC does today).
>
> Another alternative (if the above use case is considered important
> enough) might be to suppress the optimization for member character
> arrays that are immediately followed by another such array.
>
> Martin


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-13 Thread Martin Sebor

On 12/13/2017 12:25 AM, Bernhard Reutner-Fischer wrote:

On 12 December 2017 21:15:25 CET, Martin Sebor  wrote:



Tested on x86_64-linux.


I assume this test worked even before this patch.


Of the tests added by the patch, strlenopt-37.c passes without
the compiler changes and strlenopt-36.c fails.  Both are expected.
-37.c verifies the optimization doesn't happen when it shouldn't
happen and -36.c verifies it does take place when it's safe.

The regression test for the warning fails without the patch and
passes with the patch applied.


Thus:

s/oveflow/overflow/


Thanks.  I'll fix that if/when the patch is approved.

Martin



thanks,





Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-12 Thread Bernhard Reutner-Fischer
On 12 December 2017 21:15:25 CET, Martin Sebor  wrote:

>
>Tested on x86_64-linux.

I assume this test worked even before this patch. Thus: 

s/oveflow/overflow/

thanks, 


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-12 Thread Martin Sebor

On 12/12/2017 05:35 PM, Jeff Law wrote:

On 12/12/2017 01:15 PM, Martin Sebor wrote:

Bug 83373 - False positive reported by -Wstringop-overflow, is
another example of warning triggered by a missed optimization
opportunity, this time in the strlen pass.  The optimization
is discussed in pr78450 - strlen(s) return value can be assumed
to be less than the size of s.  The gist of it is that the result
of strlen(array) can be assumed to be less than the size of
the array (except in the corner case of last struct members).

To avoid the false positive the attached patch adds this
optimization to the strlen pass.  Although the patch passes
bootstrap and regression tests for all front-ends I'm not sure
the way it determines the upper bound of the range is 100%
correct for languages with arrays with a non-zero lower bound.
Maybe it's just not as tight as it could be.

What about something hideous like

struct fu {
  char x1[10];
  char x2[10];
  int avoid_trailing_array;
}

Where objects stored in x1 are not null terminated.  Are we in the realm
of undefined behavior at that point (I hope so)?


Yes, this is undefined.  Pointer arithmetic (either direct or
via standard library functions) is only defined for pointers
to the same object or subobject.  So even something like

 memcpy (pfu->x1, pfu->x1 + 10, 10);

is undefined.  _FORTIFY_SOURCE allows it for raw memory functions
because some low level code copies regions of the same object that
span two or more members (I've seen an example or two in the Linux
kernel but, IIRC, nowhere else).  With the patch, using memchr
would be the only way to get away with it.

Other than that, the new -Wstringop-truncation warning is designed
to prevent creating unterminated character arrays and the manual
suggests using attribute nonstring when it's necessary.  The
-Wrestrict warning I'm about to check in also complains about
forming invalid pointers by built-ins with restrict-qualified
arguments (although it won't diagnose the above).

Although I would prefer not to, I suppose if letting strlen cross
the boundaries of subobjects was considered an important use to
accommodate in limited cases the optimization could be disabled
for member arrays declared with the new nonstring attribute (while
still issuing a warning for it as GCC does today).

Another alternative (if the above use case is considered important
enough) might be to suppress the optimization for member character
arrays that are immediately followed by another such array.

Martin


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-12 Thread Jeff Law
On 12/12/2017 01:15 PM, Martin Sebor wrote:
> Bug 83373 - False positive reported by -Wstringop-overflow, is
> another example of warning triggered by a missed optimization
> opportunity, this time in the strlen pass.  The optimization
> is discussed in pr78450 - strlen(s) return value can be assumed
> to be less than the size of s.  The gist of it is that the result
> of strlen(array) can be assumed to be less than the size of
> the array (except in the corner case of last struct members).
> 
> To avoid the false positive the attached patch adds this
> optimization to the strlen pass.  Although the patch passes
> bootstrap and regression tests for all front-ends I'm not sure
> the way it determines the upper bound of the range is 100%
> correct for languages with arrays with a non-zero lower bound.
> Maybe it's just not as tight as it could be.
What about something hideous like

struct fu {
  char x1[10];
  char x2[10];
  int avoid_trailing_array;
}

Where objects stored in x1 are not null terminated.  Are we in the realm
of undefined behavior at that point (I hope so)?

jeff