Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-29 Thread Jeff Law
On 08/02/2018 12:56 PM, Bernd Edlinger wrote:
> On 08/02/18 15:26, Bernd Edlinger wrote:
>>>
>>>    /* If the length can be computed at compile-time, return it.  */
>>> -  len = c_strlen (src, 0);
>>> +  tree array;
>>> +  tree len = c_strlen (src, 0, );
>>
>> You know the c_strlen tries to compute wide character sizes,
>> but strlen does not do that, strlen (L"abc") should give 1
>> (or 0 on a BE machine)
>> I wonder if that is correct.
>>
> [snip]
>>>
>>>  static tree
>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>  {
>>>    if (!validate_arg (arg, POINTER_TYPE))
>>>  return NULL_TREE;
>>>    else
>>>  {
>>> -  tree len = c_strlen (arg, 0);
>>> -
>>> +  tree arr = NULL_TREE;
>>> +  tree len = c_strlen (arg, 0, );
>>
>> Is it possible to write a test case where strlen(L"test") reaches this point?
>> what will c_strlen return then?
>>
> 
> Yes, of course it is:
> 
> $ cat y.c
> int f(char *x)
> {
>return __builtin_strlen(x);
> }
> 
> int main ()
> {
>return f((char*)"abcdef"[0]);
> }
FWIW, I've twiddled this a bit and included it in Martin's patch for
86711/86714.  THe proper return value is 0 or 1 depending on endianness.


Jeff


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-26 Thread Bernd Edlinger
On 08/25/18 22:42, Martin Sebor wrote:
> On 08/25/2018 01:32 PM, Bernd Edlinger wrote:
>> On 08/25/18 21:02, Jeff Law wrote:
>>> On 08/25/2018 12:36 PM, Bernd Edlinger wrote:
>>>
>>>
>>
>> Well, ya call it "layer one patch over the other"
>> I call it "incremental improvements".
> It is (of course) a case by case basis.  The way I try to look at these
> things is to ask whether or not the first patch under consideration
> would have any value/purpose after the second patch was installed.  If
> so, then it may make sense to include both.  If not, then we really just
> want one patch.
>

 Agreed.  I think the question is which of the possible STRING_CST
 semantics we want to have in the end (the middle-end).
 Everything builds on top of the semantic properties of STRING_CSTs.
>>> This certainly plays a role.  I bumped pretty hard against the
>>> STRING_CST semantics issue with Martin's patch.  I'm hoping that making
>>> those more consistent will ultimately simplify things and avoid the
>>> problems I'm stumbling over.
>>>
>>> Of course, that means more delays in getting this sorted out.  I really
>>> thought I had a viable plan a couple days ago, but I'm having to rethink
>>> in light of some of the issues raised.
>>>
>>
>> I think we should slow down.
> 
> That's coming from someone whose been piling on revisions upon
> revisions of your own work here as you change your mind about
> whether STRING_CST should or shouldn't have a nul byte at
> the end.  You've been doing nothing but slowing us down for
> the last five weeks.
> 

Yes, you know, I may be stubborn, but my point of view regarding
these matters is not set in stone.

And it happens that I listen to what others say, including you.

And if after careful consideration I find myself in agreement with
what was said, I just take the freedom to change my point of view.

I for one have learned a lot out of this discussion and out of
writing and repeatedly revising my patches.

I truly hope others have as well gained some new insights.


Thanks
Bernd.


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-25 Thread Jeff Law
On 08/25/2018 01:32 PM, Bernd Edlinger wrote:
> On 08/25/18 21:02, Jeff Law wrote:
>> On 08/25/2018 12:36 PM, Bernd Edlinger wrote:
>>
>>
>
> Well, ya call it "layer one patch over the other"
> I call it "incremental improvements".
 It is (of course) a case by case basis.  The way I try to look at these
 things is to ask whether or not the first patch under consideration
 would have any value/purpose after the second patch was installed.  If
 so, then it may make sense to include both.  If not, then we really just
 want one patch.

>>>
>>> Agreed.  I think the question is which of the possible STRING_CST
>>> semantics we want to have in the end (the middle-end).
>>> Everything builds on top of the semantic properties of STRING_CSTs.
>> This certainly plays a role.  I bumped pretty hard against the
>> STRING_CST semantics issue with Martin's patch.  I'm hoping that making
>> those more consistent will ultimately simplify things and avoid the
>> problems I'm stumbling over.
>>
>> Of course, that means more delays in getting this sorted out.  I really
>> thought I had a viable plan a couple days ago, but I'm having to rethink
>> in light of some of the issues raised.
>>
> 
> I think we should slow down.
> 
>>
>>>
>>> My first attempt of fix the STRING_CST semantic was trying to make
>>> string_constant happy.
>>>
>>> My second attempt is trying to make Richard happy.  And when I look
>>> at both patches, I think the second one is better, and more simple.
>> In general I've found that Richie's advice generally results in a
>> cleaner implementation ;-)
>>>
>>>
>>> BTW I need to correct on statement in my last e-mail:
>>>
>>> On 08/24/18 23:55, Bernd Edlinger wrote:>
 here I quote form martins 1/6 patch:
> +  /* Compute the lower bound number of elements (not bytes) in the array
> + that the string is used to initialize.  The actual size of the array
> + will be may be greater if the string is shorter, but the important
> + data point is whether the literal, including the terminating nul,
> + fits in the array. */
> +  unsigned HOST_WIDE_INT array_elts
> += tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
> +
> +  /* Compute the string length in (wide) characters.  */

 So prior to my STRING_CST patch this will ICE on a flexible array member,
 because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.

 I used:

compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
  TREE_STRING_LENGTH (init))

 and this will not ICE with NULL, but consider it like infinity,
 and return 1.

 So my version did not ICE in that case.

>>>
>>> Oooops,
>>>
>>> actually both versions will likely ICE on TYPE_SIZE_UNIT == NULL.
>>> I actually tried to test that case, but have done something wrong.
>>>
>>> I hope we can get rid if the incomplete types in the middle-end.
>>>
>>> So maybe just inject "if (!tree_fits_uhwi_p (...)) return NULL_TREE;"
>>> here?   Or maybe just defer until we have clarity about the semantics.
>> Not sure.  I've tried largely to not let VLA issues drive anything here.
>>   I'm not a fan of them for a variety of reasons and thus I tend to look
>> at all the VLA stuff as exceptional cases.
>>
> 
> We should have a test case with flexible array members, those behave
> slightly different than VLAs.
> 
> struct {
>int i;
>char x[];
> } s;
> 
> const struct s s = { 1, "test" };
> 
> int f()
> {
>return strlen(s.x);
> }
> 
> 
> By the way, when you change so much on Martin's patch it might be
> good to post it again on this list, when it's ready, so maybe I can have a
> look at it and help out with some review comments?  (please put me on CC)
Actually very little changed other than mechanical stuff.  It's one
chunk of code that assumed STRING_CSTs are always NUL terminated that's
problem with that code in isolation.

But I also have to evaluate your patches in the same area and also look
at the known follow-ups to see how the various patches are likely to
interact.

And each time core assumptions/issues are reexamined parts of the
evaluation have to be redone.  In some cases they may simplify, in
others they make the analysis more complex.

Jeff
> 
> 
> Bernd.
> 



Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-25 Thread Martin Sebor

On 08/25/2018 01:32 PM, Bernd Edlinger wrote:

On 08/25/18 21:02, Jeff Law wrote:

On 08/25/2018 12:36 PM, Bernd Edlinger wrote:





Well, ya call it "layer one patch over the other"
I call it "incremental improvements".

It is (of course) a case by case basis.  The way I try to look at these
things is to ask whether or not the first patch under consideration
would have any value/purpose after the second patch was installed.  If
so, then it may make sense to include both.  If not, then we really just
want one patch.



Agreed.  I think the question is which of the possible STRING_CST
semantics we want to have in the end (the middle-end).
Everything builds on top of the semantic properties of STRING_CSTs.

This certainly plays a role.  I bumped pretty hard against the
STRING_CST semantics issue with Martin's patch.  I'm hoping that making
those more consistent will ultimately simplify things and avoid the
problems I'm stumbling over.

Of course, that means more delays in getting this sorted out.  I really
thought I had a viable plan a couple days ago, but I'm having to rethink
in light of some of the issues raised.



I think we should slow down.


That's coming from someone whose been piling on revisions upon
revisions of your own work here as you change your mind about
whether STRING_CST should or shouldn't have a nul byte at
the end.  You've been doing nothing but slowing us down for
the last five weeks.

There's nothing in this basic patch that cannot be easily adjusted
if something changes in the future.  But there is quite a bit of
useful work already that builds on the basic infrastructure in it
that could move forward and that wouldn't be significantly affected
by changes to the underlying representation.

Martin


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-25 Thread Bernd Edlinger
On 08/25/18 21:02, Jeff Law wrote:
> On 08/25/2018 12:36 PM, Bernd Edlinger wrote:
> 
>

 Well, ya call it "layer one patch over the other"
 I call it "incremental improvements".
>>> It is (of course) a case by case basis.  The way I try to look at these
>>> things is to ask whether or not the first patch under consideration
>>> would have any value/purpose after the second patch was installed.  If
>>> so, then it may make sense to include both.  If not, then we really just
>>> want one patch.
>>>
>>
>> Agreed.  I think the question is which of the possible STRING_CST
>> semantics we want to have in the end (the middle-end).
>> Everything builds on top of the semantic properties of STRING_CSTs.
> This certainly plays a role.  I bumped pretty hard against the
> STRING_CST semantics issue with Martin's patch.  I'm hoping that making
> those more consistent will ultimately simplify things and avoid the
> problems I'm stumbling over.
> 
> Of course, that means more delays in getting this sorted out.  I really
> thought I had a viable plan a couple days ago, but I'm having to rethink
> in light of some of the issues raised.
> 

I think we should slow down.

> 
>>
>> My first attempt of fix the STRING_CST semantic was trying to make
>> string_constant happy.
>>
>> My second attempt is trying to make Richard happy.  And when I look
>> at both patches, I think the second one is better, and more simple.
> In general I've found that Richie's advice generally results in a
> cleaner implementation ;-)
>>
>>
>> BTW I need to correct on statement in my last e-mail:
>>
>> On 08/24/18 23:55, Bernd Edlinger wrote:>
>>> here I quote form martins 1/6 patch:
 +  /* Compute the lower bound number of elements (not bytes) in the array
 + that the string is used to initialize.  The actual size of the array
 + will be may be greater if the string is shorter, but the important
 + data point is whether the literal, including the terminating nul,
 + fits in the array. */
 +  unsigned HOST_WIDE_INT array_elts
 += tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
 +
 +  /* Compute the string length in (wide) characters.  */
>>>
>>> So prior to my STRING_CST patch this will ICE on a flexible array member,
>>> because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.
>>>
>>> I used:
>>>
>>>compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
>>>  TREE_STRING_LENGTH (init))
>>>
>>> and this will not ICE with NULL, but consider it like infinity,
>>> and return 1.
>>>
>>> So my version did not ICE in that case.
>>>
>>
>> Oooops,
>>
>> actually both versions will likely ICE on TYPE_SIZE_UNIT == NULL.
>> I actually tried to test that case, but have done something wrong.
>>
>> I hope we can get rid if the incomplete types in the middle-end.
>>
>> So maybe just inject "if (!tree_fits_uhwi_p (...)) return NULL_TREE;"
>> here?   Or maybe just defer until we have clarity about the semantics.
> Not sure.  I've tried largely to not let VLA issues drive anything here.
>   I'm not a fan of them for a variety of reasons and thus I tend to look
> at all the VLA stuff as exceptional cases.
> 

We should have a test case with flexible array members, those behave
slightly different than VLAs.

struct {
   int i;
   char x[];
} s;

const struct s s = { 1, "test" };

int f()
{
   return strlen(s.x);
}


By the way, when you change so much on Martin's patch it might be
good to post it again on this list, when it's ready, so maybe I can have a
look at it and help out with some review comments?  (please put me on CC)


Bernd.


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-25 Thread Jeff Law
On 08/25/2018 12:36 PM, Bernd Edlinger wrote:


>>>
>>> Well, ya call it "layer one patch over the other"
>>> I call it "incremental improvements".
>> It is (of course) a case by case basis.  The way I try to look at these
>> things is to ask whether or not the first patch under consideration
>> would have any value/purpose after the second patch was installed.  If
>> so, then it may make sense to include both.  If not, then we really just
>> want one patch.
>>
> 
> Agreed.  I think the question is which of the possible STRING_CST
> semantics we want to have in the end (the middle-end).
> Everything builds on top of the semantic properties of STRING_CSTs.
This certainly plays a role.  I bumped pretty hard against the
STRING_CST semantics issue with Martin's patch.  I'm hoping that making
those more consistent will ultimately simplify things and avoid the
problems I'm stumbling over.

Of course, that means more delays in getting this sorted out.  I really
thought I had a viable plan a couple days ago, but I'm having to rethink
in light of some of the issues raised.


> 
> My first attempt of fix the STRING_CST semantic was trying to make
> string_constant happy.
> 
> My second attempt is trying to make Richard happy.  And when I look
> at both patches, I think the second one is better, and more simple.
In general I've found that Richie's advice generally results in a
cleaner implementation ;-)
> 
> 
> BTW I need to correct on statement in my last e-mail:
> 
> On 08/24/18 23:55, Bernd Edlinger wrote:>
>> here I quote form martins 1/6 patch:
>>> +  /* Compute the lower bound number of elements (not bytes) in the array
>>> + that the string is used to initialize.  The actual size of the array
>>> + will be may be greater if the string is shorter, but the important
>>> + data point is whether the literal, including the terminating nul,
>>> + fits in the array. */
>>> +  unsigned HOST_WIDE_INT array_elts
>>> += tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
>>> +
>>> +  /* Compute the string length in (wide) characters.  */
>>
>> So prior to my STRING_CST patch this will ICE on a flexible array member,
>> because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.
>>
>> I used:
>>
>>   compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
>> TREE_STRING_LENGTH (init))
>>
>> and this will not ICE with NULL, but consider it like infinity,
>> and return 1.
>>
>> So my version did not ICE in that case.
>>
> 
> Oooops,
> 
> actually both versions will likely ICE on TYPE_SIZE_UNIT == NULL.
> I actually tried to test that case, but have done something wrong.
> 
> I hope we can get rid if the incomplete types in the middle-end.
> 
> So maybe just inject "if (!tree_fits_uhwi_p (...)) return NULL_TREE;"
> here?   Or maybe just defer until we have clarity about the semantics.
Not sure.  I've tried largely to not let VLA issues drive anything here.
 I'm not a fan of them for a variety of reasons and thus I tend to look
at all the VLA stuff as exceptional cases.

jeff


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-25 Thread Bernd Edlinger
On 08/25/18 19:32, Jeff Law wrote:
> On 08/25/2018 12:32 AM, Bernd Edlinger wrote:
>> On 08/25/18 01:54, Jeff Law wrote:
>>> On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
 On 08/24/18 18:51, Jeff Law wrote:
>> Well, this is broken for wide character strings.
>> but I hope we can get rid of STRING_CST which are
>> not explicitly null terminated.

 I am afraid that is not going to happen.
 Maybe we can get STRING_CST that are never longer
 than the TYPE_UNIT_SIZE, but c_strlen and c_getstr
 need to take care that the string is zero-terminated.

 string_constant, should not promise the string is zero terminated.
 But instead it can promise that:
 1) the STRING_CST is valid up to TREE_STRING_LENGTH
 2) mem_size is >= TREE_STRING_LENGTH
 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.

 It will not guarantee anything about zero termination any more.
>>> Interesting because those conditions would be sufficient to deal with a
>>> regression I stumbled over after fixing Martin's patch to not assume
>>> that all STRING_CSTs are NUL terminated.
>>>
>>> But I need to think about this a bit more.  Essentially the question
>>> we'd need to ask is whether or not these are sufficient in general or
>>> just in specific cases.
>>>
>>> I tend to think they're not sufficient in general. If a string returned
>>> by string_constant that didn't have a terminating NUL, but which did
>>> pass the tests above were ultimately passed to the runtime's str*
>>> routines, then the call may run off the end of the string.  We'd like to
>>> be able to warn for that.
>>>
>>> So ISTM those rules are only valid in contexts where we know the result
>>> isn't going to be passed to str* and friends within the C library.
>>>
>>> I do think they're sufficient to avoid problems with the
>>> tree-ssa-forwprop code we've looked at.  So what may make the most sense
>>> is to have that routine indicate it's willing to accept unterminated
>>> strings, then check the conditions above before optimizing the code.
>>>
>>
>> There are not too many callers of string_constant.
>> Not all need zero termination.
> Right.  And in retrospect we probably should have avoided default
> parameter overloads and just fixed the callers.  But that can be a
> follow-up.
> 
>>
>> But I think if the are interested in zero-termination
>> they should simply call c_strlen or c_getstr.
> Perhaps.
> 
> 
>>
>>

 In the end, the best approach might be to either merge my patch
 with Martins, or step-wise, first fixing wrong code, and then
 implementing warnings without fixing wrong code.
>>> Unsure at this time.  I've been working with both.  I suspect that if we
>>> went with yours that we'd then turn around and layer Martin's on top of
>>> it because of the desire to signal to callers that we have an
>>> unterminated string and have the callers take appropriate action.  Which
>>> begs the question of whether or not we just go with Martin's -- ie, is
>>> there really any value in using both.  I haven't seen indications there
>>> is value in that approach, but I'm still poking at things.
>>>
>>
>> Well, ya call it "layer one patch over the other"
>> I call it "incremental improvements".
> It is (of course) a case by case basis.  The way I try to look at these
> things is to ask whether or not the first patch under consideration
> would have any value/purpose after the second patch was installed.  If
> so, then it may make sense to include both.  If not, then we really just
> want one patch.
> 

Agreed.  I think the question is which of the possible STRING_CST
semantics we want to have in the end (the middle-end).
Everything builds on top of the semantic properties of STRING_CSTs.

My first attempt of fix the STRING_CST semantic was trying to make
string_constant happy.

My second attempt is trying to make Richard happy.  And when I look
at both patches, I think the second one is better, and more simple.


BTW I need to correct on statement in my last e-mail:

On 08/24/18 23:55, Bernd Edlinger wrote:>
> here I quote form martins 1/6 patch:
>> +  /* Compute the lower bound number of elements (not bytes) in the array
>> + that the string is used to initialize.  The actual size of the array
>> + will be may be greater if the string is shorter, but the important
>> + data point is whether the literal, including the terminating nul,
>> + fits in the array. */
>> +  unsigned HOST_WIDE_INT array_elts
>> += tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
>> +
>> +  /* Compute the string length in (wide) characters.  */
> 
> So prior to my STRING_CST patch this will ICE on a flexible array member,
> because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.
> 
> I used:
> 
>   compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
> TREE_STRING_LENGTH (init))
> 
> and this will not ICE with NULL, but consider it like infinity,
> and 

Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-25 Thread Jeff Law
On 08/25/2018 12:32 AM, Bernd Edlinger wrote:
> On 08/25/18 01:54, Jeff Law wrote:
>> On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
>>> On 08/24/18 18:51, Jeff Law wrote:
> Well, this is broken for wide character strings.
> but I hope we can get rid of STRING_CST which are
> not explicitly null terminated.
>>>
>>> I am afraid that is not going to happen.
>>> Maybe we can get STRING_CST that are never longer
>>> than the TYPE_UNIT_SIZE, but c_strlen and c_getstr
>>> need to take care that the string is zero-terminated.
>>>
>>> string_constant, should not promise the string is zero terminated.
>>> But instead it can promise that:
>>> 1) the STRING_CST is valid up to TREE_STRING_LENGTH
>>> 2) mem_size is >= TREE_STRING_LENGTH
>>> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.
>>>
>>> It will not guarantee anything about zero termination any more.
>> Interesting because those conditions would be sufficient to deal with a
>> regression I stumbled over after fixing Martin's patch to not assume
>> that all STRING_CSTs are NUL terminated.
>>
>> But I need to think about this a bit more.  Essentially the question
>> we'd need to ask is whether or not these are sufficient in general or
>> just in specific cases.
>>
>> I tend to think they're not sufficient in general. If a string returned
>> by string_constant that didn't have a terminating NUL, but which did
>> pass the tests above were ultimately passed to the runtime's str*
>> routines, then the call may run off the end of the string.  We'd like to
>> be able to warn for that.
>>
>> So ISTM those rules are only valid in contexts where we know the result
>> isn't going to be passed to str* and friends within the C library.
>>
>> I do think they're sufficient to avoid problems with the
>> tree-ssa-forwprop code we've looked at.  So what may make the most sense
>> is to have that routine indicate it's willing to accept unterminated
>> strings, then check the conditions above before optimizing the code.
>>
> 
> There are not too many callers of string_constant.
> Not all need zero termination.
Right.  And in retrospect we probably should have avoided default
parameter overloads and just fixed the callers.  But that can be a
follow-up.

> 
> But I think if the are interested in zero-termination
> they should simply call c_strlen or c_getstr.
Perhaps.


> 
> 
>>>
>>> In the end, the best approach might be to either merge my patch
>>> with Martins, or step-wise, first fixing wrong code, and then
>>> implementing warnings without fixing wrong code.
>> Unsure at this time.  I've been working with both.  I suspect that if we
>> went with yours that we'd then turn around and layer Martin's on top of
>> it because of the desire to signal to callers that we have an
>> unterminated string and have the callers take appropriate action.  Which
>> begs the question of whether or not we just go with Martin's -- ie, is
>> there really any value in using both.  I haven't seen indications there
>> is value in that approach, but I'm still poking at things.
>>
> 
> Well, ya call it "layer one patch over the other"
> I call it "incremental improvements".
It is (of course) a case by case basis.  The way I try to look at these
things is to ask whether or not the first patch under consideration
would have any value/purpose after the second patch was installed.  If
so, then it may make sense to include both.  If not, then we really just
want one patch.

Jeff


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-25 Thread Bernd Edlinger
On 08/25/18 01:54, Jeff Law wrote:
> On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
>> On 08/24/18 18:51, Jeff Law wrote:
 Well, this is broken for wide character strings.
 but I hope we can get rid of STRING_CST which are
 not explicitly null terminated.
>>
>> I am afraid that is not going to happen.
>> Maybe we can get STRING_CST that are never longer
>> than the TYPE_UNIT_SIZE, but c_strlen and c_getstr
>> need to take care that the string is zero-terminated.
>>
>> string_constant, should not promise the string is zero terminated.
>> But instead it can promise that:
>> 1) the STRING_CST is valid up to TREE_STRING_LENGTH
>> 2) mem_size is >= TREE_STRING_LENGTH
>> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.
>>
>> It will not guarantee anything about zero termination any more.
> Interesting because those conditions would be sufficient to deal with a
> regression I stumbled over after fixing Martin's patch to not assume
> that all STRING_CSTs are NUL terminated.
> 
> But I need to think about this a bit more.  Essentially the question
> we'd need to ask is whether or not these are sufficient in general or
> just in specific cases.
> 
> I tend to think they're not sufficient in general. If a string returned
> by string_constant that didn't have a terminating NUL, but which did
> pass the tests above were ultimately passed to the runtime's str*
> routines, then the call may run off the end of the string.  We'd like to
> be able to warn for that.
> 
> So ISTM those rules are only valid in contexts where we know the result
> isn't going to be passed to str* and friends within the C library.
> 
> I do think they're sufficient to avoid problems with the
> tree-ssa-forwprop code we've looked at.  So what may make the most sense
> is to have that routine indicate it's willing to accept unterminated
> strings, then check the conditions above before optimizing the code.
> 

There are not too many callers of string_constant.
Not all need zero termination.

But I think if the are interested in zero-termination
they should simply call c_strlen or c_getstr.


>>
>> In the end, the best approach might be to either merge my patch
>> with Martins, or step-wise, first fixing wrong code, and then
>> implementing warnings without fixing wrong code.
> Unsure at this time.  I've been working with both.  I suspect that if we
> went with yours that we'd then turn around and layer Martin's on top of
> it because of the desire to signal to callers that we have an
> unterminated string and have the callers take appropriate action.  Which
> begs the question of whether or not we just go with Martin's -- ie, is
> there really any value in using both.  I haven't seen indications there
> is value in that approach, but I'm still poking at things.
> 

Well, ya call it "layer one patch over the other"
I call it "incremental improvements".


Bernd.



Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-24 Thread Jeff Law
On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
> On 08/24/18 18:51, Jeff Law wrote:
>>> Well, this is broken for wide character strings.
>>> but I hope we can get rid of STRING_CST which are
>>> not explicitly null terminated.
> 
> I am afraid that is not going to happen.
> Maybe we can get STRING_CST that are never longer
> than the TYPE_UNIT_SIZE, but c_strlen and c_getstr
> need to take care that the string is zero-terminated.
> 
> string_constant, should not promise the string is zero terminated.
> But instead it can promise that:
> 1) the STRING_CST is valid up to TREE_STRING_LENGTH
> 2) mem_size is >= TREE_STRING_LENGTH
> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.
> 
> It will not guarantee anything about zero termination any more.
Interesting because those conditions would be sufficient to deal with a
regression I stumbled over after fixing Martin's patch to not assume
that all STRING_CSTs are NUL terminated.

But I need to think about this a bit more.  Essentially the question
we'd need to ask is whether or not these are sufficient in general or
just in specific cases.

I tend to think they're not sufficient in general. If a string returned
by string_constant that didn't have a terminating NUL, but which did
pass the tests above were ultimately passed to the runtime's str*
routines, then the call may run off the end of the string.  We'd like to
be able to warn for that.

So ISTM those rules are only valid in contexts where we know the result
isn't going to be passed to str* and friends within the C library.

I do think they're sufficient to avoid problems with the
tree-ssa-forwprop code we've looked at.  So what may make the most sense
is to have that routine indicate it's willing to accept unterminated
strings, then check the conditions above before optimizing the code.

> 
> In the end, the best approach might be to either merge my patch
> with Martins, or step-wise, first fixing wrong code, and then
> implementing warnings without fixing wrong code.
Unsure at this time.  I've been working with both.  I suspect that if we
went with yours that we'd then turn around and layer Martin's on top of
it because of the desire to signal to callers that we have an
unterminated string and have the callers take appropriate action.  Which
begs the question of whether or not we just go with Martin's -- ie, is
there really any value in using both.  I haven't seen indications there
is value in that approach, but I'm still poking at things.

Jeff


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-24 Thread Bernd Edlinger
On 08/24/18 18:04, Jeff Law wrote:
> On 08/24/2018 06:27 AM, Bernd Edlinger wrote:
> [ Lots of snipping throughout ]
> 
> 

> +
>  if (TREE_CODE (src) == COND_EXPR
>  && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0
>{
> -  tree len1, len2;
> -
> -  len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
> -  len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
> +  tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
> +  tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
>  if (tree_int_cst_equal (len1, len2))
> - return len1;
> + {

 funny, if called with NULL *arr and arrs[0] alias each other.
>>>

> +   *arr = arrs[0] ? arrs[0] : arrs[1];
> +   return len1;
> + }
>}
>>>
>>> And in this case it looks like your comment is before the code you're
>>> commenting about.  It was fairly obvious in this case because the code
>>> prior to your "funny, if called..." comment didn't reference arr or arrs
>>> at all.
>>>
>>> And more importantly, why are you concerned about the aliasing?
>>>
>>
>> It is just *arr = arrs[0] does nothing, but it looks like the author
>> was not aware of it.  It may be okay, but causes head-scratching.
>> If you don't have the context you will think this does something different.
> *arr = arrs[0] ? arrs[0] : arrs[1];
> 
> Yes, there's potentially a dead store when arr points to arrs[0] because
> of the earlier initialization when NULL was passed for arr.  But
> otherwise we'd be checking repeatedly that arr != NULL.
> 
> 
>  /* PTR can point to the byte representation of any string type, 
> including
> char* and wchar_t*.  */
>  const char *ptr = TREE_STRING_POINTER (src);
> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
>  offsave = fold_convert (ssizetype, offsave);
>  tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, 
> offsave,
> build_int_cst (ssizetype, len * 
> eltsize));

 this computation is wrong, it computes not in units of eltsize,
 I am however not sure if it is really good that this function tries to
 compute strlen of wide character strings.
>>> Note that in the second version of Martin's patch eltsize will always be
>>> 1 when we get here.  Furthermore, the multiplication by eltsize is gone.
>>>It looks like you switched back to commenting after the code for that
>>> comment, but then immediately thereafter...
>>>
>>
>> Acutally I had no idea that the second patch did resolve some of my comments
>> and which those were.
> It happens.  Multiple long threads make it difficult to follow.
> 
>>
>> I had the impression that it is just splitting up of a large patch into
>> several smaller without reworking at the same time.
>>
>> Once again, a summary what was changed would be helpful.
> Agreed.
> 
> 
 What are you fixing here, I think that was another bug.
 If this fixes something then it should be in a different patch,
 just handling this.

>  unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
> - maxelts - eltoff);
> + strelts - eltoff);
>
>  return ssize_int (len);
>}
>>> I'm guessing the comment in this case refers to the code after.
>>> Presumably questioning the change from maxelts to strelts.
>>>
>>>
>>
>> Yes.  I was thinking that could be a patch of its own.
> Funny.  I found myself in agreement and was going to extract this out
> and run it through the usual tests and get it onto the trunk
> immediately.  Then I realized you'd already fixed this in the patch that
> added the eltsize paramter to c_strlen which has already been committed
> :-)
> 
> 
> 
>>
>> Well, yes, although I changed my mind on strlen(L"abc") meanwhile.
>>
>> This may appear as if I contradict myself but, it is more like I learn.
>>
>> The installed patch for the ELTSIZE parameter was in part inspired by the
>> thought about "not folding invalid stuff" that was forming in my mind
>> at that time, even before I wrote it down and sent it to this list.
> ACK.  And I think there seems to be consensus forming around that
> concept which is good.
> 
> If we ultimately decide not to fold strlen of a wide character string,
> then that'll be an easy enough change to make.  In the mean time bring
> consistent with how the C library strlen works is a good thing IMHO.
> 
> 
> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
>  return build_int_cst (integer_type_node, type_to_class (TREE_TYPE 
> (arg)));
>}
>
> -/* Fold a call to __builtin_strlen with argument ARG.  */
> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
>
>static tree
> -fold_builtin_strlen (location_t loc, 

Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-24 Thread Bernd Edlinger
On 08/24/18 18:51, Jeff Law wrote:
> On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
>>
>>> -  if (compare_tree_int (array_size, length + 1) < 0)
>>> +  if (nulterm)
>> but here you compare bytes with length which is measued un chars.
>>
>>> +*nulterm = array_elts > length;
>>> +  else if (array_elts <= length)
>>>   return NULL_TREE;
>>>   
>>> *ptr_offset = offset;
> Actually in the V2 patch length is measured by element size.  So I think
> this is a non-issue.
> 
> 
> 
>>> -  /* Compute and store the length of the substring at OFFSET.
>>> +  /* Compute and store the size of the substring at OFFSET.
>>>  All offsets past the initial length refer to null strings.  */
>>> -  if (offset <= string_length)
>>> -   *strlen = string_length - offset;
>> this should be offset < string_length.
>>
>>> +  if (offset <= string_size)
>>> +   *strsize = string_size - offset;
>>> else
>>> -   *strlen = 0;
>> this should be 1, you may access the NUL byte of "".
>>
>>> +   *strsize = 0;
>>>   }
> Agreed in both cases based on the defined behavior for the function (NUL
> is included).
> 
> 
>>>   
>>> -  const char *string = TREE_STRING_POINTER (src);
>>> -
>>> -  if (string_length == 0
>>> -  || offset >= string_size)
>>> +  if (string_size == 0
>>> +  || offset >= array_size)
>>>   return NULL;
>>>   
>>> -  if (strsize)
>>> -{
>>> -  /* Support even constant character arrays that aren't proper
>>> -NUL-terminated strings.  */
>>> -  *strsize = string_size;
>>> -}
>>> -  else if (string[string_length - 1] != '\0')
>> Well, this is broken for wide character strings.
>> but I hope we can get rid of STRING_CST which are
>> not explicitly null terminated.

I am afraid that is not going to happen.
Maybe we can get STRING_CST that are never longer
than the TYPE_UNIT_SIZE, but c_strlen and c_getstr
need to take care that the string is zero-terminated.

string_constant, should not promise the string is zero terminated.
But instead it can promise that:
1) the STRING_CST is valid up to TREE_STRING_LENGTH
2) mem_size is >= TREE_STRING_LENGTH
3) memory between TREE_STRING_LENGTH and mem_size is ZERO.

It will not guarantee anything about zero termination any more.

But that will again need an adjustment to the string_constant
and likely to my 86711/86714/87053 patch set.

Sorry for all this back and forth.

> This hunk is gone in the V2 patch.  So I'm not going to worry about it
> right now.
> 
> 

Hmm.  In my tree I wanted to drop this check first, but that
did not work right. So I skip the check if strsize is not NULL.
But if it is NULL, I check the element size is 1, if it is
not 1 return NULL, because it is no character string.

I had several iterations here, and it might still be dependent
on semantical properties of STRING_CST that may not be be granted
in the light of my recent discoveries.


In the end, the best approach might be to either merge my patch
with Martins, or step-wise, first fixing wrong code, and then
implementing warnings without fixing wrong code.

Anyway, the smaller the patch the better for the review.


Bernd.


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-24 Thread Jeff Law
On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
> 
>> -  if (compare_tree_int (array_size, length + 1) < 0)
>> +  if (nulterm)
> but here you compare bytes with length which is measued un chars.
> 
>> +*nulterm = array_elts > length;
>> +  else if (array_elts <= length)
>>  return NULL_TREE;
>>  
>>*ptr_offset = offset;
Actually in the V2 patch length is measured by element size.  So I think
this is a non-issue.



>> -  /* Compute and store the length of the substring at OFFSET.
>> +  /* Compute and store the size of the substring at OFFSET.
>>   All offsets past the initial length refer to null strings.  */
>> -  if (offset <= string_length)
>> -*strlen = string_length - offset;
> this should be offset < string_length.
> 
>> +  if (offset <= string_size)
>> +*strsize = string_size - offset;
>>else
>> -*strlen = 0;
> this should be 1, you may access the NUL byte of "".
> 
>> +*strsize = 0;
>>  }
Agreed in both cases based on the defined behavior for the function (NUL
is included).


>>  
>> -  const char *string = TREE_STRING_POINTER (src);
>> -
>> -  if (string_length == 0
>> -  || offset >= string_size)
>> +  if (string_size == 0
>> +  || offset >= array_size)
>>  return NULL;
>>  
>> -  if (strsize)
>> -{
>> -  /* Support even constant character arrays that aren't proper
>> - NUL-terminated strings.  */
>> -  *strsize = string_size;
>> -}
>> -  else if (string[string_length - 1] != '\0')
> Well, this is broken for wide character strings.
> but I hope we can get rid of STRING_CST which are
> not explicitly null terminated.
This hunk is gone in the V2 patch.  So I'm not going to worry about it
right now.



> 
>> +  if (!nulterm && string[string_size - 1] != '\0')
>>  {
>> -  /* Support only properly NUL-terminated strings but handle
>> - consecutive strings within the same array, such as the six
>> - substrings in "1\0002\0003".  */
>> +  /* When NULTERM is null, support only properly nul-terminated
>> + strings but handle consecutive strings within the same array,
>> + such as the six substrings in "1\0002\0003".  Otherwise, let
>> + the caller deal with non-nul-terminated arrays.  */
>>return NULL;
>>  }
>>  
>> -  return offset <= string_length ? string + offset : "";
> this should be offset < string_size.
> 
>> +  return offset <= string_size ? string + offset : "";
Agreed.

jeff


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-24 Thread Jeff Law
On 08/24/2018 06:27 AM, Bernd Edlinger wrote:
[ Lots of snipping throughout ]


>>>
 +
 if (TREE_CODE (src) == COND_EXPR
 && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0
   {
 -  tree len1, len2;
 -
 -  len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
 -  len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
 +  tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
 +  tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
 if (tree_int_cst_equal (len1, len2))
 -  return len1;
 +  {
>>>
>>> funny, if called with NULL *arr and arrs[0] alias each other.
>>
>>>
 +*arr = arrs[0] ? arrs[0] : arrs[1];
 +return len1;
 +  }
   }
>>
>> And in this case it looks like your comment is before the code you're
>> commenting about.  It was fairly obvious in this case because the code
>> prior to your "funny, if called..." comment didn't reference arr or arrs
>> at all.
>>
>> And more importantly, why are you concerned about the aliasing?
>>
> 
> It is just *arr = arrs[0] does nothing, but it looks like the author
> was not aware of it.  It may be okay, but causes head-scratching.
> If you don't have the context you will think this does something different.
*arr = arrs[0] ? arrs[0] : arrs[1];

Yes, there's potentially a dead store when arr points to arrs[0] because
of the earlier initialization when NULL was passed for arr.  But
otherwise we'd be checking repeatedly that arr != NULL.


 /* PTR can point to the byte representation of any string type, 
 including
char* and wchar_t*.  */
 const char *ptr = TREE_STRING_POINTER (src);
 @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
 offsave = fold_convert (ssizetype, offsave);
 tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, 
 offsave,
  build_int_cst (ssizetype, len * eltsize));
>>>
>>> this computation is wrong, it computes not in units of eltsize,
>>> I am however not sure if it is really good that this function tries to
>>> compute strlen of wide character strings.
>> Note that in the second version of Martin's patch eltsize will always be
>> 1 when we get here.  Furthermore, the multiplication by eltsize is gone.
>>   It looks like you switched back to commenting after the code for that
>> comment, but then immediately thereafter...
>>
> 
> Acutally I had no idea that the second patch did resolve some of my comments
> and which those were.
It happens.  Multiple long threads make it difficult to follow.

> 
> I had the impression that it is just splitting up of a large patch into
> several smaller without reworking at the same time.
> 
> Once again, a summary what was changed would be helpful.
Agreed.


>>> What are you fixing here, I think that was another bug.
>>> If this fixes something then it should be in a different patch,
>>> just handling this.
>>>
 unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
 -  maxelts - eltoff);
 +  strelts - eltoff);
   
 return ssize_int (len);
   }
>> I'm guessing the comment in this case refers to the code after.
>> Presumably questioning the change from maxelts to strelts.
>>
>>
> 
> Yes.  I was thinking that could be a patch of its own.
Funny.  I found myself in agreement and was going to extract this out
and run it through the usual tests and get it onto the trunk
immediately.  Then I realized you'd already fixed this in the patch that
added the eltsize paramter to c_strlen which has already been committed
:-)



> 
> Well, yes, although I changed my mind on strlen(L"abc") meanwhile.
> 
> This may appear as if I contradict myself but, it is more like I learn.
> 
> The installed patch for the ELTSIZE parameter was in part inspired by the
> thought about "not folding invalid stuff" that was forming in my mind
> at that time, even before I wrote it down and sent it to this list.
ACK.  And I think there seems to be consensus forming around that
concept which is good.

If we ultimately decide not to fold strlen of a wide character string,
then that'll be an easy enough change to make.  In the mean time bring
consistent with how the C library strlen works is a good thing IMHO.


 @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
 return build_int_cst (integer_type_node, type_to_class (TREE_TYPE 
 (arg)));
   }
   
 -/* Fold a call to __builtin_strlen with argument ARG.  */
 +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
   
   static tree
 -fold_builtin_strlen (location_t loc, tree type, tree arg)
 +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
   {
 if (!validate_arg (arg, POINTER_TYPE))
   return NULL_TREE;
 else
   {

Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-24 Thread Bernd Edlinger
On 08/24/18 08:36, Jeff Law wrote:
> On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
>> On 08/02/18 04:44, Martin Sebor wrote:
>>> Since the foundation of the patch is detecting and avoiding
>>> the overly aggressive folding of unterminated char arrays,
>>> besides issuing a warning for such arguments to strlen,
>>> the patch also fixes pr86711 - wrong folding of memchr, and
>>> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
>>>
>>> The substance of the attached updated patch is unchanged,
>>> I have just added test cases for the two additional bugs.
>>>
>>> Bernd, as I mentioned Wednesday, the patch supersedes
>>> yours here:
>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
>>>
>>
>> No problem, but I hope you understand, that I still uphold
>> my patch.
>>
>> So we have two patches now:
>> - mine, fixing a wrong code bug,
>> - yours, implementing a new warning and fixing a wrong
>> code bug at the same time.
>>
>> I will add a few comments to your patch below.
> [ ... ]
> 
> So a lot of the comments are out of date, presumably because Martin
> fixed the issues you pointed out in his second version of the patch.
> But there's still some useful nuggets in your comments that are still
> relevant.
> 

Yes, possible, and therefore I would like updated patches summarize
the changes.

> FYI it appears that sometimes you comment above a chunk of code, and
> other times below.  That makes it exceedingly difficult to figure out
> the issue you're trying to raise.
> 

Okydoky.


> 
>>
>>> Martin
>>>
>>> On 07/30/2018 01:17 PM, Martin Sebor wrote:
 Attached is an updated version of the patch that handles more
 instances of calling strlen() on a constant array that is not
 a nul-terminated string.

 No other functions except strlen are explicitly handled yet,
 and neither are constant arrays with braced-initializer lists
 like const char a[] = { 'a', 'b', 'c' };  I am testing
 an independent solution for those (bug 86552).  Once those
 are handled the warning will be able to detect those as well.

 Tested on x86_64-linux.

 On 07/25/2018 05:38 PM, Martin Sebor wrote:
> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html
>
> The fix for bug 86532 has been checked in so this enhancement
> can now be applied on top of it (with only minor adjustments).
>
> On 07/19/2018 02:08 PM, Martin Sebor wrote:
>> In the discussion of my patch for pr86532 Bernd noted that
>> GCC silently accepts constant character arrays with no
>> terminating nul as arguments to strlen (and other string
>> functions).
>>
>> The attached patch is a first step in detecting these kinds
>> of bugs in strlen calls by issuing -Wstringop-overflow.
>> The next step is to modify all other handlers of built-in
>> functions to detect the same problem (not part of this patch).
>> Yet another step is to detect these problems in arguments
>> initialized using the non-string form:
>>
>>    const char a[] = { 'a', 'b', 'c' };
>>
>> This patch is meant to apply on top of the one for bug 86532
>> (I tested it with an earlier version of that patch so there
>> is code in the context that does not appear in the latest
>> version of the other diff).
>>
>> Martin
>>
>

>>>
>>> PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long 
>>> initializer
>>> PR tree-optimization/86711 - wrong folding of memchr
>>> PR tree-optimization/86552 - missing warning for reading past the end of 
>>> non-string arrays
>>>
>>> gcc/ChangeLog:
>>>
>>> PR tree-optimization/86714
>>> PR tree-optimization/86711
>>> PR tree-optimization/86552
>>> * builtins.h (warn_string_no_nul): Declare..
>>> (c_strlen): Add argument.
>>> * builtins.c (warn_string_no_nul): New function.
>>> (fold_builtin_strlen): Add argument.  Detect missing nul.
>>> (fold_builtin_1): Adjust.
>>> (string_length): Add argument and use it.
>>> (c_strlen): Same.
>>> (expand_builtin_strlen): Detect missing nul.
>>> * expr.c (string_constant): Add arguments.  Detect missing nul
>>> terminator and outermost declaration it's missing in.
>>> * expr.h (string_constant): Add argument.
>>> * fold-const.c (c_getstr): Change argument to bool*, rename
>>> other arguments.
>>> * fold-const-call.c (fold_const_call): Detect missing nul.
>>> * gimple-fold.c (get_range_strlen): Add argument.
>>> (get_maxval_strlen): Adjust.
>>> * gimple-fold.h (get_range_strlen): Add argument.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR tree-optimization/86714
>>> PR tree-optimization/86711
>>> PR tree-optimization/86552
>>> * gcc.c-torture/execute/memchr-1.c: New test.
>>> * gcc.c-torture/execute/pr86714.c: New test.
>>> * gcc.dg/warn-strlen-no-nul.c: New test.
>>>
>>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>>> index aa3e0d8..f4924d5 

Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-24 Thread Jeff Law
On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
> On 08/02/18 04:44, Martin Sebor wrote:
>> Since the foundation of the patch is detecting and avoiding
>> the overly aggressive folding of unterminated char arrays,
>> besides issuing a warning for such arguments to strlen,
>> the patch also fixes pr86711 - wrong folding of memchr, and
>> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
>>
>> The substance of the attached updated patch is unchanged,
>> I have just added test cases for the two additional bugs.
>>
>> Bernd, as I mentioned Wednesday, the patch supersedes
>> yours here:
>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
>>
> 
> No problem, but I hope you understand, that I still uphold
> my patch.
> 
> So we have two patches now:
> - mine, fixing a wrong code bug,
> - yours, implementing a new warning and fixing a wrong
> code bug at the same time.
> 
> I will add a few comments to your patch below.
[ ... ]

So a lot of the comments are out of date, presumably because Martin
fixed the issues you pointed out in his second version of the patch.
But there's still some useful nuggets in your comments that are still
relevant.

FYI it appears that sometimes you comment above a chunk of code, and
other times below.  That makes it exceedingly difficult to figure out
the issue you're trying to raise.


> 
>> Martin
>>
>> On 07/30/2018 01:17 PM, Martin Sebor wrote:
>>> Attached is an updated version of the patch that handles more
>>> instances of calling strlen() on a constant array that is not
>>> a nul-terminated string.
>>>
>>> No other functions except strlen are explicitly handled yet,
>>> and neither are constant arrays with braced-initializer lists
>>> like const char a[] = { 'a', 'b', 'c' };  I am testing
>>> an independent solution for those (bug 86552).  Once those
>>> are handled the warning will be able to detect those as well.
>>>
>>> Tested on x86_64-linux.
>>>
>>> On 07/25/2018 05:38 PM, Martin Sebor wrote:
 Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html

 The fix for bug 86532 has been checked in so this enhancement
 can now be applied on top of it (with only minor adjustments).

 On 07/19/2018 02:08 PM, Martin Sebor wrote:
> In the discussion of my patch for pr86532 Bernd noted that
> GCC silently accepts constant character arrays with no
> terminating nul as arguments to strlen (and other string
> functions).
>
> The attached patch is a first step in detecting these kinds
> of bugs in strlen calls by issuing -Wstringop-overflow.
> The next step is to modify all other handlers of built-in
> functions to detect the same problem (not part of this patch).
> Yet another step is to detect these problems in arguments
> initialized using the non-string form:
>
>   const char a[] = { 'a', 'b', 'c' };
>
> This patch is meant to apply on top of the one for bug 86532
> (I tested it with an earlier version of that patch so there
> is code in the context that does not appear in the latest
> version of the other diff).
>
> Martin
>

>>>
>>
>> PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long 
>> initializer
>> PR tree-optimization/86711 - wrong folding of memchr
>> PR tree-optimization/86552 - missing warning for reading past the end of 
>> non-string arrays
>>
>> gcc/ChangeLog:
>>
>>  PR tree-optimization/86714
>>  PR tree-optimization/86711
>>  PR tree-optimization/86552
>>  * builtins.h (warn_string_no_nul): Declare..
>>  (c_strlen): Add argument.
>>  * builtins.c (warn_string_no_nul): New function.
>>  (fold_builtin_strlen): Add argument.  Detect missing nul.
>>  (fold_builtin_1): Adjust.
>>  (string_length): Add argument and use it.
>>  (c_strlen): Same.
>>  (expand_builtin_strlen): Detect missing nul.
>>  * expr.c (string_constant): Add arguments.  Detect missing nul
>>  terminator and outermost declaration it's missing in.
>>  * expr.h (string_constant): Add argument.
>>  * fold-const.c (c_getstr): Change argument to bool*, rename
>>  other arguments.
>>  * fold-const-call.c (fold_const_call): Detect missing nul.
>>  * gimple-fold.c (get_range_strlen): Add argument.
>>  (get_maxval_strlen): Adjust.
>>  * gimple-fold.h (get_range_strlen): Add argument.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  PR tree-optimization/86714
>>  PR tree-optimization/86711
>>  PR tree-optimization/86552
>>  * gcc.c-torture/execute/memchr-1.c: New test.
>>  * gcc.c-torture/execute/pr86714.c: New test.
>>  * gcc.dg/warn-strlen-no-nul.c: New test.
>>
>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>> index aa3e0d8..f4924d5 100644
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, 
>> int);
>>  static rtx expand_builtin_expect (tree, rtx);
>>  static tree fold_builtin_constant_p 

Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-17 Thread Martin Sebor

On 08/16/2018 11:14 PM, Jeff Law wrote:

On 08/01/2018 08:44 PM, Martin Sebor wrote:

Since the foundation of the patch is detecting and avoiding
the overly aggressive folding of unterminated char arrays,
besides issuing a warning for such arguments to strlen,
the patch also fixes pr86711 - wrong folding of memchr, and
pr86714 - tree-ssa-forwprop.c confused by too long initializer.

The substance of the attached updated patch is unchanged,
I have just added test cases for the two additional bugs.


Just to be absolutely sure.  The patch attached to this message is
superceded by the later 6 part patchkit that fixes 86714, 86711 and 86552.

I'm just trying to make sure I'm looking at the right kits from both you
and Bernd for the next step.


Yes, I broke up this patch into a series, with the basic
"infrastructure" in part 1:

[PATCH 1/6] prevent folding of unterminated const arrays (PR
86711, 86714)
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00769.html

and the strlen missing nul detection in part 2:
[PATCH 2/6] detect unterminated const arrays in strlen calls
(PR 86552)
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00778.html

The rest of the series (parts 3 through 6) add the missing
nul detection to a few other built-ins.

Martin



Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-16 Thread Jeff Law
On 08/01/2018 08:44 PM, Martin Sebor wrote:
> Since the foundation of the patch is detecting and avoiding
> the overly aggressive folding of unterminated char arrays,
> besides issuing a warning for such arguments to strlen,
> the patch also fixes pr86711 - wrong folding of memchr, and
> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
> 
> The substance of the attached updated patch is unchanged,
> I have just added test cases for the two additional bugs.
> 
Just to be absolutely sure.  The patch attached to this message is
superceded by the later 6 part patchkit that fixes 86714, 86711 and 86552.

I'm just trying to make sure I'm looking at the right kits from both you
and Bernd for the next step.

Thanks,
Jeff



Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-14 Thread Jeff Law
On 08/03/2018 07:00 AM, Bernd Edlinger wrote:
> On 08/02/18 22:34, Martin Sebor wrote:
>> On 08/02/2018 12:56 PM, Bernd Edlinger wrote:
>>> On 08/02/18 15:26, Bernd Edlinger wrote:
>
>    /* If the length can be computed at compile-time, return it.  */
> -  len = c_strlen (src, 0);
> +  tree array;
> +  tree len = c_strlen (src, 0, );

 You know the c_strlen tries to compute wide character sizes,
 but strlen does not do that, strlen (L"abc") should give 1
 (or 0 on a BE machine)
 I wonder if that is correct.

>>> [snip]
>
>  static tree
> -fold_builtin_strlen (location_t loc, tree type, tree arg)
> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>  {
>    if (!validate_arg (arg, POINTER_TYPE))
>  return NULL_TREE;
>    else
>  {
> -  tree len = c_strlen (arg, 0);
> -
> +  tree arr = NULL_TREE;
> +  tree len = c_strlen (arg, 0, );

 Is it possible to write a test case where strlen(L"test") reaches this 
 point?
 what will c_strlen return then?

>>>
>>> Yes, of course it is:
>>>
>>> $ cat y.c
>>> int f(char *x)
>>> {
>>>    return __builtin_strlen(x);
>>> }
>>>
>>> int main ()
>>> {
>>>    return f((char*)"abcdef"[0]);
>>> }
>>> $ gcc -O3 -S y.c
>>> $ cat y.s
>>> main:
>>> .LFB1:
>>> .cfi_startproc
>>> movl    $6, %eax
>>> ret
>>> .cfi_endproc
>>>
>>> The reason is that c_strlen tries to fold wide chars at all.
>>> I do not know when that was introduced, was that already before your last 
>>> patches?
>>
>> The function itself was introduced in 1992 if not earlier,
>> before wide strings even existed.  AFAICS, it has always
>> accepted strings of all widths.  Until r241489 (in GCC 7)
>> it computed their length in bytes, not characters.  I don't
>> know if that was on purpose or if it was just never changed
>> to compute the length in characters when wide strings were
>> first introduced.  From the name I assume it's the latter.
>> The difference wasn't detected until sprintf tests were added
>> for wide string directives.  The ChangeLog description for
>> the change reads: Correctly handle wide strings.  I didn't
>> consider pathological cases like strlen (L"abc").  It
>> shouldn't be difficult to change to fix this case.
>>
> 
> Oh, oh, oh
> 
> $ cat y3.c
> int main ()
> {
>char c[100];
>int x = __builtin_sprintf (c, "%S", L"\u");
> 
>__builtin_printf("%d %ld\n", x,__builtin_strlen(c));
> }
> 
> $ gcc-4.8 -O3 -std=c99 y3.c
> $ ./a.out
> -1 0
> $ gcc -O3 y3.c
> $ ./a.out
> 1 0
> $ echo $LANG
> de_DE.UTF-8
> 
> I would have expected L"\u" to converted to UTF-8
> or another encoding, so the return value if sprintf is
> far from obvious, and probably language dependent.
FWIW, Martin has a patch (under review) that I think will fix this and
includes a testcase that is likely inspired by the code above.

> 
> Why do you think it is a good idea to use really every
> opportunity to optimize totally unnecessary things like
> using the return value from the sprintf function as it is?
> 
> Did you never think this adds a significant maintenance
> burden on the rest of us as well?
It largely came along for free during the implementation of the sprintf
warnings.  That's changed a bit over time, but it's still the case that
the sprintf warnings do all the analysis necessary to optimize the
sprintf return value.

As both Martin and I have stated before the real goal is getting good
warnings from sprintf.  Optimization is a distant second.

jeff


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-03 Thread Martin Sebor

On 08/03/2018 07:00 AM, Bernd Edlinger wrote:

On 08/02/18 22:34, Martin Sebor wrote:

On 08/02/2018 12:56 PM, Bernd Edlinger wrote:

On 08/02/18 15:26, Bernd Edlinger wrote:


   /* If the length can be computed at compile-time, return it.  */
-  len = c_strlen (src, 0);
+  tree array;
+  tree len = c_strlen (src, 0, );


You know the c_strlen tries to compute wide character sizes,
but strlen does not do that, strlen (L"abc") should give 1
(or 0 on a BE machine)
I wonder if that is correct.


[snip]


 static tree
-fold_builtin_strlen (location_t loc, tree type, tree arg)
+fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
 {
   if (!validate_arg (arg, POINTER_TYPE))
 return NULL_TREE;
   else
 {
-  tree len = c_strlen (arg, 0);
-
+  tree arr = NULL_TREE;
+  tree len = c_strlen (arg, 0, );


Is it possible to write a test case where strlen(L"test") reaches this point?
what will c_strlen return then?



Yes, of course it is:

$ cat y.c
int f(char *x)
{
   return __builtin_strlen(x);
}

int main ()
{
   return f((char*)"abcdef"[0]);
}
$ gcc -O3 -S y.c
$ cat y.s
main:
.LFB1:
.cfi_startproc
movl$6, %eax
ret
.cfi_endproc

The reason is that c_strlen tries to fold wide chars at all.
I do not know when that was introduced, was that already before your last 
patches?


The function itself was introduced in 1992 if not earlier,
before wide strings even existed.  AFAICS, it has always
accepted strings of all widths.  Until r241489 (in GCC 7)
it computed their length in bytes, not characters.  I don't
know if that was on purpose or if it was just never changed
to compute the length in characters when wide strings were
first introduced.  From the name I assume it's the latter.
The difference wasn't detected until sprintf tests were added
for wide string directives.  The ChangeLog description for
the change reads: Correctly handle wide strings.  I didn't
consider pathological cases like strlen (L"abc").  It
shouldn't be difficult to change to fix this case.



Oh, oh, oh

$ cat y3.c
int main ()
{
   char c[100];
   int x = __builtin_sprintf (c, "%S", L"\u");

   __builtin_printf("%d %ld\n", x,__builtin_strlen(c));
}

$ gcc-4.8 -O3 -std=c99 y3.c
$ ./a.out
-1 0
$ gcc -O3 y3.c
$ ./a.out
1 0
$ echo $LANG
de_DE.UTF-8

I would have expected L"\u" to converted to UTF-8
or another encoding, so the return value if sprintf is
far from obvious, and probably language dependent.

Why do you think it is a good idea to use really every
opportunity to optimize totally unnecessary things like
using the return value from the sprintf function as it is?

Did you never think this adds a significant maintenance
burden on the rest of us as well?


Your condescending tone is uncalled for, and you clearly speak
out of ignorance.  I don't owe you an explanation but as I have
said multiple times: most of my work, including the sprintf pass,
is primarily motivated by detecting bugs like buffer overflow.
Optimization is only a secondary goal (but bug detection depends
on it).  It may come as a shock to you but mistakes happen.
That's why it's important to make an effort to detect them.
This is one is a simple typo (handling %S the same way as %s
instead of %ls.

If you are incapable of a professional tone I would suggest
you go harass someone else.

Martin


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-03 Thread Bernd Edlinger
On 08/02/18 22:34, Martin Sebor wrote:
> On 08/02/2018 12:56 PM, Bernd Edlinger wrote:
>> On 08/02/18 15:26, Bernd Edlinger wrote:

    /* If the length can be computed at compile-time, return it.  */
 -  len = c_strlen (src, 0);
 +  tree array;
 +  tree len = c_strlen (src, 0, );
>>>
>>> You know the c_strlen tries to compute wide character sizes,
>>> but strlen does not do that, strlen (L"abc") should give 1
>>> (or 0 on a BE machine)
>>> I wonder if that is correct.
>>>
>> [snip]

  static tree
 -fold_builtin_strlen (location_t loc, tree type, tree arg)
 +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
  {
    if (!validate_arg (arg, POINTER_TYPE))
  return NULL_TREE;
    else
  {
 -  tree len = c_strlen (arg, 0);
 -
 +  tree arr = NULL_TREE;
 +  tree len = c_strlen (arg, 0, );
>>>
>>> Is it possible to write a test case where strlen(L"test") reaches this 
>>> point?
>>> what will c_strlen return then?
>>>
>>
>> Yes, of course it is:
>>
>> $ cat y.c
>> int f(char *x)
>> {
>>    return __builtin_strlen(x);
>> }
>>
>> int main ()
>> {
>>    return f((char*)"abcdef"[0]);
>> }
>> $ gcc -O3 -S y.c
>> $ cat y.s
>> main:
>> .LFB1:
>> .cfi_startproc
>> movl    $6, %eax
>> ret
>> .cfi_endproc
>>
>> The reason is that c_strlen tries to fold wide chars at all.
>> I do not know when that was introduced, was that already before your last 
>> patches?
> 
> The function itself was introduced in 1992 if not earlier,
> before wide strings even existed.  AFAICS, it has always
> accepted strings of all widths.  Until r241489 (in GCC 7)
> it computed their length in bytes, not characters.  I don't
> know if that was on purpose or if it was just never changed
> to compute the length in characters when wide strings were
> first introduced.  From the name I assume it's the latter.
> The difference wasn't detected until sprintf tests were added
> for wide string directives.  The ChangeLog description for
> the change reads: Correctly handle wide strings.  I didn't
> consider pathological cases like strlen (L"abc").  It
> shouldn't be difficult to change to fix this case.
> 

Oh, oh, oh

$ cat y3.c
int main ()
{
   char c[100];
   int x = __builtin_sprintf (c, "%S", L"\u");

   __builtin_printf("%d %ld\n", x,__builtin_strlen(c));
}

$ gcc-4.8 -O3 -std=c99 y3.c
$ ./a.out
-1 0
$ gcc -O3 y3.c
$ ./a.out
1 0
$ echo $LANG
de_DE.UTF-8

I would have expected L"\u" to converted to UTF-8
or another encoding, so the return value if sprintf is
far from obvious, and probably language dependent.

Why do you think it is a good idea to use really every
opportunity to optimize totally unnecessary things like
using the return value from the sprintf function as it is?

Did you never think this adds a significant maintenance
burden on the rest of us as well?


Bernd.


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-02 Thread Martin Sebor

On 08/02/2018 12:56 PM, Bernd Edlinger wrote:

On 08/02/18 15:26, Bernd Edlinger wrote:


   /* If the length can be computed at compile-time, return it.  */
-  len = c_strlen (src, 0);
+  tree array;
+  tree len = c_strlen (src, 0, );


You know the c_strlen tries to compute wide character sizes,
but strlen does not do that, strlen (L"abc") should give 1
(or 0 on a BE machine)
I wonder if that is correct.


[snip]


 static tree
-fold_builtin_strlen (location_t loc, tree type, tree arg)
+fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
 {
   if (!validate_arg (arg, POINTER_TYPE))
 return NULL_TREE;
   else
 {
-  tree len = c_strlen (arg, 0);
-
+  tree arr = NULL_TREE;
+  tree len = c_strlen (arg, 0, );


Is it possible to write a test case where strlen(L"test") reaches this point?
what will c_strlen return then?



Yes, of course it is:

$ cat y.c
int f(char *x)
{
   return __builtin_strlen(x);
}

int main ()
{
   return f((char*)"abcdef"[0]);
}
$ gcc -O3 -S y.c
$ cat y.s
main:
.LFB1:
.cfi_startproc
movl$6, %eax
ret
.cfi_endproc

The reason is that c_strlen tries to fold wide chars at all.
I do not know when that was introduced, was that already before your last 
patches?


The function itself was introduced in 1992 if not earlier,
before wide strings even existed.  AFAICS, it has always
accepted strings of all widths.  Until r241489 (in GCC 7)
it computed their length in bytes, not characters.  I don't
know if that was on purpose or if it was just never changed
to compute the length in characters when wide strings were
first introduced.  From the name I assume it's the latter.
The difference wasn't detected until sprintf tests were added
for wide string directives.  The ChangeLog description for
the change reads: Correctly handle wide strings.  I didn't
consider pathological cases like strlen (L"abc").  It
shouldn't be difficult to change to fix this case.

Martin


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-02 Thread Bernd Edlinger
On 08/02/18 15:26, Bernd Edlinger wrote:
>>
>>    /* If the length can be computed at compile-time, return it.  */
>> -  len = c_strlen (src, 0);
>> +  tree array;
>> +  tree len = c_strlen (src, 0, );
> 
> You know the c_strlen tries to compute wide character sizes,
> but strlen does not do that, strlen (L"abc") should give 1
> (or 0 on a BE machine)
> I wonder if that is correct.
> 
[snip]
>>
>>  static tree
>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>  {
>>    if (!validate_arg (arg, POINTER_TYPE))
>>  return NULL_TREE;
>>    else
>>  {
>> -  tree len = c_strlen (arg, 0);
>> -
>> +  tree arr = NULL_TREE;
>> +  tree len = c_strlen (arg, 0, );
> 
> Is it possible to write a test case where strlen(L"test") reaches this point?
> what will c_strlen return then?
> 

Yes, of course it is:

$ cat y.c
int f(char *x)
{
   return __builtin_strlen(x);
}

int main ()
{
   return f((char*)"abcdef"[0]);
}
$ gcc -O3 -S y.c
$ cat y.s
main:
.LFB1:
.cfi_startproc
movl$6, %eax
ret
.cfi_endproc

The reason is that c_strlen tries to fold wide chars at all.
I do not know when that was introduced, was that already before your last 
patches?
Is it possible to revert the last few patches cleanly?


Bernd.


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-02 Thread Bernd Edlinger
On 08/02/18 04:44, Martin Sebor wrote:
> Since the foundation of the patch is detecting and avoiding
> the overly aggressive folding of unterminated char arrays,
> besides issuing a warning for such arguments to strlen,
> the patch also fixes pr86711 - wrong folding of memchr, and
> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
> 
> The substance of the attached updated patch is unchanged,
> I have just added test cases for the two additional bugs.
> 
> Bernd, as I mentioned Wednesday, the patch supersedes
> yours here:
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
> 

No problem, but I hope you understand, that I still uphold
my patch.

So we have two patches now:
- mine, fixing a wrong code bug,
- yours, implementing a new warning and fixing a wrong
code bug at the same time.

I will add a few comments to your patch below.

> Martin
> 
> On 07/30/2018 01:17 PM, Martin Sebor wrote:
>> Attached is an updated version of the patch that handles more
>> instances of calling strlen() on a constant array that is not
>> a nul-terminated string.
>>
>> No other functions except strlen are explicitly handled yet,
>> and neither are constant arrays with braced-initializer lists
>> like const char a[] = { 'a', 'b', 'c' };  I am testing
>> an independent solution for those (bug 86552).  Once those
>> are handled the warning will be able to detect those as well.
>>
>> Tested on x86_64-linux.
>>
>> On 07/25/2018 05:38 PM, Martin Sebor wrote:
>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html
>>>
>>> The fix for bug 86532 has been checked in so this enhancement
>>> can now be applied on top of it (with only minor adjustments).
>>>
>>> On 07/19/2018 02:08 PM, Martin Sebor wrote:
 In the discussion of my patch for pr86532 Bernd noted that
 GCC silently accepts constant character arrays with no
 terminating nul as arguments to strlen (and other string
 functions).

 The attached patch is a first step in detecting these kinds
 of bugs in strlen calls by issuing -Wstringop-overflow.
 The next step is to modify all other handlers of built-in
 functions to detect the same problem (not part of this patch).
 Yet another step is to detect these problems in arguments
 initialized using the non-string form:

   const char a[] = { 'a', 'b', 'c' };

 This patch is meant to apply on top of the one for bug 86532
 (I tested it with an earlier version of that patch so there
 is code in the context that does not appear in the latest
 version of the other diff).

 Martin

>>>
>>
> 
> PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long 
> initializer
> PR tree-optimization/86711 - wrong folding of memchr
> PR tree-optimization/86552 - missing warning for reading past the end of 
> non-string arrays
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/86714
>   PR tree-optimization/86711
>   PR tree-optimization/86552
>   * builtins.h (warn_string_no_nul): Declare..
>   (c_strlen): Add argument.
>   * builtins.c (warn_string_no_nul): New function.
>   (fold_builtin_strlen): Add argument.  Detect missing nul.
>   (fold_builtin_1): Adjust.
>   (string_length): Add argument and use it.
>   (c_strlen): Same.
>   (expand_builtin_strlen): Detect missing nul.
>   * expr.c (string_constant): Add arguments.  Detect missing nul
>   terminator and outermost declaration it's missing in.
>   * expr.h (string_constant): Add argument.
>   * fold-const.c (c_getstr): Change argument to bool*, rename
>   other arguments.
>   * fold-const-call.c (fold_const_call): Detect missing nul.
>   * gimple-fold.c (get_range_strlen): Add argument.
>   (get_maxval_strlen): Adjust.
>   * gimple-fold.h (get_range_strlen): Add argument.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/86714
>   PR tree-optimization/86711
>   PR tree-optimization/86552
>   * gcc.c-torture/execute/memchr-1.c: New test.
>   * gcc.c-torture/execute/pr86714.c: New test.
>   * gcc.dg/warn-strlen-no-nul.c: New test.
> 
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index aa3e0d8..f4924d5 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
>  static rtx expand_builtin_expect (tree, rtx);
>  static tree fold_builtin_constant_p (tree);
>  static tree fold_builtin_classify_type (tree);
> -static tree fold_builtin_strlen (location_t, tree, tree);
> +static tree fold_builtin_strlen (location_t, tree, tree, tree);
>  static tree fold_builtin_inf (location_t, tree, int);
>  static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
>  static bool validate_arg (const_tree, enum tree_code code);
> @@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, 
> unsigned maxelts)
>return n;
>  }
>  
> +/* For a call expression EXP to a function that 

PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

2018-08-01 Thread Martin Sebor

Since the foundation of the patch is detecting and avoiding
the overly aggressive folding of unterminated char arrays,
besides issuing a warning for such arguments to strlen,
the patch also fixes pr86711 - wrong folding of memchr, and
pr86714 - tree-ssa-forwprop.c confused by too long initializer.

The substance of the attached updated patch is unchanged,
I have just added test cases for the two additional bugs.

Bernd, as I mentioned Wednesday, the patch supersedes
yours here:
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html

Martin

On 07/30/2018 01:17 PM, Martin Sebor wrote:

Attached is an updated version of the patch that handles more
instances of calling strlen() on a constant array that is not
a nul-terminated string.

No other functions except strlen are explicitly handled yet,
and neither are constant arrays with braced-initializer lists
like const char a[] = { 'a', 'b', 'c' };  I am testing
an independent solution for those (bug 86552).  Once those
are handled the warning will be able to detect those as well.

Tested on x86_64-linux.

On 07/25/2018 05:38 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html

The fix for bug 86532 has been checked in so this enhancement
can now be applied on top of it (with only minor adjustments).

On 07/19/2018 02:08 PM, Martin Sebor wrote:

In the discussion of my patch for pr86532 Bernd noted that
GCC silently accepts constant character arrays with no
terminating nul as arguments to strlen (and other string
functions).

The attached patch is a first step in detecting these kinds
of bugs in strlen calls by issuing -Wstringop-overflow.
The next step is to modify all other handlers of built-in
functions to detect the same problem (not part of this patch).
Yet another step is to detect these problems in arguments
initialized using the non-string form:

  const char a[] = { 'a', 'b', 'c' };

This patch is meant to apply on top of the one for bug 86532
(I tested it with an earlier version of that patch so there
is code in the context that does not appear in the latest
version of the other diff).

Martin







PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer
PR tree-optimization/86711 - wrong folding of memchr
PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays

gcc/ChangeLog:

	PR tree-optimization/86714
	PR tree-optimization/86711
	PR tree-optimization/86552
	* builtins.h (warn_string_no_nul): Declare..
	(c_strlen): Add argument.
	* builtins.c (warn_string_no_nul): New function.
	(fold_builtin_strlen): Add argument.  Detect missing nul.
	(fold_builtin_1): Adjust.
	(string_length): Add argument and use it.
	(c_strlen): Same.
	(expand_builtin_strlen): Detect missing nul.
	* expr.c (string_constant): Add arguments.  Detect missing nul
	terminator and outermost declaration it's missing in.
	* expr.h (string_constant): Add argument.
	* fold-const.c (c_getstr): Change argument to bool*, rename
	other arguments.
	* fold-const-call.c (fold_const_call): Detect missing nul.
	* gimple-fold.c (get_range_strlen): Add argument.
	(get_maxval_strlen): Adjust.
	* gimple-fold.h (get_range_strlen): Add argument.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86714
	PR tree-optimization/86711
	PR tree-optimization/86552
	* gcc.c-torture/execute/memchr-1.c: New test.
	* gcc.c-torture/execute/pr86714.c: New test.
	* gcc.dg/warn-strlen-no-nul.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index aa3e0d8..f4924d5 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
 static rtx expand_builtin_expect (tree, rtx);
 static tree fold_builtin_constant_p (tree);
 static tree fold_builtin_classify_type (tree);
-static tree fold_builtin_strlen (location_t, tree, tree);
+static tree fold_builtin_strlen (location_t, tree, tree, tree);
 static tree fold_builtin_inf (location_t, tree, int);
 static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
@@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
   return n;
 }
 
+/* For a call expression EXP to a function that expects a string argument,
+   issue a diagnostic due to it being a called with an argument NONSTR
+   that is a character array with no terminating NUL.  */
+
+void
+warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
+{
+  loc = expansion_point_location_if_in_system_header (loc);
+
+  bool warned;
+  if (exp)
+{
+  if (!fndecl)
+	fndecl = get_callee_fndecl (exp);
+  warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%K%qD argument missing terminating nul",
+			   exp, fndecl);
+}
+  else
+{
+  gcc_assert (fndecl);
+  warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%qD argument missing terminating nul",
+			   fndecl);
+}
+
+  if (warned && DECL_P (nonstr))
+  

Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-08-01 Thread Martin Sebor

On 08/01/2018 10:34 AM, Martin Sebor wrote:

If you care about detecting bugs I would expect you to be
supportive rather than dismissive of this work, and helpful
in bringing it to fruition rather that putting it down or
questioning my priorities.  Especially since the work was
prompted by your own (valid) complaint that GCC doesn't
diagnose them.



You don't really listen to what I am saying, I did not say
that we need another warning instead of fixing the wrong
optimization issue at hand.

But I am in good company, you don't listen to Jakub and Richi
either.


I certainly intend to fix bugs I'm responsible for introducing.
I always do if given the chance.  I assume you are referring
to bug 86711 (and 86714).  Fixing the underlying problem has
been on my mind since you first mentioned it, and on my to-do
list since last week (bug 86688).


I've started looking into fixing 86711 but as it turns out,
by avoiding folding non-nul-terminated strings, this patch
already fixes it as well as producing the output you expect
for the test case in 86714, and also fixes 86688.

So unless you intend to pursue your patch I will assign all
these bugs to myself, add the test cases to this patch, and
resubmit it.  (I would normally prefer to deal with each bug
independently, but since I already have a working patch that
does the right thing I'd just as soon save the time and effort
and not try to break it up).

Martin


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-08-01 Thread Bernd Edlinger
On 08/01/18 18:34, Martin Sebor wrote:
>>> If you care about detecting bugs I would expect you to be
>>> supportive rather than dismissive of this work, and helpful
>>> in bringing it to fruition rather that putting it down or
>>> questioning my priorities.  Especially since the work was
>>> prompted by your own (valid) complaint that GCC doesn't
>>> diagnose them.
>>>
>>
>> You don't really listen to what I am saying, I did not say
>> that we need another warning instead of fixing the wrong
>> optimization issue at hand.
>>
>> But I am in good company, you don't listen to Jakub and Richi
>> either.
> 
> I certainly intend to fix bugs I'm responsible for introducing.
> I always do if given the chance.  I assume you are referring
> to bug 86711 (and 86714).  Fixing the underlying problem has
> been on my mind since you first mentioned it, and on my to-do
> list since last week (bug 86688).  You have now submitted
> a patch for both of the former, plus a follow-on patch, but
> you didn't assign either of the bugs to yourself, or indicated
> if the patch fixes 86688, or if you intend to work on it too.
> I haven't reviewed the patches in any detail except to note
> that they touch the same area as mine and likely conflict.
> I'm not sure what I should do now.  Work on fixing these bugs
> myself?  (I would prefer to.)  Try to rebase my work on top
> of yours to see what the conflicts are and try to resolve
> them them in my ongoing work?  Or just keep working on my
> stuff and deal with the conflicts after your patches have
> been committed?  Or continue to debate conflicting priorities
> and try to resolve them first?
> 
> (Those are mostly rhetorical questions.)  The point is that
> if you would just let me fix my bugs we would not have this
> conundrum.  Your test cases are helpful.  But as I have said
> over and over, submitting patches for the same code at the same
> time and even undoing some prior work with no coordination is
> a recipe for confusion and conflict.  I don't recall this
> happening in the past and I don't really understand what
> triggered it in this case.  This isn't an area that normally
> sees a lot of activity.
> 

Martin,

I am totally sorry for this confusion.  I would please
ask you to do your work a bit slower, and that we please
can talk over the direction in which we want to go on.
For instance in the moment not so many new warnings, when
we actually should look at correctness and reliability issues.
I do definitely not want to revert your work, but I will have
to hedge it where it goes too far, but that does not mean that
it will be worthless.

What made my alarm bells ring is the speed in which new buggy
features, are being implemented recently, while at the same time
several global reviewers raised concerns, which would not be
honored.  That is not a good thing.

To me it is an serious problem when those global reviewers
do not seem to agree on the way these features are implemented.

To be honest, I do not believe in democracy, or majority decisions.
But I always slow down when there is no consensus, and look for a
solution that is acceptable for all the key players.


Bernd.


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-08-01 Thread Martin Sebor

If you care about detecting bugs I would expect you to be
supportive rather than dismissive of this work, and helpful
in bringing it to fruition rather that putting it down or
questioning my priorities.  Especially since the work was
prompted by your own (valid) complaint that GCC doesn't
diagnose them.



You don't really listen to what I am saying, I did not say
that we need another warning instead of fixing the wrong
optimization issue at hand.

But I am in good company, you don't listen to Jakub and Richi
either.


I certainly intend to fix bugs I'm responsible for introducing.
I always do if given the chance.  I assume you are referring
to bug 86711 (and 86714).  Fixing the underlying problem has
been on my mind since you first mentioned it, and on my to-do
list since last week (bug 86688).  You have now submitted
a patch for both of the former, plus a follow-on patch, but
you didn't assign either of the bugs to yourself, or indicated
if the patch fixes 86688, or if you intend to work on it too.
I haven't reviewed the patches in any detail except to note
that they touch the same area as mine and likely conflict.
I'm not sure what I should do now.  Work on fixing these bugs
myself?  (I would prefer to.)  Try to rebase my work on top
of yours to see what the conflicts are and try to resolve
them them in my ongoing work?  Or just keep working on my
stuff and deal with the conflicts after your patches have
been committed?  Or continue to debate conflicting priorities
and try to resolve them first?

(Those are mostly rhetorical questions.)  The point is that
if you would just let me fix my bugs we would not have this
conundrum.  Your test cases are helpful.  But as I have said
over and over, submitting patches for the same code at the same
time and even undoing some prior work with no coordination is
a recipe for confusion and conflict.  I don't recall this
happening in the past and I don't really understand what
triggered it in this case.  This isn't an area that normally
sees a lot of activity.

Martin


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-08-01 Thread Bernd Edlinger
On 07/31/18 05:51, Martin Sebor wrote:
> On 07/30/2018 03:11 PM, Bernd Edlinger wrote:
>> Hi,
>>
>>> @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
>>> maxelts = maxelts / eltsize - 1;
>>>   }
>>>
>>> +  /* Unless the caller is prepared to handle it by passing in a non-null
>>> + ARR, fail if the terminating nul doesn't fit in the array the string
>>> + is stored in (as in const char a[3] = "123";  */
>>> +  if (!arr && maxelts < strelts)
>>> +    return NULL_TREE;
>>> +
>>
>> this is c_strlen, how is the caller ever supposed to handle non-zero 
>> terminated strings???
>> especially if you do this above?
> 
> Callers that pass in a non-null ARR handle them by issuing
> a warning.  The rest get back a null result.  It should be
> evident from the rest of the patch.  It can be debated what
> each caller should do when it detects such a missing nul
> where one is expected.  Different approaches may be more
> or less appropriate for different callers/functions (e.g.,
> strcpy vs strlen).
> 

Sorry, right in the beginning you have "if (!add) arr = arrs;"

>>> +c_strlen (tree src, int only_value, tree *arr /* = NULL */)
>>> {
>>>   STRIP_NOPS (src);
>>> +
>>> +  /* Used to detect non-nul-terminated strings in subexpressions
>>> + of a conditional expression.  When ARR is null, point it at
>>> + one of the elements for simplicity.  */
>>> +  tree arrs[] = { NULL_TREE, NULL_TREE };
>>> +  if (!arr)
>>> +    arr = arrs;
>>
>>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>>   unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>>   length = string_length (TREE_STRING_POINTER (init), charsize,
>>>   length / charsize);
>>> -  if (compare_tree_int (array_size, length + 1) < 0)
>>> +  if (nulterm)
>>> +    *nulterm = array_elts > length;
>>> +  else if (array_elts <= length)
>>>     return NULL_TREE;
>>
>> I don't understand why you can't use
>> compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)), TREE_STRING_LENGTH 
>> (init))
>> instead of this convoluted code above ???
>>
>> Sorry, this patch does not look like it is ready any time soon.
> 
> I'm open to technical comments on the substance of my changes
> but I'm not interested in your opinion of the readiness of
> the patch (whatever that might mean), certainly not if you
> have formed it after skimming a random handful of lines out
> of a 600 line patch.
> 

Sorry, again.  I just meant you should fix the issues, and
maybe make the patch a bit smaller.

>> But actually I am totally puzzled by your priorities.
>> This is what I see right now:
>>
>> 1) We have missing warnings.
>> 2) We have wrong code bugs.
>> 3) We have apparently a specification error on the C Language standard (*)
>>
>>
>> Why are you prioritizing 1) over 2) thus blocking my attempts to fix a wrong 
>> code
>> issue,and why do you not tackle 3) in your WG14?
> 
> My priorities are none of your concern.
> 

Sorry, again, but your priorities seem to conflict with mine.

> Your "attempts to fix" issues interfere with my work on a number
> of projects.  You are not being helpful -- instead, by submitting
> changes that you know fully well conflict with mine, you are
> impeding and undermining my work.  That is why I object to them.
> 
>> (*) which means that GCC is currently removing code from assertions
>> as I pointed out here: 
>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01695.html
>>
>> This happens because GCC follows the language standards literally right now.
>>
>> I would say too literally, and it proves that the language standard's logic 
>> is
>> flawed IMHO.
> 
> I have no idea what your point is about standards, but bugs
> like the one in the example, including those arising from
> uninitialized arrays, could be detected with only minor
> enhancements to the tree-ssa-strlen pass.  Implementing some
> of this is among the projects I'm expected and expecting to
> work on for GCC 9.  This patch is a small step in that
> direction.
> 
> If you care about detecting bugs I would expect you to be
> supportive rather than dismissive of this work, and helpful
> in bringing it to fruition rather that putting it down or
> questioning my priorities.  Especially since the work was
> prompted by your own (valid) complaint that GCC doesn't
> diagnose them.
> 

You don't really listen to what I am saying, I did not say
that we need another warning instead of fixing the wrong
optimization issue at hand.

But I am in good company, you don't listen to Jakub and Richi
either.


Bernd.

> Martin
> 


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-07-30 Thread Martin Sebor

On 07/30/2018 03:11 PM, Bernd Edlinger wrote:

Hi,


@@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
maxelts = maxelts / eltsize - 1;
  }

+  /* Unless the caller is prepared to handle it by passing in a non-null
+ ARR, fail if the terminating nul doesn't fit in the array the string
+ is stored in (as in const char a[3] = "123";  */
+  if (!arr && maxelts < strelts)
+return NULL_TREE;
+


this is c_strlen, how is the caller ever supposed to handle non-zero terminated 
strings???
especially if you do this above?


Callers that pass in a non-null ARR handle them by issuing
a warning.  The rest get back a null result.  It should be
evident from the rest of the patch.  It can be debated what
each caller should do when it detects such a missing nul
where one is expected.  Different approaches may be more
or less appropriate for different callers/functions (e.g.,
strcpy vs strlen).


+c_strlen (tree src, int only_value, tree *arr /* = NULL */)
{
  STRIP_NOPS (src);
+
+  /* Used to detect non-nul-terminated strings in subexpressions
+ of a conditional expression.  When ARR is null, point it at
+ one of the elements for simplicity.  */
+  tree arrs[] = { NULL_TREE, NULL_TREE };
+  if (!arr)
+arr = arrs;



@@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
  unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
  length = string_length (TREE_STRING_POINTER (init), charsize,
  length / charsize);
-  if (compare_tree_int (array_size, length + 1) < 0)
+  if (nulterm)
+*nulterm = array_elts > length;
+  else if (array_elts <= length)
return NULL_TREE;


I don't understand why you can't use
compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)), TREE_STRING_LENGTH (init))
instead of this convoluted code above ???

Sorry, this patch does not look like it is ready any time soon.


I'm open to technical comments on the substance of my changes
but I'm not interested in your opinion of the readiness of
the patch (whatever that might mean), certainly not if you
have formed it after skimming a random handful of lines out
of a 600 line patch.


But actually I am totally puzzled by your priorities.
This is what I see right now:

1) We have missing warnings.
2) We have wrong code bugs.
3) We have apparently a specification error on the C Language standard (*)


Why are you prioritizing 1) over 2) thus blocking my attempts to fix a wrong 
code
issue,and why do you not tackle 3) in your WG14?


My priorities are none of your concern.

Your "attempts to fix" issues interfere with my work on a number
of projects.  You are not being helpful -- instead, by submitting
changes that you know fully well conflict with mine, you are
impeding and undermining my work.  That is why I object to them.


(*) which means that GCC is currently removing code from assertions
as I pointed out here: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01695.html

This happens because GCC follows the language standards literally right now.

I would say too literally, and it proves that the language standard's logic is
flawed IMHO.


I have no idea what your point is about standards, but bugs
like the one in the example, including those arising from
uninitialized arrays, could be detected with only minor
enhancements to the tree-ssa-strlen pass.  Implementing some
of this is among the projects I'm expected and expecting to
work on for GCC 9.  This patch is a small step in that
direction.

If you care about detecting bugs I would expect you to be
supportive rather than dismissive of this work, and helpful
in bringing it to fruition rather that putting it down or
questioning my priorities.  Especially since the work was
prompted by your own (valid) complaint that GCC doesn't
diagnose them.

Martin



Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-07-30 Thread Bernd Edlinger
Hi,

>@@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
>   maxelts = maxelts / eltsize - 1;
>   }
> 
>+  /* Unless the caller is prepared to handle it by passing in a non-null
>+ ARR, fail if the terminating nul doesn't fit in the array the string
>+ is stored in (as in const char a[3] = "123";  */
>+  if (!arr && maxelts < strelts)
>+return NULL_TREE;
>+

this is c_strlen, how is the caller ever supposed to handle non-zero terminated 
strings???
especially if you do this above?

>+c_strlen (tree src, int only_value, tree *arr /* = NULL */)
> {
>   STRIP_NOPS (src);
>+
>+  /* Used to detect non-nul-terminated strings in subexpressions
>+ of a conditional expression.  When ARR is null, point it at
>+ one of the elements for simplicity.  */
>+  tree arrs[] = { NULL_TREE, NULL_TREE };
>+  if (!arr)
>+arr = arrs;

>@@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>   unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>   length = string_length (TREE_STRING_POINTER (init), charsize,
> length / charsize);
>-  if (compare_tree_int (array_size, length + 1) < 0)
>+  if (nulterm)
>+*nulterm = array_elts > length;
>+  else if (array_elts <= length)
> return NULL_TREE;

I don't understand why you can't use
compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)), TREE_STRING_LENGTH (init))
instead of this convoluted code above ???

Sorry, this patch does not look like it is ready any time soon.


But actually I am totally puzzled by your priorities.
This is what I see right now:

1) We have missing warnings.
2) We have wrong code bugs.
3) We have apparently a specification error on the C Language standard (*)


Why are you prioritizing 1) over 2) thus blocking my attempts to fix a wrong 
code
issue,and why do you not tackle 3) in your WG14?



(*) which means that GCC is currently removing code from assertions
as I pointed out here: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01695.html

This happens because GCC follows the language standards literally right now.

I would say too literally, and it proves that the language standard's logic is
flawed IMHO.

Thanks
Bernd.


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-07-30 Thread Martin Sebor

Attached is an updated version of the patch that handles more
instances of calling strlen() on a constant array that is not
a nul-terminated string.

No other functions except strlen are explicitly handled yet,
and neither are constant arrays with braced-initializer lists
like const char a[] = { 'a', 'b', 'c' };  I am testing
an independent solution for those (bug 86552).  Once those
are handled the warning will be able to detect those as well.

Tested on x86_64-linux.

On 07/25/2018 05:38 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html

The fix for bug 86532 has been checked in so this enhancement
can now be applied on top of it (with only minor adjustments).

On 07/19/2018 02:08 PM, Martin Sebor wrote:

In the discussion of my patch for pr86532 Bernd noted that
GCC silently accepts constant character arrays with no
terminating nul as arguments to strlen (and other string
functions).

The attached patch is a first step in detecting these kinds
of bugs in strlen calls by issuing -Wstringop-overflow.
The next step is to modify all other handlers of built-in
functions to detect the same problem (not part of this patch).
Yet another step is to detect these problems in arguments
initialized using the non-string form:

  const char a[] = { 'a', 'b', 'c' };

This patch is meant to apply on top of the one for bug 86532
(I tested it with an earlier version of that patch so there
is code in the context that does not appear in the latest
version of the other diff).

Martin





PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays

gcc/ChangeLog:

	PR tree-optimization/86552
	* builtins.h (warn_string_no_nul): Declare..
	(c_strlen): Add argument.
	* builtins.c (warn_string_no_nul): New function.
	(fold_builtin_strlen): Add argument.  Detect missing nul.
	(fold_builtin_1): Adjust.
	(string_length): Add argument and use it.
	(c_strlen): Same.
	(expand_builtin_strlen): Detect missing nul.
	* expr.c (string_constant): Add arguments.  Detect missing nul
	terminator and outermost declaration it's missing in.
	* expr.h (string_constant): Add argument.
	* fold-const.c (c_getstr): Change argument to bool*, rename
	other arguments.
	* fold-const-call.c (fold_const_call): Detect missing nul.
	* gimple-fold.c (get_range_strlen): Add argument.
	(get_maxval_strlen): Adjust.
	* gimple-fold.h (get_range_strlen): Add argument.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86552
	* gcc.dg/warn-string-no-nul.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index aa3e0d8..f4924d5 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
 static rtx expand_builtin_expect (tree, rtx);
 static tree fold_builtin_constant_p (tree);
 static tree fold_builtin_classify_type (tree);
-static tree fold_builtin_strlen (location_t, tree, tree);
+static tree fold_builtin_strlen (location_t, tree, tree, tree);
 static tree fold_builtin_inf (location_t, tree, int);
 static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
@@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
   return n;
 }
 
+/* For a call expression EXP to a function that expects a string argument,
+   issue a diagnostic due to it being a called with an argument NONSTR
+   that is a character array with no terminating NUL.  */
+
+void
+warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
+{
+  loc = expansion_point_location_if_in_system_header (loc);
+
+  bool warned;
+  if (exp)
+{
+  if (!fndecl)
+	fndecl = get_callee_fndecl (exp);
+  warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%K%qD argument missing terminating nul",
+			   exp, fndecl);
+}
+  else
+{
+  gcc_assert (fndecl);
+  warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%qD argument missing terminating nul",
+			   fndecl);
+}
+
+  if (warned && DECL_P (nonstr))
+inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
+}
+
 /* Compute the length of a null-terminated character string or wide
character string handling character sizes of 1, 2, and 4 bytes.
TREE_STRING_LENGTH is not the right way because it evaluates to
@@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
accesses.  Note that this implies the result is not going to be emitted
into the instruction stream.
 
+   When ARR is non-null and the string is not properly nul-terminated,
+   set *ARR to the declaration of the outermost constant object whose
+   initializer (or one of its elements) is not nul-terminated.
+
The value returned is of type `ssizetype'.
 
Unfortunately, string_constant can't access the values of const char
arrays with initializers, so neither can we do so here.  */
 
 tree
-c_strlen (tree src, int only_value)

PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-07-25 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html

The fix for bug 86532 has been checked in so this enhancement
can now be applied on top of it (with only minor adjustments).

On 07/19/2018 02:08 PM, Martin Sebor wrote:

In the discussion of my patch for pr86532 Bernd noted that
GCC silently accepts constant character arrays with no
terminating nul as arguments to strlen (and other string
functions).

The attached patch is a first step in detecting these kinds
of bugs in strlen calls by issuing -Wstringop-overflow.
The next step is to modify all other handlers of built-in
functions to detect the same problem (not part of this patch).
Yet another step is to detect these problems in arguments
initialized using the non-string form:

  const char a[] = { 'a', 'b', 'c' };

This patch is meant to apply on top of the one for bug 86532
(I tested it with an earlier version of that patch so there
is code in the context that does not appear in the latest
version of the other diff).

Martin