Re: [PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-22 Thread Martin Sebor

On 8/22/19 4:37 PM, Jeff Law wrote:

On 8/22/19 4:23 PM, Martin Sebor wrote:

On 8/22/19 3:27 PM, Jeff Law wrote:

On 8/21/19 2:50 PM, Martin Sebor wrote:

This patch is a subset of the solution for PR 91457 whose main
goal is to eliminate inconsistencies in warnings issued for
out-of-bounds accesses to the various flavors of flexible array
members of constant objects.  That patch was posted here:
    https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01202.html

Like PR 91457, this patch also relies on improving optimization
to issue better quality warnings.  (I.e., with the latter being
what motivated it.)

The optimization enhances string_constant to recognize empty
CONSTRUCTORs returned by fold_ctor_reference for references
to members of constant objects with either "insufficient"
initializers (e.g., given struct S { char n, a[]; } s = { 0 };)
or with braced-list initializers (e.g., given
struct S s = { 3 { 1, 2, 3, 0 } };  The patch lets string_constant
convert the CONSTRUCTOR for s.a into a STRING_CST, which in turn
enables the folding of calls to built-ins like strlen, strchr, or
strcmp with such arguments.

Exposing the strings to the folder then also lets it detect and
issue warnings for out-of-bounds offsets in more instances of
such references than before.

The remaining changes in the patch mostly enhance the places
that use the no-warning bit to prevent redundant diagnostics.

Tested on x86_64-linux.

Martin

gcc-91490.diff

PR middle-end/91490 - bogus argument missing terminating nul warning
on strlen of a flexible array member

gcc/c-family/ChangeLog:

 PR middle-end/91490
 * c-common.c (braced_list_to_string): Add argument and overload.
 Handle flexible length arrays.

gcc/testsuite/ChangeLog:

 PR middle-end/91490
 * c-c++-common/Warray-bounds-7.c: New test.
 * gcc.dg/Warray-bounds-39.c: Expect either -Warray-bounds or
 -Wstringop-overflow.
 * gcc.dg/strlenopt-78.c: New test.

gcc/ChangeLog:

 PR middle-end/91490
 * builtins.c (c_strlen): Rename argument and introduce new local.
 Set no-warning bit on original argument.
 * expr.c (string_constant): Pass argument type to
fold_ctor_reference.
 * gimple-fold.c (fold_nonarray_ctor_reference): Return a STRING_CST
 for missing initializers.
 * tree.c (build_string_literal): Handle optional argument.
 * tree.h (build_string_literal): Add defaulted argument.
 * gimple-ssa-warn-restrict.c (maybe_diag_access_bounds): Check
 no-warning bit on original expression.

diff --git a/gcc/builtins.c b/gcc/builtins.c
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11402,6 +11402,16 @@ is_aligning_offset (const_tree offset,
const_tree exp)
   tree
   string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree
*decl)
   {
+  tree dummy = NULL_TREE;;
+  if (!mem_size)
+    mem_size = 
+
+  tree chartype = TREE_TYPE (arg);
+  if (POINTER_TYPE_P (chartype))
+    chartype = TREE_TYPE (chartype);
+  while (TREE_CODE (chartype) == ARRAY_TYPE)
+    chartype = TREE_TYPE (chartype);
+
     tree array;
     STRIP_NOPS (arg);

So per our conversation today, I took a closer look at this.  As you
noted CHARTYPE is only used for the empty constructor code you're adding
as a part of this patch.

Rather than stripping away types like this to compute chartype, couldn't
we just use char_type_node instead of chartype in this code below?


We can't.  string_constant is also called for wide strings (e.g.,
by the sprintf pass).  Returning a narrow string when the caller
asks for a wide one breaks the sprintf stuff.

Sigh.  And presumably we can't just  move the block down because other
bits in string_constant modify ARG.

So I think a quick comment before that fragment about its purpose and
we're good to go for the patch as a whole based on the one you posted an
hour or so ago.


I did move it down into the block with the STRING_CST transform.
A comment is also there so I committed the patch in r274837.

Thanks
Martin


Re: [PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-22 Thread Jeff Law
On 8/22/19 4:23 PM, Martin Sebor wrote:
> On 8/22/19 3:27 PM, Jeff Law wrote:
>> On 8/21/19 2:50 PM, Martin Sebor wrote:
>>> This patch is a subset of the solution for PR 91457 whose main
>>> goal is to eliminate inconsistencies in warnings issued for
>>> out-of-bounds accesses to the various flavors of flexible array
>>> members of constant objects.  That patch was posted here:
>>>    https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01202.html
>>>
>>> Like PR 91457, this patch also relies on improving optimization
>>> to issue better quality warnings.  (I.e., with the latter being
>>> what motivated it.)
>>>
>>> The optimization enhances string_constant to recognize empty
>>> CONSTRUCTORs returned by fold_ctor_reference for references
>>> to members of constant objects with either "insufficient"
>>> initializers (e.g., given struct S { char n, a[]; } s = { 0 };)
>>> or with braced-list initializers (e.g., given
>>> struct S s = { 3 { 1, 2, 3, 0 } };  The patch lets string_constant
>>> convert the CONSTRUCTOR for s.a into a STRING_CST, which in turn
>>> enables the folding of calls to built-ins like strlen, strchr, or
>>> strcmp with such arguments.
>>>
>>> Exposing the strings to the folder then also lets it detect and
>>> issue warnings for out-of-bounds offsets in more instances of
>>> such references than before.
>>>
>>> The remaining changes in the patch mostly enhance the places
>>> that use the no-warning bit to prevent redundant diagnostics.
>>>
>>> Tested on x86_64-linux.
>>>
>>> Martin
>>>
>>> gcc-91490.diff
>>>
>>> PR middle-end/91490 - bogus argument missing terminating nul warning
>>> on strlen of a flexible array member
>>>
>>> gcc/c-family/ChangeLog:
>>>
>>> PR middle-end/91490
>>> * c-common.c (braced_list_to_string): Add argument and overload.
>>> Handle flexible length arrays.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR middle-end/91490
>>> * c-c++-common/Warray-bounds-7.c: New test.
>>> * gcc.dg/Warray-bounds-39.c: Expect either -Warray-bounds or
>>> -Wstringop-overflow.
>>> * gcc.dg/strlenopt-78.c: New test.
>>>
>>> gcc/ChangeLog:
>>>
>>> PR middle-end/91490
>>> * builtins.c (c_strlen): Rename argument and introduce new local.
>>> Set no-warning bit on original argument.
>>> * expr.c (string_constant): Pass argument type to
>>> fold_ctor_reference.
>>> * gimple-fold.c (fold_nonarray_ctor_reference): Return a STRING_CST
>>> for missing initializers.
>>> * tree.c (build_string_literal): Handle optional argument.
>>> * tree.h (build_string_literal): Add defaulted argument.
>>> * gimple-ssa-warn-restrict.c (maybe_diag_access_bounds): Check
>>> no-warning bit on original expression.
>>>
>>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>>> --- a/gcc/expr.c
>>> +++ b/gcc/expr.c
>>> @@ -11402,6 +11402,16 @@ is_aligning_offset (const_tree offset,
>>> const_tree exp)
>>>   tree
>>>   string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree
>>> *decl)
>>>   {
>>> +  tree dummy = NULL_TREE;;
>>> +  if (!mem_size)
>>> +    mem_size = 
>>> +
>>> +  tree chartype = TREE_TYPE (arg);
>>> +  if (POINTER_TYPE_P (chartype))
>>> +    chartype = TREE_TYPE (chartype);
>>> +  while (TREE_CODE (chartype) == ARRAY_TYPE)
>>> +    chartype = TREE_TYPE (chartype);
>>> +
>>>     tree array;
>>>     STRIP_NOPS (arg);
>> So per our conversation today, I took a closer look at this.  As you
>> noted CHARTYPE is only used for the empty constructor code you're adding
>> as a part of this patch.
>>
>> Rather than stripping away types like this to compute chartype, couldn't
>> we just use char_type_node instead of chartype in this code below?
> 
> We can't.  string_constant is also called for wide strings (e.g.,
> by the sprintf pass).  Returning a narrow string when the caller
> asks for a wide one breaks the sprintf stuff.
Sigh.  And presumably we can't just  move the block down because other
bits in string_constant modify ARG.

So I think a quick comment before that fragment about its purpose and
we're good to go for the patch as a whole based on the one you posted an
hour or so ago.

Thanks,
jeff


Re: [PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-22 Thread Martin Sebor

On 8/22/19 3:27 PM, Jeff Law wrote:

On 8/21/19 2:50 PM, Martin Sebor wrote:

This patch is a subset of the solution for PR 91457 whose main
goal is to eliminate inconsistencies in warnings issued for
out-of-bounds accesses to the various flavors of flexible array
members of constant objects.  That patch was posted here:
   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01202.html

Like PR 91457, this patch also relies on improving optimization
to issue better quality warnings.  (I.e., with the latter being
what motivated it.)

The optimization enhances string_constant to recognize empty
CONSTRUCTORs returned by fold_ctor_reference for references
to members of constant objects with either "insufficient"
initializers (e.g., given struct S { char n, a[]; } s = { 0 };)
or with braced-list initializers (e.g., given
struct S s = { 3 { 1, 2, 3, 0 } };  The patch lets string_constant
convert the CONSTRUCTOR for s.a into a STRING_CST, which in turn
enables the folding of calls to built-ins like strlen, strchr, or
strcmp with such arguments.

Exposing the strings to the folder then also lets it detect and
issue warnings for out-of-bounds offsets in more instances of
such references than before.

The remaining changes in the patch mostly enhance the places
that use the no-warning bit to prevent redundant diagnostics.

Tested on x86_64-linux.

Martin

gcc-91490.diff

PR middle-end/91490 - bogus argument missing terminating nul warning on strlen 
of a flexible array member

gcc/c-family/ChangeLog:

PR middle-end/91490
* c-common.c (braced_list_to_string): Add argument and overload.
Handle flexible length arrays.

gcc/testsuite/ChangeLog:

PR middle-end/91490
* c-c++-common/Warray-bounds-7.c: New test.
* gcc.dg/Warray-bounds-39.c: Expect either -Warray-bounds or
-Wstringop-overflow.
* gcc.dg/strlenopt-78.c: New test.

gcc/ChangeLog:

PR middle-end/91490
* builtins.c (c_strlen): Rename argument and introduce new local.
Set no-warning bit on original argument.
* expr.c (string_constant): Pass argument type to fold_ctor_reference.
* gimple-fold.c (fold_nonarray_ctor_reference): Return a STRING_CST
for missing initializers.
* tree.c (build_string_literal): Handle optional argument.
* tree.h (build_string_literal): Add defaulted argument.
* gimple-ssa-warn-restrict.c (maybe_diag_access_bounds): Check
no-warning bit on original expression.

diff --git a/gcc/builtins.c b/gcc/builtins.c
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11402,6 +11402,16 @@ is_aligning_offset (const_tree offset, const_tree exp)
  tree
  string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
  {
+  tree dummy = NULL_TREE;;
+  if (!mem_size)
+mem_size = 
+
+  tree chartype = TREE_TYPE (arg);
+  if (POINTER_TYPE_P (chartype))
+chartype = TREE_TYPE (chartype);
+  while (TREE_CODE (chartype) == ARRAY_TYPE)
+chartype = TREE_TYPE (chartype);
+
tree array;
STRIP_NOPS (arg);

So per our conversation today, I took a closer look at this.  As you
noted CHARTYPE is only used for the empty constructor code you're adding
as a part of this patch.

Rather than stripping away types like this to compute chartype, couldn't
we just use char_type_node instead of chartype in this code below?


We can't.  string_constant is also called for wide strings (e.g.,
by the sprintf pass).  Returning a narrow string when the caller
asks for a wide one breaks the sprintf stuff.



+  tree initsize = TYPE_SIZE_UNIT (inittype);
+
+  if (TREE_CODE (init) == CONSTRUCTOR && !CONSTRUCTOR_ELTS (init))
+{
+  /* Convert a char array to an empty STRING_CST having an array
+of the expected type.  */
+  if (!initsize)
+ initsize = integer_zero_node;
+
+  unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
+  init = build_string_literal (size ? 1 : 0, "", chartype, size);
+  init = TREE_OPERAND (init, 0);
+  init = TREE_OPERAND (init, 0);
+
+  *ptr_offset = integer_zero_node;
+}

Per my prior message, we do need to update that test to be something like

if (TREE_CODE (init) == CONSTRUCTOR
 && !CONSTRUCTOR_ELTS
 && !TREE_CLOBBER_P (init))

While I don't think we're likely to see a TREE_CLOBBER_P here, it's best
to be safe since those have totally different meanings,


I went with the other alternative you mentioned and used
initializer_zerop as I explained in my reply.  Let me know if
that's not what you meant.




diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 2810867235d..ef942ffce57 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -8868,7 +8876,7 @@ braced_lists_to_strings (tree type, tree ctor)
unsigned HOST_WIDE_INT idx;
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (ctor), idx, val)
{
- val = braced_lists_to_strings (ttp, val);
+ val = braced_lists_to_strings (ttp, val, code 

Re: [PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-22 Thread Jeff Law
On 8/22/19 3:55 PM, Martin Sebor wrote:

>>> diff --git a/gcc/expr.c b/gcc/expr.c
>>> index 92979289e83..d16e0982dcf 100644
>>> --- a/gcc/expr.c
>>> +++ b/gcc/expr.c
>>> @@ -11402,6 +11402,16 @@ is_aligning_offset (const_tree offset,
>>> const_tree exp)
>>>   tree
>>>   string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree
>>> *decl)
>>>   {
>>> +  tree dummy = NULL_TREE;;
>>> +  if (!mem_size)
>>> +    mem_size = 
>>> +
>>> +  tree chartype = TREE_TYPE (arg);
>>> +  if (POINTER_TYPE_P (chartype))
>>> +    chartype = TREE_TYPE (chartype);
>>> +  while (TREE_CODE (chartype) == ARRAY_TYPE)
>>> +    chartype = TREE_TYPE (chartype);
>> I'm hesitant to ACK this bit.  I can understand that if we're given an
>> ADDR_EXPR, which should be a POINTER_TYPE_P, that we'll want to look at
>> what it's pointing to.  But shouldn't we verify that it's an ADDR_EXPR
>> before just blindly stripping away the outer type?
> 
> The character type is that of the expression, before any casts
> are stripped.  For example, given:
> 
>   struct S { char n, a[4]; } s[1] = { 0 };
> 
> and strlen (s->a), ARG is the POINTER_PLUS_EXPR
> '(const char *)  + 1'  The type of the operand is a pointer to
> the struct S.
Given "chartype" is only used in the CONSTRUCTOR handling you're adding,
couldn't we just use "char_type_node" in there instead and drop the bits
noted above?


>>> +  if (TREE_CODE (init) == CONSTRUCTOR && !CONSTRUCTOR_ELTS (init))
>> Would initializer_zerop be better here?  That would catch
>> zero-initialized constructors as well.
> 
> I originally had initializer_zerop there and removed it at the last
> minute it because none of my tests needed it.  But my tests only
> exercised the flexible array members (which have to be initialized
> in constant objects) and not the case where initializer_zerop does
> make a difference such as:
> 
>   struct A { char a[4]; };
>   struct B { struct A a; };
> 
>   const struct B b[2] = { 0 };
> 
>   void f (void)
>   {
>     if (__builtin_strlen (b->a.a) == 3)
>   __builtin_abort ();
>   }
> 
> Here, fold_ctor_reference returns a non-empty CONSTRUCTOR that
> the code doesn't handle.  Adding initializer_zerop lets it create
> the empty string and the caller fold the call.  But it's just
> a special case of the more general problem where the CONSTRUCTOR
> is both non-empty and non-zero for the parts of the object before
> the array we're interested in, such as in
> 
>   const struct B b[1] = { 1 };
> 
> Here, strlen (b->a.a) again isn't folded.  Ironically, the equivalent
> strlen (b[0].a.a) is folded.  The difference between the two is that
> b[0].a.a is represented as itself (i.e., NOP (char*, ADDR (b.a.a))
> while b->a.a as (const char *)  + 1.  In the former case
> fold_ctor_reference returns an empty CONSTRUCTOR for the char array.
> In the latter, it returns the non-empty, non-zero CTOR for all of
> b the we don't know how to extract the empty string from.
> 
> Incidentally, it's only the C front-end that "bastardizes"
> the expression like this.  The C++ FE preserves the original form
> of the expression and so it's able to fold the call.
> 
> (Uncovering and trying to fix these problems, by the way, is what
> I meant the other day when I said how patches in this area have
> a tendency to snownball into "projects."  Folding empty ctors of
> constant objects probably isn't terribly important but the lack
> of consistency is what makes writing tests that behave the same
> way across different front-ends and back-ends challenging.)
Yup.  I understand.  When presented with these kinds of snowballing
issues, the best advice I can give is to break things down into logical
units and make patches which are a series rather than a single unified
patch to address multiple issues.  git rebase can really help.

> 
>> If not, then you want to make sure to check !TREE_CLOBBER_P (init) since
>> that's something completely different than zero initialization.
> 
> I went with the initializer_zerop since it improves things, if
> negligibly, and added tests for it.  If you or Richard have a test
> case that I could add to the patch showing when TREE_CLOBBER_P is
> set on a CTOR and that not checking would cause trouble.
Note that initializer_zerop checks TREE_CLOBBER internally.  So if
you're using initializer_zerop, then you're good.


So all that's left is the "chartype" stuff in string_constant.  If we
can use char_type_node instead of trying to extract a character type,
then the problematical code would just go away.

jeff



Re: [PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-22 Thread Martin Sebor

On 8/21/19 4:28 PM, Jeff Law wrote:

On 8/21/19 2:50 PM, Martin Sebor wrote:

This patch is a subset of the solution for PR 91457 whose main
goal is to eliminate inconsistencies in warnings issued for
out-of-bounds accesses to the various flavors of flexible array
members of constant objects.  That patch was posted here:
   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01202.html

Like PR 91457, this patch also relies on improving optimization
to issue better quality warnings.  (I.e., with the latter being
what motivated it.)

The optimization enhances string_constant to recognize empty
CONSTRUCTORs returned by fold_ctor_reference for references
to members of constant objects with either "insufficient"
initializers (e.g., given struct S { char n, a[]; } s = { 0 };)
or with braced-list initializers (e.g., given
struct S s = { 3 { 1, 2, 3, 0 } };  The patch lets string_constant
convert the CONSTRUCTOR for s.a into a STRING_CST, which in turn
enables the folding of calls to built-ins like strlen, strchr, or
strcmp with such arguments.

Exposing the strings to the folder then also lets it detect and
issue warnings for out-of-bounds offsets in more instances of
such references than before.

The remaining changes in the patch mostly enhance the places
that use the no-warning bit to prevent redundant diagnostics.

Tested on x86_64-linux.

Martin

gcc-91490.diff

PR middle-end/91490 - bogus argument missing terminating nul warning on strlen 
of a flexible array member

gcc/c-family/ChangeLog:

PR middle-end/91490
* c-common.c (braced_list_to_string): Add argument and overload.
Handle flexible length arrays.

gcc/testsuite/ChangeLog:

PR middle-end/91490
* c-c++-common/Warray-bounds-7.c: New test.
* gcc.dg/Warray-bounds-39.c: Expect either -Warray-bounds or
-Wstringop-overflow.
* gcc.dg/strlenopt-78.c: New test.

gcc/ChangeLog:

PR middle-end/91490
* builtins.c (c_strlen): Rename argument and introduce new local.
Set no-warning bit on original argument.
* expr.c (string_constant): Pass argument type to fold_ctor_reference.
* gimple-fold.c (fold_nonarray_ctor_reference): Return a STRING_CST
for missing initializers.
* tree.c (build_string_literal): Handle optional argument.
* tree.h (build_string_literal): Add defaulted argument.
* gimple-ssa-warn-restrict.c (maybe_diag_access_bounds): Check
no-warning bit on original expression.




diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 2810867235d..ef942ffce57 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -8868,7 +8876,7 @@ braced_lists_to_strings (tree type, tree ctor)
unsigned HOST_WIDE_INT idx;
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (ctor), idx, val)
{
- val = braced_lists_to_strings (ttp, val);
+ val = braced_lists_to_strings (ttp, val, code == RECORD_TYPE);

Do you need to handle UNION_TYPE or QUAL_UNION_TYPE here too?  If so,
RECORD_OR_UNION_TYPE_P is a better test.


I confess I didn't even think about unions.  I wouldn't expect
to see the definition of a const union but I also can't think
of a reason to exclude them from here so I've made the change.




diff --git a/gcc/expr.c b/gcc/expr.c
index 92979289e83..d16e0982dcf 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11402,6 +11402,16 @@ is_aligning_offset (const_tree offset, const_tree exp)
  tree
  string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
  {
+  tree dummy = NULL_TREE;;
+  if (!mem_size)
+mem_size = 
+
+  tree chartype = TREE_TYPE (arg);
+  if (POINTER_TYPE_P (chartype))
+chartype = TREE_TYPE (chartype);
+  while (TREE_CODE (chartype) == ARRAY_TYPE)
+chartype = TREE_TYPE (chartype);

I'm hesitant to ACK this bit.  I can understand that if we're given an
ADDR_EXPR, which should be a POINTER_TYPE_P, that we'll want to look at
what it's pointing to.  But shouldn't we verify that it's an ADDR_EXPR
before just blindly stripping away the outer type?


The character type is that of the expression, before any casts
are stripped.  For example, given:

  struct S { char n, a[4]; } s[1] = { 0 };

and strlen (s->a), ARG is the POINTER_PLUS_EXPR
'(const char *)  + 1'  The type of the operand is a pointer to
the struct S.



I also wonder if this (assuming we keep it in some form) belongs after
the STRIP_NOPs.


In other expressions, after STRIP_NOPS the type also may not reflect
the actual character type.  I don't know how else  get at the character
type when the type of the CONSTRUCTOR is that of an enclosing object.

I've moved the code closer to where it's needed but otherwise left it
unchanged.


@@ -11602,17 +11605,39 @@ string_constant (tree arg, tree *ptr_offset, tree 
*mem_size, tree *decl)
int len = native_encode_expr (init, charbuf, sizeof charbuf, 0);
if (len > 0)
{
- /* Construct a string literal with 

Re: [PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-22 Thread Jeff Law
On 8/21/19 2:50 PM, Martin Sebor wrote:
> This patch is a subset of the solution for PR 91457 whose main
> goal is to eliminate inconsistencies in warnings issued for
> out-of-bounds accesses to the various flavors of flexible array
> members of constant objects.  That patch was posted here:
>   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01202.html
> 
> Like PR 91457, this patch also relies on improving optimization
> to issue better quality warnings.  (I.e., with the latter being
> what motivated it.)
> 
> The optimization enhances string_constant to recognize empty
> CONSTRUCTORs returned by fold_ctor_reference for references
> to members of constant objects with either "insufficient"
> initializers (e.g., given struct S { char n, a[]; } s = { 0 };)
> or with braced-list initializers (e.g., given
> struct S s = { 3 { 1, 2, 3, 0 } };  The patch lets string_constant
> convert the CONSTRUCTOR for s.a into a STRING_CST, which in turn
> enables the folding of calls to built-ins like strlen, strchr, or
> strcmp with such arguments.
> 
> Exposing the strings to the folder then also lets it detect and
> issue warnings for out-of-bounds offsets in more instances of
> such references than before.
> 
> The remaining changes in the patch mostly enhance the places
> that use the no-warning bit to prevent redundant diagnostics.
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> gcc-91490.diff
> 
> PR middle-end/91490 - bogus argument missing terminating nul warning on 
> strlen of a flexible array member
> 
> gcc/c-family/ChangeLog:
> 
>   PR middle-end/91490
>   * c-common.c (braced_list_to_string): Add argument and overload.
>   Handle flexible length arrays.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR middle-end/91490
>   * c-c++-common/Warray-bounds-7.c: New test.
>   * gcc.dg/Warray-bounds-39.c: Expect either -Warray-bounds or
>   -Wstringop-overflow.
>   * gcc.dg/strlenopt-78.c: New test.
> 
> gcc/ChangeLog:
> 
>   PR middle-end/91490
>   * builtins.c (c_strlen): Rename argument and introduce new local.
>   Set no-warning bit on original argument.
>   * expr.c (string_constant): Pass argument type to fold_ctor_reference.
>   * gimple-fold.c (fold_nonarray_ctor_reference): Return a STRING_CST
>   for missing initializers.
>   * tree.c (build_string_literal): Handle optional argument.
>   * tree.h (build_string_literal): Add defaulted argument.
>   * gimple-ssa-warn-restrict.c (maybe_diag_access_bounds): Check
>   no-warning bit on original expression.
> 
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -11402,6 +11402,16 @@ is_aligning_offset (const_tree offset, const_tree 
> exp)
>  tree
>  string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
>  {
> +  tree dummy = NULL_TREE;;
> +  if (!mem_size)
> +mem_size = 
> +
> +  tree chartype = TREE_TYPE (arg);
> +  if (POINTER_TYPE_P (chartype))
> +chartype = TREE_TYPE (chartype);
> +  while (TREE_CODE (chartype) == ARRAY_TYPE)
> +chartype = TREE_TYPE (chartype);
> +
>tree array;
>STRIP_NOPS (arg);
So per our conversation today, I took a closer look at this.  As you
noted CHARTYPE is only used for the empty constructor code you're adding
as a part of this patch.

Rather than stripping away types like this to compute chartype, couldn't
we just use char_type_node instead of chartype in this code below?


> +  tree initsize = TYPE_SIZE_UNIT (inittype);
> +
> +  if (TREE_CODE (init) == CONSTRUCTOR && !CONSTRUCTOR_ELTS (init))
> +{
> +  /* Convert a char array to an empty STRING_CST having an array
> +  of the expected type.  */
> +  if (!initsize)
> +   initsize = integer_zero_node;
> +
> +  unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
> +  init = build_string_literal (size ? 1 : 0, "", chartype, size);
> +  init = TREE_OPERAND (init, 0);
> +  init = TREE_OPERAND (init, 0);
> +
> +  *ptr_offset = integer_zero_node;
> +}
Per my prior message, we do need to update that test to be something like

if (TREE_CODE (init) == CONSTRUCTOR
&& !CONSTRUCTOR_ELTS
&& !TREE_CLOBBER_P (init))

While I don't think we're likely to see a TREE_CLOBBER_P here, it's best
to be safe since those have totally different meanings,

> diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> index 2810867235d..ef942ffce57 100644
> --- a/gcc/c-family/c-common.c
> +++ b/gcc/c-family/c-common.c
> @@ -8868,7 +8876,7 @@ braced_lists_to_strings (tree type, tree ctor)
>unsigned HOST_WIDE_INT idx;
>FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (ctor), idx, val)
>   {
> -   val = braced_lists_to_strings (ttp, val);
> +   val = braced_lists_to_strings (ttp, val, code == RECORD_TYPE);
Also per our discussion, I think a comment that we might want to test
RECORD_OR_UNION_TYPE_P here instead of just testing for RECORD_TYPE is
all we need.

With those changes and a 

Re: [PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-21 Thread Jeff Law
On 8/21/19 2:50 PM, Martin Sebor wrote:
> This patch is a subset of the solution for PR 91457 whose main
> goal is to eliminate inconsistencies in warnings issued for
> out-of-bounds accesses to the various flavors of flexible array
> members of constant objects.  That patch was posted here:
>   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01202.html
> 
> Like PR 91457, this patch also relies on improving optimization
> to issue better quality warnings.  (I.e., with the latter being
> what motivated it.)
> 
> The optimization enhances string_constant to recognize empty
> CONSTRUCTORs returned by fold_ctor_reference for references
> to members of constant objects with either "insufficient"
> initializers (e.g., given struct S { char n, a[]; } s = { 0 };)
> or with braced-list initializers (e.g., given
> struct S s = { 3 { 1, 2, 3, 0 } };  The patch lets string_constant
> convert the CONSTRUCTOR for s.a into a STRING_CST, which in turn
> enables the folding of calls to built-ins like strlen, strchr, or
> strcmp with such arguments.
> 
> Exposing the strings to the folder then also lets it detect and
> issue warnings for out-of-bounds offsets in more instances of
> such references than before.
> 
> The remaining changes in the patch mostly enhance the places
> that use the no-warning bit to prevent redundant diagnostics.
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> gcc-91490.diff
> 
> PR middle-end/91490 - bogus argument missing terminating nul warning on 
> strlen of a flexible array member
> 
> gcc/c-family/ChangeLog:
> 
>   PR middle-end/91490
>   * c-common.c (braced_list_to_string): Add argument and overload.
>   Handle flexible length arrays.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR middle-end/91490
>   * c-c++-common/Warray-bounds-7.c: New test.
>   * gcc.dg/Warray-bounds-39.c: Expect either -Warray-bounds or
>   -Wstringop-overflow.
>   * gcc.dg/strlenopt-78.c: New test.
> 
> gcc/ChangeLog:
> 
>   PR middle-end/91490
>   * builtins.c (c_strlen): Rename argument and introduce new local.
>   Set no-warning bit on original argument.
>   * expr.c (string_constant): Pass argument type to fold_ctor_reference.
>   * gimple-fold.c (fold_nonarray_ctor_reference): Return a STRING_CST
>   for missing initializers.
>   * tree.c (build_string_literal): Handle optional argument.
>   * tree.h (build_string_literal): Add defaulted argument.
>   * gimple-ssa-warn-restrict.c (maybe_diag_access_bounds): Check
>   no-warning bit on original expression.
> 

> diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> index 2810867235d..ef942ffce57 100644
> --- a/gcc/c-family/c-common.c
> +++ b/gcc/c-family/c-common.c
> @@ -8868,7 +8876,7 @@ braced_lists_to_strings (tree type, tree ctor)
>unsigned HOST_WIDE_INT idx;
>FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (ctor), idx, val)
>   {
> -   val = braced_lists_to_strings (ttp, val);
> +   val = braced_lists_to_strings (ttp, val, code == RECORD_TYPE);
Do you need to handle UNION_TYPE or QUAL_UNION_TYPE here too?  If so,
RECORD_OR_UNION_TYPE_P is a better test.


> diff --git a/gcc/expr.c b/gcc/expr.c
> index 92979289e83..d16e0982dcf 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -11402,6 +11402,16 @@ is_aligning_offset (const_tree offset, const_tree 
> exp)
>  tree
>  string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
>  {
> +  tree dummy = NULL_TREE;;
> +  if (!mem_size)
> +mem_size = 
> +
> +  tree chartype = TREE_TYPE (arg);
> +  if (POINTER_TYPE_P (chartype))
> +chartype = TREE_TYPE (chartype);
> +  while (TREE_CODE (chartype) == ARRAY_TYPE)
> +chartype = TREE_TYPE (chartype);
I'm hesitant to ACK this bit.  I can understand that if we're given an
ADDR_EXPR, which should be a POINTER_TYPE_P, that we'll want to look at
what it's pointing to.  But shouldn't we verify that it's an ADDR_EXPR
before just blindly stripping away the outer type?

I also wonder if this (assuming we keep it in some form) belongs after
the STRIP_NOPs.


> @@ -11602,17 +11605,39 @@ string_constant (tree arg, tree *ptr_offset, tree 
> *mem_size, tree *decl)
>int len = native_encode_expr (init, charbuf, sizeof charbuf, 0);
>if (len > 0)
>   {
> -   /* Construct a string literal with elements of ELTYPE and
> +   /* Construct a string literal with elements of INITTYPE and
>the representation above.  Then strip
>the ADDR_EXPR (ARRAY_REF (...)) around the STRING_CST.  */
> -   init = build_string_literal (len, (char *)charbuf, eltype);
> +   init = build_string_literal (len, (char *)charbuf, inittype);
> init = TREE_OPERAND (TREE_OPERAND (init, 0), 0);
>   }
>  }
>  
> +  tree initsize = TYPE_SIZE_UNIT (inittype);
> +
> +  if (TREE_CODE (init) == CONSTRUCTOR && !CONSTRUCTOR_ELTS (init))
Would initializer_zerop be better here?  That would catch
zero-initialized constructors as well.

If 

[PATCH] fold constant flexarrays to strings (PR 91490)

2019-08-21 Thread Martin Sebor

This patch is a subset of the solution for PR 91457 whose main
goal is to eliminate inconsistencies in warnings issued for
out-of-bounds accesses to the various flavors of flexible array
members of constant objects.  That patch was posted here:
  https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01202.html

Like PR 91457, this patch also relies on improving optimization
to issue better quality warnings.  (I.e., with the latter being
what motivated it.)

The optimization enhances string_constant to recognize empty
CONSTRUCTORs returned by fold_ctor_reference for references
to members of constant objects with either "insufficient"
initializers (e.g., given struct S { char n, a[]; } s = { 0 };)
or with braced-list initializers (e.g., given
struct S s = { 3 { 1, 2, 3, 0 } };  The patch lets string_constant
convert the CONSTRUCTOR for s.a into a STRING_CST, which in turn
enables the folding of calls to built-ins like strlen, strchr, or
strcmp with such arguments.

Exposing the strings to the folder then also lets it detect and
issue warnings for out-of-bounds offsets in more instances of
such references than before.

The remaining changes in the patch mostly enhance the places
that use the no-warning bit to prevent redundant diagnostics.

Tested on x86_64-linux.

Martin
PR middle-end/91490 - bogus argument missing terminating nul warning on strlen of a flexible array member

gcc/c-family/ChangeLog:

	PR middle-end/91490
	* c-common.c (braced_list_to_string): Add argument and overload.
	Handle flexible length arrays.

gcc/testsuite/ChangeLog:

	PR middle-end/91490
	* c-c++-common/Warray-bounds-7.c: New test.
	* gcc.dg/Warray-bounds-39.c: Expect either -Warray-bounds or
	-Wstringop-overflow.
	* gcc.dg/strlenopt-78.c: New test.

gcc/ChangeLog:

	PR middle-end/91490
	* builtins.c (c_strlen): Rename argument and introduce new local.
	Set no-warning bit on original argument.
	* expr.c (string_constant): Pass argument type to fold_ctor_reference.
	* gimple-fold.c (fold_nonarray_ctor_reference): Return a STRING_CST
	for missing initializers.
	* tree.c (build_string_literal): Handle optional argument.
	* tree.h (build_string_literal): Add defaulted argument.
	* gimple-ssa-warn-restrict.c (maybe_diag_access_bounds): Check
	no-warning bit on original expression.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 9a766e4ad63..073b92a1dc9 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -620,7 +620,7 @@ unterminated_array (tree exp, tree *size /* = NULL */, bool *exact /* = NULL */)
into the instruction stream and zero if it is going to be expanded.
E.g. with i++ ? "foo" : "bar", if ONLY_VALUE is nonzero, constant 3
is returned, otherwise NULL, since
-   len = c_strlen (src, 1); if (len) expand_expr (len, ...); would not
+   len = c_strlen (ARG, 1); if (len) expand_expr (len, ...); would not
evaluate the side-effects.
 
If ONLY_VALUE is two then we do not emit warnings about out-of-bound
@@ -628,7 +628,7 @@ unterminated_array (tree exp, tree *size /* = NULL */, bool *exact /* = NULL */)
into the instruction stream.
 
Additional information about the string accessed may be recorded
-   in DATA.  For example, if SRC references an unterminated string,
+   in DATA.  For example, if ARG references an unterminated string,
then the declaration will be stored in the DECL field.   If the
length of the unterminated string can be determined, it'll be
stored in the LEN field.  Note this length could well be different
@@ -640,7 +640,7 @@ unterminated_array (tree exp, tree *size /* = NULL */, bool *exact /* = NULL */)
The value returned is of type `ssizetype'.  */
 
 tree
-c_strlen (tree src, int only_value, c_strlen_data *data, unsigned eltsize)
+c_strlen (tree arg, int only_value, c_strlen_data *data, unsigned eltsize)
 {
   /* If we were not passed a DATA pointer, then get one to a local
  structure.  That avoids having to check DATA for NULL before
@@ -650,7 +650,8 @@ c_strlen (tree src, int only_value, c_strlen_data *data, unsigned eltsize)
 data = _strlen_data;
 
   gcc_checking_assert (eltsize == 1 || eltsize == 2 || eltsize == 4);
-  STRIP_NOPS (src);
+
+  tree src = STRIP_NOPS (arg);
   if (TREE_CODE (src) == COND_EXPR
   && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0
 {
@@ -762,11 +763,15 @@ c_strlen (tree src, int only_value, c_strlen_data *data, unsigned eltsize)
 {
   /* Suppress multiple warnings for propagated constant strings.  */
   if (only_value != 2
-	  && !TREE_NO_WARNING (src)
+	  && !TREE_NO_WARNING (arg)
 	  && warning_at (loc, OPT_Warray_bounds,
 			 "offset %qwi outside bounds of constant string",
 			 eltoff))
-	TREE_NO_WARNING (src) = 1;
+	{
+	  if (decl)
+	inform (DECL_SOURCE_LOCATION (decl), "%qE declared here", decl);
+	  TREE_NO_WARNING (arg) = 1;
+	}
   return NULL_TREE;
 }
 
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 2810867235d..ef942ffce57 100644
--- a/gcc/c-family/c-common.c