Re: [PATCH] fold strlen() of aggregate members (PR 77357)

2018-07-11 Thread Martin Sebor

On 07/11/2018 07:50 AM, Andre Vieira (lists) wrote:

On 11/07/18 11:00, Andre Vieira (lists) wrote:

On 09/07/18 22:44, Martin Sebor wrote:

On 07/09/2018 06:40 AM, Richard Biener wrote:

On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor  wrote:


On 07/06/2018 09:52 AM, Richard Biener wrote:

On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor  wrote:


GCC folds accesses to members of constant aggregates except
for character arrays/strings.  For example, the strlen() call
below is not folded:

   const char a[][4] = { "1", "12" };

   int f (void) { retturn strlen (a[1]); }

The attached change set enhances the string_constant() function
to make it possible to extract string constants from aggregate
initializers (CONSTRUCTORS).

The initial solution was much simpler but as is often the case,
MEM_REF made it fail to fold things like:

   int f (void) { retturn strlen (a[1] + 1); }

Handling those made the project a bit more interesting and
the final solution somewhat more involved.

To handle offsets into aggregate string members the patch also
extends the fold_ctor_reference() function to extract entire
string array initializers even if the offset points past
the beginning of the string and even though the size and
exact type of the reference are not known (there isn't enough
information in a MEM_REF to determine that).

Tested along with the patch for PR 86415 on x86_64-linux.


+  if (TREE_CODE (init) == CONSTRUCTOR)
+   {
+ tree type;
+ if (TREE_CODE (arg) == ARRAY_REF
+ || TREE_CODE (arg) == MEM_REF)
+   type = TREE_TYPE (arg);
+ else if (TREE_CODE (arg) == COMPONENT_REF)
+   {
+ tree field = TREE_OPERAND (arg, 1);
+ type = TREE_TYPE (field);
+   }
+ else
+   return NULL_TREE;

what's wrong with just

type = TREE_TYPE (field);


In response to your comment below abut size I simplified things
further so determining the type a priori is no longer necessary.


?

+ base_off *= BITS_PER_UNIT;

poly_uint64 isn't enough for "bits", with wide-int you'd use
offset_int,
for poly you'd then use poly_offset?


Okay, I tried to avoid the overflow.  (Converting between all
these flavors of wide int types is a monumental PITA.)



You extend fold_ctor_reference to treat size == 0 specially but then
bother to compute a size here - that looks unneeded?


Yes, well spotted, thanks!  I simplified the code so this isn't
necessary, and neither is the type.



While the offset of the reference determines the first field in the
CONSTRUCTOR, how do you know the access doesn't touch
adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
so consider

  char x[2][4] = { "abcd", "abcd" };

and MEM[&x] with a char[8] type?  memcpy "inlining" will create
such MEMs for example.


The code is only used to find string constants in initializer
expressions where I don't think the size of the access comes
into play.  If a memcpy() call results in a MEM_REF[char[8],
&x, 8] that's fine.  It's a valid reference and we can still
get the underlying character sequence (which is represented
as two STRING_CSTs with the two string literals).  I might
be missing the point of your question.


Maybe irrelevant for strlen folding depending on what you do
for missing '\0' termination.



@@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree
ctor,
   tree byte_offset = DECL_FIELD_OFFSET (cfield);
   tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
   tree field_size = DECL_SIZE (cfield);
-  offset_int bitoffset;
-  offset_int bitoffset_end, access_end;
+
+  if (!field_size && TREE_CODE (cval) == STRING_CST)
+   {
+ /* Determine the size of the flexible array member from
+the size of the string initializer provided for it.  */
+ unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
+ tree eltype = TREE_TYPE (TREE_TYPE (cval));
+ len *= tree_to_uhwi (TYPE_SIZE (eltype));
+ field_size = build_int_cst (size_type_node, len);
+   }

Why does this only apply to STRING_CST initializers and not
CONSTRUCTORS,
say, for

struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };


I can't think of a use for it.  Do you have something in mind?


Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
which is useful in other parts of the compiler.  So I don't see why
it shouldn't work for general flex-arrays.



?  And why not use simply

  field_size = TYPE_SIZE (TREE_TYPE (cval));

like you do in c_strlen?


Yes, that's simpler, thanks.



Otherwise looks reasonable.


Attached is an updated patch.  I also enhanced the handling
of non-constant indices.  They were already handled before
to a smaller extent.  (There may be other opportunities
here.)


Please don't do functional changes to a patch in review, without
exactly pointing out the change.  It makes review inefficent for me.

Looks like it might be the NULL type argument handling?

Re: [PATCH] fold strlen() of aggregate members (PR 77357)

2018-07-11 Thread Andre Vieira (lists)
On 11/07/18 11:00, Andre Vieira (lists) wrote:
> On 09/07/18 22:44, Martin Sebor wrote:
>> On 07/09/2018 06:40 AM, Richard Biener wrote:
>>> On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor  wrote:

 On 07/06/2018 09:52 AM, Richard Biener wrote:
> On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor  wrote:
>>
>> GCC folds accesses to members of constant aggregates except
>> for character arrays/strings.  For example, the strlen() call
>> below is not folded:
>>
>>const char a[][4] = { "1", "12" };
>>
>>int f (void) { retturn strlen (a[1]); }
>>
>> The attached change set enhances the string_constant() function
>> to make it possible to extract string constants from aggregate
>> initializers (CONSTRUCTORS).
>>
>> The initial solution was much simpler but as is often the case,
>> MEM_REF made it fail to fold things like:
>>
>>int f (void) { retturn strlen (a[1] + 1); }
>>
>> Handling those made the project a bit more interesting and
>> the final solution somewhat more involved.
>>
>> To handle offsets into aggregate string members the patch also
>> extends the fold_ctor_reference() function to extract entire
>> string array initializers even if the offset points past
>> the beginning of the string and even though the size and
>> exact type of the reference are not known (there isn't enough
>> information in a MEM_REF to determine that).
>>
>> Tested along with the patch for PR 86415 on x86_64-linux.
>
> +  if (TREE_CODE (init) == CONSTRUCTOR)
> +   {
> + tree type;
> + if (TREE_CODE (arg) == ARRAY_REF
> + || TREE_CODE (arg) == MEM_REF)
> +   type = TREE_TYPE (arg);
> + else if (TREE_CODE (arg) == COMPONENT_REF)
> +   {
> + tree field = TREE_OPERAND (arg, 1);
> + type = TREE_TYPE (field);
> +   }
> + else
> +   return NULL_TREE;
>
> what's wrong with just
>
> type = TREE_TYPE (field);

 In response to your comment below abut size I simplified things
 further so determining the type a priori is no longer necessary.

> ?
>
> + base_off *= BITS_PER_UNIT;
>
> poly_uint64 isn't enough for "bits", with wide-int you'd use
> offset_int,
> for poly you'd then use poly_offset?

 Okay, I tried to avoid the overflow.  (Converting between all
 these flavors of wide int types is a monumental PITA.)

>
> You extend fold_ctor_reference to treat size == 0 specially but then
> bother to compute a size here - that looks unneeded?

 Yes, well spotted, thanks!  I simplified the code so this isn't
 necessary, and neither is the type.

>
> While the offset of the reference determines the first field in the
> CONSTRUCTOR, how do you know the access doesn't touch
> adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
> so consider
>
>   char x[2][4] = { "abcd", "abcd" };
>
> and MEM[&x] with a char[8] type?  memcpy "inlining" will create
> such MEMs for example.

 The code is only used to find string constants in initializer
 expressions where I don't think the size of the access comes
 into play.  If a memcpy() call results in a MEM_REF[char[8],
 &x, 8] that's fine.  It's a valid reference and we can still
 get the underlying character sequence (which is represented
 as two STRING_CSTs with the two string literals).  I might
 be missing the point of your question.
>>>
>>> Maybe irrelevant for strlen folding depending on what you do
>>> for missing '\0' termination.
>>>
>
> @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree
> ctor,
>tree byte_offset = DECL_FIELD_OFFSET (cfield);
>tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
>tree field_size = DECL_SIZE (cfield);
> -  offset_int bitoffset;
> -  offset_int bitoffset_end, access_end;
> +
> +  if (!field_size && TREE_CODE (cval) == STRING_CST)
> +   {
> + /* Determine the size of the flexible array member from
> +the size of the string initializer provided for it.  */
> + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
> + tree eltype = TREE_TYPE (TREE_TYPE (cval));
> + len *= tree_to_uhwi (TYPE_SIZE (eltype));
> + field_size = build_int_cst (size_type_node, len);
> +   }
>
> Why does this only apply to STRING_CST initializers and not
> CONSTRUCTORS,
> say, for
>
> struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };

 I can't think of a use for it.  Do you have something in mind?
>>>
>>> Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
>>> which is useful in

Re: [PATCH] fold strlen() of aggregate members (PR 77357)

2018-07-11 Thread Andre Vieira (lists)
On 09/07/18 22:44, Martin Sebor wrote:
> On 07/09/2018 06:40 AM, Richard Biener wrote:
>> On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor  wrote:
>>>
>>> On 07/06/2018 09:52 AM, Richard Biener wrote:
 On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor  wrote:
>
> GCC folds accesses to members of constant aggregates except
> for character arrays/strings.  For example, the strlen() call
> below is not folded:
>
>const char a[][4] = { "1", "12" };
>
>int f (void) { retturn strlen (a[1]); }
>
> The attached change set enhances the string_constant() function
> to make it possible to extract string constants from aggregate
> initializers (CONSTRUCTORS).
>
> The initial solution was much simpler but as is often the case,
> MEM_REF made it fail to fold things like:
>
>int f (void) { retturn strlen (a[1] + 1); }
>
> Handling those made the project a bit more interesting and
> the final solution somewhat more involved.
>
> To handle offsets into aggregate string members the patch also
> extends the fold_ctor_reference() function to extract entire
> string array initializers even if the offset points past
> the beginning of the string and even though the size and
> exact type of the reference are not known (there isn't enough
> information in a MEM_REF to determine that).
>
> Tested along with the patch for PR 86415 on x86_64-linux.

 +  if (TREE_CODE (init) == CONSTRUCTOR)
 +   {
 + tree type;
 + if (TREE_CODE (arg) == ARRAY_REF
 + || TREE_CODE (arg) == MEM_REF)
 +   type = TREE_TYPE (arg);
 + else if (TREE_CODE (arg) == COMPONENT_REF)
 +   {
 + tree field = TREE_OPERAND (arg, 1);
 + type = TREE_TYPE (field);
 +   }
 + else
 +   return NULL_TREE;

 what's wrong with just

 type = TREE_TYPE (field);
>>>
>>> In response to your comment below abut size I simplified things
>>> further so determining the type a priori is no longer necessary.
>>>
 ?

 + base_off *= BITS_PER_UNIT;

 poly_uint64 isn't enough for "bits", with wide-int you'd use
 offset_int,
 for poly you'd then use poly_offset?
>>>
>>> Okay, I tried to avoid the overflow.  (Converting between all
>>> these flavors of wide int types is a monumental PITA.)
>>>

 You extend fold_ctor_reference to treat size == 0 specially but then
 bother to compute a size here - that looks unneeded?
>>>
>>> Yes, well spotted, thanks!  I simplified the code so this isn't
>>> necessary, and neither is the type.
>>>

 While the offset of the reference determines the first field in the
 CONSTRUCTOR, how do you know the access doesn't touch
 adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
 so consider

   char x[2][4] = { "abcd", "abcd" };

 and MEM[&x] with a char[8] type?  memcpy "inlining" will create
 such MEMs for example.
>>>
>>> The code is only used to find string constants in initializer
>>> expressions where I don't think the size of the access comes
>>> into play.  If a memcpy() call results in a MEM_REF[char[8],
>>> &x, 8] that's fine.  It's a valid reference and we can still
>>> get the underlying character sequence (which is represented
>>> as two STRING_CSTs with the two string literals).  I might
>>> be missing the point of your question.
>>
>> Maybe irrelevant for strlen folding depending on what you do
>> for missing '\0' termination.
>>

 @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree
 ctor,
tree byte_offset = DECL_FIELD_OFFSET (cfield);
tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
tree field_size = DECL_SIZE (cfield);
 -  offset_int bitoffset;
 -  offset_int bitoffset_end, access_end;
 +
 +  if (!field_size && TREE_CODE (cval) == STRING_CST)
 +   {
 + /* Determine the size of the flexible array member from
 +the size of the string initializer provided for it.  */
 + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
 + tree eltype = TREE_TYPE (TREE_TYPE (cval));
 + len *= tree_to_uhwi (TYPE_SIZE (eltype));
 + field_size = build_int_cst (size_type_node, len);
 +   }

 Why does this only apply to STRING_CST initializers and not
 CONSTRUCTORS,
 say, for

 struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };
>>>
>>> I can't think of a use for it.  Do you have something in mind?
>>
>> Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
>> which is useful in other parts of the compiler.  So I don't see why
>> it shouldn't work for general flex-arrays.
>>

 ?  And why not use simply

   field_size = TYPE_SIZE (TRE

Re: [PATCH] fold strlen() of aggregate members (PR 77357)

2018-07-09 Thread Martin Sebor

On 07/09/2018 06:40 AM, Richard Biener wrote:

On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor  wrote:


On 07/06/2018 09:52 AM, Richard Biener wrote:

On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor  wrote:


GCC folds accesses to members of constant aggregates except
for character arrays/strings.  For example, the strlen() call
below is not folded:

   const char a[][4] = { "1", "12" };

   int f (void) { retturn strlen (a[1]); }

The attached change set enhances the string_constant() function
to make it possible to extract string constants from aggregate
initializers (CONSTRUCTORS).

The initial solution was much simpler but as is often the case,
MEM_REF made it fail to fold things like:

   int f (void) { retturn strlen (a[1] + 1); }

Handling those made the project a bit more interesting and
the final solution somewhat more involved.

To handle offsets into aggregate string members the patch also
extends the fold_ctor_reference() function to extract entire
string array initializers even if the offset points past
the beginning of the string and even though the size and
exact type of the reference are not known (there isn't enough
information in a MEM_REF to determine that).

Tested along with the patch for PR 86415 on x86_64-linux.


+  if (TREE_CODE (init) == CONSTRUCTOR)
+   {
+ tree type;
+ if (TREE_CODE (arg) == ARRAY_REF
+ || TREE_CODE (arg) == MEM_REF)
+   type = TREE_TYPE (arg);
+ else if (TREE_CODE (arg) == COMPONENT_REF)
+   {
+ tree field = TREE_OPERAND (arg, 1);
+ type = TREE_TYPE (field);
+   }
+ else
+   return NULL_TREE;

what's wrong with just

type = TREE_TYPE (field);


In response to your comment below abut size I simplified things
further so determining the type a priori is no longer necessary.


?

+ base_off *= BITS_PER_UNIT;

poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int,
for poly you'd then use poly_offset?


Okay, I tried to avoid the overflow.  (Converting between all
these flavors of wide int types is a monumental PITA.)



You extend fold_ctor_reference to treat size == 0 specially but then
bother to compute a size here - that looks unneeded?


Yes, well spotted, thanks!  I simplified the code so this isn't
necessary, and neither is the type.



While the offset of the reference determines the first field in the
CONSTRUCTOR, how do you know the access doesn't touch
adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
so consider

  char x[2][4] = { "abcd", "abcd" };

and MEM[&x] with a char[8] type?  memcpy "inlining" will create
such MEMs for example.


The code is only used to find string constants in initializer
expressions where I don't think the size of the access comes
into play.  If a memcpy() call results in a MEM_REF[char[8],
&x, 8] that's fine.  It's a valid reference and we can still
get the underlying character sequence (which is represented
as two STRING_CSTs with the two string literals).  I might
be missing the point of your question.


Maybe irrelevant for strlen folding depending on what you do
for missing '\0' termination.



@@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
   tree byte_offset = DECL_FIELD_OFFSET (cfield);
   tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
   tree field_size = DECL_SIZE (cfield);
-  offset_int bitoffset;
-  offset_int bitoffset_end, access_end;
+
+  if (!field_size && TREE_CODE (cval) == STRING_CST)
+   {
+ /* Determine the size of the flexible array member from
+the size of the string initializer provided for it.  */
+ unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
+ tree eltype = TREE_TYPE (TREE_TYPE (cval));
+ len *= tree_to_uhwi (TYPE_SIZE (eltype));
+ field_size = build_int_cst (size_type_node, len);
+   }

Why does this only apply to STRING_CST initializers and not CONSTRUCTORS,
say, for

struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };


I can't think of a use for it.  Do you have something in mind?


Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
which is useful in other parts of the compiler.  So I don't see why
it shouldn't work for general flex-arrays.



?  And why not use simply

  field_size = TYPE_SIZE (TREE_TYPE (cval));

like you do in c_strlen?


Yes, that's simpler, thanks.



Otherwise looks reasonable.


Attached is an updated patch.  I also enhanced the handling
of non-constant indices.  They were already handled before
to a smaller extent.  (There may be other opportunities
here.)


Please don't do functional changes to a patch in review, without
exactly pointing out the change.  It makes review inefficent for me.

Looks like it might be the NULL type argument handling?


Sorry.  The change I was referring to is the addition and handling
of the varidx variable to string_constant.  It was necessary for
parity 

Re: [PATCH] fold strlen() of aggregate members (PR 77357)

2018-07-09 Thread Richard Biener
On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor  wrote:
>
> On 07/06/2018 09:52 AM, Richard Biener wrote:
> > On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor  wrote:
> >>
> >> GCC folds accesses to members of constant aggregates except
> >> for character arrays/strings.  For example, the strlen() call
> >> below is not folded:
> >>
> >>const char a[][4] = { "1", "12" };
> >>
> >>int f (void) { retturn strlen (a[1]); }
> >>
> >> The attached change set enhances the string_constant() function
> >> to make it possible to extract string constants from aggregate
> >> initializers (CONSTRUCTORS).
> >>
> >> The initial solution was much simpler but as is often the case,
> >> MEM_REF made it fail to fold things like:
> >>
> >>int f (void) { retturn strlen (a[1] + 1); }
> >>
> >> Handling those made the project a bit more interesting and
> >> the final solution somewhat more involved.
> >>
> >> To handle offsets into aggregate string members the patch also
> >> extends the fold_ctor_reference() function to extract entire
> >> string array initializers even if the offset points past
> >> the beginning of the string and even though the size and
> >> exact type of the reference are not known (there isn't enough
> >> information in a MEM_REF to determine that).
> >>
> >> Tested along with the patch for PR 86415 on x86_64-linux.
> >
> > +  if (TREE_CODE (init) == CONSTRUCTOR)
> > +   {
> > + tree type;
> > + if (TREE_CODE (arg) == ARRAY_REF
> > + || TREE_CODE (arg) == MEM_REF)
> > +   type = TREE_TYPE (arg);
> > + else if (TREE_CODE (arg) == COMPONENT_REF)
> > +   {
> > + tree field = TREE_OPERAND (arg, 1);
> > + type = TREE_TYPE (field);
> > +   }
> > + else
> > +   return NULL_TREE;
> >
> > what's wrong with just
> >
> > type = TREE_TYPE (field);
>
> In response to your comment below abut size I simplified things
> further so determining the type a priori is no longer necessary.
>
> > ?
> >
> > + base_off *= BITS_PER_UNIT;
> >
> > poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int,
> > for poly you'd then use poly_offset?
>
> Okay, I tried to avoid the overflow.  (Converting between all
> these flavors of wide int types is a monumental PITA.)
>
> >
> > You extend fold_ctor_reference to treat size == 0 specially but then
> > bother to compute a size here - that looks unneeded?
>
> Yes, well spotted, thanks!  I simplified the code so this isn't
> necessary, and neither is the type.
>
> >
> > While the offset of the reference determines the first field in the
> > CONSTRUCTOR, how do you know the access doesn't touch
> > adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
> > so consider
> >
> >   char x[2][4] = { "abcd", "abcd" };
> >
> > and MEM[&x] with a char[8] type?  memcpy "inlining" will create
> > such MEMs for example.
>
> The code is only used to find string constants in initializer
> expressions where I don't think the size of the access comes
> into play.  If a memcpy() call results in a MEM_REF[char[8],
> &x, 8] that's fine.  It's a valid reference and we can still
> get the underlying character sequence (which is represented
> as two STRING_CSTs with the two string literals).  I might
> be missing the point of your question.

Maybe irrelevant for strlen folding depending on what you do
for missing '\0' termination.

> >
> > @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
> >tree byte_offset = DECL_FIELD_OFFSET (cfield);
> >tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
> >tree field_size = DECL_SIZE (cfield);
> > -  offset_int bitoffset;
> > -  offset_int bitoffset_end, access_end;
> > +
> > +  if (!field_size && TREE_CODE (cval) == STRING_CST)
> > +   {
> > + /* Determine the size of the flexible array member from
> > +the size of the string initializer provided for it.  */
> > + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
> > + tree eltype = TREE_TYPE (TREE_TYPE (cval));
> > + len *= tree_to_uhwi (TYPE_SIZE (eltype));
> > + field_size = build_int_cst (size_type_node, len);
> > +   }
> >
> > Why does this only apply to STRING_CST initializers and not CONSTRUCTORS,
> > say, for
> >
> > struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };
>
> I can't think of a use for it.  Do you have something in mind?

Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
which is useful in other parts of the compiler.  So I don't see why
it shouldn't work for general flex-arrays.

> >
> > ?  And why not use simply
> >
> >   field_size = TYPE_SIZE (TREE_TYPE (cval));
> >
> > like you do in c_strlen?
>
> Yes, that's simpler, thanks.
>
> >
> > Otherwise looks reasonable.
>
> Attached is an updated patch.  I also enhanced the handling
> of non-constant indices.  They were already handled before
> to a smaller extent.  

Re: [PATCH] fold strlen() of aggregate members (PR 77357)

2018-07-07 Thread Martin Sebor

On 07/06/2018 09:52 AM, Richard Biener wrote:

On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor  wrote:


GCC folds accesses to members of constant aggregates except
for character arrays/strings.  For example, the strlen() call
below is not folded:

   const char a[][4] = { "1", "12" };

   int f (void) { retturn strlen (a[1]); }

The attached change set enhances the string_constant() function
to make it possible to extract string constants from aggregate
initializers (CONSTRUCTORS).

The initial solution was much simpler but as is often the case,
MEM_REF made it fail to fold things like:

   int f (void) { retturn strlen (a[1] + 1); }

Handling those made the project a bit more interesting and
the final solution somewhat more involved.

To handle offsets into aggregate string members the patch also
extends the fold_ctor_reference() function to extract entire
string array initializers even if the offset points past
the beginning of the string and even though the size and
exact type of the reference are not known (there isn't enough
information in a MEM_REF to determine that).

Tested along with the patch for PR 86415 on x86_64-linux.


+  if (TREE_CODE (init) == CONSTRUCTOR)
+   {
+ tree type;
+ if (TREE_CODE (arg) == ARRAY_REF
+ || TREE_CODE (arg) == MEM_REF)
+   type = TREE_TYPE (arg);
+ else if (TREE_CODE (arg) == COMPONENT_REF)
+   {
+ tree field = TREE_OPERAND (arg, 1);
+ type = TREE_TYPE (field);
+   }
+ else
+   return NULL_TREE;

what's wrong with just

type = TREE_TYPE (field);


In response to your comment below abut size I simplified things
further so determining the type a priori is no longer necessary.


?

+ base_off *= BITS_PER_UNIT;

poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int,
for poly you'd then use poly_offset?


Okay, I tried to avoid the overflow.  (Converting between all
these flavors of wide int types is a monumental PITA.)



You extend fold_ctor_reference to treat size == 0 specially but then
bother to compute a size here - that looks unneeded?


Yes, well spotted, thanks!  I simplified the code so this isn't
necessary, and neither is the type.



While the offset of the reference determines the first field in the
CONSTRUCTOR, how do you know the access doesn't touch
adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
so consider

  char x[2][4] = { "abcd", "abcd" };

and MEM[&x] with a char[8] type?  memcpy "inlining" will create
such MEMs for example.


The code is only used to find string constants in initializer
expressions where I don't think the size of the access comes
into play.  If a memcpy() call results in a MEM_REF[char[8],
&x, 8] that's fine.  It's a valid reference and we can still
get the underlying character sequence (which is represented
as two STRING_CSTs with the two string literals).  I might
be missing the point of your question.



@@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
   tree byte_offset = DECL_FIELD_OFFSET (cfield);
   tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
   tree field_size = DECL_SIZE (cfield);
-  offset_int bitoffset;
-  offset_int bitoffset_end, access_end;
+
+  if (!field_size && TREE_CODE (cval) == STRING_CST)
+   {
+ /* Determine the size of the flexible array member from
+the size of the string initializer provided for it.  */
+ unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
+ tree eltype = TREE_TYPE (TREE_TYPE (cval));
+ len *= tree_to_uhwi (TYPE_SIZE (eltype));
+ field_size = build_int_cst (size_type_node, len);
+   }

Why does this only apply to STRING_CST initializers and not CONSTRUCTORS,
say, for

struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };


I can't think of a use for it.  Do you have something in mind?



?  And why not use simply

  field_size = TYPE_SIZE (TREE_TYPE (cval));

like you do in c_strlen?


Yes, that's simpler, thanks.



Otherwise looks reasonable.


Attached is an updated patch.  I also enhanced the handling
of non-constant indices.  They were already handled before
to a smaller extent.  (There may be other opportunities
here.)

Martin
PR middle-end/77357 - strlen of constant strings not folded

gcc/ChangeLog:

	* builtins.c (c_strlen): Avoid out-of-bounds warnings when
	accessing implicitly initialized array elements.
	* expr.c (string_constant): Handle string initializers of
	character arrays within aggregates.
	* gimple-fold.c (fold_array_ctor_reference): Add argument.
	Store element offset.  As a special case, handle zero size.
	(fold_nonarray_ctor_reference): Same.
	(fold_ctor_reference): Add argument.  Store subobject offset.
	* gimple-fold.h (fold_ctor_reference): Add argument.

gcc/testsuite/ChangeLog:

	PR middle-end/77357
	* gcc.dg/strlenopt-49.c: New test.
	* gcc.dg/strlenopt-50.c: New test.
	* gcc.dg/strleno

Re: [PATCH] fold strlen() of aggregate members (PR 77357)

2018-07-06 Thread Richard Biener
On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor  wrote:
>
> GCC folds accesses to members of constant aggregates except
> for character arrays/strings.  For example, the strlen() call
> below is not folded:
>
>const char a[][4] = { "1", "12" };
>
>int f (void) { retturn strlen (a[1]); }
>
> The attached change set enhances the string_constant() function
> to make it possible to extract string constants from aggregate
> initializers (CONSTRUCTORS).
>
> The initial solution was much simpler but as is often the case,
> MEM_REF made it fail to fold things like:
>
>int f (void) { retturn strlen (a[1] + 1); }
>
> Handling those made the project a bit more interesting and
> the final solution somewhat more involved.
>
> To handle offsets into aggregate string members the patch also
> extends the fold_ctor_reference() function to extract entire
> string array initializers even if the offset points past
> the beginning of the string and even though the size and
> exact type of the reference are not known (there isn't enough
> information in a MEM_REF to determine that).
>
> Tested along with the patch for PR 86415 on x86_64-linux.

+  if (TREE_CODE (init) == CONSTRUCTOR)
+   {
+ tree type;
+ if (TREE_CODE (arg) == ARRAY_REF
+ || TREE_CODE (arg) == MEM_REF)
+   type = TREE_TYPE (arg);
+ else if (TREE_CODE (arg) == COMPONENT_REF)
+   {
+ tree field = TREE_OPERAND (arg, 1);
+ type = TREE_TYPE (field);
+   }
+ else
+   return NULL_TREE;

what's wrong with just

type = TREE_TYPE (field);

?

+ base_off *= BITS_PER_UNIT;

poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int,
for poly you'd then use poly_offset?

You extend fold_ctor_reference to treat size == 0 specially but then
bother to compute a size here - that looks unneeded?

While the offset of the reference determines the first field in the
CONSTRUCTOR, how do you know the access doesn't touch
adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
so consider

  char x[2][4] = { "abcd", "abcd" };

and MEM[&x] with a char[8] type?  memcpy "inlining" will create
such MEMs for example.

@@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
   tree byte_offset = DECL_FIELD_OFFSET (cfield);
   tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
   tree field_size = DECL_SIZE (cfield);
-  offset_int bitoffset;
-  offset_int bitoffset_end, access_end;
+
+  if (!field_size && TREE_CODE (cval) == STRING_CST)
+   {
+ /* Determine the size of the flexible array member from
+the size of the string initializer provided for it.  */
+ unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
+ tree eltype = TREE_TYPE (TREE_TYPE (cval));
+ len *= tree_to_uhwi (TYPE_SIZE (eltype));
+ field_size = build_int_cst (size_type_node, len);
+   }

Why does this only apply to STRING_CST initializers and not CONSTRUCTORS,
say, for

struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };

?  And why not use simply

  field_size = TYPE_SIZE (TREE_TYPE (cval));

like you do in c_strlen?

Otherwise looks reasonable.

Thanks,
Richard.

> Martin