Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-10-04 Thread Jeff Law
On 9/23/19 4:14 PM, Martin Sebor wrote:

> 
> Yes, it looks redundant.  I never remember which of these functions
> ICE when their argument is not a constant (e.g., tree_int_cst_lt)
> and which ones handle it gracefully (e.g., tree_int_cst_equal) so
> I often check even when it isn't necessary.  It would be nice if
> these closely related APIs had consistent preconditions.
Can't argue with that!

> 
> MAXBOUND is only non-constant when set that way by client code to
> have the function set it to the longest PHI argument, otherwise
> it's either an INTEGER_CST or null.  The inner test may be dead
> code, a leftover from something earlier.  Either way, MAXBOUND
> is only used for diagnostics so it probably doesn't matter.
> 
 @@ -1653,8 +1661,11 @@ get_range_strlen (tree arg, bitmap *visited,
>>>  
>>>   /* Try to obtain the range of the lengths of the string(s) referenced
>>>  by ARG, or the size of the largest array ARG refers to if the range
>>> -   of lengths cannot be determined, and store all in *PDATA.  ELTSIZE
>>> -   is the expected size of the string element in bytes: 1 for char and
>>> +   of lengths cannot be determined, and store all in *PDATA which must
>>> +   be zero-initialized on input except PDATA->MAXBOUND may be set to
>>> +   a non-null tree node other than INTEGER_CST to request to have it
>>> +   set to the length of the longest string in a PHI.  ELTSIZE is
>>> +   the expected size of the string element in bytes: 1 for char and
>> Is there any reason we can't just make a clean distinction between input
>> and output objects in this routine?  As an API this seems awkward at best.
>>
>> Any thoughts on the API question raised?
> 
> I didn't add a new argument because in GCC 9 we got rid of a bunch
> of them to make the function less confusing.  The final signature
> (before the simplification) had 8 arguments:
> 
>    get_range_strlen (tree arg, tree length[2], bitmap *visited,
>  int type, int fuzzy, bool *flexp,
>  unsigned eltsize, tree *nonstr)
> 
> Some of them were being tested inconsistently and their effects
> were pretty subtle (especially TYPE and FUZZY).  The MAXBOUND
> setting is also subtle and used only for warnings so I'd rather
> not expose it as an argument that every caller has to worry about
> if it isn't necessary.
> 
> Longer term, I think a better design than directly accessing
> the data members is for c_strlen_data to become a proper C++ class
> with accessor functions to hide this stuff behind so these kinds
> of "warts" could be hidden out of sight.  Since it will touch all
> callers it should be made in a change independent of this one.
> 
> So for now I've removed the redundant test and fixed the typos below
> (clearly, I need a spell check for code comments).  I also had to
> make a few other minor tweaks to adjust to the recent changes on
> trunk.  Attached is an updated patch.
Sounds reasonable.  And yes, I'm certainly a fan of moving towards
proper classes.


> gcc-90879.diff
> 
> PR tree-optimization/90879 - fold zero-equality of strcmp between a longer 
> string and a smaller array
> 
> gcc/c-family/ChangeLog:
> 
>   PR tree-optimization/90879
>   * c.opt (-Wstring-compare): New option.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/90879
>   * gcc.dg/Wstring-compare-2.c: New test.
>   * gcc.dg/Wstring-compare.c: New test.
>   * gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
>   * gcc.dg/strcmpopt_6.c: New test.
>   * gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
>   test cases.
>   * gcc.dg/strlenopt-66.c: Run it.
>   * gcc.dg/strlenopt-68.c: New test.
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/90879
>   * builtins.c (check_access): Avoid using maxbound when null.
>   * calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen change.
>   * doc/invoke.texi (-Wstring-compare): Document new warning option.
>   * gimple-fold.c (get_range_strlen_tree): Make setting maxbound
>   conditional.
>   (get_range_strlen): Overwrite initial maxbound when non-null.
>   * gimple-ssa-sprintf.c (get_string_length): Adjust to get_range_strlen
>   changes.
>   * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
>   (used_only_for_zero_equality): New function.
>   (handle_builtin_memcmp): Call it.
>   (determine_min_objsize): Return an integer instead of tree.
>   (get_len_or_size, strxcmp_eqz_result): New functions.
>   (maybe_warn_pointless_strcmp): New function.
>   (handle_builtin_string_cmp): Call it.  Fold zero-equality of strcmp
>   between a longer string and a smaller array.
>   (get_range_strlen_dynamic): Overwrite initial maxbound when non-null.
Parts of this look like bits which I've already approved (some of the
get_range_strlen_dynamic bits).  But regardless, this is OK.


Jeff


Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-09-23 Thread Martin Sebor

On 9/3/19 2:00 PM, Jeff Law wrote:

On 8/28/19 3:12 PM, Martin Sebor wrote:

On 8/22/19 3:31 PM, Jeff Law wrote:

On 8/20/19 8:10 PM, Martin Sebor wrote:

Jeff,

Please let me know if you agree/disagree and what I need to
do to advance this work:

    https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00643.html

For the official record, I agree :-)


Great! :)

Any comments/suggestions on the patch?

   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00643.html

Martin

Yea, they were in an earlier message.  I'll extract the relevant
comments since some we addressed independently:


  
  
  
@@ -325,7 +333,7 @@ state_ident_by_name (const char *name, enum insert_option optins)

namlen = strlen (name);
stid =
  (struct state_ident_st *) xmalloc (sizeof (struct state_ident_st) +
-  namlen);
+  namlen + 1);
memset (stid, 0, sizeof (struct state_ident_st) + namlen);
strcpy (stid->stid_name, name);
*slot = stid;

How did you find this goof?




This was more a curiosity than anything.  Nothing we need to change here.


The code is correct as is, I just adjusted the allocated amount to
account for the change to use the zero length trailing array instead
of the [1] kind.  But neither was part of the updated patch (I had
initially posted an outdated version of my patch).




diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index fc57fb45e3a..582768090ae 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1346,6 +1346,10 @@ get_range_strlen_tree (tree arg, bitmap *visited, 
strlen_range_kind rkind,
}
  }
  
+  /* Set if VAL represents the maximum length based on array size (set

+ when exact length cannot be determined).  */
+  bool maxbound = false;
+
if (!val && rkind == SRK_LENRANGE)
  {
if (TREE_CODE (arg) == ADDR_EXPR)
@@ -1441,6 +1445,7 @@ get_range_strlen_tree (tree arg, bitmap *visited, 
strlen_range_kind rkind,
  pdata->minlen = ssize_int (0);
}
}
+  maxbound = true;
  }
  
if (!val)

@@ -1454,7 +1459,7 @@ get_range_strlen_tree (tree arg, bitmap *visited, 
strlen_range_kind rkind,
  && tree_int_cst_lt (val, pdata->minlen)))
  pdata->minlen = val;
  
-  if (pdata->maxbound)

+  if (pdata->maxbound && TREE_CODE (pdata->maxbound) == INTEGER_CST)
  {
/* Adjust the tighter (more optimistic) string length bound
 if necessary and proceed to adjust the more conservative

So inside the conditional guarded by the test you're changing above we have:

  if (TREE_CODE (val) == INTEGER_CST)
 {
   if (TREE_CODE (pdata->maxbound) == INTEGER_CST)
 {
   if (tree_int_cst_lt (pdata->maxbound, val))
 pdata->maxbound = val;
 }
   else
 pdata->maxbound = build_all_ones_cst (size_type_node);
 }

Isn't the inner test that pdata->maxbound == INTEGER_CST always true and
we should remove the test and the else clause?


Yes, it looks redundant.  I never remember which of these functions
ICE when their argument is not a constant (e.g., tree_int_cst_lt)
and which ones handle it gracefully (e.g., tree_int_cst_equal) so
I often check even when it isn't necessary.  It would be nice if
these closely related APIs had consistent preconditions.

   Does the else clause

need to be handled elsewhere (I don't see that it would be handled after
your changes).  Or perhaps it just doesn't matter...


It's handled in the else block, except differently than before.


The redundant test of TREE_CODE (pdata->maxbound) == INTEGER_CST is a
bit of nit, but we might as well clean that up.

I couldn't convince myself that losing the else clause handling was
correct or not.


MAXBOUND is only non-constant when set that way by client code to
have the function set it to the longest PHI argument, otherwise
it's either an INTEGER_CST or null.  The inner test may be dead
code, a leftover from something earlier.  Either way, MAXBOUND
is only used for diagnostics so it probably doesn't matter.


@@ -1653,8 +1661,11 @@ get_range_strlen (tree arg, bitmap *visited,
  
  /* Try to obtain the range of the lengths of the string(s) referenced

 by ARG, or the size of the largest array ARG refers to if the range
-   of lengths cannot be determined, and store all in *PDATA.  ELTSIZE
-   is the expected size of the string element in bytes: 1 for char and
+   of lengths cannot be determined, and store all in *PDATA which must
+   be zero-initialized on input except PDATA->MAXBOUND may be set to
+   a non-null tree node other than INTEGER_CST to request to have it
+   set to the length of the longest string in a PHI.  ELTSIZE is
+   the expected size of the string element in bytes: 1 for char and

Is there any reason we can't just make a clean distinction between input
and output objects in this routine?  As an API this seems awkward at best.
Any thoughts on the API 

Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-09-03 Thread Jeff Law
On 8/28/19 3:12 PM, Martin Sebor wrote:
> On 8/22/19 3:31 PM, Jeff Law wrote:
>> On 8/20/19 8:10 PM, Martin Sebor wrote:
>>> Jeff,
>>>
>>> Please let me know if you agree/disagree and what I need to
>>> do to advance this work:
>>>
>>>    https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00643.html
>> For the official record, I agree :-)
> 
> Great! :)
> 
> Any comments/suggestions on the patch?
> 
>   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00643.html
> 
> Martin
Yea, they were in an earlier message.  I'll extract the relevant
comments since some we addressed independently:


>>  
>>  
>>  
>> @@ -325,7 +333,7 @@ state_ident_by_name (const char *name, enum 
>> insert_option optins)
>>namlen = strlen (name);
>>stid =
>>  (struct state_ident_st *) xmalloc (sizeof (struct state_ident_st) +
>> -   namlen);
>> +   namlen + 1);
>>memset (stid, 0, sizeof (struct state_ident_st) + namlen);
>>strcpy (stid->stid_name, name);
>>*slot = stid;
> How did you find this goof?
> 
> 
> 
This was more a curiosity than anything.  Nothing we need to change here.

>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>> index fc57fb45e3a..582768090ae 100644
>> --- a/gcc/gimple-fold.c
>> +++ b/gcc/gimple-fold.c
>> @@ -1346,6 +1346,10 @@ get_range_strlen_tree (tree arg, bitmap *visited, 
>> strlen_range_kind rkind,
>>  }
>>  }
>>  
>> +  /* Set if VAL represents the maximum length based on array size (set
>> + when exact length cannot be determined).  */
>> +  bool maxbound = false;
>> +
>>if (!val && rkind == SRK_LENRANGE)
>>  {
>>if (TREE_CODE (arg) == ADDR_EXPR)
>> @@ -1441,6 +1445,7 @@ get_range_strlen_tree (tree arg, bitmap *visited, 
>> strlen_range_kind rkind,
>>pdata->minlen = ssize_int (0);
>>  }
>>  }
>> +  maxbound = true;
>>  }
>>  
>>if (!val)
>> @@ -1454,7 +1459,7 @@ get_range_strlen_tree (tree arg, bitmap *visited, 
>> strlen_range_kind rkind,
>>&& tree_int_cst_lt (val, pdata->minlen)))
>>  pdata->minlen = val;
>>  
>> -  if (pdata->maxbound)
>> +  if (pdata->maxbound && TREE_CODE (pdata->maxbound) == INTEGER_CST)
>>  {
>>/* Adjust the tighter (more optimistic) string length bound
>>   if necessary and proceed to adjust the more conservative
> So inside the conditional guarded by the test you're changing above we have:
> 
>  if (TREE_CODE (val) == INTEGER_CST)
> {
>   if (TREE_CODE (pdata->maxbound) == INTEGER_CST)
> {
>   if (tree_int_cst_lt (pdata->maxbound, val))
> pdata->maxbound = val;
> }
>   else
> pdata->maxbound = build_all_ones_cst (size_type_node);
> }
> 
> Isn't the inner test that pdata->maxbound == INTEGER_CST always true and
> we should remove the test and the else clause?  Does the else clause
> need to be handled elsewhere (I don't see that it would be handled after
> your changes).  Or perhaps it just doesn't matter...
The redundant test of TREE_CODE (pdata->maxbound) == INTEGER_CST is a
bit of nit, but we might as well clean that up.

I couldn't convince myself that losing the else clause handling was
correct or not.

> 
> 
> > @@ -1653,8 +1661,11 @@ get_range_strlen (tree arg, bitmap *visited,
>  
>  /* Try to obtain the range of the lengths of the string(s) referenced
> by ARG, or the size of the largest array ARG refers to if the range
> -   of lengths cannot be determined, and store all in *PDATA.  ELTSIZE
> -   is the expected size of the string element in bytes: 1 for char and
> +   of lengths cannot be determined, and store all in *PDATA which must
> +   be zero-initialized on input except PDATA->MAXBOUND may be set to
> +   a non-null tree node other than INTEGER_CST to request to have it
> +   set to the length of the longest string in a PHI.  ELTSIZE is
> +   the expected size of the string element in bytes: 1 for char and
Is there any reason we can't just make a clean distinction between input
and output objects in this routine?  As an API this seems awkward at best.
Any thoughts on the API question raised?


The rest are just nits/typos:


> @@ -2862,51 +2865,78 @@ handle_builtin_memset (gimple_stmt_iterator *gsi)
>return true;
>  }
>  
> -/* Handle a call to memcmp.  We try to handle small comparisons by
> -   converting them to load and compare, and replacing the call to memcmp
> -   with a __builtin_memcmp_eq call where possible.
> -   return true when call is transformed, return false otherwise.  */
> +/* Return a pointer to the first such equality expression if RES is used
> +   only in experessions testing its equality to zero, and null otherwise.  */
s/experessions/expressions/


>  
> -static bool
> -handle_builtin_memcmp (gimple_stmt_iterator *gsi)
> +static gimple*
> +used_only_for_zero_equality (tree res)
Nit.  A space between "gimple" and "*".




> +
> +/* If IDX1 and IDX2 refer to 

Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-28 Thread Martin Sebor

On 8/22/19 3:31 PM, Jeff Law wrote:

On 8/20/19 8:10 PM, Martin Sebor wrote:

Jeff,

Please let me know if you agree/disagree and what I need to
do to advance this work:

   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00643.html

For the official record, I agree :-)


Great! :)

Any comments/suggestions on the patch?

  https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00643.html

Martin


Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-22 Thread Jeff Law
On 8/20/19 8:10 PM, Martin Sebor wrote:
> Jeff,
> 
> Please let me know if you agree/disagree and what I need to
> do to advance this work:
> 
>   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00643.html
For the official record, I agree :-)

jeff


Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-20 Thread Martin Sebor

Jeff,

Please let me know if you agree/disagree and what I need to
do to advance this work:

  https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00643.html

Thanks
Martin

On 8/14/19 1:59 PM, Martin Sebor wrote:

On 8/13/19 4:46 PM, Jeff Law wrote:

On 8/13/19 3:43 PM, Martin Sebor wrote:

On 8/13/19 2:07 PM, Jeff Law wrote:

On 8/9/19 10:51 AM, Martin Sebor wrote:


PR tree-optimization/90879 - fold zero-equality of strcmp between a
longer string and a smaller array

gcc/c-family/ChangeLog:

 PR tree-optimization/90879
 * c.opt (-Wstring-compare): New option.

gcc/testsuite/ChangeLog:

 PR tree-optimization/90879
 * gcc.dg/Wstring-compare-2.c: New test.
 * gcc.dg/Wstring-compare.c: New test.
 * gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
 * gcc.dg/strcmpopt_6.c: New test.
 * gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
 test cases.
 * gcc.dg/strlenopt-66.c: Run it.
 * gcc.dg/strlenopt-68.c: New test.

gcc/ChangeLog:

 PR tree-optimization/90879
 * builtins.c (check_access): Avoid using maxbound when null.
 * calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen
change.
 * doc/invoke.texi (-Wstring-compare): Document new warning 
option.

 * gimple-fold.c (get_range_strlen_tree): Make setting maxbound
 conditional.
 (get_range_strlen): Overwrite initial maxbound when non-null.
 * gimple-ssa-sprintf.c (get_string_length): Adjust to
get_range_strlen
 change.
 * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
 (used_only_for_zero_equality): New function.
 (handle_builtin_memcmp): Call it.
 (determine_min_objsize): Return an integer instead of tree.
 (get_len_or_size, strxcmp_eqz_result): New functions.
 (maybe_warn_pointless_strcmp): New function.
 (handle_builtin_string_cmp): Call it.  Fold zero-equality of 
strcmp

 between a longer string and a smaller array.




diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 4af47855e7c..31e012b741b 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c



@@ -3079,196 +3042,388 @@ determine_min_objsize (tree dest)
       type = TYPE_MAIN_VARIANT (type);
   -  /* We cannot determine the size of the array if it's a flexible
array,
- which is declared at the end of a structure.  */
-  if (TREE_CODE (type) == ARRAY_TYPE
-  && !array_at_struct_end_p (dest))
+  /* The size of a flexible array cannot be determined.  Otherwise,
+ for arrays with more than one element, return the size of its
+ type.  GCC itself misuses arrays of both zero and one elements
+ as flexible array members so they are excluded as well.  */
+  if (TREE_CODE (type) != ARRAY_TYPE
+  || !array_at_struct_end_p (dest))
   {
-  tree size_t = TYPE_SIZE_UNIT (type);
-  if (size_t && TREE_CODE (size_t) == INTEGER_CST
-  && !integer_zerop (size_t))
-    return size_t;
+  tree type_size = TYPE_SIZE_UNIT (type);
+  if (type_size && TREE_CODE (type_size) == INTEGER_CST
+  && !integer_onep (type_size)
+  && !integer_zerop (type_size))
+    return tree_to_uhwi (type_size);

So I nearly commented on this when looking at the original patch.  Can
we really depend on the size when we've got an array at the end of a
struct with a declared size other than 0/1?   While 0/1 are by far the
most common way to declare them, couldn't someone have used other 
sizes?

   I think we pondered doing that at one time to cut down on the noise
from Coverity for RTL and TREE operand accessors.

Your code makes us safer, so I'm not saying you've done anything wrong,
just trying to decide if we need to tighten this up even further.


This patch issues a warning in these cases, i.e., when it sees
a call like, say, strcmp("foobar", A) with an A that's smaller
than the string, because it seems they are likely (rare) bugs.
I haven't seen the warning in any of the projects I tested it
with (Binutils/GDB, GCC, Glibc, the Linux kernel, and LLVM).

The warning uses strcmp to detect these mistakes (or misuses)
but I'd like to add similar warnings for other string functions
as well and have code out there that does this on purpose use
true flexible array members (or the zero-length extension)
instead.  That makes the intent clear.

It's a judgment call whether to also fold (or do something else
like insert a trap) in addition to issuing a warning.  In this
case (reading) I don't think it matters as much as it does for
writes.  Either way, it would be nice to set a policy and
document it in the manual so users know what to expect and
so we don't have to revisit this question for each patch that
touches on this subject.

The GCC manual documents zero length arrays at the end of an aggregate
as a GNU extension for variable length objects.  The manual also
documents that it could be done with single element arrays, but that
doing so does contribute to the base size of the aggregate, but
otherwise it's handled 

Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-14 Thread Martin Sebor

On 8/13/19 4:46 PM, Jeff Law wrote:

On 8/13/19 3:43 PM, Martin Sebor wrote:

On 8/13/19 2:07 PM, Jeff Law wrote:

On 8/9/19 10:51 AM, Martin Sebor wrote:


PR tree-optimization/90879 - fold zero-equality of strcmp between a
longer string and a smaller array

gcc/c-family/ChangeLog:

 PR tree-optimization/90879
 * c.opt (-Wstring-compare): New option.

gcc/testsuite/ChangeLog:

 PR tree-optimization/90879
 * gcc.dg/Wstring-compare-2.c: New test.
 * gcc.dg/Wstring-compare.c: New test.
 * gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
 * gcc.dg/strcmpopt_6.c: New test.
 * gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
 test cases.
 * gcc.dg/strlenopt-66.c: Run it.
 * gcc.dg/strlenopt-68.c: New test.

gcc/ChangeLog:

 PR tree-optimization/90879
 * builtins.c (check_access): Avoid using maxbound when null.
 * calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen
change.
 * doc/invoke.texi (-Wstring-compare): Document new warning option.
 * gimple-fold.c (get_range_strlen_tree): Make setting maxbound
 conditional.
 (get_range_strlen): Overwrite initial maxbound when non-null.
 * gimple-ssa-sprintf.c (get_string_length): Adjust to
get_range_strlen
 change.
 * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
 (used_only_for_zero_equality): New function.
 (handle_builtin_memcmp): Call it.
 (determine_min_objsize): Return an integer instead of tree.
 (get_len_or_size, strxcmp_eqz_result): New functions.
 (maybe_warn_pointless_strcmp): New function.
 (handle_builtin_string_cmp): Call it.  Fold zero-equality of strcmp
 between a longer string and a smaller array.




diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 4af47855e7c..31e012b741b 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c



@@ -3079,196 +3042,388 @@ determine_min_objsize (tree dest)
       type = TYPE_MAIN_VARIANT (type);
   -  /* We cannot determine the size of the array if it's a flexible
array,
- which is declared at the end of a structure.  */
-  if (TREE_CODE (type) == ARRAY_TYPE
-  && !array_at_struct_end_p (dest))
+  /* The size of a flexible array cannot be determined.  Otherwise,
+ for arrays with more than one element, return the size of its
+ type.  GCC itself misuses arrays of both zero and one elements
+ as flexible array members so they are excluded as well.  */
+  if (TREE_CODE (type) != ARRAY_TYPE
+  || !array_at_struct_end_p (dest))
   {
-  tree size_t = TYPE_SIZE_UNIT (type);
-  if (size_t && TREE_CODE (size_t) == INTEGER_CST
-  && !integer_zerop (size_t))
-    return size_t;
+  tree type_size = TYPE_SIZE_UNIT (type);
+  if (type_size && TREE_CODE (type_size) == INTEGER_CST
+  && !integer_onep (type_size)
+  && !integer_zerop (type_size))
+    return tree_to_uhwi (type_size);

So I nearly commented on this when looking at the original patch.  Can
we really depend on the size when we've got an array at the end of a
struct with a declared size other than 0/1?   While 0/1 are by far the
most common way to declare them, couldn't someone have used other sizes?
   I think we pondered doing that at one time to cut down on the noise
from Coverity for RTL and TREE operand accessors.

Your code makes us safer, so I'm not saying you've done anything wrong,
just trying to decide if we need to tighten this up even further.


This patch issues a warning in these cases, i.e., when it sees
a call like, say, strcmp("foobar", A) with an A that's smaller
than the string, because it seems they are likely (rare) bugs.
I haven't seen the warning in any of the projects I tested it
with (Binutils/GDB, GCC, Glibc, the Linux kernel, and LLVM).

The warning uses strcmp to detect these mistakes (or misuses)
but I'd like to add similar warnings for other string functions
as well and have code out there that does this on purpose use
true flexible array members (or the zero-length extension)
instead.  That makes the intent clear.

It's a judgment call whether to also fold (or do something else
like insert a trap) in addition to issuing a warning.  In this
case (reading) I don't think it matters as much as it does for
writes.  Either way, it would be nice to set a policy and
document it in the manual so users know what to expect and
so we don't have to revisit this question for each patch that
touches on this subject.

The GCC manual documents zero length arrays at the end of an aggregate
as a GNU extension for variable length objects.  The manual also
documents that it could be done with single element arrays, but that
doing so does contribute to the base size of the aggregate, but
otherwise it's handled like a zero length array.

So both zero and one element arrays are documented as supported for this
use case.  However, I could easily see someone making the case that any
size should work here and I could easily 

Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-14 Thread Martin Sebor

On 8/12/19 7:40 AM, Michael Matz wrote:

Hi,

On Fri, 9 Aug 2019, Martin Sebor wrote:


The solution introduced in C99 is a flexible array.  C++
compilers usually support it as well.  Those that don't are
likely to support the zero-length array (even Visual C++ does).
If there's a chance that some don't support either do you really
think it's safe to assume they will do something sane with
the [1] hack?


As the [1] "hack" is the traditional pre-C99 (and C++) idiom to
implement flexible trailing char arrays, yes, I do expect all existing
(and not any more existing) compilers to do the obvious and sane thing
with it.  IOW: it's more portable in practice than our documented
zero-length extension.  And that's what matters for the things compiled by
the host compiler.

Without requiring C99 (which would be a different discussion) and a
non-existing C++ standard we can't write this code (in this form) in a
standard conforming way, no matter what we wish for.  Hence it seems
prudent to use the most portable variant of all the non-standard ways, the
trailing [1] array.


There are a few reasons why these legacy C idioms should be
replaced with better/newer/safer alternatives.

First, with two C revisions since C99 and with support for
superior alternatives widely available, pre-C99 idioms have less
and less relevance.

Second, since most of GCC requires a C++98 compiler to compile,
ancient C code needs to adjust to the more strict C++ requirements.
As C++ evolves, dependencies on legacy extensions like this one
make it increasingly difficult to upgrade to newer revisions of
the standard.  C++ 11 already requires compilers to reject
undefined behavior in constexpr contexts, including accesses
to arrays outside of their bounds.  Once GCC adopts C++ 11 it
won't be able to make use of constexpr with code that relies
on the hack.

Third, the safest and most secure approach to dealing with past-
the-end accesses is to diagnose and prevent them.  Accommodating
code that disregards the array bounds compromises this goal.  This
is evident from the gaps in _FORTIFY_SOURCE and -Wstringop-overflow
that other compilers like Clang and ICC don't suffer from(*).  It's
in everyone's best interest to proactively drive them to extinction
and replace them by safer alternatives that let compilers distinguish
the intentional accesses from accidental ones.  It not only makes it
easier to find bugs but also emit more efficient object code.

Martin

PS Unlike GCC, both Clang and ICC diagnose past-the-end accesses
to trailing arrays with more than one element.  They do recognize
the struct hack even in C++ and, outside constexpr contexts, avoid
diagnosing past-the-end accesses to trailing one-element arrays.
This isn't so much an issue today because neither allows statically
initializing struct objects with such arrays to more elements than
the bound specifies.  But it will likely change when the C++
proposal for constexpr functions to use new expressions is adopted 
(P0784R1).


Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-13 Thread Jeff Law
On 8/13/19 3:43 PM, Martin Sebor wrote:
> On 8/13/19 2:07 PM, Jeff Law wrote:
>> On 8/9/19 10:51 AM, Martin Sebor wrote:
>>>
>>> PR tree-optimization/90879 - fold zero-equality of strcmp between a
>>> longer string and a smaller array
>>>
>>> gcc/c-family/ChangeLog:
>>>
>>> PR tree-optimization/90879
>>> * c.opt (-Wstring-compare): New option.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR tree-optimization/90879
>>> * gcc.dg/Wstring-compare-2.c: New test.
>>> * gcc.dg/Wstring-compare.c: New test.
>>> * gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
>>> * gcc.dg/strcmpopt_6.c: New test.
>>> * gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
>>> test cases.
>>> * gcc.dg/strlenopt-66.c: Run it.
>>> * gcc.dg/strlenopt-68.c: New test.
>>>
>>> gcc/ChangeLog:
>>>
>>> PR tree-optimization/90879
>>> * builtins.c (check_access): Avoid using maxbound when null.
>>> * calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen
>>> change.
>>> * doc/invoke.texi (-Wstring-compare): Document new warning option.
>>> * gimple-fold.c (get_range_strlen_tree): Make setting maxbound
>>> conditional.
>>> (get_range_strlen): Overwrite initial maxbound when non-null.
>>> * gimple-ssa-sprintf.c (get_string_length): Adjust to
>>> get_range_strlen
>>> change.
>>> * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
>>> (used_only_for_zero_equality): New function.
>>> (handle_builtin_memcmp): Call it.
>>> (determine_min_objsize): Return an integer instead of tree.
>>> (get_len_or_size, strxcmp_eqz_result): New functions.
>>> (maybe_warn_pointless_strcmp): New function.
>>> (handle_builtin_string_cmp): Call it.  Fold zero-equality of strcmp
>>> between a longer string and a smaller array.
>>>
>>
>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>> index 4af47855e7c..31e012b741b 100644
>>> --- a/gcc/tree-ssa-strlen.c
>>> +++ b/gcc/tree-ssa-strlen.c
>>
>>> @@ -3079,196 +3042,388 @@ determine_min_objsize (tree dest)
>>>       type = TYPE_MAIN_VARIANT (type);
>>>   -  /* We cannot determine the size of the array if it's a flexible
>>> array,
>>> - which is declared at the end of a structure.  */
>>> -  if (TREE_CODE (type) == ARRAY_TYPE
>>> -  && !array_at_struct_end_p (dest))
>>> +  /* The size of a flexible array cannot be determined.  Otherwise,
>>> + for arrays with more than one element, return the size of its
>>> + type.  GCC itself misuses arrays of both zero and one elements
>>> + as flexible array members so they are excluded as well.  */
>>> +  if (TREE_CODE (type) != ARRAY_TYPE
>>> +  || !array_at_struct_end_p (dest))
>>>   {
>>> -  tree size_t = TYPE_SIZE_UNIT (type);
>>> -  if (size_t && TREE_CODE (size_t) == INTEGER_CST
>>> -  && !integer_zerop (size_t))
>>> -    return size_t;
>>> +  tree type_size = TYPE_SIZE_UNIT (type);
>>> +  if (type_size && TREE_CODE (type_size) == INTEGER_CST
>>> +  && !integer_onep (type_size)
>>> +  && !integer_zerop (type_size))
>>> +    return tree_to_uhwi (type_size);
>> So I nearly commented on this when looking at the original patch.  Can
>> we really depend on the size when we've got an array at the end of a
>> struct with a declared size other than 0/1?   While 0/1 are by far the
>> most common way to declare them, couldn't someone have used other sizes?
>>   I think we pondered doing that at one time to cut down on the noise
>> from Coverity for RTL and TREE operand accessors.
>>
>> Your code makes us safer, so I'm not saying you've done anything wrong,
>> just trying to decide if we need to tighten this up even further.
> 
> This patch issues a warning in these cases, i.e., when it sees
> a call like, say, strcmp("foobar", A) with an A that's smaller
> than the string, because it seems they are likely (rare) bugs.
> I haven't seen the warning in any of the projects I tested it
> with (Binutils/GDB, GCC, Glibc, the Linux kernel, and LLVM).
> 
> The warning uses strcmp to detect these mistakes (or misuses)
> but I'd like to add similar warnings for other string functions
> as well and have code out there that does this on purpose use
> true flexible array members (or the zero-length extension)
> instead.  That makes the intent clear.
> 
> It's a judgment call whether to also fold (or do something else
> like insert a trap) in addition to issuing a warning.  In this
> case (reading) I don't think it matters as much as it does for
> writes.  Either way, it would be nice to set a policy and
> document it in the manual so users know what to expect and
> so we don't have to revisit this question for each patch that
> touches on this subject.
The GCC manual documents zero length arrays at the end of an aggregate
as a GNU extension for variable length objects.  The manual also
documents that it could be done with single element arrays, but that
doing so does contribute to the base 

Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-13 Thread Martin Sebor

On 8/13/19 2:07 PM, Jeff Law wrote:

On 8/9/19 10:51 AM, Martin Sebor wrote:


PR tree-optimization/90879 - fold zero-equality of strcmp between a longer 
string and a smaller array

gcc/c-family/ChangeLog:

PR tree-optimization/90879
* c.opt (-Wstring-compare): New option.

gcc/testsuite/ChangeLog:

PR tree-optimization/90879
* gcc.dg/Wstring-compare-2.c: New test.
* gcc.dg/Wstring-compare.c: New test.
* gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
* gcc.dg/strcmpopt_6.c: New test.
* gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
test cases.
* gcc.dg/strlenopt-66.c: Run it.
* gcc.dg/strlenopt-68.c: New test.

gcc/ChangeLog:

PR tree-optimization/90879
* builtins.c (check_access): Avoid using maxbound when null.
* calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen change.
* doc/invoke.texi (-Wstring-compare): Document new warning option.
* gimple-fold.c (get_range_strlen_tree): Make setting maxbound
conditional.
(get_range_strlen): Overwrite initial maxbound when non-null.
* gimple-ssa-sprintf.c (get_string_length): Adjust to get_range_strlen
change.
* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
(used_only_for_zero_equality): New function.
(handle_builtin_memcmp): Call it.
(determine_min_objsize): Return an integer instead of tree.
(get_len_or_size, strxcmp_eqz_result): New functions.
(maybe_warn_pointless_strcmp): New function.
(handle_builtin_string_cmp): Call it.  Fold zero-equality of strcmp
between a longer string and a smaller array.




diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 4af47855e7c..31e012b741b 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c



@@ -3079,196 +3042,388 @@ determine_min_objsize (tree dest)
  
type = TYPE_MAIN_VARIANT (type);
  
-  /* We cannot determine the size of the array if it's a flexible array,

- which is declared at the end of a structure.  */
-  if (TREE_CODE (type) == ARRAY_TYPE
-  && !array_at_struct_end_p (dest))
+  /* The size of a flexible array cannot be determined.  Otherwise,
+ for arrays with more than one element, return the size of its
+ type.  GCC itself misuses arrays of both zero and one elements
+ as flexible array members so they are excluded as well.  */
+  if (TREE_CODE (type) != ARRAY_TYPE
+  || !array_at_struct_end_p (dest))
  {
-  tree size_t = TYPE_SIZE_UNIT (type);
-  if (size_t && TREE_CODE (size_t) == INTEGER_CST
- && !integer_zerop (size_t))
-return size_t;
+  tree type_size = TYPE_SIZE_UNIT (type);
+  if (type_size && TREE_CODE (type_size) == INTEGER_CST
+ && !integer_onep (type_size)
+ && !integer_zerop (type_size))
+return tree_to_uhwi (type_size);

So I nearly commented on this when looking at the original patch.  Can
we really depend on the size when we've got an array at the end of a
struct with a declared size other than 0/1?   While 0/1 are by far the
most common way to declare them, couldn't someone have used other sizes?
  I think we pondered doing that at one time to cut down on the noise
from Coverity for RTL and TREE operand accessors.

Your code makes us safer, so I'm not saying you've done anything wrong,
just trying to decide if we need to tighten this up even further.


This patch issues a warning in these cases, i.e., when it sees
a call like, say, strcmp("foobar", A) with an A that's smaller
than the string, because it seems they are likely (rare) bugs.
I haven't seen the warning in any of the projects I tested it
with (Binutils/GDB, GCC, Glibc, the Linux kernel, and LLVM).

The warning uses strcmp to detect these mistakes (or misuses)
but I'd like to add similar warnings for other string functions
as well and have code out there that does this on purpose use
true flexible array members (or the zero-length extension)
instead.  That makes the intent clear.

It's a judgment call whether to also fold (or do something else
like insert a trap) in addition to issuing a warning.  In this
case (reading) I don't think it matters as much as it does for
writes.  Either way, it would be nice to set a policy and
document it in the manual so users know what to expect and
so we don't have to revisit this question for each patch that
touches on this subject.

Martin


Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-13 Thread Jeff Law
On 8/9/19 10:51 AM, Martin Sebor wrote:
> 
> PR tree-optimization/90879 - fold zero-equality of strcmp between a longer 
> string and a smaller array
> 
> gcc/c-family/ChangeLog:
> 
>   PR tree-optimization/90879
>   * c.opt (-Wstring-compare): New option.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/90879
>   * gcc.dg/Wstring-compare-2.c: New test.
>   * gcc.dg/Wstring-compare.c: New test.
>   * gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
>   * gcc.dg/strcmpopt_6.c: New test.
>   * gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
>   test cases.
>   * gcc.dg/strlenopt-66.c: Run it.
>   * gcc.dg/strlenopt-68.c: New test.
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/90879
>   * builtins.c (check_access): Avoid using maxbound when null.
>   * calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen change.
>   * doc/invoke.texi (-Wstring-compare): Document new warning option.
>   * gimple-fold.c (get_range_strlen_tree): Make setting maxbound
>   conditional.
>   (get_range_strlen): Overwrite initial maxbound when non-null.
>   * gimple-ssa-sprintf.c (get_string_length): Adjust to get_range_strlen
>   change.
>   * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
>   (used_only_for_zero_equality): New function.
>   (handle_builtin_memcmp): Call it.
>   (determine_min_objsize): Return an integer instead of tree.
>   (get_len_or_size, strxcmp_eqz_result): New functions.
>   (maybe_warn_pointless_strcmp): New function.
>   (handle_builtin_string_cmp): Call it.  Fold zero-equality of strcmp
>   between a longer string and a smaller array.
> 

> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
> index 4af47855e7c..31e012b741b 100644
> --- a/gcc/tree-ssa-strlen.c
> +++ b/gcc/tree-ssa-strlen.c

> @@ -3079,196 +3042,388 @@ determine_min_objsize (tree dest)
>  
>type = TYPE_MAIN_VARIANT (type);
>  
> -  /* We cannot determine the size of the array if it's a flexible array,
> - which is declared at the end of a structure.  */
> -  if (TREE_CODE (type) == ARRAY_TYPE
> -  && !array_at_struct_end_p (dest))
> +  /* The size of a flexible array cannot be determined.  Otherwise,
> + for arrays with more than one element, return the size of its
> + type.  GCC itself misuses arrays of both zero and one elements
> + as flexible array members so they are excluded as well.  */
> +  if (TREE_CODE (type) != ARRAY_TYPE
> +  || !array_at_struct_end_p (dest))
>  {
> -  tree size_t = TYPE_SIZE_UNIT (type);
> -  if (size_t && TREE_CODE (size_t) == INTEGER_CST
> -   && !integer_zerop (size_t))
> -return size_t;
> +  tree type_size = TYPE_SIZE_UNIT (type);
> +  if (type_size && TREE_CODE (type_size) == INTEGER_CST
> +   && !integer_onep (type_size)
> +   && !integer_zerop (type_size))
> +return tree_to_uhwi (type_size);
So I nearly commented on this when looking at the original patch.  Can
we really depend on the size when we've got an array at the end of a
struct with a declared size other than 0/1?   While 0/1 are by far the
most common way to declare them, couldn't someone have used other sizes?
 I think we pondered doing that at one time to cut down on the noise
from Coverity for RTL and TREE operand accessors.

Your code makes us safer, so I'm not saying you've done anything wrong,
just trying to decide if we need to tighten this up even further.

No additional comments beyond what I pointed out yesterday against the
original patch.

Jeff

>


Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-12 Thread Jeff Law
On 8/12/19 4:17 PM, Martin Sebor wrote:
> On 8/12/19 2:04 PM, Jeff Law wrote:
>> On 8/9/19 4:14 PM, Martin Sebor wrote:
>>> On 8/9/19 10:58 AM, Jakub Jelinek wrote:
 On Fri, Aug 09, 2019 at 10:51:09AM -0600, Martin Sebor wrote:
> That said, we should change this code one way or the other.
> There is even less of a guarantee that other compilers support
> writing past the end of arrays that have non-zero size than
> that they recognize the documented zero-length extension.

 We use that everywhere forever, so no.
>>>
>>> Just because some invalid code has been in place "forever" doesn't
>>> mean it cannot be changed.  Relying on undocumented "extensions"
>>> because they just happen to work with the compilers they have been
>>> exposed to is exactly how naive users get in trouble.  Our answer
>>> to reports of "bugs" when the behavior changes is typically: fix
>>> your code.  There's little reason to expect other compiler writers
>>> to be any more accommodating.
>>>
 See e.g. rtx u.fld and u.hwint arrays, tree_exp operands array,
 gimple_statement_with_ops op array just to name a few that are
 everywhere.  Coverity is indeed unhappy about
 that, but it would be with [0] certainly too.  Another option is
 to use maximum possible size where we know it (which is the case of
 rtxes and most tree expressions and gimple stmts, but not e.g.
 CALL_EXPR or GIMPLE_CALL where there is no easy upper bound.
>>>
>>> The solution introduced in C99 is a flexible array.  C++
>>> compilers usually support it as well.  Those that don't are
>>> likely to support the zero-length array (even Visual C++ does).
>>> If there's a chance that some don't support either do you really
>>> think it's safe to assume they will do something sane with
>>> the [1] hack?  If you're concerned that the flexible array syntax
>>> or the zero length array won't compile, add a configure test to
>>> see if it does and use whatever alternative is most appropriate.
>> Given that we require a C++03 compiler to build GCC, I think we can
>> revisit how we represent the trailing array.  But that seems independent
>> of the bulk of this patch.
>>
>> Can we separate this issue from the rest of the patch?
> 
> The updated patch I posted is independent of the trailing
> [1] array hack:
> 
>   https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00643.html
I must have dropped this from my queue by accident.  I'll go find it and
give it a looksie as well.

jeff


Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-12 Thread Martin Sebor

On 8/12/19 2:04 PM, Jeff Law wrote:

On 8/9/19 4:14 PM, Martin Sebor wrote:

On 8/9/19 10:58 AM, Jakub Jelinek wrote:

On Fri, Aug 09, 2019 at 10:51:09AM -0600, Martin Sebor wrote:

That said, we should change this code one way or the other.
There is even less of a guarantee that other compilers support
writing past the end of arrays that have non-zero size than
that they recognize the documented zero-length extension.


We use that everywhere forever, so no.


Just because some invalid code has been in place "forever" doesn't
mean it cannot be changed.  Relying on undocumented "extensions"
because they just happen to work with the compilers they have been
exposed to is exactly how naive users get in trouble.  Our answer
to reports of "bugs" when the behavior changes is typically: fix
your code.  There's little reason to expect other compiler writers
to be any more accommodating.


See e.g. rtx u.fld and u.hwint arrays, tree_exp operands array,
gimple_statement_with_ops op array just to name a few that are
everywhere.  Coverity is indeed unhappy about
that, but it would be with [0] certainly too.  Another option is
to use maximum possible size where we know it (which is the case of
rtxes and most tree expressions and gimple stmts, but not e.g.
CALL_EXPR or GIMPLE_CALL where there is no easy upper bound.


The solution introduced in C99 is a flexible array.  C++
compilers usually support it as well.  Those that don't are
likely to support the zero-length array (even Visual C++ does).
If there's a chance that some don't support either do you really
think it's safe to assume they will do something sane with
the [1] hack?  If you're concerned that the flexible array syntax
or the zero length array won't compile, add a configure test to
see if it does and use whatever alternative is most appropriate.

Given that we require a C++03 compiler to build GCC, I think we can
revisit how we represent the trailing array.  But that seems independent
of the bulk of this patch.

Can we separate this issue from the rest of the patch?


The updated patch I posted is independent of the trailing
[1] array hack:

  https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00643.html

Martin


Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-12 Thread Jeff Law
On 8/9/19 10:17 AM, Martin Sebor wrote:
> GCC 9 optimizes a subset of expression of the form
> (0 == strcmp(a, b)) based on the length and/or size of
> the arguments but it doesn't take advantage of all
> the opportunities there.  For example in the following,
> although it folds the first test to false it doesn't fold
> the second one:
> 
>   char a[4];
> 
>   void f (void)
>   {
> if (strlen (a) > 3)   // folded to false by GCC 8+
>   abort ();
> 
> if (strcmp (a, "1234") == 0)   // folded by patched GCC
>   abort ();
> }
> 
> The attached patch extends the strcmp optimization added in
> GCC 9 to also handle the latter cases (among others).  Testing
> the enhancement with several other sizable code bases besides
> GCC (Binutils/GDB, the Linux kernel, and LLVM) shows that code
> like this is rare.  After thinking about it I decided it's more
> likely a bug than a significant optimization opportunity, so
> I introduced a new warning to point it out: -Wstring-compare
> (enabled in -Wextra).
> 
> Besides this enhancement, the patch also improves the current
> optimization to fold strcmp calls with conditional arguments
> such as in:
> 
>   void f (char *s, int i)
>   {
> strcpy (s, "12");
> if (strcmp (s, i ? "123" : "1234") == 0)   // folded
>   abort ();
>   }
> 
> Martin
> 
> PS The diff looks like the changes are more extensive than they
> actually are.
> 
> gcc-90879.diff
> 
> PR tree-optimization/90879 - fold zero-equality of strcmp between a longer 
> string and a smaller array
> 
> gcc/c-family/ChangeLog:
> 
>   PR tree-optimization/90879
>   * c.opt (-Wstring-compare): New option.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/90879
>   * gcc.dg/Wstring-compare-2.c: New test.
>   * gcc.dg/Wstring-compare.c: New test.
>   * gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
>   * gcc.dg/strcmpopt_6.c: New test.
>   * gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
>   test cases.
>   * gcc.dg/strlenopt-66.c: Run it.
>   * gcc.dg/strlenopt-67.c: New test.
>   * gcc.dg/strlenopt-68.c: New test.
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/90879
>   * builtins.c (check_access): Avoid using maxbound when null.
>   * calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen change.
>   * doc/invoke.texi (-Wstring-compare): Document new warning option.
>   * gengtype-state.c (state_ident_st): Use a zero-length array instead.
>   (state_token_st): Same.  Make last.
>   (state_ident_by_name): Allocate enough space for terminating nul.
>   * gimple-fold.c (get_range_strlen_tree): Make setting maxbound
>   conditional.
>   (get_range_strlen): Overwrite initial maxbound when non-null.
>   * gimple-ssa-sprintf.c (get_string_length): Adjust to get_range_strlen
>   change.
>   * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
>   (used_only_for_zero_equality): New function.
>   (handle_builtin_memcmp): Call it.
>   (determine_min_objsize): Return an integer instead of tree.
>   (get_len_or_size, strxcmp_eqz_result): New functions.
>   (maybe_warn_pointless_strcmp): New function.
>   (handle_builtin_string_cmp): Call it.  Fold zero-equality of strcmp
>   between a longer string and a smaller array.
> 
> diff --git a/gcc/gengtype-state.c b/gcc/gengtype-state.c
> index 03f40694ec6..80a8b57e9a2 100644
> --- a/gcc/gengtype-state.c
> +++ b/gcc/gengtype-state.c
> @@ -79,6 +79,14 @@ enum state_token_en
>STOK_NAME /* hash-consed name or identifier.  */
>  };
>  
> +/* Suppress warning: ISO C forbids zero-size array for stok_string
> +   below.  The arrays are treated as flexible array members but in
> +   otherwise an empty struct or as a member of a union cannot be
> +   declared as such.  They must have zero size to keep GCC from
> +   assuming their bound reflect their size.  */
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wpedantic"
> +
>  
>  /* Structure and hash-table used to share identifiers or names.  */
>  struct state_ident_st
> @@ -86,11 +94,10 @@ struct state_ident_st
>/* TODO: We could improve the parser by reserving identifiers for
>   state keywords and adding a keyword number for them.  That would
>   mean adding another field in this state_ident_st struct.  */
> -  char stid_name[1]; /* actually bigger & null terminated */
> +  char stid_name[0]; /* actually bigger & null terminated */
>  };
>  static htab_t state_ident_tab;
>  
> -
>  /* The state_token_st structure is for lexical tokens in the read
> state file.  The stok_kind field discriminates the union.  Tokens
> are allocated by peek_state_token which calls read_a_state_token
> @@ -110,14 +117,15 @@ struct state_token_st
>union  /* discriminated by stok_kind! 
> */
>{
>  int stok_num;/* when 

Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-12 Thread Jeff Law
On 8/9/19 4:14 PM, Martin Sebor wrote:
> On 8/9/19 10:58 AM, Jakub Jelinek wrote:
>> On Fri, Aug 09, 2019 at 10:51:09AM -0600, Martin Sebor wrote:
>>> That said, we should change this code one way or the other.
>>> There is even less of a guarantee that other compilers support
>>> writing past the end of arrays that have non-zero size than
>>> that they recognize the documented zero-length extension.
>>
>> We use that everywhere forever, so no.
> 
> Just because some invalid code has been in place "forever" doesn't
> mean it cannot be changed.  Relying on undocumented "extensions"
> because they just happen to work with the compilers they have been
> exposed to is exactly how naive users get in trouble.  Our answer
> to reports of "bugs" when the behavior changes is typically: fix
> your code.  There's little reason to expect other compiler writers
> to be any more accommodating.
> 
>> See e.g. rtx u.fld and u.hwint arrays, tree_exp operands array,
>> gimple_statement_with_ops op array just to name a few that are
>> everywhere.  Coverity is indeed unhappy about
>> that, but it would be with [0] certainly too.  Another option is
>> to use maximum possible size where we know it (which is the case of
>> rtxes and most tree expressions and gimple stmts, but not e.g.
>> CALL_EXPR or GIMPLE_CALL where there is no easy upper bound.
> 
> The solution introduced in C99 is a flexible array.  C++
> compilers usually support it as well.  Those that don't are
> likely to support the zero-length array (even Visual C++ does).
> If there's a chance that some don't support either do you really
> think it's safe to assume they will do something sane with
> the [1] hack?  If you're concerned that the flexible array syntax
> or the zero length array won't compile, add a configure test to
> see if it does and use whatever alternative is most appropriate.
Given that we require a C++03 compiler to build GCC, I think we can
revisit how we represent the trailing array.  But that seems independent
of the bulk of this patch.

Can we separate this issue from the rest of the patch?

jeff


Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-12 Thread Michael Matz
Hi,

On Fri, 9 Aug 2019, Martin Sebor wrote:

> The solution introduced in C99 is a flexible array.  C++
> compilers usually support it as well.  Those that don't are
> likely to support the zero-length array (even Visual C++ does).
> If there's a chance that some don't support either do you really
> think it's safe to assume they will do something sane with
> the [1] hack?

As the [1] "hack" is the traditional pre-C99 (and C++) idiom to 
implement flexible trailing char arrays, yes, I do expect all existing 
(and not any more existing) compilers to do the obvious and sane thing 
with it.  IOW: it's more portable in practice than our documented 
zero-length extension.  And that's what matters for the things compiled by 
the host compiler.

Without requiring C99 (which would be a different discussion) and a 
non-existing C++ standard we can't write this code (in this form) in a 
standard conforming way, no matter what we wish for.  Hence it seems 
prudent to use the most portable variant of all the non-standard ways, the 
trailing [1] array.


Ciao,
Michael.


Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-09 Thread Martin Sebor

On 8/9/19 10:58 AM, Jakub Jelinek wrote:

On Fri, Aug 09, 2019 at 10:51:09AM -0600, Martin Sebor wrote:

That said, we should change this code one way or the other.
There is even less of a guarantee that other compilers support
writing past the end of arrays that have non-zero size than
that they recognize the documented zero-length extension.


We use that everywhere forever, so no.


Just because some invalid code has been in place "forever" doesn't
mean it cannot be changed.  Relying on undocumented "extensions"
because they just happen to work with the compilers they have been
exposed to is exactly how naive users get in trouble.  Our answer
to reports of "bugs" when the behavior changes is typically: fix
your code.  There's little reason to expect other compiler writers
to be any more accommodating.


See e.g. rtx u.fld and u.hwint arrays, tree_exp operands array,
gimple_statement_with_ops op array just to name a few that are
everywhere.  Coverity is indeed unhappy about
that, but it would be with [0] certainly too.  Another option is
to use maximum possible size where we know it (which is the case of
rtxes and most tree expressions and gimple stmts, but not e.g.
CALL_EXPR or GIMPLE_CALL where there is no easy upper bound.


The solution introduced in C99 is a flexible array.  C++
compilers usually support it as well.  Those that don't are
likely to support the zero-length array (even Visual C++ does).
If there's a chance that some don't support either do you really
think it's safe to assume they will do something sane with
the [1] hack?  If you're concerned that the flexible array syntax
or the zero length array won't compile, add a configure test to
see if it does and use whatever alternative is most appropriate.

Martin


Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-09 Thread Jakub Jelinek
On Fri, Aug 09, 2019 at 10:51:09AM -0600, Martin Sebor wrote:
> That said, we should change this code one way or the other.
> There is even less of a guarantee that other compilers support
> writing past the end of arrays that have non-zero size than
> that they recognize the documented zero-length extension.

We use that everywhere forever, so no.
See e.g. rtx u.fld and u.hwint arrays, tree_exp operands array,
gimple_statement_with_ops op array just to name a few that are
everywhere.  Coverity is indeed unhappy about
that, but it would be with [0] certainly too.  Another option is
to use maximum possible size where we know it (which is the case of
rtxes and most tree expressions and gimple stmts, but not e.g.
CALL_EXPR or GIMPLE_CALL where there is no easy upper bound.

Jakub


Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-09 Thread Martin Sebor

On 8/9/19 10:22 AM, Jakub Jelinek wrote:

On Fri, Aug 09, 2019 at 10:17:12AM -0600, Martin Sebor wrote:

--- a/gcc/gengtype-state.c
+++ b/gcc/gengtype-state.c
@@ -79,6 +79,14 @@ enum state_token_en
STOK_NAME /* hash-consed name or identifier.  */
  };
  
+/* Suppress warning: ISO C forbids zero-size array for stok_string

+   below.  The arrays are treated as flexible array members but in
+   otherwise an empty struct or as a member of a union cannot be
+   declared as such.  They must have zero size to keep GCC from
+   assuming their bound reflect their size.  */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpedantic"
+
  
  /* Structure and hash-table used to share identifiers or names.  */

  struct state_ident_st
@@ -86,11 +94,10 @@ struct state_ident_st
/* TODO: We could improve the parser by reserving identifiers for
   state keywords and adding a keyword number for them.  That would
   mean adding another field in this state_ident_st struct.  */
-  char stid_name[1];   /* actually bigger & null terminated */
+  char stid_name[0];   /* actually bigger & null terminated */


No, please don't do this.  The part of the GCC that is built by system
compiler shouldn't use GNU extensions, unless guarded only for compilation
with compilers that do support that.


Hmm, this wasn't supposed to be in the diff anymore (the patch
handles the code without these changes).  I removed it after
verifying it just before sending the patch so my mailer must
have sent a cached copy.  Attached is the latest tested patch
without this change.

That said, we should change this code one way or the other.
There is even less of a guarantee that other compilers support
writing past the end of arrays that have non-zero size than
that they recognize the documented zero-length extension.

Martin
PR tree-optimization/90879 - fold zero-equality of strcmp between a longer string and a smaller array

gcc/c-family/ChangeLog:

	PR tree-optimization/90879
	* c.opt (-Wstring-compare): New option.

gcc/testsuite/ChangeLog:

	PR tree-optimization/90879
	* gcc.dg/Wstring-compare-2.c: New test.
	* gcc.dg/Wstring-compare.c: New test.
	* gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
	* gcc.dg/strcmpopt_6.c: New test.
	* gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
	test cases.
	* gcc.dg/strlenopt-66.c: Run it.
	* gcc.dg/strlenopt-68.c: New test.

gcc/ChangeLog:

	PR tree-optimization/90879
	* builtins.c (check_access): Avoid using maxbound when null.
	* calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen change.
	* doc/invoke.texi (-Wstring-compare): Document new warning option.
	* gimple-fold.c (get_range_strlen_tree): Make setting maxbound
	conditional.
	(get_range_strlen): Overwrite initial maxbound when non-null.
	* gimple-ssa-sprintf.c (get_string_length): Adjust to get_range_strlen
	change.
	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
	(used_only_for_zero_equality): New function.
	(handle_builtin_memcmp): Call it.
	(determine_min_objsize): Return an integer instead of tree.
	(get_len_or_size, strxcmp_eqz_result): New functions.
	(maybe_warn_pointless_strcmp): New function.
	(handle_builtin_string_cmp): Call it.  Fold zero-equality of strcmp
	between a longer string and a smaller array.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 695a9d191af..eca710942dc 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3326,7 +3326,7 @@ check_access (tree exp, tree, tree, tree dstwrite,
 	  c_strlen_data lendata = { };
 	  get_range_strlen (srcstr, , /* eltsize = */ 1);
 	  range[0] = lendata.minlen;
-	  range[1] = lendata.maxbound;
+	  range[1] = lendata.maxbound ? lendata.maxbound : lendata.maxlen;
 	  if (range[0] && (!maxread || TREE_CODE (maxread) == INTEGER_CST))
 	{
 	  if (maxread && tree_int_cst_le (maxread, range[0]))
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 257cadfa5f1..2fe6cc4ee08 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -784,6 +784,12 @@ Wsizeof-array-argument
 C ObjC C++ ObjC++ Var(warn_sizeof_array_argument) Warning Init(1)
 Warn when sizeof is applied on a parameter declared as an array.
 
+Wstring-compare
+C ObjC C++ LTO ObjC++ Warning Var(warn_string_compare) Warning LangEnabledBy(C ObjC C++ ObjC++, Wextra)
+Warn about calls to strcmp and strncmp used in equality expressions that
+are necessarily true or false due to the length of one and size of the other
+argument.
+
 Wstringop-overflow
 C ObjC C++ LTO ObjC++ Warning Alias(Wstringop-overflow=, 2, 0)
 Warn about buffer overflow in string manipulation functions like memcpy
diff --git a/gcc/calls.c b/gcc/calls.c
index 7507b698e27..dcebf67b5cc 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1593,6 +1593,10 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
 	if (!get_attr_nonstring_decl (arg))
 	  {
 		c_strlen_data lendata = { };
+		/* Set MAXBOUND to an arbitrary non-null non-integer
+		   node as a 

Re: [PATCH] fold more string comparison with known result (PR 90879)

2019-08-09 Thread Jakub Jelinek
On Fri, Aug 09, 2019 at 10:17:12AM -0600, Martin Sebor wrote:
> --- a/gcc/gengtype-state.c
> +++ b/gcc/gengtype-state.c
> @@ -79,6 +79,14 @@ enum state_token_en
>STOK_NAME /* hash-consed name or identifier.  */
>  };
>  
> +/* Suppress warning: ISO C forbids zero-size array for stok_string
> +   below.  The arrays are treated as flexible array members but in
> +   otherwise an empty struct or as a member of a union cannot be
> +   declared as such.  They must have zero size to keep GCC from
> +   assuming their bound reflect their size.  */
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wpedantic"
> +
>  
>  /* Structure and hash-table used to share identifiers or names.  */
>  struct state_ident_st
> @@ -86,11 +94,10 @@ struct state_ident_st
>/* TODO: We could improve the parser by reserving identifiers for
>   state keywords and adding a keyword number for them.  That would
>   mean adding another field in this state_ident_st struct.  */
> -  char stid_name[1]; /* actually bigger & null terminated */
> +  char stid_name[0]; /* actually bigger & null terminated */

No, please don't do this.  The part of the GCC that is built by system
compiler shouldn't use GNU extensions, unless guarded only for compilation
with compilers that do support that.

Jakub


[PATCH] fold more string comparison with known result (PR 90879)

2019-08-09 Thread Martin Sebor

GCC 9 optimizes a subset of expression of the form
(0 == strcmp(a, b)) based on the length and/or size of
the arguments but it doesn't take advantage of all
the opportunities there.  For example in the following,
although it folds the first test to false it doesn't fold
the second one:

  char a[4];

  void f (void)
  {
if (strlen (a) > 3)   // folded to false by GCC 8+
  abort ();

if (strcmp (a, "1234") == 0)   // folded by patched GCC
  abort ();
}

The attached patch extends the strcmp optimization added in
GCC 9 to also handle the latter cases (among others).  Testing
the enhancement with several other sizable code bases besides
GCC (Binutils/GDB, the Linux kernel, and LLVM) shows that code
like this is rare.  After thinking about it I decided it's more
likely a bug than a significant optimization opportunity, so
I introduced a new warning to point it out: -Wstring-compare
(enabled in -Wextra).

Besides this enhancement, the patch also improves the current
optimization to fold strcmp calls with conditional arguments
such as in:

  void f (char *s, int i)
  {
strcpy (s, "12");
if (strcmp (s, i ? "123" : "1234") == 0)   // folded
  abort ();
  }

Martin

PS The diff looks like the changes are more extensive than they
actually are.
PR tree-optimization/90879 - fold zero-equality of strcmp between a longer string and a smaller array

gcc/c-family/ChangeLog:

	PR tree-optimization/90879
	* c.opt (-Wstring-compare): New option.

gcc/testsuite/ChangeLog:

	PR tree-optimization/90879
	* gcc.dg/Wstring-compare-2.c: New test.
	* gcc.dg/Wstring-compare.c: New test.
	* gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
	* gcc.dg/strcmpopt_6.c: New test.
	* gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
	test cases.
	* gcc.dg/strlenopt-66.c: Run it.
	* gcc.dg/strlenopt-67.c: New test.
	* gcc.dg/strlenopt-68.c: New test.

gcc/ChangeLog:

	PR tree-optimization/90879
	* builtins.c (check_access): Avoid using maxbound when null.
	* calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen change.
	* doc/invoke.texi (-Wstring-compare): Document new warning option.
	* gengtype-state.c (state_ident_st): Use a zero-length array instead.
	(state_token_st): Same.  Make last.
	(state_ident_by_name): Allocate enough space for terminating nul.
	* gimple-fold.c (get_range_strlen_tree): Make setting maxbound
	conditional.
	(get_range_strlen): Overwrite initial maxbound when non-null.
	* gimple-ssa-sprintf.c (get_string_length): Adjust to get_range_strlen
	change.
	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
	(used_only_for_zero_equality): New function.
	(handle_builtin_memcmp): Call it.
	(determine_min_objsize): Return an integer instead of tree.
	(get_len_or_size, strxcmp_eqz_result): New functions.
	(maybe_warn_pointless_strcmp): New function.
	(handle_builtin_string_cmp): Call it.  Fold zero-equality of strcmp
	between a longer string and a smaller array.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 695a9d191af..eca710942dc 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3326,7 +3326,7 @@ check_access (tree exp, tree, tree, tree dstwrite,
 	  c_strlen_data lendata = { };
 	  get_range_strlen (srcstr, , /* eltsize = */ 1);
 	  range[0] = lendata.minlen;
-	  range[1] = lendata.maxbound;
+	  range[1] = lendata.maxbound ? lendata.maxbound : lendata.maxlen;
 	  if (range[0] && (!maxread || TREE_CODE (maxread) == INTEGER_CST))
 	{
 	  if (maxread && tree_int_cst_le (maxread, range[0]))
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 257cadfa5f1..2fe6cc4ee08 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -784,6 +784,12 @@ Wsizeof-array-argument
 C ObjC C++ ObjC++ Var(warn_sizeof_array_argument) Warning Init(1)
 Warn when sizeof is applied on a parameter declared as an array.
 
+Wstring-compare
+C ObjC C++ LTO ObjC++ Warning Var(warn_string_compare) Warning LangEnabledBy(C ObjC C++ ObjC++, Wextra)
+Warn about calls to strcmp and strncmp used in equality expressions that
+are necessarily true or false due to the length of one and size of the other
+argument.
+
 Wstringop-overflow
 C ObjC C++ LTO ObjC++ Warning Alias(Wstringop-overflow=, 2, 0)
 Warn about buffer overflow in string manipulation functions like memcpy
diff --git a/gcc/calls.c b/gcc/calls.c
index 7507b698e27..dcebf67b5cc 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1593,6 +1593,10 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
 	if (!get_attr_nonstring_decl (arg))
 	  {
 		c_strlen_data lendata = { };
+		/* Set MAXBOUND to an arbitrary non-null non-integer
+		   node as a request to have it set to the length of
+		   the longest string in a PHI.  */
+		lendata.maxbound = arg;
 		get_range_strlen (arg, , /* eltsize = */ 1);
 		maxlen = lendata.maxbound;
 	  }
@@ -1618,6 +1622,10 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
 	if (!get_attr_nonstring_decl (arg))
 	  {
 	c_strlen_data lendata = { };
+	/* Set MAXBOUND to an