Re: [PATCH v2 1/2] strlen: fold strstr() even if the length isn't previously known [PR96601]

2023-09-05 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 04, 2023 at 11:14:41PM -0600, Jeff Law wrote:
> 
> 
> On 9/4/23 14:58, Hamza Mahfooz wrote:
> > Currently, we give up in fold_strstr_to_strncmp() if the length of the
> > the second argument to strstr() isn't known to us by the time we hit
> > that function. However, we can instead insert a strlen() in ourselves
> > and continue trying to fold strstr() into strlen()+strncmp().
> > 
> > PR tree-optimization/96601
> > 
> > gcc/ChangeLog:
> > 
> > * tree-ssa-strlen.cc (fold_strstr_to_strncmp): If arg1_len == NULL,
> > insert a strlen() for strstr()'s arg1 and use it as arg1_len.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.dg/strlenopt-30.c: Modify test.
> I'm not sure it's necessarily a win to convert to strncmp as aggressively as
> this patch would do.  Essentially when you have large needles that are
> partially matched repeatedly, performance can significantly suffer.

For -Os/-Oz I think this is never a desirable optimization, turning one call
into 2 with all the argument setup etc. (unless we have the length constant
or already computed).

Otherwise, consider say 2GB long needle, it will take quite long to even
compute strlen of that.
Now, looking at current glibc strstr implementation, it starts with
  /* Handle short needle special cases first.  */
  if (ne[0] == '\0')
return (char *)hs;
  hs = (const unsigned char *)strchr ((const char*)hs, ne[0]);
  if (hs == NULL || ne[1] == '\0')
return (char*)hs;
  if (ne[2] == '\0')
return strstr2 (hs, ne);
  if (ne[3] == '\0')
return strstr3 (hs, ne);

  /* Ensure haystack length is at least as long as needle length.
 Since a match may occur early on in a huge haystack, use strnlen
 and read ahead a few cachelines for improved performance.  */
  size_t ne_len = strlen ((const char*)ne);
  size_t hs_len = __strnlen ((const char*)hs, ne_len | 512);
So, if needle is very long but first character of the needle doesn't
appear in haystack at all and haystack is shorter than needle, this will also
not be desirable optimization, because strstr would just return NULL after
walking haystack, while with the optimization you need to compute the
length.  Otherwise I think the optimization is desirable, because typically
haystack is longer than needle and walking it completely using strchr will
be already more expensive than strlen on needle and otherwise strstr
computes the strlen anyway later.  But perhaps if strlen isn't known it
might be better to guard the strlen + strncmp on inline comparison of the
first character, so that one rules out the above mentioned special case,
so that we won't compute the strlen unnecessarily at least in that case.
Still strstr (32b_string, 2147483647b_string) == 32b_string will be
serious slowdown if 32b_string[0] == 2147483647b_string[0], but perhaps the
common case is more important.

Or do we want to add to the C library some kind of asymetric strcmp,
which will be equivalent to strncmp (p1, p2, strlen (p2)) but will actually
not compute the strlen but only compare the characters and just handle the
case where p2 has as the first different character '\0' as return 0 rather
than difference?

Jakub



Re: [PATCH v2 1/2] strlen: fold strstr() even if the length isn't previously known [PR96601]

2023-09-04 Thread Jeff Law via Gcc-patches




On 9/4/23 14:58, Hamza Mahfooz wrote:

Currently, we give up in fold_strstr_to_strncmp() if the length of the
the second argument to strstr() isn't known to us by the time we hit
that function. However, we can instead insert a strlen() in ourselves
and continue trying to fold strstr() into strlen()+strncmp().

PR tree-optimization/96601

gcc/ChangeLog:

* tree-ssa-strlen.cc (fold_strstr_to_strncmp): If arg1_len == NULL,
insert a strlen() for strstr()'s arg1 and use it as arg1_len.

gcc/testsuite/ChangeLog:

* gcc.dg/strlenopt-30.c: Modify test.
I'm not sure it's necessarily a win to convert to strncmp as 
aggressively as this patch would do.  Essentially when you have large 
needles that are partially matched repeatedly, performance can 
significantly suffer.


If we're going to seriously consider this path, then I'd like to see how 
it performs in general.  The glibc testsuite I think has some 
performance coverage for strstr.


jeff


[PATCH v2 1/2] strlen: fold strstr() even if the length isn't previously known [PR96601]

2023-09-04 Thread Hamza Mahfooz
Currently, we give up in fold_strstr_to_strncmp() if the length of the
the second argument to strstr() isn't known to us by the time we hit
that function. However, we can instead insert a strlen() in ourselves
and continue trying to fold strstr() into strlen()+strncmp().

PR tree-optimization/96601

gcc/ChangeLog:

* tree-ssa-strlen.cc (fold_strstr_to_strncmp): If arg1_len == NULL,
insert a strlen() for strstr()'s arg1 and use it as arg1_len.

gcc/testsuite/ChangeLog:

* gcc.dg/strlenopt-30.c: Modify test.

Signed-off-by: Hamza Mahfooz 
---
Please push this for me if you think it looks good. Since, I don't have
write access to the repository.
---
 gcc/testsuite/gcc.dg/strlenopt-30.c |  5 +-
 gcc/tree-ssa-strlen.cc  | 81 -
 2 files changed, 45 insertions(+), 41 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/strlenopt-30.c 
b/gcc/testsuite/gcc.dg/strlenopt-30.c
index 2a3098ba96f..1ee814048c1 100644
--- a/gcc/testsuite/gcc.dg/strlenopt-30.c
+++ b/gcc/testsuite/gcc.dg/strlenopt-30.c
@@ -38,9 +38,6 @@ void f5(char *s)
 foo_f5();
 }
 
-/* Do not perform transform, since strlen (t)
-   is unknown.  */
-
 __attribute__((no_icf))
 _Bool f6(char *s, char *t)
 {
@@ -60,4 +57,4 @@ _Bool f7(char *s)
   return (t1 == s);
 }
 
-/* { dg-final { scan-tree-dump-times "__builtin_strncmp" 5 "strlen1" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_strncmp" 6 "strlen1" } } */
diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index 8b7ef919826..b0ebbb0db62 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -5273,6 +5273,7 @@ fold_strstr_to_strncmp (tree rhs1, tree rhs2, gimple 
*stmt)
  tree arg1 = gimple_call_arg (call_stmt, 1);
  tree arg1_len = NULL_TREE;
  int idx = get_stridx (arg1, call_stmt);
+ gimple_stmt_iterator gsi = gsi_for_stmt (call_stmt);
 
  if (idx)
{
@@ -5286,51 +5287,57 @@ fold_strstr_to_strncmp (tree rhs1, tree rhs2, gimple 
*stmt)
}
}
 
- if (arg1_len != NULL_TREE)
+ if (arg1_len == NULL_TREE)
{
- gimple_stmt_iterator gsi = gsi_for_stmt (call_stmt);
- tree strncmp_decl = builtin_decl_explicit (BUILT_IN_STRNCMP);
+ tree strlen_decl = builtin_decl_explicit (BUILT_IN_STRLEN);
+ gcall *strlen_call = gimple_build_call (strlen_decl, 1, arg1);
+ tree strlen_lhs = make_ssa_name (size_type_node, strlen_call);
+
+ gimple_call_set_lhs (strlen_call, strlen_lhs);
+ gimple_set_vuse (strlen_call, gimple_vuse (call_stmt));
+ gsi_insert_before (, strlen_call, GSI_SAME_STMT);
+ arg1_len = strlen_lhs;
+   }
+ else if (!is_gimple_val (arg1_len))
+   {
+ tree arg1_len_tmp = make_ssa_name (TREE_TYPE (arg1_len));
+ gassign *arg1_stmt = gimple_build_assign (arg1_len_tmp,
+   arg1_len);
+ gsi_insert_before (, arg1_stmt, GSI_SAME_STMT);
+ arg1_len = arg1_len_tmp;
+   }
 
- if (!is_gimple_val (arg1_len))
+ tree strncmp_decl = builtin_decl_explicit (BUILT_IN_STRNCMP);
+ gcall *strncmp_call = gimple_build_call (strncmp_decl, 3,
+  arg0, arg1, arg1_len);
+ tree strncmp_lhs = make_ssa_name (integer_type_node);
+ gimple_set_vuse (strncmp_call, gimple_vuse (call_stmt));
+ gimple_call_set_lhs (strncmp_call, strncmp_lhs);
+ gsi_remove (, true);
+ gsi_insert_before (, strncmp_call, GSI_SAME_STMT);
+ tree zero = build_zero_cst (TREE_TYPE (strncmp_lhs));
+
+ if (is_gimple_assign (stmt))
+   {
+ if (gimple_assign_rhs_code (stmt) == COND_EXPR)
{
- tree arg1_len_tmp = make_ssa_name (TREE_TYPE (arg1_len));
- gassign *arg1_stmt = gimple_build_assign (arg1_len_tmp,
-   arg1_len);
- gsi_insert_before (, arg1_stmt, GSI_SAME_STMT);
- arg1_len = arg1_len_tmp;
-   }
-
- gcall *strncmp_call = gimple_build_call (strncmp_decl, 3,
- arg0, arg1, arg1_len);
- tree strncmp_lhs = make_ssa_name (integer_type_node);
- gimple_set_vuse (strncmp_call, gimple_vuse (call_stmt));
- gimple_call_set_lhs (strncmp_call, strncmp_lhs);
- gsi_remove (, true);
- gsi_insert_before (, strncmp_call, GSI_SAME_STMT);
- tree zero = build_zero_cst (TREE_TYPE (strncmp_lhs));
-
- if (is_gimple_assign (stmt))
-   {
- if (gimple_assign_rhs_code (stmt) == COND_EXPR)
-   {
- tree cond = gimple_assign_rhs1 (stmt);
-