[Bug middle-end/78809] Inline strcmp with small constant strings

2018-09-05 Thread qinzhao at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

qinzhao at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #48 from qinzhao at gcc dot gnu.org ---
all the issues have been resolved.
close this one as fixed.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-08-30 Thread qinzhao at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #47 from qinzhao at gcc dot gnu.org ---
all the issues triggered by the previous patch have been fixed.
I am planing to close this PR as fixed.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-26 Thread qinzhao at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #46 from qinzhao at gcc dot gnu.org ---
https://gcc.gnu.org/viewcvs/gcc?view=revision=263028
was to fix the optimization level issue.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|8.2 |8.3

--- Comment #45 from Jakub Jelinek  ---
GCC 8.2 has been released.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #44 from Qing Zhao  ---
> (In reply to wilco from comment #43)
will provide a simple patch for this issue.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-23 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #43 from Wilco  ---
(In reply to qinzhao from comment #42)
> just checked in the patch for fixing the unsigned char issue as:
> https://gcc.gnu.org/viewcvs/gcc?view=revision=262907

That looks it is using unsigned char accesses indeed.

One more thing: the expansion happens both with -Os and -O0, and that shouldn't
happen. The memcmp expansion works from -O2 onwards, which is probably a good
choice for the strcmp too (a single char could be optimized with -Os but that
would be the empty string).

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-20 Thread qinzhao at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #42 from qinzhao at gcc dot gnu.org ---
just checked in the patch for fixing the unsigned char issue as:
https://gcc.gnu.org/viewcvs/gcc?view=revision=262907

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-13 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #41 from Qing Zhao  ---
> --- Comment #40 from Wilco  ---
> See eg. http://www.iso-9899.info/n1570.html section 7.24.4:
> 
> "The sign of a nonzero value returned by the comparison functions memcmp,
> strcmp, and strncmp is determined by the sign of the difference between the
> values of the first pair of characters (both interpreted as unsigned char) 
> that
> differ in the objects being compared."

Thanks. I will provide a small patch to fix this issue soon.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-13 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #40 from Wilco  ---
(In reply to Qing Zhao from comment #39)
> > --- Comment #38 from Wilco  ---
> > This uses signed char while the C standard says the comparison is done on
> > unsigned chars.
> > 
> 
> during my implementation, I did some research on whether I should use
> “unsigned char” or “signed char”
> for the comparison.  what I checked was man page of strcmp, memcmp, (I don’t
> have C standard in hand).
> in the manpage of memcmp, it clearly and explicitly mentioned that the chars
> are interpreted as unsigned char;
> however, in the manpage of strcmp/strncmp, it’s not mentioned at all.  So, I
> thought that for strcmp/strncmp,
> I should use signed char.  but for memcmp, I used unsigned char.
> 
> since I don’t have a C standard, could you please point me the corresponding
> section for this?
> thanks.

See eg. http://www.iso-9899.info/n1570.html section 7.24.4:

"The sign of a nonzero value returned by the comparison functions memcmp,
strcmp, and strncmp is determined by the sign of the difference between the
values of the first pair of characters (both interpreted as unsigned char) that
differ in the objects being compared."

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-13 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #39 from Qing Zhao  ---
> --- Comment #38 from Wilco  ---
> This uses signed char while the C standard says the comparison is done on
> unsigned chars.
> 

during my implementation, I did some research on whether I should use “unsigned
char” or “signed char”
for the comparison.  what I checked was man page of strcmp, memcmp, (I don’t
have C standard in hand).
in the manpage of memcmp, it clearly and explicitly mentioned that the chars
are interpreted as unsigned char;
however, in the manpage of strcmp/strncmp, it’s not mentioned at all.  So, I
thought that for strcmp/strncmp,
I should use signed char.  but for memcmp, I used unsigned char.

since I don’t have a C standard, could you please point me the corresponding
section for this?
thanks.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-13 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #38 from Wilco  ---
(In reply to qinzhao from comment #37)
> since all the implementation were in trunk.
> can I close this PR now?

Thanks, it generates pretty much what I expected for t1.

However there is an issue:

t1:
ldrsb   w1, [x0]
subsw1, w1, #97
bne .L68
ldrsb   w1, [x0, 1]
.L68:
mov w0, w1
ret

This uses signed char while the C standard says the comparison is done on
unsigned chars.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-13 Thread qinzhao at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #37 from qinzhao at gcc dot gnu.org ---
since all the implementation were in trunk.
can I close this PR now?

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-13 Thread qinzhao at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #36 from qinzhao at gcc dot gnu.org ---
the 3rd part (the last part) of this PR was checked into GCC 9 today as:

https://gcc.gnu.org/viewcvs/gcc?view=revision=262636

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-05-31 Thread qinzhao at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

qinzhao at gcc dot gnu.org changed:

   What|Removed |Added

 CC||qinzhao at gcc dot gnu.org

--- Comment #35 from qinzhao at gcc dot gnu.org ---
the second part of the implementation is checked into gcc9 today as:

https://gcc.gnu.org/viewcvs/gcc?limit_changes=0=revision=261039

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-05-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|8.0 |8.2

--- Comment #34 from Jakub Jelinek  ---
GCC 8.1 has been released.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-30 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #33 from Qing Zhao  ---
with SPEC2017 on X86, I didn't see any run-time performance change as expected.
due to the date I collected for code size change, looks like that n=3 or n=4
might be a reasonable default. n=3 might be safer.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-30 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #32 from Qing Zhao  ---
Created attachment 43298
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43298=edit
code size impact from inlining str(n)cmp with different n on X86

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-29 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #31 from Wilco  ---
(In reply to Qing Zhao from comment #30)
> (in reply to Wilco from comment #29)
> > 
> > The new test is better, however it uses i % 15 which means an expensive
> > division by constant every loop iteration. It's best to change to i & 15. 
> > Also
> > using an array of string pointers means you get something like:
> > 
> > result += strcmp (p[i & 15], "abc");
> > 
> > Using this I get ~80% speedup for n=3 on AArch64, similar to your set 2.
> I will try with these modification. 
> > 
> > As for benchmarking, I'm not so sure that SPEC2006 or SPEC2017 call strcmp 
> > with
> > constant strings.
> do you have any suggestion on other real applications?

Not really - I haven't seen strcmp with a constant string in a benchmark other
than Dhrystone (and that has a long string). What I typically do is get traces
from running various benchmarks and create a microbenchmark that mimicks the
behaviour, but that's probably overkill for this optimization.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-25 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #30 from Qing Zhao  ---
(in reply to Wilco from comment #29)
> 
> The new test is better, however it uses i % 15 which means an expensive
> division by constant every loop iteration. It's best to change to i & 15. Also
> using an array of string pointers means you get something like:
> 
> result += strcmp (p[i & 15], "abc");
> 
> Using this I get ~80% speedup for n=3 on AArch64, similar to your set 2.
I will try with these modification. 
> 
> As for benchmarking, I'm not so sure that SPEC2006 or SPEC2017 call strcmp 
> with
> constant strings.
do you have any suggestion on other real applications?

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-25 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #29 from Wilco  ---
(In reply to Qing Zhao from comment #28)
> > 
> > I don't think this is a good test.  Repeatedly calling strcmp with the same
> > inputs is not something real code does, especially when the string matches
> > exactly.  Why would it do that?  The consequence is that you are probably
> > optimizing for a very unlikely scenario.  Instead you need to look at traces
> > from real code to understand how strcmp is called in practice.  Nothing else
> > will give you a defensible answer.
> 
> Please see the latest attachment to this bug, it addressed your above
> concerns.
> let me know if you have more concerns with the new testing cases or any
> concrete
> suggestion to improve the testing case. 
> The next step will do performance testing with SPEC2006 or SPEC2017 to
> further
> decide the default threshold.

The new test is better, however it uses i % 15 which means an expensive
division by constant every loop iteration. It's best to change to i & 15. Also
using an array of string pointers means you get something like:

result += strcmp (p[i & 15], "abc");

Using this I get ~80% speedup for n=3 on AArch64, similar to your set 2.

As for benchmarking, I'm not so sure that SPEC2006 or SPEC2017 call strcmp with
constant strings.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #28 from Qing Zhao  ---
> 
> I don't think this is a good test.  Repeatedly calling strcmp with the same
> inputs is not something real code does, especially when the string matches
> exactly.  Why would it do that?  The consequence is that you are probably
> optimizing for a very unlikely scenario.  Instead you need to look at traces
> from real code to understand how strcmp is called in practice.  Nothing else
> will give you a defensible answer.

Please see the latest attachment to this bug, it addressed your above concerns.
let me know if you have more concerns with the new testing cases or any
concrete
suggestion to improve the testing case. 
The next step will do performance testing with SPEC2006 or SPEC2017 to further
decide the default threshold.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-23 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #27 from Richard Earnshaw  ---
(In reply to Qing Zhao from comment #23)

> qinzhao@gcc116:~/Bugs/78809/const_cmp$ cat t_p.c
> #include 
> 
> char array[]= "fishi";
> 
> #define NUM 10
> int __attribute__ ((noinline))
> cmp2 (const char *p)
> {
>   int result = 0;
>   int i;
>   for (i=0; i< NUM; i++) {
> result |=  strcmp (p, "fishi"); 
>   }
>   return result;
> }
> 
> int result = 0;
> 
> int main()
> {
>   for (int i = 0; i < 25; i++)
>  result += cmp2 (array);
>   return 0;
> }

I don't think this is a good test.  Repeatedly calling strcmp with the same
inputs is not something real code does, especially when the string matches
exactly.  Why would it do that?  The consequence is that you are probably
optimizing for a very unlikely scenario.  Instead you need to look at traces
from real code to understand how strcmp is called in practice.  Nothing else
will give you a defensible answer.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

Qing Zhao  changed:

   What|Removed |Added

  Attachment #42449|0   |1
is obsolete||

--- Comment #26 from Qing Zhao  ---
Created attachment 43222
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43222=edit
the run-time performance data for inlining str(n)cmp

This is the performance data with my private str(n)cmp implementation on both
X86 and AARCH64 platform.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-19 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #25 from Wilco  ---
(In reply to Qing Zhao from comment #24)
> From the above, we can see:
> even with n is as big as 20, inlined version is much faster than the
> non-inlined version, both on aarch64 (no hardware string compare insn
> provided) and X86 (hardware string compare insn provided)
> 
> So, it's reasonable to do the inline as much as possible.

Those numbers look too good to be true. I get around 2x speed up at n=3 (using
old GLIBC header which does the inlining), you get 7x on AArch64 and 21x on
x86...

In general it's not a good idea to inline too much because of code size bloat
and branch mispredictions (a good strcmp implementation processes 8 or 16 chars
at a time rather than requiring 1 branch per character). n=4 seems reasonable
since you need 3 instructions per character on most targets.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-19 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #24 from Qing Zhao  ---
From the above, we can see:
even with n is as big as 20, inlined version is much faster than the
non-inlined version, both on aarch64 (no hardware string compare insn provided)
and X86 (hardware string compare insn provided)

So, it's reasonable to do the inline as much as possible.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-19 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #23 from Qing Zhao  ---
I have an implementation for the part C of this task in my private space:

 part C: for strcmp (s1, s2), strncmp (s1, s2, n):

  if the result is NOT used to do simple equality test against zero, one of 
"s1" or "s2" is a small constant string, n is a constant, and the Min value of 
the length of the constant string and "n" is smaller than a predefined 
threshold T, 
  inline the call by a byte-to-byte comparision sequence to avoid calling 
overhead. 


with this implementation, I was able to measure the performance impact of the
inlining transformation on different value of "n", n is the length of the
string need to be compared. In order to decide the following two concerns:
A. what's the default value of n.
B. on a platform that support string compare hardware insns (for exmaple,
X86), which should be done first for a call to strcmp/strncmp, the inline or
the hardware insns?


On both aarch64 and X86, I tried the following small testing case for the
performance experiments:

qinzhao@gcc116:~/Bugs/78809/const_cmp$ cat t_p.c
#include 

char array[]= "fishi";

#define NUM 10
int __attribute__ ((noinline))
cmp2 (const char *p)
{
  int result = 0;
  int i;
  for (i=0; i< NUM; i++) {
result |=  strcmp (p, "fishi"); 
  }
  return result;
}

int result = 0;

int main()
{
  for (int i = 0; i < 25; i++)
 result += cmp2 (array);
  return 0;
}

and the option I used was:  -O -fno-tree-loop-im and the corresponding option
to enable or disable the added inlining, the following is the performance
result:

aarch64  strcmp

n=  3   4   5   6   10  20

inline  31  41  62  72  114 242
no-inline   229 229 229 229 272 333

aarch64  strncmp

n=  3   4   5   6   10  20

inline  41  62  62  76  125 250
no-inline   291 291 291 291 364 427


X86  strcmp

n=  4   5   6   10  20

inline  21  25  31  42  163
no-inline   445 461 488 529 672


X86  strncmp

n=  4   5   6   10  20

inline  21  25  28  43  77
no-inline   412 435 442 495 638

From the above, we can see:

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-12-14 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #22 from Qing Zhao  ---
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00962.html
2nd patch

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-11-16 Thread law at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #21 from Jeffrey A. Law  ---
Author: law
Date: Fri Nov 17 05:32:05 2017
New Revision: 254856

URL: https://gcc.gnu.org/viewcvs?rev=254856=gcc=rev
Log:
2017-11-15  Qing Zhao 

PR middle-end/78809
* gimple-fold.c (gimple_fold_builtin_string_compare): Add handling
of replacing call to strncmp with corresponding call to strcmp when
meeting conditions.

PR middle-end/78809
* gcc.dg/strcmpopt_1.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/strcmpopt_1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimple-fold.c
trunk/gcc/testsuite/ChangeLog

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-11-06 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #20 from Qing Zhao  ---
design doc is at:
https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-26 Thread paolo.carlini at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

Paolo Carlini  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2017-10-26
 CC|qing.zhao at oracle dot com|
   Assignee|unassigned at gcc dot gnu.org  |qing.zhao at oracle dot 
com
   Target Milestone|--- |8.0
 Ever confirmed|0   |1

--- Comment #19 from Paolo Carlini  ---
Qing is on it.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #18 from Wilco  ---
(In reply to Qing Zhao from comment #17)
> (In reply to Wilco from comment #16)
> 
> >> const char s[8] = “abcd\0abc”;  // null byte in the middle of the string
> >> int f2(void) { return __builtin_strcmp(s, "abc") != 0; }
> >> int f3(void) { return __builtin_strcmp(s, “abc”); }
> >> 
> >> can either of the above f2 or f3 been optimized to memcmp? seems not.
> > 
> > You never get that to the null byte as the memcmp only compares 
> > strlen("abc"+1)
> > characters.
> 
> Yes, this is correct for memcmp, but not for strcmp.  strcmp will get to the
> null byte. 
> as a result,  
> 
> const char s[8] = “abcd\0abc”;
> strcmp (s, “abc”) != 0.   // s = “abcd", which is != “abc"
> strncmp (s, “abc”, 3) == 0
> memcmp(s, “abc”, 3) == 0
> 
> So, strcmp cannot optimized to memcmp 

No that should be:

strcmp (s, “abc”) != 0
strncmp (s, “abc”, 4) != 0
memcmp(s, “abc”, 4) != 0

You need to compare the null terminator as well.

> > However do you mean an input string which is shorter than the
> > constant string? That's fine as this will compare not-equal in the memcmp.
> 
> for the input string is shorter than the constant string, for example: 
> 
> const char s[8] = “ab\0\0abcd”;
> strcmp (s, “abc”) != 0
> strncmp (s, “abc”, 3) != 0
> memcmp (s, “abc”,3) != 0
> 
> In a summary, since it’s NOT easy for the compiler to know where is the
> “Null_terminator” 
> in the string, strcmp is NOT reasonable to be optimized to memcmp whenever
> its result is 
> used to compare with zero or not.

The compiler knows where the null terminator is in the constant string so it
can easily figure out when it is legal as well as faster than doing a byte by
byte expansion of strcmp.

strcmp (s, STR) -> memcmp (s, strlen (STR) + 1) iff max(sizeof_array(s),
alignment(s)) > strlen (STR).

> But for strncmp, if the result is to compare with zero, it might be
> reasonable to optimized it
> to the corresponding memcmp, i.e
> 
> strncmp (s, “abc”, 3) != 0
> 
> could be optimized to
> 
> memcmp (s, “abc”, 3) != 0

If the strncmp size is smaller than the strlen of the constant string (and
alignment is right), yes. But strncmp (s, "abc", C) is equivalent to strcmp (s,
"abc") if C >= 4.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #17 from Qing Zhao  ---
(In reply to Wilco from comment #16)

>> const char s[8] = “abcd\0abc”;  // null byte in the middle of the string
>> int f2(void) { return __builtin_strcmp(s, "abc") != 0; }
>> int f3(void) { return __builtin_strcmp(s, “abc”); }
>> 
>> can either of the above f2 or f3 been optimized to memcmp? seems not.
> 
> You never get that to the null byte as the memcmp only compares 
> strlen("abc"+1)
> characters.

Yes, this is correct for memcmp, but not for strcmp.  strcmp will get to the
null byte. 
as a result,  

const char s[8] = “abcd\0abc”;
strcmp (s, “abc”) != 0.   // s = “abcd", which is != “abc"
strncmp (s, “abc”, 3) == 0
memcmp(s, “abc”, 3) == 0

So, strcmp cannot optimized to memcmp 

> However do you mean an input string which is shorter than the
> constant string? That's fine as this will compare not-equal in the memcmp.

for the input string is shorter than the constant string, for example: 

const char s[8] = “ab\0\0abcd”;
strcmp (s, “abc”) != 0
strncmp (s, “abc”, 3) != 0
memcmp (s, “abc”,3) != 0

In a summary, since it’s NOT easy for the compiler to know where is the
“Null_terminator” 
in the string, strcmp is NOT reasonable to be optimized to memcmp whenever its
result is 
used to compare with zero or not.

But for strncmp, if the result is to compare with zero, it might be reasonable
to optimized it
to the corresponding memcmp, i.e

strncmp (s, “abc”, 3) != 0

could be optimized to

memcmp (s, “abc”, 3) != 0

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #16 from Wilco  ---
(In reply to Qing Zhao from comment #15)
> (In reply to Wilco from comment 14)
> > The only reason we have to do a character by character comparison is 
> > because we
> > cannot read beyond the end of a string. However when we know the size or
> > alignment we can safely process a string one word at a time.
> 
> is it possible that “NULL_terminator” is in the middle of the string even
> though we 
> know the size or alignment? for example:
> 
> const char s[8] = “abcd\0abc”;  // null byte in the middle of the string
> int f2(void) { return __builtin_strcmp(s, "abc") != 0; }
> int f3(void) { return __builtin_strcmp(s, “abc”); }
> 
> can either of the above f2 or f3 been optimized to memcmp? seems not.

You never get that to the null byte as the memcmp only compares strlen("abc"+1)
characters. However do you mean an input string which is shorter than the
constant string? That's fine as this will compare not-equal in the memcmp.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #15 from Qing Zhao  ---
(In reply to Wilco from comment 14)
> The only reason we have to do a character by character comparison is because 
> we
> cannot read beyond the end of a string. However when we know the size or
> alignment we can safely process a string one word at a time.

is it possible that “NULL_terminator” is in the middle of the string even
though we 
know the size or alignment? for example:

const char s[8] = “abcd\0abc”;  // null byte in the middle of the string
int f2(void) { return __builtin_strcmp(s, "abc") != 0; }
int f3(void) { return __builtin_strcmp(s, “abc”); }

can either of the above f2 or f3 been optimized to memcmp? seems not.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #14 from Wilco  ---
(In reply to Qing Zhao from comment #11)
> (In reply to Wilco from comment #9)
> 
> > str(n)cmp with a constant string can be changed into memcmp if the string 
> > has a
> > known alignment or is an array of known size. We should check the common 
> > cases
> > are implemented.
> 
> Please provide an example in which a str(n)cmp with a constant string can be
> changed into
> memcmp. 
> 
> (From my understanding, for the strcmp (p, “fish”),  since we don’t know
> what will be in the string
> pointed by “p”, and there might be NULL_terminator in any of the place of p,
> we have to compare
> each char in “p” one by one, do I miss anything here?)

The only reason we have to do a character by character comparison is because we
cannot read beyond the end of a string. However when we know the size or
alignment we can safely process a string one word at a time.

I would expect all these to optimize into memcmp - but none do:

#include 

char s[100];
typedef struct { int x; char s[8]; } S;

int f1(S *s) { return __builtin_strcmp(s->s, "abc") != 0; }
int f2(void) { return __builtin_strcmp(s, "abc") != 0; }
int f3(char *s) { s = __builtin_assume_aligned (s, 8); return
__builtin_strcmp(s, "abc") != 0; }
int f4(void) { char *s = (char*)malloc(100); return __builtin_strcmp(s, "abc")
!= 0; }

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #13 from Qing Zhao  ---
(In reply to Richard Earnshaw from comment 12)
> That's not entirely correct.  Notionally memcmp needs to return a value
> representing the relative difference of the first different byte in the
> compared areas of memory; any later bytes are irrelevant.  Yes the compiler 
> can
> compare multiple bytes at the same time and it does not have to worry about
> page faulting, but it does have to keep track of where the first difference
> occurs.
> 
> Of course, the compiler can see how the result is used to optimize things
> further; a simple equality test will allow the compiler to generate a simpler
> sequence that could access all bytes and accumulate the overall result.

Yes.  this should be the reason why currently in GCC, only when the result of 
memcpy is compared to zero, it is optimized by “strlen” pass and “expand” pass.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #12 from Richard Earnshaw  ---
(In reply to Qing Zhao from comment #7)
> on the other hand, memcmp will NOT early stop, it will compare exactly N
> bytes of both buffers. As a result, the compiler can compare multiple bytes
> at one time.  
> 

That's not entirely correct.  Notionally memcmp needs to return a value
representing the relative difference of the first different byte in the
compared areas of memory; any later bytes are irrelevant.  Yes the compiler can
compare multiple bytes at the same time and it does not have to worry about
page faulting, but it does have to keep track of where the first difference
occurs.

Of course, the compiler can see how the result is used to optimize things
further; a simple equality test will allow the compiler to generate a simpler
sequence that could access all bytes and accumulate the overall result.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #11 from Qing Zhao  ---
(In reply to Wilco from comment #9)

> str(n)cmp with a constant string can be changed into memcmp if the string has 
> a
> known alignment or is an array of known size. We should check the common cases
> are implemented.

Please provide an example in which a str(n)cmp with a constant string can be
changed into
memcmp. 

(From my understanding, for the strcmp (p, “fish”),  since we don’t know what
will be in the string
pointed by “p”, and there might be NULL_terminator in any of the place of p, we
have to compare
each char in “p” one by one, do I miss anything here?)

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #10 from Qing Zhao  ---
>> From the data, we can see the inlined version of strcmp (by glibc) is much
>> slower than the direct call to strcmp.  (this is for size 2)
>> I am using GCC farm machine gcc116:
> 
> This result doesn't make sense - it looks like GCC is moving the strcmp call 
> in
> the 2nd case as a loop invariant, so you're just measuring a loop with just a
> subtract and orr instruction…

Yes, Wilco is right here.  -ftree-loop-im moves the call to strcmp out of the
loop.
in order to avoid this issue, I changed the options to

-O -fno-tree-loop-im

and checked the assembly of the routine “cmp2” for the INLINED and Non-INLINED
version.

Inlined version:
cmp2:
mov x4, x0
mov w2, 51712
movkw2, 0x3b9a, lsl 16
mov w0, 0
mov w3, 102
b   .L3
.L2:
neg w1, w1
orr w0, w0, w1
subsw2, w2, #1
beq .L5
.L3:
ldrbw1, [x4]
subsw1, w3, w1
bne .L2
ldrbw1, [x4, 1]
neg w1, w1
b   .L2
.L5:
ret

Non-inlined version:
cmp2:
stp x29, x30, [sp, -48]!
add x29, sp, 0
stp x19, x20, [sp, 16]
stp x21, x22, [sp, 32]
mov x22, x0
mov w19, 51712
movkw19, 0x3b9a, lsl 16
mov w20, 0
adrpx21, .LC0
add x21, x21, :lo12:.LC0
.L2:
mov x1, x21
mov x0, x22
bl  strcmp
orr w20, w20, w0
subsw19, w19, #1
bne .L2
mov w0, w20
ldp x19, x20, [sp, 16]
ldp x21, x22, [sp, 32]
ldp x29, x30, [sp], 48
ret

Then, the run-time performance data is:

qinzhao@gcc116:~/Bugs/78809/const_cmp/perf$ sh t_p
/home/qinzhao/Install/latest/bin/gcc -O -fno-tree-loop-im t_p_1.c t_p.c
-DINLINED
inlined version
34.73user 0.00system 0:34.73elapsed 99%CPU (0avgtext+0avgdata 360maxresident)k
0inputs+0outputs (0major+135minor)pagefaults 0swaps
/home/qinzhao/Install/latest/bin/gcc -O -fno-tree-loop-im t_p_1.c t_p.c
non-inlined version
138.79user 0.00system 2:18.77elapsed 100%CPU (0avgtext+0avgdata
356maxresident)k
0inputs+0outputs (0major+135minor)pagefaults 0swaps

Yes, looks like that the inlined version is much faster than the non-inlined
version on aarch64 platform.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-23 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #9 from Wilco  ---
(In reply to Qing Zhao from comment #7)

str(n)cmp with a constant string can be changed into memcmp if the string has a
known alignment or is an array of known size. We should check the common cases
are implemented.

Given it takes just 3 instructions per character on almost all targets, it is
reasonable to inline strcmp with a small string if it cannot be changed into
memcmp. This will be faster than calling strcmp due to the overhead of the
call, the plt redirection and alignment checks. Note also that many strcmp
implementations will do a byte-by-byte check if the strings are not aligned.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-23 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #8 from Wilco  ---

> /home/qinzhao/Install/latest/bin/gcc -O2 t_p_1.c t_p.c
> non-inlined version
> 20.84user 0.00system 0:20.83elapsed 100%CPU (0avgtext+0avgdata
> 360maxresident)k
> 0inputs+0outputs (0major+135minor)pagefaults 0swaps
> 
> From the data, we can see the inlined version of strcmp (by glibc) is much
> slower than the direct call to strcmp.  (this is for size 2)
> I am using GCC farm machine gcc116:

This result doesn't make sense - it looks like GCC is moving the strcmp call in
the 2nd case as a loop invariant, so you're just measuring a loop with just a
subtract and orr instruction...

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #7 from Qing Zhao  ---
I have studied the inlining of memcmp and str(n)cmp in GCC, the following are a
summary of my study so far:

1. memcmp is different from str(n)cmp as the following:

• strcmp compares null-terminated C strings
• strncmp compares at most N characters of null-terminated C strings
• memcmp compares binary byte buffers of N bytes.

among these three, both strcmp and strncmp might early stop at NULL terminator
of the compared strings.  as a result, we have to compare the char one by one
for str(n)cmp.

on the other hand, memcmp will NOT early stop, it will compare exactly N bytes
of both buffers. As a result, the compiler can compare multiple bytes at one
time.  

So, memcpy should be much easier for the compiler to optimize than str(n)cmp. 

2. when the memcpy’s result is only compared against zero, it's easier
to be optimized than its result compared with other values. 

3. the implementation of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171 have been in latest GCC, and
it optimizes the following
cases on ALL platforms:

memcmp (p, "fishi", 5) != 0
__builtin_memcmp (p, "fish", 8) == 0

and the implementation is a target independent one. when the length of constant
string is multiple word size, it is optimized by “strlen” pass,
when it’s NOT multiple word size, it is optimized by “expand” pass. 

4. However, this implementation to PR52171 does NOT optimize the
following cases:

A.   memcmp’s result is NOT compared to zero

memcmp (p, "fish", 4);   
__builtin_memcmp (p, “fishi”, 5);   

B.  strncmp and strcmp, when the result is or is NOT compared to zero
strncmp (p, "fi", 2) != 0
__builtin_strncmp (p, "fi", 2) != 0

strcmp (p, "fish") != 0
__builtin_strcmp (p, "fish") != 0

strncmp (p, "fi", 2) 
__builtin_strncmp (p, "fi", 2)

strcmp (p, "fish”)   
__builtin_strcmp (p, "fish”)   

Per the reason I mentioned in 1.  strncmp and strcmp can ONLY compare the char
one by one,  the implementation of PR52171 can NOT be extended 
for str(n)cmp.  


5. currently,  glibc optimizes strcmp with constant string up to size 3
in the header as Wilco mentioned in the Description part of the PR. the
optimization is as following:

strcmp (p, “fis”)

will be transformed to:

D.2234 = __s2_len = 3;, __s2_len <= 3; ? TARGET_EXPR  1 && __result == 0)
  {
__result = (int) *(__s1 + 2) - (int) *((const unsigned char *)
"fis" + 2);
if (__s2_len > 2 && __result == 0)
  {
__result = (int) *(__s1 + 3) - (int) *((const unsigned char
*) "fis" + 3);
  }
  }
  }
  }
  D.2233 = __result;
}> : __builtin_strcmp ((const char *) p, (const char *) "fis");

I.e, each character is compared one by one, and the result of the previous
comparison is used to control whether the next char should be compared or not,
as a result,  comparing of one char needs one load, one compare and one branch.
 not very cheap operations. therefore, we cannot apply the above optimization
on longer strings even though its constant string.

The Request of this PR is to move the above transformation from Glibc into GCC.

However, my run-time performance data showed in Comment 5 and 6, shows:

the glibc optimization is SLOWER than the direct call to strcmp on aarch64.

  6.  On the other hand, on a platform that has hardward strcmp insn,  for
example, X86 platform, expand the str(n)cmp to use the hardware strcmp insn
will be very fast.  GCC currently has done this for several platforms.  

  However, for a platform that does NOT have hardware strcmp insns, for
example, aarch64,  inline str(n)cmp might NOT be a good idea, I think.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #6 from Qing Zhao  ---
(A correction to comment 4:

when adding #include , the call to strcmp(s, "a") was inlined by
glibc in the header

with the latest upstream GCC
)


with attached testing case, on GCC farm machine gcc116, I got the following
run-time performance data:

qinzhao@gcc116:~/Bugs/78809/const_cmp$ sh t_p
/home/qinzhao/Install/latest/bin/gcc -O2 t_p_1.c t_p.c -DINLINED
inlined version
41.67user 0.00system 0:41.67elapsed 99%CPU (0avgtext+0avgdata 360maxresident)k
0inputs+0outputs (0major+135minor)pagefaults 0swaps
/home/qinzhao/Install/latest/bin/gcc -O2 t_p_1.c t_p.c
non-inlined version
20.84user 0.00system 0:20.83elapsed 100%CPU (0avgtext+0avgdata 360maxresident)k
0inputs+0outputs (0major+135minor)pagefaults 0swaps

>From the data, we can see the inlined version of strcmp (by glibc) is much
slower than the direct call to strcmp.  (this is for size 2)
I am using GCC farm machine gcc116:

qinzhao@gcc116:~/Bugs/78809/const_cmp$ uname -a
Linux gcc116 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:45:34 UTC 2016
aarch64 aarch64 aarch64 GNU/Linux

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #5 from Qing Zhao  ---
Created attachment 42449
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42449=edit
Run time performance testing case on aarch64

this is for testing the run-time performance of inlined version of strcmp and
non-inlined version of strcmp on aarch64. 
to run the test:
   0. on an aarch64 machine;
   1. untar the tar file
   2. cd perf_78809
   3. sh t_p

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-13 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

Wilco  changed:

   What|Removed |Added

 CC||wdijkstr at arm dot com

--- Comment #4 from Wilco  ---
(In reply to Qing Zhao from comment #3)
> with the latest upstream GCC, we got the following assembly for the testing
> case with -O2 on aarch64:
> 
> t1:
>   adrpx1, .LC0
>   add x1, x1, :lo12:.LC0
>   b   strcmp
>   
> t2:
>   adrpx1, .LC0
>   add x1, x1, :lo12:.LC0
>   b   strcmp

Yes the inlining was recently removed from GLIBC since the goal is to inline in
GCC.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-13 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

Qing Zhao  changed:

   What|Removed |Added

 CC||qing.zhao at oracle dot com

--- Comment #3 from Qing Zhao  ---
with the latest upstream GCC, we got the following assembly for the testing
case with -O2 on aarch64:

t1:
adrpx1, .LC0
add x1, x1, :lo12:.LC0
b   strcmp

t2:
adrpx1, .LC0
add x1, x1, :lo12:.LC0
b   strcmp


As I checked on X86, the same testing case has the following assembly with -O2:
t1:

movq%rdi, %rsi
movl$2, %ecx
movl$.LC0, %edi
repz; cmpsb
seta%al
setb%dl
subl%edx, %eax
movsbl  %al, %eax
ret

t2:
movq%rdi, %rsi
movl$2, %ecx
movl$.LC0, %edi
repz; cmpsb
seta%al
setb%dl
subl%edx, %eax
movsbl  %al, %eax
ret

on X86, both routines have the exact assembly and both are inlined by GCC.

[Bug middle-end/78809] Inline strcmp with small constant strings

2016-12-14 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #2 from wilco at gcc dot gnu.org ---
(In reply to Richard Biener from comment #1)
> We may have dups of this.  And we now have inlining for strcmp/memcmp when
> the
> result is only compared against zero.

I don't see that happening with latest trunk (maybe patches are still
outstanding?), however if so then that code could be generalized to support any
strcmp.

[Bug middle-end/78809] Inline strcmp with small constant strings

2016-12-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #1 from Richard Biener  ---
We may have dups of this.  And we now have inlining for strcmp/memcmp when the
result is only compared against zero.