Re: help with PR78809 - inline strcmp for small constant strings

2017-08-06 Thread Martin Sebor

On 08/04/2017 06:59 AM, Prathamesh Kulkarni wrote:

Hi,
I was having a look at PR78809.
For the test-case:
int t1(const char *s) { return __builtin_strcmp (s, "a"); }

for aarch64, trunk with -O2 generates:
t1:
adrpx1, .LANCHOR0
add x1, x1, :lo12:.LANCHOR0
b   strcmp

For i386, it seems strcmp is expanded inline via cmpstr optab by
expand_builtin_strcmp
if one of the strings is constant. Could we similarly define cmpstr
pattern for AArch64 ?

For constant strings of small length (upto 3?), I was wondering if
it'd be a good idea to
manually unroll strcmp loop, similar to __strcmp_* macros in bits/string.h  ?
For eg in gimple-fold, transform
x = __builtin_strcmp(s, "ab")
to
x = s[0] - 'a';
if (x == 0)
{
  x = s[1] - 'b';
  if (x == 0)
x = s[2];
}


IMO, in general, it's better do this sort of expansion into "low
level" code late, after higher-level optimizations have been done.
Otherwise it can defeat such optimizations in passes that aren't
prepared to handle the low level code.  For instance, in the
following, the call to strcmp could readily be eliminated in
tree-ssa-strlen because the pass knows both of the strings being
compared.  But if the strcmp() call were expanded as you show above
the optimization couldn't be implemented nearly as easily (not
without essentially undoing the work previously done by the folder).

  void f (char *s)
  {
strcpy (s, "a");

if (strcmp (s, "ab") > 0)
  abort ();
  }

Not only that, unless all the arguments are known, the early
expansion can also defeat checks for their validity.  For example,
in strcmp (a, b), if a and b point to the same non-null terminated
or even uninitialized) array the call is folded into zero without
a warning.  If/when tree-ssa-strlen (or some other pass) is enhanced
to detect such invalid strings this bug could be easily detected but
only as long as strcmp call makes it to the pass untransformed.

Martin

PS The examples above are only hypothetical because tree-ssa-
strlen doesn't at present handle strcmp.  Bugs 81703 and
81704 are examples where folding does get in the way today.


Re: help with PR78809 - inline strcmp for small constant strings

2017-08-05 Thread Aaron Sawdey
On Fri, 2017-08-04 at 17:30 -0500, Segher Boessenkool wrote:
> On Fri, Aug 04, 2017 at 08:38:11PM +, Wilco Dijkstra wrote:
> > Richard Henderson wrote:
> > > On 08/04/2017 05:59 AM, Prathamesh Kulkarni wrote:
> > > > For i386, it seems strcmp is expanded inline via cmpstr optab
> > > > by 
> > > > expand_builtin_strcmp if one of the strings is constant. Could
> > > > we similarly
> > > > define cmpstr pattern for AArch64?
> > > 
> > > Certainly that's possible.
> > 
> > I'd suggest to do it as a target independent way, this is not a
> > target specific
> > optimization and shouldn't be done in the target unless there are
> > special
> > strcmp instructions.
> 
> See rs6000-string.c; cc:ing Aaron.

I think I'd argue that even if there isn't an instruction that does
strcmp (i.e. repz cmpsb) you are still better off with target specific
code. For example, on power one needs to be aware of how the different
processors deal with unaligned loads and that power7 for example
doesn't like overlapping unaligned loads which are fine on power8. Also
instructions like cmpb are useful and I don't really see how a target
independent expansion would end up using that.

> > > For constant strings of small length (upto 3?), I was wondering
> > > if it'd be a
> > > good idea to manually unroll strcmp loop, similar to __strcmp_*
> > > macros in
> > > bits/string.h?>
> > > For eg in gimple-fold, transform
> > > x = __builtin_strcmp(s, "ab")
> > > to
> > > x = s[0] - 'a';
> > > if (x == 0)
> > > {
> > >    x = s[1] - 'b';
> > >    if (x == 0)
> > >  x = s[2];
> > > }

There was code to manually unroll strcmp/strncmp in
string/bits/string2.h in GLIBC but that was recently removed:

https://sourceware.org/git/?p=glibc.git;a=commit;h=f7db120f67d853e0cfa2
72fa39c8b9be67c0a935

Personally I'm glad to see it go not only because of the expansion into
individual comparisons of the first couple characters, but also because
it expanded strcmp/strncmp to __builtin_strcmp/strncmp which meant that
you could not disable the gcc expansions of strcmp/strncmp by using 
-fno-builtin-strcmp/strncmp.

> > 
> > If there is already code that does something similar (see comment
> > #1 in PR78809),
> > it could be easily adapted to handle more cases.
> > 
> > >   if (memcmp(s, "ab", 3) != 0)
> > > 
> > > to be implemented with cmp+ccmp+ccmp and one branch.
> > 
> > Even better would be wider loads if you either know the alignment
> > of s or it's max size
> > (although given the overhead of creating the return value that
> > works best for equality).
> 
> All those things are handled there, too.  Most things that can really
> help
> are quite target-specific, I think.

Same thing goes for memcmp, you really need to know how the target
handles aligned/unaligned and for example on power9 it is handy to be
able to use setb to produce the correct SImode result even if you just
did a DImode compare of 64 bits.

> 
> 
> Segher
> 

Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: help with PR78809 - inline strcmp for small constant strings

2017-08-04 Thread Segher Boessenkool
On Fri, Aug 04, 2017 at 08:38:11PM +, Wilco Dijkstra wrote:
> Richard Henderson wrote:    
> > On 08/04/2017 05:59 AM, Prathamesh Kulkarni wrote:
> 
> > > For i386, it seems strcmp is expanded inline via cmpstr optab by 
> > > expand_builtin_strcmp if one of the strings is constant. Could we 
> > > similarly
> > > define cmpstr pattern for AArch64?
> > 
> > Certainly that's possible.
> 
> I'd suggest to do it as a target independent way, this is not a target 
> specific
> optimization and shouldn't be done in the target unless there are special
> strcmp instructions.

See rs6000-string.c; cc:ing Aaron.

> > For constant strings of small length (upto 3?), I was wondering if it'd be a
> > good idea to manually unroll strcmp loop, similar to __strcmp_* macros in
> > bits/string.h?>
> > For eg in gimple-fold, transform
> > x = __builtin_strcmp(s, "ab")
> > to
> > x = s[0] - 'a';
> > if (x == 0)
> > {
> >   x = s[1] - 'b';
> >   if (x == 0)
> > x = s[2];
> > }
> 
> If there is already code that does something similar (see comment #1 in 
> PR78809),
> it could be easily adapted to handle more cases.
> 
> >  if (memcmp(s, "ab", 3) != 0)
> >
> > to be implemented with cmp+ccmp+ccmp and one branch.
> 
> Even better would be wider loads if you either know the alignment of s or 
> it's max size
> (although given the overhead of creating the return value that works best for 
> equality).

All those things are handled there, too.  Most things that can really help
are quite target-specific, I think.


Segher


Re: help with PR78809 - inline strcmp for small constant strings

2017-08-04 Thread Richard Henderson
On 08/04/2017 01:38 PM, Wilco Dijkstra wrote:
>> For constant strings of small length (upto 3?), I was wondering if it'd be a
>> good idea to manually unroll strcmp loop, similar to __strcmp_* macros in
>> bits/string.h?>
>> For eg in gimple-fold, transform
>> x = __builtin_strcmp(s, "ab")
>> to
>> x = s[0] - 'a';
>> if (x == 0)
>> {
>>x = s[1] - 'b';
>>if (x == 0)
>>  x = s[2];
>> }
> 
> If there is already code that does something similar (see comment #1 in 
> PR78809),
> it could be easily adapted to handle more cases.

There's emit_block_cmp_hints and compare_by_pieces.
There is not anything at present that will do the above
such that we avoid extra loads.

> Even better would be wider loads if you either know the alignment 
> of s or it's max size (although given the overhead of creating the
> return value that works best for equality).

It looks like compare_by_pieces does that.


r~


Re: help with PR78809 - inline strcmp for small constant strings

2017-08-04 Thread Wilco Dijkstra
Richard Henderson wrote:    
> On 08/04/2017 05:59 AM, Prathamesh Kulkarni wrote:

> > For i386, it seems strcmp is expanded inline via cmpstr optab by 
> > expand_builtin_strcmp if one of the strings is constant. Could we similarly
> > define cmpstr pattern for AArch64?
> 
> Certainly that's possible.

I'd suggest to do it as a target independent way, this is not a target specific
optimization and shouldn't be done in the target unless there are special
strcmp instructions.

> For constant strings of small length (upto 3?), I was wondering if it'd be a
> good idea to manually unroll strcmp loop, similar to __strcmp_* macros in
> bits/string.h?>
> For eg in gimple-fold, transform
> x = __builtin_strcmp(s, "ab")
> to
> x = s[0] - 'a';
> if (x == 0)
> {
>   x = s[1] - 'b';
>   if (x == 0)
> x = s[2];
> }

If there is already code that does something similar (see comment #1 in 
PR78809),
it could be easily adapted to handle more cases.

>  if (memcmp(s, "ab", 3) != 0)
>
> to be implemented with cmp+ccmp+ccmp and one branch.

Even better would be wider loads if you either know the alignment of s or it's 
max size
(although given the overhead of creating the return value that works best for 
equality).

Wilco

Re: help with PR78809 - inline strcmp for small constant strings

2017-08-04 Thread Richard Henderson
On 08/04/2017 05:59 AM, Prathamesh Kulkarni wrote:
> Hi,
> I was having a look at PR78809.
> For the test-case:
> int t1(const char *s) { return __builtin_strcmp (s, "a"); }
> 
> for aarch64, trunk with -O2 generates:
> t1:
> adrpx1, .LANCHOR0
> add x1, x1, :lo12:.LANCHOR0
> b   strcmp
> 
> For i386, it seems strcmp is expanded inline via cmpstr optab by 
> expand_builtin_strcmp if one of the strings is constant. Could we similarly
> define cmpstr pattern for AArch64?

Certainly that's possible.

> For constant strings of small length (upto 3?), I was wondering if it'd be a
> good idea to manually unroll strcmp loop, similar to __strcmp_* macros in
> bits/string.h?>
> For eg in gimple-fold, transform
> x = __builtin_strcmp(s, "ab")
> to
> x = s[0] - 'a';
> if (x == 0)
> {
>   x = s[1] - 'b';
>   if (x == 0)
> x = s[2];
> }

I don't know that it would be helpful to do that as early as gimple-fold.
Surely rtl-expand is early enough for this.

Indeed, if the gimple passes are able to transform

  strcmp(s, "ab")
to
  memcmp(s, "ab", 3)

when it can be shown that S has at least 3 bytes known to be allocated (e.g. s
is an array of known size).  This might allow

  if (memcmp(s, "ab", 3) != 0)

to be implemented with cmp+ccmp+ccmp and one branch.


r~