Re: help with PR78809 - inline strcmp for small constant strings
On 08/04/2017 06:59 AM, Prathamesh Kulkarni wrote: Hi, I was having a look at PR78809. For the test-case: int t1(const char *s) { return __builtin_strcmp (s, "a"); } for aarch64, trunk with -O2 generates: t1: adrpx1, .LANCHOR0 add x1, x1, :lo12:.LANCHOR0 b strcmp For i386, it seems strcmp is expanded inline via cmpstr optab by expand_builtin_strcmp if one of the strings is constant. Could we similarly define cmpstr pattern for AArch64 ? For constant strings of small length (upto 3?), I was wondering if it'd be a good idea to manually unroll strcmp loop, similar to __strcmp_* macros in bits/string.h ? For eg in gimple-fold, transform x = __builtin_strcmp(s, "ab") to x = s[0] - 'a'; if (x == 0) { x = s[1] - 'b'; if (x == 0) x = s[2]; } IMO, in general, it's better do this sort of expansion into "low level" code late, after higher-level optimizations have been done. Otherwise it can defeat such optimizations in passes that aren't prepared to handle the low level code. For instance, in the following, the call to strcmp could readily be eliminated in tree-ssa-strlen because the pass knows both of the strings being compared. But if the strcmp() call were expanded as you show above the optimization couldn't be implemented nearly as easily (not without essentially undoing the work previously done by the folder). void f (char *s) { strcpy (s, "a"); if (strcmp (s, "ab") > 0) abort (); } Not only that, unless all the arguments are known, the early expansion can also defeat checks for their validity. For example, in strcmp (a, b), if a and b point to the same non-null terminated or even uninitialized) array the call is folded into zero without a warning. If/when tree-ssa-strlen (or some other pass) is enhanced to detect such invalid strings this bug could be easily detected but only as long as strcmp call makes it to the pass untransformed. Martin PS The examples above are only hypothetical because tree-ssa- strlen doesn't at present handle strcmp. Bugs 81703 and 81704 are examples where folding does get in the way today.
Re: help with PR78809 - inline strcmp for small constant strings
On Fri, 2017-08-04 at 17:30 -0500, Segher Boessenkool wrote: > On Fri, Aug 04, 2017 at 08:38:11PM +, Wilco Dijkstra wrote: > > Richard Henderson wrote: > > > On 08/04/2017 05:59 AM, Prathamesh Kulkarni wrote: > > > > For i386, it seems strcmp is expanded inline via cmpstr optab > > > > by > > > > expand_builtin_strcmp if one of the strings is constant. Could > > > > we similarly > > > > define cmpstr pattern for AArch64? > > > > > > Certainly that's possible. > > > > I'd suggest to do it as a target independent way, this is not a > > target specific > > optimization and shouldn't be done in the target unless there are > > special > > strcmp instructions. > > See rs6000-string.c; cc:ing Aaron. I think I'd argue that even if there isn't an instruction that does strcmp (i.e. repz cmpsb) you are still better off with target specific code. For example, on power one needs to be aware of how the different processors deal with unaligned loads and that power7 for example doesn't like overlapping unaligned loads which are fine on power8. Also instructions like cmpb are useful and I don't really see how a target independent expansion would end up using that. > > > For constant strings of small length (upto 3?), I was wondering > > > if it'd be a > > > good idea to manually unroll strcmp loop, similar to __strcmp_* > > > macros in > > > bits/string.h?> > > > For eg in gimple-fold, transform > > > x = __builtin_strcmp(s, "ab") > > > to > > > x = s[0] - 'a'; > > > if (x == 0) > > > { > > > x = s[1] - 'b'; > > > if (x == 0) > > > x = s[2]; > > > } There was code to manually unroll strcmp/strncmp in string/bits/string2.h in GLIBC but that was recently removed: https://sourceware.org/git/?p=glibc.git;a=commit;h=f7db120f67d853e0cfa2 72fa39c8b9be67c0a935 Personally I'm glad to see it go not only because of the expansion into individual comparisons of the first couple characters, but also because it expanded strcmp/strncmp to __builtin_strcmp/strncmp which meant that you could not disable the gcc expansions of strcmp/strncmp by using -fno-builtin-strcmp/strncmp. > > > > If there is already code that does something similar (see comment > > #1 in PR78809), > > it could be easily adapted to handle more cases. > > > > > if (memcmp(s, "ab", 3) != 0) > > > > > > to be implemented with cmp+ccmp+ccmp and one branch. > > > > Even better would be wider loads if you either know the alignment > > of s or it's max size > > (although given the overhead of creating the return value that > > works best for equality). > > All those things are handled there, too. Most things that can really > help > are quite target-specific, I think. Same thing goes for memcmp, you really need to know how the target handles aligned/unaligned and for example on power9 it is handy to be able to use setb to produce the correct SImode result even if you just did a DImode compare of 64 bits. > > > Segher > Aaron -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain
Re: help with PR78809 - inline strcmp for small constant strings
On Fri, Aug 04, 2017 at 08:38:11PM +, Wilco Dijkstra wrote: > Richard Henderson wrote: > > On 08/04/2017 05:59 AM, Prathamesh Kulkarni wrote: > > > > For i386, it seems strcmp is expanded inline via cmpstr optab by > > > expand_builtin_strcmp if one of the strings is constant. Could we > > > similarly > > > define cmpstr pattern for AArch64? > > > > Certainly that's possible. > > I'd suggest to do it as a target independent way, this is not a target > specific > optimization and shouldn't be done in the target unless there are special > strcmp instructions. See rs6000-string.c; cc:ing Aaron. > > For constant strings of small length (upto 3?), I was wondering if it'd be a > > good idea to manually unroll strcmp loop, similar to __strcmp_* macros in > > bits/string.h?> > > For eg in gimple-fold, transform > > x = __builtin_strcmp(s, "ab") > > to > > x = s[0] - 'a'; > > if (x == 0) > > { > > x = s[1] - 'b'; > > if (x == 0) > > x = s[2]; > > } > > If there is already code that does something similar (see comment #1 in > PR78809), > it could be easily adapted to handle more cases. > > > if (memcmp(s, "ab", 3) != 0) > > > > to be implemented with cmp+ccmp+ccmp and one branch. > > Even better would be wider loads if you either know the alignment of s or > it's max size > (although given the overhead of creating the return value that works best for > equality). All those things are handled there, too. Most things that can really help are quite target-specific, I think. Segher
Re: help with PR78809 - inline strcmp for small constant strings
On 08/04/2017 01:38 PM, Wilco Dijkstra wrote: >> For constant strings of small length (upto 3?), I was wondering if it'd be a >> good idea to manually unroll strcmp loop, similar to __strcmp_* macros in >> bits/string.h?> >> For eg in gimple-fold, transform >> x = __builtin_strcmp(s, "ab") >> to >> x = s[0] - 'a'; >> if (x == 0) >> { >>x = s[1] - 'b'; >>if (x == 0) >> x = s[2]; >> } > > If there is already code that does something similar (see comment #1 in > PR78809), > it could be easily adapted to handle more cases. There's emit_block_cmp_hints and compare_by_pieces. There is not anything at present that will do the above such that we avoid extra loads. > Even better would be wider loads if you either know the alignment > of s or it's max size (although given the overhead of creating the > return value that works best for equality). It looks like compare_by_pieces does that. r~
Re: help with PR78809 - inline strcmp for small constant strings
Richard Henderson wrote: > On 08/04/2017 05:59 AM, Prathamesh Kulkarni wrote: > > For i386, it seems strcmp is expanded inline via cmpstr optab by > > expand_builtin_strcmp if one of the strings is constant. Could we similarly > > define cmpstr pattern for AArch64? > > Certainly that's possible. I'd suggest to do it as a target independent way, this is not a target specific optimization and shouldn't be done in the target unless there are special strcmp instructions. > For constant strings of small length (upto 3?), I was wondering if it'd be a > good idea to manually unroll strcmp loop, similar to __strcmp_* macros in > bits/string.h?> > For eg in gimple-fold, transform > x = __builtin_strcmp(s, "ab") > to > x = s[0] - 'a'; > if (x == 0) > { > x = s[1] - 'b'; > if (x == 0) > x = s[2]; > } If there is already code that does something similar (see comment #1 in PR78809), it could be easily adapted to handle more cases. > if (memcmp(s, "ab", 3) != 0) > > to be implemented with cmp+ccmp+ccmp and one branch. Even better would be wider loads if you either know the alignment of s or it's max size (although given the overhead of creating the return value that works best for equality). Wilco
Re: help with PR78809 - inline strcmp for small constant strings
On 08/04/2017 05:59 AM, Prathamesh Kulkarni wrote: > Hi, > I was having a look at PR78809. > For the test-case: > int t1(const char *s) { return __builtin_strcmp (s, "a"); } > > for aarch64, trunk with -O2 generates: > t1: > adrpx1, .LANCHOR0 > add x1, x1, :lo12:.LANCHOR0 > b strcmp > > For i386, it seems strcmp is expanded inline via cmpstr optab by > expand_builtin_strcmp if one of the strings is constant. Could we similarly > define cmpstr pattern for AArch64? Certainly that's possible. > For constant strings of small length (upto 3?), I was wondering if it'd be a > good idea to manually unroll strcmp loop, similar to __strcmp_* macros in > bits/string.h?> > For eg in gimple-fold, transform > x = __builtin_strcmp(s, "ab") > to > x = s[0] - 'a'; > if (x == 0) > { > x = s[1] - 'b'; > if (x == 0) > x = s[2]; > } I don't know that it would be helpful to do that as early as gimple-fold. Surely rtl-expand is early enough for this. Indeed, if the gimple passes are able to transform strcmp(s, "ab") to memcmp(s, "ab", 3) when it can be shown that S has at least 3 bytes known to be allocated (e.g. s is an array of known size). This might allow if (memcmp(s, "ab", 3) != 0) to be implemented with cmp+ccmp+ccmp and one branch. r~