Re: strncasecmp
So, finally looping back to the httpd vs. apr implementations of these functions... apr was testing null against equality and both *str1 null and *str2 null characters - that's obviously wrong since str2 of null. It also performed final unnecessary increments of str1 and str1, which I've corrected on apr-2 trunk. And I'm using a short [] rather than unsigned char [] which solves some sign conversion, presuming most architectures are able to more efficiently grab an aligned short and promote to int, as opposed to an unaligned byte (testing backed this up on intel, and I measure no net benefit when aligning to int instead). With these changes, apr slightly outperforms httpd's implementation in most instances. The one small penalty is looking ahead to a null that turns out to be there, but we should only be scanning one of the two strings for a null, and the difference between XOR ax, ax vs. loading both values into ax, cx and subtracting to obtain a zero value is very nominal, once per string comparison. The penalty is only in the case that str1 is equal to str2 for the full null terminated length and is a net win in all other cases. Just synch'ed to apr 2.0 trunk, and now will mass-rename so that the backported functions all correspond to their trunk counterparts using an ap_cstr_ decorated name corresponding to our namespace, to apr's function family group ("C"/POSIX string function). I think we can use that straight back to httpd 2.4 and then worry about migrating trunk/2.6 consumers to the apr 1.6/2.0 names with some helpful #define's. On Mon, Nov 23, 2015 at 11:10 PM, Mikhail T. <mi+t...@aldan.algebra.com> wrote: > On 23.11.2015 23:14, William A Rowe Jr wrote: > > L1 cache and other direct effects of cpu internal optimization. > > Just what I was thinking. Attached is the same program with one more pair > of functions added (and an easy way to add more "candidates" to the > main-driver). I changed the FOR-loop define to obtain repeatable results: > > # Test 1 -- equal strings: > foreach m ( 0 1 2 ) > foreach? ./strncasecmp $m 1 a A 7 > foreach? end > string.h (nb=1, len=7) > time = 6.975845 : res = 0 > optimized (nb=1, len=7) > time = 1.492197 : res = 0 > 'A' - 'a' (nb=1, len=7) > time = 1.787807 : res = 0 > > # Test 2 -- immediately-different strings > foreach m ( 0 1 2 ) > foreach? ./strncasecmp $m 1 a x 7 > foreach? end > string.h (nb=1, len=7) > time = 2.527727 : res = -23 > optimized (nb=1, len=7) > time = 0.406867 : res = -23 > 'A' - 'a' (nb=1, len=7) > time = 0.440320 : res = -23 > > # Test 3 -- strings different at the very end > foreach m ( 0 1 2 ) > foreach? ./strncasecmp $m 1 a x 0 > foreach? end > string.h (nb=1, len=0) > time = 9.629660 : res = -23 > optimized (nb=1, len=0) > time = 1.387208 : res = -23 > 'A' - 'a' (nb=1, len=0) > time = 1.754683 : res = -23 > > The new pair (method 2) does not use the static table, which is likely to > benefit from CPU-cache unfairly in repetitive benchmarks. It is slower > than the table-using method 1 functions. But the two pairs might be > comparable -- or even faster -- in real life. > > -mi > >
Re: strncasecmp
On Tue, Nov 24, 2015 at 10:07 AM, Mikhail T.wrote: > On 24.11.2015 10:08, William A Rowe Jr wrote: > > As long as this function is promoted for fast ASCII-specific token > recognition and has no other unexpected equalities, it serves a useful > purpose. > > Because of this, I'd suggest renaming it to something, that emphasizes it > being ASCII-only. > Strictly speaking, you are referring to US-ASCII, while the actual function including the EBCDIC build follows LC_CTYPE/LC_COLLATE C or POSIX (equivalent names for the same thing) and is honoring LC_COLLATE of the US-ASCII ordering. apr_strcasecmp_lcposix or apr_strcasecmp_lcc might be a usable name. Other suggestions? I was leaning toward apr_strcasecmp_token since that is a pretty well recognized RFC term and usually refers to the ASCII set. For HTTPbis, token is defined at http://tools.ietf.org/html/rfc7230#section-3.2.6
Re: strncasecmp
On 24.11.2015 10:08, William A Rowe Jr wrote: > As long as this function is promoted for fast ASCII-specific token > recognition and has no other unexpected equalities, it serves a useful > purpose. Because of this, I'd suggest renaming it to something, that emphasizes it being ASCII-only. -mi
Re: strncasecmp
On Tue, Nov 24, 2015 at 4:12 AM, Mikhail T.wrote: > On 23.11.2015 19:43, Yann Ylavic wrote: > >> That's expected (or at least no cared about in this test case). We simply >> want res to not be optimized out, so print it before leaving, without any >> particular relevance for its value (string.h and optimized versions should >> return the same res with the same args, ascii strings only, though). > > Yes, we do want the value printed -- such as to catch the kind of errors I > reported earlier. But my question was, why does the value change depending > on the number of iterations? Hmm, res is not reset (to 0) in each loop, so I first thought it might well end up having a different value when ORed more times, but that makes no sense... Actually the "evil" is in (S1[0])++, (S2[0])++ in the for loop, which kills some compiler optimizations in main() but also the comparison once we leave alpha characters range for those :) Let's say that for this test to be relevant, the first char of S1 and S2 must be the same (including case) ;) Regards, Yann.
Re: strncasecmp
On Tue, Nov 24, 2015 at 6:10 AM, Mikhail T.wrote: > > Attached is the same program with one more pair of > functions added (and an easy way to add more "candidates" to the > main-driver). I changed the FOR-loop define to obtain repeatable results: This test program kills str[n]casecmp()'s inlining with the indirections (function pointers), so it's not really fair wrt string.h's versions. That's not really (often) what we'll find in real world apps. > > The new pair (method 2) does not use the static table, which is likely to > benefit from CPU-cache unfairly in repetitive benchmarks. It is slower than > the table-using method 1 functions. But the two pairs might be comparable -- > or even faster -- in real life. Possibly, this version may also be more "respective" of the cacheline (the 256 bytes table may evict hot things), yet it is slower here... Regards, Yann.
Re: strncasecmp
On Tue, Nov 24, 2015 at 6:40 AM, Jim Jagielskiwrote: > It really depends on the OS and the version of the OS. In > my test cases on OSX and CentOS5 and centOS6, I see > measurable improvements. > Part of the reason for your differences... on this console here I have; $ set | grep -E "^L[AC]" LANG=en_US.UTF-8 Knock that back to a SBCS en_US.ISO-8859-1 and you'll see an improvement, not that the stdc library implementation was inefficient (it wasn't), but that UTF-8 causes strcmp to fall over into MBCS character handling. As long as this function is promoted for fast ASCII-specific token recognition and has no other unexpected equalities, it serves a useful purpose.
Re: strncasecmp
It really depends on the OS and the version of the OS. In my test cases on OSX and CentOS5 and centOS6, I see measurable improvements. > On Nov 23, 2015, at 3:12 PM, Christophe JAILLET > <christophe.jail...@wanadoo.fr> wrote: > > Hi Jim, > > Do you have done some benchmark and have results to share? > > I tried to do some but the benefit of the optimized version is not that > clear, at least on my system: > gcc 5.2.1 > Linux linux 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015 > x86_64 x86_64 x86_64 GNU/Linux > > I've tried to trick gcc in order to avoid some optimization. (gcc does its > best to remove/inline/remove from loop known standard C functions) > I've check asm output to be more confident on the code generated by gcc. > > Based on my own results only, and without any becnhmark, I would -1 a > backport. > For small strings (below identical char), your implementation looks faster, > but above 4 identical chars it looks slower > > I can provide my test program if interested. > > > best regards, > CJ > > > Le 20/11/2015 19:53, Jim Jagielski a écrit : >> Implemented in r1715401 >> >> If people want to nit-pick about naming and wish to >> rename it to something else, be my guest. >> >>> On Nov 20, 2015, at 1:03 PM, Yann Ylavic<ylavic@gmail.com> wrote: >>> >>> +1 >>> >>> On Fri, Nov 20, 2015 at 6:17 PM, Jim Jagielski<j...@jagunet.com> wrote: >>>> We use str[n]casecmp quite a bit. The rub is that some >>>> platforms use a sensible implementation (such as OSX) which >>>> uses a upper-lowercase map and is v. fast, and others >>>> use a brain-dead version which does an actual tolower() of >>>> each char in the string as it tests. We actually try to >>>> handle this in many cases by doing a switch/case test on the >>>> 1st char to fast path the strncasecmp, resulting in ugly code. >>>> >>>> This is crazy. >>>> >>>> I propose a ap_strncasecmp/ap_strcasecmp which we should use. >>>> Ideally, it would be in apr but no need to wait for that >>>> to happen :) >>>> >>>> Unless people have heartburn about this, I will add next week. >
Re: strncasecmp
Hi Jim, Do you have done some benchmark and have results to share? I tried to do some but the benefit of the optimized version is not that clear, at least on my system: gcc 5.2.1 Linux linux 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux I've tried to trick gcc in order to avoid some optimization. (gcc does its best to remove/inline/remove from loop known standard C functions) I've check asm output to be more confident on the code generated by gcc. Based on my own results only, and without any becnhmark, I would -1 a backport. For small strings (below identical char), your implementation looks faster, but above 4 identical chars it looks slower I can provide my test program if interested. best regards, CJ Le 20/11/2015 19:53, Jim Jagielski a écrit : Implemented in r1715401 If people want to nit-pick about naming and wish to rename it to something else, be my guest. On Nov 20, 2015, at 1:03 PM, Yann Ylavic<ylavic@gmail.com> wrote: +1 On Fri, Nov 20, 2015 at 6:17 PM, Jim Jagielski<j...@jagunet.com> wrote: We use str[n]casecmp quite a bit. The rub is that some platforms use a sensible implementation (such as OSX) which uses a upper-lowercase map and is v. fast, and others use a brain-dead version which does an actual tolower() of each char in the string as it tests. We actually try to handle this in many cases by doing a switch/case test on the 1st char to fast path the strncasecmp, resulting in ugly code. This is crazy. I propose a ap_strncasecmp/ap_strcasecmp which we should use. Ideally, it would be in apr but no need to wait for that to happen :) Unless people have heartburn about this, I will add next week.
Re: strncasecmp
Hi Christophe, On Mon, Nov 23, 2015 at 9:12 PM, Christophe JAILLETwrote: > > I tried to do some but the benefit of the optimized version is not that > clear, at least on my system: >gcc 5.2.1 >Linux linux 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015 > x86_64 x86_64 x86_64 GNU/Linux Unfortunately, gcc 5.2.1 (i.e. latest compilers' versions) are not widely used... Did you try a code like ap_proxy_port_of_scheme() with values which are unknown schemes? Or even worse cases, with similarly chained strcasecmp() looking for eg. "httpx" in something like {"httpa", "httpb", "httpc", ..., "httpw"}? Regards, Yann.
Re: strncasecmp
I modified your test program a bit (to measure time from it, see attached), tried with -O{2,3,s}, and except -Os I always have better results with the "optimized" version, eg: $ ./a-O3.out 0 15000 xcxcxcxcxcxcxcxcxcxcwwaa xcxcxcxcxcxcxcxcxcxcwwaa 0 (nb=15000, len=0) time = 8.424984 : res = 0 $ ./a-O3.out 1 15000 xcxcxcxcxcxcxcxcxcxcwwaa xcxcxcxcxcxcxcxcxcxcwwaa 0 Optimized (nb=15000, len=0) time = 8.212137 : res = 0 Possibly gcc (v4.4.5 here) is clever enough to optimize/inline/cheat when given standard (no custom) code, since I had similar results than yours with the original test.c... How does this one work with gcc-5.2? On Mon, Nov 23, 2015 at 10:07 PM, Marion & Christophe JAILLETwrote: > I just made a small application which takes as command line parameters the > number of iteration to run, which version of the algorithm to use, the 2 > strings to compare and the length to compare (or 0 for the non 'n' versions) > > > Compiled using > gcc -O3 test.c > > Tested using > linux:~/Code_Source$ time ./a.out 1 1 > xcxcxcxcxcxcxcxcxcxcwwaa > xcxcxcxcxcxcxcxcxcxcwwaa 0 > Optimized (nb=1, len=0) > res = 0 > > real0m4.193s > user0m4.192s > sys0m0.000s > > > > linux:~/Code_Source$ time ./a.out 0 1 > xcxcxcxcxcxcxcxcxcxcwwaa > xcxcxcxcxcxcxcxcxcxcwwaa 0 > (nb=1, len=0) > res = 0 > > real0m1.708s > user0m1.704s > sys0m0.000s > > > > > See atatchement. > > CJ > > > > Le 23/11/2015 21:33, Yann Ylavic a écrit : >> >> Hi Christophe, >> >> On Mon, Nov 23, 2015 at 9:12 PM, Christophe JAILLET >> wrote: >>> >>> I tried to do some but the benefit of the optimized version is not that >>> clear, at least on my system: >>> gcc 5.2.1 >>> Linux linux 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC >>> 2015 >>> x86_64 x86_64 x86_64 GNU/Linux >> >> Unfortunately, gcc 5.2.1 (i.e. latest compilers' versions) are not >> widely used... >> >> Did you try a code like ap_proxy_port_of_scheme() with values which >> are unknown schemes? >> Or even worse cases, with similarly chained strcasecmp() looking for >> eg. "httpx" in something like {"httpa", "httpb", "httpc", ..., >> "httpw"}? >> >> Regards, >> Yann. >> >
Re: strncasecmp
Please note that the changes in ap_str[n]casecmp(), ie: ++ps1; ++ps2; was a first try/change which (obviously) did nothing. You may ignore it. On Mon, Nov 23, 2015 at 11:43 PM, Yann Ylavicwrote: > with attachment... > > On Mon, Nov 23, 2015 at 11:42 PM, Yann Ylavic wrote: >> I modified your test program a bit (to measure time from it, see >> attached), tried with -O{2,3,s}, and except -Os I always have better >> results with the "optimized" version, eg: >> >> $ ./a-O3.out 0 15000 xcxcxcxcxcxcxcxcxcxcwwaa >> xcxcxcxcxcxcxcxcxcxcwwaa 0 >> (nb=15000, len=0) >> time = 8.424984 : res = 0 >> >> $ ./a-O3.out 1 15000 xcxcxcxcxcxcxcxcxcxcwwaa >> xcxcxcxcxcxcxcxcxcxcwwaa 0 >> Optimized (nb=15000, len=0) >> time = 8.212137 : res = 0 >> >> Possibly gcc (v4.4.5 here) is clever enough to optimize/inline/cheat >> when given standard (no custom) code, since I had similar results than >> yours with the original test.c... >> >> How does this one work with gcc-5.2? >> >> On Mon, Nov 23, 2015 at 10:07 PM, Marion & Christophe JAILLET >> wrote: >>> I just made a small application which takes as command line parameters the >>> number of iteration to run, which version of the algorithm to use, the 2 >>> strings to compare and the length to compare (or 0 for the non 'n' versions) >>> >>> >>> Compiled using >>> gcc -O3 test.c >>> >>> Tested using >>> linux:~/Code_Source$ time ./a.out 1 1 >>> xcxcxcxcxcxcxcxcxcxcwwaa >>> xcxcxcxcxcxcxcxcxcxcwwaa 0 >>> Optimized (nb=1, len=0) >>> res = 0 >>> >>> real0m4.193s >>> user0m4.192s >>> sys0m0.000s >>> >>> >>> >>> linux:~/Code_Source$ time ./a.out 0 1 >>> xcxcxcxcxcxcxcxcxcxcwwaa >>> xcxcxcxcxcxcxcxcxcxcwwaa 0 >>> (nb=1, len=0) >>> res = 0 >>> >>> real0m1.708s >>> user0m1.704s >>> sys0m0.000s >>> >>> >>> >>> >>> See atatchement. >>> >>> CJ >>> >>> >>> >>> Le 23/11/2015 21:33, Yann Ylavic a écrit : Hi Christophe, On Mon, Nov 23, 2015 at 9:12 PM, Christophe JAILLET wrote: > > I tried to do some but the benefit of the optimized version is not that > clear, at least on my system: > gcc 5.2.1 > Linux linux 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC > 2015 > x86_64 x86_64 x86_64 GNU/Linux Unfortunately, gcc 5.2.1 (i.e. latest compilers' versions) are not widely used... Did you try a code like ap_proxy_port_of_scheme() with values which are unknown schemes? Or even worse cases, with similarly chained strcasecmp() looking for eg. "httpx" in something like {"httpa", "httpb", "httpc", ..., "httpw"}? Regards, Yann. >>>
Re: strncasecmp
0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff }; int ap_strcasecmp(const char *s1, const char *s2) { const unsigned char *ps1 = (const unsigned char *) s1; const unsigned char *ps2 = (const unsigned char *) s2; while (ucharmap[*ps1] == ucharmap[*ps2]) { if (*ps1 == '\0') { return (0); } ++ps1; ++ps2; } return (ucharmap[*ps1] - ucharmap[*--ps2]); } int ap_strncasecmp(const char *s1, const char *s2, int n) { const unsigned char *ps1 = (const unsigned char *) s1; const unsigned char *ps2 = (const unsigned char *) s2; while (n--) { if (ucharmap[*ps1] != ucharmap[*ps2]) { return (ucharmap[*ps1] - ucharmap[*ps2]); } if (*ps1 == '\0') { break; } ++ps1; ++ps2; } return (0); } #define PROG argv[0] #define METHOD argv[1] #define NB argv[2] #define S1 argv[3] #define S2 argv[4] #define LEN argv[5] /* The ++ are here to try to prevent some optimization done by gcc */ #define FOR for (i=0; i<nb; i++, (S1[0])++, (S2[0])++) int main(int argc, char *argv[]) { int64_t diff; struct timeval tvs, tve; int i, len, nb; int res = 0; if (argc < 6) { printf("%s <0=std ; 1=optimized> S1 S2 \n", PROG); return 0; } len = atoi(LEN); nb = atoi(NB); if (*METHOD == '0') { printf(" (nb=%d, len=%d)\n", nb, len); gettimeofday(); if (len == 0) { FOR { /* really use the result of the function */ res |= strcasecmp(S1, S2); } } else { FOR { res |= strncasecmp(S1, S2, len); } } gettimeofday(); } else { printf("Optimized (nb=%d, len=%d)\n", nb, len); gettimeofday(); if (len == 0) { FOR { res |= ap_strcasecmp(S1, S2); } } else { FOR { res |= ap_strncasecmp(S1, S2, len); } } gettimeofday(); } diff = (int64_t)tve.tv_sec * 100L + tve.tv_usec; diff -= (int64_t)tvs.tv_sec * 100L + tvs.tv_usec; /* really use the result of the function */ printf("time = %lld.%.6lld : res = %d\n", (long long)diff / 100L, (long long)diff % 100L, res); return 0; }
Re: strncasecmp
On 23.11.2015 23:14, William A Rowe Jr wrote: > L1 cache and other direct effects of cpu internal optimization. Just what I was thinking. Attached is the same program with one more pair of functions added (and an easy way to add more "candidates" to the main-driver). I changed the FOR-loop define to obtain repeatable results: # Test 1 -- equal strings: foreach m ( 0 1 2 ) foreach? ./strncasecmp $m 1 a A 7 foreach? end string.h (nb=1, len=7) time = 6.975845 : res = 0 optimized (nb=1, len=7) time = 1.492197 : res = 0 'A' - 'a' (nb=1, len=7) time = 1.787807 : res = 0 # Test 2 -- immediately-different strings foreach m ( 0 1 2 ) foreach? ./strncasecmp $m 1 a x 7 foreach? end string.h (nb=1, len=7) time = 2.527727 : res = -23 optimized (nb=1, len=7) time = 0.406867 : res = -23 'A' - 'a' (nb=1, len=7) time = 0.440320 : res = -23 # Test 3 -- strings different at the very end foreach m ( 0 1 2 ) foreach? ./strncasecmp $m 1 a x 0 foreach? end string.h (nb=1, len=0) time = 9.629660 : res = -23 optimized (nb=1, len=0) time = 1.387208 : res = -23 'A' - 'a' (nb=1, len=0) time = 1.754683 : res = -23 The new pair (method 2) does not use the static table, which is likely to benefit from CPU-cache unfairly in repetitive benchmarks. It is slower than the table-using method 1 functions. But the two pairs might be comparable -- or even faster -- in real life. -mi #include #include #include #define gettimeofday(X) gettimeofday(X, NULL) #include #include #include #include #include #include /* * Provide our own known-fast implementation of str[n]casecmp() * NOTE: ASCII only! */ static const unsigned char ucharmap[] = { 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, 0x40, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x60, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f, 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf, 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef, 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff }; int ap_strcasecmp(const char *s1, const char *s2) { const unsigned char *ps1 = (const unsigned char *) s1; const unsigned char *ps2 = (const unsigned char *) s2; while (*ps1 == *ps2 || ucharmap[*ps1] == ucharmap[*ps2]) { if (*ps1 == '\0') { return (0); } ++ps1; ++ps2; } return (ucharmap[*ps1] - ucharmap[*ps2]); } int ap_strncasecmp(const char *s1, const char *s2, size_t n) { const unsigned char *ps1 = (const unsigned char *) s1; const unsigned char *ps2 = (const unsigned char *) s2; while (n--) { if (*ps1 == *ps2 || ucharmap[*ps1] != ucharmap[*ps2]) { return (ucharmap[*ps1] - ucharmap[*ps2]); } if (*ps1 == '\0') { break; } ++ps1; ++ps2; } return (0); } int mi_strcasecmp(const char *s1, const char *s2) { const unsigned char *ps1 = (const unsigned char *) s1; const unsigned char *ps2 = (const unsigned char *) s2; for (;;) { int diff = *ps1 - *ps2; switch (diff) { case 0: break; case 'A' - 'a': if (*ps1 <= 'Z' && *ps1 >= 'A') break; return 1; case 'a' - 'A':
Re: strncasecmp
On Nov 23, 2015 21:12, "Mikhail T."wrote: > > On 23.11.2015 19:43, Yann Ylavic wrote: >> >> No measured difference in my tests, I guess it depends on likelyhood to fail/succeed early in the string or not. > > ? I don't see, where it wins anything -- but I do see, where it loses a little... > >> That's expected (or at least no cared about in this test case). We simply want res to not be optimized out, so print it before leaving, without any particular relevance for its value (string.h and optimized versions should return the same res with the same args, ascii strings only, though). > > Yes, we do want the value printed -- such as to catch the kind of errors I reported earlier. But my question was, why does the value change depending on the number of iterations? L1 cache and other direct effects of cpu internal optimization. You would be better off introducing a non-optimizable operation consuming most cpu registers and the L1 cache between iterations if you want to predict the non-repeditive invocation of this function in different contexts throughout the processing phases.
Re: strncasecmp
On 23.11.2015 19:05, Yann Ylavic wrote: > Here is the correct (new) test, along with the diff wrt the original > (Christophe's) test.c. BTW, if the program measures its own time, should it not use getrusage() instead of gettimeofday()? -mi
Re: strncasecmp
On Tue, Nov 24, 2015 at 1:07 AM, Mikhail T.wrote: > > BTW, if the program measures its own time, should it not use getrusage() > instead of gettimeofday()? Well, it measures the time spent in the relevant code, with a monotonic clock, that should be fair enough. We don't care about the whole process time and other counters. Regards, Yann.
Re: strncasecmp
On Mon, Nov 23, 2015 at 11:42 PM, Yann Ylavicwrote: > except -Os I always have better > results with the "optimized" version To reach better performances with -Os, we could possibly use: int ap_strcasecmp(const char *s1, const char *s2) { const unsigned char *ps1 = (const unsigned char *) s1; const unsigned char *ps2 = (const unsigned char *) s2; for (;;) { const unsigned int c1 = ucharmap[*ps1++], c2 = ucharmap[*ps2++]; if (c1 != c2) { return c1 - c2; } if (c1 == '\0') { break; } } return (0); } int ap_strncasecmp(const char *s1, const char *s2, int n) { const unsigned char *ps1 = (const unsigned char *) s1; const unsigned char *ps2 = (const unsigned char *) s2; while (n--) { const unsigned int c1 = ucharmap[*ps1++], c2 = ucharmap[*ps2++]; if (c1 != c2) { return c1 - c2; } if (c1 == '\0') { break; } } return (0); } This implementation with -Os is ~13% slower than str[n]casemp(), whereas the original ap_str[n]casemp() is ~30% slower. Don't know if it happens/makes sense to compile httpd with -Os... At least for APR, I think we could take that into consideration.
Re: strncasecmp
On 23.11.2015 17:43, Yann Ylavic wrote: > with attachment... There is a mistake somewhere in the optimized version: ./o 1 1 aa1a 0 Optimized (nb=1, len=0) time = 0.611311 : res = 0 The result should not be zero. Indeed, the string.h version is correct: ./o 0 1 aa1a 0 (nb=1, len=0) time = 4.189366 : res = 48 Yours, -mi
Re: strncasecmp
On 23.11.2015 19:05, Yann Ylavic wrote: > while (ucharmap[*ps1] == ucharmap[*ps2++]) { > if (*ps1++ == '\0') { > return (0); > } > } > return (ucharmap[*ps1] - ucharmap[*--ps2]); Is there really a gain in inc- and decrementing this way? Would not it be easier to read with the explicit increments -- and, incidentally, no decrements at all? while (ucharmap[*ps1] == ucharmap[*ps2]) { if (*ps1 == '\0') { return (0); } ++ps1; ++ps2; } return (ucharmap[*ps1] - ucharmap[*ps2]); > We don't care about the whole process time and other counters. That's certainly true. But, then, why bother with building time-counter into the test at all -- instead of simply relying on time(1)? But something is still not right -- the result (for either of the methods) can depend on the number of iterations (!!): ./strncasecmp 1 27 aCaa Ac 2 Optimized (nb=27, len=2) time = 0.000001 : res = 32 ./strncasecmp 1 26 aCaa Ac 2 Optimized (nb=26, len=2) time = 0.000001 : res = 0 ./strncasecmp 0 27 aCaa Ac 2 (nb=27, len=2) time = 0.000003 : res = 32 ./strncasecmp 0 26 aCaa Ac 2 (nb=26, len=2) time = 0.03 : res = 0 Using clang on FreeBSD/amd64 here. Yours, -mi
Re: strncasecmp
FWIW, a new version using clock_gettime() instead of gettimeofday(). Same/Comparable results for optimized vs string.h's (the former wins but with -Os). On Tue, Nov 24, 2015 at 1:43 AM, Yann Ylavic <ylavic@gmail.com> wrote: > On Tue, Nov 24, 2015 at 1:24 AM, Mikhail T. <mi+t...@aldan.algebra.com> wrote: >> >> Is there really a gain in inc- and decrementing this way? Would not it be >> easier to read with the explicit increments -- and, incidentally, no >> decrements at all? > > No measured difference in my tests, I guess it depends on likelyhood > to fail/succeed early in the string or not. > >>> >>> We don't care about the whole process time and other counters. >> >> That's certainly true. But, then, why bother with building time-counter into >> the test at all -- instead of simply relying on time(1)? > > That's the purpose of newtest.c (vs Christophe's test.c): try both... > >> >> But something is still not right -- the result (for either of the methods) >> can depend on the number of iterations (!!): > > That's expected (or at least no cared about in this test case). > We simply want res to not be optimized out, so print it before > leaving, without any particular relevance for its value (string.h and > optimized versions should return the same res with the same args, > ascii strings only, though). #include #include #include #include #include #include /* * Provide our own known-fast implementation of str[n]casecmp() * NOTE: ASCII only! */ static const unsigned char ucharmap[] = { 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, 0x40, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x60, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f, 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf, 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef, 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff }; int ap_strcasecmp(const char *s1, const char *s2) { const unsigned char *ps1 = (const unsigned char *) s1; const unsigned char *ps2 = (const unsigned char *) s2; while (ucharmap[*ps1] == ucharmap[*ps2++]) { if (*ps1++ == '\0') { return (0); } } return (ucharmap[*ps1] - ucharmap[*--ps2]); } int ap_strncasecmp(const char *s1, const char *s2, int n) { const unsigned char *ps1 = (const unsigned char *) s1; const unsigned char *ps2 = (const unsigned char *) s2; while (n--) { if (ucharmap[*ps1] != ucharmap[*ps2++]) { return (ucharmap[*ps1] - ucharmap[*--ps2]); } if (*ps1++ == '\0') { break; } } return (0); } #define PROG argv[0] #define METHOD argv[1] #define NB argv[2] #define S1 argv[3] #define S2 argv[4] #define LEN argv[5] /* The ++ are here to try to prevent some optimization done by gcc */ #define FOR for (i=0; i<nb; i++, (S1[0])++, (S2[0])++) int main(int argc, char *argv[]) { int64_t diff; struct timespec tvs, tve; int i, len, nb; int res = 0; if (argc < 6) { printf("%s <0=std ; 1=optimized> S1 S2 \n", PROG); return 0; } len = atoi(LEN); nb = atoi(NB); if (*METHOD == '0') { printf(" (nb=%d, len=%d)\n", nb, len); clock_gettime(CLOCK_MONOTONIC, ); if (len == 0) { FOR { /* really use the result of the function */ res |= strcasecmp(S1, S2); } } else { FOR { res |= strncasecmp(S1, S2, le
Re: strncasecmp
On Tue, Nov 24, 2015 at 12:46 AM, Mikhail T. <mi+t...@aldan.algebra.com> wrote: > On 23.11.2015 17:43, Yann Ylavic wrote: > > with attachment... > > There is a mistake somewhere in the optimized version: My bad, I somehow corrupted the original ap_str[n]casecmp() functions. Here is the correct (new) test, along with the diff wrt the original (Christophe's) test.c. Thanks, Yann. --- test.c 2015-11-24 01:01:29.039339318 +0100 +++ newtest.c 2015-11-24 00:57:13.720817895 +0100 @@ -1,8 +1,10 @@ #include +#include #include #include #include +#include /* * Provide our own known-fast implementation of str[n]casecmp() @@ -83,6 +85,8 @@ int ap_strncasecmp(const char *s1, const int main(int argc, char *argv[]) { +int64_t diff; +struct timeval tvs, tve; int i, len, nb; int res = 0; @@ -96,6 +100,7 @@ int main(int argc, char *argv[]) if (*METHOD == '0') { printf(" (nb=%d, len=%d)\n", nb, len); +gettimeofday(, NULL); if (len == 0) { FOR { /* really use the result of the function */ @@ -107,9 +112,11 @@ int main(int argc, char *argv[]) res |= strncasecmp(S1, S2, len); } } +gettimeofday(, NULL); } else { printf("Optimized (nb=%d, len=%d)\n", nb, len); +gettimeofday(, NULL); if (len == 0) { FOR { res |= ap_strcasecmp(S1, S2); @@ -120,9 +127,16 @@ int main(int argc, char *argv[]) res |= ap_strncasecmp(S1, S2, len); } } +gettimeofday(, NULL); } - + +diff = (int64_t)tve.tv_sec * 100L + tve.tv_usec; +diff -= (int64_t)tvs.tv_sec * 100L + tvs.tv_usec; + /* really use the result of the function */ -printf("res = %d\n", res); +printf("time = %lld.%.6lld : res = %d\n", +(long long)diff / 100L, +(long long)diff % 100L, +res); return 0; } #include #include #include #include #include #include /* * Provide our own known-fast implementation of str[n]casecmp() * NOTE: ASCII only! */ static const unsigned char ucharmap[] = { 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, 0x40, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x60, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f, 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf, 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef, 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff }; int ap_strcasecmp(const char *s1, const char *s2) { const unsigned char *ps1 = (const unsigned char *) s1; const unsigned char *ps2 = (const unsigned char *) s2; while (ucharmap[*ps1] == ucharmap[*ps2++]) { if (*ps1++ == '\0') { return (0); } } return (ucharmap[*ps1] - ucharmap[*--ps2]); } int ap_strncasecmp(const char *s1, const char *s2, int n) { const unsigned char *ps1 = (const unsigned char *) s1; const unsigned char *ps2 = (const unsigned char *) s2; while (n--) { if (ucharmap[*ps1] != ucharmap[*ps2++]) { return (ucharmap[*ps1] - ucharmap[*--ps2]); } if (*ps1++ == '\0') { break; } } return (0); } #define PROG argv[0] #define METHOD argv[1] #define NB argv[2] #define S1 argv[3] #define S2 argv[4] #define LEN argv[5] /* The ++ are here to try to prevent some optimization done by gcc */ #define FOR for (i=0; i<nb; i++, (S1[0])++, (S2[0])++) int main(int argc, char *argv[]) { int64_t diff; struct timeval tvs,
Re: strncasecmp
On Tue, Nov 24, 2015 at 1:24 AM, Mikhail T.wrote: > > Is there really a gain in inc- and decrementing this way? Would not it be > easier to read with the explicit increments -- and, incidentally, no > decrements at all? No measured difference in my tests, I guess it depends on likelyhood to fail/succeed early in the string or not. >> >> We don't care about the whole process time and other counters. > > That's certainly true. But, then, why bother with building time-counter into > the test at all -- instead of simply relying on time(1)? That's the purpose of newtest.c (vs Christophe's test.c): try both... > > But something is still not right -- the result (for either of the methods) > can depend on the number of iterations (!!): That's expected (or at least no cared about in this test case). We simply want res to not be optimized out, so print it before leaving, without any particular relevance for its value (string.h and optimized versions should return the same res with the same args, ascii strings only, though).
Re: strncasecmp
On 23.11.2015 19:43, Yann Ylavic wrote: > No measured difference in my tests, I guess it depends on likelyhood to > fail/succeed early in the string or not. ? I don't see, where it wins anything -- but I do see, where it loses a little... > That's expected (or at least no cared about in this test case). We > simply want res to not be optimized out, so print it before leaving, > without any particular relevance for its value (string.h and optimized > versions should return the same res with the same args, ascii strings > only, though). Yes, we do want the value printed -- such as to catch the kind of errors I reported earlier. But my question was, why does the value change depending on the number of iterations? -mi
Re: strncasecmp
I just made a small application which takes as command line parameters the number of iteration to run, which version of the algorithm to use, the 2 strings to compare and the length to compare (or 0 for the non 'n' versions) Compiled using gcc -O3 test.c Tested using linux:~/Code_Source$ time ./a.out 1 1 xcxcxcxcxcxcxcxcxcxcwwaa xcxcxcxcxcxcxcxcxcxcwwaa 0 Optimized (nb=1, len=0) res = 0 real0m4.193s user0m4.192s sys0m0.000s linux:~/Code_Source$ time ./a.out 0 1 xcxcxcxcxcxcxcxcxcxcwwaa xcxcxcxcxcxcxcxcxcxcwwaa 0 (nb=1, len=0) res = 0 real0m1.708s user0m1.704s sys0m0.000s See atatchement. CJ Le 23/11/2015 21:33, Yann Ylavic a écrit : Hi Christophe, On Mon, Nov 23, 2015 at 9:12 PM, Christophe JAILLET <christophe.jail...@wanadoo.fr> wrote: I tried to do some but the benefit of the optimized version is not that clear, at least on my system: gcc 5.2.1 Linux linux 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux Unfortunately, gcc 5.2.1 (i.e. latest compilers' versions) are not widely used... Did you try a code like ap_proxy_port_of_scheme() with values which are unknown schemes? Or even worse cases, with similarly chained strcasecmp() looking for eg. "httpx" in something like {"httpa", "httpb", "httpc", ..., "httpw"}? Regards, Yann. #include #include #include #include /* * Provide our own known-fast implementation of str[n]casecmp() * NOTE: ASCII only! */ static const unsigned char ucharmap[] = { 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, 0x40, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x60, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f, 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf, 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef, 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff }; int ap_strcasecmp(const char *s1, const char *s2) { const unsigned char *ps1 = (const unsigned char *) s1; const unsigned char *ps2 = (const unsigned char *) s2; while (ucharmap[*ps1] == ucharmap[*ps2++]) { if (*ps1++ == '\0') { return (0); } } return (ucharmap[*ps1] - ucharmap[*--ps2]); } int ap_strncasecmp(const char *s1, const char *s2, int n) { const unsigned char *ps1 = (const unsigned char *) s1; const unsigned char *ps2 = (const unsigned char *) s2; while (n--) { if (ucharmap[*ps1] != ucharmap[*ps2++]) { return (ucharmap[*ps1] - ucharmap[*--ps2]); } if (*ps1++ == '\0') { break; } } return (0); } #define PROG argv[0] #define METHOD argv[1] #define NB argv[2] #define S1 argv[3] #define S2 argv[4] #define LEN argv[5] /* The ++ are here to try to prevent some optimization done by gcc */ #define FOR for (i=0; i<nb; i++, (S1[0])++, (S2[0])++) int main(int argc, char *argv[]) { int i, len, nb; int res = 0; if (argc < 6) { printf("%s <0=std ; 1=optimized> S1 S2 \n", PROG); return 0; } len = atoi(LEN); nb = atoi(NB); if (*METHOD == '0') { printf(" (nb=%d, len=%d)\n", nb, len); if (len == 0) { FOR { /* really use the result of the function */ res |= strcasecmp(S1, S2); } } else { FOR { res |= strncasecmp(S1, S2, len); } } }
Re: strncasecmp
I'm +1 if you are suggesting an ascii-only implementation, and by all means introduce both an ap_ and apr_ flavor at the same time. E.g. apr[r]_ascii_str[n]casecmp(). All characters <'A' || >'Z' && <'a || >'z' should be compared by identity. I'm not sure which OS/X implementation you are referring to, though... http://www.opensource.apple.com/source/Libc/Libc-1044.40.1/string/FreeBSD/strcasecmp.c That implementation reminds us that we shouldn't be trusting these local code page implementations for comparing defined ascii tokens, and many of our comparisons in httpd are for such purposes. But since a non-locale based ascii implementation comparing tolower(x)-tolower(y) devolves to chrtable[x]-chrtable[y], where tolower is inline, and if we intend to preserve the defined <0 == >0 behavior of the K definition, you do need to fold. Yes the switch/case is ugly, but far more efficient for a larger set of values/tokens. One optimization would be iterating the pre-sorted list of tokens in a binary fashion with a near skip pointer or returning the elt of that item, but that would quickly become as crufty, subject to bugs and unmaintainable as the usual switch/case solution. On Fri, Nov 20, 2015 at 11:17 AM, Jim Jagielski <j...@jagunet.com> wrote: > We use str[n]casecmp quite a bit. The rub is that some > platforms use a sensible implementation (such as OSX) which > uses a upper-lowercase map and is v. fast, and others > use a brain-dead version which does an actual tolower() of > each char in the string as it tests. We actually try to > handle this in many cases by doing a switch/case test on the > 1st char to fast path the strncasecmp, resulting in ugly code. > > This is crazy. > > I propose a ap_strncasecmp/ap_strcasecmp which we should use. > Ideally, it would be in apr but no need to wait for that > to happen :) > > Unless people have heartburn about this, I will add next week. >
Re: strncasecmp
Pay special attention to; The *strncasecmp*() function shall compare, *while ignoring differences in case*, not more than *n* bytes from the string pointed to by *s1* to the string pointed to by *s2*. In the POSIX locale, *strcasecmp*() and *strncasecmp*() shall *behave as if the strings had been converted to lowercase* and then a byte comparison performed. The results are unspecified in other locales. http://pubs.opengroup.org/onlinepubs/009695399/functions/strcasecmp.html E.g. 0x5B-0x60 sort before alpha. If you don't want to honor posix semantics, don't abuse the "strcmp" concept, but rather name this "streq" or something otherwise less confusing :-) On Fri, Nov 20, 2015 at 12:22 PM, William A Rowe Jr <wr...@rowe-clan.net> wrote: > I'm +1 if you are suggesting an ascii-only implementation, and by all means > introduce both an ap_ and apr_ flavor at the same time. > > E.g. apr[r]_ascii_str[n]casecmp(). > > All characters <'A' || >'Z' && <'a || >'z' should be compared by identity. > > I'm not sure which OS/X implementation you are referring to, though... > > http://www.opensource.apple.com/source/Libc/Libc-1044.40.1/string/FreeBSD/strcasecmp.c > > That implementation reminds us that we shouldn't be trusting these local > code page implementations for comparing defined ascii tokens, and many > of our comparisons in httpd are for such purposes. > > But since a non-locale based ascii implementation comparing > tolower(x)-tolower(y) devolves to chrtable[x]-chrtable[y], where > tolower is inline, and if we intend to preserve the defined <0 == >0 > behavior of the K definition, you do need to fold. > > Yes the switch/case is ugly, but far more efficient for a larger set of > values/tokens. One optimization would be iterating the pre-sorted > list of tokens in a binary fashion with a near skip pointer or returning > the elt of that item, but that would quickly become as crufty, subject > to bugs and unmaintainable as the usual switch/case solution. > > > > On Fri, Nov 20, 2015 at 11:17 AM, Jim Jagielski <j...@jagunet.com> wrote: > >> We use str[n]casecmp quite a bit. The rub is that some >> platforms use a sensible implementation (such as OSX) which >> uses a upper-lowercase map and is v. fast, and others >> use a brain-dead version which does an actual tolower() of >> each char in the string as it tests. We actually try to >> handle this in many cases by doing a switch/case test on the >> 1st char to fast path the strncasecmp, resulting in ugly code. >> >> This is crazy. >> >> I propose a ap_strncasecmp/ap_strcasecmp which we should use. >> Ideally, it would be in apr but no need to wait for that >> to happen :) >> >> Unless people have heartburn about this, I will add next week. >> > >
strncasecmp
We use str[n]casecmp quite a bit. The rub is that some platforms use a sensible implementation (such as OSX) which uses a upper-lowercase map and is v. fast, and others use a brain-dead version which does an actual tolower() of each char in the string as it tests. We actually try to handle this in many cases by doing a switch/case test on the 1st char to fast path the strncasecmp, resulting in ugly code. This is crazy. I propose a ap_strncasecmp/ap_strcasecmp which we should use. Ideally, it would be in apr but no need to wait for that to happen :) Unless people have heartburn about this, I will add next week.
Re: strncasecmp
+1 Le 20/11/2015 18:17, Jim Jagielski a écrit : We use str[n]casecmp quite a bit. The rub is that some platforms use a sensible implementation (such as OSX) which uses a upper-lowercase map and is v. fast, and others use a brain-dead version which does an actual tolower() of each char in the string as it tests. We actually try to handle this in many cases by doing a switch/case test on the 1st char to fast path the strncasecmp, resulting in ugly code. This is crazy. I propose a ap_strncasecmp/ap_strcasecmp which we should use. Ideally, it would be in apr but no need to wait for that to happen :) Unless people have heartburn about this, I will add next week.
Re: strncasecmp
+1 On Fri, Nov 20, 2015 at 6:17 PM, Jim Jagielski <j...@jagunet.com> wrote: > We use str[n]casecmp quite a bit. The rub is that some > platforms use a sensible implementation (such as OSX) which > uses a upper-lowercase map and is v. fast, and others > use a brain-dead version which does an actual tolower() of > each char in the string as it tests. We actually try to > handle this in many cases by doing a switch/case test on the > 1st char to fast path the strncasecmp, resulting in ugly code. > > This is crazy. > > I propose a ap_strncasecmp/ap_strcasecmp which we should use. > Ideally, it would be in apr but no need to wait for that > to happen :) > > Unless people have heartburn about this, I will add next week.
Re: strncasecmp
Implemented in r1715401 If people want to nit-pick about naming and wish to rename it to something else, be my guest. > On Nov 20, 2015, at 1:03 PM, Yann Ylavic <ylavic@gmail.com> wrote: > > +1 > > On Fri, Nov 20, 2015 at 6:17 PM, Jim Jagielski <j...@jagunet.com> wrote: >> We use str[n]casecmp quite a bit. The rub is that some >> platforms use a sensible implementation (such as OSX) which >> uses a upper-lowercase map and is v. fast, and others >> use a brain-dead version which does an actual tolower() of >> each char in the string as it tests. We actually try to >> handle this in many cases by doing a switch/case test on the >> 1st char to fast path the strncasecmp, resulting in ugly code. >> >> This is crazy. >> >> I propose a ap_strncasecmp/ap_strcasecmp which we should use. >> Ideally, it would be in apr but no need to wait for that >> to happen :) >> >> Unless people have heartburn about this, I will add next week.
Re: strncasecmp
At first glance this seems to meet the POSIX definition in the POSIX context. Just want to be careful about using this for non-posix applications and your doxygen seems to cover this. Looks good to me. Implemented in r1715401 If people want to nit-pick about naming and wish to rename it to something else, be my guest. > On Nov 20, 2015, at 1:03 PM, Yann Ylavic <ylavic@gmail.com> wrote: > > +1 > > On Fri, Nov 20, 2015 at 6:17 PM, Jim Jagielski <j...@jagunet.com> wrote: >> We use str[n]casecmp quite a bit. The rub is that some >> platforms use a sensible implementation (such as OSX) which >> uses a upper-lowercase map and is v. fast, and others >> use a brain-dead version which does an actual tolower() of >> each char in the string as it tests. We actually try to >> handle this in many cases by doing a switch/case test on the >> 1st char to fast path the strncasecmp, resulting in ugly code. >> >> This is crazy. >> >> I propose a ap_strncasecmp/ap_strcasecmp which we should use. >> Ideally, it would be in apr but no need to wait for that >> to happen :) >> >> Unless people have heartburn about this, I will add next week.
Re: strncasecmp
Le 20/11/2015 18:17, Jim Jagielski a écrit : Ideally, it would be in apr +1 This could also be even more interesting, because of apr_table_ functions. CJ