Re: strncasecmp

2016-06-08 Thread William A Rowe Jr
So, finally looping back to the httpd vs. apr implementations of these
functions...

apr was testing null against equality and both *str1 null and *str2 null
characters - that's obviously wrong since str2 of null. It also performed
final unnecessary increments of str1 and str1, which I've corrected on
apr-2 trunk.  And I'm using a short [] rather than unsigned char []
which solves some sign conversion, presuming most architectures
are able to more efficiently grab an aligned short and promote to int,
as opposed to an unaligned byte (testing backed this up on intel,
and I measure no net benefit when aligning to int instead).

With these changes, apr slightly outperforms httpd's implementation
in most instances. The one small penalty is looking ahead to a null
that turns out to be there, but we should only be scanning one of the
two strings for a null, and the difference between XOR ax, ax vs.
loading both values into ax, cx and subtracting to obtain a zero value
is very nominal, once per string comparison.  The penalty is only
in the case that str1 is equal to str2 for the full null terminated length
and is a net win in all other cases.

Just synch'ed to apr 2.0 trunk, and now will mass-rename so that
the backported functions all correspond to their trunk counterparts
using an ap_cstr_ decorated name corresponding to our namespace,
to apr's function family group ("C"/POSIX string function).

I think we can use that straight back to httpd 2.4 and then worry
about migrating trunk/2.6 consumers to the apr 1.6/2.0 names with
some helpful #define's.

On Mon, Nov 23, 2015 at 11:10 PM, Mikhail T. <mi+t...@aldan.algebra.com>
wrote:

> On 23.11.2015 23:14, William A Rowe Jr wrote:
>
> L1 cache and other direct effects of cpu internal optimization.
>
> Just what I was thinking. Attached is the same program with one more pair
> of functions added (and an easy way to add more "candidates" to the
> main-driver). I changed the FOR-loop define to obtain repeatable results:
>
> # Test 1 -- equal strings:
> foreach m ( 0 1 2 )
> foreach? ./strncasecmp $m 1 a A 7
> foreach? end
> string.h (nb=1, len=7)
> time = 6.975845 : res = 0
> optimized (nb=1, len=7)
> time = 1.492197 : res = 0
> 'A' - 'a' (nb=1, len=7)
> time = 1.787807 : res = 0
>
> # Test 2 -- immediately-different strings
> foreach m ( 0 1 2 )
> foreach? ./strncasecmp $m 1 a x 7
> foreach? end
> string.h (nb=1, len=7)
> time = 2.527727 : res = -23
> optimized (nb=1, len=7)
> time = 0.406867 : res = -23
> 'A' - 'a' (nb=1, len=7)
> time = 0.440320 : res = -23
>
> # Test 3 -- strings different at the very end
> foreach m ( 0 1 2 )
> foreach? ./strncasecmp $m 1 a x 0
> foreach? end
> string.h (nb=1, len=0)
> time = 9.629660 : res = -23
> optimized (nb=1, len=0)
> time = 1.387208 : res = -23
> 'A' - 'a' (nb=1, len=0)
> time = 1.754683 : res = -23
>
> The new pair (method 2) does not use the static table, which is likely to
> benefit from CPU-cache unfairly in repetitive benchmarks.  It is slower
> than the table-using method 1 functions. But the two pairs might be
> comparable -- or even faster -- in real life.
>
> -mi
>
>


Re: strncasecmp

2015-11-24 Thread William A Rowe Jr
On Tue, Nov 24, 2015 at 10:07 AM, Mikhail T. 
wrote:

> On 24.11.2015 10:08, William A Rowe Jr wrote:
>
> As long as this function is promoted for fast ASCII-specific token
> recognition and has no other unexpected equalities, it serves a useful
> purpose.
>
> Because of this, I'd suggest renaming it to something, that emphasizes it
> being ASCII-only.
>

Strictly speaking, you are referring to US-ASCII, while the actual function
including
the EBCDIC build follows LC_CTYPE/LC_COLLATE C or POSIX (equivalent names
for the same thing) and is honoring LC_COLLATE of the US-ASCII ordering.

apr_strcasecmp_lcposix or apr_strcasecmp_lcc might be a usable name.  Other
suggestions?  I was leaning toward apr_strcasecmp_token since that is a
pretty
well recognized RFC term and usually refers to the ASCII set.  For HTTPbis,
token
is defined at http://tools.ietf.org/html/rfc7230#section-3.2.6


Re: strncasecmp

2015-11-24 Thread Mikhail T.
On 24.11.2015 10:08, William A Rowe Jr wrote:
> As long as this function is promoted for fast ASCII-specific token
> recognition and has no other unexpected equalities, it serves a useful
> purpose.
Because of this, I'd suggest renaming it to something, that emphasizes
it being ASCII-only.

-mi



Re: strncasecmp

2015-11-24 Thread Yann Ylavic
On Tue, Nov 24, 2015 at 4:12 AM, Mikhail T.  wrote:
> On 23.11.2015 19:43, Yann Ylavic wrote:
>
>> That's expected (or at least no cared about in this test case). We simply
>> want res to not be optimized out, so print it before leaving, without any
>> particular relevance for its value (string.h and optimized versions should
>> return the same res with the same args, ascii strings only, though).
>
> Yes, we do want the value printed -- such as to catch the kind of errors I
> reported earlier. But my question was, why does the value change depending
> on the number of iterations?

Hmm, res is not reset (to 0) in each loop, so I first thought it might
well end up having a different value when ORed more times, but that
makes no sense...

Actually the "evil" is in (S1[0])++, (S2[0])++ in the for loop, which
kills some compiler optimizations in main() but also the comparison
once we leave alpha characters range for those :)
Let's say that for this test to be relevant, the first char of S1 and
S2 must be the same (including case) ;)

Regards,
Yann.


Re: strncasecmp

2015-11-24 Thread Yann Ylavic
On Tue, Nov 24, 2015 at 6:10 AM, Mikhail T.  wrote:
>
> Attached is the same program with one more pair of
> functions added (and an easy way to add more "candidates" to the
> main-driver). I changed the FOR-loop define to obtain repeatable results:

This test program kills str[n]casecmp()'s inlining with the
indirections (function pointers), so it's not really fair wrt
string.h's versions.
That's not really (often) what we'll find in real world apps.

>
> The new pair (method 2) does not use the static table, which is likely to
> benefit from CPU-cache unfairly in repetitive benchmarks.  It is slower than
> the table-using method 1 functions. But the two pairs might be comparable --
> or even faster -- in real life.

Possibly, this version may also be more "respective" of the cacheline
(the 256 bytes table may evict hot things), yet it is slower here...

Regards,
Yann.


Re: strncasecmp

2015-11-24 Thread William A Rowe Jr
On Tue, Nov 24, 2015 at 6:40 AM, Jim Jagielski  wrote:

> It really depends on the OS and the version of the OS. In
> my test cases on OSX and CentOS5 and centOS6, I see
> measurable improvements.
>

Part of the reason for your differences... on this console here I have;
$ set | grep -E "^L[AC]"
LANG=en_US.UTF-8

Knock that back to a SBCS en_US.ISO-8859-1 and you'll see an
improvement, not that the stdc library implementation was inefficient
(it wasn't), but that UTF-8 causes strcmp to fall over into MBCS character
handling.

As long as this function is promoted for fast ASCII-specific token
recognition and has no other unexpected equalities, it serves a useful
purpose.


Re: strncasecmp

2015-11-24 Thread Jim Jagielski
It really depends on the OS and the version of the OS. In
my test cases on OSX and CentOS5 and centOS6, I see
measurable improvements.
> On Nov 23, 2015, at 3:12 PM, Christophe JAILLET 
> <christophe.jail...@wanadoo.fr> wrote:
> 
> Hi Jim,
> 
> Do you have done some benchmark and have results to share?
> 
> I tried to do some but the benefit of the optimized version is not that 
> clear, at least on my system:
>   gcc 5.2.1
>   Linux linux 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015 
> x86_64 x86_64 x86_64 GNU/Linux
> 
> I've tried to trick gcc in order to avoid some optimization. (gcc does its 
> best to remove/inline/remove from loop known standard C functions)
> I've check asm output to be more confident on the code generated by gcc.
> 
> Based on my own results only, and without any becnhmark, I would -1 a 
> backport.
> For small strings (below identical char), your implementation looks faster, 
> but above 4 identical chars it looks slower
> 
> I can provide my test program if interested.
> 
> 
> best regards,
> CJ
> 
> 
> Le 20/11/2015 19:53, Jim Jagielski a écrit :
>> Implemented in r1715401
>> 
>> If people want to nit-pick about naming and wish to
>> rename it to something else, be my guest.
>> 
>>> On Nov 20, 2015, at 1:03 PM, Yann Ylavic<ylavic@gmail.com>  wrote:
>>> 
>>> +1
>>> 
>>> On Fri, Nov 20, 2015 at 6:17 PM, Jim Jagielski<j...@jagunet.com>  wrote:
>>>> We use str[n]casecmp quite a bit. The rub is that some
>>>> platforms use a sensible implementation (such as OSX) which
>>>> uses a upper-lowercase map and is v. fast, and others
>>>> use a brain-dead version which does an actual tolower() of
>>>> each char in the string as it tests. We actually try to
>>>> handle this in many cases by doing a switch/case test on the
>>>> 1st char to fast path the strncasecmp, resulting in ugly code.
>>>> 
>>>> This is crazy.
>>>> 
>>>> I propose a ap_strncasecmp/ap_strcasecmp which we should use.
>>>> Ideally, it would be in apr but no need to wait for that
>>>> to happen :)
>>>> 
>>>> Unless people have heartburn about this, I will add next week.
> 



Re: strncasecmp

2015-11-23 Thread Christophe JAILLET

Hi Jim,

Do you have done some benchmark and have results to share?

I tried to do some but the benefit of the optimized version is not that 
clear, at least on my system:

   gcc 5.2.1
   Linux linux 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 
2015 x86_64 x86_64 x86_64 GNU/Linux


I've tried to trick gcc in order to avoid some optimization. (gcc does 
its best to remove/inline/remove from loop known standard C functions)

I've check asm output to be more confident on the code generated by gcc.

Based on my own results only, and without any becnhmark, I would -1 a 
backport.
For small strings (below identical char), your implementation looks 
faster, but above 4 identical chars it looks slower


I can provide my test program if interested.


best regards,
CJ


Le 20/11/2015 19:53, Jim Jagielski a écrit :

Implemented in r1715401

If people want to nit-pick about naming and wish to
rename it to something else, be my guest.


On Nov 20, 2015, at 1:03 PM, Yann Ylavic<ylavic@gmail.com>  wrote:

+1

On Fri, Nov 20, 2015 at 6:17 PM, Jim Jagielski<j...@jagunet.com>  wrote:

We use str[n]casecmp quite a bit. The rub is that some
platforms use a sensible implementation (such as OSX) which
uses a upper-lowercase map and is v. fast, and others
use a brain-dead version which does an actual tolower() of
each char in the string as it tests. We actually try to
handle this in many cases by doing a switch/case test on the
1st char to fast path the strncasecmp, resulting in ugly code.

This is crazy.

I propose a ap_strncasecmp/ap_strcasecmp which we should use.
Ideally, it would be in apr but no need to wait for that
to happen :)

Unless people have heartburn about this, I will add next week.




Re: strncasecmp

2015-11-23 Thread Yann Ylavic
Hi Christophe,

On Mon, Nov 23, 2015 at 9:12 PM, Christophe JAILLET
 wrote:
>
> I tried to do some but the benefit of the optimized version is not that
> clear, at least on my system:
>gcc 5.2.1
>Linux linux 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015
> x86_64 x86_64 x86_64 GNU/Linux

Unfortunately, gcc 5.2.1 (i.e. latest compilers' versions) are not
widely used...

Did you try a code like ap_proxy_port_of_scheme() with values which
are unknown schemes?
Or even worse cases, with similarly chained strcasecmp() looking for
eg. "httpx" in something like {"httpa", "httpb", "httpc", ...,
"httpw"}?

Regards,
Yann.


Re: strncasecmp

2015-11-23 Thread Yann Ylavic
I modified your test program a bit (to measure time from it, see
attached), tried with -O{2,3,s}, and except -Os I always have better
results with the "optimized" version, eg:

$ ./a-O3.out 0 15000 xcxcxcxcxcxcxcxcxcxcwwaa
xcxcxcxcxcxcxcxcxcxcwwaa 0
 (nb=15000, len=0)
time = 8.424984 : res = 0

$ ./a-O3.out 1 15000 xcxcxcxcxcxcxcxcxcxcwwaa
xcxcxcxcxcxcxcxcxcxcwwaa 0
Optimized (nb=15000, len=0)
time = 8.212137 : res = 0

Possibly gcc (v4.4.5 here) is clever enough to optimize/inline/cheat
when given standard (no custom) code, since I had similar results than
yours with the original test.c...

How does this one work with gcc-5.2?

On Mon, Nov 23, 2015 at 10:07 PM, Marion & Christophe JAILLET
 wrote:
> I just made a small application which takes as command line parameters the
> number of iteration to run, which version of the algorithm to use, the 2
> strings to compare and the length to compare (or 0 for the non 'n' versions)
>
>
> Compiled using
> gcc -O3 test.c
>
> Tested using
> linux:~/Code_Source$ time ./a.out 1 1
> xcxcxcxcxcxcxcxcxcxcwwaa
> xcxcxcxcxcxcxcxcxcxcwwaa 0
> Optimized (nb=1, len=0)
> res = 0
>
> real0m4.193s
> user0m4.192s
> sys0m0.000s
>
>
>
> linux:~/Code_Source$ time ./a.out 0 1
> xcxcxcxcxcxcxcxcxcxcwwaa
> xcxcxcxcxcxcxcxcxcxcwwaa 0
>  (nb=1, len=0)
> res = 0
>
> real0m1.708s
> user0m1.704s
> sys0m0.000s
>
>
>
>
> See atatchement.
>
> CJ
>
>
>
> Le 23/11/2015 21:33, Yann Ylavic a écrit :
>>
>> Hi Christophe,
>>
>> On Mon, Nov 23, 2015 at 9:12 PM, Christophe JAILLET
>>  wrote:
>>>
>>> I tried to do some but the benefit of the optimized version is not that
>>> clear, at least on my system:
>>> gcc 5.2.1
>>> Linux linux 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC
>>> 2015
>>> x86_64 x86_64 x86_64 GNU/Linux
>>
>> Unfortunately, gcc 5.2.1 (i.e. latest compilers' versions) are not
>> widely used...
>>
>> Did you try a code like ap_proxy_port_of_scheme() with values which
>> are unknown schemes?
>> Or even worse cases, with similarly chained strcasecmp() looking for
>> eg. "httpx" in something like {"httpa", "httpb", "httpc", ...,
>> "httpw"}?
>>
>> Regards,
>> Yann.
>>
>


Re: strncasecmp

2015-11-23 Thread Yann Ylavic
Please note that the changes in ap_str[n]casecmp(), ie:
++ps1;
++ps2;
was a first try/change which (obviously) did nothing.
You may ignore it.


On Mon, Nov 23, 2015 at 11:43 PM, Yann Ylavic  wrote:
> with attachment...
>
> On Mon, Nov 23, 2015 at 11:42 PM, Yann Ylavic  wrote:
>> I modified your test program a bit (to measure time from it, see
>> attached), tried with -O{2,3,s}, and except -Os I always have better
>> results with the "optimized" version, eg:
>>
>> $ ./a-O3.out 0 15000 xcxcxcxcxcxcxcxcxcxcwwaa
>> xcxcxcxcxcxcxcxcxcxcwwaa 0
>>  (nb=15000, len=0)
>> time = 8.424984 : res = 0
>>
>> $ ./a-O3.out 1 15000 xcxcxcxcxcxcxcxcxcxcwwaa
>> xcxcxcxcxcxcxcxcxcxcwwaa 0
>> Optimized (nb=15000, len=0)
>> time = 8.212137 : res = 0
>>
>> Possibly gcc (v4.4.5 here) is clever enough to optimize/inline/cheat
>> when given standard (no custom) code, since I had similar results than
>> yours with the original test.c...
>>
>> How does this one work with gcc-5.2?
>>
>> On Mon, Nov 23, 2015 at 10:07 PM, Marion & Christophe JAILLET
>>  wrote:
>>> I just made a small application which takes as command line parameters the
>>> number of iteration to run, which version of the algorithm to use, the 2
>>> strings to compare and the length to compare (or 0 for the non 'n' versions)
>>>
>>>
>>> Compiled using
>>> gcc -O3 test.c
>>>
>>> Tested using
>>> linux:~/Code_Source$ time ./a.out 1 1
>>> xcxcxcxcxcxcxcxcxcxcwwaa
>>> xcxcxcxcxcxcxcxcxcxcwwaa 0
>>> Optimized (nb=1, len=0)
>>> res = 0
>>>
>>> real0m4.193s
>>> user0m4.192s
>>> sys0m0.000s
>>>
>>>
>>>
>>> linux:~/Code_Source$ time ./a.out 0 1
>>> xcxcxcxcxcxcxcxcxcxcwwaa
>>> xcxcxcxcxcxcxcxcxcxcwwaa 0
>>>  (nb=1, len=0)
>>> res = 0
>>>
>>> real0m1.708s
>>> user0m1.704s
>>> sys0m0.000s
>>>
>>>
>>>
>>>
>>> See atatchement.
>>>
>>> CJ
>>>
>>>
>>>
>>> Le 23/11/2015 21:33, Yann Ylavic a écrit :

 Hi Christophe,

 On Mon, Nov 23, 2015 at 9:12 PM, Christophe JAILLET
  wrote:
>
> I tried to do some but the benefit of the optimized version is not that
> clear, at least on my system:
> gcc 5.2.1
> Linux linux 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC
> 2015
> x86_64 x86_64 x86_64 GNU/Linux

 Unfortunately, gcc 5.2.1 (i.e. latest compilers' versions) are not
 widely used...

 Did you try a code like ap_proxy_port_of_scheme() with values which
 are unknown schemes?
 Or even worse cases, with similarly chained strcasecmp() looking for
 eg. "httpx" in something like {"httpa", "httpb", "httpc", ...,
 "httpw"}?

 Regards,
 Yann.

>>>


Re: strncasecmp

2015-11-23 Thread Yann Ylavic
0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};

int ap_strcasecmp(const char *s1, const char *s2)
{
const unsigned char *ps1 = (const unsigned char *) s1;
const unsigned char *ps2 = (const unsigned char *) s2;

while (ucharmap[*ps1] == ucharmap[*ps2]) {
if (*ps1 == '\0') {
return (0);
}
++ps1;
++ps2;
}
return (ucharmap[*ps1] - ucharmap[*--ps2]);
}

int ap_strncasecmp(const char *s1, const char *s2, int n)
{
const unsigned char *ps1 = (const unsigned char *) s1;
const unsigned char *ps2 = (const unsigned char *) s2;
while (n--) {
if (ucharmap[*ps1] != ucharmap[*ps2]) {
return (ucharmap[*ps1] - ucharmap[*ps2]);
}
if (*ps1 == '\0') {
break;
}
++ps1;
++ps2;
}
return (0);
}

#define PROG argv[0]
#define METHOD argv[1]
#define NB argv[2]
#define S1 argv[3]
#define S2 argv[4]
#define LEN argv[5]

/* The ++ are here to try to prevent some optimization done by gcc */
#define FOR for (i=0; i<nb; i++, (S1[0])++, (S2[0])++)

int main(int argc, char *argv[])
{
int64_t diff;
struct timeval tvs, tve;
int i, len, nb;
int res = 0;

if (argc < 6) {
printf("%s <0=std ; 1=optimized>  S1 S2 \n", PROG);
return 0;
}

len = atoi(LEN);
nb  = atoi(NB);

if (*METHOD == '0') {
printf(" (nb=%d, len=%d)\n", nb, len);
gettimeofday();
if (len == 0) {
FOR {
/* really use the result of the function */
res |= strcasecmp(S1, S2);
}
}
else {
FOR {
res |= strncasecmp(S1, S2, len);
}
}
gettimeofday();
}
else {
printf("Optimized (nb=%d, len=%d)\n", nb, len);
gettimeofday();
if (len == 0) {
FOR {
res |= ap_strcasecmp(S1, S2);
}
}
else {
FOR {
res |= ap_strncasecmp(S1, S2, len);
}
}
gettimeofday();
}

diff  = (int64_t)tve.tv_sec * 100L + tve.tv_usec;
diff -= (int64_t)tvs.tv_sec * 100L + tvs.tv_usec;

/* really use the result of the function */
printf("time = %lld.%.6lld : res = %d\n",
(long long)diff / 100L,
(long long)diff % 100L,
res);
return 0; 
}


Re: strncasecmp

2015-11-23 Thread Mikhail T.
On 23.11.2015 23:14, William A Rowe Jr wrote:
> L1 cache and other direct effects of cpu internal optimization.
Just what I was thinking. Attached is the same program with one more
pair of functions added (and an easy way to add more "candidates" to the
main-driver). I changed the FOR-loop define to obtain repeatable results:

# Test 1 -- equal strings:
foreach m ( 0 1 2 )
    foreach? ./strncasecmp $m 1 a A 7
foreach? end
string.h (nb=1, len=7)
time = 6.975845 : res = 0
optimized (nb=1, len=7)
time = 1.492197 : res = 0
'A' - 'a' (nb=1, len=7)
time = 1.787807 : res = 0

# Test 2 -- immediately-different strings
foreach m ( 0 1 2 )
    foreach? ./strncasecmp $m 1 a x 7
foreach? end
string.h (nb=1, len=7)
time = 2.527727 : res = -23
optimized (nb=1, len=7)
time = 0.406867 : res = -23
'A' - 'a' (nb=1, len=7)
time = 0.440320 : res = -23

# Test 3 -- strings different at the very end
foreach m ( 0 1 2 )
    foreach? ./strncasecmp $m 1 a x 0
foreach? end
string.h (nb=1, len=0)
time = 9.629660 : res = -23
optimized (nb=1, len=0)
time = 1.387208 : res = -23
'A' - 'a' (nb=1, len=0)
time = 1.754683 : res = -23

The new pair (method 2) does not use the static table, which is likely
to benefit from CPU-cache unfairly in repetitive benchmarks.  It is
slower than the table-using method 1 functions. But the two pairs might
be comparable -- or even faster -- in real life.

-mi

#include 
#include 
#include 

#define gettimeofday(X) gettimeofday(X, NULL)

#include 
#include 
#include 
#include 
#include 

#include 

/*
 * Provide our own known-fast implementation of str[n]casecmp()
 * NOTE: ASCII only!
 */
static const unsigned char ucharmap[] = {
0x0,  0x1,  0x2,  0x3,  0x4,  0x5,  0x6,  0x7,
0x8,  0x9,  0xa,  0xb,  0xc,  0xd,  0xe,  0xf,
0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f,
0x40,  'a',  'b',  'c',  'd',  'e',  'f',  'g',
 'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
 'p',  'q',  'r',  's',  't',  'u',  'v',  'w',
 'x',  'y',  'z', 0x5b, 0x5c, 0x5d, 0x5e, 0x5f,
0x60,  'a',  'b',  'c',  'd',  'e',  'f',  'g',
 'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
 'p',  'q',  'r',  's',  't',  'u',  'v',  'w',
 'x',  'y',  'z', 0x7b, 0x7c, 0x7d, 0x7e, 0x7f,
0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f,
0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};

int ap_strcasecmp(const char *s1, const char *s2)
{
const unsigned char *ps1 = (const unsigned char *) s1;
const unsigned char *ps2 = (const unsigned char *) s2;

while (*ps1 == *ps2 || ucharmap[*ps1] == ucharmap[*ps2]) {
if (*ps1 == '\0') {
return (0);
}
++ps1;
++ps2;
}
return (ucharmap[*ps1] - ucharmap[*ps2]);
}

int ap_strncasecmp(const char *s1, const char *s2, size_t n)
{
const unsigned char *ps1 = (const unsigned char *) s1;
const unsigned char *ps2 = (const unsigned char *) s2;
while (n--) {
if (*ps1 == *ps2 || ucharmap[*ps1] != ucharmap[*ps2]) {
return (ucharmap[*ps1] - ucharmap[*ps2]);
}
if (*ps1 == '\0') {
break;
}
++ps1;
++ps2;
}
return (0);
}

int mi_strcasecmp(const char *s1, const char *s2)
{
const unsigned char *ps1 = (const unsigned char *) s1;
const unsigned char *ps2 = (const unsigned char *) s2;

for (;;) {
int diff = *ps1 - *ps2;
switch (diff) {
case 0:
break;
case 'A' - 'a':
if (*ps1 <= 'Z' && *ps1 >= 'A')
break;
return 1;
case 'a' - 'A':

Re: strncasecmp

2015-11-23 Thread William A Rowe Jr
On Nov 23, 2015 21:12, "Mikhail T."  wrote:
>
> On 23.11.2015 19:43, Yann Ylavic wrote:
>>
>> No measured difference in my tests, I guess it depends on likelyhood to
fail/succeed early in the string or not.
>
> ? I don't see, where it wins anything -- but I do see, where it loses a
little...
>
>> That's expected (or at least no cared about in this test case). We
simply want res to not be optimized out, so print it before leaving,
without any particular relevance for its value (string.h and optimized
versions should return the same res with the same args, ascii strings only,
though).
>
> Yes, we do want the value printed -- such as to catch the kind of errors
I reported earlier. But my question was, why does the value change
depending on the number of iterations?

L1 cache and other direct effects of cpu internal optimization.

You would be better off introducing a non-optimizable operation consuming
most cpu registers and the L1 cache between iterations if you want to
predict the non-repeditive invocation of this function in different
contexts throughout the processing phases.


Re: strncasecmp

2015-11-23 Thread Mikhail T.
On 23.11.2015 19:05, Yann Ylavic wrote:
> Here is the correct (new) test, along with the diff wrt the original
> (Christophe's) test.c.
BTW, if the program measures its own time, should it not use getrusage()
instead of gettimeofday()?

-mi



Re: strncasecmp

2015-11-23 Thread Yann Ylavic
On Tue, Nov 24, 2015 at 1:07 AM, Mikhail T.  wrote:
>
> BTW, if the program measures its own time, should it not use getrusage()
> instead of gettimeofday()?

Well, it measures the time spent in the relevant code, with a
monotonic clock, that should be fair enough.
We don't care about the whole process time and other counters.

Regards,
Yann.


Re: strncasecmp

2015-11-23 Thread Yann Ylavic
On Mon, Nov 23, 2015 at 11:42 PM, Yann Ylavic  wrote:
> except -Os I always have better
> results with the "optimized" version

To reach better performances with -Os, we could possibly use:

int ap_strcasecmp(const char *s1, const char *s2)
{
const unsigned char *ps1 = (const unsigned char *) s1;
const unsigned char *ps2 = (const unsigned char *) s2;
for (;;) {
const unsigned int c1 = ucharmap[*ps1++],
   c2 = ucharmap[*ps2++];
if (c1 != c2) {
return c1 - c2;
}
if (c1 == '\0') {
break;
}
}
return (0);
}

int ap_strncasecmp(const char *s1, const char *s2, int n)
{
const unsigned char *ps1 = (const unsigned char *) s1;
const unsigned char *ps2 = (const unsigned char *) s2;
while (n--) {
const unsigned int c1 = ucharmap[*ps1++],
   c2 = ucharmap[*ps2++];
if (c1 != c2) {
return c1 - c2;
}
if (c1 == '\0') {
break;
}
}
return (0);
}

This implementation with -Os is ~13% slower than str[n]casemp(),
whereas the original ap_str[n]casemp() is ~30% slower.

Don't know if it happens/makes sense to compile httpd with -Os...
At least for APR, I think we could take that into consideration.


Re: strncasecmp

2015-11-23 Thread Mikhail T.
On 23.11.2015 17:43, Yann Ylavic wrote:
> with attachment...
There is a mistake somewhere in the optimized version:

./o 1 1  aa1a 0
Optimized (nb=1, len=0)
time = 0.611311 : res = 0

The result should not be zero. Indeed, the string.h version is correct:

./o 0 1  aa1a 0
 (nb=1, len=0)
time = 4.189366 : res = 48

Yours,

-mi



Re: strncasecmp

2015-11-23 Thread Mikhail T.
On 23.11.2015 19:05, Yann Ylavic wrote:
> while (ucharmap[*ps1] == ucharmap[*ps2++]) {
> if (*ps1++ == '\0') {
> return (0);
> }
> }
> return (ucharmap[*ps1] - ucharmap[*--ps2]);
Is there really a gain in inc- and decrementing this way? Would not it
be easier to read with the explicit increments -- and, incidentally, no
decrements at all?

while (ucharmap[*ps1] == ucharmap[*ps2]) {
if (*ps1 == '\0') {
return (0);
}
++ps1;
++ps2;
}
return (ucharmap[*ps1] - ucharmap[*ps2]);

> We don't care about the whole process time and other counters.
That's certainly true. But, then, why bother with building time-counter
into the test at all -- instead of simply relying on time(1)?

But something is still not right -- the result (for either of the
methods) can depend on the number of iterations (!!):

./strncasecmp 1 27 aCaa Ac 2
Optimized (nb=27, len=2)
time = 0.000001 : res = 32
./strncasecmp 1 26 aCaa Ac 2
Optimized (nb=26, len=2)
time = 0.000001 : res = 0
./strncasecmp 0 27 aCaa Ac 2
 (nb=27, len=2)
time = 0.000003 : res = 32
./strncasecmp 0 26 aCaa Ac 2
 (nb=26, len=2)
time = 0.03 : res = 0

Using clang on FreeBSD/amd64 here. Yours,

-mi




Re: strncasecmp

2015-11-23 Thread Yann Ylavic
FWIW, a new version using clock_gettime() instead of gettimeofday().
Same/Comparable results for optimized vs string.h's (the former wins
but with -Os).

On Tue, Nov 24, 2015 at 1:43 AM, Yann Ylavic <ylavic@gmail.com> wrote:
> On Tue, Nov 24, 2015 at 1:24 AM, Mikhail T. <mi+t...@aldan.algebra.com> wrote:
>>
>> Is there really a gain in inc- and decrementing this way? Would not it be
>> easier to read with the explicit increments -- and, incidentally, no
>> decrements at all?
>
> No measured difference in my tests, I guess it depends on likelyhood
> to fail/succeed early in the string or not.
>
>>>
>>> We don't care about the whole process time and other counters.
>>
>> That's certainly true. But, then, why bother with building time-counter into
>> the test at all -- instead of simply relying on time(1)?
>
> That's the purpose of newtest.c (vs Christophe's test.c): try both...
>
>>
>> But something is still not right -- the result (for either of the methods)
>> can depend on the number of iterations (!!):
>
> That's expected (or at least no cared about in this test case).
> We simply want res to not be optimized out, so print it before
> leaving, without any particular relevance for its value (string.h and
> optimized versions should return the same res with the same args,
> ascii strings only, though).
#include 
#include 
#include 
#include 

#include 
#include 

/*
 * Provide our own known-fast implementation of str[n]casecmp()
 * NOTE: ASCII only!
 */
static const unsigned char ucharmap[] = {
0x0,  0x1,  0x2,  0x3,  0x4,  0x5,  0x6,  0x7,
0x8,  0x9,  0xa,  0xb,  0xc,  0xd,  0xe,  0xf,
0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f,
0x40,  'a',  'b',  'c',  'd',  'e',  'f',  'g',
 'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
 'p',  'q',  'r',  's',  't',  'u',  'v',  'w',
 'x',  'y',  'z', 0x5b, 0x5c, 0x5d, 0x5e, 0x5f,
0x60,  'a',  'b',  'c',  'd',  'e',  'f',  'g',
 'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
 'p',  'q',  'r',  's',  't',  'u',  'v',  'w',
 'x',  'y',  'z', 0x7b, 0x7c, 0x7d, 0x7e, 0x7f,
0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f,
0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};

int ap_strcasecmp(const char *s1, const char *s2)
{
const unsigned char *ps1 = (const unsigned char *) s1;
const unsigned char *ps2 = (const unsigned char *) s2;

while (ucharmap[*ps1] == ucharmap[*ps2++]) {
if (*ps1++ == '\0') {
return (0);
}
}
return (ucharmap[*ps1] - ucharmap[*--ps2]);
}

int ap_strncasecmp(const char *s1, const char *s2, int n)
{
const unsigned char *ps1 = (const unsigned char *) s1;
const unsigned char *ps2 = (const unsigned char *) s2;
while (n--) {
if (ucharmap[*ps1] != ucharmap[*ps2++]) {
return (ucharmap[*ps1] - ucharmap[*--ps2]);
}
if (*ps1++ == '\0') {
break;
}
}
return (0);
}

#define PROG argv[0]
#define METHOD argv[1]
#define NB argv[2]
#define S1 argv[3]
#define S2 argv[4]
#define LEN argv[5]

/* The ++ are here to try to prevent some optimization done by gcc */
#define FOR for (i=0; i<nb; i++, (S1[0])++, (S2[0])++)

int main(int argc, char *argv[])
{
int64_t diff;
struct timespec tvs, tve;
int i, len, nb;
int res = 0;

if (argc < 6) {
printf("%s <0=std ; 1=optimized>  S1 S2 \n", PROG);
return 0;
}

len = atoi(LEN);
nb  = atoi(NB);

if (*METHOD == '0') {
printf(" (nb=%d, len=%d)\n", nb, len);
clock_gettime(CLOCK_MONOTONIC, );
if (len == 0) {
FOR {
/* really use the result of the function */
res |= strcasecmp(S1, S2);
}
}
else {
FOR {
res |= strncasecmp(S1, S2, le

Re: strncasecmp

2015-11-23 Thread Yann Ylavic
On Tue, Nov 24, 2015 at 12:46 AM, Mikhail T. <mi+t...@aldan.algebra.com> wrote:
> On 23.11.2015 17:43, Yann Ylavic wrote:
>
> with attachment...
>
> There is a mistake somewhere in the optimized version:

My bad, I somehow corrupted the original ap_str[n]casecmp() functions.

Here is the correct (new) test, along with the diff wrt the original
(Christophe's) test.c.

Thanks,
Yann.
--- test.c  2015-11-24 01:01:29.039339318 +0100
+++ newtest.c   2015-11-24 00:57:13.720817895 +0100
@@ -1,8 +1,10 @@
 #include 
+#include 
 #include 
 #include 
 
 #include 
+#include 
 
 /*
  * Provide our own known-fast implementation of str[n]casecmp()
@@ -83,6 +85,8 @@ int ap_strncasecmp(const char *s1, const
 
 int main(int argc, char *argv[])
 {
+int64_t diff;
+struct timeval tvs, tve;
 int i, len, nb;
 int res = 0;
 
@@ -96,6 +100,7 @@ int main(int argc, char *argv[])
 
 if (*METHOD == '0') {
 printf(" (nb=%d, len=%d)\n", nb, len);
+gettimeofday(, NULL);
 if (len == 0) {
 FOR {
 /* really use the result of the function */
@@ -107,9 +112,11 @@ int main(int argc, char *argv[])
 res |= strncasecmp(S1, S2, len);
 }
 }
+gettimeofday(, NULL);
 }
 else {
 printf("Optimized (nb=%d, len=%d)\n", nb, len);
+gettimeofday(, NULL);
 if (len == 0) {
 FOR {
 res |= ap_strcasecmp(S1, S2);
@@ -120,9 +127,16 @@ int main(int argc, char *argv[])
 res |= ap_strncasecmp(S1, S2, len);
 }
 }
+gettimeofday(, NULL);
 }
-
+
+diff  = (int64_t)tve.tv_sec * 100L + tve.tv_usec;
+diff -= (int64_t)tvs.tv_sec * 100L + tvs.tv_usec;
+
 /* really use the result of the function */
-printf("res = %d\n", res);
+printf("time = %lld.%.6lld : res = %d\n",
+(long long)diff / 100L,
+(long long)diff % 100L,
+res);
 return 0; 
 }
#include 
#include 
#include 
#include 

#include 
#include 

/*
 * Provide our own known-fast implementation of str[n]casecmp()
 * NOTE: ASCII only!
 */
static const unsigned char ucharmap[] = {
0x0,  0x1,  0x2,  0x3,  0x4,  0x5,  0x6,  0x7,
0x8,  0x9,  0xa,  0xb,  0xc,  0xd,  0xe,  0xf,
0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f,
0x40,  'a',  'b',  'c',  'd',  'e',  'f',  'g',
 'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
 'p',  'q',  'r',  's',  't',  'u',  'v',  'w',
 'x',  'y',  'z', 0x5b, 0x5c, 0x5d, 0x5e, 0x5f,
0x60,  'a',  'b',  'c',  'd',  'e',  'f',  'g',
 'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
 'p',  'q',  'r',  's',  't',  'u',  'v',  'w',
 'x',  'y',  'z', 0x7b, 0x7c, 0x7d, 0x7e, 0x7f,
0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f,
0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};

int ap_strcasecmp(const char *s1, const char *s2)
{
const unsigned char *ps1 = (const unsigned char *) s1;
const unsigned char *ps2 = (const unsigned char *) s2;

while (ucharmap[*ps1] == ucharmap[*ps2++]) {
if (*ps1++ == '\0') {
return (0);
}
}
return (ucharmap[*ps1] - ucharmap[*--ps2]);
}

int ap_strncasecmp(const char *s1, const char *s2, int n)
{
const unsigned char *ps1 = (const unsigned char *) s1;
const unsigned char *ps2 = (const unsigned char *) s2;
while (n--) {
if (ucharmap[*ps1] != ucharmap[*ps2++]) {
return (ucharmap[*ps1] - ucharmap[*--ps2]);
}
if (*ps1++ == '\0') {
break;
}
}
return (0);
}

#define PROG argv[0]
#define METHOD argv[1]
#define NB argv[2]
#define S1 argv[3]
#define S2 argv[4]
#define LEN argv[5]

/* The ++ are here to try to prevent some optimization done by gcc */
#define FOR for (i=0; i<nb; i++, (S1[0])++, (S2[0])++)

int main(int argc, char *argv[])
{
int64_t diff;
struct timeval tvs,

Re: strncasecmp

2015-11-23 Thread Yann Ylavic
On Tue, Nov 24, 2015 at 1:24 AM, Mikhail T.  wrote:
>
> Is there really a gain in inc- and decrementing this way? Would not it be
> easier to read with the explicit increments -- and, incidentally, no
> decrements at all?

No measured difference in my tests, I guess it depends on likelyhood
to fail/succeed early in the string or not.

>>
>> We don't care about the whole process time and other counters.
>
> That's certainly true. But, then, why bother with building time-counter into
> the test at all -- instead of simply relying on time(1)?

That's the purpose of newtest.c (vs Christophe's test.c): try both...

>
> But something is still not right -- the result (for either of the methods)
> can depend on the number of iterations (!!):

That's expected (or at least no cared about in this test case).
We simply want res to not be optimized out, so print it before
leaving, without any particular relevance for its value (string.h and
optimized versions should return the same res with the same args,
ascii strings only, though).


Re: strncasecmp

2015-11-23 Thread Mikhail T.
On 23.11.2015 19:43, Yann Ylavic wrote:
> No measured difference in my tests, I guess it depends on likelyhood to 
> fail/succeed early in the string or not.
? I don't see, where it wins anything -- but I do see, where it loses a
little...
> That's expected (or at least no cared about in this test case). We
> simply want res to not be optimized out, so print it before leaving,
> without any particular relevance for its value (string.h and optimized
> versions should return the same res with the same args, ascii strings
> only, though). 
Yes, we do want the value printed -- such as to catch the kind of errors
I reported earlier. But my question was, why does the value change
depending on the number of iterations?

-mi




Re: strncasecmp

2015-11-23 Thread Marion & Christophe JAILLET
I just made a small application which takes as command line parameters 
the number of iteration to run, which version of the algorithm to use, 
the 2 strings to compare and the length to compare (or 0 for the non 'n' 
versions)



Compiled using
gcc -O3 test.c

Tested using
linux:~/Code_Source$ time ./a.out 1 1 
xcxcxcxcxcxcxcxcxcxcwwaa 
xcxcxcxcxcxcxcxcxcxcwwaa 0

Optimized (nb=1, len=0)
res = 0

real0m4.193s
user0m4.192s
sys0m0.000s



linux:~/Code_Source$ time ./a.out 0 1 
xcxcxcxcxcxcxcxcxcxcwwaa 
xcxcxcxcxcxcxcxcxcxcwwaa 0

 (nb=1, len=0)
res = 0

real0m1.708s
user0m1.704s
sys0m0.000s




See atatchement.

CJ


Le 23/11/2015 21:33, Yann Ylavic a écrit :

Hi Christophe,

On Mon, Nov 23, 2015 at 9:12 PM, Christophe JAILLET
<christophe.jail...@wanadoo.fr> wrote:

I tried to do some but the benefit of the optimized version is not that
clear, at least on my system:
gcc 5.2.1
Linux linux 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015
x86_64 x86_64 x86_64 GNU/Linux

Unfortunately, gcc 5.2.1 (i.e. latest compilers' versions) are not
widely used...

Did you try a code like ap_proxy_port_of_scheme() with values which
are unknown schemes?
Or even worse cases, with similarly chained strcasecmp() looking for
eg. "httpx" in something like {"httpa", "httpb", "httpc", ...,
"httpw"}?

Regards,
Yann.



#include 
#include 
#include 

#include 

/*
 * Provide our own known-fast implementation of str[n]casecmp()
 * NOTE: ASCII only!
 */
static const unsigned char ucharmap[] = {
0x0,  0x1,  0x2,  0x3,  0x4,  0x5,  0x6,  0x7,
0x8,  0x9,  0xa,  0xb,  0xc,  0xd,  0xe,  0xf,
0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f,
0x40,  'a',  'b',  'c',  'd',  'e',  'f',  'g',
 'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
 'p',  'q',  'r',  's',  't',  'u',  'v',  'w',
 'x',  'y',  'z', 0x5b, 0x5c, 0x5d, 0x5e, 0x5f,
0x60,  'a',  'b',  'c',  'd',  'e',  'f',  'g',
 'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
 'p',  'q',  'r',  's',  't',  'u',  'v',  'w',
 'x',  'y',  'z', 0x7b, 0x7c, 0x7d, 0x7e, 0x7f,
0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f,
0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};

int ap_strcasecmp(const char *s1, const char *s2)
{
const unsigned char *ps1 = (const unsigned char *) s1;
const unsigned char *ps2 = (const unsigned char *) s2;

while (ucharmap[*ps1] == ucharmap[*ps2++]) {
if (*ps1++ == '\0') {
return (0);
}
}
return (ucharmap[*ps1] - ucharmap[*--ps2]);
}

int ap_strncasecmp(const char *s1, const char *s2, int n)
{
const unsigned char *ps1 = (const unsigned char *) s1;
const unsigned char *ps2 = (const unsigned char *) s2;
while (n--) {
if (ucharmap[*ps1] != ucharmap[*ps2++]) {
return (ucharmap[*ps1] - ucharmap[*--ps2]);
}
if (*ps1++ == '\0') {
break;
}
}
return (0);
}

#define PROG argv[0]
#define METHOD argv[1]
#define NB argv[2]
#define S1 argv[3]
#define S2 argv[4]
#define LEN argv[5]

/* The ++ are here to try to prevent some optimization done by gcc */
#define FOR for (i=0; i<nb; i++, (S1[0])++, (S2[0])++)

int main(int argc, char *argv[])
{
int i, len, nb;
int res = 0;

if (argc < 6) {
printf("%s <0=std ; 1=optimized>  S1 S2 \n", PROG);
return 0;
}

len = atoi(LEN);
nb  = atoi(NB);

if (*METHOD == '0') {
printf(" (nb=%d, len=%d)\n", nb, len);
if (len == 0) {
FOR {
/* really use the result of the function */
res |= strcasecmp(S1, S2);
    }
}
else {
FOR {
res |= strncasecmp(S1, S2, len);
}
}
}

Re: strncasecmp

2015-11-20 Thread William A Rowe Jr
I'm +1 if you are suggesting an ascii-only implementation, and by all means
introduce both an ap_ and apr_ flavor at the same time.

E.g. apr[r]_ascii_str[n]casecmp().

All characters <'A' || >'Z' && <'a || >'z' should be compared by identity.

I'm not sure which OS/X implementation you are referring to, though...
http://www.opensource.apple.com/source/Libc/Libc-1044.40.1/string/FreeBSD/strcasecmp.c

That implementation reminds us that we shouldn't be trusting these local
code page implementations for comparing defined ascii tokens, and many
of our comparisons in httpd are for such purposes.

But since a non-locale based ascii implementation comparing
tolower(x)-tolower(y) devolves to chrtable[x]-chrtable[y], where
tolower is inline, and if we intend to preserve the defined <0 == >0
behavior of the K definition, you do need to fold.

Yes the switch/case is ugly, but far more efficient for a larger set of
values/tokens.  One optimization would be iterating the pre-sorted
list of tokens in a binary fashion with a near skip pointer or returning
the elt of that item, but that would quickly become as crufty, subject
to bugs and unmaintainable as the usual switch/case solution.



On Fri, Nov 20, 2015 at 11:17 AM, Jim Jagielski <j...@jagunet.com> wrote:

> We use str[n]casecmp quite a bit. The rub is that some
> platforms use a sensible implementation (such as OSX) which
> uses a upper-lowercase map and is v. fast, and others
> use a brain-dead version which does an actual tolower() of
> each char in the string as it tests. We actually try to
> handle this in many cases by doing a switch/case test on the
> 1st char to fast path the strncasecmp, resulting in ugly code.
>
> This is crazy.
>
> I propose a ap_strncasecmp/ap_strcasecmp which we should use.
> Ideally, it would be in apr but no need to wait for that
> to happen :)
>
> Unless people have heartburn about this, I will add next week.
>


Re: strncasecmp

2015-11-20 Thread William A Rowe Jr
Pay special attention to;

The *strncasecmp*() function shall compare, *while ignoring differences in
case*, not more than *n* bytes from the string pointed to by *s1* to the
string pointed to by *s2*.

In the POSIX locale, *strcasecmp*() and *strncasecmp*() shall *behave as if
the strings had been converted to lowercase* and then a byte comparison
performed. The results are unspecified in other locales.
http://pubs.opengroup.org/onlinepubs/009695399/functions/strcasecmp.html

E.g. 0x5B-0x60 sort before alpha.

If you don't want to honor posix semantics, don't abuse the "strcmp"
concept, but rather name this "streq" or something otherwise less confusing
:-)

On Fri, Nov 20, 2015 at 12:22 PM, William A Rowe Jr <wr...@rowe-clan.net>
wrote:

> I'm +1 if you are suggesting an ascii-only implementation, and by all means
> introduce both an ap_ and apr_ flavor at the same time.
>
> E.g. apr[r]_ascii_str[n]casecmp().
>
> All characters <'A' || >'Z' && <'a || >'z' should be compared by identity.
>
> I'm not sure which OS/X implementation you are referring to, though...
>
> http://www.opensource.apple.com/source/Libc/Libc-1044.40.1/string/FreeBSD/strcasecmp.c
>
> That implementation reminds us that we shouldn't be trusting these local
> code page implementations for comparing defined ascii tokens, and many
> of our comparisons in httpd are for such purposes.
>
> But since a non-locale based ascii implementation comparing
> tolower(x)-tolower(y) devolves to chrtable[x]-chrtable[y], where
> tolower is inline, and if we intend to preserve the defined <0 == >0
> behavior of the K definition, you do need to fold.
>
> Yes the switch/case is ugly, but far more efficient for a larger set of
> values/tokens.  One optimization would be iterating the pre-sorted
> list of tokens in a binary fashion with a near skip pointer or returning
> the elt of that item, but that would quickly become as crufty, subject
> to bugs and unmaintainable as the usual switch/case solution.
>
>
>
> On Fri, Nov 20, 2015 at 11:17 AM, Jim Jagielski <j...@jagunet.com> wrote:
>
>> We use str[n]casecmp quite a bit. The rub is that some
>> platforms use a sensible implementation (such as OSX) which
>> uses a upper-lowercase map and is v. fast, and others
>> use a brain-dead version which does an actual tolower() of
>> each char in the string as it tests. We actually try to
>> handle this in many cases by doing a switch/case test on the
>> 1st char to fast path the strncasecmp, resulting in ugly code.
>>
>> This is crazy.
>>
>> I propose a ap_strncasecmp/ap_strcasecmp which we should use.
>> Ideally, it would be in apr but no need to wait for that
>> to happen :)
>>
>> Unless people have heartburn about this, I will add next week.
>>
>
>


strncasecmp

2015-11-20 Thread Jim Jagielski
We use str[n]casecmp quite a bit. The rub is that some
platforms use a sensible implementation (such as OSX) which
uses a upper-lowercase map and is v. fast, and others
use a brain-dead version which does an actual tolower() of
each char in the string as it tests. We actually try to
handle this in many cases by doing a switch/case test on the
1st char to fast path the strncasecmp, resulting in ugly code.

This is crazy.

I propose a ap_strncasecmp/ap_strcasecmp which we should use.
Ideally, it would be in apr but no need to wait for that
to happen :)

Unless people have heartburn about this, I will add next week.


Re: strncasecmp

2015-11-20 Thread Christophe JAILLET

+1


Le 20/11/2015 18:17, Jim Jagielski a écrit :

We use str[n]casecmp quite a bit. The rub is that some
platforms use a sensible implementation (such as OSX) which
uses a upper-lowercase map and is v. fast, and others
use a brain-dead version which does an actual tolower() of
each char in the string as it tests. We actually try to
handle this in many cases by doing a switch/case test on the
1st char to fast path the strncasecmp, resulting in ugly code.

This is crazy.

I propose a ap_strncasecmp/ap_strcasecmp which we should use.
Ideally, it would be in apr but no need to wait for that
to happen :)

Unless people have heartburn about this, I will add next week.





Re: strncasecmp

2015-11-20 Thread Yann Ylavic
+1

On Fri, Nov 20, 2015 at 6:17 PM, Jim Jagielski <j...@jagunet.com> wrote:
> We use str[n]casecmp quite a bit. The rub is that some
> platforms use a sensible implementation (such as OSX) which
> uses a upper-lowercase map and is v. fast, and others
> use a brain-dead version which does an actual tolower() of
> each char in the string as it tests. We actually try to
> handle this in many cases by doing a switch/case test on the
> 1st char to fast path the strncasecmp, resulting in ugly code.
>
> This is crazy.
>
> I propose a ap_strncasecmp/ap_strcasecmp which we should use.
> Ideally, it would be in apr but no need to wait for that
> to happen :)
>
> Unless people have heartburn about this, I will add next week.


Re: strncasecmp

2015-11-20 Thread Jim Jagielski
Implemented in r1715401

If people want to nit-pick about naming and wish to
rename it to something else, be my guest.

> On Nov 20, 2015, at 1:03 PM, Yann Ylavic <ylavic@gmail.com> wrote:
> 
> +1
> 
> On Fri, Nov 20, 2015 at 6:17 PM, Jim Jagielski <j...@jagunet.com> wrote:
>> We use str[n]casecmp quite a bit. The rub is that some
>> platforms use a sensible implementation (such as OSX) which
>> uses a upper-lowercase map and is v. fast, and others
>> use a brain-dead version which does an actual tolower() of
>> each char in the string as it tests. We actually try to
>> handle this in many cases by doing a switch/case test on the
>> 1st char to fast path the strncasecmp, resulting in ugly code.
>> 
>> This is crazy.
>> 
>> I propose a ap_strncasecmp/ap_strcasecmp which we should use.
>> Ideally, it would be in apr but no need to wait for that
>> to happen :)
>> 
>> Unless people have heartburn about this, I will add next week.



Re: strncasecmp

2015-11-20 Thread William A Rowe Jr
At first glance this seems to meet the POSIX definition in the POSIX
context.  Just want to be careful about using this for non-posix
applications and your doxygen seems to cover this.  Looks good to me.
Implemented in r1715401

If people want to nit-pick about naming and wish to
rename it to something else, be my guest.

> On Nov 20, 2015, at 1:03 PM, Yann Ylavic <ylavic@gmail.com> wrote:
>
> +1
>
> On Fri, Nov 20, 2015 at 6:17 PM, Jim Jagielski <j...@jagunet.com> wrote:
>> We use str[n]casecmp quite a bit. The rub is that some
>> platforms use a sensible implementation (such as OSX) which
>> uses a upper-lowercase map and is v. fast, and others
>> use a brain-dead version which does an actual tolower() of
>> each char in the string as it tests. We actually try to
>> handle this in many cases by doing a switch/case test on the
>> 1st char to fast path the strncasecmp, resulting in ugly code.
>>
>> This is crazy.
>>
>> I propose a ap_strncasecmp/ap_strcasecmp which we should use.
>> Ideally, it would be in apr but no need to wait for that
>> to happen :)
>>
>> Unless people have heartburn about this, I will add next week.


Re: strncasecmp

2015-11-20 Thread Christophe JAILLET

Le 20/11/2015 18:17, Jim Jagielski a écrit :

Ideally, it would be in apr

+1
This could also be even more interesting, because of apr_table_ functions.

CJ