On Tue, 18 Feb 2003 11:35:32 +1100
Martin Pool <[EMAIL PROTECTED]> wrote:
> On 18 Feb 2003, Andrew Bartlett <[EMAIL PROTECTED]> wrote:
>
> > Possibly only for long strings? But then that is probably
> > micro-optimization.
>
> If we really cared about optimizing this function, then we would
> compare character-by-character rather than converting both strings to
> uppercase first. This is a bit hard for some weird encodings I know,
> but it ought to be possible to do it in charcnv.c.
Actually you got me thinking, and it's not all that hard. In fact I think
there are a lot of good optimizations you can make in this function. For
example, you only have to convert to wide characters if *both* characters
are multibyte sequences. If only one has the high bit set, they cannot
possibly match, even caselessly, so the byte-wise uc1 != uc2 test returns
right away.

Here's some rough code. I didn't even try to compile this.
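
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Caseless compare of two UTF-8 strings, reading at most sn1 and sn2 bytes. */
int
utf8casecmp(const char *str1, size_t sn1, const char *str2, size_t sn2)
{
    size_t n1, n2;
    wchar_t ucs1, ucs2;
    mbstate_t ps1, ps2;
    unsigned char uc1, uc2;

    memset(&ps1, 0, sizeof(ps1));
    memset(&ps2, 0, sizeof(ps2));

    while (sn1 > 0 && sn2 > 0) {
        if ((*str1 & 0x80) && (*str2 & 0x80)) { /* both multibyte */
            /* n1 and n2 are unsigned, so test for the error values
             * (size_t)-1 and (size_t)-2 rather than "< 0" */
            n1 = mbrtowc(&ucs1, str1, sn1, &ps1);
            n2 = mbrtowc(&ucs2, str2, sn2, &ps2);
            if (n1 == (size_t)-1 || n1 == (size_t)-2 ||
                n2 == (size_t)-1 || n2 == (size_t)-2) {
                perror("mbrtowc");
                return -1;
            }
            if (ucs1 != ucs2 &&
                (ucs1 = towupper(ucs1)) != (ucs2 = towupper(ucs2))) {
                return ucs1 < ucs2 ? -1 : 1;
            }
            sn1 -= n1; str1 += n1;
            sn2 -= n2; str2 += n2;
        } else { /* neither or one multibyte */
            /* cast first: toupper on a negative char is undefined */
            uc1 = toupper((unsigned char)*str1);
            uc2 = toupper((unsigned char)*str2);
            if (uc1 != uc2) {
                return uc1 < uc2 ? -1 : 1;
            } else if (uc1 == '\0') {
                return 0;
            }
            sn1--; str1++;
            sn2--; str2++;
        }
    }
    return 0; /* one of the byte limits ran out with no difference found */
}
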
Note this assumes you're running in a UTF-8 locale; I don't know how
you handle locales. If you aren't, you'll need to swap out the mbrtowc
calls. But I think the algorithm is sound.
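
On switching out mbrtowc: if the locale can't be relied on, one option
might be a small hand-rolled decoder along these lines. This is only a
sketch under my own assumptions. utf8_decode is a made-up name, not an
existing function anywhere, and it returns 0 on bad input instead of
following mbrtowc's (size_t)-1 convention.

```c
#include <stddef.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (at most n bytes)
 * into *cp. Returns the number of bytes consumed, or 0 if the sequence is
 * truncated or malformed. Does not consult the locale. */
size_t
utf8_decode(const char *s, size_t n, unsigned long *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned long c, min;
    size_t len, i;

    if (n == 0)
        return 0;
    if (p[0] < 0x80) {                  /* plain ASCII */
        *cp = p[0];
        return 1;
    } else if ((p[0] & 0xE0) == 0xC0) { /* 2-byte lead */
        len = 2; c = p[0] & 0x1F; min = 0x80;
    } else if ((p[0] & 0xF0) == 0xE0) { /* 3-byte lead */
        len = 3; c = p[0] & 0x0F; min = 0x800;
    } else if ((p[0] & 0xF8) == 0xF0) { /* 4-byte lead */
        len = 4; c = p[0] & 0x07; min = 0x10000;
    } else {
        return 0;           /* stray continuation or invalid lead byte */
    }
    if (len > n)
        return 0;           /* sequence runs past the byte limit */
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;       /* not a continuation byte */
        c = (c << 6) | (p[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;           /* overlong, out of range, or surrogate */
    *cp = c;
    return len;
}
```

In the both-multibyte branch you would call this in place of mbrtowc and
treat a return of 0 as the error case. towupper still depends on the
locale, so the case mapping itself would need its own answer.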
Mike
--
A program should be written to model the concepts of the task it
performs rather than the physical world or a process because this
maximizes the potential for it to be applied to tasks that are
conceptually similar and, more important, to tasks that have not
yet been conceived.