Note that in C, it's essentially just as fast to make character comparisons
with (ch | 0x20) as with ch alone, i.e., if you know ch is in an ASCII range
(0 - 0x7F or 0xE0000 - 0xE007F) and you are matching against lowercase
letters, you can do a case-insensitive compare as quickly as a case-sensitive
one.  The problem with assuming lower case is
that the input might not all be in lower case.  I remember all too well
having to accept RTF control words with upper-case letters even though the
RTF spec and Word both specifically use all lower case for these words.
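
As a rough illustration of the (ch | 0x20) comparison, here is a minimal
sketch, not code from any real RTF reader.  The function name matches_keyword
and its parameters are made up for this example, and it assumes the keyword
being matched consists only of the lowercase letters 'a'..'z'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if `input` equals `keyword_lc`,
   ignoring ASCII case.  Assumption: `keyword_lc` contains only the
   letters 'a'..'z'.  Under that assumption (c | 0x20) maps 'A'..'Z'
   onto 'a'..'z', and no other byte value can collide with a keyword
   letter, so the comparison is exact. */
static bool matches_keyword(const char *input, const char *keyword_lc)
{
    assert(input != NULL && keyword_lc != NULL);
    for (; *keyword_lc != '\0'; input++, keyword_lc++) {
        /* A terminating NUL in input becomes 0x20, which never equals
           a letter, so a short input fails here without the loop
           reading past its end. */
        if (((unsigned char)*input | 0x20) != (unsigned char)*keyword_lc)
            return false;
    }
    /* Every keyword letter matched; input must end here too. */
    return *input == '\0';
}
```

With this sketch, matches_keyword("PAR", "par") and matches_keyword("Par",
"par") both succeed, while "pard" and "p@r" are rejected.  The same OR works
on the Plane 14 tag letters held as 32-bit code points, since
0xE0041 | 0x20 is 0xE0061.  If the keyword could contain non-letters, the
trick is no longer exact: a carriage return (0x0D) ORed with 0x20 gives
0x2D, the hyphen.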

Murray

> -----Original Message-----
> From: Kenneth Whistler [SMTP:[EMAIL PROTECTED]]
> Sent: Wednesday, June 28, 2000 12:03 PM
> To:   Unicode List
> Cc:   [EMAIL PROTECTED]
> Subject:      Re: Plane 14 language tags
> 
> Doug Ewell asked:
> 
> > 2.  (Ken and Glenn) Can you explain in a little more detail the
> >     rationale
> >     for lowercasing the entire language tag?  It seems that if RFC 1766
> >     is the model to be followed, then the RFC 1766 casing convention
> >     (lowercase for language, uppercase for country) might be preferred.
> 
> John Cowan's non-authoritative response was fine by me -- and was
> better-expressed than this author would probably have done. ;-)
> 
> >     I guess I don't see how lowercasing the entire tag simplifies or
> >     speeds up anything, since the hyphen which separates language from
> >     country is outside the range of lowercase letters anyway and
> >     processes that want to ignore LT's must ignore the entire range from
> >     U+E0000 through U+E007F.
> 
> It is not a matter of range-checking. For ignoring tags, you would always
> check the entire range. Rather, it is just a suggestion that since
> case is not significant in the language tags, it is slightly preferable
> to do the early "normalization" (i.e. case folding to lowercase, in
> this instance), rather than emitting arbitrarily mixed case tags
> and distributing the case-folding burden to all the interpreters of
> the tags.
> 
> --Ken Whistler
