Re: Concise term for non-ASCII Unicode characters

Daniel Bünzli Mon, 21 Sep 2015 04:58:23 -0700

On Monday, 21 September 2015 at 09:22, Sean Leonard wrote:
> I think we can limit our inquiry to "characters" and "code points". Both
> of those are well-defined in Unicode (see  
> <http://unicode.org/glossary/>).


I wouldn't say so. If you actually look at the definition of character on 
this page, there are at least four different definitions for the notion of 
character. And if you take the one that has a formal definition attached, i.e. 
the synonym for abstract character (D7), then an abstract character can 
actually be represented by a *sequence* of Unicode scalar values.
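
As an illustration, here is a minimal Python sketch of one abstract character 
("e with acute accent") represented both by a single scalar value and by a 
sequence of two. The helper name `scalar_values` is made up for this example.

```python
import unicodedata

# Two canonically equivalent representations of the same abstract character.
precomposed = "\u00E9"       # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "\u0065\u0301"  # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

def scalar_values(s):
    """Hypothetical helper: list the scalar values of s in U+XXXX notation."""
    return ["U+%04X" % ord(c) for c in s]

print(scalar_values(precomposed))  # ['U+00E9']
print(scalar_values(decomposed))   # ['U+0065', 'U+0301']

# The strings differ as sequences, but NFC normalization maps one to the other.
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```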

If you are operating in the context of a standard or technical documentation, 
please do use either code points (D9, D10) or scalar values (D76). These 
notions have precise definitions, which makes for saner discussions and 
understanding.
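
To show how precise the two notions are, each can be written as a one-line 
predicate on integers. This is only a sketch; the function names are 
assumptions chosen for this example, not part of any standard API.

```python
def is_code_point(n):
    # Code points are the integers of the Unicode codespace: 0 to 0x10FFFF.
    return 0 <= n <= 0x10FFFF

def is_scalar_value(n):
    # Scalar values are the code points outside the surrogate range
    # U+D800..U+DFFF.
    return is_code_point(n) and not (0xD800 <= n <= 0xDFFF)

# U+D800 is a code point (a surrogate) but not a scalar value.
print(is_code_point(0xD800), is_scalar_value(0xD800))  # True False
```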

> I wish that "non-ASCII characters" and "non-ASCII code points" (and  
> non-ASCII scalar values) were sufficient for me. Maybe they can be.  
> However, in contexts where ASCII is getting extended or supplemented  
> (e.g., in the DNS or in e-mail), one needs to be really clear that the  
> octets 0x80 - 0xFF are Unicode (specifically UTF-8, I suppose), and not  
> something else.

So it seems that you want terminology to talk about the *encoding* of Unicode 
scalar values, rather than the scalar values themselves. Then I think you 
should specifically avoid terminology like "octets 0x80-0xFF are Unicode", 
since this doesn't really make sense: there is no Unicode property on octets. 
You should rather say something like "these octets may belong to the UTF-8 
encoding scheme (D95) of Unicode scalar values greater than U+007F".
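
The relation between scalar values and octets can be checked with a short 
Python sketch. The helper names `utf8_octets` and `has_high_octet` are 
hypothetical and exist only for this example. In UTF-8, octets in the range 
0x80-0xFF occur only in the encodings of scalar values above U+007F.

```python
def utf8_octets(scalar_value):
    # Encode a single scalar value to UTF-8 and return its octets as integers.
    # chr() followed by encode() raises UnicodeEncodeError for surrogates,
    # which are not scalar values.
    return list(chr(scalar_value).encode("utf-8"))

def has_high_octet(scalar_value):
    # True if the UTF-8 encoding contains at least one octet >= 0x80.
    return any(octet >= 0x80 for octet in utf8_octets(scalar_value))

print([hex(o) for o in utf8_octets(0x7F)])  # ['0x7f']
print([hex(o) for o in utf8_octets(0x80)])  # ['0xc2', '0x80']
print([hex(o) for o in utf8_octets(0xE9)])  # ['0xc3', '0xa9']
```

Note that the octets alone say nothing about Unicode: the same octets 0xC3 
0xA9 are also a valid two-character string in ISO-8859-1.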

Best,  

Daniel
