Re: Concise term for non-ASCII Unicode characters

Daniel Bünzli Tue, 29 Sep 2015 11:09:46 -0700

On Tuesday, 29 September 2015 at 18:30, Sean Leonard wrote:
> Uh...I think you mean U+007F? :)


Yes… see how easy it was to point out that the definition was wrong. It would 
have been just as easy if this were code and we were talking about a protocol 
whose specification used this notation rather than a new Unicode concept.
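
To make the point about range notation concrete, here is a minimal Python 
sketch. It assumes ASCII is U+0000..U+007F and that "non-ASCII" simply means 
the remaining code points up to U+10FFFF; the names ASCII_MAX, UNICODE_MAX and 
is_non_ascii are made up for the example and come from no specification.

```python
# Hypothetical names, for illustration only.
ASCII_MAX = 0x007F      # last ASCII code point, U+007F
UNICODE_MAX = 0x10FFFF  # last Unicode code point, U+10FFFF


def is_non_ascii(cp: int) -> bool:
    """Return True if cp lies in U+0080..U+10FFFF."""
    if not 0 <= cp <= UNICODE_MAX:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    return cp > ASCII_MAX
```

With the bounds written out like this, an off-by-one in the upper limit of 
ASCII is caught by a single check at U+007F and U+0080.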

> Perhaps it's because I'm writing to the Unicode crowd, but honestly
> there are a lot of very intelligent software engineers/standards folks  
> who do not have the "basic knowledge of the Unicode standard" that is  
> being presumed. They want to focus on other parts of their systems or  
> protocols, and when it comes to the "text part", they just hand-wave and  
> say "Unicode!" and call it a day.

Introducing more terminology and jargon is not going to help in this case. Make 
the definitions as obvious as possible and strive for minimality in the exposed 
concepts.

> The fact that (modern implementations of) UTF-8 encoders and decoders are not 
> supposed to process the surrogate code points (arbitrarily), for example, is a
> rather advanced topic

I wouldn't say this is advanced knowledge; it is basic knowledge any 
programmer dealing with Unicode text should have. FWIW this [1] is the absolute 
minimal knowledge I think programmers should have about Unicode (the last 
section can be skipped; it's specific to a programming language). This 
corresponds to maybe 3 to 4 A4 pages. If your programmers are not able to grok 
this small amount of knowledge, hire better ones.
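
As a rough illustration of the surrogate rule, the Python sketch below 
separates Unicode scalar values from surrogate code points and shows that 
Python's built-in strict UTF-8 codec refuses surrogates in both directions. 
The helper names (is_scalar_value, utf8_encodes, utf8_decodes) are 
hypothetical; only the standard str.encode and bytes.decode calls are real.

```python
def is_scalar_value(cp: int) -> bool:
    """True for code points U+0000..U+10FFFF, excluding surrogates
    (U+D800..U+DFFF)."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def utf8_encodes(cp: int) -> bool:
    """True if the strict UTF-8 codec accepts the code point cp."""
    try:
        chr(cp).encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def utf8_decodes(data: bytes) -> bool:
    """True if the strict UTF-8 codec accepts the byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

For instance, utf8_encodes(0x20AC) holds while utf8_encodes(0xD800) does not, 
and utf8_decodes(b"\xed\xa0\x80") fails because those bytes would decode to 
the surrogate U+D800.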

Best,  

Daniel

[1] http://erratique.ch/software/uucp/doc/Uucp.html#uminimal
