I think the term "non-ASCII Unicode" is just fine, and we don't need anything beyond that. It clearly means those Unicode characters that are not in ASCII in sense (2) of http://unicode.org/glossary/#ASCII.
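As a rough illustration only (not a definition from the Unicode glossary), the distinction can be sketched in Python. The helper names `is_non_ascii` and `utf8_length` are hypothetical and chosen for this example; the sketch assumes "ASCII" means the code points U+0000..U+007F.

```python
def is_non_ascii(ch: str) -> bool:
    """Return True if the single character lies outside U+0000..U+007F.

    Assumption: "ASCII" here means code points 0x00 through 0x7F.
    """
    return ord(ch) > 0x7F


def utf8_length(ch: str) -> int:
    """Return the number of UTF-8 code units (bytes) used to encode ch."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    # Sample characters on both sides of the 0x7F / 0x80 boundary,
    # plus U+FFFF and one supplementary-plane character.
    for ch in ["A", "\x7f", "\x80", "\u00e9", "\uffff", "\U0001F600"]:
        print(f"U+{ord(ch):04X} non_ascii={is_non_ascii(ch)} "
              f"utf8_bytes={utf8_length(ch)}")
```

Under that assumption, every non-ASCII character takes two or more bytes in UTF-8, while every ASCII character takes exactly one.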
Mark <https://google.com/+MarkDavis>

*— Il meglio è l’inimico del bene —*

On Tue, Sep 29, 2015 at 6:20 PM, Sean Leonard <[email protected]> wrote:

> On 9/21/2015 5:17 PM, Peter Constable wrote:
>
>> If you think it's a serious problem that there isn't one conventional
>> term for "characters outside the ASCII repertoire" or "UTF-8
>> multi-code-unit encoded representations" (since different authors could
>> devise different terminology solutions), then I suggest you submit a
>> document to UTC explaining why it's a problem, documenting inconsistent or
>> unclear terminology that's been used in some standards / public
>> specifications, and requesting that Unicode formally define terminology for
>> these concepts. I can't guarantee that UTC will do it, but I can predict
>> with confidence that it _won't_ do anything of that nature if nobody
>> submits such a document.
>>
>> Peter
>
> I am of the mind to do just that, then. I have seen different documents,
> standards, and standards bodies that have invented terminology around this
> term, and they are not always the same. Since these standards depend on
> Unicode, it would make a lot of sense for Unicode formally to define
> terminology for these concepts. With the proliferation of UTF-8 (among
> other things), the boundary between 0x7F - 0x80 is more significant than
> the boundary between 0xFFFF - 0x10000.
>
> Since this will be my first submission I would appreciate a co-author on
> this topic. Is anyone willing to help? Thanks in advance. Also, it is not
> clear if such a document is destined to become a Unicode Technical Report
> (UTR / PDUTR etc.), or if it should just be an informal write-up. I am
> guessing this is supposed to be somewhat informal but at the same time it
> (or the results of it) ought to appear in the UTC Document Search.
>
> The current terminology that I am considering pursuing is "beyond ASCII",
> in various permutations, such as "beyond the ASCII range", "characters
> beyond ASCII", "code points beyond ASCII", etc. The term "beyond" implies a
> certain directionality, and to that extent, implies the Unicode repertoire
> as well as a Unicode encoding. We have seen on this list the backflips
> required to clarify "non-ASCII", since things that are not ASCII literally
> could be a wide range of things.
>
> I think there is some confusion about whether the term "Basic Latin"
> excludes the C0 control character range. Formally the standard seems clear
> enough to me that it is coterminous with ASCII, but there is still
> confusion if you don't pore through the Standard. My thought is that maybe
> the Blocks.txt data should be modified to say "ASCII (Basic Latin)" instead
> of just "Basic Latin". (If we "go there", I would appreciate the wisdom of
> an experienced Unicode co-author. I am not confident touching that just by
> myself.)
>
> Sean

