On Fri, Mar 10 2017 at 19:55 CET, man...@mozilla.com writes:
> I recently wrote
> http://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/
> , which sort of addresses the whole hangup programmers have with
> treating code points as "characters".

[...]

This is just another confirmation that the present Unicode terminology
is confusing.

Let me recall below a fragment of an old thread about "textels".

Best regards

Janusz

On Thu, Sep 15 2016 at 21:12 CEST, jsb...@mimuw.edu.pl writes:
> On Thu, Sep 15 2016 at 16:36 CEST, john.w.kenn...@gmail.com writes:
>
> [...]
>
>> In the new Swift programming language, which is white-hot in the Apple
>> community, Apple is moving toward a model of a transparent, generic
>> Unicode that can be “viewed” as UTF-8, UTF-16, or UTF-32 if necessary,
>> but in which a “character” contains however many code points it needs
>> (“e” with a stacked macron, acute accent, and dieresis is
>> algorithmically one “character” in Swift). Moreover,
>> e-with-an-acute-accent and e followed by a combining acute accent, for
>> example, compare as equal. At present, the underlying code is still
>> UTF-16LE.
>
> For several years I have used the name "textel" (text element, in Polish
> "tekstel") for such objects. I do it mostly orally in my presentations
> for my students, but I have also used it in writing, e.g. in
> http://bc.klf.uw.edu.pl/118/, unfortunately without a proper
> definition. A rudimentary definition was provided by me only in my
> recent paper in Polish: http://bc.klf.uw.edu.pl/480/. It states simply
> (on p. 69) "an elementary text element independently of its Unicode
> representation" (meaning in particular composed vs precomposed). I still
> hope to formulate sooner or later a more satisfactory definition :-)
>
> I think Swift confirms that such a notion is really needed.
>
> Best regards
>
> Janusz

On Wed, Sep 21 2016 at 6:44 CEST, jsb...@mimuw.edu.pl writes:
> On Tue, Sep 20 2016 at 18:09 CEST, d...@ewellic.org writes:
>> Janusz Bień wrote:
>>
>>> For me it means that Swift's characters are equivalence classes of the
>>> set of extended grapheme clusters by canonical equivalence relation.
>>
>> I still hope we can come to some conclusion on the correct Unicode name
>> for this concept. I don't think non-Unicode interpretations of terms
>> like "grapheme" are grounds for throwing out "grapheme cluster,"
>
> I agree.
>
>> but I can see that the equivalence class itself is lacking a name.
>
> I'm glad.
>
>>
>> Note that the Swift definition doesn't say that <00E9> and <0065 0301>
>> are identical entities, only that the language compares them as equal.
>
> I'm fully aware of this.
>
> Best regards
>
> Janusz

-- 
Prof. dr hab. Janusz S. Bien - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/
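
P.S. The distinction drawn in the quoted thread (<00E9> and <0065 0301>
are different code point sequences, yet compare as equal under canonical
equivalence) can be illustrated with a small sketch. It is written in
Python rather than Swift and uses only the standard unicodedata module.
The helper names canonical_key and canonically_equal are made up for
this illustration; picking the NFD form as the representative of an
equivalence class is one possible choice, not something the thread
prescribes. The sketch does not segment text into extended grapheme
clusters, which the Python standard library does not provide.

```python
import unicodedata


def canonical_key(text: str) -> str:
    """Hypothetical helper: pick the NFD form as the representative
    of the canonical-equivalence class that `text` belongs to."""
    return unicodedata.normalize("NFD", text)


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: True if `a` and `b` are canonically equivalent."""
    return canonical_key(a) == canonical_key(b)


precomposed = "\u00e9"             # <00E9>, e with acute, one code point
decomposed = "e\u0301"             # <0065 0301>, e + combining acute
stacked = "e\u0304\u0301\u0308"    # e + macron + acute + dieresis

# Number of code points in each sequence.
print(len(precomposed), len(decomposed), len(stacked))

# Compared code point by code point, the two sequences differ ...
print(precomposed == decomposed)

# ... but they fall into the same canonical-equivalence class.
print(canonically_equal(precomposed, decomposed))
```

In this sketch the two spellings of e-with-acute remain distinct strings,
and only the comparison through the normalized key treats them as equal,
which mirrors the remark that the language compares them as equal without
declaring them identical entities.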