On Mon, 23 Oct 2017 05:47 pm, Rustom Mody wrote:

> On Monday, October 23, 2017 at 8:06:03 AM UTC+5:30, Lawrence D’Oliveiro
> wrote:
[...]
>> Bear in mind that the logical representation of the text is as code points,
>> graphemes would have more to do with rendering.
>
> Heh! Speak of Euro/Anglo-centrism!

I think that Lawrence may be thinking of glyphs. Glyphs are the display forms
that are rendered. Graphemes are the smallest units of written language.

> In a sane world graphemes would be called letters

Graphemes *aren't* letters. For starters, not all written languages have an
alphabet. No alphabet, no letters. Even in languages with an alphabet, not all
graphemes are letters.

Graphemes include:

- logograms (symbols which represent a morpheme, an entire word, or a
  phrase), e.g. Chinese characters, ampersand &, the ™ trademark or
  ® registered trademark symbols;

- syllabic characters such as Japanese kana or Cherokee;

- letters of alphabets;

- letters with added diacritics;

- punctuation marks;

- mathematical symbols;

- typographical symbols;

- word separators;

and more. Many linguists also include digraphs (pairs of letters) like the
English "th", "sh", "qu", or "gh" as graphemes.

https://www.thoughtco.com/what-is-a-grapheme-1690916

https://en.wikipedia.org/wiki/Grapheme

> And unicode codepoints would be called something else — letterlets??
> To be fair to the Unicode consortium, they strive hard to call them
> codepoints But in an anglo-centric world, the conflation of codepoint to
> letter is inevitable I guess. To hear how a non Roman-centric view of the
> world would sound:

> A 'w' is a poorly double-struck 'u'
> A 't' is a crossed 'l'
> Reasonable?

No, T is not a crossed L -- they are unrelated letters and the visual
similarity is a coincidence. They are no more connected than E and F are: E is
not just an F with an extra line.

But you are more right than you knew regarding W: it *literally was* a
doubled-up V (sometimes written U) once upon a time. For a long time W did not
appear in the Latin alphabet, even after people used it in written text. It
was considered a digraph VV, then a ligature, and finally, only gradually, a
proper letter.
As late as the 16th century the German grammarian Valentin Ickelshamer
complained that hardly anyone, including school masters, knew what to do with
W or what it was called.

https://en.wikipedia.org/wiki/W#History

> The lead of https://en.wikipedia.org/wiki/%C3%9C has
>
> | Ü, or ü, is a character…classified as a separate letter in several
> | extended Latin alphabets
> | (including Azeri, Estonian, Hungarian and Turkish), but as the letter U
> | with an umlaut/diaeresis in others such as Catalan, French, Galician,
> | German, Occitan and Spanish.

Indeed: sometimes the same grapheme is considered a letter in one language and
a letter-plus-diacritic in another.

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
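
P.S. The Ü case has a rough parallel at the code point level, which may help
with the code point versus grapheme distinction discussed above. What follows
is only a minimal sketch, assuming nothing beyond the standard library's
unicodedata module; the helper name code_points is made up for illustration.
The same visible ü can be stored as a single code point or as a base letter
followed by a combining diacritic, and normalization converts between the two
forms:

```python
import unicodedata

# Two ways of storing what a reader sees as the same "ü".
precomposed = "\u00fc"   # LATIN SMALL LETTER U WITH DIAERESIS, one code point
decomposed = "u\u0308"   # "u" followed by COMBINING DIAERESIS, two code points


def code_points(s):
    """Hypothetical helper: list the code points of s in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


# len() counts code points, not graphemes, so the two strings differ
# in length and do not compare equal as stored.
print(code_points(precomposed), len(precomposed))
print(code_points(decomposed), len(decomposed))
print(precomposed == decomposed)

# NFC composes where a precomposed form exists; NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == decomposed)
```

Whether either form counts as "a letter" or as "a letter plus a diacritic" is
still a question for the language concerned, not for the encoding. The sketch
only shows that counting code points does not settle it.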