Re: Standaridized variation sequences for the Deseret alphabet?
On 2017/03/23 22:32, Michael Everson wrote:

>> What is right for Deseret has to be decided by and for Deseret users, rather than by script historians.
>
> Odd. That view doesn’t seem to be applicable to CJK unification.

Well, it may not seem so to you, but actually it is. I have had a lot of discussions with Japanese and others about Han unification (mostly in the '90s), and have studied the history and principles of Han unification in quite some detail.

To summarize, Han unification unifies pretty much exactly those cases where an average user, in average texts, would consider two forms "the same" (i.e. exchangeable). Exceptions are due to the round trip rule. It also separates pretty much exactly those cases where an average user, for average texts, may not consider two forms equivalent.

If necessary, I can go into further details, but I would have to dig quite deeply for some of the sources.

Regards, Martin.
Re: Standaridized variation sequences for the Deseret alphabet?
> On 23 Mar 2017, at 05:54, Martin J. Dürst wrote:
>
> Hello Michael, others,
>
> [Fixed script name in subject.]
>
> On 2017/03/23 09:03, Michael Everson wrote:
>> On 22 Mar 2017, at 21:39, David Starner wrote:
>>
>>> There's the same characters here, written in different ways.
>>
>> No, it’s not. It's the same diphthong (a sound) written with different letters.
>
> I think this may well be the *historically* correct analysis. And that may have some influence on how to encode this, but it shouldn't be dominant.

Well, Martin, maybe you’re comfortable with shifting goalposts, but we have used historically correct analysis to identify characters in the past and to continue with this precedent is consistent with good practice.

> What's most important is (past and) *current use*. If the distinction is an orthographic one (e.g. different words being written with different shapes), then that's definitely a good indication for splitting.

It *is* an orthographic one. For one thing, the 1859 glyphs look NOTHING LIKE the 1855 glyphs.

> On the other hand, if fonts (before/outside Unicode) only include one variant at the time, if people read over the variant without much ado, if people would be surprised to find both corresponding variants in one and the same text (absent font variations), if there are examples where e.g. the variant is adjusted in quotes from texts that used the 'old' variant inside a text with the 'new' variants, and so on, then all these would be good indications that this is, for actual usage purposes, just a font difference, and should therefore best be handled as such.

Um, yeah. Why have Unicode at all? I mean people in Georgia were happy with ASCII-based font hacks. Lots of people are still using them. Sure, people put up with the unification of Coptic and Greek. Just font differences. Yeah.

> The closest to the current case that I was able to find was the German ß. It has roots in both an ss and an sz (to be precise, an ſs and an ſz) ligature (see https://en.wikipedia.org/wiki/ß). And indeed in some fonts, its right part looks more like an s, and in other fonts more like a z (and in lower case, more often like an s, but in upper case, much more like a (cursive) Z).
> Nevertheless, there is only one character (or two if you count upper case) encoded, because anything else would be highly confusing to virtually all users.

The situation of the Deseret diphthong letters isn’t anything like German ß. Yes, you can analyse it as something like ſs and ſȥ, but THOSE LOOK VERY NEARLY ALIKE. Ignoring the stroke of SHORT I which is the same for all the Deseret letters being discussed, we have EW represented by 𐐅 and 𐐋 (which look nothing alike) and OI represented by 𐐉 and 𐐃 (which look nothing alike). A unification of these as “glyph variants” is perverse and not consistent with the way we have encoded things in the past.

> What is right for Deseret has to be decided by and for Deseret users, rather than by script historians.

Odd. That view doesn’t seem to be applicable to CJK unification.

Michael
Re: Standaridized variation sequences for the Deseret alphabet?
2017-03-23 6:54 GMT+01:00 Martin J. Dürst:

> Hello Michael, others,
>
> On 2017/03/23 09:03, Michael Everson wrote:
>
>> On 22 Mar 2017, at 21:39, David Starner wrote:
>>
>>> There's the same characters here, written in different ways.
>>
>> No, it’s not. It's the same diphthong (a sound) written with different letters.
>
> The closest to the current case that I was able to find was the German ß. It has roots in both an ss and an sz (to be precise, an ſs and an ſz) ligature (see https://en.wikipedia.org/wiki/ß). And indeed in some fonts, its right part looks more like an s, and in other fonts more like a z (and in lower case, more often like an s, but in upper case, much more like a (cursive) Z). Nevertheless, there is only one character (or two if you count upper case) encoded, because anything else would be highly confusing to virtually all users.

This is a good case for encoding explicit variants, including for the two German ß, to distinguish letter forms in historic (medieval?) texts where ſs and ſz were more distinguished. This does not require disunification, and fonts that can have both forms can choose the correct glyph to use for each variant, and take a default form for the unified character depending on the contextual language (if it is detected) or based on the font style itself (if it was initially designed for a specific language, notably in medieval styles).

> What is right for Deseret has to be decided by and for Deseret users, rather than by script historians.

In historic texts it is not clear which letter form is better than the other, and historic Deseret was basically for a single language (but there may have been regional variants preferring one form over the other). I think that the distinction is in fact more recent, where some people will want to distinguish them for new uses with distinctions. Here also a variant encoding would solve these special cases, but we should not disunify the character (and in fact there are not a lot of fonts except for fancy usages, such as trying to mimic handwritten styles for specific authors about how they draw these shapes; I have not, however, seen any conclusive case of distinction in typeset texts).

In fact we are in a situation similar to the case of shapes for decimal digits like 4 (open or closed), 7 (with an overstriking bar or none), or 0 (with an overstriking slash or dot, or none), 3 (with an angular or circle top part), or letters like g (with a curled leg drawn counterclockwise, or just a bottom foot from right to left: here a distinctive shape was encoded for the IPA symbol).

> Regards, Martin.
Re: Standaridized variation sequences for the Deseret alphabet?
On Thu, 23 Mar 2017 11:23:27 +0100 Otto Stolz wrote:

> Same issue as with German sharp S: The blackletter »ß« derives from an ſ-z ligature (thence its German name »Eszet«), whilst the Roman type »ß« derives from an ſ-s ligature. Still, we encode both variants as identical letters. I’ve got a print from 1739 with legends in both German (blackletter) and French (Roman italics), comprising both types of ligatures in one single document.

There's another, lesser German analogy. If I understand correctly, in some styles the diaeresis and umlaut marks may be distinguished visually. While it is permissible to use CGJ to mark the difference, the TUS claims (TUS 9.0 p833, in Section 23.2) that CGJ does not affect rendering, except for the direct effect of blocking canonical reordering. (This does appear to be in contrast to its seemingly archaic effect in inhibiting line-breaking.) However, combining marks are, by policy, unified more readily than letters.

Richard.
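[Editorial illustration, not part of the original message.] The "blocking canonical reordering" effect of CGJ mentioned above can be observed with a short Python sketch. The choice of base letter and of the two combining marks is an assumption made purely for demonstration; nothing here is specific to German umlauts or to Deseret, and the sketch says nothing about how a font would render either sequence.

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER, canonical combining class 0
DIAERESIS = "\u0308"  # COMBINING DIAERESIS, canonical combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, canonical combining class 220


def nfd(text: str) -> str:
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


def codepoints(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Without CGJ, normalization sorts the marks by combining class,
# so the dot below (220) moves in front of the diaeresis (230).
plain = "a" + DIAERESIS + DOT_BELOW

# With CGJ (class 0) between them, the marks are not adjacent in the
# sense of the reordering algorithm, so their order is preserved.
with_cgj = "a" + DIAERESIS + CGJ + DOT_BELOW

print(codepoints(nfd(plain)))
print(codepoints(nfd(with_cgj)))
```

Comparing the two printed code point lists shows whether the order of the marks survived normalization, which is the only effect the cited TUS passage attributes to CGJ.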
Re: Standaridized variation sequences for the Deseret alphabet?
Hello Michael, others,

On 2017/03/23 09:03, Michael Everson wrote:

> It's the same diphthong (a sound) written with different letters.

On 23.03.2017 at 06:54, Martin J. Dürst wrote:

> I think this may well be the *historically* correct analysis. And that may have some influence on how to encode this, but it shouldn't be dominant. What's most important is (past and) *current use*.

Same issue as with German sharp S: The blackletter »ß« derives from an ſ-z ligature (thence its German name »Eszet«), whilst the Roman type »ß« derives from an ſ-s ligature. Still, we encode both variants as identical letters. I’ve got a print from 1739 with legends in both German (blackletter) and French (Roman italics), comprising both types of ligatures in one single document.

Best wishes, Otto
Re: Standaridized variation sequences for the Deseret alphabet?
Martin J. Dürst wrote,

> What is right for Deseret has to be decided by and for Deseret users, rather than by script historians.

The Universal Character Set is used by everyone, including script historians. While modern day deployment of the script is determined by its users, the proper encoding of the script should be determined by character encoders based upon expert input from all interested parties.

Best regards, James Kass
Re: Standaridized variation sequences for the Deseret alphabet?
Hello Michael, others,

[Fixed script name in subject.]

On 2017/03/23 09:03, Michael Everson wrote:

> On 22 Mar 2017, at 21:39, David Starner wrote:
>
>> There's the same characters here, written in different ways.
>
> No, it’s not. It's the same diphthong (a sound) written with different letters.

I think this may well be the *historically* correct analysis. And that may have some influence on how to encode this, but it shouldn't be dominant.

What's most important is (past and) *current use*. If the distinction is an orthographic one (e.g. different words being written with different shapes), then that's definitely a good indication for splitting.

On the other hand, if fonts (before/outside Unicode) only include one variant at the time, if people read over the variant without much ado, if people would be surprised to find both corresponding variants in one and the same text (absent font variations), if there are examples where e.g. the variant is adjusted in quotes from texts that used the 'old' variant inside a text with the 'new' variants, and so on, then all these would be good indications that this is, for actual usage purposes, just a font difference, and should therefore best be handled as such.

The closest to the current case that I was able to find was the German ß. It has roots in both an ss and an sz (to be precise, an ſs and an ſz) ligature (see https://en.wikipedia.org/wiki/ß). And indeed in some fonts, its right part looks more like an s, and in other fonts more like a z (and in lower case, more often like an s, but in upper case, much more like a (cursive) Z). Nevertheless, there is only one character (or two if you count upper case) encoded, because anything else would be highly confusing to virtually all users.

What is right for Deseret has to be decided by and for Deseret users, rather than by script historians.

Regards, Martin.

> The glyphs may come from a different origin, but it's encoding the same idea.

We don’t encode diphthongs. We encode the elements of writing systems. The “idea” here is represented by one ligature of 𐐆 + 𐐅 (1855 EW), one ligature of 𐐆 + 𐐋 (1859 EW), one ligature of 𐐉 + 𐐆 (1855 OI), and one ligature of 𐐃 + 𐐆 (1859 OI). Those ligatures are not glyph variants of one another. You might as well say that Æ and Œ are glyph variants of one another.

> If a user community considers them separate, then they should be separated, but I don't see that happening, and from an idealistic perspective, I think they're platonically the same.

I do not agree with that analysis. The ligatures and their constituent parts are distinct and distinctive. In fact, it might have been that the choice for revision was to improve the underlying phonology. In any case, there’s no way that the bottom pair in https://en.wikipedia.org/wiki/Deseret_alphabet#/media/File:Deseret_glyphs_ew_and_oi_transformation_from_1855_to_1859.svg can be considered to be “glyph variants” of the top pair.

Usage is one thing. Character identity is another. Æ is not Œ. A ligature of 𐐆 + 𐐅 is not a ligature of 𐐆 + 𐐋.

Michael Everson