Re: Standardized variation sequences for the Deseret alphabet?

2017-03-24 Thread Martin J. Dürst

On 2017/03/23 22:32, Michael Everson wrote:


>> What is right for Deseret has to be decided by and for Deseret users, rather
>> than by script historians.
>
> Odd. That view doesn’t seem to be applicable to CJK unification.


Well, it may not seem so to you, but actually it is. I have had a lot of 
discussions with Japanese and others about Han unification (mostly in 
the 1990s), and have studied the history and principles of Han 
unification in quite some detail.


To summarize, Han unification unifies pretty much exactly those cases 
where an average user, in average texts, would consider two forms "the 
same" (i.e. interchangeable). Exceptions are due to the round-trip rule. It 
also separates pretty much exactly those cases where an average user, in 
average texts, may not consider two forms equivalent.


If necessary, I can go into further details, but I would have to dig 
quite deeply for some of the sources.


Regards,   Martin.


Re: Standardized variation sequences for the Deseret alphabet?

2017-03-23 Thread Michael Everson

> On 23 Mar 2017, at 05:54, Martin J. Dürst  wrote:
> 
> Hello Michael, others,
> 
> [Fixed script name in subject.]
> 
> On 2017/03/23 09:03, Michael Everson wrote:
>> On 22 Mar 2017, at 21:39, David Starner  wrote:
> 
>>> There's the same characters here, written in different ways.
>> 
>> No, it’s not. It’s the same diphthong (a sound) written with different 
>> letters.
> 
> I think this may well be the *historically* correct analysis. And that may 
> have some influence on how to encode this, but it shouldn't be dominant.

Well, Martin, maybe you’re comfortable with shifting goalposts, but we have 
used historically correct analysis to identify characters in the past and to 
continue with this precedent is consistent with good practice. 

> What's most important is (past and) *current use*. If the distinction is an 
> orthographic one (e.g. different words being written with different shapes), 
> then that's definitely a good indication for splitting.

It *is* an orthographic one. For one thing, the 1859 glyphs look NOTHING LIKE 
the 1855 glyphs. 


> On the other hand, if fonts (before/outside Unicode) only include one variant 
> at a time, if people read over the variant without much ado, if people 
> would be surprised to find both corresponding variants in one and the same 
> text (absent font variations), if there are examples where e.g. the variant 
> is adjusted in quotes from texts that used the 'old' variant inside a text 
> with the 'new' variants, and so on, then all these would be good indications 
> that this is, for actual usage purposes, just a font difference, and should 
> therefore best be handled as such.

Um, yeah. Why have Unicode at all? I mean people in Georgia were happy with 
ASCII-based font hacks. Lots of people are still using them. Sure, people put 
up with the unification of Coptic and Greek. 

Just font differences. Yeah. 

> The closest to the current case that I was able to find was the German ß. It 
> has roots in both an ss and an sz (to be precise, an ſs and an ſz) ligature 
> (see https://en.wikipedia.org/wiki/ß). And indeed in some fonts, its right 
> part looks more like an s, and in other fonts more like a z (and in lower 
> case, more often like an s, but in upper case, much more like a (cursive) Z). 
> Nevertheless, there is only one character (or two if you count upper case) 
> encoded, because anything else would be highly confusing to virtually all 
> users.

The situation of the Deseret diphthong letters isn’t anything like German ß. 
Yes, you can analyse it as something like ſs and ſȥ, but THOSE LOOK VERY NEARLY 
ALIKE.

Ignoring the stroke of SHORT I which is the same for all the Deseret letters 
being discussed, we have EW represented by Ѕ and Ћ (which look nothing alike) 
and OI represented by Љ and Ѓ (which look nothing alike).

A unification of these as “glyph variants” is perverse and not consistent with 
the way we have encoded things in the past.

> What is right for Deseret has to be decided by and for Deseret users, rather 
> than by script historians.

Odd. That view doesn’t seem to be applicable to CJK unification.

Michael


Re: Standardized variation sequences for the Deseret alphabet?

2017-03-23 Thread Philippe Verdy
2017-03-23 6:54 GMT+01:00 Martin J. Dürst :

> Hello Michael, others,
>
> On 2017/03/23 09:03, Michael Everson wrote:
>
>> On 22 Mar 2017, at 21:39, David Starner  wrote:
>>
>>> There's the same characters here, written in different ways.
>>
>> No, it’s not. It’s the same diphthong (a sound) written with different
>> letters.
>
> The closest to the current case that I was able to find was the German ß.
> It has roots in both an ss and an sz (to be precise, an ſs and an ſz)
> ligature (see https://en.wikipedia.org/wiki/ß). And indeed in some fonts,
> its right part looks more like an s, and in other fonts more like a z (and
> in lower case, more often like an s, but in upper case, much more like a
> (cursive) Z). Nevertheless, there is only one character (or two if you
> count upper case) encoded, because anything else would be highly confusing
> to virtually all users.
>

This is a good case for encoding explicit variants, including for the two
forms of German ß, to distinguish letter forms in historic (medieval?) texts
where ſs and ſz were more clearly distinguished. This does not require
disunification: fonts that have both forms can choose the correct glyph for
each variant, and take a default form for the unified character depending
on the contextual language (if it is detected) or on the font style
itself (if it was initially designed for a specific language, notably in
medieval styles).
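
To make the "variant without disunification" idea concrete, here is a minimal
sketch in Python. It assumes a purely hypothetical sequence of ß followed by
VARIATION SELECTOR-1; no such sequence is registered for ß, and the helper
name strip_variation_selectors is made up for illustration. It only shows
that the base character stays the same and that a process that ignores the
selector gets the unified text back.

```python
import unicodedata

SHARP_S = "\u00DF"  # LATIN SMALL LETTER SHARP S
VS1 = "\uFE00"      # VARIATION SELECTOR-1

# Hypothetical sequence, for illustration only: nothing like this is
# registered for the sharp s.
hypothetical_variant = SHARP_S + VS1


def strip_variation_selectors(text):
    """Drop U+FE00..U+FE0F so that only the unified characters remain."""
    return "".join(ch for ch in text if not ("\uFE00" <= ch <= "\uFE0F"))


word = "Stra" + hypothetical_variant + "e"
unified = strip_variation_selectors(word)
```

A font that knows the sequence could pick a specific glyph for it; anything
else would fall back to the default glyph of the unified character.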


> What is right for Deseret has to be decided by and for Deseret users,
> rather than by script historians.
>

In historic texts it is not clear which letter form is better than the
other, and historic Deseret was basically for a single language (but there
may have been regional variants preferring one form over the other). I
think that the distinction is in fact more recent, where some people
want to distinguish the forms for new uses. Here also a
variant encoding would solve these special cases, but we should not disunify
the character (and in fact there are not many fonts, except for fancy
uses such as mimicking how specific authors drew these shapes by hand;
I have not, however, seen any conclusive case
of distinction in typeset texts).

In fact we are in a situation similar to the shape variants of decimal
digits like 4 (open or closed), 7 (with or without a crossbar), 0
(with a slash, a dot, or neither), and 3 (with an angular or rounded
top), or of letters like g (with a looped tail, or just an open hook:
here a distinctive shape was encoded for the IPA symbol).

>
> Regards,   Martin.
>


Re: Standardized variation sequences for the Deseret alphabet?

2017-03-23 Thread Richard Wordingham
On Thu, 23 Mar 2017 11:23:27 +0100
Otto Stolz  wrote:

> Same issue as with German sharp S: The blackletter »ß« derives from an
> ſ-z ligature (thence its German name »Eszet«), whilst the Roman type
> »ß« derives from an ſ-s ligature. Still, we encode both variants as
> identical letters. I’ve got a print from 1739 with legends in both
> German (blackletter) and French (Roman italics), comprising both types
> of ligatures in one single document.

There's another, lesser German analogy.  If I understand correctly, in
some styles the diaeresis and umlaut marks may be distinguished
visually.  While it is permissible to use CGJ to mark the difference,
the TUS claims (TUS 9.0 p833, in Section 23.2) that CGJ does not affect
rendering, except for the direct effect of blocking canonical
reordering.  (This does appear to be in contrast to its seemingly
archaic effect in inhibiting line-breaking.)

However, combining marks are, by policy, unified more readily than
letters.
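
The point about CGJ blocking canonical reordering can be checked with a
small sketch in Python. The choice of marks (a diaeresis and a dot below on
the letter a) is only an assumption for illustration and is not taken from
the German case itself.

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER, combining class 0
DIAERESIS = "\u0308"  # COMBINING DIAERESIS, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220


def nfd(text):
    """Canonical decomposition, which includes canonical reordering."""
    return unicodedata.normalize("NFD", text)


plain = "a" + DIAERESIS + DOT_BELOW
with_cgj = "a" + DIAERESIS + CGJ + DOT_BELOW

# Without CGJ the two marks are sorted by combining class (220 before 230).
reordered = nfd(plain)

# With CGJ in between, the marks are not adjacent, so their order is kept.
blocked = nfd(with_cgj)
```

Whether a renderer draws the two sequences differently is a separate
question; the sketch only covers normalization.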

Richard.



Re: Standardized variation sequences for the Deseret alphabet?

2017-03-23 Thread Otto Stolz

Hello Michael, others,

On 2017/03/23 09:03, Michael Everson wrote:
> It’s the same diphthong (a sound) written with different
> letters.

On 23.03.2017 at 06:54, Martin J. Dürst wrote:
> I think this may well be the *historically* correct analysis. And that
> may have some influence on how to encode this, but it shouldn't be
> dominant.
>
> What's most important is (past and) *current use*.


Same issue as with German sharp S: The blackletter »ß« derives from an
ſ-z ligature (thence its German name »Eszet«), whilst the Roman type
»ß« derives from an ſ-s ligature. Still, we encode both variants as
identical letters. I’ve got a print from 1739 with legends in both
German (blackletter) and French (Roman italics), comprising both types
of ligatures in one single document.

Best wishes,
  Otto



Re: Standardized variation sequences for the Deseret alphabet?

2017-03-23 Thread James Kass
Martin J. Dürst wrote,

> What is right for Deseret has to be decided by
> and for Deseret users, rather than by script
> historians.

The Universal Character Set is used by everyone, including script
historians.  While modern day deployment of the script is determined
by its users, the proper encoding of the script should be determined by
character encoders based upon expert input from all interested
parties.

Best regards,

James Kass



Re: Standardized variation sequences for the Deseret alphabet?

2017-03-23 Thread Martin J. Dürst

Hello Michael, others,

[Fixed script name in subject.]

On 2017/03/23 09:03, Michael Everson wrote:
> On 22 Mar 2017, at 21:39, David Starner  wrote:
>
>> There's the same characters here, written in different ways.
>
> No, it’s not. It’s the same diphthong (a sound) written with different letters.


I think this may well be the *historically* correct analysis. And that 
may have some influence on how to encode this, but it shouldn't be dominant.


What's most important is (past and) *current use*. If the distinction is 
an orthographic one (e.g. different words being written with different 
shapes), then that's definitely a good indication for splitting.


On the other hand, if fonts (before/outside Unicode) only include one 
variant at a time, if people read over the variant without much ado, 
if people would be surprised to find both corresponding variants in one 
and the same text (absent font variations), if there are examples where 
e.g. the variant is adjusted in quotes from texts that used the 'old' 
variant inside a text with the 'new' variants, and so on, then all these 
would be good indications that this is, for actual usage purposes, just 
a font difference, and should therefore best be handled as such.


The closest to the current case that I was able to find was the German ß. 
It has roots in both an ss and an sz (to be precise, an ſs and an ſz) 
ligature (see https://en.wikipedia.org/wiki/ß). And indeed in some 
fonts, its right part looks more like an s, and in other fonts more like 
a z (and in lower case, more often like an s, but in upper case, much 
more like a (cursive) Z). Nevertheless, there is only one character (or 
two if you count upper case) encoded, because anything else would be 
highly confusing to virtually all users.
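
The "one character (or two if you count upper case)" statement can be
checked against the Unicode Character Database. The following is a minimal
sketch using Python's unicodedata module; the variable names are only for
illustration.

```python
import unicodedata

SHARP_S = "\u00DF"          # ß
CAPITAL_SHARP_S = "\u1E9E"  # ẞ
LONG_S = "\u017F"           # ſ

# Character names as recorded in the Unicode Character Database.
names = {ch: unicodedata.name(ch) for ch in (SHARP_S, CAPITAL_SHARP_S, LONG_S)}

# ß has no decomposition at all: the encoding takes no position on whether
# it "is" an ſs or an ſz ligature.
sharp_s_decomposition = unicodedata.decomposition(SHARP_S)

# The long s, in contrast, has a compatibility mapping to a plain s.
long_s_folded = unicodedata.normalize("NFKC", LONG_S)

# Default case mappings: ß uppercases to SS, while ẞ lowercases to ß.
upper_of_sharp_s = SHARP_S.upper()
lower_of_capital = CAPITAL_SHARP_S.lower()
```

Which glyph shape is shown for ß is then left entirely to the font.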


What is right for Deseret has to be decided by and for Deseret users, 
rather than by script historians.


Regards,   Martin.


The glyphs may come from a different origin, but it's encoding the same idea.


We don’t encode diphthongs. We encode the elements of writing systems. The 
“idea” here is represented by one ligature of І + Ѕ (1855 EW), one ligature of 
І + Ћ (1859 EW), one ligature of Љ + І (1855 OI), and one ligature of Ѓ + І 
(1859 OI).

Those ligatures are not glyph variants of one another. You might as well say 
that Æ and Œ are glyph variants of one another.


If a user community considers them separate, then they should be separated, but 
I don't see that happening, and from an idealistic perspective, I think they're 
platonically the same.


I do not agree with that analysis. The ligatures and their constituent parts 
are distinct and distinctive. In fact, it might have been that the choice for 
revision was to improve the underlying phonology. In any case, there’s no way 
that the bottom pair in 
https://en.wikipedia.org/wiki/Deseret_alphabet#/media/File:Deseret_glyphs_ew_and_oi_transformation_from_1855_to_1859.svg
 can be considered to be “glyph variants” of the top pair. Usage is one thing. 
Character identity is another. Æ is not Œ. A ligature of І + Ѕ is not a 
ligature of І + Ћ.

Michael Everson