Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Michael Everson

> On 26 Mar 2017, at 09:12, Martin J. Dürst  wrote:
> 
>> That's a good point: any disunification requires showing examples of
>> contrasting uses.
> 
> Fully agreed.

The default position is NOT “everything is encoded unified until disunified”. 
The characters in question have different and undisputed origins, undisputed. 
We’ve encoded one pair; evidently this pair was deprecated and another pair was 
devised. The letters wynn and w are also used for the same thing. They too have 
different origins and are encoded separately. The letters yogh and ezh have 
different origins and are encoded separately. (These are not perfect analogies, 
but they are pertinent.)

> We haven't yet heard of any contrasting uses for the letter shapes we are 
> discussing.

Contrasting use is NOT the only criterion we apply when establishing the 
characterhood of characters. Please try to remember that. (It’s a bit shocking 
to have to remind people of this.)

Michael Everson


Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Asmus Freytag

  
  
On 3/26/2017 1:51 PM, Michael Everson wrote:

>> Finally, if this was in major, modern use, adding these code points would
>> have grave consequences for security.
>
> Why? They’re not visually similar to the existing characters. So spoofing
> wouldn’t be an issue.

Spoofing would absolutely be an issue, because if there are free alternates,
users will misremember which one was used for a given label. The same goes
for the whole simplified / traditional issue in the Han script.

Issues are not limited to visual similarity.

A./
  
  



Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Asmus Freytag

On 3/26/2017 9:20 AM, Michael Everson wrote:

> On 26 Mar 2017, at 16:45, Asmus Freytag  wrote:
>
>> The priority in encoding has to be with allowing distinctions in modern
>> texts, or distinctions that matter to modern users of historic writing
>> systems. Beyond that, theoretical analysis of typographical evolution can
>> give some interesting insight, but I would be in the camp that does not
>> accord them a status as primary rationale for encoding decisions.
>
> Our rationales are NOT ranked in the way you suggest. A variety of criteria
> are applied.

And the way you weigh the criteria?

>> Thus, critical need for contrasting use of the glyph distinctions would have
>> to be established before it makes sense to discuss this further.
>
> Precedent for such needs is well-established. Consider the Latin Extended-D
> block. Sometimes it is editorial preference, and that’s not even always
> universal.

I think the Latin Extended-D block may have its own problems.

However, Latin as a script caters to so many varied levels of users,
from ordinary text to scholarly notations, that it really cannot be used
to settle this issue.

>> I see no principled objection to having a font choice result in a noticeable
>> or structural glyph variation for only a few elements of an alphabet. We have
>> handle-a vs. bowl-a as well as hook-g vs. loop-g in Latin, and fonts
>> routinely select one or the other.
>
> Well, Asmus, we encode a and ɑ as well as g and ɡ and ᵹ.

And we do that for reasons that are very different from preserving the
early and possibly transient history of a minor script.

> And we do not consider ɑ and ɡ and ᵹ to be things that ought to be
> distinguished by variation selectors. (I am of course well aware of IPA
> usage.)

Yes, and the absence of such usage in the current example makes all the
difference.

> Whole-font switching is well understood. But character origin has always been
> taken into account. Consider 2EBC ⺼ CJK RADICAL MEAT and 2E9D ⺝ CJK RADICAL
> MOON which are apparently really supposed to have identical glyphs, though we
> use an old-fashioned style in the charts for the former. (Yes, I am of course
> aware that there are other reasons for distinguishing these, but as far as
> glyphs go, even our standard distinguishes them artificially.)

Apparently not only in the standard, because they show as different in
the plaintext view of this message.

>> (It is only for usage outside normal text that the distinction between these
>> forms matters).
>
> What’s “normal” text? “Normal” text in Latin probably doesn’t use the
> characters from the Latin Extended-D block.

"Ordinary" text, if you like, reflecting standard orthographies.

As opposed to notational systems.

>> While the Deseret forms are motivated by their pronunciation, I'm not
>> necessarily convinced that the distinction has any practical significance
>> that is in any way different than similar differences in derivation (e.g. for
>> long s-s or long-s-z for German esszett).
>
> One practical consequence of changing the chart glyphs now, for instance,
> would be that it would invalidate every existing Deseret font. Adding new
> characters would not.

No, if we state that both glyphs are alternates for the same character
*and if we decide _not_ to add variation selectors*, the choice is where
it belongs: with the font maker.

>> In fact, it would seem that if a Deseret text was encoded in one of the two
>> systems, changing to a different font would have the attractive property of
>> preserving the content of the text (while not preserving the appearance).
>
> Changing to a different font in order to change one or two glyphs is a
> mechanism that we have actually rejected many times in the past. We have
> encoded variant and alternate characters for many scripts.

If the underlying text element is the same, font switching can be the
correct choice.

>> This, in a nutshell, is the criterion for making something a font difference
>> vs. an encoding distinction.
>
> Character identity is not defined by any single criterion.

Make it the "primary" criterion then.

> Moreover, in Deseret, it is not the case that all texts which contain the
> diphthong /juː/ or /ɔɪ/ write it using EW 𐐧 or OI 𐐦. Many write them as
> Y + U 𐐏𐐋 and O + I 𐐄𐐆. So the choice is one of *spelling*, and spelling has
> always been a primary criterion for such decisions.

Yes, and those other spellings are not affected.

>>> This is complicated by combining characters mostly identified by glyph, and
>>> the fact that while ä and aͤ may be the same character across time, there
>>> are people wanting to distinguish them in the same text today, and in both
>>> cases the theoretical falls to the practical. In this case, there are no
>>> combining character issues and there's nobody needing to use the two forms
>>> in the same text.
>>
>> huh?
>
> He’s wrong there, as I pointed out. A text in German may write an older
> Clavieruͤbung in a citation alongside the normal spelling Klavierübung. The
> choice of spelling is key.
That would 

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Asmus Freytag

On 3/26/2017 9:23 AM, Michael Everson wrote:

> On 26 Mar 2017, at 17:02, Asmus Freytag  wrote:
>
>> On 3/26/2017 6:18 AM, Michael Everson wrote:
>>
>>> In any case it’s not a disunification. Some characters are encoded; they
>>> were used to write diphthongs in 1855. These characters were abandoned by
>>> 1859, and other characters were devised.
>>
>> Calling them "characters" is pre-judging the issue, don't you think?
>
> No, I don’t think so.

I really think it is.

>> We know that these are different shapes, but that they stand for the same
>> text elements.
>
> No, they don’t. Those diphthongs can also be represented in other ways in
> Deseret.

Having alternative ways to represent these doesn't invalidate or affect
my argument.

> I’ve never accepted the view that “everything is already encoded and
> everything new is a disunification” which seems to be a pretty common view.

I would not say I aspire to the view you quote.

If you encode a certain shape, it may get used for a range of text 
elements. This would (de facto) encode these text elements via that 
shape. If it is later felt that the given shape should not be used for 
the full range of text elements, then you could say that the "implicit" 
unification based on the usage (or, if you will, "fallback usage") was 
mistaken and should be better handled by two (or more) shapes. This 
represents a "de-facto" disunification.


However, where I part from your description is the "everything is 
already encoded". That would not be the case anywhere a range of text 
elements cannot be represented at all. Your statement also implies a 
"correctly encoded" or "successfully encoded" which is different from 
"there's an encoding that some people use as a fallback", which, if 
disunification should prove proper later on, would be a better way of 
describing what was the original situation.


Perhaps the point is subtle, but it is important.

In the current case, you have the opposite, to wit, the text elements 
are unchanged, but you would like to add alternate code elements to 
represent what are, ultimately, the same text elements. That's not 
disunification, but dual encoding.


A./


Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Michael Everson
On 26 Mar 2017, at 17:02, Asmus Freytag  wrote:
> 
> On 3/26/2017 6:18 AM, Michael Everson wrote:
> 
>> In any case it’s not a disunification. Some characters are encoded; they 
>> were used to write diphthongs in 1855. These characters were abandoned by 
>> 1859, and other characters were devised.
> 
> Calling them "characters" is pre-judging the issue, don't you think?

No, I don’t think so.

> We know that these are different shapes, but that they stand for the same 
> text elements.

No, they don’t. Those diphthongs can also be represented in other ways in 
Deseret.

I’ve never accepted the view that “everything is already encoded and everything 
new is a disunification” which seems to be a pretty common view. 

Michael Everson




Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Doug Ewell

Michael Everson wrote:

> One practical consequence of changing the chart glyphs now, for
> instance, would be that it would invalidate every existing Deseret
> font. Adding new characters would not.


I thought the chart glyphs were not normative.

--
Doug Ewell | Thornton, CO, US | ewellic.org


Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Michael Everson
On 26 Mar 2017, at 18:20, Doug Ewell  wrote:
> 
> Michael Everson wrote:
> 
>> One practical consequence of changing the chart glyphs now, for instance, 
>> would be that it would invalidate every existing Deseret font. Adding new 
>> characters would not.
> 
> I thought the chart glyphs were not normative.

Come on, Doug. The letter W is a ligature of V and V. But sure, the glyphs are 
only informative, so why don’t we use an OO ligature instead?

Michael. 


Re: Diaeresis vs. umlaut (was: Re: Standaridized variation sequences for the Desert alphabet?)

2017-03-26 Thread Doug Ewell

Philippe Verdy wrote:

> Or maybe, only for historic texts, we could add a combining lowercase
> e as an alternative to the existing diaeresis.


Something like U+0364 COMBINING LATIN SMALL LETTER E, maybe?

--
Doug Ewell | Thornton, CO, US | ewellic.org
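
The code point Doug names can be checked directly. The following is a minimal Python sketch using only the standard library's unicodedata module; the variable names are illustrative and do not come from the thread. It shows that a + U+0308 composes to the precomposed ä under NFC, while a + U+0364 does not, so the two spellings stay distinct in plain text.

```python
import unicodedata

# Illustrative names, not from the thread.
PRECOMPOSED = "\u00e4"       # ä, LATIN SMALL LETTER A WITH DIAERESIS
COMBINING_E = "\u0364"       # the code point Doug mentions
with_diaeresis = "a\u0308"   # a + COMBINING DIAERESIS
with_small_e = "a" + COMBINING_E

# Official character name of U+0364.
print(unicodedata.name(COMBINING_E))

# NFC composes a + U+0308 into the precomposed letter...
diaeresis_composes = unicodedata.normalize("NFC", with_diaeresis) == PRECOMPOSED
# ...but leaves a + U+0364 alone: it is not canonically equivalent to ä.
small_e_composes = unicodedata.normalize("NFC", with_small_e) == PRECOMPOSED

print(diaeresis_composes, small_e_composes)
```

A text can therefore carry both Clavieruͤbung and Klavierübung side by side without any normalization step merging them.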



Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Michael Everson
On 26 Mar 2017, at 16:45, Asmus Freytag  wrote:
> 
> The priority in encoding has to be with allowing distinctions in modern 
> texts, or distinctions that matter to modern users of historic writing 
> systems. Beyond that, theoretical analysis of typographical evolution can 
> give some interesting insight, but I would be in the camp that does not 
> accord them a status as primary rationale for encoding decisions.

Our rationales are NOT ranked in the way you suggest. A variety of criteria are 
applied. 

> Thus, critical need for contrasting use of the glyph distinctions would have 
> to be established before it makes sense to discuss this further.

Precedent for such needs is well-established. Consider the Latin Extended-D 
block. Sometimes it is editorial preference, and that’s not even always 
universal. 

> I see no principled objection to having a font choice result in a noticeable 
> or structural glyph variation for only a few elements of an alphabet. We have 
> handle-a vs. bowl-a as well as hook-g vs. loop-g in Latin, and fonts 
> routinely select one or the other.

Well, Asmus, we encode a and ɑ as well as g and ɡ and ᵹ. And we do not consider 
ɑ and ɡ and ᵹ to be things that ought to be distinguished by variation 
selectors. (I am of course well aware of IPA usage.) Whole-font switching is 
well understood. But character origin has always been taken into account. 
Consider 2EBC ⺼ CJK RADICAL MEAT and 2E9D ⺝ CJK RADICAL MOON which are 
apparently really supposed to have identical glyphs, though we use an 
old-fashioned style in the charts for the former. (Yes, I am of course aware 
that there are other reasons for distinguishing these, but as far as glyphs go, 
even our standard distinguishes them artificially.)

> (It is only for usage outside normal text that the distinction between these 
> forms matters). 

What’s “normal” text? “Normal” text in Latin probably doesn’t use the 
characters from the Latin Extended-D block. 

> While the Deseret forms are motivated by their pronunciation, I'm not 
> necessarily convinced that the distinction has any practical significance 
> that is in any way different than similar differences in derivation (e.g. for 
> long s-s or long-s-z for German esszett).

One practical consequence of changing the chart glyphs now, for instance, would 
be that it would invalidate every existing Deseret font. Adding new characters 
would not. 

> In fact, it would seem that if a Deseret text was encoded in one of the two 
> systems, changing to a different font would have the attractive property of 
> preserving the content of the text (while not preserving the appearance). 

Changing to a different font in order to change one or two glyphs is a 
mechanism that we have actually rejected many times in the past. We have 
encoded variant and alternate characters for many scripts. 

> This, in a nutshell, is the criterion for making something a font difference 
> vs. an encoding distinction.

Character identity is not defined by any single criterion. Moreover, in 
Deseret, it is not the case that all texts which contain the diphthong /juː/ or 
/ɔɪ/ write it using EW 𐐧 or OI 𐐦. Many write them as Y + U 𐐏𐐋 and O + I 𐐄𐐆. So 
the choice is one of *spelling*, and spelling has always been a primary 
criterion for such decisions. 

>> This is complicated by combining characters mostly identified by glyph, and 
>> the fact that while ä and aͤ may be the same character across time, there 
>> are people wanting to distinguish them in the same text today, and in both 
>> cases the theoretical falls to the practical. In this case, 
>> there are no combining character issues and there's nobody needing to use 
>> the two forms in the same text. 
> 
> huh?

He’s wrong there, as I pointed out. A text in German may write an older 
Clavieruͤbung in a citation alongside the normal spelling Klavierübung. The 
choice of spelling is key.

Michael Everson
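
The spelling point above can be made concrete. This is a minimal Python sketch, assuming the encoded 1855 ligature-letters are U+10427 (EW) and U+10426 (OI) and that the two-letter spellings are YEE + SHORT OO and LONG O + SHORT I; the constant and function names are illustrative only. It shows that the ligature-letter and the two-letter spelling are different code point sequences that no normalization form maps onto each other.

```python
import unicodedata

# Illustrative constants; the code point assignments are stated assumptions.
EW = "\U00010427"                 # single ligature-letter for /juː/
OI = "\U00010426"                 # single ligature-letter for /ɔɪ/
EW_SPELLED_OUT = "\U0001040F\U0001040B"   # Y + U
OI_SPELLED_OUT = "\U00010404\U00010406"   # O + I


def describe(text):
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def same_after_normalization(a, b, form="NFKC"):
    """True if the two strings become identical under the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


for label, text in [("EW", EW), ("Y + U", EW_SPELLED_OUT),
                    ("OI", OI), ("O + I", OI_SPELLED_OUT)]:
    print(label, describe(text))

print(same_after_normalization(EW, EW_SPELLED_OUT))
print(same_after_normalization(OI, OI_SPELLED_OUT))
```

Because the two spellings never fold together, any matching across them would have to be done at a higher level, such as a tailored search or collation.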


Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Michael Everson

> On 26 Mar 2017, at 16:59, Asmus Freytag  wrote:
> 
> On 3/26/2017 8:47 AM, Michael Everson wrote:
>>> On 26 Mar 2017, at 16:45, Asmus Freytag  wrote:
>>> 
>>> The latter is patent nonsense, because ä and aͤ are even less related to 
>>> each other than "i" and "j"; never mind the fact that their forms are both 
>>> based on the letter "a". Encoding and font choice should be seen as 
>>> separate.
>> He refers to the shape of the diacritical marks.
> 
> I see the issue: the font selected on my end made the "e" look like an "o", 
> which completely changed my understanding of what he tried to communicate.

Ah, yes.

M





Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Asmus Freytag

On 3/26/2017 10:33 AM, Michael Everson wrote:

> On 26 Mar 2017, at 18:20, Doug Ewell  wrote:
>
>> Michael Everson wrote:
>>
>>> One practical consequence of changing the chart glyphs now, for instance,
>>> would be that it would invalidate every existing Deseret font. Adding new
>>> characters would not.
>>
>> I thought the chart glyphs were not normative.
>
> Come on, Doug. The letter W is a ligature of V and V. But sure, the glyphs
> are only informative, so why don’t we use an OO ligature instead?


If there was a tradition of writing W like omega, then switching the 
chart glyphs to that alternative tradition would be something that is at 
least not inconceivable -- even if perhaps not advisable.


For letters, their primary identity is not given by their shape, but 
their position / function in the alphabet.


That's why making Gaelic style and Fraktur a font switch works at all, 
even if that is not perfect (viz, ligatures in Fraktur).


In the Deseret case, making this alternation a font choice would tend to 
preserve the content of all documents. Making this an encoding 
difference would indeed invalidate some documents.


Finally, if this was in major, modern use, adding these code points 
would have grave consequences for security.


A./





Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Michael Everson
On 26 Mar 2017, at 21:39, Asmus Freytag  wrote:

>> Come on, Doug. The letter W is a ligature of V and V. But sure, the glyphs 
>> are only informative, so why don’t we use an OO ligature instead?
> 
> If there was a tradition of writing W like omega, then switching the chart 
> glyphs to that alternative tradition would be something that is at least not 
> inconceivable -- even if perhaps not advisable.

You know, Asmus, no analogy is perfect. But mine was a discussion of letters 
derived from ligatures, and yours is just a random note about shape. 

> For letters, their primary identity is not given by their shape, but their 
> position / function in the alphabet.

This isn’t really something you can turn into an axiom, much as you would like 
to. Position in the alphabet may vary WIDELY from language to language. As can 
function. The Latin letter c can mean /k s tʃ ts ʔ ʃ θ/… 

> That's why making Gaelic style and Fraktur a font switch works at all, even 
> if that is not perfect (viz, ligatures in Fraktur).

Font style isn’t the same thing in this context. The historical letters used to 
make the 1855 ligatures are *different* letters than those used for the 1859 
ligatures. 

> In the Deseret case, making this alternation a font choice would tend to 
> preserve the content of all documents.

No, since it’s a question of *spelling*. Some documents use a ligature-letter 
for the diphthong /juː/. Some documents use two separate letters for the same 
diphthong. So there’s no “standardized” spelling that works for all text that 
would be affected here. (Spelling for English wasn’t standardized anyway in 
historical Deseret texts and there is much variety.)

> Making this an encoding difference would indeed invalidate some documents.

Right now the 1859 characters aren’t representable. Deciding to change the 
chart glyphs to 1859 glyphs would just destabilize EVERY current Deseret font. 
That’s not something we should do. 

> Finally, if this was in major, modern use, adding these code points would 
> have grave consequences for security.

Why? They’re not visually similar to the existing characters. So spoofing 
wouldn’t be an issue. 

Michael Everson


Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Richard Wordingham
On Sun, 26 Mar 2017 18:33:00 +0100
Michael Everson  wrote:

> On 26 Mar 2017, at 18:20, Doug Ewell  wrote:

> > Michael Everson wrote:

> >> One practical consequence of changing the chart glyphs now, for
> >> instance, would be that it would invalidate every existing Deseret
> >> font. Adding new characters would not.  

> > I thought the chart glyphs were not normative.  

> Come on, Doug. The letter W is a ligature of V and V. But sure, the
> glyphs are only informative, so why don’t we use an OO ligature
> instead?

A script-style font might legitimately use a glyph that looks like a
small omega for U+0077 LATIN SMALL LETTER W.  Small omega, of course,
is an οο ligature.  More to the point, a font may legitimately use the
same glyphs for U+0067 LATIN SMALL LETTER G and U+0261 LATIN SMALL
LETTER SCRIPT G.

A more serious issue is the multiple forms of U+014A LATIN CAPITAL
LETTER ENG, for which the underlying unity comes from their being the
capital form of U+014B LATIN SMALL LETTER ENG.

Are there not serious divergences with the shapes of the Syriac letters?

Richard.
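
The two relationships Richard describes can be inspected in the character database. This is a minimal Python sketch with the standard unicodedata module; the names below are illustrative. It shows that U+0261 has no decomposition to U+0067, so the two g's are separate characters however a font draws them, and that U+014A and U+014B are tied together by case mapping alone.

```python
import unicodedata

# Illustrative names, not from the thread.
G = "\u0067"            # LATIN SMALL LETTER G
SCRIPT_G = "\u0261"     # LATIN SMALL LETTER SCRIPT G
CAPITAL_ENG = "\u014a"  # LATIN CAPITAL LETTER ENG
SMALL_ENG = "\u014b"    # LATIN SMALL LETTER ENG

# An empty decomposition string means the character has no decomposition mapping.
script_g_decomposition = unicodedata.decomposition(SCRIPT_G)

# Even compatibility normalization keeps the two g's apart.
g_folds_together = unicodedata.normalize("NFKC", SCRIPT_G) == G

# The eng pair is linked through the case mappings, whatever shape
# a font gives the capital.
eng_is_case_pair = (SMALL_ENG.upper() == CAPITAL_ENG
                    and CAPITAL_ENG.lower() == SMALL_ENG)

print(repr(script_g_decomposition), g_folds_together, eng_is_case_pair)
```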



Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Michael Everson
On 26 Mar 2017, at 21:48, Richard Wordingham  
wrote:

>> Come on, Doug. The letter W is a ligature of V and V. But sure, the glyphs 
>> are only informative, so why don’t we use an OO ligature instead?
> 
> A script-style font might legitimately use a glyph that looks like a small 
> omega for U+0077 LATIN SMALL LETTER W.

As I said to Asmus, my analogy was about ligatures made from underlying 
letters. Yours doesn’t apply because it’s just talking about glyph shapes. 

> Small omega, of course, is an οο ligature.

True. :-) Isn’t history wonderful?

> More to the point, a font may legitimately use the same glyphs for U+0067 
> LATIN SMALL LETTER G and U+0261 LATIN SMALL LETTER SCRIPT G.

A good font will still find a way to distinguish them. :-) 

> A more serious issue is the multiple forms of U+014A LATIN CAPITAL LETTER 
> ENG, for which the underlying unity comes from their being the capital form 
> of U+014B LATIN SMALL LETTER ENG.

We could have, and should have, solved this problem *long ago* by encoding 
LATIN CAPITAL LETTER AFRICAN ENG and LATIN SMALL LETTER AFRICAN ENG. 

> Are there not serious divergences with the shapes of the Syriac letters?

That is analogous to Roman/Gaelic/Fraktur. That analogy doesn’t apply to these 
Deseret characters; it’s not a whole-script gestalt. 

Michael Everson


Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Werner LEMBERG

> Well, in most cases, but not e.g. for names. Goethe is not spelled
> Göthe.

Have a look into `Grimmsches Wörterbuch' to see the opposite :-)


Werner



Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Martin J. Dürst

On 2017/03/26 11:24, Philippe Verdy wrote:

> That's a good point: any disunification requires showing examples of
> contrasting uses.

Fully agreed. We haven't yet heard of any contrasting uses for the 
letter shapes we are discussing.

> Now depending on individual publications, authors would
> use one character or the other according to their choice, and the encoding
> will respect it. If we need further unification for matching texts in the
> same language across periods of time or authors, collation (UCA) can
> provide help: this is already what it does in modern German with the digram
> "ae" and the letter "ä" which are orthographic variants not distinguished
> by the language but by authors' preference.

Well, in most cases, but not e.g. for names. Goethe is not spelled Göthe.

Regards,   Martin.
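
The exchange above can be illustrated with a toy matching key. This is a hedged Python sketch, not an implementation of the UCA: the function german_search_key is hypothetical and simply folds ä/ö/ü to ae/oe/ue as a stand-in for what a tailored collation or search might do. It shows both Philippe's point (orthographic variants match) and Martin's caveat (names get merged too).

```python
import unicodedata


def german_search_key(word: str) -> str:
    """Hypothetical folding key: ä/ö/ü are treated as ae/oe/ue.

    This is a simple stand-in for a tailored collation, not the UCA itself.
    """
    table = str.maketrans({
        "ä": "ae", "ö": "oe", "ü": "ue",
        "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    })
    # Compose first so that a + U+0308 is folded the same way as ä.
    composed = unicodedata.normalize("NFC", word)
    return composed.translate(table).casefold()


# Orthographic variants end up with the same key.
variants_match = (german_search_key("Klavierübung")
                  == german_search_key("Klavieruebung"))

# The same folding cannot tell a name from its umlaut respelling.
names_match = german_search_key("Goethe") == german_search_key("Göthe")

print(variants_match, names_match)
```

Both comparisons come out equal, which is why a folding of this kind suits searching better than it suits deciding what a name's correct spelling is.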


VS: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Erkki I Kolehmainen
I tend to agree with Martin, Philippe and others in questioning the 
disunification.

Sincerely,
Erkki I. Kolehmainen

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Martin J. Dürst
Sent: 26 March 2017 11:12
To: verd...@wanadoo.fr; David Starner
Cc: Michael Everson; unicode Unicode Discussion
Subject: Re: Standaridized variation sequences for the Desert alphabet?

On 2017/03/26 11:24, Philippe Verdy wrote:

> That's a good point: any disunification requires showing examples of 
> contrasting uses.

Fully agreed. We haven't yet heard of any contrasting uses for the letter 
shapes we are discussing.

> Now depending on individual publications, authors would use one 
> character or the other according to their choice, and the encoding 
> will respect it. If we need further unification for matching texts in 
> the same language across periods of time or authors, collation (UCA) 
> can provide help: this is already what it does in modern German with 
> the digram "ae" and the letter "ä" which are orthographic variants not 
> distinguished by the language but by authors' preference.

Well, in most cases, but not e.g. for names. Goethe is not spelled Göthe.

Regards,   Martin.




Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Michael Everson
On 25 Mar 2017, at 22:15, David Starner  wrote:
> 
> And I'd argue that a good theoretical model of the Latin script makes ä, ꞛ 
> and aͤ the same character, distinguished only by the font. 

Fortunately for the users of our standard, we don’t do this. 

> This is complicated by combining characters mostly identified by glyph, and 
> the fact that while ä and aͤ may be the same character across time, there are 
> people wanting to distinguish them in the same text today, and in both cases 
> the theoretical falls to the practical. In this case, there are no combining 
> character issues and there's nobody needing to use the two forms in the same 
> text. 

I’m fairly sure that a person citing a medieval document using aͤ may very well 
also need to write this alongside Swedish or German using ä. 

Michael Everson


Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Asmus Freytag

On 3/26/2017 6:18 AM, Michael Everson wrote:

> On 26 Mar 2017, at 10:07, Erkki I Kolehmainen  wrote:
>
>> I tend to agree with Martin, Philippe and others in questioning the
>> disunification.
>
> You may, but you give no evidence or discussion about it, so...
>
> In any case it’s not a disunification. Some characters are encoded; they were
> used to write diphthongs in 1855. These characters were abandoned by 1859, and
> other characters were devised.

Calling them "characters" is pre-judging the issue, don't you think?

We know that these are different shapes, but that they stand for the 
same text elements.

A./

> The origin of all of the characters as ligatures of other characters isn’t
> questioned. The right thing to do is to add the missing characters, not to
> invalidate any font that uses the 1855 characters by claiming that the 1855
> and 1859 characters are “the same”.
>
> Michael Everson





Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread David Starner
On Sun, Mar 26, 2017 at 6:12 AM Michael Everson wrote:

> On 25 Mar 2017, at 22:15, David Starner  wrote:
>
>> And I'd argue that a good theoretical model of the Latin script makes ä,
>> ꞛ and aͤ the same character, distinguished only by the font.
>
> Fortunately for the users of our standard, we don’t do this.

You've yet to come up with users to whom these Deseret letters are relevant.

> I’m fairly sure that a person citing a medieval document using aͤ may very
> well also need to write this alongside Swedish or German using ä.

I'm fairly sure that a person citing an early 20th century German document
may well feel the need to cite it in Fraktur. In both cases, I believe
that's going above and beyond the identity of the characters involved, but
in your case, people do contrast the aͤ with ä, and the use case has been
made. Show me the users who want to use these Deseret letters contrastingly.


Re: Diaeresis vs. umlaut (was: Re: Standaridized variation sequences for the Desert alphabet?)

2017-03-26 Thread Martin J. Dürst

On 2017/03/25 03:33, Doug Ewell wrote:

> Philippe Verdy wrote:
>
>> But Unicode just preferred to keep the roundtrip compatibility with
>> earlier 8-bit encodings (including existing ISO 8859 and DIN
>> standards) so that "ü" in German and French also have the same
>> canonical decomposition even if the diacritic is a diaeresis in French
>> and an umlaut in German, with different semantics and origins.
>
> Was this only about compatibility, or perhaps also that the two signs
> look identical and that disunifying them would have caused endless
> confusion and misuse among users?


I'm not sure to what extent this was explicitly discussed when Unicode 
was created. The fact that the first 256 code points are identical to 
those in ISO-8859-1 was used as a big selling point when Unicode was 
first introduced. It may well have been that for Unicode, there was no 
discussion at all in this area, because ISO-8859-1 was already so well 
established.


And for ISO-8859-1, space was an important concern. Ideally, both 
Icelandic and Turkish (and the letters missed for French) would have been 
covered, but that wasn't possible. Disunifying diaeresis and umlaut 
would have been an unaffordable luxury.


The above reasons mask any inherent reasons for why diaeresis and umlaut 
would have been unified or not if the decision had been argued purely 
"on the merit". But having used both German and French, and e.g. looking 
at the situation in Switzerland, where it was important to be able to 
write both French and German on the same typewriter, I would definitely 
argue that disunifying them would have caused endless confusion and 
errors among users.

Also, it was argued a few mails ago that diaeresis and umlaut don't look 
exactly the same. I remember well that when Apple introduced its first 
laser printers, there were widespread complaints that the fonts (was it 
Helvetica, Times Roman, and Palatino?) unified away the traditional 
differences in the cuts of these typefaces for different languages.


So to quite some extent, in the relevant period (i.e. the 1970s and 1980s), 
the differences between diaeresis and umlaut may be due to design 
differences in the cuts for different languages (e.g. French and 
German). Nobody would have disunified some basic letters because they 
may have looked slightly different in cuts for different languages, and 
so people may also have been just fine with unifying diaeresis and 
umlaut. (German fonts e.g. may have contained a 'ë' for use e.g. with 
"Citroën", but the dots on that 'ë' will have been the same shape as 
'ä', 'ö', and 'ü' umlauts for design consistency, and the other way 
round for French).


Regards,   Martin.
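
Two of the facts discussed above can be verified mechanically. This is a minimal Python sketch using the standard library; the variable names are illustrative, and Python's "latin-1" codec is taken as a stand-in for ISO-8859-1. It checks that the first 256 code points coincide with the ISO-8859-1 byte values, and that German ü and French ë decompose to a base letter plus the same combining mark.

```python
import unicodedata

# Every byte value 0x00..0xFF, decoded as ISO-8859-1 (Python's "latin-1").
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Each decoded character has the same number as the byte it came from.
first_256_identical = all(ord(ch) == b for ch, b in zip(decoded, all_bytes))

# Canonical decompositions, given as space-separated hex code points.
u_umlaut = unicodedata.decomposition("\u00fc")      # ü, as in German
e_diaeresis = unicodedata.decomposition("\u00eb")   # ë, as in "Citroën"

# The mark shared by both decompositions.
shared_mark = unicodedata.name("\u0308")

print(first_256_identical)
print(u_umlaut, "|", e_diaeresis, "|", shared_mark)
```

Both decompositions end in 0308, so at the encoding level the umlaut and the diaeresis are one mark; any difference in shape is left to the font.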


Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Michael Everson
On 26 Mar 2017, at 10:07, Erkki I Kolehmainen  wrote:
> 
> I tend to agree with Martin, Philippe and others in questioning the 
> disunification.

You may, but you give no evidence or discussion about it, so...

In any case it’s not a disunification. Some characters are encoded; they were 
used to write diphthongs in 1855. These characters were abandoned by 1859, and 
other characters were devised. The origin of all of the characters as ligatures 
of other characters isn’t questioned. The right thing to do is to add the 
missing characters, not to invalidate any font that uses the 1855 characters by 
claiming that the 1855 and 1859 characters are “the same”. 

Michael Everson


Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Michael Everson
On 26 Mar 2017, at 14:32, David Starner  wrote:

>>> And I'd argue that a good theoretical model of the Latin script makes ä, ꞛ 
>>> and aͤ the same character, distinguished only by the font.
>> 
>> Fortunately for the users of our standard, we don’t do this.
> 
> You've yet to come up with users to whom these Deseret letters are relevant.

You might imagine it takes time to identify problems and address them. 

>> I’m fairly sure that a person citing a medieval document using aͤ may very 
>> well also need to write this alongside Swedish or German using ä.
> 
> I'm fairly sure that a person citing an early 20th century German document 
> may well feel the need to cite it in Fraktur.

Fraktur is a whole-font substitution (modulo the ligatures). This is not the 
same thing as an editor choosing w or ƿ. Imagine if we had unified those two. 
After all, they both represent the same sound, right?

(Shudder.)
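As a minimal illustrative sketch (not part of the original message), the separate encoding of w and ƿ can be checked with Python's standard `unicodedata` module. The helper name `describe` and the variable names are hypothetical, chosen only for this example.

```python
import unicodedata

def describe(ch):
    """Return the code point and Unicode character name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

w = "w"          # U+0077
wynn = "\u01bf"  # the letter written as ƿ above

print(describe(w))
print(describe(wynn))

# Compare the two letters after compatibility normalization (NFKC):
# if they were treated as the same character, this would be True.
same_after_nfkc = (
    unicodedata.normalize("NFKC", w) == unicodedata.normalize("NFKC", wynn)
)
print(same_after_nfkc)
```

Because the two letters are separate characters, an editor's choice between them survives any font change, which is the contrast with a whole-font substitution such as Fraktur.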

> In both cases, I believe that's going above and beyond the identity of the 
> characters involved, but in your case, people do contrast the aͤ with ä, and 
> the user case has been made. Show me the users who want to use these Deseret 
> letters contrastingly.

Do try to be less dismissive. Firstly, *I* have published entire books in 
Deseret and so I myself have a legitimate interest. Secondly, I am in fact 
beginning discussions with relevant experts.

Michael Everson


Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Michael Everson

> On 26 Mar 2017, at 16:45, Asmus Freytag  wrote:
> 
> The latter is patent nonsense, because ä and aͤ are even less related to each 
> other than "i" and "j"; never mind the fact that their forms are both based 
> on the letter "a". Encoding and font choice should be seen as separate.

He refers to the shape of the diacritical marks. 

Michael Everson


Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Asmus Freytag

On 3/26/2017 8:47 AM, Michael Everson wrote:

>> On 26 Mar 2017, at 16:45, Asmus Freytag  wrote:
>>
>> The latter is patent nonsense, because ä and aͤ are even less related to each
>> other than "i" and "j"; never mind the fact that their forms are both based
>> on the letter "a". Encoding and font choice should be seen as separate.
>
> He refers to the shape of the diacritical marks.


I see the issue: the font selected on my end made the "e" look like an 
"o", which completely changed my understanding of what he tried to 
communicate.


A./



Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Asmus Freytag

  
  
On 3/25/2017 3:15 PM, David Starner wrote:

> On Fri, Mar 24, 2017 at 9:17 AM Michael Everson wrote:
>
>> And we *can* distinguish i and j in that Latin text, because
>> we have separate characters encoded for it. And we *have*
>> encoded many other Latin ligature-based letters and sigla of
>> various kinds for the representation of medieval European
>> texts. Indeed, that’s just a stronger argument for
>> distinguishing the ligature-based letters for Deseret, I
>> think.
>
> And I'd argue that a good theoretical model of the Latin
> script makes ä, ꞛ and aͤ the same character, distinguished
> only by the font.

  


The latter is patent nonsense, because ä and aͤ are even less
related to each other than "i" and "j"; never mind the fact that
their forms are both based on the letter "a". Encoding and font
choice should be seen as separate.
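As a minimal illustrative sketch (not part of the original message), the three forms quoted above can be compared as encoded sequences using Python's standard `unicodedata` module. The escapes below are an assumption about which code points the quoted text contains (U+00E4, "a" followed by U+0364, and U+A79B), and the names `forms` and `code_points` are hypothetical.

```python
import unicodedata

# Assumed code points for the three forms quoted above.
forms = {
    "a with diaeresis": "\u00e4",
    "a with combining small e": "a\u0364",
    "Volapuk ae": "\ua79b",
}

def code_points(text):
    """List the code points of text after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return [f"U+{ord(ch):04X}" for ch in decomposed]

for label, text in forms.items():
    print(label, code_points(text))

# Canonical composition (NFC) does not merge the three forms.
distinct_after_nfc = len(
    {unicodedata.normalize("NFC", text) for text in forms.values()}
)
print(distinct_after_nfc)
```

Under these assumptions the three forms stay distinct at the encoding level regardless of the font used to display them.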

The priority in encoding has to be with allowing distinctions in
modern texts, or distinctions that matter to modern users of
historic writing systems. Beyond that, theoretical analysis of
typographical evolution can give some interesting insight, but I
would be in the camp that does not accord them a status as primary
rationale for encoding decisions.

Thus, critical need for contrasting use of the glyph distinctions
would have to be established before it makes sense to discuss this
further. 

I see no principled objection to having a font choice result in a
noticeable or structural glyph variation for only a few elements of
an alphabet. We have handle-a vs. bowl-a as well as hook-g vs.
loop-g in Latin, and fonts routinely select one or the other. (It is
only for usage outside normal text that the distinction between
these forms matters). 

While the Deseret forms are motivated by their pronunciation, I'm
not necessarily convinced that the distinction has any practical
significance that is in any way different than similar differences
in derivation (e.g. long-s-s or long-s-z for German eszett). 

In fact, it would seem that if a Deseret text was encoded in one of
the two systems, changing to a different font would have the
attractive property of preserving the content of the text (while not
preserving the appearance). This, in a nutshell, is the criterion
for making something a font difference vs. an encoding distinction.

A./


PS:

> This is complicated by combining characters mostly
> identified by glyph, and the fact that while ä and aͤ may be
> the same character across time, there are people wanting to
> distinguish them in the same text today, and in both cases
> the theoretical falls to the practical. In this case, there
> are no combining character issues and there's nobody needing
> to use the two forms in the same text.

huh?


  



Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread James Kass
Asmus Freytag wrote,

> In the current case, you have the opposite,
> to wit, the text elements are unchanged, but
> you would like to add alternate code elements
> to represent what are, ultimately, the same
> text elements. That's not disunification, but
> dual encoding.

If spelling a word with an x+y string versus a z+y string represents
two different spellings of the same word, then hand printing the same
word with either an x/y ligature versus a z/y ligature also represents
two different spellings of the same word.

Best regards,

James Kass


Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Martin J. Dürst

On 2017/03/26 22:15, Michael Everson wrote:

>> On 26 Mar 2017, at 09:12, Martin J. Dürst  wrote:
>>
>>> That's a good point: any disunification requires showing examples of
>>> contrasting uses.
>>
>> Fully agreed.
>
> The default position is NOT “everything is encoded unified until disunified”.


Nor is it "everything is encoded separately unless it's unified".



> The characters in question have different and undisputed origins, undisputed.


If you change that to the somewhat more neutral "the shapes in question 
have different and undisputed origins", then I'm with you. I actually 
have said as much (in different words) in an earlier post.




> We’ve encoded one pair; evidently this pair was deprecated and another pair was 
> devised. The letters wynn and w are also used for the same thing. They too have 
> different origins and are encoded separately. The letters yogh and ezh have 
> different origins and are encoded separately. (These are not perfect analogies, 
> but they are pertinent.)


Fine. I (and others) have also given quite a few analogies, none of them 
perfect, but most if not all of them pertinent.




>> We haven't yet heard of any contrasting uses for the letter shapes we are 
>> discussing.
>
> Contrasting use is NOT the only criterion we apply when establishing the 
> characterhood of characters.


Sorry, but where did I say that it's the only criterion? I don't think 
it's the only criterion. On the other hand, I also don't think that 
historical origin is or should be the only criterion.


Unfortunately, much of what you wrote gave me the impression that you 
may think that historical origin is the only criterion, or a criterion 
that trumps all others. If you don't think so, it would be good if you 
could confirm this. If you think so, it would be good to know why.




> Please try to remember that. (It’s a bit shocking to have to remind people of 
> this.)


You don't have to remind me, at least. I have mentioned "usability for 
average users in average contexts" and "contrasting use" as criteria, 
and I have also in earlier mail acknowledged history as a (not the) 
criterion, and have mentioned legacy/roundtrip issues. I'm sure there 
are others.



Regards,   Martin.