Re: Fwd: Re: German sharp S uppercase mapping

2024-12-18 Thread Steffen Nurpmeso via Unicode
Steffen Nurpmeso wrote in
 <20241219021905.iTBjfSXk@steffen%sdaoden.eu>:
 |Asmus Freytag via Unicode wrote in
 | <0465b1d5-8bee-4844-80a5-86295623f...@ix.netcom.com>:
 ||On 12/18/2024 5:16 PM, Steffen Nurpmeso via Unicode wrote:
 ||> Doug Ewell via Unicode wrote:
 ||>|I wouldn’t have been as charitable; the following was completely \
 ||>|uncalled \
 ||>|for and demands an apology:
 ||>
 ||> man errno no longer contains EED, "the experienced user knows what
 ||> is wrong", as it did about 25 years ago.  So it remains "es-zed".
 ||> I have never used ed(1), let alone really.
 ||>
 ||This makes as much sense as your rather immature signature
 |
 |Ah.  Robert Gernhardt, he played well the German language.
 |Missed by many, me included.

..and to mention that the

 ||And in Fall, feel "The Dropbear Bard"s ball(s).

addition is a contribution of a New Zealander, who then asked to
have his name replaced; the rest remained as-is.  But i like it.
(I think it is a bit too Anglo-Saxon in style for Robert
Gernhardt, but i thought he would have liked it, too.  It is not
a German but a New Zealand collar bear, and so it seems they
have something brotherly in common!!)

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
|
|And in Fall, feel "The Dropbear Bard"s ball(s).
|
|The banded bear
|without a care,
|Banged on himself for e'er and e'er
|
|Farewell, dear collar bear



Re: [semi-private] Fwd: Re: German sharp S uppercase mapping

2024-12-18 Thread Steffen Nurpmeso via Unicode
Doug Ewell wrote:
 |You know perfectly well that my comment had nothing to do with whether \
 |the name “es-zed” or “es-zett” is accurate, and everything to do with \
 |your baseless accusation that Unicode uses that name “likely with mali\
 |cious anti-german intention.”

So you are saying it has nothing to do with ed(1) even?
That reminds me that the NATO was created "to keep the Russians
out and the Germans down below".  Whoever that me is.  Off-topic
for the marginalized Unicode list, anyway.  "Mein Wert-her" sounds
plush and delighting, by the way, i am happy that high quality
productions (as few as there are) of historic matter place value
in such things.

--steffen



Re: Fwd: Re: German sharp S uppercase mapping

2024-12-18 Thread Steffen Nurpmeso via Unicode
Asmus Freytag via Unicode wrote in
 <0465b1d5-8bee-4844-80a5-86295623f...@ix.netcom.com>:
 |On 12/18/2024 5:16 PM, Steffen Nurpmeso via Unicode wrote:
 |> Doug Ewell via Unicode wrote:
 |>|I wouldn’t have been as charitable; the following was completely \
 |>|uncalled \
 |>|for and demands an apology:
 |>
 |> man errno no longer contains EED, "the experienced user knows what
 |> is wrong", as it did about 25 years ago.  So it remains "es-zed".
 |> I have never used ed(1), let alone really.
 |>
 |This makes as much sense as your rather immature signature

Ah.  Robert Gernhardt, he played well the German language.
Missed by many, me included.

--steffen



Re: Fwd: Re: German sharp S uppercase mapping

2024-12-18 Thread Asmus Freytag via Unicode

On 12/18/2024 5:16 PM, Steffen Nurpmeso via Unicode wrote:
> Doug Ewell via Unicode wrote:
>  |I wouldn’t have been as charitable; the following was completely uncalled \
>  |for and demands an apology:
>
> man errno no longer contains EED, "the experienced user knows what
> is wrong", as it did about 25 years ago.  So it remains "es-zed".
> I have never used ed(1), let alone really.

This makes as much sense as your rather immature signature

A./



Re: Fwd: Re: German sharp S uppercase mapping

2024-12-18 Thread Steffen Nurpmeso via Unicode
Doug Ewell via Unicode wrote:
 |I wouldn’t have been as charitable; the following was completely uncalled \
 |for and demands an apology:

man errno no longer contains EED, "the experienced user knows what
is wrong", as it did about 25 years ago.  So it remains "es-zed".
I have never used ed(1), let alone really.

--steffen



Re: German sharp S uppercase mapping

2024-12-16 Thread Christoph Päper via Unicode
Asmus Freytag:
> 
> All languages change over time and spelling tends to shed redundant or 
> inoperative letters, if and when they are no longer useful. That firmly 
> applies to the old-style th or tz in German, where the 'h' or 't' were 
> removed, respectively. These serve no particular purpose, unlike the silent 
> 'k' in English "knight", which distinguishes the word from "night", at least 
> in writing.

This is becoming really off-topic, but please let me clarify this widely 
repeated misconception. The German ‘th’ was indeed abolished at the turn of the 
20th century from the official orthography, only to be kept in (apparent) loan 
words, but it actually served a regular purpose that was just a bit more 
complex than most rules and can be phrased as a special case for one: 

Within a morphemic stem, an ‘h’ may be used canonically to show the 
increased length or stress of a single-letter vowel (including umlauts) and is 
placed immediately thereafter, *but* if there is a ‘t’ in that syllable 
(preferably) before or after the vowel – an ‘r’ or ‘l’ glide may arguably come 
in between –, it attracts this _Dehnungs-h_ to form a ‘th’ (or ‘Th’) digraph: 
_thun, that, Thäter; rathen, Rathhaus; Thresen; Werth_. If, however, the ‘h’ 
was needed to separate the vowel from an end schwa, realized as an ‘e’, the 
digraph is inhibited: _Truhe_, not _*Thrue_.

The transcribed theta in Greek loan words is treated exactly the same, and in 
most cases survived the reform 30 years ago, although they were on the 
shortlist to be dropped. This leads to hurdles in learning to spell where 
it’s not naively possible, e.g. the ‘y’ in _Rhythmus_ is short (but stressed). 
Many German-speaking children in grammar school still go through a phase of 
misspelling many long vowels with an ‘h’ afterwards, e.g. _*Tühr_ instead of 
current _Tür_ and old _Thür_, but I believe they are less likely to do so if 
it’s _Türe_ in their regional dialect. 

It might have been better – because simpler and more regular – over a century 
ago, to keep the ‘h’ but move it after the vowel consistently. That ship has 
sailed, though, as have others. 

RE: Fwd: Re: German sharp S uppercase mapping

2024-12-16 Thread Doug Ewell via Unicode
Asmus Freytag replied to Steffen Nurpmeso:

> The remainder of your post unfortunately descends into irrelevant and
> opinionated editorializing, so there's little constructive to add to
> in a reply.

I wouldn’t have been as charitable; the following was completely uncalled for 
and demands an apology:

>> [...] And then really, necessarily there is
>> not only eszett which is no(t) (longer) sz as the name es-zett
>> implies (wrongly referred to in Unicode, likely with malicious
>> anti-german intention)

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org




Re: Fwd: Re: German sharp S uppercase mapping

2024-12-16 Thread Asmus Freytag via Unicode

On 12/16/2024 1:19 PM, Steffen Nurpmeso via Unicode wrote:
> --- Forwarded from Steffen Nurpmeso ---
> Date: Mon, 16 Dec 2024 22:18:26 +0100
> Author: Steffen Nurpmeso
> From: Steffen Nurpmeso
> To: Daphne Preston-Kendal
> Subject: Re: German sharp S uppercase mapping
> Message-ID: <20241216211826.Dt3h2MaC@steffen%sdaoden.eu>
>
> Daphne Preston-Kendal via Unicode wrote in
>   <939afa07-02ca-4980-b202-6374a3e99...@nonceword.org>:
>   |On 2 Dec 2024, at 11:19, Marius Spix via Unicode wrote:
>   ...
>   |wrong in the first place.[.]
>
> The thing is the reforms of the last republican century went into
> the wrong direction.  For example around 1900 there was "Oele sind
> ölig", which today looks odd but isn't it a more aesthetic
> experience than having an uppercase Umlaut.  Necessarily the dots
> above etc scratch at some upper baseline, astronomic.  This is not
> a music sheet of Bach (or other opulent classical composer).
> There was Kenntniss "Knowledge" as-you-speak which was mutilated
> to Kenntnis but remained Kenntnisse in plural for only technical
> reasons.
> Thran became Tran, that is surely because the people started using
> solely the Tran of Switzerland, Ovomaltine it is, is it, and then
> you say "h, beautiful, Tran!!", instead.  Same for Werthe,
> which became Werte,  ...


Sounds like a perfect argument based on conservatism for the sake of it. 
When typesetting required paper, extremely tight layouts were common to 
conserve resources. With very limited leading, there was little space to 
place diacritics on top of capitals. The solutions weren't universal, 
but for the umlaut it was not that rare to find a font that placed a 
small e *inside* the capital letter.


Nowadays, the designs tend to much larger interline spacing and there's 
no penalty for placing an accent or umlaut on top of a capital letter. 
So, the rationale for using an 'e', whether next to or as part of the 
capital letter shape, has evaporated. Instead, since the times of the 
discussion in 1925 cited in an earlier message, the sentiment that a 1:1 
correspondence between lowercase and uppercase letter is not only normal 
but desirable has only increased.


All languages change over time and spelling tends to shed redundant or 
inoperative letters, if and when they are no longer useful. That firmly 
applies to the old-style th or tz in German, where the 'h' or 't' were 
removed, respectively. These serve no particular purpose, unlike the 
silent 'k' in English "knight", which distinguishes the word from 
"night", at least in writing.



> ... which is only understandable if you have no
> value left except money, it necessarily must be Werthe in all
> other thinkable occasions. And then really, necessarily there is
> not only eszett which is no(t) (longer) sz as the name es-zett
> implies (wrongly referred to in Unicode, likely with malicious
> anti-german intention), but a dedicated character, and it is
> absolutely necessary to provide uppercase font mappings that bring
> it all to the top regarding intransparent obviousness, and with
> LiberationMono font i know which of ßẞ is, actually, uppercase.
> Or make it ſs.  I am fine with that.  It is better than today's
> world where so-called academic intellectual elite magazines which
> run their text through automatic spell checkers produce more typos
> and "hanging in the nowhere" sentences than police allows.
> Other than that this is solely a polemic private opinion for sure.


The remainder of your post unfortunately descends into irrelevant and 
opinionated editorializing, so there's little constructive to add to in 
a reply.


A./


Re: German sharp S uppercase mapping

2024-12-16 Thread Asmus Freytag via Unicode

On 12/16/2024 4:24 AM, Dominikus Dittes Scherkl via Unicode wrote:
> On 15.12.24 at 23:43, Asmus Freytag via Unicode wrote:
>> On 12/15/2024 2:14 PM, Erik Carvalhal Miller via Unicode wrote:
>>> all-caps Fraktur words are basically illegible
>>
>> Which is an interesting digression in and of itself. Especially since
>> the Latin script had started out as ALL CAPS.
>
> It's not a digression, it's a development. After minuscules were
> available, uppercase letters became something rare and special, so first
> chapter-initial letters were designed to be something unique and
> outstanding, and then those letters replaced the standard uppercase in
> Fraktur. But unique and outstanding things are nothing one expects to
> occur in a row, so the design doesn't fit that use case.
> And I fully agree with this evolution, as nowadays YELLING at people is
> deemed impolite, not sacred.



This side discussion is a digression from the main topic.

A./



Fwd: Re: German sharp S uppercase mapping

2024-12-16 Thread Steffen Nurpmeso via Unicode
--- Forwarded from Steffen Nurpmeso  ---
Date: Mon, 16 Dec 2024 22:18:26 +0100
Author: Steffen Nurpmeso 
From: Steffen Nurpmeso 
To: Daphne Preston-Kendal 
Subject: Re: German sharp S uppercase mapping
Message-ID: <20241216211826.Dt3h2MaC@steffen%sdaoden.eu>

Daphne Preston-Kendal via Unicode wrote in
 <939afa07-02ca-4980-b202-6374a3e99...@nonceword.org>:
 |On 2 Dec 2024, at 11:19, Marius Spix via Unicode  wrote:
 ...
 |wrong in the first place.[.]

The thing is the reforms of the last republican century went into
the wrong direction.  For example around 1900 there was "Oele sind
ölig", which today looks odd but isn't it a more aesthetic
experience than having an uppercase Umlaut.  Necessarily the dots
above etc scratch at some upper baseline, astronomic.  This is not
a music sheet of Bach (or other opulent classical composer).
There was Kenntniss "Knowledge" as-you-speak which was mutilated
to Kenntnis but remained Kenntnisse in plural for only technical
reasons.
Thran became Tran, that is surely because the people started using
solely the Tran of Switzerland, Ovomaltine it is, is it, and then
you say "h, beautiful, Tran!!", instead.  Same for Werthe,
which became Werte, which is only understandable if you have no
value left except money, it necessarily must be Werthe in all
other thinkable occasions.  And then really, necessarily there is
not only eszett which is no(t) (longer) sz as the name es-zett
implies (wrongly referred to in Unicode, likely with malicious
anti-german intention), but a dedicated character, and it is
absolutely necessary to provide uppercase font mappings that bring
it all to the top regarding intransparent obviousness, and with
LiberationMono font i know which of ßẞ is, actually, uppercase.
Or make it ſs.  I am fine with that.  It is better than today's
world where so-called academic intellectual elite magazines which
run their text through automatic spell checkers produce more typos
and "hanging in the nowhere" sentences than police allows.
Other than that this is solely a polemic private opinion for sure.
 -- End forward <20241216211826.Dt3h2MaC@steffen%sdaoden.eu>

--steffen



Re: German sharp S uppercase mapping

2024-12-16 Thread Dominikus Dittes Scherkl via Unicode

On 15.12.24 at 23:43, Asmus Freytag via Unicode wrote:
> On 12/15/2024 2:14 PM, Erik Carvalhal Miller via Unicode wrote:
>> all-caps Fraktur words are basically illegible
>
> Which is an interesting digression in and of itself. Especially since
> the Latin script had started out as ALL CAPS.

It's not a digression, it's a development. After minuscules were
available, uppercase letters became something rare and special, so first
chapter-initial letters were designed to be something unique and
outstanding, and then those letters replaced the standard uppercase in
Fraktur. But unique and outstanding things are nothing one expects to
occur in a row, so the design doesn't fit that use case.
And I fully agree with this evolution, as nowadays YELLING at people is
deemed impolite, not sacred.

--

Dominikus Dittes Scherkl



Re: German sharp S uppercase mapping

2024-12-15 Thread Asmus Freytag via Unicode

On 12/3/2024 9:37 AM, Christoph Päper via Unicode wrote:
> Just for the record, one popular place I’ve noticed uppercased ẞ a lot lately 
> is in Apple’s Maps app where street names are always uppercased and, of course, 
> often include “Straße”. Their soft keyboard on iOS etc. also has it in the S 
> key popup.

Which means that they, for one, don't use Unicode's "default" for their 
"ToUpper" function?

Or do they get their uppercased names from some geographical names 
database? In which case we would have a large-scale use of ẞ.

Worth tracking down if anyone has contacts on that team or knows about 
such databases.
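
For comparison, the Unicode default uppercase mapping of ß can be checked with Python's built-in `str.upper()`, which applies the default Unicode case mappings without locale tailoring. This is a minimal sketch only; the sample street name is an assumption chosen for illustration and is not taken from Apple's data.

```python
import unicodedata

street = "Hauptstraße"              # hypothetical sample name

default_upper = street.upper()      # default mapping: ß becomes SS
print(default_upper)                # HAUPTSTRASSE

capital_sharp_s = "\u1e9e"          # ẞ
print(unicodedata.name(capital_sharp_s))   # LATIN CAPITAL LETTER SHARP S
print(capital_sharp_s.lower())             # ß
print(capital_sharp_s in default_upper)    # False
```

So an all-caps label that contains ẞ was either produced by a tailored uppercasing step or was already stored in that form.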


A./


Re: German sharp S uppercase mapping

2024-12-15 Thread Asmus Freytag via Unicode

On 12/15/2024 2:14 PM, Erik Carvalhal Miller via Unicode wrote:
> all-caps Fraktur words are basically illegible


Which is an interesting digression in and of itself. Especially since 
the Latin script had started out as ALL CAPS.


A./


Re: German sharp S uppercase mapping

2024-12-15 Thread Erik Carvalhal Miller via Unicode
An instance of two‐tlecase.

On Sunday, December 15, 2024, Daniel Buncic via Unicode <
unicode@corp.unicode.org> wrote:

>> The long ſ, which is only used in old Fraktur script, but not in
>> modern Antiqua script, has the same issue.
>
> Except that in Fraktur there are no all-caps words, so that the function
> to_upper() has no use in a Fraktur text anyway.
>
> (For example, a modern Bible printed in Antiqua might have words like GOTT
> ‘GOD’, JESUS, der HERR ‘the LORD’, etc. printed in all-caps to mark them as
> sacred.  In old Bibles printed in Fraktur, something like 𝔊𝔒𝔗𝔗,
> 𝔍𝔈𝔖𝔘𝔖 or 𝔡𝔢𝔯 ℌ𝔈ℜℜ is quite unthinkable because all-caps Fraktur
> words are basically illegible.  What you can find in old Bibles is the use
> of two capitals at the beginning to emphasize words, e.g. 𝔊𝔒𝔱𝔱,
> 𝔍𝔈ſ𝔲𝔰, 𝔡𝔢𝔯 ℌ𝔈𝔯𝔯.)
>
> Best wishes,
>
> Daniel
>
> --
> Prof. Dr. Daniel Bunčić
> ===
> Slavisches Institut der Universität zu Köln
> Weyertal 137, D-50931 Köln
> Telefon:   +49 (0)221  470-90535
> Sprechstunden: https://uni.koeln/ENZEB
> E-Mail:daniel.bun...@uni-koeln.de = dan...@buncic.de
> Threema:   https://threema.id/8M375R5K
> ===
> Homepage:  http://daniel.buncic.de/
> Academia:  http://uni-koeln.academia.edu/buncic
> ResearchGate:  https://researchgate.net/profile/Daniel-Buncic-2
> ===
>
>


Re: German sharp S uppercase mapping

2024-12-15 Thread Daniel Buncic via Unicode

> The long ſ, which is only used in old Fraktur script, but not in
> modern Antiqua script, has the same issue.


Except that in Fraktur there are no all-caps words, so that the function 
to_upper() has no use in a Fraktur text anyway.


(For example, a modern Bible printed in Antiqua might have words like 
GOTT ‘GOD’, JESUS, der HERR ‘the LORD’, etc. printed in all-caps to mark 
them as sacred.  In old Bibles printed in Fraktur, something like 
𝔊𝔒𝔗𝔗, 𝔍𝔈𝔖𝔘𝔖 or 𝔡𝔢𝔯 ℌ𝔈ℜℜ is quite unthinkable because 
all-caps Fraktur words are basically illegible.  What you can find in 
old Bibles is the use of two capitals at the beginning to emphasize 
words, e.g. 𝔊𝔒𝔱𝔱, 𝔍𝔈ſ𝔲𝔰, 𝔡𝔢𝔯 ℌ𝔈𝔯𝔯.)


Best wishes,

Daniel

--
Prof. Dr. Daniel Bunčić
===
Slavisches Institut der Universität zu Köln
Weyertal 137, D-50931 Köln
Telefon:   +49 (0)221  470-90535
Sprechstunden: https://uni.koeln/ENZEB
E-Mail:daniel.bun...@uni-koeln.de = dan...@buncic.de
Threema:   https://threema.id/8M375R5K
===
Homepage:  http://daniel.buncic.de/
Academia:  http://uni-koeln.academia.edu/buncic
ResearchGate:  https://researchgate.net/profile/Daniel-Buncic-2
===



Re: German sharp S uppercase mapping

2024-12-14 Thread Daphne Preston-Kendal via Unicode
On 2 Dec 2024, at 11:19, Marius Spix via Unicode  
wrote:

> That problem is not new. The long ſ, which is only used in old Fraktur 
> script, but not in modern Antiqua script, has the same issue. It shares its 
> uppercase form S with the round s, which behaves differently than the Greek 
> final Sigma ς and can appear mid-word, for example in compound words.
> 
> For example: to_lower(to_upper("Hauſtür")) returns "Haustür", which is 
> inaccurate.

‘Hauſtür’ – assuming it is intended to be the door of a house – is wrong in the 
first place. The round s is used at the end of words within compounds.

The rule of thumb: for native words, if pronounced [z] or [ʃ], use the round s; 
if [s], use the long s. (Non-native words were usually set by printing them in 
Antiqua anyway.)
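
The round trip quoted above can be reproduced with Python's built-in case mappings, which follow the Unicode default mappings rather than any German-specific rule. This is only a sketch of the quoted `to_lower(to_upper(...))` call: `round_trip` is a hypothetical helper name, and the sample word is the one from the quote, regardless of whether its spelling is correct.

```python
import unicodedata

def round_trip(text: str) -> str:
    """Uppercase, then lowercase, using Python's default Unicode mappings."""
    return text.upper().lower()

word = "Hau\u017ft\u00fcr"          # "Hauſtür" with U+017F LATIN SMALL LETTER LONG S
upper = word.upper()                # long s and round s share the capital S
back = round_trip(word)

print(unicodedata.name("\u017f"))   # LATIN SMALL LETTER LONG S
print(upper)                        # HAUSTÜR
print(back)                         # haustür
print("\u017f" in back)             # False: the long s is not restored
```

A plain lowercase mapping also lowers the initial H, so the result here is "haustür"; the point of the quote, the loss of the ſ, is unaffected.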


Daphne




Re: German sharp S uppercase mapping

2024-12-03 Thread Steffen Nurpmeso via Unicode
Mark E. Shoulson via Unicode wrote:
 |Thanks.  I freely admit my towering ignorance regarding German 
 |orthography and the history thereof.  Like I said, what I remember is 
 |people using (what I thought was?) this book (whatever its origin) as 
 |proof that there was no ẞ, and yet it had it on its cover.

But no, *i* find this funny!
Ie in many, especially cultural aspects the DDR was more like
"original" Germany, and if you look at the history of Germanist
discussions, i quoted one from Wikipedia in one of the first of my
too many posts, then isn't it funny to scream on the front page
"but hey, have a look, *here it is*!", while at the same time
being all-correct and submissive on the lengthy inside?  (Despite
that politicized Wikipedia shit which talks on niche details.)

--steffen



Re: German sharp S uppercase mapping

2024-12-03 Thread Steffen Nurpmeso via Unicode
Asmus Freytag via Unicode wrote in
 <060b3ae8-51ee-4626-b6e7-d71511cf3...@ix.netcom.com>:
 |On 12/3/2024 7:56 AM, Markus Scherer via Unicode wrote:
 |> I am looking for a noticeable increase in relative frequency, not 
 |> one-offs.
 |
 |Understood. But randomly encountered one-offs used to be extremely rare. 

I would not take photos from the advertising of some BW-local
energy company to speak for Germany.  The first one could not be
opened here.

Having said that i am out of this thread.  But if it would be me
i would not do anything about uppercase eszett at the moment.
I do not encounter it in my visual experiences around here, at
least.

--steffen


Re: German sharp S uppercase mapping

2024-12-03 Thread Mark E. Shoulson via Unicode
Thanks.  I freely admit my towering ignorance regarding German 
orthography and the history thereof.  Like I said, what I remember is 
people using (what I thought was?) this book (whatever its origin) as 
proof that there was no ẞ, and yet it had it on its cover.


~mark

On 12/3/24 6:34 AM, Andreas Prilop 🇮🇱 via Unicode wrote:
> Mark E. Shoulson wrote:
>> Große Duden
>
> The title “Der Große Duden” is from East Germany, not West Germany.


Re: German sharp S uppercase mapping

2024-12-03 Thread Steffen Nurpmeso via Unicode
Asmus Freytag via Unicode wrote in
 <2cf97bde-3925-4893-833a-70bb66208...@ix.netcom.com>:
 |On 12/2/2024 2:37 PM, Steffen Nurpmeso via Unicode wrote:
 |> i want to point to that photo of
 |> Frankfurt/Main, where it is uppercase STRASSE
 |
 |In the labeling of the photo which likely was created on a typewriter 
 |(note monospaced font).
 |
 |I thought at first you had found an actual street sign.

No, unfortunately not.  I always thought such signs were in
Frakturschrift (also due to things like [1]),  but it seems most
"signs on the streets" (as opposed to street signs) were already
Antiqua.  I have a photo similar to the ones sent, from Berlin,
where on the market place there is an "Unfall Station" aka
"Accident Station" (aka first aid, ..likely).  However here in
Darmstadt at least there were, at least in the past, many "trimmed
to look traditional" street signs, and these were all Fraktur.
Looking around i see a collection of still remaining
Frakturschrift street signs from Munich [2], but they all use
normal casing and eszett (does not make sense to look thus, hm).

  [1] 
https://de.wikipedia.org/wiki/Fraktur_(Schrift)#/media/Datei:Scripts_in_Europe_(1901).jpg
  [2] 
https://upload.wikimedia.org/wikipedia/commons/thumb/9/9b/MuenchenKontaktabzugFrakturschriftStrassenschilder.jpg/320px-MuenchenKontaktabzugFrakturschriftStrassenschilder.jpg

(Typewriter i do not know, but some kind of "print" for sure.)

--steffen


Re: German sharp S uppercase mapping

2024-12-03 Thread Steffen Nurpmeso via Unicode
Asmus Freytag via Unicode wrote in
 <4f9fd023-2b9d-43fb-a026-0145b2c5a...@ix.netcom.com>:
 |On 12/3/2024 3:34 AM, Andreas Prilop via Unicode wrote:
 |> Mark E. Shoulson wrote:
 |>
 |>> Große Duden
 |> The title “Der Große Duden” is from East Germany, not West Germany.
 |>
 |And your point being?

The good guys lost (of course).
("Everybody Knows", i think, Leonard Cohen.)

--steffen



Re: German sharp S uppercase mapping

2024-12-03 Thread Christoph Päper via Unicode
Just for the record, one popular place I’ve noticed uppercased ẞ a lot lately 
is in Apple’s Maps app where street names are always uppercased and, of course, 
often include “Straße”. Their soft keyboard on iOS etc. also has it in the S 
key popup. 


Re: German sharp S uppercase mapping

2024-12-03 Thread Asmus Freytag via Unicode

On 12/3/2024 7:56 AM, Markus Scherer via Unicode wrote:
> I am looking for a noticeable increase in relative frequency, not 
> one-offs.


Understood. But randomly encountered one-offs used to be extremely rare. 
And reliably spotting a capital form takes some expertise. Assuming I've 
been sensitized for that since Unicode 5.1, my own random sample shows 
increased frequency of anecdotal encounters in recent years, not just 
pre/post 5.1.


What we might need to sample, other than publications, would be a 
representative subset of German typographers / designers to understand 
whether we are unintentionally putting up roadblocks through the lack 
of more complete software support for capital sharp s.


A./


Re: German sharp S uppercase mapping

2024-12-03 Thread Asmus Freytag via Unicode

On 12/3/2024 5:59 AM, Andreas Prilop 🇮🇱 via Unicode wrote:
> Before 1990, a special sort of capital ß has sometimes been used
> in East Germany, but not in West Germany.


That applies to the title of that particular work, but IIRC we had 
examples from everywhere in the documentation for the proposed encoding.


And modern examples, like the one I shared, are not from the former East 
Germany or its modern successor regions.


A./


Re: German sharp S uppercase mapping

2024-12-03 Thread Markus Scherer via Unicode
On Tue, Dec 3, 2024 at 12:05 AM Daniel Buncic via Unicode <
unicode@corp.unicode.org> wrote:

> Thank you very much for the idea.  I could certainly sum up the
> arguments of the discussion (though I’m too busy to do it right now, you
> would have to have a few weeks’ patience), but I still haven’t
> understood where in the CLDR such casing information is stored.


CLDR has "transform" data for case mappings, but practical case mapping
functions, as implemented in libraries like ICU and ICU4X, are very
low-level, and hardcode exceptional cases.

I have implemented most of the ICU case mapping/case folding functions.
Their behavior mostly follows the Unicode Standard core spec and data files
-- including what SpecialCasing.txt says but handling most of that in code.
Over time, we have added and refined language-specific case mappings for
Dutch (IJ), Armenian (a ligature that has gotten reinterpreted), and modern
Greek ("drop accents" but with exceptions on exceptions).

There is a CLDR ticket for documenting all of this in the CLDR spec (UTS
#35).

For uppercasing ß to ẞ rather than SS, we have this ticket:
https://unicode-org.atlassian.net/browse/CLDR-17624
I have added some information there from this thread.

And that is why I am engaging in this thread and looking for evidence that
ẞ is replacing SS (and ß) in German all-uppercase text.
I am looking for a noticeable increase in relative frequency, not one-offs.

Another way to approach this, also discussed in that ticket, is to add some
kind of explicit option that lets one choose the uppercasing behavior of ß.
Given how low-level uppercase functions are and what limited inputs they
take, that is also not an easy problem.
It might in some ways be easier if the new behavior had become widespread
already, so that implementers could just change their code for most
contexts.
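
The "explicit option" idea could look roughly like this in Python. This is only an illustration of the concept, not ICU's or CLDR's API: the function name `upper_de` and its `capital_sharp_s` parameter are made up here, and the sketch simply replaces ß before calling the default `str.upper()`.

```python
def upper_de(text: str, capital_sharp_s: bool = False) -> str:
    """Uppercase text; optionally map ß to ẞ (U+1E9E) instead of SS.

    Hypothetical helper for illustration, not a library function.
    """
    if capital_sharp_s:
        # Replace ß first so the default mapping never sees it.
        text = text.replace("\u00df", "\u1e9e")
    return text.upper()

print(upper_de("Straße"))                        # STRASSE
print(upper_de("Straße", capital_sharp_s=True))  # STRAẞE
```

Even this toy version shows the difficulty mentioned above: the caller has to supply the choice, because the function itself sees nothing but a string.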

Viele Grüße,
markus


Re: German sharp S uppercase mapping

2024-12-03 Thread Andreas Prilop 🇮🇱 via Unicode
Asmus Freytag wrote:

>> The title “Der Große Duden” is from East Germany, not West Germany. 
>
> And your point being?

Non-German readers on this list may not know that there were
two different “Dudens” with diverging guidelines.

https://en.wikipedia.org/wiki/Duden#East_German_Duden_%28Leipzig%29



Re: German sharp S uppercase mapping

2024-12-03 Thread Andreas Prilop 🇮🇱 via Unicode
Asmus Freytag wrote:

>> This book is from East Germany.
>> It was never written in this way in West Germany.
>
> And your point being?

Before 1990, a special sort of capital ß has sometimes been used
in East Germany, but not in West Germany.

https://de.wikipedia.org/wiki/Duden#13.%E2%80%9319._Auflage_%281947%E2%80%931991%29



Re: German sharp S uppercase mapping

2024-12-03 Thread Asmus Freytag via Unicode

On 12/3/2024 3:34 AM, Andreas Prilop via Unicode wrote:
> Mark E. Shoulson wrote:
>> Große Duden
>
> The title “Der Große Duden” is from East Germany, not West Germany.


And your point being?

A./


Re: German sharp S uppercase mapping

2024-12-03 Thread Asmus Freytag via Unicode

On 12/3/2024 1:49 AM, Andreas Prilop 🇮🇱 via Unicode wrote:
> The East German Duden (21. Auflage, 1980) says:
>
> Regel 41
> Das Schriftzeichen ß fehlt als Großbuchstabe. Es wird ersetzt durch SS
> oder, falls Mißverständnisse möglich sind, durch SZ.

The wording is interesting, because now that a capital letter has been 
encoded, the premise is moot.


A./


Re: German sharp S uppercase mapping

2024-12-03 Thread Asmus Freytag via Unicode

On 12/3/2024 1:22 AM, Andreas Prilop via Unicode wrote:
> This book is from East Germany.
> It was never written in this way in West Germany.


And your point being?

A./


Re: German sharp S uppercase mapping

2024-12-03 Thread Andreas Prilop 🇮🇱 via Unicode
Mark E. Shoulson wrote:

> Große Duden

The title “Der Große Duden” is from East Germany, not West Germany.



Re: German sharp S uppercase mapping

2024-12-03 Thread Andreas Prilop 🇮🇱 via Unicode
The East German Duden (21. Auflage, 1980) says:

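Regel 41
Das Schriftzeichen ß fehlt als Großbuchstabe. Es wird ersetzt durch SS
oder, falls Mißverständnisse möglich sind, durch SZ.

[English: Rule 41. The character ß does not exist as a capital letter.
It is replaced by SS or, where misunderstandings are possible, by SZ.]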



Re: German sharp S uppercase mapping

2024-12-03 Thread Andreas Prilop 🇮🇱 via Unicode
Julian Bradfield wrote:

> https://www.megaknihy.cz/nezarazeno/416-der-grosse-duden.html

This book is from East Germany.
It was never written in this way in West Germany.



Re: German sharp S uppercase mapping

2024-12-03 Thread Julian Bradfield via Unicode
On 2024-12-03, Mark E. Shoulson via Unicode wrote:
> I remember when the debate about adding ẞ was ongoing here on this 
> list.  There were lots of old fonts shown which had a distinct uppercase 
> ß.  I remember that some insisted there was no such letter, pointing to 
> the pronouncements of the then-current Große Duden, which said that ß 
> capitalizes to SS, and yet the cover of *that very book*, in all-capital 
> letters an inch high, clearly showed its title as DER GROẞE DUDEN.

This intrigued me, so to save others searching, here's a photo of the
1957 edition:

https://www.megaknihy.cz/nezarazeno/416-der-grosse-duden.html



Re: German sharp S uppercase mapping

2024-12-03 Thread Daniel Buncic via Unicode

On 03.12.2024 at 02:51, Asmus Freytag via Unicode wrote:

> Rather than getting hung up on details of parsing one particular
> part of one sentence, it would be more useful from Unicode's
> perspective if someone (Daniel?) could sum up in a short document
> based on this discussion where Unicode is behind the curve and to
> make sure the support in CLDR is up to actual current practice and
> not what it was 10 or 15 years ago.


Thank you very much for the idea.  I could certainly sum up the 
arguments of the discussion (though I’m too busy to do it right now, you 
would have to have a few weeks’ patience), but I still haven’t 
understood where in the CLDR such casing information is stored.  There 
are data subsets that have “casing” in the title, but they only say 
whether the days of the week, month names, language names, etc. are 
capitalized in a certain language.  There is a field called “main 
exemplars” that contains all the small letters (for German, including ß) 
and another field called “index exemplars”, which for German does not 
even include Ä, Ö, and Ü.  I surmise that this is only meant for 
numbering items using letters (where indeed you can have parts A, B, C, 
etc. of a book, but you would never have a “part Ä”).  I cannot find any 
information saying something like a ↔ A, b ↔ B, etc.


For Turkish (https://www.unicode.org/cldr/charts/46/summary/tr.html), 
the “main letters” in the very first line are given as


[a b c ç d e f g ğ h ı iİ j k l m n o ö p r s ş t u ü v y z].

So there i and its capital counterpart İ are not separated by a space. 
But for German (https://www.unicode.org/cldr/charts/46/summary/de.html), 
the “main letters” are


[aä b c d e f g h i j k l m n oö p q r s ß t uü v w x y z],

where the missing space does not imply capitalization, so I guess 
changing this list to “… s ßẞ t …” would not automatically inform people 
that ß should be capitalized as ẞ.


In 
https://www.unicode.org/versions/Unicode16.0.0/UnicodeStandard-16.0.pdf 
on page 198 I find:
“Examples of case tailorings which are not covered by data in 
SpecialCasing.txt include: […] Uppercasing of U+00DF ‘ß’ LATIN SMALL 
LETTER SHARP S to U+1E9E LATIN CAPITAL LETTER SHARP S[.] The preferred 
mechanism for defining tailored casing operations is the Unicode Common 
Locale Data Repository (CLDR), https://cldr.unicode.org, where 
tailorings such as these can be specified on a per-language basis, as 
needed.”  So the idea is already there.  On page 295 the problem with ß 
is addressed in detail, and right underneath it says, “Additional 
language-specific or orthography-specific contexts and casing behavior 
is specified in the Unicode Common Locale Data Repository (CLDR), 
https://cldr.unicode.org.”  So does this already exist?  Or where does 
it have to be added?


Can anybody help?

Best wishes,

Daniel

--
Prof. Dr. Daniel Bunčić
===
Slavisches Institut der Universität zu Köln
Weyertal 137, D-50931 Köln
Telefon:   +49 (0)221  470-90535
Sprechstunden: https://uni.koeln/ENZEB
E-Mail:daniel.bun...@uni-koeln.de = dan...@buncic.de
Threema:   https://threema.id/8M375R5K
===
Homepage:  http://daniel.buncic.de/
Academia:  http://uni-koeln.academia.edu/buncic
ResearchGate:  https://researchgate.net/profile/Daniel-Buncic-2
===
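
The default behaviour described in the passage of the Unicode Standard
quoted above can be observed in any language that implements the
untailored case mappings. The following is a minimal Python sketch, not a
description of how CLDR or any library stores such a tailoring. The
function names are hypothetical and introduced only for illustration; the
only library calls used are Python's built-in str.upper(), str.lower() and
str.replace().

```python
# Default (untailored) Unicode case mappings, as implemented by Python's
# built-in str methods, next to a hypothetical German tailoring.
SMALL_SHARP_S = "\u00DF"    # ß LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1E9E"  # ẞ LATIN CAPITAL LETTER SHARP S


def upper_default(text: str) -> str:
    """Untailored full uppercasing: ß expands to SS."""
    return text.upper()


def upper_capital_sharp_s(text: str) -> str:
    """Hypothetical tailoring: map ß to ẞ first, then uppercase the rest."""
    return text.replace(SMALL_SHARP_S, CAPITAL_SHARP_S).upper()


print(upper_default("Straße"))          # STRASSE
print(upper_capital_sharp_s("Straße"))  # STRAẞE
print(CAPITAL_SHARP_S.lower())          # ß
```

Note that the default lowercase mapping of ẞ is already ß, so only the
uppercase direction would need a tailoring. With the tailored form the
round trip is lossless, whereas "STRASSE" cannot be lowercased back to
"Straße" without a dictionary.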


Re: German sharp S uppercase mapping

2024-12-02 Thread Asmus Freytag via Unicode

(stuck in my outbox for a bit)
On 11/27/2024 5:12 AM, Daniel Buncic via Unicode wrote:

> On 27.11.2024 at 13:25, Otto Stolz via Unicode wrote:
>
>> So, the wording of the sentence has been reversed, but the example
>> is given in the same order as in the previous version.
>
> I will stop discussing the interpretation of this sentence now, but
> this is interesting: In the pdf version on the website of the
> orthography council
> (https://www.rechtschreibrat.com/DOX/RfdR_Amtliches-Regelwerk_2024.pdf,
> p. 48) it is “STRAẞE – STRASSE”.  On the website of the IDS (Institute
> for the German language), which is supposed to have just an HTML
> version of the same text, it is “STRASSE – STRAẞE”. Obviously some
> kind of copy-paste error.
>
> Best wishes,
>
> Daniel

Stepping back, it is clear that there has been a lengthy transition 
here. Up to some point in the past, the use of capital sharp S was 
limited to environments that had extended typeface support and some 
control over selecting glyphs, whether manually or through some 
mechanism other than a character code.


That is trivially true, because at some point, no character encoding 
existed for the capital form (nor was it part of manual keyboards). 
Nevertheless, it was used to some degree, as has been documented, even 
though it wasn't feasible to suggest or mandate its use in standard or 
default capitalization.


Later, the character was encoded and became supported in fonts and 
keyboards. Initially, the reaction of the rule makers was to allow it as 
an alternative to SS.


Since then, the support has become more widespread. It now appears that 
this is leading to a shift towards a stance from the "descriptive" rule 
makers that acknowledges the fact that the use of this character is no 
longer fundamentally limited.


The attached image, if it comes through, shows the latest use that I 
happened to catch a few days ago.


Rather than getting hung up on details of parsing one particular part of 
one sentence, it would be more useful from Unicode's perspective if 
someone (Daniel?) could sum up, in a short document based on this 
discussion, where Unicode is behind the curve, so as to make sure the 
support in CLDR is up to actual current practice and not what it was 10 
or 15 years ago.


As part of this, the clarification of the difference between stable 
identifier-safe casing and up-to-date text processing needs to be 
addressed. The problem report should explain the distinction and, if 
possible, list places in the text that need to be fixed, but fully 
worked out language isn't a requirement (I'm sure the properties groups 
and the editorial WG will have their own ideas on wording).


However, any perceived shortcomings in existing CLDR support should be 
noted.


A./

PS: one downside of the "SS" fallback is that it tends to interfere with 
the use of "ss" over "ß" in indicating the length of the preceding 
vowel. This is a consequence of the reform taken 30 years ago, which has 
been in use long enough to introduce the expectation for many readers 
that "SS" follows a short vowel, something that makes the use of capital 
sharp s more natural.
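
The distinction drawn above between stable, identifier-safe casing and
up-to-date text processing can be illustrated with default case folding,
which is meant for matching and is not supposed to change with spelling
conventions. This is a minimal Python sketch; match_key is a hypothetical
helper name, and the sketch leaves out normalization and any
identifier-specific rules a real system would add.

```python
def match_key(text: str) -> str:
    """Caseless matching key based on default Unicode case folding."""
    return text.casefold()


# Every spelling of the word folds to the same key, whichever uppercase
# convention the writer followed.
spellings = ["Straße", "STRASSE", "STRAẞE", "strasse"]
keys = {match_key(s) for s in spellings}
print(keys)  # {'strasse'}
```

A display-oriented uppercasing could therefore move from SS to ẞ without
affecting comparisons built on such a key.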




Re: German sharp S uppercase mapping

2024-12-02 Thread Asmus Freytag via Unicode

On 12/2/2024 2:37 PM, Steffen Nurpmeso via Unicode wrote:

> i want to point to that photo of
> Frankfurt/Main, where it is uppercase STRASSE

In the labeling of the photo, which was likely created on a typewriter 
(note the monospaced font).


I thought at first you had found an actual street sign.

A./


Re: German sharp S uppercase mapping

2024-12-02 Thread Asmus Freytag via Unicode

On 12/2/2024 2:40 PM, Steffen Nurpmeso via Unicode wrote:

> There are also umlauts
> where for example the dots of the U are "inside it", or mostly
> look only like a line inside it.  But i only know that for U.)


There are also fonts that have an "e" overlaid on some white space 
inside the capital letter.


I remember seeing that in a Fraktur font long ago, well before I even 
knew what a typeface was.


A./


Re: German sharp S uppercase mapping

2024-12-02 Thread Asmus Freytag via Unicode

On 12/2/2024 5:07 PM, Mark E. Shoulson via Unicode wrote:

> I remember when the debate about adding ẞ was ongoing here on this
> list. There were lots of old fonts shown which had a distinct
> uppercase ß.  I remember that some insisted there was no such letter,
> pointing to the pronouncements of the then-current Große Duden, which
> said that ß capitalizes to SS, and yet the cover of *that very book*,
> in all-capital letters an inch high, clearly showed its title as DER
> GROẞE DUDEN.
>
> We can find the discussion in the list archives somewhere. Suffice to
> say, ẞ does seem to be a real thing and seems to have been so even
> before it was recognized.
>
> ~mark


Some of us remember those discussions.

And the evidence introduced at the time.

A./


Re: German sharp S uppercase mapping

2024-12-02 Thread Markus Scherer via Unicode
On Mon, Dec 2, 2024 at 5:11 PM Mark E. Shoulson via Unicode <
unicode@corp.unicode.org> wrote:

> We can find the discussion in the list archives somewhere. Suffice to
> say, ẞ does seem to be a real thing and seems to have been so even
> before it was recognized.
>

No one here questions whether it's the real thing.

My question is whether it is in customary use, with higher frequency than
SS, in all-caps text where there is nothing holding the
authors/publishers/producers/advertisers back from using it.

What I have been able to find so far points to some use, but less use than
SS or ß.
I am looking forward to convincing evidence of the opposite.

markus


Re: German sharp S uppercase mapping

2024-12-02 Thread Mark E. Shoulson via Unicode
I remember when the debate about adding ẞ was ongoing here on this 
list.  There were lots of old fonts shown which had a distinct uppercase 
ß.  I remember that some insisted there was no such letter, pointing to 
the pronouncements of the then-current Große Duden, which said that ß 
capitalizes to SS, and yet the cover of *that very book*, in all-capital 
letters an inch high, clearly showed its title as DER GROẞE DUDEN.


We can find the discussion in the list archives somewhere. Suffice to 
say, ẞ does seem to be a real thing and seems to have been so even 
before it was recognized.


~mark



Re: German sharp S uppercase mapping

2024-12-02 Thread Steffen Nurpmeso via Unicode
Marius Spix wrote:
 |There are other examples, for example the street names Oelmühlenstraße \
 |in Bielefeld and An der Oelmühle in Straelen. (In some town names like \
 |Straelen or Baesweiler, the ae is correct. This is a stretch e, \
 |similar to English bid/bead.)

We got a logical and interesting response for why this
hand-painted plaque looked like it did.  (There are also umlauts
where for example the dots of the U are "inside it", or mostly
look only like a line inside it.  But i only know that for U.)

--steffen



Re: German sharp S uppercase mapping

2024-12-02 Thread Steffen Nurpmeso via Unicode
Asmus Freytag via Unicode wrote:
 |On 12/2/2024 1:29 PM, Daniel Buncic via Unicode wrote:
 |> write aufwändiger Missstand instead of aufwendiger Mißstand.
 |
 |I vote for aufwendiger Missstand.

I need to aufwenden some time to transzendent that.

--steffen



Re: German sharp S uppercase mapping

2024-12-02 Thread Steffen Nurpmeso via Unicode
Hello.

Asmus Freytag wrote:
 |On 11/30/2024 4:44 PM, Steffen Nurpmeso wrote:
 |> Doug Ewell via Unicode wrote:
 |>|Thanks to Asmus for saying what I had planned to say, except that his \
 |>|was better-worded, more carefully put together, and more authoritative.
 |>|
 |>|Casing for text meant for human readers should follow current local \
 |>|conventions.
 |>|
 |>|Casing for text meant for machine processing (file systems, databases, \
 |>|etc.) must remain stable, even when local conventions change.
 |>
 |> Sorry that makes totally no sense to me.
 |
 |Because you probably have never considered what havoc would be caused by 

Well this thread has become over my head anyway.  I am not
a linguist, not a Germanist, i have not followed this path;
It is only i feel pain if poets -- so-called, and not so-called,
this gets philosophic now, and political, in modern times, when
one can say the most terrible aggressive nonsense and wish death
to people, and all that without being silenced, whereas even the
world-wide known Documenta art exhibition had to first cover and
then remove an image which was interpreted as being antisemitic
(in a crowd of policemen with helmets and baton one could (well,
a bit, but indeed) be seen aka viewed as a Jew aka Israel, and we
know they are not policemen.  It *could* be that was what they did
not like.), start to decompose the German language.
It always changed, over the centuries, the "Teutsch" that it was
centuries ago is very different from what is known as German
today.  That much is plain.  Very different.
Today too much English is entering, and i for one still totally
dislike the last spelling reform, it was one of my "turn off head
here" moments.  (Like, almost a decade before that one, i as
a very good pupil who loved to learn Latin went on the "tin drum"
path and turned away from Latin, once a CDU (right republican)
state secretary gave a talk show interview in all Latin, and
i still can hear the wonderful Lea Rosh (Jewish, btw) say "*Ich*
verstehe Sie, aber was ist mit den Menschen da draußen?" (i
understand you, but what about the people out there?), so, not
to "waver in the wind like the grain" (is that Binding?),
i stopped supporting Latin, which was a costly decision.)  Anyhow.

 |changing in mid-stream anything about the processing of identifiers and 
 |similar strings that are not text, but unambiguous references to 
 |domains, files or other resources.

But that .. not.  You know, if Unicode aware software *really*
uses code points with such special properties in a form that can
lead to ambiguities later on, it is just more dumb than i am.
And .. is it??

 |(You can make some changes when you create a new domain and set rules 
 |for it for the first time, but that's about it).
 |
 |> I would, however, not bring in uppercase sharp S for quite some
 |> time.  But at some time, or when really the SS would be banned "in
 |> all Germans" which are used as official languages, sooner that is,
 |> then the current Unicode data would be just wrong.
 |
 |Good luck banning SS in "all Germans".

I never had the intention.  If the Swiss people like to write
Missstand, they can do so.  It is likely easier to speak out than
Mißstand when your belly is full of Raclette cheese!!!

 |You probably did not consider the fact that German written in 
 |Switzerland only uses ss and SS and therefore banning SS would be more 
 |than confusing. It would needlessly compromise a well accepted local 
 |orthography.

No!!!  I have been well introduced to the Swiss:

  https://www.youtube.com/watch?v=UAKk8IcLo18

 |German written in Germany has undergone a transition, and partially 
 |as a result of / but mostly in parallel to that transition, we see a 
 |secondary shift in capitalization away from a typewriter / telegraph 
 |inspired fallback to a more natural way of writing where all letters 
 |have a case pair and casing is reversible.

Hm.  Who says that?  I think, or at least i definitely hope, that
after some years of waiting some sense is coming back, you know.
Frenzy, greed, hypocrisy, lies, and all that, that is not
a sane foundation for a society.  That, who would have believed
*that*, is already written down in "the book" of the (Jews and)
Christians.  Really.  I hope for some kind of re-re-education.
At the moment, however, it looks as if that could then lead into
the totally wrong direction, from my very own point of view, of
course.  Regardless, i think good German will have a renaissance.
(Away from "Fack ju Göhte".  To "more light!", to say with Göthe.)

 |  The same cannot be said for Swiss German. There's not been a similar 
 |transition, which would have needed to start with lower case usage. As 
 |long as the Swiss don't use ß, I can't see them banning "SS".

Yes, in my opinion Unicode cannot (should not that is) do anything
unless this is unambiguous.  It seems there are still Swiss people
alive who have grown up with that letter.(?)
I

Re: German sharp S uppercase mapping

2024-12-02 Thread Asmus Freytag via Unicode

On 12/2/2024 1:29 PM, Daniel Buncic via Unicode wrote:

> write aufwändiger Missstand instead of aufwendiger Mißstand.

I vote for aufwendiger Missstand.

A./


Re: German sharp S uppercase mapping

2024-12-02 Thread Markus Scherer via Unicode
On Mon, Dec 2, 2024 at 1:15 PM David Starner via Unicode <
unicode@corp.unicode.org> wrote:

> Secondly, is there a position that ß should be used in uppercase
> contexts, especially as opposed to using ẞ? If there's absolutely no
> such movement, I think it clear that ß should be counted as a glyph
> variant of ẞ in uppercase contexts.


There is no "movement". It has long been one of the in-use spellings, for
when people wanted to disambiguate and the new capital version didn't yet
exist, or for whatever reasons if in new publications.

Characters can be displayed in a variety of glyphs, but claiming that even
if it uses a glyph in the range of character x it is "intended" to actually
be character y which has a different range of glyphs, destroys its
character identity.
If you do that, then all bets are off. Who is to say that any of the
uppercase-looking things are actually glyphs for uppercase characters? They
might as well be glyph variations for their lowercase characters.

markus


Re: German sharp S uppercase mapping

2024-12-02 Thread Asmus Freytag via Unicode

On 12/2/2024 1:12 PM, David Starner via Unicode wrote:

> On Sun, Dec 1, 2024 at 11:27 PM Asmus Freytag via Unicode wrote:
>
>> What you are arguing is that one should not use that fallback any
>> longer. I have no arguments with that, but in this case, the fallback
>> was used.
>
> Let me break it down into two points. Starting with the less
> controversial, when counting the use of the capital ẞ, one should
> count ß in uppercase contexts separately from SS.

Wait, what are we counting now?

> Secondly, is there a position that ß should be used in uppercase
> contexts, especially as opposed to using ẞ?

I've stated, as fact, that the example shows a fallback using the
lowercase in an ALL CAPS context. This fallback was discussed
extensively as part of the background research for the proposal to
encode the uppercase form. Therefore, the fallback is not a simple typo,
but something that was practiced and perhaps even recommended by some
(at the time).

As you quote, I was agreeing with whoever I replied to that the fallback
has outlived its usefulness. So your question here is disingenuous.

> If there's absolutely no
> such movement, I think it clear that ß should be counted as a glyph
> variant of ẞ in uppercase contexts.

Different letters aren't glyph variants of each other, they are
alternate spellings.

I have no issue with acknowledging that alternate spellings exist in
this context (ALL CAPS). Incidentally, SS is also one of the alternate
spellings.

I would be happy if things settled to where the single capital letter
becomes the preferred spelling. But that's different from reading a
lowercase letter "as if" it were the uppercase one.

> Fallbacks like that are almost
> always normalized; older texts usually have long-s turned to s and
> scriptorial abbreviations expanded when published, for example. If
> there is a serious movement against ẞ and for ß as uppercase, then I'm
> wrong. I'm certainly biased towards having neat upper-lowercase pairs.

Yes, and strawman argument. Nobody is asking for a "movement against ẞ
and for ß as uppercase", but we are arguing against calling a lowercase
letter an uppercase letter (in a specific example where the glyph
clearly marks it as the lowercase one).

Pretending that alternate spellings aren't used is not helpful. And it's
not required as a precondition to having a preferred spelling different
from the example we discussed.


A./





Aw: Re: German sharp S uppercase mapping

2024-12-02 Thread Marius Spix via Unicode
There are other examples, for example the street names Oelmühlenstraße
in Bielefeld and An der Oelmühle in Straelen. (In some town names like
Straelen or Baesweiler, the ae is correct. This is a stretch e, similar
to English bid/bead.)




Re: German sharp S uppercase mapping

2024-12-02 Thread Daniel Buncic via Unicode

On 02.12.2024 at 21:06, Steffen Nurpmeso via Unicode wrote:

> Not surprising but tradition. Please let me attach a colourized
> photograph of Köln from when Germany still had an emperor, with
> hand painted plaques, and on the plaque of the ferry (beside the
> pontoon bridge) one can read "Ueberfahrt nach Köln".  (A bit size
> reduced but readable.)


A very nice photograph.  However, the case with “Ueberfahrt” is a 
completely different one.  That has to do with the fact that in old 
letterpress printing there was no space for diacritics above capitals. 
This is why Ü had to be replaced with Ue, just like a) in Czech until 
the 19th century, Cz was written instead of Č, Sſ instead of Š, etc., b) 
Polish Ż was often simply written as Z without the dot or replaced with 
Ƶ, which is still often written in handwriting, c) there is an official 
rule in French orthography to this day that you do not have to place 
accent marks on capitals, which means that you can choose between “Etat” 
and “État”, d) Greek diacritics are placed before capitals, e.g. Άθως 
for the mountain Athos (where Áθως would simply be wrong), e) in Italian 
you can often see things like CAFFE’ instead of CAFFÈ, etc.  But Köln, 
with a small ö, is spelled “Köln” on the same sign; there was no reason 
for “Koeln”.  The modern examples I gave are different, they have 
“Koeln” or even “koeln” for completely different reasons.  And these 
reasons are the same as for writing ss instead of ß in many similar 
cases; that’s why I brought them up.



> And yes, it gets more schizophrenic now you can use this letter but
> have to write aufwändiger Missstand instead of aufwendiger Mißstand.


I don’t think we have to discuss the spelling reform of 1998 here...

Best wishes,

Daniel




Re: German sharp S uppercase mapping

2024-12-02 Thread David Starner via Unicode
On Sun, Dec 1, 2024 at 11:27 PM Asmus Freytag via Unicode
 wrote:
> What you are arguing is that one should not use that fallback any
> longer. I have no arguments with that, but in this case, the fallback
> was used.

Let me break it down into two points. Starting with the less
controversial, when counting the use of the capital ẞ, one should
count ß in uppercase contexts separately from SS.

Secondly, is there a position that ß should be used in uppercase
contexts, especially as opposed to using ẞ? If there's absolutely no
such movement, I think it clear that ß should be counted as a glyph
variant of ẞ in uppercase contexts. Fallbacks like that are almost
always normalized; older texts usually have long-s turned to s and
scriptorial abbreviations expanded when published, for example. If
there is a serious movement against ẞ and for ß as uppercase, then I'm
wrong. I'm certainly biased towards having neat upper-lowercase pairs.

-- 
The standard is written in English . If you have trouble understanding
a particular section, read it again and again and again . . . Sit up
straight. Eat your vegetables. Do not mumble. -- _Pascal_, ISO 7185
(1991)



Re: German sharp S uppercase mapping

2024-12-02 Thread Steffen Nurpmeso via Unicode
Otto Stolz wrote in
 <499cca63-5d54-495d-960c-fada63900...@uni-konstanz.de>:
 |am 2024-12-01 um 1:44 Uhr hat Steffen Nurpmeso geschrieben:
 |> Yes, and Straße/STRASSE is such a thing if there is Gasse/GASSE
 |> but which just has the same "S-sound", and always had since the
 |> earth existed (1972).  Was it Gaße, ever? 
 |
 |No, never.
 |
 |In German orthography, double consonants mark the preceding vowel as 
 |being short (if there isn’t just a mere co-incidence in a compound,
 |e. g. “Mausschwanz” (mouse tail)). As the “a” in “Straße” is long,
 |you write “ß”; as the “a” in “Gasse” is short, you write “ss”.
 |Cf. 
 |and .

Thank you very much.  I really lost all my knowledge except what
"came in with the mother milk", so to say.
There exist names "Gaß", and the community Külz has a district
"Gaß".  Funnily, maybe, if i search Google for Külz and use the
Map to zoom in, one sees "Gass", but as you zoom in, very nearby,
"Gass" changes to "Gaß".  Haha!  That one goes to Doug Ewell!!

  
https://www.google.com/maps/place/55471+K%C3%BClz+(Hunsr%C3%BCck)/@50.0034197,7.5056758,16z/data=!4m6!3m5!1s0x47be11df4212946b:0xfb88c8d936405a1e!8m2!3d50.0061079!4d7.4893107!16s%2Fm%2F02z3fnp?entry=ttu&g_ep=EgoyMDI0MTEyNC4xIKXMDSoASAFQAw%3D%3D

--steffen



Re: German sharp S uppercase mapping

2024-12-02 Thread Steffen Nurpmeso via Unicode
David Starner via Unicode wrote:
 |On Sun, Dec 1, 2024 at 12:35 PM Markus Scherer via Unicode
 | wrote:
 ...
 |[.]Searching Books for Der
 |große Gatsby shows 12 or 13 distinct covers, 6 with DER (or Der)
 |GROSSE GATSBY on the cover, three with DER (or Der) GROẞE GATSBY, and
 |4 with lowercase titles. A few dated back to 2006, so it's not a
 |trivial sample of modern covers.

And please do not use that book as an example, it surely was
mentioned maliciously given that it is a short book that flies by
in a rush, as in a state of euphoria, or better even intoxication.
It is exemplary of the Anglo-Saxon lifestyle, and even though
Germany has become a hundred percent vassal that throws much more
money over the ocean for false things than is healthy or advisable
(just to name Monsanto), under the surface there is a long
history that points to different things.

 |The standard is written in English . If you have trouble understanding
 |a particular section, read it again and again and again . . . Sit up
 |straight. Eat your vegetables. Do not mumble. -- _Pascal_, ISO 7185
 |(1991)

Ah!  I see.  No gotos.

--steffen



Re: German sharp S uppercase mapping

2024-12-02 Thread Steffen Nurpmeso via Unicode
Daniel Buncic via Unicode wrote in
 <6f429d01-882c-403c-a163-4ab66bda2...@uni-koeln.de>:
 |Am 01.12.2024 um 04:15 schrieb Markus Scherer via Unicode:
 |> As a library implementer and German speaker, I have been looking out
 |> for the supposed sea change in usage, and haven't seen it.
 |
 |There are three things that make the change less visible.  First, domain 
 |names and other ASCII environments as well as stylistic devices.  I live 
 |in a town called Brühl and work in a city called Köln (which has its own 
 |top-level domain, .koeln).  You see a surprising number of signage, 
 |logos, etc. which spell the names as “Bruehl” or “Koeln”:
 |   https://www.cvjm.koeln/ueber-uns/175-jahre.html
 |   https://mhi-koeln.de/
 |   http://kleinbahn.koeln/
 |   https://koeln-weekend.de/
 |   https://www.ebay.de/str/koelnartkunsthandel

Not surprising but tradition.
Please let me attach a colourized photograph of Köln from when
Germany still had an emperor, with hand painted plaques, and on
the plaque of the ferry (beside the pontoon bridge) one can read
"Ueberfahrt nach Köln".  (A bit size reduced but readable.)

  ...
 |A few years ago, the number of capital ẞ you could see was exactly zero. 
 |  Now they are popping up more and more.  For the above reasons, they 
 |are not the majority yet, but they are increasing fast.  Language change 
 |is happening in front of our eyes.

Because you get urged by the wind, that is the reason.
And yes, it gets more schizophrenic now you can use this letter
but have to write aufwändiger Missstand instead of aufwendiger
Mißstand.
But nice to read your fluent German, i do not often speak in my
native language!

--steffen


Aw: Re: German sharp S uppercase mapping

2024-12-02 Thread Marius Spix via Unicode
Sent: Monday, 2 December 2024 at 14:37
From: "Dominikus Dittes Scherkl via Unicode" 
To: unicode@corp.unicode.org
CC: "Dominikus Dittes Scherkl" 
Subject: Re: German sharp S uppercase mapping
On 02.12.24 at 14:24, Julian Bradfield via Unicode wrote:
> On 2024-12-02, Dominikus Dittes Scherkl via Unicode wrote:
>> No. I want to be able if I have 2 files "Weiß.doc" and "Weiss.doc" on my
>> system and copy them to windows, to get 2 files "WEIẞ.DOC" and "WEISS.DOC".
>
> You can't have two files called "Weiss.doc" and "weiss.doc" and expect
> to copy them both to Windows and get two files. Why is this case any worse?
Ok, you are right.
Case-insensitive file systems simply suck. No gain in trying to fix them.


The files "weiss.txt" and "weiß.txt" CAN exist on NTFS, VFAT, FAT32, exFAT and 
ReFS at the same time. For "weiß.txt", a short filename "WEI~1.TXT" is 
generated for backward compatibility with software expecting 8.3 filenames.

NTFS and ReFS do support a case-sensitive mode, which is not enabled by 
default. Even if the case-sensitive mode is off, "weiss.txt" and "weiß.txt" are 
considered to be distinct files.

However, this won't work with some software like OneDrive or Sharepoint, which 
do not allow "weiss.txt" and "weiß.txt" in the same directory.

Case-sensitive file systems are supported by the Windows kernel (there is a 
registry setting called ObCaseInsensitive). This is required for special use 
cases like GNU on Windows (GOW), but it is not enabled by default, because it 
would break compatibility with legacy software.
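
One way to picture why "weiss.txt" and "weiß.txt" can stay distinct while "Weiss.txt" and "weiss.txt" cannot is to compare a one-to-one uppercase table with the full uppercase mapping. The following Python sketch is only an illustration: `simple_upper` is a hypothetical helper that approximates a per-character table by leaving any character alone whose full uppercase form is longer than one character. It is not the table that NTFS or any other file system actually uses.

```python
def simple_upper(name: str) -> str:
    """Approximate a one-to-one uppercase mapping (hypothetical helper).

    Characters whose full uppercase form expands to several characters,
    such as "ß" -> "SS", are left unchanged, because a per-character
    table has no single-character target for them.
    """
    return "".join(c.upper() if len(c.upper()) == 1 else c for c in name)


def same_name_simple(a: str, b: str) -> bool:
    """Compare two names under the approximate one-to-one mapping."""
    return simple_upper(a) == simple_upper(b)


def same_name_full(a: str, b: str) -> bool:
    """Compare two names under Python's full uppercase mapping."""
    return a.upper() == b.upper()


print(same_name_simple("Weiss.txt", "weiss.txt"))  # True: plain case difference
print(same_name_simple("weiss.txt", "weiß.txt"))   # False: ß is left alone
print(same_name_full("weiss.txt", "weiß.txt"))     # True: ß expands to SS
```

Under this assumption, only a comparison built on the full mapping would treat the two names as the same file.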






Re: German sharp S uppercase mapping

2024-12-02 Thread Julian Bradfield via Unicode
On 2024-12-02, Dominikus Dittes Scherkl via Unicode  
wrote:
> No. I want to be able if I have 2 files "Weiß.doc" and "Weiss.doc" on my
> system and copy them to windows, to get 2 files "WEIẞ.DOC" and "WEISS.DOC".

You can't have two files called "Weiss.doc" and "weiss.doc" and expect
to copy them both to Windows and get two files. Why is this case any worse?


Re: German sharp S uppercase mapping

2024-12-02 Thread Dominikus Dittes Scherkl via Unicode

On 02.12.24 at 14:24, Julian Bradfield via Unicode wrote:

On 2024-12-02, Dominikus Dittes Scherkl via Unicode  
wrote:

No. I want to be able if I have 2 files "Weiß.doc" and "Weiss.doc" on my
system and copy them to windows, to get 2 files "WEIẞ.DOC" and "WEISS.DOC".


You can't have two files called "Weiss.doc" and "weiss.doc" and expect
to copy them both to Windows and get two files. Why is this case any worse?

Ok, you are right.
Case-insensitive file systems simply suck. No gain in trying to fix them.





Re: German sharp S uppercase mapping

2024-12-02 Thread Dominikus Dittes Scherkl via Unicode

On 02.12.24 at 12:18, Ivan Panchenko via Unicode wrote:

it may not be Unicode’s responsibility to promote it.

It has nothing to do with promotion.
I think from the start it was the intention to make the uppercase of ß
unambiguous by adding a new glyph. But Germany insisted (until 2017) on
using the old uppercase (and nobody else cared).
Now Germany has given up its resistance, so naturally the default should
be changed as intended. Especially as this would fix some problems (like
the case-insensitive file copy and the legal names in all-uppercase
documents).


So not exactly prominently featured (“in some fonts”). And again, I
find it highly unlikely that the Rat ever intended to make a
recommendation here

No, they have just given up their resistance to something that was
intended to fix a bug.




Re: German sharp S uppercase mapping

2024-12-02 Thread Dominikus Dittes Scherkl via Unicode

On 02.12.24 at 12:25, Giacomo Catenazzi via Unicode wrote:

I think you are taking the issue personally, and in a myopic view.

Na, fortunately my name doesn't contain ß, so I really don't care.
But it upsets me that something practically nobody ever needs
_except_ a few people who have legal problems with it cannot be
changed to something that fixes their problem and won't harm anyone
else. And that only because "it has to be stable".

The new letter is there to disentangle two things that were
unnecessarily conflated.


So you want to break many files? If you change the machine translation,
you will have conflicting (and possibly disappearing) documents, because
file systems on Windows (and, by default, on macOS) are case-insensitive.

No. I want to be able if I have 2 files "Weiß.doc" and "Weiss.doc" on my
system and copy them to windows, to get 2 files "WEIẞ.DOC" and "WEISS.DOC".

But what I get now is 1 file, because the second one overwrites the
first one, as from Windows point of view they both have the same name.
This is awful garbage and should be fixed!

And if you copy a whole drive to windows, would you even recognize that
some files were silently dropped?
If you are lucky, you may get some strange message: "File xy already
exists. Do you want to overwrite it?". But I won't count on it.
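
A copy tool could guard against such silent overwrites by checking for colliding names before it writes anything. The sketch below is a minimal illustration in Python, assuming that `str.casefold()` is an acceptable stand-in for the target's name comparison. That is an assumption: real file systems use their own tables, and whether a particular target merges "Weiß.doc" and "Weiss.doc" depends on that table. `find_collisions` is a hypothetical helper name.

```python
from collections import defaultdict


def find_collisions(names):
    """Group file names that a case-insensitive comparison would merge.

    str.casefold() stands in for the target's comparison rule; this is
    an assumption, real file systems use their own tables.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Only groups with more than one member are a problem.
    return [group for group in groups.values() if len(group) > 1]


print(find_collisions(["Weiß.doc", "Weiss.doc", "other.txt"]))
# [['Weiß.doc', 'Weiss.doc']]
```

Running such a check first turns a silent loss into a reported conflict that the user can resolve by renaming.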


Especially for security reasons, the casing should be changed - to not
lose the "ß" in your name and therefore being considered a different
person IN YOUR LEGAL DOCUMENTS - the most important identifier of all!


As people said: that it is a different case, which should be handled
outside the machine translation.

No, this is the only relevant case. Legal documents often use
all-uppercase and thereby garble names containing ß.
In manual processing you can always choose yourself what letter you want
to use. And if anywhere some ẞ occurs and you don't like it, you are
always free to replace it by SS.
But leaving it there can never be more than an aesthetic problem. On the
other hand the old uppercase can be a serious problem.


so as other said: it should be put in local casing. In
Switzerland we do not want such legal distinction.

Swiss German doesn't use ß, so there is no problem for them. But I suspect
people going to Switzerland still want their names to be spelled
correctly. Maybe the Swiss government doesn't care about this problem, but
it won't hurt them if by accident (changing their sacred default casing)
their systems would no longer annoy foreign people, no?


Unlike Turkish, which has a different uppercase for "i" - a letter that is
used differently in pretty much _any_ other Latin-script language - "ß"
is not used differently in any other language. It is not used in any
other language at all.

> Switzerland uses it differently.

Yeah, it doesn't use it at all. But then, why should they care what the
uppercase of a letter they don't use is?

How does Switzerland handle foreign names containing ß?

--
 Dominikus Dittes Scherkl





Re: German sharp S uppercase mapping

2024-12-02 Thread Giacomo Catenazzi via Unicode

I think you are taking the issue personally, and in a myopic view.


On 2024-12-02 11:33, Dominikus Dittes Scherkl via Unicode wrote:

On 02.12.24 at 07:13, Asmus Freytag via Unicode wrote:

On 12/1/2024 9:09 PM, David Starner via Unicode wrote:

On Sun, Dec 1, 2024 at 7:54 PM Dominikus Dittes Scherkl via Unicode
 wrote:
But in automatic text processing the old form is simply a bug that needs
to be fixed. The new form has to be the "default" - otherwise
implementations will proliferate this bug forever.

Various systems take for granted that case folding is stable.

But that is the problem with the old casing: IT IS NOT STABLE!
toLower(toUpper("ß"))=="ss" - this is simply wrong, no matter which
language or locale you are using (besides the fact that ß is used nowhere
except in German). This is the reason why the new "ẞ"
was invented - to allow round trips without modifying the text!


So you want to break many files? If you change the machine translation, 
you will have conflicting (and possibly disappearing) documents, because 
file systems on Windows (and, by default, on macOS) are case-insensitive.



Very much agreed on that one. Usually in the context of "identifiers"
and not in free text.

Especially for security reasons, the casing should be changed - to not
lose the "ß" in your name and therefore being considered a different
person IN YOUR LEGAL DOCUMENTS - the most important identifier of all!


As people have said, that is a different case, which should be handled 
outside the machine translation.


You are speaking about legal documents, but in reality only about German 
legal documents, so, as others have said, it should be put in local casing. 
In Switzerland we do not want such a legal distinction.




or change the interpretation of code in case-insensitive filesystems.
The automated default isn't going to change, and German is going to
have to join Turkish in that purely default case-conversion just
doesn't work for them.

Unlike Turkish, which has a different uppercase for "i" - a letter that is
used differently in pretty much _any_ other Latin-script language - "ß"
is not used differently in any other language. It is not used in any
other language at all.


Switzerland uses it differently.

giacomo




Re: German sharp S uppercase mapping

2024-12-02 Thread Ivan Panchenko via Unicode
For some perspective: The sharp s is made of the long s, which just
corresponds to a capital round S in uppercase*; this is why the
capital sharp S is not uncontroversial. To this day, “ẞ” is rather
exotic and people usually write “SS” or (non-official) “ß”. While I
like the capital letter, it may not be Unicode’s responsibility to
promote it. (* Though the Ehmcke antiqua does have a capital long S,
which resembles an integral sign!)

Daniel Buncic via Unicode :
> Remember, the new
> wording that expresses a preference for ẞ over SS (or at least treats
> them equally) was only published this year, with the new Duden edition
> (which is what people actually read rather than the official rules)
> coming out in August, just 3½ months ago.

This is how Duden currently explains it:

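“Bei Verwendung von Großbuchstaben steht traditionellerweise SS für ß.
In manchen Schriften gibt es aber auch einen entsprechenden
Großbuchstaben; seine Verwendung ist fakultativ ‹§ 25 E3›.”
[“When capital letters are used, SS traditionally stands for ß. In some
fonts, however, there is also a corresponding capital letter; its use
is optional ‹§ 25 E3›.”]

“In Dokumenten kann bei Namen aus Gründen der Eindeutigkeit auch bei
Großbuchstaben anstelle von Doppel-s bzw. großem Eszett das kleine ß
verwendet werden.”
[“In documents, for the sake of unambiguity, the lowercase ß may also be
used in names written in capitals, in place of double s or capital
Eszett.”]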

So not exactly prominently featured (“in some fonts”). And again, I
find it highly unlikely that the Rat ever intended to make a
recommendation here, given that they do not make one regarding
“Geografie”/“Geographie” etc.



Re: German sharp S uppercase mapping

2024-12-02 Thread Dominikus Dittes Scherkl via Unicode

On 02.12.24 at 06:50, Asmus Freytag via Unicode wrote:

Good luck banning SS in "all Germans".


What nonsense. Nobody wants this.

Of course "ss" continues to be used - not only in Swiss German.
This is about the default for case-mapping.
Uppercase "ss" was, is, and always will be "SS".
But the uppercase of "ß" should now be "ẞ". And it only matters in
automatic text processing, to prevent "ß" from being changed to "ss" -
by some stupid case-insensitive filesystem or whatever.




Re: German sharp S uppercase mapping

2024-12-02 Thread Dominikus Dittes Scherkl via Unicode

On 02.12.24 at 07:13, Asmus Freytag via Unicode wrote:

On 12/1/2024 9:09 PM, David Starner via Unicode wrote:

On Sun, Dec 1, 2024 at 7:54 PM Dominikus Dittes Scherkl via Unicode
 wrote:

But in automatic text processing the old form is simply a bug that needs
to be fixed. The new form has to be the "default" - otherwise
implementations will proliferate this bug forever.

Various systems take for granted that case folding is stable.

But that is the problem with the old casing: IT IS NOT STABLE!
toLower(toUpper("ß"))=="ss" - this is simply wrong, no matter which
language or locale you are using (besides the fact that ß is used nowhere
except in German). This is the reason why the new "ẞ"
was invented - to allow round trips without modifying the text!
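
The round trip in question can be reproduced with Python's built-in string methods, which follow the Unicode full case mappings. This is a small sketch for illustration; the helper names `upper_lower_round_trip` and `survives_round_trip` are made up here, and the check is only meaningful for input that is already lowercase.

```python
def upper_lower_round_trip(text: str) -> str:
    """Uppercase the text, then lowercase the result."""
    return text.upper().lower()


def survives_round_trip(lowercase_text: str) -> bool:
    """True if an already-lowercase string comes back unchanged."""
    return upper_lower_round_trip(lowercase_text) == lowercase_text


print("ß".upper())                  # SS
print(upper_lower_round_trip("ß"))  # ss
print("ẞ".lower())                  # ß
print(survives_round_trip("heß"))   # False
print(survives_round_trip("hess"))  # True
```

The capital ẞ (U+1E9E) already lowercases to ß; the debate in this thread is only about the other direction, where the unconditional uppercase of ß remains "SS".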


Very much agreed on that one. Usually in the context of "identifiers"
and not in free text.

Especially for security reasons, the casing should be changed - to not
lose the "ß" in your name and therefore being considered a different
person IN YOUR LEGAL DOCUMENTS - the most important identifier of all!


Differences in how Unicode data is interpreted have opened security holes
in systems, and while this isn't particularly likely with this change,
it is possible, which is part of the reason case-folding is guaranteed
to be stable. Such a change can confuse case-insensitive filesystems,

Beside the fact that case-insensitive filesystems are a pain in the ass,
it is especially necessary there not to lose the information whether
something contained a "ß" or an "ss" - which was not possible with the
old casing.


or change the interpretation of code in case-insensitive filesystems.
The automated default isn't going to change, and German is going to
have to join Turkish in that purely default case-conversion just
doesn't work for them.

Unlike Turkish, which has a different uppercase for "i" - a letter that is
used differently in pretty much _any_ other Latin-script language - "ß"
is not used differently in any other language. It is not used in any
other language at all.


By "default", if I start editing a document, I should not have to worry
about getting a deficient case mapping/case conversion implementation
just because I'm using the "wrong" language.

Correct. This is why the case mapping should be changed _for all_
languages and locales. The default should be changed. No one should be
using the old casing, unless they specifically tailor their system to use it.


Likewise, by default, I should never get the locale-dependent case
conversion invoked when accessing file systems or domain names.

Correct. But with the old mapping, the system will change my name from
"Heß" to "HESS" against my wishes - and that cannot be undone if I start
using a case-sensitive filesystem, unless I know that it is wrong and
change it back manually. The new mapping is there to fix that. So please,
start using it! NOW.




Re: German sharp S uppercase mapping

2024-12-02 Thread Dominikus Dittes Scherkl via Unicode

On 02.12.24 at 11:19, Marius Spix via Unicode wrote:

That problem is not new. The long ſ, which is only used in old Fraktur 
script, but not in modern Antiqua script, has the same issue. It shares its 
uppercase form S with the round s, which behaves differently than the Greek 
final Sigma ς and can appear mid-word, for example in compound words.

For example: to_lower(to_upper("Hauſtür")) returns "Haustür", which is 
inaccurate.

That can even make a difference, because "Werksirene" and "Werkſirene" or "Antragsteller" 
and "Antragſteller" have completely different meanings.


Yes, but at least the long ſ is not used in legal names (today). This is
the main source of problems. Otherwise, I am sure, we would now have a
new upper-case form of long ſ. :-)




Aw: Re: German sharp S uppercase mapping

2024-12-02 Thread Marius Spix via Unicode
That problem is not new. The long ſ, which is only used in old Fraktur 
script, but not in modern Antiqua script, has the same issue. It shares its 
uppercase form S with the round s, which behaves differently than the Greek 
final Sigma ς and can appear mid-word, for example in compound words.

For example: to_lower(to_upper("Hauſtür")) returns "Haustür", which is 
inaccurate.
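
The same loss can be shown with Python's built-in case mappings. This is a small sketch; `upper_then_lower` and `loses_long_s` are hypothetical helper names. Note that a plain lowercase call also lowercases the initial letter, so the literal result for "Hauſtür" is "haustür".

```python
LONG_S = "\u017f"  # ſ, LATIN SMALL LETTER LONG S


def upper_then_lower(text: str) -> str:
    """Uppercase the text, then lowercase the result."""
    return text.upper().lower()


def loses_long_s(text: str) -> bool:
    """True if the text has a long s that the round trip turns into s."""
    return LONG_S in text and LONG_S not in upper_then_lower(text)


print("Hauſtür".upper())               # HAUSTÜR
print(upper_then_lower("Hauſtür"))     # haustür
print(loses_long_s("antragſteller"))   # True
print(loses_long_s("antragsteller"))   # False
```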

That can even make a difference, because "Werksirene" and "Werkſirene" or 
"Antragsteller" and "Antragſteller" have completely different meanings.




Sent: Monday, 2 December 2024 at 02:48
From: "Dominikus Dittes Scherkl via Unicode" 
To: unicode@corp.unicode.org
CC: "Dominikus Dittes Scherkl" 
Subject: Re: German sharp S uppercase mapping
On 30.11.24 at 18:16, Asmus Freytag via Unicode wrote:
> On 11/27/2024 12:15 PM, Dominikus Dittes Scherkl via Unicode wrote:
> However, speaking of this as a "default" is confusing to readers who
> think in terms of text processing or authoring environments where a
> different set of requirements rule. Here, the proper "default" is the
> best implementation of a culturally appropriate case transform.

NO. I really mean "default" in a technical sense, not something someone
tailors to local needs.
The ẞ was introduced to have an invertible casing, just like
compatibility codepoints were assigned to make preservation of old
formatting information available if a translation back to some obsolete
charset is necessary.

_This new letter was invented to allow for 1:1 roundtrip conversion._

toUpper() shall change "ß" to "ẞ" instead of "SS", just to allow
toLower() producing back "ß" instead of a wrong spelling with "ss"
(which at the moment can only be avoided using a german dictionary - a
really heavy constraint to a small function like toLower - and for
family names simply not possible at all - the information is lost).

This is a really bad situation, which should be fixed as soon as
possible, not a matter of taste.
And it should be fixed explicitly in automatic text processing - because
this is where errors are produced today that can now be avoided.
In private letters it doesn't matter what form is used - people
write whatever they want anyway. But automatic processing shall not drop
information that cannot be brought back (except by re-introducing
this knowledge manually).

> And what is "best" can change over time.
No. Fixing this round-trip bug is in the best interest of Unicode and
that won't change over time. Using "SS" in all-uppercase text was always
a bad workaround that became a source of spelling errors in automatic
text processing and for which a fix was invented some ten years ago. So
let's use it everywhere - at least now that it is officially allowed
(since 2017) and even preferred (since this year).






Re: German sharp S uppercase mapping

2024-12-01 Thread Asmus Freytag via Unicode

On 12/1/2024 9:09 PM, David Starner via Unicode wrote:

On Sun, Dec 1, 2024 at 7:54 PM Dominikus Dittes Scherkl via Unicode
 wrote:

But in automatic text processing the old form is simply a bug that needs
to be fixed. The new form has to be the "default" - otherwise
implementations will proliferate this bug forever.

Various systems take for granted that case folding is stable.
Very much agreed on that one. Usually in the context of "identifiers" 
and not in free text.

Differences in how Unicode data is interpreted have opened security holes
in systems, and while this isn't particularly likely with this change,
it is possible, which is part of the reason case-folding is guaranteed
to be stable. Such a change can confuse case-insensitive filesystems,
or change the interpretation of code in case-insensitive filesystems.
The automated default isn't going to change, and German is going to
have to join Turkish in that purely default case-conversion just
doesn't work for them.

Again, it would help to mentally change from "default" to some other 
term, like the "InvariantCulture" terminology used by .NET, for example.


By "default", if I start editing a document, I should not have to worry 
about getting a deficient case mapping/case conversion implementation 
just because I'm using the "wrong" language.


Likewise, by default, I should never get the locale-dependent case 
conversion invoked when accessing file systems or domain names.


These are different "defaults".

A./


Re: German sharp S uppercase mapping

2024-12-01 Thread Asmus Freytag via Unicode

On 12/1/2024 5:48 PM, Dominikus Dittes Scherkl via Unicode wrote:

On 30.11.24 at 18:16, Asmus Freytag via Unicode wrote:

On 11/27/2024 12:15 PM, Dominikus Dittes Scherkl via Unicode wrote:
However, speaking of this as a "default" is confusing to readers who
think in terms of text processing or authoring environments where a
different set of requirements rule. Here, the proper "default" is the
best implementation of a culturally appropriate case transform.


NO. I really mean "default" in a technical sense, not something someone
tailors to local needs.
The ẞ was introduced to have an invertible casing, just like
compatibility codepoints were assigned to make preservation of old
formatting information available if a translation back to some obsolete
charset is necessary.

_This new letter was invented to allow for 1:1 roundtrip conversion._


The letter was not *invented*. It was discovered (= identified as 
occurring in actual writing) and encoded.


It was encoded to match a character with a unique shape and properties. 
One of them of *being* a capital letter and the other one of ß being its 
lowercase equivalent.




toUpper() shall change "ß" to "ẞ" instead of "SS", just to allow
toLower() producing back "ß" instead of a wrong spelling with "ss"
(which at the moment can only be avoided using a german dictionary - a
really heavy constraint to a small function like toLower - and for
family names simply not possible at all - the information is lost).


Your problem is that you assume an implementation of toUpper that takes 
no argument. For purposes like text design, publication etc. you want an 
implementation that selects which locale should set the rules. (Or one 
where that setting is done behind the scenes, which is logically 
equivalent.) Without specifying the locale, your beautiful toUpper() 
does not know that in Turkish, 'i' is not mapped to 'I' but to CAPITAL I 
WITH DOT ABOVE (U+0130).


The fact that your beautiful toUpper() cannot handle at least one language 
means that it should not need to handle any language. Instead, it should 
be stable.


What you are describing is a change to the toUpper() that is invoked 
with the German locale as parameter (or selected behind the scenes).


There's not the same requirement for that one to be stable, although 
sometimes transitions are implemented by creating a separate locale for 
"old" and "new" orthographies and the like.


When it comes to case conversion, purpose matters.

This doesn't detract from the need to have implementations that do the 
"right" thing (as currently defined) for a given language. And from the 
need to enable these by default for ordinary text manipulation.


But it's not the same thing as overriding an "identifier-safe" or 
"filesystem-safe" implementation, just because that's incorrectly viewed 
as a "default" that should be applicable to text manipulation.


A./



This is a really bad situation, which should be fixed as soon as
possible, not a matter of taste.
And it should be fixed explicitly in automatic text processing - because
this is where errors are produced today that can now be avoided.
In private letters it doesn't matter what form is used - people
write whatever they want anyway. But automatic processing shall not drop
information that cannot be brought back (except by re-introducing
this knowledge manually).


And what is "best" can change over time.

No. Fixing this round-trip bug is in the best interest of Unicode and
that won't change over time. Using "SS" in all-uppercase text was always
a bad workaround that became a source of spelling errors in automatic
text processing and for which a fix was invented some ten years ago. So
let's use it everywhere - at least now that it is officially allowed
(since 2017) and even preferred (since this year).






Re: German sharp S uppercase mapping

2024-12-01 Thread Asmus Freytag via Unicode

On 11/30/2024 4:44 PM, Steffen Nurpmeso wrote:

Doug Ewell via Unicode wrote in
  :
  |Thanks to Asmus for saying what I had planned to say, except that his \
  |was better-worded, more carefully put together, and more authoritative.
  |
  |Casing for text meant for human readers should follow current local \
  |conventions.
  |
  |Casing for text meant for machine processing (file systems, databases, \
  |etc.) must remain stable, even when local conventions change.

Sorry that makes totally no sense to me.


Because you probably have never considered what havoc would be caused by 
changing in mid-stream anything about the processing of identifiers and 
similar strings that are not text, but unambiguous references to 
domains, files or other resources.


(You can make some changes when you create a new domain and set rules 
for it for the first time, but that's about it).



I would, however, not bring in uppercase sharp S for quite some
time.  But at some time, or when really the SS would be banned "in
all Germans" which are used as official languages, sooner that is,
then the current Unicode data would be just wrong.


Good luck banning SS in "all Germans".

You probably did not consider the fact that German written in 
Switzerland only uses ss and SS and therefore banning SS would be more 
than confusing. It would needlessly compromise a well-accepted local 
orthography.


German written in Germany has undergone a transition and, partially as 
a result of but mostly in parallel to that transition, we see a 
secondary shift in capitalization away from a typewriter / telegraph 
inspired fallback to a more natural way of writing where all letters 
have a case pair and casing is reversible.


 The same cannot be said for Swiss German. There's not been a similar 
transition, which would have needed to start with lower case usage. As 
long as the Swiss don't use ß, I can't see them banning "SS".


A./


Re: German sharp S uppercase mapping

2024-12-01 Thread Asmus Freytag via Unicode

On 12/1/2024 12:51 PM, Daniel Buncic via Unicode wrote:

On 01.12.2024 at 19:32, Markus Scherer wrote:

I searched amazon.de for “der große”. Not one capital ẞ on the first two
pages. ...


Amazon sells all the books, movies, etc. that are in stock.  They can 
be very old.  Even when a book is given as “published in 2022” or so, 
this often only means that there was a new printing of the same 
edition, or a new but stereotypical edition.  This is not 
representative of whatever change has been going on in the last couple 
of years. ...


Book titles are interesting, but subject to things like house styles.

On the other hand, ad designs are among the most contemporaneous uses of 
text, and I had no problems spotting one with a very prominent capital 
sharp s (image should have been shared on the list, if not stripped).


Use is definitely more prominent than it was at the time (Unicode 5.1) 
that we encoded U+1E9E. Just like adoption of the 1996 orthography has 
not been universal, we can expect there to be a transition period.


However, overly conservative/cautious implementation of software support 
(e.g. using the wrong tables to do uppercasing in a text design app, as 
opposed to for identifiers) will unnecessarily prolong that transition.


A./






Re: German sharp S uppercase mapping

2024-12-01 Thread Asmus Freytag via Unicode

On 12/1/2024 2:24 PM, David Starner via Unicode wrote:

On Sun, Dec 1, 2024 at 3:42 PM Markus Scherer  wrote:

No, that one is clearly a lowercase ß.

I disagree; that's clearly an eszett, between other uppercase
characters, and unless there's some linguistic weirdness going on,
like iPhone or eBook, that's a capital letter.

No, it's not.

Glyphs have to be taken
in context, and in that case, it's clear they didn't intend for one
character in the middle of the word to be lowercase.
That (writing a lowercase ß in ALLCAPS) being / having been an 
acceptable fallback, you can't assume anything based on context. You do 
need to consider the glyph (unless you have access to the underlying 
text buffer).

I could wonder
whether that's a bad glyph for the text, or one used by preference to
the ẞ style glyph, but in Latin-script German, in a modern Unicode
context, it makes no sense to maintain a distinction between an
uppercased lowercase ß and an uppercase ẞ. Uppercase("ß") should go to
"SS" or "ẞ", and a glyph looking like ß in an uppercase context should
be interpreted and written as U+1E9E, not U+00DF.


The glyph is clearly not that for a capital letter - for one, it extends 
above the tops of all other capitals. Typical designs for capital forms 
of ß tend to be wider and a bit more squat in appearance. The 
distinction is quite noticeable.


What you are arguing is that one should not use that fallback any 
longer. I have no arguments with that, but in this case, the fallback 
was used.


A./



Re: German sharp S uppercase mapping

2024-12-01 Thread David Starner via Unicode
On Sun, Dec 1, 2024 at 7:54 PM Dominikus Dittes Scherkl via Unicode
 wrote:
> But in automatic text processing the old form is simply a bug that needs
> to be fixed. The new form has to be the "default" - otherwise
> implementations will proliferate this bug forever.

Various systems take for granted that case folding is stable.
Differences in how Unicode data is interpreted have opened security holes
in systems, and while this isn't particularly likely with this change,
it is possible, which is part of the reason case-folding is guaranteed
to be stable. Such a change can confuse case-insensitive filesystems,
or change the interpretation of code in case-insensitive filesystems.
The automated default isn't going to change, and German is going to
have to join Turkish in that purely default case-conversion just
doesn't work for them.
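
To make the stability point concrete, here is a minimal Python sketch of an identifier comparison built on case folding. `identifier_key` is a hypothetical helper, and Python's `str.casefold()` is used as a stand-in for whatever folding a given system really applies.

```python
def identifier_key(name: str) -> str:
    """Return the key under which an identifier is stored and looked up."""
    return name.casefold()


# All four spellings fold to the same key.
for spelling in ("Straße", "strasse", "STRASSE", "STRAẞE"):
    print(spelling, "->", identifier_key(spelling))
```

Keys computed this way are written into indexes and databases, so a changed folding would leave already stored keys out of step with newly computed ones, which is the kind of breakage the stability guarantee is meant to rule out.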

-- 
The standard is written in English . If you have trouble understanding
a particular section, read it again and again and again . . . Sit up
straight. Eat your vegetables. Do not mumble. -- _Pascal_, ISO 7185
(1991)



Re: German sharp S uppercase mapping

2024-12-01 Thread Dominikus Dittes Scherkl via Unicode

On 01.12.24 at 09:53, Daniel Buncic via Unicode wrote:

On 01.12.2024 at 04:15, Markus Scherer via Unicode wrote:

As a library implementer and German speaker, I have been looking out
for the supposed sea change in usage, and haven't seen it.

The new Duden has been available for just 4 months. Even in the digital
era, it will take some more time.


minimum of characters.  My own university chose a font that does not let
me write my name, Bunčić, let alone write a Russian translation in the
same font.  Such minimal fonts also do not contain capital ẞ.

But there definitely is change.

[...]

(where the title is spelled “DER GROẞE GATSBY”).

A few years ago, the number of capital ẞ you could see was exactly zero.
  Now they are popping up more and more.  For the above reasons, they
are not the majority yet, but they are increasing fast.  Language change
is happening in front of our eyes.

Yep.
That so many sources still keep the old SS is mainly because
checking which "SS" should be changed to "ẞ" and which should not (because
the lowercase would also use "ss") is an error-prone manual task that no one
is willing to pay for - especially for so little gain.
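
Why this check cannot be automated without a dictionary can be seen by enumerating the lowercase spellings that an all-caps word could stand for. The following Python sketch is only an illustration; `lowercase_candidates` is a hypothetical helper that treats every "SS" as either "ss" or "ß" and knows nothing about which result is a real word.

```python
def lowercase_candidates(upper_word: str) -> list[str]:
    """List every lowercase spelling an all-caps word could stand for,
    if each "SS" may come from either "ss" or "ß"."""
    if not upper_word:
        return [""]
    # Option 1: lowercase the first character on its own.
    results = [upper_word[0].lower() + rest
               for rest in lowercase_candidates(upper_word[1:])]
    # Option 2: read a leading "SS" as an uppercased "ß".
    if upper_word.startswith("SS"):
        results += ["ß" + rest
                    for rest in lowercase_candidates(upper_word[2:])]
    return results


print(lowercase_candidates("MASSE"))      # ['masse', 'maße']
print(lowercase_candidates("MISSSTAND"))  # three candidates
```

Both "Masse" and "Maße" are German words, so here only a human or a context-aware tool can decide; for family names there is no dictionary to consult at all.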

So I would expect the new form to occur only in new publications.

But in automatic text processing the old form is simply a bug that needs
to be fixed. The new form has to be the "default" - otherwise
implementations will proliferate this bug forever.




Re: German sharp S uppercase mapping

2024-12-01 Thread Dominikus Dittes Scherkl via Unicode

On 30.11.24 at 18:16, Asmus Freytag via Unicode wrote:

On 11/27/2024 12:15 PM, Dominikus Dittes Scherkl via Unicode wrote:
However, speaking of this as a "default" is confusing to readers who
think in terms of text processing or authoring environments where a
different set of requirements rule. Here, the proper "default" is the
best implementation of a culturally appropriate case transform.


NO. I really mean "default" in a technical sense, not something someone
tailors to local needs.
The ẞ was introduced to have an invertible casing, just like
compatibility codepoints were assigned to make preservation of old
formatting information available if a translation back to some obsolete
charset is necessary.

_This new letter was invented to allow for 1:1 roundtrip conversion._

toUpper() shall change "ß" to "ẞ" instead of "SS", just to allow
toLower() to produce "ß" again instead of a wrong spelling with "ss"
(which at the moment can only be avoided by using a German dictionary - a
really heavy constraint for a small function like toLower - and which for
family names is simply not possible at all - the information is lost).
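
The round-trip loss described here can be reproduced in a few lines. This is only a minimal sketch, assuming Python 3 (which nobody in the thread mentioned); its built-in str.upper() and str.lower() follow the default Unicode mappings, and the variable names are made up for illustration.

```python
# Default (untailored) casing as implemented by Python's str methods.
word = "stra\u00dfe"                      # "straße"

upper_default = word.upper()              # ß expands to SS
round_trip_default = upper_default.lower()  # SS can only come back as ss

# With the capital sharp s (U+1E9E) the mapping is one-to-one.
upper_capital = word.replace("\u00df", "\u1e9e").upper()
round_trip_capital = upper_capital.lower()

print(upper_default, round_trip_default)
print(upper_capital, round_trip_capital)
```

Only the second pair returns to the original spelling; the first needs outside knowledge (a dictionary, or for family names the person concerned) to be repaired.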

This is a really bad situation, which should be fixed as soon as
possible, not a matter of taste.
And it should be fixed explicitly in automatic text processing - because
this is where errors are produced today that can now be avoided.
In private letters it doesn't matter what form is used - people
write whatever they want anyway. But automatic processing shall not drop
information that cannot be brought back (except by re-introducing
this knowledge manually).


And what is "best"  can change over time.

No. Fixing this round-trip bug is in the best interest of Unicode and
that won't change over time. Using "SS" in all uppercase text was always
a bad workaround that became a source of spelling errors in automatic
text processing and for which a fix was invented some ten years ago. So
let's use it everywhere - at least now that it is officially allowed
(since 2017) and even preferred (since this year).




Re: German sharp S uppercase mapping

2024-12-01 Thread David Starner via Unicode
On Sun, Dec 1, 2024 at 3:42 PM Markus Scherer  wrote:
> No, that one is clearly a lowercase ß.

I disagree; that's clearly an eszett, between other uppercase
characters, and unless there's some linguistic weirdness going on,
like iPhone or eBook, that's a capital letter. Glyphs have to be taken
in context, and in that case, it's clear they didn't intend for one
character in the middle of the word to be lowercase. I could wonder
whether that's a bad glyph for the text, or one used by preference to
the ẞ style glyph, but in Latin-script German, in a modern Unicode
context, it makes no sense to maintain a distinction between an
uppercased lowercase ß and an uppercase ẞ. Uppercase("ß") should go to
"SS" or "ẞ", and a glyph looking like ß in an uppercase context should
be interpreted and written as U+1E9E, not U+00DF.

-- 
The standard is written in English . If you have trouble understanding
a particular section, read it again and again and again . . . Sit up
straight. Eat your vegetables. Do not mumble. -- _Pascal_, ISO 7185
(1991)



Re: German sharp S uppercase mapping

2024-12-01 Thread Markus Scherer via Unicode
On Sun, Dec 1, 2024 at 12:53 PM David Starner  wrote:

> On Sun, Dec 1, 2024 at 12:35 PM Markus Scherer via Unicode
>  wrote:
> > I searched amazon.de for “der große”. Not one capital ẞ on the first
> two pages. Among results on those pages for recent items: “ARTHUR DER
> GROSSE“, “AIR - DER GROSSE WURF“, “DER GROßE WADAS“, “DER GROSSE GOPNIK“,
> “DAS GROSSE BUCH DER GUTEN GEDANKEN“, “DER GROSSE SOMMER“, “DER GROSSE
> YOUTUBER-BEEF“, “DAS GROSSE BUCH DER SELBST REFLEXION“, “DER GROSSE
> SCHLEIMFILM“, “ALEXANDER DER GROSSE“, 2x “DER GROSSE GATSBY“
>
> You searched a digital database that will be displayed on user
> screens, some of which may not have been updated for years and might
> not have new fonts, and who knows when all the bibliographic systems
> have been updated.


I was looking at the *images* of the book and video titles. The images have
no such limitations. (The *text* of the titles rarely uses all caps.)
And I only looked at publications since ca. 2020. Although Daniel has a
point in that publishers may not change their title pages in a new edition
of an older publication.

> On the first page of searches for me, I see 14 items; looking at
> titles on cover pictures, two of which are lowercase and one of which,
> "Der große Wadas", uses a ẞ on the cover.


No, that one is clearly a lowercase ß.

> Searching Books for Der
> große Gatsby shows 12 or 13 distinct covers, 6 with DER (or Der)
> GROSSE GATSBY on the cover, three with DER (or Der) GROẞE GATSBY, and
> 4 with lowercase titles.


I am only counting recent editions, and only all-caps titles.
For "der große gatsby", I get a sea of SS. There is a lowercase ß,
and another lowercase. And a third one.
Finally, down on the second page, a capital ẞ.

As far as I can tell, the capital ẞ is nowhere near pushing aside SS, and
even the lowercase ß is more common in all caps German book and video
titles.

markus


Re: German sharp S uppercase mapping

2024-12-01 Thread Alexander Lange via Unicode

On 01.12.2024 22:01, Daniel Buncic via Unicode wrote:

Am 01.12.2024 um 21:29 schrieb Alexander Lange via Unicode:
In German orthography, double consonants mark the preceding vowel as 
being short (if there isn’t just a mere co-incidence in a compound,

e. g. “Mausschwanz” (mouse tail)). As the “a” in “Straße” is long,
you write “ß”; as the “a” in “Gasse” is short, you write “ss”.


This is the new rule since the 1996 reform though.


No, this has always been the rule.  However, the rule used to have one 
exception, namely that at the end of a word and before a consonant you 
could only have ß, even if the vowel was short.  It is only this 
exception (which was based on long ſ in blackletter and therefore 
quite obsolete) that was abolished in the 1998 spelling reform.


The rule set till 1996 was called Adelungsche s-Schreibung: 
https://de.wikipedia.org/wiki/Adelungsche_s-Schreibung


And the one now in effect is called Heysesche s-Schreibung: 
https://de.wikipedia.org/wiki/Heysesche_s-Schreibung


What you call an exception is the difference between the two. And both 
are from the time when long ſ was still used, so this doesn't make one 
more obsolete than the other.


But I would not want to argue about this. Whether the second criterion 
in Adelung's rule is part of the rule or an exception doesn't really 
matter, it would just be nitpicking. The thing I found weird is that our 
schoolbooks and teachers listed a lot of words as irregular exceptions 
that we would have to memorize, when in fact they all could have been 
explained with just one or two more sentences (like we both just did).


Kind regards,
Alexander



Re: German sharp S uppercase mapping

2024-12-01 Thread Daniel Buncic via Unicode

Am 01.12.2024 um 21:29 schrieb Alexander Lange via Unicode:
In German orthography, double consonants mark the preceding vowel as 
being short (if there isn’t just a mere co-incidence in a compound,

e. g. “Mausschwanz” (mouse tail)). As the “a” in “Straße” is long,
you write “ß”; as the “a” in “Gasse” is short, you write “ss”.


This is the new rule since the 1996 reform though.


No, this has always been the rule.  However, the rule used to have one 
exception, namely that at the end of a word and before a consonant you 
could only have ß, even if the vowel was short.  It is only this 
exception (which was based on long ſ in blackletter and therefore quite 
obsolete) that was abolished in the 1998 spelling reform.


Best wishes,

Daniel

--
Prof. Dr. Daniel Bunčić
===
Slavisches Institut der Universität zu Köln
Weyertal 137, D-50931 Köln
Telefon:   +49 (0)221  470-90535
Sprechstunden: https://uni.koeln/ENZEB
E-Mail:daniel.bun...@uni-koeln.de = dan...@buncic.de
Threema:   https://threema.id/8M375R5K
===
Homepage:  http://daniel.buncic.de/
Academia:  http://uni-koeln.academia.edu/buncic
ResearchGate:  https://researchgate.net/profile/Daniel-Buncic-2
===



Re: German sharp S uppercase mapping

2024-12-01 Thread David Starner via Unicode
On Sun, Dec 1, 2024 at 12:35 PM Markus Scherer via Unicode
 wrote:
> I searched amazon.de for “der große”. Not one capital ẞ on the first two 
> pages. Among results on those pages for recent items: “ARTHUR DER GROSSE“, 
> “AIR - DER GROSSE WURF“, “DER GROßE WADAS“, “DER GROSSE GOPNIK“, “DAS GROSSE 
> BUCH DER GUTEN GEDANKEN“, “DER GROSSE SOMMER“, “DER GROSSE YOUTUBER-BEEF“, 
> “DAS GROSSE BUCH DER SELBST REFLEXION“, “DER GROSSE SCHLEIMFILM“, “ALEXANDER 
> DER GROSSE“, 2x “DER GROSSE GATSBY“
>
> Surely there are no significant limitations for book and movie titles of the 
> last 2-3 years that would keep their publishers from using the capital ẞ if 
> they wanted to.
>
> markus

You searched a digital database that will be displayed on user
screens, some of which may not have been updated for years and might
not have new fonts, and who knows when all the bibliographic systems
have been updated. Using cutting edge Unicode characters is not always
the best way. I wouldn't be surprised if there weren't systems that
still used ISO-8859-1 or MARC-8 (a library-specific tailoring of
ISO-2022) in the bibliographic loop.
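
Why a legacy charset in the loop matters can be shown with a small check. This is a sketch under the assumption that Python's "iso-8859-1" codec stands in for such a system; fits_latin1 is a hypothetical helper name, not part of any bibliographic software.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

# ß (U+00DF) is part of ISO-8859-1; ẞ (U+1E9E, added in Unicode 5.1) is not.
for title in ("DER GROSSE GATSBY", "Der gro\u00dfe Gatsby", "DER GRO\u1e9eE GATSBY"):
    print(title, fits_latin1(title))
```

A pipeline with such a stage can carry SS and lowercase ß, but has to replace or reject the capital ẞ.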

On the first page of searches for me, I see 14 items; looking at
titles on cover pictures, two of which are lowercase and one of which,
"Der große Wadas", uses a ẞ on the cover. Searching Books for Der
große Gatsby shows 12 or 13 distinct covers, 6 with DER (or Der)
GROSSE GATSBY on the cover, three with DER (or Der) GROẞE GATSBY, and
4 with lowercase titles. A few dated back to 2006, so it's not a
trivial sample of modern covers.

Amazon's pretty bad for this in some ways, but judging from those
searches and a couple others, there's some use on book covers, maybe
10-15% of uppercase titles.



Re: German sharp S uppercase mapping

2024-12-01 Thread Daniel Buncic via Unicode

Am 01.12.2024 um 19:32 schrieb Markus Scherer:

I searched amazon.de for “der große”. Not one capital ẞ on the first two
pages. Among results on those pages for recent items: “ARTHUR DER GROSSE“,
“AIR - DER GROSSE WURF“, “DER GROßE WADAS“, “DER GROSSE GOPNIK“, “DAS
GROSSE BUCH DER GUTEN GEDANKEN“, “DER GROSSE SOMMER“, “DER GROSSE
YOUTUBER-BEEF“, “DAS GROSSE BUCH DER SELBST REFLEXION“, “DER GROSSE
SCHLEIMFILM“, “ALEXANDER DER GROSSE“, 2x “DER GROSSE GATSBY“


Amazon sells all the books, movies, etc. that are in stock.  They can be 
very old.  Even when a book is given as “published in 2022” or so, this 
often only means that there was a new printing of the same edition, or a 
new but stereotypical edition.  This is not representative of whatever 
change has been going on in the last couple of years.  Remember, the new 
wording that expresses a preference for ẞ over SS (or at least treats 
them equally) was only published this year, with the new Duden edition 
(which is what people actually read rather than the official rules) 
coming out in August, just 3½ months ago.  Of course this has not 
changed all the 2.5 million German-language books that are currently 
available.  But, as I have shown, the more recent a cover, the more 
often you see a capital ẞ.


All the best,

Daniel




Re: German sharp S uppercase mapping

2024-12-01 Thread Alexander Lange via Unicode

Hello,

In German orthography, double consonants mark the preceding vowel as 
being short (if there isn’t just a mere co-incidence in a compound,

e. g. “Mausschwanz” (mouse tail)). As the “a” in “Straße” is long,
you write “ß”; as the “a” in “Gasse” is short, you write “ss”.
Cf. 
and .


This is the new rule since the 1996 reform though. Originally, “ß” would 
replace “ss” in all words where the two “s” belong to the same syllable, 
including words with short vowels like “Fluß” (now “Fluss”) or “naß” 
(now “nass”) as well as those with long vowels like “Fuß” (not changed) 
or “Straße” (where the syllable boundary is before the “ß”, “Stra-ße”). 
“Gasse” however is hyphenated “Gas-se”, so the two “s” never ligated to 
a “ß” by either rule.


Curiously though, when I was in school in the 80s and 90s, we were 
always taught the new rule even before the reform was in public 
discussion, let alone in effect. The many short-vowel words with an “ß” 
were called “exceptions”, despite having been perfectly regular if you 
cited the rule that was actually in use at the time. I have never 
understood how the old rule could fall into oblivion while it was still 
used.


Kind regards,
Alexander



Re: German sharp S uppercase mapping

2024-12-01 Thread Markus Scherer via Unicode
I searched amazon.de for “der große”. Not one capital ẞ on the first two
pages. Among results on those pages for recent items: “ARTHUR DER GROSSE“,
“AIR - DER GROSSE WURF“, “DER GROßE WADAS“, “DER GROSSE GOPNIK“, “DAS
GROSSE BUCH DER GUTEN GEDANKEN“, “DER GROSSE SOMMER“, “DER GROSSE
YOUTUBER-BEEF“, “DAS GROSSE BUCH DER SELBST REFLEXION“, “DER GROSSE
SCHLEIMFILM“, “ALEXANDER DER GROSSE“, 2x “DER GROSSE GATSBY“

Surely there are no significant limitations for book and movie titles of
the last 2-3 years that would keep their publishers from using the
capital ẞ if they wanted to.

markus


Re: German sharp S uppercase mapping

2024-12-01 Thread Markus Scherer via Unicode
Hi Daniel,

On Sun, Dec 1, 2024 at 12:56 AM Daniel Buncic via Unicode <
unicode@corp.unicode.org> wrote:

> But there definitely is change.  Compare old book covers of “The Great
> Gatsby” in German like
>https://nikol-verlag.de/cdn/shop/products/9783868205268_351x523.jpg
>
>
> https://images.thalia.media/00/-/0f18217aef9f4c779a86bc28985ce4d7/der-grosse-gatsby-gebundene-ausgabe-f-scott-fitzgerald.jpeg
> (which have the title as “DER GROSSE GATSBY”)
> with two covers from 2022 and 2023
>
>
> https://images.thalia.media/00/-/20013f18fd854595b609ea31f7dba854/der-grosse-gatsby-gebundene-ausgabe-f-scott-fitzgerald.jpeg


This one looks very much like a lowercase ß.

> https://einfachebuecher.de/thumbnail/1f/08/26/1680510657/Der%20gro%C3%9Fe%20Gatsby%20-%20cover%20Lowres_1920x1920.jpg
> (where the title is spelled “DER GROẞE GATSBY”).
>

This one, yes.

> A few years ago, the number of capital ẞ you could see was exactly zero.
>   Now they are popping up more and more.  For the above reasons, they
> are not the majority yet, but they are increasing fast.  Language change
> is happening in front of our eyes.
>

Before we switch the default behavior of uppercasing libraries, it would be
useful to get more examples of the transition.
Maybe queries for certain things that show the proportion of SS, ß, ẞ.
Or prominent publications that use all caps with ẞ.
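
One way such a proportion query could be tallied, once cover titles have been transcribed by hand, is sketched below. It assumes Python 3; classify is a hypothetical helper, and the sample list merely reuses titles quoted in this thread rather than real survey data. Note that the SS count is an upper bound, since an all-caps SS may also stand for a genuine ss.

```python
from collections import Counter
from typing import Optional

def classify(title: str) -> Optional[str]:
    """Return which sharp-s spelling an all-caps title uses, else None."""
    # Lowercase ß makes str.isupper() fail, so ignore it for the caps test.
    if not title.replace("\u00df", "").isupper():
        return None                      # mixed-case title, not counted
    if "\u1e9e" in title:
        return "\u1e9e"                  # capital ẞ
    if "\u00df" in title:
        return "\u00df"                  # lowercase ß inside all caps
    if "SS" in title:
        return "SS"
    return None

titles = [
    "ARTHUR DER GROSSE",
    "DER GRO\u00dfE WADAS",
    "DER GROSSE GATSBY",
    "DER GRO\u1e9eE GATSBY",
    "Der gro\u00dfe Gatsby",
]
counts = Counter(kind for kind in map(classify, titles) if kind)
print(counts)
```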

Danke / schönen Gruß,
markus


Re: German sharp S uppercase mapping

2024-12-01 Thread Walter Tross via Unicode
On Sun, Dec 1, 2024 at 2:54 PM Otto Stolz via Unicode <
unicode@corp.unicode.org> wrote:

> In German orthography, double consonants mark the preceding vowel as
> being short (if there isn’t just a mere co-incidence in a compound,
> e. g. “Mausschwanz” (mouse tail)). As the “a” in “Straße” is long,
> you write “ß”; as the “a” in “Gasse” is short, you write “ss”.
> Cf. 
> and .
>

And to clarify the need for the ß: it ensures the pronunciation as /s/ as
opposed to /z/
(Straße is pronounced /ʃtʁaːsə/, while *Strase would be pronounced
/ʃtʁaːzə/)

Walter


Re: German sharp S uppercase mapping

2024-12-01 Thread Otto Stolz via Unicode

Hello,

am 2024-12-01 um 1:44 Uhr hat Steffen Nurpmeso geschrieben:

Yes, and Straße/STRASSE is such a thing if there is Gasse/GASSE
but which just has the same "S-sound", and always had since the
earth existed (1972).  Was it Gaße, ever? 


No, never.

In German orthography, double consonants mark the preceding vowel as 
being short (if there isn’t just a mere co-incidence in a compound,

e. g. “Mausschwanz” (mouse tail)). As the “a” in “Straße” is long,
you write “ß”; as the “a” in “Gasse” is short, you write “ss”.
Cf. 
and .

Best wishes,
  Otto Stolz


Re: German sharp S uppercase mapping

2024-12-01 Thread Daniel Buncic via Unicode

Am 01.12.2024 um 04:15 schrieb Markus Scherer via Unicode:

As a library implementer and German speaker, I have been looking out
for the supposed sea change in usage, and haven't seen it.


Dear Markus,

There are three things that make the change less visible.  First, domain 
names and other ASCII environments as well as stylistic devices.  I live 
in a town called Brühl and work in a city called Köln (which has its own 
top-level domain, .koeln).  You see a surprising amount of signage, 
logos, etc. which spell the names as “Bruehl” or “Koeln”:

  https://www.cvjm.koeln/ueber-uns/175-jahre.html
  https://mhi-koeln.de/
  http://kleinbahn.koeln/
  https://koeln-weekend.de/
  https://www.ebay.de/str/koelnartkunsthandel
This is also true of Gießen, where the incorrect spelling “Giessen” has 
a long tradition of giving the name a more artistic, more modern, or 
more international flavor:

  https://img.oldthing.net/8867/27104185/0/p/AK-Ansichtskarte-Giessen-Lahn-Behoerdenhochhaus-Hauptbahnhof-Stadttheater-Schloss-Wappen-Kat.webp
  https://www.sportkreis-giessen.de/
  https://www.dayuse.de/hotels/germany/trip-inn-city-hotel-giessen
  https://www.statt-giessen.com/
  https://giessenkreativ.de/
  http://transit-giessen.de/
If the spelling “Giessen” is ubiquitous, one should not be surprised to 
also find “GIESSEN”.


Second, ignorance about the spelling in general.  I think I have seen 
“Straße” and “Spaß” spelled as “Strasse” and “Spass” more often than the 
correct spelling:

  https://bigtime.ch/6476-tm_thickbox_default/viel-spass-damit-stempel.jpg
  https://access-im-unternehmen.de/Flexible_Adressen/
  https://www.kartenparadies.at/cdn/shop/products/GanzvielSpasszumGeburtstagKW-460_900x.jpg
  https://app.fuxcdn.de/api/fb437fc8-fbec-4c45-a28c-7bebd9aa4d5e/thumbnail/74/25/f8/1668190533/mainspatzen-arbeit-macht-spass-v_800x800.jpg

In contrast to this, “STRASSE” and “SPASS” are not even incorrect.

Third, ignorance about technical possibilities.  So many designs are 
nowadays made by people who know how to handle QuarkXPress or InDesign 
but who have no idea about typography or even orthography.  Designers 
often choose fonts for corporate designs that only contain the absolute 
minimum of characters.  My own university chose a font that does not let 
me write my name, Bunčić, let alone write a Russian translation in the 
same font.  Such minimal fonts also do not contain capital ẞ.


But there definitely is change.  Compare old book covers of “The Great 
Gatsby” in German like

  https://nikol-verlag.de/cdn/shop/products/9783868205268_351x523.jpg
  https://images.thalia.media/00/-/0f18217aef9f4c779a86bc28985ce4d7/der-grosse-gatsby-gebundene-ausgabe-f-scott-fitzgerald.jpeg

(which have the title as “DER GROSSE GATSBY”)
with two covers from 2022 and 2023

  https://images.thalia.media/00/-/20013f18fd854595b609ea31f7dba854/der-grosse-gatsby-gebundene-ausgabe-f-scott-fitzgerald.jpeg
  https://einfachebuecher.de/thumbnail/1f/08/26/1680510657/Der%20gro%C3%9Fe%20Gatsby%20-%20cover%20Lowres_1920x1920.jpg

(where the title is spelled “DER GROẞE GATSBY”).

A few years ago, the number of capital ẞ you could see was exactly zero. 
 Now they are popping up more and more.  For the above reasons, they 
are not the majority yet, but they are increasing fast.  Language change 
is happening in front of our eyes.


All the best,

Daniel



Re: German sharp S uppercase mapping

2024-11-30 Thread Steffen Nurpmeso via Unicode
Markus Scherer wrote in
 :
 |I have followed this thread with some interest, having grown up in Germany.
 ...
 |The https://www.giessener-zeitung.de/ has a prominent all caps title with
 |ẞ. I think they have been an early adopter. However, their front page
 |weather forecast is titled GIESSEN, so this could be chalked up as title
 |calligraphy.
 |
 |The https://www.giessener-allgemeine.de/ does not seem to use any all caps.
 |The https://www.giessener-anzeiger.de/ has a topic for LANDGERICHT GIESSEN.
 |
 |The city website gießen.de  also still redirects
 |to giessen.de and does not use any all caps.

Ah!  Middle-Hesse, and if i recall correctly the song text of the
Rodgau Monotones was "Gießen und Wies[..]baden bloß, so
gnadenlos".  (Aka "pouring and how-to-bath-it merely, so
ruthless", ähem .. directly translated, that is.  Also, "Erbarme.
Zu spät. Die Hesse komme", "Have mercy. Too late. The Hesse are
coming.".  (But that was after >30 year so-called left wing
Hesse government, now we come near 30 years right wing, i presume
it takes a long time until that worn down thing comes again.
(I may say that as i am from Hesse myself, yet south, which was
a different Hesse until we had to unite said the americans, yet
our dynasty died out, unless we adopt King Charles of course,
which we could, but the last member dedicated south to the north
hesse dynasty, so in fact we are, now, more slim than before.
But that of course off-topic.)))
I think Gießen is somewhere near Marburg and Fulda, or something.
North of Frankfurt for sure.  North of where Elvis was even iirc!
A whole universe apart.
Talking about universe, i wonder when the future time has come
when computers will be capable to say ä not ae, for example.  In
Germany that is.  I am sure in France or what they can, already.
A more pressing need, in my opinion.  Than the thread subject.

A nice Sunday i wish.

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
|
|And in Fall, feel "The Dropbear Bard"s ball(s).
|
|The banded bear
|without a care,
|Banged on himself for e'er and e'er
|
|Farewell, dear collar bear



Re: German sharp S uppercase mapping

2024-11-30 Thread Markus Scherer via Unicode
I have followed this thread with some interest, having grown up in Germany.
I do still visit fairly frequently.

It's totally possible to change uppercasing functions if & when warranted.
But is it, yet?

As a library implementer and German speaker, I have been looking out for
the supposed sea change in usage, and haven't seen it.

For example, looking at product packaging, where all caps are more commonly
used than in regular text, I see SS and lowercase ß, but little if any ẞ.
Try searching on amazon.de for "fußcreme" or "füße".

I have some photos from Germany with similar results, with only one cereal
box using the capital in “EXTRA GROẞE CRUNCHIES“.

fußball.de still (despite browsers abandoning
transitional processing) redirects to fussball.de, and that page advertises
itself as FUSSBALL.DE right up top in the site menu.
There is an article about KINDERFUSSBALL and GRÖSSERE CHANCEN.
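
For readers unfamiliar with the older mapping behind such redirects: Python's built-in "idna" codec still implements the IDNA 2003 rules, under which ß is mapped to ss before the name is looked up. The following sketch assumes that codec only as an illustration of the old behavior; it says nothing about how the sites mentioned are actually configured.

```python
# The standard-library codec applies IDNA 2003 (nameprep), which maps ß to ss.
for name in ("fu\u00dfball.de", "gie\u00dfen.de"):
    print(name, "->", name.encode("idna"))

ascii_fussball = "fu\u00dfball.de".encode("idna")
ascii_giessen = "gie\u00dfen.de".encode("idna")
```

Under the newer IDNA 2008 rules ß is a valid character of its own, so the ß name and the ss name can be distinct registrations, which is why a redirect is needed at all.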

The https://www.giessener-zeitung.de/ has a prominent all caps title with
ẞ. I think they have been an early adopter. However, their front page
weather forecast is titled GIESSEN, so this could be chalked up as title
calligraphy.

The https://www.giessener-allgemeine.de/ does not seem to use any all caps.
The https://www.giessener-anzeiger.de/ has a topic for LANDGERICHT GIESSEN.

The city website gießen.de also still redirects
to giessen.de and does not use any all caps.


So it looks very much like in German all caps SS reigns, ß is fairly
common, and ẞ is very niche.

markus


Re: German sharp S uppercase mapping

2024-11-30 Thread Steffen Nurpmeso via Unicode
Doug Ewell via Unicode wrote in
 :
 |Thanks to Asmus for saying what I had planned to say, except that his \
 |was better-worded, more carefully put together, and more authoritative.
 |
 |Casing for text meant for human readers should follow current local \
 |conventions.
 |
 |Casing for text meant for machine processing (file systems, databases, \
 |etc.) must remain stable, even when local conventions change.

Sorry that makes totally no sense to me.
I would, however, not bring in uppercase sharp S for quite some
time.  But at some time, or when really the SS would be banned "in
all Germans" which are used as official languages, sooner that is,
then the current Unicode data would be just wrong.

I do not agree with what was said on my superficial level anyhow,
even though i have seen bite marks of families, on house corners
and cars, everywhere!, mind you, who were eagerly waiting for
being able to write their ß as ß instead of-f- SS.
Yes, and Straße/STRASSE is such a thing if there is Gasse/GASSE
but which just has the same "S-sound", and always had since the
earth existed (1972).  Was it Gaße, ever?  Maybe Gasze, i am not
a Germanist.  But just like little Gaza the little Gasse no one
cares about.

Just my one cent.  Have a nice Sunday, if at all possible!

--steffen



RE: German sharp S uppercase mapping

2024-11-30 Thread Doug Ewell via Unicode
Thanks to Asmus for saying what I had planned to say, except that his was 
better-worded, more carefully put together, and more authoritative.

Casing for text meant for human readers should follow current local conventions.

Casing for text meant for machine processing (file systems, databases, etc.) 
must remain stable, even when local conventions change.
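
The machine-processing side of this distinction can be illustrated with caseless matching. This is a minimal sketch, assuming Python 3, whose str.casefold() applies the default Unicode full case folding; same_identifier is a hypothetical helper name.

```python
def same_identifier(a: str, b: str) -> bool:
    """Caseless comparison using the default full case folding."""
    return a.casefold() == b.casefold()

# All spellings fold to the same key, whichever uppercase form a human
# reader would prefer to see.
spellings = ["Stra\u00dfe", "STRASSE", "STRA\u1e9eE", "strasse"]
keys = {s.casefold() for s in spellings}
print(keys)
```

Because the folded key is the same for ß, ẞ, SS and ss, a change in what is displayed to readers does not have to disturb identifier matching.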

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org




Re: German sharp S uppercase mapping

2024-11-30 Thread Asmus Freytag via Unicode

On 11/27/2024 12:15 PM, Dominikus Dittes Scherkl via Unicode wrote:

Am 26.11.24 um 17:59 schrieb Peter Constable via Unicode:

The case pair stability requirement applies to default data.


"Data" is ambiguous here, so let's unpack that a bit.

The /policy/ only applies to the default tables (data files) as 
published in the UCD.


It does not apply directly to any other "data", whether that refers to 
text or tables.
Conforming applications can still tailor case mapping behaviour using 
language-specific overrides. However, the default case pairs as 
defined in the SpecialCasing.txt and UnicodeData.txt files must 
remain stable so as not to break existing implementations of file 
systems or other identifier systems that depend on the default case 
pairs.


The /requirements/ that led to this policy apply to certain scenarios 
only (not that that makes them unimportant). A key one is caseless 
identifiers or case conversions of same (including file names). Here you 
need the strict guarantee that repeating the process on a different 
system / different version does not change the result.
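
For reference, the default mappings under discussion can be inspected directly. The sketch below assumes Python 3, whose string methods expose the untailored UCD data; the constant names are made up for illustration.

```python
import unicodedata

SMALL_SHARP_S = "\u00df"
CAPITAL_SHARP_S = "\u1e9e"

names = (unicodedata.name(SMALL_SHARP_S), unicodedata.name(CAPITAL_SHARP_S))
print(names)

# Default uppercase of ß is the two-letter expansion from SpecialCasing.txt.
print(SMALL_SHARP_S.upper())
# ẞ has a simple lowercase mapping to ß in UnicodeData.txt ...
print(CAPITAL_SHARP_S.lower())
# ... and is its own uppercase, so the default pair is one-directional.
print(CAPITAL_SHARP_S.upper() == CAPITAL_SHARP_S)
```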


There's also the use of these data as the backbone onto which you apply 
tailorings. Again, stability is paramount, because changing the 
backbone would require changes (or at least review) of all tailorings. 
(Something that Unicode does not control).


However, speaking of this as a "default" is confusing to readers who 
think in terms of text processing or authoring environments where a 
different set of requirements rule. Here, the proper "default" is the 
best implementation of a culturally appropriate case transform. And what 
is "best"  can change over time. The fact that this is implemented as a 
tailoring on some underlying "default" is not something users need to 
consider.


Therefore the word "default" is also subject to misunderstanding and we 
might do well to consider whether we've framed this correctly in the text.



I think nowadays ẞ is preferred over SS, and _especially_ the default
should be changed to use this, because if a text is automatically
processed by e.g. functions like toUpper(), the old form is not
invertible. 


This statement is a clear example of what I mean. From the perspective 
of a user, the minute you use something that's not an "immutable" 
identifier-safe implementation, you expect as your "default" a 
culturally appropriate tailoring.


If you have a toUpper method that takes a locale identifier or object 
then you should not need to apply further tailorings, and how the 
behavior of the locale tracks changes to the rules as used in the 
culture is subject to a different debate (e.g. whether to define 
sub-locales for either the old or new rules or both, etc.).


The real question with casing is what do you do with text that uses 
"locale-adjacent" characters?


If a French customer has the name of a German supplier in a database, 
why should that use the old rules, while the same for a German customer 
should use the new rules?


Casing is one of the algorithms that need very little tailoring, unlike 
sorting, and therefore there ought to be a different level of "default", 
one which handles all characters with the same behavior, as long as two 
or more locales don't disagree on the casing (like, for example, dotless i).


This would not be an "immutable" version of the casing, but the most 
current "least common denominator" version, and which would be targeted 
to scenarios where otherwise locale-dependent tailorings would be used, 
except that this one would be a "multi-locale" variant.



If the old form is intended, it is very easy to replace
every occurrence of ẞ by SS, but in the other direction not every SS has
to be replaced by ẞ, making it a time consuming manual task to change 
back.

And this problem was the reason why ẞ was introduced at all. After this
introduction the main reason to use SS was that it was not officially
allowed to use ẞ until 2017. Now the only reason to use SS as uppercase
would be if old equipment is used, that doesn't provide the new letter.
Luckily that is vanishing.


Not entirely true. Again, if you think of authoring text, I would agree. 
If you think of identifiers or file names, you better not change your 
uppercasing. There are some things that look like text but for which 
strict stability is more important than cultural correctness.


*Conclusion:* the fact that we are having this discussion at all points 
to the need to look at our way of describing this topic and possible 
deficiencies in describing the full eco-system. Bare bones Unicode, 
including UCD, does not solve everything but apparently we are not doing 
enough to hand people off to CLDR, for example, when they are looking 
for locale-appropriate solutions. In other words, we can and should 
improve our presentation and positioning, but that would best be done in 
response to somebody taking the time to track down some of the key text 
passages (or file headers) and filin

Re: German sharp S uppercase mapping

2024-11-27 Thread Dominikus Dittes Scherkl via Unicode

Am 26.11.24 um 17:59 schrieb Peter Constable via Unicode:

The case pair stability requirement applies to default data. Conforming 
applications can still tailor case mapping behaviour using language-specific 
overrides. However, the default case pairs as defined in the SpecialCasing.txt 
and UnicodeData.txt files must remain stable so as not to break existing 
implementations of file systems or other identifier systems that depend on the 
default case pairs.


I think nowadays ẞ is preferred over SS, and _especially_ the default
should be changed to use this, because if a text is automatically
processed by e.g. functions like toUpper(), the old form is not
invertible. If the old form is intended, it is very easy to replace
every occurrence of ẞ by SS, but in the other direction not every SS has
to be replaced by ẞ, making it a time consuming manual task to change back.
And this problem was the reason why ẞ was introduced at all. After this
introduction the main reason to use SS was that it was not officially
allowed to use ẞ until 2017. Now the only reason to use SS as uppercase
would be if old equipment is used, that doesn't provide the new letter.
Luckily that is vanishing.
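
The non-invertibility is easy to reproduce in a language whose built-in uppercasing follows the Unicode default mapping. A minimal Python sketch, for illustration only; `upper_with_capital_eszett` is a hypothetical helper, not a standard function:

```python
def upper_with_capital_eszett(text: str) -> str:
    """Uppercase text, mapping ß to ẞ (U+1E9E) instead of the default SS."""
    # Swap ß for ẞ first; str.upper() then leaves ẞ unchanged.
    return text.replace("ß", "ẞ").upper()


# Default mapping: two different words collapse into the same string,
# and lowercasing cannot tell them apart again.
print("Maße".upper())           # MASSE
print("Masse".upper())          # MASSE
print("Maße".upper().lower())   # masse

# With ẞ the round trip through lowercase is lossless.
print(upper_with_capital_eszett("Maße"))          # MAẞE
print(upper_with_capital_eszett("Maße").lower())  # maße
```

Going from ẞ to SS afterwards is a plain `replace("ẞ", "SS")`; the reverse direction needs a human or a dictionary.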



Re: German sharp S uppercase mapping

2024-11-27 Thread Daniel Buncic via Unicode

Am 27.11.2024 um 13:25 schrieb Otto Stolz via Unicode:

So, the wording of the sentence has been reversed, but the example
is given in the same order as in the previous version.


I will stop discussing the interpretation of this sentence now, but this 
is interesting: In the pdf version on the website of the orthography 
council 
(https://www.rechtschreibrat.com/DOX/RfdR_Amtliches-Regelwerk_2024.pdf, 
p. 48) it is “STRAẞE – STRASSE”.  On the website of the IDS (Institute 
for the German language), which is supposed to have just an HTML version 
of the same text, it is “STRASSE – STRAẞE”.  Obviously some kind of 
copy-paste error.


Best wishes,

Daniel

--
Prof. Dr. Daniel Bunčić
===
Slavisches Institut der Universität zu Köln
Weyertal 137, D-50931 Köln
Telefon:   +49 (0)221  470-90535
Sprechstunden: https://uni.koeln/ENZEB
E-Mail:daniel.bun...@uni-koeln.de = dan...@buncic.de
Threema:   https://threema.id/8M375R5K
===
Homepage:  http://daniel.buncic.de/
Academia:  http://uni-koeln.academia.edu/buncic
ResearchGate:  https://researchgate.net/profile/Daniel-Buncic-2
===


Re: German sharp S uppercase mapping

2024-11-27 Thread Otto Stolz via Unicode

Am 2024-11-26 um 21:41 Uhr hat Daniel Bunčić geschrieben:
In this light, see the change from the 
previous version of the rule (§25 E3) to the current one:

…

“Bei Schreibung mit Großbuchstaben ist neben der Verwendung des
Großbuchstabens ẞ auch die Schreibung SS möglich: Straße – STRAẞE –
STRASSE.”


This quote is not entirely correct. Rather, 
 (the very source 
Daniel has given)

says:
E3: Bei Schreibung mit Großbuchstaben ist neben der Verwendung 
des Großbuchstabens ẞ auch die Schreibung mit SS möglich: 
Straße – STRASSE – STRAẞE.


So, the wording of the sentence has been reversed, but the example
is given in the same order as in the previous version.

Hence, we cannot safely conclude that the spelling with upper-case ẞ
is preferred over the conventional SS spelling. I rather guess that
the two spellings are considered as equally valid alternatives – but
I’ll try to get an official statement (after having read the rest of
this lengthy discussion).

Best wishes,
   Otto Stolz



Re: German sharp S uppercase mapping

2024-11-27 Thread Ivan Panchenko via Unicode
Daniel Buncic via Unicode :
> But they literally turned around the wording
> of the previous version with “neben … auch … möglich” (‘in addition to …
> also … possible’), which you agree expressed a preference for SS.

Anyway, I do prefer “ẞ” and would also be interested whether there is
a place (CLDR or elsewhere) to propose it. But I suspect that even in
2017, it was not intended to recommend “SS” over “ẞ” and that they
simply added a sentence that dealt with the new capital letter in a
potentially misleading fashion. If they had simply turned around the
2017 wording

“Bei Schreibung mit Großbuchstaben schreibt man SS. Daneben ist auch
die Verwendung des Großbuchstabens ẞ möglich.”

to

“Bei Schreibung mit Großbuchstaben schreibt man ẞ. Daneben ist auch
die Verwendung von SS möglich.”,

you might have a point. This is not what happened, though. They
changed it to a SINGLE sentence about the POSSIBILITY of “SS” (besides
the POSSIBILITY of the (already known) capital letter; and since “SS”
is not a single letter, it is not surprising that it is not already
listed among the letters).



Re: German sharp S uppercase mapping

2024-11-26 Thread Daniel Buncic via Unicode

Am 26.11.2024 um 22:18 schrieb Asmus Freytag via Unicode:

The mistake is to assume that people supporting text processing
should be using the default table to begin with.


Dear Asmus, dear all,

Thank you very much, I totally see the point now.  That’s the mistake I 
also made.  So the casing tables are clearly not the place where the 
change should be reflected.


I am unfamiliar with the CLDR.  But I looked at 
https://www.unicode.org/cldr/charts/46/summary/de.html and saw that 
capital ẞ does not occur among the capital letters in line 3.  But I’m 
not sure that this is where it should occur.  I could not find any list 
of casing pairs.  Does that exist somewhere?  Or is the place where 
capital ẞ would have to be added, or is this not the place?


Am 27.11.2024 um 01:16 schrieb Kent Karlsson via Unicode:

There's one more wrinkle. Because the sharp S is not natively
used outside German,

Did you mean “outside Germany”? It is not used for German in 
Switzerland. But, IIUC, used for Colognian/Kölsch with the 
uppercase/lowercase mapping. Or, at least, so I was told several 
years ago.


No, “outside German” is perfectly correct.  The ß is used in German in 
Austria, Luxembourg, Belgium, and Italy, where it is an official, 
co-official, or regionally official language, as well as by German 
speakers in other countries.  Kölsch is a German dialect that, like some 
other German dialects, happens to have its own ISO code and locale, but 
I doubt that even its speakers (proud as they are of their dialect) 
would describe it as “outside German”.  And that is precisely why it 
uses the German alphabet, including ß.


Am 26.11.2024 um 23:13 schrieb Ivan Panchenko via Unicode:

I still disagree.


Yes, if this was just any text, you could interpret the sentence the way 
you do.  But the writers of these official rules are perfectly aware 
that “auch” (‘also’) in a text like this is a technical term.  They 
would have formulated it differently if they had meant the two 
alternatives to be equal.  But they literally turned around the wording 
of the previous version with “neben … auch … möglich” (‘in addition to … 
also … possible’), which you agree expressed a preference for SS.  And 
there is good reason to prefer capital ẞ now that it is technically 
available in most environments, with all the problems SS causes in 
personal names and in general with turning all-caps text back into 
normal lowercase text (how do you tease apart SS → ß and SS → ss?).
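
To make the SS → ß / SS → ss ambiguity concrete, here is a small Python sketch that lists every lowercase spelling an all-caps word could stand for. The function name `lowercase_candidates` is made up for this example, and the sketch treats each non-overlapping "SS" as an independent choice; it knows nothing about German spelling.

```python
from itertools import product


def lowercase_candidates(upper_word: str) -> list[str]:
    """Return all lowercase forms obtained by reading each SS as ss or ß."""
    parts = upper_word.lower().split("ss")
    gaps = len(parts) - 1  # number of SS occurrences to decide on
    results = []
    for choice in product(("ss", "ß"), repeat=gaps):
        pieces = [parts[0]]
        for separator, part in zip(choice, parts[1:]):
            pieces.append(separator + part)
        results.append("".join(pieces))
    return results


print(lowercase_candidates("MASSE"))   # ['masse', 'maße']
print(lowercase_candidates("STRAẞE"))  # ['straße']
```

With capital ẞ in the input there is nothing to decide; with SS only a dictionary or a human reader can pick the right candidate.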


All the best,

Daniel




Re: German sharp S uppercase mapping

2024-11-26 Thread Kent Karlsson via Unicode
Sent from my iPhone

On 26 Nov 2024, at 22:19, Asmus Freytag via Unicode wrote:

On 11/26/2024 8:59 AM, Peter Constable via Unicode wrote:


The case pair stability requirement applies to default data. Conforming applications can still tailor case mapping behaviour using language-specific overrides. However, the default case pairs as defined in the SpecialCasing.txt and UnicodeData.txt files must remain stable so as not to break existing implementations of file systems or other identifier systems that depend on the default case pairs.


Peter

The case pair stability is indeed important, if not essential for
identifiers. However, in the past, we may have been a bit cavalier
on how an identifier-safe casing (default) intersects with text
processing in general.

I think there's a case to be made for someone writing a short
proposal to clarify whatever language we are using and to make
clear, among other things, that **text processing** should be
using the appropriate locale-based casing (from CLDR).

We should probably go further and explicitly reposition the
"default" casing as "identifier-safe" because that is what
motivates the stability requirements.

There's one more wrinkle. Because the sharp S is not natively
used outside German,

Did you mean “outside Germany”? It is not used for German in
Switzerland. But, IIUC, used for Colognian/Kölsch with the
uppercase/lowercase mapping. Or, at least, so I was told several
years ago.

it would be reasonable to have a multicultural case mapping that
treats this, and all cases where local usages do not conflict,
according to the evolving rules of the local culture(s) that use
the character, so that there's an optional level between
identifier-stable and fully local case mappings.

To borrow a bit of terminology from some other implementation we
need these levels:

 * InvariantCulture (stable default/identifier-safe)
 * CommonCulture (current culture, common default except for
   conflicting use)
 * LocalCulture (current culture... one per locale ID)
 * (Throw in regional if you have to, somewhere).

The ideal case would be one where out of the box text processing
operations can default to something that performs according to the
current (evolving) needs for as many cultures as possible (except
where they have contradictory treatment of the same character).

And where the fully invariant case is positioned to make clear
where it's applicable (identifier) and where its use is therefore
mandatory.

A./



  
-Original Message-
From: Unicode  On Behalf Of Daniel Buncic via Unicode
Sent: November 26, 2024 2:44 AM
To: unicode@corp.unicode.org
Subject: Re: German sharp S uppercase mapping



As to the case pair stability guarantee, ...  if case folding changes in that language, then Unicode has to adjust.  It would be poor service to the public to stick to a case mapping that is no longer valid just because Unicode came into existence at a time when it was still valid. ...







  



Re: German sharp S uppercase mapping

2024-11-26 Thread Ivan Panchenko via Unicode
Erratum:
> • In 2011, “ẞ” was introduced, and they dealt with this case by adding

It was in 2017.



Re: German sharp S uppercase mapping

2024-11-26 Thread Ivan Panchenko via Unicode
Asmus Freytag via Unicode :
> I totally agree with the parsing of the sentence. It is quite clear,
> that the way this statement is written implies the use of capital sharp
> S as the ordinary (or "unmarked") case, while the "SS" can be used in
> addition (implicit in that is the suggestion that you might have a
> particular reason, such as compatibility with older usage, but also
> things like identifiers).

I still disagree. Let us look at the full context:

• In 2006, only “SS” was allowed (“Bei Schreibung mit Großbuchstaben
schreibt man SS”; italicized “SS”).

• In 2011, “ẞ” was introduced, and they dealt with this case by adding
the sentence “Daneben ist auch die Verwendung des Großbuchstabens ẞ
möglich” (italicized “ẞ”) to § 25 E3.

• However, the wording suggested, perhaps unintentionally, that the “SS”
form is the standard one because “Bei Schreibung mit Großbuchstaben
schreibt man SS” remained in this form rather than, say, in the form
“[…] KANN […] geschrieben werden” (“[…] can be written”). Perhaps for
this reason, the wording was changed in 2024 to what we have now.

(And yet, people can still read too much into it …)

Here is how I understand it: “ẞ” is already shown among the 30 capital
letters in the preliminary remarks (“Vorbemerkungen”). Given that, §
25 E3 is only required to introduce the “SS” alternative, so the point
is that besides (“neben”) the capital sharp S (which we already know),
“SS” is also (“auch”) allowed. In this context, “also” does not mean
that the other variant is preferred, it is just there for an addition
to WHAT WAS ALREADY SHOWN in the “Vorbemerkungen”.

I do agree that “auch” (as opposed to “oder” or just a comma) can be
used for a secondary variant in a dictionary entry, but it is not so
clear to me that we can apply such an understanding to our case. Even
if it is the case that the writers have written it in this way because
they personally prefer the capital eszett, this preference is
certainly not part of the literal meaning and should, in my opinion,
not be considered as officially codified; it would simply be a
personal preference of the writer(s). And it is certainly wrong that
“SS” can only be used where “ẞ” is unavailable.



Re: German sharp S uppercase mapping

2024-11-26 Thread Asmus Freytag via Unicode

On 11/26/2024 12:41 PM, Daniel Buncic via Unicode wrote:

Dear Marius, dear Ivan, dear Peter, dear all,

Thanks to Marius for the compromise idea that the ß → SS mapping could 
remain in the standard table but ß → ẞ be handled as special casing 
for German.  However, I wonder what language the standard table would 
be there for then, given that ß is used in no other language but German.


Unicode's default casing table is essentially an "identifier-safe" 
casing table. Not only is it identifier-safe, it is also geared towards 
all situations where it needs to run unattended.


This is distinct from full-fidelity text processing. When 
text-processing functions support authoring (editing), there's an 
immediate quality control, and also, the same stability concerns related 
to both identifiers and unattended use do not apply.




(Or if the ß → SS rule was then only applied to those few older 
non-German texts that did use ß, it would be wrong in most cases, as 
in this Polish Bible from 1846: 
https://books.google.de/books?id=W4xbMAAJ&hl=de.  Google Books, 
certainly on the basis of some ß → ss rule, gives one of the words in 
the title as “Wssystko”, but that does not make sense; the word 
spelled “Wßystko” on the title page has to be transcribed as 
“Wszystko” (‘Whole’), in the same way as e.g. the first word in the 
heading of Genesis is spelled “PIERWSZE” (‘First’), not, of course, 
“PIERWSSE”.)


As to the interpretation of spelling rules, one has to know that 
“auch” (‘also’) in normative dictionaries always separates a secondary 
form from a preferred one.  Equal options are separated by “oder” 
(‘or’) or merely by a comma or a slash.  In this light, see the change 
from the previous version of the rule (§25 E3) to the current one:

I totally agree with the parsing of the sentence. It is quite clear 
that the way this statement is written implies the use of capital sharp 
S as the ordinary (or "unmarked") case, while the "SS" can be used in 
addition (implicit in that is the suggestion that you might have a 
particular reason, such as compatibility with older usage, but also 
things like identifiers).


“Bei Schreibung mit Großbuchstaben schreibt man SS. Daneben ist auch 
die Verwendung des Großbuchstabens ẞ möglich. Beispiel: Straße – 
STRASSE – STRAẞE.”
(‘When writing in capital letters, one writes SS. In addition to this, 
the use of the capital letter ẞ is also possible: Straße – STRASSE – 
STRAẞE.’ – 
https://www.rechtschreibrat.com/DOX/rfdr_Regeln_2016_redigiert_2018.pdf, 
p. 29)

↓
“Bei Schreibung mit Großbuchstaben ist neben der Verwendung des
Großbuchstabens ẞ auch die Schreibung SS möglich: Straße – STRAẞE –
STRASSE.”
(‘When writing in capital letters, in addition to using the capital 
letter ẞ, it is also possible to write SS: Straße – STRAẞE – STRASSE.’ 
– 
https://www.rechtschreibrat.com/DOX/RfdR_Amtliches-Regelwerk_2024.pdf, 
p. 48)


Before, capital ẞ was classified as ‘also possible’, now SS is ‘also 
possible’, and the order of the examples was also changed from 
“STRASSE – STRAẞE” to “STRAẞE – STRASSE”.  If they had meant the 
alternatives to be equal, they would have written something like “Bei 
Schreibung mit Großbuchstaben kann man ẞ oder SS schreiben” (‘When 
writing in capital letters, one can write ẞ or SS’).  It is correct 
that the order by itself does not indicate a preference, but the 
wording does.


I think your analysis is very conclusive on that aspect. The rules have 
definitely changed.

Peter, can you give me an example of an implementation that would 
crash if there was a new version of CaseFolding.txt or 
SpecialCasing.txt? Wouldn’t a programmer either copy the data of the 
file into their application so that it still works if the server 
unicode.org is down? And then changing the original would have no 
effect until the programmer decides to implement the change in their 
application, but then it would be their responsibility to take care of 
the effects of that change within their application.  Or in the worst 
case, the application would download its data directly from, say, 
https://www.unicode.org/Public/16.0.0/ucd/CaseFolding.txt, but then a 
new version would just have to be stored under …/17.0.0/… and it would 
not affect the application.  How can a new version of a file like this 
directly “break existing implementations”? Probably I am 
misunderstanding something here.
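
The "copy the data into the application" scenario can be sketched in Python as follows. This is only an illustration under stated assumptions: the embedded line follows the field layout of SpecialCasing.txt (code; lower; title; upper; optional condition; comment), and `PINNED_SPECIAL_CASING` and `parse_upper_mappings` are hypothetical names, not part of any library.

```python
import unicodedata

# A pinned excerpt in the SpecialCasing.txt field layout, shipped with the
# application so that it does not depend on unicode.org at run time.
PINNED_SPECIAL_CASING = """\
# code; lower; title; upper; (condition_list;) # comment
00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
"""


def parse_upper_mappings(data: str) -> dict[str, str]:
    """Extract the unconditional uppercase mappings from the pinned data."""
    mappings = {}
    for line in data.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) > 4 and fields[4]:
            continue  # skip language- or context-conditional entries
        source = chr(int(fields[0], 16))
        upper = "".join(chr(int(cp, 16)) for cp in fields[3].split())
        mappings[source] = upper
    return mappings


print(parse_upper_mappings(PINNED_SPECIAL_CASING))  # {'ß': 'SS'}

# Language runtimes pin a snapshot in the same way:
print(unicodedata.unidata_version)
```

In this picture a new upstream file changes nothing until the pinned copy is replaced. What the stability policy protects is everything already stored under the old mapping (file names, identifiers, index keys), which would stop matching after such an upgrade.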


The mistake is to assume that people supporting text processing should 
be using the default table to begin with.


.NET has a concept of an "invariant culture", which you can specify 
instead of a locale-specific culture. It will violate some cultures' 
preferences, but will produce a dependable and stable result. This is 
pretty much what the Unicode "default" casing does.
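
For comparison, Python's built-in string methods behave like such an invariant culture: `str.upper()` and `str.casefold()` apply the Unicode default mappings and do not consult the process locale. A minimal sketch of an identifier-safe comparison key (`identifier_key` is a hypothetical helper name):

```python
def identifier_key(name: str) -> str:
    """Locale-independent key for caseless comparison of identifiers."""
    # casefold() applies Unicode default case folding: ß and ẞ both fold to "ss".
    return name.casefold()


names = ["Straße", "STRASSE", "STRAẞE", "strasse"]
print({identifier_key(name) for name in names})  # {'strasse'}
```

All four spellings land on the same key regardless of the user's locale, which is the dependable behaviour wanted for file names and identifiers, and a poor fit for text shown to readers.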


Somebody should go over the text and look at all the descriptions and 
propose an update that more clearly explains that for any processing 
that should be correct (for any language) one needs to not us

Re: German sharp S uppercase mapping

2024-11-26 Thread Asmus Freytag via Unicode

On 11/26/2024 8:59 AM, Peter Constable via Unicode wrote:

The case pair stability requirement applies to default data. Conforming 
applications can still tailor case mapping behaviour using language-specific 
overrides. However, the default case pairs as defined in the SpecialCasing.txt 
and UnicodeData.txt files must remain stable so as not to break existing 
implementations of file systems or other identifier systems that depend on the 
default case pairs.


Peter


The case pair stability is indeed important, if not essential for 
identifiers. However, in the past, we may have been a bit cavalier on 
how an identifier-safe casing (default) intersects with text processing 
in general.


I think there's a case to be made for someone writing a short proposal 
to clarify whatever language we are using and to make clear, among other 
things, that **text processing** should be using the appropriate 
locale-based casing (from CLDR).


We should probably go further and explicitly reposition the "default" 
casing as "identifier-safe" because that is what motivates the stability 
requirements.


There's one more wrinkle. Because the sharp S is not natively used 
outside German, it would be reasonable to have a multicultural case 
mapping that treats this, and all cases where local usages do not 
conflict, according to the evolving rules of the local culture(s) 
that use the character, so that there's an optional level between 
identifier-stable and fully local case mappings.


To borrow a bit of terminology from some other implementation we need 
these levels:


 * InvariantCulture (stable default/identifier-safe)
 * CommonCulture (current culture, common default except for
   conflicting use)
 * LocalCulture (current culture... one per locale ID)
 * (Throw in regional if you have to, somewhere).

The ideal case would be one where out of the box text processing 
operations can default to something that performs according to the 
current (evolving) needs for as many cultures as possible (except where 
they have contradictory treatment of the same character).


And where the fully invariant case is positioned to make clear where 
it's applicable (identifier) and where its use is therefore mandatory.
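
The three levels could be layered roughly as in the following Python sketch. Everything here is an assumption made for illustration: the table names, the function names, and the tiny tailoring tables stand in for real data (CLDR for the local level; the "common" level does not exist anywhere today).

```python
# Hypothetical tailoring tables: character -> replacement applied before
# the default (invariant) uppercasing.
COMMON_TAILORING = {"ß": "ẞ"}   # assumed: no locale disagrees on this one
LOCAL_TAILORING = {
    "tr": {"i": "İ"},           # Turkish: i uppercases to dotted İ
    "az": {"i": "İ"},           # Azerbaijani: same rule
}


def upper_invariant(text: str) -> str:
    """Stable default casing, suitable for identifiers."""
    return text.upper()


def upper_common(text: str) -> str:
    """Default casing plus tailorings on which no locale conflicts."""
    return text.translate(str.maketrans(COMMON_TAILORING)).upper()


def upper_local(text: str, locale: str) -> str:
    """Common casing plus the tailorings of one specific locale."""
    table = {**COMMON_TAILORING, **LOCAL_TAILORING.get(locale, {})}
    return text.translate(str.maketrans(table)).upper()


print(upper_invariant("straße"))    # STRASSE
print(upper_common("straße"))       # STRAẞE
print(upper_common("izmir"))        # IZMIR
print(upper_local("izmir", "tr"))   # İZMİR
```

The dotted/dotless i stays out of the common table because locales disagree on it, while ß can sit there because only one convention is in current use.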


A./




-Original Message-
From: Unicode On Behalf Of Daniel Buncic via 
Unicode
Sent: November 26, 2024 2:44 AM
To:unicode@corp.unicode.org
Subject: Re: German sharp S uppercase mapping



As to the case pair stability guarantee, ...  if case folding changes in that 
language, then Unicode has to adjust.  It would be poor service to the public 
to stick to a case mapping that is no longer valid just because Unicode came 
into existence at a time when it was still valid. ...




