Re: German sharp S uppercase mapping

Daniel Buncic via Unicode Tue, 03 Dec 2024 00:15:14 -0800

On 03.12.2024 at 02:51, Asmus Freytag via Unicode wrote:

Rather than getting hung up on details of parsing one particular
part of one sentence, it would be more useful from Unicode's
perspective if someone (Daniel?) could sum up in a short document
based on this discussion where Unicode is behind the curve and to
make sure the support in CLDR is up to actual current practice and
not what it was 10 or 15 years ago.

Thank you very much for the idea. I could certainly sum up the arguments of the discussion (though I’m too busy to do it right now; you would have to have a few weeks’ patience), but I still haven’t understood where in the CLDR such casing information is stored. There are data subsets that have “casing” in the title, but they only say whether the days of the week, month names, language names, etc. are capitalized in a certain language. There is a field called “main exemplars” that contains all the small letters (for German, including ß) and another field called “index exemplars”, which for German does not even include Ä, Ö, and Ü. I surmise that this is only meant for numbering items using letters (where indeed you can have parts A, B, C, etc. of a book, but you would never have a “part Ä”). I cannot find any information saying something like a ↔ A, b ↔ B, etc.

For Turkish (https://www.unicode.org/cldr/charts/46/summary/tr.html), the “main letters” in the very first line are given as


[a b c ç d e f g ğ h ı iİ j k l m n o ö p r s ş t u ü v y z].

So there i and its capital counterpart İ are not separated by a space. But for German (https://www.unicode.org/cldr/charts/46/summary/de.html), the “main letters” are


[aä b c d e f g h i j k l m n oö p q r s ß t uü v w x y z],

where the missing space does not imply capitalization, so I guess changing this list to “… s ßẞ t …” would not automatically inform people that ß should be capitalized as ẞ.
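
To make the comparison concrete, here is a small Python sketch. It only works on the two chart lines exactly as quoted above; the names TR_MAIN, DE_MAIN, run_together and contains_uppercase are made up for illustration and are not part of CLDR or any library. It shows that a run-together token cannot by itself signal a case pair: the Turkish one contains an uppercase letter, while the German ones do not.

```python
import unicodedata

# The two "main letters" lines as quoted from the CLDR 46 summary charts.
TR_MAIN = "[a b c ç d e f g ğ h ı iİ j k l m n o ö p r s ş t u ü v y z]"
DE_MAIN = "[aä b c d e f g h i j k l m n oö p q r s ß t uü v w x y z]"


def run_together(chart_line: str) -> list[str]:
    """Return the tokens that contain more than one letter (no space between them)."""
    # Normalize to NFC so that each accented letter is a single code point.
    normalized = unicodedata.normalize("NFC", chart_line)
    return [tok for tok in normalized.strip("[]").split() if len(tok) > 1]


def contains_uppercase(tokens: list[str]) -> bool:
    """True if any character in any token is an uppercase letter."""
    return any(ch.isupper() for tok in tokens for ch in tok)


for name, line in (("tr", TR_MAIN), ("de", DE_MAIN)):
    tokens = run_together(line)
    print(name, tokens, "contains uppercase:", contains_uppercase(tokens))
```

Under these assumptions the Turkish line yields one run-together token that includes the capital İ, whereas the German tokens pair a base letter with its umlaut, both lowercase.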

In https://www.unicode.org/versions/Unicode16.0.0/UnicodeStandard-16.0.pdf on page 198 I find: “Examples of case tailorings which are not covered by data in SpecialCasing.txt include: […] Uppercasing of U+00DF ‘ß’ LATIN SMALL LETTER SHARP S to U+1E9E LATIN CAPITAL LETTER SHARP S[.] The preferred mechanism for defining tailored casing operations is the Unicode Common Locale Data Repository (CLDR), https://cldr.unicode.org, where tailorings such as these can be specified on a per-language basis, as needed.” So the idea is already there. On page 295 the problem with ß is addressed in detail, and right underneath it says, “Additional language-specific or orthography-specific contexts and casing behavior is specified in the Unicode Common Locale Data Repository (CLDR), https://cldr.unicode.org.” So does this already exist? Or where does it have to be added?
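
As a minimal illustration of the tailoring that passage describes, here is a Python sketch. It relies only on Python 3's built-in str.upper(), which applies the default, untailored Unicode mappings; the helper name upper_german_tailored is hypothetical and does not correspond to anything in CLDR or in any library.

```python
SHARP_S_SMALL = "\u00df"    # ß LATIN SMALL LETTER SHARP S
SHARP_S_CAPITAL = "\u1e9e"  # ẞ LATIN CAPITAL LETTER SHARP S


def upper_german_tailored(text: str) -> str:
    """Uppercase text, mapping ß to ẞ instead of the default SS.

    Hypothetical helper: it replaces ß first, then applies the
    default uppercasing to everything else.
    """
    return text.replace(SHARP_S_SMALL, SHARP_S_CAPITAL).upper()


# Default, untailored uppercasing expands ß to SS.
print("Straße".upper())                  # STRASSE
# The tailored variant keeps one letter and uses the capital sharp S.
print(upper_german_tailored("Straße"))   # STRAẞE
# The reverse direction already exists by default: ẞ lowercases to ß.
print(SHARP_S_CAPITAL.lower() == SHARP_S_SMALL)  # True
```

So the default data already cover ẞ → ß, and only the direction ß → ẞ would need a tailoring of the kind the standard mentions.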


Can anybody help?

Best wishes,

Daniel

--
Prof. Dr. Daniel Bunčić
===============================================================
Slavisches Institut der Universität zu Köln
Weyertal 137, D-50931 Köln
Telefon:       +49 (0)221  470-90535
Sprechstunden: https://uni.koeln/ENZEB
E-Mail:        [email protected] = [email protected]
Threema:       https://threema.id/8M375R5K
===============================================================
Homepage:      http://daniel.buncic.de/
Academia:      http://uni-koeln.academia.edu/buncic
ResearchGate:  https://researchgate.net/profile/Daniel-Buncic-2
===============================================================
