Hello John, others,
I'll address the question of ASCII vs. Latin, and which Latin exactly,
in a separate mail.
This mail is just about NFC/Turkish, in particular to correct some
misconceptions.
On 2025-11-02 12:07, John C Klensin wrote:
Jean, Martin,
One set of comments here illustrates why I have been urging caution in
making seemingly innocuous changes in response to seemingly innocuous
comments/suggestions...
--On Thursday, October 30, 2025 13:10 -0500 Jean Mahoney
<[email protected]> wrote:
Current RPC operational procedure: Postal addresses are not
required in RFCs; however, if one is provided, the RPC will
update a country name to match the English short name for the
country found here: https://www.iso.org/obp/ui/#search. This is
specified in the RFC Style Guide:
https://www.rfc-editor.org/rfc/rfc7322#section-4.12
I believe there was already feedback to also include the ASCII
equivalent here.
To be exact, this should be a Latin script equivalent, not an
ASCII equivalent.
[JM] Ack
Maybe not. A "Latin script equivalent" includes non-ASCII characters
used in common (and contemporary) Western European and Western
European languages and is a useful rule for, e.g., allowing Martin to
spell his name correctly. But it is not limited to that. Maybe a
rule about ASCII and what Unicode called the "Latin-1 Supplement"
(U+00CA through U+00FF or maybe even U+00A1 through U+00FF) would
work, although even that could lead to issues with dotless-i
(U+0131), which can cause NFC to fail unless the language is known,
No. Care is needed for Turkish (and Turkic languages) when using case
conversion (upper case to lower case and back). There is absolutely no
problem with NFC for Turkish.
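To make the distinction concrete, here is a minimal Python sketch (mine, for illustration only). It assumes Python's standard unicodedata module and the built-in str.upper()/str.lower(), which apply only the language-independent default case mappings; the names nfc and roundtrip_case are made up for this example.

```python
import unicodedata

DOTLESS_I = "\u0131"     # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def nfc(s: str) -> str:
    """Normalize a string to NFC."""
    return unicodedata.normalize("NFC", s)


def roundtrip_case(s: str) -> str:
    """Upper-case, then lower-case, using the default (non-Turkish) mappings."""
    return s.upper().lower()


# NFC needs no language information: all four i-letters are left unchanged.
nfc_stable = all(nfc(c) == c for c in ("i", "I", DOTLESS_I, DOTTED_CAP_I))

# Case conversion is where the language matters: without Turkish tailoring,
# dotless i upper-cases to plain "I", which lower-cases to dotted "i".
dotless_after_roundtrip = roundtrip_case(DOTLESS_I)

# Default lower-casing of capital dotted I gives "i" + COMBINING DOT ABOVE.
dotted_cap_lowered = DOTTED_CAP_I.lower()

print(nfc_stable)
print(dotless_after_roundtrip == DOTLESS_I)
print([f"U+{ord(c):04X}" for c in dotted_cap_lowered])
```

The round trip loses the dotless i only because the default mappings are not tailored for Turkish. Normalization never enters into it.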
and the Turkish / Romanian font style problem that the Unicode
Standard points out.
This was indeed a problem up to Unicode 2.1. The 'splitters' (as opposed
to the 'lumpers') won (as they almost always do), and new characters
were encoded in Unicode 3.0 (Sept. 1999).
Now we have these characters for Turkish:
015E;LATIN CAPITAL LETTER S WITH CEDILLA;Lu;0;L;0053 0327;;;;N;LATIN CAPITAL LETTER S CEDILLA;;;015F;
015F;LATIN SMALL LETTER S WITH CEDILLA;Ll;0;L;0073 0327;;;;N;LATIN SMALL LETTER S CEDILLA;;015E;;015E
0162;LATIN CAPITAL LETTER T WITH CEDILLA;Lu;0;L;0054 0327;;;;N;LATIN CAPITAL LETTER T CEDILLA;;;0163;
0163;LATIN SMALL LETTER T WITH CEDILLA;Ll;0;L;0074 0327;;;;N;LATIN SMALL LETTER T CEDILLA;;0162;;0162
And these characters for Romanian:
0218;LATIN CAPITAL LETTER S WITH COMMA BELOW;Lu;0;L;0053 0326;;;;N;;;;0219;
0219;LATIN SMALL LETTER S WITH COMMA BELOW;Ll;0;L;0073 0326;;;;N;;;0218;;0218
021A;LATIN CAPITAL LETTER T WITH COMMA BELOW;Lu;0;L;0054 0326;;;;N;;;;021B;
021B;LATIN SMALL LETTER T WITH COMMA BELOW;Ll;0;L;0074 0326;;;;N;;;021A;;021A
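As a quick check of the entries above, the following Python sketch (again only an illustration, using the standard unicodedata module; the helper name compose is made up) shows that the cedilla and comma-below forms are distinct characters with distinct decompositions, and that NFC keeps them apart.

```python
import unicodedata


def compose(base: str, mark: str) -> str:
    """Return the NFC form of a base letter followed by a combining mark."""
    return unicodedata.normalize("NFC", base + mark)


CEDILLA = "\u0327"      # COMBINING CEDILLA
COMMA_BELOW = "\u0326"  # COMBINING COMMA BELOW

s_cedilla = compose("S", CEDILLA)    # expected to compose to U+015E
s_comma = compose("S", COMMA_BELOW)  # expected to compose to U+0218
t_cedilla = compose("T", CEDILLA)    # expected to compose to U+0162
t_comma = compose("T", COMMA_BELOW)  # expected to compose to U+021A

for ch in (s_cedilla, s_comma, t_cedilla, t_comma):
    # Print code point, character name, and canonical decomposition.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))
```

Since the two sets never normalize to each other, which glyph shape a text gets depends on which characters it uses, not on the font.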
The Wikipedia page about the Romanian language written in Romanian (at
https://ro.wikipedia.org/wiki/Limba_română) uses the latter (the
comma-below characters), so it seems that in practice the problem you
point out is essentially gone.
Regards, Martin.