--On Tuesday, April 21, 2026 22:44 -0500 Pete Resnick
<[email protected]> wrote:
> Trying to keep my chair hat on, but perhaps slipping into
> commentary. My apologies if I slip too far:
>
> On 21 Apr 2026, at 21:13, John C Klensin wrote:
>
>> If Christian will forgive me, let me state his position a tad
>> differently. "ASCII" is too restrictive. "Latin script" much
>> too general. If what we really mean is "Latin script as used in
>> Western European languages and writing systems narrowly derived
>> from that subset", it would be desirable to say that, with
>> anything else treated the way the document treats non-Latin
>> names, i.e., with the expectation of an alternate name in that
>> subset. We don't have a convenient name for that "as used in
>> Western European" subset, so we end up handwaving and/or relying
>> on the knowledge that no one using a name outside that Western
>> European subset has written an RFC yet and therefore we assume
>> they won't in the future.
>
> I think that's right, but I think it's even hand-wavier than you
> put it: If the former president of China, Hú Jǐntāo, wanted to
> start publishing RFCs, I think we would all be fine with him using
> the Pinyin form of his name as I just did without giving it an
> alternative, even though as far as I know neither "ǐ" nor "ā" is
> used in any "Western European" language. (I think "ā" is used in
> some Northern European places.) And I wouldn't be surprised if we
> could come up with weirder such examples.

At the risk of further damage to the proverbial dead horse, and
possibly falling into the same trap I suggested Martin might have
fallen into, I used "Western European" in the same way I think I've
run across in Unicode documentation (but could have been wrong about
that when I wrote my note), specifically the Basic Latin, Latin-1
Supplement, and Latin Extended-A blocks. That group includes both of
your examples as U+012D and U+010 respectively. Venturing into the
Latin Extended-B block and beyond opens the door to a large number of
characters that are easily confused with others, several more cases
where NFC is insufficient (there is, IIRC, at least one of those in
the Latin-1 block), and characters where guessing at pronunciation
based on experience with the writing system group often called GLC
(Greek, Latin, and Cyrillic) will often lead one astray.
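
For anyone who wants to check block membership mechanically rather
than from recollection, here is a minimal Python sketch. It assumes
the block ranges published in the Unicode code charts, hard-coded
below, and `latin_block` and `describe` are hypothetical helper names
made up for illustration, not part of any library. It makes no attempt
to model confusability or pronunciation.

```python
import unicodedata

# Block ranges as given in the Unicode code charts (hard-coded assumption).
LATIN_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
]


def latin_block(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for low, high, name in LATIN_BLOCKS:
        if low <= cp <= high:
            return name
    return None


def describe(text):
    """Return (code point, character name, block) for each character.

    The text is normalized to NFC first, so a base letter followed by a
    combining mark is reported as the precomposed character when one exists.
    """
    rows = []
    for ch in unicodedata.normalize("NFC", text):
        rows.append(
            (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), latin_block(ch))
        )
    return rows


if __name__ == "__main__":
    # The name used as an example earlier in this thread.
    for row in describe("Hú Jǐntāo"):
        print(*row, sep="  ")
```

Running it on a candidate name lists each character's code point,
formal name, and block, which is one way to see whether a name stays
inside the three blocks named above.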

More below, including an explanation of why I think it was (and is)
worth pushing this far.

> So I think that Christian
> is correct that this is an "I know it when I see it" kind of thing.
> In the end, the RPC (and perhaps the RSAB) may go looking for
> advice for something novel.

Mostly agreed. But what I know when I see it may differ from what
you know on sight, and that may differ again for Martin, Christian,
and others.

>> I can live with simply giving the RPC the authority and discretion
>> to decide what is allowed without an alternative in the Western
>> European subset (or Latin Script more generally) and what
>> alternatives are permitted, but it would be good if all of us
>> understood that there is a certain amount of handwaving -- and risk
>> of the RPC and RSAB getting drawn into some nasty disagreements --
>> involved.
>
> I don't think the alternative is any better: If we try to come
> up with a sharp policy statement on this beyond, "If it's not
> Latin, it needs an alternative in Latin, and with the rest the RPC
> should do their best to do what the author prefers", I think for
> all our effort we will end up just being hand-wavy anyway and the
> RPC and RSAB will end up in the same position. I'd like to think
> that's why the WG was content with the current text (with more
> "affirmation by silence" than makes me completely comfortable, but
> such is life), and why neither you nor Christian came up with any
> text to make the WG significantly happier.

First of all, by moving from what I read as "if the name consists of
Latin script characters (from any block described by Unicode as Latin
script) then no alternative (or, earlier, 'equivalent') is allowed" to
allowing the RPC to negotiate an all-Latin-script alternative, we've
eliminated most, if not all, of the problem cases I was concerned
about. That comes with two qualifications:

(i) If an alternative is used, it should be restricted to
being composed of a much narrower set of characters than
"Latin Script" -- probably not all the way to ASCII, but
using an all-Latin string that is going to be unfamiliar to a
large fraction of the RFC-reading audience, including
ignorant Americans, may not do it. That is where Martin and
I may differ. I imagine that a sufficient portion of that
reader community may be unfamiliar enough with the
implications of dieresis markings above certain letters and
in certain languages that I'd like to see the RPC allowed to
at least negotiate the presence of alternatives and, for
those cases, all-ASCII alternatives at that. As long as RFCs
are written in English and no skill in character
understanding beyond that needed for English is expected,
alternatives should be limited to ASCII unless
negotiated specifically with the RPC.

(ii) Even if there were an ASCII restriction for alternatives, but
especially for the more general "Latin" cases, some discretion is in
order. As the most obvious example, I'd expect the RPC to push back
on the use of some punctuation characters in names. "Looks a bit off
to them" is, most likely, a sufficient criterion but, based on past
painful experiences, I fear episodes of insistent cuteness or
cleverness and we should be clear that the RPC has the authority to
push back in such cases. And clarity does not mean a reference to a
principle laid out in a document that is not even referenced.

Finally, I am hoping that Alexis and the RPC are reading this message
and at least parts of the rest of this thread. While I am ok with
the document not going into the level of detail in this discussion
thread as long as the discretion allowed the RPC is clear to readers
of the document, the RPC should understand the discussion as a "there
lie dragons" warning even if that warning is not explicit.

   john

--
rswg mailing list -- [email protected]
To unsubscribe send an email to [email protected]