Thank you, John, for the background and the ideas for mitigating
the pain. My comments on specific points are inline:
On 9/3/2014 5:50 AM, John C Klensin wrote:
--On Friday, August 29, 2014 16:15 -0600 Peter Saint-Andre
<[email protected]> wrote:
On 8/25/14, 7:52 PM, Dan Chiba wrote:
Hi Peter,
This is essentially generic but I think the degree of impact
would vary, depending on the profile. UsernameIdentifierClass
would be one of those severely affected because it is
important to evaluate usernames correctly and there are
various common practices for handling them: sometimes case
insensitive, sometimes case sensitive, among others.
Correct. Which is why it's difficult to formulate one rule for
all treatments of usernames, and why it took quite a bit of
discussion to come to consensus on the text in Section 4.2.1
of the SASLprepbis specification:
http://tools.ietf.org/html/draft-ietf-precis-saslprepbis-07#section-4.2.1
If I understand your original message correctly, you are
looking for a way that, say, client software can know in
advance how a server will treat usernames with regard to case
mapping, based on the SASL mechanism or application protocol
in use. I was looking for that, too. Unfortunately, our
friends in the KITTEN WG (which works on SASL) were insistent
- and correct - that there is no deterministic formula here
because case mapping can even be a matter of deployment or
service policy and thus not determined by the SASL mechanism
or application protocol in use. Thus our carefully-crafted
text in Section 4.2.1.
I wish I could report happier news.
Dan,
Let me add an additional perspective to this because it relates
to something I've been saying in other contexts and you have, I
assume inadvertently, provided another example. This comment is
complementary to my response to Andrew on the specific
"fußball" example, which has been used many times with more or
less the same conclusion.
While it would be good for user agents to be able to better
predict server behavior and that carries with it the problems
Peter describes, ultimately computer systems don't care. The
users do. For most users, it would be very desirable if all of
these systems did what they want and expect -- in this case,
case folding should handle all of the relevant locale and
language special cases (Turkic dotless "i" is the most commonly
cited case, but there are many others, including different
conventions about decorated Latin characters in different
places).
Unfortunately, our experience with DWIM (or "expect" or "want")
systems at the individual user language and character-handling
level has been pretty poor. For starters, every user interface
has to have a method of telling the server, at much finer
granularity than we normally use, exactly what its expectations
are and servers have to be prepared to deal with that. That, as
Andrew points out in a different context, is an implementer's
nightmare.
I think our efforts might be better spent educating users (by
clear error messages if needed) into a different set of
expectations than "the computer should be able to figure out
what I intended". In the particular case of PRECIS (and
IDNA), three things would be extremely helpful:
(1) Instead of lots of profiles, sub-profiles, and variations on
profiles, reduce the number to an absolute minimum whose
applicability can be predicted and, ideally, have those differ
only in edge cases.
I totally agree with this. Ideally, the UsernameIdentifierClass
profile would be used predominantly, with as few variations as possible.
Having many variations defeats the purpose.
(2) Educate the users that, if they want consistent,
predictable, and easy-to-understand behavior, they should simply
avoid the edge cases to the extent possible. If we have to
consider "upper case" as an edge case in that regard, so be it.
Yes. It is not clear to me what compatibility issues the users may face,
but I think it's important to get as many users to stick to the default
profile as possible.
(3) Adopt a model that, if two characters are assigned different
(after NFC) Unicode code points, they are different, that we
simply don't try to figure out the user's intent, and that forms
that are more likely to cause confusion than to provide useful
distinctions are simply not used together in the same context
rather than somehow mapped together.
It is my understanding that Unicode defines characters that are distinct
but confusable, and that such characters should be avoided in contexts
where they can be confused.
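
To make the model in (3) concrete, here is a minimal Python sketch that
assumes only the standard unicodedata module. The helper name
same_identifier is hypothetical and is not taken from any PRECIS
specification; it treats two strings as equal only if their NFC forms
match code point for code point, with no attempt to guess intent.

```python
import unicodedata

def same_identifier(a: str, b: str) -> bool:
    """Hypothetical helper: equal only if the NFC forms are identical."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Precomposed U+00E9 and "e" + combining acute U+0301 are the same after NFC.
print(same_identifier("\u00e9", "e\u0301"))  # True

# Latin "a" (U+0061) and Cyrillic "a" (U+0430) look alike but stay distinct.
print(same_identifier("a", "\u0430"))        # False
```

Under this model, confusable pairs like the second one are not mapped
together; they would have to be kept out of the same context by policy.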
It isn't ideal, but it is at least something we can fully
understand even if (3) will still require some judgment for some
subset of characters, "ß", dotless i, and those Final (and
other) forms included.
"ss" and "ß" are to be considered equivalent under case folding; I don't
understand what kind of judgment you think is needed.
The regular i and dotless i seem slightly different from ss and ß in
that they are distinct letters, but I think it is possible to treat them
as equivalent, as far as PRECIS username comparison is concerned. Turkish
users can still use either letter as needed; when prepared, they can be
normalized to the same letter.
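
For reference, here is a small Python sketch of how Unicode default
(locale-independent) case folding treats the characters discussed above,
using str.casefold(). It only illustrates the default folding as Python
implements it; it is not the mapping that any PRECIS profile prescribes,
and the helper name folds_equal is hypothetical.

```python
# Characters discussed above, compared under Unicode default case folding.
pairs = [
    ("fußball", "fussball"),  # sharp s vs. "ss"
    ("i", "\u0131"),          # regular i vs. dotless i (U+0131)
    ("\u03c3", "\u03c2"),     # Greek sigma vs. final sigma
]

def folds_equal(a: str, b: str) -> bool:
    """Hypothetical helper: True if both strings match after casefold()."""
    return a.casefold() == b.casefold()

for a, b in pairs:
    print(repr(a), repr(b), folds_equal(a, b))
```

With default folding, the ß/ss pair and the sigma pair compare equal,
while i and dotless i do not, so treating those two as equivalent would
need a mapping beyond the default folding.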
Regards,
-Dan
best,
john