On 3/14/14, 8:48 PM, Yutaka OIWA wrote:
Dear Peter and all PRECIS related members,
Thank you very much, and I'm really sorry that our definition
had a serious mistake which has confused many of you.
I'll talk with my co-authors (especially Nemoto-san) again and
revise the document as soon as possible,
so that we can move on to the next steps quickly.
# That's why our specification mentions the mapping for spaces,
# which are not included in the current definition.
When I talked with Peter and Julian, we discussed other
character classes, and at that time I should have noticed
that I was possibly referring to the wrong class.
A wider question: why do you think that the definition of an HTTP
username needs to be so loose? Do you define things this way to be
backward compatible with existing implementations, or do you really
think that this is a best practice? I'm truly curious. (And I wonder
if we even want to call this construct a "username"...)
I think the reason is both.
1: For backward compatibility, we need to keep all ASCII "printable"
(U+0020 - U+007E, including SP) characters as is, and keep the
Latin-1 printable characters (U+00A1 - U+00FF, except SHY) independent
of one another.
I think "saslprepbis" does this.
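A minimal sketch of the repertoire described in item 1, assuming
"printable" means exactly the two ranges given there (with U+00AD
SOFT HYPHEN excluded). The function name is hypothetical and does not
come from any draft; it only checks membership and applies no mapping.

```python
# Hypothetical helper illustrating the backward-compatible repertoire
# from item 1; it is not taken from any PRECIS or http-auth draft.
SHY = "\u00AD"  # SOFT HYPHEN, excluded from the Latin-1 range


def in_legacy_repertoire(userid: str) -> bool:
    """Return True if every character is ASCII printable
    (U+0020-U+007E, including SPACE) or Latin-1 printable
    (U+00A1-U+00FF, except SHY)."""
    for ch in userid:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7E:
            continue
        if 0xA1 <= cp <= 0xFF and ch != SHY:
            continue
        return False
    return True


print(in_legacy_repertoire("Yutaka OIWA"))         # True
print(in_legacy_repertoire("na\u00EFve"))          # True (U+00EF)
print(in_legacy_repertoire("soft\u00ADhyphen"))    # False (SHY)
print(in_legacy_repertoire("\u2656"))              # False (chess rook)
print(in_legacy_repertoire("my\tcat"))             # False (TAB)
```

The sketch deliberately normalizes nothing, since item 1 asks that
these characters stay independent; under NFKC, for example, U+00B2
SUPERSCRIPT TWO would become "2". NBSP (U+00A0) falls outside both
ranges.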
2: For semantic reasons, HTTP authentication will serve as a vehicle
for many different kinds of existing application frameworks, including
IM, Web mail, social networks, and others.
It should be able to accept all kinds of "user name" formats,
for example a simple "user ID" (yoiwa), a user's "name" (Yutaka OIWA),
a mail address ([email protected]),
social ID formats (@yoiwa or =yoiwa),
Here again, I think that "saslprepbis" handles those.
and many others.
Here we might have some disagreement.
If we think that http-auth needs to allow just about any string as a
"userid" (to use the term from RFC 2617 - I don't want to call it a
username), then even a PRECIS profile as loose as
draft-ietf-precis-nickname is too strict for your purposes.
However, I question whether those purposes are justified.
Do we really expect that an actual userid in http-auth might be any of
the following?
"Y u taka O i w a "
"♖♘♗♕♔♗♘♖" (i.e., the back line of white chess pieces)
"mycatisa bby" (where those spaces are actually a tab)
and so on.
Do we have evidence from existing applications that we need to support
strings of characters like those as userids in HTTP authentication? Or
are we being way too liberal in what we accept?
Unlike SASL or XMPP, which have their own semantics within their
frameworks, authentication names in HTTP must be semantics-free,
unstructured strings, to which each application that uses the
Web/HTTP can later assign meaningful semantics.
We are not likely to be able to collect all possible use cases of
IDs to be used with HTTP (including future uses) and
then take their union,
so instead we're defining a "grandfather" ID notation,
expecting that all ID string use cases will be subsets of it.
My concern is that this "grandfather set" is every Unicode character,
and if we allow that then we're really not providing any kind of helpful
guidance to application developers.
One fundamental assumption underlying the PRECIS work is that it is our
responsibility as internationalization experts to prevent application
developers from shooting themselves in the foot. In particular, the
IdentifierClass in the PRECIS framework tries to provide a safe subset
of characters, and the username construct in "saslprepbis" profiles the
IdentifierClass so that application developers can avoid trouble. As far
as I can see, what you are proposing would invite such trouble, and I'm
not comfortable with that.
At the same time, defining it as just "UTF-8" would cause user confusion
and an interoperability mess around "visibly identical" strings,
so we must handle that side of string preparation with PRECIS.
For example, NBSP and other spaces should be replaced.
Hmm. So you are saying that if a user inputs the following string:
"Iam cey"
Then " " would be replaced with U+0020 (SPACE) as follows:
"Iam cey"
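A minimal sketch of the space replacement under discussion, assuming
"other spaces" means every code point whose Unicode General Category
is Zs (space separator). The function name is hypothetical; this is an
illustration, not the mapping rule of any particular draft.

```python
import unicodedata


# Hypothetical helper: maps every space separator (General Category
# Zs), e.g. NBSP U+00A0, EM SPACE U+2003, IDEOGRAPHIC SPACE U+3000,
# to U+0020 SPACE. U+0020 itself is Zs, so it maps to itself.
def map_spaces(s: str) -> str:
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch
                   for ch in s)


print(repr(map_spaces("Yutaka\u00A0OIWA")))   # 'Yutaka OIWA'
print(repr(map_spaces("a\u3000b\u2003c")))    # 'a b c'
print(repr(map_spaces("my\tcat")))            # 'my\tcat' (TAB is Cc)
```

Note that a tab is a control character (Cc), not a space separator,
so a rule of this shape would leave a tab inside a userid untouched.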
"Just UTF-8" is just as useless for internationalization as "just TLS"
is for security. I definitely agree that we need something more than
"just UTF-8", which is why we've put so much work into PRECIS. Although
we cannot solve the problem of confusable characters, we can define some
string classes that "first do no harm" (is there a Hippocratic Oath for
i18n?). So far, I do not think that what you are proposing does no harm;
in fact, I think it is actively harmful to allow such a wide range of
Unicode characters into userids.
Peter
_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis