On 3/14/14, 8:48 PM, Yutaka OIWA wrote:
Dear Peter and all PRECIS related members,

Thank you very much, and I'm really sorry that our definition
had a serious mistake which has confused many of you.
I'll talk with my co-authors (especially Nemoto-san) again and
revise the document as soon as possible,
so that we can restart the next steps quickly.

# That's why our specification mentions
# the mapping for spaces, which are not included in the current definition.

When I talked with Peter and Julian, we mentioned
other character classes during the discussion, and at that time
I should have noticed that I was possibly referring to the incorrect class.

A wider question: why do you think that the definition of an HTTP
username needs to be so loose? Do you define things this way to be
backward compatible with existing implementations, or do you really
think that this is a best practice? I'm truly curious. (And I wonder
if we even want to call this construct a "username"...)
I think both reasons apply.

1: For backward compatibility, we need to keep all ASCII "printable"
     characters (U+0020 - U+007E, including SP) as is, and likewise keep
     the Latin-1 printable characters (U+00A1 - U+00FF, except SHY) independent.

I think "saslprepbis" does this.
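To make the character set in point 1 concrete, here is a minimal Python sketch of a membership check, assuming "printable" means exactly the two ranges listed there (U+0020..U+007E, and U+00A1..U+00FF minus U+00AD SOFT HYPHEN). The function names are hypothetical; this only illustrates the ranges and is not code from either draft.

```python
# Hypothetical sketch: the "legacy printable" set described in point 1,
# i.e. ASCII U+0020..U+007E plus Latin-1 U+00A1..U+00FF minus U+00AD (SHY).

SOFT_HYPHEN = 0x00AD


def is_legacy_printable(ch: str) -> bool:
    """Return True if the single character ch is in the point-1 set."""
    cp = ord(ch)
    if 0x0020 <= cp <= 0x007E:  # ASCII printable, including SPACE
        return True
    if 0x00A1 <= cp <= 0x00FF and cp != SOFT_HYPHEN:  # Latin-1 printable, minus SHY
        return True
    return False


def all_legacy_printable(s: str) -> bool:
    """Return True if every character of s is in the point-1 set."""
    return all(is_legacy_printable(ch) for ch in s)


if __name__ == "__main__":
    # NBSP (U+00A0) and SHY (U+00AD) fall outside the set; e-acute (U+00E9) is inside.
    for sample in ["yoiwa", "Yutaka OIWA", "@yoiwa", "caf\u00e9", "a\u00a0b", "a\u00adb"]:
        print(repr(sample), all_legacy_printable(sample))
```

Note that NBSP (U+00A0) sits just below the Latin-1 range as written, which is consistent with treating it as a character to be mapped rather than kept as is.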

2: For more semantic reasons, HTTP authentication will be a vehicle
     for many different kinds of existing application frameworks, including
     IM, Web mail, social networks, and others.
     It should be able to accept all kinds of "user name" formats,
     for example a simple "user ID" (yoiwa), a user's "name" (Yutaka OIWA),
     a mail address ([email protected]),
     social ID formats (@yoiwa or =yoiwa),

Here again, I think that "saslprepbis" handles those.

and many others.

Here we might have some disagreement.

If we think that http-auth needs to allow just about any string as a "userid" (to use the term from RFC 2617 - I don't want to call it a username), then even a PRECIS profile as loose as draft-ietf-precis-nickname is too strict for your purposes.

However, I question whether those purposes are justified.

Do we really expect that an actual userid in http-auth might be any of the following?

"Y                     u            taka  O   i    w     a      "
"♖♘♗♕♔♗♘♖" (i.e., the back line of white chess pieces)
"mycatisa  bby" (where those spaces are actually a tab)

and so on.

Do we have evidence from existing applications that we need to support strings of characters like those as userids in HTTP authentication? Or are we being way too liberal in what we accept?

     Unlike SASL or XMPP, which have their own semantics in their frameworks,
     the authentication names in HTTP must be semantics-free,
     unstructured strings, to which each application that uses Web/HTTP
     can later add meaningful semantics.

     We are unlikely to be able to collect all possible use cases of
     IDs which are to be used with HTTP (including future uses) and
     then take the union of them,
     so instead we're defining a "grandfather" ID notation,
     expecting that all ID string use cases are likely to be subsets of it.

My concern is that this "grandfather set" is every Unicode character, and if we allow that then we're really not providing any kind of helpful guidance to application developers.

One fundamental assumption underlying the PRECIS work is that it is our responsibility as internationalization experts to prevent application developers from shooting themselves in the foot. In particular, the IdentifierClass in the PRECIS framework tries to provide a safe subset of characters, and the username construct in "saslprepbis" profiles the IdentifierClass so that application developers can avoid trouble. As far as I can see, what you are proposing would invite such trouble, and I'm not comfortable with that.
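For readers less familiar with the framework, the following is a rough Python approximation of the IdentifierClass idea: printable ASCII other than SPACE, plus characters whose Unicode general category is one of the "LetterDigits" categories (Ll, Lu, Lo, Nd, Lm, Mn, Mc). It is a hypothetical sketch that ignores several real rules (compatibility decompositions, contextual rules, exceptions, old Hangul jamo, unassigned code points), so it is not a conforming implementation.

```python
import unicodedata

# Hypothetical, rough approximation of the PRECIS IdentifierClass.
# Ignored on purpose: HasCompat, CONTEXTJ/CONTEXTO, exceptions,
# old Hangul jamo and unassigned code points.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def approx_identifier_char(ch: str) -> bool:
    """Accept printable ASCII except SPACE, or a LetterDigits character."""
    cp = ord(ch)
    if 0x21 <= cp <= 0x7E:  # ASCII7: U+0021..U+007E, so SPACE is excluded
        return True
    # Spaces (Zs), symbols such as chess pieces (So) and controls such as TAB (Cc)
    # are not in LETTER_DIGITS and are therefore rejected here.
    return unicodedata.category(ch) in LETTER_DIGITS


def approx_identifier_ok(s: str) -> bool:
    """True if s is non-empty and every character passes the approximation."""
    return bool(s) and all(approx_identifier_char(ch) for ch in s)


if __name__ == "__main__":
    for sample in ["yoiwa", "@yoiwa", "Yutaka OIWA", "\u2656\u2658\u2657", "mycatisa\tbby"]:
        print(repr(sample), approx_identifier_ok(sample))
```

Under this approximation, the chess-piece string and the tab-containing string from the examples above are both rejected. Plain "Yutaka OIWA" is also rejected because SPACE is not in the class itself; how a username profile treats spaces is a separate question.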

     At the same time, defining it as just "UTF-8" would cause user confusion
     and an interoperability mess around possibly "visually identical" strings,
     so we must take care of that side of string preparation with PRECIS.
     For example, NBSP and other spaces should be replaced.

Hmm. So you are saying that if a user inputs the following string:

"Iam cey"

Then the non-ASCII space in that string would be replaced with U+0020 (SPACE) as follows:

"Iam cey"
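One possible reading of "NBSP and other spaces should be replaced" is to map every character in the Unicode "Zs" (space separator) general category to U+0020. The Python sketch below assumes that reading; the function name is hypothetical, and it deliberately leaves tabs and other control characters alone.

```python
import unicodedata


def map_non_ascii_spaces(s: str) -> str:
    """Map every space separator (general category Zs) to U+0020 SPACE.

    Assumption: "NBSP and other spaces" is read as the Zs category
    (e.g. U+00A0, U+2000..U+200A, U+3000). TAB and line breaks are
    control characters (Cc), not Zs, so they pass through unchanged.
    """
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in s)


if __name__ == "__main__":
    print(repr(map_non_ascii_spaces("Iam\u00a0cey")))      # NBSP becomes SPACE
    print(repr(map_non_ascii_spaces("a\u3000b\u2003c")))   # ideographic and em space
    print(repr(map_non_ascii_spaces("mycatisa\tbby")))     # TAB is left as is
```

This mapping alone does not collapse runs of spaces or trim leading and trailing ones, so a string like the first example userid above would still keep all of its internal spaces.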

"Just UTF-8" is just as useless for internationalization as "just TLS" is for security. I definitely agree that we need something more than "just UTF-8", which is why we've put so much work into PRECIS. Although we cannot solve the problem of confusable characters, we can define some string classes that "first do no harm" (is there a Hippocratic Oath for i18n?). So far, I am not convinced that what you are proposing does no harm; in fact, I think it is actively harmful to allow such a wide range of Unicode characters into userids.

Peter

_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
