Dear Peter,
I have a side question. Looking at the saslprepbis spec:
    username   = userpart [1*(1*SP userpart)]
                 / userpart '@' domainpart
    userpart   = 1*(idpoint)
    domainpart = IP-literal / IPv4address / ifqdn
    ifqdn      = 1*1023(domainpoint)
Is the set of strings generated by "userpart '@' domainpart"
a proper subset of the set generated by the userpart rule?
If so, I would be more than 80% happy with just removing the
redundant second line from the username rule.
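
To make the subset question concrete, here is a rough Python sketch.
It is only an illustration: is_idpoint() is a hypothetical stand-in that
approximates the IdentifierClass as "ASCII U+0021-U+007E plus letters,
digits and marks", not the real PRECIS tables, and is_userpart() is my
own helper name.

```python
import unicodedata

# Assumption: a crude approximation of the PRECIS IdentifierClass.
# The real derived-property tables have exceptions this ignores.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_idpoint(ch: str) -> bool:
    """Return True if ch is allowed by the approximated IdentifierClass."""
    if 0x21 <= ord(ch) <= 0x7E:  # ASCII printable, excluding SP
        return True
    return unicodedata.category(ch) in LETTER_DIGIT_CATEGORIES


def is_userpart(s: str) -> bool:
    """userpart = 1*(idpoint): non-empty, every code point an idpoint."""
    return len(s) > 0 and all(is_idpoint(ch) for ch in s)


# Strings of the form userpart '@' domainpart with ASCII-only domains.
for candidate in ("user@example.com",
                  "user@[2001:db8::1]",
                  "user@internal-domain@external-domain"):
    print(candidate, is_userpart(candidate))
```

Under this approximation, the '@' and every character of an ASCII
domainpart are already idpoints, so such strings match userpart alone.
Whether that also holds for every domainpoint would need a check against
the real tables.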
We (on the HTTP side) do not want any rule that may
even *seem* to treat any character (here, the @ mark)
differently from the others. That is up to each application's own definition.
We're aware of a use case in which
the application defines meaningful semantics
for strings with two @ marks.
(A user@domain style identifier used for roaming access
has another @provider-domain suffix appended,
so the result is user@internal-domain@external-domain.)
(Exception: the colon is special in HTTP Basic and Digest,
but I see that as a known technical issue of those schemes, and it
should not be generalized to every HTTP authentication scheme.)
I have to (and will) dig into the original Unicode tables for
more analysis, but what I intuitively want seems to be
something between IdentifierClass and FreeFormClass.
My answers to Peter's questions:
> "Y u taka O i w a "
Current HTTP allows it, and for the most part we don't need to reject it.
(And saslprepbis's username actually allows it, except for the trailing space.)
> "♖♘♗♕♔♗♘♖" (i.e., the back line of white chess pieces)
I have a 50-50 feeling on this, leaning a little toward accepting it.
It's really weird, and *I* will never use it, but it's still harmless, I think.
Someone may be using this as a funny identifier.
To reject it, I think I would have to convince HTTP people that it's harmful.
> "mycatisa bby" (where those spaces are actually a tab)
I think this should be either rejected or replaced with a single space.
As supporting evidence: most text input fields in modern applications accept
the first two, but for the last case some applications reject TAB.
At the very least, almost all applications explicitly filter out CR and LF.
I think we can declare that "CR, LF, TAB, SHY and NBSP are harmful,
so we should handle them as a part of I18N string handling".
That is the purpose of PRECIS, I agree.
(And that's why I am strongly opposed to "just UTF-8" in the HTTPAUTH WG.)
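
As a rough illustration of the handling I have in mind, here is a small
Python sketch. It is not a PRECIS profile. The function name
prepare_userid and the exact policy (reject CR, LF and SHY; map TAB and
any Unicode space separator, including NBSP, to U+0020) are assumptions
made only for this example. TAB could just as well be rejected.

```python
import unicodedata

# Assumption for this sketch: these code points are rejected outright.
REJECTED = {"\r", "\n", "\u00ad"}  # CR, LF, SHY (soft hyphen)


def prepare_userid(s: str) -> str:
    """Map TAB and space-like characters to U+0020; reject CR, LF, SHY."""
    out = []
    for ch in s:
        if ch in REJECTED:
            raise ValueError(f"disallowed code point U+{ord(ch):04X}")
        if ch == "\t" or unicodedata.category(ch) == "Zs":
            # Category Zs covers SPACE, NBSP and the other space separators.
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


print(prepare_userid("Iam\u00a0cey"))    # NBSP becomes a plain space
print(prepare_userid("mycatisa\tbby"))   # TAB becomes a plain space
print(prepare_userid("♖♘♗♕♔♗♘♖"))        # chess pieces pass through unchanged
```

This leaves all other characters alone, so the first two of Peter's
examples would be accepted as they are and the third would be normalized.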
2014-03-25 12:36 GMT+09:00 Peter Saint-Andre <[email protected]>:
> On 3/14/14, 8:48 PM, Yutaka OIWA wrote:
>>
>> Dear Peter and all PRECIS related members,
>>
>> Thank you very much, and I'm really sorry that our definition
>> had a serious mistake which have confused many of you.
>> I'll talk with co-authors (especially, Nemoto-san) again and
>> revise the document as soon as possible,
>> to restart the next steps as fast as possible.
>>
>> # That's why our specification has mentioned about
>> # the mapping for spaces, which are not included by current definition.
>>
>> When I talked with Peter and Julian, we mentioned about
>> another character classes during the talk, and at that time
>> I should be noticed that I possibly referred the incorrect class.
>>
>>> A wider question: why do you think that the definition of an HTTP
>>> username needs to be so loose? Do you define things this way to be
>>> backward compatible with existing implementations, or do you really
>>> think that this is a best practice? I'm truly curious. (And I wonder
>>> if we even want to call this construct a "username"...)
>>
>> I think the reasons are both.
>>
>> 1: For backward compatibility, we need to keep all ASCII "printable"
>> (U+0020 - U+007E, including SP) characters as is, as well as
>> Latin-1 printable (U+00A1 - U+00FF, except SHY) be independent.
>
>
> I think "saslprepbis" does this.
>
>
>> 2: For more semantic reasons, HTTP authentication will be a vehicle
>> for many different kinds of existing application frameworks,
>> including
>> IMs, Web mails, social network, and others.
>> It should be able to accept all kinds of "user name" formats,
>> for example a simple "user ID" (yoiwa), user "Name" (Yutaka OIWA),
>> a mail address ([email protected]),
>> Social ID formats (@yoiwa or =yoiwa),
>
>
> Here again, I think that "saslprepbis" handles those.
>
>> and many others.
>
>
> Here we might have some disagreement.
>
> If we think that http-auth needs to allow just about any string as a
> "userid" (to use the term from RFC 2617 - I don't want to call it a
> username), then even a PRECIS profile as loose as draft-ietf-precis-nickname
> is too strict for your purposes.
>
> However, I question whether those purposes are justified.
>
> Do we really expect that an actual userid in http-auth might be any of the
> following?
>
> "Y u taka O i w a "
> "♖♘♗♕♔♗♘♖" (i.e., the back line of white chess pieces)
> "mycatisa bby" (where those spaces are actually a tab)
>
> and so on.
>
> Do we have evidence from existing applications that we need to support
> strings of characters like those as userids in HTTP authentication? Or are
> we being way too liberal in what we accept?
>
>
>> Unlike SASL or XMPP which have its own semantics in framework,
>> the authentication names in HTTP must be semanticless,
>> unstructured strings, which can later be added a meaningful
>> semantics for each application which uses Web/HTTP.
>>
>> We are not likely to correct all possible use cases of
>> IDs which are to be used with HTTP (including future uses) and
>> then take a union set of these,
>> so instead we're defining a "grand-father" ID notations,
>> expecting that all ID string use-cases are likely to be subsets of
>> it.
>
>
> My concern is that this "grandfather set" is every Unicode character, and if
> we allow that then we're really not providing any kind of helpful guidance
> to application developers.
>
> One fundamental assumption underlying the PRECIS work is that it is our
> responsibility as internationalization experts to prevent application
> developers from shooting themselves in the foot. In particular, the
> IdentifierClass in the PRECIS framework tries to provide a safe subset of
> characters, and the username construct in "saslprepbis" profiles the
> IdentifierClass so that application developers can avoid trouble. As far as
> I can see, what you are proposing would invite such trouble, and I'm not
> comfortable with that.
>
>
>> At the same time, defining it just a "UTF-8" makes users' confusion
>> and inter-operability mess about possible "visiblly-same" strings,
>> so we must care about that side of string preparation with PRECIS.
>> For example, NBSP and other spaces should be replaced.
>
>
> Hmm. So you are saying that if a user inputs the following string:
>
> "Iam cey"
>
> Then " " would be replaced with U+0020 (SPACE) as follows:
>
> "Iam cey"
>
> "Just UTF-8" is just as useless for internationalization as "just TLS" is
> for security. I definitely agree that we need something more than "just
> UTF-8", which is why we've put so much work into PRECIS. Although we cannot
> solve the problem of confusable characters, we can define some string
> classes that "first do no harm" (is there a Hippocratic Oath for i18n?). So
> far, I do not think that what you are proposing does no harm, in fact I
> think it is actively harmful to allow such a wide range of Unicode
> characters into userids.
>
> Peter
>
--
Yutaka OIWA, Ph.D. Leader, System Life-cycle Research Group
Research Institute for Secure Systems (RISEC)
National Institute of Advanced Industrial Science and Technology (AIST)
Mail addresses: <[email protected]>, <[email protected]>
OpenPGP: id[440546B5] fp[7C9F 723A 7559 3246 229D 3139 8677 9BD2 4405 46B5]