On 10/15/13 7:32 AM, Florian Zeitz wrote:
> On 15.10.2013 14:45, Peter Saint-Andre wrote:
>> On 10/15/13 4:16 AM, Florian Zeitz wrote:
>>> On 14.10.2013 17:36, Peter Saint-Andre wrote:
>>>> On 10/12/13 5:25 AM, Florian Zeitz wrote:
>>>>> On 12.10.2013 04:33, Peter Saint-Andre wrote:
>>>>>> On 10/11/2013 07:39 PM, Florian Zeitz wrote:
>>>>>>> On 09.10.2013 05:20, Peter Saint-Andre wrote:
>>>>> What I'm trying to avoid here is a certain ambiguity I think we have
>>>>> now. To give an example: Text we have in 6122bis now says «MUST consist
>>>>> only of Unicode code points that conform to the "FreeformClass" base
>>>>> string class». 
>>>>
>>>> Ah, I see your point. I think we'll need to adjust the text in all of
>>>> the documents that use the framework.
>>>>
>>>> So for instance, currently we say:
>>>>
>>>>    A resourcepart MUST consist only of Unicode code points that conform
>>>>    to the "FreeformClass" base string class defined in
>>>>    [I-D.ietf-precis-framework].  (Note that there is no XMPP-specific
>>>>    subclass for resourceparts.)
>>>>
>>>>    The normalization and mapping rules for the resourcepart of a JID are
>>>>    as follows, where the operations specified MUST be completed in the
>>>>    order shown:
>>>>
>>>>    1.  Fullwidth and halfwidth characters MAY be mapped to their
>>>>        decomposition equivalents.
>>>>
>>>>    [etc.]
>>>>
>>>> I think we'll need to change that to say something like this:
>>>>
>>>>    A resourcepart MUST consist only of Unicode code points that conform
>>>>    to the "JIDresourceFreeformClass" profile, which is defined as
>>>>    follows:
>>>>
>>>>    1. The base string class is the "FreeformClass" class specified in
>>>>    [I-D.ietf-precis-framework]
>>>>
>>>>    2.  Fullwidth and halfwidth characters MAY be mapped to their
>>>>        decomposition equivalents.
>>>>
>>>> That is, the base string class can immediately limit the characters that
>>>> you even consider. For the "JIDlocalIdentifierClass" profile (or
>>>> whatever we call it), if a character is disallowed by the
>>>> IdentifierClass then you don't need to consider it further, but if it's
>>>> allowed then you need to complete further processing (such as the
>>>> relevant mapping operations).
>>>>
>>> I'm not sure that is the right approach, see below.
>>> It also occurs to me that the framework draft currently calls this
>>> "Usage" instead of "Profile", or is that yet another concept?
>>
>> Based on other last call feedback (see exchanges with Martin), I've
>> provisionally replaced the concepts of subclasses and usages with the
>> concept of a profile. This has (IMHO) simplified matters.
>>
>>>>> For argument's sake, let's pretend it also specified NFKC.
>>>>> Does "U+1D7D0 MATHEMATICAL BOLD DIGIT TWO" conform to "FreeformClass" in
>>>>> this case?
>>>>>
>>>>> It's either "Yes, that is clearly FREE_PVAL", or "No, that must be
>>>>> normalized to U+0032 DIGIT TWO", depending on your reading.
>>>>
>>>> Here again I think conformance applies only at the level of the profile.
>>>> The base string classes just limit the universe of characters you need
>>>> to consider.
>>>>
>>>>>>> This may even already be the intent, but as I said, a profile can
>>>>>>> easily be defined such that a string matches these criteria but can
>>>>>>> never be produced after the specified normalization and all mappings
>>>>>>> have been applied.
>>>>>>> At any rate I think we need clearer text about the intention here,
>>>>>>> answering the question: "When is a string allowed by a profile?". I
>>>>>>> personally cannot really tell from the draft right now.
>>>>>>
>>>>>> In part, I don't think it is the responsibility of this specification to
>>>>>> answer that question, other than to make it clear that you need to check
>>>>>> a string against the full set of rules defined by a profile. I do think
>>>>>> it would be helpful to provide some examples, although I think they
>>>>>> probably belong in the various specs that define the profiles (so far
>>>>>> that would be nickname, saslprepbis, and 6122bis).
>>>>>>
>>>>> I think I agree. And I think that is why I suggested leaving
>>>>> normalizations and mappings out of the classes. We want to tell people
>>>>> that they have to normalize, and we want them to think about mappings.
>>>>> But the exact order of those operations, who needs to perform them, what
>>>>> is valid in protocol slots, etc. is their business.
>>>>>
>>>>> And what that means in particular (to me) is that a profile would tell
>>>>> you after which of the steps a string needs to be PVALID under a certain
>>>>> class. 
>>>>
>>>> s/class/profile/ (IMHO)
>>>>
>>> I think that might be where we are talking at cross purposes.
>>> My understanding is that we have an algorithm that will tell us whether
>>> a codepoint is PVALID, according to the set of codepoints a class
>>> allows. Profiles have no influence on this decision. I.e. nothing that
>>> determines whether a codepoint is PVALID may be (re)defined by a
>>> profile. This always requires a subclass.
>>
>> It all depends on what you mean by the "P" in PVALID. In practice, my
>> feeling is that you want to know if a given codepoint is allowed in,
>> say, the localpart of an XMPP address. The base class isn't always going
>> to answer that question for you.
>>
> I'm actually talking about the calculated property here, whatever it
> ends up being called. My point is we have an algorithm to calculate
> PVALID, but we don't have one for "is this an allowed string for a
> profile". Which, to be clear, I think is fine for the framework document.

OK. I agree.
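
To make the two readings of the U+1D7D0 example concrete, here is a
minimal Python sketch. It is only an illustration: toy_class_allows is a
hypothetical stand-in for the framework's per-code-point calculation
(it is NOT the real derived-property algorithm), and both function names
are invented for this example. The only library call used is the
standard unicodedata module.

```python
import unicodedata

BOLD_TWO = "\U0001D7D0"  # MATHEMATICAL BOLD DIGIT TWO


def toy_class_allows(cp: str) -> bool:
    """Hypothetical stand-in for a per-code-point class check.

    Assumption for illustration only: letters and numbers are allowed.
    The real framework derives this from Unicode properties and tables.
    """
    return unicodedata.category(cp)[0] in {"L", "N"}


def conforms_per_code_point(s: str) -> bool:
    """Reading (a): look only at the code points as they arrive."""
    return all(toy_class_allows(cp) for cp in s)


def conforms_after_nfkc(s: str) -> bool:
    """Reading (b): the string must also already be in NFKC form."""
    return unicodedata.normalize("NFKC", s) == s and conforms_per_code_point(s)


# The same input gets a different answer depending on the reading.
print(conforms_per_code_point(BOLD_TWO))  # True
print(conforms_after_nfkc(BOLD_TWO))      # False: NFKC turns it into U+0032
```

Which of the two answers is "right" is exactly what a profile has to
state; the per-code-point calculation alone cannot decide it.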

>>>>> E.g. for SASLPrepbis you would (as I understand it) split up the
>>>>> simple username into its parts, perform normalization and all mappings
>>>>> except case mapping, and then check whether the userparts are
>>>>> PVALID/ID_PVAL. Case mappings would then be performed whenever the SASL
>>>>> mechanism tells you to.
>>>>>
>>>>> If everyone is required to specify such rules (usually less complex ones
>>>>> I'd hope) I don't see the benefit of formally including normalization
>>>>> and mappings in the definition of a class (in particular if that
>>>>> definition pretty much is "that's up to you").
>>>>> This would also allow making this text generic, and not repeating it for
>>>>> IdentifierClass and FreeformClass.
>>>>
>>>> I'm sorry, I've lost track of exactly what "this text" refers to here.
>>>>
>>> I think I had this much clearer in my head than I managed to express it,
>>> sorry. Let me try again:
>>> What we have right now are classes. These do two things:
>>> a) Limit the character set
>>> b) Specify a set of properties usages need to define
>>
>> s/usages/profiles/ yes (if we accept the simplification that Martin and
>> I worked out).
>>
>>> Specifically a) encompasses Valid, Disallowed, and Unassigned, while
>>> b) encompasses Width Mapping, Additional Mappings, Case Mapping,
>>> Normalization, and Directionality
>>>
>>> Note that the text describing b) is almost completely identical for
>>> IdentifierClass and FreeformClass. This is what "this text" was referring to.
>>
>> Thanks for the clarification.
>>
>>> My suggestion is to restrict classes to include only the a) properties.
>>> We would still say that a usage needs to specify everything in b) though.
>>
>> I'm confused again, because that's what I *thought* we were doing all
>> along. However, if that wasn't clear to you then we need to improve the
>> text.
>>
> Umm... I think I'm still not making myself clear :/. And I suspect that
> is because "include" is ambiguous here.
> I do realize that we are already at the point where the classes
> explicitly specify everything in a), while everything in b) is profile
> dependent.
> What I'm saying is that IMHO b) should not be part of a class at all.

I understand. You're saying let's move everything about mapping and
normalization and directionality out of Section 3 (about string classes)
and move it to Section 4 (about profiles).

That makes a lot of sense. Sorry I was so dense. :-)

> I'd want to have classes, consisting of Valid, Disallowed and Unassigned
> sets. For these we have an algorithm determining whether a codepoint is
> within the class (i.e. PVALID or CONTEXT[JO] + rule), or not.
> Each usage (I think the term is more appropriate for this scheme) would
> then define everything from the b) set, and which class to restrict data to.

Got it.
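
A rough Python sketch of that split, under stated assumptions: the class
is nothing but a per-code-point membership test, and the profile (or
usage) names a class and supplies everything from the b) set. All names
here (Profile, prepare, toy_identifier, toy_freeform) are hypothetical,
the two toy class tests are NOT the real IdentifierClass or
FreeformClass, and applying the steps before the class check is just one
possible ordering that a profile might choose.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Callable, List

CodePointTest = Callable[[str], bool]  # a) class: is this code point allowed?
Step = Callable[[str], str]            # b) one mapping or normalization step


def toy_identifier(cp: str) -> bool:
    """Toy stand-in for an identifier-like class: letters and numbers."""
    return unicodedata.category(cp)[0] in {"L", "N"}


def toy_freeform(cp: str) -> bool:
    """Toy stand-in for a freeform-like class: also punctuation, symbols, space."""
    return cp == " " or unicodedata.category(cp)[0] in {"L", "N", "P", "S"}


@dataclass
class Profile:
    name: str
    in_class: CodePointTest                         # which class to restrict to
    steps: List[Step] = field(default_factory=list)  # profile-defined b) rules

    def prepare(self, s: str) -> str:
        # Assumed ordering: run the profile's steps, then check the class.
        for step in self.steps:
            s = step(s)
        bad = [cp for cp in s if not self.in_class(cp)]
        if bad:
            raise ValueError(f"{self.name}: disallowed code points {bad!r}")
        return s


def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


# str.casefold is used here only as a stand-in for a case mapping rule.
toy_localpart = Profile("toy-localpart", toy_identifier, [nfkc, str.casefold])
toy_resourcepart = Profile("toy-resourcepart", toy_freeform, [nfc])

print(toy_localpart.prepare("Juliet\U0001D7D0"))  # juliet2
print(toy_resourcepart.prepare("my laptop"))      # my laptop
```

Note that the class tests know nothing about mapping or normalization;
swapping the steps or their order changes only the Profile objects.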

BTW, I prefer the term profile because during WG meetings and list
discussions that's the term I've heard people naturally use. It seems
artificial to force people to call it a "usage" when their tendency is
to use the term "profile".

Peter

-- 
Peter Saint-Andre
https://stpeter.im/


_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
