On 10/15/13 7:32 AM, Florian Zeitz wrote:
> On 15.10.2013 14:45, Peter Saint-Andre wrote:
>> On 10/15/13 4:16 AM, Florian Zeitz wrote:
>>> On 14.10.2013 17:36, Peter Saint-Andre wrote:
>>>> On 10/12/13 5:25 AM, Florian Zeitz wrote:
>>>>> On 12.10.2013 04:33, Peter Saint-Andre wrote:
>>>>>> On 10/11/2013 07:39 PM, Florian Zeitz wrote:
>>>>>>> On 09.10.2013 05:20, Peter Saint-Andre wrote:

>>>>> What I'm trying to avoid here is a certain ambiguity I think we have
>>>>> now. To give an example: Text we have in 6122bis now says «MUST consist
>>>>> only of Unicode code points that conform to the "FreeformClass" base
>>>>> string class».
>>>>
>>>> Ah, I see your point. I think we'll need to adjust the text in all of
>>>> the documents that use the framework.
>>>>
>>>> So for instance, currently we say:
>>>>
>>>>    A resourcepart MUST consist only of Unicode code points that conform
>>>>    to the "FreeformClass" base string class defined in
>>>>    [I-D.ietf-precis-framework]. (Note that there is no XMPP-specific
>>>>    subclass for resourceparts.)
>>>>
>>>>    The normalization and mapping rules for the resourcepart of a JID are
>>>>    as follows, where the operations specified MUST be completed in the
>>>>    order shown:
>>>>
>>>>    1. Fullwidth and halfwidth characters MAY be mapped to their
>>>>       decomposition equivalents.
>>>>
>>>>    [etc.]
>>>>
>>>> I think we'll need to change that to say something like this:
>>>>
>>>>    A resourcepart MUST consist only of Unicode code points that conform
>>>>    to the "JIDresourceFreeformClass" profile, which is defined as
>>>>    follows:
>>>>
>>>>    1. The base string class is the "FreeformClass" class specified in
>>>>       [I-D.ietf-precis-framework]
>>>>
>>>>    2. Fullwidth and halfwidth characters MAY be mapped to their
>>>>       decomposition equivalents.
>>>>
>>>> That is, the base string class can immediately limit the characters that
>>>> you even consider.
>>>> For the "JIDlocalIdentifierClass" profile (or
>>>> whatever we call it), if a character is disallowed by the
>>>> IdentifierClass then you don't need to consider it further, but if it's
>>>> allowed then you need to complete further processing (such as the
>>>> relevant mapping operations).
>>>>
>>> I'm not sure that is the right approach, see below.
>>> It also occurs to me that the framework draft currently calls this
>>> "Usage" instead of "Profile", or is that yet another concept?
>>
>> Based on other last call feedback (see exchanges with Martin), I've
>> provisionally replaced the concepts of subclasses and usages with the
>> concept of a profile. This has (IMHO) simplified matters.
>>
>>>>> For argument's sake, let's pretend it also specified NFKC.
>>>>> Does "U+1D7D0 MATHEMATICAL BOLD DIGIT TWO" conform to "FreeformClass" in
>>>>> this case?
>>>>>
>>>>> It's either "Yes, that is clearly FREE_PVAL", or "No, that must be
>>>>> normalized to U+0032 DIGIT TWO", depending on your reading.
>>>>
>>>> Here again I think conformance applies only at the level of the profile.
>>>> The base string classes just limit the universe of characters you need
>>>> to consider.
>>>>
>>>>>>> This may even already be the intent, but as I said a profile can easily
>>>>>>> be defined such that a string matches these criteria, but can never be
>>>>>>> produced after the specified normalization and all mappings were
>>>>>>> applied.
>>>>>>> At any rate I think we need clearer text about the intention here,
>>>>>>> answering the question: "When is a string allowed by a profile?". I
>>>>>>> personally cannot really tell from the draft right now.
>>>>>>
>>>>>> In part, I don't think it is the responsibility of this specification to
>>>>>> answer that question, other than to make it clear that you need to check
>>>>>> a string against the full set of rules defined by a profile.
>>>>>> I do think it would be helpful to provide some examples, although I
>>>>>> think they probably belong in the various specs that define the
>>>>>> profiles (so far that would be nickname, saslprepbis, and 6122bis).
>>>>>>
>>>>> I think I agree. And I think that is why I suggested leaving
>>>>> normalizations and mappings out of the classes. We want to tell people
>>>>> that they have to normalize, and we want them to think about mappings.
>>>>> But the exact order of those operations, who needs to perform them, what
>>>>> is valid in protocol slots, etc. is their business.
>>>>>
>>>>> And what that means in particular (to me) is that a profile would tell
>>>>> you after which of the steps a string needs to be PVALID under a certain
>>>>> class.
>>>>
>>>> s/class/profile/ (IMHO)
>>>>
>>> I think that might be where we are talking at cross purposes.
>>> My understanding is that we have an algorithm that will tell us whether
>>> a codepoint is PVALID, according to the set of codepoints a class
>>> allows. Profiles have no influence on this decision. I.e. nothing that
>>> determines whether a codepoint is PVALID may be (re)defined by a
>>> profile. This always requires a subclass.
>>
>> It all depends on what you mean by the "P" in PVALID. In practice, my
>> feeling is that you want to know if a given codepoint is allowed in,
>> say, the localpart of an XMPP address. The base class isn't always going
>> to answer that question for you.
>>
> I'm actually talking about the calculated property here, whatever it
> ends up being named. My point is we have an algorithm to calculate
> PVALID, but we don't have one for "is this an allowed string for a
> profile". Which, to be clear, I think is fine for the framework document.

OK. I agree.

>>>>> E.g. for SASLPrepbis you would (as I understand it) split up the
>>>>> simple username into its parts, perform normalization and all mappings
>>>>> except case mapping, and then check whether the userparts are
>>>>> PVALID/ID_PVAL. Case mappings would then be performed whenever the SASL
>>>>> mechanism tells you to.
>>>>>
>>>>> If everyone is required to specify such rules (usually less complex ones
>>>>> I'd hope) I don't see the benefit of formally including normalization
>>>>> and mappings in the definition of a class (in particular if that
>>>>> definition pretty much is "that's up to you").
>>>>> This would also allow making this text generic, and not repeating it for
>>>>> IdentifierClass and FreeformClass.
>>>>
>>>> I'm sorry, I've lost track of exactly what "this text" refers to here.
>>>>
>>> I think I had this much clearer in my head than I managed to express it,
>>> sorry. Let me try again:
>>> What we have right now are classes. These do two things:
>>> a) Limit the character set
>>> b) Specify a set of properties usages need to define
>>
>> s/usages/profiles/ yes (if we accept the simplification that Martin and
>> I worked out).
>>
>>> Specifically a) encompasses Valid, Disallowed, and Unassigned, while
>>> b) encompasses Width Mapping, Additional Mappings, Case Mapping,
>>> Normalization, and Directionality
>>>
>>> Note that the text describing b) is almost completely identical for
>>> IdentifierClass and FreeformClass. This is what "this text" was
>>> referring to.
>>
>> Thanks for the clarification.
>>
>>> My suggestion is to restrict classes to include only the a) properties.
>>> We would still say that a usage needs to specify everything in b) though.
>>
>> I'm confused again, because that's what I *thought* we were doing all
>> along. However, if that wasn't clear to you then we need to improve the
>> text.
>>
> Umm... I think I'm still not making myself clear :/. And I suspect that
> is because "include" is ambiguous here.
> I do realize that we are already at the point where the classes
> explicitly specify everything in a), while everything in b) is profile
> dependent.
> What I'm saying is that IMHO b) should not be part of a class at all.

I understand. You're saying let's move everything about mapping and
normalization and directionality out of Section 3 (about string classes)
and move it to Section 4 (about profiles). That makes a lot of sense.
Sorry I was so dense. :-)

> I'd want to have classes, consisting of Valid, Disallowed and Unassigned
> sets. For these we have an algorithm determining whether a codepoint is
> within the class (i.e. PVALID or CONTEXT[JO] + rule), or not.
> Each usage (I think the term is more appropriate for this scheme), would
> then define everything from the b) set, and which class to restrict data to.

Got it. BTW, I prefer the term profile because during WG meetings and
list discussions that's the term I've heard people naturally use. It
seems artificial to force people to call it a "usage" when their
tendency is to use the term "profile".

Peter

-- 
Peter Saint-Andre
https://stpeter.im/
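
[Editorial sketch, not part of the original message.] The split discussed
in this thread (a class only answers "is this code point in the set?",
while a profile names a base class and supplies the mappings and
normalization) might be modelled roughly as below. Everything here is an
assumption made for illustration: toy_freeform_class, Profile, and
toy_resourcepart are hypothetical names, the toy class is NOT the real
FreeformClass derivation, and checking the class only after all steps have
run is just one possible ordering (the thread leaves that choice to each
profile). It uses the U+1D7D0 example raised earlier in the thread.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable, Tuple


def toy_freeform_class(ch: str) -> bool:
    """(a) A class: a per-code-point membership test with no mappings.

    Assumption: letters, numbers, and U+0020 only. The real FreeformClass
    is derived by the framework's own algorithm, not by this shortcut.
    """
    return ch == " " or unicodedata.category(ch)[0] in ("L", "N")


def nfkc(s: str) -> str:
    """One possible (b) step: Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", s)


@dataclass(frozen=True)
class Profile:
    """(b) A profile: a base class plus ordered mapping/normalization steps."""
    name: str
    base_class: Callable[[str], bool]
    steps: Tuple[Callable[[str], str], ...]

    def prepare(self, s: str) -> str:
        # Apply the profile's steps in the order the profile lists them.
        for step in self.steps:
            s = step(s)
        # Assumption: the class check happens after all steps have run.
        rejected = [f"U+{ord(c):04X}" for c in s if not self.base_class(c)]
        if rejected:
            raise ValueError(f"{self.name}: not in base class: {rejected}")
        return s


# Hypothetical profile: toy class plus NFKC, as in "let's pretend it also
# specified NFKC".
toy_resourcepart = Profile("toy-resourcepart", toy_freeform_class, (nfkc,))

bold_two = "\U0001D7D0"  # MATHEMATICAL BOLD DIGIT TWO
print(toy_freeform_class(bold_two))        # class-level answer for the raw code point
print(toy_resourcepart.prepare(bold_two))  # profile-level result after NFKC
```

In this sketch both readings from the thread coexist: the raw code point
passes the class-level test, yet the string the profile actually yields is
the normalized one, which is why "is this string allowed?" is only
answerable at the profile level.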
