On 11/2/12 8:05 PM, "Peter Saint-Andre" <[email protected]> wrote:
>Ask, and ye shall receive.
>
>###
>
>   For comparison purposes (e.g., when a chatroom server determines if
>   two nicknames are in conflict during the authorization process), an
>   application MUST treat a nickname as follows, where the operations
>   specified MUST be completed in the order shown (in particular,
>   normalization MUST be performed before all other mapping steps and
>   validity checks, consistent with [I-D.ietf-precis-framework]):
>
>   1.  The string MUST be normalized using Unicode Normalization Form KC
>       (NFKC).  Because NFKC is more "aggressive" in finding matches
>       than other normalization forms (in the terminology of Unicode, it
>       performs both canonical and compatibility decomposition before
>       recomposing code points), this rule helps to reduce the
>       possibility of confusion by increasing the number of characters
>       that would match (e.g., U+2163 ROMAN NUMERAL FOUR would match the
>       combination of U+0049 LATIN CAPITAL LETTER I and U+0056 LATIN
>       CAPITAL LETTER V).
>
>   2.  Uppercase and titlecase characters MUST be mapped to their
>       lowercase equivalents.  In applications that prohibit conflicting
>       nicknames, this rule helps to reduce the possibility of confusion
>       by ensuring that nicknames differing only by case (e.g.,
>       "stpeter" vs. "StPeter") would not be allowed in a chatroom at
>       the same time.
>
>   3.  Non-ASCII space characters from the "N" category defined under
>       Section 6.14 of [I-D.ietf-precis-framework] MUST be mapped to
>       U+0020 SPACE.
>
>   4.  Leading and trailing whitespace (i.e., one or more instances of
>       the ASCII space character at the beginning or end of a nickname)
>       MUST be removed (e.g., "stpeter " is mapped to "stpeter").
>
>   5.  Interior sequences of more than one ASCII space character MUST be
>       mapped to a single ASCII space character (e.g., "St  Peter" is
>       mapped to "St Peter").
>
>   6.  Other mappings MAY be applied, such as those defined in
>       [I-D.yoneya-precis-mappings].  (Note that mapping of fullwidth
>       and halfwidth characters to their decomposition mappings is not
>       necessary, since those mappings are performed as part of
>       normalization using NFKC.)

I think we should also add the confusable mapping (see:
http://www.unicode.org/reports/tr39/#Confusable_Detection) here.

This brings up another point, however. I think that things that are
acting as registries (such as a chat server ensuring that there aren't
two people using the same nickname) MUST NOT transmit these
liberally-normalized names. That means they probably have to keep at
least two (and maybe three) versions around:

- The précis-mapped version for equality checking
- The confusable-mapped version for uniqueness checking
- The original version (maybe, if the précis-mapped version is losing
  information)

This way, the spaces would still exist, but they aren't an attack vector.

-- 
Joe Hildebrand

_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
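
As an illustration only, steps 1 through 5 of the quoted comparison text, together with the idea of a registry that keeps the original nickname next to its mapped form, might be sketched in Python as follows. The names nickname_key and NicknameRegistry are hypothetical, and the Unicode general category Zs is used here as a stand-in for the space class the draft refers to; neither comes from the draft. The confusable-mapped version is left out because the TR39 data is not part of the standard library.

```python
import re
import unicodedata


def nickname_key(nickname: str) -> str:
    """Hypothetical comparison key for steps 1-5 of the quoted text."""
    # Step 1: NFKC normalization, performed before every other mapping.
    s = unicodedata.normalize("NFKC", nickname)
    # Step 2: map uppercase and titlecase characters to lowercase.
    s = s.lower()
    # Step 3: map remaining non-ASCII spaces to U+0020.
    # Assumption: general category Zs approximates the draft's space class.
    s = "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in s)
    # Step 4: remove leading and trailing ASCII spaces.
    s = s.strip(" ")
    # Step 5: collapse interior runs of ASCII spaces to a single space.
    return re.sub(" {2,}", " ", s)


class NicknameRegistry:
    """Hypothetical registry: compares on the key, transmits the original."""

    def __init__(self) -> None:
        self._original_by_key: dict[str, str] = {}

    def register(self, nickname: str) -> bool:
        """Return False when an equivalent nickname is already registered."""
        key = nickname_key(nickname)
        if key in self._original_by_key:
            return False
        self._original_by_key[key] = nickname
        return True

    def display_name(self, nickname: str) -> str:
        """Return the nickname as first registered, not the mapped key."""
        return self._original_by_key[nickname_key(nickname)]


registry = NicknameRegistry()
first = registry.register("St  Peter")   # accepted, stored as typed
second = registry.register("st peter")   # conflicts with the first entry
```

With this arrangement the mapped key is used only for lookups, and what a server would send to other participants is the value returned by display_name, i.e. the original string. A uniqueness check based on confusable mapping would need a second key computed from the TR39 confusables data.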
