Hello,

I don't know much about this topic, but:

1. Perhaps something like the Punycode encoding used for internationalized domain names might help?
2. I don't think keyboard mapping is a good idea. For some less computer-savvy Chinese-speaking users, handwriting is often the only way they can enter Chinese on a computer, and handwriting input does not seem to be something a keyboard mapping can describe.

2016/01/05 13:33 "Sean Leonard" <[email protected]>:
> Hi Unicode list, I am looking for feedback on this proposal, specifically
> a standard specification to map between (presumably) Unicode text strings
> and octet strings.
>
> A "password" is defined as an arbitrary octet string in a number of
> protocols and formats. This has worked for basic cases where the "password"
> is just ASCII, but there are interoperability issues when characters beyond
> ASCII get involved. My observation is that a lot of security folks get
> hand-wavy about the Unicode stuff, which is why there is little
> standardization in this area.
>
> Recently in the IETF, application/pkcs8-encrypted is proposed for the PKCS
> #8 EncryptedPrivateKeyInfo type. For purposes of our discussion, the format
> takes as input an opaque octet string (any octet in the range 00h-FFh, of
> any length), and executes various specified algorithms; the result is a
> decrypted private key. The most common algorithm is PBKDF2, but any
> algorithm can be used (including, for example, a raw symmetric encryption
> algorithm such as AES-256).
>
> PKCS #8 punts on the issue of character encoding. It says that ASCII or
> UTF-8 could be used, but doesn’t enforce anything in particular. PKCS #12
> specifies UTF-16LE with a terminating NULL character (00h 00h).
>
> In the application/pkcs8-encrypted registration, I thought it might be
> wise to allow senders and receivers to specify how input (whether user
> input or otherwise) gets mapped to the octet string, since it's not part of
> the format. Originally my concern at that time was to reflect IANA
> character sets, rather than profiles of Unicode.
>
> These days, however, most user agents are Unicode-enabled and will accept
> user input in Unicode. Therefore, issue is less about legacy character
> sets, and more about how to take the Unicode input and get a consistent and
> reasonable stream of bits out on both ends. For example: should the
> password be case folded, converted to NFKC, encoded in UTF-8 vs.
> UTF-16BE, etc.? Constraining or transforming the input would be helpful
> for disparate systems to agree on these things.
>
> Thank you,
>
> Sean
>
> PS I read the "Unicode in passwords" thread. It's relevant. An alternative
> or addition to proposing a mapping to/from Unicode, might be to have a
> "keyboard-mapping" or "keyboard-layout" parameter, that specifies the
> suggested layout of the keyboard (or input device) used for password input,
> preferably by deferring to some international standard on the topic. Such a
> parameter could influence the initial user input method, but it doesn't
> answer the question of how to turn the key presses into specific bits
> (Unicode-based or otherwise).
>
> **********
> The relevant part of the template (most recent proposal, today) is:
> ***
> Optional parameters:
>
> password-mapping:
> When the private key encryption algorithm incorporates a "password" that
> is an octet string, a mapping between user input and the octet string is
> desirable. PKCS #5 [RFC2898] Section 3 recommends "that applications follow
> some common text encoding rules"; it then suggests, but does not recommend,
> ASCII and UTF-8. This parameter specifies the charset that a recipient
> SHOULD attempt first when mapping user input to the octet string. It has
> similar semantics as the charset parameter from text/plain, except that it
> only applies to the user’s input of the password. There is no default value.
>
> The following special values are defined:
> *pkcs12 = UTF-16LE with U+0000 NULL terminator (PKCS #12-style)
> *precis = PRECIS password profile, i.e., OpaqueString from Section 4 of
> RFC 7613 (always UTF-8)
> *precis-XXX = PRECIS profile as named XXX in the IANA PRECIS Profiles
> Registry <https://www.iana.org/assignments/precis-parameters>
> *hex = hexadecimal input: the input is mapped to 0-9, A-F, and then
> converted directly to octets.
> If there are an odd number of hex digits, the
> final digit 0 is appended, or an error condition may be raised. Compare
> with Annex M.4 of IEEE 802.11-2012.
> *dtmf = The characters "0"-"9", "A"-"D", "*", and "#", which map to
> their corresponding ASCII codes. (This is to support restricted-input
> devices, i.e., telephones and telephone-like equipment.)
>
> Otherwise, the value of this parameter is a charset, from the Character
> Sets Registry <http://www.iana.org/assignments/character-sets>.
> ***
>
> The relevant part of the original template (proposed 2015-11-04) is:
> ***
> Optional parameters:
> charset: When the private key encryption algorithm incorporates a
> “password" that is an octet string, a mapping between user input and the
> octet string is desirable. PKCS #5 [RFC2898] Section 3 recommends "that
> applications follow some common text encoding rules"; it then suggests, but
> does not recommend, ASCII and UTF-8. This parameter specifies the charset
> that a recipient SHOULD attempt first when mapping user input to the octet
> string. It has the same semantics as the charset parameter from text/plain,
> except that it only applies to the user’s input of the password. There is
> no default value.
>
> ualg: When the charset is a Unicode-based encoding, this parameter is a
> space-delimited list of Unicode algorithms that a recipient SHOULD first
> attempt to apply to the Unicode user input in succession, in order to
> derive the octet string. The list of algorithm keywords is defined by
> [UNICODE]. “Tailored operations” are operations that are sensitive to
> language, which must be provided as an input parameter. If a tailored
> operation is called for, the exclamation mark followed by the [BCP47]
> language tag specifies the language. For example, "toNFD
> toNFKC_Casefold!tr" first applies Normalization Form D, followed by
> Normalization Form KC with Case Folding in the Turkish language, according
> to [UNICODE] and [UAX31].
> The default value of this parameter is empty, and
> leaves the matter of whether to normalize, case fold, or apply other
> transformations unspecified.
>
> The latest template is here:
> http://mailarchive.ietf.org/arch/msg/precis/Qil9mc5AtqxXp8OXllp0lAwYts4
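To make the quoted password-mapping values a bit more concrete, here is a rough Python sketch of how a recipient might turn user input into the octet string. It is only an illustration under some assumptions of mine, not part of the proposal: the function names (map_password, normalize_then_utf8, punycode_ascii) are made up, the "pkcs12" branch follows the wording of the quoted template literally, whitespace in hex input is ignored, Python codec names stand in for IANA charset names, and the PRECIS profiles are not implemented at all.

```python
import unicodedata

# Characters allowed by the quoted "dtmf" value.
DTMF_ALPHABET = frozenset("0123456789ABCD*#")


def normalize_then_utf8(text: str, form: str = "NFC") -> bytes:
    """Normalize first, then encode, so equivalent inputs yield equal octets."""
    return unicodedata.normalize(form, text).encode("utf-8")


def punycode_ascii(text: str) -> bytes:
    """ASCII-only rendering of the code points (the Punycode idea above).

    Punycode by itself does no normalization or case folding.
    """
    return text.encode("punycode")


def map_password(text: str, mapping: str) -> bytes:
    """Hypothetical helper: map user input to octets for a given mapping value."""
    if mapping == "pkcs12":
        # As written in the quoted template: UTF-16LE plus a U+0000 terminator.
        return text.encode("utf-16-le") + b"\x00\x00"
    if mapping == "hex":
        # Assumption: whitespace is dropped and lowercase a-f is accepted.
        digits = "".join(text.split())
        if len(digits) % 2:
            digits += "0"  # the template also allows raising an error here
        return bytes.fromhex(digits)  # ValueError on non-hex characters
    if mapping == "dtmf":
        bad = set(text) - DTMF_ALPHABET
        if bad:
            raise ValueError(f"not DTMF characters: {sorted(bad)}")
        return text.encode("ascii")
    if mapping == "precis" or mapping.startswith("precis-"):
        raise NotImplementedError("PRECIS profiles are outside this sketch")
    # Otherwise treat the value as a charset name (Python codec lookup).
    return text.encode(mapping)
```

The normalization helper is there because of the question in the quoted message: "é" typed as U+00E9 and as "e" followed by U+0301 look the same to the user but encode to different UTF-8 octets unless both ends normalize first. The same concern applies to the Punycode idea, since it only re-encodes whatever code points it is given.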

