Hi Unicode list, I am looking for feedback on this proposal, specifically on a standard way to map between text strings (presumably Unicode) and octet strings.

A "password" is defined as an arbitrary octet string in a number of protocols and formats. This has worked for basic cases where the "password" is just ASCII, but there are interoperability issues when characters beyond ASCII get involved. My observation is that a lot of security folks get hand-wavy about the Unicode stuff, which is why there is little standardization in this area.

Recently in the IETF, application/pkcs8-encrypted has been proposed for the PKCS #8 
EncryptedPrivateKeyInfo type. For purposes of our discussion, the format takes 
as input an opaque octet string (any octet in the range 00h-FFh, of any 
length), and executes various specified algorithms; the result is a decrypted 
private key. The most common algorithm is PBKDF2, but any algorithm can be used 
(including, for example, a raw symmetric encryption algorithm such as AES-256).

PKCS #8 punts on the issue of character encoding. It says that ASCII or UTF-8 
could be used, but doesn’t enforce anything in particular. PKCS #12 specifies 
a BMPString (two octets per character, big-endian, i.e., UTF-16BE restricted to 
the BMP) with a terminating NULL character (00h 00h).
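
To make the interoperability problem concrete, here is a minimal Python sketch showing that one password yields different octet strings, and therefore different derived keys, depending on the encoding chosen. The function names, the salt, and the iteration count are illustration-only assumptions and are not taken from PKCS #5, #8, or #12.

```python
import hashlib


def utf8_octets(password: str) -> bytes:
    """Encode the password as UTF-8 with no terminator."""
    return password.encode("utf-8")


def utf16_terminated_octets(password: str, byteorder: str) -> bytes:
    """Encode as UTF-16 (no BOM) in the given byte order, plus a 00h 00h terminator."""
    codec = {"little": "utf-16-le", "big": "utf-16-be"}[byteorder]
    return password.encode(codec) + b"\x00\x00"


def derive_key(octets: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256 with a made-up salt and iteration count (illustration only)."""
    return hashlib.pbkdf2_hmac("sha256", octets, b"example-salt", 1000, dklen=32)


password = "p\u00e4ssword"  # contains one non-ASCII character
candidates = {
    "UTF-8": utf8_octets(password),
    "UTF-16LE + NUL": utf16_terminated_octets(password, "little"),
    "UTF-16BE + NUL": utf16_terminated_octets(password, "big"),
}
for label, octets in candidates.items():
    # Print the octets and the start of the key each encoding would produce.
    print(label, octets.hex(), derive_key(octets).hex()[:16])
```

Each encoding feeds a different octet string into the key derivation, so a receiver that guesses the wrong one cannot decrypt the key even though the user typed the right password.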

In the application/pkcs8-encrypted registration, I thought it might be wise to 
allow senders and receivers to specify how input (whether user input or 
otherwise) gets mapped to the octet string, since it's not part of the format. 
My original concern was to reflect IANA character sets, rather than profiles 
of Unicode.

These days, however, most user agents are Unicode-enabled and will accept user 
input in Unicode. Therefore, the issue is less about legacy character sets, and 
more about how to take the Unicode input and get a consistent and reasonable 
stream of bits out on both ends. For example: should the password be case 
folded, converted to NFKC, encoded in UTF-8 vs. UTF-16BE, etc.? Constraining or 
transforming the input would be helpful for disparate systems to agree on these 
things.
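
As a rough illustration of why these choices matter, the following Python sketch produces several candidate octet strings from one user input. The function name and the sample input are hypothetical, and the "NFKC then casefold" step only approximates the Unicode toNFKC_Casefold operation (it does not remove default-ignorable code points).

```python
import unicodedata


def candidate_octets(password: str) -> dict[str, bytes]:
    """Return the octet strings produced by a few plausible mappings."""
    nfc = unicodedata.normalize("NFC", password)
    nfkc = unicodedata.normalize("NFKC", password)
    # Case fold, then renormalize, since folding can denormalize text.
    folded = unicodedata.normalize("NFKC", nfkc.casefold())
    return {
        "raw, UTF-8": password.encode("utf-8"),
        "NFC, UTF-8": nfc.encode("utf-8"),
        "NFKC, UTF-8": nfkc.encode("utf-8"),
        "NFKC + casefold, UTF-8": folded.encode("utf-8"),
        "NFC, UTF-16BE": nfc.encode("utf-16-be"),
    }


# "Cafe" + combining acute accent, a space, and the "fi" ligature U+FB01.
sample = "Cafe\u0301 \ufb01"
for label, octets in candidate_octets(sample).items():
    print(f"{label:24} {octets.hex()}")
```

All five results differ for this input, so two systems that disagree on any one of these steps will disagree on the password.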


Thank you,

Sean

PS I read the "Unicode in passwords" thread. It's relevant. An alternative, or an addition, to proposing a mapping to/from Unicode might be a "keyboard-mapping" or "keyboard-layout" parameter that specifies the suggested layout of the keyboard (or input device) used for password input, preferably by deferring to some international standard on the topic. Such a parameter could influence the initial user input method, but it doesn't answer the question of how to turn the key presses into specific bits (Unicode-based or otherwise).

**********
The relevant part of the template (most recent proposal, today) is:
***
Optional parameters:

password-mapping:
When the private key encryption algorithm incorporates a "password" that is an octet string, a mapping between user input and the octet string is desirable. PKCS #5 [RFC2898] Section 3 recommends "that applications follow some common text encoding rules"; it then suggests, but does not recommend, ASCII and UTF-8. This parameter specifies the charset that a recipient SHOULD attempt first when mapping user input to the octet string. It has semantics similar to those of the charset parameter from text/plain, except that it only applies to the user’s input of the password. There is no default value.

The following special values are defined:
*pkcs12  = UTF-16BE (BMPString) with U+0000 NULL terminator (PKCS #12-style)
*precis = PRECIS password profile, i.e., OpaqueString from Section 4 of RFC 7613 (always UTF-8)
*precis-XXX = PRECIS profile as named XXX in the IANA PRECIS Profiles Registry <https://www.iana.org/assignments/precis-parameters>
*hex = hexadecimal input: the input is mapped to 0-9, A-F, and then converted directly to octets. If there are an odd number of hex digits, the final digit 0 is appended, or an error condition may be raised. Compare with Annex M.4 of IEEE 802.11-2012.
*dtmf = The characters "0"-"9", "A"-"D", "*", and "#", which map to their corresponding ASCII codes. (This is to support restricted-input devices, i.e., telephones and telephone-like equipment.)

Otherwise, the value of this parameter is a charset, from the Character Sets Registry <http://www.iana.org/assignments/character-sets>.
***
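
A minimal Python sketch of how a recipient might act on the password-mapping value follows. It covers only the hex and dtmf values and the fall-through to a charset; pkcs12 and the PRECIS values are deliberately left unimplemented. The function name is hypothetical, treating a leading asterisk as optional is an assumption (the template does not say whether it is part of the value), and using the charset name directly as a Python codec name is also an assumption.

```python
HEX_DIGITS = set("0123456789ABCDEF")
DTMF_CHARS = set("0123456789ABCD*#")
UNIMPLEMENTED = {"pkcs12", "precis"}


def map_password(user_input: str, mapping: str) -> bytes:
    """Map user input to an octet string according to a password-mapping value."""
    key = mapping.lstrip("*").lower()
    if key == "hex":
        digits = user_input.upper()  # map a-f onto A-F
        if not set(digits) <= HEX_DIGITS:
            raise ValueError("input contains a non-hexadecimal character")
        if len(digits) % 2:
            digits += "0"  # odd count: append a final 0 (raising is also allowed)
        return bytes.fromhex(digits)
    if key == "dtmf":
        if not set(user_input) <= DTMF_CHARS:
            raise ValueError("input contains a non-DTMF character")
        return user_input.encode("ascii")
    if key in UNIMPLEMENTED or key.startswith("precis-"):
        raise NotImplementedError(f"{mapping} is not covered by this sketch")
    # Otherwise treat the value as a charset name (assumed to be a valid codec name).
    return user_input.encode(mapping)


print(map_password("abc", "*hex").hex())       # odd digit count gets a trailing 0
print(map_password("12#", "*dtmf"))
print(map_password("\u00e9", "ISO-8859-1").hex())
```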

The relevant part of the original template (proposed 2015-11-04) is:
***
Optional parameters:
charset: When the private key encryption algorithm incorporates a “password” that is an 
octet string, a mapping between user input and the octet string is desirable. PKCS #5 
[RFC2898] Section 3 recommends "that applications follow some common text encoding 
rules"; it then suggests, but does not recommend, ASCII and UTF-8. This parameter 
specifies the charset that a recipient SHOULD attempt first when mapping user input to the 
octet string. It has the same semantics as the charset parameter from text/plain, except that 
it only applies to the user’s input of the password. There is no default value.

ualg: When the charset is a Unicode-based encoding, this parameter is a space-delimited 
list of Unicode algorithms that a recipient SHOULD first attempt to apply to the Unicode 
user input in succession, in order to derive the octet string. The list of algorithm 
keywords is defined by [UNICODE]. “Tailored operations” are operations that are sensitive 
to language, which must be provided as an input parameter. If a tailored operation is 
called for, the exclamation mark followed by the [BCP47] language tag specifies the 
language. For example, "toNFD toNFKC_Casefold!tr" first applies Normalization 
Form D, followed by Normalization Form KC with Case Folding in the Turkish language, 
according to [UNICODE] and [UAX31]. The default value of this parameter is empty, and 
leaves the matter of whether to normalize, case fold, or apply other transformations 
unspecified.


The latest template is here:

http://mailarchive.ietf.org/arch/msg/precis/Qil9mc5AtqxXp8OXllp0lAwYts4

