Unicode password mapping for crypto standard

Sean Leonard Mon, 04 Jan 2016 21:34:38 -0800

Hi Unicode list, I am looking for feedback on this proposal, specifically a standard specification to map between (presumably) Unicode text strings and octet strings.

A "password" is defined as an arbitrary octet string in a number of protocols and formats. This has worked for basic cases where the "password" is just ASCII, but there are interoperability issues when characters beyond ASCII get involved. My observation is that a lot of security folks get hand-wavy about the Unicode stuff, which is why there is little standardization in this area.


Recently in the IETF, application/pkcs8-encrypted is proposed for the PKCS #8 
EncryptedPrivateKeyInfo type. For purposes of our discussion, the format takes 
as input an opaque octet string (any octet in the range 00h-FFh, of any 
length), and executes various specified algorithms; the result is a decrypted 
private key. The most common algorithm is PBKDF2, but any algorithm can be used 
(including, for example, a raw symmetric encryption algorithm such as AES-256).
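
As a rough illustration of why the mapping matters (a sketch only, not part of the registration): PBKDF2 sees nothing but octets, so the same typed password derives different keys under different encodings. The salt, iteration count, hash, and the helper name derive_key are assumptions chosen for the example.

```python
import hashlib

def derive_key(password_octets, salt, iterations=1000, length=32):
    # PBKDF2 operates on raw octets; it has no notion of characters.
    return hashlib.pbkdf2_hmac("sha256", password_octets, salt,
                               iterations, dklen=length)

salt = b"example-salt"   # hypothetical salt, for illustration only
typed = "p\u00e4ssword"  # what the user typed: "pässword"

# Same visible password, two different user-input-to-octet mappings.
key_utf8 = derive_key(typed.encode("utf-8"), salt)
key_latin1 = derive_key(typed.encode("iso-8859-1"), salt)

# The octet strings differ, so the derived keys differ as well.
print(key_utf8 != key_latin1)
```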

PKCS #8 punts on the issue of character encoding. It says that ASCII or UTF-8 
could be used, but doesn’t enforce anything in particular. PKCS #12 specifies 
UTF-16LE with a terminating NULL character (00h 00h).

In the application/pkcs8-encrypted registration, I thought it might be wise to 
allow senders and receivers to specify how input (whether user input or 
otherwise) gets mapped to the octet string, since it's not part of the format. 
Originally my concern at that time was to reflect IANA character sets, rather 
than profiles of Unicode.

These days, however, most user agents are Unicode-enabled and will accept user 
input in Unicode. Therefore, the issue is less about legacy character sets, and 
more about how to take the Unicode input and get a consistent and reasonable 
stream of bits out on both ends. For example: should the password be case 
folded, converted to NFKC, encoded in UTF-8 vs. UTF-16BE, etc.? Constraining or 
transforming the input would be helpful for disparate systems to agree on these 
things.
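
To make the question concrete, here is a minimal sketch using Python's standard unicodedata module. It shows how normalization and encoding choices change the octets produced for what looks like the same password. The function name password_octets and its parameters are hypothetical, not taken from any specification.

```python
import unicodedata

def password_octets(text, form=None, casefold=False, encoding="utf-8"):
    # Optionally normalize, optionally case fold, then encode to octets.
    if form is not None:
        text = unicodedata.normalize(form, text)
    if casefold:
        text = text.casefold()
    return text.encode(encoding)

composed = "caf\u00e9"     # "é" as a single code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent

# The two strings render identically but yield different octets.
print(password_octets(composed).hex())
print(password_octets(decomposed).hex())

# Normalizing to NFC on both ends makes the octets agree.
print(password_octets(decomposed, form="NFC").hex())

# A different encoding of the same code points changes the octets again.
print(password_octets(composed, encoding="utf-16-be").hex())
```

If both ends agree on one normalization form and one encoding, the derived octet strings match regardless of how the input method composed the characters.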


Thank you,

Sean

PS I read the "Unicode in passwords" thread. It's relevant. An alternative or addition to proposing a mapping to/from Unicode might be to have a "keyboard-mapping" or "keyboard-layout" parameter that specifies the suggested layout of the keyboard (or input device) used for password input, preferably by deferring to some international standard on the topic. Such a parameter could influence the initial user input method, but it doesn't answer the question of how to turn the key presses into specific bits (Unicode-based or otherwise).


**********
The relevant part of the template (most recent proposal, today) is:
***
Optional parameters:

password-mapping:

When the private key encryption algorithm incorporates a "password" that is an octet string, a mapping between user input and the octet string is desirable. PKCS #5 [RFC2898] Section 3 recommends "that applications follow some common text encoding rules"; it then suggests, but does not recommend, ASCII and UTF-8. This parameter specifies the charset that a recipient SHOULD attempt first when mapping user input to the octet string. It has similar semantics as the charset parameter from text/plain, except that it only applies to the user’s input of the password. There is no default value.


The following special values are defined:
*pkcs12 = UTF-16LE with U+0000 NULL terminator (PKCS #12-style)
*precis = PRECIS password profile, i.e., OpaqueString from Section 4 of RFC 7613 (always UTF-8)
*precis-XXX = PRECIS profile as named XXX in the IANA PRECIS Profiles Registry <https://www.iana.org/assignments/precis-parameters>
*hex = hexadecimal input: the input is mapped to 0-9, A-F, and then converted directly to octets. If there are an odd number of hex digits, the final digit 0 is appended, or an error condition may be raised. Compare with Annex M.4 of IEEE 802.11-2012.
*dtmf = The characters "0"-"9", "A"-"D", "*", and "#", which map to their corresponding ASCII codes. (This is to support restricted-input devices, i.e., telephones and telephone-like equipment.)
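
The hex and dtmf values above could be implemented roughly as follows. This is a sketch under assumptions: the function names map_hex and map_dtmf and the strict flag are hypothetical, lower-case hex digits are assumed to be accepted and upper-cased, and lower-case "a"-"d" are assumed to be rejected for dtmf, since the template does not say either way.

```python
HEX_DIGITS = set("0123456789ABCDEF")
DTMF_CHARS = set("0123456789ABCD*#")

def map_hex(user_input, strict=False):
    # Map the input to 0-9, A-F (assumption: lower case is upper-cased).
    digits = user_input.upper()
    if any(c not in HEX_DIGITS for c in digits):
        raise ValueError("non-hexadecimal character in input")
    if len(digits) % 2:
        # Odd length: either raise an error or append a final digit 0.
        if strict:
            raise ValueError("odd number of hex digits")
        digits += "0"
    return bytes.fromhex(digits)

def map_dtmf(user_input):
    # Each allowed character maps to its ASCII code.
    if any(c not in DTMF_CHARS for c in user_input):
        raise ValueError("character not available on a DTMF keypad")
    return user_input.encode("ascii")

print(map_hex("abc").hex())    # odd length, padded with a final 0
print(map_dtmf("12#A").hex())
```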

Otherwise, the value of this parameter is a charset, from the Character Sets Registry <http://www.iana.org/assignments/character-sets>.

***

The relevant part of the original template (proposed 2015-11-04) is:
***
Optional parameters:
charset: When the private key encryption algorithm incorporates a "password" that is an
octet string, a mapping between user input and the octet string is desirable. PKCS #5
[RFC2898] Section 3 recommends "that applications follow some common text encoding
rules"; it then suggests, but does not recommend, ASCII and UTF-8. This parameter
specifies the charset that a recipient SHOULD attempt first when mapping user input to the
octet string. It has the same semantics as the charset parameter from text/plain, except that
it only applies to the user’s input of the password. There is no default value.

ualg: When the charset is a Unicode-based encoding, this parameter is a space-delimited
list of Unicode algorithms that a recipient SHOULD first attempt to apply to the Unicode
user input in succession, in order to derive the octet string. The list of algorithm
keywords is defined by [UNICODE]. “Tailored operations” are operations that are sensitive
to language, which must be provided as an input parameter. If a tailored operation is
called for, the exclamation mark followed by the [BCP47] language tag specifies the
language. For example, "toNFD toNFKC_Casefold!tr" first applies Normalization
Form D, followed by Normalization Form KC with Case Folding in the Turkish language,
according to [UNICODE] and [UAX31]. The default value of this parameter is empty, and
leaves the matter of whether to normalize, case fold, or apply other transformations
unspecified.
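
A recipient might parse and apply a ualg value along these lines. This is a sketch only: parse_ualg, apply_ualg, and the NORMALIZERS table are hypothetical names, and only the four plain normalization forms available in Python's standard library are applied. Tailored operations and toNFKC_Casefold are parsed but deliberately left unimplemented here.

```python
import unicodedata

# Algorithm keywords this sketch knows how to apply (an assumed subset).
NORMALIZERS = {
    "toNFC": "NFC",
    "toNFD": "NFD",
    "toNFKC": "NFKC",
    "toNFKD": "NFKD",
}

def parse_ualg(value):
    # Split the space-delimited list; "!" separates an optional language tag.
    steps = []
    for token in value.split():
        name, _, lang = token.partition("!")
        steps.append((name, lang or None))
    return steps

def apply_ualg(text, value):
    # Apply each algorithm in succession; refuse anything unsupported.
    for name, lang in parse_ualg(value):
        if lang is not None or name not in NORMALIZERS:
            raise NotImplementedError("unsupported step: " + name)
        text = unicodedata.normalize(NORMALIZERS[name], text)
    return text

print(parse_ualg("toNFD toNFKC_Casefold!tr"))
print(apply_ualg("cafe\u0301", "toNFC") == "caf\u00e9")
```

An empty parameter value yields no steps, which matches the stated default of leaving the input untransformed.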

The latest template is here:

http://mailarchive.ietf.org/arch/msg/precis/Qil9mc5AtqxXp8OXllp0lAwYts4
