On 1/5/2016 8:26 AM, Markus Scherer wrote:
> I would specify that UTF-8 must be used, without mapping.
> US-ASCII is a proper subset, so need not be mentioned explicitly, nor
> distinguished in the protocol.
> Mappings would require that all implementations carry relevant data,
> and are up to date to recent versions of Unicode, or else
> previously-unassigned code points will cause failures.
> As long as a user types the same password the same way, or with IMEs
> that produce the same output, they are fine. Strange variants might
> improve password security.
Right.
In PRECIS, UTF-8 is enforced. However, as you point out, the issue is
that "strange variants" exist, as do different IMEs and different
keyboard/keystroke combinations. A case in point: 0xFF is not a valid
UTF-8 octet, yet nothing prevents the underlying technology from
accepting 0xFF, so there should be a way for a user (or process) to
force the use of specific octet strings as inputs. That is why the
"password-mapping" parameter is proposed as a hint rather than a strict
rule.
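
To make the two points above concrete, here is a minimal Python sketch.
The helper names (utf8_octets, is_valid_utf8) and the sample passwords
are made up for illustration; they are not part of PRECIS or of any
proposed syntax. It shows that US-ASCII passes through UTF-8 unchanged,
that two visually identical passwords can yield different octets when no
mapping is applied, and that an octet string containing 0xFF cannot be
produced by UTF-8 encoding at all.

```python
import unicodedata


def utf8_octets(password: str) -> bytes:
    """Encode a password as UTF-8 with no mapping or normalization."""
    return password.encode("utf-8")


def is_valid_utf8(octets: bytes) -> bool:
    """Return True if the octet string decodes cleanly as UTF-8."""
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# US-ASCII is a subset of UTF-8: the octets are identical.
ascii_pw = "hunter2"

# Same visible text, two different code point sequences
# (precomposed U+00E9 versus "e" followed by combining U+0301).
composed = "caf\u00e9"
decomposed = unicodedata.normalize("NFD", composed)

# A raw octet string that no UTF-8 encoder can emit, because it
# contains 0xFF. Only a "force these octets" input path can supply it.
raw_octets = b"\xff\x00secret"

if __name__ == "__main__":
    print(utf8_octets(ascii_pw) == ascii_pw.encode("ascii"))
    print(utf8_octets(composed).hex(), utf8_octets(decomposed).hex())
    print(is_valid_utf8(raw_octets))
```

Without a mapping step the composed and decomposed forms are simply two
different passwords, which is harmless as long as the user's input
method is consistent.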
Also, as pointed out, PKCS #8 encrypted blobs are used within PKCS #12,
which has its own Unicode mapping (based on a big-endian UTF-16
BMPString).
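
As a rough illustration of how the same password diverges under the two
conventions, here is a Python sketch. It assumes the password formatting
described in RFC 7292, Appendix B.1 (a BMPString, two octets per
character with the most significant octet first, followed by a two-octet
null terminator); the function name pkcs12_password_octets is
hypothetical, and byte order is worth double-checking against the
implementation at hand.

```python
def pkcs12_password_octets(password: str) -> bytes:
    """Format a password as a null-terminated big-endian BMPString.

    Assumption: follows the RFC 7292 Appendix B.1 convention.
    BMPString covers only the Basic Multilingual Plane, so code
    points above U+FFFF are rejected instead of being emitted as
    surrogate pairs.
    """
    if any(ord(ch) > 0xFFFF for ch in password):
        raise ValueError("BMPString cannot represent code points above U+FFFF")
    return password.encode("utf-16-be") + b"\x00\x00"


if __name__ == "__main__":
    pw = "caf\u00e9"
    # Octets a UTF-8-only consumer would feed to the KDF.
    print(pw.encode("utf-8").hex())
    # Octets a PKCS #12 consumer would feed to its KDF.
    print(pkcs12_password_octets(pw).hex())
```

The two octet strings differ even for a pure ASCII password, so a
consumer has to know which convention applies before deriving a key.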
Sean