Re: [precis] [media-types] Internet media type application/pkcs8-encrypted rev 2

Martin J. Dürst Mon, 23 Nov 2015 17:32:08 -0800

Hello Sean,

I have cc'ed the precis mailing list because some of what I'll write below is relevant for the discussion you have started there. This is also the reason why I'm keeping most of the previous context.


On 2015/11/11 00:25, Sean Leonard wrote:

Hello Martin,

On Nov 10, 2015, at 1:45 AM, Martin J. Dürst <[email protected]> wrote:

Hello Sean,

I have a few questions re. your registration below.

On 2015/11/05 14:57, Sean Leonard wrote:

Hello:

To keep this moving, trying a different thing. Please review.

Sean

*****

Type name: application

Subtype name: pkcs8-encrypted

Required parameters: N/A

Optional parameters:
charset: When the private key encryption algorithm incorporates a “password” that is an 
octet string, a mapping between user input and the octet string is desirable. PKCS #5 
[RFC2898] Section 3 recommends "that applications follow some common text encoding 
rules"; it then suggests, but does not recommend, ASCII and UTF-8. This parameter 
specifies the charset that a recipient SHOULD attempt first when mapping user input to the 
octet string. It has the same semantics as the charset parameter from text/plain, except that 
it only applies to the user’s input of the password. There is no default value.


Why does it say "This parameter specifies the charset that a recipient SHOULD 
attempt *first*" here? Can't that encoding just be specified as such?

At least for future, similar efforts, it would be extremely desirable to not 
leave character encoding open like this, but just to nail it down to UTF-8.


There seems to be something of a “cultural disconnect” between the security 
people and the I18N/UI/UX people.

The I18N/UI/UX people want well-defined interfaces that work with users “in 
their own language”, whether that language is visual, aural, tactile, symbolic, 
pictorial, etc. Invariably this involves Unicode and a large character 
repertoire such as 💩 and 大便所.

In contrast, the security people find open-ended things like Unicode to be 
anathema and would much rather restrict the range of inputs to a small and 
preferably uniformly distributed set of values. And there are good reasons for 
that, because when you introduce bias into cryptographic protocols, it turns 
out that it is a lot easier to cryptanalyze the results.

The common security protocols that I have seen that take passwords, hand-wave 
about character sets and encodings and define the password to be an octet 
string. This is great for universality but bad for human input. PBKDF2 (PKCS 
#5, on which this PKCS #8 EncryptedPrivateKeyInfo registration is based) is a 
leading example of the “octet string” approach. Ultimately, the algorithms 
don’t care what encoding it’s in, as long as they get a blob of bits (octets).

My knowledge of implementations of PKCS #5/#8/#12 suggests that there are many 
applications out there that give zero thought to the encoding issue, which 
means that they will take user input “As-Is”, i.e., in the current code page.

Note that PKCS #12 defines the input to this structure as a UTF-16LE encoded 
character string, *with* a terminating U+0000 NULL character (i.e., the octets 
00 00). This is really “weird” except of course for the fact that Microsoft 
invented it and then shipped it without too much thought, in which case, all 
weirdness can be explained.

It is a design criterion that if you extract such an EncryptedPrivateKeyInfo 
blob from a PKCS #12 file, that you should be able to process it. If you 
specify UTF-8 as the one, single, true encoding of the password for 
application/pkcs8-encrypted, that can’t happen.
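
To make the encoding issue concrete, here is a minimal sketch of how one typed password maps to different octet strings depending on the charset and on whether a terminating U+0000 is appended. The function name `password_octets` and the sample password are hypothetical and appear nowhere in the registration. The thread describes UTF-16LE; if the PKCS #12 text (RFC 7292, Appendix B.1) is read as a big-endian BMPString, passing "utf-16-be" covers that case in the same way.

```python
def password_octets(password: str, charset: str, nul_terminated: bool = False) -> bytes:
    """Map user input to the octet string handed to the key derivation function."""
    octets = password.encode(charset)
    if nul_terminated:
        # Append an encoded U+0000, e.g. the octets 00 00 for a UTF-16 charset.
        octets += "\x00".encode(charset)
    return octets


# Illustrative only: the same eight characters, three different octet strings.
for label, charset, nul in [
    ("UTF-8", "utf-8", False),
    ("ISO-8859-1", "iso-8859-1", False),
    ("UTF-16LE + NUL", "utf-16-le", True),
]:
    print(f"{label:15} {password_octets('pässword', charset, nul).hex(' ')}")
```

Since the key derivation function only sees octets, any two of these rows yield unrelated keys, which is why an unlabeled EncryptedPrivateKeyInfo may fail to decrypt even with the right password.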

That's just fine, in this specific case. I have explicitly prefaced my remark above with "At least in the future".

But if we know that the password is encoded in UTF-16LE, then why doesn't your registration just say "This parameter specifies the charset" rather than the handwavy "This parameter specifies the charset that a recipient SHOULD attempt *first*"?

Furthermore, UTF-8 is not uniformly distributed across the octet range. If your 
users are in US-English they are highly likely to have octets in 20-7E. Octets 
in 00-1F will be pretty rare. And if you choose scalar values randomly in 
Unicode (regardless of assignment), you will see a *lot* of F0-F4 but virtually 
none in 00-7F. And in spite of all this, octets F5-FF will *never* appear in 
UTF-8.
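
The distribution claims above can be checked with a short survey, sketched here under the assumption that "choosing scalar values randomly" means uniformly over all Unicode scalar values (every code point except the surrogates). The function name `utf8_octet_survey` is made up for this illustration.

```python
from collections import Counter


def utf8_octet_survey():
    """Encode every Unicode scalar value in UTF-8; tally lead octets and all octets seen."""
    lead = Counter()
    used = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not scalar values and cannot be encoded
        encoded = chr(cp).encode("utf-8")
        lead[encoded[0]] += 1
        used.update(encoded)
    return lead, used


lead, used = utf8_octet_survey()
total = sum(lead.values())
share_f0_f4 = sum(lead[b] for b in range(0xF0, 0xF5)) / total
share_ascii = sum(lead[b] for b in range(0x00, 0x80)) / total
never_used = sorted(set(range(256)) - used)

print(f"lead octet in F0-F4: {share_f0_f4:.2%}")
print(f"lead octet in 00-7F: {share_ascii:.4%}")
print("octets that never appear:", " ".join(f"{b:02X}" for b in never_used))
```

Under that assumption, roughly 94% of scalar values start with an octet in F0-F4, only 128 of them fall in 00-7F, and F5-FF never occur (nor do C0 and C1, which well-formed UTF-8 also excludes).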

It turns out that we have a pretty good source of uniformity and universality: 
characters in the US-ASCII range 20-7E. Many password input boxes will only 
accept US-ASCII and so user’s non-US-English keyboards will switch to US-ASCII 
mode for the purpose of providing input to such boxes. What matters is not so 
much the specific characters, so much as a reasonable selection of arbitrary 
buttons that a user can push *across a wide range of devices*. This ends up 
giving you 5-6 bits of entropy per user input. So the need for UTF-8 or any 
particular encoding is actually not as great as some people perceive.

My comment was specifically trying to say: If you use something more than US-ASCII, make it UTF-8. I think that's also the general policy of the IETF. As for entropy, the entropy needs to be measured over the whole string. It's clear that in UTF-8 bytes, a password in the ASCII range is shorter than a similar-length (in terms of characters) password in a non-Latin script. The entropy of each byte will be lower, but the entropy of the overall password should be about the same.
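
The per-byte versus whole-string distinction can be worked through numerically. This is an idealized sketch: it assumes each character is drawn uniformly and independently from a fixed alphabet, which real human-chosen passwords are not. The two alphabets (printable ASCII 20-7E and the Hiragana letters U+3041 to U+3096) and the function name `password_stats` are chosen only for illustration.

```python
import math


def password_stats(alphabet: str, length: int) -> dict:
    """Entropy and UTF-8 size for `length` characters drawn uniformly from `alphabet`."""
    bits_per_char = math.log2(len(alphabet))
    # Average number of UTF-8 octets per character over the alphabet.
    octets_per_char = sum(len(c.encode("utf-8")) for c in alphabet) / len(alphabet)
    bits = bits_per_char * length
    octets = octets_per_char * length
    return {"bits": bits, "octets": octets, "bits_per_octet": bits / octets}


ascii_printable = "".join(chr(c) for c in range(0x20, 0x7F))  # 95 characters, 1 octet each
hiragana = "".join(chr(c) for c in range(0x3041, 0x3097))     # 86 characters, 3 octets each

for name, alphabet in [("ASCII 20-7E", ascii_printable), ("Hiragana", hiragana)]:
    s = password_stats(alphabet, 10)
    print(f"{name:12} {s['bits']:.1f} bits in {s['octets']:.0f} octets "
          f"({s['bits_per_octet']:.2f} bits/octet)")
```

For ten characters, the total entropy of the two passwords differs by less than two bits, while the Hiragana password occupies three times as many octets and so carries about a third of the entropy per octet.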

Something that's very important for passwords is how easy they are to remember for actual people. It should be obvious that it's easier for somebody to remember a password in the language/script they use every day than in some foreign gibberish.

Overall I think that a standard such as IEEE 802.11 strikes a reasonable 
balance. (See 802.11-2012 Annex M.4, which is informative, but is pretty much 
the worldwide de-facto standard practice.) In 802.11, the input to PBKDF2 is 
between 8-63 ASCII-encoded characters in the range 20-7E, or 64 hexadecimal 
characters that convert directly to 32 octets.
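
As a rough sketch of the 802.11 input rule described above: the message only states the two accepted input forms. The PBKDF2 parameters used here (HMAC-SHA1, the SSID as salt, 4096 iterations, 256-bit output) are the commonly documented WPA/WPA2 pre-shared key derivation and are an addition beyond what the message says; the function name `wpa_psk` is hypothetical.

```python
import hashlib
import string


def wpa_psk(passphrase_or_hex: str, ssid: bytes) -> bytes:
    """Return a 32-octet pre-shared key from a passphrase or from 64 hex digits."""
    # Form 2: 64 hexadecimal characters convert directly to 32 octets.
    if len(passphrase_or_hex) == 64 and all(c in string.hexdigits for c in passphrase_or_hex):
        return bytes.fromhex(passphrase_or_hex)
    # Form 1: 8 to 63 characters, each in the ASCII range 20-7E.
    if not 8 <= len(passphrase_or_hex) <= 63:
        raise ValueError("passphrase must be 8 to 63 characters")
    if any(not 0x20 <= ord(c) <= 0x7E for c in passphrase_or_hex):
        raise ValueError("passphrase characters must be in ASCII 20-7E")
    # Assumed parameters: HMAC-SHA1, SSID as salt, 4096 iterations, 32 octets.
    return hashlib.pbkdf2_hmac("sha1", passphrase_or_hex.encode("ascii"), ssid, 4096, 32)


print(wpa_psk("correct horse battery", b"example-ssid").hex())
```

Note that in this scheme the 64-digit form bypasses PBKDF2 entirely; it is a way to enter the derived key itself, not a second password alphabet.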

So it's up to 63 ASCII characters but only up to 32 octets that may e.g. be used for UTF-8? That doesn't strike me as a reasonable balance; it puts a much stronger length limitation on some scripts outside ASCII.

***
To answer your questions directly:

Why does it say "This parameter specifies the charset that a recipient SHOULD 
attempt *first*" here?
Can't that encoding just be specified as such?



The parameter is not cryptographically protected so it is subject to tampering 
or substitution. Furthermore, a good-faith but naïve sender may put some 
encoding (e.g., UTF-8) but not have the means to verify that the encoding 
actually works, because the user did not supply the password. Basically it’s a 
good-faith first effort, but this parameter can’t meaningfully restrict what 
the sender or receiver attempt to do.
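
One way to read "SHOULD attempt first" is as an ordering over candidate encodings rather than a guarantee. The sketch below is only an illustration of that reading: `try_decrypt` is a hypothetical callback standing in for whatever PKCS #8 decryption routine the recipient uses (returning None on failure), and the fallback list is an arbitrary assumption, not something the registration defines.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple


def candidate_octets(password: str, hinted: Optional[str],
                     fallbacks: Iterable[str]) -> Iterator[Tuple[str, bytes]]:
    """Yield (charset, octets) pairs, hinted charset first, skipping duplicates."""
    seen = set()
    for charset in ([hinted] if hinted else []) + list(fallbacks):
        try:
            octets = password.encode(charset)
        except (LookupError, UnicodeEncodeError):
            continue  # unknown charset label, or password not representable in it
        if octets not in seen:
            seen.add(octets)
            yield charset, octets


def unlock(password: str, hinted: Optional[str],
           try_decrypt: Callable[[bytes], Optional[bytes]],
           fallbacks: Iterable[str] = ("utf-8", "iso-8859-1", "utf-16-le")):
    """Return (charset, key) for the first candidate that decrypts, else None."""
    for charset, octets in candidate_octets(password, hinted, fallbacks):
        key = try_decrypt(octets)
        if key is not None:
            return charset, key
    return None
```

Because a wrong or tampered charset value only changes the order of attempts, the worst case is extra failed decryptions, which matches the "good-faith first effort" framing.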

That essentially applies to any single parameter in any single media type registration, and in much more of what the IETF does. Yet this is virtually never called out, because otherwise, IETF documents would be full of such stuff and very hard to read.

Also, I am not sure how to specify the NULL suffix in the PKCS #12-extracted 
case.


That may suggest that you are going down the wrong path here.

I suppose it could just be “+0” or something.

ualg: When the charset is a Unicode-based encoding, this parameter is a space-delimited 
list of Unicode algorithms that a recipient SHOULD first attempt to apply to the Unicode 
user input in succession, in order to derive the octet string. The list of algorithm 
keywords is defined by [UNICODE]. “Tailored operations” are operations that are sensitive 
to language, which must be provided as an input parameter. If a tailored operation is 
called for, the exclamation mark followed by the [BCP47] language tag specifies the 
language. For example, "toNFD toNFKC_Casefold!tr" first applies Normalization 
Form D, followed by Normalization Form KC with Case Folding in the Turkish language, 
according to [UNICODE] and [UAX31]. The default value of this parameter is empty, and 
leaves the matter of whether to normalize, case fold, or apply other transformations 
unspecified.
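
To illustrate how a recipient might interpret the proposed ualg syntax, here is a minimal sketch that splits the value into (algorithm, language) steps and applies only the four plain normalization forms available in Python's standard `unicodedata` module. The keyword-to-form mapping and both function names are assumptions made for this example; the draft itself does not define such a table, and tailored or case-folding steps are deliberately rejected rather than approximated.

```python
import unicodedata

# Assumed mapping from ualg keywords to normalization form names.
NORMALIZERS = {"toNFC": "NFC", "toNFD": "NFD", "toNFKC": "NFKC", "toNFKD": "NFKD"}


def parse_ualg(value: str):
    """Split a space-delimited ualg value into (algorithm, language-or-None) steps."""
    steps = []
    for token in value.split():
        name, _, lang = token.partition("!")
        steps.append((name, lang or None))
    return steps


def apply_ualg(text: str, value: str) -> str:
    """Apply the listed steps in order; refuse anything this sketch does not cover."""
    for name, lang in parse_ualg(value):
        form = NORMALIZERS.get(name)
        if form is None or lang is not None:
            raise NotImplementedError(f"unsupported step: {name!r} (language {lang!r})")
        text = unicodedata.normalize(form, text)
    return text


print(parse_ualg("toNFD toNFKC_Casefold!tr"))
```

An empty value yields no steps, so the input passes through unchanged, which corresponds to the "unspecified" default in the proposal.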


"When the charset is": Is this the charset parameter, or the actual encoding of 
the password?


Admittedly this was vague. First draft. I am not sure what it should be. Per PKCS #5, the 
"Actual Encoding" is just an octet string of arbitrary length.

I would limit this to cases when the charset parameter is present and defined. 
Makes it easier.


What is a "Unicode algorithm"?


Conformance Clause D17.

Well, this, via the term "Named Unicode Algorithm", points to table 3.1 (page 93 in Unicode V 8.0).

Reading on and looking at the examples, the intent becomes clearer, at least to somebody 
who has seen things such as toNFD and toNFKC and Casefold, but I hope we can avoid 
"specification by example" here.


In fairness, “toNFD” and “toNFKC” are not defined terms. However, NFD (D118) 
and NFKC (D121) are.


Yes, but not as (Named) Unicode Algorithms.

I would rather not create Yet Another Registry of things.


I'd agree in principle.

The terms are in fact defined in [UNICODE] in the conformance clauses.


Yes, but there are many other things defined there, too.

My usability perception is that if people really want to use Unicode in their 
passwords, canonicalization is a very useful property to preserve. Case 
folding/case mapping are not so useful, as most systems like to have 
case-sensitive passwords for greater entropy, but “most systems” is not “all 
systems” so we shouldn’t preclude the use of case algorithms. As for other 
algorithms such as line breaking, character segmentation, Hangul syllable name 
generation, etc., the short answer is “I don’t know”. (These are all reasons 
why people stick with ASCII passwords, by the way.)
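
A small example of why canonicalization matters for passwords: the same visible text can arrive precomposed or decomposed depending on the input method, and the two forms encode to different octets. The sketch below uses Python's `unicodedata.normalize`; the choice of NFC as the default form and the helper name `normalized_octets` are assumptions for illustration only.

```python
import unicodedata


def normalized_octets(password: str, form: str = "NFC", charset: str = "utf-8") -> bytes:
    """Normalize the user input, then encode it to the octet string."""
    return unicodedata.normalize(form, password).encode(charset)


typed_precomposed = "caf\u00e9"   # e-acute as a single code point
typed_decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# Without normalization the octet strings differ, so the derived keys differ.
print(typed_precomposed.encode("utf-8").hex(" "))
print(typed_decomposed.encode("utf-8").hex(" "))

# With a shared normalization form, both inputs give the same octets.
print(normalized_octets(typed_precomposed) == normalized_octets(typed_decomposed))
```

Case folding could be added in the same place, but as noted above it reduces the password space, so it is left out of this sketch.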

Line breaking, character segmentation, Hangul syllable name generation, ... are completely irrelevant for passwords and passphrases.


Also, many algorithms come with options or parameters.

Also, if there is indeed a list of algorithm identifiers in [UNICODE], then it 
would be good to give a Section number. Is the intent that each and every 
algorithm named somewhere in [UNICODE] is implemented? My rough guess would be 
that the average password input implementation implements only the identity 
transform. [I would of course be positively surprised if I were wrong.]


See above; main thing that worries me is Normalization Forms.


Also, references for [UNICODE], [BCP47], and [UAX31] should be given so that 
this registration is self-contained.


Ok.

Another possibility is that this registration goes back to “rev 1”, i.e., no 
optional parameters about the character encoding at all. I think that is 
perfectly defensible. But it is not particularly i18n-friendly.

I'm not sufficiently familiar with the format and the actual use cases, but my suggestion would be to check what's actually out there in the field (such as the Microsoft UTF-16LE including final NULL), and select or create a list of parameters/algorithms (with a registry if it turns out to be needed). To that, add a way to reference PRECIS, even if it's not currently used, because that includes the expertise/recommendations of experts.

The current proposal is essentially just saying: Unicode may define some of the pieces you may want to use here, and may have labels for them, so just give it a try. I'm not at all sure this will help interoperability, except by similar accidents like the Microsoft one that you described above.


Regards,    Martin.

Regards,

Sean


Regards,   Martin.

Encoding considerations: binary

Security considerations:
Carries a cryptographic private key. See Section 6 of RFC 5958.
EncryptedPrivateKeyInfo PKCS #8 data contains exactly one private key. Poor 
password choices, weak algorithms, or improper parameter selections (e.g., 
insufficient salting rounds) will make the confidential payloads much easier to 
compromise.

Interoperability considerations:
PKCS #8 is a widely recognized format for private key information on all modern 
cryptographic stacks. The encrypted variation in this registration, 
EncryptedPrivateKeyInfo (Section 3, Encrypted Private Key Info, of RFC 5958, and Section 
6 of PKCS #8), is less widely used for exchange than PKCS #12, but it is much simpler to 
implement. The contents are exactly one private key (with optional attributes), so the 
possibility for hidden "easter eggs" in the payload such as unexpected 
certificates or miscellaneous secrets is drastically reduced.

Published specification:
PKCS #8 v1.2, November 1993 (republished as RFC 5208, May 2008); RFC 5958, 
August 2010

Applications that use this media type:
Machines, applications, browsers, Internet kiosks, and so on, that support this 
standard allow a user to import, export, and exercise a single private key.

Fragment identifier considerations: N/A

Additional information:

Deprecated alias names for this type: N/A
Magic number(s): None.
File extension(s): .p8e
Macintosh file type code(s): N/A

Person & email address to contact for further information:
Sean Leonard <dev+ietf&seantek.com>

Intended usage: COMMON

Restrictions on usage: None.

Author:
RSA, EMC, IETF

Change controller: The IETF

Provisional registration? (standards tree only): No



_______________________________________________
media-types mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/media-types


_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
