Apologies in advance for top posting and a need to be a little pedantic
about KDFs. I'll have some comments inline below as well.
KDFs aren't well understood, but people think they are. The key stream
generation part is pretty straightforward (a keyed PRBG), but the
interaction between how the key stream is generated and how that key
stream is assigned to actual cryptographic objects is not. Here's why:
1) KDFs are repeatable. Given the exact same inputs (key, mixin data)
they produce the same key stream.
2) Any change in the inputs changes ALL of the key stream.
3) Unless the overall length (L) is included as a mixin, changing the
length of the key stream will not change the prefix (e.g. if the
original call was for 10 bytes and a second call was for 20, the first
10 bytes of both calls will be exactly the same key stream data).
4) The general format of each round of key stream generation is
something like PRF(master key, mixins), where the mixins are the
concatenation of at least a label, a context and a value that
differentiates each round (a counter or the previous round's output,
for example). Including L in the mixins prevents the property described
in (3) above. Including a length for each subcomponent as a mixin
prevents the property described in (5) below.
5) Unless the length of each derived object is included in the mixins,
it is possible to move the assignment of key stream bytes between
objects. For example, both TLS (1.2 and before) and IPSEC use KDFs that
generate non-secret IV material along with secret session key
material. This is less important for software-only KDFs, as the secret
key material and the IV material are both in the JVM memory domain. It
is very important if you're trying to keep your secret key material
secure in an HSM.
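
Properties (1) through (4) can be seen in a few lines of code. What
follows is only a minimal sketch, assuming HMAC-SHA256 as the PRF and a
made-up counter-mode layout (counter, label, a zero byte, context and
optionally L in bits). The class name KdfPrefixDemo, the method derive()
and the includeLength flag are all hypothetical, and this is not a
vetted KDF construction.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Toy counter-mode KDF for illustration only (hypothetical layout).
final class KdfPrefixDemo {

    static byte[] intToBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Each round is PRF(masterKey, counter || label || 0x00 || context [|| L]).
    static byte[] derive(byte[] masterKey, String label, String context,
                         int lengthBytes, boolean includeLength) {
        try {
            Mac prf = Mac.getInstance("HmacSHA256");
            prf.init(new SecretKeySpec(masterKey, "HmacSHA256"));
            ByteArrayOutputStream stream = new ByteArrayOutputStream();
            for (int counter = 1; stream.size() < lengthBytes; counter++) {
                prf.update(intToBytes(counter));           // round differentiator
                prf.update(label.getBytes(StandardCharsets.UTF_8));
                prf.update((byte) 0x00);
                prf.update(context.getBytes(StandardCharsets.UTF_8));
                if (includeLength) {
                    prf.update(intToBytes(lengthBytes * 8)); // L, in bits
                }
                byte[] block = prf.doFinal();              // also resets the Mac
                stream.write(block, 0, block.length);
            }
            return Arrays.copyOf(stream.toByteArray(), lengthBytes);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[32]; // all-zero demo key, never do this for real

        byte[] short10 = derive(key, "demo", "ctx", 10, false);
        byte[] long20 = derive(key, "demo", "ctx", 20, false);
        System.out.println("prefix shared without L: "
                + Arrays.equals(short10, Arrays.copyOf(long20, 10)));

        byte[] short10L = derive(key, "demo", "ctx", 10, true);
        byte[] long20L = derive(key, "demo", "ctx", 20, true);
        System.out.println("prefix shared with L:    "
                + Arrays.equals(short10L, Arrays.copyOf(long20L, 10)));
    }
}
```

Without L in the mixins, the 10 byte result is a prefix of the 20 byte
result. With L mixed in, the two calls feed different input to the PRF
in every round, so the streams are unrelated.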
Example: a given TLS session may need two 256-bit AES keys and two
128-bit IVs. That is a requirement for 96 bytes of key stream
(2 x 32 + 2 x 16). We have the HSM produce this (see the PKCS11
calling sequence for example) and we get out the IVs. An attacker who
has access to the HSM (which may or may not be on the same machine as
the TLS instantiation) can call the derivation function with new output
parameters (but with the same master key and mixins) which specify
only IV material, and have the function output, as IV material, the
same key stream bytes that were previously assigned to the secret key
material. A very easy key extraction attack.
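
To make the attack concrete, here is a rough sketch of a toy "HSM". It
is an assumption-laden model, not PKCS11: the class ToyHsm, its
derive() method and the bindLengths flag are hypothetical names, the
PRF is HMAC-SHA256 in a simple counter mode, and the master key is all
zeros. The HSM keeps segments marked secret and returns segments marked
non-secret in the clear.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Toy model of an HSM-resident KDF (hypothetical, for illustration only).
final class ToyHsm {
    private final byte[] masterKey;    // never leaves the "HSM"
    private final boolean bindLengths; // mix each l[i] into the PRF input?

    ToyHsm(byte[] masterKey, boolean bindLengths) {
        this.masterKey = masterKey.clone();
        this.bindLengths = bindLengths;
    }

    // Internal key stream: HMAC(masterKey, counter || mixins [|| l[0] || l[1] ...]).
    private byte[] keyStream(byte[] mixins, int[] lengths) {
        int total = Arrays.stream(lengths).sum();
        ByteBuffer out = ByteBuffer.allocate(total + 32); // room for a final full block
        try {
            Mac prf = Mac.getInstance("HmacSHA256");
            prf.init(new SecretKeySpec(masterKey, "HmacSHA256"));
            for (int counter = 1; out.position() < total; counter++) {
                prf.update(ByteBuffer.allocate(4).putInt(counter).array());
                prf.update(mixins);
                if (bindLengths) {
                    for (int len : lengths) {
                        prf.update(ByteBuffer.allocate(4).putInt(len).array());
                    }
                }
                out.put(prf.doFinal());
            }
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
        return Arrays.copyOf(out.array(), total);
    }

    // Returns only the segments the caller marked as non-secret (e.g. IVs).
    List<byte[]> derive(byte[] mixins, int[] lengths, boolean[] secret) {
        byte[] stream = keyStream(mixins, lengths);
        List<byte[]> exported = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < lengths.length; i++) {
            if (!secret[i]) {
                exported.add(Arrays.copyOfRange(stream, offset, offset + lengths[i]));
            }
            offset += lengths[i];
        }
        return exported;
    }

    // True if an "all IV" request leaks the bytes used for the two AES keys.
    static boolean attackRecoversKeys(boolean bindLengths) {
        ToyHsm hsm = new ToyHsm(new byte[32], bindLengths);
        byte[] mixins = "label|context".getBytes(StandardCharsets.UTF_8);
        int[] legit = {32, 32, 16, 16}; // 2 AES-256 keys + 2 IVs = 96 bytes
        // Peek inside the HSM (demo only): the 64 bytes assigned to the keys.
        byte[] secretKeys = Arrays.copyOf(hsm.keyStream(mixins, legit), 64);
        // Attacker: same key and mixins, but all 96 bytes declared as IV material.
        List<byte[]> leaked = hsm.derive(mixins, new int[] {96}, new boolean[] {false});
        return Arrays.equals(secretKeys, Arrays.copyOf(leaked.get(0), 64));
    }

    public static void main(String[] args) {
        System.out.println("lengths not mixed in, keys leaked: " + attackRecoversKeys(false));
        System.out.println("lengths mixed in, keys leaked:     " + attackRecoversKeys(true));
    }
}
```

In this model the relabeling attack succeeds when the per-object
lengths are left out of the PRF input, and fails once they are mixed
in, because the attacker's request then produces a different key
stream.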
This is why TLS 1.3 only does single outputs per KDF call and makes the
length of that output a mandatory mixin. An HSM can also look at the
labels and make a determination as to whether an object needs to be
protected (key material) or can be in the clear (IV).
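
For reference, the TLS 1.3 construction looks roughly like the sketch
below. It follows my reading of HKDF-Expand-Label in RFC 8446 section
7.1 and HKDF-Expand in RFC 5869, fixed to HMAC-SHA256; the class and
method names are my own, and it has not been checked against the
published test vectors, so treat it as illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of TLS 1.3 HKDF-Expand-Label with HMAC-SHA256 (names are hypothetical).
final class Tls13ExpandLabel {

    // HkdfLabel = uint16 length || opaque label<7..255> || opaque context<0..255>
    static byte[] hkdfLabel(int length, String label, byte[] context) {
        byte[] full = ("tls13 " + label).getBytes(StandardCharsets.US_ASCII);
        if (length > 0xFFFF || full.length > 255 || context.length > 255) {
            throw new IllegalArgumentException("field too long");
        }
        ByteBuffer buf = ByteBuffer.allocate(2 + 1 + full.length + 1 + context.length);
        buf.putShort((short) length);              // the output length is a mixin
        buf.put((byte) full.length).put(full);
        buf.put((byte) context.length).put(context);
        return buf.array();
    }

    // HKDF-Expand: T(i) = HMAC(prk, T(i-1) || info || i), output truncated to length.
    static byte[] hkdfExpand(byte[] prk, byte[] info, int length) {
        if (length < 0 || length > 255 * 32) {
            throw new IllegalArgumentException("length out of range");
        }
        try {
            Mac hmac = Mac.getInstance("HmacSHA256");
            hmac.init(new SecretKeySpec(prk, "HmacSHA256"));
            byte[] okm = new byte[length];
            byte[] t = new byte[0];
            int written = 0;
            for (int i = 1; written < length; i++) {
                hmac.update(t);
                hmac.update(info);
                hmac.update((byte) i);
                t = hmac.doFinal();
                int n = Math.min(t.length, length - written);
                System.arraycopy(t, 0, okm, written, n);
                written += n;
            }
            return okm;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // One call, one output object; the label tells you what that object is.
    static byte[] expandLabel(byte[] secret, String label, byte[] context, int length) {
        return hkdfExpand(secret, hkdfLabel(length, label, context), length);
    }

    public static void main(String[] args) {
        byte[] secret = new byte[32]; // all-zero demo secret
        byte[] key16 = expandLabel(secret, "key", new byte[0], 16);
        byte[] key32 = expandLabel(secret, "key", new byte[0], 32);
        System.out.println("16 byte output is a prefix of 32 byte output: "
                + Arrays.equals(key16, Arrays.copyOf(key32, 16)));
    }
}
```

Because the length is part of the info string, asking for a different
length under the same label yields an unrelated output rather than a
longer or shorter view of the same key stream.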
Given (3) and (5), I believe that both L and l[i] (subcomponent length)
may need to be provided BEFORE any key material is produced, which
argues for supplying them during the initialization phase.
On 11/20/2017 5:12 AM, Jamil Nimeh wrote:
On 11/19/2017 12:45 PM, Michael StJohns wrote:
On 11/17/2017 1:07 PM, Adam Petcher wrote:
On 11/17/2017 10:04 AM, Michael StJohns wrote:
On 11/16/2017 2:15 PM, Adam Petcher wrote:
So it seems like they could all be supplied to init.
Alternatively, algorithm names could specify more concrete
algorithms that include the mode/PRF/etc. Can you provide more
information to explain why these existing patterns won't work in
this case?
What I need to do is provide a lifecycle diagram, but it's hard to
do in text. But basically, the .getInstance() followed by
.setParameters() builds a concrete engine while the .init()
initializes that engine with a key and the derivation parameters.
Think about a TLS 1.2 instance - the PRF is selected once, but the
KDF may be used multiple times.
This is the information I was missing. There are two sets of
parameters, and the first set should be fixed, but the second set
should be changed on each init.
I considered the mode/PRF/etc stuff but that works for things like
Cipher and Signature because most of those have exactly the same
pattern. For the KDF pattern we've got fully specified KDFs (e.g.
TLS 1.1 and before, IPSEC), almost fully specified KDFs (TLS 1.2
and HKDF need a PRF) and then the SP800 style KDFs which are
defined to be *very* flexible. So translating that into a naming
convention is going to be restrictive and may not cover all of the
possible approaches. I'd rather do it as an algorithm parameter
instead, with a given KDF implementation having a default if
nothing is specified during instantiation.
I agree that this is challenging because there is so much variety in
KDFs. But I don't think that SP 800-108 is a good example of
something that should be exposed as an algorithm in JCA, because it
is too broad. SP 800-108 is more of a toolbox that can be used to
construct KDFs. Particular specializations of SP 800-108 are widely
used, and they will get names that can be used in getInstance. For
example, HKDF-Expand is a particular specialization of SP 800-108.
So I think the existing pattern of using algorithm names to specify
concrete algorithms should work just as well in this API as it does
in the rest of JCA. Of course, more flexibility in the API is a nice
feature, but supporting this level of generality may be out of scope
for this effort.
The more I think about it the more I think you're mostly right. But
let's split this slightly as almost every KDF allows for the
specification of the PRF. So
<kdfname>/<prf> as the standard naming convention.
Or TLS13/HMAC-SHA256 and HKDF/HMAC-SHA256 (which are different
because of the mandatory inclusion of "L" in the derivation
parameters and each component object for TLS13)
Still - let's include the .setParameters() call as a failsafe, as
looking forward I can see the need for flexibility rearing its ugly
head (e.g. adding PSS parameters to RSA signatures way late in the
game.....) and it does match the pattern for Signature so it's not a
new concept. A given provider need not support the call, but it's
there if needed.
Signature appears to have setParameter because the initSign and
initVerify didn't have APS parameters in their method signatures.
Since we're talking about providing APS objects through both
getInstance() for those locked to the algorithm and init() for things
like salts, info, etc. that can be changed on successive inits it
seems like we're covered without the need for a setParameter method.
You're missing the point that setParameter() provides information used
in all future calls to the signature generation, while init() provides
data specifically for a given key stream production. In Signature you
call .setParameter() to set up the PSS parameters (or use the
defaults). Each subsequent call to initSign or initVerify uses those
PSS parameters. The equivalent of .init() in KeyDerivation is actually
the calls to .update() in Signature, as they provide the specific
information for the production of the output key stream. In fact,
setting up an HMAC signature instance and passing it the mixin data as
part of a .update() is a way of producing a key stream round.
So equivalences:
KeyDerivation.getInstance(PRF) == Signature.getInstance(HMAC)
KeyDerivation.setParameters() == Signature.setParameter()
KeyDerivation.init(key, List<Parameters>) == concatenation of the
  results of multiple calls (each key stream round based on the needed
  output length) to [Signature.initSign(Key) followed by
  Signature.update(converttobytearray(List<Parameters>)) followed by
  Signature.sign()] to produce the key stream
KeyDerivation.deriveKey() == various calls to key or object factories
  with parts of the key stream (signature).
(Hmm.. I think I forgot to get back to this comment - a KDF key should
be tagged differently than an HMAC key even though the underlying
functions are the same. It shouldn't be possible to use an HMAC
SecretKey (or an AES secret key) as a KDF master key and vice versa,
basically because an HMAC output is by definition non-secret data while
the key stream produced by a KDF is by definition secret. You want to
make sure that it's not trivial to do this.)
One additional topic for discussion: Late in the week we talked about
the current state of the API internally and one item to revisit is
where the DerivationParameterSpec objects are passed. It was brought
up by a couple people that it would be better to provide the DPS
objects pertaining to keys at the time they are called for through
deriveKey() and deriveKeys() (and possibly deriveData).
Originally we had them all grouped in a List in the init method. One
reason for needing it up there was to know the total length of
material to generate. If we can provide the total length through the
AlgorithmParameterSpec passed in via init() then things like:
Key deriveKey(DerivationParameterSpec param);
List<Key> deriveKeys(List<DerivationParameterSpec> params);
become possible. To my eyes at least it does make it more clear what
DPS you're processing since they're provided at derive time, rather
than the caller having to keep track in their heads where in the DPS
list they might be with each successive deriveKey or deriveKeys
calls. And I think we could do away with deriveKeys(int), too.
See above - the key stream is logically produced in its entirety before
any assignment of that stream is made to any cryptographic objects
because the mixins (except for the round differentiator) are the same
for each key stream production round. Simply passing in the total
length may not give you the right result if the KDF requires a per
component length (and it should, to defeat (5), or else it should only
produce a single key).
95% of the time this will be a call to produce a single key. 4% of the
time it will be a call to produce multiple keys. Only 1% of the time
will it need to intermix key, data and object productions. Anybody who
is doing that is going to write a wrapper around this class to make sure
they get the key and data production order correct for each call. So
I'm not all that bothered by keeping the complexity as a price for
keeping flexibility.
You could have a Key deriveKey(Key k, DerivationParameterSpec param) for
some things like TLS 1.3 (where you can only make a single call to
derive a key between inits), but then you'd also need at least a
byte[] deriveData(Key k, DerivationParameterSpec param) and an
Object deriveObject(Key k, DerivationParameterSpec param).
I think the most common pattern will be
.init(Key k, DerivationParameterSpec param) followed by .deriveKey() or
.init(Key k, List<DerivationParameterSpec> params) followed by .deriveKeys()
but the other intermixed patterns are just as valid.
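
As a rough illustration of the init-then-derive pattern, here is a toy
sketch. It is NOT the proposed API: ToyKeyDerivation and its nested
Spec class are hypothetical stand-ins for KeyDerivation and
DerivationParameterSpec, the master key is passed as a raw byte[], and
the PRF is fixed to HMAC-SHA256 in a made-up counter mode. The point is
only that L and every l[i] are known at init(), the whole key stream is
produced there, and deriveKey() just hands out the next slice in order.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Toy model of init() followed by deriveKey()/deriveKeys() (hypothetical API).
final class ToyKeyDerivation {

    // Hypothetical stand-in for a DerivationParameterSpec describing one key.
    static final class Spec {
        final String algorithm;
        final int lengthBytes;

        Spec(String algorithm, int lengthBytes) {
            this.algorithm = algorithm;
            this.lengthBytes = lengthBytes;
        }
    }

    private List<Spec> specs = new ArrayList<>();
    private byte[] stream = new byte[0];
    private int next;   // index of the next Spec to hand out
    private int offset; // offset of that Spec's bytes in the key stream

    // Produces the entire key stream up front, with L and each l[i] mixed in.
    void init(byte[] masterKey, byte[] label, List<Spec> params) {
        int total = 0;
        for (Spec s : params) {
            total += s.lengthBytes;
        }
        ByteBuffer out = ByteBuffer.allocate(total + 32);
        try {
            Mac prf = Mac.getInstance("HmacSHA256");
            prf.init(new SecretKeySpec(masterKey, "HmacSHA256"));
            for (int counter = 1; out.position() < total; counter++) {
                prf.update(ByteBuffer.allocate(4).putInt(counter).array());
                prf.update(label);
                prf.update(ByteBuffer.allocate(4).putInt(total).array()); // L
                for (Spec s : params) {                                   // l[i]
                    prf.update(ByteBuffer.allocate(4).putInt(s.lengthBytes).array());
                }
                out.put(prf.doFinal());
            }
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
        stream = Arrays.copyOf(out.array(), total);
        specs = new ArrayList<>(params);
        next = 0;
        offset = 0;
    }

    // Hands out the next object's slice of the key stream, in init() order.
    SecretKey deriveKey() {
        if (next >= specs.size()) {
            throw new IllegalStateException("nothing left to derive; call init again");
        }
        Spec s = specs.get(next++);
        SecretKey key = new SecretKeySpec(stream, offset, s.lengthBytes, s.algorithm);
        offset += s.lengthBytes;
        return key;
    }

    // Derives every key that has not been handed out yet.
    List<SecretKey> deriveKeys() {
        List<SecretKey> keys = new ArrayList<>();
        while (next < specs.size()) {
            keys.add(deriveKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        ToyKeyDerivation kdf = new ToyKeyDerivation();
        kdf.init(new byte[32], new byte[] {'d', 'e', 'm', 'o'},
                Arrays.asList(new Spec("AES", 32), new Spec("HmacSHA256", 32)));
        for (SecretKey k : kdf.deriveKeys()) {
            System.out.println(k.getAlgorithm() + " key, " + k.getEncoded().length + " bytes");
        }
    }
}
```

With this shape the caller does have to remember the order of the list
passed to init(), which is the cost Jamil points out, but the key
stream cannot be re-sliced after the fact.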
--Jamil