Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-08 Thread David Howells
Ard Biesheuvel  wrote:

> Apparently, it is permitted for gss_krb5_cts_crypt() to do a
> kmalloc(GFP_NOFS) in the context from where gss_krb5_aes_encrypt() is
> being invoked, and so I don't see why it wouldn't be possible to
> simply kmalloc() a scatterlist[] of the appropriate size, populate it
> with all the pages, bufs and whatever else gets passed into the
> skcipher, and pass it into the skcipher in one go.

I never said it wasn't possible.  But doing a pair of order-1 allocations from
there might have a significant detrimental effect on performance - in which
case Trond and co. will say "no".

Remember: to crypt 1MiB of data on a 64-bit machine requires 2 x minimum 8KiB
scatterlist arrays.  That's assuming the pages in the middle are contiguous,
which might not be the case for a direct I/O read/write.  So for the DIO case,
it could involve an order-2 allocation (or chaining of single pages).

David



Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-08 Thread Ard Biesheuvel
On Tue, 8 Dec 2020 at 14:25, David Howells  wrote:
>
> I wonder - would it make sense to reserve two arrays of scatterlist structs
> and a mutex per CPU sufficient to map up to 1MiB of pages with each array
> while the krb5 service is in use?
>
> That way sunrpc could, say, grab the mutex, map the input and output buffers,
> do the entire crypto op in one go and then release the mutex - at least for
> big ops, small ops needn't use this service.
>
> For rxrpc/afs's use case this would probably be overkill - it's doing crypto
> on each packet, not on whole operations - but I could still make use of it
> there.
>
> However, that then limits the maximum size of an op to 1MiB, plus dangly bits
> on either side (which can be managed with chained scatterlist structs) and
> also limits the number of large simultaneous krb5 crypto ops we can do.
>

Apparently, it is permitted for gss_krb5_cts_crypt() to do a
kmalloc(GFP_NOFS) in the context from where gss_krb5_aes_encrypt() is
being invoked, and so I don't see why it wouldn't be possible to
simply kmalloc() a scatterlist[] of the appropriate size, populate it
with all the pages, bufs and whatever else gets passed into the
skcipher, and pass it into the skcipher in one go.


Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-08 Thread David Howells
David Howells  wrote:

> I wonder - would it make sense to reserve two arrays of scatterlist structs
> and a mutex per CPU sufficient to map up to 1MiB of pages with each array
> while the krb5 service is in use?

Actually, simply reserving a set per CPU is probably unnecessary.  We could,
say, set a minimum and a maximum on the reservations (say 2 -> 2*nr_cpus) and
then allocate new ones when we run out.  Then let the memory shrinker clean
them up off an lru list.

David



Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-08 Thread David Howells
I wonder - would it make sense to reserve two arrays of scatterlist structs
and a mutex per CPU sufficient to map up to 1MiB of pages with each array
while the krb5 service is in use?

That way sunrpc could, say, grab the mutex, map the input and output buffers,
do the entire crypto op in one go and then release the mutex - at least for
big ops, small ops needn't use this service.

For rxrpc/afs's use case this would probably be overkill - it's doing crypto
on each packet, not on whole operations - but I could still make use of it
there.

However, that then limits the maximum size of an op to 1MiB, plus dangly bits
on either side (which can be managed with chained scatterlist structs) and
also limits the number of large simultaneous krb5 crypto ops we can do.

David



Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-08 Thread David Howells
Ard Biesheuvel  wrote:

> > > > I wonder if it would help if the input buffer and output buffer didn't
> > > > have to correspond exactly in usage - ie. the output buffer could be
> > > > used at a slower rate than the input to allow for buffering inside the
> > > > crypto algorithm.
> > >
> > > I don't follow - how could one be used at a slower rate?
> >
> > I mean that the crypto algorithm might need to buffer the last part of the
> > input until it has a block's worth before it can write to the output.
> 
> This is what is typically handled transparently by the driver. When
> you populate a scatterlist, it doesn't matter how misaligned the
> individual elements are, the scatterlist walker will always present
> the data in chunks that the crypto algorithm can manage. This is why
> using a single scatterlist for the entire input is preferable in
> general.

Yep - but the assumption currently on the part of the callers is that they
provide the input buffer and corresponding output buffer - and that the
algorithm will transfer data from one to the other, such that the same amount
of input and output bufferage will be used.

However, if we start pushing data in progressively, this would no longer hold
true unless we also require the caller to only present in block-size chunks.

For example, if I gave the encryption function 120 bytes of data and a 120
byte output buffer, but the algorithm has a 16-byte blocksize, it will,
presumably, consume 120 bytes of input, but it can only write 112 bytes of
output at this time.  So the current interface would need to evolve to
indicate separately how much input has been consumed and how much output has
been produced - in which case it can't be handled transparently.

For krb5, it's actually worse than that, since we want to be able to
insert/remove a header and a trailer (and might need to go back and update the
header after) - but I think in the krb5 case, we need to treat the header and
trailer specially and update them after the fact in the wrapping case
(unwrapping is not a problem, since we can just cache the header).

David



Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-08 Thread Ard Biesheuvel
On Mon, 7 Dec 2020 at 15:15, David Howells  wrote:
>
> Ard Biesheuvel  wrote:
>
> > > I wonder if it would help if the input buffer and output buffer didn't
> > > have to correspond exactly in usage - ie. the output buffer could be used
> > > at a slower rate than the input to allow for buffering inside the crypto
> > > algorithm.
> > >
> >
> > I don't follow - how could one be used at a slower rate?
>
> I mean that the crypto algorithm might need to buffer the last part of the
> input until it has a block's worth before it can write to the output.
>

This is what is typically handled transparently by the driver. When
you populate a scatterlist, it doesn't matter how misaligned the
individual elements are, the scatterlist walker will always present
the data in chunks that the crypto algorithm can manage. This is why
using a single scatterlist for the entire input is preferable in
general.

> > > The hashes corresponding to the kerberos enctypes I'm supporting are:
> > >
> > > HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96.
> > >
> > > HMAC-SHA256 for aes128-cts-hmac-sha256-128
> > >
> > > HMAC-SHA384 for aes256-cts-hmac-sha384-192
> > >
> > > CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac
> > >
> > > I'm not sure you can support all of those with the instructions available.
> >
> > It depends on whether the caller can make use of the authenc()
> > pattern, which is a type of AEAD we support.
>
> Interesting.  I didn't realise AEAD was an API.
>
> > There are numerous implementations of authenc(hmac(shaXXX),cbc(aes)),
> > including h/w accelerated ones, but none that implement ciphertext
> > stealing. So that means that, even if you manage to use the AEAD layer to
> > perform both at the same time, the generic authenc() template will perform
> > the cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes,
> > respectively, which won't give you any benefit until accelerated
> > implementations turn up that perform the whole operation in one pass over
> > the input. And even then, I don't think the performance benefit will be
> > worth it.
>
> Also, the rfc8009 variants that use AES with SHA256/384 hash the ciphertext,
> not the plaintext.
>
> For the moment, it's probably not worth worrying about, then.  If I can manage
> to abstract the sunrpc bits out into a krb5 library, we can improve the
> library later.
>


Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-07 Thread David Howells
Ard Biesheuvel  wrote:

> > I wonder if it would help if the input buffer and output buffer didn't
> > have to correspond exactly in usage - ie. the output buffer could be used
> > at a slower rate than the input to allow for buffering inside the crypto
> > algorithm.
> >
> 
> I don't follow - how could one be used at a slower rate?

I mean that the crypto algorithm might need to buffer the last part of the
input until it has a block's worth before it can write to the output.

> > The hashes corresponding to the kerberos enctypes I'm supporting are:
> >
> > HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96.
> >
> > HMAC-SHA256 for aes128-cts-hmac-sha256-128
> >
> > HMAC-SHA384 for aes256-cts-hmac-sha384-192
> >
> > CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac
> >
> > I'm not sure you can support all of those with the instructions available.
>
> It depends on whether the caller can make use of the authenc()
> pattern, which is a type of AEAD we support.

Interesting.  I didn't realise AEAD was an API.

> There are numerous implementations of authenc(hmac(shaXXX),cbc(aes)),
> including h/w accelerated ones, but none that implement ciphertext
> stealing. So that means that, even if you manage to use the AEAD layer to
> perform both at the same time, the generic authenc() template will perform
> the cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes,
> respectively, which won't give you any benefit until accelerated
> implementations turn up that perform the whole operation in one pass over
> the input. And even then, I don't think the performance benefit will be
> worth it.

Also, the rfc8009 variants that use AES with SHA256/384 hash the ciphertext,
not the plaintext.

For the moment, it's probably not worth worrying about, then.  If I can manage
to abstract the sunrpc bits out into a krb5 library, we can improve the
library later.

David



Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-07 Thread Ard Biesheuvel
On Mon, 7 Dec 2020 at 13:02, David Howells  wrote:
>
> Ard Biesheuvel  wrote:
>
> > > Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB
> > > plus bits of non-contiguous pages, requiring >8K of scatterlist elements
> > > (admittedly, we can chain them, but we may have to do one or more large
> > > allocations).
> > >
> > > > However, I would recommend against it:
> > >
> > > Sorry, recommend against what?
> > >
> >
> > Recommend against the current approach of manipulating the input like
> > this and feeding it into the skcipher piecemeal.
>
> Right.  I understand the problem, but as I mentioned above, the scatterlist
> itself becomes a performance issue as it may exceed two pages in size.  Double
> that as there may need to be separate input and output scatterlists.
>

I wasn't aware that Herbert's work hadn't been merged yet. So that
means it is entirely reasonable to split the input like this and feed
the first part into a cbc(aes) skcipher and the last part into a
cts(cbc(aes)) skcipher, provided that you ensure that the last part
covers the final two blocks (one full block and one block that is
either full or partial).

With Herbert's changes, you will be able to use the same skcipher, and
pass a flag to all but the final part that more data is coming. But
for lack of that, the current approach is optimal for cases where
having to cover the entire input with a single scatterlist is
undesirable.

> > Herbert recently made some changes for MSG_MORE support in the AF_ALG
> > code, which permits a skcipher encryption to be split into several
> > invocations of the skcipher layer without the need for this complexity
> > on the side of the caller. Maybe there is a way to reuse that here.
> > Herbert?
>
> I wonder if it would help if the input buffer and output buffer didn't have to
> correspond exactly in usage - ie. the output buffer could be used at a slower
> rate than the input to allow for buffering inside the crypto algorithm.
>

I don't follow - how could one be used at a slower rate?

> > > Can you also do SHA at the same time in the same loop?
> >
> > SHA-1 or HMAC-SHA1? The latter could probably be modeled as an AEAD.
> > The former doesn't really fit the current API so we'd have to invent
> > something for it.
>
> The hashes corresponding to the kerberos enctypes I'm supporting are:
>
> HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96.
>
> HMAC-SHA256 for aes128-cts-hmac-sha256-128
>
> HMAC-SHA384 for aes256-cts-hmac-sha384-192
>
> CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac
>
> I'm not sure you can support all of those with the instructions available.
>

It depends on whether the caller can make use of the authenc()
pattern, which is a type of AEAD we support. There are numerous
implementations of authenc(hmac(shaXXX),cbc(aes)), including h/w
accelerated ones, but none that implement ciphertext stealing. So that
means that, even if you manage to use the AEAD layer to perform both
at the same time, the generic authenc() template will perform the
cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes,
respectively, which won't give you any benefit until accelerated
implementations turn up that perform the whole operation in one pass
over the input. And even then, I don't think the performance benefit
will be worth it.


Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-07 Thread David Howells
Ard Biesheuvel  wrote:

> > Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB
> > plus bits of non-contiguous pages, requiring >8K of scatterlist elements
> > (admittedly, we can chain them, but we may have to do one or more large
> > allocations).
> >
> > > However, I would recommend against it:
> >
> > Sorry, recommend against what?
> >
> 
> Recommend against the current approach of manipulating the input like
> this and feeding it into the skcipher piecemeal.

Right.  I understand the problem, but as I mentioned above, the scatterlist
itself becomes a performance issue as it may exceed two pages in size.  Double
that as there may need to be separate input and output scatterlists.

> Herbert recently made some changes for MSG_MORE support in the AF_ALG
> code, which permits a skcipher encryption to be split into several
> invocations of the skcipher layer without the need for this complexity
> on the side of the caller. Maybe there is a way to reuse that here.
> Herbert?

I wonder if it would help if the input buffer and output buffer didn't have to
correspond exactly in usage - ie. the output buffer could be used at a slower
rate than the input to allow for buffering inside the crypto algorithm.

> > Can you also do SHA at the same time in the same loop?
> 
> SHA-1 or HMAC-SHA1? The latter could probably be modeled as an AEAD.
> The former doesn't really fit the current API so we'd have to invent
> something for it.

The hashes corresponding to the kerberos enctypes I'm supporting are:

HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96.

HMAC-SHA256 for aes128-cts-hmac-sha256-128

HMAC-SHA384 for aes256-cts-hmac-sha384-192

CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac

I'm not sure you can support all of those with the instructions available.

David



Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-07 Thread David Howells
Herbert Xu  wrote:

> > Herbert recently made some changes for MSG_MORE support in the AF_ALG
> > code, which permits a skcipher encryption to be split into several
> > invocations of the skcipher layer without the need for this complexity
> > on the side of the caller. Maybe there is a way to reuse that here.
> > Herbert?
> 
> Yes this was one of the reasons I was pursuing the continuation
> work.  It should allow us to kill the special case for CTS in the
> krb5 code.
> 
> Hopefully I can get some time to restart work on this soon.

In the krb5 case, we know in advance how much data we're going to be dealing
with, if that helps.

David



Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-04 Thread Herbert Xu
On Fri, Dec 04, 2020 at 06:35:48PM +0100, Ard Biesheuvel wrote:
>
> Herbert recently made some changes for MSG_MORE support in the AF_ALG
> code, which permits a skcipher encryption to be split into several
> invocations of the skcipher layer without the need for this complexity
> on the side of the caller. Maybe there is a way to reuse that here.
> Herbert?

Yes this was one of the reasons I was pursuing the continuation
work.  It should allow us to kill the special case for CTS in the
krb5 code.

Hopefully I can get some time to restart work on this soon.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-04 Thread Theodore Y. Ts'o
On Fri, Dec 04, 2020 at 02:59:35PM +, David Howells wrote:
> Hi Chuck, Bruce,
> 
> Why is gss_krb5_crypto.c using an auxiliary cipher?  For reference, the
> gss_krb5_aes_encrypt() code looks like the attached.
> 
> From what I can tell, in AES mode, the difference between the main cipher and
> the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is
> "cts(cbc(aes))" - but they have the same key.
> 
> Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
> same as the non-CTS, except for the last two blocks, but the non-CTS one is
> more efficient.

The reason to use CTS is if you don't want to expand the size of the
cipher text to the cipher block size.  e.g., if you have a 53 byte
plaintext, and you can't afford to let the ciphertext be 56 bytes, the
cryptographic engineer will reach for CTS instead of CBC.

So that probably explains the decision to use CTS (and it's
required by the spec in any case).  As far as why CBC is being used
instead of CTS, the only reason I can think of is the one you posted.
Perhaps there was some hardware or software configuration where
cbc(aes) was hardware accelerated, and cts(cbc(aes)) would not be?

In any case, using cbc(aes) for all but the last two blocks, and using
cts(cbc(aes)) for the last two blocks, is identical to using
cts(cbc(aes)) for the whole encryption.  So the only reason to do this
in the more complex way would be for performance reasons.

 - Ted


Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-04 Thread Ard Biesheuvel
On Fri, 4 Dec 2020 at 18:19, David Howells  wrote:
>
> Ard Biesheuvel  wrote:
>
> > The tricky thing with CTS is that you have to ensure that the final
> > full and partial blocks are presented to the crypto driver as one
> > chunk, or it won't be able to perform the ciphertext stealing. This
> > might be the reason for the current approach. If the sunrpc code has
> > multiple disjoint chunks of data to encrypt, it is always better to
> > wrap it in a single scatterlist and call into the skcipher only once.
>
> Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB
> plus bits of non-contiguous pages, requiring >8K of scatterlist elements
> (admittedly, we can chain them, but we may have to do one or more large
> allocations).
>
> > However, I would recommend against it:
>
> Sorry, recommend against what?
>

Recommend against the current approach of manipulating the input like
this and feeding it into the skcipher piecemeal.

Herbert recently made some changes for MSG_MORE support in the AF_ALG
code, which permits a skcipher encryption to be split into several
invocations of the skcipher layer without the need for this complexity
on the side of the caller. Maybe there is a way to reuse that here.
Herbert?

> > at least for ARM and arm64, I
> > have already contributed SIMD based implementations that use SIMD
> > permutation instructions and overlapping loads and stores to perform
> > the ciphertext stealing, which means that there is only a single layer
> > which implements CTS+CBC+AES, and this layer can consume the entire
> > scatterlist in one go. We could easily do something similar in the
> > AES-NI driver as well.
>
> Can you point me at that in the sources?
>

arm64 has

arch/arm64/crypto/aes-glue.c
arch/arm64/crypto/aes-modes.S

where the former implements the skcipher wrapper for an implementation
of "cts(cbc(aes))"

static int cts_cbc_encrypt(struct skcipher_request *req)

walks over the src/dst scatterlist and feeds the data into the asm
helpers, one for the bulk of the input, and one for the final full and
partial blocks (or two final full blocks).

The SIMD asm helpers are

aes_cbc_encrypt
aes_cbc_decrypt
aes_cbc_cts_encrypt
aes_cbc_cts_decrypt

> Can you also do SHA at the same time in the same loop?
>

SHA-1 or HMAC-SHA1? The latter could probably be modeled as an AEAD.
The former doesn't really fit the current API so we'd have to invent
something for it.

> Note that the rfc3962 AES does the checksum over the plaintext, but rfc8009
> does it over the ciphertext.
>
> David
>


Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-04 Thread David Howells
Ard Biesheuvel  wrote:

> The tricky thing with CTS is that you have to ensure that the final
> full and partial blocks are presented to the crypto driver as one
> chunk, or it won't be able to perform the ciphertext stealing. This
> might be the reason for the current approach. If the sunrpc code has
> multiple disjoint chunks of data to encrypt, it is always better to
> wrap it in a single scatterlist and call into the skcipher only once.

Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB
plus bits of non-contiguous pages, requiring >8K of scatterlist elements
(admittedly, we can chain them, but we may have to do one or more large
allocations).

> However, I would recommend against it:

Sorry, recommend against what?

> at least for ARM and arm64, I
> have already contributed SIMD based implementations that use SIMD
> permutation instructions and overlapping loads and stores to perform
> the ciphertext stealing, which means that there is only a single layer
> which implements CTS+CBC+AES, and this layer can consume the entire
> scatterlist in one go. We could easily do something similar in the
> AES-NI driver as well.

Can you point me at that in the sources?

Can you also do SHA at the same time in the same loop?

Note that the rfc3962 AES does the checksum over the plaintext, but rfc8009
does it over the ciphertext.

David



Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-04 Thread Ard Biesheuvel
On Fri, 4 Dec 2020 at 17:52, David Howells  wrote:
>
> Bruce Fields  wrote:
>
> > OK, I guess I don't understand the question.  I haven't thought about
> > this code in at least a decade.  What's an auxiliary cipher?  Is this a
> > question about why we're implementing something, or how we're
> > implementing it?
>
> That's what the Linux sunrpc implementation calls them:
>
> struct crypto_sync_skcipher *acceptor_enc;
> struct crypto_sync_skcipher *initiator_enc;
> struct crypto_sync_skcipher *acceptor_enc_aux;
> struct crypto_sync_skcipher *initiator_enc_aux;
>
> Auxiliary ciphers aren't mentioned in rfc396{1,2} so it appears to be
> something peculiar to that implementation.
>
> So acceptor_enc and acceptor_enc_aux, for instance, are both based on the same
> key, and the implementation seems to pass the IV from one to the other.  The
> only difference is that the 'aux' cipher lacks the CTS wrapping - which only
> makes a difference for the final two blocks[*] of the encryption (or
> decryption) - and only if the data doesn't fully fill out the last block
> (ie. it needs padding in some way so that the encryption algorithm can handle
> it).
>
> [*] Encryption cipher blocks, that is.
>
> So I think its purpose is twofold:
>
>  (1) It's a way to be a bit more efficient, cutting out the CTS layer's
>  indirection and additional buffering.
>
>  (2) crypto_skcipher_encrypt() assumes that it's doing the entire crypto
>  operation in one go and will always impose the final CTS bit, so you
>  can't call it repeatedly to progress through a buffer (as
>  xdr_process_buf() would like to do) as that would corrupt the data being
>  encrypted - unless you made sure that the data was always block-size
>  aligned (in which case, there's no point using CTS).
>
> I wonder how much going through three layers of crypto modules costs.  Looking
> at how AES can be implemented using, say, Intel AES instructions, it looks like
> AES+CBC should be easy to do in a single module.  I wonder if we could have
> optimised Kerberos crypto that does the AES and the SHA together in a single
> loop.
>

The tricky thing with CTS is that you have to ensure that the final
full and partial blocks are presented to the crypto driver as one
chunk, or it won't be able to perform the ciphertext stealing. This
might be the reason for the current approach. If the sunrpc code has
multiple disjoint chunks of data to encrypt, it is always better to
wrap it in a single scatterlist and call into the skcipher only once.

However, I would recommend against it: at least for ARM and arm64, I
have already contributed SIMD based implementations that use SIMD
permutation instructions and overlapping loads and stores to perform
the ciphertext stealing, which means that there is only a single layer
which implements CTS+CBC+AES, and this layer can consume the entire
scatterlist in one go. We could easily do something similar in the
AES-NI driver as well.


Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-04 Thread David Howells
Bruce Fields  wrote:

> OK, I guess I don't understand the question.  I haven't thought about
> this code in at least a decade.  What's an auxiliary cipher?  Is this a
> question about why we're implementing something, or how we're
> implementing it?

That's what the Linux sunrpc implementation calls them:

struct crypto_sync_skcipher *acceptor_enc;
struct crypto_sync_skcipher *initiator_enc;
struct crypto_sync_skcipher *acceptor_enc_aux;
struct crypto_sync_skcipher *initiator_enc_aux;

Auxiliary ciphers aren't mentioned in rfc396{1,2} so it appears to be
something peculiar to that implementation.

So acceptor_enc and acceptor_enc_aux, for instance, are both based on the same
key, and the implementation seems to pass the IV from one to the other.  The
only difference is that the 'aux' cipher lacks the CTS wrapping - which only
makes a difference for the final two blocks[*] of the encryption (or
decryption) - and only if the data doesn't fully fill out the last block
(ie. it needs padding in some way so that the encryption algorithm can handle
it).

[*] Encryption cipher blocks, that is.

So I think its purpose is twofold:

 (1) It's a way to be a bit more efficient, cutting out the CTS layer's
 indirection and additional buffering.

 (2) crypto_skcipher_encrypt() assumes that it's doing the entire crypto
 operation in one go and will always impose the final CTS bit, so you
 can't call it repeatedly to progress through a buffer (as
 xdr_process_buf() would like to do) as that would corrupt the data being
 encrypted - unless you made sure that the data was always block-size
 aligned (in which case, there's no point using CTS).

I wonder how much going through three layers of crypto modules costs.  Looking
at how AES can be implemented using, say, Intel AES instructions, it looks like
AES+CBC should be easy to do in a single module.  I wonder if we could have
optimised Kerberos crypto that does the AES and the SHA together in a single
loop.

David



Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-04 Thread Bruce Fields
On Fri, Dec 04, 2020 at 10:46:26AM -0500, Bruce Fields wrote:
> On Fri, Dec 04, 2020 at 02:59:35PM +, David Howells wrote:
> > Hi Chuck, Bruce,
> > 
> > Why is gss_krb5_crypto.c using an auxiliary cipher?  For reference, the
> > gss_krb5_aes_encrypt() code looks like the attached.
> > 
> > From what I can tell, in AES mode, the difference between the main cipher and
> > the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is
> > "cts(cbc(aes))" - but they have the same key.
> > 
> > Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
> > same as the non-CTS, except for the last two blocks, but the non-CTS one is
> > more efficient.
> 
> CTS is cipher-text stealing, isn't it?  I think it was Kevin Coffman
> that did that, and I don't remember the history.  I thought it was
> required by some spec or peer implementation (maybe Windows?) but I
> really don't remember.  It may predate git.  I'll dig around and see
> what I can find.

Like I say, I've got no insight here, I'm just grepping through
mailboxes and stuff, but maybe some of this history's useful:

Addition of CTS mode:


https://lore.kernel.org/linux-crypto/20080220202543.3209.47410.st...@jazz.citi.umich.edu/

This rpc/krb5 code went in with 934a95aa1c9c "gss_krb5: add remaining
pieces to enable AES encryption support"; may be worth looking at that
and the series leading up to it, I see the changelogs have some RFC
references that might explain why it's using the crypto it is.

--b.


Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-04 Thread Chuck Lever



> On Dec 4, 2020, at 10:46 AM, Bruce Fields  wrote:
> 
> On Fri, Dec 04, 2020 at 02:59:35PM +, David Howells wrote:
>> Hi Chuck, Bruce,
>> 
>> Why is gss_krb5_crypto.c using an auxiliary cipher?  For reference, the
>> gss_krb5_aes_encrypt() code looks like the attached.
>> 
>> From what I can tell, in AES mode, the difference between the main cipher and
>> the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is
>> "cts(cbc(aes))" - but they have the same key.
>> 
>> Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
>> same as the non-CTS, except for the last two blocks, but the non-CTS one is
>> more efficient.
> 
> CTS is cipher-text stealing, isn't it?  I think it was Kevin Coffman
> that did that, and I don't remember the history.  I thought it was
> required by some spec or peer implementation (maybe Windows?) but I
> really don't remember.  It may predate git.  I'll dig around and see
> what I can find.

I can't add more here, this design comes from well before I started
working on this body of code (though, I worked near Kevin when he
implemented it).


--
Chuck Lever





Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-04 Thread Bruce Fields
On Fri, Dec 04, 2020 at 04:01:53PM +, David Howells wrote:
> Bruce Fields  wrote:
> 
> > > Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
> > > same as the non-CTS, except for the last two blocks, but the non-CTS one is
> > > more efficient.
> > 
> > CTS is cipher-text stealing, isn't it?  I think it was Kevin Coffman
> > that did that, and I don't remember the history.  I thought it was
> > required by some spec or peer implementation (maybe Windows?) but I
> > really don't remember.  It may predate git.  I'll dig around and see
> > what I can find.
> 
> rfc3961 and rfc3962 specify CTS-CBC with AES.

OK, I guess I don't understand the question.  I haven't thought about
this code in at least a decade.  What's an auxiliary cipher?  Is this a
question about why we're implementing something, or how we're
implementing it?

--b.


Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-04 Thread David Howells
Bruce Fields  wrote:

> > Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
> > same as the non-CTS, except for the last two blocks, but the non-CTS one is
> > more efficient.
> 
> CTS is cipher-text stealing, isn't it?  I think it was Kevin Coffman
> that did that, and I don't remember the history.  I thought it was
> required by some spec or peer implementation (maybe Windows?) but I
> really don't remember.  It may predate git.  I'll dig around and see
> what I can find.

rfc3961 and rfc3962 specify CTS-CBC with AES.
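To make the relationship concrete, here is a minimal userspace sketch of CBC with ciphertext stealing as I understand RFC 3962 to define it (NIST's SP 800-38A addendum calls this variant CBC-CS3): ordinary CBC covers every block except the last two, and the final pair is emitted swapped. The block function is a toy stand-in for AES, and `toy_encrypt_block()`, `cbc_encrypt()` and `cts_encrypt()` are hypothetical names, not kernel APIs. Because the prefix is plain CBC under the same key, a "cbc(aes)" transform can produce it and only the short tail needs "cts(cbc(aes))". That appears to be the split gss_krb5_aes_encrypt() makes between aux_cipher and cipher.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BS 16 /* AES block size in bytes */

/* Toy stand-in for AES block encryption: NOT secure, only makes the sketch run. */
static void toy_encrypt_block(const unsigned char key[BS],
			      const unsigned char in[BS], unsigned char out[BS])
{
	for (int i = 0; i < BS; i++)
		out[i] = (unsigned char)(in[(i + 1) % BS] ^ key[i] ^ 0x5a);
}

/* Plain CBC over whole blocks (len must be a multiple of BS).
 * On return, iv holds the last ciphertext block so chaining can continue. */
static void cbc_encrypt(const unsigned char key[BS], unsigned char iv[BS],
			const unsigned char *in, unsigned char *out, size_t len)
{
	unsigned char tmp[BS];

	assert(len % BS == 0);
	for (size_t off = 0; off < len; off += BS) {
		for (int i = 0; i < BS; i++)
			tmp[i] = in[off + i] ^ iv[i];
		toy_encrypt_block(key, tmp, out + off);
		memcpy(iv, out + off, BS);
	}
}

/* CBC with ciphertext stealing, last two blocks always swapped.
 * Requires len > BS; output length equals input length. */
static void cts_encrypt(const unsigned char key[BS], unsigned char iv[BS],
			const unsigned char *in, unsigned char *out, size_t len)
{
	size_t nblocks, cbcbytes, m;
	unsigned char cn1[BS], pn[BS] = { 0 }, tmp[BS];

	assert(len > BS);
	nblocks = (len + BS - 1) / BS;
	cbcbytes = nblocks > 2 ? (nblocks - 2) * BS : 0;
	m = len - cbcbytes - BS;	/* bytes in the final block: 1..BS */

	/* Everything before the last two blocks is ordinary CBC
	 * (the part handed to aux_cipher); iv carries forward. */
	cbc_encrypt(key, iv, in, out, cbcbytes);

	/* Penultimate plaintext block, chained from the carried-forward iv. */
	for (int i = 0; i < BS; i++)
		tmp[i] = in[cbcbytes + i] ^ iv[i];
	toy_encrypt_block(key, tmp, cn1);

	/* Final block, zero-padded; XORing the padding with cn1 is what
	 * "steals" the tail bytes of cn1. */
	memcpy(pn, in + cbcbytes + BS, m);
	for (int i = 0; i < BS; i++)
		tmp[i] = pn[i] ^ cn1[i];

	/* Swapped output: the full final block first, then the first m bytes of cn1. */
	toy_encrypt_block(key, tmp, out + cbcbytes);
	memcpy(out + cbcbytes + BS, cn1, m);
}
```

Run on a 70-byte message (five blocks), the first 48 bytes of `cts_encrypt()` output match `cbc_encrypt()` over the same bytes. On a block-aligned 64-byte message, the last two ciphertext blocks come out in swapped order relative to plain CBC.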

David



Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-04 Thread Bruce Fields
On Fri, Dec 04, 2020 at 02:59:35PM +, David Howells wrote:
> Hi Chuck, Bruce,
> 
> Why is gss_krb5_crypto.c using an auxiliary cipher?  For reference, the
> gss_krb5_aes_encrypt() code looks like the attached.
> 
> From what I can tell, in AES mode, the difference between the main cipher and
> the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is
> "cts(cbc(aes))" - but they have the same key.
> 
> Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
> same as the non-CTS, except for the last two blocks, but the non-CTS one is
> more efficient.

CTS is cipher-text stealing, isn't it?  I think it was Kevin Coffman
that did that, and I don't remember the history.  I thought it was
required by some spec or peer implementation (maybe Windows?) but I
really don't remember.  It may predate git.  I'll dig around and see
what I can find.

--b.

> 
> David
> ---
> nbytes = buf->len - offset - GSS_KRB5_TOK_HDR_LEN;
> nblocks = (nbytes + blocksize - 1) / blocksize;
> cbcbytes = 0;
> if (nblocks > 2)
> 	cbcbytes = (nblocks - 2) * blocksize;
>
> memset(desc.iv, 0, sizeof(desc.iv));
>
> if (cbcbytes) {
> 	SYNC_SKCIPHER_REQUEST_ON_STACK(req, aux_cipher);
>
> 	desc.pos = offset + GSS_KRB5_TOK_HDR_LEN;
> 	desc.fragno = 0;
> 	desc.fraglen = 0;
> 	desc.pages = pages;
> 	desc.outbuf = buf;
> 	desc.req = req;
>
> 	skcipher_request_set_sync_tfm(req, aux_cipher);
> 	skcipher_request_set_callback(req, 0, NULL, NULL);
>
> 	sg_init_table(desc.infrags, 4);
> 	sg_init_table(desc.outfrags, 4);
>
> 	err = xdr_process_buf(buf, offset + GSS_KRB5_TOK_HDR_LEN,
> 			      cbcbytes, encryptor, &desc);
> 	skcipher_request_zero(req);
> 	if (err)
> 		goto out_err;
> }
>
> /* Make sure IV carries forward from any CBC results. */
> err = gss_krb5_cts_crypt(cipher, buf,
> 			 offset + GSS_KRB5_TOK_HDR_LEN + cbcbytes,
> 			 desc.iv, pages, 1);
> if (err) {
> 	err = GSS_S_FAILURE;
> 	goto out_err;
> }


Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-04 Thread David Howells
Hi Chuck, Bruce,

Why is gss_krb5_crypto.c using an auxiliary cipher?  For reference, the
gss_krb5_aes_encrypt() code looks like the attached.

From what I can tell, in AES mode, the difference between the main cipher and
the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is
"cts(cbc(aes))" - but they have the same key.

Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
same as the non-CTS, except for the last two blocks, but the non-CTS one is
more efficient.

David
---
nbytes = buf->len - offset - GSS_KRB5_TOK_HDR_LEN;
nblocks = (nbytes + blocksize - 1) / blocksize;
cbcbytes = 0;
if (nblocks > 2)
	cbcbytes = (nblocks - 2) * blocksize;

memset(desc.iv, 0, sizeof(desc.iv));

if (cbcbytes) {
	SYNC_SKCIPHER_REQUEST_ON_STACK(req, aux_cipher);

	desc.pos = offset + GSS_KRB5_TOK_HDR_LEN;
	desc.fragno = 0;
	desc.fraglen = 0;
	desc.pages = pages;
	desc.outbuf = buf;
	desc.req = req;

	skcipher_request_set_sync_tfm(req, aux_cipher);
	skcipher_request_set_callback(req, 0, NULL, NULL);

	sg_init_table(desc.infrags, 4);
	sg_init_table(desc.outfrags, 4);

	err = xdr_process_buf(buf, offset + GSS_KRB5_TOK_HDR_LEN,
			      cbcbytes, encryptor, &desc);
	skcipher_request_zero(req);
	if (err)
		goto out_err;
}

/* Make sure IV carries forward from any CBC results. */
err = gss_krb5_cts_crypt(cipher, buf,
			 offset + GSS_KRB5_TOK_HDR_LEN + cbcbytes,
			 desc.iv, pages, 1);
if (err) {
	err = GSS_S_FAILURE;
	goto out_err;
}
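The length arithmetic at the top of that excerpt decides how much of the payload goes through aux_cipher. Pulled out into a standalone function (the `struct cts_split` type and `split_for_cts()` are hypothetical names, not kernel code), a minimal sketch of the same computation shows that for large buffers almost everything takes the plain CBC path:

```c
#include <assert.h>
#include <stddef.h>

/* How a payload of nbytes is divided between the plain-CBC pass
 * (aux_cipher, "cbc(aes)") and the CTS pass ("cts(cbc(aes))"). */
struct cts_split {
	size_t cbcbytes;	/* handled by the CBC-only transform */
	size_t ctsbytes;	/* final (at most two) blocks, handled with CTS */
};

static struct cts_split split_for_cts(size_t nbytes, size_t blocksize)
{
	struct cts_split s = { 0, nbytes };
	size_t nblocks;

	assert(blocksize > 0);
	nblocks = (nbytes + blocksize - 1) / blocksize;	/* round up */
	if (nblocks > 2) {
		/* Leave the last two blocks (the final one possibly partial) for CTS. */
		s.cbcbytes = (nblocks - 2) * blocksize;
		s.ctsbytes = nbytes - s.cbcbytes;
	}
	return s;
}
```

With 16-byte AES blocks, a 1MiB payload (`split_for_cts((size_t)1 << 20, 16)`) sends 1048544 bytes through CBC and leaves only the final 32 bytes for the CTS transform. A payload of two blocks or fewer, such as 20 bytes, gets `cbcbytes == 0`, so the `if (cbcbytes)` branch is skipped and the whole payload goes through gss_krb5_cts_crypt().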