Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)
>> Also not mentioned in the documentation is that some algorithms *do* >> have different implementations depending on key size. SHA-2 is the >> classic example. > What do you mean by that? SHA has no keying at all. In this case, the analagous property is hash size. Sorry, I thought that was so obvious I didn't need to say it. Specifically, SHA2-256 (and -224) and SHA2-512 (and -384) are separate algorithms with similar structures but deparate implementations.
Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)
On Tue, Jun 28, 2016 at 10:32:12AM -0400, George Spelvin wrote: > > - struct crypto_instance > - struct crypto_spawn > - struct crypto_blkcipher > - struct blkcipher_desc > - More on the context structures returned by crypto_tfm_ctx blkcipher is obsolete and will be removed soon. So if you are going to write this then please document skcipher instead. > Also not mentioned in the documentation is that some algorithms *do* > have different implementations depending on key size. SHA-2 is the > classic example. What do you mean by that? SHA has no keying at all. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)
> We have actually gained quite a bit of documentation recently. > Have you looked at Documentation/DocBook/crypto-API.tmpl? > > More is always welcome of course. It's improved since I last looked at it, but there are still many structures that aren't described: - struct crypto_instance - struct crypto_spawn - struct crypto_blkcipher - struct blkcipher_desc - More on the context structures returned by crypto_tfm_ctx Also not mentioned in the documentation is that some algorithms *do* have different implementations depending on key size. SHA-2 is the classic example.
Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)
On Tue, Jun 28, 2016 at 09:23:01AM -0400, George Spelvin wrote: > > Wow, I should see how you do that. I couldn't get it below 3 > blocks of temporary, and the dst SG list only gives you > one and a half. I don't mean that I'm using no temporary buffers at all, just that the actual crypto only operates on the SG lists. I'm still doing the xoring and stitching in temp buffers. I just counted and I'm using three blocks like you. > Is net/sunrpc/auth_gss/gss_krb5_mech.c doing something odd? Yes gss_krb5_crypto.c is the one. > I have a request of you: like Andy, I find the crypto layer an > impenetrable thicket of wrapper structures. I'm not suggesting there > aren't reasons for it, but it's extremely hard to infer those reasons by > looking at the code. If I were to draft a (hilariously wrong) overview > document, would you be willing to edit it into correctness? We have actually gained quite a bit of documentation recently. Have you looked at Documentation/DocBook/crypto-API.tmpl? More is always welcome of course. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)
Herbert Xu wrote: > I'm currently working on cts and I'm removing the stack usage > altogether by having it operate on the src/dst SG lists only. Wow, I should see how you do that. I couldn't get it below 3 blocks of temporary, and the dst SG list only gives you one and a half. > BTW, the only cts user in our tree appears to be implementing > CTS all over again and is only calling the crypto API cts for > the last two blocks. Someone should fix that. Hint taken. Although I'm having a hard time finding that only user amidst all the drivers thinking it means Clear To Send or (for HDMI) Cycle Time Stamp. Um...the uses in fs/crypto/keyinfo.c and fs/ext4/crypto_key.c don't seem to do anything untoward. Is net/sunrpc/auth_gss/gss_krb5_mech.c doing something odd? I have a request of you: like Andy, I find the crypto layer an impenetrable thicket of wrapper structures. I'm not suggesting there aren't reasons for it, but it's extremely hard to infer those reasons by looking at the code. If I were to draft a (hilariously wrong) overview document, would you be willing to edit it into correctness?
Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)
On Tue, Jun 28, 2016 at 08:37:43AM -0400, George Spelvin wrote: > Just a note, crypto/cts.c also does a lot of sg_set_buf() in stack buffers. > > I have a local patch (appended, if anyone wants) to reduce the wasteful > amount of buffer space it uses (from 7 to 3 blocks on encrypt, from > 6 to 3 blocks on decrypt), but it would take some rework to convert to > crypto_cipher_encrypt_one() or avoid stack buffers entirely. I'm currently working on cts and I'm removing the stack usage altogether by having it operate on the src/dst SG lists only. It's part of the skcipher conversion though so it'll have to go through the crypto tree. BTW, the only cts user in our tree appears to be implementing CTS all over again and is only calling the crypto API cts for the last two blocks. Someone should fix that. Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)
Just a note, crypto/cts.c also does a lot of sg_set_buf() in stack buffers. I have a local patch (appended, if anyone wants) to reduce the wasteful amount of buffer space it uses (from 7 to 3 blocks on encrypt, from 6 to 3 blocks on decrypt), but it would take some rework to convert to crypto_cipher_encrypt_one() or avoid stack buffers entirely. commit c0aa0ae38dc6115b378939c5483ba6c7eb65d92a Author: George Spelvin Date: Sat Oct 10 17:26:08 2015 -0400 crypto: cts - Reduce internal buffer usage It only takes a 3-block temporary buffer to handle all the tricky CTS cases. Encryption could in theory be done with two, but at a cost in complexity. But it's still a saving from the previous six blocks on the stack. One issue I'm uncertain of and I'd like clarification on: to simplify the cts_cbc_{en,de}crypt calls, I pass in the lcldesc structure which contains the ctx->child transform rather than the parent one. I'm assuming the block sizes are guaranteed to be the same (they're set up in crypto_cts_alloc by copying), but I haven't been able to prove it to my satisfaction. Signed-off-by: George Spelvin diff --git a/crypto/cts.c b/crypto/cts.c index e467ec0ac..e24d2e15 100644 --- a/crypto/cts.c +++ b/crypto/cts.c @@ -70,54 +70,44 @@ static int crypto_cts_setkey(struct crypto_tfm *parent, const u8 *key, return err; } -static int cts_cbc_encrypt(struct crypto_cts_ctx *ctx, - struct blkcipher_desc *desc, +/* + * The final CTS encryption is just like CBC encryption except that: + * - the last plaintext block is zero-padded, + * - the second-last ciphertext block is trimmed, and + * - the last (complete) block of ciphertext is output before the + * (truncated) second-last one. + */ +static int cts_cbc_encrypt(struct blkcipher_desc *lcldesc, struct scatterlist *dst, struct scatterlist *src, unsigned int offset, unsigned int nbytes) { - int bsize = crypto_blkcipher_blocksize(desc->tfm); - u8 tmp[bsize], tmp2[bsize]; - struct blkcipher_desc lcldesc; - struct scatterlist sgsrc[1], sgdst[1]; + int bsize = crypto_blkcipher_blocksize(lcldesc->tfm); + u8 tmp[3*bsize] __aligned(8); + struct scatterlist sgsrc[1], sgdst[2]; int lastn = nbytes - bsize; - u8 iv[bsize]; - u8 s[bsize * 2], d[bsize * 2]; int err; - if (lastn < 0) + if (lastn <= 0) return -EINVAL; - sg_init_table(sgsrc, 1); - sg_init_table(sgdst, 1); - - memset(s, 0, sizeof(s)); - scatterwalk_map_and_copy(s, src, offset, nbytes, 0); - - memcpy(iv, desc->info, bsize); - - lcldesc.tfm = ctx->child; - lcldesc.info = iv; - lcldesc.flags = desc->flags; - - sg_set_buf(&sgsrc[0], s, bsize); - sg_set_buf(&sgdst[0], tmp, bsize); - err = crypto_blkcipher_encrypt_iv(&lcldesc, sgdst, sgsrc, bsize); - - memcpy(d + bsize, tmp, lastn); - - lcldesc.info = tmp; - - sg_set_buf(&sgsrc[0], s + bsize, bsize); - sg_set_buf(&sgdst[0], tmp2, bsize); - err = crypto_blkcipher_encrypt_iv(&lcldesc, sgdst, sgsrc, bsize); - - memcpy(d, tmp2, bsize); - - scatterwalk_map_and_copy(d, dst, offset, nbytes, 1); - - memcpy(desc->info, tmp2, bsize); + /* Copy the input to a temporary buffer; tmp = xxx, P[n-1], P[n] */ + memset(tmp+2*bsize, 0, bsize); + scatterwalk_map_and_copy(tmp+bsize, src, offset, nbytes, 0); + + sg_init_one(sgsrc, tmp+bsize, 2*bsize); + /* Initialize dst specially to do the rearrangement for us */ + sg_init_table(sgdst, 2); + sg_set_buf(sgdst+0, tmp+bsize, bsize); + sg_set_buf(sgdst+1, tmp, bsize); + + /* CBC-encrypt in place the two blocks; tmp = C[n], C[n-1], P[n] */ + err = crypto_blkcipher_encrypt_iv(lcldesc, sgdst, sgsrc, 2*bsize); + + /* Copy beginning of tmp to the output */ + scatterwalk_map_and_copy(tmp, dst, offset, nbytes, 1); + memzero_explicit(tmp, sizeof(tmp)); return err; } @@ -126,8 +116,8 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst, struct scatterlist *src, unsigned int nbytes) { - struct crypto_cts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm); int bsize = crypto_blkcipher_blocksize(desc->tfm); + struct crypto_cts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm); int tot_blocks = (nbytes + bsize - 1) / bsize; int cbc_blocks = tot_blocks > 2 ? tot_blocks - 2 : 0; struct blkcipher_desc lcldesc; @@ -140,14 +130,14 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc, if (tot_blocks == 1) { err = crypto_blkcipher_encrypt_iv(&lcldesc, dst, src, bsize); } else if (nbytes <
Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)
On Wed, Jun 22, 2016 at 11:41 PM, Herbert Xu wrote: > On Thu, Jun 23, 2016 at 11:48:25AM +0800, Herbert Xu wrote: >> >> No we never had such an API in the kernel. However, I see that >> rxkad does some pretty silly things and we should be able to avoid >> using the stack in pretty much all cases. Let me try to come up with >> something. > > Here it is: > > ---8<--- > Subject: rxrpc: Avoid using stack memory in SG lists in rxkad Looks reasonable to me. Unless anyone tells me otherwise, my plan is to queue it in my virtually-mapped stack series and to ask Ingo to apply it via -tip. If it went in via the networking tree, that would work as well, but it would introduce a bisectability problem. Thanks! --Andy
Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)
On Thu, Jun 23, 2016 at 11:48:25AM +0800, Herbert Xu wrote: > > No we never had such an API in the kernel. However, I see that > rxkad does some pretty silly things and we should be able to avoid > using the stack in pretty much all cases. Let me try to come up with > something. Here it is: ---8<--- Subject: rxrpc: Avoid using stack memory in SG lists in rxkad rxkad uses stack memory in SG lists which would not work if stacks were allocated from vmalloc memory. In fact, in most cases this isn't even necessary as the stack memory ends up getting copied over to kmalloc memory. This patch eliminates all the unnecessary stack memory uses by supplying the final destination directly to the crypto API. In two instances where a temporary buffer is actually needed we also switch use the skb->cb area instead of the stack. Finally there is no need to split a split-page buffer into two SG entries so code dealing with that has been removed. Signed-off-by: Herbert Xu diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index f0b807a..8ee5933 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -277,6 +277,7 @@ struct rxrpc_connection { struct key *key; /* security for this connection (client) */ struct key *server_key;/* security for this service */ struct crypto_skcipher *cipher;/* encryption handle */ + struct rxrpc_crypt csum_iv_head; /* leading block for csum_iv */ struct rxrpc_crypt csum_iv;/* packet checksum base */ unsigned long events; #define RXRPC_CONN_CHALLENGE 0 /* send challenge packet */ diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c index 6b726a0..ee142de 100644 --- a/net/rxrpc/rxkad.c +++ b/net/rxrpc/rxkad.c @@ -105,11 +105,9 @@ static void rxkad_prime_packet_security(struct rxrpc_connection *conn) { struct rxrpc_key_token *token; SKCIPHER_REQUEST_ON_STACK(req, conn->cipher); - struct scatterlist sg[2]; + struct rxrpc_crypt *csum_iv; + struct scatterlist sg; struct rxrpc_crypt iv; - struct { - __be32 x[4]; - } tmpbuf __attribute__((aligned(16))); /* must all be in same page */ _enter(""); @@ -119,24 +117,21 @@ static void rxkad_prime_packet_security(struct rxrpc_connection *conn) token = conn->key->payload.data[0]; memcpy(&iv, token->kad->session_key, sizeof(iv)); - tmpbuf.x[0] = htonl(conn->epoch); - tmpbuf.x[1] = htonl(conn->cid); - tmpbuf.x[2] = 0; - tmpbuf.x[3] = htonl(conn->security_ix); + csum_iv = &conn->csum_iv_head; + csum_iv[0].x[0] = htonl(conn->epoch); + csum_iv[0].x[1] = htonl(conn->cid); + csum_iv[1].x[0] = 0; + csum_iv[1].x[1] = htonl(conn->security_ix); - sg_init_one(&sg[0], &tmpbuf, sizeof(tmpbuf)); - sg_init_one(&sg[1], &tmpbuf, sizeof(tmpbuf)); + sg_init_one(&sg, csum_iv, 16); skcipher_request_set_tfm(req, conn->cipher); skcipher_request_set_callback(req, 0, NULL, NULL); - skcipher_request_set_crypt(req, &sg[1], &sg[0], sizeof(tmpbuf), iv.x); + skcipher_request_set_crypt(req, &sg, &sg, 16, iv.x); crypto_skcipher_encrypt(req); skcipher_request_zero(req); - memcpy(&conn->csum_iv, &tmpbuf.x[2], sizeof(conn->csum_iv)); - ASSERTCMP((u32 __force)conn->csum_iv.n[0], ==, (u32 __force)tmpbuf.x[2]); - _leave(""); } @@ -150,12 +145,9 @@ static int rxkad_secure_packet_auth(const struct rxrpc_call *call, { struct rxrpc_skb_priv *sp; SKCIPHER_REQUEST_ON_STACK(req, call->conn->cipher); + struct rxkad_level1_hdr hdr; struct rxrpc_crypt iv; - struct scatterlist sg[2]; - struct { - struct rxkad_level1_hdr hdr; - __be32 first; /* first four bytes of data and padding */ - } tmpbuf __attribute__((aligned(8))); /* must all be in same page */ + struct scatterlist sg; u16 check; sp = rxrpc_skb(skb); @@ -165,24 +157,21 @@ static int rxkad_secure_packet_auth(const struct rxrpc_call *call, check = sp->hdr.seq ^ sp->hdr.callNumber; data_size |= (u32)check << 16; - tmpbuf.hdr.data_size = htonl(data_size); - memcpy(&tmpbuf.first, sechdr + 4, sizeof(tmpbuf.first)); + hdr.data_size = htonl(data_size); + memcpy(sechdr, &hdr, sizeof(hdr)); /* start the encryption afresh */ memset(&iv, 0, sizeof(iv)); - sg_init_one(&sg[0], &tmpbuf, sizeof(tmpbuf)); - sg_init_one(&sg[1], &tmpbuf, sizeof(tmpbuf)); + sg_init_one(&sg, sechdr, 8); skcipher_request_set_tfm(req, call->conn->cipher); skcipher_request_set_callback(req, 0, NULL, NULL); - skcipher_request_set_crypt(req, &sg[1], &sg[0], sizeof(tmpbuf), iv.x); + skcipher_request_set_crypt(req, &sg, &sg, 8, iv.x); crypto_skcipher_encry