Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-29 Thread George Spelvin
>> Also not mentioned in the documentation is that some algorithms *do*
>> have different implementations depending on key size.  SHA-2 is the
>> classic example.

> What do you mean by that? SHA has no keying at all.

In this case, the analagous property is hash size.  Sorry, I thought
that was so obvious I didn't need to say it.

Specifically, SHA2-256 (and -224) and SHA2-512 (and -384) are separate
algorithms with similar structures but deparate implementations.


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread Herbert Xu
On Tue, Jun 28, 2016 at 10:32:12AM -0400, George Spelvin wrote:
> 
> - struct crypto_instance
> - struct crypto_spawn
> - struct crypto_blkcipher
> - struct blkcipher_desc
> - More on the context structures returned by crypto_tfm_ctx

blkcipher is obsolete and will be removed soon.  So if you are
going to write this then please document skcipher instead.

> Also not mentioned in the documentation is that some algorithms *do*
> have different implementations depending on key size.  SHA-2 is the
> classic example.

What do you mean by that? SHA has no keying at all.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread George Spelvin
> We have actually gained quite a bit of documentation recently.
> Have you looked at Documentation/DocBook/crypto-API.tmpl?
> 
> More is always welcome of course.

It's improved since I last looked at it, but there are still many structures
that aren't described:

- struct crypto_instance
- struct crypto_spawn
- struct crypto_blkcipher
- struct blkcipher_desc
- More on the context structures returned by crypto_tfm_ctx

Also not mentioned in the documentation is that some algorithms *do*
have different implementations depending on key size.  SHA-2 is the
classic example.


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread Herbert Xu
On Tue, Jun 28, 2016 at 09:23:01AM -0400, George Spelvin wrote:
> 
> Wow, I should see how you do that.  I couldn't get it below 3
> blocks of temporary, and the dst SG list only gives you
> one and a half.

I don't mean that I'm using no temporary buffers at all, just
that the actual crypto only operates on the SG lists.  I'm still
doing the xoring and stitching in temp buffers.  I just counted
and I'm using three blocks like you.

> Is net/sunrpc/auth_gss/gss_krb5_mech.c doing something odd?

Yes gss_krb5_crypto.c is the one.

> I have a request of you: like Andy, I find the crypto layer an
> impenetrable thicket of wrapper structures.  I'm not suggesting there
> aren't reasons for it, but it's extremely hard to infer those reasons by
> looking at the code.  If I were to draft a (hilariously wrong) overview
> document, would you be willing to edit it into correctness?

We have actually gained quite a bit of documentation recently.
Have you looked at Documentation/DocBook/crypto-API.tmpl?

More is always welcome of course.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread George Spelvin
Herbert Xu wrote:
> I'm currently working on cts and I'm removing the stack usage
> altogether by having it operate on the src/dst SG lists only.

Wow, I should see how you do that.  I couldn't get it below 3
blocks of temporary, and the dst SG list only gives you
one and a half.

> BTW, the only cts user in our tree appears to be implementing
> CTS all over again and is only calling the crypto API cts for
> the last two blocks.  Someone should fix that.

Hint taken.  Although I'm having a hard time finding that only user
amidst all the drivers thinking it means Clear To Send or (for HDMI)
Cycle Time Stamp.

Um...the uses in fs/crypto/keyinfo.c and fs/ext4/crypto_key.c
don't seem to do anything untoward.

Is net/sunrpc/auth_gss/gss_krb5_mech.c doing something odd?


I have a request of you: like Andy, I find the crypto layer an
impenetrable thicket of wrapper structures.  I'm not suggesting there
aren't reasons for it, but it's extremely hard to infer those reasons by
looking at the code.  If I were to draft a (hilariously wrong) overview
document, would you be willing to edit it into correctness?


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread Herbert Xu
On Tue, Jun 28, 2016 at 08:37:43AM -0400, George Spelvin wrote:
> Just a note, crypto/cts.c also does a lot of sg_set_buf() in stack buffers.
> 
> I have a local patch (appended, if anyone wants) to reduce the wasteful
> amount of buffer space it uses (from 7 to 3 blocks on encrypt, from
> 6 to 3 blocks on decrypt), but it would take some rework to convert to
> crypto_cipher_encrypt_one() or avoid stack buffers entirely.

I'm currently working on cts and I'm removing the stack usage
altogether by having it operate on the src/dst SG lists only.

It's part of the skcipher conversion though so it'll have to go
through the crypto tree.

BTW, the only cts user in our tree appears to be implementing
CTS all over again and is only calling the crypto API cts for
the last two blocks.  Someone should fix that.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread George Spelvin
Just a note, crypto/cts.c also does a lot of sg_set_buf() in stack buffers.

I have a local patch (appended, if anyone wants) to reduce the wasteful
amount of buffer space it uses (from 7 to 3 blocks on encrypt, from
6 to 3 blocks on decrypt), but it would take some rework to convert to
crypto_cipher_encrypt_one() or avoid stack buffers entirely.

commit c0aa0ae38dc6115b378939c5483ba6c7eb65d92a
Author: George Spelvin 
Date:   Sat Oct 10 17:26:08 2015 -0400

crypto: cts - Reduce internal buffer usage

It only takes a 3-block temporary buffer to handle all the tricky
CTS cases.  Encryption could in theory be done with two, but at a cost
in complexity.

But it's still a saving from the previous six blocks on the stack.

One issue I'm uncertain of and I'd like clarification on: to simplify
the cts_cbc_{en,de}crypt calls, I pass in the lcldesc structure which
contains the ctx->child transform rather than the parent one.  I'm
assuming the block sizes are guaranteed to be the same (they're set up
in crypto_cts_alloc by copying), but I haven't been able to prove it to
my satisfaction.

Signed-off-by: George Spelvin 

diff --git a/crypto/cts.c b/crypto/cts.c
index e467ec0ac..e24d2e15 100644
--- a/crypto/cts.c
+++ b/crypto/cts.c
@@ -70,54 +70,44 @@ static int crypto_cts_setkey(struct crypto_tfm *parent, 
const u8 *key,
return err;
 }
 
-static int cts_cbc_encrypt(struct crypto_cts_ctx *ctx,
-  struct blkcipher_desc *desc,
+/*
+ * The final CTS encryption is just like CBC encryption except that:
+ * - the last plaintext block is zero-padded,
+ * - the second-last ciphertext block is trimmed, and
+ * - the last (complete) block of ciphertext is output before the
+ *   (truncated) second-last one.
+ */
+static int cts_cbc_encrypt(struct blkcipher_desc *lcldesc,
   struct scatterlist *dst,
   struct scatterlist *src,
   unsigned int offset,
   unsigned int nbytes)
 {
-   int bsize = crypto_blkcipher_blocksize(desc->tfm);
-   u8 tmp[bsize], tmp2[bsize];
-   struct blkcipher_desc lcldesc;
-   struct scatterlist sgsrc[1], sgdst[1];
+   int bsize = crypto_blkcipher_blocksize(lcldesc->tfm);
+   u8 tmp[3*bsize] __aligned(8);
+   struct scatterlist sgsrc[1], sgdst[2];
int lastn = nbytes - bsize;
-   u8 iv[bsize];
-   u8 s[bsize * 2], d[bsize * 2];
int err;
 
-   if (lastn < 0)
+   if (lastn <= 0)
return -EINVAL;
 
-   sg_init_table(sgsrc, 1);
-   sg_init_table(sgdst, 1);
-
-   memset(s, 0, sizeof(s));
-   scatterwalk_map_and_copy(s, src, offset, nbytes, 0);
-
-   memcpy(iv, desc->info, bsize);
-
-   lcldesc.tfm = ctx->child;
-   lcldesc.info = iv;
-   lcldesc.flags = desc->flags;
-
-   sg_set_buf(&sgsrc[0], s, bsize);
-   sg_set_buf(&sgdst[0], tmp, bsize);
-   err = crypto_blkcipher_encrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
-
-   memcpy(d + bsize, tmp, lastn);
-
-   lcldesc.info = tmp;
-
-   sg_set_buf(&sgsrc[0], s + bsize, bsize);
-   sg_set_buf(&sgdst[0], tmp2, bsize);
-   err = crypto_blkcipher_encrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
-
-   memcpy(d, tmp2, bsize);
-
-   scatterwalk_map_and_copy(d, dst, offset, nbytes, 1);
-
-   memcpy(desc->info, tmp2, bsize);
+   /* Copy the input to a temporary buffer; tmp = xxx, P[n-1], P[n] */
+   memset(tmp+2*bsize, 0, bsize);
+   scatterwalk_map_and_copy(tmp+bsize, src, offset, nbytes, 0);
+
+   sg_init_one(sgsrc, tmp+bsize, 2*bsize);
+   /* Initialize dst specially to do the rearrangement for us */
+   sg_init_table(sgdst, 2);
+   sg_set_buf(sgdst+0, tmp+bsize, bsize);
+   sg_set_buf(sgdst+1, tmp,   bsize);
+
+   /* CBC-encrypt in place the two blocks; tmp = C[n], C[n-1], P[n] */
+   err = crypto_blkcipher_encrypt_iv(lcldesc, sgdst, sgsrc, 2*bsize);
+
+   /* Copy beginning of tmp to the output */
+   scatterwalk_map_and_copy(tmp, dst, offset, nbytes, 1);
+   memzero_explicit(tmp, sizeof(tmp));
 
return err;
 }
@@ -126,8 +116,8 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc,
  struct scatterlist *dst, struct scatterlist *src,
  unsigned int nbytes)
 {
-   struct crypto_cts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
int bsize = crypto_blkcipher_blocksize(desc->tfm);
+   struct crypto_cts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
int tot_blocks = (nbytes + bsize - 1) / bsize;
int cbc_blocks = tot_blocks > 2 ? tot_blocks - 2 : 0;
struct blkcipher_desc lcldesc;
@@ -140,14 +130,14 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc,
if (tot_blocks == 1) {
err = crypto_blkcipher_encrypt_iv(&lcldesc, dst, src, bsize);
} else if (nbytes <

Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-23 Thread Andy Lutomirski
On Wed, Jun 22, 2016 at 11:41 PM, Herbert Xu
 wrote:
> On Thu, Jun 23, 2016 at 11:48:25AM +0800, Herbert Xu wrote:
>>
>> No we never had such an API in the kernel.  However, I see that
>> rxkad does some pretty silly things and we should be able to avoid
>> using the stack in pretty much all cases.  Let me try to come up with
>> something.
>
> Here it is:
>
> ---8<---
> Subject: rxrpc: Avoid using stack memory in SG lists in rxkad

Looks reasonable to me.  Unless anyone tells me otherwise, my plan is
to queue it in my virtually-mapped stack series and to ask Ingo to
apply it via -tip.

If it went in via the networking tree, that would work as well, but it
would introduce a bisectability problem.

Thanks!

--Andy


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-22 Thread Herbert Xu
On Thu, Jun 23, 2016 at 11:48:25AM +0800, Herbert Xu wrote:
> 
> No we never had such an API in the kernel.  However, I see that
> rxkad does some pretty silly things and we should be able to avoid
> using the stack in pretty much all cases.  Let me try to come up with
> something.

Here it is:

---8<---
Subject: rxrpc: Avoid using stack memory in SG lists in rxkad

rxkad uses stack memory in SG lists which would not work if stacks
were allocated from vmalloc memory.  In fact, in most cases this
isn't even necessary as the stack memory ends up getting copied
over to kmalloc memory.

This patch eliminates all the unnecessary stack memory uses by
supplying the final destination directly to the crypto API.  In
two instances where a temporary buffer is actually needed we also
switch use the skb->cb area instead of the stack.

Finally there is no need to split a split-page buffer into two SG
entries so code dealing with that has been removed.

Signed-off-by: Herbert Xu 

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index f0b807a..8ee5933 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -277,6 +277,7 @@ struct rxrpc_connection {
struct key  *key;   /* security for this connection 
(client) */
struct key  *server_key;/* security for this service */
struct crypto_skcipher  *cipher;/* encryption handle */
+   struct rxrpc_crypt  csum_iv_head;   /* leading block for csum_iv */
struct rxrpc_crypt  csum_iv;/* packet checksum base */
unsigned long   events;
 #define RXRPC_CONN_CHALLENGE   0   /* send challenge packet */
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index 6b726a0..ee142de 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -105,11 +105,9 @@ static void rxkad_prime_packet_security(struct 
rxrpc_connection *conn)
 {
struct rxrpc_key_token *token;
SKCIPHER_REQUEST_ON_STACK(req, conn->cipher);
-   struct scatterlist sg[2];
+   struct rxrpc_crypt *csum_iv;
+   struct scatterlist sg;
struct rxrpc_crypt iv;
-   struct {
-   __be32 x[4];
-   } tmpbuf __attribute__((aligned(16))); /* must all be in same page */
 
_enter("");
 
@@ -119,24 +117,21 @@ static void rxkad_prime_packet_security(struct 
rxrpc_connection *conn)
token = conn->key->payload.data[0];
memcpy(&iv, token->kad->session_key, sizeof(iv));
 
-   tmpbuf.x[0] = htonl(conn->epoch);
-   tmpbuf.x[1] = htonl(conn->cid);
-   tmpbuf.x[2] = 0;
-   tmpbuf.x[3] = htonl(conn->security_ix);
+   csum_iv = &conn->csum_iv_head;
+   csum_iv[0].x[0] = htonl(conn->epoch);
+   csum_iv[0].x[1] = htonl(conn->cid);
+   csum_iv[1].x[0] = 0;
+   csum_iv[1].x[1] = htonl(conn->security_ix);
 
-   sg_init_one(&sg[0], &tmpbuf, sizeof(tmpbuf));
-   sg_init_one(&sg[1], &tmpbuf, sizeof(tmpbuf));
+   sg_init_one(&sg, csum_iv, 16);
 
skcipher_request_set_tfm(req, conn->cipher);
skcipher_request_set_callback(req, 0, NULL, NULL);
-   skcipher_request_set_crypt(req, &sg[1], &sg[0], sizeof(tmpbuf), iv.x);
+   skcipher_request_set_crypt(req, &sg, &sg, 16, iv.x);
 
crypto_skcipher_encrypt(req);
skcipher_request_zero(req);
 
-   memcpy(&conn->csum_iv, &tmpbuf.x[2], sizeof(conn->csum_iv));
-   ASSERTCMP((u32 __force)conn->csum_iv.n[0], ==, (u32 
__force)tmpbuf.x[2]);
-
_leave("");
 }
 
@@ -150,12 +145,9 @@ static int rxkad_secure_packet_auth(const struct 
rxrpc_call *call,
 {
struct rxrpc_skb_priv *sp;
SKCIPHER_REQUEST_ON_STACK(req, call->conn->cipher);
+   struct rxkad_level1_hdr hdr;
struct rxrpc_crypt iv;
-   struct scatterlist sg[2];
-   struct {
-   struct rxkad_level1_hdr hdr;
-   __be32  first;  /* first four bytes of data and padding */
-   } tmpbuf __attribute__((aligned(8))); /* must all be in same page */
+   struct scatterlist sg;
u16 check;
 
sp = rxrpc_skb(skb);
@@ -165,24 +157,21 @@ static int rxkad_secure_packet_auth(const struct 
rxrpc_call *call,
check = sp->hdr.seq ^ sp->hdr.callNumber;
data_size |= (u32)check << 16;
 
-   tmpbuf.hdr.data_size = htonl(data_size);
-   memcpy(&tmpbuf.first, sechdr + 4, sizeof(tmpbuf.first));
+   hdr.data_size = htonl(data_size);
+   memcpy(sechdr, &hdr, sizeof(hdr));
 
/* start the encryption afresh */
memset(&iv, 0, sizeof(iv));
 
-   sg_init_one(&sg[0], &tmpbuf, sizeof(tmpbuf));
-   sg_init_one(&sg[1], &tmpbuf, sizeof(tmpbuf));
+   sg_init_one(&sg, sechdr, 8);
 
skcipher_request_set_tfm(req, call->conn->cipher);
skcipher_request_set_callback(req, 0, NULL, NULL);
-   skcipher_request_set_crypt(req, &sg[1], &sg[0], sizeof(tmpbuf), iv.x);
+   skcipher_request_set_crypt(req, &sg, &sg, 8, iv.x);
 
crypto_skcipher_encry