[PATCH] crypto: rsa: rename two rsa key files

2016-11-03 Thread yanjiang.jin
From: Yanjiang Jin 

This eliminates the following compile error:

crypto/rsa_helper.c:19:29: fatal error: rsaprivkey-asn1.h: No such file or directory
 #include "rsaprivkey-asn1.h"
 ^
compilation terminated.

Signed-off-by: Yanjiang Jin 
---
 crypto/{rsaprivkey.asn1 => rsaprivkey.asn1.h} | 0
 crypto/{rsapubkey.asn1 => rsapubkey.asn1.h}   | 0
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename crypto/{rsaprivkey.asn1 => rsaprivkey.asn1.h} (100%)
 rename crypto/{rsapubkey.asn1 => rsapubkey.asn1.h} (100%)

diff --git a/crypto/rsaprivkey.asn1 b/crypto/rsaprivkey.asn1.h
similarity index 100%
rename from crypto/rsaprivkey.asn1
rename to crypto/rsaprivkey.asn1.h
diff --git a/crypto/rsapubkey.asn1 b/crypto/rsapubkey.asn1.h
similarity index 100%
rename from crypto/rsapubkey.asn1
rename to crypto/rsapubkey.asn1.h
-- 
2.9.3



Re: vmalloced stacks and scatterwalk_map_and_copy()

2016-11-03 Thread Andy Lutomirski
On Thu, Nov 3, 2016 at 4:10 PM, Eric Biggers  wrote:
> On Thu, Nov 03, 2016 at 02:12:07PM -0700, Eric Biggers wrote:
>> On Thu, Nov 03, 2016 at 01:30:49PM -0700, Andy Lutomirski wrote:
>> >
>> > Also, Herbert, it seems like the considerable majority of the crypto
>> > code is acting on kernel virtual memory addresses and does software
>> > processing.  Would it perhaps make sense to add a kvec-based or
>> > iov_iter-based interface to the crypto code?  I bet it would be quite
>> > a bit faster and it would make crypto on stack buffers work directly.
>>
>> I'd like to hear Herbert's opinion on this too, but as I understand it, if a
>> symmetric cipher API operating on virtual addresses was added, similar to the
>> existing "shash" API, it would only allow software processing.  Whereas with the
>> current API you can request a transform and use it the same way regardless of
>> whether the crypto framework has chosen a software or hardware implementation,
>> or a combination thereof.  If this wasn't a concern then I expect using virtual
>> addresses would indeed simplify things a lot, at least for users not already
>> working with physical memory (struct page).
>>
>> Either way, in the near term it looks like 4.9 will be released with the new
>> behavior that encryption/decryption is not supported on stack buffers.
>> Separately from the scatterwalk_map_and_copy() issue, today I've found two
>> places in the filesystem-level encryption code that do encryption on stack
>> buffers and therefore hit the 'BUG_ON(!virt_addr_valid(buf));' in sg_set_buf().
>> I will be sending patches to fix these, but I suspect there may be more crypto
>> API users elsewhere that have this same problem.
>>
>> Eric
>
> [Added linux-mm to Cc]
>
> For what it's worth, grsecurity has a special case to allow a scatterlist entry
> to be created from a stack buffer:
>
> static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
>                               unsigned int buflen)
> {
>         const void *realbuf = buf;
>
> #ifdef CONFIG_GRKERNSEC_KSTACKOVERFLOW
>         if (object_starts_on_stack(buf))
>                 realbuf = buf - current->stack + current->lowmem_stack;
> #endif
>
> #ifdef CONFIG_DEBUG_SG
>         BUG_ON(!virt_addr_valid(realbuf));
> #endif
>         sg_set_page(sg, virt_to_page(realbuf), buflen, offset_in_page(realbuf));
> }

Yes, that's how grsecurity works.  The upstream version is going to do
it right instead of hacking around it.

> I don't know about all the relative merits of the two approaches.  But one of
> the things that will need to be done with the currently upstream approach is
> that all callers of sg_set_buf() will need to be checked to make sure they
> aren't using stack addresses, and any that are will need to be updated to do
> otherwise, e.g. by using heap-allocated memory.

I tried to do this, but I may have missed a couple of examples.

>  I suppose this is already
> happening, but in the case of the crypto API it will probably take a while for
> all the users to be identified and updated.  (And it's not always clear from the
> local context whether something can be stack memory or not, e.g. the memory for
> crypto request objects may be either.)

The crypto request objects can live on the stack just fine.  It's the
request buffers that need to live elsewhere (or the alternative
interfaces can be used, or the crypto core code can start using
something other than scatterlists).
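
To make this concrete, here is a minimal sketch (a hypothetical helper, not
code from any patch in this thread; error handling trimmed) with the request
object on the stack and the data buffer on the heap:

static int encrypt_buf(struct crypto_skcipher *tfm, u8 *iv)
{
        SKCIPHER_REQUEST_ON_STACK(req, tfm);    /* request on the stack: fine */
        struct scatterlist sg;
        u8 *buf;
        int err;

        buf = kmalloc(PAGE_SIZE, GFP_KERNEL);   /* data on the heap, not the stack */
        if (!buf)
                return -ENOMEM;

        sg_init_one(&sg, buf, PAGE_SIZE);       /* needs a linear-map address */
        skcipher_request_set_tfm(req, tfm);
        skcipher_request_set_callback(req, 0, NULL, NULL);
        skcipher_request_set_crypt(req, &sg, &sg, PAGE_SIZE, iv);
        err = crypto_skcipher_encrypt(req);

        skcipher_request_zero(req);
        kfree(buf);
        return err;
}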

--Andy


Re: vmalloced stacks and scatterwalk_map_and_copy()

2016-11-03 Thread Eric Biggers
On Thu, Nov 03, 2016 at 02:12:07PM -0700, Eric Biggers wrote:
> On Thu, Nov 03, 2016 at 01:30:49PM -0700, Andy Lutomirski wrote:
> > 
> > Also, Herbert, it seems like the considerable majority of the crypto
> > code is acting on kernel virtual memory addresses and does software
> > processing.  Would it perhaps make sense to add a kvec-based or
> > iov_iter-based interface to the crypto code?  I bet it would be quite
> > a bit faster and it would make crypto on stack buffers work directly.
> 
> I'd like to hear Herbert's opinion on this too, but as I understand it, if a
> symmetric cipher API operating on virtual addresses was added, similar to the
> existing "shash" API it would only allow software processing.  Whereas with 
> the
> current API you can request a transform and use it the same way regardless of
> whether the crypto framework has chosen a software or hardware implementation,
> or a combination thereof.  If this wasn't a concern then I expect using 
> virtual
> addresses would indeed simplify things a lot, at least for users not already
> working with physical memory (struct page).
> 
> Either way, in the near term it looks like 4.9 will be released with the new
> behavior that encryption/decryption is not supported on stack buffers.
> Separately from the scatterwalk_map_and_copy() issue, today I've found two
> places in the filesystem-level encryption code that do encryption on stack
> buffers and therefore hit the 'BUG_ON(!virt_addr_valid(buf));' in sg_set_buf().
> I will be sending patches to fix these, but I suspect there may be more crypto
> API users elsewhere that have this same problem.
> 
> Eric

[Added linux-mm to Cc]

For what it's worth, grsecurity has a special case to allow a scatterlist entry
to be created from a stack buffer:

static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
                              unsigned int buflen)
{
        const void *realbuf = buf;

#ifdef CONFIG_GRKERNSEC_KSTACKOVERFLOW
        if (object_starts_on_stack(buf))
                realbuf = buf - current->stack + current->lowmem_stack;
#endif

#ifdef CONFIG_DEBUG_SG
        BUG_ON(!virt_addr_valid(realbuf));
#endif
        sg_set_page(sg, virt_to_page(realbuf), buflen, offset_in_page(realbuf));
}

It seems to maintain two virtual mappings for each stack, a physically
contiguous one (task_struct.lowmem_stack) and a physically non-contiguous one
(task_struct.stack).  This seems different from upstream CONFIG_VMAP_STACK which
just maintains a physically non-contiguous one.

I don't know about all the relative merits of the two approaches.  But one of
the things that will need to be done with the currently upstream approach is
that all callers of sg_set_buf() will need to be checked to make sure they
aren't using stack addresses, and any that are will need to be updated to do
otherwise, e.g. by using heap-allocated memory.  I suppose this is already
happening, but in the case of the crypto API it will probably take a while for
all the users to be identified and updated.  (And it's not always clear from the
local context whether something can be stack memory or not, e.g. the memory for
crypto request objects may be either.)

Eric


Re: [PATCH] poly1305: generic C can be faster on chips with slow unaligned access

2016-11-03 Thread Jason A. Donenfeld
Hi David,

On Thu, Nov 3, 2016 at 6:08 PM, David Miller  wrote:
> In any event no piece of code should be doing 32-bit word reads from
> addresses like "x + 3" without, at a very minimum, going through the
> kernel unaligned access handlers.

Excellent point. In other words,

ctx->r[0] = (le32_to_cpuvp(key +  0) >> 0) & 0x3ffffff;
ctx->r[1] = (le32_to_cpuvp(key +  3) >> 2) & 0x3ffff03;
ctx->r[2] = (le32_to_cpuvp(key +  6) >> 4) & 0x3ffc0ff;
ctx->r[3] = (le32_to_cpuvp(key +  9) >> 6) & 0x3f03fff;
ctx->r[4] = (le32_to_cpuvp(key + 12) >> 8) & 0x00fffff;

should change to:

ctx->r[0] = (le32_to_cpuvp(key +  0) >> 0) & 0x3ffffff;
ctx->r[1] = (get_unaligned_le32(key +  3) >> 2) & 0x3ffff03;
ctx->r[2] = (get_unaligned_le32(key +  6) >> 4) & 0x3ffc0ff;
ctx->r[3] = (get_unaligned_le32(key +  9) >> 6) & 0x3f03fff;
ctx->r[4] = (le32_to_cpuvp(key + 12) >> 8) & 0x00fffff;

> We know explicitly that these offsets will not be 32-bit aligned, so
> it is required that we use the helpers, or alternatively do things to
> avoid these unaligned accesses such as using temporary storage when
> the HAVE_EFFICIENT_UNALIGNED_ACCESS kconfig value is not set.

So the question is: is the clever avoidance of unaligned accesses of
the original patch faster or slower than changing the unaligned
accesses to use the helper function?
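
For concreteness, the avoidance approach amounts to assembling each word from
byte loads, which is legal at any alignment on any architecture.  A sketch
(not necessarily what the original patch does):

static inline u32 le32_from_bytes(const u8 *p)
{
        /* byte-wise little-endian load; never performs an unaligned
         * 32-bit access, at the cost of four loads and some shifts */
        return (u32)p[0] | ((u32)p[1] << 8) |
               ((u32)p[2] << 16) | ((u32)p[3] << 24);
}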

I've put a little test harness together for playing with this:

$ git clone git://git.zx2c4.com/polybench
$ cd polybench
$ make run

To test with one method, do as normal. To test with the other, remove
"#define USE_FIRST_METHOD" from the source code.

@René: do you think you could retest on your MIPS32r2 hardware and
report back which is faster?

And if anybody else has other hardware and would like to try it, that
would be helpful.

Regards,
Jason


[PATCH 2/2] fscrypto: don't use on-stack buffer for key derivation

2016-11-03 Thread Eric Biggers
With the new (in 4.9) option to use a virtually-mapped stack
(CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for
the scatterlist crypto API because they may not be directly mappable to
struct page.  get_crypt_info() was using a stack buffer to hold the
output from the encryption operation used to derive the per-file key.
Fix it by using a heap buffer.

This bug could most easily be observed in a CONFIG_DEBUG_SG kernel, where it
triggered the BUG_ON() in sg_set_buf().

Signed-off-by: Eric Biggers 
---
 fs/crypto/keyinfo.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/crypto/keyinfo.c b/fs/crypto/keyinfo.c
index 82f0285..67fb6d8 100644
--- a/fs/crypto/keyinfo.c
+++ b/fs/crypto/keyinfo.c
@@ -185,7 +185,7 @@ int get_crypt_info(struct inode *inode)
struct crypto_skcipher *ctfm;
const char *cipher_str;
int keysize;
-   u8 raw_key[FS_MAX_KEY_SIZE];
+   u8 *raw_key = NULL;
int res;
 
res = fscrypt_initialize();
@@ -238,6 +238,15 @@ int get_crypt_info(struct inode *inode)
if (res)
goto out;
 
+   /*
+    * This cannot be a stack buffer because it is passed to the scatterlist
+    * crypto API as part of key derivation.
+    */
+   res = -ENOMEM;
+   raw_key = kmalloc(FS_MAX_KEY_SIZE, GFP_NOFS);
+   if (!raw_key)
+   goto out;
+
if (fscrypt_dummy_context_enabled(inode)) {
memset(raw_key, 0x42, FS_AES_256_XTS_KEY_SIZE);
goto got_key;
@@ -276,7 +285,8 @@ int get_crypt_info(struct inode *inode)
if (res)
goto out;
 
-   memzero_explicit(raw_key, sizeof(raw_key));
+   kzfree(raw_key);
+   raw_key = NULL;
if (cmpxchg(&inode->i_crypt_info, NULL, crypt_info) != NULL) {
put_crypt_info(crypt_info);
goto retry;
@@ -287,7 +297,7 @@ int get_crypt_info(struct inode *inode)
if (res == -ENOKEY)
res = 0;
put_crypt_info(crypt_info);
-   memzero_explicit(raw_key, sizeof(raw_key));
+   kzfree(raw_key);
return res;
 }
 
-- 
2.8.0.rc3.226.g39d4020
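
The failure mode this patch fixes reduces to roughly the following
(hypothetical fragment, assuming CONFIG_VMAP_STACK=y and CONFIG_DEBUG_SG=y):

        u8 raw_key[FS_MAX_KEY_SIZE];    /* on-stack, i.e. a vmalloc address */
        struct scatterlist sg;

        sg_init_one(&sg, raw_key, sizeof(raw_key));     /* sg_set_buf() hits
                                                           BUG_ON(!virt_addr_valid(buf)) */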



[PATCH 1/2] fscrypto: don't use on-stack buffer for filename encryption

2016-11-03 Thread Eric Biggers
With the new (in 4.9) option to use a virtually-mapped stack
(CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for
the scatterlist crypto API because they may not be directly mappable to
struct page.  For short filenames, fname_encrypt() was encrypting a
stack buffer holding the padded filename.  Fix it by encrypting the
filename in-place in the output buffer, thereby making the temporary
buffer unnecessary.

This bug could most easily be observed in a CONFIG_DEBUG_SG kernel, where it
triggered the BUG_ON() in sg_set_buf().

Signed-off-by: Eric Biggers 
---
 fs/crypto/fname.c | 53 +++++++++++++++++++++--------------------------------
 1 file changed, 21 insertions(+), 32 deletions(-)

diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 9a28133..9b774f4 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -39,65 +39,54 @@ static void fname_crypt_complete(struct crypto_async_request *req, int res)
 static int fname_encrypt(struct inode *inode,
const struct qstr *iname, struct fscrypt_str *oname)
 {
-   u32 ciphertext_len;
struct skcipher_request *req = NULL;
DECLARE_FS_COMPLETION_RESULT(ecr);
struct fscrypt_info *ci = inode->i_crypt_info;
struct crypto_skcipher *tfm = ci->ci_ctfm;
int res = 0;
char iv[FS_CRYPTO_BLOCK_SIZE];
-   struct scatterlist src_sg, dst_sg;
+   struct scatterlist sg;
int padding = 4 << (ci->ci_flags & FS_POLICY_FLAGS_PAD_MASK);
-   char *workbuf, buf[32], *alloc_buf = NULL;
-   unsigned lim;
+   unsigned int lim;
+   unsigned int cryptlen;
 
lim = inode->i_sb->s_cop->max_namelen(inode);
if (iname->len <= 0 || iname->len > lim)
return -EIO;
 
-   ciphertext_len = max(iname->len, (u32)FS_CRYPTO_BLOCK_SIZE);
-   ciphertext_len = round_up(ciphertext_len, padding);
-   ciphertext_len = min(ciphertext_len, lim);
+   /*
+    * Copy the filename to the output buffer for encrypting in-place and
+    * pad it with the needed number of NUL bytes.
+    */
+   cryptlen = max_t(unsigned int, iname->len, FS_CRYPTO_BLOCK_SIZE);
+   cryptlen = round_up(cryptlen, padding);
+   cryptlen = min(cryptlen, lim);
+   memcpy(oname->name, iname->name, iname->len);
+   memset(oname->name + iname->len, 0, cryptlen - iname->len);
 
-   if (ciphertext_len <= sizeof(buf)) {
-   workbuf = buf;
-   } else {
-   alloc_buf = kmalloc(ciphertext_len, GFP_NOFS);
-   if (!alloc_buf)
-   return -ENOMEM;
-   workbuf = alloc_buf;
-   }
+   /* Initialize the IV */
+   memset(iv, 0, FS_CRYPTO_BLOCK_SIZE);
 
-   /* Allocate request */
+   /* Set up the encryption request */
req = skcipher_request_alloc(tfm, GFP_NOFS);
if (!req) {
printk_ratelimited(KERN_ERR
-   "%s: crypto_request_alloc() failed\n", __func__);
-   kfree(alloc_buf);
+   "%s: skcipher_request_alloc() failed\n", __func__);
return -ENOMEM;
}
skcipher_request_set_callback(req,
CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
fname_crypt_complete, &ecr);
+   sg_init_one(&sg, oname->name, cryptlen);
+   skcipher_request_set_crypt(req, &sg, &sg, cryptlen, iv);
 
-   /* Copy the input */
-   memcpy(workbuf, iname->name, iname->len);
-   if (iname->len < ciphertext_len)
-   memset(workbuf + iname->len, 0, ciphertext_len - iname->len);
-
-   /* Initialize IV */
-   memset(iv, 0, FS_CRYPTO_BLOCK_SIZE);
-
-   /* Create encryption request */
-   sg_init_one(&src_sg, workbuf, ciphertext_len);
-   sg_init_one(&dst_sg, oname->name, ciphertext_len);
-   skcipher_request_set_crypt(req, &src_sg, &dst_sg, ciphertext_len, iv);
+   /* Do the encryption */
res = crypto_skcipher_encrypt(req);
if (res == -EINPROGRESS || res == -EBUSY) {
+   /* Request is being completed asynchronously; wait for it */
wait_for_completion(&ecr.completion);
res = ecr.res;
}
-   kfree(alloc_buf);
skcipher_request_free(req);
if (res < 0) {
printk_ratelimited(KERN_ERR
@@ -105,7 +94,7 @@ static int fname_encrypt(struct inode *inode,
return res;
}
 
-   oname->len = ciphertext_len;
+   oname->len = cryptlen;
return 0;
 }
 
-- 
2.8.0.rc3.226.g39d4020
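
The in-place pattern used above, in minimal form (a sketch; buf, req and iv
stand for the patch's output buffer, request and IV):

        struct scatterlist sg;

        sg_init_one(&sg, buf, len);                             /* one buffer */
        skcipher_request_set_crypt(req, &sg, &sg, len, iv);     /* src == dst */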



Re: vmalloced stacks and scatterwalk_map_and_copy()

2016-11-03 Thread Eric Biggers
On Thu, Nov 03, 2016 at 01:30:49PM -0700, Andy Lutomirski wrote:
> 
> Also, Herbert, it seems like the considerable majority of the crypto
> code is acting on kernel virtual memory addresses and does software
> processing.  Would it perhaps make sense to add a kvec-based or
> iov_iter-based interface to the crypto code?  I bet it would be quite
> a bit faster and it would make crypto on stack buffers work directly.

I'd like to hear Herbert's opinion on this too, but as I understand it, if a
symmetric cipher API operating on virtual addresses was added, similar to the
existing "shash" API, it would only allow software processing.  Whereas with the
current API you can request a transform and use it the same way regardless of
whether the crypto framework has chosen a software or hardware implementation,
or a combination thereof.  If this wasn't a concern then I expect using virtual
addresses would indeed simplify things a lot, at least for users not already
working with physical memory (struct page).
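
(For context, a sketch of how the existing "shash" interface already takes
virtual addresses directly; error handling elided:)

        struct crypto_shash *tfm = crypto_alloc_shash("sha256", 0, 0);
        SHASH_DESC_ON_STACK(desc, tfm);
        u8 out[32];     /* SHA256_DIGEST_SIZE */

        desc->tfm = tfm;
        desc->flags = 0;
        crypto_shash_digest(desc, buf, len, out);       /* buf may be any kernel
                                                           VA, even the stack */
        crypto_free_shash(tfm);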

Either way, in the near term it looks like 4.9 will be released with the new
behavior that encryption/decryption is not supported on stack buffers.
Separately from the scatterwalk_map_and_copy() issue, today I've found two
places in the filesystem-level encryption code that do encryption on stack
buffers and therefore hit the 'BUG_ON(!virt_addr_valid(buf));' in sg_set_buf().
I will be sending patches to fix these, but I suspect there may be more crypto
API users elsewhere that have this same problem.

Eric


Re: vmalloced stacks and scatterwalk_map_and_copy()

2016-11-03 Thread Andy Lutomirski
On Thu, Nov 3, 2016 at 11:16 AM, Eric Biggers  wrote:
> Hello,
>
> I hit the BUG_ON() in arch/x86/mm/physaddr.c:26 while testing some crypto code
> in an x86_64 kernel with CONFIG_DEBUG_VIRTUAL=y and CONFIG_VMAP_STACK=y:
>
> /* carry flag will be set if starting x was >= PAGE_OFFSET */
> VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));
>
> The problem is the following code in scatterwalk_map_and_copy() in
> crypto/scatterwalk.c, which tries to determine if the buffer passed in aliases
> the physical memory of the first segment of the scatterlist:
>
> if (sg_page(sg) == virt_to_page(buf) &&
> sg->offset == offset_in_page(buf))
> return;

...

>
> Currently I think the best solution would be to require that callers to
> scatterwalk_map_and_copy() do not alias their source and destination.  Then the
> alias check could be removed.  This check has only been there since v4.2 (commit
> 74412fd5d71b6), so I'd hope not many callers rely on the behavior.  I'm not sure
> exactly which ones do, though.
>
> Thoughts on this?

The relevant commit is:

commit 74412fd5d71b6eda0beb302aa467da000f0d530c
Author: Herbert Xu 
Date:   Thu May 21 15:11:12 2015 +0800

crypto: scatterwalk - Check for same address in map_and_copy

This patch adds a check in scatterwalk_map_and_copy to avoid
copying from the same address to the same address.  This is going
to be used for IV copying in AEAD IV generators.

There is no provision for partial overlaps.

This patch also uses the new scatterwalk_ffwd instead of doing
it by hand in scatterwalk_map_and_copy.

Signed-off-by: Herbert Xu 

Herbert, can you clarify this?  The check seems rather bizarre --
you're doing an incomplete check for aliasing and skipping the whole
copy if the beginning aliases.  In any event the stack *can't*
reasonably alias the scatterlist because a scatterlist can't safely
point to the stack.  Is there any code that actually relies on the
aliasing-detecting behavior?

Also, Herbert, it seems like the considerable majority of the crypto
code is acting on kernel virtual memory addresses and does software
processing.  Would it perhaps make sense to add a kvec-based or
iov_iter-based interface to the crypto code?  I bet it would be quite
a bit faster and it would make crypto on stack buffers work directly.


[PATCH] crypto: gf128mul - remove dead gf128mul_64k_lle code

2016-11-03 Thread Alex Cope
This code is unlikely to be useful in the future because transforms
don't know how often keys will be changed, new algorithms are unlikely
to use lle representation, and tables should be replaced with
carryless multiplication instructions when available.

Signed-off-by: Alex Cope 
---
 crypto/gf128mul.c         | 55 -------------------------------------------------------
 include/crypto/gf128mul.h | 13 ++++++-------
 2 files changed, 6 insertions(+), 62 deletions(-)

diff --git a/crypto/gf128mul.c b/crypto/gf128mul.c
index 5276607..57c85dd 100644
--- a/crypto/gf128mul.c
+++ b/crypto/gf128mul.c
@@ -263,48 +263,6 @@ EXPORT_SYMBOL(gf128mul_bbe);
  * t[1][BYTE] contains g*x^8*BYTE
  *  ..
  * t[15][BYTE] contains g*x^120*BYTE */
-struct gf128mul_64k *gf128mul_init_64k_lle(const be128 *g)
-{
- struct gf128mul_64k *t;
- int i, j, k;
-
- t = kzalloc(sizeof(*t), GFP_KERNEL);
- if (!t)
- goto out;
-
- for (i = 0; i < 16; i++) {
- t->t[i] = kzalloc(sizeof(*t->t[i]), GFP_KERNEL);
- if (!t->t[i]) {
- gf128mul_free_64k(t);
- t = NULL;
- goto out;
- }
- }
-
- t->t[0]->t[128] = *g;
- for (j = 64; j > 0; j >>= 1)
- gf128mul_x_lle(&t->t[0]->t[j], &t->t[0]->t[j + j]);
-
- for (i = 0;;) {
- for (j = 2; j < 256; j += j)
- for (k = 1; k < j; ++k)
- be128_xor(&t->t[i]->t[j + k],
-  &t->t[i]->t[j], &t->t[i]->t[k]);
-
- if (++i >= 16)
- break;
-
- for (j = 128; j > 0; j >>= 1) {
- t->t[i]->t[j] = t->t[i - 1]->t[j];
- gf128mul_x8_lle(&t->t[i]->t[j]);
- }
- }
-
-out:
- return t;
-}
-EXPORT_SYMBOL(gf128mul_init_64k_lle);
-
 struct gf128mul_64k *gf128mul_init_64k_bbe(const be128 *g)
 {
  struct gf128mul_64k *t;
@@ -357,19 +315,6 @@ void gf128mul_free_64k(struct gf128mul_64k *t)
 }
 EXPORT_SYMBOL(gf128mul_free_64k);

-void gf128mul_64k_lle(be128 *a, struct gf128mul_64k *t)
-{
- u8 *ap = (u8 *)a;
- be128 r[1];
- int i;
-
- *r = t->t[0]->t[ap[0]];
- for (i = 1; i < 16; ++i)
- be128_xor(r, r, &t->t[i]->t[ap[i]]);
- *a = *r;
-}
-EXPORT_SYMBOL(gf128mul_64k_lle);
-
 void gf128mul_64k_bbe(be128 *a, struct gf128mul_64k *t)
 {
  u8 *ap = (u8 *)a;
diff --git a/include/crypto/gf128mul.h b/include/crypto/gf128mul.h
index da2530e..b611aa9 100644
--- a/include/crypto/gf128mul.h
+++ b/include/crypto/gf128mul.h
@@ -181,20 +181,19 @@ static inline void gf128mul_free_4k(struct gf128mul_4k *t)
 }


-/* 64k table optimization, implemented for lle and bbe */
+/* 64k table optimization, implemented for bbe */

 struct gf128mul_64k {
  struct gf128mul_4k *t[16];
 };

-/* first initialize with the constant factor with which you
- * want to multiply and then call gf128_64k_lle with the other
- * factor in the first argument, the table in the second and a
- * scratch register in the third. Afterwards *a = *r. */
-struct gf128mul_64k *gf128mul_init_64k_lle(const be128 *g);
+/* First initialize with the constant factor with which you
+ * want to multiply and then call gf128mul_64k_bbe with the other
+ * factor in the first argument, and the table in the second.
+ * Afterwards, the result is stored in *a.
+ */
 struct gf128mul_64k *gf128mul_init_64k_bbe(const be128 *g);
 void gf128mul_free_64k(struct gf128mul_64k *t);
-void gf128mul_64k_lle(be128 *a, struct gf128mul_64k *t);
 void gf128mul_64k_bbe(be128 *a, struct gf128mul_64k *t);

 #endif /* _CRYPTO_GF128MUL_H */
-- 
2.8.0.rc3.226.g39d4020
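
For reference, a usage sketch of the surviving bbe interface, following the
header comment above (error handling minimal):

        struct gf128mul_64k *t;
        be128 a, g;

        /* g holds the constant factor, a the other factor */
        t = gf128mul_init_64k_bbe(&g);
        if (!t)
                return -ENOMEM;
        gf128mul_64k_bbe(&a, t);        /* a = a * g in GF(2^128), bbe order */
        gf128mul_free_64k(t);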


vmalloced stacks and scatterwalk_map_and_copy()

2016-11-03 Thread Eric Biggers
Hello,

I hit the BUG_ON() in arch/x86/mm/physaddr.c:26 while testing some crypto code
in an x86_64 kernel with CONFIG_DEBUG_VIRTUAL=y and CONFIG_VMAP_STACK=y:

/* carry flag will be set if starting x was >= PAGE_OFFSET */
VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));

The problem is the following code in scatterwalk_map_and_copy() in
crypto/scatterwalk.c, which tries to determine if the buffer passed in aliases
the physical memory of the first segment of the scatterlist:

if (sg_page(sg) == virt_to_page(buf) &&
sg->offset == offset_in_page(buf))
return;

If 'buf' is on the stack with CONFIG_VMAP_STACK=y it will be a vmalloc address.
And virt_to_page(buf) does not have a meaningful behavior on vmalloc addresses.
Hence the BUG.
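
(For illustration, a vmalloc address has to be translated with a page-table
walk instead, roughly:)

        struct page *page = is_vmalloc_addr(addr) ? vmalloc_to_page(addr)
                                                  : virt_to_page(addr);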

I don't think this should be fixed by forbidding passing a stack address to
scatterwalk_map_and_copy().  Not only are there a number of callers that
explicitly use stack addresses (e.g. poly_verify_tag() in
crypto/chacha20poly1305.c) but it's also possible for a whole skcipher_request
to be allocated on the stack with the SKCIPHER_REQUEST_ON_STACK() macro.  Note
that this has nothing to do with DMA per se.

Another solution would be to make scatterwalk_map_and_copy() explicitly check
for virt_addr_valid().  But this would make the behavior of
scatterwalk_map_and_copy() confusing because it would detect aliasing but only
under some circumstances, and it would depend on the kernel config.
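
A sketch of what such a guarded check might look like:

        /* hypothetical variant of the check in scatterwalk_map_and_copy() */
        if (virt_addr_valid(buf) &&
            sg_page(sg) == virt_to_page(buf) &&
            sg->offset == offset_in_page(buf))
                return;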

Currently I think the best solution would be to require that callers to
scatterwalk_map_and_copy() do not alias their source and destination.  Then the
alias check could be removed.  This check has only been there since v4.2 (commit
74412fd5d71b6), so I'd hope not many callers rely on the behavior.  I'm not sure
exactly which ones do, though.

Thoughts on this?

Eric


Re: [PATCH] poly1305: generic C can be faster on chips with slow unaligned access

2016-11-03 Thread David Miller
From: "Jason A. Donenfeld" 
Date: Thu, 3 Nov 2016 08:24:57 +0100

> Hi Herbert,
> 
> On Thu, Nov 3, 2016 at 1:49 AM, Herbert Xu  wrote:
>> FWIW I'd rather live with a 6% slowdown than having two different
>> code paths in the generic code.  Anyone who cares about 6% would
>> be much better off writing an assembly version of the code.
> 
> Please think twice before deciding that the generic C "is allowed to
> be slow".

In any event no piece of code should be doing 32-bit word reads from
addresses like "x + 3" without, at a very minimum, going through the
kernel unaligned access handlers.

Yet that is what the generic C poly1305 code is doing, all over the
place.

We know explicitly that these offsets will not be 32-bit aligned, so
it is required that we use the helpers, or alternatively do things to
avoid these unaligned accesses such as using temporary storage when
the HAVE_EFFICIENT_UNALIGNED_ACCESS kconfig value is not set.
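
For reference, the helpers live in asm/unaligned.h; a minimal sketch:

        #include <asm/unaligned.h>

        u32 v = get_unaligned_le32(key + 3);    /* a plain load where
                                                   HAVE_EFFICIENT_UNALIGNED_ACCESS,
                                                   byte-wise access otherwise */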


Re: [PATCH] poly1305: generic C can be faster on chips with slow unaligned access

2016-11-03 Thread Jason A. Donenfeld
Hi Herbert,

On Thu, Nov 3, 2016 at 1:49 AM, Herbert Xu  wrote:
> FWIW I'd rather live with a 6% slowdown than having two different
> code paths in the generic code.  Anyone who cares about 6% would
> be much better off writing an assembly version of the code.

Please think twice before deciding that the generic C "is allowed to
be slow". It turns out to be used far more often than might be
obvious. For example, crypto is commonly done on the netdev layer --
like the case with mac80211-based drivers. At this layer, the FPU on
x86 isn't always available, depending on the path used. Some
combinations of drivers, packet family, and workload can result in the
generic C being used instead of the vectorized assembly for a massive
percentage of time. So, I think we do have a good motivation for
wanting the generic C to be as fast as possible.

In the particular case of poly1305, these are the only spots where
unaligned accesses take place, they're rather small, and I think it's
pretty obvious from a quick glance what's happening in the two code
paths. This isn't the "two different paths" case that carries a
significant future maintenance burden.

Jason