Re: [PATCH 0/3] padata cpu awareness fixes

2017-10-02 Thread Mathias Krause
On 12 September 2017 at 11:07, Steffen Klassert
 wrote:
> On Fri, Sep 08, 2017 at 08:57:08PM +0200, Mathias Krause wrote:
>> Hi Steffen, Herbert,
>>
>> this series solves multiple issues of padata I ran into while trying to
>> make use of pcrypt for IPsec.
>>
>> The first issue is that the reorder timer callback may run on a CPU that
>> isn't part of the parallel set, as it runs on the CPU where the timer
>> interrupt gets handled. As a result, padata_reorder() may run on a CPU
>> with uninitialized per-cpu parallel queues, so the __this_cpu_read() in
>> padata_get_next() refers to an uninitialized cpu_index. However, as
>> per-cpu memory gets memset(0) at allocation time, it'll read 0. If the
>> next CPU index we're waiting for is 0 too, the compare will succeed and
>> we'll return with -ENODATA, making padata_reorder() think there's
>> nothing to do, stop any pending timer and return immediately. This is
>> wrong as the pending timer might have been the one to trigger the needed
>> reordering which, now that the timer was canceled, will never happen --
>> at least if no other parallel requests arrive afterwards.
>>
>> That issue is addressed with the first two patches, ensuring we don't
>> use a bogus cpu_index value for the compare and thereby don't wrongly
>> cancel a pending timer. The second patch then ensures the reorder timer
>> callback runs on the correct CPU or, at least, on a CPU from the
>> parallel set to generate forward progress.
>>
>> The third patch addresses a similar issue for the serialization
>> callback. We implicitly require padata_do_serial() to be called on the
>> very same CPU padata_do_parallel() was called on to ensure the correct
>> ordering of requests -- and correct functioning of padata, even. If the
>> algorithm we're parallelizing is asynchronous itself, that invariant
>> might not be true, as, e.g., crypto hardware may execute the callback on
>> the CPU its interrupt gets handled on, which isn't necessarily the one
>> the request got issued on.
>>
>> To handle that issue we track the CPU we're supposed to handle the
>> request on and ensure we'll call padata_reorder() on that CPU when
>> padata_do_serial() gets called -- either by already running on the
>> correct CPU or by deferring the call to a worker.
>>
>> Please apply!
>>
>> Mathias Krause (3):
>>   padata: set cpu_index of unused CPUs to -1
>>   padata: ensure the reorder timer callback runs on the correct CPU
>>   padata: ensure padata_do_serial() runs on the correct CPU
>
> Looks good, thanks!
>
> Acked-by: Steffen Klassert 

Ping! Herbert, will these patches go through your tree or Steffen's?


[PATCH 3/3] padata: ensure padata_do_serial() runs on the correct CPU

2017-09-08 Thread Mathias Krause
If the algorithm we're parallelizing is asynchronous we might change
CPUs between padata_do_parallel() and padata_do_serial(). However, we
don't expect this to happen as we need to enqueue the padata object into
the per-cpu reorder queue we took it from, i.e. the same-cpu's parallel
queue.

Ensure we're not switching CPUs for a given padata object by tracking
the CPU within the padata object. If the serial callback gets called on
the wrong CPU, defer invoking padata_reorder() via a kernel worker on
the CPU we're expected to run on.

Signed-off-by: Mathias Krause 
---
 include/linux/padata.h |2 ++
 kernel/padata.c|   20 +++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 5c0175bbc179..5d13d25da2c8 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -37,6 +37,7 @@
  * @list: List entry, to attach to the padata lists.
  * @pd: Pointer to the internal control structure.
  * @cb_cpu: Callback cpu for serializatioon.
+ * @cpu: Cpu for parallelization.
  * @seq_nr: Sequence number of the parallelized data object.
  * @info: Used to pass information from the parallel to the serial function.
  * @parallel: Parallel execution function.
@@ -46,6 +47,7 @@ struct padata_priv {
struct list_headlist;
struct parallel_data*pd;
int cb_cpu;
+   int cpu;
int info;
void(*parallel)(struct padata_priv *padata);
void(*serial)(struct padata_priv *padata);
diff --git a/kernel/padata.c b/kernel/padata.c
index b4066147bce4..f262c9a4e70a 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -131,6 +131,7 @@ int padata_do_parallel(struct padata_instance *pinst,
padata->cb_cpu = cb_cpu;
 
target_cpu = padata_cpu_hash(pd);
+   padata->cpu = target_cpu;
queue = per_cpu_ptr(pd->pqueue, target_cpu);
 
spin_lock(&queue->parallel.lock);
@@ -363,10 +364,21 @@ void padata_do_serial(struct padata_priv *padata)
int cpu;
struct padata_parallel_queue *pqueue;
struct parallel_data *pd;
+   int reorder_via_wq = 0;
 
pd = padata->pd;
 
cpu = get_cpu();
+
+   /* We need to run on the same CPU padata_do_parallel(.., padata, ..)
+* was called on -- or, at least, enqueue the padata object into the
+* correct per-cpu queue.
+*/
+   if (cpu != padata->cpu) {
+   reorder_via_wq = 1;
+   cpu = padata->cpu;
+   }
+
pqueue = per_cpu_ptr(pd->pqueue, cpu);
 
spin_lock(&pqueue->reorder.lock);
@@ -376,7 +388,13 @@ void padata_do_serial(struct padata_priv *padata)
 
put_cpu();
 
-   padata_reorder(pd);
+   /* If we're running on the wrong CPU, call padata_reorder() via a
+* kernel worker.
+*/
+   if (reorder_via_wq)
+   queue_work_on(cpu, pd->pinst->wq, &pqueue->reorder_work);
+   else
+   padata_reorder(pd);
 }
 EXPORT_SYMBOL(padata_do_serial);
 
-- 
1.7.10.4



[PATCH 2/3] padata: ensure the reorder timer callback runs on the correct CPU

2017-09-08 Thread Mathias Krause
The reorder timer function runs on the CPU where the timer interrupt was
handled, which is not necessarily one of the CPUs of the 'pcpu' CPU mask
set.

Ensure the padata_reorder() callback runs on the correct CPU, which is
one in the 'pcpu' CPU mask set and, preferably, the next expected one.
Do so by comparing the current CPU with the expected target CPU. If they
match, call padata_reorder() right away. If they differ, schedule a work
item on the target CPU that does the padata_reorder() call for us.

Signed-off-by: Mathias Krause 
---
 include/linux/padata.h |2 ++
 kernel/padata.c|   43 ++-
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 2f9c1f93b1ce..5c0175bbc179 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -85,6 +85,7 @@ struct padata_serial_queue {
  * @swork: work struct for serialization.
  * @pd: Backpointer to the internal control structure.
  * @work: work struct for parallelization.
+ * @reorder_work: work struct for reordering.
  * @num_obj: Number of objects that are processed by this cpu.
  * @cpu_index: Index of the cpu.
  */
@@ -93,6 +94,7 @@ struct padata_parallel_queue {
struct padata_listreorder;
struct parallel_data *pd;
struct work_structwork;
+   struct work_structreorder_work;
atomic_t  num_obj;
int   cpu_index;
 };
diff --git a/kernel/padata.c b/kernel/padata.c
index 1b9b4bac4a9b..b4066147bce4 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -275,11 +275,51 @@ static void padata_reorder(struct parallel_data *pd)
return;
 }
 
+static void invoke_padata_reorder(struct work_struct *work)
+{
+   struct padata_parallel_queue *pqueue;
+   struct parallel_data *pd;
+
+   local_bh_disable();
+   pqueue = container_of(work, struct padata_parallel_queue, reorder_work);
+   pd = pqueue->pd;
+   padata_reorder(pd);
+   local_bh_enable();
+}
+
 static void padata_reorder_timer(unsigned long arg)
 {
struct parallel_data *pd = (struct parallel_data *)arg;
+   unsigned int weight;
+   int target_cpu, cpu;
 
-   padata_reorder(pd);
+   cpu = get_cpu();
+
+   /* We don't lock pd here to not interfere with parallel processing
+* padata_reorder() calls on other CPUs. We just need any CPU out of
+* the cpumask.pcpu set. It would be nice if it's the right one but
+* it doesn't matter if we're off to the next one by using an outdated
+* pd->processed value.
+*/
+   weight = cpumask_weight(pd->cpumask.pcpu);
+   target_cpu = padata_index_to_cpu(pd, pd->processed % weight);
+
+   /* ensure to call the reorder callback on the correct CPU */
+   if (cpu != target_cpu) {
+   struct padata_parallel_queue *pqueue;
+   struct padata_instance *pinst;
+
+   /* The timer function is serialized wrt itself -- no locking
+* needed.
+*/
+   pinst = pd->pinst;
+   pqueue = per_cpu_ptr(pd->pqueue, target_cpu);
+   queue_work_on(target_cpu, pinst->wq, &pqueue->reorder_work);
+   } else {
+   padata_reorder(pd);
+   }
+
+   put_cpu();
 }
 
 static void padata_serial_worker(struct work_struct *serial_work)
@@ -399,6 +439,7 @@ static void padata_init_pqueues(struct parallel_data *pd)
__padata_list_init(&pqueue->reorder);
__padata_list_init(&pqueue->parallel);
INIT_WORK(&pqueue->work, padata_parallel_worker);
+   INIT_WORK(&pqueue->reorder_work, invoke_padata_reorder);
atomic_set(&pqueue->num_obj, 0);
}
 }
-- 
1.7.10.4



[PATCH 1/3] padata: set cpu_index of unused CPUs to -1

2017-09-08 Thread Mathias Krause
The parallel queue per-cpu data structure gets initialized only for CPUs
in the 'pcpu' CPU mask set. This is not sufficient as the reorder timer
may run on a different CPU and might wrongly decide it's the target CPU
for the next reorder item, as per-cpu memory gets memset(0) and we might
be waiting for the first CPU in cpumask.pcpu, i.e. cpu_index 0.

Make the '__this_cpu_read(pd->pqueue->cpu_index) == next_queue->cpu_index'
compare in padata_get_next() fail in this case by initializing the
cpu_index member of all per-cpu parallel queues. Use -1 for unused ones.

Signed-off-by: Mathias Krause 
---
 kernel/padata.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index 868f947166d7..1b9b4bac4a9b 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -384,8 +384,14 @@ static void padata_init_pqueues(struct parallel_data *pd)
struct padata_parallel_queue *pqueue;
 
cpu_index = 0;
-   for_each_cpu(cpu, pd->cpumask.pcpu) {
+   for_each_possible_cpu(cpu) {
pqueue = per_cpu_ptr(pd->pqueue, cpu);
+
+   if (!cpumask_test_cpu(cpu, pd->cpumask.pcpu)) {
+   pqueue->cpu_index = -1;
+   continue;
+   }
+
pqueue->pd = pd;
pqueue->cpu_index = cpu_index;
cpu_index++;
-- 
1.7.10.4



[PATCH 0/3] padata cpu awareness fixes

2017-09-08 Thread Mathias Krause
Hi Steffen, Herbert,

this series solves multiple issues of padata I ran into while trying to
make use of pcrypt for IPsec.

The first issue is that the reorder timer callback may run on a CPU that
isn't part of the parallel set, as it runs on the CPU where the timer
interrupt gets handled. As a result, padata_reorder() may run on a CPU
with uninitialized per-cpu parallel queues, so the __this_cpu_read() in
padata_get_next() refers to an uninitialized cpu_index. However, as
per-cpu memory gets memset(0) at allocation time, it'll read 0. If the
next CPU index we're waiting for is 0 too, the compare will succeed and
we'll return with -ENODATA, making padata_reorder() think there's
nothing to do, stop any pending timer and return immediately. This is
wrong as the pending timer might have been the one to trigger the needed
reordering which, now that the timer was canceled, will never happen --
at least if no other parallel requests arrive afterwards.

That issue is addressed with the first two patches, ensuring we don't
use a bogus cpu_index value for the compare and thereby don't wrongly
cancel a pending timer. The second patch then ensures the reorder timer
callback runs on the correct CPU or, at least, on a CPU from the
parallel set to generate forward progress.
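
In code terms, the reorder timer callback (see patch 2/3 above) now boils
down to:

  cpu = get_cpu();
  weight = cpumask_weight(pd->cpumask.pcpu);
  target_cpu = padata_index_to_cpu(pd, pd->processed % weight);
  if (cpu != target_cpu) {        /* timer fired on a foreign CPU */
          pqueue = per_cpu_ptr(pd->pqueue, target_cpu);
          queue_work_on(target_cpu, pd->pinst->wq, &pqueue->reorder_work);
  } else {
          padata_reorder(pd);
  }
  put_cpu();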

The third patch addresses a similar issue for the serialization
callback. We implicitly require padata_do_serial() to be called on the
very same CPU padata_do_parallel() was called on to ensure the correct
ordering of requests -- and correct functioning of padata, even. If the
algorithm we're parallelizing is asynchronous itself, that invariant
might not be true, as, e.g., crypto hardware may execute the callback on
the CPU its interrupt gets handled on, which isn't necessarily the one
the request got issued on.

To handle that issue we track the CPU we're supposed to handle the
request on and ensure we'll call padata_reorder() on that CPU when
padata_do_serial() gets called -- either by already running on the
correct CPU or by deferring the call to a worker.
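
In code terms, padata_do_serial() (patch 3/3 above) then does roughly:

  cpu = get_cpu();
  if (cpu != padata->cpu) {       /* async callback moved us elsewhere */
          reorder_via_wq = 1;
          cpu = padata->cpu;      /* the CPU the object was queued on */
  }
  pqueue = per_cpu_ptr(pd->pqueue, cpu);
  /* ... enqueue padata into pqueue->reorder ... */
  put_cpu();
  if (reorder_via_wq)             /* defer to a worker on the right CPU */
          queue_work_on(cpu, pd->pinst->wq, &pqueue->reorder_work);
  else
          padata_reorder(pd);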

Please apply!

Mathias Krause (3):
  padata: set cpu_index of unused CPUs to -1
  padata: ensure the reorder timer callback runs on the correct CPU
  padata: ensure padata_do_serial() runs on the correct CPU

 include/linux/padata.h |4 +++
 kernel/padata.c|   71 ++--
 2 files changed, 72 insertions(+), 3 deletions(-)

-- 
1.7.10.4



echainiv not working as supposed to be?

2016-09-06 Thread Mathias Krause
Hi Herbert,

I'm a little bit confused about how echainiv is supposed to work. I
think it's doing a few things wrong; I just can't decide what's actually
wrong, as the intended mode of operation is unclear to me. At least,
as-is, the code isn't quite operating as advertised in the comment at
the top of the file.

So, in hope you could enlighten me, here's my understanding of the
current _implementation_ of echainiv (assuming aligned req->iv, for
simplicity):

For each request it XORs the salt into the provided IV (req->iv). For
ESP the provided IV is the sequence number or, at least, parts of it.
The thus modified IV gets written into the *destination* buffer
(req->dst), which might be a different buffer from the source buffer
(not in the ESP case, though, as it passes the same sg for src and
dst). Afterwards, the per-cpu IV gets copied over into req->iv,
effectively overwriting the generated XORed value. Then the inner
crypto request gets initiated, which may finish synchronously or
asynchronously. Either way, the per-cpu IV gets updated with the new
value from subreq->iv, which should be equal to req->iv in the normal
case.
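
To make that concrete, here's my reading of the per-request flow in rough
pseudo-code (simplified, names are only illustrative -- not the actual
echainiv code):

  memcpy(iv, req->iv, ivsize);
  crypto_xor(iv, ctx->salt, ivsize);                    /* salt ^ seqno   */
  scatterwalk_map_and_copy(iv, req->dst, 0, ivsize, 1); /* into req->dst! */

  memcpy(req->iv, this_cpu_iv, ivsize);  /* per-cpu IV replaces XORed one */
  err = crypto_aead_encrypt(subreq);     /* may complete sync or async    */

  /* on completion (either way): */
  memcpy(this_cpu_iv, subreq->iv, ivsize);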

Things that are unclear to me:

1/ Why is the XORed IV written to the destination buffer and not the
source buffer? Isn't the sub-request supposed to use the IV from
either req->src or req->iv -- and especially *not* from req->dst?

2/ Why does the XORed IV get overwritten with the per-cpu IV prior to
passing it on to the sub-request, making the IV locations in req->src,
req->dst and req->iv *all* differ from each other?

Moreover, the behaviour of 2/ actually leads to the fact that the
very first IV on each CPU will be a null IV -- not a salted sequence
number. All following IVs will be the result of the (or better "some")
sub-request's IV update, stored into the per-cpu variable -- still no
salted sequence numbers, though.

That behaviour is a bug, IMHO, as this clearly differs from what is
described in the comment. However, I'm uncertain how to fix it, as
the intended mode of operation is unclear to me... Should only the
first IV of each crypto transform be the salted sequence number and
all following ones the result of the sub-request's IV update, therefore
not stored in a single per-cpu variable but in some echainiv
context-specific one?

3/ Why is echainiv using per-cpu IVs in the first place? Shouldn't
those be crypto-context specific? Currently we're happily mixing IVs
from different transforms (with possibly different IV sizes!) with
each other via the per-cpu variables. That might be an "issue" if one
echainiv user can maliciously mess with the IVs of another user, e.g.
via AF_ALG.

So, should echainiv use a per-context per-cpu array instead that --
for simplicity -- gets initialised with random bytes and will be
updated as it is now, just not with a single global per-cpu array, but
a per-context one?

That would allow us to still have a lock-less IV generator, but one
that cannot be accidentally or maliciously tampered with by other
echainiv users.
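
Something along these lines, just as a sketch of the idea (hypothetical
names, not meant as an actual patch):

  #include <linux/percpu.h>
  #include <linux/random.h>

  struct echainiv_iv_state {          /* hypothetical per-context state */
          u8 __percpu *iv;            /* ivsize bytes of IV state per CPU */
  };

  static int echainiv_iv_state_init(struct echainiv_iv_state *st,
                                    unsigned int ivsize)
  {
          int cpu;

          st->iv = (u8 __percpu *)__alloc_percpu(ivsize, 4);
          if (!st->iv)
                  return -ENOMEM;

          /* seed each CPU's copy with random bytes; it would then be
           * updated on request completion, as it is now */
          for_each_possible_cpu(cpu)
                  get_random_bytes(per_cpu_ptr(st->iv, cpu), ivsize);

          return 0;
  }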


Thanks,
Mathias


Re: [PATCH] crypto: user - re-add size check for CRYPTO_MSG_GETALG

2016-06-22 Thread Mathias Krause
On 22 June 2016 at 21:03, Stephan Mueller  wrote:
> On Wednesday, 22 June 2016 at 20:29:37, Mathias Krause wrote:
>
> Hi Mathias,
>
>> Commit 9aa867e46565 ("crypto: user - Add CRYPTO_MSG_DELRNG")
>> accidentally removed the minimum size check for CRYPTO_MSG_GETALG
>> netlink messages. This allows userland to send a truncated
>> CRYPTO_MSG_GETALG message as short as a netlink header only, making
>> crypto_report() operate on uninitialized memory by accessing data
>> beyond the end of the netlink message.
>>
>> Fix this by re-adding the minimum required size of CRYPTO_MSG_GETALG
>> messages to the crypto_msg_min[] array.
>
> I was playing with the adding of the GETALG value as you did to track down the
> issue fixed with eed1e1afd8d542d9644534c1b712599b5d680007 ("crypto: user - no
> parsing of CRYPTO_MSG_GETALG") in the cryptodev-2.6 tree.

Oh, I haven't noticed this commit. :D Just looked at Linus' master and
crypto-2.6/master.

> It did not occur to me that it fixes another bug. Yet, with this addition, it
> would be possible to revert the patch eed1e1afd8d542d9644534c1b712599b5d680007
> as your patch fixes the issue too. But my fix can also stay as it does not
> hurt either.

Well, it does. Commit eed1e1afd8d542d9644534c1b712599b5d680007 is
really just a workaround for the underlying issue. It does not fix the
bug of the missing minimal size check for CRYPTO_MSG_GETALG; it just
disables any further size checks for CRYPTO_MSG_GETALG. Putting my
patch on top of yours will still not fix the issue of the missing
minimal size check as your patch explicitly excludes CRYPTO_MSG_GETALG
from any size checks. So it needs to be reverted, sorry :/

> What is your take on that?

I'd say to revert eed1e1afd8d542d9644534c1b712599b5d680007 and fix the
issue for real with my patch in crypto-2.6/master, i.e. the upcoming
v4.7. That adds back the size check so userland can't play tricks with
us.
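
For reference, triggering the missing check is as simple as sending a
bare netlink header. A minimal userland sketch (constants from the uapi
headers, error handling omitted):

  #include <linux/netlink.h>
  #include <linux/cryptouser.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <unistd.h>

  int main(void)
  {
          struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
          struct nlmsghdr nlh;
          int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_CRYPTO);

          memset(&nlh, 0, sizeof(nlh));
          nlh.nlmsg_len   = NLMSG_HDRLEN;         /* header only, no payload */
          nlh.nlmsg_type  = CRYPTO_MSG_GETALG;
          nlh.nlmsg_flags = NLM_F_REQUEST;

          sendto(sock, &nlh, nlh.nlmsg_len, 0,
                 (struct sockaddr *)&addr, sizeof(addr));
          close(sock);
          return 0;
  }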

The patch already contains the Cc to stable, so the fix will
eventually end up in the LTS kernel releases used by distros and this
regression should be fixed there in a few weeks.


Regards,
Mathias


[PATCH] crypto: user - re-add size check for CRYPTO_MSG_GETALG

2016-06-22 Thread Mathias Krause
Commit 9aa867e46565 ("crypto: user - Add CRYPTO_MSG_DELRNG")
accidentally removed the minimum size check for CRYPTO_MSG_GETALG
netlink messages. This allows userland to send a truncated
CRYPTO_MSG_GETALG message as short as a netlink header only, making
crypto_report() operate on uninitialized memory by accessing data
beyond the end of the netlink message.

Fix this by re-adding the minimum required size of CRYPTO_MSG_GETALG
messages to the crypto_msg_min[] array.

Fixes: 9aa867e46565 ("crypto: user - Add CRYPTO_MSG_DELRNG")
Cc: sta...@vger.kernel.org  # v4.2
Signed-off-by: Mathias Krause 
Cc: Steffen Klassert 
---
This should go on top of crypto-2.6/master.

 crypto/crypto_user.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 43fe85f20d57..7097a3395b25 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -455,6 +455,7 @@ static const int crypto_msg_min[CRYPTO_NR_MSGTYPES] = {
[CRYPTO_MSG_NEWALG  - CRYPTO_MSG_BASE] = MSGSIZE(crypto_user_alg),
[CRYPTO_MSG_DELALG  - CRYPTO_MSG_BASE] = MSGSIZE(crypto_user_alg),
[CRYPTO_MSG_UPDATEALG   - CRYPTO_MSG_BASE] = MSGSIZE(crypto_user_alg),
+   [CRYPTO_MSG_GETALG  - CRYPTO_MSG_BASE] = MSGSIZE(crypto_user_alg),
[CRYPTO_MSG_DELRNG  - CRYPTO_MSG_BASE] = 0,
 };
 
-- 
1.7.10.4



[PATCH] crypto: user - lock crypto_alg_list on alg dump

2016-02-01 Thread Mathias Krause
We fail to take the crypto_alg_sem semaphore when traversing the
crypto_alg_list for CRYPTO_MSG_GETALG dumps. This allows a race with
crypto_unregister_alg() removing algorithms from the list while we're
still traversing it, thereby leading to a use-after-free as shown below:

[ 3482.071639] general protection fault:  [#1] SMP
[ 3482.075639] Modules linked in: aes_x86_64 glue_helper lrw ablk_helper cryptd 
gf128mul ipv6 pcspkr serio_raw virtio_net microcode virtio_pci virtio_ring 
virtio sr_mod cdrom [last unloaded: aesni_intel]
[ 3482.075639] CPU: 1 PID: 11065 Comm: crconf Not tainted 4.3.4-grsec+ #126
[ 3482.075639] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.7.5-20140531_083030-gandalf 04/01/2014
[ 3482.075639] task: 88001cd41a40 ti: 88001cd422c8 task.ti: 
88001cd422c8
[ 3482.075639] RIP: 0010:[]  [] 
strncpy+0x13/0x30
[ 3482.075639] RSP: 0018:88001f713b60  EFLAGS: 00010202
[ 3482.075639] RAX: 88001f6c4430 RBX: 88001f6c43a0 RCX: 88001f6c4430
[ 3482.075639] RDX: 0040 RSI: fefefefefefeff16 RDI: 88001f6c4430
[ 3482.075639] RBP: 88001f713b60 R08: 88001f6c4470 R09: 88001f6c4480
[ 3482.075639] R10: 0002 R11: 0246 R12: 88001ce2aa28
[ 3482.075639] R13: 88093700 R14: 88001f5e4bf8 R15: 3b20
[ 3482.075639] FS:  033826fa2700() GS:88001e90() 
knlGS:
[ 3482.075639] CS:  0010 DS:  ES:  CR0: 80050033
[ 3482.075639] CR2: ff600400 CR3: 139ec000 CR4: 001606f0
[ 3482.075639] Stack:
[ 3482.075639]  88001f713bd8 936ccd00 88001e5c4200 
88093700
[ 3482.075639]  88001f713bd0 938ef4bf  
3b20
[ 3482.075639]  88001f5e4bf8 88001f5e4848  
3b20
[ 3482.075639] Call Trace:
[ 3482.075639]  [] crypto_report_alg+0xc0/0x3e0
[ 3482.075639]  [] ? __alloc_skb+0x16f/0x300
[ 3482.075639]  [] crypto_dump_report+0x6a/0x90
[ 3482.075639]  [] netlink_dump+0x147/0x2e0
[ 3482.075639]  [] __netlink_dump_start+0x159/0x190
[ 3482.075639]  [] crypto_user_rcv_msg+0xc3/0x130
[ 3482.075639]  [] ? crypto_report_alg+0x3e0/0x3e0
[ 3482.075639]  [] ? alg_test_crc32c+0x120/0x120
[ 3482.075639]  [] ? __netlink_lookup+0xd5/0x120
[ 3482.075639]  [] ? crypto_add_alg+0x1d0/0x1d0
[ 3482.075639]  [] netlink_rcv_skb+0xe1/0x130
[ 3482.075639]  [] crypto_netlink_rcv+0x28/0x40
[ 3482.075639]  [] netlink_unicast+0x108/0x180
[ 3482.075639]  [] netlink_sendmsg+0x541/0x770
[ 3482.075639]  [] sock_sendmsg+0x21/0x40
[ 3482.075639]  [] SyS_sendto+0xf3/0x130
[ 3482.075639]  [] ? bad_area_nosemaphore+0x13/0x20
[ 3482.075639]  [] ? __do_page_fault+0x80/0x3a0
[ 3482.075639]  [] entry_SYSCALL_64_fastpath+0x12/0x6e
[ 3482.075639] Code: 88 4a ff 75 ed 5d 48 0f ba 2c 24 3f c3 66 66 2e 0f 1f 84 
00 00 00 00 00 55 48 85 d2 48 89 f8 48 89 f9 4c 8d 04 17 48 89 e5 74 15 <0f> b6 
16 80 fa 01 88 11 48 83 de ff 48 83 c1 01 4c 39 c1 75 eb
[ 3482.075639] RIP  [] strncpy+0x13/0x30

To trigger the race run the following loops simultaneously for a while:
  $ while : ; do modprobe aesni-intel; rmmod aesni-intel; done
  $ while : ; do crconf show all > /dev/null; done

Fix the race by taking the crypto_alg_sem read lock, thereby preventing
crypto_unregister_alg() from modifying the algorithm list during the
dump.

This bug has been detected by the PaX memory sanitize feature.

Signed-off-by: Mathias Krause 
Cc: Steffen Klassert 
Cc: PaX Team 
---
 crypto/crypto_user.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 237f3795cfaa..43fe85f20d57 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -499,6 +499,7 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
if (link->dump == NULL)
return -EINVAL;
 
+   down_read(&crypto_alg_sem);
list_for_each_entry(alg, &crypto_alg_list, cra_list)
dump_alloc += CRYPTO_REPORT_MAXSIZE;
 
@@ -508,8 +509,11 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
.done = link->done,
.min_dump_alloc = dump_alloc,
};
-   return netlink_dump_start(crypto_nlsk, skb, nlh, &c);
+   err = netlink_dump_start(crypto_nlsk, skb, nlh, &c);
}
+   up_read(&crypto_alg_sem);
+
+   return err;
}
 
err = nlmsg_parse(nlh, crypto_msg_min[type], attrs, CRYPTOCFGA_MAX,
-- 
1.7.10.4



Re: [BISECTED] 4943ba16 ("include crypto- module prefix") breaks wifi

2015-02-17 Thread Mathias Krause
On 18 February 2015 at 07:34, George Spelvin  wrote:
>
> It's trying to load modules named:
>
> crypto-ccm(aes)
> crypto-ccm(aes)-all
> crypto-ctr(aes)
> crypto-ctr(aes)-all
>
> depmod -n doesn't show any aliases with parens in their names,

That's okay. So is the fact that it fails to load these, as it'll fall
back to trying to load modules for the templates in that case, as can be
seen in your log:

> Wed Feb 18 06:58:10 UTC 2015 /sbin/modprobe -q -- crypto-ctr

What's curious, however, is that it only tries to load the template for
"ctr", not for "ccm". :/
Are the logs complete? Could you please simplify your /sbin/x/modprobe
wrapper to just output the modprobe call, as Kees suggested?

Also, could you please provide the output of "depmod -n | grep
crypto-"? There should be lines for crypto-ccm and crypto-ctr if you
build them as modules.


Mathias


Re: [BISECTED] 4943ba16 ("include crypto- module prefix") breaks wifi

2015-02-16 Thread Mathias Krause
On 17 February 2015 at 04:09, George Spelvin  wrote:
> I discovered when (belatedly) testing 3.19-rc7 the other week that
> my laptop wifi was broken and would no longer associate.
>
> Apparently this is causing some necessary crypto algorithms to fail to
> load, breaking my wifi.
>
> Perhaps I'm displaying my ignorance of what's supposed to happen,
> but shouldn't make install have installed some files with names like
> /lib/modules/`uname r`/kernel/crypto/crypto-*.ko?

No, the module names do not change. They just got another module alias
with the "crypto-" prefix.

> Or is it something only I'm hitting because I have a lot of common
> crypto algorithms statically compiled into my kernel?
>
> CONFIG_CRYPTO_CBC=y
> CONFIG_CRYPTO_HMAC=y
> CONFIG_CRYPTO_MD5=y
> CONFIG_CRYPTO_SHA1=y
> CONFIG_CRYPTO_AES=y
> CONFIG_CRYPTO_AES_586=y
> CONFIG_CRYPTO_ARC4=y
>
> Trying this on kernel 4943ba16 produces instead an endless loop of:
>
> wlan1: SME: Trying to authenticate with aa:bb:cc:dd:ee:ff (SSID='FOO' 
> freq=24xx MHz)
> wlan1: Trying to associate with aa:bb:cc:dd:ee:ff (SSID='FOO' freq=24xx MHz)
> wlan1: Associated with aa:bb:cc:dd:ee:ff
> wlan1: WPA: Failed to set PTK to the driver (alg=3 keylen=16 
> bssid=aa:bb:cc:dd:ee:ff)
> wlan1: CTRL-EVENT-DISCONNECTED bssid=aa:bb:cc:dd:ee:ff reason=1
>
>
> The kernel logs are not particularly informative.
>
> They just go through the usual successful series, but end with
>
> wlan1: RxAssocResp from aa:bb:cc:dd:ee:ff (capab=0x431 status=0 aid=1)
> wlan1: associated
> wlan1: deauthenticating from 11:bb:cc:dd:ee:ff by local choice (Reason: 
> 1=UNSPECIFIED)
>
> While successful connection ends before that last line.

Commit 4943ba16bbc2 was incomplete and could have caused regressions
like the above. Those should have been fixed by 4943ba16bbc2 together
with 3e14dcf7cb80. However, both should be in v3.19-rc7 already, so I'm
not much of a help here :(


Mathias


[PATCH 5/6] crypto: sparc64/md5 - fix module description

2015-01-11 Thread Mathias Krause
MD5 is not SHA1.

Cc: David S. Miller 
Signed-off-by: Mathias Krause 
---
 arch/sparc/crypto/md5_glue.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sparc/crypto/md5_glue.c b/arch/sparc/crypto/md5_glue.c
index 64c7ff5f72a9..b688731d7ede 100644
--- a/arch/sparc/crypto/md5_glue.c
+++ b/arch/sparc/crypto/md5_glue.c
@@ -183,7 +183,7 @@ module_init(md5_sparc64_mod_init);
 module_exit(md5_sparc64_mod_fini);
 
 MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("MD5 Secure Hash Algorithm, sparc64 md5 opcode 
accelerated");
+MODULE_DESCRIPTION("MD5 Message Digest Algorithm, sparc64 md5 opcode 
accelerated");
 
 MODULE_ALIAS_CRYPTO("md5");
 
-- 
1.7.10.4



[PATCH 6/6] crypto: x86/des3_ede - drop bogus module aliases

2015-01-11 Thread Mathias Krause
This module implements variations of "des3_ede" only. Drop the bogus
module aliases for "des".

Cc: Jussi Kivilinna 
Signed-off-by: Mathias Krause 
---
 arch/x86/crypto/des3_ede_glue.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index 38a14f818ef1..d6fc59df 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -504,6 +504,4 @@ MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Triple DES EDE Cipher Algorithm, asm optimized");
 MODULE_ALIAS_CRYPTO("des3_ede");
 MODULE_ALIAS_CRYPTO("des3_ede-asm");
-MODULE_ALIAS_CRYPTO("des");
-MODULE_ALIAS_CRYPTO("des-asm");
 MODULE_AUTHOR("Jussi Kivilinna ");
-- 
1.7.10.4



[PATCH 1/6] crypto: add missing crypto module aliases

2015-01-11 Thread Mathias Krause
Commit 5d26a105b5a7 ("crypto: prefix module autoloading with "crypto-"")
changed the automatic module loading when requesting crypto algorithms
to prefix all module requests with "crypto-". This requires all crypto
modules to have a crypto specific module alias even if their file name
would otherwise match the requested crypto algorithm.

Even though commit 5d26a105b5a7 added those aliases for a vast number of
modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO
annotations to those files to make them get loaded automatically, again.
This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work
with kernels v3.18 and below.

Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former
won't work for crypto modules any more.

Fixes: 5d26a105b5a7 ("crypto: prefix module autoloading with "crypto-"")
Cc: Kees Cook 
Signed-off-by: Mathias Krause 
---
 arch/powerpc/crypto/sha1.c   |1 +
 arch/x86/crypto/sha-mb/sha1_mb.c |2 +-
 crypto/aes_generic.c |1 +
 crypto/ansi_cprng.c  |1 +
 crypto/blowfish_generic.c|1 +
 crypto/camellia_generic.c|1 +
 crypto/cast5_generic.c   |1 +
 crypto/cast6_generic.c   |1 +
 crypto/crc32c_generic.c  |1 +
 crypto/crct10dif_generic.c   |1 +
 crypto/des_generic.c |7 ---
 crypto/ghash-generic.c   |1 +
 crypto/krng.c|1 +
 crypto/salsa20_generic.c |1 +
 crypto/serpent_generic.c |1 +
 crypto/sha1_generic.c|1 +
 crypto/sha256_generic.c  |2 ++
 crypto/sha512_generic.c  |2 ++
 crypto/tea.c |1 +
 crypto/tgr192.c  |1 +
 crypto/twofish_generic.c |1 +
 crypto/wp512.c   |1 +
 22 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/crypto/sha1.c b/arch/powerpc/crypto/sha1.c
index d3feba5a275f..c154cebc1041 100644
--- a/arch/powerpc/crypto/sha1.c
+++ b/arch/powerpc/crypto/sha1.c
@@ -154,4 +154,5 @@ module_exit(sha1_powerpc_mod_fini);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("SHA1 Secure Hash Algorithm");
 
+MODULE_ALIAS_CRYPTO("sha1");
 MODULE_ALIAS_CRYPTO("sha1-powerpc");
diff --git a/arch/x86/crypto/sha-mb/sha1_mb.c b/arch/x86/crypto/sha-mb/sha1_mb.c
index a225a5ca1037..fd9f6b035b16 100644
--- a/arch/x86/crypto/sha-mb/sha1_mb.c
+++ b/arch/x86/crypto/sha-mb/sha1_mb.c
@@ -931,4 +931,4 @@ module_exit(sha1_mb_mod_fini);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("SHA1 Secure Hash Algorithm, multi buffer accelerated");
 
-MODULE_ALIAS("sha1");
+MODULE_ALIAS_CRYPTO("sha1");
diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c
index 9b3c54c1cbe8..3dd101144a58 100644
--- a/crypto/aes_generic.c
+++ b/crypto/aes_generic.c
@@ -1475,3 +1475,4 @@ module_exit(aes_fini);
 MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_ALIAS_CRYPTO("aes");
+MODULE_ALIAS_CRYPTO("aes-generic");
diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index b4485a108389..6f5bebc9bf01 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -477,3 +477,4 @@ MODULE_PARM_DESC(dbg, "Boolean to enable debugging (0/1 == 
off/on)");
 module_init(prng_mod_init);
 module_exit(prng_mod_fini);
 MODULE_ALIAS_CRYPTO("stdrng");
+MODULE_ALIAS_CRYPTO("ansi_cprng");
diff --git a/crypto/blowfish_generic.c b/crypto/blowfish_generic.c
index 7bd71f02d0dd..87b392a77a93 100644
--- a/crypto/blowfish_generic.c
+++ b/crypto/blowfish_generic.c
@@ -139,3 +139,4 @@ module_exit(blowfish_mod_fini);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Blowfish Cipher Algorithm");
 MODULE_ALIAS_CRYPTO("blowfish");
+MODULE_ALIAS_CRYPTO("blowfish-generic");
diff --git a/crypto/camellia_generic.c b/crypto/camellia_generic.c
index 1b74c5a3e891..a02286bf319e 100644
--- a/crypto/camellia_generic.c
+++ b/crypto/camellia_generic.c
@@ -1099,3 +1099,4 @@ module_exit(camellia_fini);
 MODULE_DESCRIPTION("Camellia Cipher Algorithm");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_CRYPTO("camellia");
+MODULE_ALIAS_CRYPTO("camellia-generic");
diff --git a/crypto/cast5_generic.c b/crypto/cast5_generic.c
index 84c86db67ec7..df5c72629383 100644
--- a/crypto/cast5_generic.c
+++ b/crypto/cast5_generic.c
@@ -550,3 +550,4 @@ module_exit(cast5_mod_fini);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Cast5 Cipher Algorithm");
 MODULE_ALIAS_CRYPTO("cast5");
+MODULE_ALIAS_CRYPTO("cast5-generic");
diff --git a/crypto/cast6_generic.c b/crypto/cast6_generic.c
index f408f0bd8de2..058c8d755d03 100644
--- a/crypto/cast6_generic.c
+++ b/crypto/cast6_generic.c

[PATCH 0/6] crypto: module alias fixes

2015-01-11 Thread Mathias Krause
Hi Herbert,

this series is a follow-up to commit 5d26a105b5a7 ("crypto: prefix
module autoloading with "crypto-""). In patch 1 it adds the required
MODULE_ALIAS_CRYPTO annotation where needed to bring back automatic
crypto module loading. Namely, modules that have a file name matching
the cipher they implement still need the MODULE_ALIAS_CRYPTO()
annotation. Otherwise they won't get loaded.
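
For example (taken from patch 1), crypto/aes_generic.c now carries both
the algorithm alias and a file-name based one:

  MODULE_ALIAS_CRYPTO("aes");
  MODULE_ALIAS_CRYPTO("aes-generic");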

Patches 2 to 6 fix bogus module descriptions or aliases with the
exception of patch 4 which adds a missing crypto module alias.

At least patch 1 should go to crypto-2.6 to not regress the crypto API
for v3.19 in regard to automatic module loading.

The others are probably safe to apply, too.

Please apply!

Thanks,

Mathias Krause (6):
  crypto: add missing crypto module aliases
  crypto: sparc64/aes - fix module description
  crypto: sparc64/camellia - fix module alias
  crypto: sparc64/des - add "des3_ede" module alias
  crypto: sparc64/md5 - fix module description
  crypto: x86/des3_ede - drop bogus module aliases

 arch/powerpc/crypto/sha1.c|1 +
 arch/sparc/crypto/aes_glue.c  |2 +-
 arch/sparc/crypto/camellia_glue.c |2 +-
 arch/sparc/crypto/des_glue.c  |1 +
 arch/sparc/crypto/md5_glue.c  |2 +-
 arch/x86/crypto/des3_ede_glue.c   |2 --
 arch/x86/crypto/sha-mb/sha1_mb.c  |2 +-
 crypto/aes_generic.c  |1 +
 crypto/ansi_cprng.c   |1 +
 crypto/blowfish_generic.c |1 +
 crypto/camellia_generic.c |1 +
 crypto/cast5_generic.c|1 +
 crypto/cast6_generic.c|1 +
 crypto/crc32c_generic.c   |1 +
 crypto/crct10dif_generic.c|1 +
 crypto/des_generic.c  |7 ---
 crypto/ghash-generic.c|1 +
 crypto/krng.c |1 +
 crypto/salsa20_generic.c  |1 +
 crypto/serpent_generic.c  |1 +
 crypto/sha1_generic.c |1 +
 crypto/sha256_generic.c   |2 ++
 crypto/sha512_generic.c   |2 ++
 crypto/tea.c  |1 +
 crypto/tgr192.c   |1 +
 crypto/twofish_generic.c  |1 +
 crypto/wp512.c|1 +
 27 files changed, 31 insertions(+), 9 deletions(-)

-- 
1.7.10.4



[PATCH 4/6] crypto: sparc64/des - add "des3_ede" module alias

2015-01-11 Thread Mathias Krause
This module provides implementations for "des3_ede", too. Announce those
via an appropriate crypto module alias so they can be used in favour of
the generic C implementation.

Cc: David S. Miller 
Signed-off-by: Mathias Krause 
---
 arch/sparc/crypto/des_glue.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/sparc/crypto/des_glue.c b/arch/sparc/crypto/des_glue.c
index d11500972994..dd6a34fa6e19 100644
--- a/arch/sparc/crypto/des_glue.c
+++ b/arch/sparc/crypto/des_glue.c
@@ -533,5 +533,6 @@ MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("DES & Triple DES EDE Cipher Algorithms, sparc64 des opcode 
accelerated");
 
 MODULE_ALIAS_CRYPTO("des");
+MODULE_ALIAS_CRYPTO("des3_ede");
 
 #include "crop_devid.c"
-- 
1.7.10.4



[PATCH 2/6] crypto: sparc64/aes - fix module description

2015-01-11 Thread Mathias Krause
AES is a block cipher, not a hash.

Cc: David S. Miller 
Signed-off-by: Mathias Krause 
---
 arch/sparc/crypto/aes_glue.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sparc/crypto/aes_glue.c b/arch/sparc/crypto/aes_glue.c
index 705408766ab0..2e48eb8813ff 100644
--- a/arch/sparc/crypto/aes_glue.c
+++ b/arch/sparc/crypto/aes_glue.c
@@ -497,7 +497,7 @@ module_init(aes_sparc64_mod_init);
 module_exit(aes_sparc64_mod_fini);
 
 MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("AES Secure Hash Algorithm, sparc64 aes opcode 
accelerated");
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, sparc64 aes opcode 
accelerated");
 
 MODULE_ALIAS_CRYPTO("aes");
 
-- 
1.7.10.4



[PATCH 3/6] crypto: sparc64/camellia - fix module alias

2015-01-11 Thread Mathias Krause
The module alias should be "camellia", not "aes".

Cc: David S. Miller 
Signed-off-by: Mathias Krause 
---
 arch/sparc/crypto/camellia_glue.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sparc/crypto/camellia_glue.c 
b/arch/sparc/crypto/camellia_glue.c
index 641f55cb61c3..6bf2479a12fb 100644
--- a/arch/sparc/crypto/camellia_glue.c
+++ b/arch/sparc/crypto/camellia_glue.c
@@ -322,6 +322,6 @@ module_exit(camellia_sparc64_mod_fini);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Camellia Cipher Algorithm, sparc64 camellia opcode 
accelerated");
 
-MODULE_ALIAS_CRYPTO("aes");
+MODULE_ALIAS_CRYPTO("camellia");
 
 #include "crop_devid.c"
-- 
1.7.10.4



[PATCH] crypto: aesni - fix "by8" variant for 128 bit keys

2014-12-30 Thread Mathias Krause
The "by8" counter mode optimization is broken for 128 bit keys with
input data longer than 128 bytes. It uses the wrong key material for
en- and decryption.

The key registers xkey0, xkey4, xkey8 and xkey12 need to be preserved
in case we're handling more than 128 bytes of input data -- they won't
get reloaded after the initial load. They must therefore be (a) loaded
on the first iteration and (b) preserved for the later iterations. The
implementation for 128 bit keys complies with neither (a) nor (b).

Fix this by bringing the implementation back to its original source,
correctly loading the key registers and preserving their values by
*not* re-using the registers for other purposes.

Kudos to James for reporting the issue and providing a test case
showing the discrepancies.

Reported-by: James Yonan 
Cc: Chandramouli Narayanan 
Cc:  # v3.18
Signed-off-by: Mathias Krause 
---

James, this should fix the issue for you, too -- it did for me, using
your test case. Feel free to add your 'Tested-by' if it does.

This patch should go on top of crypto-2.6.git#master as it's a fix
that needs to go to stable as well.

 arch/x86/crypto/aes_ctrby8_avx-x86_64.S |   46 +++
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S 
b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
index 2df2a0298f5a..a916c4a61165 100644
--- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
+++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
@@ -208,7 +208,7 @@ ddq_add_8:
 
.if (klen == KEY_128)
.if (load_keys)
-   vmovdqa 3*16(p_keys), xkeyA
+   vmovdqa 3*16(p_keys), xkey4
.endif
.else
vmovdqa 3*16(p_keys), xkeyA
@@ -224,7 +224,7 @@ ddq_add_8:
add $(16*by), p_in
 
.if (klen == KEY_128)
-   vmovdqa 4*16(p_keys), xkey4
+   vmovdqa 4*16(p_keys), xkeyB
.else
.if (load_keys)
vmovdqa 4*16(p_keys), xkey4
@@ -234,7 +234,12 @@ ddq_add_8:
.set i, 0
.rept by
club XDATA, i
-   vaesenc xkeyA, var_xdata, var_xdata /* key 3 */
+   /* key 3 */
+   .if (klen == KEY_128)
+   vaesenc xkey4, var_xdata, var_xdata
+   .else
+   vaesenc xkeyA, var_xdata, var_xdata
+   .endif
.set i, (i +1)
.endr
 
@@ -243,13 +248,18 @@ ddq_add_8:
.set i, 0
.rept by
club XDATA, i
-   vaesenc xkey4, var_xdata, var_xdata /* key 4 */
+   /* key 4 */
+   .if (klen == KEY_128)
+   vaesenc xkeyB, var_xdata, var_xdata
+   .else
+   vaesenc xkey4, var_xdata, var_xdata
+   .endif
.set i, (i +1)
.endr
 
.if (klen == KEY_128)
.if (load_keys)
-   vmovdqa 6*16(p_keys), xkeyB
+   vmovdqa 6*16(p_keys), xkey8
.endif
.else
vmovdqa 6*16(p_keys), xkeyB
@@ -267,12 +277,17 @@ ddq_add_8:
.set i, 0
.rept by
club XDATA, i
-   vaesenc xkeyB, var_xdata, var_xdata /* key 6 */
+   /* key 6 */
+   .if (klen == KEY_128)
+   vaesenc xkey8, var_xdata, var_xdata
+   .else
+   vaesenc xkeyB, var_xdata, var_xdata
+   .endif
.set i, (i +1)
.endr
 
.if (klen == KEY_128)
-   vmovdqa 8*16(p_keys), xkey8
+   vmovdqa 8*16(p_keys), xkeyB
.else
.if (load_keys)
vmovdqa 8*16(p_keys), xkey8
@@ -288,7 +303,7 @@ ddq_add_8:
 
.if (klen == KEY_128)
.if (load_keys)
-   vmovdqa 9*16(p_keys), xkeyA
+   vmovdqa 9*16(p_keys), xkey12
.endif
.else
vmovdqa 9*16(p_keys), xkeyA
@@ -297,7 +312,12 @@ ddq_add_8:
.set i, 0
.rept by
club XDATA, i
-   vaesenc xkey8, var_xdata, var_xdata /* key 8 */
+   /* key 8 */
+   .if (klen == KEY_128)
+   vaesenc xkeyB, var_xdata, var_xdata
+   .else
+   vaesenc xkey8, var_xdata, var_xdata
+   .endif
.set i, (i +1)
.endr
 
@@ -306,7 +326,12 @@ ddq_add_8:
.set i, 0
.rept by
club XDATA, i
-   vaesenc xkeyA, var_xdata, var_xdata /* key 9 */
+   /* key 9 */
+   .if (klen == KEY_128)
+   vaesenc xkey12, var_xdata, var_xdata
+   .else
+   vaesenc xkeyA, var_xdat

Re: [PATCH] crypto: aesni - disable "by8" AVX CTR optimization

2014-12-30 Thread Mathias Krause
On 16 December 2014 at 22:49, James Yonan  wrote:
> On 15/12/2014 12:26, James Yonan wrote:
>>
>> Mathias,
>>
>>>> I'm seeing some anomalous results with the "by8" AVX CTR optimization in
>>>> 3.18.
>>>
>>>
>>> the patch you're replying to actually *disabled* the "by8" variant for
>>> v3.17 as it had another bug related to wrong counter handling in GCM.
>>> The fix for that particular issue only made it to v3.18, so the code
>>> got re-enabled only for v3.18. But it looks like there's yet another
>>> bug :/
>>
>>
>> Right, I should have clarified that I initially suspected the "by8"
>> variant was to blame because your patch that disables it resolves the
>> discrepancy.
>>
>>>> In particular, crypto_aead_encrypt appears to produce different
>>>> ciphertext from the same plaintext depending on whether or not the
>>>> optimization is enabled.
>>>>
>>>> See the attached patch to tcrypt that demonstrates the discrepancy.
>>>
>>>
>>> I can reproduce your observations, so I can confirm the difference,
>>> when using the "by8" variant compared to other AES implementations.
>>> When applying this very patch (commit 7da4b29d496b ("crypto: aesni -
>>> disable "by8" AVX CTR optimization")) -- the patch that disables the
>>> "by8" variant -- on top of v3.18 the discrepancies are gone. So the
>>> behavior is bound to the "by8" optimization, only.
>>
>>
>> Right -- this is exactly what I'm seeing as well.
>>
>>> As it was Chandramouli, who contributed the code, maybe he has a clue
>>> what's wrong here. Chandramouli?
>>
>>
>> A few more observations:
>>
>> * Encryption produces bad ciphertext only when the size of plaintext
>> exceeds a certain threshold.  In test_aead_encrypt_consistency in the
>> tcrypt patch, I found that data_size must be >= 128 to produce bad
>> ciphertext.
>>
>> * Encrypting then decrypting data always gets back to the original
>> plaintext, no matter what the size.
>>
>> * The bad ciphertext from encryption is only evident when the same
>> encrypt operation is performed on a different AES implementation and the
>> ciphertexts are compared.
>>
>> * When the encrypt operation produces bad ciphertext, the generated auth
>> tag is actually correct, so another AES implementation that decrypts the
>> ciphertext will end up with corrupted plaintext that succeeds
>> authentication.

Hi James,

I gave it a shot since Chandramouli does not seem to respond... :(

>
> Another interesting observation:
>
> * bug only occurs when key size is 128 bits, not 192 or 256.

Thank you for your exhaustive analysis. The data size >= 128 bytes and
a key size of 128 were the key bits to this puzzle. The code is plain
wrong for 128 bit keys.
I'll send a patch soon.


Regards,
Mathias


Re: [PATCH] crypto: aesni - disable "by8" AVX CTR optimization

2014-12-14 Thread Mathias Krause
Hi James,

On 11 December 2014 at 09:52, James Yonan  wrote:
> I'm seeing some anomalous results with the "by8" AVX CTR optimization in
> 3.18.

the patch you're replying to actually *disabled* the "by8" variant for
v3.17 as it had another bug related to wrong counter handling in GCM.
The fix for that particular issue only made it to v3.18, so the code
got re-enabled only for v3.18. But it looks like there's yet another
bug :/

> In particular, crypto_aead_encrypt appears to produce different ciphertext
> from the same plaintext depending on whether or not the optimization is
> enabled.
>
> See the attached patch to tcrypt that demonstrates the discrepancy.

I can reproduce your observations, so I can confirm the difference,
when using the "by8" variant compared to other AES implementations.
When applying this very patch (commit 7da4b29d496b ("crypto: aesni -
disable "by8" AVX CTR optimization")) -- the patch that disables the
"by8" variant -- on top of v3.18 the discrepancies are gone. So the
behavior is bound to the "by8" optimization, only.
As it was Chandramouli, who contributed the code, maybe he has a clue
what's wrong here. Chandramouli?


Mathias

>
> James


Re: [PATCH v2] crypto: include crypto- module prefix in template

2014-11-24 Thread Mathias Krause
On 24 November 2014 at 23:24, Kees Cook  wrote:
> This adds the module loading prefix "crypto-" to the template lookup
> as well.
>
> For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly
> includes the "crypto-" prefix at every level, correctly rejecting "vfat":
>
> net-pf-38
> algif-hash
> crypto-vfat(blowfish)
> crypto-vfat(blowfish)-all
> crypto-vfat
>
> Reported-by: Mathias Krause 
> Signed-off-by: Kees Cook 
> ---
>  crypto/algapi.c | 4 ++--
>  crypto/authenc.c| 1 +
>  crypto/authencesn.c | 1 +
>  crypto/cbc.c| 1 +
>  crypto/chainiv.c| 1 +
>  crypto/cmac.c   | 1 +
>  crypto/cts.c| 1 +
>  crypto/ecb.c| 1 +
>  crypto/eseqiv.c | 1 +
>  crypto/hmac.c   | 1 +
>  crypto/lrw.c| 1 +
>  crypto/pcbc.c   | 1 +
>  crypto/seqiv.c  | 1 +
>  crypto/vmac.c   | 1 +
>  crypto/xcbc.c   | 1 +
>  crypto/xts.c| 1 +
>  16 files changed, 17 insertions(+), 2 deletions(-)

The following ones are still missing:

  arch/x86/crypto/fpu.c: needs MODULE_ALIAS_CRYPTO("fpu")
  crypto/ccm.c: needs MODULE_ALIAS_CRYPTO("ccm")
  crypto/cryptd.c: needs MODULE_ALIAS_CRYPTO("cryptd")
  crypto/ctr.c: needs MODULE_ALIAS_CRYPTO("ctr")
  crypto/gcm.c: needs MODULE_ALIAS_CRYPTO("gcm")
  crypto/mcryptd.c: needs MODULE_ALIAS_CRYPTO("mcryptd")
  crypto/pcrypt.c: needs MODULE_ALIAS_CRYPTO("pcrypt")

With that fixed,

Acked-by: Mathias Krause 

>
> diff --git a/crypto/algapi.c b/crypto/algapi.c
> index e8d3a7dca8c4..71a8143e23b1 100644
> --- a/crypto/algapi.c
> +++ b/crypto/algapi.c
> @@ -509,8 +509,8 @@ static struct crypto_template 
> *__crypto_lookup_template(const char *name)
>
>  struct crypto_template *crypto_lookup_template(const char *name)
>  {
> -   return try_then_request_module(__crypto_lookup_template(name), "%s",
> -  name);
> +   return try_then_request_module(__crypto_lookup_template(name),
> +  "crypto-%s", name);
>  }
>  EXPORT_SYMBOL_GPL(crypto_lookup_template);
>
> diff --git a/crypto/authenc.c b/crypto/authenc.c
> index e1223559d5df..78fb16cab13f 100644
> --- a/crypto/authenc.c
> +++ b/crypto/authenc.c
> @@ -721,3 +721,4 @@ module_exit(crypto_authenc_module_exit);
>
>  MODULE_LICENSE("GPL");
>  MODULE_DESCRIPTION("Simple AEAD wrapper for IPsec");
> +MODULE_ALIAS_CRYPTO("authenc");
> diff --git a/crypto/authencesn.c b/crypto/authencesn.c
> index 4be0dd4373a9..024bff2344fc 100644
> --- a/crypto/authencesn.c
> +++ b/crypto/authencesn.c
> @@ -814,3 +814,4 @@ module_exit(crypto_authenc_esn_module_exit);
>  MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("Steffen Klassert ");
>  MODULE_DESCRIPTION("AEAD wrapper for IPsec with extended sequence numbers");
> +MODULE_ALIAS_CRYPTO("authencesn");
> diff --git a/crypto/cbc.c b/crypto/cbc.c
> index 61ac42e1e32b..780ee27b2d43 100644
> --- a/crypto/cbc.c
> +++ b/crypto/cbc.c
> @@ -289,3 +289,4 @@ module_exit(crypto_cbc_module_exit);
>
>  MODULE_LICENSE("GPL");
>  MODULE_DESCRIPTION("CBC block cipher algorithm");
> +MODULE_ALIAS_CRYPTO("cbc");
> diff --git a/crypto/chainiv.c b/crypto/chainiv.c
> index 9c294c8f9a07..63c17d5992f7 100644
> --- a/crypto/chainiv.c
> +++ b/crypto/chainiv.c
> @@ -359,3 +359,4 @@ module_exit(chainiv_module_exit);
>
>  MODULE_LICENSE("GPL");
>  MODULE_DESCRIPTION("Chain IV Generator");
> +MODULE_ALIAS_CRYPTO("chainiv");
> diff --git a/crypto/cmac.c b/crypto/cmac.c
> index 50880cf17fad..7a8bfbd548f6 100644
> --- a/crypto/cmac.c
> +++ b/crypto/cmac.c
> @@ -313,3 +313,4 @@ module_exit(crypto_cmac_module_exit);
>
>  MODULE_LICENSE("GPL");
>  MODULE_DESCRIPTION("CMAC keyed hash algorithm");
> +MODULE_ALIAS_CRYPTO("cmac");
> diff --git a/crypto/cts.c b/crypto/cts.c
> index 133f0874c95e..bd9405820e8a 100644
> --- a/crypto/cts.c
> +++ b/crypto/cts.c
> @@ -351,3 +351,4 @@ module_exit(crypto_cts_module_exit);
>
>  MODULE_LICENSE("Dual BSD/GPL");
>  MODULE_DESCRIPTION("CTS-CBC CipherText Stealing for CBC");
> +MODULE_ALIAS_CRYPTO("cts");
> diff --git a/crypto/ecb.c b/crypto/ecb.c
> index 935cfef4aa84..12011aff0971 100644
> --- a/crypto/ecb.c
> +++ b/crypto/ecb.c
> @@ -185,3 +185,4 @@ module_exit(crypto_ecb_module_exit);
>
>  MODULE_LICENSE("GPL");
>  MODULE_DESCRIPTION("ECB block cipher algorithm");
> +MODUL

Re: [PATCH] crypto: include crypto- module prefix in template

2014-11-24 Thread Mathias Krause
On 24 November 2014 at 20:17, Kees Cook  wrote:
> This adds the module loading prefix "crypto-" to the template lookup
> as well.
>
> For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly
> includes the "crypto-" prefix at every level, correctly rejecting "vfat":
>
> net-pf-38
> algif-hash
> crypto-vfat(blowfish)
> crypto-vfat(blowfish)-all
> crypto-vfat
>
> Reported-by: Mathias Krause 
> Signed-off-by: Kees Cook 
> ---
>  crypto/algapi.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

That commit will break the user API again as ciphers like 'cbc(aes)'
won't work any more -- as the cbc module won't be loaded.
You're missing the MODULE_ALIAS_CRYPTO() annotations for all the crypto
templates -- cbc, ctr, xts, hmac, ...
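
For example, crypto/cbc.c would additionally need:

  MODULE_ALIAS_CRYPTO("cbc");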


Regards,
Mathias


Re: [PATCH v3] crypto: prefix module autoloading with "crypto-"

2014-11-21 Thread Mathias Krause
On 21 November 2014 02:05, Kees Cook  wrote:
> This prefixes all crypto module loading with "crypto-" so we never run
> the risk of exposing module auto-loading to userspace via a crypto API,
> as demonstrated by Mathias Krause:
>
> https://lkml.org/lkml/2013/3/4/70
>
> Signed-off-by: Kees Cook 
> ---
> [...]

Looks good so far, but unfortunately does not solve the problem
completely (af_alg_mod from the link above):

bbox:~# cat /sbin/modlog
#!/bin/sh
exec 1>/tmp/modlog.$$
echo "CMD: $0 $*"
echo "ENV: "
env
exec /sbin/modprobe "$@"
bbox:~# echo /sbin/modlog > /proc/sys/kernel/modprobe
bbox:~# lsmod | grep fat
bbox:~# af_alg_mod 'vfat(blowfish)'
bbox:~# grep CMD /tmp/modlog.*
/tmp/modlog.257:CMD: /sbin/modlog -q -- net-pf-38
/tmp/modlog.261:CMD: /sbin/modlog -q -- algif-hash
/tmp/modlog.265:CMD: /sbin/modlog -q -- crypto-vfat(blowfish)
/tmp/modlog.268:CMD: /sbin/modlog -q -- crypto-vfat(blowfish)-all
/tmp/modlog.272:CMD: /sbin/modlog -q -- vfat
bbox:~# lsmod | grep fat
vfat   17135  0
fat61984  1 vfat
bbox:~#

The last modlog call does not contain the "crypto-" prefix and therefore
happily loads the vfat module.
I guess crypto templates are handled specially?

Regards,
Mathias


Re: [PATCH v2] crypto: prefix module autoloading with "crypto-"

2014-11-18 Thread Mathias Krause
On 18 November 2014 01:45, Kees Cook  wrote:
> On Mon, Nov 17, 2014 at 3:20 PM, Mathias Krause  
> wrote:
>> On 17 November 2014 21:02, Kees Cook  wrote:
>>> This prefixes all crypto module loading with "crypto-" so we never run
>>> the risk of exposing module auto-loading to userspace via a crypto API,
>>> as demonstrated by Mathias Krause:
>>>
>>> https://lkml.org/lkml/2013/3/4/70
>>>
>>> Signed-off-by: Kees Cook 
>>> ---
>>> v2:
>>>  - added missing #include, thanks to minipli
>>>  - built with allmodconfig
>>> [...snip...]
>>> diff --git a/include/linux/crypto.h b/include/linux/crypto.h
>>> index d45e949699ea..d14230f6e977 100644
>>> --- a/include/linux/crypto.h
>>> +++ b/include/linux/crypto.h
>>> @@ -26,6 +26,13 @@
>>>  #include 
>>>
>>>  /*
>>> + * Autoloaded crypto modules should only use a prefixed name to avoid 
>>> allowing
>>> + * arbitrary modules to be loaded.
>>> + */
>>> +#define MODULE_ALIAS_CRYPTO(name)  \
>>> +   MODULE_ALIAS("crypto-" name)
>>
>> This would break userland relying on the old aliases, e.g. 'modprobe
>> aes' no longer works.
>>
>> Why not have both aliases, one with the "crypto-" prefix for on-demand
>> loading within the crypto API and one without for manual loading from
>> userland? E.g., something like this:
>>
>> #define MODULE_ALIAS_CRYPTO(name)  \
>>MODULE_ALIAS(name); \
>>MODULE_ALIAS("crypto-" name)
>>
>> That would prevent the userland breakage and still achieves the goal
>> of restricting the request_module() call offered by the means of the
>> AF_ALG API.
>
> That was my intention originally, and I should go back to it. The
> trouble is with the use of __UNIQUE_ID in the MODULE_ALIAS macro. It
> uses __LINE__ to produce the id, so the suggested macro expansion
> (which is what I started with) won't work on non-gcc compilers.
>
> I haven't found any solutions for C89 version of gcc's __COUNTER__,
> and I haven't found any C89 ways to force a macro to be expanded as
> being multi-line.

Well, clang should support it as well, according to [1]. But still, a
compiler-independent solution would be nice.
Anyway, __COUNTER__ support is gcc >= 4.3 only. So, according to
Documentation/Changes, which states gcc 3.2 is the minimum supported
version for compiling the kernel, this would be a no-go, too.

[1] http://clang.llvm.org/docs/LanguageExtensions.html#builtin-macros

>
> I'd like to avoid having to open-code both MODULE_ALIAS and
> MODULE_ALIAS_CRYPTO in each module's source.
>
> Anyone see some sneaky way to accomplish this?

Unfortunately, I do not ... besides this, maybe:

#define MODULE_ALIAS_CRYPTO(name)  \
   __MODULE_INFO(alias, alias_userland, name); \
   __MODULE_INFO(alias, alias_crypto, "crypto-" name)

Looks ugly, but works. ;)
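
For the record, here is a standalone sketch of why the naive two-alias
macro breaks while the __MODULE_INFO variant above does not. The
__MODINFO/__cat helpers below are simplified stand-ins for the kernel
macros, so treat it as an illustration only:

#define ___cat(a, b)		a##b
#define __cat(a, b)		___cat(a, b)

/* one modinfo entry; the variable name is built from 'name' + __LINE__ */
#define __MODINFO(tag, name, info) \
	static const char __cat(name, __LINE__)[] \
		__attribute__((unused, section(".modinfo"))) = #tag "=" info

/* naive variant: both entries expand on the same source line, so the
 * __LINE__ based identifiers collide -> redefinition error */
#define MODULE_ALIAS_CRYPTO_NAIVE(x) \
	__MODINFO(alias, alias, x); \
	__MODINFO(alias, alias, "crypto-" x)

/* variant from above: distinct 'name' arguments keep the identifiers
 * unique even though both expand on the same line */
#define MODULE_ALIAS_CRYPTO(x) \
	__MODINFO(alias, alias_userland, x); \
	__MODINFO(alias, alias_crypto, "crypto-" x)

MODULE_ALIAS_CRYPTO("aes");	/* -> two distinct variables, no clash */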


Regards,
Mathias


Re: [PATCH v2] crypto: prefix module autoloading with "crypto-"

2014-11-17 Thread Mathias Krause
On 17 November 2014 21:02, Kees Cook  wrote:
> This prefixes all crypto module loading with "crypto-" so we never run
> the risk of exposing module auto-loading to userspace via a crypto API,
> as demonstrated by Mathias Krause:
>
> https://lkml.org/lkml/2013/3/4/70
>
> Signed-off-by: Kees Cook 
> ---
> v2:
>  - added missing #include, thanks to minipli
>  - built with allmodconfig
> ---
>  arch/arm/crypto/aes_glue.c  | 4 ++--
>  arch/arm/crypto/sha1_glue.c | 2 +-
>  arch/arm/crypto/sha1_neon_glue.c| 2 +-
>  arch/arm/crypto/sha512_neon_glue.c  | 4 ++--
>  arch/arm64/crypto/aes-ce-ccm-glue.c | 2 +-
>  arch/arm64/crypto/aes-glue.c| 8 
>  arch/powerpc/crypto/sha1.c  | 2 +-
>  arch/s390/crypto/aes_s390.c | 2 +-
>  arch/s390/crypto/des_s390.c | 4 ++--
>  arch/s390/crypto/ghash_s390.c   | 2 +-
>  arch/s390/crypto/sha1_s390.c| 2 +-
>  arch/s390/crypto/sha256_s390.c  | 4 ++--
>  arch/s390/crypto/sha512_s390.c  | 4 ++--
>  arch/sparc/crypto/aes_glue.c| 2 +-
>  arch/sparc/crypto/camellia_glue.c   | 2 +-
>  arch/sparc/crypto/crc32c_glue.c | 2 +-
>  arch/sparc/crypto/des_glue.c| 2 +-
>  arch/sparc/crypto/md5_glue.c| 2 +-
>  arch/sparc/crypto/sha1_glue.c   | 2 +-
>  arch/sparc/crypto/sha256_glue.c | 4 ++--
>  arch/sparc/crypto/sha512_glue.c | 4 ++--
>  arch/x86/crypto/aes_glue.c  | 4 ++--
>  arch/x86/crypto/aesni-intel_glue.c  | 2 +-
>  arch/x86/crypto/blowfish_glue.c | 4 ++--
>  arch/x86/crypto/camellia_aesni_avx2_glue.c  | 4 ++--
>  arch/x86/crypto/camellia_aesni_avx_glue.c   | 4 ++--
>  arch/x86/crypto/camellia_glue.c | 4 ++--
>  arch/x86/crypto/cast5_avx_glue.c| 2 +-
>  arch/x86/crypto/cast6_avx_glue.c| 2 +-
>  arch/x86/crypto/crc32-pclmul_glue.c | 4 ++--
>  arch/x86/crypto/crc32c-intel_glue.c | 4 ++--
>  arch/x86/crypto/crct10dif-pclmul_glue.c | 4 ++--
>  arch/x86/crypto/des3_ede_glue.c | 8 
>  arch/x86/crypto/ghash-clmulni-intel_glue.c  | 2 +-
>  arch/x86/crypto/salsa20_glue.c  | 4 ++--
>  arch/x86/crypto/serpent_avx2_glue.c | 4 ++--
>  arch/x86/crypto/serpent_avx_glue.c  | 2 +-
>  arch/x86/crypto/serpent_sse2_glue.c | 2 +-
>  arch/x86/crypto/sha1_ssse3_glue.c   | 2 +-
>  arch/x86/crypto/sha256_ssse3_glue.c | 4 ++--
>  arch/x86/crypto/sha512_ssse3_glue.c | 4 ++--
>  arch/x86/crypto/twofish_avx_glue.c  | 2 +-
>  arch/x86/crypto/twofish_glue.c  | 4 ++--
>  arch/x86/crypto/twofish_glue_3way.c | 4 ++--
>  crypto/842.c| 1 +
>  crypto/aes_generic.c| 2 +-
>  crypto/ansi_cprng.c | 2 +-
>  crypto/anubis.c | 1 +
>  crypto/api.c| 4 ++--
>  crypto/arc4.c   | 1 +
>  crypto/blowfish_generic.c   | 2 +-
>  crypto/camellia_generic.c   | 2 +-
>  crypto/cast5_generic.c  | 2 +-
>  crypto/cast6_generic.c  | 2 +-
>  crypto/ccm.c| 4 ++--
>  crypto/crc32.c  | 1 +
>  crypto/crc32c_generic.c | 2 +-
>  crypto/crct10dif_generic.c  | 2 +-
>  crypto/crypto_null.c| 6 +++---
>  crypto/ctr.c| 2 +-
>  crypto/deflate.c| 2 +-
>  crypto/des_generic.c| 2 +-
>  crypto/fcrypt.c | 1 +
>  crypto/gcm.c| 6 +++---
>  crypto/ghash-generic.c  | 2 +-
>  crypto/khazad.c | 1 +
>  crypto/krng.c   | 2 +-
>  crypto/lz4.c| 1 +
>  crypto/lz4hc.c  | 1 +
>  crypto/lzo.c| 1 +
>  crypto/md4.c| 2 +-
>  crypto/md5.c| 1 +
>  crypto/michael_mic.c| 1 +
>  crypto/rmd128.c | 1 +
>  crypto/rmd160.c | 1 +
>  crypto/rmd256.c | 1 +
>  crypto/rmd320.c | 1 +
>  crypto/salsa20_generic.c| 2 +-
>  crypto/seed.c   | 1 +
>

Re: [PATCH] crypto: prefix module autoloading with "crypto-"

2014-11-17 Thread Mathias Krause
On 17 November 2014 16:09, Herbert Xu  wrote:
> On Fri, Nov 14, 2014 at 01:34:59PM -0800, Kees Cook wrote:
>> This prefixes all crypto module loading with "crypto-" so we never run
>> the risk of exposing module auto-loading to userspace via a crypto API,
>> as demonstrated by Mathias Krause:
>>
>> https://lkml.org/lkml/2013/3/4/70
>>
>> Signed-off-by: Kees Cook 
>
> Sorry but this doesn't build for me:
>
>   CC [M]  drivers/crypto/qat/qat_common/adf_ctl_drv.o
> drivers/crypto/qat/qat_common/adf_ctl_drv.c:490:21: error: expected 
> declaration specifiers or ‘...’ before string constant
> make[3]: *** [drivers/crypto/qat/qat_common/adf_ctl_drv.o] Error 1
> make[2]: *** [drivers/crypto/qat/qat_common] Error 2
> make[1]: *** [drivers/crypto/qat] Error 2
> make[1]: *** Waiting for unfinished jobs

Looks like drivers/crypto/qat/qat_common/adf_ctl_drv.c is missing
'#include '.

Mathias


[PATCH 1/3] crypto: aesni - fix counter overflow handling in "by8" variant

2014-09-28 Thread Mathias Krause
The "by8" CTR AVX implementation fails to propperly handle counter
overflows. That was the reason it got disabled in commit 7da4b29d496b
("crypto: aesni - disable "by8" AVX CTR optimization").

Fix the overflow handling by incrementing the counter block as a double
quad word, i.e. as a 128 bit value, and testing for overflows
afterwards. We need to use VPTEST to do so, as VPADD* does not set the
flags itself and silently drops the carry bit.
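
Expressed in plain C, the carry handling boils down to something like
the following sketch (illustration only -- the asm below works on
byte-swapped XMM registers, adds up to eight at a time and uses VPTEST
to detect the wrap):

#include <stdint.h>

/* 128 bit counter, split into two 64 bit halves (host byte order) */
struct ctr128 {
	uint64_t lo;
	uint64_t hi;
};

static void ctr128_add(struct ctr128 *c, uint64_t n)
{
	c->lo += n;         /* add on the low quad word ...            */
	if (c->lo < n)      /* ... the sum wrapped around: carry out   */
		c->hi++;    /* propagate the carry into the high half */
}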

As this change adds branches to the hot path, minor performance
regressions might be a side effect. But, OTOH, we now have a conforming
implementation -- the preferable goal.

A tcrypt test on a SandyBridge system (i7-2620M) showed almost identical
numbers for the old and this version with differences within the noise
range. A dm-crypt test with the fixed version gave even slightly better
results for this version. So the performance impact might not be as big
as expected.

Tested-by: Romain Francoise 
Signed-off-by: Mathias Krause 
Cc: Chandramouli Narayanan 
---
 arch/x86/crypto/aes_ctrby8_avx-x86_64.S |   17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S 
b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
index f091f122ed24..a029bc744244 100644
--- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
+++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
@@ -108,6 +108,10 @@
 
 byteswap_const:
.octa 0x000102030405060708090A0B0C0D0E0F
+ddq_low_msk:
+   .octa 0x
+ddq_high_add_1:
+   .octa 0x0001
 ddq_add_1:
.octa 0x0001
 ddq_add_2:
@@ -169,7 +173,12 @@ ddq_add_8:
.rept (by - 1)
club DDQ_DATA, i
club XDATA, i
-   vpaddd  var_ddq_add(%rip), xcounter, var_xdata
+   vpaddq  var_ddq_add(%rip), xcounter, var_xdata
+   vptest  ddq_low_msk(%rip), var_xdata
+   jnz 1f
+   vpaddq  ddq_high_add_1(%rip), var_xdata, var_xdata
+   vpaddq  ddq_high_add_1(%rip), xcounter, xcounter
+   1:
vpshufb xbyteswap, var_xdata, var_xdata
.set i, (i +1)
.endr
@@ -178,7 +187,11 @@ ddq_add_8:
 
vpxor   xkey0, xdata0, xdata0
club DDQ_DATA, by
-   vpaddd  var_ddq_add(%rip), xcounter, xcounter
+   vpaddq  var_ddq_add(%rip), xcounter, xcounter
+   vptest  ddq_low_msk(%rip), xcounter
+   jnz 1f
+   vpaddq  ddq_high_add_1(%rip), xcounter, xcounter
+   1:
 
.set i, 1
.rept (by - 1)
-- 
1.7.10.4



[PATCH 3/3] Revert "crypto: aesni - disable "by8" AVX CTR optimization"

2014-09-28 Thread Mathias Krause
This reverts commit 7da4b29d496b1389d3a29b55d3668efecaa08ebd.

Now, that the issue is fixed, we can re-enable the code.

Signed-off-by: Mathias Krause 
Cc: Chandramouli Narayanan 
---
 arch/x86/crypto/aesni-intel_glue.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c 
b/arch/x86/crypto/aesni-intel_glue.c
index a7ccd57f19e4..888950f29fd9 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -481,7 +481,7 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx,
crypto_inc(ctrblk, AES_BLOCK_SIZE);
 }
 
-#if 0  /* temporary disabled due to failing crypto tests */
+#ifdef CONFIG_AS_AVX
 static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
  const u8 *in, unsigned int len, u8 *iv)
 {
@@ -1522,7 +1522,7 @@ static int __init aesni_init(void)
aesni_gcm_dec_tfm = aesni_gcm_dec;
}
aesni_ctr_enc_tfm = aesni_ctr_enc;
-#if 0  /* temporary disabled due to failing crypto tests */
+#ifdef CONFIG_AS_AVX
if (cpu_has_avx) {
/* optimize performance of ctr mode encryption transform */
aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm;
-- 
1.7.10.4



[PATCH 2/3] crypto: aesni - remove unused defines in "by8" variant

2014-09-28 Thread Mathias Krause
The defines for xkey3, xkey6 and xkey9 are not used in the code. They're
probably leftovers from merging the three source files for 128, 192 and
256 bit AES. They can safely be removed.

Signed-off-by: Mathias Krause 
Cc: Chandramouli Narayanan 
---
 arch/x86/crypto/aes_ctrby8_avx-x86_64.S |3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S 
b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
index a029bc744244..2df2a0298f5a 100644
--- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
+++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
@@ -79,9 +79,6 @@
 #define xcounter   %xmm8
 #define xbyteswap  %xmm9
 #define xkey0  %xmm10
-#define xkey3  %xmm11
-#define xkey6  %xmm12
-#define xkey9  %xmm13
 #define xkey4  %xmm11
 #define xkey8  %xmm12
 #define xkey12 %xmm13
-- 
1.7.10.4



[PATCH 0/3] crypto: aesni - fix and re-enable "by8" CTR variant

2014-09-28 Thread Mathias Krause
This series fixes the counter overflow handling of the "by8" CTR variant
which led to failing cryptomgr tests and, in turn, to disabling this
optimization with commit 7da4b29d496b.

Patch 1 fixes the bug, patch 2 removes some unused defines (leftovers
from the unification of the initial source files) and patch 3 re-enables
the code.

The fix was tested by me, doing tcrypt and dm-crypt tests. It was also
tested by Romain who initially reported the issue.

The patches should go on top of crypto-2.6.git.

In case this doesn't get merged for v3.17, patches 1 and 3 may be cc'ed
to stable to propagate the fix.

Please apply!

Thanks,
Mathias


Mathias Krause (3):
  crypto: aesni - fix counter overflow handling in "by8" variant
  crypto: aesni - remove unused defines in "by8" variant
  Revert "crypto: aesni - disable "by8" AVX CTR optimization"

 arch/x86/crypto/aes_ctrby8_avx-x86_64.S |   20 +++-
 arch/x86/crypto/aesni-intel_glue.c  |4 ++--
 2 files changed, 17 insertions(+), 7 deletions(-)

-- 
1.7.10.4



Re: v3.17-rc5: alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni

2014-09-24 Thread Mathias Krause
On 25 September 2014 00:23, chandramouli narayanan
 wrote:
> On Mon, 2014-09-22 at 00:28 +0200, Mathias Krause wrote:
>> What might be worth noting, the failing test uses an IV value of
>> "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFD",
>> or, in other words, a counter value that'll wrap after processing
>> three blocks. The Crypto++ implementation explicitly states, it can
>> handle the wrap around (see [1]). Dumping the IV before and after the
>> cryptomgr tests shows, the "by8" implementation only handles the lower
>> 32 bit as a counter. Looking at RFC 3686, it defines the "counter
>> block" as a 128 bit combination of nonce, IV and a 32 bit counter
>> value. It also defines the initial value of the counter part (1) and
>> how it should be incremented (increment the whole counter block, i.e.
>> the 128 bit value). However, it also states that the maximum number
>> blocks per packet are (2^32)-1. So, according to the RFC, the wrap
>> around cannot happen -- not even for the 32 bit part of the counter
>> block. However the other aesni-backed implementation does handle the
>> wrap around just fine. It does so by doing the increment on a integer
>> register so it can use the carry flag to detect the wrap around.
>> Changing the "by8" variant would be straight forward, but I fear
>> performance implications... :/
>>
>
> It will take some time for me to debug this issue. Is there a list of
> test vectors to debug with?

The failing test is aes_ctr_enc_tv_template[3] from crypto/testmgr.h.
It fails because of a wrong handling of counter overflows.

I'm already working on a patch that fixes the counter overflow issue.
It passes the cryptomgr test but I like to do some more thorough
tests. Especially some performance measurements as we now have
branches in the hot path.

I don't know if we should still rush fix this for v3.17 or delay this
for the next merge window. The offending code was already disabled in
Linus' tree and the fixes would be small enough to be backported if
there is interest.

Regards,
Mathias

>
> thanks
> -mouli


[PATCH] crypto: aesni - disable "by8" AVX CTR optimization

2014-09-23 Thread Mathias Krause
The "by8" implementation introduced in commit 22cddcc7df8f ("crypto: aes
- AES CTR x86_64 "by8" AVX optimization") is failing crypto tests as it
handles counter block overflows differently. It only accounts the right
most 32 bit as a counter -- not the whole block as all other
implementations do. This makes it fail the cryptomgr test #4 that
specifically tests this corner case.

As we're quite late in the release cycle, just disable the "by8" variant
for now.

Reported-by: Romain Francoise 
Signed-off-by: Mathias Krause 
Cc: Chandramouli Narayanan 

---
I'll try to create a real fix next week but I can't promise much. If
Linus releases v3.17 early, as he has mentioned in his -rc6
announcement, we should hot fix this by just disabling the "by8"
variant. The real fix would add the necessary counter block handling to
the asm code and revert this commit afterwards. Reverting the whole code
is not necessary, imho.

Would that be okay for you, Herbert?
---
 arch/x86/crypto/aesni-intel_glue.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c 
b/arch/x86/crypto/aesni-intel_glue.c
index 888950f29fd9..a7ccd57f19e4 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -481,7 +481,7 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx,
crypto_inc(ctrblk, AES_BLOCK_SIZE);
 }
 
-#ifdef CONFIG_AS_AVX
+#if 0  /* temporary disabled due to failing crypto tests */
 static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
  const u8 *in, unsigned int len, u8 *iv)
 {
@@ -1522,7 +1522,7 @@ static int __init aesni_init(void)
aesni_gcm_dec_tfm = aesni_gcm_dec;
}
aesni_ctr_enc_tfm = aesni_ctr_enc;
-#ifdef CONFIG_AS_AVX
+#if 0  /* temporary disabled due to failing crypto tests */
if (cpu_has_avx) {
/* optimize performance of ctr mode encryption transform */
aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm;
-- 
1.7.10.4



Re: v3.17-rc5: alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni

2014-09-21 Thread Mathias Krause
On 17 September 2014 13:29, Herbert Xu  wrote:
> On Tue, Sep 16, 2014 at 10:01:07PM +0200, Romain Francoise wrote:
>> On Mon, Sep 15, 2014 at 10:36:46AM +0200, Romain Francoise wrote:
>> > I upgraded from v3.16 to v3.17-rc5 and the ctr-aes-aesni encryption test
>> > fails, which makes my IPsec tunnels unhappy (see trace below). Before I
>> > start bisecting (2cddcc7df8fd3 is probably my first guess), is this
>> > already known?
>>
>> > Sep 15 08:07:56 silenus kernel: [   35.137145] alg: skcipher: Test 4 
>> > failed on encryption for ctr-aes-aesni [...]
>>
>> Update: reverting 2cddcc7df8 ("crypto: aes - AES CTR x86_64 "by8" AVX
>> optimization") fixes this. This machine is Sandy Bridge (Core i7-2600),
>> and the problem doesn't seem to occur with the exact same kernel image
>> on Ivy Bridge (Xeon E3-1240v2).
>
> Thanks for bisecting.  If we can't fix this quickly then we should
> just revert it for now.
>

[Adding James and Sean as they're stated as "contact information"]

I compared the implementation against the original code from Intel
referenced in the source file and found a few differences. But even
after removing those, the code still generates the same error. So if
Chandramouli does not come up with something, we should revert it, as
the reference implementation from Intel is either a) wrong or b) used
wrongly in the Linux kernel.

What might be worth noting: the failing test uses an IV value of
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFD",
or, in other words, a counter value that'll wrap after processing
three blocks. The Crypto++ implementation explicitly states it can
handle the wrap around (see [1]). Dumping the IV before and after the
cryptomgr tests shows that the "by8" implementation only handles the
lower 32 bits as a counter. Looking at RFC 3686, it defines the
"counter block" as a 128 bit combination of nonce, IV and a 32 bit
counter value. It also defines the initial value of the counter part
(1) and how it should be incremented (increment the whole counter
block, i.e. the 128 bit value). However, it also states that the
maximum number of blocks per packet is (2^32)-1. So, according to the
RFC, the wrap around cannot happen -- not even for the 32 bit part of
the counter block. The other aesni-backed implementation, however,
does handle the wrap around just fine. It does so by doing the
increment on an integer register so it can use the carry flag to
detect the wrap around. Changing the "by8" variant would be
straightforward, but I fear performance implications... :/
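
For reference, the counter block layout RFC 3686 describes looks like
this (a C sketch for illustration only; the struct and field names are
mine, not taken from the kernel):

#include <linux/types.h>

struct rfc3686_ctr_block {	/* 16 bytes == one AES block */
	u8     nonce[4];	/* per-SA nonce */
	u8     iv[8];		/* per-packet IV */
	__be32 counter;		/* 32 bit block counter, initialized to 1 */
};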

Looking back at the test vector, even if it might be inappropriate for
IPsec, it is still valid for AES-CTR in the general case. So IMHO the
"by8" implementation is wrong and should be fixed -- or reverted, for
that matter.

James, Sean: Have you observed any interoperability problems with the
Intel reference implementation for the AVX by8 variant of AES-CTR?
Especially, have you tested the code for wrapping counter values?


Regards,
Mathias

[1] http://www.cryptopp.com/wiki/CTR_Mode


Re: v3.17-rc5: alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni

2014-09-17 Thread Mathias Krause
On 17 September 2014 22:10, Mathias Krause  wrote:
> On 16 September 2014 22:01, Romain Francoise  wrote:
>> On Mon, Sep 15, 2014 at 10:36:46AM +0200, Romain Francoise wrote:
>>> I upgraded from v3.16 to v3.17-rc5 and the ctr-aes-aesni encryption test
>>> fails, which makes my IPsec tunnels unhappy (see trace below). Before I
>>> start bisecting (2cddcc7df8fd3 is probably my first guess), is this
>>> already known?
>>
>>> Sep 15 08:07:56 silenus kernel: [   35.137145] alg: skcipher: Test 4 failed 
>>> on encryption for ctr-aes-aesni [...]
>
> I do not get the above message on a SandyBridge i7-2620M, even though
> the module makes use of the "by8" variant on my system, too:
>

Never mind! After playing a little with crconf to instantiate
'ctr(aes)', I can see the failing test now too.

Thanks,
Mathias


Re: v3.17-rc5: alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni

2014-09-17 Thread Mathias Krause
On 16 September 2014 22:01, Romain Francoise  wrote:
> On Mon, Sep 15, 2014 at 10:36:46AM +0200, Romain Francoise wrote:
>> I upgraded from v3.16 to v3.17-rc5 and the ctr-aes-aesni encryption test
>> fails, which makes my IPsec tunnels unhappy (see trace below). Before I
>> start bisecting (2cddcc7df8fd3 is probably my first guess), is this
>> already known?
>
>> Sep 15 08:07:56 silenus kernel: [   35.137145] alg: skcipher: Test 4 failed 
>> on encryption for ctr-aes-aesni [...]

I do not get the above message on a SandyBridge i7-2620M, even though
the module makes use of the "by8" variant on my system, too:

[0.340626] AVX version of gcm_enc/dec engaged.
[0.340627] AES CTR mode by8 optimization enabled
[0.341273] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)

> Update: reverting 2cddcc7df8 ("crypto: aes - AES CTR x86_64 "by8" AVX
> optimization") fixes this. This machine is Sandy Bridge (Core i7-2600),
> and the problem doesn't seem to occur with the exact same kernel image
> on Ivy Bridge (Xeon E3-1240v2).

Can you please provide the full kernel log and /proc/cpuinfo of those
machines? It would be interesting to know which variant was used on
those machines -- the new "by8" or the old one.


Thanks,
Mathias


Re: [PATCH] x86, crypto: Check if gas supports CRC32

2014-07-30 Thread Mathias Krause
On 31 July 2014 00:11, Borislav Petkov  wrote:
>
> On Wed, Jul 30, 2014 at 11:28:14PM +0200, Mathias Krause wrote:
> > Gah, CONFIG_AS_CRC32 gets defined as a preprocessor symbol only so
> > cannot be used in makefiles. So crc32c-pcl-intel-asm_64.S needs a
> > "#ifdef CONFIG_AS_CRC32" guard and still be compiled for CONFIG_64BIT,
> > as it is now. It'll be an empty object for older binutils versions not
> > supporting the crc32 instruction.
>
> Yeah, that makes it all even simpler, thanks!
>
> We're still b0rked though:
>
> arch/x86/crypto/crct10dif-pcl-asm_64.S: Assembler messages:
> arch/x86/crypto/crct10dif-pcl-asm_64.S:147: Error: no such instruction: 
> `pclmulqdq $0x0,%xmm10,%xmm0'
> arch/x86/crypto/crct10dif-pcl-asm_64.S:148: Error: no such instruction: 
> `pclmulqdq $0x11,%xmm10,%xmm8'
> arch/x86/crypto/crct10dif-pcl-asm_64.S:149: Error: no such instruction: 
> `pclmulqdq $0x0,%xmm10,%xmm1'
> ...
>
> and need checking for more instructions. I'll play with this more
> tomorrow.
>

You can probably reuse the AVX test for this -- either the
CONFIG_AS_AVX preprocessor one or the $(avx_supported) make one, local
to arch/x86/crypto/Makefile.
Even though the CLMUL feature has not much to do with AVX (it has a
dedicated CPUID feature bit), support for it in binutils was added
together with AVX support, see
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c0f3af977b0f28a0dc5a620110b8dcf9d8286f84

Regards,
Mathias

> Good night :-)
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --


Re: [PATCH] x86, crypto: Check if gas supports CRC32

2014-07-30 Thread Mathias Krause
On Wed, Jul 30, 2014 at 11:19:37PM +0200, Mathias Krause wrote:
> On Wed, Jul 30, 2014 at 06:47:01PM +0200, Borislav Petkov wrote:
> > [...]
> 
> This looks too complicated. We do have as-instr for exactly those kind
> of tests. And, in fact, looking at arch/x86/Makefile we already have one
> for crc32:
> 
>   asinstr += $(call as-instr,crc32l %eax$(comma)%eax,-DCONFIG_AS_CRC32=1)
> 
> So you can just used CONFIG_AS_CRC32 for your tests and drop the shell
> script.
> 
> > +ifdef CONFIG_64BIT
> > +crc32c-intel-$(CONFIG_GAS_SUPPORTS_CRC32) += crc32c-pcl-intel-asm_64.o
> > +endif
> 
> s/CONFIG_GAS_SUPPORTS_CRC32/CONFIG_AS_CRC32/ here and for all further
> uses.
> 

Gah, CONFIG_AS_CRC32 gets defined as a preprocessor symbol only, so it
cannot be used in makefiles. So crc32c-pcl-intel-asm_64.S needs a
"#ifdef CONFIG_AS_CRC32" guard and still has to be compiled for
CONFIG_64BIT, as it is now. It'll be an empty object for older binutils
versions not supporting the crc32 instruction.

Sorry for the confusion.


Regards,
Mathias


Re: [PATCH] x86, crypto: Check if gas supports CRC32

2014-07-30 Thread Mathias Krause
On Wed, Jul 30, 2014 at 06:47:01PM +0200, Borislav Petkov wrote:
> Building current kernel with some old toolchain (gcc 4.1.2 and gas 2.17)
> chokes with:
> 
> arch/x86/crypto/crc32c-pcl-intel-asm_64.S: Assembler messages:
> arch/x86/crypto/crc32c-pcl-intel-asm_64.S:128: Error: no such instruction: 
> `crc32b %bl,%r8d'
> arch/x86/crypto/crc32c-pcl-intel-asm_64.S:204: Error: no such instruction: 
> `crc32q -i*8(%rcx),%r8'
> ...
> 
> due to the fact that gas doesn't know the CRC32 instruction. Check that
> before building.
> 
> Cc: linux-crypto@vger.kernel.org
> Signed-off-by: Borislav Petkov 
> ---
> 
> So this is one possibility to address this. Code already
> has the ifdeffery around crc_pcl() which is implemented in
> crc32c-pcl-intel-asm_64.S so we can piggyback on that and not build that
> file if gas doesn't know CRC32.
> 
> If no CRC32 support, it still builds fine silently, however it would be
> better to probably say that due to old toolchain, kernel doesn't include
> fast CRC32 stuff in crc32c-intel.ko.
> 
> Hmmm.
> 
> 
>  arch/x86/crypto/Makefile|  8 +++-
>  arch/x86/crypto/crc32c-intel_glue.c |  7 ---
>  scripts/gas-can-do-crc32.sh | 12 
>  3 files changed, 23 insertions(+), 4 deletions(-)
>  create mode 100755 scripts/gas-can-do-crc32.sh
> 
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index 61d6e281898b..707bf7ecb903 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -83,7 +83,13 @@ ifeq ($(avx2_supported),yes)
>  sha1-ssse3-y += sha1_avx2_x86_64_asm.o
>  endif
>  crc32c-intel-y := crc32c-intel_glue.o
> -crc32c-intel-$(CONFIG_64BIT) += crc32c-pcl-intel-asm_64.o
> +
> +gas_can_crc32 := $(srctree)/scripts/gas-can-do-crc32.sh
> +CONFIG_GAS_SUPPORTS_CRC32=$(shell $(CONFIG_SHELL) $(gas_can_crc32) $(AS))
> +

This looks too complicated. We do have as-instr for exactly this kind
of test. And, in fact, looking at arch/x86/Makefile we already have one
for crc32:

  asinstr += $(call as-instr,crc32l %eax$(comma)%eax,-DCONFIG_AS_CRC32=1)

So you can just use CONFIG_AS_CRC32 for your tests and drop the shell
script.

> +ifdef CONFIG_64BIT
> +crc32c-intel-$(CONFIG_GAS_SUPPORTS_CRC32) += crc32c-pcl-intel-asm_64.o
> +endif

s/CONFIG_GAS_SUPPORTS_CRC32/CONFIG_AS_CRC32/ here and for all further
uses.

>  crc32-pclmul-y := crc32-pclmul_asm.o crc32-pclmul_glue.o
>  sha256-ssse3-y := sha256-ssse3-asm.o sha256-avx-asm.o sha256-avx2-asm.o 
> sha256_ssse3_glue.o
>  sha512-ssse3-y := sha512-ssse3-asm.o sha512-avx-asm.o sha512-avx2-asm.o 
> sha512_ssse3_glue.o
> diff --git a/arch/x86/crypto/crc32c-intel_glue.c 
> b/arch/x86/crypto/crc32c-intel_glue.c
> index 6812ad98355c..4ce8e2db2984 100644
> --- a/arch/x86/crypto/crc32c-intel_glue.c
> +++ b/arch/x86/crypto/crc32c-intel_glue.c
> @@ -46,7 +46,7 @@
>  #define REX_PRE
>  #endif
>  
> -#ifdef CONFIG_X86_64
> +#if defined(CONFIG_X86_64) && defined(CONFIG_GAS_SUPPORTS_CRC32)
>  /*
>   * use carryless multiply version of crc32c when buffer
>   * size is >= 512 (when eager fpu is enabled) or
> @@ -181,7 +181,7 @@ static int crc32c_intel_cra_init(struct crypto_tfm *tfm)
>   return 0;
>  }
>  
> -#ifdef CONFIG_X86_64
> +#if defined(CONFIG_X86_64) && defined(CONFIG_GAS_SUPPORTS_CRC32)
>  static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
>  unsigned int len)
>  {
> @@ -257,7 +257,8 @@ static int __init crc32c_intel_mod_init(void)
>  {
>   if (!x86_match_cpu(crc32c_cpu_id))
>   return -ENODEV;
> -#ifdef CONFIG_X86_64
> +
> +#if defined(CONFIG_X86_64) && defined(CONFIG_GAS_SUPPORTS_CRC32)
>   if (cpu_has_pclmulqdq) {
>   alg.update = crc32c_pcl_intel_update;
>   alg.finup = crc32c_pcl_intel_finup;
> diff --git a/scripts/gas-can-do-crc32.sh b/scripts/gas-can-do-crc32.sh
> new file mode 100755
> index ..88ff476bd3cc
> --- /dev/null
> +++ b/scripts/gas-can-do-crc32.sh
> @@ -0,0 +1,12 @@
> +#!/bin/sh
> +
> +TMP=$(mktemp)
> +
> +echo "crc32 %rax,%rbx" | $* --warn - -o $TMP 2>/dev/null
> +if [ "$?" -eq "0" ]; then
> + echo y
> +else
> + echo n
> +fi
> +
> +rm -f $TMP
> -- 
> 2.0.0
> 

Regards,
Mathias


Re: [PATCH v5 1/1] crypto: AES CTR x86_64 "by8" AVX optimization

2014-06-10 Thread Mathias Krause
  do_aes_load 2, \key_len
> +   add $(2*16), p_out
> +   and $(~7*16), num_bytes
> +   jz  .Ldo_return2\key_len
> +   jmp .Lmain_loop2\key_len
> +
> +
> +.Leq3\key_len:
> +   do_aes_load 3, \key_len
> +   add $(3*16), p_out
> +   and $(~7*16), num_bytes
> +   jz  .Ldo_return2\key_len
> +   jmp .Lmain_loop2\key_len
> +
> +.Leq4\key_len:
> +   do_aes_load 4, \key_len
> +   add $(4*16), p_out
> +   and $(~7*16), num_bytes
> +   jz  .Ldo_return2\key_len
> +   jmp .Lmain_loop2\key_len
> +
> +.Lgt4\key_len:
> +   cmp $(6*16), tmp
> +   jg  .Leq7\key_len
> +   je  .Leq6\key_len
> +
> +.Leq5\key_len:
> +   do_aes_load 5, \key_len
> +   add $(5*16), p_out
> +   and $(~7*16), num_bytes
> +   jz  .Ldo_return2\key_len
> +   jmp .Lmain_loop2\key_len
> +
> +.Leq6\key_len:
> +   do_aes_load 6, \key_len
> +   add $(6*16), p_out
> +   and $(~7*16), num_bytes
> +   jz  .Ldo_return2\key_len
> +   jmp .Lmain_loop2\key_len
> +
> +.Leq7\key_len:
> +   do_aes_load 7, \key_len
> +   add $(7*16), p_out
> +   and $(~7*16), num_bytes
> +   jz  .Ldo_return2\key_len
> +   jmp .Lmain_loop2\key_len
> +
> +.Lmult_of_8_blks\key_len:
> +   .if (\key_len != KEY_128)
> +   vmovdqa 0*16(p_keys), xkey0
> +   vmovdqa 4*16(p_keys), xkey4
> +   vmovdqa 8*16(p_keys), xkey8
> +   vmovdqa 12*16(p_keys), xkey12
> +   .else
> +   vmovdqa 0*16(p_keys), xkey0
> +   vmovdqa 3*16(p_keys), xkey4
> +   vmovdqa 6*16(p_keys), xkey8
> +   vmovdqa 9*16(p_keys), xkey12
> +   .endif
> +.align 16
> +.Lmain_loop2\key_len:
> +   /* num_bytes is a multiple of 8 and >0 */
> +   do_aes_noload   8, \key_len
> +   add $(8*16), p_out
> +   sub $(8*16), num_bytes
> +   jne .Lmain_loop2\key_len
> +
> +.Ldo_return2\key_len:
> +   /* return updated IV */
> +   vpshufb xbyteswap, xcounter, xcounter
> +   vmovdqu xcounter, (p_iv)
> +   ret
> +.endm
> +
> +/*
> + * routine to do AES128 CTR enc/decrypt "by8"
> + * XMM registers are clobbered.
> + * Saving/restoring must be done at a higher level
> + * aes_ctr_enc_128_avx_by8(void *in, void *iv, void *keys, void *out,
> + * unsigned int num_bytes)
> + */
> +ENTRY(aes_ctr_enc_128_avx_by8)
> +   /* call the aes main loop */
> +   do_aes_ctrmain KEY_128
> +
> +ENDPROC(aes_ctr_enc_128_avx_by8)
> +
> +/*
> + * routine to do AES192 CTR enc/decrypt "by8"
> + * XMM registers are clobbered.
> + * Saving/restoring must be done at a higher level
> + * aes_ctr_enc_192_avx_by8(void *in, void *iv, void *keys, void *out,
> + * unsigned int num_bytes)
> + */
> +ENTRY(aes_ctr_enc_192_avx_by8)
> +   /* call the aes main loop */
> +   do_aes_ctrmain KEY_192
> +
> +ENDPROC(aes_ctr_enc_192_avx_by8)
> +
> +/*
> + * routine to do AES256 CTR enc/decrypt "by8"
> + * XMM registers are clobbered.
> + * Saving/restoring must be done at a higher level
> + * aes_ctr_enc_256_avx_by8(void *in, void *iv, void *keys, void *out,
> + * unsigned int num_bytes)
> + */
> +ENTRY(aes_ctr_enc_256_avx_by8)
> +   /* call the aes main loop */
> +   do_aes_ctrmain KEY_256
> +
> +ENDPROC(aes_ctr_enc_256_avx_by8)
> diff --git a/arch/x86/crypto/aesni-intel_glue.c 
> b/arch/x86/crypto/aesni-intel_glue.c
> index 948ad0e..888950f 100644
> --- a/arch/x86/crypto/aesni-intel_glue.c
> +++ b/arch/x86/crypto/aesni-intel_glue.c
> @@ -105,6 +105,9 @@ void crypto_fpu_exit(void);
>  #define AVX_GEN4_OPTSIZE 4096
>
>  #ifdef CONFIG_X86_64
> +
> +static void (*aesni_ctr_enc_tfm)(struct crypto_aes_ctx *ctx, u8 *out,
> + const u8 *in, unsigned int len, u8 *iv);
>  asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
>   const u8 *in, unsigned int len, u8 *iv);
>
> @@ -155,6 +158,12 @@ asmlinkage void aesni_gcm_dec(void *ctx, u8 *out,
>
>
>  #ifdef CONFIG_AS_AVX
> +asmlinkage void aes_ctr_enc_128_avx_by8(const u8 *in, u8 *iv,
> +   void *keys, u8 *out, unsigned int num_bytes);
> +asmlinkage void aes_ctr_enc_192_avx_by8(const u8 *in, u8 *iv,
> +   void *keys, u8 *out, unsigned int num_bytes);
> +asmlinkage void aes_ctr_enc_256_avx_by8(const u8 *in, u8 *iv,
> +   void *keys, u8 *out, unsigned int num_bytes);
>  /*
>   * asmlinkage void aesni_gcm_precomp_avx_gen2()
>   * gcm_data *my_ctx_data, context data
> @@ -472,6 +481,25 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx,
> crypto_inc(ctrblk, AES_BLOCK_SIZE);
>  }
>
> +#ifdef CONFIG_AS_AVX
> +static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
> + const u8 *in, unsigned int len, u8 *iv)
> +{
> +   /*
> +* based on key length, override with the by8 version
> +* of ctr mode encryption/decryption for improved performance
> +* aes_set_key_common() ensures that key length is one of
> +* {128,192,256}
> +*/
> +   if (ctx->key_length == AES_KEYSIZE_128)
> +   aes_ctr_enc_128_avx_by8(in, iv, (void *)ctx, out, len);
> +   else if (ctx->key_length == AES_KEYSIZE_192)
> +   aes_ctr_enc_192_avx_by8(in, iv, (void *)ctx, out, len);
> +   else
> +   aes_ctr_enc_256_avx_by8(in, iv, (void *)ctx, out, len);
> +}
> +#endif
> +
>  static int ctr_crypt(struct blkcipher_desc *desc,
>  struct scatterlist *dst, struct scatterlist *src,
>  unsigned int nbytes)
> @@ -486,8 +514,8 @@ static int ctr_crypt(struct blkcipher_desc *desc,
>
> kernel_fpu_begin();
> while ((nbytes = walk.nbytes) >= AES_BLOCK_SIZE) {
> -   aesni_ctr_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
> - nbytes & AES_BLOCK_MASK, walk.iv);
> +   aesni_ctr_enc_tfm(ctx, walk.dst.virt.addr, walk.src.virt.addr,
> + nbytes & AES_BLOCK_MASK, walk.iv);
> nbytes &= AES_BLOCK_SIZE - 1;
> err = blkcipher_walk_done(desc, &walk, nbytes);
> }
> @@ -1493,6 +1521,14 @@ static int __init aesni_init(void)
> aesni_gcm_enc_tfm = aesni_gcm_enc;
> aesni_gcm_dec_tfm = aesni_gcm_dec;
> }
> +   aesni_ctr_enc_tfm = aesni_ctr_enc;
> +#ifdef CONFIG_AS_AVX
> +   if (cpu_has_avx) {
> +   /* optimize performance of ctr mode encryption transform */
> +   aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm;
> +   pr_info("AES CTR mode by8 optimization enabled\n");
> +   }
> +#endif
>  #endif
>
> err = crypto_fpu_init();
> --
> 1.8.2.1
>
>

Patch is
Reviewed-by: Mathias Krause 

Thanks, Chandramouli!

Regards,
Mathias


Re: [PATCH v4 1/1] crypto: AES CTR x86_64 "by8" AVX optimization

2014-06-10 Thread Mathias Krause
On 9 June 2014 19:41, chandramouli narayanan  wrote:
> [...]
> @@ -1493,6 +1521,14 @@ static int __init aesni_init(void)
> aesni_gcm_enc_tfm = aesni_gcm_enc;
> aesni_gcm_dec_tfm = aesni_gcm_dec;
> }
> +   aesni_ctr_enc_tfm = aesni_ctr_enc;
> +#ifdef CONFIG_AS_AVX
> +   if (cpu_has_avx) {
> +   /* optimize performance of ctr mode encryption trasform */

The "trasform" typo is also still there. :/

> +   aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm;
> +   pr_info("AES CTR mode by8 optimization enabled\n");
> +   }
> +#endif
>  #endif
>
> err = crypto_fpu_init();

Mathias


Re: [PATCH v3 1/1] crypto: AES CTR x86_64 "by8" AVX optimization

2014-06-06 Thread Mathias Krause
+ jg  .Leq7\key_len
> + je  .Leq6\key_len
> +
> +.Leq5\key_len:
> + do_aes_load 5, \key_len
> + add $(5*16), p_out
> + and $(~7*16), num_bytes
> + jz  .Ldo_return2\key_len
> + jmp .Lmain_loop2\key_len
> +
> +.Leq6\key_len:
> + do_aes_load 6, \key_len
> + add $(6*16), p_out
> + and $(~7*16), num_bytes
> + jz  .Ldo_return2\key_len
> + jmp .Lmain_loop2\key_len
> +
> +.Leq7\key_len:
> + do_aes_load 7, \key_len
> + add $(7*16), p_out
> + and $(~7*16), num_bytes
> + jz  .Ldo_return2\key_len
> + jmp .Lmain_loop2\key_len
> +
> +.Lmult_of_8_blks\key_len:
> + .if (\key_len != KEY_128)
> + vmovdqa 0*16(p_keys), xkey0
> + vmovdqa 4*16(p_keys), xkey4
> + vmovdqa 8*16(p_keys), xkey8
> + vmovdqa 12*16(p_keys), xkey12
> + .else
> + vmovdqa 0*16(p_keys), xkey0
> + vmovdqa 3*16(p_keys), xkey4
> + vmovdqa 6*16(p_keys), xkey8
> + vmovdqa 9*16(p_keys), xkey12
> + .endif
> +.align 16
> +.Lmain_loop2\key_len:
> + /* num_bytes is a multiple of 8 and >0 */
> + do_aes_noload   8, \key_len
> + add $(8*16), p_out
> + sub $(8*16), num_bytes
> + jne .Lmain_loop2\key_len
> +
> +.Ldo_return2\key_len:
> + /* return updated IV */
> + vpshufb xbyteswap, xcounter, xcounter
> + vmovdqu xcounter, (p_iv)
> + ret
> +.endm
> +
> +/*
> + * routine to do AES128 CTR enc/decrypt "by8"
> + * XMM registers are clobbered.
> + * Saving/restoring must be done at a higher level
> + * aes_ctr_enc_128_avx_by8(void *in, void *iv, void *keys, void *out,
> + *   unsigned int num_bytes)
> + */
> +ENTRY(aes_ctr_enc_128_avx_by8)
> + /* call the aes main loop */
> + do_aes_ctrmain KEY_128
> +
> +ENDPROC(aes_ctr_enc_128_avx_by8)
> +
> +/*
> + * routine to do AES192 CTR enc/decrypt "by8"
> + * XMM registers are clobbered.
> + * Saving/restoring must be done at a higher level
> + * aes_ctr_enc_192_avx_by8(void *in, void *iv, void *keys, void *out,
> + *   unsigned int num_bytes)
> + */
> +ENTRY(aes_ctr_enc_192_avx_by8)
> + /* call the aes main loop */
> + do_aes_ctrmain KEY_192
> +
> +ENDPROC(aes_ctr_enc_192_avx_by8)
> +
> +/*
> + * routine to do AES256 CTR enc/decrypt "by8"
> + * XMM registers are clobbered.
> + * Saving/restoring must be done at a higher level
> + * aes_ctr_enc_256_avx_by8(void *in, void *iv, void *keys, void *out,
> + *   unsigned int num_bytes)
> + */
> +ENTRY(aes_ctr_enc_256_avx_by8)
> + /* call the aes main loop */
> + do_aes_ctrmain KEY_256
> +
> +ENDPROC(aes_ctr_enc_256_avx_by8)
> diff --git a/arch/x86/crypto/aesni-intel_glue.c 
> b/arch/x86/crypto/aesni-intel_glue.c
> index 948ad0e..0179b14 100644
> --- a/arch/x86/crypto/aesni-intel_glue.c
> +++ b/arch/x86/crypto/aesni-intel_glue.c
> @@ -105,6 +105,9 @@ void crypto_fpu_exit(void);
>  #define AVX_GEN4_OPTSIZE 4096
>  
>  #ifdef CONFIG_X86_64
> +
> +static void (*aesni_ctr_enc_tfm)(struct crypto_aes_ctx *ctx, u8 *out,
> +   const u8 *in, unsigned int len, u8 *iv);
>  asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
> const u8 *in, unsigned int len, u8 *iv);
>  
> @@ -155,6 +158,12 @@ asmlinkage void aesni_gcm_dec(void *ctx, u8 *out,
>  
> 
>  #ifdef CONFIG_AS_AVX
> +asmlinkage void aes_ctr_enc_128_avx_by8(const u8 *in, u8 *iv,
> + void *keys, u8 *out, unsigned int num_bytes);
> +asmlinkage void aes_ctr_enc_192_avx_by8(const u8 *in, u8 *iv,
> + void *keys, u8 *out, unsigned int num_bytes);
> +asmlinkage void aes_ctr_enc_256_avx_by8(const u8 *in, u8 *iv,
> +         void *keys, u8 *out, unsigned int num_bytes);
>  /*
>   * asmlinkage void aesni_gcm_precomp_avx_gen2()
>   * gcm_data *my_ctx_data, context data
> @@ -472,6 +481,25 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx,
>   crypto_inc(ctrblk, AES_BLOCK_SIZE);
>  }
>  
> +#ifdef CONFIG_AS_AVX
> +static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
> +   const u8 *in, unsigned int len, u8 *iv)
> +{
> + /*
> +  * based on key length, override with the by8 version
> +  * of ctr mode encryption/decryption for improved performance
> +  * aes_set_key_common() ensures that key length is one of
> +  * {128,192,256}
> +  */
> + if (ctx->key_length == AES_KEYSIZE_128)
> + aes_ctr_enc_128_avx_by8(in, iv, (void *)ctx, out, len);
> + else if (ctx->key_length == AES_KEYSIZE_192)
> + aes_ctr_enc_192_avx_by8(in, iv, (void *)ctx, out, len);
> + else if (ctx->key_length == AES_KEYSIZE_256)

You can make that last one an unconditional else, because it *has* to
be that key size. Otherwise someone managed to tamper with the crypto
context, and what is worse than using the wrong key size for the
encryption algorithm is doing no encryption at all.

> + aes_ctr_enc_256_avx_by8(in, iv, (void *)ctx, out, len);
> +}
> +#endif
> +
>  static int ctr_crypt(struct blkcipher_desc *desc,
>struct scatterlist *dst, struct scatterlist *src,
>unsigned int nbytes)
> @@ -486,8 +514,8 @@ static int ctr_crypt(struct blkcipher_desc *desc,
>  
>   kernel_fpu_begin();
>   while ((nbytes = walk.nbytes) >= AES_BLOCK_SIZE) {
> - aesni_ctr_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
> -   nbytes & AES_BLOCK_MASK, walk.iv);
> + aesni_ctr_enc_tfm(ctx, walk.dst.virt.addr, walk.src.virt.addr,
> +   nbytes & AES_BLOCK_MASK, walk.iv);
>   nbytes &= AES_BLOCK_SIZE - 1;
>   err = blkcipher_walk_done(desc, &walk, nbytes);
>   }
> @@ -1493,6 +1521,14 @@ static int __init aesni_init(void)
>   aesni_gcm_enc_tfm = aesni_gcm_enc;
>   aesni_gcm_dec_tfm = aesni_gcm_dec;
>   }
> + aesni_ctr_enc_tfm = aesni_ctr_enc;
> +#ifdef CONFIG_AS_AVX
> + if (cpu_has_aes && cpu_has_avx) {

The cpu_has_aes test is still redundant.

> + /* optimize performance of ctr mode encryption trasform */
> + aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm;
> + pr_info("AES CTR mode by8 optimization enabled\n");
> + }
> +#endif
>  #endif
>  
>   err = crypto_fpu_init();
> -- 
> 1.8.2.1
> 
> 

With the above nitpicks handled:

Reviewed-by: Mathias Krause 


Thanks,
Mathias


Re: [PATCH v2 1/1] crypto: AES CTR x86_64 "by8" AVX optimization

2014-06-03 Thread Mathias Krause
On Tue, Jun 03, 2014 at 05:41:14PM -0700, chandramouli narayanan wrote:
> This patch introduces "by8" AES CTR mode AVX optimization inspired by
> Intel Optimized IPSEC Cryptograhpic library. For additional information,
> please see:
> http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=22972
> 
> The functions aes_ctr_enc_128_avx_by8(), aes_ctr_enc_192_avx_by8() and
> aes_ctr_enc_256_avx_by8() are adapted from
> Intel Optimized IPSEC Cryptographic library. When both AES and AVX features
> are enabled in a platform, the glue code in AESNI module overrieds the
> existing "by4" CTR mode en/decryption with the "by8"
> AES CTR mode en/decryption.
> 
> On a Haswell desktop, with turbo disabled and all cpus running
> at maximum frequency, the "by8" CTR mode optimization
> shows better performance results across data & key sizes
> as measured by tcrypt.
> 
> The average performance improvement of the "by8" version over the "by4"
> version is as follows:
> 
> For 128 bit key and data sizes >= 256 bytes, there is a 10-16% improvement.
> For 192 bit key and data sizes >= 256 bytes, there is a 20-22% improvement.
> For 256 bit key and data sizes >= 256 bytes, there is a 20-25% improvement.

Nice improvement :)

How does it perform on older processors that do have a penalty for
unaligned loads (vmovdqu), e.g. SandyBridge? If those perform worse, it
might be wise to extend the CPU feature test in the glue code with a
model test to enable the "by8" variant only on Haswell and newer
processors that don't have such a penalty.
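
Purely as an illustration, such a model test could look like the helper
below. The function name and the model list are hypothetical (0x3c,
0x45 and 0x46 are the Haswell client models) and would need to be
maintained for later generations; the aesni_init() glue would then check
it in addition to cpu_has_avx:

#include <linux/init.h>
#include <linux/types.h>
#include <asm/processor.h>	/* boot_cpu_data, X86_VENDOR_INTEL */

/* hypothetical helper: true only on CPUs known to handle unaligned
 * AVX loads without a penalty */
static bool __init cpu_likes_unaligned_avx(void)
{
	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
	    boot_cpu_data.x86 != 6)
		return false;

	switch (boot_cpu_data.x86_model) {
	case 0x3c: case 0x45: case 0x46:	/* Haswell */
		return true;
	default:
		return false;
	}
}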

> 
> A typical run of tcrypt with AES CTR mode encryption of the "by4" and "by8"
> optimization shows the following results:
> 
> tcrypt with "by4" AES CTR mode encryption optimization on a Haswell Desktop:
> ---
> 
> testing speed of __ctr-aes-aesni encryption
> test 0 (128 bit key, 16 byte blocks): 1 operation in 343 cycles (16 bytes)
> test 1 (128 bit key, 64 byte blocks): 1 operation in 336 cycles (64 bytes)
> test 2 (128 bit key, 256 byte blocks): 1 operation in 491 cycles (256 bytes)
> test 3 (128 bit key, 1024 byte blocks): 1 operation in 1130 cycles (1024 
> bytes)
> test 4 (128 bit key, 8192 byte blocks): 1 operation in 7309 cycles (8192 
> bytes)
> test 5 (192 bit key, 16 byte blocks): 1 operation in 346 cycles (16 bytes)
> test 6 (192 bit key, 64 byte blocks): 1 operation in 361 cycles (64 bytes)
> test 7 (192 bit key, 256 byte blocks): 1 operation in 543 cycles (256 bytes)
> test 8 (192 bit key, 1024 byte blocks): 1 operation in 1321 cycles (1024 
> bytes)
> test 9 (192 bit key, 8192 byte blocks): 1 operation in 9649 cycles (8192 
> bytes)
> test 10 (256 bit key, 16 byte blocks): 1 operation in 369 cycles (16 bytes)
> test 11 (256 bit key, 64 byte blocks): 1 operation in 366 cycles (64 bytes)
> test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes)
> test 13 (256 bit key, 1024 byte blocks): 1 operation in 1531 cycles (1024 
> bytes)
> test 14 (256 bit key, 8192 byte blocks): 1 operation in 10522 cycles (8192 
> bytes)
> 
> testing speed of __ctr-aes-aesni decryption
> test 0 (128 bit key, 16 byte blocks): 1 operation in 336 cycles (16 bytes)
> test 1 (128 bit key, 64 byte blocks): 1 operation in 350 cycles (64 bytes)
> test 2 (128 bit key, 256 byte blocks): 1 operation in 487 cycles (256 bytes)
> test 3 (128 bit key, 1024 byte blocks): 1 operation in 1129 cycles (1024 
> bytes)
> test 4 (128 bit key, 8192 byte blocks): 1 operation in 7287 cycles (8192 
> bytes)
> test 5 (192 bit key, 16 byte blocks): 1 operation in 350 cycles (16 bytes)
> test 6 (192 bit key, 64 byte blocks): 1 operation in 359 cycles (64 bytes)
> test 7 (192 bit key, 256 byte blocks): 1 operation in 635 cycles (256 bytes)
> test 8 (192 bit key, 1024 byte blocks): 1 operation in 1324 cycles (1024 
> bytes)
> test 9 (192 bit key, 8192 byte blocks): 1 operation in 9595 cycles (8192 
> bytes)
> test 10 (256 bit key, 16 byte blocks): 1 operation in 364 cycles (16 bytes)
> test 11 (256 bit key, 64 byte blocks): 1 operation in 377 cycles (64 bytes)
> test 12 (256 bit key, 256 byte blocks): 1 operation in 604 cycles (256 bytes)
> test 13 (256 bit key, 1024 byte blocks): 1 operation in 1527 cycles (1024 
> bytes)
> test 14 (256 bit key, 8192 byte blocks): 1 operation in 10549 cycles (8192 
> bytes)
> 
> tcrypt with "by8" AES CTR mode encryption optimization on a Haswell Desktop:
> ---
> 
> testing speed of __ctr-aes-aesni encryption
> test 0 (128 bit key, 16 byte blocks): 1 operation in 340 cycles (16 bytes)
> test 1 (128 bit key, 64 byte blocks): 1 operation in 330 cycles (64 bytes)
> test 2 (128 bit key, 256 byte blocks): 1 operation in 450 cycles (256 bytes)
> test 3 (128 bit key, 1024 byte blocks): 1 operation in 1043 cycles (1024 
> bytes)
> test 4 (128 bit key, 8192 byte blocks): 1 operation in 6597 cycles (8192 
> bytes)
> test 5 (192 bit key, 16 byt

Re: [PATCH 0/3] crypto: x86/sha1 - regression and other fixes

2014-03-25 Thread Mathias Krause
On 24 March 2014 18:29, chandramouli narayanan  wrote:
> On Mon, 2014-03-24 at 17:10 +0100, Mathias Krause wrote:
>> The recent addition of the AVX2 variant of the SHA1 hash function wrongly
>> disabled the AVX variant by introducing a flaw in the feature test. Fixed
>> in patch 1.
>>
>> The alignment calculations of the AVX2 assembler implementation are
>> questionable, too. Especially the page alignment of the stack pointer is
>> broken in multiple ways. Fixed in patch 2. In patch 3 another issue for
>> code alignment is fixed.
>>
>> Please apply!
>>
>> Mathias Krause (3):
>>   crypto: x86/sha1 - re-enable the AVX variant
>>   crypto: x86/sha1 - fix stack alignment of AVX2 variant
>>   crypto: x86/sha1 - reduce size of the AVX2 asm implementation
>>
>>  arch/x86/crypto/sha1_avx2_x86_64_asm.S |8 ++--
>>  arch/x86/crypto/sha1_ssse3_glue.c  |   26 --
>>  2 files changed, 18 insertions(+), 16 deletions(-)
>>
> Your fixes are the right on mark. I went through your patches and tested
> them and found to be correct.

Thanks for double-checking!

> Sorry for causing regression and missing alignment issues in the patches
> I submitted.

No problem with that. But as I'm not subscribed to the linux-crypto
mailing list I haven't seen your earlier submissions. Otherwise I
would have objected earlier. ;)

Thanks,
Mathias


[PATCH 0/3] crypto: x86/sha1 - regression and other fixes

2014-03-24 Thread Mathias Krause
The recent addition of the AVX2 variant of the SHA1 hash function wrongly
disabled the AVX variant by introducing a flaw in the feature test. Fixed
in patch 1.

The alignment calculations of the AVX2 assembler implementation are
questionable, too. Especially the page alignment of the stack pointer is
broken in multiple ways. Fixed in patch 2. In patch 3 another issue for
code alignment is fixed.

Please apply!

Mathias Krause (3):
  crypto: x86/sha1 - re-enable the AVX variant
  crypto: x86/sha1 - fix stack alignment of AVX2 variant
  crypto: x86/sha1 - reduce size of the AVX2 asm implementation

 arch/x86/crypto/sha1_avx2_x86_64_asm.S |8 ++--
 arch/x86/crypto/sha1_ssse3_glue.c  |   26 --
 2 files changed, 18 insertions(+), 16 deletions(-)

-- 
1.7.10.4



[PATCH 3/3] crypto: x86/sha1 - reduce size of the AVX2 asm implementation

2014-03-24 Thread Mathias Krause
There is really no need to page align sha1_transform_avx2. The default
alignment is just fine. This is not the hot code but only the entry
point, after all.

Cc: Chandramouli Narayanan 
Cc: H. Peter Anvin 
Cc: Marek Vasut 
Signed-off-by: Mathias Krause 
---
 arch/x86/crypto/sha1_avx2_x86_64_asm.S |1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S 
b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
index bacac22b20..1cd792db15 100644
--- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S
+++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
@@ -623,7 +623,6 @@ _loop3:
  */
 .macro SHA1_VECTOR_ASM  name
ENTRY(\name)
-   .align 4096
 
push%rbx
push%rbp
-- 
1.7.10.4



[PATCH 2/3] crypto: x86/sha1 - fix stack alignment of AVX2 variant

2014-03-24 Thread Mathias Krause
The AVX2 implementation might waste up to a page of stack memory because
of a wrong alignment calculation. This will, in the worst case, increase
the stack usage of sha1_transform_avx2() alone to 5.4 kB -- way too big
for a kernel function. Even worse, it might also allocate *fewer* bytes
than needed if the stack pointer is already aligned, because in that
case the 'sub %rbx, %rsp' is effectively moving the stack pointer
upwards, not downwards.

Fix those issues by changing and simplifying the alignment calculation
to use a 32 byte alignment, the alignment really needed.

Cc: Chandramouli Narayanan 
Cc: H. Peter Anvin 
Cc: Marek Vasut 
Signed-off-by: Mathias Krause 
---
 arch/x86/crypto/sha1_avx2_x86_64_asm.S |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S 
b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
index 4f348544d1..bacac22b20 100644
--- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S
+++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
@@ -636,9 +636,7 @@ _loop3:
 
/* Align stack */
mov %rsp, %rbx
-   and $(0x1000-1), %rbx
-   sub $(8+32), %rbx
-   sub %rbx, %rsp
+   and $~(0x20-1), %rsp
push%rbx
sub $RESERVE_STACK, %rsp
 
@@ -665,8 +663,7 @@ _loop3:
avx2_zeroupper
 
add $RESERVE_STACK, %rsp
-   pop %rbx
-   add %rbx, %rsp
+   pop %rsp
 
pop %r15
pop %r14
-- 
1.7.10.4



[PATCH 1/3] crypto: x86/sha1 - re-enable the AVX variant

2014-03-24 Thread Mathias Krause
Commit 7c1da8d0d0 "crypto: sha - SHA1 transform x86_64 AVX2"
accidentally disabled the AVX variant by making the avx_usable() test
not only fail in case the CPU doesn't support AVX or OSXSAVE but also
if it doesn't support AVX2.

Fix that regression by splitting up the AVX/AVX2 test into two
functions. Also test for the BMI1 extension in the avx2_usable() test
as the AVX2 implementation not only makes use of BMI2 but also BMI1
instructions.

Cc: Chandramouli Narayanan 
Cc: H. Peter Anvin 
Cc: Marek Vasut 
Signed-off-by: Mathias Krause 
---
 arch/x86/crypto/sha1_ssse3_glue.c |   26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/crypto/sha1_ssse3_glue.c 
b/arch/x86/crypto/sha1_ssse3_glue.c
index 139a55c04d..74d16ef707 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -208,11 +208,7 @@ static bool __init avx_usable(void)
 {
u64 xcr0;
 
-#if defined(CONFIG_AS_AVX2)
-   if (!cpu_has_avx || !cpu_has_avx2 || !cpu_has_osxsave)
-#else
if (!cpu_has_avx || !cpu_has_osxsave)
-#endif
return false;
 
xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
@@ -224,11 +220,23 @@ static bool __init avx_usable(void)
 
return true;
 }
+
+#ifdef CONFIG_AS_AVX2
+static bool __init avx2_usable(void)
+{
+   if (avx_usable() && cpu_has_avx2 && boot_cpu_has(X86_FEATURE_BMI1) &&
+   boot_cpu_has(X86_FEATURE_BMI2))
+   return true;
+
+   return false;
+}
+#endif
 #endif
 
 static int __init sha1_ssse3_mod_init(void)
 {
char *algo_name;
+
/* test for SSSE3 first */
if (cpu_has_ssse3) {
sha1_transform_asm = sha1_transform_ssse3;
@@ -238,13 +246,11 @@ static int __init sha1_ssse3_mod_init(void)
 #ifdef CONFIG_AS_AVX
/* allow AVX to override SSSE3, it's a little faster */
if (avx_usable()) {
-   if (cpu_has_avx) {
-   sha1_transform_asm = sha1_transform_avx;
-   algo_name = "AVX";
-   }
+   sha1_transform_asm = sha1_transform_avx;
+   algo_name = "AVX";
 #ifdef CONFIG_AS_AVX2
-   if (cpu_has_avx2 && boot_cpu_has(X86_FEATURE_BMI2)) {
-   /* allow AVX2 to override AVX, it's a little faster */
+   /* allow AVX2 to override AVX, it's a little faster */
+   if (avx2_usable()) {
sha1_transform_asm = sha1_apply_transform_avx2;
algo_name = "AVX2";
}
-- 
1.7.10.4



[PATCH 0/2] pcrypt/padata rcu fixes

2013-11-28 Thread Mathias Krause
Two small RCU related fixes, lockdep complained about.

Please apply!


Mathias Krause (2):
  crypto: pcrypt - Fix wrong usage of rcu_dereference()
  padata: Fix wrong usage of rcu_dereference()

 crypto/pcrypt.c |2 +-
 kernel/padata.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

-- 
1.7.10.4



[PATCH 1/2] crypto: pcrypt - Fix wrong usage of rcu_dereference()

2013-11-28 Thread Mathias Krause
A kernel with lockdep enabled complains about a wrong usage of
rcu_dereference() under an rcu_read_lock_bh() protected region.

  ===
  [ INFO: suspicious RCU usage. ]
  3.13.0-rc1+ #126 Not tainted
  ---
  linux/crypto/pcrypt.c:81 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 1
  1 lock held by cryptomgr_test/153:
   #0:  (rcu_read_lock_bh){.+}, at: [] 
pcrypt_do_parallel.isra.2+0x5/0x200

Fix that by using rcu_dereference_bh() instead.

Signed-off-by: Mathias Krause 
Cc: Steffen Klassert 
Cc: Herbert Xu 
Cc: "David S. Miller" 
---
 crypto/pcrypt.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
index f8c920cafe..309d345ead 100644
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -78,7 +78,7 @@ static int pcrypt_do_parallel(struct padata_priv *padata, 
unsigned int *cb_cpu,
cpu = *cb_cpu;
 
rcu_read_lock_bh();
-   cpumask = rcu_dereference(pcrypt->cb_cpumask);
+   cpumask = rcu_dereference_bh(pcrypt->cb_cpumask);
if (cpumask_test_cpu(cpu, cpumask->mask))
goto out;
 
-- 
1.7.10.4



[PATCH 2/2] padata: Fix wrong usage of rcu_dereference()

2013-11-28 Thread Mathias Krause
A kernel with enabled lockdep complains about the wrong usage of
rcu_dereference() under a rcu_read_lock_bh() protected region.

  ===
  [ INFO: suspicious RCU usage. ]
  3.13.0-rc1+ #126 Not tainted
  ---
  linux/kernel/padata.c:115 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 1
  1 lock held by cryptomgr_test/153:
   #0:  (rcu_read_lock_bh){.+}, at: [] 
padata_do_parallel+0x5/0x270

Fix that by using rcu_dereference_bh() instead.

Signed-off-by: Mathias Krause 
Cc: Steffen Klassert 
---
 kernel/padata.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index 2abd25d79c..161402f0b5 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -112,7 +112,7 @@ int padata_do_parallel(struct padata_instance *pinst,
 
rcu_read_lock_bh();
 
-   pd = rcu_dereference(pinst->pd);
+   pd = rcu_dereference_bh(pinst->pd);
 
err = -EINVAL;
if (!(pinst->flags & PADATA_INIT) || pinst->flags & PADATA_INVALID)
-- 
1.7.10.4



[PATCH RESEND] padata: make the sequence counter an atomic_t

2013-10-25 Thread Mathias Krause
Using a spinlock to atomically increase a counter sounds wrong -- we have
atomic_t for this!

Also move 'seq_nr' to a different cache line than 'lock' to reduce cache
line thrashing. This has the nice side effect of decreasing the size of
struct parallel_data from 192 to 128 bytes for an x86-64 build, i.e.
occupying only two instead of three cache lines.

These changes result in a 5% performance increase on an IPsec test run
using pcrypt.

Btw. the seq_lock spinlock was never explicitly initialized -- one more
reason to get rid of it.

Signed-off-by: Mathias Krause 
Acked-by: Steffen Klassert 
---
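
For illustration only (not part of the patch), a self-contained userspace
sketch of the lock-free round-robin pattern using C11 atomics; NR_CPUS and
pick_cpu_index() are made-up names for this example. Initializing the counter
to -1 mirrors the atomic_set(&pd->seq_nr, -1) in the diff below:
atomic_inc_return() increments before returning, so the first object still
hashes to index 0, just like the old locked code:

/* C11 userspace analogue of the seq_nr round-robin selection. */
#include <stdatomic.h>
#include <stdio.h>

#define NR_CPUS 4	/* stand-in for cpumask_weight(pd->cpumask.pcpu) */

static atomic_uint seq_nr = ATOMIC_VAR_INIT(-1);

static unsigned int pick_cpu_index(void)
{
	/* fetch_add returns the old value; +1 gives atomic_inc_return()
	 * semantics, relying on well-defined unsigned wrap-around. */
	unsigned int seq = atomic_fetch_add(&seq_nr, 1) + 1;

	return seq % NR_CPUS;
}

int main(void)
{
	for (int i = 0; i < 8; i++)
		printf("object %d -> cpu index %u\n", i, pick_cpu_index());
	return 0;
}
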
 include/linux/padata.h |3 +--
 kernel/padata.c|9 -
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 86292be..4386946 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -129,10 +129,9 @@ struct parallel_data {
struct padata_serial_queue  __percpu *squeue;
atomic_treorder_objects;
atomic_trefcnt;
+   atomic_tseq_nr;
struct padata_cpumask   cpumask;
spinlock_t  lock cacheline_aligned;
-   spinlock_t  seq_lock;
-   unsigned intseq_nr;
unsigned intprocessed;
struct timer_list   timer;
 };
diff --git a/kernel/padata.c b/kernel/padata.c
index 07af2c9..2abd25d 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -46,6 +46,7 @@ static int padata_index_to_cpu(struct parallel_data *pd, int 
cpu_index)
 
 static int padata_cpu_hash(struct parallel_data *pd)
 {
+   unsigned int seq_nr;
int cpu_index;
 
/*
@@ -53,10 +54,8 @@ static int padata_cpu_hash(struct parallel_data *pd)
 * seq_nr mod. number of cpus in use.
 */
 
-   spin_lock(&pd->seq_lock);
-   cpu_index =  pd->seq_nr % cpumask_weight(pd->cpumask.pcpu);
-   pd->seq_nr++;
-   spin_unlock(&pd->seq_lock);
+   seq_nr = atomic_inc_return(&pd->seq_nr);
+   cpu_index = seq_nr % cpumask_weight(pd->cpumask.pcpu);
 
return padata_index_to_cpu(pd, cpu_index);
 }
@@ -429,7 +428,7 @@ static struct parallel_data *padata_alloc_pd(struct 
padata_instance *pinst,
padata_init_pqueues(pd);
padata_init_squeues(pd);
setup_timer(&pd->timer, padata_reorder_timer, (unsigned long)pd);
-   pd->seq_nr = 0;
+   atomic_set(&pd->seq_nr, -1);
atomic_set(&pd->reorder_objects, 0);
atomic_set(&pd->refcnt, 0);
pd->pinst = pinst;
-- 
1.7.2.5



Re: [PATCH] padata: make the sequence counter an atomic_t

2013-10-25 Thread Mathias Krause
On 25.10.2013 11:26, Herbert Xu wrote:
> On Fri, Oct 25, 2013 at 10:20:48AM +0200, Mathias Krause wrote:
>> On 08.10.2013 14:08, Steffen Klassert wrote:
>>> On Wed, Oct 02, 2013 at 03:40:45PM +0200, Mathias Krause wrote:
>>>> Using a spinlock to atomically increase a counter sounds wrong -- we've
>>>> atomic_t for this!
>>>>
>>>> Also move 'seq_nr' to a different cache line than 'lock' to reduce cache
>>>> line trashing. This has the nice side effect of decreasing the size of
>>>> struct parallel_data from 192 to 128 bytes for a x86-64 build, e.g.
>>>> occupying only two instead of three cache lines.
>>>>
>>>> Those changes results in a 5% performance increase on an IPsec test run
>>>> using pcrypt.
>>>>
>>>> Btw. the seq_lock spinlock was never explicitly initialized -- one more
>>>> reason to get rid of it.
>>>>
>>>> Signed-off-by: Mathias Krause 
>>> Acked-by: Steffen Klassert 
>>>
>>> Herbert can you take this one?
>> Ping, Herbert? Anything wrong with the patch?
> 
> Sorry I don't seem to have this patch in my mail box.  Can you
> resend it please?

I send it to linux-crypto and Steffen only. Will resend it directed to
you, now.

> 
> Thanks!


Re: [PATCH] padata: make the sequence counter an atomic_t

2013-10-25 Thread Mathias Krause
On 08.10.2013 14:08, Steffen Klassert wrote:
> On Wed, Oct 02, 2013 at 03:40:45PM +0200, Mathias Krause wrote:
>> Using a spinlock to atomically increase a counter sounds wrong -- we've
>> atomic_t for this!
>>
>> Also move 'seq_nr' to a different cache line than 'lock' to reduce cache
>> line trashing. This has the nice side effect of decreasing the size of
>> struct parallel_data from 192 to 128 bytes for a x86-64 build, e.g.
>> occupying only two instead of three cache lines.
>>
>> Those changes results in a 5% performance increase on an IPsec test run
>> using pcrypt.
>>
>> Btw. the seq_lock spinlock was never explicitly initialized -- one more
>> reason to get rid of it.
>>
>> Signed-off-by: Mathias Krause 
> 
> Acked-by: Steffen Klassert 
> 
> Herbert can you take this one?

Ping, Herbert? Anything wrong with the patch?


[PATCH 3/5] crypto: ixp4xx - Simplify and harden key parsing

2013-10-15 Thread Mathias Krause
Use the common helper function crypto_authenc_extractkeys() for key
parsing. Also ensure the keys do fit into the corresponding buffers.
Otherwise memory corruption might occur.

Cc: Christian Hohnstaedt 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Signed-off-by: Mathias Krause 
---
Not tested as I've no such hardware, nor the needed cross compiler!

 drivers/crypto/ixp4xx_crypto.c |   26 +-
 1 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/crypto/ixp4xx_crypto.c b/drivers/crypto/ixp4xx_crypto.c
index 21180d6..153f73c 100644
--- a/drivers/crypto/ixp4xx_crypto.c
+++ b/drivers/crypto/ixp4xx_crypto.c
@@ -1159,32 +1159,24 @@ static int aead_setkey(struct crypto_aead *tfm, const 
u8 *key,
unsigned int keylen)
 {
struct ixp_ctx *ctx = crypto_aead_ctx(tfm);
-   struct rtattr *rta = (struct rtattr *)key;
-   struct crypto_authenc_key_param *param;
+   struct crypto_authenc_keys keys;
 
-   if (!RTA_OK(rta, keylen))
-   goto badkey;
-   if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
-   goto badkey;
-   if (RTA_PAYLOAD(rta) < sizeof(*param))
+   if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
goto badkey;
 
-   param = RTA_DATA(rta);
-   ctx->enckey_len = be32_to_cpu(param->enckeylen);
-
-   key += RTA_ALIGN(rta->rta_len);
-   keylen -= RTA_ALIGN(rta->rta_len);
+   if (keys.authkeylen > sizeof(ctx->authkey))
+   goto badkey;
 
-   if (keylen < ctx->enckey_len)
+   if (keys.enckeylen > sizeof(ctx->enckey))
goto badkey;
 
-   ctx->authkey_len = keylen - ctx->enckey_len;
-   memcpy(ctx->enckey, key + ctx->authkey_len, ctx->enckey_len);
-   memcpy(ctx->authkey, key, ctx->authkey_len);
+   memcpy(ctx->authkey, keys.authkey, keys.authkeylen);
+   memcpy(ctx->enckey, keys.enckey, keys.enckeylen);
+   ctx->authkey_len = keys.authkeylen;
+   ctx->enckey_len = keys.enckeylen;
 
return aead_setup(tfm, crypto_aead_authsize(tfm));
 badkey:
-   ctx->enckey_len = 0;
crypto_aead_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
 }
-- 
1.7.2.5



[PATCH 2/5] crypto: authencesn - Simplify key parsing

2013-10-15 Thread Mathias Krause
Use the common helper function crypto_authenc_extractkeys() for key
parsing.

Cc: Herbert Xu 
Cc: "David S. Miller" 
Signed-off-by: Mathias Krause 
---
 crypto/authencesn.c |   26 --
 1 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/crypto/authencesn.c b/crypto/authencesn.c
index ab53762..8ed5b47 100644
--- a/crypto/authencesn.c
+++ b/crypto/authencesn.c
@@ -59,37 +59,19 @@ static void authenc_esn_request_complete(struct 
aead_request *req, int err)
 static int crypto_authenc_esn_setkey(struct crypto_aead *authenc_esn, const u8 
*key,
 unsigned int keylen)
 {
-   unsigned int authkeylen;
-   unsigned int enckeylen;
struct crypto_authenc_esn_ctx *ctx = crypto_aead_ctx(authenc_esn);
struct crypto_ahash *auth = ctx->auth;
struct crypto_ablkcipher *enc = ctx->enc;
-   struct rtattr *rta = (void *)key;
-   struct crypto_authenc_key_param *param;
+   struct crypto_authenc_keys keys;
int err = -EINVAL;
 
-   if (!RTA_OK(rta, keylen))
+   if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
goto badkey;
-   if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
-   goto badkey;
-   if (RTA_PAYLOAD(rta) < sizeof(*param))
-   goto badkey;
-
-   param = RTA_DATA(rta);
-   enckeylen = be32_to_cpu(param->enckeylen);
-
-   key += RTA_ALIGN(rta->rta_len);
-   keylen -= RTA_ALIGN(rta->rta_len);
-
-   if (keylen < enckeylen)
-   goto badkey;
-
-   authkeylen = keylen - enckeylen;
 
crypto_ahash_clear_flags(auth, CRYPTO_TFM_REQ_MASK);
crypto_ahash_set_flags(auth, crypto_aead_get_flags(authenc_esn) &
 CRYPTO_TFM_REQ_MASK);
-   err = crypto_ahash_setkey(auth, key, authkeylen);
+   err = crypto_ahash_setkey(auth, keys.authkey, keys.authkeylen);
crypto_aead_set_flags(authenc_esn, crypto_ahash_get_flags(auth) &
   CRYPTO_TFM_RES_MASK);
 
@@ -99,7 +81,7 @@ static int crypto_authenc_esn_setkey(struct crypto_aead 
*authenc_esn, const u8 *
crypto_ablkcipher_clear_flags(enc, CRYPTO_TFM_REQ_MASK);
crypto_ablkcipher_set_flags(enc, crypto_aead_get_flags(authenc_esn) &
 CRYPTO_TFM_REQ_MASK);
-   err = crypto_ablkcipher_setkey(enc, key + authkeylen, enckeylen);
+   err = crypto_ablkcipher_setkey(enc, keys.enckey, keys.enckeylen);
crypto_aead_set_flags(authenc_esn, crypto_ablkcipher_get_flags(enc) &
   CRYPTO_TFM_RES_MASK);
 
-- 
1.7.2.5



[PATCH 0/5] Authenc key parsing consolidation

2013-10-15 Thread Mathias Krause
This series removes the code duplication of authenc key parsing by
introducing a common helper function crypto_authenc_extractkeys() in
patch 1. Patches 2 to 5 change all remaining places to use the new
helper. Patches 3 and 4 also fix potential memory corruptions by
ensuring the supplied keys won't overflow their respective buffers.

I was unable to test patches 3 to 5 as I don't have the needed hardware
for these devices -- not even a cross compiler for those architectures.

In case patches 3 and 4 are enqueued for stable, patch 1 needs to be as
well, as it's a prerequisite for those.

Please apply!

Mathias Krause (5):
  crypto: authenc - Export key parsing helper function
  crypto: authencesn - Simplify key parsing
  crypto: ixp4xx - Simplify and harden key parsing
  crypto: picoxcell - Simplify and harden key parsing
  crypto: talitos - Simplify key parsing

 crypto/authenc.c  |   48 +++--
 crypto/authencesn.c   |   26 +++-
 drivers/crypto/ixp4xx_crypto.c|   26 +++-
 drivers/crypto/picoxcell_crypto.c |   32 ++--
 drivers/crypto/talitos.c  |   35 ++
 include/crypto/authenc.h  |   12 -
 6 files changed, 70 insertions(+), 109 deletions(-)

-- 
1.7.2.5



[PATCH 5/5] crypto: talitos - Simplify key parsing

2013-10-15 Thread Mathias Krause
Use the common helper function crypto_authenc_extractkeys() for key
parsing.

Cc: Kim Phillips 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Signed-off-by: Mathias Krause 
---
Not tested as I've no such hardware, nor the needed cross compiler!

 drivers/crypto/talitos.c |   35 ---
 1 files changed, 8 insertions(+), 27 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 661dc3e..f6f7c68 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -671,39 +671,20 @@ static int aead_setkey(struct crypto_aead *authenc,
   const u8 *key, unsigned int keylen)
 {
struct talitos_ctx *ctx = crypto_aead_ctx(authenc);
-   struct rtattr *rta = (void *)key;
-   struct crypto_authenc_key_param *param;
-   unsigned int authkeylen;
-   unsigned int enckeylen;
-
-   if (!RTA_OK(rta, keylen))
-   goto badkey;
+   struct crypto_authenc_keys keys;
 
-   if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
+   if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
goto badkey;
 
-   if (RTA_PAYLOAD(rta) < sizeof(*param))
+   if (keys.authkeylen + keys.enckeylen > TALITOS_MAX_KEY_SIZE)
goto badkey;
 
-   param = RTA_DATA(rta);
-   enckeylen = be32_to_cpu(param->enckeylen);
-
-   key += RTA_ALIGN(rta->rta_len);
-   keylen -= RTA_ALIGN(rta->rta_len);
-
-   if (keylen < enckeylen)
-   goto badkey;
+   memcpy(ctx->key, keys.authkey, keys.authkeylen);
+   memcpy(&ctx->key[keys.authkeylen], keys.enckey, keys.enckeylen);
 
-   authkeylen = keylen - enckeylen;
-
-   if (keylen > TALITOS_MAX_KEY_SIZE)
-   goto badkey;
-
-   memcpy(&ctx->key, key, keylen);
-
-   ctx->keylen = keylen;
-   ctx->enckeylen = enckeylen;
-   ctx->authkeylen = authkeylen;
+   ctx->keylen = keys.authkeylen + keys.enckeylen;
+   ctx->enckeylen = keys.enckeylen;
+   ctx->authkeylen = keys.authkeylen;
 
return 0;
 
-- 
1.7.2.5



[PATCH 1/5] crypto: authenc - Export key parsing helper function

2013-10-15 Thread Mathias Krause
AEAD key parsing is duplicated to multiple places in the kernel. Add a
common helper function to consolidate that functionality.

Cc: Herbert Xu 
Cc: "David S. Miller" 
Signed-off-by: Mathias Krause 
---
 crypto/authenc.c |   48 -
 include/crypto/authenc.h |   12 ++-
 2 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/crypto/authenc.c b/crypto/authenc.c
index ffce19d..0fda5ba 100644
--- a/crypto/authenc.c
+++ b/crypto/authenc.c
@@ -52,40 +52,52 @@ static void authenc_request_complete(struct aead_request 
*req, int err)
aead_request_complete(req, err);
 }
 
-static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key,
-unsigned int keylen)
+int crypto_authenc_extractkeys(struct crypto_authenc_keys *keys, const u8 *key,
+  unsigned int keylen)
 {
-   unsigned int authkeylen;
-   unsigned int enckeylen;
-   struct crypto_authenc_ctx *ctx = crypto_aead_ctx(authenc);
-   struct crypto_ahash *auth = ctx->auth;
-   struct crypto_ablkcipher *enc = ctx->enc;
-   struct rtattr *rta = (void *)key;
+   struct rtattr *rta = (struct rtattr *)key;
struct crypto_authenc_key_param *param;
-   int err = -EINVAL;
 
if (!RTA_OK(rta, keylen))
-   goto badkey;
+   return -EINVAL;
if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
-   goto badkey;
+   return -EINVAL;
if (RTA_PAYLOAD(rta) < sizeof(*param))
-   goto badkey;
+   return -EINVAL;
 
param = RTA_DATA(rta);
-   enckeylen = be32_to_cpu(param->enckeylen);
+   keys->enckeylen = be32_to_cpu(param->enckeylen);
 
key += RTA_ALIGN(rta->rta_len);
keylen -= RTA_ALIGN(rta->rta_len);
 
-   if (keylen < enckeylen)
-   goto badkey;
+   if (keylen < keys->enckeylen)
+   return -EINVAL;
 
-   authkeylen = keylen - enckeylen;
+   keys->authkeylen = keylen - keys->enckeylen;
+   keys->authkey = key;
+   keys->enckey = key + keys->authkeylen;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(crypto_authenc_extractkeys);
+
+static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key,
+unsigned int keylen)
+{
+   struct crypto_authenc_ctx *ctx = crypto_aead_ctx(authenc);
+   struct crypto_ahash *auth = ctx->auth;
+   struct crypto_ablkcipher *enc = ctx->enc;
+   struct crypto_authenc_keys keys;
+   int err = -EINVAL;
+
+   if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
+   goto badkey;
 
crypto_ahash_clear_flags(auth, CRYPTO_TFM_REQ_MASK);
crypto_ahash_set_flags(auth, crypto_aead_get_flags(authenc) &
CRYPTO_TFM_REQ_MASK);
-   err = crypto_ahash_setkey(auth, key, authkeylen);
+   err = crypto_ahash_setkey(auth, keys.authkey, keys.authkeylen);
crypto_aead_set_flags(authenc, crypto_ahash_get_flags(auth) &
   CRYPTO_TFM_RES_MASK);
 
@@ -95,7 +107,7 @@ static int crypto_authenc_setkey(struct crypto_aead 
*authenc, const u8 *key,
crypto_ablkcipher_clear_flags(enc, CRYPTO_TFM_REQ_MASK);
crypto_ablkcipher_set_flags(enc, crypto_aead_get_flags(authenc) &
 CRYPTO_TFM_REQ_MASK);
-   err = crypto_ablkcipher_setkey(enc, key + authkeylen, enckeylen);
+   err = crypto_ablkcipher_setkey(enc, keys.enckey, keys.enckeylen);
crypto_aead_set_flags(authenc, crypto_ablkcipher_get_flags(enc) &
   CRYPTO_TFM_RES_MASK);
 
diff --git a/include/crypto/authenc.h b/include/crypto/authenc.h
index e47b044..6775059 100644
--- a/include/crypto/authenc.h
+++ b/include/crypto/authenc.h
@@ -23,5 +23,15 @@ struct crypto_authenc_key_param {
__be32 enckeylen;
 };
 
-#endif /* _CRYPTO_AUTHENC_H */
+struct crypto_authenc_keys {
+   const u8 *authkey;
+   const u8 *enckey;
+
+   unsigned int authkeylen;
+   unsigned int enckeylen;
+};
 
+int crypto_authenc_extractkeys(struct crypto_authenc_keys *keys, const u8 *key,
+  unsigned int keylen);
+
+#endif /* _CRYPTO_AUTHENC_H */
-- 
1.7.2.5



[PATCH 4/5] crypto: picoxcell - Simplify and harden key parsing

2013-10-15 Thread Mathias Krause
Use the common helper function crypto_authenc_extractkeys() for key
parsing. Also ensure the auth key won't overflow the hash_ctx buffer.

Cc: Jamie Iles 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Signed-off-by: Mathias Krause 
---
Not tested as I've no such hardware, nor the needed cross compiler!

 drivers/crypto/picoxcell_crypto.c |   32 
 1 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/drivers/crypto/picoxcell_crypto.c 
b/drivers/crypto/picoxcell_crypto.c
index 888f7f4..a6175ba 100644
--- a/drivers/crypto/picoxcell_crypto.c
+++ b/drivers/crypto/picoxcell_crypto.c
@@ -495,45 +495,29 @@ static int spacc_aead_setkey(struct crypto_aead *tfm, 
const u8 *key,
 {
struct spacc_aead_ctx *ctx = crypto_aead_ctx(tfm);
struct spacc_alg *alg = to_spacc_alg(tfm->base.__crt_alg);
-   struct rtattr *rta = (void *)key;
-   struct crypto_authenc_key_param *param;
-   unsigned int authkeylen, enckeylen;
+   struct crypto_authenc_keys keys;
int err = -EINVAL;
 
-   if (!RTA_OK(rta, keylen))
+   if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
goto badkey;
 
-   if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
+   if (keys.enckeylen > AES_MAX_KEY_SIZE)
goto badkey;
 
-   if (RTA_PAYLOAD(rta) < sizeof(*param))
-   goto badkey;
-
-   param = RTA_DATA(rta);
-   enckeylen = be32_to_cpu(param->enckeylen);
-
-   key += RTA_ALIGN(rta->rta_len);
-   keylen -= RTA_ALIGN(rta->rta_len);
-
-   if (keylen < enckeylen)
-   goto badkey;
-
-   authkeylen = keylen - enckeylen;
-
-   if (enckeylen > AES_MAX_KEY_SIZE)
+   if (keys.authkeylen > sizeof(ctx->hash_ctx))
goto badkey;
 
if ((alg->ctrl_default & SPACC_CRYPTO_ALG_MASK) ==
SPA_CTRL_CIPH_ALG_AES)
-   err = spacc_aead_aes_setkey(tfm, key + authkeylen, enckeylen);
+   err = spacc_aead_aes_setkey(tfm, keys.enckey, keys.enckeylen);
else
-   err = spacc_aead_des_setkey(tfm, key + authkeylen, enckeylen);
+   err = spacc_aead_des_setkey(tfm, keys.enckey, keys.enckeylen);
 
if (err)
goto badkey;
 
-   memcpy(ctx->hash_ctx, key, authkeylen);
-   ctx->hash_key_len = authkeylen;
+   memcpy(ctx->hash_ctx, keys.authkey, keys.authkeylen);
+   ctx->hash_key_len = keys.authkeylen;
 
return 0;
 
-- 
1.7.2.5



[PATCH] padata: make the sequence counter an atomic_t

2013-10-02 Thread Mathias Krause
Using a spinlock to atomically increase a counter sounds wrong -- we have
atomic_t for this!

Also move 'seq_nr' to a different cache line than 'lock' to reduce cache
line thrashing. This has the nice side effect of decreasing the size of
struct parallel_data from 192 to 128 bytes for an x86-64 build, i.e.
occupying only two instead of three cache lines.

These changes result in a 5% performance increase on an IPsec test run
using pcrypt.

Btw. the seq_lock spinlock was never explicitly initialized -- one more
reason to get rid of it.

Signed-off-by: Mathias Krause 
---
 include/linux/padata.h |3 +--
 kernel/padata.c|9 -
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 86292be..4386946 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -129,10 +129,9 @@ struct parallel_data {
struct padata_serial_queue  __percpu *squeue;
atomic_treorder_objects;
atomic_trefcnt;
+   atomic_tseq_nr;
struct padata_cpumask   cpumask;
spinlock_t  lock cacheline_aligned;
-   spinlock_t  seq_lock;
-   unsigned intseq_nr;
unsigned intprocessed;
struct timer_list   timer;
 };
diff --git a/kernel/padata.c b/kernel/padata.c
index 07af2c9..2abd25d 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -46,6 +46,7 @@ static int padata_index_to_cpu(struct parallel_data *pd, int 
cpu_index)
 
 static int padata_cpu_hash(struct parallel_data *pd)
 {
+   unsigned int seq_nr;
int cpu_index;
 
/*
@@ -53,10 +54,8 @@ static int padata_cpu_hash(struct parallel_data *pd)
 * seq_nr mod. number of cpus in use.
 */
 
-   spin_lock(&pd->seq_lock);
-   cpu_index =  pd->seq_nr % cpumask_weight(pd->cpumask.pcpu);
-   pd->seq_nr++;
-   spin_unlock(&pd->seq_lock);
+   seq_nr = atomic_inc_return(&pd->seq_nr);
+   cpu_index = seq_nr % cpumask_weight(pd->cpumask.pcpu);
 
return padata_index_to_cpu(pd, cpu_index);
 }
@@ -429,7 +428,7 @@ static struct parallel_data *padata_alloc_pd(struct 
padata_instance *pinst,
padata_init_pqueues(pd);
padata_init_squeues(pd);
setup_timer(&pd->timer, padata_reorder_timer, (unsigned long)pd);
-   pd->seq_nr = 0;
+   atomic_set(&pd->seq_nr, -1);
atomic_set(&pd->reorder_objects, 0);
atomic_set(&pd->refcnt, 0);
pd->pinst = pinst;
-- 
1.7.2.5



Re: [PATCH] crypto: algif - suppress sending source address information in recvmsg

2013-04-21 Thread Mathias Krause
On Wed, Apr 10, 2013 at 8:26 AM, Herbert Xu  wrote:
> On Wed, Apr 10, 2013 at 08:21:51AM +0200, Mathias Krause wrote:
>> On Wed, Apr 10, 2013 at 5:31 AM, Herbert Xu  
>> wrote:
>> > On Sun, Apr 07, 2013 at 02:05:39PM +0200, Mathias Krause wrote:
>> >> The current code does not set the msg_namelen member to 0 and therefore
>> >> makes net/socket.c leak the local sockaddr_storage variable to userland
>> >> -- 128 bytes of kernel stack memory. Fix that.
>> >>
>> >> Signed-off-by: Mathias Krause 
>> >
>> > Patch applied.  Thanks!
>>
>> Thanks, but that patch shouldn't have been applied to cryptodev but
>> crypto instead, and probably queued up for stable as well.
>> I missed the 'Cc: stable # v2.6.38'. My bad.
>
> OK, I'll move it across and add the stable Cc.

Any specific reason you're not pushing it to Linus for inclusion in v3.9?

Mathias


Re: [PATCH] crypto: algif - suppress sending source address information in recvmsg

2013-04-09 Thread Mathias Krause
On Wed, Apr 10, 2013 at 5:31 AM, Herbert Xu  wrote:
> On Sun, Apr 07, 2013 at 02:05:39PM +0200, Mathias Krause wrote:
>> The current code does not set the msg_namelen member to 0 and therefore
>> makes net/socket.c leak the local sockaddr_storage variable to userland
>> -- 128 bytes of kernel stack memory. Fix that.
>>
>> Signed-off-by: Mathias Krause 
>
> Patch applied.  Thanks!

Thanks, but that patch shouldn't have been applied to cryptodev but
crypto instead, and probably queued up for stable as well.
I missed the 'Cc: stable # v2.6.38'. My bad.

Mathias


[PATCH] crypto: algif - suppress sending source address information in recvmsg

2013-04-07 Thread Mathias Krause
The current code does not set the msg_namelen member to 0 and therefore
makes net/socket.c leak the local sockaddr_storage variable to userland
-- 128 bytes of kernel stack memory. Fix that.

Signed-off-by: Mathias Krause 
---
 crypto/algif_hash.c |2 ++
 crypto/algif_skcipher.c |1 +
 2 files changed, 3 insertions(+)

diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index ef5356c..0262210 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -161,6 +161,8 @@ static int hash_recvmsg(struct kiocb *unused, struct socket 
*sock,
else if (len < ds)
msg->msg_flags |= MSG_TRUNC;
 
+   msg->msg_namelen = 0;
+
lock_sock(sk);
if (ctx->more) {
ctx->more = 0;
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index 6a6dfc0..a1c4f0a 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -432,6 +432,7 @@ static int skcipher_recvmsg(struct kiocb *unused, struct 
socket *sock,
long copied = 0;
 
lock_sock(sk);
+   msg->msg_namelen = 0;
for (iov = msg->msg_iov, iovlen = msg->msg_iovlen; iovlen > 0;
 iovlen--, iov++) {
unsigned long seglen = iov->iov_len;
-- 
1.7.10.4



[PATCH] crypto: user - constify netlink dispatch table

2013-02-24 Thread Mathias Krause
There is no need to modify the netlink dispatch table at runtime and
making it const even makes the resulting object file slightly smaller.

Cc: Steffen Klassert 
Signed-off-by: Mathias Krause 
---
 crypto/crypto_user.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 35d700a..77f6c4e 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -426,7 +426,7 @@ static const struct nla_policy 
crypto_policy[CRYPTOCFGA_MAX+1] = {
 
 #undef MSGSIZE
 
-static struct crypto_link {
+static const struct crypto_link {
int (*doit)(struct sk_buff *, struct nlmsghdr *, struct nlattr **);
int (*dump)(struct sk_buff *, struct netlink_callback *);
int (*done)(struct netlink_callback *);
@@ -442,7 +442,7 @@ static struct crypto_link {
 static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
struct nlattr *attrs[CRYPTOCFGA_MAX+1];
-   struct crypto_link *link;
+   const struct crypto_link *link;
int type, err;
 
type = nlh->nlmsg_type;
-- 
1.7.10.4



[PATCH 3/3] crypto: user - ensure user supplied strings are nul-terminated

2013-02-05 Thread Mathias Krause
To avoid misuse, ensure cru_name and cru_driver_name are always
nul-terminated strings.

Signed-off-by: Mathias Krause 
---
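
For reference, a standalone userspace sketch (not part of the patch) of the
null_terminated() check the diff below introduces; the buffer sizes and
contents here are made up for the example. strnlen() never reads past
sizeof(x), so a fixed-size field copied from userspace can be validated
without risking an out-of-bounds read:

#include <stdio.h>
#include <string.h>

#define null_terminated(x) (strnlen(x, sizeof(x)) < sizeof(x))

int main(void)
{
	char good[8] = "sha1";	/* NUL terminator within the field */
	char bad[8] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h' };	/* none */

	printf("good -> %d\n", null_terminated(good));	/* prints 1 */
	printf("bad  -> %d\n", null_terminated(bad));	/* prints 0 */
	return 0;
}
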
 crypto/crypto_user.c |   14 ++
 1 file changed, 14 insertions(+)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 423a267..dfd511f 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -30,6 +30,8 @@
 
 #include "internal.h"
 
+#define null_terminated(x) (strnlen(x, sizeof(x)) < sizeof(x))
+
 static DEFINE_MUTEX(crypto_cfg_mutex);
 
 /* The crypto netlink socket */
@@ -196,6 +198,9 @@ static int crypto_report(struct sk_buff *in_skb, struct 
nlmsghdr *in_nlh,
struct crypto_dump_info info;
int err;
 
+   if (!null_terminated(p->cru_name) || 
!null_terminated(p->cru_driver_name))
+   return -EINVAL;
+
if (!p->cru_driver_name[0])
return -EINVAL;
 
@@ -260,6 +265,9 @@ static int crypto_update_alg(struct sk_buff *skb, struct 
nlmsghdr *nlh,
struct nlattr *priority = attrs[CRYPTOCFGA_PRIORITY_VAL];
LIST_HEAD(list);
 
+   if (!null_terminated(p->cru_name) || 
!null_terminated(p->cru_driver_name))
+   return -EINVAL;
+
if (priority && !strlen(p->cru_driver_name))
return -EINVAL;
 
@@ -287,6 +295,9 @@ static int crypto_del_alg(struct sk_buff *skb, struct 
nlmsghdr *nlh,
struct crypto_alg *alg;
struct crypto_user_alg *p = nlmsg_data(nlh);
 
+   if (!null_terminated(p->cru_name) || 
!null_terminated(p->cru_driver_name))
+   return -EINVAL;
+
alg = crypto_alg_match(p, 1);
if (!alg)
return -ENOENT;
@@ -368,6 +379,9 @@ static int crypto_add_alg(struct sk_buff *skb, struct 
nlmsghdr *nlh,
struct crypto_user_alg *p = nlmsg_data(nlh);
struct nlattr *priority = attrs[CRYPTOCFGA_PRIORITY_VAL];
 
+   if (!null_terminated(p->cru_name) || 
!null_terminated(p->cru_driver_name))
+   return -EINVAL;
+
if (strlen(p->cru_driver_name))
exact = 1;
 
-- 
1.7.10.4



[PATCHv2 2/3] crypto: user - fix empty string test in report API

2013-02-05 Thread Mathias Krause
The current test for empty strings fails because it is testing the
address of a field, not a pointer. So the test will always be true.
Test the first character in the string to not be null instead.

Signed-off-by: Mathias Krause 
Cc: Steffen Klassert 
---
v2: - switched to simple character test, as suggested by Herbert Xu
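
As an aside (not part of the patch), a tiny standalone C example of why the
old check was dead code: the address of an embedded char array is never NULL,
so negating it always yields 0, while testing the first character does catch
the empty string. The struct below is a trimmed, hypothetical stand-in for
struct crypto_user_alg:

#include <stdio.h>

struct user_alg {			/* stand-in for struct crypto_user_alg */
	char cru_name[64];
	char cru_driver_name[64];
};

int main(void)
{
	struct user_alg p = { .cru_driver_name = "" };

	/* compilers even warn here: the array's address is always "true" */
	printf("!p.cru_driver_name    -> %d\n", !p.cru_driver_name);	/* 0 */
	printf("!p.cru_driver_name[0] -> %d\n", !p.cru_driver_name[0]);	/* 1 */
	return 0;
}
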

 crypto/crypto_user.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index f6d9baf..423a267 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -196,7 +196,7 @@ static int crypto_report(struct sk_buff *in_skb, struct 
nlmsghdr *in_nlh,
struct crypto_dump_info info;
int err;
 
-   if (!p->cru_driver_name)
+   if (!p->cru_driver_name[0])
return -EINVAL;
 
alg = crypto_alg_match(p, 1);
-- 
1.7.10.4



[PATCHv2 1/3] crypto: user - fix info leaks in report API

2013-02-05 Thread Mathias Krause
Three errors resulting in kernel memory disclosure:

1/ The structures used for the netlink based crypto algorithm report API
are located on the stack. As snprintf() does not fill the remainder of
the buffer with null bytes, those stack bytes will be disclosed to users
of the API. Switch to strncpy() to fix this.

2/ crypto_report_one() does not initialize all fields of struct
crypto_user_alg. Fix this to plug the heap info leak.

3/ For the module name we should copy only as many bytes as
module_name() returns -- not as many as the destination buffer could
hold. But the current code does not and therefore copies random data
from beyond the end of the module name, as the module name is always
shorter than CRYPTO_MAX_ALG_NAME.

Also switch to use strncpy() to copy the algorithm's name and
driver_name. They are strings, after all.

Signed-off-by: Mathias Krause 
Cc: Steffen Klassert 
---
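
To make point 1/ concrete, here is a minimal userspace demonstration (not
part of the patch) of the difference: snprintf() stops writing after the
terminating NUL and leaves whatever was in the buffer before, whereas
strncpy() pads the remainder of the destination with NUL bytes. The 0xaa
fill merely simulates stale stack contents and is an assumption of this
illustration:

#include <stdio.h>
#include <string.h>

#define NAME_LEN 16

static void dump(const char *tag, const unsigned char *buf, size_t len)
{
	printf("%-8s:", tag);
	for (size_t i = 0; i < len; i++)
		printf(" %02x", buf[i]);
	printf("\n");
}

int main(void)
{
	unsigned char buf[NAME_LEN];

	memset(buf, 0xaa, sizeof(buf));		/* "stale stack data" */
	snprintf((char *)buf, sizeof(buf), "%s", "aead");
	dump("snprintf", buf, sizeof(buf));	/* 0xaa bytes remain after the NUL */

	memset(buf, 0xaa, sizeof(buf));
	strncpy((char *)buf, "aead", sizeof(buf));
	dump("strncpy", buf, sizeof(buf));	/* tail is fully NUL padded */

	return 0;
}
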
 crypto/ablkcipher.c  |   12 ++--
 crypto/aead.c|9 -
 crypto/ahash.c   |2 +-
 crypto/blkcipher.c   |6 +++---
 crypto/crypto_user.c |   22 +++---
 crypto/pcompress.c   |3 +--
 crypto/rng.c |2 +-
 crypto/shash.c   |3 ++-
 8 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/crypto/ablkcipher.c b/crypto/ablkcipher.c
index 533de95..7d4a8d2 100644
--- a/crypto/ablkcipher.c
+++ b/crypto/ablkcipher.c
@@ -388,9 +388,9 @@ static int crypto_ablkcipher_report(struct sk_buff *skb, 
struct crypto_alg *alg)
 {
struct crypto_report_blkcipher rblkcipher;
 
-   snprintf(rblkcipher.type, CRYPTO_MAX_ALG_NAME, "%s", "ablkcipher");
-   snprintf(rblkcipher.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-alg->cra_ablkcipher.geniv ?: "");
+   strncpy(rblkcipher.type, "ablkcipher", sizeof(rblkcipher.type));
+   strncpy(rblkcipher.geniv, alg->cra_ablkcipher.geniv ?: "",
+   sizeof(rblkcipher.geniv));
 
rblkcipher.blocksize = alg->cra_blocksize;
rblkcipher.min_keysize = alg->cra_ablkcipher.min_keysize;
@@ -469,9 +469,9 @@ static int crypto_givcipher_report(struct sk_buff *skb, 
struct crypto_alg *alg)
 {
struct crypto_report_blkcipher rblkcipher;
 
-   snprintf(rblkcipher.type, CRYPTO_MAX_ALG_NAME, "%s", "givcipher");
-   snprintf(rblkcipher.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-alg->cra_ablkcipher.geniv ?: "");
+   strncpy(rblkcipher.type, "givcipher", sizeof(rblkcipher.type));
+   strncpy(rblkcipher.geniv, alg->cra_ablkcipher.geniv ?: "",
+   sizeof(rblkcipher.geniv));
 
rblkcipher.blocksize = alg->cra_blocksize;
rblkcipher.min_keysize = alg->cra_ablkcipher.min_keysize;
diff --git a/crypto/aead.c b/crypto/aead.c
index 0b8121e..27bc487 100644
--- a/crypto/aead.c
+++ b/crypto/aead.c
@@ -117,9 +117,8 @@ static int crypto_aead_report(struct sk_buff *skb, struct 
crypto_alg *alg)
struct crypto_report_aead raead;
struct aead_alg *aead = &alg->cra_aead;
 
-   snprintf(raead.type, CRYPTO_MAX_ALG_NAME, "%s", "aead");
-   snprintf(raead.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-aead->geniv ?: "");
+   strncpy(raead.type, "aead", sizeof(raead.type));
+   strncpy(raead.geniv, aead->geniv ?: "", sizeof(raead.geniv));
 
raead.blocksize = alg->cra_blocksize;
raead.maxauthsize = aead->maxauthsize;
@@ -203,8 +202,8 @@ static int crypto_nivaead_report(struct sk_buff *skb, 
struct crypto_alg *alg)
struct crypto_report_aead raead;
struct aead_alg *aead = &alg->cra_aead;
 
-   snprintf(raead.type, CRYPTO_MAX_ALG_NAME, "%s", "nivaead");
-   snprintf(raead.geniv, CRYPTO_MAX_ALG_NAME, "%s", aead->geniv);
+   strncpy(raead.type, "nivaead", sizeof(raead.type));
+   strncpy(raead.geniv, aead->geniv, sizeof(raead.geniv));
 
raead.blocksize = alg->cra_blocksize;
raead.maxauthsize = aead->maxauthsize;
diff --git a/crypto/ahash.c b/crypto/ahash.c
index 3887856..793a27f 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -404,7 +404,7 @@ static int crypto_ahash_report(struct sk_buff *skb, struct 
crypto_alg *alg)
 {
struct crypto_report_hash rhash;
 
-   snprintf(rhash.type, CRYPTO_MAX_ALG_NAME, "%s", "ahash");
+   strncpy(rhash.type, "ahash", sizeof(rhash.type));
 
rhash.blocksize = alg->cra_blocksize;
rhash.digestsize = __crypto_hash_alg_common(alg)->digestsize;
diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c
index a8d85a1..c44e014 100644
--- a/crypto/blkcipher.c
+++ b/crypto/blkcipher.c
@@ -499,9 +499,9 @@ static int crypto_blkcipher_report(struct sk_buff *skb, 
struct crypto_alg *alg)
 {
struct crypto_repo

[PATCHv2 0/3] crypto user API fixes

2013-02-05 Thread Mathias Krause
This series fixes kernel memory disclosures (aka info leaks) and a bug
in the empty string test. In addition to the first version of this
series it also ensures all user supplied strings are nul-terminated
before using them.

Patch 1 is the same as in the first series, patch 2 was changed as
suggested by Herbert, patch 3 is new.

Please apply!


Mathias Krause (3):
  crypto: user - fix info leaks in report API
  crypto: user - fix empty string test in report API
  crypto: user - ensure user supplied strings are nul-terminated

 crypto/ablkcipher.c  |   12 ++--
 crypto/aead.c|9 -
 crypto/ahash.c   |2 +-
 crypto/blkcipher.c   |6 +++---
 crypto/crypto_user.c |   38 ++
 crypto/pcompress.c   |3 +--
 crypto/rng.c |2 +-
 crypto/shash.c   |3 ++-
 8 files changed, 44 insertions(+), 31 deletions(-)

-- 
1.7.10.4



Re: [PATCH 2/2] crypto: user - fix empty string test in report API

2013-02-05 Thread Mathias Krause
On Mon, Feb 4, 2013 at 2:15 PM, Herbert Xu  wrote:
> On Sun, Feb 03, 2013 at 12:09:01PM +0100, Mathias Krause wrote:
>> The current test for empty strings fails because it is testing the
>> address of a field, not a pointer. So the test will always be true.
>> Test for the string length instead.
>>
>> Signed-off-by: Mathias Krause 
>> Cc: Steffen Klassert 
>
> Good catch.  However, what if cru_driver_name isn't NUL-terminated?

Your objection is totally valid, sure. And my initial idea wouldn't
have that problem as it would just test whether the first character is
'\0', i.e. do something like this:

-   if (!p->cru_driver_name)
+   if (!p->cru_driver_name[0])

But then I looked how the other code in the crypto user API does refer
to string lengths related to cru_driver_name and switched to strlen().
So the other code is (potentially) vulnerable to non-NUL-terminated
strings, too.

So, I think we need another patch that adds sanity checks for
non-NUL-terminated strings. I can do this, maybe this evening, and
send out a new version of the patch series if you like me to.

Regards,
Mathias


[PATCH 1/2] crypto: user - fix info leaks in report API

2013-02-03 Thread Mathias Krause
Three errors resulting in kernel memory disclosure:

1/ The structures used for the netlink based crypto algorithm report API
are located on the stack. As snprintf() does not fill the remainder of
the buffer with null bytes, those stack bytes will be disclosed to users
of the API. Switch to strncpy() to fix this.

2/ crypto_report_one() does not initialize all fields of struct
crypto_user_alg. Fix this to plug the heap info leak.

3/ For the module name we should copy only as many bytes as
module_name() returns -- not as many as the destination buffer could
hold. But the current code does not and therefore copies random data
from beyond the end of the module name, as the module name is always
shorter than CRYPTO_MAX_ALG_NAME.

Also switch to use strncpy() to copy the algorithm's name and
driver_name. They are strings, after all.

Signed-off-by: Mathias Krause 
Cc: Steffen Klassert 
---
 crypto/ablkcipher.c  |   12 ++--
 crypto/aead.c|9 -
 crypto/ahash.c   |2 +-
 crypto/blkcipher.c   |6 +++---
 crypto/crypto_user.c |   22 +++---
 crypto/pcompress.c   |3 +--
 crypto/rng.c |2 +-
 crypto/shash.c   |3 ++-
 8 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/crypto/ablkcipher.c b/crypto/ablkcipher.c
index 533de95..7d4a8d2 100644
--- a/crypto/ablkcipher.c
+++ b/crypto/ablkcipher.c
@@ -388,9 +388,9 @@ static int crypto_ablkcipher_report(struct sk_buff *skb, 
struct crypto_alg *alg)
 {
struct crypto_report_blkcipher rblkcipher;
 
-   snprintf(rblkcipher.type, CRYPTO_MAX_ALG_NAME, "%s", "ablkcipher");
-   snprintf(rblkcipher.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-alg->cra_ablkcipher.geniv ?: "");
+   strncpy(rblkcipher.type, "ablkcipher", sizeof(rblkcipher.type));
+   strncpy(rblkcipher.geniv, alg->cra_ablkcipher.geniv ?: "",
+   sizeof(rblkcipher.geniv));
 
rblkcipher.blocksize = alg->cra_blocksize;
rblkcipher.min_keysize = alg->cra_ablkcipher.min_keysize;
@@ -469,9 +469,9 @@ static int crypto_givcipher_report(struct sk_buff *skb, 
struct crypto_alg *alg)
 {
struct crypto_report_blkcipher rblkcipher;
 
-   snprintf(rblkcipher.type, CRYPTO_MAX_ALG_NAME, "%s", "givcipher");
-   snprintf(rblkcipher.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-alg->cra_ablkcipher.geniv ?: "");
+   strncpy(rblkcipher.type, "givcipher", sizeof(rblkcipher.type));
+   strncpy(rblkcipher.geniv, alg->cra_ablkcipher.geniv ?: "",
+   sizeof(rblkcipher.geniv));
 
rblkcipher.blocksize = alg->cra_blocksize;
rblkcipher.min_keysize = alg->cra_ablkcipher.min_keysize;
diff --git a/crypto/aead.c b/crypto/aead.c
index 0b8121e..27bc487 100644
--- a/crypto/aead.c
+++ b/crypto/aead.c
@@ -117,9 +117,8 @@ static int crypto_aead_report(struct sk_buff *skb, struct 
crypto_alg *alg)
struct crypto_report_aead raead;
struct aead_alg *aead = &alg->cra_aead;
 
-   snprintf(raead.type, CRYPTO_MAX_ALG_NAME, "%s", "aead");
-   snprintf(raead.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-aead->geniv ?: "");
+   strncpy(raead.type, "aead", sizeof(raead.type));
+   strncpy(raead.geniv, aead->geniv ?: "", sizeof(raead.geniv));
 
raead.blocksize = alg->cra_blocksize;
raead.maxauthsize = aead->maxauthsize;
@@ -203,8 +202,8 @@ static int crypto_nivaead_report(struct sk_buff *skb, 
struct crypto_alg *alg)
struct crypto_report_aead raead;
struct aead_alg *aead = &alg->cra_aead;
 
-   snprintf(raead.type, CRYPTO_MAX_ALG_NAME, "%s", "nivaead");
-   snprintf(raead.geniv, CRYPTO_MAX_ALG_NAME, "%s", aead->geniv);
+   strncpy(raead.type, "nivaead", sizeof(raead.type));
+   strncpy(raead.geniv, aead->geniv, sizeof(raead.geniv));
 
raead.blocksize = alg->cra_blocksize;
raead.maxauthsize = aead->maxauthsize;
diff --git a/crypto/ahash.c b/crypto/ahash.c
index 3887856..793a27f 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -404,7 +404,7 @@ static int crypto_ahash_report(struct sk_buff *skb, struct 
crypto_alg *alg)
 {
struct crypto_report_hash rhash;
 
-   snprintf(rhash.type, CRYPTO_MAX_ALG_NAME, "%s", "ahash");
+   strncpy(rhash.type, "ahash", sizeof(rhash.type));
 
rhash.blocksize = alg->cra_blocksize;
rhash.digestsize = __crypto_hash_alg_common(alg)->digestsize;
diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c
index a8d85a1..c44e014 100644
--- a/crypto/blkcipher.c
+++ b/crypto/blkcipher.c
@@ -499,9 +499,9 @@ static int crypto_blkcipher_report(struct sk_buff *skb, 
struct crypto_alg *alg)
 {
struct crypto_repo

[PATCH 2/2] crypto: user - fix empty string test in report API

2013-02-03 Thread Mathias Krause
The current test for empty strings fails because it is testing the
address of a field, not a pointer. So the test will always be true.
Test for the string length instead.

Signed-off-by: Mathias Krause 
Cc: Steffen Klassert 
---
 crypto/crypto_user.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index f6d9baf..1b9a9a1 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -196,7 +196,7 @@ static int crypto_report(struct sk_buff *in_skb, struct 
nlmsghdr *in_nlh,
struct crypto_dump_info info;
int err;
 
-   if (!p->cru_driver_name)
+   if (!strlen(p->cru_driver_name))
return -EINVAL;
 
alg = crypto_alg_match(p, 1);
-- 
1.7.10.4



[PATCH 0/2] crypto user API fixes

2013-02-03 Thread Mathias Krause
This series fixes kernel memory disclosures (aka info leaks) and a bug
in the empty string test.

The crypto user API is protected by CAP_NET_ADMIN so one needs some
level of privilege already to exploit the leaks. It still might be
material for stable, though. Your choice.

Please apply!


Mathias Krause (2):
  crypto: user - fix info leaks in report API
  crypto: user - fix empty string test in report API

 crypto/ablkcipher.c  |   12 ++--
 crypto/aead.c|9 -
 crypto/ahash.c   |2 +-
 crypto/blkcipher.c   |6 +++---
 crypto/crypto_user.c |   24 
 crypto/pcompress.c   |3 +--
 crypto/rng.c |2 +-
 crypto/shash.c   |3 ++-
 8 files changed, 30 insertions(+), 31 deletions(-)

-- 
1.7.10.4



[PATCH] crypto: testmgr - remove superfluous initializers for xts(aes)

2012-11-20 Thread Mathias Krause
The test vectors for 'xts(aes)' contain superfluous initializers.
Remove them.

Signed-off-by: Mathias Krause 
Cc: Jarod Wilson 
---
 crypto/testmgr.h |4 
 1 file changed, 4 deletions(-)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 17db4a9..b5667a4 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -15011,8 +15011,6 @@ static struct cipher_testvec aes_xts_enc_tv_template[] 
= {
.klen   = 64,
.iv = "\xff\x00\x00\x00\x00\x00\x00\x00"
  "\x00\x00\x00\x00\x00\x00\x00\x00",
- "\x00\x00\x00\x00\x00\x00\x00\x00",
- "\x00\x00\x00\x00\x00\x00\x00\x00",
.input  = "\x00\x01\x02\x03\x04\x05\x06\x07"
  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
  "\x10\x11\x12\x13\x14\x15\x16\x17"
@@ -15355,8 +15353,6 @@ static struct cipher_testvec aes_xts_dec_tv_template[] 
= {
.klen   = 64,
.iv = "\xff\x00\x00\x00\x00\x00\x00\x00"
  "\x00\x00\x00\x00\x00\x00\x00\x00",
- "\x00\x00\x00\x00\x00\x00\x00\x00",
- "\x00\x00\x00\x00\x00\x00\x00\x00",
.input  = "\x1c\x3b\x3a\x10\x2f\x77\x03\x86"
  "\xe4\x83\x6c\x99\xe3\x70\xcf\x9b"
  "\xea\x00\x80\x3f\x5e\x48\x23\x57"
-- 
1.7.10.4



Re: Linux 3.6-rc5

2012-09-09 Thread Mathias Krause
On Sun, Sep 09, 2012 at 02:00:00PM -0700, Herbert Xu wrote:
> On Sun, Sep 09, 2012 at 10:09:10PM +0200, Mathias Krause wrote:
> >
> > It happens with the C variants of SHA1 and AES, too. You can easily
> > trigger the bug with Steffen's crconf[1]:
> > 
> > $ crconf add alg "authenc(hmac(sha1-generic),cbc(aes-generic))" type 3
> > 
> > So the problem is likely not related to sha1-ssse3.ko or aesni-intel.ko.
> 
> Thanks! I think this patch should fix the problem.  Can someone
> please confirm this?
> 
> crypto: authenc - Fix crash with zero-length assoc data
> 
> The authenc code doesn't deal with zero-length associated data
> correctly and ends up constructing a zero-length sg entry which
> causes a crash when it's fed into the crypto system.
> 
> This patch fixes this by avoiding the code-path that triggers
> the SG construction if we have no associated data.
> 
> This isn't the most optimal fix as it means that we'll end up
> using the fallback code-path even when we could still execute
> the digest function.  However, this isn't a big deal as nobody
> but the test path would supply zero-length associated data.
> 
> Reported-by: Romain Francoise 
> Signed-off-by: Herbert Xu 

Looks good to me.

> 
> diff --git a/crypto/authenc.c b/crypto/authenc.c
> index 5ef7ba6..d0583a4 100644
> --- a/crypto/authenc.c
> +++ b/crypto/authenc.c
> @@ -336,7 +336,7 @@ static int crypto_authenc_genicv(struct aead_request 
> *req, u8 *iv,
>   cryptlen += ivsize;
>   }
>  
> - if (sg_is_last(assoc)) {
> + if (req->assoclen && sg_is_last(assoc)) {
>   authenc_ahash_fn = crypto_authenc_ahash;
>   sg_init_table(asg, 2);
>   sg_set_page(asg, sg_page(assoc), assoc->length, assoc->offset);
> @@ -490,7 +490,7 @@ static int crypto_authenc_iverify(struct aead_request 
> *req, u8 *iv,
>   cryptlen += ivsize;
>   }
>  
> - if (sg_is_last(assoc)) {
> + if (req->assoclen && sg_is_last(assoc)) {
>   authenc_ahash_fn = crypto_authenc_ahash;
>   sg_init_table(asg, 2);
>   sg_set_page(asg, sg_page(assoc), assoc->length, assoc->offset);
> 

Tested-by: Mathias Krause 



Re: Linux 3.6-rc5

2012-09-09 Thread Mathias Krause
On Sun, Sep 09, 2012 at 12:19:58PM -0700, Herbert Xu wrote:
> On Sun, Sep 09, 2012 at 11:13:02AM +0200, Romain Francoise wrote:
> > Still seeing this BUG with -rc5, that I originally reported here:
> > http://marc.info/?l=linux-crypto-vger&m=134653220530264&w=2
> > 
> > [   26.362567] [ cut here ]
> > [   26.362583] kernel BUG at crypto/scatterwalk.c:37!
> > [   26.362606] invalid opcode:  [#1] SMP 
> 
> Can you try blacklisting/not loading sha1_ssse3 and aesni_intel
> to see which one of them is causing this crash? Of course if you
> can still reproduce this without loading either of them that would
> also be interesting to know.

It happens with the C variants of SHA1 and AES, too. You can easily
trigger the bug with Steffen's crconf[1]:

$ crconf add alg "authenc(hmac(sha1-generic),cbc(aes-generic))" type 3

So the problem is likely not related to sha1-ssse3.ko or aesni-intel.ko.


Regards,
Mathias

[1] http://sourceforge.net/projects/crconf/


Re: [PATCH] crypto: aesni-intel - fix unaligned cbc decrypt for x86-32

2012-05-30 Thread Mathias Krause
On Thu, May 31, 2012 at 7:27 AM, Herbert Xu  wrote:
> On Wed, May 30, 2012 at 01:43:08AM +0200, Mathias Krause wrote:
>> The 32 bit variant of cbc(aes) decrypt is using instructions requiring
>> 128 bit aligned memory locations but fails to ensure this constraint in
>> the code. Fix this by loading the data into intermediate registers with
>> load unaligned instructions.
>>
>> This fixes reported general protection faults related to aesni.
>>
>> References: https://bugzilla.kernel.org/show_bug.cgi?id=43223
>> Reported-by: Daniel 
>> Cc: sta...@kernel.org [v2.6.39+]
>> Signed-off-by: Mathias Krause 
>
> Have measured this against increasing alignmask to 15?

No, but the latter will likely be much slower as it would need to
memmove the data if it's not aligned, right?

My patch essentially just breaks the combined "XOR a memory operand
with a register" operation into two -- load memory into register, then
XOR with registers. It shouldn't be much slower compared to the
current version. But it fixes a bug the current version exposes when
working on unaligned data.

That said, I did some micro-benchmarking of "pxor (%edx), %xmm0" vs.
"movups (%edx), %xmm1; pxor %xmm1, %xmm0" and observed that the latter
might be even slightly faster! But changing the code to perform better
is out of scope for this patch as it should just fix the bug in the
code. We can increase performance in a follow up patch.

Mathias


[PATCH] crypto: aesni-intel - fix unaligned cbc decrypt for x86-32

2012-05-29 Thread Mathias Krause
The 32 bit variant of cbc(aes) decrypt is using instructions requiring
128 bit aligned memory locations but fails to ensure this constraint in
the code. Fix this by loading the data into intermediate registers with
load unaligned instructions.

This fixes reported general protection faults related to aesni.

References: https://bugzilla.kernel.org/show_bug.cgi?id=43223
Reported-by: Daniel 
Cc: sta...@kernel.org [v2.6.39+]
Signed-off-by: Mathias Krause 
---
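
For readers without the asm context, a hedged userspace illustration (SSE2
intrinsics, not the kernel assembly) of the same constraint: an aligned load
such as _mm_load_si128() has the same 16-byte alignment requirement as pxor
with a memory operand and faults on unaligned addresses, while the unaligned
_mm_loadu_si128() -- the movups equivalent this patch switches to -- accepts
any address. GCC/clang attribute syntax is assumed:

#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* buf is 16-byte aligned, so buf + 1 is guaranteed to be misaligned */
	uint8_t buf[32] __attribute__((aligned(16))) = { 0 };
	const uint8_t *unaligned = buf + 1;

	__m128i v = _mm_loadu_si128((const __m128i *)unaligned);  /* fine */
	/* _mm_load_si128((const __m128i *)unaligned) would fault (#GP)   */

	v = _mm_xor_si128(v, v);
	printf("unaligned load + xor done: %d\n", _mm_cvtsi128_si32(v));
	return 0;
}
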
 arch/x86/crypto/aesni-intel_asm.S |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index be6d9e3..3470624 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -2460,10 +2460,12 @@ ENTRY(aesni_cbc_dec)
pxor IN3, STATE4
movaps IN4, IV
 #else
-   pxor (INP), STATE2
-   pxor 0x10(INP), STATE3
pxor IN1, STATE4
movaps IN2, IV
+   movups (INP), IN1
+   pxor IN1, STATE2
+   movups 0x10(INP), IN2
+   pxor IN2, STATE3
 #endif
movups STATE1, (OUTP)
movups STATE2, 0x10(OUTP)
-- 
1.7.10



[PATCH] crypto: sha1 - use Kbuild supplied flags for AVX test

2012-05-24 Thread Mathias Krause
Commit ea4d26ae ("raid5: add AVX optimized RAID5 checksumming")
introduced x86/ arch wide defines for AFLAGS and CFLAGS indicating AVX
support in binutils based on the same test we have in x86/crypto/ right
now. To minimize duplication drop our implementation in favour to the
one in x86/.

Signed-off-by: Mathias Krause 
---

This should be applied to cryptodev-2.6.git after it contains the above
mentioned commit, e.g. after cryptodev-2.6.git rebased to/merged v3.5-rc1.

 arch/x86/crypto/Makefile  |7 ---
 arch/x86/crypto/sha1_ssse3_asm.S  |2 +-
 arch/x86/crypto/sha1_ssse3_glue.c |6 +++---
 3 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index e191ac0..479f95a 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -34,12 +34,5 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
 serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o
 
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
-
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
-
-# enable AVX support only when $(AS) can actually assemble the instructions
-ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
-AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT
-CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
-endif
 sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
index b2c2f57..49d6987 100644
--- a/arch/x86/crypto/sha1_ssse3_asm.S
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -468,7 +468,7 @@ W_PRECALC_SSSE3
  */
 SHA1_VECTOR_ASM sha1_transform_ssse3
 
-#ifdef SHA1_ENABLE_AVX_SUPPORT
+#ifdef CONFIG_AS_AVX
 
 .macro W_PRECALC_AVX
 
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c 
b/arch/x86/crypto/sha1_ssse3_glue.c
index f916499..4a11a9d 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -35,7 +35,7 @@
 
 asmlinkage void sha1_transform_ssse3(u32 *digest, const char *data,
 unsigned int rounds);
-#ifdef SHA1_ENABLE_AVX_SUPPORT
+#ifdef CONFIG_AS_AVX
 asmlinkage void sha1_transform_avx(u32 *digest, const char *data,
   unsigned int rounds);
 #endif
@@ -184,7 +184,7 @@ static struct shash_alg alg = {
}
 };
 
-#ifdef SHA1_ENABLE_AVX_SUPPORT
+#ifdef CONFIG_AS_AVX
 static bool __init avx_usable(void)
 {
u64 xcr0;
@@ -209,7 +209,7 @@ static int __init sha1_ssse3_mod_init(void)
if (cpu_has_ssse3)
sha1_transform_asm = sha1_transform_ssse3;
 
-#ifdef SHA1_ENABLE_AVX_SUPPORT
+#ifdef CONFIG_AS_AVX
/* allow AVX to override SSSE3, it's a little faster */
if (avx_usable())
sha1_transform_asm = sha1_transform_avx;
-- 
1.5.6.5



Re: [PATCH v3 0/2] crypto, x86: assembler implementation of SHA1

2011-08-17 Thread Mathias Krause
On Fri, Aug 5, 2011 at 9:49 AM, Herbert Xu  wrote:
> On Thu, Aug 04, 2011 at 08:19:23PM +0200, Mathias Krause wrote:
>> This patch series adds an assembler implementation for the SHA1 hash 
>> algorithm
>> for the x86-64 architecture. Its raw hash performance can be more than 2 
>> times
>> faster than the generic C implementation. This gives a real world benefit for
>> IPsec with an throughput increase of up to +35%. For concrete numbers have a
>> look at the second patch.
>
> I'll apply this as soon as rc1 is out.  Thanks!
>
> PS if you have time please look into the asynchronous version.

I've converted the implementation to ahash. It was pretty easy in the
end as I had a reference to look at -- the ghash implementation. But
I'm pretty unhappy about how many layers of indirection are in there now
even for the irq_fpu_usable() case -- not to mention that the
!irq_fpu_usable() case is much more heavyweight; but, at least, it can now
always make use of the SSSE3 variant. (See the attached patch, but
please don't apply it. It's compile-tested only! Also I'm not
comfortable with the ahash version right now.)

I haven't made any benchmarks yet because I've no access to the test
machine right now. But I suspect it'll be much slower for small chunks
and *maybe* faster for the IPsec case I'm interested in because it can
now always use the SSE3 variant.

What I'm thinking about is having both a shash and an ahash variant
so the user can actually choose which interface fits their needs best --
e.g., dm-crypt might want to use the shash interface and xfrm the
ahash one. Do you think something like that is possible? And if so,
how would dm-crypt automatically choose the shash and xfrm the ahash
variant?
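
(Purely as an illustration of the idea, not taken from the attached patch:
if both flavours ended up registered under the same "sha1" name, a user
could pin one explicitly by allocating the matching transform type. The
function below is a hypothetical sketch with only minimal error handling.)

    #include <linux/err.h>
    #include <crypto/hash.h>

    static int pick_sha1_interfaces(void)
    {
            struct crypto_shash *sync_tfm;   /* what e.g. dm-crypt would grab */
            struct crypto_ahash *async_tfm;  /* what e.g. xfrm would grab     */

            sync_tfm = crypto_alloc_shash("sha1", 0, 0);
            if (IS_ERR(sync_tfm))
                    return PTR_ERR(sync_tfm);

            async_tfm = crypto_alloc_ahash("sha1", 0, 0);
            if (IS_ERR(async_tfm)) {
                    crypto_free_shash(sync_tfm);
                    return PTR_ERR(async_tfm);
            }

            /* ... use the transforms ... */

            crypto_free_ahash(async_tfm);
            crypto_free_shash(sync_tfm);
            return 0;
    }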


Regards,
Mathias


0001-crypto-sha1-convert-to-ahash-interface.patch
Description: Binary data


Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-14 Thread Mathias Krause
On Thu, Aug 11, 2011 at 4:50 PM, Andy Lutomirski  wrote:
> I have vague plans to clean up extended state handling and make
> kernel_fpu_begin work efficiently from any context.  (i.e. the first
> kernel_fpu_begin after a context switch could take up to ~60 ns on Sandy
> Bridge, but further calls to kernel_fpu_begin would be a single branch.)
>
> The current code that handles context switches when user code is using
> extended state is terrible and will almost certainly become faster in the
> near future.

Sounds good! This would not only improve the performance of sha1_ssse3
but of aesni as well.

> Hopefully I'll have patches for 3.2 or 3.3.
>
> IOW, please don't introduce another thing like the fpu crypto module quite
> yet unless there's a good reason.  I'm looking forward to deleting the fpu
> module entirely.

I've no intention to. So please go ahead and do so.


Mathias


Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-14 Thread Mathias Krause
Hi Max,

2011/8/8 Locktyukhin, Maxim :
> I'd like to note that at Intel we very much appreciate Mathias effort to 
> port/integrate this implementation into Linux kernel!
>
>
> $0.02 re tcrypt perf numbers below: I believe something must be terribly 
> broken with the tcrypt measurements
>
> 20 (and more) cycles per byte shown below are not reasonable numbers for 
> SHA-1 - ~6 c/b (as can be seen in some of the results for Core2) is the 
> expected results ... so, while relative improvement seen is sort of 
> consistent, the absolute performance numbers are very much off (and yes Sandy 
> Bridge on AVX code is expected to be faster than Core2/SSSE3 - ~5.2 c/b vs. 
> ~5.8 c/b on the level of the sha1_update() call to me more precise)
>
> this does not affect the proposed patch in any way, it looks like tcrypt's 
> timing problem to me - I'd even venture a guess that it may be due to the use 
> of RDTSC (that gets affected significantly by Turbo/EIST, TSC is isotropic in 
> time but not with the core clock domain, i.e. RDTSC cannot be used to measure 
> core cycles without at least disabling EIST and Turbo, or doing runtime 
> adjustment of actual bus/core clock ratio vs. the standard ratio always used 
> by TSC - I could elaborate more if someone is interested)

I found the Sandy Bridge numbers odd too but suspected it might be
because of the laptop platform. The SSSE3 numbers on this platform
were slightly lower than the AVX numbers, yet still way off
the ones for the Core2 system. But your explanation fits well, too. It
might be EIST or Turbo mode that tampered with the numbers. Another,
maybe more likely, cause might be the overhead Andy mentioned.

> thanks again,
> -Max
>

Mathias


[PATCH v3 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-04 Thread Mathias Krause
This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced Vector Extensions (AVX).

Testing with the tcrypt module shows the raw hash performance is up to
2.3 times faster than the C implementation, using 8k data blocks on a
Core 2 Duo T5500. For the smallest data set (16 bytes) it is still 25%
faster.

Since this implementation uses SSE/YMM registers it cannot safely be
used in every situation, e.g. while an IRQ interrupts a kernel thread.
The implementation falls back to the generic SHA1 variant, if using
the SSE/YMM registers is not possible.

With this algorithm I was able to increase the throughput of a single
IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
the SSSE3 variant -- a speedup of +34.8%.

Saving and restoring SSE/YMM state might make the actual throughput
fluctuate when there are FPU intensive userland applications running.
For example, measuring the performance using iperf2 directly on the
machine under test gives wobbling numbers because iperf2 uses the FPU
for each packet to check if the reporting interval has expired (in the
above test I got min/max/avg: 402/484/464 MBit/s).

Using this algorithm on an IPsec gateway gives much more reasonable and
stable numbers, albeit not as high as in the directly connected case.
Here is the result from an RFC 2544 test run with an EXFO Packet Blazer
FTB-8510:

  frame size   sha1-generic    sha1-ssse3      delta
     64 byte    37.5 MBit/s    37.5 MBit/s      0.0%
    128 byte    56.3 MBit/s    62.5 MBit/s    +11.0%
    256 byte    87.5 MBit/s   100.0 MBit/s    +14.3%
    512 byte   131.3 MBit/s   150.0 MBit/s    +14.2%
   1024 byte   162.5 MBit/s   193.8 MBit/s    +19.3%
   1280 byte   175.0 MBit/s   212.5 MBit/s    +21.4%
   1420 byte   175.0 MBit/s   218.7 MBit/s    +25.0%
   1518 byte   150.0 MBit/s   181.2 MBit/s    +20.8%

The throughput for the largest frame size is lower than for the
previous size because the IP packets need to be fragmented in this
case to make their way through the IPsec tunnel.
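
(To illustrate the fragmentation effect with rough, assumed numbers: a
1518-byte Ethernet frame carries a 1500-byte IP packet. Tunnel-mode ESP adds
an outer IP header plus ESP header, IV, padding and ICV -- typically
somewhere around 50-70 bytes depending on the algorithms -- so the
encapsulated packet exceeds the 1500-byte MTU of the outgoing link and has
to be split into two fragments. The 1420-byte frames still fit after
encapsulation and thus avoid the fragmentation penalty.)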

Signed-off-by: Mathias Krause 
Cc: Maxim Locktyukhin 
---
 arch/x86/crypto/Makefile  |8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |3 +
 crypto/Kconfig|   10 +
 5 files changed, 819 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index c04f1b7..57c7f7b 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
 
 obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o
+obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 
 aes-i586-y := aes-i586-asm_32.o aes_glue.o
 twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
@@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
+
+# enable AVX support only when $(AS) can actually assemble the instructions
+ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
+AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT
+CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
+endif
+sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
new file mode 100644
index 000..b2c2f57
--- /dev/null
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -0,0 +1,558 @@
+/*
+ * This is a SIMD SHA-1 implementation. It requires the Intel(R) Supplemental
+ * SSE3 instruction set extensions introduced in Intel Core Microarchitecture
+ * processors. CPUs supporting Intel(R) AVX extensions will get an additional
+ * boost.
+ *
+ * This work was inspired by the vectorized implementation of Dean Gaudet.
+ * Additional information on it can be found at:
+ *http://www.arctic.org/~dean/crypto/sha1.html
+ *
+ * It was improved upon with more efficient vectorization of the message
+ * scheduling. This implementation has also been optimized for all current and
+ * several future generations of Intel CPUs.
+ *
+ * See this article for more information about the implementation details:
+ *   
http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1/
+ *
+ * Copyright (C) 2010, Intel Corp.
+ *   Authors: Maxim Locktyukhin 
+ *Ronen Zohar 
+ *
+ * Converted to AT&T syntax and adapted for inclusion in the Linux kernel:
+ *   Author: Mathias Krause 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as publishe

[PATCH v3 1/2] crypto, sha1: export sha1_update for reuse

2011-08-04 Thread Mathias Krause
Export the update function as crypto_sha1_update() so there is no need
to reimplement the same algorithm for each SHA-1 implementation. This
way the generic SHA-1 implementation can be used as a fallback for other
implementations that fail to run under certain circumstances, like the
need for an FPU context while executing in IRQ context.

Signed-off-by: Mathias Krause 
---
 crypto/sha1_generic.c |9 +
 include/crypto/sha.h  |3 +++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index 0416091..0b6d907 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -36,7 +36,7 @@ static int sha1_init(struct shash_desc *desc)
return 0;
 }
 
-static int sha1_update(struct shash_desc *desc, const u8 *data,
+int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
 {
struct sha1_state *sctx = shash_desc_ctx(desc);
@@ -70,6 +70,7 @@ static int sha1_update(struct shash_desc *desc, const u8 
*data,
 
return 0;
 }
+EXPORT_SYMBOL(crypto_sha1_update);
 
 
 /* Add padding and return the message digest. */
@@ -86,10 +87,10 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
/* Pad out to 56 mod 64 */
index = sctx->count & 0x3f;
padlen = (index < 56) ? (56 - index) : ((64+56) - index);
-   sha1_update(desc, padding, padlen);
+   crypto_sha1_update(desc, padding, padlen);
 
/* Append length */
-   sha1_update(desc, (const u8 *)&bits, sizeof(bits));
+   crypto_sha1_update(desc, (const u8 *)&bits, sizeof(bits));
 
/* Store state in digest */
for (i = 0; i < 5; i++)
@@ -120,7 +121,7 @@ static int sha1_import(struct shash_desc *desc, const void 
*in)
 static struct shash_alg alg = {
.digestsize =   SHA1_DIGEST_SIZE,
.init   =   sha1_init,
-   .update =   sha1_update,
+   .update =   crypto_sha1_update,
.final  =   sha1_final,
.export =   sha1_export,
.import =   sha1_import,
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 069e85b..83e6be5 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -82,4 +82,7 @@ struct sha512_state {
u8 buf[SHA512_BLOCK_SIZE];
 };
 
+extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len);
+
 #endif
-- 
1.5.6.5



[PATCH v3 0/2] crypto, x86: assembler implementation of SHA1

2011-08-04 Thread Mathias Krause
This patch series adds an assembler implementation of the SHA1 hash algorithm
for the x86-64 architecture. Its raw hash performance can be more than 2 times
faster than the generic C implementation. This gives a real-world benefit for
IPsec with a throughput increase of up to +35%. For concrete numbers have a
look at the second patch.

This implementation is currently x86-64 only but might be ported to 32 bit with
some effort in a follow-up patch. (I haven't had time to do this yet.)

Note: The SSSE3 is no typo, it's "Supplemental SSE3".

v3 changes:
- removed #ifdef in first patch
v2 changes:
- fixed typo in Makefile making AVX version unusable
- whitespace fixes for the .S file

Regards,
Mathias

Mathias Krause (2):
  crypto, sha1: export sha1_update for reuse
  crypto, x86: SSSE3 based SHA1 implementation for x86-64

 arch/x86/crypto/Makefile  |8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |3 +
 crypto/Kconfig|   10 +
 crypto/sha1_generic.c |9 +-
 include/crypto/sha.h  |3 +
 7 files changed, 827 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c



Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-04 Thread Mathias Krause
On Thu, Aug 4, 2011 at 7:05 PM, Mathias Krause  wrote:
> It does. Just have a look at how fpu_available() is implemented:

read: irq_fpu_usable()


Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-04 Thread Mathias Krause
On Thu, Aug 4, 2011 at 8:44 AM, Herbert Xu  wrote:
> On Sun, Jul 24, 2011 at 07:53:14PM +0200, Mathias Krause wrote:
>>
>> With this algorithm I was able to increase the throughput of a single
>> IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
>> the SSSE3 variant -- a speedup of +34.8%.
>
> Were you testing this on the transmit side or the receive side?

I was running an iperf test on two directly connected systems. Both sides
showed me those numbers (iperf server and client).

> As the IPsec receive code path usually runs in a softirq context,
> does this code have any effect there at all?

It does. Just have a look at how fpu_available() is implemented:

,-[ arch/x86/include/asm/i387.h ]
| static inline bool irq_fpu_usable(void)
| {
| struct pt_regs *regs;
|
| return !in_interrupt() || !(regs = get_irq_regs()) || \
| user_mode(regs) || (read_cr0() & X86_CR0_TS);
| }
`

So, it'll fail in softirq context when the softirq interrupted a kernel thread
or TS in CR0 is set. When it interrupted a userland thread that doesn't have
the TS flag set in CR0, i.e. the CPU won't generate an exception when we use
the FPU, it'll work in softirq context, too.

With a busy userland making extensive use of the FPU it'll almost always have
to fall back to the generic implementation, right. However, using this module
on an IPsec gateway with no real userland at all, you get a nice performance
gain.
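
To make the fallback concrete, the glue logic boils down to something like
the following sketch. This is a simplified illustration, not the literal
code from the patch: it assumes the "rounds" argument of the assembler
routine is the number of 64-byte blocks and it punts partial blocks to the
generic code path.

    #include <linux/linkage.h>
    #include <crypto/hash.h>
    #include <crypto/sha.h>
    #include <asm/i387.h>

    asmlinkage void sha1_transform_ssse3(u32 *digest, const char *data,
                                         unsigned int rounds);

    static int sha1_ssse3_update_sketch(struct shash_desc *desc,
                                        const u8 *data, unsigned int len)
    {
            struct sha1_state *sctx = shash_desc_ctx(desc);

            /*
             * Fall back to the exported generic implementation whenever the
             * FPU is not usable (e.g. a softirq interrupted a kernel thread)
             * or the request is not a whole number of 64-byte blocks.
             */
            if (!irq_fpu_usable() || (sctx->count % SHA1_BLOCK_SIZE) ||
                (len % SHA1_BLOCK_SIZE))
                    return crypto_sha1_update(desc, data, len);

            kernel_fpu_begin();
            sha1_transform_ssse3(sctx->state, (const char *)data,
                                 len / SHA1_BLOCK_SIZE);
            kernel_fpu_end();

            sctx->count += len;
            return 0;
    }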

> This is pretty similar to the situation with the Intel AES code.
> Over there they solved it by using the asynchronous interface and
> deferring the processing to a work queue.
>
> This also avoids the situation where you have an FPU/SSE using
> process that also tries to transmit over IPsec thrashing the
> FPU state.

Interesting. I'll look into this.

> Now I'm still happy to take this because hashing is very different
> from ciphers in that some users tend to hash small amounts of data
> all the time.  Those users will typically use the shash interface
> that you provide here.
>
> So I'm interested to know how much of an improvement this is for
> those users (< 64 bytes).

Anything below 64 bytes will (and has to) be padded to a full block, i.e. 64
bytes.

> If you run the tcrypt speed tests that should provide some useful info.

I've summarized the mean values of five consecutive tcrypt runs from two
different systems. The first system is an Intel Core i7 2620M based notebook
running at 2.70 GHz. It's a Sandy Bridge processor, so it could make use of the
AVX variant. The second system was an Intel Core 2 Quad Xeon system running at
2.40 GHz -- no AVX, but SSSE3.

Since the output of tcrypt is a little awkward to read, I've condensed it
slightly to make it (hopefully) more readable. Please interpret the table as
follows: the triple in the first column is (byte blocks | bytes per update |
updates), c/B is cycles per byte.

Here are the numbers for the first system:

                          sha1-generic               sha1-ssse3 (AVX)
 (  16 |   16 |   1):     9.65 MiB/s, 266.2 c/B      12.93 MiB/s, 200.0 c/B
 (  64 |   16 |   4):    19.05 MiB/s, 140.2 c/B      25.27 MiB/s, 105.6 c/B
 (  64 |   64 |   1):    21.35 MiB/s, 119.2 c/B      29.29 MiB/s,  87.0 c/B
 ( 256 |   16 |  16):    28.81 MiB/s,  88.8 c/B      37.70 MiB/s,  68.4 c/B
 ( 256 |   64 |   4):    34.58 MiB/s,  74.0 c/B      47.16 MiB/s,  54.8 c/B
 ( 256 |  256 |   1):    37.44 MiB/s,  68.0 c/B      69.01 MiB/s,  36.8 c/B
 (1024 |   16 |  64):    33.55 MiB/s,  76.2 c/B      43.77 MiB/s,  59.0 c/B
 (1024 |  256 |   4):    45.12 MiB/s,  58.0 c/B      88.90 MiB/s,  28.8 c/B
 (1024 | 1024 |   1):    46.69 MiB/s,  54.0 c/B     104.39 MiB/s,  25.6 c/B
 (2048 |   16 | 128):    34.66 MiB/s,  74.0 c/B      44.93 MiB/s,  57.2 c/B
 (2048 |  256 |   8):    46.81 MiB/s,  54.0 c/B      93.83 MiB/s,  27.0 c/B
 (2048 | 1024 |   2):    48.28 MiB/s,  52.4 c/B     110.98 MiB/s,  23.0 c/B
 (2048 | 2048 |   1):    48.69 MiB/s,  52.0 c/B     114.26 MiB/s,  22.0 c/B
 (4096 |   16 | 256):    35.15 MiB/s,  72.6 c/B      45.53 MiB/s,  56.0 c/B
 (4096 |  256 |  16):    47.69 MiB/s,  53.0 c/B      96.46 MiB/s,  26.0 c/B
 (4096 | 1024 |   4):    49.24 MiB/s,  51.0 c/B     114.36 MiB/s,  22.0 c/B
 (4096 | 4096 |   1):    49.77 MiB/s,  51.0 c/B     119.80 MiB/s,  21.0 c/B
 (8192 |   16 | 512):    35.46 MiB/s,  72.2 c/B      45.84 MiB/s,  55.8 c/B
 (8192 |  256 |  32):    48.15 MiB/s,  53.0 c/B      97.83 MiB/s,  26.0 c/B
 (8192 | 1024 |   8):    49.73 MiB/s,  51.0 c/B     116.35 MiB/s,  22.0 c/B
 (8192 | 4096 |   2):    50.10 MiB/s,  50.8 c/B     121.66 MiB/s,  21.0 c/B
 (8192 | 8192 |   1):    50.25 MiB/s,  50.8 c/B     121.87 MiB/s,  21.0 c/B

For the second system I got the following numbers:

                          sha1-generic               sha1-ssse3 (SSSE3)
 (  16 |   16 |   1):    27.23 MiB/s, 106.6 c/B      32.86 MiB/s,  73.8 c/B

Re: [PATCH v2 1/2] crypto, sha1: export sha1_update for reuse

2011-07-31 Thread Mathias Krause
On Sat, Jul 30, 2011 at 3:46 PM, Herbert Xu  wrote:
> On Thu, Jul 28, 2011 at 05:29:35PM +0200, Mathias Krause wrote:
>> On Thu, Jul 28, 2011 at 4:58 PM, Herbert Xu  
>> wrote:
>> > On Sun, Jul 24, 2011 at 07:53:13PM +0200, Mathias Krause wrote:
>> >>
>> >> diff --git a/include/crypto/sha.h b/include/crypto/sha.h
>> >> index 069e85b..7c46d0c 100644
>> >> --- a/include/crypto/sha.h
>> >> +++ b/include/crypto/sha.h
>> >> @@ -82,4 +82,9 @@ struct sha512_state {
>> >>       u8 buf[SHA512_BLOCK_SIZE];
>> >>  };
>> >>
>> >> +#if defined(CONFIG_CRYPTO_SHA1) || defined (CONFIG_CRYPTO_SHA1_MODULE)
>> >> +extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
>> >> +                           unsigned int len);
>> >> +#endif
>> >
>> > Please remove the unnecessary #if.
>>
>> The function will only be available when crypto/sha1_generic.o will
>> either be build into the kernel or build as a module. Without the #if
>> a potential wrong user of this function might not be detected as early
>> as at compilation time but as late as at link time, or even worse, as
>> late as on module load time -- which is pretty bad. IMHO it's better
>> to detect errors early, as in reading "error: implicit declaration of
>> function ‘crypto_sha1_update’" when trying to compile the code in
>> question instead of "Unknown symbol crypto_sha1_update" in dmesg when
>> trying to load the module. That for I would like to keep the #if.
>
> In order to be consistent please remove the ifdef.  In most
> similar cases in the crypto subsystem we don't do this.  As
> adding such ifdefs all over the place would gain very little,
> I'd much rather you left it out.

Noting that this function wasn't exported before and the only user
(sha-ssse3) ensures its availability by other means, it should be okay
to remove the #if. I'll update the patch accordingly.

Any objections to the second patch?

Thanks,
Mathias


Re: [PATCH v2 1/2] crypto, sha1: export sha1_update for reuse

2011-07-28 Thread Mathias Krause
On Thu, Jul 28, 2011 at 4:58 PM, Herbert Xu  wrote:
> On Sun, Jul 24, 2011 at 07:53:13PM +0200, Mathias Krause wrote:
>>
>> diff --git a/include/crypto/sha.h b/include/crypto/sha.h
>> index 069e85b..7c46d0c 100644
>> --- a/include/crypto/sha.h
>> +++ b/include/crypto/sha.h
>> @@ -82,4 +82,9 @@ struct sha512_state {
>>       u8 buf[SHA512_BLOCK_SIZE];
>>  };
>>
>> +#if defined(CONFIG_CRYPTO_SHA1) || defined (CONFIG_CRYPTO_SHA1_MODULE)
>> +extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
>> +                           unsigned int len);
>> +#endif
>
> Please remove the unnecessary #if.

The function will only be available when crypto/sha1_generic.o is
either built into the kernel or built as a module. Without the #if
a potential wrong user of this function might not be detected as early
as at compilation time but as late as at link time, or even worse, as
late as module load time -- which is pretty bad. IMHO it's better
to detect errors early, as in reading "error: implicit declaration of
function ‘crypto_sha1_update’" when trying to compile the code in
question instead of "Unknown symbol crypto_sha1_update" in dmesg when
trying to load the module. Therefore I would like to keep the #if.

Thanks for the review!

Mathias


[PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-07-24 Thread Mathias Krause
This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced Vector Extensions (AVX).

Testing with the tcrypt module shows the raw hash performance is up to
2.3 times faster than the C implementation, using 8k data blocks on a
Core 2 Duo T5500. For the smallest data set (16 bytes) it is still 25%
faster.

Since this implementation uses SSE/YMM registers it cannot safely be
used in every situation, e.g. while an IRQ interrupts a kernel thread.
The implementation falls back to the generic SHA1 variant, if using
the SSE/YMM registers is not possible.

With this algorithm I was able to increase the throughput of a single
IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
the SSSE3 variant -- a speedup of +34.8%.

Saving and restoring SSE/YMM state might make the actual throughput
fluctuate when there are FPU intensive userland applications running.
For example, measuring the performance using iperf2 directly on the
machine under test gives wobbling numbers because iperf2 uses the FPU
for each packet to check if the reporting interval has expired (in the
above test I got min/max/avg: 402/484/464 MBit/s).

Using this algorithm on an IPsec gateway gives much more reasonable and
stable numbers, albeit not as high as in the directly connected case.
Here is the result from an RFC 2544 test run with an EXFO Packet Blazer
FTB-8510:

  frame size   sha1-generic    sha1-ssse3      delta
     64 byte    37.5 MBit/s    37.5 MBit/s      0.0%
    128 byte    56.3 MBit/s    62.5 MBit/s    +11.0%
    256 byte    87.5 MBit/s   100.0 MBit/s    +14.3%
    512 byte   131.3 MBit/s   150.0 MBit/s    +14.2%
   1024 byte   162.5 MBit/s   193.8 MBit/s    +19.3%
   1280 byte   175.0 MBit/s   212.5 MBit/s    +21.4%
   1420 byte   175.0 MBit/s   218.7 MBit/s    +25.0%
   1518 byte   150.0 MBit/s   181.2 MBit/s    +20.8%

The throughput for the largest frame size is lower than for the
previous size because the IP packets need to be fragmented in this
case to make their way through the IPsec tunnel.

Signed-off-by: Mathias Krause 
Cc: Maxim Locktyukhin 
---
 arch/x86/crypto/Makefile  |8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |3 +
 crypto/Kconfig|   10 +
 5 files changed, 819 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index c04f1b7..57c7f7b 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
 
 obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o
+obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 
 aes-i586-y := aes-i586-asm_32.o aes_glue.o
 twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
@@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
+
+# enable AVX support only when $(AS) can actually assemble the instructions
+ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
+AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT
+CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
+endif
+sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
new file mode 100644
index 000..b2c2f57
--- /dev/null
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -0,0 +1,558 @@
+/*
+ * This is a SIMD SHA-1 implementation. It requires the Intel(R) Supplemental
+ * SSE3 instruction set extensions introduced in Intel Core Microarchitecture
+ * processors. CPUs supporting Intel(R) AVX extensions will get an additional
+ * boost.
+ *
+ * This work was inspired by the vectorized implementation of Dean Gaudet.
+ * Additional information on it can be found at:
+ *http://www.arctic.org/~dean/crypto/sha1.html
+ *
+ * It was improved upon with more efficient vectorization of the message
+ * scheduling. This implementation has also been optimized for all current and
+ * several future generations of Intel CPUs.
+ *
+ * See this article for more information about the implementation details:
+ *   
http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1/
+ *
+ * Copyright (C) 2010, Intel Corp.
+ *   Authors: Maxim Locktyukhin 
+ *Ronen Zohar 
+ *
+ * Converted to AT&T syntax and adapted for inclusion in the Linux kernel:
+ *   Author: Mathias Krause 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as publishe

[PATCH v2 1/2] crypto, sha1: export sha1_update for reuse

2011-07-24 Thread Mathias Krause
Export the update function as crypto_sha1_update() so there is no need
to reimplement the same algorithm for each SHA-1 implementation. This
way the generic SHA-1 implementation can be used as a fallback for other
implementations that fail to run under certain circumstances, like the
need for an FPU context while executing in IRQ context.

Signed-off-by: Mathias Krause 
---
 crypto/sha1_generic.c |9 +
 include/crypto/sha.h  |5 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index 0416091..0b6d907 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -36,7 +36,7 @@ static int sha1_init(struct shash_desc *desc)
return 0;
 }
 
-static int sha1_update(struct shash_desc *desc, const u8 *data,
+int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
 {
struct sha1_state *sctx = shash_desc_ctx(desc);
@@ -70,6 +70,7 @@ static int sha1_update(struct shash_desc *desc, const u8 
*data,
 
return 0;
 }
+EXPORT_SYMBOL(crypto_sha1_update);
 
 
 /* Add padding and return the message digest. */
@@ -86,10 +87,10 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
/* Pad out to 56 mod 64 */
index = sctx->count & 0x3f;
padlen = (index < 56) ? (56 - index) : ((64+56) - index);
-   sha1_update(desc, padding, padlen);
+   crypto_sha1_update(desc, padding, padlen);
 
/* Append length */
-   sha1_update(desc, (const u8 *)&bits, sizeof(bits));
+   crypto_sha1_update(desc, (const u8 *)&bits, sizeof(bits));
 
/* Store state in digest */
for (i = 0; i < 5; i++)
@@ -120,7 +121,7 @@ static int sha1_import(struct shash_desc *desc, const void 
*in)
 static struct shash_alg alg = {
.digestsize =   SHA1_DIGEST_SIZE,
.init   =   sha1_init,
-   .update =   sha1_update,
+   .update =   crypto_sha1_update,
.final  =   sha1_final,
.export =   sha1_export,
.import =   sha1_import,
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 069e85b..7c46d0c 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -82,4 +82,9 @@ struct sha512_state {
u8 buf[SHA512_BLOCK_SIZE];
 };
 
+#if defined(CONFIG_CRYPTO_SHA1) || defined (CONFIG_CRYPTO_SHA1_MODULE)
+extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len);
+#endif
+
 #endif
-- 
1.5.6.5



[PATCH v2 0/2] crypto, x86: assembler implementation of SHA1

2011-07-24 Thread Mathias Krause
This patch series adds an assembler implementation of the SHA1 hash algorithm
for the x86-64 architecture. Its raw hash performance can be more than 2 times
faster than the generic C implementation. This gives a real-world benefit for
IPsec with a throughput increase of up to +35%. For concrete numbers have a
look at the second patch.

This implementation is currently x86-64 only but might be ported to 32 bit with
some effort in a follow-up patch. (I haven't had time to do this yet.)

Note: The SSSE3 is no typo, it's "Supplemental SSE3".

v2 changes:
- fixed typo in Makefile making AVX version unusable
- whitespace fixes for the .S file

Regards,
Mathias

Mathias Krause (2):
  crypto, sha1: export sha1_update for reuse
  crypto, x86: SSSE3 based SHA1 implementation for x86-64

 arch/x86/crypto/Makefile  |8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |3 +
 crypto/Kconfig|   10 +
 crypto/sha1_generic.c |9 +-
 include/crypto/sha.h  |5 +
 7 files changed, 829 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c



Re: [PATCH 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-07-24 Thread Mathias Krause
On Sat, Jul 16, 2011 at 2:44 PM, Mathias Krause  wrote:
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index c04f1b7..a80be92 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
>  obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
>
>  obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o
> +obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
>
>  aes-i586-y := aes-i586-asm_32.o aes_glue.o
>  twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
> @@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
>  aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
>
>  ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
> +
> +# enable AVX support only when $(AS) can actually assemble the instructions
> +ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
> +AFLAGS_sha1_ssse3.o += -DSHA1_ENABLE_AVX_SUPPORT

This should have been

AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT

instead. Sorry, a missing adjustment for a "last-minute file rename".
I'll post a new version of the series with a wider target audience
since there has been no reply so far for a week.

> +CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
> +endif
> +sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
> diff --git a/arch/x86/crypto/sha1_ssse3_asm.S 
> b/arch/x86/crypto/sha1_ssse3_asm.S
> new file mode 100644
> index 000..8fb0ba6


Thanks,
Mathias


[PATCH 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-07-16 Thread Mathias Krause
This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced Vector Extensions (AVX).

Testing with the tcrypt module shows the raw hash performance is up to
2.3 times faster than the C implementation, using 8k data blocks on a
Core 2 Duo T5500. For the smallest data set (16 bytes) it is still 25%
faster.

Since this implementation uses SSE/YMM registers it cannot safely be
used in every situation, e.g. while an IRQ interrupts a kernel thread.
The implementation falls back to the generic SHA1 variant, if using
the SSE/YMM registers is not possible.

With this algorithm I was able to increase the throughput of a single
IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
the SSSE3 variant -- a speedup of +34.8%.

Saving and restoring SSE/YMM state might make the actual throughput
fluctuate when there are FPU intensive userland applications running.
For example, measuring the performance using iperf2 directly on the
machine under test gives wobbling numbers because iperf2 uses the FPU
for each packet to check if the reporting interval has expired (in the
above test I got min/max/avg: 402/484/464 MBit/s).

Using this algorithm on an IPsec gateway gives much more reasonable and
stable numbers, albeit not as high as in the directly connected case.
Here is the result from an RFC 2544 test run with an EXFO Packet Blazer
FTB-8510:

  frame size   sha1-generic    sha1-ssse3      delta
     64 byte    37.5 MBit/s    37.5 MBit/s      0.0%
    128 byte    56.3 MBit/s    62.5 MBit/s    +11.0%
    256 byte    87.5 MBit/s   100.0 MBit/s    +14.3%
    512 byte   131.3 MBit/s   150.0 MBit/s    +14.2%
   1024 byte   162.5 MBit/s   193.8 MBit/s    +19.3%
   1280 byte   175.0 MBit/s   212.5 MBit/s    +21.4%
   1420 byte   175.0 MBit/s   218.7 MBit/s    +25.0%
   1518 byte   150.0 MBit/s   181.2 MBit/s    +20.8%

The throughput for the largest frame size is lower than for the
previous size because the IP packets need to be fragmented in this
case to make their way through the IPsec tunnel.

Signed-off-by: Mathias Krause 
Cc: Maxim Locktyukhin 
---
 arch/x86/crypto/Makefile  |8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |3 +
 crypto/Kconfig|   10 +
 5 files changed, 819 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index c04f1b7..a80be92 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
 
 obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o
+obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 
 aes-i586-y := aes-i586-asm_32.o aes_glue.o
 twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
@@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
+
+# enable AVX support only when $(AS) can actually assemble the instructions
+ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
+AFLAGS_sha1_ssse3.o += -DSHA1_ENABLE_AVX_SUPPORT
+CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
+endif
+sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
new file mode 100644
index 000..8fb0ba6
--- /dev/null
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -0,0 +1,558 @@
+/*
+ * This is a SIMD SHA-1 implementation. It requires the Intel(R) Supplemental
+ * SSE3 instruction set extensions introduced in Intel Core Microarchitecture
+ * processors. CPUs supporting Intel(R) AVX extensions will get an additional
+ * boost.
+ *
+ * This work was inspired by the vectorized implementation of Dean Gaudet.
+ * Additional information on it can be found at:
+ *http://www.arctic.org/~dean/crypto/sha1.html
+ *
+ * It was improved upon with more efficient vectorization of the message
+ * scheduling. This implementation has also been optimized for all current and
+ * several future generations of Intel CPUs.
+ *
+ * See this article for more information about the implementation details:
+ *   
http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1/
+ *
+ * Copyright (C) 2010, Intel Corp.
+ *   Authors: Maxim Locktyukhin 
+ *Ronen Zohar 
+ *
+ * Converted to AT&T syntax and adapted for inclusion in the Linux kernel:
+ *   Author: Mathias Krause 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as publishe

[PATCH 1/2] crypto, sha1: export sha1_update for reuse

2011-07-16 Thread Mathias Krause
Export the update function as crypto_sha1_update() so there is no need
to reimplement the same algorithm for each SHA-1 implementation. This
way the generic SHA-1 implementation can be used as a fallback for other
implementations that fail to run under certain circumstances, like the
need for an FPU context while executing in IRQ context.

Signed-off-by: Mathias Krause 
---
 crypto/sha1_generic.c |9 +
 include/crypto/sha.h  |5 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index 0416091..0b6d907 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -36,7 +36,7 @@ static int sha1_init(struct shash_desc *desc)
return 0;
 }
 
-static int sha1_update(struct shash_desc *desc, const u8 *data,
+int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
 {
struct sha1_state *sctx = shash_desc_ctx(desc);
@@ -70,6 +70,7 @@ static int sha1_update(struct shash_desc *desc, const u8 
*data,
 
return 0;
 }
+EXPORT_SYMBOL(crypto_sha1_update);
 
 
 /* Add padding and return the message digest. */
@@ -86,10 +87,10 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
/* Pad out to 56 mod 64 */
index = sctx->count & 0x3f;
padlen = (index < 56) ? (56 - index) : ((64+56) - index);
-   sha1_update(desc, padding, padlen);
+   crypto_sha1_update(desc, padding, padlen);
 
/* Append length */
-   sha1_update(desc, (const u8 *)&bits, sizeof(bits));
+   crypto_sha1_update(desc, (const u8 *)&bits, sizeof(bits));
 
/* Store state in digest */
for (i = 0; i < 5; i++)
@@ -120,7 +121,7 @@ static int sha1_import(struct shash_desc *desc, const void 
*in)
 static struct shash_alg alg = {
.digestsize =   SHA1_DIGEST_SIZE,
.init   =   sha1_init,
-   .update =   sha1_update,
+   .update =   crypto_sha1_update,
.final  =   sha1_final,
.export =   sha1_export,
.import =   sha1_import,
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 069e85b..7c46d0c 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -82,4 +82,9 @@ struct sha512_state {
u8 buf[SHA512_BLOCK_SIZE];
 };
 
+#if defined(CONFIG_CRYPTO_SHA1) || defined (CONFIG_CRYPTO_SHA1_MODULE)
+extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len);
+#endif
+
 #endif
-- 
1.5.6.5



[PATCH 0/2] crypto, x86: assembler implementation of SHA1

2011-07-16 Thread Mathias Krause
This patch series adds an assembler implementation of the SHA1 hash algorithm
for the x86-64 architecture. Its raw hash performance can be more than 2 times
faster than the generic C implementation. This gives a real-world benefit for
IPsec with a throughput increase of up to +35%. For concrete numbers have a
look at the second patch.

This implementation is currently x86-64 only but might be ported to 32 bit with
some effort in a follow-up patch. (I haven't had time to do this yet.)

Note: The SSSE3 is no typo, it's "Supplemental SSE3".

Regards,
Mathias

Mathias Krause (2):
  crypto, sha1: export sha1_update for reuse
  crypto, x86: SSSE3 based SHA1 implementation for x86-64

 arch/x86/crypto/Makefile  |8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |3 +
 crypto/Kconfig|   10 +
 crypto/sha1_generic.c |9 +-
 include/crypto/sha.h  |5 +
 7 files changed, 829 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c



[PATCH] crypto, gf128: fix call to memset()

2011-07-07 Thread Mathias Krause
In gf128mul_lle() and gf128mul_bbe() r isn't completely initialized with
zero because the size argument passed to memset() is the size of the
pointer, not the structure it points to.
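
For illustration (not part of the patch), the difference between the two
sizes can be seen with a tiny userspace program; the struct below is just a
stand-in for the kernel's 16-byte be128:

    #include <stdio.h>

    /* stand-in for the kernel's be128: two 64-bit big-endian words */
    typedef struct { unsigned long long a, b; } be128;

    int main(void)
    {
            be128 x, *r = &x;

            /*
             * sizeof(r) is the pointer size (8 bytes on x86-64), so
             * memset(r, 0, sizeof(r)) only clears half of *r, while
             * sizeof(*r) is the full 16 bytes of the structure.
             */
            printf("sizeof(r) = %zu, sizeof(*r) = %zu\n",
                   sizeof(r), sizeof(*r));
            return 0;
    }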

Luckily there are no in-kernel users of those functions so the ABI
change implied by this fix should break no existing code.

Based on a patch by the PaX Team.

Signed-off-by: Mathias Krause 
Cc: PaX Team 
---
 crypto/gf128mul.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/crypto/gf128mul.c b/crypto/gf128mul.c
index df35e4c..5276607 100644
--- a/crypto/gf128mul.c
+++ b/crypto/gf128mul.c
@@ -182,7 +182,7 @@ void gf128mul_lle(be128 *r, const be128 *b)
for (i = 0; i < 7; ++i)
gf128mul_x_lle(&p[i + 1], &p[i]);
 
-   memset(r, 0, sizeof(r));
+   memset(r, 0, sizeof(*r));
for (i = 0;;) {
u8 ch = ((u8 *)b)[15 - i];
 
@@ -220,7 +220,7 @@ void gf128mul_bbe(be128 *r, const be128 *b)
for (i = 0; i < 7; ++i)
gf128mul_x_bbe(&p[i + 1], &p[i]);
 
-   memset(r, 0, sizeof(r));
+   memset(r, 0, sizeof(*r));
for (i = 0;;) {
u8 ch = ((u8 *)b)[i];
 
-- 
1.5.6.5



Re: linux-next: Tree for November 29 (aesni-intel)

2010-11-29 Thread Mathias Krause
On 29.11.2010, 21:37 Randy Dunlap wrote:
> On 11/29/10 12:21, Mathias Krause wrote:
>> On 29.11.2010, 21:11 Randy Dunlap wrote:
>>> On 11/29/10 12:02, Mathias Krause wrote:
>>>> On 29.11.2010, 20:54 Randy Dunlap wrote:
>>>>> On 11/29/10 11:45, Mathias Krause wrote:
>>>>>> On 29.11.2010, 20:31 Randy Dunlap wrote:
>>>>>>> On 11/29/10 11:21, Mathias Krause wrote:
>>>>>>>> On 29.11.2010, 19:54 Randy Dunlap wrote:
>>>>>>>>> On 11/29/10 10:26, Mathias Krause wrote:
>>>>>>>>>> On 29.11.2010, 17:31 Randy Dunlap wrote:
>>>>>>>>>>> On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> 
>>>>>>>>>>>> Changes since 20101126:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> on i386 builds, I get tons of these (and more) errors:
>>>>>>>>>>> 
>>>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name 
>>>>>>>>>>> `%r12'
>>>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name 
>>>>>>>>>>> `%r13'
>>>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name 
>>>>>>>>>>> `%r14'
>>>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name 
>>>>>>>>>>> `%rsp'
>>>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name 
>>>>>>>>>>> `%rsp'
>>>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name 
>>>>>>>>>>> `%rsp'
>>>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name 
>>>>>>>>>>> `%r9'
>>>>>>>>>>> 
>>>>>>>>>>> even though the kernel .config file says:
>>>>>>>>>>> 
>>>>>>>>>>> CONFIG_CRYPTO_AES=m
>>>>>>>>>>> CONFIG_CRYPTO_AES_586=m
>>>>>>>>>>> CONFIG_CRYPTO_AES_NI_INTEL=m
>>>>>>>>>>> 
>>>>>>>>>>> Should arch/x86/crypto/aesni-intel_asm.S be testing
>>>>>>>>>>> #ifdef CONFIG_X86_64
>>>>>>>>>>> instead of
>>>>>>>>>>> #ifdef __x86_64__
>>>>>>>>>>> or does that not matter?
>>>>>>>>>>> 
>>>>>>>>>>> or is this a toolchain issue?
>>>>>>>>>> 
>>>>>>>>>> Well, __x86_64__ should be a build-in define of the compiler while
>>>>>>>>>> CONFIG_X86_64 is defined for 64 bit builds in 
>>>>>>>>>> include/generated/autoconf.h.
>>>>>>>>>> So by using the latter we should be on the safe side but if your 
>>>>>>>>>> compiler
>>>>>>>>>> defines __x86_64__ for 32-bit builds it's simply broken. Also git 
>>>>>>>>>> grep
>>>>>>>>>> showed quite a few more places using __x86_64__ so those would 
>>>>>>>>>> miscompile on
>>>>>>>>>> your toolchain, too.
>>>>>>>>>> 
>>>>>>>>>> But it looks like linux-next is just missing
>>>>>>>>>> 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at
>>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git.
>>>>>>>>>> That should fix the build issue.
>>>>>>>>> 
>>>>>>>>> The build problem still happens when that patch is applied.
>>>>>>>> 
>>>>>>>> That's weird. So it must be something with your toolchain.
>>>>>>>> Can you please post the output of the following commands?:
>>>>>>>> 
>

Re: linux-next: Tree for November 29 (aesni-intel)

2010-11-29 Thread Mathias Krause
On 29.11.2010, 21:11 Randy Dunlap wrote:
> On 11/29/10 12:02, Mathias Krause wrote:
>> On 29.11.2010, 20:54 Randy Dunlap wrote:
>>> On 11/29/10 11:45, Mathias Krause wrote:
>>>> On 29.11.2010, 20:31 Randy Dunlap wrote:
>>>>> On 11/29/10 11:21, Mathias Krause wrote:
>>>>>> On 29.11.2010, 19:54 Randy Dunlap wrote:
>>>>>>> On 11/29/10 10:26, Mathias Krause wrote:
>>>>>>>> On 29.11.2010, 17:31 Randy Dunlap wrote:
>>>>>>>>> On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote:
>>>>>>>>> 
>>>>>>>>>> Hi all,
>>>>>>>>>> 
>>>>>>>>>> Changes since 20101126:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> on i386 builds, I get tons of these (and more) errors:
>>>>>>>>> 
>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12'
>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13'
>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14'
>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp'
>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp'
>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp'
>>>>>>>>> arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9'
>>>>>>>>> 
>>>>>>>>> even though the kernel .config file says:
>>>>>>>>> 
>>>>>>>>> CONFIG_CRYPTO_AES=m
>>>>>>>>> CONFIG_CRYPTO_AES_586=m
>>>>>>>>> CONFIG_CRYPTO_AES_NI_INTEL=m
>>>>>>>>> 
>>>>>>>>> Should arch/x86/crypto/aesni-intel_asm.S be testing
>>>>>>>>> #ifdef CONFIG_X86_64
>>>>>>>>> instead of
>>>>>>>>> #ifdef __x86_64__
>>>>>>>>> or does that not matter?
>>>>>>>>> 
>>>>>>>>> or is this a toolchain issue?
>>>>>>>> 
>>>>>>>> Well, __x86_64__ should be a build-in define of the compiler while
>>>>>>>> CONFIG_X86_64 is defined for 64 bit builds in 
>>>>>>>> include/generated/autoconf.h.
>>>>>>>> So by using the latter we should be on the safe side but if your 
>>>>>>>> compiler
>>>>>>>> defines __x86_64__ for 32-bit builds it's simply broken. Also git grep
>>>>>>>> showed quite a few more places using __x86_64__ so those would 
>>>>>>>> miscompile on
>>>>>>>> your toolchain, too.
>>>>>>>> 
>>>>>>>> But it looks like linux-next is just missing
>>>>>>>> 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at
>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git.
>>>>>>>> That should fix the build issue.
>>>>>>> 
>>>>>>> The build problem still happens when that patch is applied.
>>>>>> 
>>>>>> That's weird. So it must be something with your toolchain.
>>>>>> Can you please post the output of the following commands?:
>>>>>> 
>>>>>> $ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86'
>>>>> 
>>>>> #define __i386 1
>>>>> #define __i386__ 1
>>>>> #define i386 1
>>>>> #define __i586 1
>>>>> #define __i586__ 1
>>>>> 
>>>>>> $ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86'
>>>>> 
>>>>> #define __x86_64 1
>>>>> #define __x86_64__ 1
>>>>> 
>>>>> So that's not the problem... and the patch below didn't help.
>>>> 
>>>> That's odd. The output of the commands looks good so the x86-64 specific 
>>>> code
>>>> should be left out for 32-bit builds. :/
>>>> 
>>>>> Sorry that I even asked about that.  W

Re: linux-next: Tree for November 29 (aesni-intel)

2010-11-29 Thread Mathias Krause
On 29.11.2010, 20:54 Randy Dunlap wrote:
> On 11/29/10 11:45, Mathias Krause wrote:
>> On 29.11.2010, 20:31 Randy Dunlap wrote:
>>> On 11/29/10 11:21, Mathias Krause wrote:
>>>> On 29.11.2010, 19:54 Randy Dunlap wrote:
>>>>> On 11/29/10 10:26, Mathias Krause wrote:
>>>>>> On 29.11.2010, 17:31 Randy Dunlap wrote:
>>>>>>> On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> Changes since 20101126:
>>>>>>> 
>>>>>>> 
>>>>>>> on i386 builds, I get tons of these (and more) errors:
>>>>>>> 
>>>>>>> arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12'
>>>>>>> arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13'
>>>>>>> arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14'
>>>>>>> arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp'
>>>>>>> arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp'
>>>>>>> arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp'
>>>>>>> arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9'
>>>>>>> 
>>>>>>> even though the kernel .config file says:
>>>>>>> 
>>>>>>> CONFIG_CRYPTO_AES=m
>>>>>>> CONFIG_CRYPTO_AES_586=m
>>>>>>> CONFIG_CRYPTO_AES_NI_INTEL=m
>>>>>>> 
>>>>>>> Should arch/x86/crypto/aesni-intel_asm.S be testing
>>>>>>> #ifdef CONFIG_X86_64
>>>>>>> instead of
>>>>>>> #ifdef __x86_64__
>>>>>>> or does that not matter?
>>>>>>> 
>>>>>>> or is this a toolchain issue?
>>>>>> 
>>>>>> Well, __x86_64__ should be a build-in define of the compiler while
>>>>>> CONFIG_X86_64 is defined for 64 bit builds in 
>>>>>> include/generated/autoconf.h.
>>>>>> So by using the latter we should be on the safe side but if your compiler
>>>>>> defines __x86_64__ for 32-bit builds it's simply broken. Also git grep
>>>>>> showed quite a few more places using __x86_64__ so those would 
>>>>>> miscompile on
>>>>>> your toolchain, too.
>>>>>> 
>>>>>> But it looks like linux-next is just missing
>>>>>> 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git.
>>>>>> That should fix the build issue.
>>>>> 
>>>>> The build problem still happens when that patch is applied.
>>>> 
>>>> That's weird. So it must be something with your toolchain.
>>>> Can you please post the output of the following commands?:
>>>> 
>>>> $ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86'
>>> 
>>> #define __i386 1
>>> #define __i386__ 1
>>> #define i386 1
>>> #define __i586 1
>>> #define __i586__ 1
>>> 
>>>> $ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86'
>>> 
>>> #define __x86_64 1
>>> #define __x86_64__ 1
>>> 
>>> So that's not the problem... and the patch below didn't help.
>> 
>> That's odd. The output of the commands looks good so the x86-64 specific code
>> should be left out for 32-bit builds. :/
>> 
>>> Sorry that I even asked about that.  What next?
>> 
>> Can you please post the full error message. Meanwhile I'm checking out a
>> linux-next tree, trying to reproduce your problem.
>> 
> 
> I just built with "make V=1" to see the full commands that are used, but
> that didn't help me either:
> 
>  gcc -Wp,-MD,arch/x86/crypto/.aesni-intel_asm.o.d  -nostdinc -isystem 
> /usr/lib/gcc/x86_64-redhat-linux/4.4.1/include 
> -I/lnx/src/NEXT/linux-next-20101129/arch/x86/include -Iinclude  
> -I/lnx/src/NEXT/linux-next-20101129/include -include 
> include/generated/autoconf.h -D__KERNEL__ -D__ASSEMBLY__ -m32 
> -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DMODULE  -c -o 
> arch/x86/crypto/aesni-intel_asm.o 
> /lnx/src/NEXT/linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S
> 
> 
> There are 2945 lines like this:
> 
> linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S:841: Error: bad 
> register name `%r12'

Well, in my tree (linux-next + 559ad0ff) line 841 is a comment, whereas without
559ad0ff it's a 'push %r12'. So maybe you should apply the patch just once
more to be sure. ;)

> It's around 311 KB, so I'll just put it here instead of emailing it:
> http://oss.oracle.com/~rdunlap/doc/cry32.out
> 
> -- 
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code ***


