Re: ipsec impact on performance
On 12/02/2015 03:56 AM, David Laight wrote:
> From: Sowmini Varadhan
> Sent: 01 December 2015 18:37
> ...
>> I was using esp-null merely to not have the crypto itself perturb the
>> numbers (i.e., just focus on the s/w overhead for now), but here are
>> the numbers for the stock linux kernel stack:
>>
>>                  Gbps   peak cpu util
>> esp-null         1.8    71%
>> aes-gcm-c-256    1.6    79%
>> aes-ccm-a-128    0.7    96%
>>
>> That trend made me think that if we can get esp-null to be as close
>> as possible to GSO/GRO, the rest will follow closely behind.
>
> That's not how I read those figures. They imply to me that there is a
> massive cost for the actual encryption (particularly for aes-ccm-a-128)
> - so whatever you do to the esp-null case won't help.

To build on the whole "importance of normalizing throughput and CPU utilization in some way" theme, the following are some non-IPsec netperf TCP_STREAM runs between a pair of 2x Intel E5-2603 v3 systems using Broadcom BCM57810-based NICs, a 4.2.0-19 kernel, 7.10.72 firmware and bnx2x driver version 1.710.51-0:

root@htx-scale300-258:~# ./take_numbers.sh
Baseline
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.12.49.1 () port 0 AF_INET : +/-2.500% @ 99% conf.
: demo : cpu bind

                         Throughput Local  Local   Local    Remote Remote  Remote   Throughput Local      Remote
                                    CPU    Service Peak Per CPU    Service Peak Per Confidence CPU        CPU
                                    Util % Demand  CPU Util Util % Demand  CPU Util Width (%)  Confidence Confidence
                                                   %                       %                   Width (%)  Width (%)
Baseline:                9414.11    1.87   0.195   26.54    3.70   0.387   45.42    0.002      7.073      1.276
Disable TSO/GSO:         5651.25    8.36   1.454   100.00   2.46   0.428   30.35    1.093      1.101      4.889
Disable tx CKO:          5287.69    8.46   1.573   100.00   2.34   0.435   29.66    0.428      7.710      3.518
Disable remote LRO/GRO:  4148.76    8.32   1.971   99.97    5.95   1.409   71.98    3.656      0.735      3.491
Disable remote rx CKO:   4204.49    8.31   1.942   100.00   6.68   1.563   82.05    2.015      0.437      4.921

You can see that as the offloads are disabled, the service demands (usec of CPU time consumed system-wide per KB of data transferred) go up, and until one hits a bottleneck (e.g. one of the CPUs pegs at 100%), go up faster than the throughputs go down.

To aid in reproducibility, those tests were run with irqbalance disabled, all the IRQs for the NICs pointed at CPU 0, netperf/netserver bound to CPU 0, and the power management set to static high performance.

Assuming I've created a "matching" ipsec.conf, here is what I see with esp=null-null on the TCP_STREAM test - again, keeping all the binding in place etc.:

With esp=null-null:      3077.37    8.01   2.560   97.78    8.21   2.625   99.41    4.869      1.876      0.955

You can see that even with null-null, there is a rather large increase in service demand.

And this is what I see when I run netperf TCP_RR (the first run is without ipsec, the second is with; I didn't ask for confidence intervals this time around and I didn't try to tweak interrupt coalescing settings):

# HDR="-P 1";for i in 10.12.49.1 192.168.0.2; do ./netperf -H $i -t TCP_RR -c -C -l 30 -T 0 $HDR; HDR="-P 0"; done
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.12.49.1 () port 0 AF_INET : demo : first burst 0 : cpu bind
Local /Remote
Socket Size  Request Resp.  Elapsed Trans.
                                            CPU    CPU    S.dem  S.dem
Send  Recv   Size    Size   Time    Rate    local  remote local  remote
bytes bytes  bytes   bytes  secs.   per sec % S    % S    us/Tr  us/Tr

16384 87380  1       1      30.00   30419.75 1.72  1.68   6.783  6.617
16384 87380
16384 87380  1       1      30.00   20711.39 2.15  2.05   12.450 11.882
16384 87380

The service demand increases ~83% on the netperf side and almost 80% on the netserver side. That is pure "effective" path-length increase.

happy benchmarking,

rick jones

PS - the netperf commands were variations on this theme:

./netperf -P 0 -T 0 -H 10.12.49.1 -c -C -l 30 -i 30,3 -- -O throughput,local_cpu_util,local_sd,local_cpu_peak_util,remote_cpu_util,remote_sd,remote_cpu_peak_util,throughput_confid,local_cpu_confid,remote_cpu_confid

altering the IP address or test as appropriate. -P 0 disables printing the test banner/headers. -T 0 binds netperf and netserver to CPU 0 on their respective systems. -H sets the destination; -c and -C ask for local and remote CPU measurements respectively. -l 30 says each test iteration should be 30 seconds long, and -i 30,3 says to run at least three iterations and no more than 30 when trying to hit the confidence interval - by default, 99% confident that the reported average is within +/- 2.5% of the "actual" average. The -O stuff selects the specific values to be emitted.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
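As a cross-check on Rick's percentages, the service-demand deltas fall straight out of the TCP_RR numbers above. A small standalone sketch (the helper name is mine, not netperf's):

```c
/* Percent increase in service demand (usec of CPU consumed per
 * transaction) going from the plain run to the IPsec run. */
double sd_increase_pct(double base_sd, double ipsec_sd)
{
	return (ipsec_sd - base_sd) / base_sd * 100.0;
}
```

Plugging in the local S.dem values (6.783 vs 12.450) gives about 83.5%, and the remote values (6.617 vs 11.882) about 79.6%, matching the "~83% and almost 80%" figures in the message.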
Re: ipsec impact on performance
On Wed, 2015-12-02 at 16:12 -0500, Sowmini Varadhan wrote:
> IPv6 would be an interesting academic exercise

Really, you made my day !
Re: [PATCH] crypto: n2 - Use platform_register/unregister_drivers()
From: Thierry Reding
Date: Wed, 2 Dec 2015 17:16:36 +0100

> From: Thierry Reding
>
> These new helpers simplify implementing multi-driver modules and
> properly handle failure to register one driver by unregistering all
> previously registered drivers.
>
> Signed-off-by: Thierry Reding

Acked-by: David S. Miller
Re: ipsec impact on performance
On (12/02/15 14:01), Tom Herbert wrote:
> No, please don't persist in this myopic "we'll get to IPv6 later"
> model! IPv6 is a real protocol, it has significant deployment on the
> Internet, and there are now whole data centers that are IPv6 only
> (e.g. FB), and there are plenty of use cases of IPsec/IPv6 that could
> benefit from performance improvements just as much as IPv4. This vendor
> mentality that IPv6 is still not important simply doesn't help
> matters. :-(

Ok, I'll get you the numbers for this later, and sure, if we do this, we should solve the ipv6 problem too.

BTW, the ipv6 nov3 paths have severe alignment issues. I flagged this a long time ago: http://www.spinics.net/lists/netdev/msg336257.html

I think all of it is triggered by mld. Someone needs to do something about that too. I don't think those paths are using NET_IP_ALIGN very well, and I don't think this is the most wholesome thing for perf.

--Sowmini
Re: ipsec impact on performance
On Wed, Dec 2, 2015 at 1:47 PM, Sowmini Varadhan wrote:
> On (12/02/15 13:44), Tom Herbert wrote:
>> > IPv6 would be an interesting academic exercise, but it's going
>> > to be a while before we get RDS-TCP to go over IPv6.
>> >
>> Huh? Who said anything about RDS-TCP? I thought you were trying to
>> improve IPsec performance...
>
> yes, and it would be nice to find out that IPsec for IPv6 is
> fast, but I'm afraid there are a lot of IPv4 use cases out there that
> need the same thing for IPv4 too (first?).
>
No, please don't persist in this myopic "we'll get to IPv6 later" model! IPv6 is a real protocol, it has significant deployment on the Internet, and there are now whole data centers that are IPv6 only (e.g. FB), and there are plenty of use cases of IPsec/IPv6 that could benefit from performance improvements just as much as IPv4. This vendor mentality that IPv6 is still not important simply doesn't help matters. :-(

Tom
Re: ipsec impact on performance
On (12/02/15 13:44), Tom Herbert wrote:
> > IPv6 would be an interesting academic exercise, but it's going
> > to be a while before we get RDS-TCP to go over IPv6.
> >
> Huh? Who said anything about RDS-TCP? I thought you were trying to
> improve IPsec performance...

yes, and it would be nice to find out that IPsec for IPv6 is fast, but I'm afraid there are a lot of IPv4 use cases out there that need the same thing for IPv4 too (first?).

--Sowmini
Re: ipsec impact on performance
On Wed, Dec 2, 2015 at 1:12 PM, Sowmini Varadhan wrote:
> On (12/02/15 13:07), Tom Herbert wrote:
>> That's easy enough to add to flow dissector, but is SPI really
>> intended to be used as an L4 entropy value? We would need to consider the
>
> yes. To quote https://en.wikipedia.org/wiki/Security_Parameter_Index
> "This works like port numbers in TCP and UDP connections. What it means
> is that there could be different SAs used to provide security to one
> connection. An SA could therefore act as a set of rules."
>
>> effects of running multiple TCP connections over an IPsec SA. Also, you
>> might want to try IPv6, the flow label should provide a good L4 hash
>> for RPS/RFS, it would be interesting to see what the effects are with
>> IPsec processing. (ESP/UDP could also help if RSS/ECMP is critical)
>
> IPv6 would be an interesting academic exercise, but it's going
> to be a while before we get RDS-TCP to go over IPv6.
>
Huh? Who said anything about RDS-TCP? I thought you were trying to improve IPsec performance...
Re: ipsec impact on performance
On (12/02/15 13:07), Tom Herbert wrote:
> That's easy enough to add to flow dissector, but is SPI really
> intended to be used as an L4 entropy value? We would need to consider the

yes. To quote https://en.wikipedia.org/wiki/Security_Parameter_Index: "This works like port numbers in TCP and UDP connections. What it means is that there could be different SAs used to provide security to one connection. An SA could therefore act as a set of rules."

> effects of running multiple TCP connections over an IPsec SA. Also, you
> might want to try IPv6, the flow label should provide a good L4 hash
> for RPS/RFS, it would be interesting to see what the effects are with
> IPsec processing. (ESP/UDP could also help if RSS/ECMP is critical)

IPv6 would be an interesting academic exercise, but it's going to be a while before we get RDS-TCP to go over IPv6.

--Sowmini
Re: ipsec impact on performance
On Wed, Dec 2, 2015 at 12:50 PM, Sowmini Varadhan wrote:
> On (12/02/15 12:41), David Laight wrote:
>> You are getting 0.7 Gbps with aes-ccm-a-128, scale the esp-null back to
>> that and it would use 7/18*71 = 27% of the cpu.
>> So 69% of the cpu in the a-128 case is probably caused by the
>> encryption itself.
>> Even if the rest of the code cost nothing you'd not increase
>> above 1Gbps.
>
> Fortunately, the situation is not quite hopeless yet.
>
> Thanks to Rick Jones for supplying the hints for this, but with
> some careful manual pinning of irqs and iperf processes to cpus,
> I can get to 4.5 Gbps for the esp-null case.
>
> Given that [clear traffic + GSO without GRO] gets me about 5-7 Gbps,
> the 4.5 Gbps is not that far off (and at that point, the nickel-and-dime
> tweaks may help even more).
>
> For AES-GCM, I'm able to go from 1.8 Gbps (no GSO) to 2.8 Gbps.
> Still not great, but it proves that we haven't hit any upper bounds yet.
>
> I think a lot of the manual tweaking of irq/process placement
> is needed because the existing rps/rfs flow steering is looking
> for TCP/UDP flow numbers to do the steering. It can just as easily
> use the IPsec SPI numbers to do this, and that's another place where
> we can make this more ipsec-friendly.
>
That's easy enough to add to flow dissector, but is SPI really intended to be used as an L4 entropy value? We would need to consider the effects of running multiple TCP connections over an IPsec SA. Also, you might want to try IPv6, the flow label should provide a good L4 hash for RPS/RFS, it would be interesting to see what the effects are with IPsec processing. (ESP/UDP could also help if RSS/ECMP is critical)

Tom
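For concreteness, the kind of steering Sowmini proposes would hash the ESP SPI the way RPS/RFS hashes TCP/UDP ports today. A toy user-space sketch of the idea (the mixing function and all names here are illustrative, not the kernel's flow dissector):

```c
#include <stdint.h>

/* The ESP header begins with the 32-bit SPI, followed by the sequence
 * number (RFC 4303), so the SPI sits where the L4 ports would be. */
struct esp_hdr {
	uint32_t spi;
	uint32_t seq;
};

/* Illustrative 32-bit integer mixer (not the kernel's jhash). */
uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x45d9f3bu;
	h ^= h >> 16;
	return h;
}

/* Pick an RX queue/CPU from the addresses plus the SPI, the way RPS
 * would use a 4-tuple hash for TCP/UDP. Note one SA => one SPI => one
 * queue, which is Tom's entropy concern: multiple TCP connections
 * inside a single SA would all land on the same CPU. */
unsigned int esp_rx_queue(uint32_t saddr, uint32_t daddr,
			  uint32_t spi, unsigned int nqueues)
{
	return mix32(saddr ^ mix32(daddr ^ spi)) % nqueues;
}
```

The steering is deterministic per SA, so packets of one SA never reorder across queues; the trade-off is that entropy is limited to the number of SAs in use.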
Re: ipsec impact on performance
On (12/02/15 12:41), David Laight wrote:
> You are getting 0.7 Gbps with aes-ccm-a-128, scale the esp-null back to
> that and it would use 7/18*71 = 27% of the cpu.
> So 69% of the cpu in the a-128 case is probably caused by the
> encryption itself.
> Even if the rest of the code cost nothing you'd not increase
> above 1Gbps.

Fortunately, the situation is not quite hopeless yet.

Thanks to Rick Jones for supplying the hints for this: with some careful manual pinning of irqs and iperf processes to cpus, I can get to 4.5 Gbps for the esp-null case.

Given that [clear traffic + GSO without GRO] gets me about 5-7 Gbps, the 4.5 Gbps is not that far off (and at that point, the nickel-and-dime tweaks may help even more).

For AES-GCM, I'm able to go from 1.8 Gbps (no GSO) to 2.8 Gbps. Still not great, but it proves that we haven't hit any upper bounds yet.

I think a lot of the manual tweaking of irq/process placement is needed because the existing rps/rfs flow steering is looking for TCP/UDP flow numbers to do the steering. It can just as easily use the IPsec SPI numbers to do this, and that's another place where we can make this more ipsec-friendly.

--Sowmini
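David's scaling argument quoted above can be checked mechanically (the helper name is mine, and the assumption - CPU utilization scaling linearly with throughput - is his):

```c
/* CPU utilization a workload would need at a lower target rate,
 * assuming utilization scales linearly with throughput. */
double scaled_cpu(double gbps, double cpu_pct, double target_gbps)
{
	return target_gbps / gbps * cpu_pct;
}
```

With the esp-null figures (1.8 Gbps at 71% peak CPU) scaled down to the 0.7 Gbps of the aes-ccm-a-128 run, scaled_cpu(1.8, 71.0, 0.7) comes out at roughly 27.6%, so of the 96% the ccm run burns, roughly 69 points are attributable to the cipher itself - which is exactly David's point.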
[PATCH v4 3/5] crypto: AES CBC multi-buffer scheduler
This patch implements an in-order scheduler for encrypting multiple buffers in parallel, supporting AES CBC encryption with key sizes of 128, 192 and 256 bits. It uses 8 data lanes by taking advantage of the SIMD instructions with XMM registers.

The multibuffer manager and scheduler are mostly written in assembly and the initialization support is written in C. The AES CBC multibuffer crypto driver interfaces with the multibuffer manager and scheduler to support AES CBC encryption in parallel. The scheduler supports job submissions, job flushing and job retrievals after completion.

The basic flow of usage of the CBC multibuffer scheduler is as follows:

- The caller allocates an aes_cbc_mb_mgr_inorder_x8 object and initializes it once by calling aes_cbc_init_mb_mgr_inorder_x8().

- The aes_cbc_mb_mgr_inorder_x8 structure has an array of JOB_AES objects. Allocation and scheduling of JOB_AES objects are managed by the multibuffer scheduler support routines. The caller allocates a JOB_AES using aes_cbc_get_next_job_inorder_x8().

- The returned JOB_AES must be filled in with parameters for CBC encryption (e.g. plaintext buffer, ciphertext buffer, key, iv, etc.) and submitted to the manager object using aes_cbc_submit_job_inorder_xx().

- If the oldest JOB_AES is completed during a call to aes_cbc_submit_job_inorder_x8(), it is returned. Otherwise, NULL is returned.

- A call to aes_cbc_flush_job_inorder_x8() always returns the oldest job, unless the multibuffer manager is empty of jobs.

- A call to aes_cbc_get_completed_job_inorder_x8() returns a completed job. This routine is useful to process completed jobs instead of waiting for the flusher to engage.

- When a job is returned from submit or flush, the caller extracts the useful data and returns it to the multibuffer manager implicitly by the next call to aes_cbc_get_next_job_xx().
Jobs are always returned from submit or flush routines in the order they were submitted (hence "inorder"). A job allocated using aes_cbc_get_next_job_inorder_x8() must be filled in and submitted before another call. A job returned by aes_cbc_submit_job_inorder_x8() or aes_cbc_flush_job_inorder_x8() is 'deallocated' upon the next call to get a job structure. Calls to get_next_job() cannot fail. If all jobs are allocated after a call to get_next_job(), the subsequent call to submit always returns the oldest job in a completed state.

Originally-by: Chandramouli Narayanan
Signed-off-by: Tim Chen
---
 arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c       | 145 +++
 arch/x86/crypto/aes-cbc-mb/mb_mgr_inorder_x8_asm.S | 222 +++
 arch/x86/crypto/aes-cbc-mb/mb_mgr_ooo_x8_asm.S     | 416 +
 3 files changed, 783 insertions(+)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_inorder_x8_asm.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_ooo_x8_asm.S

diff --git a/arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c b/arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c
new file mode 100644
index 000..7a7f8a1
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c
@@ -0,0 +1,145 @@
+/*
+ * Initialization code for multi buffer AES CBC algorithm
+ *
+ *
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * Contact Information:
+ * James Guilford
+ * Sean Gulley
+ * Tim Chen
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE
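The in-order submit/flush contract described in the commit message can be modeled in a few lines of user-space C. This is a toy model of the semantics only - all names below are invented, and the real lane management lives in the assembly manager:

```c
#include <stddef.h>

#define LANES 8	/* the x8 scheduler runs 8 data lanes */

struct toy_job { int id; int done; };

/* FIFO of in-flight jobs, standing in for aes_cbc_mb_mgr_inorder_x8 */
struct toy_mgr {
	struct toy_job jobs[LANES];
	int head, tail, count;
};

void toy_init(struct toy_mgr *m)
{
	m->head = m->tail = m->count = 0;
}

/* Cannot fail: hands back the next job slot for the caller to fill in. */
struct toy_job *toy_get_next_job(struct toy_mgr *m)
{
	return &m->jobs[m->tail];
}

/* Returns NULL until the lanes fill up; once full, the oldest job
 * "completes" and is returned, in submission order. */
struct toy_job *toy_submit(struct toy_mgr *m)
{
	struct toy_job *oldest;

	m->tail = (m->tail + 1) % LANES;
	if (++m->count < LANES)
		return NULL;
	oldest = &m->jobs[m->head];
	oldest->done = 1;
	m->head = (m->head + 1) % LANES;
	m->count--;
	return oldest;
}

/* Always drains the oldest job, unless the manager is empty. */
struct toy_job *toy_flush(struct toy_mgr *m)
{
	struct toy_job *oldest;

	if (!m->count)
		return NULL;
	oldest = &m->jobs[m->head];
	oldest->done = 1;
	m->head = (m->head + 1) % LANES;
	m->count--;
	return oldest;
}
```

Submitting eight jobs returns NULL seven times and then job 0; a subsequent flush returns job 1 - the "always the oldest, in order" behavior the commit message describes.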
Re: [PATCH v3 5/5] crypto: AES CBC multi-buffer glue code
On Tue, 2015-12-01 at 09:19 -0800, Tim Chen wrote:
> On Thu, 2015-11-26 at 16:49 +0800, Herbert Xu wrote:
> > On Tue, Nov 24, 2015 at 10:30:06AM -0800, Tim Chen wrote:
> > >
> > > On the decrypt path, we don't need to use multi-buffer algorithm
> > > as aes-cbc decrypt can be parallelized inherently on a single
> > > request. So most of the time the outer layer algorithm
> > > cbc_mb_async_ablk_decrypt can bypass mcryptd and
> > > invoke mb_aes_cbc_decrypt synchronously
> > > to do aes_cbc_dec when fpu is available.
> > > This avoids the overhead of going through mcryptd. Hence
> > > the use of blkcipher on the inner layer. For the mcryptd
> > > path, we will complete a decrypt request in one shot so
> > > blkcipher usage should be fine.
> >
> > I think there is a misunderstanding here. Just because you're
> > using/exporting through the ablkcipher interface doesn't mean
> > that you are asynchronous. For example, all blkcipher algorithms
> > can be accessed through the ablkcipher interface and they of course
> > remain synchronous.
> >
> > So I don't see how using an ablkcipher in the inner layer changes
> > anything at all. You can still return immediately and not bother
> > with completion functions when you are synchronous.
> >
> > Cheers,
>
> OK, I'll try to see if I can cast things back to the original ablkcipher
> request and use that to walk the sg list.
>

Herbert,

I've sent out a new version of this series to use ablkcipher on the inner layer of decrypt.

Thanks.

Tim
[PATCH v4 4/5] crypto: AES CBC by8 encryption
This patch introduces the assembly routine to do a by8 AES CBC encryption in support of the AES CBC multi-buffer implementation. Encryption of 8 data streams of a given key size is done simultaneously.

Originally-by: Chandramouli Narayanan
Signed-off-by: Tim Chen
---
 arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S | 774
 1 file changed, 774 insertions(+)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S

diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S b/arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S
new file mode 100644
index 000..eaffc28
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S
@@ -0,0 +1,774 @@
+/*
+ * AES CBC by8 multibuffer optimization (x86_64)
+ * This file implements 128/192/256 bit AES CBC encryption
+ *
+ *
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * Contact Information:
+ * James Guilford
+ * Sean Gulley
+ * Tim Chen
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#include
+
+/* stack size needs to be an odd multiple of 8 for alignment */
+
+#define AES_KEYSIZE_128	16
+#define AES_KEYSIZE_192	24
+#define AES_KEYSIZE_256	32
+
+#define XMM_SAVE_SIZE	16*10
+#define GPR_SAVE_SIZE	8*9
+#define STACK_SIZE	(XMM_SAVE_SIZE + GPR_SAVE_SIZE)
+
+#define GPR_SAVE_REG	%rsp
+#define GPR_SAVE_AREA	%rsp + XMM_SAVE_SIZE
+#define LEN_AREA_OFFSET	XMM_SAVE_SIZE + 8*8
+#define LEN_AREA_REG	%rsp
+#define LEN_AREA	%rsp + XMM_SAVE_SIZE + 8*8
+
+#define IN_OFFSET	0
+#define OUT_OFFSET	8*8
+#define KEYS_OFFSET	16*8
+#define IV_OFFSET	24*8
+
+
+#define IDX	%rax
+#define TMP	%rbx
+#define ARG	%rdi
+#define LEN	%rsi
+
+#define KEYS0	%r14
+#define KEYS1	%r15
+#define KEYS2	%rbp
+#define KEYS3	%rdx
+#define KEYS4	%rcx
+#define KEYS5	%r8
+#define KEYS6	%r9
+#define KEYS7	%r10
+
+#define IN0	%r11
+#define IN2	%r12
+#define IN4	%r13
+#define IN6	LEN
+
+#define XDATA0	%xmm0
+#define XDATA1	%xmm1
+#define XDATA2	%xmm2
+#define XDATA3	%xmm3
+#define XDATA4	%xmm4
+#define XDATA5	%xmm5
+#define XDATA6	%xmm6
+#define XDATA7	%xmm7
+
+#define XKEY0_3	%xmm8
+#define XKEY1_4	%xmm9
+#define XKEY2_5	%xmm10
+#define XKEY3_6	%xmm11
+#define XKEY4_7	%xmm12
+#define XKEY5_8	%xmm13
+#define XKEY6_9	%xmm14
+#define XTMP	%xmm15
+
+#define MOVDQ	movdqu	/* assume buffers not aligned */
+#define CONCAT(a, b)	a##b
+#define INPUT_REG_SUFX	1	/* IN */
+#define XDATA_REG_SUFX	2	/* XDAT */
+#define KEY_REG_SUFX	3	/* KEY */
+#define XMM_REG_SUFX	4	/* XMM */
+
+/*
+ * To avoid positional parameter errors while compiling
+ * three registers need to be passed
+ */
+.text
+
+.macro pxor2 x, y, z
+	MOVDQ	(\x,\y), XTMP
+	pxor	XTMP, \z
+.endm
+
+.macro inreg n
[PATCH v4 5/5] crypto: AES CBC multi-buffer glue code
This patch introduces the multi-buffer job manager which is responsible for submitting scatter-gather buffers from several AES CBC jobs to the multi-buffer algorithm. The glue code interfaces with the underlying algorithm that handles 8 data streams of AES CBC encryption in parallel. AES key expansion and CBC decryption requests are performed in a manner similar to the existing AESNI Intel glue driver.

The outline of the algorithm for AES CBC encryption requests is sketched below:

Any driver requesting the crypto service will place an async crypto request on the workqueue. The multi-buffer crypto daemon will pull an AES CBC encryption request from the work queue and put each request in an empty data lane for multi-buffer crypto computation. When all the empty lanes are filled, computation will commence on the jobs in parallel, and the job with the shortest remaining buffer will get completed and be returned. To prevent a prolonged stall when no new jobs arrive, we will flush the workqueue of jobs after a maximum allowable delay has elapsed.

To accommodate the fragmented nature of scatter-gather, we will keep submitting the next scatter-gather buffer fragment for a job for multi-buffer computation until a job is completed and no more buffer fragments remain. At that time we will pull a new job to fill the now empty data slot. We check with the multibuffer scheduler to see if there are other completed jobs, to prevent extraneous delay in returning any completed jobs.

This multi-buffer algorithm should be used for cases where we get at least 8 streams of crypto jobs submitted at a reasonably high rate. For a low crypto job submission rate and a low number of data streams, this algorithm will not be beneficial: at a low rate, we flush jobs before the data lanes are filled, rather than processing them with all the data lanes full. We then miss the benefit of parallel computation while adding delay to the processing of the crypto jobs.
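The flush policy described above amounts to a simple decision rule. A hedged sketch (the interval value and all names here are illustrative, not the driver's actual parameters):

```c
/* Illustrative maximum allowable delay, not the driver's value. */
#define FLUSH_INTERVAL_NS 500000ULL

/* Decide whether the daemon should flush partially filled lanes:
 * only when jobs are pending, the lanes did not fill (a full set is
 * processed by the normal parallel path), and no new job has arrived
 * within the allowable delay. */
int should_flush(unsigned long long now_ns,
		 unsigned long long last_submit_ns,
		 int lanes_used, int total_lanes)
{
	if (lanes_used == 0)
		return 0;	/* nothing pending */
	if (lanes_used == total_lanes)
		return 0;	/* lanes full: parallel computation runs */
	return now_ns - last_submit_ns >= FLUSH_INTERVAL_NS;
}
```

The interval is the latency/throughput knob: a shorter one bounds the stall for early jobs, a longer one gives the lanes more time to fill and keeps the parallel path efficient.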
Some tuning of the maximum latency parameter may be needed to get the best performance.

Originally-by: Chandramouli Narayanan
Signed-off-by: Tim Chen
---
 arch/x86/crypto/Makefile                |   1 +
 arch/x86/crypto/aes-cbc-mb/Makefile     |  22 +
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c | 835
 3 files changed, 858 insertions(+)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/Makefile
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index b9b912a..000db49 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_CRYPTO_CRC32_PCLMUL) += crc32-pclmul.o
 obj-$(CONFIG_CRYPTO_SHA256_SSSE3) += sha256-ssse3.o
 obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o
 obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o
+obj-$(CONFIG_CRYPTO_AES_CBC_MB) += aes-cbc-mb/
 obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o

 # These modules require assembler to support AVX.
diff --git a/arch/x86/crypto/aes-cbc-mb/Makefile b/arch/x86/crypto/aes-cbc-mb/Makefile
new file mode 100644
index 000..b642bd8
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/Makefile
@@ -0,0 +1,22 @@
+#
+# Arch-specific CryptoAPI modules.
+#
+
+avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
+
+# we need decryption and key expansion routine symbols
+# if either AESNI_NI_INTEL or AES_CBC_MB is a module
+
+ifeq ($(CONFIG_CRYPTO_AES_NI_INTEL),m)
+	dec_support := ../aesni-intel_asm.o
+endif
+ifeq ($(CONFIG_CRYPTO_AES_CBC_MB),m)
+	dec_support := ../aesni-intel_asm.o
+endif
+
+ifeq ($(avx_supported),yes)
+	obj-$(CONFIG_CRYPTO_AES_CBC_MB) += aes-cbc-mb.o
+	aes-cbc-mb-y := $(dec_support) aes_cbc_mb.o aes_mb_mgr_init.o \
+			mb_mgr_inorder_x8_asm.o mb_mgr_ooo_x8_asm.o \
+			aes_cbc_enc_x8.o
+endif

diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c
new file mode 100644
index 000..4d16a5d
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c
@@ -0,0 +1,835 @@
+/*
+ * Multi buffer AES CBC algorithm glue code
+ *
+ *
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * Contact Information:
+ * James Guilford
+ * Sean Gulley
+ * Tim Chen
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME "
[PATCH v4 2/5] crypto: AES CBC multi-buffer data structures
This patch introduces the data structures and prototypes of functions needed for doing AES CBC encryption using multi-buffer. Included are the structures of the multi-buffer AES CBC job, job scheduler in C and data structure defines in x86 assembly code.

Originally-by: Chandramouli Narayanan
Signed-off-by: Tim Chen
---
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h    |  96 +
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h    | 131
 arch/x86/crypto/aes-cbc-mb/mb_mgr_datastruct.S | 270 +
 arch/x86/crypto/aes-cbc-mb/reg_sizes.S         | 125
 4 files changed, 622 insertions(+)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_datastruct.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/reg_sizes.S

diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h
new file mode 100644
index 000..5493f83
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h
@@ -0,0 +1,96 @@
+/*
+ * Header file for multi buffer AES CBC algorithm manager
+ * that deals with 8 buffers at a time
+ *
+ *
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * Contact Information:
+ * James Guilford
+ * Sean Gulley
+ * Tim Chen
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#ifndef __AES_CBC_MB_CTX_H
+#define __AES_CBC_MB_CTX_H
+
+
+#include
+
+#include "aes_cbc_mb_mgr.h"
+
+#define CBC_ENCRYPT		0x01
+#define CBC_DECRYPT		0x02
+#define CBC_START		0x04
+#define CBC_DONE		0x08
+
+#define CBC_CTX_STS_IDLE	0x00
+#define CBC_CTX_STS_PROCESSING	0x01
+#define CBC_CTX_STS_LAST	0x02
+#define CBC_CTX_STS_COMPLETE	0x04
+
+enum cbc_ctx_error {
+	CBC_CTX_ERROR_NONE = 0,
+	CBC_CTX_ERROR_INVALID_FLAGS = -1,
+	CBC_CTX_ERROR_ALREADY_PROCESSING = -2,
+	CBC_CTX_ERROR_ALREADY_COMPLETED = -3,
+};
+
+#define cbc_ctx_init(ctx, nbytes, op) \
+	do { \
+		(ctx)->flag = (op) | CBC_START; \
+		(ctx)->nbytes = nbytes; \
+	} while (0)
+
+/* AESNI routines to perform cbc decrypt and key expansion */
+
+asmlinkage void aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
+			      const u8 *in, unsigned int len, u8 *iv);
+asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			     unsigned int key_len);
+
+#endif /* __AES_CBC_MB_CTX_H */
diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h
new file mode 100644
index 000..0def82e
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h
@@ -0,0 +1,131 @@
+/*
+ * Header file for multi buffer AES CBC algorithm manager
+ *
[PATCH v4 0/5] crypto: x86 AES-CBC encryption with multibuffer
In this patch series, we introduce AES CBC encryption that is parallelized on x86_64 cpus with XMM registers. The multi-buffer technique encrypts 8 data streams in parallel with SIMD instructions. Decryption is handled as in the existing AESNI Intel CBC implementation, which can already parallelize decryption even for a single data stream.

Please see the multi-buffer whitepaper for details of the technique:
http://www.intel.com/content/www/us/en/communications/communications-ia-multi-buffer-paper.html

It is important that any driver use this algorithm only for scenarios where there are many data streams that can fill up the data lanes most of the time. It shouldn't be used when mostly a single data stream is expected. Otherwise we may incur extra delays when there are frequent gaps in the data lanes, causing us to wait for data to come in and fill the lanes before initiating encryption. We may also have to wait for flush operations to commence when no new data comes in after some wait time. However, we keep this extra delay to a minimum by opportunistically flushing the unfinished jobs if the crypto daemon is the only active task running on a cpu.

By using this technique, we saw a throughput increase of up to 5.7x under optimal conditions, when fully loaded encryption jobs fill up all the data lanes.

Change Log:

v4
1. Make the decrypt path also use the ablkcipher walk.

v3
1. Use the ablkcipher_walk helpers to walk the scatter gather list, eliminating the need to modify blkcipher_walk for multi-buffer ciphers.

v2
1. Update the cpu feature check to make sure SSE is supported.
2.
Fix up unloading of the aes-cbc-mb module to properly free memory.

Tim Chen (5):
  crypto: Multi-buffer encryption infrastructure support
  crypto: AES CBC multi-buffer data structures
  crypto: AES CBC multi-buffer scheduler
  crypto: AES CBC by8 encryption
  crypto: AES CBC multi-buffer glue code

 arch/x86/crypto/Makefile                           |   1 +
 arch/x86/crypto/aes-cbc-mb/Makefile                |  22 +
 arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S        | 774 +++
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c            | 835 +
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h        |  96 +++
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h        | 131
 arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c       | 145
 arch/x86/crypto/aes-cbc-mb/mb_mgr_datastruct.S     | 270 +++
 arch/x86/crypto/aes-cbc-mb/mb_mgr_inorder_x8_asm.S | 222 ++
 arch/x86/crypto/aes-cbc-mb/mb_mgr_ooo_x8_asm.S     | 416 ++
 arch/x86/crypto/aes-cbc-mb/reg_sizes.S             | 125 +++
 crypto/Kconfig                                     |  16 +
 crypto/mcryptd.c                                   | 256 ++-
 include/crypto/algapi.h                            |   1 +
 include/crypto/mcryptd.h                           |  36 +
 15 files changed, 3345 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/Makefile
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_datastruct.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_inorder_x8_asm.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_ooo_x8_asm.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/reg_sizes.S
--
1.7.11.7
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 1/5] crypto: Multi-buffer encryption infrastructure support
In this patch, the infrastructure needed to support the multi-buffer encryption implementation is added:

a) Enhance the mcryptd daemon to support blkcipher requests.
b) Update the configuration to include multi-buffer encryption build support.

For an introduction to the multi-buffer implementation, please see
http://www.intel.com/content/www/us/en/communications/communications-ia-multi-buffer-paper.html

Originally-by: Chandramouli Narayanan
Signed-off-by: Tim Chen
---
 crypto/Kconfig           |  16 +++
 crypto/mcryptd.c         | 256 ++-
 include/crypto/algapi.h  |   1 +
 include/crypto/mcryptd.h |  36 +++
 4 files changed, 308 insertions(+), 1 deletion(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 7240821..6b51084 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -888,6 +888,22 @@ config CRYPTO_AES_NI_INTEL
	  ECB, CBC, LRW, PCBC, XTS. The 64 bit version has additional
	  acceleration for CTR.
+config CRYPTO_AES_CBC_MB
+	tristate "AES CBC algorithm (x86_64 Multi-Buffer, Experimental)"
+	depends on X86 && 64BIT
+	select CRYPTO_ABLK_HELPER
+	select CRYPTO_MCRYPTD
+	help
+	  AES CBC encryption implemented using multi-buffer technique.
+	  This algorithm computes on multiple data lanes concurrently with
+	  SIMD instructions for better throughput. It should only be
+	  used when there is significant work to generate many separate
+	  crypto requests that keep all the data lanes filled to get
+	  the performance benefit. If the data lanes are unfilled, a
+	  flush operation will be initiated after some delay to process
+	  the existing crypto jobs, adding some extra latency at low
+	  load case.
+ config CRYPTO_AES_SPARC64 tristate "AES cipher algorithms (SPARC64)" depends on SPARC64 diff --git a/crypto/mcryptd.c b/crypto/mcryptd.c index fe5b495a..01f747c 100644 --- a/crypto/mcryptd.c +++ b/crypto/mcryptd.c @@ -116,8 +116,28 @@ static int mcryptd_enqueue_request(struct mcryptd_queue *queue, return err; } +static int mcryptd_enqueue_blkcipher_request(struct mcryptd_queue *queue, + struct crypto_async_request *request, + struct mcryptd_blkcipher_request_ctx *rctx) +{ + int cpu, err; + struct mcryptd_cpu_queue *cpu_queue; + + cpu = get_cpu(); + cpu_queue = this_cpu_ptr(queue->cpu_queue); + rctx->tag.cpu = cpu; + + err = crypto_enqueue_request(&cpu_queue->queue, request); + pr_debug("enqueue request: cpu %d cpu_queue %p request %p\n", +cpu, cpu_queue, request); + queue_work_on(cpu, kcrypto_wq, &cpu_queue->work); + put_cpu(); + + return err; +} + /* - * Try to opportunisticlly flush the partially completed jobs if + * Try to opportunistically flush the partially completed jobs if * crypto daemon is the only task running. 
*/ static void mcryptd_opportunistic_flush(void) @@ -225,6 +245,130 @@ static inline struct mcryptd_queue *mcryptd_get_queue(struct crypto_tfm *tfm) return ictx->queue; } +static int mcryptd_blkcipher_setkey(struct crypto_ablkcipher *parent, + const u8 *key, unsigned int keylen) +{ + struct mcryptd_blkcipher_ctx *ctx = crypto_ablkcipher_ctx(parent); + struct crypto_blkcipher *child = ctx->child; + int err; + + crypto_blkcipher_clear_flags(child, CRYPTO_TFM_REQ_MASK); + crypto_blkcipher_set_flags(child, crypto_ablkcipher_get_flags(parent) & + CRYPTO_TFM_REQ_MASK); + err = crypto_blkcipher_setkey(child, key, keylen); + crypto_ablkcipher_set_flags(parent, crypto_blkcipher_get_flags(child) & + CRYPTO_TFM_RES_MASK); + return err; +} + +static void mcryptd_blkcipher_crypt(struct ablkcipher_request *req, + struct crypto_blkcipher *child, + int err, + int (*crypt)(struct blkcipher_desc *desc, + struct scatterlist *dst, + struct scatterlist *src, + unsigned int len)) +{ + struct mcryptd_blkcipher_request_ctx *rctx; + struct blkcipher_desc desc; + + rctx = ablkcipher_request_ctx(req); + + if (unlikely(err == -EINPROGRESS)) + goto out; + + /* set up the blkcipher request to work on */ + desc.tfm = child; + desc.info = req->info; + desc.flags = CRYPTO_TFM_REQ_MAY_SLEEP; + rctx->desc = desc; + + /* +* pass addr of descriptor stored in the request context +* so that the callee can get to the request context +*/ + err = crypt(&rctx->desc, req->dst, req->src, req->nbytes); + if
Re: [RFC] KEYS: Exposing {a,}symmetric key ops to userspace and other bits
On Sun, 2015-11-22 at 09:41 -0500, Mimi Zohar wrote:
> On Fri, 2015-11-20 at 11:07 +, David Howells wrote:
> >
> > (*) Add Mimi's patches to allow keys/keyrings to be marked undeletable. This
> >     is for the purpose of creating blacklists and to prevent people from
> >     removing entries in the blacklist. Note that only the kernel can create
> >     a blacklist - we don't want userspace generating them as a way to take up
> >     kernel space.
> >
> >     I think the right way to do this is to not allow marked keys to be
> >     unlinked from marked keyrings, but to allow marked keys to be unlinked
> >     from ordinary keyrings.
> >
> >     The reason the 'keep' mark is required on individual keys is to prevent
> >     the keys from being directly revoked, expired or invalidated by keyctl
> >     without reference to the keyring. Marked keys that are set expirable
> >     when they're created will still expire and be subsequently removed, and if
> >     a marked key or marked keyring loses all its references it still gets
> >     gc'd.
>
> Agreed. I'll fix and re-post soon.

In addition to Petko's 3 patches, the ima-keyrings branch
(git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git)
contains these two patches:

d939a88 IMA: prevent keys on the .ima_blacklist from being removed
77f33b5 KEYS: prevent keys from being removed from specified keyrings

As the IMA patch is dependent on the KEYS patch, do you mind if the KEYS patch would be upstreamed together with this patch set?

Mimi
[PATCH] crypto: n2 - Use platform_register/unregister_drivers()
From: Thierry Reding

These new helpers simplify implementing multi-driver modules and properly handle failure to register one driver by unregistering all previously registered drivers.

Signed-off-by: Thierry Reding
---
 drivers/crypto/n2_core.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/n2_core.c b/drivers/crypto/n2_core.c
index 5450880abb7b..739a786b9f08 100644
--- a/drivers/crypto/n2_core.c
+++ b/drivers/crypto/n2_core.c
@@ -2243,22 +2243,19 @@ static struct platform_driver n2_mau_driver = {
 	.remove		= n2_mau_remove,
 };

+static struct platform_driver * const drivers[] = {
+	&n2_crypto_driver,
+	&n2_mau_driver,
+};
+
 static int __init n2_init(void)
 {
-	int err = platform_driver_register(&n2_crypto_driver);
-
-	if (!err) {
-		err = platform_driver_register(&n2_mau_driver);
-		if (err)
-			platform_driver_unregister(&n2_crypto_driver);
-	}
-	return err;
+	return platform_register_drivers(drivers, ARRAY_SIZE(drivers));
 }

 static void __exit n2_exit(void)
 {
-	platform_driver_unregister(&n2_mau_driver);
-	platform_driver_unregister(&n2_crypto_driver);
+	platform_unregister_drivers(drivers, ARRAY_SIZE(drivers));
 }

 module_init(n2_init);
--
2.5.0
Re: ipsec impact on performance
On (12/02/15 12:41), David Laight wrote:
> Also what/how are you measuring cpu use.
> I'm not sure anything on Linux gives you a truly accurate value
> when processes are running for very short periods.

I was using mpstat while running iperf. Should I be using something else, or running it for longer intervals? But I hope we are not doomed at 1 Gbps, or else security itself would come at a very unattractive cost.

Anyway, even aside from crypto, we need to have some way to add TCP options (that depend on the contents of the tcp header) etc. post-GSO, in the interest of not ossifying the stack.

> On an SMP system you also get big effects when work is switched
> between cpus. I've got some tests that run a lot faster if I
> put all but one of the cpus into a busy-loop in userspace
> (eg: while :; do :; done)!

Yes, Rick Jones also pointed out the same thing to me, and one of the things I was going to try out later today is to instrument the effects of pinning irqs and iperf threads to a specific cpu.

--Sowmini
RE: ipsec impact on performance
From: Sowmini Varadhan
> Sent: 01 December 2015 18:37
...
> I was using esp-null merely to not have the crypto itself perturb
> the numbers (i.e., just focus on the s/w overhead for now), but here
> are the numbers for the stock linux kernel stack
>
>                Gbps  peak cpu util
> esp-null       1.8   71%
> aes-gcm-c-256  1.6   79%
> aes-ccm-a-128  0.7   96%
>
> That trend made me think that if we can get esp-null to be as close
> as possible to GSO/GRO, the rest will follow closely behind.

That's not how I read those figures.
They imply to me that there is a massive cost for the actual encryption (particularly for aes-ccm-a-128) - so whatever you do to the esp-null case won't help.

One way to get a view of the cost of the encryption (and copies) is to do the operation twice.

	David
RE: ipsec impact on performance
From: Sowmini Varadhan
> Sent: 02 December 2015 12:12
> On (12/02/15 11:56), David Laight wrote:
> > >                Gbps  peak cpu util
> > > esp-null       1.8   71%
> > > aes-gcm-c-256  1.6   79%
> > > aes-ccm-a-128  0.7   96%
> > >
> > > That trend made me think that if we can get esp-null to be as close
> > > as possible to GSO/GRO, the rest will follow closely behind.
> >
> > That's not how I read those figures.
> > They imply to me that there is a massive cost for the actual encryption
> > (particularly for aes-ccm-a-128) - so whatever you do to the esp-null
> > case won't help.
>
> I'm not a crypto expert, but my understanding is that the CCM mode
> is the "older" encryption algorithm, and GCM is the way of the future.
> Plus, I think the GCM mode has some type of h/w support (hence the
> lower cpu util)
>
> I'm sure that crypto has a cost, not disputing that, but my point
> was that 1.8 -> 1.6 -> 0.7 is a curve with a much gentler slope than
>    9 Gbps (clear traffic, GSO, GRO)
>    -> 4 Gbps (clear, no gro, gso)
>    -> 1.8 Gbps (esp-null)
> That steeper slope smells of s/w perf that we need to resolve first,
> before getting into the work of faster crypto?

That isn't the way cpu cost works.
You are getting 0.7 Gbps with aes-ccm-a-128; scale the esp-null cpu use back to that rate and it would use 7/18*71 = 27% of the cpu. So 69% of the cpu in the a-128 case is probably caused by the encryption itself. Even if the rest of the code cost nothing you'd not increase above 1 Gbps. The sums for aes-gcm-c-256 are slightly better, about 15%.

Ok, things aren't quite that simple, since you are probably changing the way data flows through the system as well.

Also what/how are you measuring cpu use. I'm not sure anything on Linux gives you a truly accurate value when processes are running for very short periods.

On an SMP system you also get big effects when work is switched between cpus. I've got some tests that run a lot faster if I put all but one of the cpus into a busy-loop in userspace (eg: while :; do :; done)!
	David
Re: ipsec impact on performance
On (12/02/15 11:56), David Laight wrote:
> >                Gbps  peak cpu util
> > esp-null       1.8   71%
> > aes-gcm-c-256  1.6   79%
> > aes-ccm-a-128  0.7   96%
> >
> > That trend made me think that if we can get esp-null to be as close
> > as possible to GSO/GRO, the rest will follow closely behind.
>
> That's not how I read those figures.
> They imply to me that there is a massive cost for the actual encryption
> (particularly for aes-ccm-a-128) - so whatever you do to the esp-null
> case won't help.

I'm not a crypto expert, but my understanding is that the CCM mode is the "older" encryption algorithm, and GCM is the way of the future. Plus, I think the GCM mode has some type of h/w support (hence the lower cpu util).

I'm sure that crypto has a cost, not disputing that, but my point was that 1.8 -> 1.6 -> 0.7 is a curve with a much gentler slope than
   9 Gbps (clear traffic, GSO, GRO)
   -> 4 Gbps (clear, no gro, gso)
   -> 1.8 Gbps (esp-null)
That steeper slope smells of s/w perf that we need to resolve first, before getting into the work of faster crypto?

> One way to get a view of the cost of the encryption (and copies)
> is to do the operation twice.

I could also just instrument it with perf tracepoints, if that data is interesting.

--Sowmini
Re: ipsec impact on performance
On (12/02/15 07:53), Steffen Klassert wrote: > > I'm currently working on a GRO/GSO codepath for IPsec too. The GRO part > works already. I decapsulate/decrypt the packets on layer2 with a esp GRO > callback function and reinject them into napi_gro_receive(). So in case > the decapsulated packet is TCP, GRO can aggregate big packets. Would you be able to share your patch with me? I'd like to give that a try just to get preliminary numbers (and I could massage it as needed for transport mode too). > My approach to GSO is a bit different to yours. I focused on tunnel mode, > but transport mode should work too. I encapsulate the big GSO packets > but don't do the encryption. Then I've added a esp_gso_segment() function, > so the (still not encrypted ESP packets) get segmented with GSO. Finally I > do encryption for all segments. This works well as long as I do sync crypto. > The hard part is when crypto returns async. This is what I'm working on now. > I hope to get this ready during the next weeks that I can post a RFC version > and some numbers. I see. My thought for attacking tunnel mode would have been to callout the esp code at the tail of gre_gso_segment, but I did not yet consider this carefully - clearly you've spent more time on it, and know more about all the gotchas there. > Also I tried to consider the IPsec GRO/GSO codepath as a software fallback. > So I added hooks for the encapsulation, encryption etc. If a NIC can do > IPsec, it can use this hooks to prepare the packets the way it needs it. > There are NICs that can do IPsec, it's just that our stack does not support > it. yes, this is one of the things I wanted to bring up at netdev 1.1. Evidently many of the 10G NICS (Niantic, Twinville, Sageville) already support ipsec offload but that feature is not enabled for BSD or linux because the stack does not support it (though Microsoft does. 
The Intel folks pointed me at this doc:
https://msdn.microsoft.com/en-us/library/windows/hardware/ff556996%28v=vs.85%29.aspx)

But quite independent of h/w offload, the s/w stack can already do a very good job for 10G with just GSO and GRO, so being able to extend that path to do encryption after segmentation should at least bridge the huge gap between the ipsec and non-ipsec mechanisms. And that gap should be as small as possible for esp-null, so that the only big hit we take is for the complexity of encryption itself!

> Another thing, I thought about setting up an IPsec BoF/workshop at
> netdev1.1. My main topic is GRO/GSO for IPsec. I'll send out a mail
> to the list later this week to see if there is enough interest and
> maybe some additional topics.

Sounds like an excellent idea. I'm certainly interested.

--Sowmini