Re: [PATCH 2/2] [v2] crypto: sha1: add ARM NEON implementation
On 29 June 2014 16:33, Jussi Kivilinna jussi.kivili...@iki.fi wrote:
> This patch adds ARM NEON assembly implementation of SHA-1 algorithm.
>
> tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:
>
> block-size      bytes/update    old-vs-new
> 16              16              1.04x
> 64              16              1.02x
> 64              64              1.05x
> 256             16              1.03x
> 256             64              1.04x
> 256             256             1.30x
> 1024            16              1.03x
> 1024            256             1.36x
> 1024            1024            1.52x
> 2048            16              1.03x
> 2048            256             1.39x
> 2048            1024            1.55x
> 2048            2048            1.59x
> 4096            16              1.03x
> 4096            256             1.40x
> 4096            1024            1.57x
> 4096            4096            1.62x
> 8192            16              1.03x
> 8192            256             1.40x
> 8192            1024            1.58x
> 8192            4096            1.63x
> 8192            8192            1.63x
>
> Changes in v2:
>  - Use ENTRY/ENDPROC
>  - Don't provide Thumb2 version
>  - Move constants to .text section
>  - Further tweaks to implementation for ~10% speed-up.
>

Please move the changelog to below the '---' so it doesn't end up in
the kernel commit log.

> Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi

Acked-by: Ard Biesheuvel ard.biesheu...@linaro.org
Tested-by: Ard Biesheuvel ard.biesheu...@linaro.org

Tested on Exynos-5250 (Cortex-A15)

ARM asm
=======
[ 1478.699012] testing speed of sha1
[ 1478.699040] test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 873594 opers/sec, 13977514 bytes/sec
[ 1481.694959] test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 386415 opers/sec, 24730581 bytes/sec
[ 1484.694958] test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 543196 opers/sec, 34764586 bytes/sec
[ 1487.694959] test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 141109 opers/sec, 36123989 bytes/sec
[ 1490.694959] test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 218391 opers/sec, 55908266 bytes/sec
[ 1493.694958] test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 256225 opers/sec, 65593685 bytes/sec
[ 1496.694959] test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 39845 opers/sec, 40801280 bytes/sec
[ 1499.694973] test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 78594 opers/sec, 80480597 bytes/sec
[ 1502.694966] test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 83790 opers/sec, 85801642 bytes/sec
[ 1505.694966] test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 20204 opers/sec, 41379157 bytes/sec
[ 1508.694989] test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 41075 opers/sec, 84121600 bytes/sec
[ 1511.694979] test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 43358 opers/sec, 88797184 bytes/sec
[ 1514.694960] test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 44168 opers/sec, 90457429 bytes/sec
[ 1517.694968] test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 10331 opers/sec, 42315776 bytes/sec
[ 1520.694967] test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 21004 opers/sec, 86032384 bytes/sec
[ 1523.694955] test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 22193 opers/sec, 90903893 bytes/sec
[ 1526.694989] test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 22671 opers/sec, 92860416 bytes/sec
[ 1529.695000] test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 5192 opers/sec, 42538325 bytes/sec
[ 1532.695110] test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 10628 opers/sec, 87067306 bytes/sec
[ 1535.695015] test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 11233 opers/sec, 92026197 bytes/sec
[ 1538.694997] test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 11393 opers/sec, 93334186 bytes/sec
[ 1541.694980] test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 11427 opers/sec, 93615445 bytes/sec

ARM neon
[ 1582.519068] testing speed of sha1
[ 1582.519097] test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 900970 opers/sec, 14415520 bytes/sec
[ 1585.514959] test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 406465 opers/sec, 26013802 bytes/sec
[ 1588.514961] test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 579712 opers/sec, 37101610 bytes/sec
[ 1591.514958] test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 139189 opers/sec, 35632554 bytes/sec
[ 1594.514964] test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 234671 opers/sec, 60075861 bytes/sec
[ 1597.514960] test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 347872 opers/sec, 89055402 bytes/sec
[ 1600.514959]
Re: [PATCH] [v2] crypto: sha512: add ARM NEON implementation
On 29 June 2014 16:34, Jussi Kivilinna jussi.kivili...@iki.fi wrote:
> This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384
> algorithms.
>
> tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:
>
> block-size      bytes/update    old-vs-new
> 16              16              2.99x
> 64              16              2.67x
> 64              64              3.00x
> 256             16              2.64x
> 256             64              3.06x
> 256             256             3.33x
> 1024            16              2.53x
> 1024            256             3.39x
> 1024            1024            3.52x
> 2048            16              2.50x
> 2048            256             3.41x
> 2048            1024            3.54x
> 2048            2048            3.57x
> 4096            16              2.49x
> 4096            256             3.42x
> 4096            1024            3.56x
> 4096            4096            3.59x
> 8192            16              2.48x
> 8192            256             3.42x
> 8192            1024            3.56x
> 8192            4096            3.60x
> 8192            8192            3.60x
>

Nice speedup!

> Changes in v2:
>  - Use ENTRY/ENDPROC
>  - Don't provide Thumb2 version
>

Please move Changelog below '---'

> Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi

Acked-by: Ard Biesheuvel ard.biesheu...@linaro.org
Tested-by: Ard Biesheuvel ard.biesheu...@linaro.org

Tested on Exynos-5250 (Cortex-A15)

ARM-asm
[ 1715.164122] testing speed of sha512
[ 1715.164150] test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 136277 opers/sec, 2180437 bytes/sec
[ 1718.159959] test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 126636 opers/sec, 8104746 bytes/sec
[ 1721.159962] test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 136605 opers/sec, 8742720 bytes/sec
[ 1724.159958] test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 41576 opers/sec, 10643541 bytes/sec
[ 1727.159957] test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 45984 opers/sec, 11771989 bytes/sec
[ 1730.159959] test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 47479 opers/sec, 12154794 bytes/sec
[ 1733.159977] test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 13410 opers/sec, 13731840 bytes/sec
[ 1736.160027] test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 15916 opers/sec, 16298325 bytes/sec
[ 1739.159975] test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 16095 opers/sec, 16481280 bytes/sec
[ 1742.159993] test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 7042 opers/sec, 14423381 bytes/sec
[ 1745.159994] test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 8438 opers/sec, 17281024 bytes/sec
[ 1748.159995] test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 8541 opers/sec, 17492650 bytes/sec
[ 1751.160001] test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 8560 opers/sec, 17531562 bytes/sec
[ 1754.159975] test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 3612 opers/sec, 14794752 bytes/sec
[ 1757.160103] test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 4350 opers/sec, 17820330 bytes/sec
[ 1760.160122] test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 4405 opers/sec, 18042880 bytes/sec
[ 1763.159957] test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 4463 opers/sec, 18280448 bytes/sec
[ 1766.160049] test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 1829 opers/sec, 14988629 bytes/sec
[ 1769.160328] test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 2209 opers/sec, 18101589 bytes/sec
[ 1772.160318] test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 2238 opers/sec, 18333696 bytes/sec
[ 1775.160278] test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 2245 opers/sec, 18393770 bytes/sec
[ 1778.160025] test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 2267 opers/sec, 18576725 bytes/sec

ARM-neon
========
[ 1810.729100] testing speed of sha512
[ 1810.729130] test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 330941 opers/sec, 5295066 bytes/sec
[ 1813.724958] test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 277607 opers/sec, 17766890 bytes/sec
[ 1816.724958] test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 330251 opers/sec, 21136085 bytes/sec
[ 1819.724956] test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 89849 opers/sec, 23001429 bytes/sec
[ 1822.724961] test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 113344 opers/sec, 29016149 bytes/sec
[ 1825.724963] test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 127466 opers/sec, 32631381 bytes/sec
[ 1828.724960] test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 27818 opers/sec, 28485632 bytes/sec
[
Re: [crypto] BUG: unable to handle kernel paging request at ffff88000bb88000
On Monday, 30 June 2014, 13:31:26, Fengguang Wu wrote:

Hi Fengguang,

> Hi Stephan,
>
> On Sun, Jun 29, 2014 at 09:45:48PM +0200, Stephan Mueller wrote:
>> On Sunday, 29 June 2014, 22:52:46, Fengguang Wu wrote:
>>
>> Hi Fengguang,
>>
>>> Greetings,
>>>
>>> 0day kernel testing robot got the below dmesg and the first bad
>>> commit is
>>
>> May I ask whether there is anything special in your kernel config?
>
> It's an x86_64 randconfig. You may find it in the attachment of the
> original report email.

Thanks, I used that config. I was just wondering whether there were some
special config options that changed the memory allocation mechanism. The
kernel configs I used never triggered the issue, although they should
have. I ran stress tests months ago (with the bug present) where I
invoked the DRBG for one day, causing billions of rounds of RNG operation
where each round should have triggered the bug.

>> This very bug should have been triggered already in all previous code
>> levels! I am seriously wondering why this bug was not triggered
>> before -- does kalloc somehow allocate more memory than you
>> requested? And only your specific kernel config made kalloc allocate
>> the exact amount of memory that was requested?
>
> Yeah the bug may have been triggered in other places. If you see
> anything valuable from this bisect result, it would be great.
>
> Judging from the comparison of 64d1cdfbe2 and its parent commit
> 3332ee2a17, it's pretty reproducible, so easy to verify the possible
> fixes.

Well, it is not as reproducible as you may think. And as far as I can
see, the other oops that you sent was caused by the same issue.

When I was debugging the issue and just adding some printk statements,
the crasher went away (reliably) or it crashed at some other random
places. It was very bizarre.

But after adding my fix, I did not see any crash any more.
Ciao Stephan -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 1/3] ima: use ahash API for file hash calculation
On 26/06/14 14:54, Mimi Zohar wrote:
> On Thu, 2014-06-19 at 18:20 +0300, Dmitry Kasatkin wrote:
>> Async hash API allows to use HW acceleration for hash calculation.
>> It may give significant performance gain or/and reduce power
>> consumption, which might be very beneficial for battery powered
>> devices.
>>
>> This patch introduces hash calculation using ahash API.
>>
>> ahash performance depends on data size and particular HW. Under
>> certain limit, depending on the system, shash performance may be
>> better.
>>
>> This patch also introduces 'ima_ahash_size' kernel parameter which
>> can be used to define minimal data size to use with ahash. When this
>> parameter is not set or file size is smaller than defined by this
>> parameter, shash will be used. Thus, by default, original shash
>> implementation is used.
>>
>> Signed-off-by: Dmitry Kasatkin d.kasat...@samsung.com
>> ---
>>  Documentation/kernel-parameters.txt |   3 +
>>  security/integrity/ima/ima_crypto.c | 182 +++
>>  2 files changed, 181 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
>> index a0c155c..f8efb01 100644
>> --- a/Documentation/kernel-parameters.txt
>> +++ b/Documentation/kernel-parameters.txt
>> @@ -1286,6 +1286,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>>  	ihash_entries=	[KNL]
>>  			Set number of hash buckets for inode cache.
>>
>> +	ima_ahash_size=size [IMA]
>> +			Set the minimal file size when use ahash API.
>> +
>>  	ima_appraise=	[IMA] appraise integrity measurements
>>  			Format: { off | enforce | fix }
>>  			default: enforce
>> diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c
>> index ccd0ac8..b7a8650 100644
>> --- a/security/integrity/ima/ima_crypto.c
>> +++ b/security/integrity/ima/ima_crypto.c
>> @@ -25,7 +25,25 @@
>>  #include <crypto/hash_info.h>
>>  #include "ima.h"
>>
>> +
>> +struct ahash_completion {
>> +	struct completion completion;
>> +	int err;
>> +};
>> +
>>  static struct crypto_shash *ima_shash_tfm;
>> +static struct crypto_ahash *ima_ahash_tfm;
>> +
>> +/* data size for ahash use */
>> +static loff_t ima_ahash_size;
>> +
>> +static int __init ima_ahash_setup(char *str)
>> +{
>> +	int rc = kstrtoll(str, 10, &ima_ahash_size);
>
> In general, variable definitions should be separated from code. A
> simple initialization is fine. Please separate variable definitions
> from code with a blank line.
>
>> +	pr_info("ima_ahash_size = %lld\n", ima_ahash_size);
>> +	return !rc;
>> +}
>> +__setup("ima_ahash_size=", ima_ahash_setup);
>
> This boot parameter name doesn't reflect its purpose, defining the
> minimum file size for using ahash. The next patch defines an
> additional boot parameter ima_ahash_bufsize. Perhaps defining a single
> boot parameter (eg. ima_ahash=) with multiple fields would be better.
>
>>
>>  /**
>>   * ima_kernel_read - read file content
>> @@ -68,6 +86,14 @@ int ima_init_crypto(void)
>>  			hash_algo_name[ima_hash_algo], rc);
>>  		return rc;
>>  	}
>> +	ima_ahash_tfm = crypto_alloc_ahash(hash_algo_name[ima_hash_algo], 0, 0);
>> +	if (IS_ERR(ima_ahash_tfm)) {
>> +		rc = PTR_ERR(ima_ahash_tfm);
>> +		crypto_free_shash(ima_shash_tfm);
>
> Only crypto_alloc_ahash() failed, not crypto_alloc_shash(). shash has
> worked fine up to now. Why require both shash and ahash to succeed?
>
>> +		pr_err("Can not allocate %s (reason: %ld)\n",
>> +		       hash_algo_name[ima_hash_algo], rc);
>> +		return rc;
>> +	}
>>  	return 0;
>>  }
>>
>> @@ -93,9 +119,143 @@ static void ima_free_tfm(struct crypto_shash *tfm)
>>  		crypto_free_shash(tfm);
>>  }
>>
>> -/*
>> - * Calculate the MD5/SHA1 file digest
>> - */
>> +static struct crypto_ahash *ima_alloc_atfm(enum hash_algo algo)
>> +{
>> +	struct crypto_ahash *tfm = ima_ahash_tfm;
>> +	int rc;
>> +
>> +	if (algo != ima_hash_algo && algo < HASH_ALGO__LAST) {
>> +		tfm = crypto_alloc_ahash(hash_algo_name[algo], 0, 0);
>> +		if (IS_ERR(tfm)) {
>> +			rc = PTR_ERR(tfm);
>> +			pr_err("Can not allocate %s (reason: %d)\n",
>> +			       hash_algo_name[algo], rc);
>> +		}
>> +	}
>> +	return tfm;
>> +}
>> +
>> +static void ima_free_atfm(struct crypto_ahash *tfm)
>> +{
>> +	if (tfm != ima_ahash_tfm)
>> +		crypto_free_ahash(tfm);
>> +}
>> +
>> +static void ahash_complete(struct crypto_async_request *req, int err)
>> +{
>> +	struct ahash_completion *res = req->data;
>> +
>> +	if (err == -EINPROGRESS)
>> +		return;
>> +	res->err = err;
>> +	complete(&res->completion);
>> +}
>> +
>> +static int ahash_wait(int err, struct ahash_completion *res)
>> +{
>> +	switch (err) {
>> +	case 0:
>> +		break;
>> +	case -EINPROGRESS:
>> +	case -EBUSY:
>> +		wait_for_completion(&res->completion);
>> +		reinit_completion(&res->completion);
>> +		err = res->err;
>> +		/* fall through */
>> +	default:
>> +
Re: [PATCH v1 1/3] ima: use ahash API for file hash calculation
On Mon, 2014-06-30 at 17:58 +0300, Dmitry Kasatkin wrote:
> On 26/06/14 14:54, Mimi Zohar wrote:
>> On Thu, 2014-06-19 at 18:20 +0300, Dmitry Kasatkin wrote:
>>> @@ -156,7 +316,7 @@ out:
>>>  	return rc;
>>>  }
>>>
>>> -int ima_calc_file_hash(struct file *file, struct ima_digest_data *hash)
>>> +static int ima_calc_file_shash(struct file *file, struct ima_digest_data *hash)
>>>  {
>>>  	struct crypto_shash *tfm;
>>>  	int rc;
>>> @@ -172,6 +332,20 @@ int ima_calc_file_hash(struct file *file, struct ima_digest_data *hash)
>>>  	return rc;
>>>  }
>>>
>>> +int ima_calc_file_hash(struct file *file, struct ima_digest_data *hash)
>>> +{
>>> +	loff_t i_size = i_size_read(file_inode(file));
>>> +
>>> +	/* shash is more efficient small data
>>> +	 * ahash performance depends on data size and particular HW
>>> +	 * ima_ahash_size allows to specify the best value for the system
>>> +	 */
>>> +	if (ima_ahash_size && i_size >= ima_ahash_size)
>>> +		return ima_calc_file_ahash(file, hash);
>>> +	else
>>> +		return ima_calc_file_shash(file, hash);
>>> +}
>>
>> If calculating the file hash using ahash fails, should it fall back
>> to using shash?
>
> If ahash fails, then it could be a HW error, which should not happen.
> If HW fails, the device is broken.

I would assume it depends on the HW, if the entire device/system is
broken.

> Do you really want to fallback to shash?

Yes, in this case, there is no downside to letting it continue working,
just slower, using the software crypto implementation. In any case, it
shouldn't be hard coded.

Mimi
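For readers following the fallback question, here is a minimal user-space sketch of the behaviour being discussed: use the accelerated path only above a size threshold, and drop back to the software path if the accelerated one reports an error. It is not the IMA code. The names calc_hash_hw, calc_hash_sw, calc_file_hash, hw_broken and ahash_min_size are all hypothetical stand-ins, and a toy FNV-1a checksum takes the place of a real digest so the example can run anywhere.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy software path (stands in for shash): FNV-1a, not a real digest. */
int calc_hash_sw(const uint8_t *data, size_t len, uint32_t *digest)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= data[i];
		h *= 16777619u;
	}
	*digest = h;
	return 0;
}

/* Hypothetical switch that lets a caller simulate a hardware failure. */
bool hw_broken;

/* Toy "accelerated" path (stands in for ahash); may fail. */
int calc_hash_hw(const uint8_t *data, size_t len, uint32_t *digest)
{
	if (hw_broken)
		return -5;	/* arbitrary negative error code for the sketch */
	return calc_hash_sw(data, len, digest);
}

/* Threshold in bytes; 0 means "never use the accelerated path". */
size_t ahash_min_size;

int calc_file_hash(const uint8_t *data, size_t len, uint32_t *digest)
{
	if (ahash_min_size && len >= ahash_min_size) {
		int rc = calc_hash_hw(data, len, digest);

		if (!rc)
			return 0;
		/* accelerated path failed: keep working, just slower */
	}
	return calc_hash_sw(data, len, digest);
}
```

Whether the fallback should be unconditional or configurable is exactly the open point in the thread; the sketch hard-codes it only to keep the example short.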
[PATCH 2/2] [v3] crypto: sha1: add ARM NEON implementation
This patch adds ARM NEON assembly implementation of SHA-1 algorithm.

tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:

block-size      bytes/update    old-vs-new
16              16              1.04x
64              16              1.02x
64              64              1.05x
256             16              1.03x
256             64              1.04x
256             256             1.30x
1024            16              1.03x
1024            256             1.36x
1024            1024            1.52x
2048            16              1.03x
2048            256             1.39x
2048            1024            1.55x
2048            2048            1.59x
4096            16              1.03x
4096            256             1.40x
4096            1024            1.57x
4096            4096            1.62x
8192            16              1.03x
8192            256             1.40x
8192            1024            1.58x
8192            4096            1.63x
8192            8192            1.63x

Acked-by: Ard Biesheuvel ard.biesheu...@linaro.org
Tested-by: Ard Biesheuvel ard.biesheu...@linaro.org
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---

Changes in v2:
 - Use ENTRY/ENDPROC
 - Don't provide Thumb2 version
 - Move constants to .text section
 - Further tweaks to implementation for ~10% speed-up.
v3:
 - Changelog moved below '---'
---
 arch/arm/crypto/Makefile           |    2
 arch/arm/crypto/sha1-armv7-neon.S  |  634
 arch/arm/crypto/sha1_glue.c        |    8
 arch/arm/crypto/sha1_neon_glue.c   |  197 +++
 arch/arm/include/asm/crypto/sha1.h |   10 +
 crypto/Kconfig                     |   11 +
 6 files changed, 859 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/crypto/sha1-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha1_neon_glue.c
 create mode 100644 arch/arm/include/asm/crypto/sha1.h

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 81cda39..374956d 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -5,10 +5,12 @@
 obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
+obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o

 aes-arm-y := aes-armv4.o aes_glue.o
 aes-arm-bs-y := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
+sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o

 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha1-armv7-neon.S b/arch/arm/crypto/sha1-armv7-neon.S
new file mode 100644
index 0000000..50013c0
--- /dev/null
+++ b/arch/arm/crypto/sha1-armv7-neon.S
@@ -0,0 +1,634 @@
+/* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/linkage.h>
+
+
+.syntax unified
+.code   32
+.fpu neon
+
+.text
+
+
+/* Context structure */
+
+#define state_h0 0
+#define state_h1 4
+#define state_h2 8
+#define state_h3 12
+#define state_h4 16
+
+
+/* Constants */
+
+#define K1  0x5A827999
+#define K2  0x6ED9EBA1
+#define K3  0x8F1BBCDC
+#define K4  0xCA62C1D6
+.align 4
+.LK_VEC:
+.LK1:	.long K1, K1, K1, K1
+.LK2:	.long K2, K2, K2, K2
+.LK3:	.long K3, K3, K3, K3
+.LK4:	.long K4, K4, K4, K4
+
+
+/* Register macros */
+
+#define RSTATE r0
+#define RDATA r1
+#define RNBLKS r2
+#define ROLDSTACK r3
+#define RWK lr
+
+#define _a r4
+#define _b r5
+#define _c r6
+#define _d r7
+#define _e r8
+
+#define RT0 r9
+#define RT1 r10
+#define RT2 r11
+#define RT3 r12
+
+#define W0 q0
+#define W1 q1
+#define W2 q2
+#define W3 q3
+#define W4 q4
+#define W5 q5
+#define W6 q6
+#define W7 q7
+
+#define tmp0 q8
+#define tmp1 q9
+#define tmp2 q10
+#define tmp3 q11
+
+#define qK1 q12
+#define qK2 q13
+#define qK3 q14
+#define qK4 q15
+
+
+/* Round function macros. */
+
+#define WK_offs(i) (((i) & 15) * 4)
+
+#define _R_F1(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+	      W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+	ldr RT3, [sp, WK_offs(i)]; \
+		pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	bic RT0, d, b; \
+	add e, e, a, ror #(32 - 5); \
+	and RT1, c, b; \
+		pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	add RT0, RT0, RT3; \
+	add e, e, RT1; \
+	ror b, #(32 - 30); \
+		pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	add e, e, RT0;
+
+#define _R_F2(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+	      W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+	ldr RT3, [sp, WK_offs(i)]; \
+
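The `_R_F1` round macro quoted above can be read as the following C sketch. This is an interpretation of the excerpt, not code from the patch: the function names are made up for illustration, the `& 15` in `wk_offs` assumes that an `&` was lost from `WK_offs(i)` in the archived text, and `wk` stands for the word loaded from the stack slot, which appears to hold a precomputed W[i] + K. The point worth noticing is that `bic`/`and` produce two values with no overlapping bits, so the macro can combine them with `add` instead of `orr`.

```c
#include <stdint.h>

/* Rotate left; n must be in 1..31. */
static inline uint32_t rol32(uint32_t x, unsigned int n)
{
	return (x << n) | (x >> (32 - n));
}

/* Textbook SHA-1 F1 ("choose") function for rounds 0..19. */
uint32_t f1_reference(uint32_t b, uint32_t c, uint32_t d)
{
	return (b & c) | (~b & d);
}

/* Same function, computed the way the macro's instructions suggest. */
uint32_t f1_bic_and_add(uint32_t b, uint32_t c, uint32_t d)
{
	uint32_t t0 = d & ~b;	/* bic RT0, d, b */
	uint32_t t1 = c & b;	/* and RT1, c, b */

	/* t0 and t1 never share a set bit, so addition equals OR here. */
	return t0 + t1;
}

/* One F1 round: e += rol(a, 5) + F1(b, c, d) + wk; b = rol(b, 30). */
void sha1_round_f1(uint32_t a, uint32_t *b, uint32_t c, uint32_t d,
		   uint32_t *e, uint32_t wk)
{
	*e += rol32(a, 5) + f1_bic_and_add(*b, c, d) + wk;
	*b = rol32(*b, 30);
}

/* Byte offset of round i's slot in a 16-entry ring of 32-bit words. */
unsigned int wk_offs(unsigned int i)
{
	return (i & 15) * 4;
}
```

The `ror #(32 - 5)` and `ror b, #(32 - 30)` operands in the assembly are left rotations by 5 and 30 written as right rotations, which is why the sketch uses `rol32`.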
[PATCH] [v3] crypto: sha512: add ARM NEON implementation
This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384
algorithms.

tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:

block-size      bytes/update    old-vs-new
16              16              2.99x
64              16              2.67x
64              64              3.00x
256             16              2.64x
256             64              3.06x
256             256             3.33x
1024            16              2.53x
1024            256             3.39x
1024            1024            3.52x
2048            16              2.50x
2048            256             3.41x
2048            1024            3.54x
2048            2048            3.57x
4096            16              2.49x
4096            256             3.42x
4096            1024            3.56x
4096            4096            3.59x
8192            16              2.48x
8192            256             3.42x
8192            1024            3.56x
8192            4096            3.60x
8192            8192            3.60x

Acked-by: Ard Biesheuvel ard.biesheu...@linaro.org
Tested-by: Ard Biesheuvel ard.biesheu...@linaro.org
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---

Changes in v2:
 - Use ENTRY/ENDPROC
 - Don't provide Thumb2 version
v3:
 - Changelog moved below '---'
---
 arch/arm/crypto/Makefile            |    2
 arch/arm/crypto/sha512-armv7-neon.S |  455 +++
 arch/arm/crypto/sha512_neon_glue.c  |  305 +++
 crypto/Kconfig                      |   15 +
 4 files changed, 777 insertions(+)
 create mode 100644 arch/arm/crypto/sha512-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha512_neon_glue.c

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 374956d..b48fa34 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -6,11 +6,13 @@ obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
+obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o

 aes-arm-y := aes-armv4.o aes_glue.o
 aes-arm-bs-y := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
 sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
+sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o

 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha512-armv7-neon.S b/arch/arm/crypto/sha512-armv7-neon.S
new file mode 100644
index 0000000..fe99472
--- /dev/null
+++ b/arch/arm/crypto/sha512-armv7-neon.S
@@ -0,0 +1,455 @@
+/* sha512-armv7-neon.S - ARM/NEON assembly implementation of SHA-512 transform
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/linkage.h>
+
+
+.syntax unified
+.code   32
+.fpu neon
+
+.text
+
+/* structure of SHA512_CONTEXT */
+#define hd_a 0
+#define hd_b ((hd_a) + 8)
+#define hd_c ((hd_b) + 8)
+#define hd_d ((hd_c) + 8)
+#define hd_e ((hd_d) + 8)
+#define hd_f ((hd_e) + 8)
+#define hd_g ((hd_f) + 8)
+
+/* register macros */
+#define RK %r2
+
+#define RA d0
+#define RB d1
+#define RC d2
+#define RD d3
+#define RE d4
+#define RF d5
+#define RG d6
+#define RH d7
+
+#define RT0 d8
+#define RT1 d9
+#define RT2 d10
+#define RT3 d11
+#define RT4 d12
+#define RT5 d13
+#define RT6 d14
+#define RT7 d15
+
+#define RT01q q4
+#define RT23q q5
+#define RT45q q6
+#define RT67q q7
+
+#define RW0 d16
+#define RW1 d17
+#define RW2 d18
+#define RW3 d19
+#define RW4 d20
+#define RW5 d21
+#define RW6 d22
+#define RW7 d23
+#define RW8 d24
+#define RW9 d25
+#define RW10 d26
+#define RW11 d27
+#define RW12 d28
+#define RW13 d29
+#define RW14 d30
+#define RW15 d31
+
+#define RW01q q8
+#define RW23q q9
+#define RW45q q10
+#define RW67q q11
+#define RW89q q12
+#define RW1011q q13
+#define RW1213q q14
+#define RW1415q q15
+
+/***
+ * ARM assembly implementation of sha512 transform
+ ***/
+#define rounds2_0_63(ra, rb, rc, rd, re, rf, rg, rh, rw0, rw1, rw01q, rw2, \
+                     rw23q, rw1415q, rw9, rw10, interleave_op, arg1) \
+	/* t1 = h + Sum1 (e) + Ch (e, f, g) + k[t] + w[t]; */ \
+	vshr.u64 RT2, re, #14; \
+	vshl.u64 RT3, re, #64 - 14; \
+	interleave_op(arg1); \
+	vshr.u64 RT4, re, #18; \
+	vshl.u64 RT5, re, #64 - 18; \
+	vld1.64 {RT0}, [RK]!; \
+	veor.64 RT23q, RT23q, RT45q; \
+	vshr.u64 RT4, re, #41; \
+	vshl.u64 RT5, re, #64 - 41; \
+	vadd.u64 RT0, RT0, rw0; \
+	veor.64 RT23q, RT23q, RT45q; \
+
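The opening of `rounds2_0_63` computes Sum1(e) without a rotate instruction: each rotation is split into a right shift and a left shift, and the halves are XORed together pairwise. The C sketch below is an interpretation of that excerpt, with made-up function names. The last step, XORing the two accumulated halves, is an assumption, because the quoted macro is cut off before that point.

```c
#include <stdint.h>

/* Rotate right; n must be in 1..63. */
static inline uint64_t ror64(uint64_t x, unsigned int n)
{
	return (x >> n) | (x << (64 - n));
}

/* Textbook SHA-512 Sum1: ROTR14 ^ ROTR18 ^ ROTR41. */
uint64_t sum1_reference(uint64_t e)
{
	return ror64(e, 14) ^ ror64(e, 18) ^ ror64(e, 41);
}

/* Sum1 built from shift pairs, following the quoted instructions. */
uint64_t sum1_shift_pairs(uint64_t e)
{
	uint64_t t2 = e >> 14;		/* vshr.u64 RT2, re, #14 */
	uint64_t t3 = e << (64 - 14);	/* vshl.u64 RT3, re, #64 - 14 */
	uint64_t t4 = e >> 18;		/* vshr.u64 RT4, re, #18 */
	uint64_t t5 = e << (64 - 18);	/* vshl.u64 RT5, re, #64 - 18 */

	t2 ^= t4;			/* veor.64 RT23q, RT23q, RT45q */
	t3 ^= t5;			/* (one q-register op covers both) */

	t4 = e >> 41;			/* vshr.u64 RT4, re, #41 */
	t5 = e << (64 - 41);		/* vshl.u64 RT5, re, #64 - 41 */

	t2 ^= t4;			/* veor.64 RT23q, RT23q, RT45q */
	t3 ^= t5;

	/* Assumed final combine; not visible in the truncated excerpt. */
	return t2 ^ t3;
}
```

This works because the two halves of one rotation occupy disjoint bit positions, so OR and XOR give the same result, and XOR is associative across all six partial values.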
[PATCH 1/2] [v3] crypto: sha1/ARM: make use of common SHA-1 structures
Common SHA-1 structures are defined in crypto/sha.h for code sharing.

This patch changes SHA-1/ARM glue code to use these structures.

Acked-by: Ard Biesheuvel ard.biesheu...@linaro.org
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/arm/crypto/sha1_glue.c |   50 +++
 1 file changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/arm/crypto/sha1_glue.c b/arch/arm/crypto/sha1_glue.c
index 76cd976..c494e57 100644
--- a/arch/arm/crypto/sha1_glue.c
+++ b/arch/arm/crypto/sha1_glue.c
@@ -24,31 +24,25 @@
 #include <crypto/sha.h>
 #include <asm/byteorder.h>

-struct SHA1_CTX {
-	uint32_t h0,h1,h2,h3,h4;
-	u64 count;
-	u8 data[SHA1_BLOCK_SIZE];
-};

-asmlinkage void sha1_block_data_order(struct SHA1_CTX *digest,
+asmlinkage void sha1_block_data_order(u32 *digest,
 		const unsigned char *data, unsigned int rounds);


 static int sha1_init(struct shash_desc *desc)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
-	memset(sctx, 0, sizeof(*sctx));
-	sctx->h0 = SHA1_H0;
-	sctx->h1 = SHA1_H1;
-	sctx->h2 = SHA1_H2;
-	sctx->h3 = SHA1_H3;
-	sctx->h4 = SHA1_H4;
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha1_state){
+		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
+	};
+
 	return 0;
 }


-static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
-			 unsigned int len, unsigned int partial)
+static int __sha1_update(struct sha1_state *sctx, const u8 *data,
+			 unsigned int len, unsigned int partial)
 {
 	unsigned int done = 0;

@@ -56,17 +50,17 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,

 	if (partial) {
 		done = SHA1_BLOCK_SIZE - partial;
-		memcpy(sctx->data + partial, data, done);
-		sha1_block_data_order(sctx, sctx->data, 1);
+		memcpy(sctx->buffer + partial, data, done);
+		sha1_block_data_order(sctx->state, sctx->buffer, 1);
 	}

 	if (len - done >= SHA1_BLOCK_SIZE) {
 		const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
-		sha1_block_data_order(sctx, data + done, rounds);
+		sha1_block_data_order(sctx->state, data + done, rounds);
 		done += rounds * SHA1_BLOCK_SIZE;
 	}

-	memcpy(sctx->data, data + done, len - done);
+	memcpy(sctx->buffer, data + done, len - done);
 	return 0;
 }

@@ -74,14 +68,14 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
 static int sha1_update(struct shash_desc *desc, const u8 *data,
 		       unsigned int len)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
 	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
 	int res;

 	/* Handle the fast case right here */
 	if (partial + len < SHA1_BLOCK_SIZE) {
 		sctx->count += len;
-		memcpy(sctx->data + partial, data, len);
+		memcpy(sctx->buffer + partial, data, len);
 		return 0;
 	}
 	res = __sha1_update(sctx, data, len, partial);
@@ -92,7 +86,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
 /* Add padding and return the message digest. */
 static int sha1_final(struct shash_desc *desc, u8 *out)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
 	unsigned int i, index, padlen;
 	__be32 *dst = (__be32 *)out;
 	__be64 bits;
@@ -106,7 +100,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 	/* We need to fill a whole block for __sha1_update() */
 	if (padlen <= 56) {
 		sctx->count += padlen;
-		memcpy(sctx->data + index, padding, padlen);
+		memcpy(sctx->buffer + index, padding, padlen);
 	} else {
 		__sha1_update(sctx, padding, padlen, index);
 	}
@@ -114,7 +108,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)

 	/* Store state in digest */
 	for (i = 0; i < 5; i++)
-		dst[i] = cpu_to_be32(((u32 *)sctx)[i]);
+		dst[i] = cpu_to_be32(sctx->state[i]);

 	/* Wipe context */
 	memset(sctx, 0, sizeof(*sctx));
@@ -124,7 +118,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)

 static int sha1_export(struct shash_desc *desc, void *out)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
 	memcpy(out, sctx, sizeof(*sctx));
 	return 0;
 }
@@ -132,7 +126,7 @@ static int sha1_export(struct shash_desc *desc, void *out)
 static int sha1_import(struct shash_desc *desc, const void *in)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
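The sha1_init() hunk replaces a memset() plus five assignments with a single compound-literal assignment. The user-space sketch below illustrates why the two are interchangeable for the named members: in C, members not mentioned in a designated initializer are zero-initialized. The struct here is a hypothetical mirror that only borrows the field names seen in the patch (state, count, buffer); its layout is not claimed to match the kernel's struct sha1_state, and the constants are the standard SHA-1 initial values.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 64

/* Standard SHA-1 initial hash values. */
#define SKETCH_H0 0x67452301u
#define SKETCH_H1 0xefcdab89u
#define SKETCH_H2 0x98badcfeu
#define SKETCH_H3 0x10325476u
#define SKETCH_H4 0xc3d2e1f0u

/* Hypothetical mirror of the shared state structure. */
struct sha1_state_sketch {
	uint64_t count;
	uint32_t state[5];
	uint8_t buffer[SKETCH_BLOCK_SIZE];
};

/* Old style: clear everything, then set the five words one by one. */
void init_old_style(struct sha1_state_sketch *s)
{
	memset(s, 0, sizeof(*s));
	s->state[0] = SKETCH_H0;
	s->state[1] = SKETCH_H1;
	s->state[2] = SKETCH_H2;
	s->state[3] = SKETCH_H3;
	s->state[4] = SKETCH_H4;
}

/* New style: count and buffer are zeroed implicitly. */
void init_compound_literal(struct sha1_state_sketch *s)
{
	*s = (struct sha1_state_sketch){
		.state = { SKETCH_H0, SKETCH_H1, SKETCH_H2,
			   SKETCH_H3, SKETCH_H4 },
	};
}

/* Member-wise comparison, so padding bytes play no role. */
int states_equal(const struct sha1_state_sketch *a,
		 const struct sha1_state_sketch *b)
{
	return a->count == b->count &&
	       memcmp(a->state, b->state, sizeof(a->state)) == 0 &&
	       memcmp(a->buffer, b->buffer, sizeof(a->buffer)) == 0;
}
```

One caveat for anyone reusing the pattern: the language only guarantees the value of the members, not of any padding bytes, which is why the comparison above is member-wise rather than a memcmp() over the whole struct.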
Re: [PATCH 1/2] [v2] crypto: sha1/ARM: make use of common SHA-1 structures
On 29 June 2014 16:33, Jussi Kivilinna <jussi.kivili...@iki.fi> wrote:
> Common SHA-1 structures are defined in crypto/sha.h for code sharing.
>
> This patch changes SHA-1/ARM glue code to use these structures.
>
> Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
> ---

These two should go into Russell's patch system if nobody else has any
more comments.

http://www.arm.linux.org.uk/developer/patches/

-- 
Ard.

>  arch/arm/crypto/sha1_glue.c | 50 +++
>  1 file changed, 22 insertions(+), 28 deletions(-)
>
> diff --git a/arch/arm/crypto/sha1_glue.c b/arch/arm/crypto/sha1_glue.c
> index 76cd976..c494e57 100644
> --- a/arch/arm/crypto/sha1_glue.c
> +++ b/arch/arm/crypto/sha1_glue.c
> @@ -24,31 +24,25 @@
>  #include <crypto/sha.h>
>  #include <asm/byteorder.h>
>  
> -struct SHA1_CTX {
> -	uint32_t h0,h1,h2,h3,h4;
> -	u64 count;
> -	u8 data[SHA1_BLOCK_SIZE];
> -};
>  
> -asmlinkage void sha1_block_data_order(struct SHA1_CTX *digest,
> +asmlinkage void sha1_block_data_order(u32 *digest,
>  		const unsigned char *data, unsigned int rounds);
>  
>  
>  static int sha1_init(struct shash_desc *desc)
>  {
> -	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
> -	memset(sctx, 0, sizeof(*sctx));
> -	sctx->h0 = SHA1_H0;
> -	sctx->h1 = SHA1_H1;
> -	sctx->h2 = SHA1_H2;
> -	sctx->h3 = SHA1_H3;
> -	sctx->h4 = SHA1_H4;
> +	struct sha1_state *sctx = shash_desc_ctx(desc);
> +
> +	*sctx = (struct sha1_state){
> +		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
> +	};
> +
>  	return 0;
>  }
>  
>  
> -static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
> -			       unsigned int len, unsigned int partial)
> +static int __sha1_update(struct sha1_state *sctx, const u8 *data,
> +			 unsigned int len, unsigned int partial)
>  {
>  	unsigned int done = 0;
>  
> @@ -56,17 +50,17 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
>  
>  	if (partial) {
>  		done = SHA1_BLOCK_SIZE - partial;
> -		memcpy(sctx->data + partial, data, done);
> -		sha1_block_data_order(sctx, sctx->data, 1);
> +		memcpy(sctx->buffer + partial, data, done);
> +		sha1_block_data_order(sctx->state, sctx->buffer, 1);
>  	}
>  
>  	if (len - done >= SHA1_BLOCK_SIZE) {
>  		const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
> -		sha1_block_data_order(sctx, data + done, rounds);
> +		sha1_block_data_order(sctx->state, data + done, rounds);
>  		done += rounds * SHA1_BLOCK_SIZE;
>  	}
>  
> -	memcpy(sctx->data, data + done, len - done);
> +	memcpy(sctx->buffer, data + done, len - done);
>  	return 0;
>  }
>  
> @@ -74,14 +68,14 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
>  static int sha1_update(struct shash_desc *desc, const u8 *data,
>  			     unsigned int len)
>  {
> -	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
> +	struct sha1_state *sctx = shash_desc_ctx(desc);
>  	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
>  	int res;
>  
>  	/* Handle the fast case right here */
>  	if (partial + len < SHA1_BLOCK_SIZE) {
>  		sctx->count += len;
> -		memcpy(sctx->data + partial, data, len);
> +		memcpy(sctx->buffer + partial, data, len);
>  		return 0;
>  	}
>  	res = __sha1_update(sctx, data, len, partial);
> @@ -92,7 +86,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
>  /* Add padding and return the message digest. */
>  static int sha1_final(struct shash_desc *desc, u8 *out)
>  {
> -	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
> +	struct sha1_state *sctx = shash_desc_ctx(desc);
>  	unsigned int i, index, padlen;
>  	__be32 *dst = (__be32 *)out;
>  	__be64 bits;
> @@ -106,7 +100,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
>  	/* We need to fill a whole block for __sha1_update() */
>  	if (padlen <= 56) {
>  		sctx->count += padlen;
> -		memcpy(sctx->data + index, padding, padlen);
> +		memcpy(sctx->buffer + index, padding, padlen);
>  	} else {
>  		__sha1_update(sctx, padding, padlen, index);
>  	}
> @@ -114,7 +108,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
>  
>  	/* Store state in digest */
>  	for (i = 0; i < 5; i++)
> -		dst[i] = cpu_to_be32(((u32 *)sctx)[i]);
> +		dst[i] = cpu_to_be32(sctx->state[i]);
>  
>  	/* Wipe context */
>  	memset(sctx, 0, sizeof(*sctx));
> @@ -124,7 +118,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
>  
>  static int sha1_export(struct shash_desc *desc, void *out)
>  {
> -	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
> +	struct sha1_state *sctx = shash_desc_ctx(desc);
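
For readers following the buffering logic in the patch above, here is a
minimal userspace sketch of the same partial-block handling. It is not the
kernel code: struct sketch_state, sketch_update() and the block-counting
stub stub_block_data_order() are hypothetical stand-ins, and the stub does
no hashing at all. It only shows how count, buffer, partial, done and
rounds interact once the state lives in a separate array, as it does in
struct sha1_state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64	/* SHA-1 block size in bytes */

/* Hypothetical mirror of the sha1_state fields the glue code touches. */
struct sketch_state {
	uint32_t state[5];
	uint64_t count;
	uint8_t buffer[BLOCK_SIZE];
};

static unsigned int blocks_seen;

/* Hypothetical stub for the assembly transform: it only counts blocks. */
static void stub_block_data_order(uint32_t *digest, const uint8_t *data,
				  unsigned int rounds)
{
	(void)digest;
	(void)data;
	blocks_seen += rounds;
}

static void sketch_update(struct sketch_state *s, const uint8_t *data,
			  unsigned int len)
{
	unsigned int partial = s->count % BLOCK_SIZE;
	unsigned int done = 0;

	s->count += len;

	/* Fast case: the input still fits in the buffered partial block. */
	if (partial + len < BLOCK_SIZE) {
		memcpy(s->buffer + partial, data, len);
		return;
	}

	/* Top up the buffered partial block and process it first. */
	if (partial) {
		done = BLOCK_SIZE - partial;
		memcpy(s->buffer + partial, data, done);
		stub_block_data_order(s->state, s->buffer, 1);
	}

	/* Process whole blocks directly from the caller's data. */
	if (len - done >= BLOCK_SIZE) {
		unsigned int rounds = (len - done) / BLOCK_SIZE;

		stub_block_data_order(s->state, data + done, rounds);
		done += rounds * BLOCK_SIZE;
	}

	/* Keep the tail for the next call. */
	memcpy(s->buffer, data + done, len - done);
}
```

Passing the state array rather than the whole context is what lets the
transform prototype take a plain u32 pointer, as in the patch.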
Re: [PATCH] [v3] crypto: sha512: add ARM NEON implementation
On 30 June 2014 18:39, Jussi Kivilinna <jussi.kivili...@iki.fi> wrote:
> This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384
> algorithms.
>
> tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:
>
> block-size      bytes/update    old-vs-new
> 16              16              2.99x
> 64              16              2.67x
> 64              64              3.00x
> 256             16              2.64x
> 256             64              3.06x
> 256             256             3.33x
> 1024            16              2.53x
> 1024            256             3.39x
> 1024            1024            3.52x
> 2048            16              2.50x
> 2048            256             3.41x
> 2048            1024            3.54x
> 2048            2048            3.57x
> 4096            16              2.49x
> 4096            256             3.42x
> 4096            1024            3.56x
> 4096            4096            3.59x
> 8192            16              2.48x
> 8192            256             3.42x
> 8192            1024            3.56x
> 8192            4096            3.60x
> 8192            8192            3.60x
>
> Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> Tested-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>

Likewise for this one: if nobody has any more comments, this should go
into the patch system.

One remaining question though: is this code (and the SHA1 code) known
to be broken for big endian or just untested?

Thanks,
Ard.

> ---
>
> Changes in v2:
>  - Use ENTRY/ENDPROC
>  - Don't provide Thumb2 version
>
> v3:
>  - Changelog moved below '---'
> ---
>  arch/arm/crypto/Makefile            |    2
>  arch/arm/crypto/sha512-armv7-neon.S |  455 +++
>  arch/arm/crypto/sha512_neon_glue.c  |  305 +++
>  crypto/Kconfig                      |   15 +
>  4 files changed, 777 insertions(+)
>  create mode 100644 arch/arm/crypto/sha512-armv7-neon.S
>  create mode 100644 arch/arm/crypto/sha512_neon_glue.c
>
> diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
> index 374956d..b48fa34 100644
> --- a/arch/arm/crypto/Makefile
> +++ b/arch/arm/crypto/Makefile
> @@ -6,11 +6,13 @@ obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
>  obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
>  obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
>  obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
> +obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o
>  
>  aes-arm-y := aes-armv4.o aes_glue.o
>  aes-arm-bs-y := aesbs-core.o aesbs-glue.o
>  sha1-arm-y := sha1-armv4-large.o sha1_glue.o
>  sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
> +sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o
>  
>  quiet_cmd_perl = PERL    $@
>        cmd_perl = $(PERL) $(<) > $(@)
> diff --git a/arch/arm/crypto/sha512-armv7-neon.S b/arch/arm/crypto/sha512-armv7-neon.S
> new file mode 100644
> index 0000000..fe99472
> --- /dev/null
> +++ b/arch/arm/crypto/sha512-armv7-neon.S
> @@ -0,0 +1,455 @@
> +/* sha512-armv7-neon.S  -  ARM/NEON assembly implementation of SHA-512 transform
> + *
> + * Copyright © 2013-2014 Jussi Kivilinna <jussi.kivili...@iki.fi>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the Free
> + * Software Foundation; either version 2 of the License, or (at your option)
> + * any later version.
> + */
> +
> +#include <linux/linkage.h>
> +
> +
> +.syntax unified
> +.code   32
> +.fpu neon
> +
> +.text
> +
> +/* structure of SHA512_CONTEXT */
> +#define hd_a 0
> +#define hd_b ((hd_a) + 8)
> +#define hd_c ((hd_b) + 8)
> +#define hd_d ((hd_c) + 8)
> +#define hd_e ((hd_d) + 8)
> +#define hd_f ((hd_e) + 8)
> +#define hd_g ((hd_f) + 8)
> +
> +/* register macros */
> +#define RK %r2
> +
> +#define RA d0
> +#define RB d1
> +#define RC d2
> +#define RD d3
> +#define RE d4
> +#define RF d5
> +#define RG d6
> +#define RH d7
> +
> +#define RT0 d8
> +#define RT1 d9
> +#define RT2 d10
> +#define RT3 d11
> +#define RT4 d12
> +#define RT5 d13
> +#define RT6 d14
> +#define RT7 d15
> +
> +#define RT01q q4
> +#define RT23q q5
> +#define RT45q q6
> +#define RT67q q7
> +
> +#define RW0 d16
> +#define RW1 d17
> +#define RW2 d18
> +#define RW3 d19
> +#define RW4 d20
> +#define RW5 d21
> +#define RW6 d22
> +#define RW7 d23
> +#define RW8 d24
> +#define RW9 d25
> +#define RW10 d26
> +#define RW11 d27
> +#define RW12 d28
> +#define RW13 d29
> +#define RW14 d30
> +#define RW15 d31
> +
> +#define RW01q q8
> +#define RW23q q9
> +#define RW45q q10
> +#define RW67q q11
> +#define RW89q q12
> +#define RW1011q q13
> +#define RW1213q q14
> +#define RW1415q q15
> +
> +/***
> + * ARM assembly implementation of sha512 transform
> + ***/
> +#define rounds2_0_63(ra, rb, rc, rd, re, rf, rg, rh, rw0, rw1, rw01q, rw2, \
> +                     rw23q, rw1415q, rw9, rw10, interleave_op, arg1) \
> +	/* t1 =
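
The hd_a..hd_g offsets in the assembly above step by 8 bytes, which
suggests the context starts with consecutive 64-bit state words. A small
C sketch of that layout assumption follows. struct sketch_sha512_ctx and
sketch_state_offset() are hypothetical names used only for illustration;
the real context type is whatever the glue code (not shown in this
excerpt) passes to the assembly routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets copied from the assembly source quoted above. */
#define hd_a 0
#define hd_b ((hd_a) + 8)
#define hd_c ((hd_b) + 8)
#define hd_d ((hd_c) + 8)
#define hd_e ((hd_d) + 8)
#define hd_f ((hd_e) + 8)
#define hd_g ((hd_f) + 8)

/* Assumed layout: eight 64-bit state words at the start of the context. */
struct sketch_sha512_ctx {
	uint64_t h[8];
};

/* Byte offset of state word i within the assumed context layout. */
static size_t sketch_state_offset(unsigned int i)
{
	return offsetof(struct sketch_sha512_ctx, h) + i * sizeof(uint64_t);
}
```

If the C-side structure ever gained a field in front of the state words,
hard-coded offsets like these would silently point at the wrong data, so a
check of this kind is one way to catch the mismatch early.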
Re: [PATCH] [v3] crypto: sha512: add ARM NEON implementation
On 30.06.2014 21:13, Ard Biesheuvel wrote:
> On 30 June 2014 18:39, Jussi Kivilinna <jussi.kivili...@iki.fi> wrote:
>> This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384
>> algorithms.
>>
>> tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:
>>
>> block-size      bytes/update    old-vs-new
>> 16              16              2.99x
>> 64              16              2.67x
>> 64              64              3.00x
>> 256             16              2.64x
>> 256             64              3.06x
>> 256             256             3.33x
>> 1024            16              2.53x
>> 1024            256             3.39x
>> 1024            1024            3.52x
>> 2048            16              2.50x
>> 2048            256             3.41x
>> 2048            1024            3.54x
>> 2048            2048            3.57x
>> 4096            16              2.49x
>> 4096            256             3.42x
>> 4096            1024            3.56x
>> 4096            4096            3.59x
>> 8192            16              2.48x
>> 8192            256             3.42x
>> 8192            1024            3.56x
>> 8192            4096            3.60x
>> 8192            8192            3.60x
>>
>> Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> Tested-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
>
> Likewise for this one: if nobody has any more comments, this should go
> into the patch system.
>
> One remaining question though: is this code (and the SHA1 code) known
> to be broken for big endian or just untested?

Untested and probably broken, so I've disabled it when CPU_BIG_ENDIAN=y.

-Jussi

> Thanks,
> Ard.
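
On the big-endian question raised in this thread: SHA-1 and SHA-512 both
define their message words as big-endian, so an implementation has to turn
input bytes into host-order words somewhere. Code that hard-codes a byte
swap for little-endian hosts would swap wrongly on a big-endian kernel,
which is one plausible way such code could break there. The sketch below
is not taken from the patch; sketch_load_be64() is a hypothetical helper
showing a load that gives the same result on either byte order.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assemble a 64-bit big-endian word from bytes. Because it works on
 * individual bytes, the result does not depend on host endianness.
 */
static uint64_t sketch_load_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}
```

A hand-written assembly routine usually cannot afford a byte-at-a-time
load like this, so gating the code on the CPU_BIG_ENDIAN setting until it
has been tested, as described above, is the conservative choice.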