Re: [PATCH 0/3] crypto: x86/chacha20 - AVX-512VL block functions
On Tue, Nov 20, 2018 at 05:30:47PM +0100, Martin Willi wrote:
> In the quest for pushing the limits of chacha20 encryption for both IPsec
> and Wireguard, this small series adds AVX-512VL block functions. The VL
> variant works on 256-bit ymm registers, but compared to AVX2 can benefit
> from the new instructions.
>
> Compared to the AVX2 version, these block functions bring an overall
> speed improvement across encryption lengths of ~20%. Below the tcrypt
> results for additional block sizes in kOps/s, for the current AVX2
> code path, the new AVX-512VL code path and the comparison to Zinc in
> AVX2 and AVX-512VL. All numbers from a Xeon Platinum 8168 (2.7GHz).
>
> These numbers result in a very nice chart, available at:
> https://download.strongswan.org/misc/chacha-avx-512vl.svg
>
>                      zinc   zinc
>  len   avx2  512vl   avx2  512vl
>    8   5719   5672   5468   5612
>   16   5675   5627   5355   5621
>   24   5687   5601   5322   5633
>   32   5667   5622   5244   5564
>   40   5603   5582   5337   5578
>   48   5638   5539   5400   5556
>   56   5624   5566   5375   5482
>   64   5590   5573   5352   5531
>   72   4841   5467   3365   3457
>   80   5316   5761   3310   3381
>   88   4798   5470   3239   3343
>   96   5324   5723   3197   3281
>  104   4819   5460   3155   3232
>  112   5266   5749   3020   3195
>  120   4776   5391   2959   3145
>  128   5291   5723   3398   3489
>  136   4122   4837   3321   3423
>  144   4507   5057   3247   3389
>  152   4139   4815   3233   3329
>  160   4482   5043   3159   3256
>  168   4142   4766   3131   3224
>  176   4506   5028   3073   3162
>  184   4119   4772   3010   3109
>  192   4499   5016   3402   3502
>  200   4127   4766   3329   3448
>  208   4452   5012   3276   3371
>  216   4128   4744   3243   3334
>  224   4484   5008   3203   3298
>  232   4103   4772   3141   3237
>  240   4458   4963   3115   3217
>  248   4121   4751   3085   3177
>  256   4461   4987   3364   4046
>  264   3406   4282   3270   4006
>  272   3408   4287   3207   3961
>  280   3371   4271   3203   3825
>  288   3625   4301   3129   3751
>  296   3402   4283   3093   3688
>  304   3401   4247   3062   3637
>  312   3382   4282   2995   3614
>  320   3611   4279   3305   4070
>  328   3386   4260   3276   3968
>  336   3369   4288   3171   3929
>  344   3389   4289   3134   3847
>  352   3609   4266   3127   3720
>  360   3355   4252   3076   3692
>  368   3387   4264   3048   3650
>  376   3387   4238   2967   3553
>  384   3568   4265   3277   4035
>  392   3369   4262   3299   3973
>  400   3362   4235   3239   3899
>  408   3352   4269   3196   3843
>  416   3585   4243   3127   3736
>  424   3364   4216   3092   3672
>  432   3341   4246   3067   3628
>  440   3353   4235   3018   3593
>  448   3538   4245   3327   4035
>  456   3322   4244   3275   3900
>  464   3340   4237   3212   3880
>  472   3330   4242   3054   3802
>  480   3530   4234   3078   3707
>  488   3337   4228   3094   3664
>  496   3330   4223   3015   3591
>  504   3317   4214   3002   3517
>  512   3531   4197   3339   4016
>  520   2511   3101   2030   2682
>  528   2627   3087   2027   2641
>  536   2508   3102   2001   2601
>  544   2638   3090   1964   2564
>  552   2494   3077   1962   2516
>  560   2625   3064   1941   2515
>  568   2500   3086   1922   2493
>  576   2611   3074   2050   2689
>  584   2482   3062   2041   2680
>  592   2595   3074   2026   2644
>  600   2470   3060   1985   2595
>  608   2581   3039   1961   2555
>  616   2478   3062   1956   2521
>  624   2587   3066   1930   2493
>  632   2457   3053   1923   2486
>  640   2581   3050   2059   2712
>  648   2296   2839   2024   2655
>  656   2389   2845   2019   2642
>  664   2292   2842   2002   2610
>  672   2404   2838   1959   2537
>  680   2273   2827   1956   2527
>  688   2389   2840   1938   2510
>  696   2280   2837   1911   2463
>  704   2370   2819   2055   2702
>  712   2277   2834   2029   2663
>  720   2369   2829   2020   2625
>  728   2255   2820   2001   2600
>  736   2373   2819   1958   2543
>  744   2269   2827   1956   2524
>  752   2364   2817   1937   2492
>  760   2270   2805   1909   2483
>  768   2378   2820   2050   2696
>  776   2053   2700   2002   2643
>  784   2066   2693   1922   2640
>  792   2065   2703   1928   2602
>  800   2138   2706   1962   2535
>  808   2065   2679   1938   2528
>  816   2063   2699   1929   2500
>  824   2053   2676   1915   2468
>  832   2149   2692   2036   2693
>  840   2055   2689   2024   2659
>  848   2049   2689   2006   2610
>  856   2057   2702   1979   2585
>  864   2144   2703   1960   2547
>  872   2047   2685   1945   2501
>  880   2055   2683   1902   2497
>  888   2060   2689   1897   2478
>  896   2139   2693   2023   2663
>  904   2049   2686   1970   2644
>  912   2055   2688   1925   2621
>  920   2047   2685   1911   2572
>  928   2114   2695   1907   2545
>  936   2055   2681   1927   2492
>  944   2055   2693   1930   2478
[PATCH 0/3] crypto: x86/chacha20 - AVX-512VL block functions
In the quest for pushing the limits of chacha20 encryption for both IPsec
and Wireguard, this small series adds AVX-512VL block functions. The VL
variant works on 256-bit ymm registers, but compared to AVX2 can benefit
from the new instructions.

Compared to the AVX2 version, these block functions bring an overall
speed improvement across encryption lengths of ~20%. Below the tcrypt
results for additional block sizes in kOps/s, for the current AVX2
code path, the new AVX-512VL code path and the comparison to Zinc in
AVX2 and AVX-512VL. All numbers from a Xeon Platinum 8168 (2.7GHz).

These numbers result in a very nice chart, available at:
  https://download.strongswan.org/misc/chacha-avx-512vl.svg

                     zinc   zinc
 len   avx2  512vl   avx2  512vl
   8   5719   5672   5468   5612
  16   5675   5627   5355   5621
  24   5687   5601   5322   5633
  32   5667   5622   5244   5564
  40   5603   5582   5337   5578
  48   5638   5539   5400   5556
  56   5624   5566   5375   5482
  64   5590   5573   5352   5531
  72   4841   5467   3365   3457
  80   5316   5761   3310   3381
  88   4798   5470   3239   3343
  96   5324   5723   3197   3281
 104   4819   5460   3155   3232
 112   5266   5749   3020   3195
 120   4776   5391   2959   3145
 128   5291   5723   3398   3489
 136   4122   4837   3321   3423
 144   4507   5057   3247   3389
 152   4139   4815   3233   3329
 160   4482   5043   3159   3256
 168   4142   4766   3131   3224
 176   4506   5028   3073   3162
 184   4119   4772   3010   3109
 192   4499   5016   3402   3502
 200   4127   4766   3329   3448
 208   4452   5012   3276   3371
 216   4128   4744   3243   3334
 224   4484   5008   3203   3298
 232   4103   4772   3141   3237
 240   4458   4963   3115   3217
 248   4121   4751   3085   3177
 256   4461   4987   3364   4046
 264   3406   4282   3270   4006
 272   3408   4287   3207   3961
 280   3371   4271   3203   3825
 288   3625   4301   3129   3751
 296   3402   4283   3093   3688
 304   3401   4247   3062   3637
 312   3382   4282   2995   3614
 320   3611   4279   3305   4070
 328   3386   4260   3276   3968
 336   3369   4288   3171   3929
 344   3389   4289   3134   3847
 352   3609   4266   3127   3720
 360   3355   4252   3076   3692
 368   3387   4264   3048   3650
 376   3387   4238   2967   3553
 384   3568   4265   3277   4035
 392   3369   4262   3299   3973
 400   3362   4235   3239   3899
 408   3352   4269   3196   3843
 416   3585   4243   3127   3736
 424   3364   4216   3092   3672
 432   3341   4246   3067   3628
 440   3353   4235   3018   3593
 448   3538   4245   3327   4035
 456   3322   4244   3275   3900
 464   3340   4237   3212   3880
 472   3330   4242   3054   3802
 480   3530   4234   3078   3707
 488   3337   4228   3094   3664
 496   3330   4223   3015   3591
 504   3317   4214   3002   3517
 512   3531   4197   3339   4016
 520   2511   3101   2030   2682
 528   2627   3087   2027   2641
 536   2508   3102   2001   2601
 544   2638   3090   1964   2564
 552   2494   3077   1962   2516
 560   2625   3064   1941   2515
 568   2500   3086   1922   2493
 576   2611   3074   2050   2689
 584   2482   3062   2041   2680
 592   2595   3074   2026   2644
 600   2470   3060   1985   2595
 608   2581   3039   1961   2555
 616   2478   3062   1956   2521
 624   2587   3066   1930   2493
 632   2457   3053   1923   2486
 640   2581   3050   2059   2712
 648   2296   2839   2024   2655
 656   2389   2845   2019   2642
 664   2292   2842   2002   2610
 672   2404   2838   1959   2537
 680   2273   2827   1956   2527
 688   2389   2840   1938   2510
 696   2280   2837   1911   2463
 704   2370   2819   2055   2702
 712   2277   2834   2029   2663
 720   2369   2829   2020   2625
 728   2255   2820   2001   2600
 736   2373   2819   1958   2543
 744   2269   2827   1956   2524
 752   2364   2817   1937   2492
 760   2270   2805   1909   2483
 768   2378   2820   2050   2696
 776   2053   2700   2002   2643
 784   2066   2693   1922   2640
 792   2065   2703   1928   2602
 800   2138   2706   1962   2535
 808   2065   2679   1938   2528
 816   2063   2699   1929   2500
 824   2053   2676   1915   2468
 832   2149   2692   2036   2693
 840   2055   2689   2024   2659
 848   2049   2689   2006   2610
 856   2057   2702   1979   2585
 864   2144   2703   1960   2547
 872   2047   2685   1945   2501
 880   2055   2683   1902   2497
 888   2060   2689   1897   2478
 896   2139   2693   2023   2663
 904   2049   2686   1970   2644
 912   2055   2688   1925   2621
 920   2047   2685   1911   2572
 928   2114   2695   1907   2545
 936   2055   2681   1927   2492
 944   2055   2693   1930   2478
 952   2042   2688   1909   2471
 960   2136   2682   2014   2672
 968   2054   2687   1999   2626
 976   2040   2682   1982   2598
 984   2055   2687   1943   2569
 992   2138   2694   1884   2522
1000   2036   2681   1929   2506
1008   2052   2676   1926   2475
1016   2050   2686   1889   2430
1024   2125   2670   2039   2656
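
For readers who want to check the per-length gain themselves, here is a
minimal sketch of the arithmetic. It is not part of the series. It copies only
a handful of rows (avx2 and 512vl columns) from the table above, and the
names SAMPLE_ROWS, speedup_percent and per_length_speedups are made up for
illustration. Since it covers a small sample, it is not expected to
reproduce the overall ~20% figure, which refers to all encryption lengths.

```python
# Illustrative only: a few rows copied from the table above as
# (len in bytes, avx2 kOps/s, avx-512vl kOps/s). Not the full data set.
SAMPLE_ROWS = [
    (64, 5590, 5573),
    (128, 5291, 5723),
    (256, 4461, 4987),
    (512, 3531, 4197),
    (1024, 2125, 2670),
]


def speedup_percent(baseline_kops, new_kops):
    """Relative throughput change of new_kops over baseline_kops, in percent."""
    return (new_kops / baseline_kops - 1.0) * 100.0


def per_length_speedups(rows):
    """Map each length to the AVX-512VL gain over AVX2 for that row."""
    return {length: speedup_percent(avx2, vl) for length, avx2, vl in rows}


if __name__ == "__main__":
    for length, pct in per_length_speedups(SAMPLE_ROWS).items():
        print(f"{length:5d} bytes: {pct:+.1f}%")
```

In this sample the 64-byte row comes out slightly negative, and the gain
grows with length, which matches the trend visible in the table.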