Here's the rebased patch with a few modifications. On r8g.4xlarge, the hand-unrolled hex encode performs better than the non-unrolled version for inputs of 64 bytes and larger. There is no improvement on m7g.4xlarge. I also added line-by-line comments that explain the changes with an example.
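For readers unfamiliar with the term, here is a minimal scalar sketch of what "hand-unrolled" means for a hex encode loop. It is not the code from the patch, which uses SVE intrinsics; the function names `hex_encode_simple` and `hex_encode_unrolled` and the lookup table `hextbl` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustration only: hypothetical names, not taken from the patch. */
static const char hextbl[] = "0123456789abcdef";

/* Plain loop: each iteration turns one input byte into two hex digits. */
static size_t
hex_encode_simple(const uint8_t *src, size_t len, char *dst)
{
	assert(src != NULL && dst != NULL);

	for (size_t i = 0; i < len; i++)
	{
		dst[2 * i] = hextbl[src[i] >> 4];
		dst[2 * i + 1] = hextbl[src[i] & 0xF];
	}
	return len * 2;
}

/*
 * Hand-unrolled by two: each iteration of the main loop handles two input
 * bytes, halving the number of loop-condition checks.  A scalar tail loop
 * handles the leftover byte when len is odd.
 */
static size_t
hex_encode_unrolled(const uint8_t *src, size_t len, char *dst)
{
	size_t		i = 0;

	assert(src != NULL && dst != NULL);

	for (; i + 2 <= len; i += 2)
	{
		uint8_t		a = src[i];
		uint8_t		b = src[i + 1];

		dst[2 * i] = hextbl[a >> 4];
		dst[2 * i + 1] = hextbl[a & 0xF];
		dst[2 * i + 2] = hextbl[b >> 4];
		dst[2 * i + 3] = hextbl[b & 0xF];
	}

	/* Tail: at most one remaining byte. */
	for (; i < len; i++)
	{
		dst[2 * i] = hextbl[src[i] >> 4];
		dst[2 * i + 1] = hextbl[src[i] & 0xF];
	}
	return len * 2;
}
```

Both functions produce identical output. Whether unrolling pays off depends on the core, which is consistent with the difference between the two instance types in the results.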
Below are the results. Input size is in bytes, and exec time is in ms.

encode - r8g.4xlarge

 Input |  master  |   SVE   | SVE-unrolled
-------+----------+---------+--------------
     8 |    4.971 |   6.434 |        6.623
    16 |    8.532 |   4.399 |        4.710
    24 |   12.296 |   5.007 |        5.780
    32 |   16.003 |   5.027 |        5.234
    40 |   19.628 |   5.807 |        6.201
    48 |   23.277 |   5.815 |        6.222
    56 |   26.927 |   6.744 |        7.030
    64 |   30.419 |   6.774 |        6.347
   128 |   83.250 |  10.214 |        9.104
   256 |  112.158 |  17.892 |       16.313
   512 |  216.544 |  31.060 |       29.876
  1024 |  429.351 |  59.310 |       53.374
  2048 |  854.677 | 116.769 |      101.004
  4096 | 1706.528 | 237.322 |      195.297
  8192 | 3723.884 | 499.520 |      385.424

---------------------------------------

encode - m7g.4xlarge

 Input |  master  |   SVE   | SVE-unrolled
-------+----------+---------+--------------
     8 |    5.503 |   7.986 |        8.053
    16 |    9.881 |   9.583 |        9.888
    24 |   13.854 |   9.212 |       10.138
    32 |   18.056 |   9.208 |        9.364
    40 |   22.127 |  10.134 |       10.540
    48 |   26.214 |  10.186 |       10.550
    56 |   29.718 |  10.197 |       10.428
    64 |   33.613 |  10.982 |       10.497
   128 |   66.060 |  12.460 |       12.624
   256 |  130.225 |  18.491 |       18.872
   512 |  267.105 |  30.343 |       31.661
  1024 |  515.603 |  54.371 |       55.341
  2048 | 1013.766 | 103.898 |      105.192
  4096 | 2018.705 | 202.653 |      203.142
  8192 | 4000.496 | 400.918 |      401.842

---------------------------------------

decode - r8g.4xlarge

 Input |  master  |   SVE
-------+----------+----------
     8 |    7.641 |    8.787
    16 |   14.301 |   14.477
    32 |   28.663 |    6.091
    48 |   42.940 |   17.604
    64 |   57.483 |   10.549
    80 |   71.637 |   19.194
    96 |   85.918 |   15.586
   112 |  100.272 |   25.956
   128 |  114.740 |   19.829
   256 |  229.176 |   36.032
   512 |  458.295 |   68.222
  1024 |  916.741 |  132.927
  2048 | 1833.422 |  262.741
  4096 | 3667.096 |  522.009
  8192 | 7333.886 | 1042.447

---------------------------------------

decode - m7g.4xlarge

 Input |  master  |   SVE
-------+----------+----------
     8 |    8.194 |    9.433
    16 |   14.397 |   15.606
    32 |   26.669 |   29.006
    48 |   45.971 |   48.984
    64 |   58.468 |   12.388
    80 |   70.820 |   22.295
    96 |   84.792 |   43.470
   112 |   98.992 |   54.282
   128 |  113.250 |   25.508
   256 |  218.743 |   45.165
   512 |  414.133 |   86.800
  1024 |  828.493 |  174.670
  2048 | 1617.921 |  346.375
  4096 | 3259.159 |  689.391
  8192 | 6551.879 | 1376.195

--------
Chiranmoy
v5-0001-SVE-support-for-hex-coding.patch