Here's the rebased patch with a few modifications.

On r8g.4xlarge, the hand-unrolled hex encode performs better than the
non-unrolled SVE version for inputs of 64 bytes and larger. On m7g.4xlarge,
unrolling gives no improvement.
I also added line-by-line comments that explain the changes using an example.
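
For readers unfamiliar with the term, here is a minimal scalar sketch of
what "hand-unrolled" means for a hex encoder. It is not the code from the
patch, which uses SVE intrinsics; the names hex_encode_simple,
hex_encode_unrolled and hextbl are made up for this illustration, and the
unroll factor of two is an assumption chosen only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lookup table for the 16 hex digits (illustrative name). */
static const char hextbl[] = "0123456789abcdef";

/* Plain loop: each iteration turns one input byte into two hex digits. */
static size_t
hex_encode_simple(const uint8_t *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        dst[2 * i] = hextbl[src[i] >> 4];
        dst[2 * i + 1] = hextbl[src[i] & 0x0F];
    }
    return len * 2;
}

/*
 * Hand-unrolled by two: each iteration handles two input bytes, so the
 * loop counter is tested and advanced half as often.  A tail loop picks
 * up the last byte when len is odd.
 */
static size_t
hex_encode_unrolled(const uint8_t *src, size_t len, char *dst)
{
    size_t i = 0;

    for (; i + 2 <= len; i += 2)
    {
        uint8_t b0 = src[i];
        uint8_t b1 = src[i + 1];

        dst[2 * i] = hextbl[b0 >> 4];
        dst[2 * i + 1] = hextbl[b0 & 0x0F];
        dst[2 * i + 2] = hextbl[b1 >> 4];
        dst[2 * i + 3] = hextbl[b1 & 0x0F];
    }

    /* Tail: at most one remaining byte. */
    for (; i < len; i++)
    {
        dst[2 * i] = hextbl[src[i] >> 4];
        dst[2 * i + 1] = hextbl[src[i] & 0x0F];
    }
    return len * 2;
}
```

Both functions write exactly 2 * len bytes into dst and do not add a
terminating NUL, so the caller has to size the buffer accordingly. Whether
unrolling pays off depends on the core, which is consistent with the
different behaviour of the two instance types in the tables below.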

Below are the results. Input size is in bytes, and exec time is in ms.

encode - r8g.4xlarge

 Input | master |   SVE  | SVE-unrolled
-------+--------+--------+--------------
     8 |  4.971 |  6.434 |        6.623
    16 |  8.532 |  4.399 |        4.710
    24 | 12.296 |  5.007 |        5.780
    32 | 16.003 |  5.027 |        5.234
    40 | 19.628 |  5.807 |        6.201
    48 | 23.277 |  5.815 |        6.222
    56 | 26.927 |  6.744 |        7.030
    64 | 30.419 |  6.774 |        6.347
   128 | 83.250 | 10.214 |        9.104
   256 |112.158 | 17.892 |       16.313
   512 |216.544 | 31.060 |       29.876
  1024 |429.351 | 59.310 |       53.374
  2048 |854.677 |116.769 |      101.004
  4096 |1706.528|237.322 |      195.297
  8192 |3723.884|499.520 |      385.424
---------------------------------------

encode - m7g.4xlarge

 Input | master |   SVE  | SVE-unrolled
-------+--------+--------+--------------
     8 |  5.503 |  7.986 |        8.053
    16 |  9.881 |  9.583 |        9.888
    24 | 13.854 |  9.212 |       10.138
    32 | 18.056 |  9.208 |        9.364
    40 | 22.127 | 10.134 |       10.540
    48 | 26.214 | 10.186 |       10.550
    56 | 29.718 | 10.197 |       10.428
    64 | 33.613 | 10.982 |       10.497
   128 | 66.060 | 12.460 |       12.624
   256 |130.225 | 18.491 |       18.872
   512 |267.105 | 30.343 |       31.661
  1024 |515.603 | 54.371 |       55.341
  2048 |1013.766|103.898 |      105.192
  4096 |2018.705|202.653 |      203.142
  8192 |4000.496|400.918 |      401.842
---------------------------------------

decode - r8g.4xlarge

 Input | master |   SVE
-------+--------+--------
     8 |  7.641 |  8.787
    16 | 14.301 | 14.477
    32 | 28.663 |  6.091
    48 | 42.940 | 17.604
    64 | 57.483 | 10.549
    80 | 71.637 | 19.194
    96 | 85.918 | 15.586
   112 |100.272 | 25.956
   128 |114.740 | 19.829
   256 |229.176 | 36.032
   512 |458.295 | 68.222
  1024 |916.741 |132.927
  2048 |1833.422|262.741
  4096 |3667.096|522.009
  8192 |7333.886|1042.447
---------------------------------------

decode - m7g.4xlarge

 Input | master |   SVE
-------+--------+--------
     8 |  8.194 |  9.433
    16 | 14.397 | 15.606
    32 | 26.669 | 29.006
    48 | 45.971 | 48.984
    64 | 58.468 | 12.388
    80 | 70.820 | 22.295
    96 | 84.792 | 43.470
   112 | 98.992 | 54.282
   128 |113.250 | 25.508
   256 |218.743 | 45.165
   512 |414.133 | 86.800
  1024 |828.493 |174.670
  2048 |1617.921|346.375
  4096 |3259.159|689.391
  8192 |6551.879|1376.195

--------
Chiranmoy

Attachment: v5-0001-SVE-support-for-hex-coding.patch
