On Fri, Jan 10, 2025 at 09:38:14AM -0600, Nathan Bossart wrote:
> On Fri, Jan 10, 2025 at 11:10:03AM +0000, chiranmoy.bhattacha...@fujitsu.com 
> wrote:
>> We tried auto-vectorization and observed no performance improvement.
> 
> Do you mean that the auto-vectorization worked and you observed no
> performance improvement, or the auto-vectorization had no effect on the
> code generated?

I was able to get auto-vectorization to take effect on Apple clang 16 with
the following addition to src/backend/utils/adt/Makefile:

        encode.o: CFLAGS += ${CFLAGS_VECTORIZE} -mllvm -force-vector-width=8

This gave the following results with your hex_encode_test() function:

    buf  | HEAD  | patch | % reduction
  -------+-------+-------+-------------
      16 |    21 |    16 |   24
      64 |    54 |    41 |   24
     256 |   138 |   100 |   28
    1024 |   441 |   300 |   32
    4096 |  1671 |  1106 |   34
   16384 |  6890 |  4570 |   34
   65536 | 27393 | 18054 |   34

This doesn't compare with the gains you are claiming to see with
intrinsics, but it's not bad for a one line change.  I bet there are ways
to adjust the code so that the auto-vectorization is more effective, too.

-- 
nathan

