On Fri, Jan 10, 2025 at 09:38:14AM -0600, Nathan Bossart wrote:
> On Fri, Jan 10, 2025 at 11:10:03AM +0000, chiranmoy.bhattacha...@fujitsu.com wrote:
>> We tried auto-vectorization and observed no performance improvement.
>
> Do you mean that the auto-vectorization worked and you observed no
> performance improvement, or the auto-vectorization had no effect on the
> code generated?

I was able to get auto-vectorization to take effect on Apple clang 16 with
the following addition to src/backend/utils/adt/Makefile:

    encode.o: CFLAGS += ${CFLAGS_VECTORIZE} -mllvm -force-vector-width=8

This gave the following results with your hex_encode_test() function:

      buf  | HEAD  | patch | % diff
    -------+-------+-------+--------
        16 |    21 |    16 |     24
        64 |    54 |    41 |     24
       256 |   138 |   100 |     28
      1024 |   441 |   300 |     32
      4096 |  1671 |  1106 |     34
     16384 |  6890 |  4570 |     34
     65536 | 27393 | 18054 |     34

This doesn't compare with the gains you are claiming to see with
intrinsics, but it's not bad for a one line change.  I bet there are ways
to adjust the code so that the auto-vectorization is more effective, too.

-- 
nathan
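
As a rough illustration of the kind of code adjustment mentioned above, here
is a minimal sketch of a hex encoding loop that computes each digit
arithmetically instead of indexing a lookup table.  This is not code from
the thread or from encode.c, and it has not been benchmarked; the names
hex_digit() and hex_encode_sketch() are hypothetical.  The assumption is
that an indexed loop with no table lookups and non-aliasing pointers may be
easier for a compiler to vectorize, which would need to be confirmed by
inspecting the generated code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a value in 0..15 to its lowercase hex digit
 * without a table.  0..9 map to '0'..'9'; 10..15 need an extra offset of
 * 39 ('a' - '0' - 10) to land on 'a'..'f'.
 */
static inline char
hex_digit(uint8_t nibble)
{
	return (char) ('0' + nibble + (nibble > 9 ? 39 : 0));
}

/*
 * Hypothetical sketch: write 2 * len hex characters to dst (no terminating
 * NUL) and return the number of bytes written.  "restrict" tells the
 * compiler that src and dst do not overlap.
 */
size_t
hex_encode_sketch(const char *restrict src, size_t len, char *restrict dst)
{
	for (size_t i = 0; i < len; i++)
	{
		uint8_t		b = (uint8_t) src[i];

		dst[2 * i] = hex_digit(b >> 4);		/* high nibble first */
		dst[2 * i + 1] = hex_digit(b & 0xF);	/* then low nibble */
	}
	return len * 2;
}
```

Whether this form actually vectorizes, and whether it beats the table-based
loop, depends on the compiler and flags in use.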