On Tue, Jun 17, 2025 at 1:55 AM John Naylor <johncnaylo...@gmail.com> wrote:

I took the minimal repro from [1] and compared the code generated by
clang 17 at -O0 [2] and at -O3 [3]. At -O3 (and also at -O1 and -O2),
clang generated the following code for:

castval = _mm512_castsi128_si512(_mm_cvtsi32_si128(crc0));
x0 = _mm512_xor_si512(castval, x0);

vinserti128  ymm0, ymm0, xmmword ptr [rip + .LCPI1_0], 0
vpxorq  zmm0, zmm0, zmmword ptr [rdi]
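To make the intent of the intrinsic pair above concrete, here is a
hypothetical scalar model in plain C. The names vec512, model_zext_crc,
model_cast_crc and model_xor512 are illustrative only, not PostgreSQL or
compiler APIs. It treats a 512-bit register as 16 32-bit lanes. With zero
extension, XORing into x0 changes only lane 0. With
_mm512_castsi128_si512, lanes 4..15 are undefined, modeled here by an
arbitrary "junk" value that then leaks into the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model: a 512-bit register as 16 x 32-bit lanes. */
typedef struct
{
    uint32_t lane[16];
} vec512;

/* Models _mm512_zextsi128_si512(_mm_cvtsi32_si128(crc)):
 * lane 0 holds crc and lanes 1..15 are guaranteed to be zero. */
static vec512
model_zext_crc(uint32_t crc)
{
    vec512 v;

    memset(&v, 0, sizeof(v));
    v.lane[0] = crc;
    return v;
}

/* Models _mm512_castsi128_si512(_mm_cvtsi32_si128(crc)):
 * lanes 0..3 come from the 128-bit value (crc, 0, 0, 0), but lanes 4..15
 * are undefined; 'junk' stands in for whatever happens to be there. */
static vec512
model_cast_crc(uint32_t crc, uint32_t junk)
{
    vec512 v;

    for (int i = 0; i < 16; i++)
        v.lane[i] = (i < 4) ? 0 : junk;
    v.lane[0] = crc;
    return v;
}

/* Models _mm512_xor_si512: lane-wise XOR of two 512-bit values. */
static vec512
model_xor512(vec512 a, vec512 b)
{
    vec512 r;

    for (int i = 0; i < 16; i++)
        r.lane[i] = a.lane[i] ^ b.lane[i];
    return r;
}
```

In this model, the zero-extended version folds the CRC into the first
32 bits of the block and leaves the rest of x0 untouched. The cast
version is only correct if the undefined upper lanes happen to be zero.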

Reading vpxorq's pseudocode [4], it seems that it zeroes out the upper
(leading) bits of the destination:

DEST[MAXVL-1:VL] := 0
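The pseudocode rule above can be modeled in plain C as a sketch. The
names vreg and model_write_vl are hypothetical, and MAXVL is assumed to be
512 bits as on an AVX-512 capable CPU. The function writes a VL-bit result
into the destination and then clears bits MAXVL-1:VL, so a 128-bit write
leaves the upper 384 bits zero. For VL = 512 with MAXVL = 512 the cleared
range is empty.

```c
#include <assert.h>
#include <stdint.h>

#define MAXVL 512               /* assumed maximum vector length, in bits */

/* Hypothetical model of a vector register as 64-bit words. */
typedef struct
{
    uint64_t word[MAXVL / 64];
} vreg;

/* Write a VL-bit result into dest, then apply DEST[MAXVL-1:VL] := 0. */
static void
model_write_vl(vreg *dest, const uint64_t *result, int vl)
{
    assert(vl > 0 && vl % 64 == 0 && vl <= MAXVL);

    /* DEST[VL-1:0] := result */
    for (int i = 0; i < vl / 64; i++)
        dest->word[i] = result[i];

    /* DEST[MAXVL-1:VL] := 0 */
    for (int i = vl / 64; i < MAXVL / 64; i++)
        dest->word[i] = 0;
}
```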

The same is true for clang 17 at -O0 if we use _mm512_zextsi128_si512
instead [5]: it emits vpxor and vbroadcast128, which also seem to zero
out the upper bits.

So, at -O1 through -O3, clang was indeed emitting instructions that
zero-extend the value, thus avoiding the undefined behavior.

[1] https://www.postgresql.org/message-id/PH8PR11MB8286A89AF2B104044187E54DFB70A%40PH8PR11MB8286.namprd11.prod.outlook.com
[2] https://godbolt.org/z/ahx9PePYr
[3] https://godbolt.org/z/W4WPzjnbb
[4] https://www.felixcloutier.com/x86/pxor#vpxorq--evex-encoded-versions-
[5] https://godbolt.org/z/46brvrnnv

Regards,
Deep (VMware)
