On Tue, Jun 17, 2025 at 1:55 AM John Naylor <johncnaylo...@gmail.com> wrote:
I took the minimal repro from [1] and compared the code generated by clang 17 at -O0 [2] and at -O3 [3]. At -O3 (and also at -O1 and -O2), clang generated the following code for:

    castval = _mm512_castsi128_si512(_mm_cvtsi32_si128(crc0));
    x0 = _mm512_xor_si512(castval, x0);

    vinserti128 ymm0, ymm0, xmmword ptr [rip + .LCPI1_0], 0
    vpxorq zmm0, zmm0, zmmword ptr [rdi]

Reading vpxorq's pseudocode [4], it seems that it zeroes out the leading bits:

    DEST[MAXVL-1:VL] := 0

The same holds for clang 17 -O0 if we use _mm512_zextsi128_si512 instead [5]: vpxor and vbroadcast128 are used, which also seem to zero out the leading bits.

So -O1 through -O3 were indeed emitting instructions that zero-extend, thus avoiding the undefined behavior.

[1] https://www.postgresql.org/message-id/PH8PR11MB8286A89AF2B104044187E54DFB70A%40PH8PR11MB8286.namprd11.prod.outlook.com
[2] https://godbolt.org/z/ahx9PePYr
[3] https://godbolt.org/z/W4WPzjnbb
[4] https://www.felixcloutier.com/x86/pxor#vpxorq--evex-encoded-versions-
[5] https://godbolt.org/z/46brvrnnv

Regards,
Deep (VMware)
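
P.S. For anyone who wants to reason about the expected result without AVX-512 hardware, here is a minimal sketch of what the zero-extending form should compute. It is not taken from the patch: the function names xor_crc_zext_model and xor_crc_zext_avx512 and the ZMM_QWORDS macro are made up for illustration, and the intrinsic variant is only compiled when the compiler defines __AVX512F__ (for example with -mavx512f). The scalar model treats a 512-bit register as eight 64-bit lanes and shows that, with zero extension, only the low 32 bits of x0 can change.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ZMM_QWORDS 8	/* hypothetical: a 512-bit register as 8 x 64-bit lanes */

/*
 * Scalar model (illustrative only) of
 *   x0 = _mm512_xor_si512(_mm512_zextsi128_si512(_mm_cvtsi32_si128(crc0)), x0);
 * With zero extension, bits 32..511 of the widened value are defined to be
 * zero, so the XOR can only change the low 32 bits of x0.
 */
void
xor_crc_zext_model(uint64_t x0[ZMM_QWORDS], uint32_t crc0)
{
	uint64_t	widened[ZMM_QWORDS];

	assert(x0 != NULL);
	memset(widened, 0, sizeof(widened));	/* upper bits are zero */
	widened[0] = crc0;						/* bits 0..31 hold the CRC */

	for (int i = 0; i < ZMM_QWORDS; i++)
		x0[i] ^= widened[i];
}

#if defined(__AVX512F__)
#include <immintrin.h>

/*
 * Same operation with intrinsics, using the zero-extending
 * _mm512_zextsi128_si512 rather than _mm512_castsi128_si512, whose upper
 * 384 bits are undefined.
 */
void
xor_crc_zext_avx512(uint64_t x0[ZMM_QWORDS], uint32_t crc0)
{
	__m512i		v = _mm512_loadu_si512(x0);
	__m512i		widened = _mm512_zextsi128_si512(_mm_cvtsi32_si128((int) crc0));

	v = _mm512_xor_si512(widened, v);
	_mm512_storeu_si512(x0, v);
}
#endif
```

On a machine with AVX-512, both functions should leave the same bytes in x0 for the same inputs. A variant built on _mm512_castsi128_si512 carries no such guarantee, because its upper bits are undefined, even if a given compiler happens to emit zero-extending instructions.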