On Tue, Jun 17, 2025 at 6:40 AM Andy Fan <zhihuifan1...@163.com> wrote:
>
> "Devulapalli, Raghuveer" <raghuveer.devulapa...@intel.com> writes:
>
> > Great catch! From the intrinsic manual:
> >
> > Cast vector of type __m128i to type __m512i; the upper 384 bits of the
> > result are undefined.

Thanks Raghuveer and Nathan, for the diagnosis!

> Just be curious, what kind of optimization (like what -O2 does) could
> mask this issue?

In case Andy is asking about "how" rather than "under what
circumstances", my guess is: at -O1 and above, the compiler may have
just chosen instructions that also happen to zero-extend the upper
bits, which are common. -O0 doesn't represent the naive,
straightforward structure of what the programmer wrote; it's more like
an "exploded" representation suitable for later optimization passes.
That's why it always looks goofy.
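
To make the failure mode concrete, here is a rough portable model of the
two intrinsics. It is only a sketch: model512, model_cast128,
model_zext128, model_xor and upper_lanes_equal are hypothetical names
invented for illustration, and the "junk" argument stands in for whatever
the register happened to hold before the cast. Real code would use
_mm512_castsi128_si512 and _mm512_zextsi128_si512 directly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 512-bit register: eight 64-bit lanes. */
typedef struct
{
	uint64_t	lane[8];
} model512;

/*
 * Models _mm512_castsi128_si512: only the low 128 bits (lanes 0-1) are
 * defined. The upper 384 bits keep whatever "junk" was there before.
 */
static model512
model_cast128(uint64_t lo, uint64_t hi, const model512 *junk)
{
	model512	r = *junk;

	r.lane[0] = lo;
	r.lane[1] = hi;
	return r;
}

/* Models _mm512_zextsi128_si512: the upper 384 bits are always zero. */
static model512
model_zext128(uint64_t lo, uint64_t hi)
{
	model512	r;

	memset(&r, 0, sizeof(r));
	r.lane[0] = lo;
	r.lane[1] = hi;
	return r;
}

/* Lane-wise XOR, standing in for _mm512_xor_si512. */
static model512
model_xor(model512 a, model512 b)
{
	for (int i = 0; i < 8; i++)
		a.lane[i] ^= b.lane[i];
	return a;
}

/* Returns 1 if lanes 2..7 (the upper 384 bits) of a and b are identical. */
static int
upper_lanes_equal(const model512 *a, const model512 *b)
{
	return memcmp(&a->lane[2], &b->lane[2], 6 * sizeof(uint64_t)) == 0;
}
```

In this model, XORing the zero-extended value into x0 changes only the
low lanes, while the cast version also XORs the leftover junk into the
upper lanes of x0. If the compiler happens to pick a zero-extending
instruction for the cast, the junk is all zeros and the two behave the
same, which would be consistent with the problem only showing up at -O0.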

> > Replacing that with _mm512_zextsi128_si512 fixes the problem.

Here's a patch for testing, which also reverts the previous
workaround. Help welcome, but I still promise to test it in the near
future regardless.

--
John Naylor
Amazon Web Services
diff --git a/src/port/pg_crc32c_sse42.c b/src/port/pg_crc32c_sse42.c
index 9af3474a6ca..1a717255355 100644
--- a/src/port/pg_crc32c_sse42.c
+++ b/src/port/pg_crc32c_sse42.c
@@ -123,7 +123,7 @@ pg_comp_crc32c_avx512(pg_crc32c crc, const void *data, size_t len)
 		__m512i		k;
 
 		k = _mm512_broadcast_i32x4(_mm_setr_epi32(0x740eef02, 0, 0x9e4addf8, 0));
-		x0 = _mm512_xor_si512(_mm512_castsi128_si512(_mm_cvtsi32_si128(crc0)), x0);
+		x0 = _mm512_xor_si512(_mm512_zextsi128_si512(_mm_cvtsi32_si128(crc0)), x0);
 		buf += 64;
 
 		/* Main loop. */
diff --git a/src/port/pg_crc32c_sse42_choose.c b/src/port/pg_crc32c_sse42_choose.c
index 802e47788c1..74d2421ba2b 100644
--- a/src/port/pg_crc32c_sse42_choose.c
+++ b/src/port/pg_crc32c_sse42_choose.c
@@ -95,9 +95,7 @@ pg_comp_crc32c_choose(pg_crc32c crc, const void *data, size_t len)
 			__cpuidex(exx, 7, 0);
 #endif
 
-#if defined(__clang__) && !defined(__OPTIMIZE__)
-			/* Some versions of clang are broken at -O0 */
-#elif defined(USE_AVX512_CRC32C_WITH_RUNTIME_CHECK)
+#ifdef USE_AVX512_CRC32C_WITH_RUNTIME_CHECK
 			if (exx[2] & (1 << 10) &&	/* VPCLMULQDQ */
 				exx[1] & (1 << 31)) /* AVX512-VL */
 				pg_comp_crc32c = pg_comp_crc32c_avx512;
