On Mon, Mar 10, 2025 at 03:48:31PM +0700, John Naylor wrote: > On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart <nathandboss...@gmail.com> > wrote: >> Overall, I wish we could avoid splitting things into separate files and >> adding more header file gymnastics, but maybe there isn't much better we >> can do without overhauling the CPU feature detection code. > > I wanted to make an attempt to make this aspect nicer. v13-0002 > incorporates deliberately compact and simple loops for inlined > constant input into the dispatch function, and leaves the existing > code alone. This avoids code churn and saves vertical space in the > copied code. It needs a bit more commentary, but I hope this is a more > digestible prerequisite to the CLMUL algorithm -- as a reminder, it'll > be simpler if we can always assume non-constant input can go through a > function pointer.
That is certainly more readable. FWIW I think it would be entirely reasonable to replace the pg_crc32c_sse42.c implementation with a call to this new pg_comp_crc32c_dispatch() function. Of course, you'd have to split things up like: pg_attribute_no_sanitize_alignment() static inline pg_crc32c pg_comp_crc32c_sse42_inline(pgcrc32c crc, const void *data, size_t len) { const unsigned char *p = data; #ifdef __x86_64__ for (; len >= 8; p += 8, len -= 8) crc = _mm_crc32_u64(crc, *(const uint64 *) p); #endif for (; len >= 4; p += 4, len -= 4) crc = _mm_crc32_u32(crc, *(const uint32 *) p); for (; len > 0; len--) crc = _mm_crc32_u8(crc, *p++) return crc; } pg_attribute_no_sanitize_alignment() static inline pg_crc32c pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len) { if (__builtin_constant_p(len)) return pg_comp_crc32c_sse42_inline(crc, data, len); return pg_comp_crc32c_sse42(crc, data, len); } And then in pg_crc32c_sse42.c: pg_attribute_no_sanitize_alignment() pg_attribute_target("sse4.2") pg_crc32c pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len) { return pg_comp_crc32c_sse42_inline(crc, data, len); } Granted, that adds back some of the complexity you were trying to avoid (and is very similar to your v12 patches), but it's pretty compact and avoids some code duplication. FTR I don't feel too strongly about this, but on balance I guess I'd be okay with a tad more complexity than your v13 patches if it served a useful purpose. -- nathan