On Mon, Mar 10, 2025 at 03:48:31PM +0700, John Naylor wrote:
> On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart <nathandboss...@gmail.com> 
> wrote:
>> Overall, I wish we could avoid splitting things into separate files and
>> adding more header file gymnastics, but maybe there isn't much better we
>> can do without overhauling the CPU feature detection code.
> 
> I wanted to make an attempt to make this aspect nicer. v13-0002
> incorporates deliberately compact and simple loops for inlined
> constant input into the dispatch function, and leaves the existing
> code alone. This avoids code churn and saves vertical space in the
> copied code. It needs a bit more commentary, but I hope this is a more
> digestible prerequisite to the CLMUL algorithm -- as a reminder, it'll
> be simpler if we can always assume non-constant input can go through a
> function pointer.

That is certainly more readable.  FWIW I think it would be entirely
reasonable to replace the pg_crc32c_sse42.c implementation with a call to
this new pg_comp_crc32c_dispatch() function.  Of course, you'd have to
split things up like:

        pg_attribute_no_sanitize_alignment()
        static inline
        pg_crc32c
        pg_comp_crc32c_sse42_inline(pgcrc32c crc, const void *data, size_t len)
        {
                const unsigned char *p = data;

        #ifdef __x86_64__
                for (; len >= 8; p += 8, len -= 8)
                        crc = _mm_crc32_u64(crc, *(const uint64 *) p);
        #endif
                for (; len >= 4; p += 4, len -= 4)
                        crc = _mm_crc32_u32(crc, *(const uint32 *) p);
                for (; len > 0; len--)
                        crc = _mm_crc32_u8(crc, *p++)
                return crc;
        }

        pg_attribute_no_sanitize_alignment()
        static inline
        pg_crc32c
        pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len)
        {
                if (__builtin_constant_p(len))
                        return pg_comp_crc32c_sse42_inline(crc, data, len);
                return pg_comp_crc32c_sse42(crc, data, len);
        }

And then in pg_crc32c_sse42.c:

        pg_attribute_no_sanitize_alignment()
        pg_attribute_target("sse4.2")
        pg_crc32c
        pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len)
        {
                return pg_comp_crc32c_sse42_inline(crc, data, len);
        }

Granted, that adds back some of the complexity you were trying to avoid
(and is very similar to your v12 patches), but it's pretty compact and
avoids some code duplication.  FTR I don't feel too strongly about this,
but on balance I guess I'd be okay with a tad more complexity than your
v13 patches if it served a useful purpose.

-- 
nathan


Reply via email to