On Tue, Mar 04, 2025 at 12:09:09PM +0700, John Naylor wrote:
> On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart <nathandboss...@gmail.com> wrote:
>> This could potentially lead to a small regression for machines with SSE
>> 4.2 but not PCLMUL, but that may be uncommon enough at this point to not
>> worry about.
> 
> Note also upthread I mentioned we may have to go to 512-bit pclmul,
> since Zen 2 regresses on 128-bit. :-(

Ah, okay.  You mean the AVX-512 version [0]?  And are you thinking we'd use
the same strategy for the compiled-in-SSE4.2 builds, i.e., inline the
SSE4.2 version for small inputs and use a function pointer for larger ones?
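
For concreteness, here is a rough sketch of the dispatch shape I have in
mind.  Everything in it is an assumption for illustration: the names, the
64-byte threshold (the real cutoff would have to come from benchmarks), and
the portable bit-at-a-time loop, which only stands in for the SSE4.2 and
PCLMUL/AVX-512 code so that the example compiles anywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff; a real value would have to come from benchmarks. */
#define CRC_INLINE_THRESHOLD 64

/*
 * Portable bit-at-a-time CRC-32C (reflected polynomial 0x82F63B78).  This is
 * only a stand-in for the inlined SSE4.2 path.
 */
static inline uint32_t
crc32c_small(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--)
	{
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (UINT32_C(0x82F63B78) & (0U - (crc & 1)));
	}
	return crc;
}

/*
 * Stand-in for the large-input implementation.  In this sketch it simply
 * reuses the small-input code.
 */
static uint32_t
crc32c_large_fallback(uint32_t crc, const void *data, size_t len)
{
	return crc32c_small(crc, data, len);
}

/* Would be set once at startup, after a runtime CPU feature check. */
static uint32_t (*crc32c_large) (uint32_t, const void *, size_t) =
	crc32c_large_fallback;

/* Small inputs stay inline; larger ones go through the function pointer. */
static inline uint32_t
crc32c_update(uint32_t crc, const void *data, size_t len)
{
	if (len < CRC_INLINE_THRESHOLD)
		return crc32c_small(crc, data, len);
	return crc32c_large(crc, data, len);
}

/* Convenience wrapper: initialize, update, and finalize in one call. */
static inline uint32_t
crc32c(const void *data, size_t len)
{
	return ~crc32c_update(UINT32_C(0xFFFFFFFF), data, len);
}
```

The point of this shape is that callers with short inputs never pay for the
indirect call, while the runtime check only matters for inputs long enough
to amortize it.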

> I actually haven't seen any measurable difference with direct calls
> versus indirect, but it could very well be that the microbenchmark is
> hiding that since it's doing something unnatural by calling things a
> bunch of times in a loop. I want to try changing the benchmark to base
> the address it's computing on some bits from the crc from the last
> loop iteration. I think that would make it more latency-sensitive. We
> could also make it do an additional constant 20-byte input every time
> to make it resemble WAL more closely.

Looking back on some old benchmarks for small-ish inputs [1], the
difference does seem within the noise range.  I suppose these functions
might be expensive enough to make the function pointer overhead negligible.
IME there's a big difference when a function pointer is used for an
instruction or two [2], but even relatively small inputs to the CRC-32C
functions might require several instructions.
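
As I understand the benchmark change you describe, the loop might look
roughly like the sketch below.  The function names, the buffer layout, and
the all-zeros 20-byte header are my assumptions, and the bit-at-a-time CRC
is only a placeholder for whichever implementation is under test.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder CRC-32C update (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_bitwise(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--)
	{
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (UINT32_C(0x82F63B78) & (0U - (crc & 1)));
	}
	return crc;
}

/*
 * Hypothetical benchmark loop.  Each iteration picks its start offset from
 * the previous iteration's CRC, so the next load address cannot be known
 * until the previous result is available.  A constant 20-byte input is
 * checksummed first, loosely imitating a fixed-size record header.
 *
 * The caller must ensure buf_len >= input_len.
 */
static uint32_t
bench_dependent(const unsigned char *buf, size_t buf_len,
				size_t input_len, int iterations)
{
	static const unsigned char header[20] = {0};
	size_t		span = buf_len - input_len; /* valid offsets: 0 .. span */
	uint32_t	crc = 0;

	for (int i = 0; i < iterations; i++)
	{
		size_t		off = crc % (span + 1);
		uint32_t	c = UINT32_C(0xFFFFFFFF);

		c = crc32c_bitwise(c, header, sizeof(header));
		c = crc32c_bitwise(c, buf + off, input_len);
		crc = ~c;
	}
	return crc;
}
```

Returning the final CRC keeps the compiler from discarding the loop, and
the data dependency should expose call latency that a plain repeat loop
can hide.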

>> The main question I have is whether we can simplify this by always using a
>> runtime check and by inlining slicing-by-8 for small inputs.  That would be
>> dependent on the performance of slicing-by-8 and SSE 4.2 being comparable
>> for small inputs.
> 
> Slicing-by-8 needs one lookup and one XOR per byte of input, and other
> overheads, so I think it would still be very slow.

That's my suspicion, too.
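
For reference, a minimal sketch of the slicing-by-8 inner loop shows where
the per-byte cost comes from.  This is a generic textbook formulation, not
the code in the tree; the table and function names are made up, and the
tables are built at runtime only to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t sb8_table[8][256];

/* Build the eight lookup tables for CRC-32C (polynomial 0x82F63B78). */
static void
sb8_init(void)
{
	for (uint32_t n = 0; n < 256; n++)
	{
		uint32_t	crc = n;

		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (UINT32_C(0x82F63B78) & (0U - (crc & 1)));
		sb8_table[0][n] = crc;
	}
	for (uint32_t n = 0; n < 256; n++)
		for (int k = 1; k < 8; k++)
			sb8_table[k][n] = (sb8_table[k - 1][n] >> 8) ^
				sb8_table[0][sb8_table[k - 1][n] & 0xFF];
}

/* Assemble a little-endian 32-bit word from four bytes. */
static uint32_t
load_le32(const unsigned char *p)
{
	return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
		((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Update a running CRC; sb8_init() must have been called first. */
static uint32_t
sb8_update(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	/* Eight bytes per step: eight table lookups XORed together. */
	while (len >= 8)
	{
		uint32_t	lo = crc ^ load_le32(p);
		uint32_t	hi = load_le32(p + 4);

		crc = sb8_table[7][lo & 0xFF] ^ sb8_table[6][(lo >> 8) & 0xFF] ^
			sb8_table[5][(lo >> 16) & 0xFF] ^ sb8_table[4][lo >> 24] ^
			sb8_table[3][hi & 0xFF] ^ sb8_table[2][(hi >> 8) & 0xFF] ^
			sb8_table[1][(hi >> 16) & 0xFF] ^ sb8_table[0][hi >> 24];
		p += 8;
		len -= 8;
	}

	/* Remaining bytes, one at a time. */
	while (len--)
		crc = (crc >> 8) ^ sb8_table[0][(crc ^ *p++) & 0xFF];
	return crc;
}
```

Each 8-byte step costs eight dependent-on-input table loads plus the XORs
to combine them, whereas the SSE4.2 crc32 instruction handles 8 bytes in a
single instruction.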

[0] https://postgr.es/m/BL1PR11MB530401FA7E9B1CA432CF9DC3DC192%40BL1PR11MB5304.namprd11.prod.outlook.com
[1] https://postgr.es/m/20231031033601.GA68409%40nathanxps13
[2] https://postgr.es/m/CAApHDvqyMNGVgwpaOPtENdq5uEMR%3DvSkRJEgG1S%2BX7Vtk1-EnA%40mail.gmail.com

-- 
nathan

