On Tue, Mar 04, 2025 at 12:09:09PM +0700, John Naylor wrote:
> On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart <nathandboss...@gmail.com> wrote:
>> This could potentially lead to a small regression for machines with SSE
>> 4.2 but not PCLMUL, but that may be uncommon enough at this point to not
>> worry about.
>
> Note also upthread I mentioned we may have to go to 512-bit pclmul,
> since Zen 2 regresses on 128-bit. :-(

Ah, okay. You mean the AVX-512 version [0]? And are you thinking we'd use
the same strategy for the compiled-in-SSE4.2 builds, i.e., inline the
SSE4.2 version for small inputs and use a function pointer for larger ones?

> I actually haven't seen any measurable difference with direct calls
> versus indirect, but it could very well be that the microbenchmark is
> hiding that since it's doing something unnatural by calling things a
> bunch of times in a loop. I want to try changing the benchmark to base
> the address it's computing on some bits from the crc from the last
> loop iteration. I think that would make it more latency-sensitive. We
> could also make it do an additional constant 20-byte input every time
> to make it resemble WAL more closely.

Looking back on some old benchmarks for small-ish inputs [1], the
difference does seem within the noise range. I suppose these functions
might be expensive enough to make the function pointer overhead negligible.
IME there's a big difference when a function pointer is used for an
instruction or two [2], but even relatively small inputs to the CRC-32C
functions might require several instructions.

>> The main question I have is whether we can simplify this by always using a
>> runtime check and by inlining slicing-by-8 for small inputs. That would be
>> dependent on the performance of slicing-by-8 and SSE 4.2 being comparable
>> for small inputs.
>
> Slicing-by-8 needs one lookup and one XOR per byte of input, and other
> overheads, so I think it would still be very slow.

That's my suspicion, too.

[0] https://postgr.es/m/BL1PR11MB530401FA7E9B1CA432CF9DC3DC192%40BL1PR11MB5304.namprd11.prod.outlook.com
[1] https://postgr.es/m/20231031033601.GA68409%40nathanxps13
[2] https://postgr.es/m/CAApHDvqyMNGVgwpaOPtENdq5uEMR%3DvSkRJEgG1S%2BX7Vtk1-EnA%40mail.gmail.com

-- 
nathan
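
For illustration only, here is a rough sketch of the dispatch shape discussed
above: an inlined path for small inputs and a function pointer, resolved on
first use, for larger ones. Everything in it is an assumption made for the
sake of a portable, self-contained example. The names (sketch_crc32c and
friends) and the 32-byte cutoff are hypothetical, the bit-at-a-time loop
merely stands in for the inlined SSE4.2 path, and the table-driven loop
stands in for whatever PCLMUL/AVX-512 routine a runtime CPU check would pick.
It is not the code from any patch in this thread.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u		/* reflected Castagnoli polynomial */
#define SMALL_INPUT_THRESHOLD 32	/* hypothetical cutoff, not measured */

/* Stand-in for the inlined small-input path (SSE4.2 in a real build). */
static inline uint32_t
sketch_crc32c_small(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--)
	{
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1)));
	}
	return crc;
}

static uint32_t sketch_table[256];

/* Stand-in for the large-input routine reached through the pointer. */
static uint32_t
sketch_crc32c_table(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--)
		crc = sketch_table[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
	return crc;
}

static uint32_t sketch_crc32c_choose(uint32_t crc, const unsigned char *p,
									 size_t len);

/* Starts out pointing at the chooser, which replaces it on first call. */
static uint32_t (*sketch_crc32c_large) (uint32_t, const unsigned char *,
										size_t) = sketch_crc32c_choose;

static uint32_t
sketch_crc32c_choose(uint32_t crc, const unsigned char *p, size_t len)
{
	/* A real implementation would test CPU features here. */
	for (uint32_t i = 0; i < 256; i++)
	{
		uint32_t	c = i;

		for (int k = 0; k < 8; k++)
			c = (c >> 1) ^ (CRC32C_POLY & (0u - (c & 1)));
		sketch_table[i] = c;
	}
	sketch_crc32c_large = sketch_crc32c_table;
	return sketch_crc32c_large(crc, p, len);
}

/* Small inputs stay inline; larger ones pay for one indirect call. */
static inline uint32_t
sketch_crc32c(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t	crc = 0xFFFFFFFFu;

	if (len < SMALL_INPUT_THRESHOLD)
		crc = sketch_crc32c_small(crc, p, len);
	else
		crc = sketch_crc32c_large(crc, p, len);
	return crc ^ 0xFFFFFFFFu;
}
```

With this shape, something like a 20-byte WAL record header would never touch
the function pointer, so any indirect-call overhead would only be paid where
the per-call work is presumably large enough to hide it.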