On Wed, Dec 11, 2024 at 11:54 PM Nathan Bossart <nathandboss...@gmail.com> wrote:
>
> On Wed, Dec 11, 2024 at 02:08:58PM +0700, John Naylor wrote:
> > and how light it was. With more hardware support, we can go much lower
> > than 1024 bytes, but that can be left for future work.
>
> Nice. I'm curious how this compares to both the existing implementations
> and the proposed ones that require new intrinsics. I like the idea of
> avoiding new runtime and config checks, especially if the performance is
> somewhat comparable for the most popular cases (i.e., dozens of bytes to a
> few thousand bytes).

With 8k inputs on x86 it's fairly close to 3x faster than master. I
wasn't very clear, but v9 still has a cutoff of 1008 bytes just to
copy from 0008, but on a slightly old machine the crossover point is
about 400-600 bytes. Doing microbenchmarks that hammer on single
instructions is very finicky, so I don't trust these numbers much.

With hardware CLMUL, I'm guessing the cutoff would be between 120 and
192 bytes (must be a multiple of 24 -- 3 words), and would depend on
architecture. Arm has an advantage that vmull_p64() operates on
scalars, but on x86 the corresponding operation is
_mm_clmulepi64_si128(), and there's a bit of shuffling in and out of
vector registers.

> If we still want to add new intrinsics, would it be easy enough to add them
> on top of this patch? Or would it require further restructuring?

I'm still trying to wrap my head around how function selection works
after commit 4b03a27fafc, but it could be something like this on x86:

#if defined(__has_attribute) && __has_attribute (target)
pg_attribute_target("sse4.2,pclmul")
pg_comp_crc32c_sse42
{
  <big loop with special case for end>
  <hardware carryless multiply>
  <tail>
}
#endif

pg_attribute_target("sse4.2")
pg_comp_crc32c_sse42
{
  <big loop>
  <software carryless multiply>
  <tail>
}

...where we have the tail part in a separate function for readability.
On Arm it might have to be as complex as in 0008, since as you've
mentioned, compiler support for the needed attributes is still pretty
new.

--
John Naylor
Amazon Web Services
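
For anyone following along who hasn't run into the term: the "software carryless multiply" placeholder in the x86 sketch above stands for a polynomial multiplication over GF(2), i.e. a multiply where partial products are combined with XOR instead of addition. Below is a minimal portable sketch of that operation, not the code from the patch. The function name clmul32_soft is made up for illustration, and the bit-at-a-time loop is only the textbook form; the patch may well use a faster formulation.

```c
#include <stdint.h>

/*
 * Hypothetical helper (name not from the patch): carry-less multiply of
 * two 32-bit operands, returning the 64-bit polynomial product over GF(2).
 * Each set bit i of "b" contributes a copy of "a" shifted left by i, and
 * the contributions are combined with XOR, so no carries propagate.
 */
static uint64_t
clmul32_soft(uint32_t a, uint32_t b)
{
	uint64_t	result = 0;

	for (int i = 0; i < 32; i++)
	{
		if ((b >> i) & 1)
			result ^= (uint64_t) a << i;
	}

	return result;
}
```

As a sanity check, clmul32_soft(3, 3) is 5, because (x + 1)^2 = x^2 + 1 over GF(2), whereas an ordinary integer multiply would give 9. The hardware instructions mentioned above (vmull_p64() on Arm, _mm_clmulepi64_si128() on x86) compute the same kind of product on 64-bit operands.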