On Thu, Mar 28, 2024 at 04:38:54PM -0500, Nathan Bossart wrote:
> Here is a v14 of the patch that I think is beginning to approach something
> committable. Besides general review and testing, there are two things that
> I'd like to bring up:
>
> * The latest patch set from Paul Amonson appeared to support MSVC in the
>   meson build, but not the autoconf one. I don't have much expertise here,
>   so the v14 patch doesn't have any autoconf/meson support for MSVC, which
>   I thought might be okay for now. IIUC we assume that 64-bit/MSVC builds
>   can always compile the x86_64 popcount code, but I don't know whether
>   that's safe for AVX512.
>
> * I think we need to verify there isn't a huge performance regression for
>   smaller arrays. IIUC those will still require an AVX512 instruction or
>   two as well as a function call, which might add some noticeable overhead.

I forgot to mention that I also want to understand whether we can actually
assume availability of XGETBV when CPUID says we support AVX512:

> +    /*
> +     * We also need to check that the OS has enabled support for the ZMM
> +     * registers.
> +     */
> +#ifdef _MSC_VER
> +    return (_xgetbv(0) & 0xe0) != 0;
> +#else
> +    uint64      xcr = 0;
> +    uint32      high;
> +    uint32      low;
> +
> +__asm__ __volatile__(" xgetbv\n":"=a"(low), "=d"(high):"c"(xcr));
> +    return (low & 0xe0) != 0;
> +#endif

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
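
To make the XGETBV question concrete, here is a rough sketch (not part of the v14 patch) of one way the check could be guarded. It assumes that the OSXSAVE bit (CPUID leaf 1, ECX bit 27) is what tells us XGETBV may be executed, and it uses a stricter XCR0 mask (0xe6: SSE, AVX, opmask, and both ZMM state components) than the 0xe0 test quoted above. All function and macro names below are hypothetical, and the hardware probe assumes GCC or Clang with <cpuid.h>. It reports false on other platforms. It only covers the OS-state half of the question. The AVX512 feature bits in CPUID leaf 7 would still need to be checked separately.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_PROBE 1
#endif

/* CPUID leaf 1, ECX bit 27: the OS has enabled XSAVE, so XGETBV is usable. */
#define CPUID1_ECX_OSXSAVE	(UINT32_C(1) << 27)

/*
 * XCR0 bits 1-2 (SSE and AVX state) plus bits 5-7 (opmask, ZMM_Hi256,
 * Hi16_ZMM).  Requiring all of them is an assumption of this sketch; the
 * quoted patch only tests whether any of bits 5-7 is set.
 */
#define XCR0_AVX512_STATE	UINT64_C(0xe6)

/* Hypothetical helper: may we execute XGETBV at all? */
static bool
xgetbv_is_usable(uint32_t cpuid1_ecx)
{
	return (cpuid1_ecx & CPUID1_ECX_OSXSAVE) != 0;
}

/* Hypothetical helper: has the OS enabled every state component we need? */
static bool
zmm_state_enabled(uint64_t xcr0)
{
	return (xcr0 & XCR0_AVX512_STATE) == XCR0_AVX512_STATE;
}

/* Hypothetical probe: only runs XGETBV after CPUID reports OSXSAVE. */
static bool
os_enabled_zmm_regs(void)
{
#ifdef HAVE_X86_PROBE
	unsigned int eax, ebx, ecx, edx;
	uint32_t	low, high;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !xgetbv_is_usable(ecx))
		return false;

	/* Read XCR0 (ECX = 0); the result comes back in EDX:EAX. */
	__asm__ __volatile__("xgetbv" : "=a"(low), "=d"(high) : "c"(0));
	return zmm_state_enabled(((uint64_t) high << 32) | low);
#else
	return false;				/* not x86, or no <cpuid.h> available */
#endif
}
```

With this shape, an XCR0 value of 0x20 (opmask state only) would be rejected, whereas the "!= 0" test in the quoted hunk would accept it.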