On Wed, 20 Mar 2024 at 11:56, Amonson, Paul D <paul.d.amon...@intel.com> wrote: > Changed in this patch set. > > * Rebased. > * Direct *slow* calls via macros as shown in example patch. > * Changed the choose filename to be platform specific as suggested. > * Falls back to intermediate "Fast" methods if AVX512 is not available at > runtime. > * inline used where is makes sense, remember using "extern" negates "inline".
I'm not sure about this "extern negates inline" comment. It seems to me the compiler is perfectly free to inline a static function into an external function and it's free to inline the static function elsewhere within the same .c file. The final sentence of the following comment that the 0001 patch removes explains this: /* * When the POPCNT instruction is not available, there's no point in using * function pointers to vary the implementation between the fast and slow * method. We instead just make these actual external functions when * TRY_POPCNT_FAST is not defined. The compiler should be able to inline * the slow versions here. */ Also, have a look at [1]. You'll see f_slow() wasn't even compiled and the code was just inlined into f(). I just added the __attribute__((noinline)) so that usage() wouldn't just perform constant folding and just return 6. I think, unless you have evidence that some common compiler isn't inlining the static into the extern then we shouldn't add the macros. It adds quite a bit of churn to the patch and will break out of core code as you no longer have functions named pg_popcount32(), pg_popcount64() and pg_popcount(). David [1] https://godbolt.org/z/6joExb79d