[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
FreddyYe added inline comments.
Herald added a subscriber: StephenFan.

Comment at: clang/lib/CodeGen/CodeGenModule.cpp:2067
+ // favor this processor.
+ TuneCPU = SD->getCPUName(GD.getMultiVersionIndex())->getName();
+}

pengfei wrote:
> erichkeane wrote:
> > andrew.w.kaylor wrote:
> > > Unfortunately, I don't think it's this easy. The list of names used for
> > > cpu_specific doesn't come from the same place as the list of names used
> > > by "tune-cpu". For one thing, the cpu_specific names can't contain the
> > > '-' character, so we have names like "skylake_avx512" in cpu_specific
> > > that would need to be translated to "skylake-avx512" for "tune-cpu". I
> > > believe the list of valid names for "tune-cpu" comes from here:
> > > https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Support/X86TargetParser.cpp#L294
> > >
> > > Also, some of the aliases supported by cpu_specific don't have any
> > > corresponding "tune-cpu" name. You happen to have picked one of these for
> > > the test. I believe "core_4th_gen_avx" should map to "haswell".
> > Hmm... this is unfortunate. I wonder if we should add some 'translation'
> > type field to the X86TargetParser.def entries? Any idea who the right
> > person to populate said list would be?
> > I believe the list of valid names for "tune-cpu" comes from ...
>
> I think it's here
> https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Target/X86/X86.td#L1408
>
> So back to Andy's problem: where did the compiler previously consume the
> cpu_specific names, e.g., to map them to different targets? Or is that done
> by external libraries like compiler-rt?
>
> I think I have the same requirement, mapping between `-` and `_` for
> "tune-cpu", in https://github.com/llvm/llvm-project/issues/50125, where the
> preprocessor defines use `_` as well.

> Unfortunately, I don't think it's this easy. The list of names used for
> cpu_specific doesn't come from the same place as the list of names used by
> "tune-cpu". For one thing, the cpu_specific names can't contain the '-'
> character, so we have names like "skylake_avx512" in cpu_specific that would
> need to be translated to "skylake-avx512" for "tune-cpu". I believe the list
> of valid names for "tune-cpu" comes from here:
> https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Support/X86TargetParser.cpp#L294
>
> Also, some of the aliases supported by cpu_specific don't have any
> corresponding "tune-cpu" name. You happen to have picked one of these for the
> test. I believe "core_4th_gen_avx" should map to "haswell".

I happened to find this patch. I recently changed this code back to the initial version of this patch in https://reviews.llvm.org/D151696. To resolve the problem @andrew.w.kaylor mentioned here, I added these "unsupported" names to X86.td, as Phoebe mentioned below. If you are interested, feel free to comment there.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121410/new/

https://reviews.llvm.org/D121410

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
erichkeane marked an inline comment as done.
Closed by commit rGdc152659b452: Have cpu-specific variants set 'tune-cpu' as an optimization hint (authored by erichkeane).
Herald added a project: clang.

Changed prior to commit:
  https://reviews.llvm.org/D121410?vs=414709&id=415077#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121410/new/

https://reviews.llvm.org/D121410

Files:
  clang/include/clang/Basic/TargetInfo.h
  clang/lib/Basic/Targets/X86.cpp
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGen/attr-cpuspecific-avx-abi.c
  clang/test/CodeGen/attr-cpuspecific.c
  llvm/include/llvm/Support/X86TargetParser.def

Index: llvm/include/llvm/Support/X86TargetParser.def
===
--- llvm/include/llvm/Support/X86TargetParser.def
+++ llvm/include/llvm/Support/X86TargetParser.def
@@ -211,47 +211,47 @@
 #undef X86_FEATURE
 #ifndef CPU_SPECIFIC
-#define CPU_SPECIFIC(NAME, MANGLING, FEATURES)
+#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES)
 #endif
 #ifndef CPU_SPECIFIC_ALIAS
-#define CPU_SPECIFIC_ALIAS(NEW_NAME, NAME)
+#define CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME)
 #endif
-CPU_SPECIFIC("generic", 'A', "")
-CPU_SPECIFIC("pentium", 'B', "")
-CPU_SPECIFIC("pentium_pro", 'C', "+cmov")
-CPU_SPECIFIC("pentium_mmx", 'D', "+mmx")
-CPU_SPECIFIC("pentium_ii", 'E', "+cmov,+mmx")
-CPU_SPECIFIC("pentium_iii", 'H', "+cmov,+mmx,+sse")
-CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium_iii")
-CPU_SPECIFIC("pentium_4", 'J', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_m", 'K', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_4_sse3", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
-CPU_SPECIFIC("core_2_duo_ssse3", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")
-CPU_SPECIFIC("core_2_duo_sse4_1", 'N', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1")
-CPU_SPECIFIC("atom", 'O', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+movbe")
-CPU_SPECIFIC("atom_sse4_2", 'c', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_i7_sse4_2", 'P', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_aes_pclmulqdq", 'Q', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("atom_sse4_2_movbe", 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("goldmont", 'i', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("sandybridge", 'R', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+avx")
-CPU_SPECIFIC_ALIAS("core_2nd_gen_avx", "sandybridge")
-CPU_SPECIFIC("ivybridge", 'S', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+f16c,+avx")
-CPU_SPECIFIC_ALIAS("core_3rd_gen_avx", "ivybridge")
-CPU_SPECIFIC("haswell", 'V', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC_ALIAS("core_4th_gen_avx", "haswell")
-CPU_SPECIFIC("core_4th_gen_avx_tsx", 'W', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC("broadwell", 'X', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC_ALIAS("core_5th_gen_avx", "broadwell")
-CPU_SPECIFIC("core_5th_gen_avx_tsx", 'Y', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC("knl", 'Z', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd")
-CPU_SPECIFIC_ALIAS("mic_avx512", "knl")
-CPU_SPECIFIC("skylake", 'b', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx,+mpx")
-CPU_SPECIFIC( "skylake_avx512", 'a', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512cd,+avx512bw,+avx512vl,+clwb")
-CPU_SPECIFIC("cannonlake", 'e', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512ifma,+avx512cd,+avx512bw,+avx512vl,+avx512vbmi")
-CPU_SPECIFIC("knm", 'j', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd,+avx5124fmaps,+avx5124vnniw,+avx512vpopcntdq")
+CPU_SPECIFIC("generic", "generic", 'A', "")
+CPU_SPECIFIC("pentium", "pentium", 'B', "")
+CPU_SPECIFIC("pentium_pro", "pentiumpro", 'C', "+cmov")
+CPU_SPECIFIC("pentium_mmx", "pentium-mmx", 'D', "+mmx")
+CPU_SPECIFIC("pentium_ii", "pentium2", 'E', "+cmov,+mmx")
+CPU_SPECIFIC("pentium_iii", "pentium3", 'H', "+cmov,+mmx,+sse")
+CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium3", "pentiu
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
pengfei accepted this revision.
pengfei added a comment.

LGTM.

Comment at: clang/lib/Basic/Targets/X86.cpp:1133
+#include "llvm/Support/X86TargetParser.def"
+.Default("");
+}

Please run clang-format here.

Comment at: llvm/include/llvm/Support/X86TargetParser.def:236
+CPU_SPECIFIC("core_i7_sse4_2", "nehalem", 'P', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("core_aes_pclmulqdq", westmere", 'Q', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("atom_sse4_2_movbe", "silvermont", 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")

Is the opening `"` missing before `westmere`?
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
aaron.ballman accepted this revision.
aaron.ballman added a comment.

LGTM, though I'm not qualified to review the CPU-specific bits in the .def file.
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
erichkeane marked 4 inline comments as done.
erichkeane added inline comments.

Comment at: llvm/include/llvm/Support/X86TargetParser.def:236
+CPU_SPECIFIC("core_i7_sse4_2", "nehalem", 'P', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("core_aes_pclmulqdq", "icelake-client", 'Q', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("atom_sse4_2_movbe", "silvermont, 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")

craig.topper wrote:
> core_aes_pclmulqdq is westmere

Thanks!
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
erichkeane updated this revision to Diff 414709.
erichkeane added a comment.

Update `core_aes_pclmulqdq` to be `westmere`.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121410/new/

https://reviews.llvm.org/D121410

Files:
  clang/include/clang/Basic/TargetInfo.h
  clang/lib/Basic/Targets/X86.cpp
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGen/attr-cpuspecific-avx-abi.c
  clang/test/CodeGen/attr-cpuspecific.c
  llvm/include/llvm/Support/X86TargetParser.def

Index: llvm/include/llvm/Support/X86TargetParser.def
===
--- llvm/include/llvm/Support/X86TargetParser.def
+++ llvm/include/llvm/Support/X86TargetParser.def
@@ -211,47 +211,47 @@
 #undef X86_FEATURE
 #ifndef CPU_SPECIFIC
-#define CPU_SPECIFIC(NAME, MANGLING, FEATURES)
+#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES)
 #endif
 #ifndef CPU_SPECIFIC_ALIAS
-#define CPU_SPECIFIC_ALIAS(NEW_NAME, NAME)
+#define CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME)
 #endif
-CPU_SPECIFIC("generic", 'A', "")
-CPU_SPECIFIC("pentium", 'B', "")
-CPU_SPECIFIC("pentium_pro", 'C', "+cmov")
-CPU_SPECIFIC("pentium_mmx", 'D', "+mmx")
-CPU_SPECIFIC("pentium_ii", 'E', "+cmov,+mmx")
-CPU_SPECIFIC("pentium_iii", 'H', "+cmov,+mmx,+sse")
-CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium_iii")
-CPU_SPECIFIC("pentium_4", 'J', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_m", 'K', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_4_sse3", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
-CPU_SPECIFIC("core_2_duo_ssse3", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")
-CPU_SPECIFIC("core_2_duo_sse4_1", 'N', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1")
-CPU_SPECIFIC("atom", 'O', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+movbe")
-CPU_SPECIFIC("atom_sse4_2", 'c', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_i7_sse4_2", 'P', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_aes_pclmulqdq", 'Q', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("atom_sse4_2_movbe", 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("goldmont", 'i', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("sandybridge", 'R', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+avx")
-CPU_SPECIFIC_ALIAS("core_2nd_gen_avx", "sandybridge")
-CPU_SPECIFIC("ivybridge", 'S', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+f16c,+avx")
-CPU_SPECIFIC_ALIAS("core_3rd_gen_avx", "ivybridge")
-CPU_SPECIFIC("haswell", 'V', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC_ALIAS("core_4th_gen_avx", "haswell")
-CPU_SPECIFIC("core_4th_gen_avx_tsx", 'W', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC("broadwell", 'X', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC_ALIAS("core_5th_gen_avx", "broadwell")
-CPU_SPECIFIC("core_5th_gen_avx_tsx", 'Y', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC("knl", 'Z', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd")
-CPU_SPECIFIC_ALIAS("mic_avx512", "knl")
-CPU_SPECIFIC("skylake", 'b', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx,+mpx")
-CPU_SPECIFIC( "skylake_avx512", 'a', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512cd,+avx512bw,+avx512vl,+clwb")
-CPU_SPECIFIC("cannonlake", 'e', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512ifma,+avx512cd,+avx512bw,+avx512vl,+avx512vbmi")
-CPU_SPECIFIC("knm", 'j', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd,+avx5124fmaps,+avx5124vnniw,+avx512vpopcntdq")
+CPU_SPECIFIC("generic", "generic", 'A', "")
+CPU_SPECIFIC("pentium", "pentium", 'B', "")
+CPU_SPECIFIC("pentium_pro", "pentiumpro", 'C', "+cmov")
+CPU_SPECIFIC("pentium_mmx", "pentium-mmx", 'D', "+mmx")
+CPU_SPECIFIC("pentium_ii", "pentium2", 'E', "+cmov,+mmx")
+CPU_SPECIFIC("pentium_iii", "pentium3", 'H', "+cmov,+mmx,+sse")
+CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium3", "pentium_iii")
+CPU_SPECIFIC("pentium_4", "pentium4", 'J', "+cmov,+mmx,+sse,+sse2")
+CPU_SPECIFIC("pentium_m", "pentium-m", 'K', "+cmov,+mmx,+sse,+sse2")
+CPU_SPECIFIC("pentium_4_sse3", "prescott", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
+CPU_SPECIFIC("core_2_duo_ssse3", "core2", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")
+CPU_SPECIFIC("
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
craig.topper added inline comments.

Comment at: llvm/include/llvm/Support/X86TargetParser.def:236
+CPU_SPECIFIC("core_i7_sse4_2", "nehalem", 'P', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("core_aes_pclmulqdq", "icelake-client", 'Q', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("atom_sse4_2_movbe", "silvermont, 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")

core_aes_pclmulqdq is westmere
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.
This revision is now accepted and ready to land.

This looks good to me. Thanks for the patch!
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
erichkeane updated this revision to Diff 414699.
erichkeane added a comment.

Corrected the last few processor names thanks to @andrew.w.kaylor and @pengfei.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121410/new/

https://reviews.llvm.org/D121410

Files:
  clang/include/clang/Basic/TargetInfo.h
  clang/lib/Basic/Targets/X86.cpp
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGen/attr-cpuspecific-avx-abi.c
  clang/test/CodeGen/attr-cpuspecific.c
  llvm/include/llvm/Support/X86TargetParser.def

Index: llvm/include/llvm/Support/X86TargetParser.def
===
--- llvm/include/llvm/Support/X86TargetParser.def
+++ llvm/include/llvm/Support/X86TargetParser.def
@@ -211,47 +211,47 @@
 #undef X86_FEATURE
 #ifndef CPU_SPECIFIC
-#define CPU_SPECIFIC(NAME, MANGLING, FEATURES)
+#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES)
 #endif
 #ifndef CPU_SPECIFIC_ALIAS
-#define CPU_SPECIFIC_ALIAS(NEW_NAME, NAME)
+#define CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME)
 #endif
-CPU_SPECIFIC("generic", 'A', "")
-CPU_SPECIFIC("pentium", 'B', "")
-CPU_SPECIFIC("pentium_pro", 'C', "+cmov")
-CPU_SPECIFIC("pentium_mmx", 'D', "+mmx")
-CPU_SPECIFIC("pentium_ii", 'E', "+cmov,+mmx")
-CPU_SPECIFIC("pentium_iii", 'H', "+cmov,+mmx,+sse")
-CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium_iii")
-CPU_SPECIFIC("pentium_4", 'J', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_m", 'K', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_4_sse3", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
-CPU_SPECIFIC("core_2_duo_ssse3", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")
-CPU_SPECIFIC("core_2_duo_sse4_1", 'N', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1")
-CPU_SPECIFIC("atom", 'O', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+movbe")
-CPU_SPECIFIC("atom_sse4_2", 'c', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_i7_sse4_2", 'P', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_aes_pclmulqdq", 'Q', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("atom_sse4_2_movbe", 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("goldmont", 'i', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("sandybridge", 'R', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+avx")
-CPU_SPECIFIC_ALIAS("core_2nd_gen_avx", "sandybridge")
-CPU_SPECIFIC("ivybridge", 'S', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+f16c,+avx")
-CPU_SPECIFIC_ALIAS("core_3rd_gen_avx", "ivybridge")
-CPU_SPECIFIC("haswell", 'V', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC_ALIAS("core_4th_gen_avx", "haswell")
-CPU_SPECIFIC("core_4th_gen_avx_tsx", 'W', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC("broadwell", 'X', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC_ALIAS("core_5th_gen_avx", "broadwell")
-CPU_SPECIFIC("core_5th_gen_avx_tsx", 'Y', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC("knl", 'Z', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd")
-CPU_SPECIFIC_ALIAS("mic_avx512", "knl")
-CPU_SPECIFIC("skylake", 'b', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx,+mpx")
-CPU_SPECIFIC( "skylake_avx512", 'a', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512cd,+avx512bw,+avx512vl,+clwb")
-CPU_SPECIFIC("cannonlake", 'e', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512ifma,+avx512cd,+avx512bw,+avx512vl,+avx512vbmi")
-CPU_SPECIFIC("knm", 'j', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd,+avx5124fmaps,+avx5124vnniw,+avx512vpopcntdq")
+CPU_SPECIFIC("generic", "generic", 'A', "")
+CPU_SPECIFIC("pentium", "pentium", 'B', "")
+CPU_SPECIFIC("pentium_pro", "pentiumpro", 'C', "+cmov")
+CPU_SPECIFIC("pentium_mmx", "pentium-mmx", 'D', "+mmx")
+CPU_SPECIFIC("pentium_ii", "pentium2", 'E', "+cmov,+mmx")
+CPU_SPECIFIC("pentium_iii", "pentium3", 'H', "+cmov,+mmx,+sse")
+CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium3", "pentium_iii")
+CPU_SPECIFIC("pentium_4", "pentium4", 'J', "+cmov,+mmx,+sse,+sse2")
+CPU_SPECIFIC("pentium_m", "pentium-m", 'K', "+cmov,+mmx,+sse,+sse2")
+CPU_SPECIFIC("pentium_4_sse3", "prescott", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
+CPU_SPECIFIC("core_2_duo_ssse3", "core2", 'M', "+cmov,+mmx,+sse,+sse2
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
erichkeane added inline comments.

Comment at: llvm/include/llvm/Support/X86TargetParser.def:230
+CPU_SPECIFIC("pentium_m", "pentium-m", 'K', "+cmov,+mmx,+sse,+sse2")
+CPU_SPECIFIC("pentium_4_sse3", "", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
+CPU_SPECIFIC("core_2_duo_ssse3", "", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")

Note the blanks on lines 230-232, 234-237, 245, and 248. Beyond those, a double-check from anyone familiar with the x86 naming would be really appreciated.
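
As a rough illustration of how a blank TUNE_NAME could flow through to "no tune-cpu", here is a minimal, self-contained sketch of the X-macro pattern. It is not the code from the patch: `CPU_LIST`, `MATCH`, `MATCH_ALIAS`, and `tune_name_for` are hypothetical names, and the list is inlined rather than pulled from X86TargetParser.def. The entries are copied from excerpts in this thread, with `pentium_4_sse3` left blank as in the revision discussed above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the .def file: a list macro that applies the two
// entry macros it is given. Argument order follows the new definitions:
//   CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES)
//   CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME)
#define CPU_LIST(CPU_SPECIFIC, CPU_SPECIFIC_ALIAS)                           \
  CPU_SPECIFIC("generic", "generic", 'A', "")                                \
  CPU_SPECIFIC("pentium_iii", "pentium3", 'H', "+cmov,+mmx,+sse")            \
  CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium3", "pentium_iii")   \
  CPU_SPECIFIC("pentium_4_sse3", "", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")

// Returns the tune-cpu name for a cpu_specific name, or an empty string when
// the entry is blank or the name is unknown (meaning: emit no "tune-cpu").
std::string tune_name_for(const std::string &name) {
#define MATCH(NAME, TUNE_NAME, MANGLING, FEATURES)                           \
  if (name == NAME)                                                          \
    return TUNE_NAME;
#define MATCH_ALIAS(NEW_NAME, TUNE_NAME, NAME)                               \
  if (name == NEW_NAME)                                                      \
    return TUNE_NAME;
  CPU_LIST(MATCH, MATCH_ALIAS)
#undef MATCH
#undef MATCH_ALIAS
  return "";
}
```

Because the alias macro carries its own TUNE_NAME, an alias can in principle tune differently from the entry it aliases; a caller would only attach the attribute when the returned string is non-empty.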
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
erichkeane updated this revision to Diff 414650.
erichkeane added a comment.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Add a 'translation' feature to the x86 target so that we can get the 'tune-cpu' name from the list. Note that 9 entries are left blank because I was unable to figure out the corresponding name (I have an email out to @andrew.w.kaylor and @pengfei asking what they should be). In the meantime, these will result in NO tune-cpu.

Also note that I intentionally added this conversion to the 'alias' entries as well. This gives us the power to use an alias to change the 'tune' if we care to. Typically I'd consider this unimportant, but it means that the previously mentioned VendorA (@arsenm) could simply add their processors as aliases and get the tune feature more easily.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121410/new/

https://reviews.llvm.org/D121410

Files:
  clang/include/clang/Basic/TargetInfo.h
  clang/lib/Basic/Targets/X86.cpp
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGen/attr-cpuspecific-avx-abi.c
  clang/test/CodeGen/attr-cpuspecific.c
  llvm/include/llvm/Support/X86TargetParser.def

Index: llvm/include/llvm/Support/X86TargetParser.def
===
--- llvm/include/llvm/Support/X86TargetParser.def
+++ llvm/include/llvm/Support/X86TargetParser.def
@@ -211,47 +211,47 @@
 #undef X86_FEATURE
 #ifndef CPU_SPECIFIC
-#define CPU_SPECIFIC(NAME, MANGLING, FEATURES)
+#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES)
 #endif
 #ifndef CPU_SPECIFIC_ALIAS
-#define CPU_SPECIFIC_ALIAS(NEW_NAME, NAME)
+#define CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME)
 #endif
-CPU_SPECIFIC("generic", 'A', "")
-CPU_SPECIFIC("pentium", 'B', "")
-CPU_SPECIFIC("pentium_pro", 'C', "+cmov")
-CPU_SPECIFIC("pentium_mmx", 'D', "+mmx")
-CPU_SPECIFIC("pentium_ii", 'E', "+cmov,+mmx")
-CPU_SPECIFIC("pentium_iii", 'H', "+cmov,+mmx,+sse")
-CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium_iii")
-CPU_SPECIFIC("pentium_4", 'J', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_m", 'K', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_4_sse3", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
-CPU_SPECIFIC("core_2_duo_ssse3", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")
-CPU_SPECIFIC("core_2_duo_sse4_1", 'N', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1")
-CPU_SPECIFIC("atom", 'O', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+movbe")
-CPU_SPECIFIC("atom_sse4_2", 'c', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_i7_sse4_2", 'P', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_aes_pclmulqdq", 'Q', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("atom_sse4_2_movbe", 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("goldmont", 'i', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("sandybridge", 'R', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+avx")
-CPU_SPECIFIC_ALIAS("core_2nd_gen_avx", "sandybridge")
-CPU_SPECIFIC("ivybridge", 'S', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+f16c,+avx")
-CPU_SPECIFIC_ALIAS("core_3rd_gen_avx", "ivybridge")
-CPU_SPECIFIC("haswell", 'V', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC_ALIAS("core_4th_gen_avx", "haswell")
-CPU_SPECIFIC("core_4th_gen_avx_tsx", 'W', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC("broadwell", 'X', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC_ALIAS("core_5th_gen_avx", "broadwell")
-CPU_SPECIFIC("core_5th_gen_avx_tsx", 'Y', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC("knl", 'Z', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd")
-CPU_SPECIFIC_ALIAS("mic_avx512", "knl")
-CPU_SPECIFIC("skylake", 'b', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx,+mpx")
-CPU_SPECIFIC( "skylake_avx512", 'a', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512cd,+avx512bw,+avx512vl,+clwb")
-CPU_SPECIFIC("cannonlake", 'e', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512ifma,+avx512cd,+avx512bw,+avx512vl,+avx512vbmi")
-CPU_SPECIFIC("knm", 'j', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd,+avx5124fmaps,+avx5124vnniw,+avx512vpopcntdq")
+CPU_SPECIFIC("generic", "
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
erichkeane added a comment.

Thanks all! I'll do some work on populating a list of 'converted names', but I'll definitely need help from @pengfei and @andrew.w.kaylor checking the list and filling in what I miss.
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
aaron.ballman added a reviewer: arsenm.
aaron.ballman added a subscriber: arsenm.
aaron.ballman added a comment.
Herald added a subscriber: wdng.

Adding @arsenm because of this bit:

> Note that the 'valid' list of processors for x86 is in
> llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list
> contains only Intel processors, but other vendors may wish to add their own
> entries as 'alias'es (or wiht different feature lists!).
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
pengfei added a comment.

Typos in `wiht different feature lists` and `In the even that`.

Comment at: clang/lib/CodeGen/CodeGenModule.cpp:2067
+ // favor this processor.
+ TuneCPU = SD->getCPUName(GD.getMultiVersionIndex())->getName();
+}

erichkeane wrote:
> andrew.w.kaylor wrote:
> > Unfortunately, I don't think it's this easy. The list of names used for
> > cpu_specific doesn't come from the same place as the list of names used by
> > "tune-cpu". For one thing, the cpu_specific names can't contain the '-'
> > character, so we have names like "skylake_avx512" in cpu_specific that
> > would need to be translated to "skylake-avx512" for "tune-cpu". I believe
> > the list of valid names for "tune-cpu" comes from here:
> > https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Support/X86TargetParser.cpp#L294
> >
> > Also, some of the aliases supported by cpu_specific don't have any
> > corresponding "tune-cpu" name. You happen to have picked one of these for
> > the test. I believe "core_4th_gen_avx" should map to "haswell".
> Hmm... this is unfortunate. I wonder if we should add some 'translation'
> type field to the X86TargetParser.def entries? Any idea who the right person
> to populate said list would be?
> I believe the list of valid names for "tune-cpu" comes from ...

I think it's here https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Target/X86/X86.td#L1408

So back to Andy's problem: where did the compiler previously consume the cpu_specific names, e.g., to map them to different targets? Or is that done by external libraries like compiler-rt?

I think I have the same requirement, mapping between `-` and `_` for "tune-cpu", in https://github.com/llvm/llvm-project/issues/50125, where the preprocessor defines use `_` as well.
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
erichkeane added inline comments.

Comment at: clang/lib/CodeGen/CodeGenModule.cpp:2067
+ // favor this processor.
+ TuneCPU = SD->getCPUName(GD.getMultiVersionIndex())->getName();
+}

andrew.w.kaylor wrote:
> Unfortunately, I don't think it's this easy. The list of names used for
> cpu_specific doesn't come from the same place as the list of names used by
> "tune-cpu". For one thing, the cpu_specific names can't contain the '-'
> character, so we have names like "skylake_avx512" in cpu_specific that would
> need to be translated to "skylake-avx512" for "tune-cpu". I believe the list
> of valid names for "tune-cpu" comes from here:
> https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Support/X86TargetParser.cpp#L294
>
> Also, some of the aliases supported by cpu_specific don't have any
> corresponding "tune-cpu" name. You happen to have picked one of these for the
> test. I believe "core_4th_gen_avx" should map to "haswell".

Hmm... this is unfortunate. I wonder if we should add some 'translation' type field to the X86TargetParser.def entries? Any idea who the right person to populate said list would be?
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
andrew.w.kaylor added inline comments.

Comment at: clang/lib/CodeGen/CodeGenModule.cpp:2067
+      // favor this processor.
+      TuneCPU = SD->getCPUName(GD.getMultiVersionIndex())->getName();
+    }

Unfortunately, I don't think it's this easy. The list of names used for
cpu_specific doesn't come from the same place as the list of names used by
"tune-cpu". For one thing, the cpu_specific names can't contain the '-'
character, so we have names like "skylake_avx512" in cpu_specific that would
need to be translated to "skylake-avx512" for "tune-cpu". I believe the list
of valid names for "tune-cpu" comes from here:
https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Support/X86TargetParser.cpp#L294

Also, some of the aliases supported by cpu_specific don't have any
corresponding "tune-cpu" name. You happen to have picked one of these for the
test. I believe "core_4th_gen_avx" should map to "haswell".

Comment at: clang/test/CodeGen/attr-cpuspecific-avx-abi.c:28
 // CHECK: attributes #[[V]] = {{.*}}"target-features"="+avx,+avx2,+bmi,+cmov,+crc32,+cx8,+f16c,+fma,+lzcnt,+mmx,+movbe,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="core_4th_gen_avx"

As noted above, this isn't a valid setting for "tune-cpu". I think it would
just be ignored.
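The translation described in the comment above can be sketched roughly as
follows. This is only an illustration, not code from the patch or from LLVM:
the helper name `toTuneCPUName` and the alias table are hypothetical, and the
table lists only the one alias mentioned in this review. A blanket '_' to '-'
replacement is an assumption that will not produce a valid "tune-cpu" name for
every cpu_specific spelling, so the result would still need to be checked
against the target's list of valid names.

```cpp
#include <algorithm>
#include <string>

// Hypothetical alias entry: a cpu_specific spelling that has no identically
// named "tune-cpu" counterpart, paired with the name it should map to.
struct TuneAlias {
  const char *Specific;
  const char *Tune;
};

// Only the alias mentioned in the review is listed; a real table would need
// to be populated by someone who knows the full processor list.
static const TuneAlias TuneAliases[] = {{"core_4th_gen_avx", "haswell"}};

// Hypothetical helper: turn a cpu_specific name into a candidate "tune-cpu"
// name. Known aliases are looked up first; otherwise '_' is replaced by '-'.
std::string toTuneCPUName(const std::string &SpecificName) {
  for (const TuneAlias &A : TuneAliases)
    if (SpecificName == A.Specific)
      return A.Tune;

  std::string Result = SpecificName;
  std::replace(Result.begin(), Result.end(), '_', '-');
  return Result;
}
```

With this sketch, "skylake_avx512" becomes "skylake-avx512" and
"core_4th_gen_avx" becomes "haswell", while a name with no underscore, such as
"atom", passes through unchanged.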
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
andrew.w.kaylor added a comment.

This example illustrates the problem this patch intends to fix:
https://godbolt.org/z/j445sxPMc

For Intel microarchitectures before Skylake, the LLVM cost model says that
vector fsqrt is slow, so if fast-math is enabled, we'll use an approximation
rather than the vsqrtps instruction when vectorizing a call to sqrtf(). If the
code is compiled with -march=skylake or -mtune=skylake, we'll choose the
vsqrtps instruction, but with any earlier base target, we'll choose the
approximation even if there is a cpu_specific(skylake) implementation in the
source code.

For example

  __attribute__((cpu_specific(skylake)))
  void foo(void) {
    for (int i = 0; i < 8; ++i)
      x[i] = sqrtf(y[i]);
  }

compiles to

  foo.b:
    vmovaps ymm0, ymmword ptr [rip + y]
    vrsqrtps ymm1, ymm0
    vmulps ymm2, ymm0, ymm1
    vbroadcastss ymm3, dword ptr [rip + .LCPI2_0] # ymm3 = [-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0]
    vfmadd231ps ymm3, ymm2, ymm1 # ymm3 = (ymm2 * ymm1) + ymm3
    vbroadcastss ymm1, dword ptr [rip + .LCPI2_1] # ymm1 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
    vmulps ymm1, ymm2, ymm1
    vmulps ymm1, ymm1, ymm3
    vbroadcastss ymm2, dword ptr [rip + .LCPI2_2] # ymm2 = [NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN]
    vandps ymm0, ymm0, ymm2
    vbroadcastss ymm2, dword ptr [rip + .LCPI2_3] # ymm2 = [1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38]
    vcmpleps ymm0, ymm2, ymm0
    vandps ymm0, ymm0, ymm1
    vmovaps ymmword ptr [rip + x], ymm0
    vzeroupper
    ret

but it should compile to

  foo.b:
    vsqrtps ymm0, ymmword ptr [rip + y]
    vmovaps ymmword ptr [rip + x], ymm0
    vzeroupper
    ret
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
erichkeane added a comment.

@aaron.ballman : if you can add other reviewers or subscribers (particularly
those from "VendorA") it would be greatly appreciated!
[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint
erichkeane created this revision.
erichkeane added a reviewer: aaron.ballman.
Herald added a subscriber: pengfei.
Herald added a project: All.
erichkeane requested review of this revision.

Due to various implementation constraints, despite the programmer choosing a
'processor', cpu_dispatch/cpu_specific needs to use the 'feature' list of a
processor to identify it. This results in the identified processor in
source-code not being propagated to the optimizer, and thus, not able to be
tuned for.

This patch changes to use the actual cpu as written for tune-cpu so that opt
can make decisions based on the cpu-as-spelled, which should better match the
behavior expected by the programmer.

Note that the 'valid' list of processors for x86 is in
llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list
contains only Intel processors, but other vendors may wish to add their own
entries as 'alias'es (or wiht different feature lists!).

If this is not done, there are two potential performance issues with the
patch, but I believe them to be worth it in light of the improvements to
behavior and performance.

1- In the event that the user spelled "ProcessorB", but we only have the
features available to test for "ProcessorA" (where A is B minus features), AND
there is an optimization opportunity for "B" that negatively affects "A", the
optimizer will likely choose to do so.

2- In the even that the user spelled VendorI's processor, and the feature list
allows it to run on VendorA's processor of similar features, AND there is an
optimization opportunity for VendorI's processor that negatively affects
VendorA's, the optimizer will likely choose to do so. This can be fixed by
adding an alias to X86TargetParser.def.
https://reviews.llvm.org/D121410

Files:
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGen/attr-cpuspecific-avx-abi.c
  clang/test/CodeGen/attr-cpuspecific.c

Index: clang/test/CodeGen/attr-cpuspecific.c
===
--- clang/test/CodeGen/attr-cpuspecific.c
+++ clang/test/CodeGen/attr-cpuspecific.c
@@ -340,5 +340,8 @@
 void OrderDispatchUsageSpecific(void) {}
 
 // CHECK: attributes #[[S]] = {{.*}}"target-features"="+avx,+cmov,+crc32,+cx8,+f16c,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="ivybridge"
 // CHECK: attributes #[[K]] = {{.*}}"target-features"="+adx,+avx,+avx2,+avx512cd,+avx512er,+avx512f,+avx512pf,+bmi,+cmov,+crc32,+cx8,+f16c,+fma,+lzcnt,+mmx,+movbe,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="knl"
 // CHECK: attributes #[[O]] = {{.*}}"target-features"="+cmov,+cx8,+mmx,+movbe,+sse,+sse2,+sse3,+ssse3,+x87"
+// CHECK-SAME: "tune-cpu"="atom"
Index: clang/test/CodeGen/attr-cpuspecific-avx-abi.c
===
--- clang/test/CodeGen/attr-cpuspecific-avx-abi.c
+++ clang/test/CodeGen/attr-cpuspecific-avx-abi.c
@@ -23,4 +23,6 @@
 // CHECK: define{{.*}} @foo.V() #[[V:[0-9]+]]
 
 // CHECK: attributes #[[A]] = {{.*}}"target-features"="+avx,+crc32,+cx8,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="generic"
 // CHECK: attributes #[[V]] = {{.*}}"target-features"="+avx,+avx2,+bmi,+cmov,+crc32,+cx8,+f16c,+fma,+lzcnt,+mmx,+movbe,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="core_4th_gen_avx"
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -2060,6 +2060,12 @@
           getTarget().isValidCPUName(ParsedAttr.Tune))
         TuneCPU = ParsedAttr.Tune;
     }
+
+    if (SD) {
+      // Apply the given CPU name as the 'tune-cpu' so that the optimizer can
+      // favor this processor.
+      TuneCPU = SD->getCPUName(GD.getMultiVersionIndex())->getName();
+    }
   } else {
     // Otherwise just add the existing target cpu and target features to the
     // function.
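The hunk in CodeGenModule.cpp assigns the cpu_specific name to TuneCPU
unconditionally, whereas the neighbouring ParsedAttr.Tune code first calls
getTarget().isValidCPUName(). Purely as an illustration of that difference,
and not as what the patch does, a guarded selection might look like the sketch
below. The function `chooseTuneCPU` and its callback parameter are
hypothetical stand-ins; the callback plays the role of the target's validity
check.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for the decision in the patch: prefer the name
// written in cpu_specific for "tune-cpu", but only when the validity check
// accepts it. Otherwise keep whatever tune CPU was already selected.
std::string chooseTuneCPU(
    const std::string &CurrentTune, const std::string &SpecificName,
    const std::function<bool(const std::string &)> &IsValidCPUName) {
  if (!SpecificName.empty() && IsValidCPUName(SpecificName))
    return SpecificName;
  return CurrentTune;
}
```

Under this sketch, a spelling that the validity check rejects leaves the
previously chosen tune CPU in place instead of emitting a "tune-cpu" value
that would later be ignored.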