[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2023-06-27 Thread Freddy, Ye via Phabricator via cfe-commits
FreddyYe added inline comments.
Herald added a subscriber: StephenFan.



Comment at: clang/lib/CodeGen/CodeGenModule.cpp:2067
+  // favor this processor.
+  TuneCPU = SD->getCPUName(GD.getMultiVersionIndex())->getName();
+}

pengfei wrote:
> erichkeane wrote:
> > andrew.w.kaylor wrote:
> > > Unfortunately, I don't think it's this easy. The list of names used for 
> > > cpu_specific doesn't come from the same place as the list of names used 
> > > by "tune-cpu". For one thing, the cpu_specific names can't contain the 
> > > '-' character, so we have names like "skylake_avx512" in cpu_specific 
> > > that would need to be translated to "skylake-avx512" for "tune-cpu". I 
> > > believe the list of valid names for "tune-cpu" comes from here: 
> > > https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Support/X86TargetParser.cpp#L294
> > > 
> > > Also, some of the aliases supported by cpu_specific don't have any 
> > > corresponding "tune-cpu" name. You happen to have picked one of these for 
> > > the test. I believe "core_4th_gen_avx" should map to "haswell".
> > Hmm... this is unfortunate.  I wonder if we should add some kind of 
> > 'translation' field to the X86TargetParser.def entries?  Any idea who the 
> > right person to populate said list would be?
> > I believe the list of valid names for "tune-cpu" comes from ...
> 
> I think it's here 
> https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Target/X86/X86.td#L1408
> 
> So, back to Andy's problem: where did we previously consume the cpu_specific 
> names in the compiler, e.g., to map them to different targets? Or is that 
> done by external libraries like compiler-rt?
> 
> I think I have the same requirement, mapping between `-` and `_` for 
> "tune-cpu", in https://github.com/llvm/llvm-project/issues/50125, where the 
> preprocessor defines use `_` as well.

I happened to find this patch. I recently changed this code back to the initial 
version of this patch in https://reviews.llvm.org/D151696.  To resolve the 
problem @andrew.w.kaylor mentioned here, I added these "unsupported" names to 
X86.td, as Phoebe mentioned below. If you are interested, feel free to comment 
there.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121410/new/

https://reviews.llvm.org/D121410

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-14 Thread Erich Keane via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
erichkeane marked an inline comment as done.
Closed by commit rGdc152659b452: Have cpu-specific variants set 
'tune-cpu' as an optimization hint (authored by erichkeane).
Herald added a project: clang.

Changed prior to commit:
  https://reviews.llvm.org/D121410?vs=414709&id=415077#toc

Files:
  clang/include/clang/Basic/TargetInfo.h
  clang/lib/Basic/Targets/X86.cpp
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGen/attr-cpuspecific-avx-abi.c
  clang/test/CodeGen/attr-cpuspecific.c
  llvm/include/llvm/Support/X86TargetParser.def

Index: llvm/include/llvm/Support/X86TargetParser.def
===
--- llvm/include/llvm/Support/X86TargetParser.def
+++ llvm/include/llvm/Support/X86TargetParser.def
@@ -211,47 +211,47 @@
 #undef X86_FEATURE
 
 #ifndef CPU_SPECIFIC
-#define CPU_SPECIFIC(NAME, MANGLING, FEATURES)
+#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES)
 #endif
 
 #ifndef CPU_SPECIFIC_ALIAS
-#define CPU_SPECIFIC_ALIAS(NEW_NAME, NAME)
+#define CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME)
 #endif
 
-CPU_SPECIFIC("generic", 'A', "")
-CPU_SPECIFIC("pentium", 'B', "")
-CPU_SPECIFIC("pentium_pro", 'C', "+cmov")
-CPU_SPECIFIC("pentium_mmx", 'D', "+mmx")
-CPU_SPECIFIC("pentium_ii", 'E', "+cmov,+mmx")
-CPU_SPECIFIC("pentium_iii", 'H', "+cmov,+mmx,+sse")
-CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium_iii")
-CPU_SPECIFIC("pentium_4", 'J', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_m", 'K', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_4_sse3", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
-CPU_SPECIFIC("core_2_duo_ssse3", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")
-CPU_SPECIFIC("core_2_duo_sse4_1", 'N', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1")
-CPU_SPECIFIC("atom", 'O', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+movbe")
-CPU_SPECIFIC("atom_sse4_2", 'c', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_i7_sse4_2", 'P', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_aes_pclmulqdq", 'Q', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("atom_sse4_2_movbe", 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("goldmont", 'i', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("sandybridge", 'R', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+avx")
-CPU_SPECIFIC_ALIAS("core_2nd_gen_avx", "sandybridge")
-CPU_SPECIFIC("ivybridge", 'S', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+f16c,+avx")
-CPU_SPECIFIC_ALIAS("core_3rd_gen_avx", "ivybridge")
-CPU_SPECIFIC("haswell", 'V', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC_ALIAS("core_4th_gen_avx", "haswell")
-CPU_SPECIFIC("core_4th_gen_avx_tsx", 'W', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC("broadwell", 'X', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC_ALIAS("core_5th_gen_avx", "broadwell")
-CPU_SPECIFIC("core_5th_gen_avx_tsx", 'Y', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC("knl", 'Z', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd")
-CPU_SPECIFIC_ALIAS("mic_avx512", "knl")
-CPU_SPECIFIC("skylake", 'b', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx,+mpx")
-CPU_SPECIFIC( "skylake_avx512", 'a', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512cd,+avx512bw,+avx512vl,+clwb")
-CPU_SPECIFIC("cannonlake", 'e', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512ifma,+avx512cd,+avx512bw,+avx512vl,+avx512vbmi")
-CPU_SPECIFIC("knm", 'j', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd,+avx5124fmaps,+avx5124vnniw,+avx512vpopcntdq")
+CPU_SPECIFIC("generic", "generic", 'A', "")
+CPU_SPECIFIC("pentium", "pentium", 'B', "")
+CPU_SPECIFIC("pentium_pro", "pentiumpro", 'C', "+cmov")
+CPU_SPECIFIC("pentium_mmx", "pentium-mmx", 'D', "+mmx")
+CPU_SPECIFIC("pentium_ii", "pentium2", 'E', "+cmov,+mmx")
+CPU_SPECIFIC("pentium_iii", "pentium3", 'H', "+cmov,+mmx,+sse")
+CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium3", "pentiu

[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei accepted this revision.
pengfei added a comment.

LGTM.




Comment at: clang/lib/Basic/Targets/X86.cpp:1133
+#include "llvm/Support/X86TargetParser.def"
+.Default("");
+}

clang-format.
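
For readers unfamiliar with the pattern, the snippet under review closes a lookup that is generated from the entries in the .def file. Below is a minimal, self-contained sketch of that X-macro idea. `DEMO_CPU_LIST` and `demoTuneName` are hypothetical names invented for illustration, the inline list stands in for including the real .def file, and plain comparisons are used instead of whatever the patch actually uses; only the macro signatures and the four entries come from the diff in this thread.

```cpp
#include <string_view>

// Stand-in for the contents of the .def file. In the patch the entries live
// in llvm/include/llvm/Support/X86TargetParser.def and are pulled in with
// #include; DEMO_CPU_LIST is a hypothetical inline substitute so that this
// sketch compiles on its own.
#define DEMO_CPU_LIST                                                          \
  CPU_SPECIFIC("generic", "generic", 'A', "")                                  \
  CPU_SPECIFIC("pentium_pro", "pentiumpro", 'C', "+cmov")                      \
  CPU_SPECIFIC("pentium_iii", "pentium3", 'H', "+cmov,+mmx,+sse")              \
  CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium3", "pentium_iii")

// Hypothetical lookup: each entry expands to one comparison, and the final
// return plays the role of .Default("") in the reviewed snippet.
inline std::string_view demoTuneName(std::string_view Name) {
#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES)                      \
  if (Name == NAME)                                                            \
    return TUNE_NAME;
#define CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME)                          \
  if (Name == NEW_NAME)                                                        \
    return TUNE_NAME;
  DEMO_CPU_LIST
#undef CPU_SPECIFIC
#undef CPU_SPECIFIC_ALIAS
  return "";
}
```

Because both macros take a `TUNE_NAME` argument, an alias resolves to its own tune name without first being resolved to the entry it aliases.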



Comment at: llvm/include/llvm/Support/X86TargetParser.def:236
+CPU_SPECIFIC("core_i7_sse4_2", "nehalem", 'P', 
"+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("core_aes_pclmulqdq", westmere", 'Q', 
"+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("atom_sse4_2_movbe", "silvermont", 'd', 
"+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")

Missed the left `"`?




[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman accepted this revision.
aaron.ballman added a comment.

LGTM, though I'm not qualified to review the CPU specific bits in the .def file.




[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Erich Keane via Phabricator via cfe-commits
erichkeane marked 4 inline comments as done.
erichkeane added inline comments.



Comment at: llvm/include/llvm/Support/X86TargetParser.def:236
+CPU_SPECIFIC("core_i7_sse4_2", "nehalem", 'P', 
"+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("core_aes_pclmulqdq", "icelake-client", 'Q', 
"+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("atom_sse4_2_movbe", "silvermont, 'd', 
"+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")

craig.topper wrote:
> core_aes_pclmulqdq is westmere
Thanks!




[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Erich Keane via Phabricator via cfe-commits
erichkeane updated this revision to Diff 414709.
erichkeane added a comment.

Update the `core_aes_pclmulqdq` tune name to `westmere`.


Files:
  clang/include/clang/Basic/TargetInfo.h
  clang/lib/Basic/Targets/X86.cpp
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGen/attr-cpuspecific-avx-abi.c
  clang/test/CodeGen/attr-cpuspecific.c
  llvm/include/llvm/Support/X86TargetParser.def

Index: llvm/include/llvm/Support/X86TargetParser.def
===
--- llvm/include/llvm/Support/X86TargetParser.def
+++ llvm/include/llvm/Support/X86TargetParser.def
@@ -211,47 +211,47 @@
 #undef X86_FEATURE
 
 #ifndef CPU_SPECIFIC
-#define CPU_SPECIFIC(NAME, MANGLING, FEATURES)
+#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES)
 #endif
 
 #ifndef CPU_SPECIFIC_ALIAS
-#define CPU_SPECIFIC_ALIAS(NEW_NAME, NAME)
+#define CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME)
 #endif
 
-CPU_SPECIFIC("generic", 'A', "")
-CPU_SPECIFIC("pentium", 'B', "")
-CPU_SPECIFIC("pentium_pro", 'C', "+cmov")
-CPU_SPECIFIC("pentium_mmx", 'D', "+mmx")
-CPU_SPECIFIC("pentium_ii", 'E', "+cmov,+mmx")
-CPU_SPECIFIC("pentium_iii", 'H', "+cmov,+mmx,+sse")
-CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium_iii")
-CPU_SPECIFIC("pentium_4", 'J', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_m", 'K', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_4_sse3", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
-CPU_SPECIFIC("core_2_duo_ssse3", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")
-CPU_SPECIFIC("core_2_duo_sse4_1", 'N', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1")
-CPU_SPECIFIC("atom", 'O', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+movbe")
-CPU_SPECIFIC("atom_sse4_2", 'c', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_i7_sse4_2", 'P', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_aes_pclmulqdq", 'Q', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("atom_sse4_2_movbe", 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("goldmont", 'i', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("sandybridge", 'R', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+avx")
-CPU_SPECIFIC_ALIAS("core_2nd_gen_avx", "sandybridge")
-CPU_SPECIFIC("ivybridge", 'S', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+f16c,+avx")
-CPU_SPECIFIC_ALIAS("core_3rd_gen_avx", "ivybridge")
-CPU_SPECIFIC("haswell", 'V', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC_ALIAS("core_4th_gen_avx", "haswell")
-CPU_SPECIFIC("core_4th_gen_avx_tsx", 'W', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC("broadwell", 'X', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC_ALIAS("core_5th_gen_avx", "broadwell")
-CPU_SPECIFIC("core_5th_gen_avx_tsx", 'Y', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC("knl", 'Z', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd")
-CPU_SPECIFIC_ALIAS("mic_avx512", "knl")
-CPU_SPECIFIC("skylake", 'b', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx,+mpx")
-CPU_SPECIFIC( "skylake_avx512", 'a', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512cd,+avx512bw,+avx512vl,+clwb")
-CPU_SPECIFIC("cannonlake", 'e', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512ifma,+avx512cd,+avx512bw,+avx512vl,+avx512vbmi")
-CPU_SPECIFIC("knm", 'j', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd,+avx5124fmaps,+avx5124vnniw,+avx512vpopcntdq")
+CPU_SPECIFIC("generic", "generic", 'A', "")
+CPU_SPECIFIC("pentium", "pentium", 'B', "")
+CPU_SPECIFIC("pentium_pro", "pentiumpro", 'C', "+cmov")
+CPU_SPECIFIC("pentium_mmx", "pentium-mmx", 'D', "+mmx")
+CPU_SPECIFIC("pentium_ii", "pentium2", 'E', "+cmov,+mmx")
+CPU_SPECIFIC("pentium_iii", "pentium3", 'H', "+cmov,+mmx,+sse")
+CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium3", "pentium_iii")
+CPU_SPECIFIC("pentium_4", "pentium4", 'J', "+cmov,+mmx,+sse,+sse2")
+CPU_SPECIFIC("pentium_m", "pentium-m", 'K', "+cmov,+mmx,+sse,+sse2")
+CPU_SPECIFIC("pentium_4_sse3", "prescott", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
+CPU_SPECIFIC("core_2_duo_ssse3", "core2", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")
+CPU_SPECIFIC("

[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: llvm/include/llvm/Support/X86TargetParser.def:236
+CPU_SPECIFIC("core_i7_sse4_2", "nehalem", 'P', 
"+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("core_aes_pclmulqdq", "icelake-client", 'Q', 
"+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("atom_sse4_2_movbe", "silvermont, 'd', 
"+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")

core_aes_pclmulqdq is westmere




[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.
This revision is now accepted and ready to land.

This looks good to me. Thanks for the patch!




[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Erich Keane via Phabricator via cfe-commits
erichkeane updated this revision to Diff 414699.
erichkeane added a comment.

Corrected the last few processor names, thanks to @andrew.w.kaylor and @pengfei.


Files:
  clang/include/clang/Basic/TargetInfo.h
  clang/lib/Basic/Targets/X86.cpp
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGen/attr-cpuspecific-avx-abi.c
  clang/test/CodeGen/attr-cpuspecific.c
  llvm/include/llvm/Support/X86TargetParser.def

Index: llvm/include/llvm/Support/X86TargetParser.def
===
--- llvm/include/llvm/Support/X86TargetParser.def
+++ llvm/include/llvm/Support/X86TargetParser.def
@@ -211,47 +211,47 @@
 #undef X86_FEATURE
 
 #ifndef CPU_SPECIFIC
-#define CPU_SPECIFIC(NAME, MANGLING, FEATURES)
+#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES)
 #endif
 
 #ifndef CPU_SPECIFIC_ALIAS
-#define CPU_SPECIFIC_ALIAS(NEW_NAME, NAME)
+#define CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME)
 #endif
 
-CPU_SPECIFIC("generic", 'A', "")
-CPU_SPECIFIC("pentium", 'B', "")
-CPU_SPECIFIC("pentium_pro", 'C', "+cmov")
-CPU_SPECIFIC("pentium_mmx", 'D', "+mmx")
-CPU_SPECIFIC("pentium_ii", 'E', "+cmov,+mmx")
-CPU_SPECIFIC("pentium_iii", 'H', "+cmov,+mmx,+sse")
-CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium_iii")
-CPU_SPECIFIC("pentium_4", 'J', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_m", 'K', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_4_sse3", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
-CPU_SPECIFIC("core_2_duo_ssse3", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")
-CPU_SPECIFIC("core_2_duo_sse4_1", 'N', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1")
-CPU_SPECIFIC("atom", 'O', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+movbe")
-CPU_SPECIFIC("atom_sse4_2", 'c', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_i7_sse4_2", 'P', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_aes_pclmulqdq", 'Q', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("atom_sse4_2_movbe", 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("goldmont", 'i', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("sandybridge", 'R', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+avx")
-CPU_SPECIFIC_ALIAS("core_2nd_gen_avx", "sandybridge")
-CPU_SPECIFIC("ivybridge", 'S', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+f16c,+avx")
-CPU_SPECIFIC_ALIAS("core_3rd_gen_avx", "ivybridge")
-CPU_SPECIFIC("haswell", 'V', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC_ALIAS("core_4th_gen_avx", "haswell")
-CPU_SPECIFIC("core_4th_gen_avx_tsx", 'W', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC("broadwell", 'X', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC_ALIAS("core_5th_gen_avx", "broadwell")
-CPU_SPECIFIC("core_5th_gen_avx_tsx", 'Y', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC("knl", 'Z', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd")
-CPU_SPECIFIC_ALIAS("mic_avx512", "knl")
-CPU_SPECIFIC("skylake", 'b', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx,+mpx")
-CPU_SPECIFIC( "skylake_avx512", 'a', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512cd,+avx512bw,+avx512vl,+clwb")
-CPU_SPECIFIC("cannonlake", 'e', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512ifma,+avx512cd,+avx512bw,+avx512vl,+avx512vbmi")
-CPU_SPECIFIC("knm", 'j', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd,+avx5124fmaps,+avx5124vnniw,+avx512vpopcntdq")
+CPU_SPECIFIC("generic", "generic", 'A', "")
+CPU_SPECIFIC("pentium", "pentium", 'B', "")
+CPU_SPECIFIC("pentium_pro", "pentiumpro", 'C', "+cmov")
+CPU_SPECIFIC("pentium_mmx", "pentium-mmx", 'D', "+mmx")
+CPU_SPECIFIC("pentium_ii", "pentium2", 'E', "+cmov,+mmx")
+CPU_SPECIFIC("pentium_iii", "pentium3", 'H', "+cmov,+mmx,+sse")
+CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium3", "pentium_iii")
+CPU_SPECIFIC("pentium_4", "pentium4", 'J', "+cmov,+mmx,+sse,+sse2")
+CPU_SPECIFIC("pentium_m", "pentium-m", 'K', "+cmov,+mmx,+sse,+sse2")
+CPU_SPECIFIC("pentium_4_sse3", "prescott", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
+CPU_SPECIFIC("core_2_duo_ssse3", "core2", 'M', "+cmov,+mmx,+sse,+sse2

[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Erich Keane via Phabricator via cfe-commits
erichkeane added inline comments.



Comment at: llvm/include/llvm/Support/X86TargetParser.def:230
+CPU_SPECIFIC("pentium_m", "pentium-m", 'K', "+cmov,+mmx,+sse,+sse2")
+CPU_SPECIFIC("pentium_4_sse3", "", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
+CPU_SPECIFIC("core_2_duo_ssse3", "", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")

Note the blanks on 230-232, 234-237, 245, and 248.  Otherwise, a double-check 
would be really appreciated from everyone familiar with the x86 naming.




[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Erich Keane via Phabricator via cfe-commits
erichkeane updated this revision to Diff 414650.
erichkeane added a comment.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Add a 'translation' feature to the x86 target so that we can get the 'tune cpu' 
name from the list.  Note that 9 entries are left blank because I was unable to 
figure out the corresponding name (I have an email out to @andrew.w.kaylor and 
@pengfei to tell me what they should be). In the meantime, these will result in 
NO tune-cpu.

Also note that I intentionally added this conversion to the 'alias' entries as 
well.  This gives us the power to use an alias to change the 'tune' if we care 
to.  Typically I'd consider this unimportant, but it means that the previously 
mentioned VendorA (@arsenm) could simply add their processors as aliases and 
get the tune feature more easily.
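
The "blank means NO tune-cpu" behaviour described above can be pictured with a small sketch. It is illustrative only: `AttrMap` and `applyTuneHint` are hypothetical names and a plain `std::map` stands in for a function's attribute set, so this is not the code in CodeGenModule.cpp.

```cpp
#include <map>
#include <string>
#include <string_view>

// Hypothetical stand-in for a function's string attributes.
using AttrMap = std::map<std::string, std::string>;

// Sets "tune-cpu" only when a tune name is available. An empty TuneName
// models one of the blank entries described above: the attribute is simply
// left out, so the function keeps whatever default tuning applies.
inline void applyTuneHint(AttrMap &Attrs, std::string_view TuneName) {
  if (!TuneName.empty())
    Attrs["tune-cpu"] = std::string(TuneName);
}
```

Treating a blank as "emit nothing" keeps the hint purely optional: a missing translation costs some tuning but cannot produce an invalid CPU name.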


Files:
  clang/include/clang/Basic/TargetInfo.h
  clang/lib/Basic/Targets/X86.cpp
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGen/attr-cpuspecific-avx-abi.c
  clang/test/CodeGen/attr-cpuspecific.c
  llvm/include/llvm/Support/X86TargetParser.def

Index: llvm/include/llvm/Support/X86TargetParser.def
===
--- llvm/include/llvm/Support/X86TargetParser.def
+++ llvm/include/llvm/Support/X86TargetParser.def
@@ -211,47 +211,47 @@
 #undef X86_FEATURE
 
 #ifndef CPU_SPECIFIC
-#define CPU_SPECIFIC(NAME, MANGLING, FEATURES)
+#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES)
 #endif
 
 #ifndef CPU_SPECIFIC_ALIAS
-#define CPU_SPECIFIC_ALIAS(NEW_NAME, NAME)
+#define CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME)
 #endif
 
-CPU_SPECIFIC("generic", 'A', "")
-CPU_SPECIFIC("pentium", 'B', "")
-CPU_SPECIFIC("pentium_pro", 'C', "+cmov")
-CPU_SPECIFIC("pentium_mmx", 'D', "+mmx")
-CPU_SPECIFIC("pentium_ii", 'E', "+cmov,+mmx")
-CPU_SPECIFIC("pentium_iii", 'H', "+cmov,+mmx,+sse")
-CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium_iii")
-CPU_SPECIFIC("pentium_4", 'J', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_m", 'K', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_4_sse3", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
-CPU_SPECIFIC("core_2_duo_ssse3", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")
-CPU_SPECIFIC("core_2_duo_sse4_1", 'N', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1")
-CPU_SPECIFIC("atom", 'O', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+movbe")
-CPU_SPECIFIC("atom_sse4_2", 'c', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_i7_sse4_2", 'P', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_aes_pclmulqdq", 'Q', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("atom_sse4_2_movbe", 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("goldmont", 'i', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("sandybridge", 'R', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+avx")
-CPU_SPECIFIC_ALIAS("core_2nd_gen_avx", "sandybridge")
-CPU_SPECIFIC("ivybridge", 'S', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+f16c,+avx")
-CPU_SPECIFIC_ALIAS("core_3rd_gen_avx", "ivybridge")
-CPU_SPECIFIC("haswell", 'V', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC_ALIAS("core_4th_gen_avx", "haswell")
-CPU_SPECIFIC("core_4th_gen_avx_tsx", 'W', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC("broadwell", 'X', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC_ALIAS("core_5th_gen_avx", "broadwell")
-CPU_SPECIFIC("core_5th_gen_avx_tsx", 'Y', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC("knl", 'Z', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd")
-CPU_SPECIFIC_ALIAS("mic_avx512", "knl")
-CPU_SPECIFIC("skylake", 'b', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx,+mpx")
-CPU_SPECIFIC( "skylake_avx512", 'a', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512cd,+avx512bw,+avx512vl,+clwb")
-CPU_SPECIFIC("cannonlake", 'e', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512ifma,+avx512cd,+avx512bw,+avx512vl,+avx512vbmi")
-CPU_SPECIFIC("knm", 'j', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd,+avx5124fmaps,+avx5124vnniw,+avx512vpopcntdq")
+CPU_SPECIFIC("generic", "

[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Erich Keane via Phabricator via cfe-commits
erichkeane added a comment.

Thanks all!  I'll do some work on populating a list of 'converted names', but 
I'll definitely need help from @pengfei and @andrew.w.kaylor checking the list 
and filling in what I miss.




[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a reviewer: arsenm.
aaron.ballman added a subscriber: arsenm.
aaron.ballman added a comment.
Herald added a subscriber: wdng.

Adding @arsenm because of this bit:

> Note that the 'valid' list of processors for x86 is in 
> llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list 
> contains only Intel processors, but other vendors may wish to add their own 
> entries as 'alias'es (or wiht different feature lists!).




[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

Typos in `wiht different feature lists` and `In the even that`.




Comment at: clang/lib/CodeGen/CodeGenModule.cpp:2067
+  // favor this processor.
+  TuneCPU = SD->getCPUName(GD.getMultiVersionIndex())->getName();
+}

erichkeane wrote:
> andrew.w.kaylor wrote:
> > Unfortunately, I don't think it's this easy. The list of names used for 
> > cpu_specific doesn't come from the same place as the list of names used by 
> > "tune-cpu". For one thing, the cpu_specific names can't contain the '-' 
> > character, so we have names like "skylake_avx512" in cpu_specific that 
> > would need to be translated to "skylake-avx512" for "tune-cpu". I believe 
> > the list of valid names for "tune-cpu" comes from here: 
> > https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Support/X86TargetParser.cpp#L294
> > 
> > Also, some of the aliases supported by cpu_specific don't have any 
> > corresponding "tune-cpu" name. You happen to have picked one of these for 
> > the test. I believe "core_4th_gen_avx" should map to "haswell".
> Hmm... this is unfortunate.  I wonder if we should add some kind of 
> 'translation' field to the X86TargetParser.def entries?  Any idea who the 
> right person to populate said list would be?
> I believe the list of valid names for "tune-cpu" comes from ...

I think it's here 
https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Target/X86/X86.td#L1408

So, back to Andy's problem: where did we previously consume the cpu_specific 
names in the compiler, e.g., to map them to different targets? Or is that done 
by external libraries like compiler-rt?

I think I have the same requirement, mapping between `-` and `_` for 
"tune-cpu", in https://github.com/llvm/llvm-project/issues/50125, where the 
preprocessor defines use `_` as well.




[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-10 Thread Erich Keane via Phabricator via cfe-commits
erichkeane added inline comments.



Comment at: clang/lib/CodeGen/CodeGenModule.cpp:2067
+  // favor this processor.
+  TuneCPU = SD->getCPUName(GD.getMultiVersionIndex())->getName();
+}

andrew.w.kaylor wrote:
> Unfortunately, I don't think it's this easy. The list of names used for 
> cpu_specific doesn't come from the same place as the list of names used by 
> "tune-cpu". For one thing, the cpu_specific names can't contain the '-' 
> character, so we have names like "skylake_avx512" in cpu_specific that would 
> need to be translated to "skylake-avx512" for "tune-cpu". I believe the list 
> of valid names for "tune-cpu" comes from here: 
> https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Support/X86TargetParser.cpp#L294
> 
> Also, some of the aliases supported by cpu_specific don't have any 
> corresponding "tune-cpu" name. You happen to have picked one of these for the 
> test. I believe "core_4th_gen_avx" should map to "haswell".
Hmm... this is unfortunate.  I wonder if we should add some 'translation' type 
field to the X86TargetParser.def entries?  Any idea who the right person to 
populate said list would be?




[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-10 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/CodeGen/CodeGenModule.cpp:2067
+  // favor this processor.
+  TuneCPU = SD->getCPUName(GD.getMultiVersionIndex())->getName();
+}

Unfortunately, I don't think it's this easy. The list of names used for 
cpu_specific doesn't come from the same place as the list of names used by 
"tune-cpu". For one thing, the cpu_specific names can't contain the '-' 
character, so we have names like "skylake_avx512" in cpu_specific that would 
need to be translated to "skylake-avx512" for "tune-cpu". I believe the list of 
valid names for "tune-cpu" comes from here: 
https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Support/X86TargetParser.cpp#L294

Also, some of the aliases supported by cpu_specific don't have any 
corresponding "tune-cpu" name. You happen to have picked one of these for the 
test. I believe "core_4th_gen_avx" should map to "haswell".
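
A minimal sketch of the translation described above, for illustration only. 
The helper name `cpuSpecificToTuneName` and the `TuneAliases` table are 
hypothetical and not part of clang; the table holds only the one alias named 
in this comment, and the blanket `_` to `-` replacement is an assumption that 
happens to work for "skylake_avx512" but may not hold for every name.

```cpp
#include <algorithm>
#include <string>
#include <string_view>

// Hypothetical alias entry: a cpu_specific spelling that has no identically
// named "tune-cpu" entry, paired with the tune name it should map to.
struct TuneAlias {
  std::string_view From;
  std::string_view To;
};

// Only the pair mentioned in this review; a real table would need more.
constexpr TuneAlias TuneAliases[] = {{"core_4th_gen_avx", "haswell"}};

// Hypothetical helper: translate a cpu_specific name to a "tune-cpu" name.
std::string cpuSpecificToTuneName(std::string_view Name) {
  // Explicit aliases take priority over the character substitution.
  for (const TuneAlias &A : TuneAliases)
    if (Name == A.From)
      return std::string(A.To);

  // Assumption: remaining names differ only in '_' versus '-'.
  std::string Result(Name);
  std::replace(Result.begin(), Result.end(), '_', '-');
  return Result;
}
```

With this sketch, "skylake_avx512" becomes "skylake-avx512", 
"core_4th_gen_avx" becomes "haswell", and a name such as "ivybridge" passes 
through unchanged.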



Comment at: clang/test/CodeGen/attr-cpuspecific-avx-abi.c:28
 // CHECK: attributes #[[V]] = {{.*}}"target-features"="+avx,+avx2,+bmi,+cmov,+crc32,+cx8,+f16c,+fma,+lzcnt,+mmx,+movbe,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="core_4th_gen_avx"

As noted above, this isn't a valid setting for "tune-cpu". I think it would 
just be ignored.
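
One way to avoid emitting a name that would be ignored is to keep the 
existing tune CPU unless the candidate is known to be valid. The sketch below 
is an assumption-laden model, not clang code: `KnownTuneNames` is a small 
stand-in list built from names that appear in this review, and 
`isKnownTuneName` and `pickTuneCPU` are hypothetical helpers. In clang itself 
the check would presumably go through something like 
`getTarget().isValidCPUName(...)`, which the patch context already uses.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <string_view>

// Stand-in for the target's list of valid CPU names (illustrative subset).
constexpr std::string_view KnownTuneNames[] = {"generic", "ivybridge",
                                               "haswell", "knl", "atom"};

// Hypothetical validity check against the stand-in list.
bool isKnownTuneName(std::string_view Name) {
  return std::find(std::begin(KnownTuneNames), std::end(KnownTuneNames),
                   Name) != std::end(KnownTuneNames);
}

// Use Candidate as the tune CPU only if it is known; otherwise keep Current.
std::string pickTuneCPU(std::string_view Current, std::string_view Candidate) {
  return std::string(isKnownTuneName(Candidate) ? Candidate : Current);
}
```

Under this model, a candidate of "core_4th_gen_avx" leaves "generic" in 
place, while "haswell" replaces it.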




[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-10 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

This example illustrates the problem this patch intends to fix: 
https://godbolt.org/z/j445sxPMc

For Intel microarchitectures before Skylake, the LLVM cost model says that 
vector fsqrt is slow, so if fast-math is enabled, we'll use an approximation 
rather than the vsqrtps instruction when vectorizing a call to sqrtf(). If the 
code is compiled with -march=skylake or -mtune=skylake, we'll choose the 
vsqrtps instruction, but with any earlier base target, we'll choose the 
approximation even if there is a cpu_specific(skylake) implementation in the 
source code.

For example

  __attribute__((cpu_specific(skylake))) void foo(void) {
    for (int i = 0; i < 8; ++i)
      x[i] = sqrtf(y[i]);
  }

compiles to

  foo.b:
  vmovaps ymm0, ymmword ptr [rip + y]
  vrsqrtps ymm1, ymm0
  vmulps  ymm2, ymm0, ymm1
  vbroadcastss ymm3, dword ptr [rip + .LCPI2_0] # ymm3 = [-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0]
  vfmadd231ps ymm3, ymm2, ymm1 # ymm3 = (ymm2 * ymm1) + ymm3
  vbroadcastss ymm1, dword ptr [rip + .LCPI2_1] # ymm1 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
  vmulps  ymm1, ymm2, ymm1
  vmulps  ymm1, ymm1, ymm3
  vbroadcastss ymm2, dword ptr [rip + .LCPI2_2] # ymm2 = [NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN]
  vandps  ymm0, ymm0, ymm2
  vbroadcastss ymm2, dword ptr [rip + .LCPI2_3] # ymm2 = [1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38]
  vcmpleps ymm0, ymm2, ymm0
  vandps  ymm0, ymm0, ymm1
  vmovaps ymmword ptr [rip + x], ymm0
  vzeroupper
  ret

but it should compile to

  foo.b:
  vsqrtps ymm0, ymmword ptr [rip + y]
  vmovaps ymmword ptr [rip + x], ymm0
  vzeroupper
  ret
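
For readers decoding the longer sequence above: it reads as a reciprocal 
square root estimate followed by one Newton-Raphson refinement step, with 
inputs smaller in magnitude than FLT_MIN forced to zero. The scalar C++ model 
below is only a sketch of one vector lane. The function name `refinedSqrt` is 
hypothetical, the `estimate` parameter stands in for the hardware `vrsqrtps` 
result, and reading the `[NaN,...]` constant as a sign-clearing mask is an 
interpretation of the disassembly rather than something stated in the thread.

```cpp
#include <cfloat>
#include <cmath>

// Scalar model of one lane of the approximation sequence.
// `estimate` plays the role of the vrsqrtps output, roughly 1/sqrt(y).
float refinedSqrt(float y, float estimate) {
  float yr = y * estimate;                  // vmulps: y * r
  float t = std::fma(yr, estimate, -3.0f);  // vfmadd231ps: y*r*r - 3
  float h = yr * -0.5f;                     // vmulps: -0.5 * y * r
  float result = h * t;                     // vmulps: 0.5*y*r*(3 - y*r*r)
  // vandps / vcmpleps / vandps: keep the result only when |y| >= FLT_MIN,
  // otherwise return 0 (this covers y == 0, where the estimate is infinite).
  return std::fabs(y) >= FLT_MIN ? result : 0.0f;
}
```

With an exact estimate the step is a fixed point: `refinedSqrt(4.0f, 0.5f)` 
yields 2. With a slightly wrong estimate such as 0.49f the result moves 
closer to 2 than the unrefined product `y * estimate` would be.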




[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-10 Thread Erich Keane via Phabricator via cfe-commits
erichkeane added a comment.

@aaron.ballman : if you can add other reviewers or subscribers (particularly 
those from "VendorA") it would be greatly appreciated!




[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-10 Thread Erich Keane via Phabricator via cfe-commits
erichkeane created this revision.
erichkeane added a reviewer: aaron.ballman.
Herald added a subscriber: pengfei.
Herald added a project: All.
erichkeane requested review of this revision.

Due to various implementation constraints, despite the programmer
choosing a 'processor', cpu_dispatch/cpu_specific needs to use the
'feature' list of a processor to identify it.  This results in the
processor identified in source code not being propagated to the
optimizer, and thus the optimizer cannot tune for it.

This patch changes to use the actual cpu as written for tune-cpu so that
opt can make decisions based on the cpu-as-spelled, which should better
match the behavior expected by the programmer.

Note that the 'valid' list of processors for x86 is in
llvm/include/llvm/Support/X86TargetParser.def.  At the moment, this list
contains only Intel processors, but other vendors may wish to add their
own entries as 'alias'es (or wiht different feature lists!).

If this is not done, there are two potential performance issues with the
patch, but I believe them to be worth it in light of the improvements to
behavior and performance.

1- In the event that the user spelled "ProcessorB", but we only have the
features available to test for "ProcessorA" (where A is B minus features),
AND there is an optimization opportunity for "B" that negatively affects
"A", the optimizer will likely choose to do so.

2- In the even that the user spelled VendorI's processor, and the feature
list allows it to run on VendorA's processor of similar features, AND there
is an optimization opportunity for VendorIs that negatively affects "A"s,
the optimizer will likely choose to do so.  This can be fixed by adding an
alias to X86TargetParser.def.


https://reviews.llvm.org/D121410

Files:
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGen/attr-cpuspecific-avx-abi.c
  clang/test/CodeGen/attr-cpuspecific.c


Index: clang/test/CodeGen/attr-cpuspecific.c
===
--- clang/test/CodeGen/attr-cpuspecific.c
+++ clang/test/CodeGen/attr-cpuspecific.c
@@ -340,5 +340,8 @@
 void OrderDispatchUsageSpecific(void) {}
 
 // CHECK: attributes #[[S]] = {{.*}}"target-features"="+avx,+cmov,+crc32,+cx8,+f16c,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="ivybridge"
 // CHECK: attributes #[[K]] = {{.*}}"target-features"="+adx,+avx,+avx2,+avx512cd,+avx512er,+avx512f,+avx512pf,+bmi,+cmov,+crc32,+cx8,+f16c,+fma,+lzcnt,+mmx,+movbe,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="knl"
 // CHECK: attributes #[[O]] = {{.*}}"target-features"="+cmov,+cx8,+mmx,+movbe,+sse,+sse2,+sse3,+ssse3,+x87"
+// CHECK-SAME: "tune-cpu"="atom"
Index: clang/test/CodeGen/attr-cpuspecific-avx-abi.c
===
--- clang/test/CodeGen/attr-cpuspecific-avx-abi.c
+++ clang/test/CodeGen/attr-cpuspecific-avx-abi.c
@@ -23,4 +23,6 @@
 // CHECK: define{{.*}} @foo.V() #[[V:[0-9]+]]
 
 // CHECK: attributes #[[A]] = {{.*}}"target-features"="+avx,+crc32,+cx8,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="generic"
 // CHECK: attributes #[[V]] = {{.*}}"target-features"="+avx,+avx2,+bmi,+cmov,+crc32,+cx8,+f16c,+fma,+lzcnt,+mmx,+movbe,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="core_4th_gen_avx"
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -2060,6 +2060,12 @@
           getTarget().isValidCPUName(ParsedAttr.Tune))
         TuneCPU = ParsedAttr.Tune;
     }
+
+    if (SD) {
+      // Apply the given CPU name as the 'tune-cpu' so that the optimizer can
+      // favor this processor.
+      TuneCPU = SD->getCPUName(GD.getMultiVersionIndex())->getName();
+    }
   } else {
     // Otherwise just add the existing target cpu and target features to the
     // function.


Index: clang/test/CodeGen/attr-cpuspecific.c
===
--- clang/test/CodeGen/attr-cpuspecific.c
+++ clang/test/CodeGen/attr-cpuspecific.c
@@ -340,5 +340,8 @@
 void OrderDispatchUsageSpecific(void) {}
 
 // CHECK: attributes #[[S]] = {{.*}}"target-features"="+avx,+cmov,+crc32,+cx8,+f16c,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="ivybridge"
 // CHECK: attributes #[[K]] = {{.*}}"target-features"="+adx,+avx,+avx2,+avx512cd,+avx512er,+avx512f,+avx512pf,+bmi,+cmov,+crc32,+cx8,+f16c,+fma,+lzcnt,+mmx,+movbe,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="knl"
 // CHECK: attributes #[[O]] = {{.*}}"target-features"="+cmov,+cx8,+mmx,+movbe,+sse,+sse2,+sse3,+ssse3,+x87"
+// CHECK-SAME: "tune-cpu"="atom"
Index: clang/test/CodeGen/attr-cpuspecific-avx-abi.c
==