Re: [RFC, x86] Changes for AVX and AVX2 processors
I've fixed the ChangeLog. Can we commit the patch to trunk now? 2012-12-27 Vladimir Yakovlev vladimir.b.yakov...@intel.com * config/i386/i386-c.c (ix86_target_macros_internal): New case. (ix86_target_macros_internal): Likewise. * config/i386/i386.c (m_CORE2I7): Removed. (m_CORE_HASWELL): New macro. (m_CORE_ALL): Likewise. (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL. (initial_ix86_arch_features): Likewise. (processor_target_table): Initializations for Core avx2. (cpu_names): New names core-avx2. (ix86_option_override_internal): Changed PROCESSOR_COREI7 by PROCESSOR_CORE_HASWELL. (ix86_issue_rate): New case. (ia32_multipass_dfa_lookahead): Likewise. (ix86_sched_init_global): Likewise. * config/i386/i386.h (TARGET_HASWELL): New macro. (target_cpu_default): New TARGET_CPU_DEFAULT_haswell. (processor_type): New PROCESSOR_HASWELL. 2013/1/10 Jakub Jelinek ja...@redhat.com: On Thu, Jan 10, 2013 at 12:28:24PM +0100, Uros Bizjak wrote: On Thu, Jan 10, 2013 at 12:12 PM, Vladimir Yakovlev vbyakov...@gmail.com wrote: It seems I didn't send a patch with the last changes. Sorry if so. Vladimir 2012-12-27 Vladimir Yakovlev vladimir.b.yakov...@intel.com Missing '>' at the end of line. * config/i386/i386-c.c (ix86_target_macros_internal): New case. (ix86_target_macros_internal): Likewise. There is some additional space at the beginning of this line (note, all ChangeLog lines but the one with the date should be tab indented, not space indented). Jakub
Re: [RFC, x86] Changes for AVX and AVX2 processors
I did send the patch, but here it is once more. 2013/1/11 Jakub Jelinek ja...@redhat.com: On Fri, Jan 11, 2013 at 03:25:47PM +0400, Vladimir Yakovlev wrote: I've fixed the ChangeLog. Can we commit the patch to trunk now? 2012-12-27 Vladimir Yakovlev vladimir.b.yakov...@intel.com * config/i386/i386-c.c (ix86_target_macros_internal): New case. (ix86_target_macros_internal): Likewise. * config/i386/i386.c (m_CORE2I7): Removed. (m_CORE_HASWELL): New macro. (m_CORE_ALL): Likewise. (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL. (initial_ix86_arch_features): Likewise. (processor_target_table): Initializations for Core avx2. (cpu_names): New names core-avx2. (ix86_option_override_internal): Changed PROCESSOR_COREI7 by PROCESSOR_CORE_HASWELL. (ix86_issue_rate): New case. (ia32_multipass_dfa_lookahead): Likewise. (ix86_sched_init_global): Likewise. * config/i386/i386.h (TARGET_HASWELL): New macro. (target_cpu_default): New TARGET_CPU_DEFAULT_haswell. (processor_type): New PROCESSOR_HASWELL. Uros already acked the patch, so it certainly is OK to commit now. Jakub [Attachment: patch1]
Re: [RFC, x86] Changes for AVX and AVX2 processors
Kirill, could you commit the patch? 2013-01-11 Vladimir Yakovlev vladimir.b.yakov...@intel.com * config/i386/i386-c.c (ix86_target_macros_internal): New case. (ix86_target_macros_internal): Likewise. * config/i386/i386.c (m_CORE2I7): Removed. (m_CORE_HASWELL): New macro. (m_CORE_ALL): Likewise. (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL. (initial_ix86_arch_features): Likewise. (processor_target_table): Initializations for Core avx2. (cpu_names): New names core-avx2. (ix86_option_override_internal): Changed PROCESSOR_COREI7 by PROCESSOR_CORE_HASWELL. (ix86_issue_rate): New case. (ia32_multipass_dfa_lookahead): Likewise. (ix86_sched_init_global): Likewise. * config/i386/i386.h (TARGET_HASWELL): New macro. (target_cpu_default): New TARGET_CPU_DEFAULT_haswell. (processor_type): New PROCESSOR_HASWELL. 2013/1/11 Uros Bizjak ubiz...@gmail.com: On Fri, Jan 11, 2013 at 1:14 PM, Vladimir Yakovlev vbyakov...@gmail.com wrote: I did send the patch, but here it is once more. 2013/1/11 Jakub Jelinek ja...@redhat.com: On Fri, Jan 11, 2013 at 03:25:47PM +0400, Vladimir Yakovlev wrote: I've fixed the ChangeLog. Can we commit the patch to trunk now? 2012-12-27 Vladimir Yakovlev vladimir.b.yakov...@intel.com * config/i386/i386-c.c (ix86_target_macros_internal): New case. (ix86_target_macros_internal): Likewise. * config/i386/i386.c (m_CORE2I7): Removed. (m_CORE_HASWELL): New macro. (m_CORE_ALL): Likewise. (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL. (initial_ix86_arch_features): Likewise. (processor_target_table): Initializations for Core avx2. (cpu_names): New names core-avx2. (ix86_option_override_internal): Changed PROCESSOR_COREI7 by PROCESSOR_CORE_HASWELL. (ix86_issue_rate): New case. (ia32_multipass_dfa_lookahead): Likewise. (ix86_sched_init_global): Likewise. * config/i386/i386.h (TARGET_HASWELL): New macro. (target_cpu_default): New TARGET_CPU_DEFAULT_haswell. (processor_type): New PROCESSOR_HASWELL. 
Uros already acked the patch, so it certainly is OK to commit now. Yes, the patch is OK, you can commit it to mainline SVN. If you are unable to commit, please say so in the patch proposal, so someone will commit the patch for you (as explained in [1]). [1] http://gcc.gnu.org/contribute.html Uros. [Attachment: patch1]
Re: [RFC, x86] Changes for AVX and AVX2 processors
Hello Uros, It seems I didn't send a patch with the last changes. Sorry if so. Vladimir 2012-12-27 Vladimir Yakovlev vladimir.b.yakov...@intel.com * config/i386/i386-c.c (ix86_target_macros_internal): New case. (ix86_target_macros_internal): Likewise. * config/i386/i386.c (m_CORE2I7): Removed. (m_CORE_HASWELL): New macro. (m_CORE_ALL): Likewise. (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL. (initial_ix86_arch_features): Likewise. (processor_target_table): Initializations for Core avx2. (cpu_names): New names core-avx2. (ix86_option_override_internal): Changed PROCESSOR_COREI7 by PROCESSOR_CORE_HASWELL. (ix86_issue_rate): New case. (ia32_multipass_dfa_lookahead): Likewise. (ix86_sched_init_global): Likewise. * config/i386/i386.h (TARGET_HASWELL): New macro. (target_cpu_default): New TARGET_CPU_DEFAULT_haswell. (processor_type): New PROCESSOR_HASWELL. 2012/12/30 Uros Bizjak ubiz...@gmail.com: On Sun, Dec 30, 2012 at 5:05 PM, Vladimir Yakovlev vbyakov...@gmail.com wrote: I fixed typos and added a ChangeLog. 2012-12-27 Vladimir Yakovlev vladimir.b.yakov...@intel.com * config/i386/i386-c.c (ix86_target_macros_internal): New case. (ix86_target_macros_internal): Likewise. * config/i386/i386.c (m_CORE2I7): Removed. (m_CORE_HASWELL): New macro. (m_CORE_ALL): Likewise. (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL. (initial_ix86_arch_features): Likewise. (processor_target_table): Initializations for Core avx2. (cpu_names): New names core-avx2. (ix86_option_override_internal): Changed PROCESSOR_COREI7 by PROCESSOR_CORE_HASWELL. (ix86_issue_rate): New case. (ia32_multipass_dfa_lookahead): Likewise. (ix86_sched_init_global): Likewise. (get_builtin_code_for_version): Likewise. * config/i386/i386.h (TARGET_HASWELL): New macro. (target_cpu_default): New TARGET_CPU_DEFAULT_haswell. (processor_type): New PROCESSOR_HASWELL. 
Please remove this part; it should be part of the processor dispatcher patch:
@@ -28705,6 +28712,10 @@ get_builtin_code_for_version (tree decl, tree *predicate_list)
           arg_str = "corei7";
           priority = P_PROC_SSE4_2;
           break;
+        case PROCESSOR_HASWELL:
+          arg_str = "core-avx2";
+          priority = P_PROC_SSE4_2;
+          break;
         case PROCESSOR_ATOM:
           arg_str = "atom";
           priority = P_PROC_SSSE3;
Uros. [Attachment: patch1]
Re: [RFC, x86] Changes for AVX and AVX2 processors
I fixed typos and added a ChangeLog. 2012-12-27 Vladimir Yakovlev vladimir.b.yakov...@intel.com * config/i386/i386-c.c (ix86_target_macros_internal): New case. (ix86_target_macros_internal): Likewise. * config/i386/i386.c (m_CORE2I7): Removed. (m_CORE_HASWELL): New macro. (m_CORE_ALL): Likewise. (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL. (initial_ix86_arch_features): Likewise. (processor_target_table): Initializations for Core avx2. (cpu_names): New names core-avx2. (ix86_option_override_internal): Changed PROCESSOR_COREI7 by PROCESSOR_CORE_HASWELL. (ix86_issue_rate): New case. (ia32_multipass_dfa_lookahead): Likewise. (ix86_sched_init_global): Likewise. (get_builtin_code_for_version): Likewise. * config/i386/i386.h (TARGET_HASWELL): New macro. (target_cpu_default): New TARGET_CPU_DEFAULT_haswell. (processor_type): New PROCESSOR_HASWELL. 2012/12/30 Uros Bizjak ubiz...@gmail.com: On Sat, Dec 29, 2012 at 5:57 PM, Vladimir Yakovlev vbyakov...@gmail.com wrote: I made the changes. Please take a look. 2012/12/29, Uros Bizjak ubiz...@gmail.com: On Sat, Dec 29, 2012 at 6:26 AM, Vladimir Yakovlev vbyakov...@gmail.com wrote: processor_alias_table assigns the same processor type to corei7, corei7-avx, core-avx-i and core-avx2. One consequence is that checking x86_avx256_split_unaligned_load & ix86_tune_mask gives the same result for all of these processors. Moreover, we cannot turn on new features for AVX/AVX2 using initial_ix86_tune_features. corei7, corei7-avx and core-avx-i are all based on the sandybridge (= PROCESSOR_COREI7) architecture. The only problematic entry is core-avx2, which should be based on a new architecture. I propose PROCESSOR_HASWELL, in the same way as we have PROCESSOR_NOCONA.
@@ -2467,6 +2470,7 @@
   "nocona",
   "core2",
   "corei7",
+  "coreavx2",
   "atom",
   "geode",
   "k6",
This string should match the processor_alias_table name, so "core-avx2".
@@ -28709,6 +28716,10 @@
           arg_str = "corei7";
           priority = P_PROC_SSE4_2;
           break;
+        case PROCESSOR_HASWELL:
+          arg_str = "core_avx2";
+          priority = P_PROC_SSE4_2;
+          break;
         case PROCESSOR_ATOM:
           arg_str = "atom";
           priority = P_PROC_SSSE3;
This is part of the processor dispatcher functionality. To support this functionality, some more changes are needed, so it is IMO best to leave this part out for now. I would also like the author of the processor dispatcher to review changes in this area. On a related note, it looks to me that corei7 should declare P_PROC_AVX here (this change should be part of another patch). Other than that, the patch looks OK, but please repost the final version with a correct ChangeLog. Uros. [Attachment: patch]
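The hunks above assign each processor case an `arg_str` and a `priority`, and Uros's remark about P_PROC_AVX only matters because the dispatcher prefers higher-priority versions. As a rough illustration, here is a simplified C model (not GCC's generated dispatcher): among the function versions whose CPU check succeeds, it picks the one with the highest priority. The enum values, `struct version` and `pick_version` are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical priority ordering: a higher value is preferred. */
enum feature_priority { P_ZERO = 0, P_PROC_SSSE3, P_PROC_SSE4_2, P_PROC_AVX };

struct version {
  const char *arg_str;              /* e.g. "corei7", "atom" */
  enum feature_priority priority;
  int supported;                    /* result of the runtime CPU check */
};

/* Return the supported version with the highest priority, or NULL.
   On a tie, the earlier entry wins, so two processors that share a
   priority cannot be told apart by this rule alone. */
const struct version *pick_version (const struct version *v, size_t n)
{
  assert (v != NULL || n == 0);
  const struct version *best = NULL;
  for (size_t i = 0; i < n; i++)
    if (v[i].supported && (best == NULL || v[i].priority > best->priority))
      best = &v[i];
  return best;
}
```

In this model, giving an AVX-capable processor the same P_PROC_SSE4_2 priority as corei7 means the AVX version never outranks the SSE4.2 one, which is one way to read Uros's note that the priorities need revisiting.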
[RFC, x86] Changes for AVX and AVX2 processors
I made the changes. Please take a look. 2012/12/29, Uros Bizjak ubiz...@gmail.com: On Sat, Dec 29, 2012 at 6:26 AM, Vladimir Yakovlev vbyakov...@gmail.com wrote: processor_alias_table assigns the same processor type to corei7, corei7-avx, core-avx-i and core-avx2. One consequence is that checking x86_avx256_split_unaligned_load & ix86_tune_mask gives the same result for all of these processors. Moreover, we cannot turn on new features for AVX/AVX2 using initial_ix86_tune_features. corei7, corei7-avx and core-avx-i are all based on the sandybridge (= PROCESSOR_COREI7) architecture. The only problematic entry is core-avx2, which should be based on a new architecture. I propose PROCESSOR_HASWELL, in the same way as we have PROCESSOR_NOCONA. Uros.
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 08e1afe..2d8abd5 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -142,11 +142,7 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__corei7");
       def_or_undef (parse_in, "__corei7__");
       break;
-    case PROCESSOR_CORE_AVX:
-      def_or_undef (parse_in, "__core_avx");
-      def_or_undef (parse_in, "__core_avx__");
-      break;
-    case PROCESSOR_CORE_AVX2:
+    case PROCESSOR_HASWELL:
       def_or_undef (parse_in, "__core_avx2");
       def_or_undef (parse_in, "__core_avx2__");
       break;
@@ -240,10 +236,7 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     case PROCESSOR_COREI7:
       def_or_undef (parse_in, "__tune_corei7__");
       break;
-    case PROCESSOR_CORE_AVX:
-      def_or_undef (parse_in, "__tune_core_avx__");
-      break;
-    case PROCESSOR_CORE_AVX2:
+    case PROCESSOR_HASWELL:
       def_or_undef (parse_in, "__tune_core_avx2__");
       break;
     case PROCESSOR_ATOM:
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 10411da..4adbef6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1732,9 +1732,8 @@ const struct processor_costs *ix86_cost = &pentium_cost;
 #define m_P4_NOCONA (m_PENT4 | m_NOCONA)
 #define m_CORE2 (1<<PROCESSOR_CORE2)
 #define m_COREI7 (1<<PROCESSOR_COREI7)
-#define m_CORE_AVX (1<<PROCESSOR_CORE_AVX)
-#define m_CORE_AVX2 (1<<PROCESSOR_CORE_AVX2)
-#define m_CORE_ALL (m_CORE2 | m_COREI7 | m_CORE_AVX | m_CORE_AVX2)
+#define m_HASWELL (1<<PROCESSOR_HASWELL)
+#define m_CORE_ALL (m_CORE2 | m_COREI7 | m_HASWELL)

 #define m_ATOM (1<<PROCESSOR_ATOM)
 #define m_GEODE (1<<PROCESSOR_GEODE)
@@ -2438,8 +2437,6 @@ static const struct ptt processor_target_table[PROCESSOR_max] =
   {&core_cost, 16, 10, 16, 10, 16},
   /* Core i7 */
   {&core_cost, 16, 10, 16, 10, 16},
-  /* Core avx */
-  {&core_cost, 16, 10, 16, 10, 16},
   /* Core avx2 */
   {&core_cost, 16, 10, 16, 10, 16},
   {&generic32_cost, 16, 7, 16, 7, 16},
@@ -2469,7 +2466,6 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
   "nocona",
   "core2",
   "corei7",
-  "coreavx",
   "coreavx2",
   "atom",
   "geode",
@@ -2912,17 +2908,17 @@ ix86_option_override_internal (bool main_args_p)
       {"corei7", PROCESSOR_COREI7, CPU_COREI7,
        PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_CX16 | PTA_FXSR},
-      {"corei7-avx", PROCESSOR_CORE_AVX, CPU_COREI7,
+      {"corei7-avx", PROCESSOR_COREI7, CPU_COREI7,
        PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
-      {"core-avx-i", PROCESSOR_CORE_AVX, CPU_COREI7,
+      {"core-avx-i", PROCESSOR_COREI7, CPU_COREI7,
        PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE | PTA_RDRND | PTA_F16C | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
-      {"core-avx2", PROCESSOR_CORE_AVX2, CPU_COREI7,
+      {"core-avx2", PROCESSOR_HASWELL, CPU_COREI7,
        PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_AVX2 | PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
@@ -24069,8 +24065,7 @@ ix86_issue_rate (void)
     case PROCESSOR_PENTIUM4:
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
-    case PROCESSOR_CORE_AVX:
-    case PROCESSOR_CORE_AVX2:
+    case PROCESSOR_HASWELL:
     case PROCESSOR_ATHLON:
     case PROCESSOR_K8:
     case PROCESSOR_AMDFAM10:
@@ -24327,8 +24322,7 @@ ia32_multipass_dfa_lookahead (void)

     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
-    case PROCESSOR_CORE_AVX:
-    case PROCESSOR_CORE_AVX2:
+    case PROCESSOR_HASWELL:
     case PROCESSOR_ATOM:
       /* Generally, we want haifa-sched:max_issue() to look ahead as far
          as many instructions can be executed on a cycle, i.e.,
@@ -24873,8 +24867,7 @@ ix86_sched_init_global (FILE *dump ATTRIBUTE_UNUSED,
     {
     case PROCESSOR_CORE2:
     case
Re: [RFC, x86] Changes for AVX and AVX2 processors
Hello, processor_alias_table assigns the same processor type to corei7, corei7-avx, core-avx-i and core-avx2. One consequence is that checking x86_avx256_split_unaligned_load & ix86_tune_mask gives the same result for all of these processors. Moreover, we cannot turn on new features for AVX/AVX2 using initial_ix86_tune_features. 2012/12/28 Uros Bizjak ubiz...@gmail.com: Hello! New processors core-avx and core-avx2 are added. This was done to make it possible to turn on new features for these processors. Please review. I don't think this is a good approach; you are mixing an architecture with an ISA extension in the name. We already have processor_alias_table, where processor architecture and features (extensions) can be activated, depending on the name. Uros.
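To make the tuning-mask argument concrete, here is a minimal, self-contained C model (simplified, not GCC code). Each tuning feature is a bitmask over processor enum values, in the style of the `m_COREI7 (1<<PROCESSOR_COREI7)` macros, and a -mtune name is first mapped to a processor type through an alias table. A mask can only distinguish two names if they map to different processor types, which is why giving core-avx2 its own PROCESSOR_HASWELL entry matters. The reduced enum, the table contents and the `tune_avx256_split_unaligned_load` mask value are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reduced processor enum (assumption; the real one is in i386.h). */
enum processor_type {
  PROCESSOR_CORE2,
  PROCESSOR_COREI7,
  PROCESSOR_HASWELL,      /* the new entry proposed in the thread */
  PROCESSOR_max
};

/* One bit per processor, mirroring the m_* macros in i386.c. */
#define m_CORE2    (1u << PROCESSOR_CORE2)
#define m_COREI7   (1u << PROCESSOR_COREI7)
#define m_HASWELL  (1u << PROCESSOR_HASWELL)
#define m_CORE_ALL (m_CORE2 | m_COREI7 | m_HASWELL)

/* Hypothetical tuning mask: on for Core i7, off for Haswell. */
static const unsigned tune_avx256_split_unaligned_load = m_COREI7;

struct alias { const char *name; enum processor_type proc; };

/* Alias table after the patch: only core-avx2 gets its own type. */
static const struct alias alias_table[] = {
  { "core2",      PROCESSOR_CORE2 },
  { "corei7",     PROCESSOR_COREI7 },
  { "corei7-avx", PROCESSOR_COREI7 },
  { "core-avx-i", PROCESSOR_COREI7 },
  { "core-avx2",  PROCESSOR_HASWELL },
};

/* Map a -mtune name to a processor type, or PROCESSOR_max if unknown. */
enum processor_type lookup_processor (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < sizeof alias_table / sizeof alias_table[0]; i++)
    if (strcmp (alias_table[i].name, name) == 0)
      return alias_table[i].proc;
  return PROCESSOR_max;
}

/* A feature is enabled when the processor's bit is set in its mask. */
bool tune_feature_enabled (unsigned mask, const char *name)
{
  enum processor_type proc = lookup_processor (name);
  return proc != PROCESSOR_max && (mask & (1u << proc)) != 0;
}
```

If "core-avx2" still mapped to PROCESSOR_COREI7 in this table, `tune_feature_enabled` would return the same answer for it as for "corei7-avx" no matter how the mask is written.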
Fwd: [off-list] Re: [PATCH] Vzeroupper placement/47440
-- Forwarded message -- From: Vladimir Yakovlev vbyakov...@gmail.com Date: 2012/11/9 Subject: Re: [off-list] Re: [PATCH] Vzeroupper placement/47440 To: Uros Bizjak ubiz...@gmail.com Cc: H.J. Lu hjl.to...@gmail.com, Igor Zamyatin izamya...@gmail.com I made changes that move vzeroupper insertion after reload. 2012-11-09 Vladimir Yakovlev vladimir.b.yakov...@intel.com * i386/i386-protos.h (ix86_avx256_optimize_mode_switching): New. * config/i386/i386.c (ix86_init_machine_status): Deleted initialization for mode switching. * i386/i386.h (OPTIMIZE_MODE_SWITCHING1): New. * mode-switching.c (gate_mode_switching1): New. (rest_of_handle_mode_switching1): New. (pass_mode_switching1): New. * passes.c (init_optimization_passes): New pass pass_mode_switching1. * tree-pass.h (pass_mode_switching1): New. But this caused assertion failures in rtl_verify_flow_info_1 () at cfgrtl.c:2291: fatal_insn ("flow control insn inside a basic block", x); The asserts are reached through two calls in mode-switching.c: commit_edge_insertions and cleanup_cfg. After I commented them out (see below), the 459.GemsFDTD benchmark passed. What is your opinion of the patch, and how can we deal with the asserts? Regards, Vladimir
--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -747,7 +747,7 @@ optimize_mode_switching (void)
   commit_edge_insertions ();
 #if defined (MODE_ENTRY) && defined (MODE_EXIT)
-  cleanup_cfg (CLEANUP_NO_INSN_DEL);
+  /*cleanup_cfg (CLEANUP_NO_INSN_DEL);*/
 #else
   if (!need_commit && !emitted)
     return 0;
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -1828,7 +1828,7 @@ commit_edge_insertions (void)
   basic_block bb;
 #ifdef ENABLE_CHECKING
-  verify_flow_info ();
+  /*verify_flow_info ();*/
 #endif
2012/11/9 Uros Bizjak ubiz...@gmail.com: On Thu, Nov 8, 2012 at 6:52 PM, Uros Bizjak ubiz...@gmail.com wrote: Uh, this is a spill around the call insn, produced by reload.
Please compile this code:
double test (double a)
{
  printf ("Hello\n");
  return a;
}
You will get at mode switching:
    1 NOTE_INSN_DELETED
    4 NOTE_INSN_BASIC_BLOCK
    2 r60:DF=xmm0:DF
      REG_DEAD: xmm0:DF
    3 NOTE_INSN_FUNCTION_BEG
    6 di:DI=`*.LC0'
    7 call ...
      REG_DEAD: di:DI
      REG_UNUSED: ax:SI
   12 xmm0:DF=r60:DF
      REG_DEAD: r60:DF
   15 use xmm0:DF
But reload will insert:
    1 NOTE_INSN_DELETED
    4 NOTE_INSN_BASIC_BLOCK
    2 xmm0:DF=xmm0:DF
      REG_DEAD: xmm0:DF
   18 [sp:DI+0x8]=xmm0:DF
      REG_DEAD: xmm0:DF
    3 NOTE_INSN_FUNCTION_BEG
    6 di:DI=`*.LC0'
    7 call ...
      REG_DEAD: di:DI
      REG_UNUSED: ax:SI
   19 xmm0:DF=[sp:DI+0x8]
      REG_DEAD: r62:DF
   12 xmm0:DF=xmm0:DF
      REG_DEAD: xmm0:DF
   15 use xmm0:DF
I was not paying attention to this situation. A viable solution to this issue is a machine-reorg function (AKA x86_reorg) that would simply move vzeroupper close to the call insn. This would work on non-64-bit-MS-ABI targets, where all SSE registers are dead at the call insn. Please note that the 64-bit MS ABI target declares registers xmm6+ as call-saved, so they can live across the call. I am not familiar with this target, but it looks to me that we have to remove vzeroupper if one or more call-saved SSE registers are live at the call insn. Uros. [Attachment: prvzu.patch]
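The rule Uros describes can be sketched as a small predicate. This is a plain C model under stated assumptions, not GCC's RTL: SSE register liveness at a call is a 16-bit mask (bit N stands for xmmN), the 64-bit MS ABI treats xmm6 through xmm15 as call-saved, and other x86-64 targets (e.g. SysV) have no call-saved SSE registers. Following the conservative rule in the message, a live call-saved register blocks placing vzeroupper next to the call. `sse_bit` and `can_emit_vzeroupper_before_call` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit N of a mask stands for SSE register xmmN (N = 0..15). */
typedef uint16_t sse_mask;

/* SSE registers preserved across calls under each ABI (simplified). */
#define SYSV_CALL_SAVED_SSE ((sse_mask) 0x0000)   /* none */
#define MS64_CALL_SAVED_SSE ((sse_mask) 0xFFC0)   /* xmm6..xmm15 */

/* Build a one-register mask for xmmREGNO. */
sse_mask sse_bit (int regno)
{
  assert (regno >= 0 && regno < 16);
  return (sse_mask) (1u << regno);
}

/* vzeroupper may be moved right before the call only when no
   call-saved SSE register is live across that call. */
bool can_emit_vzeroupper_before_call (sse_mask live_at_call, bool ms_abi)
{
  sse_mask call_saved = ms_abi ? MS64_CALL_SAVED_SSE : SYSV_CALL_SAVED_SSE;
  return (live_at_call & call_saved) == 0;
}
```

On a SysV target the predicate is always true, which matches the remark that all SSE registers are dead at the call there; on MS ABI it depends on which registers are live.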
Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
These asserts should tell you what is wrong with the control flow. Please look at control_flow_insn_p to see which condition returns true. There is a note after the call insn.
(call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI ("_gfortran_stop_string") [flags 0x41] function_decl 0x77eb6200 _gfortran_stop_string) [0 _gfortran_stop_string S1 A8]) (const_int 0 [0])) huygens.fppized.f90:190 616 {*call} (expr_list:REG_DEAD (reg:DI 5 di) (expr_list:REG_DEAD (reg:SI 4 si) (expr_list:REG_NORETURN (const_int 0 [0]) (nil (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di)) (expr_list:REG_BR_PRED (use (reg:SI 4 si)) (nil
(note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di) (const_int 0 [0])) (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si) (const_int 0 [0])) (nil))) NOTE_INSN_CALL_ARG_LOCATION)
You shouldn't disable commit_edge_insertions, as that is the function where vzerouppers are emitted. I didn't disable commit_edge_insertions. I only removed the assert call. 2012/11/9 Uros Bizjak ubiz...@gmail.com: On Fri, Nov 9, 2012 at 11:45 AM, Uros Bizjak ubiz...@gmail.com wrote: On Fri, Nov 9, 2012 at 11:21 AM, Vladimir Yakovlev vbyakov...@gmail.com wrote: I made changes that move vzeroupper insertion after reload. 2012-11-09 Vladimir Yakovlev vladimir.b.yakov...@intel.com * i386/i386-protos.h (ix86_avx256_optimize_mode_switching): New. * config/i386/i386.c (ix86_init_machine_status): Deleted initialization for mode switching. * i386/i386.h (OPTIMIZE_MODE_SWITCHING1): New. * mode-switching.c (gate_mode_switching1): New. (rest_of_handle_mode_switching1): New. (pass_mode_switching1): New. * passes.c (init_optimization_passes): New pass pass_mode_switching1. * tree-pass.h (pass_mode_switching1): New. But this caused assertion failures in rtl_verify_flow_info_1 () at cfgrtl.c:2291: fatal_insn ("flow control insn inside a basic block", x); The asserts are reached through two calls in mode-switching.c: commit_edge_insertions and cleanup_cfg.
After I commented them out (see below), the 459.GemsFDTD benchmark passed. What is your opinion of the patch, and how can we deal with the asserts? These asserts should tell you what is wrong with the control flow. Please look at control_flow_insn_p to see which condition returns true. You shouldn't disable commit_edge_insertions, as that is the function where vzerouppers are emitted. Uros.
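The "flow control insn inside a basic block" error enforces one invariant: within a basic block, only the last insn may be a control-flow insn. In GCC, a call carrying a REG_NORETURN note (like the `_gfortran_stop_string` call in the dump) counts as control flow because it cannot fall through. So if anything ends up after such a call in the same block, the check fires. The C sketch below models only that invariant; `enum insn_kind`, `is_control_flow` and `find_misplaced_control_flow` are hypothetical simplifications, not the real control_flow_insn_p (which also considers sibling calls and insns that can throw), and the model does not special-case notes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum insn_kind { INSN_PLAIN, INSN_NOTE, INSN_JUMP, INSN_CALL };

struct insn {
  enum insn_kind kind;
  bool noreturn;            /* models a REG_NORETURN note on a call */
};

/* Simplified stand-in for control_flow_insn_p: jumps always end a
   block; calls only when they cannot return. */
bool is_control_flow (const struct insn *x)
{
  switch (x->kind)
    {
    case INSN_JUMP:
      return true;
    case INSN_CALL:
      return x->noreturn;
    default:
      return false;
    }
}

/* Return the index of the first control-flow insn that is not the
   last insn of the block, or -1 if the block is well formed. */
int find_misplaced_control_flow (const struct insn *bb, size_t n)
{
  assert (bb != NULL || n == 0);
  for (size_t i = 0; i + 1 < n; i++)
    if (is_control_flow (&bb[i]))
      return (int) i;
  return -1;
}
```

In this model, emitting an insn after a noreturn call without splitting the block makes the call "inside" the block, which is the situation the assert reports.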
Re: [PATCH] Vzeroupper placement/47440
Hello, Thank you for investigating and fixing the problem. I'll answer the remarks later. Regards, Vladimir 2012/11/7 Jakub Jelinek ja...@redhat.com: On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote: On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, OK for mainline SVN, please commit. Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html Thanks, K This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224 Not only that, it also broke --enable-checking=yes,rtl bootstrap. SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and the code looks for reg, so I think looking for CALL was just a mistake. This fixes the bootstrap, ok for trunk? 2012-11-06 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (ix86_avx_u128_mode_after): Don't look for reg in CALL operand.
--- gcc/config/i386/i386.c.jj 2012-11-06 18:10:22.0 +0100
+++ gcc/config/i386/i386.c 2012-11-06 20:15:09.068912242 +0100
@@ -15084,9 +15084,9 @@ ix86_avx_u128_mode_after (int mode, rtx
   /* Check for CALL instruction.  */
   if (CALL_P (insn))
     {
-      if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL)
+      if (GET_CODE (pat) == SET)
         reg = SET_DEST (pat);
-      else if (GET_CODE (pat) == PARALLEL)
+      else if (GET_CODE (pat) == PARALLEL)
         for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
           {
             rtx x = XVECEXP (pat, 0, i);
Jakub
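Jakub's fix comes down to this: only a SET has a destination, so a call pattern's return register can be found in a top-level SET or in a SET inside a PARALLEL, while a bare CALL has no destination and must not be inspected with SET_DEST. The C sketch below models that lookup with a toy rtx structure. The struct layout, `enum rtx_code` values and `call_return_dest` are assumptions for illustration, not GCC's rtl.h API.

```c
#include <assert.h>
#include <stddef.h>

enum rtx_code { RTX_REG, RTX_MEM, RTX_SET, RTX_CALL, RTX_PARALLEL };

/* Toy rtx: a SET uses dest/src, a PARALLEL uses vec/vec_len. */
struct rtx_def {
  enum rtx_code code;
  struct rtx_def *dest;          /* SET_DEST, meaningful only for RTX_SET */
  struct rtx_def *src;           /* SET_SRC, meaningful only for RTX_SET */
  struct rtx_def **vec;          /* elements, only for RTX_PARALLEL */
  int vec_len;
};
typedef struct rtx_def *rtx;

/* Return the register a call pattern sets, or NULL.  Mirrors the fixed
   logic: a bare CALL is not treated as a SET. */
rtx call_return_dest (rtx pat)
{
  assert (pat != NULL);
  if (pat->code == RTX_SET)
    return pat->dest;
  if (pat->code == RTX_PARALLEL)
    for (int i = pat->vec_len - 1; i >= 0; i--)
      if (pat->vec[i]->code == RTX_SET)
        return pat->vec[i]->dest;
  return NULL;
}
```

With RTL checking enabled, GCC reports the equivalent of reading `dest` from a CALL as an error, which is why the old `|| GET_CODE (pat) == CALL` condition broke the --enable-checking=yes,rtl bootstrap.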
Re: [PATCH] Vzeroupper placement/47440
I tested the changes with configure ../gcc/configure --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran,java,lto,objc --with-arch=corei7-avx --with-cpu=corei7-avx. Bootstrap passed and there are no new failures in make check. Thank you, Vladimir 2012/11/7 Vladimir Yakovlev vbyakov...@gmail.com: Hello, Thank you for investigating and fixing the problem. I'll answer the remarks later. Regards, Vladimir 2012/11/7 Jakub Jelinek ja...@redhat.com: On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote: On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, OK for mainline SVN, please commit. Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html Thanks, K This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224 Not only that, it also broke --enable-checking=yes,rtl bootstrap. SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and the code looks for reg, so I think looking for CALL was just a mistake. This fixes the bootstrap, ok for trunk? 2012-11-06 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (ix86_avx_u128_mode_after): Don't look for reg in CALL operand.
--- gcc/config/i386/i386.c.jj 2012-11-06 18:10:22.0 +0100
+++ gcc/config/i386/i386.c 2012-11-06 20:15:09.068912242 +0100
@@ -15084,9 +15084,9 @@ ix86_avx_u128_mode_after (int mode, rtx
   /* Check for CALL instruction.  */
   if (CALL_P (insn))
     {
-      if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL)
+      if (GET_CODE (pat) == SET)
         reg = SET_DEST (pat);
-      else if (GET_CODE (pat) == PARALLEL)
+      else if (GET_CODE (pat) == PARALLEL)
         for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
           {
             rtx x = XVECEXP (pat, 0, i);
Jakub
Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return
Hello Kaz, I've updated the copyright. Is it OK? Thanks, Vladimir
--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -1,6 +1,6 @@
 /* CPU mode switching
    Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008,
-   2009, 2010 Free Software Foundation, Inc.
+   2009, 2010, 2011, 2012 Free Software Foundation, Inc.

 This file is part of GCC.
2012/11/5 Kaz Kojima kkoj...@rr.iij4u.or.jp: Uros Bizjak ubiz...@gmail.com wrote: 2012-11-04 Vladimir Yakovlev vladimir.b.yakov...@intel.com Uros Bizjak ubiz...@gmail.com * mode-switching.c (create_pre_exit): Added code for maybe_builtin_apply case. Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu, with vzeroupper patch [1] applied. I have added the SH4 maintainer for possible comments. I've confirmed that there are no new failures with the patch on sh4-unknown-linux-gnu. BTW, it looks like the copyright year of mode-switching.c should be updated. Regards, kaz
Re: [PATCH] Changes in mode switching
2012/9/30 Uros Bizjak ubiz...@gmail.com: On Thu, Sep 20, 2012 at 8:35 AM, Uros Bizjak ubiz...@gmail.com wrote: On Thu, Sep 20, 2012 at 8:06 AM, Vladimir Yakovlev vbyakov...@gmail.com wrote: The compiler with the patch and without post_reload.patch builds and works successfully. Its only failure is the avx-vzeroupper-3 test, because of the post-reload problem. OK, can you please elaborate a bit on this failure? Perhaps someone has an idea why reload moves unspec_volatile around? LRA will eventually replace reload in the near future [1]; does LRA also move unspec_volatile vzeroupper around? [1] http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01862.html Uros. I tried my patch with LRA. It works fine. The test avx-vzeroupper-3 runs successfully; unspec_volatile vzeroupper is not moved around by LRA. Vladimir
Re: [PATCH] Changes in mode switching
Should we wait for the LRA commit, or is it possible to commit the vzeroupper patch to trunk now? 2012/10/2 Uros Bizjak ubiz...@gmail.com: On Tue, Oct 2, 2012 at 11:35 AM, Vladimir Yakovlev vbyakov...@gmail.com wrote: The compiler with the patch and without post_reload.patch builds and works successfully. Its only failure is the avx-vzeroupper-3 test, because of the post-reload problem. OK, can you please elaborate a bit on this failure? Perhaps someone has an idea why reload moves unspec_volatile around? LRA will eventually replace reload in the near future [1]; does LRA also move unspec_volatile vzeroupper around? [1] http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01862.html I tried my patch with LRA. It works fine. The test avx-vzeroupper-3 runs successfully; unspec_volatile vzeroupper is not moved around by LRA. Great! This also means +1 from the x86 maintainer to include LRA in 4.8. We also expect spill failure fixes and other improvements for pre-reload scheduling from LRA. Uros.
Re: [PATCH] Changes in mode switching
Hi Richard, You are right, I don't need the changes in mode-switching.c at all. After I removed the additional argument from EMIT_MODE_SET and ran 'make check', I found no differences from the make check results of the previous run. So I don't need any changes in the middle-end part. Regards, Vladimir P.S. I'll be on vacation until the end of the month. 2012/9/16 Richard Sandiford rdsandif...@googlemail.com: Vladimir Yakovlev vbyakov...@gmail.com writes: I reproduced the failure and found its cause. I understood how to resolve it, and now I need only a small change: an additional argument to EMIT_MODE_SET. Is it OK for trunk? I'm not sure I understand why you need to know the instruction. The x86 code was:
+      if (mode == AVX_U128_CLEAN)
+        {
+          if (insn)
+            {
+              rtx pat = PATTERN(insn);
+              if (!is_vzeroupper(pat) && !is_vzeroall(pat))
+                ix86_emit_vzeroupper ();
+            }
+          else
+            ix86_emit_vzeroupper ();
+        }
+      break;
But the pass should already know via MODE_AFTER that the mode is set to AVX_U128_CLEAN by vzeroupper and vzeroall. Under what circumstances do we think that we need to set the mode to AVX_U128_CLEAN immediately before vzeroupper or vzeroall? I'm probably making you repeat yourself here, sorry. Richard
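Richard's point can be illustrated with a tiny straight-line model of mode switching. If MODE_AFTER reports that vzeroupper leaves the upper halves CLEAN, a later insn that needs CLEAN finds the state already satisfied, so the pass never asks EMIT_MODE_SET to emit a second vzeroupper right before an existing one. The C sketch below is a simplified model, not GCC's optimize_mode_switching (which works on the CFG with LCM). The insn kinds and the assumption that only calls need CLEAN are illustrative.

```c
#include <assert.h>
#include <stddef.h>

enum avx_u128_mode { AVX_U128_CLEAN, AVX_U128_DIRTY };
enum insn_kind { INSN_OTHER, INSN_AVX256, INSN_VZEROUPPER, INSN_CALL };

/* Mode an insn requires before it executes (-1 = don't care).
   Assumption: in this model only calls need CLEAN. */
int mode_needed (enum insn_kind k)
{
  return k == INSN_CALL ? AVX_U128_CLEAN : -1;
}

/* Mode after the insn, given the mode before it (cf. MODE_AFTER). */
enum avx_u128_mode mode_after (enum avx_u128_mode before, enum insn_kind k)
{
  switch (k)
    {
    case INSN_AVX256:     return AVX_U128_DIRTY;
    case INSN_VZEROUPPER: return AVX_U128_CLEAN;
    default:              return before;
    }
}

/* Walk a straight-line block and count the vzerouppers the pass would
   emit: one before each insn that needs CLEAN while the state is DIRTY. */
int count_inserted_vzeroupper (const enum insn_kind *insns, size_t n)
{
  assert (insns != NULL || n == 0);
  enum avx_u128_mode mode = AVX_U128_CLEAN;
  int emitted = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (mode_needed (insns[i]) == AVX_U128_CLEAN && mode == AVX_U128_DIRTY)
        {
          emitted++;                 /* EMIT_MODE_SET would fire here */
          mode = AVX_U128_CLEAN;
        }
      mode = mode_after (mode, insns[i]);
    }
  return emitted;
}
```

For the sequence AVX256, VZEROUPPER, CALL the model emits nothing, because the explicit vzeroupper already switched the mode to CLEAN. That matches the observation that the extra insn argument to EMIT_MODE_SET turned out to be unnecessary.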
Re: [PATCH] Changes in mode switching
I tried emitting vzeroupper after reload, as an additional mode-switching pass. I saw one problem that I don't know how to overcome. After 'pro_and_epilogue', there may be no flow edge to the exit block, and in that case the pre_exit block is not created (see routine create_pre_exit). Without it, I cannot properly insert vzeroupper at routine exit. Regards, Vladimir 2012/9/18 Uros Bizjak ubiz...@gmail.com: Hello! You are right, I don't need the changes in mode-switching.c at all. After I removed the additional argument from EMIT_MODE_SET and ran 'make check', I found no differences from the make check results of the previous run. So I don't need any changes in the middle-end part. Vladimir, can you please investigate how to emit vzeroupper insns after reload? Vzeroupper emits hard registers, and reload moves the insn around even when it is declared with unspec_volatile. Uros.
Re: [PATCH] Changes in mode switching
Looks OK to me, though I have no authority to approve it except the SH-specific part. Are there any more comments? Can it be committed to trunk? Regards, Vladimir 2012/9/14 Kaz Kojima kkoj...@rr.iij4u.or.jp: Vladimir Yakovlev vbyakov...@gmail.com wrote: I reproduced the failure and found its cause. I understood how to resolve it, and now I need only a small change: an additional argument to EMIT_MODE_SET. Is it OK for trunk? Thank you, Vladimir 2012-09-14 Vladimir Yakovlev vladimir.b.yakov...@intel.com * (optimize_mode_switching): Added an argument EMIT_MODE_SET calls. * config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument. * config/i386/i386.h (EMIT_MODE_SET): Added an argument. * config/sh/sh.h (EMIT_MODE_SET): Added an argument. No new failures on sh4-unknown-linux-gnu with your patch. Looks OK to me, though I have no authority to approve it except the SH-specific part. BTW, I guess that the active voice is usual in gcc/ChangeLog. Also, perhaps a mailer issue, a tab should be used for indentation instead of 8 spaces, and the empty line isn't required between items. Maybe something like * mode-switching.c (optimize_mode_switching): Add an argument EMIT_MODE_SET calls. * config/epiphany/epiphany.h (EMIT_MODE_SET): Add an argument. * config/i386/i386.h (EMIT_MODE_SET): Likewise. * config/sh/sh.h (EMIT_MODE_SET): Likewise. is the usual form. Regards, kaz
Re: [PATCH] Changes in mode switching
Hello, I reproduced the failure and found its cause. I understood how to resolve it, and now I need only a small change: an additional argument to EMIT_MODE_SET. Is it OK for trunk? Thank you, Vladimir 2012-09-14 Vladimir Yakovlev vladimir.b.yakov...@intel.com * (optimize_mode_switching): Added an argument EMIT_MODE_SET calls. * config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument. * config/i386/i386.h (EMIT_MODE_SET): Added an argument. * config/sh/sh.h (EMIT_MODE_SET): Added an argument. 2012/8/29 Vladimir Yakovlev vbyakov...@gmail.com: I built using the last configure line. Thank you, Vladimir 2012/8/29 Kaz Kojima kkoj...@rr.iij4u.or.jp: I tried ../gcc/configure --host=i686-pc-linux-gnu --target=sh4-unknown-linux-gnu --enable-build-with-cxx --enable-lto --enable-shared --enable-threads=posix --enable-clocale=gnu --enable-libitm --enable-libgcj --with-ld=/usr/local/bin/sh4-unknown-linux-gnu-ld --with-as=/usr/local/bin/sh4-unknown-linux-gnu-as --with-sysroot=/exp/ldroot --with-mpfr=/opt2/i686-pc-linux-gnu --with-mpc=/opt2/i686-pc-linux-gnu --with-libelf=/opt2/i686-pc-linux-gnu --with-ppl=no --enable-languages=c,c++,fortran,java,lto,objc --prefix=/export/users/mstester/stability/work/trunk/64/install_sh4 and got a build error. make.log is attached. Could you take a look? make.log says "make[2]: i686-pc-linux-gnu-ar: Command not found". It looks like your build system is x86_64-unknown-linux-gnu. Perhaps by specifying --host=x86_64-unknown-linux-gnu instead of --host=i686-pc-linux-gnu in your configuration, that error could be resolved, though --with-ld=/usr/local/bin/sh4-unknown-linux-gnu-ld --with-as=/usr/local/bin/sh4-unknown-linux-gnu-as --with-sysroot=/exp/ldroot --with-mpfr=/opt2/i686-pc-linux-gnu --with-mpc=/opt2/i686-pc-linux-gnu --with-libelf=/opt2/i686-pc-linux-gnu are strongly specific to my environment.
Maybe
../gcc/configure --host=x86_64-unknown-linux-gnu --target=sh4-unknown-linux-gnu --enable-languages=c
and
make all-gcc
is enough to get cc1 for sh4-unknown-linux-gnu.
Best Regards, kaz
Re: [PATCH] Changes in mode switching
Additionally, you can find the patch history at http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01590.html. I need these changes for my implementation of vzeroupper placement: for some statements I do not need to perform an actual insertion. I tested the changes with a bootstrap using the configuration
../gcc/configure --prefix=/export/users/vbyakovl/workspaces/vzu/install-middle --enable-languages=c,c++,fortran
Re: [PATCH] Changes in mode switching
Thank you for testing.
Regarding commenting out the if (i != mode) in the hunk: I changed the type of transp and added this check because if we reset transp[mode], then later in the loop
FOR_EACH_BB (bb)
  sbitmap_not (kill[bb->index], transp[i][bb->index]);
we set the kill bit of the bb for that mode, and thereby force insertion of a mode switch for that mode in the succeeding blocks in any case.
Regards, Vladimir

2012/8/24 Kaz Kojima kkoj...@rr.iij4u.or.jp:
I've tried the patch on sh4-unknown-linux-gnu. I see new failures with it:
Here is a reduced test case for sh4-unknown-linux-gnu.

#include <stdlib.h>

volatile double gd[32];
volatile float gf[32];

int
main ()
{
  int i;

  for (i = 0; i < 32; i++)
    gd[i] = i * 4, gf[i] = i;

  for (i = 0; i < 32; i++)
    if (gd[i] != i * 4 || gf[i] != i)
      abort ();

  exit (0);
}

The problem occurs in the second loop. With the patch, the only mode switch is done just before gf[i] != i. OTOH the original compiler inserts mode switches both before gd[i] != i * 4 and before gf[i] != i. With the if (i != mode) of the hunk

@@ -530,10 +535,16 @@ optimize_mode_switching (void)
      last_mode = mode;
      ptr = new_seginfo (mode, insn, bb->index, live_now);
      add_seginfo (info + bb->index, ptr);
-     RESET_BIT (transp[bb->index], j);
+     for (i = 0 ; i < max_num_modes; i++)
+       if (i != mode)
+         RESET_BIT (transp[i][bb->index], j);
...

commented out, it looks like all the new failures go away.
Regards, kaz
[PATCH] Changes in mode switching
I discovered some inaccuracies when I tried to implement vzeroupper insertion (PR47440). First, I made 'transp' an array of bit vectors rather than a single bit vector, because each mode needs its own; otherwise resetting it on a mode change kills all modes (including the new mode). The other changes concern the processing of mode switching inside a basic block. I also added an additional argument to EMIT_MODE_SET because I need it for target-dependent changes.
Make check and bootstrap passed with no failures. I used the compiler
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --prefix=/export/users/vbyakovl/workspaces/vzu/install-ref --enable-languages=c,c++,fortran --with-arch=corei7 --with-cpu=corei7 --with-fpmath=sse
OK for trunk?

2012-08-25 Vladimir Yakovlev vladimir.b.yakov...@intel.com
	* mode-switching.c (transp): Changed type.
	(make_preds_opaque): Added an argument.
	(optimize_mode_switching): Some fixes made for the needs of vzeroupper insertion.
	* config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument.
	* config/i386/i386.h (EMIT_MODE_SET): Added an argument.
	* config/sh/sh.h (EMIT_MODE_SET): Added an argument.
Re: [PATCH, Atom] Fix performance regression with -mtune=atom
This is a ping. The change affects Atom only and was made because it really gives better performance on this architecture. This suggests that the old value was simply a typo. Please take a look.
Vladimir

2011/9/30 Vladimir Yakovlev vbyakov...@gmail.com:
This patch fixes a performance regression with -mtune=atom. Changing the Atom cost removes the regression in several EEMBC and SPEC2000 tests. Bootstrap and make check are OK both with and without -mtune=atom. OK for trunk?

2011-09-30 Yakovlev Vladimir vladimir.b.yakov...@intel.com
	* gcc/config/i386/i386.c (atom_cost): Changed cost for loading QImode using movzbl.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7e89dbd..8a512a7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1672,7 +1672,7 @@ struct processor_costs atom_cost = {
   COSTS_N_INSNS (1),                   /* cost of movzx */
   8,                                   /* large insn */
   17,                                  /* MOVE_RATIO */
-  2,                                   /* cost for loading QImode using movzbl */
+  4,                                   /* cost for loading QImode using movzbl */
   {4, 4, 4},                           /* cost of loading integer registers
                                           in QImode, HImode and SImode.
                                           Relative to reg-reg move (2).  */
Re: [PATCH, Atom] Fix performance regression with -mtune=atom
Could someone check that in? Thanks, Vladimir

2011/10/14 Uros Bizjak ubiz...@gmail.com:
Hello!
This is a ping. The change affects Atom only and was made because it really gives better performance on this architecture. This suggests that the old value was simply a typo.
This patch fixes a performance regression with -mtune=atom. Changing the Atom cost removes the regression in several EEMBC and SPEC2000 tests. Bootstrap and make check are OK both with and without -mtune=atom. OK for trunk?
2011-09-30 Yakovlev Vladimir vladimir.b.yakov...@intel.com
	* gcc/config/i386/i386.c (atom_cost): Changed cost for loading QImode using movzbl.
OK.
Thanks, Uros.
Fix performance regression with -mtune=atom
This patch fixes a performance regression with -mtune=atom. Changing the Atom cost removes the regression in several EEMBC and SPEC2000 tests. Bootstrap and make check are OK both with and without -mtune=atom. OK for trunk?

2011-09-30 Yakovlev Vladimir vladimir.b.yakov...@intel.com
	* gcc/config/i386/i386.c (atom_cost): Changed cost for loading QImode using movzbl.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7e89dbd..8a512a7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1672,7 +1672,7 @@ struct processor_costs atom_cost = {
   COSTS_N_INSNS (1),                   /* cost of movzx */
   8,                                   /* large insn */
   17,                                  /* MOVE_RATIO */
-  2,                                   /* cost for loading QImode using movzbl */
+  4,                                   /* cost for loading QImode using movzbl */
   {4, 4, 4},                           /* cost of loading integer registers
                                           in QImode, HImode and SImode.
                                           Relative to reg-reg move (2).  */
Re: [PATCH, testsuite] Fix for PR47440 - Use LCM for vzeroupper insertion
Hi Steven,
I need a separate pass because the vzeroupper redundancy elimination must be performed after reload has completed. I think it is possible to make it a separate pass in ix86-reorg if that phase runs after reload.
Thanks, Vladimir

2011/7/19 Steven Bosscher stevenb@gmail.com:
* a/gcc/gcse.c (alloc_gcse_mem): Added code to run in PRE2.
And this is necessary because...??? Why not just make it a separate pass in ix86-reorg that uses LCM? Look at mode switching for an example.
Ciao! Steven
Re: [PATCH, testsuite] Fix for PR47440 - Use LCM for vzeroupper insertion
Hi Uros,
Thank you for such a detailed explanation. I'll study it.
Regards, Vladimir

2011/7/20 Uros Bizjak ubiz...@gmail.com:
Hello!
* a/gcc/gcse.c (alloc_gcse_mem): Added code to run in PRE2.
And this is necessary because...??? Why not just make it a separate pass in ix86-reorg that uses LCM? Look at mode switching for an example.
I was also expecting that vzeroupper would be inserted in the same way as the I387 mode-switching instructions are inserted. To expand on Steven's suggestion, please see i386.h for OPTIMIZE_MODE_SWITCHING and the following macros. At the moment, there are 4 separate entities that handle four independent mode-switching insertions for x87, one for each mode of the fistp or frndint instructions. Mode insertions will actually insert calculations of the x87 control word (CW) at optimal points and push this new CW (together with the old CW) to a known stack slot, to be consumed by the fistp/frndint insn.
You can add a new entity to enum ix86_entity (say, AVX_VZEROUPPER) and update OPTIMIZE_MODE_SWITCHING to perform mode insertion for the AVX_VZEROUPPER entity when needed. The various modes for AVX_VZEROUPPER are defined in NUM_MODES_FOR_MODE_SWITCHING, mode transitions in MODE_NEEDED, and insn insertions in EMIT_MODE_SET. Please note that LCM handles all entities in parallel, so there is no need for extra passes. The real worker for mode switching is ix86_mode_needed, but don't forget that you can disable the mode-switching pass per function, when it is not needed, through the OPTIMIZE_MODE_SWITCHING macro.
FYI: The existing x87 CW initialization insertion works this way:
- fistp/frndint is inserted into the insn stream and the corresponding OPTIMIZE_MODE_SWITCHING flag is set.
- The inserted insn has an i386_cw attribute that defines the requested mode in which the insn operates. Based on this attribute, MODE_NEEDED handles mode transitions for each entity (please note that there are four independent entities).
- EMIT_MODE_SET emits CW initializations.
These are further optimized by follow-up optimization passes, so two consecutive initializations at the same place are CSEd, etc. Uros.