Re: [PATCH] Vzeroupper placement/47440
On Fri, Nov 9, 2012 at 2:28 PM, Uros Bizjak ubiz...@gmail.com wrote: Finally, having a post-reload mode-switching pass, we can double-check that there are no live SSE registers at vzeroupper insertion point. As vzeroupper is only an optimization, we want to play safe and cancel vzeroupper insertion in this case There is no degradation for x86_64 gABI targets, since all SSE registers are call-clobbered. Vzeroupper is conditionally inserted just before call insn, where all registers are saved to stack and already dead. The vzeroupper at function exit is not problematic. Patch was committed to mainline SVN with the following ChangeLog: 2012-11-16 Uros Bizjak ubiz...@gmail.com * config/i386/i386-protos.h (ix86_emit_mode_set): Add third argument. * config/i386/i386.h (EMIT_MODE_SET): Update. * config/i386/i386.c (ix86_avx_emit_vzeroupper): New function. (ix86_emit_mode_set) AVX_U128: Call ix86_avx_emit_vzeroupper. Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, configured with --with-arch=corei7-avx --with-tune=corei7-avx. Uros. Index: i386-protos.h === --- i386-protos.h (revision 193549) +++ i386-protos.h (working copy) @@ -172,8 +172,11 @@ extern int ix86_mode_after (int, int, rtx); extern int ix86_mode_entry (int); extern int ix86_mode_exit (int); -extern void ix86_emit_mode_set (int, int); +#ifdef HARD_CONST +extern void ix86_emit_mode_set (int, int, HARD_REG_SET); +#endif + extern void x86_order_regs_for_local_alloc (void); extern void x86_function_profiler (FILE *, int); extern void x86_emit_floatuns (rtx [2]); Index: i386.c === --- i386.c (revision 193549) +++ i386.c (working copy) @@ -15477,16 +15477,38 @@ emit_move_insn (new_mode, reg); } +/* Emit vzeroupper. */ + +void +ix86_avx_emit_vzeroupper (HARD_REG_SET regs_live) +{ + int i; + + /* Cancel automatic vzeroupper insertion if there are + live call-saved SSE registers at the insertion point. */ + + for (i = FIRST_SSE_REG; i = LAST_SSE_REG; i++) +if (TEST_HARD_REG_BIT (regs_live, i) !call_used_regs[i]) + return; + + if (TARGET_64BIT) +for (i = FIRST_REX_SSE_REG; i = LAST_REX_SSE_REG; i++) + if (TEST_HARD_REG_BIT (regs_live, i) !call_used_regs[i]) + return; + + emit_insn (gen_avx_vzeroupper ()); +} + /* Generate one or more insns to set ENTITY to MODE. */ void -ix86_emit_mode_set (int entity, int mode) +ix86_emit_mode_set (int entity, int mode, HARD_REG_SET regs_live) { switch (entity) { case AVX_U128: if (mode == AVX_U128_CLEAN) - emit_insn (gen_avx_vzeroupper ()); + ix86_avx_emit_vzeroupper (regs_live); break; case I387_TRUNC: case I387_FLOOR: Index: i386.h === --- i386.h (revision 193549) +++ i386.h (working copy) @@ -2226,7 +2226,7 @@ are to be inserted. */ #define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE) \ - ix86_emit_mode_set ((ENTITY), (MODE)) + ix86_emit_mode_set ((ENTITY), (MODE), (HARD_REGS_LIVE)) /* Avoid renaming of stack registers, as doing so in combination with scheduling just increases amount of live registers at time and in
Fwd: [off-list] Re: [PATCH] Vzeroupper placement/47440
-- Forwarded message -- From: Vladimir Yakovlev vbyakov...@gmail.com Date: 2012/11/9 Subject: Re: [off-list] Re: [PATCH] Vzeroupper placement/47440 To: Uros Bizjak ubiz...@gmail.com Копия: H.J. Lu hjl.to...@gmail.com, Igor Zamyatin izamya...@gmail.com I did changes that moves vzeroupper insertion after reload 2012-11-09 Vladimir Yakovlev vladimir.b.yakov...@intel.com * i386/i386-protos.h (ix86_avx256_optimize_mode_switching): New. * config/i386/i386.c (ix86_init_machine_status): Deleted initialization for mode switching. * i386/i386.h (OPTIMIZE_MODE_SWITCHING1): New. * mode-switching.c (gate_mode_switching1): New. (rest_of_handle_mode_switching1): New. (pass_mode_switching1): New. * passes.c (init_optimization_passes): New pass pass_mode_switching1. * tree-pass.h (pass_mode_switching1): New. But this caused assertion fails in rtl_verify_flow_info_1 () at cfgrtl.c:2291 fatal_insn (flow control insn inside a basic block, x); The asserts are called by two calls of mode-switching.c: commit_edge_insertion and cleanup_cfg. After I commented (see below) 459.GemsFDTD benchspec passed. Your opinion of the patch and haw we can do something with asserts. Regards, Vladimir --- a/gcc/mode-switching.c +++ b/gcc/mode-switching.c @@ -747,7 +747,7 @@ optimize_mode_switching (void) commit_edge_insertions (); #if defined (MODE_ENTRY) defined (MODE_EXIT) - cleanup_cfg (CLEANUP_NO_INSN_DEL); + /*cleanup_cfg (CLEANUP_NO_INSN_DEL);*/ #else if (!need_commit !emitted) return 0; --- a/gcc/cfgrtl.c +++ b/gcc/cfgrtl.c @@ -1828,7 +1828,7 @@ commit_edge_insertions (void) basic_block bb; #ifdef ENABLE_CHECKING - verify_flow_info (); + /*verify_flow_info ();*/ #endif 2012/11/9 Uros Bizjak ubiz...@gmail.com: On Thu, Nov 8, 2012 at 6:52 PM, Uros Bizjak ubiz...@gmail.com wrote: Uh, this is spill around call insn, produced by reload. Please compile this code: double test (double a) { printf (Hello\n); return a; } You will get at mode switching: 1 NOTE_INSN_DELETED 4 NOTE_INSN_BASIC_BLOCK 2 r60:DF=xmm0:DF REG_DEAD: xmm0:DF 3 NOTE_INSN_FUNCTION_BEG 6 di:DI=`*.LC0' 7 call ... REG_DEAD: di:DI REG_UNUSED: ax:SI 12 xmm0:DF=r60:DF REG_DEAD: r60:DF 15 use xmm0:DF But reload will insert: 1 NOTE_INSN_DELETED 4 NOTE_INSN_BASIC_BLOCK 2 xmm0:DF=xmm0:DF REG_DEAD: xmm0:DF 18 [sp:DI+0x8]=xmm0:DF REG_DEAD: xmm0:DF 3 NOTE_INSN_FUNCTION_BEG 6 di:DI=`*.LC0' 7 call ... REG_DEAD: di:DI REG_UNUSED: ax:SI 19 xmm0:DF=[sp:DI+0x8] REG_DEAD: r62:DF 12 xmm0:DF=xmm0:DF REG_DEAD: xmm0:DF 15 use xmm0:DF I was not paying attention to this situation. A viable solution to this issue is through machine-reorg function (AKA x86_reorg) that would just move vzeroupper to the close proximity to a call insn. This would work on non-64bit-MS-ABI targets, where all SSE registers are dead at call insn place. Please note that 64bit-MS-ABI target declares registers xmm6+ as call-saved, so they can live over the call. I am not familiar with this target, but it looks to me that we have to remove vzeroupper, if one or more call-saved SSE registers are live at the call insn place. Uros. prvzu.patch Description: Binary data
Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
These assert should tell you what is wrong with the control flow. Please look at control_flow_insn_p, which condition returns true. There is a note after call insn. (call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI (_gfortran_stop_string) [flags 0x41] function_decl 0x77eb6200 _gfortran_stop_string) [0 _gfortran_stop_string S1 A8]) (const_int 0 [0])) huygens.fppized.f90:190 616 {*call} (expr_list:REG_DEAD (reg:DI 5 di) (expr_list:REG_DEAD (reg:SI 4 si) (expr_list:REG_NORETURN (const_int 0 [0]) (nil (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di)) (expr_list:REG_BR_PRED (use (reg:SI 4 si)) (nil (note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di) (const_int 0 [0])) (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si) (const_int 0 [0])) (nil))) NOTE_INSN_CALL_ARG_LOCATION) You shouldn't disable commit_edge_insertions, as there is the function where vzerouppers are emitted. I didn;t disable commit_edge_insertions. I only remove call of assert. 2012/11/9 Uros Bizjak ubiz...@gmail.com: On Fri, Nov 9, 2012 at 11:45 AM, Uros Bizjak ubiz...@gmail.com wrote: On Fri, Nov 9, 2012 at 11:21 AM, Vladimir Yakovlev vbyakov...@gmail.com wrote: I did changes that moves vzeroupper insertion after reload 2012-11-09 Vladimir Yakovlev vladimir.b.yakov...@intel.com * i386/i386-protos.h (ix86_avx256_optimize_mode_switching): New. * config/i386/i386.c (ix86_init_machine_status): Deleted initialization for mode switching. * i386/i386.h (OPTIMIZE_MODE_SWITCHING1): New. * mode-switching.c (gate_mode_switching1): New. (rest_of_handle_mode_switching1): New. (pass_mode_switching1): New. * passes.c (init_optimization_passes): New pass pass_mode_switching1. * tree-pass.h (pass_mode_switching1): New. But this caused assertion fails in rtl_verify_flow_info_1 () at cfgrtl.c:2291 fatal_insn (flow control insn inside a basic block, x); The asserts are called by two calls of mode-switching.c: commit_edge_insertion and cleanup_cfg. After I commented (see below) 459.GemsFDTD benchspec passed. Your opinion of the patch and haw we can do something with asserts. These assert should tell you what is wrong with the control flow. Please look at control_flow_insn_p, which condition returns true. You shouldn't disable commit_edge_insertions, as there is the function where vzerouppers are emitted. Uros.
Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
On Fri, Nov 9, 2012 at 1:18 PM, Vladimir Yakovlev vbyakov...@gmail.com wrote: These assert should tell you what is wrong with the control flow. Please look at control_flow_insn_p, which condition returns true. There is a note after call insn. (call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI (_gfortran_stop_string) [flags 0x41] function_decl 0x77eb6200 _gfortran_stop_string) [0 _gfortran_stop_string S1 A8]) (const_int 0 [0])) huygens.fppized.f90:190 616 {*call} (expr_list:REG_DEAD (reg:DI 5 di) (expr_list:REG_DEAD (reg:SI 4 si) (expr_list:REG_NORETURN (const_int 0 [0]) (nil (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di)) (expr_list:REG_BR_PRED (use (reg:SI 4 si)) (nil (note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di) (const_int 0 [0])) (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si) (const_int 0 [0])) (nil))) NOTE_INSN_CALL_ARG_LOCATION) Huh, this RTX is ignored: --cfgrtl.c-- bool control_flow_insn_p (const_rtx insn) { switch (GET_CODE (insn)) { case NOTE: case CODE_LABEL: case DEBUG_INSN: return false; --cfgrtl.c-- The problem is noreturn call. BTW: What happens if the new pass is put before pro_and_epilogue pass? Uros,
Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
On Fri, Nov 09, 2012 at 01:29:18PM +0100, Uros Bizjak wrote: On Fri, Nov 9, 2012 at 1:18 PM, Vladimir Yakovlev vbyakov...@gmail.com wrote: These assert should tell you what is wrong with the control flow. Please look at control_flow_insn_p, which condition returns true. There is a note after call insn. (call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI (_gfortran_stop_string) [flags 0x41] function_decl 0x77eb6200 _gfortran_stop_string) [0 _gfortran_stop_string S1 A8]) (const_int 0 [0])) huygens.fppized.f90:190 616 {*call} (expr_list:REG_DEAD (reg:DI 5 di) (expr_list:REG_DEAD (reg:SI 4 si) (expr_list:REG_NORETURN (const_int 0 [0]) (nil (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di)) (expr_list:REG_BR_PRED (use (reg:SI 4 si)) (nil (note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di) (const_int 0 [0])) (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si) (const_int 0 [0])) (nil))) NOTE_INSN_CALL_ARG_LOCATION) Huh, this RTX is ignored: NOTE_INSN_CALL_ARG_LOCATION is fine, even after a REG_NORETURN call. It is just a way how to pass call argument details to dwarf2out. If you have a pass after var-tracking, you need to skip over it. Jakub
Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
On Fri, Nov 9, 2012 at 1:36 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Nov 09, 2012 at 01:29:18PM +0100, Uros Bizjak wrote: On Fri, Nov 9, 2012 at 1:18 PM, Vladimir Yakovlev vbyakov...@gmail.com wrote: These assert should tell you what is wrong with the control flow. Please look at control_flow_insn_p, which condition returns true. There is a note after call insn. (call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI (_gfortran_stop_string) [flags 0x41] function_decl 0x77eb6200 _gfortran_stop_string) [0 _gfortran_stop_string S1 A8]) (const_int 0 [0])) huygens.fppized.f90:190 616 {*call} (expr_list:REG_DEAD (reg:DI 5 di) (expr_list:REG_DEAD (reg:SI 4 si) (expr_list:REG_NORETURN (const_int 0 [0]) (nil (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di)) (expr_list:REG_BR_PRED (use (reg:SI 4 si)) (nil (note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di) (const_int 0 [0])) (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si) (const_int 0 [0])) (nil))) NOTE_INSN_CALL_ARG_LOCATION) Huh, this RTX is ignored: NOTE_INSN_CALL_ARG_LOCATION is fine, even after a REG_NORETURN call. It is just a way how to pass call argument details to dwarf2out. If you have a pass after var-tracking, you need to skip over it. Yes, but postreload mode switching should come before pro_and_epilogue anyway, otherwise create_pre_exit won't work: --mode-switching.c (222)-- /* If this function returns a value at the end, we have to insert the final mode switch before the return value copy to its hard register. */ if (EDGE_COUNT (EXIT_BLOCK_PTR-preds) == 1 NONJUMP_INSN_P ((last_insn = BB_END (src_bb))) GET_CODE (PATTERN (last_insn)) == USE GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG) --mode-switching.2 (228)-- I believe that this will work OK if the pass is inserted before prologue generation pass. Uros.
Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
On Fri, Nov 9, 2012 at 1:47 PM, Uros Bizjak ubiz...@gmail.com wrote: These assert should tell you what is wrong with the control flow. Please look at control_flow_insn_p, which condition returns true. There is a note after call insn. (call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI (_gfortran_stop_string) [flags 0x41] function_decl 0x77eb6200 _gfortran_stop_string) [0 _gfortran_stop_string S1 A8]) (const_int 0 [0])) huygens.fppized.f90:190 616 {*call} (expr_list:REG_DEAD (reg:DI 5 di) (expr_list:REG_DEAD (reg:SI 4 si) (expr_list:REG_NORETURN (const_int 0 [0]) (nil (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di)) (expr_list:REG_BR_PRED (use (reg:SI 4 si)) (nil (note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di) (const_int 0 [0])) (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si) (const_int 0 [0])) (nil))) NOTE_INSN_CALL_ARG_LOCATION) Huh, this RTX is ignored: NOTE_INSN_CALL_ARG_LOCATION is fine, even after a REG_NORETURN call. It is just a way how to pass call argument details to dwarf2out. If you have a pass after var-tracking, you need to skip over it. Yes, but postreload mode switching should come before pro_and_epilogue anyway, otherwise create_pre_exit won't work: --mode-switching.c (222)-- /* If this function returns a value at the end, we have to insert the final mode switch before the return value copy to its hard register. */ if (EDGE_COUNT (EXIT_BLOCK_PTR-preds) == 1 NONJUMP_INSN_P ((last_insn = BB_END (src_bb))) GET_CODE (PATTERN (last_insn)) == USE GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG) --mode-switching.2 (228)-- I believe that this will work OK if the pass is inserted before prologue generation pass. Finally, having a post-reload mode-switching pass, we can double-check that there are no live SSE registers at vzeroupper insertion point. As vzeroupper is only an optimization, we want to play safe and cancel vzeroupper insertion in this case There is no degradation for x86_64 gABI targets, since all SSE registers are call-clobbered. Vzeroupper is conditionally inserted just before call insn, where all registers are saved to stack and already dead. The vzeroupper at function exit is not problematic. Uros. Index: i386-protos.h === --- i386-protos.h (revision 193329) +++ i386-protos.h (working copy) @@ -172,8 +172,11 @@ extern int ix86_mode_needed (int, rtx); extern int ix86_mode_after (int, int, rtx); extern int ix86_mode_entry (int); extern int ix86_mode_exit (int); -extern void ix86_emit_mode_set (int, int); +#ifdef HARD_CONST +extern void ix86_emit_mode_set (int, int, HARD_REG_SET); +#endif + extern void x86_order_regs_for_local_alloc (void); extern void x86_function_profiler (FILE *, int); extern void x86_emit_floatuns (rtx [2]); Index: i386.c === --- i386.c (revision 193329) +++ i386.c (working copy) @@ -15277,16 +15284,38 @@ emit_i387_cw_initialization (int mode) emit_move_insn (new_mode, reg); } +/* Emit vzeroupper. */ + +void +ix86_avx_emit_vzeroupper (HARD_REG_SET regs_live) +{ + int i; + + /* Cancel automatic vzeroupper insertion if there are + live call-saved SSE registers at the insertion point. */ + + for (i = FIRST_SSE_REG; i = LAST_SSE_REG; i++) +if (!call_used_regs[i] TEST_HARD_REG_BIT (regs_live, i)) + return; + + if (TARGET_64BIT) +for (i = FIRST_REX_SSE_REG; i = LAST_REX_SSE_REG; i++) + if (!call_used_regs[i] TEST_HARD_REG_BIT (regs_live, i)) + return; + + emit_insn (gen_avx_vzeroupper ()); +} + /* Generate one or more insns to set ENTITY to MODE. */ void -ix86_emit_mode_set (int entity, int mode) +ix86_emit_mode_set (int entity, int mode, HARD_REG_SET regs_live) { switch (entity) { case AVX_U128: if (mode == AVX_U128_CLEAN) - emit_insn (gen_avx_vzeroupper ()); + ix86_avx_emit_vzeroupper (regs_live); break; case I387_TRUNC: case I387_FLOOR: Index: i386.h === --- i386.h (revision 193329) +++ i386.h (working copy) @@ -2223,7 +2227,7 @@ enum avx_u128_state are to be inserted. */ #define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE) \ - ix86_emit_mode_set ((ENTITY), (MODE)) + ix86_emit_mode_set ((ENTITY), (MODE), (HARD_REGS_LIVE)) /* Avoid renaming of stack registers, as doing so in combination with scheduling just increases amount of live registers at time and in
Re: [PATCH] Vzeroupper placement/47440
On Wed, Nov 07, 2012 at 08:08:08AM +0100, Uros Bizjak wrote: 2012-11-06 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (ix86_avx_u128_mode_after): Don't look for reg in CALL operand. OK. You can also break the loop after reg is found. I have committed the patch as is to fix the bootstrap, as anything else needs another bootstrap/regtest. I don't think breaking out of the loop would be correct, then say for *{,sib}call_value_pop patterns reg would be stack pointer rather than the return value of the call. Due to that pattern we can't use single_set, but I wonder if we just can't use XVECEXP (pat, 0, 0) unconditionally for the return value, or perhaps check the condition inside of the loop (REG_P (reg) VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg))), return AVX_U128_DIRTY if true (and that way break out of the loop), and return AVX_U128_CLEAN after the loop. Or I wonder why is call handled specially at all, doesn't /* Check if a 256bit AVX register is referenced in stores. */ state = unused; note_stores (pat, check_avx256_stores, state); if (state == used) return AVX_U128_DIRTY; handle it? Then it would just need to be if (CALL_P (insn)) return AVX_U128_CLEAN. BTW, the formatting is wrong in some spots, e.g. check_avx256_stores (rtx dest, const_rtx set, void *data) { if (((REG_P (dest) || MEM_P(dest)) I'd prefer to leave this to the original submitter. Jakub
Re: [PATCH] Vzeroupper placement/47440
On Wed, Nov 7, 2012 at 9:04 AM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Nov 07, 2012 at 08:08:08AM +0100, Uros Bizjak wrote: 2012-11-06 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (ix86_avx_u128_mode_after): Don't look for reg in CALL operand. OK. You can also break the loop after reg is found. I have committed the patch as is to fix the bootstrap, as anything else needs another bootstrap/regtest. I don't think breaking out of the loop would be correct, then say for *{,sib}call_value_pop patterns reg would be stack pointer rather than the return value of the call. Due to that pattern we can't use single_set, but I wonder if we just can't use XVECEXP (pat, 0, 0) unconditionally for the return value, or perhaps check the condition inside of the loop (REG_P (reg) VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg))), return AVX_U128_DIRTY if true (and that way break out of the loop), and return AVX_U128_CLEAN after the loop. Indeed, I didn't notice reverse loop. Or I wonder why is call handled specially at all, doesn't /* Check if a 256bit AVX register is referenced in stores. */ state = unused; note_stores (pat, check_avx256_stores, state); if (state == used) return AVX_U128_DIRTY; handle it? Then it would just need to be if (CALL_P (insn)) return AVX_U128_CLEAN. You are right, I am testing the attached patch. BTW, the formatting is wrong in some spots, e.g. check_avx256_stores (rtx dest, const_rtx set, void *data) { if (((REG_P (dest) || MEM_P(dest)) I have some doubts that this function is fully correct. The comment says: /* Check if a 256bit AVX register is referenced in stores. */ But, we are in fact checking stores and uses. IIRC, U128_DIRTY state is only set for stores to YMM register, so: static void check_avx256_stores (rtx dest, const_rtx set ATTRIBUTE_UNUSED, void *data) { if (REG_P (dest) VALID_AVX256_REG_OR_OI_MODE (GET_MODE (dest))) { enum upper_128bits_state *state = (enum upper_128bits_state *) data; *state = used; } } I'd prefer to leave this to the original submitter. Yes, Vladimir, can you please comment on these issues? Uros. Index: config/i386/i386.c === --- config/i386/i386.c (revision 193280) +++ config/i386/i386.c (working copy) @@ -15079,30 +15079,8 @@ ix86_avx_u128_mode_after (int mode, rtx insn) { rtx pat = PATTERN (insn); - rtx reg = NULL; - int i; enum upper_128bits_state state; - /* Check for CALL instruction. */ - if (CALL_P (insn)) -{ - if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL) - reg = SET_DEST (pat); - else if (GET_CODE (pat) == PARALLEL) - for (i = XVECLEN (pat, 0) - 1; i = 0; i--) - { - rtx x = XVECEXP (pat, 0, i); - if (GET_CODE(x) == SET) - reg = SET_DEST (x); - } - /* Mode after call is set to AVX_U128_DIRTY if there are -256bit modes used in the function return register. */ - if (reg REG_P (reg) VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg))) - return AVX_U128_DIRTY; - else - return AVX_U128_CLEAN; -} - if (vzeroupper_operation (pat, VOIDmode) || vzeroall_operation (pat, VOIDmode)) return AVX_U128_CLEAN; @@ -15112,6 +15090,10 @@ note_stores (pat, check_avx256_stores, state); if (state == used) return AVX_U128_DIRTY; + /* We know that state is clean after CALL insn if there are no + 256bit modes used in the function return register. */ + else if (CALL_P (insn) state == unused) +return AVX_U128_CLEAN; return mode; }
Re: [PATCH] Vzeroupper placement/47440
On Wed, Nov 7, 2012 at 9:04 AM, Jakub Jelinek ja...@redhat.com wrote: Or I wonder why is call handled specially at all, doesn't /* Check if a 256bit AVX register is referenced in stores. */ state = unused; note_stores (pat, check_avx256_stores, state); if (state == used) return AVX_U128_DIRTY; handle it? Then it would just need to be if (CALL_P (insn)) return AVX_U128_CLEAN. BTW, the formatting is wrong in some spots, e.g. check_avx256_stores (rtx dest, const_rtx set, void *data) { if (((REG_P (dest) || MEM_P(dest)) I'd prefer to leave this to the original submitter. I have committed following patch that address all the above issues. 2012-11-07 Uros Bizjak ubiz...@gmail.com * config/i386/i386.c (enum upper_128bits_state): Remove. (check_avx256_store): Use bool pointer argument. (ix86_avx_u128_mode_needed): Use note_stores also for CALL insns. * config/i386/predicates.md (vzeroupper_operation): Use match_test. Bootstrapped and regression tested on x86_64-pc-linux-gnu, committed. Uros. Index: predicates.md === --- predicates.md (revision 193292) +++ predicates.md (working copy) @@ -1231,10 +1231,8 @@ ;; return true if OP is a vzeroupper operation. (define_predicate vzeroupper_operation - (match_code unspec_volatile) -{ - return XINT (op, 1) == UNSPECV_VZEROUPPER; -}) + (and (match_code unspec_volatile) + (match_test XINT (op, 1) == UNSPECV_VZEROUPPER))) ;; Return true if OP is a parallel for a vbroadcast permute. Index: i386.c === --- i386.c (revision 193292) +++ i386.c (working copy) @@ -65,27 +65,19 @@ along with GCC; see the file COPYING3. If not see #include tree-pass.h #include tree-flow.h -enum upper_128bits_state -{ - unknown = 0, - unused, - used -}; - /* Check if a 256bit AVX register is referenced in stores. */ static void check_avx256_stores (rtx dest, const_rtx set, void *data) { - if (((REG_P (dest) || MEM_P(dest)) + if (((REG_P (dest) || MEM_P (dest)) VALID_AVX256_REG_OR_OI_MODE (GET_MODE (dest))) || (GET_CODE (set) == SET (REG_P (SET_SRC (set)) || MEM_P (SET_SRC (set))) VALID_AVX256_REG_OR_OI_MODE (GET_MODE (SET_SRC (set) { - enum upper_128bits_state *state - = (enum upper_128bits_state *) data; - *state = used; + bool *used = (bool *) data; + *used = true; } } @@ -14967,23 +14959,24 @@ output_387_binary_op (rtx insn, rtx *operands) static int ix86_avx_u128_mode_needed (rtx insn) { - rtx pat = PATTERN (insn); - rtx arg; - enum upper_128bits_state state; + bool avx_u128_used; if (CALL_P (insn)) { + rtx link; + /* Needed mode is set to AVX_U128_CLEAN if there are no 256bit modes used in function arguments. */ - for (arg = CALL_INSN_FUNCTION_USAGE (insn); arg; - arg = XEXP (arg, 1)) + for (link = CALL_INSN_FUNCTION_USAGE (insn); + link; + link = XEXP (link, 1)) { - if (GET_CODE (XEXP (arg, 0)) == USE) + if (GET_CODE (XEXP (link, 0)) == USE) { - rtx reg = XEXP (XEXP (arg, 0), 0); + rtx arg = XEXP (XEXP (link, 0), 0); - if (reg REG_P (reg) - VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg))) + if (REG_P (arg) + VALID_AVX256_REG_OR_OI_MODE (GET_MODE (arg))) return AVX_U128_ANY; } } @@ -14992,10 +14985,11 @@ ix86_avx_u128_mode_needed (rtx insn) } /* Check if a 256bit AVX register is referenced in stores. */ - state = unused; - note_stores (pat, check_avx256_stores, state); - if (state == used) + avx_u128_used = false; + note_stores (PATTERN (insn), check_avx256_stores, avx_u128_used); + if (avx_u128_used) return AVX_U128_DIRTY; + return AVX_U128_ANY; } @@ -15079,39 +15073,21 @@ static int ix86_avx_u128_mode_after (int mode, rtx insn) { rtx pat = PATTERN (insn); - rtx reg = NULL; - int i; - enum upper_128bits_state state; + bool avx_u128_used; - /* Check for CALL instruction. */ - if (CALL_P (insn)) -{ - if (GET_CODE (pat) == SET) - reg = SET_DEST (pat); - else if (GET_CODE (pat) == PARALLEL) - for (i = XVECLEN (pat, 0) - 1; i = 0; i--) - { - rtx x = XVECEXP (pat, 0, i); - if (GET_CODE(x) == SET) - reg = SET_DEST (x); - } - /* Mode after call is set to AVX_U128_DIRTY if there are -256bit modes used in the function return register. */ - if (reg REG_P (reg) VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg))) - return AVX_U128_DIRTY; - else - return AVX_U128_CLEAN; -} - if (vzeroupper_operation (pat, VOIDmode) || vzeroall_operation (pat, VOIDmode)) return AVX_U128_CLEAN; /* Check
Re: [PATCH] Vzeroupper placement/47440
Hello, Thanyou for investigation and fixing the problem. I'll answer on remarks later. Regards, Vladimir 2012/11/7 Jakub Jelinek ja...@redhat.com: On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote: On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, OK for mainline SVN, please commit. Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html Thanks, K This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224 Not only that, it also broke --enable-checking=yes,rtl bootstrap. SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and the code looks for reg, so I think looking for CALL was just a mistake. This fixes the bootstrap, ok for trunk? 2012-11-06 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (ix86_avx_u128_mode_after): Don't look for reg in CALL operand. --- gcc/config/i386/i386.c.jj 2012-11-06 18:10:22.0 +0100 +++ gcc/config/i386/i386.c 2012-11-06 20:15:09.068912242 +0100 @@ -15084,9 +15084,9 @@ ix86_avx_u128_mode_after (int mode, rtx /* Check for CALL instruction. */ if (CALL_P (insn)) { - if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL) + if (GET_CODE (pat) == SET) reg = SET_DEST (pat); - else if (GET_CODE (pat) == PARALLEL) + else if (GET_CODE (pat) == PARALLEL) for (i = XVECLEN (pat, 0) - 1; i = 0; i--) { rtx x = XVECEXP (pat, 0, i); Jakub
Re: [PATCH] Vzeroupper placement/47440
I tested changes with configure ../gcc/configure --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran,java,lto,objc --with-arch=corei7-avx --with-cpu=corei7-avx Bootstrap is passed and no new fails in make check. Thank you, Vladimir 2012/11/7 Vladimir Yakovlev vbyakov...@gmail.com: Hello, Thanyou for investigation and fixing the problem. I'll answer on remarks later. Regards, Vladimir 2012/11/7 Jakub Jelinek ja...@redhat.com: On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote: On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, OK for mainline SVN, please commit. Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html Thanks, K This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224 Not only that, it also broke --enable-checking=yes,rtl bootstrap. SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and the code looks for reg, so I think looking for CALL was just a mistake. This fixes the bootstrap, ok for trunk? 2012-11-06 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (ix86_avx_u128_mode_after): Don't look for reg in CALL operand. --- gcc/config/i386/i386.c.jj 2012-11-06 18:10:22.0 +0100 +++ gcc/config/i386/i386.c 2012-11-06 20:15:09.068912242 +0100 @@ -15084,9 +15084,9 @@ ix86_avx_u128_mode_after (int mode, rtx /* Check for CALL instruction. */ if (CALL_P (insn)) { - if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL) + if (GET_CODE (pat) == SET) reg = SET_DEST (pat); - else if (GET_CODE (pat) == PARALLEL) + else if (GET_CODE (pat) == PARALLEL) for (i = XVECLEN (pat, 0) - 1; i = 0; i--) { rtx x = XVECEXP (pat, 0, i); Jakub
Re: [PATCH] Vzeroupper placement/47440
Hello, OK for mainline SVN, please commit. Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html Thanks, K
Re: [PATCH] Vzeroupper placement/47440
On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, OK for mainline SVN, please commit. Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html Thanks, K This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224 -- H.J.
Re: [PATCH] Vzeroupper placement/47440
On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote: On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, OK for mainline SVN, please commit. Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html Thanks, K This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224 Not only that, it also broke --enable-checking=yes,rtl bootstrap. SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and the code looks for reg, so I think looking for CALL was just a mistake. This fixes the bootstrap, ok for trunk? 2012-11-06 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (ix86_avx_u128_mode_after): Don't look for reg in CALL operand. --- gcc/config/i386/i386.c.jj 2012-11-06 18:10:22.0 +0100 +++ gcc/config/i386/i386.c 2012-11-06 20:15:09.068912242 +0100 @@ -15084,9 +15084,9 @@ ix86_avx_u128_mode_after (int mode, rtx /* Check for CALL instruction. */ if (CALL_P (insn)) { - if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL) + if (GET_CODE (pat) == SET) reg = SET_DEST (pat); - else if (GET_CODE (pat) == PARALLEL) + else if (GET_CODE (pat) == PARALLEL) for (i = XVECLEN (pat, 0) - 1; i = 0; i--) { rtx x = XVECEXP (pat, 0, i); Jakub
Re: [PATCH] Vzeroupper placement/47440
On Tue, Nov 6, 2012 at 11:18 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote: On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, OK for mainline SVN, please commit. Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html Thanks, K This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224 Not only that, it also broke --enable-checking=yes,rtl bootstrap. SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and the code looks for reg, so I think looking for CALL was just a mistake. This fixes the bootstrap, ok for trunk? 2012-11-06 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (ix86_avx_u128_mode_after): Don't look for reg in CALL operand. OK. Thanks, Uros.
Re: [PATCH] Vzeroupper placement/47440
On Wed, Nov 7, 2012 at 8:03 AM, Uros Bizjak ubiz...@gmail.com wrote: OK for mainline SVN, please commit. Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html Thanks, K This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224 Not only that, it also broke --enable-checking=yes,rtl bootstrap. SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and the code looks for reg, so I think looking for CALL was just a mistake. This fixes the bootstrap, ok for trunk? 2012-11-06 Jakub Jelinek ja...@redhat.com * config/i386/i386.c (ix86_avx_u128_mode_after): Don't look for reg in CALL operand. OK. You can also break the loop after reg is found. Uros.
Re: [PATCH] Vzeroupper placement/47440
Hello! 2012-11-04 Vladimir Yakovlev vladimir.b.yakov...@intel.com * mode-switching.c (create_pre_exit): Added code for maybe_builtin_apply case. The part above is already committed. * config/i386/i386-protos.h (emit_i387_cw_initialization): Deleted. (emit_vzero): Added prototype. (ix86_mode_entry): Likewise. (ix86_mode_exit): Likewise. (ix86_emit_mode_set): Likewise. * config/i386/i386.c (typedef struct block_info_def): Deleted. (define BLOCK_INFO): Deleted. (check_avx256_stores): Added checking for MEM_P. (move_or_delete_vzeroupper_2): Deleted. (move_or_delete_vzeroupper_1): Deleted. (move_or_delete_vzeroupper): Deleted. (ix86_maybe_emit_epilogue_vzeroupper): Deleted. (function_pass_avx256_p): Deleted. (ix86_function_ok_for_sibcall): Deleted disabling sibcall. (nit_cumulative_args): Deleted initialization of of avx256 fields of cfun-machine. (ix86_emit_restore_sse_regs_using_mov): Deleted vzeroupper generation. (ix86_expand_epilogue): Likewise. (ix86_avx_u128_mode_needed): New. (ix86_i387_mode_needed): Renamed ix86_mode_needed. (ix86_mode_needed): New. (ix86_avx_u128_mode_after): New. (ix86_mode_after): New. (ix86_avx_u128_mode_entry): New. (ix86_mode_entry): New. (ix86_avx_u128_mode_exit): New. (ix86_mode_exit): New. (ix86_emit_mode_set): New. (ix86_expand_call): Deleted vzeroupper generation. (ix86_split_call_vzeroupper): Deleted. (ix86_init_machine_status): Initialzed optimize_mode_switching. (ix86_expand_special_args_builtin): Changed. (ix86_reorg): Deleted a call of move_or_delete_vzeroupper. * config/i386/i386.h (VALID_AVX256_REG_OR_OI_MODE): New. (AVX_U128): New. (avx_u128_state): New. (NUM_MODES_FOR_MODE_SWITCHING): Added AVX_U128_ANY. (MODE_AFTER): New. (MODE_ENTRY): New. (MODE_EXIT): New. (EMIT_MODE_SET): Changed. (machine_function): Deleted avx256 fields. * config/i386/i386.md (UNSPEC_CALL_NEEDS_VZEROUPPER): Deleted. (define_insn_and_split *call_vzeroupper): Deleted. (define_insn_and_split *call_rex64_ms_sysv_vzeroupper): Deleted. (define_insn_and_split *sibcall_vzeroupper): Deleted. (define_insn_and_split *call_pop_vzeroupper): Deleted. (define_insn_and_split *sibcall_pop_vzeroupper): Deleted. (define_insn_and_split *call_value_vzeroupper): Deleted. (define_insn_and_split *sibcall_value_vzeroupper): Deleted. (define_insn_and_split *call_value_rex64_ms_sysv_vzeroupper): Deleted. (define_insn_and_split *call_value_pop_vzeroupper): Deleted. (define_insn_and_split *sibcall_value_pop_vzeroupper): Deleted. (define_expand return): Deleted vzeroupper emitting. (define_expand simple_return): Deleted. * config/i386/predicates.md (vzeroupper_operation): New. * config/i386/sse.md (avx_vzeroupper): Changed. 2012-11-04 Vladimir Yakovlev vladimir.b.yakov...@intel.com * gcc.target/i386/avx-vzeroupper-5.c: Changed scan-assembler-times. gcc.target/i386/avx-vzeroupper-8.c: Likewise. gcc.target/i386/avx-vzeroupper-9.c: Likewise. gcc.target/i386/avx-vzeroupper-10.c: Likewise. gcc.target/i386/avx-vzeroupper-11.c: Likewise. gcc.target/i386/avx-vzeroupper-12.c: Likewise. gcc.target/i386/avx-vzeroupper-19.c: Likewis. gcc.target/i386/avx-vzeroupper-27.c: New. OK for mainline SVN, please commit. Thanks, Uros.
Re: [PATCH] Vzeroupper placement/47440
Hello! 2012-11-04 Vladimir Yakovlev vladimir.b.yakov...@intel.com * mode-switching.c (create_pre_exit): Added code for maybe_builtin_apply case. * config/i386/i386-protos.h (emit_i387_cw_initialization): Deleted. (emit_vzero): Added prototype. (ix86_mode_entry): Likewise. (ix86_mode_exit): Likewise. (ix86_emit_mode_set): Likewise. * config/i386/i386.c (VALID_AVX256_REG_OR_OI_MODE): New. (typedef struct block_info_def): Deleted. (define BLOCK_INFO): Deleted. (check_avx256_stores): Added checking for MEM_P. (move_or_delete_vzeroupper_2): Deleted. (move_or_delete_vzeroupper_1): Deleted. (move_or_delete_vzeroupper): Deleted. (ix86_maybe_emit_epilogue_vzeroupper): Deleted. (function_pass_avx256_p): Deleted. (ix86_function_ok_for_sibcall): Deleted disabling sibcall. (nit_cumulative_args): Deleted initialization of of avx256 fields of cfun-machine. (ix86_emit_restore_sse_regs_using_mov): Deleted vzeroupper generation. (ix86_expand_epilogue): Likewise. (is_vzeroupper): New. (is_vzeroall): New. (ix86_avx_u128_mode_needed): New. (ix86_i387_mode_needed): Renamed ix86_mode_needed. (ix86_mode_needed): New. (ix86_avx_u128_mode_after): New. (ix86_mode_after): New. (ix86_avx_u128_mode_entry): New. (ix86_mode_entry): New. (ix86_avx_u128_mode_exit): New. (ix86_mode_exit): New. (ix86_emit_vzeroupper): New. (ix86_emit_mode_set): New. (ix86_expand_call): Deleted vzeroupper generation. (ix86_split_call_vzeroupper): Deleted. (ix86_init_machine_status): Initialzed optimize_mode_switching. (ix86_expand_special_args_builtin): Changed. (ix86_reorg): Deleted a call of move_or_delete_vzeroupper. * config/i386/i386.h (AVX_U128): New. (avx_u128_state): New. (NUM_MODES_FOR_MODE_SWITCHING): Added AVX_U128_ANY. (MODE_AFTER): New. (MODE_ENTRY): New. (MODE_EXIT): New. (EMIT_MODE_SET): Changed. (machine_function): Deleted avx256 fields. * config/i386/i386.md (UNSPEC_CALL_NEEDS_VZEROUPPER): Deleted. (define_insn_and_split *call_vzeroupper): Deleted. (define_insn_and_split *call_rex64_ms_sysv_vzeroupper): Deleted. (define_insn_and_split *sibcall_vzeroupper): Deleted. (define_insn_and_split *call_pop_vzeroupper): Deleted. (define_insn_and_split *sibcall_pop_vzeroupper): Deleted. (define_insn_and_split *call_value_vzeroupper): Deleted. (define_insn_and_split *sibcall_value_vzeroupper): Deleted. (define_insn_and_split *call_value_rex64_ms_sysv_vzeroupper): Deleted. (define_insn_and_split *call_value_pop_vzeroupper): Deleted. (define_insn_and_split *sibcall_value_pop_vzeroupper): Deleted. (define_expand return): Deleted vzeroupper emitting. (define_expand simple_return): Deleted. 2012-11-04 Vladimir Yakovlev vladimir.b.yakov...@intel.com * gcc.target/i386/avx-vzeroupper-5.c: Changed scan-assembler-times. gcc.target/i386/avx-vzeroupper-8.c: Likewise. gcc.target/i386/avx-vzeroupper-9.c: Likewise. gcc.target/i386/avx-vzeroupper-10.c: Likewise. gcc.target/i386/avx-vzeroupper-11.c: Likewise. gcc.target/i386/avx-vzeroupper-12.c: Likewise. gcc.target/i386/avx-vzeroupper-19.c: Likewis. gcc.target/i386/avx-vzeroupper-27.c: New. Target part (without mode-switching.c change) is OK for mainline, with a few small changes below: +#define VALID_AVX256_REG_OR_OI_MODE(m) (VALID_AVX256_REG_MODE (m) || (m) == OImode) enum upper_128bits_state Put this definition in i386.h, after VALID_AVX256_REG_MODE. +static void +ix86_emit_vzeroupper (void) +{ + emit_insn (gen_avx_vzeroupper (GEN_INT (9))); +} No need to pass argument to vzeroupper anymore. We have only one vzeroupper type now, so following definition in sse.md could also be changed from: (define_insn avx_vzeroupper [(unspec_volatile [(match_operand 0 const_int_operand)] UNSPECV_VZEROUPPER)] to: (define_insn avx_vzeroupper [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)] Please call gen_avx_vzeroupper directly, so ix86_emit_vzeroupper wrapper function can be simply deleted. +/* Check insn for vzeroupper intrinsic. */ + +static bool +is_vzeroupper (rtx pat) +{ + return pat + GET_CODE (pat) == UNSPEC_VOLATILE + XINT (pat, 1) == UNSPECV_VZEROUPPER; +} + +/* Check insn for vzeroall intrinsic. */ + +static bool +is_vzeroall (rtx pat) +{ + return pat + GET_CODE (pat) == PARALLEL + GET_CODE (XVECEXP (pat, 0, 0)) == UNSPEC_VOLATILE + XINT (XVECEXP (pat, 0, 0), 1) == UNSPECV_VZEROALL; +} These should be put in predicates.md. This can be in a follow-up patch. case
Re: [PATCH] Vzeroupper placement/47440
On Sun, Nov 4, 2012 at 2:29 PM, Uros Bizjak ubiz...@gmail.com wrote: -/* { dg-final { scan-assembler-not avx_vzeroupper } } */ +/* { dg-final { scan-assembler-times avx_vzeroupper 3 } } */ (... and a couple of similar testsuite changes ...) These asm scans were put there for a reason. I assume you have looked at these differences and are correct (this also implies that current vzeroupper placement code is not optimal or even wrong). Ah, these extra instructions were inserted with _mm265_* intrinsics. We decided some time ago, that these should remain, and no attempt to optimize them will be performed. OTOH, automatic insertion won't emit extra vzeroupper in this case. So, all is OK. Thanks, Uros.