Re: [PATCH] Vzeroupper placement/47440

2012-11-15 Thread Uros Bizjak
On Fri, Nov 9, 2012 at 2:28 PM, Uros Bizjak ubiz...@gmail.com wrote:

 Finally, having a post-reload mode-switching pass, we can double-check
 that there are no live SSE registers at vzeroupper insertion point. As
 vzeroupper is only an optimization, we want to play safe and cancel
 vzeroupper insertion in this case

 There is no degradation for x86_64 gABI targets, since all SSE
 registers are call-clobbered. Vzeroupper is conditionally inserted
 just before call insn, where all registers are saved to stack and
 already dead. The vzeroupper at function exit is not problematic.

Patch was committed to mainline SVN with the following ChangeLog:

2012-11-16  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386-protos.h (ix86_emit_mode_set): Add third argument.
* config/i386/i386.h (EMIT_MODE_SET): Update.
* config/i386/i386.c (ix86_avx_emit_vzeroupper): New function.
(ix86_emit_mode_set) AVX_U128: Call ix86_avx_emit_vzeroupper.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32},
configured with --with-arch=corei7-avx --with-tune=corei7-avx.

Uros.
Index: i386-protos.h
===
--- i386-protos.h   (revision 193549)
+++ i386-protos.h   (working copy)
@@ -172,8 +172,11 @@
 extern int ix86_mode_after (int, int, rtx);
 extern int ix86_mode_entry (int);
 extern int ix86_mode_exit (int);
-extern void ix86_emit_mode_set (int, int);
 
+#ifdef HARD_CONST
+extern void ix86_emit_mode_set (int, int, HARD_REG_SET);
+#endif
+
 extern void x86_order_regs_for_local_alloc (void);
 extern void x86_function_profiler (FILE *, int);
 extern void x86_emit_floatuns (rtx [2]);
Index: i386.c
===
--- i386.c  (revision 193549)
+++ i386.c  (working copy)
@@ -15477,16 +15477,38 @@
   emit_move_insn (new_mode, reg);
 }
 
+/* Emit vzeroupper.  */
+
+void
+ix86_avx_emit_vzeroupper (HARD_REG_SET regs_live)
+{
+  int i;
+
+  /* Cancel automatic vzeroupper insertion if there are
+ live call-saved SSE registers at the insertion point.  */
+
+  for (i = FIRST_SSE_REG; i = LAST_SSE_REG; i++)
+if (TEST_HARD_REG_BIT (regs_live, i)  !call_used_regs[i])
+  return;
+
+  if (TARGET_64BIT)
+for (i = FIRST_REX_SSE_REG; i = LAST_REX_SSE_REG; i++)
+  if (TEST_HARD_REG_BIT (regs_live, i)  !call_used_regs[i])
+   return;
+
+  emit_insn (gen_avx_vzeroupper ());
+}
+
 /* Generate one or more insns to set ENTITY to MODE.  */
 
 void
-ix86_emit_mode_set (int entity, int mode)
+ix86_emit_mode_set (int entity, int mode, HARD_REG_SET regs_live)
 {
   switch (entity)
 {
 case AVX_U128:
   if (mode == AVX_U128_CLEAN)
-   emit_insn (gen_avx_vzeroupper ());
+   ix86_avx_emit_vzeroupper (regs_live);
   break;
 case I387_TRUNC:
 case I387_FLOOR:
Index: i386.h
===
--- i386.h  (revision 193549)
+++ i386.h  (working copy)
@@ -2226,7 +2226,7 @@
are to be inserted.  */
 
 #define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE) \
-  ix86_emit_mode_set ((ENTITY), (MODE))
+  ix86_emit_mode_set ((ENTITY), (MODE), (HARD_REGS_LIVE))
 
 /* Avoid renaming of stack registers, as doing so in combination with
scheduling just increases amount of live registers at time and in


Fwd: [off-list] Re: [PATCH] Vzeroupper placement/47440

2012-11-09 Thread Vladimir Yakovlev
-- Forwarded message --
From: Vladimir Yakovlev vbyakov...@gmail.com
Date: 2012/11/9
Subject: Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
To: Uros Bizjak ubiz...@gmail.com
Копия: H.J. Lu hjl.to...@gmail.com, Igor Zamyatin izamya...@gmail.com


I did changes that moves vzeroupper insertion after reload

2012-11-09  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

* i386/i386-protos.h (ix86_avx256_optimize_mode_switching): New.
* config/i386/i386.c (ix86_init_machine_status): Deleted
initialization for mode switching.
* i386/i386.h (OPTIMIZE_MODE_SWITCHING1): New.
* mode-switching.c (gate_mode_switching1): New.
(rest_of_handle_mode_switching1): New.
(pass_mode_switching1): New.
* passes.c (init_optimization_passes): New pass pass_mode_switching1.
* tree-pass.h (pass_mode_switching1): New.

But this caused assertion fails in  rtl_verify_flow_info_1 () at cfgrtl.c:2291

  fatal_insn (flow control insn inside a basic block, x);

The asserts are called by two calls of mode-switching.c:
commit_edge_insertion and cleanup_cfg. After I commented (see below)
459.GemsFDTD benchspec passed. Your opinion of the patch and haw we
can do something with asserts.

Regards,
Vladimir

--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -747,7 +747,7 @@ optimize_mode_switching (void)
 commit_edge_insertions ();

 #if defined (MODE_ENTRY)  defined (MODE_EXIT)
-  cleanup_cfg (CLEANUP_NO_INSN_DEL);
+  /*cleanup_cfg (CLEANUP_NO_INSN_DEL);*/
 #else
   if (!need_commit  !emitted)
 return 0;
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -1828,7 +1828,7 @@ commit_edge_insertions (void)
   basic_block bb;

 #ifdef ENABLE_CHECKING
-  verify_flow_info ();
+  /*verify_flow_info ();*/
 #endif


2012/11/9 Uros Bizjak ubiz...@gmail.com:
 On Thu, Nov 8, 2012 at 6:52 PM, Uros Bizjak ubiz...@gmail.com wrote:

 Uh, this is spill around call insn, produced by reload.

 Please compile this code:

 double test (double a)
 {
   printf (Hello\n);
   return a;
 }

 You will get at mode switching:

 1 NOTE_INSN_DELETED
 4 NOTE_INSN_BASIC_BLOCK
 2 r60:DF=xmm0:DF
   REG_DEAD: xmm0:DF
 3 NOTE_INSN_FUNCTION_BEG
 6 di:DI=`*.LC0'
 7 call ...
   REG_DEAD: di:DI
   REG_UNUSED: ax:SI
12 xmm0:DF=r60:DF
   REG_DEAD: r60:DF
15 use xmm0:DF

 But reload will insert:

 1 NOTE_INSN_DELETED
 4 NOTE_INSN_BASIC_BLOCK
 2 xmm0:DF=xmm0:DF
   REG_DEAD: xmm0:DF
18 [sp:DI+0x8]=xmm0:DF
   REG_DEAD: xmm0:DF
 3 NOTE_INSN_FUNCTION_BEG
 6 di:DI=`*.LC0'
 7 call ...
   REG_DEAD: di:DI
   REG_UNUSED: ax:SI
19 xmm0:DF=[sp:DI+0x8]
   REG_DEAD: r62:DF
12 xmm0:DF=xmm0:DF
   REG_DEAD: xmm0:DF
15 use xmm0:DF

 I was not paying attention to this situation.


 A viable solution to this issue is through machine-reorg function (AKA
 x86_reorg) that would just move vzeroupper to the close proximity to a
 call insn. This would work on non-64bit-MS-ABI targets, where all SSE
 registers are dead at call insn place.

 Please note that 64bit-MS-ABI target declares registers xmm6+ as
 call-saved, so they can live over the call. I am not familiar with
 this target, but it looks to me that we have to remove vzeroupper, if
 one or more call-saved SSE registers are live at the call insn place.

 Uros.


prvzu.patch
Description: Binary data


Re: [off-list] Re: [PATCH] Vzeroupper placement/47440

2012-11-09 Thread Vladimir Yakovlev
 These assert should tell you what is wrong with the control flow.
 Please look at control_flow_insn_p, which condition returns true.

There is a note after call insn.

(call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI
(_gfortran_stop_string) [flags 0x41] function_decl 0x77eb6200
_gfortran_stop_string) [0 _gfortran_stop_string S1 A8])
(const_int 0 [0])) huygens.fppized.f90:190 616 {*call}
 (expr_list:REG_DEAD (reg:DI 5 di)
(expr_list:REG_DEAD (reg:SI 4 si)
(expr_list:REG_NORETURN (const_int 0 [0])
(nil
(expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
(expr_list:REG_BR_PRED (use (reg:SI 4 si))
(nil
(note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
(const_int 0 [0]))
(expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si)
(const_int 0 [0]))
(nil))) NOTE_INSN_CALL_ARG_LOCATION)

 You shouldn't disable commit_edge_insertions, as there is the function
 where vzerouppers are emitted.

I didn;t disable commit_edge_insertions. I only remove call of assert.

2012/11/9 Uros Bizjak ubiz...@gmail.com:
 On Fri, Nov 9, 2012 at 11:45 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Fri, Nov 9, 2012 at 11:21 AM, Vladimir Yakovlev vbyakov...@gmail.com 
 wrote:
 I did changes that moves vzeroupper insertion after reload

 2012-11-09  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

 * i386/i386-protos.h (ix86_avx256_optimize_mode_switching): New.
 * config/i386/i386.c (ix86_init_machine_status): Deleted
 initialization for mode switching.
 * i386/i386.h (OPTIMIZE_MODE_SWITCHING1): New.
 * mode-switching.c (gate_mode_switching1): New.
 (rest_of_handle_mode_switching1): New.
 (pass_mode_switching1): New.
 * passes.c (init_optimization_passes): New pass 
 pass_mode_switching1.
 * tree-pass.h (pass_mode_switching1): New.

 But this caused assertion fails in  rtl_verify_flow_info_1 () at 
 cfgrtl.c:2291

   fatal_insn (flow control insn inside a basic block, x);

 The asserts are called by two calls of mode-switching.c:
 commit_edge_insertion and cleanup_cfg. After I commented (see below)
 459.GemsFDTD benchspec passed. Your opinion of the patch and haw we
 can do something with asserts.

 These assert should tell you what is wrong with the control flow.
 Please look at control_flow_insn_p, which condition returns true. You
 shouldn't disable commit_edge_insertions, as there is the function
 where vzerouppers are emitted.

 Uros.


Re: [off-list] Re: [PATCH] Vzeroupper placement/47440

2012-11-09 Thread Uros Bizjak
On Fri, Nov 9, 2012 at 1:18 PM, Vladimir Yakovlev vbyakov...@gmail.com wrote:
 These assert should tell you what is wrong with the control flow.
 Please look at control_flow_insn_p, which condition returns true.

 There is a note after call insn.

 (call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI
 (_gfortran_stop_string) [flags 0x41] function_decl 0x77eb6200
 _gfortran_stop_string) [0 _gfortran_stop_string S1 A8])
 (const_int 0 [0])) huygens.fppized.f90:190 616 {*call}
  (expr_list:REG_DEAD (reg:DI 5 di)
 (expr_list:REG_DEAD (reg:SI 4 si)
 (expr_list:REG_NORETURN (const_int 0 [0])
 (nil
 (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
 (expr_list:REG_BR_PRED (use (reg:SI 4 si))
 (nil
 (note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
 (const_int 0 [0]))
 (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si)
 (const_int 0 [0]))
 (nil))) NOTE_INSN_CALL_ARG_LOCATION)


Huh, this RTX is ignored:

--cfgrtl.c--
bool
control_flow_insn_p (const_rtx insn)
{
  switch (GET_CODE (insn))
{
case NOTE:
case CODE_LABEL:
case DEBUG_INSN:
  return false;
--cfgrtl.c--

The problem is noreturn call.

BTW: What happens if the new pass is put before pro_and_epilogue pass?

Uros,


Re: [off-list] Re: [PATCH] Vzeroupper placement/47440

2012-11-09 Thread Jakub Jelinek
On Fri, Nov 09, 2012 at 01:29:18PM +0100, Uros Bizjak wrote:
 On Fri, Nov 9, 2012 at 1:18 PM, Vladimir Yakovlev vbyakov...@gmail.com 
 wrote:
  These assert should tell you what is wrong with the control flow.
  Please look at control_flow_insn_p, which condition returns true.
 
  There is a note after call insn.
 
  (call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI
  (_gfortran_stop_string) [flags 0x41] function_decl 0x77eb6200
  _gfortran_stop_string) [0 _gfortran_stop_string S1 A8])
  (const_int 0 [0])) huygens.fppized.f90:190 616 {*call}
   (expr_list:REG_DEAD (reg:DI 5 di)
  (expr_list:REG_DEAD (reg:SI 4 si)
  (expr_list:REG_NORETURN (const_int 0 [0])
  (nil
  (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
  (expr_list:REG_BR_PRED (use (reg:SI 4 si))
  (nil
  (note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
  (const_int 0 [0]))
  (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si)
  (const_int 0 [0]))
  (nil))) NOTE_INSN_CALL_ARG_LOCATION)
 
 
 Huh, this RTX is ignored:

NOTE_INSN_CALL_ARG_LOCATION is fine, even after a REG_NORETURN call.
It is just a way how to pass call argument details to dwarf2out.
If you have a pass after var-tracking, you need to skip over it.

Jakub


Re: [off-list] Re: [PATCH] Vzeroupper placement/47440

2012-11-09 Thread Uros Bizjak
On Fri, Nov 9, 2012 at 1:36 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Nov 09, 2012 at 01:29:18PM +0100, Uros Bizjak wrote:
 On Fri, Nov 9, 2012 at 1:18 PM, Vladimir Yakovlev vbyakov...@gmail.com 
 wrote:
  These assert should tell you what is wrong with the control flow.
  Please look at control_flow_insn_p, which condition returns true.
 
  There is a note after call insn.
 
  (call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI
  (_gfortran_stop_string) [flags 0x41] function_decl 0x77eb6200
  _gfortran_stop_string) [0 _gfortran_stop_string S1 A8])
  (const_int 0 [0])) huygens.fppized.f90:190 616 {*call}
   (expr_list:REG_DEAD (reg:DI 5 di)
  (expr_list:REG_DEAD (reg:SI 4 si)
  (expr_list:REG_NORETURN (const_int 0 [0])
  (nil
  (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
  (expr_list:REG_BR_PRED (use (reg:SI 4 si))
  (nil
  (note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
  (const_int 0 [0]))
  (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si)
  (const_int 0 [0]))
  (nil))) NOTE_INSN_CALL_ARG_LOCATION)
 

 Huh, this RTX is ignored:

 NOTE_INSN_CALL_ARG_LOCATION is fine, even after a REG_NORETURN call.
 It is just a way how to pass call argument details to dwarf2out.
 If you have a pass after var-tracking, you need to skip over it.

Yes, but postreload mode switching should come before pro_and_epilogue
anyway, otherwise create_pre_exit won't work:

--mode-switching.c (222)--
/* If this function returns a value at the end, we have to
   insert the final mode switch before the return value copy
   to its hard register.  */
if (EDGE_COUNT (EXIT_BLOCK_PTR-preds) == 1
 NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
 GET_CODE (PATTERN (last_insn)) == USE
 GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)
--mode-switching.2 (228)--

I believe that this will work OK if the pass is inserted before
prologue generation pass.

Uros.


Re: [off-list] Re: [PATCH] Vzeroupper placement/47440

2012-11-09 Thread Uros Bizjak
On Fri, Nov 9, 2012 at 1:47 PM, Uros Bizjak ubiz...@gmail.com wrote:

  These assert should tell you what is wrong with the control flow.
  Please look at control_flow_insn_p, which condition returns true.
 
  There is a note after call insn.
 
  (call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI
  (_gfortran_stop_string) [flags 0x41] function_decl 0x77eb6200
  _gfortran_stop_string) [0 _gfortran_stop_string S1 A8])
  (const_int 0 [0])) huygens.fppized.f90:190 616 {*call}
   (expr_list:REG_DEAD (reg:DI 5 di)
  (expr_list:REG_DEAD (reg:SI 4 si)
  (expr_list:REG_NORETURN (const_int 0 [0])
  (nil
  (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
  (expr_list:REG_BR_PRED (use (reg:SI 4 si))
  (nil
  (note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
  (const_int 0 [0]))
  (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si)
  (const_int 0 [0]))
  (nil))) NOTE_INSN_CALL_ARG_LOCATION)
 

 Huh, this RTX is ignored:

 NOTE_INSN_CALL_ARG_LOCATION is fine, even after a REG_NORETURN call.
 It is just a way how to pass call argument details to dwarf2out.
 If you have a pass after var-tracking, you need to skip over it.

 Yes, but postreload mode switching should come before pro_and_epilogue
 anyway, otherwise create_pre_exit won't work:

 --mode-switching.c (222)--
 /* If this function returns a value at the end, we have to
insert the final mode switch before the return value copy
to its hard register.  */
 if (EDGE_COUNT (EXIT_BLOCK_PTR-preds) == 1
  NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
  GET_CODE (PATTERN (last_insn)) == USE
  GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)
 --mode-switching.2 (228)--

 I believe that this will work OK if the pass is inserted before
 prologue generation pass.

Finally, having a post-reload mode-switching pass, we can double-check
that there are no live SSE registers at vzeroupper insertion point. As
vzeroupper is only an optimization, we want to play safe and cancel
vzeroupper insertion in this case

There is no degradation for x86_64 gABI targets, since all SSE
registers are call-clobbered. Vzeroupper is conditionally inserted
just before call insn, where all registers are saved to stack and
already dead. The vzeroupper at function exit is not problematic.

Uros.
Index: i386-protos.h
===
--- i386-protos.h   (revision 193329)
+++ i386-protos.h   (working copy)
@@ -172,8 +172,11 @@ extern int ix86_mode_needed (int, rtx);
 extern int ix86_mode_after (int, int, rtx);
 extern int ix86_mode_entry (int);
 extern int ix86_mode_exit (int);
-extern void ix86_emit_mode_set (int, int);
 
+#ifdef HARD_CONST
+extern void ix86_emit_mode_set (int, int, HARD_REG_SET);
+#endif
+
 extern void x86_order_regs_for_local_alloc (void);
 extern void x86_function_profiler (FILE *, int);
 extern void x86_emit_floatuns (rtx [2]);
Index: i386.c
===
--- i386.c  (revision 193329)
+++ i386.c  (working copy)
@@ -15277,16 +15284,38 @@ emit_i387_cw_initialization (int mode)
   emit_move_insn (new_mode, reg);
 }
 
+/* Emit vzeroupper.  */
+
+void
+ix86_avx_emit_vzeroupper (HARD_REG_SET regs_live)
+{
+  int i;
+
+  /* Cancel automatic vzeroupper insertion if there are
+ live call-saved SSE registers at the insertion point.  */
+
+  for (i = FIRST_SSE_REG; i = LAST_SSE_REG; i++)
+if (!call_used_regs[i]  TEST_HARD_REG_BIT (regs_live, i))
+  return;
+
+  if (TARGET_64BIT)
+for (i = FIRST_REX_SSE_REG; i = LAST_REX_SSE_REG; i++)
+  if (!call_used_regs[i]  TEST_HARD_REG_BIT (regs_live, i))
+   return;
+
+  emit_insn (gen_avx_vzeroupper ());
+}
+
 /* Generate one or more insns to set ENTITY to MODE.  */
 
 void
-ix86_emit_mode_set (int entity, int mode)
+ix86_emit_mode_set (int entity, int mode, HARD_REG_SET regs_live)
 {
   switch (entity)
 {
 case AVX_U128:
   if (mode == AVX_U128_CLEAN)
-   emit_insn (gen_avx_vzeroupper ());
+   ix86_avx_emit_vzeroupper (regs_live);
   break;
 case I387_TRUNC:
 case I387_FLOOR:
Index: i386.h
===
--- i386.h  (revision 193329)
+++ i386.h  (working copy)
@@ -2223,7 +2227,7 @@ enum avx_u128_state
are to be inserted.  */
 
 #define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE) \
-  ix86_emit_mode_set ((ENTITY), (MODE))
+  ix86_emit_mode_set ((ENTITY), (MODE), (HARD_REGS_LIVE))
 
 /* Avoid renaming of stack registers, as doing so in combination with
scheduling just increases amount of live registers at time and in


Re: [PATCH] Vzeroupper placement/47440

2012-11-07 Thread Jakub Jelinek
On Wed, Nov 07, 2012 at 08:08:08AM +0100, Uros Bizjak wrote:
  2012-11-06  Jakub Jelinek  ja...@redhat.com
 
  * config/i386/i386.c (ix86_avx_u128_mode_after): Don't
  look for reg in CALL operand.
 
  OK.
 
 You can also break the loop after reg is found.

I have committed the patch as is to fix the bootstrap, as anything else
needs another bootstrap/regtest.  I don't think breaking out of the loop
would be correct, then say for *{,sib}call_value_pop patterns reg would be
stack pointer rather than the return value of the call.  Due to that pattern
we can't use single_set, but I wonder if we just can't use XVECEXP (pat, 0, 0)
unconditionally for the return value, or perhaps check
the condition inside of the loop (REG_P (reg)  VALID_AVX256_REG_OR_OI_MODE 
(GET_MODE
(reg))), return AVX_U128_DIRTY if true (and that way break out of the loop),
and return AVX_U128_CLEAN after the loop.
Or I wonder why is call handled specially at all, doesn't
  /* Check if a 256bit AVX register is referenced in stores.  */
  state = unused;
  note_stores (pat, check_avx256_stores, state);
  if (state == used)
return AVX_U128_DIRTY;
handle it?  Then it would just need to be if (CALL_P (insn)) return 
AVX_U128_CLEAN.
BTW, the formatting is wrong in some spots, e.g.
check_avx256_stores (rtx dest, const_rtx set, void *data)
{
  if (((REG_P (dest) || MEM_P(dest))

I'd prefer to leave this to the original submitter.

Jakub


Re: [PATCH] Vzeroupper placement/47440

2012-11-07 Thread Uros Bizjak
On Wed, Nov 7, 2012 at 9:04 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Nov 07, 2012 at 08:08:08AM +0100, Uros Bizjak wrote:
  2012-11-06  Jakub Jelinek  ja...@redhat.com
 
  * config/i386/i386.c (ix86_avx_u128_mode_after): Don't
  look for reg in CALL operand.
 
  OK.

 You can also break the loop after reg is found.

 I have committed the patch as is to fix the bootstrap, as anything else
 needs another bootstrap/regtest.  I don't think breaking out of the loop
 would be correct, then say for *{,sib}call_value_pop patterns reg would be
 stack pointer rather than the return value of the call.  Due to that pattern
 we can't use single_set, but I wonder if we just can't use XVECEXP (pat, 0, 0)
 unconditionally for the return value, or perhaps check
 the condition inside of the loop (REG_P (reg)  VALID_AVX256_REG_OR_OI_MODE 
 (GET_MODE
 (reg))), return AVX_U128_DIRTY if true (and that way break out of the loop),
 and return AVX_U128_CLEAN after the loop.

Indeed, I didn't notice reverse loop.

 Or I wonder why is call handled specially at all, doesn't
   /* Check if a 256bit AVX register is referenced in stores.  */
   state = unused;
   note_stores (pat, check_avx256_stores, state);
   if (state == used)
 return AVX_U128_DIRTY;
 handle it?  Then it would just need to be if (CALL_P (insn)) return 
 AVX_U128_CLEAN.

You are right, I am testing the attached patch.

 BTW, the formatting is wrong in some spots, e.g.
 check_avx256_stores (rtx dest, const_rtx set, void *data)
 {
   if (((REG_P (dest) || MEM_P(dest))

I have some doubts that this function is fully correct. The comment says:

/* Check if a 256bit AVX register is referenced in stores.   */

But, we are in fact checking stores and uses.

IIRC, U128_DIRTY state is only set for stores to YMM register, so:

static void
check_avx256_stores (rtx dest, const_rtx set ATTRIBUTE_UNUSED, void *data)
{
  if (REG_P (dest)
   VALID_AVX256_REG_OR_OI_MODE (GET_MODE (dest)))
{
  enum upper_128bits_state *state
= (enum upper_128bits_state *) data;
  *state = used;
}
}

 I'd prefer to leave this to the original submitter.

Yes, Vladimir, can you please comment on these issues?

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 193280)
+++ config/i386/i386.c  (working copy)
@@ -15079,30 +15079,8 @@
 ix86_avx_u128_mode_after (int mode, rtx insn)
 {
   rtx pat = PATTERN (insn);
-  rtx reg = NULL;
-  int i;
   enum upper_128bits_state state;
 
-  /* Check for CALL instruction.  */
-  if (CALL_P (insn))
-{
-  if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL)
-   reg = SET_DEST (pat);
-  else if (GET_CODE (pat) ==  PARALLEL)
-   for (i = XVECLEN (pat, 0) - 1; i = 0; i--)
- {
-   rtx x = XVECEXP (pat, 0, i);
-   if (GET_CODE(x) == SET)
- reg = SET_DEST (x);
- }
-  /* Mode after call is set to AVX_U128_DIRTY if there are
-256bit modes used in the function return register.  */
-  if (reg  REG_P (reg)  VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg)))
-   return AVX_U128_DIRTY;
-  else
-   return AVX_U128_CLEAN;
-}
-
   if (vzeroupper_operation (pat, VOIDmode)
   || vzeroall_operation (pat, VOIDmode))
 return AVX_U128_CLEAN;
@@ -15112,6 +15090,10 @@
   note_stores (pat, check_avx256_stores, state);
   if (state == used)
 return AVX_U128_DIRTY;
+  /* We know that state is clean after CALL insn if there are no
+ 256bit modes used in the function return register.  */
+  else if (CALL_P (insn)  state == unused)
+return AVX_U128_CLEAN;
 
   return mode;
 }


Re: [PATCH] Vzeroupper placement/47440

2012-11-07 Thread Uros Bizjak
On Wed, Nov 7, 2012 at 9:04 AM, Jakub Jelinek ja...@redhat.com wrote:

 Or I wonder why is call handled specially at all, doesn't
   /* Check if a 256bit AVX register is referenced in stores.  */
   state = unused;
   note_stores (pat, check_avx256_stores, state);
   if (state == used)
 return AVX_U128_DIRTY;
 handle it?  Then it would just need to be if (CALL_P (insn)) return 
 AVX_U128_CLEAN.
 BTW, the formatting is wrong in some spots, e.g.
 check_avx256_stores (rtx dest, const_rtx set, void *data)
 {
   if (((REG_P (dest) || MEM_P(dest))

 I'd prefer to leave this to the original submitter.

I have committed following patch that address all the above issues.

2012-11-07  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.c (enum upper_128bits_state): Remove.
(check_avx256_store): Use bool pointer argument.
(ix86_avx_u128_mode_needed): Use note_stores also for CALL insns.
* config/i386/predicates.md (vzeroupper_operation): Use match_test.

Bootstrapped and regression tested on x86_64-pc-linux-gnu, committed.

Uros.
Index: predicates.md
===
--- predicates.md   (revision 193292)
+++ predicates.md   (working copy)
@@ -1231,10 +1231,8 @@
 
 ;; return true if OP is a vzeroupper operation.
 (define_predicate vzeroupper_operation
-  (match_code unspec_volatile)
-{
-  return XINT (op, 1) == UNSPECV_VZEROUPPER;
-})
+  (and (match_code unspec_volatile)
+   (match_test XINT (op, 1) == UNSPECV_VZEROUPPER)))
 
 ;; Return true if OP is a parallel for a vbroadcast permute.
 
Index: i386.c
===
--- i386.c  (revision 193292)
+++ i386.c  (working copy)
@@ -65,27 +65,19 @@ along with GCC; see the file COPYING3.  If not see
 #include tree-pass.h
 #include tree-flow.h
 
-enum upper_128bits_state
-{
-  unknown = 0,
-  unused,
-  used
-};
-
 /* Check if a 256bit AVX register is referenced in stores.   */
 
 static void
 check_avx256_stores (rtx dest, const_rtx set, void *data)
 {
-  if (((REG_P (dest) || MEM_P(dest))
+  if (((REG_P (dest) || MEM_P (dest))
 VALID_AVX256_REG_OR_OI_MODE (GET_MODE (dest)))
   || (GET_CODE (set) == SET
   (REG_P (SET_SRC (set)) || MEM_P (SET_SRC (set)))
   VALID_AVX256_REG_OR_OI_MODE (GET_MODE (SET_SRC (set)
 {
-  enum upper_128bits_state *state
-   = (enum upper_128bits_state *) data;
-  *state = used;
+  bool *used = (bool *) data;
+  *used = true;
 }
 }
 
@@ -14967,23 +14959,24 @@ output_387_binary_op (rtx insn, rtx *operands)
 static int
 ix86_avx_u128_mode_needed (rtx insn)
 {
-  rtx pat = PATTERN (insn);
-  rtx arg;
-  enum upper_128bits_state state;
+  bool avx_u128_used;
 
   if (CALL_P (insn))
 {
+  rtx link;
+
   /* Needed mode is set to AVX_U128_CLEAN if there are
 no 256bit modes used in function arguments.  */
-  for (arg = CALL_INSN_FUNCTION_USAGE (insn); arg;
-  arg = XEXP (arg, 1))
+  for (link = CALL_INSN_FUNCTION_USAGE (insn);
+  link;
+  link = XEXP (link, 1))
{
- if (GET_CODE (XEXP (arg, 0)) == USE)
+ if (GET_CODE (XEXP (link, 0)) == USE)
{
- rtx reg = XEXP (XEXP (arg, 0), 0);
+ rtx arg = XEXP (XEXP (link, 0), 0);
 
- if (reg  REG_P (reg)
-  VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg)))
+ if (REG_P (arg)
+  VALID_AVX256_REG_OR_OI_MODE (GET_MODE (arg)))
return AVX_U128_ANY;
}
}
@@ -14992,10 +14985,11 @@ ix86_avx_u128_mode_needed (rtx insn)
 }
 
   /* Check if a 256bit AVX register is referenced in stores.  */
-  state = unused;
-  note_stores (pat, check_avx256_stores, state);
-  if (state == used)
+  avx_u128_used = false;
+  note_stores (PATTERN (insn), check_avx256_stores, avx_u128_used);
+  if (avx_u128_used)
 return AVX_U128_DIRTY;
+
   return AVX_U128_ANY;
 }
 
@@ -15079,39 +15073,21 @@ static int
 ix86_avx_u128_mode_after (int mode, rtx insn)
 {
   rtx pat = PATTERN (insn);
-  rtx reg = NULL;
-  int i;
-  enum upper_128bits_state state;
+  bool avx_u128_used;
 
-  /* Check for CALL instruction.  */
-  if (CALL_P (insn))
-{
-  if (GET_CODE (pat) == SET)
-   reg = SET_DEST (pat);
-  else if (GET_CODE (pat) == PARALLEL)
-   for (i = XVECLEN (pat, 0) - 1; i = 0; i--)
- {
-   rtx x = XVECEXP (pat, 0, i);
-   if (GET_CODE(x) == SET)
- reg = SET_DEST (x);
- }
-  /* Mode after call is set to AVX_U128_DIRTY if there are
-256bit modes used in the function return register.  */
-  if (reg  REG_P (reg)  VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg)))
-   return AVX_U128_DIRTY;
-  else
-   return AVX_U128_CLEAN;
-}
-
   if (vzeroupper_operation (pat, VOIDmode)
   || vzeroall_operation (pat, VOIDmode))
 return AVX_U128_CLEAN;
 
   /* Check 

Re: [PATCH] Vzeroupper placement/47440

2012-11-07 Thread Vladimir Yakovlev
Hello,

Thanyou for investigation and fixing the problem.  I'll answer on remarks later.

Regards,
Vladimir

2012/11/7 Jakub Jelinek ja...@redhat.com:
 On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote:
 On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
  Hello,
  OK for mainline SVN, please commit.
  Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html
 
  Thanks, K

 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224

 Not only that, it also broke --enable-checking=yes,rtl bootstrap.
 SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and
 the code looks for reg, so I think looking for CALL was just a mistake.

 This fixes the bootstrap, ok for trunk?

 2012-11-06  Jakub Jelinek  ja...@redhat.com

 * config/i386/i386.c (ix86_avx_u128_mode_after): Don't
 look for reg in CALL operand.

 --- gcc/config/i386/i386.c.jj   2012-11-06 18:10:22.0 +0100
 +++ gcc/config/i386/i386.c  2012-11-06 20:15:09.068912242 +0100
 @@ -15084,9 +15084,9 @@ ix86_avx_u128_mode_after (int mode, rtx
/* Check for CALL instruction.  */
if (CALL_P (insn))
  {
 -  if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL)
 +  if (GET_CODE (pat) == SET)
 reg = SET_DEST (pat);
 -  else if (GET_CODE (pat) ==  PARALLEL)
 +  else if (GET_CODE (pat) == PARALLEL)
 for (i = XVECLEN (pat, 0) - 1; i = 0; i--)
   {
 rtx x = XVECEXP (pat, 0, i);


 Jakub


Re: [PATCH] Vzeroupper placement/47440

2012-11-07 Thread Vladimir Yakovlev
I tested changes with configure

../gcc/configure --enable-clocale=gnu --with-system-zlib
--enable-shared --with-demangler-in-ld --with-fpmath=sse
--enable-languages=c,c++,fortran,java,lto,objc --with-arch=corei7-avx
--with-cpu=corei7-avx

Bootstrap is passed and no new fails in make check.

Thank you,
Vladimir


2012/11/7 Vladimir Yakovlev vbyakov...@gmail.com:
 Hello,

 Thanyou for investigation and fixing the problem.  I'll answer on remarks 
 later.

 Regards,
 Vladimir

 2012/11/7 Jakub Jelinek ja...@redhat.com:
 On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote:
 On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
  Hello,
  OK for mainline SVN, please commit.
  Checked into GCC trunk: 
  http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html
 
  Thanks, K

 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224

 Not only that, it also broke --enable-checking=yes,rtl bootstrap.
 SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and
 the code looks for reg, so I think looking for CALL was just a mistake.

 This fixes the bootstrap, ok for trunk?

 2012-11-06  Jakub Jelinek  ja...@redhat.com

 * config/i386/i386.c (ix86_avx_u128_mode_after): Don't
 look for reg in CALL operand.

 --- gcc/config/i386/i386.c.jj   2012-11-06 18:10:22.0 +0100
 +++ gcc/config/i386/i386.c  2012-11-06 20:15:09.068912242 +0100
 @@ -15084,9 +15084,9 @@ ix86_avx_u128_mode_after (int mode, rtx
/* Check for CALL instruction.  */
if (CALL_P (insn))
  {
 -  if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL)
 +  if (GET_CODE (pat) == SET)
 reg = SET_DEST (pat);
 -  else if (GET_CODE (pat) ==  PARALLEL)
 +  else if (GET_CODE (pat) == PARALLEL)
 for (i = XVECLEN (pat, 0) - 1; i = 0; i--)
   {
 rtx x = XVECEXP (pat, 0, i);


 Jakub


Re: [PATCH] Vzeroupper placement/47440

2012-11-06 Thread Kirill Yukhin
Hello,
 OK for mainline SVN, please commit.
Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html

Thanks, K


Re: [PATCH] Vzeroupper placement/47440

2012-11-06 Thread H.J. Lu
On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 OK for mainline SVN, please commit.
 Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html

 Thanks, K

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224

-- 
H.J.


Re: [PATCH] Vzeroupper placement/47440

2012-11-06 Thread Jakub Jelinek
On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote:
 On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
  Hello,
  OK for mainline SVN, please commit.
  Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html
 
  Thanks, K
 
 This caused:
 
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224

Not only that, it also broke --enable-checking=yes,rtl bootstrap.
SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and
the code looks for reg, so I think looking for CALL was just a mistake.

This fixes the bootstrap, ok for trunk?

2012-11-06  Jakub Jelinek  ja...@redhat.com

* config/i386/i386.c (ix86_avx_u128_mode_after): Don't
look for reg in CALL operand.

--- gcc/config/i386/i386.c.jj   2012-11-06 18:10:22.0 +0100
+++ gcc/config/i386/i386.c  2012-11-06 20:15:09.068912242 +0100
@@ -15084,9 +15084,9 @@ ix86_avx_u128_mode_after (int mode, rtx
   /* Check for CALL instruction.  */
   if (CALL_P (insn))
 {
-  if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL)
+  if (GET_CODE (pat) == SET)
reg = SET_DEST (pat);
-  else if (GET_CODE (pat) ==  PARALLEL)
+  else if (GET_CODE (pat) == PARALLEL)
for (i = XVECLEN (pat, 0) - 1; i = 0; i--)
  {
rtx x = XVECEXP (pat, 0, i);


Jakub


Re: [PATCH] Vzeroupper placement/47440

2012-11-06 Thread Uros Bizjak
On Tue, Nov 6, 2012 at 11:18 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote:
 On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
  Hello,
  OK for mainline SVN, please commit.
  Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html
 
  Thanks, K

 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224

 Not only that, it also broke --enable-checking=yes,rtl bootstrap.
 SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and
 the code looks for reg, so I think looking for CALL was just a mistake.

 This fixes the bootstrap, ok for trunk?

 2012-11-06  Jakub Jelinek  ja...@redhat.com

 * config/i386/i386.c (ix86_avx_u128_mode_after): Don't
 look for reg in CALL operand.

OK.

Thanks,
Uros.


Re: [PATCH] Vzeroupper placement/47440

2012-11-06 Thread Uros Bizjak
On Wed, Nov 7, 2012 at 8:03 AM, Uros Bizjak ubiz...@gmail.com wrote:

  OK for mainline SVN, please commit.
  Checked into GCC trunk: 
  http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html
 
  Thanks, K

 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224

 Not only that, it also broke --enable-checking=yes,rtl bootstrap.
 SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and
 the code looks for reg, so I think looking for CALL was just a mistake.

 This fixes the bootstrap, ok for trunk?

 2012-11-06  Jakub Jelinek  ja...@redhat.com

 * config/i386/i386.c (ix86_avx_u128_mode_after): Don't
 look for reg in CALL operand.

 OK.

You can also break the loop after reg is found.

Uros.


Re: [PATCH] Vzeroupper placement/47440

2012-11-05 Thread Uros Bizjak
Hello!

 2012-11-04  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

* mode-switching.c (create_pre_exit): Added code for 
 maybe_builtin_apply case.

The part above is already committed.

* config/i386/i386-protos.h (emit_i387_cw_initialization): Deleted.
(emit_vzero): Added prototype.
(ix86_mode_entry): Likewise.
(ix86_mode_exit): Likewise.
(ix86_emit_mode_set): Likewise.

* config/i386/i386.c (typedef struct block_info_def): Deleted.
(define BLOCK_INFO): Deleted.
(check_avx256_stores): Added checking for MEM_P.
(move_or_delete_vzeroupper_2): Deleted.
(move_or_delete_vzeroupper_1): Deleted.
(move_or_delete_vzeroupper): Deleted.
(ix86_maybe_emit_epilogue_vzeroupper): Deleted.
(function_pass_avx256_p): Deleted.
(ix86_function_ok_for_sibcall): Deleted disabling sibcall.
(nit_cumulative_args): Deleted initialization of of avx256 fields of
cfun-machine.
(ix86_emit_restore_sse_regs_using_mov): Deleted vzeroupper generation.
(ix86_expand_epilogue): Likewise.
(ix86_avx_u128_mode_needed): New.
(ix86_i387_mode_needed): Renamed ix86_mode_needed.
(ix86_mode_needed): New.
(ix86_avx_u128_mode_after): New.
(ix86_mode_after): New.
(ix86_avx_u128_mode_entry): New.
(ix86_mode_entry): New.
(ix86_avx_u128_mode_exit): New.
(ix86_mode_exit): New.
(ix86_emit_mode_set): New.
(ix86_expand_call): Deleted vzeroupper generation.
(ix86_split_call_vzeroupper): Deleted.
(ix86_init_machine_status): Initialzed optimize_mode_switching.
(ix86_expand_special_args_builtin): Changed.
(ix86_reorg): Deleted a call of move_or_delete_vzeroupper.

* config/i386/i386.h  (VALID_AVX256_REG_OR_OI_MODE): New.
(AVX_U128): New.
(avx_u128_state): New.
(NUM_MODES_FOR_MODE_SWITCHING): Added AVX_U128_ANY.
(MODE_AFTER): New.
(MODE_ENTRY): New.
(MODE_EXIT): New.
(EMIT_MODE_SET): Changed.
(machine_function): Deleted avx256 fields.

* config/i386/i386.md (UNSPEC_CALL_NEEDS_VZEROUPPER): Deleted.
(define_insn_and_split *call_vzeroupper): Deleted.
(define_insn_and_split *call_rex64_ms_sysv_vzeroupper): Deleted.
(define_insn_and_split *sibcall_vzeroupper): Deleted.
(define_insn_and_split *call_pop_vzeroupper): Deleted.
(define_insn_and_split *sibcall_pop_vzeroupper): Deleted.
(define_insn_and_split *call_value_vzeroupper): Deleted.
(define_insn_and_split *sibcall_value_vzeroupper): Deleted.
(define_insn_and_split *call_value_rex64_ms_sysv_vzeroupper): 
 Deleted.
(define_insn_and_split *call_value_pop_vzeroupper): Deleted.
(define_insn_and_split *sibcall_value_pop_vzeroupper): Deleted.
(define_expand return): Deleted vzeroupper emitting.
(define_expand simple_return): Deleted.

* config/i386/predicates.md (vzeroupper_operation): New.

* config/i386/sse.md (avx_vzeroupper): Changed.

 2012-11-04  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

* gcc.target/i386/avx-vzeroupper-5.c: Changed scan-assembler-times.
gcc.target/i386/avx-vzeroupper-8.c: Likewise.
gcc.target/i386/avx-vzeroupper-9.c: Likewise.
gcc.target/i386/avx-vzeroupper-10.c: Likewise.
gcc.target/i386/avx-vzeroupper-11.c: Likewise.
gcc.target/i386/avx-vzeroupper-12.c: Likewise.
gcc.target/i386/avx-vzeroupper-19.c: Likewis.
gcc.target/i386/avx-vzeroupper-27.c: New.

OK for mainline SVN, please commit.

Thanks,
Uros.


Re: [PATCH] Vzeroupper placement/47440

2012-11-04 Thread Uros Bizjak
Hello!

2012-11-04  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

* mode-switching.c (create_pre_exit): Added code for
maybe_builtin_apply case.

* config/i386/i386-protos.h (emit_i387_cw_initialization): Deleted.
(emit_vzero): Added prototype.
(ix86_mode_entry): Likewise.
(ix86_mode_exit): Likewise.
(ix86_emit_mode_set): Likewise.

* config/i386/i386.c (VALID_AVX256_REG_OR_OI_MODE): New.
(typedef struct block_info_def): Deleted.
(define BLOCK_INFO): Deleted.
(check_avx256_stores): Added checking for MEM_P.
(move_or_delete_vzeroupper_2): Deleted.
(move_or_delete_vzeroupper_1): Deleted.
(move_or_delete_vzeroupper): Deleted.
(ix86_maybe_emit_epilogue_vzeroupper): Deleted.
(function_pass_avx256_p): Deleted.
(ix86_function_ok_for_sibcall): Deleted disabling sibcall.
(nit_cumulative_args): Deleted initialization of of avx256 fields of
cfun-machine.
(ix86_emit_restore_sse_regs_using_mov): Deleted vzeroupper generation.
(ix86_expand_epilogue): Likewise.
(is_vzeroupper): New.
(is_vzeroall): New.
(ix86_avx_u128_mode_needed): New.
(ix86_i387_mode_needed): Renamed ix86_mode_needed.
(ix86_mode_needed): New.
(ix86_avx_u128_mode_after): New.
(ix86_mode_after): New.
(ix86_avx_u128_mode_entry): New.
(ix86_mode_entry): New.
(ix86_avx_u128_mode_exit): New.
(ix86_mode_exit): New.
(ix86_emit_vzeroupper): New.
(ix86_emit_mode_set): New.
(ix86_expand_call): Deleted vzeroupper generation.
(ix86_split_call_vzeroupper): Deleted.
(ix86_init_machine_status): Initialzed optimize_mode_switching.
(ix86_expand_special_args_builtin): Changed.
(ix86_reorg): Deleted a call of move_or_delete_vzeroupper.

* config/i386/i386.h (AVX_U128): New.
(avx_u128_state): New.
(NUM_MODES_FOR_MODE_SWITCHING): Added AVX_U128_ANY.
(MODE_AFTER): New.
(MODE_ENTRY): New.
(MODE_EXIT): New.
(EMIT_MODE_SET): Changed.
(machine_function): Deleted avx256 fields.

* config/i386/i386.md (UNSPEC_CALL_NEEDS_VZEROUPPER): Deleted.
(define_insn_and_split *call_vzeroupper): Deleted.
(define_insn_and_split *call_rex64_ms_sysv_vzeroupper): Deleted.
(define_insn_and_split *sibcall_vzeroupper): Deleted.
(define_insn_and_split *call_pop_vzeroupper): Deleted.
(define_insn_and_split *sibcall_pop_vzeroupper): Deleted.
(define_insn_and_split *call_value_vzeroupper): Deleted.
(define_insn_and_split *sibcall_value_vzeroupper): Deleted.
(define_insn_and_split *call_value_rex64_ms_sysv_vzeroupper): Deleted.
(define_insn_and_split *call_value_pop_vzeroupper): Deleted.
(define_insn_and_split *sibcall_value_pop_vzeroupper): Deleted.
(define_expand return): Deleted vzeroupper emitting.
(define_expand simple_return): Deleted.

2012-11-04  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

* gcc.target/i386/avx-vzeroupper-5.c: Changed scan-assembler-times.
gcc.target/i386/avx-vzeroupper-8.c: Likewise.
gcc.target/i386/avx-vzeroupper-9.c: Likewise.
gcc.target/i386/avx-vzeroupper-10.c: Likewise.
gcc.target/i386/avx-vzeroupper-11.c: Likewise.
gcc.target/i386/avx-vzeroupper-12.c: Likewise.
gcc.target/i386/avx-vzeroupper-19.c: Likewis.
gcc.target/i386/avx-vzeroupper-27.c: New.

Target part (without mode-switching.c change) is OK for mainline, with
a few small changes below:

+#define VALID_AVX256_REG_OR_OI_MODE(m) (VALID_AVX256_REG_MODE (m) ||
(m) == OImode)
 enum upper_128bits_state

Put this definition in i386.h, after VALID_AVX256_REG_MODE.

+static void
+ix86_emit_vzeroupper (void)
+{
+  emit_insn (gen_avx_vzeroupper (GEN_INT (9)));
+}

No need to pass argument to vzeroupper anymore. We have only one
vzeroupper type now, so following definition in sse.md could also be
changed from:

(define_insn avx_vzeroupper
  [(unspec_volatile [(match_operand 0 const_int_operand)]
UNSPECV_VZEROUPPER)]

to:

(define_insn avx_vzeroupper
  [(unspec_volatile [(const_int 0)]
UNSPECV_VZEROUPPER)]

Please call gen_avx_vzeroupper directly, so ix86_emit_vzeroupper
wrapper function can be simply deleted.

+/* Check insn for vzeroupper intrinsic.  */
+
+static bool
+is_vzeroupper (rtx pat)
+{
+  return pat
+ GET_CODE (pat) == UNSPEC_VOLATILE
+ XINT (pat, 1) == UNSPECV_VZEROUPPER;
+}
+
+/* Check insn for vzeroall intrinsic.  */
+
+static bool
+is_vzeroall (rtx pat)
+{
+  return pat
+ GET_CODE (pat) == PARALLEL
+ GET_CODE (XVECEXP (pat, 0, 0)) == UNSPEC_VOLATILE
+ XINT (XVECEXP (pat, 0, 0), 1) == UNSPECV_VZEROALL;
+}

These should be put in predicates.md. This can be in a follow-up patch.

 case 

Re: [PATCH] Vzeroupper placement/47440

2012-11-04 Thread Uros Bizjak
On Sun, Nov 4, 2012 at 2:29 PM, Uros Bizjak ubiz...@gmail.com wrote:

 -/* { dg-final { scan-assembler-not avx_vzeroupper } } */
 +/* { dg-final { scan-assembler-times avx_vzeroupper 3 } } */

 (... and a couple of similar testsuite changes ...)

 These asm scans were put there for a reason. I assume you have looked
 at these differences and are correct (this also implies that current
 vzeroupper placement code is not optimal or even wrong).

Ah, these extra instructions were inserted with _mm265_* intrinsics.
We decided some time ago, that these should remain, and no attempt to
optimize them will be performed. OTOH, automatic insertion won't
emit extra vzeroupper in this case.

So, all is OK.

Thanks,
Uros.