Re: [RFC, x86] Changes for AVX and AVX2 processors

2013-01-11 Thread Vladimir Yakovlev
I've fixed the ChangeLog. Can we commit the patch to trunk now?

2012-12-27  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

* config/i386/i386-c.c (ix86_target_macros_internal): New case.
(ix86_target_macros_internal): Likewise.

* config/i386/i386.c (m_CORE2I7): Removed.
(m_CORE_HASWELL): New macro.
(m_CORE_ALL): Likewise.
(initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
(initial_ix86_arch_features): Likewise.
(processor_target_table): Initializations for Core avx2.
(cpu_names): New names core-avx2.
(ix86_option_override_internal): Changed PROCESSOR_COREI7 by
PROCESSOR_CORE_HASWELL.
(ix86_issue_rate): New case.
(ia32_multipass_dfa_lookahead): Likewise.
(ix86_sched_init_global): Likewise.

* config/i386/i386.h (TARGET_HASWELL): New macro.
(target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
(processor_type): New PROCESSOR_HASWELL.


2013/1/10 Jakub Jelinek ja...@redhat.com:
 On Thu, Jan 10, 2013 at 12:28:24PM +0100, Uros Bizjak wrote:
 On Thu, Jan 10, 2013 at 12:12 PM, Vladimir Yakovlev
 vbyakov...@gmail.com wrote:

  It seems I didn't send a patch with the last changes. Sorry if so.
 
  Vladimir
 
   2012-12-27  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

 Missing > at the end of line.

 
   * config/i386/i386-c.c (ix86_target_macros_internal): New case.
(ix86_target_macros_internal): Likewise.

 There is some additional space at the beginning of this line (note, all
 ChangeLog lines but the one with date should be tab indented, not space).

 Jakub
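
The indentation rule Jakub describes can be checked mechanically. The following is only a minimal sketch: changelog_line_ok is a hypothetical helper, not something that exists in GCC or its scripts, and it assumes the header line is recognised simply by starting with a digit.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: returns 1 if LINE is acceptably
   indented for a GNU-style ChangeLog entry, 0 otherwise.
   Rule sketched here: blank lines are fine, the header line starts with
   the date (a digit) in column 0, and every other line starts with a tab.  */
int
changelog_line_ok (const char *line)
{
  assert (line != NULL);
  if (line[0] == '\0' || line[0] == '\n')
    return 1;                   /* blank separator line */
  if (isdigit ((unsigned char) line[0]))
    return 1;                   /* date header in column 0 */
  return line[0] == '\t';       /* body lines: a tab, never spaces */
}
```

A line such as "   * config/i386/i386-c.c ..." (three leading spaces) is rejected by this check, while the same text behind a single tab is accepted.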


Re: [RFC, x86] Changes for AVX and AVX2 processors

2013-01-11 Thread Vladimir Yakovlev
I sent the patch. Sending it once more.

2013/1/11 Jakub Jelinek ja...@redhat.com:
 On Fri, Jan 11, 2013 at 03:25:47PM +0400, Vladimir Yakovlev wrote:
 I've fixed Changelog. Can we commit the patch to trunk now?

 2012-12-27  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

   * config/i386/i386-c.c (ix86_target_macros_internal): New case.
   (ix86_target_macros_internal): Likewise.

   * config/i386/i386.c (m_CORE2I7): Removed.
   (m_CORE_HASWELL): New macro.
   (m_CORE_ALL): Likewise.
   (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
   (initial_ix86_arch_features): Likewise.
   (processor_target_table): Initializations for Core avx2.
   (cpu_names): New names core-avx2.
   (ix86_option_override_internal): Changed PROCESSOR_COREI7 by
   PROCESSOR_CORE_HASWELL.
   (ix86_issue_rate): New case.
   (ia32_multipass_dfa_lookahead): Likewise.
   (ix86_sched_init_global): Likewise.

   * config/i386/i386.h (TARGET_HASWELL): New macro.
   (target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
   (processor_type): New PROCESSOR_HASWELL.

 Uros already acked the patch, so it certainly is ok to commit now.

 Jakub


patch1
Description: Binary data


Re: [RFC, x86] Changes for AVX and AVX2 processors

2013-01-11 Thread Vladimir Yakovlev
Kirill,

Could you commit the patch?

2013-01-11  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

* config/i386/i386-c.c (ix86_target_macros_internal): New case.
(ix86_target_macros_internal): Likewise.

* config/i386/i386.c (m_CORE2I7): Removed.
(m_CORE_HASWELL): New macro.
(m_CORE_ALL): Likewise.
(initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
(initial_ix86_arch_features): Likewise.
(processor_target_table): Initializations for Core avx2.
(cpu_names): New names core-avx2.
(ix86_option_override_internal): Changed PROCESSOR_COREI7 by
PROCESSOR_CORE_HASWELL.
(ix86_issue_rate): New case.
(ia32_multipass_dfa_lookahead): Likewise.
(ix86_sched_init_global): Likewise.

* config/i386/i386.h (TARGET_HASWELL): New macro.
(target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
(processor_type): New PROCESSOR_HASWELL.



2013/1/11 Uros Bizjak ubiz...@gmail.com:
 On Fri, Jan 11, 2013 at 1:14 PM, Vladimir Yakovlev vbyakov...@gmail.com 
 wrote:
 I sent the patch. Sending it once more.

 2013/1/11 Jakub Jelinek ja...@redhat.com:
 On Fri, Jan 11, 2013 at 03:25:47PM +0400, Vladimir Yakovlev wrote:
 I've fixed Changelog. Can we commit the patch to trunk now?

 2012-12-27  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

   * config/i386/i386-c.c (ix86_target_macros_internal): New case.
   (ix86_target_macros_internal): Likewise.

   * config/i386/i386.c (m_CORE2I7): Removed.
   (m_CORE_HASWELL): New macro.
   (m_CORE_ALL): Likewise.
   (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
   (initial_ix86_arch_features): Likewise.
   (processor_target_table): Initializations for Core avx2.
   (cpu_names): New names core-avx2.
   (ix86_option_override_internal): Changed PROCESSOR_COREI7 by
   PROCESSOR_CORE_HASWELL.
   (ix86_issue_rate): New case.
   (ia32_multipass_dfa_lookahead): Likewise.
   (ix86_sched_init_global): Likewise.

   * config/i386/i386.h (TARGET_HASWELL): New macro.
   (target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
   (processor_type): New PROCESSOR_HASWELL.

 Uros already acked the patch, so it certainly is ok to commit now.

 Yes, the patch is OK, you can commit it to mainline SVN. If you are
 unable to commit, please say so in the patch proposal, so someone will
 commit the patch for you (as explained in [1]).

 [1] http://gcc.gnu.org/contribute.html

 Uros.


patch1
Description: Binary data


Re: [RFC, x86] Changes for AVX and AVX2 processors

2013-01-10 Thread Vladimir Yakovlev
Hello Uros,

It seems I didn't send a patch with the last changes. Sorry if so.

Vladimir

 2012-12-27  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

 * config/i386/i386-c.c (ix86_target_macros_internal): New case.
  (ix86_target_macros_internal): Likewise.

 * config/i386/i386.c (m_CORE2I7): Removed.
 (m_CORE_HASWELL): New macro.
 (m_CORE_ALL): Likewise.
 (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
 (initial_ix86_arch_features): Likewise.
 (processor_target_table): Initializations for Core avx2.
 (cpu_names): New names core-avx2.
 (ix86_option_override_internal): Changed PROCESSOR_COREI7 by
 PROCESSOR_CORE_HASWELL.
 (ix86_issue_rate): New case.
 (ia32_multipass_dfa_lookahead): Likewise.
 (ix86_sched_init_global): Likewise.

 * config/i386/i386.h (TARGET_HASWELL): New macro.
 (target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
 (processor_type): New PROCESSOR_HASWELL.


2012/12/30 Uros Bizjak ubiz...@gmail.com:
 On Sun, Dec 30, 2012 at 5:05 PM, Vladimir Yakovlev vbyakov...@gmail.com 
 wrote:
 I fixed the typos and added a ChangeLog.

 2012-12-27  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

 * config/i386/i386-c.c (ix86_target_macros_internal): New case.
  (ix86_target_macros_internal): Likewise.

 * config/i386/i386.c (m_CORE2I7): Removed.
 (m_CORE_HASWELL): New macro.
 (m_CORE_ALL): Likewise.
 (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
 (initial_ix86_arch_features): Likewise.
 (processor_target_table): Initializations for Core avx2.
 (cpu_names): New names core-avx2.
 (ix86_option_override_internal): Changed PROCESSOR_COREI7 by
 PROCESSOR_CORE_HASWELL.
 (ix86_issue_rate): New case.
 (ia32_multipass_dfa_lookahead): Likewise.
 (ix86_sched_init_global): Likewise.
 (get_builtin_code_for_version): Likewise.

 * config/i386/i386.h (TARGET_HASWELL): New macro.
 (target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
 (processor_type): New PROCESSOR_HASWELL.

 Please remove this part, it should be part of processor dispatcher part:

 @@ -28705,6 +28712,10 @@ get_builtin_code_for_version (tree decl, tree
 *predicate_list)
   arg_str = "corei7";
   priority = P_PROC_SSE4_2;
   break;
 +   case PROCESSOR_HASWELL:
 + arg_str = "core-avx2";
 + priority = P_PROC_SSE4_2;
 + break;
 case PROCESSOR_ATOM:
   arg_str = "atom";
   priority = P_PROC_SSSE3;

 Uros.


patch1
Description: Binary data


Re: [RFC, x86] Changes for AVX and AVX2 processors

2012-12-30 Thread Vladimir Yakovlev
I fixed the typos and added a ChangeLog.

2012-12-27  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

* config/i386/i386-c.c (ix86_target_macros_internal): New case.
 (ix86_target_macros_internal): Likewise.

* config/i386/i386.c (m_CORE2I7): Removed.
(m_CORE_HASWELL): New macro.
(m_CORE_ALL): Likewise.
(initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
(initial_ix86_arch_features): Likewise.
(processor_target_table): Initializations for Core avx2.
(cpu_names): New names core-avx2.
(ix86_option_override_internal): Changed PROCESSOR_COREI7 by
PROCESSOR_CORE_HASWELL.
(ix86_issue_rate): New case.
(ia32_multipass_dfa_lookahead): Likewise.
(ix86_sched_init_global): Likewise.
(get_builtin_code_for_version): Likewise.

* config/i386/i386.h (TARGET_HASWELL): New macro.
(target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
(processor_type): New PROCESSOR_HASWELL.


2012/12/30 Uros Bizjak ubiz...@gmail.com:
 On Sat, Dec 29, 2012 at 5:57 PM, Vladimir Yakovlev vbyakov...@gmail.com 
 wrote:
 I made the changes. Please take a look.

 2012/12/29, Uros Bizjak ubiz...@gmail.com:
 On Sat, Dec 29, 2012 at 6:26 AM, Vladimir Yakovlev vbyakov...@gmail.com
 wrote:

 processor_alias_table contains the same processor type for all
 corei7, corei7-avx, core-avx-i and core-avx2. At least, it has
 consequence on checking x86_avx256_split_unaligned_load &
 ix86_tune_mask: for all these processors it results the same. Moreover
 we cannot turn new features on for AVX/AVX2 using
 initial_ix86_tune_features.

 corei7, corei7-avx and core-avx-i are all based on sandybridge (=
 PROCESSOR_COREI7) architecture. The only problematic entry is
 core-avx2, which should be based on new architecture. I propose
 PROCESSOR_HASWELL, in the same way as we have PROCESSOR_NOCONA.

 @@ -2467,6 +2470,7 @@
    "nocona",
    "core2",
    "corei7",
 +  "coreavx2",
    "atom",
    "geode",
    "k6",

 This string should match processor_alias_table name, so core-avx2.

 @@ -28709,6 +28716,10 @@
   arg_str = "corei7";
   priority = P_PROC_SSE4_2;
   break;
 +   case PROCESSOR_HASWELL:
 + arg_str = "core_avx2";
 + priority = P_PROC_SSE4_2;
 + break;
 case PROCESSOR_ATOM:
   arg_str = "atom";
   priority = P_PROC_SSSE3;

 This is part of a processor dispatcher functionality. To support this
 functionality, some more changes are needed, so it is IMO best to
 leave this part out for now. I would also like the author of processor
 dispatcher to review changes in this area.

 On a related note, it looks to me that corei7 should declare
 P_PROC_AVX here (this change should be part of another patch).

 Other than that, the patch looks OK, but please repost the final version
 with a correct ChangeLog.

 Uros.
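
Uros's remark that the cpu_names string must match the processor_alias_table name can be illustrated with a small lookup. This is a sketch under stated assumptions: the table is a cut-down stand-in with invented enum names (PROC_COREI7 and so on), and lookup_alias is a hypothetical function, not GCC's actual lookup code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for GCC's tables; the real ones have more fields.  */
enum processor { PROC_COREI7, PROC_HASWELL, PROC_ATOM };

struct alias { const char *name; enum processor proc; };

static const struct alias alias_table[] = {
  { "corei7",     PROC_COREI7 },
  { "corei7-avx", PROC_COREI7 },
  { "core-avx-i", PROC_COREI7 },
  { "core-avx2",  PROC_HASWELL },
  { "atom",       PROC_ATOM },
};

/* Return the index of NAME in alias_table, or -1 if it is not listed.  */
int
lookup_alias (const char *name)
{
  size_t i;
  assert (name != NULL);
  for (i = 0; i < sizeof alias_table / sizeof alias_table[0]; i++)
    if (strcmp (alias_table[i].name, name) == 0)
      return (int) i;
  return -1;
}
```

With this table, lookup_alias ("coreavx2") returns -1 because no entry has that spelling, whereas lookup_alias ("core-avx2") finds the Haswell entry. A default CPU name spelled differently from the alias table could fail in the same way.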


patch
Description: Binary data


[RFC, x86] Changes for AVX and AVX2 processors

2012-12-29 Thread Vladimir Yakovlev
I made the changes. Please take a look.

2012/12/29, Uros Bizjak ubiz...@gmail.com:
 On Sat, Dec 29, 2012 at 6:26 AM, Vladimir Yakovlev vbyakov...@gmail.com
 wrote:

 processor_alias_table contains the same processor type for all
 corei7, corei7-avx, core-avx-i and core-avx2. At least, it has
 consequence on checking x86_avx256_split_unaligned_load &
 ix86_tune_mask: for all these processors it results the same. Moreover
 we cannot turn new features on for AVX/AVX2 using
 initial_ix86_tune_features.

 corei7, corei7-avx and core-avx-i are all based on sandybridge (=
 PROCESSOR_COREI7) architecture. The only problematic entry is
 core-avx2, which should be based on new architecture. I propose
 PROCESSOR_HASWELL, in the same way as we have PROCESSOR_NOCONA.

 Uros.

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 08e1afe..2d8abd5 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -142,11 +142,7 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__corei7");
       def_or_undef (parse_in, "__corei7__");
       break;
-    case PROCESSOR_CORE_AVX:
-      def_or_undef (parse_in, "__core_avx");
-      def_or_undef (parse_in, "__core_avx__");
-      break;
-    case PROCESSOR_CORE_AVX2:
+    case PROCESSOR_HASWELL:
       def_or_undef (parse_in, "__core_avx2");
       def_or_undef (parse_in, "__core_avx2__");
       break;
@@ -240,10 +236,7 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     case PROCESSOR_COREI7:
       def_or_undef (parse_in, "__tune_corei7__");
       break;
-    case PROCESSOR_CORE_AVX:
-      def_or_undef (parse_in, "__tune_core_avx__");
-      break;
-    case PROCESSOR_CORE_AVX2:
+    case PROCESSOR_HASWELL:
       def_or_undef (parse_in, "__tune_core_avx2__");
       break;
     case PROCESSOR_ATOM:
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 10411da..4adbef6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1732,9 +1732,8 @@ const struct processor_costs *ix86_cost = pentium_cost;
 #define m_P4_NOCONA (m_PENT4 | m_NOCONA)
 #define m_CORE2 (1<<PROCESSOR_CORE2)
 #define m_COREI7 (1<<PROCESSOR_COREI7)
-#define m_CORE_AVX (1<<PROCESSOR_CORE_AVX)
-#define m_CORE_AVX2 (1<<PROCESSOR_CORE_AVX2)
-#define m_CORE_ALL (m_CORE2 | m_COREI7 | m_CORE_AVX | m_CORE_AVX2)
+#define m_HASWELL (1<<PROCESSOR_HASWELL)
+#define m_CORE_ALL (m_CORE2 | m_COREI7  | m_HASWELL)
 #define m_ATOM (1<<PROCESSOR_ATOM)
 
 #define m_GEODE (1<<PROCESSOR_GEODE)
@@ -2438,8 +2437,6 @@ static const struct ptt processor_target_table[PROCESSOR_max] =
   {&core_cost, 16, 10, 16, 10, 16},
   /* Core i7  */
   {&core_cost, 16, 10, 16, 10, 16},
-  /* Core avx  */
-  {&core_cost, 16, 10, 16, 10, 16},
   /* Core avx2  */
   {&core_cost, 16, 10, 16, 10, 16},
   {&generic32_cost, 16, 7, 16, 7, 16},
@@ -2469,7 +2466,6 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
   "nocona",
   "core2",
   "corei7",
-  "coreavx",
   "coreavx2",
   "atom",
   "geode",
@@ -2912,17 +2908,17 @@ ix86_option_override_internal (bool main_args_p)
   {"corei7", PROCESSOR_COREI7, CPU_COREI7,
     PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
     | PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_CX16 | PTA_FXSR},
-  {"corei7-avx", PROCESSOR_CORE_AVX, CPU_COREI7,
+  {"corei7-avx", PROCESSOR_COREI7, CPU_COREI7,
     PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
     | PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
     | PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL
     | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
-  {"core-avx-i", PROCESSOR_CORE_AVX, CPU_COREI7,
+  {"core-avx-i", PROCESSOR_COREI7, CPU_COREI7,
     PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
     | PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
     | PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
     | PTA_RDRND | PTA_F16C | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
-  {"core-avx2", PROCESSOR_CORE_AVX2, CPU_COREI7,
+  {"core-avx2", PROCESSOR_HASWELL, CPU_COREI7,
     PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
     | PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_AVX2
     | PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
@@ -24069,8 +24065,7 @@ ix86_issue_rate (void)
 case PROCESSOR_PENTIUM4:
 case PROCESSOR_CORE2:
 case PROCESSOR_COREI7:
-case PROCESSOR_CORE_AVX:
-case PROCESSOR_CORE_AVX2:
+case PROCESSOR_HASWELL:
 case PROCESSOR_ATHLON:
 case PROCESSOR_K8:
 case PROCESSOR_AMDFAM10:
@@ -24327,8 +24322,7 @@ ia32_multipass_dfa_lookahead (void)
 
 case PROCESSOR_CORE2:
 case PROCESSOR_COREI7:
-case PROCESSOR_CORE_AVX:
-case PROCESSOR_CORE_AVX2:
+case PROCESSOR_HASWELL:
 case PROCESSOR_ATOM:
   /* Generally, we want haifa-sched:max_issue() to look ahead as far
 as many instructions can be executed on a cycle, i.e.,
@@ -24873,8 +24867,7 @@ ix86_sched_init_global (FILE *dump ATTRIBUTE_UNUSED,
 {
 case PROCESSOR_CORE2:
 case

Re: [RFC, x86] Changes for AVX and AVX2 processors

2012-12-28 Thread Vladimir Yakovlev
Hello,

processor_alias_table contains the same processor type for all of
corei7, corei7-avx, core-avx-i and core-avx2. At the least, this affects
the check of x86_avx256_split_unaligned_load &
ix86_tune_mask: it gives the same result for all of these processors.
Moreover, we cannot turn on new features for AVX/AVX2 using
initial_ix86_tune_features.

2012/12/28 Uros Bizjak ubiz...@gmail.com:
 Hello!

 New processors core-avx and core-avx2 are added. It was done to have
 possibilities to turn new features on for these processors. Please review.

 I don't think this is a good approach, you are mixing an architecture
 with an ISA extension in the name. We already have
 processor_alias_table, where processor architecture and features
 (extensions) can be activated, depending on the name.

 Uros.
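
The point about the tune mask can be made concrete with a small sketch. It assumes, as the m_* macros in i386.c suggest, that each tuning entry is a bit mask with one bit per processor type. The enum values, the split_unaligned_load entry and tune_feature_enabled are invented for illustration and are not GCC's real definitions.

```c
#include <assert.h>

enum processor { PROC_CORE2, PROC_COREI7, PROC_HASWELL, PROC_max };

#define m_CORE2   (1u << PROC_CORE2)
#define m_COREI7  (1u << PROC_COREI7)
#define m_HASWELL (1u << PROC_HASWELL)

/* Hypothetical tuning entry: enabled for Core 2 and Core i7 only.  */
static const unsigned int split_unaligned_load = m_CORE2 | m_COREI7;

/* Non-zero if the feature described by FEATURE_MASK is on for TUNE.  */
int
tune_feature_enabled (unsigned int feature_mask, enum processor tune)
{
  assert (tune < PROC_max);
  return (feature_mask & (1u << tune)) != 0;
}
```

Because the test looks only at the processor type, every -march name that maps to the same type gets the same answer. A name can be tuned differently only if it has a processor type, and therefore a mask bit, of its own.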


Fwd: [off-list] Re: [PATCH] Vzeroupper placement/47440

2012-11-09 Thread Vladimir Yakovlev
-- Forwarded message --
From: Vladimir Yakovlev vbyakov...@gmail.com
Date: 2012/11/9
Subject: Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
To: Uros Bizjak ubiz...@gmail.com
Cc: H.J. Lu hjl.to...@gmail.com, Igor Zamyatin izamya...@gmail.com


I made changes that move vzeroupper insertion to after reload.

2012-11-09  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

* i386/i386-protos.h (ix86_avx256_optimize_mode_switching): New.
* config/i386/i386.c (ix86_init_machine_status): Deleted
initialization for mode switching.
* i386/i386.h (OPTIMIZE_MODE_SWITCHING1): New.
* mode-switching.c (gate_mode_switching1): New.
(rest_of_handle_mode_switching1): New.
(pass_mode_switching1): New.
* passes.c (init_optimization_passes): New pass pass_mode_switching1.
* tree-pass.h (pass_mode_switching1): New.

But this caused assertion failures in rtl_verify_flow_info_1 () at cfgrtl.c:2291

  fatal_insn ("flow control insn inside a basic block", x);

The assertions are triggered from two calls in mode-switching.c:
commit_edge_insertions and cleanup_cfg. After I commented out the checks
(see below), the 459.GemsFDTD benchmark passed. What is your opinion of the
patch, and how can we deal with the assertions?

Regards,
Vladimir

--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -747,7 +747,7 @@ optimize_mode_switching (void)
 commit_edge_insertions ();

 #if defined (MODE_ENTRY) && defined (MODE_EXIT)
-  cleanup_cfg (CLEANUP_NO_INSN_DEL);
+  /*cleanup_cfg (CLEANUP_NO_INSN_DEL);*/
 #else
   if (!need_commit && !emitted)
 return 0;
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -1828,7 +1828,7 @@ commit_edge_insertions (void)
   basic_block bb;

 #ifdef ENABLE_CHECKING
-  verify_flow_info ();
+  /*verify_flow_info ();*/
 #endif


2012/11/9 Uros Bizjak ubiz...@gmail.com:
 On Thu, Nov 8, 2012 at 6:52 PM, Uros Bizjak ubiz...@gmail.com wrote:

 Uh, this is spill around call insn, produced by reload.

 Please compile this code:

 double test (double a)
 {
   printf ("Hello\n");
   return a;
 }

 You will get at mode switching:

 1 NOTE_INSN_DELETED
 4 NOTE_INSN_BASIC_BLOCK
 2 r60:DF=xmm0:DF
   REG_DEAD: xmm0:DF
 3 NOTE_INSN_FUNCTION_BEG
 6 di:DI=`*.LC0'
 7 call ...
   REG_DEAD: di:DI
   REG_UNUSED: ax:SI
12 xmm0:DF=r60:DF
   REG_DEAD: r60:DF
15 use xmm0:DF

 But reload will insert:

 1 NOTE_INSN_DELETED
 4 NOTE_INSN_BASIC_BLOCK
 2 xmm0:DF=xmm0:DF
   REG_DEAD: xmm0:DF
18 [sp:DI+0x8]=xmm0:DF
   REG_DEAD: xmm0:DF
 3 NOTE_INSN_FUNCTION_BEG
 6 di:DI=`*.LC0'
 7 call ...
   REG_DEAD: di:DI
   REG_UNUSED: ax:SI
19 xmm0:DF=[sp:DI+0x8]
   REG_DEAD: r62:DF
12 xmm0:DF=xmm0:DF
   REG_DEAD: xmm0:DF
15 use xmm0:DF

 I was not paying attention to this situation.


 A viable solution to this issue is through machine-reorg function (AKA
 x86_reorg) that would just move vzeroupper to the close proximity to a
 call insn. This would work on non-64bit-MS-ABI targets, where all SSE
 registers are dead at call insn place.

 Please note that 64bit-MS-ABI target declares registers xmm6+ as
 call-saved, so they can live over the call. I am not familiar with
 this target, but it looks to me that we have to remove vzeroupper, if
 one or more call-saved SSE registers are live at the call insn place.

 Uros.
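
The rule Uros outlines for the 64-bit MS ABI can be written as a small predicate. This is only a sketch of the rule as stated in the message, not code from x86_reorg: keep_vzeroupper_at_call, the bit-mask representation of live registers and the ms_abi flag are all assumptions made for illustration.

```c
#include <assert.h>

/* Bit N of a mask stands for register xmmN (0..15).  */
#define XMM_BIT(n) (1u << (n))

/* xmm6..xmm15, which the 64-bit MS ABI treats as call-saved.  */
#define MS_ABI_CALL_SAVED_SSE 0xffc0u

/* Hypothetical predicate: may a vzeroupper be kept next to a call?
   LIVE_ACROSS_CALL has a bit set for each SSE register live over the
   call; MS_ABI is non-zero for the 64-bit MS ABI.  */
int
keep_vzeroupper_at_call (unsigned int live_across_call, int ms_abi)
{
  assert ((live_across_call & ~0xffffu) == 0);
  if (!ms_abi)
    return 1;   /* per the message, all SSE registers are dead at the call */
  return (live_across_call & MS_ABI_CALL_SAVED_SSE) == 0;
}
```

Under these assumptions a live xmm6 on an MS-ABI target makes the predicate return 0, so the vzeroupper would be removed, while a live xmm3 does not matter because xmm3 is not call-saved.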


prvzu.patch
Description: Binary data


Re: [off-list] Re: [PATCH] Vzeroupper placement/47440

2012-11-09 Thread Vladimir Yakovlev
 These assert should tell you what is wrong with the control flow.
 Please look at control_flow_insn_p, which condition returns true.

There is a note after call insn.

(call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI
(_gfortran_stop_string) [flags 0x41] function_decl 0x77eb6200
_gfortran_stop_string) [0 _gfortran_stop_string S1 A8])
(const_int 0 [0])) huygens.fppized.f90:190 616 {*call}
 (expr_list:REG_DEAD (reg:DI 5 di)
(expr_list:REG_DEAD (reg:SI 4 si)
(expr_list:REG_NORETURN (const_int 0 [0])
(nil
(expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
(expr_list:REG_BR_PRED (use (reg:SI 4 si))
(nil
(note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
(const_int 0 [0]))
(expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si)
(const_int 0 [0]))
(nil))) NOTE_INSN_CALL_ARG_LOCATION)

 You shouldn't disable commit_edge_insertions, as there is the function
 where vzerouppers are emitted.

I didn't disable commit_edge_insertions. I only removed the call to the assertion.

2012/11/9 Uros Bizjak ubiz...@gmail.com:
 On Fri, Nov 9, 2012 at 11:45 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Fri, Nov 9, 2012 at 11:21 AM, Vladimir Yakovlev vbyakov...@gmail.com 
 wrote:
 I made changes that move vzeroupper insertion to after reload.

 2012-11-09  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

 * i386/i386-protos.h (ix86_avx256_optimize_mode_switching): New.
 * config/i386/i386.c (ix86_init_machine_status): Deleted
 initialization for mode switching.
 * i386/i386.h (OPTIMIZE_MODE_SWITCHING1): New.
 * mode-switching.c (gate_mode_switching1): New.
 (rest_of_handle_mode_switching1): New.
 (pass_mode_switching1): New.
 * passes.c (init_optimization_passes): New pass 
 pass_mode_switching1.
 * tree-pass.h (pass_mode_switching1): New.

 But this caused assertion failures in rtl_verify_flow_info_1 () at
 cfgrtl.c:2291

   fatal_insn ("flow control insn inside a basic block", x);

 The assertions are triggered from two calls in mode-switching.c:
 commit_edge_insertions and cleanup_cfg. After I commented out the checks
 (see below), the 459.GemsFDTD benchmark passed. What is your opinion of the
 patch, and how can we deal with the assertions?

 These assert should tell you what is wrong with the control flow.
 Please look at control_flow_insn_p, which condition returns true. You
 shouldn't disable commit_edge_insertions, as there is the function
 where vzerouppers are emitted.

 Uros.


Re: [PATCH] Vzeroupper placement/47440

2012-11-07 Thread Vladimir Yakovlev
Hello,

Thank you for investigating and fixing the problem. I'll respond to the remarks later.

Regards,
Vladimir

2012/11/7 Jakub Jelinek ja...@redhat.com:
 On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote:
 On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
  Hello,
  OK for mainline SVN, please commit.
  Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html
 
  Thanks, K

 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224

 Not only that, it also broke --enable-checking=yes,rtl bootstrap.
 SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and
 the code looks for reg, so I think looking for CALL was just a mistake.

 This fixes the bootstrap, ok for trunk?

 2012-11-06  Jakub Jelinek  ja...@redhat.com

 * config/i386/i386.c (ix86_avx_u128_mode_after): Don't
 look for reg in CALL operand.

 --- gcc/config/i386/i386.c.jj   2012-11-06 18:10:22.0 +0100
 +++ gcc/config/i386/i386.c  2012-11-06 20:15:09.068912242 +0100
 @@ -15084,9 +15084,9 @@ ix86_avx_u128_mode_after (int mode, rtx
/* Check for CALL instruction.  */
if (CALL_P (insn))
  {
 -  if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL)
 +  if (GET_CODE (pat) == SET)
 reg = SET_DEST (pat);
 -  else if (GET_CODE (pat) ==  PARALLEL)
 +  else if (GET_CODE (pat) == PARALLEL)
 for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
   {
 rtx x = XVECEXP (pat, 0, i);


 Jakub
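
Jakub's observation, that only a SET has a destination and a bare CALL does not, can be shown on a toy expression tree. The node layout below is a deliberately simplified assumption and is not GCC's rtx representation; call_dest_reg is a hypothetical function written for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* A toy expression node; GCC's real rtx is far richer.  */
enum code { SET, CALL, MEM, REG, PARALLEL };

struct node
{
  enum code code;
  struct node *op[2];   /* SET: dest, src.  CALL: mem, args.  PARALLEL: elts.  */
};

/* Return the destination REG written by a call pattern PAT, or NULL.
   Only SET has a destination; a bare CALL (no return value) has none.  */
struct node *
call_dest_reg (struct node *pat)
{
  int i;
  assert (pat != NULL);
  if (pat->code == SET)
    return pat->op[0]->code == REG ? pat->op[0] : NULL;
  if (pat->code == PARALLEL)
    for (i = 1; i >= 0; i--)
      if (pat->op[i] != NULL && pat->op[i]->code == SET
          && pat->op[i]->op[0]->code == REG)
        return pat->op[i]->op[0];
  return NULL;  /* includes CALL: operand 0 is a MEM, not a register */
}
```

In this model, asking a CALL node for its destination simply yields NULL. In the patched GCC code the equivalent effect comes from no longer applying SET_DEST to a CALL pattern.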


Re: [PATCH] Vzeroupper placement/47440

2012-11-07 Thread Vladimir Yakovlev
I tested the changes with the following configure command:

../gcc/configure --enable-clocale=gnu --with-system-zlib
--enable-shared --with-demangler-in-ld --with-fpmath=sse
--enable-languages=c,c++,fortran,java,lto,objc --with-arch=corei7-avx
--with-cpu=corei7-avx

Bootstrap passed and there are no new failures in make check.

Thank you,
Vladimir


2012/11/7 Vladimir Yakovlev vbyakov...@gmail.com:
 Hello,

 Thank you for investigating and fixing the problem. I'll respond to the
 remarks later.

 Regards,
 Vladimir

 2012/11/7 Jakub Jelinek ja...@redhat.com:
 On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote:
 On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
  Hello,
  OK for mainline SVN, please commit.
  Checked into GCC trunk: 
  http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html
 
  Thanks, K

 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224

 Not only that, it also broke --enable-checking=yes,rtl bootstrap.
 SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and
 the code looks for reg, so I think looking for CALL was just a mistake.

 This fixes the bootstrap, ok for trunk?

 2012-11-06  Jakub Jelinek  ja...@redhat.com

 * config/i386/i386.c (ix86_avx_u128_mode_after): Don't
 look for reg in CALL operand.

 --- gcc/config/i386/i386.c.jj   2012-11-06 18:10:22.0 +0100
 +++ gcc/config/i386/i386.c  2012-11-06 20:15:09.068912242 +0100
 @@ -15084,9 +15084,9 @@ ix86_avx_u128_mode_after (int mode, rtx
/* Check for CALL instruction.  */
if (CALL_P (insn))
  {
 -  if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL)
 +  if (GET_CODE (pat) == SET)
 reg = SET_DEST (pat);
 -  else if (GET_CODE (pat) ==  PARALLEL)
 +  else if (GET_CODE (pat) == PARALLEL)
 for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
   {
 rtx x = XVECEXP (pat, 0, i);


 Jakub


Re: [PATCH, middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return

2012-11-05 Thread Vladimir Yakovlev
Hello Kaz,

I've updated the copyright. Is it OK?

Thanks,
Vladimir

--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -1,6 +1,6 @@
 /* CPU mode switching
Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008,
-   2009, 2010 Free Software Foundation, Inc.
+   2009, 2010, 2011, 2012 Free Software Foundation, Inc.

 This file is part of GCC.


2012/11/5 Kaz Kojima kkoj...@rr.iij4u.or.jp:
 Uros Bizjak ubiz...@gmail.com wrote:
 2012-11-04  Vladimir Yakovlev  vladimir.b.yakov...@intel.com
   Uros Bizjak  ubiz...@gmail.com

 * mode-switching.c (create_pre_exit): Added code for
 maybe_builtin_apply case.

 Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
 with vzeroupper patch [1] applied.

 I have added SH4 maintainer for possible comments.

 I've confirmed that there are no new failures with the patch
 on sh4-unknown-linux-gnu.
 BTW, it looks that the copyright year of mode-switching.c
 should be updated.


 Regards,
 kaz


Re: [PATCH] Changes in mode switching

2012-10-02 Thread Vladimir Yakovlev
2012/9/30 Uros Bizjak ubiz...@gmail.com:
 On Thu, Sep 20, 2012 at 8:35 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Thu, Sep 20, 2012 at 8:06 AM, Vladimir Yakovlev vbyakov...@gmail.com 
 wrote:
 The compiler with the patch and without post_reload.patch builds and works
 successfully. The only failure is the avx-vzeroupper-3 test, because of
 the post-reload problem.

 OK, can you please elaborate a bit on this failure? Perhaps someone has
 an idea why reload moves unspec_volatile around?

 LRA will eventually replace reload in the near future [1], does LRA
 also move unspec_volatile vzeroupper around?

 [1] http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01862.html

 Uros.

I tried my patch with LRA. It works fine. The test avx-vzeroupper-3
runs successfully; unspec_volatile vzeroupper is not moved around in
LRA.

Vladimir


Re: [PATCH] Changes in mode switching

2012-10-02 Thread Vladimir Yakovlev
Should we wait for the LRA commit, or is it possible to commit the
vzeroupper patch to trunk now?

2012/10/2 Uros Bizjak ubiz...@gmail.com:
 On Tue, Oct 2, 2012 at 11:35 AM, Vladimir Yakovlev vbyakov...@gmail.com 
 wrote:
 The compiler with the patch and without post_reload.patch builds and works
 successfully. The only failure is the avx-vzeroupper-3 test, because of
 the post-reload problem.

 OK, can you please elaborate a bit on this failure? Perhaps someone has
 an idea why reload moves unspec_volatile around?

 LRA will eventually replace reload in the near future [1], does LRA
 also move unspec_volatile vzeroupper around?

 [1] http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01862.html

 I tried my patch with LRA. It works fine. The test avx-vzeroupper-3
 runs successfully; unspec_volatile vzeroupper is not moved around in
 LRA.

 Great!

 This also means +1 to include LRA in 4.8 from x86 maintainer. We also
 expect spill failure fixes and other improvements for pre-reload
 scheduling from LRA.

 Uros.


Re: [PATCH] Changes in mode switching

2012-09-18 Thread Vladimir Yakovlev
Hi Richard,

You are right, I don't need the changes in mode-switching.c at all. After I
removed the additional argument from EMIT_MODE_SET and ran 'make check', I
found no differences from the make check results of the previous run. So I
don't need any changes in the middle-end part.

Regards,
Vladimir

P.S. I'll be on vacation until the end of the month.

 Vladimir Yakovlev vbyakov...@gmail.com writes:
 I reproduced the failure and found the reason for it. I understand how to
 resolve it, and now I need only a small change: an additional argument to
 EMIT_MODE_SET. Is it good for trunk?

 I'm not sure I understand why you need to know the instruction.
 The x86 code was:

 +  if (mode == AVX_U128_CLEAN)
 +   {
 + if (insn)
 +   {
 + rtx pat = PATTERN(insn);
 + if (!is_vzeroupper(pat) && !is_vzeroall(pat))
 +   ix86_emit_vzeroupper ();
 +   }
 + else
 +   ix86_emit_vzeroupper ();
 +   }
 +  break;

 But the pass should already know via MODE_AFTER that the mode is set to
 AVX_U128_CLEAN by vzeroupper and vzeroall.  Under what circumstances
 do we think that we need to set the mode to AVX_U128_CLEAN immediately
 before vzeroupper or vzeroall?

 I'm probably making you repeat yourself here, sorry.

 Richard


2012/9/16 Richard Sandiford rdsandif...@googlemail.com:
 Vladimir Yakovlev vbyakov...@gmail.com writes:
 I reproduced the failure and found the reason for it. I understand how to
 resolve it, and now I need only a small change: an additional argument to
 EMIT_MODE_SET. Is it good for trunk?

 I'm not sure I understand why you need to know the instruction.
 The x86 code was:

 +  if (mode == AVX_U128_CLEAN)
 +   {
 + if (insn)
 +   {
 + rtx pat = PATTERN(insn);
 + if (!is_vzeroupper(pat) && !is_vzeroall(pat))
 +   ix86_emit_vzeroupper ();
 +   }
 + else
 +   ix86_emit_vzeroupper ();
 +   }
 +  break;

 But the pass should already know via MODE_AFTER that the mode is set to
 AVX_U128_CLEAN by vzeroupper and vzeroall.  Under what circumstances
 do we think that we need to set the mode to AVX_U128_CLEAN immediately
 before vzeroupper or vzeroall?

 I'm probably making you repeat yourself here, sorry.

 Richard
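
Richard's argument, that the pass already learns from MODE_AFTER that vzeroupper and vzeroall leave the state clean, can be modelled with a toy forward scan. This is a simplified sketch and not the algorithm in mode-switching.c: the instruction kinds, mode_needed, mode_after and count_emitted_vzeroupper are all hypothetical, and it assumes a call is the only instruction that requires the clean state.

```c
#include <assert.h>
#include <stddef.h>

enum mode { ANY, CLEAN, DIRTY };
enum kind { PLAIN_256BIT, VZEROUPPER, VZEROALL, CALL_INSN };

/* Mode an instruction requires on entry (ANY = no requirement).  */
static enum mode
mode_needed (enum kind k)
{
  return k == CALL_INSN ? CLEAN : ANY;
}

/* Mode in effect after the instruction, given the mode before it.  */
static enum mode
mode_after (enum kind k, enum mode before)
{
  switch (k)
    {
    case VZEROUPPER:
    case VZEROALL:
      return CLEAN;
    case PLAIN_256BIT:
      return DIRTY;
    default:
      return before;
    }
}

/* Count the vzeroupper insns a simple forward scan would have to emit.  */
int
count_emitted_vzeroupper (const enum kind *insns, size_t n, enum mode entry)
{
  enum mode cur = entry;
  int emitted = 0;
  size_t i;
  assert (insns != NULL || n == 0);
  for (i = 0; i < n; i++)
    {
      enum mode need = mode_needed (insns[i]);
      if (need == CLEAN && cur != CLEAN)
        {
          emitted++;            /* a vzeroupper goes in before insns[i] */
          cur = CLEAN;
        }
      cur = mode_after (insns[i], cur);
    }
  return emitted;
}
```

In this model a 256-bit instruction followed by a call costs one emitted vzeroupper. If the source already contains a vzeroupper between the two, nothing is emitted, and the emit hook never needs to inspect the instruction it is emitting before.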


Re: [PATCH] Changes in mode switching

2012-09-18 Thread Vladimir Yakovlev
I tried to perform vzeroupper emission after reload as an additional pass
of mode switching.
I saw one problem that I don't know how to overcome. After
'pro_and_epilogue', there can be no
flow edge to the exit block, and the pre_exit block is not created in this
case (see routine create_pre_exit).
Without that I cannot properly perform vzeroupper insertion at routine exit.

Regards,
Vladimir

2012/9/18 Uros Bizjak ubiz...@gmail.com:
 Hello!

 You are right, I don't need the changes in mode-switching.c at all. After I
 removed the additional argument from EMIT_MODE_SET and ran 'make check', I
 found no differences from the make check results of the previous run. So I
 don't need any changes in the middle-end part.

 Vladimir, can you please investigate, how to emit vzeroupper insns
 after reload? Vzeroupper emits hard registers, and reload moves the
 insn around even when declared with unspec_volatile.

 Uros.


Re: [PATCH] Changes in mode switching

2012-09-17 Thread Vladimir Yakovlev
 Looks OK to me, though I have no authority to approve it
 except SH specific part.

Are there any more comments? Can it be committed to trunk?

Regards,
Vladimir

2012/9/14 Kaz Kojima kkoj...@rr.iij4u.or.jp:
 Vladimir Yakovlev vbyakov...@gmail.com wrote:
 I reproduced the failure and found the reason for it. I understand how to
 resolve it, and now I need only a small change: an additional argument to
 EMIT_MODE_SET. Is it good for trunk?

 Thank you,
 Vladimir

 2012-09-14  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

 * (optimize_mode_switching): Added an argument EMIT_MODE_SET calls.

 * config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument.

 * config/i386/i386.h (EMIT_MODE_SET): Added an argument.

 * config/sh/sh.h (EMIT_MODE_SET): Added an argument.

 No new failures on sh4-unknown-linux-gnu with your patch.
 Looks OK to me, though I have no authority to approve it
 except SH specific part.

 BTW, I guess that the active voice is usual in gcc/ChangeLog.
 Also, perhaps mailer issue, a tab should be used for indentation
 instead of 8 spaces and the empty line isn't required between
 items.  Maybe something like

 * mode-switching.c (optimize_mode_switching): Add an argument
 EMIT_MODE_SET calls.
 * config/epiphany/epiphany.h (EMIT_MODE_SET): Add an argument.
 * config/i386/i386.h (EMIT_MODE_SET): Likewise.
 * config/sh/sh.h (EMIT_MODE_SET): Likewise.

 is a usual form.

 Regards,
 kaz


Re: [PATCH] Changes in mode switching

2012-09-14 Thread Vladimir Yakovlev
Hello,

I reproduced the failure and found the reason for it. I understand how to
resolve it, and now I need only a small change: an additional argument to
EMIT_MODE_SET. Is it good for trunk?

Thank you,
Vladimir

2012-09-14  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

* (optimize_mode_switching): Added an argument EMIT_MODE_SET calls.

* config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument.

* config/i386/i386.h (EMIT_MODE_SET): Added an argument.

* config/sh/sh.h (EMIT_MODE_SET): Added an argument.


2012/8/29 Vladimir Yakovlev vbyakov...@gmail.com:
 I built using last configure.

 Thank you,
 Vladimir

 2012/8/29 Kaz Kojima kkoj...@rr.iij4u.or.jp:
 I tried

 ../gcc/configure --host=i686-pc-linux-gnu
 --target=sh4-unknown-linux-gnu --enable-build-with-cxx --enable-lto
 --enable-shared --enable-threads=posix --enable-clocale=gnu
 --enable-libitm --enable-libgcj
 --with-ld=/usr/local/bin/sh4-unknown-linux-gnu-ld
 --with-as=/usr/local/bin/sh4-unknown-linux-gnu-as
 --with-sysroot=/exp/ldroot --with-mpfr=/opt2/i686-pc-linux-gnu
 --with-mpc=/opt2/i686-pc-linux-gnu
 --with-libelf=/opt2/i686-pc-linux-gnu --with-ppl=no
 --enable-languages=c,c++,fortran,java,lto,objc
 --prefix=/export/users/mstester/stability/work/trunk/64/install_sh4

 and got a build error. make.log is attached. Could you take a look?

 make.log says

 make[2]: i686-pc-linux-gnu-ar: Command not found

 It looks like your build system is x86_64-unknown-linux-gnu.
 Perhaps with specifying --host=x86_64-unknown-linux-gnu instead
 of --host=i686-pc-linux-gnu in your configuration, that error
 could be resolved, though

 --with-ld=/usr/local/bin/sh4-unknown-linux-gnu-ld
 --with-as=/usr/local/bin/sh4-unknown-linux-gnu-as
 --with-sysroot=/exp/ldroot --with-mpfr=/opt2/i686-pc-linux-gnu
 --with-mpc=/opt2/i686-pc-linux-gnu
 --with-libelf=/opt2/i686-pc-linux-gnu

 are strongly specific to my environment.  Maybe

   ../gcc/configure --host=x86_64-unknown-linux-gnu 
 --target=sh4-unknown-linux-gnu --enable-languages=c

 and

   make all-gcc

 is enough to get cc1 for sh4-unknown-linux-gnu.

 Best Regards,
 kaz


middle.patch
Description: Binary data


Re: [PATCH] Changes in mode switching

2012-09-14 Thread Vladimir Yakovlev
Additionally, you can find the patch history at
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01590.html.
I need these changes for my implementation of vzeroupper placement:
for some statements I don't need to do a real insertion.
I tested the changes with a bootstrap using the configuration
../gcc/configure
--prefix=/export/users/vbyakovl/workspaces/vzu/install-middle
--enable-languages=c,c++,fortran



Re: [PATCH] Changes in mode switching

2012-08-24 Thread Vladimir Yakovlev
Thank you for testing.

 With commenting out if (i != mode) of the hunk

I changed the type of transp and added this check because if we reset
transp[mode], then later in the loop

  FOR_EACH_BB (bb)
    sbitmap_not (kill[bb->index], transp[i][bb->index]);

we set kill of the bb for that mode and thereby force insertion of a mode
switch for that mode in succeeding blocks in any case.
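
The reasoning above can be sketched with a small stand-alone model. This is only an illustration, not the mode-switching.c code: plain bool matrices stand in for GCC's sbitmap vectors, and N_MODES, N_BLOCKS, kill_set and the three function names are all hypothetical.

```c
#include <stdbool.h>

#define N_MODES  3   /* hypothetical number of modes for one entity */
#define N_BLOCKS 4   /* hypothetical number of basic blocks */

/* transp[m][b] is true when block b leaves mode m untouched.  */
static bool transp[N_MODES][N_BLOCKS];
/* kill_set[m][b] is true when block b kills mode m.  */
static bool kill_set[N_MODES][N_BLOCKS];

/* Start with every block transparent for every mode.  */
static void
init_transp (void)
{
  for (int m = 0; m < N_MODES; m++)
    for (int b = 0; b < N_BLOCKS; b++)
      transp[m][b] = true;
}

/* Block BB sets MODE: clear transparency for the other modes only,
   mirroring the "if (i != mode)" check under discussion.  */
static void
note_mode_set (int bb, int mode)
{
  for (int m = 0; m < N_MODES; m++)
    if (m != mode)
      transp[m][bb] = false;
}

/* kill is the per-mode complement of transp, as in the loop above.  */
static void
compute_kill (void)
{
  for (int m = 0; m < N_MODES; m++)
    for (int b = 0; b < N_BLOCKS; b++)
      kill_set[m][b] = !transp[m][b];
}
```

In this model a block that sets mode 1 kills modes 0 and 2 but not mode 1 itself. Dropping the "m != mode" test would mark mode 1 as killed there too, which is the forced re-insertion in succeeding blocks that the explanation above describes.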

Regards,
Vladimir

2012/8/24 Kaz Kojima kkoj...@rr.iij4u.or.jp:
 I've tried the patch on sh4-unknown-linux-gnu.  I see new failures
 with it:

 Here is a reduced test case for sh4-unknown-linux-gnu.

 #include <stdlib.h>

 volatile double gd[32];
 volatile float gf[32];

 int main ()
 {
   int i;

   for (i = 0; i < 32; i++)
     gd[i] = i * 4, gf[i] = i;

   for (i = 0; i < 32; i++)
     if (gd[i] != i * 4
         || gf[i] != i)
       abort ();
   exit (0);
 }

 The problem occurs at the second loop.  With the patch, the only
 mode switch is inserted just before gf[i] != i.
 OTOH the original compiler inserts mode switches before both
 gd[i] != i * 4 and gf[i] != i.
 With commenting out if (i != mode) of the hunk

 @@ -530,10 +535,16 @@ optimize_mode_switching (void)
   last_mode = mode;
   ptr = new_seginfo (mode, insn, bb->index, live_now);
   add_seginfo (info + bb->index, ptr);
 - RESET_BIT (transp[bb->index], j);
 + for (i = 0 ; i < max_num_modes; i++)
 +   if (i != mode)
 + RESET_BIT (transp[i][bb->index], j);
 ...

 it looks like all new failures go away.

 Regards,
 kaz


[PATCH] Changes in mode switching

2012-08-23 Thread Vladimir Yakovlev
I discovered some inaccuracies when I tried to implement vzeroupper
insertion (PR 47440).

First, I made 'transp' an array of bit vectors rather than a single bit
vector, because each mode needs its own; otherwise resetting it on a
mode change kills all modes (including the new mode).

Other changes concern the processing of mode switching inside a basic block.

I also added an additional argument to EMIT_MODE_SET because I need it
in target-dependent changes.
Make check and bootstrap passed with no failures.

I used the compiler
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure
--prefix=/export/users/vbyakovl/workspaces/vzu/install-ref
--enable-languages=c,c++,fortran --with-arch=corei7 --with-cpu=corei7
--with-fpmath=sse

OK for trunk?


2012-08-25  Vladimir Yakovlev  vladimir.b.yakov...@intel.com

* mode-switching.c (transp): Changed type
(make_preds_opaque): Added an argument
(optimize_mode_switching): Some fixes which was done for vzeroupper
insertion needs.

* config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument.

* config/i386/i386.h (EMIT_MODE_SET): Added an argument.

* config/sh/sh.h (EMIT_MODE_SET): Added an argument.


patch
Description: Binary data


Re: [PATCH, Atom] Fix performance regression with -mtune=atom

2011-10-14 Thread Vladimir Yakovlev
This is a ping. Change affects Atom only and was made because it
really gives better performance on this architecture. This fact
actually leads to the thought that old value is just a simple
misprint.
  Please look.

Vladimir

2011/9/30 Vladimir Yakovlev vbyakov...@gmail.com:
 This patch fixes performance regression with -mtune=atom. Changing
 atom cost removes regression in several tests of EEMBC and spec2000.
 Bootstrap and make check are OK both with and without -mtune=atom.
 OK for trunk?

 2011-09-30  Yakovlev Vladimir  vladimir.b.yakov...@intel.com

      * gcc/config/i386/i386.c (atom_cost): Changed cost for loading
       QImode using movzbl.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 7e89dbd..8a512a7 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -1672,7 +1672,7 @@ struct processor_costs atom_cost = {
   COSTS_N_INSNS (1),                   /* cost of movzx */
   8,                                   /* large insn */
   17,                                  /* MOVE_RATIO */
 -  2,                                   /* cost for loading QImode using movzbl */
 +  4,                                   /* cost for loading QImode using movzbl */
   {4, 4, 4},                           /* cost of loading integer registers
                                           in QImode, HImode and SImode.
                                           Relative to reg-reg move (2).  */



Re: [PATCH, Atom] Fix performance regression with -mtune=atom

2011-10-14 Thread Vladimir Yakovlev
Could anyone check that in?

Thanks,
Vladimir

2011/10/14 Uros Bizjak ubiz...@gmail.com:
 Hello!

 This is a ping. Change affects Atom only and was made because it
 really gives better performance on this architecture. This fact
 actually leads to the thought that old value is just a simple
 misprint.

  This patch fixes performance regression with -mtune=atom. Changing
  atom cost removes regression in several tests of EEMBC and spec2000.
  Bootstrap and make check are OK both with and without -mtune=atom.
  OK for trunk?
 
  2011-09-30  Yakovlev Vladimir  vladimir.b.yakov...@intel.com
 
       * gcc/config/i386/i386.c (atom_cost): Changed cost for loading
        QImode using movzbl.

 OK.

 Thanks,
 Uros.



Fix performance regression with -mtune=atom

2011-09-30 Thread Vladimir Yakovlev
This patch fixes performance regression with -mtune=atom. Changing
atom cost removes regression in several tests of EEMBC and spec2000.
Bootstrap and make check are OK both with and without -mtune=atom.
OK for trunk?

2011-09-30  Yakovlev Vladimir  vladimir.b.yakov...@intel.com

  * gcc/config/i386/i386.c (atom_cost): Changed cost for loading
   QImode using movzbl.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7e89dbd..8a512a7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1672,7 +1672,7 @@ struct processor_costs atom_cost = {
   COSTS_N_INSNS (1),   /* cost of movzx */
   8,   /* large insn */
   17,  /* MOVE_RATIO */
-  2,   /* cost for loading QImode using movzbl */
+  4,   /* cost for loading QImode using movzbl */
   {4, 4, 4},   /* cost of loading integer registers
   in QImode, HImode and SImode.
   Relative to reg-reg move (2).  */


Re: [PATCH, testsuite] Fix for PR47440 - Use LCM for vzeroupper insertion

2011-07-20 Thread Vladimir Yakovlev
Hi Steven,

I need a separate pass because the vzeroupper redundancy-elimination
transformation must be performed after reload has completed.
I think it is possible to make it a separate pass in ix86-reorg if
that phase runs after reload.

Thanks,
Vladimir

2011/7/19 Steven Bosscher stevenb@gmail.com:
        * a/gcc/gcse.c (alloc_gcse_mem): Added code to run in PRE2.

 And this is necessary because...???

 Why not just make it a separate pass in ix86-reorg that uses LCM? Look
 at mode switching for an example.

 Ciao!
 Steven



Re: [PATCH, testsuite] Fix for PR47440 - Use LCM for vzeroupper insertion

2011-07-20 Thread Vladimir Yakovlev
Hi Uros,

Thank you for such a detailed explanation. I'll study it.

Regards,
Vladimir

2011/7/20 Uros Bizjak ubiz...@gmail.com:
 Hello!

         * a/gcc/gcse.c (alloc_gcse_mem): Added code to run in PRE2.

 And this is necessary because...???

 Why not just make it a separate pass in ix86-reorg that uses LCM? Look at 
 mode switching for an example.

 I was also expecting that vzeroupper would be inserted in the same way
 as I387 mode switching instructions are inserted. To expand on
 Steven's suggestion, please see i386.h for OPTIMIZE_MODE_SWITCHING and
 following macros.

 At the moment, there are 4 separate entities that handle (four
 independent) insertions for mode switching for x87 for each mode of
 fistp or frndint instruction. Mode insertions will actually insert
 calculations of x87 control word (CW) at optimal points and push this
 new CW (together with old CW) to known stack slot to be consumed by
 fistp/frndint insn.

 You can add a new entity to enum ix86_entity (say, AVX_VZEROUPPER)
 and update OPTIMIZE_MODE_SWITCHING to perform mode insertion for
 the AVX_VZEROUPPER entity when needed. The various modes for AVX_VZEROUPPER
 are defined in NUM_MODES_FOR_MODE_SWITCHING, mode transitions in
 MODE_NEEDED and insn insertions in EMIT_MODE_SET.

 Please note that LCM handles all entities in parallel, so there is no
 need for extra passes. The real worker for mode switching is
 ix86_mode_needed, but don't forget that you can disable mode switching
 pass per-function when not needed through OPTIMIZE_MODE_SWITCHING
 macro.

 FYI: Existing x87 CW initialization insertion works this way:
 - fistp/frndint is inserted into insn stream and corresponding
 OPTIMIZE_MODE_SWITCHING flag is set.
 - inserted insn has i386_cw attribute that defines requested mode in
 which the insn operates. Based on this attribute, MODE_NEEDED handles
 mode transitions (please note that there are four independent
 entities) for each entity.
 - EMIT_MODE_SET emits CW initializations. These are further optimized
 by follow-up optimization passes, so two consecutive initializations
 at the same place are CSEd, etc.

 Uros.
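
The scheme described above can be modelled in a few lines of stand-alone C. This is a rough sketch under stated assumptions, not code from i386.h or mode-switching.c: struct insn, mode_needed, count_mode_sets, and the AVX_U128_DIRTY and AVX_U128_ANY mode names are hypothetical (AVX_U128_CLEAN is the name used in a later message of this thread), and the sketch walks a single straight-line block instead of running LCM over the control-flow graph.

```c
#include <stdbool.h>

/* Assumed modes for a single, hypothetical AVX entity.  */
enum avx_mode { AVX_U128_CLEAN, AVX_U128_DIRTY, AVX_U128_ANY };

/* Hypothetical stand-in for an RTL insn.  */
struct insn
{
  bool is_call;    /* calls want the upper halves clean */
  bool uses_ymm;   /* 256-bit uses dirty the upper halves */
};

/* Analogue of MODE_NEEDED: the mode INSN requires on entry.  */
static enum avx_mode
mode_needed (const struct insn *insn)
{
  if (insn->is_call)
    return AVX_U128_CLEAN;
  if (insn->uses_ymm)
    return AVX_U128_DIRTY;
  return AVX_U128_ANY;     /* no requirement */
}

/* Analogue of EMIT_MODE_SET over one straight-line block: count how
   many vzeroupper insns would be emitted.  ENTRY must be CLEAN or DIRTY.  */
static int
count_mode_sets (const struct insn *insns, int n, enum avx_mode entry)
{
  enum avx_mode cur = entry;
  int emitted = 0;

  for (int i = 0; i < n; i++)
    {
      enum avx_mode need = mode_needed (&insns[i]);
      if (need == AVX_U128_ANY)
        continue;
      /* Only a DIRTY -> CLEAN transition needs an instruction.  */
      if (need == AVX_U128_CLEAN && cur != AVX_U128_CLEAN)
        emitted++;
      cur = need;
    }
  return emitted;
}
```

For a block that contains a 256-bit use, a call, an unrelated insn, a second call, another 256-bit use and a final call, the sketch counts two insertions: the second call already sees a clean state, so nothing is emitted for it.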