[PATCH]: Guard division by zero in ipa-icf dumps

2014-10-24 Thread Uros Bizjak
Hello!

2014-10-24  Martin Liska  mli...@suse.cz

* ipa-icf.c (sem_item_optimizer::parse_nonsingleton_classes): Guard
division by zero in dumps.
(sem_item_optimizer::merge_classes): Ditto.

Tested on alphaev68-linux-gnu, committed to mainline as obvious.

Uros.
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index d1238a4..e7a293e 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -1736,7 +1736,7 @@ sem_item_optimizer::parse_nonsingleton_classes (void)
 
   if (dump_file)
 fprintf (dump_file, "Init called for %u items (%.2f%%).\n",
	  init_called_count,
-	  100.0f * init_called_count / m_items.length ());
+	  m_items.length ()
+	  ? 100.0f * init_called_count / m_items.length () : 0.0f);
 }
 
 /* Equality function for semantic items is used to subdivide existing
@@ -2196,14 +2196,15 @@ sem_item_optimizer::merge_classes (unsigned int prev_class_count)
   fprintf (dump_file, "Congruent classes before: %u, after: %u\n",
	   prev_class_count, class_count);
   fprintf (dump_file, "Average class size before: %.2f, after: %.2f\n",
-	   1.0f * item_count / prev_class_count,
-	   1.0f * item_count / class_count);
+	   prev_class_count ? 1.0f * item_count / prev_class_count : 0.0f,
+	   class_count ? 1.0f * item_count / class_count : 0.0f);
   fprintf (dump_file, "Average non-singular class size: %.2f, count: %u\n",
-	   1.0f * non_singular_classes_sum / non_singular_classes_count,
+	   non_singular_classes_count ? 1.0f * non_singular_classes_sum /
+	   non_singular_classes_count : 0.0f,
	   non_singular_classes_count);
   fprintf (dump_file, "Equal symbols: %u\n", equal_items);
   fprintf (dump_file, "Fraction of visited symbols: %.2f%%\n\n",
-	   100.0f * equal_items / item_count);
+	   item_count ? 100.0f * equal_items / item_count : 0.0f);
 }
 
  for (hash_table<congruence_class_group_hash>::iterator it = m_classes.begin ();



Re: [PATCH] Add zero-overhead looping for xtensa backend

2014-10-24 Thread augustine.sterl...@gmail.com
On Thu, Oct 23, 2014 at 9:12 PM, Yangfei (Felix) felix.y...@huawei.com wrote:
 Here the key point is we need a general purpose register for the loop
 instruction.

So the question to ask here is, "How does this work today, without
loop instructions?" Somehow--even when it has been spilled--a branch
instruction can test the trip count. There should be no difference.

 And we cannot use zero-cost looping in this situation.
 And that's why I split the zero_cost_loop_end into a normal test and branch.

 Also note that the hwloop_pattern_reg interface also expects a general 
 purpose register in the doloop_end pattern.

If there were no loop instruction, how would this work?


[PATCH] c++ify sreal

2014-10-24 Thread tsaunders
From: Trevor Saunders tsaund...@mozilla.com

Hi,

do $subject, and clean up for the now always-64-bit HWI.


bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

Trev

gcc/ChangeLog:

2014-10-24  Trevor Saunders  tsaund...@mozilla.com

* ipa-inline.c (edge_badness): Adjust.
(inline_small_functions): Likewise.
* predict.c (propagate_freq): Likewise.
(estimate_bb_frequencies): Likewise.
* sreal.c (sreal::dump): Rename from dump_sreal.
(debug): Adjust.
(copy): Remove function.
(sreal::shift_right): Rename from sreal_shift_right.
(sreal::normalize): Rename from normalize.
(sreal_init): Remove function.
(sreal::to_int): Rename from sreal_to_int.
(sreal_compare): Remove function.
(sreal::operator+): Rename from sreal_add.
(sreal::operator-): Rename from sreal_sub.
(sreal::operator*): Rename from sreal_mul.
(sreal::operator/): Rename from sreal_div.
* sreal.h (class sreal): Adjust.
(inline sreal operator+=): New operator.
(inline sreal operator-=): Likewise.
(inline sreal operator/=): Likewise.
(inline sreal operator*=): Likewise.
(inline bool operator!=): Likewise.
(inline bool operator<): Likewise.
(inline bool operator<=): Likewise.
(inline bool operator>=): Likewise.
---
 gcc/ipa-inline.c |  25 ++-
 gcc/predict.c|  82 --
 gcc/sreal.c  | 479 +++
 gcc/sreal.h  |  97 ---
 4 files changed, 213 insertions(+), 470 deletions(-)

diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index e79a4dd..cca1fb3 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -939,29 +939,28 @@ edge_badness (struct cgraph_edge *edge, bool dump)
 
   else if (max_count)
 {
-  sreal tmp, relbenefit_real, growth_real;
   int relbenefit = relative_time_benefit (callee_info, edge, edge_time);
   /* Capping edge->count to max_count.  edge->count can be larger than
	 max_count if an inline adds new edges which increase max_count
	 after max_count is computed.  */
   gcov_type edge_count = edge->count > max_count ? max_count : edge->count;
 
-  sreal_init (relbenefit_real, relbenefit, 0);
-  sreal_init (growth_real, growth, 0);
+  sreal relbenefit_real (relbenefit, 0);
+  sreal growth_real (growth, 0);
 
   /* relative_edge_count.  */
-  sreal_init (tmp, edge_count, 0);
-  sreal_div (tmp, tmp, max_count_real);
+  sreal tmp (edge_count, 0);
+  tmp /= max_count_real;
 
   /* relative_time_benefit.  */
-  sreal_mul (tmp, tmp, relbenefit_real);
-  sreal_div (tmp, tmp, max_relbenefit_real);
+  tmp *= relbenefit_real;
+  tmp /= max_relbenefit_real;
 
   /* growth_f_caller.  */
-  sreal_mul (tmp, tmp, half_int_min_real);
-  sreal_div (tmp, tmp, growth_real);
+  tmp *= half_int_min_real;
+  tmp /= growth_real;
 
-  badness = -1 * sreal_to_int (tmp);
+  badness = -1 * tmp.to_int ();
  
   if (dump)
{
@@ -1604,9 +1603,9 @@ inline_small_functions (void)
	 if (max_count < edge->count)
	   max_count = edge->count;
   }
-  sreal_init (max_count_real, max_count, 0);
-  sreal_init (max_relbenefit_real, RELATIVE_TIME_BENEFIT_RANGE, 0);
-  sreal_init (half_int_min_real, INT_MAX / 2, 0);
+  max_count_real = sreal (max_count, 0);
+  max_relbenefit_real = sreal (RELATIVE_TIME_BENEFIT_RANGE, 0);
+  half_int_min_real = sreal (INT_MAX / 2, 0);
   ipa_free_postorder_info ();
   initialize_growth_caches ();
 
diff --git a/gcc/predict.c b/gcc/predict.c
index 5f5d4a5..10675c3 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -2571,15 +2571,13 @@ propagate_freq (basic_block head, bitmap tovisit)
bb->count = bb->frequency = 0;
 }
 
-  memcpy (&BLOCK_INFO (head)->frequency, &real_one, sizeof (real_one));
+  BLOCK_INFO (head)->frequency = real_one;
   last = head;
   for (bb = head; bb; bb = nextbb)
 {
   edge_iterator ei;
-  sreal cyclic_probability, frequency;
-
-  memcpy (&cyclic_probability, &real_zero, sizeof (real_zero));
-  memcpy (&frequency, &real_zero, sizeof (real_zero));
+  sreal cyclic_probability = real_zero;
+  sreal frequency = real_zero;
 
   nextbb = BLOCK_INFO (bb)->next;
   BLOCK_INFO (bb)->next = NULL;
@@ -2596,42 +2594,34 @@ propagate_freq (basic_block head, bitmap tovisit)
	 FOR_EACH_EDGE (e, ei, bb->preds)
	   if (EDGE_INFO (e)->back_edge)
	     {
-	       sreal_add (&cyclic_probability, &cyclic_probability,
-			  &EDGE_INFO (e)->back_edge_prob);
+	       cyclic_probability += EDGE_INFO (e)->back_edge_prob;
	     }
	   else if (!(e->flags & EDGE_DFS_BACK))
  {
-   sreal tmp;
-
	/*  frequency += (e->probability
	      * BLOCK_INFO (e->src)->frequency /
  

Re: [PATCH] Add zero-overhead looping for xtensa backend

2014-10-24 Thread Yangfei (Felix)
 
 On Thu, Oct 23, 2014 at 9:12 PM, Yangfei (Felix) felix.y...@huawei.com 
 wrote:
  Here the key point is we need a general purpose register for the loop
  instruction.
 
 So the question to ask here is, How does this work today, without loop
 instructions? Somehow--even when it has been spilled--a branch instruction 
 can
 test the trip count. There should be no difference.
 
  And we cannot use zero-cost looping in this situation.
  And that's why I split the zero_cost_loop_end into a normal test and
  branch.
 
  Also note that the hwloop_pattern_reg interface also expects a general
 purpose register in the doloop_end pattern.
 
 If there were no loop instruction, how would this work?


Just take a look at my patch. I handle this in the new define_split:

+(define_split
+  [(set (pc)
+	(if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "")
+			  (const_int 1))
+		      (label_ref (match_operand 1 "" ""))
+		      (pc)))
+   (set (match_operand:SI 2 "nonimmediate_operand" "")
+	(plus:SI (match_dup 0)
+		 (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_LSETUP_END)
+   (clobber (match_scratch 3))]
+  "TARGET_LOOPS && optimize && reload_completed"
+  [(const_int 0)]
+{
+  if (!REG_P (operands[0]))
+{
+  rtx test;
+
+  /* Fallback into a normal conditional branch insn.  */
+  emit_move_insn (operands[3], operands[0]);
+  emit_insn (gen_addsi3 (operands[3], operands[3], constm1_rtx));
+  emit_move_insn (operands[0], operands[3]);
+  test = gen_rtx_NE (VOIDmode, operands[3], const0_rtx);
+  emit_jump_insn (gen_cbranchsi4 (test, operands[3],
+  const0_rtx, operands[1]));
+}
+  else
+{
+  emit_jump_insn (gen_loop_end (operands[0], operands[1], operands[2]));
+}
+
+  DONE;
+})


Re: [PATCH] Add zero-overhead looping for xtensa backend

2014-10-24 Thread augustine.sterl...@gmail.com
I mean without your patch at all.

On Thu, Oct 23, 2014 at 11:30 PM, Yangfei (Felix) felix.y...@huawei.com wrote:

 On Thu, Oct 23, 2014 at 9:12 PM, Yangfei (Felix) felix.y...@huawei.com 
 wrote:
  Here the key point is we need a general purpose register for the loop
  instruction.

 So the question to ask here is, How does this work today, without loop
 instructions? Somehow--even when it has been spilled--a branch instruction 
 can
 test the trip count. There should be no difference.

  And we cannot use zero-cost looping in this situation.
  And that's why I split the zero_cost_loop_end into a normal test and
  branch.

  Also note that the hwloop_pattern_reg interface also expects a general
 purpose register in the doloop_end pattern.

 If there were no loop instruction, how would this work?


 Just take a look at my patch. I handle this in the new define_split:

 +(define_split
 +  [(set (pc)
 +	(if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "")
 +			  (const_int 1))
 +		      (label_ref (match_operand 1 "" ""))
 +		      (pc)))
 +   (set (match_operand:SI 2 "nonimmediate_operand" "")
 +	(plus:SI (match_dup 0)
 +		 (const_int -1)))
 +   (unspec [(const_int 0)] UNSPEC_LSETUP_END)
 +   (clobber (match_scratch 3))]
 +  "TARGET_LOOPS && optimize && reload_completed"
 +  [(const_int 0)]
 +{
 +  if (!REG_P (operands[0]))
 +{
 +  rtx test;
 +
 +  /* Fallback into a normal conditional branch insn.  */
 +  emit_move_insn (operands[3], operands[0]);
 +  emit_insn (gen_addsi3 (operands[3], operands[3], constm1_rtx));
 +  emit_move_insn (operands[0], operands[3]);
 +  test = gen_rtx_NE (VOIDmode, operands[3], const0_rtx);
 +  emit_jump_insn (gen_cbranchsi4 (test, operands[3],
 +  const0_rtx, operands[1]));
 +}
 +  else
 +{
 +  emit_jump_insn (gen_loop_end (operands[0], operands[1], operands[2]));
 +}
 +
 +  DONE;
 +})


Re: [PATCH] Add zero-overhead looping for xtensa backend

2014-10-24 Thread Yangfei (Felix)
1. The original xtensa port never generates loop instruction at all.
2. A port doesn't need to implement hwloop_pattern_reg hook if it has no 
zero-cost loop instruction.

Is that clear?

 
 I mean without your patch at all.
 
 On Thu, Oct 23, 2014 at 11:30 PM, Yangfei (Felix) felix.y...@huawei.com
 wrote:
 
  On Thu, Oct 23, 2014 at 9:12 PM, Yangfei (Felix) felix.y...@huawei.com
 wrote:
   Here the key point is we need a general purpose register for the loop
   instruction.
 
  So the question to ask here is, How does this work today, without
  loop instructions? Somehow--even when it has been spilled--a branch
  instruction can test the trip count. There should be no difference.
 
   And we cannot use zero-cost looping in this situation.
   And that's why I split the zero_cost_loop_end into a normal test and
  branch.
 
   Also note that the hwloop_pattern_reg interface also expects a
   general
  purpose register in the doloop_end pattern.
 
  If there were no loop instruction, how would this work?
 
 
  Just take a look at my patch. I handle this in the new define_split:
 
  +(define_split
  +  [(set (pc)
  +	(if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "")
  +			  (const_int 1))
  +		      (label_ref (match_operand 1 "" ""))
  +		      (pc)))
  +   (set (match_operand:SI 2 "nonimmediate_operand" "")
  +	(plus:SI (match_dup 0)
  +		 (const_int -1)))
  +   (unspec [(const_int 0)] UNSPEC_LSETUP_END)
  +   (clobber (match_scratch 3))]
  +  "TARGET_LOOPS && optimize && reload_completed"
  +  [(const_int 0)]
  +{
  +  if (!REG_P (operands[0]))
  +{
  +  rtx test;
  +
  +  /* Fallback into a normal conditional branch insn.  */
  +  emit_move_insn (operands[3], operands[0]);
  +  emit_insn (gen_addsi3 (operands[3], operands[3], constm1_rtx));
  +  emit_move_insn (operands[0], operands[3]);
  +  test = gen_rtx_NE (VOIDmode, operands[3], const0_rtx);
  +  emit_jump_insn (gen_cbranchsi4 (test, operands[3],
  +  const0_rtx, operands[1]));
  +}
  +  else
  +{
  +  emit_jump_insn (gen_loop_end (operands[0], operands[1],
 operands[2]));
  +}
  +
  +  DONE;
  +})


Re: [PATCH] Add zero-overhead looping for xtensa backend

2014-10-24 Thread augustine.sterl...@gmail.com
On Thu, Oct 23, 2014 at 11:40 PM, Yangfei (Felix) felix.y...@huawei.com wrote:
 1. The original xtensa port never generates loop instruction at all.
 2. A port doesn't need to implement hwloop_pattern_reg hook if it has no 
 zero-cost loop instruction.

 Is that clear?

We are talking in circles. I understand very well what goes on here.

My point is:

1. Right now, today, GCC generates loops with branch instructions even
when the trip count is spilled.
2. Branch instructions and loop instructions have identical register
requirements.

Therefore:

3. loop instructions should be generatable when the trip count is spilled.


[PATCH] Fix bootstrap/PR63632

2014-10-24 Thread Markus Trippelsdorf
r216566 (r216568 for 4.9 branch) added %{fno-lto} to LINK_COMMAND_SPEC.
However the linker doesn't understand -fno-lto and errors out.
This causes LTO/PGO bootstrap to fail, because -fno-lto is used during
STAGEprofile.
Fixed by filtering out -fno-lto in collect2.c.

LTO/PGO bootstrapped and tested on powerpc64-unknown-linux-gnu.
Preapproved by Jakub on IRC.
Applied to trunk and 4.9 branch.

2014-10-24  Markus Trippelsdorf  mar...@trippelsdorf.de

PR bootstrap/63632
* collect2.c (main): Filter out -fno-lto.

PR bootstrap/63632
* g++.dg/torture/pr63632.C: New test.

diff --git a/gcc/collect2.c b/gcc/collect2.c
index c54e6fb51578..7c067ffcafbb 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -1311,6 +1311,12 @@ main (int argc, char **argv)
  ld1--;
  ld2--;
}
+	 else if (strncmp (arg, "-fno-lto", 8) == 0)
+   {
+ /* Do not pass -fno-lto to the linker. */
+ ld1--;
+ ld2--;
+   }
 #ifdef TARGET_AIX_VERSION
  else
{
diff --git a/gcc/testsuite/g++.dg/torture/pr63632.C 
b/gcc/testsuite/g++.dg/torture/pr63632.C
new file mode 100644
index ..48cd8692412a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr63632.C
@@ -0,0 +1,5 @@
+// PR bootstrap/63632
+// { dg-do link }
+// { dg-options "-fno-lto" }
+
+int main () {}
-- 
Markus


Re: [PATCH] Add zero-overhead looping for xtensa backend

2014-10-24 Thread Yangfei (Felix)
Thanks for the explanation. I think I am clear about what you are thinking now. 
That's an interesting question. I am not sure about the reason why GCC's reload
cannot handle a doloop_end insn.
I guess maybe the doloop_end pattern is special? I mean it's a branch insn in a 
parallel form.



 
 On Thu, Oct 23, 2014 at 11:40 PM, Yangfei (Felix) felix.y...@huawei.com
 wrote:
  1. The original xtensa port never generates loop instruction at all.
  2. A port doesn't need to implement hwloop_pattern_reg hook if it has no
 zero-cost loop instruction.
 
  Is that clear?
 
 We are talking in circles. I understand very well what goes on here.
 
 My point is:
 
 1. Right now, today, GCC generates loops with branch instructions even when 
 the
 trip count is spilled.
 2. Branch instructions and loop instructions have identical register 
 requirements.
 
 Therefore:
 
 3. loop instructions should be generatable when the trip count is spilled.


Re: [PATCH] Add zero-overhead looping for xtensa backend

2014-10-24 Thread Andrew Pinski
On Thu, Oct 23, 2014 at 11:51 PM, Yangfei (Felix) felix.y...@huawei.com wrote:
 Thanks for the explanation. I think I am clear about what you are thinking 
 now.
 That's an interesting question. I am not sure about reason why GCC's reload 
 cannot handle a doloop_end insn.
 I guess maybe the doloop_end pattern is special? I mean it's a branch insn in 
 a parallel form.


No, it is not special.  Jumps are just never handled by reload.  I
thought this was documented somewhere too.  Basically the main issue
with jumps is where the reload value goes: on which side of the jump?

Thanks,
Andrew





 On Thu, Oct 23, 2014 at 11:40 PM, Yangfei (Felix) felix.y...@huawei.com
 wrote:
  1. The original xtensa port never generates loop instruction at all.
  2. A port doesn't need to implement hwloop_pattern_reg hook if it has no
 zero-cost loop instruction.
 
  Is that clear?

 We are talking in circles. I understand very well what goes on here.

 My point is:

 1. Right now, today, GCC generates loops with branch instructions even when 
 the
 trip count is spilled.
 2. Branch instructions and loop instructions have identical register 
 requirements.

 Therefore:

 3. loop instructions should be generatable when the trip count is spilled.


Re: [PATCH 5/5] add libcc1

2014-10-24 Thread Phil Muldoon
On 10/10/14 22:58, Jeff Law wrote:
 On 10/09/14 03:07, Phil Muldoon wrote:

 Sorry for taking so long to reply.  We've talked, on irc and elsewhere
 a little (some at the Cauldron too!).  I think the consensus is as
 nobody has explicitly mentioned anything, this is OK to go in?
 Yes, please go ahead and check it in.  You'll be the first contact point if 
 something goes wrong :-)

 Given the length of time since the original post and now, can you please do 
 sanity bootstrap to make sure nothing's bitrotted before you commit?

I rebased the patch on top of GCC head (from the git repository),
updated the ChangeLogs, etc from two days ago (it takes two days to do
a full rebase, pristine and patched bootstrap and testrun on my poor
laptop ;).

I've built both pristine and patched branches with bootstrap enabled.
I ran both testsuites and used contrib/compare_tests to make sure
everything was as it should be.  compare_tests reports everything as
fine.  One minor change I found, was due to some ongoing work on
hash_tables.  It seems the parameterless constructor call for a new
hash table has been removed.  This was trivially fixed with the patch
attached.  Even though (to me) it is obvious, what do you think?

Cheers

Phil

--

diff --git a/libcc1/plugin.cc b/libcc1/plugin.cc
index fbb49d3..5cdd19d 100644
--- a/libcc1/plugin.cc
+++ b/libcc1/plugin.cc
@@ -220,13 +220,10 @@ static plugin_context *current_context;
 
 plugin_context::plugin_context (int fd)
   : cc1_plugin::connection (fd),
-address_map (),
-preserved (),
-file_names ()
+address_map (30),
+preserved (30),
+file_names (30)
 {
-  address_map.create (20);
-  preserved.create (20);
-  file_names.create (20);
 }
 
 void
@@ -236,8 +233,8 @@ plugin_context::mark ()
it != address_map.end ();
++it)
 {
-  ggc_mark ((*it).decl);
-  ggc_mark ((*it).address);
+  ggc_mark ((*it)->decl);
+  ggc_mark ((*it)->address);
 }
 
   for (hash_table<pointer_hash<tree_node> >::iterator it = preserved.begin ();



Re: [PATCH 5/5] add libcc1

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 08:15:36AM +0100, Phil Muldoon wrote:
 On 10/10/14 22:58, Jeff Law wrote:
  On 10/09/14 03:07, Phil Muldoon wrote:
 
  Sorry for taking so long to reply.  We've talked, on irc and elsewhere
  a little (some at the Cauldron too!).  I think the consensus is as
  nobody has explicitly mentioned anything, this is OK to go in?
  Yes, please go ahead and check it in.  You'll be the first contact point if 
  something goes wrong :-)
 
  Given the length of time since the original post and now, can you please do 
  sanity bootstrap to make sure nothing's bitrotted before you commit?
 
 I rebased the patch on top of GCC head (from the git repository),
 updated the ChangeLogs, etc from two days ago (it takes two days to do
 a full rebase, pristine and patched bootstrap and testrun on my poor
 laptop ;).
 
 I've built both pristine and patched branches with bootstrap enabled.
 I ran both testsuites and used contrib/compare_tests to make sure
 everything was as it should be.  compare_tests reports everything as
 fine.  One minor change I found, was due to some ongoing work on
  hash_tables.  It seems the parameterless constructor call for a new
 hash table has been removed.  This was trivially fixed with the patch
 attached.  Even though (to me) it is obvious, what do you think?

 --- a/libcc1/plugin.cc
 +++ b/libcc1/plugin.cc
 @@ -220,13 +220,10 @@ static plugin_context *current_context;
  
  plugin_context::plugin_context (int fd)
: cc1_plugin::connection (fd),
 -address_map (),
 -preserved (),
 -file_names ()
 +address_map (30),
 +preserved (30),
 +file_names (30)
  {
 -  address_map.create (20);
 -  preserved.create (20);
 -  file_names.create (20);

This is http://gcc.gnu.org/r211936 , i.e.
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01598.html
so the changes are fine.

  }
  
  void
 @@ -236,8 +233,8 @@ plugin_context::mark ()
 it != address_map.end ();
 ++it)
  {
 -  ggc_mark ((*it).decl);
 -  ggc_mark ((*it).address);
  +  ggc_mark ((*it)->decl);
  +  ggc_mark ((*it)->address);
  }

And this is http://gcc.gnu.org/r211937 , i.e.
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01599.html
in action. 

    for (hash_table<pointer_hash<tree_node> >::iterator it = preserved.begin ();

So, if these are the only non-obvious changes you needed, please go ahead
and commit.

Jakub


Re: [PATCH] Fix genmatch linking

2014-10-24 Thread Rainer Orth
Richard Biener rguent...@suse.de writes:

 On Thu, 23 Oct 2014, Richard Biener wrote:

 
 Final try for today.

 And this may work as well and is slightly simpler.

 Index: gcc/Makefile.in
 ===
 --- gcc/Makefile.in   (revision 216590)
 +++ gcc/Makefile.in   (working copy)
 @@ -981,7 +981,15 @@ else
  LIBIBERTY = ../libiberty/libiberty.a
  BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a
  endif
 +# For stage1 and when cross-compiling use the build libcpp which is
 +# built with NLS disabled.  For stage2+ use the host library and
 +# its dependencies.
 +ifeq ($(build_objdir),$(build_libobjdir))
  BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
 +else
 +BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV)
 +build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) 
 $(LIBICONV_DEP)
 +endif
  
  # Dependencies on the intl and portability libraries.
  LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) \


 Can you test it please?

Sure: this version allowed an i386-pc-solaris2.10 bootstrap to complete
just fine.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Fix genmatch linking

2014-10-24 Thread Richard Biener
On Thu, 23 Oct 2014, Richard Sandiford wrote:

 Richard Biener rguent...@suse.de writes:
  This adds a libcpp host module without NLS and ICONV support
  and properly links genmatch against the build libcpp instead of
  the host one.
 
  Bootstrap running on x86_64-unknown-linux-gnu (stage1 all-gcc
  finished fine).
 
  Ok for trunk?
 
  Thanks,
  Richard.
 
  2014-10-23  Richard Biener  rguent...@suse.de
 
  * Makefile.def: Add libcpp build module and dependencies.
  * configure.ac: Add libcpp build module.
  * Makefile.in: Regenerate.
  * configure: Likewise.
 
  gcc/
  * Makefile.in (BUILD_CPPLIB): Add.
  (build/genmatch$(build_exeext)): Use BUILD_CPPLIB, not CPPLIB.
  Drop LIBIBERTY.
 
 This breaks a -j1 (!) build on x86_64-linux-gnu for me with:
 
 g++  -I/blah/libcpp -I. -I/blah/libcpp/../include -I/blah/libcpp/include  -g 
 -O2 -W -Wall -Wno-narrowing -Wwrite-strings -Wmissing-format-attribute 
 -pedantic -Wno-long-long  -fno-exceptions -fno-rtti -I/blah/libcpp -I. 
 -I/blah/libcpp/../include -I/blah/libcpp/include   -c -o charset.o -MT 
 charset.o -MMD -MP -MF .deps/charset.Tpo /blah/libcpp/charset.c
 In file included from /blah/libcpp/../include/hashtab.h:40:0,
  from /blah/libcpp/../include/filenames.h:29,
  from /blah/libcpp/system.h:367,
  from /blah/libcpp/charset.c:21:
 /blah/libcpp/../include/ansidecl.h:171:64: error: new declaration ‘char* 
 basename(const char*)’
  #  define ATTRIBUTE_NONNULL(m) __attribute__ ((__nonnull__ (m)))
 ^
 /blah/libcpp/../include/libiberty.h:113:64: note: in expansion of macro 
 ‘ATTRIBUTE_NONNULL’
  extern char *basename (const char *) ATTRIBUTE_RETURNS_NONNULL 
 ATTRIBUTE_NONNULL(1);
 ^
 In file included from /blah/libcpp/system.h:201:0,
  from /blah/libcpp/charset.c:21:
 /usr/include/string.h:597:26: error: ambiguates old declaration ‘const char* 
 basename(const char*)’
 extern "C++" const char *basename (const char *__filename)
   ^
 make: *** [charset.o] Error 1
 
 The build libiberty and libcpp share a config.cache file and both check

Ugh - why do they do this kind of thing?  Btw, I can't decipher
where or how it's doing that.

 for basename.  The problem is that libiberty is built without _GNU_SOURCE
 and doesn't see the declaration while libcpp is built with it and does
 see the declaration.  So if libiberty configures first we assume basename
 isn't available.  If they both configure at the same time we're OK.

Bah.  Where does the _GNU_SOURCE come from?  Is it libcpp being built
with C++ and libiberty built with C?

I think the solution is to force a separate config.cache for libcpp.
But as I didn't yet figure out how it ends up sharing config.cache
I didn't figure out how to disable that sharing...

Thanks,
Richard.

 Thanks,
 Richard
 
 

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer

Re: [PATCH 1/2, x86, PR63534] Fix darwin bootstrap

2014-10-24 Thread Uros Bizjak
On Thu, Oct 23, 2014 at 11:19 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 Linux x86 bootstrap/make check passed.
 Darwin x86 bootstrap passed configured with --with-cpu=intel
 --with-cpu=corei7 issue refers to PR63620.

 On Fri, Oct 17, 2014 at 6:08 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:
 Hi,

 The patch fixes 1st fail in darwin bootstarp.
 When PIC register is pseudo we don't need to init it after setjmp or
 non local goto.

 Is it ok?

 ChangeLog:

 2014-10-17  Evgeny Stupachenko  evstu...@gmail.com

 PR target/63534
 * config/i386/i386.c (builtin_setjmp_receiver): Delete.
 (nonlocal_goto_receiver): Ditto.

This one needs approval from Darwin maintainer.

Uros.


Re: PATCH: fix breakage from [PATCH] Fix genmatch linking

2014-10-24 Thread Richard Biener
On Fri, 24 Oct 2014, Hans-Peter Nilsson wrote:

  From: Richard Biener rguent...@suse.de
  Date: Thu, 23 Oct 2014 10:47:43 +0200
 
  This adds a libcpp host module without NLS and ICONV support
  and properly links genmatch against the build libcpp instead of
  the host one.
  
  Bootstrap running on x86_64-unknown-linux-gnu (stage1 all-gcc
  finished fine).
  
  Ok for trunk?
  
  Thanks,
  Richard.
  
  2014-10-23  Richard Biener  rguent...@suse.de
  
  * Makefile.def: Add libcpp build module and dependencies.
  * configure.ac: Add libcpp build module.
  * Makefile.in: Regenerate.
  * configure: Likewise.
 
 You only exposed a dormant issue, but ever since this commit
 (nominally, some commit in (216573:216588] but I only see this
 one matching) I see, for cross-builds (to cris-elf):
 
 make[2]: Entering directory 
 `/tmp/hpautotest-gcc1/cris-elf/gccobj/build-x86_64-unknown-linux-gnu/libcpp'
 g++  -I/tmp/hpautotest-gcc1/gcc/libcpp -I. 
 -I/tmp/hpautotest-gcc1/gcc/libcpp/../include 
 -I/tmp/hpautotest-gcc1/gcc/libcpp/include  -g -O2 -W -Wall -Wwrite-strings 
 -Wmissing-format-attribute -pedantic -Wno-long-long  -fno-exceptions 
 -fno-rtti -I/tmp/hpautotest-gcc1/gcc/libcpp -I. 
 -I/tmp/hpautotest-gcc1/gcc/libcpp/../include 
 -I/tmp/hpautotest-gcc1/gcc/libcpp/include   -c -o charset.o -MT charset.o 
 -MMD -MP -MF .deps/charset.Tpo /tmp/hpautotest-gcc1/gcc/libcpp/charset.c
 In file included from /tmp/hpautotest-gcc1/gcc/libcpp/system.h:370,
  from /tmp/hpautotest-gcc1/gcc/libcpp/charset.c:21:
 /tmp/hpautotest-gcc1/gcc/libcpp/../include/libiberty.h:113: error: new 
 declaration 'char* basename(const char*)'
 /usr/include/string.h:601: error: ambiguates old declaration 'const char* 
 basename(const char*)'
 make[2]: *** [charset.o] Error 1
 make[2]: Leaving directory 
 `/tmp/hpautotest-gcc1/cris-elf/gccobj/build-x86_64-unknown-linux-gnu/libcpp'
 make[1]: *** [all-build-libcpp] Error 2
 make[1]: Leaving directory `/tmp/hpautotest-gcc1/cris-elf/gccobj'
 make: *** [all] Error 2
 
 Above that, we have:
 checking whether basename is declared... (cached) no
 
 and above that, we have:
 make[2]: Leaving directory 
 `/tmp/hpautotest-gcc1/cris-elf/gccobj/build-x86_64-unknown-linux-gnu/fixincludes'
 mkdir -p -- build-x86_64-unknown-linux-gnu/libcpp
 Configuring in build-x86_64-unknown-linux-gnu/libcpp
 configure: loading cache ../config.cache
 
 which is apparently set due to (above that, first non-cached):
 mkdir -p -- build-x86_64-unknown-linux-gnu/libiberty
 Configuring in build-x86_64-unknown-linux-gnu/libiberty
 configure: creating cache ../config.cache
 ...
 checking whether basename is declared... no
 
 Your commit introduces build-subdirectories for cross-builds.
 Build-subdirs share a config.cache (in build-host/config.cache),
 with subdirs in build-host being fixincludes, libcpp and
 libiberty.
 
 But, libiberty and fixincludes are configure-tested and compiled
 using gcc, while libcpp is compiled with g++, which causes a
 different set of declarations to be exposed, so the shared
 config.cache is invalid and its use is bogus.  Not sure how this
 works for native builds.
 
 The libcpp configure checks are actually run with gcc which is
 bogus by itself, but apparently working.  I guess the C vs. C++
 declaration etc. differences for libcpp are mostly hidden by
 using _GNU_SOURCE (through AC_USE_SYSTEM_EXTENSIONS), and I'm a
 bit surprised that's not used for libiberty and fixincludes.
 Still, a red herring.  Aligning those options *may* cause the
 build to succeed, but I think that'd be too much of sweeping the
 issue under the carpet.
 
 It seems more correct to just disable the config.cache sharing
 between the differently-configured build-subdirectories, as is
 already is done for host-libraries and target-libraries, even if
 that may slow down the builds.  (Erroring out is infinitely
 slower. :)  Still, I don't understand exactly how your patch
 introduces build-subdirectories where there were none before.
 Maybe that +all-gcc: maybe-all-build-libcpp was wrong and
 should be different?

No, we do need a build-libcpp to build gcc/build/genmatch.
Not sure how you got around without a build-libiberty as other
gen* programs surely require that.

 Anyway, with this, a cris-elf cross build passes the point of
 failure; compilers and libraries built, progressing into
 testing.
 
 Ok to commit?

Ok.

Thanks,
Richard.

 toplev:
   * configure.ac (build_configargs): Don't share
   config.cache between build subdirs.
 
 Index: configure.ac
 ===
 --- configure.ac  (revision 216610)
 +++ configure.ac  (working copy)
 @@ -2922,8 +2922,10 @@ AC_ARG_VAR([target_configargs],
  
  # For the build-side libraries, we just need to pretend we're native,
  # and not use the same cache file.  Multilibs are neither needed nor
 -# desired.
 -build_configargs=$build_configargs --cache-file=../config.cache ${baseargs}
 +# 

Re: [PATCH] Fix genmatch linking

2014-10-24 Thread Richard Biener
On Fri, 24 Oct 2014, Rainer Orth wrote:

 Richard Biener rguent...@suse.de writes:
 
  On Thu, 23 Oct 2014, Richard Biener wrote:
 
  
  Final try for today.
 
  And this may work as well and is slightly simpler.
 
  Index: gcc/Makefile.in
  ===
  --- gcc/Makefile.in (revision 216590)
  +++ gcc/Makefile.in (working copy)
  @@ -981,7 +981,15 @@ else
   LIBIBERTY = ../libiberty/libiberty.a
   BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a
   endif
  +# For stage1 and when cross-compiling use the build libcpp which is
  +# built with NLS disabled.  For stage2+ use the host library and
  +# its dependencies.
  +ifeq ($(build_objdir),$(build_libobjdir))
   BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
  +else
  +BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV)
  +build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) 
  $(LIBICONV_DEP)
  +endif
   
   # Dependencies on the intl and portability libraries.
   LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) 
  \
 
 
  Can you test it please?
 
 Sure: this version allowed an i386-pc-solaris2.10 bootstrap to complete
 just fine.

Great.  Installed as follows.

Richard.

2014-10-24  Richard Biener  rguent...@suse.de

* Makefile.in (BUILD_CPPLIB): When in stage2+ use the
host library and make sure to pull in the required libintl
and libiconv dependencies.

Index: gcc/Makefile.in
===
--- gcc/Makefile.in (revision 216590)
+++ gcc/Makefile.in (working copy)
@@ -981,7 +981,15 @@ else
 LIBIBERTY = ../libiberty/libiberty.a
 BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a
 endif
+# For stage1 and when cross-compiling use the build libcpp which is
+# built with NLS disabled.  For stage2+ use the host library and
+# its dependencies.
+ifeq ($(build_objdir),$(build_libobjdir))
 BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
+else
+BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV)
+build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP)
+endif
 
 # Dependencies on the intl and portability libraries.
 LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) \


Re: [Patch ARM-AArch64/testsuite v3 00/21] Neon intrinsics executable tests

2014-10-24 Thread Marcus Shawcroft
On 21 October 2014 14:02, Christophe Lyon christophe.l...@linaro.org wrote:
 This patch series is an updated version of the series I sent here:
 https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00022.html

 I addressed comments from Marcus and Richard, and decided to skip
 support for half-precision variants for the time being. I'll post
 dedicated patches later.

 Compared to v2:
 - the directory containing the new tests is named
   gcc.target/aarch64/adv-simd instead of
   gcc.target/aarch64/neon-intrinsics.
 - the driver is named adv-simd.exp instead of neon-intrinsics.exp
 - the driver is guarded against the new test parallelization framework
 - the README file uses 'Advanced SIMD (Neon)' instead of 'Neon'


Thank you Christophe.  Please commit all 21 patches in the series.

/Marcus


[AARCH64, Question] Does AARCH64 GCC support long calls?

2014-10-24 Thread Yangfei (Felix)
Hi,

I find that the -mlong-calls option is not there for AARCH64. So can this 
port generate long calls? 
Any plan on this option? I would like to have a try on this if it's missing 
:-) 
Thanks. 


Re: [PATCHv4] Enable -fsanitize-recover for KASan

2014-10-24 Thread Yury Gribov

On 10/23/2014 03:10 PM, Andrey Ryabinin wrote:

On 10/23/2014 02:38 PM, Jakub Jelinek wrote:

On Thu, Oct 23, 2014 at 02:33:42PM +0400, Yury Gribov wrote:

Actually this is a historical artifact.  If inlining proves to be
significantly faster, they may want to switch.


Ok.


So, at that point you can include your ugly hacks in __asan_load* logic in
the kernel, the difference between __asan_load4 and __asan_load4_noabort
will be just that the latter will always return, while the former will not
if some error has been reported.
All the __asan_load* and __asan_store* entrypoints, regardless of
-f{,no-}sanitize-recover=kernel-address are by definition not noreturn, they
in the common case (if the code is not buggy) return.


Perhaps we should just keep __asan_load* as is and leave the decision
whether to abort or continue for the runtime?  This would make semantics of
-fsanitize-recover cumbersome though (because it wouldn't work if user
selects outline instrumentation).


Well, the don't ever report anything while some per-CPU flag is set thing
can be considered as part of the is this memory access ok test, it is
pretending everything is accessible.

But, otherwise, if it is supposed to be developer's decision at compile
time, __asan_load*_noabort should better always continue, even if it
reported issues, and __asan_load* should better not return after reporting
errors.



True, but why we need new functions for that.
__asan_load could also abort or not depending on what user/developer wants.
Why we have to rebuild the entire kernel if someone wants to switch from abort 
to noabort?

I'm not against __asan_load_noabort, I'm just saying that this is no point to 
have separate
__asan_load/__asan_load_noabort functions in kernel.


I'd still suggest to emit __asan_load_noabort so that we match userspace 
(where __asan_load strictly matches __asan_report in terminating the 
program).  Behavior of __asan_load_noabort can further be restricted by 
user via various environment settings (kernel parameters, /proc, etc.).


@Dmitry: what's your opinion on this?

-Y



Re: [PATCH PR63173] [AARCH64, NEON] Improve vld[234](q?)_dup intrinsics

2014-10-24 Thread Marcus Shawcroft
On 24 October 2014 03:21, Yangfei (Felix) felix.y...@huawei.com wrote:

 Thanks for the comments. I updated the patch with the intrinsic moved to its 
 place.
 Attached please find the new version of the patch.
 OK for the trunk?


 Index: gcc/ChangeLog
 ===
 --- gcc/ChangeLog   (revision 216558)
 +++ gcc/ChangeLog   (working copy)
 @@ -1,3 +1,39 @@
 +2014-10-23  Felix Yang  felix.y...@huawei.com
 +   Jiji Jiang jiangj...@huawei.com

Double space before .  Otherwise OK.
Thanks /Marcus


Re: [PATCH] c++ify sreal

2014-10-24 Thread Richard Biener
On Fri, Oct 24, 2014 at 8:28 AM,  tsaund...@mozilla.com wrote:
 From: Trevor Saunders tsaund...@mozilla.com

 Hi,

 do $subject, and cleanup for always 64 bit hwi.


 bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

Ok.  Can you please replace remaining HOST_WIDE_INT
vestiges in there with [u]int64_t please?

Thanks,
Richard.

 Trev

 gcc/ChangeLog:

 2014-10-24  Trevor Saunders  tsaund...@mozilla.com

 * ipa-inline.c (edge_badness): Adjust.
 (inline_small_functions): Likewise.
 * predict.c (propagate_freq): Likewise.
 (estimate_bb_frequencies): Likewise.
 * sreal.c (sreal::dump): Rename from dump_sreal.
 (debug): Adjust.
 (copy): Remove function.
 (sreal::shift_right): Rename from sreal_sift_right.
 (sreal::normalize): Rename from normalize.
 (sreal_init): Remove function.
 (sreal::to_int): Rename from sreal_to_int.
 (sreal_compare): Remove function.
 (sreal::operator+): Rename from sreal_add.
 (sreal::operator-): Rename from sreal_sub.
 (sreal::operator*): Rename from sreal_mul.
 (sreal::operator/): Rename from sreal_div.
 * sreal.h (class sreal): Adjust.
 (inline sreal operator+=): New operator.
 (inline sreal operator-=): Likewise.
 (inline sreal operator/=): Likewise.
 (inline sreal operator*=): Likewise.
 (inline bool operator!=): Likewise.
 (inline bool operator>): Likewise.
 (inline bool operator>=): Likewise.
 (inline bool operator<=): Likewise.
 ---
  gcc/ipa-inline.c |  25 ++-
  gcc/predict.c|  82 --
  gcc/sreal.c  | 479 
 +++
  gcc/sreal.h  |  97 ---
  4 files changed, 213 insertions(+), 470 deletions(-)

 diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
 index e79a4dd..cca1fb3 100644
 --- a/gcc/ipa-inline.c
 +++ b/gcc/ipa-inline.c
 @@ -939,29 +939,28 @@ edge_badness (struct cgraph_edge *edge, bool dump)

else if (max_count)
  {
 -  sreal tmp, relbenefit_real, growth_real;
int relbenefit = relative_time_benefit (callee_info, edge, edge_time);
    /* Capping edge->count to max_count. edge->count can be larger than
  max_count if an inline adds new edges which increase max_count
  after max_count is computed.  */
    gcov_type edge_count = edge->count > max_count ? max_count :
 edge->count;

 -  sreal_init (&relbenefit_real, relbenefit, 0);
 -  sreal_init (&growth_real, growth, 0);
 +  sreal relbenefit_real (relbenefit, 0);
 +  sreal growth_real (growth, 0);

/* relative_edge_count.  */
 -  sreal_init (&tmp, edge_count, 0);
 -  sreal_div (&tmp, &tmp, &max_count_real);
 +  sreal tmp (edge_count, 0);
 +  tmp /= max_count_real;

/* relative_time_benefit.  */
 -  sreal_mul (&tmp, &tmp, &relbenefit_real);
 -  sreal_div (&tmp, &tmp, &max_relbenefit_real);
 +  tmp *= relbenefit_real;
 +  tmp /= max_relbenefit_real;

/* growth_f_caller.  */
 -  sreal_mul (&tmp, &tmp, &half_int_min_real);
 -  sreal_div (&tmp, &tmp, &growth_real);
 +  tmp *= half_int_min_real;
 +  tmp /=  growth_real;

 -  badness = -1 * sreal_to_int (&tmp);
 +  badness = -1 * tmp.to_int ();

if (dump)
 {
 @@ -1604,9 +1603,9 @@ inline_small_functions (void)
    if (max_count < edge->count)
  max_count = edge->count;
}
 -  sreal_init (&max_count_real, max_count, 0);
 -  sreal_init (&max_relbenefit_real, RELATIVE_TIME_BENEFIT_RANGE, 0);
 -  sreal_init (&half_int_min_real, INT_MAX / 2, 0);
 +  max_count_real = sreal (max_count, 0);
 +  max_relbenefit_real = sreal (RELATIVE_TIME_BENEFIT_RANGE, 0);
 +  half_int_min_real = sreal (INT_MAX / 2, 0);
ipa_free_postorder_info ();
initialize_growth_caches ();

 diff --git a/gcc/predict.c b/gcc/predict.c
 index 5f5d4a5..10675c3 100644
 --- a/gcc/predict.c
 +++ b/gcc/predict.c
 @@ -2571,15 +2571,13 @@ propagate_freq (basic_block head, bitmap tovisit)
 bb->count = bb->frequency = 0;
  }

 -  memcpy (&BLOCK_INFO (head)->frequency, &real_one, sizeof (real_one));
 +  BLOCK_INFO (head)->frequency = real_one;
last = head;
for (bb = head; bb; bb = nextbb)
  {
edge_iterator ei;
 -  sreal cyclic_probability, frequency;
 -
 -  memcpy (&cyclic_probability, &real_zero, sizeof (real_zero));
 -  memcpy (&frequency, &real_zero, sizeof (real_zero));
 +  sreal cyclic_probability = real_zero;
 +  sreal frequency = real_zero;

 nextbb = BLOCK_INFO (bb)->next;
 BLOCK_INFO (bb)->next = NULL;
 @@ -2596,42 +2594,34 @@ propagate_freq (basic_block head, bitmap tovisit)
   FOR_EACH_EDGE (e, ei, bb->preds)
 if (EDGE_INFO (e)->back_edge)
   {
 -   sreal_add (&cyclic_probability, &cyclic_probability,
 -  &EDGE_INFO (e)->back_edge_prob);
 +   cyclic_probability += 

[committed] MAINTAINERS: add myself to write-after-approval list.

2014-10-24 Thread Daniel Hellstrom

2014-10-24  Daniel Hellstrom  dan...@gaisler.com

* MAINTAINERS (write-after-approval): Add myself.

Index: MAINTAINERS
===
--- MAINTAINERS(revision 216624)
+++ MAINTAINERS(working copy)
@@ -408,6 +408,7 @@
 Pat Haugenpthau...@us.ibm.com
 Mark Heffernanmeh...@google.com
 George Helffrichgeo...@gcc.gnu.org
+Daniel Hellstromdan...@gaisler.com
 Fergus Hendersonf...@cs.mu.oz.au
 Stuart Hendersonshend...@gcc.gnu.org
 Matthew Hillerhil...@redhat.com



Re: [libgomp, libiberty, libobjc] Fix gnu11 fallout on Solaris 10+

2014-10-24 Thread Rainer Orth
Richard Henderson r...@redhat.com writes:

 On 10/22/2014 04:43 AM, Rainer Orth wrote:
 The gnu11 patch broke Solaris 10 and 11 bootstrap: sys/feature_test.h
 has
 
 /*
  * It is invalid to compile an XPG3, XPG4, XPG4v2, or XPG5 application
  * using c99.  The same is true for POSIX.1-1990, POSIX.2-1992, POSIX.1b,
  * and POSIX.1c applications. Likewise, it is invalid to compile an XPG6
  * or a POSIX.1-2001 application with anything other than a c99 or later
  * compiler.  Therefore, we force an error in both cases.
  */
 #if defined(_STDC_C99) && (defined(__XOPEN_OR_POSIX) && !defined(_XPG6))
 #error "Compiler or options invalid for pre-UNIX 03 X/Open applications \
  and pre-2001 POSIX applications"
 #elif !defined(_STDC_C99) && \
  (defined(__XOPEN_OR_POSIX) && defined(_XPG6))
 #error "Compiler or options invalid; UNIX 03 and POSIX.1-2001 applications \
  require the use of c99"
 #endif
 
 so the headers now error out for any definition of (say) _XOPEN_SOURCE < 600.

 Hum.

 Does it hurt us to simply update to 600 everywhere?  If we can, I'd prefer 
 that
 as a solution.  But if that causes more problems than it solves, I'm ok with
 this as a solution.

This certainly depends on what the oldest systems we still support are.
E.g. in Solaris 8 (no longer supported on mainline, just serving as
illustration) sys/feature_test.h we have for XPG5/UNIX 98:

#if (_XOPEN_SOURCE - 0 == 500)
#define _XPG5

Changing _XOPEN_SOURCE to 600 loses this completely.  Same on Solaris 9
(equally no longer supported), and at least some pre-XPG6 systems, I fear.
I cannot tell if we still support any of those, though.

This might not even an issue for the cases as hand: e.g. even Solaris 8
pthread.h defines PTHREAD_MUTEX_RECURSIVE (the reason to define
_XOPEN_SOURCE in libgomp/config/posix/lock.c) unconditionally.

We might as well try and watch out for breakage, given that we are still
in stage1.

 I would like the comments updated to mention the reason for XPG6; just saying
 that Solaris requires it for C99 and later seems sufficient.

Sure, will do once we've decided which route to follow.

And even with the _XOPEN_SOURCE business out of the way, there's still
the question what to do about _POSIX_SOURCE in libiberty/sigsetmask.c.
Given that it was introduced almost 20 years ago, it may well be
unnecessary for the systems we still care about.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH][match-and-simplify] Fix failed init of force_no_side_effects

2014-10-24 Thread Richard Biener

Committed.

Richard.

2014-10-24  Richard Biener  rguent...@suse.de

* genmatch.c (capture_info::capture_info): Initialize
force_no_side_effects.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 216594)
+++ gcc/genmatch.c  (working copy)
@@ -1902,6 +1902,7 @@ capture_info::capture_info (simplify *s)
   return;
 }
 
+  force_no_side_effects = 0;
   info.safe_grow_cleared (s->capture_max + 1);
   e = as_a <expr *> (s->match);
   for (unsigned i = 0; i < e->ops.length (); ++i)


[PATCH, alpha]: Fix c11 fallout in alpha testsuite

2014-10-24 Thread Uros Bizjak
2014-10-24  Uros Bizjak  ubiz...@gmail.com

* gcc.target/alpha/980217-1.c (main): Fix implicit int.
* gcc.target/alpha/pr19518.c (_mm_setzero_si64): New function.

Tested on alphaev68-linux-gnu and committed to mainline SVN.

Uros.
Index: gcc.target/alpha/980217-1.c
===
--- gcc.target/alpha/980217-1.c (revision 216611)
+++ gcc.target/alpha/980217-1.c (working copy)
@@ -26,7 +26,7 @@
   return ~(t >> 31) & (1 - ((ix & 0x8000) >> 30));
 }
 
-main ()
+int main ()
 {
   float x = 1.0 / 0.0;
   int i = isinff (x);
Index: gcc.target/alpha/pr19518.c
===
--- gcc.target/alpha/pr19518.c  (revision 216611)
+++ gcc.target/alpha/pr19518.c  (working copy)
@@ -9,6 +9,10 @@
 typedef int FbStride;
 typedef unsigned long __m64;
 extern __m64 load (__m64);
+static __inline __m64 _mm_setzero_si64(void)
+{
+  return (__m64)0L;
+}
 static __inline __m64 _mm_adds_pu8(__m64 __m1, __m64 __m2)
 {
 return __m1 + __builtin_alpha_minsb8(__m2, ~__m1);


Re: avoid alignment of static variables affecting stack's

2014-10-24 Thread Jan Beulich
 On 23.10.14 at 20:13, l...@redhat.com wrote:
 On 10/23/14 01:09, Jan Beulich wrote:
 On 23.10.14 at 08:50, ja...@redhat.com wrote:
 On Thu, Oct 23, 2014 at 07:30:27AM +0100, Jan Beulich wrote:
 Function (or more narrow) scope static variables (as well as others not
 placed on the stack) should also not have any effect on the stack
 alignment. I noticed the issue first with Linux'es dynamic_pr_debug()
 construct using an 8-byte aligned sub-file-scope local variable.

 According to my checking bad behavior started with 4.6.x (4.5.3 was
 still okay), but generated code got quite a bit worse as of 4.9.0.

 If the static/external var has BLKmode, then perhaps it is safe, but I
 wonder about other vars, say vectors etc.  Such vars are most likely
 loaded from their memory location, and if for some reason that needs to be
 spilled again, stack realignment would not be able to do that.
 Or do we inspect the IL and for any pseudos with modes needing larger
 alignment we adjust the dynamic stack realignment fields?

 I don't know, but it would seem to me that this ought to happen
 anyway: If the pseudo holds the result of some computation
 other than a simple load from memory and needs spilling, the same
 would apply afaict.
 
 For something in static storage, this seems OK.  However, I think a hard 
 register variable ought to be left alone -- even if we can't spill it to 
 a stack slot today, there's a reasonable chance we might add that 
 capability in the future.

Hmm, but then wouldn't it need to be the code generating the spill
that's responsible for enforcing suitable alignment? I can certainly
re-submit without the hard register special cased (as it would still
fix the original issue I'm seeing), but it feels wrong to do so.

Jan



Re: Move loop peeling from RTL to gimple

2014-10-24 Thread Uros Bizjak
Hello!

 Hi,
 this is update of my 2013 update to 2012 patch to move rtl loop peeling
 to tree level. This is to expose optimization oppurtunities earlier.
 Incrementally I think I can also improve profiling to provide a histogram
 on loop iterations and get more sensible peeling decisions.

 profiled-bootstrapped/regtested x86_64-linux, OK?

 Ok.

 Thanks,
 Richard.

 Honza

 * loop-unroll.c: (decide_unrolling_and_peeling): Rename to
 (decide_unrolling): ... this one.
 (peel_loops_completely): Remove.
 (decide_peel_simple): Remove.
 (decide_peel_once_rolling): Remove.
 (decide_peel_completely): Remove.
 (peel_loop_simple): Remove.
 (peel_loop_completely): Remove.
 (unroll_and_peel_loops): Rename to ...
 (unroll_loops): ... this one; handle only unrolling.
 * cfgloop.h (lpt_dec): Remove LPT_PEEL_COMPLETELY and
 LPT_PEEL_SIMPLE.
 (UAP_PEEL): Remove.
 (unroll_and_peel_loops): Remove.
 (unroll_loops): New.
 * passes.def: Replace
 pass_rtl_unroll_and_peel_loops by pass_rtl_unroll_loops.
 * loop-init.c (gate_rtl_unroll_and_peel_loops,
 rtl_unroll_and_peel_loops): Rename to ...
 (gate_rtl_unroll_loops, rtl_unroll_loops): ... these; update.
 (pass_rtl_unroll_and_peel_loops): Rename to ...
 (pass_rtl_unroll_loops): ... this one.
 * tree-pass.h (make_pass_rtl_unroll_and_peel_loops): Remove.
 (make_pass_rtl_unroll_loops): New.
 * tree-ssa-loop-ivcanon.c: (estimated_peeled_sequence_size, try_peel_loop): 
 New.
 (canonicalize_loop_induction_variables): Update.

 * gcc.dg/tree-prof/peel-1.c: Update.
 * gcc.dg/tree-prof/unroll-1.c: Update.
 * gcc.dg/gcc.dg/unroll_1.c: Update.
 * gcc.dg/gcc.dg/unroll_2.c: Update.
 * gcc.dg/gcc.dg/unroll_3.c: Update.
 * gcc.dg/gcc.dg/unroll_4.c: Update.

This patch caused:

UNRESOLVED: gcc.dg/tree-prof/peel-1.c scan-rtl-dump loop2_unroll "Considering simply peeling loop"

The gcc.dg/tree-prof/peel-1.c testcase needs further updates in dump
scan and cleanup dg-directives.

Uros.


Re: avoid alignment of static variables affecting stack's

2014-10-24 Thread Richard Biener
On Fri, Oct 24, 2014 at 11:01 AM, Jan Beulich jbeul...@suse.com wrote:
 On 23.10.14 at 20:13, l...@redhat.com wrote:
 On 10/23/14 01:09, Jan Beulich wrote:
 On 23.10.14 at 08:50, ja...@redhat.com wrote:
 On Thu, Oct 23, 2014 at 07:30:27AM +0100, Jan Beulich wrote:
 Function (or more narrow) scope static variables (as well as others not
 placed on the stack) should also not have any effect on the stack
 alignment. I noticed the issue first with Linux'es dynamic_pr_debug()
 construct using an 8-byte aligned sub-file-scope local variable.

 According to my checking bad behavior started with 4.6.x (4.5.3 was
 still okay), but generated code got quite a bit worse as of 4.9.0.

 If the static/external var has BLKmode, then perhaps it is safe, but I
 wonder about other vars, say vectors etc.  Such vars are most likely
 loaded from their memory location, and if for some reason that needs to be
 spilled again, stack realignment would not be able to do that.
 Or do we inspect the IL and for any pseudos with modes needing larger
 alignment we adjust the dynamic stack realignment fields?

 I don't know, but it would seem to me that this ought to happen
 anyway: If the pseudo holds the result of some computation
 other than a simple load from memory and needs spilling, the same
 would apply afaict.

 For something in static storage, this seems OK.  However, I think a hard
 register variable ought to be left alone -- even if we can't spill it to
 a stack slot today, there's a reasonable chance we might add that
 capability in the future.

 Hmm, but then wouldn't it need to be the code generating the spill
 that's responsible for enforcing suitable alignment? I can certainly
 re-submit without the hard register special cased (as it would still
 fix the original issue I'm seeing), but it feels wrong to do so.

Yes, ISTR the spilling code is supposed to update the required
stack alignment.  After all the RA decision might affect required
alignment of spills.

Richard.

 Jan



Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.

2014-10-24 Thread Richard Biener
On Tue, Oct 21, 2014 at 4:34 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 In my initial design I did such splitting but before start real
 if-conversion but I decided to not perform it since code size for
 if-converted loop is growing (number of phi nodes is increased). It is
 worth noting also that for phi with #nodes  2 we need to get all
 predicates (except one) to do phi-predication and it means that block
 containing such phi can have only 1 critical edge.

Can you point me to the patch with the special insertion code then?
I definitely want to avoid the mess we ran into with the reassoc
code clever insertion code.

Richard.

 Thanks.
 Yuri.

 2014-10-21 18:19 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 I saw the sources of these functions, but I can't understand why I
 should use something else? Note that all predicate computations are
 located in basic blocks ( by design of if-conv) and there is special
 function that put these computations in bb
 (insert_gimplified_predicates). Edge contains only predicate not its
 computations. New function - find_insertion_point() does very simple
 search - it finds out the latest (in current bb) operand def-stmt of
 predicates taken from all incoming edges.
 In original algorithm the predicate of non-critical edge is taken to
 perform phi-node predication since for critical edge it does not work
 properly.

 My question is: does your comments mean that I should re-design my 
 extensions?

 Well, we have infrastructure for inserting code on edges and you've
 made critical edges predicated correctly.  So why re-invent the wheel?
 I realize this is very similar to my initial suggestion to simply split
 critical edges in loops you want to if-convert but delays splitting
 until it turns out to be necessary (which might be good for the
 !force_vect case).

 For edge predicates you simply can emit their computation on the
 edge, no?

 Btw, I very originally suggested to rework if-conversion to only
 record edge predicates - having both block and edge predicates
 somewhat complicates the code and makes it harder to
 maintain (thus also the suggestion to simply split critical edges
 if necessary to make BB predicates work always).

 Your patches add a lot of code and to me it seems we can avoid
 doing so much special casing.

 For example attacking the critical edge issue by a simple

 Index: tree-if-conv.c
 ===
 --- tree-if-conv.c  (revision 216508)
 +++ tree-if-conv.c  (working copy)
 @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
  if (EDGE_COUNT (e->src->succs) == 1)
   found = true;
if (!found)
 -   {
 - if (dump_file && (dump_flags & TDF_DETAILS))
 -   fprintf (dump_file, only critical predecessors\n);
 - return false;
 -   }
 +   split_edge (EDGE_PRED (bb, 0));
  }

return true;

 it changes the number of blocks in the loop, so
 get_loop_body_in_if_conv_order should probably be re-done with the
 above eventually signalling that it created a new block.  Or the above
 should populate a vector of edges to split and do that after the
 loop calling if_convertible_bb_p.

 Richard.

 Richard.

 Thanks.
 Yuri.

 BTW Jeff did initial review of my changes related to predicate
 computation for join blocks. I presented him updated patch with
 test-case and some minor changes in patch. But still did not get any
 feedback on it. Could you please take a look also on it?


 2014-10-21 17:38 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev ysrum...@gmail.com 
 wrote:
 Richard,

 Yes, This patch does not make sense since phi node predication for bb
 with critical incoming edges only performs another function which is
 absent (predicate_extended_scalar_phi).

 BTW I see that commit_edge_insertions() is used for rtx instructions
 only but you propose to use it for tree also.
 Did I miss something?

 Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
 if you want easy access to the newly created basic block to push
 the predicate to - see gsi_commit_edge_inserts implementation).

 Richard.

 Thanks ahead.


 2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com 
 wrote:
 Richard,

 I did some changes in patch and ChangeLog to mark that support for
 if-convert of blocks with only critical incoming edges will be added
 in the future (more precise in patch.4).

 But the same reasoning applies to this version of the patch when
 flag_force_vectorize is true!?  (insertion point and invalid SSA form)

 Which means the patch doesn't make sense in isolation?

 Btw, I think for the case you should simply 

Re: [PATCH] Simple improvement for predicate computation in if-convert phase.

2014-10-24 Thread Richard Biener
On Fri, Oct 17, 2014 at 3:08 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Jeff,

 I prepared another patch that includes test-case as you requested.

 Below are answers on your questions.

 First, for the benefit of anyone trying to understand what you're doing, 
 defining what cd equivalent means would be helpful.

 I added the following  comment to function:

We call basic blocks bb1 and bb2
cd-equivalent if they are executed under the same condition.


 Is it sufficient?

So, do you have a case where the dominated_by_p test above is true and 
is_predicated(bb) returns true as well?  I think this part of the change is 
largely responsible for the hack you're doing with having the function scoped 
static variable join_bb.

 I don't have such test-case and I assume that if bb is always
 executed, it is not predicated.

 I also deleted join_bb in my changes.


 Is it OK for trunk now.

Ok.

Thanks,
Richard.

 Thanks.
 Yuri.

 2014-10-17  Yuri Rumyantsev  ysrum...@gmail.com
 gcc/ChangeLog

 * tree-if-conv.c (add_to_predicate_list): Check unconditionally
 that bb is always executed to early exit. Use predicate of
 cd-equivalent block for join blocks if it exists.
 (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree.
 (tree_if_conversion): Free post-dominance information.

 gcc/testsuite/ChangeLog

 * gcc/dg/tree-ssa/ifc-cd.c: New test.



 2014-10-17 1:16 GMT+04:00 Jeff Law l...@redhat.com:
 On 10/16/14 05:52, Yuri Rumyantsev wrote:

 Hi All,

 Here is a simple enhancement for predicate computation in if-convert
 phase:

   We use notion of cd equivalence to get simpler predicate for
   join block, e.g. if join block has 2 predecessors with predicates
   p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
   p1 & p2 | p1 & !p2.

 Bootstrap and regression testing did not show any new failures.

 Is it OK for trunk?

 gcc/ChangeLog
 2014-10-16  Yuri Rumyantsevysrum...@gmail.com

 * tree-if-conv.c (add_to_predicate_list): Check unconditionally
 that bb is always executed to early exit. Use predicate of
 cd-equivalent block for join blocks if it exists.
 (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree.
 (tree_if_conversion): Free post-dominance information.

 First, for the benefit of anyone trying to understand what you're doing,
 defining what cd equivalent means would be helpful.




 if-conv.patch


 Index: tree-if-conv.c
 ===
 --- tree-if-conv.c  (revision 216217)
 +++ tree-if-conv.c  (working copy)
 @@ -396,25 +396,51 @@
   }

   /* Add condition NC to the predicate list of basic block BB.  LOOP is
 -   the loop to be if-converted.  */
 +   the loop to be if-converted. Use predicate of cd-equivalent block
 +   for join bb if it exists.  */

   static inline void
   add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
   {
 tree bc, *tp;
 +  basic_block dom_bb;
 +  static basic_block join_bb = NULL;

 if (is_true_predicate (nc))
   return;

 -  if (!is_predicated (bb))
 +  /* If dominance tells us this basic block is always executed,
 + don't record any predicates for it.  */
 +  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
 +return;

 So, do you have a case where the dominated_by_p test above is true and
 is_predicated(bb) returns true as well?  I think this part of the change is
 largely responsible for the hack you're doing with having the function
 scoped static variable join_bb.




 +
 +  /* If predicate has been already set up for given bb using
 cd-equivalent
 + block predicate, simply escape.  */
 +  if (join_bb == bb)
 +return;

 I *really* dislike the state you're carrying around via join_bb.

 ISTM that if you compute that there's an equivalence, then you just set the
 predicate for the equivalent block and the right things would have happened
 if you had not changed the test above.

 You also need a testcase.  It doesn't have to be extensive, but at least
 some basic smoke test to verify basic operation of this code.  It's
 perfectly fine to scan the debugging dumps for debug output.


 jeff




Re: avoid alignment of static variables affecting stack's

2014-10-24 Thread Jan Beulich
 On 24.10.14 at 11:10, richard.guent...@gmail.com wrote:
 On Fri, Oct 24, 2014 at 11:01 AM, Jan Beulich jbeul...@suse.com wrote:
 On 23.10.14 at 20:13, l...@redhat.com wrote:
 On 10/23/14 01:09, Jan Beulich wrote:
 On 23.10.14 at 08:50, ja...@redhat.com wrote:
 On Thu, Oct 23, 2014 at 07:30:27AM +0100, Jan Beulich wrote:
 Function (or more narrow) scope static variables (as well as others not
 placed on the stack) should also not have any effect on the stack
 alignment. I noticed the issue first with Linux'es dynamic_pr_debug()
 construct using an 8-byte aligned sub-file-scope local variable.

 According to my checking bad behavior started with 4.6.x (4.5.3 was
 still okay), but generated code got quite a bit worse as of 4.9.0.

 If the static/external var has BLKmode, then perhaps it is safe, but I
 wonder about other vars, say vectors etc.  Such vars are most likely
 loaded from their memory location, and if for some reason that needs to be
 spilled again, stack realignment would not be able to do that.
 Or do we inspect the IL and for any pseudos with modes needing larger
 alignment we adjust the dynamic stack realignment fields?

 I don't know, but it would seem to me that this ought to happen
 anyway: If the pseudo holds the result of some computation
 other than a simple load from memory and needs spilling, the same
 would apply afaict.

 For something in static storage, this seems OK.  However, I think a hard
 register variable ought to be left alone -- even if we can't spill it to
 a stack slot today, there's a reasonable chance we might add that
 capability in the future.

 Hmm, but then wouldn't it need to be the code generating the spill
 that's responsible for enforcing suitable alignment? I can certainly
 re-submit without the hard register special cased (as it would still
 fix the original issue I'm seeing), but it feels wrong to do so.
 
 Yes, ISTR the spilling code is supposed to update the required
 stack alignment.  After all the RA decision might affect required
 alignment of spills.

Thanks for confirming. So is the patch then okay to commit as is?

Jan



Re: avoid alignment of static variables affecting stack's

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 11:10:08AM +0200, Richard Biener wrote:
  For something in static storage, this seems OK.  However, I think a hard
  register variable ought to be left alone -- even if we can't spill it to
  a stack slot today, there's a reasonable chance we might add that
  capability in the future.
 
  Hmm, but then wouldn't it need to be the code generating the spill
  that's responsible for enforcing suitable alignment? I can certainly
  re-submit without the hard register special cased (as it would still
  fix the original issue I'm seeing), but it feels wrong to do so.
 
 Yes, ISTR the spilling code is supposed to update the required
 stack alignment.  After all the RA decision might affect required
 alignment of spills.

From what I remember, at RA time you already have to know conservatively
that you'll want to do dynamic stack realignment and what the highest needed
alignment will be, so various parts of expansion etc. conservatively compute
what will be needed.  I think that is because you e.g. need to reserve some
registers (vDRAP, etc.) if doing dynamic realignment.
If you conservatively assume you'll need dynamic stack realignment and after
RA you find you really don't need it, there are some optimizations in
prologue threading where it attempts to at least decrease amount of
unnecessary code, but the harm has already been done.

Might be that with LRA perhaps this could be changed and not conservatively
assume more alignment than proven to be needed, but such code isn't there I
think.

Jakub
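To make the issue under discussion concrete, here is a hypothetical reduction of the dynamic_pr_debug() pattern that started this thread: a function-scope static with 8-byte alignment. The struct and names below are illustrative stand-ins, not taken from the kernel; the point is that before the proposed patch such a DECL's alignment leaked into the containing function's stack-alignment requirement on 32-bit targets even though the variable never lives on the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: a struct standing in for the kernel's
   _ddebug descriptor used by dynamic_pr_debug().  */
struct descriptor { long long key; };

int uses_aligned_static(void)
{
    /* The variable lives in static storage, so only its own section
       needs 8-byte alignment -- the caller's stack frame should not
       be dynamically realigned because of it.  */
    static const struct descriptor d __attribute__((aligned(8))) = { 42 };
    return (uintptr_t)&d % 8 == 0;
}
```

On a 32-bit target with 4-byte preferred stack boundary, one would expect the unpatched compiler to emit dynamic-realignment prologue code for a function like this; with the patch, the static's alignment should no longer count toward the frame's requirement.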


Re: [testsuite] Fix gcc.dg/ipa/ipa-icf-2[18].c on Solaris

2014-10-24 Thread Rainer Orth
Jan Hubicka hubi...@ucw.cz writes:

 I had a terrible time figuring out what march value
 i686-unknown-linux-gnu uses; only in gcc -v it's possible to see that it
 uses pentiumpro, unlike pentium4 on Solaris/x86, which explains the
 difference.

 What is the reason to use pentium4 for Solaris/x86? Is it the -march setting or
 -mtune? For -mtune it is a terrible choice.

This was changed in

http://gcc.gnu.org/ml/gcc-patches/2010-03/msg01483.html
http://gcc.gnu.org/ml/gcc-patches/2010-04/msg00254.html

It's with_arch_32=pentium4 and with_tune_32=generic now.

The latest Oracle Studio compilers default to -xarch=sse2, which is

  sse2  May use 386, MMX, Pentium_pro, SSE and SSE2
instructions.

The pentium4 over pentiumpro choice may well be the result of a
misunderstanding: as I said, the code in config.gcc setting with_arch*
is almost impossible to follow.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [match-and-simplify] add new lower function

2014-10-24 Thread Richard Biener
On Thu, Oct 23, 2014 at 11:22 AM, Prathamesh Kulkarni
bilbotheelffri...@gmail.com wrote:
 Instead of iterating each time to call a lowering function
 (lower_opt_convert, etc.),
 add a new lower function and pass lower_opt_convert, lower_for, etc. to
 it as a callback.

I don't think this is more readable.

Thanks,
Richard.

 * genmatch.c (lower): New overloaded function.
 (lower): Adjust to call overloaded lower.

 Thanks,
 Prathamesh


Re: [PATCH] Fix and improve avx2 broadcasts (PR target/63594)

2014-10-24 Thread Rainer Orth
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

 Jakub Jelinek ja...@redhat.com writes:

 On Thu, Oct 23, 2014 at 02:58:06PM +0200, Rainer Orth wrote:
 Unfortunately, I see some problems with those tests on Solaris:
 
 * On Solaris/x86, I get
 
 FAIL: gcc.dg/pr63594-2.c execution test
 
   for 32-bit.  Any particular reason to restrict -mno-mmx to Linux/x86?
   Manually building the testcase with -mno-mmx on Solaris/x86 seems to
   cure the failure.

 No reason, probably finger memory without lots of thinking.
 The reason for -mno-mmx is that the functions use floating point vectors
 and scalar floating point arithmetics in the same function.
 Feel free to change both pr63594-{1,2}.c with s/linux//g .

 Ok, will do and commit after Linux and Solaris testing.

Here's what I've checked in after i686-unknown-linux-gnu,
x86_64-unknown-linux-gnu, and i386-pc-solaris2.11 testing:

2014-10-24  Rainer Orth  r...@cebitec.uni-bielefeld.de

   * gcc.dg/pr63594-1.c: Apply -mno-mmx to all i?86-*-* and x86_64-*-*
   targets.
   * gcc.dg/pr63594-2.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/pr63594-1.c b/gcc/testsuite/gcc.dg/pr63594-1.c
--- a/gcc/testsuite/gcc.dg/pr63594-1.c
+++ b/gcc/testsuite/gcc.dg/pr63594-1.c
@@ -1,7 +1,7 @@
 /* PR target/63594 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -Wno-psabi" } */
-/* { dg-additional-options "-mno-mmx" { target i?86-*-linux* x86_64-*-linux* } } */
+/* { dg-additional-options "-mno-mmx" { target i?86-*-* x86_64-*-* } } */
 
 #define C1 c
 #define C2 C1, C1
diff --git a/gcc/testsuite/gcc.dg/pr63594-2.c b/gcc/testsuite/gcc.dg/pr63594-2.c
--- a/gcc/testsuite/gcc.dg/pr63594-2.c
+++ b/gcc/testsuite/gcc.dg/pr63594-2.c
@@ -1,7 +1,7 @@
 /* PR target/63594 */
 /* { dg-do run } */
 /* { dg-options "-O2 -Wno-psabi" } */
-/* { dg-additional-options "-mno-mmx" { target i?86-*-linux* x86_64-*-linux* } } */
+/* { dg-additional-options "-mno-mmx" { target i?86-*-* x86_64-*-* } } */
 
 #define C1 c
 #define C2 C1, C1


 * On 64-bit Solaris/SPARC, I get
 
 FAIL: gcc.dg/pr63594-1.c (internal compiler error)
 FAIL: gcc.dg/pr63594-1.c (test for excess errors)
 
 /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c: In function
 'test1float1':
 /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c:19:1:
 internal compiler error: Bus Error
 /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c:57:1: note:
 in expansion of macro 'T'
 0x751c03 crash_signal
 /vol/gcc/src/hg/trunk/local/gcc/toplev.c:349
 0x44ffb4 gen_group_rtx(rtx_def*)
 /vol/gcc/src/hg/trunk/local/gcc/expr.c:1624
 0x4f8167 expand_function_start(tree_node*)
 /vol/gcc/src/hg/trunk/local/gcc/function.c:4803
 0x36278f execute
 /vol/gcc/src/hg/trunk/local/gcc/cfgexpand.c:5709

 Works fine on x86_64, and doesn't seem to be related to the fix in any way,
 it seems the ICE is related to returning or passing the vectors, so
 supposedly some latent Solaris/SPARC issue?

 Ok, I'll file a PR and Cc Eric.

This seems to be the same issue as PR target/61535.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Fix for PR63595

2014-10-24 Thread Martin Liška

Hello.

The following patch adds a comparison of PHI results in IPA ICF.
Bootstrap works on x86_64-pc-linux, no regression observed.

Ready for trunk?
Thanks,
Martin
gcc/testsuite/ChangeLog:

2014-10-24  Martin Liska  mli...@suse.cz

* gcc.dg/ipa/pr63595.c: New test.


gcc/ChangeLog:

2014-10-24  Martin Liska  mli...@suse.cz

* ipa-icf.c (sem_function::compare_phi_node): Add comparison of
PHI results.
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index d1238a4..7456fec 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -869,6 +869,12 @@ sem_function::compare_phi_node (basic_block bb1, basic_block bb2)
   phi1 = gsi_stmt (si1);
   phi2 = gsi_stmt (si2);
 
+  tree phi_result1 = gimple_phi_result (phi1);
+  tree phi_result2 = gimple_phi_result (phi2);
+
+  if (!m_checker->compare_operand (phi_result1, phi_result2))
+	return return_false_with_msg ("PHI results are different");
+
   size1 = gimple_phi_num_args (phi1);
   size2 = gimple_phi_num_args (phi2);
 
diff --git a/gcc/testsuite/gcc.dg/ipa/pr63595.c b/gcc/testsuite/gcc.dg/ipa/pr63595.c
new file mode 100644
index 000..52851fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr63595.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-icf-details" } */
+
+typedef int size_t;
+
+typedef struct TypHeader {
+    unsigned long size;
+    struct TypHeader * * ptr;
+    char name[3];
+    unsigned char type;
+} * TypHandle;
+
+__attribute__((noinline))
+static TypHandle Error(const char *str, unsigned long l1, unsigned long l2)
+{
+  return 0;
+}
+
+extern TypHandle (* EvTab[81]) ( TypHandle hd );
+extern TypHandle (*TabProd[28][28]) ( TypHandle, TypHandle );
+
+__attribute__((noinline))
+TypHandle FunOnRight (TypHandle hdCall)
+{
+    TypHandle hdRes;
+    TypHandle hdPnt;
+    TypHandle hdElm;
+
+
+    if ( ((hdCall)->size) != 3*((size_t)sizeof(TypHandle)) )
+        return Error("",0L,0L);
+    hdPnt = ((long)(((TypHandle*)((hdCall)->ptr))[1])&1 ? (((TypHandle*)((hdCall)->ptr))[1]) : (* EvTab[(((long)(((TypHandle*)((hdCall)->ptr))[1]) & 1) ? 1 : ((((TypHandle*)((hdCall)->ptr))[1])->type))])((((TypHandle*)((hdCall)->ptr))[1])));
+    hdElm = ((long)(((TypHandle*)((hdCall)->ptr))[2])&1 ? (((TypHandle*)((hdCall)->ptr))[2]) : (* EvTab[(((long)(((TypHandle*)((hdCall)->ptr))[2]) & 1) ? 1 : ((((TypHandle*)((hdCall)->ptr))[2])->type))])((((TypHandle*)((hdCall)->ptr))[2])));
+
+
+    hdRes = ((*TabProd[(((long)(hdPnt) & 1) ? 1 : ((hdPnt)->type))][(((long)(hdElm) & 1) ? 1 : ((hdElm)->type))])((hdPnt),(hdElm)));
+    return hdRes;
+}
+
+__attribute__((noinline))
+TypHandle FunOnLeft (TypHandle hdCall)
+{
+    TypHandle hdRes;
+    TypHandle hdPnt;
+    TypHandle hdElm;
+
+
+    if ( ((hdCall)->size) != 3*((size_t)sizeof(TypHandle)) )
+        return Error("",0L,0L);
+    hdPnt = ((long)(((TypHandle*)((hdCall)->ptr))[1])&1 ? (((TypHandle*)((hdCall)->ptr))[1]) : (* EvTab[(((long)(((TypHandle*)((hdCall)->ptr))[1]) & 1) ? 1 : ((((TypHandle*)((hdCall)->ptr))[1])->type))])((((TypHandle*)((hdCall)->ptr))[1])));
+    hdElm = ((long)(((TypHandle*)((hdCall)->ptr))[2])&1 ? (((TypHandle*)((hdCall)->ptr))[2]) : (* EvTab[(((long)(((TypHandle*)((hdCall)->ptr))[2]) & 1) ? 1 : ((((TypHandle*)((hdCall)->ptr))[2])->type))])((((TypHandle*)((hdCall)->ptr))[2])));
+
+
+    hdRes = ((*TabProd[(((long)(hdElm) & 1) ? 1 : ((hdElm)->type))][(((long)(hdPnt) & 1) ? 1 : ((hdPnt)->type))])((hdElm),(hdPnt)));
+    return hdRes;
+}
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-ipa-dump "Equal symbols: 0" "icf" } } */
+/* { dg-final { scan-ipa-dump "PHI results are different" "icf" } } */
+/* { dg-final { cleanup-ipa-dump "icf" } } */


Re: [PATCHv4] Enable -fsanitize-recover for KASan

2014-10-24 Thread Dmitry Vyukov
On Fri, Oct 24, 2014 at 12:28 PM, Yury Gribov y.gri...@samsung.com wrote:
 On 10/23/2014 03:10 PM, Andrey Ryabinin wrote:

 On 10/23/2014 02:38 PM, Jakub Jelinek wrote:

 On Thu, Oct 23, 2014 at 02:33:42PM +0400, Yury Gribov wrote:

 Actually this is a historical artifact.  If inlining proves to be
 significantly faster, they may want to switch.


 Ok.

 So, at that point you can include your ugly hacks in __asan_load* logic
 in
 the kernel, the difference between __asan_load4 and
 __asan_load4_noabort
 will be just that the latter will always return, while the former will
 not
 if some error has been reported.
 All the __asan_load* and __asan_store* entrypoints, regardless of
 -f{,no-}sanitize-recover=kernel-address are by definition not noreturn,
 they
 in the common case (if the code is not buggy) return.


 Perhaps we should just keep __asan_load* as is and leave the decision
 whether to abort or continue for the runtime?  This would make semantics
 of
 -fsanitize-recover cumbersome though (because it wouldn't work if user
 selects outline instrumentation).


 Well, the "don't ever report anything while some per-CPU flag is set"
 thing
 can be considered as part of the "is this memory access ok" test, it is
 pretending everything is accessible.

 But, otherwise, if it is supposed to be developer's decision at compile
 time, __asan_load*_noabort should better always continue, even if it
 reported issues, and __asan_load* should better not return after
 reporting
 errors.


 True, but why do we need new functions for that?
 __asan_load could also abort or not depending on what the user/developer
 wants.
 Why do we have to rebuild the entire kernel if someone wants to switch from
 abort to noabort?

 I'm not against __asan_load_noabort, I'm just saying that there is no point
 in having separate
 __asan_load/__asan_load_noabort functions in the kernel.


 I'd still suggest to emit __asan_load_noabort so that we match userspace
 (where __asan_load strictly matches __asan_report in terminating the
 program).  Behavior of __asan_load_noabort can further be restricted by user
 via various environment settings (kernel parameters, /proc, etc.).

 @Dmitry: what's your opinion on this?


I am somewhat lost in this thread and probably missing something.
But why do we need __asan_load (which is not noabort) at all? Outline
instrumentation is not the default mode for either user-space ASan or
KASan (at least in the envisioned future). I would expect that these
atypical cases that use outline instrumentation can also bear the
overhead of non-noreturn functions. Can we use just one version of
__asan_load and let the runtime decide on abort?
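For context, the inline/outline distinction being discussed can be sketched as follows. This is a toy model with invented names and a simplified 8-bytes-per-shadow-byte check, not the code GCC or the kernel actually emit:

```c
#include <assert.h>
#include <stddef.h>

static unsigned char mem[64];
static unsigned char shadow[64 / 8]; /* one shadow byte per 8 app bytes */
static int reports;                  /* counts diagnosed bad accesses */

static int poisoned(size_t off) { return shadow[off >> 3] != 0; }
static void report_bad_access(size_t off) { (void)off; reports++; }

/* Outline instrumentation: the compiler emits one call per access and
   the check lives inside the runtime (smaller code, slower).  */
unsigned char load_outline(size_t off)
{
    if (poisoned(off))            /* roughly what an __asan_load1 does */
        report_bad_access(off);   /* noabort flavour: report and return */
    return mem[off];
}

/* Inline instrumentation: the compiler open-codes the shadow test and
   only calls the report routine on the slow path (faster, bigger).  */
unsigned char load_inline(size_t off)
{
    if (shadow[off >> 3])         /* open-coded shadow check */
        report_bad_access(off);   /* stands in for an __asan_report_* call */
    return mem[off];
}
```

Either way the fast path returns normally; the thread is about who decides, and when, whether the slow path terminates the program.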


Re: [PATCH] Fix genmatch linking

2014-10-24 Thread Richard Biener
On Fri, 24 Oct 2014, Richard Biener wrote:

 On Fri, 24 Oct 2014, Rainer Orth wrote:
 
  Richard Biener rguent...@suse.de writes:
  
   On Thu, 23 Oct 2014, Richard Biener wrote:
  
   
   Final try for today.
  
   And this may work as well and is slightly simpler.
  
   Index: gcc/Makefile.in
   ===
   --- gcc/Makefile.in   (revision 216590)
   +++ gcc/Makefile.in   (working copy)
   @@ -981,7 +981,15 @@ else
LIBIBERTY = ../libiberty/libiberty.a
BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a
endif
   +# For stage1 and when cross-compiling use the build libcpp which is
   +# built with NLS disabled.  For stage2+ use the host library and
   +# its dependencies.
   +ifeq ($(build_objdir),$(build_libobjdir))
BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
   +else
   +BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV)
   +build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) 
   $(LIBICONV_DEP)
   +endif

# Dependencies on the intl and portability libraries.
LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) 
   $(LIBICONV_DEP) \
  
  
   Can you test it please?
  
  Sure: this version allowed an i386-pc-solaris2.10 bootstrap to complete
  just fine.
 
 Great.  Installed as follows.
 
 Richard.
 
 2014-10-24  Richard Biener  rguent...@suse.de
 
   * Makefile.in (BUILD_CPPLIB): When in stage2+ use the
   host library and make sure to pull in the required libintl
   and libiconv dependencies.
 
 Index: gcc/Makefile.in
 ===
 --- gcc/Makefile.in   (revision 216590)
 +++ gcc/Makefile.in   (working copy)
 @@ -981,7 +981,15 @@ else
  LIBIBERTY = ../libiberty/libiberty.a
  BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a
  endif
 +# For stage1 and when cross-compiling use the build libcpp which is
 +# built with NLS disabled.  For stage2+ use the host library and
 +# its dependencies.
 +ifeq ($(build_objdir),$(build_libobjdir))
  BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
 +else
 +BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV)
 +build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) 
 $(LIBICONV_DEP)
 +endif
  
  # Dependencies on the intl and portability libraries.
  LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) \

Dominique reported that this fails when using the system libiconv together
with the built (in-tree) libintl.

Which might be fixed by the following.  Does that still work for you?

Thanks,
Richard.

Index: gcc/Makefile.in
===
--- gcc/Makefile.in (revision 216626)
+++ gcc/Makefile.in (working copy)
@@ -981,15 +981,6 @@ else
 LIBIBERTY = ../libiberty/libiberty.a
 BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a
 endif
-# For stage1 and when cross-compiling use the build libcpp which is
-# built with NLS disabled.  For stage2+ use the host library and
-# its dependencies.
-ifeq ($(build_objdir),$(build_libobjdir))
-BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
-else
-BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV)
-build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP)
-endif
 
 # Dependencies on the intl and portability libraries.
 LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) \
@@ -2529,6 +2520,17 @@ genprog = $(genprogerr) check checksum c
 # These programs need libs over and above what they get from the above list.
 build/genautomata$(build_exeext) : BUILD_LIBS += -lm
 
+# For stage1 and when cross-compiling use the build libcpp which is
+# built with NLS disabled.  For stage2+ use the host library and
+# its dependencies.
+ifeq ($(build_objdir),$(build_libobjdir))
+BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
+else
+BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY)
+build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP)
+build/genmatch$(build_exeext): BUILD_LIBS += $(LIBINTL) $(LIBICONV)
+endif
+
 build/genmatch$(build_exeext) : $(BUILD_CPPLIB) \
   $(BUILD_ERRORS) build/vec.o build/hash-table.o
 


Re: [PATCHv4] Enable -fsanitize-recover for KASan

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 01:44:27PM +0400, Dmitry Vyukov wrote:
 I am somewhat lost in this thread and probably missing something.
 But why do we need __asan_load (which is not noabort) at all? Outline
 instrumentation is non a default mode for both user-space asan and
 kasan (at least in the envisioned future). I would expect that these
 non-typical cases that use outline instrumentation can also bear the
 overhead of non-noreturn functions. Can we use just one version of
 __asan_load and let runtime decide on abort?

__asan_load actually must never be noreturn, because in the common
case where the load is valid it of course returns.

Jakub


Re: avoid alignment of static variables affecting stack's

2014-10-24 Thread Richard Biener
On Fri, Oct 24, 2014 at 11:18 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Oct 24, 2014 at 11:10:08AM +0200, Richard Biener wrote:
  For something in static storage, this seems OK.  However, I think a hard
  register variable ought to be left alone -- even if we can't spill it to
  a stack slot today, there's a reasonable chance we might add that
  capability in the future.
 
  Hmm, but then wouldn't it need to be the code generating the spill
  that's responsible for enforcing suitable alignment? I can certainly
  re-submit without the hard register special cased (as it would still
  fix the original issue I'm seeing), but it feels wrong to do so.

 Yes, ISTR the spilling code is supposed to update the required
 stack alignment.  After all the RA decision might affect required
 alignment of spills.

 From what I remember, at RA time you already have to know conservatively
 that you'll want to do dynamic stack realignment and what the highest needed
 alignment will be, so various parts of expansion etc. conservatively compute
 what will be needed.  I think that is because you e.g. need to reserve some
 registers (vDRAP, etc.) if doing dynamic realignment.
 If you conservatively assume you'll need dynamic stack realignment and after
 RA you find you really don't need it, there are some optimizations in
 prologue threading where it attempts to at least decrease amount of
 unnecessary code, but the harm has already been done.

 Might be that with LRA perhaps this could be changed and not conservatively
 assume more alignment than proven to be needed, but such code isn't there I
 think.

I stand corrected then.

Richard.

 Jakub


Re: [PATCHv4] Enable -fsanitize-recover for KASan

2014-10-24 Thread Dmitry Vyukov
On Fri, Oct 24, 2014 at 1:50 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Oct 24, 2014 at 01:44:27PM +0400, Dmitry Vyukov wrote:
 I am somewhat lost in this thread and probably missing something.
 But why do we need __asan_load (which is not noabort) at all? Outline
 instrumentation is non a default mode for both user-space asan and
 kasan (at least in the envisioned future). I would expect that these
 non-typical cases that use outline instrumentation can also bear the
 overhead of non-noreturn functions. Can we use just one version of
 __asan_load and let runtime decide on abort?

 __asan_load actually must never be noreturn, because in the common
 case where the load is valid it of course returns.


Right!

Then I am puzzled by this message by Yury:

"I'd still suggest to emit __asan_load_noabort so that we match
userspace (where __asan_load strictly matches __asan_report in
terminating the program)"

Why are we discussing __asan_load_noabort?


Re: [PATCHv4] Enable -fsanitize-recover for KASan

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 11:50:58AM +0200, Jakub Jelinek wrote:
 On Fri, Oct 24, 2014 at 01:44:27PM +0400, Dmitry Vyukov wrote:
  I am somewhat lost in this thread and probably missing something.
  But why do we need __asan_load (which is not noabort) at all? Outline
  instrumentation is non a default mode for both user-space asan and
  kasan (at least in the envisioned future). I would expect that these
  non-typical cases that use outline instrumentation can also bear the
  overhead of non-noreturn functions. Can we use just one version of
  __asan_load and let runtime decide on abort?
 
 __asan_load actually must never be noreturn, because in the common
 case where the load is valid it of course returns.

The point of __asan_load*_noabort (vs. __asan_load*) and
__asan_report*_noabort (vs. __asan_report*) is to allow the choice
what is fatal and what is not fatal to be done at compile time,
per compilation unit.  For __asan_report* that is a must, as __asan_report*
without noabort is noreturn, for __asan_load* which is not noreturn
the implementation can of course choose not to make something fatal when it
wishes.  Without -fsanitize-recover={address,kernel-address} support,
the choice could be done only per program or kernel globally, without a way
for programmer to differentiate.

Jakub


Re: [PATCHv4] Enable -fsanitize-recover for KASan

2014-10-24 Thread Yury Gribov

On 10/24/2014 01:44 PM, Dmitry Vyukov wrote:

On Fri, Oct 24, 2014 at 12:28 PM, Yury Gribov y.gri...@samsung.com wrote:

On 10/23/2014 03:10 PM, Andrey Ryabinin wrote:


On 10/23/2014 02:38 PM, Jakub Jelinek wrote:


On Thu, Oct 23, 2014 at 02:33:42PM +0400, Yury Gribov wrote:


Actually this is a historical artifact.  If inlining proves to be
significantly faster, they may want to switch.



Ok.


So, at that point you can include your ugly hacks in __asan_load* logic
in
the kernel, the difference between __asan_load4 and
__asan_load4_noabort
will be just that the latter will always return, while the former will
not
if some error has been reported.
All the __asan_load* and __asan_store* entrypoints, regardless of
-f{,no-}sanitize-recover=kernel-address are by definition not noreturn,
they
in the common case (if the code is not buggy) return.



Perhaps we should just keep __asan_load* as is and leave the decision
whether to abort or continue for the runtime?  This would make semantics
of
-fsanitize-recover cumbersome though (because it wouldn't work if user
selects outline instrumentation).



Well, the don't ever report anything while some per-CPU flag is set
thing
can be considered as part of the is this memory access ok test, it is
pretending everything is accessible.

But, otherwise, if it is supposed to be developer's decision at compile
time, __asan_load*_noabort should better always continue, even if it
reported issues, and __asan_load* should better not return after
reporting
errors.



True, but why we need new functions for that.
__asan_load could also abort or not depending on what user/developer
wants.
Why we have to rebuild the entire kernel if someone wants to switch from
abort to noabort?

I'm not against __asan_load_noabort, I'm just saying that this is no point
to have separate
__asan_load/__asan_load_noabort functions in kernel.



I'd still suggest to emit __asan_load_noabort so that we match userspace
(where __asan_load strictly matches __asan_report in terminating the
program).  Behavior of __asan_load_noabort can further be restricted by user
via various environment settings (kernel parameters, /proc, etc.).

@Dmitry: what's your opinion on this?


I am somewhat lost in this thread and probably missing something.
But why do we need __asan_load (which is not noabort) at all? Outline
instrumentation is non a default mode for both user-space asan and
kasan (at least in the envisioned future).


Well, it's still enabled automatically if the number of memory accesses 
gets large enough (I think it was 7000?), so it is the default in a way.



I would expect that these
non-typical cases that use outline instrumentation can also bear the
overhead of non-noreturn functions.


As Jakub mentioned both __asan_load and __asan_load_noabort will _not_ 
be NORETURN.  Two different versions are necessary so that compiler can 
insert outline instrumentation that matches the -fsanitize-recover setting.


 Can we use just one version of

__asan_load and let runtime decide on abort?


In this case semantics of inline and outline instrumentation will differ 
(the former depending on -fsanitize-recover compile-time setting and the 
latter depending on runtime options) which may be undesirable given that 
compiler may automatically choose to switch from inline to outline 
depending on function size.


-Y



Re: [PATCH] Fix for PR63595

2014-10-24 Thread Richard Biener
On Fri, Oct 24, 2014 at 11:35 AM, Martin Liška mli...@suse.cz wrote:
 Hello.

 Following patch contains addition of PHI result comparison in IPA ICF.
 Boostrap works on x86_64-linux-pc, no regression observed.

 Ready for trunk?

Ok.

Thanks,
Richard.

 Thanks,
 Martin


Re: avoid alignment of static variables affecting stack's

2014-10-24 Thread Jan Beulich
 On 24.10.14 at 11:52, richard.guent...@gmail.com wrote:
 On Fri, Oct 24, 2014 at 11:18 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Oct 24, 2014 at 11:10:08AM +0200, Richard Biener wrote:
  For something in static storage, this seems OK.  However, I think a hard
  register variable ought to be left alone -- even if we can't spill it to
  a stack slot today, there's a reasonable chance we might add that
  capability in the future.
 
  Hmm, but then wouldn't it need to be the code generating the spill
  that's responsible for enforcing suitable alignment? I can certainly
  re-submit without the hard register special cased (as it would still
  fix the original issue I'm seeing), but it feels wrong to do so.

 Yes, ISTR the spilling code is supposed to update the required
 stack alignment.  After all the RA decision might affect required
 alignment of spills.

 From what I remember, at RA time you already have to know conservatively
 that you'll want to do dynamic stack realignment and what the highest needed
 alignment will be, so various parts of expansion etc. conservatively compute
 what will be needed.  I think that is because you e.g. need to reserve some
 registers (vDRAP, etc.) if doing dynamic realignment.
 If you conservatively assume you'll need dynamic stack realignment and after
 RA you find you really don't need it, there are some optimizations in
 prologue threading where it attempts to at least decrease amount of
 unnecessary code, but the harm has already been done.

 Might be that with LRA perhaps this could be changed and not conservatively
 assume more alignment than proven to be needed, but such code isn't there I
 think.
 
 I stand corrected then.

So am I to conclude then that I need to take out the hard register
check in order for this to be accepted?

Jan



Re: [PATCHv4] Enable -fsanitize-recover for KASan

2014-10-24 Thread Dmitry Vyukov
On Fri, Oct 24, 2014 at 1:59 PM, Yury Gribov y.gri...@samsung.com wrote:
 On 10/24/2014 01:44 PM, Dmitry Vyukov wrote:

 On Fri, Oct 24, 2014 at 12:28 PM, Yury Gribov y.gri...@samsung.com
 wrote:

 On 10/23/2014 03:10 PM, Andrey Ryabinin wrote:


 On 10/23/2014 02:38 PM, Jakub Jelinek wrote:


 On Thu, Oct 23, 2014 at 02:33:42PM +0400, Yury Gribov wrote:


 Actually this is a historical artifact.  If inlining proves to be
 significantly faster, they may want to switch.



 Ok.

 So, at that point you can include your ugly hacks in __asan_load*
 logic
 in
 the kernel, the difference between __asan_load4 and
 __asan_load4_noabort
 will be just that the latter will always return, while the former
 will
 not
 if some error has been reported.
 All the __asan_load* and __asan_store* entrypoints, regardless of
 -f{,no-}sanitize-recover=kernel-address are by definition not
 noreturn,
 they
 in the common case (if the code is not buggy) return.



 Perhaps we should just keep __asan_load* as is and leave the decision
 whether to abort or continue for the runtime?  This would make
 semantics
 of
 -fsanitize-recover cumbersome though (because it wouldn't work if user
 selects outline instrumentation).



 Well, the don't ever report anything while some per-CPU flag is set
 thing
 can be considered as part of the is this memory access ok test, it is
 pretending everything is accessible.

 But, otherwise, if it is supposed to be developer's decision at compile
 time, __asan_load*_noabort should better always continue, even if it
 reported issues, and __asan_load* should better not return after
 reporting
 errors.


 True, but why we need new functions for that.
 __asan_load could also abort or not depending on what user/developer
 wants.
 Why we have to rebuild the entire kernel if someone wants to switch from
 abort to noabort?

 I'm not against __asan_load_noabort, I'm just saying that this is no
 point
 to have separate
 __asan_load/__asan_load_noabort functions in kernel.



 I'd still suggest to emit __asan_load_noabort so that we match userspace
 (where __asan_load strictly matches __asan_report in terminating the
 program).  Behavior of __asan_load_noabort can further be restricted by
 user
 via various environment settings (kernel parameters, /proc, etc.).

 @Dmitry: what's your opinion on this?


 I am somewhat lost in this thread and probably missing something.
 But why do we need __asan_load (which is not noabort) at all? Outline
 instrumentation is non a default mode for both user-space asan and
 kasan (at least in the envisioned future).


 Well, it's still enabled automatically if number of memory accesses is
 getting large enough (I think it was 7000?) so it is default in a way.

 I would expect that these
 non-typical cases that use outline instrumentation can also bear the
 overhead of non-noreturn functions.


 As Jakub mentioned both __asan_load and __asan_load_noabort will _not_ be
 NORETURN.  Two different versions are necessary so that compiler can insert
 outline instrumentation that matches the -fsanitize-recover setting.

 Can we use just one version of

 __asan_load and let runtime decide on abort?


 In this case semantics of inline and outline instrumentation will differ
 (the former depending on -fsanitize-recover compile-time setting and the
 latter depending on runtime options) which may be undesirable given that
 compiler may automatically choose to switch from inline to outline depending
 on function size.


Ok, thank you, now I am on the same page.

Yury, I would expect that the inline instrumentation will become the
default in the kernel as well (it's 2 times faster, or maybe even more
if it happens to affect loop registration). Do you agree?
If we consider inline as default, then the user won't be able to
simply switch between abort/noabort with a runtime flag w/o rebuilding
the kernel, because for __asan_report* we have to have 2 versions.
I would also consider that abort/noabort is pretty persistent in a
particular testing environment -- you either use one or another and do
not switch frequently between them.

Taking that into account and the fact that __asan_load* can be emitted
due to call threshold, I am mildly in favor of Jakub's position of
making it all consistent between user/kernel and load/report and
explicit. To make it clear, I mean the runtime interface:
__asan_load* -- not-noreturn, always aborts at runtime on failure
__asan_load*_noabort -- not-noreturn, never aborts at runtime on failure
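A minimal sketch of that two-entry-point contract; is_poisoned() and report() are invented stand-ins, exit() stands in for the real abort path, and neither function is declared noreturn because both return on a good access:

```c
#include <assert.h>
#include <stdlib.h>

static int reported;
static int is_poisoned(unsigned long addr) { return addr == 0xbad; }
static void report(unsigned long addr) { (void)addr; reported++; }

/* Aborting flavour: reports and then terminates on failure.  */
void sketch_asan_load4(unsigned long addr)
{
    if (is_poisoned(addr)) {
        report(addr);
        exit(1);                  /* stand-in for abort() */
    }
}

/* Recovering (noabort) flavour: reports and keeps going.  */
void sketch_asan_load4_noabort(unsigned long addr)
{
    if (is_poisoned(addr))
        report(addr);
}
```

The choice between the two is baked in per compilation unit by -fsanitize-recover, which is exactly why switching behavior means rebuilding rather than flipping a runtime flag.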


Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.

2014-10-24 Thread Yuri Rumyantsev
Richard,

Patch containing new code related to extended predication is attached.
Here are a few comments which explain the main goals of the design.

1. I don't want to insert any critical edge splitting since it may
lead to less efficient binaries (I remember some performance issues
when we designed the lazy code motion algorithm in the SPARC compiler).
2. One special case of extended PHI node predication was introduced
for when #arguments is more than 2 but only two arguments are different
and one argument has the only occurrence. For such PHIs conditional
scalar reduction is applied.
This corresponds to the following:
if (q1 && q2 && q3) var++
A new function phi_has_two_different_args was introduced to detect such PHIs.
3. The original algorithm for PHI predication assumed that at
least one incoming edge of a block containing a PHI is non-critical - this
guarantees that all computations related to the predicate of a normal edge
are already inserted above this block and
code related to PHI predication can be inserted at the beginning of the
block. But this is not true for critical edges, for which the predicate
computations are in the block where the code for phi predication must be
inserted. So a new function find_insertion_point is introduced which
simply finds the last statement in the block defining the predicates
corresponding to all incoming edges, and inserts the phi predication code
after it (with some minor exceptions).

If you need more comments or something unclear will let me know.

Thanks.
Yuri.

ChangeLog:

2014-10-24  Yuri Rumyantsev  ysrum...@gmail.com

* tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
FLAG_FORCE_VECTORIZE instead of loop flag.
(if_convertible_bb_p): Allow bb to have more than 2 predecessors if
FLAG_FORCE_VECTORIZE is true.
(if_convertible_bb_p): Delete check that bb has at least one
non-critical incoming edge.
(phi_has_two_different_args): New function.
(is_cond_scalar_reduction): Add argument EXTENDED to choose access
to phi arguments. Invoke phi_has_two_different_args to get phi
arguments if EXTENDED is true. Change check that block
containing reduction statement candidate is predecessor
of phi-block since phi may have more than two arguments.
(convert_scalar_cond_reduction): Add argument BEFORE to insert
statement before/after gsi point.
(predicate_scalar_phi): Add argument false (which means non-extended
predication) to call of is_cond_scalar_reduction. Add argument
true (which corresponds to argument BEFORE) to call of
convert_scalar_cond_reduction.
(get_predicate_for_edge): New function.
(predicate_arbitrary_scalar_phi): New function.
(predicate_extended_scalar_phi): New function.
(find_insertion_point): New function.
(predicate_all_scalar_phis): Add two boolean variables EXTENDED and
BEFORE. Initialize EXTENDED to true if BB containing phi has more
than 2 predecessors or both incoming edges are critical. Invoke
find_phi_replacement_condition and predicate_scalar_phi or
find_insertion_point and predicate_extended_scalar_phi depending on
EXTENDED value.
(insert_gimplified_predicates): Add check that non-predicated block
may have statements to insert. Insert predicate of BB just after label
if FLAG_FORCE_VECTORIZE is true.
(tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
is copy of inner or outer loop field force_vectorize.


2014-10-24 13:12 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 4:34 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 In my initial design I did such splitting before starting the real
 if-conversion, but I decided not to perform it since the code size of the
 if-converted loop grows (the number of phi nodes is increased). It is
 worth noting also that for a phi with #nodes > 2 we need to get all
 predicates (except one) to do phi-predication and it means that the block
 containing such a phi can have only 1 critical edge.

 Can you point me to the patch with the special insertion code then?
 I definitely want to avoid the mess we ran into with the reassoc
 code's clever insertion code.

 Richard.

 Thanks.
 Yuri.

 2014-10-21 18:19 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev ysrum...@gmail.com 
 wrote:
 Richard,

 I saw the sources of these functions, but I can't understand why I
 should use something else. Note that all predicate computations are
 located in basic blocks (by design of if-conv) and there is a special
 function that puts these computations in the bb
 (insert_gimplified_predicates). An edge contains only the predicate, not its
 computations. The new function find_insertion_point() does a very simple
 search - it finds the latest (in the current bb) operand def-stmt of the
 predicates taken from all incoming edges.
 In the original algorithm the predicate of a non-critical edge is taken to
 perform phi-node predication since for a critical edge it does not work
 properly.

 My question is: do your comments mean that I 

Re: [PATCH 2/2] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*

2014-10-24 Thread Marcus Shawcroft
On 23 October 2014 18:51, Charles Baylis charles.bay...@linaro.org wrote:

 Otherwise this and the previous 1/2 associated patch look good, can
 you respin with these tidy ups?

 OK for trunk?

OK
/Marcus


Re: [patch,avr] tweak sign extensions, take #2

2014-10-24 Thread Georg-Johann Lay

Am 10/23/2014 08:16 PM schrieb Denis Chertykov:

This optimization makes most sign-extensions one instruction shorter in the
case when the source register may be clobbered and the register numbers are
different.  Source and destination may overlap.

Ok for trunk?

Johann

gcc/
 * config/avr/avr.md (extendqihi2, extendqipsi2, extendqisi2)
 (extendhipsi2, extendhisi2): Optimize if source reg is unused
 after the insns and has different REGNO than destination.


Approved.

Denis.


Finally I switched to a solution that avoids all the ugly asm snippets and 
special casing, and which is exact w.r.t. code size.  So allow me to drop the 
patch above and propose this one for trunk.  Sorry for the inconvenience.


In any case it uses LSL/SBC idiom instead of the old CLR/SBRC/COM.


Johann

* avr-protos.h (avr_out_sign_extend): New.
* avr.c (avr_adjust_insn_length) [ADJUST_LEN_SEXT]: Handle.
(avr_out_sign_extend): New function.
* avr.md (extendqihi2, extendqipsi2, extendqisi2, extendhipsi2)
(extendhisi2, extendpsisi2): Use it.
(adjust_len) [sext]: New.



Index: config/avr/avr-protos.h
===
--- config/avr/avr-protos.h	(revision 216591)
+++ config/avr/avr-protos.h	(working copy)
@@ -57,6 +57,7 @@ extern const char *avr_out_compare (rtx_
 extern const char *avr_out_compare64 (rtx_insn *, rtx*, int*);
 extern const char *ret_cond_branch (rtx x, int len, int reverse);
 extern const char *avr_out_movpsi (rtx_insn *, rtx*, int*);
+extern const char *avr_out_sign_extend (rtx_insn *, rtx*, int*);
 
 extern const char *ashlqi3_out (rtx_insn *insn, rtx operands[], int *len);
 extern const char *ashlhi3_out (rtx_insn *insn, rtx operands[], int *len);
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 216592)
+++ config/avr/avr.c	(working copy)
@@ -7734,6 +7734,56 @@ avr_out_bitop (rtx insn, rtx *xop, int *
 }
 
 
+/* Output sign extension from XOP[1] to XOP[0] and return "".
+   If PLEN == NULL, print assembler instructions to perform the operation;
+   otherwise, set *PLEN to the length of the instruction sequence (in words)
+   as printed with PLEN == NULL.  */
+
+const char*
+avr_out_sign_extend (rtx_insn *insn, rtx *xop, int *plen)
+{
+  // Size in bytes of source resp. destination operand.
+  unsigned n_src = GET_MODE_SIZE (GET_MODE (xop[1]));
+  unsigned n_dest = GET_MODE_SIZE (GET_MODE (xop[0]));
+  rtx r_msb = all_regs_rtx[REGNO (xop[1]) + n_src - 1];
+
+  if (plen)
+*plen = 0;
+
+  // Copy destination to source
+
+  if (REGNO (xop[0]) != REGNO (xop[1]))
+    {
+      gcc_assert (n_src <= 2);
+
+      if (n_src == 2)
+        avr_asm_len (AVR_HAVE_MOVW
+                     ? "movw %0,%1"
+                     : "mov %B0,%B1", xop, plen, 1);
+      if (n_src == 1 || !AVR_HAVE_MOVW)
+        avr_asm_len ("mov %A0,%A1", xop, plen, 1);
+    }
+
+  // Set Carry to the sign bit MSB.7...
+
+  if (REGNO (xop[0]) == REGNO (xop[1])
+      || !reg_unused_after (insn, r_msb))
+    {
+      avr_asm_len ("mov __tmp_reg__,%0", &r_msb, plen, 1);
+      r_msb = tmp_reg_rtx;
+    }
+
+  avr_asm_len ("lsl %0", &r_msb, plen, 1);
+
+  // ...and propagate it to all the new sign bits
+
+  for (unsigned n = n_src; n < n_dest; n++)
+    avr_asm_len ("sbc %0,%0", &all_regs_rtx[REGNO (xop[0]) + n], plen, 1);
+
+  return "";
+}
+
+
 /* PLEN == NULL: Output code to add CONST_INT OP[0] to SP.
PLEN != NULL: Set *PLEN to the length of that sequence.
   Return "".  */
@@ -8578,6 +8628,7 @@ avr_adjust_insn_length (rtx_insn *insn,
 case ADJUST_LEN_MOVMEM: avr_out_movmem (insn, op, len); break;
 case ADJUST_LEN_XLOAD: avr_out_xload (insn, op, len); break;
 case ADJUST_LEN_LPM: avr_out_lpm (insn, op, len); break;
+case ADJUST_LEN_SEXT: avr_out_sign_extend (insn, op, len); break;
 
 case ADJUST_LEN_SFRACT: avr_out_fract (insn, op, true, len); break;
 case ADJUST_LEN_UFRACT: avr_out_fract (insn, op, false, len); break;
Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 216592)
+++ config/avr/avr.md	(working copy)
@@ -147,7 +147,7 @@ (define_attr length 
 ;; Otherwise do special processing depending on the attribute.
 
 (define_attr "adjust_len"
-  "out_bitop, plus, addto_sp,
+  "out_bitop, plus, addto_sp, sext,
tsthi, tstpsi, tstsi, compare, compare64, call,
mov8, mov16, mov24, mov32, reload_in16, reload_in24, reload_in32,
ufract, sfract, round,
@@ -4174,62 +4174,66 @@ (define_insn "extendqihi2"
   [(set (match_operand:HI 0 "register_operand" "=r,r")
         (sign_extend:HI (match_operand:QI 1 "combine_pseudo_register_operand" "0,*r")))]
   ""
-  "@
-	clr %B0\;sbrc %0,7\;com %B0
-	mov %A0,%A1\;clr %B0\;sbrc %A0,7\;com %B0"
+  {
+    return avr_out_sign_extend (insn, operands, NULL);
+  }
   [(set_attr "length" "3,4")
-   (set_attr cc set_n,set_n)])
+   

Re: avoid alignment of static variables affecting stack's

2014-10-24 Thread Richard Biener
On Fri, Oct 24, 2014 at 12:12 PM, Jan Beulich jbeul...@suse.com wrote:
 On 24.10.14 at 11:52, richard.guent...@gmail.com wrote:
 On Fri, Oct 24, 2014 at 11:18 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Oct 24, 2014 at 11:10:08AM +0200, Richard Biener wrote:
  For something in static storage, this seems OK.  However, I think a hard
  register variable ought to be left alone -- even if we can't spill it to
  a stack slot today, there's a reasonable chance we might add that
  capability in the future.
 
  Hmm, but then wouldn't it need to be the code generating the spill
  that's responsible for enforcing suitable alignment? I can certainly
  re-submit without the hard register special cased (as it would still
  fix the original issue I'm seeing), but it feels wrong to do so.

 Yes, ISTR the spilling code is supposed to update the required
 stack alignment.  After all the RA decision might affect required
 alignment of spills.

 From what I remember, at RA time you already have to know conservatively
 that you'll want to do dynamic stack realignment and what the highest needed
 alignment will be, so various parts of expansion etc. conservatively compute
 what will be needed.  I think that is because you e.g. need to reserve some
 registers (vDRAP, etc.) if doing dynamic realignment.
 If you conservatively assume you'll need dynamic stack realignment and after
 RA you find you really don't need it, there are some optimizations in
 prologue threading where it attempts to at least decrease amount of
 unnecessary code, but the harm has already been done.

 Might be that with LRA perhaps this could be changed and not conservatively
 assume more alignment than proven to be needed, but such code isn't there I
 think.

 I stand corrected then.

 So am I to conclude then that I need to take out the hard register
 check in order for this to be accepted?

Yes.

Thanks,
Richard.

 Jan



Re: [PATCH][AArch64] Cleanup logic around aarch64_final_prescan

2014-10-24 Thread Marcus Shawcroft
On 22 October 2014 14:57, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

 2014-10-13  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/aarch64/aarch64.h (ADJUST_INSN_LENGTH): Wrap definition in
 do while (0).
 * config/aarch64/aarch64.c (is_mem_p): Delete.
 (is_memory_op): Rename to...
 (has_memory_op): ... This.  Use FOR_EACH_SUBRTX.
 (dep_between_memop_and_curr): Assert that the input is a SET.
 (aarch64_madd_needs_nop): Add comment.  Do not call
 dep_between_memop_and_curr on NULL body.
 (aarch64_final_prescan_insn): Add comment.
 Include rtl-iter.h.

OK /Marcus


Re: [PATCH] PR58867 ASan and UBSan tests not run for installed testing.

2014-10-24 Thread Eric Botcazou
 some time ago, Andrew wrote a patch that fixes PR58867
 (http://patchwork.ozlabs.org/patch/286866/), but for some reasons it
 wasn't committed to trunk.
 
 This is resurrected Andrew's patch, extended to support Tsan testsuite.

This patch broke --disable-libsanitizer though, i.e. you now get gazillions of 
sanitizer failures in the C and C++ testsuites.

-- 
Eric Botcazou


Re: [PATCH][AArch64][4.8] Backport Cortex-A53 erratum 835769 workaround

2014-10-24 Thread Marcus Shawcroft
On 17 October 2014 16:55, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 This is the 4.8 backport of the Cortex-A53 erratum 835769 workaround.
 4.8 doesn't have rtx_insns and the type attributes are different.
 Other than that there's not much different from the trunk version.

 Bootstrapped and tested on aarch64-none-linux-gnu with and without the
 workaround enabled.
 Compiled various large benchmarks with it.

 Ok for the 4.8 branch?

OK /Marcus


[PATCH][3,4,5/n] Merge from match-and-simplify, fold, fold_stmt and first patterns

2014-10-24 Thread Richard Biener

This combines the already posted 3/n (first simple patterns),
4/n (hook into fold-const.c) and not yet posted 5/n (hook into
fold_stmt).  Over the first posting this also contains
recent improvements to the generator from the branch regarding
to TREE_SIDE_EFFECTS and NON_LVALUE_EXPR handling.

The hook into fold_stmt leaves all existing calls in doing
what fold_stmt does currently (the match-and-simplify machinery
will not follow SSA edges).  It adds the ability to enable
that though via an overload taking a valueization hook as
argument (this is how tree-ssa-forwprop.c will exercise it).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Thanks,
Richard.

2014-10-24  Richard Biener  rguent...@suse.de

* genmatch.c (expr::gen_transform): Use fold_buildN_loc
and build_call_expr_loc.
(dt_simplify::gen): Drop non_lvalue for GIMPLE, use
non_lvalue_loc to build it for GENERIC.
(decision_tree::gen_generic): Add location argument to
generic_simplify prototype.
(capture_info): New class.
(capture_info::capture_info): New constructor.
(capture_info::walk_match): New method.
(capture_info::walk_result): New method.
(capture_info::walk_c_expr): New method.
(dt_simplify::gen): Handle preserving side-effects for
GENERIC code generation.
(decision_tree::gen_generic): Do not reject operands
with TREE_SIDE_EFFECTS.
* generic-match.h: New file.
* generic-match-head.c: Include generic-match.h, not gimple-match.h.
* match.pd: Add some constant folding patterns from fold-const.c.
* fold-const.c: Include generic-match.h.
(fold_unary_loc): Dispatch to generic_simplify.
(fold_ternary_loc): Likewise.
(fold_binary_loc): Likewise.  Remove patterns now implemented
by generic_simplify.
* gimple-fold.c (replace_stmt_with_simplification): New function.
(fold_stmt_1): Add valueize parameter, dispatch to gimple_simplify.
(no_follow_ssa_edges): New function.
(fold_stmt): New overload with valueization hook.  Use
no_follow_ssa_edges for the overload without hook.
(fold_stmt_inplace): Likewise.
* gimple-fold.h (no_follow_ssa_edges): Declare.

Index: gcc/generic-match.h
===
*** /dev/null   1970-01-01 00:00:00.0 +
--- gcc/generic-match.h 2014-10-23 15:45:28.322836040 +0200
***
*** 0 
--- 1,33 
+ /* Generic simplify definitions.
+ 
+Copyright (C) 2011-2014 Free Software Foundation, Inc.
+Contributed by Richard Guenther rguent...@suse.de
+ 
+ This file is part of GCC.
+ 
+ GCC is free software; you can redistribute it and/or modify it under
+ the terms of the GNU General Public License as published by the Free
+ Software Foundation; either version 3, or (at your option) any later
+ version.
+ 
+ GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ for more details.
+ 
+ You should have received a copy of the GNU General Public License
+ along with GCC; see the file COPYING3.  If not see
+ <http://www.gnu.org/licenses/>.  */
+ 
+ #ifndef GCC_GENERIC_MATCH_H
+ #define GCC_GENERIC_MATCH_H
+ 
+ /* Note the following functions are supposed to be only used from
+fold_unary_loc, fold_binary_loc and fold_ternary_loc respectively.
+They are not considered a public API.  */
+ 
+ tree generic_simplify (location_t, enum tree_code, tree, tree);
+ tree generic_simplify (location_t, enum tree_code, tree, tree, tree);
+ tree generic_simplify (location_t, enum tree_code, tree, tree, tree, tree);
+ 
+ #endif  /* GCC_GENERIC_MATCH_H */
Index: gcc/generic-match-head.c
===
*** gcc/generic-match-head.c.orig   2014-10-23 15:45:26.935836135 +0200
--- gcc/generic-match-head.c2014-10-23 15:45:28.322836040 +0200
*** along with GCC; see the file COPYING3.
*** 43,48 
  #include tree-phinodes.h
  #include ssa-iterators.h
  #include dumpfile.h
! #include gimple-match.h
  
  
--- 43,48 
  #include tree-phinodes.h
  #include ssa-iterators.h
  #include dumpfile.h
! #include generic-match.h
  
  
Index: gcc/fold-const.c
===
*** gcc/fold-const.c.orig   2014-10-23 15:44:38.601839463 +0200
--- gcc/fold-const.c2014-10-23 15:45:51.976834411 +0200
*** along with GCC; see the file COPYING3.
*** 70,75 
--- 70,76 
  #include hash-table.h  /* Required for ENABLE_FOLD_CHECKING.  */
  #include builtins.h
  #include cgraph.h
+ #include generic-match.h
  
  /* Nonzero if we are folding constants inside an initializer; zero
 otherwise.  */
*** fold_unary_loc (location_t loc, enum tre
*** 

Re: [PATCH][AArch64][4.8] Add --enable-fix-cortex-a53-835769 configure option

2014-10-24 Thread Marcus Shawcroft
On 17 October 2014 16:55, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 This is the 4.8 backport of the configure option
 --enable-fix-cortex-a53-835769 to enable the workaround
 for the Cortex-A53 erratum 835769 by default. The patch is very similar to
 the trunk version, just some
 differences in the placement of the relevant sections.

 Bootstrapped and tested on aarch64-none-linux-gnu.

 Ok for the 4.8 branch together with the -mfix-cortex-a53-835769 option
 backport?

OK /Marcus


Re: [PATCH] Fix genmatch linking

2014-10-24 Thread Rainer Orth
Richard Biener rguent...@suse.de writes:

 Dominique reported that this fails for system libiconv but built libintl.

 Which might be fixed by the following.  Does that still work for you?

It does: an i386-pc-solaris2.10 bootstrap has finished by now and make
check is running.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH][AArch64] LINK_SPEC changes for Cortex-A53 erratum 835769 workaround

2014-10-24 Thread Marcus Shawcroft
On 22 October 2014 15:20, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 This patch contains the LINK_SPEC changes required to pass on the linker
 option --fix-cortex-a53-835769 when compiling with -mfix-cortex-a53-835769
 (or by default when configured with --enable-fix-cortex-a53-835769).

 This requires a binutils installation with the patch posted at
 https://sourceware.org/ml/binutils/2014-10/msg00198.html applied.


 Bootstrapped and tested on aarch64-none-linux-gnu and built various
 benchmarks.
 This patch applies to 4.9 (4.8 version will be posted separately) and has
 been tested there as well.

 Ok for trunk and 4.9?

The corresponding binutils changes are committed on binutils trunk,
2.25 and 2.24.

The trunk patch is OK.

Given that Jakub is in the process of preparing a 4.9.2, I'd like an
explicit OK before we commit on 4.9. Jakub?

Cheers
/Marcus


[match-and-simplify] Merge from trunk

2014-10-24 Thread Richard Biener

2014-10-24  Richard Biener  rguent...@suse.de

Merge from trunk r216543 through r216631.

Brings back second merge piece.



Re: [PATCH][AArch64] LINK_SPEC changes for Cortex-A53 erratum 835769 workaround

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 12:04:52PM +0100, Marcus Shawcroft wrote:
 On 22 October 2014 15:20, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
  Hi all,
 
  This patch contains the LINK_SPEC changes required to pass on the linker
  option --fix-cortex-a53-835769 when compiling with -mfix-cortex-a53-835769
  (or by default when configured with --enable-fix-cortex-a53-835769).
 
  This requires a binutils installation with the patch posted at
  https://sourceware.org/ml/binutils/2014-10/msg00198.html applied.
 
 
  Bootstrapped and tested on aarch64-none-linux-gnu and built various
  benchmarks.
  This patch applies to 4.9 (4.8 version will be posted separately) and has
  been tested there as well.
 
  Ok for trunk and 4.9?
 
 The corresponding binutils changes are  committed on binutils trunk,
 2.25 and 2.24.
 
 The trunk patch is OK.
 
 Given that Jakub is in the process of preparing a  4.9.2 I'd like an
 explicit OK before we commit on 4.9. Jakub?

Is that a regression on the 4.9 branch?  If not, I'd prefer if it could wait
for 4.9.3.

Jakub


Re: [PATCH] Fix genmatch linking

2014-10-24 Thread Richard Biener
On Fri, 24 Oct 2014, Rainer Orth wrote:

 Richard Biener rguent...@suse.de writes:
 
  Dominique reported that this fails for system libiconv but built libintl.
 
  Which might be fixed by the following.  Does that still work for you?
 
 It does: an i386-pc-solaris2.10 bootstrap has finished by now and make
 check is running.

Dominique reported an ok as well.  Bootstrapped myself on
x86_64-unknown-linux-gnu and committed as r216632.

Richard.

2014-10-24  Richard Biener  rguent...@suse.de

* Makefile.in (BUILD_CPPLIB): Move $(LIBINTL) $(LIBICONV)
to genmatch BUILD_LIBS instead.

Index: gcc/Makefile.in
===
--- gcc/Makefile.in (revision 216626)
+++ gcc/Makefile.in (working copy)
@@ -981,15 +981,6 @@ else
 LIBIBERTY = ../libiberty/libiberty.a
 BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a
 endif
-# For stage1 and when cross-compiling use the build libcpp which is
-# built with NLS disabled.  For stage2+ use the host library and
-# its dependencies.
-ifeq ($(build_objdir),$(build_libobjdir))
-BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
-else
-BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV)
-build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP)
-endif
 
 # Dependencies on the intl and portability libraries.
 LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) \
@@ -2529,6 +2520,17 @@ genprog = $(genprogerr) check checksum c
 # These programs need libs over and above what they get from the above list.
 build/genautomata$(build_exeext) : BUILD_LIBS += -lm
 
+# For stage1 and when cross-compiling use the build libcpp which is
+# built with NLS disabled.  For stage2+ use the host library and
+# its dependencies.
+ifeq ($(build_objdir),$(build_libobjdir))
+BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
+else
+BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY)
+build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP)
+build/genmatch$(build_exeext): BUILD_LIBS += $(LIBINTL) $(LIBICONV)
+endif
+
 build/genmatch$(build_exeext) : $(BUILD_CPPLIB) \
   $(BUILD_ERRORS) build/vec.o build/hash-table.o
 


Re: [PATCH][3,4,5/n] Merge from match-and-simplify, fold, fold_stmt and first patterns

2014-10-24 Thread Marc Glisse



+ /* Same applies to modulo operations, but fold is inconsistent here
+and simplifies 0 % x to 0, only preserving literal 0 % 0.  */
+ (for op (ceil_mod floor_mod round_mod trunc_mod)
+  /* 0 % X is always zero.  */
+  (simplify
+   (trunc_mod integer_zerop@0 @1)
+   /* But not for 0 % 0 so that we can get the proper warnings and errors.  */
+   (if (!integer_zerop (@1))
+@0))
+  /* X % 1 is always zero.  */
+  (simplify
+   (trunc_mod @0 integer_onep)
+   { build_zero_cst (type); }))


op is unused, you probably meant to replace trunc_mod with it.

--
Marc Glisse


Re: [PATCH][3,4,5/n] Merge from match-and-simplify, fold, fold_stmt and first patterns

2014-10-24 Thread Richard Biener
On Fri, 24 Oct 2014, Marc Glisse wrote:

 
  + /* Same applies to modulo operations, but fold is inconsistent here
  +and simplifies 0 % x to 0, only preserving literal 0 % 0.  */
  + (for op (ceil_mod floor_mod round_mod trunc_mod)
  +  /* 0 % X is always zero.  */
  +  (simplify
  +   (trunc_mod integer_zerop@0 @1)
  +   /* But not for 0 % 0 so that we can get the proper warnings and errors.
  */
  +   (if (!integer_zerop (@1))
  +@0))
  +  /* X % 1 is always zero.  */
  +  (simplify
  +   (trunc_mod @0 integer_onep)
  +   { build_zero_cst (type); }))
 
 op is unused, you probably meant to replace trunc_mod with it.

Oh, indeed.  I'll fix that up next week (heh - sth for a first warning
from genmatch!).

Thanks,
Richard.


[PATCH][ARM] revert changes on check_effective_target_arm_*_ok

2014-10-24 Thread Jiong Wang

we should not add explicit declarations there.

arm_neon.h contains those prototypes already. They will be available if the
compiler is configured with the related builtin predefines, for example
__ARM_FEATURE_CRYPTO.

so, actually, if there are any warnings when compiling these test programs,
they are expected,
and we rely on these warnings to check whether certain features are available.

previously, I only verified with an arm-none-linux-gnueabi cross check, so
these regressions were not exposed.

now verified on arm-none-linux-gnueabihf; the regressions in the arm directory are gone.

make check RUNTESTFLAGS=aapcs.exp neon.exp acle.exp simd.exp arm.exp

ok for trunk?

gcc/testsuite/

* lib/target-supports.exp
(check_effective_target_arm_crypto_ok_nocache): Remove declaration for
vaeseq_u8.
(check_effective_target_arm_neon_fp16_ok_nocache): Remove declaration 
for
vcvt_f16_f32.
(check_effective_target_arm_neonv2_ok_nocache): Remove declaration for
vfma_f32.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 91460c2..4398345 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2404,7 +2404,6 @@ proc check_effective_target_arm_crypto_ok_nocache { } {
 	foreach flags { -mfloat-abi=softfp -mfpu=crypto-neon-fp-armv8 -mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp} {
 	if { [check_no_compiler_messages_nocache arm_crypto_ok object {
 		#include "arm_neon.h"
-		extern uint8x16_t vaeseq_u8 (uint8x16_t, uint8x16_t);
 		uint8x16_t
 		foo (uint8x16_t a, uint8x16_t b)
 		{
@@ -2549,7 +2548,6 @@ proc check_effective_target_arm_neon_fp16_ok_nocache { } {
 	   -mfpu=neon-fp16 -mfloat-abi=softfp} {
 	if { [check_no_compiler_messages_nocache arm_neon_fp_16_ok object {
 		#include "arm_neon.h"
-		extern float16x4_t vcvt_f16_f32 (float32x4_t);
 		float16x4_t
 		foo (float32x4_t arg)
 		{
@@ -2625,7 +2623,6 @@ proc check_effective_target_arm_neonv2_ok_nocache { } {
 	foreach flags { -mfloat-abi=softfp -mfpu=neon-vfpv4 -mfpu=neon-vfpv4 -mfloat-abi=softfp} {
 	if { [check_no_compiler_messages_nocache arm_neonv2_ok object {
 		#include "arm_neon.h"
-		extern float32x2_t vfma_f32 (float32x2_t, float32x2_t, float32x2_t);
 		float32x2_t 
 		foo (float32x2_t a, float32x2_t b, float32x2_t c)
 {

[PATCH][ARM] gnu11 cleanup for aapcs testcases

2014-10-24 Thread Jiong Wang

a further cleanup under the aapcs sub-directory.

ok for trunk?

gcc/testsuite/
  * gcc.target/arm/aapcs/abitest.h: Declare memcmp.
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
index 06a92c3..7bce58b 100644
--- a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
+++ b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
@@ -49,6 +49,8 @@
 
 
 extern void abort (void);
+typedef unsigned int size_t;
+extern int memcmp (const void *s1, const void *s2, size_t n);
 
 __attribute__((naked))  void dumpregs () __asm(myfunc);
 __attribute__((naked))  void dumpregs ()

Re: [PATCH][ARM] gnu11 cleanup for aapcs testcases

2014-10-24 Thread Marek Polacek
On Fri, Oct 24, 2014 at 12:48:24PM +0100, Jiong Wang wrote:
 a furhter cleanup under aapcs sub-directory.
 
 ok for trunk?
 
 gcc/testsuite/
    * gcc.target/arm/aapcs/abitest.h: Declare memcmp.

 diff --git a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h 
 b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
 index 06a92c3..7bce58b 100644
 --- a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
 +++ b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h
 @@ -49,6 +49,8 @@
  
  
  extern void abort (void);
 +typedef unsigned int size_t;
 +extern int memcmp (const void *s1, const void *s2, size_t n);

You can use __SIZE_TYPE__ and then you don't need the typedef.

Marek


Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-10-24 Thread Alan Lawrence



Rainer Orth wrote:

However, as a quick first step, does adding the ilp32 / lp64 (and keeping
the architectures list for now) solve the immediate problem? Patch
attached, OK for trunk?


No, as I said this is wrong for biarch targets like sparc and i386.


When you say no this does not solve the immediate problem, are you saying that 
you are (still) seeing test failures with the require-effective-target patch 
applied? Or is the issue that this would not execute the tests as widely as 
might be possible? In principle I'm quite happy to relax the target patterns, 
although have been having issues with sparc (below)...


Re. what the architectures have in common is largely that these are the 
primary/secondary archs on which I've checked the test passes! I can now add 
mips and microblaze to this list, however I'm nervous of dropping the target 
entirely given the very large number of target architectures gcc supports; and 
e.g. IA64 (in ILP32 mode) generates an ashiftrt:DI by 31 places, not 
ashiftrt:SI, which does not match the simplification criteria in combine.c.


This should be something like 


  { target aarch64*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* }

E.g. sparc-sun-solaris2.11 with -m64 is lp64, but would be excluded by
your target list.  Keep the list sorted alphabetically and best add an
explanation so others know what those targets have in common.


So I've built a stage-1 compiler with --target=sparc-sun-solaris2.11, and I find

  * without -m64, my dg-require-effective-target ilp32 causes the 32-bit test 
to execute, and pass; dg-require-effective-target lp64 prevents execution of 
the 64-bit test (which would fail) - so all as expected and desired.


  * with -lp64, behaviour is as previous (this is probably expected)

  * with -m64, dg-require-effective-target ilp32 still causes the test to 
execute (but it fails, as the RTL now has an ashiftrt:DI by 31 places, which 
doesn't meet the simplification criteria in combine.c - this is pretty much as 
expected). dg-require-effective-target lp64 stops the 64-bit test from 
executing, however (even though it would now pass).


Can you clarify what I should be doing on sparc, therefore?

Thanks for your help!

Alan



[PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral

2014-10-24 Thread Alan Lawrence
This is the first half of my previous patch series 
(https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01456.html), that is the part 
making the REDUC_..._EXPR tree codes endian-neutral, and adding a new 
reduce-to-scalar optab in place of the endianness-dependent 
reduc_[us](plus|min|max)_optab.


I'm leaving the vec_shr portion out of this patch series, as the link between 
the two halves is only the end goal of removing an if (BYTES_BIG_ENDIAN) from 
tree-vect-loop.c; this series removes that from one code path so can stand alone.


Patches 1-6 are as previously posted apart from rebasing and removing the 
old/poisoned AArch64 patterns as per maintainer's request. Patches 1, 2, 4, 5 
and 6 have already been approved; patch 3 was discussed somewhat but I think we 
decided against most of the ideas raised; I have added a comment to 
scalar_reduc_to_vector. I now reread Richie's "Otherwise the patch looks good to 
me" and wonder if I should have taken that as an approval, but I didn't read it 
that way at the time...???


Patches 7-11 migrate ARM, x86, IA64 (I think), and mostly PowerPC, to 
the new reduc_(plus|[us](min|max))_scal_optab. I have not managed to work out 
how to do the same for MIPS (specifically what I need to add to 
mips_expand_vec_reduc), and have had no response from the maintainers, so am 
leaving that for now. Also I haven't migrated (or worked out how to target) 
rs6000/paired.md, help would be most welcome.



The suggestion was then to complete the migration, by removing the old optabs. 
There are a few options here and I'll follow up with appropriate patches 
according to feedback received. I see options:


(1) just delete the old optabs (and the migration code). This would 
performance-regress the MIPS backend, but should not break it, although one 
should really do *something* with the then-unused reduc_[us](plus|min|max)_optab 
in config/mips/loongson.md.


(2) also renaming reduc_..._scal_optab back to reduc_..._optab; would break the 
MIPS backend if something were not done with its existing patterns.


(2a) Alternatively I could just use a different new name, e.g. reduce_, 
reduct_, vec_reduc_..., anything that's less of a mouthful than 
reduc_..._scal. Whilst being only very slightly different from the current 
reduc_... might be confusing, so might changing the meaning and signature of the 
optab while keeping the existing name; so I am open to suggestions.


Cheers, Alan

commit 9819291c17610dcdcca19a3d9ea3a4260df0577e
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Thu Aug 21 13:05:43 2014 +0100

Temporarily remove gimple_fold

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 3dba1b2..a49da89 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1188,6 +1188,9 @@ aarch64_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *args,
   return NULL_TREE;
 }
 
+/* Handling of reduction operations temporarily removed so as to decouple
+   changes to tree codes from AArch64 NEON Intrinsics.  */
+#if 0
 bool
 aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 {
@@ -1259,6 +1262,7 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 
   return changed;
 }
+#endif
 
 void
 aarch64_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index db5ff59..27d82f3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10015,8 +10015,8 @@ aarch64_asan_shadow_offset (void)
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required
 
-#undef TARGET_GIMPLE_FOLD_BUILTIN
-#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
+//#undef TARGET_GIMPLE_FOLD_BUILTIN
+//#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
 
 #undef TARGET_GIMPLIFY_VA_ARG_EXPR
 #define TARGET_GIMPLIFY_VA_ARG_EXPR aarch64_gimplify_va_arg_exprcommit bf6d5d32c552ce1c6ccd890f501db4f39291088f
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Tue Jul 29 11:46:01 2014 +0100

Make tree codes produce scalar, with NOP_EXPRs. (tree-vect-loop.c mess)

diff --git a/gcc/expr.c b/gcc/expr.c
index a6233f3..c792028 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9044,7 +9044,17 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
   {
 op0 = expand_normal (treeop0);
 this_optab = optab_for_tree_code (code, type, optab_default);
-temp = expand_unop (mode, this_optab, op0, target, unsignedp);
+enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0));
+temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp);
+gcc_assert (temp);
+/* The tree code produces a scalar result, but (somewhat by convention)
+   the optab produces a vector with the result in element 0 if
+   little-endian, or element N-1 if big-endian.  So pull the scalar

[PATCH 7/11][ARM] Migrate to new reduc_plus_scal_optab

2014-10-24 Thread Alan Lawrence
This migrates ARM from reduc_splus_optab and reduc_uplus optab to a single 
reduc_plus_optab.


Tested, in combination with next patch:
bootstrap on arm-none-linux-gnueabihf
cross-tested check-gcc on arm-none-eabi.

gcc/ChangeLog:

config/arm/neon.md (reduc_plus_*): Rename to...
(reduc_plus_scal_*): ...this; reduce to temp and extract scalar result.

commit 22e60bd46f2a591f5357a543d76b19ed89f401ed
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Thu Aug 28 16:12:24 2014 +0100

ARM reduc_plus_scal, V_elem not V_ext, rm old reduc_[us]plus, emit the extract!

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 41cf913..d13fe5d 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1349,33 +1349,47 @@
 
 ;; Reduction operations
 
-(define_expand reduc_splus_mode
-  [(match_operand:VD 0 s_register_operand )
+(define_expand reduc_plus_scal_mode
+  [(match_operand:V_elem 0 nonimmediate_operand )
(match_operand:VD 1 s_register_operand )]
   TARGET_NEON  (!Is_float_mode || flag_unsafe_math_optimizations)
 {
-  neon_pairwise_reduce (operands[0], operands[1], MODEmode,
+  rtx vec = gen_reg_rtx (MODEmode);
+  neon_pairwise_reduce (vec, operands[1], MODEmode,
 			gen_neon_vpadd_internalmode);
+  /* The same result is actually computed into every element.  */
+  emit_insn (gen_vec_extractmode (operands[0], vec, const0_rtx));
   DONE;
 })
 
-(define_expand reduc_splus_mode
-  [(match_operand:VQ 0 s_register_operand )
+(define_expand reduc_plus_scal_mode
+  [(match_operand:V_elem 0 nonimmediate_operand )
(match_operand:VQ 1 s_register_operand )]
   TARGET_NEON  (!Is_float_mode || flag_unsafe_math_optimizations)
 !BYTES_BIG_ENDIAN
 {
   rtx step1 = gen_reg_rtx (V_HALFmode);
-  rtx res_d = gen_reg_rtx (V_HALFmode);
 
   emit_insn (gen_quad_halves_plusmode (step1, operands[1]));
-  emit_insn (gen_reduc_splus_V_half (res_d, step1));
-  emit_insn (gen_move_lo_quad_mode (operands[0], res_d));
+  emit_insn (gen_reduc_plus_scal_V_half (operands[0], step1));
+
+  DONE;
+})
+
+(define_expand reduc_plus_scal_v2di
+  [(match_operand:DI 0 nonimmediate_operand =w)
+   (match_operand:V2DI 1 s_register_operand )]
+  TARGET_NEON  !BYTES_BIG_ENDIAN
+{
+  rtx vec = gen_reg_rtx (V2DImode);
+
+  emit_insn (gen_arm_reduc_plus_internal_v2di (vec, operands[1]));
+  emit_insn (gen_vec_extractv2di (operands[0], vec, const0_rtx));
 
   DONE;
 })
 
-(define_insn reduc_splus_v2di
+(define_insn arm_reduc_plus_internal_v2di
   [(set (match_operand:V2DI 0 s_register_operand =w)
 	(unspec:V2DI [(match_operand:V2DI 1 s_register_operand w)]
 		 UNSPEC_VPADD))]
@@ -1384,17 +1398,6 @@
   [(set_attr type neon_add_q)]
 )
 
-;; NEON does not distinguish between signed and unsigned addition except on
-;; widening operations.
-(define_expand reduc_uplus_mode
-  [(match_operand:VDQI 0 s_register_operand )
-   (match_operand:VDQI 1 s_register_operand )]
-  TARGET_NEON  (Is_d_reg || !BYTES_BIG_ENDIAN)
-{
-  emit_insn (gen_reduc_splus_mode (operands[0], operands[1]));
-  DONE;
-})
-
 (define_expand reduc_smin_mode
   [(match_operand:VD 0 s_register_operand )
(match_operand:VD 1 s_register_operand )]

Re: [PATCH][ARM] revert changes on check_effective_target_arm_*_ok

2014-10-24 Thread Ramana Radhakrishnan
On Fri, Oct 24, 2014 at 12:47 PM, Jiong Wang jiong.w...@arm.com wrote:
 we should not add explicit declaration there.

 arm_neon.h contains those prototype already. they will be available if the
 compiler configuration is with related builtin predefine, for example
 __ARM_FEATURE_CRYPTO.

 so, actually, if there is any warning when compile these test programs, they
 are expected,
 and we rely on these warnings to check whether certain features are
 available.

 previously, I only verified on arm-none-linux-gnueabi cross check, so have
 not exposed
 these regressions.

 I had also missed the vaes and vfma turning on by default by this
change. This is OK.

ramana

 now verified on arm-none-linux-gnueabihf, regressions gone away in the arm
 directory.

 make check RUNTESTFLAGS=aapcs.exp neon.exp acle.exp simd.exp arm.exp

 ok for trunk?

 gcc/testsuite/

 * lib/target-supports.exp
 (check_effective_target_arm_crypto_ok_nocache): Remove declaration
 for
 vaeseq_u8.
 (check_effective_target_arm_neon_fp16_ok_nocache): Remove
 declaration for
 vcvt_f16_f32.
 (check_effective_target_arm_neonv2_ok_nocache): Remove declaration
 for
 vfma_f32.


[PATCH 8/11][ARM] Migrate to new reduc_[us](min|max)_scal_optab

2014-10-24 Thread Alan Lawrence

Similarly to last patch.

Tested, in combination with previous patch:
bootstrap on arm-none-linux-gnueabihf
cross-tested check-gcc on arm-none-eabi.

gcc/ChangeLog:

config/arm/neon.md (reduc_smin_mode *2): Rename to...
(reduc_smin_scal_mode *2): ...this; extract scalar result.
(reduc_smax_mode *2): Rename to...
(reduc_smax_scal_mode *2): ...this; extract scalar result.
(reduc_umin_mode *2): Rename to...
(reduc_umin_scal_mode *2): ...this; extract scalar result.
(reduc_umax_mode *2): Rename to...
(reduc_umax_scal_mode *2): ...this; extract scalar result.

commit 537c31561933f8054a2289198f35b19cf5c4196e
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Thu Aug 28 16:49:24 2014 +0100

ARM reduc_[us](min|max)_scal, V_elem not V_ext, rm old non-_scal version.

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index d13fe5d..19e1ba0 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1398,104 +1398,109 @@
   [(set_attr type neon_add_q)]
 )
 
-(define_expand reduc_smin_mode
-  [(match_operand:VD 0 s_register_operand )
+(define_expand reduc_smin_scal_mode
+  [(match_operand:V_elem 0 nonimmediate_operand )
(match_operand:VD 1 s_register_operand )]
   TARGET_NEON  (!Is_float_mode || flag_unsafe_math_optimizations)
 {
-  neon_pairwise_reduce (operands[0], operands[1], MODEmode,
+  rtx vec = gen_reg_rtx (MODEmode);
+
+  neon_pairwise_reduce (vec, operands[1], MODEmode,
 			gen_neon_vpsminmode);
+  /* The result is computed into every element of the vector.  */
+  emit_insn (gen_vec_extractmode (operands[0], vec, const0_rtx));
   DONE;
 })
 
-(define_expand reduc_smin_mode
-  [(match_operand:VQ 0 s_register_operand )
+(define_expand reduc_smin_scal_mode
+  [(match_operand:V_elem 0 nonimmediate_operand )
(match_operand:VQ 1 s_register_operand )]
   TARGET_NEON  (!Is_float_mode || flag_unsafe_math_optimizations)
 !BYTES_BIG_ENDIAN
 {
   rtx step1 = gen_reg_rtx (V_HALFmode);
-  rtx res_d = gen_reg_rtx (V_HALFmode);
 
   emit_insn (gen_quad_halves_sminmode (step1, operands[1]));
-  emit_insn (gen_reduc_smin_V_half (res_d, step1));
-  emit_insn (gen_move_lo_quad_mode (operands[0], res_d));
+  emit_insn (gen_reduc_smin_scal_V_half (operands[0], step1));
 
   DONE;
 })
 
-(define_expand reduc_smax_mode
-  [(match_operand:VD 0 s_register_operand )
+(define_expand reduc_smax_scal_mode
+  [(match_operand:V_elem 0 nonimmediate_operand )
(match_operand:VD 1 s_register_operand )]
   TARGET_NEON  (!Is_float_mode || flag_unsafe_math_optimizations)
 {
-  neon_pairwise_reduce (operands[0], operands[1], MODEmode,
+  rtx vec = gen_reg_rtx (MODEmode);
+  neon_pairwise_reduce (vec, operands[1], MODEmode,
 			gen_neon_vpsmaxmode);
+  /* The result is computed into every element of the vector.  */
+  emit_insn (gen_vec_extractmode (operands[0], vec, const0_rtx));
   DONE;
 })
 
-(define_expand reduc_smax_mode
-  [(match_operand:VQ 0 s_register_operand )
+(define_expand reduc_smax_scal_mode
+  [(match_operand:V_elem 0 nonimmediate_operand )
(match_operand:VQ 1 s_register_operand )]
   TARGET_NEON  (!Is_float_mode || flag_unsafe_math_optimizations)
 !BYTES_BIG_ENDIAN
 {
   rtx step1 = gen_reg_rtx (V_HALFmode);
-  rtx res_d = gen_reg_rtx (V_HALFmode);
 
   emit_insn (gen_quad_halves_smaxmode (step1, operands[1]));
-  emit_insn (gen_reduc_smax_V_half (res_d, step1));
-  emit_insn (gen_move_lo_quad_mode (operands[0], res_d));
+  emit_insn (gen_reduc_smax_scal_V_half (operands[0], step1));
 
   DONE;
 })
 
-(define_expand reduc_umin_mode
-  [(match_operand:VDI 0 s_register_operand )
+(define_expand reduc_umin_scal_mode
+  [(match_operand:V_elem 0 nonimmediate_operand )
(match_operand:VDI 1 s_register_operand )]
   TARGET_NEON
 {
-  neon_pairwise_reduce (operands[0], operands[1], MODEmode,
+  rtx vec = gen_reg_rtx (MODEmode);
+  neon_pairwise_reduce (vec, operands[1], MODEmode,
 			gen_neon_vpuminmode);
+  /* The result is computed into every element of the vector.  */
+  emit_insn (gen_vec_extractmode (operands[0], vec, const0_rtx));
   DONE;
 })
 
-(define_expand reduc_umin_mode
-  [(match_operand:VQI 0 s_register_operand )
+(define_expand reduc_umin_scal_mode
+  [(match_operand:V_elem 0 nonimmediate_operand )
(match_operand:VQI 1 s_register_operand )]
   TARGET_NEON  !BYTES_BIG_ENDIAN
 {
   rtx step1 = gen_reg_rtx (V_HALFmode);
-  rtx res_d = gen_reg_rtx (V_HALFmode);
 
   emit_insn (gen_quad_halves_uminmode (step1, operands[1]));
-  emit_insn (gen_reduc_umin_V_half (res_d, step1));
-  emit_insn (gen_move_lo_quad_mode (operands[0], res_d));
+  emit_insn (gen_reduc_umin_scal_V_half (operands[0], step1));
 
   DONE;
 })
 
-(define_expand reduc_umax_mode
-  [(match_operand:VDI 0 s_register_operand )
+(define_expand reduc_umax_scal_mode
+  [(match_operand:V_elem 0 nonimmediate_operand )
(match_operand:VDI 1 s_register_operand )]
   TARGET_NEON
 {
-  neon_pairwise_reduce (operands[0], operands[1], MODEmode,
+  rtx vec 

Re: [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral

2014-10-24 Thread Richard Biener
On Fri, 24 Oct 2014, Alan Lawrence wrote:

 This is the first half of my previous patch series
 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01456.html), that is the part
 making the REDUC_..._EXPR tree codes endian-neutral, and adding a new
 reduce-to-scalar optab in place of the endianness-dependent
 reduc_[us](plus|min|max)_optab.
 
 I'm leaving the vec_shr portion out of this patch series, as the link between
 the two halves is only the end goal of removing an if (BYTES_BIG_ENDIAN)
 from tree-vect-loop.c; this series removes that from one code path so can
 stand alone.
 
 Patches 1-6 are as previously posted apart from rebasing and removing the
 old/poisoned AArch64 patterns as per maintainer's request. Patches 1, 2, 4, 5
 and 6 have already been approved; patch 3 was discussed somewhat but I think
 we decided against most of the ideas raised, I have added comment to
 scalar_reduc_to_vector. I now reread Richie's "Otherwise the patch looks good
 to me" and wonder if I should have taken that as an approval but I didn't read
 it that way at the time...???

Yes, it was an approval ;)

 Patches 7-11 migrate ARM, x86, IA64 (I think), and mostly PowerPC, to
 the new reduc_(plus|[us](min|max))_scal_optab. I have not managed to work out
 how to do the same for MIPS (specifically what I need to add to
 mips_expand_vec_reduc), and have had no response from the maintainers, so am
 leaving that for now. Also I haven't migrated (or worked out how to target)
 rs6000/paired.md, help would be most welcome.
 
 
 The suggestion was then to complete the migration, by removing the old
 optabs. There are a few options here and I'll follow up with appropriate
 patches according to feedback received. I see options:
 
 (1) just delete the old optabs (and the migration code). This would
 performance-regress the MIPS backend, but should not break it, although one
 should really do *something* with the then-unused
 reduc_[us](plus|min|max)_optab in config/mips/loongson.md.

 (2) also renaming reduc_..._scal_optab back to reduc_..._optab; would break
 the MIPS backend if something were not done with its existing patterns.
 
 (2a) Alternatively I could just use a different new name, e.g. reduce_,
 reduct_, vec_reduc_..., anything that's less of a mouthful than
 reduc_..._scal. Whilst being only-very-slightly-different from the current
 reduc_... might be confusing, so might changing the meaning of the optab, and
 its signature, with the existing name, so am open to suggestions?

I definitely prefer (2).

Thanks,
Richard.


[PATCH 9/11][i386] Migrate reduction optabs to reduc_..._scal

2014-10-24 Thread Alan Lawrence

Bootstrapped and check-gcc on x86_64-none-linux-gnu.

gcc/ChangeLog:

* config/i386/i386.c (ix86_expand_reduc): Extract result into scalar.
* config/i386/sse.md (reduc_splus_v8df, reduc_code_mode * 3,
reduc_umin_v8hi): Rename to...
(reduc_plus_scal_v8df, reduc_code_scal_mode * 3,
reduc_umin_scal_v8hi): ...these, changing result mode to scalar.

(reduc_splus_v4df, reduc_splus_v2df, reduc_splus_v16sf,
reduc_splus_v8sf, reduc_splus_v4sf): Rename to...
(reduc_plus_scal_v4df, reduc_plus_scal_v2df, reduc_plus_scal_v16sf,
reduc_plus_scal_v8sf, reduc_plus_scal_v4sf): ...these, adding
gen_vec_extract for scalar result.

commit 80b0d10a78b2f3e86325f373e99e9cf71e42e622
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Tue Oct 7 13:25:08 2014 +0100

i386

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4c4a6eb..670a5f5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -41211,12 +41211,12 @@ emit_reduc_half (rtx dest, rtx src, int i)
 }
 
 /* Expand a vector reduction.  FN is the binary pattern to reduce;
-   DEST is the destination; IN is the input vector.  */
+   DEST is the (scalar) destination; IN is the input vector.  */
 
 void
 ix86_expand_reduc (rtx (*fn) (rtx, rtx, rtx), rtx dest, rtx in)
 {
-  rtx half, dst, vec = in;
+  rtx half, dst = NULL_RTX, vec = in;
   enum machine_mode mode = GET_MODE (in);
   int i;
 
@@ -41225,23 +41225,21 @@ ix86_expand_reduc (rtx (*fn) (rtx, rtx, rtx), rtx dest, rtx in)
mode == V8HImode
fn == gen_uminv8hi3)
 {
-  emit_insn (gen_sse4_1_phminposuw (dest, in));
-  return;
+  dst = gen_reg_rtx (mode);
+  emit_insn (gen_sse4_1_phminposuw (dst, in));
 }
-
-  for (i = GET_MODE_BITSIZE (mode);
-   i  GET_MODE_BITSIZE (GET_MODE_INNER (mode));
-   i = 1)
-{
+  else
+for (i = GET_MODE_BITSIZE (mode);
+	  i  GET_MODE_BITSIZE (GET_MODE_INNER (mode));
+	  i = 1)
+  {
   half = gen_reg_rtx (mode);
   emit_reduc_half (half, vec, i);
-  if (i == GET_MODE_BITSIZE (GET_MODE_INNER (mode)) * 2)
-	dst = dest;
-  else
-	dst = gen_reg_rtx (mode);
+  dst = gen_reg_rtx (mode);
   emit_insn (fn (dst, half, vec));
   vec = dst;
 }
+  ix86_expand_vector_extract (false, dest, dst, 0);
 }
 
 /* Target hook for scalar_mode_supported_p.  */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e7646d7..e4e0b95 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2238,8 +2238,8 @@
(set_attr prefix_rep 1,*)
(set_attr mode V4SF)])
 
-(define_expand reduc_splus_v8df
-  [(match_operand:V8DF 0 register_operand)
+(define_expand reduc_plus_scal_v8df
+  [(match_operand:DF 0 register_operand)
(match_operand:V8DF 1 register_operand)]
   TARGET_AVX512F
 {
@@ -2247,30 +2247,35 @@
   DONE;
 })
 
-(define_expand reduc_splus_v4df
-  [(match_operand:V4DF 0 register_operand)
+(define_expand reduc_plus_scal_v4df
+  [(match_operand:DF 0 register_operand)
(match_operand:V4DF 1 register_operand)]
   TARGET_AVX
 {
   rtx tmp = gen_reg_rtx (V4DFmode);
   rtx tmp2 = gen_reg_rtx (V4DFmode);
+  rtx tmp3 = gen_reg_rtx (V4DFmode);
+  
   emit_insn (gen_avx_haddv4df3 (tmp, operands[1], operands[1]));
   emit_insn (gen_avx_vperm2f128v4df3 (tmp2, tmp, tmp, GEN_INT (1)));
-  emit_insn (gen_addv4df3 (operands[0], tmp, tmp2));
+  emit_insn (gen_addv4df3 (tmp3, tmp, tmp2));
+  emit_insn (gen_vec_extractv4df (operands[0], tmp3, GEN_INT (1)));
   DONE;
 })
 
-(define_expand reduc_splus_v2df
-  [(match_operand:V2DF 0 register_operand)
+(define_expand reduc_plus_scal_v2df
+  [(match_operand:DF 0 register_operand)
(match_operand:V2DF 1 register_operand)]
   TARGET_SSE3
 {
-  emit_insn (gen_sse3_haddv2df3 (operands[0], operands[1], operands[1]));
+  rtx tmp = gen_reg_rtx (V2DFmode);
+  emit_insn (gen_sse3_haddv2df3 (tmp, operands[1], operands[1]));
+  emit_insn (gen_vec_extractv2df (operands[0], tmp, GEN_INT (0)));
   DONE;
 })
 
-(define_expand reduc_splus_v16sf
-  [(match_operand:V16SF 0 register_operand)
+(define_expand reduc_plus_scal_v16sf
+  [(match_operand:SF 0 register_operand)
(match_operand:V16SF 1 register_operand)]
   TARGET_AVX512F
 {
@@ -2278,30 +2283,35 @@
   DONE;
 })
 
-(define_expand reduc_splus_v8sf
-  [(match_operand:V8SF 0 register_operand)
+(define_expand reduc_plus_scal_v8sf
+  [(match_operand:SF 0 register_operand)
(match_operand:V8SF 1 register_operand)]
   TARGET_AVX
 {
   rtx tmp = gen_reg_rtx (V8SFmode);
   rtx tmp2 = gen_reg_rtx (V8SFmode);
+  rtx tmp3 = gen_reg_rtx (V8SFmode);
+  
   emit_insn (gen_avx_haddv8sf3 (tmp, operands[1], operands[1]));
   emit_insn (gen_avx_haddv8sf3 (tmp2, tmp, tmp));
   emit_insn (gen_avx_vperm2f128v8sf3 (tmp, tmp2, tmp2, GEN_INT (1)));
-  emit_insn (gen_addv8sf3 (operands[0], tmp, tmp2));
+  emit_insn (gen_addv8sf3 (tmp3, tmp, tmp2));
+  emit_insn (gen_vec_extractv8sf (operands[0], tmp3, GEN_INT (0)));
   DONE;
 })
 

[PATCH] c11-atomic-exec-5: Avoid dead code where LDBL_MANT_DIG is 106

2014-10-24 Thread Maciej W. Rozycki
Hi,

 Commit 216437 missed a part of Adhemerval's original change that made 
`long_double_add_overflow', `complex_long_double_add_overflow', 
`long_double_sub_overflow' and `complex_long_double_sub_overflow' tests 
consistently defined only if called.  These tests are now only made under 
the `LDBL_MANT_DIG != 106' condition, otherwise there is no need to 
provide definitions that become dead code.

 Here's the missing part, I have verified the source still builds after 
the change manually with:

$ gcc -U__LDBL_MANT_DIG__ -D__LDBL_MANT_DIG__=113 -Wunused-function -std=c11 
-pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L -lm -latomic -o 
c11-atomic-exec-5 c11-atomic-exec-5.c

and:

$ gcc -U__LDBL_MANT_DIG__ -D__LDBL_MANT_DIG__=106 -Wunused-function -std=c11 
-pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L -lm -latomic -o 
c11-atomic-exec-5 c11-atomic-exec-5.c

It also passed regression testing with the powerpc-gnu-linux target and my 
usual multilibs that have LDBL_MANT_DIG set to 106, which is the only case 
this change really affects.

 Without this change I get this instead:

$ gcc -U__LDBL_MANT_DIG__ -D__LDBL_MANT_DIG__=113 -Wunused-function -std=c11 
-pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L -lm -latomic -o 
c11-atomic-exec-5 c11-atomic-exec-5.c
$

(OK), and:

$ gcc -U__LDBL_MANT_DIG__ -D__LDBL_MANT_DIG__=106 -Wunused-function -std=c11 
-pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L -lm -latomic -o 
c11-atomic-exec-5 c11-atomic-exec-5.c
c11-atomic-exec-5.c:62:1: warning: 'test_main_long_double_add_overflow' 
defined but not used [-Wunused-function]
 test_main_##NAME (void)   \
 ^
c11-atomic-exec-5.c:334:1: note: in expansion of macro 'TEST_FUNCS'
 TEST_FUNCS (long_double_add_overflow, long double, , += LDBL_MAX, 0,
 ^
c11-atomic-exec-5.c:62:1: warning: 
'test_main_complex_long_double_add_overflow' defined but not used 
[-Wunused-function]
 test_main_##NAME (void)   \
 ^
c11-atomic-exec-5.c:352:1: note: in expansion of macro 'TEST_FUNCS'
 TEST_FUNCS (complex_long_double_add_overflow, _Complex long double, , += 
LDBL_MAX, 0,
 ^
c11-atomic-exec-5.c:62:1: warning: 'test_main_long_double_sub_overflow' 
defined but not used [-Wunused-function]
 test_main_##NAME (void)   \
 ^
c11-atomic-exec-5.c:358:1: note: in expansion of macro 'TEST_FUNCS'
 TEST_FUNCS (long_double_sub_overflow, long double, , -= LDBL_MAX, 0,
 ^
c11-atomic-exec-5.c:62:1: warning: 
'test_main_complex_long_double_sub_overflow' defined but not used 
[-Wunused-function]
 test_main_##NAME (void)   \
 ^
c11-atomic-exec-5.c:376:1: note: in expansion of macro 'TEST_FUNCS'
 TEST_FUNCS (complex_long_double_sub_overflow, _Complex long double, , -= 
LDBL_MAX, 0,
 ^
$

(not quite so).

 This also wraps the definitions of the `NOT_LDBL_EPSILON_2' and 
`NOT_MINUS_LDBL_EPSILON_2' macros into this condition, but these aren't 
referred to if `LDBL_MANT_DIG' is 106 either.

 No changes compared to original code so all credit goes to Adhemerval.  
OK to apply?

2014-10-24  Adhemerval Zanella  azane...@linux.vnet.ibm.com

gcc/testsuite/
* gcc.dg/atomic/c11-atomic-exec-5.c 
(test_main_long_double_add_overflow): Only actually define if
LDBL_MANT_DIG != 106.
(test_main_complex_long_double_add_overflow): Likewise.
(test_main_long_double_sub_overflow): Likewise.
(test_main_complex_long_double_sub_overflow): Likewise.

(NOT_LDBL_EPSILON_2): Likewise.
(NOT_MINUS_LDBL_EPSILON_2): Likewise.

  Maciej

gcc-r216437-azanella-rs6000-atomic-assign-expand-env-update.diff
Index: gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
===
--- gcc-fsf-trunk-quilt.orig/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
2014-10-22 21:59:45.788954624 +0100
+++ gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c 
2014-10-22 21:59:15.788143775 +0100
@@ -331,11 +331,11 @@ TEST_FUNCS (complex_double_div_overflow,
 TEST_FUNCS (long_double_add_invalid, long double, , += __builtin_infl (), 0,
0, __builtin_isinf, 0,
-__builtin_infl (), FE_INVALID)
+#if LDBL_MANT_DIG != 106
 TEST_FUNCS (long_double_add_overflow, long double, , += LDBL_MAX, 0,
LDBL_MAX, __builtin_isinf, FE_OVERFLOW | FE_INEXACT,
0, 0)
 #define NOT_LDBL_EPSILON_2(X) ((X) != LDBL_EPSILON / 2)
-#if LDBL_MANT_DIG != 106
 TEST_FUNCS (long_double_add_inexact, long double, , += LDBL_EPSILON / 2, 0,
1.0L, NOT_LDBL_EPSILON_2, FE_INEXACT,
0, 0)
@@ -348,18 +348,18 @@ TEST_FUNCS (long_double_preinc_inexact, 
 TEST_FUNCS (long_double_postinc_inexact, long double, , ++, 0,
LDBL_EPSILON / 2, NOT_MINUS_1, FE_INEXACT,
-1, 0)
-#endif
 TEST_FUNCS (complex_long_double_add_overflow, _Complex long double, , += 
LDBL_MAX, 0,
LDBL_MAX, REAL_ISINF, FE_OVERFLOW | FE_INEXACT,
0, 0)
+#endif
 TEST_FUNCS (long_double_sub_invalid, long 

[Protopatch 11/11][IA64] Migrate to reduc_(plus|min|max)_scal_v2df optab

2014-10-24 Thread Alan Lawrence
This is an attempt to migrate IA64 to the newer optabs; however, I found none of 
the tests in gcc.dg/vect seemed to touch any of the affected patterns, so this 
is only really tested by building a stage-1 compiler.


gcc/ChangeLog:

* config/ia64/vect.md (reduc_splus_v2sf): Rename to...
(reduc_plus_scal_v2sf): ...this, add a vec_extractv2sf.
(reduc_smin_v2sf): Rename to...
(reduc_smin_scal_v2sf): ...this, add a vec_extractv2sf.
(reduc_smax_v2sf): Rename to...
(reduc_smax_scal_v2sf): ...this, add a vec_extractv2sf.



[PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-10-24 Thread Alan Lawrence
This migrates the reduction patterns in altivec.md and vector.md to the new 
names. I've not touched paired.md as I wasn't really sure how to fix that (how 
do I vec_extractv2sf ?), moreover the testing I did didn't seem to exercise any 
of those patterns (iow: I'm not sure what would be an appropriate target machine?).


I note the reduc_uplus_v16qi (which I've removed, as unsigned and signed 
addition should be equivalent) differed from reduc_splus_v16qi in using 
gen_altivec_vsum4ubs rather than gen_altivec_vsum4sbs.  Testcases 
gcc.dg/vect/{slp-24-big-array.c,slp-24.c,vect-reduc-1char-big-array.c,vect-reduc-1char.c} 
thus produce assembly which differs from previously (only) in that vsum4ubs 
becomes vsum4sbs. These tests are still passing so I assume this is OK.


The combining of signed and unsigned addition also improves 
gcc.dg/vect/{vect-outer-4i.c,vect-reduc-1short.c,vect-reduc-dot-u8b.c,vect-reduc-pattern-1c-big-array.c,vect-reduc-pattern-1c.c} 
: these are now reduced using direct vector reduction, rather than with shifts 
as previously (because there was only a reduc_splus rather than the reduc_uplus 
these tests looked for).


((Side note: the RTL changes to vector.md are to match the combine patterns in 
vsx.md; now that we now longer depend upon combine to generate those patterns 
(as the optab outputs them directly), one might wish to remove the smaller 
pattern from vsx.md, and/or simplify the RTL. I theorize that a reduction of a 
two-element vector is just adding the first element to the second, so maybe to 
something like


  [(parallel [(set (match_operand:DF 0 vfloat_operand )
   (VEC_reduc:V2DF
(vec_select:DF
 (match_operand:V2DF 1 vfloat_operand )
 (parallel [(const_int 1)]))
(vec_select:DF
 (match_dup 1)
 (parallel [(const_int 0)]
  (clobber (match_scratch:V2DF 2 ))])]

but I think it's best for me to leave that to the port maintainers.))

Bootstrapped and check-gcc on powerpc64-none-linux-gnu (gcc110.fsffrance.org, 
with thanks to the GCC Compile Farm).


gcc/ChangeLog:

* config/rs6000/altivec.md (reduc_splus_mode): Rename to...
(reduc_plus_scal_mode): ...this, and rs6000_expand_vector_extract.
(reduc_uplus_v16qi): Remove.

* config/rs6000/vector.md (VEC_reduc_name): change splus to plus
(reduc_VEC_reduc_name_v2df): Rename to...
(reduc_VEC_reduc_name_scal_v2df): ...this, wrap VEC_reduc in a
vec_select of element 1.
(reduc_VEC_reduc_name_v4sf): Rename to...
(reduc_VEC_reduc_name_scal_v4sf): ...this, wrap VEC_reduc in a
vec_select of element 3, add scratch register.



Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-10-24 Thread Alan Lawrence

Ooops, attached.

commit e48d59399722ce8316d4b1b4f28b40d87b1193fa
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Tue Oct 7 15:28:47 2014 +0100

PowerPC v2 (but not paired.md)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 02ea142..92bb5d0 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2596,35 +2596,22 @@
   operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
 })
 
-(define_expand reduc_splus_mode
-  [(set (match_operand:VIshort 0 register_operand =v)
+(define_expand reduc_plus_scal_mode
+  [(set (match_operand:VI_scalar 0 register_operand =v)
 (unspec:VIshort [(match_operand:VIshort 1 register_operand v)]
 			UNSPEC_REDUC_PLUS))]
   TARGET_ALTIVEC
 {
   rtx vzero = gen_reg_rtx (V4SImode);
   rtx vtmp1 = gen_reg_rtx (V4SImode);
-  rtx dest = gen_lowpart (V4SImode, operands[0]);
+  rtx vtmp2 = gen_reg_rtx (MODEmode);
+  rtx dest = gen_lowpart (V4SImode, vtmp2);
+  HOST_WIDE_INT last_elem = GET_MODE_NUNITS (MODEmode) - 1;
 
   emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
   emit_insn (gen_altivec_vsum4sVI_chars (vtmp1, operands[1], vzero));
   emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
-  DONE;
-})
-
-(define_expand reduc_uplus_v16qi
-  [(set (match_operand:V16QI 0 register_operand =v)
-(unspec:V16QI [(match_operand:V16QI 1 register_operand v)]
-		  UNSPEC_REDUC_PLUS))]
-  TARGET_ALTIVEC
-{
-  rtx vzero = gen_reg_rtx (V4SImode);
-  rtx vtmp1 = gen_reg_rtx (V4SImode);
-  rtx dest = gen_lowpart (V4SImode, operands[0]);
-
-  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
-  emit_insn (gen_altivec_vsum4ubs (vtmp1, operands[1], vzero));
-  emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
+  rs6000_expand_vector_extract (operands[0], vtmp2, last_elem);
   DONE;
 })
 
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 237724e..54b18aa 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -81,7 +81,7 @@
 ;; Vector reduction code iterators
 (define_code_iterator VEC_reduc [plus smin smax])
 
-(define_code_attr VEC_reduc_name [(plus splus)
+(define_code_attr VEC_reduc_name [(plus plus)
   (smin smin)
   (smax smax)])
 
@@ -1077,18 +1077,20 @@
 
 ;; Vector reduction expanders for VSX
 
-(define_expand "reduc_<VEC_reduc_name>_v2df"
-  [(parallel [(set (match_operand:V2DF 0 "vfloat_operand" "")
-		   (VEC_reduc:V2DF
-		    (vec_concat:V2DF
-		     (vec_select:DF
-		      (match_operand:V2DF 1 "vfloat_operand" "")
-		      (parallel [(const_int 1)]))
-		     (vec_select:DF
-		      (match_dup 1)
-		      (parallel [(const_int 0)])))
-		    (match_dup 1)))
-	      (clobber (match_scratch:V2DF 2 ""))])]
+(define_expand "reduc_<VEC_reduc_name>_scal_v2df"
+  [(parallel [(set (match_operand:DF 0 "vfloat_operand" "")
+		   (vec_select:DF
+		    (VEC_reduc:V2DF
+		     (vec_concat:V2DF
+		      (vec_select:DF
+		       (match_operand:V2DF 1 "vfloat_operand" "")
+		       (parallel [(const_int 1)]))
+		      (vec_select:DF
+		       (match_dup 1)
+		       (parallel [(const_int 0)])))
+		     (match_dup 1))
+		    (parallel [(const_int 1)])))
+	      (clobber (match_scratch:DF 2 ""))])]
   "VECTOR_UNIT_VSX_P (V2DFmode)"
   "")
 
@@ -1099,13 +1101,16 @@
 ; is to allow us to use a code iterator, but not completely list all of the
 ; vector rotates, etc. to prevent canonicalization
 
-(define_expand "reduc_<VEC_reduc_name>_v4sf"
-  [(parallel [(set (match_operand:V4SF 0 "vfloat_operand" "")
-		   (VEC_reduc:V4SF
-		    (unspec:V4SF [(const_int 0)] UNSPEC_REDUC)
-		    (match_operand:V4SF 1 "vfloat_operand" "")))
+(define_expand "reduc_<VEC_reduc_name>_scal_v4sf"
+  [(parallel [(set (match_operand:SF 0 "vfloat_operand" "")
+		   (vec_select:SF
+		    (VEC_reduc:V4SF
+		     (unspec:V4SF [(const_int 0)] UNSPEC_REDUC)
+		     (match_operand:V4SF 1 "vfloat_operand" ""))
+		    (parallel [(const_int 3)])))
 	      (clobber (match_scratch:V4SF 2 ""))
-	      (clobber (match_scratch:V4SF 3 ""))])]
+	      (clobber (match_scratch:V4SF 3 ""))
+	      (clobber (match_scratch:V4SF 4 ""))])]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "")
 

Re: [Protopatch 11/11][IA64] Migrate to reduc_(plus|min|max)_scal_v2df optab

2014-10-24 Thread Alan Lawrence

Ooops, attached.

commit 56296417b9f6795e541b1101dce6e6ac1789de9a
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Wed Oct 8 15:58:27 2014 +0100

IA64 (?!)

diff --git a/gcc/config/ia64/vect.md b/gcc/config/ia64/vect.md
index e3ce292..45f4156 100644
--- a/gcc/config/ia64/vect.md
+++ b/gcc/config/ia64/vect.md
@@ -1217,45 +1217,54 @@
   "fpmin %0 = %1, %2"
   [(set_attr "itanium_class" "fmisc")])
 
-(define_expand "reduc_splus_v2sf"
-  [(match_operand:V2SF 0 "fr_register_operand" "")
+(define_expand "reduc_plus_scal_v2sf"
+  [(match_operand:SF 0 "fr_register_operand" "")
    (match_operand:V2SF 1 "fr_register_operand" "")]
   ""
 {
   rtx tmp = gen_reg_rtx (V2SFmode);
+  rtx tmp2 = gen_reg_rtx (V2SFmode);
+
   if (TARGET_BIG_ENDIAN)
 emit_insn (gen_fswap (tmp, CONST0_RTX (V2SFmode), operands[1]));
   else
 emit_insn (gen_fswap (tmp, operands[1], CONST0_RTX (V2SFmode)));
-  emit_insn (gen_addv2sf3 (operands[0], operands[1], tmp));
+  emit_insn (gen_addv2sf3 (tmp2, operands[1], tmp));
+  emit_insn (gen_vec_extractv2sf (operands[0], tmp2, GEN_INT (0)));
   DONE;
 })
 
-(define_expand "reduc_smax_v2sf"
-  [(match_operand:V2SF 0 "fr_register_operand" "")
+(define_expand "reduc_smax_scal_v2sf"
+  [(match_operand:SF 0 "fr_register_operand" "")
    (match_operand:V2SF 1 "fr_register_operand" "")]
   ""
 {
   rtx tmp = gen_reg_rtx (V2SFmode);
+  rtx tmp2 = gen_reg_rtx (V2SFmode);
+
   if (TARGET_BIG_ENDIAN)
 emit_insn (gen_fswap (tmp, CONST0_RTX (V2SFmode), operands[1]));
   else
 emit_insn (gen_fswap (tmp, operands[1], CONST0_RTX (V2SFmode)));
-  emit_insn (gen_smaxv2sf3 (operands[0], operands[1], tmp));
+  emit_insn (gen_smaxv2sf3 (tmp2, operands[1], tmp));
+  emit_insn (gen_vec_extractv2sf (operands[0], tmp2, GEN_INT (0)));
   DONE;
 })
 
-(define_expand "reduc_smin_v2sf"
-  [(match_operand:V2SF 0 "fr_register_operand" "")
+(define_expand "reduc_smin_scal_v2sf"
+  [(match_operand:SF 0 "fr_register_operand" "")
    (match_operand:V2SF 1 "fr_register_operand" "")]
   ""
 {
   rtx tmp = gen_reg_rtx (V2SFmode);
+  rtx tmp2 = gen_reg_rtx (V2SFmode);
+
   if (TARGET_BIG_ENDIAN)
 emit_insn (gen_fswap (tmp, CONST0_RTX (V2SFmode), operands[1]));
   else
 emit_insn (gen_fswap (tmp, operands[1], CONST0_RTX (V2SFmode)));
-  emit_insn (gen_sminv2sf3 (operands[0], operands[1], tmp));
+  emit_insn (gen_sminv2sf3 (tmp2, operands[1], tmp));
+  emit_insn (gen_vec_extractv2sf (operands[0], tmp2, GEN_INT (0)));
   DONE;
 })
 

[v3] Minor atomic tweaks

2014-10-24 Thread Paolo Carlini

Hi,

tested x86_64-linux.

Thanks,
Paolo.

///
2014-10-24  Paolo Carlini  paolo.carl...@oracle.com

* include/bits/atomic_base.h: Avoid including stdbool.h.
* include/std/atomic: When __cplusplus < 201103L skip the rest of
the header.
* testsuite/29_atomics/headers/atomic/std_c++0x_neg.cc: Adjust.
Index: include/bits/atomic_base.h
===
--- include/bits/atomic_base.h  (revision 216624)
+++ include/bits/atomic_base.h  (working copy)
@@ -33,7 +33,6 @@
 #pragma GCC system_header
 
 #include <bits/c++config.h>
-#include <stdbool.h>
 #include <stdint.h>
 #include <bits/atomic_lockfree_defines.h>
 
Index: include/std/atomic
===
--- include/std/atomic  (revision 216624)
+++ include/std/atomic  (working copy)
@@ -36,7 +36,7 @@
 
 #if __cplusplus < 201103L
 # include <bits/c++0x_warning.h>
-#endif
+#else
 
 #include <bits/atomic_base.h>
 
@@ -1129,4 +1129,6 @@
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
-#endif
+#endif // C++11
+
+#endif // _GLIBCXX_ATOMIC
Index: testsuite/29_atomics/headers/atomic/std_c++0x_neg.cc
===
--- testsuite/29_atomics/headers/atomic/std_c++0x_neg.cc(revision 
216624)
+++ testsuite/29_atomics/headers/atomic/std_c++0x_neg.cc(working copy)
@@ -18,7 +18,7 @@
 // with this library; see the file COPYING3.  If not see
 // http://www.gnu.org/licenses/.
 
-#include <atomic>  // { dg-excess-errors "In file included from" }
+#include <atomic>
 
 // { dg-error "ISO C.. 2011" "" { target *-*-* } 32 }
 


Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-10-24 Thread Rainer Orth
Alan Lawrence alan.lawre...@arm.com writes:

 Rainer Orth wrote:
 However, as a quick first step, does adding the ilp32 / lp64 (and keeping
 the architectures list for now) solve the immediate problem? Patch
 attached, OK for trunk?

 No, as I said this is wrong for biarch targets like sparc and i386.

 When you say no this does not solve the immediate problem, are you saying
 that you are (still) seeing test failures with the require-effective-target
 patch applied? Or is the issue that this would not execute the tests as

I didn't try that patch yet, but the target part is wrong, as I tried to
explain.  Consider the sparc case: 

* if you configure for sparc-sun-solaris2.11, you default to -m32
  (i.e. ilp32), while -m64 is lp64

* if you configure for sparcv9-sun-solaris2.11 instead, you default to
  -m64 (lp64), but get ilp32 with -m32

So, irrespective of the sparc vs. sparc64 (which is wrong, btw., the
canonical form for 64-bit-default sparc is sparcv9) forms, you can get
ilp32 and lp64 with both.

Similar issues hold for i?86 vs. x86_64 and probably other biarch
targets like powerpc vs. powerpc64, so you need to use the most generic
forms of the target names in your target lists.

 widely as might be possible? In principle I'm quite happy to relax the
 target patterns, although have been having issues with sparc (below)...

 Re. what the architectures have in common is largely that these are the
 primary/secondary archs on which I've checked the test passes! I can now
 add mips and microblaze to this list, however I'm nervous of dropping the
 target entirely given the very large number of target architectures gcc
 supports; and e.g. IA64 (in ILP32 mode) generates an ashiftrt:DI by 31
 places, not ashiftrt:SI, which does not match the simplification criteria
 in combine.c.

As I stated before, such target lists without any explanation are bound
to confuse future readers/testers: at the very least, add comments
explaining what those lists have in common.  OTOH, at this stage it
might be best to just drop the target list for now, learn which targets
pass and fail the tests, and then reintroduce them or, better yet, add
an effective-target keyword which matches them.  Otherwise, you'll never
get test coverage beyond your current list.

 This should be something like 

   { target aarch64*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* }

 E.g. sparc-sun-solaris2.11 with -m64 is lp64, but would be excluded by
 your target list.  Keep the list sorted alphabetically and best add an
 explanation so others know what those targets have in common.

 So I've built a stage-1 compiler with --target=sparc-sun-solaris2.11, and I 
 find

   * without -m64, my dg-require-effective-target ilp32 causes the 32-bit
 test to execute, and pass; dg-require-effective-target lp64 prevents
 execution of the 64-bit test (which would fail) - so all as expected and
 desired.

   * with -lp64, behaviour is as previous (this is probably expected)

Huh?  What's -lp64?

   * with -m64, dg-require-effective-target ilp32 still causes the test to
 execute (but it fails, as the RTL now has an ashiftrt:DI by 31 places,
 which doesn't meet the simplification criteria in combine.c - this is
 pretty much as expected). dg-require-effective-target lp64 stops the
 64-bit test from executing however (despite that it would now pass).

 Can you clarify what I should be doing on sparc, therefore?

It's not only about sparc, but about all biarch targets.  The following
patch (which only includes the parts strictly necessary to avoid the
failures, nothing else I suggested above) works for me on
sparc-sun-solaris2.11 (-m32 and -m64), x86_64-unknown-linux-gnu (-m64
and -m32), and i686-unknown-linux-gnu (-m32 and -m64): the first test is
run for 64-bit only, while the second one only for 32-bit:

diff --git a/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c b/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c
--- a/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c
+++ b/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c
@@ -1,4 +1,5 @@
-/* { dg-do compile {target sparc64*-*-* aarch64*-*-* x86_64-*-* powerpc64*-*-*} } */
+/* { dg-do compile { target aarch64*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* } } */
+/* { dg-require-effective-target lp64 } */
 /* { dg-options "-O2 -fdump-rtl-combine-all" } */
 
 typedef long long int int64_t;
diff --git a/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c b/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c
--- a/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c
+++ b/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c
@@ -1,4 +1,5 @@
-/* { dg-do compile {target arm*-*-* i?86-*-* powerpc-*-* sparc-*-*} } */
+/* { dg-do compile { target arm*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* } } */
+/* { dg-require-effective-target ilp32 } */
 /* { dg-options "-O2 -fdump-rtl-combine-all" } */
 
 typedef long int32_t;

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH][6/n] Merge from match-and-simplify, make forwprop fold all stmts

2014-10-24 Thread Richard Biener

This patch makes GIMPLE forwprop fold all statements, following
single-use SSA edges only (as suggested by Jeff and certainly
how this will regress the least until we replace manual
simplification code that does not restrict itself this way).

forwprop is run up to 4 times at the moment (once only for -Og,
not at all for -O0), which still seems reasonable.  IMHO the
forwprop pass immediately after inlining is somewhat superfluous,
it was added there just for its ADDR_EXPR propagation.  We should
eventually split this pass into two.

Note that just folding what we propagated into (like the SSA
propagators do during substitute-and-fold phase) will miss
cases where we propagate into a stmt feeding the one we could
simplify.  Unless we always fold all single-use (and their use)
stmts we have to fold everything from time to time.  Changing
how / when we fold stuff is certainly sth to look after with
fold_stmt now being able to follow SSA edges.

Bootstrapped on x86_64-unknown-linux-gnu, testing still in progress.

From earlier testing I remember I need to adjust a few testcases
that don't expect the early folding - notably two strlenopt cases
(previously XFAILed but then PASSed again).

I also expect to massage the single-use heuristic as I get to
merging the patterns I added for the various forwprop manual
pattern matchings to trunk (a lot of them do not restrict themselves
this way).

Does this otherwise look ok?

Thanks,
Richard.

2014-10-24  Richard Biener  rguent...@suse.de

* tree-ssa-forwprop.c: Include tree-cfgcleanup.h and tree-into-ssa.h.
(lattice): New global.
(fwprop_ssa_val): New function.
(fold_all_stmts): Likewise.
(pass_forwprop::execute): Finally fold all stmts.

Index: gcc/tree-ssa-forwprop.c
===
--- gcc/tree-ssa-forwprop.c 
(svn+ssh://rgue...@gcc.gnu.org/svn/gcc/trunk/gcc/tree-ssa-forwprop.c)   
(revision 216631)
+++ gcc/tree-ssa-forwprop.c (.../gcc/tree-ssa-forwprop.c)   (working copy)
@@ -54,6 +54,8 @@ along with GCC; see the file COPYING3.
 #include "tree-ssa-propagate.h"
 #include "tree-ssa-dom.h"
 #include "builtins.h"
+#include "tree-cfgcleanup.h"
+#include "tree-into-ssa.h"
 
 /* This pass propagates the RHS of assignment statements into use
sites of the LHS of the assignment.  It's basically a specialized
@@ -3586,6 +3588,93 @@ simplify_mult (gimple_stmt_iterator *gsi
 
   return false;
 }
+
+
+/* Const-and-copy lattice for fold_all_stmts.  */
+static vec<tree> lattice;
+
+/* Primitive lattice function for gimple_simplify.  */
+
+static tree
+fwprop_ssa_val (tree name)
+{
+  /* First valueize NAME.  */
+  if (TREE_CODE (name) == SSA_NAME
+      && SSA_NAME_VERSION (name) < lattice.length ())
+{
+  tree val = lattice[SSA_NAME_VERSION (name)];
+  if (val)
+   name = val;
+}
+  /* If NAME is not the only use signal we don't want to continue
+ matching into its definition.  */
+  if (TREE_CODE (name) == SSA_NAME
+      && !has_single_use (name))
+return NULL_TREE;
+  return name;
+}
+
+/* Fold all stmts using fold_stmt following only single-use chains
+   and using a simple const-and-copy lattice.  */
+
+static bool
+fold_all_stmts (struct function *fun)
+{
+  bool cfg_changed = false;
+
+  /* Combine stmts with the stmts defining their operands.  Do that
+ in an order that guarantees visiting SSA defs before SSA uses.  */
+  lattice.create (num_ssa_names);
+  lattice.quick_grow_cleared (num_ssa_names);
+  int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (fun));
+  int postorder_num = inverted_post_order_compute (postorder);
+  for (int i = 0; i < postorder_num; ++i)
+{
+  basic_block bb = BASIC_BLOCK_FOR_FN (fun, postorder[i]);
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+  !gsi_end_p (gsi); gsi_next (gsi))
+   {
+ gimple stmt = gsi_stmt (gsi);
+ gimple orig_stmt = stmt;
+
	 if (fold_stmt (&gsi, fwprop_ssa_val))
+   {
+ stmt = gsi_stmt (gsi);
		 if (maybe_clean_or_replace_eh_stmt (orig_stmt, stmt)
		     && gimple_purge_dead_eh_edges (bb))
+   cfg_changed = true;
+ /* Cleanup the CFG if we simplified a condition to
+true or false.  */
		 if (gimple_code (stmt) == GIMPLE_COND
		     && (gimple_cond_true_p (stmt)
			 || gimple_cond_false_p (stmt)))
+   cfg_changed = true;
+ update_stmt (stmt);
+   }
+
+ /* Fill up the lattice.  */
+ if (gimple_assign_single_p (stmt))
+   {
+ tree lhs = gimple_assign_lhs (stmt);
+ tree rhs = gimple_assign_rhs1 (stmt);
+ if (TREE_CODE (lhs) == SSA_NAME)
+   {
+ if (TREE_CODE (rhs) == SSA_NAME)
+   lattice[SSA_NAME_VERSION (lhs)] = fwprop_ssa_val (rhs);
+ else if (is_gimple_min_invariant (rhs))
+  

Re: Patch committed: Don't define TARGET_HAS_F_SETLKW

2014-10-24 Thread Andreas Schwab
Ian Taylor i...@golang.org writes:

 2014-10-23  Ian Lance Taylor  i...@google.com

 * config/mep/mep.h (TARGET_HAS_F_SETLKW): Don't define.

s/define/undefine/

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


[PATCH] Fix modulo patterns in match.pd

2014-10-24 Thread Richard Biener

As noted by Marc I forgot to actually utilize the iterator variable.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

PS: How do we want to refer to patterns in ChangeLogs?

2014-10-24  Richard Biener  rguent...@suse.de

* match.pd (0 % X): Properly use the iterator iterating over
all modulo operators.
(X % 1): Likewise.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 216648)
+++ gcc/match.pd(working copy)
@@ -64,13 +64,13 @@ (define_predicates
 (for op (ceil_mod floor_mod round_mod trunc_mod)
  /* 0 % X is always zero.  */
  (simplify
-  (trunc_mod integer_zerop@0 @1)
+  (op integer_zerop@0 @1)
   /* But not for 0 % 0 so that we can get the proper warnings and errors.  */
   (if (!integer_zerop (@1))
@0))
  /* X % 1 is always zero.  */
  (simplify
-  (trunc_mod @0 integer_onep)
+  (op @0 integer_onep)
   { build_zero_cst (type); }))
 
 /* x | ~0 -> ~0  */
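
For readers unfamiliar with match.pd: the (for ...) form clones each enclosed (simplify ...) once per listed operator, substituting it for the iterator variable `op`. Conceptually, the fixed "0 % X" pattern now expands to four copies instead of four identical trunc_mod ones (illustrative sketch, not generated code):

```lisp
;; Hypothetical expansion sketch of the fixed pattern: one simplify
;; per modulo code, rather than trunc_mod repeated four times.
(simplify (ceil_mod  integer_zerop@0 @1) (if (!integer_zerop (@1)) @0))
(simplify (floor_mod integer_zerop@0 @1) (if (!integer_zerop (@1)) @0))
(simplify (round_mod integer_zerop@0 @1) (if (!integer_zerop (@1)) @0))
(simplify (trunc_mod integer_zerop@0 @1) (if (!integer_zerop (@1)) @0))
```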


Re: [PATCH] Fix modulo patterns in match.pd

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 03:27:19PM +0200, Richard Biener wrote:
 As noted by Marc I forgot to actually utilize the iterator variable.
 
 Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
 
 Richard.
 
 PS: How do we want to refer to patterns in ChangeLogs?

Perhaps the syntax should be (simplify "name" (...) { ... })
(maybe the name being optional?), where you'd give some name to the
simplification, say "0 % X" or "0 % X = 0" or "0 % X variant 3"
or whatever, then you could easily refer to those strings in ChangeLog,
on gcc-patches, in comments etc.

 2014-10-24  Richard Biener  rguent...@suse.de
 
   * match.pd (0 % X): Properly use the iterator iterating over
   all modulo operators.
   (X % 1): Likewise.
 
 Index: gcc/match.pd
 ===
 --- gcc/match.pd  (revision 216648)
 +++ gcc/match.pd  (working copy)
 @@ -64,13 +64,13 @@ (define_predicates
  (for op (ceil_mod floor_mod round_mod trunc_mod)
   /* 0 % X is always zero.  */
   (simplify
 -  (trunc_mod integer_zerop@0 @1)
 +  (op integer_zerop@0 @1)
/* But not for 0 % 0 so that we can get the proper warnings and errors.  */
(if (!integer_zerop (@1))
 @0))
   /* X % 1 is always zero.  */
   (simplify
 -  (trunc_mod @0 integer_onep)
 +  (op @0 integer_onep)
{ build_zero_cst (type); }))
  
  /* x | ~0 - ~0  */

Jakub


Re: [PATCH] Fix modulo patterns in match.pd

2014-10-24 Thread Richard Biener
On Fri, 24 Oct 2014, Jakub Jelinek wrote:

 On Fri, Oct 24, 2014 at 03:27:19PM +0200, Richard Biener wrote:
  As noted by Marc I forgot to actually utilize the iterator variable.
  
  Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
  
  Richard.
  
  PS: How do we want to refer to patterns in ChangeLogs?
 
 Perhaps the syntax should be (simplify name (...) { ... })
 (maybe the name being optional?), where you'd give some name to the
 simplification, say 0 % X or 0 % X = 0 or 0 % X variant 3
 or whatever, then you could easily refer to those strings in ChangeLog,
 on gcc-patches, in comments etc.

I ripped out optional name support when I added user-defined predicates
which look like

(match truth_valued_p
  (truth_not @0))

or

(match (logical_inverted_value @0)
 (bit_not truth_valued_p@0))

(un-)conveniently the parsers for (simplify...) and (match...)
are shared.

I can see to re-add the optional pattern naming.  OTOH it will be
fun to invent an unique name for each of them ;)  (patternN
anyone? ...)

Richard.

  2014-10-24  Richard Biener  rguent...@suse.de
  
  * match.pd (0 % X): Properly use the iterator iterating over
  all modulo operators.
  (X % 1): Likewise.
  
  Index: gcc/match.pd
  ===
  --- gcc/match.pd(revision 216648)
  +++ gcc/match.pd(working copy)
  @@ -64,13 +64,13 @@ (define_predicates
   (for op (ceil_mod floor_mod round_mod trunc_mod)
/* 0 % X is always zero.  */
(simplify
  -  (trunc_mod integer_zerop@0 @1)
  +  (op integer_zerop@0 @1)
 /* But not for 0 % 0 so that we can get the proper warnings and errors.  
  */
 (if (!integer_zerop (@1))
  @0))
/* X % 1 is always zero.  */
(simplify
  -  (trunc_mod @0 integer_onep)
  +  (op @0 integer_onep)
 { build_zero_cst (type); }))
   
   /* x | ~0 - ~0  */


[PATCHv5][Kasan] Allow to override Asan shadow offset from command line

2014-10-24 Thread Yury Gribov

Hi all,

On 10/17/2014 11:53 AM, Yury Gribov wrote:

On 09/29/2014 09:21 PM, Yury Gribov wrote:

Kasan developers have asked for an option to override the offset of the Asan
shadow memory region. This should simplify experimenting with memory
layouts on 64-bit architectures.


New patch which checks that -fasan-shadow-offset is only enabled for
-fsanitize=kernel-address. I (unfortunately) can't make this --param
because this can be a 64-bit value.


New patchset that adds strtoull to libiberty (blind copy-paste of
already existing strtoul.c) and uses it to parse -fasan-shadow-offset
(to avoid problems when compiling for a 64-bit target on a 32-bit host).


A new version of patchset which does a proper implementation of 
strtoll/strtoull in libiberty (with tests, docs and stuff).


Bootstrapped and regtested on x64.

As mentioned previously, I'm not sure how to properly test strtoll 
implementation (strtoll is already part of Linux glibc so my 
implementation is not compiled in by default).  I've manually embedded 
strtoll.o/strtoull.o into libiberty.a and verified that regression tests 
passed.


-Y
From 1882c41de6c8ae53b7e199b3cc655b6f4b31e8fb Mon Sep 17 00:00:00 2001
From: Yury Gribov y.gri...@samsung.com
Date: Thu, 16 Oct 2014 18:31:10 +0400
Subject: [PATCH 1/2] Add strtoll and strtoull to libiberty.

2014-10-20  Yury Gribov  y.gri...@samsung.com

include/
	* libiberty.h (strtol, strtoul, strtoll, strtoull): New prototypes.

libiberty/
	* strtoll.c: New file.
	* strtoull.c: New file.
	* configure.ac: Add long long checks. Add harness for strtoll and
	strtoull. Check decls for strtol, strtoul, strtoll, strtoull.
	* Makefile.in (CFILES, CONFIGURED_OFILES): Added strtoll and strtoull.
	* config.in: Regenerate.
	* configure: Regenerate.
	* functions.texi: Regenerate.
	* testsuite/Makefile.in (check-strtol): New rule.
	(test-strtol): Likewise.
	(mostlyclean): Clean up strtol test.
	* testsuite/test-strtol.c: New test.
---
 include/libiberty.h   |   27 ++
 libiberty/Makefile.in |   46 +++---
 libiberty/config.in   |   31 +++
 libiberty/configure   |  122 +++-
 libiberty/configure.ac|   14 ++-
 libiberty/functions.texi  |   18 
 libiberty/strtoll.c   |  175 +++
 libiberty/strtoull.c  |  122 
 libiberty/testsuite/Makefile.in   |   12 ++-
 libiberty/testsuite/test-strtol.c |  184 +
 10 files changed, 733 insertions(+), 18 deletions(-)
 create mode 100644 libiberty/strtoll.c
 create mode 100644 libiberty/strtoull.c
 create mode 100644 libiberty/testsuite/test-strtol.c

diff --git a/include/libiberty.h b/include/libiberty.h
index d09c9a5..26355a9 100644
--- a/include/libiberty.h
+++ b/include/libiberty.h
@@ -655,6 +655,33 @@ extern size_t strnlen (const char *, size_t);
 extern int strverscmp (const char *, const char *);
 #endif
 
+#if defined(HAVE_DECL_STRTOL) && !HAVE_DECL_STRTOL
+extern long int strtol (const char *nptr,
+char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRTOUL) && !HAVE_DECL_STRTOUL
+extern unsigned long int strtoul (const char *nptr,
+  char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRTOLL) && !HAVE_DECL_STRTOLL
+__extension__
+extern long long int strtoll (const char *nptr,
+  char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRTOULL) && !HAVE_DECL_STRTOULL
+__extension__
+extern unsigned long long int strtoull (const char *nptr,
+char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRVERSCMP) && !HAVE_DECL_STRVERSCMP
+/* Compare version strings.  */
+extern int strverscmp (const char *, const char *);
+#endif
+
 /* Set the title of a process */
 extern void setproctitle (const char *name, ...);
 
diff --git a/libiberty/Makefile.in b/libiberty/Makefile.in
index 9b87720..1b0d8ae 100644
--- a/libiberty/Makefile.in
+++ b/libiberty/Makefile.in
@@ -152,8 +152,8 @@ CFILES = alloca.c argv.c asprintf.c atexit.c\
 	 spaces.c splay-tree.c stack-limit.c stpcpy.c stpncpy.c		\
 	 strcasecmp.c strchr.c strdup.c strerror.c strncasecmp.c	\
 	 strncmp.c strrchr.c strsignal.c strstr.c strtod.c strtol.c	\
-	 strtoul.c strndup.c strnlen.c strverscmp.c			\
-	timeval-utils.c tmpnam.c	\
+	 strtoll.c strtoul.c strtoull.c strndup.c strnlen.c \
+	 strverscmp.c timeval-utils.c tmpnam.c\
 	unlink-if-ordinary.c		\
 	vasprintf.c vfork.c vfprintf.c vprintf.c vsnprintf.c vsprintf.c	\
 	waitpid.c			\
@@ -219,8 +219,8 @@ CONFIGURED_OFILES = ./asprintf.$(objext) ./atexit.$(objext)		\
 	 ./strchr.$(objext) ./strdup.$(objext) ./strncasecmp.$(objext)	\
 	 ./strncmp.$(objext) ./strndup.$(objext) ./strnlen.$(objext)	\
 	 ./strrchr.$(objext) ./strstr.$(objext) ./strtod.$(objext)	\
-	 ./strtol.$(objext) ./strtoul.$(objext) ./strverscmp.$(objext)	

Re: [patch,avr] tweak sign extensions, take #2

2014-10-24 Thread Denis Chertykov
2014-10-24 14:37 GMT+04:00 Georg-Johann Lay a...@gjlay.de:
 Am 10/23/2014 08:16 PM schrieb Denis Chertykov:

 This optimization makes most sign-extensions one instruction shorter in
 the
 case when the source register may be clobbered and the register numbers
 are
 different.  Source and destination may overlap.

 Ok for trunk?

 Johann

 gcc/
  * config/avr/avr.md (extendqihi2, extendqipsi2, extendqisi2)
  (extendhipsi2, extendhisi2): Optimize if source reg is unused
  after the insns and has different REGNO than destination.


 Approved.

 Denis.


 Finally I switched to a solution that avoids all the ugly asm snippets and
 special casing, and which is exact w.r.t code size.  So allow me drop the
 patch from above and to propose this one for trunk.  Sorry for the
 inconvenience.

 In any case it uses LSL/SBC idiom instead of the old CLR/SBRC/COM.


 Johann

 * avr-protos.h (avr_out_sign_extend): New.
 * avr.c (avr_adjust_insn_length) [ADJUST_LEN_SEXT]: Handle.
 (avr_out_sign_extend): New function.
 * avr.md (extendqihi2, extendqipsi2, extendqisi2, extendhipsi2)
 (extendhisi2, extendpsisi2): Use it.
 (adjust_len) [sext]: New.




I agree with you. It's better.
Approved.

Denis.


Re: [PATCHv5][Kasan] Allow to override Asan shadow offset from command line

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 05:56:37PM +0400, Yury Gribov wrote:
 From 1882c41de6c8ae53b7e199b3cc655b6f4b31e8fb Mon Sep 17 00:00:00 2001
 From: Yury Gribov y.gri...@samsung.com
 Date: Thu, 16 Oct 2014 18:31:10 +0400
 Subject: [PATCH 1/2] Add strtoll and strtoull to libiberty.
 
 2014-10-20  Yury Gribov  y.gri...@samsung.com
 
 include/
   * libiberty.h (strtol, strtoul, strtoll, strtoull): New prototypes.
 
 libiberty/
   * strtoll.c: New file.
   * strtoull.c: New file.
   * configure.ac: Add long long checks. Add harness for strtoll and
   strtoull. Check decls for strtol, strtoul, strtoll, strtoull.
   * Makefile.in (CFILES, CONFIGURED_OFILES): Added strtoll and strtoull.
   * config.in: Regenerate.
   * configure: Regenerate.
   * functions.texi: Regenerate.
   * testsuite/Makefile.in (check-strtol): New rule.
   (test-strtol): Likewise.
   (mostlyclean): Clean up strtol test.
   * testsuite/test-strtol.c: New test.

Ian, can you please review this?

 --- a/gcc/common.opt
 +++ b/gcc/common.opt
 @@ -883,6 +883,10 @@ fsanitize=
  Common Driver Report Joined
  Select what to sanitize
  
 +fasan-shadow-offset=
 +Common Joined RejectNegative Var(common_deferred_options) Defer
+-fasan-shadow-offset=<string>	Use custom shadow memory offset.

Shouldn't that be =<number> or =<address> instead of <string>?

 --- a/gcc/doc/invoke.texi
 +++ b/gcc/doc/invoke.texi
 @@ -297,7 +297,7 @@ Objective-C and Objective-C++ Dialects}.
  @xref{Debugging Options,,Options for Debugging Your Program or GCC}.
  @gccoptlist{-d@var{letters}  -dumpspecs  -dumpmachine  -dumpversion @gol
  -fsanitize=@var{style} -fsanitize-recover -fsanitize-recover=@var{style} @gol
 --fsanitize-undefined-trap-on-error @gol
 +-fasan-shadow-offset=@var{string} -fsanitize-undefined-trap-on-error @gol

Likewise here, @var{number} instead.

  -fdbg-cnt-list -fdbg-cnt=@var{counter-value-list} @gol
  -fdisable-ipa-@var{pass_name} @gol
  -fdisable-rtl-@var{pass_name} @gol
 @@ -5642,6 +5642,12 @@ While @option{-ftrapv} causes traps for signed 
 overflows to be emitted,
  @option{-fsanitize=undefined} gives a diagnostic message.
  This currently works only for the C family of languages.
  
 +@item -fasan-shadow-offset=@var{string}

And here.

Otherwise looks good to me.

Jakub


Re: [PATCH][optabs] PR63442 libgcc_cmp_return_mode not always return word_mode

2014-10-24 Thread Jiong Wang

ping~

thanks.

Regards,
Jiong

On 17/10/14 13:04, Jiong Wang wrote:

the cause should be one minor bug in prepare_cmp_insn.

the last mode parameter pmode of prepare_cmp_insn should match the
mode of the first parameter x, but during the recursive call of
prepare_cmp_insn,
x has the mode of targetm.libgcc_cmp_return_mode () while pmode is assigned
word_mode.

generally this is OK, because the default libgcc_cmp_return_mode hook always
returns word_mode,
but AArch64 has a target-private implementation which always returns SImode,
so there is a
mismatch which causes an ICE later.

this minor issue was hidden because nearly all other targets use the default
hook, and the
compare is rarely invoked.

Thanks

gcc/
PR target/63442
* optabs.c (prepare_cmp_insn): Use target hook libgcc_cmp_return_mode 
instead of word_mode.





Re: [PATCH, x86, 63534] Fix '-p' profile for 32 bit PIC mode

2014-10-24 Thread Evgeny Stupachenko
The following patch aligns the stack for the mcount call, and there should be
no problems with unwind info, as ix86_frame_pointer_required is true when
crtl->profile is true and flag_fentry is false (we call mcount after the
function prologue).
When flag_fentry is true it is set to false in 32-bit PIC mode:
  if (!TARGET_64BIT_P (opts->x_ix86_isa_flags) && opts->x_flag_pic)
    {
      if (opts->x_flag_fentry > 0)
	sorry ("-mfentry isn%'t supported for 32-bit in combination "
	       "with -fpic");
      opts->x_flag_fentry = 0;
    }



2014-10-24  Evgeny Stupachenko  evstu...@gmail.com

PR target/63534
* config/i386/i386.c (x86_function_profiler): Add GOT register init
for mcount call.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6235c4f..2dff29c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -39124,13 +39124,22 @@ x86_function_profiler (FILE *file, int
labelno ATTRIBUTE_UNUSED)
   else
x86_print_call_or_nop (file, mcount_name);
 }
+  /* At this stage we can't determine where the GOT register is, as RA can
+     allocate it to any hard register.  Therefore we need to set it once again.  */
   else if (flag_pic)
 {
+  pic_labels_used |= 1 << BX_REG;
+  fprintf (file, "\tsub\t$16, %%esp\n");
+  fprintf (file, "\tmovl\t%%ebx, (%%esp)\n");
+  fprintf (file, "\tcall\t__x86.get_pc_thunk.bx\n");
+  fprintf (file, "\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n");
 #ifndef NO_PROFILE_COUNTERS
   fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%" PROFILE_COUNT_REGISTER "\n",
	    LPREFIX, labelno);
 #endif
   fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
+  fprintf (file, "\tmovl\t(%%esp), %%ebx\n");
+  fprintf (file, "\tadd\t$16, %%esp\n");
 }
   else
 {

On Fri, Oct 17, 2014 at 6:38 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Oct 17, 2014 at 06:30:42PM +0400, Evgeny Stupachenko wrote:
 Hi,

 The patch fixes profile in 32bits PIC mode (only -p option affected).

 x86 bootstrap, make check passed

 spec2000 o2 -p train data on Corei7:
 CINT -5%
 CFP  +1,5
 compared to a compiler before enabling ebx.

 There is a potential performance improve after the patch applied
 suggested by Jakub:
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63534#c8
 There is opened bug on this: PR63527. However the fix of the bug is
 more complicated.

 Is it ok?

 Unfortunately I don't think it is ok.
 1) you don't set the appropriate bit in pic_labels_used (for ebx)
 2) more importantly, it causes the stack to be misaligned (i.e. violating
ABI) for the _mcount call, and, break unwind info.

 2014-10-16  Evgeny Stupachenko  evstu...@gmail.com

 PR target/63534
 * config/i386/i386.c (x86_function_profiler): Add GOT register init
 for mcount call.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index a3ca2ed..5117572 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -39119,11 +39126,15 @@ x86_function_profiler (FILE *file, int
 labelno ATTRIBUTE_UNUSED)
  }
else if (flag_pic)
  {
 +  fprintf (file, "\tpush\t%%ebx\n");
 +  fprintf (file, "\tcall\t__x86.get_pc_thunk.bx\n");
 +  fprintf (file, "\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n");
  #ifndef NO_PROFILE_COUNTERS
    fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%" PROFILE_COUNT_REGISTER "\n",
	    LPREFIX, labelno);
  #endif
    fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
 +  fprintf (file, "\tpop\t%%ebx\n");
  }
else
  {

 Jakub


Unifying std::atomic_int and std::atomic<int>

2014-10-24 Thread Jonathan Wakely

Our <atomic> was implemented (by Benjamin IIRC) based on an early
C++0x draft when the spec was still trying to be valid for both C and
C++. Part of the C compatibility aspect was that std::atomic_int is
allowed to be either a typedef for std::atomic<int> or a base class of
it, so that a C library could define std::atomic_int and then the C++
library could make std::atomic<int> derive from that.

In the final C11 spec atomics work completely differently, and
atomic_int is a typedef for _Atomic int, which is not a valid base
class. So the old C++0x draft's compatibility aim is impossible,
atomic_int can never be the same type in C and C++.

In our implementation, std::atomic_int is a base class of
std::atomic<int>, which has no benefit I can see, but causes
https://gcc.gnu.org/PR60940

Rather than overloading every atomic_op() non-member function to
handle the derived class and the base class, it would be simpler to
just get rid of the base classes and make atomic_xxx a typedef for
atomic<xxx>, as the attached patch does for atomic_{bool,char,schar}.

Does anyone object to that change?

If you object, are you prepared to do the work to fix PR60940? :-)



[Note:- it could probably be simplified even further so atomic<char>
is just:

 template<>
   struct atomic<char> : public __atomic_base<char>
   {
     using __atomic_base<char>::__atomic_base;
   };

But that could be done later as it wouldn't change anything
observable; making atomic_char a typedef for atomic<char> is the
observable and IMHO important change. -end note]

diff --git a/libstdc++-v3/include/bits/atomic_base.h 
b/libstdc++-v3/include/bits/atomic_base.h
index 1fc0ebb..a591c46 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -120,12 +120,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _IntTp>
     struct __atomic_base;
 
-  /// atomic_char
-  typedef __atomic_base<char>		atomic_char;
-
-  /// atomic_schar
-  typedef __atomic_base<signed char>	atomic_schar;
-
   /// atomic_uchar
   typedef __atomic_base<unsigned char>	atomic_uchar;
 
diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index 85dc252..c58853e 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -49,21 +49,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @{
*/
 
-  /// atomic_bool
+  template<typename _Tp>
+    struct atomic;
+
+  /// atomic<bool>
   // NB: No operators or fetch-operations for this type.
-  struct atomic_bool
+  template<>
+  struct atomic<bool>
   {
   private:
     __atomic_base<bool>	_M_base;
 
   public:
-    atomic_bool() noexcept = default;
-    ~atomic_bool() noexcept = default;
-    atomic_bool(const atomic_bool&) = delete;
-    atomic_bool& operator=(const atomic_bool&) = delete;
-    atomic_bool& operator=(const atomic_bool&) volatile = delete;
+    atomic() noexcept = default;
+    ~atomic() noexcept = default;
+    atomic(const atomic&) = delete;
+    atomic& operator=(const atomic&) = delete;
+    atomic& operator=(const atomic&) volatile = delete;
 
-    constexpr atomic_bool(bool __i) noexcept : _M_base(__i) { }
+    constexpr atomic(bool __i) noexcept : _M_base(__i) { }
 
 bool
 operator=(bool __i) noexcept
@@ -151,6 +155,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return _M_base.compare_exchange_strong(__i1, __i2, __m); }
   };
 
+  /// atomic_bool
+  typedef atomic<bool>		atomic_bool;
+
 
   /**
*  @brief Generic atomic type, primary class template.
@@ -485,31 +492,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
 
-  /// Explicit specialization for bool.
-  template<>
-    struct atomic<bool> : public atomic_bool
-    {
-      typedef bool		__integral_type;
-      typedef atomic_bool	__base_type;
-
-      atomic() noexcept = default;
-      ~atomic() noexcept = default;
-      atomic(const atomic&) = delete;
-      atomic& operator=(const atomic&) = delete;
-      atomic& operator=(const atomic&) volatile = delete;
-
-      constexpr atomic(__integral_type __i) noexcept : __base_type(__i) { }
-
-      using __base_type::operator __integral_type;
-      using __base_type::operator=;
-    };
-
   /// Explicit specialization for char.
   template<>
-    struct atomic<char> : public atomic_char
+    struct atomic<char> : public __atomic_base<char>
     {
       typedef char		__integral_type;
-      typedef atomic_char	__base_type;
+      typedef __atomic_base<char>	__base_type;
 
   atomic() noexcept = default;
   ~atomic() noexcept = default;
@@ -523,12 +511,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 };
 
+  /// atomic_char
+  typedef atomic<char>		atomic_char;
+
   /// Explicit specialization for signed char.
   template<>
-    struct atomic<signed char> : public atomic_schar
+    struct atomic<signed char> : public __atomic_base<signed char>
 {
   typedef signed char  __integral_type;
- 

[PATCH] Fix typedef-name printing (PR c/56980)

2014-10-24 Thread Marek Polacek
Our current C pretty printer output sometimes looks a bit goofy:
expected ‘enum F *’ but argument is of type ‘enum F *’.
It's because it always prints struct/union/enum even though
the type is a typedef name.  This patch ought to fix this. 
We've got a bunch of reports about this over the years...

The C++ printer can also print "B* {aka A*}"; I'll try to teach
c_tree_printer to do something similar as well.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-10-24  Marek Polacek  pola...@redhat.com

PR c/56980
* c-pretty-print.c (c_pretty_printer::simple_type_specifier): Don't
print struct/union/enum for typedefed names.

* gcc.dg/pr56980.c: New test.

diff --git gcc/c-family/c-pretty-print.c gcc/c-family/c-pretty-print.c
index 3b2dbc1..9096a07 100644
--- gcc/c-family/c-pretty-print.c
+++ gcc/c-family/c-pretty-print.c
@@ -416,7 +416,9 @@ c_pretty_printer::simple_type_specifier (tree t)
 case UNION_TYPE:
 case RECORD_TYPE:
 case ENUMERAL_TYPE:
-  if (code == UNION_TYPE)
+  if (TYPE_NAME (t) && TREE_CODE (TYPE_NAME (t)) == TYPE_DECL)
+	/* Don't decorate the type if this is a typedef name.  */;
+  else if (code == UNION_TYPE)
	pp_c_ws_string (this, "union");
   else if (code == RECORD_TYPE)
	pp_c_ws_string (this, "struct");
diff --git gcc/testsuite/gcc.dg/pr56980.c gcc/testsuite/gcc.dg/pr56980.c
index e69de29..f48379a 100644
--- gcc/testsuite/gcc.dg/pr56980.c
+++ gcc/testsuite/gcc.dg/pr56980.c
@@ -0,0 +1,24 @@
+/* PR c/56980 */
+/* { dg-do compile } */
+
+typedef struct A { int i; } B;
+typedef union U { int i; } V;
+typedef enum E { G } F;
+
+void foo_s (struct A); /* { dg-message "expected .struct A. but argument is of type .B \\*." } */
+void foo_u (union U); /* { dg-message "expected .union U. but argument is of type .V \\*." } */
+void foo_e (enum E); /* { dg-message "expected .enum E. but argument is of type .F \\*." } */
+void foo_sp (B *); /* { dg-message "expected .B \\*. but argument is of type .struct B \\*." } */
+void foo_up (V *); /* { dg-message "expected .V \\*. but argument is of type .union V \\*." } */
+void foo_ep (F *); /* { dg-message "expected .F \\*. but argument is of type .enum F \\*." } */
+
+void 
+bar (B *b, V *v, F *f)
+{
+  foo_s (b); /* { dg-error "incompatible" } */
+  foo_u (v); /* { dg-error "incompatible" } */
+  foo_e (f); /* { dg-error "incompatible" } */
+  foo_sp ((struct B *) b); /* { dg-error "passing argument" } */
+  foo_up ((union V *) v); /* { dg-error "passing argument" } */
+  foo_ep (__extension__ (enum F *) f); /* { dg-error "passing argument" } */
+}

Marek


Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming

2014-10-24 Thread Ilya Verbin
On 20 Oct 15:19, Ilya Verbin wrote:
 On 15 Oct 16:23, Richard Biener wrote:
   +static bool
   +initialize_offload (void)
   +{
   +  bool have_offload = false;
   +  struct cgraph_node *node;
   +  struct varpool_node *vnode;
   +
   +  FOR_EACH_DEFINED_FUNCTION (node)
    +if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (node->decl)))
   +  {
   + have_offload = true;
   + break;
   +  }
   +
   +  FOR_EACH_DEFINED_VARIABLE (vnode)
   +{
    +  if (!lookup_attribute ("omp declare target",
    +  DECL_ATTRIBUTES (vnode->decl))
    +   || TREE_CODE (vnode->decl) != VAR_DECL
    +   || DECL_SIZE (vnode->decl) == 0)
   + continue;
   +  have_offload = true;
   +}
   +
   +  return have_offload;
   +}
   +
  
  I wonder if we can avoid the above by means of a global have_offload
  flag?  (or inside gcc::context)

   +/* Select what needs to be streamed out.  In regular lto mode stream 
   everything.
   +   In offload lto mode stream only stuff marked with an attribute.  */
   +void
   +select_what_to_stream (bool offload_lto_mode)
   +{
   +  struct symtab_node *snode;
   +  FOR_EACH_SYMBOL (snode)
    +snode->need_lto_streaming
    +  = !offload_lto_mode || lookup_attribute ("omp declare target",
    +DECL_ATTRIBUTES (snode->decl));
  
  I suppose I suggested this already earlier this year.  Why keep this
  artificial attribute when you have a cgraph node flag?
 
   +   /* If '#pragma omp critical' is inside target region, the symbol must
   +  have an 'omp declare target' attribute.  */
   +   omp_context *octx;
    +   for (octx = ctx->outer; octx; octx = octx->outer)
    + if (is_targetreg_ctx (octx))
    +   {
    + DECL_ATTRIBUTES (decl)
    +   = tree_cons (get_identifier ("omp declare target"),
    +NULL_TREE, DECL_ATTRIBUTES (decl));
  
  Here - why not set a flag on cgraph_get_node (decl) instead?
 
 I thought that select_what_to_stream is exactly what you've suggested.
 Could you please clarify this?  You propose to replace the "omp declare target"
 attribute with some cgraph node flag like need_offload?  But we'll need
 need_lto_streaming anyway, since for LTO it should be 1 for all nodes, but for
 offloading it should be equal to need_offload.

We have to set the global have_offload flag in a few places in omp-low.c and in
the FEs (c/c-decl.c:c_decl_attributes, fortran/trans-common.c:build_common_decl,
fortran/trans-decl.c:add_attributes_to_decl).
This way looks a bit more complicated to me than the current approach.

Actually, we could follow Jakub's suggestion of caching the attribute in a bit
field, and set the global have_offload flag on the fly without any changes in
the FEs.  However, I don't know a suitable place for it.  If you agree with the
approach, could you please specify the place?

Thanks,
  -- Ilya


Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 06:16:01PM +0400, Ilya Verbin wrote:
 We have to set the global have_offload flag in a few places in omp-low.c and
 in the FEs
 (c/c-decl.c:c_decl_attributes, fortran/trans-common.c:build_common_decl,
 fortran/trans-decl.c:add_attributes_to_decl).
 This way looks a bit more complicated to me than the current approach.
 
 Actually, we could follow Jakub's suggestion of caching the attribute in a bit
 field, and set the global have_offload flag on the fly without any changes in
 the FEs.  However, I don't know a suitable place for it.  If you agree with the
 approach, could you please specify the place?

Can't you do that when creating the cgraph or varpool nodes?
I'd expect the attribute to be already present on the decls at those spots.

Jakub


Re: [PATCH, x86, 63534] Fix '-p' profile for 32 bit PIC mode

2014-10-24 Thread Jakub Jelinek
On Fri, Oct 24, 2014 at 06:12:15PM +0400, Evgeny Stupachenko wrote:
 The following patch aligns the stack for mcount, and there should be no
 problems with unwind as ix86_frame_pointer_required is true when
 crtl->profile is true and flag_fentry is false (we call mcount after the
 function prologue).
 When flag_fentry is true it is set to false in 32bit PIC mode:
   if (!TARGET_64BIT_P (opts->x_ix86_isa_flags) && opts->x_flag_pic)
 {
   if (opts->x_flag_fentry > 0)
 sorry ("-mfentry isn%'t supported for 32-bit in combination "
   "with -fpic");
   opts->x_flag_fentry = 0;
 }

What is wrong in emitting the set_got right before the PROLOGUE_END
note and that way sharing a single load from both?
This looks just as a hack.

 2014-10-24  Evgeny Stupachenko  evstu...@gmail.com
 
 PR target/63534
 * config/i386/i386.c (x86_function_profiler): Add GOT register init
 for mcount call.
 
 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 6235c4f..2dff29c 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -39124,13 +39124,22 @@ x86_function_profiler (FILE *file, int
 labelno ATTRIBUTE_UNUSED)
else
 x86_print_call_or_nop (file, mcount_name);
  }
 +  /* At this stage we can't determine where GOT register is, as RA can
 + allocate it to any hard register.  Therefore we need to set it once
 + again.  */
else if (flag_pic)
  {
 +  pic_labels_used |= 1 << BX_REG;
 +  fprintf (file, "\tsub\t$16, %%esp\n");
 +  fprintf (file, "\tmovl\t%%ebx, (%%esp)\n");
 +  fprintf (file, "\tcall\t__x86.get_pc_thunk.bx\n");
 +  fprintf (file, "\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n");
  #ifndef NO_PROFILE_COUNTERS
    fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%" PROFILE_COUNT_REGISTER "\n",
	    LPREFIX, labelno);
  #endif
    fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
 +  fprintf (file, "\tmovl\t(%%esp), %%ebx\n");
 +  fprintf (file, "\tadd\t$16, %%esp\n");
  }
else
  {
 

Jakub


Re: [PATCH 3/4] Add libgomp plugin for Intel MIC

2014-10-24 Thread Jakub Jelinek
On Thu, Oct 23, 2014 at 07:41:12PM +0400, Ilya Verbin wrote:
  malloc can fail, SIGSEGV in response to that is not desirable.
  Can't you fallback to alloca, or use just alloca, or use alloca
  with malloc fallback?
 
 I replaced it with alloca.

There is a risk if a suid or otherwise priviledge escalated program
uses it and attacker passes huge env vars.
Perhaps use alloca if it is <= 2KB and malloc otherwise, and in that case
if malloc fails, just do a fatal error?

  Where does this artificial limit come from?  Using libNNN.so library names?
  Can't you use lib%d.so instead?
 
 Yes, it comes from the Image structure 
 (liboffloadmic/runtime/offload_host.h:52)
 It must contain a null-terminated name, therefore I need to allocate some 
 space
 for the name in plugin's struct TargetImage.  But the structure can't contain
 any bytes after the trailing zero and before the actual data.
 So, now I extended the name to 10 digits and removed the comparison with 1000.

Ok.

  Also, seeing register_image, shouldn't there be
  GOMP_OFFLOAD_unregister_image which would be invoked when the library
  containing MIC offloading regions is dlclosed?
  One could use __cxa_atexit or similar for that, something that is given
  __dso_handle.  Or is no cleanup necessary?  At least unregistering it
  from translation tables, because the same addresses might be reused by a
  different shared library?
  With dlopen/dlclose in mind, 1000 might be easily reached, consider 1
  times dlopening/dlclosing (perhaps over longer time, by long running daemon)
  a shared library containg #pragma omp target region.
 
 Hmm, previously we've tested only cases when all libraries are loaded before 
 the
 first offload.  Offloading from a dlopened library after the call to
 gomp_target_init isn't working.  So, this will require some changes in
 libgomp/target.c .  Is it ok to fix this bug in a separate patch?

I guess it can be done incrementally, even during stage3.

Jakub


[PATCH v2] avoid alignment of static variables affecting stack's

2014-10-24 Thread Jan Beulich
Function (or more narrow) scope static variables (as well as others not
placed on the stack) should also not have any effect on the stack
alignment. I noticed the issue first with Linux's dynamic_pr_debug()
construct using an 8-byte aligned sub-file-scope local variable.

According to my checking bad behavior started with 4.6.x (4.5.3 was
still okay), but generated code got quite a bit worse as of 4.9.0.

[v2: Drop inclusion of hard register variables, as requested by
 Jakub and Richard.]

gcc/
2014-10-24  Jan Beulich  jbeul...@suse.com

* cfgexpand.c (expand_one_var): Exclude static and external
variables when adjusting stack alignment related state.

gcc/testsuite/
2014-10-24  Jan Beulich  jbeul...@suse.com

* gcc.c-torture/execute/stkalign.c: New.

--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1233,12 +1233,15 @@ static HOST_WIDE_INT
 expand_one_var (tree var, bool toplevel, bool really_expand)
 {
   unsigned int align = BITS_PER_UNIT;
+  bool stack = true;
   tree origvar = var;
 
   var = SSAVAR (var);
 
   if (TREE_TYPE (var) != error_mark_node && TREE_CODE (var) == VAR_DECL)
 {
+  stack = !TREE_STATIC (var) && !DECL_EXTERNAL (var);
+
   /* Because we don't know if VAR will be in register or on stack,
 we conservatively assume it will be on stack even if VAR is
 eventually put into register after RA pass.  For non-automatic
@@ -1267,22 +1270,25 @@ expand_one_var (tree var, bool toplevel,
align = POINTER_SIZE;
 }
 
-  if (SUPPORTS_STACK_ALIGNMENT
-      && crtl->stack_alignment_estimated < align)
+  if (stack)
     {
-      /* stack_alignment_estimated shouldn't change after stack
-         realign decision made */
-      gcc_assert (!crtl->stack_realign_processed);
-      crtl->stack_alignment_estimated = align;
+      if (SUPPORTS_STACK_ALIGNMENT
+	  && crtl->stack_alignment_estimated < align)
+	{
+	  /* stack_alignment_estimated shouldn't change after stack
+	     realign decision made */
+	  gcc_assert (!crtl->stack_realign_processed);
+	  crtl->stack_alignment_estimated = align;
+	}
+
+      /* stack_alignment_needed > PREFERRED_STACK_BOUNDARY is permitted.
+	 So here we only make sure stack_alignment_needed >= align.  */
+      if (crtl->stack_alignment_needed < align)
+	crtl->stack_alignment_needed = align;
+      if (crtl->max_used_stack_slot_alignment < align)
+	crtl->max_used_stack_slot_alignment = align;
     }
 
-  /* stack_alignment_needed > PREFERRED_STACK_BOUNDARY is permitted.
-     So here we only make sure stack_alignment_needed >= align.  */
-  if (crtl->stack_alignment_needed < align)
-    crtl->stack_alignment_needed = align;
-  if (crtl->max_used_stack_slot_alignment < align)
-    crtl->max_used_stack_slot_alignment = align;
-
   if (TREE_CODE (origvar) == SSA_NAME)
 {
   gcc_assert (TREE_CODE (var) != VAR_DECL
--- a/gcc/testsuite/gcc.c-torture/execute/stkalign.c
+++ b/gcc/testsuite/gcc.c-torture/execute/stkalign.c
@@ -0,0 +1,26 @@
+/* { dg-options "-fno-inline" } */
+
+#include <assert.h>
+
+#define ALIGNMENT 64
+
+unsigned test(unsigned n, unsigned p)
+{
+  static struct { char __attribute__((__aligned__(ALIGNMENT))) c; } s;
+  unsigned x;
+
+  assert(__alignof__(s) == ALIGNMENT);
+  asm ("" : "=g" (x), "+m" (s) : "0" (x));
+
+  return n ? test(n - 1, x) : (x ^ p);
+}
+
+int main (int argc, char *argv[] __attribute__((unused)))
+{
+  unsigned int x = test(argc, 0);
+
+  x |= test(argc + 1, 0);
+  x |= test(argc + 2, 0);
+
+  return !(x & (ALIGNMENT - 1));
+}




Re: [PATCH 3/4] Add libgomp plugin for Intel MIC

2014-10-24 Thread Ilya Verbin
On 24 Oct 16:35, Jakub Jelinek wrote:
 On Thu, Oct 23, 2014 at 07:41:12PM +0400, Ilya Verbin wrote:
   malloc can fail, SIGSEGV in response to that is not desirable.
   Can't you fallback to alloca, or use just alloca, or use alloca
   with malloc fallback?
  
  I replaced it with alloca.
 
 There is a risk if a suid or otherwise priviledge escalated program
 uses it and attacker passes huge env vars.
  Perhaps use alloca if it is <= 2KB and malloc otherwise, and in that case
 if malloc fails, just do a fatal error?

Why is this preferable to just malloc + a fatal error?
This function is executed only once at plugin initialization, therefore no real
performance gain could be achieved.

Thanks,
  -- Ilya


Re: [PATCH 2/2] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*

2014-10-24 Thread Charles Baylis
On 24 October 2014 11:23, Marcus Shawcroft marcus.shawcr...@gmail.com wrote:
 On 23 October 2014 18:51, Charles Baylis charles.bay...@linaro.org wrote:

 Otherwise this and the previous 1/2 associated patch look good, can
 you respin with these tidy ups?

 OK for trunk?

 OK
 /Marcus

Committed to trunk as r216671 and r216672.


RE: [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral

2014-10-24 Thread Matthew Fortune
Alan Lawrence alan.lawre...@arm.com writes:
 Patches 7-11 migrate ARM, x86, IA64 (I think), and mostly PowerPC, to
 the new reduc_(plus|[us](min|max))_scal_optab. I have not managed to work
 out how to do the same for MIPS (specifically what I need to add to
 mips_expand_vec_reduc), and have had no response from the maintainers, so am

Sorry, I was looking at this but failed to send an email saying so. The lack
of vec_extract appears to be the stumbling point here so at the very least
we need to add a naïve version of that I believe.

 (2) also renaming reduc_..._scal_optab back to reduc_..._optab; would
 break the
 MIPS backend if something were not done with its existing patterns.

I suspect we can deal with this in time to make a rename OK.

One thing occurred to me about this change in general which is that on the
whole the reduction to a scalar seems good for an epilogue but is there
a problem if the result is then replicated across a vector for further
processing. I.e. a vector is reduced to a scalar, which moves the value
from a SIMD register to a GP register (because scalar modes are not
supported in SIMD registers generally) and then gets moved back to a
SIMD register to form part of a new vector? Would you expect the
redundant moves to get eliminated?

Thanks,
Matthew

