Re: [PATCH 3/5] IPA ICF pass

2014-10-11 Thread Jan Hubicka
 
 After a few days of measurement and tuning, I was able to get the numbers into
 the following shape:
 Execution times (seconds)
  phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1412 kB ( 0%) ggc
  phase opt and generate  :  27.83 (59%) usr   0.66 (19%) sys  28.52 (37%) wall 1028813 kB (24%) ggc
  phase stream in         :  16.90 (36%) usr   0.63 (18%) sys  17.60 (23%) wall 3246453 kB (76%) ggc
  phase stream out        :   2.76 ( 6%) usr   2.19 (63%) sys  31.34 (40%) wall       2 kB ( 0%) ggc
  callgraph optimization  :   0.36 ( 1%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall      40 kB ( 0%) ggc
  ipa dead code removal   :   3.31 ( 7%) usr   0.01 ( 0%) sys   3.25 ( 4%) wall       0 kB ( 0%) ggc
  ipa virtual call target :   3.69 ( 8%) usr   0.03 ( 1%) sys   3.80 ( 5%) wall      21 kB ( 0%) ggc
  ipa devirtualization    :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall   13704 kB ( 0%) ggc
  ipa cp                  :   1.11 ( 2%) usr   0.07 ( 2%) sys   1.17 ( 2%) wall  188558 kB ( 4%) ggc
  ipa inlining heuristics :   8.17 (17%) usr   0.14 ( 4%) sys   8.27 (11%) wall  494738 kB (12%) ggc
  ipa comdats             :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
  ipa lto gimple in       :   1.86 ( 4%) usr   0.40 (11%) sys   2.20 ( 3%) wall  537970 kB (13%) ggc
  ipa lto gimple out      :   0.19 ( 0%) usr   0.08 ( 2%) sys   0.27 ( 0%) wall       2 kB ( 0%) ggc
  ipa lto decl in         :  12.20 (26%) usr   0.37 (11%) sys  12.64 (16%) wall 2441687 kB (57%) ggc
  ipa lto decl out        :   2.51 ( 5%) usr   0.21 ( 6%) sys   2.71 ( 3%) wall       0 kB ( 0%) ggc
  ipa lto constructors in :   0.13 ( 0%) usr   0.02 ( 1%) sys   0.17 ( 0%) wall   15692 kB ( 0%) ggc
  ipa lto constructors out:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
  ipa lto cgraph I/O      :   0.54 ( 1%) usr   0.09 ( 3%) sys   0.63 ( 1%) wall  407182 kB (10%) ggc
  ipa lto decl merge      :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.34 ( 2%) wall    8220 kB ( 0%) ggc
  ipa lto cgraph merge    :   1.00 ( 2%) usr   0.00 ( 0%) sys   1.00 ( 1%) wall   14605 kB ( 0%) ggc
  whopr wpa               :   0.92 ( 2%) usr   0.00 ( 0%) sys   0.89 ( 1%) wall       1 kB ( 0%) ggc
  whopr wpa I/O           :   0.01 ( 0%) usr   1.90 (55%) sys  28.31 (37%) wall       0 kB ( 0%) ggc
  whopr partitioning      :   2.81 ( 6%) usr   0.01 ( 0%) sys   2.83 ( 4%) wall    4943 kB ( 0%) ggc
  ipa reference           :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.35 ( 2%) wall       0 kB ( 0%) ggc
  ipa profile             :   0.20 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
  ipa pure const          :   1.62 ( 3%) usr   0.00 ( 0%) sys   1.63 ( 2%) wall       0 kB ( 0%) ggc
  ipa icf                 :   2.65 ( 6%) usr   0.02 ( 1%) sys   2.68 ( 3%) wall    1352 kB ( 0%) ggc
  inline parameters       :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
  tree SSA rewrite        :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.08 ( 0%) wall   18919 kB ( 0%) ggc
  tree SSA other          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
  tree SSA incremental    :   0.24 ( 1%) usr   0.01 ( 0%) sys   0.32 ( 0%) wall   11325 kB ( 0%) ggc
  tree operand scan       :   0.15 ( 0%) usr   0.02 ( 1%) sys   0.18 ( 0%) wall  116283 kB ( 3%) ggc
  dominance frontiers     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
  dominance computation   :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.16 ( 0%) wall       0 kB ( 0%) ggc
  varconst                :   0.01 ( 0%) usr   0.02 ( 1%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
  loop fini               :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
  unaccounted todo        :   0.55 ( 1%) usr   0.00 ( 0%) sys   0.56 ( 1%) wall       0 kB ( 0%) ggc
  TOTAL                   :  47.49             3.48            77.46            4276682 kB
 
 and I was able to reduce the number of function bodies loaded in WPA to 35%
 (from the previous 55%). The main problem

Does 35% mean that 35% of all function bodies are compared with something else?
That feels pretty high, but the overall numbers are not so terrible.

 with speed was hidden in the worklist for congruence classes, where a hash_set
 was used. I chose that data structure to support the delete operation, but it
 was really slow. Thus, the hash_set was replaced with a linked list, and a flag
 is used to identify whether a set has been removed.
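A minimal sketch of the approach described above, assuming a plain singly linked list: "deleting" an entry only sets a flag, and flagged entries are skipped when the worklist is popped, so both operations are O(1). The names (struct cclass, worklist_push, worklist_remove, worklist_pop) are hypothetical and are not the identifiers used in ipa-icf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical congruence class as seen by the worklist.  */
struct cclass
{
  int id;
  bool linked;          /* Physically present in the list.  */
  bool removed;         /* Logically deleted; skipped by worklist_pop.  */
  struct cclass *next;  /* Next entry in the worklist.  */
};

struct worklist
{
  struct cclass *head;
  struct cclass *tail;
};

/* Append C, or revive it in place if it is still linked.  */
static void
worklist_push (struct worklist *wl, struct cclass *c)
{
  assert (c != NULL);
  if (c->linked)
    {
      c->removed = false;
      return;
    }
  c->linked = true;
  c->removed = false;
  c->next = NULL;
  if (wl->tail)
    wl->tail->next = c;
  else
    wl->head = c;
  wl->tail = c;
}

/* O(1) deletion: flag the entry instead of unlinking it.  */
static void
worklist_remove (struct cclass *c)
{
  if (c->linked)
    c->removed = true;
}

/* Return the first live entry, discarding flagged ones; NULL if empty.  */
static struct cclass *
worklist_pop (struct worklist *wl)
{
  while (wl->head)
    {
      struct cclass *c = wl->head;
      wl->head = c->next;
      if (!wl->head)
        wl->tail = NULL;
      c->next = NULL;
      c->linked = false;
      if (!c->removed)
        return c;
    }
  return NULL;
}
```

The price of lazy deletion is that flagged entries stay in memory until they reach the head of the list, which is usually cheap compared with hashing on every insert and delete.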

Interesting, I would not expect a bottleneck in congruence solving :)
 
 I have no clue how complicated it would be to turn the release_body function
 into an operation that really releases the memory.

I suppose one can keep the caches from the streamer and free the trees that
were read.  Freeing gimple statements and the CFG should be relatively easy.

Let's, however, first try to tune the implementation rather than try to get
this hack implemented. Explicit ggc_free calls 

Re: [PATCH 3/5] IPA ICF pass

2014-10-11 Thread Jan Hubicka
  +/* Verifies for given GIMPLEs S1 and S2 that
  +   goto statements are semantically equivalent.  */
  +
  +bool
  +func_checker::compare_gimple_goto (gimple g1, gimple g2)
  +{
  +  tree dest1, dest2;
  +
  +  dest1 = gimple_goto_dest (g1);
  +  dest2 = gimple_goto_dest (g2);
  +
  +  if (TREE_CODE (dest1) != TREE_CODE (dest2) || TREE_CODE (dest1) != SSA_NAME)
  +    return false;
  +
  +  return compare_operand (dest1, dest2);
  
  You probably need to care only about indirect gotos; the direct ones are
  checked by the CFG comparison.  The same goes for conditional jumps.
 
 It looks like this code is visited quite rarely.

Hmm, perhaps it is called only for indirect gotos, because all the others are
not represented as statements.
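For reference, the kind of source that reaches compare_gimple_goto is a computed goto, written with the GNU C "labels as values" extension (GCC or Clang only). Its target is known only at run time, so once the function is in SSA form the destination is typically an SSA_NAME, whereas ordinary gotos end up as CFG edges. The function name dispatch is hypothetical.

```c
#include <assert.h>

/* OP selects the label to jump to; the jump itself is an indirect goto.  */
static int
dispatch (int op, int x)
{
  static void *const targets[] = { &&do_inc, &&do_dbl };

  assert (op == 0 || op == 1);
  goto *targets[op];

 do_inc:
  return x + 1;
 do_dbl:
  return x * 2;
}
```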
 
  +
  +/* Verifies for given GIMPLEs S1 and S2 that ASM statements are equivalent.
  +   For the beginning, the pass only supports equality for
  +   '__asm__ __volatile__ ("", "", "", "memory")'.  */
  +
  +bool
  +func_checker::compare_gimple_asm (gimple g1, gimple g2)
  +{
  +  if (gimple_asm_volatile_p (g1) != gimple_asm_volatile_p (g2))
  +    return false;
  +
  +  if (gimple_asm_ninputs (g1) || gimple_asm_ninputs (g2))
  +    return false;
  +
  +  if (gimple_asm_noutputs (g1) || gimple_asm_noutputs (g2))
  +    return false;
  +
  +  if (gimple_asm_nlabels (g1) || gimple_asm_nlabels (g2))
  +    return false;
  +
  +  if (gimple_asm_nclobbers (g1) != gimple_asm_nclobbers (g2))
  +    return false;
  +
  +  for (unsigned i = 0; i < gimple_asm_nclobbers (g1); i++)
  +    {
  +      tree clobber1 = TREE_VALUE (gimple_asm_clobber_op (g1, i));
  +      tree clobber2 = TREE_VALUE (gimple_asm_clobber_op (g2, i));
  +
  +      if (!operand_equal_p (clobber1, clobber2, OEP_ONLY_CONST))
  +        return false;
  +    }
  +
  
  Even asm statements with no inputs or outputs can differ in the actual
  asm string. Compare it too.
  
  Comparing inputs/outputs/labels should be very easy to do.
  
  Compare all gimple_asm_n* for equivalency.
 
 This makes complete sense, but I don't understand what kind of operands you
 mean.

You can look at some other code dealing with gimple asm statements.  You can
just compare gimple_op for 0 through gimple_num_ops and be ready to deal with
TREE_LIST as described below.
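A self-contained model of that recipe, assuming a simplified statement type instead of real GIMPLE: compare the volatile flag, the asm string and every gimple_asm_n* count, then walk the operands comparing both the value (the operand_equal_p part) and the constraint. struct asm_op, struct asm_stmt and asm_stmts_equal are hypothetical, and strings stand in for trees.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical operand: in GIMPLE it is a TREE_LIST whose TREE_VALUE is
   the operand and whose TREE_PURPOSE carries the constraint string.  */
struct asm_op
{
  const char *constraint;   /* NULL when there is none (e.g. clobbers).  */
  const char *value;        /* Stand-in for the operand tree.  */
};

/* Hypothetical flattened asm statement; OPS holds the inputs, outputs,
   clobbers and labels, in that order.  */
struct asm_stmt
{
  const char *string;       /* The asm template itself.  */
  bool is_volatile;
  unsigned ninputs, noutputs, nclobbers, nlabels;
  const struct asm_op *ops;
};

/* String equality that also treats two NULLs as equal.  */
static bool
str_eq (const char *a, const char *b)
{
  if (a == NULL || b == NULL)
    return a == b;
  return strcmp (a, b) == 0;
}

static bool
asm_stmts_equal (const struct asm_stmt *s1, const struct asm_stmt *s2)
{
  assert (s1 != NULL && s2 != NULL);

  /* Volatility, the template and all operand counts must agree.  */
  if (s1->is_volatile != s2->is_volatile
      || !str_eq (s1->string, s2->string)
      || s1->ninputs != s2->ninputs
      || s1->noutputs != s2->noutputs
      || s1->nclobbers != s2->nclobbers
      || s1->nlabels != s2->nlabels)
    return false;

  /* Then every operand must match in both value and constraint.  */
  unsigned n = s1->ninputs + s1->noutputs + s1->nclobbers + s1->nlabels;
  for (unsigned i = 0; i < n; i++)
    if (!str_eq (s1->ops[i].value, s2->ops[i].value)
        || !str_eq (s1->ops[i].constraint, s2->ops[i].constraint))
      return false;
  return true;
}
```

Comparing the counts separately matters: two statements with the same flattened operand list but a different split between inputs and outputs are not equivalent.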

Honza
 
  At the end, walk the operands and watch for the case where they are TREE_LISTs.
  Then compare TREE_VALUE (op) of the list with operand_equal_p
  and TREE_VALUE (TREE_PURPOSE (op)) for equivalence
  (those are the constraints).
  
  If they are not (clobbers are not, those are just strings), operand_equal_p
  should do.
  
  +  return true;
  +}
  +
  +} // ipa_icf namespace
  
  Otherwise I think ipa-gimple-icf is quite fine now.
  Please send an updated version and I think it can go to mainline before the
  actual ipa-icf.
 
 I renamed both files and put them into a newly created namespace, ipa_icf_gimple.
 
 Thank you,
 Martin


Re: [PATCH 3/5] IPA ICF pass

2014-10-11 Thread Jan Hubicka
 
 Hello.
 
 Yeah, you are right. But Richard also advised me to put it in a single place.
 Maybe we are a bit stricter than necessary, but I hope that's fine ;)

OK, let's do the extra checking now and play with this incrementally.
 
 Good point. Do you mean cases like foo (alias_foo) and bar (alias_bar)? If
 we prove that foo is equal to bar, can we also merge the aliases?
 I am curious whether such a comparison can really save anything. Are there
 any interesting cases?

What probably matters is that you recognize the equivalence, so you know that
uses of alias_foo can be merged with uses of alias_bar.
Similarly for thunks.  Again, something to do incrementally, I guess.

Honza
 
 Martin
 
 
  +case INTEGER_CST:
  +  {
  +  ret = types_are_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2))
  +    && wi::to_offset (t1) == wi::to_offset (t2);
  
Use tree_int_cst_equal here.
  
  +case FIELD_DECL:
  +  {
  +  tree fctx1 = DECL_FCONTEXT (t1);
  +  tree fctx2 = DECL_FCONTEXT (t2);
  
  DECL_FCONTEXT has no semantic meaning, so you can skip comparing it.
  +
  +  tree offset1 = DECL_FIELD_OFFSET (t1);
  +  tree offset2 = DECL_FIELD_OFFSET (t2);
  +
  +  tree bit_offset1 = DECL_FIELD_BIT_OFFSET (t1);
  +  tree bit_offset2 = DECL_FIELD_BIT_OFFSET (t2);
  +
  +  ret = compare_operand (fctx1, fctx2)
  +    && compare_operand (offset1, offset2)
  +    && compare_operand (bit_offset1, bit_offset2);
  
  You probably want to compare the types here as well?
  +case CONSTRUCTOR:
  +  {
  +  unsigned len1 = vec_safe_length (CONSTRUCTOR_ELTS (t1));
  +  unsigned len2 = vec_safe_length (CONSTRUCTOR_ELTS (t2));
  +
  +  if (len1 != len2)
  +    return false;
  +
  +  for (unsigned i = 0; i < len1; i++)
  +    if (!sem_variable::equals (CONSTRUCTOR_ELT (t1, i)->value,
  +                               CONSTRUCTOR_ELT (t2, i)->value))
  +      return false;
  
  You want to compare ->index, too.
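Why the index matters: in C, `static const int a[3] = { [0] = 5 };` and `static const int b[3] = { [2] = 5 };` each give a one-element CONSTRUCTOR with the same value but a different index, so comparing only ->value would treat them as equal. A toy model with hypothetical types, using integers in place of trees:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened CONSTRUCTOR element: an index/value pair, with
   integers standing in for the trees.  */
struct ctor_elt
{
  long index;
  long value;
};

/* Equal only if the lengths match and every element agrees on both its
   index and its value.  */
static bool
ctor_equal (const struct ctor_elt *a, size_t len_a,
            const struct ctor_elt *b, size_t len_b)
{
  if (len_a != len_b)
    return false;
  assert (len_a == 0 || (a != NULL && b != NULL));
  for (size_t i = 0; i < len_a; i++)
    if (a[i].index != b[i].index || a[i].value != b[i].value)
      return false;
  return true;
}
```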
  +case INTEGER_CST:
  +  return func_checker::types_are_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2),
  +                                                true)
  +         && wi::to_offset (t1) == wi::to_offset (t2);
  again ;)
  
  This is where I stopped for now.  Generally the patch seems OK to me with
  a few of these details fixed.
  
  Honza
  


[PING] [PATCH, xtensa] Add zero-overhead looping for xtensa backend

2014-10-11 Thread Yangfei (Felix)
PING?

 
 Hi Sterling,
 
 I made some improvements to the patch. Two changes:
 1. TARGET_LOOPS is now used as a condition of the doloop-related
 patterns, which is more elegant.
 2. As the trip count register of the zero-cost loop may be spilled,
 we need to change the patterns to handle that case. The solution is
 similar to the one adopted by the c6x backend: when that happens, turn
 the zero-cost loop into a regular loop once reload has completed.
 Attached please find version 4 of the patch. Regression tested with
 make check on xtensa-elf-gcc/simulator.
 OK for trunk?
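For orientation, a plain-C model of the semantics the new loop-end patterns express: test the counter (read before the decrement) against 1, decrement it in the same step, and branch back if it was not 1, so the body runs COUNT times. Roughly, this decrement-compare-branch sequence is what a regular loop has to perform explicitly when the hardware loop cannot be used. run_counted_loop is a hypothetical name; this is a model, not backend code.

```c
#include <assert.h>

/* Model of the doloop semantics used by the patch's loop-end patterns:
   test the counter (before decrementing) against 1, decrement it in the
   same step, and branch back if the test said "not 1".  With a positive
   COUNT on entry the body runs exactly COUNT times.  */
static unsigned
run_counted_loop (unsigned count)
{
  unsigned iterations = 0;

  assert (count >= 1);      /* Trip count must be positive on entry.  */
 body:
  iterations++;             /* Stand-in for the loop body.  */
  {
    unsigned old = count;
    count = old - 1;        /* (set op2 (plus op0 (const_int -1)))  */
    if (old != 1)           /* (ne op0 (const_int 1)) => branch back  */
      goto body;
  }
  return iterations;
}
```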
 
 Index: gcc/ChangeLog
 
 ===
 --- gcc/ChangeLog(revision 216079)
 +++ gcc/ChangeLog(working copy)
 @@ -1,3 +1,20 @@
 +2014-10-10  Felix Yang  felix.y...@huawei.com
 +
 +* config/xtensa/xtensa.h (TARGET_LOOPS): New Macro.
 +* config/xtensa/xtensa.c (xtensa_reorg): New.
 +(xtensa_reorg_loops): New.
 +(xtensa_can_use_doloop_p): New.
 +(xtensa_invalid_within_doloop): New.
 +(hwloop_optimize): New.
 +(hwloop_fail): New.
 +(hwloop_pattern_reg): New.
 +(xtensa_emit_loop_end): Modified to emit the zero-overhead loop end label.
 +(xtensa_doloop_hooks): Define.
 +* config/xtensa/xtensa.md (doloop_end): New.
 +(loop_end): New
 +(zero_cost_loop_start): Rewritten.
 +(zero_cost_loop_end): Rewritten.
 +
  2014-10-10  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
  * configure.ac: Add --enable-fix-cortex-a53-835769 option.
 Index: gcc/config/xtensa/xtensa.md
 
 ===
 --- gcc/config/xtensa/xtensa.md(revision 216079)
 +++ gcc/config/xtensa/xtensa.md(working copy)
 @@ -35,6 +35,8 @@
    (UNSPEC_TLS_CALL     9)
    (UNSPEC_TP           10)
    (UNSPEC_MEMW         11)
 +  (UNSPEC_LSETUP_START 12)
 +  (UNSPEC_LSETUP_END   13)
  
    (UNSPECV_SET_FP      1)
    (UNSPECV_ENTRY       2)
 @@ -1289,41 +1291,120 @@
     (set_attr "length"  "3")])
  
  
 +;; Zero-overhead looping support.
 +
  ;; Define the loop insns used by bct optimization to represent the
 -;; start and end of a zero-overhead loop (in loop.c).  This start
 -;; template generates the loop insn; the end template doesn't generate
 -;; any instructions since loop end is handled in hardware.
 +;; start and end of a zero-overhead loop.  This start template
 +;; generates the loop insn; the end template doesn't generate any
 +;; instructions since loop end is handled in hardware.
  
  (define_insn "zero_cost_loop_start"
    [(set (pc)
 -        (if_then_else (eq (match_operand:SI 0 "register_operand" "a")
 -                          (const_int 0))
 -                      (label_ref (match_operand 1 "" ""))
 -                      (pc)))
 -   (set (reg:SI 19)
 -        (plus:SI (match_dup 0) (const_int -1)))]
 -  ""
 -  "loopnez\t%0, %l1"
 +        (if_then_else (ne (match_operand:SI 0 "register_operand" "2")
 +                          (const_int 1))
 +                      (label_ref (match_operand 1 "" ""))
 +                      (pc)))
 +   (set (match_operand:SI 2 "register_operand" "=a")
 +        (plus (match_dup 0)
 +              (const_int -1)))
 +   (unspec [(const_int 0)] UNSPEC_LSETUP_START)]
 +  "TARGET_LOOPS && optimize"
 +  "loop\t%0, %l1_LEND"
    [(set_attr "type"    "jump")
     (set_attr "mode"    "none")
     (set_attr "length"  "3")])
  
  (define_insn "zero_cost_loop_end"
    [(set (pc)
 -        (if_then_else (ne (reg:SI 19) (const_int 0))
 -                      (label_ref (match_operand 0 "" ""))
 -                      (pc)))
 -   (set (reg:SI 19)
 -        (plus:SI (reg:SI 19) (const_int -1)))]
 -  ""
 +        (if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "2,2")
 +                          (const_int 1))
 +                      (label_ref (match_operand 1 "" ""))
 +                      (pc)))
 +   (set (match_operand:SI 2 "nonimmediate_operand" "=a,m")
 +        (plus (match_dup 0)
 +              (const_int -1)))
 +   (unspec [(const_int 0)] UNSPEC_LSETUP_END)
 +   (clobber (match_scratch:SI 3 "=X,r"))]
 +  "TARGET_LOOPS && optimize"
 +  "#"
 +  [(set_attr "type"    "jump")
 +   (set_attr "mode"    "none")
 +   (set_attr "length"  "0")])
 +
 +(define_insn "loop_end"
 +  [(set (pc)
 +        (if_then_else (ne (match_operand:SI 0 "register_operand" "2")
 +                          (const_int 1))
 +                      (label_ref (match_operand 1 "" ""))
 +                      (pc)))
 +   (set (match_operand:SI 2 "register_operand" "=a")
 +        (plus (match_dup 0)
 +              (const_int -1)))
 +   (unspec [(const_int 0)] UNSPEC_LSETUP_END)]
 +  "TARGET_LOOPS && optimize"
  {
 -  xtensa_emit_loop_end (insn, operands);
 -  return "";
 +  xtensa_emit_loop_end (insn, operands);  return "";
  }
    [(set_attr "type"    "jump")
     (set_attr "mode"    "none")
     (set_attr "length"  "0")])
  
 +(define_split
 +  [(set (pc)
 +        (if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "")
 +                          (const_int 1))
 +                      (label_ref (match_operand 1 "" ""))
 +  

Re: RFA: fix mode confusion in caller-save.c:replace_reg_with_saved_mem

2014-10-11 Thread Joern Rennecke
On 10 October 2014 21:13, Jeff Law l...@redhat.com wrote:
...
 ISTM it would be better to find the mode of the same class that corresponds
 to GET_MODE_SIZE (mode) / nregs.  In your case that's obviously QImode :-)

Like this?
Or did you mean to remove the save_mode[regno] use altogether?  I can
think of arguments for or against, but I have no
concrete examples for either.
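A toy version of the arithmetic Jeff suggests, working in bytes and covering only integer modes (int_mode_name and per_register_mode are hypothetical helpers, not GCC functions): a 4-byte value spread over 4 hard registers gives 1 byte per register, i.e. QImode.

```c
#include <assert.h>
#include <stddef.h>

/* Map a byte size to the name of the GCC integer mode of that size,
   or NULL if there is none in this toy table.  */
static const char *
int_mode_name (unsigned bytes)
{
  switch (bytes)
    {
    case 1: return "QImode";
    case 2: return "HImode";
    case 4: return "SImode";
    case 8: return "DImode";
    default: return NULL;
    }
}

/* Mode of each piece when a MODE_SIZE-byte value spans NREGS registers.  */
static const char *
per_register_mode (unsigned mode_size, unsigned nregs)
{
  assert (nregs > 0 && mode_size % nregs == 0);
  return int_mode_name (mode_size / nregs);
}
```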
2014-10-11  Joern Rennecke  joern.renne...@embecosm.com
Jeff Law  l...@redhat.com

* caller-save.c (replace_reg_with_saved_mem): If saved_mode covers
multiple hard registers, use word_mode.

diff --git a/gcc/caller-save.c b/gcc/caller-save.c
index e28facb..31b1a36 100644
--- a/gcc/caller-save.c
+++ b/gcc/caller-save.c
@@ -1158,9 +1158,12 @@ replace_reg_with_saved_mem (rtx *loc,
  }
else
  {
-   gcc_assert (save_mode[regno] != VOIDmode);
-   XVECEXP (mem, 0, i) = gen_rtx_REG (save_mode [regno],
-  regno + i);
+   enum machine_mode smode = save_mode[regno];
+   gcc_assert (smode != VOIDmode);
+   if (hard_regno_nregs [regno][smode] > 1)
+ smode = mode_for_size (GET_MODE_SIZE (mode) / nregs,
+GET_MODE_CLASS (mode), 0);
+   XVECEXP (mem, 0, i) = gen_rtx_REG (smode, regno + i);
  }
 }
 


[testsuite patch] avoid test when compile options is conflict with default mthumb

2014-10-11 Thread Wang Deqiang
When testing the arm-linux-gnueabihf triple configured with
--with-mode=thumb (which makes -mthumb the default option),
some test cases fail with the error message "sorry, unimplemented:
Thumb-1 hard-float VFP ABI".
I found that GCC emits this error message when:
1. -mthumb is used with -march=armv6 (or armv5e) or -mcpu=xscale, and
2. the test source has a function body.

When -mthumb is the default option of the compiler, the dg-skip-if
directives cannot detect it.
There is no xscale check function in target-supports.exp, so we
need to add one.
Also, the test programs in the check_effective_target_arm* functions
contain only macros and no function body, so we need to add one too.

Here is my patch:

2014-10-08  Wangdeqiang  wangdeqi...@linaro.org
* lib/target-supports.exp (check_effective_target_arm_xscale_ok): New function.
(check_effective_target_arm_arch_FUNC_ok): Add test function body.
* gcc.target/arm/pr40887.c (dg-require-effective-target): add
arm_arch_v5te_ok check
* gcc.target/arm/scd42-1.c (dg-require-effective-target): add
arm_xscale_ok check
* gcc.target/arm/scd42-2.c : Likewise
* gcc.target/arm/scd42-3.c : Likewise
* gcc.target/arm/g2.c : Likewise
* gcc.target/arm/xor-and.c (dg-require-effective-target): add
arm_arch_v6_ok check

Index: gcc/testsuite/gcc.target/arm/pr40887.c
===
--- gcc/testsuite/gcc.target/arm/pr40887.c  (revision 216115)
+++ gcc/testsuite/gcc.target/arm/pr40887.c  (working copy)
@@ -1,6 +1,7 @@
 /* { dg-skip-if "need at least armv5" { *-*-* } { "-march=armv[234]*" } { "" } } */
 /* { dg-options "-O2 -march=armv5te" }  */
 /* { dg-final { scan-assembler "blx" } } */
+/* { dg-require-effective-target arm_arch_v5te_ok } */

 int (*indirect_func)(int x);

Index: gcc/testsuite/gcc.target/arm/scd42-2.c
===
--- gcc/testsuite/gcc.target/arm/scd42-2.c  (revision 216115)
+++ gcc/testsuite/gcc.target/arm/scd42-2.c  (working copy)
@@ -5,6 +5,7 @@
 /* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-mcpu=*" } { "-mcpu=xscale" } } */
 /* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { "" } } */
 /* { dg-require-effective-target arm32 } */
+/* { dg-require-effective-target arm_xscale_ok } */

 unsigned load2(void) __attribute__ ((naked));
 unsigned load2(void)
Index: gcc/testsuite/gcc.target/arm/scd42-3.c
===
--- gcc/testsuite/gcc.target/arm/scd42-3.c  (revision 216115)
+++ gcc/testsuite/gcc.target/arm/scd42-3.c  (working copy)
@@ -3,6 +3,7 @@
 /* { dg-skip-if "Test is specific to Xscale" { arm*-*-* } { "-march=*" } { "-march=xscale" } } */
 /* { dg-skip-if "Test is specific to Xscale" { arm*-*-* } { "-mcpu=*" } { "-mcpu=xscale" } } */
 /* { dg-options "-mcpu=xscale -O" } */
+/* { dg-require-effective-target arm_xscale_ok } */

 unsigned load4(void) __attribute__ ((naked));
 unsigned load4(void)
Index: gcc/testsuite/gcc.target/arm/g2.c
===
--- gcc/testsuite/gcc.target/arm/g2.c   (revision 216115)
+++ gcc/testsuite/gcc.target/arm/g2.c   (working copy)
@@ -5,6 +5,7 @@
 /* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-mcpu=*" } { "-mcpu=xscale" } } */
 /* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { "" } } */
 /* { dg-require-effective-target arm32 } */
+/* { dg-require-effective-target arm_xscale_ok } */

 /* Brett Gaines' test case. */
 unsigned BCPL(unsigned) __attribute__ ((naked));
Index: gcc/testsuite/gcc.target/arm/xor-and.c
===
--- gcc/testsuite/gcc.target/arm/xor-and.c  (revision 216115)
+++ gcc/testsuite/gcc.target/arm/xor-and.c  (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O -march=armv6" } */
 /* { dg-prune-output "switch .* conflicts with" } */
+/* { dg-require-effective-target arm_arch_v6_ok } */

 unsigned short foo (unsigned short x)
 {
Index: gcc/testsuite/gcc.target/arm/scd42-1.c
===
--- gcc/testsuite/gcc.target/arm/scd42-1.c  (revision 216115)
+++ gcc/testsuite/gcc.target/arm/scd42-1.c  (working copy)
@@ -2,6 +2,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "incompatible options" { arm*-*-* } { "-march=*" } { "" } } */
 /* { dg-options "-mcpu=xscale -O" } */
+/* { dg-require-effective-target arm_xscale_ok } */

 unsigned load1(void) __attribute__ ((naked));
 unsigned load1(void)
Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 216115)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -2721,6 +2721,11 @@ foreach { armfunc armflag armdef } { v4
#if !defined (DEF)
#error !DEF
#endif
+   int

Fallout on full bootstrap (was: r216010 - in /trunk/gcc: ChangeLog ipa-polymorp...)

2014-10-11 Thread Jan-Benedict Glaw
On Wed, 2014-10-08 17:10:01 -, hubi...@gcc.gnu.org hubi...@gcc.gnu.org 
wrote:
 URL: https://gcc.gnu.org/viewcvs?rev=216010&root=gcc&view=rev
   * ipa-polymorphic-call.c (extr_type_from_vtbl_store): Do better
   pattern matching of MEM_REF.
   (check_stmt_for_type_change): Update.

This recent commit led to fallout for all targets built with
config-list.mk:

g++ -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common 
 -DHAVE_CONFIG_H -I. -I. -I../../../gcc/gcc -I../../../gcc/gcc/. 
-I../../../gcc/gcc/../include -I../../../gcc/gcc/../libcpp/include 
-I/opt/cfarm/mpc/include  -I../../../gcc/gcc/../libdecnumber 
-I../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I../../../gcc/gcc/../libbacktrace-o ipa-polymorphic-call.o -MT 
ipa-polymorphic-call.o -MMD -MP -MF ./.deps/ipa-polymorphic-call.TPo 
../../../gcc/gcc/ipa-polymorphic-call.c
../../../gcc/gcc/ipa-polymorphic-call.c: In function ‘tree_node* 
extr_type_from_vtbl_ptr_store(gimple, type_change_info*, long int*)’:
../../../gcc/gcc/ipa-polymorphic-call.c:2117:1: error: assuming signed overflow 
does not occur when assuming that (X + c) < X is always false 
[-Werror=strict-overflow]
 }
 ^
cc1plus: all warnings being treated as errors
make[2]: *** [ipa-polymorphic-call.o] Error 1
make[2]: Leaving directory 
`/home/jbglaw/build-configlist_mk/iq2000-elf/build-gcc/mk/iq2000-elf/gcc'
make[1]: *** [all-gcc] Error 2


(Note that this `g++' is an up-to-date revision, and the line number
mentioned is also wrong.)  It's probably caused by this chunk:


diff --git a/gcc/ipa-polymorphic-call.c b/gcc/ipa-polymorphic-call.c
index 3e4aa04..51c6709 100644
--- a/gcc/ipa-polymorphic-call.c
+++ b/gcc/ipa-polymorphic-call.c
[...]
@@ -1218,7 +1226,19 @@ extr_type_from_vtbl_ptr_store (gimple stmt, struct type_change_info *tci,
  print_generic_expr (dump_file, tci->instance, TDF_SLIM);
  fprintf (dump_file, " with offset %i\n", (int)tci->offset);
}
- return NULL_TREE;
+ return tci->offset  GET_MODE_BITSIZE (Pmode) ? error_mark_node : NULL_TREE;
+   }
+  if (offset != tci->offset
+ || size != POINTER_SIZE
+ || max_size != POINTER_SIZE)
+   {
+ if (dump_file)
+   fprintf (dump_file, "wrong offset %i!=%i or size %i\n",
+(int)offset, (int)tci->offset, (int)size);
+ return offset + GET_MODE_BITSIZE (Pmode) <= offset
+|| (max_size != -1
+ && tci->offset + GET_MODE_BITSIZE (Pmode)  offset + max_size)
+? error_mark_node : NULL;
}
 }
 

This is visible on all config-list.mk builds, see eg. just a few
recent ones:

m32r-elf: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=361814
lm32-elf: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=361741
ia64-elf: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=361682
ia64-linux: 
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=361738
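Background on the diagnostic: with signed operands GCC may assume that X + c does not overflow and fold a comparison of X + c against X to a constant, which -Wstrict-overflow reports. When the goal of such a test is to detect wrap-around, one overflow-free formulation is to compare against the type's limit before adding. A minimal sketch, assuming a long offset and a positive size (add_would_overflow is a hypothetical helper); it is not necessarily how the committed fix looks.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if OFFSET + SIZE would exceed LONG_MAX.  The addition is never
   performed, so there is no signed overflow for the compiler to assume
   away.  SIZE must be positive.  */
static bool
add_would_overflow (long offset, long size)
{
  assert (size > 0);
  return offset > LONG_MAX - size;
}
```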

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of: really soon now:  an unspecified period of time, 
likly to
the second  : be greater than any reasonable 
definition
  of soon.




Re: Fallout on full bootstrap

2014-10-11 Thread Andreas Tobler

On 11.10.14 12:43, Jan-Benedict Glaw wrote:

On Wed, 2014-10-08 17:10:01 -, hubi...@gcc.gnu.org hubi...@gcc.gnu.org 
wrote:

URL: https://gcc.gnu.org/viewcvs?rev=216010&root=gcc&view=rev
* ipa-polymorphic-call.c (extr_type_from_vtbl_store): Do better
pattern matching of MEM_REF.
(check_stmt_for_type_change): Update.


This is Bug 63496

Andreas



Re: -fuse-caller-save - Collect register usage information

2014-10-11 Thread Eric Botcazou
 So, I hate the name of the option, and the documentation seems wrong to me. 
 It doesn’t use the caller saved registers for allocation, it uses the call
 clobbered registers for allocation.  Or, one could say it uses the callee
 saved registers for allocation.

Seconded: the description is a bit confusing, and "caller saved"/"callee saved"
should be avoided IMO; "call clobbered"/"call saved" is much clearer.

-- 
Eric Botcazou


Re: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition

2014-10-11 Thread Felix Yang
Hello Jeff,

I see that you have improved RTL type safety in ira.c, so I rebased this
patch on the latest trunk and changed it to use the new list-walking interface.
Bootstrapped on x86_64-SUSE-Linux and regression tested with make check.
OK for trunk?
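A simplified model of the new rule for pseudos with more than one definition: an equivalence is kept only if every definition carries a REG_EQUAL note and all the notes are equal. It uses a hypothetical name and strings in place of RTL, and it leaves out the rtx_varies_p and existing-replacement checks that the patch also performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* NOTES[i] is the REG_EQUAL value (printed as text here) attached to the
   i-th definition of a pseudo, or NULL if that definition has none.  */
static bool
multi_def_equivalence_ok (const char *const *notes, size_t ndefs)
{
  assert (ndefs > 0);
  for (size_t i = 0; i < ndefs; i++)
    /* Every definition needs a note, and all notes must agree.  */
    if (notes[i] == NULL || strcmp (notes[i], notes[0]) != 0)
      return false;
  return true;
}
```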

Index: gcc/ChangeLog
===
--- gcc/ChangeLog(revision 216116)
+++ gcc/ChangeLog(working copy)
@@ -1,3 +1,14 @@
+2014-10-11  Felix Yang  felix.y...@huawei.com
+Jeff Law  l...@redhat.com
+
+* ira.c (struct equivalence): Change member is_arg_equivalence and replace
+into boolean bitfields; turn member loop_depth into a short integer; add new
+member no_equiv and reserved.
+(no_equiv): Set no_equiv of struct equivalence if register is marked
+as having no known equivalence.
+(update_equiv_regs): Check all definitions for a multiple-set
+register to make sure that the RHS have the same value.
+
 2014-10-11  Martin Liska  mli...@suse.cz

 PR/63376
Index: gcc/ira.c
===
--- gcc/ira.c(revision 216116)
+++ gcc/ira.c(working copy)
@@ -2902,12 +2902,14 @@ struct equivalence

   /* Loop depth is used to recognize equivalences which appear
  to be present within the same loop (or in an inner loop).  */
-  int loop_depth;
+  short loop_depth;
   /* Nonzero if this had a preexisting REG_EQUIV note.  */
-  int is_arg_equivalence;
+  unsigned char is_arg_equivalence : 1;
   /* Set when an attempt should be made to replace a register
  with the associated src_p entry.  */
-  char replace;
+  unsigned char replace : 1;
+  /* Set if this register has no known equivalence.  */
+  unsigned char no_equiv : 1;
 };

 /* reg_equiv[N] (where N is a pseudo reg number) is the equivalence
@@ -3255,6 +3257,7 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE
   if (!REG_P (reg))
 return;
   regno = REGNO (reg);
+  reg_equiv[regno].no_equiv = 1;
   list = reg_equiv[regno].init_insns;
   if (list && list->insn () == NULL)
 return;
@@ -3381,7 +3384,7 @@ update_equiv_regs (void)

   /* If this insn contains more (or less) than a single SET,
  only mark all destinations as having no known equivalence.  */
-  if (set == 0)
+  if (set == NULL_RTX)
 {
   note_stores (PATTERN (insn), no_equiv, NULL);
   continue;
@@ -3476,16 +3479,49 @@ update_equiv_regs (void)
   if (note && GET_CODE (XEXP (note, 0)) == EXPR_LIST)
 note = NULL_RTX;

-  if (DF_REG_DEF_COUNT (regno) != 1
-   && (! note
+  if (DF_REG_DEF_COUNT (regno) != 1)
+{
+  bool equal_p = true;
+  rtx_insn_list *list;
+
+  /* If we have already processed this pseudo and determined it
+ can not have an equivalence, then honor that decision.  */
+  if (reg_equiv[regno].no_equiv)
+continue;
+
+  if (! note
   || rtx_varies_p (XEXP (note, 0), 0)
   || (reg_equiv[regno].replacement
 && ! rtx_equal_p (XEXP (note, 0),
-reg_equiv[regno].replacement))))
-{
-  no_equiv (dest, set, NULL);
-  continue;
+reg_equiv[regno].replacement)))
+{
+  no_equiv (dest, set, NULL);
+  continue;
+}
+
+  list = reg_equiv[regno].init_insns;
+  for (; list; list = list->next ())
+{
+  rtx note_tmp;
+  rtx_insn *insn_tmp;
+
+  insn_tmp = list->insn ();
+  note_tmp = find_reg_note (insn_tmp, REG_EQUAL, NULL_RTX);
+  gcc_assert (note_tmp);
+  if (! rtx_equal_p (XEXP (note, 0), XEXP (note_tmp, 0)))
+{
+  equal_p = false;
+  break;
+}
+}
+
+  if (! equal_p)
+{
+  no_equiv (dest, set, NULL);
+  continue;
+}
 }
+
   /* Record this insn as initializing this register.  */
   reg_equiv[regno].init_insns
 = gen_rtx_INSN_LIST (VOIDmode, insn, reg_equiv[regno].init_insns);
@@ -3514,10 +3550,9 @@ update_equiv_regs (void)
  a register used only in one basic block from a MEM.  If so, and the
  MEM remains unchanged for the life of the register, add a REG_EQUIV
  note.  */
-
   note = find_reg_note (insn, REG_EQUIV, NULL_RTX);

-  if (note == 0 && REG_BASIC_BLOCK (regno) >= NUM_FIXED_BLOCKS
+  if (note == NULL_RTX && REG_BASIC_BLOCK (regno) >= NUM_FIXED_BLOCKS
 && MEM_P (SET_SRC (set))
 && validate_equiv_mem (insn, dest, SET_SRC (set)))
 note = set_unique_reg_note (insn, REG_EQUIV, copy_rtx (SET_SRC (set)));
@@ -3547,7 +3582,7 @@ update_equiv_regs (void)

   reg_equiv[regno].replacement = x;
   reg_equiv[regno].src_p = SET_SRC (set);
-  reg_equiv[regno].loop_depth = loop_depth;
+  reg_equiv[regno].loop_depth = (short) loop_depth;

   /* Don't mess 

[patch,fortran] Handle (signed) zeros, infinities and NaNs in some intrinsics

2014-10-11 Thread FX
The attached patch fixes the compile-time simplification of special values 
(positive and negative zeros, infinities, and NaNs) in intrinsics EXPONENT, 
FRACTION, RRSPACING, SET_EXPONENT, SPACING. Those are all the intrinsics in the 
Fortran 2008 standard that say anything about these special values, so it makes 
sense to fix them. This is the compile-time part of PR 48979 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48979).
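To make the special cases concrete, a C sketch of EXPONENT and FRACTION for binary doubles under the Fortran 2008 rules as we read them: zero gives an exponent of 0, Inf and NaN give HUGE(0) for EXPONENT, and FRACTION(Inf) is a NaN. Taking HUGE(0) to be INT_MAX assumes a 32-bit default integer; f_exponent and f_fraction are hypothetical names, not gfortran internals.

```c
#include <limits.h>
#include <math.h>

/* EXPONENT (X): the model exponent e with X = f * 2**e, 0.5 <= |f| < 1.
   Zero gives 0; Inf and NaN give HUGE(0), taken here to be INT_MAX.  */
static int
f_exponent (double x)
{
  int e;

  if (isinf (x) || isnan (x))
    return INT_MAX;
  (void) frexp (x, &e);   /* Sets e to 0 for +-0.0.  */
  return e;
}

/* FRACTION (X) with the Fortran 2008 rules: Inf gives NaN, a NaN is
   returned as is, and a signed zero keeps its sign.  */
static double
f_fraction (double x)
{
  int e;

  if (isinf (x))
    return NAN;
  return frexp (x, &e);   /* frexp returns NaN for NaN, +-0 for +-0.  */
}
```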

Some notes:

  - We’re not technically required to do anything about infinities and NaNs 
unless IEEE_ARITHMETIC is accessible. My view is that it makes sense, as a 
quality of implementation issue, to handle them correctly anyway. I’ve done so 
here for simplification, and intend to do the same later for code generation in 
trans-intrinsic.c.

  - For FRACTION, the 2003 standard says FRACTION(inf) = inf, while Fortran 
2008 says FRACTION(inf) = NaN. I agree with Tobias, who said in the PR we 
shouldn’t emit different code based on -std=f2003/f2008. Instead, we use the 
Fortran 2008 interpretation here. It makes more sense anyway.

  - While digging into the MPFR doc, I realized that the test (mpfr_sgn 
(x->value.real) == 0) used a few times in simplify.c is true not only for 
zeros, but also for NaNs! I thus replaced it with mpfr_zero_p (x->value.real). 
It affects only some (invalid) warnings. For example, before my patch, the code 
LOG((nan,nan)) would emit the error “Complex argument of LOG cannot be zero”, 
which makes little sense.
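The `mpfr_sgn` pitfall has a direct analogue in plain C floating point: a sign function built from ordered comparisons returns 0 for a NaN, because every ordered comparison with a NaN is false. The sketch below uses C99 `<math.h>` rather than MPFR, so `naive_sgn`, `is_zero_buggy` and `is_zero` are hypothetical stand-ins for `mpfr_sgn (x) == 0` and `mpfr_zero_p (x)`, not those functions.

```c
#include <math.h>
#include <stdbool.h>

/* Sign test built from ordered comparisons: returns -1, 0 or 1.
   Both comparisons are false for a NaN, so a NaN yields 0.  */
static int naive_sgn (double x)
{
  return (x > 0.0) - (x < 0.0);
}

/* Buggy zero test (analogue of mpfr_sgn (x) == 0): also true for NaN.  */
static bool is_zero_buggy (double x)
{
  return naive_sgn (x) == 0;
}

/* Correct zero test (analogue of mpfr_zero_p): true only for +0.0 and -0.0.  */
static bool is_zero (double x)
{
  return fpclassify (x) == FP_ZERO;
}
```

With this, `is_zero_buggy (NAN)` is true while `is_zero (NAN)` is false, which is exactly the kind of NaN that used to trigger the bogus LOG diagnostic.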


Regtested on x86_64-apple-darwin14. OK to commit?

FX




intrinsics.ChangeLog
Description: Binary data


intrinsics.diff
Description: Binary data


[SH][committed] Remove TARGET_SH4A_ARCH macro

2014-10-11 Thread Oleg Endo
Hi,

The TARGET_SH4A_ARCH macro has the same meaning as TARGET_SH4A and thus
can be removed.  Tested with 'make all' on sh-elf, committed as r216119.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh.h (TARGET_SH4A_ARCH): Remove macro.
* config/sh/sh.h: Replace uses of TARGET_SH4A_ARCH with TARGET_SH4A.
* config/sh/sh.c: Likewise.
* config/sh/sh-mem.cc: Likewise.
* config/sh/sh.md: Likewise.
* config/sh/predicates.md: Likewise.
* config/sh/sync.md: Likewise.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 216118)
+++ gcc/config/sh/sh.c	(working copy)
@@ -818,7 +818,7 @@
   assembler_dialect = 1;
   sh_cpu = PROCESSOR_SH4;
 }
-  if (TARGET_SH4A_ARCH)
+  if (TARGET_SH4A)
 {
   assembler_dialect = 1;
   sh_cpu = PROCESSOR_SH4A;
@@ -11597,7 +11597,7 @@
   if (TARGET_HARD_SH4 || TARGET_SH5)
 {
   if (!TARGET_INLINE_IC_INVALIDATE
-	  || (!(TARGET_SH4A_ARCH || TARGET_SH4_300) && TARGET_USERMODE))
+	  || (!(TARGET_SH4A || TARGET_SH4_300) && TARGET_USERMODE))
 	emit_library_call (function_symbol (NULL, "__ic_invalidate",
 	FUNCTION_ORDINARY),
 			   LCT_NORMAL, VOIDmode, 1, tramp, SImode);
Index: gcc/config/sh/sh.h
===
--- gcc/config/sh/sh.h	(revision 216118)
+++ gcc/config/sh/sh.h	(working copy)
@@ -70,13 +70,9 @@
 #undef TARGET_SH4
 #define TARGET_SH4 ((target_flags & MASK_SH4) != 0 && TARGET_SH1)
 
-/* Nonzero if we're generating code for the common subset of
-   instructions present on both SH4a and SH4al-dsp.  */
-#define TARGET_SH4A_ARCH TARGET_SH4A
-
 /* Nonzero if we're generating code for SH4a, unless the use of the
FPU is disabled (which makes it compatible with SH4al-dsp).  */
-#define TARGET_SH4A_FP (TARGET_SH4A_ARCH && TARGET_FPU_ANY)
+#define TARGET_SH4A_FP (TARGET_SH4A && TARGET_FPU_ANY)
 
 /* Nonzero if we should generate code using the SHcompact instruction
set and 32-bit ABI.  */
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 216118)
+++ gcc/config/sh/sh.md	(working copy)
@@ -6938,7 +6938,7 @@
   emit_insn (gen_ic_invalidate_line_compact (operands[0], operands[1]));
   DONE;
 }
-  else if (TARGET_SH4A_ARCH || TARGET_SH4_300)
+  else if (TARGET_SH4A || TARGET_SH4_300)
 {
   emit_insn (gen_ic_invalidate_line_sh4a (operands[0]));
   DONE;
@@ -6971,7 +6971,7 @@
 (define_insn "ic_invalidate_line_sh4a"
   [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")]
 		UNSPEC_ICACHE)]
-  "TARGET_SH4A_ARCH || TARGET_SH4_300"
+  "TARGET_SH4A || TARGET_SH4_300"
 {
   return   ocbwb	@%0	\n
 	 	synco		\n
@@ -13487,7 +13487,7 @@
   [(set (match_operand:SI 0 register_operand =z)
 	(unspec:SI [(match_operand:BLK 1 unaligned_load_operand Sua)]
 		   UNSPEC_MOVUA))]
-  TARGET_SH4A_ARCH
+  TARGET_SH4A
   movua.l	%1,%0
   [(set_attr type movua)])
 
@@ -13500,7 +13500,7 @@
 	(sign_extract:SI (mem:SI (match_operand:SI 1 register_operand ))
 			 (const_int 32) (const_int 0)))
(set (match_dup 1) (plus:SI (match_dup 1) (const_int 4)))]
-  "TARGET_SH4A_ARCH && REGNO (operands[0]) != REGNO (operands[1])"
+  "TARGET_SH4A && REGNO (operands[0]) != REGNO (operands[1])"
   [(set (match_operand:SI 0 register_operand )
 	(sign_extract:SI (mem:SI (post_inc:SI
   (match_operand:SI 1 register_operand )))
@@ -13512,7 +13512,7 @@
 	(sign_extract:SI (match_operand:QI 1 unaligned_load_operand )
 			 (match_operand 2 const_int_operand )
 			 (match_operand 3 const_int_operand )))]
-  TARGET_SH4A_ARCH || TARGET_SH2A
+  TARGET_SH4A || TARGET_SH2A
 {
   if (TARGET_SH2A && TARGET_BITOPS
       && (satisfies_constraint_Sbw (operands[1])
@@ -13525,7 +13525,7 @@
 	emit_insn (gen_movsi (operands[0], gen_rtx_REG (SImode, T_REG)));
   DONE;
}
-  if (TARGET_SH4A_ARCH
+  if (TARGET_SH4A
       && INTVAL (operands[2]) == 32
       && INTVAL (operands[3]) == 0
       && MEM_P (operands[1]) && MEM_ALIGN (operands[1]) < 32)
@@ -13544,7 +13544,7 @@
 	(zero_extract:SI (match_operand:QI 1 unaligned_load_operand )
 			 (match_operand 2 const_int_operand )
 			 (match_operand 3 const_int_operand )))]
-  TARGET_SH4A_ARCH || TARGET_SH2A
+  TARGET_SH4A || TARGET_SH2A
 {
   if (TARGET_SH2A && TARGET_BITOPS
       && (satisfies_constraint_Sbw (operands[1])
@@ -13557,7 +13557,7 @@
 	emit_insn (gen_movsi (operands[0], gen_rtx_REG (SImode, T_REG)));
   DONE;
 }
-  if (TARGET_SH4A_ARCH
+  if (TARGET_SH4A
       && INTVAL (operands[2]) == 32
       && INTVAL (operands[3]) == 0
       && MEM_P (operands[1]) && MEM_ALIGN (operands[1]) < 32)
Index: gcc/config/sh/predicates.md
===
--- gcc/config/sh/predicates.md	(revision 216118)
+++ gcc/config/sh/predicates.md	(working copy)
@@ -1074,14 +1074,14 @@
(and (match_test satisfies_constraint_I08 (op))
 	(match_test mode != QImode)

Re: [RFC: Patch, PR 60102] [4.9/4.10 Regression] powerpc fp-bit ices@dwf_regno

2014-10-11 Thread Maciej W. Rozycki
On Thu, 9 Oct 2014, Maciej W. Rozycki wrote:

  Seeing Rohit got good results it has struck me that perhaps one of the 
 patches I had previously reverted, to be able to compile GCC in the first 
 place, interfered with this fix -- I backed out all the subsequent patches 
 to test yours and Rohit's by themselves only.  And it was actually the 
 case, with this change:
 
 2013-05-21  Christian Bruel  christian.br...@st.com
 
   * dwarf2out.c (multiple_reg_loc_descriptor): Use dbx_reg_number for
   spanning registers. LEAF_REG_REMAP is supported only for contiguous
   registers. Set register size out of the PARALLEL loop.
 
 back in place, in addition to your fix, I get an all-passed score for 
 gdb.base/store.exp.  So your change looks good and my decision to back out 
 the other patches unfortunate.  I'll yet run full e500v2 testing now to 
 double check, and let you know what the results are, within a couple of 
 hours if things work well.

 It took a bit longer because I saw some regressions that I wanted to 
investigate.  In the end they turned out to be intermittent, and the failures 
happen sometimes whether your change is applied or not.  So I'm fine with 
your change, thanks for your work and patience.

 For the record the failures were:

FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile "Read tp_first_run: 0" 2
FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile "Read tp_first_run: 2" 1
FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile "Read tp_first_run: 3" 1

  Maciej


[PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2014-10-11 Thread Ilya Verbin
Hello,

This is the last common infrastructure patch in the series.
(Next patches will contain tests for libgomp testsuite and MIC specific things)

It introduces 2 new options:
1. -foffload=<targets>=<options>
   By default, GCC will build offload images for all offload targets specified
in configure, with the non-target-specific options passed to the host compiler.
This option is used to control offload targets and options for them.

It can be used in a few ways:
* -foffload=disable
  Tells GCC to disable offload support.
  OpenMP target regions will be run in host fallback mode.
* -foffload=<targets>
  Tells GCC to build offload images for <targets>.
  They will be built with the non-target-specific options passed to the host
  compiler.
* -foffload=<options>
  Tells GCC to build offload images for all targets specified in configure.
  They will be built with the non-target-specific options passed to the host
  compiler, plus <options>.
* -foffload=<targets>=<options>
  Tells GCC to build offload images for <targets>.
  They will be built with the non-target-specific options passed to the host
  compiler, plus <options>.

Options specified by -foffload are appended to the end of the option set, so
when options conflict, they take precedence.

2. -foffload-abi=[lp64|ilp32]
   This option tells mkoffload (and the offload compiler) which ABI is used in
the streamed GIMPLE.  It is needed because the host and offload compilers must
use the same ABI.  The host compiler generates this option automatically; it
should not be specified by the user.
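The string-to-enum mapping that the Enum/EnumValue records in common.opt describe for -foffload-abi= can be sketched in C as below. This is not the option machinery GCC generates; the enumerator names come from the patch, but their order here, and the helper `host_offload_abi_option` keyed on pointer and long widths, are assumptions for illustration only.

```c
#include <string.h>

/* Hypothetical mirror of the patch's enum offload_abi.  */
enum offload_abi { OFFLOAD_ABI_UNSET, OFFLOAD_ABI_LP64, OFFLOAD_ABI_ILP32 };

/* Map the -foffload-abi= argument to the enum.  Returns 0 on success and
   -1 for an unknown string (the UnknownError case in common.opt).  */
static int parse_offload_abi (const char *arg, enum offload_abi *abi)
{
  if (strcmp (arg, "lp64") == 0)
    *abi = OFFLOAD_ABI_LP64;
  else if (strcmp (arg, "ilp32") == 0)
    *abi = OFFLOAD_ABI_ILP32;
  else
    return -1;
  return 0;
}

/* Hypothetical host-side choice of the option to emit automatically,
   based on the host's pointer and long widths in bits.  */
static const char *host_offload_abi_option (int pointer_bits, int long_bits)
{
  if (pointer_bits == 64 && long_bits == 64)
    return "-foffload-abi=lp64";
  if (pointer_bits == 32 && long_bits == 32)
    return "-foffload-abi=ilp32";
  return NULL;  /* Not an ABI this option can describe.  */
}
```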

Examples:
$ gcc -fopenmp -c -O2 test1.c
$ gcc -fopenmp -c -O1 -msse -foffload=-mavx test2.c
$ gcc -fopenmp -foffload=-O3 -v test1.o test2.o

In this example the offload images will be built with the following options:
-O2 -mavx -O3 -v for targets specified in configure.

$ gcc -fopenmp -foffload=x86_64-intelmicemul-linux-gnu=-mavx2 \
  -foffload=nvptx-none -foffload=-O3 -O2 test.c

In this example 2 offload images will be built:
for MIC with -O2 -mavx2 -O3 and for PTX with -O2 -O3.
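The second example can be reproduced with a small C sketch that classifies each -foffload= argument into the four forms listed above and then builds the option string for one target: the host's non-target-specific options first, followed by every applicable -foffload option in command-line order. The function names are hypothetical, not the patch's handle_foffload_option or lto-wrapper code. The sketch assumes target names never start with '-' and handles only a single target name before '='.

```c
#include <stdio.h>
#include <string.h>

typedef enum {
  FOFFLOAD_DISABLE,             /* -foffload=disable */
  FOFFLOAD_TARGETS,             /* -foffload=<targets> */
  FOFFLOAD_OPTIONS,             /* -foffload=<options> */
  FOFFLOAD_TARGETS_AND_OPTIONS  /* -foffload=<targets>=<options> */
} foffload_form;

/* Classify one -foffload= argument; *OPTIONS points at its options, if any.  */
static foffload_form classify_foffload (const char *arg, const char **options)
{
  const char *eq = strchr (arg, '=');
  *options = NULL;
  if (strcmp (arg, "disable") == 0)
    return FOFFLOAD_DISABLE;
  if (arg[0] == '-')
    {
      *options = arg;
      return FOFFLOAD_OPTIONS;
    }
  if (eq != NULL)
    {
      *options = eq + 1;
      return FOFFLOAD_TARGETS_AND_OPTIONS;
    }
  return FOFFLOAD_TARGETS;
}

/* Append OPT to the space-separated list in OUT, truncating safely.  */
static void append_opt (char *out, size_t outsz, const char *opt)
{
  size_t len = strlen (out);
  if (len + 1 < outsz)
    snprintf (out + len, outsz - len, "%s%s", len ? " " : "", opt);
}

/* Options for one offload TARGET: HOST_OPTS first, then the -foffload
   options that apply to TARGET, in order, so later ones win conflicts.  */
static void offload_options_for (const char *target, const char *host_opts,
                                 const char *const *args, size_t nargs,
                                 char *out, size_t outsz)
{
  out[0] = '\0';
  append_opt (out, outsz, host_opts);
  for (size_t i = 0; i < nargs; i++)
    {
      const char *opts;
      switch (classify_foffload (args[i], &opts))
        {
        case FOFFLOAD_OPTIONS:
          append_opt (out, outsz, opts);
          break;
        case FOFFLOAD_TARGETS_AND_OPTIONS:
          {
            size_t n = (size_t) (opts - 1 - args[i]);  /* length of <targets> */
            if (strlen (target) == n && strncmp (args[i], target, n) == 0)
              append_opt (out, outsz, opts);
          }
          break;
        default:
          /* disable / <targets> alone choose images; they add no options.  */
          break;
        }
    }
}
```

For the MIC target, the arguments of the second example yield "-O2 -mavx2 -O3", and for nvptx-none they yield "-O2 -O3".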

Bootstrapped and regtested on top of patch 5.  Is it OK for trunk?
kyukhin/gomp4-offload branch is updated correspondingly.

Thanks,
  -- Ilya


2014-10-11  Bernd Schmidt  ber...@codesourcery.com
Andrey Turetskiy  andrey.turets...@intel.com
Ilya Verbin  ilya.ver...@intel.com

gcc/
* common.opt (foffload, foffload-abi): New options.
* config/i386/i386.c (ix86_offload_options): New static function.
(TARGET_OFFLOAD_OPTIONS): Define.
* coretypes.h (enum offload_abi): New enum.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_OFFLOAD_OPTIONS): Document.
* gcc.c (offload_targets): New static variable.
(handle_foffload_option): New static function.
(driver_handle_option): Handle OPT_foffload_.
(driver::maybe_putenv_OFFLOAD_TARGETS): Set OFFLOAD_TARGET_NAMES
according to offload_targets.
* hooks.c (hook_charptr_void_null): New hook.
* hooks.h (hook_charptr_void_null): Declare.
* lto-opts.c: Include lto-section-names.h.
(lto_write_options): Append options from target offload_options hook and
store them to offload_lto section.  Do not store target-specific,
driver and diagnostic options in offload_lto section.
* lto-wrapper.c (merge_and_complain): Handle OPT_foffload_ and
OPT_foffload_abi_.
(append_compiler_options, append_linker_options)
(append_offload_options): New static functions.
(compile_offload_image): Add new arguments with options.
Call append_compiler_options and append_offload_options.
(compile_images_for_offload_targets): Add new arguments with options.
(find_and_merge_options): New static function.
(run_gcc): Outline options handling into the new functions:
find_and_merge_options, append_compiler_options, append_linker_options.
* opts.c (common_handle_option): Don't handle OPT_foffload_.
* target.def (offload_options): New target hook.

---

diff --git a/gcc/common.opt b/gcc/common.opt
index b4f0ed4..37a5fd4 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1640,6 +1640,23 @@ fnon-call-exceptions
 Common Report Var(flag_non_call_exceptions) Optimization
 Support synchronous non-call exceptions
 
+foffload=
+Common Driver Joined MissingArgError(options or targets missing after %qs)
+-foffload=<targets>=<options>  Specify offloading targets and options for them
+
+foffload-abi=
+Common Joined RejectNegative Enum(offload_abi) Var(flag_offload_abi) Init(OFFLOAD_ABI_UNSET)
+-foffload-abi=[lp64|ilp32] Set the ABI to use in an offload compiler
+
+Enum
+Name(offload_abi) Type(enum offload_abi) UnknownError(unknown offload ABI %qs)
+
+EnumValue
+Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
+
+EnumValue
+Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
+
 fomit-frame-pointer
 Common Report Var(flag_omit_frame_pointer) Optimization
 When possible do not 

Re: [Ping] [PATCH, 1/10] two hooks for conditional compare (ccmp)

2014-10-11 Thread Richard Henderson
On 09/22/2014 11:43 PM, Zhenqiang Chen wrote:
 
 +@cindex @code{ccmp} instruction pattern
 +@item @samp{ccmp}
 +Conditional compare instruction.  Operand 2 and 5 are RTLs which perform
 +two comparisons.  Operand 1 is AND or IOR, which operates on the result of
 +operand 2 and 5.
 +It uses recursive method to support more than two compares.  e.g.
 +
 +  CC0 = CMP (a, b);
 +  CC1 = CCMP (NE (CC0, 0), CMP (e, f));
 +  ...
 +  CCn = CCMP (NE (CCn-1, 0), CMP (...));
 +
 +Two target hooks are used to generate conditional compares.  GEN_CCMP_FISRT
 +is used to generate the first CMP.  And GEN_CCMP_NEXT is used to generate the
 +following CCMPs.  Operand 1 is AND or IOR.  Operand 3 is the result of
 +GEN_CCMP_FISRT or a previous GEN_CCMP_NEXT.  Operand 2 is NE.
 +Operand 4, 5 and 6 is another compare expression.
 +
 +A typical CCMP pattern looks like
 +
 +@smallexample
 +(define_insn *ccmp_and_ior
 +  [(set (match_operand 0 dominant_cc_register )
 +(compare
 + (match_operator 1
 +  (match_operator 2 comparison_operator
 +   [(match_operand 3 dominant_cc_register)
 +(const_int 0)])
 +  (match_operator 4 comparison_operator
 +   [(match_operand 5 register_operand)
 +(match_operand 6 compare_operand]))
 + (const_int 0)))]
 +  
 +  @dots{})
 +@end smallexample
 +

This whole section should be removed.  You do not have a named ccmp pattern.
Even your example below is an *unnamed pattern.  This is an implementation
detail of the aarch64 backend.

Named patterns are used when that is the interface the middle-end uses to emit
code.  But you're not using named patterns, you're using:

 
 +@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_FIRST (int @var{code}, rtx @var{op0}, rtx @var{op1})
 +This function emits a comparison insn for the first of a sequence of
 + conditional comparisions.  It returns a comparison expression appropriate
 + for passing to @code{gen_ccmp_next} or to @code{cbranch_optab}.
 + @code{unsignedp} is used when converting @code{op0} and @code{op1}'s mode.
 +@end deftypefn
 +
 +@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_NEXT (rtx @var{prev}, int @var{cmp_code}, rtx @var{op0}, rtx @var{op1}, int @var{bit_code})
 +This function emits a conditional comparison within a sequence of
 + conditional comparisons.  The @code{prev} expression is the result of a
 + prior call to @code{gen_ccmp_first} or @code{gen_ccmp_next}.  It may return
 + @code{NULL} if the combination of @code{prev} and this comparison is
 + not supported, otherwise the result must be appropriate for passing to
 + @code{gen_ccmp_next} or @code{cbranch_optab}.  @code{bit_code}
 + is AND or IOR, which is the op on the two compares.
 +@end deftypefn

Every place above where you refer to the arguments of the function should use
@var; you're using @code for most of them.  Use @code{AND} and @code{IOR}.


r~


Re: [Ping] [PATCH, 2/10] prepare ccmp

2014-10-11 Thread Richard Henderson
On 09/22/2014 11:43 PM, Zhenqiang Chen wrote:
 +   /* If jumps are cheap and the target does not support conditional
 +  compare, turn some more codes into jumpy sequences.  */
 +   else if (BRANCH_COST (optimize_insn_for_speed_p (), false) < 4
 +	    && (targetm.gen_ccmp_first == NULL))

Don't add unnecessary parentheses around the == expression.

Otherwise ok.


r~


Re: [Ping] [PATCH, 5/10] aarch64: add ccmp operand predicate

2014-10-11 Thread Richard Henderson
On 09/22/2014 11:44 PM, Zhenqiang Chen wrote:
 +/* Return true if val can be encoded as a 5-bit unsigned immediate.  */
 +bool
 +aarch64_uimm5 (HOST_WIDE_INT val)
 +{
 +  return (val & (HOST_WIDE_INT) 0x1f) == val;
 +}

This is just silly.

 +(define_constraint "Usn"
 + "A constant that can be used with a CCMN operation (once negated)."
 + (and (match_code "const_int")
 +  (match_test "aarch64_uimm5 (-ival)")))

(match_test "IN_RANGE (ival, -31, 0)")

 +(define_predicate "aarch64_ccmp_immediate"
 +  (and (match_code "const_int")
 +   (ior (match_test "aarch64_uimm5 (INTVAL (op))")
 + (match_test "aarch64_uimm5 (-INTVAL (op))"))))

(and (match_code "const_int")
 (match_test "IN_RANGE (INTVAL (op), -31, 31)"))
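The two suggested IN_RANGE forms accept exactly the same values as the helper-based tests. Here is a small C check, assuming a 64-bit HOST_WIDE_INT (`hwi` below is a stand-in) and an IN_RANGE macro modeled on the one in GCC's system.h, which folds both bounds into a single unsigned comparison:

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t hwi;  /* Stand-in for HOST_WIDE_INT (assumed 64-bit).  */

/* The helper from the patch: true iff 0 <= VAL <= 31.  */
static bool uimm5 (hwi val)
{
  return (val & (hwi) 0x1f) == val;
}

/* One unsigned comparison covers both bounds (wraps for VALUE < LOWER).  */
#define IN_RANGE(VALUE, LOWER, UPPER) \
  ((uint64_t) (VALUE) - (uint64_t) (LOWER) \
   <= (uint64_t) (UPPER) - (uint64_t) (LOWER))

/* Check both rewrites against the original tests for every V in [LO, HI].  */
static bool forms_agree (hwi lo, hwi hi)
{
  for (hwi v = lo; v <= hi; v++)
    {
      if (uimm5 (-v) != IN_RANGE (v, -31, 0))                 /* "Usn" */
        return false;
      if ((uimm5 (v) || uimm5 (-v)) != IN_RANGE (v, -31, 31))  /* predicate */
        return false;
    }
  return true;
}
```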


r~


Re: [Ping] [PATCH, 6/10] aarch64: add ccmp CC mode

2014-10-11 Thread Richard Henderson
On 09/22/2014 11:44 PM, Zhenqiang Chen wrote:
 +case CC_DNEmode:
 +  return comp_code == NE ? AARCH64_NE : AARCH64_EQ;
 +case CC_DEQmode:
 +  return comp_code == NE ? AARCH64_EQ : AARCH64_NE;
 +case CC_DGEmode:
 +  return comp_code == NE ? AARCH64_GE : AARCH64_LT;
 +case CC_DLTmode:
 +  return comp_code == NE ? AARCH64_LT : AARCH64_GE;
 +case CC_DGTmode:
 +  return comp_code == NE ? AARCH64_GT : AARCH64_LE;
 +case CC_DLEmode:
 +  return comp_code == NE ? AARCH64_LE : AARCH64_GT;
 +case CC_DGEUmode:
 +  return comp_code == NE ? AARCH64_CS : AARCH64_CC;
 +case CC_DLTUmode:
 +  return comp_code == NE ? AARCH64_CC : AARCH64_CS;
 +case CC_DGTUmode:
 +  return comp_code == NE ? AARCH64_HI : AARCH64_LS;
 +case CC_DLEUmode:
 +  return comp_code == NE ? AARCH64_LS : AARCH64_HI;

I think these should return -1 if comp_code is neither NE nor EQ, like the
CC_Zmode case below.

Perhaps you can share some code to make the whole thing less bulky.  E.g.

...
case CC_DLEUmode:
  ne = AARCH64_LS;
  eq = AARCH64_HI;
  break;
case CC_Zmode:
  ne = AARCH64_NE;
  eq = AARCH64_EQ;
  break;
}
  if (code == NE)
return ne;
  if (code == EQ)
return eq;
  return -1;

This does beg the question of whether you need both CC_Zmode and CC_DNEmode.
I'll leave it to an ARM maintainer to say which one of the two should be kept.
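The suggested restructuring can be sketched as a self-contained C function. The enums are hypothetical, trimmed stand-ins for the backend's CC modes and aarch64_cond_code values. Each mode records only which condition means NE and which means EQ, and any other comparison code yields -1. The sketch also makes the closing question visible: CC_Zmode and CC_DNEmode end up with identical entries.

```c
/* Hypothetical stand-ins for a few CC modes, comparison codes and
   AArch64 condition codes; the real enums live in the aarch64 backend.  */
enum cc_mode { CC_Zmode, CC_DNEmode, CC_DEQmode, CC_DLEUmode };
enum cmp_code { CMP_EQ, CMP_NE, CMP_GT };
enum cond { AARCH64_EQ, AARCH64_NE, AARCH64_HI, AARCH64_LS };

/* Table-driven mapping: each mode supplies its NE and EQ conditions;
   comparison codes other than NE or EQ are rejected with -1.  */
static int cc_mode_condition (enum cc_mode mode, enum cmp_code code)
{
  int ne, eq;
  switch (mode)
    {
    case CC_DNEmode:  ne = AARCH64_NE; eq = AARCH64_EQ; break;
    case CC_DEQmode:  ne = AARCH64_EQ; eq = AARCH64_NE; break;
    case CC_DLEUmode: ne = AARCH64_LS; eq = AARCH64_HI; break;
    case CC_Zmode:    ne = AARCH64_NE; eq = AARCH64_EQ; break;
    default:          return -1;
    }
  if (code == CMP_NE)
    return ne;
  if (code == CMP_EQ)
    return eq;
  return -1;
}
```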


r~


Re: [patch,fortran] Handle (signed) zeros, infinities and NaNs in some intrinsics

2014-10-11 Thread Steve Kargl
On Sat, Oct 11, 2014 at 03:13:00PM +0200, FX wrote:
 The attached patch fixes the compile-time simplification of special
 values (positive and negative zeros, infinities, and NaNs) in
 intrinsics EXPONENT, FRACTION, RRSPACING, SET_EXPONENT, SPACING.
 Those are all the intrinsics in the Fortran 2008 standard that say
 anything about these special values, so it makes sense to fix them.
 This is the compile-time part of PR 48979
 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48979).
 

Looks ok to me.

-- 
Steve


Re: [Ping] [PATCH, 7/10] aarch64: add function to output ccmp insn

2014-10-11 Thread Richard Henderson
On 09/22/2014 11:45 PM, Zhenqiang Chen wrote:
 +static unsigned int
 +aarch64_code_to_nzcv (enum rtx_code code, bool inverse) {
 +  switch (code)
 +{
 +case NE: /* NE, Z == 0.  */
 +  return inverse ? AARCH64_CC_Z : 0;
 +case EQ: /* EQ, Z == 1.  */
 +  return inverse ? 0 : AARCH64_CC_Z;
 +case LE: /* LE, !(Z == 0 && N == V).  */
 +  return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_Z;
 +case GT: /* GT, Z == 0 && N == V.  */
 +  return inverse ? AARCH64_CC_Z : AARCH64_CC_N | AARCH64_CC_V;
 +case LT: /* LT, N != V.  */
 +  return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_N;
 +case GE: /* GE, N == V.  */
 +  return inverse ? AARCH64_CC_N : AARCH64_CC_N | AARCH64_CC_V;
 +case LEU: /* LS, !(C == 1 && Z == 0).  */
 +  return inverse ? AARCH64_CC_C: AARCH64_CC_Z;
 +case GTU: /* HI, C == 1 && Z == 0.  */
 +  return inverse ? AARCH64_CC_Z : AARCH64_CC_C;
 +case LTU: /* CC, C == 0.  */
 +  return inverse ? AARCH64_CC_C : 0;
 +case GEU: /* CS, C == 1.  */
 +  return inverse ? 0 : AARCH64_CC_C;
 +default:
 +  gcc_unreachable ();
 +  return 0;
 +}
 +}
 +

I'm not overly fond of this, since code doesn't map 1-1.  It needs the
context of a mode to provide a unique mapping.

I think it would be better to rearrange the existing aarch64_cond_code enum
such that AARCH64_NE et al are meaningful wrt NZCV.  Then you can use
aarch64_get_condition_code_1 to get this mapping.

 +static unsigned
 +aarch64_mode_to_condition_code (enum machine_mode mode, bool inverse) {
 +  switch (mode)
 +{
 +case CC_DNEmode:
 +  return inverse ? aarch64_get_condition_code_1 (CCmode, EQ)
 +  : aarch64_get_condition_code_1 (CCmode, NE);

This function is just silly.  Modulo the unsigned result, which is wrong after
the rebase, the whole thing reduces to

  return aarch64_get_condition_code_1 (mode, inverse ? EQ : NE);

I'm really not sure what you're after here.

 +const char *
 +aarch64_output_ccmp (rtx *operands, bool is_and, int which_alternative)

Is this really used more than once?  I'm not fond of the use of
which_alternative without the context of a pattern.  I think this could simply
be inlined.
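Whatever form the final mapping takes, the entries of the quoted aarch64_code_to_nzcv table can be checked mechanically under the plain CCmode reading of the flags: each non-inverted value should satisfy its condition and each inverted value should falsify it. The sketch below transcribes the table with hypothetical enum names; the bit values (N is bit 3, V is bit 0, as in the CCMP #nzcv immediate) are assumed to match the AARCH64_CC_* constants. It says nothing about the other CC modes, which is why the mapping is not 1-1 without a mode.

```c
#include <stdbool.h>

/* NZCV immediate bits as encoded by CCMP/CCMN.  */
enum { CC_N = 8, CC_Z = 4, CC_C = 2, CC_V = 1 };

enum cmp { C_NE, C_EQ, C_LE, C_GT, C_LT, C_GE, C_LEU, C_GTU, C_LTU, C_GEU };

/* Does condition CODE hold for the 4-bit flags value NZCV?  */
static bool cond_holds (enum cmp code, unsigned nzcv)
{
  bool n = nzcv & CC_N, z = nzcv & CC_Z, c = nzcv & CC_C, v = nzcv & CC_V;
  switch (code)
    {
    case C_NE:  return !z;
    case C_EQ:  return z;
    case C_LE:  return z || n != v;
    case C_GT:  return !z && n == v;
    case C_LT:  return n != v;
    case C_GE:  return n == v;
    case C_LEU: return !c || z;   /* LS */
    case C_GTU: return c && !z;   /* HI */
    case C_LTU: return !c;        /* LO (CC) */
    case C_GEU: return c;         /* HS (CS) */
    }
  return false;
}

/* The quoted table, transcribed.  */
static unsigned code_to_nzcv (enum cmp code, bool inverse)
{
  switch (code)
    {
    case C_NE:  return inverse ? CC_Z : 0;
    case C_EQ:  return inverse ? 0 : CC_Z;
    case C_LE:  return inverse ? CC_N | CC_V : CC_Z;
    case C_GT:  return inverse ? CC_Z : CC_N | CC_V;
    case C_LT:  return inverse ? CC_N | CC_V : CC_N;
    case C_GE:  return inverse ? CC_N : CC_N | CC_V;
    case C_LEU: return inverse ? CC_C : CC_Z;
    case C_GTU: return inverse ? CC_Z : CC_C;
    case C_LTU: return inverse ? CC_C : 0;
    case C_GEU: return inverse ? 0 : CC_C;
    }
  return 0;
}

/* True if every entry makes its condition true (non-inverse) or false
   (inverse).  */
static bool table_consistent (void)
{
  for (int i = C_NE; i <= C_GEU; i++)
    {
      enum cmp code = (enum cmp) i;
      if (!cond_holds (code, code_to_nzcv (code, false))
          || cond_holds (code, code_to_nzcv (code, true)))
        return false;
    }
  return true;
}
```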


r~


[PATCH] Fix detection of thread support with uClibc in libgcc

2014-10-11 Thread Kwok Cheung Yeung
__gthread_active_p() in libgcc checks for thread support by looking for 
the presence of a symbol from libpthread. With glibc, it looks for 
__pthread_key_create. However, it decides that glibc is being used by 
checking for a definition of __GLIBC__, which uClibc also defines 
(in include/features.h). uClibc does not export __pthread_key_create, 
so the test always fails. I've fixed this by extending the glibc test 
to also check that __UCLIBC__ is not defined, so that the default 
symbol, pthread_cancel, is tested with uClibc instead.


This affects anything that uses the C++11 thread library together with 
the uClibc implementation of libpthread. This caused a large number of 
failed tests from the g++, libgomp and libstdc++ testsuites when run on 
a MIPS Linux target with uClibc as the C library.
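The patched test boils down to a choice of which libpthread symbol to probe. This simplified sketch shows only that choice. The enum and function names are hypothetical, and the real gthr-posix.h has further platform cases and uses weak references rather than returning a value. As in that header, a libc header is included first so that features.h has already defined (or not) __GLIBC__ and __UCLIBC__.

```c
#include <pthread.h>   /* Pulls in features.h, which may define __GLIBC__.  */
#include <stdbool.h>

enum probe { PROBE_PTHREAD_KEY_CREATE, PROBE_PTHREAD_CANCEL };

/* The patched condition as a pure function of the two macros.  */
static enum probe probe_for (bool glibc_defined, bool uclibc_defined)
{
  return glibc_defined && !uclibc_defined
         ? PROBE_PTHREAD_KEY_CREATE : PROBE_PTHREAD_CANCEL;
}

/* The same decision, made by the preprocessor for the current build.  */
static enum probe probe_here (void)
{
#if defined (__GLIBC__) && !defined (__UCLIBC__)
  return PROBE_PTHREAD_KEY_CREATE;
#else
  return PROBE_PTHREAD_CANCEL;
#endif
}
```

On uClibc both macros are defined, so `probe_for (true, true)` selects pthread_cancel, the symbol uClibc does export.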


Kwok


2014-10-11  Kwok Cheung Yeung  k...@codesourcery.com

libgcc/
* gthr-posix.h (GTHR_ACTIVE_PROXY): Check that __UCLIBC__ is
not defined before defining to __gthrw_(__pthread_key_create).

Index: libgcc/gthr-posix.h
===
--- libgcc/gthr-posix.h (revision 216119)
+++ libgcc/gthr-posix.h (working copy)
@@ -232,7 +232,7 @@
library does not provide pthread_cancel, so we do use pthread_create
there (and interceptor libraries lose).  */

-#ifdef __GLIBC__
+#if defined (__GLIBC__) && !defined (__UCLIBC__)
 __gthrw2(__gthrw_(__pthread_key_create),
 __pthread_key_create,
 pthread_key_create)


Re: [PATCH] Fix detection of thread support with uClibc in libgcc

2014-10-11 Thread Andrew Pinski
On Sat, Oct 11, 2014 at 9:42 AM, Kwok Cheung Yeung k...@codesourcery.com 
wrote:
 __gthread_active_p() in libgcc checks for thread support by looking for the
 presence of a symbol from libpthread. With glibc, it looks for
 __pthread_key_create. However, it determines that glibc is being used by
 checking for a definition of __GLIBC__, which is also defined by uClibc (in
 include/features.h), but it does not export __pthread_key_create, causing
 the test to always fail. I've fixed this by extending the test for glibc to
 check that __UCLIBC__ is not defined, causing the default pthread_cancel to
 be tested with uClibc instead.


Why is __GLIBC__ being defined for uclibc?  That seems broken.  We
complain about __GNUC__ defined for other compilers besides GCC; we
should do the same for defining __GLIBC__ also.

Thanks,
Andrew


 This affects anything that uses the C++11 thread library together with the
 uClibc implementation of libpthread. This caused a large number of failed
 tests from the g++, libgomp and libstdc++ testsuites when run on a MIPS
 Linux target with uClibc as the C library.

 Kwok


 2014-10-11  Kwok Cheung Yeung  k...@codesourcery.com

 libgcc/
 * gthr-posix.h (GTHR_ACTIVE_PROXY): Check that __UCLIBC__ is
 not defined before defining to __gthrw_(__pthread_key_create).

 Index: libgcc/gthr-posix.h
 ===
 --- libgcc/gthr-posix.h (revision 216119)
 +++ libgcc/gthr-posix.h (working copy)
 @@ -232,7 +232,7 @@
 library does not provide pthread_cancel, so we do use pthread_create
 there (and interceptor libraries lose).  */

 -#ifdef __GLIBC__
 +#if defined (__GLIBC__) && !defined (__UCLIBC__)
  __gthrw2(__gthrw_(__pthread_key_create),
  __pthread_key_create,
  pthread_key_create)


Re: [PATCH] Fix detection of thread support with uClibc in libgcc

2014-10-11 Thread Kwok Cheung Yeung

On 11/10/2014 5:56 PM, Andrew Pinski wrote:

On Sat, Oct 11, 2014 at 9:42 AM, Kwok Cheung Yeung k...@codesourcery.com 
wrote:

__gthread_active_p() in libgcc checks for thread support by looking for the
presence of a symbol from libpthread. With glibc, it looks for
__pthread_key_create. However, it determines that glibc is being used by
checking for a definition of __GLIBC__, which is also defined by uClibc (in
include/features.h), but it does not export __pthread_key_create, causing
the test to always fail. I've fixed this by extending the test for glibc to
check that __UCLIBC__ is not defined, causing the default pthread_cancel to
be tested with uClibc instead.



Why is __GLIBC__ being defined for uclibc?  That seems broken.  We
complain about __GNUC__ defined for other compilers besides GCC; we
should do the same for defining __GLIBC__ also.



From the comments in include/features.h:

/*  There is an unwholesomely huge amount of code out there that depends on the
 *  presence of GNU libc header files.  We have GNU libc header files.  So here
 *  we commit a horrible sin.  At this point, we _lie_ and claim to be GNU libc
 *  to make things like /usr/include/linux/socket.h and lots of apps work as
 *  their developers intended.  This is IMHO, pardonable, since these defines
 *  are not really intended to check for the presence of a particular library,
 *  but rather are used to define an _interface_.  */

So it looks like a compatibility hack...

Kwok


Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass

2014-10-11 Thread Mike Stump
On Oct 10, 2014, at 8:32 PM, Bin.Cheng amker.ch...@gmail.com wrote:
 
 Though I guess if we run fusion + peep2 between sched1 and sched2, that 
 problem would just resolve itself as we'd have fused AB together into a new 
 insn and we'd schedule normally with the fused insns and X, Y.
 
 Yes, in my version, I ran it really early, before sched.  I needed to run 
 before ira and run before other people
 
 Two reasons why I run it late in compilation process.
 1) IRA is the pass I tend not to disturb since code is changed
 dramatically.  With it after IRA, I can get certain improvement from
 fusion, there is less noise here.

Since I have a front-end background, I think nothing of creating pseudos when I 
want to; I just know that if I do, I have to do it before allocation.  :-)  For 
my peepholes, since they create registers, they must run before allocation.

 2) The spilling generates many load/store pair opportunities on ARM,
 which I don't want to miss.

I happen to have enough registers that spilling wasn’t my primary concern.

 add rx, ry, rz
 ldr   r1, [rx]
 ldr   r2, [rx+4]
 ldr   r3, [rx+8]
 
 It will be transformed into:
 
 add rx, ry, rz
 ldr   r1, [ry+rz]
 ldr   r2, [rx+4]
 ldr   r3, [rx+8]

Yeah, that seems to tickle a neuron.

 On the other hand, if you have
 
 left
 left
 right
 right
 
 There is no way to sort them to get:
 
 left
 right
 left
 right
 
 and then fuse:
 
 left_right
 left_right
 
 This would be impossible.
 
 I can't understand this very well, this exactly is one case we want to
 fuse on ARM for movw/movt.  Given
 movw  r1, const_1

This differs from the above by having r1 and const_1; in my example there is 
no r1 and no const_1, and that matters.  I wanted to list a case where it is 
impossible to sort.  This happens when there isn’t enough data to sort on, for 
example, no offset and no register number.
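To make the sorting argument concrete, here is a minimal C sketch of sort-then-pair. It is not the proposed scheduling-fusion pass. It assumes, as its own simplification, that each memory access carries a base register and an offset, and that two accesses can be paired when they share a base and their offsets differ by the access width. Those keys are what let sorting bring partners next to each other. In the left/left/right/right case there is no such key, so no ordering can interleave them.

```c
#include <stdlib.h>

/* Hypothetical, simplified view of a load: base register and byte offset.  */
typedef struct { int base; int offset; } load_t;

/* Order by base register, then by offset.  */
static int cmp_load (const void *pa, const void *pb)
{
  const load_t *a = pa, *b = pb;
  if (a->base != b->base)
    return (a->base > b->base) - (a->base < b->base);
  return (a->offset > b->offset) - (a->offset < b->offset);
}

/* Sort LOADS by (base, offset), then greedily pair neighbours that share
   a base and whose offsets differ by WIDTH bytes.  Returns the number of
   pairs formed.  */
static int sort_and_pair (load_t *loads, size_t n, int width)
{
  int pairs = 0;
  qsort (loads, n, sizeof *loads, cmp_load);
  for (size_t i = 0; i + 1 < n; )
    {
      if (loads[i].base == loads[i + 1].base
          && loads[i + 1].offset - loads[i].offset == width)
        {
          pairs++;
          i += 2;  /* Both loads are consumed by the pair.  */
        }
      else
        i++;
    }
  return pairs;
}
```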

[fortran,patch]

2014-10-11 Thread FX
After the compile-time simplification, this patch fixes the handling of special 
values (infinities and NaNs) by intrinsics EXPONENT, FRACTION, SPACING, 
RRSPACING and SET_EXPONENT on the code generation side.
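As an illustration of the special-value rules involved, here is a C sketch of FRACTION for `double` built on C99 `frexp`. It assumes the Fortran 2008 reading that FRACTION of an infinity is a NaN, and that zeros (keeping their sign) and NaNs pass through unchanged. The function name is hypothetical; this is not gfortran's generated code.

```c
#include <math.h>

/* FRACTION (X) for radix 2: X * 2**(-EXPONENT (X)), in [0.5, 1) in
   magnitude for finite nonzero X.  */
static double fortran_fraction (double x)
{
  int e;
  if (isinf (x))
    return NAN;  /* Fortran 2008: FRACTION of an infinity is a NaN.  */
  /* For finite nonzero X, frexp returns M with 0.5 <= |M| < 1 and
     X == M * 2**e; signed zeros and NaNs come back unchanged.  */
  return frexp (x, &e);
}
```

A plain `frexp` call without the `isinf` check would instead return the infinity itself, which matches the Fortran 2003 wording.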

Bootstrapped and regtested on x86_64-linux.
OK to commit?





intrinsics.ChangeLog
Description: Binary data


intrinsics.diff
Description: Binary data


Re: [Ping] [PATCH, 8/10] aarch64: ccmp insn patterns

2014-10-11 Thread Richard Henderson
On 09/22/2014 11:45 PM, Zhenqiang Chen wrote:
 +(define_expand cbranchcc4
 +  [(set (pc) (if_then_else
 +   (match_operator 0 aarch64_comparison_operator
 +[(match_operand 1 cc_register )
 + (const_int 0)])
 +   (label_ref (match_operand 3  ))
 +   (pc)))]
 +  
 +   )

Extra space.

 +(define_insn *ccmp_and
 +  [(set (match_operand 6 ccmp_cc_register )
 + (compare
 +  (and:SI
 +   (match_operator 4 aarch64_comparison_operator
 +[(match_operand 0 ccmp_cc_register )
 + (match_operand 1 aarch64_plus_operand )])
 +   (match_operator 5 aarch64_comparison_operator
 +[(match_operand:GPI 2 register_operand r,r,r)
 + (match_operand:GPI 3 aarch64_ccmp_operand r,Uss,Usn)]))
 +  (const_int 0)))]
 +  
 +  {
 +return aarch64_output_ccmp (operands, true, which_alternative);
 +  }
 +  [(set_attr type alus_sreg,alus_imm,alus_imm)]
 +)
 +
 +(define_insn *ccmp_ior
 +  [(set (match_operand 6 ccmp_cc_register )
 + (compare
 +  (ior:SI
 +   (match_operator 4 aarch64_comparison_operator
 +[(match_operand 0 ccmp_cc_register )
 + (match_operand 1 aarch64_plus_operand )])
 +   (match_operator 5 aarch64_comparison_operator
 +[(match_operand:GPI 2 register_operand r,r,r)
 + (match_operand:GPI 3 aarch64_ccmp_operand r,Uss,Usn)]))
 +  (const_int 0)))]
 +  
 +  {
 +return aarch64_output_ccmp (operands, false, which_alternative);
 +  }
 +  [(set_attr type alus_sreg,alus_imm,alus_imm)]

Surely not aarch64_plus_operand for operand 1.  That's a comparison with the
flags register.  Surely (const_int 0) is the only valid operand there.

These could be combined with a code iterator, and thus there would be exactly
one call to aarch64_output_ccmp, and thus inlined.  Although...

It seems to me that you don't need a function call at all.  How about

AND
  @
   ccmp\\t%w2, %w3, %K5, %m4
   ccmp\\t%w2, %w3, %K5, %m4
   ccmn\\t%w2, #%n3, %K5, %m4

IOR
  @
   ccmp\\t%w2, %w3, %k5, %M4
   ccmp\\t%w2, %w3, %k5, %M4
   ccmn\\t%w2, #%n3, %k5, %M4

where 'k' and 'K' are new print_operand codes that output the nzcv (or its
inverse) integer for the comparison, much like 'm' and 'M' print the name of
the comparison.
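Both templates rely on how CCMP behaves architecturally. If the condition operand holds, the flags are set by comparing the two register operands. Otherwise they are set to the #nzcv immediate. The C model below is a sketch: `cmp32`, `ccmp32` and `nzcv_for` are hypothetical names, and only EQ/NE/GT/LE are modelled. It shows why the AND form uses the first condition as-is with an NZCV value that makes the second comparison false, while the IOR form uses the inverted first condition with an NZCV value that makes the second comparison true. `nzcv_for` plays the role of the proposed print codes by searching for a suitable 4-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { EQ, NE, GT, LE } cond_t;

/* Evaluate a signed condition against a 4-bit NZCV value (N is bit 3).  */
static bool holds (cond_t c, unsigned nzcv)
{
  bool n = nzcv & 8, z = nzcv & 4, v = nzcv & 1;
  switch (c)
    {
    case EQ: return z;
    case NE: return !z;
    case GT: return !z && n == v;
    case LE: return z || n != v;
    }
  return false;
}

static cond_t invert (cond_t c)
{
  switch (c)
    {
    case EQ: return NE;
    case NE: return EQ;
    case GT: return LE;
    default: return GT;
    }
}

/* NZCV after a 32-bit CMP A, B (flags of A - B).  */
static unsigned cmp32 (int32_t a, int32_t b)
{
  uint32_t ua = (uint32_t) a, ub = (uint32_t) b, r = ua - ub;
  unsigned n = r >> 31, z = r == 0, c = ua >= ub;
  unsigned v = ((ua ^ ub) & (ua ^ r)) >> 31;
  return n << 3 | z << 2 | c << 1 | v;
}

/* CCMP A, B, #NZCV, COND: compare if COND holds on FLAGS, else load NZCV.  */
static unsigned ccmp32 (unsigned flags, cond_t cond, int32_t a, int32_t b,
                        unsigned nzcv)
{
  return holds (cond, flags) ? cmp32 (a, b) : nzcv;
}

/* Smallest NZCV value under which C evaluates to WANT.  */
static unsigned nzcv_for (cond_t c, bool want)
{
  for (unsigned f = 0; f < 16; f++)
    if (holds (c, f) == want)
      return f;
  return 0;
}

/* (a c1 b) && (x c2 y).  */
static bool and_chain (int32_t a, cond_t c1, int32_t b,
                       int32_t x, cond_t c2, int32_t y)
{
  unsigned f = ccmp32 (cmp32 (a, b), c1, x, y, nzcv_for (c2, false));
  return holds (c2, f);
}

/* (a c1 b) || (x c2 y).  */
static bool ior_chain (int32_t a, cond_t c1, int32_t b,
                       int32_t x, cond_t c2, int32_t y)
{
  unsigned f = ccmp32 (cmp32 (a, b), invert (c1), x, y, nzcv_for (c2, true));
  return holds (c2, f);
}
```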


r~


Re: [Ping] [PATCH, 9/10] aarch64: generate conditional compare instructions

2014-10-11 Thread Richard Henderson
On 09/22/2014 11:46 PM, Zhenqiang Chen wrote:
 +static bool
 +aarch64_convert_mode (rtx* op0, rtx* op1, int unsignedp)
 +{
 +  enum machine_mode mode;
 +
 +  mode = GET_MODE (*op0);
 +  if (mode == VOIDmode)
 +mode = GET_MODE (*op1);
 +
 +  if (mode == QImode || mode == HImode)
 +{
 +  *op0 = convert_modes (SImode, mode, *op0, unsignedp);
 +  *op1 = convert_modes (SImode, mode, *op1, unsignedp);
 +}
 +  else if (mode != SImode && mode != DImode)
 +return false;
 +
 +  return true;
 +}

Hum.  I'd rather not replicate too much of the expander logic here.

We could avoid that by using struct expand_operand, create_input_operand et al,
then expand_insn.  That does require that the target hooks be given trees
rather than rtl as input.


r~


Re: [Ping] [PATCH, 10/10] aarch64: Handle ccmp in ifcvt to make it work with cmov

2014-10-11 Thread Richard Henderson
On 09/22/2014 11:46 PM, Zhenqiang Chen wrote:
 @@ -2375,10 +2387,21 @@ noce_get_condition (rtx_insn *jump, rtx_insn **earliest, bool then_else_reversed
return cond;
  }
  
 +  /* For conditional compare, set ALLOW_CC_MODE to TRUE.  */
 +  if (targetm.gen_ccmp_first)
 +{
 +  rtx prev = prev_nonnote_nondebug_insn (jump);
 +  if (prev
 +      && NONJUMP_INSN_P (prev)
 +      && BLOCK_FOR_INSN (prev) == BLOCK_FOR_INSN (jump)
 +      && ccmp_insn_p (prev))
 + allow_cc_mode = true;
 +}
 +
/* Otherwise, fall back on canonicalize_condition to do the dirty
   work of manipulating MODE_CC values and COMPARE rtx codes.  */
tmp = canonicalize_condition (jump, cond, reverse, earliest,
 - NULL_RTX, false, true);
 + NULL_RTX, allow_cc_mode, true);

This needs a lot more explanation.  Why is it ok to allow a cc_mode when the
source is a ccmp, and not for any other comparison?

The issue is going to be how we use the comparison once we've finished with the
transformation.  Is it going to be able to be properly handled by
emit_conditional_move?

If the target doesn't have cbranchcc4, I think that prep_cmp_insn will fail.
But as you show from

 +++ b/gcc/config/aarch64/aarch64.md
 @@ -2589,15 +2589,19 @@
  (match_operand:ALLI 3 "register_operand" "")))]

{
 -rtx ccreg;
  enum rtx_code code = GET_CODE (operands[1]);
  
  if (code == UNEQ || code == LTGT)
FAIL;
  
 -ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
 -   XEXP (operands[1], 1));
 -operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
 +if (!ccmp_cc_register (XEXP (operands[1], 0),
 +  GET_MODE (XEXP (operands[1], 0))))
 +  {
 + rtx ccreg;
 + ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
 +  XEXP (operands[1], 1));
 + operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
 +  }

this change, even more than that may be required.


r~


[PATCH] cleanups in line-map

2014-10-11 Thread Manuel López-Ibáñez
A few cleanups in line-map code. Bootstrapped and regression tested on
x86_64-linux-gnu.

OK?

libcpp/ChangeLog:

2014-10-12  Manuel López-Ibáñez  m...@gcc.gnu.org

* include/line-map.h (linemap_location_from_macro_expansion_p):
const struct line_maps * argument.
(linemap_position_for_line_and_column): const struct line_map *
argument.
* line-map.c (linemap_macro_map_loc_to_def_point): Delete
redundant declaration.
(linemap_add_macro_token): Use correct argument name in comment.
(linemap_position_for_line_and_column): const struct line_map *
argument.
(linemap_macro_map_loc_to_def_point): Fix comment. Make static.
(linemap_location_from_macro_expansion_p): const struct line_maps *
argument.
(linemap_resolve_location): Fix argument names in comment.
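A simplified model of what linemap_position_for_line_and_column computes may help show why its map argument can be const: encoding a location only reads the map. The struct, its fields and the helper below are hypothetical stand-ins. The arithmetic (the map's start location, plus the line delta shifted past the column bits, plus the masked column) is an assumption about libcpp's ordinary-map scheme, not a copy of the real code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; libcpp's real types and accessor macros differ.
using source_location = std::uint32_t;
using linenum_type = std::uint32_t;

struct ordinary_map_model {
  source_location start_location;  // first location handed out by this map
  linenum_type starting_line;      // first line this map encodes
  unsigned column_bits;            // low bits reserved for the column
};

// Only reads MAP, so it takes a pointer to const.
source_location position_for_line_and_column(const ordinary_map_model *map,
                                             linenum_type line,
                                             unsigned column) {
  // Same precondition as the linemap_assert in the patch.
  assert(map->starting_line <= line);
  const unsigned column_mask = (1u << map->column_bits) - 1;
  return map->start_location
         + ((line - map->starting_line) << map->column_bits)
         + (column & column_mask);
}
```

Because the parameter is a pointer to const, the helper can be called on a `const` map object, which a non-const parameter would not allow without a cast.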
Index: libcpp/include/line-map.h
===
--- libcpp/include/line-map.h   (revision 216098)
+++ libcpp/include/line-map.h   (working copy)
@@ -521,11 +521,11 @@ int linemap_location_in_system_header_p
 source_location);
 
 /* Return TRUE if LOCATION is a source code location of a token coming
from a macro replacement-list at a macro expansion point, FALSE
otherwise.  */
-bool linemap_location_from_macro_expansion_p (struct line_maps *,
+bool linemap_location_from_macro_expansion_p (const struct line_maps *,
  source_location);
 
 /* source_location values from 0 to RESERVED_LOCATION_COUNT-1 will
be reserved for libcpp user as special values, no token from libcpp
will contain any of those locations.  */
@@ -597,13 +597,14 @@ bool linemap_location_from_macro_expansi
 extern source_location
 linemap_position_for_column (struct line_maps *, unsigned int);
 
 /* Encode and return a source location from a given line and
column.  */
-source_location linemap_position_for_line_and_column (struct line_map *,
- linenum_type,
- unsigned int);
+source_location
+linemap_position_for_line_and_column (const struct line_map *,
+ linenum_type, unsigned int);
+
 /* Return the file this map is for.  */
 #define LINEMAP_FILE(MAP)  \
   (linemap_check_ordinary (MAP)->d.ordinary.to_file)
 
 /* Return the line number this map started encoding location from.  */
Index: libcpp/line-map.c
===
--- libcpp/line-map.c   (revision 216098)
+++ libcpp/line-map.c   (working copy)
@@ -29,12 +29,10 @@ along with this program; see the file CO
 static void trace_include (const struct line_maps *, const struct line_map *);
 static const struct line_map * linemap_ordinary_map_lookup (struct line_maps *,
source_location);
 static const struct line_map* linemap_macro_map_lookup (struct line_maps *,
source_location);
-static source_location linemap_macro_map_loc_to_def_point
-(const struct line_map*, source_location);
 static source_location linemap_macro_map_loc_unwind_toward_spelling
 (const struct line_map*, source_location);
 static source_location linemap_macro_map_loc_to_exp_point
 (const struct line_map*, source_location);
 static source_location linemap_macro_loc_to_spelling_point
@@ -482,11 +480,11 @@ linemap_enter_macro (struct line_maps *s
definition, it is the locus in the macro definition; otherwise it
is a location in the context of the caller of this macro expansion
(which is a virtual location or a source location if the caller is
itself a macro expansion or not).
 
-   MACRO_DEFINITION_LOC is the location in the macro definition,
+   ORIG_PARM_REPLACEMENT_LOC is the location in the macro definition,
either of the token itself or of a macro parameter that it
replaces.  */
 
 source_location
 linemap_add_macro_token (const struct line_map *map,
@@ -619,11 +617,11 @@ linemap_position_for_column (struct line
 
 /* Encode and return a source location from a given line and
column.  */
 
 source_location
-linemap_position_for_line_and_column (struct line_map *map,
+linemap_position_for_line_and_column (const struct line_map *map,
  linenum_type line,
  unsigned column)
 {
   linemap_assert (ORDINARY_MAP_STARTING_LINE_NUMBER (map) <= line);
 
@@ -770,19 +768,17 @@ linemap_macro_map_loc_to_exp_point (cons
MACRO_MAP_NUM_MACRO_TOKENS (map));
 
   return MACRO_MAP_EXPANSION_POINT_LOCATION (map);
 }
 
-/* If LOCATION is the source location of a token that belongs to a
-   macro replacement-list -- as part of a macro expansion -- then
-   return the location of the token at the definition point of the
-   macro.  Otherwise, 

C++ PATCH for c++/62115 (ICE on invalid conversion to reference to base)

2014-10-11 Thread Jason Merrill
convert_like_real was getting confused because it was seeing a reference 
binding that we had marked as bad, but it couldn't tell what was bad 
about it.  This happened because when we did the ck_base conversion the 
rvalue expression became an lvalue.  Fixed by preserving rvalueness 
through convert_to_base.  As a consequence of these changes I also 
needed to tweak build_dynamic_cast_1 so that we don't try to pass off a 
tree of REFERENCE_TYPE to build_static_cast.


Tested x86_64-pc-linux-gnu, applying to trunk.
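The language rule at stake can be illustrated in standard C++. This is not the PR's testcase: `A`, `B` and the overloaded `category` probe are hypothetical. Converting a derived-class rvalue to a reference to its base must still yield an rvalue, so a `B` temporary binds to `A&&` but not to a non-const `A&`, while a named `B` binds only to `A&`.

```cpp
#include <cassert>
#include <utility>

struct A { int tag = 1; };
struct B : A { int extra = 2; };

// Hypothetical probe: overload resolution reports the value category.
int category(A &) { return 1; }   // chosen for lvalues of A or a derived class
int category(A &&) { return 2; }  // chosen for rvalues of A or a derived class

// A prvalue B binds to A&& via the derived-to-base conversion; its base
// subobject stays an rvalue.  (Writing `A &r = B();` is ill-formed: a
// non-const lvalue reference cannot bind to an rvalue.)
int from_temporary() { return category(B()); }

int from_lvalue() {
  B b;
  return category(b);  // lvalue B binds to A&
}

int from_xvalue() {
  B b;
  return category(std::move(b));  // xvalue B binds to A&&
}
```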
commit 339616961e9ebd216bf73abd6e36e0a5f049ed71
Author: Jason Merrill ja...@redhat.com
Date:   Fri Oct 10 18:21:02 2014 -0400

	PR c++/62115
	* class.c (build_base_path): Preserve rvalueness.
	* call.c (convert_like_real) [ck_base]: Let convert_to_base handle &/*.
	* rtti.c (build_dynamic_cast_1): Call convert_to_reference later.
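For reference, here is how the reference forms of dynamic_cast that build_dynamic_cast_1 handles behave in standard C++. The class names are hypothetical and this illustrates the language rules, not a test from the patch. An upcast behaves like an implicit conversion with no run-time check. A failing downcast to a reference throws std::bad_cast. C++11 also permits an rvalue reference as the target type.

```cpp
#include <cassert>
#include <typeinfo>
#include <utility>

struct Base { virtual ~Base() = default; };  // polymorphic: has a vtable
struct Derived : Base {};
struct Other : Base {};

// Upcast to a reference: behaves like an implicit conversion.
Base &as_base(Derived &d) { return dynamic_cast<Base &>(d); }

// Downcast to a reference: checked at run time.  There is no null
// reference to return, so failure throws std::bad_cast.
bool downcast_succeeds(Base &b) {
  try {
    (void) dynamic_cast<Derived &>(b);
    return true;
  } catch (const std::bad_cast &) {
    return false;
  }
}

// Upcast to an rvalue reference; the result names the same base subobject.
bool rvalue_upcast_is_same_object(Derived &d) {
  Base &&rb = dynamic_cast<Base &&>(std::move(d));
  return &rb == static_cast<Base *>(&d);
}
```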

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 76d8eab..8a89aad 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6341,10 +6341,8 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum,
 	  /* We are going to bind a reference directly to a base-class
 	 subobject of EXPR.  */
 	  /* Build an expression for `*((base*) expr)'.  */
-	  expr = cp_build_addr_expr (expr, complain);
-	  expr = convert_to_base (expr, build_pointer_type (totype),
+	  expr = convert_to_base (expr, totype,
   !c_cast_p, /*nonnull=*/true, complain);
-	  expr = cp_build_indirect_ref (expr, RO_IMPLICIT_CONVERSION, complain);
 	  return expr;
 	}
 
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index b661187..99bfa95 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -251,6 +251,7 @@ build_base_path (enum tree_code code,
   int want_pointer = TYPE_PTR_P (TREE_TYPE (expr));
   bool has_empty = false;
   bool virtual_access;
+  bool rvalue = false;
 
   if (expr == error_mark_node || binfo == error_mark_node || !binfo)
 return error_mark_node;
@@ -324,8 +325,11 @@ build_base_path (enum tree_code code,
 }
 
   if (!want_pointer)
-/* This must happen before the call to save_expr.  */
-expr = cp_build_addr_expr (expr, complain);
+{
+  rvalue = !real_lvalue_p (expr);
+  /* This must happen before the call to save_expr.  */
+  expr = cp_build_addr_expr (expr, complain);
+}
   else
 expr = mark_rvalue_use (expr);
 
@@ -351,9 +355,7 @@ build_base_path (enum tree_code code,
   || in_template_function ())
 {
   expr = build_nop (ptr_target_type, expr);
-  if (!want_pointer)
-	expr = build_indirect_ref (EXPR_LOCATION (expr), expr, RO_NULL);
-  return expr;
+  goto indout;
 }
 
   /* If we're in an NSDMI, we don't have the full constructor context yet
@@ -364,9 +366,7 @@ build_base_path (enum tree_code code,
 {
   expr = build1 (CONVERT_EXPR, ptr_target_type, expr);
   CONVERT_EXPR_VBASE_PATH (expr) = true;
-  if (!want_pointer)
-	expr = build_indirect_ref (EXPR_LOCATION (expr), expr, RO_NULL);
-  return expr;
+  goto indout;
 }
 
   /* Do we need to check for a null pointer?  */
@@ -402,6 +402,8 @@ build_base_path (enum tree_code code,
 {
   expr = cp_build_indirect_ref (expr, RO_NULL, complain);
   expr = build_simple_base_path (expr, binfo);
+  if (rvalue)
+	expr = move (expr);
   if (want_pointer)
 	expr = build_address (expr);
   target_type = TREE_TYPE (expr);
@@ -478,8 +480,13 @@ build_base_path (enum tree_code code,
   else
 null_test = NULL;
 
+ indout:
   if (!want_pointer)
-expr = cp_build_indirect_ref (expr, RO_NULL, complain);
+{
+  expr = cp_build_indirect_ref (expr, RO_NULL, complain);
+  if (rvalue)
+	expr = move (expr);
+}
 
  out:
   if (null_test)
diff --git a/gcc/cp/rtti.c b/gcc/cp/rtti.c
index 10cc168..762953b 100644
--- a/gcc/cp/rtti.c
+++ b/gcc/cp/rtti.c
@@ -608,10 +608,6 @@ build_dynamic_cast_1 (tree type, tree expr, tsubst_flags_t complain)
 	  errstr = _("source is of incomplete class type");
 	  goto fail;
 	}
-
-  /* Apply trivial conversion T -> T& for dereferenced ptrs.  */
-  expr = convert_to_reference (exprtype, expr, CONV_IMPLICIT,
-   LOOKUP_NORMAL, NULL_TREE, complain);
 }
 
   /* The dynamic_cast operator shall not cast away constness.  */
@@ -631,6 +627,11 @@ build_dynamic_cast_1 (tree type, tree expr, tsubst_flags_t complain)
   return build_static_cast (type, expr, complain);
   }
 
+  /* Apply trivial conversion T -> T& for dereferenced ptrs.  */
+  if (tc == REFERENCE_TYPE)
+expr = convert_to_reference (exprtype, expr, CONV_IMPLICIT,
+ LOOKUP_NORMAL, NULL_TREE, complain);
+
   /* Otherwise *exprtype must be a polymorphic class (have a vtbl).  */
   if (TYPE_POLYMORPHIC_P (TREE_TYPE (exprtype)))
 {
diff --git a/gcc/testsuite/g++.dg/expr/cond6.C b/gcc/testsuite/g++.dg/expr/cond6.C
index 943aa85..8f7f084 100644
--- a/gcc/testsuite/g++.dg/expr/cond6.C
+++ b/gcc/testsuite/g++.dg/expr/cond6.C
@@ -1,10 +1,11 @@
 // { dg-do run }
 
 extern "C" void abort ();
+bool ok = 

Re: [Ping] [PATCH, 7/10] aarch64: add function to output ccmp insn

2014-10-11 Thread Richard Henderson
On 10/11/2014 09:11 AM, Richard Henderson wrote:
 On 09/22/2014 11:45 PM, Zhenqiang Chen wrote:
 +static unsigned int
 +aarch64_code_to_nzcv (enum rtx_code code, bool inverse) {
 +  switch (code)
 +    {
 +    case NE: /* NE, Z == 0.  */
 +      return inverse ? AARCH64_CC_Z : 0;
 +    case EQ: /* EQ, Z == 1.  */
 +      return inverse ? 0 : AARCH64_CC_Z;
 +    case LE: /* LE, !(Z == 0 && N == V).  */
 +      return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_Z;
 +    case GT: /* GT, Z == 0 && N == V.  */
 +      return inverse ? AARCH64_CC_Z : AARCH64_CC_N | AARCH64_CC_V;
 +    case LT: /* LT, N != V.  */
 +      return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_N;
 +    case GE: /* GE, N == V.  */
 +      return inverse ? AARCH64_CC_N : AARCH64_CC_N | AARCH64_CC_V;
 +    case LEU: /* LS, !(C == 1 && Z == 0).  */
 +      return inverse ? AARCH64_CC_C: AARCH64_CC_Z;
 +    case GTU: /* HI, C ==1 && Z == 0.  */
 +      return inverse ? AARCH64_CC_Z : AARCH64_CC_C;
 +    case LTU: /* CC, C == 0.  */
 +      return inverse ? AARCH64_CC_C : 0;
 +    case GEU: /* CS, C == 1.  */
 +      return inverse ? 0 : AARCH64_CC_C;
 +    default:
 +      gcc_unreachable ();
 +      return 0;
 +    }
 +}
 +
 
 I'm not overly fond of this, since code doesn't map 1-1.  It needs the
 context of a mode to provide a unique mapping.
 
 I think it would be better to rearrange the existing aarch64_cond_code enum
 such that AARCH64_NE et al are meaningful wrt NZCV.  Then you can use
 aarch64_get_condition_code_1 to get this mapping.

Slight mistake in the advice here.  I think you should use
aarch64_get_condition_code_1 to get an aarch64_cond_code, and use that to
index an array to get the nzcv bits.

Further, does it actually make sense to store both nzcv and its inverse, or
does it work to use nzcv and ~nzcv?
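One way to probe the ~nzcv question is a standalone check over the AArch64 condition definitions. The sketch assumes the CCMP immediate encodes N, Z, C and V as the values 8, 4, 2 and 1. It uses a hypothetical `Cond` enum, in architectural encoding order, to index a table of flag values that make each condition true; the entries for the conditions the patch handles match its non-inverse column. It then tests whether complementing all four bits makes the condition false.

```cpp
#include <array>
#include <cassert>

// Assumed immediate NZCV encoding for CCMP: N=8, Z=4, C=2, V=1.
constexpr unsigned CC_N = 8, CC_Z = 4, CC_C = 2, CC_V = 1;

// Hypothetical stand-in for aarch64_cond_code, in encoding order.
enum Cond { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, NUM_CONDS };

// Flags that make each condition true, indexed by Cond.
constexpr std::array<unsigned, NUM_CONDS> kTrueNzcv = {
    CC_Z,         // EQ
    0,            // NE
    CC_C,         // CS (GEU)
    0,            // CC (LTU)
    CC_N,         // MI
    0,            // PL
    CC_V,         // VS
    0,            // VC
    CC_C,         // HI (GTU)
    CC_Z,         // LS (LEU)
    CC_N | CC_V,  // GE
    CC_N,         // LT
    CC_N | CC_V,  // GT
    CC_Z,         // LE
};

// Evaluate an AArch64 condition against a 4-bit NZCV value.
bool cond_holds(Cond c, unsigned nzcv) {
  const bool n = nzcv & CC_N, z = nzcv & CC_Z, cf = nzcv & CC_C, v = nzcv & CC_V;
  switch (c) {
    case EQ: return z;
    case NE: return !z;
    case CS: return cf;
    case CC: return !cf;
    case MI: return n;
    case PL: return !n;
    case VS: return v;
    case VC: return !v;
    case HI: return cf && !z;
    case LS: return !(cf && !z);
    case GE: return n == v;
    case LT: return n != v;
    case GT: return !z && n == v;
    case LE: return !(!z && n == v);
    default: return false;
  }
}

// True if complementing all four bits of the table entry falsifies C.
bool complement_inverts(Cond c) {
  return !cond_holds(c, ~kTrueNzcv[c] & 0xfu);
}
```

With this table the complement works for every condition except GE and LT. Those depend only on whether N equals V, and flipping both N and V leaves that unchanged, so no table entry for them can have a complement that inverts the condition.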


r~