Re: [PATCH 3/5] IPA ICF pass
> After a few days of measurement and tuning, I was able to get the numbers into the following shape:
>
> Execution times (seconds)
>  phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1412 kB ( 0%) ggc
>  phase opt and generate  :  27.83 (59%) usr   0.66 (19%) sys  28.52 (37%) wall 1028813 kB (24%) ggc
>  phase stream in         :  16.90 (36%) usr   0.63 (18%) sys  17.60 (23%) wall 3246453 kB (76%) ggc
>  phase stream out        :   2.76 ( 6%) usr   2.19 (63%) sys  31.34 (40%) wall       2 kB ( 0%) ggc
>  callgraph optimization  :   0.36 ( 1%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall      40 kB ( 0%) ggc
>  ipa dead code removal   :   3.31 ( 7%) usr   0.01 ( 0%) sys   3.25 ( 4%) wall       0 kB ( 0%) ggc
>  ipa virtual call target :   3.69 ( 8%) usr   0.03 ( 1%) sys   3.80 ( 5%) wall      21 kB ( 0%) ggc
>  ipa devirtualization    :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall   13704 kB ( 0%) ggc
>  ipa cp                  :   1.11 ( 2%) usr   0.07 ( 2%) sys   1.17 ( 2%) wall  188558 kB ( 4%) ggc
>  ipa inlining heuristics :   8.17 (17%) usr   0.14 ( 4%) sys   8.27 (11%) wall  494738 kB (12%) ggc
>  ipa comdats             :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
>  ipa lto gimple in       :   1.86 ( 4%) usr   0.40 (11%) sys   2.20 ( 3%) wall  537970 kB (13%) ggc
>  ipa lto gimple out      :   0.19 ( 0%) usr   0.08 ( 2%) sys   0.27 ( 0%) wall       2 kB ( 0%) ggc
>  ipa lto decl in         :  12.20 (26%) usr   0.37 (11%) sys  12.64 (16%) wall 2441687 kB (57%) ggc
>  ipa lto decl out        :   2.51 ( 5%) usr   0.21 ( 6%) sys   2.71 ( 3%) wall       0 kB ( 0%) ggc
>  ipa lto constructors in :   0.13 ( 0%) usr   0.02 ( 1%) sys   0.17 ( 0%) wall   15692 kB ( 0%) ggc
>  ipa lto constructors out:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
>  ipa lto cgraph I/O      :   0.54 ( 1%) usr   0.09 ( 3%) sys   0.63 ( 1%) wall  407182 kB (10%) ggc
>  ipa lto decl merge      :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.34 ( 2%) wall    8220 kB ( 0%) ggc
>  ipa lto cgraph merge    :   1.00 ( 2%) usr   0.00 ( 0%) sys   1.00 ( 1%) wall   14605 kB ( 0%) ggc
>  whopr wpa               :   0.92 ( 2%) usr   0.00 ( 0%) sys   0.89 ( 1%) wall       1 kB ( 0%) ggc
>  whopr wpa I/O           :   0.01 ( 0%) usr   1.90 (55%) sys  28.31 (37%) wall       0 kB ( 0%) ggc
>  whopr partitioning      :   2.81 ( 6%) usr   0.01 ( 0%) sys   2.83 ( 4%) wall    4943 kB ( 0%) ggc
>  ipa reference           :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.35 ( 2%) wall       0 kB ( 0%) ggc
>  ipa profile             :   0.20 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
>  ipa pure const          :   1.62 ( 3%) usr   0.00 ( 0%) sys   1.63 ( 2%) wall       0 kB ( 0%) ggc
>  ipa icf                 :   2.65 ( 6%) usr   0.02 ( 1%) sys   2.68 ( 3%) wall    1352 kB ( 0%) ggc
>  inline parameters       :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
>  tree SSA rewrite        :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.08 ( 0%) wall   18919 kB ( 0%) ggc
>  tree SSA other          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
>  tree SSA incremental    :   0.24 ( 1%) usr   0.01 ( 0%) sys   0.32 ( 0%) wall   11325 kB ( 0%) ggc
>  tree operand scan       :   0.15 ( 0%) usr   0.02 ( 1%) sys   0.18 ( 0%) wall  116283 kB ( 3%) ggc
>  dominance frontiers     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
>  dominance computation   :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.16 ( 0%) wall       0 kB ( 0%) ggc
>  varconst                :   0.01 ( 0%) usr   0.02 ( 1%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
>  loop fini               :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
>  unaccounted todo        :   0.55 ( 1%) usr   0.00 ( 0%) sys   0.56 ( 1%) wall       0 kB ( 0%) ggc
>  TOTAL                   :  47.49          3.48         77.46          4276682 kB
>
> and I was able to reduce function bodies loaded in WPA to 35% (from previous 55%). The main problem

35% means that 35% of all function bodies are compared with something else? That feels pretty high, but overall the numbers are not so terrible.

> with speed was hidden in the work list for congruence classes, where hash_set was used. I chose the data structure to support the delete operation, but it was really slow. Thus, hash_set was replaced with a linked list, and a flag is used to identify whether a set is removed or not.

Interesting, I would not expect a bottleneck in the congruence solving :)

> I have no clue how complicated it can be to implement the release_body function as an operation that really releases the memory?

I suppose one can keep the caches from the streamer and free the trees read.
Freeing gimple statements and the CFG should be relatively easy. Let's however first try to tune the implementation rather than get this hack implemented. Explicit ggc_free calls
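The hash_set-to-linked-list change described above can be sketched as follows. This is a hedged, stand-alone model, not the actual ipa-icf code (the names `congruence_class` and `worklist` are invented here): membership is tracked by a flag on the class itself, so "removal" is an O(1) flag flip, and stale queue entries are skipped lazily when popped.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for a congruence class sitting on the worklist.
struct congruence_class
{
  int id;
  bool in_worklist = false;  // the flag that replaces hash_set membership
};

struct worklist
{
  std::deque<congruence_class *> q;

  void push (congruence_class *c)
  {
    if (c->in_worklist)
      return;                // avoid duplicates without a hash lookup
    c->in_worklist = true;
    q.push_back (c);
  }

  // O(1) removal: clear the flag; the stale queue entry stays behind
  // and is discarded lazily in pop().
  void remove (congruence_class *c) { c->in_worklist = false; }

  congruence_class *pop ()
  {
    while (!q.empty ())
      {
	congruence_class *c = q.front ();
	q.pop_front ();
	if (c->in_worklist)
	  {
	    c->in_worklist = false;
	    return c;
	  }
	// else: entry was "removed" after being queued; skip it.
      }
    return nullptr;
  }
};
```

The point of the design is that a hash_set pays for every insert and erase, while here both operations are constant-time and the only extra cost is skipping dead entries on pop.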
Re: [PATCH 3/5] IPA ICF pass
+/* Verifies for given GIMPLEs S1 and S2 that
+   goto statements are semantically equivalent.  */
+
+bool
+func_checker::compare_gimple_goto (gimple g1, gimple g2)
+{
+  tree dest1, dest2;
+
+  dest1 = gimple_goto_dest (g1);
+  dest2 = gimple_goto_dest (g2);
+
+  if (TREE_CODE (dest1) != TREE_CODE (dest2) || TREE_CODE (dest1) != SSA_NAME)
+    return false;
+
+  return compare_operand (dest1, dest2);

You probably need to care only about indirect gotos; the direct ones are checked by the CFG compare. So is the conditional jump.

It looks like this code is visited quite rarely. Hmm, perhaps it is called only for indirect gotos, because all others are not represented as statements.

+/* Verifies for given GIMPLEs S1 and S2 that ASM statements are equivalent.
+   For the beginning, the pass only supports equality for
+   '__asm__ __volatile__ (, , , memory)'.  */
+
+bool
+func_checker::compare_gimple_asm (gimple g1, gimple g2)
+{
+  if (gimple_asm_volatile_p (g1) != gimple_asm_volatile_p (g2))
+    return false;
+
+  if (gimple_asm_ninputs (g1) || gimple_asm_ninputs (g2))
+    return false;
+
+  if (gimple_asm_noutputs (g1) || gimple_asm_noutputs (g2))
+    return false;
+
+  if (gimple_asm_nlabels (g1) || gimple_asm_nlabels (g2))
+    return false;
+
+  if (gimple_asm_nclobbers (g1) != gimple_asm_nclobbers (g2))
+    return false;
+
+  for (unsigned i = 0; i < gimple_asm_nclobbers (g1); i++)
+    {
+      tree clobber1 = TREE_VALUE (gimple_asm_clobber_op (g1, i));
+      tree clobber2 = TREE_VALUE (gimple_asm_clobber_op (g2, i));
+
+      if (!operand_equal_p (clobber1, clobber2, OEP_ONLY_CONST))
+	return false;
+    }
+

Even asm statements with no inputs or outputs can differ by the actual asm string; compare it too. Comparing inputs/outputs/labels should be very easy to do. Compare all gimple_asm_n* counts for equivalence.

This makes full sense, but I don't understand what kind of operands you mean? You can look at some other code dealing with gimple asm statements.
You can just compare gimple_op for 0 .. gimple_num_ops and be ready to deal with TREE_LIST as described below.

At the end walk the operands and watch for the case where they are TREE_LIST. Then compare TREE_VALUE (op) of the list with operand_equal_p and TREE_VALUE (TREE_PURPOSE (op)) for equivalence (those are the constraints). If they are not TREE_LISTs (clobbers are not; those are just strings), operand_equal_p should do.

Honza

+  return true;
+}
+
+} // ipa_icf namespace

Otherwise I think ipa-gimple-icf is quite fine now. Please send an updated version and I think it can go to mainline before the actual ipa-icf.

I renamed both files and put them into a newly created namespace ipa_icf_gimple. Thank you, Martin
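The comparison scheme sketched above — walk the operand list and, for each entry, compare the operand expression and its constraint string separately — can be modeled roughly like this. The types below are invented stand-ins, not GCC's tree API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for one asm operand: in GCC terms, TREE_VALUE (op) would be
// the expression and TREE_VALUE (TREE_PURPOSE (op)) the constraint.
struct asm_op
{
  std::string value;       // models the operand expression
  std::string constraint;  // models the constraint string, e.g. "r", "=m"
};

// Two operand lists match only if every pair agrees on both the
// operand and its constraint.  Clobbers would be plain strings and
// need only the value comparison.
static bool
compare_asm_ops (const std::vector<asm_op> &a, const std::vector<asm_op> &b)
{
  if (a.size () != b.size ())
    return false;
  for (size_t i = 0; i < a.size (); i++)
    if (a[i].value != b[i].value || a[i].constraint != b[i].constraint)
      return false;
  return true;
}
```

The design point is that two asm statements with identical operand expressions but different constraints must still compare unequal, since the constraints change how operands are materialized.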
Re: [PATCH 3/5] IPA ICF pass
Hello. Yeah, you are right. But even Richard advised me to put it in a single place. Maybe we are a bit more strict than would be necessary, but I hope that's fine ;)

OK, let's do the extra checking now and play with this incrementally.

Good point. Do you mean cases like foo (alias_foo) and bar (alias_bar)? If we prove that foo equals bar, can we also merge the aliases? I am curious whether such a comparison can really save something? Are there any interesting cases?

What probably matters is that you recognize the equivalence to know that uses of alias_foo can be merged with uses of alias_bar. Similarly for thunks. Again something to do incrementally, I guess.

Honza

Martin

+    case INTEGER_CST:
+      {
+	ret = types_are_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2))
+	      && wi::to_offset (t1) == wi::to_offset (t2);

tree_int_cst_equal

+    case FIELD_DECL:
+      {
+	tree fctx1 = DECL_FCONTEXT (t1);
+	tree fctx2 = DECL_FCONTEXT (t2);

DECL_FCONTEXT has no semantic meaning, so you can skip comparing it.

+	tree offset1 = DECL_FIELD_OFFSET (t1);
+	tree offset2 = DECL_FIELD_OFFSET (t2);
+
+	tree bit_offset1 = DECL_FIELD_BIT_OFFSET (t1);
+	tree bit_offset2 = DECL_FIELD_BIT_OFFSET (t2);
+
+	ret = compare_operand (fctx1, fctx2)
+	      && compare_operand (offset1, offset2)
+	      && compare_operand (bit_offset1, bit_offset2);

You probably want to compare the type here?

+    case CONSTRUCTOR:
+      {
+	unsigned len1 = vec_safe_length (CONSTRUCTOR_ELTS (t1));
+	unsigned len2 = vec_safe_length (CONSTRUCTOR_ELTS (t2));
+
+	if (len1 != len2)
+	  return false;
+
+	for (unsigned i = 0; i < len1; i++)
+	  if (!sem_variable::equals (CONSTRUCTOR_ELT (t1, i)->value,
+				     CONSTRUCTOR_ELT (t2, i)->value))
+	    return false;

You want to compare ->index, too.

+    case INTEGER_CST:
+      return func_checker::types_are_compatible_p (TREE_TYPE (t1),
+						   TREE_TYPE (t2), true)
+	     && wi::to_offset (t1) == wi::to_offset (t2);

again ;)

This is where I stopped for now. Generally the patch seems OK to me with a few of these details fixed.

Honza
[PING] [PATCH, xtensa] Add zero-overhead looping for xtensa backend
PING?

Hi Sterling,

I made some improvements to the patch. Two changes:

1. TARGET_LOOPS is now used as a condition of the doloop related patterns, which is more elegant.
2. As the trip count register of the zero-cost loop may potentially be spilled, we need to change the patterns in order to handle this issue. The solution is similar to that adopted by the c6x backend: just turn the zero-cost loop into a regular loop when that happens once reload is completed.

Attached please find version 4 of the patch. Make check regression tested with xtensa-elf-gcc/simulator. OK for trunk?

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 216079)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,20 @@
+2014-10-10  Felix Yang  felix.y...@huawei.com
+
+	* config/xtensa/xtensa.h (TARGET_LOOPS): New macro.
+	* config/xtensa/xtensa.c (xtensa_reorg): New.
+	(xtensa_reorg_loops): New.
+	(xtensa_can_use_doloop_p): New.
+	(xtensa_invalid_within_doloop): New.
+	(hwloop_optimize): New.
+	(hwloop_fail): New.
+	(hwloop_pattern_reg): New.
+	(xtensa_emit_loop_end): Modified to emit the zero-overhead loop end label.
+	(xtensa_doloop_hooks): Define.
+	* config/xtensa/xtensa.md (doloop_end): New.
+	(loop_end): New.
+	(zero_cost_loop_start): Rewritten.
+	(zero_cost_loop_end): Rewritten.
+
 2014-10-10  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 	* configure.ac: Add --enable-fix-cortex-a53-835769 option.

Index: gcc/config/xtensa/xtensa.md
===================================================================
--- gcc/config/xtensa/xtensa.md	(revision 216079)
+++ gcc/config/xtensa/xtensa.md	(working copy)
@@ -35,6 +35,8 @@
   (UNSPEC_TLS_CALL	9)
   (UNSPEC_TP		10)
   (UNSPEC_MEMW		11)
+  (UNSPEC_LSETUP_START	12)
+  (UNSPEC_LSETUP_END	13)

   (UNSPECV_SET_FP	1)
   (UNSPECV_ENTRY	2)
@@ -1289,41 +1291,120 @@
    (set_attr "length"	"3")])

+;; Zero-overhead looping support.
+
 ;; Define the loop insns used by bct optimization to represent the
-;; start and end of a zero-overhead loop (in loop.c).  This start
-;; template generates the loop insn; the end template doesn't generate
-;; any instructions since loop end is handled in hardware.
+;; start and end of a zero-overhead loop.  This start template generates
+;; the loop insn; the end template doesn't generate any instructions since
+;; loop end is handled in hardware.

 (define_insn "zero_cost_loop_start"
   [(set (pc)
-	(if_then_else (eq (match_operand:SI 0 "register_operand" "a")
-			  (const_int 0))
-		      (label_ref (match_operand 1 "" ""))
-		      (pc)))
-   (set (reg:SI 19)
-	(plus:SI (match_dup 0) (const_int -1)))]
-  ""
-  "loopnez\t%0, %l1"
+	(if_then_else (ne (match_operand:SI 0 "register_operand" "2")
+			  (const_int 1))
+		      (label_ref (match_operand 1 "" ""))
+		      (pc)))
+   (set (match_operand:SI 2 "register_operand" "=a")
+	(plus (match_dup 0)
+	      (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_LSETUP_START)]
+  "TARGET_LOOPS && optimize"
+  "loop\t%0, %l1_LEND"
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")
    (set_attr "length"	"3")])

 (define_insn "zero_cost_loop_end"
   [(set (pc)
-	(if_then_else (ne (reg:SI 19) (const_int 0))
-		      (label_ref (match_operand 0 "" ""))
-		      (pc)))
-   (set (reg:SI 19)
-	(plus:SI (reg:SI 19) (const_int -1)))]
-  ""
+	(if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "2,2")
+			  (const_int 1))
+		      (label_ref (match_operand 1 "" ""))
+		      (pc)))
+   (set (match_operand:SI 2 "nonimmediate_operand" "=a,m")
+	(plus (match_dup 0)
+	      (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_LSETUP_END)
+   (clobber (match_scratch:SI 3 "=X,&r"))]
+  "TARGET_LOOPS && optimize"
+  "#"
+  [(set_attr "type"	"jump")
+   (set_attr "mode"	"none")
+   (set_attr "length"	"0")])
+
+(define_insn "loop_end"
+  [(set (pc)
+	(if_then_else (ne (match_operand:SI 0 "register_operand" "2")
+			  (const_int 1))
+		      (label_ref (match_operand 1 "" ""))
+		      (pc)))
+   (set (match_operand:SI 2 "register_operand" "=a")
+	(plus (match_dup 0)
+	      (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_LSETUP_END)]
+  "TARGET_LOOPS && optimize"
 {
-  xtensa_emit_loop_end (insn, operands);
-  return "";
+  xtensa_emit_loop_end (insn, operands); return "";
 }
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")
    (set_attr "length"	"0")])

+(define_split
+  [(set (pc)
+	(if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "")
+			  (const_int 1))
+		      (label_ref (match_operand 1 "" ""))
+
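For readers unfamiliar with the pattern above: the rewritten loop-end compares the counter against 1 and decrements it in the same parallel, so a trip count of n executes the body exactly n times. A minimal, hedged model of that semantics (an illustration only, not generated code; it assumes n >= 1, as doloop requires):

```cpp
#include <cassert>

// Model of the zero-cost loop-end: branch back while the pre-decrement
// counter value is not 1, and decrement on every iteration.
static int
doloop_model (int n)
{
  int iterations = 0;
  int counter = n;		   // trip count register, assumed >= 1
  do
    {
      ++iterations;		   // the loop body
      bool taken = (counter != 1); // (ne (match_dup 0) (const_int 1))
      counter -= 1;		   // (plus (match_dup 0) (const_int -1))
      if (!taken)
	break;			   // fall through out of the loop
    }
  while (true);
  return iterations;
}
```

This is also why a spilled counter (the "m" alternative) forces a split into a regular decrement-and-branch loop: the hardware LOOP instruction needs the count in a register up front.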
Re: RFA: fix mode confusion in caller-save.c:replace_reg_with_saved_mem
On 10 October 2014 21:13, Jeff Law l...@redhat.com wrote:
...
> ISTM it would be better to find the mode of the same class that
> corresponds to GET_MODE_SIZE (mode) / nregs.  In your case that's
> obviously QImode :-)

Like this? Or did you mean to remove the save_mode[regno] use altogether? I can think of arguments for or against, but have no concrete examples for either.

2014-10-11  Joern Rennecke  joern.renne...@embecosm.com
	    Jeff Law  l...@redhat.com

	* caller-save.c (replace_reg_with_saved_mem): If saved_mode
	covers multiple hard registers, use word_mode.

diff --git a/gcc/caller-save.c b/gcc/caller-save.c
index e28facb..31b1a36 100644
--- a/gcc/caller-save.c
+++ b/gcc/caller-save.c
@@ -1158,9 +1158,12 @@ replace_reg_with_saved_mem (rtx *loc,
 	    }
 	  else
 	    {
-	      gcc_assert (save_mode[regno] != VOIDmode);
-	      XVECEXP (mem, 0, i) = gen_rtx_REG (save_mode [regno],
-						 regno + i);
+	      enum machine_mode smode = save_mode[regno];
+	      gcc_assert (smode != VOIDmode);
+	      if (hard_regno_nregs [regno][smode] > 1)
+		smode = mode_for_size (GET_MODE_SIZE (mode) / nregs,
+				       GET_MODE_CLASS (mode), 0);
+	      XVECEXP (mem, 0, i) = gen_rtx_REG (smode, regno + i);
 	    }
 	}
[testsuite patch] avoid test when compile options is conflict with default mthumb
When testing the arm-linux-gnueabihf triple with the configure option --with-mode=thumb (which makes -mthumb the default), some test cases fail with the error message "sorry, unimplemented: Thumb-1 hard-float VFP ABI". I found that the compiler shows this error message when:

1. -mthumb is used with -march=armv6 (or armv5e) and -mcpu=xscale, and
2. the test source has a function body.

And when -mthumb is the default option of the compiler, the dg-skip-if directives cannot detect it. There is no xscale check function in target-supports.exp, so we need to add one. And the check_effective_target_arm* functions put only macros in the test program, no function body, so we need to add that too. Here is my patch:

2014-10-08  Wangdeqiang  wangdeqi...@linaro.org

	* lib/target-supports.exp (check_effective_target_arm_xscale_ok):
	New function.
	(check_effective_target_arm_arch_FUNC_ok): Add test function body.
	* gcc.target/arm/pr40887.c (dg-require-effective-target): Add
	arm_arch_v5te_ok check.
	* gcc.target/arm/scd42-1.c (dg-require-effective-target): Add
	arm_xscale_ok check.
	* gcc.target/arm/scd42-2.c: Likewise.
	* gcc.target/arm/scd42-3.c: Likewise.
	* gcc.target/arm/g2.c: Likewise.
	* gcc.target/arm/xor-and.c (dg-require-effective-target): Add
	arm_arch_v6_ok check.

Index: gcc/testsuite/gcc.target/arm/pr40887.c
===================================================================
--- gcc/testsuite/gcc.target/arm/pr40887.c	(revision 216115)
+++ gcc/testsuite/gcc.target/arm/pr40887.c	(working copy)
@@ -1,6 +1,7 @@
 /* { dg-skip-if "need at least armv5" { *-*-* } { "-march=armv[234]*" } { "" } } */
 /* { dg-options "-O2 -march=armv5te" } */
 /* { dg-final { scan-assembler "blx" } } */
+/* { dg-require-effective-target arm_arch_v5te_ok } */

 int (*indirect_func)(int x);

Index: gcc/testsuite/gcc.target/arm/scd42-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/scd42-2.c	(revision 216115)
+++ gcc/testsuite/gcc.target/arm/scd42-2.c	(working copy)
@@ -5,6 +5,7 @@
 /* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-mcpu=*" } { "-mcpu=xscale" } } */
 /* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { "" } } */
 /* { dg-require-effective-target arm32 } */
+/* { dg-require-effective-target arm_xscale_ok } */

 unsigned load2(void) __attribute__ ((naked));
 unsigned load2(void)

Index: gcc/testsuite/gcc.target/arm/scd42-3.c
===================================================================
--- gcc/testsuite/gcc.target/arm/scd42-3.c	(revision 216115)
+++ gcc/testsuite/gcc.target/arm/scd42-3.c	(working copy)
@@ -3,6 +3,7 @@
 /* { dg-skip-if "Test is specific to Xscale" { arm*-*-* } { "-march=*" } { "-march=xscale" } } */
 /* { dg-skip-if "Test is specific to Xscale" { arm*-*-* } { "-mcpu=*" } { "-mcpu=xscale" } } */
 /* { dg-options "-mcpu=xscale -O" } */
+/* { dg-require-effective-target arm_xscale_ok } */

 unsigned load4(void) __attribute__ ((naked));
 unsigned load4(void)

Index: gcc/testsuite/gcc.target/arm/g2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/g2.c	(revision 216115)
+++ gcc/testsuite/gcc.target/arm/g2.c	(working copy)
@@ -5,6 +5,7 @@
 /* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-mcpu=*" } { "-mcpu=xscale" } } */
 /* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { "" } } */
 /* { dg-require-effective-target arm32 } */
+/* { dg-require-effective-target arm_xscale_ok } */

 /* Brett Gaines' test case.  */
 unsigned BCPL(unsigned) __attribute__ ((naked));

Index: gcc/testsuite/gcc.target/arm/xor-and.c
===================================================================
--- gcc/testsuite/gcc.target/arm/xor-and.c	(revision 216115)
+++ gcc/testsuite/gcc.target/arm/xor-and.c	(working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O -march=armv6" } */
 /* { dg-prune-output "switch .* conflicts with" } */
+/* { dg-require-effective-target arm_arch_v6_ok } */

 unsigned short foo (unsigned short x)
 {

Index: gcc/testsuite/gcc.target/arm/scd42-1.c
===================================================================
--- gcc/testsuite/gcc.target/arm/scd42-1.c	(revision 216115)
+++ gcc/testsuite/gcc.target/arm/scd42-1.c	(working copy)
@@ -2,6 +2,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "incompatible options" { arm*-*-* } { "-march=*" } { "" } } */
 /* { dg-options "-mcpu=xscale -O" } */
+/* { dg-require-effective-target arm_xscale_ok } */

 unsigned load1(void) __attribute__ ((naked));
 unsigned load1(void)

Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	(revision 216115)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -2721,6 +2721,11 @@ foreach { armfunc armflag armdef } { v4
	#if !defined (DEF)
	#error !DEF
	#endif
+	int
Fallout on full bootstrap (was: r216010 - in /trunk/gcc: ChangeLog ipa-polymorp...)
On Wed, 2014-10-08 17:10:01 -, hubi...@gcc.gnu.org wrote:
> URL: https://gcc.gnu.org/viewcvs?rev=216010&root=gcc&view=rev
>
> 	* ipa-polymorphic-call.c (extr_type_from_vtbl_store): Do better
> 	pattern matching of MEM_REF.
> 	(check_stmt_for_type_change): Update.

This recent commit led to fallout for all targets built with config-list.mk:

g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti
  -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings
  -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic
  -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror
  -fno-common -DHAVE_CONFIG_H -I. -I. -I../../../gcc/gcc -I../../../gcc/gcc/.
  -I../../../gcc/gcc/../include -I../../../gcc/gcc/../libcpp/include
  -I/opt/cfarm/mpc/include -I../../../gcc/gcc/../libdecnumber
  -I../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber
  -I../../../gcc/gcc/../libbacktrace -o ipa-polymorphic-call.o
  -MT ipa-polymorphic-call.o -MMD -MP -MF ./.deps/ipa-polymorphic-call.TPo
  ../../../gcc/gcc/ipa-polymorphic-call.c
../../../gcc/gcc/ipa-polymorphic-call.c: In function ‘tree_node* extr_type_from_vtbl_ptr_store(gimple, type_change_info*, long int*)’:
../../../gcc/gcc/ipa-polymorphic-call.c:2117:1: error: assuming signed overflow does not occur when assuming that (X + c) < X is always false [-Werror=strict-overflow]
 }
 ^
cc1plus: all warnings being treated as errors
make[2]: *** [ipa-polymorphic-call.o] Error 1
make[2]: Leaving directory `/home/jbglaw/build-configlist_mk/iq2000-elf/build-gcc/mk/iq2000-elf/gcc'
make[1]: *** [all-gcc] Error 2

(Note that this `g++' is an up-to-date revision, and the line number mentioned is also wrong.)

It's probably caused by this chunk:

diff --git a/gcc/ipa-polymorphic-call.c b/gcc/ipa-polymorphic-call.c
index 3e4aa04..51c6709 100644
--- a/gcc/ipa-polymorphic-call.c
+++ b/gcc/ipa-polymorphic-call.c
[...]
@@ -1218,7 +1226,19 @@ extr_type_from_vtbl_ptr_store (gimple stmt, struct type_change_info *tci,
 	  print_generic_expr (dump_file, tci->instance, TDF_SLIM);
 	  fprintf (dump_file, " with offset %i\n", (int)tci->offset);
 	}
-      return NULL_TREE;
+      return tci->offset > GET_MODE_BITSIZE (Pmode) ? error_mark_node : NULL_TREE;
+    }
+  if (offset != tci->offset
+      || size != POINTER_SIZE
+      || max_size != POINTER_SIZE)
+    {
+      if (dump_file)
+	fprintf (dump_file, "wrong offset %i!=%i or size %i\n",
+		 (int)offset, (int)tci->offset, (int)size);
+      return offset + GET_MODE_BITSIZE (Pmode) <= offset
+	     || (max_size != -1
+		 && tci->offset + GET_MODE_BITSIZE (Pmode) > offset + max_size)
+	     ? error_mark_node : NULL;
     }
 }

This is visible on all config-list.mk builds, see eg. just a few recent ones:

m32r-elf: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=361814
lm32-elf: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=361741
ia64-elf: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=361682
ia64-linux: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=361738

MfG, JBG

--
Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of: really soon now: an unspecified period of time, likely to
the second  : be greater than any reasonable definition of soon.
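The diagnostic fires on a comparison of the shape `x + c <= x` with signed `x`: the compiler is entitled to assume signed addition never wraps, folds the test to a constant, and -Wstrict-overflow reports it. A hedged, generic illustration of an overflow-safe rewrite — this is not the actual ipa-polymorphic-call.c fix, and it assumes non-negative inputs:

```cpp
#include <cassert>
#include <limits>

// Instead of writing "offset + bitsize <= offset" (which the optimizer
// may fold away under the no-signed-overflow assumption), compare the
// increment against the remaining headroom, so no signed addition that
// could wrap is ever performed.
static bool
addition_would_overflow (long offset, long bitsize)
{
  // Assumes offset >= 0 and bitsize >= 0.
  return bitsize > std::numeric_limits<long>::max () - offset;
}
```

The same idea (range-checking without forming the possibly-overflowing sum) is the usual way to silence -Wstrict-overflow without disabling the optimization.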
Re: Fallout on full bootstrap
On 11.10.14 12:43, Jan-Benedict Glaw wrote:
> [...]

This is Bug 63496

Andreas
Re: -fuse-caller-save - Collect register usage information
So, I hate the name of the option, and the documentation seems wrong to me. It doesn't use the caller-saved registers for allocation; it uses the call-clobbered registers for allocation. Or, one could say it uses the callee-saved registers for allocation.

Seconded. The description is a bit confusing, and "caller saved"/"callee saved" should be avoided IMO; "call clobbered"/"call saved" is much clearer.

--
Eric Botcazou
Re: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition
Hello Jeff,

I see that you have improved the RTL type-safety issue for ira.c, so I rebased this patch on the latest trunk and changed it to use the new list walking interface. Bootstrapped on x86_64-SUSE-Linux and make check regression tested. OK for trunk?

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 216116)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,14 @@
+2014-10-11  Felix Yang  felix.y...@huawei.com
+	    Jeff Law  l...@redhat.com
+
+	* ira.c (struct equivalence): Change members is_arg_equivalence and
+	replace into boolean bitfields; turn member loop_depth into a short
+	integer; add new member no_equiv.
+	(no_equiv): Set no_equiv of struct equivalence if register is marked
+	as having no known equivalence.
+	(update_equiv_regs): Check all definitions for a multiple-set
+	register to make sure that the RHS have the same value.
+
 2014-10-11  Martin Liska  mli...@suse.cz

 	PR/63376

Index: gcc/ira.c
===================================================================
--- gcc/ira.c	(revision 216116)
+++ gcc/ira.c	(working copy)
@@ -2902,12 +2902,14 @@ struct equivalence
   /* Loop depth is used to recognize equivalences which appear
      to be present within the same loop (or in an inner loop).  */
-  int loop_depth;
+  short loop_depth;
   /* Nonzero if this had a preexisting REG_EQUIV note.  */
-  int is_arg_equivalence;
+  unsigned char is_arg_equivalence : 1;
   /* Set when an attempt should be made to replace a register
      with the associated src_p entry.  */
-  char replace;
+  unsigned char replace : 1;
+  /* Set if this register has no known equivalence.  */
+  unsigned char no_equiv : 1;
 };

 /* reg_equiv[N] (where N is a pseudo reg number) is the equivalence
@@ -3255,6 +3257,7 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE
   if (!REG_P (reg))
     return;
   regno = REGNO (reg);
+  reg_equiv[regno].no_equiv = 1;
   list = reg_equiv[regno].init_insns;
   if (list && list->insn () == NULL)
     return;
@@ -3381,7 +3384,7 @@ update_equiv_regs (void)
       /* If this insn contains more (or less) than a single SET,
	 only mark all destinations as having no known equivalence.  */
-      if (set == 0)
+      if (set == NULL_RTX)
	{
	  note_stores (PATTERN (insn), no_equiv, NULL);
	  continue;
	}
@@ -3476,16 +3479,49 @@ update_equiv_regs (void)
       if (note && GET_CODE (XEXP (note, 0)) == EXPR_LIST)
	note = NULL_RTX;

-      if (DF_REG_DEF_COUNT (regno) != 1
-	  && (! note
+      if (DF_REG_DEF_COUNT (regno) != 1)
+	{
+	  bool equal_p = true;
+	  rtx_insn_list *list;
+
+	  /* If we have already processed this pseudo and determined it
+	     can not have an equivalence, then honor that decision.  */
+	  if (reg_equiv[regno].no_equiv)
+	    continue;
+
+	  if (! note
	      || rtx_varies_p (XEXP (note, 0), 0)
	      || (reg_equiv[regno].replacement
		  && ! rtx_equal_p (XEXP (note, 0),
-				    reg_equiv[regno].replacement)))
-	{
-	  no_equiv (dest, set, NULL);
-	  continue;
+				    reg_equiv[regno].replacement)))
+	    {
+	      no_equiv (dest, set, NULL);
+	      continue;
+	    }
+
+	  list = reg_equiv[regno].init_insns;
+	  for (; list; list = list->next ())
+	    {
+	      rtx note_tmp;
+	      rtx_insn *insn_tmp;
+
+	      insn_tmp = list->insn ();
+	      note_tmp = find_reg_note (insn_tmp, REG_EQUAL, NULL_RTX);
+	      gcc_assert (note_tmp);
+	      if (! rtx_equal_p (XEXP (note, 0), XEXP (note_tmp, 0)))
+		{
+		  equal_p = false;
+		  break;
+		}
+	    }
+
+	  if (! equal_p)
+	    {
+	      no_equiv (dest, set, NULL);
+	      continue;
+	    }
	}
+
       /* Record this insn as initializing this register.  */
       reg_equiv[regno].init_insns
	 = gen_rtx_INSN_LIST (VOIDmode, insn, reg_equiv[regno].init_insns);
@@ -3514,10 +3550,9 @@ update_equiv_regs (void)
	 a register used only in one basic block from a MEM.  If so, and the
	 MEM remains unchanged for the life of the register, add a REG_EQUIV
	 note.  */
-      note = find_reg_note (insn, REG_EQUIV, NULL_RTX);
-      if (note == 0 && REG_BASIC_BLOCK (regno) >= NUM_FIXED_BLOCKS
+      if (note == NULL_RTX && REG_BASIC_BLOCK (regno) >= NUM_FIXED_BLOCKS
	  && MEM_P (SET_SRC (set))
	  && validate_equiv_mem (insn, dest, SET_SRC (set)))
	note = set_unique_reg_note (insn, REG_EQUIV, copy_rtx (SET_SRC (set)));
@@ -3547,7 +3582,7 @@ update_equiv_regs (void)
	  reg_equiv[regno].replacement = x;
	  reg_equiv[regno].src_p = &SET_SRC (set);
-	  reg_equiv[regno].loop_depth = loop_depth;
+	  reg_equiv[regno].loop_depth = (short) loop_depth;

	  /* Don't mess
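The rule the hunk above implements can be restated outside RTL terms: a pseudo with several definitions keeps an equivalence only if every definition carries a REG_EQUAL note and all of those notes agree. A hedged model using std::optional in place of notes (an invented helper, not IRA code):

```cpp
#include <cassert>
#include <optional>
#include <vector>

// One entry per definition of the register: the REG_EQUAL value if the
// def has a note, std::nullopt if it does not.
static std::optional<int>
equiv_value (const std::vector<std::optional<int>> &reg_equal_notes)
{
  if (reg_equal_notes.empty ())
    return std::nullopt;
  std::optional<int> first = reg_equal_notes.front ();
  if (!first)
    return std::nullopt;	// a def without a note: no equivalence
  for (const auto &n : reg_equal_notes)
    if (!n || *n != *first)
      return std::nullopt;	// defs disagree: no equivalence
  return first;			// all defs set the same value
}
```

This is the multi-set analogue of the old single-definition rule: one dissenting definition is enough to kill the equivalence for good (the `no_equiv` bit in the patch remembers that decision).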
[patch,fortran] Handle (signed) zeros, infinities and NaNs in some intrinsics
The attached patch fixes the compile-time simplification of special values (positive and negative zeros, infinities, and NaNs) in the intrinsics EXPONENT, FRACTION, RRSPACING, SET_EXPONENT, and SPACING. Those are all the intrinsics in the Fortran 2008 standard that say anything about these special values, so it makes sense to fix them. This is the compile-time part of PR 48979 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48979). Some notes:

- We're not technically required to do anything about infinities and NaNs unless IEEE_ARITHMETIC is accessible. My view is that it makes sense, as a quality of implementation issue, to handle them correctly anyway. I've done so here for simplification, and intend to do the same later for code generation in trans-intrinsic.c.

- For FRACTION, the 2003 standard says FRACTION(inf) = inf, while Fortran 2008 says FRACTION(inf) = NaN. I agree with Tobias, who said in the PR we shouldn't emit different code based on -std=f2003/f2008. Instead, we use the Fortran 2008 interpretation here. It makes more sense anyway.

- While digging into the MPFR doc, I realized that the test (mpfr_sgn (x->value.real) == 0) used a few times in simplify.c is not only true for zeros, but also for NaNs! I thus replaced it with mpfr_zero_p (x->value.real). It affects only some (invalid) warnings. For example, before my patch, the code LOG((nan,nan)) would emit an error "Complex argument of LOG cannot be zero", which makes little sense.

Regtested on x86_64-apple-darwin14. OK to commit?

FX

intrinsics.ChangeLog
Description: Binary data

intrinsics.diff
Description: Binary data
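The Fortran 2008 behavior described above — FRACTION of an infinity or NaN yields NaN, zero keeps its sign, and finite values give the frexp-style significand — can be sketched in a few lines. This is a hedged illustration of the semantics, not gfortran's simplification code:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Model of Fortran 2008 FRACTION(X): X * 2**(-EXPONENT(X)) for finite
// nonzero X, with the special values handled explicitly up front.
static double
fraction (double x)
{
  if (std::isnan (x) || std::isinf (x))	    // F2008 reading: result is NaN
    return std::numeric_limits<double>::quiet_NaN ();
  if (x == 0.0)
    return x;				    // zero keeps its sign
  int exp;
  return std::frexp (x, &exp);		    // magnitude in [0.5, 1)
}
```

Using frexp up front and special-casing zero/inf/NaN mirrors how the standard phrases the model: the generic formula only applies to finite nonzero values.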
[SH][committed] Remove TARGET_SH4A_ARCH macro
Hi,

The TARGET_SH4A_ARCH macro has the same meaning as TARGET_SH4A and thus can be removed. Tested with 'make all' on sh-elf, committed as r216119.

Cheers,
Oleg

gcc/ChangeLog:
	* config/sh/sh.h (TARGET_SH4A_ARCH): Remove macro.
	* config/sh/sh.h: Replace uses of TARGET_SH4A_ARCH with TARGET_SH4A.
	* config/sh/sh.c: Likewise.
	* config/sh/sh-mem.cc: Likewise.
	* config/sh/sh.md: Likewise.
	* config/sh/predicates.md: Likewise.
	* config/sh/sync.md: Likewise.

Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c	(revision 216118)
+++ gcc/config/sh/sh.c	(working copy)
@@ -818,7 +818,7 @@
       assembler_dialect = 1;
       sh_cpu = PROCESSOR_SH4;
     }
-  if (TARGET_SH4A_ARCH)
+  if (TARGET_SH4A)
     {
       assembler_dialect = 1;
       sh_cpu = PROCESSOR_SH4A;
@@ -11597,7 +11597,7 @@
   if (TARGET_HARD_SH4 || TARGET_SH5)
     {
       if (!TARGET_INLINE_IC_INVALIDATE
-	  || (!(TARGET_SH4A_ARCH || TARGET_SH4_300) && TARGET_USERMODE))
+	  || (!(TARGET_SH4A || TARGET_SH4_300) && TARGET_USERMODE))
	emit_library_call (function_symbol (NULL, "__ic_invalidate",
					    FUNCTION_ORDINARY),
			   LCT_NORMAL, VOIDmode, 1, tramp, SImode);
Index: gcc/config/sh/sh.h
===================================================================
--- gcc/config/sh/sh.h	(revision 216118)
+++ gcc/config/sh/sh.h	(working copy)
@@ -70,13 +70,9 @@
 #undef TARGET_SH4
 #define TARGET_SH4 ((target_flags & MASK_SH4) != 0 && TARGET_SH1)

-/* Nonzero if we're generating code for the common subset of
-   instructions present on both SH4a and SH4al-dsp.  */
-#define TARGET_SH4A_ARCH TARGET_SH4A
-
 /* Nonzero if we're generating code for SH4a, unless the use of the FPU
    is disabled (which makes it compatible with SH4al-dsp).  */
-#define TARGET_SH4A_FP (TARGET_SH4A_ARCH && TARGET_FPU_ANY)
+#define TARGET_SH4A_FP (TARGET_SH4A && TARGET_FPU_ANY)

 /* Nonzero if we should generate code using the SHcompact instruction
    set and 32-bit ABI.  */
Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md	(revision 216118)
+++ gcc/config/sh/sh.md	(working copy)
@@ -6938,7 +6938,7 @@
       emit_insn (gen_ic_invalidate_line_compact (operands[0], operands[1]));
       DONE;
     }
-  else if (TARGET_SH4A_ARCH || TARGET_SH4_300)
+  else if (TARGET_SH4A || TARGET_SH4_300)
    {
      emit_insn (gen_ic_invalidate_line_sh4a (operands[0]));
      DONE;
@@ -6971,7 +6971,7 @@
 (define_insn "ic_invalidate_line_sh4a"
   [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")]
		    UNSPEC_ICACHE)]
-  "TARGET_SH4A_ARCH || TARGET_SH4_300"
+  "TARGET_SH4A || TARGET_SH4_300"
 {
   return "ocbwb	@%0" "\n"
	  "	synco" "\n"
@@ -13487,7 +13487,7 @@
   [(set (match_operand:SI 0 "register_operand" "=z")
	(unspec:SI [(match_operand:BLK 1 "unaligned_load_operand" "Sua")]
		   UNSPEC_MOVUA))]
-  "TARGET_SH4A_ARCH"
+  "TARGET_SH4A"
  "movua.l	%1,%0"
  [(set_attr "type" "movua")])
@@ -13500,7 +13500,7 @@
	(sign_extract:SI (mem:SI (match_operand:SI 1 "register_operand" ""))
			 (const_int 32) (const_int 0)))
   (set (match_dup 1) (plus:SI (match_dup 1) (const_int 4)))]
-  "TARGET_SH4A_ARCH && REGNO (operands[0]) != REGNO (operands[1])"
+  "TARGET_SH4A && REGNO (operands[0]) != REGNO (operands[1])"
  [(set (match_operand:SI 0 "register_operand" "")
	(sign_extract:SI (mem:SI (post_inc:SI
				  (match_operand:SI 1 "register_operand" "")))
			 (const_int 32) (const_int 0)))
@@ -13512,7 +13512,7 @@
	(sign_extract:SI (match_operand:QI 1 "unaligned_load_operand" "")
			 (match_operand 2 "const_int_operand" "")
			 (match_operand 3 "const_int_operand" "")))]
-  "TARGET_SH4A_ARCH || TARGET_SH2A"
+  "TARGET_SH4A || TARGET_SH2A"
 {
   if (TARGET_SH2A && TARGET_BITOPS
       && (satisfies_constraint_Sbw (operands[1])
@@ -13525,7 +13525,7 @@
       emit_insn (gen_movsi (operands[0], gen_rtx_REG (SImode, T_REG)));
       DONE;
     }
-  if (TARGET_SH4A_ARCH
+  if (TARGET_SH4A
      && INTVAL (operands[2]) == 32
      && INTVAL (operands[3]) == 0
      && MEM_P (operands[1]) && MEM_ALIGN (operands[1]) >= 32)
@@ -13544,7 +13544,7 @@
	(zero_extract:SI (match_operand:QI 1 "unaligned_load_operand" "")
			 (match_operand 2 "const_int_operand" "")
			 (match_operand 3 "const_int_operand" "")))]
-  "TARGET_SH4A_ARCH || TARGET_SH2A"
+  "TARGET_SH4A || TARGET_SH2A"
 {
   if (TARGET_SH2A && TARGET_BITOPS
      && (satisfies_constraint_Sbw (operands[1])
@@ -13557,7 +13557,7 @@
       emit_insn (gen_movsi (operands[0], gen_rtx_REG (SImode, T_REG)));
       DONE;
     }
-  if (TARGET_SH4A_ARCH
+  if (TARGET_SH4A
      && INTVAL (operands[2]) == 32
      && INTVAL (operands[3]) == 0
      && MEM_P (operands[1]) && MEM_ALIGN (operands[1]) >= 32)
Index: gcc/config/sh/predicates.md
===================================================================
--- gcc/config/sh/predicates.md	(revision 216118)
+++ gcc/config/sh/predicates.md	(working copy)
@@ -1074,14 +1074,14 @@
       (and (match_test "satisfies_constraint_I08 (op)")
	   (match_test "mode != QImode")
Re: [RFC: Patch, PR 60102] [4.9/4.10 Regression] powerpc fp-bit ices@dwf_regno
On Thu, 9 Oct 2014, Maciej W. Rozycki wrote:

> Seeing Rohit got good results it has struck me that perhaps one of the patches I had previously reverted, to be able to compile GCC in the first place, interfered with this fix -- I backed out all the subsequent patches to test yours and Rohit's by themselves only. And it was actually the case, with this change:
>
> 2013-05-21  Christian Bruel  christian.br...@st.com
>
> 	* dwarf2out.c (multiple_reg_loc_descriptor): Use dbx_reg_number for
> 	spanning registers. LEAF_REG_REMAP is supported only for contiguous
> 	registers. Set register size out of the PARALLEL loop.
>
> back in place, in addition to your fix, I get an all-passed score for gdb.base/store.exp. So your change looks good and my decision to back out the other patches unfortunate. I'll yet run full e500v2 testing now to double check, and let you know what the results are, within a couple of hours if things work well.

It took a bit more because I saw some regressions that I wanted to investigate. In the end they turned out intermittent and the failures happen sometimes whether your change is applied or not. So I'm fine with your change, thanks for your work and patience. For the record the failures were:

FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile "Read tp_first_run: 0" 2
FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile "Read tp_first_run: 2" 1
FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile "Read tp_first_run: 3" 1

  Maciej
[PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling
Hello,

This is the last common infrastructure patch in the series. (Next patches will contain tests for the libgomp testsuite and MIC-specific things.) It introduces 2 new options:

1. -foffload=<targets>=<options>

By default, GCC will build offload images for all offload targets specified in configure, with non-target-specific options passed to the host compiler. This option is used to control offload targets and options for them. It can be used in a few ways:

* -foffload=disable
  Tells GCC to disable offload support. OpenMP target regions will be run in host fallback mode.
* -foffload=<targets>
  Tells GCC to build offload images for <targets>. They will be built with non-target-specific options passed to the host compiler.
* -foffload=<options>
  Tells GCC to build offload images for all targets specified in configure. They will be built with non-target-specific options passed to the host compiler plus <options>.
* -foffload=<targets>=<options>
  Tells GCC to build offload images for <targets>. They will be built with non-target-specific options passed to the host compiler plus <options>.

Options specified by -foffload are appended to the end of the option set, so in case of option conflicts they take priority.

2. -foffload-abi=[lp64|ilp32]

This option is supposed to tell mkoffload (and the offload compiler) which ABI is used in the streamed GIMPLE. It is desirable because the host and offload compilers must have the same ABI. The option is generated by the host compiler automatically; it should not be specified by the user.

Examples:

$ gcc -fopenmp -c -O2 test1.c
$ gcc -fopenmp -c -O1 -msse -foffload=-mavx test2.c
$ gcc -fopenmp -foffload=-O3 -v test1.o test2.o

In this example the offload images will be built with the following options: -O2 -mavx -O3 -v for targets specified in configure.

$ gcc -fopenmp -foffload=x86_64-intelmicemul-linux-gnu=-mavx2 \
      -foffload=nvptx-none -foffload=-O3 -O2 test.c

In this example 2 offload images will be built: for MIC with -O2 -mavx2 -O3 and for PTX with -O2 -O3.
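The accepted argument shapes can be illustrated with a toy parser. This is not GCC's actual handle_foffload_option, just a sketch of the grammar described above; "disable" would come out as an ordinary target name here and is left to the caller:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy parser for the -foffload= argument: returns (targets, options).
// A bare option set (argument starting with '-') applies to all
// configured targets; otherwise an optional "=<options>" suffix follows
// a comma-separated target list.
std::pair<std::vector<std::string>, std::string>
parse_foffload (const std::string &arg)
{
  std::vector<std::string> targets;
  if (!arg.empty () && arg[0] == '-')
    return std::make_pair (targets, arg);     // -foffload=<options>

  std::string options;
  std::string tgt_part = arg;
  std::size_t eq = arg.find ('=');
  if (eq != std::string::npos)                // -foffload=<targets>=<options>
    {
      options = arg.substr (eq + 1);
      tgt_part = arg.substr (0, eq);
    }

  std::size_t pos = 0, comma;                 // targets are comma-separated
  while ((comma = tgt_part.find (',', pos)) != std::string::npos)
    {
      targets.push_back (tgt_part.substr (pos, comma - pos));
      pos = comma + 1;
    }
  targets.push_back (tgt_part.substr (pos));
  return std::make_pair (targets, options);
}
```

For instance, the second example above would hand "x86_64-intelmicemul-linux-gnu=-mavx2" and "nvptx-none" to such a parser as two separate -foffload arguments.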
Bootstrapped and regtested on top of patch 5. Is it OK for trunk? The kyukhin/gomp4-offload branch is updated correspondingly.

Thanks,
  -- Ilya


2014-10-11  Bernd Schmidt  ber...@codesourcery.com
	    Andrey Turetskiy  andrey.turets...@intel.com
	    Ilya Verbin  ilya.ver...@intel.com

gcc/
	* common.opt (foffload, foffload-abi): New options.
	* config/i386/i386.c (ix86_offload_options): New static function.
	(TARGET_OFFLOAD_OPTIONS): Define.
	* coretypes.h (enum offload_abi): New enum.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_OFFLOAD_OPTIONS): Document.
	* gcc.c (offload_targets): New static variable.
	(handle_foffload_option): New static function.
	(driver_handle_option): Handle OPT_foffload_.
	(driver::maybe_putenv_OFFLOAD_TARGETS): Set OFFLOAD_TARGET_NAMES
	according to offload_targets.
	* hooks.c (hook_charptr_void_null): New hook.
	* hooks.h (hook_charptr_void_null): Declare.
	* lto-opts.c: Include lto-section-names.h.
	(lto_write_options): Append options from target offload_options hook
	and store them to offload_lto section.  Do not store target-specific,
	driver and diagnostic options in offload_lto section.
	* lto-wrapper.c (merge_and_complain): Handle OPT_foffload_ and
	OPT_foffload_abi_.
	(append_compiler_options, append_linker_options)
	(append_offload_options): New static functions.
	(compile_offload_image): Add new arguments with options.
	Call append_compiler_options and append_offload_options.
	(compile_images_for_offload_targets): Add new arguments with options.
	(find_and_merge_options): New static function.
	(run_gcc): Outline options handling into the new functions:
	find_and_merge_options, append_compiler_options,
	append_linker_options.
	* opts.c (common_handle_option): Don't handle OPT_foffload_.
	* target.def (offload_options): New target hook.
---
diff --git a/gcc/common.opt b/gcc/common.opt
index b4f0ed4..37a5fd4 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1640,6 +1640,23 @@ fnon-call-exceptions
 Common Report Var(flag_non_call_exceptions) Optimization
 Support synchronous non-call exceptions

+foffload=
+Common Driver Joined MissingArgError(options or targets missing after %qs)
+-foffload=<targets>=<options>	Specify offloading targets and options for them
+
+foffload-abi=
+Common Joined RejectNegative Enum(offload_abi) Var(flag_offload_abi) Init(OFFLOAD_ABI_UNSET)
+-foffload-abi=[lp64|ilp32]	Set the ABI to use in an offload compiler
+
+Enum
+Name(offload_abi) Type(enum offload_abi) UnknownError(unknown offload ABI %qs)
+
+EnumValue
+Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
+
+EnumValue
+Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
+
 fomit-frame-pointer
 Common Report Var(flag_omit_frame_pointer) Optimization
 When possible do not
Re: [Ping] [PATCH, 1/10] two hooks for conditional compare (ccmp)
On 09/22/2014 11:43 PM, Zhenqiang Chen wrote:
> +@cindex @code{ccmp} instruction pattern
> +@item @samp{ccmp}
> +Conditional compare instruction.  Operand 2 and 5 are RTLs which perform
> +two comparisons.  Operand 1 is AND or IOR, which operates on the result of
> +operand 2 and 5.
> +It uses recursive method to support more than two compares.  e.g.
> +
> +  CC0 = CMP (a, b);
> +  CC1 = CCMP (NE (CC0, 0), CMP (e, f));
> +  ...
> +  CCn = CCMP (NE (CCn-1, 0), CMP (...));
> +
> +Two target hooks are used to generate conditional compares.  GEN_CCMP_FISRT
> +is used to generate the first CMP.  And GEN_CCMP_NEXT is used to generate the
> +following CCMPs.  Operand 1 is AND or IOR.  Operand 3 is the result of
> +GEN_CCMP_FISRT or a previous GEN_CCMP_NEXT.  Operand 2 is NE.
> +Operand 4, 5 and 6 is another compare expression.
> +
> +A typical CCMP pattern looks like
> +
> +@smallexample
> +(define_insn "*ccmp_and_ior"
> +  [(set (match_operand 0 "dominant_cc_register" "")
> +	(compare
> +	 (match_operator 1
> +	  (match_operator 2 "comparison_operator"
> +	   [(match_operand 3 "dominant_cc_register")
> +	    (const_int 0)])
> +	  (match_operator 4 "comparison_operator"
> +	   [(match_operand 5 "register_operand")
> +	    (match_operand 6 "compare_operand")]))
> +	 (const_int 0)))]
> +  ""
> +  @dots{})
> +@end smallexample

This whole section should be removed.  You do not have a named ccmp pattern.  Even your example below is an *unnamed pattern.  This is an implementation detail of the aarch64 backend.

Named patterns are used when that is the interface the middle-end uses to emit code.  But you're not using named patterns, you're using:

> +@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_FIRST (int @var{code}, rtx @var{op0}, rtx @var{op1})
> +This function emits a comparison insn for the first of a sequence of
> + conditional comparisions.  It returns a comparison expression appropriate
> + for passing to @code{gen_ccmp_next} or to @code{cbranch_optab}.
> + @code{unsignedp} is used when converting @code{op0} and @code{op1}'s mode.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_NEXT (rtx @var{prev}, int @var{cmp_code}, rtx @var{op0}, rtx @var{op1}, int @var{bit_code})
> +This function emits a conditional comparison within a sequence of
> + conditional comparisons.  The @code{prev} expression is the result of a
> + prior call to @code{gen_ccmp_first} or @code{gen_ccmp_next}.  It may return
> + @code{NULL} if the combination of @code{prev} and this comparison is
> + not supported, otherwise the result must be appropriate for passing to
> + @code{gen_ccmp_next} or @code{cbranch_optab}.  @code{bit_code}
> + is AND or IOR, which is the op on the two compares.
> +@end deftypefn

Every place above where you refer to the arguments of the function should use @var; you're using @code for most of them.  Use @code{AND} and @code{IOR}.

r~
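The CC0/CC1/CCn chaining quoted above can be modelled in a few lines of plain C++ (a toy, not middle-end code: `cc` stands in for a condition-code register, and the forced failing value plays the role of a real ccmp's nzcv immediate):

```cpp
#include <cassert>

// Toy model of the quoted cmp/ccmp chain.  A `cc` holds 1 if the
// condition held and 0 otherwise.
struct cc { int val; };

// The first CMP of the sequence: here it tests a < b.
cc cmp (int a, int b) { return cc{ a < b ? 1 : 0 }; }

// CCMP for an AND chain tested with NE (prev, 0): if the previous
// condition held, perform the next compare; otherwise substitute a
// failing result -- the job done by the nzcv immediate in a real ccmp.
cc ccmp (cc prev, cc next_cmp) { return prev.val != 0 ? next_cmp : cc{ 0 }; }
```

So (a < b) && (e < f) && ... becomes one cmp followed by one ccmp per additional comparison, with no branches.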
Re: [Ping] [PATCH, 2/10] prepare ccmp
On 09/22/2014 11:43 PM, Zhenqiang Chen wrote:
> +  /* If jumps are cheap and the target does not support conditional
> +     compare, turn some more codes into jumpy sequences.  */
> +  else if (BRANCH_COST (optimize_insn_for_speed_p (), false) < 4
> +	   && (targetm.gen_ccmp_first == NULL))

Don't add unnecessary parentheses around the == expression.

Otherwise ok.

r~
Re: [Ping] [PATCH, 5/10] aarch64: add ccmp operand predicate
On 09/22/2014 11:44 PM, Zhenqiang Chen wrote:
> +/* Return true if val can be encoded as a 5-bit unsigned immediate.  */
> +bool
> +aarch64_uimm5 (HOST_WIDE_INT val)
> +{
> +  return (val & (HOST_WIDE_INT) 0x1f) == val;
> +}

This is just silly.

> +(define_constraint "Usn"
> + "A constant that can be used with a CCMN operation (once negated)."
> + (and (match_code "const_int")
> +      (match_test "aarch64_uimm5 (-ival)")))

  (match_test "IN_RANGE (ival, -31, 0)")

> +(define_predicate "aarch64_ccmp_immediate"
> +  (and (match_code "const_int")
> +       (ior (match_test "aarch64_uimm5 (INTVAL (op))")
> +	    (match_test "aarch64_uimm5 (-INTVAL (op))"))))

  (and (match_code "const_int")
       (match_test "IN_RANGE (INTVAL (op), -31, 31)"))

r~
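The proposed simplification is easy to sanity-check outside the compiler: the union of the two 5-bit tests is exactly the single range test. A plain C++ restatement (not the aarch64 sources):

```cpp
#include <cassert>

// aarch64_uimm5 as posted: val fits in the low five bits, i.e. 0..31.
bool uimm5 (long val) { return (val & 0x1fL) == val; }

// The predicate as posted: a 5-bit immediate or the negation of one,
// which accepts [0, 31] union [-31, 0].
bool ccmp_imm_old (long v) { return uimm5 (v) || uimm5 (-v); }

// The suggested replacement: one range test, matching [-31, 31].
bool ccmp_imm_new (long v) { return v >= -31 && v <= 31; }
```

The two predicates agree on every integer, which is the reviewer's point.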
Re: [Ping] [PATCH, 6/10] aarch64: add ccmp CC mode
On 09/22/2014 11:44 PM, Zhenqiang Chen wrote:
> +    case CC_DNEmode:
> +      return comp_code == NE ? AARCH64_NE : AARCH64_EQ;
> +    case CC_DEQmode:
> +      return comp_code == NE ? AARCH64_EQ : AARCH64_NE;
> +    case CC_DGEmode:
> +      return comp_code == NE ? AARCH64_GE : AARCH64_LT;
> +    case CC_DLTmode:
> +      return comp_code == NE ? AARCH64_LT : AARCH64_GE;
> +    case CC_DGTmode:
> +      return comp_code == NE ? AARCH64_GT : AARCH64_LE;
> +    case CC_DLEmode:
> +      return comp_code == NE ? AARCH64_LE : AARCH64_GT;
> +    case CC_DGEUmode:
> +      return comp_code == NE ? AARCH64_CS : AARCH64_CC;
> +    case CC_DLTUmode:
> +      return comp_code == NE ? AARCH64_CC : AARCH64_CS;
> +    case CC_DGTUmode:
> +      return comp_code == NE ? AARCH64_HI : AARCH64_LS;
> +    case CC_DLEUmode:
> +      return comp_code == NE ? AARCH64_LS : AARCH64_HI;

I think these should return -1 if comp_code is not EQ.  Like the CC_Zmode case below.  Perhaps you can share some code to make the whole thing less bulky.  E.g.

  ...
  case CC_DLEUmode:
    ne = AARCH64_LS;
    eq = AARCH64_HI;
    break;
  case CC_Zmode:
    ne = AARCH64_NE;
    eq = AARCH64_EQ;
    break;
  }
  if (code == NE)
    return ne;
  if (code == EQ)
    return eq;
  return -1;

This does beg the question of whether you need both CC_Zmode and CC_DNEmode.  I'll leave it to an ARM maintainer to say which one of the two should be kept.

r~
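The suggested shape, extracted into a standalone sketch (the enums below are invented stand-ins for GCC's rtx codes and aarch64_cond_code, not the real values):

```cpp
#include <cassert>

// Standalone sketch of the suggested refactoring: record the (NE, EQ)
// answer pair once per CC mode, then pick by the comparison code and
// return -1 for anything else.
enum rtx_code { NE, EQ, GT };
enum aarch64_cond { AARCH64_NE, AARCH64_EQ, AARCH64_GE, AARCH64_LT };
enum cc_mode { CC_DNEmode, CC_DGEmode, CC_Zmode };

int
cond_for_code (cc_mode mode, rtx_code code)
{
  aarch64_cond ne = AARCH64_NE, eq = AARCH64_EQ;
  switch (mode)
    {
    case CC_DNEmode: ne = AARCH64_NE; eq = AARCH64_EQ; break;
    case CC_DGEmode: ne = AARCH64_GE; eq = AARCH64_LT; break;
    case CC_Zmode:   ne = AARCH64_NE; eq = AARCH64_EQ; break;
    }
  if (code == NE)
    return ne;
  if (code == EQ)
    return eq;
  return -1;  // other comparison codes are invalid for these modes
}
```

The per-mode knowledge is reduced to one pair of assignments, and the invalid-code handling lives in exactly one place.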
Re: [patch,fortran] Handle (signed) zeros, infinities and NaNs in some intrinsics
On Sat, Oct 11, 2014 at 03:13:00PM +0200, FX wrote:
> The attached patch fixes the compile-time simplification of special values
> (positive and negative zeros, infinities, and NaNs) in intrinsics EXPONENT,
> FRACTION, RRSPACING, SET_EXPONENT, SPACING.  Those are all the intrinsics
> in the Fortran 2008 standard that say anything about these special values,
> so it makes sense to fix them.  This is the compile-time part of PR 48979
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48979).

Looks ok to me.

-- 
Steve
Re: [Ping] [PATCH, 7/10] aarch64: add function to output ccmp insn
On 09/22/2014 11:45 PM, Zhenqiang Chen wrote:
> +static unsigned int
> +aarch64_code_to_nzcv (enum rtx_code code, bool inverse)
> +{
> +  switch (code)
> +    {
> +    case NE: /* NE, Z == 0.  */
> +      return inverse ? AARCH64_CC_Z : 0;
> +    case EQ: /* EQ, Z == 1.  */
> +      return inverse ? 0 : AARCH64_CC_Z;
> +    case LE: /* LE, !(Z == 0 && N == V).  */
> +      return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_Z;
> +    case GT: /* GT, Z == 0 && N == V.  */
> +      return inverse ? AARCH64_CC_Z : AARCH64_CC_N | AARCH64_CC_V;
> +    case LT: /* LT, N != V.  */
> +      return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_N;
> +    case GE: /* GE, N == V.  */
> +      return inverse ? AARCH64_CC_N : AARCH64_CC_N | AARCH64_CC_V;
> +    case LEU: /* LS, !(C == 1 && Z == 0).  */
> +      return inverse ? AARCH64_CC_C : AARCH64_CC_Z;
> +    case GTU: /* HI, C == 1 && Z == 0.  */
> +      return inverse ? AARCH64_CC_Z : AARCH64_CC_C;
> +    case LTU: /* CC, C == 0.  */
> +      return inverse ? AARCH64_CC_C : 0;
> +    case GEU: /* CS, C == 1.  */
> +      return inverse ? 0 : AARCH64_CC_C;
> +    default:
> +      gcc_unreachable ();
> +      return 0;
> +    }
> +}

I'm not overly fond of this, since code doesn't map 1-1.  It needs the context of a mode to provide a unique mapping.  I think it would be better to rearrange the existing aarch64_cond_code enum such that AARCH64_NE et al are meaningful wrt NZCV.  Then you can use aarch64_get_condition_code_1 to get this mapping.

> +static unsigned
> +aarch64_mode_to_condition_code (enum machine_mode mode, bool inverse)
> +{
> +  switch (mode)
> +    {
> +    case CC_DNEmode:
> +      return inverse ? aarch64_get_condition_code_1 (CCmode, EQ)
> +		     : aarch64_get_condition_code_1 (CCmode, NE);

This function is just silly.  Modulo the unsigned result, which is wrong after the rebase, the whole thing reduces to

  return aarch64_get_condition_code_1 (mode, inverse ? EQ : NE);

I'm really not sure what you're after here.

> +const char *
> +aarch64_output_ccmp (rtx *operands, bool is_and, int which_alternative)

Is this really used more than once?  I'm not fond of the use of which_alternative without the context of a pattern.
I think this could simply be inlined. r~
[PATCH] Fix detection of thread support with uClibc in libgcc
__gthread_active_p() in libgcc checks for thread support by looking for the presence of a symbol from libpthread.  With glibc, it looks for __pthread_key_create.  However, it determines that glibc is being used by checking for a definition of __GLIBC__, which is also defined by uClibc (in include/features.h); uClibc does not export __pthread_key_create, though, causing the test to always fail.  I've fixed this by extending the test for glibc to check that __UCLIBC__ is not defined, causing the default pthread_cancel to be tested with uClibc instead.

This affects anything that uses the C++11 thread library together with the uClibc implementation of libpthread.  This caused a large number of failed tests from the g++, libgomp and libstdc++ testsuites when run on a MIPS Linux target with uClibc as the C library.

Kwok

2014-10-11  Kwok Cheung Yeung  k...@codesourcery.com

	libgcc/
	* gthr-posix.h (GTHR_ACTIVE_PROXY): Check that __UCLIBC__ is not
	defined before defining to __gthrw_(__pthread_key_create).

Index: libgcc/gthr-posix.h
===================================================================
--- libgcc/gthr-posix.h	(revision 216119)
+++ libgcc/gthr-posix.h	(working copy)
@@ -232,7 +232,7 @@
    library does not provide pthread_cancel, so we do use pthread_create
    there (and interceptor libraries lose).  */

-#ifdef __GLIBC__
+#if defined (__GLIBC__) && !defined (__UCLIBC__)
 __gthrw2(__gthrw_(__pthread_key_create),
	 __pthread_key_create,
	 pthread_key_create)
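The guard being patched can be modelled in isolation. Here SIM_GLIBC and SIM_UCLIBC are hand-set macros simulating a uClibc build (which defines __GLIBC__ for header compatibility, as described above); the fixed test must then fall back to the generic probe symbol:

```cpp
#include <cassert>
#include <cstring>

// Standalone model of the fixed preprocessor guard.  SIM_GLIBC and
// SIM_UCLIBC stand in for __GLIBC__ and __UCLIBC__; both are defined,
// mimicking uClibc's features.h.
#define SIM_GLIBC 1
#define SIM_UCLIBC 1

const char *
thread_probe_symbol (void)
{
#if defined (SIM_GLIBC) && !defined (SIM_UCLIBC)
  return "__pthread_key_create";  // glibc-only internal alias
#else
  return "pthread_cancel";        // default probe, exported by uClibc
#endif
}
```

With the old `#ifdef`-only test, a uClibc build would have taken the first branch and probed a symbol its libpthread never provides.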
Re: [PATCH] Fix detection of thread support with uClibc in libgcc
On Sat, Oct 11, 2014 at 9:42 AM, Kwok Cheung Yeung k...@codesourcery.com wrote:
> __gthread_active_p() in libgcc checks for thread support by looking for the
> presence of a symbol from libpthread.  With glibc, it looks for
> __pthread_key_create.  However, it determines that glibc is being used by
> checking for a definition of __GLIBC__, which is also defined by uClibc (in
> include/features.h), but it does not export __pthread_key_create, causing
> the test to always fail.  I've fixed this by extending the test for glibc
> to check that __UCLIBC__ is not defined, causing the default pthread_cancel
> to be tested with uClibc instead.

Why is __GLIBC__ being defined for uclibc?  That seems broken.  We complain about __GNUC__ defined for other compilers besides GCC; we should do the same for defining __GLIBC__ also.

Thanks,
Andrew

> This affects anything that uses the C++11 thread library together with the
> uClibc implementation of libpthread.  This caused a large number of failed
> tests from the g++, libgomp and libstdc++ testsuites when run on a MIPS
> Linux target with uClibc as the C library.
>
> Kwok
>
> 2014-10-11  Kwok Cheung Yeung  k...@codesourcery.com
>
> 	libgcc/
> 	* gthr-posix.h (GTHR_ACTIVE_PROXY): Check that __UCLIBC__ is not
> 	defined before defining to __gthrw_(__pthread_key_create).
>
> Index: libgcc/gthr-posix.h
> ===================================================================
> --- libgcc/gthr-posix.h	(revision 216119)
> +++ libgcc/gthr-posix.h	(working copy)
> @@ -232,7 +232,7 @@
>     library does not provide pthread_cancel, so we do use pthread_create
>     there (and interceptor libraries lose).  */
>
> -#ifdef __GLIBC__
> +#if defined (__GLIBC__) && !defined (__UCLIBC__)
>  __gthrw2(__gthrw_(__pthread_key_create),
> 	 __pthread_key_create,
> 	 pthread_key_create)
Re: [PATCH] Fix detection of thread support with uClibc in libgcc
On 11/10/2014 5:56 PM, Andrew Pinski wrote:
> On Sat, Oct 11, 2014 at 9:42 AM, Kwok Cheung Yeung
> k...@codesourcery.com wrote:
>> __gthread_active_p() in libgcc checks for thread support by looking for
>> the presence of a symbol from libpthread.  With glibc, it looks for
>> __pthread_key_create.  However, it determines that glibc is being used by
>> checking for a definition of __GLIBC__, which is also defined by uClibc
>> (in include/features.h), but it does not export __pthread_key_create,
>> causing the test to always fail.  I've fixed this by extending the test
>> for glibc to check that __UCLIBC__ is not defined, causing the default
>> pthread_cancel to be tested with uClibc instead.
>
> Why is __GLIBC__ being defined for uclibc?  That seems broken.  We
> complain about __GNUC__ defined for other compilers besides GCC; we
> should do the same for defining __GLIBC__ also.

From the comments in include/features.h:

/* There is an unwholesomely huge amount of code out there that depends on the
 * presence of GNU libc header files.  We have GNU libc header files.  So here
 * we commit a horrible sin.  At this point, we _lie_ and claim to be GNU libc
 * to make things like /usr/include/linux/socket.h and lots of apps work as
 * their developers intended.  This is IMHO, pardonable, since these defines
 * are not really intended to check for the presence of a particular library,
 * but rather are used to define an _interface_.  */

So it looks like a compatibility hack...

Kwok
Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
On Oct 10, 2014, at 8:32 PM, Bin.Cheng amker.ch...@gmail.com wrote:

Though I guess if we run fusion + peep2 between sched1 and sched2, that problem would just resolve itself as we'd have fused AB together into a new insn and we'd schedule normally with the fused insns and X, Y.

Yes, in my version, I ran it really early, before sched. I needed to run before ira and run before other people.

Two reasons why I run it late in the compilation process. 1) IRA is the pass I tend not to disturb, since code is changed dramatically. With it after IRA, I can get certain improvement from fusion; there is less noise here.

Since I have a front-end background, I think nothing of creating pseudos when I want to; I just know that if I do, I have to do it before allocation. :-) For my peepholes, since they create registers, they must run before allocation.

2) The spilling generates many load/store pair opportunities on ARM, which I don't want to miss.

I happen to have enough registers that spilling wasn't my primary concern.

  add rx, ry, rz
  ldr r1, [rx]
  ldr r2, [rx+4]
  ldr r3, [rx+8]

It will be transformed into:

  add rx, ry, rz
  ldr r1, [ry+rz]
  ldr r2, [rx+4]
  ldr r3, [rx+8]

Yeah, that seems to tickle a neuron.

On the other hand, if you have

  left
  left
  right
  right

there is no way to sort them to get:

  left
  right
  left
  right

and then fuse:

  left_right
  left_right

This would be impossible.

I can't understand this very well; this exactly is one case we want to fuse on ARM for movw/movt. Given

  movw r1, const_1

This differs from the above by having r1 and const_1; in my example, there is no r1 and no const_1, and this matters. I wanted to list a case where it is impossible to sort. This happens when there isn't enough data to sort on, for example, no offset and no register number.
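The sortable case from the ldr example can be sketched as a toy pass (an invented helper, not the patch's code): sort accesses by (base register, offset) so pairable loads become adjacent, then greedily fuse neighbours four bytes apart. The "left left right right" case above is exactly the situation where such a scheme has no key to sort on:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model of fusion ordering: sort loads by (base reg, byte offset)
// so pairable accesses become adjacent, then greedily fuse neighbours
// whose offsets differ by one 4-byte word -- the ldrd/ldp-style case.
typedef std::pair<std::string, int> mem_access;  // (base reg, byte offset)

std::vector<std::pair<mem_access, mem_access> >
fuse_pairs (std::vector<mem_access> loads)
{
  std::sort (loads.begin (), loads.end ());
  std::vector<std::pair<mem_access, mem_access> > pairs;
  for (std::size_t i = 0; i + 1 < loads.size (); )
    if (loads[i].first == loads[i + 1].first
        && loads[i + 1].second - loads[i].second == 4)
      {
        pairs.push_back (std::make_pair (loads[i], loads[i + 1]));
        i += 2;                  // both loads consumed by the pair
      }
    else
      ++i;
  return pairs;
}
```

On the example above, [rx+4] and [rx+8] end up adjacent and fuse, while [ry+rz] (a different base) is left alone.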
[fortran,patch]
After the compile-time simplification, this patch fixes the handling of special values (infinities and NaNs) by the intrinsics EXPONENT, FRACTION, SPACING, RRSPACING and SET_EXPONENT on the code generation side.

Bootstrapped and regtested on x86_64-linux. OK to commit?
Re: [Ping] [PATCH, 8/10] aarch64: ccmp insn patterns
On 09/22/2014 11:45 PM, Zhenqiang Chen wrote:
> +(define_expand "cbranchcc4"
> +  [(set (pc) (if_then_else
> +	       (match_operator 0 "aarch64_comparison_operator"
> +		[(match_operand 1 "cc_register" "")
> +		 (const_int 0)])
> +	       (label_ref (match_operand 3 "" ""))
> +	       (pc)))]
> +  ""
> +  "")

Extra space.

> +(define_insn "*ccmp_and"
> +  [(set (match_operand 6 "ccmp_cc_register" "")
> +	(compare
> +	 (and:SI
> +	  (match_operator 4 "aarch64_comparison_operator"
> +	   [(match_operand 0 "ccmp_cc_register" "")
> +	    (match_operand 1 "aarch64_plus_operand" "")])
> +	  (match_operator 5 "aarch64_comparison_operator"
> +	   [(match_operand:GPI 2 "register_operand" "r,r,r")
> +	    (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn")]))
> +	 (const_int 0)))]
> +  ""
> +  {
> +    return aarch64_output_ccmp (operands, true, which_alternative);
> +  }
> +  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
> +)
> +
> +(define_insn "*ccmp_ior"
> +  [(set (match_operand 6 "ccmp_cc_register" "")
> +	(compare
> +	 (ior:SI
> +	  (match_operator 4 "aarch64_comparison_operator"
> +	   [(match_operand 0 "ccmp_cc_register" "")
> +	    (match_operand 1 "aarch64_plus_operand" "")])
> +	  (match_operator 5 "aarch64_comparison_operator"
> +	   [(match_operand:GPI 2 "register_operand" "r,r,r")
> +	    (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn")]))
> +	 (const_int 0)))]
> +  ""
> +  {
> +    return aarch64_output_ccmp (operands, false, which_alternative);
> +  }
> +  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
> +)

Surely not aarch64_plus_operand for operand 1.  That's a comparison with the flags register.  Surely (const_int 0) is the only valid operand there.

These could be combined with a code iterator, and thus there would be exactly one call to aarch64_output_ccmp, and thus inlined.  Although...  It seems to me that you don't need a function call at all.  How about

  AND:
    "@
     ccmp\\t%w2, %w3, %K5, %m4
     ccmp\\t%w2, %w3, %K5, %m4
     ccmn\\t%w2, #%n3, %K5, %m4"

  IOR:
    "@
     ccmp\\t%w2, %w3, %k5, %M4
     ccmp\\t%w2, %w3, %k5, %M4
     ccmn\\t%w2, #%n3, %k5, %M4"

where 'k' and 'K' are new print_operand codes that output the nzcv (or its inverse) integer for the comparison, much like 'm' and 'M' print the name of the comparison.

r~
Re: [Ping] [PATCH, 9/10] aarch64: generate conditional compare instructions
On 09/22/2014 11:46 PM, Zhenqiang Chen wrote:
> +static bool
> +aarch64_convert_mode (rtx* op0, rtx* op1, int unsignedp)
> +{
> +  enum machine_mode mode;
> +
> +  mode = GET_MODE (*op0);
> +  if (mode == VOIDmode)
> +    mode = GET_MODE (*op1);
> +
> +  if (mode == QImode || mode == HImode)
> +    {
> +      *op0 = convert_modes (SImode, mode, *op0, unsignedp);
> +      *op1 = convert_modes (SImode, mode, *op1, unsignedp);
> +    }
> +  else if (mode != SImode && mode != DImode)
> +    return false;
> +
> +  return true;
> +}

Hum.  I'd rather not replicate too much of the expander logic here.  We could avoid that by using struct expand_operand, create_input_operand et al, then expand_insn.  That does require that the target hooks be given trees rather than rtl as input.

r~
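The sub-word widening the quoted function performs amounts to choosing sign- or zero-extension by `unsignedp`; as a standalone model (invented name, 8-bit to 32-bit only, not GCC's convert_modes):

```cpp
#include <cassert>
#include <cstdint>

// Model of widening a QImode-like 8-bit comparison operand to 32 bits,
// with the extension kind selected by unsignedp.
int32_t
widen8 (uint8_t raw, bool unsignedp)
{
  return unsignedp ? (int32_t) raw            // zero-extend
                   : (int32_t) (int8_t) raw;  // sign-extend
}
```

The same byte 0xff becomes 255 or -1 depending on the flag, which is why the hook has to be told the signedness of the original comparison.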
Re: [Ping] [PATCH, 10/10] aarch64: Handle ccmp in ifcvt to make it work with cmov
On 09/22/2014 11:46 PM, Zhenqiang Chen wrote:
> @@ -2375,10 +2387,21 @@ noce_get_condition (rtx_insn *jump, rtx_insn **earliest, bool then_else_reversed
>        return cond;
>      }
>
> +  /* For conditional compare, set ALLOW_CC_MODE to TRUE.  */
> +  if (targetm.gen_ccmp_first)
> +    {
> +      rtx prev = prev_nonnote_nondebug_insn (jump);
> +      if (prev
> +	  && NONJUMP_INSN_P (prev)
> +	  && BLOCK_FOR_INSN (prev) == BLOCK_FOR_INSN (jump)
> +	  && ccmp_insn_p (prev))
> +	allow_cc_mode = true;
> +    }
> +
>    /* Otherwise, fall back on canonicalize_condition to do the dirty
>       work of manipulating MODE_CC values and COMPARE rtx codes.  */
>    tmp = canonicalize_condition (jump, cond, reverse, earliest,
> -				NULL_RTX, false, true);
> +				NULL_RTX, allow_cc_mode, true);

This needs a lot more explanation.  Why is it ok to allow a cc_mode when the source is a ccmp, and not for any other comparison?

The issue is going to be how we use the comparison once we've finished with the transformation.  Is it going to be able to be properly handled by emit_conditional_move?  If the target doesn't have cbranchcc4, I think that prepare_cmp_insn will fail.  But as you show from

> +++ b/gcc/config/aarch64/aarch64.md
> @@ -2589,15 +2589,19 @@
>        (match_operand:ALLI 3 "register_operand" "")))]
>  {
> -    rtx ccreg;
>      enum rtx_code code = GET_CODE (operands[1]);
>
>      if (code == UNEQ || code == LTGT)
>        FAIL;
>
> -    ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
> -				     XEXP (operands[1], 1));
> -    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
> +    if (!ccmp_cc_register (XEXP (operands[1], 0),
> +			   GET_MODE (XEXP (operands[1], 0))))
> +      {
> +	rtx ccreg;
> +	ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
> +					 XEXP (operands[1], 1));
> +	operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
> +      }

this change, even more than that may be required.

r~
[PATCH] cleanups in line-map
A few cleanups in line-map code. Bootstrapped and regression tested on x86_64-linux-gnu. OK?

libcpp/ChangeLog:

2014-10-12  Manuel López-Ibáñez  m...@gcc.gnu.org

	* include/line-map.h (linemap_location_from_macro_expansion_p):
	const struct line_maps * argument.
	(linemap_position_for_line_and_column): const struct line_map *
	argument.
	* line-map.c (linemap_macro_map_loc_to_def_point): Delete
	redundant declaration.
	(linemap_add_macro_token): Use correct argument name in comment.
	(linemap_position_for_line_and_column): const struct line_map *
	argument.
	(linemap_macro_map_loc_to_def_point): Fix comment.  Make static.
	(linemap_location_from_macro_expansion_p): const struct line_maps *
	argument.
	(linemap_resolve_location): Fix argument names in comment.

Index: libcpp/include/line-map.h
===================================================================
--- libcpp/include/line-map.h	(revision 216098)
+++ libcpp/include/line-map.h	(working copy)
@@ -521,11 +521,11 @@ int linemap_location_in_system_header_p
					 source_location);

 /* Return TRUE if LOCATION is a source code location of a token coming
    from a macro replacement-list at a macro expansion point, FALSE
    otherwise.  */
-bool linemap_location_from_macro_expansion_p (struct line_maps *,
+bool linemap_location_from_macro_expansion_p (const struct line_maps *,
					      source_location);

 /* source_location values from 0 to RESERVED_LOCATION_COUNT-1 will
    be reserved for libcpp user as special values, no token from libcpp
    will contain any of those locations.  */
@@ -597,13 +597,14 @@ bool linemap_location_from_macro_expansi
 extern source_location linemap_position_for_column (struct line_maps *,
						    unsigned int);

 /* Encode and return a source location from a given line and column.  */
-source_location linemap_position_for_line_and_column (struct line_map *,
-						      linenum_type,
-						      unsigned int);
+source_location
+linemap_position_for_line_and_column (const struct line_map *,
+				      linenum_type, unsigned int);
+
 /* Return the file this map is for.  */
 #define LINEMAP_FILE(MAP) \
   (linemap_check_ordinary (MAP)->d.ordinary.to_file)

 /* Return the line number this map started encoding location from.  */
Index: libcpp/line-map.c
===================================================================
--- libcpp/line-map.c	(revision 216098)
+++ libcpp/line-map.c	(working copy)
@@ -29,12 +29,10 @@
 static void trace_include (const struct line_maps *, const struct line_map *);
 static const struct line_map * linemap_ordinary_map_lookup (struct line_maps *,
							    source_location);
 static const struct line_map* linemap_macro_map_lookup (struct line_maps *,
							source_location);
-static source_location linemap_macro_map_loc_to_def_point
-(const struct line_map*, source_location);
 static source_location linemap_macro_map_loc_unwind_toward_spelling
 (const struct line_map*, source_location);
 static source_location linemap_macro_map_loc_to_exp_point
 (const struct line_map*, source_location);
 static source_location linemap_macro_loc_to_spelling_point
@@ -482,11 +480,11 @@ linemap_enter_macro (struct line_maps *s
    definition, it is the locus in the macro definition; otherwise it
    is a location in the context of the caller of this macro expansion
    (which is a virtual location or a source location if the caller is
    itself a macro expansion or not).

-   MACRO_DEFINITION_LOC is the location in the macro definition,
+   ORIG_PARM_REPLACEMENT_LOC is the location in the macro definition,
    either of the token itself or of a macro parameter that it
    replaces.  */

 source_location
 linemap_add_macro_token (const struct line_map *map,
@@ -619,11 +617,11 @@ linemap_position_for_column (struct line

 /* Encode and return a source location from a given line and column.  */

 source_location
-linemap_position_for_line_and_column (struct line_map *map,
+linemap_position_for_line_and_column (const struct line_map *map,
				      linenum_type line, unsigned column)
 {
   linemap_assert (ORDINARY_MAP_STARTING_LINE_NUMBER (map) <= line);
@@ -770,19 +768,17 @@ linemap_macro_map_loc_to_exp_point (cons
	      MACRO_MAP_NUM_MACRO_TOKENS (map));
   return MACRO_MAP_EXPANSION_POINT_LOCATION (map);
 }

-/* If LOCATION is the source location of a token that belongs to a
-   macro replacement-list -- as part of a macro expansion -- then
-   return the location of the token at the definition point of the
-   macro.  Otherwise,
C++ PATCH for c++/62115 (ICE on invalid conversion to reference to base)
convert_like_real was getting confused because it was seeing a reference
binding that we had marked as bad, but it couldn't tell what was bad about
it.  This happened because when we did the ck_base conversion the rvalue
expression became an lvalue.  Fixed by preserving rvalueness through
convert_to_base.

As a consequence of these changes I also needed to tweak
build_dynamic_cast_1 so that we don't try to pass off a tree of
REFERENCE_TYPE to build_static_cast.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 339616961e9ebd216bf73abd6e36e0a5f049ed71
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Oct 10 18:21:02 2014 -0400

    	PR c++/62115
    	* class.c (build_base_path): Preserve rvalueness.
    	* call.c (convert_like_real) [ck_base]: Let convert_to_base handle /*.
    	* rtti.c (build_dynamic_cast_1): Call convert_to_reference later.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 76d8eab..8a89aad 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6341,10 +6341,8 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum,
	/* We are going to bind a reference directly to a base-class
	   subobject of EXPR.  */
	/* Build an expression for `*((base*) expr)'.  */
-	expr = cp_build_addr_expr (expr, complain);
-	expr = convert_to_base (expr, build_pointer_type (totype),
+	expr = convert_to_base (expr, totype,
				!c_cast_p, /*nonnull=*/true, complain);
-	expr = cp_build_indirect_ref (expr, RO_IMPLICIT_CONVERSION, complain);
	return expr;
       }
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index b661187..99bfa95 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -251,6 +251,7 @@ build_base_path (enum tree_code code,
   int want_pointer = TYPE_PTR_P (TREE_TYPE (expr));
   bool has_empty = false;
   bool virtual_access;
+  bool rvalue = false;

   if (expr == error_mark_node || binfo == error_mark_node || !binfo)
     return error_mark_node;
@@ -324,8 +325,11 @@ build_base_path (enum tree_code code,
     }

   if (!want_pointer)
-    /* This must happen before the call to save_expr.  */
-    expr = cp_build_addr_expr (expr, complain);
+    {
+      rvalue = !real_lvalue_p (expr);
+      /* This must happen before the call to save_expr.  */
+      expr = cp_build_addr_expr (expr, complain);
+    }
   else
     expr = mark_rvalue_use (expr);

@@ -351,9 +355,7 @@ build_base_path (enum tree_code code,
       || in_template_function ())
     {
       expr = build_nop (ptr_target_type, expr);
-      if (!want_pointer)
-	expr = build_indirect_ref (EXPR_LOCATION (expr), expr, RO_NULL);
-      return expr;
+      goto indout;
     }

   /* If we're in an NSDMI, we don't have the full constructor context yet
@@ -364,9 +366,7 @@ build_base_path (enum tree_code code,
     {
       expr = build1 (CONVERT_EXPR, ptr_target_type, expr);
       CONVERT_EXPR_VBASE_PATH (expr) = true;
-      if (!want_pointer)
-	expr = build_indirect_ref (EXPR_LOCATION (expr), expr, RO_NULL);
-      return expr;
+      goto indout;
     }

   /* Do we need to check for a null pointer?  */
@@ -402,6 +402,8 @@ build_base_path (enum tree_code code,
     {
       expr = cp_build_indirect_ref (expr, RO_NULL, complain);
       expr = build_simple_base_path (expr, binfo);
+      if (rvalue)
+	expr = move (expr);
       if (want_pointer)
	expr = build_address (expr);
       target_type = TREE_TYPE (expr);
@@ -478,8 +480,13 @@ build_base_path (enum tree_code code,
   else
     null_test = NULL;

+ indout:
   if (!want_pointer)
-    expr = cp_build_indirect_ref (expr, RO_NULL, complain);
+    {
+      expr = cp_build_indirect_ref (expr, RO_NULL, complain);
+      if (rvalue)
+	expr = move (expr);
+    }

  out:
   if (null_test)
diff --git a/gcc/cp/rtti.c b/gcc/cp/rtti.c
index 10cc168..762953b 100644
--- a/gcc/cp/rtti.c
+++ b/gcc/cp/rtti.c
@@ -608,10 +608,6 @@ build_dynamic_cast_1 (tree type, tree expr, tsubst_flags_t complain)
	      errstr = _("source is of incomplete class type");
	      goto fail;
	    }
-
-	  /* Apply trivial conversion T -> T& for dereferenced ptrs.  */
-	  expr = convert_to_reference (exprtype, expr, CONV_IMPLICIT,
-				       LOOKUP_NORMAL, NULL_TREE, complain);
	}

       /* The dynamic_cast operator shall not cast away constness.  */
@@ -631,6 +627,11 @@ build_dynamic_cast_1 (tree type, tree expr, tsubst_flags_t complain)
	  return build_static_cast (type, expr, complain);
	}

+      /* Apply trivial conversion T -> T& for dereferenced ptrs.  */
+      if (tc == REFERENCE_TYPE)
+	expr = convert_to_reference (exprtype, expr, CONV_IMPLICIT,
+				     LOOKUP_NORMAL, NULL_TREE, complain);
+
       /* Otherwise *exprtype must be a polymorphic class (have a vtbl).  */
       if (TYPE_POLYMORPHIC_P (TREE_TYPE (exprtype)))
	{
diff --git a/gcc/testsuite/g++.dg/expr/cond6.C b/gcc/testsuite/g++.dg/expr/cond6.C
index 943aa85..8f7f084 100644
--- a/gcc/testsuite/g++.dg/expr/cond6.C
+++ b/gcc/testsuite/g++.dg/expr/cond6.C
@@ -1,10 +1,11 @@
 // { dg-do run }

 extern "C" void abort ();

+bool ok =
Re: [Ping] [PATCH, 7/10] aarch64: add function to output ccmp insn
On 10/11/2014 09:11 AM, Richard Henderson wrote:
> On 09/22/2014 11:45 PM, Zhenqiang Chen wrote:
>> +static unsigned int
>> +aarch64_code_to_nzcv (enum rtx_code code, bool inverse)
>> +{
>> +  switch (code)
>> +    {
>> +    case NE: /* NE, Z == 0.  */
>> +      return inverse ? AARCH64_CC_Z : 0;
>> +    case EQ: /* EQ, Z == 1.  */
>> +      return inverse ? 0 : AARCH64_CC_Z;
>> +    case LE: /* LE, !(Z == 0 && N == V).  */
>> +      return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_Z;
>> +    case GT: /* GT, Z == 0 && N == V.  */
>> +      return inverse ? AARCH64_CC_Z : AARCH64_CC_N | AARCH64_CC_V;
>> +    case LT: /* LT, N != V.  */
>> +      return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_N;
>> +    case GE: /* GE, N == V.  */
>> +      return inverse ? AARCH64_CC_N : AARCH64_CC_N | AARCH64_CC_V;
>> +    case LEU: /* LS, !(C == 1 && Z == 0).  */
>> +      return inverse ? AARCH64_CC_C : AARCH64_CC_Z;
>> +    case GTU: /* HI, C == 1 && Z == 0.  */
>> +      return inverse ? AARCH64_CC_Z : AARCH64_CC_C;
>> +    case LTU: /* CC, C == 0.  */
>> +      return inverse ? AARCH64_CC_C : 0;
>> +    case GEU: /* CS, C == 1.  */
>> +      return inverse ? 0 : AARCH64_CC_C;
>> +    default:
>> +      gcc_unreachable ();
>> +      return 0;
>> +    }
>> +}
>> +
>
> I'm not overly fond of this, since code doesn't map 1-1.  It needs the
> context of a mode to provide a unique mapping.
>
> I think it would be better to rearrange the existing aarch64_cond_code
> enum such that AARCH64_NE et al are meaningful wrt NZCV.  Then you can
> use aarch64_get_condition_code_1 to get this mapping.

Slight mistake in the advice here.  I think you should use
aarch64_get_condition_code_1 to get an aarch64_cond_code, and use that to
index an array to get the nzcv bits.

Further, does it actually make sense to store both nzcv and its inverse,
or does it work to use nzcv and ~nzcv?


r~