Re: [ARM] fix for PR49423
On Tue, Sep 25, 2012 at 5:32 PM, Dinar Temirbulatov dtemirbula...@gmail.com wrote: Hi Ramana, Here is obvious fix for PR49423, I just added pool range for Sorry for the late response - I've been on vacation. No it's not ok. These were removed deliberately and subsequent efforts to put these back on have been rejected as being incomplete. Please investigate why reload wants to reload the load of a constant 0 which is a valid constant for HImode values into the constant pool . In addition it might be that movhi_insn needs to grow support for movw at the right architecture level. With your patch and for example 920928-2.c you'll start seeing 0 going into the constant pool and an ldrh out of that ! regards Ramana
Re: [ARM] fix for PR49423
On Fri, Oct 12, 2012 at 11:13 PM, Ramana Radhakrishnan ramana@googlemail.com wrote: On Tue, Sep 25, 2012 at 5:32 PM, Dinar Temirbulatov dtemirbula...@gmail.com wrote: Hi Ramana, Here is obvious fix for PR49423, I just added pool range for Sorry for the late response - I've been on vacation. No it's not ok. These were removed deliberately and subsequent efforts to put these back on have been rejected as being incomplete. Please investigate why reload wants to reload the load of a constant 0 which is a valid constant for HImode values into the constant pool . In addition it might be that movhi_insn needs to grow support for movw at the right architecture level. With your patch and for example 920928-2.c you'll start seeing 0 going into the constant pool and an ldrh out of that ! In my case (scal-to-vec2.c at -O1 -msoft-float), the register pressure is huge and ira (not reload as far as I can tell) was trying to reduce it slightly by generating the constant (0x3002) which was pulled out of the loop. The register allocator (IRA or reload) got so confused it changed: (insn 357 355 358 7 (set (reg:SI 372 [ D.4448 ]) (zero_extend:SI (subreg:HI (reg:SI 745 [ D.4141+4 ]) 0))) src/gcc/testsuite/gcc.c-torture/execute/scal-to-vec2.c:49 161 {*arm_zero_extendhisi2_v6} (nil)) Into: (insn 357 1072 1075 7 (set (reg:SI 8 r8) (zero_extend:SI (mem/u/c:HI (symbol_ref/u:SI (*.LC0) [flags 0x2]) [0 S2 A16]))) src/gcc/testsuite/gcc.c-torture/execute/scal-to-vec2.c:49 161 {*arm_zero_extendhisi2_v6} (nil)) (insn 1075 357 1077 7 (set (mem/c:SI (plus:SI (reg/f:SI 11 fp) (const_int -60 [0xffc4])) [0 %sfp+-24 S4 A32]) (reg:SI 8 r8)) src/gcc/testsuite/gcc.c-torture/execute/scal-to-vec2.c:49 181 {*arm_movsi_insn} (nil)) Thanks, Andrew Pinski regards Ramana
Re: RFC: LRA for x86/x86-64 [7/9] -- continuation
I'm having to correct my own comments again, sorry. Richard Sandiford rdsandif...@googlemail.com writes: + /* If this is post-increment, first copy the location to the reload reg. */ + if (post real_in != result) +emit_insn (gen_move_insn (result, real_in)); Nit, but real_in != result can never be true AIUI, and I was confused how the code could be correct in that case. Maybe just remove it, or make it an assert? Probably obvious, but I meant real_in == result can never be true. real_in != result could be removed or turned into an assert. +if (GET_CODE (op) == PLUS) + { +plus = op; +op = XEXP (op, 1); + } Sorry, I'm complaining about old reload code again, but: does this actually happen in LRA? In reload, a register operand could become a PLUS because of elimination, but I thought LRA did things differently. Besides, this is only needed for: +if (CONST_POOL_OK_P (mode, op) + ((targetm.preferred_reload_class + (op, (enum reg_class) goal_alt[i]) == NO_REGS) +|| no_input_reloads_p) + mode != VOIDmode) + { +rtx tem = force_const_mem (mode, op); + +change_p = true; +/* If we stripped a SUBREG or a PLUS above add it back. */ +if (plus != NULL_RTX) + tem = gen_rtx_PLUS (mode, XEXP (plus, 0), tem); and we shouldn't have (plus (constant ...) ...) after elimination (or at all outside of a CONST). I don't understand why the code is needed even in reload. Scratch the thing about needing it for reload. It's obviously the second second operand we're reloading, not the first, and it could well be that an elimination displacement needs to be reloaded via the constant pool. The question for LRA still stands though. Richard
[i386] scalar ops that preserve the high part of a vector
Hello, this patch provides an alternate pattern to let combine recognize scalar operations that preserve the high part of a vector. If the strategy is all right, I could do the same for more operations (mul, div, ...). Something similar is also possible for V4SF (different pattern though), but probably not as useful. bootstrap+testsuite ok. 2012-10-13 Marc Glisse marc.gli...@inria.fr PR target/54855 gcc/ * config/i386/sse.md (*sse2_vmplusminus_insnv2df3): New define_insn. gcc/testsuite/ * gcc.target/i386/pr54855.c: New testcase. -- Marc GlisseIndex: config/i386/sse.md === --- config/i386/sse.md (revision 192420) +++ config/i386/sse.md (working copy) @@ -812,20 +812,38 @@ (const_int 1)))] TARGET_SSE @ plusminus_mnemonicssescalarmodesuffix\t{%2, %0|%0, %2} vplusminus_mnemonicssescalarmodesuffix\t{%2, %1, %0|%0, %1, %2} [(set_attr isa noavx,avx) (set_attr type sseadd) (set_attr prefix orig,vex) (set_attr mode ssescalarmode)]) +(define_insn *sse2_vmplusminus_insnv2df3 + [(set (match_operand:V2DF 0 register_operand =x,x) + (vec_concat:V2DF + (plusminus:DF + (vec_select:DF + (match_operand:V2DF 1 register_operand 0,x) + (parallel [(const_int 0)])) + (match_operand:DF 2 nonimmediate_operand xm,xm)) + (vec_select:DF (match_dup 1) (parallel [(const_int 1)]] + TARGET_SSE2 + @ + plusminus_mnemonicsd\t{%2, %0|%0, %2} + vplusminus_mnemonicsd\t{%2, %1, %0|%0, %1, %2} + [(set_attr isa noavx,avx) + (set_attr type sseadd) + (set_attr prefix orig,vex) + (set_attr mode DF)]) + (define_expand mulmode3 [(set (match_operand:VF 0 register_operand) (mult:VF (match_operand:VF 1 nonimmediate_operand) (match_operand:VF 2 nonimmediate_operand)))] TARGET_SSE ix86_fixup_binary_operands_no_copy (MULT, MODEmode, operands);) (define_insn *mulmode3 [(set (match_operand:VF 0 register_operand =x,x) Index: testsuite/gcc.target/i386/pr54855.c === --- testsuite/gcc.target/i386/pr54855.c (revision 0) +++ testsuite/gcc.target/i386/pr54855.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options -O -msse2 } */ + +typedef double vec __attribute__((vector_size(16))); + +vec f (vec x) +{ + x[0] += 2; + return x; +} + +vec g (vec x) +{ + x[0] -= 1; + return x; +} + +/* { dg-final { scan-assembler-not mov } } */ Property changes on: testsuite/gcc.target/i386/pr54855.c ___ Added: svn:keywords + Author Date Id Revision URL Added: svn:eol-style + native
Re: PR54915 (ssa-forwprop, vec_perm_expr)
On Fri, 12 Oct 2012, Marc Glisse wrote: Hello, apparently, in the optimization that recognizes that {v[1],v[0]} is a VEC_PERM_EXPR, I forgot to check that v is a 2-element vector... (not that there aren't things that could be done if v has a different size, just not directly a VEC_PERM_EXPR, and not right now, priority is to fix the bug) Checking that v has the same type as the result seemed like the easiest way, but there are many variations that could be slightly better or worse. bootstrap+testsuite ok. 2012-10-02 Marc Glisse marc.gli...@inria.fr PR tree-optimization/54915 gcc/ * tree-ssa-forwprop.c (simplify_vector_constructor): Check argument's type. gcc/testsuite/ * gcc.dg/tree-ssa/pr54915.c: New testcase. This new version, with a slightly relaxed test, seems preferable and also passes testing. -- Marc GlisseIndex: testsuite/gcc.dg/tree-ssa/pr54915.c === --- testsuite/gcc.dg/tree-ssa/pr54915.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/pr54915.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +typedef double v2df __attribute__ ((__vector_size__ (16))); +typedef double v4df __attribute__ ((__vector_size__ (32))); + +void f (v2df *ret, v4df* xp) +{ + v4df x = *xp; + v2df xx = { x[2], x[3] }; + *ret = xx; +} Property changes on: testsuite/gcc.dg/tree-ssa/pr54915.c ___ Added: svn:eol-style + native Added: svn:keywords + Author Date Id Revision URL Index: tree-ssa-forwprop.c === --- tree-ssa-forwprop.c (revision 192420) +++ tree-ssa-forwprop.c (working copy) @@ -2833,20 +2833,22 @@ simplify_vector_constructor (gimple_stmt ref = TREE_OPERAND (op1, 0); if (orig) { if (ref != orig) return false; } else { if (TREE_CODE (ref) != SSA_NAME) return false; + if (!useless_type_conversion_p (type, TREE_TYPE (ref))) + return false; orig = ref; } if (TREE_INT_CST_LOW (TREE_OPERAND (op1, 1)) != elem_size) return false; sel[i] = TREE_INT_CST_LOW (TREE_OPERAND (op1, 2)) / elem_size; if (sel[i] != i) maybe_ident = false; } if (i nelts) return false;
Re: [PATCH 0/6] Thread pointer built-in functions / [SH] PR 54760
On 2012/10/12 06:55 AM, Oleg Endo wrote: This broke the recently added thread pointer built-ins on SH, but I was prepared for that, so no problem here. The attached patch is a straight forward fix. However, with the patch applied I get an ICE on one of the SH thread pointer tests: gcc/testsuite/gcc.target/sh/pr54760-3.c, function test04: internal compiler error: in expand_insn, at optabs.c:8208 __builtin_set_thread_pointer (xx[i]); Looks like I was supposed to use create_input_operand() there instead. I've committed the attached patch as obvious. This should be fixed now. Thanks, Chung-Lin * builtins.c (expand_builtin_set_thread_pointer): Use create_input_operand() instead of create_fixed_operand(). Index: builtins.c === --- builtins.c (revision 192421) +++ builtins.c (revision 192422) @@ -5776,7 +5776,7 @@ struct expand_operand op; rtx val = expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX, Pmode, EXPAND_NORMAL); - create_fixed_operand (op, val); + create_input_operand (op, val, Pmode); expand_insn (icode, 1, op); return; }
Re: [PATCH 0/6] Thread pointer built-in functions / [SH] PR 54760
On Sat, 2012-10-13 at 17:33 +0800, Chung-Lin Tang wrote: On 2012/10/12 06:55 AM, Oleg Endo wrote: This broke the recently added thread pointer built-ins on SH, but I was prepared for that, so no problem here. The attached patch is a straight forward fix. However, with the patch applied I get an ICE on one of the SH thread pointer tests: gcc/testsuite/gcc.target/sh/pr54760-3.c, function test04: internal compiler error: in expand_insn, at optabs.c:8208 __builtin_set_thread_pointer (xx[i]); Looks like I was supposed to use create_input_operand() there instead. I've committed the attached patch as obvious. This should be fixed now. Yep, confirmed. Thanks! Cheers, Oleg
[patch] PR54885
Hello, I hadn't realized that dead registers can still reach EQ-notes. Fixed with the attached patch. Bootstrappedtested with normal and release checking on x86_64-unknown-linux-gnu and powerpc64-unknown-linux-gnu. OK for trunk? Ciao! Steven df_rd_lr_eq.diff Description: Binary data
PR fortran/51727: make module files reproducible, question on C++ in gcc
Hi, first a question also to non-gfortraners: if I want to use std::map, where do I #include map? In system.h? Now to the patch-specific part: in this PR, module files are produced with random changes because the order in which symbols are written can depend on the memory layout. This patch fixes this by recording which symbols need to be written and then processing them in order. The patch doesn't make the more involved effort of putting all symbols into the module in an easily predicted order, instead it only makes sure that the order remains fixed across the compiler invocations. The reason why the former is difficult is that during the process of writing a symbol, it can turn out that other symbols will have to be written as well (say, because they appear in array specifications). Since the module-writing code determines which symbols to output while actually writing the file, recording all the symbols that need to be written before writing to the file would mean a lot of surgery. I'm putting forward two patches. One uses a C++ map to very concisely build up and handle the ordered list of symbols. This has three problems: 1) gfortran maintainers may not want C++isms (even though in this case it's very localized, and in my opinion very transparent), and 2) it can't be backported to old release branches which are still compiled as C. Joost expressed interested in a backport. 3) I don't know where to #include map (see above) Therefore I also propose a patch where I added the necessary ~50 lines of boilerplate code and added the necessary traversal function to use gfortran's GFC_BBT to maintain the ordered tree of symbols. Both patches pass the testsuite and Joost confirms that they fix the problem with CP2K. I also verified with a few examples that they both produce identical .mod files as they should. Is the C++ patch, modified to do the #include correctly, ok for the trunk? If not, the C-only patch? Can I put the C-only patch on the release branches? And which? Cheers, - Tobi 2012-10-13 Tobias Schlüter t...@gcc.gnu.org PR fortran/51727 * module.c (sorted_pointer_info): New. (gfc_get_sorted_pointer_info): New. (free_sorted_pointer_info_tree): New. (compare_sorted_pointer_info): New. (find_symbols_to_write): New. (write_symbol1_recursion): New. (write_symbol1): Collect symbols that need writing, output in order. (write_generic): Traverse tree in order. diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c index 5cfc335..4cfcae4 100644 --- a/gcc/fortran/module.c +++ b/gcc/fortran/module.c @@ -5150,32 +5150,122 @@ write_symbol0 (gfc_symtree *st) } -/* Recursive traversal function to write the secondary set of symbols - to the module file. These are symbols that were not public yet are - needed by the public symbols or another dependent symbol. The act - of writing a symbol can modify the pointer_info tree, so we cease - traversal if we find a symbol to write. We return nonzero if a - symbol was written and pass that information upwards. */ +/* Type for the temporary tree used when writing secondary symbols. */ + +struct sorted_pointer_info +{ + BBT_HEADER (sorted_pointer_info); + + pointer_info *p; +}; + +#define gfc_get_sorted_pointer_info() XCNEW (sorted_pointer_info) + +/* Recursively traverse the temporary tree, free its contents. */ + +static void +free_sorted_pointer_info_tree (sorted_pointer_info *p) +{ + if (!p) +return; + + free_sorted_pointer_info_tree (p-left); + free_sorted_pointer_info_tree (p-right); + + free (p); +} + +/* Comparison function for the temporary tree. */ static int -write_symbol1 (pointer_info *p) +compare_sorted_pointer_info (void *_spi1, void *_spi2) { - int result; + sorted_pointer_info *spi1, *spi2; + spi1 = (sorted_pointer_info *)_spi1; + spi2 = (sorted_pointer_info *)_spi2; + if (spi1-p-integer spi2-p-integer) +return -1; + if (spi1-p-integer spi2-p-integer) +return 1; + return 0; +} + + +/* Finds the symbols that need to be written and collects them in the + sorted_pi tree so that they can be traversed in an order + independent of memory addresses. */ + +static void +find_symbols_to_write(sorted_pointer_info **tree, pointer_info *p) +{ + if (!p) +return; + + if (p-type == P_SYMBOL p-u.wsym.state == NEEDS_WRITE) +{ + sorted_pointer_info *sp = gfc_get_sorted_pointer_info(); + sp-p = p; + + gfc_insert_bbt (tree, sp, compare_sorted_pointer_info); + } + + find_symbols_to_write (tree, p-left); + find_symbols_to_write (tree, p-right); +} + + +/* Recursive function that traverses the tree of symbols that need to be + written and writes them in order. */ + +static void +write_symbol1_recursion (sorted_pointer_info *sp) +{ + if (!sp) +return; + + write_symbol1_recursion (sp-left); + + pointer_info *p1 = sp-p; + gcc_assert (p1-type == P_SYMBOL
Re: PR fortran/51727: make module files reproducible, question on C++ in gcc
ps I forgot to mention that I also changed write_generic to traverse the tree in defined order, this is the same in the C++ and the C-only patch. Cheers, - Tobi On 2012-10-13 15:26, Tobias Schlüter wrote: Hi, first a question also to non-gfortraners: if I want to use std::map, where do I #include map? In system.h? Now to the patch-specific part: in this PR, module files are produced with random changes because the order in which symbols are written can depend on the memory layout. This patch fixes this by recording which symbols need to be written and then processing them in order. The patch doesn't make the more involved effort of putting all symbols into the module in an easily predicted order, instead it only makes sure that the order remains fixed across the compiler invocations. The reason why the former is difficult is that during the process of writing a symbol, it can turn out that other symbols will have to be written as well (say, because they appear in array specifications). Since the module-writing code determines which symbols to output while actually writing the file, recording all the symbols that need to be written before writing to the file would mean a lot of surgery. I'm putting forward two patches. One uses a C++ map to very concisely build up and handle the ordered list of symbols. This has three problems: 1) gfortran maintainers may not want C++isms (even though in this case it's very localized, and in my opinion very transparent), and 2) it can't be backported to old release branches which are still compiled as C. Joost expressed interested in a backport. 3) I don't know where to #include map (see above) Therefore I also propose a patch where I added the necessary ~50 lines of boilerplate code and added the necessary traversal function to use gfortran's GFC_BBT to maintain the ordered tree of symbols. Both patches pass the testsuite and Joost confirms that they fix the problem with CP2K. I also verified with a few examples that they both produce identical .mod files as they should. Is the C++ patch, modified to do the #include correctly, ok for the trunk? If not, the C-only patch? Can I put the C-only patch on the release branches? And which? Cheers, - Tobi
[PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
Hello, This fixes the long-standing enhancement request to use DF_LIVE in IRA. To quote the first comment in the PR: IRA should be using the DF-LIVE sets, which are smaller than the DF-LR sets when they are available (typically at O2 and above). The proper sets can be conveniently accessed using the df_get_live_[in,out] functions which use DF-LIVE if it is available and fall back to DF-LR if it is not. I thought that DF_LIVE wasn't available at -O1 in IRA, but interestingly ira.c:ira() adds it to the DF-problems list in that case already: $ grep -C3 -n df_live_add ira.c 4160- 4161- if (optimize == 1) 4162-{ 4163: df_live_add_problem (); 4164- df_live_set_all_dirty (); 4165-} 4166-#ifdef ENABLE_CHECKING So DF_LIVE is already being computed for IRA. It's just not being used because the DF_LR_{IN,OUT} macros are used instead of the df_get_live_{in,out} functions. With my patch, the DF_LIVE sets are used, which results in marginally faster compile times for my set of cc1-i files on power64 and on x86_64, as well as smaller code. There's also a good speedup for the PR54146 test case. Only reload.c still uses the DF_LR_OUT. The comment in find_dummy_reload() suggests this is intentional. I also fond a bug in IRA: 4392- touch the artificial uses and defs. */ 4393- df_finish_pass (true); 4394- if (optimize 1) 4395:df_live_add_problem (); 4396- df_scan_alloc (NULL); 4397- df_scan_blocks (); 4398- At optimize1 the DF_LIVE problem is already always computed. I think the idea here was to *remove* the DF_LIVE problem at -O1, which is also part of the attached patch. (Perhaps we should even simply not add the DF_LIVE problem at -O1, but that can be done in a follow-up patch if necessary.) Bootstrapped and tested (with default checking and with release checking) on powerpc64-unknown-linux-gnu and x86_64-unknown-linux-gnu. OK for trunk? Ciao! Steven PR rtl-optimization/38711 * df.h (df_get_live_out, df_get_live_in): Make static inline functions. * df-problems.c (df_get_live_out, df_get_live_in): Moved to df.h. * ira-lives.c (process_bb_node_lives): Use df_get_live_out instead of DF_LR_OUT. * ira-build.c (create_bb_allocnos): Likewise. (create_loop_allocnos): Likewise, and use df_get_live_in instead of DF_LR_IN. * ira-emit.c (generate_edge_moves): Likewise. (add_ranges_and_copies): Likewise. * ira.c (mark_elimination): Update DF_LR and DF_LIVE. (build_insn_chain): Use df_get_live_out instead of DF_LR_OUT. (do_reload): Remove the DF_LIVE problem for -O1. * ira-color.c (ira_loop_edge_freq): Use df_get_live_out instead of DF_LR_OUT, and df_get_live_in instead of DF_LR_IN. ira-speedup-3.diff Description: Binary data
Re: [RFC PATCH] Add support for sparc compare-and-branch.
The trouble area, and where I need help from either Rainer or Eric, is the Solaris2 bits. I think we need to move the Solaris assembler stuff over to a model where it passes: -m{32,64} -xarch=sparcFOO instead of using the v8plusX stuff to indicate 32bit. And that's the direction I tried to move in here. The only versions of the Solaris assembler I have access to only support v8plusX according to the man page. Has that changed recently? Could one of you help me get the solaris side correct? I made sure that binutils accepts the same options for this stuff, that's why I can unconditionally use '-xarch=sparc4' in the configure test. sparc4 sounds very 1990-ish... * config/sparc/sparc.opt (mvis4): New option. * config/sparc/sparc-c.c (sparc_target_macros): When TARGET_VIS4, define __VIS__ to 0x400. What's the relationship between VIS4 and SPARC-T4 exactly? The above manual only speaks of VIS3 as far as I can see. And the CBcond instructions are not marked as belonging to VIS (3 or 4), so using -mvis4 for them seems strange. Why not make them depend on -mcpu=niagara4 instead? -- Eric Botcazou
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
On 12-10-13 9:40 AM, Steven Bosscher wrote: Hello, This fixes the long-standing enhancement request to use DF_LIVE in IRA. To quote the first comment in the PR: IRA should be using the DF-LIVE sets, which are smaller than the DF-LR sets when they are available (typically at O2 and above). The proper sets can be conveniently accessed using the df_get_live_[in,out] functions which use DF-LIVE if it is available and fall back to DF-LR if it is not. I thought that DF_LIVE wasn't available at -O1 in IRA, but interestingly ira.c:ira() adds it to the DF-problems list in that case already: $ grep -C3 -n df_live_add ira.c 4160- 4161- if (optimize == 1) 4162-{ 4163: df_live_add_problem (); 4164- df_live_set_all_dirty (); 4165-} 4166-#ifdef ENABLE_CHECKING So DF_LIVE is already being computed for IRA. It's just not being used because the DF_LR_{IN,OUT} macros are used instead of the df_get_live_{in,out} functions. With my patch, the DF_LIVE sets are used, which results in marginally faster compile times for my set of cc1-i files on power64 and on x86_64, as well as smaller code. There's also a good speedup for the PR54146 test case. Only reload.c still uses the DF_LR_OUT. The comment in find_dummy_reload() suggests this is intentional. I also fond a bug in IRA: 4392- touch the artificial uses and defs. */ 4393- df_finish_pass (true); 4394- if (optimize 1) 4395:df_live_add_problem (); 4396- df_scan_alloc (NULL); 4397- df_scan_blocks (); 4398- At optimize1 the DF_LIVE problem is already always computed. I think the idea here was to *remove* the DF_LIVE problem at -O1, which is also part of the attached patch. (Perhaps we should even simply not add the DF_LIVE problem at -O1, but that can be done in a follow-up patch if necessary.) Bootstrapped and tested (with default checking and with release checking) on powerpc64-unknown-linux-gnu and x86_64-unknown-linux-gnu. OK for trunk? LIVE is not used on purpose. I added LIVE usage to the old RA about 10 years ago (it was before DF-infrastructure). Everything was ok until, somebody reported compiler crash on semantically wrong program (something with usage of uninitialized variable). After that I removed this code. Probably, that is also a reason why reload uses LR not LIVE. So I'd like to wait a few weeks. When I have more time, I am going to reproduce this test suite and look how IRA behaves on it. Thanks, Steven.
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
On Sat, Oct 13, 2012 at 5:12 PM, Vladimir Makarov wrote: On 12-10-13 9:40 AM, Steven Bosscher wrote: Hello, This fixes the long-standing enhancement request to use DF_LIVE in IRA. To quote the first comment in the PR: IRA should be using the DF-LIVE sets, which are smaller than the DF-LR sets when they are available (typically at O2 and above). The proper sets can be conveniently accessed using the df_get_live_[in,out] functions which use DF-LIVE if it is available and fall back to DF-LR if it is not. I thought that DF_LIVE wasn't available at -O1 in IRA, but interestingly ira.c:ira() adds it to the DF-problems list in that case already: $ grep -C3 -n df_live_add ira.c 4160- 4161- if (optimize == 1) 4162-{ 4163: df_live_add_problem (); 4164- df_live_set_all_dirty (); 4165-} 4166-#ifdef ENABLE_CHECKING So DF_LIVE is already being computed for IRA. It's just not being used because the DF_LR_{IN,OUT} macros are used instead of the df_get_live_{in,out} functions. With my patch, the DF_LIVE sets are used, which results in marginally faster compile times for my set of cc1-i files on power64 and on x86_64, as well as smaller code. There's also a good speedup for the PR54146 test case. Only reload.c still uses the DF_LR_OUT. The comment in find_dummy_reload() suggests this is intentional. I also fond a bug in IRA: 4392- touch the artificial uses and defs. */ 4393- df_finish_pass (true); 4394- if (optimize 1) 4395:df_live_add_problem (); 4396- df_scan_alloc (NULL); 4397- df_scan_blocks (); 4398- At optimize1 the DF_LIVE problem is already always computed. I think the idea here was to *remove* the DF_LIVE problem at -O1, which is also part of the attached patch. (Perhaps we should even simply not add the DF_LIVE problem at -O1, but that can be done in a follow-up patch if necessary.) Bootstrapped and tested (with default checking and with release checking) on powerpc64-unknown-linux-gnu and x86_64-unknown-linux-gnu. OK for trunk? LIVE is not used on purpose. I added LIVE usage to the old RA about 10 years ago (it was before DF-infrastructure). Everything was ok until, somebody reported compiler crash on semantically wrong program (something with usage of uninitialized variable). After that I removed this code. GCC was a completely different compiler 10 years ago. Handling of uninitialized variables is radically different since tree-ssa, and also the resource allocation side of the compiler has essentially been rewritten (by you :-). Whatever broke in the compiler 10 years ago may not be relevant anymore today. I can't find your work, or the reported problem, in the gcc mailing list archives. Did you add a test case for the problem, and if so, where can I find it? Probably, that is also a reason why reload uses LR not LIVE. The reason why reload uses DF_LR is PR20973, as documented in the code. (And FWIW, that bug doesn't reproduce even with GCC 4.0 if you add the transformations of pass_initialize_regs to it, that pass exists to avoid uninitialized regs hazards by making them initialized.) So I'd like to wait a few weeks. When I have more time, I am going to reproduce this test suite and look how IRA behaves on it. I don't think that's giving this patch a fair chance. In a few week's time, stage1 will be over. You may not have time, but that doesn't mean other developers should wait. If you have a test case that fails with my patch, I'll gladly spend my own time on it to see if it can be fixed. My patch was properly tested on two primary targets and passes all tests in the test suite, it improves the compiler now, and it addresses a PR that you've basically ignored for 4 years. I kindly ask you to reconsider your position :-) Ciao! Steven
Re: [lra] patch to fix GCC crash on a SPEC2006 test
On Thu, 2012-10-11 at 23:53 -0400, Vladimir Makarov wrote: My biggest problem is ESL. I should use simpler phrases. Heh, I'm not sure compiler speak qualifies as English. :) Is the following comment better? Presence of any pseudo in CALL_INSN_FUNCTION_USAGE does not affect value of insn_bitmap of the corresponding lra_reg_info. That is because we don't need to reload pseudos in CALL_INSN_FUNCTION_USAGEs. So if we process only insns in the insn_bitmap of given pseudo here, we can miss the pseudo in some CALL_INSN_FUNCTION_USAGEs. Sure, that's better. Thanks. Peter
[PATCH] Fix gcov handling directories with periods
PR gcov-profile/44728 * gcov.c (create_file_names): When stripping extension only look at base name. --- gcc/gcov.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/gcov.c b/gcc/gcov.c index cf26ce1..09831c2 100644 --- a/gcc/gcov.c +++ b/gcc/gcov.c @@ -842,7 +842,7 @@ create_file_names (const char *file_name) } /* Remove the extension. */ - cptr = strrchr (name, '.'); + cptr = strrchr (CONST_CAST (char *, lbasename (name)), '.'); if (cptr) *cptr = 0; -- 1.7.12.3 -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Two obvious loop-iv fixes
Hi, while proofreading loop-iv for possible cause of the miscompilation that turned out to be webizer bug I noticed two minor problems. 1) determine_max_iter has path dealing with AND that is bogus, because constant argument is always canonized to be second. Enabling the path however somethimes leads to worse results, so I improved it a bit by combining both estimates. 2) bounds are recorded as signed values intead of unsigned. This means that values with upper bit set gets promoted to almost infinity for double int. Bootstrappedregtested x86_64-linux, comitted as obvious. Honza * loop-iv.c (determine_max_iter): Fix handling of AND. (iv_number_of_iterations): Record upper bounds as unsigned values. Index: loop-iv.c === --- loop-iv.c (revision 192409) +++ loop-iv.c (working copy) @@ -2224,13 +2224,18 @@ determine_max_iter (struct loop *loop, s rtx niter = desc-niter_expr; rtx mmin, mmax, cmp; unsigned HOST_WIDEST_INT nmax, inc; + unsigned HOST_WIDEST_INT andmax = 0; + + /* We used to look for constant operand 0 of AND, + but canonicalization should always make this impossible. */ + gcc_checking_assert (GET_CODE (niter) != AND + || !CONST_INT_P (XEXP (niter, 0))); if (GET_CODE (niter) == AND - CONST_INT_P (XEXP (niter, 0))) + CONST_INT_P (XEXP (niter, 1))) { - nmax = INTVAL (XEXP (niter, 0)); - if (!(nmax (nmax + 1))) - return nmax; + andmax = UINTVAL (XEXP (niter, 1)); + niter = XEXP (niter, 0); } get_mode_bounds (desc-mode, desc-signed_p, desc-mode, mmin, mmax); @@ -2258,7 +2263,13 @@ determine_max_iter (struct loop *loop, s if (dump_file) fprintf (dump_file, ;; improved upper bound by one.\n); } - return nmax / inc; + nmax /= inc; + if (andmax) +nmax = MIN (nmax, andmax); + if (dump_file) +fprintf (dump_file, ;; Determined upper bound HOST_WIDEST_INT_PRINT_DEC.\n, +nmax); + return nmax; } /* Computes number of iterations of the CONDITION in INSN in LOOP and stores @@ -2563,7 +2574,7 @@ iv_number_of_iterations (struct loop *lo ? iv0.base : mode_mmin); max = (up - down) / inc + 1; - record_niter_bound (loop, double_int::from_shwi (max), + record_niter_bound (loop, double_int::from_uhwi (max), false, true); if (iv0.step == const0_rtx) @@ -2776,14 +2787,14 @@ iv_number_of_iterations (struct loop *lo desc-const_iter = true; desc-niter = val GET_MODE_MASK (desc-mode); - record_niter_bound (loop, double_int::from_shwi (desc-niter), + record_niter_bound (loop, double_int::from_uhwi (desc-niter), false, true); } else { max = determine_max_iter (loop, desc, old_niter); gcc_assert (max); - record_niter_bound (loop, double_int::from_shwi (max), + record_niter_bound (loop, double_int::from_uhwi (max), false, true); /* simplify_using_initial_values does a copy propagation on the registers
Re: Two obvious loop-iv fixes
On Sat, Oct 13, 2012 at 6:54 PM, Jan Hubicka wrote: 2) bounds are recorded as signed values intead of unsigned. This means that values with upper bit set gets promoted to almost infinity for double int. Could you check and see if this change fixes PR54919? Ciao! Steven
Re: Use conditional casting with symtab_node
On 10/12/12, Richard Biener richard.guent...@gmail.com wrote: On Oct 11, 2012 Xinliang David Li davi...@google.com wrote: On Oct 11, 2012 Lawrence Crowl cr...@googlers.com wrote: On 10/10/12, Xinliang David Li davi...@google.com wrote: In a different thread, I proposed the following alternative to 'try_xxx': templatetypename T T* symbol::cast_to(symbol* p) { if (p-isT()) return static_castT*(p); return 0; } cast: templatetypename T T symbol:as(symbol* p) { assert(p-isT()) return static_castT(*p); } My concern here was never the technical feasibility, nor the desirability of having the facility for generic code, but the amount of the amount of typing in non-generic contexts. That is if (cgraph_node *q = p-try_function ()) versus if (cgraph_node *q = p-cast_to cgraph_node * ()) I thought the latter would generate objections. Does anyone object to the extra typing? If not, I can make that change. Think about a more complex class hierarchy and interface consistency. I believe you don't want this: tree::try_decl () { .. } tree::try_ssa_name () { ..} tree::try_type() {...} tree::try_int_const () {..} tree::try_exp () { ...} .. Rather one (or two with the const_cast version). template T T *tree::cast_to () { if (kind_ == kind_traitsT::value) return static_castT* (this); return 0; } I also think that instead of if (cgraph_node *q = p-cast_to cgraph_node * ()) we want if ((q = cast_to cgraph_node * (p)) I see absolutely no good reason to make cast_to a member, given that the language has static_cast, const_cast and stuff. cast_to would simply be our equivalent to dynamic_cast within our OO model. Sorry, that was a think-o. Agreed there is no point in it being a member. Then I'd call it *_cast instead of cast_*, so, why not gcc_cast ? Or dyn_cast (). That way if ((q = dyn_cast function * (p)) would be how to use it. Which function? The point with my original proposal is that the kind of function was derivable from context, and thus can be shorter. Small enough, compared to if ((q = p-try_function ())) and a lot more familiar to folks knowing C++ (and probably doesn't make a difference to others). Thus, please re-use or follow existing concepts. Both are existing concepts. What I proposed is a common technique for avoiding the cost of dynamic_cast when down casting in a class hierarchy. Both concepts will work. I proposed what I thought would be most convenient to programmers. However, I will change to the other form unless someone objects. As for the example with tree we'd then have if ((q = dyn_cast INTEGER_CST (p))) if we can overload on the template parameter kind (can we? typename vs. enum?) There are two template parameters, one for the argument type and one for the return type. We can overload on the argument type but not on the return type. We can, however, (indirectly) partially specialize on the return type. We need to do that anyway to represent the fact that not all type conversions are legal. However, I recommend against specializing on the enum, as it would become somewhat obscure when the hierarchy is discriminated on more than one enum. or use tree_cast (no I don't want dyn_cast tree_common all around the code). Once we have proper class hierarchies, derived to base conversions are implicit. In the meantime, we need something else. -- Lawrence Crowl
Re: [PATCH] Fix gcov handling directories with periods
On Sat, Oct 13, 2012 at 8:51 AM, Andreas Schwab sch...@linux-m68k.org wrote: PR gcov-profile/44728 * gcov.c (create_file_names): When stripping extension only look at base name. diff --git a/gcc/gcov.c b/gcc/gcov.c index cf26ce1..09831c2 100644 --- a/gcc/gcov.c +++ b/gcc/gcov.c @@ -842,7 +842,7 @@ create_file_names (const char *file_name) } /* Remove the extension. */ - cptr = strrchr (name, '.'); + cptr = strrchr (CONST_CAST (char *, lbasename (name)), '.'); if (cptr) *cptr = 0; Why do you need the CONST_CAST? strrchr is a standard function and it takes const char * as the first argument. There is other code in gcc that calls strrchr with a const char * argument. This patch is OK without the CONST_CAST. Thanks. Ian
Re: Use conditional casting with symtab_node
On Sat, Oct 13, 2012 at 12:44 PM, Lawrence Crowl cr...@googlers.com wrote: Thus, please re-use or follow existing concepts. Both are existing concepts. What I proposed is a common technique for avoiding the cost of dynamic_cast when down casting in a class hierarchy. Both concepts will work. I proposed what I thought would be most convenient to programmers. However, I will change to the other form unless someone objects. Let me elaborate on your point. The concept you are trying to implement is that of: (a) check whether a tree is of a certain kind; (b) if so return a pointer to the (sub-)object of that kind; otherwise a null pointer. This is a standard idiom -- at least when using C++. It can be implemented in various ways. Your earlier attempt try_xxx is one example. Another common example, built into the language is dynamic_cast. The latter requires that classes involved in this test be polymorphic (because the builtin implementation needs to consult metadata that are already present with virtual functions.) Yet, another implementation is what is currently in GCC more-or-less. I think we should name the operation based on the abstract concept we are implementing as opposed the builtin language implementation. That is why I recommend is over dyn_cast or variations of it. We should be able to understand its uses without having to know the implementation details. However, I recommend against specializing on the enum, as it would become somewhat obscure when the hierarchy is discriminated on more than one enum. 100% agreed. -- Gaby
Re: PR fortran/51727: make module files reproducible, question on C++ in gcc
On 2012-10-13 09:26 , Tobias Schlüter wrote: Hi, first a question also to non-gfortraners: if I want to use std::map, where do I #include map? In system.h? No. Ada includes system.h in pure C code. Why not include it exactly where you need it? Diego.
Re: PR fortran/51727: make module files reproducible, question on C++ in gcc
On 2012-10-13 20:00, Diego Novillo wrote: On 2012-10-13 09:26 , Tobias Schlüter wrote: first a question also to non-gfortraners: if I want to use std::map, where do I #include map? In system.h? No. Ada includes system.h in pure C code. Why not include it exactly where you need it? Ok, I was just wondering if there's a rule because a quick grep revealed no non-header source file that includes system headers. Thanks, - Tobi
Re: PR fortran/51727: make module files reproducible, question on C++ in gcc
On 2012-10-13 14:04 , Tobias Schlüter wrote: On 2012-10-13 20:00, Diego Novillo wrote: On 2012-10-13 09:26 , Tobias Schlüter wrote: first a question also to non-gfortraners: if I want to use std::map, where do I #include map? In system.h? No. Ada includes system.h in pure C code. Why not include it exactly where you need it? Ok, I was just wondering if there's a rule because a quick grep revealed no non-header source file that includes system headers. Joseph or others may have reason to create a system.hxx file or some such, but in principle I don't see why we shouldn't include std library headers as we need them. Joseph? Diego.
Re: [PATCH] Fix gcov handling directories with periods
Ian Lance Taylor i...@google.com writes: Why do you need the CONST_CAST? strrchr is a standard function and it takes const char * as the first argument. There is other code in gcc that calls strrchr with a const char * argument. strrchr is overloaded as const and non-const in C++. We need the non-const version. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: [RFC] find_reloads_subreg_address rework triggers i386 back-end issue
On Fri, Oct 12, 2012 at 7:57 PM, Ulrich Weigand uweig...@de.ibm.com wrote: Hello, I was running a couple of tests on various platforms in preparation of getting the find_reload_subreg_address patch needed by aarch64 upstream: http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01421.html This unfortunately uncovered a regression in vect-98-big-array.c on i386. It seems to me that this is due to a pre-existing problem in the back-end. [...] But in principle the problem is latent even without the patch. The back-end should not permit reload to make changes to the pattern that fundamentally change the semantics of the resulting instruction ... I was wondering if the i386 port maintainers could have a look at this pattern. Shouldn't we really have two patterns, one to *load* an unaligned value and one to *store* and unaligned value, and not permit that memory access to get reloaded? Please find attached a fairly mechanical patch that splits move_unaligned pattern into load_unaligned and store_unaligned patterns. We've had some problems with this pattern, and finally we found the reason to make unaligned moves more robust. I will wait for the confirmation that attached patch avoids the failure you are seeing with your reload patch. [I guess a more fundamental change might also be to not have an unspec in the first place, but only plain moves, and check MEM_ALIGN in the move insn emitter to see which variant of the instruction is required ...] We already do this for AVX moves, but SSE aligned moves should ICE on unaligned access. This is handy to find errors in user programs (or ... well ... in the compiler itself). Uros. Index: config/i386/i386.c === --- config/i386/i386.c (revision 192420) +++ config/i386/i386.c (working copy) @@ -16059,7 +16059,8 @@ ix86_avx256_split_vector_move_misalign (rtx op0, r { rtx m; rtx (*extract) (rtx, rtx, rtx); - rtx (*move_unaligned) (rtx, rtx); + rtx (*load_unaligned) (rtx, rtx); + rtx (*store_unaligned) (rtx, rtx); enum machine_mode mode; switch (GET_MODE (op0)) @@ -16068,39 +16069,52 @@ ix86_avx256_split_vector_move_misalign (rtx op0, r gcc_unreachable (); case V32QImode: extract = gen_avx_vextractf128v32qi; - move_unaligned = gen_avx_movdqu256; + load_unaligned = gen_avx_loaddqu256; + store_unaligned = gen_avx_storedqu256; mode = V16QImode; break; case V8SFmode: extract = gen_avx_vextractf128v8sf; - move_unaligned = gen_avx_movups256; + load_unaligned = gen_avx_loadups256; + store_unaligned = gen_avx_storeups256; mode = V4SFmode; break; case V4DFmode: extract = gen_avx_vextractf128v4df; - move_unaligned = gen_avx_movupd256; + load_unaligned = gen_avx_loadupd256; + store_unaligned = gen_avx_storeupd256; mode = V2DFmode; break; } - if (MEM_P (op1) TARGET_AVX256_SPLIT_UNALIGNED_LOAD) + if (MEM_P (op1)) { - rtx r = gen_reg_rtx (mode); - m = adjust_address (op1, mode, 0); - emit_move_insn (r, m); - m = adjust_address (op1, mode, 16); - r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m); - emit_move_insn (op0, r); + if (TARGET_AVX256_SPLIT_UNALIGNED_LOAD) + { + rtx r = gen_reg_rtx (mode); + m = adjust_address (op1, mode, 0); + emit_move_insn (r, m); + m = adjust_address (op1, mode, 16); + r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m); + emit_move_insn (op0, r); + } + else + emit_insn (load_unaligned (op0, op1)); } - else if (MEM_P (op0) TARGET_AVX256_SPLIT_UNALIGNED_STORE) + else if (MEM_P (op0)) { - m = adjust_address (op0, mode, 0); - emit_insn (extract (m, op1, const0_rtx)); - m = adjust_address (op0, mode, 16); - emit_insn (extract (m, op1, const1_rtx)); + if (TARGET_AVX256_SPLIT_UNALIGNED_STORE) + { + m = adjust_address (op0, mode, 0); + emit_insn (extract (m, op1, const0_rtx)); + m = adjust_address (op0, mode, 16); + emit_insn (extract (m, op1, const1_rtx)); + } + else + emit_insn (store_unaligned (op0, op1)); } else -emit_insn (move_unaligned (op0, op1)); +gcc_unreachable (); } /* Implement the movmisalign patterns for SSE. Non-SSE modes go @@ -16195,7 +16209,7 @@ ix86_expand_vector_move_misalign (enum machine_mod op0 = gen_lowpart (V16QImode, op0); op1 = gen_lowpart (V16QImode, op1); /* We will eventually emit movups based on insn attributes. */ - emit_insn (gen_sse2_movdqu (op0, op1)); + emit_insn (gen_sse2_loaddqu (op0, op1)); } else if (TARGET_SSE2 mode == V2DFmode) { @@ -16207,7 +16221,7 @@ ix86_expand_vector_move_misalign (enum machine_mod || optimize_function_for_size_p (cfun)) { /* We will eventually
Re: encoding all aliases options in .opt files
On 12 October 2012 17:18, Joseph S. Myers jos...@codesourcery.com wrote: On Fri, 12 Oct 2012, Manuel López-Ibáñez wrote: I am trying to encode the relationship between Wstrict-aliasing and Wstrict-aliasing= in the .opt files, and the same for Wstrict-overflow and Wstrict-overflow=. However, the parameters of Alias() are taken as strings, so we get 3 and WARN_STRICT_OVERFLOW_CONDITIONAL. Do you have a suggestion how to handle this? I don't see what the problem is supposed to be. Yes, the arguments are strings - they describe how one option that the user might pass on the command line is exactly equivalent to another that they might pass (up to and including the latter form of the option being the one that is used when matching specs and multilibs, because the driver does the translation, including working out the canonical textual form of the option, before processing specs against the command line). If -Wstrict-overflow on the command line is equivalent to -Wstrict-overflow=WARN_STRICT_OVERFLOW_CONDITIONAL on the command line, then specify WARN_STRICT_OVERFLOW_CONDITIONAL in the parameters. If it's equivalent to something else, specify something else. My next step would be to handle aliases also when internally generating options, so I can add EnabledBy(Wall) to Wstrict-aliasing. Does this sound ok to you? That seems reasonable. OK. The attached patch implements this. Does the approach look ok? I will write changelog and more comments if it seems reasonable. One thing I don't like is that now we print [-Werror=strict-aliasing=] in diagnostics. Two questions: 1) Are references allowed now that C++ is the default? 2) While fixing this, I was thinking: Why the difference between Joined and Separate? Why not make every function that takes an argument work with and without '=' and with separate argument. It seems we could remove a lot of options this way. What I am proposing is, instead of: aux-info Common Separate Var(aux_info_file_name) -aux-info fileEmit declaration information into file aux-info= Common Joined Alias(aux-info) simply aux-info Common RequiredArgument Var(aux_info_file_name) -aux-info fileEmit declaration information into file and all of these would be accepted: aux-infoFILE /* we could accept this to be compatible with some options like -B */ aux-info=FILE aux-info FILE We have dozens of options like these. Moreover, we could replace: Wstrict-aliasing Common Alias(Wstrict-aliasing=, 3, 0) Warning Warn about code which might break strict aliasing rules Wstrict-aliasing= Common Joined RejectNegative UInteger Var(warn_strict_aliasing) Warning Warn about code which might break strict aliasing rules with: Wstrict-aliasing Common MissingArgument(3) UInteger Var(warn_strict_aliasing) Warning Warn about code which might break strict aliasing rules and all of these will be accepted: Wstrict-aliasing Wno-strict-aliasing Wstrict-aliasing2 Wstrict-aliasing=2 Wstrict-aliasing 2 Werror=strict-aliasing Wno-error=strict-aliasing Cheers, Manuel. lang-enabled-by-with-args.diff Description: Binary data
Re: Add usage documentation for hash_table
On 10/12/12, Diego Novillo dnovi...@google.com wrote: Add usage documentation for hash_table. Andrew, does this help? Lawrence, I think I've gotten the details right, but please confirm. The patch merges the descriptor class with the element class, which we do not currently do and which I don't think we should do. If that class is used in a tree, it would be illegal. I'll work with Diego to address the issue. Tested by re-building stage 1. * hash-table.h: Add usage documentation. Tidy formatting. diff --git a/gcc/hash-table.h b/gcc/hash-table.h index 3aa66a7..0c51be6 100644 --- a/gcc/hash-table.h +++ b/gcc/hash-table.h @@ -21,8 +21,121 @@ along with GCC; see the file COPYING3. If not see /* This file implements a typed hash table. - The implementation borrows from libiberty's hashtab. */ + The implementation borrows from libiberty's htab_t. + Hash tables are instantiated with two type arguments: + + 1- Element: A type describing the elements in the table. + This type must provide 3 declarations: + + - A typedef to create the type Element::T. This is the type + of the elements stored in the table. So, if you want the + hash table of MyType elements, declare 'typedef MyType T;'. + + - A function named 'hash' that takes a pointer to Element::T + and returns a hashval_t value. This is the hashing + function. + + - A function named 'equal' that takes two pointers to + Element::T and returns an int. This is the comparison + function. It should return nonzero if the elements are + equal, 0 otherwise. + + - A static function named 'remove' that takes an Element::T + pointer and frees the memory allocated by it. This is used + when individual elements of the table need to be disposed of + (e.g., when deleting a hash table, removing elements from + the table, etc). + + 2- Allocator: A template type implementing allocation and deallocation for + the table itself. + + This type takes as argument a type passed on by the hash table + allocation and deallocation functions. It must provide four + static functions: + + - Allocator::control_alloc: This allocates the control data + blocks for the table. + +- Allocator::control_free: This frees the control data blocks + for the table. + +- Allocator::data_alloc: This allocates the data elements in + the table. + + - Allocator::data_free: This deallocates the data elements in + the table. + + In general, you will not need to provide your own Allocator type. + By default, hash tables will use the class xcallocator, which uses + malloc/free for allocation. + + Additionally, this file provides two common types for implementing + element removal: + + - typed_free_remove: it implements the method 'remove' to call + free(). + + - typed_noop_remove: it implements the method 'remove' to do + nothing. + + To use either of these removal strategies, simply make your type a + derived class of one of these two. + + Example usage: + + 1- A hash table for MyType: + + // Derive from typed_noop_remove so element removal does nothing. + class MyType : typed_noop_removeMyType + { + int f1; + OtherType f2; + + // Hash table support. Need a typedef and 2 static functions. + + // 'T' is the type used in all the hash table functions. + typedef MyType T; + + // The hashing function. 'T' and 'MyType' are equivalent here. + static inline hashval_t hash (const MyType *); + + // The equality function. 'T' and 'MyType' are equivalent here. + static inline int equal (const MyType *, const MyType *); + }; + + inline hashval_t + MyType::hash (const MyType *e) + { ... compute and return a hash value for E ... } + + inline int + MyType::equal (const MyType *p1, const MyType *p2) + { ... compare P1 vs P2. Return 1 if they are the same ... } + + + Note that since MyType derives from typed_noop_removeMyType, it does not + need to provide a 'remove' function. It inherits it from + typed_noop_remove. + + To instantiate a hash table for MyType: + + hash_tableMyType mytype_hash; + + You can then used any of the functions in hash_table's public interface. + See hash_table for details. The interface is very similar to libiberty's + htab_t. + + + 2- A hash table of pointers. + + This file provides the template type 'pointer_hash' which can be + used to create hash tables of pointers to any type. This class uses + the same hashing function used by libiberty's hash_pointer. + + To create a hash table of pointers to MyType: + + hash_table pointer_hash MyType ptr_htab; +*/ #ifndef TYPED_HASHTAB_H #define TYPED_HASHTAB_H @@ -71,7 +184,7 @@ xcallocator
Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure
On Fri, Oct 12, 2012 at 10:59 AM, Paolo Carlini paolo.carl...@oracle.com wrote: On 10/12/2012 04:20 PM, Pavel Chupin wrote: Please see attached patch (applicable after revert). I've moved libgcc libstdc++ common configure thread header chunk into separate gthr.m4. Could you please try it on AIX? Is it OK for trunk? Looks Ok. If David can test is successfully on AIX I can approve it. I was able to bootstrap successfully with the patch. Thanks, David
Re: [PATCH] Fix gcov handling directories with periods
On Sat, Oct 13, 2012 at 11:23 AM, Andreas Schwab sch...@linux-m68k.org wrote: Ian Lance Taylor i...@google.com writes: Why do you need the CONST_CAST? strrchr is a standard function and it takes const char * as the first argument. There is other code in gcc that calls strrchr with a const char * argument. strrchr is overloaded as const and non-const in C++. We need the non-const version. Oh yeah. Really we should overload lbasename the same way. Suppose you drop this into include/libiberty.h: #ifdef __cplusplus inline char *lbasename(char *s) { return const_castchar*(lbasename (s)); } #endif I'll preapprove that if it works. Ian
Re: encoding all aliases options in .opt files
On Sat, Oct 13, 2012 at 11:33 AM, Manuel López-Ibáñez lopeziba...@gmail.com wrote: 1) Are references allowed now that C++ is the default? I'm not sure we addressed those in the coding conventions. I would like to say: 1) const references are always fine; 2) non-const references as local variables are always fine; 3) non-const references as function parameters should never be used without a really excellent reason. The reason is simply that using non-const references as function parameters means that arguments passed to a function can change without any indication, and that is potentially quite confusing. Adding an before an output argument is easy enough, and we're all used to it. Ian
Re: Add usage documentation for hash_table
On 2012-10-13 14:40 , Lawrence Crowl wrote: On 10/12/12, Diego Novillo dnovi...@google.com wrote: Add usage documentation for hash_table. Andrew, does this help? Lawrence, I think I've gotten the details right, but please confirm. The patch merges the descriptor class with the element class, which we do not currently do and which I don't think we should do. If that class is used in a tree, it would be illegal. You mean I should s/Element/Descriptor/? Done. I included the formatting changes in the doc patch, but I can easily split the patch if you prefer. OK with those changes? Diego.
Re: [PATCH] Fix gcov handling directories with periods
Ian Lance Taylor i...@google.com writes: Suppose you drop this into include/libiberty.h: #ifdef __cplusplus inline char *lbasename(char *s) { return const_castchar*(lbasename (s)); } #endif That doesn't work: ../../gcc/libcpp/../include/libiberty.h: In function ‘char* lbasename(char*)’: ../../gcc/libcpp/../include/libiberty.h:123:31: error: declaration of C function ‘char* lbasename(char*)’ conflicts with ../../gcc/libcpp/../include/libiberty.h:121:20: error: previous declaration ‘const char* lbasename(const char*)’ here Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Fix PR rtl-optimization/54871
This is the execution failure of gfortran.dg/vector_subscript_1.f90 on SPARC/Solaris at -O3 -funroll-loops. The loop2_unroll dump of the reduced testcase reads: Loop 3 is simple: simple exit 10 - 12 number of iterations: (const_int -1 [0x]) upper bound: 1073741823 realistic bound: -1 The number of iterations is of course bogus. The problem is in simplify_using_initial_values: it uses the condition in: (insn 105 104 106 6 (set (reg:CC 100 %icc) (compare:CC (reg:SI 153 [ D.1086 ]) (reg:SI 229 [ D.1086 ]))) vector_subscript_1.f90:21 1 {*cmpsi_insn} (expr_list:REG_DEAD (reg:SI 229 [ D.1086 ]) (nil))) to record an equivalence between (reg:SI 153) and (reg:SI 229) and then: (insn 179 101 103 6 (set (reg:SI 229 [ D.1086 ]) (reg:SI 245 [ D.1086 ])) vector_subscript_1.f90:21 -1 (nil)) plus the equivalence to simplify the niter expression. But there is: (insn 104 103 105 6 (set (reg:SI 229 [ D.1086 ]) (if_then_else:SI (ge:CC (reg:CC 100 %icc) (const_int 0 [0])) (reg:SI 132 [ D.1086 ]) (reg:SI 245 [ D.1086 ]))) vector_subscript_1.f90:21 105 {*movsi_cc_v9} (expr_list:REG_EQUAL (if_then_else:SI (ge:CC (reg:CC 100 %icc) (const_int 0 [0])) (reg:SI 132 [ D.1086 ]) (const_int 0 [0])) (expr_list:REG_DEAD (reg:SI 132 [ D.1086 ]) (expr_list:REG_DEAD (reg:CC 100 %icc) (nil) between the two above instructions, which invalidates the equivalence. So the equivalence should have been pruned when insn 104 was processed. The patch also does a bit of cosmetic cleanup in loop-unroll.c, for example in the dump file. We have in the loop2_unroll dump of the original testcase: ;; *** Considering loop 3 *** ;; Considering unrolling loop with constant number of iterations ;; max_unroll 5 (6 copies, initial 5). ;; Decided to unroll the constant times rolling loop, 4 times. Question for the reader: how many times is the loop unrolled? 4, 5 or 6? :-) In fact, the max_unroll is always equal to the times + 1 and initial is an internal initial value for max_unroll. Morever, this max_unroll is _not_ equal to the max_unroll in unroll_loop_constant_iterations: unsigned max_unroll = loop-lpt_decision.times; which is equal to the times instead. And we also have comments like: /* Unroll LOOP with constant number of iterations LOOP-LPT_DECISION.TIMES + 1 times. so LOOP-LPT_DECISION.TIMES (the times above) isn't really what one might think it is. In the end, this is mightily confusing. Tested on x86_64-suse-linux and SPARC/Solaris, applied on the mainline. 2012-10-13 Eric Botcazou ebotca...@adacore.com PR rtl-optimization/54871 * loop-iv.c (simplify_using_initial_values): When scanning previous basic blocks, prune the recorded conditions if the current insn was not used to make a replacement. * loop-unroll.c (decide_unroll_constant_iterations): Clean up message. (unroll_loop_constant_iterations): Clarify head comment. (decide_unroll_runtime_iterations): Clean up message. (unroll_loop_runtime_iterations): Clarify head comment. (decide_peel_simple): Clean up message. (peel_loop_simple): Clarify head comment. (decide_unroll_stupid): Clean up message. (unroll_loop_stupid): Clarify head comment. -- Eric BotcazouIndex: loop-iv.c === --- loop-iv.c (revision 192353) +++ loop-iv.c (working copy) @@ -2004,11 +2004,30 @@ simplify_using_initial_values (struct lo } } else - /* If we did not use this insn to make a replacement, any overlap - between stores in this insn and our expression will cause the - expression to become invalid. */ - if (for_each_rtx (expr, altered_reg_used, this_altered)) - goto out; + { + rtx *pnote, *pnote_next; + + /* If we did not use this insn to make a replacement, any overlap + between stores in this insn and our expression will cause the + expression to become invalid. */ + if (for_each_rtx (expr, altered_reg_used, this_altered)) + goto out; + + /* Likewise for the conditions. */ + for (pnote = cond_list; *pnote; pnote = pnote_next) + { + rtx note = *pnote; + rtx old_cond = XEXP (note, 0); + + pnote_next = XEXP (note, 1); + if (for_each_rtx (old_cond, altered_reg_used, this_altered)) + { + *pnote = *pnote_next; + pnote_next = pnote; + free_EXPR_LIST_node (note); + } + } + } if (CONSTANT_P (*expr)) goto out; Index: loop-unroll.c === --- loop-unroll.c (revision 192353) +++ loop-unroll.c (working copy) @@ -602,26 +602,21 @@ decide_unroll_constant_iterations (struc } } - if (dump_file) -fprintf (dump_file, ;; max_unroll %d (%d
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
IRA should be using the DF-LIVE sets, which are smaller than the DF-LR sets when they are available (typically at O2 and above). The proper sets can be conveniently accessed using the df_get_live_[in,out] functions which use DF-LIVE if it is available and fall back to DF-LR if it is not. [...] At optimize1 the DF_LIVE problem is already always computed. I think the idea here was to *remove* the DF_LIVE problem at -O1, which is also part of the attached patch. (Perhaps we should even simply not add the DF_LIVE problem at -O1, but that can be done in a follow-up patch if necessary.) So, in the end, does the patch enable DF_LIVE at -O1 or not? There seems to be a contradiction between the subject and the body of the message. If yes, perhaps an acceptable compromise would be to keep things unchanged at -O1. -- Eric Botcazou
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
On Sat, Oct 13, 2012 at 10:38 PM, Eric Botcazou ebotca...@adacore.com wrote: So, in the end, does the patch enable DF_LIVE at -O1 or not? There seems to be a contradiction between the subject and the body of the message. If yes, perhaps an acceptable compromise would be to keep things unchanged at -O1. Eh, right. When I wrote the e-mail, I was still under the impression that IRA only has DF_LIVE available at -O2 and higher, and I only discovered that DF_LIVE is added if optimize==1 just before submitting the patch. So my patch as submitted should have said ... (for -O1 and higher). I think it would be a good idea to keep things unchanged at -O1. For that, the patch needs a few minor modifications (remove calls to df_live_add_problem and make some code to update DF_LIVE_{IN,OUT} conditional). I can prepare an updated patch for that, if you think that's best. Ciao! Steven
Re: Make try_unroll_loop_completely to use loop bounds recorded
On Fri, 12 Oct 2012, Jan Hubicka wrote: * f95-lang.c (gfc_init_builtin_functions): Build __builtin_unreachable. I wonder if other languages need similar adjustment? I also wondered ;) Only Fortran triggered, I will take a look. + /* Now destroy the loop. First try to do so by cancelling the + patch from exit conditional if we identified good candidate. + you mean 'path' here, right? Similar in other places. Yeah. + /* Unloop destroys the latch edge. */ + unloop (loop, irred_invalidated); + if (irred_invalidated) + mark_irreducible_loops (); this makes the whole thing quadratic in the number of loops. Please pass down a flag and handle this in the place we update SSA form (thus, once per function / unroll iteration). This was in my first version of the patch. Then I noticed that remove_path also relies on irreducible loops to be computed and does the exactly same logic as I do. So we would need to extend the irred_invalidated to remove_path as well. Well, then do that. Or at least constrain the number of loops that are updated in mark_irreducible_loops (though that would still be quadratic in the loop depth then?). We solved scalability issues of cunroll for 4.8, so adding new ones isn't going to fly. Well, limiting number of mark_irreducible_loops updates is probably quite easy to do. unloop is returning true just in quite corner cases and actually we do have this quadratic complexity issue in the RTL unroller forever. Or provide a more optimial implementation that only considers parent loops of 'loop' (even that is excessive though, quadratic in the loop depth). - FOR_EACH_LOOP (li, loop, 0) + for (fel_init (li, next, LI_ONLY_INNERMOST); next;) FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST) but I'm sure that IV canonicalization should happen for all loops. So, why do you need this change? Esp. we should get rid of single-iteration outer loops here, too. FOR_EACH_LOOP seems to assume that loops are not cancelled under its hand. I was running into case where we cancelled inner loop but did not update SSA and tried to estimate number of iterations in the outer loop (now also innermost) that went into infinite loop. Well, that's not true (it uses fel_* stuff), FOR_EACH_LOOP constructs a vector of loops in the desired order it walks over. As for estimation I go for next loop before possibly cancelling it, while FOR_EACH_LOOP will go from possibly cancelled loop to next. you probably need to invalidate the scev cache? The current logic I do not think the _by_eval function uses much of scev cache. Here it was simply walking PHI from infinite loop that was artefact of earlier update. explicitely walks all innermost loops and then updates, you cannot walk an outer loop of an inner loop that changed without updating SSA form for example. I think the fel terators will do it when you are cancelling the loops on the go. I will dig out into this more. Honza Richard. Honza - FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST) + for (fel_init (li, next, 0); next;) see above - FOR_EACH_LOOP (li, loop, 0) Otherwise looks ok. Please update and re-post. Thanks, Richard. Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c === --- testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -fdump-tree-cunroll-details } */ +int a[2]; +test(int c) +{ + int i; + for (i=0;ic;i++) +a[i]=5; +} +/* Array bounds says the loop will not roll much. */ +/* { dg-final { scan-tree-dump-not Unrolled loop 1 completely (duplicated 1 times). Last iteration exit edge was proved true. cunroll} } */ +/* { dg-final { cleanup-tree-dump cunroll } } */ Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c === --- testsuite/gcc.dg/tree-ssa/cunroll-2.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/cunroll-2.c (revision 0) @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -fdump-tree-cunroll-details } */ +int a[2]; +test(int c) +{ + int i; + for (i=0;ic;i++) +{ + a[i]=5; + if (test2()) + return; +} +} +/* We are not able to get rid of the final conditional because the loop has two exits. */ +/* { dg-final { scan-tree-dump-not Unrolled loop 1 completely (duplicated 2 times). cunroll} } */ +/* { dg-final { cleanup-tree-dump cunroll } } */ Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
I think it would be a good idea to keep things unchanged at -O1. For that, the patch needs a few minor modifications (remove calls to df_live_add_problem and make some code to update DF_LIVE_{IN,OUT} conditional). I can prepare an updated patch for that, if you think that's best. That was just a proposal to be put on the table. :-) Others, especially Vlad of course, have the final say. And I somewhat sympathize with the cautious approach here, because we were forced to downgrade from DF_LIVE to DF_LR for resource.c at some point (but of course it's resource.c so...). -- Eric Botcazou
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
On 12-10-13 11:37 AM, Steven Bosscher wrote: On Sat, Oct 13, 2012 at 5:12 PM, Vladimir Makarov wrote: On 12-10-13 9:40 AM, Steven Bosscher wrote: Hello, This fixes the long-standing enhancement request to use DF_LIVE in IRA. To quote the first comment in the PR: IRA should be using the DF-LIVE sets, which are smaller than the DF-LR sets when they are available (typically at O2 and above). The proper sets can be conveniently accessed using the df_get_live_[in,out] functions which use DF-LIVE if it is available and fall back to DF-LR if it is not. I thought that DF_LIVE wasn't available at -O1 in IRA, but interestingly ira.c:ira() adds it to the DF-problems list in that case already: $ grep -C3 -n df_live_add ira.c 4160- 4161- if (optimize == 1) 4162-{ 4163: df_live_add_problem (); 4164- df_live_set_all_dirty (); 4165-} 4166-#ifdef ENABLE_CHECKING So DF_LIVE is already being computed for IRA. It's just not being used because the DF_LR_{IN,OUT} macros are used instead of the df_get_live_{in,out} functions. With my patch, the DF_LIVE sets are used, which results in marginally faster compile times for my set of cc1-i files on power64 and on x86_64, as well as smaller code. There's also a good speedup for the PR54146 test case. Only reload.c still uses the DF_LR_OUT. The comment in find_dummy_reload() suggests this is intentional. I also fond a bug in IRA: 4392- touch the artificial uses and defs. */ 4393- df_finish_pass (true); 4394- if (optimize 1) 4395:df_live_add_problem (); 4396- df_scan_alloc (NULL); 4397- df_scan_blocks (); 4398- At optimize1 the DF_LIVE problem is already always computed. I think the idea here was to *remove* the DF_LIVE problem at -O1, which is also part of the attached patch. (Perhaps we should even simply not add the DF_LIVE problem at -O1, but that can be done in a follow-up patch if necessary.) Bootstrapped and tested (with default checking and with release checking) on powerpc64-unknown-linux-gnu and x86_64-unknown-linux-gnu. OK for trunk? LIVE is not used on purpose. I added LIVE usage to the old RA about 10 years ago (it was before DF-infrastructure). Everything was ok until, somebody reported compiler crash on semantically wrong program (something with usage of uninitialized variable). After that I removed this code. GCC was a completely different compiler 10 years ago. Handling of uninitialized variables is radically different since tree-ssa, and also the resource allocation side of the compiler has essentially been rewritten (by you :-). Whatever broke in the compiler 10 years ago may not be relevant anymore today. I can't find your work, or the reported problem, in the gcc mailing list archives. Did you add a test case for the problem, and if so, where can I find it? Probably, that is also a reason why reload uses LR not LIVE. The reason why reload uses DF_LR is PR20973, as documented in the code. (And FWIW, that bug doesn't reproduce even with GCC 4.0 if you add the transformations of pass_initialize_regs to it, that pass exists to avoid uninitialized regs hazards by making them initialized.) So I'd like to wait a few weeks. When I have more time, I am going to reproduce this test suite and look how IRA behaves on it. I don't think that's giving this patch a fair chance. In a few week's time, stage1 will be over. You may not have time, but that doesn't mean other developers should wait. If you have a test case that fails with my patch, I'll gladly spend my own time on it to see if it can be fixed. My patch was properly tested on two primary targets and passes all tests in the test suite, it improves the compiler now, and it addresses a PR that you've basically ignored for 4 years. I kindly ask you to reconsider your position :-) I guess you are right. A lot of things are changed. DF is also different from what I used. Other passes also uses LIVE sets. Ok for the idea. If we have a problem later, we could fix it. I'll look at the next version of the patch when you send it to give your the final approval. Thanks for your attention to IRA. By the way, I found the reported problem: http://gcc.gnu.org/ml/gcc-patches/2004-06/msg00533.html But it occurred on s390.
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
On Sat, Oct 13, 2012 at 11:05 PM, Eric Botcazou wrote: I think it would be a good idea to keep things unchanged at -O1. For that, the patch needs a few minor modifications (remove calls to df_live_add_problem and make some code to update DF_LIVE_{IN,OUT} conditional). I can prepare an updated patch for that, if you think that's best. That was just a proposal to be put on the table. :-) Others, especially Vlad of course, have the final say. And I somewhat sympathize with the cautious approach here, because we were forced to downgrade from DF_LIVE to DF_LR for resource.c at some point (but of course it's resource.c so...). Right, that was for PR40710 (http://gcc.gnu.org/ml/gcc/2009-07/msg00241.html) because resource.c computes its own idea of liveness and it uses a similar definition as the DF_LR problem. Using incompatible views of liveness is not a good idea... For IRA it doesn't matter whether you use DF_LR or DF_LIVE, as long as you use one or the other consistently. Ciao! Steven
[PATCH, alpha]: Trivial alpha.md macroizations, part 5
Hello! 2012-10-13 Uros Bizjak ubiz...@gmail.com * config/alpha/alpha.md (I24MODE): New mode iterator. (any_divmod): New code iterator. (codesi3): Macroize expander from {div,mod,udiv,umod}si3 using any_divmod code iterator. (codesi3): Macroize expander from {div,mod,udiv,umod}di3 using any_divmod code iterator. (extendqimode2): Macroize insn from extendqi{hi,si}2 using I24MODE mode iterator. (unaligned_storemode): Macroize expander from unaligned_store{qi,hi} using I12MODE mode iterator. (movmode): Macroize expander from mov{qi,hi} using I12MODE mode iterator. Tested on alphaev68-linux-gnu, committed to mainline SVN. Uros. Index: alpha.md === --- alpha.md(revision 192421) +++ alpha.md(working copy) @@ -93,6 +93,7 @@ (define_mode_iterator IMODE [QI HI SI DI]) (define_mode_iterator I12MODE [QI HI]) (define_mode_iterator I124MODE [QI HI SI]) +(define_mode_iterator I24MODE [HI SI]) (define_mode_iterator I248MODE [HI SI DI]) (define_mode_iterator I48MODE [SI DI]) @@ -734,31 +735,16 @@ ;; problem. Is it worth the complication here to eliminate the sign ;; extension? -(define_expand divsi3 - [(set (match_dup 3) - (sign_extend:DI (match_operand:SI 1 nonimmediate_operand ))) - (set (match_dup 4) - (sign_extend:DI (match_operand:SI 2 nonimmediate_operand ))) - (parallel [(set (match_dup 5) - (sign_extend:DI (div:SI (match_dup 3) (match_dup 4 - (clobber (reg:DI 23)) - (clobber (reg:DI 28))]) - (set (match_operand:SI 0 nonimmediate_operand ) - (subreg:SI (match_dup 5) 0))] - TARGET_ABI_OSF -{ - operands[3] = gen_reg_rtx (DImode); - operands[4] = gen_reg_rtx (DImode); - operands[5] = gen_reg_rtx (DImode); -}) +(define_code_iterator any_divmod [div mod udiv umod]) -(define_expand udivsi3 +(define_expand codesi3 [(set (match_dup 3) (sign_extend:DI (match_operand:SI 1 nonimmediate_operand ))) (set (match_dup 4) (sign_extend:DI (match_operand:SI 2 nonimmediate_operand ))) (parallel [(set (match_dup 5) - (sign_extend:DI (udiv:SI (match_dup 3) (match_dup 4 + (sign_extend:DI + (any_divmod:SI (match_dup 3) (match_dup 4 (clobber (reg:DI 23)) (clobber (reg:DI 28))]) (set (match_operand:SI 0 nonimmediate_operand ) @@ -770,78 +756,16 @@ operands[5] = gen_reg_rtx (DImode); }) -(define_expand modsi3 - [(set (match_dup 3) - (sign_extend:DI (match_operand:SI 1 nonimmediate_operand ))) - (set (match_dup 4) - (sign_extend:DI (match_operand:SI 2 nonimmediate_operand ))) - (parallel [(set (match_dup 5) - (sign_extend:DI (mod:SI (match_dup 3) (match_dup 4 - (clobber (reg:DI 23)) - (clobber (reg:DI 28))]) - (set (match_operand:SI 0 nonimmediate_operand ) - (subreg:SI (match_dup 5) 0))] - TARGET_ABI_OSF -{ - operands[3] = gen_reg_rtx (DImode); - operands[4] = gen_reg_rtx (DImode); - operands[5] = gen_reg_rtx (DImode); -}) - -(define_expand umodsi3 - [(set (match_dup 3) - (sign_extend:DI (match_operand:SI 1 nonimmediate_operand ))) - (set (match_dup 4) - (sign_extend:DI (match_operand:SI 2 nonimmediate_operand ))) - (parallel [(set (match_dup 5) - (sign_extend:DI (umod:SI (match_dup 3) (match_dup 4 - (clobber (reg:DI 23)) - (clobber (reg:DI 28))]) - (set (match_operand:SI 0 nonimmediate_operand ) - (subreg:SI (match_dup 5) 0))] - TARGET_ABI_OSF -{ - operands[3] = gen_reg_rtx (DImode); - operands[4] = gen_reg_rtx (DImode); - operands[5] = gen_reg_rtx (DImode); -}) - -(define_expand divdi3 +(define_expand codedi3 [(parallel [(set (match_operand:DI 0 register_operand ) - (div:DI (match_operand:DI 1 register_operand ) - (match_operand:DI 2 register_operand ))) + (any_divmod:DI + (match_operand:DI 1 register_operand ) + (match_operand:DI 2 register_operand ))) (clobber (reg:DI 23)) (clobber (reg:DI 28))])] TARGET_ABI_OSF ) -(define_expand udivdi3 - [(parallel [(set (match_operand:DI 0 register_operand ) - (udiv:DI (match_operand:DI 1 register_operand ) - (match_operand:DI 2 register_operand ))) - (clobber (reg:DI 23)) - (clobber (reg:DI 28))])] - TARGET_ABI_OSF - ) - -(define_expand moddi3 - [(parallel [(set (match_operand:DI 0 register_operand ) - (mod:DI (match_operand:DI 1 register_operand ) - (match_operand:DI 2 register_operand ))) - (clobber (reg:DI 23)) - (clobber (reg:DI 28))])] - TARGET_ABI_OSF - ) - -(define_expand umoddi3 - [(parallel [(set
Re: [RFC PATCH] Add support for sparc compare-and-branch.
From: Eric Botcazou ebotca...@adacore.com Date: Sat, 13 Oct 2012 15:47:11 +0200 The trouble area, and where I need help from either Rainer or Eric, is the Solaris2 bits. I think we need to move the Solaris assembler stuff over to a model where it passes: -m{32,64} -xarch=sparcFOO instead of using the v8plusX stuff to indicate 32bit. And that's the direction I tried to move in here. The only versions of the Solaris assembler I have access to only support v8plusX according to the man page. Has that changed recently? For the older stuff I mean doing something like -m32 -xarch=v9X * config/sparc/sparc.opt (mvis4): New option. * config/sparc/sparc-c.c (sparc_target_macros): When TARGET_VIS4, define __VIS__ to 0x400. What's the relationship between VIS4 and SPARC-T4 exactly? The above manual only speaks of VIS3 as far as I can see. And the CBcond instructions are not marked as belonging to VIS (3 or 4), so using -mvis4 for them seems strange. Why not make them depend on -mcpu=niagara4 instead? The current assembler in Solaris Studio (called 'fbe') calls this stuff sparc4 which I guess means SPARC-T4 and later. I'm just calling it VIS4 in GCC so that we can export intrinsics of, for example, the cryptographic instructions at some point using the __VIS__ version CPP tests.
[patch] PR54919 - fix variable expansion in RTL loop unrolling
Hello, Today appears to be RTL loop optimizer patch day, because here's another patch... The problem here is that variable expansion does not update REG_EQUAL notes when it performs replacement of the renamed register. I fixed this by using validate_replace_rtx_group(). There is already code in analyze_insn_to_expand_var() to make sure that the to-be-replaced register is only used to accumulate into, so I think that using validate_replace_rtx_group is safe. Could use a 2nd pair of eyes to make sure, though. Tested with a bootstrapped compiler. Test coverage isn't great, because variable expansion is not enabled by default. OK for trunk? Ciao! Steven pr54919.diff Description: Binary data
Re: PR fortran/51727: make module files reproducible, question on C++ in gcc
On Sat, 13 Oct 2012, Diego Novillo wrote: On 2012-10-13 14:04 , Tobias Schlüter wrote: On 2012-10-13 20:00, Diego Novillo wrote: On 2012-10-13 09:26 , Tobias Schlüter wrote: first a question also to non-gfortraners: if I want to use std::map, where do I #include map? In system.h? No. Ada includes system.h in pure C code. Why not include it exactly where you need it? Ok, I was just wondering if there's a rule because a quick grep revealed no non-header source file that includes system headers. Joseph or others may have reason to create a system.hxx file or some such, but in principle I don't see why we shouldn't include std library headers as we need them. Joseph? The poisoning of various standard library functions that should not be used directly in GCC can be problematic if system headers, using those functions, are included after that poisoning. Thus in general it's better for system header includes to go in system.h, before the poisoning. In addition, where there is some autoconf configuration about whether to include a header, it's generally preferred to keep down the number of places with such a conditional on host features (encapsulating whatever features the header provides in some way so that other code can use them unconditionally). -- Joseph S. Myers jos...@codesourcery.com
Re: encoding all aliases options in .opt files
On Sat, 13 Oct 2012, Manuel L?pez-Ib??ez wrote: OK. The attached patch implements this. Does the approach look ok? I will write changelog and more comments if it seems reasonable. One Without the comments explaining the semantics of the new functions and their parameters, I'm not going to attempt to reverse-engineer what the approach in question is intended to be and so whether it's reasonable or not. 2) While fixing this, I was thinking: Why the difference between Joined and Separate? Why not make every function that takes an argument work with and without '=' and with separate argument. It seems we could remove a lot of options this way. What I am proposing is, instead of: I see no significant advantage to users in having all these different ways to pass options, and multiple disadvantages: * If you add yet more ways of passing options, then any future overhaul of option handling needs to preserve those in addition to all the others. The fewer ways there are, the fewer constraints on the implementation. * Options like this aux-infoFILE /* we could accept this to be compatible with some options like -B */ are liable to be a pain for anyone, seeing such an option, to work out what is the option name that they might search for to get more information, and what is the argument. * In general, if all users are using an option in the same form it's easier for them to tell that another person really is using the same option rather than something that might be different from the option they are used to. * This also may not work so well with JoinedOrMissing options. Wstrict-aliasing2 Shades of the old -gdwarf2 that once meant DWARF 1, debug level 2, not to be confused with -gdwarf-2. I don't think such cryptic forms should be encouraged. Wstrict-aliasing 2 Ambiguous with the existing -Wstrict-aliasing option followed by a linker input file called 2. -- Joseph S. Myers jos...@codesourcery.com
[patch][wwwdocs] gcc 4.8 changes - mention scalability improvements
Hello, This patch adds a short notice about some speed-ups in GCC 4.8 for extremely large functions (coming from the work done on PR54146 by several people). OK for the wwwdocs? Ciao! Steven Index: htdocs/gcc-4.8/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.44 diff -u -r1.44 changes.html --- htdocs/gcc-4.8/changes.html 9 Oct 2012 18:44:55 - 1.44 +++ htdocs/gcc-4.8/changes.html 13 Oct 2012 22:45:59 - @@ -65,10 +65,17 @@ level, and it makes PRE more aggressive. /li liThe struct reorg and matrix reorg optimizations (command-line -options code-fipa-struct-reorg/code and -code-fipa-matrix-reorg/code) have been removed. They did not -work correctly nor with link-time optimization (LTO), hence were only -applicable to programs consisting of a single translation unit./li + options code-fipa-struct-reorg/code and + code-fipa-matrix-reorg/code) have been removed. They did not + work correctly nor with link-time optimization (LTO), hence were only + applicable to programs consisting of a single translation unit. +/li +liSeveral scalability bottle-necks have been removed from GCC's + optimization passes. Compilation of extremely large functions, + e.g. due to the use of the codeflatten/code attribute in the + Eigen C++ linear algebra templates library, is significantly + faster than previous releases of GCC. +/li /ul
Re: Propagate profile counts during switch expansion
Ping. On Mon, Oct 8, 2012 at 5:46 PM, Easwaran Raman era...@google.com wrote: I have attached a revised patch. The updated ChangeLog is given below and I have responded to your comments inline: 2012-10-08 Easwaran Raman era...@google.com * optabs.c (emit_cmp_and_jump_insn_1): Add a new parameter to specificy the probability of taking the jump. (emit_cmp_and_jump_insns): Likewise. (expand_compare_and_swap_loop): Make the jump predicted not taken. * dojump.c (do_compare_rtx_and_jump): Remove the code attaching REG_BR_PROB note and pass probability to emit_cmp_and_jump_insns. * cfgbuild.c (compute_outgoing_frequencies): Do not guess outgoing probabilities for branches with more than two successors. * expr.c (emit_block_move_via_loop): Predict the loop backedge loop to be highly taken. (try_casesi): Pass the probability of jumping to the default label. (try_tablejump): Likewise. (do_tablejump): Likewise. * expr.h (try_tablejump): Add a new parameter. (try_casesi): Likewise. (emit_cmp_and_jump_insns): Add probability as default parameter with a default value of -1. * except.c (sjlj_emit_function_enter): Pass probability to emit_cmp_and_jump_insns. * stmt.c (case_node): Add new fields PROB and SUBTREE_PROB. (do_jump_if_equal): Pass probability for REG_BR_PROB note. (add_case_node): Pass estimated probability of jumping to the case label. (emit_case_decision_tree): Pass default_prob to emit_case_nodes. (get_outgoing_edge_probs): New function. (conditional_probability): Likewise. (reset_out_edges_aux): Likewise. (compute_cases_per_edge): Likewise. (emit_case_dispatch_table): Update probabilities of edges coming out of the switch statement. (expand_case): Compute and propagate default edge probability to emit_case_dispatch_table. (expand_sjlj_dispatch_table): Update calls to add_case_node and emit_case_dispatch_table. (balance_case_nodes): Update subtree_prob values. (emit_case_nodes): Compute edge probabilities and add pass them to emit_cmp_and_jump_insns. gcc/testsuite/ChangeLog: 2012-10-02 Easwaran Raman era...@google.com * gcc.dg/tree-prof/switch-case-1.c: New test case. * gcc.dg/tree-prof/switch-case-2.c: New test case. On Thu, Oct 4, 2012 at 6:19 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, This patch propagates the profile counts during RTL expansion. In many cases, there is no way to determine the exact count of an edge generated during the expansion. So this patch uses some simple heuristics to estimate the edge counts but ensures that the counts of the basic blocks corresponding to the cases are (nearly the) same as at the gimple level. Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for trunk? Index: gcc/expr.c === --- gcc/expr.c (revision 191879) +++ gcc/expr.c (working copy) @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin #ifdef PUSH_ROUNDING static void emit_single_push_insn (enum machine_mode, rtx, tree); #endif -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); static rtx const_vector_from_tree (tree); static void write_complex_part (rtx, rtx, bool); @@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree static void do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, - rtx default_label) + rtx default_label, int default_probability) Please document default_probability. Done. { rtx temp, vector; @@ -10910,9 +10910,17 @@ do_tablejump (rtx index, enum machine_mode mode, r the maximum value of the range. */ if (default_label) -emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, - default_label); +{ + emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, + default_label); + if (default_probability != -1) +{ + rtx jump_insn = get_last_insn(); + add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (default_probability)); +} +} dojump already does this kind of logic, but it is bit more cureful: emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, if_true_label); if (prob != -1 profile_status != PROFILE_ABSENT) { for (last = NEXT_INSN (last); last NEXT_INSN (last); last = NEXT_INSN (last)) if (JUMP_P (last)) break; if (last JUMP_P (last) ! NEXT_INSN (last) any_condjump_p (last)) { gcc_assert (!find_reg_note (last, REG_BR_PROB, 0)); add_reg_note (last, REG_BR_PROB, GEN_INT (prob)); } } What about making emit_cmp_and_jump_insns taking the probability argument and moving the code above
[C++ Patch] PR 17805
Hi, back in 2005 Alexandre worked on this issue up to the point that Mark approved a patch conditional to a couple of straightforward changes (see audit trail). Then nothing happened ;) Today I resurrected the patch, implemented the requests and in the process noticed that it wasn't really handling the single argument case, thus I also extended a bit the testcase. Tested x86_64-linux. (Finally) Ok? Thanks, Paolo. /// /cp 2012-10-14 Alexandre Oliva aol...@redhat.com Paolo Carlini paolo.carl...@oracle.com PR c++/17805 * call.c (build_new_op): Filter out operator functions that don't satisfy enum-conversion match requirements. /testsuite 2012-10-14 Alexandre Oliva aol...@redhat.com Paolo Carlini paolo.carl...@oracle.com PR c++/17805 * g++.dg/overload/operator6.C: New. Index: cp/call.c === --- cp/call.c (revision 192425) +++ cp/call.c (working copy) @@ -5043,6 +5043,11 @@ build_new_op_1 (location_t loc, enum tree_code cod NULL_TREE, arglist, NULL_TREE, NULL_TREE, false, NULL_TREE, NULL_TREE, flags, candidates, complain); + + args[0] = arg1; + args[1] = arg2; + args[2] = NULL_TREE; + /* Add class-member operators to the candidate set. */ if (CLASS_TYPE_P (TREE_TYPE (arg1))) { @@ -5062,11 +5067,50 @@ build_new_op_1 (location_t loc, enum tree_code cod BASELINK_ACCESS_BINFO (fns), flags, candidates, complain); } + /* Per 13.3.1.2/3, 2nd bullet, if no operand has a class type, then + only non-member functions that have type T1 or reference to + cv-qualified-opt T1 for the first argument, if the first argument + has an enumeration type, or T2 or reference to cv-qualified-opt + T2 for the second argument, if the the second argument has an + enumeration type. Filter out those that don't match. */ + else if (! arg2 || ! CLASS_TYPE_P (TREE_TYPE (arg2))) +{ + struct z_candidate **candp, **next; - args[0] = arg1; - args[1] = arg2; - args[2] = NULL_TREE; + for (candp = candidates; *candp; candp = next) + { + tree parmlist, parmtype; + int i, nargs = (arg2 ? 2 : 1); + cand = *candp; + next = cand-next; + + parmlist = TYPE_ARG_TYPES (TREE_TYPE (cand-fn)); + + for (i = 0; i nargs; ++i) + { + parmtype = TREE_VALUE (parmlist); + + if (TREE_CODE (parmtype) == REFERENCE_TYPE) + parmtype = TREE_TYPE (parmtype); + if (TREE_CODE (TREE_TYPE (args[i])) == ENUMERAL_TYPE + (same_type_ignoring_top_level_qualifiers_p + (TREE_TYPE (args[i]), parmtype))) + break; + + parmlist = TREE_CHAIN (parmlist); + } + + /* No argument has an appropriate type, so remove this +candidate function from the list. */ + if (i == nargs) + { + *candp = cand-next; + next = candp; + } + } +} + add_builtin_candidates (candidates, code, code2, fnname, args, flags, complain); Index: testsuite/g++.dg/overload/operator6.C === --- testsuite/g++.dg/overload/operator6.C (revision 0) +++ testsuite/g++.dg/overload/operator6.C (working copy) @@ -0,0 +1,27 @@ +// PR c++/17805 + +// Per 13.3.1.2/3 bullet 2, an operator function is not a candidate +// for overload resolution if neither argument is of class type and +// neither enumerator-typed argument gets an exact match, with or +// without reference binding, for the corresponding parameter. + +struct A +{ + A(int); + A(const char*); +}; + +bool operator==(const A, const A); +const A operator*(const A); + +enum E { e }; + +bool b1 = (e == ); // { dg-error no match } + +bool b2 = (A(1) == ); + +bool b3 = (e == A(1)); + +const A a1 = *e;// { dg-error no match } + +const A a2 = *A(1);