Re: [C++ Patch] PR 21682 (DR 565) (Take 2)
OK. Jason
[PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
Hi,

The gimple-ssa-strength-reduction pass handles CAND_REFs in order to find different MEM_REFs sharing a common part of their addressing expressions. If such MEM_REFs are found, the pass rewrites them so that more efficient addressing expressions are produced during the RTL passes. The pass analyzes the addressing expression in each MEM_REF to see if it can be formalized as follows:

  base:   MEM_REF (T1, C1)
  offset: MULT_EXPR (PLUS_EXPR (T2, C2), C3)
  bitpos: C4 * BITS_PER_UNIT

It then restructures it into the form:

  MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
           C1 + (C2 * C3) + C4)

Finally, it rewrites the MEM_REFs if two or more of them share a common (non-constant) part.

The problem is that it doesn't back-trace T2. If T2 is recorded as a CAND_ADD in the form T2' + C5, the MEM_REF should be restructured into:

  MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2', C3)),
           C1 + (C2 * C3) + C4 + (C5 * C3))

The patch also includes a test case to illustrate the problem. Bootstrapped and tested on x86/x86_64/arm-a15, is it ok?

Thanks.
bin

2013-09-02  Bin Cheng  bin.ch...@arm.com

	* gimple-ssa-strength-reduction.c (backtrace_base_for_ref): New.
	(restructure_reference): Call backtrace_base_for_ref.
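To make the motivation concrete, here is a hypothetical C fragment (separate from the test case attached to the patch): both stores index the same row of the array, so their address computations share the non-constant part and differ only by a constant, which is exactly the situation the pass rewrites.

```c
#include <assert.h>  /* only needed for the usage check below */

/* Hypothetical illustration of what CAND_REF handling targets: the
   two stores below compute addresses of the form
   a + i*200 + i*4 and a + i*200 + (i+1)*4, which share the
   non-constant part and differ only by a constant offset.  */
void set_pair (int a[][50], int i, int v)
{
  a[i][i] = v;      /* first MEM_REF */
  a[i][i + 1] = v;  /* same non-constant base, offset +4 bytes */
}
```

After strength reduction, the second address can be formed from the first by adding a constant instead of redoing the multiply.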
gcc/testsuite/ChangeLog
2013-09-02  Bin Cheng  bin.ch...@arm.com

	* gcc.dg/tree-ssa/slsr-39.c: New test.

Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c	(revision 0)
@@ -0,0 +1,26 @@
+/* Verify straight-line strength reduction for back-tracing
+   CAND_ADD for T2 in:
+
+     *PBASE:    T1
+     *POFFSET:  MULT_EXPR (T2, C3)
+     *PINDEX:   C1 + (C2 * C3) + C4  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i] [j++] = i;
+  a2 [i] [j++] = i;
+  a2 [i] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 4 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */
Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	(revision 202067)
+++ gcc/gimple-ssa-strength-reduction.c	(working copy)
@@ -750,6 +750,57 @@ slsr_process_phi (gimple phi, bool speed)
   add_cand_for_stmt (phi, c);
 }
 
+/* Given PBASE which is a pointer to tree, look up the defining
+   statement for it and check whether the candidate is in the
+   form of:
+
+     X = B + (1 * S), S is integer constant
+     X = B + (i * S), S is integer one
+
+   If so, set PBASE to the candidate's base_expr and return double
+   int (i * S).
+   Otherwise, just return double int zero.  */
+
+static double_int
+backtrace_base_for_ref (tree *pbase)
+{
+  tree base_in = *pbase;
+  slsr_cand_t base_cand;
+
+  STRIP_NOPS (base_in);
+  if (TREE_CODE (base_in) != SSA_NAME)
+    return tree_to_double_int (integer_zero_node);
+
+  base_cand = base_cand_from_table (base_in);
+
+  while (base_cand && base_cand->kind != CAND_PHI)
+    {
+      if (base_cand->kind == CAND_ADD
+	  && base_cand->index.is_one ()
+	  && TREE_CODE (base_cand->stride) == INTEGER_CST)
+	{
+	  /* X = B + (1 * S), S is integer constant.  */
+	  *pbase = base_cand->base_expr;
+	  return tree_to_double_int (base_cand->stride);
+	}
+      else if (base_cand->kind == CAND_ADD
+	       && TREE_CODE (base_cand->stride) == INTEGER_CST
+	       && integer_onep (base_cand->stride))
+	{
+	  /* X = B + (i * S), S is integer one.  */
+	  *pbase = base_cand->base_expr;
+	  return base_cand->index;
+	}
+
+      if (base_cand->next_interp)
+	base_cand = lookup_cand (base_cand->next_interp);
+      else
+	base_cand = NULL;
+    }
+
+  return tree_to_double_int (integer_zero_node);
+}
+
 /* Look for the following pattern:
 
     *PBASE:    MEM_REF (T1, C1)
@@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed)
     *PBASE:    T1
     *POFFSET:  MULT_EXPR (T2, C3)
-    *PINDEX:   C1 + (C2 * C3) + C4  */
+    *PINDEX:   C1 + (C2 * C3) + C4
 
+   When T2 is recorded by a CAND_ADD in the form of (T2' + C5), it
+   will be further restructured to:
+
+    *PBASE:    T1
+    *POFFSET:  MULT_EXPR (T2', C3)
+    *PINDEX:   C1 + (C2 * C3) + C4 + (C5 * C3)  */
+
 static bool
 restructure_reference (tree *pbase, tree *poffset, double_int *pindex,
		       tree *ptype)
@@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset,
   double_int index = *pindex;
   double_int bpu = double_int::from_uhwi (BITS_PER_UNIT);
   tree mult_op0, mult_op1, t1, t2, type;
-  double_int c1, c2, c3, c4;
+  double_int c1, c2, c3, c4, c5;
 
   if (!base
       || !offset
@@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree *poffset,
     }
   c4 = index.udiv
[PATCH GCC]Find auto-increment use both before and after candidate's increment in IVOPT
Hi,

For now set_autoinc_for_original_candidates only searches for auto-inc uses before the candidate's increment, causing pre-increment opportunities to be missed for original candidates. This is a straightforward fix: search after the candidate's increment too. The patch also includes a test case to illustrate the problem. Without the patch, the assembly of the test is:

foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r3, #:lower16:__ctype_ptr__
	ldrb	r2, [r0]	@ zero_extendqisi2
	movt	r3, #:upper16:__ctype_ptr__
	ldr	r1, [r3]
	adds	r3, r1, r2
	ldrb	r3, [r3, #1]	@ zero_extendqisi2
	lsls	r3, r3, #29
	bmi	.L2
	adds	r3, r0, #1
.L3:
	mov	r0, r3
	adds	r3, r3, #1
	ldrb	r2, [r0]	@ zero_extendqisi2
	add	r2, r2, r1
	ldrb	r2, [r2, #1]	@ zero_extendqisi2
	lsls	r2, r2, #29
	bpl	.L3
.L2:
	bx	lr
	.size	foo, .-foo

which can be optimized into the below:

foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r3, #:lower16:__ctype_ptr__
	ldrb	r1, [r0]	@ zero_extendqisi2
	movt	r3, #:upper16:__ctype_ptr__
	ldr	r2, [r3]
	adds	r3, r2, r1
	ldrb	r3, [r3, #1]	@ zero_extendqisi2
	lsls	r1, r3, #29
	bmi	.L2
.L3:
	ldrb	r3, [r0, #1]!	@ zero_extendqisi2
	add	r3, r3, r2
	ldrb	r3, [r3, #1]	@ zero_extendqisi2
	lsls	r3, r3, #29
	bpl	.L3
.L2:
	bx	lr
	.size	foo, .-foo

Bootstrapped and tested on arm a15, is it OK?

Thanks.
bin

2013-09-02  Bin Cheng  bin.ch...@arm.com

	* tree-ssa-loop-ivopts.c (set_autoinc_for_original_candidates):
	Find auto-increment use both before and after candidate.
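As an illustration of the kind of code affected (a hypothetical sketch, simpler than the attached test case), consider a loop where an address use of `p + 1` sits next to the increment of `p`; folding the increment into the load is what the pre-increment addressing mode (`ldrb r3, [r0, #1]!` in the optimized assembly above) achieves:

```c
#include <assert.h>  /* only needed for the usage check below */

/* Hypothetical sketch: the load of *(p + 1) is an address use that
   sits next to the increment of p.  When ivopts considers uses on
   both sides of the candidate increment, a backend with
   auto-increment addressing can fold the increment into the load,
   e.g. ldrb r3, [r0, #1]! on ARM.  */
const unsigned char *skip_zeros (const unsigned char *p)
{
  while (*(p + 1) == 0)  /* use of p + 1 ...           */
    p++;                 /* ... next to p's increment  */
  return p;
}
```

With input {5, 0, 0, 3} the function returns a pointer to the element just before the first non-zero byte after the start.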
gcc/testsuite/ChangeLog
2013-09-02  Bin Cheng  bin.ch...@arm.com

	* gcc.target/arm/ivopts-orig_biv-inc.c: New test.

Index: gcc/testsuite/gcc.target/arm/ivopts-orig_biv-inc.c
===================================================================
--- gcc/testsuite/gcc.target/arm/ivopts-orig_biv-inc.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/ivopts-orig_biv-inc.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+/* { dg-skip-if "" { arm_thumb1 } } */
+
+extern char *__ctype_ptr__;
+
+unsigned char * foo(unsigned char *ReadPtr)
+{
+
+  unsigned char c;
+
+  while (!(((__ctype_ptr__+sizeof(""[*ReadPtr]))[(int)(*ReadPtr)])&04) == (!(0)))
+    ReadPtr++;
+
+  return ReadPtr;
+}
+
+/* { dg-final { scan-tree-dump-times "original biv" 2 "ivopts" } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 200774)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -4876,22 +4876,36 @@ set_autoinc_for_original_candidates (struct ivopts
   for (i = 0; i < n_iv_cands (data); i++)
     {
       struct iv_cand *cand = iv_cand (data, i);
-      struct iv_use *closest = NULL;
+      struct iv_use *closest_before = NULL;
+      struct iv_use *closest_after = NULL;
       if (cand->pos != IP_ORIGINAL)
	continue;
+
       for (j = 0; j < n_iv_uses (data); j++)
	{
	  struct iv_use *use = iv_use (data, j);
	  unsigned uid = gimple_uid (use->stmt);
-	  if (gimple_bb (use->stmt) != gimple_bb (cand->incremented_at)
-	      || uid > gimple_uid (cand->incremented_at))
+
+	  if (gimple_bb (use->stmt) != gimple_bb (cand->incremented_at))
	    continue;
-	  if (closest == NULL || uid > gimple_uid (closest->stmt))
-	    closest = use;
+
+	  if (uid < gimple_uid (cand->incremented_at)
+	      && (closest_before == NULL
+		  || uid > gimple_uid (closest_before->stmt)))
+	    closest_before = use;
+
+	  if (uid > gimple_uid (cand->incremented_at)
+	      && (closest_after == NULL
+		  || uid < gimple_uid (closest_after->stmt)))
+	    closest_after = use;
	}
-      if (closest == NULL || !autoinc_possible_for_pair (data, closest, cand))
-	continue;
-      cand->ainc_use = closest;
+
+      if (closest_before != NULL
+	  && autoinc_possible_for_pair (data, closest_before, cand))
+	cand->ainc_use = closest_before;
+      else if (closest_after != NULL
+	       && autoinc_possible_for_pair (data, closest_after, cand))
+	cand->ainc_use = closest_after;
     }
 }
RE: [PATCH ARM]Refine scaled address expression on ARM
-----Original Message-----
From: Richard Earnshaw
Sent: Thursday, August 29, 2013 9:06 PM
To: Bin Cheng
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH ARM]Refine scaled address expression on ARM

On 28/08/13 08:00, bin.cheng wrote:

Hi,
This patch refines scaled address expression on ARM. It supports "base + index*scale" in arm_legitimate_address_outer_p. It also tries to legitimize "base + index * scale + offset" with "reg <- base + offset; reg + index * scale" by introducing thumb2_legitimize_address. For now function thumb2_legitimize_address is a kind of placeholder and just does the mentioned transformation by calling try_multiplier_address. Hoping we can improve it in the future.

With this patch:
1) base + index*scale is recognized.

That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form. So this shouldn't be necessary. Can you identify where this non-canonical form is being generated?

Oh, for now ivopt constructs "index*scale" to test whether the backend supports scaled addressing modes, which is not valid on ARM, so I was going to construct "base + index*scale" instead. Since "base + index * scale" is not canonical form, I will construct the canonical form and drop this part of the patch. Is the rest of this patch OK?

Thanks.
bin
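The addressing form under discussion can be seen in a trivial source fragment; this is an illustrative sketch (not part of the patch): on ARM, a load like the one below can typically use a scaled register offset such as `ldr r0, [r1, r2, lsl #2]`, i.e. base + index * scale.

```c
#include <assert.h>  /* only needed for the usage check below */

/* Illustrative only: an access whose address is base + index * scale.
   On ARM this can map to a scaled register-offset addressing mode,
   e.g. ldr r0, [r1, r2, lsl #2] with scale = sizeof (int) = 4.  */
int load_scaled (int *base, int index)
{
  return base[index];  /* address computed as base + index * 4 */
}
```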
patch to enable *.cc files in gengtype
Hello All,

The attached patch (for trunk svn rev 202160) to gcc/gengtype.c permits files named *.cc (by adding another rule) to be passed to gengtype, and gives a slightly more meaningful fatal error message when an unrecognized file name (e.g. *.cpp or something even wilder) is passed to gengtype. FWIW, this patch is imported from melt-branch svn rev 202160.

gcc/ChangeLog entry:

2013-09-02  Basile Starynkevitch  bas...@starynkevitch.net

	* gengtype.c (file_rules): Added rule for *.cc files.
	(get_output_file_with_visibility): Give fatal message when no
	rules found.

Ok for trunk?

Cheers
-- Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***

Index: gcc/gengtype.c
===================================================================
--- gcc/gengtype.c	(revision 202160)
+++ gcc/gengtype.c	(working copy)
@@ -2004,14 +2004,21 @@
     REG_EXTENDED, NULL_REGEX,
     "gt-objc-objc-map.h", "objc/objc-map.c", NULL_FRULACT },
 
-  /* General cases.  For header *.h and source *.c files, we need
-   * special actions to handle the language.  */
+  /* General cases.  For header *.h and source *.c or *.cc files, we
+   * need special actions to handle the language.  */
 
   /* Source *.c files are using get_file_gtfilename to compute their
      output_name and get_file_basename to compute their for_name
      through the source_dot_c_frul action.  */
   { DIR_PREFIX_REGEX "([[:alnum:]_-]*)\\.c$",
     REG_EXTENDED, NULL_REGEX, "gt-$3.h", "$3.c", source_dot_c_frul},
+
+  /* Source *.cc files are using get_file_gtfilename to compute their
+     output_name and get_file_basename to compute their for_name
+     through the source_dot_c_frul action.  */
+  { DIR_PREFIX_REGEX "([[:alnum:]_-]*)\\.cc$",
+    REG_EXTENDED, NULL_REGEX, "gt-$3.h", "$3.cc", source_dot_c_frul},
+
   /* Common header files get gtype-desc.c as their output_name,
    * while language specific header files are handled specially.  So
    * we need the header_dot_h_frul action.  */
@@ -2269,9 +2276,9 @@
     }
   if (!output_name || !for_name)
     {
-      /* This is impossible, and could only happen if the files_rules is
-	 incomplete or buggy.  */
-      gcc_unreachable ();
+      /* This should not be possible, and could only happen if the
+	 files_rules is incomplete or buggy.  */
+      fatal ("failed to compute output name for %s", inpfname);
     }
 
   /* Look through to see if we've ever seen this output filename
Ping: small patch to accept = inside plugin arguments
Hello

I'm pinging my last month's patch
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00382.html
of August 07th, 2013, to accept '=' inside plugin arguments.

Cheers
-- Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
Re: [RFC] Changes to the wide-int classes
Mike Stump mikest...@comcast.net writes:

* A static assert also prevents fixed_wide_ints from being initialised from wide_ints. I think combinations like that would always be a mistake.

I don't see why. In C, this:

  int i;
  long l = i;

is not an error, and while a user might not want to do this, other times it is completely reasonable.

Sure, but in these terms, int is FLEXIBLE_PRECISION, while wide_int is VAR_PRECISION. You can do the above because int has a sign, and so the compiler knows how to extend it to wider types. wide_int doesn't have a sign, so if you want to extend it to wider types you need to specify a sign explicitly. So you can do:

  max_wide_int x = max_wide_int::from (wide_int_var, SIGNED);
  max_wide_int x = max_wide_int::from (wide_int_var, UNSIGNED);

What I'm saying is that the assert stops plain:

  max_wide_int x = wide_int_var;

since it's very unlikely that you'd know up-front that wide_int_var has precision MAX_BITSIZE_MODE_ANY_INT. FLEXIBLE_PRECISION is for integers that act in a C-like way. CONST_PRECISION and VAR_PRECISION are (deliberately) not C-like.

Thanks,
Richard
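The design point can be mocked in a few lines of plain C (this is a hypothetical illustration, not the real wide-int API): extending a "signless" bit pattern to a wider type is only well-defined once the caller supplies a sign, which is what `max_wide_int::from (x, SIGNED/UNSIGNED)` expresses.

```c
#include <assert.h>  /* only needed for the usage check below */
#include <stdint.h>

/* Hypothetical mock: a value carried as raw bits has no sign, so
   widening it requires an explicit choice, mirroring
   max_wide_int::from (wide_int_var, SIGNED/UNSIGNED).  */
typedef enum { SIGNED, UNSIGNED } signop;

int64_t extend_from (uint32_t bits, signop sgn)
{
  if (sgn == SIGNED)
    return (int64_t) (int32_t) bits;   /* sign-extend the top bit */
  return (int64_t) (uint64_t) bits;    /* zero-extend */
}
```

The same 32-bit pattern 0xFFFFFFFF widens to -1 or to 4294967295 depending on the sign chosen, which is why an implicit conversion would be a silent trap.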
Re: [PATCH] Convert more passes to new dump framework
On Fri, Aug 30, 2013 at 6:27 PM, Xinliang David Li davi...@google.com wrote:

Except that in this form, the dump will be extremely large and not suitable for very large applications.

So? You asked for it and you can easily grep the output before storing it away. Besides, we might also want to use the same machinery (dump_printf_loc etc.) for dump file dumping. The current behavior of using '-details' to turn on opt-info-all messages for dump files is not desirable.

How about the following:

1) Add a new dump_kind modifier so that when that modifier is specified, the messages won't go to the alt_dumpfile (controlled by -fopt-info), but only to the primary dump file. With this, the inline messages can be dumped via: dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .)

2) Add more flags in -fdump- support:

  -fdump-ipa-inline-opt     -- turn on opt-info messages only
  -fdump-ipa-inline-optall  -- turn on opt-info-all messages
  -fdump-tree-pre-ir        -- turn on GIMPLE dump only
  -fdump-tree-pre-details   -- turn on everything (ir, optall, trace)

With this, developers can really just use -fdump-ipa-inline-opt=stderr for inline messages.

thanks, David

On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener richard.guent...@gmail.com wrote:
On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson tejohn...@google.com wrote:
On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener richard.guent...@gmail.com wrote:

New patch below that removes this global variable, and also outputs the node->symbol.order (in square brackets after the function name so as to not clutter it). Inline messages with profile data look like:

test.c:8:3: note: foobar [0] (9000) inlined into foo [2] (1000) with call count 9000 (via inline instance bar [3] (9000))

Ick. This looks both redundant and cluttered. This is supposed to be understandable by GCC users, not only GCC developers.

The main part that is only useful/understandable to gcc developers is the node->symbol.order in square brackets, requested by Martin.
One possibility is that I could put that part under a param, disabled by default. We have something similar on the google branches that emits LIPO module info in the message, enabled via a param. But we have _dump files_ for that. That's the developer-consumed form of opt-info. -fopt-info is purely user sugar and for usual translation units it shouldn't exceed a single terminal full of output. But as a developer I don't want to have to parse lots of dump files for a summary of the major optimizations performed (e.g. inlining, unrolling) for an application, unless I am diving into the reasons for why or why not one of those optimizations occurred in a particular location. I really do want a summary emitted to stderr so that it is easily searchable/summarizable for the app as a whole. For example, some of the apps I am interested in have thousands of input files, and trying to collect and parse dump files for each and every one is overwhelming (it probably would be even if my input files numbered in the hundreds). What has been very useful is having these high level summary messages of inlines and unrolls emitted to stderr by -fopt-info. Then it is easy to search and sort by hotness to get a feel for things like what inlines are missing when moving to a new compiler, or compiling a new version of the source, for example. Then you know which files to focus on and collect dump files for. I thought we can direct dump files to stderr now? So, just use -fdump-tree-all=stderr and grep its contents. I'd argue that the other information (the profile counts, emitted only when using -fprofile-use, and the inline call chains) are useful if you want to understand whether and how critical inlines are occurring. I think this is the type of information that users focused on optimizations, as well as gcc developers, want when they use -fopt-info. Otherwise it is difficult to make sense of the inline information. 
Well, I doubt that inline information is interesting to users unless we are able to aggressively filter it to what users are interested in. Which IMHO isn't possible - users are interested in I have not inlined this even though inlining would severely improve performance which would indicate a bug in the heuristics we can reliably detect and thus it wouldn't be there. I have interacted with users who are aware of optimizations such as inlining and unrolling and want to look at that information to diagnose performance differences when refactoring code or using a new compiler version. I also think inlining (especially cross-module) is one example of an optimization that is still being tuned, and user reports of performance issues related to that have been useful. I really think that the two groups of people who will find -fopt-info useful are gcc developers and savvy performance-hungry users. For the former group the additional info
Re: [PATCH] Convert more passes to new dump framework
On Fri, Aug 30, 2013 at 9:51 PM, Teresa Johnson tejohn...@google.com wrote: On Fri, Aug 30, 2013 at 9:27 AM, Xinliang David Li davi...@google.com wrote: Except that in this form, the dump will be extremely large and not suitable for very large applications. Yes. I did some measurements for both a fairly large source file that is heavily optimized with LIPO and for a simple toy example that has some inlining. For the large source file, the output from -fdump-ipa-inline=stderr was almost 100x the line count of the -fopt-info output. For the toy source file it was 43x. The size of the -details output was 250x and 100x, respectively. Which is untenable for a large app. The issue I am having here is that I want a more verbose message, not a more voluminous set of messages. Using either -fopt-info-all or -fdump-ipa-inline to provoke the more verbose inline message will give me a much greater volume of output. I think we will never reach the state where the dumping is exactly what each developer wants (because their wants will differ). Developers can easily post-process the stderr output with piping through grep. Richard. One compromise could be to emit the more verbose inliner message under a param (and a more concise foo inlined into bar by default with -fopt-info). Or we could do some variant of what David talks about below. Besides, we might also want to use the same machinery (dump_printf_loc etc) for dump file dumping. The current behavior of using '-details' to turn on opt-info-all messages for dump files are not desirable. Interestingly, this doesn't even work. When I do -fdump-ipa-inline-details=stderr (with my patch containing the inliner messages) I am not getting those inliner messages emitted to stderr. Even though in dumpfile.c details is set to (TDF_DETAILS | MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE). I'm not sure why, but will need to debug this. 
How about the following: 1) add a new dump_kind modifier so that when that modifier is specified, the messages won't goto the alt_dumpfile (controlled by -fopt-info), but only to primary dump file. With this, the inline messages can be dumped via: dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .) (you mean (MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY) ) Typically OR-ing together flags like this indicates dump under any of those conditions. But we could implement special handling for OPT_DUMP_FILE_ONLY, which in the above case would mean dump only to the primary dump file, and only under the other conditions specified in the flag (here under -optimized) 2) add more flags in -fdump- support: -fdump-ipa-inline-opt -- turn on opt-info messages only -fdump-ipa-inline-optall -- turn on opt-info-all messages According to the documentation (see the -fdump-tree- documentation on http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options), the above are already supposed to be there (-optimized, -missed, -note and -optall). However, specifying any of these gives a warning like: cc1: warning: ignoring unknown option ‘optimized’ in ‘-fdump-ipa-inline’ [enabled by default] Probably because none is listed in the dump_options[] array in dumpfile.c. However, I don't think there is currently a way to use -fdump- options and *only* get one of these, as much of the current dump output is emitted whenever there is a dump_file defined. Until everything is migrated to the new framework it may be difficult to get this to work. -fdump-tree-pre-ir -- turn on GIMPLE dump only -fdump-tree-pre-details -- turn on everything (ir, optall, trace) With this, developers can really just use -fdump-ipa-inline-opt=stderr for inline messages. Yes, if we can figure out a good way to get this to work (i.e. only emit the optimized messages and not the rest of the dump messages). 
And unfortunately to get them all you need to specify -fdump-ipa-all-optimized -fdump-tree-all-optimized -fdump-rtl-all-optimized instead of just -fopt-info. Unless we can add -fdump-all-all-optimized.

Teresa

thanks, David

On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener richard.guent...@gmail.com wrote:
On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson tejohn...@google.com wrote:
On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener richard.guent...@gmail.com wrote:

New patch below that removes this global variable, and also outputs the node->symbol.order (in square brackets after the function name so as to not clutter it). Inline messages with profile data look like:

test.c:8:3: note: foobar [0] (9000) inlined into foo [2] (1000) with call count 9000 (via inline instance bar [3] (9000))

Ick. This looks both redundant and cluttered. This is supposed to be understandable by GCC users, not only GCC developers. The main part that is only useful/understandable to gcc developers is the node->symbol.order in square brackets, requested
Re: [PATCH] Convert more passes to new dump framework
On Fri, Aug 30, 2013 at 11:23 PM, Teresa Johnson tejohn...@google.com wrote: On Fri, Aug 30, 2013 at 1:30 PM, Xinliang David Li davi...@google.com wrote: On Fri, Aug 30, 2013 at 12:51 PM, Teresa Johnson tejohn...@google.com wrote: On Fri, Aug 30, 2013 at 9:27 AM, Xinliang David Li davi...@google.com wrote: Except that in this form, the dump will be extremely large and not suitable for very large applications. Yes. I did some measurements for both a fairly large source file that is heavily optimized with LIPO and for a simple toy example that has some inlining. For the large source file, the output from -fdump-ipa-inline=stderr was almost 100x the line count of the -fopt-info output. For the toy source file it was 43x. The size of the -details output was 250x and 100x, respectively. Which is untenable for a large app. The issue I am having here is that I want a more verbose message, not a more voluminous set of messages. Using either -fopt-info-all or -fdump-ipa-inline to provoke the more verbose inline message will give me a much greater volume of output. One compromise could be to emit the more verbose inliner message under a param (and a more concise foo inlined into bar by default with -fopt-info). Or we could do some variant of what David talks about below. something like --param=verbose-opt-info=1 Yes. Richard, would this be acceptable for now? i.e. the inliner messages would be like: -fopt-info: test.c:8:3: note: foobar inlined into foo with call count 9000 (the with call count X only when there is profile feedback) -fopt-info --param=verbose-opt-info=1: test.c:8:3: note: foobar/0 (9000) inlined into foo/2 (1000) with call count 9000 (via inline instance bar [3] (9000)) (again the call counts only emitted under profile feedback) It looks like a hack to me. Is -fdump-ipa-inline useful at all? That is, can't we simply push some of the -details dumping into the non-details dump? Richard. 
Besides, we might also want to use the same machinery (dump_printf_loc etc) for dump file dumping. The current behavior of using '-details' to turn on opt-info-all messages for dump files are not desirable. Interestingly, this doesn't even work. When I do -fdump-ipa-inline-details=stderr (with my patch containing the inliner messages) I am not getting those inliner messages emitted to stderr. Even though in dumpfile.c details is set to (TDF_DETAILS | MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE). I'm not sure why, but will need to debug this. It works for vectorizer pass. Ok, let me see what is going on - I just confirmed that it is not working for the loop unroller messages either. How about the following: 1) add a new dump_kind modifier so that when that modifier is specified, the messages won't goto the alt_dumpfile (controlled by -fopt-info), but only to primary dump file. With this, the inline messages can be dumped via: dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .) (you mean (MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY) ) Yes. Typically OR-ing together flags like this indicates dump under any of those conditions. But we could implement special handling for OPT_DUMP_FILE_ONLY, which in the above case would mean dump only to the primary dump file, and only under the other conditions specified in the flag (here under -optimized) 2) add more flags in -fdump- support: -fdump-ipa-inline-opt -- turn on opt-info messages only -fdump-ipa-inline-optall -- turn on opt-info-all messages According to the documentation (see the -fdump-tree- documentation on http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options), the above are already supposed to be there (-optimized, -missed, -note and -optall). However, specifying any of these gives a warning like: cc1: warning: ignoring unknown option ‘optimized’ in ‘-fdump-ipa-inline’ [enabled by default] Probably because none is listed in the dump_options[] array in dumpfile.c. 
However, I don't think there is currently a way to use -fdump- options and *only* get one of these, as much of the current dump output is emitted whenever there is a dump_file defined. Until everything is migrated to the new framework it may be difficult to get this to work. -fdump-tree-pre-ir -- turn on GIMPLE dump only -fdump-tree-pre-details -- turn on everything (ir, optall, trace) With this, developers can really just use -fdump-ipa-inline-opt=stderr for inline messages. Yes, if we can figure out a good way to get this to work (i.e. only emit the optimized messages and not the rest of the dump messages). And unfortunately to get them all you need to specify -fdump-ipa-all-optimized -fdump-tree-all-optimized -fdump-rtl-all-optimized instead of just -fopt-info. Unless we can add -fdump-all-all-optimized. Having general support requires cleanup of all the old style if (dump_file) fprintf
Re: [RFC] Changes to the wide-int classes
Kenneth Zadeck zad...@naturalbridge.com writes:

There is no place for exactly two HWIs in the machine independent parts of the compiler.

I totally agree. In fact I was taking that so much for granted that I didn't even think to add a rider about it, sorry. I didn't mean to imply that we should keep double_int around. I think the reason for doing this is to prove that it can be done (so that the wide_int code isn't too big a change for the tree level) and to make it easier to merge the wide-int patches into trunk piecemeal if we need to.

small bugs below this line.

bottom of frag 3 of gcc/cp/init.c is wrong: you replaced rshift...lshift with lshift...lshift.

Do you mean this bit:

   unsigned shift = (max_outer_nelts.get_precision ()) - 7
-		   - max_outer_nelts.clz ().to_shwi ();
-  max_outer_nelts = max_outer_nelts.rshiftu (shift).lshift (shift);
+		   - wi::clz (max_outer_nelts);
+  max_outer_nelts = wi::lshift (wi::lrshift (max_outer_nelts, shift),
+				shift);

? That's lrshift (logical right shift). I ended up using the double-int names for right shifts.

That does remind me of another thing though. I notice some of the wide-int code assumes that shifting a signed HWI right gives an arithmetic shift, but the language doesn't guarantee that. We probably need to go through and fix those.

i will finish reading this tomorrow, but i wanted to get some comments in for the early shift. i stopped reading at line 1275.

Thanks. TBH I've not really been through the third part myself to double-check. Will try to do that while waiting for comments on the first part.

Richard
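The portability point about right shifts can be shown directly; this is a generic sketch (not code from the wide-int patches): C leaves right-shifting a negative signed value implementation-defined, so code that needs an arithmetic shift portably can do it by hand on the unsigned representation.

```c
#include <assert.h>  /* only needed for the usage check below */

/* Right-shifting a negative signed value is implementation-defined in
   C (most compilers do an arithmetic shift, but it is not guaranteed).
   This helper produces an arithmetic right shift portably: shift the
   bits as unsigned, then re-set the high bits an arithmetic shift
   would have copied in.  Assumes 0 <= n < bit-width of long.  */
long arith_rshift (long x, int n)
{
  unsigned long ux = (unsigned long) x;
  unsigned long r = ux >> n;
  if (x < 0)
    r |= ~(~0UL >> n);  /* replicate the sign into the top n bits */
  return (long) r;
}
```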
Re: [patch 4/4] -fstrict-volatile-bitfields cleanup v3: remove from defaults on all targets
On Sun, Sep 1, 2013 at 3:10 PM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: On Fri, 30 Aug 2013 11:47:21, Richard Biener wrote: On Tue, Jul 2, 2013 at 7:33 PM, DJ Delorie d...@redhat.com wrote: The choice appears to be to continue to have broken volatile bitfields on ARM with no way for users to make them conform to the ABI, or to change things so that they conform to the ABI if you specify -fstrict-volatile-bitfields explicitly and to the C/C++ standard by default, without that option. I can't speak for ARM, but for the other targets (for which I wrote the original patch), the requirement is that volatile bitfield accesses ALWAYS be in the mode of the type specified by the user. If the user says int x:8; then SImode (assuming 32-bit ints) must always be used, even if QImode could be used. The reason for this is that volatile bitfields are normally used for memory-mapped peripherals, and accessing peripheral registers in the wrong mode leads to incorrect operation. I've argued in the past that this part of the semantics can be easily implemented in the existing C++ memory model code (well, in get-bit-range) for the cases where it doesn't conflict with the C++ memory model. For the cases where it conflicts a warning can be emitted, Like when you do struct { volatile int x:8; char c; } x; where 'c' would be within the SImode you are supposed to use with -fstrict-volatile-bitfields. Which raises the question ... how does x.x = 1; actually work? 
IMHO it must be sth horribly inefficient like

  tem = x.c;           // QImode
  tem = tem << 8 | 1;  // combine into SImode value
  x.x = tem;           // SImode store

AAPCS is very explicit about what should happen here:

  tem = x.x;           // SImode
  tem = (tem & ~0xFF) | 1;
  x.x = tem;           // SImode

struct x should be 4 bytes large and 4 bytes aligned (int x). The member c is at offset 1, and gets overwritten, which is forbidden in the C++ memory model, but required by AAPCS.

Ok, so what I think is weird here is that the volatile store to x.x is replaced with a read and a store - for hardware I/O that's exactly what I would not have expected if I qualify sth with volatile. In my simple mind 'volatile' preserves the exact number of loads and stores. Of course I was wrong here ;) hoping that no sane ABI makes 'x' size 2.

Oh, I _can_ make it size 2:

  struct { volatile int x:8; char c; } __attribute__((packed)) x;
  char y;

IMHO the AAPCS forbids packed structures. Therefore we need not interfere with the C++ memory model if we have unaligned data.

note the fancy global object 'y' I placed after 'x'. Now the store will clobber y(?) So the only 'valid' way is

  tem = x.x;                 // SImode read(!)
  tem = tem & 0xff..00 | 1;  // manipulate value
  x.x = tem;                 // SImode store

but then this doesn't work either because that 1-byte aligned object could reside at a page boundary and so the read and write would trap.

That is exactly what happened, see bug #56341, and it is fixed with parts 1 and 2 of Sandra's patch. Here, because of the 1-byte alignment, the function strict_volatile_bitfield_p() returns false and thus the C++ memory model is used.

Makes me ask who designed that crap ;) But my point was that for all these special cases that likely do not happen in practice (fingers crossed) the C++ memory model way doesn't interfere with -fstrict-volatile-bitfields and the code can be perfectly used to make the code as close as possible to the -fstrict-volatile-bitfields ABI.
Actually the C++ memory model and -fstrict-volatile-bitfields have some significant differences, your first example is one of them. And since part 4 changes the default of -fstrict-volatile-bitfields, I thought it would be good to have a warning when such a construct is used, which generates inherently different code if -fstrict-volatile-bitfields is used or not. I personally could live with or without part 4 of the patch, and I do not insist on the warnings part either. That was only my try to find a way how part 4 of Sandra's patch could be generally accepted. Note: If you want I can re-post the warnings patch in a new thread. Can someone, in a new thread, ping the patches that are still in flight? ISTR having approved bits of some patches before my leave. Thanks, Richard. Thanks Bernd. Richard.
Re: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
On Mon, Sep 2, 2013 at 8:56 AM, bin.cheng bin.ch...@arm.com wrote: Hi, The gimple-ssa-strength-reduction pass handles CAND_REFs in order to find different MEM_REFs sharing a common part in their addressing expressions. If such MEM_REFs are found, the pass rewrites the MEM_REFs, and produces more efficient addressing expressions during the RTL passes. The pass analyzes the addressing expression in each MEM_REF to see if it can be formalized as follows:

base:   MEM_REF (T1, C1)
offset: MULT_EXPR (PLUS_EXPR (T2, C2), C3)
bitpos: C4 * BITS_PER_UNIT

Then it restructures it into the form below:

MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)), C1 + (C2 * C3) + C4)

At last, it rewrites the MEM_REFs if there are two or more sharing a common (non-constant) part. The problem is it doesn't back-trace T2. If T2 is recorded as a CAND_ADD in the form of T2' + C5, the MEM_REF should be restructured into:

MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2', C3)), C1 + (C2 * C3) + C4 + (C5 * C3))

The patch also includes a test case to illustrate the problem. Bootstrapped and tested on x86/x86_64/arm-a15, is it ok? This looks ok to me if Bill is ok with it. Thanks, Richard. Thanks. bin 2013-09-02 Bin Cheng bin.ch...@arm.com * gimple-ssa-strength-reduction.c (backtrace_base_for_ref): New. (restructure_reference): Call backtrace_base_for_ref. gcc/testsuite/ChangeLog 2013-09-02 Bin Cheng bin.ch...@arm.com * gcc.dg/tree-ssa/slsr-39.c: New test.
Re: [PATCH GCC]Find auto-increment use both before and after candidate's increment in IVOPT
On Mon, Sep 2, 2013 at 9:03 AM, bin.cheng bin.ch...@arm.com wrote: Hi, For now set_autoinc_for_original_candidates only searches auto-inc uses before candidate's increment, causing pre-increment opportunities missed for original candidates. This is a straightforward fix by searching after candidate's increment too. The patch also includes a test case to illustrate the problem. Without the patch, assembly of the test is:

foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r3, #:lower16:__ctype_ptr__
	ldrb	r2, [r0]	@ zero_extendqisi2
	movt	r3, #:upper16:__ctype_ptr__
	ldr	r1, [r3]
	adds	r3, r1, r2
	ldrb	r3, [r3, #1]	@ zero_extendqisi2
	lsls	r3, r3, #29
	bmi	.L2
	adds	r3, r0, #1
.L3:
	mov	r0, r3
	adds	r3, r3, #1
	ldrb	r2, [r0]	@ zero_extendqisi2
	add	r2, r2, r1
	ldrb	r2, [r2, #1]	@ zero_extendqisi2
	lsls	r2, r2, #29
	bpl	.L3
.L2:
	bx	lr
	.size	foo, .-foo

Which can be optimized into below:

foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r3, #:lower16:__ctype_ptr__
	ldrb	r1, [r0]	@ zero_extendqisi2
	movt	r3, #:upper16:__ctype_ptr__
	ldr	r2, [r3]
	adds	r3, r2, r1
	ldrb	r3, [r3, #1]	@ zero_extendqisi2
	lsls	r1, r3, #29
	bmi	.L2
.L3:
	ldrb	r3, [r0, #1]!	@ zero_extendqisi2
	add	r3, r3, r2
	ldrb	r3, [r3, #1]	@ zero_extendqisi2
	lsls	r3, r3, #29
	bpl	.L3
.L2:
	bx	lr
	.size	foo, .-foo

Bootstrapped and tested on arm a15, is it OK? Ok. Thanks, Richard. Thanks. bin 2013-09-02 Bin Cheng bin.ch...@arm.com * tree-ssa-loop-ivopts.c (set_autoinc_for_original_candidates): Find auto-increment use both before and after candidate. gcc/testsuite/ChangeLog 2013-09-02 Bin Cheng bin.ch...@arm.com * gcc.target/arm/ivopts-orig_biv-inc.c: New test.
Re: [PATCH, PR 58106] Make ipa_edge_duplication_hook cope with recursive inlining
2013-08-27 Martin Jambor mjam...@suse.cz PR ipa/58106 * ipa-prop.c (ipa_edge_duplication_hook): Always put new rdesc to the linked list. When finding the correct duplicate, also consider the caller in addition to its inlined_to node. testsuite/ * gcc.dg/ipa/pr58106.c: New test. OK, thanks Honza
Re: converting wide-int so that it uses its own type rather than hwi.
On Fri, Aug 30, 2013 at 5:21 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: richi, on further thought, this is going to be a huge task. The problem is at the edges. right now we share the rep of the array with the tree-csts and rtl (though the rtl sharing may go away to support canonicalization). So if the hwi rep inside of wide int changes then we will have to implement copying with reblocking at that interface, or push the type into there and change all of the fits_*hwi and to_*hwi interfaces to fit this different type. Obviously the reps of RTL and tree would need to change as well. Yes, I am aware of the explicit HWI references in the API - those functions would be more complicated. i think i can get at least as good and perhaps better test coverage by changing the rep of hwi for a port. There will also be fallout work there, but it will be productive, in that it is just changing code from only doing the hwi case to supporting all precisions. Well, I'm not sure the fallout will be exactly small ;) What also could improve testing coverage is to skip the fast path, thus always go the out-of-line path even for len == 1. Not sure if the ool path is prepared to handle len == 1 though. Richard. Kenny
Re: [PATCH] Don't issue array bound warnings on zero-length arrays
On Fri, Aug 30, 2013 at 5:13 PM, Meador Inge mead...@codesourcery.com wrote: Hi All, This patch fixes a minor issue that can occur when issuing array bounds warnings. In GNU C mode we allow empty initializer lists and their upper bound is initialized to -1. This confuses the array bound analysis in VRP and in some cases we end up issuing false positives. This patch fixes the issue by bailing out when a zero-length array is encountered. OK for trunk? gcc/ 2013-08-30 Meador Inge mead...@codesourcery.com * tree-vrp.c (check_array_ref): Bail out on empty arrays. gcc/testsuite/ 2013-08-30 Meador Inge mead...@codesourcery.com * gcc.dg/Warray-bounds-11.c: New testcase.

Index: gcc/testsuite/gcc.dg/Warray-bounds-11.c
===
--- gcc/testsuite/gcc.dg/Warray-bounds-11.c (revision 0)
+++ gcc/testsuite/gcc.dg/Warray-bounds-11.c (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Warray-bounds -std=gnu99" } */
+/* Test zero-length arrays for GNU C.  */
+
+unsigned int a[] = { };
+unsigned int size_a;
+
+int test(void)
+{
+  /* This should not warn.  */
+  return size_a ? a[0] : 0;
+}

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c (revision 202088)
+++ gcc/tree-vrp.c (working copy)
@@ -6137,9 +6137,10 @@ check_array_ref (location_t location, tr
   low_sub = up_sub = TREE_OPERAND (ref, 1);
   up_bound = array_ref_up_bound (ref);
 
-  /* Can not check flexible arrays.  */
+  /* Can not check flexible arrays or zero-length arrays.  */
   if (!up_bound
-      || TREE_CODE (up_bound) != INTEGER_CST)
+      || TREE_CODE (up_bound) != INTEGER_CST
+      || tree_int_cst_equal (up_bound, integer_minus_one_node))

That doesn't look correct - what if the lower bound is -10? That can easily happen for Ada, so please revert the patch. And I fail to see why the testcase should not warn. Clearly you have a definition of a here and it doesn't have an element, so the access is out of bounds. Richard.

     return;
 
   /* Accesses to trailing arrays via pointers may access storage
Re: Fix PR middle-end/57370
On Fri, Aug 30, 2013 at 6:13 PM, Easwaran Raman era...@google.com wrote: On Fri, Aug 30, 2013 at 1:25 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Aug 30, 2013 at 2:30 AM, Easwaran Raman era...@google.com wrote: Richard, Do you want me to commit everything but the insert_stmt_after hunk now? Yes. There are more similar failures reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached the updated patch there. Shall I send that for review? Apart from the debug statement issue, almost all the bugs are due to dependence violation because certain newly inserted statements do not have the right UID. Instead of trying to catch all of them, will it be better if I check if the stmt has a proper uid (non-zero if it is not the first stmt) and assign a sensible value at the point where it is used (find_insert_point and appears_later_in_bb) instead of where the stmt is created? I think that would be less brittle. In the end all this placement stuff should be reverted and done as post-processing after reassoc is finished. You can remember the inserted stmts that are candidates for moving (just set a pass-local flag on them) and assign UIDs for the whole function after all stmts are inserted. The problem is we need sane UID values as reassoc is happening - not after it is finished. As it process each stmt in reassoc_bb, it might generate new stmts which might have to be compared in find_insert_point or appears_later_in_bb. But if you no longer find_insert_point during reassoc but instead do a scheduling pass after it finished you won't need sane UIDs during reassoc. Richard. - Easwaran Richard. - Easwaran On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman era...@google.com wrote: I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regression on x86_64/linux. 
Ok for trunk? 2013-07-31 Easwaran Raman era...@google.com PR middle-end/57370 * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset of debug statements that cause inconsistent IR. Missing ChangeLog entry for the insert_stmt_after hunk which I do not like at all. The other hunks are ok, but we need to work harder to preserve debug stmts - simply removing all is not going to fly. Richard. testsuite/ChangeLog: 2013-07-31 Easwaran Raman era...@google.com PR middle-end/57370 PR tree-optimization/57393 PR tree-optimization/58011 * gfortran.dg/reassoc_12.f90: New testcase. * gcc.dg/tree-ssa/reassoc-31.c: New testcase. * gcc.dg/tree-ssa/reassoc-31.c: New testcase. On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman era...@google.com wrote: Ping. On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman era...@google.com wrote: A newly generated statement in build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong uid (of an earlier phi statement) which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk? Thanks, Easwaran 2013-06-27 Easwaran Raman era...@google.com PR middle-end/57370 * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi node for a non-phi gimple statement. testsuite/ChangeLog: 2013-06-27 Easwaran Raman era...@google.com PR middle-end/57370 * gfortran.dg/reassoc_12.f90: New testcase. Index: gcc/testsuite/gfortran.dg/reassoc_12.f90 === --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) @@ -0,0 +1,74 @@ +! { dg-do compile } +! { dg-options -O2 -ffast-math } +! 
PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                                            grad_deriv, npoints, sx)
+    IMPLICIT REAL*8 (t)
+    INTEGER, PARAMETER :: dp=8
+    REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                                           e_ndrho_ndrho_rho
+      DO ii=1,npoints
+        IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+          t1425 = t233 * t557
+          t1429 = beta * t225
+          t1622 = t327 * t1621
+          t1626 = t327 * t1625
+          t1632 = t327 * t1631
+          t1685 = t105 * t1684
+          t2057 = t1636 + t8 * (t2635 + t3288)
+        END IF
+        IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+          t5469 = t5440 - t5443 - t5446 - t5449 - &
+                  t5451 - t5454 - t5456 + t5459 - &
+                  t5462 + t5466 - t5468
+          t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
+          t5489 =
Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL
I'd like to ping this patch 1 of 2 that removes redundant zero/sign extension using value range information. Bootstrapped and no new regression for x86_64-unknown-linux-gnu and arm-none-linux-gnueabi. Thank you for your time. Kugan On 14/08/13 16:49, Kugan wrote: Hi Richard, Here is an attempt to address your earlier review comments. Bootstrapped and there is no new regression for X86_64 and arm. Thank you very much for your time. Thanks, Kugan

--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,25 @@
+2013-08-14  Kugan Vivekanandarajah  kug...@linaro.org
+
+	* tree-flow.h (mark_range_info_unknown): New function definition.
+	* tree-ssa-alias.c (dump_alias_info): Check pointer type.
+	* tree-ssa-copy.c (fini_copy_prop): Check pointer type and copy
+	range info.
+	* tree-ssanames.c (make_ssa_name_fn): Check pointer type in
+	initialize.
+	(mark_range_info_unknown): New function.
+	(duplicate_ssa_name_range_info): Likewise.
+	(duplicate_ssa_name_fn): Check pointer type and call correct
+	duplicate function.
+	* tree-vrp.c (extract_exp_value_range): New function.
+	(simplify_stmt_using_ranges): Call extract_exp_value_range and
+	tree_ssa_set_value_range.
+	* tree-ssanames.c (ssa_range_info): New function.
+	* tree.h (SSA_NAME_PTR_INFO): Changed to access via union.
+	(SSA_NAME_RANGE_INFO): New macro.
+	* gimple-pretty-print.c (print_double_int): New function.
+	(dump_gimple_phi): Dump range info.
+	(pp_gimple_stmt_1): Likewise.
+
 2013-08-09  Jan Hubicka  j...@suse.cz
 
 	* cgraph.c (cgraph_create_edge_1): Clear speculative flag.

On 03/07/13 21:55, Kugan wrote: On 17/06/13 18:33, Richard Biener wrote: On Mon, 17 Jun 2013, Kugan wrote:

+/* Extract the value range of the assigned expression for a GIMPLE_ASSIGN
+   stmt.  If the extracted value range is valid, return true else return
+   false.
*/
+static bool
+extract_exp_value_range (gimple stmt, value_range_t *vr)
+{
+  gcc_assert (is_gimple_assign (stmt));
+  tree rhs1 = gimple_assign_rhs1 (stmt);
+  tree lhs = gimple_assign_lhs (stmt);
+  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
...
@@ -8960,6 +9016,23 @@ simplify_stmt_using_ranges (gimple_stmt_iterator *gsi)
 {
   enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
   tree rhs1 = gimple_assign_rhs1 (stmt);
+  tree lhs = gimple_assign_lhs (stmt);
+
+  /* Set value range information for ssa.  */
+  if (!POINTER_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
+      && (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)
+      && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
+      && !SSA_NAME_RANGE_INFO (lhs))
+    {
+      value_range_t vr = VR_INITIALIZER;
...
+      if (extract_exp_value_range (stmt, &vr))
+	tree_ssa_set_value_range (lhs,
+				  tree_to_double_int (vr.min),
+				  tree_to_double_int (vr.max),
+				  vr.type == VR_RANGE);
+    }

This looks overly complicated to me. In vrp_finalize you can simply do

  for (i = 0; i < num_vr_values; i++)
    if (vr_value[i])
      {
	tree name = ssa_name (i);
	if (POINTER_TYPE_P (TREE_TYPE (name)))
	  continue;
	if (vr_value[i].type == VR_RANGE
	    || vr_value[i].type == VR_ANTI_RANGE)
	  tree_ssa_set_value_range (name,
				    tree_to_double_int (vr_value[i].min),
				    tree_to_double_int (vr_value[i].max),
				    vr_value[i].type == VR_RANGE);
      }

Thanks Richard for taking time to review it. I was doing something like what you are suggesting earlier but noticed some problems, and that’s the reason why I changed it.
For example, for the following testcase from the test suite,

unsigned long l = (unsigned long)-2;
unsigned short s;
int main ()
{
  long t = l + 1;
  s = l;
  if (s != (unsigned short) -2)
    abort ();
  exit (0);
}

with the following gimple stmts

main ()
{
  short unsigned int s.1;
  long unsigned int l.0;

  ;; basic block 2, loop depth 0
  ;;    pred:       ENTRY
  l.0_2 = l;
  s.1_3 = (short unsigned int) l.0_2;
  s = s.1_3;
  if (s.1_3 != 65534)
    goto <bb 3>;
  else
    goto <bb 4>;
  ;;    succ:       3
  ;;                4

  ;; basic block 3, loop depth 0
  ;;    pred:       2
  abort ();
  ;;    succ:

  ;; basic block 4, loop depth 0
  ;;    pred:       2
  exit (0);
  ;;    succ:
}

has the following value range.

l.0_2: VARYING
s.1_3: [0, +INF]

From the zero/sign extension point of view, the variable s.1_3 is expected to have a value that will overflow (or be varying), as this is what is assigned to a smaller variable. extract_range_from_assignment initially calculates the value range as VARYING, but it is later changed to [0, +INF] by extract_range_basic. What I need here is the value that will be assigned from the rhs expression, and not the value that we will have after the proper assignment. I understand that
Re: [PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP
I'd like to ping this patch 2 of 2 that removes redundant zero/sign extension using value range information. Bootstrapped and no new regression for x86_64-unknown-linux-gnu and arm-none-linux-gnueabi. Thanks you for your time. Kugan On 14/08/13 16:59, Kugan wrote: Hi Eric, Thanks for reviewing the patch. On 01/07/13 18:51, Eric Botcazou wrote: [Sorry for the delay] For example, when an expression is evaluated and it's value is assigned to variable of type short, the generated RTL would look something like the following. (set (reg:SI 110) (zero_extend:SI (subreg:HI (reg:SI 117) 0))) However, if during value range propagation, if we can say for certain that the value of the expression which is present in register 117 is within the limits of short and there is no sign conversion, we do not need to perform the subreg and zero_extend; instead we can generate the following RTl. (set (reg:SI 110) (reg:SI 117))) Same could be done for other assign statements. The idea looks interesting. Some remarks: +2013-06-03 Kugan Vivekanandarajah kug...@linaro.org + +* gcc/dojump.c (do_compare_and_jump): generates rtl without +zero/sign extension if redundant. +* gcc/cfgexpand.c (expand_gimple_stmt_1): Likewise. +* gcc/gimple.c (gimple_assign_is_zero_sign_ext_redundant) : New +function. +* gcc/gimple.h (gimple_assign_is_zero_sign_ext_redundant) : New +function definition. No gcc/ prefix in entries for gcc/ChangeLog. Generate RTL without... I have now changed it to. --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,13 @@ +2013-08-14 Kugan Vivekanandarajah kug...@linaro.org + +* dojump.c (do_compare_and_jump): Generate rtl without +zero/sign extension if redundant. +* cfgexpand.c (expand_gimple_stmt_1): Likewise. +* gimple.c (gimple_assign_is_zero_sign_ext_redundant) : New +function. +* gimple.h (gimple_assign_is_zero_sign_ext_redundant) : New +function definition. + 2013-08-09 Jan Hubicka j...@suse.cz * cgraph.c (cgraph_create_edge_1): Clear speculative flag. 
+	  /* If the value in SUBREG of temp fits that SUBREG (does not
+	     overflow) and is assigned to target SUBREG of the same mode
+	     without sign conversion, we can skip the SUBREG
+	     and extension.  */
+	  else if (promoted
+		   && gimple_assign_is_zero_sign_ext_redundant (stmt)
+		   && (GET_CODE (temp) == SUBREG)
+		   && (GET_MODE (target) == GET_MODE (temp))
+		   && (GET_MODE (SUBREG_REG (target))
+		       == GET_MODE (SUBREG_REG (temp))))
+	    emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp));
	  else if (promoted)
	    {
	      int unsignedp = SUBREG_PROMOTED_UNSIGNED_P (target);

Can we relax the strict mode equality here? This change augments the same transformation applied to the RHS when it is also a SUBREG_PROMOTED_VAR_P at the beginning of convert_move, but the condition on the mode is less strict in the latter case, so maybe it can serve as a model here. I have now changed it based on convert_move to

+	  else if (promoted
+		   && gimple_assign_is_zero_sign_ext_redundant (stmt)
+		   && (GET_CODE (temp) == SUBREG)
+		   && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (temp)))
+		       >= GET_MODE_PRECISION (GET_MODE (target)))
+		   && (GET_MODE (SUBREG_REG (target))
+		       == GET_MODE (SUBREG_REG (temp))))
+	    {

Is this what you wanted me to do?

+  /* Is zero/sign extension redundant as per VRP.  */
+  bool op0_ext_redundant = false;
+  bool op1_ext_redundant = false;
+
+  /* If promoted and the value in SUBREG of op0 fits (does not overflow),
+     it is a candidate for extension elimination.  */
+  if (GET_CODE (op0) == SUBREG && SUBREG_PROMOTED_VAR_P (op0))
+    op0_ext_redundant =
+      gimple_assign_is_zero_sign_ext_redundant (SSA_NAME_DEF_STMT (treeop0));
+
+  /* If promoted and the value in SUBREG of op1 fits (does not overflow),
+     it is a candidate for extension elimination.  */
+  if (GET_CODE (op1) == SUBREG && SUBREG_PROMOTED_VAR_P (op1))
+    op1_ext_redundant =
+      gimple_assign_is_zero_sign_ext_redundant (SSA_NAME_DEF_STMT (treeop1));

Are the gimple_assign_is_zero_sign_ext_redundant checks necessary here?
When set on a SUBREG, SUBREG_PROMOTED_VAR_P guarantees that SUBREG_REG is always properly extended (otherwise it's a bug) so don't you just need to compare SUBREG_PROMOTED_UNSIGNED_SET? See do_jump for an existing case. I am sorry I don’t think I understood you here. How would I know that extension is redundant without calling gimple_assign_is_zero_sign_ext_redundant ? Could you please elaborate. + /* If zero/sign extension is redundant, generate RTL + for operands without zero/sign extension. */ + if
Re: [RFC] Changes to the wide-int classes
On Mon, 2 Sep 2013, Richard Sandiford wrote: Kenneth Zadeck zad...@naturalbridge.com writes: There is no place for exactly two HWIs in the machine independent parts of the compiler, I totally agree. In fact I was taking that so much for granted that I didn't even think to add a rider about it, sorry. I didn't mean to imply that we should keep double_int around. I think the reason for doing this is to prove that it can be done (so that the wide_int code isn't too big a change for the tree level) and to make it easier to merge the wide-int patches into trunk piecemeal if we need to. Note that changing the tree rep to non-double_int is easy. Also note that I only want 'double_int' to stay around to have a fast type that can work on two HWIs for the code that needs more precision than a HWI. The only important cases I know of are in get_inner_reference and get_ref_base_and_extent and friends. Those are heavily used (and the only double-int function callers that even show up in regular cc1 profiles). So if wide_int_fixed2 ('2' better be replaced with 'number-that-gets-me-twice-target-sizetype-precision') works for those cases then fine (and we can drop double-ints). Richard. small bugs below this line. bottom of frag 3 of gcc/cp/init.c is wrong: you replaced rshift...lshift with lshift...lshift. Do you mean this bit:

   unsigned shift = (max_outer_nelts.get_precision ()) - 7
-                   - max_outer_nelts.clz ().to_shwi ();
-  max_outer_nelts = max_outer_nelts.rshiftu (shift).lshift (shift);
+                   - wi::clz (max_outer_nelts);
+  max_outer_nelts = wi::lshift (wi::lrshift (max_outer_nelts, shift),
+                                shift);

? That's lrshift (logical right shift). I ended up using the double-int names for right shifts. That does remind me of another thing though. I notice some of the wide-int code assumes that shifting a signed HWI right gives an arithmetic shift, but the language doesn't guarantee that. We probably need to go through and fix those.
i will finish reading this tomorrow, but i wanted to get some comments in for the early shift. i stopped reading at line 1275. Thanks. TBH I've not really been through the third part myself to double-check. Will try to do that while waiting for comments on the first part. Richard -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: [PING] Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Ping. On 13/8/20 10:57 AM, Chung-Lin Tang wrote: Ping. BTW, the SC has approved the Nios II port already: http://gcc.gnu.org/ml/gcc/2013-07/msg00434.html The port is still awaiting technical review. Thanks, Chung-Lin On 13/7/14 下午3:54, Chung-Lin Tang wrote: Hi, the last ping of the Nios II patches was: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01416.html After assessing the state, we feel it would be better to post a re-submission of the newest patches. The changes accumulated since the original post include: 1) Several bug fixes related to built-in function expanding. 2) A few holes in hard-float FPU code generation was plugged. 3) Support for parsing white-spaces in target attributes. 4) Revision of consistency check behavior of codes in custom instruction built-ins. 5) Some new testcases. The issues raised by Joseph in the first round of reviewing have been addressed. Testing has been re-done on both 32-bit and 64-bit hosts. PR55035 appears to not have been resolved yet, which affects nios2 among several other targets, thus configured with --enable-werror-always still does not build. As before, Sandra and me will serve as nios2 port maintainers. Attached is the patch for the compiler-proper. Thanks, Chung-Lin 2013-07-14 Chung-Lin Tang clt...@codesourcery.com Sandra Loosemore san...@codesourcery.com Based on patches from Altera Corporation * config.gcc (nios2-*-*): Add nios2 config targets. * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case. ($cpu_type): Add nios2 as new cpu type. * configure: Regenerate. * config/nios2/nios2.c: New file. * config/nios2/nios2.h: New file. * config/nios2/nios2-opts.h: New file. * config/nios2/nios2-protos.h: New file. * config/nios2/elf.h: New file. * config/nios2/elf.opt: New file. * config/nios2/linux.h: New file. * config/nios2/nios2.opt: New file. * config/nios2/nios2.md: New file. * config/nios2/predicates.md: New file. * config/nios2/constraints.md: New file. * config/nios2/t-nios2: New file. 
* common/config/nios2/nios2-common.c: New file. * doc/invoke.texi (Nios II options): Document Nios II specific options. * doc/md.texi (Nios II family): Document Nios II specific constraints. * doc/extend.texi (Function Specific Option Pragmas): Document Nios II supported target pragma functionality.
Re: [patch 4/4] -fstrict-volatile-bitfields cleanup v3: remove from defaults on all targets
On 01/09/13 14:10, Bernd Edlinger wrote: IMHO the AAPCS forbids packed structures. Therefore we need not interfere with the C++ memory model if we have unaligned data. The AAPCS neither forbids nor requires packed structures. They're a GNU extension and as such not part of standard C++. Thus the semantics of such an operation are irrelevant to the AAPCS: you get to choose what the behaviour is in this case... R.
Re: RFC - Refactor tree.h
On Sat, Aug 31, 2013 at 1:22 AM, Diego Novillo dnovi...@google.com wrote: Thanks for the suggestions. I've incorporated them into the patch. It now adds tree-core.h with all the structures, enums, typedefs and some fundamental declarations from tree.h. Everything else stays in tree.h for now. As we convert gimple files, we'll move declarations out of tree.h into other headers and rewrite missing functions in the new gimple API. This new patch bootstraps and tests fine on x86_64. OK for trunk? Looks good to me. Richard. 2013-08-30 Diego Novillo dnovi...@google.com * Makefile.in (TREE_CORE_H): Define. (TREE_H): Use. (GTFILES): Add tree-core.h. * builtins.c (built_in_class_names): Use BUILT_IN_LAST to size the array. * tree-core.h: New file. Move all data structures, enum, typedefs, global declarations and constants from ... * tree.h: ... here. Thanks. Diego.
Re: folding (vec_)cond_expr in a binary operation
On Fri, 30 Aug 2013, Richard Biener wrote: On Sat, Jul 6, 2013 at 9:42 PM, Marc Glisse marc.gli...@inria.fr wrote: First, the fold-const bit causes an assertion failure (building libjava) in combine_cond_expr_cond, which calls: t = fold_binary_loc (gimple_location (stmt), code, type, op0, op1); and then checks: /* Require that we got a boolean type out if we put one in. */ gcc_assert (TREE_CODE (TREE_TYPE (t)) == TREE_CODE (type)); which makes sense... Note that the 2 relevant types are: well, the check is probably old and can be relaxed to also allow all compatible types. Ok. Maybe it could even be removed then; we shouldn't check in every function that calls fold_binary_loc that it returns something of the type that was asked for. Second, the way the forwprop transformation is done, it can be necessary to run the forwprop pass several times in a row, which is not nice. It takes: stmt_cond stmt_binop and produces: stmt_folded stmt_cond2 But one of the arguments of stmt_folded could be an ssa_name defined by a cond_expr, which could now be propagated into stmt_folded (there may be other new possible transformations). However, that cond_expr has already been visited and won't be again. The combine part of the pass uses a PLF to revisit statements, but the forwprop part doesn't have any specific mechanism. forwprop is designed to re-process stmts it has folded to catch cascading effects. Now, as it has both a forward and a backward run, you don't catch 2nd-order effects with iterating them. On my TODO is to only do one kind, either forward or backward propagation. My impression is that even internally in the forward part the reprocessing doesn't really work, but that'll disappear anyway when the forward part disappears. Btw, as for the patch I don't like that you basically feed everything into fold. Yes, I know we do that for conditions because that's quite important and nobody has re-written the condition folding as a gimple pattern matcher.
I doubt that this transform is important enough to warrant another exception ;) I am not sure what you want here. When I notice the pattern (a?b:c) op d, I need to check whether b op d and c op d are likely to simplify before transforming it to a?(b op d):(c op d). And currently I can't see any way to do that other than feeding (b op d) to fold. Even if/when we get a gimple folder, we will still need to call it in about the same way. -- Marc Glisse
Re: Symtab cleanup 10/17 remove unnecesary DECL_ARGUMENTS and DECL_RESULT
On Aug 21, 2013, at 11:47 PM, Jan Hubicka hubi...@ucw.cz wrote: The problem is that DECL_ARGUMENTS of the thunk (aka _ZThn528_N1D3fooEv) is used during thunk code-generation, and thunk code-generation happens during the output of D::foo. I see, I will try to modify the i386 backend to not output thunks. The problem indeed is that thunks' arguments are built by the front-end and they are no longer streamed. I am surprised i386 survives this given that it also produces some gimple thunks. I guess the easiest way around is to make them be streamed the same way as we stream functions that are used as abstract origin. I have different plans in this direction - I want to lower thunks to gimple form early so they go through the usual channel and get i.e. the profile read correctly. So, any news on this? I was a bit dragged into other debugging. Sorry for the delay. This patch seems to fix the problems on ppc64 hacked to not use asm thunks. Can you, please, test that it solves the problems on your target? Honza

Index: cgraphunit.c
===
--- cgraphunit.c	(revision 202153)
+++ cgraphunit.c	(working copy)
@@ -1414,14 +1414,18 @@ expand_thunk (struct cgraph_node *node)
   tree virtual_offset = NULL;
   tree alias = node->callees->callee->symbol.decl;
   tree thunk_fndecl = node->symbol.decl;
-  tree a = DECL_ARGUMENTS (thunk_fndecl);
+  tree a;
+
+  if (in_lto_p)
+    cgraph_get_body (node);
+  a = DECL_ARGUMENTS (thunk_fndecl);
 
   current_function_decl = thunk_fndecl;
 
   /* Ensure thunks are emitted in their correct sections.
*/
   resolve_unique_section (thunk_fndecl, 0, flag_function_sections);
 
-  if (this_adjusting
+  if (this_adjusting
       && targetm.asm_out.can_output_mi_thunk (thunk_fndecl, fixed_offset,
					       virtual_value, alias))
     {

Index: lto-streamer-in.c
===
--- lto-streamer-in.c	(revision 202153)
+++ lto-streamer-in.c	(working copy)
@@ -998,6 +998,7 @@ input_function (tree fn_decl, struct dat
   free_dominance_info (CDI_DOMINATORS);
   free_dominance_info (CDI_POST_DOMINATORS);
   free (stmts);
+  pop_cfun ();
 }
 
@@ -1086,8 +1087,6 @@ lto_read_body (struct lto_file_decl_data
   /* Restore decl state */
   file_data->current_decl_state = file_data->global_decl_state;
-
-  pop_cfun ();
 }
   lto_data_in_delete (data_in);

Index: lto-streamer-out.c
===
--- lto-streamer-out.c	(revision 202153)
+++ lto-streamer-out.c	(working copy)
@@ -1982,8 +1982,7 @@ lto_output (void)
       cgraph_node *node = dyn_cast <cgraph_node> (snode);
       if (node
	  && lto_symtab_encoder_encode_body_p (encoder, node)
-	  && !node->symbol.alias
-	  && !node->thunk.thunk_p)
+	  && !node->symbol.alias)
	{
 #ifdef ENABLE_CHECKING
	  gcc_assert (!bitmap_bit_p (output, DECL_UID (node->symbol.decl)));
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
On Mon, Sep 2, 2013 at 12:52 PM, Jan Hubicka hubi...@ucw.cz wrote: [long quote of the earlier discussion and patch trimmed] + tree a; + + if (in_lto_p) + cgraph_get_body (node); That looks gross. It cannot possibly be the correct fix for this. Richard.
[PATCH] Preserve more pointer arithmetic in IVOPTs
tree-affine is a bit overzealous when converting elements of pointer-typed combinations to sizetype. From IVOPTs we often get a combination that doesn't start with a pointer element, in which case the result was a pure sizetype compute. This shows when fixing PR57511 in gcc.dg/tree-ssa/reassoc-19.c where, after the fix, we detect induction variables but then break the pattern the test expects. So the following re-arranges add_elt_to_tree to preserve pointerness when possible. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2013-09-02 Richard Biener rguent...@suse.de * tree-affine.c (add_elt_to_tree): Avoid converting all pointer arithmetic to sizetype. * gcc.dg/tree-ssa/loop-4.c: Adjust scan looking for one memory reference. Index: gcc/tree-affine.c === *** gcc/tree-affine.c (revision 202097) --- gcc/tree-affine.c (working copy) *** add_elt_to_tree (tree expr, tree type, t *** 377,411 type1 = sizetype; scale = double_int_ext_for_comb (scale, comb); ! elt = fold_convert (type1, elt); if (scale.is_one ()) { if (!expr) ! return fold_convert (type, elt); ! if (POINTER_TYPE_P (type)) ! return fold_build_pointer_plus (expr, elt); ! return fold_build2 (PLUS_EXPR, type, expr, elt); } if (scale.is_minus_one ()) { if (!expr) ! return fold_convert (type, fold_build1 (NEGATE_EXPR, type1, elt)); ! if (POINTER_TYPE_P (type)) ! { ! elt = fold_build1 (NEGATE_EXPR, type1, elt); ! return fold_build_pointer_plus (expr, elt); ! } ! return fold_build2 (MINUS_EXPR, type, expr, elt); } if (!expr) ! return fold_convert (type, !fold_build2 (MULT_EXPR, type1, elt, ! double_int_to_tree (type1, scale))); if (scale.is_negative ()) { --- 377,422 type1 = sizetype; scale = double_int_ext_for_comb (scale, comb); ! ! if (scale.is_minus_one () && POINTER_TYPE_P (TREE_TYPE (elt))) ! { ! elt = fold_build1 (NEGATE_EXPR, sizetype, convert_to_ptrofftype (elt)); ! scale = double_int_one; ! } if (scale.is_one ()) { if (!expr) ! return elt; ! if (POINTER_TYPE_P (TREE_TYPE (expr))) ! 
return fold_build_pointer_plus (expr, convert_to_ptrofftype (elt)); ! if (POINTER_TYPE_P (TREE_TYPE (elt))) ! return fold_build_pointer_plus (elt, convert_to_ptrofftype (expr)); ! return fold_build2 (PLUS_EXPR, type1, ! fold_convert (type1, expr), ! fold_convert (type1, elt)); } if (scale.is_minus_one ()) { if (!expr) ! return fold_build1 (NEGATE_EXPR, TREE_TYPE (elt), elt); ! if (POINTER_TYPE_P (TREE_TYPE (expr))) ! return fold_build_pointer_plus ! (expr, convert_to_ptrofftype !(fold_build1 (NEGATE_EXPR, TREE_TYPE (elt), elt))); ! return fold_build2 (MINUS_EXPR, type1, ! fold_convert (type1, expr), ! fold_convert (type1, elt)); } + elt = fold_convert (type1, elt); if (!expr) ! return fold_build2 (MULT_EXPR, type1, elt, ! double_int_to_tree (type1, scale)); if (scale.is_negative ()) { *** add_elt_to_tree (tree expr, tree type, t *** 417,429 elt = fold_build2 (MULT_EXPR, type1, elt, double_int_to_tree (type1, scale)); ! if (POINTER_TYPE_P (type)) { if (code == MINUS_EXPR) elt = fold_build1 (NEGATE_EXPR, type1, elt); return fold_build_pointer_plus (expr, elt); } ! return fold_build2 (code, type, expr, elt); } /* Makes tree from the affine combination COMB. */ --- 428,441 elt = fold_build2 (MULT_EXPR, type1, elt, double_int_to_tree (type1, scale)); ! if (POINTER_TYPE_P (TREE_TYPE (expr))) { if (code == MINUS_EXPR) elt = fold_build1 (NEGATE_EXPR, type1, elt); return fold_build_pointer_plus (expr, elt); } ! return fold_build2 (code, type1, ! fold_convert (type1, expr), elt); } /* Makes tree from the affine combination COMB. */ Index: gcc/testsuite/gcc.dg/tree-ssa/loop-4.c === *** gcc/testsuite/gcc.dg/tree-ssa/loop-4.c (revision 202097) --- gcc/testsuite/gcc.dg/tree-ssa/loop-4.c (working copy) *** void xxx(void) *** 37,43 /* { dg-final { scan-tree-dump-times \\* \[^\\n\\r\]*= 0 optimized } } */ /* { dg-final { scan-tree-dump-times \[^\\n\\r\]*= \\* 0 optimized } } */ ! 
/* { dg-final { scan-tree-dump-times MEM 1 optimized } } */ /* And the original induction variable should be
[PATCH] Fix PR57511
The fix for PR5/40281 was too broad, which results in unhandled IVs in some loops and thus missed optimizations. The PR specifically talks about missed final value replacement, but I've seen IVOPTs failing to detect BIVs like for gcc.dg/tree-ssa/reassoc-19.c. The following again allows SCEVs like { 0, +, { 1, +, 1 }_1 }_1, thus non-linear IVs. Re-bootstrap and testing are running on x86_64-unknown-linux-gnu. Richard. 2013-09-02 Richard Biener rguent...@suse.de PR middle-end/57511 * tree-scalar-evolution.c (instantiate_scev_name): Allow non-linear SCEVs. * gcc.dg/tree-ssa/sccp-1.c: New testcase. Index: gcc/tree-scalar-evolution.c === *** gcc/tree-scalar-evolution.c (revision 202050) --- gcc/tree-scalar-evolution.c (working copy) *** instantiate_scev_name (basic_block insta *** 2252,2257 --- 2252,2258 else if (res != chrec_dont_know) { if (inner_loop + && def_bb->loop_father != inner_loop && !flow_loop_nested_p (def_bb->loop_father, inner_loop)) /* ??? We could try to compute the overall effect of the loop here. */ res = chrec_dont_know; Index: gcc/testsuite/gcc.dg/tree-ssa/sccp-1.c === *** gcc/testsuite/gcc.dg/tree-ssa/sccp-1.c (revision 0) --- gcc/testsuite/gcc.dg/tree-ssa/sccp-1.c (working copy) *** *** 0 --- 1,15 + /* { dg-do compile } */ + /* { dg-options -O2 -fdump-tree-optimized } */ + + int main(int argc, char* argv[]) + { + int i, a = 0; + for (i=0; i < 10; i++) + a += i + 0xff00ff; + return a; + } + + /* There should be no loop left. */ + + /* { dg-final { scan-tree-dump-times goto 0 optimized } } */ + /* { dg-final { cleanup-tree-dump optimized } } */
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
+ tree a; + + if (in_lto_p) + cgraph_get_body (node); That looks gross. It cannot possibly be the correct fix for this. DECL_ARGUMENTS/DECL_RESULT are now part of the function body. cgraph_get_body is there to load it for you when you need it. We are going to expand the function, so it makes sense to get it. The same is done by the pass manager when a function is going to be expanded. The only difference here is that thunks do not go through the pass manager. I can drop in_lto_p (the function does nothing when the body is already there) Honza
Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy
Hi, On Fri, 30 Aug 2013, David Malcolm wrote: Here's the result of a pair of builds of r202029 without and with the patches, configured with --enable-checking=release, running make, then stripping debuginfo [1] So the overall sizes of such binaries are essentially unchanged. Yep, cool. Any suggestions on what to compile to compare performance? By 60 seconds, do you mean 60s for one TU, or do you mean a large build e.g. the linux kernel? For one TU, so as to not measure any significant execve, assembler or IO overhead. Some people compile e.g. preprocessed variants of some GCC source file; I myself have been measuring such speeds for some years with unchanged versions of the attached (big-code.c, several large functions with arithmetic and one-deep loops) which can be customized by hand to produce different largeness, and kdecore.cc [1] (real-world C++ library code from 2009). You can customize big-code.c to match some compile-time goal by commenting out some FUNC invocation for some types, or by changing the body size of the functions by fiddling in the FUNC macro itself to invoke more of the L2's or even L3's, but that's really slow. And the manual GTY markers are so not maintainable in the long run, gengtype or something else really needs to be taught to create them automatically. Apart from the GTY aspect, how do people feel about the patch series? Generally I like it, the parts I don't like are common to all the C++ification patches [2]. The manual GTY parts give me the creeps, but I think I expressed that already :) Ciao, Michael. [1] http://frakked.de/~matz/kdecore.cc.gz [2] In this series it's the is_a/has_a/dyn_cast uglification, - gcc_gimple_checking_assert (gimple_has_mem_ops (g)); - g->gsmembase.vuse = vuse; + gimple_statement_with_memory_ops *mem_ops_stmt = + as_a <gimple_statement_with_memory_ops> (g); + mem_ops_stmt->vuse = vuse; I can't really say I find this shorter, easier to read, more expressive or even safer than what was there before. 
And the repetition for adding the helpers for const and non-const types all the time doesn't make it better. (Btw: something for your helper script: it's customary to put the '=' on the next line; I know there is precedent for your variant, but '=' on the next line is prevalent) extern int get_i(void); typedef signed char schar; typedef unsigned char uchar; typedef unsigned short ushort; typedef unsigned int uint; typedef unsigned long ulong; typedef long long llong; typedef unsigned long long ullong; typedef long double ldouble; #define BODY(T) \ for (i = get_i(); i; i--) { \ accum += do_1_ ## T(accum, x, y, z); \ switch (get_i()) { \ case 1: case 3: case 34: case 123: \ if (z*x > y) \ x = do_2_ ## T (x * y + z); \ else \ y = do_3_ ## T (x / y - z); \ break; \ case 6: case 43: case -2: \ z = do_4_ ## T ((int)z); \ break; \ }\ for (j = 1; j < get_i (); j++) { \ a[i+j] = accum * b[i+j*get_i()] / c[j]; \ b[j] = b[j] - a[j-1]*x; \ x += b[j-1] + b[j+1]; \ y *= c[j*i] += a[0] + b[0] * c[(int)x]; \ } \ for (j = 0; j < 8192; j++) { \ a[j] = a[j] + b[j] * c[j]; \ } \ } #define L1(T) BODY(T) BODY(T) BODY(T) BODY(T) BODY(T) BODY(T) BODY(T) BODY(T) #define L2(T) L1(T) L1(T) L1(T) L1(T) L1(T) L1(T) L1(T) L1(T) #define L3(T) L2(T) L2(T) L2(T) L2(T) L2(T) L2(T) L2(T) L2(T) #define FUNC(T) \ extern T do_1_ ## T(T, T, T, int); \ extern T do_2_ ## T(T); \ extern T do_3_ ## T(T); \ extern T do_4_ ## T(T); \ T func_ ## T (T x, T y, T z, T *a, T *b, T *c) \ { \ T accum = 0; \ int i, j; \ L2(T) \ /*L2(T)*/ \ return accum; \ } FUNC(schar) FUNC(uchar) FUNC(short) FUNC(ushort) FUNC(int) FUNC(uint) FUNC(long) FUNC(ulong) FUNC(llong) FUNC(ullong) FUNC(float) FUNC(double) FUNC(ldouble)
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
On Mon, Sep 2, 2013 at 1:43 PM, Jan Hubicka hubi...@ucw.cz wrote: + tree a; + + if (in_lto_p) +cgraph_get_body (node); That looks gross. It cannot possibly be the correct fix for this. DECL_ARGUMENTS/DECL_RESULT are now part of function body. Well, there is still fallout from this change so I'm not convinced this will stay this way. Also we stream the function-decl that refers to these fields in the global section which means we have a layering violation. Which means DECL_ARGUMENTS and DECL_RESULT should be moved to struct function? Of course frontends may not be happy with that (given DECL_ARGUMENTS is also used in function declarations) Please consider reverting these changes (at least the DECL_ARGUMENTS one). cgraph_get_body is there to load it for you when you need it. We are going to expand the function so it makes sense to get it. Then all DECL_ARGUMENTS/DECL_RESULT users need to be audited, no? Richard. The same is done by the passmanager when function is going to be expanded. Only difference here is that thunks do not go through the passmanager. I can drop in_lto_p (the function does nothing when body is already there) Honza
Re: [PING] Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Hello, what is the blocking point for GCC integration? It was accepted by the SC and all issues of the last review have been addressed (at least this is my impression). Is it that none of the persons with global write permission seems to be responsible? The Binutils port has been available since 2013-02-06. The Newlib port has been available since 2013-05-06. The last step of an official FSF/Newlib tool chain for the Altera Nios II is the GCC part. On 2013-09-02 11:38, Chung-Lin Tang wrote: Ping. On 13/8/20 10:57 AM, Chung-Lin Tang wrote: Ping. BTW, the SC has approved the Nios II port already: http://gcc.gnu.org/ml/gcc/2013-07/msg00434.html The port is still awaiting technical review. Thanks, Chung-Lin On 13/7/14 3:54 PM, Chung-Lin Tang wrote: Hi, the last ping of the Nios II patches was: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01416.html After assessing the state, we feel it would be better to post a re-submission of the newest patches. The changes accumulated since the original post include: 1) Several bug fixes related to built-in function expanding. 2) A few holes in hard-float FPU code generation were plugged. 3) Support for parsing whitespace in target attributes. 4) Revision of the consistency-check behavior of codes in custom instruction built-ins. 5) Some new testcases. The issues raised by Joseph in the first round of reviewing have been addressed. Testing has been re-done on both 32-bit and 64-bit hosts. PR55035 appears to not have been resolved yet, which affects nios2 among several other targets; thus, configuring with --enable-werror-always still does not build. As before, Sandra and I will serve as nios2 port maintainers. Attached is the patch for the compiler proper. Thanks, Chung-Lin -- Sebastian Huber, embedded brains GmbH Address : Dornierstr. 4, D-82178 Puchheim, Germany Phone : +49 89 189 47 41-16 Fax : +49 89 189 47 41-09 E-Mail : sebastian.hu...@embedded-brains.de PGP : Public key available on request. 
Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy
Hi, On Fri, Aug 30, 2013 at 03:21:22PM -0400, David Malcolm wrote: Apart from the GTY aspect, how do people feel about the patch series? FWIW I have vague thoughts about doing something similar for tree - doing so *might* give an easier route to the type vs expression separation that Andrew spoke about at the Cauldron rearchitecture BoF. (I started looking at doing a similar C++-ification of rtx, but... gah) I like it, but before you start looking at the bigger things, could you perhaps proceed with the symtab? It has much fewer classes, will probably affect the private development of fewer people, and the accessor macros/functions of symtab are less developed, so it will immediately really make the code nicer; Honza has approved it and I'm really looking forward to it. Also, perhaps it will show us at a much smaller scale potential problems with the general scheme. I'm only writing this because the development there seems a bit stalled and it is a shame. Of course, you may want to simplify the manual markings first. I'd perfectly understand that. Thanks, Martin
Re: [RFC] Changes to the wide-int classes
On 09/02/2013 05:35 AM, Richard Biener wrote: On Mon, 2 Sep 2013, Richard Sandiford wrote: Kenneth Zadeck zad...@naturalbridge.com writes: There is no place for exactly two HWIs in the machine independent parts of the compiler, I totally agree. In fact I was taking that so much for granted that I didn't even think to add a rider about it, sorry. I didn't mean to imply that we should keep double_int around. I think the reason for doing this is to prove that it can be done (so that the wide_int code isn't too big a change for the tree level) and to make it easier to merge the wide-int patches into trunk piecemeal if we need to. Note that changing the tree rep to non-double_int is easy. Also note that I only want 'double_int' to stay around to have a fast type that can work on two HWIs for the code that needs more precision than a HWI. The only important cases I know of are in get_inner_reference and get_ref_base_and_extent and friends. Those are heavily used (and the only double-int function callers that even show up in regular cc1 profiles). So if wide_int_fixed2 ('2' better be replaced with 'number-that-gets-me-twice-target-sizetype-precision') works for those cases then fine (and we can drop double-ints). This is what the addr_wide_int is for - it sniffs the port and is guaranteed to be big enough to hold the largest pointer, plus 4 bits (3 bits for bit position and 1 bit so that unsigned things come over with no loss); that is then rounded up to the next HWI. (I will admit that the sniffing code needs a little work but that can be fixed without changing the api). kenny
Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL
On Wed, Jul 3, 2013 at 2:25 PM, Kugan kugan.vivekanandara...@linaro.org wrote: On 17/06/13 18:33, Richard Biener wrote: On Mon, 17 Jun 2013, Kugan wrote: +/* Extract the value range of the assigned expression for a GIMPLE_ASSIGN stmt. + If the extracted value range is valid, return true else return + false. */ +static bool +extract_exp_value_range (gimple stmt, value_range_t *vr) +{ + gcc_assert (is_gimple_assign (stmt)); + tree rhs1 = gimple_assign_rhs1 (stmt); + tree lhs = gimple_assign_lhs (stmt); + enum tree_code rhs_code = gimple_assign_rhs_code (stmt); ... @@ -8960,6 +9016,23 @@ simplify_stmt_using_ranges (gimple_stmt_iterator *gsi) { enum tree_code rhs_code = gimple_assign_rhs_code (stmt); tree rhs1 = gimple_assign_rhs1 (stmt); + tree lhs = gimple_assign_lhs (stmt); + + /* Set value range information for ssa. */ + if (!POINTER_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))) + && (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME) + && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))) + && !SSA_NAME_RANGE_INFO (lhs)) + { + value_range_t vr = VR_INITIALIZER; ... + if (extract_exp_value_range (stmt, &vr)) + tree_ssa_set_value_range (lhs, + tree_to_double_int (vr.min), + tree_to_double_int (vr.max), + vr.type == VR_RANGE); + } This looks overly complicated to me. In vrp_finalize you can simply do for (i = 0; i < num_vr_values; i++) if (vr_value[i]) { tree name = ssa_name (i); if (POINTER_TYPE_P (TREE_TYPE (name))) continue; if (vr_value[i].type == VR_RANGE || vr_value[i].type == VR_ANTI_RANGE) tree_ssa_set_value_range (name, tree_to_double_int (vr_value[i].min), tree_to_double_int (vr_value[i].max), vr_value[i].type == VR_RANGE); } Thanks Richard for taking the time to review it. I was doing something like what you are suggesting earlier but noticed some problems, and that's the reason why I changed it. 
For example, for the following testcase from the test suite, unsigned long l = (unsigned long)-2; unsigned short s; int main () { long t = l + 1; s = l; if (s != (unsigned short) -2) abort (); exit (0); } with the following gimple stmts main () { short unsigned int s.1; long unsigned int l.0; ;; basic block 2, loop depth 0 ;; pred: ENTRY l.0_2 = l; s.1_3 = (short unsigned int) l.0_2; s = s.1_3; if (s.1_3 != 65534) goto <bb 3>; else goto <bb 4>; ;; succ: 3 ;; 4 ;; basic block 3, loop depth 0 ;; pred: 2 abort (); ;; succ: ;; basic block 4, loop depth 0 ;; pred: 2 exit (0); ;; succ: } has the following value ranges. l.0_2: VARYING s.1_3: [0, +INF] From the zero/sign extension point of view, the variable s.1_3 is expected to have a value that will overflow (or varying) as this is what is assigned to a smaller variable. extract_range_from_assignment initially calculates the value range as VARYING but it is later changed to [0, +INF] by extract_range_basic. What I need here is the value that will be assigned from the rhs expression and not the value that we will have after the assignment. I don't understand this. The relevant statement is s.1_3 = (short unsigned int) l.0_2; right? You have value-ranges for both s.1_3 and l.0_2 as above. And you clearly cannot optimize the truncation away (and if you could, you wouldn't need value-range information for that fact). I understand that the above code of mine needs to be changed but I am not convinced about the best way to do that. I can possibly re-factor extract_range_from_assignment to give me this information with an additional argument. Could you kindly let me know your preference. /* SSA name annotations. */ + union vrp_info_type { + /* Pointer attributes used for alias analysis. */ + struct GTY ((tag (TREE_SSA_PTR_INFO))) ptr_info_def *ptr_info; + /* Value range attributes used for zero/sign extension elimination. */ /* Value range information. 
*/ + struct GTY ((tag (TREE_SSA_RANGE_INFO))) range_info_def *range_info; + } GTY ((desc (%1.def_stmt && !POINTER_TYPE_P (TREE_TYPE ((tree)&%1) vrp; why do you need to test %1.def_stmt here? I have seen some tree_ssa_name with def_stmt NULL. That's why I added this. Is that something that should never happen? It should never happen - they should have a GIMPLE_NOP. +void +set_range_info (tree name, double_int min, + double_int max, bool vr_range) you have some whitespace issues here (please properly use tabs) + /* Allocate if not available. */ + if (ri == NULL) + { + ri = ggc_alloc_cleared_range_info_def (); + mark_range_info_unknown (ri); that
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
Well, there is still fallout from this change so I'm not convinced this will stay this way. Also we stream the function-decl that refers to these fields in As far as I know there are two problems 1) problem with the thunk expansion not getting DECL_ARGUMENTS/DECL_RESULT, addressed by this patch 2) ICE in gcc.dg/torture/pr8081.c Here the problem is in the variable-sized type of the return value that gets streamed twice, and we end up with a duplicate. As mentioned earlier, I want to handle this by introducing variable_sized_function_type_p that will make those go specially the old way (with DECL_ARGUMENTS/DECL_RESULT in the global stream). I did not send the patch, because I think the variable-sized parameters were handled incorrectly before my changes, too, and the fix is not so trivial. It just makes us not ICE the same way as before. The testcase is: /* { dg-do run } */ extern void abort (void); int main (int argc, char **argv) { int size = 10; typedef struct { char val[size]; } block; block a, b; block __attribute__((noinline)) retframe_block () { return *(block *) &b; } b.val[0] = 1; b.val[9] = 2; a=retframe_block (); if (a.val[0] != 1 || a.val[9] != 2) abort (); return 0; } So here the size is not a parameter of the function, it is a local variable (later turned into a structure) that gets into the local function stream. 
The type is: record_type 0x77511930 block sizes-gimplified type_0 type_1 BLK size var_decl 0x7752f428 D.1738 type integer_type 0x7740d0a8 bitsizetype public unsigned TI size integer_cst 0x773f7fc0 constant 128 unit size integer_cst 0x773f7fe0 constant 16 align 128 symtab 0 alias set -1 canonical type 0x7740d0a8 precision 128 min integer_cst 0x7740f000 0 max integer_cst 0x773f7fa0 -1 used unsigned ignored TI file t.c line 8 col 19 size integer_cst 0x773f7fc0 128 unit size integer_cst 0x773f7fe0 16 align 128 context function_decl 0x77510f00 main chain var_decl 0x7752f4c0 D.1739 type integer_type 0x7740d738 long int used ignored DI file t.c line 10 col 20 size integer_cst 0x773f7f40 constant 64 unit size integer_cst 0x773f7f60 constant 8 align 64 context function_decl 0x77510f00 main chain var_decl 0x7752f558 D.1740 unit size var_decl 0x7752f2f8 D.1736 type integer_type 0x7740d000 sizetype public unsigned DI size integer_cst 0x773f7f40 64 unit size integer_cst 0x773f7f60 8 align 64 symtab 0 alias set -1 canonical type 0x7740d000 precision 64 min integer_cst 0x773f7f80 0 max integer_cst 0x773f7000 18446744073709551615 used unsigned ignored DI file t.c line 8 col 19 size integer_cst 0x773f7f40 64 unit size integer_cst 0x773f7f60 8 align 64 context function_decl 0x77510f00 main chain var_decl 0x7752f390 D.1737 type integer_type 0x7740d0a8 bitsizetype used unsigned ignored TI file t.c line 8 col 19 size integer_cst 0x773f7fc0 128 unit size integer_cst 0x773f7fe0 16 align 128 context function_decl 0x77510f00 main chain var_decl 0x7752f428 D.1738 align 8 symtab 0 alias set -1 canonical type 0x77511738 fields field_decl 0x7752f098 val type array_type 0x77511888 type integer_type 0x7740d3f0 char sizes-gimplified type_1 BLK size var_decl 0x7752f428 D.1738 unit size var_decl 0x7752f2f8 D.1736 align 8 symtab 0 alias set -1 structural equality domain integer_type 0x775117e0 decl_0 BLK file t.c line 10 col 20 size var_decl 0x7752f428 D.1738 unit size var_decl 0x7752f2f8 D.1736 
align 8 offset_align 128 offset integer_cst 0x773f7f80 constant 0 bit offset integer_cst 0x7740f000 constant 0 context record_type 0x77511738 context function_decl 0x77510f00 main pointer_to_this pointer_type 0x77511bd0 chain type_decl 0x7753 D.1726 My understanding is that we end up with silently putting automatic variable D.1738 into global stream and we happen to not explode later. At stream in time we will have duplicated D.1738 parameters and it is only becuase the one in TYPE_SIZE is unused making us to not ICE. I also do not see how we can produce valid debug info for the nested
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
Well, there is still fallout from this change so I'm not convinced this will stay this way. [...] So here the size is not a parameter of the function, it is a local variable (later turned into a structure) that gets into the local function stream. ^^^ Reading my email, perhaps in this case the correct fix is to make tree-nested fix up the TYPE_SIZE? It knows how to access the frame variables from the outer function via the invisible link pointer passed, and it knows how to update gimple. That may also fix the issue with debug info.
Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C
+ case CILK_SYNC_STMT: + { + if (!cfun->cilk_frame_decl) + { + error_at (input_location, expected %<_Cilk_spawn%> before + %<_Cilk_sync%>); + ret = GS_ERROR; + } First, surely you have a location you can use, instead of the generic input_location (perhaps the location for the CILK_SYNC_STMT?). Also, can you not check for this in c_finish_cilk_sync_stmt(), or the corresponding code -- that is, in the FE somewhere? And hopefully, in a place you can share with the C++ FE? If it is a real pain, I am willing to let this go, since it happens only in the Cilk code path, though the general trend (especially with Andrew's proposed changes) is to do all type checking as close to the source as possible. If you look at the codes above (e.g. TRUTH_XOR_EXPR), they all use input_location. Also, in the beginning of the function there is a line like this: if (save_expr != error_mark_node && EXPR_HAS_LOCATION (*expr_p)) input_location = EXPR_LOCATION (*expr_p); Yes, that is old legacy code. The present move is to steer away from input_location altogether. For the 2nd point, there can be a case where (with the help of gotos) _Cilk_sync can come before _Cilk_spawn. So, the only way to do this check is to do it after the entire function is parsed. Fair enough. Alright, I'm fine with this current incantation. Thanks for all your hard work taking care of the things I've pointed out. It's now up to one of the core maintainers to take it from here. Aldy
Re: Eliminate vectorizer analysis side effects
On Fri, Aug 30, 2013 at 11:34 PM, Xinliang David Li davi...@google.com wrote: On Fri, Aug 30, 2013 at 1:23 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Aug 30, 2013 at 1:28 AM, Xinliang David Li davi...@google.com wrote: I was debugging a runtime failure of SPEC06 xalancbmk built with LIPO. Using the -fdisable-pass option pinpoints the problem in the slp vectorize pass on a particular function. dbgcnt support is added to track down the individual BB, but it fails even when the dbg count is set to 0. It turns out that no BB was actually vectorized for that function, but turning slp-vectorize on/off does make a difference in generated code -- the only difference between the good and bad case is stack layout. The problem is in the alignment analysis phase -- it speculatively changes the base declaration's alignment regardless of whether the vectorization transformation will be performed later or not. The attached patch fixes the problem. Testing is ok. Ok for trunk? Not in this form. I'd rather not put extra fields in the data-refs this way. (As for the xalancbmk runtime problem - doesn't this patch just paper over the real issue?) I believe it is stack-limit related. This program has some recursive call chains that can generate a call frame ~9k deep. The vectorizer side effect causes the affected function in the call frame to grow ~100 bytes in stack size. Since this function appears lots of times in the call stack, the overall stack consumption increases a lot. Combined with the aggressive cross-module inlining, it ends up blowing the stack limit. For BB SLP you still adjust DR bases that do not take part in vectorization - the DR vector contains all refs in the BB, not just those in the SLP instances we are going to vectorize. So the commit of the re-aligning decision should be done from where we vectorize the DR, in vectorizable_load/store in its transform phase. 
If we decide to integrate the fields into the data-ref then the analysis and commit parts should go into the data-ref machinery as well. Otherwise the vectorizer should use data-ref->aux or some other way to hang off its private data. Good point. Other than that, modifying alignment of variables that are not vectorized is indeed a bug worth fixing. The new version of the patch is attached. Ok for trunk after testing? +/* A helper function to free data refs. */ + +void +destroy_datarefs (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo) please rename to vect_destroy_datarefs @@ -431,6 +460,7 @@ static unsigned int execute_vect_slp (void) { basic_block bb; + bb_vec_info bb_vinfo; init_stmt_vec_info_vec (); @@ -438,8 +468,11 @@ execute_vect_slp (void) { vect_location = find_bb_location (bb); - if (vect_slp_analyze_bb (bb)) + if ((bb_vinfo = vect_slp_analyze_bb (bb))) { spurious change? bb_vinfo seems to be unused. +typedef struct _dataref_aux { + tree base_decl; + bool base_misaligned; + int misalignment; +} dataref_aux; we no longer need that typedef stuff with C++ ... + gcc_assert (dr->aux); + ((dataref_aux *) dr->aux)->base_decl = base; + ((dataref_aux *) dr->aux)->base_misaligned = true; dereferencing dr->aux will trap, so no need to assert (dr->aux). Please add a comment before this code explaining what this is for. -vectorizable_store (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt, - slp_tree slp_node) +vectorizable_store_1 (gimple stmt, stmt_vec_info stmt_info, + struct data_reference *dr, + gimple_stmt_iterator *gsi, gimple *vec_stmt, + slp_tree slp_node) ... +/* A wrapper function to vectorizable_store. */ + +static bool +vectorizable_store (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt, +slp_tree slp_node) +{ it shouldn't be necessary to split the functions, after if (!vec_stmt) /* transformation not required.
*/ { STMT_VINFO_TYPE (stmt_info) = store_vec_info_type; vect_model_store_cost (stmt_info, ncopies, store_lanes_p, dt, NULL, NULL, NULL); return true; } /** Transform. **/ the function is no longer allowed to fail, so at this point you can call ensure_base_align. Similar for the load case. Ok with those minor cases fixed. Thanks, Richard. thanks, David Richard. thanks, David
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
On Mon, Sep 2, 2013 at 3:02 PM, Jan Hubicka hubi...@ucw.cz wrote: Well, there is still fallout from this change so I'm not convinced this will stay this way. Also we stream the function-decl that refers to these fields in As far as I know there are two problems 1) problem with the thunk expansion not getting DECL_ARGUMENTS/DECL_RESULT addressed by this patch 2) ICE in gcc.dg/torture/pr8081.c Here the problem is in the variable type of the return value that gets streamed twice and we end up with a duplicate. As mentioned earlier, I want to handle this by introducing variable_sized_function_type_p that will make those go specially the old way (with DECL_ARGUMENTS/DECL_RESULT in the global stream). I did not send the patch, because I think the variable sized parameters were handled incorrectly before my changes, too, and the fix is not so trivial. It just makes us not ICE the same way as before. The testcase is: /* { dg-do run } */ extern void abort (void); int main (int argc, char **argv) { int size = 10; typedef struct { char val[size]; } block; block a, b; block __attribute__((noinline)) retframe_block () { return *(block *) &b; } b.val[0] = 1; b.val[9] = 2; a=retframe_block (); if (a.val[0] != 1 || a.val[9] != 2) abort (); return 0; } So here the size is not a parameter of the function, it is a local variable (later turned into a structure) that gets into the local function stream. ^^^ reading my email, perhaps in this case the correct fix is to make tree-nested fix up the TYPE_SIZE? It knows how to access the frame variables from the outer function via the invisible link pointer passed and it knows how to update gimple. That may also fix the issue with debug info. But we still refer to the local entity from the TREE_TYPE of the function decl, no? As for debug info we probably have to insert proper DEBUG_STMTs that tell us how to construct those. Don't we do this kind of dance already?
All the gimplified type-sizes in TYPE_SIZE are basically 'useless'; they exist only for the gimplifier's sake and as a 'marker' that the type has variable size. All references to the size are explicit in the code (and I believe debug info has to be emitted before gimplifying, or it doesn't properly handle such variadic types at all). That's some huge cleanup opportunity (I suppose NULLifying TYPE_SIZE doesn't work). Richard.
Re: Remove hash from remember_with_vars
Hi, unfortunately this patch ICEs on the following testcase /* This used to fail on SPARC with an unaligned memory access. */ void foo(int n) { struct S { int i[n]; unsigned int b:1; int i2; } __attribute__ ((packed)) __attribute__ ((aligned (4))); struct S s; s.i2 = 0; } int main(void) { foo(4); return 0; } (in a testsuite I must have missed during testing) because it checks that the field offset is always non-variable. I merely copied this sanity check from the identical one in lto_fixup_prevailing_decls that however does not fire because it needs mentions_vars_p_field_decl to record the var first, and that never happens. This patch fixes it by adding a fixup. Here the variable is local, but moving n to file scope is allowed. I have bootstrapped/regtested ppc64-linux, will commit it as obvious. I apologize for the breakage. I will prioritize fixing the fallout I caused this week. Honza Index: ChangeLog === --- ChangeLog (revision 202173) +++ ChangeLog (working copy) @@ -1,5 +1,10 @@ 2013-08-31 Jan Hubicka j...@suse.cz + * lto.c (mentions_vars_p_field_decl, lto_fixup_prevailing_decls): + DECL_FIELD_OFFSET can contain a reference to a variable. + +2013-08-31 Jan Hubicka j...@suse.cz + * lto.c (tree_with_vars): Turn into vector. (MAYBE_REMEMBER_WITH_VARS): Change to... (CHECK_VAR): ... this one.
Index: lto.c === --- lto.c (revision 202153) +++ lto.c (working copy) @@ -1389,7 +1389,7 @@ mentions_vars_p_field_decl (tree t) { if (mentions_vars_p_decl_common (t)) return true; - CHECK_NO_VAR (DECL_FIELD_OFFSET (t)); + CHECK_VAR (DECL_FIELD_OFFSET (t)); CHECK_NO_VAR (DECL_BIT_FIELD_TYPE (t)); CHECK_NO_VAR (DECL_QUALIFIER (t)); CHECK_NO_VAR (DECL_FIELD_BIT_OFFSET (t)); @@ -3207,7 +3207,7 @@ lto_fixup_prevailing_decls (tree t) LTO_SET_PREVAIL (DECL_FUNCTION_PERSONALITY (t)); if (CODE_CONTAINS_STRUCT (code, TS_FIELD_DECL)) { - LTO_NO_PREVAIL (DECL_FIELD_OFFSET (t)); + LTO_SET_PREVAIL (DECL_FIELD_OFFSET (t)); LTO_NO_PREVAIL (DECL_BIT_FIELD_TYPE (t)); LTO_NO_PREVAIL (DECL_QUALIFIER (t)); LTO_NO_PREVAIL (DECL_FIELD_BIT_OFFSET (t));
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
But we still refer to the local entity from TREE_TYPE of the function decl, no? depending on the definition of local entity. I thought we were discussing if PARM_DECL is local or not. I spent some time thinking about the whole streaming scheme and I think we do have important side cases handled incorrectly. Let's try to discuss things. I will try to summarize my understanding to make sure we are in sync and I do not miss something important. So we now have two types of streams - the global stream: this stream is one per compilation unit and it gets loaded into WPA and merged with other streams by tree merging and LTO-symtab resolution. - the local stream: this stream is local to a function (and hopefully to variable initializers soon, too); it is read when we need the function body and it is not in any way merged with the global stream, except that the references to global streams are resolved after merging. We do have a way to express references from the local stream to the global stream. We however can not express A) references from the global stream to the local stream B) references in between two different local streams. The decision what should go to the local or global stream is basically motivated by 1) everything needed for interprocedural optimization has to be global 2) everything related to function bodies should be local. For tree nodes, the decision about where the node goes is made by tree_is_indexable. The current set of rules is - parm_decl/result_decl is local (before my change result_decl was global and parm_decl was local; why those were not the same escapes me, but it needed an ugly fixup in ipa-prop that needed to reconnect global DECL pointers to local DECL pointers after streaming in) - function variables are local if they are not static. (Even external declarations are local, not only automatic vars.)
- Types are global unless they are variably modified types (The variably modified type code walks into subtypes, so even pointers to variably modified types are local, as are function declarations mentioning these.) - FIELD_DECL is global unless it is part of a variably modified type. - SSA_NAME is local despite the logic of tree_is_indexable because it is handled separately as part of gimple, and gimple is always local. - LABEL_DECLs are global even when their address is not taken and it would make more sense to hide them in the body sections. Now I think we are safe only when our references never violate A) and B) and moreover C) the same entity that needs to be unique (such as declarations, by the one-decl rule, or BLOCK) is not streamed into multiple streams. We never merge things between streams and duplicating those is forbidden by design. It is an irritating property of our streaming machinery that it is not diagnosing violations of A/B/C. Violations of type A and B are basically hidden by introducing violations of type C, and the compiler tends to be resistant to those in the sense that it usually does not ICE. Now I only once added a sanity check for C and learnt that 1) variably sized types may end up streamed twice - in the global stream because they are mentioned by FUNCTION_DECL/VARIABLE_DECL or TYPE_DECL, and in the local stream because of the tree_is_indexable rule. Moreover they violate A after landing in the global stream because they refer to temporaries in the gimple body. As you mentioned, we may want to clear TYPE_SIZE and variable DECL_FIELD_OFFSET. But sadly I think dwarf2out needs both and gimple land needs DECL_FIELD_OFFSET. 2) TYPE_DECLs of local types (common in C++) tend to drag into the global stream a BLOCK that later brings to the global stream the local variables. Making function-local TYPE_DECLs local seems cool, but it breaks A because function/local static variable types may refer to them. 3) _ORIGIN pointers violate B by design. 4) LABEL_DECLs with address taken violate A and B.
I am sure this list is not complete. Now we can play about what can be global or local. Like putting PARM_DECL into the global stream only, or returning RESULT_DECL into the global decl stream. But I do not really see how much chance it has to fix 1-4. I think the general flaw is that functions/static variables (which we absolutely need in the global stream) may refer to local entities via contexts and variably sized types. Either we need to consistently not stream the pointers causing the violations above - i.e. clear them in free_lang_data and make the backend not dependent on them. Or we need to find a way to make references of type A and perhaps B possible. I can think of a local_ref function_decl localid tree construct that we patch in and replace by the actual declaration after reading the function_decl body when it is needed (by dwarf2out or tree optimizers), but it does not seem cool to me either. We will likely soon notice that with comdats localid needs to be unique across the merged comdat bodies, or something similarly nasty. Another alternative I can
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
Hi, the following testcase illustrates the problem with the offset. Sadly it ICEs even w/o LTO: evans:/abuild/jh/trunk-3/build-inst12/gcc/:[1]# ./xgcc -B ./ -O2 ~/tt.c /root/tt.c: In function 'main': /root/tt.c:24:11: warning: overflow in implicit constant conversion [-Woverflow] b.b=0xdead; ^ /root/tt.c:25:12: internal compiler error: in assign_stack_temp_for_type, at function.c:762 a=retframe_block (); ^ 0x7ae808 assign_stack_temp_for_type(machine_mode, long, tree_node*) ../../gcc/function.c:762 /* { dg-do run } */ extern void abort (void); int main (int argc, char **argv) { int size = 10; typedef struct { char val[size]; char b; } block; block a, b; block __attribute__((noinline)) retframe_block () { if (b.b != 0xdead) abort (); return *(block *) &b; } b.val[0] = 1; b.val[9] = 2; b.b=0xdead; a=retframe_block (); if (a.val[0] != 1 || a.val[9] != 2) abort (); return 0; } Honza
Re: [RFC] Changes to the wide-int classes
On Sun, 1 Sep 2013, Richard Sandiford wrote: like to get rid of them and just keep the genuine operators. The problem is that I'd have liked the AND routine to be wi::and, but of course that isn't possible with and being a keyword, so I went for wi::bit_and instead. Same for not and wi::bit_not, and or and wi::bit_or. Then it seemed like the others should be bit_* too, and wi::bit_and_not just seems a bit unwieldy... Hmm, if we decide to forbid the use of and in gcc, perhaps we could #define it to something safe. But that would probably be too confusing. and in C++ is not a keyword, but an alternative token (like <% etc.). As such, it can't be defined as a macro, or used as a macro name in #define, #ifdef etc., and does not get converted to 0 in #if conditions but is interpreted as an operator there. (The status of new and delete in this regard is less clear; see http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#369.) -- Joseph S. Myers jos...@codesourcery.com
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
Hi, also to avoid the ICE in the original testcase, we do not really need the DECL_ARGUMENTS/RESULT_DECL lists. All we need is RESULT_DECL in the global stream. The following one-liner fixes the testcase and all variants of my ultimate death testcase that did not ICE in tree from the beginning of August. Bootstrapped/regtested x86_64-linux with default, running testing on ppc64-linux including Ada. Does this seem reasonable? (though still symptomatic in my belief) * lto-streamer-out.c (tree_is_indexable): RESULT_DECL/PARM_DECL of variably modified types are global. Index: lto-streamer-out.c === --- lto-streamer-out.c (revision 202161) +++ lto-streamer-out.c (working copy) @@ -124,8 +124,11 @@ output_type_ref (struct output_block *ob static bool tree_is_indexable (tree t) { + /* Parameters and return values of functions of variably modified types + must go to global stream, because they may be used in the type + definition. */ if (TREE_CODE (t) == PARM_DECL || TREE_CODE (t) == RESULT_DECL) -return false; +return variably_modified_type_p (TREE_TYPE (DECL_CONTEXT (t)), NULL_TREE); else if (TREE_CODE (t) == VAR_DECL && decl_function_context (t) && !TREE_STATIC (t)) return false;
Re: [patch 4/4] -fstrict-volatile-bitfields cleanup v3: remove from defaults on all targets
On Mon, 2 Sep 2013, Richard Earnshaw wrote: On 01/09/13 14:10, Bernd Edlinger wrote: IMHO the AAPCS forbids packed structures. Therefore we need not interfere with the C++ memory model if we have unaligned data. The AAPCS neither forbids nor requires packed structures. They're a GNU extension and as such not part of standard C++. Thus the semantics of such an operation are irrelevant to the AAPCS: you get to choose what the behaviour is in this case... The trouble is that AAPCS semantics are incompatible with the default GNU semantics for non-packed structures as well - AAPCS strict-volatile-bitfields is only compatible with --param allow-store-data-races=1, which is not the default for any language variant accepted by GCC (and I say that the default language semantics here should not depend on the target architecture). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH]: Fix PR middle-end/56382 -- Only move MODE_COMPLEX_FLOAT by parts if we can create pseudos
Eric, your patch works for me. Tested on hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11. Thanks, also tested on x86-64/Linux and applied on the mainline. 2013-09-02 Eric Botcazou ebotca...@adacore.com PR middle-end/56382 * expr.c (emit_move_complex): Do not move complex FP values as parts if the source or the destination is a single hard register. -- Eric Botcazou Index: expr.c === --- expr.c (revision 202160) +++ expr.c (working copy) @@ -3232,11 +3232,16 @@ emit_move_complex (enum machine_mode mod if (push_operand (x, mode)) return emit_move_complex_push (mode, x, y); - /* See if we can coerce the target into moving both values at once. */ - - /* Move floating point as parts. */ + /* See if we can coerce the target into moving both values at once, except + for floating point where we favor moving as parts if this is easy. */ if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT - && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing) + && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing + && !(REG_P (x) + && HARD_REGISTER_P (x) + && hard_regno_nregs[REGNO (x)][mode] == 1) + && !(REG_P (y) + && HARD_REGISTER_P (y) + && hard_regno_nregs[REGNO (y)][mode] == 1)) try_int = false; /* Not possible if the values are inherently not adjacent. */ else if (GET_CODE (x) == CONCAT || GET_CODE (y) == CONCAT)
[linaro/gcc-4_8-branch] Backports from trunk
Hi, We have just committed backports of the following revisions from trunk to linaro/gcc-4_8-branch: r201636 (as 202175) r201501 (as 202176) (should have picked it up by 4.8 branch merge) r201341 (as 202181) Thanks, Christophe.
Re: [RFC] Changes to the wide-int classes
Joseph S. Myers jos...@codesourcery.com writes: On Sun, 1 Sep 2013, Richard Sandiford wrote: like to get rid of them and just keep the genuine operators. The problem is that I'd have liked the AND routine to be wi::and, but of course that isn't possible with and being a keyword, so I went for wi::bit_and instead. Same for not and wi::bit_not, and or and wi::bit_or. Then it seemed like the others should be bit_* too, and wi::bit_and_not just seems a bit unwieldy... Hmm, if we decide to forbid the use of and in gcc, perhaps we could #define it to something safe. But that would probably be too confusing. and in C++ is not a keyword, but an alternative token (like <% etc.). As such, it can't be defined as a macro, or used as a macro name in #define, #ifdef etc., and does not get converted to 0 in #if conditions but is interpreted as an operator there. Ah, thanks, hadn't realised that. In some ways I'm glad that such a bad idea would fail to work :-) Richard
Fix gcc.dg/lto/20090218-1.c
Hi, gcc.dg/lto/20090218-1.c contains a cross-module call into an always_inline function. At -O0 -flto we used to report nothing since optimize_inline_calls was not called. With my change we report it as an error. I am not sure what is the desired behaviour, but for now I am restoring the previous situation, which is consistent with a non-LTO build. I suppose it is a good thing to do, given that we promise always_inlines to be always inlined only within a given module. I am testing the patch on x86_64-linux and will commit it if testing passes. Honza * ipa-inline-transform.c (inline_transform): Do not optimize inline calls when not optimizing. Index: ipa-inline-transform.c === --- ipa-inline-transform.c (revision 202182) +++ ipa-inline-transform.c (working copy) @@ -432,7 +432,7 @@ inline_transform (struct cgraph_node *no ipa_remove_all_references (&node->symbol.ref_list); timevar_push (TV_INTEGRATION); - if (node->callees) + if (node->callees && optimize) todo = optimize_inline_calls (current_function_decl); timevar_pop (TV_INTEGRATION);
Re: [RFC] Changes to the wide-int classes
On Sep 2, 2013, at 1:22 AM, Richard Sandiford rdsandif...@googlemail.com wrote: What I'm saying is that the assert stops plain: max_wide_int x = wide_int_var; Sorry for being slow. I see the point; indeed, from the preexisting code, we did: fixed_wide_int (const wide_int_ro &w) : wide_int_ro (w) { /* We only allow the same size in, as otherwise we would not know how to extend it. */ gcc_assert (precision == bitsize); } to disallow it dynamically. Doing this, or pushing it further into the type system so that it is caught at compile time, is fine. I also like the passing of sign: max_wide_int x = max_wide_int::from (wide_int_var, SIGNED); max_wide_int x = max_wide_int::from (wide_int_var, UNSIGNED); like this. In the previous code we did: static inline fixed_wide_int from_wide_int (const wide_int &w) { if (w.neg_p (SIGNED)) return w.sforce_to_size (bitsize); return w.zforce_to_size (bitsize); } which is, as you point out, dangerous.
Re: Lambda templates and implicit function templates.
On 01.09.2013 21:05, Jason Merrill wrote: On 08/27/2013 03:42 PM, Adam Butcher wrote: I don't know if it's the correct thing to do but the implementation currently omits the conversion to function pointer operator if the argument list contains a parameter pack. I would expect that to work. Does the specification not provide for deduction in that case? It doesn't forbid it as far as I can see. But it also gives no example of a stateless lambda with a variadic parameter. TBH I just couldn't figure out the right implementation so thought it better to omit the conversion operator on the assumption that it is not a common use case rather than ICE. I'll have a rethink based on your follow-up to 2/4 and try to get a pack expansion working there. One other thing, assuming the 'auto...' syntax can be made to work, bug 41933 needs to be resolved for the expansion returned by the generic lambda in N3690 5.1.2.5 to compile. Currently (transforming the 'auto...' to an explicit '<typename... T> T...') appears to yield the bug. Bug 41933 is specifically about lambda capture; I think you're running into something else. The problem is in expanding the 'ts' capture from the 5.1.2.5. It looks like this: 1 auto vglambda = [](auto printer) { 2 return [=](auto ... ts) { // OK: ts is a function parameter pack 3 printer(std::forward<decltype(ts)>(ts)...); 4 return [=]() { 5 printer(ts ...); 6 }; 7 }; 8 }; It is the expansion of the captured 'ts' on line 5 that I think yields the bug. Might be slightly different to 41933 though due to not being explicitly captured as ts. Unsure. My point was that in order to conform this issue will need to be fixed as well as getting the 'auto...' syntax to work. Cheers, Adam
Re: [PATCH 1/4] Support lambda templates.
On 01.09.2013 21:22, Jason Merrill wrote: On 08/27/2013 03:42 PM, Adam Butcher wrote: + vec_safe_push (argvec, arg); I bet we want convert_from_reference in the non-generic lambda case, too. OK with that change. I think I had made that change originally to keep the two impls the same and I hit issues with non-generic lambdas. But I can't remember the details. I'll try again. If it works with convert_from_reference in both cases should I push or should I sort out the parameter pack conversion op issue and roll that up into this?
Re: [PATCH 1/4] Support lambda templates.
On 09/02/2013 02:30 PM, Adam Butcher wrote: I think I had made that change originally to keep the two impls the same and I hit issues with non-generic lambdas. But I can't remember the details. I'll try again. If it works with convert_from_reference in both cases should I push or should I sort out the parameter pack conversion op issue and roll that up into this? I think roll them together, since that patch rewrites parts of this one. Jason
*ping* [patch, fortran] PR 56519: Reject impure intrinsic subroutines in DO CONCURRENT
Ping**0.5714? The attached patch rejects impure intrinsic subroutines, such as RANDOM_NUMBER, within DO CONCURRENT. Regression-tested. OK for trunk?
Re: [PATCH 4/4] Support using 'auto' in a function parameter list to introduce an implicit template parameter.
On 01.09.2013 22:20, Jason Merrill wrote: On 08/27/2013 03:42 PM, Adam Butcher wrote: + else // extend current template parameter list + // pop the innermost template parms into tparms Most comments should start with a capital letter and end with a period. Will change. + for (size_t n = 0, end = TREE_VEC_LENGTH (inner_vec); n < end; ++n) + tparms = chainon (tparms, TREE_VEC_ELT (inner_vec, n)); Doing chainon in a loop has bad algorithmic complexity, as it walks through the whole tparms list each iteration. Better to build up a list from inner_vec and then chainon that list as a whole. Okay. +template <typename TreePredicate> +inline tree +find_type_usage (tree t, TreePredicate pred) I don't think this needs to be a template, since we know the predicates take a single tree and return bool. I didn't know whether to allow for someone passing in a stateful lambda (or other functor) in future, so I avoided the issue by making the predicate a template type param. If we're happy that only C-style functions (or stateless lambdas) will be passed then I'll put it back as 'bool (*) (const_tree)'. I don't see any diagnostic for the implicit function template extension; my earlier comment about not controlling it with -std=c++1y vs gnu++1y didn't mean it should go away entirely. :) Maybe we should call it part of c++1z, or just control the diagnostic with -pedantic. I must confess I was a bit unclear about how to proceed there. I'll reinstate the two messages and go with a specific diagnostic if -pedantic is set in the non-lambda case. Cheers, Adam
Re: Lambda templates and implicit function templates.
On 09/02/2013 02:27 PM, Adam Butcher wrote: Bug 41933 is specifically about lambda capture; I think you're running into something else. The problem is in expanding the 'ts' capture from the 5.1.2.5. It looks like this: 1 auto vglambda = [](auto printer) { 2 return [=](auto ... ts) { // OK: ts is a function parameter pack 3 printer(std::forward<decltype(ts)>(ts)...); 4 return [=]() { 5 printer(ts ...); 6 }; 7 }; 8 }; It is the expansion of the captured 'ts' on line 5 that I think yields the bug. Ah, yes, that is 41933. I thought you were saying that a testcase without the innermost lambda would fail. Jason
RE: [PATCH] Fix PR tree-optimization/58137
On Fri, 30 Aug 2013 12:34:51, Richard Biener wrote: On Tue, Aug 20, 2013 at 1:01 PM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: This patch fixes a bug in the vectorized pointer arithmetic in the forwprop optimization pass. Although it seems to be impossible to create a vector of pointers with the __attribute__((vector_size(x))) syntax, the vector loop optimization together with the loop unrolling can create code which adds a vector of ptrdiff_t repeatedly to a vector of pointers. The code in tree-ssa-forwprop.c handles program transformations of the form (PTR +- CST1) +- CST2 => PTR +- (CST1 +- CST2) where PTR is a vector of pointers and CST1 and CST2 are vectors of ptrdiff_t; without the fix the result type of CST1 +- CST2 was vector of pointer, which causes the ICE in tree-cfg.c, which sees an undefined pointer + pointer operation. Additionally the check in tree-cfg.c allows expressions of the form CST - PTR, which is clearly wrong. That should be fixed too. The patch was bootstrapped and regression tested on i686-pc-linux-gnu. It seems to me that the forwprop code does not handle the fact that we are permissive as to using PLUS_EXPR instead of POINTER_PLUS_EXPR for vector pointer + offset additions. So instead of doing this dance, the better (and more easily backportable) fix is to simply disable the transform on pointer-valued PLUS_EXPR. Like with if (POINTER_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))) return false; at the beginning of the function. The condition would have to be: if (TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt))) == VECTOR_TYPE && POINTER_TYPE_P (TREE_TYPE (TREE_TYPE (gimple_assign_lhs (stmt))))) return false; I tried this, and it fixes the ICE. However the generated code was still vectorized but much less efficient and used more registers. Fortunately there will be no need to backport this, since this bug does not happen in gcc 4.8.2; I checked that.
I believe it was introduced with the checkin r200059 by Marc Glisse where associate_plusminus was enhanced to handle vector values. Before that only TREE_CODE (rhs2) == INTEGER_CST was handled. Frankly I would prefer the initial version of the patch, because the code is more efficient this way. The vector data is folded correctly, only the data type was wrong and triggered the ICE in tree-cfg.c. Please advise. Thanks Bernd. The real fix is of course to make vector pointer operations properly use POINTER_PLUS_EXPR ... Richard. Regards Bernd Edlinger
[ping**n] reimplement -fstrict-volatile-bitfields, v3
On 09/02/2013 03:10 AM, Richard Biener wrote: Can someone, in a new thread, ping the patches that are still in flight? ISTR having approved bits of some patches before my leave. Here's the current state of the patch set I put together. I've lost track of where the canonical version of Bernd's followup patch is. On 07/09/2013 10:23 AM, Sandra Loosemore wrote: On 06/30/2013 09:24 PM, Sandra Loosemore wrote: Here is my third attempt at cleaning up -fstrict-volatile-bitfields. Part 1 removes the warnings and packedp flag. It is the same as in the last version, and has already been approved. I'll skip reposting it since the patch is here already: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00908.html Part 2 replaces parts 2, 3, and 4 in the last version. I've re-worked this code significantly to try to address Bernd Edlinger's comments on the last version in PR56997. Part 2: http://gcc.gnu.org/ml/gcc-patches/2013-07/msg1.html Part 3 is the test cases, which are the same as in the last version. Nobody has reviewed these but I assume they are OK if Part 2 is approved? http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00912.html Part 4 is new; it makes -fstrict-volatile-bitfields not be the default for any target any more. It is independent of the other changes. Part 4: http://gcc.gnu.org/ml/gcc-patches/2013-07/msg2.html -Sandra
Re: [patch, fortran] PR 56519: Reject impure intrinsic subroutines in DO CONCURRENT
Thomas Koenig wrote: the attached patch rejects impure subroutines, such as RANDOM_NUMBER, within DO CONCURRENT. Regression-tested. OK for trunk? Okay - and thanks for the patch. Unrelated to changes in your patch, I am a bit unhappy about: * Proliferation of global variables * Magic values (0, 1, 2) * Badly translatable strings: "… inside a DO CONCURRENT %s, …" with gfc_do_concurrent_flag == 2 ? "mask" : "block" But as those are unchanged by your patch … Tobias 2013-08-29 Thomas Koenig tkoe...@gcc.gnu.org PR fortran/PR56519 * gfortran.h: Declare gfc_do_concurrent_flag as extern. * resolve.c: Rename do_concurrent_flag to gfc_do_concurrent_flag. and make non-static. (resolve_function): Use gfc_do_concurrent_flag instead of do_concurrent_flag. (pure_subroutine): Likewise. (resolve_code): Likewise. (resolve_types): Likewise. * intrinsic.c (gfc_intrinsic_sub_interface): Raise error for non-pure intrinsic subroutines within DO CONCURRENT. 2013-08-29 Thomas Koenig tkoe...@gcc.gnu.org PR fortran/PR56519 * gfortran.dg/do_concurrent_3.f90: New test case.
Re: [patch, fortran, docs] Unformatted sequential and special files
Thomas Koenig wrote: +Unformatted sequential files are stored using record markers. Each +full record consists of a leading record marker, the data written +by the user program, and a trailing record marker. The record markers +are four-byte integers by default, and eight-byte integers if the +@option{-fmax-subrecord-length=8} option is in effect. Each record +marker contains the number of bytes of data in the record. I wonder whether one should discourage the use of -fmax-subrecord-length=8 more explicitly here - when reading until here one might think that there is a 2GB limit for =4. +The maximum number of bytes of user data in a record is 2147483639 for +a four-byte record marker. Why this number? I had expected it to be exactly 2 GiB (minus 1 Byte) = 2**31-1 = 2147483647. Your number is one byte shorter. Why? Actually, I had also mentioned GiB as those are simpler to remember, e.g. The maximal user data per record is 2 gigabytes (2147483647/2147483639 bytes) for the four-byte record marker. If this is exceeded, a record is split into +subrecords. Each subrecord also has a leading and a trailing record +marker. If the leading record marker contains a negative number, the +number of user data bytes in the subrecord equals the absolute value +of this number, and another subrecord follows the current one. If the +trailing record marker contains a negative number, then the number of +bytes of user data equals the absolute value of that number, and there +is a preceding subrecord. + +The format for unformatted sequential data can be duplicated using +unformatted stream, as shown in this example program: (which assumes records shorter than 2 GiB) + +@smallexample +program main + implicit none + integer :: i I wonder whether one should make this more explicit by using use iso_fortran_env, only: int32 integer(int32) :: i +Unformatted sequential file access is @emph{not} supported for special +files. 
If necessary, it can be simulated using unformatted stream, +see @ref{Unformatted sequential file format}. + +I/O to and from block devices are also not supported. + +@code{BACKSPACE}, @code{REWIND} and @code{ENDFILE} are not supported +for special files. I think I understand what you mean, but "I/O to and from block devices are also not supported." sounds as if it could also apply to all I/O, including stream I/O. How about adding block devices to the list of special files in the intro? You could also mention sockets, which behave similarly. The BACKSPACE/REWIND/ENDFILE part could be moved directly after the itemize, in the same paragraph as the POS=. And instead of "unformatted sequential", you could write "unformatted sequential and direct access", which reminds me that we should also document that file format for the sake of completeness. Tobias
Re: [Patch, Fortran, F03] PR 55603: Memory leak with scalar allocatable function result
Am 27.08.2013 15:09, schrieb Janus Weil: here is a patch for PR 55603, which plugs a memory leak with scalar allocatable function results. To accomplish this, several things are done: 1) Allocatable scalar function results are passed as argument now and returned by reference (just like array or character results, cf. gfc_return_by_reference). [...] In fact the patch is just a first step and does not handle more advanced cases yet (like polymorphic allocatable scalar results, finalization, etc). Hooray, an ABI breakage! (On the other hand, the finalizer already causes some breakage - but this is worse, as with an interface one can override the .mod-version check.) In my attempts to get this working, I kept the current version - but handled derived types and non-derived types separately. The reason was that functions can occur everywhere, but DT/CLASS can only occur at some places. On the other hand, DT/CLASS can have allocatable components and all other kinds of nasty things - and se->post comes too early for that. For some reason, it seems to work if there are no allocatable components and other nastiness. I am not sure which approach is better. In any case, here is my current draft - completely unclean and not touched for about a month. And of course not ready/fully working. (Otherwise, I had posted a patch.) I have not yet looked at your patch - and I will first look through the backlog of gfortran emails/patches before returning to this one. Tobias

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 74e95b0..96de076 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -4226,6 +4226,51 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
	}
       else
	gfc_conv_expr_reference (&parmse, e);
+#if 0
+      /* Finalize function results after their use as actual argument.  */
+      // FIXME: Cleanup of constructors
+      if (e->expr_type == EXPR_FUNCTION && fsym
+	  && (fsym->ts.type == BT_CLASS
+	      || (fsym->ts.type == BT_DERIVED
+		  && gfc_is_finalizable (e->ts.u.derived, NULL))))
+	{
+	  tree final_fndecl, size, array;
+	  gfc_expr *final_expr;
+
+	  if (fsym->ts.type == BT_CLASS)
+	    {
+	      gfc_is_finalizable (CLASS_DATA (e)->ts.u.derived, &final_expr);
+	      final_fndecl = gfc_vtable_final_get (parmse.expr);
+	      size = gfc_vtable_size_get (parmse.expr);
+	      array = gfc_class_data_get (parmse.expr);
+	    }
+	  else
+	    {
+	      gfc_se fse;
+	      gfc_is_finalizable (e->ts.u.derived, &final_expr);
+	      gfc_init_se (&fse, NULL);
+	      gfc_conv_expr (&fse, final_expr);
+	      final_fndecl = fse.expr;
+	      size = gfc_typenode_for_spec (&e->ts);
+	      size = TYPE_SIZE_UNIT (size);
+	      size = fold_convert (gfc_array_index_type, size);
+	      array = parmse.expr;
+	    }
+	  if (POINTER_TYPE_P (TREE_TYPE (final_fndecl)))
+	    final_fndecl
+	      = build_fold_indirect_ref_loc (input_location, final_fndecl);
+	  array = gfc_conv_scalar_to_descriptor (&parmse, array, fsym->attr);
+	  array = gfc_build_addr_expr (NULL_TREE, array);
+	  tmp = build_call_expr_loc (input_location, final_fndecl, 3,
+				     array, size, boolean_false_node);
+	  gfc_add_expr_to_block (&parmse.post, tmp);
+	}
+#endif
       /* Catch base objects that are not variables.  */
       if (e->ts.type == BT_CLASS
@@ -5562,6 +5607,29 @@ gfc_conv_function_expr (gfc_se * se, gfc_expr * expr)
     gfc_conv_procedure_call (se, sym, expr->value.function.actual, expr,
			      NULL);
+
+  /* Ensure that allocatable scalars get deallocated; we only handle
+     nonderived types, as for TYPE/CLASS one runs into ordering problems
+     with allocatable components.  On the other hand, TYPE and CLASS
+     can only occur with assignment and as actual argument, contrary to
+     intrinsic types.  */
+  if (sym->ts.type != BT_CLASS && sym->ts.type != BT_DERIVED
+      && ((sym->result && !sym->result->as && sym->result->attr.allocatable)
+	  || (!sym->result && !sym->as && sym->attr.allocatable)))
+    {
+      tree tmp;
+      bool undo_deref = !POINTER_TYPE_P (TREE_TYPE (se->expr));
+
+      if (undo_deref)
+	se->expr = gfc_build_addr_expr (NULL, se->expr);
+
+      tmp = gfc_create_var (TREE_TYPE (se->expr), NULL);
+      gfc_add_modify (&se->pre, tmp, se->expr);
+
+      se->expr = tmp;
+      if (undo_deref)
+	se->expr = build_fold_indirect_ref_loc (input_location, se->expr);
+      gfc_add_expr_to_block (&se->post, gfc_call_free (tmp));
+    }
 }
@@ -5665,7 +5733,18 @@ gfc_conv_initializer (gfc_expr * expr, gfc_typespec * ts, tree type,
   else if (pointer || procptr)
     {
       if (!expr || expr->expr_type == EXPR_NULL)
-	return fold_convert (type, null_pointer_node);
+	{
+	  if (ts->type == BT_CLASS)
+	    {
+	      gfc_init_se (&se, NULL);
+	      gfc_conv_structure (&se, gfc_class_null_initializer (ts, expr), 1);
+	      gcc_assert (TREE_CODE (se.expr) == CONSTRUCTOR);
+
Fix gcc.dg/lto/pr56297 failure
Hi, this patch fixes the gcc.dg/lto/pr56297 failure. Here we have two modules defining global variables assigned to hard registers. Those variables do not go into the assembler name hash because they have no real assembler name, and consequently they are not merged. We however may end up with two declarations being merged by tree merging, and then symtab needs to compensate. We solve precisely the same problem for builtins and abstract functions already. This patch thus makes us do the right thing for variables, too. I also added a comment since the code is a bit weird. Bootstrapped/regtested x86_64-linux, will commit it shortly. Honza

	* lto-symtab.c (lto_symtab_merge_symbols): Add comments;
	merge duplicated nodes for assembler names.
	* symtab.c (symtab_unregister_node): Do not attempt to unlink
	hard registers from assembler name hash.

Index: lto-symtab.c
===================================================================
--- lto-symtab.c	(revision 202182)
+++ lto-symtab.c	(working copy)
@@ -586,6 +586,9 @@ lto_symtab_merge_symbols (void)
       FOR_EACH_SYMBOL (node)
	{
	  cgraph_node *cnode, *cnode2;
+	  varpool_node *vnode;
+	  symtab_node node2;
+
	  if (!node->symbol.analyzed && node->symbol.alias_target)
	    {
	      symtab_node tgt = symtab_node_for_asm (node->symbol.alias_target);
@@ -594,22 +597,37 @@ lto_symtab_merge_symbols (void)
	      symtab_resolve_alias (node, tgt);
	    }
	  node->symbol.aux = NULL;
-
+
	  if (!(cnode = dyn_cast <cgraph_node> (node))
	      || !cnode->clone_of
	      || cnode->clone_of->symbol.decl != cnode->symbol.decl)
	    {
+	      /* Builtins are not merged via decl merging.  It is however
+		 possible that tree merging unified the declaration.  We
+		 do not want duplicate entries in symbol table.  */
	      if (cnode && DECL_BUILT_IN (node->symbol.decl)
		  && (cnode2 = cgraph_get_node (node->symbol.decl))
		  && cnode2 != cnode)
		lto_cgraph_replace_node (cnode2, cnode);
+
+	      /* The user defined assembler variables are also not unified by
+		 their symbol name (since it is irrelevant), but we need to
+		 unify symbol nodes if tree merging occurred.  */
+	      if ((vnode = dyn_cast <varpool_node> (node))
+		  && DECL_HARD_REGISTER (vnode->symbol.decl)
+		  && (node2 = symtab_get_node (vnode->symbol.decl))
+		  && node2 != node)
+		lto_varpool_replace_node (dyn_cast <varpool_node> (node2),
+					  vnode);
+
+	      /* Abstract functions may have duplicated cgraph nodes attached;
+		 remove them.  */
	      else if (cnode && DECL_ABSTRACT (cnode->symbol.decl)
		       && (cnode2 = cgraph_get_node (node->symbol.decl))
		       && cnode2 != cnode)
		cgraph_remove_node (cnode2);
+
	      symtab_insert_node_to_hashtable ((symtab_node) node);
	    }
	}
Index: symtab.c
===================================================================
--- symtab.c	(revision 202182)
+++ symtab.c	(working copy)
@@ -283,7 +283,8 @@ symtab_unregister_node (symtab_node node
       else
	*slot = replacement_node;
     }
-  unlink_from_assembler_name_hash (node, false);
+  if (!is_a <varpool_node> (node) || !DECL_HARD_REGISTER (node->symbol.decl))
+    unlink_from_assembler_name_hash (node, false);
 }

 /* Return symbol table node associated with DECL, if any,
Re: [PATCH 1/4] Support lambda templates.
On 02.09.2013 19:34, Jason Merrill wrote: On 09/02/2013 02:30 PM, Adam Butcher wrote: On 01.09.2013 21:22, Jason Merrill wrote: I bet we want convert_from_reference in the non-generic lambda case, too. I think I had made that change originally to keep the two impls the same and I hit issues with non-generic lambdas. But I can't remember the details. I'll try again. Okay, finally got around to trying this again. With convert_from_reference in the non-generic case, the code compiles okay but SEGVs on the attempt to branch to '_FUN'.

  105  auto lf0 = [] (float a, int const b) { return a += b; };
  106
  107  INVOKEi (lf, float, 7, 0);
  108  AS_FUNi (lf, float, 7, 0);
  109  AS_PTRi (lf, float, int, 7, 0);

  0x404500 <main()+14687>  mov    %eax,-0x4bc(%rbp)
  0x404506 <main()+14693>  mov    0x36f0(%rip),%eax   # 0x407bfc
  0x40450c <main()+14699>  mov    %eax,-0x4c0(%rbp)
  0x404512 <main()+14705>  movl   $0x7,-0x2a4(%rbp)
  0x40451c <main()+14715>  lea    -0x2a4(%rbp),%rdx
  0x404523 <main()+14722>  lea    -0x4bc(%rbp),%rax
  0x40452a <main()+14729>  mov    %rdx,%rsi
  0x40452d <main()+14732>  mov    %rax,%rdi
  0x404530 <main()+14735>  callq  0x400934 <lambda(float, int const)::_FUN(float&, const int&)>

If it works with convert_from_reference in both cases, should I push, or should I sort out the parameter pack conversion op issue and roll that up into this? I think roll them together, since that patch rewrites parts of this one. Will assume, for now, that the convert_from_reference call is not wanted in the non-generic case (maybe something to do with using 'build_call_a' instead of 'build_nt_call_vec', or the convert_from_reference on the call itself?) and will focus on the parameter pack stuff (when I get a chance). Cheers, Adam
Re: [Patch] Rewrite regex matchers
On Fri, Aug 30, 2013 at 10:04 PM, Tim Shen timshe...@gmail.com wrote: I didn't use any tool to check that, but adjust it by hand. It shouldn't break anything, but I'll test it again before committing. Tested under -m32, -m64 and debug mode and committed. -- Tim Shen
Cleanup CFG after profile read/instrumentation
Hi, reading the profile/instrumenting breaks basic blocks and introduces fake edges. The broken basic blocks are not re-merged until after LTO streaming, which is wasteful. Fixed thus. Profile-bootstrapped/regtested ppc64-linux, committed.

Index: tree-profile.c
===================================================================
--- tree-profile.c	(revision 202185)
+++ tree-profile.c	(working copy)
@@ -598,6 +598,8 @@ tree_profiling (void)
	}
     }

+  /* Re-merge split blocks.  */
+  cleanup_tree_cfg ();
   update_ssa (TODO_update_ssa);
   rebuild_cgraph_edges ();
[PATCH] Portable Volatility Warning
This is a follow-up patch for Sandra Loosemore's patch in this thread: reimplement -fstrict-volatile-bitfields, v3. It was already posted a few weeks ago, but in the wrong thread. Therefore I re-post it herewith. It was initially suggested by Hans-Peter Nilsson, and I had much help from him in putting everything together. Thanks again, H-P. Here is a short specification: The -Wportable-volatility warning is an optional warning about code for which separate, incompatible definitions exist on different platforms (or between C and C++), even within gcc. It will be usable for driver code you want to be portable across different architectures. This warning should only be emitted if the generated code differs significantly depending on whether -fstrict-volatile-bitfields is used. It should not be emitted for code which is not affected by this option. In other words, it should be emitted on all bit-fields when the definition or the context is volatile, except when the whole structure is not AAPCS ABI compliant, i.e. packed or unaligned. On the other hand, you should always get the same warnings whether or not you combine -Wportable-volatility with -fstrict-volatile-bitfields. And of course it should not depend on the specific target used to compile. I boot-strapped this on i686-pc-linux-gnu and all Wportable-volatility test cases pass for C and C++. Additionally, I used a cross-compiler for arm-eabi to manually cross-check that the warnings are independent of the target platform. Regards Bernd Edlinger 2013-09-03 Bernd Edlinger bernd.edlin...@hotmail.de Implement -Wportable-volatility warning to warn about code which accesses volatile structure members for which different ABI specifications exist. * expr.c (check_portable_volatility): New function. (expand_assignment): Call check_portable_volatility. (expand_expr_real_1): Likewise. * fold-const.c (optimize_bit_field_compare): Handle warn_portable_volatility.
Removed if-statement, because condition flag_strict_volatile_bitfields 0 is always false. * stor-layout.c (layout_decl): Handle warn_portable_volatility. * c-family/c.opt: Add -Wportable-volatility option. * doc/invoke.texi: Add documentation about -Wportable-volatility. testsuite: c-c++-common/ * Wportable-volatility-1.c: New testcase. * Wportable-volatility-2.c: New testcase. patch-portable-volatility.diff Description: Binary data
RE: [ping**n] reimplement -fstrict-volatile-bitfields, v3
On Mon, 2 Sep 2013 12:56:22 Sandra Loosemore wrote: On 09/02/2013 03:10 AM, Richard Biener wrote: Can someone, in a new thread, ping the patches that are still in flight? ISTR having approved bits of some patches before my leave. Here's the current state of the patch set I put together. I've lost track of where the canonical version of Bernd's followup patch is. On 07/09/2013 10:23 AM, Sandra Loosemore wrote: On 06/30/2013 09:24 PM, Sandra Loosemore wrote: Here is my third attempt at cleaning up -fstrict-volatile-bitfields. Part 1 removes the warnings and packedp flag. It is the same as in the last version, and has already been approved. I'll skip reposting it since the patch is here already: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00908.html Part 2 replaces parts 2, 3, and 4 in the last version. I've re-worked this code significantly to try to address Bernd Edlinger's comments on the last version in PR56997. Part 2: http://gcc.gnu.org/ml/gcc-patches/2013-07/msg1.html Part 3 is the test cases, which are the same as in the last version. Nobody has reviewed these but I assume they are OK if Part 2 is approved? regarding Part 3, I have a small comment on it: The test programs pr56997-*.c depend on stdint.h and other headers. I stumbled over it because I tried to compile the test programs with an eCos cross-compiler, and eCos happens to not have stdint.h. Many test cases try to avoid all dependencies on include files. http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00912.html Part 4 is new; it makes -fstrict-volatile-bitfields not be the default for any target any more. It is independent of the other changes. Part 4: http://gcc.gnu.org/ml/gcc-patches/2013-07/msg2.html -Sandra And the warnings part is re-posted here: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00100.html Bernd.
Re: [PATCH 1/4] Support lambda templates.
On 09/02/2013 05:18 PM, Adam Butcher wrote: Will assume, for now, that the convert_from_reference call is not wanted in the non-generic case (maybe something to do with using 'build_call_a' instead of 'build_nt_call_vec' or the convert_from_reference on the call itself?) Ah, yes; we are passing references through directly as pointers rather than doing the equivalent of *&. and will focus on the parameter pack stuff (when I get a chance). Sounds good. Jason
[bootstrap] Fix build for several targets (including pr58242)
Hi, Several builds are broken after r201838. Here is the fix, awaiting review: http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01245.html The targets that are broken now: 1) *linux* targets that do not include config/linux.h in their tm.h (e.g. alpha-linux, ppc64-linux etc). For them we have: ../../../../gcc/gcc/config/linux-android.c: In function ‘bool linux_android_libc_has_function(function_class)’: ../../../../gcc/gcc/config/linux-android.c:40:7: error: ‘OPTION_BIONIC’ was not declared in this scope if (OPTION_BIONIC) ^ make[2]: *** [linux-android.o] Error 1 This is addressed in the changes to config/linux-android.c: linux_libc, LIBC_GLIBC and LIBC_BIONIC seem to be declared for all *linux* targets. 2) *uclinux* targets that include config/linux.h. For *uclinux* we do not use linux-protos.h, and therefore linux_android_libc_has_function is not declared there. I don't want to add additional tmake_file, tm_p_file and extra_objs, so I added an explicit define of TARGET_LIBC_HAS_FUNCTION as no_c99_libc_has_function for those targets. 3) *linux* targets that do not append to tmake_file (bfin*-linux-uclibc* or crisv32-*-linux* | cris-*-linux*) Alexander