Re: [PATCH, Pointer Bounds Checker 23/x] Function split
2014-09-19 23:45 GMT+04:00 Jeff Law l...@redhat.com: On 09/16/14 03:09, Ilya Enkovich wrote: I must be misunderstanding something then. I fundamentally don't see how the return bounds are any different here than the return value. If we have exposed the bounds in the IL, then aren't they going to be handled just like any other object in the IL? They are not handled like any other object in IL because return block and all statements in it are not handled as all other statements we put into split part. Here is a comment from find_return_bb: /* Return basic block containing RETURN statement. We allow basic blocks of the form: retval = tmp_var; return retval but return_bb can not be more complex than this. ... */ Phi nodes also may present in return_bb. Right. I've seen this stuff, but it's still not clear to me what the real issue is. The first thing that jumps out when I look at your dump is we don't have a PHI for __bound_tmp.322 in BB6. Now it may be that we really just wanted __bound_tmp.322_36, but that seems wrong as the return value varies depending on how we reach BB6 and it seems to me the bounds ought to vary in a similar manner. Bounds don't have to vary for different pointers. E.g. p and p + 1 always have equal bounds. In this particular case we have function pointers and all of them have default bounds. All blocks going to split part are analyzed by visit_bb function. Return basic block is not analyzed in the same way but still may be copied into split part in case return value is defined in it. There is a special code in visit_bb to add args of phi statements of return_bb as uses of split part to have no undefined values in copied block. It was enough when those phi args plus return value were only uses in return_bb. But now we add returned bounds to GIMPLE_RETURN as a new use and this new use is ignored. If split part returns value then return_bb will be copied into it. It means I should check returned bounds are defined there too. If SSA_NAME_DEF_STMT of returned bounds is in split part then it is OK. If SSA_NAME_DEF_STMT of returned bounds is in return_bb then it is also OK because it means it is a result of PHI node whose args were added as additional uses for split part earlier in visit_bb. At least that is how I think this happens :) Maybe you should post the IL for a case where this all matters and walk me through the key issues. I attach a dump I got from Chrome compilation with no additional checks restrictions in split. Original function returns value defined by phi node in return_bb and bounds defined in BB2. Split part contains BB3, BB4 and BB5 and resulting function part has usage of returned bounds but no producer for it. Right, but my question is whether or not the bounds from BB2 were really the correct bounds to be using in the first place! I would have expected a PHI in BB6 to select the bounds based on the path leading to BB6, much like we select a different return value. Consider we have pointer computation and then return __bnd_init_ptr_bounds (res); In such case you would never have a PHI node for bounds. Also do not forget that we may have no PHI nodes for both return value and return bounds. In such case we could also easily fall into undefined value as in dump. Thanks, Ilya Jeff
Re: [PATCH 0/5] Fix handling of word subregs of wide registers
Jeff Law l...@redhat.com writes: On 09/19/14 01:23, Richard Sandiford wrote: Jeff Law l...@redhat.com writes: On 09/18/14 04:07, Richard Sandiford wrote: This series is a cleaned-up version of: https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html The underlying problem is that the semantics of subregs depend on the word size. You can't have a subreg for byte 2 of a 4-byte word, say, but you can have a subreg for word 2 of a 4-word value (as well as lowpart subregs of that word, etc.). This causes problems when an architecture has wider-than-word registers, since the addressability of a word can then depend on which register class is used. The register allocators need to fix up cases where a subreg turns out to be invalid for a particular class. This is really an extension of what we need to do for CANNOT_CHANGE_MODE_CLASS. Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf. I thought we fixed these problems long ago with the change to subreg_byte?!? No, that was fixing something else. (I'm just about old enough to remember that too!) The problem here is that (say): (subreg:SI (reg:DI X) 4) is independently addressable on little-endian AArch32 if X assigned to a GPR, but not if X is assigned to a vector register. We need to allow these kinds of subreg on pseudos in order to decompose multiword arithmetic. It's then up to the RA to realise that a reload would be needed if X were assigned to a vector register, since the upper half of a vector register cannot be independently accessed. Note that you could write this example even with the old word-style offsets and IIRC the effect would have been the same. OK. So I kept thinking in terms of the byte offset stuff. But what you're tackling is related to the mess around the mode of the subreg having a different meaning if its smaller than a word vs word-sized or greater. Right? Yeah, that's right. Addressability is based on words, which is inconvenient when your registers are bigger than a word. Thanks, Richard
Re: [PATCH 4/5] Generalise invalid_mode_change_p
Jeff Law l...@redhat.com writes: On 09/18/14 04:25, Richard Sandiford wrote: This is the main patch for the bug. We should treat a register as invalid for a mode change if simplify_subreg_regno cannot provide a new register number for the result. We should treat a class as invalid for a mode change if all registers in the class are invalid. This is an extension of the old CANNOT_CHANGE_MODE_CLASS-based check (simplify_subreg_regno checks C_C_C_M). I forgot to say that the patch is a prerequisite to removing aarch64's C_C_C_M. There are other prerequisites too, but removing C_C_C_M without this patch caused regressions in the existing testsuite, which is why no new tests are needed. gcc/ * hard-reg-set.h: Include hash-table.h. (target_hard_regs): Add a finalize method and a x_simplifiable_subregs field. * target-globals.c (target_globals::~target_globals): Handle hard_regs-finalize. * rtl.h (subreg_shape): New structure. (shape_of_subreg): New function. (simplifiable_subregs): Declare. * reginfo.c (simplifiable_subreg): New structure. (simplifiable_subregs_hasher): Likewise. (simplifiable_subregs): New function. (invalid_mode_changes): Delete. (alid_mode_changes, valid_mode_changes_obstack): New variables. (record_subregs_of_mode): Remove subregs_of_mode parameter. Record valid mode changes in valid_mode_changes. (find_subregs_of_mode): Remove subregs_of_mode parameter. Update calls to record_subregs_of_mode. (init_subregs_of_mode): Remove invalid_mode_changes and bitmap handling. Initialize new variables. Update call to find_subregs_of_mode. (invalid_mode_change_p): Check new variables instead of invalid_mode_changes. (finish_subregs_of_mode): Finalize new variables instead of invalid_mode_changes. (target_hard_regs::finalize): New function. * ira-costs.c (print_allocno_costs): Call invalid_mode_change_p even when CLASS_CANNOT_CHANGE_MODE is undefined. Index: gcc/rtl.h === --- gcc/rtl.h2014-09-15 11:55:40.459855161 +0100 +++ gcc/rtl.h2014-09-15 12:26:21.249077760 +0100 +/* Return the shape of a SUBREG rtx. */ + +static inline subreg_shape +shape_of_subreg (const_rtx x) +{ + return subreg_shape (GET_MODE (SUBREG_REG (x)), + SUBREG_BYTE (x), GET_MODE (x)); +} + Is there some reason you don't have a constructor that accepts a const_rtx? I was worried that by allowing implicit const_rtx-subreg_shape conversions, it would be less obvious that the rtx has to have code SUBREG. I.e. a checked conversion would be hidden in the constructor rather than being explicit. If with David's new rtx hierarchy we end up with an rtx_subreg subclass then I agree we should have a constructor that takes one of those. Thanks, Richard
Re: Speedup int_bit_from_pos
On Sun, 21 Sep 2014, Jan Hubicka wrote: Please omit static from inline functions. Yep, I suppose we want to drop static in all inlines? I can make patch for that. Also one notable difference with your patches is that the fits hwi is now not tested on the result but on the result input which, multiplied by 8, might not fit a hwi now. So please use wide-ints here (the to_offset flavor). The function must always suceed (so user promise it will fit in HWI) and for performance reasons I would rather not go into wide int by defualt, but I can do that with checking enabled. wide-int should be fast enough, please use it. Richard.
Re: [BUILDROBOT] genrecog fix uncovers problem in bfin.md (was: [Patch] Teach genrecog/genoutput that scratch registers require write constraint modifiers)
On Sat, Sep 20, 2014 at 08:40:01PM +0100, Jan-Benedict Glaw wrote: Hi! On Thu, 2014-09-18 11:19:21 +0100, James Greenhalgh james.greenha...@arm.com wrote: As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01334.html The construct (clobber (match_scratch 0 r)) is invalid - operand 0 must be marked either write or read/write. Likewise (match_* 0 r) is invalid, marking an operand earlyclobber does not remove the need to also mark it write or read/write. My build robot shows a new build error, which I guess is caused/uncovered by your genrecog change on bfin-elf (see eg. build http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=355667): build/genrecog /home/jbglaw/repos/gcc/gcc/common.md /home/jbglaw/repos/gcc/gcc/config/bfin/bfin.md \ insn-conditions.md tmp-recog.c /home/jbglaw/repos/gcc/gcc/config/bfin/bfin.md:1971: constraints not supported in define_split make[1]: *** [s-recog] Error 1 make[1]: Leaving directory `/home/jbglaw/build/bfin-elf/build-gcc/gcc' make: *** [all-gcc] Error 2 Would be nice if the bfin maintainer or you would come up with a fix. Hi Jan, I posted a fix for this on Friday evening at: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01682.html I'm waiting for a bfin maintainer to say OK, as it isn't a port I know well. Thanks, James
Re: [PATCH][match-and-simplify] User defined predicates
On Tue, 16 Sep 2014, Marc Glisse wrote: On Tue, 16 Sep 2014, Richard Biener wrote: The following adds the ability to write predicates using patterns with an example following negate_expr_p which already has a use in comparison folding (via its if c-expr). The syntax is as follows: (match negate_expr_p INTEGER_CST (if (TYPE_OVERFLOW_WRAPS (type) || may_negate_without_overflow_p (t (match negate_expr_p (bit_not @0) (if (INTEGRAL_TYPE_P (type) TYPE_OVERFLOW_WRAPS (type (match negate_expr_p FIXED_CST) (match negate_expr_p (negate @0)) ... that is, you write '(match id' instead of '(simplify' and then follow with a pattern and optional conditionals. There should be no transform pattern (unchecked yet). Multiple matches for the same id simply add to what is recognized as id. The predicate is applied to a single 'tree' operand and looks up SSA defs and utilizes the optional valueize hook. Currently both GENERIC and GIMPLE variants result in name-mangling and the proptotypes (unprototyped anywhere) bool tree_negate_expr_p (tree t); bool gimple_negate_expr_p (tree t, tree (*valueize)(tree) = NULL); Ah, I haven't looked at the generated code, but I was expecting something roughly like: struct matcher { std::functiontree(tree) valueize; bool negate_expr(tree); ... }; where we can call negate_expr recursively without caring about passing valueize (if there are 2 matchers, one without a valueize function, negate_expr can be static in that version). Although recursive calls sound potentially slow, and having a thread_local counter in valueize to limit the call depth may not be ideal. Please note that I am not at all saying the above is a good design, just dropping a random thought. Yeah, abstracting a bit from the low-level interface might be nice. Note that I wouldn't recommend using recursive predicates as on GIMPLE this easily can result in exponential runtime behavior (and capping on a recursion limit looks ugly). Instead of using recursive predicates a lattice of predicate results should be provided by the caller (which would ask for a lattice abstraction as well). So you'd have sth like struct lattice { tree valueize (tree); bool negate_expr_p (tree); bool nonnegative_p (tree); ... }; note that the fold-const.c negate_expr_p is really a predicate on whether negate_expr will be able to simplify. Yes, fold-const.c has recursive transforms, something match-and-simplify doesn't support either. Here the proper way is to implement this kind of stuff in a real pass. I'll rip out the recursive parts of negate_expr_p again at some point, I just put it there as an exercise ;) Richard.
Re: Stream ODR types
On Wed, 17 Sep 2014, Jan Hubicka wrote: Hi, this patch renames types reported by Wodr during LTO bootstrap. Bootrapping/regtesting in progress, OK if it passes? Honza * tree-ssa-ccp.c (prop_value_d): Rename to ... (ccp_prop_value_t): ... this one to avoid ODR violation; update uses. * ipa-prop.c (struct type_change_info): Rename to ... (prop_type_change_infoprop_type_change_info): ... this; update uses. Seems a bit excessive ;) Ok. Thanks, Richard. * ggc-page.c (globals): Rename to ... (static struct ggc_globals): ... this; update uses. * tree-ssa-loop-im.c (mem_ref): Rename to ... (im_mem_ref): ... this; update uses. * ggc-common.c (loc_descriptor): Rename to ... (ggc_loc_descriptor): ... this; update uses. * lra-eliminations.c (elim_table): Rename to ... (lra_elim_table): ... this; update uses. * bitmap.c (output_info): Rename to ... (bitmap_output_info): ... this; update uses. * gcse.c (expr): Rename to ... (gcse_expr) ... this; update uses. (occr): Rename to ... (gcse_occr): .. this; update uses. * tree-ssa-copy.c (prop_value_d): Rename to ... (prop_value_t): ... this. * predict.c (block_info_def): Rename to ... (block_info): ... this; update uses. (edge_info_def): Rename to ... (edge_info): ... this; update uses. * profile.c (bb_info): Rename to ... (bb_profile_info): ... this; update uses. * alloc-pool.c (output_info): Rename to ... (pool_output_info): ... this; update uses. Index: tree-ssa-ccp.c === --- tree-ssa-ccp.c(revision 215328) +++ tree-ssa-ccp.c(working copy) @@ -166,7 +166,7 @@ typedef enum VARYING } ccp_lattice_t; -struct prop_value_d { +struct ccp_prop_value_t { /* Lattice value. */ ccp_lattice_t lattice_val; @@ -180,24 +180,22 @@ struct prop_value_d { widest_int mask; }; -typedef struct prop_value_d prop_value_t; - /* Array of propagated constant values. After propagation, CONST_VAL[I].VALUE holds the constant value for SSA_NAME(I). If the constant is held in an SSA name representing a memory store (i.e., a VDEF), CONST_VAL[I].MEM_REF will contain the actual memory reference used to store (i.e., the LHS of the assignment doing the store). */ -static prop_value_t *const_val; +static ccp_prop_value_t *const_val; static unsigned n_const_val; -static void canonicalize_value (prop_value_t *); +static void canonicalize_value (ccp_prop_value_t *); static bool ccp_fold_stmt (gimple_stmt_iterator *); /* Dump constant propagation value VAL to file OUTF prefixed by PREFIX. */ static void -dump_lattice_value (FILE *outf, const char *prefix, prop_value_t val) +dump_lattice_value (FILE *outf, const char *prefix, ccp_prop_value_t val) { switch (val.lattice_val) { @@ -236,10 +234,10 @@ dump_lattice_value (FILE *outf, const ch /* Print lattice value VAL to stderr. */ -void debug_lattice_value (prop_value_t val); +void debug_lattice_value (ccp_prop_value_t val); DEBUG_FUNCTION void -debug_lattice_value (prop_value_t val) +debug_lattice_value (ccp_prop_value_t val) { dump_lattice_value (stderr, , val); fprintf (stderr, \n); @@ -272,10 +270,10 @@ extend_mask (const wide_int nonzero_bit 4- Initial values of variables that are not GIMPLE registers are considered VARYING. */ -static prop_value_t +static ccp_prop_value_t get_default_value (tree var) { - prop_value_t val = { UNINITIALIZED, NULL_TREE, 0 }; + ccp_prop_value_t val = { UNINITIALIZED, NULL_TREE, 0 }; gimple stmt; stmt = SSA_NAME_DEF_STMT (var); @@ -343,10 +341,10 @@ get_default_value (tree var) /* Get the constant value associated with variable VAR. */ -static inline prop_value_t * +static inline ccp_prop_value_t * get_value (tree var) { - prop_value_t *val; + ccp_prop_value_t *val; if (const_val == NULL || SSA_NAME_VERSION (var) = n_const_val) @@ -366,7 +364,7 @@ get_value (tree var) static inline tree get_constant_value (tree var) { - prop_value_t *val; + ccp_prop_value_t *val; if (TREE_CODE (var) != SSA_NAME) { if (is_gimple_min_invariant (var)) @@ -387,7 +385,7 @@ get_constant_value (tree var) static inline void set_value_varying (tree var) { - prop_value_t *val = const_val[SSA_NAME_VERSION (var)]; + ccp_prop_value_t *val = const_val[SSA_NAME_VERSION (var)]; val-lattice_val = VARYING; val-value = NULL_TREE; @@ -413,7 +411,7 @@ set_value_varying (tree var) For other constants, make sure to drop TREE_OVERFLOW. */ static void -canonicalize_value (prop_value_t *val) +canonicalize_value (ccp_prop_value_t *val) { enum machine_mode mode; tree type; @@ -451,7 +449,7 @@
Re: [PATCH] Extended if-conversion for loops marked with pragma omp simd.
Richard, here is reduced patch (part.1) which was reduced almost twice. Let's me also answer on your comments. 1. I really use edge field 'aux' to keep predicate for critical edges. My previous code was not correct and now it looks like: if (EDGE_COUNT (b-succs) == 1 || EDGE_COUNT (e-dest-preds) == 1) /* Edge E is not critical, use predicate of edge source bb. */ c = bb_predicate (b); else /* Edge E is critical and its aux field contains predicate. */ c = edge_predicate (e); 2. I completely delete all code related to creation of conditional expressions and completely rely on bool pattern recognition in vectorizer. But we need to delete all dead predicate computations which are not used since they prevent vectorization. I will add this local-dce function in next patch. 3. I also did not include in this patch recognition of general phi-nodes with two arguments only for which conversion of conditional scalar reduction can be applied also. Note that all these changes are applied for loop marked with pragma omp simd only. 2014-09-22 Yuri Rumyantsev ysrum...@gmail.com * tree-if-conv.c (cgraph.h): Add include file to detect function clone. (flag_force_vectorize): New variable. (edge_predicate): New function. (set_edge_predicate): New function. (convert_name_to_cmp): New function. (add_to_predicate_list): Check unconditionally that bb is always executed to early exit. Use predicate of cd-equivalent block for join blocks if it exists. (add_to_dst_predicate_list): Invoke add_to_predicate_list if destination block of edge is not always executed. Set-up predicate for critical edge. (if_convertible_phi_p): Accept phi nodes with more than two args if FLAG_FORCE_VECTORIZE was set-up. (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. (if_convertible_stmt_p): Fix up pre-function comments. (all_edges_are_critical): New function. (if_convertible_bb_p): Allow bb has more than two predecessors if FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical to reject block if-conversion with incoming critical edges only if FLAG_FORCE_VECTORIZE was not set-up. (predicate_bbs): Skip loop exit block also. Add check that if fold_build2 produces bool conversion, recompute predicate using build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE. (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's. (find_phi_replacement_condition): Extend function interface: it returns NULL if given phi node must be handled by means of extended phi node predication. If number of predecessors of phi-block is equal 2 and atleast one incoming edge is not critical original algorithm is used. (get_predicate_for_edge): New function. (find_insertion_point): New function. (predicate_arbitrary_scalar_phi): New function. (predicate_all_scalar_phis): Introduce new variable BEFORE. Invoke find_insertion_point to initialize gsi and predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals that extended predication must be applied). (insert_gimplified_predicates): Add test for non-predicated basic blocks that there are no gimplified statements to insert. Insert predicates at the block begining for extended if-conversion. (tree_if_conversion): Initialize flag_force_vectorize from current loop or outer loop (to support pragma omp declare).Do loop versioning for innermost loop marked with pragma omp simd and FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges for blocks with two successors. 2014-09-08 17:10 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard! Here is updated patch with the following changes: 1. Any restrictions on phi-function were eliminated for extended conversion. 2. Put predicate for critical edges to 'aux' field of edge, i.e. negate_predicate was deleted. 3. Deleted splitting of critical edges, i.e. both outgoing edges can be critical. 4. Use notion of cd-equivalence to set-up predicate for join basic blocks to simplify it. 5. I decided to not design pre-pass since it will lead generating chain of cond expressions for phi-node if conversion, whereas for phi of kind x = PHI 1(2), 1(3), 2(4) only one cond expression is required and this is considered as simple optimization for arbitrary phi-function. More precise, if phi-function have only two different arguments and one of them has single occurrence, if- conversion is performed as if phi have only 2 arguments. For arbitrary phi function a chain of cond expressions is produced. Updated patch is attached. Any comments will be appreciated. The patch is still very big and does multiple things at once which makes it hard to review. In addition to that it changes function singatures without updating the function comments. For example what is the convert_bool argument doing to add_to_dst_predicate_list? Why do we need all
Re: [GOOGLE] Fix LIPO COMDAT fixup and gcov-tool interactions
On 09/21/14 18:58, Xinliang David Li wrote: the intent is that that points to the gcov_info object of the object file containing the live version of the function. I couldn't quite get this to work though -- it involves emitting a function's gcov_fn_info decl in the same comdat group as the function itself. Another problem is that comdat functions may have different CFGs due to different early inline decisions. Comdatting gcov counters can lead to problems in profile use. Not comdatting profile counters have another advantage -- it allows context sensitive profiling for comdat function inline instances (IPA-inline). IIRC early inlining is done before the counters are created. You're right later inlining may be a problem, and require a non-comdat set of cloned counters. I can't recall exactly at what stage the counters are now inserted relative to inlining. The CFG machinery had a number of significant changes while, and shortly after, I was working on this. You'll see the checking of gfi_ptr-key != gi_ptr in libgcov-driver.c. Are you making use of this machinery, or inventing new machinery? Teresa's method is a different machinery -- it tries to propagate profile data from the selected comdat copy + inline instance copies to comdat copies with zero counts. It'd be preferrable to complete the mechanism I outline above, rather than have a competing mechanism. Also, this patch is in effect lying because the data then makes it look like the unselected comdat instances are in fact being executed -- looking at the whole program it's going to be harder to understand whether the different inline instances are being executed multiple times, or are duplicate data. Does the gcov user output indicate this subtlety in some way? nathan
Re: ptx preliminary address space fixes [4/4]
On Wed, Sep 17, 2014 at 10:15 PM, Bernd Schmidt ber...@codesourcery.com wrote: On 09/11/2014 01:41 PM, Richard Biener wrote: On Thu, Sep 11, 2014 at 12:12 PM, Bernd Schmidt ber...@codesourcery.com wrote: This one isn't a wrong-code issue, just a missed optimization. The strlen optimizations need to be made to look through ADDR_SPACE_CONVERT_EXPR to work on ptx. Bootstrapped and tested together with the other patches on x86_64-linux. Ok? Did you try adding ADDR_SPACE_CONVERT_EXPR to the tree codes handled in gimple_assign_cast_p? I did now (full test on x86_64, and also tested with ptx), and that also appears to work. Ok? Ok. Thanks, Richard. Bernd
Re: ptx preliminary rtl patches [3/4]
On Thu, Sep 18, 2014 at 8:57 PM, Bernd Schmidt ber...@codesourcery.com wrote: On 09/12/2014 10:04 AM, Richard Biener wrote: On Thu, Sep 11, 2014 at 6:36 PM, Bernd Schmidt ber...@codesourcery.com wrote: I strongly disagree. It's the same as for any other integer - there's one sign bit, and since there aren't any other bits, the number of sign bit copies is always exactly 1. I agree about that. But I fail to see what goes wrong with the existing code in combine. Maybe the code simply doesn't work for GET_MODE_PRECISION != GET_MODE_BITSIZE? I had to debug it again - the patch was a year old. This time I came to the conclusion that we're just using the wrong mode. We're trying to simplify (ne:BI (reg:HI x) (const_int 0)), and the code here was using BImode when calling num_sign_bit_copies for the register - what I think it wants to do is verify that the operand consists of all zeros or all ones. Digging a bit further I noticed that some of the cases around this code have a mode == GET_MODE (op0) test. These were added by rth in commit 3573fd048, which added BImode. It looks like this particular case slipped through the cracks. The easiest way to fix it is the below - bootstrapped and tested on x86_64-linux, ok if it also works with ptx? Ok. Thanks, Richard. Bernd
Re: Small fix for walking constructors
On Thu, Sep 18, 2014 at 10:38 PM, Jeff Law l...@redhat.com wrote: On 09/18/14 13:01, Bernd Schmidt wrote: This fixes an issue on ptx where we fail to output a declaration for a variable. The testcase is c-torture/compile/pr34856.c, and the cause of the problem is that the variable g is never inserted into the varpool, which is where a future patch will look for references to variables not defined in the current translation unit (ptx assembly requires declarations for these too). Bootstrapped and tested on x86_64-linux, ok? Bernd walk-more.diff commit 968a508fdd5c413147b9c26d37633bf7ab7a7e65 Author: Bernd Schmidtber...@codesourcery.com Date: Thu Sep 11 14:35:01 2014 +0200 Fix handling of CONSTRUCTORs in gimple-walk. * gimple-walk.c (walk_stmt_load_store_addr_ops): Look past casts when dealing with CONSTRUCTORs. OK. Errr - certainly not. It seems to me that walk_stmt_load_store_addr_ops is called on bogus input. The function is supposed to be called on GIMPLE stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have conversions in their elements. Please revert if you have applied already. Thanks, Richard. Jeff
Re: [PATCH] PR63300 'const volatile' sometimes stripped in debug info.
On Sat, Sep 20 2014, Mark Wielaard wrote: When adding DW_TAG_restrict_type I made a mistake when updating the code that handled types with multiple modifiers. This patch fixes it by putting the logic for finding the sub-qualified type in a separate function and fall back to adding the modifiers separately if there is no such existing type. The old tests didn't catch this case because there always was an existing sub-qualified type already. The new testcase fails before and succeeds after this patch. gcc/ChangeLog * dwarf2out.c (existing_sub_qualified_type): New function. (modified_type_die): Use existing_sub_qualified_type. Fall back to adding modifiers one by one of there is no existing sub-qualified type. gcc/testsuite/ChangeLog * gcc.dg/guality/pr63300-const-volatile.c: New testcase. --- gcc/dwarf2out.c| 85 ++ .../gcc.dg/guality/pr63300-const-volatile.c| 12 +++ 2 files changed, 84 insertions(+), 13 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/guality/pr63300-const-volatile.c diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index e87ade2..0cbc316 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -10461,6 +10461,51 @@ decl_quals (const_tree decl) ? TYPE_QUAL_VOLATILE : TYPE_UNQUALIFIED)); } +/* Returns true if CV_QUALS contains QUAL and we have a qualified + variant of TYPE that has at least one other qualifier found in + CV_QUALS. Returns false if CV_QUALS doesn't contain QUAL, if + CV_QUALS is empty after subtracting QUAL, or if we don't have a + TYPE that has at least one qualifier from CV_QUALS minus QUAL. */ +static bool +existing_sub_qualified_type (tree type, int cv_quals, int qual) +{ + int sub_qual, sub_quals = cv_quals ~qual; + if ((cv_quals qual) == TYPE_UNQUALIFIED || sub_quals == TYPE_UNQUALIFIED) +return false; + + sub_qual = TYPE_QUAL_CONST; + if ((sub_quals ~sub_qual) != TYPE_UNQUALIFIED + get_qualified_type (type, sub_quals ~sub_qual) != NULL_TREE) +return true; + + sub_qual = TYPE_QUAL_VOLATILE; + if ((sub_quals ~sub_qual) != TYPE_UNQUALIFIED + get_qualified_type (type, sub_quals ~sub_qual) != NULL_TREE) +return true; + + sub_qual = TYPE_QUAL_RESTRICT; + if ((sub_quals ~sub_qual) != TYPE_UNQUALIFIED + get_qualified_type (type, sub_quals ~sub_qual) != NULL_TREE) +return true; + + sub_qual = TYPE_QUAL_CONST TYPE_QUAL_VOLATILE; You probably mean '|' instead of '' here. + if ((sub_quals ~sub_qual) != TYPE_UNQUALIFIED + get_qualified_type (type, sub_quals ~sub_qual) != NULL_TREE) +return true; + + sub_qual = TYPE_QUAL_CONST TYPE_QUAL_RESTRICT; See above. + if ((sub_quals ~sub_qual) != TYPE_UNQUALIFIED + get_qualified_type (type, sub_quals ~sub_qual) != NULL_TREE) +return true; + + sub_qual = TYPE_QUAL_VOLATILE TYPE_QUAL_RESTRICT; See above. + if ((sub_quals ~sub_qual) != TYPE_UNQUALIFIED + get_qualified_type (type, sub_quals ~sub_qual) != NULL_TREE) +return true; + + return false; +} IIUC, 'sub_qual' above represents the qualifiers to *omit* from the ones we're interested in, right? Maybe it would be more straightforward to reverse the logic, i.e., start with sub_qual = TYPE_QUAL_VOLATILE | TYPE_QUAL_RESTRICT; and then always use sub_qual instead of ~sub_qual. Also note that the logic wouldn't scale too well for yet more qualifiers... + /* Given a pointer to an arbitrary ..._TYPE tree node, return a debugging entry that chains various modifiers in front of the given type. */ @@ -10543,34 +10588,48 @@ modified_type_die (tree type, int cv_quals, dw_die_ref context_die) mod_scope = scope_die_for (type, context_die); - if ((cv_quals TYPE_QUAL_CONST) - /* If there are multiple type modifiers, prefer a path which - leads to a qualified type. */ - (((cv_quals ~TYPE_QUAL_CONST) == TYPE_UNQUALIFIED) - || get_qualified_type (type, cv_quals) == NULL_TREE - || (get_qualified_type (type, cv_quals ~TYPE_QUAL_CONST) - != NULL_TREE))) + /* If there are multiple type modifiers, prefer a path which + leads to a qualified type. */ + if (existing_sub_qualified_type (type, cv_quals, TYPE_QUAL_CONST)) { mod_type_die = new_die (DW_TAG_const_type, mod_scope, type); sub_die = modified_type_die (type, cv_quals ~TYPE_QUAL_CONST, context_die); } - else if ((cv_quals TYPE_QUAL_VOLATILE) - (((cv_quals ~TYPE_QUAL_VOLATILE) == TYPE_UNQUALIFIED) -|| get_qualified_type (type, cv_quals) == NULL_TREE -|| (get_qualified_type (type, cv_quals ~TYPE_QUAL_VOLATILE) -!= NULL_TREE))) + else if (existing_sub_qualified_type (type, cv_quals, TYPE_QUAL_VOLATILE)) { mod_type_die = new_die
Re: Small fix for walking constructors
On Mon, Sep 22, 2014 at 10:58 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Sep 18, 2014 at 10:38 PM, Jeff Law l...@redhat.com wrote: On 09/18/14 13:01, Bernd Schmidt wrote: This fixes an issue on ptx where we fail to output a declaration for a variable. The testcase is c-torture/compile/pr34856.c, and the cause of the problem is that the variable g is never inserted into the varpool, which is where a future patch will look for references to variables not defined in the current translation unit (ptx assembly requires declarations for these too). Bootstrapped and tested on x86_64-linux, ok? Bernd walk-more.diff commit 968a508fdd5c413147b9c26d37633bf7ab7a7e65 Author: Bernd Schmidtber...@codesourcery.com Date: Thu Sep 11 14:35:01 2014 +0200 Fix handling of CONSTRUCTORs in gimple-walk. * gimple-walk.c (walk_stmt_load_store_addr_ops): Look past casts when dealing with CONSTRUCTORs. OK. Errr - certainly not. It seems to me that walk_stmt_load_store_addr_ops is called on bogus input. The function is supposed to be called on GIMPLE stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have conversions in their elements. Please revert if you have applied already. For the testcase I can indeed see bb 2: pin_3 = {(unsigned int) (long int) g[16]}; but that's invalid GIMPLE, unfortunately not caught by out checker. Please fix the root cause and add checking to verify_gimple_assign_single. Thanks, Richard. Thanks, Richard. Jeff
Re: [PATCH 0/5] Fix handling of word subregs of wide registers
On Thu, Sep 18, 2014 at 3:07 AM, Richard Sandiford richard.sandif...@arm.com wrote: This series is a cleaned-up version of: https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html The underlying problem is that the semantics of subregs depend on the word size. You can't have a subreg for byte 2 of a 4-byte word, say, but you can have a subreg for word 2 of a 4-word value (as well as lowpart subregs of that word, etc.). This causes problems when an architecture has wider-than-word registers, since the addressability of a word can then depend on which register class is used. The register allocators need to fix up cases where a subreg turns out to be invalid for a particular class. This is really an extension of what we need to do for CANNOT_CHANGE_MODE_CLASS. Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf. This sounds like something which should be tested on spu as it is the main target that I can think of which has wider-than-word registers and that has had issues with subreg. I can't remember if the simulator for SPU is free (as in beer) and would run on anything besides PowerPC. It has been more than 4 years since I looked into the spu back-end also. Thanks, Andrew Pinski Thanks, Richard
Re: [BUILDROBOT] genrecog fix uncovers problem in bfin.md (was: [Patch] Teach genrecog/genoutput that scratch registers require write constraint modifiers)
On Mon, 2014-09-22 08:58:34 +0100, James Greenhalgh james.greenha...@arm.com wrote: On Sat, Sep 20, 2014 at 08:40:01PM +0100, Jan-Benedict Glaw wrote: My build robot shows a new build error, which I guess is caused/uncovered by your genrecog change on bfin-elf (see eg. build http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=355667): [...] build/genrecog /home/jbglaw/repos/gcc/gcc/common.md /home/jbglaw/repos/gcc/gcc/config/bfin/bfin.md \ insn-conditions.md tmp-recog.c /home/jbglaw/repos/gcc/gcc/config/bfin/bfin.md:1971: constraints not supported in define_split make[1]: *** [s-recog] Error 1 make[1]: Leaving directory `/home/jbglaw/build/bfin-elf/build-gcc/gcc' make: *** [all-gcc] Error 2 Would be nice if the bfin maintainer or you would come up with a fix. I posted a fix for this on Friday evening at: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01682.html I'm waiting for a bfin maintainer to say OK, as it isn't a port I know well. Ah great! I somehow missed to recognize your email, sorry for the noise. MfG, JBG -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of:http://www.chiark.greenend.org.uk/~sgtatham/bugs.html the second : signature.asc Description: Digital signature
Re: Small fix for walking constructors
On 09/22/2014 11:00 AM, Richard Biener wrote: It seems to me that walk_stmt_load_store_addr_ops is called on bogus input. The function is supposed to be called on GIMPLE stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have conversions in their elements. Please revert if you have applied already. For the testcase I can indeed see bb 2: pin_3 = {(unsigned int) (long int) g[16]}; but that's invalid GIMPLE, unfortunately not caught by out checker. Please fix the root cause and add checking to verify_gimple_assign_single. Hmm, fix how exactly? What representation do you want for an initializer where a pointer is cast to an int (or to a different address space, something that will be possible with patches I'll submit in the near future)? Bernd
Re: Small fix for walking constructors
On Mon, Sep 22, 2014 at 11:00 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Sep 22, 2014 at 10:58 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Sep 18, 2014 at 10:38 PM, Jeff Law l...@redhat.com wrote: On 09/18/14 13:01, Bernd Schmidt wrote: This fixes an issue on ptx where we fail to output a declaration for a variable. The testcase is c-torture/compile/pr34856.c, and the cause of the problem is that the variable g is never inserted into the varpool, which is where a future patch will look for references to variables not defined in the current translation unit (ptx assembly requires declarations for these too). Bootstrapped and tested on x86_64-linux, ok? Bernd walk-more.diff commit 968a508fdd5c413147b9c26d37633bf7ab7a7e65 Author: Bernd Schmidtber...@codesourcery.com Date: Thu Sep 11 14:35:01 2014 +0200 Fix handling of CONSTRUCTORs in gimple-walk. * gimple-walk.c (walk_stmt_load_store_addr_ops): Look past casts when dealing with CONSTRUCTORs. OK. Errr - certainly not. It seems to me that walk_stmt_load_store_addr_ops is called on bogus input. The function is supposed to be called on GIMPLE stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have conversions in their elements. Please revert if you have applied already. For the testcase I can indeed see bb 2: pin_3 = {(unsigned int) (long int) g[16]}; but that's invalid GIMPLE, unfortunately not caught by out checker. Please fix the root cause and add checking to verify_gimple_assign_single. I'm on it. The reason for the invalid GIMPLE is the gimplifier which says well, looks like a constant for the target! and doesn't gimplify at all then. Oops (but only for vectors?!). Introduced by r118747 and only semi-fixed by r129739. The original rev. is also just a bugfix for an ICE. Thus I am testing the following. 2014-09-22 Richard Biener rguent...@suse.de * gimplify.c (gimplify_init_constructor): Do not leave non-GIMPLE vector constructors around. * tree-cfg.c (verify_gimple_assign_single): Verify that CONSTRUCTORs have gimple elements. Richard. Thanks, Richard. Thanks, Richard. Jeff p Description: Binary data
Re: Small fix for walking constructors
On Mon, Sep 22, 2014 at 11:52 AM, Bernd Schmidt ber...@codesourcery.com wrote: On 09/22/2014 11:00 AM, Richard Biener wrote: It seems to me that walk_stmt_load_store_addr_ops is called on bogus input. The function is supposed to be called on GIMPLE stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have conversions in their elements. Please revert if you have applied already. For the testcase I can indeed see bb 2: pin_3 = {(unsigned int) (long int) g[16]}; but that's invalid GIMPLE, unfortunately not caught by out checker. Please fix the root cause and add checking to verify_gimple_assign_single. Hmm, fix how exactly? What representation do you want for an initializer where a pointer is cast to an int (or to a different address space, something that will be possible with patches I'll submit in the near future)? For the above _4 = (long int) g[16]; _5 = (unsigned int) _4; pin_3 = { _5 }; it's GIMPLE after all, not GENERIC. Richard. Bernd
Re: [Patch bfin] Fixup use of constraints in define_split
On 09/19/2014 11:32 PM, James Greenhalgh wrote: As with the earlier patch for sh ( https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01627.html ), this fixes the fallout caused by https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01615.html. These are build failures, and the fixes are obvious, but I don't know my way around the failing ports, so I'd like an explicit maintainer ack. For testing, I've just checked that the build error is resolved. Looks obvious to me too. Thanks! Bernd
Re: [PATCH] gcc-gdb-test.exp: Handle old GDB short int and long int types.
On Sat, Sep 20, 2014 at 11:21:25PM +0200, Mark Wielaard wrote: Old GDB might show short and long as short int and long int. This made gcc.dg/guality/const-volatile.c ans restrict.c fail on older GDBs. According to the patch that changed this in newer versions of GDB this was a bug: https://sourceware.org/ml/gdb-patches/2012-09/msg00455.html The patch transforms the types short int and long int coming from GDB to plain short and long. And a variant has been added to the const-volatile.c testcase to make sure short and long long are handled correctly now with older GDB. Tested against GDB 7.7.1 and 7.4.50. gcc/testsuite/ChangeLog * lib/gcc-gdb-test.exp (gdb-test): Transform gdb types short int and long int to plain short and long. * gcc.dg/guality/const-volatile.c (struct bar): New struct containing short and long long fields. (bar): New variable to test the type. Ok, with a minor nit: --- a/gcc/testsuite/lib/gcc-gdb-test.exp +++ b/gcc/testsuite/lib/gcc-gdb-test.exp @@ -111,6 +111,10 @@ proc gdb-test { args } { # Squash all extra whitespace/newlines that gdb might use for # pretty printing into one so result is just one line. regsub -all {[\n\r\t ]+} $type type + # Old gdb might output long int instead of just long + # and short int instead of just short. Canonicalize. +regsub -all {\mlong int\M} $type long type +regsub -all {\mshort int\M} $type short type Please fix whitespace on the above 2 lines, should be tab + 4 spaces instead of 12 spaces. Jakub
Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge
On 19/09/14 17:19, Jeff Law wrote: On 09/19/14 10:02, Jiong Wang wrote: On 19/09/14 16:49, Jeff Law wrote: Probably. Though I'd be a bit concerned with next_block-next_bb. Wouldn't it be safer to stash away the relevant basic block prior to the call to split_edge, then use that saved copy. Something like this (untested): basic_block old_dest = live_edge-dest; next_block = split_edge (live_edge); /* We create a new basic block. Call df_grow_bb_info to make sure all data structures are allocated. */ df_grow_bb_info (df_live); bitmap_and (df_get_live_in (next_block), df_get_live_out (bb), df_get_live_in (old_dest)); The idea being we don't want to depend on the precise ordering blocks in the block chain. Could you try that and see if it does what you need? Jeff, Thanks, verified, it works. Great. Can you send an updated patchkit for review. patch attached. please review, thanks. gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the live-in of new created BB as the intersection of live-in from old_dest and live-out from bb. diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c index fd24135..63deadf 100644 --- a/gcc/shrink-wrap.c +++ b/gcc/shrink-wrap.c @@ -217,12 +217,15 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, if (!df_live) return false; + basic_block old_dest = live_edge-dest; next_block = split_edge (live_edge); /* We create a new basic block. Call df_grow_bb_info to make sure all data structures are allocated. */ df_grow_bb_info (df_live); - bitmap_copy (df_get_live_in (next_block), df_get_live_out (bb)); + + bitmap_and (df_get_live_in (next_block), df_get_live_out (bb), + df_get_live_in (old_dest)); df_set_bb_dirty (next_block); /* We should not split more than once for a function. */
Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions
On 19/09/14 21:43, Jeff Law wrote: On 09/15/14 08:33, Jiong Wang wrote: Jeff, thanks, I partially understand your meaning here. take the function ira_implicitly_set_insn_hard_regs in ira-lives.c for example, when generating address rtl, gcc will automatically generate const operator to prefix the address expression, like the following. so a simple CONSTANT_P check is enough in case there is no embedded register. (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94]) (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40] var_decl 0x2b2c2ff91510 recog_data) (const_int 480 [0x1e0] -1 but for architecture like aarch64, the following instruction sequences to forming address may be generated (insn 73 14 74 4 (set (reg/f:DI 20 x20 [99]) (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) 35 {*movdi_aarch64} (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats)) (nil))) (insn 17 30 25 5 (set (reg/f:DI 4 x4 [83]) (lo_sum:DI (reg/f:DI 20 x20 [99]) (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) {add_losym_di} (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats) (nil))) while CONSTANT_P could not catch the latter lo_sum case, as the RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ, Hmm, it's been ~15 years since I regularly worked on a target that uses HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST as well, but reading the docs for CONST, that clearly isn't the case. Sorry for that. Can you (re) send your current patch for this for review? patch attached. please review, thanks. gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when !REG_P (src) to release more instruction sink opportunities. gcc/testsuite/ * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase. diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c index fd24135..739e957 100644 --- a/gcc/shrink-wrap.c +++ b/gcc/shrink-wrap.c @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see #include bb-reorder.h #include shrink-wrap.h #include regcprop.h +#include rtl-iter.h #ifdef HAVE_simple_return @@ -169,7 +170,9 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, { rtx set, src, dest; bitmap live_out, live_in, bb_uses, bb_defs; - unsigned int i, dregno, end_dregno, sregno, end_sregno; + unsigned int i, dregno, end_dregno; + unsigned int sregno = FIRST_PSEUDO_REGISTER; + unsigned int end_sregno = FIRST_PSEUDO_REGISTER; basic_block next_block; edge live_edge; @@ -179,7 +182,34 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, return false; src = SET_SRC (set); dest = SET_DEST (set); - if (!REG_P (dest) || !REG_P (src) + + if (!REG_P (src)) +{ + unsigned int reg_num = 0; + unsigned int nonconstobj_num = 0; + rtx src_inner = NULL_RTX; + + subrtx_var_iterator::array_type array; + FOR_EACH_SUBRTX_VAR (iter, array, src, ALL) + { + rtx x = *iter; + if (REG_P (x)) + { + reg_num++; + src_inner = x; + } + else if (!CONSTANT_P (x) OBJECT_P (x)) + nonconstobj_num++; + } + + if (nonconstobj_num 0 + || reg_num 1) + src = NULL_RTX; + else if (reg_num == 1) + src = src_inner; +} + + if (!REG_P (dest) || src == NULL_RTX /* STACK or FRAME related adjustment might be part of prologue. So keep them in the entry block. */ || dest == stack_pointer_rtx @@ -188,10 +218,13 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, return false; /* Make sure that the source register isn't defined later in BB. */ - sregno = REGNO (src); - end_sregno = END_REGNO (src); - if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno)) -return false; + if (REG_P (src)) +{ + sregno = REGNO (src); + end_sregno = END_REGNO (src); + if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno)) + return false; +} /* Make sure that the destination register isn't referenced later in BB. */ dregno = REGNO (dest); diff --git a/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c new file mode 100644 index 000..ad2e588 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-rtl-pro_and_epilogue } */ + +extern char *asm_out_file; +extern void default_elf_asm_output_ascii (char *, const char *, int); + +void +assemble_string (const char *p, int size) +{ + int pos = 0; + int maximum = 2000; + + while (pos size) +{ + int thissize = size - pos; + + if (thissize maximum) + thissize = maximum; + + default_elf_asm_output_ascii (asm_out_file, p,
Re: [PATCH] remove duplicated lines in gcc/fortran/resolve.c
Le 21 sept. 2014 à 10:44, FX fxcoud...@gmail.com a écrit : AFAICT the lines 11200-11222 in gcc/fortran/resolve.c are a copy of the lines 11176-11198. The duplicates were introduced by revision 126468, an unrelated patch, after the original commit of the code as 126466. It looks like a svn/patch mishap. Thanks for tracking the origin of the problem. After having tested a patch, I usually do not revert it before doing the svn update when it is committed. If a last minute change has been made between the patch and the commit, it results in general a conflict during the svn update. However I have seen a couple times the patched tree is simply merged with the update leading to a duplicated piece of code. For the record, while the two blocks were identical from a functional pout of view, their formatting were slightly different: PUBLIC interface '%s' at %L takes dummy arguments of '%s' which is PRIVATE, iface-sym-name, --- PUBLIC interface '%s' at %L takes dummy arguments of '%s' which is PRIVATE, iface-sym-name, The following patch removes the duplicated lines. OK for the trunk? You didn’t say if it was regtested. OK if it was (one can never be too sure!) The patch has been in my working tree for months. I have posted recent test results with the patch (and r211089 reverted to fix pr61387) at https://gcc.gnu.org/ml/gcc-testresults/2014-09/msg02018.html Thanks, FX This is what I have committed as revision r215452 Index: gcc/fortran/ChangeLog === --- gcc/fortran/ChangeLog (revision 215451) +++ gcc/fortran/ChangeLog (working copy) @@ -1,3 +1,7 @@ +2014-09-21 Dominique d'Humieres domi...@lps.ens.fr + + * resolve.c (resolve_fl_procedure): Remove duplicated lines. + 2014-09-20 Alessandro Fanfarillo fanfarillo@gmail.com Tobias Burnus bur...@net-b.de Index: gcc/fortran/resolve.c === --- gcc/fortran/resolve.c (revision 215451) +++ gcc/fortran/resolve.c (working copy) @@ -11196,30 +11196,6 @@ } } } - - /* PUBLIC interfaces may expose PRIVATE procedures that take types -PRIVATE to the containing module. */ - for (iface = sym-generic; iface; iface = iface-next) - { - for (arg = gfc_sym_get_dummy_args (iface-sym); arg; arg = arg-next) - { - if (arg-sym - arg-sym-ts.type == BT_DERIVED - !arg-sym-ts.u.derived-attr.use_assoc - !gfc_check_symbol_access (arg-sym-ts.u.derived) - !gfc_notify_std (GFC_STD_F2003, Procedure '%s' in - PUBLIC interface '%s' at %L takes - dummy arguments of '%s' which is - PRIVATE, iface-sym-name, - sym-name, iface-sym-declared_at, - gfc_typename(arg-sym-ts))) - { - /* Stop this message from recurring. */ - arg-sym-ts.u.derived-attr.access = ACCESS_PUBLIC; - return false; - } -} - } } if (sym-attr.function sym-value sym-attr.proc != PROC_ST_FUNCTION Thanks for the review, Dominique
Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result
On Thu, Sep 18, 2014 at 1:50 PM, Alan Lawrence alan.lawre...@arm.com wrote: This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes. These are presently documented as producing a vector with the result in element 0, and this is inconsistent with their use in tree-vect-loop.c (which on bigendian targets pulls the bits out of the wrong end of the vector result). This leads to bugs on bigendian targets - see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114. I discounted fixing the vectorizer (to read from element 0) and then making bigendian targets (whose architectural insn produces the result in lane N-1) permute the result vector, as optimization of vectors in RTL seems unlikely to remove such a permute and would lead to a performance regression. Instead it seems more natural for the tree code to produce a scalar result (producing a vector with the result in lane 0 has already caused confusion, e.g. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html). However, this patch preserves the meaning of the optab (producing a result in lane 0 on little-endian architectures or N-1 on bigendian), thus generally avoiding the need to change backends. Thus, expr.c extracts an endianness-dependent element from the optab result to give the result expected for the tree code. Previously posted as an RFC https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html , now with an extra VIEW_CONVERT_EXPR if the types of the reduction/result do not match. Huh. Does that ever happen? Please use a NOP_EXPR instead of a VIEW_CONVERT_EXPR. Ok with that change. Thanks, Richard. Testing: x86_86-none-linux-gnu: bootstrap, check-gcc, check-g++ aarch64-none-linux-gnu: bootstrap aarch64-none-elf: check-gcc, check-g++ arm-none-eabi: check-gcc aarch64_be-none-elf: check-gcc, showing FAIL-PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test FAIL-PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test Passes the (previously-failing) reduced testcase on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114 Have also assembler/stage-1 tested that testcase on PowerPC, also fixed. gcc/ChangeLog: * expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add extract_bit_field around optab result. * fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR, produce scalar not vector. * tree-cfg.c (verify_gimple_assign_unary): Check result vs operand type for REDUC_{MIN,MAX,PLUS}_EXPR. * tree-vect-loop.c (vect_analyze_loop): Update comment. (vect_create_epilog_for_reduction): For direct vector reduction, use result of tree code directly without extract_bit_field. * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update comment.
[committed] Fix -fcompare-debug issue in simd clone creation (PR debug/63328)
Hi! Obviously, it is a bad idea to emit gimple assign stmts for temporaries needed by debug stmts, we have to emit debug temporaries instead, otherwise we generate different code between -g0 and -g. Tested on x86_64-linux, committed to trunk/4.9. 2014-09-22 Jakub Jelinek ja...@redhat.com PR debug/63328 * omp-low.c (ipa_simd_modify_stmt_ops): For debug stmts insert a debug source bind stmt setting DEBUG_EXPR_DECL instead of a normal gimple assignment stmt. * c-c++-common/gomp/pr63328.c: New test. --- gcc/omp-low.c.jj2014-09-08 22:12:46.0 +0200 +++ gcc/omp-low.c 2014-09-22 11:44:47.751338842 +0200 @@ -11717,9 +11717,22 @@ ipa_simd_modify_stmt_ops (tree *tp, int if (tp != orig_tp) { repl = build_fold_addr_expr (repl); - gimple stmt - = gimple_build_assign (make_ssa_name (TREE_TYPE (repl), NULL), repl); - repl = gimple_assign_lhs (stmt); + gimple stmt; + if (is_gimple_debug (info-stmt)) + { + tree vexpr = make_node (DEBUG_EXPR_DECL); + stmt = gimple_build_debug_source_bind (vexpr, repl, NULL); + DECL_ARTIFICIAL (vexpr) = 1; + TREE_TYPE (vexpr) = TREE_TYPE (repl); + DECL_MODE (vexpr) = TYPE_MODE (TREE_TYPE (repl)); + repl = vexpr; + } + else + { + stmt = gimple_build_assign (make_ssa_name (TREE_TYPE (repl), +NULL), repl); + repl = gimple_assign_lhs (stmt); + } gimple_stmt_iterator gsi = gsi_for_stmt (info-stmt); gsi_insert_before (gsi, stmt, GSI_SAME_STMT); *orig_tp = repl; --- gcc/testsuite/c-c++-common/gomp/pr63328.c.jj2014-09-22 12:09:50.140724501 +0200 +++ gcc/testsuite/c-c++-common/gomp/pr63328.c 2014-09-22 12:09:45.371745608 +0200 @@ -0,0 +1,5 @@ +/* PR debug/63328 */ +/* { dg-do compile } */ +/* { dg-options -O2 -fopenmp-simd -fno-strict-aliasing -fcompare-debug } */ + +#include pr60823-3.c Jakub
Re: [PATCH 3/14] Add new optabs for reducing vectors to scalars
On Thu, Sep 18, 2014 at 1:54 PM, Alan Lawrence alan.lawre...@arm.com wrote: These match their corresponding tree codes, by taking a vector and returning a scalar; this is more architecturally neutral than the (somewhat loosely defined) previous optab that took a vector and returned a vector with the result in the least significant bits (i.e. element 0 for little-endian or N-1 for bigendian). However, the old optabs are preserved so as not to break existing backends, so clients check for both old + new optabs. Bootstrap, check-gcc and check-g++ on x86_64-none-linux-gnu. aarch64.exp + vect.exp on aarch64{,_be}-none-elf. (of course at this point in the series all these are using the old optab + migration path.) scalar_reduc_to_vector misses a comment. I wonder if at the end we wouldn't transition all backends and then renaming reduc_*_scal_optab back to reduc_*_optab makes sense. The optabs have only one mode - I wouldn't be surprised if an ISA invents for example v4si - di reduction? So do we want to make reduc_plus_scal_optab a little bit more future proof (maybe there is already an ISA that supports this kind of reduction?). Otherwise the patch looks good to me. Thanks, Richard. gcc/ChangeLog: * doc/md.texi (Standard Names): Add reduc_(plus,[us](min|max))|scal optabs, and note in reduc_[us](plus|min|max) to prefer the former. * expr.c (expand_expr_real_2): Use reduc_..._scal if available, fall back to old reduc_... + BIT_FIELD_REF only if not. * optabs.c (optab_for_tree_code): for REDUC_(MAX,MIN,PLUS)_EXPR, return the reduce-to-scalar (reduc_..._scal) optab. (scalar_reduc_to_vector): New. * optabs.def (reduc_smax_scal_optab, reduc_smin_scal_optab, reduc_plus_scal_optab, reduc_umax_scal_optab, reduc_umin_scal_optab): New. * optabs.h (scalar_reduc_to_vector): Declare. * tree-vect-loop.c (vectorizable_reduction): Look for optabs reducing to either scalar or vector.
Re: [PATCH 7/14][Testsuite] Add tests of reductions using whole-vector-shifts (multiplication)
On Thu, Sep 18, 2014 at 2:19 PM, Alan Lawrence alan.lawre...@arm.com wrote: For reduction operations (e.g. multiply) that don't have such a tree code ,or where the target platform doesn't define an optab handler for the tree code, we can perform the reduction using a series of log(N) shifts (where N = #elements in vector), using the VEC_RSHIFT_EXPR=whole-vector-shift tree code (if the platform handles the vec_shr_optab). First stage is to add some tests of non-(min/max/plus) reductions; here, multiplies. The first is designed to be non-foldable, so we make sure the architectural instructions line up with what the tree codes specify. The second is designed to be easily constant-propagated, to test the (currently endianness-dependent) constant folding code. In lib/target-supports.exp, I've defined a new check_effective_target_whole_vector_shift, which I intended to define to true for platforms with the vec_shr optab. However, I've not managed to make this test pass on PowerPC - even with -maltivec, -fdump-tree-vect-details gives me a message about the target not supporting vector multiplication - so I've omitted PowerPC from the whole_vector_shift. This doesn't feel right, suggestions welcomed from PowerPC maintainers? Tests passing on arm-none-eabi and x86_64-none-linux-gnu; also verified the scan-tree-dump part works on ia64-none-linux-gnu (by compiling to assembly only). (Tests are not run on AArch64, because we have no vec_shr_optab at this point; PowerPC, as above; or MIPS, as check_effective_target_vect_int_mult yields 0.) Ok. Thanks, Richard. gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_whole_vector_shift): New. * gcc.dg/vect/vect-reduc-mul_1.c: New test. * gcc.dg/vect/vect-reduc-mul_2.c: New test.
Re: [PATCH 8/14][Testsuite] Add tests of reductions using whole-vector-shifts (ior)
On Thu, Sep 18, 2014 at 2:25 PM, Alan Lawrence alan.lawre...@arm.com wrote: These are like the previous patch, but using | rather than * - I was unable to get the previous test to pass on PowerPC and MIPS. I note there is no inherent vector operation here - a bitwise OR across a word, and a reduction via shifts using scalar (not vector) ops would be all that's necessary. However, GCC doesn't exploit this possibility at present, and I don't have any plans at present to add such myself. Passing on x86_64-linux-gnu, aarch64-none-elf, aarch64_be-none-elf, arm-none-eabi. The 'scan-tree-dump' part passes on mips64 and powerpc (although the latter is disabled as check_effective_target_whole_vector_shift gives 0, as per previous patch) Ok. Thanks, Richard. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-reduc-or_1.c: New test. * gcc.dg/vect/vect-reduc-or_2.c: Likewise.
RE: [PATCH 0/5] Fix handling of word subregs of wide registers
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Richard Sandiford Sent: Monday, September 22, 2014 12:54 PM To: Jeff Law Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH 0/5] Fix handling of word subregs of wide registers Jeff Law l...@redhat.com writes: On 09/19/14 01:23, Richard Sandiford wrote: Jeff Law l...@redhat.com writes: On 09/18/14 04:07, Richard Sandiford wrote: This series is a cleaned-up version of: https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html The underlying problem is that the semantics of subregs depend on the word size. You can't have a subreg for byte 2 of a 4-byte word, say, but you can have a subreg for word 2 of a 4-word value (as well as lowpart subregs of that word, etc.). This causes problems when an architecture has wider-than-word registers, since the addressability of a word can then depend on which register class is used. The register allocators need to fix up cases where a subreg turns out to be invalid for a particular class. This is really an extension of what we need to do for CANNOT_CHANGE_MODE_CLASS. Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf. I thought we fixed these problems long ago with the change to subreg_byte?!? No, that was fixing something else. (I'm just about old enough to remember that too!) The problem here is that (say): (subreg:SI (reg:DI X) 4) is independently addressable on little-endian AArch32 if X assigned to a GPR, but not if X is assigned to a vector register. We need to allow these kinds of subreg on pseudos in order to decompose multiword arithmetic. It's then up to the RA to realise that a reload would be needed if X were assigned to a vector register, since the upper half of a vector register cannot be independently accessed. Note that you could write this example even with the old word-style offsets and IIRC the effect would have been the same. OK. So I kept thinking in terms of the byte offset stuff. But what you're tackling is related to the mess around the mode of the subreg having a different meaning if its smaller than a word vs word-sized or greater. Right? Yeah, that's right. Addressability is based on words, which is inconvenient when your registers are bigger than a word. If the architecture like Microblaze which doesn't support the 1 byte or 2 byte registers. In this scenario what should be returned when SUBREG_WORD is used. Thanks, Richard
Re: [PATCH 9/14] Enforce whole-vector-shifts to always be by a whole number of elements
On Thu, Sep 18, 2014 at 2:27 PM, Alan Lawrence alan.lawre...@arm.com wrote: The VEC_RSHIFT_EXPR is only ever used by the vectorizer in tree-vect-loop.c (vect_create_epilog_for_reduction), to shift the vector by a whole number of elements. The tree code allows more general shifts but only for integral types. This only causes pain and difficulty for backends (particularly for backends with different endiannesses), and enforcing that restriction for integral types too does no harm. bootstrapped on aarch64-none-linux-gnu and x86-64-none-linux-gnu check-gcc on aarch64-none-elf and x86_64-none-linux-gnu Hmm, but then (coming from the tree / gimple level) all shifts can be expressed with a VEC_PERM_EXPR. And of course a general whole-vector shift could be expressed using a VIEW_CONVERT_EXPR to a 1-element integer vector and a regular [RL]SHIFT_EXPR and then converting back. So it seems to me that the vectorizer should instead emit a VEC_PERM_EXPR (making sure the backends or the generic vec_perm expansion code in optabs.c handles the whole-vector-shift case in an optimal way). The current VEC_RSHIFT_EXPR description lacks information on what is shifted in btw (always zeros? the most significant bit (endian dependent?!)). So - can we instead remove VEC_[LR]SHIFT_EXPR? Seems that VEC_LSHIFT_EXPR is unused anyway, and thus vec_shl_optabs as well. Thanks, Richard. gcc/ChangeLog: * tree-cfg.c (verify_gimple_assign_binary): for VEC_RSHIFT_EXPR (and VEC_LSHIFT_EXPR), require shifts to be by a whole number of elements for all types, rather than only non-integral types. * tree.def (VEC_LSHIFT_EXPR, VEC_RSHIFT_EXPR): Update comment. * doc/md.texi (vec_shl_m, vec_shr_m): Update comment.
Re: [PATCH 13/14][AArch64_be] Fix vec_shr pattern to correctly implement endianness-neutral optab
On Thu, Sep 18, 2014 at 2:45 PM, Alan Lawrence alan.lawre...@arm.com wrote: The previous patch broke aarch64_be by redefining VEC_RSHIFT_EXPR / vec_shr_optab to always shift the vector towards gcc's element 0. This fixes aarch64_be to do that. check-gcc on aarch64-none-elf (no changes) and aarch64_be-none-elf (fixes all regressions produced by previous patch, i.e. no regressions from before redefining vec_shr). Using vector permutes would have avoided this I guess? Richard. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (vec_shr_mode *2): Fix bigendian.
Re: [PATCH 11/14] Remove VEC_LSHIFT_EXPR and vec_shl_optab
On Thu, Sep 18, 2014 at 2:35 PM, Alan Lawrence alan.lawre...@arm.com wrote: The VEC_LSHIFT_EXPR tree code, and the corresponding vec_shl_optab, seem to have been added for completeness, providing a counterpart to VEC_RSHIFT_EXPR and vec_shr_optab. However, whereas VEC_RSHIFT_EXPRs are generated (only) by the vectorizer, VEC_LSHIFT_EXPR expressions are not generated at all, so there seems little point in maintaining it. Bootstrapped on x86_64-unknown-linux-gnu. aarch64.exp+vect.exp on aarch64-none-elf and aarch64_be-none-elf. Ah, there it is ;) Ok. Thanks, Richard. gcc/ChangeLog: * expr.c (expand_expr_real_2): Remove code handling VEC_LSHIFT_EXPR. * fold-const.c (const_binop): Likewise. * cfgexpand.c (expand_debug_expr): Likewise. * tree-inline.c (estimate_operator_cost, dump_generic_node, op_code_prio, op_symbol_code): Likewise. * tree-vect-generic.c (expand_vector_operations_1): Likewise. * optabs.c (optab_for_tree_code): Likewise. (expand_vec_shift_expr): Likewise, update comment. * tree.def: Delete VEC_LSHIFT_EXPR, remove comment. * optabs.h (expand_vec_shift_expr): Remove comment re. VEC_LSHIFT_EXPR. * optabs.def: Remove vec_shl_optab. * doc/md.texi: Remove references to vec_shr_m.
Re: [PATCH 14/14][Vectorizer] Tidy up vect_create_epilog / use_scalar_result
On Thu, Sep 18, 2014 at 2:48 PM, Alan Lawrence alan.lawre...@arm.com wrote: Following earlier patches, vect_create_epilog_for_reduction contains exactly one case where extract_scalar_result==true. Hence, move the code 'if (extract_scalar_result)' there, and tidy-up/remove some variables. bootstrapped on x86_64-none-linux-gnu + check-gcc + check-g++. Ok. Thanks, Richard. gcc/ChangeLog: * tree-vect-loop.c (vect_create_epilog_for_reduction): Move code for 'if (extract_scalar_result)' to the only place that it is true.
Re: [PATCH 12/14][Vectorizer] Redefine VEC_RSHIFT_EXPR and vec_shr_optab as endianness-neutral
On Thu, Sep 18, 2014 at 2:42 PM, Alan Lawrence alan.lawre...@arm.com wrote: The direction of VEC_RSHIFT_EXPR has been endian-dependent, contrary to the general principles of tree. This patch updates fold-const and the vectorizer (the only place where such expressions are created), such that VEC_RSHIFT_EXPR always shifts towards element 0. The tree code still maps directly onto the vec_shr_optab, and so this patch *will break any bigendian platform defining the vec_shr optab*. -- For AArch64_be, patch follows next in series; -- For PowerPC, I think patch/rfc 15 should fix, please inspect; -- For MIPS, I think patch/rfc 16 should fix, please inspect. gcc/ChangeLog: * fold-const.c (const_binop): VEC_RSHIFT_EXPR always shifts towards element 0. * tree-vect-loop.c (vect_create_epilog_for_reduction): always extract the result of a reduction with vector shifts from element 0. * tree.def (VEC_RSHIFT_EXPR, VEC_LSHIFT_EXPR): Comment shift direction. * doc/md.texi (vec_shr_m, vec_shl_m): Document shift direction. Testing Done: Bootstrap and check-gcc on x86_64-none-linux-gnu; check-gcc on aarch64-none-elf. As said elsewhere I'd like the vectorizer to use VEC_PERM_EXPRs and the generic vec_perm expansion machinery handle the case where the permute can be expressed using the vec_shr_optab. You'd have, for a 1-element shift of V4SI x, VEC_PERM x, { 0, 0, 0, 0 }, {4, 3, 2, 1 } I'd say that if the target says it can handle the constant permute just fine then use the vec_perm_const expansion path. Richard.
Re: Fix i386 FP_TRAPPING_EXCEPTIONS
On Fri, 19 Sep 2014, Joseph S. Myers wrote: On Thu, 18 Sep 2014, Joseph S. Myers wrote: On Thu, 18 Sep 2014, Uros Bizjak wrote: OK for mainline and release branches. I've omitted ia64 from the targets in the testcase in the release branch version, given the lack of any definition of FP_TRAPPING_EXCEPTIONS at all there. (I think a definition as (~_fcw 0x3f) should work for ia64, but haven't tested that.) Here is an *untested* patch with that definition. 2014-09-19 Joseph Myers jos...@codesourcery.com PR target/63312 * config/ia64/sfp-machine.h (FE_EX_ALL, FP_TRAPPING_EXCEPTIONS): New macros. Now committed after Andreas's testing reporting in PR 63312. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check
Well, I haven't looked into this in detail: I've gone only as far as * swapping emit-rtl.o between 'good' compiles (svn r214042) and 'bad' compiles (r214043), finding that the critical difference is in the emit-rtl.o generated by r214043; *looking at the relocations in the 'bad' emit_rtl.o, seeing new entries 'fixed_regs + ', and that Richard Biener's changelog specifically mentions stripping signedness changes (and introduces the SIGN_NOPS). However, I apply your patch (minus the hunk adding the (set_attr type load1), this appears to have gone in already), and still see the same error message: emit-rtl.o: In function `gen_rtx_REG': emit-rtl.c:(.text+0x12f8): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o emit-rtl.o: In function `gen_rtx': emit-rtl.c:(.text+0x1824): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o collect2: error: ld returned 1 exit status and still see the same (suspicious-looking, although perhaps not convicted) relocations: $ readelf --relocs benchspec/CPU2006/403.gcc/build/build_base_test./emit-rtl.o | grep fixed_regs 12a8 005d0113 R_AARCH64_ADR_PRE fixed_regs + 0 12ac 005d0115 R_AARCH64_ADD_ABS fixed_regs + 0 12f8 005d0113 R_AARCH64_ADR_PRE fixed_regs + 12fc 005d0116 R_AARCH64_LDST8_A fixed_regs + 1824 005d0113 R_AARCH64_ADR_PRE fixed_regs + 1828 005d0116 R_AARCH64_LDST8_A fixed_regs + 186c 005d0113 R_AARCH64_ADR_PRE fixed_regs + 0 1870 005d0115 R_AARCH64_ADD_ABS fixed_regs + 0 I've also now bootstrapped my patch (STRIP_NOPS - STRIP_SIGN_NOPS * 2) on aarch64-none-linux-gnu and x86_64-none-linux-gnu, and check-gcc with no regressions, so would like to propose that patch for trunk...? --Alan Andrew Pinski wrote: On Thu, Sep 18, 2014 at 9:44 AM, Alan Lawrence alan.lawre...@arm.com wrote: We've been seeing errors using aarch64-none-linux-gnu gcc to build the 403.gcc benchmark from spec2k6, that we've traced back to this patch. The error looks like: /home/alalaw01/bootstrap_richie/gcc/xgcc -B/home/alalaw01/bootstrap_richie/gcc -O3 -mcpu=cortex-a57.cortex-a53 -DSPEC_CPU_LP64alloca.o asprintf.o vasprintf.o c-parse.o c-lang.o attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o c-aux-info.o c-common.o c-format.o c-semantics.o c-objc-common.o main.o cpplib.o cpplex.o cppmacro.o cppexp.o cppfiles.o cpphash.o cpperror.o cppinit.o cppdefault.o line-map.o mkdeps.o prefix.o version.o mbchar.o alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o cfgrtl.o combine.o conflict.o convert.o cse.o cselib.o dbxout.o debug.o dependence.o df.o diagnostic.o doloop.o dominance.o dwarf2asm.o dwarf2out.o dwarfout.o emit-rtl.o except.o explow.o expmed.o expr.o final.o flow.o fold-const.o function.o gcse.o genrtl.o ggc-common.o global.o graph.o haifa-sched.o hash.o hashtable.o hooks.o ifcvt.o insn-attrtab.o insn-emit.o insn-extract.o insn-opinit.o insn-output.o insn-peep.o insn-recog.o integrate.o intl.o jump.o langhooks.o lcm.o lists.o local-alloc.o loop.o obstack.o optabs.o params.o predict.o print-rtl.o print-tree.o profile.o real.o recog.o reg-stack.o regclass.o regmove.o regrename.o reload.o reload1.o reorg.o resource.o rtl.o rtlanal.o rtl-error.o sbitmap.o sched-deps.o sched-ebb.o sched-rgn.o sched-vis.o sdbout.o sibcall.o simplify-rtx.o ssa.o ssa-ccp.o ssa-dce.o stmt.o stor-layout.o stringpool.o timevar.o toplev.o tree.o tree-dump.o tree-inline.o unroll.o varasm.o varray.o vmsdbgout.o xcoffout.o ggc-page.o i386.o xmalloc.o xexit.o hashtab.o safe-ctype.o splay-tree.o xstrdup.o md5.o fibheap.o xstrerror.o concat.o partition.o hex.o lbasename.o getpwd.o ucbqsort.o -lm-o gcc emit-rtl.o: In function `gen_rtx_REG': emit-rtl.c:(.text+0x12f8): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o emit-rtl.o: In function `gen_rtx': emit-rtl.c:(.text+0x1824): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o collect2: error: ld returned 1 exit status specmake: *** [gcc] Error 1 Error with make 'specmake -j7 build': check file '/home/alalaw01/spectest/benchspec/CPU2006/403.gcc/build/build_base_test./make.err' Command returned exit code 2 Error with make! *** Error building 403.gcc Inspecting the compiled emit-rtl.o shows: $ readelf --relocs good/emit-rtl.o | grep fixed_regs 12a8 005d0113 R_AARCH64_ADR_PRE fixed_regs + 0 12ac 005d0115
[AArch64] Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c
Ok thanks Jeff. In that case I think I should draw this to the attention of the AArch64 maintainers to check the testsuite updates are OK before I commit...? Methinks it may be possible to get further, or at least increase our confidence, if I mock out try_widen_shift_mode, and/or try injecting some dubious RTL from a builtin, although this'll only give a momentary snapshot of behaviour. I may or may not have time to look into this though ;)... Cheers, Alan Jeff Law wrote: On 09/18/14 03:35, Alan Lawrence wrote: Moreover, I think we both agree that if result_mode==shift_mode, the transformation is correct. Just putting that check in, achieves what I'm trying for here, so I'd be happy to go with the attached patch and call it a day. However, I'm a little concerned about the other cases - i.e. where shift_mode is wider than result_mode. Let's go ahead and get the attached patch installed. I'm pretty sure it's correct and I know you want to see something move forward here. We can iterate further if we want. If I understand correctly (and I'm not sure about that, let's see how far I get), this means we'll perform the shift in (say) DImode, when we're only really concerned about the lower (say) 32-bits (for an originally-SImode shift). That's the class of cases I'm concerned about. try_widen_shift_mode will in this case check that the result of the operation *inside* the shift (in our case an XOR) has 33 sign bit copies (via num_sign_bit_copies), i.e. that its *top* 32-bits are all equal to the original SImode sign bit. count of these bits may be fed into the top of the desired SImode result by the DImode shift. Right so far? Correct. AFAICT, num_sign_bit_copies for an XOR, conservatively returns the minimum of the num_sign_bit_copies of its two operands. I'm not sure whether this is behaviour we should rely on in its callers, or for the sake of abstraction we should treat num_sign_bit_copies as a black box (which does what it says on the, erm, tin). Treat it as a black box. It returns the number of known sign bit copies. There may be more, but never less. If the former, doesn't having num_sign_bit_copies = the difference in size between result_mode and shift_mode, of both operands to the XOR, guarantee safety of the commutation (whether the constant is positive or negative)? We could perform the shift (in the larger mode) on both of the XOR operands safely, then XOR together their lower parts. I had convinced myself that when we flip the sign bit via the XOR and commute the XOR out that we invalidate the assumptions made when widening. But I'm not so sure anymore. Damn I hate changes made without suitable tests :( I almost convinced myself the problem is in the adjustment of C2 in the widened case, but that's not a problem either. At least not on paper. If, however, we want to play safe and ensure that we deal safely with any XOR whose top (mode size difference + 1) bits were the same, then I think the restriction that the XOR constant is positive is neither necessary nor sufficient; rather (mirroring try_widen_shift_mode) I think we need that num_sign_bit_copies of the constant in shift_mode, is more than the size difference between result_mode and shift_mode. But isn't that the same? Isn't the only case where it isn't the same when the constant has bits set that are outside the mode of the other operand? Hmm, what about (xor:QI A -1)? In that case -1 will be represented with bits outside the precision of QImode. Hmmm. I might try that patch at some point, I think it is the right check to make. (Meta-comment: this would be *so*much* easier if we could write unit tests more easily!) In the meantime I'd be happy to settle for the attached... No argument on the unit testing comment. It's a major failing in the design of GCC that we can't easily build a unit testing framework. Jeff
Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check
On Mon, Sep 22, 2014 at 1:10 PM, Alan Lawrence alan.lawre...@arm.com wrote: Well, I haven't looked into this in detail: I've gone only as far as * swapping emit-rtl.o between 'good' compiles (svn r214042) and 'bad' compiles (r214043), finding that the critical difference is in the emit-rtl.o generated by r214043; *looking at the relocations in the 'bad' emit_rtl.o, seeing new entries 'fixed_regs + ', and that Richard Biener's changelog specifically mentions stripping signedness changes (and introduces the SIGN_NOPS). However, I apply your patch (minus the hunk adding the (set_attr type load1), this appears to have gone in already), and still see the same error message: emit-rtl.o: In function `gen_rtx_REG': emit-rtl.c:(.text+0x12f8): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o emit-rtl.o: In function `gen_rtx': emit-rtl.c:(.text+0x1824): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o collect2: error: ld returned 1 exit status and still see the same (suspicious-looking, although perhaps not convicted) relocations: $ readelf --relocs benchspec/CPU2006/403.gcc/build/build_base_test./emit-rtl.o | grep fixed_regs 12a8 005d0113 R_AARCH64_ADR_PRE fixed_regs + 0 12ac 005d0115 R_AARCH64_ADD_ABS fixed_regs + 0 12f8 005d0113 R_AARCH64_ADR_PRE fixed_regs + 12fc 005d0116 R_AARCH64_LDST8_A fixed_regs + 1824 005d0113 R_AARCH64_ADR_PRE fixed_regs + 1828 005d0116 R_AARCH64_LDST8_A fixed_regs + 186c 005d0113 R_AARCH64_ADR_PRE fixed_regs + 0 1870 005d0115 R_AARCH64_ADD_ABS fixed_regs + 0 I've also now bootstrapped my patch (STRIP_NOPS - STRIP_SIGN_NOPS * 2) on aarch64-none-linux-gnu and x86_64-none-linux-gnu, and check-gcc with no regressions, so would like to propose that patch for trunk...? As Andrew said it certainly isn't a fix for the bug but only a workaround. That said, I don't think keeping the sign-change strip is important (it was just cleanup of that routine as I were there). So - the patch is ok for trunk. You still may want to fix the bug though ;) (I'd say it makes more sense then to remove stripping conversions entirely) Thanks, Richard. --Alan Andrew Pinski wrote: On Thu, Sep 18, 2014 at 9:44 AM, Alan Lawrence alan.lawre...@arm.com wrote: We've been seeing errors using aarch64-none-linux-gnu gcc to build the 403.gcc benchmark from spec2k6, that we've traced back to this patch. The error looks like: /home/alalaw01/bootstrap_richie/gcc/xgcc -B/home/alalaw01/bootstrap_richie/gcc -O3 -mcpu=cortex-a57.cortex-a53 -DSPEC_CPU_LP64alloca.o asprintf.o vasprintf.o c-parse.o c-lang.o attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o c-aux-info.o c-common.o c-format.o c-semantics.o c-objc-common.o main.o cpplib.o cpplex.o cppmacro.o cppexp.o cppfiles.o cpphash.o cpperror.o cppinit.o cppdefault.o line-map.o mkdeps.o prefix.o version.o mbchar.o alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o cfgrtl.o combine.o conflict.o convert.o cse.o cselib.o dbxout.o debug.o dependence.o df.o diagnostic.o doloop.o dominance.o dwarf2asm.o dwarf2out.o dwarfout.o emit-rtl.o except.o explow.o expmed.o expr.o final.o flow.o fold-const.o function.o gcse.o genrtl.o ggc-common.o global.o graph.o haifa-sched.o hash.o hashtable.o hooks.o ifcvt.o insn-attrtab.o insn-emit.o insn-extract.o insn-opinit.o insn-output.o insn-peep.o insn-recog.o integrate.o intl.o jump.o langhooks.o lcm.o lists.o local-alloc.o loop.o obstack.o optabs.o params.o predict.o print-rtl.o print-tree.o profile.o real.o recog.o reg-stack.o regclass.o regmove.o regrename.o reload.o reload1.o reorg.o resource.o rtl.o rtlanal.o rtl-error.o sbitmap.o sched-deps.o sched-ebb.o sched-rgn.o sched-vis.o sdbout.o sibcall.o simplify-rtx.o ssa.o ssa-ccp.o ssa-dce.o stmt.o stor-layout.o stringpool.o timevar.o toplev.o tree.o tree-dump.o tree-inline.o unroll.o varasm.o varray.o vmsdbgout.o xcoffout.o ggc-page.o i386.o xmalloc.o xexit.o hashtab.o safe-ctype.o splay-tree.o xstrdup.o md5.o fibheap.o xstrerror.o concat.o partition.o hex.o lbasename.o getpwd.o ucbqsort.o -lm-o gcc emit-rtl.o: In function `gen_rtx_REG': emit-rtl.c:(.text+0x12f8): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o emit-rtl.o: In function `gen_rtx': emit-rtl.c:(.text+0x1824): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
Re: [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114
On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence alan.lawre...@arm.com wrote: The end goal here is to remove this code from tree-vect-loop.c (vect_create_epilog_for_reduction): if (BYTES_BIG_ENDIAN) bitpos = size_binop (MULT_EXPR, bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1), TYPE_SIZE (scalar_type)); else as this is the root cause of PR/61114 (see testcase there, failing on all bigendian targets supporting reduc_[us]plus_optab). Quoting Richard Biener, all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is suspicious. The code snippet above is used on two paths: (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR = reduc_[us](plus|min|max)_optab. The optab is documented as the scalar result is stored in the least significant bits of operand 0, but the tree code as the first element in the vector holding the result of the reduction of all elements of the operand. This mismatch means that when the tree code is folded, the code snippet above reads the result from the wrong end of the vector. The strategy (as per https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define new tree codes and optabs that produce scalar results directly; this seems better than tying (the element of the vector into which the result is placed) to (the endianness of the target), and avoids generating extra moves on current bigendian targets. However, the previous optabs are retained for now as a migration strategy so as not to break existing backends; moving individual platforms over will follow. A complication here is on AArch64, where we directly generate REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily remove this folding in order to decouple the midend and AArch64 backend. Sounds fine. I hope we can transition all backends for 5.0 and remove the vector variant optabs (maybe renaming the scalar ones). (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e. VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the optab is defined in an endianness-dependent way, leading to significant complication in fold-const.c. (Moreover, the equivalent vec_shl_optab is never used!). Few platforms appear to handle vec_shr_optab (and fewer bigendian - I see only PowerPC and MIPS), so it seems pertinent to change the existing optab to be endianness-neutral. Patch 10 defines vec_shr for AArch64, for the old specification; patch 13 updates that implementation to fit the new endianness-neutral specification, serving as a guide for other existing backends. Patches/RFCs 15 and 16 are equivalents for MIPS and PowerPC; I haven't tested these but hope they act as useful pointers for the port maintainers. Finally patch 14 cleans up the affected part of tree-vect-loop.c (vect_create_epilog_for_reduction). As said during the individual patches review I'd like the vectorizer to use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with only whole-element amounts). This means we can remove VEC_RSHIFT_EXPR. It also means that if the backend defines vec_perm_const (which it really should) it can handle the special permutes that boil down to a possibly more efficient vector shift there (a good optimization anyway). Until it does that all backends would at least create correct code (with the endian dependent vec_shr removed). Richard. --Alan
Re: [PATCH 0/5] Fix handling of word subregs of wide registers
Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com writes: Jeff Law l...@redhat.com writes: On 09/19/14 01:23, Richard Sandiford wrote: Jeff Law l...@redhat.com writes: On 09/18/14 04:07, Richard Sandiford wrote: This series is a cleaned-up version of: https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html The underlying problem is that the semantics of subregs depend on the word size. You can't have a subreg for byte 2 of a 4-byte word, say, but you can have a subreg for word 2 of a 4-word value (as well as lowpart subregs of that word, etc.). This causes problems when an architecture has wider-than-word registers, since the addressability of a word can then depend on which register class is used. The register allocators need to fix up cases where a subreg turns out to be invalid for a particular class. This is really an extension of what we need to do for CANNOT_CHANGE_MODE_CLASS. Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf. I thought we fixed these problems long ago with the change to subreg_byte?!? No, that was fixing something else. (I'm just about old enough to remember that too!) The problem here is that (say): (subreg:SI (reg:DI X) 4) is independently addressable on little-endian AArch32 if X assigned to a GPR, but not if X is assigned to a vector register. We need to allow these kinds of subreg on pseudos in order to decompose multiword arithmetic. It's then up to the RA to realise that a reload would be needed if X were assigned to a vector register, since the upper half of a vector register cannot be independently accessed. Note that you could write this example even with the old word-style offsets and IIRC the effect would have been the same. OK. So I kept thinking in terms of the byte offset stuff. But what you're tackling is related to the mess around the mode of the subreg having a different meaning if its smaller than a word vs word-sized or greater. Right? Yeah, that's right. Addressability is based on words, which is inconvenient when your registers are bigger than a word. If the architecture like Microblaze which doesn't support the 1 byte or 2 byte registers. In this scenario what should be returned when SUBREG_WORD is used. I don't understand the question sorry. Subreg offsets are still represented as bytes rather than words. The patch doesn't change the way that subregs are represented or the rules about which subregs are valid. Both before and after the patch, the semantics of subregs say that if you have 4-byte words, only one of: (subreg:QI (reg:SI X) 0) (subreg:QI (reg:SI X) 1) (subreg:QI (reg:SI X) 2) (subreg:QI (reg:SI X) 3) is ever valid (0 for little-endian, 3 for big-endian). Writing to that one valid subreg will change the whole of X, unless the subreg is wrapped in a strict_lowpart. In other words, subregs are defined so that individual parts of a word are not independently addressable. However, individual words of a multiword register _are_ addressable. I.e.: (subreg:SI (reg:DI Y) 0) (subreg:SI (reg:DI Y) 4) are both valid. Writing to one does not change the other. The problem the patch was trying to solve was that you can have targets with 4-byte words but some 8-byte registers. In those cases, it's still possible to form both of the Y subregs above if Y is allocated to a word-sized register, but not if Y is allocated to a doubleword-sized register. Thanks, Richard
Re: [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114
On Mon, Sep 22, 2014 at 1:21 PM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence alan.lawre...@arm.com wrote: The end goal here is to remove this code from tree-vect-loop.c (vect_create_epilog_for_reduction): if (BYTES_BIG_ENDIAN) bitpos = size_binop (MULT_EXPR, bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1), TYPE_SIZE (scalar_type)); else as this is the root cause of PR/61114 (see testcase there, failing on all bigendian targets supporting reduc_[us]plus_optab). Quoting Richard Biener, all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is suspicious. The code snippet above is used on two paths: (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR = reduc_[us](plus|min|max)_optab. The optab is documented as the scalar result is stored in the least significant bits of operand 0, but the tree code as the first element in the vector holding the result of the reduction of all elements of the operand. This mismatch means that when the tree code is folded, the code snippet above reads the result from the wrong end of the vector. The strategy (as per https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define new tree codes and optabs that produce scalar results directly; this seems better than tying (the element of the vector into which the result is placed) to (the endianness of the target), and avoids generating extra moves on current bigendian targets. However, the previous optabs are retained for now as a migration strategy so as not to break existing backends; moving individual platforms over will follow. A complication here is on AArch64, where we directly generate REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily remove this folding in order to decouple the midend and AArch64 backend. Sounds fine. I hope we can transition all backends for 5.0 and remove the vector variant optabs (maybe renaming the scalar ones). (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e. VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the optab is defined in an endianness-dependent way, leading to significant complication in fold-const.c. (Moreover, the equivalent vec_shl_optab is never used!). Few platforms appear to handle vec_shr_optab (and fewer bigendian - I see only PowerPC and MIPS), so it seems pertinent to change the existing optab to be endianness-neutral. Patch 10 defines vec_shr for AArch64, for the old specification; patch 13 updates that implementation to fit the new endianness-neutral specification, serving as a guide for other existing backends. Patches/RFCs 15 and 16 are equivalents for MIPS and PowerPC; I haven't tested these but hope they act as useful pointers for the port maintainers. Finally patch 14 cleans up the affected part of tree-vect-loop.c (vect_create_epilog_for_reduction). As said during the individual patches review I'd like the vectorizer to use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with only whole-element amounts). This means we can remove VEC_RSHIFT_EXPR. It also means that if the backend defines vec_perm_const (which it really should) it can handle the special permutes that boil down to a possibly more efficient vector shift there (a good optimization anyway). Until it does that all backends would at least create correct code (with the endian dependent vec_shr removed). It seems only Alpha completely lacks vec_perm_const but implements vec_shr. Richard. Richard. --Alan
Re: [PATCH 0/5] Fix handling of word subregs of wide registers
Andrew Pinski pins...@gmail.com writes: On Thu, Sep 18, 2014 at 3:07 AM, Richard Sandiford richard.sandif...@arm.com wrote: This series is a cleaned-up version of: https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html The underlying problem is that the semantics of subregs depend on the word size. You can't have a subreg for byte 2 of a 4-byte word, say, but you can have a subreg for word 2 of a 4-word value (as well as lowpart subregs of that word, etc.). This causes problems when an architecture has wider-than-word registers, since the addressability of a word can then depend on which register class is used. The register allocators need to fix up cases where a subreg turns out to be invalid for a particular class. This is really an extension of what we need to do for CANNOT_CHANGE_MODE_CLASS. Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf. This sounds like something which should be tested on spu as it is the main target that I can think of which has wider-than-word registers and that has had issues with subreg. I can't remember if the simulator for SPU is free (as in beer) and would run on anything besides PowerPC. It has been more than 4 years since I looked into the spu back-end also. Well, AArch64 and x86_64 should be good enough targets for testing the patch. In the AArch64 case the bug was holding up other big-endian fixes, in the x86_64 case it led to a workaround in C_C_M_C. Thanks, Richard
Re: [PATCH] Fix ubsan ICE on invalid (PR sanitizer/61272)
On Wed, May 21, 2014 at 08:58:57PM +0200, Jakub Jelinek wrote: On Wed, May 21, 2014 at 08:46:22PM +0200, Marek Polacek wrote: 2014-05-21 Marek Polacek pola...@redhat.com PR sanitizer/61272 * ubsan.c (is_ubsan_builtin_p): Turn assert into a condition. * g++.dg/ubsan/pr61272.C: New test. Ok, thanks. Now committed to 4.9 branch as well to fix PR63323. Marek
RE: [PATCH 0/5] Fix handling of word subregs of wide registers
-Original Message- From: Richard Sandiford [mailto:richard.sandif...@arm.com] Sent: Monday, September 22, 2014 4:56 PM To: Ajit Kumar Agarwal Cc: Jeff Law; gcc-patches@gcc.gnu.org Subject: Re: [PATCH 0/5] Fix handling of word subregs of wide registers Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com writes: Jeff Law l...@redhat.com writes: On 09/19/14 01:23, Richard Sandiford wrote: Jeff Law l...@redhat.com writes: On 09/18/14 04:07, Richard Sandiford wrote: This series is a cleaned-up version of: https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html The underlying problem is that the semantics of subregs depend on the word size. You can't have a subreg for byte 2 of a 4-byte word, say, but you can have a subreg for word 2 of a 4-word value (as well as lowpart subregs of that word, etc.). This causes problems when an architecture has wider-than-word registers, since the addressability of a word can then depend on which register class is used. The register allocators need to fix up cases where a subreg turns out to be invalid for a particular class. This is really an extension of what we need to do for CANNOT_CHANGE_MODE_CLASS. Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf. I thought we fixed these problems long ago with the change to subreg_byte?!? No, that was fixing something else. (I'm just about old enough to remember that too!) The problem here is that (say): (subreg:SI (reg:DI X) 4) is independently addressable on little-endian AArch32 if X assigned to a GPR, but not if X is assigned to a vector register. We need to allow these kinds of subreg on pseudos in order to decompose multiword arithmetic. It's then up to the RA to realise that a reload would be needed if X were assigned to a vector register, since the upper half of a vector register cannot be independently accessed. Note that you could write this example even with the old word-style offsets and IIRC the effect would have been the same. OK. So I kept thinking in terms of the byte offset stuff. But what you're tackling is related to the mess around the mode of the subreg having a different meaning if its smaller than a word vs word-sized or greater. Right? Yeah, that's right. Addressability is based on words, which is inconvenient when your registers are bigger than a word. If the architecture like Microblaze which doesn't support the 1 byte or 2 byte registers. In this scenario what should be returned when SUBREG_WORD is used. I don't understand the question sorry. Subreg offsets are still represented as bytes rather than words. The patch doesn't change the way that subregs are represented or the rules about which subregs are valid. Both before and after the patch, the semantics of subregs say that if you have 4-byte words, only one of: (subreg:QI (reg:SI X) 0) (subreg:QI (reg:SI X) 1) (subreg:QI (reg:SI X) 2) (subreg:QI (reg:SI X) 3) is ever valid (0 for little-endian, 3 for big-endian). Writing to that one valid subreg will change the whole of X, unless the subreg is wrapped in a strict_lowpart. In other words, subregs are defined so that individual parts of a word are not independently addressable. However, individual words of a multiword register _are_ addressable. I.e.: (subreg:SI (reg:DI Y) 0) (subreg:SI (reg:DI Y) 4) are both valid. Writing to one does not change the other. The problem the patch was trying to solve was that you can have targets with 4-byte words but some 8-byte registers. In those cases, it's still possible to form both of the Y subregs above if Y is allocated to a word-sized register, but not if Y is allocated to a doubleword-sized register. Thanks Richard for the explanation. Thanks, Richard
[PATCH] msp430: inhibit automatic link of -lnosys in absence of -msim
Based on discussion on the mspgcc-users mailing list[1], this patch changes msp430 to not automatically apply -lnosys when -msim is absent, as this prevents a user from supplying a custom system interface. The existing behavior providing the CIO alternative can be obtained by explicitly linking -lcio instead, assuming the corresponding newlib patch[2] to move msp430's nosys implementation to libcio.a is also used. gcc/ChangeLog 2014-09-22 Peter A. Bigot p...@pabigot.com * config/msp430/msp430.h: Remove automatic -lnosys when -msim absent. [1] http://www.mail-archive.com/mspgcc-users@lists.sourceforge.net/msg12104.html [2] https://sourceware.org/ml/newlib/2014/msg00465.html Cc: d...@redhat.com Cc: ni...@redhat.com Signed-off-by: Peter A. Bigot p...@pabigot.com --- gcc/config/msp430/msp430.h | 1 - 1 file changed, 1 deletion(-) diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h index 91fc91c..068bdad 100644 --- a/gcc/config/msp430/msp430.h +++ b/gcc/config/msp430/msp430.h @@ -70,7 +70,6 @@ extern bool msp430x; -lgcc \ -lcrt \ %{msim:-lsim} \ -%{!msim:-lnosys} \ --end-group\ %{!T*:%{!msim:%{mmcu=*:--script=%*.ld}}} \ %{!T*:%{!msim:%{!mmcu=*:%Tmsp430.ld}}} \ -- 1.8.5.5
[PATCH] Fix missing gimplification of vector constructors
The following fixes non-GIMPLE constructors slipping through the gimplifier. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2014-09-22 Richard Biener rguent...@suse.de * gimplify.c (gimplify_init_constructor): Do not leave non-GIMPLE vector constructors around. * tree-cfg.c (verify_gimple_assign_single): Verify that CONSTRUCTORs have gimple elements. Index: gcc/gimplify.c === --- gcc/gimplify.c (revision 215450) +++ gcc/gimplify.c (working copy) @@ -4021,12 +4021,6 @@ gimplify_init_constructor (tree *expr_p, break; } - /* Don't reduce an initializer constant even if we can't - make a VECTOR_CST. It won't do anything for us, and it'll - prevent us from representing it as a single constant. */ - if (initializer_constant_valid_p (ctor, type)) - break; - TREE_CONSTANT (ctor) = 0; } Index: gcc/tree-cfg.c === --- gcc/tree-cfg.c (revision 215450) +++ gcc/tree-cfg.c (working copy) @@ -4207,8 +4233,20 @@ verify_gimple_assign_single (gimple stmt debug_generic_stmt (rhs1); return true; } + if (!is_gimple_val (elt_v)) + { + error (vector CONSTRUCTOR element is not a GIMPLE value); + debug_generic_stmt (rhs1); + return true; + } } } + else if (CONSTRUCTOR_NELTS (rhs1) != 0) + { + error (non-vector CONSTRUCTOR with elements); + debug_generic_stmt (rhs1); + return true; + } return res; case OBJ_TYPE_REF: case ASSERT_EXPR:
Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result
Richard Biener wrote: Huh. Does that ever happen? Please use a NOP_EXPR instead of a VIEW_CONVERT_EXPR. Yes, the testcase is gcc.target/i386/pr51235.c which performs black magic*** with void *. (This testcase otherwise fails the verify_gimple_assign_unary check in tree-cfg.c .) However, test passes also with your suggestion of NOP_EXPR so that's good by me. ***that is, computes the minimum --Alan Ok with that change. Thanks, Richard. Testing: x86_86-none-linux-gnu: bootstrap, check-gcc, check-g++ aarch64-none-linux-gnu: bootstrap aarch64-none-elf: check-gcc, check-g++ arm-none-eabi: check-gcc aarch64_be-none-elf: check-gcc, showing FAIL-PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test FAIL-PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test Passes the (previously-failing) reduced testcase on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114 Have also assembler/stage-1 tested that testcase on PowerPC, also fixed. gcc/ChangeLog: * expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add extract_bit_field around optab result. * fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR, produce scalar not vector. * tree-cfg.c (verify_gimple_assign_unary): Check result vs operand type for REDUC_{MIN,MAX,PLUS}_EXPR. * tree-vect-loop.c (vect_analyze_loop): Update comment. (vect_create_epilog_for_reduction): For direct vector reduction, use result of tree code directly without extract_bit_field. * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update comment.
Re: [PATCH 3/14] Add new optabs for reducing vectors to scalars
Richard Biener wrote: scalar_reduc_to_vector misses a comment. Ok to reuse the comment in optabs.h in optabs.c also? I wonder if at the end we wouldn't transition all backends and then renaming reduc_*_scal_optab back to reduc_*_optab makes sense. Yes, that sounds like a plan, the _scal is a bit of a mouthful. The optabs have only one mode - I wouldn't be surprised if an ISA invents for example v4si - di reduction? So do we want to make reduc_plus_scal_optab a little bit more future proof (maybe there is already an ISA that supports this kind of reduction?). That sounds like a plausible thing for an ISA to do, indeed. However given these names are only used by the autovectorizer rather than directly, the question is what the corresponding source code looks like, and/or what changes to the autovectorizer we might have to make to (look for code to) exploit such an instruction. At this point I could go for a reduc_{plus,min_max}_scal_modemode which reduces from the first vector mode to the second scalar mode, and then make the vectorizer look only for cases where the second mode was the element type of the first; but I'm not sure I want to do anything more complicated than that at this stage. (However, indeed it would leave the possibility open for the future.) --Alan
Re: [PATCH 12/14][Vectorizer] Redefine VEC_RSHIFT_EXPR and vec_shr_optab as endianness-neutral
On Thu, 2014-09-18 at 09:12 -0400, David Edelsohn wrote: On Thu, Sep 18, 2014 at 8:42 AM, Alan Lawrence alan.lawre...@arm.com wrote: The direction of VEC_RSHIFT_EXPR has been endian-dependent, contrary to the general principles of tree. This patch updates fold-const and the vectorizer (the only place where such expressions are created), such that VEC_RSHIFT_EXPR always shifts towards element 0. The tree code still maps directly onto the vec_shr_optab, and so this patch *will break any bigendian platform defining the vec_shr optab*. -- For AArch64_be, patch follows next in series; -- For PowerPC, I think patch/rfc 15 should fix, please inspect; -- For MIPS, I think patch/rfc 16 should fix, please inspect. gcc/ChangeLog: * fold-const.c (const_binop): VEC_RSHIFT_EXPR always shifts towards element 0. * tree-vect-loop.c (vect_create_epilog_for_reduction): always extract the result of a reduction with vector shifts from element 0. * tree.def (VEC_RSHIFT_EXPR, VEC_LSHIFT_EXPR): Comment shift direction. * doc/md.texi (vec_shr_m, vec_shl_m): Document shift direction. Testing Done: Bootstrap and check-gcc on x86_64-none-linux-gnu; check-gcc on aarch64-none-elf. Why wasn't this tested on the PowerLinux system in the GCC Compile Farm? Also, Bill Schmidt can help check the PPC parts fo the patches. Sorry for the late response; I just returned from vacation. I think that patch 15 looks reasonable on the surface, but would be more comfortable if it had been tested. I would echo David's suggestion that you please test this on gcc110 in the compile farm to avoid surprises. Given the similarity between vec_shl_mode and vec_shr_mode I am ok with removing the former; it won't be difficult to re-create it later if needed. Please add some of the language you used above about VEC_RSHIFT_EXPR as commentary for vec_shr_mode in vector.md, as right-shifting towards element zero is not an obvious concept on a BE machine. Thanks, Bill Thanks, David
[AArch64] Auto-generate the BUILTIN_ macros for aarch64-builtins.c
On Thu, Sep 18, 2014 at 11:12:15AM +0100, Richard Earnshaw wrote: On 18/09/14 10:53, James Greenhalgh wrote: +$(srcdir)/config/aarch64/aarch64-builtin-iterators.h: $(srcdir)/config/aarch64/geniterators.sh \ + $(srcdir)/config/aarch64/iterators.md + $(SHELL) $(srcdir)/config/aarch64/geniterators.sh \ + $(srcdir)/config/aarch64/iterators.md \ + $(srcdir)/config/aarch64/aarch64-builtin-iterators.h + aarch-common.o: $(srcdir)/config/arm/aarch-common.c $(CONFIG_H) $(SYSTEM_H) \ coretypes.h $(TM_H) $(TM_P_H) $(RTL_H) $(TREE_H) output.h $(C_COMMON_H) $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \ Is there any real need to write this into the source directory and have the built file checked in? Ie. can't we always write to the build directory and use it from there. That avoids problems if the sources are on a read-only filesystem. If we do need to leave it in the sources, then contrib/update_gcc should be taught how to touch the generated file when resyncing from the repositories. I thought I had tried this and failed to make it work. I must not have been trying hard enough at the time. Updated as attached, generating the header in the build directory. It looks much better this way! Bootstrapped on aarch64-none-linux-gnueabi with no issues. Ok? Thanks, James --- gcc/ 2014-09-22 James Greenhalgh james.greenha...@arm.com * config/aarch64/geniterators.sh: New. * config/aarch64/iterators.md (VDQF_DF): New. * config/aarch64/t-aarch64: Generate aarch64-builtin-iterators.h. * config/aarch64/aarch64-builtins.c (BUILTIN_*) Remove. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 6a77d29..6b9c383 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -313,91 +313,7 @@ typedef struct VAR11 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K) \ VAR1 (T, N, MAP, L) -/* BUILTIN_ITERATOR macros should expand to cover the same range of - modes as is given for each define_mode_iterator in - config/aarch64/iterators.md. */ - -#define BUILTIN_DX(T, N, MAP) \ - VAR2 (T, N, MAP, di, df) -#define BUILTIN_GPF(T, N, MAP) \ - VAR2 (T, N, MAP, sf, df) -#define BUILTIN_SDQ_I(T, N, MAP) \ - VAR4 (T, N, MAP, qi, hi, si, di) -#define BUILTIN_SD_HSI(T, N, MAP) \ - VAR2 (T, N, MAP, hi, si) -#define BUILTIN_V2F(T, N, MAP) \ - VAR2 (T, N, MAP, v2sf, v2df) -#define BUILTIN_VALL(T, N, MAP) \ - VAR10 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, \ - v4si, v2di, v2sf, v4sf, v2df) -#define BUILTIN_VALLDI(T, N, MAP) \ - VAR11 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, \ - v4si, v2di, v2sf, v4sf, v2df, di) -#define BUILTIN_VALLDIF(T, N, MAP) \ - VAR12 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, \ - v4si, v2di, v2sf, v4sf, v2df, di, df) -#define BUILTIN_VB(T, N, MAP) \ - VAR2 (T, N, MAP, v8qi, v16qi) -#define BUILTIN_VD1(T, N, MAP) \ - VAR5 (T, N, MAP, v8qi, v4hi, v2si, v2sf, v1df) -#define BUILTIN_VDC(T, N, MAP) \ - VAR6 (T, N, MAP, v8qi, v4hi, v2si, v2sf, di, df) -#define BUILTIN_VDIC(T, N, MAP) \ - VAR3 (T, N, MAP, v8qi, v4hi, v2si) -#define BUILTIN_VDN(T, N, MAP) \ - VAR3 (T, N, MAP, v4hi, v2si, di) -#define BUILTIN_VDQ(T, N, MAP) \ - VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di) -#define BUILTIN_VDQF(T, N, MAP) \ - VAR3 (T, N, MAP, v2sf, v4sf, v2df) -#define BUILTIN_VDQF_DF(T, N, MAP) \ - VAR4 (T, N, MAP, v2sf, v4sf, v2df, df) -#define BUILTIN_VDQH(T, N, MAP) \ - VAR2 (T, N, MAP, v4hi, v8hi) -#define BUILTIN_VDQHS(T, N, MAP) \ - VAR4 (T, N, MAP, v4hi, v8hi, v2si, v4si) -#define BUILTIN_VDQIF(T, N, MAP) \ - VAR9 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2sf, v4sf, v2df) -#define BUILTIN_VDQM(T, N, MAP) \ - VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si) -#define BUILTIN_VDQV(T, N, MAP) \ - VAR5 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v4si) -#define BUILTIN_VDQQH(T, N, MAP) \ - VAR4 (T, N, MAP, v8qi, v16qi, v4hi, v8hi) -#define BUILTIN_VDQ_BHSI(T, N, MAP) \ - VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si) -#define BUILTIN_VDQ_I(T, N, MAP) \ - VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di) -#define BUILTIN_VDW(T, N, MAP) \ - VAR3 (T, N, MAP, v8qi, v4hi, v2si) -#define BUILTIN_VD_BHSI(T, N, MAP) \ - VAR3 (T, N, MAP, v8qi, v4hi, v2si) -#define BUILTIN_VD_HSI(T, N, MAP) \ - VAR2 (T, N, MAP, v4hi, v2si) -#define BUILTIN_VQ(T, N, MAP) \ - VAR6 (T, N, MAP, v16qi, v8hi, v4si, v2di, v4sf, v2df) -#define BUILTIN_VQN(T, N, MAP) \ - VAR3 (T, N, MAP, v8hi, v4si, v2di) -#define BUILTIN_VQW(T, N, MAP) \ - VAR3 (T, N, MAP, v16qi, v8hi, v4si) -#define BUILTIN_VQ_HSI(T, N, MAP) \ - VAR2 (T, N, MAP, v8hi, v4si) -#define BUILTIN_VQ_S(T, N, MAP) \ - VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si) -#define BUILTIN_VSDQ_HSI(T, N, MAP) \ - VAR6 (T, N, MAP, v4hi, v8hi, v2si, v4si, hi, si) -#define BUILTIN_VSDQ_I(T, N, MAP) \ - VAR11 (T, N, MAP, v8qi,
[patch] Implement move semantics for iostreams
This adds move and swap functions to the iostream classes. Although this is a pretty large patch, it's a pure addition that only affects C++11 mode, and should have no effect on existing code because it won't be moving or swapping streams. I wanted to use C++14's std::exchange so I added std::__exchange to bits/move.h and made std::exchange forward to that. I needed to add a new constructor to basic_ostream that doesn't call init(0), for basic_iostream's move constructor to use. (I wonder why our non-standard default constructors for basic_istream and basic_ostream call init(nullptr), rather than doing nothing. Derived classes that want init(nullptr) to be called can do that by passing nullptr to the standard basic_istream and basic_ostream constructors taking a pointer, which would allow the default constructors to be re-purposed to intentionally leave the object uninitialized). To ensure that the explicit instantiations in the library include the new functions I had to move several files from src/c++98 to src/c++11, which makes the patch huge, so the new tests are in a separate, gzipped file to keep this post below the mailing list size limits. Tested x86_64-linux, committed to trunk. commit ac4314b3a2451147601385e26f19bb95b84c69d0 Author: Jonathan Wakely jwak...@redhat.com Date: Fri Sep 12 19:56:06 2014 +0100 Make streams movable and swappable. PR libstdc++/54316 PR libstdc++/53626 * config/abi/pre/gnu.ver: Add new exports. * config/io/basic_file_stdio.h (__basic_file): Support moving and swapping. * include/bits/basic_ios.h (basic_ios::move, basic_ios::swap): Likewise. * include/bits/ios_base.h (ios_base::_M_move, ios_base::_M_swap): Likewise. * include/bits/fstream.tcc (basic_filebuf): Likewise. * include/bits/move.h (__exchange): Define for C++11 mode. * include/ext/stdio_filebuf.h (stdio_filebuf): Support moving and swapping. * include/ext/stdio_sync_filebuf.h (stdio_sync_filebuf): Likewise. * include/std/fstream (basic_filebuf, basic_ifstream, basic_ofstream, basic_fstream): Likewise. * include/std/ios: Remove whitespace. * include/std/istream (basic_istream, basic_iostream): Support moving and swapping. * include/std/ostream (basic_ostream): Likewise. * include/std/sstream (basic_stringbuf, basic_istringstream, basic_ostringstream, basic_stringstream): Likewise. * include/std/streambuf (basic_streambuf): Do not default copy constructor and assignment on first declaration. * include/std/utility (exchange): Forward to __exchange. * testsuite/27_io/basic_filebuf/cons/char/copy_neg.cc: New. * src/c++11/Makefile.am: Add stream-related files. * src/c++11/Makefile.in: Regenerate. * src/c++11/ext11-inst.cc (stdio_filebuf, stdio_sync_filebuf): New file for explicit instantiation definitions. * src/c++11/ios.cc: Move from src/c++98 to here. (ios_base::_M_move, ios_base::_M_swap): Define. * src/c++11/ios-inst.cc: Move from src/c++98 to here. * src/c++11/iostream-inst.cc: Likewise. * src/c++11/istream-inst.cc: Likewise. * src/c++11/ostream-inst.cc: Likewise. * src/c++11/sstream-inst.cc: Likewise. * src/c++11/streambuf-inst.cc: Likewise. * src/c++98/Makefile.am: Remove stream-related files. * src/c++98/Makefile.in: Regenerate. * src/c++98/ext-inst.cc (stdio_filebuf): Remove explicit instantiations. * src/c++98/misc-inst.cc (stdio_sync_filebuf): Likewise. * src/c++98/ios-inst.cc: Move to src/c++11/. * src/c++98/ios.cc: Move to src/c++11/. * src/c++98/iostream-inst.cc: Likewise. * src/c++98/istream-inst.cc: Likewise. * src/c++98/ostream-inst.cc: Likewise. * src/c++98/sstream-inst.cc: Likewise. * src/c++98/streambuf-inst.cc: Likewise. * testsuite/27_io/basic_filebuf/cons/char/copy_neg.cc: New. * testsuite/27_io/basic_fstream/cons/move.cc: New. * testsuite/27_io/basic_fstream/assign/1.cc: New. * testsuite/27_io/basic_ifstream/cons/move.cc: New. * testsuite/27_io/basic_ifstream/assign/1.cc: New. * testsuite/27_io/basic_istringstream/assign/1.cc: New. * testsuite/27_io/basic_istringstream/cons/move.cc: New. * testsuite/27_io/basic_ofstream/cons/move.cc: New. * testsuite/27_io/basic_ofstream/assign/1.cc: New. * testsuite/27_io/basic_ostringstream/assign/1.cc: New. * testsuite/27_io/basic_ostringstream/cons/move.cc: New. * testsuite/27_io/basic_stringstream/assign/1.cc: New. * testsuite/27_io/basic_stringstream/cons/move.cc: New. diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index 41fac71..669e36d 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -989,7 +989,8 @@ GLIBCXX_3.4.10 { _ZNSt15basic_streambufI[cw]St11char_traitsI[cw]EE6stosscEv;
Re: [PATCH 3/14] Add new optabs for reducing vectors to scalars
On Mon, Sep 22, 2014 at 3:26 PM, Alan Lawrence alan.lawre...@arm.com wrote: Richard Biener wrote: scalar_reduc_to_vector misses a comment. Ok to reuse the comment in optabs.h in optabs.c also? Sure. I wonder if at the end we wouldn't transition all backends and then renaming reduc_*_scal_optab back to reduc_*_optab makes sense. Yes, that sounds like a plan, the _scal is a bit of a mouthful. The optabs have only one mode - I wouldn't be surprised if an ISA invents for example v4si - di reduction? So do we want to make reduc_plus_scal_optab a little bit more future proof (maybe there is already an ISA that supports this kind of reduction?). That sounds like a plausible thing for an ISA to do, indeed. However given these names are only used by the autovectorizer rather than directly, the question is what the corresponding source code looks like, and/or what changes to the autovectorizer we might have to make to (look for code to) exploit such an instruction. Ah, indeed. Would be sth like a REDUC_WIDEN_SUM_EXPR or so. At this point I could go for a reduc_{plus,min_max}_scal_modemode which reduces from the first vector mode to the second scalar mode, and then make the vectorizer look only for cases where the second mode was the element type of the first; but I'm not sure I want to do anything more complicated than that at this stage. (However, indeed it would leave the possibility open for the future.) Yeah, agreed. For the min/max case a widen variant isn't useful anyway. Thanks, Richard. --Alan
Re: FW: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C
Hi! On Tue, 27 Aug 2013 21:30:49 +, Iyer, Balaji V balaji.v.i...@intel.com wrote: --- /dev/null +++ gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c @@ -0,0 +1,37 @@ +/* { dg-do run { target { i?86-*-* x86_64-*-* arm*-*-* } } } */ +/* { dg-options -fcilkplus } */ +/* { dg-options -lcilkrts { target { i?86-*-* x86_64-*-* arm*-*-* } } } */ + +void f0(volatile int *steal_flag) +{ + int i = 0; + /* Wait for steal_flag to be set */ + while (!*steal_flag) +; +} + +int f1() +{ + + volatile int steal_flag = 0; + _Cilk_spawn f0(steal_flag); + steal_flag = 1; // Indicate stolen + _Cilk_sync; + return 0; +} + +void f2(int q) +{ + q = 5; +} + +void f3() +{ + _Cilk_spawn f2(f1()); +} + +int main() +{ + f3(); + return 0; +} Is this really well-formed Cilk Plus code? Running with CILK_NWORKERS=1, or -- the equivalent -- in a system with just one CPU (as per libcilkrts/runtime/os-unix.c:__cilkrts_hardware_cpu_count returning 1), I see this test busy-loop as follows: Breakpoint 1, __cilkrts_hardware_cpu_count () at ../../../source/libcilkrts/runtime/os-unix.c:358 358 { (gdb) return 1 Make __cilkrts_hardware_cpu_count return now? (y or n) y #0 cilkg_get_user_settable_values () at ../../../source/libcilkrts/runtime/global_state.cpp:385 385 CILK_ASSERT(hardware_cpu_count 0); (gdb) c Continuing. ^C Program received signal SIGINT, Interrupt. f0 (steal_flag=steal_flag@entry=0x7fffd03c) at [...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:9 9 while (!*steal_flag) (gdb) info threads Id Target Id Frame * 1Thread 0x77fcd780 (LWP 30816) spawning_arg.ex f0 (steal_flag=steal_flag@entry=0x7fffd03c) at [...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:9 (gdb) list 4 5 void f0(volatile int *steal_flag) 6 { 7 int i = 0; 8 /* Wait for steal_flag to be set */ 9 while (!*steal_flag) 10 ; 11 } 12 13 int f1() (gdb) bt #0 f0 (steal_flag=steal_flag@entry=0x7fffd03c) at [...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:9 #1 0x004009c8 in _cilk_spn_0 () at [...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:17 #2 0x00400a4b in f1 () at [...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:17 #3 0x00400d0e in _cilk_spn_1 () at [...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:30 #4 0x00400d7a in f3 () at [...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:30 #5 0x00400e33 in main () at [...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:35 No additional thread has been spawned by libcilkrts, and the one initial thread is stuck in f0, without being able to make progress. Should in f0's while loop, some function be called to yield to libcilkrts scheduler, or should libcilkrts have spawned an additional thread, or is the test case just not valid Cilk Plus code? Grüße, Thomas pgpG_DU89ebEh.pgp Description: PGP signature
Re: [debug-early] Allow checking of DECL_ABSTRACT in decl_ultimate_origin
Hi, On Fri, 19 Sep 2014, Aldy Hernandez wrote: Michael, I really don't understand the need for this change in your original patch. I don't know if this was a temporary testing change or what. I'm pretty sure it was temporary testing, when I still was finding my way through dwarf2out limitations/constraints. I'm happy to report that with this and the last set of patches, both C and C++ guality tests have = regressions than mainline. Yay. Super. Ciao, Michael.
[PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition
Hi, I find that update_equiv_regs in ira.c sets the wrong EQUIV reg-note for pseudo with more than one definiton in certain situation. Here is a simplified RTL snippet after this function finishs handling: (insn 77 37 78 2 (set (reg:SI 171) (const_int 0 [0])) ticket151.c:33 52 {movsi_internal_dsp} (expr_list:REG_EQUAL (const_int 0 [0]) (nil))) .. (insn 79 50 53 2 (set (mem/c:SI (reg/f:SI 136) [2 g_728+0 S4 A64]) (reg:SI 171)) ticket151.c:33 52 {movsi_internal_dsp} (expr_list:REG_DEAD (reg:SI 171) (nil))) (insn 53 79 54 2 (set (mem/c:SI (reg/f:SI 162) [4 g_163+0 S4 A32]) (reg:SI 163)) 52 {movsi_internal_dsp} (expr_list:REG_DEAD (reg:SI 163) (expr_list:REG_DEAD (reg/f:SI 162) (nil (insn 54 53 14 2 (set (reg:SI 171) (mem/u/c:SI (symbol_ref/u:SI (*.LC8) [flags 0x2]) [4 S4 A32])) ticket151.c:49 52 {movsi_internal_dsp} (expr_list:REG_EQUIV (mem/u/c:SI (symbol_ref/u:SI (*.LC8) [flags 0x2]) [4 S4 A32]) (expr_list:REG_EQUAL (mem/u/c:SI (symbol_ref/u:SI (*.LC8) [flags 0x2]) [4 S4 A32]) (nil The REG_EQUIV of insn 54 is not correct as pseudo 171 is defined in insn 77 with a differerent value. This may causes reload replacing pseudo 171 with mem/u/c:SI (symbol_ref/u:SI (*.LC8), which is wrong. A proposed patch for this issue, please comment: Index: gcc/ira.c === --- gcc/ira.c(revision 215460) +++ gcc/ira.c(working copy) @@ -3477,18 +3477,26 @@ update_equiv_regs (void) no_equiv (dest, set, NULL); continue; } + /* Record this insn as initializing this register. */ reg_equiv[regno].init_insns = gen_rtx_INSN_LIST (VOIDmode, insn, reg_equiv[regno].init_insns); /* If this register is known to be equal to a constant, record that it is always equivalent to the constant. */ - if (DF_REG_DEF_COUNT (regno) == 1 - note ! rtx_varies_p (XEXP (note, 0), 0)) + if (note ! rtx_varies_p (XEXP (note, 0), 0)) { - rtx note_value = XEXP (note, 0); - remove_note (insn, note); - set_unique_reg_note (insn, REG_EQUIV, note_value); + if (DF_REG_DEF_COUNT (regno) == 1) +{ + rtx note_value = XEXP (note, 0); + remove_note (insn, note); + set_unique_reg_note (insn, REG_EQUIV, note_value); +} + else +{ + no_equiv (dest, set, NULL); + continue; +} } /* If this insn introduces a constant register, decrease the priority Cheers, Felix
Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'
On 09/21/14 21:10, Chen Gang wrote: On 9/22/14 2:09, Michael Eager wrote: Generally, you should use gcc to link programs, not ld. gcc is a driver which will select the appropriate libraries and support routines (such as crt0.o, which contains _start) and pass them to the linker. OK, thanks. When gcc, it misses the root directory for crt1.o and crtn.o: e.g. /lib/ld.so.1, crt1.o, crtn.o when gcc -v, but we need /upstream/ release/lib/ld.so.1, /upstream/lib/crt1.o, /upstream/libcrtn.o. You likely need to build mb-gcc with --sysroot=/upstream. How are you building gcc? What are your configuration options? -- Michael Eagerea...@eagercon.com 1960 Park Blvd., Palo Alto, CA 94306 650-325-8077
Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'
On 09/21/14 20:55, Chen Gang wrote: On 9/22/14 2:03, Michael Eager wrote: On 09/20/14 23:24, Chen Gang wrote: And it seems, we also need 'LinkScr.ld' for ldscript, could you share it to me, thanks. set_board_info ldscript -T/home/eager/Xilinx/dg/microblaze_0/LinkScr.ld Hi Chen -- The DejaGNU configuration I provided is for a bare-metal environment. If you are testing in a Linux environment, the tool chain you uses should provide a default linker script which matches your hardware's memory layout. You should not need to provide a separate linker script. OK, thanks, I shall try to find the default linker script for it. If you are running mb-gcc which generates executables which run on the target, you do not need to provide a linker script. -- Michael Eagerea...@eagercon.com 1960 Park Blvd., Palo Alto, CA 94306 650-325-8077
Re: Move dwarf2 frame tables to read-only section for AIX
On Mon, 22 Sep 2014, Andrew Dixie wrote: I altered the dwarf2 frame and exception table generation so the decision on whether to use a read-only or read-write section is an independent decision from how the frame tables are registered. I renamed EH_FRAME_IN_DATA_SECTION to EH_FRAME_THROUGH_COLLECT2, as it now supports read-only, has slightly changed semantics, and I think this name better reflects what it currently does rather than what it historically did. If you rename a target macro, the old target macro name needs to be poisoned in system.h. 2014-09-22 Andrew Dixie andr...@gentrack.com Move exception tables to read-only memory on AIX. * dwarf2asm.c (dw2_asm_output_encoded_addr_rtx): Add call to ASM_OUTPUT_DWARF_DATAREL. * dwarf2out.c (switch_to_eh_frame_section): Use a read-only section even if EH_FRAME_SECTION_NAME is undefined. Add call to EH_FRAME_THROUGH_COLLECT2. * except.c (switch_to_exception_section): Use a read-only section even if EH_FRAME_SECTION_NAME is undefined. * collect2.c (write_c_file_stat): Provide dbase on AIX. (scan_prog_file): Don't output __dso_handle nor __gcc_unwind_dbase. * config/rs6000/aix.h (ASM_PREFERRED_EH_DATA_FORMAT): define. (EH_TABLES_CAN_BE_READ_ONLY): define. (ASM_OUTPUT_DWARF_PCREL): define. (ASM_OUTPUT_DWARF_DATAREL): define. (EH_FRAME_IN_DATA_SECTION): undefine. (EH_FRAME_THROUGH_COLLECT2): define. * config/rs6000/rs6000-aix.c: new file. (rs6000_aix_asm_output_dwarf_pcrel): new function. (rs6000_aix_asm_output_dwarf_datarel): new function. * config/rs6000/rs6000.c (rs6000_xcoff_asm_init_sections): remove assignment of exception_section. This ChangeLog entry seems very incomplete. It doesn't mention the changes for other architectures, or to defaults.h, or to the documentation, for example. -- Joseph S. Myers jos...@codesourcery.com
[patch] Fix std::try_lock behaviour
When I fixed std::try_lock a few years ago I misread the spec, exceptions should not be caught and turned into a return value. Tested x86_64-linux, committed to trunk. commit 5effca670aa009c60e31b639604da4d00f388038 Author: Jonathan Wakely jwak...@redhat.com Date: Thu Sep 18 16:15:54 2014 +0100 * include/std/mutex (try_lock): Do not swallow exceptions. * testsuite/30_threads/try_lock/4.cc: Fix test. diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex index f6b851c..d80fa5a 100644 --- a/libstdc++-v3/include/std/mutex +++ b/libstdc++-v3/include/std/mutex @@ -630,12 +630,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { int __idx; auto __locks = std::tie(__l1, __l2, __l3...); - __try - { __try_lock_impl0::__do_try_lock(__locks, __idx); } - __catch(const __cxxabiv1::__forced_unwind) - { __throw_exception_again; } - __catch(...) - { } + __try_lock_impl0::__do_try_lock(__locks, __idx); return __idx; } diff --git a/libstdc++-v3/testsuite/30_threads/try_lock/4.cc b/libstdc++-v3/testsuite/30_threads/try_lock/4.cc index 7741798..1212b65 100644 --- a/libstdc++-v3/testsuite/30_threads/try_lock/4.cc +++ b/libstdc++-v3/testsuite/30_threads/try_lock/4.cc @@ -133,8 +133,15 @@ void test03() while (unreliable_lock::throw_on 3) { unreliable_lock::count = 0; -int failed = std::try_lock(l1, l2, l3); -VERIFY( failed == unreliable_lock::throw_on ); +try + { +std::try_lock(l1, l2, l3); +VERIFY( false ); + } +catch (int e) + { +VERIFY( e == unreliable_lock::throw_on ); + } ++unreliable_lock::throw_on; } }
Re: [GOOGLE] Fix LIPO COMDAT fixup and gcov-tool interactions
On Mon, Sep 22, 2014 at 1:36 AM, Nathan Sidwell nat...@acm.org wrote: On 09/21/14 18:58, Xinliang David Li wrote: the intent is that that points to the gcov_info object of the object file containing the live version of the function. I couldn't quite get this to work though -- it involves emitting a function's gcov_fn_info decl in the same comdat group as the function itself. Another problem is that comdat functions may have different CFGs due to different early inline decisions. Comdatting gcov counters can lead to problems in profile use. Not comdatting profile counters have another advantage -- it allows context sensitive profiling for comdat function inline instances (IPA-inline). IIRC early inlining is done before the counters are created. You're right later inlining may be a problem, and require a non-comdat set of cloned counters. I can't recall exactly at what stage the counters are now inserted relative to inlining. The CFG machinery had a number of significant changes while, and shortly after, I was working on this. You'll see the checking of gfi_ptr-key != gi_ptr in libgcov-driver.c. Are you making use of this machinery, or inventing new machinery? Teresa's method is a different machinery -- it tries to propagate profile data from the selected comdat copy + inline instance copies to comdat copies with zero counts. It'd be preferrable to complete the mechanism I outline above, rather than have a competing mechanism. I don't think the above mechanism helps the problem my patches are trying to solve. Unless we are in whole-program mode, which we don't use, the only profiles available at profile-use time are those for the given module (and any other modules in the same module group in LIPO mode). If the COMDAT copy selected by the linker in the profile-gen binary is in a different module, we would see all-zero counts when compiling modules containing the other copies. I had submitted some patches to trunk awhile back in the 4.9 time frame to help deal with this by using estimated frequencies for zero-count COMDAT copies, and applying scaled counts when we inline them, but it is an imperfect solution. The approach we now take for LIPO builds is to propagate the counts for the profiled copy of the COMDAT to other modules. (Additionally the indirect call profiling we perform in LIPO mode would point to a module that we didn't have access to, which is a related issue that the COMDAT fixups we perform at the end of the LIPO profiling run are trying to solve.) Also, this patch is in effect lying because the data then makes it look like the unselected comdat instances are in fact being executed -- looking at the whole program it's going to be harder to understand whether the different inline instances are being executed multiple times, or are duplicate data. Does the gcov user output indicate this subtlety in some way? Correct in that it makes it look like these copies were executed. This was causing some issues when we rewrote/merged profiles with gcov-tool, which essentially operates in whole-program mode. To handle this, this patch marks the modified (previously all-zero) copies in the gcda file. So now gcov-tool can handle them appropriately (clear them on read before doing any analysis), and gcov-dump will flag them. My patch does not do anything special for these routines when they are read into the profile-use build, because we do want the propagated counts during optimization. Possibly in whole-program mode they should be cleared on read just as in gcov-tool, or they could be flagged in some way for downstream phases, but it is not a compilation mode we are using so I have not experimented. Thanks, Teresa nathan -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] gcc parallel make check
On 09/12/2014 08:04 PM, Jakub Jelinek wrote: I've been worried about the quick cases where parallelization is not beneficial, like make check-gcc \ RUNTESTFLAGS=dg.exp=pr60123.c or similar, but one doesn't usually pass -jN in that case. I have -jN in my $MAKEFLAGS, so I've been running into this with my rgt shell function: rgt () { ( cd ~/m/$CANON/gcc/gcc; make check-c++ ${1:+RUNTESTFLAGS=$*} ) } If I say 'rgt dg.exp=var-templ1.C' the actual test results are lost in the explosion of shell verbosity. Could we add some '@'s to more of the rules, perhaps? Jason
[patch] Update C++11 library status docs
This documents some more C++11 features as incomplete and notes that iostreams are movable now. Committed to trunk. commit 8de61687f31c20aec35c02393c4fbb44ddcc0cbb Author: Jonathan Wakely jwak...@redhat.com Date: Mon Sep 22 15:57:28 2014 +0100 * doc/xml/manual/status_cxx2011.xml: Update C++11 status. * doc/xml/manual/status_cxx2014.xml: Update TS status. * doc/html/manual/status.html: Regenerate. diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml index f0a256d..4433c89 100644 --- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml +++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml @@ -600,16 +600,18 @@ particular release. entry/ /row row + ?dbhtml bgcolor=#B0B0B0 ? entry20.6.12.3/entry entrycodeuninitialized_fill/code/entry - entryY/entry - entry/ + entryPartial/entry + entryReturns codevoid/code../entry /row row + ?dbhtml bgcolor=#B0B0B0 ? entry20.6.12.4/entry entrycodeuninitialized_fill_n/code/entry - entryY/entry - entry/ + entryPartial/entry + entryReturns codevoid/code../entry /row row entry20.6.13/entry @@ -1183,10 +1185,11 @@ particular release. entry/ /row row + ?dbhtml bgcolor=#B0B0B0 ? entry22.3.3.1/entry entryCharacter classification/entry - entryY/entry - entry/ + entryPartial/entry + entryMissing codeisblank/code./entry /row row entry22.3.3.2/entry @@ -1639,10 +1642,11 @@ particular release. entry/ /row row + ?dbhtml bgcolor=#B0B0B0 ? entry25.3/entry entryMutating sequence operations/entry - entryY/entry - entry/ + entryPartial/entry + entrycoderotate/code returns codevoid/code./entry /row row entry25.4/entry @@ -2049,10 +2053,13 @@ particular release. entry/ /row row + ?dbhtml bgcolor=#B0B0B0 ? entry26.8/entry entryC Library/entry - entryY/entry - entry/ + entryPartial/entry + entrycodelt;ctgmathgt;/code doesn't include +codelt;ccomplexgt;/code + /entry /row row entry @@ -2129,7 +2136,6 @@ particular release. entryIostreams base classes/entry entryPartial/entry entry -Missing move and swap operations on codebasic_ios/code. Missing codeio_errc/code and codeiostream_category/code. codeios_base::failure/code is not derived from codesystem_error/code. Missing codeios_base::hexfloat/code. @@ -2147,23 +2153,20 @@ particular release. entryFormatting and manipulators/entry entryPartial/entry entry -Missing move and swap operations Missing codeget_time/code and codeput_time/code manipulators. /entry /row row - ?dbhtml bgcolor=#B0B0B0 ? entry27.8/entry entryString-based streams/entry - entryPartial/entry - entryMissing move and swap operations/entry + entryY/entry + entry/ /row row - ?dbhtml bgcolor=#B0B0B0 ? entry27.9/entry entryFile-based streams/entry - entryPartial/entry - entryMissing move and swap operations/entry + entryY/entry + entry/ /row row entry diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml index 82abd88..11254d6 100644 --- a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml +++ b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml @@ -402,18 +402,6 @@ not in any particular release. row ?dbhtml bgcolor=#C8B0B0 ? entry - link xmlns:xlink=http://www.w3.org/1999/xlink; xlink:href=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3923.pdf; - N3923 - /link - /entry - entryA SFINAE-Friendly std::iterator_traits/entry - entryN/entry - entryLibrary Fundamentals TS/entry -/row - -row - ?dbhtml bgcolor=#C8B0B0 ? - entry link xmlns:xlink=http://www.w3.org/1999/xlink; xlink:href=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3925.pdf; N3925 /link
Re: [PATCH] gcc parallel make check
On Mon, Sep 22, 2014 at 11:21:14AM -0400, Jason Merrill wrote: On 09/12/2014 08:04 PM, Jakub Jelinek wrote: I've been worried about the quick cases where parallelization is not beneficial, like make check-gcc \ RUNTESTFLAGS=dg.exp=pr60123.c or similar, but one doesn't usually pass -jN in that case. I have -jN in my $MAKEFLAGS, so I've been running into this with my rgt shell function: rgt () { ( cd ~/m/$CANON/gcc/gcc; make check-c++ ${1:+RUNTESTFLAGS=$*} ) } If I say 'rgt dg.exp=var-templ1.C' the actual test results are lost in the explosion of shell verbosity. Could we add some '@'s to more of the rules, perhaps? I've been considering that too, but not sure what info people find valuable and what they don't. Jakub
Re: [PATCH, i386, Pointer Bounds Checker 32/x] Pointer Bounds Checker hooks for i386 target
On 19 Sep 18:21, Uros Bizjak wrote: On Fri, Sep 19, 2014 at 2:53 PM, Ilya Enkovich enkovich@gmail.com wrote: This patch adds i386 target hooks for Pointer Bounds Checker. New version with fixes and better documentation for ix86_load_bounds and ix86_store_bounds is below. +/* Expand pass uses this hook to load bounds for function parameter + PTR passed in SLOT in case its bounds are not passed in a register. + + If SLOT is a memory, then bounds are loaded as for regular pointer + loaded from memory. PTR may be NULL in case SLOT is a memory. + In such case value of PTR (if required) may be loaded from SLOT. + + If SLOT is NULL or a register then SLOT_NO is an integer constant + holding number of the target dependent special slot which should be + used to obtain bounds. + + Return loaded bounds. */ OK, I hope I understand this target-handling of SLOT_NO. Can you please clarify when SLOT is a register? For functions with more than four pointers passed in registers we do not have enough bound registers to pass bounds. These hooks are called then with SLOT set to register used to pass pointer I propose to write this function in the following (hopefully equivalent) way: Since addr computation is very similar for both loading and storing (the only difference is usage of either arg_pointer_rtx or stack_pointer_rtx) I decided additionally move it into a separate function. This should make functions simplier for understanding. --cut here-- { if (!slot) { gcc_assert (CONST_INT_P (slot_no)); addr = plus_constant (Pmode, arg_pointer_rtx, -(INTVAL (slot_no) + 1) * GET_MODE_SIZE (Pmode)); gcc_assert (ptr); } else if (REG_P (slot)) { gcc_assert (CONST_INT_P (slot_no)); addr = plus_constant (Pmode, arg_pointer_rtx, -(INTVAL (slot_no) + 1) * GET_MODE_SIZE (Pmode)); ptr = slot; } else if (MEM_P (slot)) { addr = XEXP (slot, 0); if (!register_operand (addr, Pmode)) addr = copy_addr_to_reg (addr); if (!ptr) ptr = copy_addr_to_reg (slot); } else gcc_unreachable (); if (!index_register_operand (ptr, Pmode)) ptr = copy_addr_to_reg (ptr); ... } --cut here-- Please add a comment in each of if/else, explaining what the code is doing. This is non-trivial to understand. + +static rtx +ix86_load_bounds (rtx slot, rtx ptr, rtx slot_no) +{ + rtx addr = NULL; + rtx reg; + + if (!ptr) +{ + gcc_assert (MEM_P (slot)); + ptr = copy_addr_to_reg (slot); +} + + if (!slot || REG_P (slot)) +{ + if (slot) + ptr = slot; + + gcc_assert (CONST_INT_P (slot_no)); + + /* Here we have the case when more than four pointers are +passed in registers. In this case we are out of bound +registers and have to use bndldx to load bound. RA, +RA - 8, etc. are used for address translation in bndldx. */ + addr = plus_constant (Pmode, arg_pointer_rtx, + -(INTVAL (slot_no) + 1) * GET_MODE_SIZE (Pmode)); +} + else if (MEM_P (slot)) +{ + addr = XEXP (slot, 0); + if (!register_operand (addr, Pmode)) + addr = copy_addr_to_reg (addr); +} + else +gcc_unreachable (); + + if (!register_operand (ptr, Pmode)) +ptr = copy_addr_to_reg (ptr); + + reg = gen_reg_rtx (BNDmode); + emit_insn (BNDmode == BND64mode +? gen_bnd64_ldx (reg, addr, ptr) +: gen_bnd32_ldx (reg, addr, ptr)); + + return reg; +} + +/* Expand pass uses this hook to store BOUNDS for call argument PTR + passed in SLOT in case BOUNDS are not passed in a register. + + If SLOT is a memory, then BOUNDS are stored as for regular pointer + stored in memory. PTR may be NULL in case SLOT is a memory. + In such case value of PTR (if required) may be loaded from SLOT. + + If SLOT is NULL or a register then SLOT_NO is an integer constant + holding number of the target dependent special slot which should be + used to store BOUNDS. */ + +static void +ix86_store_bounds (rtx ptr, rtx slot, rtx bounds, rtx slot_no) This function can be written in exactly the same way as the above proposed code, up to the check for ptr register_operand. +{ + rtx addr; + + if (ptr) +{ + if (!register_operand (ptr, Pmode)) + ptr = copy_addr_to_reg (ptr); +} + else +{ + gcc_assert (MEM_P (slot)); + ptr = copy_addr_to_reg (slot); +} + + if (!slot || REG_P (slot)) +{ + gcc_assert (CONST_INT_P (slot_no)); + addr = plus_constant (Pmode, stack_pointer_rtx, + -(INTVAL (slot_no) + 1) * GET_MODE_SIZE (Pmode)); +} + else if (MEM_P (slot)) +addr = XEXP (slot, 0);
Re: [PATCH, i386, Pointer Bounds Checker 34/x] Vararg functions support
On 21 Sep 18:08, Uros Bizjak wrote: Hello! This patch introduces initialization of incoming bounds for vararg function on i386 target. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2014-06-11 Ilya Enkovich ilya.enkov...@intel.com * config/i386/i386.c (ix86_setup_incoming_varargs): New. (ix86_va_start): Initialize bounds for pointers in va_list. (TARGET_SETUP_INCOMING_VARARG_BOUNDS): New. OK with a couple of smallchanges below. Thanks, Uros. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index a67e6e7..c520f26 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -8456,6 +8456,72 @@ ix86_setup_incoming_varargs (cumulative_args_t cum_v, enum machine_mode mode, setup_incoming_varargs_64 (next_cum); } +static void +ix86_setup_incoming_vararg_bounds (cumulative_args_t cum_v, + enum machine_mode mode, + tree type, + int *pretend_size ATTRIBUTE_UNUSED, + int no_rtl) +{ + CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v); + CUMULATIVE_ARGS next_cum; + tree fntype; + rtx save_area; + int bnd_reg, i, max; + + gcc_assert (!no_rtl); + Please add a comment for following condition. + if (!TARGET_64BIT) +return; + + fntype = TREE_TYPE (current_function_decl); + + /* For varargs, we do not want to skip the dummy va_dcl argument. + For stdargs, we do want to skip the last named argument. */ + next_cum = *cum; + if (stdarg_p (fntype)) +ix86_function_arg_advance (pack_cumulative_args (next_cum), mode, type, + true); + if (cum-call_abi == MS_ABI) +return; Put this early exit at the top of the function. + save_area = frame_pointer_rtx; + + max = cum-regno + cfun-va_list_gpr_size / UNITS_PER_WORD; + if (max X86_64_REGPARM_MAX) +max = X86_64_REGPARM_MAX; + + bnd_reg = cum-bnd_regno + cum-force_bnd_pass; + if (chkp_function_instrumented_p (current_function_decl)) +for (i = cum-regno; i max; i++) + { + rtx addr = plus_constant (Pmode, save_area, i * UNITS_PER_WORD); + rtx reg = gen_rtx_REG (DImode, + x86_64_int_parameter_registers[i]); + rtx ptr = reg; + rtx bounds; + + if (bnd_reg = LAST_BND_REG) + bounds = gen_rtx_REG (BNDmode, bnd_reg); + else + { + rtx ldx_addr = plus_constant (Pmode, arg_pointer_rtx, + (LAST_BND_REG - bnd_reg) * 8); No magic constants! + bounds = gen_reg_rtx (BNDmode); + emit_insn (TARGET_64BIT + ? gen_bnd64_ldx (bounds, ldx_addr, ptr) + : gen_bnd32_ldx (bounds, ldx_addr, ptr)); + } + + emit_insn (TARGET_64BIT + ? gen_bnd64_stx (addr, ptr, bounds) + : gen_bnd32_stx (addr, ptr, bounds)); Please check BNDmode instead of TARGET_64BIT. + bnd_reg++; + } +} + + /* Checks if TYPE is of kind va_list char *. */ static bool @@ -8478,7 +8544,7 @@ ix86_va_start (tree valist, rtx nextarg) { HOST_WIDE_INT words, n_gpr, n_fpr; tree f_gpr, f_fpr, f_ovf, f_sav; - tree gpr, fpr, ovf, sav, t; + tree gpr, fpr, ovf, sav, t, t1; tree type; rtx ovf_rtx; @@ -8529,6 +8595,13 @@ ix86_va_start (tree valist, rtx nextarg) crtl-args.arg_offset_rtx, NULL_RTX, 0, OPTAB_LIB_WIDEN); convert_move (va_r, next, 0); + + /* Store zero bounds for va_list. */ + if (chkp_function_instrumented_p (current_function_decl)) + chkp_expand_bounds_reset_for_mem (valist, + make_tree (TREE_TYPE (valist), +next)); + } return; } @@ -8582,10 +8655,15 @@ ix86_va_start (tree valist, rtx nextarg) t = make_tree (type, ovf_rtx); if (words != 0) t = fold_build_pointer_plus_hwi (t, words * UNITS_PER_WORD); + t1 = t; t = build2 (MODIFY_EXPR, type, ovf, t); TREE_SIDE_EFFECTS (t) = 1; expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); + /* Store zero bounds for overflow area pointer. */ + if (chkp_function_instrumented_p (current_function_decl)) +chkp_expand_bounds_reset_for_mem (ovf, t1); Can you please move this above to avoid another temporary (t1)? if (ix86_varargs_gpr_size || ix86_varargs_fpr_size) { /* Find the register save area. @@ -8594,9 +8672,14 @@ ix86_va_start (tree valist, rtx nextarg) t = make_tree (type, frame_pointer_rtx); if
Re: Fix ICE with ODR mering and variable sized types
On Fri, Sep 19, 2014 at 8:55 PM, Jan Hubicka hubi...@ucw.cz wrote: Hi, this patch fixes ICE by avoiding mangling of types with variadic size (those are not really supported). Bootstrapped/regtested x86_64-linux, tested with libreoffice, comitted. Hmm, but how do global vars end up having variadic type? Isn't the bug that you are ending up with some local entity here? We call this on TYPE_NAME of all types, not only global vars. I do not think I should skip all types with function context, because static variables may have them (and I do not track where type comes from because ODR violation is caused even by types not used in global decls). For example: inline int test() { struct A {int a,b;}; static struct A testA; return testA.a++; } creates type A that is local and should be merged interprocedurally I think. Variadic types indeed can not appear in global declarations, so I think it is safe to ignore them. I am adding Jason to CC, perhaps he knows better. Honza Richard. PR lto/63286 * tree.c (need_assembler_name_p): Do not mangle variadic types. Index: tree.c === --- tree.c (revision 215328) +++ tree.c (working copy) @@ -5003,6 +5003,7 @@ need_assembler_name_p (tree decl) decl == TYPE_NAME (TREE_TYPE (decl)) !is_lang_specific (TREE_TYPE (decl)) AGGREGATE_TYPE_P (TREE_TYPE (decl)) + !variably_modified_type_p (TREE_TYPE (decl), NULL_TREE) !type_in_anonymous_namespace_p (TREE_TYPE (decl))) return !DECL_ASSEMBLER_NAME_SET_P (decl); /* Only FUNCTION_DECLs and VAR_DECLs are considered. */
Re: [PATCH, i386, Pointer Bounds Checker 2/x] Intel Memory Protection Extensions (MPX) instructions support
On 17 Sep 12:17, Ilya Enkovich wrote: On 19 May 11:23, Jeff Law wrote: On 05/19/14 02:19, Ilya Enkovich wrote: On 16 May 13:39, Jeff Law wrote: On 04/16/14 05:35, Ilya Enkovich wrote: Hi, This patch introduces Intel MPX bound registers and instructions. It was approved earlier for 4.9 and had no significant changes since then. I'll assume patch is OK if no objections arise. Patch was bootstrapped and tested for linux-x86_64. OK for the trunk. Please wait until entire set is approved before installing. jeff Here is an updated version. The only change is in mode_ldx expand. It now has the second operand preparation code moved from ix86_expand_builtin as was proposed by Uros. Thanks, Ilya Added similar operand handling into mode_stx expand. Thanks, Ilya diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md index 8e0a583..4e07d70 100644 --- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -19,7 +19,7 @@ ;;; Unused letters: ;;; H -;;; h jw z +;;; h j z ;; Integer register constraints. ;; It is not necessary to define 'r' here. @@ -94,6 +94,9 @@ (define_register_constraint v TARGET_SSE ? ALL_SSE_REGS : NO_REGS Any EVEX encodable SSE register (@code{%xmm0-%xmm31}).) +(define_register_constraint w TARGET_MPX ? BND_REGS : NO_REGS + @internal Any bound register.) + ;; We use the Y prefix to denote any number of conditional register sets: ;; z First SSE register. ;; i SSE2 inter-unit moves to SSE register enabled @@ -253,6 +256,8 @@ ;; T prefix is used for different address constraints ;; v - VSIB address ;; s - address with no segment register +;; i - address with no index and no rip +;; b - address with no base and no rip (define_address_constraint Tv VSIB address operand @@ -261,3 +266,11 @@ (define_address_constraint Ts Address operand without segment register (match_operand 0 address_no_seg_operand)) + +(define_address_constraint Ti + MPX address operand without index + (match_operand 0 address_mpx_no_index_operand)) + +(define_address_constraint Tb + MPX address operand without base + (match_operand 0 address_mpx_no_base_operand)) diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c index 2c05cec..a1e9289 100644 --- a/gcc/config/i386/i386-c.c +++ b/gcc/config/i386/i386-c.c @@ -399,6 +399,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, def_or_undef (parse_in, __XSAVEC__); if (isa_flag OPTION_MASK_ISA_XSAVES) def_or_undef (parse_in, __XSAVES__); + if (isa_flag OPTION_MASK_ISA_MPX) +def_or_undef (parse_in, __MPX__); } diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def index 07e5720..0e302e3 100644 --- a/gcc/config/i386/i386-modes.def +++ b/gcc/config/i386/i386-modes.def @@ -87,6 +87,9 @@ VECTOR_MODE (INT, DI, 1); /* V1DI */ VECTOR_MODE (INT, SI, 1); /* V1SI */ VECTOR_MODE (INT, QI, 2); /* V2QI */ +POINTER_BOUNDS_MODE (BND32, 8); +POINTER_BOUNDS_MODE (BND64, 16); + INT_MODE (OI, 32); INT_MODE (XI, 64); diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 39462bd..c8ef2d2 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -231,6 +231,8 @@ extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx); extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx); extern void ix86_expand_sse2_abs (rtx, rtx); +extern bool ix86_bnd_prefixed_insn_p (rtx); + /* In i386-c.c */ extern void ix86_target_macros (void); extern void ix86_register_pragmas (void); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 929f1b1..01823ca 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2131,6 +2131,8 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] = /* Mask registers. */ MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, + /* MPX bound registers */ + BND_REGS, BND_REGS, BND_REGS, BND_REGS, }; /* The default register map used in 32bit mode. */ @@ -2147,6 +2149,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] = -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 16-23*/ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 24-31*/ 93, 94, 95, 96, 97, 98, 99, 100, /* Mask registers */ + 101, 102, 103, 104, /* bound registers */ }; /* The default register map used in 64bit mode. */ @@ -2163,6 +2166,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] = 67, 68, 69, 70, 71, 72, 73, 74, /* AVX-512 registers 16-23 */ 75, 76, 77, 78, 79, 80, 81, 82, /* AVX-512 registers 24-31 */ 118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */ + 126, 127, 128, 129, /* bound registers */ }; /* Define the register
Re: [PATCH] gcc parallel make check
On 09/22/2014 11:26 AM, Jakub Jelinek wrote: On Mon, Sep 22, 2014 at 11:21:14AM -0400, Jason Merrill wrote: If I say 'rgt dg.exp=var-templ1.C' the actual test results are lost in the explosion of shell verbosity. Could we add some '@'s to more of the rules, perhaps? I've been considering that too, but not sure what info people find valuable and what they don't. I don't see much information in the ~128 repetitions of the check-parallel rules with different numbers; the actual runtest command is the same in all of them. Adding @ to all of the commands of the check-parallel-% rule makes things much better for me: diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 6f251a5..be4c840 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3674,10 +3674,10 @@ $(lang_checks_parallelized): check-% : site.exp fi check-parallel-% : site.exp - -test -d plugin || mkdir plugin - -test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR) - test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir) - -(rootme=`${PWD_COMMAND}`; export rootme; \ + -@test -d plugin || mkdir plugin + -@test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR) + @test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir) + -@(rootme=`${PWD_COMMAND}`; export rootme; \ srcdir=`cd ${srcdir}; ${PWD_COMMAND}` ; export srcdir ; \ if [ -n $(check_p_subno) ] \ [ -n $$GCC_RUNTEST_PARALLELIZE_DIR ] \
Re: [debug-early] Allow checking of DECL_ABSTRACT in decl_ultimate_origin
On 09/22/14 08:17, Michael Matz wrote: Hi, On Fri, 19 Sep 2014, Aldy Hernandez wrote: Michael, I really don't understand the need for this change in your original patch. I don't know if this was a temporary testing change or what. I'm pretty sure it was temporary testing, when I still was finding my way through dwarf2out limitations/constraints. Ah perfect! I'm happy to report that with this and the last set of patches, both C and C++ guality tests have = regressions than mainline. Yay. Super. Again, thank you for your original patchset, which has immensely helped me navigate my way around the black box which was/is dwarf2out. Aldy
Re: [PATCH] gcc parallel make check
On Mon, Sep 22, 2014 at 05:26:04PM +0200, Jakub Jelinek wrote: I've been considering that too, but not sure what info people find valuable and what they don't. The ten million Running blablablalba.exp ... messages on a very parallel run aren't helpful in my opinion. There might be more but that drowns out everything else :-) Segher
Re: [PATCH] msp430: inhibit automatic link of -lnosys in absence of -msim
Hi Peter, gcc/ChangeLog 2014-09-22 Peter A. Bigot p...@pabigot.com * config/msp430/msp430.h: Remove automatic -lnosys when -msim absent. Approved and applied. Cheers Nick
[PATCH][AArch64] LR register not used in leaf functions
AArch64 has the same issue ARM had where the LR register was not used in leaf functions. This was reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. In AArch64, this test-case need to be added with more live ranges for the need for the LR_REGNUM. i.e test-case in the PR needs additional loops up to r31 for the case AArch64 to see this. The same fix (from the thread https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went into ARM should apply to AArch64 as well. Regression tested on qemu for aarch64-none-linux-gnu with no new regressions. Is this OK for trunk? Thanks, Kugan gcc/ChangeLog: 2014-09-23 Kugan Vivekanandarajah kug...@linaro.org * config/aarch64/aarch64.h (EPILOGUE_USES): Return true only after epilogue_completed is true. diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index db950da..b3e4585 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -309,7 +309,7 @@ extern unsigned long aarch64_tune_flags; considered live at the start of the called function. */ #define EPILOGUE_USES(REGNO) \ - ((REGNO) == LR_REGNUM) + (epilogue_completed (REGNO) == LR_REGNUM) /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function, the stack pointer does not matter. The value is tested only in
Re: [PATCH] gcc parallel make check
On Mon, Sep 22, 2014 at 10:44:06AM -0500, Segher Boessenkool wrote: On Mon, Sep 22, 2014 at 05:26:04PM +0200, Jakub Jelinek wrote: I've been considering that too, but not sure what info people find valuable and what they don't. The ten million Running blablablalba.exp ... messages on a very parallel run aren't helpful in my opinion. There might be more but that drowns out everything else :-) It has some value, it shows the actual progress. Sure, you can just watch the *.log files as they are populated and get better picture. I think the Running *.exp messages go from dejagnu, not from gcc testsuite changes. Jakub
Re: Speedup int_bit_from_pos
On Sun, 21 Sep 2014, Jan Hubicka wrote: Please omit static from inline functions. Yep, I suppose we want to drop static in all inlines? I can make patch for that. Also one notable difference with your patches is that the fits hwi is now not tested on the result but on the result input which, multiplied by 8, might not fit a hwi now. So please use wide-ints here (the to_offset flavor). The function must always suceed (so user promise it will fit in HWI) and for performance reasons I would rather not go into wide int by defualt, but I can do that with checking enabled. wide-int should be fast enough, please use it. Like this? Index: tree.h === --- tree.h (revision 215421) +++ tree.h (working copy) @@ -3877,10 +3877,20 @@ extern tree size_in_bytes (const_tree); extern HOST_WIDE_INT int_size_in_bytes (const_tree); extern HOST_WIDE_INT max_int_size_in_bytes (const_tree); extern tree bit_position (const_tree); -extern HOST_WIDE_INT int_bit_position (const_tree); extern tree byte_position (const_tree); extern HOST_WIDE_INT int_byte_position (const_tree); +/* Like bit_position, but return as an integer. It must be representable in + that way (since it could be a signed value, we don't have the + option of returning -1 like int_size_in_byte can. */ + +static inline HOST_WIDE_INT int_bit_position (const_tree field) +{ + return ((wide_int)DECL_FIELD_OFFSET (field) * BITS_PER_UNIT + + (wide_int)DECL_FIELD_BIT_OFFSET (field)).to_shwi (); +} + + #define sizetype sizetype_tab[(int) stk_sizetype] #define bitsizetype sizetype_tab[(int) stk_bitsizetype] #define ssizetype sizetype_tab[(int) stk_ssizetype]
Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'
On 09/22/2014 10:45 PM, Michael Eager wrote: On 09/21/14 21:10, Chen Gang wrote: On 9/22/14 2:09, Michael Eager wrote: Generally, you should use gcc to link programs, not ld. gcc is a driver which will select the appropriate libraries and support routines (such as crt0.o, which contains _start) and pass them to the linker. OK, thanks. When gcc, it misses the root directory for crt1.o and crtn.o: e.g. /lib/ld.so.1, crt1.o, crtn.o when gcc -v, but we need /upstream/ release/lib/ld.so.1, /upstream/lib/crt1.o, /upstream/libcrtn.o. You likely need to build mb-gcc with --sysroot=/upstream. OK, thanks! I guess it will solve all issues which I met, and I shall try next. How are you building gcc? What are your configuration options? The related information is below, please help check when you have time, thanks. [root@localhost ~]# /upstream/release/bin/microblaze-gchen-linux-gcc -v Using built-in specs. COLLECT_GCC=/upstream/release/bin/microblaze-gchen-linux-gcc COLLECT_LTO_WRAPPER=/upstream/release/libexec/gcc/microblaze-gchen-linux/5.0.0/lto-wrapper Target: microblaze-gchen-linux Configured with: ../gcc/configure --target=microblaze-gchen-linux --disable-nls --enable-languages=c --disable-threads --disable-shared --without-headers --disable-libssp --disable-libquadmath --disable-libgomp --disable-libatomic --prefix=/upstream/release Thread model: single gcc version 5.0.0 20140920 (experimental) (GCC) [root@localhost ~]# Thanks. -- Chen Gang Open share and attitude like air water and life which God blessed
Re: [PATCH] gcc parallel make check
On Mon, Sep 22, 2014 at 11:43:35AM -0400, Jason Merrill wrote: On 09/22/2014 11:26 AM, Jakub Jelinek wrote: On Mon, Sep 22, 2014 at 11:21:14AM -0400, Jason Merrill wrote: If I say 'rgt dg.exp=var-templ1.C' the actual test results are lost in the explosion of shell verbosity. Could we add some '@'s to more of the rules, perhaps? I've been considering that too, but not sure what info people find valuable and what they don't. I don't see much information in the ~128 repetitions of the check-parallel rules with different numbers; the actual runtest command is the same in all of them. Adding @ to all of the commands of the check-parallel-% rule makes things much better for me: LGTM (though, supposedly we want similar change in libstdc++-v3/testsuite/Makefile.am). Or, if people would really like to see the commands, we could print them just once, using e.g. -$(if $(check_p_subno),@)(rootme= ... (then e.g. check-parallel-gcc goal would print the command, but check-parallel-gcc-1 or check-parallel-gcc-112 would not). --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3674,10 +3674,10 @@ $(lang_checks_parallelized): check-% : site.exp fi check-parallel-% : site.exp - -test -d plugin || mkdir plugin - -test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR) - test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir) - -(rootme=`${PWD_COMMAND}`; export rootme; \ + -@test -d plugin || mkdir plugin + -@test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR) + @test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir) + -@(rootme=`${PWD_COMMAND}`; export rootme; \ srcdir=`cd ${srcdir}; ${PWD_COMMAND}` ; export srcdir ; \ if [ -n $(check_p_subno) ] \ [ -n $$GCC_RUNTEST_PARALLELIZE_DIR ] \ Jakub
Re: [PATCH][AArch64] LR register not used in leaf functions
On 22/09/14 16:43, Kugan wrote: AArch64 has the same issue ARM had where the LR register was not used in leaf functions. This was reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. In AArch64, this test-case need to be added with more live ranges for the need for the LR_REGNUM. i.e test-case in the PR needs additional loops up to r31 for the case AArch64 to see this. The same fix (from the thread https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went into ARM should apply to AArch64 as well. Regression tested on qemu for aarch64-none-linux-gnu with no new regressions. Is this OK for trunk? This still be a partial fix. LR should be a caller-saved register free to use in case it's saved properly to across function call. I had a very similar patch to this sitting in my local tree and under various benchmark analysis. -- Jiong Thanks, Kugan gcc/ChangeLog: 2014-09-23 Kugan Vivekanandarajah kug...@linaro.org * config/aarch64/aarch64.h (EPILOGUE_USES): Return true only after epilogue_completed is true.
Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'
On 09/22/2014 10:46 PM, Michael Eager wrote: On 09/21/14 20:55, Chen Gang wrote: On 9/22/14 2:03, Michael Eager wrote: On 09/20/14 23:24, Chen Gang wrote: And it seems, we also need 'LinkScr.ld' for ldscript, could you share it to me, thanks. set_board_info ldscript -T/home/eager/Xilinx/dg/microblaze_0/LinkScr.ld Hi Chen -- The DejaGNU configuration I provided is for a bare-metal environment. If you are testing in a Linux environment, the tool chain you uses should provide a default linker script which matches your hardware's memory layout. You should not need to provide a separate linker script. OK, thanks, I shall try to find the default linker script for it. If you are running mb-gcc which generates executables which run on the target, you do not need to provide a linker script. OK, thanks. I shall remove it, next. Thanks. -- Chen Gang Open share and attitude like air water and life which God blessed
[jit] Improvements to documentation
Committed to branch dmalcolm/jit As before, an HTML version of the docs can be seen at: https://dmalcolm.fedorapeople.org/gcc/libgccjit-api-docs/index.html with the bulk of the changes occurring to: https://dmalcolm.fedorapeople.org/gcc/libgccjit-api-docs/intro/tutorial03.html gcc/jit/ChangeLog.jit: * docs/_build/texinfo/libgccjit.texi: Regenerate. * docs/intro/install.rst: Reduce width of listing. * docs/intro/tutorial01.rst: Use libgccjit.h rather than libgccjit.h when including the header. * docs/intro/tutorial02.rst: Likewise. * docs/intro/tutorial03.rst: Clarify various sections; show effect of reducing optimization level down from 3 to 2. (Putting it all together): Move to above... (Behind the curtain: optimizing away stack manipulation): ...this, and rename this to... (Behind the curtain: How does our code get optimized?): ...and add more detail, and discussion of elimination of tail recursion. --- gcc/jit/ChangeLog.jit | 15 + gcc/jit/docs/_build/texinfo/libgccjit.texi | 875 + gcc/jit/docs/intro/install.rst | 11 +- gcc/jit/docs/intro/tutorial01.rst | 2 +- gcc/jit/docs/intro/tutorial02.rst | 2 +- gcc/jit/docs/intro/tutorial03.rst | 533 +++--- 6 files changed, 1141 insertions(+), 297 deletions(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index 8e546e6..14576f2 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,3 +1,18 @@ +2014-09-22 David Malcolm dmalc...@redhat.com + + * docs/_build/texinfo/libgccjit.texi: Regenerate. + * docs/intro/install.rst: Reduce width of listing. + * docs/intro/tutorial01.rst: Use libgccjit.h rather than + libgccjit.h when including the header. + * docs/intro/tutorial02.rst: Likewise. + * docs/intro/tutorial03.rst: Clarify various sections; show + effect of reducing optimization level down from 3 to 2. + (Putting it all together): Move to above... + (Behind the curtain: optimizing away stack manipulation): + ...this, and rename this to... + (Behind the curtain: How does our code get optimized?): ...and + add more detail, and discussion of elimination of tail recursion. + 2014-09-19 David Malcolm dmalc...@redhat.com * TODO.rst: Add detection of uninitialized variables, since diff --git a/gcc/jit/docs/_build/texinfo/libgccjit.texi b/gcc/jit/docs/_build/texinfo/libgccjit.texi index 985b22c..850adf2 100644 --- a/gcc/jit/docs/_build/texinfo/libgccjit.texi +++ b/gcc/jit/docs/_build/texinfo/libgccjit.texi @@ -19,7 +19,7 @@ @copying @quotation -libgccjit 0.1, September 19, 2014 +libgccjit 0.1, September 22, 2014 David Malcolm @@ -131,8 +131,13 @@ Tutorial part 3: Adding JIT-compilation to a toy interpreter * Compiling the context:: * Single-stepping through the generated code:: * Examining the generated code:: -* Behind the curtain; optimizing away stack manipulation: Behind the curtain optimizing away stack manipulation. * Putting it all together:: +* Behind the curtain; How does our code get optimized?: Behind the curtain How does our code get optimized?. + +Behind the curtain: How does our code get optimized? + +* Optimizing away stack manipulation:: +* Elimination of tail recursion:: Topic Reference @@ -259,8 +264,13 @@ Tutorial part 3: Adding JIT-compilation to a toy interpreter * Compiling the context:: * Single-stepping through the generated code:: * Examining the generated code:: -* Behind the curtain; optimizing away stack manipulation: Behind the curtain optimizing away stack manipulation. * Putting it all together:: +* Behind the curtain; How does our code get optimized?: Behind the curtain How does our code get optimized?. + +Behind the curtain: How does our code get optimized? + +* Optimizing away stack manipulation:: +* Elimination of tail recursion:: @end menu @@ -321,12 +331,13 @@ needed to develop against it (@cite{libgccjit-devel}): @example $ rpm -qlv libgccjit -lrwxrwxrwx1 rootroot 18 Aug 12 07:56 /usr/lib64/libgccjit.so.0 - libgccjit.so.0.0.1 --rwxr-xr-x1 rootroot 14463448 Aug 12 07:57 /usr/lib64/libgccjit.so.0.0.1 +lrwxrwxrwx1 rootroot 18 Aug 12 07:56 /usr/lib64/libgccjit.so.0 - libgccjit.so.0.0.1 +-rwxr-xr-x1 rootroot 14463448 Aug 12 07:57 /usr/lib64/libgccjit.so.0.0.1 + $ rpm -qlv libgccjit-devel --rwxr-xr-x1 rootroot37654 Aug 12 07:56 /usr/include/libgccjit++.h --rwxr-xr-x1 rootroot28967 Aug 12 07:56 /usr/include/libgccjit.h -lrwxrwxrwx1 rootroot 14 Aug 12 07:56 /usr/lib64/libgccjit.so - libgccjit.so.0 +-rwxr-xr-x1 rootroot37654 Aug 12 07:56 /usr/include/libgccjit++.h +-rwxr-xr-x1 rootroot
Re: [AArch64] Auto-generate the BUILTIN_ macros for aarch64-builtins.c
On 22/09/14 14:30, James Greenhalgh wrote: On Thu, Sep 18, 2014 at 11:12:15AM +0100, Richard Earnshaw wrote: On 18/09/14 10:53, James Greenhalgh wrote: +$(srcdir)/config/aarch64/aarch64-builtin-iterators.h: $(srcdir)/config/aarch64/geniterators.sh \ + $(srcdir)/config/aarch64/iterators.md + $(SHELL) $(srcdir)/config/aarch64/geniterators.sh \ + $(srcdir)/config/aarch64/iterators.md \ + $(srcdir)/config/aarch64/aarch64-builtin-iterators.h + aarch-common.o: $(srcdir)/config/arm/aarch-common.c $(CONFIG_H) $(SYSTEM_H) \ coretypes.h $(TM_H) $(TM_P_H) $(RTL_H) $(TREE_H) output.h $(C_COMMON_H) $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \ Is there any real need to write this into the source directory and have the built file checked in? Ie. can't we always write to the build directory and use it from there. That avoids problems if the sources are on a read-only filesystem. If we do need to leave it in the sources, then contrib/update_gcc should be taught how to touch the generated file when resyncing from the repositories. I thought I had tried this and failed to make it work. I must not have been trying hard enough at the time. Updated as attached, generating the header in the build directory. It looks much better this way! Bootstrapped on aarch64-none-linux-gnueabi with no issues. Ok? Thanks, James --- gcc/ 2014-09-22 James Greenhalgh james.greenha...@arm.com * config/aarch64/geniterators.sh: New. * config/aarch64/iterators.md (VDQF_DF): New. * config/aarch64/t-aarch64: Generate aarch64-builtin-iterators.h. * config/aarch64/aarch64-builtins.c (BUILTIN_*) Remove. OK. R.
Re: [PATCH] gcc parallel make check
On Mon, Sep 22, 2014 at 05:49:12PM +0200, Jakub Jelinek wrote: On Mon, Sep 22, 2014 at 10:44:06AM -0500, Segher Boessenkool wrote: On Mon, Sep 22, 2014 at 05:26:04PM +0200, Jakub Jelinek wrote: I've been considering that too, but not sure what info people find valuable and what they don't. The ten million Running blablablalba.exp ... messages on a very parallel run aren't helpful in my opinion. There might be more but that drowns out everything else :-) It has some value, it shows the actual progress. Sure, you can just watch the *.log files as they are populated and get better picture. I think the Running *.exp messages go from dejagnu, not from gcc testsuite changes. Hrm. Looking at the log files it seems there are not more of those messages at all since the changes. Maybe it just all got too fast! :-) Segher
Re: [PATCH] gcc parallel make check
On 09/22/2014 11:58 AM, Jakub Jelinek wrote: LGTM (though, supposedly we want similar change in libstdc++-v3/testsuite/Makefile.am). Or, if people would really like to see the commands, we could print them just once, using e.g. -$(if $(check_p_subno),@)(rootme= ... (then e.g. check-parallel-gcc goal would print the command, but check-parallel-gcc-1 or check-parallel-gcc-112 would not). So, like this? commit c750897381a3f936e27cabd825cfa85ce936a6a9 Author: Jason Merrill ja...@redhat.com Date: Mon Sep 22 11:44:00 2014 -0400 gcc/ * Makefile.in (check-parallel-%): Add @. libstdc++-v3/ * testsuite/Makefile.am (%/site.exp): Add @. (check-DEJAGNU): Likewise. * testsuite/Makefile.in: Regenerate. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 6f251a5..97b439a 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3674,10 +3674,10 @@ $(lang_checks_parallelized): check-% : site.exp fi check-parallel-% : site.exp - -test -d plugin || mkdir plugin - -test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR) - test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir) - -(rootme=`${PWD_COMMAND}`; export rootme; \ + -@test -d plugin || mkdir plugin + -@test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR) + @test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir) + -$(if $(check_p_subno),@)(rootme=`${PWD_COMMAND}`; export rootme; \ srcdir=`cd ${srcdir}; ${PWD_COMMAND}` ; export srcdir ; \ if [ -n $(check_p_subno) ] \ [ -n $$GCC_RUNTEST_PARALLELIZE_DIR ] \ diff --git a/libstdc++-v3/testsuite/Makefile.am b/libstdc++-v3/testsuite/Makefile.am index e206aba..b4c9e85 100644 --- a/libstdc++-v3/testsuite/Makefile.am +++ b/libstdc++-v3/testsuite/Makefile.am @@ -91,9 +91,9 @@ new-abi-baseline: ${extract_symvers} ../src/.libs/libstdc++.so $${output}) %/site.exp: site.exp - -test -d $* || mkdir $* + -@test -d $* || mkdir $* @srcdir=`cd $(srcdir); ${PWD_COMMAND}`; - objdir=`${PWD_COMMAND}`/$*; \ + @objdir=`${PWD_COMMAND}`/$*; \ sed -e s|^set srcdir .*$$|set srcdir $$srcdir| \ -e s|^set objdir .*$$|set objdir $$objdir| \ site.exp $*/site.exp.tmp @@ -115,7 +115,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: normal%/site.exp # Run the testsuite in normal mode. check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp - AR=$(AR); export AR; \ + $(if $*,@)AR=$(AR); export AR; \ RANLIB=$(RANLIB); export RANLIB; \ if [ -z $* ] [ $(filter -j, $(MFLAGS)) = -j ]; then \ rm -rf normal-parallel || true; \ diff --git a/libstdc++-v3/testsuite/Makefile.in b/libstdc++-v3/testsuite/Makefile.in index 59060b8..0fc26f4 100644 --- a/libstdc++-v3/testsuite/Makefile.in +++ b/libstdc++-v3/testsuite/Makefile.in @@ -553,9 +553,9 @@ new-abi-baseline: ${extract_symvers} ../src/.libs/libstdc++.so $${output}) %/site.exp: site.exp - -test -d $* || mkdir $* + -@test -d $* || mkdir $* @srcdir=`cd $(srcdir); ${PWD_COMMAND}`; - objdir=`${PWD_COMMAND}`/$*; \ + @objdir=`${PWD_COMMAND}`/$*; \ sed -e s|^set srcdir .*$$|set srcdir $$srcdir| \ -e s|^set objdir .*$$|set objdir $$objdir| \ site.exp $*/site.exp.tmp @@ -566,7 +566,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: normal%/site.exp # Run the testsuite in normal mode. check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp - AR=$(AR); export AR; \ + $(if $*,@)AR=$(AR); export AR; \ RANLIB=$(RANLIB); export RANLIB; \ if [ -z $* ] [ $(filter -j, $(MFLAGS)) = -j ]; then \ rm -rf normal-parallel || true; \
Re: [PATCH] gcc parallel make check
On Mon, Sep 22, 2014 at 12:21:08PM -0400, Jason Merrill wrote: On 09/22/2014 11:58 AM, Jakub Jelinek wrote: LGTM (though, supposedly we want similar change in libstdc++-v3/testsuite/Makefile.am). Or, if people would really like to see the commands, we could print them just once, using e.g. -$(if $(check_p_subno),@)(rootme= ... (then e.g. check-parallel-gcc goal would print the command, but check-parallel-gcc-1 or check-parallel-gcc-112 would not). So, like this? Ok, thanks. commit c750897381a3f936e27cabd825cfa85ce936a6a9 Author: Jason Merrill ja...@redhat.com Date: Mon Sep 22 11:44:00 2014 -0400 gcc/ * Makefile.in (check-parallel-%): Add @. libstdc++-v3/ * testsuite/Makefile.am (%/site.exp): Add @. (check-DEJAGNU): Likewise. * testsuite/Makefile.in: Regenerate. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 6f251a5..97b439a 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3674,10 +3674,10 @@ $(lang_checks_parallelized): check-% : site.exp fi check-parallel-% : site.exp - -test -d plugin || mkdir plugin - -test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR) - test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir) - -(rootme=`${PWD_COMMAND}`; export rootme; \ + -@test -d plugin || mkdir plugin + -@test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR) + @test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir) + -$(if $(check_p_subno),@)(rootme=`${PWD_COMMAND}`; export rootme; \ srcdir=`cd ${srcdir}; ${PWD_COMMAND}` ; export srcdir ; \ if [ -n $(check_p_subno) ] \ [ -n $$GCC_RUNTEST_PARALLELIZE_DIR ] \ diff --git a/libstdc++-v3/testsuite/Makefile.am b/libstdc++-v3/testsuite/Makefile.am index e206aba..b4c9e85 100644 --- a/libstdc++-v3/testsuite/Makefile.am +++ b/libstdc++-v3/testsuite/Makefile.am @@ -91,9 +91,9 @@ new-abi-baseline: ${extract_symvers} ../src/.libs/libstdc++.so $${output}) %/site.exp: site.exp - -test -d $* || mkdir $* + -@test -d $* || mkdir $* @srcdir=`cd $(srcdir); ${PWD_COMMAND}`; - objdir=`${PWD_COMMAND}`/$*; \ + @objdir=`${PWD_COMMAND}`/$*; \ sed -e s|^set srcdir .*$$|set srcdir $$srcdir| \ -e s|^set objdir .*$$|set objdir $$objdir| \ site.exp $*/site.exp.tmp @@ -115,7 +115,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: normal%/site.exp # Run the testsuite in normal mode. check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp - AR=$(AR); export AR; \ + $(if $*,@)AR=$(AR); export AR; \ RANLIB=$(RANLIB); export RANLIB; \ if [ -z $* ] [ $(filter -j, $(MFLAGS)) = -j ]; then \ rm -rf normal-parallel || true; \ diff --git a/libstdc++-v3/testsuite/Makefile.in b/libstdc++-v3/testsuite/Makefile.in index 59060b8..0fc26f4 100644 --- a/libstdc++-v3/testsuite/Makefile.in +++ b/libstdc++-v3/testsuite/Makefile.in @@ -553,9 +553,9 @@ new-abi-baseline: ${extract_symvers} ../src/.libs/libstdc++.so $${output}) %/site.exp: site.exp - -test -d $* || mkdir $* + -@test -d $* || mkdir $* @srcdir=`cd $(srcdir); ${PWD_COMMAND}`; - objdir=`${PWD_COMMAND}`/$*; \ + @objdir=`${PWD_COMMAND}`/$*; \ sed -e s|^set srcdir .*$$|set srcdir $$srcdir| \ -e s|^set objdir .*$$|set objdir $$objdir| \ site.exp $*/site.exp.tmp @@ -566,7 +566,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: normal%/site.exp # Run the testsuite in normal mode. check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp - AR=$(AR); export AR; \ + $(if $*,@)AR=$(AR); export AR; \ RANLIB=$(RANLIB); export RANLIB; \ if [ -z $* ] [ $(filter -j, $(MFLAGS)) = -j ]; then \ rm -rf normal-parallel || true; \ Jakub
[patch] moving macro definitions to defaults.h
After being reminded of the tm.h issues brought up last november (here: https://gcc.gnu.org/ml/gcc-patches/2013-11/msg01731.html ), I started looking back into it. The general summary is the any header file which has a conditional on a target macro could be affected by include file reordering... ie, if a header file has #ifdef BLAH whatever and we move the header files around a bit, it wouldn't be immediately obvious if the order where changes and BLAH was define later in the include structure. The results for this header files would be different because whatever would no longer be in the preprocess source, and it may take a long time to discover the difference. Josephs solution was to identify these and instead put a default definition in default.h ... then change all the uses to #if instead.. ie, #if BLAH This way we can ensure that the definition has been seen and it will be a compile error if not. I looked at all the target macros listed by Joseph, and decide to start with the ones which are used *only* in an if defined situation in a .h files of some sort. ie #ifdef #ifndef or #if define() If this happens in a .c file, then changing the order of .h's shouldn't matter. (Unless the .c file is doing something really screwy. So for the moment, ignoring that.) So looking at only .h files, I found a number of default definitions in rtl.h, which this patch moves to defaults.h. I was going to #error if defaults.h wasn't included, but found that to be unnecessary since it uses some of those macros and would cause a compile failure anyway if it wasn't included. All the other uses of these particular macros use the #if model, so no further adjustment is required for them. This patch bootstraps on x86_64-unknown-linux-gnu, and regressions are running. Assuming no issues show up, OK for mainline? The remaining macros I will have a closer look at and most will likely need the full treatment moving from #ifdef to the #if model. I plan to do these next. The macros which fit this category for potential header trouble (along with their usage count) are: 2 USE_COLLECT2 3 TARGET_TERMINATE_DW2_EH_FRAME_INFO 3 TARGET_HAVE_CTORS_DTORS 3 REGMODE_NATURAL_SIZE 4 CC_STATUS_MDEP_INIT 4 CC_STATUS_MDEP_INIT 4 MODE_BASE_REG_REG_CLASS 4 REGNO_MODE_OK_FOR_REG_BASE_P 5 MODE_BASE_REG_CLASS 5 XCOFF_DEBUGGING_INFO 6 VMS_DEBUGGING_INFO 6 TARGET_ASM_OUTPUT_ANCHOR 6 EH_FRAME_SECTION_NAME 8 MODE_CODE_BASE_REG_CLASS 9 HARD_REGNO_CALLER_SAVE_MODE 9 HARD_REGNO_CALL_PART_CLOBBERED 9 SDB_DEBUGGING_INFO 9 CC_STATUS_MDEP 9 REGNO_MODE_CODE_OK_FOR_BASE_P 10 SECONDARY_RELOAD_CLASS 10 TARGET_VXWORKS_RTP 14 NO_DOT_IN_LABEL 14 REGNO_MODE_OK_FOR_BASE_P 17 STACK_REGS 19 TARGET_ASM_DESTRUCTOR 20 OBJECT_FORMAT_ELF 20 TARGET_ASM_CONSTRUCTOR 24 NO_DOLLAR_IN_LABEL 26 DBX_DEBUGGING_INFO 30 CPLUSPLUS_CPP_SPEC 42 ASM_OUTPUT_DEF * defaults.h (HAVE_PRE_INCREMENT, HAVE_PRE_DECREMENT, HAVE_POST_INCREMENT, HAVE_POST_DECREMENT, HAVE_POST_MODIFY_DISP, HAVE_POST_MODIFY_REG, HAVE_PRE_MODIFY_DISP, HAVE_PRE_MODIFY_REG, USE_LOAD_POST_INCREMENT, USE_LOAD_POST_DECREMENT, USE_LOAD_PRE_INCREMENT, USE_LOAD_PRE_DECREMENT, USE_STORE_POST_INCREMENT, USE_STORE_POST_DECREMENT, USE_STORE_PRE_INCREMENT, USE_STORE_PRE_DECREMENT, HARD_FRAME_POINTER_REGNUM, HARD_FRAME_POINTER_IS_FRAME_POINTER, HARD_FRAME_POINTER_IS_ARG_POINTER): Relocate from rtl.h. * rtl.h: Move default definitions to defaults.h. (AUTO_INC_DEC): Adjust defintion check to assumed defaults exist. Index: defaults.h === *** defaults.h (revision 215355) --- defaults.h (working copy) *** see the files COPYING3 and COPYING.RUNTI *** 1168,1173 --- 1168,1264 #define DEFAULT_PCC_STRUCT_RETURN 1 #endif + #ifndef HAVE_PRE_INCREMENT + #define HAVE_PRE_INCREMENT 0 + #endif + + #ifndef HAVE_PRE_DECREMENT + #define HAVE_PRE_DECREMENT 0 + #endif + + #ifndef HAVE_POST_INCREMENT + #define HAVE_POST_INCREMENT 0 + #endif + + #ifndef HAVE_POST_DECREMENT + #define HAVE_POST_DECREMENT 0 + #endif + + #ifndef HAVE_POST_MODIFY_DISP + #define HAVE_POST_MODIFY_DISP 0 + #endif + + #ifndef HAVE_POST_MODIFY_REG + #define HAVE_POST_MODIFY_REG 0 + #endif + + #ifndef HAVE_PRE_MODIFY_DISP + #define HAVE_PRE_MODIFY_DISP 0 + #endif + + #ifndef HAVE_PRE_MODIFY_REG + #define HAVE_PRE_MODIFY_REG 0 + #endif + + /* Some architectures do not have complete pre/post increment/decrement +instruction sets, or only move some modes efficiently. These macros +allow us to tune autoincrement generation. */ + + #ifndef USE_LOAD_POST_INCREMENT + #define USE_LOAD_POST_INCREMENT(MODE) HAVE_POST_INCREMENT + #endif + + #ifndef USE_LOAD_POST_DECREMENT + #define USE_LOAD_POST_DECREMENT(MODE) HAVE_POST_DECREMENT + #endif + + #ifndef USE_LOAD_PRE_INCREMENT + #define USE_LOAD_PRE_INCREMENT(MODE)HAVE_PRE_INCREMENT + #endif + + #ifndef USE_LOAD_PRE_DECREMENT + #define
Re: Fix ICE with ODR mering and variable sized types
On 09/22/2014 11:37 AM, Jan Hubicka wrote: Variadic types indeed can not appear in global declarations, so I think it is safe to ignore them. Agreed. Jason
Re: [PATCH 4/5] Generalise invalid_mode_change_p
On 09/22/14 01:34, Richard Sandiford wrote: Jeff Law l...@redhat.com writes: On 09/18/14 04:25, Richard Sandiford wrote: This is the main patch for the bug. We should treat a register as invalid for a mode change if simplify_subreg_regno cannot provide a new register number for the result. We should treat a class as invalid for a mode change if all registers in the class are invalid. This is an extension of the old CANNOT_CHANGE_MODE_CLASS-based check (simplify_subreg_regno checks C_C_C_M). I forgot to say that the patch is a prerequisite to removing aarch64's C_C_C_M. There are other prerequisites too, but removing C_C_C_M without this patch caused regressions in the existing testsuite, which is why no new tests are needed. gcc/ * hard-reg-set.h: Include hash-table.h. (target_hard_regs): Add a finalize method and a x_simplifiable_subregs field. * target-globals.c (target_globals::~target_globals): Handle hard_regs-finalize. * rtl.h (subreg_shape): New structure. (shape_of_subreg): New function. (simplifiable_subregs): Declare. * reginfo.c (simplifiable_subreg): New structure. (simplifiable_subregs_hasher): Likewise. (simplifiable_subregs): New function. (invalid_mode_changes): Delete. (alid_mode_changes, valid_mode_changes_obstack): New variables. (record_subregs_of_mode): Remove subregs_of_mode parameter. Record valid mode changes in valid_mode_changes. (find_subregs_of_mode): Remove subregs_of_mode parameter. Update calls to record_subregs_of_mode. (init_subregs_of_mode): Remove invalid_mode_changes and bitmap handling. Initialize new variables. Update call to find_subregs_of_mode. (invalid_mode_change_p): Check new variables instead of invalid_mode_changes. (finish_subregs_of_mode): Finalize new variables instead of invalid_mode_changes. (target_hard_regs::finalize): New function. * ira-costs.c (print_allocno_costs): Call invalid_mode_change_p even when CLASS_CANNOT_CHANGE_MODE is undefined. Index: gcc/rtl.h === --- gcc/rtl.h 2014-09-15 11:55:40.459855161 +0100 +++ gcc/rtl.h 2014-09-15 12:26:21.249077760 +0100 +/* Return the shape of a SUBREG rtx. */ + +static inline subreg_shape +shape_of_subreg (const_rtx x) +{ + return subreg_shape (GET_MODE (SUBREG_REG (x)), + SUBREG_BYTE (x), GET_MODE (x)); +} + Is there some reason you don't have a constructor that accepts a const_rtx? I was worried that by allowing implicit const_rtx-subreg_shape conversions, it would be less obvious that the rtx has to have code SUBREG. I.e. a checked conversion would be hidden in the constructor rather than being explicit. If with David's new rtx hierarchy we end up with an rtx_subreg subclass then I agree we should have a constructor that takes one of those. Makes sense. I'm not sure if I was explicit, but the patch is fine, that was more a curiosity on my part than anything. jeff
C++ PATCH for variable template diagnostics
This patch fixes two issues with variable templates: 1) -Wunused was warning about a variable template specialization when leaving the template specialization scope, and 2) The diagnostic referring to the specialization wasn't showing the template arguments. Tested x86_64-pc-linux-gnu, applying to trunk. commit 51aed4b97f9a6dd8bc7954a4691f7790d6bc41c5 Author: Jason Merrill ja...@redhat.com Date: Mon Sep 22 10:33:57 2014 -0400 * decl.c (poplevel): Don't warn about unused vars in template scope. * error.c (dump_decl): Handle variable templates. diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index fe5a4af..12a9f43 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -624,6 +624,7 @@ poplevel (int keep, int reverse, int functionbody) /* Before we remove the declarations first check for unused variables. */ if ((warn_unused_variable || warn_unused_but_set_variable) + current_binding_level-kind != sk_template_parms !processing_template_decl) for (tree d = getdecls (); d; d = TREE_CHAIN (d)) { diff --git a/gcc/cp/error.c b/gcc/cp/error.c index 86fd405..a03bfe1 100644 --- a/gcc/cp/error.c +++ b/gcc/cp/error.c @@ -1044,6 +1044,18 @@ dump_decl (cxx_pretty_printer *pp, tree t, int flags) case FIELD_DECL: case PARM_DECL: dump_simple_decl (pp, t, TREE_TYPE (t), flags); + + /* Handle variable template specializations. */ + if (TREE_CODE (t) == VAR_DECL + DECL_LANG_SPECIFIC (t) + DECL_TEMPLATE_INFO (t) + PRIMARY_TEMPLATE_P (DECL_TI_TEMPLATE (t))) + { + pp_cxx_begin_template_argument_list (pp); + tree args = INNERMOST_TEMPLATE_ARGS (DECL_TI_ARGS (t)); + dump_template_argument_list (pp, args, flags); + pp_cxx_end_template_argument_list (pp); + } break; case RESULT_DECL: diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ12.C b/gcc/testsuite/g++.dg/cpp1y/var-templ12.C new file mode 100644 index 000..49ea588 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/var-templ12.C @@ -0,0 +1,10 @@ +// { dg-do compile { target c++14 } } +// { dg-options -Wall } + +template class T T x; +template int xint = 0; + +int main() +{ + return xint; +} diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ13.C b/gcc/testsuite/g++.dg/cpp1y/var-templ13.C new file mode 100644 index 000..e398d22 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/var-templ13.C @@ -0,0 +1,5 @@ +// { dg-do compile { target c++14 } } + +template class T T x; +template int xint = 0; +template int xint = 0; // { dg-error xint }
[patch] Fix specializations of std::uses_allocator in queue and stack
Include missing header, and add tests. Tested x86_64-linux, committed to trunk. commit a0ffb1d4d0bf78be525c7546665ef7002b66b13e Author: Jonathan Wakely jwak...@redhat.com Date: Mon Sep 22 16:42:51 2014 +0100 Include bits/uses_allocator.h in stack and queue. * include/bits/stl_queue.h: Include missing header. * include/bits/stl_stack.h: Likewise. * testsuite/23_containers/priority_queue/requirements/ uses_allocator.cc: New. * testsuite/23_containers/queue/requirements/uses_allocator.cc: New. * testsuite/23_containers/stack/requirements/uses_allocator.cc: New. diff --git a/libstdc++-v3/include/bits/stl_queue.h b/libstdc++-v3/include/bits/stl_queue.h index b516664..32124e3 100644 --- a/libstdc++-v3/include/bits/stl_queue.h +++ b/libstdc++-v3/include/bits/stl_queue.h @@ -58,6 +58,9 @@ #include bits/concept_check.h #include debug/debug.h +#if __cplusplus = 201103L +# include bits/uses_allocator.h +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/stl_stack.h b/libstdc++-v3/include/bits/stl_stack.h index ee187da..f4bb72c 100644 --- a/libstdc++-v3/include/bits/stl_stack.h +++ b/libstdc++-v3/include/bits/stl_stack.h @@ -58,6 +58,9 @@ #include bits/concept_check.h #include debug/debug.h +#if __cplusplus = 201103L +# include bits/uses_allocator.h +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/uses_allocator.cc b/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/uses_allocator.cc new file mode 100644 index 000..efe73ae --- /dev/null +++ b/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/uses_allocator.cc @@ -0,0 +1,29 @@ +// Copyright (C) 2014 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// http://www.gnu.org/licenses/. + +// { dg-options -std=gnu++11 } +// { dg-do compile } + +#include queue + +templatetypename A + using uses_allocator = std::uses_allocatorstd::priority_queueint, A; + +static_assert( uses_allocatorstd::allocatorint::value, valid allocator ); + +struct X { }; +static_assert( !uses_allocatorX::value, invalid allocator ); diff --git a/libstdc++-v3/testsuite/23_containers/queue/requirements/uses_allocator.cc b/libstdc++-v3/testsuite/23_containers/queue/requirements/uses_allocator.cc new file mode 100644 index 000..42106ca --- /dev/null +++ b/libstdc++-v3/testsuite/23_containers/queue/requirements/uses_allocator.cc @@ -0,0 +1,29 @@ +// Copyright (C) 2014 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// http://www.gnu.org/licenses/. + +// { dg-options -std=gnu++11 } +// { dg-do compile } + +#include queue + +templatetypename A + using uses_allocator = std::uses_allocatorstd::queueint, A; + +static_assert( uses_allocatorstd::allocatorint::value, valid allocator ); + +struct X { }; +static_assert( !uses_allocatorX::value, invalid allocator ); diff --git a/libstdc++-v3/testsuite/23_containers/stack/requirements/uses_allocator.cc b/libstdc++-v3/testsuite/23_containers/stack/requirements/uses_allocator.cc new file mode 100644 index 000..3663d63 --- /dev/null +++ b/libstdc++-v3/testsuite/23_containers/stack/requirements/uses_allocator.cc @@ -0,0 +1,29 @@ +// Copyright (C) 2014 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful,
Re: [patch] moving macro definitions to defaults.h
On Mon, 2014-09-22 at 12:26 -0400, Andrew MacLeod wrote: After being reminded of the tm.h issues brought up last november (here: https://gcc.gnu.org/ml/gcc-patches/2013-11/msg01731.html ), I started looking back into it. The general summary is the any header file which has a conditional on a target macro could be affected by include file reordering... ie, if a header file has #ifdef BLAH whatever and we move the header files around a bit, it wouldn't be immediately obvious if the order where changes and BLAH was define later in the include structure. The results for this header files would be different because whatever would no longer be in the preprocess source, and it may take a long time to discover the difference. Josephs solution was to identify these and instead put a default definition in default.h ... then change all the uses to #if instead.. ie, #if BLAH This way we can ensure that the definition has been seen and it will be a compile error if not. I looked at all the target macros listed by Joseph, and decide to start with the ones which are used *only* in an if defined situation in a .h files of some sort. ie #ifdef #ifndef or #if define() If this happens in a .c file, then changing the order of .h's shouldn't matter. (Unless the .c file is doing something really screwy. So for the moment, ignoring that.) There appears to be a particular implicit order in which headers must be included. I notice that e.g. tm.h has: #ifndef GCC_TM_H #define GCC_TM_H so if we're going with this no header file includes any other header file model, would it make sense to add something like: #ifndef GCC_TM_H #error tm.h must have been included by this point /* We need tm.h here so that we can see: BAR, BAZ, QUUX, etc. */ #endif to header files needing it, thus expressing the expected dependencies explicitly? So looking at only .h files, I found a number of default definitions in rtl.h, which this patch moves to defaults.h. I was going to #error if defaults.h wasn't included, but found that to be unnecessary since it uses some of those macros and would cause a compile failure anyway if it wasn't included. All the other uses of these particular macros use the #if model, so no further adjustment is required for them. This patch bootstraps on x86_64-unknown-linux-gnu, and regressions are running. Assuming no issues show up, OK for mainline? The remaining macros I will have a closer look at and most will likely need the full treatment moving from #ifdef to the #if model. I plan to do these next. The macros which fit this category for potential header trouble (along with their usage count) are: 2 USE_COLLECT2 3 TARGET_TERMINATE_DW2_EH_FRAME_INFO 3 TARGET_HAVE_CTORS_DTORS 3 REGMODE_NATURAL_SIZE 4 CC_STATUS_MDEP_INIT 4 CC_STATUS_MDEP_INIT 4 MODE_BASE_REG_REG_CLASS 4 REGNO_MODE_OK_FOR_REG_BASE_P 5 MODE_BASE_REG_CLASS 5 XCOFF_DEBUGGING_INFO 6 VMS_DEBUGGING_INFO 6 TARGET_ASM_OUTPUT_ANCHOR 6 EH_FRAME_SECTION_NAME 8 MODE_CODE_BASE_REG_CLASS 9 HARD_REGNO_CALLER_SAVE_MODE 9 HARD_REGNO_CALL_PART_CLOBBERED 9 SDB_DEBUGGING_INFO 9 CC_STATUS_MDEP 9 REGNO_MODE_CODE_OK_FOR_BASE_P 10 SECONDARY_RELOAD_CLASS 10 TARGET_VXWORKS_RTP 14 NO_DOT_IN_LABEL 14 REGNO_MODE_OK_FOR_BASE_P 17 STACK_REGS 19 TARGET_ASM_DESTRUCTOR 20 OBJECT_FORMAT_ELF 20 TARGET_ASM_CONSTRUCTOR 24 NO_DOLLAR_IN_LABEL 26 DBX_DEBUGGING_INFO 30 CPLUSPLUS_CPP_SPEC 42 ASM_OUTPUT_DEF
Re: [AArch64] Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c
On 09/22/14 05:16, Alan Lawrence wrote: Ok thanks Jeff. In that case I think I should draw this to the attention of the AArch64 maintainers to check the testsuite updates are OK before I commit...? Can't hurt. Methinks it may be possible to get further, or at least increase our confidence, if I mock out try_widen_shift_mode, and/or try injecting some dubious RTL from a builtin, although this'll only give a momentary snapshot of behaviour. I may or may not have time to look into this though ;)... Yea, it's something I'd pondered as well, though I tend to inject the RTL I want directly in the debugger. The downside is doing so doesn't ensure various tables are updated properly, and I vaguely recall a per-pseudo table for the combiner's nonzero_bits, signbit_copies and friends. jeff
Re: [patch] moving macro definitions to defaults.h
On Mon, 22 Sep 2014, Andrew MacLeod wrote: Josephs solution was to identify these and instead put a default definition in default.h ... then change all the uses to #if instead.. ie, #if BLAH This way we can ensure that the definition has been seen and it will be a compile error if not. No, my suggestion was that whenever possible we should change preprocessor conditionals - #ifdef or #if - into C conditionals - if (MACRO). Changing from #ifdef to #if does nothing to make a missing tm.h include produce an error - the undefined macro simply quietly gets treated as 0 in preprocessor conditionals. To get an error from #if in such cases, you'd need to build GCC with -Wundef (together with existing -Werror), and I'd guess there are plenty of places that are not -Wundef clean at present. Now, I think moves of defaults to defaults.h are generally a good idea, and that moving from defined/undefined to true/false semantics are also a good idea - even if the way the macro is used means you can't take the further step of converting from #if to if (). They don't solve the problem of making a missing tm.h include immediately visible, but they *do* potentially help with future automatic refactoring to convert target macros into hooks. Obviously such moves do require checking the definitions and uses of the macros in question; you need to make sure you catch all places that use #ifdef / #if defined etc. on the macro (and make sure they have the same default). And if you're changing the semantics of the macro from defined / undefined to true / false, you need to watch out for any existing definitions with an empty expansion, or an expansion to 0, etc. -- Joseph S. Myers jos...@codesourcery.com
Re: [patch] moving macro definitions to defaults.h
On Mon, 22 Sep 2014, David Malcolm wrote: There appears to be a particular implicit order in which headers must be included. I notice that e.g. tm.h has: #ifndef GCC_TM_H #define GCC_TM_H so if we're going with this no header file includes any other header file model, would it make sense to add something like: #ifndef GCC_TM_H #error tm.h must have been included by this point /* We need tm.h here so that we can see: BAR, BAZ, QUUX, etc. */ #endif to header files needing it, thus expressing the expected dependencies explicitly? In principle, yes. In practice, some headers have definitions that depend on tm.h but for most users this doesn't matter. For example, flags.h depends on SWITCHABLE_TARGET. (I think the fix there is to make most users use options.h instead, and move miscellaneous declarations from flags.h to other headers.) In some cases, the target macro may be used only in a macro expansion. (BITS_PER_UNIT isn't strictly a target macro any more, but when it was its uses in tree.h were an example of that. tree.h still depends on the target macros NO_DOLLAR_IN_LABEL, NO_DOT_IN_LABEL and TARGET_DLLIMPORT_DECL_ATTRIBUTES, however, but we shouldn't make all tree.h users include tm.h.) -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check
On Mon, Sep 22, 2014 at 4:10 AM, Alan Lawrence alan.lawre...@arm.com wrote: Well, I haven't looked into this in detail: I've gone only as far as * swapping emit-rtl.o between 'good' compiles (svn r214042) and 'bad' compiles (r214043), finding that the critical difference is in the emit-rtl.o generated by r214043; *looking at the relocations in the 'bad' emit_rtl.o, seeing new entries 'fixed_regs + ', and that Richard Biener's changelog specifically mentions stripping signedness changes (and introduces the SIGN_NOPS). However, I apply your patch (minus the hunk adding the (set_attr type load1), this appears to have gone in already), and still see the same error message: emit-rtl.o: In function `gen_rtx_REG': emit-rtl.c:(.text+0x12f8): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o emit-rtl.o: In function `gen_rtx': emit-rtl.c:(.text+0x1824): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o collect2: error: ld returned 1 exit status and still see the same (suspicious-looking, although perhaps not convicted) relocations: $ readelf --relocs benchspec/CPU2006/403.gcc/build/build_base_test./emit-rtl.o | grep fixed_regs 12a8 005d0113 R_AARCH64_ADR_PRE fixed_regs + 0 12ac 005d0115 R_AARCH64_ADD_ABS fixed_regs + 0 12f8 005d0113 R_AARCH64_ADR_PRE fixed_regs + 12fc 005d0116 R_AARCH64_LDST8_A fixed_regs + 1824 005d0113 R_AARCH64_ADR_PRE fixed_regs + 1828 005d0116 R_AARCH64_LDST8_A fixed_regs + 186c 005d0113 R_AARCH64_ADR_PRE fixed_regs + 0 1870 005d0115 R_AARCH64_ADD_ABS fixed_regs + 0 I've also now bootstrapped my patch (STRIP_NOPS - STRIP_SIGN_NOPS * 2) on aarch64-none-linux-gnu and x86_64-none-linux-gnu, and check-gcc with no regressions, so would like to propose that patch for trunk...? You need to track down where R_AARCH64_ADR_PREL_PG_HI21 reloc is being created in the assembly and then track down why GCC is using tiny model here. Note my fix was for a similar issue; not necessary the exact same one in that there could be another pattern which needs to use the new constraint too. Thanks, Andrew --Alan Andrew Pinski wrote: On Thu, Sep 18, 2014 at 9:44 AM, Alan Lawrence alan.lawre...@arm.com wrote: We've been seeing errors using aarch64-none-linux-gnu gcc to build the 403.gcc benchmark from spec2k6, that we've traced back to this patch. The error looks like: /home/alalaw01/bootstrap_richie/gcc/xgcc -B/home/alalaw01/bootstrap_richie/gcc -O3 -mcpu=cortex-a57.cortex-a53 -DSPEC_CPU_LP64alloca.o asprintf.o vasprintf.o c-parse.o c-lang.o attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o c-aux-info.o c-common.o c-format.o c-semantics.o c-objc-common.o main.o cpplib.o cpplex.o cppmacro.o cppexp.o cppfiles.o cpphash.o cpperror.o cppinit.o cppdefault.o line-map.o mkdeps.o prefix.o version.o mbchar.o alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o cfgrtl.o combine.o conflict.o convert.o cse.o cselib.o dbxout.o debug.o dependence.o df.o diagnostic.o doloop.o dominance.o dwarf2asm.o dwarf2out.o dwarfout.o emit-rtl.o except.o explow.o expmed.o expr.o final.o flow.o fold-const.o function.o gcse.o genrtl.o ggc-common.o global.o graph.o haifa-sched.o hash.o hashtable.o hooks.o ifcvt.o insn-attrtab.o insn-emit.o insn-extract.o insn-opinit.o insn-output.o insn-peep.o insn-recog.o integrate.o intl.o jump.o langhooks.o lcm.o lists.o local-alloc.o loop.o obstack.o optabs.o params.o predict.o print-rtl.o print-tree.o profile.o real.o recog.o reg-stack.o regclass.o regmove.o regrename.o reload.o reload1.o reorg.o resource.o rtl.o rtlanal.o rtl-error.o sbitmap.o sched-deps.o sched-ebb.o sched-rgn.o sched-vis.o sdbout.o sibcall.o simplify-rtx.o ssa.o ssa-ccp.o ssa-dce.o stmt.o stor-layout.o stringpool.o timevar.o toplev.o tree.o tree-dump.o tree-inline.o unroll.o varasm.o varray.o vmsdbgout.o xcoffout.o ggc-page.o i386.o xmalloc.o xexit.o hashtab.o safe-ctype.o splay-tree.o xstrdup.o md5.o fibheap.o xstrerror.o concat.o partition.o hex.o lbasename.o getpwd.o ucbqsort.o -lm-o gcc emit-rtl.o: In function `gen_rtx_REG': emit-rtl.c:(.text+0x12f8): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o emit-rtl.o: In function `gen_rtx': emit-rtl.c:(.text+0x1824): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o collect2: error: ld returned 1
Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge
On 09/22/14 04:24, Jiong Wang wrote: Great. Can you send an updated patchkit for review. patch attached. please review, thanks. gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the live-in of new created BB as the intersection of live-in from old_dest and live-out from bb. Looks good. However, before committing we need a couple things. 1. Bootstrap regression test this variant of the patch. I know you tested an earlier one, but please test this one just to be sure. 2. Testcase. I think you could test for either the reduction in the live-in set of the newly created block or that you're shrink wrapping one or more functions you didn't previously shrink-wrap. I think it's fine if this test is target specific. Jeff
Re: Speedup int_bit_from_pos
On Sun, 21 Sep 2014, Jan Hubicka wrote: Please omit static from inline functions. Yep, I suppose we want to drop static in all inlines? I can make patch for that. Also one notable difference with your patches is that the fits hwi is now not tested on the result but on the result input which, multiplied by 8, might not fit a hwi now. So please use wide-ints here (the to_offset flavor). The function must always suceed (so user promise it will fit in HWI) and for performance reasons I would rather not go into wide int by defualt, but I can do that with checking enabled. wide-int should be fast enough, please use it. Like this? Index: tree.h === --- tree.h(revision 215421) +++ tree.h(working copy) @@ -3877,10 +3877,20 @@ extern tree size_in_bytes (const_tree); extern HOST_WIDE_INT int_size_in_bytes (const_tree); extern HOST_WIDE_INT max_int_size_in_bytes (const_tree); extern tree bit_position (const_tree); -extern HOST_WIDE_INT int_bit_position (const_tree); extern tree byte_position (const_tree); extern HOST_WIDE_INT int_byte_position (const_tree); +/* Like bit_position, but return as an integer. It must be representable in + that way (since it could be a signed value, we don't have the + option of returning -1 like int_size_in_byte can. */ + +static inline HOST_WIDE_INT int_bit_position (const_tree field) +{ + return ((wide_int)DECL_FIELD_OFFSET (field) * BITS_PER_UNIT + + (wide_int)DECL_FIELD_BIT_OFFSET (field)).to_shwi (); Hmm, this gets me: /aux/hubicka/trunk-6/gcc/testsuite/gcc.c-torture/compile/2224-1.c:39:7: internal compiler error: in decompose, at wide-int.h:911 0x656ed6 wi::int_traitsgeneric_wide_intwide_int_storage ::decompose(long*, unsigned int, generic_wide_intwide_int_storage const) ../../gcc/wide-int.h:911 0x6ec9f4 wide_int_ref_storagetrue::wide_int_ref_storagegeneric_wide_intwide_int_storage (generic_wide_intwide_int_storage const, unsigned int) ../../gcc/wide-int.h:959 0x6ec75a generic_wide_intwide_int_ref_storagetrue ::generic_wide_intgeneric_wide_intwide_int_storage (generic_wide_intwide_int_storage const, unsigned int) ../../gcc/wide-int.h:735 0x7d52ec wi::binary_traitsgeneric_wide_intwide_int_storage, generic_wide_intwide_int_storage, wi::int_traitsgeneric_wide_intwide_int_storage ::precision_type, wi::int_traitsgeneric_wide_intwide_int_storage ::precision_type::result_type wi::addgeneric_wide_intwide_int_storage, generic_wide_intwide_int_storage (generic_wide_intwide_int_storage const, generic_wide_intwide_int_storage const) ../../gcc/wide-int.h:2287 0x7d4fcc wi::binary_traitsgeneric_wide_intwide_int_storage, generic_wide_intwide_int_storage, (wi::precision_type)1, wi::int_traitsgeneric_wide_intwide_int_storage ::precision_type::result_type generic_wide_intwide_int_storage::operator+generic_wide_intwide_int_storage (generic_wide_intwide_int_storage const) const ../../gcc/wide-int.h:696 0xf282ab int_bit_position ../../gcc/tree.h:3890 on precision mismatch. What is correct way to compute this? I tried offset_int but I do not seem smart enough to get past copmile errors with that one. +} + + #define sizetype sizetype_tab[(int) stk_sizetype] #define bitsizetype sizetype_tab[(int) stk_bitsizetype] #define ssizetype sizetype_tab[(int) stk_ssizetype]
Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions
On 09/22/14 04:29, Jiong Wang wrote: On 19/09/14 21:43, Jeff Law wrote: On 09/15/14 08:33, Jiong Wang wrote: Jeff, thanks, I partially understand your meaning here. take the function ira_implicitly_set_insn_hard_regs in ira-lives.c for example, when generating address rtl, gcc will automatically generate const operator to prefix the address expression, like the following. so a simple CONSTANT_P check is enough in case there is no embedded register. (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94]) (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40] var_decl 0x2b2c2ff91510 recog_data) (const_int 480 [0x1e0] -1 but for architecture like aarch64, the following instruction sequences to forming address may be generated (insn 73 14 74 4 (set (reg/f:DI 20 x20 [99]) (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) 35 {*movdi_aarch64} (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats)) (nil))) (insn 17 30 25 5 (set (reg/f:DI 4 x4 [83]) (lo_sum:DI (reg/f:DI 20 x20 [99]) (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) {add_losym_di} (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats) (nil))) while CONSTANT_P could not catch the latter lo_sum case, as the RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ, Hmm, it's been ~15 years since I regularly worked on a target that uses HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST as well, but reading the docs for CONST, that clearly isn't the case. Sorry for that. Can you (re) send your current patch for this for review? patch attached. please review, thanks. gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when !REG_P (src) to release more instruction sink opportunities. gcc/testsuite/ * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase. Thanks. Please verify this version passes a bootstrap regression test. Assuming it does it is OK for the trunk. jeff
Re: Speedup int_bit_from_pos
On Sep 22, 2014, at 8:51 AM, Jan Hubicka hubi...@ucw.cz wrote: On Sun, 21 Sep 2014, Jan Hubicka wrote: Please omit static from inline functions. Yep, I suppose we want to drop static in all inlines? I can make patch for that. Also one notable difference with your patches is that the fits hwi is now not tested on the result but on the result input which, multiplied by 8, might not fit a hwi now. So please use wide-ints here (the to_offset flavor). The function must always suceed (so user promise it will fit in HWI) and for performance reasons I would rather not go into wide int by defualt, but I can do that with checking enabled. wide-int should be fast enough, please use it. Like this? Index: tree.h === --- tree.h(revision 215421) +++ tree.h(working copy) @@ -3877,10 +3877,20 @@ extern tree size_in_bytes (const_tree); extern HOST_WIDE_INT int_size_in_bytes (const_tree); extern HOST_WIDE_INT max_int_size_in_bytes (const_tree); extern tree bit_position (const_tree); -extern HOST_WIDE_INT int_bit_position (const_tree); extern tree byte_position (const_tree); extern HOST_WIDE_INT int_byte_position (const_tree); +/* Like bit_position, but return as an integer. It must be representable in + that way (since it could be a signed value, we don't have the + option of returning -1 like int_size_in_byte can. */ + +static inline HOST_WIDE_INT int_bit_position (const_tree field) +{ + return ((wide_int)DECL_FIELD_OFFSET (field) * BITS_PER_UNIT + + (wide_int)DECL_FIELD_BIT_OFFSET (field)).to_shwi (); +} + + Not quite: offset_int woffset = (wi::to_offset (xoffset) + wi::lrshift (wi::to_offset (DECL_FIELD_BIT_OFFSET (field)), LOG2_BITS_PER_UNIT)); offset_int is the type that can hold all bit positions, byte positions, byte sizes and bit sizes. One can use wi::to_offset to convert things into it. You can see woffset uses to see that various uses to convert back out. The idea is that eventually all the code that plays with that concept, could use that type. Convert into it sooner, and convert out of it later.
Re: [GOOGLE] Fix LIPO COMDAT fixup and gcov-tool interactions
On Mon, Sep 22, 2014 at 1:36 AM, Nathan Sidwell nat...@acm.org wrote: On 09/21/14 18:58, Xinliang David Li wrote: the intent is that that points to the gcov_info object of the object file containing the live version of the function. I couldn't quite get this to work though -- it involves emitting a function's gcov_fn_info decl in the same comdat group as the function itself. Another problem is that comdat functions may have different CFGs due to different early inline decisions. Comdatting gcov counters can lead to problems in profile use. Not comdatting profile counters have another advantage -- it allows context sensitive profiling for comdat function inline instances (IPA-inline). IIRC early inlining is done before the counters are created. Yes, and that will be the cause of problem for coverage mismatch when COMDATing profile counters -- due to difference in early inline decisions, the counter array created may be different across modules. You're right later inlining may be a problem, and require a non-comdat set of cloned counters. I can't recall exactly at what stage the counters are now inserted relative to inlining. The CFG machinery had a number of significant changes while, and shortly after, I was working on this. After early inining and before ipa inline. The early inline can lead to the coverage mismatch, and the latter can lead to loss of profile precision with the counter comdat. David You'll see the checking of gfi_ptr-key != gi_ptr in libgcov-driver.c. Are you making use of this machinery, or inventing new machinery? Teresa's method is a different machinery -- it tries to propagate profile data from the selected comdat copy + inline instance copies to comdat copies with zero counts. It'd be preferrable to complete the mechanism I outline above, rather than have a competing mechanism. Also, this patch is in effect lying because the data then makes it look like the unselected comdat instances are in fact being executed -- looking at the whole program it's going to be harder to understand whether the different inline instances are being executed multiple times, or are duplicate data. Does the gcov user output indicate this subtlety in some way? nathan
Re: [PATCH, i386, Pointer Bounds Checker 33/x] MPX ABI
On 09/21/14 06:34, Uros Bizjak wrote: On Fri, Sep 19, 2014 at 9:55 AM, Ilya Enkovich enkovich@gmail.com wrote: Here is an updated version of this patch. @@ -25000,10 +25082,32 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, } call = gen_rtx_CALL (VOIDmode, fnaddr, callarg1); + if (retval) -call = gen_rtx_SET (VOIDmode, retval, call); +{ + /* For instrumented code we may have GPR + BR in parallel but +it will confuse DF and we need to put each reg +under EXPR_LIST. */ + if (chkp_function_instrumented_p (current_function_decl)) + chkp_put_regs_to_expr_list (retval); I assume that this is OK from infrastructure POV. I'd like to ask Jeff, if this approach is OK. This seems wrong. The documented way to handle multiple return values is to put them into a PARALLEL. If that's not working, I'd like Ilya to describe more precisely why. I missed this from patch #6. (define_c_enum unspecv [ @@ -11549,7 +11550,9 @@ (define_insn *call_value [(set (match_operand 0) (call (mem:QI (match_operand:W 1 call_insn_operand cBwBz)) - (match_operand 2)))] + (match_operand 2))) + (set (reg:BND64 BND0_REG) (unspec [(reg:BND64 BND0_REG)] UNSPEC_BNDRET)) + (set (reg:BND64 BND1_REG) (unspec [(reg:BND64 BND1_REG)] UNSPEC_BNDRET))] Does these patterns depend on BND0/BND1 values? If not, please use (const_int 0) for their arguments. OTOH, do you really need these sets here? Looking at the gcc internals documentation (slightly unclear in this area), it should be enough to list return values in EXPR_LIST. This question is somehow connected to the new comment in ix86_expand_call, so perhaps Jeff can comment on this approach. Excellent question. In fact, I'd hazard a guess this doesn't really do what Ilya wants. As written those additional sets occur in parallel with the call. ie, all the RHSs are evaluated in parallel, then the assignments to the LHSs occur in parallel. I suspect that's not what Ilya wants -- instead I think he wants those side effects to occur after the call. This kind of scheme also doesn't tend to play well with exception handling scheduling becuase you can't guarantee the sets and the call are in the same block and scheduler as a single group. See the block comment in the call expander on the PA port. The situation is not precisely the same, but it's a bit of a warning flag if we have the call expanders doing other assignments. jeff