date:20140922

2014-09-19 23:45 GMT+04:00 Jeff Law l...@redhat.com:
 On 09/16/14 03:09, Ilya Enkovich wrote:


 I must be misunderstanding something then.  I fundamentally don't see how
 the return bounds are any different here than the return value.  If we
 have
 exposed the bounds in the IL, then aren't they going to be handled just
 like
 any other object in the IL?


 They are not handled like any other object in IL because return block
 and all statements in it are not handled as all other statements we
 put into split part.

 Here is a comment from find_return_bb:

 /* Return basic block containing RETURN statement.  We allow basic blocks
 of the form:
 retval = tmp_var;
 return retval
 but return_bb can not be more complex than this.
 ...
 */

 Phi nodes also may present in return_bb.

 Right.  I've seen this stuff, but it's still not clear to me what the real
 issue is.

 The first thing that jumps out when I look at your dump is we don't have  a
 PHI for __bound_tmp.322 in BB6.  Now it may be that we really just wanted
 __bound_tmp.322_36, but that seems wrong as the return value varies
 depending on how we reach BB6 and it seems to me the bounds ought to vary in
 a similar manner.

Bounds don't have to vary for different pointers.  E.g. p and p + 1
always have equal bounds.  In this particular case we have function
pointers and all of them have default bounds.





 All blocks going to split part are analyzed by visit_bb function.
 Return basic block is not analyzed in the same way but still may be
 copied into split part in case return value is defined in it.  There
 is a special code in visit_bb to add args of phi statements of
 return_bb as uses of split part to have no undefined values in copied
 block.  It was enough when those phi args plus return value were only
 uses in return_bb.

 But now we add returned bounds to GIMPLE_RETURN as a new use and this
 new use is ignored.  If split part returns value then return_bb will
 be copied into it.  It means I should check returned bounds are
 defined there too.  If SSA_NAME_DEF_STMT of returned bounds is in
 split part then it is OK.  If SSA_NAME_DEF_STMT of returned bounds is
 in return_bb then it is also OK because it means it is a result of PHI
 node whose args were added as additional uses for split part earlier
 in visit_bb.

 At least that is how I think this happens :)


 Maybe you should post the IL for a case where this all matters and walk
 me
 through the key issues.


 I attach a dump I got from Chrome compilation with no additional
 checks restrictions in split.  Original function returns value defined
 by phi node in return_bb and bounds defined in BB2.  Split part
 contains BB3, BB4 and BB5 and resulting function part has usage of
 returned bounds but no producer for it.

 Right, but my question is whether or not the bounds from BB2 were really the
 correct bounds to be using in the first place!  I would have expected a PHI
 in BB6 to select the bounds based on the path leading to BB6, much like we
 select a different return value.

Consider we have pointer computation and then

return __bnd_init_ptr_bounds (res);

In such case you would never have a PHI node for bounds.  Also do not
forget that we may have no PHI nodes for both return value and return
bounds.  In such case we could also easily fall into undefined value
as in dump.

Thanks,
Ilya


 Jeff

Re: [PATCH 0/5] Fix handling of word subregs of wide registers

Jeff Law l...@redhat.com writes:
 On 09/19/14 01:23, Richard Sandiford wrote:
 Jeff Law l...@redhat.com writes:
 On 09/18/14 04:07, Richard Sandiford wrote:
 This series is a cleaned-up version of:

   https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html

 The underlying problem is that the semantics of subregs depend on the
 word size.  You can't have a subreg for byte 2 of a 4-byte word, say,
 but you can have a subreg for word 2 of a 4-word value (as well as lowpart
 subregs of that word, etc.).  This causes problems when an architecture has
 wider-than-word registers, since the addressability of a word can
 then depend
 on which register class is used.

 The register allocators need to fix up cases where a subreg turns out to
 be invalid for a particular class.  This is really an extension of what
 we need to do for CANNOT_CHANGE_MODE_CLASS.

 Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.
 I thought we fixed these problems long ago with the change to subreg_byte?!?

 No, that was fixing something else.  (I'm just about old enough to remember
 that too!)  The problem here is that (say):

  (subreg:SI (reg:DI X) 4)

 is independently addressable on little-endian AArch32 if X assigned
 to a GPR, but not if X is assigned to a vector register.  We need
 to allow these kinds of subreg on pseudos in order to decompose multiword
 arithmetic.  It's then up to the RA to realise that a reload would be
 needed if X were assigned to a vector register, since the upper half
 of a vector register cannot be independently accessed.

 Note that you could write this example even with the old word-style offsets
 and IIRC the effect would have been the same.
 OK.  So I kept thinking in terms of the byte offset stuff.  But what 
 you're tackling is related to the mess around the mode of the subreg 
 having a different meaning if its smaller than a word vs word-sized or 
 greater.

 Right?

Yeah, that's right.  Addressability is based on words, which is inconvenient
when your registers are bigger than a word.

Thanks,
Richard

Re: [PATCH 4/5] Generalise invalid_mode_change_p

Jeff Law l...@redhat.com writes:
 On 09/18/14 04:25, Richard Sandiford wrote:
 This is the main patch for the bug.  We should treat a register as invalid
 for a mode change if simplify_subreg_regno cannot provide a new register
 number for the result.  We should treat a class as invalid for a mode change
 if all registers in the class are invalid.  This is an extension of the old
 CANNOT_CHANGE_MODE_CLASS-based check (simplify_subreg_regno checks C_C_C_M).

 I forgot to say that the patch is a prerequisite to removing aarch64's
 C_C_C_M.  There are other prerequisites too, but removing C_C_C_M without
 this patch caused regressions in the existing testsuite, which is why no
 new tests are needed.


 gcc/
  * hard-reg-set.h: Include hash-table.h.
  (target_hard_regs): Add a finalize method and a x_simplifiable_subregs
  field.
  * target-globals.c (target_globals::~target_globals): Handle
  hard_regs-finalize.
  * rtl.h (subreg_shape): New structure.
  (shape_of_subreg): New function.
  (simplifiable_subregs): Declare.
  * reginfo.c (simplifiable_subreg): New structure.
  (simplifiable_subregs_hasher): Likewise.
  (simplifiable_subregs): New function.
  (invalid_mode_changes): Delete.
  (alid_mode_changes, valid_mode_changes_obstack): New variables.
  (record_subregs_of_mode): Remove subregs_of_mode parameter.
  Record valid mode changes in valid_mode_changes.
  (find_subregs_of_mode): Remove subregs_of_mode parameter.
  Update calls to record_subregs_of_mode.
  (init_subregs_of_mode): Remove invalid_mode_changes and bitmap
  handling.  Initialize new variables.  Update call to
  find_subregs_of_mode.
  (invalid_mode_change_p): Check new variables instead of
  invalid_mode_changes.
  (finish_subregs_of_mode): Finalize new variables instead of
  invalid_mode_changes.
  (target_hard_regs::finalize): New function.
  * ira-costs.c (print_allocno_costs): Call invalid_mode_change_p
  even when CLASS_CANNOT_CHANGE_MODE is undefined.

 Index: gcc/rtl.h
 ===
 --- gcc/rtl.h2014-09-15 11:55:40.459855161 +0100
 +++ gcc/rtl.h2014-09-15 12:26:21.249077760 +0100
 +/* Return the shape of a SUBREG rtx.  */
 +
 +static inline subreg_shape
 +shape_of_subreg (const_rtx x)
 +{
 +  return subreg_shape (GET_MODE (SUBREG_REG (x)),
 +   SUBREG_BYTE (x), GET_MODE (x));
 +}
 +
 Is there some reason you don't have a constructor that accepts a 
 const_rtx?

I was worried that by allowing implicit const_rtx-subreg_shape
conversions, it would be less obvious that the rtx has to have
code SUBREG.  I.e. a checked conversion would be hidden in the
constructor rather than being explicit.

If with David's new rtx hierarchy we end up with an rtx_subreg
subclass then I agree we should have a constructor that takes
one of those.

Thanks,
Richard

Re: Speedup int_bit_from_pos

On Sun, 21 Sep 2014, Jan Hubicka wrote:

  
  Please omit static from inline functions.
 
 Yep, I suppose we want to drop static in all inlines? I can make patch for 
 that.
  
  Also one notable difference with your patches is that the fits hwi is now 
  not tested on the result but on the result input which, multiplied by 8, 
  might not fit a hwi now.  So please use wide-ints here (the to_offset 
  flavor).
 
 The function must always suceed (so user promise it will fit in HWI) and for
 performance reasons I would rather not go into wide int by defualt, but I can
 do that with checking enabled.

wide-int should be fast enough, please use it.

Richard.

Re: [BUILDROBOT] genrecog fix uncovers problem in bfin.md (was: [Patch] Teach genrecog/genoutput that scratch registers require write constraint modifiers)

2014-09-22 Thread James Greenhalgh

On Sat, Sep 20, 2014 at 08:40:01PM +0100, Jan-Benedict Glaw wrote:
 Hi!
 
 On Thu, 2014-09-18 11:19:21 +0100, James Greenhalgh 
 james.greenha...@arm.com wrote:
  As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01334.html
  The construct
  
(clobber (match_scratch 0 r))
  
  is invalid - operand 0 must be marked either write or read/write.
  
  Likewise
  
(match_* 0 r)
  
  is invalid, marking an operand earlyclobber does not remove the need to
  also mark it write or read/write.
 
 My build robot shows a new build error, which I guess is
 caused/uncovered by your genrecog change on bfin-elf (see eg. build
 http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=355667):
 
 build/genrecog /home/jbglaw/repos/gcc/gcc/common.md 
 /home/jbglaw/repos/gcc/gcc/config/bfin/bfin.md \
   insn-conditions.md  tmp-recog.c
 /home/jbglaw/repos/gcc/gcc/config/bfin/bfin.md:1971: constraints not 
 supported in define_split
 make[1]: *** [s-recog] Error 1
 make[1]: Leaving directory `/home/jbglaw/build/bfin-elf/build-gcc/gcc'
 make: *** [all-gcc] Error 2
 
 
 Would be nice if the bfin maintainer or you would come up with a fix.

Hi Jan,

I posted a fix for this on Friday evening at:

  https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01682.html

I'm waiting for a bfin maintainer to say OK, as it isn't a port I know
well.

Thanks,
James

Re: [PATCH][match-and-simplify] User defined predicates

On Tue, 16 Sep 2014, Marc Glisse wrote:

 On Tue, 16 Sep 2014, Richard Biener wrote:
 
  The following adds the ability to write predicates using patterns
  with an example following negate_expr_p which already has a
  use in comparison folding (via its if c-expr).
  
  The syntax is as follows:
  
  (match negate_expr_p
  INTEGER_CST
  (if (TYPE_OVERFLOW_WRAPS (type)
   || may_negate_without_overflow_p (t
  (match negate_expr_p
  (bit_not @0)
  (if (INTEGRAL_TYPE_P (type)  TYPE_OVERFLOW_WRAPS (type
  (match negate_expr_p
  FIXED_CST)
  (match negate_expr_p
  (negate @0))
  ...
  
  that is, you write '(match id' instead of '(simplify' and then
  follow with a pattern and optional conditionals.  There should
  be no transform pattern (unchecked yet).  Multiple matches for
  the same id simply add to what is recognized as id.
  The predicate is applied to a single 'tree' operand and looks
  up SSA defs and utilizes the optional valueize hook.
  
  Currently both GENERIC and GIMPLE variants result in name-mangling
  and the proptotypes (unprototyped anywhere)
  
  bool tree_negate_expr_p (tree t);
  bool gimple_negate_expr_p (tree t, tree (*valueize)(tree) = NULL);
 
 Ah, I haven't looked at the generated code, but I was expecting something
 roughly like:
 
 struct matcher
 {
   std::functiontree(tree) valueize;
   bool negate_expr(tree);
   ...
 };
 
 where we can call negate_expr recursively without caring about passing
 valueize (if there are 2 matchers, one without a valueize function,
 negate_expr can be static in that version). Although recursive calls sound
 potentially slow, and having a thread_local counter in valueize to limit the
 call depth may not be ideal.
 
 Please note that I am not at all saying the above is a good design, just
 dropping a random thought.

Yeah, abstracting a bit from the low-level interface might be nice.

Note that I wouldn't recommend using recursive predicates as on GIMPLE
this easily can result in exponential runtime behavior (and capping
on a recursion limit looks ugly).  Instead of using recursive predicates
a lattice of predicate results should be provided by the caller
(which would ask for a lattice abstraction as well).  So you'd have
sth like

struct lattice
{
  tree valueize (tree);
  bool negate_expr_p (tree);
  bool nonnegative_p (tree);
  ...
};

note that the fold-const.c negate_expr_p is really a predicate on
whether negate_expr will be able to simplify.  Yes, fold-const.c
has recursive transforms, something match-and-simplify doesn't
support either.  Here the proper way is to implement this kind
of stuff in a real pass.

I'll rip out the recursive parts of negate_expr_p again at some point,
I just put it there as an exercise ;)

Richard.

Re: Stream ODR types

On Wed, 17 Sep 2014, Jan Hubicka wrote:

 Hi,
 this patch renames types reported by Wodr during LTO bootstrap.
 
 Bootrapping/regtesting in progress, OK if it passes?
 
 Honza
 
   * tree-ssa-ccp.c (prop_value_d): Rename to ...
   (ccp_prop_value_t): ... this one to avoid ODR violation; update uses.
   * ipa-prop.c (struct type_change_info): Rename to ...
   (prop_type_change_infoprop_type_change_info): ... this; update uses.

Seems a bit excessive ;)

Ok.

Thanks,
Richard.

   * ggc-page.c (globals): Rename to ...
   (static struct ggc_globals): ... this; update uses.
   * tree-ssa-loop-im.c (mem_ref): Rename to ...
   (im_mem_ref): ... this; update uses.
   * ggc-common.c (loc_descriptor): Rename to ...
   (ggc_loc_descriptor): ... this; update uses.
   * lra-eliminations.c (elim_table): Rename to ...
   (lra_elim_table): ... this; update uses.
   * bitmap.c (output_info): Rename to ...
   (bitmap_output_info): ... this; update uses.
   * gcse.c (expr): Rename to ...
   (gcse_expr) ... this; update uses.
   (occr): Rename to ...
   (gcse_occr): .. this; update uses.
   * tree-ssa-copy.c (prop_value_d): Rename to ...
   (prop_value_t): ... this.
   * predict.c (block_info_def): Rename to ...
   (block_info): ... this; update uses.
   (edge_info_def): Rename to ...
   (edge_info): ... this; update uses.
   * profile.c (bb_info): Rename to ...
   (bb_profile_info): ... this; update uses.
   * alloc-pool.c (output_info): Rename to ...
   (pool_output_info): ... this; update uses.
   
 Index: tree-ssa-ccp.c
 ===
 --- tree-ssa-ccp.c(revision 215328)
 +++ tree-ssa-ccp.c(working copy)
 @@ -166,7 +166,7 @@ typedef enum
VARYING
  } ccp_lattice_t;
  
 -struct prop_value_d {
 +struct ccp_prop_value_t {
  /* Lattice value.  */
  ccp_lattice_t lattice_val;
  
 @@ -180,24 +180,22 @@ struct prop_value_d {
  widest_int mask;
  };
  
 -typedef struct prop_value_d prop_value_t;
 -
  /* Array of propagated constant values.  After propagation,
 CONST_VAL[I].VALUE holds the constant value for SSA_NAME(I).  If
 the constant is held in an SSA name representing a memory store
 (i.e., a VDEF), CONST_VAL[I].MEM_REF will contain the actual
 memory reference used to store (i.e., the LHS of the assignment
 doing the store).  */
 -static prop_value_t *const_val;
 +static ccp_prop_value_t *const_val;
  static unsigned n_const_val;
  
 -static void canonicalize_value (prop_value_t *);
 +static void canonicalize_value (ccp_prop_value_t *);
  static bool ccp_fold_stmt (gimple_stmt_iterator *);
  
  /* Dump constant propagation value VAL to file OUTF prefixed by PREFIX.  */
  
  static void
 -dump_lattice_value (FILE *outf, const char *prefix, prop_value_t val)
 +dump_lattice_value (FILE *outf, const char *prefix, ccp_prop_value_t val)
  {
switch (val.lattice_val)
  {
 @@ -236,10 +234,10 @@ dump_lattice_value (FILE *outf, const ch
  
  /* Print lattice value VAL to stderr.  */
  
 -void debug_lattice_value (prop_value_t val);
 +void debug_lattice_value (ccp_prop_value_t val);
  
  DEBUG_FUNCTION void
 -debug_lattice_value (prop_value_t val)
 +debug_lattice_value (ccp_prop_value_t val)
  {
dump_lattice_value (stderr, , val);
fprintf (stderr, \n);
 @@ -272,10 +270,10 @@ extend_mask (const wide_int nonzero_bit
 4- Initial values of variables that are not GIMPLE registers are
considered VARYING.  */
  
 -static prop_value_t
 +static ccp_prop_value_t
  get_default_value (tree var)
  {
 -  prop_value_t val = { UNINITIALIZED, NULL_TREE, 0 };
 +  ccp_prop_value_t val = { UNINITIALIZED, NULL_TREE, 0 };
gimple stmt;
  
stmt = SSA_NAME_DEF_STMT (var);
 @@ -343,10 +341,10 @@ get_default_value (tree var)
  
  /* Get the constant value associated with variable VAR.  */
  
 -static inline prop_value_t *
 +static inline ccp_prop_value_t *
  get_value (tree var)
  {
 -  prop_value_t *val;
 +  ccp_prop_value_t *val;
  
if (const_val == NULL
|| SSA_NAME_VERSION (var) = n_const_val)
 @@ -366,7 +364,7 @@ get_value (tree var)
  static inline tree
  get_constant_value (tree var)
  {
 -  prop_value_t *val;
 +  ccp_prop_value_t *val;
if (TREE_CODE (var) != SSA_NAME)
  {
if (is_gimple_min_invariant (var))
 @@ -387,7 +385,7 @@ get_constant_value (tree var)
  static inline void
  set_value_varying (tree var)
  {
 -  prop_value_t *val = const_val[SSA_NAME_VERSION (var)];
 +  ccp_prop_value_t *val = const_val[SSA_NAME_VERSION (var)];
  
val-lattice_val = VARYING;
val-value = NULL_TREE;
 @@ -413,7 +411,7 @@ set_value_varying (tree var)
For other constants, make sure to drop TREE_OVERFLOW.  */
  
  static void
 -canonicalize_value (prop_value_t *val)
 +canonicalize_value (ccp_prop_value_t *val)
  {
enum machine_mode mode;
tree type;
 @@ -451,7 +449,7 @@

Re: [PATCH] Extended if-conversion for loops marked with pragma omp simd.

2014-09-22 Thread Yuri Rumyantsev

Richard,

here is reduced patch (part.1) which was reduced almost twice.
Let's me also answer on your comments.

1. I really use edge field 'aux' to keep predicate for critical edges.
My previous code was not correct and now it looks like:

  if (EDGE_COUNT (b-succs) == 1 || EDGE_COUNT (e-dest-preds) == 1)
/* Edge E is not critical,  use predicate of edge source bb. */
c = bb_predicate (b);
  else
/* Edge E is critical and its aux field contains predicate.  */
c = edge_predicate (e);

2. I completely delete all code related to creation of conditional
expressions and completely rely on bool pattern recognition in
vectorizer. But we need to delete all dead predicate computations
which are not used since they prevent vectorization. I will add this
local-dce function in next patch.
3. I also did not include in this patch recognition of general
phi-nodes with two arguments only for which conversion of conditional
scalar reduction can be applied also.
Note that all these changes are applied for loop marked with pragma
omp simd only.

2014-09-22  Yuri Rumyantsev  ysrum...@gmail.com

* tree-if-conv.c (cgraph.h): Add include file to detect function clone.
(flag_force_vectorize): New variable.
(edge_predicate): New function.
(set_edge_predicate): New function.
(convert_name_to_cmp): New function.
(add_to_predicate_list): Check unconditionally that bb is always
executed to early exit. Use predicate of cd-equivalent block
for join blocks if it exists.
(add_to_dst_predicate_list): Invoke add_to_predicate_list if
destination block of edge is not always executed. Set-up predicate
for critical edge.
(if_convertible_phi_p): Accept phi nodes with more than two args
if FLAG_FORCE_VECTORIZE was set-up.
(ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
(if_convertible_stmt_p): Fix up pre-function comments.
(all_edges_are_critical): New function.
(if_convertible_bb_p): Allow bb has more than two predecessors if
FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
to reject block if-conversion with incoming critical edges only if
FLAG_FORCE_VECTORIZE was not set-up.
(predicate_bbs): Skip loop exit block also. Add check that if
fold_build2 produces bool conversion, recompute predicate using
build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
(if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
(find_phi_replacement_condition): Extend function interface:
it returns NULL if given phi node must be handled by means of
extended phi node predication. If number of predecessors of phi-block
is equal 2 and atleast one incoming edge is not critical original
algorithm is used.
(get_predicate_for_edge): New function.
(find_insertion_point): New function.
(predicate_arbitrary_scalar_phi): New function.
(predicate_all_scalar_phis): Introduce new variable BEFORE.
Invoke find_insertion_point to initialize gsi and
predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
that extended predication must be applied).
(insert_gimplified_predicates): Add test for non-predicated basic
blocks that there are no gimplified statements to insert. Insert
predicates at the block begining for extended if-conversion.
(tree_if_conversion): Initialize flag_force_vectorize from current
loop or outer loop (to support pragma omp declare).Do loop versioning
for innermost loop marked with pragma omp simd and
FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
for blocks with two successors.




2014-09-08 17:10 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard!
 Here is updated patch with the following changes:

 1. Any restrictions on phi-function were eliminated for extended conversion.
 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
 negate_predicate was deleted.
 3. Deleted splitting of critical edges, i.e. both outgoing edges can
 be critical.
 4. Use notion of cd-equivalence to set-up predicate for join basic
 blocks to simplify it.
 5. I decided to not design pre-pass since it will lead generating
 chain of cond expressions for phi-node if conversion, whereas for phi
 of kind
   x = PHI 1(2), 1(3), 2(4)
 only one cond expression is required and this is considered as simple
 optimization for arbitrary phi-function. More precise,
 if phi-function have only two different arguments and one of them has
 single occurrence, if- conversion is performed as if phi have only 2
 arguments.
 For arbitrary phi function a chain of cond expressions is produced.

 Updated patch is attached.

 Any comments will be appreciated.

 The patch is still very big and does multiple things at once which makes
 it hard to review.

 In addition to that it changes function singatures without updating
 the function comments.  For example what is the convert_bool
 argument doing to add_to_dst_predicate_list?  Why do we need
 all

Re: [GOOGLE] Fix LIPO COMDAT fixup and gcov-tool interactions

2014-09-22 Thread Nathan Sidwell


On 09/21/14 18:58, Xinliang David Li wrote:


the intent is that that points to the gcov_info object of the object file
containing the live version of the function.  I couldn't quite get this to
work though -- it involves emitting a function's gcov_fn_info decl in the
same comdat group as the function itself.


Another problem is that comdat functions may have different CFGs due
to different early inline decisions. Comdatting gcov counters can lead
to problems in profile use. Not comdatting profile counters have
another advantage -- it allows context sensitive profiling for comdat
function inline instances (IPA-inline).


IIRC early inlining is done before the counters are created.  You're right later 
inlining may be a problem, and require a non-comdat set of cloned counters.   I 
can't recall exactly at what stage the counters are now inserted relative to 
inlining.  The CFG machinery had a number of significant changes while, and 
shortly after, I was working on this.



You'll see the checking of gfi_ptr-key != gi_ptr in libgcov-driver.c.

Are you making use of this machinery, or inventing new machinery?


Teresa's method is a different machinery -- it tries to propagate
profile data from the selected comdat copy + inline instance copies to
comdat copies with zero counts.


It'd be preferrable to complete the mechanism I outline above, rather than have 
a competing mechanism.  Also, this patch  is in effect lying because the data 
then makes it look like the unselected comdat instances are in fact being 
executed -- looking at the whole program it's going to be harder to understand 
whether the different inline instances are being executed multiple times, or are 
duplicate data.  Does the gcov user output indicate this subtlety in some way?


nathan

Re: ptx preliminary address space fixes [4/4]

On Wed, Sep 17, 2014 at 10:15 PM, Bernd Schmidt ber...@codesourcery.com wrote:
 On 09/11/2014 01:41 PM, Richard Biener wrote:

 On Thu, Sep 11, 2014 at 12:12 PM, Bernd Schmidt ber...@codesourcery.com
 wrote:

 This one isn't a wrong-code issue, just a missed optimization.  The
 strlen
 optimizations need to be made to look through ADDR_SPACE_CONVERT_EXPR to
 work on ptx.

 Bootstrapped and tested together with the other patches on x86_64-linux.
 Ok?


 Did you try adding ADDR_SPACE_CONVERT_EXPR to the tree codes
 handled in gimple_assign_cast_p?


 I did now (full test on x86_64, and also tested with ptx), and that also
 appears to work.  Ok?

Ok.

Thanks,
Richard.


 Bernd

Re: ptx preliminary rtl patches [3/4]

On Thu, Sep 18, 2014 at 8:57 PM, Bernd Schmidt ber...@codesourcery.com wrote:
 On 09/12/2014 10:04 AM, Richard Biener wrote:

 On Thu, Sep 11, 2014 at 6:36 PM, Bernd Schmidt ber...@codesourcery.com
 wrote:

 I strongly disagree. It's the same as for any other integer - there's one
 sign bit, and since there aren't any other bits, the number of sign bit
 copies is always exactly 1.


 I agree about that.  But I fail to see what goes wrong with the existing
 code in combine.  Maybe the code simply doesn't work for
 GET_MODE_PRECISION != GET_MODE_BITSIZE?


 I had to debug it again - the patch was a year old. This time I came to the
 conclusion that we're just using the wrong mode. We're trying to simplify
 (ne:BI (reg:HI x) (const_int 0)), and the code here was using BImode when
 calling num_sign_bit_copies for the register - what I think it wants to do
 is verify that the operand consists of all zeros or all ones.

 Digging a bit further I noticed that some of the cases around this code have
 a mode == GET_MODE (op0) test. These were added by rth in commit 3573fd048,
 which added BImode. It looks like this particular case slipped through the
 cracks. The easiest way to fix it is the below - bootstrapped and tested on
 x86_64-linux, ok if it also works with ptx?

Ok.

Thanks,
Richard.


 Bernd

Re: Small fix for walking constructors

On Thu, Sep 18, 2014 at 10:38 PM, Jeff Law l...@redhat.com wrote:
 On 09/18/14 13:01, Bernd Schmidt wrote:

 This fixes an issue on ptx where we fail to output a declaration for a
 variable. The testcase is c-torture/compile/pr34856.c, and the cause of
 the problem is that the variable g is never inserted into the varpool,
 which is where a future patch will look for references to variables not
 defined in the current translation unit (ptx assembly requires
 declarations for these too).

 Bootstrapped and tested on x86_64-linux, ok?


 Bernd

 walk-more.diff


 commit 968a508fdd5c413147b9c26d37633bf7ab7a7e65
 Author: Bernd Schmidtber...@codesourcery.com
 Date:   Thu Sep 11 14:35:01 2014 +0200

  Fix handling of CONSTRUCTORs in gimple-walk.

 * gimple-walk.c (walk_stmt_load_store_addr_ops): Look past casts
 when
 dealing with CONSTRUCTORs.

 OK.

Errr - certainly not.

It seems to me that walk_stmt_load_store_addr_ops is called on
bogus input.  The function is supposed to be called on GIMPLE
stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have
conversions in their elements.

Please revert if you have applied already.

Thanks,
Richard.

 Jeff

Re: [PATCH] PR63300 'const volatile' sometimes stripped in debug info.

2014-09-22 Thread Andreas Arnez

On Sat, Sep 20 2014, Mark Wielaard wrote:

 When adding DW_TAG_restrict_type I made a mistake when updating the
 code that handled types with multiple modifiers. This patch fixes it
 by putting the logic for finding the sub-qualified type in a separate
 function and fall back to adding the modifiers separately if there is
 no such existing type. The old tests didn't catch this case because
 there always was an existing sub-qualified type already. The new testcase
 fails before and succeeds after this patch.

 gcc/ChangeLog

   * dwarf2out.c (existing_sub_qualified_type): New function.
   (modified_type_die): Use existing_sub_qualified_type. Fall
   back to adding modifiers one by one of there is no existing
   sub-qualified type.

 gcc/testsuite/ChangeLog

   * gcc.dg/guality/pr63300-const-volatile.c: New testcase.
 ---
  gcc/dwarf2out.c| 85 
 ++
  .../gcc.dg/guality/pr63300-const-volatile.c| 12 +++
  2 files changed, 84 insertions(+), 13 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/guality/pr63300-const-volatile.c

 diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
 index e87ade2..0cbc316 100644
 --- a/gcc/dwarf2out.c
 +++ b/gcc/dwarf2out.c
 @@ -10461,6 +10461,51 @@ decl_quals (const_tree decl)
? TYPE_QUAL_VOLATILE : TYPE_UNQUALIFIED));
  }
  
 +/* Returns true if CV_QUALS contains QUAL and we have a qualified
 +   variant of TYPE that has at least one other qualifier found in
 +   CV_QUALS.  Returns false if CV_QUALS doesn't contain QUAL, if
 +   CV_QUALS is empty after subtracting QUAL, or if we don't have a
 +   TYPE that has at least one qualifier from CV_QUALS minus QUAL.  */
 +static bool
 +existing_sub_qualified_type (tree type, int cv_quals, int qual)
 +{
 +  int sub_qual, sub_quals = cv_quals  ~qual;
 +  if ((cv_quals  qual) == TYPE_UNQUALIFIED || sub_quals == TYPE_UNQUALIFIED)
 +return false;
 +
 +  sub_qual = TYPE_QUAL_CONST;
 +  if ((sub_quals  ~sub_qual) != TYPE_UNQUALIFIED
 +   get_qualified_type (type, sub_quals  ~sub_qual) != NULL_TREE)
 +return true;
 +
 +  sub_qual = TYPE_QUAL_VOLATILE;
 +  if ((sub_quals  ~sub_qual) != TYPE_UNQUALIFIED
 +   get_qualified_type (type, sub_quals  ~sub_qual) != NULL_TREE)
 +return true;
 +
 +  sub_qual = TYPE_QUAL_RESTRICT;
 +  if ((sub_quals  ~sub_qual) != TYPE_UNQUALIFIED
 +   get_qualified_type (type, sub_quals  ~sub_qual) != NULL_TREE)
 +return true;
 +
 +  sub_qual = TYPE_QUAL_CONST  TYPE_QUAL_VOLATILE;

You probably mean '|' instead of '' here.

 +  if ((sub_quals  ~sub_qual) != TYPE_UNQUALIFIED
 +   get_qualified_type (type, sub_quals  ~sub_qual) != NULL_TREE)
 +return true;
 +
 +  sub_qual = TYPE_QUAL_CONST  TYPE_QUAL_RESTRICT;

See above.

 +  if ((sub_quals  ~sub_qual) != TYPE_UNQUALIFIED
 +   get_qualified_type (type, sub_quals  ~sub_qual) != NULL_TREE)
 +return true;
 +
 +  sub_qual = TYPE_QUAL_VOLATILE  TYPE_QUAL_RESTRICT;

See above.

 +  if ((sub_quals  ~sub_qual) != TYPE_UNQUALIFIED
 +   get_qualified_type (type, sub_quals  ~sub_qual) != NULL_TREE)
 +return true;
 +
 +  return false;
 +}

IIUC, 'sub_qual' above represents the qualifiers to *omit* from the ones
we're interested in, right?  Maybe it would be more straightforward to
reverse the logic, i.e., start with

sub_qual = TYPE_QUAL_VOLATILE | TYPE_QUAL_RESTRICT;

and then always use sub_qual instead of ~sub_qual.

Also note that the logic wouldn't scale too well for yet more
qualifiers...

 +
  /* Given a pointer to an arbitrary ..._TYPE tree node, return a debugging
 entry that chains various modifiers in front of the given type.  */
  
 @@ -10543,34 +10588,48 @@ modified_type_die (tree type, int cv_quals, 
 dw_die_ref context_die)
  
mod_scope = scope_die_for (type, context_die);
  
 -  if ((cv_quals  TYPE_QUAL_CONST)
 -  /* If there are multiple type modifiers, prefer a path which
 -  leads to a qualified type.  */
 -   (((cv_quals  ~TYPE_QUAL_CONST) == TYPE_UNQUALIFIED)
 -   || get_qualified_type (type, cv_quals) == NULL_TREE
 -   || (get_qualified_type (type, cv_quals  ~TYPE_QUAL_CONST)
 -   != NULL_TREE)))
 +  /* If there are multiple type modifiers, prefer a path which
 + leads to a qualified type.  */
 +  if (existing_sub_qualified_type (type, cv_quals, TYPE_QUAL_CONST))
  {
mod_type_die = new_die (DW_TAG_const_type, mod_scope, type);
sub_die = modified_type_die (type, cv_quals  ~TYPE_QUAL_CONST,
  context_die);
  }
 -  else if ((cv_quals  TYPE_QUAL_VOLATILE)
 - (((cv_quals  ~TYPE_QUAL_VOLATILE) == TYPE_UNQUALIFIED)
 -|| get_qualified_type (type, cv_quals) == NULL_TREE
 -|| (get_qualified_type (type, cv_quals  ~TYPE_QUAL_VOLATILE)
 -!= NULL_TREE)))
 +  else if (existing_sub_qualified_type (type, cv_quals, TYPE_QUAL_VOLATILE))
  {
mod_type_die = new_die

Re: Small fix for walking constructors

On Mon, Sep 22, 2014 at 10:58 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Thu, Sep 18, 2014 at 10:38 PM, Jeff Law l...@redhat.com wrote:
 On 09/18/14 13:01, Bernd Schmidt wrote:

 This fixes an issue on ptx where we fail to output a declaration for a
 variable. The testcase is c-torture/compile/pr34856.c, and the cause of
 the problem is that the variable g is never inserted into the varpool,
 which is where a future patch will look for references to variables not
 defined in the current translation unit (ptx assembly requires
 declarations for these too).

 Bootstrapped and tested on x86_64-linux, ok?


 Bernd

 walk-more.diff


 commit 968a508fdd5c413147b9c26d37633bf7ab7a7e65
 Author: Bernd Schmidtber...@codesourcery.com
 Date:   Thu Sep 11 14:35:01 2014 +0200

  Fix handling of CONSTRUCTORs in gimple-walk.

 * gimple-walk.c (walk_stmt_load_store_addr_ops): Look past casts
 when
 dealing with CONSTRUCTORs.

 OK.

 Errr - certainly not.

 It seems to me that walk_stmt_load_store_addr_ops is called on
 bogus input.  The function is supposed to be called on GIMPLE
 stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have
 conversions in their elements.

 Please revert if you have applied already.

For the testcase I can indeed see


  bb 2:
  pin_3 = {(unsigned int) (long int) g[16]};

but that's invalid GIMPLE, unfortunately not caught by out checker.

Please fix the root cause and add checking to verify_gimple_assign_single.

Thanks,
Richard.




 Thanks,
 Richard.

 Jeff

Re: [PATCH 0/5] Fix handling of word subregs of wide registers

2014-09-22 Thread Andrew Pinski

On Thu, Sep 18, 2014 at 3:07 AM, Richard Sandiford
richard.sandif...@arm.com wrote:
 This series is a cleaned-up version of:

 https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html

 The underlying problem is that the semantics of subregs depend on the
 word size.  You can't have a subreg for byte 2 of a 4-byte word, say,
 but you can have a subreg for word 2 of a 4-word value (as well as lowpart
 subregs of that word, etc.).  This causes problems when an architecture has
 wider-than-word registers, since the addressability of a word can then depend
 on which register class is used.

 The register allocators need to fix up cases where a subreg turns out to
 be invalid for a particular class.  This is really an extension of what
 we need to do for CANNOT_CHANGE_MODE_CLASS.

 Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.


This sounds like something which should be tested on spu as it is the
main target that I can think of which has wider-than-word registers
and that has had issues with subreg.  I can't remember if the
simulator for SPU is free (as in beer) and would run on anything
besides PowerPC.  It has been more than 4 years since I looked into
the spu back-end also.

Thanks,
Andrew Pinski


 Thanks,
 Richard

Re: [BUILDROBOT] genrecog fix uncovers problem in bfin.md (was: [Patch] Teach genrecog/genoutput that scratch registers require write constraint modifiers)

2014-09-22 Thread Jan-Benedict Glaw

On Mon, 2014-09-22 08:58:34 +0100, James Greenhalgh james.greenha...@arm.com 
wrote:
 On Sat, Sep 20, 2014 at 08:40:01PM +0100, Jan-Benedict Glaw wrote:
  My build robot shows a new build error, which I guess is
  caused/uncovered by your genrecog change on bfin-elf (see eg. build
  http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=355667):
[...]
  build/genrecog /home/jbglaw/repos/gcc/gcc/common.md 
  /home/jbglaw/repos/gcc/gcc/config/bfin/bfin.md \
insn-conditions.md  tmp-recog.c
  /home/jbglaw/repos/gcc/gcc/config/bfin/bfin.md:1971: constraints not 
  supported in define_split
  make[1]: *** [s-recog] Error 1
  make[1]: Leaving directory `/home/jbglaw/build/bfin-elf/build-gcc/gcc'
  make: *** [all-gcc] Error 2
  
  Would be nice if the bfin maintainer or you would come up with a fix.
 
 I posted a fix for this on Friday evening at:
 
   https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01682.html
 
 I'm waiting for a bfin maintainer to say OK, as it isn't a port I know
 well.

Ah great!  I somehow missed to recognize your email, sorry for the
noise.

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of:http://www.chiark.greenend.org.uk/~sgtatham/bugs.html
the second  :


signature.asc
Description: Digital signature

Re: Small fix for walking constructors

2014-09-22 Thread Bernd Schmidt


On 09/22/2014 11:00 AM, Richard Biener wrote:

It seems to me that walk_stmt_load_store_addr_ops is called on
bogus input.  The function is supposed to be called on GIMPLE
stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have
conversions in their elements.

Please revert if you have applied already.


For the testcase I can indeed see


   bb 2:
   pin_3 = {(unsigned int) (long int) g[16]};

but that's invalid GIMPLE, unfortunately not caught by out checker.

Please fix the root cause and add checking to verify_gimple_assign_single.


Hmm, fix how exactly? What representation do you want for an initializer 
where a pointer is cast to an int (or to a different address space, 
something that will be possible with patches I'll submit in the near 
future)?



Bernd

Re: Small fix for walking constructors

On Mon, Sep 22, 2014 at 11:00 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Mon, Sep 22, 2014 at 10:58 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Thu, Sep 18, 2014 at 10:38 PM, Jeff Law l...@redhat.com wrote:
 On 09/18/14 13:01, Bernd Schmidt wrote:

 This fixes an issue on ptx where we fail to output a declaration for a
 variable. The testcase is c-torture/compile/pr34856.c, and the cause of
 the problem is that the variable g is never inserted into the varpool,
 which is where a future patch will look for references to variables not
 defined in the current translation unit (ptx assembly requires
 declarations for these too).

 Bootstrapped and tested on x86_64-linux, ok?


 Bernd

 walk-more.diff


 commit 968a508fdd5c413147b9c26d37633bf7ab7a7e65
 Author: Bernd Schmidtber...@codesourcery.com
 Date:   Thu Sep 11 14:35:01 2014 +0200

  Fix handling of CONSTRUCTORs in gimple-walk.

 * gimple-walk.c (walk_stmt_load_store_addr_ops): Look past casts
 when
 dealing with CONSTRUCTORs.

 OK.

 Errr - certainly not.

 It seems to me that walk_stmt_load_store_addr_ops is called on
 bogus input.  The function is supposed to be called on GIMPLE
 stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have
 conversions in their elements.

 Please revert if you have applied already.

 For the testcase I can indeed see


   bb 2:
   pin_3 = {(unsigned int) (long int) g[16]};

 but that's invalid GIMPLE, unfortunately not caught by out checker.

 Please fix the root cause and add checking to verify_gimple_assign_single.

I'm on it.  The reason for the invalid GIMPLE is the gimplifier which
says well, looks like a constant for the target! and doesn't gimplify
at all then.  Oops (but only for vectors?!).

Introduced by r118747 and only semi-fixed by r129739.  The original
rev. is also just a bugfix for an ICE.

Thus I am testing the following.

2014-09-22  Richard Biener  rguent...@suse.de

* gimplify.c (gimplify_init_constructor): Do not leave
non-GIMPLE vector constructors around.
* tree-cfg.c (verify_gimple_assign_single): Verify that
CONSTRUCTORs have gimple elements.


Richard.






 Thanks,
 Richard.




 Thanks,
 Richard.

 Jeff



p
Description: Binary data

Re: Small fix for walking constructors

On Mon, Sep 22, 2014 at 11:52 AM, Bernd Schmidt ber...@codesourcery.com wrote:
 On 09/22/2014 11:00 AM, Richard Biener wrote:

 It seems to me that walk_stmt_load_store_addr_ops is called on
 bogus input.  The function is supposed to be called on GIMPLE
 stmts and in GIMPLE stmts CONSTRUCTORs may _not_ have
 conversions in their elements.

 Please revert if you have applied already.


 For the testcase I can indeed see


bb 2:
pin_3 = {(unsigned int) (long int) g[16]};

 but that's invalid GIMPLE, unfortunately not caught by out checker.

 Please fix the root cause and add checking to verify_gimple_assign_single.


 Hmm, fix how exactly? What representation do you want for an initializer
 where a pointer is cast to an int (or to a different address space,
 something that will be possible with patches I'll submit in the near
 future)?

For the above

  _4 = (long int) g[16];
  _5 = (unsigned int) _4;
  pin_3 = { _5 };

it's GIMPLE after all, not GENERIC.

Richard.


 Bernd

Re: [Patch bfin] Fixup use of constraints in define_split

2014-09-22 Thread Bernd Schmidt


On 09/19/2014 11:32 PM, James Greenhalgh wrote:

As with the earlier patch for sh
( https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01627.html ), this fixes the
fallout caused by https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01615.html.

These are build failures, and the fixes are obvious, but I don't know
my way around the failing ports, so I'd like an explicit maintainer ack.

For testing, I've just checked that the build error is resolved.


Looks obvious to me too. Thanks!


Bernd

Re: [PATCH] gcc-gdb-test.exp: Handle old GDB short int and long int types.

On Sat, Sep 20, 2014 at 11:21:25PM +0200, Mark Wielaard wrote:
 Old GDB might show short and long as short int and long int. This made 
 gcc.dg/guality/const-volatile.c ans restrict.c fail on older GDBs.
 According to the patch that changed this in newer versions of GDB
 this was a bug: https://sourceware.org/ml/gdb-patches/2012-09/msg00455.html
 
 The patch transforms the types short int and long int coming from
 GDB to plain short and long. And a variant has been added to the
 const-volatile.c testcase to make sure short and long long are handled
 correctly now with older GDB.
 
 Tested against GDB 7.7.1 and 7.4.50.
 
 gcc/testsuite/ChangeLog
 
   * lib/gcc-gdb-test.exp (gdb-test): Transform gdb types short int
   and long int to plain short and long.
   * gcc.dg/guality/const-volatile.c (struct bar): New struct
   containing short and long long fields.
   (bar): New variable to test the type.

Ok, with a minor nit:

 --- a/gcc/testsuite/lib/gcc-gdb-test.exp
 +++ b/gcc/testsuite/lib/gcc-gdb-test.exp
 @@ -111,6 +111,10 @@ proc gdb-test { args } {
   # Squash all extra whitespace/newlines that gdb might use for
   # pretty printing into one so result is just one line.
   regsub -all {[\n\r\t ]+} $type   type
 + # Old gdb might output long int instead of just long
 + # and short int instead of just short. Canonicalize.
 +regsub -all {\mlong int\M} $type long type
 +regsub -all {\mshort int\M} $type short type

Please fix whitespace on the above 2 lines, should be tab + 4 spaces
instead of 12 spaces.

Jakub

Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-22 Thread Jiong Wang


On 19/09/14 17:19, Jeff Law wrote:


On 09/19/14 10:02, Jiong Wang wrote:

On 19/09/14 16:49, Jeff Law wrote:


Probably.  Though I'd be a bit concerned with next_block-next_bb.
Wouldn't it be safer to stash away the relevant basic block prior to the
call to split_edge, then use that saved copy.  Something like this
(untested):

basic_block old_dest = live_edge-dest;
next_block = split_edge (live_edge);

/* We create a new basic block.  Call df_grow_bb_info to make sure
  all data structures are allocated.  */
df_grow_bb_info (df_live);
bitmap_and (df_get_live_in (next_block),
   df_get_live_out (bb),
   df_get_live_in (old_dest));


The idea being we don't want to depend on the precise ordering blocks in
the block chain.

Could you try that and see if it does what you need?

Jeff,

Thanks, verified, it works.

Great.  Can you send an updated patchkit for review.


patch attached.

please review, thanks.

gcc/
  * shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the live-in of
  new created BB as the intersection of live-in from old_dest and live-out
  from bb.
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index fd24135..63deadf 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -217,12 +217,15 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
   if (!df_live)
 	return false;

+  basic_block old_dest = live_edge-dest;
   next_block = split_edge (live_edge);

   /* We create a new basic block.  Call df_grow_bb_info to make sure
 	 all data structures are allocated.  */
   df_grow_bb_info (df_live);
-  bitmap_copy (df_get_live_in (next_block), df_get_live_out (bb));
+
+  bitmap_and (df_get_live_in (next_block), df_get_live_out (bb),
+		  df_get_live_in (old_dest));
   df_set_bb_dirty (next_block);

   /* We should not split more than once for a function.  */

Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-22 Thread Jiong Wang


On 19/09/14 21:43, Jeff Law wrote:


On 09/15/14 08:33, Jiong Wang wrote:

Jeff,

   thanks, I partially understand your meaning here.

take the function ira_implicitly_set_insn_hard_regs in ira-lives.c
for example,

when generating address rtl, gcc will automatically generate const
operator to prefix
the address expression, like the following. so a simple CONSTANT_P
check is enough in
case there is no embedded register.

(insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
  (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40]
var_decl 0x2b2c2ff91510 recog_data)
  (const_int 480 [0x1e0] -1


but for architecture like aarch64, the following instruction
sequences to forming address
may be generated

(insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
  (high:DI (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
0x7ff755a60900 stats))) 35 {*movdi_aarch64}
   (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags
0xc0]  var_decl 0x7ff755a60900 stats))
  (nil)))

(insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
  (lo_sum:DI (reg/f:DI 20 x20 [99])
  (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
0x7ff755a60900 stats))) {add_losym_di}
   (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0]
var_decl 0x7ff755a60900 stats)
  (nil)))

   while CONSTANT_P could not catch the latter lo_sum case, as the
RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ,

Hmm, it's been ~15 years since I regularly worked on a target that uses
HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST
as well, but reading the docs for CONST, that clearly isn't the case.

Sorry for that.  Can you (re) send your current patch for this for review?


patch attached.

please review, thanks.

gcc/
  * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when !REG_P 
(src) to
  release more instruction sink opportunities.

gcc/testsuite/
  * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase.
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index fd24135..739e957 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include bb-reorder.h
 #include shrink-wrap.h
 #include regcprop.h
+#include rtl-iter.h

 #ifdef HAVE_simple_return

@@ -169,7 +170,9 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 {
   rtx set, src, dest;
   bitmap live_out, live_in, bb_uses, bb_defs;
-  unsigned int i, dregno, end_dregno, sregno, end_sregno;
+  unsigned int i, dregno, end_dregno;
+  unsigned int sregno = FIRST_PSEUDO_REGISTER;
+  unsigned int end_sregno = FIRST_PSEUDO_REGISTER;
   basic_block next_block;
   edge live_edge;

@@ -179,7 +182,34 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 return false;
   src = SET_SRC (set);
   dest = SET_DEST (set);
-  if (!REG_P (dest) || !REG_P (src)
+
+  if (!REG_P (src))
+{
+  unsigned int reg_num = 0;
+  unsigned int nonconstobj_num = 0;
+  rtx src_inner = NULL_RTX;
+
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, src, ALL)
+	{
+	  rtx x = *iter;
+	  if (REG_P (x))
+	{
+	  reg_num++;
+	  src_inner = x;
+	}
+	  else if (!CONSTANT_P (x)  OBJECT_P (x))
+	nonconstobj_num++;
+	}
+
+  if (nonconstobj_num  0
+	  || reg_num  1)
+	src = NULL_RTX;
+  else if (reg_num == 1)
+	src = src_inner;
+}
+
+  if (!REG_P (dest) || src == NULL_RTX
   /* STACK or FRAME related adjustment might be part of prologue.
 	 So keep them in the entry block.  */
   || dest == stack_pointer_rtx
@@ -188,10 +218,13 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 return false;

   /* Make sure that the source register isn't defined later in BB.  */
-  sregno = REGNO (src);
-  end_sregno = END_REGNO (src);
-  if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno))
-return false;
+  if (REG_P (src))
+{
+  sregno = REGNO (src);
+  end_sregno = END_REGNO (src);
+  if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno))
+	return false;
+}

   /* Make sure that the destination register isn't referenced later in BB.  */
   dregno = REGNO (dest);
diff --git a/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c
new file mode 100644
index 000..ad2e588
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-rtl-pro_and_epilogue } */
+
+extern char *asm_out_file;
+extern void default_elf_asm_output_ascii (char *, const char *, int);
+
+void
+assemble_string (const char *p, int size)
+{
+  int pos = 0;
+  int maximum = 2000;
+
+  while (pos  size)
+{
+  int thissize = size - pos;
+
+  if (thissize  maximum)
+	thissize = maximum;
+
+  default_elf_asm_output_ascii (asm_out_file, p,

Re: [PATCH] remove duplicated lines in gcc/fortran/resolve.c

2014-09-22 Thread Dominique d'Humières


Le 21 sept. 2014 à 10:44, FX fxcoud...@gmail.com a écrit :

 AFAICT the lines 11200-11222 in gcc/fortran/resolve.c are a copy of
 the lines 11176-11198.
 
 The duplicates were introduced by revision 126468, an unrelated patch, after 
 the original commit of the code as 126466. It looks like a svn/patch mishap.

Thanks for tracking the origin of the problem. After having tested a patch, I 
usually do not revert it before doing the svn update when it is committed. If a 
last minute change has been made between the patch and the commit, it results 
in general a conflict during the svn update. However I have seen a couple times 
the patched tree is simply merged with the update leading to a duplicated piece 
of code.

For the record, while the two blocks were identical from a functional pout of 
view, their formatting were slightly different:

 PUBLIC interface '%s' at %L takes 
 dummy arguments of '%s' which is 
 PRIVATE, iface-sym-name, 
---
 PUBLIC interface '%s' at %L 
 takes dummy arguments of '%s' which 
 is PRIVATE, iface-sym-name, 

 
 The following patch removes the duplicated
 lines. OK for the trunk?
 
 You didn’t say if it was regtested. OK if it was (one can never be too sure!)

The patch has been in my working tree for months. I have posted recent test 
results with the patch (and r211089 reverted to fix pr61387) at

https://gcc.gnu.org/ml/gcc-testresults/2014-09/msg02018.html

 
 Thanks,
 FX

This is what I have committed as revision r215452

Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog   (revision 215451)
+++ gcc/fortran/ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2014-09-21  Dominique d'Humieres domi...@lps.ens.fr
+
+   * resolve.c (resolve_fl_procedure): Remove duplicated lines.
+
 2014-09-20  Alessandro Fanfarillo  fanfarillo@gmail.com
Tobias Burnus  bur...@net-b.de
 
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c   (revision 215451)
+++ gcc/fortran/resolve.c   (working copy)
@@ -11196,30 +11196,6 @@
}
 }
}
-
-  /* PUBLIC interfaces may expose PRIVATE procedures that take types
-PRIVATE to the containing module.  */
-  for (iface = sym-generic; iface; iface = iface-next)
-   {
- for (arg = gfc_sym_get_dummy_args (iface-sym); arg; arg = arg-next)
-   {
- if (arg-sym
-  arg-sym-ts.type == BT_DERIVED
-  !arg-sym-ts.u.derived-attr.use_assoc
-  !gfc_check_symbol_access (arg-sym-ts.u.derived)
-  !gfc_notify_std (GFC_STD_F2003, Procedure '%s' in 
- PUBLIC interface '%s' at %L takes 
- dummy arguments of '%s' which is 
- PRIVATE, iface-sym-name, 
- sym-name, iface-sym-declared_at, 
- gfc_typename(arg-sym-ts)))
-   {
- /* Stop this message from recurring.  */
- arg-sym-ts.u.derived-attr.access = ACCESS_PUBLIC;
- return false;
-   }
-}
-   }
 }
 
   if (sym-attr.function  sym-value  sym-attr.proc != PROC_ST_FUNCTION

Thanks for the review,

Dominique

Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result

On Thu, Sep 18, 2014 at 1:50 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes.

 These are presently documented as producing a vector with the result in
 element 0, and this is inconsistent with their use in tree-vect-loop.c
 (which on bigendian targets pulls the bits out of the wrong end of the
 vector result). This leads to bugs on bigendian targets - see also
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114.

 I discounted fixing the vectorizer (to read from element 0) and then
 making bigendian targets (whose architectural insn produces the result in
 lane N-1) permute the result vector, as optimization of vectors in RTL seems
 unlikely to remove such a permute and would lead to a performance
 regression.

 Instead it seems more natural for the tree code to produce a scalar result
 (producing a vector with the result in lane 0 has already caused confusion,
 e.g. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html).

 However, this patch preserves the meaning of the optab (producing a result
 in lane 0 on little-endian architectures or N-1 on bigendian), thus
 generally avoiding the need to change backends. Thus, expr.c extracts an
 endianness-dependent element from the optab result to give the result
 expected for the tree code.

 Previously posted as an RFC
 https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html , now with an extra
 VIEW_CONVERT_EXPR if the types of the reduction/result do not match.

Huh.  Does that ever happen?  Please use a NOP_EXPR instead of
a VIEW_CONVERT_EXPR.

Ok with that change.

Thanks,
Richard.

 Testing:
 x86_86-none-linux-gnu: bootstrap, check-gcc, check-g++
 aarch64-none-linux-gnu: bootstrap
 aarch64-none-elf:  check-gcc, check-g++
 arm-none-eabi: check-gcc

 aarch64_be-none-elf: check-gcc, showing
 FAIL-PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
 FAIL-PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test
 Passes the (previously-failing) reduced testcase on
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114

 Have also assembler/stage-1 tested that testcase on PowerPC, also
 fixed.

 gcc/ChangeLog:

 * expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add
 extract_bit_field around optab result.

 * fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR,
 produce
 scalar not vector.

 * tree-cfg.c (verify_gimple_assign_unary): Check result vs operand
 type
 for REDUC_{MIN,MAX,PLUS}_EXPR.

 * tree-vect-loop.c (vect_analyze_loop): Update comment.
 (vect_create_epilog_for_reduction): For direct vector reduction, use
 result of tree code directly without extract_bit_field.

 * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update
 comment.

[committed] Fix -fcompare-debug issue in simd clone creation (PR debug/63328)

Hi!

Obviously, it is a bad idea to emit gimple assign stmts for temporaries
needed by debug stmts, we have to emit debug temporaries instead, otherwise
we generate different code between -g0 and -g.
Tested on x86_64-linux, committed to trunk/4.9.

2014-09-22  Jakub Jelinek  ja...@redhat.com

PR debug/63328
* omp-low.c (ipa_simd_modify_stmt_ops): For debug stmts
insert a debug source bind stmt setting DEBUG_EXPR_DECL
instead of a normal gimple assignment stmt.

* c-c++-common/gomp/pr63328.c: New test.

--- gcc/omp-low.c.jj2014-09-08 22:12:46.0 +0200
+++ gcc/omp-low.c   2014-09-22 11:44:47.751338842 +0200
@@ -11717,9 +11717,22 @@ ipa_simd_modify_stmt_ops (tree *tp, int
   if (tp != orig_tp)
 {
   repl = build_fold_addr_expr (repl);
-  gimple stmt
-   = gimple_build_assign (make_ssa_name (TREE_TYPE (repl), NULL), repl);
-  repl = gimple_assign_lhs (stmt);
+  gimple stmt;
+  if (is_gimple_debug (info-stmt))
+   {
+ tree vexpr = make_node (DEBUG_EXPR_DECL);
+ stmt = gimple_build_debug_source_bind (vexpr, repl, NULL);
+ DECL_ARTIFICIAL (vexpr) = 1;
+ TREE_TYPE (vexpr) = TREE_TYPE (repl);
+ DECL_MODE (vexpr) = TYPE_MODE (TREE_TYPE (repl));
+ repl = vexpr;
+   }
+  else
+   {
+ stmt = gimple_build_assign (make_ssa_name (TREE_TYPE (repl),
+NULL), repl);
+ repl = gimple_assign_lhs (stmt);
+   }
   gimple_stmt_iterator gsi = gsi_for_stmt (info-stmt);
   gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
   *orig_tp = repl;
--- gcc/testsuite/c-c++-common/gomp/pr63328.c.jj2014-09-22 
12:09:50.140724501 +0200
+++ gcc/testsuite/c-c++-common/gomp/pr63328.c   2014-09-22 12:09:45.371745608 
+0200
@@ -0,0 +1,5 @@
+/* PR debug/63328 */
+/* { dg-do compile } */
+/* { dg-options -O2 -fopenmp-simd -fno-strict-aliasing -fcompare-debug } */
+
+#include pr60823-3.c

Jakub

Re: [PATCH 3/14] Add new optabs for reducing vectors to scalars

On Thu, Sep 18, 2014 at 1:54 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 These match their corresponding tree codes, by taking a vector and returning
 a scalar; this is more architecturally neutral than the (somewhat loosely
 defined) previous optab that took a vector and returned a vector with the
 result in the least significant bits (i.e. element 0 for little-endian or
 N-1 for bigendian). However, the old optabs are preserved so as not to break
 existing backends, so clients check for both old + new optabs.

 Bootstrap, check-gcc and check-g++ on x86_64-none-linux-gnu.
 aarch64.exp + vect.exp on aarch64{,_be}-none-elf.
 (of course at this point in the series all these are using the old optab +
 migration path.)

scalar_reduc_to_vector misses a comment.

I wonder if at the end we wouldn't transition all backends and then
renaming reduc_*_scal_optab back to reduc_*_optab makes sense.

The optabs have only one mode - I wouldn't be surprised if an ISA
invents for example v4si - di reduction?  So do we want to make
reduc_plus_scal_optab a little bit more future proof (maybe there
is already an ISA that supports this kind of reduction?).

Otherwise the patch looks good to me.

Thanks,
Richard.

 gcc/ChangeLog:

 * doc/md.texi (Standard Names): Add reduc_(plus,[us](min|max))|scal
 optabs, and note in reduc_[us](plus|min|max) to prefer the former.

 * expr.c (expand_expr_real_2): Use reduc_..._scal if available, fall
 back to old reduc_... + BIT_FIELD_REF only if not.

 * optabs.c (optab_for_tree_code): for REDUC_(MAX,MIN,PLUS)_EXPR,
 return the reduce-to-scalar (reduc_..._scal) optab.
 (scalar_reduc_to_vector): New.

 * optabs.def (reduc_smax_scal_optab, reduc_smin_scal_optab,
 reduc_plus_scal_optab, reduc_umax_scal_optab,
 reduc_umin_scal_optab):
 New.

 * optabs.h (scalar_reduc_to_vector): Declare.

 * tree-vect-loop.c (vectorizable_reduction): Look for optabs
 reducing
 to either scalar or vector.

Re: [PATCH 7/14][Testsuite] Add tests of reductions using whole-vector-shifts (multiplication)

On Thu, Sep 18, 2014 at 2:19 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 For reduction operations (e.g. multiply) that don't have such a tree code
 ,or where the target platform doesn't define an optab handler for the tree
 code, we can perform the reduction using a series of log(N) shifts (where N
 = #elements in vector), using the VEC_RSHIFT_EXPR=whole-vector-shift tree
 code (if the platform handles the vec_shr_optab).

 First stage is to add some tests of non-(min/max/plus) reductions; here,
 multiplies. The first is designed to be non-foldable, so we make sure the
 architectural instructions line up with what the tree codes specify. The
 second is designed to be easily constant-propagated, to test the (currently
 endianness-dependent) constant folding code.

 In lib/target-supports.exp, I've defined a new
 check_effective_target_whole_vector_shift, which I intended to define to
 true for platforms with the vec_shr optab. However, I've not managed to make
 this test pass on PowerPC - even with -maltivec, -fdump-tree-vect-details
 gives me a message about the target not supporting vector multiplication -
 so I've omitted PowerPC from the whole_vector_shift. This doesn't feel
 right, suggestions welcomed from PowerPC maintainers?

 Tests passing on arm-none-eabi and x86_64-none-linux-gnu;
 also verified the scan-tree-dump part works on ia64-none-linux-gnu (by
 compiling to assembly only).
 (Tests are not run on AArch64, because we have no vec_shr_optab at this
 point; PowerPC, as above; or MIPS, as check_effective_target_vect_int_mult
 yields 0.)

Ok.

Thanks,
Richard.

 gcc/testsuite/ChangeLog:

 * lib/target-supports.exp
 (check_effective_target_whole_vector_shift):
 New.

 * gcc.dg/vect/vect-reduc-mul_1.c: New test.
 * gcc.dg/vect/vect-reduc-mul_2.c: New test.

Re: [PATCH 8/14][Testsuite] Add tests of reductions using whole-vector-shifts (ior)

On Thu, Sep 18, 2014 at 2:25 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 These are like the previous patch, but using | rather than * - I was unable
 to get the previous test to pass on PowerPC and MIPS.

 I note there is no inherent vector operation here - a bitwise OR across a
 word, and a reduction via shifts using scalar (not vector) ops would be
 all that's necessary. However, GCC doesn't exploit this possibility at
 present, and I don't have any plans at present to add such myself.

 Passing on x86_64-linux-gnu, aarch64-none-elf, aarch64_be-none-elf,
 arm-none-eabi.
 The 'scan-tree-dump' part passes on mips64 and powerpc (although the latter
 is disabled as check_effective_target_whole_vector_shift gives 0, as per
 previous patch)

Ok.

Thanks,
Richard.

 gcc/testsuite/ChangeLog:

 * gcc.dg/vect/vect-reduc-or_1.c: New test.
 * gcc.dg/vect/vect-reduc-or_2.c: Likewise.

RE: [PATCH 0/5] Fix handling of word subregs of wide registers

2014-09-22 Thread Ajit Kumar Agarwal

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Richard Sandiford
Sent: Monday, September 22, 2014 12:54 PM
To: Jeff Law
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 0/5] Fix handling of word subregs of wide registers

Jeff Law l...@redhat.com writes:
 On 09/19/14 01:23, Richard Sandiford wrote:
 Jeff Law l...@redhat.com writes:
 On 09/18/14 04:07, Richard Sandiford wrote:
 This series is a cleaned-up version of:

   https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html

 The underlying problem is that the semantics of subregs depend on 
 the word size.  You can't have a subreg for byte 2 of a 4-byte 
 word, say, but you can have a subreg for word 2 of a 4-word value 
 (as well as lowpart subregs of that word, etc.).  This causes 
 problems when an architecture has wider-than-word registers, since 
 the addressability of a word can then depend on which register 
 class is used.

 The register allocators need to fix up cases where a subreg turns 
 out to be invalid for a particular class.  This is really an 
 extension of what we need to do for CANNOT_CHANGE_MODE_CLASS.

 Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.
 I thought we fixed these problems long ago with the change to subreg_byte?!?

 No, that was fixing something else.  (I'm just about old enough to 
 remember that too!)  The problem here is that (say):

  (subreg:SI (reg:DI X) 4)

 is independently addressable on little-endian AArch32 if X assigned 
 to a GPR, but not if X is assigned to a vector register.  We need to 
 allow these kinds of subreg on pseudos in order to decompose 
 multiword arithmetic.  It's then up to the RA to realise that a 
 reload would be needed if X were assigned to a vector register, since 
 the upper half of a vector register cannot be independently accessed.

 Note that you could write this example even with the old word-style 
 offsets and IIRC the effect would have been the same.
 OK.  So I kept thinking in terms of the byte offset stuff.  But what 
 you're tackling is related to the mess around the mode of the subreg 
 having a different meaning if its smaller than a word vs word-sized or 
 greater.

 Right?

Yeah, that's right.  Addressability is based on words, which is inconvenient 
when your registers are bigger than a word.

If the architecture like Microblaze which doesn't support  the 1 byte or 2 byte 
registers. In this scenario what should be returned when SUBREG_WORD is used.

Thanks,
Richard

Re: [PATCH 9/14] Enforce whole-vector-shifts to always be by a whole number of elements

On Thu, Sep 18, 2014 at 2:27 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 The VEC_RSHIFT_EXPR is only ever used by the vectorizer in tree-vect-loop.c
 (vect_create_epilog_for_reduction), to shift the vector by a whole number of
 elements. The tree code allows more general shifts but only for integral
 types. This only causes pain and difficulty for backends (particularly for
 backends with different endiannesses), and enforcing that restriction for
 integral types too does no harm.

 bootstrapped on aarch64-none-linux-gnu and x86-64-none-linux-gnu
 check-gcc on aarch64-none-elf and x86_64-none-linux-gnu

Hmm, but then (coming from the tree / gimple level) all shifts can
be expressed with a VEC_PERM_EXPR.  And of course a general
whole-vector shift could be expressed using a VIEW_CONVERT_EXPR
to a 1-element integer vector and a regular [RL]SHIFT_EXPR and then
converting back.

So it seems to me that the vectorizer should instead emit a
VEC_PERM_EXPR (making sure the backends or the generic
vec_perm expansion code in optabs.c handles the whole-vector-shift
case in an optimal way).

The current VEC_RSHIFT_EXPR description lacks information
on what is shifted in btw (always zeros? the most significant bit (endian
dependent?!)).

So - can we instead remove VEC_[LR]SHIFT_EXPR?  Seems that
VEC_LSHIFT_EXPR is unused anyway, and thus vec_shl_optabs
as well.

Thanks,
Richard.

 gcc/ChangeLog:

 * tree-cfg.c (verify_gimple_assign_binary): for VEC_RSHIFT_EXPR (and
 VEC_LSHIFT_EXPR), require shifts to be by a whole number of elements
 for all types, rather than only non-integral types.

 * tree.def (VEC_LSHIFT_EXPR, VEC_RSHIFT_EXPR): Update comment.

 * doc/md.texi (vec_shl_m, vec_shr_m): Update comment.

Re: [PATCH 13/14][AArch64_be] Fix vec_shr pattern to correctly implement endianness-neutral optab

On Thu, Sep 18, 2014 at 2:45 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 The previous patch broke aarch64_be by redefining VEC_RSHIFT_EXPR /
 vec_shr_optab to always shift the vector towards gcc's element 0. This fixes
 aarch64_be to do that.

 check-gcc on aarch64-none-elf (no changes) and aarch64_be-none-elf (fixes
 all regressions produced by previous patch, i.e. no regressions from before
 redefining vec_shr).

Using vector permutes would have avoided this I guess?

Richard.


 gcc/ChangeLog:

 * config/aarch64/aarch64-simd.md (vec_shr_mode *2): Fix bigendian.

Re: [PATCH 11/14] Remove VEC_LSHIFT_EXPR and vec_shl_optab

On Thu, Sep 18, 2014 at 2:35 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 The VEC_LSHIFT_EXPR tree code, and the corresponding vec_shl_optab, seem to
 have been added for completeness, providing a counterpart to VEC_RSHIFT_EXPR
 and vec_shr_optab. However, whereas VEC_RSHIFT_EXPRs are generated (only) by
 the vectorizer, VEC_LSHIFT_EXPR expressions are not generated at all, so
 there seems little point in maintaining it.

 Bootstrapped on x86_64-unknown-linux-gnu.
 aarch64.exp+vect.exp on aarch64-none-elf and aarch64_be-none-elf.

Ah, there it is ;)

Ok.

Thanks,
Richard.

 gcc/ChangeLog:

 * expr.c (expand_expr_real_2): Remove code handling VEC_LSHIFT_EXPR.
 * fold-const.c (const_binop): Likewise.
 * cfgexpand.c (expand_debug_expr): Likewise.
 * tree-inline.c (estimate_operator_cost, dump_generic_node,
 op_code_prio, op_symbol_code): Likewise.
 * tree-vect-generic.c (expand_vector_operations_1): Likewise.
 * optabs.c (optab_for_tree_code): Likewise.
 (expand_vec_shift_expr): Likewise, update comment.
 * tree.def: Delete VEC_LSHIFT_EXPR, remove comment.
 * optabs.h (expand_vec_shift_expr): Remove comment re.
 VEC_LSHIFT_EXPR.
 * optabs.def: Remove vec_shl_optab.
 * doc/md.texi: Remove references to vec_shr_m.

Re: [PATCH 14/14][Vectorizer] Tidy up vect_create_epilog / use_scalar_result

On Thu, Sep 18, 2014 at 2:48 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 Following earlier patches, vect_create_epilog_for_reduction contains exactly
 one case where extract_scalar_result==true. Hence, move the code 'if
 (extract_scalar_result)' there, and tidy-up/remove some variables.

 bootstrapped on x86_64-none-linux-gnu + check-gcc + check-g++.

Ok.

Thanks,
Richard.

 gcc/ChangeLog:

 * tree-vect-loop.c (vect_create_epilog_for_reduction): Move code for
 'if (extract_scalar_result)' to the only place that it is true.

Re: [PATCH 12/14][Vectorizer] Redefine VEC_RSHIFT_EXPR and vec_shr_optab as endianness-neutral

On Thu, Sep 18, 2014 at 2:42 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 The direction of VEC_RSHIFT_EXPR has been endian-dependent, contrary to the
 general principles of tree. This patch updates fold-const and the vectorizer
 (the only place where such expressions are created), such that
 VEC_RSHIFT_EXPR always shifts towards element 0.

 The tree code still maps directly onto the vec_shr_optab, and so this patch
 *will break any bigendian platform defining the vec_shr optab*.
 -- For AArch64_be, patch follows next in series;
 -- For PowerPC, I think patch/rfc 15 should fix, please inspect;
 -- For MIPS, I think patch/rfc 16 should fix, please inspect.

 gcc/ChangeLog:

 * fold-const.c (const_binop): VEC_RSHIFT_EXPR always shifts towards
 element 0.

 * tree-vect-loop.c (vect_create_epilog_for_reduction): always
 extract
 the result of a reduction with vector shifts from element 0.

 * tree.def (VEC_RSHIFT_EXPR, VEC_LSHIFT_EXPR): Comment shift
 direction.

 * doc/md.texi (vec_shr_m, vec_shl_m): Document shift direction.

 Testing Done:

 Bootstrap and check-gcc on x86_64-none-linux-gnu; check-gcc on
 aarch64-none-elf.

As said elsewhere I'd like the vectorizer to use VEC_PERM_EXPRs
and the generic vec_perm expansion machinery handle the
case where the permute can be expressed using the vec_shr_optab.
You'd have, for a 1-element shift of V4SI x, VEC_PERM x, { 0, 0, 0, 0
}, {4, 3, 2, 1 }

I'd say that if the target says it can handle the constant permute just fine
then use the vec_perm_const expansion path.

Richard.

Re: Fix i386 FP_TRAPPING_EXCEPTIONS

On Fri, 19 Sep 2014, Joseph S. Myers wrote:

 On Thu, 18 Sep 2014, Joseph S. Myers wrote:
 
  On Thu, 18 Sep 2014, Uros Bizjak wrote:
  
   OK for mainline and release branches.
  
  I've omitted ia64 from the targets in the testcase in the release branch 
  version, given the lack of any definition of FP_TRAPPING_EXCEPTIONS at all 
  there.
  
  (I think a definition as (~_fcw  0x3f) should work for ia64, but haven't 
  tested that.)
 
 Here is an *untested* patch with that definition.
 
 2014-09-19  Joseph Myers  jos...@codesourcery.com
 
   PR target/63312
   * config/ia64/sfp-machine.h (FE_EX_ALL, FP_TRAPPING_EXCEPTIONS):
   New macros.

Now committed after Andreas's testing reporting in PR 63312.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check


Well, I haven't looked into this in detail: I've gone only as far as
  * swapping emit-rtl.o between 'good' compiles (svn r214042) and 'bad' 
compiles (r214043), finding that the critical difference is in the emit-rtl.o 
generated by r214043;
  *looking at the relocations in the 'bad' emit_rtl.o, seeing new entries 
'fixed_regs + ', and that Richard Biener's changelog specifically 
mentions stripping signedness changes (and introduces the SIGN_NOPS).


However, I apply your patch (minus the hunk adding the (set_attr type load1), 
this appears to have gone in already), and still see the same error message:


emit-rtl.o: In function `gen_rtx_REG':
emit-rtl.c:(.text+0x12f8): relocation truncated to fit: 
R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section 
in regclass.o

emit-rtl.o: In function `gen_rtx':
emit-rtl.c:(.text+0x1824): relocation truncated to fit: 
R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section 
in regclass.o

collect2: error: ld returned 1 exit status

and still see the same (suspicious-looking, although perhaps not convicted) 
relocations:


$ readelf --relocs 
benchspec/CPU2006/403.gcc/build/build_base_test./emit-rtl.o | grep fixed_regs

12a8  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
12ac  005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
12f8  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 

12fc  005d0116 R_AARCH64_LDST8_A  fixed_regs + 

1824  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 

1828  005d0116 R_AARCH64_LDST8_A  fixed_regs + 

186c  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
1870  005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0

I've also now bootstrapped my patch (STRIP_NOPS - STRIP_SIGN_NOPS * 2) on 
aarch64-none-linux-gnu and x86_64-none-linux-gnu, and check-gcc with no 
regressions, so would like to propose that patch for trunk...?


--Alan



Andrew Pinski wrote:

On Thu, Sep 18, 2014 at 9:44 AM, Alan Lawrence alan.lawre...@arm.com wrote:

We've been seeing errors using aarch64-none-linux-gnu gcc to build the
403.gcc benchmark from spec2k6, that we've traced back to this patch. The
error looks like:

/home/alalaw01/bootstrap_richie/gcc/xgcc
-B/home/alalaw01/bootstrap_richie/gcc -O3 -mcpu=cortex-a57.cortex-a53
-DSPEC_CPU_LP64alloca.o asprintf.o vasprintf.o c-parse.o c-lang.o
attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o
c-aux-info.o c-common.o c-format.o c-semantics.o c-objc-common.o main.o
cpplib.o cpplex.o cppmacro.o cppexp.o cppfiles.o cpphash.o cpperror.o
cppinit.o cppdefault.o line-map.o mkdeps.o prefix.o version.o mbchar.o
alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o cfg.o
cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o cfgrtl.o combine.o
conflict.o convert.o cse.o cselib.o dbxout.o debug.o dependence.o df.o
diagnostic.o doloop.o dominance.o dwarf2asm.o dwarf2out.o dwarfout.o
emit-rtl.o except.o explow.o expmed.o expr.o final.o flow.o fold-const.o
function.o gcse.o genrtl.o ggc-common.o global.o graph.o haifa-sched.o
hash.o hashtable.o hooks.o ifcvt.o insn-attrtab.o insn-emit.o insn-extract.o
insn-opinit.o insn-output.o insn-peep.o insn-recog.o integrate.o intl.o
jump.o langhooks.o lcm.o lists.o local-alloc.o loop.o obstack.o optabs.o
params.o predict.o print-rtl.o print-tree.o profile.o real.o recog.o
reg-stack.o regclass.o regmove.o regrename.o reload.o reload1.o reorg.o
resource.o rtl.o rtlanal.o rtl-error.o sbitmap.o sched-deps.o sched-ebb.o
sched-rgn.o sched-vis.o sdbout.o sibcall.o simplify-rtx.o ssa.o ssa-ccp.o
ssa-dce.o stmt.o stor-layout.o stringpool.o timevar.o toplev.o tree.o
tree-dump.o tree-inline.o unroll.o varasm.o varray.o vmsdbgout.o xcoffout.o
ggc-page.o i386.o xmalloc.o xexit.o hashtab.o safe-ctype.o splay-tree.o
xstrdup.o md5.o fibheap.o xstrerror.o concat.o partition.o hex.o lbasename.o
getpwd.o ucbqsort.o -lm-o gcc
emit-rtl.o: In function `gen_rtx_REG':
emit-rtl.c:(.text+0x12f8): relocation truncated to fit:
R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
section in regclass.o
emit-rtl.o: In function `gen_rtx':
emit-rtl.c:(.text+0x1824): relocation truncated to fit:
R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
section in regclass.o
collect2: error: ld returned 1 exit status
specmake: *** [gcc] Error 1
Error with make 'specmake -j7 build': check file
'/home/alalaw01/spectest/benchspec/CPU2006/403.gcc/build/build_base_test./make.err'
  Command returned exit code 2
  Error with make!
*** Error building 403.gcc

Inspecting the compiled emit-rtl.o shows:

$ readelf --relocs good/emit-rtl.o | grep fixed_regs
12a8 005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
12ac 005d0115

[AArch64] Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

Ok thanks Jeff. In that case I think I should draw this to the attention of the 
AArch64 maintainers to check the testsuite updates are OK before I commit...?


Methinks it may be possible to get further, or at least increase our confidence, 
if I mock out try_widen_shift_mode, and/or try injecting some dubious RTL from 
a builtin, although this'll only give a momentary snapshot of behaviour. I may 
or may not have time to look into this though ;)...


Cheers, Alan

Jeff Law wrote:

On 09/18/14 03:35, Alan Lawrence wrote:

Moreover, I think we both agree that if result_mode==shift_mode, the
transformation is correct. Just putting that check in, achieves
what I'm trying for here, so I'd be happy to go with the attached
patch and call it a day. However, I'm a little concerned about the
other cases - i.e. where shift_mode is wider than result_mode.
Let's go ahead and get the attached patch installed.  I'm pretty sure 
it's correct and I know you want to see something move forward here.  We 
can iterate further if we want.



If I understand correctly (and I'm not sure about that, let's see how
far I get), this means we'll perform the shift in (say) DImode, when
we're only really concerned about the lower (say) 32-bits (for an
originally-SImode shift).

That's the class of cases I'm concerned about.


  try_widen_shift_mode will in this case

check that the result of the operation *inside* the shift (in our
case an XOR) has 33 sign bit copies (via num_sign_bit_copies), i.e.
that its *top* 32-bits are all equal to the original SImode sign bit.
count of these bits may be fed into the top of the desired SImode
result by the DImode shift. Right so far?

Correct.


AFAICT, num_sign_bit_copies for an XOR, conservatively returns the
minimum of the num_sign_bit_copies of its two operands. I'm not sure
whether this is behaviour we should rely on in its callers, or for
the sake of abstraction we should treat num_sign_bit_copies as a
black box (which does what it says on the, erm, tin).
Treat it as a black box.  It returns the number of known sign bit 
copies.  There may be more, but never less.




If the former, doesn't having num_sign_bit_copies = the difference
in size between result_mode and shift_mode, of both operands to the
XOR, guarantee safety of the commutation (whether the constant is
positive or negative)? We could perform the shift (in the larger
mode) on both of the XOR operands safely, then XOR together their
lower parts.
I had convinced myself that when we flip the sign bit via the XOR and 
commute the XOR out that we invalidate the assumptions made when 
widening.  But I'm not so sure anymore.  Damn I hate changes made 
without suitable tests :(


I almost convinced myself the problem is in the adjustment of C2 in the 
widened case, but that's not a problem either.  At least not on paper.



If, however, we want to play safe and ensure that we deal safely with
 any XOR whose top (mode size difference + 1) bits were the same,
then I think the restriction that the XOR constant is positive is
neither necessary nor sufficient; rather (mirroring
try_widen_shift_mode) I think we need that num_sign_bit_copies of the
constant in shift_mode, is more than the size difference between
result_mode and shift_mode.
But isn't that the same?  Isn't the only case where it isn't the same 
when the constant has bits set that are outside the mode of the other 
operand?


Hmm, what about (xor:QI A -1)?  In that case -1 will be represented with 
bits outside the precision of QImode.



Hmmm. I might try that patch at some point, I think it is the right
check to make. (Meta-comment: this would be *so*much* easier if we
could write unit tests more easily!) In the meantime I'd be happy to
settle for the attached...
No argument on the unit testing comment.  It's a major failing in the 
design of GCC that we can't easily build a unit testing framework.


Jeff

Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check

On Mon, Sep 22, 2014 at 1:10 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 Well, I haven't looked into this in detail: I've gone only as far as
   * swapping emit-rtl.o between 'good' compiles (svn r214042) and 'bad'
 compiles (r214043), finding that the critical difference is in the
 emit-rtl.o generated by r214043;
   *looking at the relocations in the 'bad' emit_rtl.o, seeing new entries
 'fixed_regs + ', and that Richard Biener's changelog specifically
 mentions stripping signedness changes (and introduces the SIGN_NOPS).

 However, I apply your patch (minus the hunk adding the (set_attr type
 load1), this appears to have gone in already), and still see the same error
 message:

 emit-rtl.o: In function `gen_rtx_REG':
 emit-rtl.c:(.text+0x12f8): relocation truncated to fit:
 R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
 section in regclass.o
 emit-rtl.o: In function `gen_rtx':
 emit-rtl.c:(.text+0x1824): relocation truncated to fit:
 R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
 section in regclass.o
 collect2: error: ld returned 1 exit status

 and still see the same (suspicious-looking, although perhaps not convicted)
 relocations:

 $ readelf --relocs
 benchspec/CPU2006/403.gcc/build/build_base_test./emit-rtl.o | grep
 fixed_regs
 12a8  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
 12ac  005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
 12f8  005d0113 R_AARCH64_ADR_PRE  fixed_regs +
 
 12fc  005d0116 R_AARCH64_LDST8_A  fixed_regs +
 
 1824  005d0113 R_AARCH64_ADR_PRE  fixed_regs +
 
 1828  005d0116 R_AARCH64_LDST8_A  fixed_regs +
 
 186c  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
 1870  005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0

 I've also now bootstrapped my patch (STRIP_NOPS - STRIP_SIGN_NOPS * 2) on
 aarch64-none-linux-gnu and x86_64-none-linux-gnu, and check-gcc with no
 regressions, so would like to propose that patch for trunk...?

As Andrew said it certainly isn't a fix for the bug but only a workaround.
That said, I don't think keeping the sign-change strip is important (it was
just cleanup of that routine as I were there).

So - the patch is ok for trunk.  You still may want to fix the bug
though ;)

(I'd say it makes more sense then to remove stripping conversions
entirely)

Thanks,
Richard.

 --Alan




 Andrew Pinski wrote:

 On Thu, Sep 18, 2014 at 9:44 AM, Alan Lawrence alan.lawre...@arm.com
 wrote:

 We've been seeing errors using aarch64-none-linux-gnu gcc to build the
 403.gcc benchmark from spec2k6, that we've traced back to this patch. The
 error looks like:

 /home/alalaw01/bootstrap_richie/gcc/xgcc
 -B/home/alalaw01/bootstrap_richie/gcc -O3 -mcpu=cortex-a57.cortex-a53
 -DSPEC_CPU_LP64alloca.o asprintf.o vasprintf.o c-parse.o c-lang.o
 attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o
 c-aux-info.o c-common.o c-format.o c-semantics.o c-objc-common.o main.o
 cpplib.o cpplex.o cppmacro.o cppexp.o cppfiles.o cpphash.o cpperror.o
 cppinit.o cppdefault.o line-map.o mkdeps.o prefix.o version.o mbchar.o
 alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o cfg.o
 cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o cfgrtl.o
 combine.o
 conflict.o convert.o cse.o cselib.o dbxout.o debug.o dependence.o df.o
 diagnostic.o doloop.o dominance.o dwarf2asm.o dwarf2out.o dwarfout.o
 emit-rtl.o except.o explow.o expmed.o expr.o final.o flow.o fold-const.o
 function.o gcse.o genrtl.o ggc-common.o global.o graph.o haifa-sched.o
 hash.o hashtable.o hooks.o ifcvt.o insn-attrtab.o insn-emit.o
 insn-extract.o
 insn-opinit.o insn-output.o insn-peep.o insn-recog.o integrate.o intl.o
 jump.o langhooks.o lcm.o lists.o local-alloc.o loop.o obstack.o optabs.o
 params.o predict.o print-rtl.o print-tree.o profile.o real.o recog.o
 reg-stack.o regclass.o regmove.o regrename.o reload.o reload1.o reorg.o
 resource.o rtl.o rtlanal.o rtl-error.o sbitmap.o sched-deps.o sched-ebb.o
 sched-rgn.o sched-vis.o sdbout.o sibcall.o simplify-rtx.o ssa.o ssa-ccp.o
 ssa-dce.o stmt.o stor-layout.o stringpool.o timevar.o toplev.o tree.o
 tree-dump.o tree-inline.o unroll.o varasm.o varray.o vmsdbgout.o
 xcoffout.o
 ggc-page.o i386.o xmalloc.o xexit.o hashtab.o safe-ctype.o splay-tree.o
 xstrdup.o md5.o fibheap.o xstrerror.o concat.o partition.o hex.o
 lbasename.o
 getpwd.o ucbqsort.o -lm-o gcc
 emit-rtl.o: In function `gen_rtx_REG':
 emit-rtl.c:(.text+0x12f8): relocation truncated to fit:
 R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
 section in regclass.o
 emit-rtl.o: In function `gen_rtx':
 emit-rtl.c:(.text+0x1824): relocation truncated to fit:
 R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON

Re: [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114

On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 The end goal here is to remove this code from tree-vect-loop.c
 (vect_create_epilog_for_reduction):

   if (BYTES_BIG_ENDIAN)
 bitpos = size_binop (MULT_EXPR,
  bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) -
 1),
  TYPE_SIZE (scalar_type));
   else

 as this is the root cause of PR/61114 (see testcase there, failing on all
 bigendian targets supporting reduc_[us]plus_optab). Quoting Richard Biener,
 all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is
 suspicious. The code snippet above is used on two paths:

 (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR =
 reduc_[us](plus|min|max)_optab.
 The optab is documented as the scalar result is stored in the least
 significant bits of operand 0, but the tree code as the first element in
 the vector holding the result of the reduction of all elements of the
 operand. This mismatch means that when the tree code is folded, the code
 snippet above reads the result from the wrong end of the vector.

 The strategy (as per
 https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define new
 tree codes and optabs that produce scalar results directly; this seems
 better than tying (the element of the vector into which the result is
 placed) to (the endianness of the target), and avoids generating extra moves
 on current bigendian targets. However, the previous optabs are retained for
 now as a migration strategy so as not to break existing backends; moving
 individual platforms over will follow.

 A complication here is on AArch64, where we directly generate
 REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily
 remove this folding in order to decouple the midend and AArch64 backend.

Sounds fine.  I hope we can transition all backends for 5.0 and remove
the vector variant optabs (maybe renaming the scalar ones).

 (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e.
 VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the optab
 is defined in an endianness-dependent way, leading to significant
 complication in fold-const.c. (Moreover, the equivalent vec_shl_optab is
 never used!). Few platforms appear to handle vec_shr_optab (and fewer
 bigendian - I see only PowerPC and MIPS), so it seems pertinent to change
 the existing optab to be endianness-neutral.

 Patch 10 defines vec_shr for AArch64, for the old specification; patch 13
 updates that implementation to fit the new endianness-neutral specification,
 serving as a guide for other existing backends. Patches/RFCs 15 and 16 are
 equivalents for MIPS and PowerPC; I haven't tested these but hope they act
 as useful pointers for the port maintainers.

 Finally patch 14 cleans up the affected part of tree-vect-loop.c
 (vect_create_epilog_for_reduction).

As said during the individual patches review I'd like the vectorizer to
use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with
only whole-element amounts).  This means we can remove
VEC_RSHIFT_EXPR.  It also means that if the backend defines
vec_perm_const (which it really should) it can handle the special
permutes that boil down to a possibly more efficient vector shift
there (a good optimization anyway).  Until it does that all backends
would at least create correct code (with the endian dependent
vec_shr removed).

Richard.

 --Alan

Re: [PATCH 0/5] Fix handling of word subregs of wide registers

Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com writes:
 Jeff Law l...@redhat.com writes:
 On 09/19/14 01:23, Richard Sandiford wrote:
 Jeff Law l...@redhat.com writes:
 On 09/18/14 04:07, Richard Sandiford wrote:
 This series is a cleaned-up version of:

   https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html

 The underlying problem is that the semantics of subregs depend on 
 the word size.  You can't have a subreg for byte 2 of a 4-byte 
 word, say, but you can have a subreg for word 2 of a 4-word value 
 (as well as lowpart subregs of that word, etc.).  This causes 
 problems when an architecture has wider-than-word registers, since 
 the addressability of a word can then depend on which register 
 class is used.

 The register allocators need to fix up cases where a subreg turns 
 out to be invalid for a particular class.  This is really an 
 extension of what we need to do for CANNOT_CHANGE_MODE_CLASS.

 Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.
 I thought we fixed these problems long ago with the change to 
 subreg_byte?!?

 No, that was fixing something else.  (I'm just about old enough to 
 remember that too!)  The problem here is that (say):

  (subreg:SI (reg:DI X) 4)

 is independently addressable on little-endian AArch32 if X assigned 
 to a GPR, but not if X is assigned to a vector register.  We need to 
 allow these kinds of subreg on pseudos in order to decompose 
 multiword arithmetic.  It's then up to the RA to realise that a 
 reload would be needed if X were assigned to a vector register, since 
 the upper half of a vector register cannot be independently accessed.

 Note that you could write this example even with the old word-style 
 offsets and IIRC the effect would have been the same.
 OK.  So I kept thinking in terms of the byte offset stuff.  But what 
 you're tackling is related to the mess around the mode of the subreg 
 having a different meaning if its smaller than a word vs word-sized or 
 greater.

 Right?

Yeah, that's right.  Addressability is based on words, which is
 inconvenient when your registers are bigger than a word.

 If the architecture like Microblaze which doesn't support the 1 byte or
 2 byte registers. In this scenario what should be returned when
 SUBREG_WORD is used.

I don't understand the question sorry.  Subreg offsets are still represented
as bytes rather than words.  The patch doesn't change the way that subregs
are represented or the rules about which subregs are valid.

Both before and after the patch, the semantics of subregs say that if
you have 4-byte words, only one of:

(subreg:QI (reg:SI X) 0)
(subreg:QI (reg:SI X) 1)
(subreg:QI (reg:SI X) 2)
(subreg:QI (reg:SI X) 3)

is ever valid (0 for little-endian, 3 for big-endian).  Writing to that
one valid subreg will change the whole of X, unless the subreg is wrapped
in a strict_lowpart.  In other words, subregs are defined so that individual
parts of a word are not independently addressable.

However, individual words of a multiword register _are_ addressable.  I.e.:

   (subreg:SI (reg:DI Y) 0)
   (subreg:SI (reg:DI Y) 4)

are both valid.  Writing to one does not change the other.

The problem the patch was trying to solve was that you can have targets
with 4-byte words but some 8-byte registers.  In those cases, it's still
possible to form both of the Y subregs above if Y is allocated to a word-sized
register, but not if Y is allocated to a doubleword-sized register.

Thanks,
Richard

Re: [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114

On Mon, Sep 22, 2014 at 1:21 PM, Richard Biener
richard.guent...@gmail.com wrote:
 On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 The end goal here is to remove this code from tree-vect-loop.c
 (vect_create_epilog_for_reduction):

   if (BYTES_BIG_ENDIAN)
 bitpos = size_binop (MULT_EXPR,
  bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) -
 1),
  TYPE_SIZE (scalar_type));
   else

 as this is the root cause of PR/61114 (see testcase there, failing on all
 bigendian targets supporting reduc_[us]plus_optab). Quoting Richard Biener,
 all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is
 suspicious. The code snippet above is used on two paths:

 (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR =
 reduc_[us](plus|min|max)_optab.
 The optab is documented as the scalar result is stored in the least
 significant bits of operand 0, but the tree code as the first element in
 the vector holding the result of the reduction of all elements of the
 operand. This mismatch means that when the tree code is folded, the code
 snippet above reads the result from the wrong end of the vector.

 The strategy (as per
 https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define new
 tree codes and optabs that produce scalar results directly; this seems
 better than tying (the element of the vector into which the result is
 placed) to (the endianness of the target), and avoids generating extra moves
 on current bigendian targets. However, the previous optabs are retained for
 now as a migration strategy so as not to break existing backends; moving
 individual platforms over will follow.

 A complication here is on AArch64, where we directly generate
 REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily
 remove this folding in order to decouple the midend and AArch64 backend.

 Sounds fine.  I hope we can transition all backends for 5.0 and remove
 the vector variant optabs (maybe renaming the scalar ones).

 (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e.
 VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the optab
 is defined in an endianness-dependent way, leading to significant
 complication in fold-const.c. (Moreover, the equivalent vec_shl_optab is
 never used!). Few platforms appear to handle vec_shr_optab (and fewer
 bigendian - I see only PowerPC and MIPS), so it seems pertinent to change
 the existing optab to be endianness-neutral.

 Patch 10 defines vec_shr for AArch64, for the old specification; patch 13
 updates that implementation to fit the new endianness-neutral specification,
 serving as a guide for other existing backends. Patches/RFCs 15 and 16 are
 equivalents for MIPS and PowerPC; I haven't tested these but hope they act
 as useful pointers for the port maintainers.

 Finally patch 14 cleans up the affected part of tree-vect-loop.c
 (vect_create_epilog_for_reduction).

 As said during the individual patches review I'd like the vectorizer to
 use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with
 only whole-element amounts).  This means we can remove
 VEC_RSHIFT_EXPR.  It also means that if the backend defines
 vec_perm_const (which it really should) it can handle the special
 permutes that boil down to a possibly more efficient vector shift
 there (a good optimization anyway).  Until it does that all backends
 would at least create correct code (with the endian dependent
 vec_shr removed).

It seems only Alpha completely lacks vec_perm_const but implements
vec_shr.

Richard.

 Richard.

 --Alan

Re: [PATCH 0/5] Fix handling of word subregs of wide registers

Andrew Pinski pins...@gmail.com writes:
 On Thu, Sep 18, 2014 at 3:07 AM, Richard Sandiford
 richard.sandif...@arm.com wrote:
 This series is a cleaned-up version of:

 https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html

 The underlying problem is that the semantics of subregs depend on the
 word size.  You can't have a subreg for byte 2 of a 4-byte word, say,
 but you can have a subreg for word 2 of a 4-word value (as well as lowpart
 subregs of that word, etc.).  This causes problems when an architecture has
 wider-than-word registers, since the addressability of a word can then depend
 on which register class is used.

 The register allocators need to fix up cases where a subreg turns out to
 be invalid for a particular class.  This is really an extension of what
 we need to do for CANNOT_CHANGE_MODE_CLASS.

 Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.


 This sounds like something which should be tested on spu as it is the
 main target that I can think of which has wider-than-word registers
 and that has had issues with subreg.  I can't remember if the
 simulator for SPU is free (as in beer) and would run on anything
 besides PowerPC.  It has been more than 4 years since I looked into
 the spu back-end also.

Well, AArch64 and x86_64 should be good enough targets for testing
the patch.  In the AArch64 case the bug was holding up other big-endian
fixes, in the x86_64 case it led to a workaround in C_C_M_C.

Thanks,
Richard

Re: [PATCH] Fix ubsan ICE on invalid (PR sanitizer/61272)

2014-09-22 Thread Marek Polacek

On Wed, May 21, 2014 at 08:58:57PM +0200, Jakub Jelinek wrote:
 On Wed, May 21, 2014 at 08:46:22PM +0200, Marek Polacek wrote:
  2014-05-21  Marek Polacek  pola...@redhat.com
  
  PR sanitizer/61272
  * ubsan.c (is_ubsan_builtin_p): Turn assert into a condition.
  
  * g++.dg/ubsan/pr61272.C: New test.
 
 Ok, thanks.

Now committed to 4.9 branch as well to fix PR63323.

Marek

RE: [PATCH 0/5] Fix handling of word subregs of wide registers

2014-09-22 Thread Ajit Kumar Agarwal

-Original Message-
From: Richard Sandiford [mailto:richard.sandif...@arm.com] 
Sent: Monday, September 22, 2014 4:56 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 0/5] Fix handling of word subregs of wide registers

Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com writes:
 Jeff Law l...@redhat.com writes:
 On 09/19/14 01:23, Richard Sandiford wrote:
 Jeff Law l...@redhat.com writes:
 On 09/18/14 04:07, Richard Sandiford wrote:
 This series is a cleaned-up version of:

   https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html

 The underlying problem is that the semantics of subregs depend on 
 the word size.  You can't have a subreg for byte 2 of a 4-byte 
 word, say, but you can have a subreg for word 2 of a 4-word value 
 (as well as lowpart subregs of that word, etc.).  This causes 
 problems when an architecture has wider-than-word registers, since 
 the addressability of a word can then depend on which register 
 class is used.

 The register allocators need to fix up cases where a subreg turns 
 out to be invalid for a particular class.  This is really an 
 extension of what we need to do for CANNOT_CHANGE_MODE_CLASS.

 Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.
 I thought we fixed these problems long ago with the change to 
 subreg_byte?!?

 No, that was fixing something else.  (I'm just about old enough to 
 remember that too!)  The problem here is that (say):

  (subreg:SI (reg:DI X) 4)

 is independently addressable on little-endian AArch32 if X assigned 
 to a GPR, but not if X is assigned to a vector register.  We need to 
 allow these kinds of subreg on pseudos in order to decompose 
 multiword arithmetic.  It's then up to the RA to realise that a 
 reload would be needed if X were assigned to a vector register, 
 since the upper half of a vector register cannot be independently accessed.

 Note that you could write this example even with the old word-style 
 offsets and IIRC the effect would have been the same.
 OK.  So I kept thinking in terms of the byte offset stuff.  But what 
 you're tackling is related to the mess around the mode of the subreg 
 having a different meaning if its smaller than a word vs word-sized 
 or greater.

 Right?

Yeah, that's right.  Addressability is based on words, which is  
inconvenient when your registers are bigger than a word.

 If the architecture like Microblaze which doesn't support the 1 byte 
 or
 2 byte registers. In this scenario what should be returned when 
 SUBREG_WORD is used.

I don't understand the question sorry.  Subreg offsets are still represented 
as bytes rather than words.  The patch doesn't change the way that subregs 
are represented or the rules about which subregs are valid.

Both before and after the patch, the semantics of subregs say that if you 
have 4-byte words, only one of:

(subreg:QI (reg:SI X) 0)
(subreg:QI (reg:SI X) 1)
(subreg:QI (reg:SI X) 2)
(subreg:QI (reg:SI X) 3)

is ever valid (0 for little-endian, 3 for big-endian).  Writing to that one 
valid subreg will change the whole of X, unless the subreg is wrapped in a 
strict_lowpart.  In other words, subregs are defined so that individual 
parts of a word are not independently addressable.

However, individual words of a multiword register _are_ addressable.  I.e.:

   (subreg:SI (reg:DI Y) 0)
   (subreg:SI (reg:DI Y) 4)

are both valid.  Writing to one does not change the other.

The problem the patch was trying to solve was that you can have targets with 
4-byte words but some 8-byte registers.  In those cases, it's still possible 
to form both of the Y subregs above if Y is allocated to a word-sized 
register, but not if Y is allocated to a doubleword-sized register.

Thanks Richard for the explanation. 

Thanks,
Richard

[PATCH] msp430: inhibit automatic link of -lnosys in absence of -msim

2014-09-22 Thread Peter A. Bigot

Based on discussion on the mspgcc-users mailing list[1], this patch
changes msp430 to not automatically apply -lnosys when -msim is absent,
as this prevents a user from supplying a custom system interface.

The existing behavior providing the CIO alternative can be obtained by
explicitly linking -lcio instead, assuming the corresponding newlib
patch[2] to move msp430's nosys implementation to libcio.a is also used.

gcc/ChangeLog
2014-09-22  Peter A. Bigot  p...@pabigot.com

* config/msp430/msp430.h: Remove automatic -lnosys when -msim absent.

[1] http://www.mail-archive.com/mspgcc-users@lists.sourceforge.net/msg12104.html
[2] https://sourceware.org/ml/newlib/2014/msg00465.html

Cc: d...@redhat.com
Cc: ni...@redhat.com
Signed-off-by: Peter A. Bigot p...@pabigot.com
---
 gcc/config/msp430/msp430.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index 91fc91c..068bdad 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -70,7 +70,6 @@ extern bool msp430x;
 -lgcc  \
 -lcrt  \
 %{msim:-lsim}  \
-%{!msim:-lnosys}   \
 --end-group\
 %{!T*:%{!msim:%{mmcu=*:--script=%*.ld}}}   \
 %{!T*:%{!msim:%{!mmcu=*:%Tmsp430.ld}}} \
-- 
1.8.5.5

[PATCH] Fix missing gimplification of vector constructors


The following fixes non-GIMPLE constructors slipping through the
gimplifier.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-09-22  Richard Biener  rguent...@suse.de

* gimplify.c (gimplify_init_constructor): Do not leave
non-GIMPLE vector constructors around.
* tree-cfg.c (verify_gimple_assign_single): Verify that
CONSTRUCTORs have gimple elements.

Index: gcc/gimplify.c
===
--- gcc/gimplify.c  (revision 215450)
+++ gcc/gimplify.c  (working copy)
@@ -4021,12 +4021,6 @@ gimplify_init_constructor (tree *expr_p,
break;
  }
 
-   /* Don't reduce an initializer constant even if we can't
-  make a VECTOR_CST.  It won't do anything for us, and it'll
-  prevent us from representing it as a single constant.  */
-   if (initializer_constant_valid_p (ctor, type))
- break;
-
TREE_CONSTANT (ctor) = 0;
  }
 
Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 215450)
+++ gcc/tree-cfg.c  (working copy)
@@ -4207,8 +4233,20 @@ verify_gimple_assign_single (gimple stmt
  debug_generic_stmt (rhs1);
  return true;
}
+ if (!is_gimple_val (elt_v))
+   {
+ error (vector CONSTRUCTOR element is not a GIMPLE value);
+ debug_generic_stmt (rhs1);
+ return true;
+   }
}
}
+  else if (CONSTRUCTOR_NELTS (rhs1) != 0)
+   {
+ error (non-vector CONSTRUCTOR with elements);
+ debug_generic_stmt (rhs1);
+ return true;
+   }
   return res;
 case OBJ_TYPE_REF:
 case ASSERT_EXPR:

Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result


Richard Biener wrote:


Huh.  Does that ever happen?  Please use a NOP_EXPR instead of
a VIEW_CONVERT_EXPR.


Yes, the testcase is gcc.target/i386/pr51235.c which performs black magic*** 
with void *. (This testcase otherwise fails the verify_gimple_assign_unary check 
in tree-cfg.c .)   However, test passes also with your suggestion of NOP_EXPR so 
that's good by me.


***that is, computes the minimum

--Alan



Ok with that change.

Thanks,
Richard.


Testing:
x86_86-none-linux-gnu: bootstrap, check-gcc, check-g++
aarch64-none-linux-gnu: bootstrap
aarch64-none-elf:  check-gcc, check-g++
arm-none-eabi: check-gcc

aarch64_be-none-elf: check-gcc, showing
FAIL-PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
FAIL-PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test
Passes the (previously-failing) reduced testcase on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114

Have also assembler/stage-1 tested that testcase on PowerPC, also
fixed.



gcc/ChangeLog:

* expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add
extract_bit_field around optab result.

* fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR,
produce
scalar not vector.

* tree-cfg.c (verify_gimple_assign_unary): Check result vs operand
type
for REDUC_{MIN,MAX,PLUS}_EXPR.

* tree-vect-loop.c (vect_analyze_loop): Update comment.
(vect_create_epilog_for_reduction): For direct vector reduction, use
result of tree code directly without extract_bit_field.

* tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update
comment.

Re: [PATCH 3/14] Add new optabs for reducing vectors to scalars


Richard Biener wrote:


scalar_reduc_to_vector misses a comment.


Ok to reuse the comment in optabs.h in optabs.c also?


I wonder if at the end we wouldn't transition all backends and then
renaming reduc_*_scal_optab back to reduc_*_optab makes sense.


Yes, that sounds like a plan, the _scal is a bit of a mouthful.


The optabs have only one mode - I wouldn't be surprised if an ISA
invents for example v4si - di reduction?  So do we want to make
reduc_plus_scal_optab a little bit more future proof (maybe there
is already an ISA that supports this kind of reduction?).


That sounds like a plausible thing for an ISA to do, indeed. However given these 
names are only used by the autovectorizer rather than directly, the question is 
what the corresponding source code looks like, and/or what changes to the 
autovectorizer we might have to make to (look for code to) exploit such an 
instruction. At this point I could go for a 
reduc_{plus,min_max}_scal_modemode which reduces from the first vector mode 
to the second scalar mode, and then make the vectorizer look only for cases 
where the second mode was the element type of the first; but I'm not sure I want 
to do anything more complicated than that at this stage. (However, indeed it 
would leave the possibility open for the future.)


--Alan

Re: [PATCH 12/14][Vectorizer] Redefine VEC_RSHIFT_EXPR and vec_shr_optab as endianness-neutral

2014-09-22 Thread Bill Schmidt

On Thu, 2014-09-18 at 09:12 -0400, David Edelsohn wrote:
 On Thu, Sep 18, 2014 at 8:42 AM, Alan Lawrence alan.lawre...@arm.com wrote:
  The direction of VEC_RSHIFT_EXPR has been endian-dependent, contrary to the
  general principles of tree. This patch updates fold-const and the vectorizer
  (the only place where such expressions are created), such that
  VEC_RSHIFT_EXPR always shifts towards element 0.
 
  The tree code still maps directly onto the vec_shr_optab, and so this patch
  *will break any bigendian platform defining the vec_shr optab*.
  -- For AArch64_be, patch follows next in series;
  -- For PowerPC, I think patch/rfc 15 should fix, please inspect;
  -- For MIPS, I think patch/rfc 16 should fix, please inspect.
 
  gcc/ChangeLog:
 
  * fold-const.c (const_binop): VEC_RSHIFT_EXPR always shifts towards
  element 0.
 
  * tree-vect-loop.c (vect_create_epilog_for_reduction): always
  extract
  the result of a reduction with vector shifts from element 0.
 
  * tree.def (VEC_RSHIFT_EXPR, VEC_LSHIFT_EXPR): Comment shift
  direction.
 
  * doc/md.texi (vec_shr_m, vec_shl_m): Document shift direction.
 
  Testing Done:
 
  Bootstrap and check-gcc on x86_64-none-linux-gnu; check-gcc on
  aarch64-none-elf.
 
 Why wasn't this tested on the PowerLinux system in the GCC Compile Farm?
 
 Also, Bill Schmidt can help check the PPC parts fo the patches.

Sorry for the late response; I just returned from vacation.  I think
that patch 15 looks reasonable on the surface, but would be more
comfortable if it had been tested.  I would echo David's suggestion that
you please test this on gcc110 in the compile farm to avoid surprises.
Given the similarity between vec_shl_mode and vec_shr_mode I am ok
with removing the former; it won't be difficult to re-create it later if
needed.

Please add some of the language you used above about VEC_RSHIFT_EXPR as
commentary for vec_shr_mode in vector.md, as right-shifting towards
element zero is not an obvious concept on a BE machine.

Thanks,
Bill

 
 Thanks, David

[AArch64] Auto-generate the BUILTIN_ macros for aarch64-builtins.c

2014-09-22 Thread James Greenhalgh


On Thu, Sep 18, 2014 at 11:12:15AM +0100, Richard Earnshaw wrote:
 On 18/09/14 10:53, James Greenhalgh wrote:
  +$(srcdir)/config/aarch64/aarch64-builtin-iterators.h: 
  $(srcdir)/config/aarch64/geniterators.sh \
  + $(srcdir)/config/aarch64/iterators.md
  + $(SHELL) $(srcdir)/config/aarch64/geniterators.sh \
  + $(srcdir)/config/aarch64/iterators.md  \
  + $(srcdir)/config/aarch64/aarch64-builtin-iterators.h
  +
   aarch-common.o: $(srcdir)/config/arm/aarch-common.c $(CONFIG_H) 
  $(SYSTEM_H) \
   coretypes.h $(TM_H) $(TM_P_H) $(RTL_H) $(TREE_H) output.h $(C_COMMON_H)
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 

 Is there any real need to write this into the source directory and have
 the built file checked in?  Ie. can't we always write to the build
 directory and use it from there.  That avoids problems if the sources
 are on a read-only filesystem.

 If we do need to leave it in the sources, then contrib/update_gcc should
 be taught how to touch the generated file when resyncing from the
 repositories.


I thought I had tried this and failed to make it work. I must not have
been trying hard enough at the time.

Updated as attached, generating the header in the build directory. It
looks much better this way!

Bootstrapped on aarch64-none-linux-gnueabi with no issues.

Ok?

Thanks,
James

---
gcc/

2014-09-22  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/geniterators.sh: New.
* config/aarch64/iterators.md (VDQF_DF): New.
* config/aarch64/t-aarch64: Generate aarch64-builtin-iterators.h.
* config/aarch64/aarch64-builtins.c (BUILTIN_*) Remove.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 6a77d29..6b9c383 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -313,91 +313,7 @@ typedef struct
   VAR11 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, MAP, L)
 
-/* BUILTIN_ITERATOR macros should expand to cover the same range of
-   modes as is given for each define_mode_iterator in
-   config/aarch64/iterators.md.  */
-
-#define BUILTIN_DX(T, N, MAP) \
-  VAR2 (T, N, MAP, di, df)
-#define BUILTIN_GPF(T, N, MAP) \
-  VAR2 (T, N, MAP, sf, df)
-#define BUILTIN_SDQ_I(T, N, MAP) \
-  VAR4 (T, N, MAP, qi, hi, si, di)
-#define BUILTIN_SD_HSI(T, N, MAP) \
-  VAR2 (T, N, MAP, hi, si)
-#define BUILTIN_V2F(T, N, MAP) \
-  VAR2 (T, N, MAP, v2sf, v2df)
-#define BUILTIN_VALL(T, N, MAP) \
-  VAR10 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, \
-	 v4si, v2di, v2sf, v4sf, v2df)
-#define BUILTIN_VALLDI(T, N, MAP) \
-  VAR11 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, \
-	 v4si, v2di, v2sf, v4sf, v2df, di)
-#define BUILTIN_VALLDIF(T, N, MAP) \
-  VAR12 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, \
-	 v4si, v2di, v2sf, v4sf, v2df, di, df)
-#define BUILTIN_VB(T, N, MAP) \
-  VAR2 (T, N, MAP, v8qi, v16qi)
-#define BUILTIN_VD1(T, N, MAP) \
-  VAR5 (T, N, MAP, v8qi, v4hi, v2si, v2sf, v1df)
-#define BUILTIN_VDC(T, N, MAP) \
-  VAR6 (T, N, MAP, v8qi, v4hi, v2si, v2sf, di, df)
-#define BUILTIN_VDIC(T, N, MAP) \
-  VAR3 (T, N, MAP, v8qi, v4hi, v2si)
-#define BUILTIN_VDN(T, N, MAP) \
-  VAR3 (T, N, MAP, v4hi, v2si, di)
-#define BUILTIN_VDQ(T, N, MAP) \
-  VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di)
-#define BUILTIN_VDQF(T, N, MAP) \
-  VAR3 (T, N, MAP, v2sf, v4sf, v2df)
-#define BUILTIN_VDQF_DF(T, N, MAP) \
-  VAR4 (T, N, MAP, v2sf, v4sf, v2df, df)
-#define BUILTIN_VDQH(T, N, MAP) \
-  VAR2 (T, N, MAP, v4hi, v8hi)
-#define BUILTIN_VDQHS(T, N, MAP) \
-  VAR4 (T, N, MAP, v4hi, v8hi, v2si, v4si)
-#define BUILTIN_VDQIF(T, N, MAP) \
-  VAR9 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2sf, v4sf, v2df)
-#define BUILTIN_VDQM(T, N, MAP) \
-  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si)
-#define BUILTIN_VDQV(T, N, MAP) \
-  VAR5 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v4si)
-#define BUILTIN_VDQQH(T, N, MAP) \
-  VAR4 (T, N, MAP, v8qi, v16qi, v4hi, v8hi)
-#define BUILTIN_VDQ_BHSI(T, N, MAP) \
-  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si)
-#define BUILTIN_VDQ_I(T, N, MAP) \
-  VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di)
-#define BUILTIN_VDW(T, N, MAP) \
-  VAR3 (T, N, MAP, v8qi, v4hi, v2si)
-#define BUILTIN_VD_BHSI(T, N, MAP) \
-  VAR3 (T, N, MAP, v8qi, v4hi, v2si)
-#define BUILTIN_VD_HSI(T, N, MAP) \
-  VAR2 (T, N, MAP, v4hi, v2si)
-#define BUILTIN_VQ(T, N, MAP) \
-  VAR6 (T, N, MAP, v16qi, v8hi, v4si, v2di, v4sf, v2df)
-#define BUILTIN_VQN(T, N, MAP) \
-  VAR3 (T, N, MAP, v8hi, v4si, v2di)
-#define BUILTIN_VQW(T, N, MAP) \
-  VAR3 (T, N, MAP, v16qi, v8hi, v4si)
-#define BUILTIN_VQ_HSI(T, N, MAP) \
-  VAR2 (T, N, MAP, v8hi, v4si)
-#define BUILTIN_VQ_S(T, N, MAP) \
-  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si)
-#define BUILTIN_VSDQ_HSI(T, N, MAP) \
-  VAR6 (T, N, MAP, v4hi, v8hi, v2si, v4si, hi, si)
-#define BUILTIN_VSDQ_I(T, N, MAP) \
-  VAR11 (T, N, MAP, v8qi,

[patch] Implement move semantics for iostreams


This adds move and swap functions to the iostream classes.

Although this is a pretty large patch, it's a pure addition that only
affects C++11 mode, and should have no effect on existing code because
it won't be moving or swapping streams.

I wanted to use C++14's std::exchange so I added std::__exchange to
bits/move.h and made std::exchange forward to that.

I needed to add a new constructor to basic_ostream that doesn't call
init(0), for basic_iostream's move constructor to use.  (I wonder why
our non-standard default constructors for basic_istream and
basic_ostream call init(nullptr), rather than doing nothing.  Derived
classes that want init(nullptr) to be called can do that by passing
nullptr to the standard basic_istream and basic_ostream constructors
taking a pointer, which would allow the default constructors to be
re-purposed to intentionally leave the object uninitialized).

To ensure that the explicit instantiations in the library include the
new functions I had to move several files from src/c++98 to src/c++11,
which makes the patch huge, so the new tests are in a separate,
gzipped file to keep this post below the mailing list size limits.

Tested x86_64-linux, committed to trunk.

commit ac4314b3a2451147601385e26f19bb95b84c69d0
Author: Jonathan Wakely jwak...@redhat.com
Date:   Fri Sep 12 19:56:06 2014 +0100

Make streams movable and swappable.

	PR libstdc++/54316
	PR libstdc++/53626
	* config/abi/pre/gnu.ver: Add new exports.
	* config/io/basic_file_stdio.h (__basic_file): Support moving and
	swapping.
	* include/bits/basic_ios.h (basic_ios::move, basic_ios::swap):
	Likewise.
	* include/bits/ios_base.h (ios_base::_M_move, ios_base::_M_swap):
	Likewise.
	* include/bits/fstream.tcc (basic_filebuf): Likewise.
	* include/bits/move.h (__exchange): Define for C++11 mode.
	* include/ext/stdio_filebuf.h (stdio_filebuf): Support moving and
	swapping.
	* include/ext/stdio_sync_filebuf.h (stdio_sync_filebuf): Likewise.
	* include/std/fstream (basic_filebuf, basic_ifstream, basic_ofstream,
	basic_fstream): Likewise.
	* include/std/ios: Remove whitespace.
	* include/std/istream (basic_istream, basic_iostream): Support moving
	and swapping.
	* include/std/ostream (basic_ostream): Likewise.
	* include/std/sstream (basic_stringbuf, basic_istringstream,
	basic_ostringstream, basic_stringstream): Likewise.
	* include/std/streambuf (basic_streambuf): Do not default copy
	constructor and assignment on first declaration.
	* include/std/utility (exchange): Forward to __exchange.
	* testsuite/27_io/basic_filebuf/cons/char/copy_neg.cc: New.
	* src/c++11/Makefile.am: Add stream-related files.
	* src/c++11/Makefile.in: Regenerate.
	* src/c++11/ext11-inst.cc (stdio_filebuf, stdio_sync_filebuf):
	New file for explicit instantiation definitions.
	* src/c++11/ios.cc: Move from src/c++98 to here.
	(ios_base::_M_move, ios_base::_M_swap): Define.
	* src/c++11/ios-inst.cc: Move from src/c++98 to here.
	* src/c++11/iostream-inst.cc: Likewise.
	* src/c++11/istream-inst.cc: Likewise.
	* src/c++11/ostream-inst.cc: Likewise.
	* src/c++11/sstream-inst.cc: Likewise.
	* src/c++11/streambuf-inst.cc: Likewise.
	* src/c++98/Makefile.am: Remove stream-related files.
	* src/c++98/Makefile.in: Regenerate.
	* src/c++98/ext-inst.cc (stdio_filebuf): Remove explicit
	instantiations.
	* src/c++98/misc-inst.cc (stdio_sync_filebuf): Likewise.
	* src/c++98/ios-inst.cc: Move to src/c++11/.
	* src/c++98/ios.cc: Move to src/c++11/.
	* src/c++98/iostream-inst.cc: Likewise.
	* src/c++98/istream-inst.cc: Likewise.
	* src/c++98/ostream-inst.cc: Likewise.
	* src/c++98/sstream-inst.cc: Likewise.
	* src/c++98/streambuf-inst.cc: Likewise.
	* testsuite/27_io/basic_filebuf/cons/char/copy_neg.cc: New.
	* testsuite/27_io/basic_fstream/cons/move.cc: New.
	* testsuite/27_io/basic_fstream/assign/1.cc: New.
	* testsuite/27_io/basic_ifstream/cons/move.cc: New.
	* testsuite/27_io/basic_ifstream/assign/1.cc: New.
	* testsuite/27_io/basic_istringstream/assign/1.cc: New.
	* testsuite/27_io/basic_istringstream/cons/move.cc: New.
	* testsuite/27_io/basic_ofstream/cons/move.cc: New.
	* testsuite/27_io/basic_ofstream/assign/1.cc: New.
	* testsuite/27_io/basic_ostringstream/assign/1.cc: New.
	* testsuite/27_io/basic_ostringstream/cons/move.cc: New.
	* testsuite/27_io/basic_stringstream/assign/1.cc: New.
	* testsuite/27_io/basic_stringstream/cons/move.cc: New.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 41fac71..669e36d 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -989,7 +989,8 @@ GLIBCXX_3.4.10 {
 _ZNSt15basic_streambufI[cw]St11char_traitsI[cw]EE6stosscEv;

Re: [PATCH 3/14] Add new optabs for reducing vectors to scalars

On Mon, Sep 22, 2014 at 3:26 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 Richard Biener wrote:


 scalar_reduc_to_vector misses a comment.


 Ok to reuse the comment in optabs.h in optabs.c also?

Sure.

 I wonder if at the end we wouldn't transition all backends and then
 renaming reduc_*_scal_optab back to reduc_*_optab makes sense.


 Yes, that sounds like a plan, the _scal is a bit of a mouthful.

 The optabs have only one mode - I wouldn't be surprised if an ISA
 invents for example v4si - di reduction?  So do we want to make
 reduc_plus_scal_optab a little bit more future proof (maybe there
 is already an ISA that supports this kind of reduction?).


 That sounds like a plausible thing for an ISA to do, indeed. However given
 these names are only used by the autovectorizer rather than directly, the
 question is what the corresponding source code looks like, and/or what
 changes to the autovectorizer we might have to make to (look for code to)
 exploit such an instruction.

Ah, indeed.  Would be sth like a REDUC_WIDEN_SUM_EXPR or so.

 At this point I could go for a
 reduc_{plus,min_max}_scal_modemode which reduces from the first vector
 mode to the second scalar mode, and then make the vectorizer look only for
 cases where the second mode was the element type of the first; but I'm not
 sure I want to do anything more complicated than that at this stage.
 (However, indeed it would leave the possibility open for the future.)

Yeah, agreed.  For the min/max case a widen variant isn't useful anyway.

Thanks,
Richard.

 --Alan

Re: FW: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C

2014-09-22 Thread Thomas Schwinge

Hi!

On Tue, 27 Aug 2013 21:30:49 +, Iyer, Balaji V balaji.v.i...@intel.com 
wrote:
 --- /dev/null
 +++ gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c
 @@ -0,0 +1,37 @@
 +/* { dg-do run  { target { i?86-*-* x86_64-*-* arm*-*-* } } } */
 +/* { dg-options -fcilkplus } */
 +/* { dg-options -lcilkrts { target { i?86-*-* x86_64-*-* arm*-*-* } } } */
 +
 +void f0(volatile int *steal_flag)
 +{ 
 +  int i = 0;
 +  /* Wait for steal_flag to be set */
 +  while (!*steal_flag) 
 +;
 +}
 +
 +int f1()
 +{
 +
 +  volatile int steal_flag = 0;
 +  _Cilk_spawn f0(steal_flag);
 +  steal_flag = 1;  // Indicate stolen
 +  _Cilk_sync; 
 +  return 0;
 +}
 +
 +void f2(int q)
 +{
 +  q = 5;
 +}
 +
 +void f3()
 +{
 +   _Cilk_spawn f2(f1());
 +}
 +
 +int main()
 +{
 +  f3();
 +  return 0;
 +}

Is this really well-formed Cilk Plus code?  Running with CILK_NWORKERS=1,
or -- the equivalent -- in a system with just one CPU (as per
libcilkrts/runtime/os-unix.c:__cilkrts_hardware_cpu_count returning 1), I
see this test busy-loop as follows:

Breakpoint 1, __cilkrts_hardware_cpu_count () at 
../../../source/libcilkrts/runtime/os-unix.c:358
358 {
(gdb) return 1
Make __cilkrts_hardware_cpu_count return now? (y or n) y
#0  cilkg_get_user_settable_values () at 
../../../source/libcilkrts/runtime/global_state.cpp:385
385 CILK_ASSERT(hardware_cpu_count  0);
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
f0 (steal_flag=steal_flag@entry=0x7fffd03c) at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:9
9 while (!*steal_flag) 
(gdb) info threads
  Id   Target Id Frame 
* 1Thread 0x77fcd780 (LWP 30816) spawning_arg.ex f0 
(steal_flag=steal_flag@entry=0x7fffd03c) at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:9
(gdb) list
4
5   void f0(volatile int *steal_flag)
6   { 
7 int i = 0;
8 /* Wait for steal_flag to be set */
9 while (!*steal_flag) 
10  ;
11  }
12
13  int f1()
(gdb) bt
#0  f0 (steal_flag=steal_flag@entry=0x7fffd03c) at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:9
#1  0x004009c8 in _cilk_spn_0 () at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:17
#2  0x00400a4b in f1 () at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:17
#3  0x00400d0e in _cilk_spn_1 () at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:30
#4  0x00400d7a in f3 () at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:30
#5  0x00400e33 in main () at 
[...]/source/gcc/testsuite/c-c++-common/cilk-plus/CK/spawning_arg.c:35

No additional thread has been spawned by libcilkrts, and the one initial
thread is stuck in f0, without being able to make progress.  Should in
f0's while loop, some function be called to yield to libcilkrts
scheduler, or should libcilkrts have spawned an additional thread, or is
the test case just not valid Cilk Plus code?


Grüße,
 Thomas


pgpG_DU89ebEh.pgp
Description: PGP signature

Re: [debug-early] Allow checking of DECL_ABSTRACT in decl_ultimate_origin

2014-09-22 Thread Michael Matz

Hi,

On Fri, 19 Sep 2014, Aldy Hernandez wrote:

 Michael, I really don't understand the need for this change in your original
 patch.  I don't know if this was a temporary testing change or what.

I'm pretty sure it was temporary testing, when I still was finding my way 
through dwarf2out limitations/constraints.

 I'm happy to report that with this and the last set of patches, both C 
 and C++ guality tests have = regressions than mainline.  Yay.

Super.


Ciao,
Michael.

[PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition

2014-09-22 Thread Felix Yang

Hi,

I find that update_equiv_regs in ira.c sets the wrong EQUIV
reg-note for pseudo with more than one definiton in certain situation.
Here is a simplified RTL snippet after this function finishs handling:

 (insn 77 37 78 2 (set (reg:SI 171)
 (const_int 0 [0])) ticket151.c:33 52 {movsi_internal_dsp}
  (expr_list:REG_EQUAL (const_int 0 [0])
 (nil)))

..

(insn 79 50 53 2 (set (mem/c:SI (reg/f:SI 136) [2 g_728+0 S4 A64])
 (reg:SI 171)) ticket151.c:33 52 {movsi_internal_dsp}
  (expr_list:REG_DEAD (reg:SI 171)
 (nil)))
(insn 53 79 54 2 (set (mem/c:SI (reg/f:SI 162) [4 g_163+0 S4 A32])
 (reg:SI 163)) 52 {movsi_internal_dsp}
  (expr_list:REG_DEAD (reg:SI 163)
 (expr_list:REG_DEAD (reg/f:SI 162)
 (nil
(insn 54 53 14 2 (set (reg:SI 171)
 (mem/u/c:SI (symbol_ref/u:SI (*.LC8) [flags 0x2]) [4  S4
A32])) ticket151.c:49 52 {movsi_internal_dsp}
  (expr_list:REG_EQUIV (mem/u/c:SI (symbol_ref/u:SI (*.LC8)
[flags 0x2]) [4  S4 A32])
 (expr_list:REG_EQUAL (mem/u/c:SI (symbol_ref/u:SI (*.LC8)
[flags 0x2]) [4  S4 A32])
 (nil


The REG_EQUIV of insn 54 is not correct as pseudo 171 is defined
in insn 77 with a differerent value.
This may causes reload replacing pseudo 171 with mem/u/c:SI
(symbol_ref/u:SI (*.LC8), which is wrong.
A proposed patch for this issue, please comment:

 Index: gcc/ira.c
===
--- gcc/ira.c(revision 215460)
+++ gcc/ira.c(working copy)
@@ -3477,18 +3477,26 @@ update_equiv_regs (void)
   no_equiv (dest, set, NULL);
   continue;
 }
+
   /* Record this insn as initializing this register.  */
   reg_equiv[regno].init_insns
 = gen_rtx_INSN_LIST (VOIDmode, insn, reg_equiv[regno].init_insns);

   /* If this register is known to be equal to a constant, record that
  it is always equivalent to the constant.  */
-  if (DF_REG_DEF_COUNT (regno) == 1
-   note  ! rtx_varies_p (XEXP (note, 0), 0))
+  if (note  ! rtx_varies_p (XEXP (note, 0), 0))
 {
-  rtx note_value = XEXP (note, 0);
-  remove_note (insn, note);
-  set_unique_reg_note (insn, REG_EQUIV, note_value);
+  if (DF_REG_DEF_COUNT (regno) == 1)
+{
+  rtx note_value = XEXP (note, 0);
+  remove_note (insn, note);
+  set_unique_reg_note (insn, REG_EQUIV, note_value);
+}
+  else
+{
+  no_equiv (dest, set, NULL);
+  continue;
+}
 }

   /* If this insn introduces a constant register, decrease the priority


Cheers,
Felix

Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'

2014-09-22 Thread Michael Eager


On 09/21/14 21:10, Chen Gang wrote:

On 9/22/14 2:09, Michael Eager wrote:


Generally, you should use gcc to link programs, not ld.  gcc is
a driver which will select the appropriate libraries and support routines
(such as crt0.o, which contains _start) and pass them to the linker.



OK, thanks.

When gcc, it misses the root directory for crt1.o and crtn.o: e.g.
/lib/ld.so.1, crt1.o, crtn.o when gcc -v, but we need /upstream/
release/lib/ld.so.1, /upstream/lib/crt1.o, /upstream/libcrtn.o.


You likely need to build mb-gcc with --sysroot=/upstream.

How are you building gcc?  What are your configuration options?


--
Michael Eagerea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'

2014-09-22 Thread Michael Eager


On 09/21/14 20:55, Chen Gang wrote:



On 9/22/14 2:03, Michael Eager wrote:

On 09/20/14 23:24, Chen Gang wrote:

And it seems, we also need 'LinkScr.ld' for ldscript, could you share it
to me, thanks.

set_board_info ldscript -T/home/eager/Xilinx/dg/microblaze_0/LinkScr.ld


Hi Chen --

The DejaGNU configuration I provided is for a bare-metal environment.

If you are testing in a Linux environment, the tool chain you uses
should provide a default linker script which matches your hardware's
memory layout. You should not need to provide a separate linker script.



OK, thanks, I shall try to find the default linker script for it.


If you are running mb-gcc which generates executables which run on
the target, you do not need to provide a linker script.


--
Michael Eagerea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

Re: Move dwarf2 frame tables to read-only section for AIX

On Mon, 22 Sep 2014, Andrew Dixie wrote:

 I altered the dwarf2 frame and exception table generation so the
 decision on whether to use a read-only or read-write section is an
 independent decision from how the frame tables are registered.
 I renamed EH_FRAME_IN_DATA_SECTION to EH_FRAME_THROUGH_COLLECT2, as it
 now supports read-only, has slightly changed semantics, and I think
 this name better reflects what it currently does rather than what it
 historically did.

If you rename a target macro, the old target macro name needs to be 
poisoned in system.h.

 2014-09-22  Andrew Dixie  andr...@gentrack.com
 
   Move exception tables to read-only memory on AIX.
* dwarf2asm.c (dw2_asm_output_encoded_addr_rtx): Add call to
ASM_OUTPUT_DWARF_DATAREL.
* dwarf2out.c (switch_to_eh_frame_section): Use a read-only section
even if EH_FRAME_SECTION_NAME is undefined.  Add call to
EH_FRAME_THROUGH_COLLECT2.
* except.c (switch_to_exception_section):  Use a read-only section
even if EH_FRAME_SECTION_NAME is undefined.
   * collect2.c (write_c_file_stat): Provide dbase on AIX.
(scan_prog_file): Don't output __dso_handle nor __gcc_unwind_dbase.
* config/rs6000/aix.h (ASM_PREFERRED_EH_DATA_FORMAT): define.
(EH_TABLES_CAN_BE_READ_ONLY): define.
(ASM_OUTPUT_DWARF_PCREL): define.
(ASM_OUTPUT_DWARF_DATAREL): define.
(EH_FRAME_IN_DATA_SECTION): undefine.
(EH_FRAME_THROUGH_COLLECT2): define.
* config/rs6000/rs6000-aix.c: new file.
(rs6000_aix_asm_output_dwarf_pcrel): new function.
(rs6000_aix_asm_output_dwarf_datarel): new function.
* config/rs6000/rs6000.c (rs6000_xcoff_asm_init_sections): remove
assignment of exception_section.

This ChangeLog entry seems very incomplete.  It doesn't mention the 
changes for other architectures, or to defaults.h, or to the 
documentation, for example.

-- 
Joseph S. Myers
jos...@codesourcery.com

[patch] Fix std::try_lock behaviour


When I fixed std::try_lock a few years ago I misread the spec,
exceptions should not be caught and turned into a return value.

Tested x86_64-linux, committed to trunk.

commit 5effca670aa009c60e31b639604da4d00f388038
Author: Jonathan Wakely jwak...@redhat.com
Date:   Thu Sep 18 16:15:54 2014 +0100

	* include/std/mutex (try_lock): Do not swallow exceptions.
	* testsuite/30_threads/try_lock/4.cc: Fix test.

diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index f6b851c..d80fa5a 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -630,12 +630,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   int __idx;
   auto __locks = std::tie(__l1, __l2, __l3...);
-  __try
-  { __try_lock_impl0::__do_try_lock(__locks, __idx); }
-  __catch(const __cxxabiv1::__forced_unwind)
-  { __throw_exception_again; }
-  __catch(...)
-  { }
+  __try_lock_impl0::__do_try_lock(__locks, __idx);
   return __idx;
 }
 
diff --git a/libstdc++-v3/testsuite/30_threads/try_lock/4.cc b/libstdc++-v3/testsuite/30_threads/try_lock/4.cc
index 7741798..1212b65 100644
--- a/libstdc++-v3/testsuite/30_threads/try_lock/4.cc
+++ b/libstdc++-v3/testsuite/30_threads/try_lock/4.cc
@@ -133,8 +133,15 @@ void test03()
   while (unreliable_lock::throw_on  3)
   {
 unreliable_lock::count = 0;
-int failed = std::try_lock(l1, l2, l3);
-VERIFY( failed == unreliable_lock::throw_on );
+try
+  {
+std::try_lock(l1, l2, l3);
+VERIFY( false );
+  }
+catch (int e)
+  {
+VERIFY( e == unreliable_lock::throw_on );
+  }
 ++unreliable_lock::throw_on;
   }
 }

Re: [GOOGLE] Fix LIPO COMDAT fixup and gcov-tool interactions

2014-09-22 Thread Teresa Johnson

On Mon, Sep 22, 2014 at 1:36 AM, Nathan Sidwell nat...@acm.org wrote:
 On 09/21/14 18:58, Xinliang David Li wrote:

 the intent is that that points to the gcov_info object of the object file
 containing the live version of the function.  I couldn't quite get this
 to
 work though -- it involves emitting a function's gcov_fn_info decl in the
 same comdat group as the function itself.


 Another problem is that comdat functions may have different CFGs due
 to different early inline decisions. Comdatting gcov counters can lead
 to problems in profile use. Not comdatting profile counters have
 another advantage -- it allows context sensitive profiling for comdat
 function inline instances (IPA-inline).


 IIRC early inlining is done before the counters are created.  You're right
 later inlining may be a problem, and require a non-comdat set of cloned
 counters.   I can't recall exactly at what stage the counters are now
 inserted relative to inlining.  The CFG machinery had a number of
 significant changes while, and shortly after, I was working on this.

 You'll see the checking of gfi_ptr-key != gi_ptr in libgcov-driver.c.

 Are you making use of this machinery, or inventing new machinery?


 Teresa's method is a different machinery -- it tries to propagate
 profile data from the selected comdat copy + inline instance copies to
 comdat copies with zero counts.


 It'd be preferrable to complete the mechanism I outline above, rather than
 have a competing mechanism.

I don't think the above mechanism helps the problem my patches are
trying to solve. Unless we are in whole-program mode, which we don't
use, the only profiles available at profile-use time are those for the
given module (and any other modules in the same module group in LIPO
mode). If the COMDAT copy selected by the linker in the profile-gen
binary is in a different module, we would see all-zero counts when
compiling modules containing the other copies. I had submitted some
patches to trunk awhile back in the 4.9 time frame to help deal with
this by using estimated frequencies for zero-count COMDAT copies, and
applying scaled counts when we inline them, but it is an imperfect
solution.

The approach we now take for LIPO builds is to propagate the counts
for the profiled copy of the COMDAT to other modules. (Additionally
the indirect call profiling we perform in LIPO mode would point to a
module that we didn't have access to, which is a related issue that
the COMDAT fixups we perform at the end of the LIPO profiling run are
trying to solve.)

 Also, this patch  is in effect lying because
 the data then makes it look like the unselected comdat instances are in fact
 being executed -- looking at the whole program it's going to be harder to
 understand whether the different inline instances are being executed
 multiple times, or are duplicate data.  Does the gcov user output indicate
 this subtlety in some way?

Correct in that it makes it look like these copies were executed. This
was causing some issues when we rewrote/merged profiles with
gcov-tool, which essentially operates in whole-program mode. To handle
this, this patch marks the modified (previously all-zero) copies in
the gcda file. So now gcov-tool can handle them appropriately (clear
them on read before doing any analysis), and gcov-dump will flag them.
My patch does not do anything special for these routines when they are
read into the profile-use build, because we do want the propagated
counts during optimization. Possibly in whole-program mode they should
be cleared on read just as in gcov-tool, or they could be flagged in
some way for downstream phases, but it is not a compilation mode we
are using so I have not experimented.

Thanks,
Teresa


 nathan



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH] gcc parallel make check


On 09/12/2014 08:04 PM, Jakub Jelinek wrote:

I've been worried about the quick cases where
parallelization is not beneficial, like make check-gcc \
RUNTESTFLAGS=dg.exp=pr60123.c or similar, but one doesn't usually pass -jN
in that case.


I have -jN in my $MAKEFLAGS, so I've been running into this with my rgt 
shell function:


rgt ()
{
( cd ~/m/$CANON/gcc/gcc;
make check-c++ ${1:+RUNTESTFLAGS=$*} )
}

If I say 'rgt dg.exp=var-templ1.C' the actual test results are lost in 
the explosion of shell verbosity.  Could we add some '@'s to more of the 
rules, perhaps?


Jason

[patch] Update C++11 library status docs


This documents some more C++11 features as incomplete and notes that
iostreams are movable now.

Committed to trunk.
commit 8de61687f31c20aec35c02393c4fbb44ddcc0cbb
Author: Jonathan Wakely jwak...@redhat.com
Date:   Mon Sep 22 15:57:28 2014 +0100

	* doc/xml/manual/status_cxx2011.xml: Update C++11 status.
	* doc/xml/manual/status_cxx2014.xml: Update TS status.
	* doc/html/manual/status.html: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
index f0a256d..4433c89 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
@@ -600,16 +600,18 @@ particular release.
   entry/
 /row
 row
+  ?dbhtml bgcolor=#B0B0B0 ?
   entry20.6.12.3/entry
   entrycodeuninitialized_fill/code/entry
-  entryY/entry
-  entry/
+  entryPartial/entry
+  entryReturns codevoid/code../entry
 /row
 row
+  ?dbhtml bgcolor=#B0B0B0 ?
   entry20.6.12.4/entry
   entrycodeuninitialized_fill_n/code/entry
-  entryY/entry
-  entry/
+  entryPartial/entry
+  entryReturns codevoid/code../entry
 /row
 row
   entry20.6.13/entry
@@ -1183,10 +1185,11 @@ particular release.
   entry/
 /row
 row
+  ?dbhtml bgcolor=#B0B0B0 ?
   entry22.3.3.1/entry
   entryCharacter classification/entry
-  entryY/entry
-  entry/
+  entryPartial/entry
+  entryMissing codeisblank/code./entry
 /row
 row
   entry22.3.3.2/entry
@@ -1639,10 +1642,11 @@ particular release.
   entry/
 /row
 row
+  ?dbhtml bgcolor=#B0B0B0 ?
   entry25.3/entry
   entryMutating sequence operations/entry
-  entryY/entry
-  entry/
+  entryPartial/entry
+  entrycoderotate/code returns codevoid/code./entry
 /row
 row
   entry25.4/entry
@@ -2049,10 +2053,13 @@ particular release.
   entry/
 /row
 row
+  ?dbhtml bgcolor=#B0B0B0 ?
   entry26.8/entry
   entryC Library/entry
-  entryY/entry
-  entry/
+  entryPartial/entry
+  entrycodelt;ctgmathgt;/code doesn't include
+codelt;ccomplexgt;/code
+  /entry
 /row
 row
   entry
@@ -2129,7 +2136,6 @@ particular release.
   entryIostreams base classes/entry
   entryPartial/entry
   entry
-Missing move and swap operations on codebasic_ios/code.
 Missing codeio_errc/code and codeiostream_category/code.
 codeios_base::failure/code is not derived from codesystem_error/code.
 	Missing codeios_base::hexfloat/code.
@@ -2147,23 +2153,20 @@ particular release.
   entryFormatting and manipulators/entry
   entryPartial/entry
   entry
-Missing move and swap operations
 Missing codeget_time/code and codeput_time/code manipulators.
   /entry
 /row
 row
-  ?dbhtml bgcolor=#B0B0B0 ?
   entry27.8/entry
   entryString-based streams/entry
-  entryPartial/entry
-  entryMissing move and swap operations/entry
+  entryY/entry
+  entry/
 /row
 row
-  ?dbhtml bgcolor=#B0B0B0 ?
   entry27.9/entry
   entryFile-based streams/entry
-  entryPartial/entry
-  entryMissing move and swap operations/entry
+  entryY/entry
+  entry/
 /row
 row
   entry
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
index 82abd88..11254d6 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
@@ -402,18 +402,6 @@ not in any particular release.
 row
   ?dbhtml bgcolor=#C8B0B0 ?
   entry
-	link xmlns:xlink=http://www.w3.org/1999/xlink; xlink:href=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3923.pdf;
-	  N3923
-	/link
-  /entry
-  entryA SFINAE-Friendly std::iterator_traits/entry
-  entryN/entry
-  entryLibrary Fundamentals TS/entry
-/row
-
-row
-  ?dbhtml bgcolor=#C8B0B0 ?
-  entry
 	link xmlns:xlink=http://www.w3.org/1999/xlink; xlink:href=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3925.pdf;
 	  N3925
 	/link

Re: [PATCH] gcc parallel make check

On Mon, Sep 22, 2014 at 11:21:14AM -0400, Jason Merrill wrote:
 On 09/12/2014 08:04 PM, Jakub Jelinek wrote:
 I've been worried about the quick cases where
 parallelization is not beneficial, like make check-gcc \
 RUNTESTFLAGS=dg.exp=pr60123.c or similar, but one doesn't usually pass -jN
 in that case.
 
 I have -jN in my $MAKEFLAGS, so I've been running into this with my rgt
 shell function:
 
 rgt ()
 {
 ( cd ~/m/$CANON/gcc/gcc;
 make check-c++ ${1:+RUNTESTFLAGS=$*} )
 }
 
 If I say 'rgt dg.exp=var-templ1.C' the actual test results are lost in the
 explosion of shell verbosity.  Could we add some '@'s to more of the rules,
 perhaps?

I've been considering that too, but not sure what info people find valuable
and what they don't.

Jakub

Re: [PATCH, i386, Pointer Bounds Checker 32/x] Pointer Bounds Checker hooks for i386 target

On 19 Sep 18:21, Uros Bizjak wrote:
 On Fri, Sep 19, 2014 at 2:53 PM, Ilya Enkovich enkovich@gmail.com wrote:
 
   This patch adds i386 target hooks for Pointer Bounds Checker.
 
  New version with fixes and better documentation for ix86_load_bounds and 
  ix86_store_bounds is below.
 
  +/* Expand pass uses this hook to load bounds for function parameter
  +   PTR passed in SLOT in case its bounds are not passed in a register.
  +
  +   If SLOT is a memory, then bounds are loaded as for regular pointer
  +   loaded from memory.  PTR may be NULL in case SLOT is a memory.
  +   In such case value of PTR (if required) may be loaded from SLOT.
  +
  +   If SLOT is NULL or a register then SLOT_NO is an integer constant
  +   holding number of the target dependent special slot which should be
  +   used to obtain bounds.
  +
  +   Return loaded bounds.  */
 
 OK, I hope I understand this target-handling of SLOT_NO. Can you
 please clarify when SLOT is a register?

For functions with more than four pointers passed in registers we do not have 
enough bound registers to pass bounds.  These hooks are called then with SLOT 
set to register used to pass pointer

 
 I propose to write this function in the following (hopefully equivalent) way:

Since addr computation is very similar for both loading and storing (the only 
difference is usage of either arg_pointer_rtx or stack_pointer_rtx) I decided 
additionally move it into a separate function.  This should make functions 
simplier for understanding.

 
 --cut here--
 {
   if (!slot)
 {
   gcc_assert (CONST_INT_P (slot_no));
   addr = plus_constant (Pmode, arg_pointer_rtx,
 -(INTVAL (slot_no) + 1) * GET_MODE_SIZE (Pmode));
   gcc_assert (ptr);
 }
   else if (REG_P (slot))
 {
   gcc_assert (CONST_INT_P (slot_no));
   addr = plus_constant (Pmode, arg_pointer_rtx,
 -(INTVAL (slot_no) + 1) * GET_MODE_SIZE (Pmode));
   ptr = slot;
 }
   else if (MEM_P (slot))
 {
   addr = XEXP (slot, 0);
   if (!register_operand (addr, Pmode))
 addr = copy_addr_to_reg (addr);
 
   if (!ptr)
 ptr = copy_addr_to_reg (slot);
 }
   else
 gcc_unreachable ();
 
   if (!index_register_operand (ptr, Pmode))
 ptr = copy_addr_to_reg (ptr);
 
   ...
 }
 --cut here--
 
 Please add a comment in each of if/else, explaining what the code is
 doing. This is non-trivial to understand.
 
  +
  +static rtx
  +ix86_load_bounds (rtx slot, rtx ptr, rtx slot_no)
  +{
  +  rtx addr = NULL;
  +  rtx reg;
  +
  +  if (!ptr)
  +{
  +  gcc_assert (MEM_P (slot));
  +  ptr = copy_addr_to_reg (slot);
  +}
  +
  +  if (!slot || REG_P (slot))
  +{
  +  if (slot)
  +   ptr = slot;
  +
  +  gcc_assert (CONST_INT_P (slot_no));
  +
  +  /* Here we have the case when more than four pointers are
  +passed in registers.  In this case we are out of bound
  +registers and have to use bndldx to load bound.  RA,
  +RA - 8, etc. are used for address translation in bndldx.  */
  +  addr = plus_constant (Pmode, arg_pointer_rtx,
  +   -(INTVAL (slot_no) + 1) * GET_MODE_SIZE 
  (Pmode));
  +}
  +  else if (MEM_P (slot))
  +{
  +  addr = XEXP (slot, 0);
  +  if (!register_operand (addr, Pmode))
  +   addr = copy_addr_to_reg (addr);
  +}
  +  else
  +gcc_unreachable ();
  +
  +  if (!register_operand (ptr, Pmode))
  +ptr = copy_addr_to_reg (ptr);
  +
  +  reg = gen_reg_rtx (BNDmode);
  +  emit_insn (BNDmode == BND64mode
  +? gen_bnd64_ldx (reg, addr, ptr)
  +: gen_bnd32_ldx (reg, addr, ptr));
  +
  +  return reg;
  +}
  +
  +/* Expand pass uses this hook to store BOUNDS for call argument PTR
  +   passed in SLOT in case BOUNDS are not passed in a register.
  +
  +   If SLOT is a memory, then BOUNDS are stored as for regular pointer
  +   stored in memory.  PTR may be NULL in case SLOT is a memory.
  +   In such case value of PTR (if required) may be loaded from SLOT.
  +
  +   If SLOT is NULL or a register then SLOT_NO is an integer constant
  +   holding number of the target dependent special slot which should be
  +   used to store BOUNDS.  */
  +
  +static void
  +ix86_store_bounds (rtx ptr, rtx slot, rtx bounds, rtx slot_no)
 
 This function can be written in exactly the same way as the above
 proposed code, up to the check for ptr register_operand.
 
  +{
  +  rtx addr;
  +
  +  if (ptr)
  +{
  +  if (!register_operand (ptr, Pmode))
  +   ptr = copy_addr_to_reg (ptr);
  +}
  +  else
  +{
  +  gcc_assert (MEM_P (slot));
  +  ptr = copy_addr_to_reg (slot);
  +}
  +
  +  if (!slot || REG_P (slot))
  +{
  +  gcc_assert (CONST_INT_P (slot_no));
  +  addr = plus_constant (Pmode, stack_pointer_rtx,
  +   -(INTVAL (slot_no) + 1) * GET_MODE_SIZE 
  (Pmode));
  +}
  +  else if (MEM_P (slot))
  +addr = XEXP (slot, 0);

Re: [PATCH, i386, Pointer Bounds Checker 34/x] Vararg functions support

On 21 Sep 18:08, Uros Bizjak wrote:
 Hello!
 
  This patch introduces initialization of incoming bounds for vararg function 
  on i386 target.
 
  Bootstrapped and tested on linux-x86_64.
 
  Thanks,
  Ilya
  --
  gcc/
 
  2014-06-11  Ilya Enkovich  ilya.enkov...@intel.com
 
  * config/i386/i386.c (ix86_setup_incoming_varargs): New.
  (ix86_va_start): Initialize bounds for pointers in va_list.
  (TARGET_SETUP_INCOMING_VARARG_BOUNDS): New.
 
 OK with a couple of smallchanges below.
 
 Thanks,
 Uros.
 
  diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
  index a67e6e7..c520f26 100644
  --- a/gcc/config/i386/i386.c
  +++ b/gcc/config/i386/i386.c
  @@ -8456,6 +8456,72 @@ ix86_setup_incoming_varargs (cumulative_args_t 
  cum_v, enum machine_mode mode,
   setup_incoming_varargs_64 (next_cum);
   }
 
  +static void
  +ix86_setup_incoming_vararg_bounds (cumulative_args_t cum_v,
  +  enum machine_mode mode,
  +  tree type,
  +  int *pretend_size ATTRIBUTE_UNUSED,
  +  int no_rtl)
  +{
  +  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
  +  CUMULATIVE_ARGS next_cum;
  +  tree fntype;
  +  rtx save_area;
  +  int bnd_reg, i, max;
  +
  +  gcc_assert (!no_rtl);
  +
 
 Please add a comment for following condition.
 
  +  if (!TARGET_64BIT)
  +return;
  +
  +  fntype = TREE_TYPE (current_function_decl);
  +
  +  /* For varargs, we do not want to skip the dummy va_dcl argument.
  + For stdargs, we do want to skip the last named argument.  */
  +  next_cum = *cum;
  +  if (stdarg_p (fntype))
  +ix86_function_arg_advance (pack_cumulative_args (next_cum), mode, 
  type,
  +  true);
  +  if (cum-call_abi == MS_ABI)
  +return;
 
 Put this early exit at the top of the function.
 
  +  save_area = frame_pointer_rtx;
  +
  +  max = cum-regno + cfun-va_list_gpr_size / UNITS_PER_WORD;
  +  if (max  X86_64_REGPARM_MAX)
  +max = X86_64_REGPARM_MAX;
  +
  +  bnd_reg = cum-bnd_regno + cum-force_bnd_pass;
  +  if (chkp_function_instrumented_p (current_function_decl))
  +for (i = cum-regno; i  max; i++)
  +  {
  +   rtx addr = plus_constant (Pmode, save_area, i * UNITS_PER_WORD);
  +   rtx reg = gen_rtx_REG (DImode,
  +  x86_64_int_parameter_registers[i]);
  +   rtx ptr = reg;
  +   rtx bounds;
  +
  +   if (bnd_reg = LAST_BND_REG)
  + bounds = gen_rtx_REG (BNDmode, bnd_reg);
  +   else
  + {
  +   rtx ldx_addr = plus_constant (Pmode, arg_pointer_rtx,
  + (LAST_BND_REG - bnd_reg) * 8);
 
 No magic constants!
 
  +   bounds = gen_reg_rtx (BNDmode);
  +   emit_insn (TARGET_64BIT
  +  ? gen_bnd64_ldx (bounds, ldx_addr, ptr)
  +  : gen_bnd32_ldx (bounds, ldx_addr, ptr));
  + }
  +
  +   emit_insn (TARGET_64BIT
  +  ? gen_bnd64_stx (addr, ptr, bounds)
  +  : gen_bnd32_stx (addr, ptr, bounds));
 
 Please check BNDmode instead of TARGET_64BIT.
 
  +   bnd_reg++;
  +  }
  +}
  +
  +
   /* Checks if TYPE is of kind va_list char *.  */
 
   static bool
  @@ -8478,7 +8544,7 @@ ix86_va_start (tree valist, rtx nextarg)
   {
 HOST_WIDE_INT words, n_gpr, n_fpr;
 tree f_gpr, f_fpr, f_ovf, f_sav;
  -  tree gpr, fpr, ovf, sav, t;
  +  tree gpr, fpr, ovf, sav, t, t1;
 tree type;
 rtx ovf_rtx;
 
  @@ -8529,6 +8595,13 @@ ix86_va_start (tree valist, rtx nextarg)
 crtl-args.arg_offset_rtx,
 NULL_RTX, 0, OPTAB_LIB_WIDEN);
convert_move (va_r, next, 0);
  +
  + /* Store zero bounds for va_list.  */
  + if (chkp_function_instrumented_p (current_function_decl))
  +   chkp_expand_bounds_reset_for_mem (valist,
  + make_tree (TREE_TYPE (valist),
  +next));
  +
  }
 return;
   }
  @@ -8582,10 +8655,15 @@ ix86_va_start (tree valist, rtx nextarg)
 t = make_tree (type, ovf_rtx);
 if (words != 0)
   t = fold_build_pointer_plus_hwi (t, words * UNITS_PER_WORD);
  +  t1 = t;
 t = build2 (MODIFY_EXPR, type, ovf, t);
 TREE_SIDE_EFFECTS (t) = 1;
 expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
 
  +  /* Store zero bounds for overflow area pointer.  */
  +  if (chkp_function_instrumented_p (current_function_decl))
  +chkp_expand_bounds_reset_for_mem (ovf, t1);
 
 Can you please move this above to avoid another temporary (t1)?
 
 if (ix86_varargs_gpr_size || ix86_varargs_fpr_size)
   {
 /* Find the register save area.
  @@ -8594,9 +8672,14 @@ ix86_va_start (tree valist, rtx nextarg)
 t = make_tree (type, frame_pointer_rtx);
 if

Re: Fix ICE with ODR mering and variable sized types

2014-09-22 Thread Jan Hubicka

 On Fri, Sep 19, 2014 at 8:55 PM, Jan Hubicka hubi...@ucw.cz wrote:
  Hi,
  this patch fixes ICE by avoiding mangling of types with variadic size 
  (those are
  not really supported).  Bootstrapped/regtested x86_64-linux, tested with 
  libreoffice,
  comitted.
 
 Hmm, but how do global vars end up having variadic type?  Isn't the
 bug that you are ending up with some local entity here?

We call this on TYPE_NAME of all types, not only global vars.  I do not think I 
should
skip all types with function context, because static variables may have them 
(and I do not
track where type comes from because ODR violation is caused even by types not 
used
in global decls).  For example:

inline
int test()
{
  struct A {int a,b;};
  static struct A testA;
  return testA.a++;
}

creates type A that is local and should be merged interprocedurally I think.

Variadic types indeed can not appear in global declarations, so I think it is 
safe to ignore them.
I am adding Jason to CC, perhaps he knows better.

Honza
 
 Richard.
 
  PR lto/63286
  * tree.c (need_assembler_name_p): Do not mangle variadic types.
  Index: tree.c
  ===
  --- tree.c  (revision 215328)
  +++ tree.c  (working copy)
  @@ -5003,6 +5003,7 @@ need_assembler_name_p (tree decl)
  decl == TYPE_NAME (TREE_TYPE (decl))
  !is_lang_specific (TREE_TYPE (decl))
  AGGREGATE_TYPE_P (TREE_TYPE (decl))
  +   !variably_modified_type_p (TREE_TYPE (decl), NULL_TREE)
  !type_in_anonymous_namespace_p (TREE_TYPE (decl)))
   return !DECL_ASSEMBLER_NAME_SET_P (decl);
 /* Only FUNCTION_DECLs and VAR_DECLs are considered.  */

Re: [PATCH, i386, Pointer Bounds Checker 2/x] Intel Memory Protection Extensions (MPX) instructions support

On 17 Sep 12:17, Ilya Enkovich wrote:
 On 19 May 11:23, Jeff Law wrote:
  On 05/19/14 02:19, Ilya Enkovich wrote:
  On 16 May 13:39, Jeff Law wrote:
  On 04/16/14 05:35, Ilya Enkovich wrote:
  Hi,
  
  This patch introduces Intel MPX bound registers and instructions.  It 
  was approved earlier for 4.9 and had no significant changes since then.  
  I'll assume patch is OK if no objections arise.
  
  Patch was bootstrapped and tested for linux-x86_64.
  OK for the trunk.  Please wait until entire set is approved before
  installing.
  
  jeff
  
 
 Here is an updated version.  The only change is in mode_ldx expand.  It now 
 has the second operand preparation code moved from ix86_expand_builtin as was 
 proposed by Uros.
 
 Thanks,
 Ilya

Added similar operand handling into mode_stx expand.

Thanks,
Ilya
diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 8e0a583..4e07d70 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;   H
-;;;   h jw  z
+;;;   h j   z
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -94,6 +94,9 @@
 (define_register_constraint v TARGET_SSE ? ALL_SSE_REGS : NO_REGS
  Any EVEX encodable SSE register (@code{%xmm0-%xmm31}).)
 
+(define_register_constraint w TARGET_MPX ? BND_REGS : NO_REGS
+ @internal Any bound register.)
+
 ;; We use the Y prefix to denote any number of conditional register sets:
 ;;  z  First SSE register.
 ;;  i  SSE2 inter-unit moves to SSE register enabled
@@ -253,6 +256,8 @@
 ;; T prefix is used for different address constraints
 ;;   v - VSIB address
 ;;   s - address with no segment register
+;;   i - address with no index and no rip
+;;   b - address with no base and no rip
 
 (define_address_constraint Tv
   VSIB address operand
@@ -261,3 +266,11 @@
 (define_address_constraint Ts
   Address operand without segment register
   (match_operand 0 address_no_seg_operand))
+
+(define_address_constraint Ti
+  MPX address operand without index
+  (match_operand 0 address_mpx_no_index_operand))
+
+(define_address_constraint Tb
+  MPX address operand without base
+  (match_operand 0 address_mpx_no_base_operand))
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 2c05cec..a1e9289 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -399,6 +399,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 def_or_undef (parse_in, __XSAVEC__);
   if (isa_flag  OPTION_MASK_ISA_XSAVES)
 def_or_undef (parse_in, __XSAVES__);
+  if (isa_flag  OPTION_MASK_ISA_MPX)
+def_or_undef (parse_in, __MPX__);
 }
 
 
diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index 07e5720..0e302e3 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -87,6 +87,9 @@ VECTOR_MODE (INT, DI, 1); /*   V1DI */
 VECTOR_MODE (INT, SI, 1); /*   V1SI */
 VECTOR_MODE (INT, QI, 2); /*   V2QI */
 
+POINTER_BOUNDS_MODE (BND32, 8);
+POINTER_BOUNDS_MODE (BND64, 16);
+
 INT_MODE (OI, 32);
 INT_MODE (XI, 64);
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 39462bd..c8ef2d2 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -231,6 +231,8 @@ extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_abs (rtx, rtx);
 
+extern bool ix86_bnd_prefixed_insn_p (rtx);
+
 /* In i386-c.c  */
 extern void ix86_target_macros (void);
 extern void ix86_register_pragmas (void);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 929f1b1..01823ca 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2131,6 +2131,8 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   /* Mask registers.  */
   MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
   MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  /* MPX bound registers */
+  BND_REGS, BND_REGS, BND_REGS, BND_REGS,
 };
 
 /* The default register map used in 32bit mode.  */
@@ -2147,6 +2149,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,   /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,   /* AVX-512 registers 24-31*/
   93, 94, 95, 96, 97, 98, 99, 100,  /* Mask registers */
+  101, 102, 103, 104,  /* bound registers */
 };
 
 /* The default register map used in 64bit mode.  */
@@ -2163,6 +2166,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   67, 68, 69, 70, 71, 72, 73, 74,   /* AVX-512 registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,   /* AVX-512 registers 24-31 */
   118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */
+  126, 127, 128, 129,  /* bound registers */
 };
 
 /* Define the register

Re: [PATCH] gcc parallel make check


On 09/22/2014 11:26 AM, Jakub Jelinek wrote:

On Mon, Sep 22, 2014 at 11:21:14AM -0400, Jason Merrill wrote:

If I say 'rgt dg.exp=var-templ1.C' the actual test results are lost in the
explosion of shell verbosity.  Could we add some '@'s to more of the rules,
perhaps?


I've been considering that too, but not sure what info people find valuable
and what they don't.


I don't see much information in the ~128 repetitions of the 
check-parallel rules with different numbers; the actual runtest command 
is the same in all of them.  Adding @ to all of the commands of the 
check-parallel-% rule makes things much better for me:


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 6f251a5..be4c840 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3674,10 +3674,10 @@ $(lang_checks_parallelized): check-% : site.exp
 	fi
 
 check-parallel-% : site.exp
-	-test -d plugin || mkdir plugin
-	-test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
-	test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir)
-	-(rootme=`${PWD_COMMAND}`; export rootme; \
+	-@test -d plugin || mkdir plugin
+	-@test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
+	@test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir)
+	-@(rootme=`${PWD_COMMAND}`; export rootme; \
 	srcdir=`cd ${srcdir}; ${PWD_COMMAND}` ; export srcdir ; \
 	if [ -n $(check_p_subno) ] \
 	[ -n $$GCC_RUNTEST_PARALLELIZE_DIR ] \

Re: [debug-early] Allow checking of DECL_ABSTRACT in decl_ultimate_origin

2014-09-22 Thread Aldy Hernandez


On 09/22/14 08:17, Michael Matz wrote:

Hi,

On Fri, 19 Sep 2014, Aldy Hernandez wrote:


Michael, I really don't understand the need for this change in your original
patch.  I don't know if this was a temporary testing change or what.


I'm pretty sure it was temporary testing, when I still was finding my way
through dwarf2out limitations/constraints.


Ah perfect!




I'm happy to report that with this and the last set of patches, both C
and C++ guality tests have = regressions than mainline.  Yay.


Super.


Again, thank you for your original patchset, which has immensely helped 
me navigate my way around the black box which was/is dwarf2out.


Aldy

Re: [PATCH] gcc parallel make check

2014-09-22 Thread Segher Boessenkool

On Mon, Sep 22, 2014 at 05:26:04PM +0200, Jakub Jelinek wrote:
 I've been considering that too, but not sure what info people find valuable
 and what they don't.

The ten million Running blablablalba.exp ... messages on a very parallel
run aren't helpful in my opinion.  There might be more but that drowns out
everything else :-)


Segher

Re: [PATCH] msp430: inhibit automatic link of -lnosys in absence of -msim

2014-09-22 Thread Nicholas Clifton


Hi Peter,

 gcc/ChangeLog
 2014-09-22  Peter A. Bigot  p...@pabigot.com

* config/msp430/msp430.h: Remove automatic -lnosys when -msim absent.

Approved and applied.

Cheers
  Nick

[PATCH][AArch64] LR register not used in leaf functions

2014-09-22 Thread Kugan

AArch64 has the same issue ARM had where the LR register was not used in
leaf functions. This was reported in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. In AArch64, this
test-case need to be added with more live ranges for the need for the
LR_REGNUM. i.e test-case in the PR needs additional loops up to r31 for
the case AArch64 to see this.

The same fix (from the thread
https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went
into ARM should apply to AArch64 as well. Regression tested on qemu for
aarch64-none-linux-gnu with no new regressions. Is this OK for trunk?

Thanks,
Kugan


gcc/ChangeLog:

2014-09-23  Kugan Vivekanandarajah  kug...@linaro.org

* config/aarch64/aarch64.h (EPILOGUE_USES): Return true only after
epilogue_completed is true.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index db950da..b3e4585 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -309,7 +309,7 @@ extern unsigned long aarch64_tune_flags;
considered live at the start of the called function.  */
 
 #define EPILOGUE_USES(REGNO) \
-  ((REGNO) == LR_REGNUM)
+  (epilogue_completed  (REGNO) == LR_REGNUM)
 
 /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function,
the stack pointer does not matter.  The value is tested only in

Re: [PATCH] gcc parallel make check

On Mon, Sep 22, 2014 at 10:44:06AM -0500, Segher Boessenkool wrote:
 On Mon, Sep 22, 2014 at 05:26:04PM +0200, Jakub Jelinek wrote:
  I've been considering that too, but not sure what info people find valuable
  and what they don't.
 
 The ten million Running blablablalba.exp ... messages on a very parallel
 run aren't helpful in my opinion.  There might be more but that drowns out
 everything else :-)

It has some value, it shows the actual progress.  Sure, you can just watch
the *.log files as they are populated and get better picture.  I think the
Running *.exp messages go from dejagnu, not from gcc testsuite changes.

Jakub

Re: Speedup int_bit_from_pos

2014-09-22 Thread Jan Hubicka

 On Sun, 21 Sep 2014, Jan Hubicka wrote:
 
   
   Please omit static from inline functions.
  
  Yep, I suppose we want to drop static in all inlines? I can make patch for 
  that.
   
   Also one notable difference with your patches is that the fits hwi is now 
   not tested on the result but on the result input which, multiplied by 8, 
   might not fit a hwi now.  So please use wide-ints here (the to_offset 
   flavor).
  
  The function must always suceed (so user promise it will fit in HWI) and for
  performance reasons I would rather not go into wide int by defualt, but I 
  can
  do that with checking enabled.
 
 wide-int should be fast enough, please use it.

Like this?

Index: tree.h
===
--- tree.h  (revision 215421)
+++ tree.h  (working copy)
@@ -3877,10 +3877,20 @@ extern tree size_in_bytes (const_tree);
 extern HOST_WIDE_INT int_size_in_bytes (const_tree);
 extern HOST_WIDE_INT max_int_size_in_bytes (const_tree);
 extern tree bit_position (const_tree);
-extern HOST_WIDE_INT int_bit_position (const_tree);
 extern tree byte_position (const_tree);
 extern HOST_WIDE_INT int_byte_position (const_tree);
 
+/* Like bit_position, but return as an integer.  It must be representable in
+   that way (since it could be a signed value, we don't have the
+   option of returning -1 like int_size_in_byte can.  */
+
+static inline HOST_WIDE_INT int_bit_position (const_tree field)
+{ 
+  return ((wide_int)DECL_FIELD_OFFSET (field) * BITS_PER_UNIT
+ + (wide_int)DECL_FIELD_BIT_OFFSET (field)).to_shwi ();
+}
+
+
 #define sizetype sizetype_tab[(int) stk_sizetype]
 #define bitsizetype sizetype_tab[(int) stk_bitsizetype]
 #define ssizetype sizetype_tab[(int) stk_ssizetype]

Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'

2014-09-22 Thread Chen Gang

On 09/22/2014 10:45 PM, Michael Eager wrote:
On 09/21/14 21:10, Chen Gang wrote:
On 9/22/14 2:09, Michael Eager wrote:

Generally, you should use gcc to link programs, not ld. gcc is
a driver which will select the appropriate libraries and support routines
(such as crt0.o, which contains _start) and pass them to the linker.

OK, thanks.

When gcc, it misses the root directory for crt1.o and crtn.o: e.g.
/lib/ld.so.1, crt1.o, crtn.o when gcc -v, but we need /upstream/
release/lib/ld.so.1, /upstream/lib/crt1.o, /upstream/libcrtn.o.

You likely need to build mb-gcc with --sysroot=/upstream.

OK, thanks! I guess it will solve all issues which I met, and I shall
try next.

How are you building gcc? What are your configuration options?

The related information is below, please help check when you have time,
thanks.

[root@localhost ~]# /upstream/release/bin/microblaze-gchen-linux-gcc -v
Using built-in specs.
COLLECT_GCC=/upstream/release/bin/microblaze-gchen-linux-gcc
COLLECT_LTO_WRAPPER=/upstream/release/libexec/gcc/microblaze-gchen-linux/5.0.0/lto-wrapper
Target: microblaze-gchen-linux
Configured with: ../gcc/configure --target=microblaze-gchen-linux --disable-nls
--enable-languages=c --disable-threads --disable-shared --without-headers
--disable-libssp --disable-libquadmath --disable-libgomp --disable-libatomic
--prefix=/upstream/release
Thread model: single
gcc version 5.0.0 20140920 (experimental) (GCC)
[root@localhost ~]#

Thanks.
--
Chen Gang

Open share and attitude like air water and life which God blessed

Re: [PATCH] gcc parallel make check

On Mon, Sep 22, 2014 at 11:43:35AM -0400, Jason Merrill wrote:
 On 09/22/2014 11:26 AM, Jakub Jelinek wrote:
 On Mon, Sep 22, 2014 at 11:21:14AM -0400, Jason Merrill wrote:
 If I say 'rgt dg.exp=var-templ1.C' the actual test results are lost in the
 explosion of shell verbosity.  Could we add some '@'s to more of the rules,
 perhaps?
 
 I've been considering that too, but not sure what info people find valuable
 and what they don't.
 
 I don't see much information in the ~128 repetitions of the check-parallel
 rules with different numbers; the actual runtest command is the same in all
 of them.  Adding @ to all of the commands of the check-parallel-% rule makes
 things much better for me:

LGTM (though, supposedly we want similar change in
libstdc++-v3/testsuite/Makefile.am).
Or, if people would really like to see the commands, we could print them
just once, using e.g.
-$(if $(check_p_subno),@)(rootme= ...
(then e.g. check-parallel-gcc goal would print the command, but
check-parallel-gcc-1 or check-parallel-gcc-112 would not).

 --- a/gcc/Makefile.in
 +++ b/gcc/Makefile.in
 @@ -3674,10 +3674,10 @@ $(lang_checks_parallelized): check-% : site.exp
   fi
  
  check-parallel-% : site.exp
 - -test -d plugin || mkdir plugin
 - -test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
 - test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir 
 $(TESTSUITEDIR)/$(check_p_subdir)
 - -(rootme=`${PWD_COMMAND}`; export rootme; \
 + -@test -d plugin || mkdir plugin
 + -@test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
 + @test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir 
 $(TESTSUITEDIR)/$(check_p_subdir)
 + -@(rootme=`${PWD_COMMAND}`; export rootme; \
   srcdir=`cd ${srcdir}; ${PWD_COMMAND}` ; export srcdir ; \
   if [ -n $(check_p_subno) ] \
   [ -n $$GCC_RUNTEST_PARALLELIZE_DIR ] \


Jakub

Re: [PATCH][AArch64] LR register not used in leaf functions

2014-09-22 Thread Jiong Wang


On 22/09/14 16:43, Kugan wrote:


AArch64 has the same issue ARM had where the LR register was not used in
leaf functions. This was reported in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. In AArch64, this
test-case need to be added with more live ranges for the need for the
LR_REGNUM. i.e test-case in the PR needs additional loops up to r31 for
the case AArch64 to see this.

The same fix (from the thread
https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went
into ARM should apply to AArch64 as well. Regression tested on qemu for
aarch64-none-linux-gnu with no new regressions. Is this OK for trunk?

This still be a partial fix. LR should be a caller-saved register free
to use in case it's saved properly to across function call.

I had a very similar patch to this sitting in my local tree and under
various benchmark analysis.

-- Jiong


Thanks,
Kugan


gcc/ChangeLog:

2014-09-23  Kugan Vivekanandarajah  kug...@linaro.org

* config/aarch64/aarch64.h (EPILOGUE_USES): Return true only after
epilogue_completed is true.

Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'

2014-09-22 Thread Chen Gang

On 09/22/2014 10:46 PM, Michael Eager wrote:
 On 09/21/14 20:55, Chen Gang wrote:


 On 9/22/14 2:03, Michael Eager wrote:
 On 09/20/14 23:24, Chen Gang wrote:
 And it seems, we also need 'LinkScr.ld' for ldscript, could you share it
 to me, thanks.

 set_board_info ldscript 
 -T/home/eager/Xilinx/dg/microblaze_0/LinkScr.ld

 Hi Chen --

 The DejaGNU configuration I provided is for a bare-metal environment.

 If you are testing in a Linux environment, the tool chain you uses
 should provide a default linker script which matches your hardware's
 memory layout. You should not need to provide a separate linker script.


 OK, thanks, I shall try to find the default linker script for it.
 
 If you are running mb-gcc which generates executables which run on
 the target, you do not need to provide a linker script.
 

OK, thanks. I shall remove it, next.

Thanks.
-- 
Chen Gang

Open share and attitude like air water and life which God blessed

[jit] Improvements to documentation

2014-09-22 Thread David Malcolm

Committed to branch dmalcolm/jit

As before, an HTML version of the docs can be seen at:
 https://dmalcolm.fedorapeople.org/gcc/libgccjit-api-docs/index.html

with the bulk of the changes occurring to:
 https://dmalcolm.fedorapeople.org/gcc/libgccjit-api-docs/intro/tutorial03.html

gcc/jit/ChangeLog.jit:
* docs/_build/texinfo/libgccjit.texi: Regenerate.
* docs/intro/install.rst: Reduce width of listing.
* docs/intro/tutorial01.rst: Use libgccjit.h rather than
libgccjit.h when including the header.
* docs/intro/tutorial02.rst: Likewise.
* docs/intro/tutorial03.rst: Clarify various sections; show
effect of reducing optimization level down from 3 to 2.
(Putting it all together): Move to above...
(Behind the curtain: optimizing away stack manipulation):
...this, and rename this to...
(Behind the curtain: How does our code get optimized?): ...and
add more detail, and discussion of elimination of tail recursion.
---
 gcc/jit/ChangeLog.jit  |  15 +
 gcc/jit/docs/_build/texinfo/libgccjit.texi | 875 +
 gcc/jit/docs/intro/install.rst |  11 +-
 gcc/jit/docs/intro/tutorial01.rst  |   2 +-
 gcc/jit/docs/intro/tutorial02.rst  |   2 +-
 gcc/jit/docs/intro/tutorial03.rst  | 533 +++---
 6 files changed, 1141 insertions(+), 297 deletions(-)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index 8e546e6..14576f2 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,3 +1,18 @@
+2014-09-22  David Malcolm  dmalc...@redhat.com
+
+   * docs/_build/texinfo/libgccjit.texi: Regenerate.
+   * docs/intro/install.rst: Reduce width of listing.
+   * docs/intro/tutorial01.rst: Use libgccjit.h rather than
+   libgccjit.h when including the header.
+   * docs/intro/tutorial02.rst: Likewise.
+   * docs/intro/tutorial03.rst: Clarify various sections; show
+   effect of reducing optimization level down from 3 to 2.
+   (Putting it all together): Move to above...
+   (Behind the curtain: optimizing away stack manipulation):
+   ...this, and rename this to...
+   (Behind the curtain: How does our code get optimized?): ...and
+   add more detail, and discussion of elimination of tail recursion.
+
 2014-09-19  David Malcolm  dmalc...@redhat.com
 
* TODO.rst: Add detection of uninitialized variables, since
diff --git a/gcc/jit/docs/_build/texinfo/libgccjit.texi 
b/gcc/jit/docs/_build/texinfo/libgccjit.texi
index 985b22c..850adf2 100644
--- a/gcc/jit/docs/_build/texinfo/libgccjit.texi
+++ b/gcc/jit/docs/_build/texinfo/libgccjit.texi
@@ -19,7 +19,7 @@
 
 @copying
 @quotation
-libgccjit 0.1, September 19, 2014
+libgccjit 0.1, September 22, 2014
 
 David Malcolm
 
@@ -131,8 +131,13 @@ Tutorial part 3: Adding JIT-compilation to a toy 
interpreter
 * Compiling the context:: 
 * Single-stepping through the generated code:: 
 * Examining the generated code:: 
-* Behind the curtain; optimizing away stack manipulation: Behind the curtain 
optimizing away stack manipulation. 
 * Putting it all together:: 
+* Behind the curtain; How does our code get optimized?: Behind the curtain How 
does our code get optimized?. 
+
+Behind the curtain: How does our code get optimized?
+
+* Optimizing away stack manipulation:: 
+* Elimination of tail recursion:: 
 
 Topic Reference
 
@@ -259,8 +264,13 @@ Tutorial part 3: Adding JIT-compilation to a toy 
interpreter
 * Compiling the context:: 
 * Single-stepping through the generated code:: 
 * Examining the generated code:: 
-* Behind the curtain; optimizing away stack manipulation: Behind the curtain 
optimizing away stack manipulation. 
 * Putting it all together:: 
+* Behind the curtain; How does our code get optimized?: Behind the curtain How 
does our code get optimized?. 
+
+Behind the curtain: How does our code get optimized?
+
+* Optimizing away stack manipulation:: 
+* Elimination of tail recursion:: 
 
 @end menu
 
@@ -321,12 +331,13 @@ needed to develop against it (@cite{libgccjit-devel}):
 
 @example
 $ rpm -qlv libgccjit
-lrwxrwxrwx1 rootroot   18 Aug 12 07:56 
/usr/lib64/libgccjit.so.0 - libgccjit.so.0.0.1
--rwxr-xr-x1 rootroot 14463448 Aug 12 07:57 
/usr/lib64/libgccjit.so.0.0.1
+lrwxrwxrwx1 rootroot   18 Aug 12 07:56 /usr/lib64/libgccjit.so.0 
- libgccjit.so.0.0.1
+-rwxr-xr-x1 rootroot 14463448 Aug 12 07:57 
/usr/lib64/libgccjit.so.0.0.1
+
 $ rpm -qlv libgccjit-devel
--rwxr-xr-x1 rootroot37654 Aug 12 07:56 
/usr/include/libgccjit++.h
--rwxr-xr-x1 rootroot28967 Aug 12 07:56 
/usr/include/libgccjit.h
-lrwxrwxrwx1 rootroot   14 Aug 12 07:56 
/usr/lib64/libgccjit.so - libgccjit.so.0
+-rwxr-xr-x1 rootroot37654 Aug 12 07:56 /usr/include/libgccjit++.h
+-rwxr-xr-x1 rootroot

Re: [AArch64] Auto-generate the BUILTIN_ macros for aarch64-builtins.c

2014-09-22 Thread Richard Earnshaw

On 22/09/14 14:30, James Greenhalgh wrote:
 
 On Thu, Sep 18, 2014 at 11:12:15AM +0100, Richard Earnshaw wrote:
 On 18/09/14 10:53, James Greenhalgh wrote:
 +$(srcdir)/config/aarch64/aarch64-builtin-iterators.h: 
 $(srcdir)/config/aarch64/geniterators.sh \
 + $(srcdir)/config/aarch64/iterators.md
 + $(SHELL) $(srcdir)/config/aarch64/geniterators.sh \
 + $(srcdir)/config/aarch64/iterators.md  \
 + $(srcdir)/config/aarch64/aarch64-builtin-iterators.h
 +
  aarch-common.o: $(srcdir)/config/arm/aarch-common.c $(CONFIG_H) 
 $(SYSTEM_H) \
  coretypes.h $(TM_H) $(TM_P_H) $(RTL_H) $(TREE_H) output.h $(C_COMMON_H)
   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \


 Is there any real need to write this into the source directory and have
 the built file checked in?  Ie. can't we always write to the build
 directory and use it from there.  That avoids problems if the sources
 are on a read-only filesystem.

 If we do need to leave it in the sources, then contrib/update_gcc should
 be taught how to touch the generated file when resyncing from the
 repositories.

 
 I thought I had tried this and failed to make it work. I must not have
 been trying hard enough at the time.
 
 Updated as attached, generating the header in the build directory. It
 looks much better this way!
 
 Bootstrapped on aarch64-none-linux-gnueabi with no issues.
 
 Ok?
 
 Thanks,
 James
 
 ---
 gcc/
 
 2014-09-22  James Greenhalgh  james.greenha...@arm.com
 
   * config/aarch64/geniterators.sh: New.
   * config/aarch64/iterators.md (VDQF_DF): New.
   * config/aarch64/t-aarch64: Generate aarch64-builtin-iterators.h.
   * config/aarch64/aarch64-builtins.c (BUILTIN_*) Remove.
 
 

OK.

R.

Re: [PATCH] gcc parallel make check

2014-09-22 Thread Segher Boessenkool

On Mon, Sep 22, 2014 at 05:49:12PM +0200, Jakub Jelinek wrote:
 On Mon, Sep 22, 2014 at 10:44:06AM -0500, Segher Boessenkool wrote:
  On Mon, Sep 22, 2014 at 05:26:04PM +0200, Jakub Jelinek wrote:
   I've been considering that too, but not sure what info people find 
   valuable
   and what they don't.
  
  The ten million Running blablablalba.exp ... messages on a very parallel
  run aren't helpful in my opinion.  There might be more but that drowns out
  everything else :-)
 
 It has some value, it shows the actual progress.  Sure, you can just watch
 the *.log files as they are populated and get better picture.  I think the
 Running *.exp messages go from dejagnu, not from gcc testsuite changes.

Hrm.  Looking at the log files it seems there are not more of those messages
at all since the changes.  Maybe it just all got too fast! :-)


Segher

Re: [PATCH] gcc parallel make check


On 09/22/2014 11:58 AM, Jakub Jelinek wrote:

LGTM (though, supposedly we want similar change in
libstdc++-v3/testsuite/Makefile.am).
Or, if people would really like to see the commands, we could print them
just once, using e.g.
-$(if $(check_p_subno),@)(rootme= ...
(then e.g. check-parallel-gcc goal would print the command, but
check-parallel-gcc-1 or check-parallel-gcc-112 would not).


So, like this?



commit c750897381a3f936e27cabd825cfa85ce936a6a9
Author: Jason Merrill ja...@redhat.com
Date:   Mon Sep 22 11:44:00 2014 -0400

gcc/
	* Makefile.in (check-parallel-%): Add @.
libstdc++-v3/
	* testsuite/Makefile.am (%/site.exp): Add @.
	(check-DEJAGNU): Likewise.
	* testsuite/Makefile.in: Regenerate.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 6f251a5..97b439a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3674,10 +3674,10 @@ $(lang_checks_parallelized): check-% : site.exp
 	fi
 
 check-parallel-% : site.exp
-	-test -d plugin || mkdir plugin
-	-test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
-	test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir)
-	-(rootme=`${PWD_COMMAND}`; export rootme; \
+	-@test -d plugin || mkdir plugin
+	-@test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
+	@test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir $(TESTSUITEDIR)/$(check_p_subdir)
+	-$(if $(check_p_subno),@)(rootme=`${PWD_COMMAND}`; export rootme; \
 	srcdir=`cd ${srcdir}; ${PWD_COMMAND}` ; export srcdir ; \
 	if [ -n $(check_p_subno) ] \
 	[ -n $$GCC_RUNTEST_PARALLELIZE_DIR ] \
diff --git a/libstdc++-v3/testsuite/Makefile.am b/libstdc++-v3/testsuite/Makefile.am
index e206aba..b4c9e85 100644
--- a/libstdc++-v3/testsuite/Makefile.am
+++ b/libstdc++-v3/testsuite/Makefile.am
@@ -91,9 +91,9 @@ new-abi-baseline:
 	  ${extract_symvers} ../src/.libs/libstdc++.so $${output})
 
 %/site.exp: site.exp
-	-test -d $* || mkdir $*
+	-@test -d $* || mkdir $*
 	@srcdir=`cd $(srcdir); ${PWD_COMMAND}`;
-	objdir=`${PWD_COMMAND}`/$*; \
+	@objdir=`${PWD_COMMAND}`/$*; \
 	sed -e s|^set srcdir .*$$|set srcdir $$srcdir| \
 	-e s|^set objdir .*$$|set objdir $$objdir| \
 	site.exp  $*/site.exp.tmp
@@ -115,7 +115,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: normal%/site.exp
 
 # Run the testsuite in normal mode.
 check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp
-	AR=$(AR); export AR; \
+	$(if $*,@)AR=$(AR); export AR; \
 	RANLIB=$(RANLIB); export RANLIB; \
 	if [ -z $* ]  [ $(filter -j, $(MFLAGS)) = -j ]; then \
 	  rm -rf normal-parallel || true; \
diff --git a/libstdc++-v3/testsuite/Makefile.in b/libstdc++-v3/testsuite/Makefile.in
index 59060b8..0fc26f4 100644
--- a/libstdc++-v3/testsuite/Makefile.in
+++ b/libstdc++-v3/testsuite/Makefile.in
@@ -553,9 +553,9 @@ new-abi-baseline:
 	  ${extract_symvers} ../src/.libs/libstdc++.so $${output})
 
 %/site.exp: site.exp
-	-test -d $* || mkdir $*
+	-@test -d $* || mkdir $*
 	@srcdir=`cd $(srcdir); ${PWD_COMMAND}`;
-	objdir=`${PWD_COMMAND}`/$*; \
+	@objdir=`${PWD_COMMAND}`/$*; \
 	sed -e s|^set srcdir .*$$|set srcdir $$srcdir| \
 	-e s|^set objdir .*$$|set objdir $$objdir| \
 	site.exp  $*/site.exp.tmp
@@ -566,7 +566,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: normal%/site.exp
 
 # Run the testsuite in normal mode.
 check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp
-	AR=$(AR); export AR; \
+	$(if $*,@)AR=$(AR); export AR; \
 	RANLIB=$(RANLIB); export RANLIB; \
 	if [ -z $* ]  [ $(filter -j, $(MFLAGS)) = -j ]; then \
 	  rm -rf normal-parallel || true; \

Re: [PATCH] gcc parallel make check

On Mon, Sep 22, 2014 at 12:21:08PM -0400, Jason Merrill wrote:
 On 09/22/2014 11:58 AM, Jakub Jelinek wrote:
 LGTM (though, supposedly we want similar change in
 libstdc++-v3/testsuite/Makefile.am).
 Or, if people would really like to see the commands, we could print them
 just once, using e.g.
  -$(if $(check_p_subno),@)(rootme= ...
 (then e.g. check-parallel-gcc goal would print the command, but
 check-parallel-gcc-1 or check-parallel-gcc-112 would not).
 
 So, like this?

Ok, thanks.

 commit c750897381a3f936e27cabd825cfa85ce936a6a9
 Author: Jason Merrill ja...@redhat.com
 Date:   Mon Sep 22 11:44:00 2014 -0400
 
 gcc/
   * Makefile.in (check-parallel-%): Add @.
 libstdc++-v3/
   * testsuite/Makefile.am (%/site.exp): Add @.
   (check-DEJAGNU): Likewise.
   * testsuite/Makefile.in: Regenerate.
 
 diff --git a/gcc/Makefile.in b/gcc/Makefile.in
 index 6f251a5..97b439a 100644
 --- a/gcc/Makefile.in
 +++ b/gcc/Makefile.in
 @@ -3674,10 +3674,10 @@ $(lang_checks_parallelized): check-% : site.exp
   fi
  
  check-parallel-% : site.exp
 - -test -d plugin || mkdir plugin
 - -test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
 - test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir 
 $(TESTSUITEDIR)/$(check_p_subdir)
 - -(rootme=`${PWD_COMMAND}`; export rootme; \
 + -@test -d plugin || mkdir plugin
 + -@test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
 + @test -d $(TESTSUITEDIR)/$(check_p_subdir) || mkdir 
 $(TESTSUITEDIR)/$(check_p_subdir)
 + -$(if $(check_p_subno),@)(rootme=`${PWD_COMMAND}`; export rootme; \
   srcdir=`cd ${srcdir}; ${PWD_COMMAND}` ; export srcdir ; \
   if [ -n $(check_p_subno) ] \
   [ -n $$GCC_RUNTEST_PARALLELIZE_DIR ] \
 diff --git a/libstdc++-v3/testsuite/Makefile.am 
 b/libstdc++-v3/testsuite/Makefile.am
 index e206aba..b4c9e85 100644
 --- a/libstdc++-v3/testsuite/Makefile.am
 +++ b/libstdc++-v3/testsuite/Makefile.am
 @@ -91,9 +91,9 @@ new-abi-baseline:
 ${extract_symvers} ../src/.libs/libstdc++.so $${output})
  
  %/site.exp: site.exp
 - -test -d $* || mkdir $*
 + -@test -d $* || mkdir $*
   @srcdir=`cd $(srcdir); ${PWD_COMMAND}`;
 - objdir=`${PWD_COMMAND}`/$*; \
 + @objdir=`${PWD_COMMAND}`/$*; \
   sed -e s|^set srcdir .*$$|set srcdir $$srcdir| \
   -e s|^set objdir .*$$|set objdir $$objdir| \
   site.exp  $*/site.exp.tmp
 @@ -115,7 +115,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: 
 normal%/site.exp
  
  # Run the testsuite in normal mode.
  check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp
 - AR=$(AR); export AR; \
 + $(if $*,@)AR=$(AR); export AR; \
   RANLIB=$(RANLIB); export RANLIB; \
   if [ -z $* ]  [ $(filter -j, $(MFLAGS)) = -j ]; then \
 rm -rf normal-parallel || true; \
 diff --git a/libstdc++-v3/testsuite/Makefile.in 
 b/libstdc++-v3/testsuite/Makefile.in
 index 59060b8..0fc26f4 100644
 --- a/libstdc++-v3/testsuite/Makefile.in
 +++ b/libstdc++-v3/testsuite/Makefile.in
 @@ -553,9 +553,9 @@ new-abi-baseline:
 ${extract_symvers} ../src/.libs/libstdc++.so $${output})
  
  %/site.exp: site.exp
 - -test -d $* || mkdir $*
 + -@test -d $* || mkdir $*
   @srcdir=`cd $(srcdir); ${PWD_COMMAND}`;
 - objdir=`${PWD_COMMAND}`/$*; \
 + @objdir=`${PWD_COMMAND}`/$*; \
   sed -e s|^set srcdir .*$$|set srcdir $$srcdir| \
   -e s|^set objdir .*$$|set objdir $$objdir| \
   site.exp  $*/site.exp.tmp
 @@ -566,7 +566,7 @@ $(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: 
 normal%/site.exp
  
  # Run the testsuite in normal mode.
  check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp
 - AR=$(AR); export AR; \
 + $(if $*,@)AR=$(AR); export AR; \
   RANLIB=$(RANLIB); export RANLIB; \
   if [ -z $* ]  [ $(filter -j, $(MFLAGS)) = -j ]; then \
 rm -rf normal-parallel || true; \


Jakub

[patch] moving macro definitions to defaults.h

2014-09-22 Thread Andrew MacLeod

After being reminded of the tm.h issues brought up last november (here:  
https://gcc.gnu.org/ml/gcc-patches/2013-11/msg01731.html ), I started 
looking back into it.


The general summary is the any header file which has a conditional on a  
target macro could be affected by include file reordering... ie,  if a 
header file has

#ifdef BLAH
 whatever

and we move the header files around a bit, it wouldn't be immediately 
obvious if the order where changes and BLAH was define later in the 
include structure.  The results for this header files would be different 
because whatever would no longer be in the preprocess source, and it 
may take a long time to discover the difference.


Josephs solution was to identify these and instead put a default 
definition in default.h ...  then change all the uses to #if instead.. ie,

#if BLAH

This way we can ensure that the definition has been seen and it will be 
a compile error if not.



I looked at all the target macros listed by Joseph, and decide to start 
with the ones which are used *only* in an if defined situation in a .h 
files of some sort.

ie
#ifdef
#ifndef
or #if define()

If this happens in a .c file, then changing the order of .h's shouldn't 
matter. (Unless the .c file is doing something really screwy.  So for 
the moment, ignoring that.)


So looking at only .h files, I found a number of default definitions in 
rtl.h, which this patch moves to defaults.h.  I was going to #error if 
defaults.h wasn't included, but found that to be unnecessary since it 
uses some of those macros and would cause a compile failure anyway if it 
wasn't included. All the other uses of these particular macros use the 
#if model, so no further adjustment is required for them.


This patch bootstraps on x86_64-unknown-linux-gnu, and regressions are 
running.  Assuming no issues show up, OK for mainline?


The remaining macros I will have a closer look at and most will likely 
need the full treatment moving from #ifdef to the #if model.  I plan to 
do these next.  The macros which fit this category for potential header 
trouble (along with their usage count) are:


2  USE_COLLECT2
3  TARGET_TERMINATE_DW2_EH_FRAME_INFO
3  TARGET_HAVE_CTORS_DTORS
3  REGMODE_NATURAL_SIZE
4  CC_STATUS_MDEP_INIT
4  CC_STATUS_MDEP_INIT
4  MODE_BASE_REG_REG_CLASS
4  REGNO_MODE_OK_FOR_REG_BASE_P
5  MODE_BASE_REG_CLASS
5  XCOFF_DEBUGGING_INFO
6  VMS_DEBUGGING_INFO
6  TARGET_ASM_OUTPUT_ANCHOR
6  EH_FRAME_SECTION_NAME
8  MODE_CODE_BASE_REG_CLASS
9  HARD_REGNO_CALLER_SAVE_MODE
9  HARD_REGNO_CALL_PART_CLOBBERED
9  SDB_DEBUGGING_INFO
9  CC_STATUS_MDEP
9  REGNO_MODE_CODE_OK_FOR_BASE_P
10 SECONDARY_RELOAD_CLASS
10 TARGET_VXWORKS_RTP
14 NO_DOT_IN_LABEL
14 REGNO_MODE_OK_FOR_BASE_P
17 STACK_REGS
19 TARGET_ASM_DESTRUCTOR
20 OBJECT_FORMAT_ELF
20 TARGET_ASM_CONSTRUCTOR
24 NO_DOLLAR_IN_LABEL
26 DBX_DEBUGGING_INFO
30 CPLUSPLUS_CPP_SPEC
42 ASM_OUTPUT_DEF






	* defaults.h (HAVE_PRE_INCREMENT, HAVE_PRE_DECREMENT,
	HAVE_POST_INCREMENT, HAVE_POST_DECREMENT, HAVE_POST_MODIFY_DISP,
	HAVE_POST_MODIFY_REG, HAVE_PRE_MODIFY_DISP, HAVE_PRE_MODIFY_REG,
	USE_LOAD_POST_INCREMENT, USE_LOAD_POST_DECREMENT,
	USE_LOAD_PRE_INCREMENT, USE_LOAD_PRE_DECREMENT,
	USE_STORE_POST_INCREMENT, USE_STORE_POST_DECREMENT,
	USE_STORE_PRE_INCREMENT, USE_STORE_PRE_DECREMENT,
	HARD_FRAME_POINTER_REGNUM, HARD_FRAME_POINTER_IS_FRAME_POINTER,
	HARD_FRAME_POINTER_IS_ARG_POINTER): Relocate from rtl.h.
	* rtl.h: Move default definitions to defaults.h.
	(AUTO_INC_DEC): Adjust defintion check to assumed defaults exist.


Index: defaults.h
===
*** defaults.h	(revision 215355)
--- defaults.h	(working copy)
*** see the files COPYING3 and COPYING.RUNTI
*** 1168,1173 
--- 1168,1264 
  #define DEFAULT_PCC_STRUCT_RETURN 1
  #endif
  
+ #ifndef HAVE_PRE_INCREMENT
+ #define HAVE_PRE_INCREMENT 0
+ #endif
+ 
+ #ifndef HAVE_PRE_DECREMENT
+ #define HAVE_PRE_DECREMENT 0
+ #endif
+ 
+ #ifndef HAVE_POST_INCREMENT
+ #define HAVE_POST_INCREMENT 0
+ #endif
+ 
+ #ifndef HAVE_POST_DECREMENT
+ #define HAVE_POST_DECREMENT 0
+ #endif
+ 
+ #ifndef HAVE_POST_MODIFY_DISP
+ #define HAVE_POST_MODIFY_DISP 0
+ #endif
+ 
+ #ifndef HAVE_POST_MODIFY_REG
+ #define HAVE_POST_MODIFY_REG 0
+ #endif
+ 
+ #ifndef HAVE_PRE_MODIFY_DISP
+ #define HAVE_PRE_MODIFY_DISP 0
+ #endif
+ 
+ #ifndef HAVE_PRE_MODIFY_REG
+ #define HAVE_PRE_MODIFY_REG 0
+ #endif
+ 
+ /* Some architectures do not have complete pre/post increment/decrement
+instruction sets, or only move some modes efficiently.  These macros
+allow us to tune autoincrement generation.  */
+ 
+ #ifndef USE_LOAD_POST_INCREMENT
+ #define USE_LOAD_POST_INCREMENT(MODE)   HAVE_POST_INCREMENT
+ #endif
+ 
+ #ifndef USE_LOAD_POST_DECREMENT
+ #define USE_LOAD_POST_DECREMENT(MODE)   HAVE_POST_DECREMENT
+ #endif
+ 
+ #ifndef USE_LOAD_PRE_INCREMENT
+ #define USE_LOAD_PRE_INCREMENT(MODE)HAVE_PRE_INCREMENT
+ #endif
+ 
+ #ifndef USE_LOAD_PRE_DECREMENT
+ #define

Re: Fix ICE with ODR mering and variable sized types


On 09/22/2014 11:37 AM, Jan Hubicka wrote:

Variadic types indeed can not appear in global declarations, so I think it is 
safe to ignore them.


Agreed.

Jason

Re: [PATCH 4/5] Generalise invalid_mode_change_p


On 09/22/14 01:34, Richard Sandiford wrote:

Jeff Law l...@redhat.com writes:

On 09/18/14 04:25, Richard Sandiford wrote:

This is the main patch for the bug.  We should treat a register as invalid
for a mode change if simplify_subreg_regno cannot provide a new register
number for the result.  We should treat a class as invalid for a mode change
if all registers in the class are invalid.  This is an extension of the old
CANNOT_CHANGE_MODE_CLASS-based check (simplify_subreg_regno checks C_C_C_M).

I forgot to say that the patch is a prerequisite to removing aarch64's
C_C_C_M.  There are other prerequisites too, but removing C_C_C_M without
this patch caused regressions in the existing testsuite, which is why no
new tests are needed.


gcc/
* hard-reg-set.h: Include hash-table.h.
(target_hard_regs): Add a finalize method and a x_simplifiable_subregs
field.
* target-globals.c (target_globals::~target_globals): Handle
hard_regs-finalize.
* rtl.h (subreg_shape): New structure.
(shape_of_subreg): New function.
(simplifiable_subregs): Declare.
* reginfo.c (simplifiable_subreg): New structure.
(simplifiable_subregs_hasher): Likewise.
(simplifiable_subregs): New function.
(invalid_mode_changes): Delete.
(alid_mode_changes, valid_mode_changes_obstack): New variables.
(record_subregs_of_mode): Remove subregs_of_mode parameter.
Record valid mode changes in valid_mode_changes.
(find_subregs_of_mode): Remove subregs_of_mode parameter.
Update calls to record_subregs_of_mode.
(init_subregs_of_mode): Remove invalid_mode_changes and bitmap
handling.  Initialize new variables.  Update call to
find_subregs_of_mode.
(invalid_mode_change_p): Check new variables instead of
invalid_mode_changes.
(finish_subregs_of_mode): Finalize new variables instead of
invalid_mode_changes.
(target_hard_regs::finalize): New function.
* ira-costs.c (print_allocno_costs): Call invalid_mode_change_p
even when CLASS_CANNOT_CHANGE_MODE is undefined.

Index: gcc/rtl.h
===
--- gcc/rtl.h   2014-09-15 11:55:40.459855161 +0100
+++ gcc/rtl.h   2014-09-15 12:26:21.249077760 +0100
+/* Return the shape of a SUBREG rtx.  */
+
+static inline subreg_shape
+shape_of_subreg (const_rtx x)
+{
+  return subreg_shape (GET_MODE (SUBREG_REG (x)),
+  SUBREG_BYTE (x), GET_MODE (x));
+}
+

Is there some reason you don't have a constructor that accepts a
const_rtx?


I was worried that by allowing implicit const_rtx-subreg_shape
conversions, it would be less obvious that the rtx has to have
code SUBREG.  I.e. a checked conversion would be hidden in the
constructor rather than being explicit.

If with David's new rtx hierarchy we end up with an rtx_subreg
subclass then I agree we should have a constructor that takes
one of those.

Makes sense.

I'm not sure if I was explicit, but the patch is fine, that was more a 
curiosity on my part than anything.


jeff

C++ PATCH for variable template diagnostics


This patch fixes two issues with variable templates:

1) -Wunused was warning about a variable template specialization when 
leaving the template specialization scope, and
2) The diagnostic referring to the specialization wasn't showing the 
template arguments.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 51aed4b97f9a6dd8bc7954a4691f7790d6bc41c5
Author: Jason Merrill ja...@redhat.com
Date:   Mon Sep 22 10:33:57 2014 -0400

	* decl.c (poplevel): Don't warn about unused vars in template scope.
	* error.c (dump_decl): Handle variable templates.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index fe5a4af..12a9f43 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -624,6 +624,7 @@ poplevel (int keep, int reverse, int functionbody)
 
   /* Before we remove the declarations first check for unused variables.  */
   if ((warn_unused_variable || warn_unused_but_set_variable)
+   current_binding_level-kind != sk_template_parms
!processing_template_decl)
 for (tree d = getdecls (); d; d = TREE_CHAIN (d))
   {
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 86fd405..a03bfe1 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -1044,6 +1044,18 @@ dump_decl (cxx_pretty_printer *pp, tree t, int flags)
 case FIELD_DECL:
 case PARM_DECL:
   dump_simple_decl (pp, t, TREE_TYPE (t), flags);
+
+  /* Handle variable template specializations.  */
+  if (TREE_CODE (t) == VAR_DECL
+	   DECL_LANG_SPECIFIC (t)
+	   DECL_TEMPLATE_INFO (t)
+	   PRIMARY_TEMPLATE_P (DECL_TI_TEMPLATE (t)))
+	{
+	  pp_cxx_begin_template_argument_list (pp);
+	  tree args = INNERMOST_TEMPLATE_ARGS (DECL_TI_ARGS (t));
+	  dump_template_argument_list (pp, args, flags);
+	  pp_cxx_end_template_argument_list (pp);
+	}
   break;
 
 case RESULT_DECL:
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ12.C b/gcc/testsuite/g++.dg/cpp1y/var-templ12.C
new file mode 100644
index 000..49ea588
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ12.C
@@ -0,0 +1,10 @@
+// { dg-do compile { target c++14 } }
+// { dg-options -Wall }
+
+template class T T x;
+template  int xint = 0;
+
+int main()
+{
+  return xint;
+}
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ13.C b/gcc/testsuite/g++.dg/cpp1y/var-templ13.C
new file mode 100644
index 000..e398d22
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ13.C
@@ -0,0 +1,5 @@
+// { dg-do compile { target c++14 } }
+
+template class T T x;
+template  int xint = 0;
+template  int xint = 0;	// { dg-error xint }

[patch] Fix specializations of std::uses_allocator in queue and stack


Include missing header, and add tests.

Tested x86_64-linux, committed to trunk.
commit a0ffb1d4d0bf78be525c7546665ef7002b66b13e
Author: Jonathan Wakely jwak...@redhat.com
Date:   Mon Sep 22 16:42:51 2014 +0100

Include bits/uses_allocator.h in stack and queue.

	* include/bits/stl_queue.h: Include missing header.
	* include/bits/stl_stack.h: Likewise.
	* testsuite/23_containers/priority_queue/requirements/
	uses_allocator.cc: New.
	* testsuite/23_containers/queue/requirements/uses_allocator.cc: New.
	* testsuite/23_containers/stack/requirements/uses_allocator.cc: New.

diff --git a/libstdc++-v3/include/bits/stl_queue.h b/libstdc++-v3/include/bits/stl_queue.h
index b516664..32124e3 100644
--- a/libstdc++-v3/include/bits/stl_queue.h
+++ b/libstdc++-v3/include/bits/stl_queue.h
@@ -58,6 +58,9 @@
 
 #include bits/concept_check.h
 #include debug/debug.h
+#if __cplusplus = 201103L
+# include bits/uses_allocator.h
+#endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
diff --git a/libstdc++-v3/include/bits/stl_stack.h b/libstdc++-v3/include/bits/stl_stack.h
index ee187da..f4bb72c 100644
--- a/libstdc++-v3/include/bits/stl_stack.h
+++ b/libstdc++-v3/include/bits/stl_stack.h
@@ -58,6 +58,9 @@
 
 #include bits/concept_check.h
 #include debug/debug.h
+#if __cplusplus = 201103L
+# include bits/uses_allocator.h
+#endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
diff --git a/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/uses_allocator.cc b/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/uses_allocator.cc
new file mode 100644
index 000..efe73ae
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/uses_allocator.cc
@@ -0,0 +1,29 @@
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// http://www.gnu.org/licenses/.
+
+// { dg-options -std=gnu++11 }
+// { dg-do compile }
+
+#include queue
+
+templatetypename A
+  using uses_allocator = std::uses_allocatorstd::priority_queueint, A;
+
+static_assert( uses_allocatorstd::allocatorint::value, valid allocator );
+
+struct X { };
+static_assert( !uses_allocatorX::value, invalid allocator );
diff --git a/libstdc++-v3/testsuite/23_containers/queue/requirements/uses_allocator.cc b/libstdc++-v3/testsuite/23_containers/queue/requirements/uses_allocator.cc
new file mode 100644
index 000..42106ca
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/queue/requirements/uses_allocator.cc
@@ -0,0 +1,29 @@
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// http://www.gnu.org/licenses/.
+
+// { dg-options -std=gnu++11 }
+// { dg-do compile }
+
+#include queue
+
+templatetypename A
+  using uses_allocator = std::uses_allocatorstd::queueint, A;
+
+static_assert( uses_allocatorstd::allocatorint::value, valid allocator );
+
+struct X { };
+static_assert( !uses_allocatorX::value, invalid allocator );
diff --git a/libstdc++-v3/testsuite/23_containers/stack/requirements/uses_allocator.cc b/libstdc++-v3/testsuite/23_containers/stack/requirements/uses_allocator.cc
new file mode 100644
index 000..3663d63
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/stack/requirements/uses_allocator.cc
@@ -0,0 +1,29 @@
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,

Re: [patch] moving macro definitions to defaults.h

2014-09-22 Thread David Malcolm

On Mon, 2014-09-22 at 12:26 -0400, Andrew MacLeod wrote:
 After being reminded of the tm.h issues brought up last november (here:  
 https://gcc.gnu.org/ml/gcc-patches/2013-11/msg01731.html ), I started 
 looking back into it.
 
 The general summary is the any header file which has a conditional on a  
 target macro could be affected by include file reordering... ie,  if a 
 header file has
 #ifdef BLAH
   whatever
 
 and we move the header files around a bit, it wouldn't be immediately 
 obvious if the order where changes and BLAH was define later in the 
 include structure.  The results for this header files would be different 
 because whatever would no longer be in the preprocess source, and it 
 may take a long time to discover the difference.
 
 Josephs solution was to identify these and instead put a default 
 definition in default.h ...  then change all the uses to #if instead.. ie,
 #if BLAH
 
 This way we can ensure that the definition has been seen and it will be 
 a compile error if not.
 
 
 I looked at all the target macros listed by Joseph, and decide to start 
 with the ones which are used *only* in an if defined situation in a .h 
 files of some sort.
 ie
 #ifdef
 #ifndef
 or #if define()
 
 If this happens in a .c file, then changing the order of .h's shouldn't 
 matter. (Unless the .c file is doing something really screwy.  So for 
 the moment, ignoring that.)

There appears to be a particular implicit order in which headers must be
included.  I notice that e.g. tm.h has:

  #ifndef GCC_TM_H
  #define GCC_TM_H

so if we're going with this no header file includes any other header
file model, would it make sense to add something like:

#ifndef GCC_TM_H
#error tm.h must have been included by this point
/* We need tm.h here so that we can see: BAR, BAZ, QUUX,
   etc.  */
#endif

to header files needing it, thus expressing the expected dependencies
explicitly?


 So looking at only .h files, I found a number of default definitions in 
 rtl.h, which this patch moves to defaults.h.  I was going to #error if 
 defaults.h wasn't included, but found that to be unnecessary since it 
 uses some of those macros and would cause a compile failure anyway if it 
 wasn't included. All the other uses of these particular macros use the 
 #if model, so no further adjustment is required for them.


 
 This patch bootstraps on x86_64-unknown-linux-gnu, and regressions are 
 running.  Assuming no issues show up, OK for mainline?
 
 The remaining macros I will have a closer look at and most will likely 
 need the full treatment moving from #ifdef to the #if model.  I plan to 
 do these next.  The macros which fit this category for potential header 
 trouble (along with their usage count) are:
 
 2  USE_COLLECT2
 3  TARGET_TERMINATE_DW2_EH_FRAME_INFO
 3  TARGET_HAVE_CTORS_DTORS
 3  REGMODE_NATURAL_SIZE
 4  CC_STATUS_MDEP_INIT
 4  CC_STATUS_MDEP_INIT
 4  MODE_BASE_REG_REG_CLASS
 4  REGNO_MODE_OK_FOR_REG_BASE_P
 5  MODE_BASE_REG_CLASS
 5  XCOFF_DEBUGGING_INFO
 6  VMS_DEBUGGING_INFO
 6  TARGET_ASM_OUTPUT_ANCHOR
 6  EH_FRAME_SECTION_NAME
 8  MODE_CODE_BASE_REG_CLASS
 9  HARD_REGNO_CALLER_SAVE_MODE
 9  HARD_REGNO_CALL_PART_CLOBBERED
 9  SDB_DEBUGGING_INFO
 9  CC_STATUS_MDEP
 9  REGNO_MODE_CODE_OK_FOR_BASE_P
 10 SECONDARY_RELOAD_CLASS
 10 TARGET_VXWORKS_RTP
 14 NO_DOT_IN_LABEL
 14 REGNO_MODE_OK_FOR_BASE_P
 17 STACK_REGS
 19 TARGET_ASM_DESTRUCTOR
 20 OBJECT_FORMAT_ELF
 20 TARGET_ASM_CONSTRUCTOR
 24 NO_DOLLAR_IN_LABEL
 26 DBX_DEBUGGING_INFO
 30 CPLUSPLUS_CPP_SPEC
 42 ASM_OUTPUT_DEF

Re: [AArch64] Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c


On 09/22/14 05:16, Alan Lawrence wrote:

Ok thanks Jeff. In that case I think I should draw this to the attention
of the AArch64 maintainers to check the testsuite updates are OK before
I commit...?

Can't hurt.



Methinks it may be possible to get further, or at least increase our
confidence, if I mock out try_widen_shift_mode, and/or try injecting
some dubious RTL from a builtin, although this'll only give a momentary
snapshot of behaviour. I may or may not have time to look into this
though ;)...
Yea, it's something I'd pondered as well, though I tend to inject the 
RTL I want directly in the debugger.  The downside is doing so doesn't 
ensure various tables are updated properly, and I vaguely recall a 
per-pseudo table for the combiner's nonzero_bits, signbit_copies and 
friends.



jeff

Re: [patch] moving macro definitions to defaults.h

On Mon, 22 Sep 2014, Andrew MacLeod wrote:

 Josephs solution was to identify these and instead put a default definition in
 default.h ...  then change all the uses to #if instead.. ie,
 #if BLAH
 
 This way we can ensure that the definition has been seen and it will be a
 compile error if not.

No, my suggestion was that whenever possible we should change preprocessor 
conditionals - #ifdef or #if - into C conditionals - if (MACRO).

Changing from #ifdef to #if does nothing to make a missing tm.h include 
produce an error - the undefined macro simply quietly gets treated as 0 in 
preprocessor conditionals.  To get an error from #if in such cases, you'd 
need to build GCC with -Wundef (together with existing -Werror), and I'd 
guess there are plenty of places that are not -Wundef clean at present.

Now, I think moves of defaults to defaults.h are generally a good idea, 
and that moving from defined/undefined to true/false semantics are also a 
good idea - even if the way the macro is used means you can't take the 
further step of converting from #if to if ().  They don't solve the 
problem of making a missing tm.h include immediately visible, but they 
*do* potentially help with future automatic refactoring to convert target 
macros into hooks.

Obviously such moves do require checking the definitions and uses of the 
macros in question; you need to make sure you catch all places that use 
#ifdef / #if defined etc. on the macro (and make sure they have the same 
default).  And if you're changing the semantics of the macro from defined 
/ undefined to true / false, you need to watch out for any existing 
definitions with an empty expansion, or an expansion to 0, etc.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [patch] moving macro definitions to defaults.h

On Mon, 22 Sep 2014, David Malcolm wrote:

 There appears to be a particular implicit order in which headers must be
 included.  I notice that e.g. tm.h has:
 
   #ifndef GCC_TM_H
   #define GCC_TM_H
 
 so if we're going with this no header file includes any other header
 file model, would it make sense to add something like:
 
 #ifndef GCC_TM_H
 #error tm.h must have been included by this point
 /* We need tm.h here so that we can see: BAR, BAZ, QUUX,
etc.  */
 #endif
 
 to header files needing it, thus expressing the expected dependencies
 explicitly?

In principle, yes.

In practice, some headers have definitions that depend on tm.h but for 
most users this doesn't matter.  For example, flags.h depends on 
SWITCHABLE_TARGET.  (I think the fix there is to make most users use 
options.h instead, and move miscellaneous declarations from flags.h to 
other headers.)  In some cases, the target macro may be used only in a 
macro expansion.  (BITS_PER_UNIT isn't strictly a target macro any more, 
but when it was its uses in tree.h were an example of that.  tree.h still 
depends on the target macros NO_DOLLAR_IN_LABEL, NO_DOT_IN_LABEL and 
TARGET_DLLIMPORT_DECL_ATTRIBUTES, however, but we shouldn't make all 
tree.h users include tm.h.)

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check

2014-09-22 Thread Andrew Pinski

On Mon, Sep 22, 2014 at 4:10 AM, Alan Lawrence alan.lawre...@arm.com wrote:
 Well, I haven't looked into this in detail: I've gone only as far as
   * swapping emit-rtl.o between 'good' compiles (svn r214042) and 'bad'
 compiles (r214043), finding that the critical difference is in the
 emit-rtl.o generated by r214043;
   *looking at the relocations in the 'bad' emit_rtl.o, seeing new entries
 'fixed_regs + ', and that Richard Biener's changelog specifically
 mentions stripping signedness changes (and introduces the SIGN_NOPS).

 However, I apply your patch (minus the hunk adding the (set_attr type
 load1), this appears to have gone in already), and still see the same error
 message:

 emit-rtl.o: In function `gen_rtx_REG':
 emit-rtl.c:(.text+0x12f8): relocation truncated to fit:
 R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
 section in regclass.o
 emit-rtl.o: In function `gen_rtx':
 emit-rtl.c:(.text+0x1824): relocation truncated to fit:
 R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
 section in regclass.o
 collect2: error: ld returned 1 exit status

 and still see the same (suspicious-looking, although perhaps not convicted)
 relocations:

 $ readelf --relocs
 benchspec/CPU2006/403.gcc/build/build_base_test./emit-rtl.o | grep
 fixed_regs
 12a8  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
 12ac  005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
 12f8  005d0113 R_AARCH64_ADR_PRE  fixed_regs +
 
 12fc  005d0116 R_AARCH64_LDST8_A  fixed_regs +
 
 1824  005d0113 R_AARCH64_ADR_PRE  fixed_regs +
 
 1828  005d0116 R_AARCH64_LDST8_A  fixed_regs +
 
 186c  005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
 1870  005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0

 I've also now bootstrapped my patch (STRIP_NOPS - STRIP_SIGN_NOPS * 2) on
 aarch64-none-linux-gnu and x86_64-none-linux-gnu, and check-gcc with no
 regressions, so would like to propose that patch for trunk...?


You need to track down where R_AARCH64_ADR_PREL_PG_HI21 reloc is being
created in the assembly and then track down why GCC is using tiny
model here.  Note my fix was for a similar issue; not necessary the
exact same one in that there could be another pattern which needs to
use the new constraint too.

Thanks,
Andrew


 --Alan




 Andrew Pinski wrote:

 On Thu, Sep 18, 2014 at 9:44 AM, Alan Lawrence alan.lawre...@arm.com
 wrote:

 We've been seeing errors using aarch64-none-linux-gnu gcc to build the
 403.gcc benchmark from spec2k6, that we've traced back to this patch. The
 error looks like:

 /home/alalaw01/bootstrap_richie/gcc/xgcc
 -B/home/alalaw01/bootstrap_richie/gcc -O3 -mcpu=cortex-a57.cortex-a53
 -DSPEC_CPU_LP64alloca.o asprintf.o vasprintf.o c-parse.o c-lang.o
 attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o
 c-aux-info.o c-common.o c-format.o c-semantics.o c-objc-common.o main.o
 cpplib.o cpplex.o cppmacro.o cppexp.o cppfiles.o cpphash.o cpperror.o
 cppinit.o cppdefault.o line-map.o mkdeps.o prefix.o version.o mbchar.o
 alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o cfg.o
 cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o cfgrtl.o
 combine.o
 conflict.o convert.o cse.o cselib.o dbxout.o debug.o dependence.o df.o
 diagnostic.o doloop.o dominance.o dwarf2asm.o dwarf2out.o dwarfout.o
 emit-rtl.o except.o explow.o expmed.o expr.o final.o flow.o fold-const.o
 function.o gcse.o genrtl.o ggc-common.o global.o graph.o haifa-sched.o
 hash.o hashtable.o hooks.o ifcvt.o insn-attrtab.o insn-emit.o
 insn-extract.o
 insn-opinit.o insn-output.o insn-peep.o insn-recog.o integrate.o intl.o
 jump.o langhooks.o lcm.o lists.o local-alloc.o loop.o obstack.o optabs.o
 params.o predict.o print-rtl.o print-tree.o profile.o real.o recog.o
 reg-stack.o regclass.o regmove.o regrename.o reload.o reload1.o reorg.o
 resource.o rtl.o rtlanal.o rtl-error.o sbitmap.o sched-deps.o sched-ebb.o
 sched-rgn.o sched-vis.o sdbout.o sibcall.o simplify-rtx.o ssa.o ssa-ccp.o
 ssa-dce.o stmt.o stor-layout.o stringpool.o timevar.o toplev.o tree.o
 tree-dump.o tree-inline.o unroll.o varasm.o varray.o vmsdbgout.o
 xcoffout.o
 ggc-page.o i386.o xmalloc.o xexit.o hashtab.o safe-ctype.o splay-tree.o
 xstrdup.o md5.o fibheap.o xstrerror.o concat.o partition.o hex.o
 lbasename.o
 getpwd.o ucbqsort.o -lm-o gcc
 emit-rtl.o: In function `gen_rtx_REG':
 emit-rtl.c:(.text+0x12f8): relocation truncated to fit:
 R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
 section in regclass.o
 emit-rtl.o: In function `gen_rtx':
 emit-rtl.c:(.text+0x1824): relocation truncated to fit:
 R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
 section in regclass.o
 collect2: error: ld returned 1

Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge


On 09/22/14 04:24, Jiong Wang wrote:

Great.  Can you send an updated patchkit for review.


patch attached.

please review, thanks.

gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the
live-in of new created BB as the intersection of live-in from
old_dest and live-out from bb.

Looks good.  However, before committing we need a couple things.

1. Bootstrap  regression test this variant of the patch.  I know you 
tested an earlier one, but please test this one just to be sure.


2. Testcase.  I think you could test for either the reduction in the 
live-in set of the newly created block or that you're shrink wrapping 
one or more functions you didn't previously shrink-wrap.  I think it's 
fine if this test is target specific.


Jeff

Re: Speedup int_bit_from_pos

2014-09-22 Thread Jan Hubicka

  On Sun, 21 Sep 2014, Jan Hubicka wrote:
  

Please omit static from inline functions.
   
   Yep, I suppose we want to drop static in all inlines? I can make patch 
   for that.

Also one notable difference with your patches is that the fits hwi is 
now not tested on the result but on the result input which, multiplied 
by 8, might not fit a hwi now.  So please use wide-ints here (the 
to_offset flavor).
   
   The function must always suceed (so user promise it will fit in HWI) and 
   for
   performance reasons I would rather not go into wide int by defualt, but I 
   can
   do that with checking enabled.
  
  wide-int should be fast enough, please use it.
 
 Like this?
 
 Index: tree.h
 ===
 --- tree.h(revision 215421)
 +++ tree.h(working copy)
 @@ -3877,10 +3877,20 @@ extern tree size_in_bytes (const_tree);
  extern HOST_WIDE_INT int_size_in_bytes (const_tree);
  extern HOST_WIDE_INT max_int_size_in_bytes (const_tree);
  extern tree bit_position (const_tree);
 -extern HOST_WIDE_INT int_bit_position (const_tree);
  extern tree byte_position (const_tree);
  extern HOST_WIDE_INT int_byte_position (const_tree);
  
 +/* Like bit_position, but return as an integer.  It must be representable in
 +   that way (since it could be a signed value, we don't have the
 +   option of returning -1 like int_size_in_byte can.  */
 +
 +static inline HOST_WIDE_INT int_bit_position (const_tree field)
 +{ 
 +  return ((wide_int)DECL_FIELD_OFFSET (field) * BITS_PER_UNIT
 +   + (wide_int)DECL_FIELD_BIT_OFFSET (field)).to_shwi ();

Hmm, this gets me:
/aux/hubicka/trunk-6/gcc/testsuite/gcc.c-torture/compile/2224-1.c:39:7: 
internal compiler error: in decompose, at wide-int.h:911
0x656ed6 wi::int_traitsgeneric_wide_intwide_int_storage ::decompose(long*, 
unsigned int, generic_wide_intwide_int_storage const)
../../gcc/wide-int.h:911
0x6ec9f4 
wide_int_ref_storagetrue::wide_int_ref_storagegeneric_wide_intwide_int_storage
 (generic_wide_intwide_int_storage const, unsigned int)
../../gcc/wide-int.h:959
0x6ec75a generic_wide_intwide_int_ref_storagetrue 
::generic_wide_intgeneric_wide_intwide_int_storage 
(generic_wide_intwide_int_storage const, unsigned int)
../../gcc/wide-int.h:735
0x7d52ec wi::binary_traitsgeneric_wide_intwide_int_storage, 
generic_wide_intwide_int_storage, 
wi::int_traitsgeneric_wide_intwide_int_storage ::precision_type, 
wi::int_traitsgeneric_wide_intwide_int_storage 
::precision_type::result_type wi::addgeneric_wide_intwide_int_storage, 
generic_wide_intwide_int_storage (generic_wide_intwide_int_storage const, 
generic_wide_intwide_int_storage const)
../../gcc/wide-int.h:2287
0x7d4fcc wi::binary_traitsgeneric_wide_intwide_int_storage, 
generic_wide_intwide_int_storage, (wi::precision_type)1, 
wi::int_traitsgeneric_wide_intwide_int_storage 
::precision_type::result_type 
generic_wide_intwide_int_storage::operator+generic_wide_intwide_int_storage
 (generic_wide_intwide_int_storage const) const
../../gcc/wide-int.h:696
0xf282ab int_bit_position
../../gcc/tree.h:3890

on precision mismatch. What is correct way to compute this?
I tried offset_int but I do not seem smart enough to get past copmile errors 
with that one.
 +}
 +
 +
  #define sizetype sizetype_tab[(int) stk_sizetype]
  #define bitsizetype sizetype_tab[(int) stk_bitsizetype]
  #define ssizetype sizetype_tab[(int) stk_ssizetype]

Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions


On 09/22/14 04:29, Jiong Wang wrote:

On 19/09/14 21:43, Jeff Law wrote:


On 09/15/14 08:33, Jiong Wang wrote:

Jeff,

   thanks, I partially understand your meaning here.

take the function ira_implicitly_set_insn_hard_regs in ira-lives.c
for example,

when generating address rtl, gcc will automatically generate const
operator to prefix
the address expression, like the following. so a simple CONSTANT_P
check is enough in
case there is no embedded register.

(insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
  (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40]
var_decl 0x2b2c2ff91510 recog_data)
  (const_int 480 [0x1e0] -1


but for architecture like aarch64, the following instruction
sequences to forming address
may be generated

(insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
  (high:DI (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
0x7ff755a60900 stats))) 35 {*movdi_aarch64}
   (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags
0xc0]  var_decl 0x7ff755a60900 stats))
  (nil)))

(insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
  (lo_sum:DI (reg/f:DI 20 x20 [99])
  (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
0x7ff755a60900 stats))) {add_losym_di}
   (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0]
var_decl 0x7ff755a60900 stats)
  (nil)))

   while CONSTANT_P could not catch the latter lo_sum case, as the
RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ,

Hmm, it's been ~15 years since I regularly worked on a target that uses
HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST
as well, but reading the docs for CONST, that clearly isn't the case.

Sorry for that.  Can you (re) send your current patch for this for
review?


patch attached.

please review, thanks.

gcc/
   * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when
!REG_P (src) to
   release more instruction sink opportunities.

gcc/testsuite/
   * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase.
Thanks.  Please verify this version passes a bootstrap  regression 
test.  Assuming it does it is OK for the trunk.


jeff

Re: Speedup int_bit_from_pos

2014-09-22 Thread Mike Stump

On Sep 22, 2014, at 8:51 AM, Jan Hubicka hubi...@ucw.cz wrote:
 On Sun, 21 Sep 2014, Jan Hubicka wrote:
 
 
 Please omit static from inline functions.
 
 Yep, I suppose we want to drop static in all inlines? I can make patch for 
 that.
 
 Also one notable difference with your patches is that the fits hwi is now 
 not tested on the result but on the result input which, multiplied by 8, 
 might not fit a hwi now.  So please use wide-ints here (the to_offset 
 flavor).
 
 The function must always suceed (so user promise it will fit in HWI) and for
 performance reasons I would rather not go into wide int by defualt, but I 
 can
 do that with checking enabled.
 
 wide-int should be fast enough, please use it.
 
 Like this?
 
 Index: tree.h
 ===
 --- tree.h(revision 215421)
 +++ tree.h(working copy)
 @@ -3877,10 +3877,20 @@ extern tree size_in_bytes (const_tree);
 extern HOST_WIDE_INT int_size_in_bytes (const_tree);
 extern HOST_WIDE_INT max_int_size_in_bytes (const_tree);
 extern tree bit_position (const_tree);
 -extern HOST_WIDE_INT int_bit_position (const_tree);
 extern tree byte_position (const_tree);
 extern HOST_WIDE_INT int_byte_position (const_tree);
 
 +/* Like bit_position, but return as an integer.  It must be representable in
 +   that way (since it could be a signed value, we don't have the
 +   option of returning -1 like int_size_in_byte can.  */
 +
 +static inline HOST_WIDE_INT int_bit_position (const_tree field)
 +{ 
 +  return ((wide_int)DECL_FIELD_OFFSET (field) * BITS_PER_UNIT
 +   + (wide_int)DECL_FIELD_BIT_OFFSET (field)).to_shwi ();
 +}
 +
 +

Not quite:

  offset_int woffset
= (wi::to_offset (xoffset)
   + wi::lrshift (wi::to_offset (DECL_FIELD_BIT_OFFSET (field)),
  LOG2_BITS_PER_UNIT));

offset_int is the type that can hold all bit positions, byte positions, byte 
sizes and bit sizes.

One can use wi::to_offset to convert things into it.  You can see woffset uses 
to see that various uses to convert back out.  The idea is that eventually all 
the code that plays with that concept, could use that type.  Convert into it 
sooner, and convert out of it later.

Re: [GOOGLE] Fix LIPO COMDAT fixup and gcov-tool interactions

2014-09-22 Thread Xinliang David Li

On Mon, Sep 22, 2014 at 1:36 AM, Nathan Sidwell nat...@acm.org wrote:
 On 09/21/14 18:58, Xinliang David Li wrote:

 the intent is that that points to the gcov_info object of the object file
 containing the live version of the function.  I couldn't quite get this
 to
 work though -- it involves emitting a function's gcov_fn_info decl in the
 same comdat group as the function itself.


 Another problem is that comdat functions may have different CFGs due
 to different early inline decisions. Comdatting gcov counters can lead
 to problems in profile use. Not comdatting profile counters have
 another advantage -- it allows context sensitive profiling for comdat
 function inline instances (IPA-inline).


 IIRC early inlining is done before the counters are created.

Yes, and that will be the cause of problem for coverage mismatch when
COMDATing profile counters -- due to difference in early inline
decisions, the counter array created may be different across modules.

You're right
 later inlining may be a problem, and require a non-comdat set of cloned
 counters.   I can't recall exactly at what stage the counters are now
 inserted relative to inlining.  The CFG machinery had a number of
 significant changes while, and shortly after, I was working on this.

After early inining and before ipa inline.  The early inline can lead
to the coverage mismatch, and the latter can lead to loss of profile
precision with the counter comdat.


David


 You'll see the checking of gfi_ptr-key != gi_ptr in libgcov-driver.c.

 Are you making use of this machinery, or inventing new machinery?


 Teresa's method is a different machinery -- it tries to propagate
 profile data from the selected comdat copy + inline instance copies to
 comdat copies with zero counts.


 It'd be preferrable to complete the mechanism I outline above, rather than
 have a competing mechanism.  Also, this patch  is in effect lying because
 the data then makes it look like the unselected comdat instances are in fact
 being executed -- looking at the whole program it's going to be harder to
 understand whether the different inline instances are being executed
 multiple times, or are duplicate data.  Does the gcov user output indicate
 this subtlety in some way?

 nathan

Re: [PATCH, i386, Pointer Bounds Checker 33/x] MPX ABI