RE: [PATCH GCC] Simplify address expression in IVOPT

2013-11-04 Thread bin.cheng


 -----Original Message-----
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Wednesday, October 30, 2013 10:46 PM
 To: Bin Cheng
 Cc: GCC Patches
 Subject: Re: [PATCH GCC] Simplify address expression in IVOPT
 
 On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng <bin.ch...@arm.com> wrote:
 
 Hmm.  I think you want what get_inner_reference_aff does, using the return
 value of get_inner_reference as starting point for determine_base_object.
 And you don't want to restrict yourselves so much on what exprs to
process,
 but only exclude DECL_Ps.
Did you mean I should pass all address expressions to
get_inner_reference_aff except the ones with a declaration operand?
I am a little confused here about why DECL_Ps should be handled specially.
It seems they can be handled properly by the get_* function.  Did I miss
anything important?

 Just amend get_inner_reference_aff to return the tree base object.
I found it inappropriate to do this, because functions like
get_inner_reference* only accept reference expressions, while base_object
has to be computed for other kinds of expressions too.  Take the gimple
statement "a_1 = *ptr_1" as an example: the base passed to alloc_iv is
"ptr_1", and the base_object computed by determine_base_object is also
"ptr_1", which we can't get from get_inner_reference*.
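
A small illustration of the two cases (my example, with hypothetical names,
not from the patch):

int a[10];

int
f1 (int i)
{
  return a[i];   /* Reference expression: get_inner_reference* can
                    decompose a[i], and the base object is the
                    declaration 'a'.  */
}

int
f2 (int *ptr)
{
  return *ptr;   /* Pointer dereference: the base is the SSA name of
                    'ptr' itself, so determine_base_object cannot go
                    through get_inner_reference* here.  */
}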

Also, it seems the comment before determine_base_object might be inaccurate:
"Returns a memory object to that EXPR points", which I think should be
"Returns a pointer pointing to the main memory object to that EXPR points".
Right?
 
 Note that this isn't really simplifying but rather lowering all
addresses.
 
 The other changes in this patch are unrelated, right?
Right, I should have named this message something like "refine cost
computation of expressions in IVOPT", just as the patch file is.

Thanks.
bin






Re: [wide-int] Update main comment

2013-11-04 Thread Richard Biener
On Wed, 30 Oct 2013, Richard Sandiford wrote:

 Kenneth Zadeck <zad...@naturalbridge.com> writes:
  On 10/30/2013 07:01 AM, Richard Sandiford wrote:
  Kenneth Zadeck <zad...@naturalbridge.com> writes:
  On 10/29/2013 06:37 PM, Richard Sandiford wrote:
  This patch tries to update the main wide_int comment to reflect the 
  current
  implementation.
 
  - bitsizetype is TImode on x86_64 and others, so I don't think it's
  necessarily true that all offset_ints are signed.  (widest_ints are,
  though.)
  I am wondering if this is too conservative an interpretation.  I
  believe that they are TImode because that is the next thing after DImode,
  and so they wanted to accommodate the 3 extra bits.  Certainly there
  is no x86 that is able to address more than 64 bits.
  Right, but my point is that it's a different case from widest_int.
  It'd be just as valid to do bitsizetype arithmetic using wide_int
  rather than offset_int, and those wide_ints would have precision 128,
  just like the offset_ints.  And I wouldn't really say that those wide_ints
  were fundamentally signed in any way.  Although the tree layer might know
  that X upper bits of the bitsizetype are always signs, the tree-wide_int
  interface treats them in the same way as any other 128-bit type.
 
  Maybe I'm just being pedantic, but I think offset_int would only be like
  widest_int if bitsizetype had precision 67 or whatever.  Then we could
  say that both offset_int and widest_int must be wider than any inputs,
  meaning that there's at least one leading sign bit.
  this was of course what Mike and I wanted, but we could not really
  figure out how to pull it off.  In particular, we could not find any
  existing reliable marker in the targets to say what the width of the
  widest pointer on any implementation is.  We actually used the number 68
  rather than 67 because we assumed 64 bits for the widest pointer on any
  existing platform, 3 bits for the bit offset within a byte, and 1 bit
  for the sign.
 
 Ah yeah, 68 would be better for signed types.
 
 Is the patch OK while we still have 128-bit bitsizetypes though?
 I agree the current comment would be right if we ever did switch
 to sub-128 bitsizes.

The issue with sub-128bit bitsizetype is code generation quality.
We do generate code for bitsizetype operations (at least from Ada),
so a power-of-two precision is required to avoid a lot of masking
operations.
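
A minimal sketch of that masking cost (my illustration, using GCC's
unsigned __int128; the 68-bit width stands in for the hypothetical
sub-128-bit bitsizetype discussed above):

#include <stdio.h>

typedef unsigned __int128 u128;

/* Arithmetic in a 68-bit type emulated in a 128-bit register needs an
   explicit truncation after every operation; a 128-bit precision needs
   none.  */
static u128
add_68bit (u128 a, u128 b)
{
  u128 sum = a + b;                         /* the operation itself */
  const u128 mask = (((u128) 1) << 68) - 1;
  return sum & mask;                        /* the extra masking step */
}

int
main (void)
{
  u128 r = add_68bit ((((u128) 1) << 68) - 1, 1);  /* wraps to 0 */
  printf ("%d\n", (int) r);
  return 0;
}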

Richard.


[wide-int] Fix UNSIGNED_FIX typo

2013-11-04 Thread Richard Sandiford
An "x" had got switched to "t" during the conversion.  This showed up in
one of the alpha g++.dg tests.

Applied as obvious.

Richard


Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  2013-11-03 13:01:39.083385620 +0000
+++ gcc/simplify-rtx.c  2013-11-03 20:22:41.115469359 +0000
@@ -1819,7 +1819,8 @@ simplify_const_unary_operation (enum rtx
  if (REAL_VALUES_LESS (t, x))
return immed_wide_int_const (wmax, mode);
 
-	return immed_wide_int_const (real_to_integer (&t, &fail, width), mode);
+	return immed_wide_int_const (real_to_integer (&x, &fail, width),
+				     mode);
  break;
 
default:


[wide-int] Make integer_onep return false for signed 1-bit bitfields

2013-11-04 Thread Richard Sandiford
As discussed on gcc@, integer_onep is supposed to return false for
a nonzero 1-bit value if the type is signed, since the only nonzero
value of a signed 1-bit bitfield is -1 rather than 1.
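
A minimal illustration (my example, hypothetical struct):

#include <stdio.h>

struct s
{
  signed int b : 1;   /* a signed 1-bit bitfield holds only 0 and -1 */
};

int
main (void)
{
  struct s x = { -1 };          /* the only nonzero value */
  printf ("%d\n", x.b == 1);    /* prints 0: the value is -1, not 1 */
  return 0;
}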

Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/tree.c
===
--- gcc/tree.c  2013-10-29 19:19:27.623468618 +0000
+++ gcc/tree.c  2013-11-02 17:25:06.499657501 +0000
@@ -2092,7 +2092,7 @@ integer_onep (const_tree expr)
   switch (TREE_CODE (expr))
 {
 case INTEGER_CST:
-  return wi::eq_p (expr, 1);
+  return wi::eq_p (wi::to_widest (expr), 1);
 case COMPLEX_CST:
   return (integer_onep (TREE_REALPART (expr))
	   && integer_zerop (TREE_IMAGPART (expr)));


Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Richard Biener
On Mon, 4 Nov 2013, Richard Biener wrote:

 On Fri, 1 Nov 2013, Richard Sandiford wrote:
 
  I'm building one target for each supported CPU and comparing the wide-int
  assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding
  output from the merge point.  This patch removes all the differences I saw
  for alpha-linux-gnu in gcc.c-torture.
  
  Hunk 1: Preserve the current trunk behaviour in which the shift count
  is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
  inspection after hunk 5.
  
  Hunks 2 and 3: Two cases where force_fit_type could extend to wider
  types and where we weren't extending according to the sign of the source.
  We should probably assert that the input is at least as wide as the type...
  
  Hunk 4: The && in:
  
    if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
	&& (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)
  
  had got dropped during the conversion.
  
  Hunk 5: The old code was:
  
    if (shift > 0)
  {
*mask = r1mask.llshift (shift, TYPE_PRECISION (type));
*val = r1val.llshift (shift, TYPE_PRECISION (type));
  }
    else if (shift < 0)
  {
shift = -shift;
*mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
*val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
  }
  
  and these precision arguments had two purposes: to control which
  bits of the first argument were shifted, and to control the truncation
  mask for SHIFT_COUNT_TRUNCATED.  We need to pass a width to the shift functions
  for the second.
  
  (BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the
  end of the !GENERATOR_FILE list in rtl.def, since the current position 
  caused
  some spurious differences.  The problem AFAICT is that hash_rtx hashes
  on code, RTL PRE creates registers in the hash order of the associated
  expressions, RA uses register numbers as a tie-breaker during ordering,
  and so the order of rtx_code can indirectly influence register allocation.
  First time I'd realised that could happen, so just thought I'd mention it.
  I think we should keep rtl.def in the current (logical) order though.)
  
  Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?
 
 Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
 from double-int.c on the trunk?  If not then putting this at the
 callers of wi::rshift and friends is clearly asking for future
 mistakes.
 
 Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
 than in machine instruction patterns was a mistake in the past.

Oh, and my understanding is that it's maybe required for correctness on
the RTL side if middle-end code removes masking operations: later
constant propagation must not change the result because of that.
I'm not sure this is the way it happens, but if we have

  (lshift:si (reg:si 1) (and:qi (reg:qi 2) 0x1f))

and we'd transform it to (based on SHIFT_COUNT_TRUNCATED)

  (lshift:si (reg:si 1) (reg:qi 2))

and later constant propagate 77 to reg:qi 2.

That isn't the way SHIFT_COUNT_TRUNCATED should work, of course,
but instead the backends should have a define_insn which matches
the 'and' and combine should try to match that.

You can see that various architectures may have truncating shifts
but not for all patterns which results in stuff like

config/aarch64/aarch64.h:#define SHIFT_COUNT_TRUNCATED !TARGET_SIMD

which clearly means that it's not implemented in terms of providing
shift insns which can match a shift operand that is masked.

That is, either try to clean that up (hooray!), or preserve
the double-int.[ch] behavior (how does CONST_INT RTL constant
folding honor it?).
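
As an aside, the source-level shape such a define_insn would let combine
match is roughly this (my sketch, not tied to any particular target):

unsigned int
shift_masked (unsigned int x, unsigned int n)
{
  /* On a target whose shift instruction truncates the count, combine
     can fold the 'and' into the shift insn itself, so no separate
     masking instruction needs to be emitted.  */
  return x << (n & 31);
}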

Richard.

 Thanks,
 Richard.
 
 [btw, please use diff -p to get us at least some context if you
 are not writing changelogs ...]
 
  Thanks,
  Richard
  
  
  Index: gcc/fold-const.c
  ===
  --- gcc/fold-const.c	(revision 204247)
  +++ gcc/fold-const.c	(working copy)
  @@ -1008,9 +1008,12 @@
 The following code ignores overflow; perhaps a C standard
 interpretation ruling is needed.  */
  res = wi::rshift (arg1, arg2, sign,
  - GET_MODE_BITSIZE (TYPE_MODE (type)));
  + SHIFT_COUNT_TRUNCATED
  + ? GET_MODE_BITSIZE (TYPE_MODE (type)) : 0);
 else
  -   res = wi::lshift (arg1, arg2, GET_MODE_BITSIZE (TYPE_MODE (type)));
  +   res = wi::lshift (arg1, arg2,
  + SHIFT_COUNT_TRUNCATED
  + ? GET_MODE_BITSIZE (TYPE_MODE (type)) : 0);
 break;
 
   case RROTATE_EXPR:
  @@ -6870,7 +6873,8 @@
   return NULL_TREE;
   
 if (TREE_CODE (arg1) == INTEGER_CST)
  -arg1 = force_fit_type (inner_type, arg1, 0, TREE_OVERFLOW (arg1));
  +arg1 = force_fit_type (inner_type, wi::to_widest (arg1), 0,
  +

Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Richard Sandiford
Richard Biener <rguent...@suse.de> writes:
 On Fri, 1 Nov 2013, Richard Sandiford wrote:
 I'm building one target for each supported CPU and comparing the wide-int
 assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding
 output from the merge point.  This patch removes all the differences I saw
 for alpha-linux-gnu in gcc.c-torture.
 
 Hunk 1: Preserve the current trunk behaviour in which the shift count
 is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
 inspection after hunk 5.
 
 Hunks 2 and 3: Two cases where force_fit_type could extend to wider
 types and where we weren't extending according to the sign of the source.
 We should probably assert that the input is at least as wide as the type...
 
 Hunk 4: The && in:
 
    if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
	&& (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)
 
 had got dropped during the conversion.
 
 Hunk 5: The old code was:
 
   if (shift > 0)
  {
*mask = r1mask.llshift (shift, TYPE_PRECISION (type));
*val = r1val.llshift (shift, TYPE_PRECISION (type));
  }
   else if (shift < 0)
  {
shift = -shift;
*mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
*val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
  }
 
 and these precision arguments had two purposes: to control which
 bits of the first argument were shifted, and to control the truncation
 mask for SHIFT_COUNT_TRUNCATED.  We need to pass a width to the shift functions
 for the second.
 
 (BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the
 end of the !GENERATOR_FILE list in rtl.def, since the current position caused
 some spurious differences.  The problem AFAICT is that hash_rtx hashes
 on code, RTL PRE creates registers in the hash order of the associated
 expressions, RA uses register numbers as a tie-breaker during ordering,
 and so the order of rtx_code can indirectly influence register allocation.
 First time I'd realised that could happen, so just thought I'd mention it.
 I think we should keep rtl.def in the current (logical) order though.)
 
 Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?

 Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
 from double-int.c on the trunk?  If not then putting this at the
 callers of wi::rshift and friends is clearly asking for future
 mistakes.

 Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
 than in machine instruction patterns was a mistake in the past.

Sure, I can try, but what counts as success?  For better or worse:

unsigned int i = 1 << 32;

has traditionally honoured target/SHIFT_COUNT_TRUNCATED behaviour,
so (e.g.) i is 0 for x86_64 and 1 for MIPS.  So if the idea is to
get rid of SHIFT_COUNT_TRUNCATED at the tree level, it will be a
visible change (but presumably only for undefined code).

 [btw, please use diff -p to get us at least some context if you
 are not writing changelogs ...]

Yeah, sorry, this whole no changelog thing has messed up my workflow :-)
The scripts I normally use did that for me.

Thanks,
Richard


Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Richard Sandiford
Richard Biener <rguent...@suse.de> writes:
 On Mon, 4 Nov 2013, Richard Biener wrote:

 On Fri, 1 Nov 2013, Richard Sandiford wrote:
 
  I'm building one target for each supported CPU and comparing the wide-int
  assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding
  output from the merge point.  This patch removes all the differences I saw
  for alpha-linux-gnu in gcc.c-torture.
  
  Hunk 1: Preserve the current trunk behaviour in which the shift count
  is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
  inspection after hunk 5.
  
  Hunks 2 and 3: Two cases where force_fit_type could extend to wider
  types and where we weren't extending according to the sign of the source.
  We should probably assert that the input is at least as wide as the type...
  
  Hunk 4: The && in:
  
     if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
	 && (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)
  
  had got dropped during the conversion.
  
  Hunk 5: The old code was:
  
    if (shift > 0)
 {
   *mask = r1mask.llshift (shift, TYPE_PRECISION (type));
   *val = r1val.llshift (shift, TYPE_PRECISION (type));
 }
    else if (shift < 0)
 {
   shift = -shift;
   *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
   *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
 }
  
  and these precision arguments had two purposes: to control which
  bits of the first argument were shifted, and to control the truncation
  mask for SHIFT_COUNT_TRUNCATED.  We need to pass a width to the shift functions
  for the second.
  
  (BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the
  end of the !GENERATOR_FILE list in rtl.def, since the current position 
  caused
  some spurious differences.  The problem AFAICT is that hash_rtx hashes
  on code, RTL PRE creates registers in the hash order of the associated
  expressions, RA uses register numbers as a tie-breaker during ordering,
  and so the order of rtx_code can indirectly influence register allocation.
  First time I'd realised that could happen, so just thought I'd mention it.
  I think we should keep rtl.def in the current (logical) order though.)
  
  Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?
 
 Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
 from double-int.c on the trunk?  If not then putting this at the
 callers of wi::rshift and friends is clearly asking for future
 mistakes.
 
 Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
 than in machine instruction patterns was a mistake in the past.

 Oh, and my understanding is that it's maybe required for correctness on
 the RTL side if middle-end code removes masking operations.
 Constant propagation results later may not change the result because
 of that.  I'm not sure this is the way it happens, but if we have

   (lshift:si (reg:si 1) (and:qi (reg:qi 2) 0x1f))

 and we'd transform it to (based on SHIFT_COUNT_TRUNCATED)

   (lshift:si (reg:si 1) (reg:qi 2))

 and later constant propagate 77 to reg:qi 2.

 That isn't the way SHIFT_COUNT_TRUNCATED should work, of course,
 but instead the backends should have a define_insn which matches
 the 'and' and combine should try to match that.

 You can see that various architectures may have truncating shifts
 but not for all patterns which results in stuff like

 config/aarch64/aarch64.h:#define SHIFT_COUNT_TRUNCATED !TARGET_SIMD

 which clearly means that it's not implemented in terms of providing
 shift insns which can match a shift operand that is masked.

 That is, either try to clean that up (hooray!), or preserve
 the double-int.[ch] behavior (how does CONST_INT RTL constant
 folding honor it?).

The CONST_INT code does the modulus explicitly:

  /* Truncate the shift if SHIFT_COUNT_TRUNCATED, otherwise make sure
 the value is in range.  We can't return any old value for
 out-of-range arguments because either the middle-end (via
 shift_truncation_mask) or the back-end might be relying on
 target-specific knowledge.  Nor can we rely on
 shift_truncation_mask, since the shift might not be part of an
 ashlM3, lshrM3 or ashrM3 instruction.  */
  if (SHIFT_COUNT_TRUNCATED)
arg1 = (unsigned HOST_WIDE_INT) arg1 % width;
  else if (arg1 < 0 || arg1 >= GET_MODE_BITSIZE (mode))
return 0;

There was a strong feeling (from me and others) that wide-int.h
should not depend on tm.h.  I don't think wi::lshift itself should
know about SHIFT_COUNT_TRUNCATED.  Especially since (and I think we agree
on this :-)) it's a terrible interface.  It would be better if it took
a mode as an argument and (perhaps) returned a mask as a result.  But that
would make it even harder for wide-int.h to do the right thing, because
it doesn't know about modes.  Having the callers do 

Re: [RFC] libgcov.c re-factoring and offline profile-tool

2013-11-04 Thread Jan Hubicka
 Thanks! Can you attach the patch file for the proposed change?
 
 David
 
 On Fri, Nov 1, 2013 at 2:08 PM, Rong Xu <x...@google.com> wrote:
  libgcov.c is the main file used to generate libgcov.a.  This single
  source generates 21 object files, which are then archived into a library.
  These objects serve very different purposes, but they all live in the same
  file, guarded by various macros.  The source file has become quite large
  and its readability very low.  In its current state:
  1) The code is very hard for new developers to understand;

Agreed, libgcov has become hard to maintain since several parts are written
in a very low-level way.
  2) It makes regressions more likely when making simple changes;
  and, more importantly,
  3) it makes code reuse/sharing very difficult.  For instance, using
  libgcov to implement an offline profile tool (see below), kernel FDO
  support, etc.

Yep, it was my longer term plan to do this, too.
 
  We came to this issue when working on an offline tool to handle
  profile counters.
  Our plan is to have a profile-tool that can merge, diff, collect
  statistics from, and better dump raw profile counters.  This is one of the
  most requested features from our internal users.  This tool can also be
  very useful for any FDO/coverage users.  In the first part of the tool, we
  want to have the weighted merge.  Again, to reuse the code, we have to
  change libgcov.c.  But we don't want to further diverge libgcov.a in our
  google branches from the trunk.
 
  Another issue in libgcov.c is the function gcov_exit().  It is a huge
  function with about 467 lines of code.  This function traverses gcov_list
  and dumps all the gcda files.  The code for merging the gcda files also
  sits in the same function.  This structure makes code reuse and extension
  really difficult (in kernel FDO, for instance, we had to break this
  function up to reuse some of the code).
 
  We propose to break gcov_exit into smaller functions and split libgcov.c 
  into
  several files. Our plan for profile-tool is listed in (3).
 
  1. break gcov_exit() into the following structure (see the sketch after
     this list):
     gcov_exit()
       -> gcov_exit_compute_summary()
       -> allocate_filename_struct()
       for gi_ptr in gcov_list
         -> gcov_exit_dump_gcov()
              -> gcov_exit_open_gcda_file()
              -> gcov_exit_merge_gcda ()
              -> gcov_exit_write_gcda ()
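
A minimal sketch of that decomposition (the function names come from the
plan above; the gcov_info layout and the signatures are my assumptions,
not the actual patch):

struct gcov_info
{
  struct gcov_info *next;   /* assumed chaining of registered objects */
  /* ... filename, counters, etc. ... */
};

extern struct gcov_info *gcov_list;

static void gcov_exit_compute_summary (void) { /* ... */ }
static void allocate_filename_struct (void) { /* ... */ }

static void
gcov_exit_dump_gcov (struct gcov_info *gi_ptr)
{
  /* Would call gcov_exit_open_gcda_file, gcov_exit_merge_gcda and
     gcov_exit_write_gcda for GI_PTR.  */
  (void) gi_ptr;
}

static void
gcov_exit (void)
{
  struct gcov_info *gi_ptr;

  gcov_exit_compute_summary ();
  allocate_filename_struct ();

  /* Dump (open, merge, write) one gcda file per registered object.  */
  for (gi_ptr = gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
    gcov_exit_dump_gcov (gi_ptr);
}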

Just a short comment: summaries are also produced during the merging, i.e.
they are done on a per-file basis.  But otherwise I agree.
We should be careful not to increase the footprint of libgcov much for
embedded users, but I believe changes like this can be done without
additional overhead in the optimized library.
 
  2. split libgcov.c into the following files:
   libgcov-profiler.c
   libgcov-merge.c
   libgcov-interface.c
   libgcov-driver.c
 libgcov-driver-system.c (source included into libgcov-driver.c)

Seems reasonable.  Splitting the I/O stuff into a separate module (driver)
is something I have planned for a long time.  What is the difference
between libgcov-interface and libgcov-driver?
 
  The details of the splitting:
  libgcov-interface.c
  /* This file contains API functions to other programs and the supporting
 functions.  */
__gcov_dump()
__gcov_execl()
__gcov_execle()
__gcov_execlp()
__gcov_execv()
__gcov_execve()
__gcov_execvp()
__gcov_flush()
__gcov_fork()
__gcov_reset()
init_mx()
init_mx_once()
 
  libgcov-profiler.c
  /* This file contains runtime profile handlers.  */
variables:
  __gcov_indirect_call_callee
  __gcov_indirect_call_counters
functions:
  __gcov_average_profiler()
  __gcov_indirect_call_profiler()
  __gcov_indirect_call_profiler_v2()
  __gcov_interval_profiler()
  __gcov_ior_profiler()
  __gcov_one_value_profiler()
  __gcov_one_value_profiler_body()
  __gcov_pow2_profiler()
 
  libgcov-merge.c
  /* This file contains the merge functions for various counters.  */
functions:
  __gcov_merge_add()
  __gcov_merge_delta()
  __gcov_merge_ior()
  __gcov_merge_single()
 
   libgcov-driver.c
  /* This file contains the gcov dumping functions. We separate the
 system dependent part to libgcov-driver-system.c.  */
variables:
  gcov_list
  gcov_max_filename
  gcov_dump_complete
  -- newly added file static variables --
  this_prg
  all_prg
  crc32
  gi_filename
  fn_buffer
  fn_tail
  sum_buffer
  next_sum_buffer
  sum_tail
  - end -
functions:
  free_fn_data()
  buffer_fn_data()
  crc32_unsigned()
  gcov_version()
  gcov_histogram_insert()
  gcov_compute_histogram()
  gcov_clear()
  __gcov_init()
  gcov_exit()
  --- newly added static functions --
  gcov_exit_compute_summary ()
  gcov_exit_merge_gcda()
  gcov_exit_write_gcda()
  gcov_exit_dump_gcov()
  - end 

Re: [PATCH 1/n] Add conditional compare support

2013-11-04 Thread Joey Ye
+  "TARGET_ARM || TARGET_THUMB2"
TARGET_32BIT

+static const char *const ite = "it\t%d4";
+static const int cmp_idx[9] = {0, 0, 1, 0, 1};
s/9/5/

On Wed, Oct 30, 2013 at 5:32 PM, Zhenqiang Chen <zhenqiang.c...@arm.com> wrote:

 -----Original Message-----
 From: Richard Henderson [mailto:r...@redhat.com]
 Sent: Monday, October 28, 2013 11:07 PM
 To: Zhenqiang Chen; Richard Earnshaw; 'Richard Biener'
 Cc: GCC Patches
 Subject: Re: [PATCH 1/n] Add conditional compare support

 On 10/28/2013 01:32 AM, Zhenqiang Chen wrote:
  Patch is updated according to your comments. Main changes are:
  * Add two hooks: legitimize_cmp_combination and
  legitimize_ccmp_combination
  * Improve document.

 No, these are not the hooks I proposed.

 You should *not* have a ccmp_optab, because the middle-end has
 absolutely no idea what mode TARGET should be in.

 The hook needs to return an rtx expression appropriate for feeding to
 cbranch et al.  E.g. for arm,

   (ne (reg:CC_Z CC_REGNUM) (const_int 0))

 after having emitted the instruction that sets CC_REGNUM.

 We need to push this to the backend because for ia64 the expression needs
 to be of the form

   (ne (reg:BI new_pseudo) (const_int 0))

 Thanks for the clarification.
 Patch is updated:
 * ccmp_optab is removed.
 * Add two hooks gen_ccmp_with_cmp_cmp and gen_ccmp_with_ccmp_cmp for backends 
 to generate the conditional compare instructions.

 Thanks!
 -Zhenqiang


Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Richard Biener
On Mon, 4 Nov 2013, Richard Sandiford wrote:

 Richard Biener <rguent...@suse.de> writes:
  On Fri, 1 Nov 2013, Richard Sandiford wrote:
  I'm building one target for each supported CPU and comparing the wide-int
  assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding
  output from the merge point.  This patch removes all the differences I saw
  for alpha-linux-gnu in gcc.c-torture.
  
  Hunk 1: Preserve the current trunk behaviour in which the shift count
  is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
  inspection after hunk 5.
  
  Hunks 2 and 3: Two cases where force_fit_type could extend to wider
  types and where we weren't extending according to the sign of the source.
  We should probably assert that the input is at least as wide as the type...
  
  Hunk 4: The && in:
  
     if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
	 && (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)
  
  had got dropped during the conversion.
  
  Hunk 5: The old code was:
  
   if (shift > 0)
 {
   *mask = r1mask.llshift (shift, TYPE_PRECISION (type));
   *val = r1val.llshift (shift, TYPE_PRECISION (type));
 }
   else if (shift < 0)
 {
   shift = -shift;
   *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
   *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
 }
  
  and these precision arguments had two purposes: to control which
  bits of the first argument were shifted, and to control the truncation
  mask for SHIFT_COUNT_TRUNCATED.  We need to pass a width to the shift functions
  for the second.
  
  (BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the
  end of the !GENERATOR_FILE list in rtl.def, since the current position 
  caused
  some spurious differences.  The problem AFAICT is that hash_rtx hashes
  on code, RTL PRE creates registers in the hash order of the associated
  expressions, RA uses register numbers as a tie-breaker during ordering,
  and so the order of rtx_code can indirectly influence register allocation.
  First time I'd realised that could happen, so just thought I'd mention it.
  I think we should keep rtl.def in the current (logical) order though.)
  
  Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?
 
  Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
  from double-int.c on the trunk?  If not then putting this at the
  callers of wi::rshift and friends is clearly asking for future
  mistakes.
 
  Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
  than in machine instruction patterns was a mistake in the past.
 
 Sure, I can try, but what counts as success?  For better or worse:
 
 unsigned int i = 1 << 32;
 
 has traditionally honoured target/SHIFT_COUNT_TRUNCATED behaviour,
 so (e.g.) i is 0 for x86_64 and 1 for MIPS.  So if the idea is to
 get rid of SHIFT_COUNT_TRUNCATED at the tree level, it will be a
 visible change (but presumably only for undefined code).

Sure - but the above is undefined (as you say).  Generally
constant folding should follow language standards, not
machine-dependent behavior.

  [btw, please use diff -p to get us at least some context if you
  are not writing changelogs ...]
 
 Yeah, sorry, this whole no changelog thing has messed up my workflow :-)
 The scripts I normally use did that for me.

;)

Richard.


RE: [PATCH 1/n] Add conditional compare support

2013-11-04 Thread Zhenqiang Chen

 -----Original Message-----
 From: Richard Henderson [mailto:r...@redhat.com]
 Sent: Friday, November 01, 2013 12:27 AM
 To: Zhenqiang Chen
 Cc: Richard Earnshaw; 'Richard Biener'; GCC Patches
 Subject: Re: [PATCH 1/n] Add conditional compare support
 
  On 10/31/2013 02:06 AM, Zhenqiang Chen wrote:
   With two compares, we might swap the order of the two compares to get
   a valid combination.  This is what the current ARM backend does (please
   refer to arm_select_dominance_cc_mode, cmp_and, cmp_ior, and_scc_scc and
  ior_scc_scc).
   To improve gen_ccmp_first, we need another function/hook to determine
   which compare is done first.  I will double check the ARM backend to
   avoid a hook.
 
 Hmm.  I hadn't thought of that, and missed that point when looking through
 your previous patches.  I think you should have a third hook for that.  ARM
 will need it, but IA-64 doesn't.

Thanks.  I added a new hook.  The default function returns -1 if the target
does not care about the order.

+DEFHOOK
+(select_ccmp_cmp_order,
+ "For some targets (like ARM), the order of two compares is sensitive for\n\
+conditional compare.  cmp0-cmp1 might be an invalid combination.  But when\n\
+swapping the order, cmp1-cmp0 is valid.  The function will return\n\
+  -1: if @code{code1} and @code{code2} are a valid combination.\n\
+   1: if @code{code2} and @code{code1} are a valid combination.\n\
+   0: both are invalid.",
+ int, (int code1, int code2),
+ default_select_ccmp_cmp_order)
 
  Hook gen_ccmp_next needs one more parameter to indicate AND/IOR
 since
  they will generate different instructions.
 
 True, I forgot about that while writing the hook descriptions.

For gen_ccmp_next, I added another parameter CC_P to indicate whether the
result is used as a CC or not.  If CC_P is false, gen_ccmp_next will return
a general register.  This is for code like

int test (int a, int b)
{
  return a > 0 && b > 0;
}

During expand, there might be no branch at all, so gen_ccmp_next cannot
return a CC for a > 0 && b > 0.

+DEFHOOK
+(gen_ccmp_next,
+ "This function emits a conditional comparison within a sequence of\n\
+ conditional comparisons.  The @code{prev} expression is the result of a\n\
+ prior call to @code{gen_ccmp_first} or @code{gen_ccmp_next}.  It may return\n\
+ @code{NULL} if the combination of @code{prev} and this comparison is\n\
+ not supported, otherwise the result must be appropriate for passing to\n\
+ @code{gen_ccmp_next} or @code{cbranch_optab} if @code{cc_p} is true.\n\
+ If @code{cc_p} is false, it returns a general register.  @code{bit_code}\n\
+ is AND or IOR, which is the op on the two compares.",
+ rtx, (rtx prev, int cmp_code, rtx op0, rtx op1, int bit_code, bool cc_p),
+ NULL)
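
For reference, a sketch of how an expander might chain the two hooks for
"a > 0 && b > 0" (the control flow and the gen_ccmp_first signature here
are my assumptions for illustration, not code from the patch):

static rtx
expand_ccmp_pair (rtx a, rtx b)
{
  /* First compare: emits the instruction that sets the flags register
     (or a pseudo on targets like ia64) and returns a comparison rtx.
     Assumed signature: (cmp_code, op0, op1).  */
  rtx prev = targetm.gen_ccmp_first (GT, a, const0_rtx);
  if (prev == NULL_RTX)
    return NULL_RTX;		/* target cannot start a ccmp sequence */

  /* Second compare, made conditional on the first; AND selects the
     "op on the two compares", and cc_p == false asks for the result
     in a general register.  */
  return targetm.gen_ccmp_next (prev, GT, b, const0_rtx, AND, false);
}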

An updated patch is attached.

Thanks!
-Zhenqiang

ccmp-hook3.patch
Description: Binary data


Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.

2013-11-04 Thread James Greenhalgh
On Fri, Nov 01, 2013 at 04:48:53PM +0000, Cong Hou wrote:
 diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
 index 2a5a2e1..8f5d39a 100644
 --- a/gcc/doc/md.texi
 +++ b/gcc/doc/md.texi
 @@ -4705,6 +4705,16 @@ wider mode, is computed and added to operand 3.
 Operand 3 is of a mode equal or
  wider than the mode of the product. The result is placed in operand 0, which
  is of the same mode as operand 3.
 
 +@cindex @code{ssad@var{m}} instruction pattern
 +@item @samp{ssad@var{m}}
 +@cindex @code{usad@var{m}} instruction pattern
 +@item @samp{usad@var{m}}
 +Compute the sum of absolute differences of two signed/unsigned elements.
 +Operand 1 and operand 2 are of the same mode.  Their absolute difference,
 +which is of a wider mode, is computed and added to operand 3.  Operand 3 is
 +of a mode equal to or wider than the mode of the absolute difference.  The
 +result is placed in operand 0, which is of the same mode as operand 3.
 +
  @cindex @code{ssum_widen@var{m3}} instruction pattern
  @item @samp{ssum_widen@var{m3}}
  @cindex @code{usum_widen@var{m3}} instruction pattern
 diff --git a/gcc/expr.c b/gcc/expr.c
 index 4975a64..1db8a49 100644

I'm not sure I follow, and if I do - I don't think it matches what
you have implemented for i386.

From your text description I would guess the series of operations to be:

  v1 = widen (operands[1])
  v2 = widen (operands[2])
  v3 = abs (v1 - v2)
  operands[0] = v3 + operands[3]

But if I understand the behaviour of PSADBW correctly, what you have
actually implemented is:

  v1 = widen (operands[1])
  v2 = widen (operands[2])
  v3 = abs (v1 - v2)
  v4 = reduce_plus (v3)
  operands[0] = v4 + operands[3]

To my mind, synthesizing the reduce_plus step will be wasteful for targets
that do not get it for free with their absolute-difference step.  Imagine a
simple loop where we have synthesized the reduce_plus: we would compute
partial sums on each loop iteration, though we would do better to leave the
reduce_plus step until after the loop.  REDUC_PLUS_EXPR would be the
appropriate tree code for this.

I would prefer to see this Tree code not imply the reduce_plus.
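
For concreteness, the scalar idiom such a pattern is meant to vectorize
looks roughly like this (my sketch, not taken from the patch):

int
sad (const unsigned char *a, const unsigned char *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    {
      int d = a[i] - b[i];        /* difference, computed in a wider mode */
      sum += d >= 0 ? d : -d;     /* accumulate the absolute difference */
    }
  return sum;
}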

Thanks,
James



Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Richard Biener
On Mon, 4 Nov 2013, Richard Sandiford wrote:

 Richard Biener <rguent...@suse.de> writes:
  On Mon, 4 Nov 2013, Richard Biener wrote:
 
  On Fri, 1 Nov 2013, Richard Sandiford wrote:
  
   I'm building one target for each supported CPU and comparing the wide-int
   assembly output of gcc.c-torture, gcc.dg and g++.dg with the 
   corresponding
   output from the merge point.  This patch removes all the differences I 
   saw
   for alpha-linux-gnu in gcc.c-torture.
   
   Hunk 1: Preserve the current trunk behaviour in which the shift count
   is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
   inspection after hunk 5.
   
   Hunks 2 and 3: Two cases where force_fit_type could extend to wider
   types and where we weren't extending according to the sign of the source.
   We should probably assert that the input is at least as wide as the 
   type...
   
   Hunk 4: The && in:
   
      if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
	  && (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)
   
   had got dropped during the conversion.
   
   Hunk 5: The old code was:
   
   if (shift > 0)
{
  *mask = r1mask.llshift (shift, TYPE_PRECISION (type));
  *val = r1val.llshift (shift, TYPE_PRECISION (type));
}
   else if (shift < 0)
{
  shift = -shift;
  *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
  *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
}
   
   and these precision arguments had two purposes: to control which
   bits of the first argument were shifted, and to control the truncation
   mask for SHIFT_COUNT_TRUNCATED.  We need to pass a width to the shift functions
   for the second.
   
   (BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to 
   the
   end of the !GENERATOR_FILE list in rtl.def, since the current position 
   caused
   some spurious differences.  The problem AFAICT is that hash_rtx hashes
   on code, RTL PRE creates registers in the hash order of the associated
   expressions, RA uses register numbers as a tie-breaker during ordering,
   and so the order of rtx_code can indirectly influence register 
   allocation.
   First time I'd realised that could happen, so just thought I'd mention 
   it.
   I think we should keep rtl.def in the current (logical) order though.)
   
   Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?
  
  Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
  from double-int.c on the trunk?  If not then putting this at the
  callers of wi::rshift and friends is clearly asking for future
  mistakes.
  
  Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
  than in machine instruction patterns was a mistake in the past.
 
  Oh, and my understanding is that it's maybe required for correctness on
  the RTL side if middle-end code removes masking operations.
  Constant propagation results later may not change the result because
  of that.  I'm not sure this is the way it happens, but if we have
 
(lshift:si (reg:si 1) (and:qi (reg:qi 2) 0x1f))
 
  and we'd transform it to (based on SHIFT_COUNT_TRUNCATED)
 
(lshift:si (reg:si 1) (reg:qi 2))
 
  and later constant propagate 77 to reg:qi 2.
 
  That isn't the way SHIFT_COUNT_TRUNCATED should work, of course,
  but instead the backends should have a define_insn which matches
  the 'and' and combine should try to match that.
 
  You can see that various architectures may have truncating shifts
  but not for all patterns which results in stuff like
 
  config/aarch64/aarch64.h:#define SHIFT_COUNT_TRUNCATED !TARGET_SIMD
 
  which clearly means that it's not implemented in terms of providing
  shift insns which can match a shift operand that is masked.
 
  That is, either try to clean that up (hooray!), or preserve
  the double-int.[ch] behavior (how does CONST_INT RTL constant
  folding honor it?).
 
 The CONST_INT code does the modulus explicitly:
 
 /* Truncate the shift if SHIFT_COUNT_TRUNCATED, otherwise make sure
the value is in range.  We can't return any old value for
out-of-range arguments because either the middle-end (via
shift_truncation_mask) or the back-end might be relying on
target-specific knowledge.  Nor can we rely on
shift_truncation_mask, since the shift might not be part of an
ashlM3, lshrM3 or ashrM3 instruction.  */
 if (SHIFT_COUNT_TRUNCATED)
   arg1 = (unsigned HOST_WIDE_INT) arg1 % width;
  else if (arg1 < 0 || arg1 >= GET_MODE_BITSIZE (mode))
   return 0;
 
 There was a strong feeling (from me and others) that wide-int.h
 should not depend on tm.h.  I don't think wi::lshift itself should
 know about SHIFT_COUNT_TRUNCATED.  Especially since (and I think we agree
 on this :-)) it's a terrible interface.  It would be better if it took
 a mode as an argument and (perhaps) returned a mask as a result. 

Re: [PATCH] Fix up reassoc (PR tree-optimization/58946)

2013-11-04 Thread Richard Biener
On Fri, 1 Nov 2013, Jakub Jelinek wrote:

 Hi!
 
 My recent reassoc patch caused the following regression (though, it only
 started failing on this testcase with Andrew's ifcombine changes).
 
 The issue is that update_ops relies on walking the same stmts as get_ops
 did, and uses has_single_use (either directly or indirectly through
 is_reassociable_op).  optimize_range_tests itself doesn't change the IL
 except for inserting new stmts using values on which get_ops already didn't
 recurse (either because they had multiple uses or were non-reassociable).
 The problem in the testcase is that when optimizing a GIMPLE_COND directly,
 there is no guarantee of a single use: we treat the condition as the
 starting point of init_range_info, and SSA_NAME != 0 or SSA_NAME == 0 etc.
 is just fine even if SSA_NAME has multiple uses.  So if we first change the
 condition to something else (as instructed by the changed ops[i]->op value
 from NULL to some SSA_NAME), we might turn something update_ops looks at
 from multiple uses into a single use.
 
 This patch fixes it by doing all the update_ops calls before changing the
 GIMPLE_CONDs themselves.  I believe it is safe: update_ops walks only
 single-use SSA_NAMEs, so each occurs in only one particular update_ops
 call, and it never removes anything, only adds new stmts (which can turn
 single-use SSA_NAMEs into multiple-use ones, but that happens only after
 we've walked the originally single use exactly once from that single use);
 and the GIMPLE_COND adjustments never use has_single_use, so they can
 safely be done after all the update_ops calls.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

 2013-11-01  Jakub Jelinek  ja...@redhat.com
 
   PR tree-optimization/58946
   * tree-ssa-reassoc.c (maybe_optimize_range_tests): Update all
   bbs with bbinfo[idx].op != NULL before all blocks with
   bbinfo[idx].op == NULL.
 
   * gcc.c-torture/compile/pr58946.c: New test.
 
 --- gcc/tree-ssa-reassoc.c.jj 2013-10-24 10:19:21.0 +0200
+++ gcc/tree-ssa-reassoc.c	2013-11-01 09:23:09.264615181 +0100
 @@ -2657,6 +2657,7 @@ maybe_optimize_range_tests (gimple stmt)
edge e;
   vec<operand_entry_t> ops = vNULL;
   vec<inter_bb_range_test_entry> bbinfo = vNULL;
 +  bool any_changes = false;
  
/* Consider only basic blocks that end with GIMPLE_COND or
   a cast statement satisfying final_range_test_p.  All
 @@ -2870,41 +2871,31 @@ maybe_optimize_range_tests (gimple stmt)
   break;
  }
   if (ops.length () > 1)
+    any_changes = optimize_range_tests (ERROR_MARK, &ops);
+  if (any_changes)
     {
       unsigned int idx;
-      bool any_changes = optimize_range_tests (ERROR_MARK, &ops);
 -  for (bb = last_bb, idx = 0; any_changes; bb = single_pred (bb), idx++)
 +  /* update_ops relies on has_single_use predicates returning the
 +  same values as it did during get_ops earlier.  Additionally it
 +  never removes statements, only adds new ones and it should walk
 +  from the single imm use and check the predicate already before
 +  making those changes.
 +  On the other side, the handling of GIMPLE_COND directly can turn
 +  previously multiply used SSA_NAMEs into single use SSA_NAMEs, so
 +  it needs to be done in a separate loop afterwards.  */
 +  for (bb = last_bb, idx = 0; ; bb = single_pred (bb), idx++)
   {
-	if (bbinfo[idx].first_idx < bbinfo[idx].last_idx)
+	if (bbinfo[idx].first_idx < bbinfo[idx].last_idx
+	    && bbinfo[idx].op != NULL_TREE)
   {
 -   gimple stmt = last_stmt (bb);
 tree new_op;
  
 -   if (bbinfo[idx].op == NULL_TREE)
 - {
-	    if (ops[bbinfo[idx].first_idx]->op != NULL_TREE)
 - {
-		if (integer_zerop (ops[bbinfo[idx].first_idx]->op))
 - gimple_cond_make_false (stmt);
-		else if (integer_onep (ops[bbinfo[idx].first_idx]->op))
 - gimple_cond_make_true (stmt);
 -   else
 - {
 -   gimple_cond_set_code (stmt, NE_EXPR);
 -   gimple_cond_set_lhs (stmt,
-					 ops[bbinfo[idx].first_idx]->op);
 -   gimple_cond_set_rhs (stmt, boolean_false_node);
 - }
 -   update_stmt (stmt);
 - }
 -   bbinfo[idx].op = new_op = boolean_false_node;
 - }
 -   else
 - new_op = update_ops (bbinfo[idx].op,
 -  (enum tree_code)
-				 ops[bbinfo[idx].first_idx]->rank,
 -  ops, bbinfo[idx].first_idx,
 -  loop_containing_stmt (stmt));
 +   stmt = last_stmt (bb);
 +   new_op = update_ops (bbinfo[idx].op,
 + 

PATCH: middle-end/58981: movmem/setmem use mode wider than Pmode for size

2013-11-04 Thread H.J. Lu
emit_block_move_via_movmem and set_storage_via_setmem have

  for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
   mode = GET_MODE_WIDER_MODE (mode))
{
  enum insn_code code = direct_optab_handler (movmem_optab, mode);

  if (code != CODE_FOR_nothing
  /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
 here because if SIZE is less than the mode mask, as it is
 returned by the macro, it will definitely be less than the
 actual mode mask.  */
	  && ((CONST_INT_P (size)
	       && ((unsigned HOST_WIDE_INT) INTVAL (size)
		   <= (GET_MODE_MASK (mode) >> 1)))
	      || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
{

A backend may assume that the mode of the size operand in the movmem and
setmem expanders is no wider than Pmode, since the size is within the
Pmode address space.  The x86 backend's
expand_set_or_movmem_prologue_epilogue_by_misaligned has

  rtx saveddest = *destptr;

  gcc_assert (desired_align <= size);
  /* Align destptr up, place it to new register.  */
  *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr,
  GEN_INT (prolog_size),
  NULL_RTX, 1, OPTAB_DIRECT);
  *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr,
  GEN_INT (-desired_align),
  *destptr, 1, OPTAB_DIRECT);
  /* See how many bytes we skipped.  */
  saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest,
   *destptr,
   saveddest, 1, OPTAB_DIRECT);
  /* Adjust srcptr and count.  */
  if (!issetmem)
*srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, 
saveddest,
*srcptr, 1, OPTAB_DIRECT);
  *count = expand_simple_binop (GET_MODE (*count), PLUS, *count,
saveddest, *count, 1, OPTAB_DIRECT);

saveddest is a negative number in Pmode and *count is in word_mode.  For
x32, where Pmode is SImode and word_mode is DImode, saveddest + *count
leads to overflow.  We could fix it by using the mode of saveddest to
compute saveddest + *count, but that leads to extra conversions, and other
backends may run into the same problem.  A better fix is to limit the mode
of the size operand in the movmem and setmem expanders to Pmode.  It
generates better and correct memcpy and memset for x32.
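
An illustration of the failure mode (the concrete values are my own,
hypothetical):

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  /* With Pmode == SImode and word_mode == DImode, skipping 8 bytes in
     the prologue makes saveddest == -8, i.e. 0xfffffff8 as a 32-bit
     value.  */
  uint32_t saveddest = (uint32_t) -8;   /* Pmode (SImode) value */
  uint64_t count = 100;                 /* word_mode (DImode) value */

  uint64_t wrong = count + saveddest;            /* zero-extended: off by 2^32 */
  uint64_t right = count + (int32_t) saveddest;  /* sign-extended: 92 */

  printf ("wrong = %llu, right = %llu\n",
	  (unsigned long long) wrong, (unsigned long long) right);
  return 0;
}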

There is also a typo in the comments.  It should be BITS_PER_WORD, not
BITS_PER_HOST_WIDE_INT.

Tested on x32.  OK to install?

Thanks.


H.J.
---
gcc/

2013-11-04  H.J. Lu  hongjiu...@intel.com

PR middle-end/58981
* expr.c (emit_block_move_via_movmem): Don't use mode wider than
Pmode for size.
(set_storage_via_setmem): Likewise.

gcc/testsuite/

2013-11-04  H.J. Lu  hongjiu...@intel.com

PR middle-end/58981
* gcc.dg/pr58981.c: New test.

diff --git a/gcc/expr.c b/gcc/expr.c
index 551a660..1a869650 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1294,13 +1294,16 @@ emit_block_move_via_movmem (rtx x, rtx y, rtx size, 
unsigned int align,
   enum insn_code code = direct_optab_handler (movmem_optab, mode);
 
   if (code != CODE_FOR_nothing
- /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
-here because if SIZE is less than the mode mask, as it is
+ /* We don't need MODE to be narrower than BITS_PER_WORD here
+because if SIZE is less than the mode mask, as it is
 returned by the macro, it will definitely be less than the
-actual mode mask.  */
+actual mode mask unless MODE is wider than Pmode.  Since
+SIZE is within the Pmode address space, we should use
+Pmode in this case.  */
	  && ((CONST_INT_P (size)
	       && ((unsigned HOST_WIDE_INT) INTVAL (size)
		   <= (GET_MODE_MASK (mode) >> 1)))
+	      || GET_MODE_BITSIZE (mode) >= GET_MODE_BITSIZE (Pmode)
	      || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
{
  struct expand_operand ops[6];
@@ -2879,13 +2882,16 @@ set_storage_via_setmem (rtx object, rtx size, rtx val, 
unsigned int align,
   enum insn_code code = direct_optab_handler (setmem_optab, mode);
 
   if (code != CODE_FOR_nothing
- /* We don't need MODE to be narrower than
-BITS_PER_HOST_WIDE_INT here because if SIZE is less than
-the mode mask, as it is returned by the macro, it will
-definitely be less than the actual mode mask.  */
+ /* We don't need MODE to be narrower than BITS_PER_WORD here
+because if SIZE is less than the mode mask, as it is
+returned by the macro, it will definitely be less than the
+actual mode mask unless MODE is wider than Pmode.  Since
+SIZE is within the Pmode 

Re: RFC: simd enabled functions (omp declare simd / elementals)

2013-11-04 Thread Richard Biener
On Fri, Nov 1, 2013 at 4:04 AM, Aldy Hernandez <al...@redhat.com> wrote:
 Hello gentlemen.  I'm CCing all of you, because each of you can provide
 valuable feedback to various parts of the compiler which I touch.  I have
 sprinkled love notes with your names throughout the post :).

 This is a patch against the gomp4 branch.  It provides initial support for
 simd-enabled functions which are #pragma omp declare simd in the OpenMP
 world and elementals in Cilk Plus nomenclature.  The parsing bits for OpenMP
 are already in trunk, but they are silently ignored.  This patch aims to
 remedy the situation.  The Cilk Plus parsing bits, OTOH, are not ready, but
 could trivially be adapted to use this infrastructure (see below).

 I would like to at least get this into the gomp4 branch for now, because I
 am accumulating far too many changes locally.

 The main idea is that for a simd annotated function, we can create one or
 more cloned vector variants of a scalar function that can later be used by
 the vectorizer.

 For a simple example with multiple returns...

 #pragma omp declare simd simdlen(4) notinbranch
 int foo (int a, int b)
 {
   if (a == b)
 return 555;
   else
 return 666;
 }

 ...we would generate with this patch (unoptimized):

Just a quick question, queueing the thread for later review (aww, queue
back at 100 threads again - I can't get any work done anymore :().

What does #pragma omp declare simd guarantee about memory
side-effects and memory use in the function?  That is, unless the
function can be safely annotated with the 'const' attribute the
whole thing is useless for the vectorizer.

Thanks,
Richard.

 foo.simdclone.0 (vector(4) int simd.4, vector(4) int simd.5)
 {
   unsigned int iter.6;
   int b.3[4];
   int a.2[4];
   int retval.1[4];
   int _3;
   int _5;
   int _6;
   vector(4) int _7;

   bb 2:
   a.2 = VIEW_CONVERT_EXPR<int[4]>(simd.4);
   b.3 = VIEW_CONVERT_EXPR<int[4]>(simd.5);
   iter.6_12 = 0;

   bb 3:
   # iter.6_9 = PHI <iter.6_12(2), iter.6_14(6)>
   _5 = a.2[iter.6_9];
   _6 = b.3[iter.6_9];
   if (_5 == _6)
 goto bb 5;
   else
 goto bb 4;

   bb 4:

   bb 5:
   # _3 = PHI <555(3), 666(4)>
   retval.1[iter.6_9] = _3;
   iter.6_14 = iter.6_9 + 1;
   if (iter.6_14 < 4)
 goto bb 6;
   else
 goto bb 7;

   bb 6:
   goto bb 3;

   bb 7:
   _7 = VIEW_CONVERT_EXPR<vector(4) int>(retval.1);
   return _7;

 }

  The new loop is properly created and annotated with loop->force_vect=true
  and loop->safelen set.

 A possible use may be:

 int array[1000];
 void bar ()
 {
   int i;
   for (i=0; i < 1000; ++i)
 array[i] = foo(i, 123);
 }

 In which case, we would use the simd clone if available:

 bar ()
 {
   vector(4) int vect_cst_.21;
   vector(4) int vect_i_6.20;
   vector(4) int * vectp_array.19;
   vector(4) int * vectp_array.18;
   vector(4) int vect_cst_.17;
   vector(4) int vect__4.16;
   vector(4) int vect_vec_iv_.15;
   vector(4) int vect_cst_.14;
   vector(4) int vect_cst_.13;
   int stmp_var_.12;
   int i;
   unsigned int ivtmp_1;
   int _4;
   unsigned int ivtmp_7;
   unsigned int ivtmp_20;
   unsigned int ivtmp_21;

   bb 2:
   vect_cst_.13_8 = { 0, 1, 2, 3 };
   vect_cst_.14_2 = { 4, 4, 4, 4 };
   vect_cst_.17_13 = { 123, 123, 123, 123 };
   vectp_array.19_15 = array;
   vect_cst_.21_5 = { 1, 1, 1, 1 };
   goto bb 4;

   bb 3:

   bb 4:
   # i_9 = PHI <i_6(3), 0(2)>
   # ivtmp_1 = PHI <ivtmp_7(3), 1000(2)>
   # vect_vec_iv_.15_11 = PHI <vect_vec_iv_.15_12(3), vect_cst_.13_8(2)>
   # vectp_array.18_16 = PHI <vectp_array.18_17(3), vectp_array.19_15(2)>
   # ivtmp_20 = PHI <ivtmp_21(3), 0(2)>
   vect_vec_iv_.15_12 = vect_vec_iv_.15_11 + vect_cst_.14_2;
   vect__4.16_14 = foo.simdclone.0 (vect_vec_iv_.15_11, vect_cst_.17_13);
   _4 = 0;
   MEM[(int *)vectp_array.18_16] = vect__4.16_14;
   vect_i_6.20_19 = vect_vec_iv_.15_11 + vect_cst_.21_5;
   i_6 = i_9 + 1;
   ivtmp_7 = ivtmp_1 - 1;
   vectp_array.18_17 = vectp_array.18_16 + 16;
   ivtmp_21 = ivtmp_20 + 1;
   if (ivtmp_21 < 250)
 goto bb 3;
   else
 goto bb 5;

   bb 5:
   return;

 }

 That's the idea.

 Some of the ABI issues still need to be resolved (mangling for avx-512, what
 to do with non x86 architectures, what (if any) default clones will be
 created when no vector length is specified, etc etc), but the main
 functionality can be seen above.

 Uniform and linear parameters (which are passed as scalars) are still not
 handled.  Also, Jakub mentioned that with the current vectorizer we probably
 can't make good use of the inbranch/masked clones.  I have a laundry list of
 missing things prepended by // FIXME if anyone is curious.

 I'd like some feedback from y'all in your respective areas, since this
 touches a few places besides OpenMP.  For instance...

 [Honza] Where do you suggest I place a list of simd clones for a particular
 (scalar) function?  Right now I have added a simdclone_of field in
 cgraph_node and am (temporarily) serially scanning all functions in
 get_simd_clone().  This is obviously 

Re: [PATCH] Time profiler - phase 1

2013-11-04 Thread Jan Hubicka
 diff --git a/gcc/ChangeLog b/gcc/ChangeLog
 index fca665b..3b62bcc 100644
 --- a/gcc/ChangeLog
 +++ b/gcc/ChangeLog
 @@ -1,3 +1,31 @@
 +2013-10-29  Martin Liska  marxin.li...@gmail.com
 + Jan Hubicka  j...@suse.cz
 +
 + * cgraph.c (dump_cgraph_node): Profile dump added.
 + * cgraph.h (struct cgraph_node): New time profile variable added.
 + * cgraphclones.c (cgraph_clone_node): Time profile is cloned.
 + * gcov-io.h (gcov_type): New profiler type introduced.
 + * ipa-profile.c (lto_output_node): Streaming for time profile added.
 + (input_node): Time profiler is read from LTO stream.
 + * predict.c (maybe_hot_count_p): Hot prediction changed.
 + * profile.c (instrument_values): New case for time profiler added.
 + (compute_value_histograms): Read of time profile.
 + * tree-pretty-print.c (dump_function_header): Time profiler is dumped.
 + * tree-profile.c (init_ic_make_global_vars): Time profiler function 
 added.
 + (gimple_init_edge_profiler): TP function instrumentation.
 + (gimple_gen_time_profiler): New.
 + * value-prof.c (gimple_add_histogram_value): Support for time profiler
 + added.
 + (dump_histogram_value): TP type added to dumps.
 + (visit_hist): More sensitive check that takes TP into account.
 + (gimple_find_values_to_profile): TP instrumentation.
 + * value-prof.h (hist_type): New histogram type added.
 + (struct histogram_value_t): Pointer to struct function added.
 + * libgcc/Makefile.in: New GCOV merge function for TP added.
 + * libgcov.c: function_counter variable introduced.
 + (_gcov_merge_time_profile): New.
 + (_gcov_time_profiler): New. 
 +
  2013-10-29  David Malcolm  dmalc...@redhat.com
  
   * doc/gty.texi (Inheritance and GTY): Make it clear that
 diff --git a/gcc/cgraph.c b/gcc/cgraph.c
 index 52d9ab0..c95a54e 100644
 --- a/gcc/cgraph.c
 +++ b/gcc/cgraph.c
 @@ -1890,6 +1890,7 @@ dump_cgraph_node (FILE *f, struct cgraph_node *node)
    if (node->profile_id)
      fprintf (f, "  Profile id: %i\n",
	       node->profile_id);
 +  fprintf (f, "  First run: %i\n", node->tp_first_run);
    fprintf (f, "  Function flags:");
    if (node->count)
      fprintf (f, " executed " HOST_WIDEST_INT_PRINT_DEC "x",
 diff --git a/gcc/cgraph.h b/gcc/cgraph.h
 index 7706419..479d49f 100644
 --- a/gcc/cgraph.h
 +++ b/gcc/cgraph.h
 @@ -247,7 +247,6 @@ struct GTY(()) cgraph_clone_info
bitmap combined_args_to_skip;
  };
  
 -
  /* The cgraph data structure.
 Each function decl has assigned cgraph_node listing callees and callers.  
 */
  
 @@ -324,6 +323,8 @@ struct GTY(()) cgraph_node {
unsigned tm_clone : 1;
/* True if this decl is a dispatcher for function versions.  */
unsigned dispatcher_function : 1;
 +  /* Time profiler: first run of function.  */
 +  int tp_first_run;

Move this up after profile_id.
 --- a/gcc/gcov-io.c
 +++ b/gcc/gcov-io.c
 @@ -68,7 +68,7 @@ gcov_open (const char *name, int mode)
  #if IN_LIBGCOV
const int mode = 0;
  #endif
 -#if GCOV_LOCKED
 +#if GCOV_LOCKED  
struct flock s_flock;
int fd;
  
Accidental change?
 @@ -651,6 +658,9 @@ lto_symtab_prevailing_decl (tree decl)
   if (TREE_CODE (decl) == FUNCTION_DECL && DECL_BUILT_IN (decl))
  return decl;
  
 +  if (!DECL_ASSEMBLER_NAME_SET_P (decl))
 +return decl;
 +
/* Ensure DECL_ASSEMBLER_NAME will not set assembler name.  */
gcc_assert (DECL_ASSEMBLER_NAME_SET_P (decl));
  
Remove this change - it is an unrelated hack from my old tree.
 diff --git a/gcc/predict.c b/gcc/predict.c
 index cc9a053..4b655d3 100644
 --- a/gcc/predict.c
 +++ b/gcc/predict.c
 @@ -170,7 +170,7 @@ maybe_hot_count_p (struct function *fun, gcov_type count)
   if (fun && profile_status_for_function (fun) != PROFILE_READ)
  return true;
/* Code executed at most once is not hot.  */
 -  if (profile_info->runs >= count)
 +  if (count <= 1)
  return false;
   return (count >= get_hot_bb_threshold ());
  }
And also this change.
 @@ -895,9 +907,19 @@ compute_value_histograms (histogram_values values, 
 unsigned cfg_checksum,
    hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters);
    for (j = 0; j < hist->n_counters; j++)
      if (aact_count)
 -	hist->hvalue.counters[j] = aact_count[j];
 -      else
 -	hist->hvalue.counters[j] = 0;
 +	hist->hvalue.counters[j] = aact_count[j];
 +      else
 +	hist->hvalue.counters[j] = 0;
 +
 +  if (hist->type == HIST_TYPE_TIME_PROFILE)
 +    {
 +      node = cgraph_get_node (hist->fun->decl);
 +
 +      node->tp_first_run = hist->hvalue.counters[0];
 +
 +      if (dump_file)
 +	fprintf (dump_file, "Read tp_first_run: %d\n",
 +		 node->tp_first_run);
 +    }
Probably add a comment explaining why you need to annotate the counter here.
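
One possible wording, as a sketch (my assumption about the intent: the
single time-profile counter describes the whole function rather than any
one statement, which is why it is copied onto the cgraph node):

  /* The time profile counter is not attached to an individual statement:
     it records the order in which this function was first executed, a
     property of the whole function, so annotate the cgraph node with it
     instead.  */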
 diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
 index b2c5411..0fe262c 100644
 --- a/gcc/tree-pretty-print.c
 +++ 

Re: [PATCH, MPX, 2/X] Pointers Checker [8/25] Languages support

2013-11-04 Thread Richard Biener
On Thu, Oct 31, 2013 at 10:11 AM, Ilya Enkovich <enkovich@gmail.com> wrote:
 Hi,

  This patch adds support for the Pointer Bounds Checker to the c-family and
  LTO front ends.  The main purpose of the front-end changes is to register
  all statically initialized objects for the checker.  We also need to
  register such objects created by the compiler.
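
The kind of statically initialized object that needs registering looks like
this (my illustration, not from the patch):

int buf[10];

/* A static initializer that stores a pointer: the checker has to emit a
   static bounds initializer so that loads of 'p' carry the bounds of
   'buf'.  */
int *p = buf;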

You define CHKP as supported in LTO.  That means it has to be supported
for all languages and thus you should drop the langhook.

Richard.

 Thanks,
 Ilya
 --

 gcc/

 2013-10-29  Ilya Enkovich  ilya.enkov...@intel.com

 * c/c-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New.
 * cp/cp-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New.
 * objc/objc-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New.
 * objcp/objcp-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New.
 * lto/lto-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New.
 * c/c-parser.c (c_parser_declaration_or_fndef): Register statically
 initialized decls in Pointer Bounds Checker.
 * cp/decl.c (cp_finish_decl): Likewise.
 * gimplify.c (gimplify_init_constructor): Likewise.


 diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c
 index 614c46d..a32bc6b 100644
 --- a/gcc/c/c-lang.c
 +++ b/gcc/c/c-lang.c
 @@ -43,6 +43,8 @@ enum c_language_kind c_language = clk_c;
  #define LANG_HOOKS_INIT c_objc_common_init
  #undef LANG_HOOKS_INIT_TS
  #define LANG_HOOKS_INIT_TS c_common_init_ts
 +#undef LANG_HOOKS_CHKP_SUPPORTED
 +#define LANG_HOOKS_CHKP_SUPPORTED true

  /* Each front end provides its own lang hook initializer.  */
  struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
 diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
 index 9ccae3b..65d83c8 100644
 --- a/gcc/c/c-parser.c
 +++ b/gcc/c/c-parser.c
 @@ -1682,6 +1682,12 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
 fndef_ok,
   maybe_warn_string_init (TREE_TYPE (d), init);
   finish_decl (d, init_loc, init.value,
init.original_type, asm_name);
 +
 + /* Register all decls with initializers in Pointer
 +Bounds Checker to generate required static bounds
 +initializers.  */
 + if (DECL_INITIAL (d) != error_mark_node)
 +   chkp_register_var_initializer (d);
 }
 }
   else
 diff --git a/gcc/cp/cp-lang.c b/gcc/cp/cp-lang.c
 index a7fa8e4..6d138bd 100644
 --- a/gcc/cp/cp-lang.c
 +++ b/gcc/cp/cp-lang.c
 @@ -81,6 +81,8 @@ static tree get_template_argument_pack_elems_folded 
 (const_tree);
  #define LANG_HOOKS_EH_PERSONALITY cp_eh_personality
  #undef LANG_HOOKS_EH_RUNTIME_TYPE
  #define LANG_HOOKS_EH_RUNTIME_TYPE build_eh_type_type
 +#undef LANG_HOOKS_CHKP_SUPPORTED
 +#define LANG_HOOKS_CHKP_SUPPORTED true

  /* Each front end provides its own lang hook initializer.  */
  struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
 diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
 index 1e92f2a..db40e75 100644
 --- a/gcc/cp/decl.c
 +++ b/gcc/cp/decl.c
 @@ -6379,6 +6379,12 @@ cp_finish_decl (tree decl, tree init, bool 
 init_const_expr_p,
  the class specifier.  */
   if (!DECL_EXTERNAL (decl))
 var_definition_p = true;
 +
+	  /* If var has an initializer then we need to register it in
+	     Pointer Bounds Checker to generate a static bounds initializer
+	     if required.  */
+	  if (DECL_INITIAL (decl) && DECL_INITIAL (decl) != error_mark_node)
+	    chkp_register_var_initializer (decl);
 }
/* If the variable has an array type, lay out the type, even if
  there is no initializer.  It is valid to index through the
 diff --git a/gcc/gimplify.c b/gcc/gimplify.c
 index 4f52c27..503450f 100644
 --- a/gcc/gimplify.c
 +++ b/gcc/gimplify.c
 @@ -4111,6 +4111,11 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
 *pre_p, gimple_seq *post_p,

 walk_tree (&ctor, force_labels_r, NULL, NULL);
 ctor = tree_output_constant_def (ctor);
 +
 +   /* We need to register created constant object to
 +  initialize bounds for pointers in it.  */
 +   chkp_register_var_initializer (ctor);
 +
 if (!useless_type_conversion_p (type, TREE_TYPE (ctor)))
   ctor = build1 (VIEW_CONVERT_EXPR, type, ctor);
 TREE_OPERAND (*expr_p, 1) = ctor;
 diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
 index 0fa0fc9..b6073d9 100644
 --- a/gcc/lto/lto-lang.c
 +++ b/gcc/lto/lto-lang.c
 @@ -1278,6 +1278,8 @@ static void lto_init_ts (void)

  #undef LANG_HOOKS_INIT_TS
  #define LANG_HOOKS_INIT_TS lto_init_ts
 +#undef LANG_HOOKS_CHKP_SUPPORTED
 +#define LANG_HOOKS_CHKP_SUPPORTED true

  struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;

 diff --git a/gcc/objc/objc-lang.c b/gcc/objc/objc-lang.c
 index bc0008b..5e7e43b 100644
 --- a/gcc/objc/objc-lang.c
 +++ b/gcc/objc/objc-lang.c
 @@ -49,6 +49,8 @@ enum c_language_kind 

Re: PR 58958: wrong aliasing info

2013-11-04 Thread Richard Biener
On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse marc.gli...@inria.fr wrote:
 Hello,

 the issue was described in the PR and the message linked from there.
 ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may
 detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size
 never learns of it and uses the offset+size as if they were meaningful.

Well...  it certainly learns of it, but it chooses to ignore...

   if (TREE_CODE (ptr) == ADDR_EXPR)
-    ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
-					 &ref->offset, &t1, &t2);
+    {
+      ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0),
+						   &t, 0, &safe);
+      ref->offset = BITS_PER_UNIT * t;
+    }

safe == (t1 != -1 && t1 == t2)

note that ao_ref_init_from_ptr_and_size gets the size fed in as argument
so I fail to see why it matters at all ...?  That is, if you feed in a wrong
size then it's the callers error.

Richard.

 Bootstrap+testsuite on x86_64-unknown-linux-gnu.

 2013-11-04  Marc Glisse  marc.gli...@inria.fr

 PR tree-optimization/58958
 gcc/
 * tree-dfa.h (get_addr_base_and_unit_offset_1): Add error reporting.
 * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Use
 get_addr_base_and_unit_offset_1 instead of get_ref_base_and_extent.

 gcc/testsuite/
 * gcc.dg/tree-ssa/pr58958.c: New file.

 --
 Marc Glisse
 Index: gcc/testsuite/gcc.dg/tree-ssa/pr58958.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/pr58958.c (revision 0)
 +++ gcc/testsuite/gcc.dg/tree-ssa/pr58958.c (working copy)
 @@ -0,0 +1,12 @@
 +/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+double a[10];
+int f(int n){
+  a[3]=9;
+  __builtin_memset(&a[n],3,sizeof(double));
+  return a[3]==9;
+}
+
+/* { dg-final { scan-tree-dump " == 9" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */

 Property changes on: gcc/testsuite/gcc.dg/tree-ssa/pr58958.c
 ___
 Added: svn:keywords
 ## -0,0 +1 ##
 +Author Date Id Revision URL
 \ No newline at end of property
 Added: svn:eol-style
 ## -0,0 +1 ##
 +native
 \ No newline at end of property
 Index: gcc/tree-dfa.h
 ===
 --- gcc/tree-dfa.h  (revision 204302)
 +++ gcc/tree-dfa.h  (working copy)
 @@ -30,65 +30,70 @@ extern tree ssa_default_def (struct func
  extern void set_ssa_default_def (struct function *, tree, tree);
  extern tree get_or_create_ssa_default_def (struct function *, tree);
  extern tree get_ref_base_and_extent (tree, HOST_WIDE_INT *,
  HOST_WIDE_INT *, HOST_WIDE_INT *);
  extern tree get_addr_base_and_unit_offset (tree, HOST_WIDE_INT *);
  extern bool stmt_references_abnormal_ssa_name (gimple);
  extern void dump_enumerated_decls (FILE *, int);

  /* Returns the base object and a constant BITS_PER_UNIT offset in *POFFSET
 that
 denotes the starting address of the memory access EXP.
 -   Returns NULL_TREE if the offset is not constant or any component
 -   is not BITS_PER_UNIT-aligned.
+   If the offset is not constant or any component is not
+   BITS_PER_UNIT-aligned, sets *SAFE to false, or returns NULL_TREE
+   if SAFE is NULL.
 VALUEIZE if non-NULL is used to valueize SSA names.  It should return
 its argument or a constant if the argument is known to be constant.  */
  /* ??? This is a static inline here to avoid the overhead of the indirect
 calls
 to VALUEIZE.  But is this overhead really that significant?  And should
 we
 perhaps just rely on WHOPR to specialize the function?  */

  static inline tree
  get_addr_base_and_unit_offset_1 (tree exp, HOST_WIDE_INT *poffset,
 -tree (*valueize) (tree))
 +tree (*valueize) (tree), bool *safe = NULL)
  {
HOST_WIDE_INT byte_offset = 0;
 +  bool issafe = true;

/* Compute cumulative byte-offset for nested component-refs and
 array-refs,
   and find the ultimate containing object.  */
while (1)
  {
switch (TREE_CODE (exp))
 {
 case BIT_FIELD_REF:
   {
 HOST_WIDE_INT this_off = TREE_INT_CST_LOW (TREE_OPERAND (exp,
 2));
 if (this_off % BITS_PER_UNIT)
 - return NULL_TREE;
 -   byte_offset += this_off / BITS_PER_UNIT;
 + issafe = false;
 +   else
 + byte_offset += this_off / BITS_PER_UNIT;
   }
   break;

 case COMPONENT_REF:
   {
 tree field = TREE_OPERAND (exp, 1);
 tree this_offset = component_ref_field_offset (exp);
 HOST_WIDE_INT hthis_offset;

 if (!this_offset
 || TREE_CODE (this_offset) != INTEGER_CST
 || 

Re: RFC: simd enabled functions (omp declare simd / elementals)

2013-11-04 Thread Jakub Jelinek
Hi!

On Mon, Nov 04, 2013 at 11:37:19AM +0100, Richard Biener wrote:
 Just a quick question, queueing the thread for later review (aww, queue
 back at 100 threads again - I can't get any work done anymore :().
 
 What does #pragma omp declare simd guarantee about memory
 side-effects and memory use in the function?  That is, unless the
 function can be safely annotated with the 'const' attribute the
 whole thing is useless for the vectorizer.

The main restriction is:

The execution of the function or subroutine cannot have any side effects that 
would
alter its execution for concurrent iterations of a SIMD chunk.

There are other restrictions, omp declare simd functions can't call
setjmp/longjmp or throw, etc.

The functions certainly can't, in the general case, be annotated with the
const attribute; they can read and write memory.  But the user is responsible
for making sure that the effects of running the function sequentially as
part of a non-vectorized loop are the same as running the simd clone of it
as part of a vectorized loop with the given simdlen vectorization factor.
So it is certainly meant to be used by the vectorizer; after all, that is
its sole purpose.
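
For illustration, a minimal function of the kind being discussed could look
like this (my example, not from the patch; the clauses are standard OpenMP
4.0 syntax):

  /* The writer promises that concurrent execution of a SIMD chunk's
     iterations has no interfering side effects, so the compiler may
     emit a clone taking vector arguments of the given simdlen.  */
  #pragma omp declare simd simdlen(4) notinbranch
  float saxpy (float a, float x, float y)
  {
    return a * x + y;   /* pure computation: trivially safe per lane */
  }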

Jakub


Re: Expect JUMP_TABLE_DATA in NEXT_INSN(label)

2013-11-04 Thread Richard Biener
On Sat, Nov 2, 2013 at 1:45 AM, Steven Bosscher stevenb@gmail.com wrote:
 Hello,

 As $SUBJECT. The assert for this has been in place for several months
 now without triggering any issues, so I'd like to make this next step
 before stage1 is over.

 OK for trunk?

Ok.  Can you verify this from some RTL insn verifier (if it isn't already).

Thanks,
Richard.

 Ciao!
 Steven


Re: PR 58958: wrong aliasing info

2013-11-04 Thread Marc Glisse

On Mon, 4 Nov 2013, Richard Biener wrote:


On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse marc.gli...@inria.fr wrote:

Hello,

the issue was described in the PR and the message linked from there.
ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may
detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size
never learns of it and uses the offset+size as if they were meaningful.


Well...  it certainly learns of it, but it chooses to ignore...

  if (TREE_CODE (ptr) == ADDR_EXPR)
-    ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
-					 &ref->offset, &t1, &t2);
+    {
+      ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0),
+						   &t, 0, &safe);
+      ref->offset = BITS_PER_UNIT * t;
+    }

safe == (t1 != -1 && t1 == t2)


I'll try that... (I need to think whether that's sufficient to be safe)


note that ao_ref_init_from_ptr_and_size gets the size fed in as argument
so I fail to see why it matters at all ...?  That is, if you feed in a wrong
size then it's the callers error.


The caller is feeding the right size. The issue is that 
get_ref_base_and_extent cannot determine the offset as a constant. 
Normally, get_ref_base_and_extent then gives you a safe combination of 
offset+maxsize to cover the whole decl. Here, we don't want to use the 
size determined by get_ref_base_and_extent, we know better, but that also 
means we have to handle the case where the offset couldn't be determined 
as a constant.
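
To make that concrete with the testcase from this thread: for &a[n] with 
non-constant n, get_ref_base_and_extent returns the whole decl as the safe 
extent (a sketch of the analysis; the comments are mine):

  double a[10];
  void g (int n)
  {
    /* base == a, offset == 0, maxsize == sizeof (a) in bits here;
       the real offset is n * sizeof (double), unknown at compile
       time, so offset+size must not be treated as an exact extent.  */
    __builtin_memset (&a[n], 3, sizeof (double));
  }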


--
Marc Glisse


Re: PATCH: middle-end/58981: movmem/setmem use mode wider than Pmode for size

2013-11-04 Thread Richard Sandiford
H.J. Lu hongjiu...@intel.com writes:
 emit_block_move_via_movmem and set_storage_via_setmem have

   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
mode = GET_MODE_WIDER_MODE (mode))
 {
   enum insn_code code = direct_optab_handler (movmem_optab, mode);

   if (code != CODE_FOR_nothing
   /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
  here because if SIZE is less than the mode mask, as it is
  returned by the macro, it will definitely be less than the
  actual mode mask.  */
	  && ((CONST_INT_P (size)
	       && ((unsigned HOST_WIDE_INT) INTVAL (size)
		   <= (GET_MODE_MASK (mode) >> 1)))
	      || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
 {

 Backend may assume the mode of size in movmem and setmem expanders is no
 wider than Pmode, since size is within the Pmode address space.  X86
 backend expand_set_or_movmem_prologue_epilogue_by_misaligned has

   rtx saveddest = *destptr;

   gcc_assert (desired_align <= size);
   /* Align destptr up, place it to new register.  */
   *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr,
   GEN_INT (prolog_size),
   NULL_RTX, 1, OPTAB_DIRECT);
   *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr,
   GEN_INT (-desired_align),
   *destptr, 1, OPTAB_DIRECT);
   /* See how many bytes we skipped.  */
   saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest,
*destptr,
saveddest, 1, OPTAB_DIRECT);
   /* Adjust srcptr and count.  */
   if (!issetmem)
 *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, 
 saveddest,
 *srcptr, 1, OPTAB_DIRECT);
   *count = expand_simple_binop (GET_MODE (*count), PLUS, *count,
 saveddest, *count, 1, OPTAB_DIRECT);

 saveddest is a negative number in Pmode and *count is in word_mode.  For
 x32, when Pmode is SImode and word_mode is DImode, saveddest + *count
 leads to overflow.  We could fix it by using mode of saveddest to compute
 saveddest + *count.  But it leads to extra conversions and other backends
 may run into the same problem.  A better fix is to limit mode of size in
 movmem and setmem expanders to Pmode.  It generates better and correct
 memcpy and memset for x32.

 There is also a typo in comments.  It should be BITS_PER_WORD, not
 BITS_PER_HOST_WIDE_INT. 

I don't think it's a typo.  It's explaining why we don't have to worry about:

GET_MODE_BITSIZE (mode) > BITS_PER_HOST_WIDE_INT

in the CONST_INT_P test (because in that case the GET_MODE_MASK macro
will be an all-1 HOST_WIDE_INT, even though that's narrower than the
real mask).

I don't think the current comment covers the BITS_PER_WORD test at all.
AIUI it's there because the pattern is defined as taking a length of
at most word_mode, so we should stop once we reach it.

FWIW, I agree Pmode makes more sense on face value.  But shouldn't
we replace the BITS_PER_WORD test instead of adding to it?  Having both
would only make a difference for Pmode > word_mode targets, which might
be able to handle full Pmode lengths.

Either way, the md.texi documentation should be updated too.

Thanks,
Richard
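
The overflow H.J. describes can be reproduced in plain C, where unsigned
conversions model the Pmode/word_mode mix on x32 (an illustrative sketch
only, not GCC code; the concrete numbers are made up):

  #include <stdint.h>
  #include <stdio.h>

  int main (void)
  {
    uint32_t dest = 0x1003;                /* destptr in Pmode (SImode)   */
    uint32_t aligned = (dest + 15) & -16u; /* 0x1010 after the AND        */
    uint32_t saveddest = dest - aligned;   /* -13, i.e. 0xfffffff3        */
    uint64_t count = 64;                   /* count in word_mode (DImode) */
    /* The addition zero-extends the SImode value into DImode, so we
       get 0x100000033 instead of the intended 64 - 13 == 0x33.  */
    count += saveddest;
    printf ("%#llx\n", (unsigned long long) count);
    return 0;
  }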



Re: PR 58958: wrong aliasing info

2013-11-04 Thread Marc Glisse

On Mon, 4 Nov 2013, Richard Biener wrote:


On Mon, Nov 4, 2013 at 11:55 AM, Richard Biener
richard.guent...@gmail.com wrote:

On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse marc.gli...@inria.fr wrote:

Hello,

the issue was described in the PR and the message linked from there.
ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may
detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size
never learns of it and uses the offset+size as if they were meaningful.


Well...  it certainly learns of it, but it chooses to ignore...

   if (TREE_CODE (ptr) == ADDR_EXPR)
-    ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
-					 &ref->offset, &t1, &t2);
+    {
+      ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0),
+						   &t, 0, &safe);
+      ref->offset = BITS_PER_UNIT * t;
+    }

safe == (t1 != -1 && t1 == t2)

note that ao_ref_init_from_ptr_and_size gets the size fed in as argument
so I fail to see why it matters at all ...?  That is, if you feed in a wrong
size then it's the callers error.


I think one issue is that the above code uses get_ref_base_and_extent
on an address.  It should have used get_addr_base_and_unit_offset.


Isn't that what my patch does? Except that get_addr_base_and_unit_offset 
often gives up and returns NULL_TREE, whereas I believe we still want a 
base even if we have trouble determining a constant offset, so I modified 
get_addr_base_and_unit_offset_1 a bit.


(not saying the result is pretty...)

--
Marc Glisse


Re: [wide-int] Do not treat rtxes as sign-extended

2013-11-04 Thread Richard Biener
On Sat, Nov 2, 2013 at 3:25 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 Kenneth Zadeck zad...@naturalbridge.com writes:
 On 11/02/2013 06:30 AM, Richard Sandiford wrote:
 Bah.  After all that effort, it turns out that -- by design --
 there is one special case where CONST_INTs are not sign-extended.
 Nonzero/true BImode integers are stored as STORE_FLAG_VALUE,
 which can be 1 rather than -1.  So (const_int 1) can be a valid
 BImode integer -- and consequently (const_int -1) can be wrong --
 even though BImode only has 1 bit.

 It might be nice to change that, but for wide-int I think we should
 just treat rtxes like trees for now.

 Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  It fixes some ICEs
 seen on bfin-elf.  OK to install?
 do we have to throw away the baby with the bath water here?  I guess
 what you are saying is that it is worse to have is_sign_extended be a
 variable that is almost always true than to be a hard false.

 Right.  is_sign_extended is only useful if it's a compile-time value.
 Making it a run-time value would negate the benefit.

 I think in practice STORE_FLAG_VALUE is a compile-time constant too,
 so we could set is_sign_extended to STORE_FLAG_VALUE == -1.  But AFAICT
 that would only help SPU and m68k.

 also we could preserve the test and make it not apply to bimode.

 You mean the one in the assert?  Yeah, OK.  How about this version?

 Richard


 Index: gcc/rtl.h
 ===
 --- gcc/rtl.h   2013-11-02 11:06:12.738517644 +
 +++ gcc/rtl.h   2013-11-02 14:22:05.636007860 +
 @@ -1408,7 +1408,9 @@ typedef std::pair rtx, enum machine_mod
{
  static const enum precision_type precision_type = VAR_PRECISION;
  static const bool host_dependent_precision = false;
 -static const bool is_sign_extended = true;
 +/* This ought to be true, except for the special case that BImode
 +   is canonicalized to STORE_FLAG_VALUE, which might be 1.  */
 +static const bool is_sign_extended = false;
  static unsigned int get_precision (const rtx_mode_t );
  static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int,
   const rtx_mode_t );
 @@ -1432,7 +1434,8 @@ wi::int_traits <rtx_mode_t>::decompose (
     case CONST_INT:
       if (precision < HOST_BITS_PER_WIDE_INT)
	 gcc_checking_assert (INTVAL (x.first)
-			      == sext_hwi (INTVAL (x.first), precision));
+			      == sext_hwi (INTVAL (x.first), precision)
+			      || (precision == 1 && INTVAL (x.first) == 1));

please add a comment here (and a check for BImode?).

Thanks,
Richard.

    return wi::storage_ref (&INTVAL (x.first), 1, precision);



[C++ PATCH] Fix PR58979

2013-11-04 Thread Marek Polacek
Even RO_ARROW_STAR can end up in invalid_indirection_error in invalid
code, so handle that instead of ICEing later on.

Regtested/bootstrapped on x86_64-linux, ok for trunk and 4.8?

2013-11-04  Marek Polacek  pola...@redhat.com

PR c++/58979
c-family/
* c-common.c (invalid_indirection_error): Handle RO_ARROW_STAR case.
testsuite/
* g++.dg/diagnostic/pr58979.C: New test.

--- gcc/c-family/c-common.c.mp2 2013-11-04 10:30:15.265951084 +0100
+++ gcc/c-family/c-common.c 2013-11-04 10:47:29.164783425 +0100
@@ -9930,6 +9930,7 @@ invalid_indirection_error (location_t lo
type);
   break;
 case RO_ARROW:
+case RO_ARROW_STAR:
   error_at (loc,
	     "invalid type argument of %<->%> (have %qT)",
type);
--- gcc/testsuite/g++.dg/diagnostic/pr58979.C.mp2   2013-11-04 
11:09:30.515553664 +0100
+++ gcc/testsuite/g++.dg/diagnostic/pr58979.C   2013-11-04 11:09:26.505538785 
+0100
@@ -0,0 +1,4 @@
+// PR c++/58979
+// { dg-do compile }
+
+int i = 0->*0; // { dg-error "invalid type argument of" }

Marek


Re: [wide-int] Simplify some force_fit_type calls

2013-11-04 Thread Richard Biener
On Sun, Nov 3, 2013 at 11:31 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 When going through force_fit_type calls to see whether they were
 extending correctly, I noticed some of the calls in VRP could be simplified.

 There's no change in behaviour, it's just shorter and more efficient.

 Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

 Thanks,
 Richard


 Index: gcc/tree-vrp.c
 ===
 --- gcc/tree-vrp.c  2013-11-03 10:04:56.004019113 +
 +++ gcc/tree-vrp.c  2013-11-03 10:25:45.398984735 +
 @@ -1617,16 +1617,8 @@ extract_range_from_assert (value_range_t
/* Make sure to not set TREE_OVERFLOW on the final type
  conversion.  We are willingly interpreting large positive
 unsigned values as negative signed values here.  */
 -  min = force_fit_type (TREE_TYPE (var),
 -   wide_int::from (min,
 -   TYPE_PRECISION (TREE_TYPE (var)),
 -   TYPE_SIGN (TREE_TYPE (min))),
 -   0, false);
 -  max = force_fit_type (TREE_TYPE (var),
 -   wide_int::from (max,
 -   TYPE_PRECISION (TREE_TYPE (var)),
 -   TYPE_SIGN (TREE_TYPE (max))),
 -   0, false);
 +  min = force_fit_type (TREE_TYPE (var), wi::to_widest (min), 0, false);
 +  max = force_fit_type (TREE_TYPE (var), wi::to_widest (max), 0, false);

/* We can transform a max, min range to an anti-range or
   vice-versa.  Use set_and_canonicalize_value_range which does
 @@ -3235,20 +3227,12 @@ extract_range_from_unary_expr_1 (value_r
   if (is_overflow_infinity (vr0.min))
 new_min = negative_overflow_infinity (outer_type);
   else
 -   new_min = force_fit_type (outer_type,
 - wide_int::from
 -   (vr0.min,
 -TYPE_PRECISION (outer_type),
 -TYPE_SIGN (TREE_TYPE (vr0.min))),
 +   new_min = force_fit_type (outer_type, wi::to_widest (vr0.min),
   0, false);
   if (is_overflow_infinity (vr0.max))
 new_max = positive_overflow_infinity (outer_type);
   else
 -   new_max = force_fit_type (outer_type,
 - wide_int::from
 -   (vr0.max,
 -TYPE_PRECISION (outer_type),
 -TYPE_SIGN (TREE_TYPE (vr0.max))),
 +   new_max = force_fit_type (outer_type, wi::to_widest (vr0.max),
   0, false);
   set_and_canonicalize_value_range (vr, vr0.type,
 new_min, new_max, NULL);


Re: PATCH: middle-end/58981: movmem/setmem use mode wider than Pmode for size

2013-11-04 Thread H.J. Lu
Y
On Mon, Nov 4, 2013 at 3:11 AM, Richard Sandiford
rsand...@linux.vnet.ibm.com wrote:
 H.J. Lu hongjiu...@intel.com writes:
 emit_block_move_via_movmem and set_storage_via_setmem have

   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
mode = GET_MODE_WIDER_MODE (mode))
 {
   enum insn_code code = direct_optab_handler (movmem_optab, mode);

   if (code != CODE_FOR_nothing
   /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
  here because if SIZE is less than the mode mask, as it is
  returned by the macro, it will definitely be less than the
  actual mode mask.  */
	  && ((CONST_INT_P (size)
	       && ((unsigned HOST_WIDE_INT) INTVAL (size)
		   <= (GET_MODE_MASK (mode) >> 1)))
	      || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
 {

  Backend may assume the mode of size in movmem and setmem expanders is no
  wider than Pmode, since size is within the Pmode address space.  X86
 backend expand_set_or_movmem_prologue_epilogue_by_misaligned has

   rtx saveddest = *destptr;

   gcc_assert (desired_align <= size);
   /* Align destptr up, place it to new register.  */
   *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr,
   GEN_INT (prolog_size),
   NULL_RTX, 1, OPTAB_DIRECT);
   *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr,
   GEN_INT (-desired_align),
   *destptr, 1, OPTAB_DIRECT);
   /* See how many bytes we skipped.  */
   saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest,
*destptr,
saveddest, 1, OPTAB_DIRECT);
   /* Adjust srcptr and count.  */
   if (!issetmem)
 *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, 
 saveddest,
 *srcptr, 1, OPTAB_DIRECT);
   *count = expand_simple_binop (GET_MODE (*count), PLUS, *count,
 saveddest, *count, 1, OPTAB_DIRECT);

 saveddest is a negative number in Pmode and *count is in word_mode.  For
 x32, when Pmode is SImode and word_mode is DImode, saveddest + *count
 leads to overflow.  We could fix it by using mode of saveddest to compute
 saveddest + *count.  But it leads to extra conversions and other backends
 may run into the same problem.  A better fix is to limit mode of size in
 movmem and setmem expanders to Pmode.  It generates better and correct
 memcpy and memset for x32.

 There is also a typo in comments.  It should be BITS_PER_WORD, not
 BITS_PER_HOST_WIDE_INT.

 I don't think it's a typo.  It's explaining why we don't have to worry about:

 GET_MODE_BITSIZE (mode) > BITS_PER_HOST_WIDE_INT

 in the CONST_INT_P test (because in that case the GET_MODE_MASK macro
 will be an all-1 HOST_WIDE_INT, even though that's narrower than the
 real mask).

Thanks for explanation.

 I don't think the current comment covers the BITS_PER_WORD test at all.
 AIUI it's there because the pattern is defined as taking a length of
 at most word_mode, so we should stop once we reach it.

I see.

 FWIW, I agree Pmode makes more sense on face value.  But shouldn't
 we replace the BITS_PER_WORD test instead of adding to it?  Having both
 would only make a difference for Pmode > word_mode targets, which might
 be able to handle full Pmode lengths.

Do we ever have a target where Pmode is wider than
word_mode?  If not, we can check Pmode instead.

 Either way, the md.texi documentation should be updated too.

 Thanks,
 Richard




-- 
H.J.


Re: [PATCH GCC]Compute, cache and use cost of auto-increment rtx patterns in IVOPT

2013-11-04 Thread Richard Biener
On Mon, Nov 4, 2013 at 4:31 AM, bin.cheng bin.ch...@arm.com wrote:
 Hi,

 IVOPT in GCC has a problem: it does not use the cost of auto-increment
 address expressions in its accounting, but falls back on the cost of the
 plain address expression, as if auto-increment addressing were unavailable.
 For example, on ARM target:
 1) the cost of [reg] (which is 6) is used for address expression [reg],
 #off;
 2) the cost of [reg+off] (which is 2) is used for address expression
 [reg, #off]!;

 This causes:
 1) cost of non-auto increment address expression is used for auto-increment
 address expression;
 2) different address costs are used for pre/post increment address
 expressions.
 This patch fixes the problem by computing, caching and using the cost of
 auto-increment address expressions.

 Bootstrap and test on x86/arm.  Is it OK?

But don't you need to adjust

static bool
determine_use_iv_cost_address (struct ivopts_data *data,
   struct iv_use *use, struct iv_cand *cand)
{
  bitmap depends_on;
  bool can_autoinc;
  int inv_expr_id = -1;
  comp_cost cost = get_computation_cost (data, use, cand, true, &depends_on,
					 &can_autoinc, &inv_expr_id);

  if (cand->ainc_use == use)
    {
      if (can_autoinc)
	cost.cost -= cand->cost_step;

this which seems to try to compensate for your issue?

Or maybe I don't understand.

CCing Bernd who implemented this IIRC.

Richard.

 2013-11-01  Bin Cheng  bin.ch...@arm.com

 * tree-ssa-loop-ivopts.c (enum ainc_type): New.
 (address_cost_data): New field.
 (get_address_cost): Compute auto-increment rtx cost in ainc_costs.
 Use ainc_costs for auto-increment rtx patterns.
 Cleanup TWS.


Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-04 Thread Dodji Seketeli
Jakub Jelinek ja...@redhat.com writes:

 I think even as a fallback is the patch too expensive.
 I'd say best would be to write some getline API compatible function
 and just use it, using fread on say fixed size buffer.

OK, thanks for the insight.  I have just used the getline_fallback
function you proposed, slightly amended to use the memory allocation
routines commonly used in gcc and renamed into get_line, with a
hopefully complete comment explaining where this function comes from
etc.

[...]

 A slight complication is what to do on mingw/cygwin and other DOS or
 Mac style line ending environments, no idea what fgets exactly does
 there.

Actually, I think that even fgets just deals with '\n'.  The reason,
from what I gathered, is that on Windows we fopen the input file in
text mode; and in that mode, \r\n is transformed into just \n.

Apparently OSX is compatible with '\n' too.  Someone correct me if I am
talking nonsense here.
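
A quick standard-C illustration of that text-mode behaviour (nothing
GCC-specific assumed here):

  #include <stdio.h>

  int main (void)
  {
    char buf[256];
    FILE *fp = fopen ("input.txt", "r");  /* text mode: \r\n reads as \n */
    if (fp != NULL && fgets (buf, sizeof buf, fp) != NULL)
      printf ("%s", buf);                 /* the line ends in '\n' only  */
    if (fp != NULL)
      fclose (fp);
    return 0;
  }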

So the patch below is what I am bootstrapping at the moment.

OK if it passes bootstrap on x86_64-unknown-linux-gnu against trunk?

 BTW, we probably want to do something with the speed of the caret
 diagnostics too, right now it opens the file again for each single line
 to be printed in caret diagnostics and reads all lines until the right one,
 so imagine how fast is printing of many warnings on almost adjacent lines
 near the end of many megabytes long file.
 Perhaps we could remember the last file we've opened for caret diagnostics,
 don't fclose the file right away but only if a new request is for a
 different file, perhaps keep some vector of line start offsets (say starting
 byte of every 100th line or similar) and also remember the last read line
 offset, so if a new request is for the same file, but higher line than last,
 we can just keep getlineing, and if it is smaller line than last, we look up
 the offset of the line / 100, fseek to it and just getline only modulo 100
 lines.  Maybe we should keep not just one, but 2 or 4 opened files as cache
 (again, with the starting line offsets vectors).

I like this idea.  I'll try and work on it.
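
Roughly, I imagine the cache looking like this (a sketch with names of my
own invention, not code from any patch):

  #include <stdio.h>
  #include <stddef.h>

  #define LINE_STRIDE 100

  /* One cached file: kept open, with the byte offset of every 100th
     line recorded, so an earlier line is reached by one fseek plus at
     most LINE_STRIDE - 1 line reads instead of a rescan from offset 0.  */
  struct line_cache
  {
    char *path;        /* file this cache describes                     */
    FILE *fp;          /* kept open across requests                     */
    long *anchors;     /* anchors[i] = offset of line i*LINE_STRIDE + 1 */
    size_t n_anchors;
    int cur_line;      /* line number the stream is positioned at      */
  };

  /* Position CACHE->fp at or shortly before LINE, seeking backwards
     via the anchor table when LINE precedes the current position.  */
  static void
  seek_to_line (struct line_cache *cache, int line)
  {
    if (line < cache->cur_line)
      {
        size_t i = (line - 1) / LINE_STRIDE;
        fseek (cache->fp, cache->anchors[i], SEEK_SET);
        cache->cur_line = (int) (i * LINE_STRIDE) + 1;
      }
    /* From here, at most LINE_STRIDE - 1 reads reach LINE.  */
  }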

And now the patch.

Cheers.

gcc/ChangeLog:

* input.h (location_get_source_line): Take an additional line_size
parameter by reference.
* input.c (get_line): New static function definition.
(read_line): Take an additional line_length output parameter to be
set to the size of the line.  Use the new get_line function to
compute the size of the line returned by fgets, rather than using
strlen.  Ensure that the buffer is initially zeroed; ensure that
when growing the buffer too.
(location_get_source_line): Take an additional output line_len
parameter.  Update the use of read_line to pass it the line_len
parameter.
* diagnostic.c (adjust_line): Take an additional input parameter
for the length of the line, rather than calculating it with
strlen.
(diagnostic_show_locus): Adjust the use of
location_get_source_line and adjust_line with respect to their new
signature.  While displaying a line now, do not stop at the first
null byte.  Rather, display the zero byte as a space and keep
going until we reach the size of the line.

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/warning-zero-in-literals-1.c: New test file.
---
 gcc/diagnostic.c   |  17 ++--
 gcc/input.c| 104 ++---
 gcc/input.h|   3 +-
 .../c-c++-common/cpp/warning-zero-in-literals-1.c  | Bin 0 -> 240 bytes
 4 files changed, 103 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 36094a1..0ca7081 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -259,12 +259,13 @@ diagnostic_build_prefix (diagnostic_context *context,
MAX_WIDTH by some margin, then adjust the start of the line such
that the COLUMN is smaller than MAX_WIDTH minus the margin.  The
margin is either 10 characters or the difference between the column
-   and the length of the line, whatever is smaller.  */
+   and the length of the line, whatever is smaller.  The length of
+   LINE is given by LINE_WIDTH.  */
 static const char *
-adjust_line (const char *line, int max_width, int *column_p)
+adjust_line (const char *line, int line_width,
+int max_width, int *column_p)
 {
   int right_margin = 10;
-  int line_width = strlen (line);
   int column = *column_p;
 
   right_margin = MIN (line_width - column, right_margin);
@@ -284,6 +285,7 @@ diagnostic_show_locus (diagnostic_context * context,
   const diagnostic_info *diagnostic)
 {
   const char *line;
+  int line_width;
   char *buffer;
   expanded_location s;
   int max_width;
@@ -297,22 +299,25 @@ 

Re: Patch RFA: With -fnon-call-exceptions sync builtins may throw

2013-11-04 Thread Richard Biener
On Mon, Nov 4, 2013 at 7:01 AM, Ian Lance Taylor i...@google.com wrote:
 The middle-end currently marks all the sync builtins as nothrow.  That
 is incorrect for most of them when using -fnon-call-exceptions.  The
 -fnon-call-exceptions option means that instructions that trap may throw
 exceptions.  Most of the sync builtins access memory via pointers passed
 by user code.  Therefore, they may trap.  (I noticed this because Go
 uses -fnon-call-exceptions.)

 This patch fixes the problem by introducing a new internal function
 attribute nothrow call.  The new attribute is exactly like nothrow,
 except that it is ignored when using -fnon-call-exceptions.

 It is an internal attribute because there is no reason for a user to use
 this.  The problem only arises when the middle-end sees a function call
 that the backend turns into an instruction sequence that directly
 references memory.  That can only happen for functions that are built in
 to the compiler.

 This requires changes to the C family, LTO, and Ada frontends.  I think
 I'm handling the option correctly for LTO, but it would be good if
 somebody could check the logic for me.

 Bootstrapped and ran tests on x86_64-unknown-linux-gnu.  OK for
 mainline?

Hmm.  I think you can handle this in a simpler way by just
doing sth similar to

#define ATTR_MATHFN_FPROUNDING_ERRNO (flag_errno_math ? \
ATTR_NOTHROW_LEAF_LIST : ATTR_MATHFN_FPROUNDING)

that is, conditionally drop the _NOTHROW part.  Btw, I also think
that if it can throw then it isn't LEAF (it may return exceptionally
into another function in the unit).
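
Concretely, mimicking that pattern for the sync builtins might look like
this (the macro name ATTR_NOTHROW_CALL_LEAF_LIST is my invention;
flag_non_call_exceptions, ATTR_NULL and ATTR_NOTHROW_LEAF_LIST already
exist):

  /* Drop both "nothrow" and "leaf" when -fnon-call-exceptions is in
     effect, since the builtin may then trap and throw.  */
  #define ATTR_NOTHROW_CALL_LEAF_LIST (flag_non_call_exceptions \
				       ? ATTR_NULL : ATTR_NOTHROW_LEAF_LIST)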

Also this whole conditional-on-a-flag thing doesn't work well with
using -fnon-call-exceptions in the optimize attribute or with
different -fnon-call-exceptions settings in different TUs we LTO
together.  Because we only have a single builtin decl (so for
proper operation you'd have to clone builtin decls based on
the matrix of flags that generate different attributes and use the
correct one depending on context).

That said, I'd prefer a simpler approach for now, mimicing flag_errno_math
handling.

Your lto-wrapper parts are clearly wrong (in a conservative way
though).  They follow -fexceptions so the lto-wrapper.c and
lto-opts.c parts are ok.

Fixing the LTO pieces would require to really stream most
builtins instead of generating them in the LTO frontend and streaming
them via DECL_FUNCTION_CODE.  tree merging should get
rid of the duplicates.  I suppose I should try that again ;)

Thanks,
Richard.

 Ian


 gcc/ChangeLog:

 2013-11-03  Ian Lance Taylor  i...@google.com

 * builtin-attrs.def (ATTR_NOTHROWCALL): Define.
 (ATTR_NOTHROWCALL_LIST, ATTR_NOTHROWCALL_LEAF_LIST): Define.
 * sync-builtins.def: Use ATTR_NOTHROWCALL_LEAF_LIST for all sync
 builtins that take pointers.
 * lto-opts.c (lto_write_options): Write -fnon-call-exceptions
 if set.
 * lto-wrapper.c (merge_and_complain): Collect
 OPT_fnon_call_exceptions.
 (run_gcc): Pass -fnon-call-exceptions.

 gcc/c-family/ChangeLog:

 2013-11-03  Ian Lance Taylor  i...@google.com

 * c-common.c (c_common_attribute_table): Add nothrow call.
 (handle_nothrowcall_attribute): New static function.

 gcc/lto/ChangeLog:

 2013-11-03  Ian Lance Taylor  i...@google.com

 * lto-lang.c (lto_attribute_table): Add nothrow call
 (handle_nothrowcall_attribute): New static function.

 gcc/ada/ChangeLog:

 2013-11-03  Ian Lance Taylor  i...@google.com

 * gcc-interface/utils.c (gnat_internal_attribute_table): Add
 nothrow call.
 (handle_nothrowcall_attribute): New static function.

 gcc/testsuite/ChangeLog:

 2013-11-03  Ian Lance Taylor  i...@google.com

 * g++.dg/ext/sync-4.C: New test.




Re: PR 58958: wrong aliasing info

2013-11-04 Thread Richard Biener
On Mon, Nov 4, 2013 at 12:18 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Mon, 4 Nov 2013, Richard Biener wrote:

 On Mon, Nov 4, 2013 at 11:55 AM, Richard Biener
 richard.guent...@gmail.com wrote:

 On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse marc.gli...@inria.fr
 wrote:

 Hello,

 the issue was described in the PR and the message linked from there.
 ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may
 detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size
 never learns of it and uses the offset+size as if they were meaningful.


 Well...  it certainly learns of it, but it chooses to ignore...

    if (TREE_CODE (ptr) == ADDR_EXPR)
 -    ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
 -					 &ref->offset, &t1, &t2);
 +    {
 +      ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0),
 +						   &t, 0, &safe);
 +      ref->offset = BITS_PER_UNIT * t;
 +    }

 safe == (t1 != -1 && t1 == t2)

 note that ao_ref_init_from_ptr_and_size gets the size fed in as argument
 so I fail to see why it matters at all ...?  That is, if you feed in a
 wrong
 size then it's the callers error.


 I think one issue is that the above code uses get_ref_base_and_extent
 on an address.  It should have used get_addr_base_and_unit_offset.


 Isn't that what my patch does? Except that get_addr_base_and_unit_offset
 often gives up and returns NULL_TREE, whereas I believe we still want a base
 even if we have trouble determining a constant offset, so I modified
 get_addr_base_and_unit_offset_1 a bit.

 (not saying the result is pretty...)

I don't think we want get_addr_base_and_unit_offset to return non-NULL
for a non-constant offset.  But yes, inside your patch is the correct fix,
so if you drop changing get_addr_base_and_unit_offset ...

Richard.

 --
 Marc Glisse


RE: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-04 Thread Bernd Edlinger
Hi,


I see another read_line at gcov.c, which seems to be a copy.

Maybe this should be changed too?

What do you think?

Bernd.

On Mon, 4 Nov 2013 12:46:10, Dodji Seketeli wrote:

 Jakub Jelinek ja...@redhat.com writes:

 I think even as a fallback is the patch too expensive.
 I'd say best would be to write some getline API compatible function
 and just use it, using fread on say fixed size buffer.

 OK, thanks for the insight. I have just used the getline_fallback
 function you proposed, slightly amended to use the memory allocation
 routines commonly used in gcc and renamed into get_line, with a
 hopefully complete comment explaining where this function comes from
 etc.

 [...]

 A slight complication is what to do on mingw/cygwin and other DOS or
 Mac style line ending environments, no idea what fgets exactly does
 there.

 Actually, I think that even fgets just deals with '\n'.  The reason,
 from what I gathered, is that on Windows we fopen the input file in
 text mode; and in that mode, \r\n is transformed into just \n.

 Apparently OSX is compatible with '\n' too.  Someone correct me if I am
 talking nonsense here.

 So the patch below is what I am bootstrapping at the moment.

 OK if it passes bootstrap on x86_64-unknown-linux-gnu against trunk?

 BTW, we probably want to do something with the speed of the caret
 diagnostics too, right now it opens the file again for each single line
 to be printed in caret diagnostics and reads all lines until the right one,
 so imagine how fast is printing of many warnings on almost adjacent lines
 near the end of many megabytes long file.
 Perhaps we could remember the last file we've opened for caret diagnostics,
 don't fclose the file right away but only if a new request is for a
 different file, perhaps keep some vector of line start offsets (say starting
 byte of every 100th line or similar) and also remember the last read line
 offset, so if a new request is for the same file, but higher line than last,
 we can just keep getlineing, and if it is smaller line than last, we look up
 the offset of the line / 100, fseek to it and just getline only modulo 100
 lines. Maybe we should keep not just one, but 2 or 4 opened files as cache
 (again, with the starting line offsets vectors).

 I like this idea. I'll try and work on it.

 And now the patch.

 Cheers.

 gcc/ChangeLog:

 * input.h (location_get_source_line): Take an additional line_size
 parameter by reference.
 * input.c (get_line): New static function definition.
 (read_line): Take an additional line_length output parameter to be
 set to the size of the line. Use the new get_line function to
 compute the size of the line returned by fgets, rather than using
 strlen. Ensure that the buffer is initially zeroed; ensure that
 when growing the buffer too.
 (location_get_source_line): Take an additional output line_len
 parameter. Update the use of read_line to pass it the line_len
 parameter.
 * diagnostic.c (adjust_line): Take an additional input parameter
 for the length of the line, rather than calculating it with
 strlen.
 (diagnostic_show_locus): Adjust the use of
 location_get_source_line and adjust_line with respect to their new
 signature. While displaying a line now, do not stop at the first
 null byte. Rather, display the zero byte as a space and keep
 going until we reach the size of the line.

 gcc/testsuite/ChangeLog:

 * c-c++-common/cpp/warning-zero-in-literals-1.c: New test file.
 ---
 gcc/diagnostic.c | 17 ++--
 gcc/input.c | 104 ++---
 gcc/input.h | 3 +-
 .../c-c++-common/cpp/warning-zero-in-literals-1.c | Bin 0 -> 240 bytes
 4 files changed, 103 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c

 diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
 index 36094a1..0ca7081 100644
 --- a/gcc/diagnostic.c
 +++ b/gcc/diagnostic.c
 @@ -259,12 +259,13 @@ diagnostic_build_prefix (diagnostic_context *context,
 MAX_WIDTH by some margin, then adjust the start of the line such
 that the COLUMN is smaller than MAX_WIDTH minus the margin. The
 margin is either 10 characters or the difference between the column
 - and the length of the line, whatever is smaller. */
 + and the length of the line, whatever is smaller. The length of
 + LINE is given by LINE_WIDTH. */
 static const char *
 -adjust_line (const char *line, int max_width, int *column_p)
 +adjust_line (const char *line, int line_width,
 + int max_width, int *column_p)
 {
 int right_margin = 10;
 - int line_width = strlen (line);
 int column = *column_p;

 right_margin = MIN (line_width - column, right_margin);
 @@ -284,6 +285,7 @@ diagnostic_show_locus (diagnostic_context * context,
 const diagnostic_info *diagnostic)
 {
 const char *line;
 + int line_width;
 char *buffer;
 expanded_location s;
 int max_width;
 @@ -297,22 +299,25 @@ diagnostic_show_locus (diagnostic_context * context,

 context->last_location = diagnostic->location;
 s = 

Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-04 Thread Jakub Jelinek
On Mon, Nov 04, 2013 at 12:59:49PM +0100, Bernd Edlinger wrote:
 I see another read_line at gcov.c, which seems to be a copy.

Copy of what?  gcov.c's read_line can hardly be allowed to fail because of
running out of memory, unlike this one for caret diagnostics.
Though, surely, this one could be somewhat adjusted so that it really
doesn't use a temporary buffer but reads directly into the initially
malloced, then realloced, buffer.  But, if we want it to eventually switch
to caching the caret diagnostics, it won't be possible/desirable anymore.

Jakub


RE: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-04 Thread Bernd Edlinger

 On Mon, Nov 04, 2013 at 12:59:49PM +0100, Bernd Edlinger wrote:
 I see another read_line at gcov.c, which seems to be a copy.

 Copy of what?  gcov.c's read_line can hardly be allowed to fail because of
 running out of memory, unlike this one for caret diagnostics.
 Though, surely, this one could be somewhat adjusted so that it really
 doesn't use a temporary buffer but reads directly into the initially
 malloced, then realloced, buffer. But, if we want it to eventually switch
 to caching the caret diagnostics, it won't be possible/desirable anymore.

 Jakub

gcov.c and input.c currently both have a static function read_line;
they are currently 100% in sync.  Both _can_ fail if the file gets
deleted or modified while the function executes.

If gcov.c crashes in that event, I'd call it a bug.

Bernd.

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-11-04 Thread Richard Biener
On Wed, Oct 30, 2013 at 5:20 PM, Jeff Law l...@redhat.com wrote:
 On 10/30/13 04:34, Ilya Enkovich wrote:

 On 30 Oct 10:26, Richard Biener wrote:


 Ick - you enlarge all return statements?  But you don't set the
 actual value? So why allocate it with 2 ops in the first place??


 When a return does not return bounds, it has an operand with zero value,
 similar to the case when it does not return a value.  What is the
 difference then?

 In general, when someone proposes a change in the size of tree, rtl or
 gimple nodes, it's a yellow flag that something may need further
 investigation.

 In this specific instance, I could trivially predict how that additional
 field would be used and a GIMPLE_RETURN isn't terribly important from a size
 standpoint, so I didn't call it out.

Btw, both for a regular return with no value and for this case, only the
required operands could be allocated.  There may be legacy issues with
the regular return value, but for new uses always allocating the
operand seems wrong (even if cheap in practice).

Richard.

 Returns instrumentation. We add new operand to return statement to
 hold returned bounds and instrumentation pass is responsible to fill
 this operand with correct bounds

 Exactly what I expected.



 Unfortunately patch has been already installed.  Should we uninstall
 it?  If not, then here is patch for documentation.

 I think we're OK for now.  If Richi wants it out, he'll say so explicitly.




 Thanks, Ilya --

 gcc/

 2013-10-30  Ilya Enkovich  ilya.enkov...@intel.com

 * doc/gimple.texi (gimple_call_num_nobnd_args): New.
 (gimple_call_nobnd_arg): New. (gimple_return_retbnd): New.
 (gimple_return_set_retbnd): New. (gimple_call_get_nobnd_arg_index):
 New.

 Can you also fixup the GIMPLE_RETURN documentation in gimple.texi.  It needs
 a minor update after these changes.

 jeff



Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-11-04 Thread Richard Biener
On Wed, Oct 30, 2013 at 6:40 PM, Jeff Law l...@redhat.com wrote:
 On 10/30/13 04:48, Richard Biener wrote:

 foo (int * p, unsigned int size)
 {
   <unnamed type> __bound_tmp.0;
long unsigned int D.2239;
long unsigned int _2;
sizetype _6;
int * _7;

   <bb 3>:
__bound_tmp.0_4 = __builtin_ia32_arg_bnd (p_3(D));

   <bb 2>:
_2 = (long unsigned int) size_1(D);
__builtin_ia32_bndcl (__bound_tmp.0_4, p_3(D));
_6 = _2 + 18446744073709551615;
_7 = p_3(D) + _6;
__builtin_ia32_bndcu (__bound_tmp.0_4, _7);
access_and_store (p_3(D), __bound_tmp.0_4, size_1(D));

 so it seems there is now a mismatch between DECL_ARGUMENTS
 and the GIMPLE call stmt arguments.  How (if) did you amend
 the GIMPLE stmt verifier for this?

 Effectively the bounds are passed on the side.

Well, not exactly.  I can see __bound_tmp.0_4 passed to access_and_store.

Are __builtin_ia32_bndcu and access_and_store supposed to be
called directly after each other?  What if for example profile instrumentation
inserts a call inbetween them?

 How does regular code deal with this which may expect matching
 to DECL_ARGUMENTS?  In fact interleaving the additional
 arguments sounds very error-prone for existing code - I'd have
 appended all bound args at the end.  Also you unconditionally
 claim all pointer arguments have a bound - that looks like bad
 design as well.  Why didn't you add a flag to the relevant
 PARM_DECL (and then, what do you do for indirect calls?).

 You can't actually interleave them -- that results in MPX and normal code
 not being able to interact.   Passing the bound at the end doesn't really
 work either -- varargs and the desire to pass some of the bounds around in
 bound registers.

I'm only looking at how bound arguments are passed at the GIMPLE
level - which I think is arbitrary given they are passed on-the-side
during code-generation.  So I'm looking for a more sensible option
for the GIMPLE representation which looks somewhat fragile to me
(it looks the argument is only there to have a data dependence
between the bndcu intrinsic and the actual call?)


 /* Return the number of arguments used by call statement GS
 ignoring bound ones.  */

 static inline unsigned
 gimple_call_num_nobnd_args (const_gimple gs)
 {
unsigned num_args = gimple_call_num_args (gs);
unsigned res = num_args;
   for (unsigned n = 0; n < num_args; n++)
  if (POINTER_BOUNDS_P (gimple_call_arg (gs, n)))
res--;
return res;
 }

 the choice means that gimple_call_num_nobnd_args is not O(1).

 Yes, but I don't see that's terribly problematical.

Well, consider compile.exp limits-fnargs.c written with int * parameters
and bound support ;)  [there is a limit on the number of bound registers
available, but the GIMPLE parts doesn't put any limit on instrumented
args?]


 /* Return INDEX's call argument ignoring bound ones.  */
 static inline tree
 gimple_call_nobnd_arg (const_gimple gs, unsigned index)
 {
/* No bound args may exist if pointers checker is off.  */
if (!flag_check_pointer_bounds)
  return gimple_call_arg (gs, index);
return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs,
 index));
 }

 GIMPLE layout depending on flag_check_pointer_bounds sounds
 like a recipe for disaster if you consider TUs compiled with and
 TUs compiled without and LTO.  Or if you consider using
 optimized attribute with that flag.

 Sorry, I don't follow.  Can you elaborate please.

Compile TU1 with -fno-chk, TU2 with -fchk, LTO both object files
which gives you - well, either -fno-chk or -fchk.  Then you read in
the GIMPLE and when calling gimple_call_nobnd_arg you get
a mismatch between where you generated the gimple and where
you use the gimple.

A flag on whether a call is instrumented is better (consider inlining
from TU1 into TU2 where a flag on struct function for example wouldn't
be enough either).
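
Such a per-call flag could be as small as one subcode bit, e.g. (a
hypothetical sketch; neither GF_CALL_WITH_BOUNDS nor this accessor exists
at this point):

  /* True iff call statement GS was built with bound arguments.  */
  static inline bool
  gimple_call_with_bounds_p (const_gimple gs)
  {
    GIMPLE_CHECK (gs, GIMPLE_CALL);
    return (gs->gsbase.subcode & GF_CALL_WITH_BOUNDS) != 0;
  }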

Richard.

 I hope the reviewers that approved the patch will work with you to
 address the above issues.  I can't be everywhere.

 Obviously I will.

 jeff



Re: PATCH to use -Wno-format during stage1

2013-11-04 Thread Richard Biener
On Wed, Oct 30, 2013 at 8:55 PM, Ian Lance Taylor i...@google.com wrote:
 On Wed, Oct 30, 2013 at 8:47 AM, Jason Merrill ja...@redhat.com wrote:
 I find -Wformat warnings about unknown % codes when building with an older
 GCC to be mildly annoying noise; this patch avoids them by passing
 -Wno-format during stage 1.

 Tested x86_64-pc-linux-gnu.  Is this OK for trunk?  Do you have another
 theory of how this should work?

 Thank you!

Please make sure the warning is preserved when not bootstrapping,
it's useful to have them in your dev-tree and not notice errors only
during -Werror bootstrap ...

Richard.

 Ian


Re: [PATCH][RFA] Improvements to infer_nonnull_range

2013-11-04 Thread Richard Biener
On Thu, Oct 31, 2013 at 6:41 AM, Jeff Law l...@redhat.com wrote:
 On 10/28/13 23:02, Jeff Law wrote:


 Based on a suggestion from Marc, I want to use infer_nonnull_range in
 the erroneous path isolation optimization.

 In the process of doing that, I found  a few deficiencies in
 infer_nonnull_range that need to be addressed.


 First, infer_nonnull_range_p doesn't infer ranges from a GIMPLE_RETURN
 if the current function is marked as returns_nonnull.  That's fixed in
 the relatively obvious way.

 Second, infer_nonnull_range_p, when presented with an arglist where the
 non-null attribute applies to all pointer arguments, it won't bother to
 determine if OP is actually one of the arguments :(  It just assumes
that OP is in the argument list.  Oops.

 Third, I want to be able to call infer_nonnull_range with OP being 0B.
 That lets me use infer_nonnull_range to look for explicit null pointer
 dereferences, explicit uses of null in a return statement in functions
 that can't return non-null and explicit uses of null arguments when
 those arguments can't be null.   Sadly, to make that work we need to use
 operand_equal_p rather than simple pointer comparisons to see if OP
 shows up in STMT.

 Finally, for detecting explicit null pointers, infer_nonnull_range calls
 count_ptr_derefs.  count_ptr_derefs counts things in two ways.  One with
 a FOR_EACH_SSA_TREE_OPERAND, then again with simple walks of the tree
 structures.   Not surprisingly if we're looking for an explicit 0B, then
 the loop over the operands finds nothing, but walking the tree
 structures does. And the checking assert triggers.  This change removes
 the assert and instead sets *num_uses_p to a sane value.

 I don't have testcases for this stuff that are independent of the
 erroneous path isolation optimization.  However, each is triggered by
 tests I'll include in the the erroneous path isolation patch.

 Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  OK for
 the trunk?

 I'm withdrawing this patch.  While everything works exactly how I wanted, I
 think it's best to rework things a bit and totally eliminate
 count_ptr_derefs entirely.

Yes!  And count_uses_and_derefs, too.  See walk_stmt_load_store_addr_ops.

Richard.


 That change makes it significantly easier to move infer_nonnull_range out of
 tree-vrp.c and into gimple.c (or wherever Andrew wants it) without a
 significant modularity break.

 Jeff



Re: [PATCH GCC]Compute, cache and use cost of auto-increment rtx patterns in IVOPT

2013-11-04 Thread Bin.Cheng
On Mon, Nov 4, 2013 at 7:38 PM, Richard Biener
richard.guent...@gmail.com wrote:
 On Mon, Nov 4, 2013 at 4:31 AM, bin.cheng bin.ch...@arm.com wrote:
 Hi,

 IVOPT in GCC has a problem: it does not use the cost of auto-increment
 address expressions in its accounting, but falls back on the cost of the
 plain address expression, as if auto-increment addressing were unavailable.
 For example, on ARM target:
 1) the cost of [reg] (which is 6) is used for address expression [reg],
 #off;
 2) the cost of [reg+off] (which is 2) is used for address expression
 [reg, #off]!;

 This causes:
 1) cost of non-auto increment address expression is used for auto-increment
 address expression;
 2) different address costs are used for pre/post increment address
 expressions.
 This patch fixes the problem by computing, caching and using the cost of
 auto-increment address expressions.

 Bootstrap and test on x86/arm.  Is it OK?

 But don't you need to adjust

 static bool
 determine_use_iv_cost_address (struct ivopts_data *data,
struct iv_use *use, struct iv_cand *cand)
 {
   bitmap depends_on;
   bool can_autoinc;
   int inv_expr_id = -1;
   comp_cost cost = get_computation_cost (data, use, cand, true, &depends_on,
					  &can_autoinc, &inv_expr_id);

   if (cand->ainc_use == use)
     {
       if (can_autoinc)
	 cost.cost -= cand->cost_step;

 this which seems to try to compensate for your issue?
That's where the problem gets complicated, depending on how the backend
defines address costs.  For back ends that define costs approximately
according to the true cost of the addressing mode, the address cost of
the auto-increment addressing mode doesn't take the saved stepping
instruction into consideration, so this code is necessary.
Moreover, according to the GCC internals documentation of
TARGET_ADDRESS_COST, RISC machines may define different address costs
for addressing modes which actually execute identically at the
micro-architecture level (like ARM for now).  The problems are:
1) Though address costs are defined in this discriminated way, it's
unlikely that the saved stepping instruction is considered either.
The address cost of an auto-increment address expression shouldn't go
that far.
2) We think the discriminated address cost model was established
before the gimple passes and is outdated.  The true micro-architecture
address cost (or cost normalized with COSTS_N_INSNS) should be used in
GCC nowadays.  The rtl passes like fwprop_addr which use address cost
as heuristic information should be refined... but that's totally
another problem (I am investigating it).

So overall, I think the stepping cost should be subtracted here.


 Or maybe I don't understand.

 CCing Bernd who implemented this IIRC.
Any suggestions appreciated.

Thanks.
bin

-- 
Best Regards.


Re: PATCH: middle-end/58981: movmem/setmem use mode wider than Pmode for size

2013-11-04 Thread H.J. Lu
On Mon, Nov 4, 2013 at 3:34 AM, H.J. Lu hjl.to...@gmail.com wrote:
 Y
 On Mon, Nov 4, 2013 at 3:11 AM, Richard Sandiford
 rsand...@linux.vnet.ibm.com wrote:
 H.J. Lu hongjiu...@intel.com writes:
 emit_block_move_via_movmem and set_storage_via_setmem have

   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
mode = GET_MODE_WIDER_MODE (mode))
 {
   enum insn_code code = direct_optab_handler (movmem_optab, mode);

   if (code != CODE_FOR_nothing
   /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
  here because if SIZE is less than the mode mask, as it is
  returned by the macro, it will definitely be less than the
  actual mode mask.  */
 && ((CONST_INT_P (size)
      && ((unsigned HOST_WIDE_INT) INTVAL (size)
          <= (GET_MODE_MASK (mode) >> 1)))
     || GET_MODE_BITSIZE (mode) >= BITS_PER_WORD))
 {

 Backends may assume the mode of size in movmem and setmem expanders is no
 wider than Pmode since size is within the Pmode address space.  The x86
 backend's expand_set_or_movmem_prologue_epilogue_by_misaligned has

   rtx saveddest = *destptr;

   gcc_assert (desired_align <= size);
   /* Align destptr up, place it to new register.  */
   *destptr = expand_simple_binop (GET_MODE (*destptr), PLUS, *destptr,
   GEN_INT (prolog_size),
   NULL_RTX, 1, OPTAB_DIRECT);
   *destptr = expand_simple_binop (GET_MODE (*destptr), AND, *destptr,
   GEN_INT (-desired_align),
   *destptr, 1, OPTAB_DIRECT);
   /* See how many bytes we skipped.  */
   saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, 
 saveddest,
*destptr,
saveddest, 1, OPTAB_DIRECT);
   /* Adjust srcptr and count.  */
   if (!issetmem)
 *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, 
 saveddest,
 *srcptr, 1, OPTAB_DIRECT);
   *count = expand_simple_binop (GET_MODE (*count), PLUS, *count,
 saveddest, *count, 1, OPTAB_DIRECT);

 saveddest is a negative number in Pmode and *count is in word_mode.  For
 x32, where Pmode is SImode and word_mode is DImode, saveddest + *count
 leads to overflow.  We could fix it by using the mode of saveddest to compute
 saveddest + *count.  But that leads to extra conversions, and other backends
 may run into the same problem.  A better fix is to limit the mode of size in
 movmem and setmem expanders to Pmode.  It generates better and correct
 memcpy and memset for x32.
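
 A minimal sketch of the mismatch (illustrative values only, not taken
 from the testcase):

 /* Illustrative only: saveddest is a Pmode (SImode) value holding a
    small negative number, while *count is word_mode (DImode).  The
    SImode value is zero-extended for the DImode addition, so -8
    becomes 0xfffffff8 and the sum escapes the 32-bit address space
    instead of shrinking the count.  */
 unsigned int saveddest = -8;                 /* Pmode value.  */
 unsigned long long count = 100;              /* word_mode value.  */
 unsigned long long sum = count + saveddest;  /* 0x10000005c, not 92.  */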

 There is also a typo in comments.  It should be BITS_PER_WORD, not
 BITS_PER_HOST_WIDE_INT.

 I don't think it's a typo.  It's explaining why we don't have to worry about:

 GET_MODE_BITSIZE (mode) > BITS_PER_HOST_WIDE_INT

 in the CONST_INT_P test (because in that case the GET_MODE_MASK macro
 will be an all-1 HOST_WIDE_INT, even though that's narrower than the
 real mask).

 Thanks for explanation.

 I don't think the current comment covers the BITS_PER_WORD test at all.
 AIUI it's there because the pattern is defined as taking a length of
 at most word_mode, so we should stop once we reach it.

 I see.

 FWIW, I agree Pmode makes more sense on face value.  But shouldn't
 we replace the BITS_PER_WORD test instead of adding to it?  Having both
 would only make a difference for Pmode > word_mode targets, which might
 be able to handle full Pmode lengths.

 Do we ever have a target where Pmode is wider than
 word_mode?  If not, we can check Pmode instead.

 Either way, the md.texi documentation should be updated too.


This is the updated patch with md.texi change.   The testcase is the same.
Tested on x32.  OK to install?

Thanks.

-- 
H.J.
---
gcc/

2013-11-04  H.J. Lu  hongjiu...@intel.com

PR middle-end/58981
* doc/md.texi (@code{movmem@var{m}}): Specify Pmode as mode of
pattern, instead of word_mode.

* expr.c (emit_block_move_via_movmem): Don't use mode wider than
Pmode for size.
(set_storage_via_setmem): Likewise.

gcc/testsuite/

2013-11-04  H.J. Lu  hongjiu...@intel.com

PR middle-end/58981
* gcc.dg/pr58981.c: New test.

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index ac10a0a..1e22b88 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5291,12 +5291,13 @@ are the first two operands, and both are
@code{mem:BLK}s with an
 address in mode @code{Pmode}.

 The number of bytes to move is the third operand, in mode @var{m}.
-Usually, you specify @code{word_mode} for @var{m}.  However, if you can
+Usually, you specify @code{Pmode} for @var{m}.  However, if you can
 generate better code knowing the range of valid lengths is smaller than
-those representable in a full word, you should provide a pattern with a
+those representable in a 

Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)

2013-11-04 Thread Andrew MacLeod

On 11/01/2013 06:42 PM, David Malcolm wrote:



!   gimple_statement_call *call;
!   call = gimple_build_call_vec (build_fold_addr_expr_loc (0,
alias), vargs);
  gimple_call_set_lhs (call, atree);

that is looking much better :-)

I'd love to make use of compile-time type-safety like this, but does
this have to gate this initial step?
no, I wasn't implying it did... just mapping out what I think we can go 
on to do with it.





And eventually we can pull the accessor routines and others into the
class itself:
 gimple_call call;
 call = gimple_build_call_vec (build_fold_addr_expr_loc (0,
alias), vargs);
 call->set_lhs (atree);

Nice.  It's readable (IMHO), and the type-checking (that it's a call) is
enforced at compile-time, rather than at run-time (and only in
ENABLE_CHECKING builds).
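
A minimal sketch of the shape such a subclass could take (purely
illustrative; these are not the actual declarations from the patch
series, and the LHS storage is simplified):

/* Illustrative sketch only, not the real GCC class layout: a typed
   call-statement class carrying its own accessors, so passing a
   non-call is a compile-time error rather than a GIMPLE_CHECK
   failure at run time.  */
struct gimple_statement_call : public gimple_statement_base
{
  tree op0;                                /* LHS slot, simplified.  */

  void set_lhs (tree lhs) { op0 = lhs; }
  tree get_lhs () const { return op0; }
};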

Sadly, this may need some further gengtype work: the simple inheritance
support I'm using in this patch series will barf if it sees methods
within a class.  So I can have a look at supporting methods in gengtype,
I guess.   Though the change above of gimple_call_set_lhs to accepting a
subclass ptr gives a middle-ground.
Eventually this is going to be an issue elsewhere as well.  What's the 
problem with methods in gengtype?  Does it just not parse them, or is it 
something deeper and more insidious?

Which results in a similar feel to the new gimple_type, gimple_decl,
etc. classes with my upcoming tree wrappers.  Changing gimple statements
to behave something like this is actually mentioned in the plan, but out
in the future once trees have been taken care of.

I would also plan to create instances for each of the gimple statements

By instances presumably you meant subclasses?

yeah.

That all said, this change enables that work to proceed if someone wants
to do it.

My question is: Is anyone going to do it, and if so,  who and when? :-)

I'd be up for working on a followup patch that adds such subclasses, and
I'd be up for changing the accessors to give us compile-time
type-safety.  I'd do it one accessor at a time.
It's probably pushing it for 4.9...  and it doesn't really serve much 
purpose to try to do any of this followup work now.  Probably a better 
task for the next stage 1 anyway, when there is time to fully flesh it 
out and prove the various improvements have full support from 
everyone :-)

Again, as noted in the earlier patch series, the names of the structs
are rather verbose.  I would prefer to also rename them all to eliminate
the _statement component:
gimple_statement_base -> gimple_base
gimple_statement_phi  -> gimple_phi
gimple_statement_omp  -> gimple_omp
etc, but I didn't do this to minimize the patch size.  But if the core
maintainers are up for that, I can redo the patch series with that
change also, or do that as a followup.

As mentioned, I'd rather see the short names be typedefs for pointers to
the long names since we're typically using the pointers.

[FWIW, I'm not a fan of such typedefs, but I'll defer to you in this]


Just expressing my opinion :-)


Presumably we'd want typedefs for the extra subclasses as well?

Also, I take it we'd use the printable_name from gimple.def - though
would we use it for the subclass name, or for the ptr typedef?


I would think a typedef for any type that we would directly refer to in 
the code, like a gimple_omp_clause, gimple_assign, or whatever.



Other than stomping on the same bits at the moment (less so once
gimple-stmt.[ch] is split out), these changes are completely orthogonal,
but in line with my direction.  I think it should be done sooner or
later.  My preference is to also see the follow up work carried out
as well.  Or at least a plan for it.

So you like the approach, provided I commit to the followup work? [1]

OK, I'll try to create some followup patches next week.  I'd prefer to
get the conversion to inheritance into trunk sooner rather than later,
though.

Thanks for looking at this (and for your cleanup work!)

Dave
[1] and the patches still need formal review.


I'm not the one that has to be sold on this; I'm actually ambivalent about 
the change this close to the end of stage 1...  I'm pointing out what we 
can eventually do with it in case it sways opinion.


I would have considered something similar in a while when the other 
refactoring work reached that point.  In some ways it might be better 
for next stage 1, but perhaps you have outlined enough benefits for it 
to be attractive to someone today.  It seems to be pretty neutral in 
most other respects.



Andrew


[Patch, avr] Emit diagnostics if -f{pic,PIC,pie,PIE} or -shared is passed

2013-11-04 Thread Senthil Kumar Selvaraj
The AVR backend does not generate position independent code, yet it
happily accepts -fpic, -fPIC, -fpie and -fPIE. The generated code
doesn't change at all. Also, it accepts the -shared option to generate a
shared library, without really doing anything with it.

This causes one of the regression tests 
(gcc.dg/lto/pr54709 c_lto_pr54709_0.o-c_lto_pr54709_1.o link) to fail with 
an 'undefined reference to main' error, when the test is trying to build 
a shared object.

The attached patch generates a warning if one of the -f{pic,PIC,pie,PIE} 
options is provided, and an error if -shared is provided (
config/mep/mep.c and config/s390/tpf.h already do something very similar).

Regression tested with no new failures.  Tests which exercise PIC now report as 
unsupported.

If ok, could someone commit please? I don't have commit access.

Regards
Senthil

gcc/ChangeLog
2013-11-04  Senthil Kumar Selvaraj  senthil_kumar.selva...@atmel.com

* config/avr/avr.c (avr_option_override): Warn if asked to generate
position independent code.
* config/avr/avr.h: Modify LINK_SPEC to reject -shared.


diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
index e7e1c2f..cf4b8ca 100644
--- gcc/config/avr/avr.c
+++ gcc/config/avr/avr.c
@@ -310,6 +310,15 @@ avr_option_override (void)
   flag_omit_frame_pointer = 0;
 }
 
+  if (flag_pic == 1)
+    warning (OPT_fpic, "-fpic is not supported");
+  if (flag_pic == 2)
+    warning (OPT_fPIC, "-fPIC is not supported");
+  if (flag_pie == 1)
+    warning (OPT_fpie, "-fpie is not supported");
+  if (flag_pie == 2)
+    warning (OPT_fPIE, "-fPIE is not supported");
+
   avr_current_device = avr_mcu_types[avr_mcu_index];
   avr_current_arch = avr_arch_types[avr_current_device->arch];
 
diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h
index f223a61..1eff5be 100644
--- gcc/config/avr/avr.h
+++ gcc/config/avr/avr.h
@@ -522,7 +522,8 @@ extern const char *avr_device_to_sp8 (int argc, const char 
**argv);
mmcu=at90can64*|\
mmcu=at90usb64*:--pmem-wrap-around=64k}}}\
 %:device_to_ld(%{mmcu=*:%*})\
-%:device_to_data_start(%{mmcu=*:%*})
+%:device_to_data_start(%{mmcu=*:%*})\
+%{shared:%eshared is not supported}
 
 #define LIB_SPEC \
   
%{!mmcu=at90s1*:%{!mmcu=attiny11:%{!mmcu=attiny12:%{!mmcu=attiny15:%{!mmcu=attiny28:
 -lc }


Re: [RFA][PATCH] Isolate erroneous paths optimization

2013-11-04 Thread Richard Biener
On Thu, Oct 31, 2013 at 7:11 AM, Jeff Law l...@redhat.com wrote:

 I've incorporated the various suggestions from Marc and Richi, except for
 Richi's to integrate this into jump threading.

 I've also made the following changes since the last version:

   1. Added more testcases.

   2. Use infer_nonnull_range, moving it from tree-vrp.c
   into gimple.c.  Minor improvements to infer_nonnull_range
   to make it handle more cases we care about and avoid using
   unnecessary routines from tree-ssa.c (which can now be removed)

   3. Multiple undefined statements in a block are handled in the
   logical way.
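
 As a concrete illustration (a made-up example, not one of the new
 testcases), this is the kind of path the pass isolates:

 /* Illustrative only: on the path where flag is nonzero, p is NULL at
    the dereference, which is undefined behaviour; that path can be
    duplicated, isolated and the dereference turned into a trap.  */
 int
 f (int *p, int flag)
 {
   if (flag)
     p = 0;
   return *p;
 }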

 Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  OK for the
 trunk?

Comments inline

 Thanks,
 Jeff

 * Makefile.in (OBJS): Add gimple-ssa-isolate-paths.o
 * common.opt (-fisolate-erroneous-paths): Add option and
 documentation.
 * gimple-ssa-isolate-paths.c: New file.
 * gimple.c (check_loadstore): New function.
 (infer_nonnull_range): Moved into gimple.c from tree-vrp.c
 Verify OP is in the argument list and the argument corresponding
 to OP is a pointer type.  Use operand_equal_p rather than
 pointer equality when testing if OP is on the nonnull list.
 Use check_loadstore rather than count_ptr_derefs.  Handle
 GIMPLE_RETURN statements.
 * tree-vrp.c (infer_nonnull_range): Remove.
 * gimple.h (infer_nonnull_range): Declare.
 (gsi_start_nondebug_after_labels): New function.
 * opts.c (default_options_table): Add OPT_fisolate_erroneous_paths.
 * passes.def: Add pass_isolate_erroneous_paths.
 * timevar.def (TV_ISOLATE_ERRONEOUS_PATHS): New timevar.
 * tree-pass.h (make_pass_isolate_erroneous_paths): Declare.
 * tree-ssa.c (struct count_ptr_d): Remove.
 (count_ptr_derefs, count_uses_and_derefs): Remove.
 * tree-ssa.h (count_uses_and_derefs): Remove.



 * gcc.dg/pr38984.c: Add -fno-isolate-erroneous-paths.
 * gcc.dg/tree-ssa/20030711-3.c: Update expected output.
 * gcc.dg/tree-ssa/isolate-1.c: New test.
 * gcc.dg/tree-ssa/isolate-2.c: New test.
 * gcc.dg/tree-ssa/isolate-3.c: New test.
 * gcc.dg/tree-ssa/isolate-4.c: New test.



 diff --git a/gcc/Makefile.in b/gcc/Makefile.in
 index 29609fd..7e9a702 100644
 --- a/gcc/Makefile.in
 +++ b/gcc/Makefile.in
 @@ -1233,6 +1233,7 @@ OBJS = \
 gimple-fold.o \
 gimple-low.o \
 gimple-pretty-print.o \
 +   gimple-ssa-isolate-paths.o \
 gimple-ssa-strength-reduction.o \
 gimple-streamer-in.o \
 gimple-streamer-out.o \
 diff --git a/gcc/common.opt b/gcc/common.opt
 index deeb3f2..6db9f56 100644
 --- a/gcc/common.opt
 +++ b/gcc/common.opt
 @@ -2104,6 +2104,12 @@ foptimize-strlen
  Common Report Var(flag_optimize_strlen) Optimization
  Enable string length optimizations on trees

 +fisolate-erroneous-paths
 +Common Report Var(flag_isolate_erroneous_paths) Init(1) Optimization

Drop Init(1) (see below)

 +Detect paths which trigger erroneous or undefined behaviour.  Isolate those
 +paths from the main control flow and turn the statement with erroneous or
 +undefined behaviour into a trap.
 +
  ftree-loop-distribution
  Common Report Var(flag_tree_loop_distribution) Optimization
  Enable loop distribution on trees
 diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c
 new file mode 100644
 index 000..aa526cc
 --- /dev/null
 +++ b/gcc/gimple-ssa-isolate-paths.c
 @@ -0,0 +1,332 @@
 +/* Detect paths through the CFG which can never be executed in a conforming
 +   program and isolate them.
 +
 +   Copyright (C) 2013
 +   Free Software Foundation, Inc.
 +
 +This file is part of GCC.
 +
 +GCC is free software; you can redistribute it and/or modify
 +it under the terms of the GNU General Public License as published by
 +the Free Software Foundation; either version 3, or (at your option)
 +any later version.
 +
 +GCC is distributed in the hope that it will be useful,
 +but WITHOUT ANY WARRANTY; without even the implied warranty of
 +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 +GNU General Public License for more details.
 +
 +You should have received a copy of the GNU General Public License
 +along with GCC; see the file COPYING3.  If not see
 +<http://www.gnu.org/licenses/>.  */
 +
 +#include "config.h"
 +#include "system.h"
 +#include "coretypes.h"
 +#include "tree.h"
 +#include "flags.h"
 +#include "basic-block.h"
 +#include "gimple.h"
 +#include "tree-ssa.h"
 +#include "gimple-ssa.h"
 +#include "tree-ssa-operands.h"
 +#include "tree-phinodes.h"
 +#include "ssa-iterators.h"
 +#include "cfgloop.h"
 +#include "tree-pass.h"
 +
 +
 +static bool cfg_altered;
 +
 +/* BB when reached via incoming edge E will exhibit undefined behaviour
 +   at STMT.  Isolate and optimize the path which exhibits undefined
 +   behaviour.
 +
 +   Isolation is simple.  Duplicate BB and redirect E to BB'.
 +

Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)

2013-11-04 Thread Andrew MacLeod

On 11/01/2013 06:58 PM, David Malcolm wrote:

On Fri, 2013-11-01 at 22:57 +0100, Jakub Jelinek wrote:

On Fri, Nov 01, 2013 at 05:47:14PM -0400, Andrew MacLeod wrote:

On 11/01/2013 05:41 PM, Jakub Jelinek wrote:

On Fri, Nov 01, 2013 at 05:36:34PM -0400, Andrew MacLeod wrote:

   static inline void
! gimple_call_set_lhs (gimple gs, tree lhs)
   {
-   GIMPLE_CHECK (gs, GIMPLE_CALL);

The checking you are removing here.


What checking?  There ought to be no checking at all in this
example...  gimple_build_call_vec returns a gimple_call, and
gimple_call_set_lhs()  doesn't have to check anything because it
only accepts gimple_call's.. so there is no checking other than the
usual does my parameter match that the compiler has to do...

and want to replace it by checking of the types at compile time.
The problem is that it uglifies the source too much, and, when you
actually don't have a gimple_call but supposedly a base class of it,
I expect you'd do as_a which is not only further uglification, but has
runtime cost also for --enable-checking=release.

I can have a look next week at every call to gimple_call_set_lhs in the
tree, and see to what extent we know at compile-time that the initial
arg is indeed a call (of the ones I quickly grepped just now, most are
from gimple_build_call and friends, but one was from a gimple_copy).

FWIW I did some performance testing of the is_a/as_a code in the earlier
version of the patch, and it didn't have a noticable runtime cost
compared to the GIMPLE_CHECK in the existing code:
Size of compiler executable:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01920.html
Compile times:
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00171.html
I actually really dislike as_a and is_a, and I think code needs to be 
restructured rather than using them, other than possibly at the very 
bottom level when we're allocating memory or something like that, or 
in some kind of emergency :-)...   If we require frequent uses of those, 
I'd be against it; I find them quite ugly.


Like I said in the other reply, no rush; I don't think any of this 
follow-up is appropriate this late in stage 1.  It would be more of an 
interesting examination right now... at least in my opinion...  I suspect 
things like gimple_assign are more complex cases, but without looking 
it's hard to tell for sure.


Andrew



Re: Pre-Patch RFC: proposed changes to option-lookup

2013-11-04 Thread Richard Biener
On Wed, Oct 30, 2013 at 11:56 PM, Joseph S. Myers
jos...@codesourcery.com wrote:
 On Wed, 30 Oct 2013, David Malcolm wrote:

 My idea is to introduce a GCC_OPTION macro, and replace the above with:

   static bool
   gate_vrp (void)
   {
 return GCC_OPTION (flag_tree_vrp) != 0;
   }

 That's only slightly shorter than the full expansion using global_options;
 I'd prefer using the full expansion.

 (Of course the ideal is to use explicit options pointers instead of
 global_options.  For example, if such a pointer were associated with the
 current function, it might make function-specific options handling a bit
 less fragile.)

Seconded - ideally the 'struct function' a pass is supposed to work
on would be passed as argument here and the options in effect
should be able to be extracted from it.  [that is, we shouldn't have
push/pop_cfun or cfun or current_function_decl at all]
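
For instance, a rough sketch of what such a gate could look like
(illustrative only; the exact way of extracting per-function options
is an assumption here, not an agreed interface):

/* Hypothetical sketch: a pass gate that receives the function it will
   work on and reads the options applying to that function, instead of
   reaching for global_options or cfun.  */
static bool
gate_vrp (struct function *fn)
{
  tree opt = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fn->decl);
  struct cl_optimization *os
    = TREE_OPTIMIZATION (opt ? opt : optimization_default_node);
  return os->x_flag_tree_vrp != 0;
}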

Note that I think expanding this to global_options.x_flag_tree_vrp
wouldn't be a step forward.

Richard.


 --
 Joseph S. Myers
 jos...@codesourcery.com


[PATCH, rs6000] Fix vect_pack_trunc_v2df pattern for little endian

2013-11-04 Thread Bill Schmidt
Hi,

This cleans up another case where a vector-pack operation needs to
reverse its operand order for little endian.  This fixes the last
remaining vector test failure in the test suite when building for little
endian.

Next I'll be spending some time looking for additional little endian
issues for which we don't yet have test coverage.  As an example, there
are several patterns similar to vect_pack_trunc_v2df that probably need
the same treatment as this patch, but we don't have any test cases that
expose a problem.  I need to verify those are broken and add test cases
when fixing them.

Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


2013-11-04  Bill Schmidt  wschm...@linux.vnet.ibm.com

* config/rs6000/vector.md (vec_pack_trunc_v2df):  Adjust for
little endian.


Index: gcc/config/rs6000/vector.md
===
--- gcc/config/rs6000/vector.md (revision 204320)
+++ gcc/config/rs6000/vector.md (working copy)
@@ -830,7 +830,12 @@
 
   emit_insn (gen_vsx_xvcvdpsp (r1, operands[1]));
   emit_insn (gen_vsx_xvcvdpsp (r2, operands[2]));
-  rs6000_expand_extract_even (operands[0], r1, r2);
+
+  if (BYTES_BIG_ENDIAN)
+rs6000_expand_extract_even (operands[0], r1, r2);
+  else
+rs6000_expand_extract_even (operands[0], r2, r1);
+
   DONE;
 })
 




Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.

2013-11-04 Thread Richard Biener
On Thu, Oct 31, 2013 at 10:02 AM, Ilya Enkovich enkovich@gmail.com wrote:
 Hi,

 Here is a patch which handles the problem with optimization of 
 BUILT_IN_CHKP_ARG_BND calls.  The Pointer Bounds Checker expects that the argument of 
 this call is a default SSA_NAME of the PARM_DECL whose bounds we want to get. 
  The problem is in optimizations which may replace the arg with its copy or a 
 known value.  This patch suppresses such modifications.

This doesn't seem like a good fix.  I suppose you require the same on
RTL, that is, have the incoming arg reg coalesced with the use?
In that case better set SSA_NAME_OCCURS_IN_ABNORMAL_PHI.

Richard.

 Thanks,
 Ilya
 --

 gcc/

 2013-10-28  Ilya Enkovich  ilya.enkov...@intel.com

 * tree-into-ssa.c: Include target.h
 (rewrite_update_stmt): Skip BUILT_IN_CHKP_ARG_BND calls.
 * tree-ssa-dom.c: Include target.h
 (cprop_into_stmt): Skip BUILT_IN_CHKP_ARG_BND calls.


 diff --git a/gcc/tree-into-ssa.c b/gcc/tree-into-ssa.c
 index 981e9f4..8d48f6d 100644
 --- a/gcc/tree-into-ssa.c
 +++ b/gcc/tree-into-ssa.c
 @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
  #include params.h
  #include diagnostic-core.h
  #include tree-into-ssa.h
 +#include target.h


  /* This file builds the SSA form for a function as described in:
 @@ -1921,8 +1922,14 @@ rewrite_update_stmt (gimple stmt, gimple_stmt_iterator 
 gsi)
  }

/* Rewrite USES included in OLD_SSA_NAMES and USES whose underlying
 - symbol is marked for renaming.  */
 -  if (rewrite_uses_p (stmt))
 + symbol is marked for renaming.
 + Skip calls to BUILT_IN_CHKP_ARG_BND whose arg should never be
 + renamed.  */
 +  if (rewrite_uses_p (stmt)
 +      && !(flag_check_pointer_bounds
 +           && (gimple_code (stmt) == GIMPLE_CALL)
 +           && gimple_call_fndecl (stmt)
 +  == targetm.builtin_chkp_function (BUILT_IN_CHKP_ARG_BND)))
  {
if (is_gimple_debug (stmt))
 {
 diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
 index 211bfcf..445278a 100644
 --- a/gcc/tree-ssa-dom.c
 +++ b/gcc/tree-ssa-dom.c
 @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
  #include params.h
  #include tree-ssa-threadedge.h
  #include tree-ssa-dom.h
 +#include target.h

  /* This file implements optimizations on the dominator tree.  */

 @@ -2266,6 +2267,16 @@ cprop_into_stmt (gimple stmt)
use_operand_p op_p;
ssa_op_iter iter;

 +  /* Call used to obtain bounds of input arg by Pointer Bounds Checker
 + should not be optimized.  Argument of the call is a default
 + SSA_NAME of PARM_DECL.  It should never be replaced by value.  */
 +  if (flag_check_pointer_bounds && gimple_code (stmt) == GIMPLE_CALL)
 +{
 +  tree fndecl = gimple_call_fndecl (stmt);
 +  if (fndecl == targetm.builtin_chkp_function (BUILT_IN_CHKP_ARG_BND))
 +   return;
 +}
 +
FOR_EACH_SSA_USE_OPERAND (op_p, stmt, iter, SSA_OP_USE)
  cprop_operand (stmt, op_p);
  }


Re: [PATCH, MPX, 2/X] Pointers Checker [10/25] Calls copy and verification

2013-11-04 Thread Richard Biener
On Thu, Oct 31, 2013 at 10:24 AM, Ilya Enkovich enkovich@gmail.com wrote:
 Hi,

 Here is a patch to support instrumented code in call verifiers and in 
 copying calls with skipped args.

 Thanks,
 Ilya
 --

 gcc/

 2013-10-29  Ilya Enkovich  ilya.enkov...@intel.com

 * cgraph.c (gimple_check_call_args): Handle bound args.
 * gimple.c (gimple_call_copy_skip_args): Likewise.
 (validate_call): Likewise.


 diff --git a/gcc/cgraph.c b/gcc/cgraph.c
 index 52d9ab0..9d7ae85 100644
 --- a/gcc/cgraph.c
 +++ b/gcc/cgraph.c
 @@ -3030,40 +3030,54 @@ gimple_check_call_args (gimple stmt, tree fndecl, 
 bool args_count_match)
  {
for (i = 0, p = DECL_ARGUMENTS (fndecl);
i < nargs;
 -  i++, p = DECL_CHAIN (p))
 +  i++)
 {
 - tree arg;
 + tree arg = gimple_call_arg (stmt, i);
 +
 + /* Skip bound args inserted by Pointer Bounds Checker.  */
 + if (POINTER_BOUNDS_P (arg))
 +   continue;
 +
   /* We cannot distinguish a varargs function from the case
  of excess parameters, still deferring the inlining decision
  to the callee is possible.  */
   if (!p)
 break;
 - arg = gimple_call_arg (stmt, i);
 +
   if (p == error_mark_node
   || arg == error_mark_node
   || (!types_compatible_p (DECL_ARG_TYPE (p), TREE_TYPE (arg))
      && !fold_convertible_p (DECL_ARG_TYPE (p), arg)))
  return false;
 +
 + p = DECL_CHAIN (p);
 }
if (args_count_match  p)
 return false;
  }
else if (parms)
  {
 -  for (i = 0, p = parms; i < nargs; i++, p = TREE_CHAIN (p))
 +  for (i = 0, p = parms; i < nargs; i++)
 {
 - tree arg;
 + tree arg = gimple_call_arg (stmt, i);
 +
 + /* Skip bound args inserted by Pointer Bounds Checker.  */
 + if (POINTER_BOUNDS_P (arg))
 +   continue;
 +
   /* If this is a varargs function defer inlining decision
  to callee.  */
   if (!p)
 break;
 - arg = gimple_call_arg (stmt, i);
 +
   if (TREE_VALUE (p) == error_mark_node
   || arg == error_mark_node
   || TREE_CODE (TREE_VALUE (p)) == VOID_TYPE
   || (!types_compatible_p (TREE_VALUE (p), TREE_TYPE (arg))
      && !fold_convertible_p (TREE_VALUE (p), arg)))
  return false;
 +
 + p = TREE_CHAIN (p);
 }
  }
else
 diff --git a/gcc/gimple.c b/gcc/gimple.c
 index 20f6010..dc85bf8 100644
 --- a/gcc/gimple.c
 +++ b/gcc/gimple.c
 @@ -3048,15 +3048,20 @@ canonicalize_cond_expr_cond (tree t)
  gimple
  gimple_call_copy_skip_args (gimple stmt, bitmap args_to_skip)
  {
 -  int i;
 +  int i, bit;
int nargs = gimple_call_num_args (stmt);
   vec<tree> vargs;
vargs.create (nargs);
gimple new_stmt;

 -  for (i = 0; i < nargs; i++)
 -if (!bitmap_bit_p (args_to_skip, i))
 -  vargs.quick_push (gimple_call_arg (stmt, i));
 +  for (i = 0, bit = 0; i < nargs; i++, bit++)
 +  if (POINTER_BOUNDS_P (gimple_call_arg (stmt, i)))
 +   {
 + if (!bitmap_bit_p (args_to_skip, --bit))
 +   vargs.quick_push (gimple_call_arg (stmt, i));
 +   }
 +  else if (!bitmap_bit_p (args_to_skip, bit))
 + vargs.quick_push (gimple_call_arg (stmt, i));

The new code is completely confusing.  You need to update
comments with what bits args_to_skip refers to and what happens
with pointer bounds.  I suppose the bitmap also contains
bound parameters as they are in calls (but not in fndecls -- I think
this discrepancy still is a way to disaster).

if (gimple_call_internal_p (stmt))
  new_stmt = gimple_build_call_internal_vec (gimple_call_internal_fn 
 (stmt),
 @@ -3702,6 +3707,9 @@ validate_call (gimple stmt, tree fndecl)
if (!targs)
 return true;
tree arg = gimple_call_arg (stmt, i);
 +  /* Skip bounds.  */
 +  if (flag_check_pointer_bounds && POINTER_BOUNDS_P (arg))
 +   continue;

Why check flag_check_pointer_bounds here but not elsewhere?

if (INTEGRAL_TYPE_P (TREE_TYPE (arg))
    && INTEGRAL_TYPE_P (TREE_VALUE (targs)))
 ;


[Patch, testsuite] Require scheduling support in gcc.dg/superblock.c

2013-11-04 Thread Senthil Kumar Selvaraj
Hi,

gcc.dg/superblock.c uses the -fdump-rtl-sched2 option and then scans the 
resulting dump. This fails for the AVR target, as it doesn't have any
instruction scheduling support.

The attached patch makes this test require scheduling support in the
target. If ok, could someone commit please? I do not have commit access.

Regards
Senthil

gcc/testsuite

2013-11-04  Senthil Kumar Selvaraj  senthil_kumar.selva...@atmel.com

* gcc.dg/superblock.c: Require scheduling support

diff --git gcc/testsuite/gcc.dg/superblock.c gcc/testsuite/gcc.dg/superblock.c
index 2b9aedf..272d161 100644
--- gcc/testsuite/gcc.dg/superblock.c
+++ gcc/testsuite/gcc.dg/superblock.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fno-asynchronous-unwind-tables -fsched2-use-superblocks 
-fdump-rtl-sched2 -fdump-rtl-bbro" } */
+/* { dg-require-effective-target scheduling } */
 
 typedef int aligned __attribute__ ((aligned (64)));
 extern void abort (void);


Re: Aliasing: look through pointer's def stmt

2013-11-04 Thread Richard Biener
On Thu, Oct 31, 2013 at 6:11 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Wed, 30 Oct 2013, Richard Biener wrote:

 --- gcc/tree-ssa-alias.c(revision 204188)
 +++ gcc/tree-ssa-alias.c(working copy)
 @@ -567,20 +567,29 @@ void
  ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree size)
  {
HOST_WIDE_INT t1, t2;
   ref->ref = NULL_TREE;
if (TREE_CODE (ptr) == SSA_NAME)
  {
gimple stmt = SSA_NAME_DEF_STMT (ptr);
if (gimple_assign_single_p (stmt)
      && gimple_assign_rhs_code (stmt) == ADDR_EXPR)
 ptr = gimple_assign_rhs1 (stmt);
 +  else if (is_gimple_assign (stmt)
 +           && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
 +           && host_integerp (gimple_assign_rhs2 (stmt), 0)
 +           && (t1 = int_cst_value (gimple_assign_rhs2 (stmt))) >= 0)


 No need to restrict this to positive offsets I think.


 Here is the follow-up patch to remove this restriction (bootstrap+testsuite
 on x86_64-unknown-linux-gnu).
 I don't really know how safe it is.

 I couldn't use host_integerp(,1) since it returns false for (size_t)(-1) on
 x86_64, and host_integerp(,0) seemed strange, but I could use it if you
 want.
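
 For context, an illustrative example (not from the testsuite) of what
 looking through the pointer's def stmt enables:

 /* Illustrative only: with p_2 = &a + 4 visible in the def stmt,
    ao_ref_init_from_ptr_and_size can recover the underlying decl 'a',
    so the memset cannot alias the unrelated array 'b'.  */
 int a[8], b[8];
 int
 f (void)
 {
   int *p = &a[1];               /* gimple: p_2 = &a + 4;  */
   __builtin_memset (p, 0, 8);
   return b[0];                  /* provably not clobbered by the memset.  */
 }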

Well, host_integer_p (, 0) looks correct to me.  But ...

 Index: gcc/tree-ssa-alias.h
 ===
 --- gcc/tree-ssa-alias.h(revision 204267)
 +++ gcc/tree-ssa-alias.h(working copy)
 @@ -139,30 +139,30 @@ extern void pt_solution_set_var (struct

  extern void dump_pta_stats (FILE *);

  extern GTY(()) struct pt_solution ipa_escaped_pt;

  /* Return true, if the two ranges [POS1, SIZE1] and [POS2, SIZE2]
 overlap.  SIZE1 and/or SIZE2 can be (unsigned)-1 in which case the
 range is open-ended.  Otherwise return false.  */

  static inline bool
 -ranges_overlap_p (unsigned HOST_WIDE_INT pos1,
 - unsigned HOST_WIDE_INT size1,
 - unsigned HOST_WIDE_INT pos2,
 - unsigned HOST_WIDE_INT size2)
 +ranges_overlap_p (HOST_WIDE_INT pos1,
 + HOST_WIDE_INT size1,
 + HOST_WIDE_INT pos2,
 + HOST_WIDE_INT size2)

I think size[12] should still be unsigned (we don't allow negative
sizes but only the special value -1U which we special-case only to
avoid pos + size to wrap)
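
For instance (illustrative calls, not from the patch):

/* Illustrative only: -1U as a size marks an open-ended range; the
   special case stops pos2 + size2 from wrapping around.  */
bool t = ranges_overlap_p (0, 16, 8, -1U); /* true: [8, inf) overlaps [0, 16).  */
bool f = ranges_overlap_p (0, 4, 8, -1U);  /* false: [8, inf) starts past [0, 4).  */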

Otherwise ok.

Thanks,
Richard.

  {
    if (pos1 >= pos2
 -      && (size2 == (unsigned HOST_WIDE_INT)-1
 +      && (size2 == -1
           || pos1 < (pos2 + size2)))
  return true;
    if (pos2 >= pos1
 -      && (size1 == (unsigned HOST_WIDE_INT)-1
 +      && (size1 == -1
           || pos2 < (pos1 + size1)))
  return true;

return false;
  }





  #endif /* TREE_SSA_ALIAS_H  */



Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)

2013-11-04 Thread Andrew MacLeod

On 11/01/2013 05:57 PM, Jakub Jelinek wrote:

On Fri, Nov 01, 2013 at 05:47:14PM -0400, Andrew MacLeod wrote:

On 11/01/2013 05:41 PM, Jakub Jelinek wrote:

On Fri, Nov 01, 2013 at 05:36:34PM -0400, Andrew MacLeod wrote:

   static inline void
! gimple_call_set_lhs (gimple gs, tree lhs)
   {
-   GIMPLE_CHECK (gs, GIMPLE_CALL);

The checking you are removing here.


What checking?  There ought to be no checking at all in this
example...  gimple_build_call_vec returns a gimple_call, and
gimple_call_set_lhs()  doesn't have to check anything because it
only accepts gimple_call's.. so there is no checking other than the
usual does my parameter match that the compiler has to do...

and want to replace it by checking of the types at compile time.
The problem is that it uglifies the source too much, and, when you
actually don't have a gimple_call but supposedly a base class of it,


But when you convert all the source base for gimple calls, then you have 
context pretty much everywhere... you won't be doing it from a base 
class...  or at least pretty infrequently.

I expect you'd do as_a which is not only further uglification, but has
runtime cost also for --enable-checking=release.


I'm not a fan of as_a or is_a at all.  I really dislike them.  They 
should be kept to the barest minimum, or even nonexistent if 
possible.   If you need them, you probably need restructured source;  
i.e., functions which work with switches on gimple_code() need to be 
restructured... And the end result should look cleaner.


Of course, that's what makes it a big project too :-)

Andrew


Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)

2013-11-04 Thread Jakub Jelinek
On Mon, Nov 04, 2013 at 08:54:40AM -0500, Andrew MacLeod wrote:
 What checking?  There ought to be no checking at all in this
 example...  gimple_build_call_vec returns a gimple_call, and
 gimple_call_set_lhs()  doesn't have to check anything because it
 only accepts gimple_call's.. so there is no checking other than the
 usual does my parameter match that the compiler has to do...
 and want to replace it by checking of the types at compile time.
 The problem is that it uglifies the source too much, and, when you
 actually don't have a gimple_call but supposedly a base class of it,
 
 But when you convert all the source base for gimple calls, then you
 have context pretty much everywhere... you won't be doing it from a
 base class...  or at least pretty infrequently.

Usually you just have IL which contains various kinds of statements,
it can be a call, assign, dozens of other gimple stmt forms.
If you need to modify something in these, you'd need to uglify with
as_a/is_a if you require everybody using these setters/getters to use
gimple_call * rather than gimple, pointer to any kind of stmt.

Jakub


Re: [PATCH, rs6000] Fix vect_pack_trunc_v2df pattern for little endian

2013-11-04 Thread David Edelsohn
On Mon, Nov 4, 2013 at 8:28 AM, Bill Schmidt
wschm...@linux.vnet.ibm.com wrote:
 Hi,

 This cleans up another case where a vector-pack operation needs to
 reverse its operand order for little endian.  This fixes the last
 remaining vector test failure in the test suite when building for little
 endian.

 Next I'll be spending some time looking for additional little endian
 issues for which we don't yet have test coverage.  As an example, there
 are several patterns similar to vect_pack_trunc_v2df that probably need
 the same treatment as this patch, but we don't have any test cases that
 expose a problem.  I need to verify those are broken and add test cases
 when fixing them.

 Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
 regressions.  Is this ok for trunk?

 Thanks,
 Bill


 2013-11-04  Bill Schmidt  wschm...@linux.vnet.ibm.com

 * config/rs6000/vector.md (vec_pack_trunc_v2df):  Adjust for
 little endian.

This patch is okay.

Thanks, David


Re: [PATCH, rs6000] Re-permute source register for postreload splits of VSX LE stores

2013-11-04 Thread David Edelsohn
On Sat, Nov 2, 2013 at 7:44 PM, Bill Schmidt
wschm...@linux.vnet.ibm.com wrote:
 Hi,

 When I created the patch to split VSX loads and stores to add permutes
 so the register image was correctly little endian, I forgot to implement
 a known requirement.  When a VSX store is split after reload, we must
 reuse the source register for the permute, which leaves it in a swapped
 state after the store.  If the register remains live, this is invalid.
 After the store, we need to permute the register back to its original
 value.

 For each of the little endian VSX store patterns, I've replaced the
 define_insn_and_split with a define_insn and two define_splits, one for
 prior to reload and one for after reload.  The post-reload split has the
 extra behavior.

 I don't know of a way to set the insn's length attribute conditionally,
 so I'm optimistically setting this to 8, though it will be 12 in the
 post-reload case.  Is there any concern with that?  Is there a way to
 make it fully accurate?

 Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
 regressions.  This fixes two failing test cases.  Is this ok for trunk?

 Thanks!
 Bill


 2013-11-02  Bill Schmidt  wschm...@linux.vnet.ibm.com

 * config/rs6000/vsx.md (*vsx_le_perm_store_<mode> for VSX_D):
 Replace the define_insn_and_split with a define_insn and two
 define_splits, with the split after reload re-permuting the source
 register to its original value.
 (*vsx_le_perm_store_<mode> for VSX_W): Likewise.
 (*vsx_le_perm_store_v8hi): Likewise.
 (*vsx_le_perm_store_v16qi): Likewise.

This is okay, but you must change the insn length to the conservative,
larger value.  insn length is used to reverse branches when the
displacement is too large.

Thanks, David


Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)

2013-11-04 Thread Andrew MacLeod

On 11/04/2013 09:00 AM, Jakub Jelinek wrote:

On Mon, Nov 04, 2013 at 08:54:40AM -0500, Andrew MacLeod wrote:

What checking?  There ought to be no checking at all in this
example...  gimple_build_call_vec returns a gimple_call, and
gimple_call_set_lhs()  doesn't have to check anything because it
only accepts gimple_call's.. so there is no checking other than the
usual does my parameter match that the compiler has to do...

and want to replace it by checking of the types at compile time.
The problem is that it uglifies the source too much, and, when you
actually don't have a gimple_call but supposedly a base class of it,

But when you convert all the source base for gimple calls, then you
have context pretty much everywhere... you won't be doing it from a
base class...  or at least pretty infrequently.

Usually you just have IL which contains various kinds of statements,
it can be a call, assign, dozens of other gimple stmt forms.
If you need to modify something in these, you'd need to uglify with
as_a/is_a if you require everybody using these setters/getters to use
gimple_call * rather than gimple, pointer to any kind of stmt.

Jakub
In many cases splitting out the case body and typing functions will 
surprisingly cover many of the uses required.
Proper subclassing will cover a lot too if the get/set methods are in 
the right classes.  In cases where none of this works...  well, there is 
an appropriate time for a virtual function...  That's really all those 
kinds of switches are implementing, but I won't get into that meta 
discussion right now :-).   We can save that for someday when there is 
actually a real proposal on the table.


Andrew



Re: LRA vs reload on powerpc: 2 extra FAILs that are actually improvements?

2013-11-04 Thread David Edelsohn
Hi, Steven

Thanks for investigating this. This presumably was the reason that
Vlad changed the constraint modifier for that pattern in his patch for
LRA.  I don't think that using memory is an improvement, but Mike is
the best person to comment.

Thanks, David

On Sat, Nov 2, 2013 at 6:48 PM, Steven Bosscher stevenb@gmail.com wrote:
 Hello,

 Today's powerpc64-linux gcc has 2 extra failures with -mlra vs. reload
 (i.e. svn unpatched).

 (I'm excluding guality failure differences here because there are too
 many of them that seem to fail at random after minimal changes
 anywhere in the compiler...).

 Test results are posted here:
 reload: http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00128.html
 lra: http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00129.html

 The new failures and total score is as follows (+=lra, -=reload):
 +FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times stwbrx 6
 +FAIL: gcc.target/powerpc/pr58330.c scan-assembler-not stwbrx

 === gcc Summary ===

 -# of expected passes   97887
 -# of unexpected failures   536
 +# of expected passes   97903
 +# of unexpected failures   538
  # of unexpected successes  38
  # of expected failures 244
 -# of unsupported tests 1910
 +# of unsupported tests 1892


 The failure of pr53199.c is because of different instruction selection
 for bswap. Test case is reduced to just one function:

 /* { dg-options -O2 -mcpu=power6 -mavoid-indexed-addresses } */
 long long
 reg_reverse (long long x)
 {
   return __builtin_bswap64 (x);
 }

 Reload left vs. LRA right:
 reg_reverse:   reg_reverse:
 srdi 8,3,32  | addi 8,1,-16
 rlwinm 7,3,8,0x  | srdi 10,3,32
 rlwinm 9,8,8,0x  | addi 9,8,4
 rlwimi 7,3,24,0,7| stwbrx 3,0,8
 rlwimi 7,3,24,16,23  | stwbrx 10,0,9
 rlwimi 9,8,24,0,7| ld 3,-16(1)
 rlwimi 9,8,24,16,23  
 sldi 7,7,32  
 or 7,7,9 
 mr 3,7   
 blrblr

 This same difference is responsible for the failure of pr58330.c which
 also uses __builtin_bswap64().

 The difference in RTL for the test case is this (after reload vs. after LRA):
 -   11: {%7:DI=bswap(%3:DI);clobber %8:DI;clobber %9:DI;clobber %10:DI;}
 -   20: %3:DI=%7:DI
 +   20: %8:DI=%1:DI-0x10
 +   21: %8:DI=%8:DI  // stupid no-op move
 +   11: {[%8:DI]=bswap(%3:DI);clobber %9:DI;clobber %10:DI;clobber scratch;}
 +   19: %3:DI=[%1:DI-0x10]

 So LRA believes going through memory is better than using a register,
 even though obviously there are plenty registers available.

 What LRA does:
   Creating newreg=129
 Removing SCRATCH in insn #11 (nop 2)
   Creating newreg=130
 Removing SCRATCH in insn #11 (nop 3)
   Creating newreg=131
 Removing SCRATCH in insn #11 (nop 4)
 // at this point the insn would be a bswapdi2_64bit:
 //   11: {%3:DI=bswap(%3:DI);clobber r129;clobber r130;clobber r131;}
 // cost calculation for the insn alternatives:
 0 Early clobber: reject++
 1 Non-pseudo reload: reject+=2
 1 Spill pseudo in memory: reject+=3
 2 Scratch win: reject+=2
 3 Scratch win: reject+=2
 4 Scratch win: reject+=2
   alt=0,overall=18,losers=1,rld_nregs=0
 0 Non-pseudo reload: reject+=2
 0 Spill pseudo in memory: reject+=3
 0 Non input pseudo reload: reject++
 2 Scratch win: reject+=2
 3 Scratch win: reject+=2
   alt=1,overall=16,losers=1,rld_nregs=0
 Staticly defined alt reject+=12
 0 Early clobber: reject++
 2 Scratch win: reject+=2
 3 Scratch win: reject+=2
 4 Scratch win: reject+=2
 0 Conflict early clobber reload: reject--
   alt=2,overall=24,losers=1,rld_nregs=0
  Choosing alt 1 in insn 11:  (0) Z  (1) r  (2) b  (3) r  (4)
 X {*bswapdi2_64bit}
   Change to class BASE_REGS for r129
   Change to class GENERAL_REGS for r130
   Creating newreg=132 from oldreg=3, assigning class NO_REGS to r132
   Change to class NO_REGS for r131
11: {r132:DI=bswap(%3:DI);clobber r129:DI;clobber r130:DI;clobber r131:DI;}
   REG_UNUSED r131:DI
   REG_UNUSED r130:DI
   REG_UNUSED r129:DI

 LRA selects alternative 1 (Z,r,b,r,X) which seems to be the right
 choice, from looking at the constraints. Reload selects alternative 2
 which is slightly^2 discouraged: (??r,r,r,r,r).

 Is this an improvement or a regression? If it's an improvement then
 these two test cases should be adjusted :-)

 Ciao!
 Steven


Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Kenneth Zadeck

On 11/04/2013 04:12 AM, Richard Biener wrote:

On Fri, 1 Nov 2013, Richard Sandiford wrote:


I'm building one target for each supported CPU and comparing the wide-int
assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding
output from the merge point.  This patch removes all the differences I saw
for alpha-linux-gnu in gcc.c-torture.

Hunk 1: Preserve the current trunk behaviour in which the shift count
is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
inspection after hunk 5.

Hunks 2 and 3: Two cases where force_fit_to_type could extend to wider
types and where we weren't extending according to the sign of the source.
We should probably assert that the input is at least as wide as the type...

Hunk 4: The & in:

  if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
      && (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)

had got dropped during the conversion.

Hunk 5: The old code was:

  if (shift > 0)
{
  *mask = r1mask.llshift (shift, TYPE_PRECISION (type));
  *val = r1val.llshift (shift, TYPE_PRECISION (type));
}
  else if (shift < 0)
{
  shift = -shift;
  *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
  *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
}

and these precision arguments had two purposes: to control which
bits of the first argument were shifted, and to control the truncation
mask for SHIFT_TRUNCATED.  We need to pass a width to the shift functions
for the second.

(BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the
end of the !GENERATOR_FILE list in rtl.def, since the current position caused
some spurious differences.  The problem AFAICT is that hash_rtx hashes
on code, RTL PRE creates registers in the hash order of the associated
expressions, RA uses register numbers as a tie-breaker during ordering,
and so the order of rtx_code can indirectly influence register allocation.
First time I'd realised that could happen, so just thought I'd mention it.
I think we should keep rtl.def in the current (logical) order though.)

Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?

Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
from double-int.c on the trunk?  If not then putting this at the
callers of wi::rshift and friends is clearly asking for future
mistakes.

Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
than in machine instruction patterns was a mistake in the past.
I disagree.   The programmer has the right to expect that the optimizer 
transforms the program in a manner that is consistent, but faster, with 
the underlying machine.   Saying you are going to do it in the patterns 
is not realistic.


I should point out that I had originally put the shift count truncation 
inside wide-int and you (richi) told me to push it back out to the 
callers.   I personally think that it should be returned to wide-int to 
assure consistent behavior.


However, I would also say that it would be nice if it were actually done 
right.   There are (as far as I know) actually 3 ways of handling shift 
amounts that have been used in the past:


(1) mask the shift amount by (bitsize - 1).   This is the most popular 
and is what is done if you set SHIFT_COUNT_TRUNCATED.
(2) mask the shift amount by ((2*bitsize) - 1).   There are a couple of 
architectures that do this, including the ppc.   However, gcc has never 
done it correctly for these platforms.
(3) assume that the shift amount can be a big number.   The only 
machine that I know of that ever did this was Knuth's MMIX, and 
AFAIK, it has never known silicon.   (It turns out that this is almost 
impossible to build efficiently.)   And yet, we support this if you do 
not set SHIFT_COUNT_TRUNCATED.
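
A sketch of the three conventions for a 32-bit left shift (illustrative
C, emulating each behaviour explicitly; not GCC source):

#include <stdint.h>

/* (1) count masked by bitsize - 1 (SHIFT_COUNT_TRUNCATED-style).  */
uint32_t shift1 (uint32_t x, unsigned n) { return x << (n & 31); }

/* (2) count masked by 2*bitsize - 1; counts 32..63 shift everything out.  */
uint32_t shift2 (uint32_t x, unsigned n)
{
  n &= 63;
  return n >= 32 ? 0 : x << n;
}

/* (3) arbitrarily large counts honoured; any count >= 32 yields 0.  */
uint32_t shift3 (uint32_t x, uint64_t n)
{
  return n >= 32 ? 0 : x << (unsigned) n;
}
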
I will, if you like, fix this properly.   But I want to see a plan that 
at least richi and richard are happy with first.    My gut feeling is to 
provide a target hook that takes a bitsize and a shift value as wide-int 
(or possibly a hwi) and returns a machine-specific shift amount.    Then 
I would put that inside of the shift routines of wide-int.


kenny







Re: libsanitizer merge from upstream r191666

2013-11-04 Thread Konstantin Serebryany
+glider, our Mac expert.

Hi Jack,

This patch has not been tested on Mac and we generally do not claim
that gcc-asan is supported on Mac.
clang-asan *is* supported on Mac and our bots are green (this patch is
a merge of the sources which are regularly tested on Mac,
but the build procedure is different).

The failing assertion is weird, I haven't seen it since last year
(https://code.google.com/p/address-sanitizer/issues/detail?id=117),
but Alexander should know more.

What is your version of Mac OS?

--kcc

On Sat, Nov 2, 2013 at 10:25 AM, Jack Howarth howa...@bromo.med.uc.edu wrote:
 On Fri, Nov 01, 2013 at 04:13:05PM -0700, Konstantin Serebryany wrote:
 Jakub,

 This works!
 New patch attached.

 Konstantin,
This patch, when applied on top of r204305, bootstraps fine on 
 x86_64-apple-darwin12 but
 produces a lot of regressions with...

 make -k check RUNTESTFLAGS=asan.exp --target_board=unix'{-m64}'

 Native configuration is x86_64-apple-darwin12.5.0

 === g++ tests ===


 Running target unix/-m64
 FAIL: c-c++-common/asan/global-overflow-1.c  -O0  execution test
 FAIL: c-c++-common/asan/global-overflow-1.c  -O1  execution test
 FAIL: c-c++-common/asan/global-overflow-1.c  -O2  execution test
 FAIL: c-c++-common/asan/global-overflow-1.c  -O3 -fomit-frame-pointer  
 execution test
 FAIL: c-c++-common/asan/global-overflow-1.c  -O3 -g  execution test
 FAIL: c-c++-common/asan/global-overflow-1.c  -Os  execution test
 FAIL: c-c++-common/asan/global-overflow-1.c  -O2 -flto -flto-partition=none  
 execution test
 FAIL: c-c++-common/asan/global-overflow-1.c  -O2 -flto  execution test
 FAIL: c-c++-common/asan/heap-overflow-1.c  -O0  execution test
 FAIL: c-c++-common/asan/heap-overflow-1.c  -O1  execution test
 FAIL: c-c++-common/asan/heap-overflow-1.c  -O2  execution test
 FAIL: c-c++-common/asan/heap-overflow-1.c  -O3 -fomit-frame-pointer  
 execution test
 FAIL: c-c++-common/asan/heap-overflow-1.c  -O3 -g  execution test
 FAIL: c-c++-common/asan/heap-overflow-1.c  -Os  execution test
 FAIL: c-c++-common/asan/heap-overflow-1.c  -O2 -flto -flto-partition=none  
 execution test
 FAIL: c-c++-common/asan/heap-overflow-1.c  -O2 -flto  execution test
 FAIL: c-c++-common/asan/memcmp-1.c  -O0  execution test
 FAIL: c-c++-common/asan/memcmp-1.c  -O1  execution test
 FAIL: c-c++-common/asan/memcmp-1.c  -O2  execution test
 FAIL: c-c++-common/asan/memcmp-1.c  -O3 -fomit-frame-pointer  execution test
 FAIL: c-c++-common/asan/memcmp-1.c  -O3 -g  execution test
 FAIL: c-c++-common/asan/memcmp-1.c  -Os  execution test
 FAIL: c-c++-common/asan/memcmp-1.c  -O2 -flto -flto-partition=none  execution 
 test
 FAIL: c-c++-common/asan/memcmp-1.c  -O2 -flto  execution test
 FAIL: c-c++-common/asan/null-deref-1.c  -O0  execution test
 FAIL: c-c++-common/asan/null-deref-1.c  -O1  execution test
 FAIL: c-c++-common/asan/null-deref-1.c  -O2  execution test
 FAIL: c-c++-common/asan/null-deref-1.c  -O3 -fomit-frame-pointer  execution 
 test
 FAIL: c-c++-common/asan/null-deref-1.c  -O3 -g  execution test
 FAIL: c-c++-common/asan/null-deref-1.c  -Os  execution test
 FAIL: c-c++-common/asan/null-deref-1.c  -O2 -flto -flto-partition=none  
 execution test
 FAIL: c-c++-common/asan/null-deref-1.c  -O2 -flto  execution test
 FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O0  execution test
 FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O1  execution test
 FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O2  execution test
 FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O3 -fomit-frame-pointer  
 execution test
 FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O3 -g  execution test
 FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -Os  execution test
 FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O2 -flto 
 -flto-partition=none  execution test
 FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O2 -flto  execution test
 FAIL: c-c++-common/asan/sleep-before-dying-1.c  -O2  execution test
 FAIL: c-c++-common/asan/sleep-before-dying-1.c  -O2 -flto 
 -flto-partition=none  execution test
 FAIL: c-c++-common/asan/sleep-before-dying-1.c  -O2 -flto  execution test
 FAIL: c-c++-common/asan/stack-overflow-1.c  -O0  execution test
 FAIL: c-c++-common/asan/stack-overflow-1.c  -O1  execution test
 FAIL: c-c++-common/asan/stack-overflow-1.c  -O2  execution test
 FAIL: c-c++-common/asan/stack-overflow-1.c  -O3 -fomit-frame-pointer  
 execution test
 FAIL: c-c++-common/asan/stack-overflow-1.c  -O3 -g  execution test
 FAIL: c-c++-common/asan/stack-overflow-1.c  -Os  execution test
 FAIL: c-c++-common/asan/stack-overflow-1.c  -O2 -flto -flto-partition=none  
 execution test
 FAIL: c-c++-common/asan/stack-overflow-1.c  -O2 -flto  execution test
 FAIL: c-c++-common/asan/strip-path-prefix-1.c  -O2  execution test
 FAIL: c-c++-common/asan/strip-path-prefix-1.c  -O2 -flto -flto-partition=none 
  execution test
 FAIL: c-c++-common/asan/strip-path-prefix-1.c  -O2 -flto  execution test
 FAIL: c-c++-common/asan/strncpy-overflow-1.c  -O0  

Re: Ping^3 Re: Add predefined macros for library use in defining __STDC_IEC_559*

2013-11-04 Thread Jakub Jelinek
On Mon, Nov 04, 2013 at 02:51:37PM +, Joseph S. Myers wrote:
 Ping^3.  The addition of a new target hook in this patch 
 http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01131.html is still pending 
 review.

Ok.

Jakub


Re: [wide-int] Avoid some changes in output code

2013-11-04 Thread Richard Sandiford
Kenneth Zadeck zad...@naturalbridge.com writes:
 However, i would also say that it would be nice if were actually done 
 right.   There are (as far as i know) actually 3 ways handling shift 
 amounts that have been used in the past:

 (1) mask the shift amount by (bitsize - 1).   This is the most popular 
 and is what is done if you set SHIFT_COUNT_TRUNCATED.
 (2) mask the shift amount by ((2*bitsize) - 1).   There are a couple of 
 architectures that do this, including the ppc.   However, gcc has never 
 done it correctly for these platforms.

There's TARGET_SHIFT_TRUNCATION_MASK for these cases.  It applies only
to the shift instruction patterns though, not to any random rtl shift
expression.

Part of the reason for this mess is that the things represented as
rtl shifts didn't necessarily start out as shifts.  E.g. during combine,
extractions are rewritten as shifts, optimised, then converted back.
So since SHIFT_COUNT_TRUNCATED applies to all rtl shift expressions,
it also has to apply to all bitfield operations, since bitfield operations
temporarily appear as shifts (see tm.texi).  TARGET_SHIFT_TRUNCATION_MASK
deliberately limits the scope to the shift instruction patterns because
that's easier to contain.

E.g. knowing that shifts operate to a certain mask allows you
to generate more efficient doubleword shift sequences, which was why
TARGET_SHIFT_TRUNCATION_MASK was added in the first place.
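
A sketch of the kind of sequence this enables (illustrative only, with
the truncation emulated by explicit masking; not GCC source):

#include <stdint.h>

/* Illustrative only: a 64-bit logical left shift built from 32-bit
   halves on a machine whose shift patterns truncate counts to 5 bits,
   for counts n in 0..63.  */
void
dshift_left (uint32_t *hi, uint32_t *lo, unsigned n)
{
  unsigned k = n & 31;
  if (n & 32)
    {
      /* Counts 32..63: the low word vacates into the high word.  */
      *hi = *lo << k;
      *lo = 0;
    }
  else
    {
      *hi = (*hi << k) | (k ? *lo >> (32 - k) : 0);
      *lo <<= k;
    }
}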

 (3) assume that the shift amount can be a big number.   The only 
 machine that I know of that ever did this was Knuth's MMIX, and 
 AFAIK, it has never known silicon.   (It turns out that this is almost 
 impossible to build efficiently.)   And yet, we support this if you do 
 not set SHIFT_COUNT_TRUNCATED.

Well, we support it in the sense that we make no assumptions about what
happens, at least at the rtl level.  We just punt on any out-of-range shift
and leave it to be evaluated at run time.

Thanks,
Richard



Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-04 Thread Dodji Seketeli
Bernd Edlinger bernd.edlin...@hotmail.de writes:


 I see another read_line at gcov.c, which seems to be a copy.

 Maybe this should be changed too?

I have seen it as well.

I'd rather have the patch be reviewed and everthing, and only then
propose to share the implementation with the gcov module.

-- 
Dodji


[omp4]

2013-11-04 Thread Aldy Hernandez

Hi.

While looking over some of your testcases I noticed that array 
subscripts are not being properly adjusted:


foo(int i) {
array[i] = 
}

The `i' should use the simd array magic instead of ICEing :).

Is the attached patch OK for the branch?

Aldy
gcc/ChangeLog.gomp

* omp-low.c (ipa_simd_modify_function_body): Adjust tree operands
on the LHS of an assign.
(ipa_simd_modify_function_body_ops_1): New.
(ipa_simd_modify_function_body_ops): New.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d30fb17..94058af 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11049,6 +11049,35 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
   return seq;
 }
 
+static tree
+ipa_simd_modify_function_body_ops_1 (tree *tp, int *walk_subtrees, void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+  ipa_parm_adjustment_vec *adjustments = (ipa_parm_adjustment_vec *) wi->info;
+  tree t = *tp;
+
+  if (DECL_P (t) || TREE_CODE (t) == SSA_NAME)
+return (tree) sra_ipa_modify_expr (tp, true, *adjustments);
+  else
+*walk_subtrees = 1;
+  return NULL_TREE;
+}
+
+/* Helper function for ipa_simd_modify_function_body.  Make any
+   necessary adjustments for tree operators.  */
+
+static bool
+ipa_simd_modify_function_body_ops (tree *tp,
+  ipa_parm_adjustment_vec *adjustments)
+{
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  wi.info = adjustments;
+  bool res = (bool) walk_tree (tp, ipa_simd_modify_function_body_ops_1,
+  &wi, NULL);
+  return res;
+}
+
 /* Traverse the function body and perform all modifications as
described in ADJUSTMENTS.  At function return, ADJUSTMENTS will be
modified such that the replacement/reduction value will now be an
@@ -11121,6 +11150,11 @@ ipa_simd_modify_function_body (struct cgraph_node 
*node,
case GIMPLE_ASSIGN:
  t = gimple_assign_lhs_ptr (stmt);
  modified |= sra_ipa_modify_expr (t, false, adjustments);
+
+ /* The LHS may have operands that also need adjusting
+(e.g. `foo' in array[foo]).  */
+ modified |= ipa_simd_modify_function_body_ops (t, adjustments);
+
	  for (i = 0; i < gimple_num_ops (stmt); ++i)
{
  t = gimple_op_ptr (stmt, i);
diff --git a/gcc/testsuite/gcc.dg/gomp/simd-clones-6.c 
b/gcc/testsuite/gcc.dg/gomp/simd-clones-6.c
new file mode 100644
index 000..8818594
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/simd-clones-6.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-fopenmp" } */
+
+/* Test that array subscripts are properly adjusted.  */
+
+int array[1000];
+#pragma omp declare simd notinbranch simdlen(4)
+void foo (int i)
+{
+  array[i] = 555;
+}


Re: [omp4] fix array subscripts in simd clones

2013-11-04 Thread Aldy Hernandez

On 11/04/13 08:19, Aldy Hernandez wrote:

Hi.

While looking over some of your testcases I noticed that array
subscripts are not being properly adjusted:

foo(int i) {
 array[i] = 
}

The `i' should use the simd array magic instead of ICEing :).

Is the attached patch OK for the branch?

Aldy


Ughh, my apologies for the lack of subject.  Fixed in this followup.


Re: PATCH to use -Wno-format during stage1

2013-11-04 Thread Jason Merrill

On 11/04/2013 07:41 AM, Richard Biener wrote:

Please make sure the warning is preserved when not bootstrapping,
it's useful to have them in your dev-tree and not notice errors only
during -Werror bootstrap ...


I found it to be useless noise when not bootstrapping, since my system 
compiler isn't likely to recognize recently added % codes any time soon.


Jason



Re: [C++ PATCH] Fix PR58979

2013-11-04 Thread Jason Merrill

On 11/04/2013 05:50 AM, Marek Polacek wrote:

  case RO_ARROW:
+case RO_ARROW_STAR:
error_at (loc,
"invalid type argument of %<->%> (have %qT)",
type);


I think the diagnostic for RO_ARROW_STAR should say ->*, not just ->.

Jason



Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals

2013-11-04 Thread Dodji Seketeli
Jakub Jelinek ja...@redhat.com writes:

[...]

 Eventually, I think using int for sizes is just a ticking bomb, what if
 somebody uses > 2GB long lines?  Surely, on 32-bit hosts we are unlikely to
 handle it, but why couldn't a 64-bit host handle it?  Column info may be bogus
 in there, sure, but at least we should not crash or overflow buffers on it
 ;).  Anyway, not something needed to be fixed right now, but in the future
 it would be nicer to use size_t and/or ssize_t here.

Yes.  I initially tried to use size_t but found that I'd need to modify
several other places to shut down many warnings because these places were
using int :-(.  So I felt that would be a battle for later.

But I am adding this to my TODO.  I'll send a patch later that
changes this to size_t then, and adjusts the other places that need it
as well.

[...]

context->last_location = diagnostic->location;
s = expand_location_to_spelling_point (diagnostic->location);
 -  line = location_get_source_line (s);
 +  line = location_get_source_line (s, &line_width);

 I think richi didn't like C++ reference arguments to be used that way (and
 perhaps guidelines don't either), because it isn't immediately obvious
 that line_width is modified by the call.  Can you change it to a pointer
 argument instead and pass &line_width?

Sure.  I have done the change in the patch below.  Sorry for this
reflex.  I tend to use pointers like these only in places where we can
allow them to be NULL.

 XNEWVEC or XRESIZEVEC will never return NULL though, so it doesn't have
 to be tested.  Though, the question is if that is what we want, caret
 diagnostics should be optional, if we can't print it, we just won't.

Hmmh.  This particular bug was noticed because of the explicit OOM
message displayed by XNEWVEC/XRESIZEVEC; otherwise, I bet this could
have just fallen through the cracks for a little longer.  So I'd say let's
just use XNEWVEC/XRESIZEVEC and remove the test, as you first
suggested.  The caret diagnostics functionality as a whole can be
disabled with -fno-diagnostics-show-caret.


[...]

 Otherwise, LGTM.

Thanks.

So here is the patch that bootstraps.

gcc/ChangeLog:

* input.h (location_get_source_line): Take an additional line_size
parameter by reference.
* input.c (get_line): New static function definition.
(read_line): Take an additional line_length output parameter to be
set to the size of the line.  Use the new get_line function to
compute the size of the line returned by fgets, rather than using
strlen.  Ensure that the buffer is initially zeroed; ensure that
when growing the buffer too.
(location_get_source_line): Take an additional output line_len
parameter.  Update the use of read_line to pass it the line_len
parameter.
* diagnostic.c (adjust_line): Take an additional input parameter
for the length of the line, rather than calculating it with
strlen.
(diagnostic_show_locus): Adjust the use of
location_get_source_line and adjust_line with respect to their new
signature.  While displaying a line now, do not stop at the first
null byte.  Rather, display the zero byte as a space and keep
going until we reach the size of the line.

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/warning-zero-in-literals-1.c: New test file.
---
 gcc/diagnostic.c   |  17 ++--
 gcc/input.c| 100 ++---
 gcc/input.h|   3 +-
 .../c-c++-common/cpp/warning-zero-in-literals-1.c  | Bin 0 -> 240 bytes
 4 files changed, 99 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 36094a1..0ca7081 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -259,12 +259,13 @@ diagnostic_build_prefix (diagnostic_context *context,
MAX_WIDTH by some margin, then adjust the start of the line such
that the COLUMN is smaller than MAX_WIDTH minus the margin.  The
margin is either 10 characters or the difference between the column
-   and the length of the line, whatever is smaller.  */
+   and the length of the line, whatever is smaller.  The length of
+   LINE is given by LINE_WIDTH.  */
 static const char *
-adjust_line (const char *line, int max_width, int *column_p)
+adjust_line (const char *line, int line_width,
+int max_width, int *column_p)
 {
   int right_margin = 10;
-  int line_width = strlen (line);
   int column = *column_p;
 
   right_margin = MIN (line_width - column, right_margin);
@@ -284,6 +285,7 @@ diagnostic_show_locus (diagnostic_context * context,
   const diagnostic_info *diagnostic)
 {
   const char *line;
+  int line_width;
   char *buffer;
   expanded_location s;
   int max_width;
@@ -297,22 +299,25 @@ diagnostic_show_locus 
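
A minimal sketch of the get_line idea described in the ChangeLog above
(illustrative names and shape only; the real implementation is the one in
gcc/input.c, and XNEWVEC/XRESIZEVEC come from libiberty):

/* Read one line from FP into a growing static buffer and return its
   length through LINE_LEN, so that callers need not use strlen, which
   would stop at an embedded '\0'.  */
static char *
get_line (FILE *fp, size_t *line_len)
{
  static char *buf;
  static size_t alloc;
  size_t len = 0;
  int c;

  if (buf == NULL)
    {
      alloc = 128;
      buf = XNEWVEC (char, alloc);
    }
  while ((c = getc (fp)) != EOF && c != '\n')
    {
      if (len + 1 >= alloc)
	{
	  alloc *= 2;
	  buf = XRESIZEVEC (char, buf, alloc);
	}
      buf[len++] = (char) c;	/* embedded '\0' bytes are kept */
    }
  if (len == 0 && c == EOF)
    return NULL;
  *line_len = len;
  return buf;
}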

Re: PATCH to use -Wno-format during stage1

2013-11-04 Thread Paolo Carlini

On 11/04/2013 04:34 PM, Jason Merrill wrote:

On 11/04/2013 07:41 AM, Richard Biener wrote:

Please make sure the warning is preserved when not bootstrapping,
it's useful to have them in your dev-tree and not notice errors only
during -Werror bootstrap ...


I found it to be useless noise when not bootstrapping, since my system 
compiler isn't likely to recognize recently added % codes any time soon.

Likewise here.

Paolo.


Re: [omp4]

2013-11-04 Thread Jakub Jelinek
On Mon, Nov 04, 2013 at 08:19:12AM -0700, Aldy Hernandez wrote:
 Hi.
 
 While looking over some of your testcases I noticed that array
 subscripts are not being properly adjusted:
 
 foo(int i) {
   array[i] = 
 }
 
 The `i' should use the simd array magic instead of ICEing :).
 
 Is the attached patch OK for the branch?

I guess short time yes, but I wonder if it wouldn't be better to use
walk_gimple_op and do all changes in the callback.  Instead of passing
adjustments pass around a struct containing the adjustments, current stmt
and the modified flag.  You can use the val_only and is_lhs to determine
what you need to do (probably need to reset those two for the subtrees
to val_only = true, is_lhs = false and not walk subtrees of types) and you
could (if not val_only) immediately gimplify it properly (insert temporary
SSA_NAME setter before resp. store after depending on is_lhs).
Then you could avoid the regimplification.
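
In sketch form, the suggested restructuring might look like this
(hypothetical names, and with the val_only/is_lhs gimplification details
elided -- this is not the branch code):

struct simd_modify_ctx
{
  ipa_parm_adjustment_vec *adjustments;
  gimple stmt;		/* statement currently being rewritten */
  bool modified;	/* set when any operand was replaced */
};

static tree
simd_modify_op (tree *tp, int *walk_subtrees, void *data)
{
  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
  struct simd_modify_ctx *ctx = (struct simd_modify_ctx *) wi->info;

  if (TYPE_P (*tp))
    *walk_subtrees = 0;		/* do not walk subtrees of types */
  else if (DECL_P (*tp) || TREE_CODE (*tp) == SSA_NAME)
    /* A real version would consult wi->val_only and wi->is_lhs here and
       gimplify the replacement in place as described above.  */
    ctx->modified |= sra_ipa_modify_expr (tp, !wi->is_lhs, *ctx->adjustments);
  return NULL_TREE;
}

Each statement would then be visited once via walk_gimple_op (stmt,
simd_modify_op, &wi) with wi.info pointing at the context.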

Jakub


Re: PATCH to use -Wno-format during stage1

2013-11-04 Thread Jakub Jelinek
On Mon, Nov 04, 2013 at 10:34:55AM -0500, Jason Merrill wrote:
 On 11/04/2013 07:41 AM, Richard Biener wrote:
 Please make sure the warning is preserved when not bootstrapping,
 it's useful to have them in your dev-tree and not notice errors only
 during -Werror bootstrap ...
 
 I found it to be useless noise when not bootstrapping, since my
 system compiler isn't likely to recognize recently added % codes any
 time soon.

That's true, but it is easy to ignore those, while without -Wformat
you won't notice until you bootstrap that you actually passed incorrect
parameters to some *printf etc. family function.  Generally, in
non-bootstrap compilers I tend to ignore warnings except for the
file(s) I've been hacking on.

Jakub


Re: [PATCH, rs6000] (3/3) Fix mulv4si3 and mulv8hi3 patterns for little endian

2013-11-04 Thread Richard Sandiford
Bill Schmidt wschm...@linux.vnet.ibm.com writes:
 +   /* We need this to be vmulouh for both big and little endian,
 +  but for little endian we would swap this, so avoid that.  */
 +   if (BYTES_BIG_ENDIAN)
 + emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
 +   else
 + emit_insn (gen_vec_widen_umult_even_v8hi (low_product, one, two));

FWIW, an alternative would be to turn vec_widen_smult_{even,odd}_* into
define_expands and have define_insns for the underlying instructions.
E.g. vec_widen_umult_even_v16qi could call gen_vmuleub or gen_vmuloub
depending on endianness.

Then the unspec name would always match the instruction, and you could
also use gen_vmulouh rather than gen_vec_widen_umult_*_v8hi above.

It probably works out as more code overall, but maybe it means jumping
through fewer mental hoops...

Thanks,
Richard



Re: [C++ PATCH] Fix PR58979

2013-11-04 Thread Marek Polacek
On Mon, Nov 04, 2013 at 10:36:35AM -0500, Jason Merrill wrote:
 On 11/04/2013 05:50 AM, Marek Polacek wrote:
   case RO_ARROW:
 +case RO_ARROW_STAR:
 error_at (loc,
  "invalid type argument of %<->%> (have %qT)",
  type);
 
 I think the diagnostic for RO_ARROW_STAR should say ->*, not just ->.

Ah, sorry about that.  What about this one?

2013-11-04  Marek Polacek  pola...@redhat.com

PR c++/58979
c-family/
* c-common.c (invalid_indirection_error): Handle RO_ARROW_STAR case.
testsuite/
* g++.dg/diagnostic/pr58979.C: New test.

--- gcc/c-family/c-common.c.mp  2013-11-04 16:44:48.674023654 +0100
+++ gcc/c-family/c-common.c 2013-11-04 16:45:52.632267904 +0100
@@ -9934,6 +9934,11 @@ invalid_indirection_error (location_t lo
"invalid type argument of %<->%> (have %qT)",
type);
   break;
+case RO_ARROW_STAR:
+  error_at (loc,
+   "invalid type argument of %<->*%> (have %qT)",
+   type);
+  break;
 case RO_IMPLICIT_CONVERSION:
   error_at (loc,
invalid type argument of implicit conversion (have %qT),
--- gcc/testsuite/g++.dg/diagnostic/pr58979.C.mp2   2013-11-04 
11:09:30.515553664 +0100
+++ gcc/testsuite/g++.dg/diagnostic/pr58979.C   2013-11-04 11:09:26.505538785 
+0100
@@ -0,0 +1,4 @@
+// PR c++/58979
+// { dg-do compile }
+
+int i = 0->*0; // { dg-error "invalid type argument of" }

Marek


Re: [C++ PATCH] Fix PR58979

2013-11-04 Thread Jason Merrill

OK.

Jason


Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-11-04 Thread David Edelsohn
Balaji,

I am seeing a large number of libcilkrts failures on AIX.  These all
are of the form:

Executing on host: /tmp/20131103/gcc/xgcc -B/tmp/20131103/gcc/ /nasfarm/edelsohn
/src/src/gcc/testsuite/c-c++-common/cilk-plus/CK/sync_wo_spawn.c  -fno-diagnosti
cs-show-caret -fdiagnostics-color=never   -O0 -flto -g -fcilkplus -L/tmp/2013110
3/powerpc-ibm-aix7.1.0.0/./libcilkrts/.libs -fcilkplus -S  -o sync_wo_spawn.s
 (timeout = 300)
cc1: error: LTO support has not been enabled in this configuration
compiler exited with status 1

If the feature / functionality requires LTO, the tests should check
that LTO is supported with check_effective_target_lto, either in each
relevant test or in cilk-plus.exp.

Thanks, David


patch to fix pr58968

2013-11-04 Thread Vladimir Makarov
  The following patch fixes

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58968


  It is actually the 3rd try to fix the problem.  Removing no-op moves
may result in mode-switching crashes on x86/x86-64 as it expects
specific code (by the way, reload has the same problem on x86/x86-64).

  The patch was successfully bootstrapped and tested on x86/x86-64 and
ppc64 (with LRA).

  Committed as rev. 204353.

2013-11-04  Vladimir Makarov  vmaka...@redhat.com

PR rtl-optimization/58968
* lra-spills.c (return_regno_p): New function.
  (lra_final_code_change): Use it.

2013-11-04  Vladimir Makarov  vmaka...@redhat.com

PR rtl-optimization/58968
* gfortran.dg/pr58968.f: New

Index: lra-spills.c
===
--- lra-spills.c	(revision 204352)
+++ lra-spills.c	(working copy)
@@ -618,6 +618,33 @@ alter_subregs (rtx *loc, bool final_p)
   return res;
 }
 
+/* Return true if REGNO is used for return in the current
+   function.  */
+static bool
+return_regno_p (unsigned int regno)
+{
+  rtx outgoing = crtl->return_rtx;
+
+  if (! outgoing)
+return false;
+
+  if (REG_P (outgoing))
+return REGNO (outgoing) == regno;
+  else if (GET_CODE (outgoing) == PARALLEL)
+{
+  int i;
+
+  for (i = 0; i < XVECLEN (outgoing, 0); i++)
+	{
+	  rtx x = XEXP (XVECEXP (outgoing, 0, i), 0);
+
	  if (REG_P (x) && REGNO (x) == regno)
+	return true;
+	}
+}
+  return false;
+}
+
 /* Final change of pseudos got hard registers into the corresponding
hard registers and removing temporary clobbers.  */
 void
@@ -625,7 +652,7 @@ lra_final_code_change (void)
 {
   int i, hard_regno;
   basic_block bb;
-  rtx insn, curr, set;
+  rtx insn, curr;
   int max_regno = max_reg_num ();
 
   for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
@@ -636,7 +663,6 @@ lra_final_code_change (void)
 FOR_BB_INSNS_SAFE (bb, insn, curr)
   if (INSN_P (insn))
 	{
-	  bool change_p;
 	  rtx pat = PATTERN (insn);
 
	  if (GET_CODE (pat) == CLOBBER && LRA_TEMP_CLOBBER_P (pat))
@@ -649,12 +675,24 @@ lra_final_code_change (void)
 	  continue;
 	}
 
-	  set = single_set (insn);
-	  change_p = (set != NULL
-		      && REG_P (SET_SRC (set)) && REG_P (SET_DEST (set))
-		      && REGNO (SET_SRC (set)) >= FIRST_PSEUDO_REGISTER
-		      && REGNO (SET_DEST (set)) >= FIRST_PSEUDO_REGISTER);
-	  
+	  /* IRA can generate move insns involving pseudos.  It is
+	 better remove them earlier to speed up compiler a bit.
+	 It is also better to do it here as they might not pass
+	 final RTL check in LRA, (e.g. insn moving a control
+	 register into itself).  So remove an useless move insn
+	 unless next insn is USE marking the return reg (we should
+	 save this as some subsequent optimizations assume that
+	 such original insns are saved).  */
+	  if (NONJUMP_INSN_P (insn) && GET_CODE (pat) == SET
+	      && REG_P (SET_SRC (pat)) && REG_P (SET_DEST (pat))
+	      && REGNO (SET_SRC (pat)) == REGNO (SET_DEST (pat))
+	      && ! return_regno_p (REGNO (SET_SRC (pat))))
+	{
+	  lra_invalidate_insn_data (insn);
+	  delete_insn (insn);
+	  continue;
+	}
+	
 	  lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
	  struct lra_static_insn_data *static_id = id->insn_static_data;
 	  bool insn_change_p = false;
@@ -668,20 +706,5 @@ lra_final_code_change (void)
 	  }
 	  if (insn_change_p)
 	lra_update_operator_dups (id);
-
-	  if (change_p && REGNO (SET_SRC (set)) == REGNO (SET_DEST (set)))
-	{
-	  /* Remove an useless move insn but only involving
-		 pseudos as some subsequent optimizations are based on
-		 that move insns involving originally hard registers
-		 are preserved.  IRA can generate move insns involving
-		 pseudos.  It is better remove them earlier to speed
-		 up compiler a bit.  It is also better to do it here
-		 as they might not pass final RTL check in LRA,
-		 (e.g. insn moving a control register into
-		 itself).  */
-	  lra_invalidate_insn_data (insn);
-	  delete_insn (insn);
-	}
 	}
 }
Index: testsuite/gfortran.dg/pr58968.f
===
--- testsuite/gfortran.dg/pr58968.f	(revision 0)
+++ testsuite/gfortran.dg/pr58968.f	(working copy)
@@ -0,0 +1,96 @@
+C PR rtl-optimization/58968.f
+C { dg-do compile { target powerpc*-*-*} }
+C { dg-options -mcpu=power7 -O3 -w -ffast-math  -funroll-loops }
+  SUBROUTINE MAKTABS(IW,SOME,LBOX1,LBOX2,LBOX3,NSPACE,NA,NB,
+ *LBST,X,
+ *NX,IAMA,IAMI,IBMA,IBMI,MNUM,IDIM,MSTA,IBO,
+ *IDSYM,ISYM1,NSYM,
+ *NACT,LWRK,KTAB,LGMUL,
+ *LCON,LCOA,LCOB,
+ *LANDET,LBNDET,NAST,NBST,LSYMA,LSYMB,LGCOM,
+ *MINI,MAXI,LSPA,LSPB,LDISB,
+ *LSAS,LSBS,LSAC,LSBC,
+ *ITGA,ITGB,IAST,IBST,NCI,NA1EX,NB1EX,FDIRCT)
+  IMPLICIT DOUBLE PRECISION(A-H,O-Z)
+  LOGICAL SOME
+  

Re: [omp4]

2013-11-04 Thread Jakub Jelinek
On Mon, Nov 04, 2013 at 09:22:39AM -0700, Aldy Hernandez wrote:
 Ok, then let's leave this for later.  I'm following what we've done
 in tree-sra.c to be consistent.  Perhaps later we could even clean
 up ipa_sra_modify_function_body in tree-sra.c.  But for now I think
 we can concentrate on the basic functionality.

Well, ipa_sra_modify_function_body doesn't need to regimplify, unlike
the omp-low.c version thereof (supposedly because SRA only modifies
loads/stores, but doesn't change SSA_NAMEs on arithmetics etc.).
But yeah, this isn't requirement for hacking on this on gomp-4_0-branch, but
it would be nice to clean it up before merging to trunk.

 Committing to omp4 branch.

Thanks.

Jakub


Re: Aliasing: look through pointer's def stmt

2013-11-04 Thread Marc Glisse

On Mon, 4 Nov 2013, Richard Biener wrote:


Well, host_integer_p (, 0) looks correct to me.  But ...


Ok, I'll put it back.


Index: gcc/tree-ssa-alias.h
===
--- gcc/tree-ssa-alias.h(revision 204267)
+++ gcc/tree-ssa-alias.h(working copy)
@@ -139,30 +139,30 @@ extern void pt_solution_set_var (struct

 extern void dump_pta_stats (FILE *);

 extern GTY(()) struct pt_solution ipa_escaped_pt;

 /* Return true, if the two ranges [POS1, SIZE1] and [POS2, SIZE2]
overlap.  SIZE1 and/or SIZE2 can be (unsigned)-1 in which case the
range is open-ended.  Otherwise return false.  */

 static inline bool
-ranges_overlap_p (unsigned HOST_WIDE_INT pos1,
- unsigned HOST_WIDE_INT size1,
- unsigned HOST_WIDE_INT pos2,
- unsigned HOST_WIDE_INT size2)
+ranges_overlap_p (HOST_WIDE_INT pos1,
+ HOST_WIDE_INT size1,
+ HOST_WIDE_INT pos2,
+ HOST_WIDE_INT size2)


I think size[12] should still be unsigned (we don't allow negative
sizes but only the special value -1U which we special-case only to
avoid pos + size to wrap)


But all we do with size[12] is compare it to -1 and perform arithmetic 
with pos[12]. At least for the second one, we need to cast size to a 
signed type so the comparisons make sense (unless we decide to use a 
double_int there). So I thought I would do the cast in the argument. 
Otherwise, I'll start the function with:

HOST_WIDE_INT ssize1 = (HOST_WIDE_INT)size1;
and replace size1 with ssize1 through the function.

Is the reason of keeping size[12] unsigned for documentation? Or am I 
missing a reason why I may be breaking things?
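
For reference, the unsigned convention under discussion has roughly this
shape (paraphrased from tree-ssa-alias.h; size == -1 encodes the
open-ended range and is special-cased so pos + size cannot wrap):

static inline bool
ranges_overlap_p (unsigned HOST_WIDE_INT pos1, unsigned HOST_WIDE_INT size1,
		  unsigned HOST_WIDE_INT pos2, unsigned HOST_WIDE_INT size2)
{
  if (pos1 >= pos2
      && (size2 == (unsigned HOST_WIDE_INT) -1 || pos1 < pos2 + size2))
    return true;
  if (pos2 >= pos1
      && (size1 == (unsigned HOST_WIDE_INT) -1 || pos2 < pos1 + size1))
    return true;
  return false;
}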


--
Marc Glisse


[C++ Patch, Java related/RFC] PR 11006

2013-11-04 Thread Paolo Carlini

Hi,

we have got this very old bug, where we ICE in build_java_class_ref 
because TYPE is an INTEGER_TYPE and we try to use TYPE_FIELDS on it. 
Shall we apply something like the below and resolve it for good? 
Alternately, we could maybe provide the same message we currently 
provide in release-builds, when we get to:


if (!field)
  {
    error ("can%'t find %<class$%> in %qT", type);
return error_mark_node;
  }

Thanks!
Paolo.

//

Index: cp/init.c
===
--- cp/init.c   (revision 204353)
+++ cp/init.c   (working copy)
@@ -2461,9 +2461,16 @@ build_new_1 (vectree, va_gc **placement, tree ty
   if (vec_safe_is_empty (*placement)  TYPE_FOR_JAVA (elt_type))
 {
   tree class_addr;
-  tree class_decl = build_java_class_ref (elt_type);
+  tree class_decl;
   static const char alloc_name[] = _Jv_AllocObject;
 
+  if (!MAYBE_CLASS_TYPE_P (elt_type))
+   {
+ error ("%qT isn%'t a valid Java class type", elt_type);
+ return error_mark_node;
+   }
+
+  class_decl = build_java_class_ref (elt_type);
   if (class_decl == error_mark_node)
return error_mark_node;
 
Index: testsuite/g++.dg/other/java3.C
===
--- testsuite/g++.dg/other/java3.C  (revision 0)
+++ testsuite/g++.dg/other/java3.C  (working copy)
@@ -0,0 +1,7 @@
+// PR c++/11006
+
+typedef int* jclass;
+
+void foo () {
+  new __java_boolean;  // { dg-error "valid" }
+}


Re: libsanitizer merge from upstream r191666

2013-11-04 Thread Jack Howarth
On Mon, Nov 04, 2013 at 06:47:13AM -0800, Konstantin Serebryany wrote:
 +glider, our Mac expert.
 
 Hi Jack,
 
 This patch has not been tested on Mac and we generally do not claim
 that gcc-asan is supported on Mac.
 clang-asan *is* supported on Mac and our bots are green (this patch is
 a merge of the sources which are regularly tested on Mac,
 but the build procedure is different).
 
 The failing assertion is weird, I haven't seen it since last year
 (https://code.google.com/p/address-sanitizer/issues/detail?id=117),
 but Alexander should know more.
 
 What is your version of Mac OS?

I am testing on MacOS X 10.8.5 with FSF gcc trunk built as...

% gcc-fsf-4.9 -v
Using built-in specs.
COLLECT_GCC=gcc-fsf-4.9
COLLECT_LTO_WRAPPER=/sw/lib/gcc4.9/libexec/gcc/x86_64-apple-darwin12.5.0/4.9.0/lto-wrapper
Target: x86_64-apple-darwin12.5.0
Configured with: ../gcc-4.9-20131101/configure --prefix=/sw 
--prefix=/sw/lib/gcc4.9 --mandir=/sw/share/man --infodir=/sw/lib/gcc4.9/info 
--enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw 
--with-libiconv-prefix=/sw --with-isl=/sw --with-cloog=/sw --with-mpc=/sw 
--with-system-zlib --enable-checking=yes --x-includes=/usr/X11R6/include 
--x-libraries=/usr/X11R6/lib --program-suffix=-fsf-4.9
Thread model: posix
gcc version 4.9.0 20131101 (experimental) (GCC) 

 
 --kcc
 
 On Sat, Nov 2, 2013 at 10:25 AM, Jack Howarth howa...@bromo.med.uc.edu 
 wrote:
  On Fri, Nov 01, 2013 at 04:13:05PM -0700, Konstantin Serebryany wrote:
  Jukub,
 
  This works!
  New patch attached.
 
  Konstantin,
 This patch, when applied on top of r204305, bootstraps fine on 
  x86_64-apple-darwin12 but
  produces a lot of regressions with...
 
  make -k check RUNTESTFLAGS=asan.exp --target_board=unix'{-m64}'
 
  Native configuration is x86_64-apple-darwin12.5.0
 
  === g++ tests ===
 
 
  Running target unix/-m64
  FAIL: c-c++-common/asan/global-overflow-1.c  -O0  execution test
  FAIL: c-c++-common/asan/global-overflow-1.c  -O1  execution test
  FAIL: c-c++-common/asan/global-overflow-1.c  -O2  execution test
  FAIL: c-c++-common/asan/global-overflow-1.c  -O3 -fomit-frame-pointer  
  execution test
  FAIL: c-c++-common/asan/global-overflow-1.c  -O3 -g  execution test
  FAIL: c-c++-common/asan/global-overflow-1.c  -Os  execution test
  FAIL: c-c++-common/asan/global-overflow-1.c  -O2 -flto -flto-partition=none 
   execution test
  FAIL: c-c++-common/asan/global-overflow-1.c  -O2 -flto  execution test
  FAIL: c-c++-common/asan/heap-overflow-1.c  -O0  execution test
  FAIL: c-c++-common/asan/heap-overflow-1.c  -O1  execution test
  FAIL: c-c++-common/asan/heap-overflow-1.c  -O2  execution test
  FAIL: c-c++-common/asan/heap-overflow-1.c  -O3 -fomit-frame-pointer  
  execution test
  FAIL: c-c++-common/asan/heap-overflow-1.c  -O3 -g  execution test
  FAIL: c-c++-common/asan/heap-overflow-1.c  -Os  execution test
  FAIL: c-c++-common/asan/heap-overflow-1.c  -O2 -flto -flto-partition=none  
  execution test
  FAIL: c-c++-common/asan/heap-overflow-1.c  -O2 -flto  execution test
  FAIL: c-c++-common/asan/memcmp-1.c  -O0  execution test
  FAIL: c-c++-common/asan/memcmp-1.c  -O1  execution test
  FAIL: c-c++-common/asan/memcmp-1.c  -O2  execution test
  FAIL: c-c++-common/asan/memcmp-1.c  -O3 -fomit-frame-pointer  execution test
  FAIL: c-c++-common/asan/memcmp-1.c  -O3 -g  execution test
  FAIL: c-c++-common/asan/memcmp-1.c  -Os  execution test
  FAIL: c-c++-common/asan/memcmp-1.c  -O2 -flto -flto-partition=none  
  execution test
  FAIL: c-c++-common/asan/memcmp-1.c  -O2 -flto  execution test
  FAIL: c-c++-common/asan/null-deref-1.c  -O0  execution test
  FAIL: c-c++-common/asan/null-deref-1.c  -O1  execution test
  FAIL: c-c++-common/asan/null-deref-1.c  -O2  execution test
  FAIL: c-c++-common/asan/null-deref-1.c  -O3 -fomit-frame-pointer  execution 
  test
  FAIL: c-c++-common/asan/null-deref-1.c  -O3 -g  execution test
  FAIL: c-c++-common/asan/null-deref-1.c  -Os  execution test
  FAIL: c-c++-common/asan/null-deref-1.c  -O2 -flto -flto-partition=none  
  execution test
  FAIL: c-c++-common/asan/null-deref-1.c  -O2 -flto  execution test
  FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O0  execution test
  FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O1  execution test
  FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O2  execution test
  FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O3 -fomit-frame-pointer  
  execution test
  FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O3 -g  execution test
  FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -Os  execution test
  FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O2 -flto 
  -flto-partition=none  execution test
  FAIL: c-c++-common/asan/sanity-check-pure-c-1.c  -O2 -flto  execution test
  FAIL: c-c++-common/asan/sleep-before-dying-1.c  -O2  execution test
  FAIL: c-c++-common/asan/sleep-before-dying-1.c  -O2 -flto 
  -flto-partition=none  execution test
  FAIL: 

Re: PR 58958: wrong aliasing info

2013-11-04 Thread Marc Glisse

On Mon, 4 Nov 2013, Richard Biener wrote:


On Mon, Nov 4, 2013 at 12:18 PM, Marc Glisse marc.gli...@inria.fr wrote:

On Mon, 4 Nov 2013, Richard Biener wrote:


On Mon, Nov 4, 2013 at 11:55 AM, Richard Biener
richard.guent...@gmail.com wrote:


On Fri, Nov 1, 2013 at 11:39 PM, Marc Glisse marc.gli...@inria.fr
wrote:


Hello,

the issue was described in the PR and the message linked from there.
ao_ref_init_from_ptr_and_size calls get_ref_base_and_extent, which may
detect an array_ref of variable index, but ao_ref_init_from_ptr_and_size
never learns of it and uses the offset+size as if they were meaningful.



Well...  it certainly learns of it, but it chooses to ignore...

   if (TREE_CODE (ptr) == ADDR_EXPR)
-ref->base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
-&ref->offset, &t1, &t2);
+{
+  ref->base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (ptr, 0),
+  &t, 0, &safe);
+  ref->offset = BITS_PER_UNIT * t;
+}

safe == (t1 != -1 && t1 == t2)

note that ao_ref_init_from_ptr_and_size gets the size fed in as argument
 so I fail to see why it matters at all ...?  That is, if you feed in a wrong
 size then it's the caller's error.



I think one issue is that the above code uses get_ref_base_and_extent
on an address.  It should have used get_addr_base_and_unit_offset.



Isn't that what my patch does? Except that get_addr_base_and_unit_offset
often gives up and returns NULL_TREE, whereas I believe we still want a base
even if we have trouble determining a constant offset, so I modified
get_addr_base_and_unit_offset_1 a bit.

(not saying the result is pretty...)


I don't think we want get_addr_base_and_unit_offset to return non-NULL
for a non-constant offset.  But yes, inside your patch is the correct fix,
so if you drop changing get_addr_base_and_unit_offset ...


That means that in a number of cases, ao_ref_init_from_ptr_and_size will 
produce a meaningless ao_ref (both .ref and .base are 0). I expect that 
will cause regressions in code quality at least. Or did you mean something 
else?


get_inner_reference might be an option too (we have quite a few functions 
doing similar things).


--
Marc Glisse


Re: [C++ Patch, Java related/RFC] PR 11006

2013-11-04 Thread Jason Merrill
Surely it should be valid to allocate a Java boolean type.  Andrew, how 
should that work?


Jason


Re: PATCH to use -Wno-format during stage1

2013-11-04 Thread Jason Merrill

On 11/04/2013 10:50 AM, Jakub Jelinek wrote:

That's true, but it is easy to ignore those, while without -Wformat
you won't notice until you bootstrap that you actually passed incorrect
parameters to some *printf etc. family function.


Sure, but that is much less common than the false positives, and will be 
caught anyway when you actually bootstrap (or rebuild stage 3).


Jason



Re: [Patch, avr] Emit diagnostics if -f{pic,PIC,pie,PIE} or -shared is passed

2013-11-04 Thread Joerg Wunsch
As Senthil Kumar Selvaraj wrote:

 If ok, could someone commit please? I don't have commit access.

I don't have commit access either, but the changes look reasonable to me.
-- 
Joerg Wunsch * Development engineer, Dresden, Germany

Atmel Automotive GmbH, Theresienstrasse 2, D-74027 Heilbronn
Geschäftsführung: Steven Laub, Steve Skaggs, Dr. Matthias Kästner
Amtsgericht Stuttgart, Registration HRB 106594


Re: [PATCH, rs6000] (2/3) Fix widening multiply high/low operations for little endian

2013-11-04 Thread Bill Schmidt
Per Richard S's suggestion, I'm reworking parts 1 and 3 of the patch
set, but this one will remain unchanged and is ready for review.

Thanks,
Bill

On Sun, 2013-11-03 at 23:34 -0600, Bill Schmidt wrote:
 Hi,
 
 This patch fixes the widening multiply high/low operations to work
 correctly in the presence of the first patch of this series, which
 reverses the meanings of multiply even/odd instructions.  Here we
 reorder the input operands to the vector merge low/high instructions.
 
 The general rule is that vmrghh(x,y) [BE] = vmrglh(y,x) [LE], and so on;
 that is, we need to reverse the usage of merge high and merge low, and
 also swap their inputs, to obtain the same semantics.  In this case we
 are only swapping the inputs, because the reversed usage of high and low
 has already been done for us in the generic handling code for
 VEC_WIDEN_MULT_LO_EXPR.
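
To make the lane bookkeeping concrete, here is a small self-contained
check of that identity (illustrative C only, not part of the patch;
reversing an array models viewing the same register with the other
endianness's lane numbering):

#include <assert.h>

static void mrgh (const int *x, const int *y, int *o)	/* BE merge high */
{ o[0] = x[0]; o[1] = y[0]; o[2] = x[1]; o[3] = y[1]; }

static void mrgl (const int *x, const int *y, int *o)	/* BE merge low */
{ o[0] = x[2]; o[1] = y[2]; o[2] = x[3]; o[3] = y[3]; }

static void rev (const int *v, int *o)			/* other-endian view */
{ int i; for (i = 0; i < 4; i++) o[i] = v[3 - i]; }

int main (void)
{
  int x[4] = {0, 1, 2, 3}, y[4] = {4, 5, 6, 7};
  int a[4], ra[4], rx[4], ry[4], b[4];
  int i;

  mrgh (x, y, a);		/* merge high under BE numbering */
  rev (a, ra);
  rev (x, rx);
  rev (y, ry);
  mrgl (ry, rx, b);		/* merge low, inputs swapped, LE view */
  for (i = 0; i < 4; i++)
    assert (ra[i] == b[i]);	/* vmrgh(x,y) [BE] == vmrgl(y,x) [LE] */
  return 0;
}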
 
 Bootstrapped and tested with the rest of the patch set on
 powerpc64{,le}-unknown-linux-gnu, with no regressions.  Is this ok for
 trunk?
 
 Thanks,
 Bill
 
 
 2013-11-03  Bill Schmidt  wschm...@linux.vnet.ibm.com
 
   * config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap
   arguments to merge instruction for little endian.
   (vec_widen_umult_lo_v16qi): Likewise.
   (vec_widen_smult_hi_v16qi): Likewise.
   (vec_widen_smult_lo_v16qi): Likewise.
   (vec_widen_umult_hi_v8hi): Likewise.
   (vec_widen_umult_lo_v8hi): Likewise.
   (vec_widen_smult_hi_v8hi): Likewise.
   (vec_widen_smult_lo_v8hi): Likewise.
 
 
 Index: gcc/config/rs6000/altivec.md
 ===
 --- gcc/config/rs6000/altivec.md  (revision 204192)
 +++ gcc/config/rs6000/altivec.md  (working copy)
 @@ -2185,7 +2235,10 @@

emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
 -  emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
 +  if (BYTES_BIG_ENDIAN)
 +emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
 +  else
 +emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
DONE;
  })
 
 @@ -2202,7 +2255,10 @@

emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
 -  emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
 +  if (BYTES_BIG_ENDIAN)
 +emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
 +  else
 +emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
DONE;
  })
 
 @@ -2219,7 +2275,10 @@

emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
 -  emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
 +  if (BYTES_BIG_ENDIAN)
 +emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
 +  else
 +emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
DONE;
  })
 
 @@ -2236,7 +2295,10 @@

emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
 -  emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
 +  if (BYTES_BIG_ENDIAN)
 +emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
 +  else
 +emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
DONE;
  })
 
 @@ -2253,7 +2315,10 @@

emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
 -  emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
 +  if (BYTES_BIG_ENDIAN)
 +emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
 +  else
 +emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
DONE;
  })
 
 @@ -2270,7 +2335,10 @@

emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
 -  emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
 +  if (BYTES_BIG_ENDIAN)
 +emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
 +  else
 +emit_insn (gen_altivec_vmrglw (operands[0], vo, ve));
DONE;
  })
 
 @@ -2287,7 +2355,10 @@

emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
 -  emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
 +  if (BYTES_BIG_ENDIAN)
 +emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
 +  else
 +emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
DONE;
  })
 
 @@ -2304,7 +2375,10 @@

emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
 -  emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
 +  if (BYTES_BIG_ENDIAN)
 +emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
 +  else
 +

Re: Patch RFA: With -fnon-call-exceptions sync builtins may throw

2013-11-04 Thread Richard Biener
Ian Lance Taylor i...@google.com wrote:
On Mon, Nov 4, 2013 at 3:52 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Mon, Nov 4, 2013 at 7:01 AM, Ian Lance Taylor i...@google.com
wrote:
 The middle-end currently marks all the sync builtins as nothrow.  That
 is incorrect for most of them when using -fnon-call-exceptions.  The
 -fnon-call-exceptions option means that instructions that trap may throw
 exceptions.  Most of the sync builtins access memory via pointers passed
 by user code.  Therefore, they may trap.  (I noticed this because Go
 uses -fnon-call-exceptions.)

 This patch fixes the problem by introducing a new internal function
 attribute "nothrow call".  The new attribute is exactly like "nothrow",
 except that it is ignored when using -fnon-call-exceptions.

 It is an internal attribute because there is no reason for a user to use
 this.  The problem only arises when the middle-end sees a function call
 that the backend turns into an instruction sequence that directly
 references memory.  That can only happen for functions that are built in
 to the compiler.

 This requires changes to the C family, LTO, and Ada frontends.  I think
 I'm handling the option correctly for LTO, but it would be good if
 somebody could check the logic for me.

 Bootstrapped and ran tests on x86_64-unknown-linux-gnu.  OK for
 mainline?

 Hmm.  I think you can handle this in a simpler way by just
 doing sth similar to

 #define ATTR_MATHFN_FPROUNDING_ERRNO (flag_errno_math ? \
 ATTR_NOTHROW_LEAF_LIST : ATTR_MATHFN_FPROUNDING)

 that is, conditionally drop the _NOTHROW part.

Good point.

 Btw, I also think
 that if it can throw then it isn't LEAF (it may return exceptionally
 into another function in the unit).

According to the docs, a leaf function is permitted to return by
throwing an exception.

 Also this whole conditional-on-a-flag thing doesn't work well with
 using -fnon-call-exceptions in the optimize attribute or with
 different -fnon-call-exceptions settings in different TUs we LTO
 together.  Because we only have a single builtin decl (so for
 proper operation you'd have to clone builtin decls based on
 the matrix of flags that generate different attributes and use the
 correct one depending on context).

Yes, but as you know this problem already exists with -fexceptions.
I'm not really making it worse.

 That said, I'd prefer a simpler approach for now, mimicking
 flag_errno_math handling.

Done.  Attached.  Bootstrapped on x86_64-unknown-linux-gnu, running
tests now.  OK for mainline if the tests pass?

Ok

Thanks,
Richard

Ian


gcc/ChangeLog:

2013-11-04  Ian Lance Taylor  i...@google.com

   * builtins.def (ATTR_NOTHROWCALL_LEAF_LIST): Define.
   * sync-builtins.def: Use ATTR_NOTHROWCALL_LEAF_LIST for all sync
   builtins that take pointers.
   * lto-opts.c (lto_write_options): Write -fnon-call-exceptions
   if set.
   * lto-wrapper.c (merge_and_complain): Collect
   OPT_fnon_call_exceptions.
   (run_gcc): Pass -fnon-call-exceptions.

gcc/testsuite/ChangeLog:

2013-11-03  Ian Lance Taylor  i...@google.com

   * g++.dg/ext/sync-4.C: New test.
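
Judging from the ChangeLog, the new ATTR_NOTHROWCALL_LEAF_LIST macro
presumably follows the flag_errno_math precedent quoted earlier in the
thread, roughly (a sketch, not the verified patch text):

/* In builtins.def: drop the nothrow part when -fnon-call-exceptions is
   in effect, so the pointer-taking sync builtins may throw.  */
#define ATTR_NOTHROWCALL_LEAF_LIST (flag_non_call_exceptions ? \
	ATTR_LEAF_LIST : ATTR_NOTHROW_LEAF_LIST)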




Re: lto-plugin: mismatch between ld's architecture and GCC's configure --host

2013-11-04 Thread Cary Coutant
 Ping.  To sum it up, with these patches applied, there are no changes for
 a regular build (not using the new configure options).  On the other
 hand, configuring GCC as described, it is possible to use the 32-bit x86
 linker for/with a x86_64 build, and get the very same GCC test results as
 when using a x86_64 linker.

Sorry, I've been hoping someone with more global approval authority
would respond.

 Allow overriding the libiberty used for building the LTO plugin.

 lto-plugin/
 * configure.ac (--with-libiberty): New configure option.
 * configure: Regenerate.
 * Makefile.am (libiberty, libiberty_pic): New variables.
 (liblto_plugin_la_LIBADD, liblto_plugin_la_LDFLAGS)
 (liblto_plugin_la_DEPENDENCIES): Use them.
 * Makefile.in: Regenerate.

These look OK to me.

 Allow for overriding a module's srcdir.

 * Makefile.tpl (configure-[+prefix+][+module+])
 (configure-stage[+id+]-[+prefix+][+module+]): Allow for
 overriding a module's srcdir.
 * Makefile.in: Regenerate.

These look OK, but I think a global maintainer or build machinery
maintainer should give approval.

 Non-host system configuration for linker plugins.

 * configure.ac (--enable-linker-plugin-flags)
 (--enable-linker-plugin-configure-flags): New flags.
 (configdirs): Conditionally add libiberty-linker-plugin.
 * configure: Regenerate.
 * Makefile.def (host_modules): Add libiberty-linker-plugin.
 (host_modules) lto-plugin: Pay attention to
 @extra_linker_plugin_flags@ and
 @extra_linker_plugin_configure_flags@.
 (all-lto-plugin): Also depend on all-libiberty-linker-plugin.
 * Makefile.in: Regenerate.
 gcc/
 * doc/install.texi (--enable-linker-plugin-flags)
 (--enable-linker-plugin-configure-flags): Document new flags.

Same here.

 GNU ld can use linker plugins, too.

 gcc/
 * doc/sourcebuild.texi (Top Level) lto-plugin: GNU ld can use
 linker plugins, too.

OK.

 Fix typo.

 * Makefile.tpl: Fix typo.
 * Makefile.in: Regenerate.

OK (qualifies as trivial and obvious).

-cary


Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)

2013-11-04 Thread Jeff Law

On 11/01/13 15:41, Jakub Jelinek wrote:

On Fri, Nov 01, 2013 at 05:36:34PM -0400, Andrew MacLeod wrote:

   static inline void
! gimple_call_set_lhs (gimple gs, tree lhs)
   {
-   GIMPLE_CHECK (gs, GIMPLE_CALL);
 gimple_set_op (gs, 0, lhs);
to
 static inline void
! gimple_call_set_lhs (gimple_statement_call *gs, tree lhs)
   {
 gimple_set_op (gs, 0, lhs);


but then every location that calls it needs an appropriate change:

!   gimple call;
!   call = gimple_build_call_vec (build_fold_addr_expr_loc (0,
alias), vargs);
 gimple_call_set_lhs (call, atree);

--- 1518,1524 

!   gimple_statement_call *call;
!   call = as_a <gimple_statement_call> (gimple_build_call_vec
(build_fold_addr_expr_loc (0, alias), vargs));
 gimple_call_set_lhs (call, atree);

And in fact there is a ripple effect to then change
gimple_build_call_vec to simply return a gimple_statement_call *...
Then this doesn't look as ugly either...

!   gimple_statement_call *call;
!   call = gimple_build_call_vec (build_fold_addr_expr_loc (0,
alias), vargs);
 gimple_call_set_lhs (call, atree);

that is looking much better :-)


Do you seriously think this is an improvement?  The cost of changing
the --enable-checking=yes cost to compile time checking in either
case sounds way too high to me.  Please don't.
?!?  One of the things we're really trying to do here is be type safe 
and use the type system to check for things at compile time rather than 
with runtime checks.


jeff


Re: [PATCH] Try to avoid vector mode punning in SET_DEST on i?86

2013-11-04 Thread James Greenhalgh
On Thu, Oct 31, 2013 at 04:49:47PM +, Jakub Jelinek wrote:

 2013-10-31  Jakub Jelinek  ja...@redhat.com
 
 * optabs.c (expand_vec_perm): Avoid vector mode punning
 SUBREGs in SET_DEST.
 * expmed.c (store_bit_field_1): Likewise.
 * config/i386/sse.md (movdi_to_sse, vec_pack_sfix_trunc_v2df,
 vec_pack_sfix_v2df, vec_shl_<mode>, vec_shr_<mode>,
 vec_interleave_high<mode>, vec_interleave_low<mode>): Likewise.
 * config/i386/i386.c (ix86_expand_vector_move_misalign,
 ix86_expand_sse_movcc, ix86_expand_int_vcond, ix86_expand_vec_perm,
 ix86_expand_sse_unpack, ix86_expand_args_builtin,
 ix86_expand_vector_init_duplicate, ix86_expand_vector_set,
 emit_reduc_half, expand_vec_perm_blend, expand_vec_perm_pshufb,
 expand_vec_perm_interleave2, expand_vec_perm_pshufb2,
 expand_vec_perm_vpshufb2_vpermq,
 expand_vec_perm_vpshufb2_vpermq_even_odd, expand_vec_perm_even_odd_1,
 expand_vec_perm_broadcast_1, expand_vec_perm_vpshufb4_vpermq2,
 ix86_expand_sse2_mulv4si3, ix86_expand_pinsr): Likewise.
 (expand_vec_perm_palignr): Likewise.  Modify a copy of *d rather
 than *d itself.
 
 --- gcc/optabs.c.jj 2013-10-29 09:25:45.0 +0100
 +++ gcc/optabs.c2013-10-31 13:20:40.384808642 +0100
 @@ -6674,7 +6674,7 @@ expand_vec_perm (enum machine_mode mode,
 }
tmp = gen_rtx_CONST_VECTOR (qimode, vec);
sel = gen_lowpart (qimode, sel);
 -  sel = expand_vec_perm (qimode, sel, sel, tmp, NULL);
 +  sel = expand_vec_perm (qimode, gen_reg_rtx (qimode), sel, tmp, NULL);
gcc_assert (sel != NULL);
 
/* Add the byte offset to each byte element.  */

This hunk causes issues on AArch64 and ARM.

We look to see which permute operation we should generate in
aarch64_expand_vec_perm_const. If we notice that all elements in 'sel'
would select from op0 we copy op0 to op1 and generate appropriate code.

With this hunk applied we end up selecting from the register
generated by gen_reg_rtx (qimode), rather than from 'sel' as we
intended. Thus we lose the value of 'sel' and everything starts to
go wrong!

The hunk looks suspicious to me (why do we pick a register out of
thin air?), and reverting it fixes the problems I see. Could you
give me a pointer as to what this hunk fixes on i?86?

I don't think I can fix this up in the expander very easily?

Thanks,
James




Re: [Patch, ObjC, ObjC++, Darwin] Remove -fobjc-sjlj-exceptions

2013-11-04 Thread Joseph S. Myers
On Sun, 3 Nov 2013, Iain Sandoe wrote:

 gcc/c-family:
 
   * c-opts.c (c_common_post_options): Remove handling of
   flag_objc_sjlj_exceptions.
   * c.opt (fobjc-sjlj-exceptions): Issue a warning if this
   option is given.

The c-family changes are OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [driver] Force options Wa, Wl, Wp, to take a mandatory argument

2013-11-04 Thread Joseph S. Myers
On Mon, 4 Nov 2013, Mingjie Xing wrote:

 Hello,
 
 This patch forces options Wa, Wl, Wp, to take a mandatory argument,
 which can fix the bug
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55651.
 
 Tested on i686-pc-linux-gnu.
 
 2013-11-04  Mingjie Xing  mingjie.x...@gmail.com
 
 * common.opt (Wa, Wl, Wp,): Change JoinedOrMissing to Joined.
 
 Is it OK?

As discussed in the thread starting at 
http://gcc.gnu.org/ml/gcc-patches/2010-04/msg00848.html, this is 
incorrect; these options should pass through empty strings unchanged.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.

2013-11-04 Thread Cong Hou
On Mon, Nov 4, 2013 at 2:06 AM, James Greenhalgh
james.greenha...@arm.com wrote:
 On Fri, Nov 01, 2013 at 04:48:53PM +, Cong Hou wrote:
 diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
 index 2a5a2e1..8f5d39a 100644
 --- a/gcc/doc/md.texi
 +++ b/gcc/doc/md.texi
 @@ -4705,6 +4705,16 @@ wider mode, is computed and added to operand 3.
 Operand 3 is of a mode equal or
  wider than the mode of the product. The result is placed in operand 0, which
  is of the same mode as operand 3.

 +@cindex @code{ssad@var{m}} instruction pattern
 +@item @samp{ssad@var{m}}
 +@cindex @code{usad@var{m}} instruction pattern
 +@item @samp{usad@var{m}}
 +Compute the sum of absolute differences of two signed/unsigned elements.
 +Operand 1 and operand 2 are of the same mode. Their absolute difference, 
 which
 +is of a wider mode, is computed and added to operand 3. Operand 3 is of a 
 mode
 +equal or wider than the mode of the absolute difference. The result is 
 placed
 +in operand 0, which is of the same mode as operand 3.
 +
  @cindex @code{ssum_widen@var{m3}} instruction pattern
  @item @samp{ssum_widen@var{m3}}
  @cindex @code{usum_widen@var{m3}} instruction pattern
 diff --git a/gcc/expr.c b/gcc/expr.c
 index 4975a64..1db8a49 100644

 I'm not sure I follow, and if I do - I don't think it matches what
 you have implemented for i386.

 From your text description I would guess the series of operations to be:

   v1 = widen (operands[1])
   v2 = widen (operands[2])
   v3 = abs (v1 - v2)
   operands[0] = v3 + operands[3]

 But if I understand the behaviour of PSADBW correctly, what you have
 actually implemented is:

   v1 = widen (operands[1])
   v2 = widen (operands[2])
   v3 = abs (v1 - v2)
   v4 = reduce_plus (v3)
   operands[0] = v4 + operands[3]

 To my mind, synthesizing the reduce_plus step will be wasteful for targets
 who do not get this for free with their Absolute Difference step. Imagine a
 simple loop where we have synthesized the reduce_plus, we compute partial
 sums each loop iteration, though we would be better to leave the reduce_plus
 step until after the loop. REDUC_PLUS_EXPR would be the appropriate
 Tree code for this.

What do you mean when you use "synthesizing" here? For each pattern,
the only synthesized operation is the one being returned from the
pattern recognizer. In this case, it is USAD_EXPR. The recognition of
reduce sum is necessary as we need corresponding prolog and epilog for
reductions, which is already done before pattern recognition. Note
that reduction is not a pattern but is a type of vector definition. A
vectorization pattern can still be a reduction operation as long as
STMT_VINFO_RELATED_STMT of this pattern is a reduction operation. You
can check the other two reduction patterns: widen_sum_pattern and
dot_prod_pattern for reference.
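
For context, the reduction loop shape that a usad pattern recognizer
targets looks roughly like this (illustrative only):

/* Sum of absolute differences; 'sum' is the reduction variable whose
   prolog/epilog the vectorizer generates before pattern recognition.  */
int
sad (const unsigned char *a, const unsigned char *b, int n)
{
  int sum = 0, i;
  for (i = 0; i < n; i++)
    {
      int d = a[i] - b[i];	/* widened difference */
      sum += d >= 0 ? d : -d;	/* absolute value accumulated */
    }
  return sum;
}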

Thank you for your comment!


Cong


 I would prefer to see this Tree code not imply the reduce_plus.

 Thanks,
 James



Re: changing a collision resolution strategy of the symbol table of identifiers

2013-11-04 Thread Roman Gareev
2013/10/20 Ondřej Bílka nel...@seznam.cz:
 On Sun, Oct 20, 2013 at 06:55:40PM +0600, Roman Gareev wrote:
 Dear gcc contributors,

 Recently I came across the list of ideas for speeding up GCC
 (http://gcc.gnu.org/wiki/Speedup_areas). Among others, there was
 suggested to replace identifier hash table with other data structure.

 Please find attached a patch, that switches the resolution strategy of
 a hash table used in the symbol table of identifiers from open
 addressing to separate chaining with linked list. I would be very
 grateful for your comments, feedback and ideas about further
 enhancements to gcc, in which I could take a part. I am pursuing
 master of science and looking for thesis theme, that is compiler
 related.

 Below are results of testing and the patch.

 During testing of the linux kernel (3.8.8) compilation time, the
 acquired results were the following: increasing by 0.17% for the
 version 4.8.0, increasing by 1.12% for the version 4.8.1, decreasing
 by 0.598% for trunk (these are average values).


 Tested x86_64-unknown-linux-gnu, applying to 4.8.0, 4.8.1 and trunk.


 diff --git a/gcc/stringpool.c b/gcc/stringpool.c

 index f4d0dae..cc696f7 100644

 --- a/gcc/stringpool.c

 +++ b/gcc/stringpool.c

 @@ -241,12 +241,8 @@ gt_pch_nx (unsigned char *x, gt_pointer_operator
 op, void *cookie)

 /* SPD is saved in the PCH file and holds the information needed

 to restore the string pool. */

 -struct GTY(()) string_pool_data {

 - ht_identifier_ptr *

 - GTY((length (%h.nslots),

 Mail client broke it?
Sorry, an error occurred somewhere. Below is an attachment with the patch.


patch
Description: Binary data
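
Since the patch arrives as a binary attachment, here is a minimal sketch
of the separate-chaining idea (hypothetical types layered over libcpp's
ht_identifier, which carries str, len and hash_value; this is not Roman's
code):

struct ht_identifier_chain
{
  struct ht_identifier id;		/* the identifier entry itself */
  struct ht_identifier_chain *next;	/* collision chain of this slot */
};

static struct ht_identifier *
lookup_chained (struct ht_identifier_chain **slots, size_t nslots,
		const unsigned char *str, size_t len, unsigned int hash)
{
  struct ht_identifier_chain *p;

  /* Walk only the chain headed at the hashed slot; unlike open
     addressing, no other slots are probed.  */
  for (p = slots[hash % nslots]; p != NULL; p = p->next)
    if (p->id.hash_value == hash
	&& p->id.len == len
	&& memcmp (p->id.str, str, len) == 0)
      return &p->id;
  return NULL;
}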


[PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-04 Thread Yufeng Zhang

Hi,

This patch extends the slsr pass to optionally use an alternative base 
expression in finding a basis for CAND_REFs.  Currently the pass uses a 
hash-based algorithm to match the base_expr in a candidate.  Given a 
test case like the following, slsr will not be able to recognize that the 
two CAND_REFs have the same basis, as their base_exprs are of different 
SSA_NAMEs:


typedef int arr_2[20][20];

void foo (arr_2 a2, int i, int j)
{
  a2[i][j] = 1;
  a2[i + 10][j] = 2;
}

The gimple dump before slsr is like the following (using an 
arm-none-eabi gcc):


  i.0_2 = (unsigned int) i_1(D);
  _3 = i.0_2 * 80;
  _5 = a2_4(D) + _3;
  *_5[j_7(D)] = 1;      <--
  _9 = _3 + 800;
  _10 = a2_4(D) + _9;
  *_10[j_7(D)] = 2;     <--

Here are the dumps for the two CAND_REFs generated for the two 
statements pointed by the arrows:



  4  [2] _5 = a2_4(D) + _3;
 ADD  : a2_4(D) + (80 * i_1(D)) : int[20] *
 basis: 0  dependent: 0  sibling: 0
 next-interp: 0  dead-savings: 0

  8  [2] *_10[j_7(D)] = 2;
 REF  : _10 + ((sizetype) j_7(D) * 4) + 0 : int[20] *
 basis: 5  dependent: 0  sibling: 0
 next-interp: 0  dead-savings: 0

As mentioned previously, slsr cannot establish that candidate 4 is the 
basis for the candidate 8, as they have different base_exprs: a2_4(D) 
and _10, respectively.  However, the two references actually only differ 
by an immediate offset (800).


This patch uses the tree affine combination facilities to create an 
optional alternative base expression to be used in finding (as well as 
recording) the basis.  It calls tree_to_aff_combination_expand on 
base_expr, resets the offset field of the generated aff_tree to 0, and 
generates a tree from it by calling aff_combination_to_tree.
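
In sketch form, that construction might look like the following (hedged:
double_int is the offset representation in the current sources, and
name_expansions is the cache named in the ChangeLog below):

/* Derive an offset-free alternative base from BASE_EXPR using the
   tree-affine machinery; illustrative shape only.  */
static tree
get_alternative_base (tree base_expr)
{
  aff_tree aff;

  tree_to_aff_combination_expand (base_expr, TREE_TYPE (base_expr),
				  &aff, &name_expansions);
  aff.offset = double_int_zero;	/* strip the immediate offset */
  return aff_combination_to_tree (&aff);
}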


The new tree is recorded as a potential basis, and when 
find_basis_for_candidate fails to find a basis for a CAND_REF in its 
normal approach, it searches again using a tree expanded in such way. 
Such an expanded tree usually discloses the expression behind an 
SSA_NAME.  In the example above, instead of seeing the strength 
reduction candidate chains like this:


  _5 -> 5
  _10 -> 8

we are now having:

  _5 -> 5
  _10 -> 8
  a2_4(D) + (sizetype) i_1(D) * 80 -> 5 -> 8

With the candidates 5 and 8 linked to the same tree expression (a2_4(D) 
+ (sizetype) i_1(D) * 80), slsr is now able to establish that 5 is the 
basis of 8.


The patch doesn't attempt to change the content of any CAND_REF though. 
 It only enables CAND_REFs which (1) have the same stride and (2) whose 
underlying base_expr expressions differ only in immediate 
offsets, to be recognized as having the same basis.  The statements with 
such CAND_REFs will be lowered to MEM_REFs, and later on the RTL 
expander shall be able to fold and re-associate the immediate offsets to 
the rightmost side of the addressing expression, and therefore exposes 
the common sub-expression successfully.


The code-gen difference of the example code on arm with -O2 
-mcpu=cortex-15 is:


mov r3, r1, asl #6
-   add ip, r0, r2, asl #2
str lr, [sp, #-4]!
+   mov ip, #1
+   mov lr, #2
add r1, r3, r1, asl #4
-   mov lr, #1
-   mov r3, #2
add r0, r0, r1
-   add r0, r0, #800
-   str lr, [ip, r1]
-   str r3, [r0, r2, asl #2]
+   add r3, r0, r2, asl #2
+   str ip, [r0, r2, asl #2]
+   str lr, [r3, #800]
ldr pc, [sp], #4

One fewer instruction in this simple case.

The example used in illustration is too simple to show code-gen 
difference on x86_64, but the included test case will show the benefit 
of the patch quite obviously.


The patch has passed

* bootstrapping on arm and x86_64
* regtest on arm-none-eabi,  aarch64-none-elf and x86_64

There is no regression in SPEC2K on arm or x86_64.

OK to commit to the trunk?

Any comment is welcomed!

Thanks,
Yufeng


gcc/

* gimple-ssa-strength-reduction.c: Include tree-affine.h.
(find_basis_for_base_expr): Update comment.
(find_basis_for_candidate): Add new parameter 'alt_base_expr' of
type 'tree'.  Optionally call find_basis_for_base_expr with
'alt_base_expr'.
(record_potential_basis): Add new parameter 'alt_base_expr' of
type 'tree'; set node-base_expr with 'alt_base_expr' if it is
not NULL.
(name_expansions): New static variable.
(get_alternative_base): New function.
(alloc_cand_and_find_basis): Call get_alternative_base for 
CAND_REF.
Update calls to find_basis_for_candidate and 
record_potential_basis.

(execute_strength_reduction): Call free_affine_expand_cache with
name_expansions.

gcc/testsuite/

	* gcc.dg/tree-ssa/slsr-41.c: New test.

diff --git a/gcc/gimple-ssa-strength-reduction.c 
b/gcc/gimple-ssa-strength-reduction.c
index 9a5072c..3150046 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@ -48,6 

Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.

2013-11-04 Thread Richard Biener
Richard Biener richard.guent...@gmail.com wrote:
On Thu, Oct 31, 2013 at 10:02 AM, Ilya Enkovich
enkovich@gmail.com wrote:
 Hi,

 Here is a patch which handles the problem with optimization of
BUILT_IN_CHKP_ARG_BND calls.  Pointer Bounds Checker expects that
argument of this call is a default SSA_NAME of the PARM_DECL whose
bounds we want to get.  The problem is in optimizations which may
replace the arg with its copy or a known value.  This patch suppresses such
modifications.

This doesn't seem like a good fix.  I suppose you require the same on
RTL, that is, have the incoming arg reg coalesced with the use?
In that case better set SSA_NAME_OCCURS_IN_ABNORMAL_PHI.

Btw, I would have chosen

P_2 = __builtin_xyz (p_1, bound)
Call (p_2)

Thus make the builtins a transparent pass-through which effectively binds a 
parameter to its bounds, removing the need for the artificial arguments and 
making propagation a non-issue.

Also, how does the feature interact with other extensions such as nested 
functions or optimizations like inlining?

Richard.


Richard.

 Thanks,
 Ilya
 --

 gcc/

 2013-10-28  Ilya Enkovich  ilya.enkov...@intel.com

 * tree-into-ssa.c: Include target.h
 (rewrite_update_stmt): Skip BUILT_IN_CHKP_ARG_BND calls.
 * tree-ssa-dom.c: Include target.h
 (cprop_into_stmt): Skip BUILT_IN_CHKP_ARG_BND calls.


 diff --git a/gcc/tree-into-ssa.c b/gcc/tree-into-ssa.c
 index 981e9f4..8d48f6d 100644
 --- a/gcc/tree-into-ssa.c
 +++ b/gcc/tree-into-ssa.c
 @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "params.h"
  #include "diagnostic-core.h"
  #include "tree-into-ssa.h"
 +#include "target.h"


  /* This file builds the SSA form for a function as described in:
 @@ -1921,8 +1922,14 @@ rewrite_update_stmt (gimple stmt,
gimple_stmt_iterator gsi)
  }

/* Rewrite USES included in OLD_SSA_NAMES and USES whose
underlying
 - symbol is marked for renaming.  */
 -  if (rewrite_uses_p (stmt))
 + symbol is marked for renaming.
 + Skip calls to BUILT_IN_CHKP_ARG_BND whose arg should never be
 + renamed.  */
 +  if (rewrite_uses_p (stmt)
 +      && !(flag_check_pointer_bounds
 +	   && (gimple_code (stmt) == GIMPLE_CALL)
 +	   && gimple_call_fndecl (stmt)
 +	      == targetm.builtin_chkp_function (BUILT_IN_CHKP_ARG_BND)))
  {
if (is_gimple_debug (stmt))
 {
 diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
 index 211bfcf..445278a 100644
 --- a/gcc/tree-ssa-dom.c
 +++ b/gcc/tree-ssa-dom.c
 @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "params.h"
  #include "tree-ssa-threadedge.h"
  #include "tree-ssa-dom.h"
 +#include "target.h"

  /* This file implements optimizations on the dominator tree.  */

 @@ -2266,6 +2267,16 @@ cprop_into_stmt (gimple stmt)
use_operand_p op_p;
ssa_op_iter iter;

 +  /* Call used to obtain bounds of input arg by Pointer Bounds
Checker
 + should not be optimized.  Argument of the call is a default
 + SSA_NAME of PARM_DECL.  It should never be replaced by value.  */
 +  if (flag_check_pointer_bounds && gimple_code (stmt) == GIMPLE_CALL)
 +{
 +  tree fndecl = gimple_call_fndecl (stmt);
 +  if (fndecl == targetm.builtin_chkp_function
(BUILT_IN_CHKP_ARG_BND))
 +   return;
 +}
 +
FOR_EACH_SSA_USE_OPERAND (op_p, stmt, iter, SSA_OP_USE)
  cprop_operand (stmt, op_p);
  }




Re: [C++ Patch, Java related/RFC] PR 11006

2013-11-04 Thread Paolo Carlini

On 11/04/2013 06:21 PM, Jason Merrill wrote:
Surely it should be valid to allocate a Java boolean type.  Andrew, 
how should that work?
Thanks. The problem we are facing (assuming we want to resolve this old 
issue) is that something seems seriously broken for the builtin Java 
types as declared in record_builtin_java_type, not just __java_boolean: 
all of them are just INTEGER_TYPEs or FLOAT_TYPEs. I'm wondering if we 
should special-case the mangling for those in build_java_class_ref, 
something morally like the below. Then the testcase doesn't ICE anymore, 
and if we add some missing declarations (definitely a _Jv_AllocObject) 
it even compiles. But my knowledge of Java is extremely weak, and I have 
no idea how this is going to work.


Paolo.

/
Index: init.c
===
--- init.c  (revision 204355)
+++ init.c  (working copy)
@@ -3068,22 +3068,29 @@ build_java_class_ref (tree type)
   jclass_node = TREE_TYPE (jclass_node);
 }
 
-  /* Mangle the class$ field.  */
-  {
-tree field;
-for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
-  if (DECL_NAME (field) == CL_suffix)
+  if (MAYBE_CLASS_TYPE_P (type))
+{
+  /* Mangle the class$ field.  */
+  tree field;
+  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+   if (DECL_NAME (field) == CL_suffix)
+ {
+   mangle_decl (field);
+   name = DECL_ASSEMBLER_NAME (field);
+   break;
+ }
+  if (!field)
{
- mangle_decl (field);
- name = DECL_ASSEMBLER_NAME (field);
- break;
+   error ("can%'t find %<class$%> in %qT", type);
+ return error_mark_node;
}
-if (!field)
-  {
-   error ("can%'t find %<class$%> in %qT", type);
-   return error_mark_node;
-  }
-  }
+}
+  else
+{
+  /* Standard Java types per record_builtin_java_type.  */
+  mangle_decl (TYPE_NAME (type));
+  name = DECL_ASSEMBLER_NAME (TYPE_NAME (type));
+}
 
   class_decl = IDENTIFIER_GLOBAL_VALUE (name);
   if (class_decl == NULL_TREE)
extern "Java"
{
  namespace java
  {
namespace lang
{
  class Class;
  class Object;
}
  }
}

typedef struct java::lang::Object* jobject;
typedef class java::lang::Class* jclass;

extern "C" jobject _Jv_AllocObject (jclass) __attribute__((__malloc__));

void foo () {
  new __java_boolean;
}


[PATCH] PR58985: testcase error.

2013-11-04 Thread Wei Mi
Hi,

This is to fix testcase error reported in PR58985.

The intention of the testcase was to ensure that no REG_EQUIV note is
generated for a reg which is used in a paradoxical subreg. When the
target was x86, a subreg was generated, so I omitted the subreg from
the regexp pattern. However, no subreg is generated for target
cris-axis-elf, so a REG_EQUIV note without a subreg should be allowed.
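
For background, a paradoxical subreg reads a pseudo in a mode wider
than the pseudo's own mode; a rough illustration of the RTL shape
involved (not dump output from either target):

  ;; SImode read of a QImode pseudo: the upper bytes are undefined,
  ;; so recording a REG_EQUIV (mem ...) note for reg 100 would claim
  ;; more bytes than the memory location actually provides.
  (set (reg:SI 101)
       (subreg:SI (reg:QI 100) 0))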

Is it ok for trunk and gcc-4.8 branch?

Thanks,
Wei Mi.

2013-11-04  Wei Mi  w...@google.com

PR regression/58985
* testsuite/gcc.dg/pr57518.c: Add subreg in regexp pattern.

Index: testsuite/gcc.dg/pr57518.c
===
--- testsuite/gcc.dg/pr57518.c  (revision 204353)
+++ testsuite/gcc.dg/pr57518.c  (working copy)
@@ -2,7 +2,7 @@

 /* { dg-do compile } */
 /* { dg-options -O2 -fdump-rtl-ira } */
-/* { dg-final { scan-rtl-dump-not "REG_EQUIV\[^\n\]*mem\[^\n\]*\"ip\"" "ira" } } */
+/* { dg-final { scan-rtl-dump-not "REG_EQUIV\[^\n\]*mem\[^\n\]*\"ip\".*subreg" "ira" } } */

 char ip[10];
 int total;


Re: changing a collision resolution strategy of the symbol table of identifiers

2013-11-04 Thread Roman Gareev
2013/10/31 Florian Weimer fwei...@redhat.com:
 On 10/20/2013 02:55 PM, Roman Gareev wrote:

 During testing of the linux kernel (3.8.8) compilation time, the
 results acquired were the following: an increase of 0.17% for
 version 4.8.0, an increase of 1.12% for version 4.8.1, and a decrease
 of 0.598% for trunk (these are average values).


 Can you share the raw numbers?  Are the differences statistically
 significant?

 --
 Florian Weimer / Red Hat Product Security Team

These are the raw numbers. I'll share the results of checking for
statistical significance as soon as more compilations are done.


results
Description: Binary data


RFA: patch to fix PR58967

2013-11-04 Thread Vladimir Makarov
The following patch fixes

   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58967

The removed code is too old.  To be honest, I don't even remember why I
added it.  LRA has changed a lot since then and now works fine without it.

There is no test for this case as it is too big.

The patch was successfully bootstrapped on ppc64 (with LRA) and tested
on ppc{32|64}.

Is it ok to commit it to the trunk?

2013-11-04  Vladimir Makarov  vmaka...@redhat.com

PR rtl-optimization/58967
* config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove
!lra_in_progress for mode sizes bigger than a word.


Index: config/rs6000/rs6000.c
===
--- config/rs6000/rs6000.c  (revision 204305)
+++ config/rs6000/rs6000.c  (working copy)
@@ -6388,7 +6388,7 @@ legitimate_lo_sum_address_p (enum machin
return false;
   if (GET_MODE_NUNITS (mode) != 1)
return false;
-  if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
+  if (GET_MODE_SIZE
(mottp://gcc.gnu.org/bugzilla/show_bug.cgi?id=58967de) > UNITS_PER_WORD
      && !(/* ??? Assume floating point reg based on mode?  */
           TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
           && (mode == DFmode || mode == DDmode)))





Re: RFA: patch to fix PR58967

2013-11-04 Thread David Edelsohn
On Mon, Nov 4, 2013 at 2:23 PM, Vladimir Makarov vmaka...@redhat.com wrote:
 The following patch fixes

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58967

 The removed code is too old.  To be honest, I don't even remember why I
 added it.  LRA has changed a lot since then and now works fine without it.

 There is no test for this case as it is too big.

 The patch was successfully bootstrapped on ppc64 (with LRA) and tested
 on ppc{32|64}.

 Is it ok to commit it to the trunk?

 2013-11-04  Vladimir Makarov  vmaka...@redhat.com

 PR rtl-optimization/58967
 * config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove
 !lra_in_progress for mode sizes bigger than a word.


 Index: config/rs6000/rs6000.c
 ===
 --- config/rs6000/rs6000.c  (revision 204305)
 +++ config/rs6000/rs6000.c  (working copy)
 @@ -6388,7 +6388,7 @@ legitimate_lo_sum_address_p (enum machin
 return false;
if (GET_MODE_NUNITS (mode) != 1)
 return false;
 -  if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
 +  if (GET_MODE_SIZE
 (mottp://gcc.gnu.org/bugzilla/show_bug.cgi?id=58967de) > UNITS_PER_WORD

This is okay, assuming the weird bugzilla URL paste shown in the patch
is not really included.

Thanks, David


Re: [PATCH, rs6000] (2/3) Fix widening multiply high/low operations for little endian

2013-11-04 Thread David Edelsohn
On Mon, Nov 4, 2013 at 12:34 AM, Bill Schmidt
wschm...@linux.vnet.ibm.com wrote:
 Hi,

 This patch fixes the widening multiply high/low operations to work
 correctly in the presence of the first patch of this series, which
 reverses the meanings of multiply even/odd instructions.  Here we
 reorder the input operands to the vector merge low/high instructions.

 The general rule is that vmrghh(x,y) [BE] = vmrglh(y,x) [LE], and so on;
 that is, we need to reverse the usage of merge high and merge low, and
 also swap their inputs, to obtain the same semantics.  In this case we
 are only swapping the inputs, because the reversed usage of high and low
 has already been done for us in the generic handling code for
 VEC_WIDEN_MULT_LO_EXPR.

 Bootstrapped and tested with the rest of the patch set on
 powerpc64{,le}-unknown-linux-gnu, with no regressions.  Is this ok for
 trunk?

 Thanks,
 Bill


 2013-11-03  Bill Schmidt  wschm...@linux.vnet.ibm.com

 * config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap
 arguments to merge instruction for little endian.
 (vec_widen_umult_lo_v16qi): Likewise.
 (vec_widen_smult_hi_v16qi): Likewise.
 (vec_widen_smult_lo_v16qi): Likewise.
 (vec_widen_umult_hi_v8hi): Likewise.
 (vec_widen_umult_lo_v8hi): Likewise.
 (vec_widen_smult_hi_v8hi): Likewise.
 (vec_widen_smult_lo_v8hi): Likewise.

This patch is okay.

I agree with Richard's suggestion to address the other parts of the
patch by restructuring the patterns.

Thanks, David
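
To see the merge rule quoted above concretely, a small example using
the generic intrinsics from altivec.h (compile with -maltivec; this is
an illustration only, the patch does the equivalent swap inside the .md
patterns):

  #include <altivec.h>
  #include <stdio.h>

  int
  main (void)
  {
    vector unsigned short x = { 0, 1, 2, 3, 4, 5, 6, 7 };
    vector unsigned short y = { 8, 9, 10, 11, 12, 13, 14, 15 };

    /* vec_mergeh must yield the same element values on BE and LE;
       on BE it maps to vmrghh (x, y), while on LE the backend has
       to emit vmrglh (y, x) to preserve the semantics.  */
    vector unsigned short hi = vec_mergeh (x, y);

    for (int i = 0; i < 8; i++)
      printf ("%u ", (unsigned) hi[i]);
    printf ("\n");
    return 0;
  }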

