Re: [PATCH, PR38785] Throttle PRE at -O3

2012-04-25 Thread Maxim Kuvyrkov
On 18/04/2012, at 9:17 PM, Richard Guenther wrote:

 On Wed, Apr 18, 2012 at 4:15 AM, Maxim Kuvyrkov ma...@codesourcery.com 
 wrote:
 Steven,
 Jorn,
 
 I am looking into fixing performance regression on EEMBC's bitmnp01, and a 
 version of your combined patch attached to PR38785 still works very well.  
 Would you mind me getting it through upstream review, or are there any 
 issues with contributing this patch to GCC mainline?
 
 We (CodeSourcery/Mentor) were carrying this patch in our toolchains since 
 GCC 4.4, and it didn't show any performance or correctness problems on x86, 
 ARM, MIPS, and other architectures.  It also reliably fixes bitmnp01 
 regression, which is still present in current mainline.
 
 I have tested this patch on recent mainline on i686-linux-gnu with no 
 regressions.  Unless I hear from you to the contrary, I will push this patch 
 for upstream review and, hopefully, get it checked in.
 
 Previous discussion of this patch is at 
 http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00250.html
 
 The addition of -ftree-pre-partial-partial is ok if you change its name to
 -ftree-partial-pre and add documentation to invoke.texi.

OK.  Gerald, does the patch for gcc-4.8/changes.html look OK?

 
 + /* Assuming the expression is 50% anticipatable, we have
 +to multiply the number of insertions needed by two for a cost
 +comparison.  */
 
 why assume 50% anticipatibility if you can compute the exact
 anticipatibility?

Do you mean partial anticipatibility as in the updated patch?  To compute exact 
anticipatibility we would need to traverse the bottom part of CFG, to get the 
numbers right.

 
 + if (!optimize_function_for_speed_p (cfun)
 
 please look at how I changed regular PRE to look at whether we want to
 optimize the path which has the redundancy for speed via
 optimize_edge_for_speed_p.  The same surgerly should be applied to
 PPRE.

OK.  In the updated patch we require at least one of the successors the 
partially anticipates the expression to be on the speed path.

Any other comments?

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



pr38785-rel-note.patch
Description: Binary data


pr38785.ChangeLog
Description: Binary data


pr38785.patch
Description: Binary data


Re: [Patch, Fortran] PR52864 - fix actual/formal checks

2012-04-25 Thread Tobias Burnus

Hello Mikael,

thanks for the review. Regarding:

Mikael Morin wrote:
is there a reason to guard the class_pointer condition with 
attr.class_ok in the first conditional and with CLASS_DATA(...) != 
NULL in the two other ones? Not that it matters much, and in fact, I 
think the patch as is is good enough for committal (yes, it is a OK). 
I'm asking as I never know myself what is the correct, canonical way 
to handle the class_* hell...


It's a good question what's more appropriate. My impression is that both 
is nearly identical; I frankly don't know whether what's the exact 
difference. I recall that I once had to use CLASS_DATA() != NULL to 
avoid a segfault. I don't remember whether it was the issue below or 
something different.


For an expression, CLASS_DATA () != NULL has the big advantage that one 
avoids to walk the expression: For an expr, one needs to check 
expr-symtree-n.sym but also the ref tree. Thus, CLASS_DATA is much 
simpler than class_ok.


Looking at my patch, I have a dummy argument (i.e. gfc_symbol) - and for 
those, one can simply access fsym-sym-attr.class_ok.


Tobias


Re: FW: [v3] libstdc++/52689 testcase

2012-04-25 Thread Igor Zamyatin
This testcase is reported as failed on x86


 Noticed that this testcase wasn't put in as part of the patch. Fixed as 
 follows.

 tested x86/linux

 -benjamin



combine_conversions int-double-int

2012-04-25 Thread Marc Glisse

Hello,

a conversion like int-double-int is just the identity, as long as double 
is big enough to represent all ints exactly. The most natural way I found 
to do this optimization is the attached:


2012-04-25  Marc Glisse  marc.gli...@inria.fr

PR middle-end/27139
* tree-ssa-forwprop.c (combine_conversions): Handle INT-FP-INT.

Does the approach make sense? I don't know that code, and adding 
FLOAT_EXPR / FIX_TRUNC_EXPR was a bit of guesswork. The precision of 
double could be multiplied by log2(base), but not doing it is safe. If 
the approach is ok, I could extend it so int-double-long also skips the 
intermediate conversion.


Bootstrapped and regression tested on linux-x86_64.

Should I try and write a testcase for a specific target checking for 
specific asm instructions there, or is there a better way?


(note that I can't commit)

--
Marc GlisseIndex: tree-ssa-forwprop.c
===
--- tree-ssa-forwprop.c (revision 186761)
+++ tree-ssa-forwprop.c (working copy)
@@ -2403,10 +2403,11 @@ combine_conversions (gimple_stmt_iterato
 {
   gimple stmt = gsi_stmt (*gsi);
   gimple def_stmt;
   tree op0, lhs;
   enum tree_code code = gimple_assign_rhs_code (stmt);
+  enum tree_code code2;
 
   gcc_checking_assert (CONVERT_EXPR_CODE_P (code)
   || code == FLOAT_EXPR
   || code == FIX_TRUNC_EXPR);
 
@@ -2423,11 +2424,13 @@ combine_conversions (gimple_stmt_iterato
 
   def_stmt = SSA_NAME_DEF_STMT (op0);
   if (!is_gimple_assign (def_stmt))
 return 0;
 
-  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
+  code2 = gimple_assign_rhs_code (def_stmt);
+  if (CONVERT_EXPR_CODE_P (code2) || code2 == FLOAT_EXPR
+  || code2 == FIX_TRUNC_EXPR)
 {
   tree defop0 = gimple_assign_rhs1 (def_stmt);
   tree type = TREE_TYPE (lhs);
   tree inside_type = TREE_TYPE (defop0);
   tree inter_type = TREE_TYPE (op0);
@@ -2453,13 +2456,16 @@ combine_conversions (gimple_stmt_iterato
   /* In addition to the cases of two conversions in a row
 handled below, if we are converting something to its own
 type via an object of identical or wider precision, neither
 conversion is needed.  */
   if (useless_type_conversion_p (type, inside_type)
-  (((inter_int || inter_ptr)  final_int)
- || (inter_float  final_float))
-  inter_prec = final_prec)
+  (inter_int || inter_ptr)  final_int)
+   || (inter_float  final_float))
+   inter_prec = final_prec)
+ || (inside_int  final_int  inter_float
+  REAL_MODE_FORMAT (TYPE_MODE (inter_type))-p
+= (int) inside_prec - !inside_unsignedp)))
{
  gimple_assign_set_rhs1 (stmt, unshare_expr (defop0));
  gimple_assign_set_rhs_code (stmt, TREE_CODE (defop0));
  update_stmt (stmt);
  return remove_prop_source_from_use (op0) ? 2 : 1;


Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-25 Thread Andi Kleen
 Tree level unrollers (cunrolli and cunroll) do complete unroll. At O2,
 both of them are turned on, but gcc does not allow any code growth --
 which makes them pretty useless at O2 (very few loops qualify). The
 default max complete peel iteration is also too low compared with both
 icc and llvm.  This needs to be tuned.

I found that at -O3 (where tree unroll is on by default) there is quite a 
bit of useless unrolling. I got somewhat irritated that my printf debug loops 
were commonly unrolled.

-Andi



Re: move default_tree_printer out of toplev.c

2012-04-25 Thread Richard Guenther
On Tue, 24 Apr 2012, Manuel López-Ibáñez wrote:

 It seems an obvious cleanup to me.
 
 OK?
 
 2012-04-24  Manuel López-Ibáñez  m...@gcc.gnu.org
 
 gcc/
   * tree-pretty-print.h (default_tree_printer): Do not declare.
   * tree-diagnostic.c: Include tree-pretty-print.h, tree-pass.h and
   intl.h.
   (default_tree_diagnostic_starter): Make static.
   (default_tree_printer): Move to here. Make static.
   (tree_diagnostics_defaults): New.
   * tree-diagnostic.h (default_tree_diagnostic_starter): Do not declare.
   * tree.c (free_lang_data): Use tree_diagnostics_defaults.

Bogus changelog - you changed general_init.

   * toplev.c: Do not include tree-pass.h.
   (default_tree_printer): Move from here.
   (general_init): Use tree_diagnostics_defaults.

Otherwise ok.

Thanks,
Richard.

[PATCH] Validate the HLE memory models.

2012-04-25 Thread Andi Kleen
From: Andi Kleen a...@linux.intel.com

This is on top of Kirill's latest patch.

Based on the earlier discussion. Warn and enforce reasonable memory models
for HLE ACQUIRE and RELEASE.  It's not useful to have a weaker compiler
memory model than what the CPU implements for a transaction.

With HLE_ACQUIRE model must be ACQUIRE or stronger. With HLE_RELEASE model must
be RELEASE or stronger.

I implemented this with a new target hook that replaces the target variable
originally added in the earlier patch.

Passes bootstrap and test suite on x86_64-linux

gcc/:
2012-04-24  Andi Kleen  a...@linux.intel.com

* builtins.c (get_memmodel): Add val. Call target.memmodel_check
and return new variable.
* config/i386/i386.c (ix86_memmodel_check): Add.
(TARGET_MEMMODEL_MASK): Replace with TARGET_MEMMODEL_CHECK.
* doc/tm.texi (TARGET_MEMMODEL_MASK): Replace with 
TARGET_MEMMODEL_CHECK..
* doc/tm.texi.in (TARGET_MEMMODEL_MASK): Replace with 
TARGET_MEMMODEL_CHECK.
* target.def (memmodel_mask): Replace with memmodel_check.
---
 gcc/builtins.c |8 ++--
 gcc/config/i386/i386.c |   35 +--
 gcc/doc/tm.texi|7 ---
 gcc/doc/tm.texi.in |7 ---
 gcc/target.def |6 +++---
 5 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index bd66ff0..8f25086 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5338,6 +5338,7 @@ static enum memmodel
 get_memmodel (tree exp)
 {
   rtx op;
+  int val;
 
   /* If the parameter is not a constant, it's a run time value so we'll just
  convert it to MEMMODEL_SEQ_CST to avoid annoying runtime checking.  */
@@ -5346,7 +5347,10 @@ get_memmodel (tree exp)
 
   op = expand_normal (exp);
 
-  if(INTVAL (op)  ~(MEMMODEL_MASK | targetm.memmodel_mask))
+  val = INTVAL (op);
+  if (targetm.memmodel_check)
+val = targetm.memmodel_check (val);
+  else if (val  ~MEMMODEL_MASK)
 {
   warning (OPT_Winvalid_memory_model,
   Unknown architecture specifier in memory model to builtin.);
@@ -5361,7 +5365,7 @@ get_memmodel (tree exp)
   return MEMMODEL_SEQ_CST;
 }
 
-  return (enum memmodel) INTVAL (op);
+  return (enum memmodel) val;
 }
 
 /* Expand the __atomic_exchange intrinsic:
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0b6d6be..d17fcac 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -38967,6 +38967,37 @@ ix86_autovectorize_vector_sizes (void)
   return (TARGET_AVX  !TARGET_PREFER_AVX128) ? 32 | 16 : 0;
 }
 
+/* Validate target specific memory model bits in VAL. */
+
+static unsigned int
+ix86_memmodel_check (unsigned val)
+{
+  unsigned model = val  MEMMODEL_MASK;
+  int strong;
+
+  if (val  ~(IX86_HLE_ACQUIRE|IX86_HLE_RELEASE|MEMMODEL_MASK) 
+  || ((val  IX86_HLE_ACQUIRE)  (val  IX86_HLE_RELEASE)))
+{
+  warning (OPT_Winvalid_memory_model,
+  Unknown architecture specific memory model);
+  return MEMMODEL_SEQ_CST;
+}
+  strong = (model == MEMMODEL_ACQ_REL || model == MEMMODEL_SEQ_CST);
+  if (val  IX86_HLE_ACQUIRE  !(model == MEMMODEL_ACQUIRE || strong))
+{
+  warning (OPT_Winvalid_memory_model,
+  HLE_ACQUIRE not used with ACQUIRE or stronger memory model);
+  return MEMMODEL_SEQ_CST | IX86_HLE_ACQUIRE;
+}
+   if (val  IX86_HLE_RELEASE  !(model == MEMMODEL_RELEASE || strong))
+{
+  warning (OPT_Winvalid_memory_model,
+  HLE_RELEASE not used with RELEASE or stronger memory model);
+  return MEMMODEL_SEQ_CST | IX86_HLE_RELEASE;
+}
+  return val;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_RETURN_IN_MEMORY
 #define TARGET_RETURN_IN_MEMORY ix86_return_in_memory
@@ -39066,8 +39097,8 @@ ix86_autovectorize_vector_sizes (void)
 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL ix86_function_ok_for_sibcall
 
-#undef TARGET_MEMMODEL_MASK
-#define TARGET_MEMMODEL_MASK (IX86_HLE_ACQUIRE|IX86_HLE_RELEASE)
+#undef TARGET_MEMMODEL_CHECK
+#define TARGET_MEMMODEL_CHECK ix86_memmodel_check
 
 #ifdef HAVE_AS_TLS
 #undef TARGET_HAVE_TLS
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index ee0d700..4c57a9c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11362,9 +11362,10 @@ MIPS, where add-immediate takes a 16-bit signed value,
 @code{TARGET_CONST_ANCHOR} is set to @samp{0x8000}.  The default value
 is zero, which disables this optimization.  @end deftypevr
 
-@deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_MEMMODEL_MASK
-Some architectures may prvide additional memory model bits. This hook
-must mask such a bits. @end deftypevr
+@deftypefn {Target Hook} unsigned TARGET_MEMMODEL_CHECK (unsigned @var{val})
+Validate target specific memory model mask bits. When NULL no target specific
+memory model bits are allowed. 
+@end deftypefn
 
 @deftypevr {Target Hook} {unsigned char} TARGET_ATOMIC_TEST_AND_SET_TRUEVAL
 This value 

Re: [rfc] PR tree-optimization/52633 - ICE due to vectorizer pattern detection collision

2012-04-25 Thread Richard Guenther
On Tue, 24 Apr 2012, Ulrich Weigand wrote:

 Hello,
 
 PR 52633 is caused by bad interaction between two different vectorizer
 pattern recognition passed.  A minimal test case is:
 
 void
 test (unsigned short *x, signed char *y)
 {
   int i;
   for (i = 0; i  32; i++)
 x[i] = (short) (y[i]  5);
 }
 
 built with cc1 -O3 -march=armv7-a -mfpu=neon -mfloat-abi=softfp
 on a arm-linux-gnueabi target.
 
 Before the vectorizer, we have something like:
 
   short unsigned int D.4976;
   int D.4975;
   int D.4974;
   signed char D.4973;
   signed char * D.4972;
   short unsigned int * D.4970;
 [snip]
   D.4973_8 = *D.4972_7;
   D.4974_9 = (int) D.4973_8;
   D.4975_10 = D.4974_9  5;
   D.4976_11 = (short unsigned int) D.4975_10;
   *D.4970_5 = D.4976_11;
 
 
 The pattern recognizer now goes through its list of patterns it tries
 to detect.  The first successful one is vect_recog_over_widening_pattern.
 This will annotate the statements (via related_stmt fields):
 
   D.4973_8 = *D.4972_7;
   D.4974_9 = (int) D.4973_8;
 -- D.4992_26 = (signed short) D.4973_8
   D.4975_10 = D.4974_9  5;
 -- patt.16_31 = D.4992_26  5
   D.4976_11 = (short unsigned int) D.4975_10;
 -- D.4994_32 = (short unsigned int) patt.16_31
   *D.4970_5 = D.4976_11;
 
 
 In the next step, vect_recog_widen_shift_pattern *also* matches, and
 creates a new annotation for the shift statement (using a widening
 shift):
 
   D.4973_8 = *D.4972_7;
   D.4974_9 = (int) D.4973_8;
 -- D.4992_26 = (signed short) D.4973_8
   D.4975_10 = D.4974_9  5;
[-- patt.16_31 = D.4992_26  5]
 -- patt.17_33 = D.4973_8 w 5
   D.4976_11 = (short unsigned int) D.4975_10;
 -- D.4994_32 = (short unsigned int) patt.16_31
   *D.4970_5 = D.4976_11;
 
 
 Since the original statement can only point to a single related_stmt,
 the statement setting patt.16_31 is now longer refered to as related_stmt
 by any other statement.  This causes it to no longer be considered
 relevant for the vectorizer.
 
 However, the statement:
 -- D.4994_32 = (short unsigned int) patt.16_31 
 *is* still considered relevant.  While analysing it, however, the
 vectorizer follows through to the def statement for patt.16_31,
 and gets quite confused to find that it doesn't have a vectype
 (because it wasn't considered by the vectorizer).  The symptom
 is a failing assertion
   gcc_assert (*vectype != NULL_TREE);
 in vect_is_simple_use_1.
 
 
 Now, it seems quite unusual for multiple patterns to match for a
 single original statement.  In fact, most pattern recognizers
 explicitly refuse to even consider statements that were already
 recognized.  However, vect_recog_widen_shift_pattern makes an
 exception:
 
   /* This statement was also detected as over-widening operation (it can't
  be any other pattern, because only over-widening detects shifts).
  LAST_STMT is the final type demotion statement, but its related
  statement is shift.  We analyze the related statement to catch cases:
 
  orig code:
   type a_t;
   itype res;
   TYPE a_T, res_T;
 
   S1 a_T = (TYPE) a_t;
   S2 res_T = a_T  CONST;
   S3 res = (itype)res_T;
 
   (size of type * 2 = size of itype
and size of itype * 2 = size of TYPE)
 
  code after over-widening pattern detection:
 
   S1 a_T = (TYPE) a_t;
-- a_it = (itype) a_t;
   S2 res_T = a_T  CONST;
   S3 res = (itype)res_T;  --- LAST_STMT
-- res = a_it  CONST;
 
  after widen_shift:
 
   S1 a_T = (TYPE) a_t;
-- a_it = (itype) a_t; - redundant
   S2 res_T = a_T  CONST;
   S3 res = (itype)res_T;
-- res = a_t w CONST;
 
   i.e., we replace the three statements with res = a_t w CONST.  */
 
 
 If everything were indeed as described in that comment, things would work out
 fine.  However, what is described above as code after over-widening pattern
 detection is only one of two possible outcomes of that latter pattern; the
 other is the one that happens in the current test case, where we still have
 a final type conversion left after applying the over-widening pattern.
 
 
 I guess one could try to distiguish the two cases somehow and handle both;
 but the overall approach seems quite fragile to me; it doesn't look really
 maintainable to have to rely on so many details of the operation of one
 particular pattern detection function while writing another one, or else
 risk creating subtle problems like the one described above.
 
 So I was wondering why vect_recog_widen_shift_pattern tries to take advantage
 of an already recognized over-widening pattern.  But indeed, if it does not,
 it will generate less efficient code in cases like the above test case: by
 itself vect_recog_widen_shift_pattern, would generate code to expand the
 char to a short, then do a widening shift resulting in an int, followed
 by narrowing back down to a 

Re: [rfc] PR tree-optimization/52633 - ICE due to vectorizer pattern detection collision

2012-04-25 Thread Jakub Jelinek
On Wed, Apr 25, 2012 at 10:29:36AM +0200, Richard Guenther wrote:
  Does this look reasonable?  Any comments or suggestions appreciated!
 
 Yes, getting rid of this fragile interaction by doing more work in
 vect_recog_widen_shift_pattern sounds like the correct thing to do.

Or give up when seeing already pattern recognized stmts when detecting
different pattern, unless the current pattern recognizer is prepared to
handle them (and in that case tweak everything as necessary).

E.g. several pattern recognizers already start with
  if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo))
return NULL;

Jakub


[PATCH, tree-optimization] Fix for PR 52868

2012-04-25 Thread Igor Zamyatin
Hi all!

I'd like to post for review the patch which makes some costs adjusting
in get_computation_cost_at routine in ivopts part.
As mentioned in the PR changes also fix the bwaves regression from PR 52272.
Also changes introduce no degradations on spec2000/2006 and
EEMBC1.1/2.0(this was measured on Atom) on x86


Bootstrapped and regtested on x86. Ok to commit?


Changelog:

2012-04-26  Yuri Rumyantsev  yuri.rumyant...@intel.com

* tree-ssa-loop-ivopts.c (get_computation_cost_at): Add cost
of extra addition if cost of address difference is high enough.

Thanks,
Igor


ivopts.patch
Description: Binary data


Re: move default_tree_printer out of toplev.c

2012-04-25 Thread Manuel López-Ibáñez
On 25 April 2012 10:27, Richard Guenther rguent...@suse.de wrote:
 On Tue, 24 Apr 2012, Manuel López-Ibáñez wrote:

 It seems an obvious cleanup to me.

 OK?

 2012-04-24  Manuel López-Ibáñez  m...@gcc.gnu.org

 gcc/
       * tree-pretty-print.h (default_tree_printer): Do not declare.
       * tree-diagnostic.c: Include tree-pretty-print.h, tree-pass.h and
       intl.h.
       (default_tree_diagnostic_starter): Make static.
       (default_tree_printer): Move to here. Make static.
       (tree_diagnostics_defaults): New.
       * tree-diagnostic.h (default_tree_diagnostic_starter): Do not declare.
       * tree.c (free_lang_data): Use tree_diagnostics_defaults.

 Bogus changelog - you changed general_init.

Not sure what you mean. I changed tree.c (free_lang_data) and toplev.c
(general_init). to use tree_diagnostics_defaults. What should I have
said in the Changelog?

Cheers,

Manuel.


Re: combine_conversions int-double-int

2012-04-25 Thread Richard Guenther
On Wed, Apr 25, 2012 at 10:12 AM, Marc Glisse marc.gli...@inria.fr wrote:
 Hello,

 a conversion like int-double-int is just the identity, as long as double
 is big enough to represent all ints exactly. The most natural way I found to
 do this optimization is the attached:

 2012-04-25  Marc Glisse  marc.gli...@inria.fr

        PR middle-end/27139
        * tree-ssa-forwprop.c (combine_conversions): Handle INT-FP-INT.

 Does the approach make sense? I don't know that code, and adding FLOAT_EXPR
 / FIX_TRUNC_EXPR was a bit of guesswork. The precision of double could be
 multiplied by log2(base), but not doing it is safe. If the approach is ok, I
 could extend it so int-double-long also skips the intermediate conversion.

 Bootstrapped and regression tested on linux-x86_64.

 Should I try and write a testcase for a specific target checking for
 specific asm instructions there, or is there a better way?

Well, scanning the forwprop dump for example.

Btw, I think not munging this new case into the existing CONVERT_EXPR_P
code would be better - it makes the code (even) harder to understand and
I'm not convinced that adding FLOAT_EXPR/FIX_TRUNC_EXPR handling
does not wreck any assumptions in that code.

It also seems that for DECIMAL_FLOAT_TYPE_P the transform is always
valid?

Richard.

 (note that I can't commit)

 --
 Marc Glisse


Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-25 Thread Richard Guenther
On Tue, Apr 24, 2012 at 11:26 PM, Teresa Johnson tejohn...@google.com wrote:
 This patch adds heuristics to limit unrolling in loops with branches that may 
 increase
 branch mispredictions. It affects loops that are not frequently iterated, and 
 that are
 nested within a hot region of code that already contains many branch 
 instructions.

 Performance tested with both internal benchmarks and with SPEC 2000/2006 on a 
 variety
 of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD 
 Opteron systems.
 This improves performance of an internal search indexing benchmark by close 
 to 2% on
 all the tested Intel platforms.  It also consistently improves 445.gobmk 
 (with FDO feedback
 where unrolling kicks in) by close to 1% on AMD Opteron. Other performance 
 effects are
 neutral.

 Bootstrapped and tested on x86_64-unknown-linux-gnu.  Is this ok for trunk?

 Thanks,
 Teresa

 2012-04-24   Teresa Johnson  tejohn...@google.com

        * loop-unroll.c (loop_has_call): New function.
        (loop_has_FP_comp): Ditto.
        (compute_weighted_branches): Ditto.
        (max_unroll_with_branches): Ditto.
        (decide_unroll_constant_iterations): Add heuristic to avoid
        increasing branch mispredicts when unrolling.
        (decide_unroll_runtime_iterations): Ditto.
        * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param.
        (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto.

 Index: loop-unroll.c
 ===
 --- loop-unroll.c       (revision 186783)
 +++ loop-unroll.c       (working copy)
 @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc
                                             basic_block);
  static rtx get_expansion (struct var_to_expand *);

 +/* Determine whether LOOP contains call.  */
 +static bool
 +loop_has_call(struct loop *loop)
 +{
 +  basic_block *body, bb;
 +  unsigned i;
 +  rtx insn;
 +
 +  body = get_loop_body (loop);

You repeatedly do this and walk over all blocks.  Please think about
compile-time
issues when writing code.

This all looks sort-of target specific to me and I don't see why this
very specialized
patch is a good idea when unrolling does a very poor job deciding what and how
much to unroll generally.

Richard.

 +  for (i = 0; i  loop-num_nodes; i++)
 +    {
 +      bb = body[i];
 +
 +      FOR_BB_INSNS (bb, insn)
 +        {
 +          if (CALL_P (insn))
 +            {
 +              free (body);
 +              return true;
 +            }
 +        }
 +    }
 +  free (body);
 +  return false;
 +}
 +
 +/* Determine whether LOOP contains floating-point computation.  */
 +static bool
 +loop_has_FP_comp(struct loop *loop)
 +{
 +  rtx set, dest;
 +  basic_block *body, bb;
 +  unsigned i;
 +  rtx insn;
 +
 +  body = get_loop_body (loop);
 +  for (i = 0; i  loop-num_nodes; i++)
 +    {
 +      bb = body[i];
 +
 +      FOR_BB_INSNS (bb, insn)
 +        {
 +          set = single_set (insn);
 +          if (!set)
 +            continue;
 +
 +          dest = SET_DEST (set);
 +          if (FLOAT_MODE_P (GET_MODE (dest)))
 +            {
 +              free (body);
 +              return true;
 +            }
 +        }
 +    }
 +  free (body);
 +  return false;
 +}
 +
 +/* Compute the number of branches in LOOP, weighted by execution counts.  */
 +static float
 +compute_weighted_branches(struct loop *loop)
 +{
 +  int header_count = loop-header-count;
 +  unsigned i;
 +  float n;
 +  basic_block * body;
 +
 +  /* If no profile feedback data exists, don't limit unrolling  */
 +  if (header_count == 0)
 +    return 0.0;
 +
 +  gcc_assert (loop-latch != EXIT_BLOCK_PTR);
 +
 +  body = get_loop_body (loop);
 +  n = 0.0;
 +  for (i = 0; i  loop-num_nodes; i++)
 +    {
 +      if (EDGE_COUNT (body[i]-succs) = 2)
 +        {
 +          /* If this block is executed less frequently than the header (loop
 +             entry), then it is weighted based on the ratio of times it is
 +             executed compared to the header.  */
 +          if (body[i]-count  header_count)
 +            n += ((float)body[i]-count)/header_count;
 +
 +          /* When it is executed more frequently than the header (i.e. it is
 +             in a nested inner loop), simply weight the branch at 1.0.  */
 +          else
 +            n += 1.0;
 +        }
 +    }
 +  free (body);
 +
 +  return n;
 +}
 +
 +/* Compute the maximum number of times LOOP can be unrolled without exceeding
 +   a branch budget, which can increase branch mispredictions. The number of
 +   branches is computed by weighting each branch with its expected execution
 +   probability through the loop based on profile data. If no profile feedback
 +   data exists, simply return the current NUNROLL factor.  */
 +static unsigned
 +max_unroll_with_branches(struct loop *loop, unsigned nunroll)
 +{
 +  struct loop *outer;
 +  struct niter_desc *outer_desc;
 +  int outer_niters = 1;
 +  float weighted_outer_branches = 

Re: [PATCH 04/11] Fix expansion point loc for macro-like tokens

2012-04-25 Thread Dodji Seketeli
Jason Merrill ja...@redhat.com writes:

 On 04/10/2012 03:42 PM, Dodji Seketeli wrote:
 In that case, besides returning NULL, enter_macro_context sets
 pfile-context-c.macro to NULL, making cpp_get_token_1 forget to set
 the location of the vari to the expansion point of A.

 This seems like a bug that should be fixed rather than worked around;
 we are still expanding A.

Right.  Below is an updated patch (with an updated introductory text)
that addresses the core of the issue.

Consider the test case gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c.
Its interesting part is:

#define A(x) vari x /* line 7.  */
#define vari(x)
#define B , varj
int A(B) ;  /* line 10.  */

In its initial version, this test was being pre-processed as:

# 1 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c
# 1 build/gcc//
# 1 command-line
# 1 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c
# 10 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c
int
# 7 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c
 vari

, varj ;

Note how int and vari are on separate lines, whereas int and
, varj are on the same line.

This looks like a bug to me, even independantly from the macro
location tracking work.

With macro location tracking turned on, the preprocessed output
becomes:

# 1 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c
# 1 command-line
# 1 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c
# 10 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c
int vari , varj ;

Which, IMO, is what we'd expect.

This is due to an unexpected side effect of enter_macro_context when
passed a token that might look like a function-like macro at first
sight, but that it eventually considers to not be a macro after all.

This is the case for the vari token which looks like a macro when it
is first lexed, but is eventually considered to be a normal token by
enter_macro_context because it's not used as a function-like macro
invocation.

In that case, besides returning NULL, enter_macro_context sets
pfile-context-c.macro to NULL, making cpp_get_token_1 forget to set
the location of the vari to the expansion point of A.

enter_macro_context sets pfile-context-c.macro to NULL in that case
because funlike_invocation_p reads one token pass foo, sees that
there is no '(' token, so we are not invoking the function-like
parameter.  It then puts the tokens (which it has read after foo)
back into the tokens stream by calling _cpp_push_token_context on it,
which sets pfile-context-c.macro to NULL.

The fix here is to prevent funlike_invocation_p from forgetting the
current macro-ness.

Tested on x86_64-unknown-linux-gnu against trunk.  Now this test has
the same output with and without tracking locations accross macro
expansions.

Note that the bootstrap with -ftrack-macro-expansion exhibits other
separate issues that are addressed in subsequent patches.  This patch
just fixes one class of problems.

The patch does pass bootstrap with -ftrack-macro-expansion turned off,
though.

libcpp/
* macro.c (funlike_invocation_p): Don't forget macro-ness.

gcc/testsuite/

* gcc.dg/debug/dwarf2/pr41445-5.c: Adjust.
* gcc.dg/debug/dwarf2/pr41445-6.c: Likewise.
---
 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c |5 -
 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-6.c |5 -
 libcpp/macro.c|   12 +++-
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c 
b/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c
index 03af604..d21acd5 100644
--- a/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c
@@ -9,6 +9,9 @@
 #define B , varj
 int A(B) ;
 
-/* { dg-final { scan-assembler 
DW_TAG_variable\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\vari\[^\\r\\n\]*DW_AT_name(\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*DW_AT_)*\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\[^0-9a-fA-FxX](0x)?7\[^0-9a-fA-FxX]\[^\\r\\n\]*DW_AT_decl_line
 } } */
+/*  We want to check that both vari and varj have the same line
+number.  */
+
+/* { dg-final { scan-assembler 
DW_TAG_variable\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\vari\[^\\r\\n\]*DW_AT_name(\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*DW_AT_)*\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\[^0-9a-fA-FxX](0xa|10)\[^0-9a-fA-FxX]\[^\\r\\n\]*DW_AT_decl_line
 } } */
 /* { dg-final { scan-assembler 
DW_TAG_variable\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\varj\[^\\r\\n\]*DW_AT_name(\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*DW_AT_)*\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\[^0-9a-fA-FxX](0xa|10)\[^0-9a-fA-FxX]\[^\\r\\n\]*DW_AT_decl_line
 } } */
 /* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-6.c 
b/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-6.c
index 8aa37d1..d6d79cc 100644
--- a/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-6.c
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-6.c
@@ -4,5 +4,8 @@
 
 #include pr41445-5.c
 
-/* { dg-final { scan-assembler 

Re: [PATCH, tree-optimization] Fix for PR 52868

2012-04-25 Thread Richard Guenther
On Wed, Apr 25, 2012 at 10:56 AM, Igor Zamyatin izamya...@gmail.com wrote:
 Hi all!

 I'd like to post for review the patch which makes some costs adjusting
 in get_computation_cost_at routine in ivopts part.
 As mentioned in the PR changes also fix the bwaves regression from PR 52272.
 Also changes introduce no degradations on spec2000/2006 and
 EEMBC1.1/2.0(this was measured on Atom) on x86


 Bootstrapped and regtested on x86. Ok to commit?

I can't make sense of the patch and the comment does not help.

+  diff_cost = cost.cost;
   cost.cost /= avg_loop_niter (data-current_loop);
+  add_cost_val = add_cost (TYPE_MODE (ctype), data-speed);
+  /* Do cost correction if address cost is small enough
+ and difference cost is high enough.  */
+  if (address_p  diff_cost  add_cost_val
+   get_address_cost (symbol_present, var_present,
+   offset, ratio, cstepi,
+   TYPE_MODE (TREE_TYPE (utype)),
+   TYPE_ADDR_SPACE (TREE_TYPE (utype)),
+   speed, stmt_is_after_inc,
+   can_autoinc).cost = add_cost_val)
+cost.cost += add_cost_val;

Please explain more thoroughly.  It also would seem to be better to add
an extra case, as later code does

  /* Now the computation is in shape symbol + var1 + const + ratio * var2.
 (symbol/var1/const parts may be omitted).  If we are looking for an
 address, find the cost of addressing this.  */
  if (address_p)
return add_costs (cost,
  get_address_cost (symbol_present, var_present,
offset, ratio, cstepi,
TYPE_MODE (TREE_TYPE (utype)),
TYPE_ADDR_SPACE (TREE_TYPE (utype)),
speed, stmt_is_after_inc,
can_autoinc));

thus refactoring the code a bit would make it possible to CSE the
get_address_cost
call and eventually make it clearer what the code does.

Richard.


 Changelog:

 2012-04-26  Yuri Rumyantsev  yuri.rumyant...@intel.com

        * tree-ssa-loop-ivopts.c (get_computation_cost_at): Add cost
        of extra addition if cost of address difference is high enough.

 Thanks,
 Igor


Re: [PATCH] Fix VRP ICE on x cst1; if (x cmp cst2) (PR tree-optimization/53058, take 2)

2012-04-25 Thread Jakub Jelinek
On Tue, Apr 24, 2012 at 10:33:58AM +0200, Richard Guenther wrote:
  So like this instead?
 
 Yes!

Unfortunately that didn't bootstrap, apparently double-int.h is included in
gen* which aren't linked with double-int.o, and the inlines stay at -O0
or with -fkeep-inline-functions.  Even guarding with #ifndef GENERATOR_FILE
doesn't work, because gengtype is apparently built both with that and
without.

So this new version instead just prototypes them in double-int.h and defines
in double-int.c.  Still ok?

2012-04-25  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/53058
* double-int.h (double_int_max_value, double_int_min_value): New
prototypes.
* double-int.c (double_int_max_value, double_int_min_value): New
functions.
* tree-vrp.c (register_edge_assert_for_2): Compare mask
for LE_EXPR or GT_EXPR with double_int_max_value
instead of double_int_mask.

* gcc.c-torture/compile/pr53058.c: New test.

--- gcc/double-int.h.jj 2012-03-29 12:01:31.623648408 +0200
+++ gcc/double-int.h2012-04-24 17:54:37.798945484 +0200
@@ -1,5 +1,5 @@
 /* Operations with long integers.
-   Copyright (C) 2006, 2007, 2008, 2010 Free Software Foundation, Inc.
+   Copyright (C) 2006, 2007, 2008, 2010, 2012 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -242,6 +242,9 @@ double_int double_int_sext (double_int, 
 double_int double_int_zext (double_int, unsigned);
 double_int double_int_mask (unsigned);
 
+double_int double_int_max_value (unsigned int, bool);
+double_int double_int_min_value (unsigned int, bool);
+
 #define ALL_ONES (~((unsigned HOST_WIDE_INT) 0))
 
 /* The operands of the following comparison functions must be processed
--- gcc/double-int.c.jj 2012-04-02 21:40:10.0 +0200
+++ gcc/double-int.c2012-04-24 17:55:44.745723956 +0200
@@ -1,5 +1,5 @@
 /* Operations with long integers.
-   Copyright (C) 2006, 2007, 2009, 2010 Free Software Foundation, Inc.
+   Copyright (C) 2006, 2007, 2009, 2010, 2012 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -616,6 +616,26 @@ double_int_mask (unsigned prec)
   return mask;
 }
 
+/* Returns a maximum value for signed or unsigned integer
+   of precision PREC.  */
+
+double_int
+double_int_max_value (unsigned int prec, bool uns)
+{
+  return double_int_mask (prec - (uns ? 0 : 1));
+}
+
+/* Returns a minimum value for signed or unsigned integer
+   of precision PREC.  */
+
+double_int
+double_int_min_value (unsigned int prec, bool uns)
+{
+  if (uns)
+return double_int_zero;
+  return double_int_lshift (double_int_one, prec - 1, prec, false);
+}
+
 /* Clears the bits of CST over the precision PREC.  If UNS is false, the bits
outside of the precision are set to the sign bit (i.e., the PREC-th one),
otherwise they are set to zero.
--- gcc/tree-vrp.c.jj   2012-04-24 07:59:58.142773712 +0200
+++ gcc/tree-vrp.c  2012-04-24 17:41:10.473562899 +0200
@@ -4565,6 +4565,7 @@ register_edge_assert_for_2 (tree name, e
   INTEGRAL_TYPE_P (TREE_TYPE (name2))
   IN_RANGE (tree_low_cst (cst2, 1), 1, prec - 1)
   prec = 2 * HOST_BITS_PER_WIDE_INT
+  prec == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (val)))
   live_on_edge (e, name2)
   !has_single_use (name2))
{
@@ -4598,8 +4599,10 @@ register_edge_assert_for_2 (tree name, e
new_val = val2;
  else
{
+ double_int maxval
+   = double_int_max_value (prec, TYPE_UNSIGNED (TREE_TYPE (val)));
  mask = double_int_ior (tree_to_double_int (val2), mask);
- if (double_int_minus_one_p (double_int_sext (mask, prec)))
+ if (double_int_equal_p (mask, maxval))
new_val = NULL_TREE;
  else
new_val = double_int_to_tree (TREE_TYPE (val2), mask);
--- gcc/testsuite/gcc.c-torture/compile/pr53058.c.jj2012-04-24 
17:41:10.475662886 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr53058.c   2012-04-24 
17:41:10.475662886 +0200
@@ -0,0 +1,12 @@
+/* PR tree-optimization/53058 */
+
+int a, b, c;
+
+void
+foo ()
+{
+  c = b  16;
+  if (c  32767)
+c = 0;
+  a = b;
+}


Jakub


Re: combine_conversions int-double-int

2012-04-25 Thread Marc Glisse

First, thanks a lot for answering.

On Wed, 25 Apr 2012, Richard Guenther wrote:


On Wed, Apr 25, 2012 at 10:12 AM, Marc Glisse marc.gli...@inria.fr wrote:

Hello,

a conversion like int-double-int is just the identity, as long as double
is big enough to represent all ints exactly. The most natural way I found to
do this optimization is the attached:

2012-04-25  Marc Glisse  marc.gli...@inria.fr

       PR middle-end/27139
       * tree-ssa-forwprop.c (combine_conversions): Handle INT-FP-INT.

Does the approach make sense? I don't know that code, and adding FLOAT_EXPR
/ FIX_TRUNC_EXPR was a bit of guesswork. The precision of double could be
multiplied by log2(base), but not doing it is safe. If the approach is ok, I
could extend it so int-double-long also skips the intermediate conversion.

Bootstrapped and regression tested on linux-x86_64.




Should I try and write a testcase for a specific target checking for
specific asm instructions there, or is there a better way?


Well, scanning the forwprop dump for example.


No idea how that works, but with some grepping and looking at other tests 
it shouln't be too hard. I'll try, thanks for the pointer.



Btw, I think not munging this new case into the existing CONVERT_EXPR_P
code would be better - it makes the code (even) harder to understand and
I'm not convinced that adding FLOAT_EXPR/FIX_TRUNC_EXPR handling
does not wreck any assumptions in that code.


Ok.


It also seems that for DECIMAL_FLOAT_TYPE_P the transform is always
valid?


What do you mean? I just tested and my patch does the simplification for 
int-_Decimal128-int but not int-_Decimal32-int, which makes sense to 
me as _Decimal32 can't represent exactly every int. For _Decimal64, it 
doesn't do the simplification although it could, because I didn't multiply 
by log2(base), as I said above. I also thought of combining in vrp, where 
the range information could make more simplifications valid, but that 
seemed harder.


--
Marc Glisse


Re: [PATCH 05/11] Make expand_location resolve to locus in main source file

2012-04-25 Thread Dodji Seketeli
 On 04/10/2012 03:49 PM, Dodji Seketeli wrote:
 Apparently, quite some places in the compiler (like the C/C++
 preprocessor, the debug info machinery) expect expand_location to
 resolve to locations that are in the main source file, even if the
 token at stake comes from a macro that was defined in a header
 somewhere.  Turning on -ftrack-macro-expansion by default was
 triggering a lot of failures (not necessarily related to diagnostics)
 because expand_location resolves to spelling locations instead.

 Can you elaborate on these failures?  For debug info I would think
 that the spelling location would be useful information, though I
 suppose without any way to unwind to the expansion context it could
 be a bit confusing.  What is the problem for the preprocessor?

For the preprocessor, consider a short excerpt of the the test
gcc/testsuite/gcc.dg/cpp/avoidpaste1.c:

#define f(x) x
#define g
#define tricky 1.0e ## -1

:: :g: :f(): :f(^): tricky

As the comment in the text says, it should preprocess as:

:: : : : : :^: 1.0e- 1

but it actually preprocess as:

# 1 /home/dodji/devel/git/gcc/tmp/gcc/testsuite/gcc.dg/cpp/avoidpaste1.c
# 1 command-line
# 1 /home/dodji/devel/git/gcc/tmp/gcc/testsuite/gcc.dg/cpp/avoidpaste1.c
# 25 /home/dodji/devel/git/gcc/tmp/gcc/testsuite/gcc.dg/cpp/avoidpaste1.c
:: : : : : :^:
# 14 /home/dodji/devel/git/gcc/tmp/gcc/testsuite/gcc.dg/cpp/avoidpaste1.c
   1.0e- 1

Note how the 1.0e- 1 is not on the same line as its preceding
:: : : : : :^:.

The problem is that the pre-processor code indirectly uses
expand_location in many spots.  For that kind of existing code that is
not related to diagnostics, I am really not confident in changing the
underlying implicit assumption of the that function which is, to
return the expansion point location.  I don't even know before hand
where all these spots are in the code base, so auditing it is hard for
me.

I forgot to say that are also other weird failure like:

FAIL: g++.dg/cdce3.C -std=gnu++98 scan-tree-dump cdce cdce3.C:92: note: 
function call is shrink-wrapped into error conditions.

due to that change.

That's why I prefer erring on the safe side by not changing the
assumption of existing code, and provide a way to expand locations to
their spelling point, just for diagnostics.  I think this is already
an improvement over what we previously had.

And for the debug info cases, I'd propose that if we find specific
examples where the unwinding to the spelling locations is better, then
we'll consider using it at that moment.

-- 
Dodji


Re: [C++ Patch] PR 39970

2012-04-25 Thread Paolo Carlini

On 04/25/2012 01:41 AM, Jason Merrill wrote:

On 04/24/2012 05:24 PM, Paolo Carlini wrote:

Perhaps we aren't checking default arguments unless they're actually
used; a could change that if they aren't dependent.


Your reply reached the mailing list a bit mangled, could you please
clarify?
If the default argument isn't dependent on other template parameters, 
we can check immediately whether it is a suitable template argument 
for its parameter, including being a constant expression.

Just wanted to add that we should indeed do this also to catch things like:

struct blah { int member; };

template char param = blah::member
class template6_blah;

which currently we are also missing.

Paolo.





Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls

2012-04-25 Thread Richard Guenther
On Tue, Apr 24, 2012 at 11:19 PM, Tom de Vries tom_devr...@mentor.com wrote:
 On 17/04/12 14:24, Richard Guenther wrote:
 On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries tom_devr...@mentor.com wrote:
 On 27/01/12 21:37, Tom de Vries wrote:
 On 24/01/12 11:40, Richard Guenther wrote:
 On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries tom_devr...@mentor.com 
 wrote:
 Richard,
 Jakub,

 the following patch fixes PR51879.

 Consider the following test-case:
 ...
 int bar (int);
 void baz (int);

 void
 foo (int y)
 {
  int a;
  if (y == 6)
    a = bar (7);
  else
    a = bar (7);
  baz (a);
 }
 ...

 after compiling at -02, the representation looks like this before 
 tail-merging:
 ...
  # BLOCK 3 freq:1991
  # PRED: 2 [19.9%]  (true,exec)
  # .MEMD.1714_7 = VDEF .MEMD.1714_6(D)
  # USE = nonlocal
  # CLB = nonlocal
  aD.1709_3 = barD.1703 (7);
  goto bb 5;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:8009
  # PRED: 2 [80.1%]  (false,exec)
  # .MEMD.1714_8 = VDEF .MEMD.1714_6(D)
  # USE = nonlocal
  # CLB = nonlocal
  aD.1709_4 = barD.1703 (7);
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 5 freq:1
  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
  # aD.1709_1 = PHI aD.1709_3(3), aD.1709_4(4)
  # .MEMD.1714_5 = PHI .MEMD.1714_7(3), .MEMD.1714_8(4)
  # .MEMD.1714_9 = VDEF .MEMD.1714_5
  # USE = nonlocal
  # CLB = nonlocal
  bazD.1705 (aD.1709_1);
  # VUSE .MEMD.1714_9
  return;
 ...

 the patch allows aD.1709_4 to be value numbered to aD.1709_3, and 
 .MEMD.1714_8
 to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.

 The patch makes sure non-const/pure call results (gimple_vdef and
 gimple_call_lhs) are properly value numbered.

 Bootstrapped and reg-tested on x86_64.

 ok for stage1?

 The following cannot really work:

 @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
    result = vn_reference_lookup_1 (vr1, NULL);
    if (result)
      {
 -      changed = set_ssa_val_to (lhs, result);
 +      tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
 +      if (vdef)
 +       changed |= set_ssa_val_to (vdef, result_vdef);
 +      changed |= set_ssa_val_to (lhs, result);

 because 'result' may be not an SSA name.  It might also not have
 a proper definition statement (if VN_INFO (result)-needs_insertion
 is true).  So you at least need to guard things properly.


 Right. And that also doesn't work if the function is without lhs, such as 
 in the
 new test-case pr51879-6.c.

 I fixed this by storing both lhs and vdef, such that I don't have to derive
 the vdef from the lhs.

 (On a side-note - I _did_ want to remove value-numbering virtual operands
 at some point ...)


 Doing so willl hurt performance of tail-merging in its current form.
 OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that
 value numbering as used in tail-merging has its limitations too.
 Do you have any ideas how to address that one?

 @@ -3359,8 +3366,10 @@ visit_use (tree use)
           /* ???  We should handle stores from calls.  */
           else if (TREE_CODE (lhs) == SSA_NAME)
             {
 +             tree vuse = gimple_vuse (stmt);
               if (!gimple_call_internal_p (stmt)
 -                  gimple_call_flags (stmt)  (ECF_PURE | ECF_CONST))
 +                  (gimple_call_flags (stmt)  (ECF_PURE | ECF_CONST)
 +                     || (vuse  SSA_VAL (vuse) != VN_TOP)))
                 changed = visit_reference_op_call (lhs, stmt);
               else
                 changed = defs_to_varying (stmt);

 ... exactly because of the issue that a stmt has multiple defs.  Btw,
 vuse should have been visited here or be part of our SCC, so, why do
 you need this check?


 Removed now, that was a workaround for a bug in an earlier version of the 
 patch,
 that I didn't need anymore.

 Bootstrapped and reg-tested on x86_64.

 OK for stage1?


 Richard,

 quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html:
 ...
 I think these fixes hint at that we should
 use structural equality as fallback if value-numbering doesn't equate
 two stmt effects.  Thus, treat two stmts with exactly the same operands
 and flags as equal and using value-numbering to canonicalize operands
 (when they are SSA names) for that comparison, or use VN entirely
 if there are no side-effects on the stmt.

 Changing value-numbering of virtual operands, even if it looks correct in 
 the
 simple cases you change, doesn't look like a general solution for the missed
 tail merging opportunities.
 ...

 The test-case pr51879-6.c shows a case where improving value numbering will 
 help
 tail-merging, but structural equality comparison not:
 ...
  # BLOCK 3 freq:1991
  # PRED: 2 [19.9%]  (true,exec)
  # .MEMD.1717_7 = VDEF .MEMD.1717_6(D)
  # USE = nonlocal
  # CLB = nonlocal
  blaD.1708 (5);
  # .MEMD.1717_8 = VDEF .MEMD.1717_7
  # USE = nonlocal
  # CLB = nonlocal
  aD.1712_3 = barD.1704 (7);
  goto bb 5;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:8009
  # 

Re: [PATCH] Fix VRP ICE on x cst1; if (x cmp cst2) (PR tree-optimization/53058, take 2)

2012-04-25 Thread Richard Guenther
On Wed, Apr 25, 2012 at 11:21 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Apr 24, 2012 at 10:33:58AM +0200, Richard Guenther wrote:
  So like this instead?

 Yes!

 Unfortunately that didn't bootstrap, apparently double-int.h is included in
 gen* which aren't linked with double-int.o, and the inlines stay at -O0
 or with -fkeep-inline-functions.  Even guarding with #ifndef GENERATOR_FILE
 doesn't work, because gengtype is apparently built both with that and
 without.

 So this new version instead just prototypes them in double-int.h and defines
 in double-int.c.  Still ok?

Sure.

 2012-04-25  Jakub Jelinek  ja...@redhat.com

        PR tree-optimization/53058
        * double-int.h (double_int_max_value, double_int_min_value): New
        prototypes.
        * double-int.c (double_int_max_value, double_int_min_value): New
        functions.
        * tree-vrp.c (register_edge_assert_for_2): Compare mask
        for LE_EXPR or GT_EXPR with double_int_max_value
        instead of double_int_mask.

        * gcc.c-torture/compile/pr53058.c: New test.

 --- gcc/double-int.h.jj 2012-03-29 12:01:31.623648408 +0200
 +++ gcc/double-int.h    2012-04-24 17:54:37.798945484 +0200
 @@ -1,5 +1,5 @@
  /* Operations with long integers.
 -   Copyright (C) 2006, 2007, 2008, 2010 Free Software Foundation, Inc.
 +   Copyright (C) 2006, 2007, 2008, 2010, 2012 Free Software Foundation, Inc.

  This file is part of GCC.

 @@ -242,6 +242,9 @@ double_int double_int_sext (double_int,
  double_int double_int_zext (double_int, unsigned);
  double_int double_int_mask (unsigned);

 +double_int double_int_max_value (unsigned int, bool);
 +double_int double_int_min_value (unsigned int, bool);
 +
  #define ALL_ONES (~((unsigned HOST_WIDE_INT) 0))

  /* The operands of the following comparison functions must be processed
 --- gcc/double-int.c.jj 2012-04-02 21:40:10.0 +0200
 +++ gcc/double-int.c    2012-04-24 17:55:44.745723956 +0200
 @@ -1,5 +1,5 @@
  /* Operations with long integers.
 -   Copyright (C) 2006, 2007, 2009, 2010 Free Software Foundation, Inc.
 +   Copyright (C) 2006, 2007, 2009, 2010, 2012 Free Software Foundation, Inc.

  This file is part of GCC.

 @@ -616,6 +616,26 @@ double_int_mask (unsigned prec)
   return mask;
  }

 +/* Returns a maximum value for signed or unsigned integer
 +   of precision PREC.  */
 +
 +double_int
 +double_int_max_value (unsigned int prec, bool uns)
 +{
 +  return double_int_mask (prec - (uns ? 0 : 1));
 +}
 +
 +/* Returns a minimum value for signed or unsigned integer
 +   of precision PREC.  */
 +
 +double_int
 +double_int_min_value (unsigned int prec, bool uns)
 +{
 +  if (uns)
 +    return double_int_zero;
 +  return double_int_lshift (double_int_one, prec - 1, prec, false);
 +}
 +
  /* Clears the bits of CST over the precision PREC.  If UNS is false, the bits
    outside of the precision are set to the sign bit (i.e., the PREC-th one),
    otherwise they are set to zero.
 --- gcc/tree-vrp.c.jj   2012-04-24 07:59:58.142773712 +0200
 +++ gcc/tree-vrp.c      2012-04-24 17:41:10.473562899 +0200
 @@ -4565,6 +4565,7 @@ register_edge_assert_for_2 (tree name, e
               INTEGRAL_TYPE_P (TREE_TYPE (name2))
               IN_RANGE (tree_low_cst (cst2, 1), 1, prec - 1)
               prec = 2 * HOST_BITS_PER_WIDE_INT
 +              prec == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (val)))
               live_on_edge (e, name2)
               !has_single_use (name2))
            {
 @@ -4598,8 +4599,10 @@ register_edge_assert_for_2 (tree name, e
            new_val = val2;
          else
            {
 +             double_int maxval
 +               = double_int_max_value (prec, TYPE_UNSIGNED (TREE_TYPE 
 (val)));
              mask = double_int_ior (tree_to_double_int (val2), mask);
 -             if (double_int_minus_one_p (double_int_sext (mask, prec)))
 +             if (double_int_equal_p (mask, maxval))
                new_val = NULL_TREE;
              else
                new_val = double_int_to_tree (TREE_TYPE (val2), mask);
 --- gcc/testsuite/gcc.c-torture/compile/pr53058.c.jj    2012-04-24 
 17:41:10.475662886 +0200
 +++ gcc/testsuite/gcc.c-torture/compile/pr53058.c       2012-04-24 
 17:41:10.475662886 +0200
 @@ -0,0 +1,12 @@
 +/* PR tree-optimization/53058 */
 +
 +int a, b, c;
 +
 +void
 +foo ()
 +{
 +  c = b  16;
 +  if (c  32767)
 +    c = 0;
 +  a = b;
 +}


        Jakub


Re: move default_tree_printer out of toplev.c

2012-04-25 Thread Richard Guenther
On Wed, 25 Apr 2012, Manuel López-Ibáñez wrote:

 On 25 April 2012 10:27, Richard Guenther rguent...@suse.de wrote:
  On Tue, 24 Apr 2012, Manuel López-Ibáñez wrote:
 
  It seems an obvious cleanup to me.
 
  OK?
 
  2012-04-24  Manuel López-Ibáñez  m...@gcc.gnu.org
 
  gcc/
        * tree-pretty-print.h (default_tree_printer): Do not declare.
        * tree-diagnostic.c: Include tree-pretty-print.h, tree-pass.h and
        intl.h.
        (default_tree_diagnostic_starter): Make static.
        (default_tree_printer): Move to here. Make static.
        (tree_diagnostics_defaults): New.
        * tree-diagnostic.h (default_tree_diagnostic_starter): Do not 
  declare.
        * tree.c (free_lang_data): Use tree_diagnostics_defaults.
 
  Bogus changelog - you changed general_init.
 
 Not sure what you mean. I changed tree.c (free_lang_data) and toplev.c
 (general_init). to use tree_diagnostics_defaults. What should I have
 said in the Changelog?

Oops.  Misread the patch.

Sorry for the noise,
Richard.

Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls

2012-04-25 Thread Jakub Jelinek
On Wed, Apr 25, 2012 at 11:57:09AM +0200, Richard Guenther wrote:
 void *foo ()
 {
   return __builtin_return_address (0);
 }
 
 void *bar (_Bool b)
 {
   if (b)
 return foo ();
   else
 return foo ();
 }
 
 int main()
 {
   if (bar(true) == bar(false))
 abort ();
 }
 
 ok ... outside of the scope of standard C, but we certainly _can_ do this.
 Which would question tail-merging the above at all, of course.

I don't think we guarantee the above, after all, even pure functions may
use __builtin_return_address (0) - it doesn't modify memory, and we happily
remove pure calls, CSE the return values etc.

Jakub


Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls

2012-04-25 Thread Tom de Vries
On 25/04/12 12:09, Jakub Jelinek wrote:
 On Wed, Apr 25, 2012 at 11:57:09AM +0200, Richard Guenther wrote:
 void *foo ()
 {
   return __builtin_return_address (0);
 }

 void *bar (_Bool b)
 {
   if (b)
 return foo ();
   else
 return foo ();
 }

 int main()
 {
   if (bar(true) == bar(false))
 abort ();
 }

 ok ... outside of the scope of standard C, but we certainly _can_ do this.
 Which would question tail-merging the above at all, of course.
 
 I don't think we guarantee the above, after all, even pure functions may
 use __builtin_return_address (0) - it doesn't modify memory, and we happily
 remove pure calls, CSE the return values etc.
 

Jakub,

pure:
- no effects except the return value
- return value depends only on the parameters and/or global variables

AFAIU, given this definition, a pure function cannot use
__builtin_return_address since it would mean that the return value depends on
something else than parameters and global variables.

Thanks,
- Tom

   Jakub



Re: PR c/53066 Wshadow should not warn for shadowing an extern function

2012-04-25 Thread Manuel López-Ibáñez
On 25 April 2012 00:01, Joseph S. Myers jos...@codesourcery.com wrote:
 On Sun, 22 Apr 2012, Manuel López-Ibáñez wrote:

 Wshadow warns whenever any declaration shadows a global function
 declaration. This is almost always noise, since most (always?) of the
 time one cannot mistakenly replace a function by another variable. The
 false positives are too common (Linus mentions using the name 'index'
 when including string.h).

 I think the correct rule would be: warn if a variable *with
 pointer-to-function type* shadows a function, and warn if a nested
 function shadows another function, but don't warn for variables shadowing
 functions if the variable has any other type (because if the variable has
 some type that isn't a pointer-to-function, no confusion is likely without
 another error being given).

Right. How does one check that a decl is a nested function?

Cheers,

Manuel.


Re: Continue strict-volatile-bitfields fixes

2012-04-25 Thread Thomas Schwinge
Hi!

On Thu, 19 Apr 2012 19:46:17 +0200, I wrote:
 Here is my current test case, reduced from gcc.dg/tree-ssa/20030922-1.c:
 
 extern void abort (void);
 
 enum tree_code
 {
   BIND_EXPR,
 };
 struct tree_common
 {
   enum tree_code code:8;
 };
 void
 voidify_wrapper_expr (struct tree_common *common)
 {
   switch (common-code)
 {
 case BIND_EXPR:
   if (common-code != BIND_EXPR)
 abort ();
 }
 }
 
 The diff for -fdump-tree-all between compiling with
 -fno-strict-volatile-bitfields and -fstrict-volatile-bitfields begins
 with the following, right in 20030922-1.c.003t.original:
 
 diff -ru fnsvb/20030922-1.c.003t.original fsvb/20030922-1.c.003t.original
 --- fnsvb/20030922-1.c.003t.original2012-04-19 16:51:18.322150866 +0200
 +++ fsvb/20030922-1.c.003t.original 2012-04-19 16:49:18.132088498 +0200
 @@ -7,7 +7,7 @@
switch ((int) common-code)
  {
case 0:;
 -  if (common-code != 0)
 +  if ((BIT_FIELD_REF *common, 32, 0  255) != 0)
  {
abort ();
  }
 
 That is, for -fno-strict-volatile-bitfields the second instance of
 »common-code« it is a component_ref, whereas for
 -fstrict-volatile-bitfields it is a bit_field_ref.  The former will be
 optimized as intended, the latter will not.  So, what should be emitted
 in the -fstrict-volatile-bitfields case?  A component_ref -- but this is
 what Bernd's commit r182545 (for PR51200) changed, I think?  Or should
 later optimization stages learn to better deal with a bit_field_ref, and
 where would this have to happen?

I figured out why this generic code is behaving differently for SH
targets.  I compared to ARM as well as x86, for which indeed the
optimization possibility is not lost even when compiling with
-fstrict-volatile-bitfields.

The component_ref inside the if statement (but not the one in the switch
statement) is fed into optimize_bit_field_compare where it is optimized
to »BIT_FIELD_REF *common, 32, 0  255« only for SH, because
get_best_mode for SH returns SImode (ARM, x86: QImode), because SH has
»#define SLOW_BYTE_ACCESS 1«, and thus the original QImode is widened to
SImode, hoping for better code generation »since it may eliminate
subsequent memory access if subsequent accesses occur to other fields in
the same word of the structure, but to different bytes«.  (Well, the
converse happens here...)  After that, in optimize_bit_field_compare, for
ARM and x86, »nbitsize == lbitsize«, and thus the optimization is
aborted, whereas for SH it will be performed, that is, the component_ref
is replaced with »BIT_FIELD_REF *common, 32, 0  255«.

If I manually force SH's SImode back to QImode (that is, abort
optimize_bit_field_compare), the later optimization passes work as they
do for ARM and x86.

Before Bernd's r182545, optimize_bit_field_compare returned early (that
is, aborted this optimization attempt), because »lbitsize ==
GET_MODE_BITSIZE (lmode)« (8 bits).  (With the current sources, lmode is
VOIDmode.)

Is emmitting »BIT_FIELD_REF *common, 32, 0  255« wrong in this case,
or should a later optimization pass be able to figure out that
»BIT_FIELD_REF *common, 32, 0  255« is in fact the same as
common-code, and then be able to conflate these?  Any suggestions
where/how to tackle this?


Grüße,
 Thomas


pgp0SPag23wvg.pgp
Description: PGP signature


Re: Symbol table 13/many: reachability code rewrite

2012-04-25 Thread Jan Hubicka
 This caused:
 
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53088
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53089

The first one seems just moved warning because of different gimplification 
order.
Fixed thus.  Note that the warning comes out at weird place with input_location,
but that problem existed here before the patch.

I've comitted the following.

I am looking into the fortran coarays now.  It seems that coarray registration
is somehow broken, but I am not even sure how it is supposed to work.

Honza

Index: gcc.target/i386/pr39082-1.c
===
--- gcc.target/i386/pr39082-1.c (revision 186814)
+++ gcc.target/i386/pr39082-1.c (working copy)
@@ -13,7 +13,7 @@ extern int bar1 (union un);
 extern union un bar2 (int);
 
 int
-foo1 (union un u)
+foo1 (union un u) /* { dg-message note: the ABI of passing union with long 
double has changed in GCC 4.4 } */
 {
   bar1 (u);
   return u.i;
@@ -30,6 +30,6 @@ foo2 (void)
 int
 foo3 (int x)
 {
-  union un u = bar2 (x); /* { dg-message note: the ABI of passing union with 
long double has changed in GCC 4.4 } */
+  union un u = bar2 (x);
   return u.i;
 }
Index: ChangeLog
===
--- ChangeLog   (revision 186814)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2012-04-25  Jan Hubicka  j...@suse.cz
+
+   PR middle-end/53088
+   * gcc.target/i386/pr39082-1.c: Update warning location.
+
 2012-04-25  Jakub Jelinek  ja...@redhat.com
 
PR c/52880


[PING][Patch, Testsuite] fix failure in test gcc.dg/pr52283.c

2012-04-25 Thread Greta Yorsh
PING! Here is the original post:
http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01235.html

This patch fixes the failure in gcc.dg/pr52283.c by adding the missing
dg-warning and dg-options.

OK for trunk?

Thanks,
Greta

gcc/testsuite/ChangeLog

2012-04-20  Greta Yorsh  greta.yo...@arm.com

* gcc.dg/pr52283.c: Add missing dg-warning and dg-options.


diff --git a/gcc/testsuite/gcc.dg/pr52283.c b/gcc/testsuite/gcc.dg/pr52283.c
index 33785a5..070e71a 100644
--- a/gcc/testsuite/gcc.dg/pr52283.c
+++ b/gcc/testsuite/gcc.dg/pr52283.c
@@ -1,6 +1,7 @@
 /* Test for case labels not integer constant expressions but folding
to integer constants (used in Linux kernel).  */
 /* { dg-do compile } */
+/* { dg-options -pedantic } */
 
 extern unsigned int u;
 
@@ -9,7 +10,7 @@ b (int c)
 {
   switch (c)
 {
-case (int) (2  | ((4  8) ? 8 : u)):
+case (int) (2  | ((4  8) ? 8 : u)): /* { dg-warning case label is not
an integer constant expression } */
   ;
 }
 }





Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-25 Thread Ulrich Weigand
Richard Sandiford wrote:
 Vladimir Makarov vmaka...@redhat.com writes:
  Taking your results for S390 and ARM with Neon into account, I guess it 
  should be included and probably made by default for these 2 targets (for 
  sure for s390).
 
 OK, thanks to both of you.
 
 Ulrich and Andreas: would you be happy for s390 to use this by default?
 I'll update the patch and install if so.

I've talked to Andreas, and we agree that s390 should use this as default.
If you install the base patch, we'll do the back-end change accordingly.
(I'll also work with Ramana to enable it on ARM where it makes sense,
probably when targeting Cortex-A cores.)

Thanks again to you and Vlad for looking into this long-standing problem!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



Re: [rfc] PR tree-optimization/52633 - ICE due to vectorizer pattern detection collision

2012-04-25 Thread Ulrich Weigand
Jakub Jelinek wrote:
 On Wed, Apr 25, 2012 at 10:29:36AM +0200, Richard Guenther wrote:
   Does this look reasonable?  Any comments or suggestions appreciated!
  
  Yes, getting rid of this fragile interaction by doing more work in
  vect_recog_widen_shift_pattern sounds like the correct thing to do.
 
 Or give up when seeing already pattern recognized stmts when detecting
 different pattern, unless the current pattern recognizer is prepared to
 handle them (and in that case tweak everything as necessary).
 
 E.g. several pattern recognizers already start with
   if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo))
 return NULL;

Yes, we should do that, and it'll fix the ICE for sure, but -as I said
in the original mail- doing *only* this will cause regressions in some
cases because we no longer get the combined fix-over-widening-*and*-
use-widening-shift effect ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



Re: [PR tree-optimization/52558]: RFC: questions on store data race

2012-04-25 Thread Richard Guenther
On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez al...@redhat.com wrote:
 On 04/13/12 03:46, Richard Guenther wrote:

 On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandezal...@redhat.com  wrote:


 Richard.  Thanks so much for reviewing and providing an alternative
 approach, which AFAICT provides superior results.


 A similar effect could be obtained by keeping a flag whether we entered
 the
 path that stored the value before the transform.  Thus, do

   lsm = g2;  // technically not needed, if you also handle loads
   flag = false;
   for (...)
     {
        if (g1)
          {
             if (flag)
               g2 = lsm;
          }
        else
          {
             lsm = 0;
             flag = true;
          }
     }
   if (flag)
     g2 = lsm;


 I have implemented this in the attached patch, and it seems to be generating
 better code than my other if-changed approach.  It also avoids generating
 unnecessary loads that may trap.

 For the original testcase:


 int g_1 = 1;
 int g_2 = 0;

 int func_1(void)
 {
  int l;
  for (l = 0; l  1234; l++)
  {
   if (g_1)
     return l;
   else
     g_2 = 0;
  }
  return 999;
 }

 After all optimizations are done, we are now generating the following for
 the flags approach:

 new:
 func_1:
        movl    g_1(%rip), %edx
        xorl    %eax, %eax
        testl   %edx, %edx
        je      .L5
        rep
        ret
 .L5:
        movl    $0, g_2(%rip)
        movw    $999, %ax
        ret

 Which is significantly better than the unmodified trunk of:

 func_1:
        movl    g_1(%rip), %edx
        movl    g_2(%rip), %eax
        testl   %edx, %edx
        je      .L5
        movl    %eax, g_2(%rip)
        xorl    %eax, %eax
        ret
 .L5:
        movl    $0, g_2(%rip)
        movl    $999, %eax
        ret

 And in other less contrived cases, we generate correct code that avoids
 writes on code paths that did not have it.  For example, in Hans register
 promotion example:

 int count;

 struct obj {
    int data;
    struct obj *next;
 } *q;

 void func()
 {
  struct obj *p;
  for (p = q; p; p = p-next)
        if (p-data  0)
                count++;
 }

 Under the new memory model we should avoid promoting count to a register
 and unilaterally writing to it upon exiting the loop.

 With the attached patch, we now generate:

 func:
 .LFB0:
        movq    q(%rip), %rax
        xorl    %ecx, %ecx
        movl    count(%rip), %edx
        testq   %rax, %rax
        je      .L1
 .L9:
        movl    (%rax), %esi
        testl   %esi, %esi
        jle     .L3
        addl    $1, %edx
        movl    $1, %ecx
 .L3:
        movq    8(%rax), %rax
        testq   %rax, %rax
        jne     .L9
        testb   %cl, %cl
        jne     .L12
 .L1:
        rep
        ret
 .L12:
        movl    %edx, count(%rip)
        ret

 Which is as expected.

 I don't understand your suggestion of:


    lsm = g2;  // technically not needed, if you also handle loads

 that would allow to extend this to cover the cases where the access may

 eventually trap (if you omit the load before the loop).


 Can't I just remove the lsm=g2?  What's this about also handling loads?



 Not sure which is more efficient (you can eventually combine flag
 variables
 for different store locations, combining the if-changed compare is not so
 easy
 I guess).


 So far, I see better code generated with the flags approach, so...


 1. No pass can figure out the equality (or inequality) of g_2_lsm and
 g_2.
  In the PR, Richard mentions that both FRE/PRE can figure this out, but
 they
 are not run after store motion.  DOM should also be able to, but
 apparently
 does not :(.  Tips?


 Well.  Schedule FRE after loop opts ...

 DOM is not prepared to handle loads/stores with differing VOPs - it was
 never
 updated really after the alias-improvements merge.

 2. The GIMPLE_CONDs I create in this patch are currently causing problems
 with complex floats/doubles.  do_jump is complaining that it can't
 compare
 them, so I assume it is too late (in tree-ssa-loop-im.c) to compare
 complex
 values since complex lowering has already happened (??). Is there a more
 canonical way of creating a GIMPLE_COND that may contain complex floats
 at
 this stage?


 As you are handling redundant stores you want a bitwise comparison anyway,
 but yes, complex compares need to be lowered.  But as said, you want a
 bitwise comparison, not a value-comparison.  You can try using


 I can ignore all this.  I have implemented both alternatives, with a target
 hook as suggested, but I'm thinking of removing it altogether and just
 leaving the flags approach.

 Few comments on the patch itself.

 +         new_bb = split_edge (ex);
 +         then_bb = create_empty_bb (new_bb);
 +         if (current_loops  new_bb-loop_father)

 +           add_bb_to_loop (then_bb, new_bb-loop_father);

 +         gsi = gsi_start_bb (new_bb);
 +         t1 = make_rename_temp (TREE_TYPE (ref-mem), NULL);

 Hmm, why do you need to re-load the 

Re: [rfc] PR tree-optimization/52633 - ICE due to vectorizer pattern detection collision

2012-04-25 Thread Ulrich Weigand
Richard Guenther wrote:
 On Tue, 24 Apr 2012, Ulrich Weigand wrote:
  Does this look reasonable?  Any comments or suggestions appreciated!
 
 Yes, getting rid of this fragile interaction by doing more work in
 vect_recog_widen_shift_pattern sounds like the correct thing to do.

OK, I'll try to come up with a patch along those lines.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



[PATCH] Fix bitfield expansion (PR middle-end/52979)

2012-04-25 Thread Jakub Jelinek
Hi!

This patch fixes the attached execute/ testcases.  In some cases
get_best_mode was ignoring bitregion_* restrictions and returned
mode that overlapped into adjacent non-bitfields, or store_bit_field_1
for insv could ignore those restrictions completely.
After fixing that I've run into several issues that the other parts
of the patch fix, namely that get_best_mode returning VOIDmode more often
now may lead to store_split_bit_field being called when it wasn't before,
and when approaching the bitregion_end point we can't just keep using
large units (modes), but need to decrease their size in order not to
overwrite adjacent non-bitfields.  The get_best_mode dropping of % align
change prevents a mutual recursion, e.g. for bitregion_end == 103
it would prevent using  QImode even far before the bit region boundary,
even when not touching anything close to that bit region end.
But the extra
+  (bitregion_end == 0
+ || bitpos - (bitpos % unit) + unit = bitregion_end + 1))
needed to be added after that change, otherwise we'd regress on
c-c++-common/simulate-thread/bitfields-2.c.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-04-25  Jakub Jelinek  ja...@redhat.com

PR middle-end/52979
* stor-layout.c (get_best_mode): Don't return mode with bitsize
larger than maxbits.  Don't compute maxbits modulo align.
Also check that unit bytes long store at bitpos / unit * unit
doesn't affect bits beyond bitregion_end.
* expmed.c (store_bit_field_1): Avoid trying insv if OP_MODE MEM
would not fit into bitregion_start ... bitregion_end + 1 bit
region.
(store_split_bit_field): Decrease unit close to end of bitregion_end
if access is restricted in order to avoid mutual recursion.

* gcc.c-torture/compile/pr52979-1.c: New test.
* gcc.c-torture/execute/pr52979-1.c: New test.
* gcc.c-torture/execute/pr52979-2.c: New test.

--- gcc/stor-layout.c.jj2012-03-26 11:53:20.0 +0200
+++ gcc/stor-layout.c   2012-04-25 10:26:18.881631697 +0200
@@ -2624,7 +2624,7 @@ get_best_mode (int bitsize, int bitpos,
   if (!bitregion_end)
 maxbits = MAX_FIXED_MODE_SIZE;
   else
-maxbits = (bitregion_end - bitregion_start) % align + 1;
+maxbits = bitregion_end - bitregion_start + 1;
 
   /* Find the narrowest integer mode that contains the bit field.  */
   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
@@ -2645,7 +2645,10 @@ get_best_mode (int bitsize, int bitpos,
 (Though at least one Unix compiler ignores this problem:
 that on the Sequent 386 machine.  */
   || MIN (unit, BIGGEST_ALIGNMENT)  align
-  || (largest_mode != VOIDmode  unit  GET_MODE_BITSIZE (largest_mode)))
+  || (largest_mode != VOIDmode  unit  GET_MODE_BITSIZE (largest_mode))
+  || unit  maxbits
+  || (bitregion_end
+  bitpos - (bitpos % unit) + unit  bitregion_end + 1))
 return VOIDmode;
 
   if ((SLOW_BYTE_ACCESS  ! volatilep)
@@ -2663,7 +2666,9 @@ get_best_mode (int bitsize, int bitpos,
   unit = MIN (align, BIGGEST_ALIGNMENT)
   unit = maxbits
   (largest_mode == VOIDmode
- || unit = GET_MODE_BITSIZE (largest_mode)))
+ || unit = GET_MODE_BITSIZE (largest_mode))
+  (bitregion_end == 0
+ || bitpos - (bitpos % unit) + unit = bitregion_end + 1))
wide_mode = tmode;
}
 
--- gcc/expmed.c.jj 2012-04-19 11:09:13.0 +0200
+++ gcc/expmed.c2012-04-24 17:14:34.264897874 +0200
@@ -640,7 +640,13 @@ store_bit_field_1 (rtx str_rtx, unsigned
!(MEM_P (op0)  MEM_VOLATILE_P (op0)
flag_strict_volatile_bitfields  0)
! ((REG_P (op0) || GET_CODE (op0) == SUBREG)
-(bitsize + bitpos  GET_MODE_BITSIZE (op_mode
+(bitsize + bitpos  GET_MODE_BITSIZE (op_mode)))
+  /* Do not use insv if the bit region is restricted and
+op_mode integer at offset doesn't fit into the
+restricted region.  */
+   !(MEM_P (op0)  bitregion_end
+   bitnum - bitpos + GET_MODE_BITSIZE (op_mode)
+  bitregion_end + 1))
 {
   struct expand_operand ops[4];
   int xbitpos = bitpos;
@@ -760,7 +766,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
  || GET_MODE_BITSIZE (GET_MODE (op0))  maxbits
  || (op_mode != MAX_MACHINE_MODE
   GET_MODE_SIZE (GET_MODE (op0))  GET_MODE_SIZE (op_mode)))
-   bestmode = get_best_mode  (bitsize, bitnum,
+   bestmode = get_best_mode (bitsize, bitnum,
  bitregion_start, bitregion_end,
  MEM_ALIGN (op0),
  (op_mode == MAX_MACHINE_MODE
@@ -1096,6 +1102,16 @@ store_split_bit_field (rtx op0, unsigned
   offset = (bitpos + bitsdone) / unit;
   thispos = (bitpos + bitsdone) % unit;
 
+  /* 

[Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c

2012-04-25 Thread Greta Yorsh
The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with neon enabled:
FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect vectorized 1
loops 2

The test expects 2 loops to be vectorized, while gcc successfully vectorizes
3 loops in this test using neon on arm. This patch adjusts the expected
output. Fixed test passes on qemu for arm and powerpc.

OK for trunk?

Thanks,
Greta

gcc/testsuite/ChangeLog

2012-04-23  Greta Yorsh  greta.yo...@arm.com

* gcc.dg/vect/slp-perm-8.c (dg-final): Adjust expected number
of vectorized loops for arm with neon.
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c 
b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
index d211ef9..beaa96c 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
@@ -52,7 +52,9 @@ int main (int argc, const char* argv[])
   return 0;
 }

-/* { dg-final { scan-tree-dump-times vectorized 1 loops 2 vect { target 
vect_perm_byte } } } */
+/* { dg-final { scan-tree-dump-times vectorized 1 loops 1 vect { target { 
vect_perm_byte  arm_neon_ok } } } } */
+/* { dg-final { scan-tree-dump-times vectorized 2 loops 1 vect { target { 
vect_perm_byte  arm_neon_ok } } } }*/
+/* { dg-final { scan-tree-dump-times vectorized 1 loops 2 vect { target { 
vect_perm_byte  {! arm_neon_ok } } } } } */
 /* { dg-final { scan-tree-dump-times vectorizing stmts using SLP 1 vect { 
target vect_perm_byte } } } */
 /* { dg-final { cleanup-tree-dump vect } } */


Re: Continue strict-volatile-bitfields fixes

2012-04-25 Thread Richard Guenther
On Wed, Apr 25, 2012 at 1:27 PM, Thomas Schwinge
tho...@codesourcery.com wrote:
 Hi!

 On Thu, 19 Apr 2012 19:46:17 +0200, I wrote:
 Here is my current test case, reduced from gcc.dg/tree-ssa/20030922-1.c:

     extern void abort (void);

     enum tree_code
     {
       BIND_EXPR,
     };
     struct tree_common
     {
       enum tree_code code:8;
     };
     void
     voidify_wrapper_expr (struct tree_common *common)
     {
       switch (common-code)
         {
         case BIND_EXPR:
           if (common-code != BIND_EXPR)
             abort ();
         }
     }

 The diff for -fdump-tree-all between compiling with
 -fno-strict-volatile-bitfields and -fstrict-volatile-bitfields begins
 with the following, right in 20030922-1.c.003t.original:

 diff -ru fnsvb/20030922-1.c.003t.original fsvb/20030922-1.c.003t.original
 --- fnsvb/20030922-1.c.003t.original    2012-04-19 16:51:18.322150866 +0200
 +++ fsvb/20030922-1.c.003t.original     2012-04-19 16:49:18.132088498 +0200
 @@ -7,7 +7,7 @@
    switch ((int) common-code)
      {
        case 0:;
 -      if (common-code != 0)
 +      if ((BIT_FIELD_REF *common, 32, 0  255) != 0)
          {
            abort ();
          }

 That is, for -fno-strict-volatile-bitfields the second instance of
 »common-code« it is a component_ref, whereas for
 -fstrict-volatile-bitfields it is a bit_field_ref.  The former will be
 optimized as intended, the latter will not.  So, what should be emitted
 in the -fstrict-volatile-bitfields case?  A component_ref -- but this is
 what Bernd's commit r182545 (for PR51200) changed, I think?  Or should
 later optimization stages learn to better deal with a bit_field_ref, and
 where would this have to happen?

 I figured out why this generic code is behaving differently for SH
 targets.  I compared to ARM as well as x86, for which indeed the
 optimization possibility is not lost even when compiling with
 -fstrict-volatile-bitfields.

 The component_ref inside the if statement (but not the one in the switch
 statement) is fed into optimize_bit_field_compare where it is optimized
 to »BIT_FIELD_REF *common, 32, 0  255« only for SH, because
 get_best_mode for SH returns SImode (ARM, x86: QImode), because SH has
 »#define SLOW_BYTE_ACCESS 1«, and thus the original QImode is widened to
 SImode, hoping for better code generation »since it may eliminate
 subsequent memory access if subsequent accesses occur to other fields in
 the same word of the structure, but to different bytes«.  (Well, the
 converse happens here...)  After that, in optimize_bit_field_compare, for
 ARM and x86, »nbitsize == lbitsize«, and thus the optimization is
 aborted, whereas for SH it will be performed, that is, the component_ref
 is replaced with »BIT_FIELD_REF *common, 32, 0  255«.

 If I manually force SH's SImode back to QImode (that is, abort
 optimize_bit_field_compare), the later optimization passes work as they
 do for ARM and x86.

 Before Bernd's r182545, optimize_bit_field_compare returned early (that
 is, aborted this optimization attempt), because »lbitsize ==
 GET_MODE_BITSIZE (lmode)« (8 bits).  (With the current sources, lmode is
 VOIDmode.)

 Is emmitting »BIT_FIELD_REF *common, 32, 0  255« wrong in this case,
 or should a later optimization pass be able to figure out that
 »BIT_FIELD_REF *common, 32, 0  255« is in fact the same as
 common-code, and then be able to conflate these?  Any suggestions
 where/how to tackle this?

The BIT_FIELD_REF is somewhat of a red-herring.  It is created by fold-const.c
in optimize_bit_field_compare, code that I think should be removed completely.
Or it needs to be made aware of strict-volatile bitfield and C++ memory model
details.

Richard.


 Grüße,
  Thomas


Re: [PING][Patch, Testsuite] fix failure in test gcc.dg/pr52283.c

2012-04-25 Thread Richard Guenther
On Wed, Apr 25, 2012 at 1:40 PM, Greta Yorsh greta.yo...@arm.com wrote:
 PING! Here is the original post:
 http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01235.html

 This patch fixes the failure in gcc.dg/pr52283.c by adding the missing
 dg-warning and dg-options.

 OK for trunk?

Didn't I approve this already?

Richard.

 Thanks,
 Greta

 gcc/testsuite/ChangeLog

 2012-04-20  Greta Yorsh  greta.yo...@arm.com

        * gcc.dg/pr52283.c: Add missing dg-warning and dg-options.


 diff --git a/gcc/testsuite/gcc.dg/pr52283.c b/gcc/testsuite/gcc.dg/pr52283.c
 index 33785a5..070e71a 100644
 --- a/gcc/testsuite/gcc.dg/pr52283.c
 +++ b/gcc/testsuite/gcc.dg/pr52283.c
 @@ -1,6 +1,7 @@
  /* Test for case labels not integer constant expressions but folding
    to integer constants (used in Linux kernel).  */
  /* { dg-do compile } */
 +/* { dg-options -pedantic } */

  extern unsigned int u;

 @@ -9,7 +10,7 @@ b (int c)
  {
   switch (c)
     {
 -    case (int) (2  | ((4  8) ? 8 : u)):
 +    case (int) (2  | ((4  8) ? 8 : u)): /* { dg-warning case label is not
 an integer constant expression } */
       ;
     }
  }





Re: PR c/53066 Wshadow should not warn for shadowing an extern function

2012-04-25 Thread Joseph S. Myers
On Wed, 25 Apr 2012, Manuel López-Ibáñez wrote:

 On 25 April 2012 00:01, Joseph S. Myers jos...@codesourcery.com wrote:
  On Sun, 22 Apr 2012, Manuel López-Ibáñez wrote:
 
  Wshadow warns whenever any declaration shadows a global function
  declaration. This is almost always noise, since most (always?) of the
  time one cannot mistakenly replace a function by another variable. The
  false positives are too common (Linus mentions using the name 'index'
  when including string.h).
 
  I think the correct rule would be: warn if a variable *with
  pointer-to-function type* shadows a function, and warn if a nested
  function shadows another function, but don't warn for variables shadowing
  functions if the variable has any other type (because if the variable has
  some type that isn't a pointer-to-function, no confusion is likely without
  another error being given).
 
 Right. How does one check that a decl is a nested function?

I think you should check if the decl has function type or pointer to 
function type.  (Nested functions are simply I think the only valid case 
where you can end up with one function shadowing another; has function 
type is the logical check rather than is a nested function.)

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Fix bitfield expansion (PR middle-end/52979)

2012-04-25 Thread Richard Guenther
On Wed, 25 Apr 2012, Jakub Jelinek wrote:

 Hi!
 
 This patch fixes the attached execute/ testcases.  In some cases
 get_best_mode was ignoring bitregion_* restrictions and returned
 mode that overlapped into adjacent non-bitfields, or store_bit_field_1
 for insv could ignore those restrictions completely.
 After fixing that I've run into several issues that the other parts
 of the patch fix, namely that get_best_mode returning VOIDmode more often
 now may lead to store_split_bit_field being called when it wasn't before,
 and when approaching the bitregion_end point we can't just keep using
 large units (modes), but need to decrease their size in order not to
 overwrite adjacent non-bitfields.  The get_best_mode dropping of % align
 change prevents a mutual recursion, e.g. for bitregion_end == 103
 it would prevent using  QImode even far before the bit region boundary,
 even when not touching anything close to that bit region end.
 But the extra
 +  (bitregion_end == 0
 + || bitpos - (bitpos % unit) + unit = bitregion_end + 1))
 needed to be added after that change, otherwise we'd regress on
 c-c++-common/simulate-thread/bitfields-2.c.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

 2012-04-25  Jakub Jelinek  ja...@redhat.com
 
   PR middle-end/52979
   * stor-layout.c (get_best_mode): Don't return mode with bitsize
   larger than maxbits.  Don't compute maxbits modulo align.
   Also check that unit bytes long store at bitpos / unit * unit
   doesn't affect bits beyond bitregion_end.
   * expmed.c (store_bit_field_1): Avoid trying insv if OP_MODE MEM
   would not fit into bitregion_start ... bitregion_end + 1 bit
   region.
   (store_split_bit_field): Decrease unit close to end of bitregion_end
   if access is restricted in order to avoid mutual recursion.
 
   * gcc.c-torture/compile/pr52979-1.c: New test.
   * gcc.c-torture/execute/pr52979-1.c: New test.
   * gcc.c-torture/execute/pr52979-2.c: New test.
 
 --- gcc/stor-layout.c.jj  2012-03-26 11:53:20.0 +0200
 +++ gcc/stor-layout.c 2012-04-25 10:26:18.881631697 +0200
 @@ -2624,7 +2624,7 @@ get_best_mode (int bitsize, int bitpos,
if (!bitregion_end)
  maxbits = MAX_FIXED_MODE_SIZE;
else
 -maxbits = (bitregion_end - bitregion_start) % align + 1;
 +maxbits = bitregion_end - bitregion_start + 1;
  
/* Find the narrowest integer mode that contains the bit field.  */
for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
 @@ -2645,7 +2645,10 @@ get_best_mode (int bitsize, int bitpos,
(Though at least one Unix compiler ignores this problem:
that on the Sequent 386 machine.  */
|| MIN (unit, BIGGEST_ALIGNMENT)  align
 -  || (largest_mode != VOIDmode  unit  GET_MODE_BITSIZE 
 (largest_mode)))
 +  || (largest_mode != VOIDmode  unit  GET_MODE_BITSIZE (largest_mode))
 +  || unit  maxbits
 +  || (bitregion_end
 +bitpos - (bitpos % unit) + unit  bitregion_end + 1))
  return VOIDmode;
  
if ((SLOW_BYTE_ACCESS  ! volatilep)
 @@ -2663,7 +2666,9 @@ get_best_mode (int bitsize, int bitpos,
  unit = MIN (align, BIGGEST_ALIGNMENT)
  unit = maxbits
  (largest_mode == VOIDmode
 -   || unit = GET_MODE_BITSIZE (largest_mode)))
 +   || unit = GET_MODE_BITSIZE (largest_mode))
 +(bitregion_end == 0
 +   || bitpos - (bitpos % unit) + unit = bitregion_end + 1))
   wide_mode = tmode;
   }
  
 --- gcc/expmed.c.jj   2012-04-19 11:09:13.0 +0200
 +++ gcc/expmed.c  2012-04-24 17:14:34.264897874 +0200
 @@ -640,7 +640,13 @@ store_bit_field_1 (rtx str_rtx, unsigned
 !(MEM_P (op0)  MEM_VOLATILE_P (op0)
   flag_strict_volatile_bitfields  0)
 ! ((REG_P (op0) || GET_CODE (op0) == SUBREG)
 -  (bitsize + bitpos  GET_MODE_BITSIZE (op_mode
 +  (bitsize + bitpos  GET_MODE_BITSIZE (op_mode)))
 +  /* Do not use insv if the bit region is restricted and
 +  op_mode integer at offset doesn't fit into the
 +  restricted region.  */
 +   !(MEM_P (op0)  bitregion_end
 + bitnum - bitpos + GET_MODE_BITSIZE (op_mode)
 +bitregion_end + 1))
  {
struct expand_operand ops[4];
int xbitpos = bitpos;
 @@ -760,7 +766,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 || GET_MODE_BITSIZE (GET_MODE (op0))  maxbits
 || (op_mode != MAX_MACHINE_MODE
  GET_MODE_SIZE (GET_MODE (op0))  GET_MODE_SIZE (op_mode)))
 - bestmode = get_best_mode  (bitsize, bitnum,
 + bestmode = get_best_mode (bitsize, bitnum,
 bitregion_start, bitregion_end,
 MEM_ALIGN (op0),
 (op_mode == MAX_MACHINE_MODE
 @@ -1096,6 +1102,16 @@ store_split_bit_field (rtx op0, unsigned
offset = 

Re: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c

2012-04-25 Thread Richard Guenther
On Wed, Apr 25, 2012 at 1:51 PM, Greta Yorsh greta.yo...@arm.com wrote:
 The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with neon enabled:
 FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect vectorized 1
 loops 2

 The test expects 2 loops to be vectorized, while gcc successfully vectorizes
 3 loops in this test using neon on arm. This patch adjusts the expected
 output. Fixed test passes on qemu for arm and powerpc.

 OK for trunk?

I think the proper fix is to instead of

  for (i = 0; i  N; i++)
{
  input[i] = i;
  output[i] = 0;
  if (input[i]  256)
abort ();
}

use

  for (i = 0; i  N; i++)
{
  input[i] = i;
  output[i] = 0;
  __asm__ volatile ();
}

to prevent vectorization of initialization loops.

 Thanks,
 Greta

 gcc/testsuite/ChangeLog

 2012-04-23  Greta Yorsh  greta.yo...@arm.com

        * gcc.dg/vect/slp-perm-8.c (dg-final): Adjust expected number
        of vectorized loops for arm with neon.


Re: PR c/53066 Wshadow should not warn for shadowing an extern function

2012-04-25 Thread Gabriel Dos Reis
On Wed, Apr 25, 2012 at 5:36 AM, Manuel López-Ibáñez
lopeziba...@gmail.com wrote:
 On 25 April 2012 00:01, Joseph S. Myers jos...@codesourcery.com wrote:
 On Sun, 22 Apr 2012, Manuel López-Ibáñez wrote:

 Wshadow warns whenever any declaration shadows a global function
 declaration. This is almost always noise, since most (always?) of the
 time one cannot mistakenly replace a function by another variable. The
 false positives are too common (Linus mentions using the name 'index'
 when including string.h).

 I think the correct rule would be: warn if a variable *with
 pointer-to-function type* shadows a function, and warn if a nested
 function shadows another function, but don't warn for variables shadowing
 functions if the variable has any other type (because if the variable has
 some type that isn't a pointer-to-function, no confusion is likely without
 another error being given).

 Right. How does one check that a decl is a nested function?

 is DECL_CONTEXT (decl_function_context (t)) a FUNCTION_DECL?


Re: PR c/53066 Wshadow should not warn for shadowing an extern function

2012-04-25 Thread Gabriel Dos Reis
On Wed, Apr 25, 2012 at 6:54 AM, Joseph S. Myers
jos...@codesourcery.com wrote:
 On Wed, 25 Apr 2012, Manuel López-Ibáñez wrote:

 On 25 April 2012 00:01, Joseph S. Myers jos...@codesourcery.com wrote:
  On Sun, 22 Apr 2012, Manuel López-Ibáñez wrote:
 
  Wshadow warns whenever any declaration shadows a global function
  declaration. This is almost always noise, since most (always?) of the
  time one cannot mistakenly replace a function by another variable. The
  false positives are too common (Linus mentions using the name 'index'
  when including string.h).
 
  I think the correct rule would be: warn if a variable *with
  pointer-to-function type* shadows a function, and warn if a nested
  function shadows another function, but don't warn for variables shadowing
  functions if the variable has any other type (because if the variable has
  some type that isn't a pointer-to-function, no confusion is likely without
  another error being given).

 Right. How does one check that a decl is a nested function?

 I think you should check if the decl has function type or pointer to
 function type.  (Nested functions are simply I think the only valid case
 where you can end up with one function shadowing another; has function
 type is the logical check rather than is a nested function.)

indeed.  That should be enough to remove most of the false positives.


Re: PowerPC prologue and epilogue 6

2012-04-25 Thread David Edelsohn
On Wed, Apr 25, 2012 at 1:20 AM, Alan Modra amo...@gmail.com wrote:

 This patch adds a testcase to verify register saves and restores.
 I tried to write it so that it will run on all powerpc targets.  From
 past experience it probably won't.  OK to apply anyway, and fix
 fallout later?

        * gcc.target/powerpc/savres.c: New test.
        * gcc.target/powerpc/powerpc.exp: Run it.

Okay.

Thanks, David


Re: [PATCH] reload: Try alternative with swapped operands before going to the next

2012-04-25 Thread Ulrich Weigand
Andreas Krebbel wrote:

 2011-11-17  Andreas Krebbel  andreas.kreb...@de.ibm.com
 
   * reload.c (find_reloads): Change the loop nesting when trying an
   alternative with swapped operands.

This is OK.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



[PATCH] Fix testsuite fallout of last patch

2012-04-25 Thread Richard Guenther

Committed, tested on x86_64-unknown-linux-gnu.

Richard.

2012-04-25  Richard Guenther  rguent...@suse.de

* gcc.target/i386/l_fma_float_5.c: Adjust.
* gcc.target/i386/l_fma_double_4.c: Likewise.
* gcc.target/i386/l_fma_float_2.c: Likewise.
* gcc.target/i386/l_fma_float_6.c: Likewise.
* gcc.target/i386/l_fma_double_1.c: Likewise.
* gcc.target/i386/l_fma_double_5.c: Likewise.
* gcc.target/i386/l_fma_float_3.c: Likewise.
* gcc.target/i386/l_fma_double_2.c: Likewise.
* gcc.target/i386/l_fma_double_6.c: Likewise.
* gcc.target/i386/l_fma_float_4.c: Likewise.
* gcc.target/i386/l_fma_double_3.c: Likewise.
* gcc.target/i386/l_fma_float_1.c: Likewise.

Index: gcc/testsuite/gcc.target/i386/l_fma_float_5.c
===
--- gcc/testsuite/gcc.target/i386/l_fma_float_5.c   (revision 186812)
+++ gcc/testsuite/gcc.target/i386/l_fma_float_5.c   (working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times vfmsub132ps 8  } } */
 /* { dg-final { scan-assembler-times vfnmadd132ps 8  } } */
 /* { dg-final { scan-assembler-times vfnmsub132ps 8  } } */
-/* { dg-final { scan-assembler-times vfmadd132ss 8  } } */
-/* { dg-final { scan-assembler-times vfmsub132ss 8  } } */
-/* { dg-final { scan-assembler-times vfnmadd132ss 8  } } */
-/* { dg-final { scan-assembler-times vfnmsub132ss 8  } } */
+/* { dg-final { scan-assembler-times vfmadd132ss 16  } } */
+/* { dg-final { scan-assembler-times vfmsub132ss 16  } } */
+/* { dg-final { scan-assembler-times vfnmadd132ss 16  } } */
+/* { dg-final { scan-assembler-times vfnmsub132ss 16  } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_4.c
===
--- gcc/testsuite/gcc.target/i386/l_fma_double_4.c  (revision 186812)
+++ gcc/testsuite/gcc.target/i386/l_fma_double_4.c  (working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times vfmsub132pd 8  } } */
 /* { dg-final { scan-assembler-times vfnmadd132pd 8  } } */
 /* { dg-final { scan-assembler-times vfnmsub132pd 8  } } */
-/* { dg-final { scan-assembler-times vfmadd132sd 8  } } */
-/* { dg-final { scan-assembler-times vfmsub132sd 8  } } */
-/* { dg-final { scan-assembler-times vfnmadd132sd 8  } } */
-/* { dg-final { scan-assembler-times vfnmsub132sd 8  } } */
+/* { dg-final { scan-assembler-times vfmadd132sd 16  } } */
+/* { dg-final { scan-assembler-times vfmsub132sd 16  } } */
+/* { dg-final { scan-assembler-times vfnmadd132sd 16  } } */
+/* { dg-final { scan-assembler-times vfnmsub132sd 16  } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_2.c
===
--- gcc/testsuite/gcc.target/i386/l_fma_float_2.c   (revision 186812)
+++ gcc/testsuite/gcc.target/i386/l_fma_float_2.c   (working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times vfmsub132ps 8  } } */
 /* { dg-final { scan-assembler-times vfnmadd132ps 8  } } */
 /* { dg-final { scan-assembler-times vfnmsub132ps 8  } } */
-/* { dg-final { scan-assembler-times vfmadd132ss 8  } } */
-/* { dg-final { scan-assembler-times vfmsub132ss 8  } } */
-/* { dg-final { scan-assembler-times vfnmadd132ss 8  } } */
-/* { dg-final { scan-assembler-times vfnmsub132ss 8  } } */
+/* { dg-final { scan-assembler-times vfmadd132ss 16  } } */
+/* { dg-final { scan-assembler-times vfmsub132ss 16  } } */
+/* { dg-final { scan-assembler-times vfnmadd132ss 16  } } */
+/* { dg-final { scan-assembler-times vfnmsub132ss 16  } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_6.c
===
--- gcc/testsuite/gcc.target/i386/l_fma_float_6.c   (revision 186812)
+++ gcc/testsuite/gcc.target/i386/l_fma_float_6.c   (working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times vfmsub132ps 8  } } */
 /* { dg-final { scan-assembler-times vfnmadd132ps 8  } } */
 /* { dg-final { scan-assembler-times vfnmsub132ps 8  } } */
-/* { dg-final { scan-assembler-times vfmadd132ss 8  } } */
-/* { dg-final { scan-assembler-times vfmsub132ss 8  } } */
-/* { dg-final { scan-assembler-times vfnmadd132ss 8  } } */
-/* { dg-final { scan-assembler-times vfnmsub132ss 8  } } */
+/* { dg-final { scan-assembler-times vfmadd132ss 16  } } */
+/* { dg-final { scan-assembler-times vfmsub132ss 16  } } */
+/* { dg-final { scan-assembler-times vfnmadd132ss 16  } } */
+/* { dg-final { scan-assembler-times vfnmsub132ss 16  } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_1.c
===
--- gcc/testsuite/gcc.target/i386/l_fma_double_1.c  (revision 186812)
+++ gcc/testsuite/gcc.target/i386/l_fma_double_1.c  (working copy)
@@ -16,11 +16,11 @@
 /* { dg-final { scan-assembler-times vfnmadd231pd 4  } } */
 /* { dg-final { scan-assembler-times vfnmsub132pd 4  } } */
 /* { dg-final { 

fix incorrect SRA transformation on non-integral VIEW_CONVERT argument

2012-04-25 Thread Olivier Hainque
Hello,

For the  PA(1).Z := 44; assignment in the attached Ada
testcase, we observe the gcc 4.5 SRA pass performing an
invalid transformation, turning:

  struct {
system__pack_48__bits_48 OBJ;
  } D.1432;

  D.1432.OBJ = D.1435;
  T1b.F = VIEW_CONVERT_EXPRstruct pt__point(D.1432);

into:

  SR.12_17 = D.1435_3;
  T1b.F = VIEW_CONVERT_EXPRstruct pt__point(SR.12_17);

where we have

var_decl D.1432
 type record_type 0x77fb72a0 BLK
size integer_cst 0x77fac960 constant 48

and
 
var_decl SR.12
 type integer_type system__pack_48__bits_48
size integer_cst 0x77ecd870 64

type integer_type system__pack_48__bits_48___UMT
size integer_cst 64

At least the change in size is problematic because the conversion
outcome might differ after the replacement, in particular on
big-endian targets.

mainline does something slightly different with the same effects
eventually (same testcase failure on sparc-solaris). The attached patch
is a proposal to address this at the point where we already check
for VCE in the access creation process, disqualifying scalarization
for a VCE argument of non-integral size.

We (AdaCore) have been using this internally for a while now.
I also checked that it fixes the observable problem on sparc, then
bootstrapped and regtested on i686-suse-linux.

OK to commit ?

Thanks in advance for your feedback,

Olivier

2012-04-25  Olivier Hainque  hain...@adacore.com

* tree-sra.c (create_access): Additional IN_VCE argument, telling
if EXPR is VIEW_CONVERT_EXPR input.  Disqualify base if access size
is not that of an integer mode in this case.
(build_access_from_expr_1): Adjust caller.

testsuite/
* gnat.dg/sra_vce[_decls].adb: New testcase.



sravce.dif
Description: video/dv


Re: [PATCH 08/11] Make conversion warnings work on NULL with -ftrack-macro-expansion

2012-04-25 Thread Dodji Seketeli
Gabriel Dos Reis g...@integrable-solutions.net writes:

 On Wed, Apr 11, 2012 at 3:46 AM, Dodji Seketeli do...@redhat.com wrote:
 There are various conversion related warnings that trigger on
 potentially dangerous uses of NULL (or __null).  NULL is defined as a
 macro in a system header, so calling warning or warning_at on a virtual
 location of NULL yields no diagnostic.  So the test accompanying this
 patch (as well as others), was failling when run with
 -ftrack-macro-expansion.

 I think it's necessary to use the location of NULL that is in the main
 source code (instead of, e.g, the spelling location that is in the
 system header where the macro is defined) in those cases.  Note that for
 __null, we don't have the issue.

 I like the idea.  However, you should write a separate function
 (get_null_ptr_cst_location?) that computes the location of the null
 pointer constant token and call it from where you need the location.

OK.  I have introduced such a new function and I gave it the slightly
more generic name expansion_point_location_if_in_system_header as I
suspect it can be useful for macros that are similar to NULL.  I haven't
spotted such macros (from test regressions) yet, though.

Here is the updated patch that passes bootstrap with
-ftrack-macro-expansion turned off; it also passes bootstrap with
-ftrack-macro-expansion turned on with the whole series of patches I
have locally on my tree.

gcc/
* input.h (expansion_point_location_if_in_system_header): Declare
new function.
* input.c (expansion_point_location_if_in_system_header): Define it.
gcc/cp/

* call.c (conversion_null_warnings): Use the new
expansion_point_location_if_in_system_header.
* cvt.c (build_expr_type_conversion): Likewise.
* typeck.c (cp_build_binary_op): Likewise.

gcc/testsuite/

* g++.dg/warn/Wconversion-null-2.C: Add testing for __null,
alongside the previous testing for NULL.
---
 gcc/cp/call.c  |   16 ++-
 gcc/cp/cvt.c   |   16 ++-
 gcc/cp/typeck.c|   18 +++--
 gcc/input.c|   14 ++
 gcc/input.h|1 +
 gcc/testsuite/g++.dg/warn/Wconversion-null-2.C |   31 +++-
 6 files changed, 88 insertions(+), 8 deletions(-)

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index f9a7f08..1dc57fc 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -5598,12 +5598,24 @@ conversion_null_warnings (tree totype, tree expr, tree 
fn, int argnum)
   if (expr == null_node  TREE_CODE (totype) != BOOLEAN_TYPE
ARITHMETIC_TYPE_P (totype))
 {
+  /* The location of the warning here is most certainly the one for
+the token that represented null_node in the source code.
+That is either NULL or __null.  If it is NULL, then that
+macro is defined in a system header and, as a consequence,
+warning_at won't emit any diagnostic for it.  In this case,
+we are going to resolve that location to the point in the
+source code where the expression that triggered this error
+message is, rather than the point where the NULL macro is
+defined.  */
+  source_location loc =
+   expansion_point_location_if_in_system_header (input_location);
+
   if (fn)
-   warning_at (input_location, OPT_Wconversion_null,
+   warning_at (loc, OPT_Wconversion_null,
passing NULL to non-pointer argument %P of %qD,
argnum, fn);
   else
-   warning_at (input_location, OPT_Wconversion_null,
+   warning_at (loc, OPT_Wconversion_null,
converting to non-pointer type %qT from NULL, totype);
 }
 
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 3dab372..8defc61 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -1472,8 +1472,20 @@ build_expr_type_conversion (int desires, tree expr, bool 
complain)
   if (expr == null_node
(desires  WANT_INT)
!(desires  WANT_NULL))
-warning_at (input_location, OPT_Wconversion_null,
-   converting NULL to non-pointer type);
+{
+  /* The location of the warning here is the one for the
+(expansion of the) NULL macro, or for __null.  If it is for
+NULL, then, as that that macro is defined in a system header,
+warning_at won't emit any diagnostic.  In this case, we are
+going to resolve that location to the point in the source
+code where the expression that triggered this warning is,
+rather than the point where the NULL macro is defined.  */
+  source_location loc =
+   expansion_point_location_if_in_system_header (input_location);
+
+  warning_at (loc, OPT_Wconversion_null,
+ converting NULL to non-pointer type);
+}
 
   basetype = TREE_TYPE (expr);
 
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c

Re: [PING][Patch, Testsuite] fix failure in test gcc.dg/pr52283.c

2012-04-25 Thread Mike Stump
On Apr 25, 2012, at 4:52 AM, Richard Guenther wrote:
 On Wed, Apr 25, 2012 at 1:40 PM, Greta Yorsh greta.yo...@arm.com wrote:
 PING! Here is the original post:
 http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01235.html
 
 This patch fixes the failure in gcc.dg/pr52283.c by adding the missing
 dg-warning and dg-options.
 
 OK for trunk?
 
 Didn't I approve this already?

http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01361.html

I think that is the right message...  :-)


[PATCH] Default to -gdwarf-4

2012-04-25 Thread Jakub Jelinek
Hi!

For reasonable debugging experience recent GCC versions need
GDB = 7.0 for quite some time, and DWARF4 is almost 2 years old now,
and offers lots of improvements over DWARF2 we still default to.

So, I'd like to make -gdwarf-4 (just the version, of course -g
is needed to enable debug info generation) the default, together
with -fno-debug-types-section (as .debug_types isn't right now always a win,
see the data in the dwz-0.1 announcement plus not all DWARF consumers can
handle it yet or are gaining support only very recently (e.g. valgrind))
and -grecord-gcc-switches which is IMHO worth the few extra bytes per CU
(unless every CU is compiled with different code generation affecting
options usually just 4 extra bytes).  In Fedora we default to this combo
already for some time.  Targets that have tools that are upset by
any extensions that defaulted to -gno-strict-dwarf previously will now
default to -gdwarf-2 as before.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-04-25  Jakub Jelinek  ja...@redhat.com

* common.opt (flag_debug_types_section): Default to 0.
(dwarf_version): Default to 4.
(dwarf_record_gcc_switches): Default to 1.
(dwarf_strict): Default to 0.
* toplev.c (process_options): Don't handle dwarf_strict
or dwarf_version here.
* config/vxworks.c (vxworks_override_options): Don't
test whether dwarf_strict or dwarf_version are negative,
instead test !global_options_set.x_dwarf_*.
* config/darwin.c (darwin_override_options): Default to
dwarf_version 2.
* doc/invoke.texi: Note that -gdwarf-4, -grecord-gcc-switches
and -fno-debug-types-section are now the default.

--- gcc/common.opt.jj   2012-04-25 12:14:55.0 +0200
+++ gcc/common.opt  2012-04-25 12:35:33.944460663 +0200
@@ -967,7 +967,7 @@ Common Joined RejectNegative Var(common_
 Map one directory name to another in debug information
 
 fdebug-types-section
-Common Report Var(flag_debug_types_section) Init(1)
+Common Report Var(flag_debug_types_section) Init(0)
 Output .debug_types section when using DWARF v4 debuginfo.
 
 ; Nonzero for -fdefer-pop: don't pop args after each function call
@@ -2212,7 +2212,7 @@ Common JoinedOrMissing Negative(gdwarf-)
 Generate debug information in COFF format
 
 gdwarf-
-Common Joined UInteger Var(dwarf_version) Init(-1) Negative(gstabs)
+Common Joined UInteger Var(dwarf_version) Init(4) Negative(gstabs)
 Generate debug information in DWARF v2 (or later) format
 
 ggdb
@@ -2220,7 +2220,7 @@ Common JoinedOrMissing
 Generate debug information in default extended format
 
 gno-record-gcc-switches
-Common RejectNegative Var(dwarf_record_gcc_switches,0) Init(0)
+Common RejectNegative Var(dwarf_record_gcc_switches,0) Init(1)
 Don't record gcc command line switches in DWARF DW_AT_producer.
 
 grecord-gcc-switches
@@ -2236,7 +2236,7 @@ Common JoinedOrMissing Negative(gvms)
 Generate debug information in extended STABS format
 
 gno-strict-dwarf
-Common RejectNegative Var(dwarf_strict,0) Init(-1)
+Common RejectNegative Var(dwarf_strict,0) Init(0)
 Emit DWARF additions beyond selected version
 
 gstrict-dwarf
--- gcc/toplev.c.jj 2012-04-25 12:14:55.0 +0200
+++ gcc/toplev.c2012-04-25 12:34:38.016819701 +0200
@@ -1375,15 +1375,6 @@ process_options (void)
}
 }
 
-  /* Unless over-ridden for the target, assume that all DWARF levels
- may be emitted, if DWARF2_DEBUG is selected.  */
-  if (dwarf_strict  0)
-dwarf_strict = 0;
-
-  /* And select a default dwarf level.  */
-  if (dwarf_version  0)
-dwarf_version = 2;
-
   /* A lot of code assumes write_symbols == NO_DEBUG if the debugging
  level is 0.  */
   if (debug_info_level == DINFO_LEVEL_NONE)
--- gcc/config/vxworks.c.jj 2012-04-25 12:14:55.0 +0200
+++ gcc/config/vxworks.c2012-04-25 12:34:04.868027337 +0200
@@ -1,5 +1,5 @@
 /* Common VxWorks target definitions for GNU compiler.
-   Copyright (C) 2007, 2008, 2010
+   Copyright (C) 2007, 2008, 2010, 2012
Free Software Foundation, Inc.
Contributed by CodeSourcery, Inc.
 
@@ -147,9 +147,9 @@ vxworks_override_options (void)
 
   /* Default to strict dwarf-2 to prevent potential difficulties observed with
  non-gdb debuggers on extensions  2.  */
-  if (dwarf_strict  0)
+  if (!global_options_set.x_dwarf_strict)
 dwarf_strict = 1;
 
-  if (dwarf_version  0)
+  if (!global_options_set.x_dwarf_version)
 dwarf_version = 2;
 }
--- gcc/config/darwin.c.jj  2011-12-01 11:45:06.0 +0100
+++ gcc/config/darwin.c 2012-04-25 12:21:03.965982850 +0200
@@ -1,6 +1,6 @@
 /* Functions for generic Darwin as target machine for GNU C compiler.
Copyright (C) 1989, 1990, 1991, 1992, 1993, 2000, 2001, 2002, 2003, 2004,
-   2005, 2006, 2007, 2008, 2009, 2010, 2011
+   2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012
Free Software Foundation, Inc.
Contributed by Apple Computer Inc.
 
@@ -2973,6 +2973,8 

Re: [PATCH] Make sizetypes no longer sign-extending

2012-04-25 Thread Richard Guenther
On Tue, 24 Apr 2012, Richard Guenther wrote:

 On Tue, 24 Apr 2012, Richard Guenther wrote:
 
  
  I've been carrying this patch for quite some while now and really
  want to go forward with it - the problem is that while all default
  languages work fine after this patch Ada shows some testsuite
  regressions.  I've had various hacks/workarounds throughout the
  Ada frontend for them, but lost track of what fixed what and
  they all felt like hacks anyway.
  
  Thus - I know the patch will add Ada testsuite regressions.  But it will
  not break Ada bootstrap.  Ada is not in the set of default languages,
  nor is it considered release critical.
  
  Are the Ada folks happy with helping to fix the fallout after-the-fact
  (I got Eric to fix the bootstrap issues that were first present - thanks
  for that)?  I am happy to revisit my hacks/workarounds and post them,
  but it will be ultimatively easier to review them if you can see
  the FAIL for yourself (there are some workarounds/hacks posted on the
  mailinglist for previous attempts IIRC).
  
  Thanks for your consideration.
  
  The patch is currently under re-testing (it needs the 2nd patch
  below, which was already approved but back in time broke Ada
  bootstrap - I didn't verify if that still occurs).
 
 To followup myself - bootstrap with just the 2nd patch is still
 broken:
 
 /abuild/rguenther/obj2/./gcc/xgcc -B/abuild/rguenther/obj2/./gcc/ 
 -B/usr/local/x86_64-unknown-linux-gnu/bin/ 
 -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem 
 /usr/local/x86_64-unknown-linux-gnu/include -isystem 
 /usr/local/x86_64-unknown-linux-gnu/sys-include-c -g -O2 -m32 -fpic  
 -W -Wall -gnatpg -nostdinc -m32  s-secsta.adb -o s-secsta.o
 s-secsta.adb:501:4: error: size of variable 'System.Secondary_Stack.Chunk' 
 is too large
 Chunk : aliased Chunk_Id (1, Static_Secondary_Stack_Size);
 ^
 make[9]: *** [s-secsta.o] Error 1
 
 And the following is the list of regressions introduced by the combined
 patch set:
 
 === acats tests ===
 FAIL:   a71004a
 FAIL:   c36204d
 FAIL:   c36205l
 FAIL:   c37404b
 FAIL:   c41107a
 FAIL:   c41204a
 FAIL:   c43204c
 FAIL:   c43204e
 FAIL:   c43204f
 FAIL:   c43204g
 FAIL:   c43204h
 FAIL:   c43204i
 FAIL:   c52102a
 FAIL:   c52102c
 FAIL:   c64103c
 FAIL:   c64103d
 FAIL:   c64106a
 FAIL:   c95087a
 FAIL:   cc1224a
 FAIL:   cc1311a
 FAIL:   cc3106b
 FAIL:   cc3224a
 FAIL:   cd2a31a
 
 === acats Summary ===
 # of expected passes2297
 # of unexpected failures23
 
 === gnat tests ===
 
 
 Running target unix/
 FAIL: gnat.dg/array11.adb  (test for warnings, line 12)
 FAIL: gnat.dg/loop_optimization3.adb (test for excess errors)
 FAIL: gnat.dg/loop_optimization3.adb execution test
 FAIL: gnat.dg/object_overflow.adb  (test for warnings, line 8)
 FAIL: gnat.dg/renaming5.adb scan-tree-dump-times optimized goto 2
 FAIL: gnat.dg/return3.adb scan-assembler loc 1 6
 FAIL: gnat.dg/test_8bitlong_overflow.adb (test for excess errors)
 FAIL: gnat.dg/test_8bitlong_overflow.adb execution test
 
 === gnat Summary for unix/ ===
 
 # of expected passes1089
 # of unexpected failures8
 # of expected failures  13
 # of unsupported tests  2
 
 Running target unix//-m32
 FAIL: gnat.dg/array11.adb  (test for warnings, line 12)
 FAIL: gnat.dg/loop_optimization3.adb (test for excess errors)
 FAIL: gnat.dg/loop_optimization3.adb execution test
 FAIL: gnat.dg/object_overflow.adb  (test for warnings, line 8)
 FAIL: gnat.dg/renaming5.adb scan-tree-dump-times optimized goto 2
 FAIL: gnat.dg/return3.adb scan-assembler loc 1 6
 FAIL: gnat.dg/test_8bitlong_overflow.adb (test for excess errors)
 FAIL: gnat.dg/test_8bitlong_overflow.adb execution test
 
 === gnat Summary for unix//-m32 ===
 
 # of expected passes1089
 # of unexpected failures8
 # of expected failures  13
 # of unsupported tests  2
 
 === gnat Summary ===
 
 # of expected passes2178
 # of unexpected failures16
 # of expected failures  26
 # of unsupported tests  4
 
 Most of the ACATS errors are raised STORAGE_ERROR : object too large
 or error: size of variable '...' is too large.  The gnat testsuite
 adds warning: Storage_Error will be raised at run time to this.
 
 I remember one workaround (which usually involves re-setting TREE_OVERFLOW
 at strathegic places) fixes most of ACATS.  I'll try to isolate that
 (but it's a hack and does not feel right).

Ah, and all ACATS fails and

-FAIL: gnat.dg/loop_optimization3.adb (test for excess errors)
-FAIL: gnat.dg/loop_optimization3.adb execution test
-FAIL: gnat.dg/test_8bitlong_overflow.adb (test for excess errors)
-FAIL: gnat.dg/test_8bitlong_overflow.adb execution test

are fixed by for example

Index: trunk/gcc/stor-layout.c
===
--- 

Re: [PATCH, PR38785] Throttle PRE at -O3

2012-04-25 Thread Gerald Pfeifer
On Wed, 25 Apr 2012, Maxim Kuvyrkov wrote:
 OK.  Gerald, does the patch for gcc-4.8/changes.html look OK?

Yes, it does, thank you.  Just add a and the in three places,
as in A new option, the...optimization and at the...level
when committing it, please.

Cheers,
Gerald


[PATCH 10/13] Fix location for static class members

2012-04-25 Thread Dodji Seketeli
Consider the test case g++.dg/other/offsetof5.C:

#include stddef.h

struct A
{
  char c;
  int i;
};

int j = offsetof (A, i);// { dg-warning invalid 
access|offsetof }

template typename T
struct S
{
  T h;
  T i;
  static const int j = offsetof (S, i); // { dg-warning invalid 
access|offsetof }
};

int k = Sint::j;  // { dg-message required from here }

The second warning (that involves the instantiation of the S template)
is not emitted when -ftrack-macro-expansion is on.

This is because during the instantiation of the member j of S
template, the location that is used for the warning is the one for the
DECL j (set by instantiate_decl).  And that location is inaccurately
set to the locus of 'offsetof', which is a macro defined in a system
header, so it's discarded by the diagnostics machinery.

Note that when we reach the point where we emit the warning in
build_class_member_access_expr offsetof expression has long been
folded, so we cannot use e.g, the location of the ')' token that would
have been in the source code.  So I believe the location of 'j' is the
best we can get at this point.

The patch below sets the location of the DECL for 'j' to what I
believe is its precise location; with that, the test case passes with
and without -ftrack-macro-expansion.  But I had to adjust
g++.dg/template/sfinae6_neg.C for that.

Tested on x86_64-unknown-linux-gnu against trunk.

gcc/cp

* decl.c (grokdeclarator): Use the location carried by the
declarator for the DECL of the static class member.

gcc/testsuite/

* g++.dg/template/sfinae6_neg.C: Adjust.
---
 gcc/cp/decl.c   |3 ++-
 gcc/testsuite/g++.dg/template/sfinae6_neg.C |4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 28c7cee..40818a3 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10267,7 +10267,8 @@ grokdeclarator (const cp_declarator *declarator,
  {
/* C++ allows static class members.  All other work
   for this is done by grokfield.  */
-   decl = build_lang_decl (VAR_DECL, unqualified_id, type);
+   decl = build_lang_decl_loc (declarator-id_loc,
+   VAR_DECL, unqualified_id, type);
set_linkage_for_static_data_member (decl);
/* Even if there is an in-class initialization, DECL
   is considered undefined until an out-of-class
diff --git a/gcc/testsuite/g++.dg/template/sfinae6_neg.C 
b/gcc/testsuite/g++.dg/template/sfinae6_neg.C
index d4be5dd..9b7bdfd1 100644
--- a/gcc/testsuite/g++.dg/template/sfinae6_neg.C
+++ b/gcc/testsuite/g++.dg/template/sfinae6_neg.C
@@ -21,9 +21,9 @@ no_type check_is_callable2(...);
 templatetypename F, typename T1, typename T2 = T1
 struct is_callable2
 {
-  static const bool value = 
+  static const bool value = // { dg-error within this context }
 (sizeof(check_is_callable2(typeF(), typeT1(), typeT2()))
- == sizeof(yes_type)); // { dg-error within this context }
+ == sizeof(yes_type));
 };
 
 #define JOIN( X, Y ) DO_JOIN( X, Y )
-- 
Dodji


Re: combine_conversions int-double-int

2012-04-25 Thread Marc Glisse

On Wed, 25 Apr 2012, Richard Guenther wrote:


On Wed, Apr 25, 2012 at 10:12 AM, Marc Glisse marc.gli...@inria.fr wrote:

Hello,

a conversion like int-double-int is just the identity, as long as double
is big enough to represent all ints exactly. The most natural way I found to
do this optimization is the attached:

2012-04-25  Marc Glisse  marc.gli...@inria.fr

       PR middle-end/27139
       * tree-ssa-forwprop.c (combine_conversions): Handle INT-FP-INT.

Does the approach make sense? I don't know that code, and adding FLOAT_EXPR
/ FIX_TRUNC_EXPR was a bit of guesswork. The precision of double could be
multiplied by log2(base), but not doing it is safe. If the approach is ok, I
could extend it so int-double-long also skips the intermediate conversion.

Bootstrapped and regression tested on linux-x86_64.

Should I try and write a testcase for a specific target checking for
specific asm instructions there, or is there a better way?


Well, scanning the forwprop dump for example.

Btw, I think not munging this new case into the existing CONVERT_EXPR_P
code would be better - it makes the code (even) harder to understand and
I'm not convinced that adding FLOAT_EXPR/FIX_TRUNC_EXPR handling
does not wreck any assumptions in that code.

It also seems that for DECIMAL_FLOAT_TYPE_P the transform is always
valid?

Richard.


(note that I can't commit)


Here is take 2 on this patch, which seems cleaner. Bootstrapped and 
regression tested.


gcc/ChangeLog
2012-04-25  Marc Glisse  marc.gli...@inria.fr

PR middle-end/27139
* tree-ssa-forwprop.c (combine_conversions): Handle INT-FP-INT.

gcc/testsuite/ChangeLog
2012-04-25  Marc Glisse  marc.gli...@inria.fr

PR middle-end/27139
* gcc.dg/tree-ssa/forwprop-18.c: New test.


In my patch, the lines with gimple_assign_* are vaguely guessed from what 
is around, I don't pretend to understand them.



While doing this, I noticed what I think is a mistake in a comment:

--- gcc/real.c  (revision 186761)
+++ gcc/real.c  (working copy)
@@ -2814,11 +2814,11 @@ significand_size (enum machine_mode mode
 return 0;

   if (fmt-b == 10)
 {
   /* Return the size in bits of the largest binary value that can be
-held by the decimal coefficient for this mode.  This is one more
+held by the decimal coefficient for this mode.  This is one less
 than the number of bits required to hold the largest coefficient
 of this mode.  */
   double log2_10 = 3.3219281;
   return fmt-p * log2_10;
 }

--
Marc GlisseIndex: testsuite/gcc.dg/tree-ssa/forwprop-18.c
===
--- testsuite/gcc.dg/tree-ssa/forwprop-18.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/forwprop-18.c (revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options -O -fdump-tree-forwprop1 } */
+
+short f1(short n)
+{
+  return (double)n;
+}
+long f2(short n)
+{
+  return (double)n;
+}
+
+long g1(long n)
+{
+  return (float)n;
+}
+short g2(long n)
+{
+  return (float)n;
+}
+
+/* { dg-final { scan-tree-dump-times \\\(float\\\) 2 forwprop1 } } */
+/* { dg-final { scan-tree-dump-not \\\(double\\\) forwprop1 } } */
+/* { dg-final { cleanup-tree-dump forwprop1 } } */

Property changes on: testsuite/gcc.dg/tree-ssa/forwprop-18.c
___
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: tree-ssa-forwprop.c
===
--- tree-ssa-forwprop.c (revision 186761)
+++ tree-ssa-forwprop.c (working copy)
@@ -2403,10 +2403,11 @@ combine_conversions (gimple_stmt_iterato
 {
   gimple stmt = gsi_stmt (*gsi);
   gimple def_stmt;
   tree op0, lhs;
   enum tree_code code = gimple_assign_rhs_code (stmt);
+  enum tree_code code2;
 
   gcc_checking_assert (CONVERT_EXPR_CODE_P (code)
   || code == FLOAT_EXPR
   || code == FIX_TRUNC_EXPR);
 
@@ -2423,11 +2424,13 @@ combine_conversions (gimple_stmt_iterato
 
   def_stmt = SSA_NAME_DEF_STMT (op0);
   if (!is_gimple_assign (def_stmt))
 return 0;
 
-  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
+  code2 = gimple_assign_rhs_code (def_stmt);
+
+  if (CONVERT_EXPR_CODE_P (code2))
 {
   tree defop0 = gimple_assign_rhs1 (def_stmt);
   tree type = TREE_TYPE (lhs);
   tree inside_type = TREE_TYPE (defop0);
   tree inter_type = TREE_TYPE (op0);
@@ -2552,10 +2555,45 @@ combine_conversions (gimple_stmt_iterato
gimple_assign_set_rhs_from_tree (gsi, tem);
  update_stmt (gsi_stmt (*gsi));
  return 1;
}
 }
+  else if (code == FIX_TRUNC_EXPR  code2 == FLOAT_EXPR)
+{
+  tree defop0 = gimple_assign_rhs1 (def_stmt);
+  tree type = TREE_TYPE (lhs);
+  tree inside_type = TREE_TYPE (defop0);
+  tree inter_type = TREE_TYPE (op0);
+  int inside_int = INTEGRAL_TYPE_P 

PATCH: PR debug/52857: DW_OP_GNU_regval_type is generated with INVALID_REGNUM

2012-04-25 Thread H.J. Lu
Hi,

We may generate DW_OP_GNU_regval_type with INVALID_REGNUM.  This patch
adds an assert to make sure that we don't.  OK for trunk?

Thanks.

H.J.
---
2012-04-06  H.J. Lu  hongjiu...@intel.com

PR debug/52857
* dwarf2out.c (dbx_reg_number): Assert return value !=
INVALID_REGNUM.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index ca88fc5..515a824 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -10167,7 +10167,9 @@ dbx_reg_number (const_rtx rtl)
 }
 #endif
 
-  return DBX_REGISTER_NUMBER (regno);
+  regno = DBX_REGISTER_NUMBER (regno);
+  gcc_assert (regno != INVALID_REGNUM);
+  return regno;
 }
 
 /* Optionally add a DW_OP_piece term to a location description expression.


[PATCH 11/13] Fix va_start related location

2012-04-25 Thread Dodji Seketeli
In gcc/testsuite/gcc.dg/pr30457.c, the first warning was not being
emitted because the relevant location was inside the var_start macro
defined in a system header.  It can even point to a token for a
builtin macro there.  This patch unwinds to the first token in real
source code in that case.

Tested on x86_64-unknown-linux-gnu against trunk.

* builtins.c (fold_builtin_next_arg): Unwinds to the first
location in real source code.
---
 gcc/builtins.c |   16 ++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index b47f218..ef90b25 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -12164,8 +12164,20 @@ fold_builtin_next_arg (tree exp, bool va_start_p)
  the default argument promotions, the behavior is undefined.
   */
   else if (DECL_REGISTER (arg))
-warning (0, undefined behaviour when second parameter of 
- %va_start% is declared with %register% storage);
+   {
+ /* There is good chance the current input_location points
+inside the definition of the va_start macro (perhaps on
+the token for builtin) in a system header, so the warning
+will not be emitted.  Use the location in real source
+code.  */
+ source_location current_location =
+   linemap_unwind_to_first_non_reserved_loc (line_table, 
input_location,
+ NULL);
+ warning_at (current_location,
+ 0,
+ undefined behaviour when second parameter of 
+ %va_start% is declared with %register% storage);
+   }
 
   /* We want to verify the second parameter just once before the tree
 optimizers are run and then avoid keeping it in the tree,
-- 
Dodji


Re: [PATCH 08/11] Make conversion warnings work on NULL with -ftrack-macro-expansion

2012-04-25 Thread Gabriel Dos Reis
On Wed, Apr 25, 2012 at 8:42 AM, Dodji Seketeli do...@redhat.com wrote:
 Gabriel Dos Reis g...@integrable-solutions.net writes:

 On Wed, Apr 11, 2012 at 3:46 AM, Dodji Seketeli do...@redhat.com wrote:
 There are various conversion related warnings that trigger on
 potentially dangerous uses of NULL (or __null).  NULL is defined as a
 macro in a system header, so calling warning or warning_at on a virtual
 location of NULL yields no diagnostic.  So the test accompanying this
 patch (as well as others), was failling when run with
 -ftrack-macro-expansion.

 I think it's necessary to use the location of NULL that is in the main
 source code (instead of, e.g, the spelling location that is in the
 system header where the macro is defined) in those cases.  Note that for
 __null, we don't have the issue.

 I like the idea.  However, you should write a separate function
 (get_null_ptr_cst_location?) that computes the location of the null
 pointer constant token and call it from where you need the location.

 OK.  I have introduced such a new function and I gave it the slightly
 more generic name expansion_point_location_if_in_system_header as I
 suspect it can be useful for macros that are similar to NULL.  I haven't
 spotted such macros (from test regressions) yet, though.

Thanks.  But a point of the suggestion was that we won't need the
same comment/explanation duplicated over at least 3 places.  Just
one.  All those three places deal exactly with one instance: null
pointer constant.  That deserves a function in and of itself, which
is documented by the duplicated comments.
Please make that change.  Everything else is OK.  thanks.


Re: [PATCH 11/13] Fix va_start related location

2012-04-25 Thread Gabriel Dos Reis
On Wed, Apr 25, 2012 at 9:04 AM, Dodji Seketeli do...@redhat.com wrote:
 In gcc/testsuite/gcc.dg/pr30457.c, the first warning was not being
 emitted because the relevant location was inside the var_start macro
 defined in a system header.  It can even point to a token for a
 builtin macro there.  This patch unwinds to the first token in real
 source code in that case.

While you are at it, could you also use a non-zero value for the second
argument argument to warning_at?


 Tested on x86_64-unknown-linux-gnu against trunk.

        * builtins.c (fold_builtin_next_arg): Unwinds to the first
        location in real source code.
 ---
  gcc/builtins.c |   16 ++--
  1 files changed, 14 insertions(+), 2 deletions(-)

 diff --git a/gcc/builtins.c b/gcc/builtins.c
 index b47f218..ef90b25 100644
 --- a/gcc/builtins.c
 +++ b/gcc/builtins.c
 @@ -12164,8 +12164,20 @@ fold_builtin_next_arg (tree exp, bool va_start_p)
          the default argument promotions, the behavior is undefined.
       */
       else if (DECL_REGISTER (arg))
 -        warning (0, undefined behaviour when second parameter of 
 -                 %va_start% is declared with %register% storage);
 +       {
 +         /* There is good chance the current input_location points
 +            inside the definition of the va_start macro (perhaps on
 +            the token for builtin) in a system header, so the warning
 +            will not be emitted.  Use the location in real source
 +            code.  */
 +         source_location current_location =
 +           linemap_unwind_to_first_non_reserved_loc (line_table, 
 input_location,
 +                                                     NULL);
 +         warning_at (current_location,
 +                     0,
 +                     undefined behaviour when second parameter of 
 +                     %va_start% is declared with %register% storage);
 +       }

       /* We want to verify the second parameter just once before the tree
         optimizers are run and then avoid keeping it in the tree,
 --
                Dodji


Re: [PATCH 10/13] Fix location for static class members

2012-04-25 Thread Gabriel Dos Reis
On Wed, Apr 25, 2012 at 8:55 AM, Dodji Seketeli do...@redhat.com wrote:
 Consider the test case g++.dg/other/offsetof5.C:

    #include stddef.h

    struct A
    {
      char c;
      int i;
    };

    int j = offsetof (A, i);            // { dg-warning invalid 
 access|offsetof }

    template typename T
    struct S
    {
      T h;
      T i;
      static const int j = offsetof (S, i);     // { dg-warning invalid 
 access|offsetof }
    };

    int k = Sint::j;                  // { dg-message required from here }

 The second warning (that involves the instantiation of the S template)
 is not emitted when -ftrack-macro-expansion is on.

 This is because during the instantiation of the member j of S
 template, the location that is used for the warning is the one for the
 DECL j (set by instantiate_decl).  And that location is inaccurately
 set to the locus of 'offsetof', which is a macro defined in a system
 header, so it's discarded by the diagnostics machinery.

 Note that when we reach the point where we emit the warning in
 build_class_member_access_expr offsetof expression has long been
 folded, so we cannot use e.g, the location of the ')' token that would
 have been in the source code.  So I believe the location of 'j' is the
 best we can get at this point.

 The patch below sets the location of the DECL for 'j' to what I
 believe is its precise location; with that, the test case passes with
 and without -ftrack-macro-expansion.  But I had to adjust
 g++.dg/template/sfinae6_neg.C for that.

 Tested on x86_64-unknown-linux-gnu against trunk.

OK.


 gcc/cp

        * decl.c (grokdeclarator): Use the location carried by the
        declarator for the DECL of the static class member.

 gcc/testsuite/

        * g++.dg/template/sfinae6_neg.C: Adjust.
 ---
  gcc/cp/decl.c                               |    3 ++-
  gcc/testsuite/g++.dg/template/sfinae6_neg.C |    4 ++--
  2 files changed, 4 insertions(+), 3 deletions(-)

 diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
 index 28c7cee..40818a3 100644
 --- a/gcc/cp/decl.c
 +++ b/gcc/cp/decl.c
 @@ -10267,7 +10267,8 @@ grokdeclarator (const cp_declarator *declarator,
              {
                /* C++ allows static class members.  All other work
                   for this is done by grokfield.  */
 -               decl = build_lang_decl (VAR_DECL, unqualified_id, type);
 +               decl = build_lang_decl_loc (declarator-id_loc,
 +                                           VAR_DECL, unqualified_id, type);
                set_linkage_for_static_data_member (decl);
                /* Even if there is an in-class initialization, DECL
                   is considered undefined until an out-of-class
 diff --git a/gcc/testsuite/g++.dg/template/sfinae6_neg.C 
 b/gcc/testsuite/g++.dg/template/sfinae6_neg.C
 index d4be5dd..9b7bdfd1 100644
 --- a/gcc/testsuite/g++.dg/template/sfinae6_neg.C
 +++ b/gcc/testsuite/g++.dg/template/sfinae6_neg.C
 @@ -21,9 +21,9 @@ no_type check_is_callable2(...);
  templatetypename F, typename T1, typename T2 = T1
  struct is_callable2
  {
 -  static const bool value =
 +  static const bool value = // { dg-error within this context }
     (sizeof(check_is_callable2(typeF(), typeT1(), typeT2()))
 -     == sizeof(yes_type)); // { dg-error within this context }
 +     == sizeof(yes_type));
  };

  #define JOIN( X, Y ) DO_JOIN( X, Y )
 --
                Dodji


Re: fix incorrect SRA transformation on non-integral VIEW_CONVERT argument

2012-04-25 Thread Richard Guenther
On Wed, Apr 25, 2012 at 3:37 PM, Olivier Hainque hain...@adacore.com wrote:
 Hello,

 For the  PA(1).Z := 44; assignment in the attached Ada
 testcase, we observe the gcc 4.5 SRA pass performing an
 invalid transformation, turning:

  struct {
    system__pack_48__bits_48 OBJ;
  } D.1432;

  D.1432.OBJ = D.1435;
  T1b.F = VIEW_CONVERT_EXPRstruct pt__point(D.1432);

 into:

  SR.12_17 = D.1435_3;
  T1b.F = VIEW_CONVERT_EXPRstruct pt__point(SR.12_17);

 where we have

    var_decl D.1432
     type record_type 0x77fb72a0 BLK
        size integer_cst 0x77fac960 constant 48

 and

    var_decl SR.12
     type integer_type system__pack_48__bits_48
        size integer_cst 0x77ecd870 64

        type integer_type system__pack_48__bits_48___UMT
            size integer_cst 64

 At least the change in size is problematic because the conversion
 outcome might differ after the replacement, in particular on
 big-endian targets.

 mainline does something slightly different with the same effects
 eventually (same testcase failure on sparc-solaris). The attached patch
 is a proposal to address this at the point where we already check
 for VCE in the access creation process, disqualifying scalarization
 for a VCE argument of non-integral size.

 We (AdaCore) have been using this internally for a while now.
 I also checked that it fixes the observable problem on sparc, then
 bootstrapped and regtested on i686-suse-linux.

 OK to commit ?

 Thanks in advance for your feedback,

Does this really cover all problematic cases?  Ah, the existing code
already rejects all VIEW_CONVERT_EXPRs down in the reference
chain.

I think much better would be to simply disallow any toplevel
VIEW_CONVERT_EXPR of BLKmode, so, something like

Index: gcc/tree-sra.c
===
--- gcc/tree-sra.c  (revision 186817)
+++ gcc/tree-sra.c  (working copy)
@@ -1001,9 +1001,10 @@ build_access_from_expr_1 (tree expr, gim

   /* We need to dive through V_C_Es in order to get the size of its parameter
  and not the result type.  Ada produces such statements.  We are also
- capable of handling the topmost V_C_E but not any of those buried in other
- handled components.  */
-  if (TREE_CODE (expr) == VIEW_CONVERT_EXPR)
+ capable of handling the topmost V_C_E if it has a suitable mode but
+ not any of those buried in other handled components.  */
+  if (TREE_CODE (expr) == VIEW_CONVERT_EXPR
+   TYPE_MODE (TREE_TYPE (expr)) != BLKmode)
 expr = TREE_OPERAND (expr, 0);

   if (contains_view_convert_expr_p (expr))

Does that fix your problems, too?  If so I prefer that.

Thanks,
Richard.

 Olivier

 2012-04-25  Olivier Hainque  hain...@adacore.com

        * tree-sra.c (create_access): Additional IN_VCE argument, telling
        if EXPR is VIEW_CONVERT_EXPR input.  Disqualify base if access size
        is not that of an integer mode in this case.
        (build_access_from_expr_1): Adjust caller.

        testsuite/
        * gnat.dg/sra_vce[_decls].adb: New testcase.



[PATCH 12/13] Adjust relevant test cases wrt -ftrack-macro-expansion=[0|2]

2012-04-25 Thread Dodji Seketeli
Even after all the patches I have already submitted for this
-ftrack-macro-expansion business, some test cases where errors happens
on tokens that are defined in macros see their output change in an
incompatible way, when you run them with or without
-ftrack-macro-expansion.

I think this is expected, because the (spelling) locus inside the
definition of the macro pointed to with -ftrack-macro-expansion is
different from the locus of the expansion point of the macro pointed
to without -ftrack-macro-expansion.

In those cases this patch either adjusts the test case and forces it
be run either with -ftrack-macro-expansion, or it just forces it to be
run without -ftrack-macro-expansion.

There are so many libstdc++ tests that were failing because of that
benign issue that I preferred to just run them with
-ftrack-macro-expansion diabled, after inspecting each of them to be
sure there was nothing more serious underneath.  I believe we can latter
turn on -ftrack-macro-expansion there on a case by case basis.

Boostrapped on x86_64-unknown-linux-gnu against trunk with and without
-ftrack-macro-expansion turned on.

gcc/testsuite/

* c-c++-common/tm/attrib-1.c: Force the test case to run without
-ftrack-macro-expansion.
* c-c++-common/warn-ommitted-condop.c: Likewise.
* gcc.dg/assign-warn-1.c: Likewise.
* gcc.dg/assign-warn-2.c: Likewise.
* gcc.dg/attr-alloc_size.c: Likewise.
* gcc.dg/builtin-stringop-chk-1.c: Likewise.
* gcc.dg/builtin-stringop-chk-2.c: Likewise.
* gcc.dg/builtin-strncat-chk-1.c: Likewise.
* gcc.dg/c90-const-expr-9.c: Likewise.
* gcc.dg/c99-const-expr-9.c: Likewise.
* gcc.dg/cpp/direct2.c: Likewise.  Adjust.
* gcc.dg/cpp/direct2s.c: Likewise.
* gcc/testsuite/gcc.dg/cpp/pr28709.c: Likewise.
* gcc.dg/cpp/pragma-diagnostic-1.c: Likewise.
* gcc.dg/dfp/composite-type.c: Likewise.
* gcc.dg/uninit-6-O0.c: Adjust the test case and force it to run
with -ftrack-macro-expansion
* g++.dg/cpp0x/constexpr-ex3.C: Likewise.
* g++.dg/cpp0x/constexpr-overflow.C: Likewise.
* g++.dg/ext/cleanup-1.C: Likewise.
* g++.dg/ext/gnu-inline-global-reject.C: Likewise.
* g++.dg/template/sfinae10.C: Likewise.
* g++.dg/tm/wrap-2.C: Likewise.
* g++.dg/warn/Wconversion-real-integer.C: Likewise.
* g++.dg/warn/Wsign-conversion.C: Likewise.
* g++.dg/warn/multiple-overflow-warn-1.C: Likewise.
* g++.old-deja/g++.mike/p10769b.C: Likewise.
* g++.dg/warn/Wdouble-promotion.C: Adjust the test case and force
it to run with -ftrack-macro-expansion.
* libstdc++-v3/scripts/testsuite_flags.in: By default, run the
test cases without -ftrack-macro-expansion.
---
 gcc/testsuite/c-c++-common/tm/attrib-1.c   |2 +-
 gcc/testsuite/c-c++-common/warn-ommitted-condop.c  |2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-ex3.C |2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-overflow.C|2 +-
 gcc/testsuite/g++.dg/ext/cleanup-1.C   |2 +-
 .../g++.dg/ext/gnu-inline-global-reject.C  |2 +-
 gcc/testsuite/g++.dg/template/sfinae10.C   |2 +-
 gcc/testsuite/g++.dg/tm/wrap-2.C   |2 +-
 .../g++.dg/warn/Wconversion-real-integer.C |2 +-
 gcc/testsuite/g++.dg/warn/Wdouble-promotion.C  |6 +++---
 gcc/testsuite/g++.dg/warn/Wsign-conversion.C   |2 +-
 .../g++.dg/warn/multiple-overflow-warn-1.C |2 +-
 gcc/testsuite/g++.old-deja/g++.mike/p10769b.C  |2 +-
 gcc/testsuite/gcc.dg/assign-warn-1.c   |2 +-
 gcc/testsuite/gcc.dg/assign-warn-2.c   |2 +-
 gcc/testsuite/gcc.dg/attr-alloc_size.c |2 +-
 gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c  |2 +-
 gcc/testsuite/gcc.dg/builtin-stringop-chk-2.c  |2 +-
 gcc/testsuite/gcc.dg/builtin-strncat-chk-1.c   |2 +-
 gcc/testsuite/gcc.dg/c90-const-expr-9.c|2 +-
 gcc/testsuite/gcc.dg/c99-const-expr-9.c|2 +-
 gcc/testsuite/gcc.dg/cpp/direct2.c |   12 +++-
 gcc/testsuite/gcc.dg/cpp/direct2s.c|2 +-
 gcc/testsuite/gcc.dg/cpp/pr28709.c |8 +---
 gcc/testsuite/gcc.dg/cpp/pragma-diagnostic-1.c |2 +-
 gcc/testsuite/gcc.dg/dfp/composite-type.c  |2 +-
 gcc/testsuite/gcc.dg/uninit-6-O0.c |6 +++---
 libstdc++-v3/scripts/testsuite_flags.in|2 +-
 28 files changed, 42 insertions(+), 38 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/tm/attrib-1.c 
b/gcc/testsuite/c-c++-common/tm/attrib-1.c
index 536aeb3..534fa0e 100644
--- a/gcc/testsuite/c-c++-common/tm/attrib-1.c
+++ b/gcc/testsuite/c-c++-common/tm/attrib-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options -fgnu-tm } */
+/* { dg-options -fgnu-tm -ftrack-macro-expansion=0 } */
 
 

[C++ Patch] PR 53096

2012-04-25 Thread Paolo Carlini

Hi,

this PR is about the resolution of core/1333 being unimplemented, thus 
we reject things like:


struct foo
{
  foo(foo) = default; // ERROR HERE
};

(and this can be annoying, as explained by Eric on the reflector, for 
example when one has to resort to out-of-class defaulting which means 
non-trivial)


Thus, the below elementary patch appears to work fine (I also double 
checked that in such cases the type remains trivial). It's all there is 
to it?


Thanks,
Paolo.

/
/cp
2012-04-25  Paolo Carlini  paolo.carl...@oracle.com

PR c++/53096
* class.c (check_bases_and_members): Implement core/1333, do not
disallow defaulted in the class body special members.

/testsuite
2012-04-25  Paolo Carlini  paolo.carl...@oracle.com

PR c++/53096
* g++.dg/cpp0x/defaulted35.C: New.
* g++.dg/cpp0x/defaulted15.C: Adjust.

Index: testsuite/g++.dg/cpp0x/defaulted15.C
===
--- testsuite/g++.dg/cpp0x/defaulted15.C(revision 186816)
+++ testsuite/g++.dg/cpp0x/defaulted15.C(working copy)
@@ -43,7 +43,7 @@ SA(__has_trivial_copy(E));
 
 struct F
 {
-  F(F) = default; // { dg-error non-const }
+  F(F) = default;
 };
 
 struct G: public F
Index: testsuite/g++.dg/cpp0x/defaulted35.C
===
--- testsuite/g++.dg/cpp0x/defaulted35.C(revision 0)
+++ testsuite/g++.dg/cpp0x/defaulted35.C(revision 0)
@@ -0,0 +1,8 @@
+// PR c++/53096
+// { dg-options -std=c++0x }
+
+struct foo
+{
+  foo(foo) = default;
+  foo operator=(foo) = default;
+};
Index: cp/class.c
===
--- cp/class.c  (revision 186816)
+++ cp/class.c  (working copy)
@@ -1,6 +1,7 @@
 /* Functions related to building classes and their related objects.
Copyright (C) 1987, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
-   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 2010, 2011
+   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 2010, 2011,
+   2012
Free Software Foundation, Inc.
Contributed by Michael Tiemann (tiem...@cygnus.com)
 
@@ -5144,9 +5145,6 @@ check_bases_and_members (tree t)
 give the synthesis error.  */
  error (%q+D declared to take const reference, but implicit 
 declaration would take non-const, fn);
-   else if (imp_const_p  !fn_const_p)
- error (%q+D declared to take non-const reference cannot be 
-defaulted in the class body, fn);
  }
defaulted_late_check (fn);
   }


RE: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c

2012-04-25 Thread Greta Yorsh
Richard Guenther wrote:
 On Wed, Apr 25, 2012 at 3:34 PM, Greta Yorsh greta.yo...@arm.com
 wrote:
  Richard Guenther wrote:
  On Wed, Apr 25, 2012 at 1:51 PM, Greta Yorsh greta.yo...@arm.com
  wrote:
   The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with neon
  enabled:
   FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect
 vectorized
  1
   loops 2
  
   The test expects 2 loops to be vectorized, while gcc successfully
  vectorizes
   3 loops in this test using neon on arm. This patch adjusts the
  expected
   output. Fixed test passes on qemu for arm and powerpc.
  
   OK for trunk?
 
  I think the proper fix is to instead of
 
    for (i = 0; i  N; i++)
      {
        input[i] = i;
        output[i] = 0;
        if (input[i]  256)
          abort ();
      }
 
  use
 
    for (i = 0; i  N; i++)
      {
        input[i] = i;
        output[i] = 0;
        __asm__ volatile ();
      }
 
  to prevent vectorization of initialization loops.
 
  Actually, it looks like both arm and powerpc vectorize this
 initialization loop (line 31), because the control flow is hoisted
 outside the loop by previous optimizations. In addition, arm with neon
 vectorizes the second loop (line 39), but powerpc does not:
 
  39: not vectorized: relevant stmt not supported: D.2163_8 = i_40 * 9;
 
  If this is the expected behaviour for powerpc, then the patch I
 proposed is still needed to fix the test failure on arm. Also, there
 would be no need to disable vectorization of the initialization loop,
 right?
 
 Ah, I thought that was what changed.  Btw, the if () abort () tries to
 disable
 vectorization but does not succeed in doing so.
 
 Richard.

Here is an updated patch. It prevents vectorization of the initialization
loop, as Richard suggested, and updates the expected number of vectorized
loops accordingly. This patch assumes that the second loop in main (line 39)
should only be vectorized on arm with neon.  The test passes for arm and
powerpc.

OK for trunk?

Thank you,
Greta

gcc/testsuite/ChangeLog

2012-04-25  Greta Yorsh  greta.yo...@arm.com

* gcc.dg/vect/slp-perm-8.c (main): Prevent
vectorization of initialization loop. 
(dg-final): Adjust the expected number of 
vectorized loops.




diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c 
b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
index d211ef9..aaa6cbb 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
@@ -32,8 +32,7 @@ int main (int argc, const char* argv[])
 {
   input[i] = i;
   output[i] = 0;
-  if (input[i]  256)
-abort ();
+  __asm__ volatile ();
 }
 
   for (i = 0; i  N / 3; i++)
@@ -52,7 +51,8 @@ int main (int argc, const char* argv[])
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times vectorized 1 loops 2 vect { target 
vect_perm_byte } } } */
+/* { dg-final { scan-tree-dump-times vectorized 1 loops 2 vect { target { 
vect_perm_byte  arm_neon_ok } } } } */
+/* { dg-final { scan-tree-dump-times vectorized 1 loops 1 vect { target { 
vect_perm_byte  {! arm_neon_ok } } } } } */
 /* { dg-final { scan-tree-dump-times vectorizing stmts using SLP 1 vect { 
target vect_perm_byte } } } */
 /* { dg-final { cleanup-tree-dump vect } } */
 


Re: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c

2012-04-25 Thread Richard Guenther
On Wed, Apr 25, 2012 at 4:27 PM, Greta Yorsh greta.yo...@arm.com wrote:
 Richard Guenther wrote:
 On Wed, Apr 25, 2012 at 3:34 PM, Greta Yorsh greta.yo...@arm.com
 wrote:
  Richard Guenther wrote:
  On Wed, Apr 25, 2012 at 1:51 PM, Greta Yorsh greta.yo...@arm.com
  wrote:
   The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with neon
  enabled:
   FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect
 vectorized
  1
   loops 2
  
   The test expects 2 loops to be vectorized, while gcc successfully
  vectorizes
   3 loops in this test using neon on arm. This patch adjusts the
  expected
   output. Fixed test passes on qemu for arm and powerpc.
  
   OK for trunk?
 
  I think the proper fix is to instead of
 
    for (i = 0; i  N; i++)
      {
        input[i] = i;
        output[i] = 0;
        if (input[i]  256)
          abort ();
      }
 
  use
 
    for (i = 0; i  N; i++)
      {
        input[i] = i;
        output[i] = 0;
        __asm__ volatile ();
      }
 
  to prevent vectorization of initialization loops.
 
  Actually, it looks like both arm and powerpc vectorize this
 initialization loop (line 31), because the control flow is hoisted
 outside the loop by previous optimizations. In addition, arm with neon
 vectorizes the second loop (line 39), but powerpc does not:
 
  39: not vectorized: relevant stmt not supported: D.2163_8 = i_40 * 9;
 
  If this is the expected behaviour for powerpc, then the patch I
 proposed is still needed to fix the test failure on arm. Also, there
 would be no need to disable vectorization of the initialization loop,
 right?

 Ah, I thought that was what changed.  Btw, the if () abort () tries to
 disable
 vectorization but does not succeed in doing so.

 Richard.

 Here is an updated patch. It prevents vectorization of the initialization
 loop, as Richard suggested, and updates the expected number of vectorized
 loops accordingly. This patch assumes that the second loop in main (line 39)
 should only be vectorized on arm with neon.  The test passes for arm and
 powerpc.

 OK for trunk?

If arm cannot handle 9 * i then the approrpiate condition would be
vect_int_mult, not arm_neon_ok.

Ok with that change.

Richard.

 Thank you,
 Greta

 gcc/testsuite/ChangeLog

 2012-04-25  Greta Yorsh  greta.yo...@arm.com

        * gcc.dg/vect/slp-perm-8.c (main): Prevent
        vectorization of initialization loop.
        (dg-final): Adjust the expected number of
        vectorized loops.






Re: [PATCH, tree-optimization] Fix for PR 52868

2012-04-25 Thread Igor Zamyatin
On Wed, Apr 25, 2012 at 1:14 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Wed, Apr 25, 2012 at 10:56 AM, Igor Zamyatin izamya...@gmail.com wrote:
 Hi all!

 I'd like to post for review the patch which makes some costs adjusting
 in get_computation_cost_at routine in ivopts part.
 As mentioned in the PR changes also fix the bwaves regression from PR 52272.
 Also changes introduce no degradations on spec2000/2006 and
 EEMBC1.1/2.0(this was measured on Atom) on x86


 Bootstrapped and regtested on x86. Ok to commit?

 I can't make sense of the patch and the comment does not help.

 +      diff_cost = cost.cost;
       cost.cost /= avg_loop_niter (data-current_loop);
 +      add_cost_val = add_cost (TYPE_MODE (ctype), data-speed);
 +      /* Do cost correction if address cost is small enough
 +         and difference cost is high enough.  */
 +      if (address_p  diff_cost  add_cost_val
 +           get_address_cost (symbol_present, var_present,
 +                               offset, ratio, cstepi,
 +                               TYPE_MODE (TREE_TYPE (utype)),
 +                               TYPE_ADDR_SPACE (TREE_TYPE (utype)),
 +                               speed, stmt_is_after_inc,
 +                               can_autoinc).cost = add_cost_val)
 +        cost.cost += add_cost_val;

 Please explain more thoroughly.  It also would seem to be better to add
 an extra case, as later code does

For example for such code

   for (j=0; jM;j++) {
   for (i=0; iN; i++)
   sum += ptr-a[j][i] * ptr-c[k][i];
   }
 we currently have following gimple on x86 target (I provided a piece
of all phase output):

   # ivtmp.13_30 = PHI ivtmp.13_31(3), ivtmp.13_33(7)
   D.1748_34 = (void *) ivtmp.13_30;
   D.1722_7 = MEM[base: D.1748_34, offset: 0B];
   D.1750_36 = ivtmp.27_28;
   D.1751_37 = D.1750_36 + ivtmp.13_30; -- we got
non-invariant add which is not taken into account currently in cost
model
   D.1752_38 = (void *) D.1751_37;
   D.1753_39 = (sizetype) k_8(D);
   D.1754_40 = D.1753_39 * 800;
   D.1723_9 = MEM[base: D.1752_38, index: D.1754_40, offset: 16000B];
   ...

 With proposed fix we produce:

   # ivtmp.14_30 = PHI ivtmp.14_31(3), 0(7)
   D.1749_34 = (struct S *) ivtmp.25_28;
   D.1722_7 = MEM[base: D.1749_34, index: ivtmp.14_30, offset: 0B];
   D.1750_35 = (sizetype) k_8(D);
   D.1751_36 = D.1750_35 * 800;
   D.1752_37 = ptr_6(D) + D.1751_36;
   D.1723_9 = MEM[base: D.1752_37, index: ivtmp.14_30, offset: 16000B];

which is more effective on platforms where address cost is cheaper
than cost of addition operation. That's basically what this adjustment
is for.

So comment in the source code now looks as follows

/* Do cost correction when address difference produces
   additional non-invariant add operation which is less
   profitable if address cost is cheaper than cost of add.  */


  /* Now the computation is in shape symbol + var1 + const + ratio * var2.
     (symbol/var1/const parts may be omitted).  If we are looking for an
     address, find the cost of addressing this.  */
  if (address_p)
    return add_costs (cost,
                      get_address_cost (symbol_present, var_present,
                                        offset, ratio, cstepi,
                                        TYPE_MODE (TREE_TYPE (utype)),
                                        TYPE_ADDR_SPACE (TREE_TYPE (utype)),
                                        speed, stmt_is_after_inc,
                                        can_autoinc));

 thus refactoring the code a bit would make it possible to CSE the
 get_address_cost
 call and eventually make it clearer what the code does.

'offset' could be changed beetween two calls of get_address_cost so
such refactoring looks useless.

New patch (only the comment was changed) attached. Changelog was
changed as well.


 Richard.


Changelog:

 2012-04-26  Yuri Rumyantsev  yuri.rumyant...@intel.com

        * tree-ssa-loop-ivopts.c (get_computation_cost_at): Adjust
        cost model when address difference produces additional
non-invariant add operation which is less profitable if
address cost is cheaper than cost of add.

Thanks,
Igor


ivopts1.patch
Description: Binary data


[PATCH 13/13] Switch -ftrack-macro-expansion=2 on by default.

2012-04-25 Thread Dodji Seketeli
Hopefully closing the series, this patch switches the compiler to
-ftrack-macro-expansion=2 by default.

Tested and bootstrapped on x86_64-unknown-linux-gnu against trunk.

libcpp/

* init.c (cpp_create_reader): Switch -ftrack-macro-expansion=2 on
by default.  Add comments.

gcc/docs/

* cppopts.texi: Adjust for enabling -ftrack-macro-expansion=2 by
default.
---
 gcc/doc/cppopts.texi |2 ++
 libcpp/init.c|6 ++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/gcc/doc/cppopts.texi b/gcc/doc/cppopts.texi
index 205d870..27b1095 100644
--- a/gcc/doc/cppopts.texi
+++ b/gcc/doc/cppopts.texi
@@ -617,6 +617,8 @@ tokens locations completely. This value is the most memory 
hungry.
 When this option is given no argument, the default parameter value is
 @samp{2}.
 
+Note that -ftrack-macro-expansion=2 is activated by default.
+
 @item -fexec-charset=@var{charset}
 @opindex fexec-charset
 @cindex character set, execution
diff --git a/libcpp/init.c b/libcpp/init.c
index 5fa82ca..ba1e9cd 100644
--- a/libcpp/init.c
+++ b/libcpp/init.c
@@ -174,6 +174,12 @@ cpp_create_reader (enum c_lang lang, hash_table *table,
   CPP_OPTION (pfile, warn_dollars) = 1;
   CPP_OPTION (pfile, warn_variadic_macros) = 1;
   CPP_OPTION (pfile, warn_builtin_macro_redefined) = 1;
+  /* By default, track locations of tokens resulting from macro
+ expansion.  The '2' means, track the locations with the highest
+ accuracy.  Read the comments for struct
+ cpp_options::track_macro_expansion to learn about the other
+ values.  */
+  CPP_OPTION (pfile, track_macro_expansion) = 2;
   CPP_OPTION (pfile, warn_normalize) = normalized_C;
 
   /* Default CPP arithmetic to something sensible for the host for the
-- 
Dodji


Re: [PATCH, tree-optimization] Fix for PR 52868

2012-04-25 Thread Richard Guenther
On Wed, Apr 25, 2012 at 4:32 PM, Igor Zamyatin izamya...@gmail.com wrote:
 On Wed, Apr 25, 2012 at 1:14 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Apr 25, 2012 at 10:56 AM, Igor Zamyatin izamya...@gmail.com wrote:
 Hi all!

 I'd like to post for review the patch which makes some costs adjusting
 in get_computation_cost_at routine in ivopts part.
 As mentioned in the PR changes also fix the bwaves regression from PR 52272.
 Also changes introduce no degradations on spec2000/2006 and
 EEMBC1.1/2.0(this was measured on Atom) on x86


 Bootstrapped and regtested on x86. Ok to commit?

 I can't make sense of the patch and the comment does not help.

 +      diff_cost = cost.cost;
       cost.cost /= avg_loop_niter (data-current_loop);
 +      add_cost_val = add_cost (TYPE_MODE (ctype), data-speed);
 +      /* Do cost correction if address cost is small enough
 +         and difference cost is high enough.  */
 +      if (address_p  diff_cost  add_cost_val
 +           get_address_cost (symbol_present, var_present,
 +                               offset, ratio, cstepi,
 +                               TYPE_MODE (TREE_TYPE (utype)),
 +                               TYPE_ADDR_SPACE (TREE_TYPE (utype)),
 +                               speed, stmt_is_after_inc,
 +                               can_autoinc).cost = add_cost_val)
 +        cost.cost += add_cost_val;

 Please explain more thoroughly.  It also would seem to be better to add
 an extra case, as later code does

 For example for such code

   for (j=0; jM;j++) {
       for (i=0; iN; i++)
           sum += ptr-a[j][i] * ptr-c[k][i];
   }
  we currently have following gimple on x86 target (I provided a piece
 of all phase output):

           # ivtmp.13_30 = PHI ivtmp.13_31(3), ivtmp.13_33(7)
           D.1748_34 = (void *) ivtmp.13_30;
           D.1722_7 = MEM[base: D.1748_34, offset: 0B];
           D.1750_36 = ivtmp.27_28;
           D.1751_37 = D.1750_36 + ivtmp.13_30; -- we got
 non-invariant add which is not taken into account currently in cost
 model
           D.1752_38 = (void *) D.1751_37;
           D.1753_39 = (sizetype) k_8(D);
           D.1754_40 = D.1753_39 * 800;
           D.1723_9 = MEM[base: D.1752_38, index: D.1754_40, offset: 16000B];
           ...

  With proposed fix we produce:

           # ivtmp.14_30 = PHI ivtmp.14_31(3), 0(7)
           D.1749_34 = (struct S *) ivtmp.25_28;
           D.1722_7 = MEM[base: D.1749_34, index: ivtmp.14_30, offset: 0B];
           D.1750_35 = (sizetype) k_8(D);
           D.1751_36 = D.1750_35 * 800;
           D.1752_37 = ptr_6(D) + D.1751_36;
           D.1723_9 = MEM[base: D.1752_37, index: ivtmp.14_30, offset: 16000B];

 which is more effective on platforms where address cost is cheaper
 than cost of addition operation. That's basically what this adjustment
 is for.

If we generally miss to account for the add then why is the adjustment
conditional on diff_cost  add_cost and address_cost = add_cost?

Is this a new heuristic or a fix for not accurately computing the cost for the
stmts we generate?

Richard.

 So comment in the source code now looks as follows

 /* Do cost correction when address difference produces
   additional non-invariant add operation which is less
   profitable if address cost is cheaper than cost of add.  */


  /* Now the computation is in shape symbol + var1 + const + ratio * var2.
     (symbol/var1/const parts may be omitted).  If we are looking for an
     address, find the cost of addressing this.  */
  if (address_p)
    return add_costs (cost,
                      get_address_cost (symbol_present, var_present,
                                        offset, ratio, cstepi,
                                        TYPE_MODE (TREE_TYPE (utype)),
                                        TYPE_ADDR_SPACE (TREE_TYPE (utype)),
                                        speed, stmt_is_after_inc,
                                        can_autoinc));

 thus refactoring the code a bit would make it possible to CSE the
 get_address_cost
 call and eventually make it clearer what the code does.

 'offset' could be changed beetween two calls of get_address_cost so
 such refactoring looks useless.

 New patch (only the comment was changed) attached. Changelog was
 changed as well.


 Richard.


 Changelog:

  2012-04-26  Yuri Rumyantsev  yuri.rumyant...@intel.com

         * tree-ssa-loop-ivopts.c (get_computation_cost_at): Adjust
        cost model when address difference produces additional
        non-invariant add operation which is less profitable if
        address cost is cheaper than cost of add.

 Thanks,
 Igor


Re: [C] improve missing initializers diagnostics

2012-04-25 Thread H.J. Lu
On Sat, Apr 21, 2012 at 4:58 AM, Manuel López-Ibáñez
lopeziba...@gmail.com wrote:
 This patch improves missing initializers diagnostics. From:

 pr36446.c:13:3: warning: missing initializer [-Wmissing-field-initializers]
   .h = {1},
   ^
 pr36446.c:13:3: warning: (near initialization for ‘m0.h.b’)
 [-Wmissing-field-initializers]
   .h = {1},
   ^

 to:

 pr36446.c:13:3: warning: missing initializer for field ‘b’ of ‘struct
 h’ [-Wmissing-field-initializers]
   .h = {1},
   ^
 pr36446.c:3:7: note: ‘b’ declared here
   int b;
       ^

 Bootstrapped/regression tested.

 OK?


 2012-04-19  Manuel López-Ibáñez  m...@gcc.gnu.org

        * c-typeck.c (pop_init_level): Improve diagnostics.
 testsuite/
        * gcc.dg/m-un-2.c: Update.
        * gcc.dg/20011021-1.c: Update.

On Linux/x86, revision 186808 gave me:

FAIL: gcc.dg/20011021-1.c  (test for warnings, line 34)
FAIL: gcc.dg/20011021-1.c  (test for warnings, line 41)
FAIL: gcc.dg/20011021-1.c  (test for warnings, line 44)
FAIL: gcc.dg/20011021-1.c (test for excess errors)
FAIL: gcc.dg/20011021-1.c near init (test for warnings, line 27)
FAIL: gcc.dg/20011021-1.c near init (test for warnings, line 30)
FAIL: gcc.dg/m-un-2.c (test for excess errors)
FAIL: gcc.dg/m-un-2.c warning regression 2 (test for warnings, line 12)
FAIL: gcc.dg/missing-field-init-2.c  (test for warnings, line 14)
FAIL: gcc.dg/missing-field-init-2.c  (test for warnings, line 7)
FAIL: gcc.dg/missing-field-init-2.c  (test for warnings, line 8)
FAIL: gcc.dg/missing-field-init-2.c (test for excess errors)

Revision 186806 is OK.


-- 
H.J.


Re: [PATCH 08/11] Make conversion warnings work on NULL with -ftrack-macro-expansion

2012-04-25 Thread Dodji Seketeli
Gabriel Dos Reis g...@integrable-solutions.net writes:

 Thanks.  But a point of the suggestion was that we won't need the
 same comment/explanation duplicated over at least 3 places.  Just
 one.  All those three places deal exactly with one instance: null
 pointer constant.  That deserves a function in and of itself, which
 is documented by the duplicated comments.
 Please make that change.  Everything else is OK.  thanks.

I am sorry for the round trips.  Please find below a patch udpated
accordingly.

I am bootstrapping the whole patch set, but the impacted files of this
patch have built fine so far.

Thanks.

gcc/
* input.h (expansion_point_location_if_in_system_header): Declare
new function.
* input.c (expansion_point_location_if_in_system_header): Define it.
gcc/cp/

* call.c (conversion_null_warnings): Use the new
expansion_point_location_if_in_system_header.
* cvt.c (build_expr_type_conversion): Likewise.
* typeck.c (cp_build_binary_op): Likewise.

gcc/testsuite/

* g++.dg/warn/Wconversion-null-2.C: Add testing for __null,
alongside the previous testing for NULL.
---
 gcc/cp/call.c  |7 -
 gcc/cp/cvt.c   |9 +-
 gcc/cp/typeck.c|9 --
 gcc/input.c|   20 +++
 gcc/input.h|1 +
 gcc/testsuite/g++.dg/warn/Wconversion-null-2.C |   31 +++-
 6 files changed, 69 insertions(+), 8 deletions(-)

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index f9a7f08..85e45c2 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -5598,12 +5598,15 @@ conversion_null_warnings (tree totype, tree expr, tree 
fn, int argnum)
   if (expr == null_node  TREE_CODE (totype) != BOOLEAN_TYPE
ARITHMETIC_TYPE_P (totype))
 {
+  source_location loc =
+   expansion_point_location_if_in_system_header (input_location);
+
   if (fn)
-   warning_at (input_location, OPT_Wconversion_null,
+   warning_at (loc, OPT_Wconversion_null,
passing NULL to non-pointer argument %P of %qD,
argnum, fn);
   else
-   warning_at (input_location, OPT_Wconversion_null,
+   warning_at (loc, OPT_Wconversion_null,
converting to non-pointer type %qT from NULL, totype);
 }
 
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 3dab372..49ba38a 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -1472,8 +1472,13 @@ build_expr_type_conversion (int desires, tree expr, bool 
complain)
   if (expr == null_node
(desires  WANT_INT)
!(desires  WANT_NULL))
-warning_at (input_location, OPT_Wconversion_null,
-   converting NULL to non-pointer type);
+{
+  source_location loc =
+   expansion_point_location_if_in_system_header (input_location);
+
+  warning_at (loc, OPT_Wconversion_null,
+ converting NULL to non-pointer type);
+}
 
   basetype = TREE_TYPE (expr);
 
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index fb2f1bc..52d264b 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -3838,9 +3838,12 @@ cp_build_binary_op (location_t location,
  || (!null_ptr_cst_p (orig_op1) 
   !TYPE_PTR_P (type1)  !TYPE_PTR_TO_MEMBER_P (type1)))
(complain  tf_warning))
-/* Some sort of arithmetic operation involving NULL was
-   performed.  */
-warning (OPT_Wpointer_arith, NULL used in arithmetic);
+{
+  source_location loc =
+   expansion_point_location_if_in_system_header (input_location);
+
+  warning_at (loc, OPT_Wpointer_arith, NULL used in arithmetic);
+}
 
   switch (code)
 {
diff --git a/gcc/input.c b/gcc/input.c
index 260be7e..5f14489 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -162,6 +162,26 @@ expand_location_to_spelling_point (source_location loc)
   return expand_location_1 (loc, /*expansion_piont_p=*/false);
 }
 
+/* If LOCATION is in a sytem header and if it's a virtual location for
+   a token coming from the expansion of a macro M, unwind it to the
+   location of the expansion point of M.  Otherwise, just return
+   LOCATION.
+
+   This is used for instance when we want to emit diagnostics about a
+   token that is located in a macro that is itself defined in a system
+   header -- e.g for the NULL macro.  In that case, if LOCATION is
+   passed to diagnostics emitting functions like warning_at as is, no
+   diagnostic won't be emitted.  */
+
+source_location
+expansion_point_location_if_in_system_header (source_location location)
+{
+  if (in_system_header_at (location))
+location = linemap_resolve_location (line_table, location,
+LRK_MACRO_EXPANSION_POINT,
+NULL);
+  return location;
+}
 
 #define ONE_K 1024
 #define ONE_M (ONE_K * ONE_K)
diff --git a/gcc/input.h b/gcc/input.h

RE: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c

2012-04-25 Thread Greta Yorsh


 -Original Message-
 From: Richard Guenther [mailto:richard.guent...@gmail.com]
 Sent: 25 April 2012 15:32
 To: Greta Yorsh
 Cc: gcc-patches@gcc.gnu.org; mikest...@comcast.net; r...@cebitec.uni-
 bielefeld.de; Richard Earnshaw
 Subject: Re: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-
 perm-8.c
 
 On Wed, Apr 25, 2012 at 4:27 PM, Greta Yorsh greta.yo...@arm.com
 wrote:
  Richard Guenther wrote:
  On Wed, Apr 25, 2012 at 3:34 PM, Greta Yorsh greta.yo...@arm.com
  wrote:
   Richard Guenther wrote:
   On Wed, Apr 25, 2012 at 1:51 PM, Greta Yorsh
 greta.yo...@arm.com
   wrote:
The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with
 neon
   enabled:
FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect
  vectorized
   1
loops 2
   
The test expects 2 loops to be vectorized, while gcc
 successfully
   vectorizes
3 loops in this test using neon on arm. This patch adjusts the
   expected
output. Fixed test passes on qemu for arm and powerpc.
   
OK for trunk?
  
   I think the proper fix is to instead of
  
     for (i = 0; i  N; i++)
       {
         input[i] = i;
         output[i] = 0;
         if (input[i]  256)
           abort ();
       }
  
   use
  
     for (i = 0; i  N; i++)
       {
         input[i] = i;
         output[i] = 0;
         __asm__ volatile ();
       }
  
   to prevent vectorization of initialization loops.
  
   Actually, it looks like both arm and powerpc vectorize this
  initialization loop (line 31), because the control flow is hoisted
  outside the loop by previous optimizations. In addition, arm with
 neon
  vectorizes the second loop (line 39), but powerpc does not:
  
   39: not vectorized: relevant stmt not supported: D.2163_8 = i_40 *
 9;
  
   If this is the expected behaviour for powerpc, then the patch I
  proposed is still needed to fix the test failure on arm. Also, there
  would be no need to disable vectorization of the initialization
 loop,
  right?
 
  Ah, I thought that was what changed.  Btw, the if () abort () tries
 to
  disable
  vectorization but does not succeed in doing so.
 
  Richard.
 
  Here is an updated patch. It prevents vectorization of the
 initialization
  loop, as Richard suggested, and updates the expected number of
 vectorized
  loops accordingly. This patch assumes that the second loop in main
 (line 39)
  should only be vectorized on arm with neon.  The test passes for arm
 and
  powerpc.
 
  OK for trunk?
 
 If arm cannot handle 9 * i then the approrpiate condition would be
 vect_int_mult, not arm_neon_ok.

It's the other way around: arm can handle this multiplication, but powerpc
does not handle it for some reason. 

Thank you,
Greta



 
 Ok with that change.
 
 Richard.
 
  Thank you,
  Greta
 
  gcc/testsuite/ChangeLog
 
  2012-04-25  Greta Yorsh  greta.yo...@arm.com
 
         * gcc.dg/vect/slp-perm-8.c (main): Prevent
         vectorization of initialization loop.
         (dg-final): Adjust the expected number of
         vectorized loops.
 
 
 
 






Re: Symbol table 13/many: reachability code rewrite

2012-04-25 Thread Jan Hubicka
Hi,
the fortran problem is caused by fact that fortran does nested funtions that
are static constructors.  I did not really think of that case in cgraph
construction code.  Fixed thus.

Honza
PR middle-end/53089 
* cgraphunit.c (cgraph_process_new_functions): Do not enqueue new 
functions here.
(referred_to_p): Move ahead in file to avoid forward declaration.
(cgraph_finalize_function): Finalize them here.
* symtab.c (dump_symtab): Dump ctors and dtors.

Index: cgraphunit.c
===
*** cgraphunit.c(revision 186770)
--- cgraphunit.c(working copy)
*** cgraph_process_new_functions (void)
*** 243,249 
  cgraph_finalize_function (fndecl, false);
  output = true;
cgraph_call_function_insertion_hooks (node);
- enqueue_node ((symtab_node) node);
  break;
  
case CGRAPH_STATE_IPA:
--- 243,248 
*** cgraph_reset_node (struct cgraph_node *n
*** 320,325 
--- 319,340 
cgraph_node_remove_callees (node);
  }
  
+ /* Return true when there are references to NODE.  */
+ 
+ static bool
+ referred_to_p (symtab_node node)
+ {
+   int i;
+   struct ipa_ref *ref;
+ 
+   for (i = 0; ipa_ref_list_referring_iterate (node-symbol.ref_list, i, ref);
+i++)
+ return true;
+   if (symtab_function_p (node)  cgraph (node)-callers)
+ return true;
+   return false;
+ }
+ 
  /* DECL has been parsed.  Take it, queue it, compile it at the whim of the
 logic in effect.  If NESTED is true, then our caller cannot stand to have
 the garbage collector run at the moment.  We would need to either create
*** cgraph_finalize_function (tree decl, boo
*** 372,377 
--- 387,397 
  
if (!nested)
  ggc_collect ();
+ 
+   if (cgraph_state == CGRAPH_STATE_CONSTRUCTION
+(cgraph_decide_is_function_needed (node, decl)
+ || referred_to_p ((symtab_node)node)))
+ enqueue_node ((symtab_node)node);
  }
  
  /* Add the function FNDECL to the call graph.
*** process_function_and_variable_attributes
*** 1114,1135 
  }
  }
  
- /* Return true when there are references to NODE.  */
- 
- static bool
- referred_to_p (symtab_node node)
- {
-   int i;
-   struct ipa_ref *ref;
- 
-   for (i = 0; ipa_ref_list_referring_iterate (node-symbol.ref_list, i, ref);
-i++)
- return true;
-   if (symtab_function_p (node)  cgraph (node)-callers)
- return true;
-   return false;
- }
- 
  /* Mark DECL as finalized.  By finalizing the declaration, frontend instruct 
the
 middle end to output the variable to asm file, if needed or externally
 visible.  */
--- 1134,1139 
Index: symtab.c
===
*** symtab.c(revision 186770)
--- symtab.c(working copy)
*** dump_symtab_base (FILE *f, symtab_node n
*** 414,419 
--- 414,426 
  fprintf (f,  virtual);
if (DECL_ARTIFICIAL (node-symbol.decl))
  fprintf (f,  artificial);
+   if (TREE_CODE (node-symbol.decl) == FUNCTION_DECL)
+ {
+   if (DECL_STATIC_CONSTRUCTOR (node-symbol.decl))
+   fprintf (f,  constructor);
+   if (DECL_STATIC_DESTRUCTOR (node-symbol.decl))
+   fprintf (f,  destructor);
+ }
fprintf (f, \n);

if (node-symbol.same_comdat_group)


[Ada] Atomic protected types

2012-04-25 Thread Arnaud Charlet
This patch cleans up the implementation of atomic protected types. No changes
in behavior.

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-04-25  Hristian Kirtchev  kirtc...@adacore.com

* exp_ch9.adb: Rename Lock_Free_Sub_Type
to Lock_Free_Subprogram. Remove type Subprogram_Id.
Rename LF_Sub_Table to Lock_Free_Subprogram_Table.
(Allow_Lock_Free_Implementation): Renamed to
Allows_Lock_Free_Implementation.  Update the comment on
lock-free restrictions. Code clean up and restructuring.
(Build_Lock_Free_Protected_Subprogram_Body): Update the
profile and related comments. Code clean up and restructuring.
(Build_Lock_Free_Unprotected_Subprogram_Body): Update the
profile and related comments. Code clean up and restructuring.
(Comp_Of): Removed.

Index: exp_ch9.adb
===
--- exp_ch9.adb (revision 186823)
+++ exp_ch9.adb (working copy)
@@ -81,29 +81,24 @@
-- Lock Free Data Structure --
--
 
-   --  A data structure used for the Lock Free (LF) implementation of protected
-   --  objects. Since a protected subprogram can only access a single protected
-   --  component in the LF implementation, this structure stores each protected
-   --  subprogram and its accessed protected component when the protected
-   --  object allows the LF implementation.
-
-   type Lock_Free_Sub_Type is record
+   type Lock_Free_Subprogram is record
   Sub_Body : Node_Id;
   Comp_Id  : Entity_Id;
end record;
+   --  This data structure and its fields must be documented, ALL global
+   --  data structures must be documented. We never rely on guessing what
+   --  things mean from their names.
 
-   subtype Subprogram_Id is Nat;
+   --  The following table establishes a relation between a subprogram body and
+   --  an unique protected component referenced in this body.
 
-   --  The following table used for the Lock Free implementation of protected
-   --  objects maps Lock_Free_Sub_Type to Subprogram_Id.
-
-   package LF_Sub_Table is new Table.Table (
- Table_Component_Type = Lock_Free_Sub_Type,
- Table_Index_Type = Subprogram_Id,
+   package Lock_Free_Subprogram_Table is new Table.Table (
+ Table_Component_Type = Lock_Free_Subprogram,
+ Table_Index_Type = Nat,
  Table_Low_Bound  = 1,
  Table_Initial= 5,
  Table_Increment  = 5,
- Table_Name   = LF_Sub_Table);
+ Table_Name   = Lock_Free_Subprogram_Table);
 
---
-- Local Subprograms --
@@ -139,9 +134,19 @@
--Decls is the list of declarations to be enhanced.
--Ent is the entity for the original entry body.
 
-   function Allow_Lock_Free_Implementation (N : Node_Id) return Boolean;
-   --  Given a protected body N, return True if N permits a lock free
-   --  implementation.
+   function Allows_Lock_Free_Implementation (N : Node_Id) return Boolean;
+   --  Given a protected body N, return True if N satisfies the following list
+   --  of lock-free restrictions:
+   --
+   --1) Protected type
+   -- May not contain entries
+   -- May contain only scalar components
+   -- Component types must support atomic compare and exchange
+   --
+   --2) Protected subprograms
+   -- May not have side effects
+   -- May not contain loop statements or procedure calls
+   -- Function calls and attribute references must be static
 
function Build_Accept_Body (Astat : Node_Id) return Node_Id;
--  Transform accept statement into a block with added exception handler.
@@ -189,20 +194,20 @@
--  Build subprogram declaration for previous one
 
function Build_Lock_Free_Protected_Subprogram_Body
- (N : Node_Id;
-  Pid   : Node_Id;
-  N_Op_Spec : Node_Id) return Node_Id;
-   --  This function is used to construct the lock free version of a protected
-   --  subprogram when the protected type denoted by Pid allows the lock free
-   --  implementation. It only contains a call to the unprotected version of
-   --  the subprogram body.
+ (N   : Node_Id;
+  Prot_Typ: Node_Id;
+  Unprot_Spec : Node_Id) return Node_Id;
+   --  N denotes a subprogram body of protected type Prot_Typ. Unprot_Spec is
+   --  the subprogram specification of the unprotected version of N. Transform
+   --  N such that it invokes the unprotected version of the body.
 
function Build_Lock_Free_Unprotected_Subprogram_Body
- (N : Node_Id;
-  Pid : Node_Id) return Node_Id;
-   --  This function is used to construct the lock free version of an
-   --  unprotected subprogram when the protected type denoted by Pid allows the
-   --  lock free implementation.
+ (N: Node_Id;
+  Prot_Typ : Node_Id) return Node_Id;
+   --  N denotes a subprogram body of protected type Prot_Typ. Build a version

Re: [PATCH 11/13] Fix va_start related location

2012-04-25 Thread Dodji Seketeli
Gabriel Dos Reis g...@integrable-solutions.net writes:

 On Wed, Apr 25, 2012 at 9:04 AM, Dodji Seketeli do...@redhat.com wrote:
 In gcc/testsuite/gcc.dg/pr30457.c, the first warning was not being
 emitted because the relevant location was inside the var_start macro
 defined in a system header.  It can even point to a token for a
 builtin macro there.  This patch unwinds to the first token in real
 source code in that case.

 While you are at it, could you also use a non-zero value for the second
 argument argument to warning_at?

I couldn't find any obvious value for it.  I am thinking maybe it would
make sense to introduction a new -Wva_start to warn about possible
dangerous uses of the va_start macro and use that as the second argument
for the relevant warnings emitted by fold_builtin_next_arg.  What do you
think?

In any case, this topic seems logically unrelated to the purpose of
enable -ftrack-macro-expansion by default, so IMHO it would be better to
do this in a separate self contain patch.  Among other things, this
would ease possible future back-ports.  Would you agree?

Thanks.

-- 
Dodji


Re: [PATCH 08/11] Make conversion warnings work on NULL with -ftrack-macro-expansion

2012-04-25 Thread Gabriel Dos Reis
OK, thanks!

-- Gaby


Re: [PATCH 11/13] Fix va_start related location

2012-04-25 Thread Gabriel Dos Reis
On Wed, Apr 25, 2012 at 10:20 AM, Dodji Seketeli do...@redhat.com wrote:
 Gabriel Dos Reis g...@integrable-solutions.net writes:

 On Wed, Apr 25, 2012 at 9:04 AM, Dodji Seketeli do...@redhat.com wrote:
 In gcc/testsuite/gcc.dg/pr30457.c, the first warning was not being
 emitted because the relevant location was inside the var_start macro
 defined in a system header.  It can even point to a token for a
 builtin macro there.  This patch unwinds to the first token in real
 source code in that case.

 While you are at it, could you also use a non-zero value for the second
 argument argument to warning_at?

 I couldn't find any obvious value for it.  I am thinking maybe it would
 make sense to introduction a new -Wva_start to warn about possible
 dangerous uses of the va_start macro and use that as the second argument
 for the relevant warnings emitted by fold_builtin_next_arg.  What do you
 think?

or -Wvarargs?



 In any case, this topic seems logically unrelated to the purpose of
 enable -ftrack-macro-expansion by default, so IMHO it would be better to
 do this in a separate self contain patch.  Among other things, this
 would ease possible future back-ports.  Would you agree?

OK.

-- Gaby


Re: [PATCH 12/13] Adjust relevant test cases wrt -ftrack-macro-expansion=[0|2]

2012-04-25 Thread Gabriel Dos Reis
On Wed, Apr 25, 2012 at 9:16 AM, Dodji Seketeli do...@redhat.com wrote:
 Even after all the patches I have already submitted for this
 -ftrack-macro-expansion business, some test cases where errors happens
 on tokens that are defined in macros see their output change in an
 incompatible way, when you run them with or without
 -ftrack-macro-expansion.

 I think this is expected, because the (spelling) locus inside the
 definition of the macro pointed to with -ftrack-macro-expansion is
 different from the locus of the expansion point of the macro pointed
 to without -ftrack-macro-expansion.

 In those cases this patch either adjusts the test case and forces it
 be run either with -ftrack-macro-expansion, or it just forces it to be
 run without -ftrack-macro-expansion.

 There are so many libstdc++ tests that were failing because of that
 benign issue that I preferred to just run them with
 -ftrack-macro-expansion diabled, after inspecting each of them to be
 sure there was nothing more serious underneath.  I believe we can latter
 turn on -ftrack-macro-expansion there on a case by case basis.

 Boostrapped on x86_64-unknown-linux-gnu against trunk with and without
 -ftrack-macro-expansion turned on.

OK.

-- Gaby


Re: [PATCH 13/13] Switch -ftrack-macro-expansion=2 on by default.

2012-04-25 Thread Gabriel Dos Reis
On Wed, Apr 25, 2012 at 9:33 AM, Dodji Seketeli do...@redhat.com wrote:
 Hopefully closing the series, this patch switches the compiler to
 -ftrack-macro-expansion=2 by default.

 Tested and bootstrapped on x86_64-unknown-linux-gnu against trunk.

Makes sense to me; Tom?

-- Gaby


Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-25 Thread Teresa Johnson
On Tue, Apr 24, 2012 at 4:38 PM, Steven Bosscher stevenb@gmail.com wrote:
 On Tue, Apr 24, 2012 at 11:26 PM, Teresa Johnson tejohn...@google.com wrote:

        * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param.
        (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto.

 You should add documentation for these new PARAMs to doc/invoke.texi.

Ok, will do.


 I don't really like these new PARAMs: All other loop PARAMs are based
 on the number of insns in a loop, or the maximum number of times a
 transformation is applied. Your new
 PARAM_MIN_ITER_UNROLL_WITH_BRANCHES is completely different, because
 it is a number of iterations. This makes the PARAM value feel even
 more arbitrary than all the other PARAMs to some extend already do...

That's true, they are different in what they are checking than some of
the other loop unrolling params. But I need some threshold for
determining when a loop is hot enough that its unrolled branches will
be executed frequently enough to train the branch predictor and also
where the impact on the branch prediction in the outer region of code
is less likely to matter overall. The defaults were chosen so that the
new unrolling limit should only kick in for loops that are not
iterating much anyway, and where the outer hot region has quite a few
branches.


 (The only other PARAM like that is PARAM_ALIGN_LOOP_ITERATIONS, and
 its default value also looks quite arbitrary...)


 Index: loop-unroll.c
 ===
 --- loop-unroll.c       (revision 186783)
 +++ loop-unroll.c       (working copy)
 @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc
                                             basic_block);
  static rtx get_expansion (struct var_to_expand *);

 +/* Determine whether LOOP contains call.  */
 +static bool
 +loop_has_call(struct loop *loop)
 +{
 +  basic_block *body, bb;
 +  unsigned i;
 +  rtx insn;
 +
 +  body = get_loop_body (loop);
 +  for (i = 0; i  loop-num_nodes; i++)
 +    {
 +      bb = body[i];
 +
 +      FOR_BB_INSNS (bb, insn)
 +        {
 +          if (CALL_P (insn))
 +            {
 +              free (body);
 +              return true;
 +            }
 +        }
 +    }
 +  free (body);
 +  return false;
 +}
 +
 +/* Determine whether LOOP contains floating-point computation.  */
 +static bool
 +loop_has_FP_comp(struct loop *loop)
 +{
 +  rtx set, dest;
 +  basic_block *body, bb;
 +  unsigned i;
 +  rtx insn;
 +
 +  body = get_loop_body (loop);
 +  for (i = 0; i  loop-num_nodes; i++)
 +    {
 +      bb = body[i];
 +
 +      FOR_BB_INSNS (bb, insn)
 +        {
 +          set = single_set (insn);
 +          if (!set)
 +            continue;
 +
 +          dest = SET_DEST (set);
 +          if (FLOAT_MODE_P (GET_MODE (dest)))
 +            {
 +              free (body);
 +              return true;

 So you only detect single-set FP operations where some insns stores in
 a float mode. It wouldn't be very difficult to just walk over all sets
 and look for float modes. This is also necessary e.g. for x87 sincos,
 as well as various insns on other machines. Your comments say you
 don't want to apply the new heuristic to loops containing FP
 operations because these loops usually benefit more from unrolling.
 Therefore, you should IMHO look at non-single_set() insns also here,
 to avoid applying the heuristics to loops containing non-single_set()
 FP insns.

Ok, thanks for the suggestion, I will expand this for the next version
of the patch.



 +            }
 +        }
 +    }
 +  free (body);
 +  return false;
 +}

 Nit: You are calling loop_has_call and loop_has_FP_comp() twice on
 each loop (first for constant iterations and next for runtime
 iterations),

I don't think that is true for loop_has_FP_comp, since it is called in
decide_unroll_constant_iterations and decide_unroll_runtime_iterations
just after we have checked if the loop has a constant number of
iterations, and returned early depending on the result of this check
and which routine we are in. So each inner loop will only reach the
call to loop_has_FP_comp in one of these routines.

In the case of loop_has_call, which is only called for a hot outer
loop, it is true we could invoke that more than once. That would
happen if a hot outer loop contains more than one nested inner loop
with a small iteration count and branches that we attempt to unroll
(it is called at most once per inner loop that we attempt to unroll).
I thought about attempting to cache this info for the outer loop in
the structure returned by get_simple_loop_desc() as you also suggest
below. I was concerned that currently this returns an niter_desc
structure which holds info about the # iterations, and this
information doesn't fit into that category. However, I could go ahead
and add it to that structure and perhaps rename the structure to
something more generic like loop_desc. What do you think?

The other issue is that we don't need this 

Re: [PATCH 05/11] Make expand_location resolve to locus in main source file

2012-04-25 Thread Dodji Seketeli
I have re-based my tree on top of a recent tree that incorporates the
changes for caret diagnostics and I had to update this patch
accordingly.  Here is its new version on trunk of today.  It basically
updates the new diagnostic_show_locus function to point to spelling
locations.

The patch does pass bootstrap with -ftrack-macro-expansion turned off,
and passes bootstrap with the full series with -ftrack-macro-expansion
turned on.

gcc/

* input.c (expand_location_1): New.  Takes a parameter to choose
whether to resolve the location to spelling or expansion point.
Was factorized from ...
(expand_location): ... here.
(expand_location_to_spelling_point): New.  Implemented in terms of
expand_location_1.
* diagnostic.c (diagnostic_build_prefix): Use the new
expand_location_to_spelling_point instead of expand_location.
---
 gcc/diagnostic.c |4 ++--
 gcc/input.c  |   40 +++-
 gcc/input.h  |9 +
 3 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 4496803..729e865 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -214,7 +214,7 @@ diagnostic_build_prefix (diagnostic_context *context,
 must-not-happen
   };
   const char *text = _(diagnostic_kind_text[diagnostic-kind]);
-  expanded_location s = expand_location (diagnostic-location);
+  expanded_location s = expand_location_to_spelling_point 
(diagnostic-location);
   if (diagnostic-override_column)
 s.column = diagnostic-override_column;
   gcc_assert (diagnostic-kind  DK_LAST_DIAGNOSTIC_KIND);
@@ -266,7 +266,7 @@ diagnostic_show_locus (diagnostic_context * context,
   || diagnostic-location = BUILTINS_LOCATION)
 return;
 
-  s = expand_location(diagnostic-location);
+  s = expand_location_to_spelling_point (diagnostic-location);
   line = location_get_source_line (s);
   if (line == NULL)
 return;
diff --git a/gcc/input.c b/gcc/input.c
index bf5fe48..e9ba301 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -32,16 +32,22 @@ struct line_maps *line_table;
 
 /* Expand the source location LOC into a human readable location.  If
LOC resolves to a builtin location, the file name of the readable
-   location is set to the string built-in.  */
-
-expanded_location
-expand_location (source_location loc)
+   location is set to the string built-in. If EXPANSION_POINT_P is
+   TRUE and LOC is virtual, then it is resolved to the expansion
+   point of the involved macro.  Otherwise, it is resolved to the
+   spelling location of the token.  */
+
+static expanded_location
+expand_location_1 (source_location loc,
+  bool expansion_point_p)
 {
   expanded_location xloc;
   const struct line_map *map;
 
   loc = linemap_resolve_location (line_table, loc,
- LRK_SPELLING_LOCATION, map);
+ expansion_point_p
+ ? LRK_MACRO_EXPANSION_POINT
+ : LRK_SPELLING_LOCATION, map);
   xloc = linemap_expand_location (line_table, map, loc);
 
   if (loc = BUILTINS_LOCATION)
@@ -109,6 +115,30 @@ location_get_source_line(expanded_location xloc)
   return buffer;
 }
 
+/* Expand the source location LOC into a human readable location.  If
+   LOC is virtual, it resolves to the expansion point of the involved
+   macro.  If LOC resolves to a builtin location, the file name of the
+   readable location is set to the string built-in.  */
+
+expanded_location
+expand_location (source_location loc)
+{
+  return expand_location_1 (loc, /*expansion_point_p=*/true);
+}
+
+/* Expand the source location LOC into a human readable location.  If
+   LOC is virtual, it resolves to the expansion location of the
+   relevant macro.  If LOC resolves to a builtin location, the file
+   name of the readable location is set to the string
+   built-in.  */
+
+expanded_location
+expand_location_to_spelling_point (source_location loc)
+{
+  return expand_location_1 (loc, /*expansion_piont_p=*/false);
+}
+
+
 #define ONE_K 1024
 #define ONE_M (ONE_K * ONE_K)
 
diff --git a/gcc/input.h b/gcc/input.h
index 4b15222..f755cdf 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -39,6 +39,7 @@ extern char builtins_location_check[(BUILTINS_LOCATION
 
 extern expanded_location expand_location (source_location);
 extern const char * location_get_source_line(expanded_location xloc);
+extern expanded_location expand_location_to_spelling_point (source_location);
 
 /* Historically GCC used location_t, while cpp used source_location.
This could be removed but it hardly seems worth the effort.  */
@@ -50,6 +51,14 @@ extern location_t input_location;
 #define LOCATION_LINE(LOC) ((expand_location (LOC)).line)
 #define LOCATION_COLUMN(LOC)((expand_location (LOC)).column)
 
+#define EXPANSION_POINT_LOCATION_FILE(LOC) \
+  ((expand_location_to_expansion_point (LOC)).file)
+#define 

Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-25 Thread Teresa Johnson
On Tue, Apr 24, 2012 at 6:13 PM, Andi Kleen a...@firstfloor.org wrote:
 tejohn...@google.com (Teresa Johnson) writes:

 This patch adds heuristics to limit unrolling in loops with branches that 
 may increase
 branch mispredictions. It affects loops that are not frequently iterated, 
 and that are
 nested within a hot region of code that already contains many branch 
 instructions.

 Performance tested with both internal benchmarks and with SPEC 2000/2006 on 
 a variety
 of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD 
 Opteron systems.
 This improves performance of an internal search indexing benchmark by close 
 to 2% on
 all the tested Intel platforms.  It also consistently improves 445.gobmk 
 (with FDO feedback
 where unrolling kicks in) by close to 1% on AMD Opteron. Other performance 
 effects are
 neutral.

 Bootstrapped and tested on x86_64-unknown-linux-gnu.  Is this ok for trunk?

 One problem with any unrolling heuristics is currently that gcc has both
 the tree level and the rtl level unroller. The tree one is even on at
 -O3.  So if you tweak anything for one you have to affect both, otherwise the
 other may still do the wrong thing(tm).

It's true that the tree level unroller could benefit from taking
branch mispredict effects into account as well. But since that is only
performing full unrolling of constant trip count loops I suspect that
there will be additional things that need to be considered, such as
whether the full unrolling enables better optimization in the
surrounding code/loop. Hence I wanted to tackle that later.


 For some other tweaks I looked into a shared cost model some time ago.
 May be still needed.

Yes, I think it would be good to unify some of the profitability
checks between the two unrolling passes, or at least between the tree
and rtl level full unrollers/peelers.

Teresa


 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-25 Thread Teresa Johnson
On Wed, Apr 25, 2012 at 2:03 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Tue, Apr 24, 2012 at 11:26 PM, Teresa Johnson tejohn...@google.com wrote:
 This patch adds heuristics to limit unrolling in loops with branches that 
 may increase
 branch mispredictions. It affects loops that are not frequently iterated, 
 and that are
 nested within a hot region of code that already contains many branch 
 instructions.

 Performance tested with both internal benchmarks and with SPEC 2000/2006 on 
 a variety
 of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD 
 Opteron systems.
 This improves performance of an internal search indexing benchmark by close 
 to 2% on
 all the tested Intel platforms.  It also consistently improves 445.gobmk 
 (with FDO feedback
 where unrolling kicks in) by close to 1% on AMD Opteron. Other performance 
 effects are
 neutral.

 Bootstrapped and tested on x86_64-unknown-linux-gnu.  Is this ok for trunk?

 Thanks,
 Teresa

 2012-04-24   Teresa Johnson  tejohn...@google.com

        * loop-unroll.c (loop_has_call): New function.
        (loop_has_FP_comp): Ditto.
        (compute_weighted_branches): Ditto.
        (max_unroll_with_branches): Ditto.
        (decide_unroll_constant_iterations): Add heuristic to avoid
        increasing branch mispredicts when unrolling.
        (decide_unroll_runtime_iterations): Ditto.
        * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param.
        (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto.

 Index: loop-unroll.c
 ===
 --- loop-unroll.c       (revision 186783)
 +++ loop-unroll.c       (working copy)
 @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc
                                             basic_block);
  static rtx get_expansion (struct var_to_expand *);

 +/* Determine whether LOOP contains call.  */
 +static bool
 +loop_has_call(struct loop *loop)
 +{
 +  basic_block *body, bb;
 +  unsigned i;
 +  rtx insn;
 +
 +  body = get_loop_body (loop);

 You repeatedly do this and walk over all blocks.  Please think about
 compile-time
 issues when writing code.

See my response to Steven where I address this issue and mention some
approaches to reducing the loop body walks. Please let me know if you
have any feedback on that.


 This all looks sort-of target specific to me and I don't see why this
 very specialized
 patch is a good idea when unrolling does a very poor job deciding what and how
 much to unroll generally.

I am hoping this will improve upon the job the unroller does in
deciding when/how to unroll. I didn't think that it was too target
specific as branch mispredictions could affect many targets. Note that
there are already some much more basic checks for the branch
misprediction effects in both decide_peel_simple and
decide_unroll_stupid, for example:

  /* Do not simply peel loops with branches inside -- it increases number
 of mispredicts.  */
  if (num_loop_branches (loop)  1)
{
  if (dump_file)
fprintf (dump_file, ;; Not peeling, contains branches\n);
  return;
}

It is possible that both of these checks could be made less aggressive
using the approach in this patch, which affects many more loops and
hence I am trying to add some more intelligent checking of whether
branch mispredicts might be triggered.

 Thanks,
Teresa


 Richard.

 +  for (i = 0; i  loop-num_nodes; i++)
 +    {
 +      bb = body[i];
 +
 +      FOR_BB_INSNS (bb, insn)
 +        {
 +          if (CALL_P (insn))
 +            {
 +              free (body);
 +              return true;
 +            }
 +        }
 +    }
 +  free (body);
 +  return false;
 +}
 +
 +/* Determine whether LOOP contains floating-point computation.  */
 +static bool
 +loop_has_FP_comp(struct loop *loop)
 +{
 +  rtx set, dest;
 +  basic_block *body, bb;
 +  unsigned i;
 +  rtx insn;
 +
 +  body = get_loop_body (loop);
 +  for (i = 0; i  loop-num_nodes; i++)
 +    {
 +      bb = body[i];
 +
 +      FOR_BB_INSNS (bb, insn)
 +        {
 +          set = single_set (insn);
 +          if (!set)
 +            continue;
 +
 +          dest = SET_DEST (set);
 +          if (FLOAT_MODE_P (GET_MODE (dest)))
 +            {
 +              free (body);
 +              return true;
 +            }
 +        }
 +    }
 +  free (body);
 +  return false;
 +}
 +
 +/* Compute the number of branches in LOOP, weighted by execution counts.  */
 +static float
 +compute_weighted_branches(struct loop *loop)
 +{
 +  int header_count = loop-header-count;
 +  unsigned i;
 +  float n;
 +  basic_block * body;
 +
 +  /* If no profile feedback data exists, don't limit unrolling  */
 +  if (header_count == 0)
 +    return 0.0;
 +
 +  gcc_assert (loop-latch != EXIT_BLOCK_PTR);
 +
 +  body = get_loop_body (loop);
 +  n = 0.0;
 +  for (i = 0; i  loop-num_nodes; i++)
 +    {
 +      if (EDGE_COUNT (body[i]-succs) = 2)
 +        {
 +          /* If this block is 

Re: [RFA] Update config.sub to 2012-04-18 version.

2012-04-25 Thread Joel Brobecker
 ChangeLog:
 
 * config.sub: Update to 2012-04-18 version from official repo.

Thanks to everyone who answered. This patch is no in, both GCC  src.

-- 
Joel


Re: [PATCH] Default to -gdwarf-4

2012-04-25 Thread Richard Henderson
On 04/25/12 06:47, Jakub Jelinek wrote:
 2012-04-25  Jakub Jelinek  ja...@redhat.com
 
   * common.opt (flag_debug_types_section): Default to 0.
   (dwarf_version): Default to 4.
   (dwarf_record_gcc_switches): Default to 1.
   (dwarf_strict): Default to 0.
   * toplev.c (process_options): Don't handle dwarf_strict
   or dwarf_version here.
   * config/vxworks.c (vxworks_override_options): Don't
   test whether dwarf_strict or dwarf_version are negative,
   instead test !global_options_set.x_dwarf_*.
   * config/darwin.c (darwin_override_options): Default to
   dwarf_version 2.
   * doc/invoke.texi: Note that -gdwarf-4, -grecord-gcc-switches
   and -fno-debug-types-section are now the default.

Ok.


r~


Re: [C] improve missing initializers diagnostics

2012-04-25 Thread Manuel López-Ibáñez
On 25 April 2012 16:46, H.J. Lu hjl.to...@gmail.com wrote:
 On Sat, Apr 21, 2012 at 4:58 AM, Manuel López-Ibáñez
 lopeziba...@gmail.com wrote:
 This patch improves missing initializers diagnostics. From:

 pr36446.c:13:3: warning: missing initializer [-Wmissing-field-initializers]
   .h = {1},
   ^
 pr36446.c:13:3: warning: (near initialization for ‘m0.h.b’)
 [-Wmissing-field-initializers]
   .h = {1},
   ^

 to:

 pr36446.c:13:3: warning: missing initializer for field ‘b’ of ‘struct
 h’ [-Wmissing-field-initializers]
   .h = {1},
   ^
 pr36446.c:3:7: note: ‘b’ declared here
   int b;
       ^

 Bootstrapped/regression tested.

 OK?


 2012-04-19  Manuel López-Ibáñez  m...@gcc.gnu.org

        * c-typeck.c (pop_init_level): Improve diagnostics.
 testsuite/
        * gcc.dg/m-un-2.c: Update.
        * gcc.dg/20011021-1.c: Update.

 On Linux/x86, revision 186808 gave me:

 FAIL: gcc.dg/20011021-1.c  (test for warnings, line 34)
 FAIL: gcc.dg/20011021-1.c  (test for warnings, line 41)
 FAIL: gcc.dg/20011021-1.c  (test for warnings, line 44)
 FAIL: gcc.dg/20011021-1.c (test for excess errors)
 FAIL: gcc.dg/20011021-1.c near init (test for warnings, line 27)
 FAIL: gcc.dg/20011021-1.c near init (test for warnings, line 30)
 FAIL: gcc.dg/m-un-2.c (test for excess errors)
 FAIL: gcc.dg/m-un-2.c warning regression 2 (test for warnings, line 12)
 FAIL: gcc.dg/missing-field-init-2.c  (test for warnings, line 14)
 FAIL: gcc.dg/missing-field-init-2.c  (test for warnings, line 7)
 FAIL: gcc.dg/missing-field-init-2.c  (test for warnings, line 8)
 FAIL: gcc.dg/missing-field-init-2.c (test for excess errors)

 Revision 186806 is OK.

Somehow I committed a broken version of the patch. It should have been this:


--- gcc/c-typeck.c  (revision 186821)
+++ gcc/c-typeck.c  (working copy)
@@ -7063,11 +7063,11 @@ pop_init_level (int implicit, struct obs
if (warning_at (input_location, OPT_Wmissing_field_initializers,
missing initializer for field %qD of %qT,
constructor_unfilled_fields,
constructor_type))
  inform (DECL_SOURCE_LOCATION (constructor_unfilled_fields),
- %qT declared here, constructor_unfilled_fields);
+ %qD declared here, constructor_unfilled_fields);
  }
 }


I'll commit as soon as it finishes bootstrapping+testing.

Sorry for the mistake,

Manuel.


Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-25 Thread Xinliang David Li
I think the general mechanism applies to most of the targets. What is
needed is target specific parameter (branch budget) tuning which can
be done separately -- there exist a way to do that already.

David

On Wed, Apr 25, 2012 at 2:03 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Tue, Apr 24, 2012 at 11:26 PM, Teresa Johnson tejohn...@google.com wrote:
 This patch adds heuristics to limit unrolling in loops with branches that 
 may increase
 branch mispredictions. It affects loops that are not frequently iterated, 
 and that are
 nested within a hot region of code that already contains many branch 
 instructions.

 Performance tested with both internal benchmarks and with SPEC 2000/2006 on 
 a variety
 of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD 
 Opteron systems.
 This improves performance of an internal search indexing benchmark by close 
 to 2% on
 all the tested Intel platforms.  It also consistently improves 445.gobmk 
 (with FDO feedback
 where unrolling kicks in) by close to 1% on AMD Opteron. Other performance 
 effects are
 neutral.

 Bootstrapped and tested on x86_64-unknown-linux-gnu.  Is this ok for trunk?

 Thanks,
 Teresa

 2012-04-24   Teresa Johnson  tejohn...@google.com

        * loop-unroll.c (loop_has_call): New function.
        (loop_has_FP_comp): Ditto.
        (compute_weighted_branches): Ditto.
        (max_unroll_with_branches): Ditto.
        (decide_unroll_constant_iterations): Add heuristic to avoid
        increasing branch mispredicts when unrolling.
        (decide_unroll_runtime_iterations): Ditto.
        * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param.
        (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto.

 Index: loop-unroll.c
 ===
 --- loop-unroll.c       (revision 186783)
 +++ loop-unroll.c       (working copy)
 @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc
                                             basic_block);
  static rtx get_expansion (struct var_to_expand *);

 +/* Determine whether LOOP contains call.  */
 +static bool
 +loop_has_call(struct loop *loop)
 +{
 +  basic_block *body, bb;
 +  unsigned i;
 +  rtx insn;
 +
 +  body = get_loop_body (loop);

 You repeatedly do this and walk over all blocks.  Please think about
 compile-time
 issues when writing code.

 This all looks sort-of target specific to me and I don't see why this
 very specialized
 patch is a good idea when unrolling does a very poor job deciding what and how
 much to unroll generally.

 Richard.

 +  for (i = 0; i  loop-num_nodes; i++)
 +    {
 +      bb = body[i];
 +
 +      FOR_BB_INSNS (bb, insn)
 +        {
 +          if (CALL_P (insn))
 +            {
 +              free (body);
 +              return true;
 +            }
 +        }
 +    }
 +  free (body);
 +  return false;
 +}
 +
 +/* Determine whether LOOP contains floating-point computation.  */
 +static bool
 +loop_has_FP_comp(struct loop *loop)
 +{
 +  rtx set, dest;
 +  basic_block *body, bb;
 +  unsigned i;
 +  rtx insn;
 +
 +  body = get_loop_body (loop);
 +  for (i = 0; i  loop-num_nodes; i++)
 +    {
 +      bb = body[i];
 +
 +      FOR_BB_INSNS (bb, insn)
 +        {
 +          set = single_set (insn);
 +          if (!set)
 +            continue;
 +
 +          dest = SET_DEST (set);
 +          if (FLOAT_MODE_P (GET_MODE (dest)))
 +            {
 +              free (body);
 +              return true;
 +            }
 +        }
 +    }
 +  free (body);
 +  return false;
 +}
 +
 +/* Compute the number of branches in LOOP, weighted by execution counts.  */
 +static float
 +compute_weighted_branches(struct loop *loop)
 +{
 +  int header_count = loop-header-count;
 +  unsigned i;
 +  float n;
 +  basic_block * body;
 +
 +  /* If no profile feedback data exists, don't limit unrolling  */
 +  if (header_count == 0)
 +    return 0.0;
 +
 +  gcc_assert (loop-latch != EXIT_BLOCK_PTR);
 +
 +  body = get_loop_body (loop);
 +  n = 0.0;
 +  for (i = 0; i  loop-num_nodes; i++)
 +    {
 +      if (EDGE_COUNT (body[i]-succs) = 2)
 +        {
 +          /* If this block is executed less frequently than the header (loop
 +             entry), then it is weighted based on the ratio of times it is
 +             executed compared to the header.  */
 +          if (body[i]-count  header_count)
 +            n += ((float)body[i]-count)/header_count;
 +
 +          /* When it is executed more frequently than the header (i.e. it is
 +             in a nested inner loop), simply weight the branch at 1.0.  */
 +          else
 +            n += 1.0;
 +        }
 +    }
 +  free (body);
 +
 +  return n;
 +}
 +
 +/* Compute the maximum number of times LOOP can be unrolled without 
 exceeding
 +   a branch budget, which can increase branch mispredictions. The number of
 +   branches is computed by weighting each branch with its expected execution
 +   probability through the loop based on 

Re: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c

2012-04-25 Thread Richard Earnshaw
On 25/04/12 15:31, Richard Guenther wrote:
 On Wed, Apr 25, 2012 at 4:27 PM, Greta Yorsh greta.yo...@arm.com wrote:
 Richard Guenther wrote:
 On Wed, Apr 25, 2012 at 3:34 PM, Greta Yorsh greta.yo...@arm.com
 wrote:
 Richard Guenther wrote:
 On Wed, Apr 25, 2012 at 1:51 PM, Greta Yorsh greta.yo...@arm.com
 wrote:
 The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with neon
 enabled:
 FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect
 vectorized
 1
 loops 2

 The test expects 2 loops to be vectorized, while gcc successfully
 vectorizes
 3 loops in this test using neon on arm. This patch adjusts the
 expected
 output. Fixed test passes on qemu for arm and powerpc.

 OK for trunk?

 I think the proper fix is to instead of

   for (i = 0; i  N; i++)
 {
   input[i] = i;
   output[i] = 0;
   if (input[i]  256)
 abort ();
 }

 use

   for (i = 0; i  N; i++)
 {
   input[i] = i;
   output[i] = 0;
   __asm__ volatile ();
 }

 to prevent vectorization of initialization loops.

 Actually, it looks like both arm and powerpc vectorize this
 initialization loop (line 31), because the control flow is hoisted
 outside the loop by previous optimizations. In addition, arm with neon
 vectorizes the second loop (line 39), but powerpc does not:

 39: not vectorized: relevant stmt not supported: D.2163_8 = i_40 * 9;

 If this is the expected behaviour for powerpc, then the patch I
 proposed is still needed to fix the test failure on arm. Also, there
 would be no need to disable vectorization of the initialization loop,
 right?

 Ah, I thought that was what changed.  Btw, the if () abort () tries to
 disable
 vectorization but does not succeed in doing so.

 Richard.

 Here is an updated patch. It prevents vectorization of the initialization
 loop, as Richard suggested, and updates the expected number of vectorized
 loops accordingly. This patch assumes that the second loop in main (line 39)
 should only be vectorized on arm with neon.  The test passes for arm and
 powerpc.

 OK for trunk?
 
 If arm cannot handle 9 * i then the approrpiate condition would be
 vect_int_mult, not arm_neon_ok.
 

The issue is that arm has (well, should be marked has having)
vect_char_mult.  The difference in count of vectorized loops is based on
that.

R.

 Ok with that change.
 
 Richard.
 
 Thank you,
 Greta

 gcc/testsuite/ChangeLog

 2012-04-25  Greta Yorsh  greta.yo...@arm.com

* gcc.dg/vect/slp-perm-8.c (main): Prevent
vectorization of initialization loop.
(dg-final): Adjust the expected number of
vectorized loops.




 




Re: various minor obvious fixes in th track-macro-expansion code

2012-04-25 Thread Dodji Seketeli
Manuel López-Ibáñez lopeziba...@gmail.com writes:

 Hi Dodji,

Hello Manuel,

Sorry for my late reply, I was on the road.

 I was going to commit this as obvious, but I want to make sure that it
 doesn't conflict with your new track-macro-expansion patches. It can
 also wait until you commit all your patches.

[snip]

 2012-04-18  Manuel López-Ibáñez  m...@gcc.gnu.org

   * tree-diagnostic.c (maybe_unwind_expanded_macro_loc): Fix
   comment. Delete unused parameter first_exp_point_map.
   (virt_loc_aware_diagnostic_finalizer): Update call.
 libcpp/
   * line-map.c (linemap_resolve_location): Synchronize comments with
   those in line-map.h.
   * include/line-map.h (linemap_resolve_location): Fix spelling in
   comment.

This is fine by me.  I'll easily rebase my tree on your changes.  Git
FTW.

Thanks.

-- 
Dodji


Re: Symbol table 13/many: reachability code rewrite

2012-04-25 Thread Steven Bosscher
On Wed, Apr 25, 2012 at 5:04 PM, Jan Hubicka hubi...@ucw.cz wrote:
 + /* Return true when there are references to NODE.  */
 +
 + static bool
 + referred_to_p (symtab_node node)
 + {
 +   int i;
 +   struct ipa_ref *ref;
 +
 +   for (i = 0; ipa_ref_list_referring_iterate (node-symbol.ref_list, i, 
 ref);
 +        i++)
 +     return true;

This looks odd. Don't you want to do something with ref?

 +   if (symtab_function_p (node)  cgraph (node)-callers)
 +     return true;
 +   return false;
 + }

Ciao!
Steven


Re: Symbol table 13/many: reachability code rewrite

2012-04-25 Thread Jan Hubicka
 On Wed, Apr 25, 2012 at 5:04 PM, Jan Hubicka hubi...@ucw.cz wrote:
  + /* Return true when there are references to NODE.  */
  +
  + static bool
  + referred_to_p (symtab_node node)
  + {
  +   int i;
  +   struct ipa_ref *ref;
  +
  +   for (i = 0; ipa_ref_list_referring_iterate (node-symbol.ref_list, i, 
  ref);
  +        i++)
  +     return true;
 
 This looks odd. Don't you want to do something with ref?
No, I have just iteration API, not API to check how many refs are there
and I want to see if there are any...

Honza


Re: Symbol table 14/many: cleanups of cgraphunit.c

2012-04-25 Thread Steven Bosscher
Hello,

Thanks for this work!

Below are a bunch of spelling fixes ;-)


On Wed, Apr 25, 2012 at 6:32 PM, Jan Hubicka hubi...@ucw.cz wrote:
 --- 19,28 
  along with GCC; see the file COPYING3.  If not see
  http://www.gnu.org/licenses/.  */

 ! /* This module implements main driver of compilation process.

This module implements the main driver of the compilation process.

     The main scope of this file is to act as an interface in between
 !    tree based frontends and the backend.

s/tree based/GENERIC based/, or leave the 'tree' bit out (is there
another kind of front end?).

s/frontend/front end/g
s/backend/back end/g


     The front-end is supposed to use following functionality:

s/front-end/front end/

        The function can be called multiple times when multiple source level
 !       compilation units are combined.

Is this still applicable? I thought -combine was removed. Or is this
for things like Fortran modules?


      - cgraph_optimize

 !       This passes control to the back-end.  Optimizations are performed and

s/back-end/back end/

 !       final assembler is generated.  This is done in the following way. Note
 !       that with link time optimization the process is split into three
 !       stages (compile time, linktime analysis and parallel linktime as

s/linktime/link time/g

 !       indicated bellow).
 !
 !       Compile time:
 !
 !       1) Inter-procedural optimization.
 !          (ipa_passes)
 !
 !          This part is further split into:
 !
 !          a) early optimizations. These are local passes executed in
 !             the topological order on the callgraph.
 !
 !             The purpose of early optimiations is to optimize away simple
 !             things that may otherwise confuse IP analysis. Very simple
 !             propagation across the callgraph is done i.e. to discover

s/i.e./e.g./

 !             functions without side effects and simple inlining is performed.
 !
 !          b) early small interprocedural passes.
 !
 !             Those are interprocedural passes executed only at compilation
 !             time.  These include, for exmaple, transational memory lowering,

s/exmaple/example/

 !             unreachable code removal and other simple transformations.
 !
 !          c) IP analysis stage.  All interprocedural passes do their
 !             analysis.
 !
 !             Interprocedural passes differ from small interprocedural
 !             passes by their ability to operate across whole program

s/whole program/the whole program/, or a whole program, or whole programs.

 !             at linktime.  Their analysis stage is performed early to
 !             both reduce linking times and linktime memory usage by
 !             not having to represent whole program in memory.
 !
 !          d) LTO sreaming.  When doing LTO, everything important gets

s/sreaming/streaming/

 !             streamed into the object file.
 !
 !        Compile time and or linktime analysis stage (WPA):
 !
 !             At linktime units gets streamed back and symbol table is

s/units gets/units get/
s/and symbol/and the symbol/

 !             merged.  Function bodies are not streamed in and not
 !             available.
 !          e) IP propagation stage.  All IP passes execute their
 !             IP propagation. This is done based on the earlier analysis
 !             without having function bodies at hand.
 !          f) Ltrans streaming.  When doing WHOPR LTO, the program
 !             is partitioned and streamed into multple object files.

s/multple/multiple/

 !
 !        Compile time and/or parallel linktime stage (ltrans)

 +             Each of the object files is streamed back and compiled
 +             separately.  Now the function bodies becomes available
 +             again.
 +
 +        2) Virtual clone materialization
 +           (cgraph_materialize_clone)
 +
 +           IP passes can produce copies of existing functoins (such

s/functoins/functions/

 +           as versioned clones or inline clones) without actually
 +           manipulating their bodies by creating virtual clones in
 +           the callgraph. At this time the virtual clones are
 +           turned into real functions

add newline

 +        3) IP transformation
 +
 +           All IP passes transform function bodies based on earlier
 +           decision of the IP propagation.
 +
 +        4) late small IP passes
 +
 +           Simple IP passes working within single program partition.
 +
 +        5) Expansion
 +           (cgraph_expand_all_functions)
 +
 +           At this stage functions that needs to be output into

s/needs/need/

 +           assembler are identified and compiled in topological order

s/order/order./
add newline

 +        6) Output of variables and aliases
 +           Now it is known what variable references was not optimized

s/references was not/references were not/

 +           out and thus all variables are output to the file.
 +
 +        

Re: [PATCH 13/13] Switch -ftrack-macro-expansion=2 on by default.

2012-04-25 Thread Tom Tromey
 Gaby == Gabriel Dos Reis g...@integrable-solutions.net writes:

Gaby On Wed, Apr 25, 2012 at 9:33 AM, Dodji Seketeli do...@redhat.com wrote:
 Hopefully closing the series, this patch switches the compiler to
 -ftrack-macro-expansion=2 by default.
 
 Tested and bootstrapped on x86_64-unknown-linux-gnu against trunk.

Gaby Makes sense to me; Tom?

Yes, do it.  Thanks.

Tom


Re: FW: [v3] libstdc++/52689 testcase

2012-04-25 Thread Jonathan Wakely
On 25 April 2012 08:33, Igor Zamyatin wrote:
 This testcase is reported as failed on x86

Yep.

http://gcc.gnu.org/ml/gcc-testresults/2012-04/msg02547.html

Benjamin?



 Noticed that this testcase wasn't put in as part of the patch. Fixed as 
 follows.

 tested x86/linux

 -benjamin



[v3] fix scoped_allocator problems

2012-04-25 Thread Jonathan Wakely
* include/std/scoped_allocator (scoped_allocator::__outermost): Do
not pass non-POD to varargs function.
* testsuite/20_util/scoped_allocator/1.cc: Fix test.

This fixes a potential problem in scoped_allocator and fixes a
broken test (and ensures it actually runs!)

Tested x86_64-linux, committed to trunk, will put it on 4.7 too.


Re: Symbol table 13/many: reachability code rewrite

2012-04-25 Thread Jan Hubicka
  On Wed, Apr 25, 2012 at 5:04 PM, Jan Hubicka hubi...@ucw.cz wrote:
   + /* Return true when there are references to NODE.  */
   +
   + static bool
   + referred_to_p (symtab_node node)
   + {
   +   int i;
   +   struct ipa_ref *ref;
   +
   +   for (i = 0; ipa_ref_list_referring_iterate (node-symbol.ref_list, 
   i, ref);
   +        i++)
   +     return true;
  
  This looks odd. Don't you want to do something with ref?
 No, I have just iteration API, not API to check how many refs are there
 and I want to see if there are any...
Actually it is bit too much of cutpaste. Just
 if (ipa_ref_list_referring_iterate (node-symbol.ref_list, 0, ref))
would work too.  i will update it ;)

Honza

 
 Honza


Re: Symbol table 14/many: cleanups of cgraphunit.c

2012-04-25 Thread Jan Hubicka
 Hello,
 
 Thanks for this work!
 
 Below are a bunch of spelling fixes ;-)

Thanks!
         The function can be called multiple times when multiple source level
  !       compilation units are combined.
 
 Is this still applicable? I thought -combine was removed. Or is this
 for things like Fortran modules?

It is still possible to finalize multiple times and it is used internally to
finalize after aliases.  No frontend is using it as far as I know.  Perhaps I
will drop it after cleaning up the aliases.

Honza


Re: PATCH: PR debug/52857: DW_OP_GNU_regval_type is generated with INVALID_REGNUM

2012-04-25 Thread Richard Henderson
On 04/25/12 06:59, H.J. Lu wrote:
   PR debug/52857
   * dwarf2out.c (dbx_reg_number): Assert return value !=
   INVALID_REGNUM.

Ok.

r~


Re: Symbol table 13/many: reachability code rewrite

2012-04-25 Thread Richard Henderson
On 04/25/12 11:46, Jan Hubicka wrote:
 On Wed, Apr 25, 2012 at 5:04 PM, Jan Hubicka hubi...@ucw.cz wrote:
 + /* Return true when there are references to NODE.  */
 +
 + static bool
 + referred_to_p (symtab_node node)
 + {
 +   int i;
 +   struct ipa_ref *ref;
 +
 +   for (i = 0; ipa_ref_list_referring_iterate (node-symbol.ref_list, i, 
 ref);
 +i++)
 + return true;

 This looks odd. Don't you want to do something with ref?
 No, I have just iteration API, not API to check how many refs are there
 and I want to see if there are any...
 Actually it is bit too much of cutpaste. Just
  if (ipa_ref_list_referring_iterate (node-symbol.ref_list, 0, ref))
 would work too.  i will update it ;)

And with a comment too, please.


r~


[PATCH][Cilkplus] Elemental function insertion

2012-04-25 Thread Iyer, Balaji V
Hello Everyone,
This patch is for the Cilkplus branch affecting both C and C++ compilers. 
This patch will insert elemental functions when the loop is vectorized.

Thanking You,

Yours Sincerely,

Balaji V. Iyer.diff --git a/gcc/cilk.h b/gcc/cilk.h
index 27d5dd0..27ccd16 100644
--- a/gcc/cilk.h
+++ b/gcc/cilk.h
@@ -269,5 +269,7 @@ extern void cilk_remove_annotated_functions (rtx first);
 extern bool cilk_annotated_function_p (char *);
 extern void debug_zca_data (void);
 extern zca_data *get_zca_entry (int);
-extern void insert_in_zca_table (zca_data);
+extern void insert_in_zca_table (zca_data);
+extern bool is_elem_fn (tree);  
+extern tree find_elem_fn_name (tree, tree, tree);
 #endif /* GCC_CILK_H */
diff --git a/gcc/elem-function.c b/gcc/elem-function.c
index 4cc9035..42f6248 100644
--- a/gcc/elem-function.c
+++ b/gcc/elem-function.c
@@ -83,6 +83,58 @@ static elem_fn_info *extract_elem_fn_values (tree);
 static tree create_optimize_attribute (int);
 static tree create_processor_attribute (elem_fn_info *, tree *);
 
+/* this is an helper function for find_elem_fn_param_type */
+static enum elem_fn_parm_type
+find_elem_fn_parm_type_1 (tree fndecl, int parm_no)
+{
+  int ii = 0;
+  elem_fn_info *elem_fn_values;
+
+  elem_fn_values = extract_elem_fn_values (fndecl);
+  if (!elem_fn_values)
+return TYPE_NONE;
+
+  for (ii = 0; ii  elem_fn_values-no_lvars; ii++)
+if (elem_fn_values-linear_location[ii] == parm_no)
+  return TYPE_LINEAR;
+
+  for (ii = 0; ii  elem_fn_values-no_uvars; ii++)
+if (elem_fn_values-uniform_location[ii] == parm_no)
+  return TYPE_UNIFORM;
+
+  return TYPE_NONE;
+}
+  
+  
+/* this function will return the type of a parameter in elemental function.
+   The choices are UNIFORM or LINEAR. */
+enum elem_fn_parm_type
+find_elem_fn_parm_type (gimple stmt, tree op)
+{
+  tree fndecl, parm = NULL_TREE;
+  int ii, nargs;
+  enum elem_fn_parm_type return_type = TYPE_NONE;
+  
+  if (gimple_code (stmt) != GIMPLE_CALL)
+return TYPE_NONE;
+
+  fndecl = gimple_call_fndecl (stmt);
+  gcc_assert (fndecl);
+
+  nargs = gimple_call_num_args (stmt);
+
+  for (ii = 0; ii  nargs; ii++)
+{
+  parm = gimple_call_arg (stmt, ii);
+  if (op == parm)
+   {
+ return_type = find_elem_fn_parm_type_1 (fndecl, 1);
+ return return_type;
+   }
+}
+  return return_type;
+}
+  
 /* this function will concatinate the suffix to the existing function decl */
 static tree
 rename_elem_fn (tree decl, const char *suffix)
@@ -108,7 +160,7 @@ rename_elem_fn (tree decl, const char *suffix)
 
 /* this function will check to see if the node is part of an function that
  * needs to be converted to its vector equivalent. */
-static bool
+bool
 is_elem_fn (tree fndecl)
 {
   tree ii_tree;
@@ -349,6 +401,55 @@ find_suffix (elem_fn_info *elem_fn_values, bool masked)
   return suffix;
 }
 
+tree
+find_elem_fn_name (tree old_fndecl,
+  tree vectype_out ATTRIBUTE_UNUSED,
+  tree vectype_in ATTRIBUTE_UNUSED)
+{
+  elem_fn_info *elem_fn_values = NULL;
+  tree new_fndecl = NULL_TREE, arg_type = NULL_TREE;
+  char *suffix = NULL;
+  
+  elem_fn_values = extract_elem_fn_values (old_fndecl);
+ 
+  if (elem_fn_values)
+{
+  if (elem_fn_values-no_vlengths  0)
+   {
+ if (elem_fn_values-vectorlength[0] ==
+ (int)TYPE_VECTOR_SUBPARTS (vectype_out))
+   suffix = find_suffix (elem_fn_values, false);
+ else
+   return NULL_TREE;
+   }
+  else
+   return NULL_TREE;
+}
+  else
+return NULL_TREE;
+
+  new_fndecl = copy_node (rename_elem_fn (old_fndecl, suffix));
+  TREE_TYPE (new_fndecl) = copy_node (TREE_TYPE (old_fndecl));
+
+  TYPE_ARG_TYPES (TREE_TYPE (new_fndecl)) =
+copy_list (TYPE_ARG_TYPES (TREE_TYPE (new_fndecl)));
+  
+  for (arg_type = TYPE_ARG_TYPES (TREE_TYPE (new_fndecl));
+   arg_type  arg_type != void_type_node;
+   arg_type = TREE_CHAIN (arg_type))
+TREE_VALUE (arg_type) = vectype_out;
+  
+  if (TREE_TYPE (TREE_TYPE (new_fndecl)) != void_type_node)
+{
+  TREE_TYPE (TREE_TYPE (new_fndecl)) =
+   copy_node (TREE_TYPE (TREE_TYPE (new_fndecl)));
+  TREE_TYPE (TREE_TYPE (new_fndecl)) = vectype_out;
+  DECL_MODE (new_fndecl) = TYPE_MODE (vectype_out);
+}
+  
+  return new_fndecl;
+}
+
 /* this function wil create the elemental vector function node */
 static struct cgraph_node *
 create_elem_fn_nodes (struct cgraph_node *node)
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 1381b53..bea2773 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -86,6 +86,7 @@ along with GCC; see the file COPYING3.  If not see
 #include langhooks.h
 #include tree-affine.h
 #include params.h
+#include cilk.h
 
 static struct datadep_stats
 {
@@ -4383,8 +4384,18 @@ find_data_references_in_stmt (struct loop *nest, gimple 
stmt,
 
   if 

[PATCH] Improve andq $0xffffffff, %reg handling (PR target/53110)

2012-04-25 Thread Jakub Jelinek
Hi!

We have a splitter for reg1 = reg2  0x, but only if regnums
are different.  But movl %edi, %edi is a cheaper variant of
andq $0x, %rdi even with the same register and doesn't clobber
flags, so this patch attempts to expand it as a zero extension early.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-04-25  Jakub Jelinek  ja...@redhat.com

PR target/53110
* config/i386/i386.md (andmode3): For andq $0x, reg
instead expand it as zero extension.

--- gcc/config/i386/i386.md.jj  2012-04-25 12:14:54.0 +0200
+++ gcc/config/i386/i386.md 2012-04-25 14:50:48.708925963 +0200
@@ -7694,7 +7694,17 @@ (define_expand andmode3
(and:SWIM (match_operand:SWIM 1 nonimmediate_operand)
  (match_operand:SWIM 2 general_szext_operand)))]
   
-  ix86_expand_binary_operator (AND, MODEmode, operands); DONE;)
+{
+  if (MODEmode == DImode
+   GET_CODE (operands[2]) == CONST_INT
+   INTVAL (operands[2]) == (HOST_WIDE_INT) 0x
+   REG_P (operands[1]))
+emit_insn (gen_zero_extendsidi2 (operands[0],
+gen_lowpart (SImode, operands[1])));
+  else
+ix86_expand_binary_operator (AND, MODEmode, operands);
+  DONE;
+})
 
 (define_insn *anddi_1
   [(set (match_operand:DI 0 nonimmediate_operand =r,rm,r,r)

Jakub


Re: [PATCH] Improve andq $0xffffffff, %reg handling (PR target/53110)

2012-04-25 Thread Richard Henderson
On 04/25/12 12:14, Jakub Jelinek wrote:
 2012-04-25  Jakub Jelinek  ja...@redhat.com
 
   PR target/53110
   * config/i386/i386.md (andmode3): For andq $0x, reg
   instead expand it as zero extension.

Ok.


r~


FW: [PATCH][Cilkplus] Elemental function insertion

2012-04-25 Thread Iyer, Balaji V
Sorry, I forgot to include Changelog entries. I am attaching the updated patch.

Thanks,

Balaji V. Iyer.


From: Iyer, Balaji V
Sent: Wednesday, April 25, 2012 3:08 PM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH][Cilkplus] Elemental function insertion

Hello Everyone,
This patch is for the Cilkplus branch affecting both C and C++ compilers. 
This patch will insert elemental functions when the loop is vectorized.

Thanking You,

Yours Sincerely,

Balaji V. Iyer.diff --git a/gcc/ChangeLog.cilk b/gcc/ChangeLog.cilk
index 523b7ac..d6b28e2 100644
--- a/gcc/ChangeLog.cilk
+++ b/gcc/ChangeLog.cilk
@@ -1,3 +1,19 @@
+2012-04-24  Balaji V. Iyer  balaji.v.i...@intel.com
+
+   * elem-function.c (find_elem_fn_param_type_1): New function.
+   (find_elem_fn_param_type): Likewise.
+   (find_elem_fn_name): Likewise.
+   (is_elem_fn): Make it unstatic.
+   * tree-data-ref.c (find_data_references_in_stmt): Added support for
+   functions that can be made to elemental functions.
+   * tree-vect-stmts.c (vect_get_vec_def_for_operand): Added a check for
+   the parameters to see whether it is uniform, linear or neither.
+   (vectorizable_function): Handled code to substitute regular function
+   with the equivalent elemental function.
+   (vectorizable_call): Set the function type for substituted elemental
+   function.
+   * tree.h (enum elem_fn_parm_type): New enum.
+
 2012-04-20  Balaji V. Iyer  balaji.v.i...@intel.com
 
* final.c (rest_of_handle_final): Moved outputing ZCA data after
diff --git a/gcc/cilk.h b/gcc/cilk.h
index 27d5dd0..27ccd16 100644
--- a/gcc/cilk.h
+++ b/gcc/cilk.h
@@ -269,5 +269,7 @@ extern void cilk_remove_annotated_functions (rtx first);
 extern bool cilk_annotated_function_p (char *);
 extern void debug_zca_data (void);
 extern zca_data *get_zca_entry (int);
-extern void insert_in_zca_table (zca_data);
+extern void insert_in_zca_table (zca_data);
+extern bool is_elem_fn (tree);  
+extern tree find_elem_fn_name (tree, tree, tree);
 #endif /* GCC_CILK_H */
diff --git a/gcc/elem-function.c b/gcc/elem-function.c
index 4cc9035..42f6248 100644
--- a/gcc/elem-function.c
+++ b/gcc/elem-function.c
@@ -83,6 +83,58 @@ static elem_fn_info *extract_elem_fn_values (tree);
 static tree create_optimize_attribute (int);
 static tree create_processor_attribute (elem_fn_info *, tree *);
 
+/* this is an helper function for find_elem_fn_param_type */
+static enum elem_fn_parm_type
+find_elem_fn_parm_type_1 (tree fndecl, int parm_no)
+{
+  int ii = 0;
+  elem_fn_info *elem_fn_values;
+
+  elem_fn_values = extract_elem_fn_values (fndecl);
+  if (!elem_fn_values)
+return TYPE_NONE;
+
+  for (ii = 0; ii  elem_fn_values-no_lvars; ii++)
+if (elem_fn_values-linear_location[ii] == parm_no)
+  return TYPE_LINEAR;
+
+  for (ii = 0; ii  elem_fn_values-no_uvars; ii++)
+if (elem_fn_values-uniform_location[ii] == parm_no)
+  return TYPE_UNIFORM;
+
+  return TYPE_NONE;
+}
+  
+  
+/* this function will return the type of a parameter in elemental function.
+   The choices are UNIFORM or LINEAR. */
+enum elem_fn_parm_type
+find_elem_fn_parm_type (gimple stmt, tree op)
+{
+  tree fndecl, parm = NULL_TREE;
+  int ii, nargs;
+  enum elem_fn_parm_type return_type = TYPE_NONE;
+  
+  if (gimple_code (stmt) != GIMPLE_CALL)
+return TYPE_NONE;
+
+  fndecl = gimple_call_fndecl (stmt);
+  gcc_assert (fndecl);
+
+  nargs = gimple_call_num_args (stmt);
+
+  for (ii = 0; ii  nargs; ii++)
+{
+  parm = gimple_call_arg (stmt, ii);
+  if (op == parm)
+   {
+ return_type = find_elem_fn_parm_type_1 (fndecl, 1);
+ return return_type;
+   }
+}
+  return return_type;
+}
+  
 /* this function will concatinate the suffix to the existing function decl */
 static tree
 rename_elem_fn (tree decl, const char *suffix)
@@ -108,7 +160,7 @@ rename_elem_fn (tree decl, const char *suffix)
 
 /* this function will check to see if the node is part of an function that
  * needs to be converted to its vector equivalent. */
-static bool
+bool
 is_elem_fn (tree fndecl)
 {
   tree ii_tree;
@@ -349,6 +401,55 @@ find_suffix (elem_fn_info *elem_fn_values, bool masked)
   return suffix;
 }
 
+tree
+find_elem_fn_name (tree old_fndecl,
+  tree vectype_out ATTRIBUTE_UNUSED,
+  tree vectype_in ATTRIBUTE_UNUSED)
+{
+  elem_fn_info *elem_fn_values = NULL;
+  tree new_fndecl = NULL_TREE, arg_type = NULL_TREE;
+  char *suffix = NULL;
+  
+  elem_fn_values = extract_elem_fn_values (old_fndecl);
+ 
+  if (elem_fn_values)
+{
+  if (elem_fn_values-no_vlengths  0)
+   {
+ if (elem_fn_values-vectorlength[0] ==
+ (int)TYPE_VECTOR_SUBPARTS (vectype_out))
+   suffix = find_suffix (elem_fn_values, false);
+ else
+   return NULL_TREE;
+   }
+  else
+   return 

Re: [PATCH] Validate the HLE memory models.

2012-04-25 Thread Richard Henderson
On 04/25/12 01:29, Andi Kleen wrote:
 gcc/:
 2012-04-24  Andi Kleen  a...@linux.intel.com
 
   * builtins.c (get_memmodel): Add val. Call target.memmodel_check
   and return new variable.
   * config/i386/i386.c (ix86_memmodel_check): Add.
   (TARGET_MEMMODEL_MASK): Replace with TARGET_MEMMODEL_CHECK.
   * doc/tm.texi (TARGET_MEMMODEL_MASK): Replace with 
 TARGET_MEMMODEL_CHECK..
   * doc/tm.texi.in (TARGET_MEMMODEL_MASK): Replace with 
 TARGET_MEMMODEL_CHECK.
   * target.def (memmodel_mask): Replace with memmodel_check.

This looks much better to me than the mask solution.

 +  int val;

 +ix86_memmodel_check (unsigned val)

Please use HOST_WIDE_INT throughout, otherwise you'll fail to warn for
truly silly parameters such as 0x1__.



r~


  1   2   >