Re: [C++ Patch] PR 21682 (DR 565) (Take 2)
OK. Jason
[PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
Hi,

The gimple-ssa-strength-reduction pass handles CAND_REFs in order to find different MEM_REFs sharing a common part of their addressing expressions. If such MEM_REFs are found, the pass rewrites them so that more efficient addressing expressions are produced during the RTL passes. The pass analyzes the addressing expression in each MEM_REF to see if it can be formalized as follows:

  base:   MEM_REF (T1, C1)
  offset: MULT_EXPR (PLUS_EXPR (T2, C2), C3)
  bitpos: C4 * BITS_PER_UNIT

It then restructures it into the form:

  MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
           C1 + (C2 * C3) + C4)

Finally, it rewrites the MEM_REFs if two or more of them share a common (non-constant) part.

The problem is that it doesn't back-trace T2. If T2 is recorded as a CAND_ADD in the form T2' + C5, the MEM_REF should be restructured into:

  MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2', C3)),
           C1 + (C2 * C3) + C4 + (C5 * C3))

The patch also includes a test case to illustrate the problem. Bootstrapped and tested on x86/x86_64/arm-a15, is it ok?

Thanks.
bin

2013-09-02  Bin Cheng  bin.ch...@arm.com

	* gimple-ssa-strength-reduction.c (backtrace_base_for_ref): New.
	(restructure_reference): Call backtrace_base_for_ref.
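To make the motivation concrete, here is a hypothetical C fragment (separate from the test case attached to the patch): both stores index the same row of the array, so their address computations share the non-constant part and differ only by a constant, which is exactly the situation the pass rewrites.

```c
#include <assert.h>  /* only needed for the usage check below */

/* Hypothetical illustration of what CAND_REF handling targets: the
   two stores below compute addresses of the form
   a + i*200 + i*4 and a + i*200 + (i+1)*4, which share the
   non-constant part and differ only by a constant offset.  */
void set_pair (int a[][50], int i, int v)
{
  a[i][i] = v;      /* first MEM_REF */
  a[i][i + 1] = v;  /* same non-constant base, offset +4 bytes */
}
```

After strength reduction, the second address can be formed from the first by adding a constant instead of redoing the multiply.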
gcc/testsuite/ChangeLog
2013-09-02  Bin Cheng  bin.ch...@arm.com

	* gcc.dg/tree-ssa/slsr-39.c: New test.

Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c	(revision 0)
@@ -0,0 +1,26 @@
+/* Verify straight-line strength reduction for back-tracing
+   CAND_ADD for T2 in:
+
+     *PBASE:    T1
+     *POFFSET:  MULT_EXPR (T2, C3)
+     *PINDEX:   C1 + (C2 * C3) + C4  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i] [j++] = i;
+  a2 [i] [j++] = i;
+  a2 [i] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 4 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */
Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	(revision 202067)
+++ gcc/gimple-ssa-strength-reduction.c	(working copy)
@@ -750,6 +750,57 @@ slsr_process_phi (gimple phi, bool speed)
   add_cand_for_stmt (phi, c);
 }
 
+/* Given PBASE which is a pointer to tree, look up the defining
+   statement for it and check whether the candidate is in the
+   form of:
+
+     X = B + (1 * S), S is integer constant
+     X = B + (i * S), S is integer one
+
+   If so, set PBASE to the candidate's base_expr and return double
+   int (i * S).
+   Otherwise, just return double int zero.  */
+
+static double_int
+backtrace_base_for_ref (tree *pbase)
+{
+  tree base_in = *pbase;
+  slsr_cand_t base_cand;
+
+  STRIP_NOPS (base_in);
+  if (TREE_CODE (base_in) != SSA_NAME)
+    return tree_to_double_int (integer_zero_node);
+
+  base_cand = base_cand_from_table (base_in);
+
+  while (base_cand && base_cand->kind != CAND_PHI)
+    {
+      if (base_cand->kind == CAND_ADD
+	  && base_cand->index.is_one ()
+	  && TREE_CODE (base_cand->stride) == INTEGER_CST)
+	{
+	  /* X = B + (1 * S), S is integer constant.  */
+	  *pbase = base_cand->base_expr;
+	  return tree_to_double_int (base_cand->stride);
+	}
+      else if (base_cand->kind == CAND_ADD
+	       && TREE_CODE (base_cand->stride) == INTEGER_CST
+	       && integer_onep (base_cand->stride))
+	{
+	  /* X = B + (i * S), S is integer one.  */
+	  *pbase = base_cand->base_expr;
+	  return base_cand->index;
+	}
+
+      if (base_cand->next_interp)
+	base_cand = lookup_cand (base_cand->next_interp);
+      else
+	base_cand = NULL;
+    }
+
+  return tree_to_double_int (integer_zero_node);
+}
+
 /* Look for the following pattern:
 
     *PBASE:    MEM_REF (T1, C1)
@@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed)
     *PBASE:    T1
     *POFFSET:  MULT_EXPR (T2, C3)
-    *PINDEX:   C1 + (C2 * C3) + C4  */
+    *PINDEX:   C1 + (C2 * C3) + C4
 
+   When T2 is recorded by a CAND_ADD in the form of (T2' + C5), it
+   will be further restructured to:
+
+    *PBASE:    T1
+    *POFFSET:  MULT_EXPR (T2', C3)
+    *PINDEX:   C1 + (C2 * C3) + C4 + (C5 * C3)  */
+
 static bool
 restructure_reference (tree *pbase, tree *poffset, double_int *pindex,
		       tree *ptype)
@@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset,
   double_int index = *pindex;
   double_int bpu = double_int::from_uhwi (BITS_PER_UNIT);
   tree mult_op0, mult_op1, t1, t2, type;
-  double_int c1, c2, c3, c4;
+  double_int c1, c2, c3, c4, c5;
 
   if (!base
       || !offset
@@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree *poffset,
     }
   c4 = index.udiv
[PATCH GCC]Find auto-increment use both before and after candidate's increment in IVOPT
Hi,

For now set_autoinc_for_original_candidates only searches for auto-inc uses before the candidate's increment, causing pre-increment opportunities to be missed for original candidates. This is a straightforward fix: search after the candidate's increment too. The patch also includes a test case to illustrate the problem. Without the patch, the assembly of the test is:

foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r3, #:lower16:__ctype_ptr__
	ldrb	r2, [r0]	@ zero_extendqisi2
	movt	r3, #:upper16:__ctype_ptr__
	ldr	r1, [r3]
	adds	r3, r1, r2
	ldrb	r3, [r3, #1]	@ zero_extendqisi2
	lsls	r3, r3, #29
	bmi	.L2
	adds	r3, r0, #1
.L3:
	mov	r0, r3
	adds	r3, r3, #1
	ldrb	r2, [r0]	@ zero_extendqisi2
	add	r2, r2, r1
	ldrb	r2, [r2, #1]	@ zero_extendqisi2
	lsls	r2, r2, #29
	bpl	.L3
.L2:
	bx	lr
	.size	foo, .-foo

which can be optimized into the below:

foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r3, #:lower16:__ctype_ptr__
	ldrb	r1, [r0]	@ zero_extendqisi2
	movt	r3, #:upper16:__ctype_ptr__
	ldr	r2, [r3]
	adds	r3, r2, r1
	ldrb	r3, [r3, #1]	@ zero_extendqisi2
	lsls	r1, r3, #29
	bmi	.L2
.L3:
	ldrb	r3, [r0, #1]!	@ zero_extendqisi2
	add	r3, r3, r2
	ldrb	r3, [r3, #1]	@ zero_extendqisi2
	lsls	r3, r3, #29
	bpl	.L3
.L2:
	bx	lr
	.size	foo, .-foo

Bootstrapped and tested on arm a15, is it OK?

Thanks.
bin

2013-09-02  Bin Cheng  bin.ch...@arm.com

	* tree-ssa-loop-ivopts.c (set_autoinc_for_original_candidates):
	Find auto-increment use both before and after candidate.
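As an illustration of the kind of code affected (a hypothetical sketch, simpler than the attached test case), consider a loop where an address use of `p + 1` sits next to the increment of `p`; folding the increment into the load is what the pre-increment addressing mode (`ldrb r3, [r0, #1]!` in the optimized assembly above) achieves:

```c
#include <assert.h>  /* only needed for the usage check below */

/* Hypothetical sketch: the load of *(p + 1) is an address use that
   sits next to the increment of p.  When ivopts considers uses on
   both sides of the candidate increment, a backend with
   auto-increment addressing can fold the increment into the load,
   e.g. ldrb r3, [r0, #1]! on ARM.  */
const unsigned char *skip_zeros (const unsigned char *p)
{
  while (*(p + 1) == 0)  /* use of p + 1 ...           */
    p++;                 /* ... next to p's increment  */
  return p;
}
```

With input {5, 0, 0, 3} the function returns a pointer to the element just before the first non-zero byte after the start.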
gcc/testsuite/ChangeLog
2013-09-02  Bin Cheng  bin.ch...@arm.com

	* gcc.target/arm/ivopts-orig_biv-inc.c: New test.

Index: gcc/testsuite/gcc.target/arm/ivopts-orig_biv-inc.c
===================================================================
--- gcc/testsuite/gcc.target/arm/ivopts-orig_biv-inc.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/ivopts-orig_biv-inc.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+/* { dg-skip-if "" { arm_thumb1 } } */
+
+extern char *__ctype_ptr__;
+
+unsigned char * foo(unsigned char *ReadPtr)
+{
+
+  unsigned char c;
+
+  while (!(((__ctype_ptr__+sizeof(""[*ReadPtr]))[(int)(*ReadPtr)])&04) == (!(0)))
+    ReadPtr++;
+
+  return ReadPtr;
+}
+
+/* { dg-final { scan-tree-dump-times "original biv" 2 "ivopts" } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 200774)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -4876,22 +4876,36 @@ set_autoinc_for_original_candidates (struct ivopts
   for (i = 0; i < n_iv_cands (data); i++)
     {
       struct iv_cand *cand = iv_cand (data, i);
-      struct iv_use *closest = NULL;
+      struct iv_use *closest_before = NULL;
+      struct iv_use *closest_after = NULL;
       if (cand->pos != IP_ORIGINAL)
	continue;
+
       for (j = 0; j < n_iv_uses (data); j++)
	{
	  struct iv_use *use = iv_use (data, j);
	  unsigned uid = gimple_uid (use->stmt);
-	  if (gimple_bb (use->stmt) != gimple_bb (cand->incremented_at)
-	      || uid > gimple_uid (cand->incremented_at))
+
+	  if (gimple_bb (use->stmt) != gimple_bb (cand->incremented_at))
	    continue;
-	  if (closest == NULL || uid > gimple_uid (closest->stmt))
-	    closest = use;
+
+	  if (uid < gimple_uid (cand->incremented_at)
+	      && (closest_before == NULL
+		  || uid > gimple_uid (closest_before->stmt)))
+	    closest_before = use;
+
+	  if (uid > gimple_uid (cand->incremented_at)
+	      && (closest_after == NULL
+		  || uid < gimple_uid (closest_after->stmt)))
+	    closest_after = use;
	}
-      if (closest == NULL || !autoinc_possible_for_pair (data, closest, cand))
-	continue;
-      cand->ainc_use = closest;
+
+      if (closest_before != NULL
+	  && autoinc_possible_for_pair (data, closest_before, cand))
+	cand->ainc_use = closest_before;
+      else if (closest_after != NULL
+	       && autoinc_possible_for_pair (data, closest_after, cand))
+	cand->ainc_use = closest_after;
     }
 }
RE: [PATCH ARM]Refine scaled address expression on ARM
-----Original Message-----
From: Richard Earnshaw
Sent: Thursday, August 29, 2013 9:06 PM
To: Bin Cheng
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH ARM]Refine scaled address expression on ARM

On 28/08/13 08:00, bin.cheng wrote:

Hi,
This patch refines scaled address expression on ARM. It supports "base + index*scale" in arm_legitimate_address_outer_p. It also tries to legitimize "base + index * scale + offset" with "reg <- base + offset; reg + index * scale" by introducing thumb2_legitimize_address. For now function thumb2_legitimize_address is a kind of placeholder and just does the mentioned transformation by calling try_multiplier_address. Hoping we can improve it in the future.

With this patch:
1) base + index*scale is recognized.

That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form. So this shouldn't be necessary. Can you identify where this non-canonical form is being generated?

Oh, for now ivopt constructs "index*scale" to test whether the backend supports scaled addressing modes, which is not valid on ARM, so I was going to construct "base + index*scale" instead. Since "base + index * scale" is not canonical form, I will construct the canonical form and drop this part of the patch. Is the rest of this patch OK?

Thanks.
bin
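The addressing form under discussion can be seen in a trivial source fragment; this is an illustrative sketch (not part of the patch): on ARM, a load like the one below can typically use a scaled register offset such as `ldr r0, [r1, r2, lsl #2]`, i.e. base + index * scale.

```c
#include <assert.h>  /* only needed for the usage check below */

/* Illustrative only: an access whose address is base + index * scale.
   On ARM this can map to a scaled register-offset addressing mode,
   e.g. ldr r0, [r1, r2, lsl #2] with scale = sizeof (int) = 4.  */
int load_scaled (int *base, int index)
{
  return base[index];  /* address computed as base + index * 4 */
}
```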
patch to enable *.cc files in gengtype
Hello All,

The attached patch (for trunk svn rev 202160) to gcc/gengtype.c permits files named *.cc (by adding another rule) to be passed to gengtype, and gives a slightly more meaningful fatal error message when an unrecognized file name (e.g. *.cpp or something even wilder) is passed to gengtype. FWIW, this patch is imported from melt-branch svn rev 202160.

gcc/ChangeLog entry:

2013-09-02  Basile Starynkevitch  bas...@starynkevitch.net

	* gengtype.c (file_rules): Added rule for *.cc files.
	(get_output_file_with_visibility): Give fatal message when no
	rules found.

Ok for trunk?

Cheers
-- Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***

Index: gcc/gengtype.c
===================================================================
--- gcc/gengtype.c	(revision 202160)
+++ gcc/gengtype.c	(working copy)
@@ -2004,14 +2004,21 @@
     REG_EXTENDED, NULL_REGEX,
     "gt-objc-objc-map.h", "objc/objc-map.c", NULL_FRULACT },
 
-  /* General cases.  For header *.h and source *.c files, we need
-   * special actions to handle the language.  */
+  /* General cases.  For header *.h and source *.c or *.cc files, we
+   * need special actions to handle the language.  */
 
   /* Source *.c files are using get_file_gtfilename to compute their
      output_name and get_file_basename to compute their for_name
      through the source_dot_c_frul action.  */
   { DIR_PREFIX_REGEX "([[:alnum:]_-]*)\\.c$",
     REG_EXTENDED, NULL_REGEX, "gt-$3.h", "$3.c", source_dot_c_frul},
+
+  /* Source *.cc files are using get_file_gtfilename to compute their
+     output_name and get_file_basename to compute their for_name
+     through the source_dot_c_frul action.  */
+  { DIR_PREFIX_REGEX "([[:alnum:]_-]*)\\.cc$",
+    REG_EXTENDED, NULL_REGEX, "gt-$3.h", "$3.cc", source_dot_c_frul},
+
   /* Common header files get gtype-desc.c as their output_name,
    * while language specific header files are handled specially.  So
    * we need the header_dot_h_frul action.  */
@@ -2269,9 +2276,9 @@
     }
   if (!output_name || !for_name)
     {
-      /* This is impossible, and could only happen if the files_rules is
-	 incomplete or buggy.  */
-      gcc_unreachable ();
+      /* This should not be possible, and could only happen if the
+	 files_rules is incomplete or buggy.  */
+      fatal ("failed to compute output name for %s", inpfname);
     }
 
   /* Look through to see if we've ever seen this output filename
Ping: small patch to accept = inside plugin arguments
Hello

I'm pinging my last month's patch
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00382.html
of August 07th, 2013, to accept '=' inside plugin arguments.

Cheers
-- Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
Re: [RFC] Changes to the wide-int classes
Mike Stump mikest...@comcast.net writes:

* A static assert also prevents fixed_wide_ints from being initialised from wide_ints. I think combinations like that would always be a mistake.

I don't see why. In C, this:

  int i;
  long l = i;

is not an error, and while a user might not want to do this, other times it is completely reasonable.

Sure, but in these terms, int is FLEXIBLE_PRECISION, while wide_int is VAR_PRECISION. You can do the above because int has a sign, and so the compiler knows how to extend it to wider types. wide_int doesn't have a sign, so if you want to extend it to wider types you need to specify a sign explicitly. So you can do:

  max_wide_int x = max_wide_int::from (wide_int_var, SIGNED);
  max_wide_int x = max_wide_int::from (wide_int_var, UNSIGNED);

What I'm saying is that the assert stops plain:

  max_wide_int x = wide_int_var;

since it's very unlikely that you'd know up-front that wide_int_var has precision MAX_BITSIZE_MODE_ANY_INT. FLEXIBLE_PRECISION is for integers that act in a C-like way. CONST_PRECISION and VAR_PRECISION are (deliberately) not C-like.

Thanks,
Richard
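The design point can be mocked in a few lines of plain C (this is a hypothetical illustration, not the real wide-int API): extending a "signless" bit pattern to a wider type is only well-defined once the caller supplies a sign, which is what `max_wide_int::from (x, SIGNED/UNSIGNED)` expresses.

```c
#include <assert.h>  /* only needed for the usage check below */
#include <stdint.h>

/* Hypothetical mock: a value carried as raw bits has no sign, so
   widening it requires an explicit choice, mirroring
   max_wide_int::from (wide_int_var, SIGNED/UNSIGNED).  */
typedef enum { SIGNED, UNSIGNED } signop;

int64_t extend_from (uint32_t bits, signop sgn)
{
  if (sgn == SIGNED)
    return (int64_t) (int32_t) bits;   /* sign-extend the top bit */
  return (int64_t) (uint64_t) bits;    /* zero-extend */
}
```

The same 32-bit pattern 0xFFFFFFFF widens to -1 or to 4294967295 depending on the sign chosen, which is why an implicit conversion would be a silent trap.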
Re: [PATCH] Convert more passes to new dump framework
On Fri, Aug 30, 2013 at 6:27 PM, Xinliang David Li davi...@google.com wrote:

Except that in this form, the dump will be extremely large and not suitable for very large applications.

So? You asked for it and you can easily grep the output before storing it away. Besides, we might also want to use the same machinery (dump_printf_loc etc.) for dump file dumping. The current behavior of using '-details' to turn on opt-info-all messages for dump files is not desirable.

How about the following:

1) Add a new dump_kind modifier so that when that modifier is specified, the messages won't go to the alt_dumpfile (controlled by -fopt-info), but only to the primary dump file. With this, the inline messages can be dumped via: dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .)

2) Add more flags in -fdump- support:

  -fdump-ipa-inline-opt     -- turn on opt-info messages only
  -fdump-ipa-inline-optall  -- turn on opt-info-all messages
  -fdump-tree-pre-ir        -- turn on GIMPLE dump only
  -fdump-tree-pre-details   -- turn on everything (ir, optall, trace)

With this, developers can really just use -fdump-ipa-inline-opt=stderr for inline messages.

thanks, David

On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener richard.guent...@gmail.com wrote:
On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson tejohn...@google.com wrote:
On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener richard.guent...@gmail.com wrote:

New patch below that removes this global variable, and also outputs the node->symbol.order (in square brackets after the function name so as to not clutter it). Inline messages with profile data look like:

test.c:8:3: note: foobar [0] (9000) inlined into foo [2] (1000) with call count 9000 (via inline instance bar [3] (9000))

Ick. This looks both redundant and cluttered. This is supposed to be understandable by GCC users, not only GCC developers.

The main part that is only useful/understandable to gcc developers is the node->symbol.order in square brackets, requested by Martin.
One possibility is that I could put that part under a param, disabled by default. We have something similar on the google branches that emits LIPO module info in the message, enabled via a param. But we have _dump files_ for that. That's the developer-consumed form of opt-info. -fopt-info is purely user sugar and for usual translation units it shouldn't exceed a single terminal full of output. But as a developer I don't want to have to parse lots of dump files for a summary of the major optimizations performed (e.g. inlining, unrolling) for an application, unless I am diving into the reasons for why or why not one of those optimizations occurred in a particular location. I really do want a summary emitted to stderr so that it is easily searchable/summarizable for the app as a whole. For example, some of the apps I am interested in have thousands of input files, and trying to collect and parse dump files for each and every one is overwhelming (it probably would be even if my input files numbered in the hundreds). What has been very useful is having these high level summary messages of inlines and unrolls emitted to stderr by -fopt-info. Then it is easy to search and sort by hotness to get a feel for things like what inlines are missing when moving to a new compiler, or compiling a new version of the source, for example. Then you know which files to focus on and collect dump files for. I thought we can direct dump files to stderr now? So, just use -fdump-tree-all=stderr and grep its contents. I'd argue that the other information (the profile counts, emitted only when using -fprofile-use, and the inline call chains) are useful if you want to understand whether and how critical inlines are occurring. I think this is the type of information that users focused on optimizations, as well as gcc developers, want when they use -fopt-info. Otherwise it is difficult to make sense of the inline information. 
Well, I doubt that inline information is interesting to users unless we are able to aggressively filter it to what users are interested in. Which IMHO isn't possible - users are interested in I have not inlined this even though inlining would severely improve performance which would indicate a bug in the heuristics we can reliably detect and thus it wouldn't be there. I have interacted with users who are aware of optimizations such as inlining and unrolling and want to look at that information to diagnose performance differences when refactoring code or using a new compiler version. I also think inlining (especially cross-module) is one example of an optimization that is still being tuned, and user reports of performance issues related to that have been useful. I really think that the two groups of people who will find -fopt-info useful are gcc developers and savvy performance-hungry users. For the former group the additional info
Re: [PATCH] Convert more passes to new dump framework
On Fri, Aug 30, 2013 at 9:51 PM, Teresa Johnson tejohn...@google.com wrote: On Fri, Aug 30, 2013 at 9:27 AM, Xinliang David Li davi...@google.com wrote: Except that in this form, the dump will be extremely large and not suitable for very large applications. Yes. I did some measurements for both a fairly large source file that is heavily optimized with LIPO and for a simple toy example that has some inlining. For the large source file, the output from -fdump-ipa-inline=stderr was almost 100x the line count of the -fopt-info output. For the toy source file it was 43x. The size of the -details output was 250x and 100x, respectively. Which is untenable for a large app. The issue I am having here is that I want a more verbose message, not a more voluminous set of messages. Using either -fopt-info-all or -fdump-ipa-inline to provoke the more verbose inline message will give me a much greater volume of output. I think we will never reach the state where the dumping is exactly what each developer wants (because their wants will differ). Developers can easily post-process the stderr output with piping through grep. Richard. One compromise could be to emit the more verbose inliner message under a param (and a more concise foo inlined into bar by default with -fopt-info). Or we could do some variant of what David talks about below. Besides, we might also want to use the same machinery (dump_printf_loc etc) for dump file dumping. The current behavior of using '-details' to turn on opt-info-all messages for dump files are not desirable. Interestingly, this doesn't even work. When I do -fdump-ipa-inline-details=stderr (with my patch containing the inliner messages) I am not getting those inliner messages emitted to stderr. Even though in dumpfile.c details is set to (TDF_DETAILS | MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE). I'm not sure why, but will need to debug this. 
How about the following: 1) add a new dump_kind modifier so that when that modifier is specified, the messages won't goto the alt_dumpfile (controlled by -fopt-info), but only to primary dump file. With this, the inline messages can be dumped via: dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .) (you mean (MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY) ) Typically OR-ing together flags like this indicates dump under any of those conditions. But we could implement special handling for OPT_DUMP_FILE_ONLY, which in the above case would mean dump only to the primary dump file, and only under the other conditions specified in the flag (here under -optimized) 2) add more flags in -fdump- support: -fdump-ipa-inline-opt -- turn on opt-info messages only -fdump-ipa-inline-optall -- turn on opt-info-all messages According to the documentation (see the -fdump-tree- documentation on http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options), the above are already supposed to be there (-optimized, -missed, -note and -optall). However, specifying any of these gives a warning like: cc1: warning: ignoring unknown option ‘optimized’ in ‘-fdump-ipa-inline’ [enabled by default] Probably because none is listed in the dump_options[] array in dumpfile.c. However, I don't think there is currently a way to use -fdump- options and *only* get one of these, as much of the current dump output is emitted whenever there is a dump_file defined. Until everything is migrated to the new framework it may be difficult to get this to work. -fdump-tree-pre-ir -- turn on GIMPLE dump only -fdump-tree-pre-details -- turn on everything (ir, optall, trace) With this, developers can really just use -fdump-ipa-inline-opt=stderr for inline messages. Yes, if we can figure out a good way to get this to work (i.e. only emit the optimized messages and not the rest of the dump messages). 
And unfortunately to get them all you need to specify -fdump-ipa-all-optimized -fdump-tree-all-optimized -fdump-rtl-all-optimized instead of just -fopt-info. Unless we can add -fdump-all-all-optimized.

Teresa

thanks, David

On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener richard.guent...@gmail.com wrote:
On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson tejohn...@google.com wrote:
On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener richard.guent...@gmail.com wrote:

New patch below that removes this global variable, and also outputs the node->symbol.order (in square brackets after the function name so as to not clutter it). Inline messages with profile data look like:

test.c:8:3: note: foobar [0] (9000) inlined into foo [2] (1000) with call count 9000 (via inline instance bar [3] (9000))

Ick. This looks both redundant and cluttered. This is supposed to be understandable by GCC users, not only GCC developers. The main part that is only useful/understandable to gcc developers is the node->symbol.order in square brackets, requested
Re: [PATCH] Convert more passes to new dump framework
On Fri, Aug 30, 2013 at 11:23 PM, Teresa Johnson tejohn...@google.com wrote: On Fri, Aug 30, 2013 at 1:30 PM, Xinliang David Li davi...@google.com wrote: On Fri, Aug 30, 2013 at 12:51 PM, Teresa Johnson tejohn...@google.com wrote: On Fri, Aug 30, 2013 at 9:27 AM, Xinliang David Li davi...@google.com wrote: Except that in this form, the dump will be extremely large and not suitable for very large applications. Yes. I did some measurements for both a fairly large source file that is heavily optimized with LIPO and for a simple toy example that has some inlining. For the large source file, the output from -fdump-ipa-inline=stderr was almost 100x the line count of the -fopt-info output. For the toy source file it was 43x. The size of the -details output was 250x and 100x, respectively. Which is untenable for a large app. The issue I am having here is that I want a more verbose message, not a more voluminous set of messages. Using either -fopt-info-all or -fdump-ipa-inline to provoke the more verbose inline message will give me a much greater volume of output. One compromise could be to emit the more verbose inliner message under a param (and a more concise foo inlined into bar by default with -fopt-info). Or we could do some variant of what David talks about below. something like --param=verbose-opt-info=1 Yes. Richard, would this be acceptable for now? i.e. the inliner messages would be like: -fopt-info: test.c:8:3: note: foobar inlined into foo with call count 9000 (the with call count X only when there is profile feedback) -fopt-info --param=verbose-opt-info=1: test.c:8:3: note: foobar/0 (9000) inlined into foo/2 (1000) with call count 9000 (via inline instance bar [3] (9000)) (again the call counts only emitted under profile feedback) It looks like a hack to me. Is -fdump-ipa-inline useful at all? That is, can't we simply push some of the -details dumping into the non-details dump? Richard. 
Besides, we might also want to use the same machinery (dump_printf_loc etc) for dump file dumping. The current behavior of using '-details' to turn on opt-info-all messages for dump files are not desirable. Interestingly, this doesn't even work. When I do -fdump-ipa-inline-details=stderr (with my patch containing the inliner messages) I am not getting those inliner messages emitted to stderr. Even though in dumpfile.c details is set to (TDF_DETAILS | MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE). I'm not sure why, but will need to debug this. It works for vectorizer pass. Ok, let me see what is going on - I just confirmed that it is not working for the loop unroller messages either. How about the following: 1) add a new dump_kind modifier so that when that modifier is specified, the messages won't goto the alt_dumpfile (controlled by -fopt-info), but only to primary dump file. With this, the inline messages can be dumped via: dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .) (you mean (MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY) ) Yes. Typically OR-ing together flags like this indicates dump under any of those conditions. But we could implement special handling for OPT_DUMP_FILE_ONLY, which in the above case would mean dump only to the primary dump file, and only under the other conditions specified in the flag (here under -optimized) 2) add more flags in -fdump- support: -fdump-ipa-inline-opt -- turn on opt-info messages only -fdump-ipa-inline-optall -- turn on opt-info-all messages According to the documentation (see the -fdump-tree- documentation on http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options), the above are already supposed to be there (-optimized, -missed, -note and -optall). However, specifying any of these gives a warning like: cc1: warning: ignoring unknown option ‘optimized’ in ‘-fdump-ipa-inline’ [enabled by default] Probably because none is listed in the dump_options[] array in dumpfile.c. 
However, I don't think there is currently a way to use -fdump- options and *only* get one of these, as much of the current dump output is emitted whenever there is a dump_file defined. Until everything is migrated to the new framework it may be difficult to get this to work. -fdump-tree-pre-ir -- turn on GIMPLE dump only -fdump-tree-pre-details -- turn on everything (ir, optall, trace) With this, developers can really just use -fdump-ipa-inline-opt=stderr for inline messages. Yes, if we can figure out a good way to get this to work (i.e. only emit the optimized messages and not the rest of the dump messages). And unfortunately to get them all you need to specify -fdump-ipa-all-optimized -fdump-tree-all-optimized -fdump-rtl-all-optimized instead of just -fopt-info. Unless we can add -fdump-all-all-optimized. Having general support requires cleanup of all the old style if (dump_file) fprintf
Re: [RFC] Changes to the wide-int classes
Kenneth Zadeck zad...@naturalbridge.com writes:

There is no place for exactly two HWIs in the machine independent parts of the compiler.

I totally agree. In fact I was taking that so much for granted that I didn't even think to add a rider about it, sorry. I didn't mean to imply that we should keep double_int around. I think the reason for doing this is to prove that it can be done (so that the wide_int code isn't too big a change for the tree level) and to make it easier to merge the wide-int patches into trunk piecemeal if we need to.

small bugs below this line.

bottom of frag 3 of gcc/cp/init.c is wrong: you replaced rshift...lshift with lshift...lshift.

Do you mean this bit:

   unsigned shift = (max_outer_nelts.get_precision ()) - 7
-		   - max_outer_nelts.clz ().to_shwi ();
-  max_outer_nelts = max_outer_nelts.rshiftu (shift).lshift (shift);
+		   - wi::clz (max_outer_nelts);
+  max_outer_nelts = wi::lshift (wi::lrshift (max_outer_nelts, shift),
+				shift);

? That's lrshift (logical right shift). I ended up using the double-int names for right shifts.

That does remind me of another thing though. I notice some of the wide-int code assumes that shifting a signed HWI right gives an arithmetic shift, but the language doesn't guarantee that. We probably need to go through and fix those.

i will finish reading this tomorrow, but i wanted to get some comments in for the early shift. i stopped reading at line 1275.

Thanks. TBH I've not really been through the third part myself to double-check. Will try to do that while waiting for comments on the first part.

Richard
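The portability point about right shifts can be shown directly; this is a generic sketch (not code from the wide-int patches): C leaves right-shifting a negative signed value implementation-defined, so code that needs an arithmetic shift portably can do it by hand on the unsigned representation.

```c
#include <assert.h>  /* only needed for the usage check below */

/* Right-shifting a negative signed value is implementation-defined in
   C (most compilers do an arithmetic shift, but it is not guaranteed).
   This helper produces an arithmetic right shift portably: shift the
   bits as unsigned, then re-set the high bits an arithmetic shift
   would have copied in.  Assumes 0 <= n < bit-width of long.  */
long arith_rshift (long x, int n)
{
  unsigned long ux = (unsigned long) x;
  unsigned long r = ux >> n;
  if (x < 0)
    r |= ~(~0UL >> n);  /* replicate the sign into the top n bits */
  return (long) r;
}
```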
Re: [patch 4/4] -fstrict-volatile-bitfields cleanup v3: remove from defaults on all targets
On Sun, Sep 1, 2013 at 3:10 PM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: On Fri, 30 Aug 2013 11:47:21, Richard Biener wrote: On Tue, Jul 2, 2013 at 7:33 PM, DJ Delorie d...@redhat.com wrote: The choice appears to be to continue to have broken volatile bitfields on ARM with no way for users to make them conform to the ABI, or to change things so that they conform to the ABI if you specify -fstrict-volatile-bitfields explicitly and to the C/C++ standard by default, without that option. I can't speak for ARM, but for the other targets (for which I wrote the original patch), the requirement is that volatile bitfield accesses ALWAYS be in the mode of the type specified by the user. If the user says int x:8; then SImode (assuming 32-bit ints) must always be used, even if QImode could be used. The reason for this is that volatile bitfields are normally used for memory-mapped peripherals, and accessing peripheral registers in the wrong mode leads to incorrect operation. I've argued in the past that this part of the semantics can be easily implemented in the existing C++ memory model code (well, in get-bit-range) for the cases where it doesn't conflict with the C++ memory model. For the cases where it conflicts a warning can be emitted, Like when you do struct { volatile int x:8; char c; } x; where 'c' would be within the SImode you are supposed to use with -fstrict-volatile-bitfields. Which raises the question ... how does x.x = 1; actually work? 
IMHO it must be sth horribly inefficient like

  tem = x.c;           // QImode
  tem = tem << 8 | 1;  // combine into SImode value
  x.x = tem;           // SImode store

AAPCS is very explicit about what should happen here:

  tem = x.x;           // SImode
  tem = (tem & ~0xFF) | 1;
  x.x = tem;           // SImode

struct x should be 4 bytes large and 4 bytes aligned (int x). The member c is at offset 1, and gets overwritten, which is forbidden in the C++ memory model, but required by AAPCS.

Ok, so what I think is weird here is that the volatile store to x.x is replaced with a read and a store - for hardware I/O that's exactly what I would not have expected if I qualify sth with volatile. In my simple mind 'volatile' preserves the exact number of loads and stores. Of course I was wrong here ;) hoping that no sane ABI makes 'x' size 2.

Oh, I _can_ make it size 2:

  struct { volatile int x:8; char c; } __attribute__((packed)) x;
  char y;

IMHO the AAPCS forbids packed structures. Therefore we need not interfere with the C++ memory model if we have unaligned data.

note the fancy global object 'y' I placed after 'x'. Now the store will clobber y(?) So the only 'valid' way is

  tem = x.x;                 // SImode read(!)
  tem = tem & 0xff..00 | 1;  // manipulate value
  x.x = tem;                 // SImode store

but then this doesn't work either because that 1-byte aligned object could reside at a page boundary and so the read and write would trap.

That is exactly what happened, see bug #56341, and it is fixed with parts 1 and 2 of Sandra's patch. Here, because of the 1-byte alignment, the function strict_volatile_bitfield_p() returns false and thus the C++ memory model is used.

Makes me ask who designed that crap ;) But my point was that for all these special cases that likely do not happen in practice (fingers crossed) the C++ memory model way doesn't interfere with -fstrict-volatile-bitfields and the code can be perfectly used to make the code as close as possible to the -fstrict-volatile-bitfields ABI.
Actually the C++ memory model and -fstrict-volatile-bitfields have some significant differences, your first example is one of them. And since part 4 changes the default of -fstrict-volatile-bitfields, I thought it would be good to have a warning when such a construct is used, which generates inherently different code if -fstrict-volatile-bitfields is used or not. I personally could live with or without part 4 of the patch, and I do not insist on the warnings part either. That was only my try to find a way how part 4 of Sandra's patch could be generally accepted. Note: If you want I can re-post the warnings patch in a new thread. Can someone, in a new thread, ping the patches that are still in flight? ISTR having approved bits of some patches before my leave. Thanks, Richard. Thanks Bernd. Richard.
Re: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
On Mon, Sep 2, 2013 at 8:56 AM, bin.cheng bin.ch...@arm.com wrote: Hi, The gimple-ssa-strength-reduction pass handles CAND_REFs in order to find different MEM_REFs sharing a common part in their addressing expressions. If such MEM_REFs are found, the pass rewrites the MEM_REFs, and produces more efficient addressing expressions during the RTL passes. The pass analyzes the addressing expression in each MEM_REF to see if it can be formalized as follows:

base:   MEM_REF (T1, C1)
offset: MULT_EXPR (PLUS_EXPR (T2, C2), C3)
bitpos: C4 * BITS_PER_UNIT

Then it restructures it into the form below:

MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)), C1 + (C2 * C3) + C4)

At last, it rewrites the MEM_REFs if there are two or more sharing a common (non-constant) part. The problem is it doesn't back-trace T2. If T2 is recorded as a CAND_ADD in the form of T2' + C5, the MEM_REF should be restructured into:

MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2', C3)), C1 + (C2 * C3) + C4 + (C5 * C3))

The patch also includes a test case to illustrate the problem. Bootstrapped and tested on x86/x86_64/arm-a15, is it ok? This looks ok to me if Bill is ok with it. Thanks, Richard. Thanks. bin 2013-09-02 Bin Cheng bin.ch...@arm.com * gimple-ssa-strength-reduction.c (backtrace_base_for_ref): New. (restructure_reference): Call backtrace_base_for_ref. gcc/testsuite/ChangeLog 2013-09-02 Bin Cheng bin.ch...@arm.com * gcc.dg/tree-ssa/slsr-39.c: New test.
Re: [PATCH GCC]Find auto-increment use both before and after candidate's increment in IVOPT
On Mon, Sep 2, 2013 at 9:03 AM, bin.cheng bin.ch...@arm.com wrote: Hi, For now set_autoinc_for_original_candidates only searches auto-inc uses before candidate's increment, causing pre-increment opportunities missed for original candidates. This is a straightforward fix by searching after candidate's increment too. The patch also includes a test case to illustrate the problem. Without the patch, assembly of the test is:

foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r3, #:lower16:__ctype_ptr__
	ldrb	r2, [r0]	@ zero_extendqisi2
	movt	r3, #:upper16:__ctype_ptr__
	ldr	r1, [r3]
	adds	r3, r1, r2
	ldrb	r3, [r3, #1]	@ zero_extendqisi2
	lsls	r3, r3, #29
	bmi	.L2
	adds	r3, r0, #1
.L3:
	mov	r0, r3
	adds	r3, r3, #1
	ldrb	r2, [r0]	@ zero_extendqisi2
	add	r2, r2, r1
	ldrb	r2, [r2, #1]	@ zero_extendqisi2
	lsls	r2, r2, #29
	bpl	.L3
.L2:
	bx	lr
	.size	foo, .-foo

Which can be optimized into below:

foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r3, #:lower16:__ctype_ptr__
	ldrb	r1, [r0]	@ zero_extendqisi2
	movt	r3, #:upper16:__ctype_ptr__
	ldr	r2, [r3]
	adds	r3, r2, r1
	ldrb	r3, [r3, #1]	@ zero_extendqisi2
	lsls	r1, r3, #29
	bmi	.L2
.L3:
	ldrb	r3, [r0, #1]!	@ zero_extendqisi2
	add	r3, r3, r2
	ldrb	r3, [r3, #1]	@ zero_extendqisi2
	lsls	r3, r3, #29
	bpl	.L3
.L2:
	bx	lr
	.size	foo, .-foo

Bootstrapped and tested on arm a15, is it OK? Ok. Thanks, Richard. Thanks. bin 2013-09-02 Bin Cheng bin.ch...@arm.com * tree-ssa-loop-ivopts.c (set_autoinc_for_original_candidates): Find auto-increment use both before and after candidate. gcc/testsuite/ChangeLog 2013-09-02 Bin Cheng bin.ch...@arm.com * gcc.target/arm/ivopts-orig_biv-inc.c: New test.
Re: [PATCH, PR 58106] Make ipa_edge_duplication_hook cope with recursive inlining
2013-08-27 Martin Jambor mjam...@suse.cz PR ipa/58106 * ipa-prop.c (ipa_edge_duplication_hook): Always put new rdesc to the linked list. When finding the correct duplicate, also consider the caller in addition to its inlined_to node. testsuite/ * gcc.dg/ipa/pr58106.c: New test. OK, thanks Honza
Re: converting wide-int so that it uses its own type rather than hwi.
On Fri, Aug 30, 2013 at 5:21 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: richi, on further thought, this is going to be a huge task. The problem is at the edges. right now we share the rep of the array with the tree-csts and rtl (though the rtl sharing may go away to support canonicalization). So if the hwi rep inside of wide int changes then we will have to implement copying with reblocking at that interface, or push the type into there and change all of the fits_*hwi and to_*hwi interfaces to fit this different type. Obviously the reps of RTL and tree would need to change as well. Yes, I am aware of the explicit HWI references in the API - those functions would be more complicated. i think i can get at least as good and perhaps better test coverage by changing the rep of hwi for a port. There will also be fallout work there, but it will be productive, in that it is just changing code from only doing the hwi case to supporting all precisions. Well, I'm not sure the fallout will be exactly small ;) What also could improve testing coverage is to skip the fast path, thus always go the out-of-line path even for len == 1. Not sure if the ool path is prepared to handle len == 1 though. Richard. Kenny
Re: [PATCH] Don't issue array bound warnings on zero-length arrays
On Fri, Aug 30, 2013 at 5:13 PM, Meador Inge mead...@codesourcery.com wrote: Hi All, This patch fixes a minor issue that can occur when issuing array bounds warnings. In GNU C mode we allow empty initializer lists and their upper bound is initialized to -1. This confuses the array bound analysis in VRP and in some cases we end up issuing false positives. This patch fixes the issue by bailing out when a zero-length array is encountered. OK for trunk? gcc/ 2013-08-30 Meador Inge mead...@codesourcery.com * tree-vrp.c (check_array_ref): Bail out on empty arrays. gcc/testsuite/ 2013-08-30 Meador Inge mead...@codesourcery.com * gcc.dg/Warray-bounds-11.c: New testcase.

Index: gcc/testsuite/gcc.dg/Warray-bounds-11.c
===
--- gcc/testsuite/gcc.dg/Warray-bounds-11.c (revision 0)
+++ gcc/testsuite/gcc.dg/Warray-bounds-11.c (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Warray-bounds -std=gnu99" } */
+/* Test zero-length arrays for GNU C.  */
+
+unsigned int a[] = { };
+unsigned int size_a;
+
+int test(void)
+{
+  /* This should not warn.  */
+  return size_a ? a[0] : 0;
+}

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c (revision 202088)
+++ gcc/tree-vrp.c (working copy)
@@ -6137,9 +6137,10 @@ check_array_ref (location_t location, tr
   low_sub = up_sub = TREE_OPERAND (ref, 1);
   up_bound = array_ref_up_bound (ref);
 
-  /* Can not check flexible arrays.  */
+  /* Can not check flexible arrays or zero-length arrays.  */
   if (!up_bound
-      || TREE_CODE (up_bound) != INTEGER_CST)
+      || TREE_CODE (up_bound) != INTEGER_CST
+      || tree_int_cst_equal (up_bound, integer_minus_one_node))

That doesn't look correct - what if the lower bound is -10? That can easily happen for Ada, so please revert the patch. And I fail to see why the testcase should not warn. Clearly you have a definition of a here and it doesn't have an element, so the access is out of bounds. Richard.

     return;
 
   /* Accesses to trailing arrays via pointers may access storage
Re: Fix PR middle-end/57370
On Fri, Aug 30, 2013 at 6:13 PM, Easwaran Raman era...@google.com wrote: On Fri, Aug 30, 2013 at 1:25 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Aug 30, 2013 at 2:30 AM, Easwaran Raman era...@google.com wrote: Richard, Do you want me to commit everything but the insert_stmt_after hunk now? Yes. There are more similar failures reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached the updated patch there. Shall I send that for review? Apart from the debug statement issue, almost all the bugs are due to dependence violation because certain newly inserted statements do not have the right UID. Instead of trying to catch all of them, will it be better if I check if the stmt has a proper uid (non-zero if it is not the first stmt) and assign a sensible value at the point where it is used (find_insert_point and appears_later_in_bb) instead of where the stmt is created? I think that would be less brittle. In the end all this placement stuff should be reverted and done as post-processing after reassoc is finished. You can remember the inserted stmts that are candidates for moving (just set a pass-local flag on them) and assign UIDs for the whole function after all stmts are inserted. The problem is we need sane UID values as reassoc is happening - not after it is finished. As it process each stmt in reassoc_bb, it might generate new stmts which might have to be compared in find_insert_point or appears_later_in_bb. But if you no longer find_insert_point during reassoc but instead do a scheduling pass after it finished you won't need sane UIDs during reassoc. Richard. - Easwaran Richard. - Easwaran On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman era...@google.com wrote: I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regression on x86_64/linux. 
Ok for trunk? 2013-07-31 Easwaran Raman era...@google.com PR middle-end/57370 * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset of debug statements that cause inconsistent IR. Missing ChangeLog entry for the insert_stmt_after hunk which I do not like at all. The other hunks are ok, but we need to work harder to preserve debug stmts - simply removing all is not going to fly. Richard. testsuite/ChangeLog: 2013-07-31 Easwaran Raman era...@google.com PR middle-end/57370 PR tree-optimization/57393 PR tree-optimization/58011 * gfortran.dg/reassoc_12.f90: New testcase. * gcc.dg/tree-ssa/reassoc-31.c: New testcase. * gcc.dg/tree-ssa/reassoc-31.c: New testcase. On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman era...@google.com wrote: Ping. On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman era...@google.com wrote: A newly generated statement in build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong uid (of an earlier phi statement) which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk? Thanks, Easwaran 2013-06-27 Easwaran Raman era...@google.com PR middle-end/57370 * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi node for a non-phi gimple statement. testsuite/ChangeLog: 2013-06-27 Easwaran Raman era...@google.com PR middle-end/57370 * gfortran.dg/reassoc_12.f90: New testcase. Index: gcc/testsuite/gfortran.dg/reassoc_12.f90 === --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) @@ -0,0 +1,74 @@ +! { dg-do compile } +! { dg-options -O2 -ffast-math } +! 
PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                                            grad_deriv, npoints, sx)
+    IMPLICIT REAL*8 (t)
+    INTEGER, PARAMETER :: dp=8
+    REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                                           e_ndrho_ndrho_rho
+      DO ii=1,npoints
+        IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+          t1425 = t233 * t557
+          t1429 = beta * t225
+          t1622 = t327 * t1621
+          t1626 = t327 * t1625
+          t1632 = t327 * t1631
+          t1685 = t105 * t1684
+          t2057 = t1636 + t8 * (t2635 + t3288)
+        END IF
+        IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+          t5469 = t5440 - t5443 - t5446 - t5449 - &
+                  t5451 - t5454 - t5456 + t5459 - &
+                  t5462 + t5466 - t5468
+          t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
+          t5489 =
Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL
I'd like to ping this patch 1 of 2 that removes redundant zero/sign extension using value range information. Bootstrapped and no new regression for x86_64-unknown-linux-gnu and arm-none-linux-gnueabi. Thank you for your time. Kugan On 14/08/13 16:49, Kugan wrote: Hi Richard, Here is an attempt to address your earlier review comments. Bootstrapped and there is no new regression for X86_64 and arm. Thank you very much for your time. Thanks, Kugan

--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,25 @@
+2013-08-14  Kugan Vivekanandarajah  kug...@linaro.org
+
+	* tree-flow.h (mark_range_info_unknown): New function definition.
+	* tree-ssa-alias.c (dump_alias_info): Check pointer type.
+	* tree-ssa-copy.c (fini_copy_prop): Check pointer type and copy
+	range info.
+	* tree-ssanames.c (make_ssa_name_fn): Check pointer type in
+	initialize.
+	(mark_range_info_unknown): New function.
+	(duplicate_ssa_name_range_info): Likewise.
+	(duplicate_ssa_name_fn): Check pointer type and call correct
+	duplicate function.
+	* tree-vrp.c (extract_exp_value_range): New function.
+	(simplify_stmt_using_ranges): Call extract_exp_value_range and
+	tree_ssa_set_value_range.
+	* tree-ssanames.c (ssa_range_info): New function.
+	* tree.h (SSA_NAME_PTR_INFO): Changed to access via union.
+	(SSA_NAME_RANGE_INFO): New macro.
+	* gimple-pretty-print.c (print_double_int): New function.
+	(dump_gimple_phi): Dump range info.
+	(pp_gimple_stmt_1): Likewise.
+
 2013-08-09  Jan Hubicka  j...@suse.cz
 
 	* cgraph.c (cgraph_create_edge_1): Clear speculative flag.

On 03/07/13 21:55, Kugan wrote: On 17/06/13 18:33, Richard Biener wrote: On Mon, 17 Jun 2013, Kugan wrote:

+/* Extract the value range of the assigned expression for a GIMPLE_ASSIGN
+   stmt.  If the extracted value range is valid, return true else return
+   false.
*/
+static bool
+extract_exp_value_range (gimple stmt, value_range_t *vr)
+{
+  gcc_assert (is_gimple_assign (stmt));
+  tree rhs1 = gimple_assign_rhs1 (stmt);
+  tree lhs = gimple_assign_lhs (stmt);
+  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
...
@@ -8960,6 +9016,23 @@ simplify_stmt_using_ranges (gimple_stmt_iterator *gsi)
 {
   enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
   tree rhs1 = gimple_assign_rhs1 (stmt);
+  tree lhs = gimple_assign_lhs (stmt);
+
+  /* Set value range information for ssa.  */
+  if (!POINTER_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
+      && (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)
+      && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
+      && !SSA_NAME_RANGE_INFO (lhs))
+    {
+      value_range_t vr = VR_INITIALIZER;
...
+      if (extract_exp_value_range (stmt, &vr))
+	tree_ssa_set_value_range (lhs,
+				  tree_to_double_int (vr.min),
+				  tree_to_double_int (vr.max),
+				  vr.type == VR_RANGE);
+    }

This looks overly complicated to me. In vrp_finalize you can simply do

  for (i = 0; i < num_vr_values; i++)
    if (vr_value[i])
      {
	tree name = ssa_name (i);
	if (POINTER_TYPE_P (TREE_TYPE (name)))
	  continue;
	if (vr_value[i].type == VR_RANGE
	    || vr_value[i].type == VR_ANTI_RANGE)
	  tree_ssa_set_value_range (name,
				    tree_to_double_int (vr_value[i].min),
				    tree_to_double_int (vr_value[i].max),
				    vr_value[i].type == VR_RANGE);
      }

Thanks Richard for taking time to review it. I was doing something like what you are suggesting earlier but noticed some problems, and that’s the reason why I changed it.
For example, for the following testcase from the test suite,

unsigned long l = (unsigned long)-2;
unsigned short s;
int main ()
{
  long t = l + 1;
  s = l;
  if (s != (unsigned short) -2)
    abort ();
  exit (0);
}

with the following gimple stmts

main ()
{
  short unsigned int s.1;
  long unsigned int l.0;

  ;; basic block 2, loop depth 0
  ;;    pred:       ENTRY
  l.0_2 = l;
  s.1_3 = (short unsigned int) l.0_2;
  s = s.1_3;
  if (s.1_3 != 65534)
    goto <bb 3>;
  else
    goto <bb 4>;
  ;;    succ:       3
  ;;                4

  ;; basic block 3, loop depth 0
  ;;    pred:       2
  abort ();
  ;;    succ:

  ;; basic block 4, loop depth 0
  ;;    pred:       2
  exit (0);
  ;;    succ:
}

has the following value range.

l.0_2: VARYING
s.1_3: [0, +INF]

From the zero/sign extension point of view, the variable s.1_3 is expected to have a value that will overflow (or be varying), as this is what is assigned to a smaller variable. extract_range_from_assignment initially calculates the value range as VARYING, but it is later changed to [0, +INF] by extract_range_basic. What I need here is the value that will be assigned from the rhs expression, and not the value that we will have after the proper assignment. I understand that
Re: [PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP
I'd like to ping this patch 2 of 2 that removes redundant zero/sign extension using value range information. Bootstrapped and no new regression for x86_64-unknown-linux-gnu and arm-none-linux-gnueabi. Thanks you for your time. Kugan On 14/08/13 16:59, Kugan wrote: Hi Eric, Thanks for reviewing the patch. On 01/07/13 18:51, Eric Botcazou wrote: [Sorry for the delay] For example, when an expression is evaluated and it's value is assigned to variable of type short, the generated RTL would look something like the following. (set (reg:SI 110) (zero_extend:SI (subreg:HI (reg:SI 117) 0))) However, if during value range propagation, if we can say for certain that the value of the expression which is present in register 117 is within the limits of short and there is no sign conversion, we do not need to perform the subreg and zero_extend; instead we can generate the following RTl. (set (reg:SI 110) (reg:SI 117))) Same could be done for other assign statements. The idea looks interesting. Some remarks: +2013-06-03 Kugan Vivekanandarajah kug...@linaro.org + +* gcc/dojump.c (do_compare_and_jump): generates rtl without +zero/sign extension if redundant. +* gcc/cfgexpand.c (expand_gimple_stmt_1): Likewise. +* gcc/gimple.c (gimple_assign_is_zero_sign_ext_redundant) : New +function. +* gcc/gimple.h (gimple_assign_is_zero_sign_ext_redundant) : New +function definition. No gcc/ prefix in entries for gcc/ChangeLog. Generate RTL without... I have now changed it to. --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,13 @@ +2013-08-14 Kugan Vivekanandarajah kug...@linaro.org + +* dojump.c (do_compare_and_jump): Generate rtl without +zero/sign extension if redundant. +* cfgexpand.c (expand_gimple_stmt_1): Likewise. +* gimple.c (gimple_assign_is_zero_sign_ext_redundant) : New +function. +* gimple.h (gimple_assign_is_zero_sign_ext_redundant) : New +function definition. + 2013-08-09 Jan Hubicka j...@suse.cz * cgraph.c (cgraph_create_edge_1): Clear speculative flag. 
+	  /* If the value in SUBREG of temp fits that SUBREG (does not
+	     overflow) and is assigned to target SUBREG of the same mode
+	     without sign conversion, we can skip the SUBREG
+	     and extension.  */
+	  else if (promoted
+		   && gimple_assign_is_zero_sign_ext_redundant (stmt)
+		   && (GET_CODE (temp) == SUBREG)
+		   && (GET_MODE (target) == GET_MODE (temp))
+		   && (GET_MODE (SUBREG_REG (target))
+		       == GET_MODE (SUBREG_REG (temp))))
+	    emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp));
	  else if (promoted)
	    {
	      int unsignedp = SUBREG_PROMOTED_UNSIGNED_P (target);

Can we relax the strict mode equality here? This change augments the same transformation applied to the RHS when it is also a SUBREG_PROMOTED_VAR_P at the beginning of convert_move, but the condition on the mode is less strict in the latter case, so maybe it can serve as a model here. I have now changed it based on convert_move to

+	  else if (promoted
+		   && gimple_assign_is_zero_sign_ext_redundant (stmt)
+		   && (GET_CODE (temp) == SUBREG)
+		   && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (temp)))
+		       >= GET_MODE_PRECISION (GET_MODE (target)))
+		   && (GET_MODE (SUBREG_REG (target))
+		       == GET_MODE (SUBREG_REG (temp))))
+	    {

Is this what you wanted me to do?

+  /* Is zero/sign extension redundant as per VRP.  */
+  bool op0_ext_redundant = false;
+  bool op1_ext_redundant = false;
+
+  /* If promoted and the value in SUBREG of op0 fits (does not overflow),
+     it is a candidate for extension elimination.  */
+  if (GET_CODE (op0) == SUBREG && SUBREG_PROMOTED_VAR_P (op0))
+    op0_ext_redundant =
+      gimple_assign_is_zero_sign_ext_redundant (SSA_NAME_DEF_STMT (treeop0));
+
+  /* If promoted and the value in SUBREG of op1 fits (does not overflow),
+     it is a candidate for extension elimination.  */
+  if (GET_CODE (op1) == SUBREG && SUBREG_PROMOTED_VAR_P (op1))
+    op1_ext_redundant =
+      gimple_assign_is_zero_sign_ext_redundant (SSA_NAME_DEF_STMT (treeop1));

Are the gimple_assign_is_zero_sign_ext_redundant checks necessary here?
When set on a SUBREG, SUBREG_PROMOTED_VAR_P guarantees that SUBREG_REG is always properly extended (otherwise it's a bug) so don't you just need to compare SUBREG_PROMOTED_UNSIGNED_SET? See do_jump for an existing case. I am sorry I don’t think I understood you here. How would I know that extension is redundant without calling gimple_assign_is_zero_sign_ext_redundant ? Could you please elaborate. + /* If zero/sign extension is redundant, generate RTL + for operands without zero/sign extension. */ + if
Re: [RFC] Changes to the wide-int classes
On Mon, 2 Sep 2013, Richard Sandiford wrote: Kenneth Zadeck zad...@naturalbridge.com writes: There is no place for exactly two HWIs in the machine independent parts of the compiler, I totally agree. In fact I was taking that so much for granted that I didn't even think to add a rider about it, sorry. I didn't mean to imply that we should keep double_int around. I think the reason for doing this is to prove that it can be done (so that the wide_int code isn't too big a change for the tree level) and to make it easier to merge the wide-int patches into trunk piecemeal if we need to. Note that changing the tree rep to non-double_int is easy. Also note that I only want 'double_int' to stay around to have a fast type that can work on two HWIs for the code that needs more precision than a HWI. The only important cases I know of are in get_inner_reference and get_ref_base_and_extent and friends. Those are heavily used (and the only double-int function callers that even show up in regular cc1 profiles). So if wide_int_fixed2 ('2' better be replaced with 'number-that-gets-me-twice-target-sizetype-precision') works for those cases then fine (and we can drop double-ints). Richard. small bugs below this line. bottom of frag 3 of gcc/cp/init.c is wrong: you replaced rshift...lshift with lshift...lshift. Do you mean this bit:

   unsigned shift = (max_outer_nelts.get_precision ()) - 7
-                   - max_outer_nelts.clz ().to_shwi ();
-  max_outer_nelts = max_outer_nelts.rshiftu (shift).lshift (shift);
+                   - wi::clz (max_outer_nelts);
+  max_outer_nelts = wi::lshift (wi::lrshift (max_outer_nelts, shift),
+                                shift);

? That's lrshift (logical right shift). I ended up using the double-int names for right shifts. That does remind me of another thing though. I notice some of the wide-int code assumes that shifting a signed HWI right gives an arithmetic shift, but the language doesn't guarantee that. We probably need to go through and fix those.
i will finish reading this tomorrow, but i wanted to get some comments in for the early shift. i stopped reading at line 1275. Thanks. TBH I've not really been through the third part myself to double-check. Will try to do that while waiting for comments on the first part. Richard -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: [PING] Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Ping. On 13/8/20 10:57 AM, Chung-Lin Tang wrote: Ping. BTW, the SC has approved the Nios II port already: http://gcc.gnu.org/ml/gcc/2013-07/msg00434.html The port is still awaiting technical review. Thanks, Chung-Lin On 13/7/14 下午3:54, Chung-Lin Tang wrote: Hi, the last ping of the Nios II patches was: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01416.html After assessing the state, we feel it would be better to post a re-submission of the newest patches. The changes accumulated since the original post include: 1) Several bug fixes related to built-in function expanding. 2) A few holes in hard-float FPU code generation was plugged. 3) Support for parsing white-spaces in target attributes. 4) Revision of consistency check behavior of codes in custom instruction built-ins. 5) Some new testcases. The issues raised by Joseph in the first round of reviewing have been addressed. Testing has been re-done on both 32-bit and 64-bit hosts. PR55035 appears to not have been resolved yet, which affects nios2 among several other targets, thus configured with --enable-werror-always still does not build. As before, Sandra and me will serve as nios2 port maintainers. Attached is the patch for the compiler-proper. Thanks, Chung-Lin 2013-07-14 Chung-Lin Tang clt...@codesourcery.com Sandra Loosemore san...@codesourcery.com Based on patches from Altera Corporation * config.gcc (nios2-*-*): Add nios2 config targets. * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case. ($cpu_type): Add nios2 as new cpu type. * configure: Regenerate. * config/nios2/nios2.c: New file. * config/nios2/nios2.h: New file. * config/nios2/nios2-opts.h: New file. * config/nios2/nios2-protos.h: New file. * config/nios2/elf.h: New file. * config/nios2/elf.opt: New file. * config/nios2/linux.h: New file. * config/nios2/nios2.opt: New file. * config/nios2/nios2.md: New file. * config/nios2/predicates.md: New file. * config/nios2/constraints.md: New file. * config/nios2/t-nios2: New file. 
* common/config/nios2/nios2-common.c: New file. * doc/invoke.texi (Nios II options): Document Nios II specific options. * doc/md.texi (Nios II family): Document Nios II specific constraints. * doc/extend.texi (Function Specific Option Pragmas): Document Nios II supported target pragma functionality.
Re: [patch 4/4] -fstrict-volatile-bitfields cleanup v3: remove from defaults on all targets
On 01/09/13 14:10, Bernd Edlinger wrote: IMHO the AAPCS forbids packed structures. Therefore we need not interfere with the C++ memory model if we have unaligned data. The AAPCS neither forbids nor requires packed structures. They're a GNU extension and as such not part of standard C++. Thus the semantics of such an operation are irrelevant to the AAPCS: you get to choose what the behaviour is in this case... R.
Re: RFC - Refactor tree.h
On Sat, Aug 31, 2013 at 1:22 AM, Diego Novillo dnovi...@google.com wrote: Thanks for the suggestions. I've incorporated them into the patch. It now adds tree-core.h with all the structures, enums, typedefs and some fundamental declarations from tree.h. Everything else stays in tree.h for now. As we convert gimple files, we'll move declarations out of tree.h into other headers and rewrite missing functions in the new gimple API. This new patch bootstraps and tests fine on x86_64. OK for trunk? Looks good to me. Richard. 2013-08-30 Diego Novillo dnovi...@google.com * Makefile.in (TREE_CORE_H): Define. (TREE_H): Use. (GTFILES): Add tree-core.h. * builtins.c (built_in_class_names): Use BUILT_IN_LAST to size the array. * tree-core.h: New file. Move all data structures, enum, typedefs, global declarations and constants from ... * tree.h: ... here. Thanks. Diego.
Re: folding (vec_)cond_expr in a binary operation
On Fri, 30 Aug 2013, Richard Biener wrote: On Sat, Jul 6, 2013 at 9:42 PM, Marc Glisse marc.gli...@inria.fr wrote: First, the fold-const bit causes an assertion failure (building libjava) in combine_cond_expr_cond, which calls: t = fold_binary_loc (gimple_location (stmt), code, type, op0, op1); and then checks: /* Require that we got a boolean type out if we put one in. */ gcc_assert (TREE_CODE (TREE_TYPE (t)) == TREE_CODE (type)); which makes sense... Note that the 2 relevant types are: well, the check is probably old and can be relaxed to also allow all compatible types. Ok. Maybe it could even be removed then; we shouldn't check in every function that calls fold_binary_loc that it returns something of the type that was asked for. Second, the way the forwprop transformation is done, it can be necessary to run the forwprop pass several times in a row, which is not nice. It takes: stmt_cond stmt_binop and produces: stmt_folded stmt_cond2 But one of the arguments of stmt_folded could be an ssa_name defined by a cond_expr, which could now be propagated into stmt_folded (there may be other new possible transformations). However, that cond_expr has already been visited and won't be again. The combine part of the pass uses a PLF to revisit statements, but the forwprop part doesn't have any specific mechanism. forwprop is designed to re-process stmts it has folded to catch cascading effects. Now, as it has both a forward and a backward run, you don't catch 2nd-order effects with iterating them. On my TODO is to only do one kind, either forward or backward propagation. My impression is that even internally in the forward part the reprocessing doesn't really work, but that'll disappear anyway when the forward part disappears. Btw, as for the patch I don't like that you basically feed everything into fold. Yes, I know we do that for conditions because that's quite important and nobody has re-written the condition folding as a gimple pattern matcher.
I doubt that this transform is important enough to warrant another exception ;) I am not sure what you want here. When I notice the pattern (a?b:c) op d, I need to check whether b op d and c op d are likely to simplify before transforming it to a?(b op d):(c op d). And currently I can't see any way to do that other than feeding (b op d) to fold. Even if/when we get a gimple folder, we will still need to call it in about the same way. -- Marc Glisse
Re: Symtab cleanup 10/17 remove unnecesary DECL_ARGUMENTS and DECL_RESULT
On Aug 21, 2013, at 11:47 PM, Jan Hubicka hubi...@ucw.cz wrote: The problem is that DECL_ARGUMENTS of the thunk (aka _ZThn528_N1D3fooEv) is used during thunk code-generation, and thunk code-generation happens during the output of D::foo. I see, I will try to modify the i386 backend to not output thunks. The problem indeed is that thunks' arguments are built by the front-end and they are no longer streamed. I am surprised i386 survives this given that it also produces some gimple thunks. I guess the easiest way around is to make them be streamed the same way as we stream functions that are used as abstract origin. I have different plans in this direction - I want to lower thunks to gimple form early so they go through the usual channel and get i.e. the profile read correctly. So, any news on this? I was a bit dragged into other debugging. Sorry for the delay. This patch seems to fix the problems on ppc64 hacked to not use asm thunks. Can you, please, test that it solves the problems on your target? Honza

Index: cgraphunit.c
===
--- cgraphunit.c	(revision 202153)
+++ cgraphunit.c	(working copy)
@@ -1414,14 +1414,18 @@ expand_thunk (struct cgraph_node *node)
   tree virtual_offset = NULL;
   tree alias = node->callees->callee->symbol.decl;
   tree thunk_fndecl = node->symbol.decl;
-  tree a = DECL_ARGUMENTS (thunk_fndecl);
+  tree a;
+
+  if (in_lto_p)
+    cgraph_get_body (node);
+  a = DECL_ARGUMENTS (thunk_fndecl);
 
   current_function_decl = thunk_fndecl;
 
   /* Ensure thunks are emitted in their correct sections.
*/
   resolve_unique_section (thunk_fndecl, 0, flag_function_sections);
 
-  if (this_adjusting
+  if (this_adjusting
       && targetm.asm_out.can_output_mi_thunk (thunk_fndecl, fixed_offset,
					       virtual_value, alias))
     {

Index: lto-streamer-in.c
===
--- lto-streamer-in.c	(revision 202153)
+++ lto-streamer-in.c	(working copy)
@@ -998,6 +998,7 @@ input_function (tree fn_decl, struct dat
   free_dominance_info (CDI_DOMINATORS);
   free_dominance_info (CDI_POST_DOMINATORS);
   free (stmts);
+  pop_cfun ();
 }
 
@@ -1086,8 +1087,6 @@ lto_read_body (struct lto_file_decl_data
   /* Restore decl state */
   file_data->current_decl_state = file_data->global_decl_state;
-
-  pop_cfun ();
 }
   lto_data_in_delete (data_in);

Index: lto-streamer-out.c
===
--- lto-streamer-out.c	(revision 202153)
+++ lto-streamer-out.c	(working copy)
@@ -1982,8 +1982,7 @@ lto_output (void)
       cgraph_node *node = dyn_cast <cgraph_node> (snode);
       if (node
	  && lto_symtab_encoder_encode_body_p (encoder, node)
-	  && !node->symbol.alias
-	  && !node->thunk.thunk_p)
+	  && !node->symbol.alias)
	{
 #ifdef ENABLE_CHECKING
	  gcc_assert (!bitmap_bit_p (output, DECL_UID (node->symbol.decl)));
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
On Mon, Sep 2, 2013 at 12:52 PM, Jan Hubicka hubi...@ucw.cz wrote: [long quote of the earlier discussion and patch trimmed] + tree a; + + if (in_lto_p) + cgraph_get_body (node); That looks gross. It cannot possibly be the correct fix for this. Richard.
[PATCH] Preserve more pointer arithmetic in IVOPTs
tree-affine is a bit overzealous when converting elements of pointer-typed combinations to sizetype. From IVOPTs we often get a combination that doesn't start with a pointer element, in which case the result was a pure sizetype compute. This shows when fixing PR57511 in gcc.dg/tree-ssa/reassoc-19.c where, after the fix, we detect induction variables but then break the pattern the test expects. So the following re-arranges add_elt_to_tree to preserve pointerness when possible. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2013-09-02 Richard Biener rguent...@suse.de * tree-affine.c (add_elt_to_tree): Avoid converting all pointer arithmetic to sizetype. * gcc.dg/tree-ssa/loop-4.c: Adjust scan looking for one memory reference. Index: gcc/tree-affine.c === *** gcc/tree-affine.c (revision 202097) --- gcc/tree-affine.c (working copy) *** add_elt_to_tree (tree expr, tree type, t *** 377,411 type1 = sizetype; scale = double_int_ext_for_comb (scale, comb); ! elt = fold_convert (type1, elt); if (scale.is_one ()) { if (!expr) ! return fold_convert (type, elt); ! if (POINTER_TYPE_P (type)) ! return fold_build_pointer_plus (expr, elt); ! return fold_build2 (PLUS_EXPR, type, expr, elt); } if (scale.is_minus_one ()) { if (!expr) ! return fold_convert (type, fold_build1 (NEGATE_EXPR, type1, elt)); ! if (POINTER_TYPE_P (type)) ! { ! elt = fold_build1 (NEGATE_EXPR, type1, elt); ! return fold_build_pointer_plus (expr, elt); ! } ! return fold_build2 (MINUS_EXPR, type, expr, elt); } if (!expr) ! return fold_convert (type, !fold_build2 (MULT_EXPR, type1, elt, ! double_int_to_tree (type1, scale))); if (scale.is_negative ()) { --- 377,422 type1 = sizetype; scale = double_int_ext_for_comb (scale, comb); ! ! if (scale.is_minus_one () && POINTER_TYPE_P (TREE_TYPE (elt))) ! { ! elt = fold_build1 (NEGATE_EXPR, sizetype, convert_to_ptrofftype (elt)); ! scale = double_int_one; ! } if (scale.is_one ()) { if (!expr) ! return elt; ! if (POINTER_TYPE_P (TREE_TYPE (expr))) ! 
return fold_build_pointer_plus (expr, convert_to_ptrofftype (elt)); ! if (POINTER_TYPE_P (TREE_TYPE (elt))) ! return fold_build_pointer_plus (elt, convert_to_ptrofftype (expr)); ! return fold_build2 (PLUS_EXPR, type1, ! fold_convert (type1, expr), ! fold_convert (type1, elt)); } if (scale.is_minus_one ()) { if (!expr) ! return fold_build1 (NEGATE_EXPR, TREE_TYPE (elt), elt); ! if (POINTER_TYPE_P (TREE_TYPE (expr))) ! return fold_build_pointer_plus ! (expr, convert_to_ptrofftype !(fold_build1 (NEGATE_EXPR, TREE_TYPE (elt), elt))); ! return fold_build2 (MINUS_EXPR, type1, ! fold_convert (type1, expr), ! fold_convert (type1, elt)); } + elt = fold_convert (type1, elt); if (!expr) ! return fold_build2 (MULT_EXPR, type1, elt, ! double_int_to_tree (type1, scale)); if (scale.is_negative ()) { *** add_elt_to_tree (tree expr, tree type, t *** 417,429 elt = fold_build2 (MULT_EXPR, type1, elt, double_int_to_tree (type1, scale)); ! if (POINTER_TYPE_P (type)) { if (code == MINUS_EXPR) elt = fold_build1 (NEGATE_EXPR, type1, elt); return fold_build_pointer_plus (expr, elt); } ! return fold_build2 (code, type, expr, elt); } /* Makes tree from the affine combination COMB. */ --- 428,441 elt = fold_build2 (MULT_EXPR, type1, elt, double_int_to_tree (type1, scale)); ! if (POINTER_TYPE_P (TREE_TYPE (expr))) { if (code == MINUS_EXPR) elt = fold_build1 (NEGATE_EXPR, type1, elt); return fold_build_pointer_plus (expr, elt); } ! return fold_build2 (code, type1, ! fold_convert (type1, expr), elt); } /* Makes tree from the affine combination COMB. */ Index: gcc/testsuite/gcc.dg/tree-ssa/loop-4.c === *** gcc/testsuite/gcc.dg/tree-ssa/loop-4.c (revision 202097) --- gcc/testsuite/gcc.dg/tree-ssa/loop-4.c (working copy) *** void xxx(void) *** 37,43 /* { dg-final { scan-tree-dump-times \\* \[^\\n\\r\]*= 0 optimized } } */ /* { dg-final { scan-tree-dump-times \[^\\n\\r\]*= \\* 0 optimized } } */ ! 
/* { dg-final { scan-tree-dump-times MEM 1 optimized } } */ /* And the original induction variable should be
[PATCH] Fix PR57511
The fix for PR5/40281 was too broad, which results in unhandled IVs in some loops and thus missed optimizations. The PR specifically talks about missed final value replacement, but I've seen IVOPTs failing to detect BIVs like for gcc.dg/tree-ssa/reassoc-19.c. The following again allows SCEVs like { 0, +, { 1, +, 1 }_1 }_1, thus non-linear IVs. Re-bootstrap and testing are running on x86_64-unknown-linux-gnu. Richard. 2013-09-02 Richard Biener rguent...@suse.de PR middle-end/57511 * tree-scalar-evolution.c (instantiate_scev_name): Allow non-linear SCEVs. * gcc.dg/tree-ssa/sccp-1.c: New testcase. Index: gcc/tree-scalar-evolution.c === *** gcc/tree-scalar-evolution.c (revision 202050) --- gcc/tree-scalar-evolution.c (working copy) *** instantiate_scev_name (basic_block insta *** 2252,2257 --- 2252,2258 else if (res != chrec_dont_know) { if (inner_loop + && def_bb->loop_father != inner_loop && !flow_loop_nested_p (def_bb->loop_father, inner_loop)) /* ??? We could try to compute the overall effect of the loop here. */ res = chrec_dont_know; Index: gcc/testsuite/gcc.dg/tree-ssa/sccp-1.c === *** gcc/testsuite/gcc.dg/tree-ssa/sccp-1.c (revision 0) --- gcc/testsuite/gcc.dg/tree-ssa/sccp-1.c (working copy) *** *** 0 --- 1,15 + /* { dg-do compile } */ + /* { dg-options -O2 -fdump-tree-optimized } */ + + int main(int argc, char* argv[]) + { + int i, a = 0; + for (i=0; i < 10; i++) + a += i + 0xff00ff; + return a; + } + + /* There should be no loop left. */ + + /* { dg-final { scan-tree-dump-times goto 0 optimized } } */ + /* { dg-final { cleanup-tree-dump optimized } } */
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
+ tree a; + + if (in_lto_p) + cgraph_get_body (node); That looks gross. It cannot possibly be the correct fix for this. DECL_ARGUMENTS/DECL_RESULT are now part of the function body. cgraph_get_body is there to load it for you when you need it. We are going to expand the function, so it makes sense to get it. The same is done by the pass manager when a function is going to be expanded. The only difference here is that thunks do not go through the pass manager. I can drop in_lto_p (the function does nothing when the body is already there) Honza
Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy
Hi, On Fri, 30 Aug 2013, David Malcolm wrote: Here's the result of a pair of builds of r202029 without and with the patches, configured with --enable-checking=release, running make, then stripping debuginfo [1] So the overall sizes of such binaries are essentially unchanged. Yep, cool. Any suggestions on what to compile to compare performance? By 60 seconds, do you mean 60s for one TU, or do you mean a large build e.g. the linux kernel? For one TU, so as to not measure any significant execve, assembler or IO overhead. Some people compile e.g. preprocessed variants of some GCC source file; I myself have been measuring such speeds for some years with unchanged versions of the attached (big-code.c, several large functions with arithmetic and one-deep loops) which can be customized by hand to produce different largeness, and kdecore.cc [1] (real-world C++ library code from 2009). You can customize big-code.c to match some compile-time goal by commenting out some FUNC invocation for some types, or by changing the body size of the functions by fiddling in the FUNC macro itself to invoke more of the L2's or even L3's, but that's really slow. And the manual GTY markers are so not maintainable in the long run, gengtype or something else really needs to be taught to create them automatically. Apart from the GTY aspect, how do people feel about the patch series? Generally I like it, the parts I don't like are common to all the C++ification patches [2]. The manual GTY parts give me the creeps, but I think I expressed that already :) Ciao, Michael. [1] http://frakked.de/~matz/kdecore.cc.gz [2] In this series it's the is_a/has_a/dyn_cast uglification, - gcc_gimple_checking_assert (gimple_has_mem_ops (g)); - g->gsmembase.vuse = vuse; + gimple_statement_with_memory_ops *mem_ops_stmt = + as_a <gimple_statement_with_memory_ops> (g); + mem_ops_stmt->vuse = vuse; I can't really say I find this shorter, easier to read, more expressive or even safer than what was there before. 
And the repetition for adding the helpers for const and non-const types all the time doesn't make it better. (Btw: something for your helper script: it's customary to put the '=' on the next line; I know there is precedent for your variant, but '=' on the next line is prevalent) extern int get_i(void); typedef signed char schar; typedef unsigned char uchar; typedef unsigned short ushort; typedef unsigned int uint; typedef unsigned long ulong; typedef long long llong; typedef unsigned long long ullong; typedef long double ldouble; #define BODY(T) \ for (i = get_i(); i; i--) { \ accum += do_1_ ## T(accum, x, y, z); \ switch (get_i()) { \ case 1: case 3: case 34: case 123: \ if (z*x > y) \ x = do_2_ ## T (x * y + z); \ else \ y = do_3_ ## T (x / y - z); \ break; \ case 6: case 43: case -2: \ z = do_4_ ## T ((int)z); \ break; \ }\ for (j = 1; j < get_i (); j++) { \ a[i+j] = accum * b[i+j*get_i()] / c[j]; \ b[j] = b[j] - a[j-1]*x; \ x += b[j-1] + b[j+1]; \ y *= c[j*i] += a[0] + b[0] * c[(int)x]; \ } \ for (j = 0; j < 8192; j++) { \ a[j] = a[j] + b[j] * c[j]; \ } \ } #define L1(T) BODY(T) BODY(T) BODY(T) BODY(T) BODY(T) BODY(T) BODY(T) BODY(T) #define L2(T) L1(T) L1(T) L1(T) L1(T) L1(T) L1(T) L1(T) L1(T) #define L3(T) L2(T) L2(T) L2(T) L2(T) L2(T) L2(T) L2(T) L2(T) #define FUNC(T) \ extern T do_1_ ## T(T, T, T, int); \ extern T do_2_ ## T(T); \ extern T do_3_ ## T(T); \ extern T do_4_ ## T(T); \ T func_ ## T (T x, T y, T z, T *a, T *b, T *c) \ { \ T accum = 0; \ int i, j; \ L2(T) \ /*L2(T)*/ \ return accum; \ } FUNC(schar) FUNC(uchar) FUNC(short) FUNC(ushort) FUNC(int) FUNC(uint) FUNC(long) FUNC(ulong) FUNC(llong) FUNC(ullong) FUNC(float) FUNC(double) FUNC(ldouble)
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
On Mon, Sep 2, 2013 at 1:43 PM, Jan Hubicka hubi...@ucw.cz wrote: + tree a; + + if (in_lto_p) +cgraph_get_body (node); That looks gross. It cannot possibly be the correct fix for this. DECL_ARGUMENTS/DECL_RESULT are now part of function body. Well, there is still fallout from this change so I'm not convinced this will stay this way. Also we stream the function-decl that refers to these fields in the global section which means we have a layering violation. Which means DECL_ARGUMENTS and DECL_RESULT should be moved to struct function? Of course frontends may not be happy with that (given DECL_ARGUMENTS is also used in function declarations) Please consider reverting these changes (at least the DECL_ARGUMENTS one). cgraph_get_body is there to load it for you when you need it. We are going to expand the function so it makes sense to get it. Then all DECL_ARGUMENTS/DECL_RESULT users need to be audited, no? Richard. The same is done by the passmanager when function is going to be expanded. Only difference here is that thunks do not go through the passmanager. I can drop in_lto_p (the function does nothing when body is already there) Honza
Re: [PING] Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Hello, what is the blocking point for GCC integration? It was accepted by the SC and all issues of the last review have been addressed (at least this is my impression). Is it that none of the persons with global write permission seems to be responsible? The Binutils port has been available since 2013-02-06. The Newlib port has been available since 2013-05-06. The last step of an official FSF/Newlib tool chain for the Altera Nios II is the GCC part. On 2013-09-02 11:38, Chung-Lin Tang wrote: Ping. On 13/8/20 10:57 AM, Chung-Lin Tang wrote: Ping. BTW, the SC has approved the Nios II port already: http://gcc.gnu.org/ml/gcc/2013-07/msg00434.html The port is still awaiting technical review. Thanks, Chung-Lin On 13/7/14 3:54 PM, Chung-Lin Tang wrote: Hi, the last ping of the Nios II patches was: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01416.html After assessing the state, we feel it would be better to post a re-submission of the newest patches. The changes accumulated since the original post include: 1) Several bug fixes related to built-in function expanding. 2) A few holes in hard-float FPU code generation were plugged. 3) Support for parsing whitespace in target attributes. 4) Revision of the consistency-check behavior of codes in custom instruction built-ins. 5) Some new testcases. The issues raised by Joseph in the first round of reviewing have been addressed. Testing has been re-done on both 32-bit and 64-bit hosts. PR55035 appears to not have been resolved yet, which affects nios2 among several other targets; thus, configuring with --enable-werror-always still does not build. As before, Sandra and I will serve as nios2 port maintainers. Attached is the patch for the compiler proper. Thanks, Chung-Lin -- Sebastian Huber, embedded brains GmbH Address : Dornierstr. 4, D-82178 Puchheim, Germany Phone : +49 89 189 47 41-16 Fax : +49 89 189 47 41-09 E-Mail : sebastian.hu...@embedded-brains.de PGP : Public key available on request. 
Re: [PATCH 0/6] Convert gimple to a C++ class hierarchy
Hi, On Fri, Aug 30, 2013 at 03:21:22PM -0400, David Malcolm wrote: Apart from the GTY aspect, how do people feel about the patch series? FWIW I have vague thoughts about doing something similar for tree - doing so *might* give an easier route to the type vs expression separation that Andrew spoke about at the Cauldron rearchitecture BoF. (I started looking at doing a similar C++-ification of rtx, but... gah) I like it, but before you start looking at the bigger things, could you perhaps proceed with the symtab? It has much fewer classes, will probably affect the private development of fewer people, and the accessor macros/functions of symtab are less developed, so it will immediately really make the code nicer; Honza has approved it and I'm really looking forward to it. Also, perhaps it will show us at a much smaller scale potential problems with the general scheme. I'm only writing this because the development there seems a bit stalled and it is a shame. Of course, you may want to simplify the manual markings first. I'd perfectly understand that. Thanks, Martin
Re: [RFC] Changes to the wide-int classes
On 09/02/2013 05:35 AM, Richard Biener wrote: On Mon, 2 Sep 2013, Richard Sandiford wrote: Kenneth Zadeck zad...@naturalbridge.com writes: There is no place for exactly two HWIs in the machine independent parts of the compiler, I totally agree. In fact I was taking that so much for granted that I didn't even think to add a rider about it, sorry. I didn't mean to imply that we should keep double_int around. I think the reason for doing this is to prove that it can be done (so that the wide_int code isn't too big a change for the tree level) and to make it easier to merge the wide-int patches into trunk piecemeal if we need to. Note that changing the tree rep to non-double_int is easy. Also note that I only want 'double_int' to stay around to have a fast type that can work on two HWIs for the code that needs more precision than a HWI. The only important cases I know of are in get_inner_reference and get_ref_base_and_extent and friends. Those are heavily used (and the only double-int function callers that even show up in regular cc1 profiles). So if wide_int_fixed2 ('2' better be replaced with 'number-that-gets-me-twice-target-sizetype-precision') works for those cases then fine (and we can drop double-ints). This is what the addr_wide_int is for - it sniffs the port and is guaranteed to be big enough to hold the largest pointer, plus 4 bits (3 bits for bit position and 1 bit so that unsigned things come over with no loss); that is then rounded up to the next HWI. (I will admit that the sniffing code needs a little work but that can be fixed without changing the api). kenny
Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL
On Wed, Jul 3, 2013 at 2:25 PM, Kugan kugan.vivekanandara...@linaro.org wrote: On 17/06/13 18:33, Richard Biener wrote: On Mon, 17 Jun 2013, Kugan wrote: +/* Extract the value range of the assigned expression for a GIMPLE_ASSIGN stmt. + If the extracted value range is valid, return true else return + false. */ +static bool +extract_exp_value_range (gimple stmt, value_range_t *vr) +{ + gcc_assert (is_gimple_assign (stmt)); + tree rhs1 = gimple_assign_rhs1 (stmt); + tree lhs = gimple_assign_lhs (stmt); + enum tree_code rhs_code = gimple_assign_rhs_code (stmt); ... @@ -8960,6 +9016,23 @@ simplify_stmt_using_ranges (gimple_stmt_iterator *gsi) { enum tree_code rhs_code = gimple_assign_rhs_code (stmt); tree rhs1 = gimple_assign_rhs1 (stmt); + tree lhs = gimple_assign_lhs (stmt); + + /* Set value range information for ssa. */ + if (!POINTER_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))) + && (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME) + && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))) + && !SSA_NAME_RANGE_INFO (lhs)) + { + value_range_t vr = VR_INITIALIZER; ... + if (extract_exp_value_range (stmt, &vr)) + tree_ssa_set_value_range (lhs, + tree_to_double_int (vr.min), + tree_to_double_int (vr.max), + vr.type == VR_RANGE); + } This looks overly complicated to me. In vrp_finalize you can simply do for (i = 0; i < num_vr_values; i++) if (vr_value[i]) { tree name = ssa_name (i); if (POINTER_TYPE_P (TREE_TYPE (name))) continue; if (vr_value[i].type == VR_RANGE || vr_value[i].type == VR_ANTI_RANGE) tree_ssa_set_value_range (name, tree_to_double_int (vr_value[i].min), tree_to_double_int (vr_value[i].max), vr_value[i].type == VR_RANGE); } Thanks Richard for taking the time to review it. I was doing something like what you are suggesting earlier but noticed some problems, and that's the reason why I changed it. 
For example, for the following testcase from the test suite, unsigned long l = (unsigned long)-2; unsigned short s; int main () { long t = l + 1; s = l; if (s != (unsigned short) -2) abort (); exit (0); } with the following gimple stmts main () { short unsigned int s.1; long unsigned int l.0; ;; basic block 2, loop depth 0 ;; pred: ENTRY l.0_2 = l; s.1_3 = (short unsigned int) l.0_2; s = s.1_3; if (s.1_3 != 65534) goto <bb 3>; else goto <bb 4>; ;; succ: 3 ;; 4 ;; basic block 3, loop depth 0 ;; pred: 2 abort (); ;; succ: ;; basic block 4, loop depth 0 ;; pred: 2 exit (0); ;; succ: } has the following value ranges. l.0_2: VARYING s.1_3: [0, +INF] From the zero/sign extension point of view, the variable s.1_3 is expected to have a value that will overflow (or varying) as this is what is assigned to a smaller variable. extract_range_from_assignment initially calculates the value range as VARYING but it is later changed to [0, +INF] by extract_range_basic. What I need here is the value that will be assigned from the rhs expression and not the value that we will have after the assignment. I don't understand this. The relevant statement is s.1_3 = (short unsigned int) l.0_2; right? You have value-ranges for both s.1_3 and l.0_2 as above. And you clearly cannot optimize the truncation away (and if you could, you wouldn't need value-range information for that fact). I understand that the above code of mine needs to be changed but I am not convinced about the best way to do that. I can possibly re-factor extract_range_from_assignment to give me this information with an additional argument. Could you kindly let me know your preference. /* SSA name annotations. */ + union vrp_info_type { + /* Pointer attributes used for alias analysis. */ + struct GTY ((tag (TREE_SSA_PTR_INFO))) ptr_info_def *ptr_info; + /* Value range attributes used for zero/sign extension elimination. */ /* Value range information. 
*/ + struct GTY ((tag (TREE_SSA_RANGE_INFO))) range_info_def *range_info; + } GTY ((desc (%1.def_stmt && !POINTER_TYPE_P (TREE_TYPE ((tree)&%1) vrp; why do you need to test %1.def_stmt here? I have seen some tree_ssa_name with def_stmt NULL. That's why I added this. Is that something that should never happen? It should never happen - they should have a GIMPLE_NOP. +void +set_range_info (tree name, double_int min, + double_int max, bool vr_range) you have some whitespace issues here (please properly use tabs) + /* Allocate if not available. */ + if (ri == NULL) + { + ri = ggc_alloc_cleared_range_info_def (); + mark_range_info_unknown (ri); that
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
Well, there is still fallout from this change so I'm not convinced this will stay this way. Also we stream the function-decl that refers to these fields in As far as I know there are two problems 1) problem with the thunk expansion not getting DECL_ARGUMENTS/DECL_RESULT, addressed by this patch 2) ICE in gcc.dg/torture/pr8081.c Here the problem is in the variable-sized type of the return value that gets streamed twice, and we end up with a duplicate. As mentioned earlier, I want to handle this by introducing variable_sized_function_type_p that will make those go specially the old way (with DECL_ARGUMENTS/DECL_RESULT in the global stream). I did not send the patch, because I think the variable-sized parameters were handled incorrectly before my changes, too, and the fix is not so trivial. It just makes us not ICE the same way as before. The testcase is: /* { dg-do run } */ extern void abort (void); int main (int argc, char **argv) { int size = 10; typedef struct { char val[size]; } block; block a, b; block __attribute__((noinline)) retframe_block () { return *(block *) &b; } b.val[0] = 1; b.val[9] = 2; a=retframe_block (); if (a.val[0] != 1 || a.val[9] != 2) abort (); return 0; } So here the size is not a parameter of the function, it is a local variable (later turned into a structure) that gets into the local function stream. 
The type is: record_type 0x77511930 block sizes-gimplified type_0 type_1 BLK size var_decl 0x7752f428 D.1738 type integer_type 0x7740d0a8 bitsizetype public unsigned TI size integer_cst 0x773f7fc0 constant 128 unit size integer_cst 0x773f7fe0 constant 16 align 128 symtab 0 alias set -1 canonical type 0x7740d0a8 precision 128 min integer_cst 0x7740f000 0 max integer_cst 0x773f7fa0 -1 used unsigned ignored TI file t.c line 8 col 19 size integer_cst 0x773f7fc0 128 unit size integer_cst 0x773f7fe0 16 align 128 context function_decl 0x77510f00 main chain var_decl 0x7752f4c0 D.1739 type integer_type 0x7740d738 long int used ignored DI file t.c line 10 col 20 size integer_cst 0x773f7f40 constant 64 unit size integer_cst 0x773f7f60 constant 8 align 64 context function_decl 0x77510f00 main chain var_decl 0x7752f558 D.1740 unit size var_decl 0x7752f2f8 D.1736 type integer_type 0x7740d000 sizetype public unsigned DI size integer_cst 0x773f7f40 64 unit size integer_cst 0x773f7f60 8 align 64 symtab 0 alias set -1 canonical type 0x7740d000 precision 64 min integer_cst 0x773f7f80 0 max integer_cst 0x773f7000 18446744073709551615 used unsigned ignored DI file t.c line 8 col 19 size integer_cst 0x773f7f40 64 unit size integer_cst 0x773f7f60 8 align 64 context function_decl 0x77510f00 main chain var_decl 0x7752f390 D.1737 type integer_type 0x7740d0a8 bitsizetype used unsigned ignored TI file t.c line 8 col 19 size integer_cst 0x773f7fc0 128 unit size integer_cst 0x773f7fe0 16 align 128 context function_decl 0x77510f00 main chain var_decl 0x7752f428 D.1738 align 8 symtab 0 alias set -1 canonical type 0x77511738 fields field_decl 0x7752f098 val type array_type 0x77511888 type integer_type 0x7740d3f0 char sizes-gimplified type_1 BLK size var_decl 0x7752f428 D.1738 unit size var_decl 0x7752f2f8 D.1736 align 8 symtab 0 alias set -1 structural equality domain integer_type 0x775117e0 decl_0 BLK file t.c line 10 col 20 size var_decl 0x7752f428 D.1738 unit size var_decl 0x7752f2f8 D.1736 
align 8 offset_align 128 offset integer_cst 0x773f7f80 constant 0 bit offset integer_cst 0x7740f000 constant 0 context record_type 0x77511738 context function_decl 0x77510f00 main pointer_to_this pointer_type 0x77511bd0 chain type_decl 0x7753 D.1726 My understanding is that we end up with silently putting automatic variable D.1738 into global stream and we happen to not explode later. At stream in time we will have duplicated D.1738 parameters and it is only becuase the one in TYPE_SIZE is unused making us to not ICE. I also do not see how we can produce valid debug info for the nested
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
Well, there is still fallout from this change so I'm not convinced this will stay this way. [...] So here the size is not a parameter of the function, it is a local variable (later turned into a structure) that gets into the local function stream. ^^^ Reading my email, perhaps in this case the correct fix is to make tree-nested fix up the TYPE_SIZE? It knows how to access the frame variables from the outer function via the invisible link pointer passed, and it knows how to update gimple. That may also fix the issue with debug info.
Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C
+ case CILK_SYNC_STMT: + { + if (!cfun->cilk_frame_decl) + { + error_at (input_location, expected %<_Cilk_spawn%> before + %<_Cilk_sync%>); + ret = GS_ERROR; + } First, surely you have a location you can use, instead of the generic input_location (perhaps the location for the CILK_SYNC_STMT?). Also, can you not check for this in c_finish_cilk_sync_stmt(), or the corresponding code -- that is, in the FE somewhere? And hopefully, in a place you can share with the C++ FE? If it is a real pain, I am willing to let this go, since it happens only in the Cilk code path, though the general trend (especially with Andrew's proposed changes) is to do all type checking as close to the source as possible. If you look at the codes above (e.g. TRUTH_XOR_EXPR), they all use input_location. Also, in the beginning of the function there is a line like this: if (save_expr != error_mark_node && EXPR_HAS_LOCATION (*expr_p)) input_location = EXPR_LOCATION (*expr_p); Yes, that is old legacy code. The present move is to steer away from input_location altogether. For the 2nd point, there can be a case where (with the help of gotos) _Cilk_sync can come before _Cilk_spawn. So, the only way to do this check is to do it after the entire function is parsed. Fair enough. Alright, I'm fine with this current incantation. Thanks for all your hard work taking care of the things I've pointed out. It's now up to one of the core maintainers to take it from here. Aldy
Re: Eliminate vectorizer analysis side effects
On Fri, Aug 30, 2013 at 11:34 PM, Xinliang David Li davi...@google.com wrote: On Fri, Aug 30, 2013 at 1:23 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Aug 30, 2013 at 1:28 AM, Xinliang David Li davi...@google.com wrote: I was debugging a runtime failure of SPEC06 xalancbmk built with LIPO. Using the -fdisable-pass option pinpoints the problem in the slp vectorize pass on a particular function. dbgcnt support is added to track down the individual BB, but it fails even when the dbg count is set to 0. It turns out that no BB was actually vectorized for that function, but turning slp-vectorize on/off does make a difference in generated code -- the only difference between the good and bad case is stack layout. The problem is in the alignment analysis phase -- it speculatively changes the base declaration's alignment regardless of whether the vectorization transformation will be performed later or not. The attached patch fixes the problem. Testing is ok. Ok for trunk? Not in this form. I'd rather not put extra fields in the data-refs this way. (As for the xalancbmk runtime problem - doesn't this patch just paper over the real issue?) I believe it is stack-limit related. This program has some recursive call chains that can generate a call frame ~9k deep. The vectorizer side effect causes the affected function in the call frame to grow ~100 bytes in stack size. Since this function appears lots of times in the call stack, the overall stack consumption increases a lot. Combined with the aggressive cross-module inlining, it ends up blowing the stack limit. For BB SLP you still adjust DR bases that do not take part in vectorization - the DR vector contains all refs in the BB, not just those in the SLP instances we are going to vectorize. So the commit of the re-aligning decision should be done from where we vectorize the DR, in vectorizable_load/store in its transform phase. 
If we decide to integrate the fields into the data-ref then the analysis and commit parts should go into the data-ref machinery as well. Otherwise the vectorizer should use data-ref->aux or some other way to hang off its private data. Good point. Other than that, modifying alignment of variables that are not vectorized is indeed a bug worth fixing. The new version of the patch is attached. Ok for trunk after testing? +/* A helper function to free data refs. */ + +void +destroy_datarefs (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo) please rename to vect_destroy_datarefs @@ -431,6 +460,7 @@ static unsigned int execute_vect_slp (void) { basic_block bb; + bb_vec_info bb_vinfo; init_stmt_vec_info_vec (); @@ -438,8 +468,11 @@ execute_vect_slp (void) { vect_location = find_bb_location (bb); - if (vect_slp_analyze_bb (bb)) + if ((bb_vinfo = vect_slp_analyze_bb (bb))) { spurious change? bb_vinfo seems to be unused. +typedef struct _dataref_aux { + tree base_decl; + bool base_misaligned; + int misalignment; +} dataref_aux; we no longer need that typedef stuff with C++ ... + gcc_assert (dr->aux); + ((dataref_aux *) dr->aux)->base_decl = base; + ((dataref_aux *) dr->aux)->base_misaligned = true; dereferencing dr->aux will trap, so no need to assert (dr->aux). Please add a comment before this code explaining what this is for. -vectorizable_store (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt, - slp_tree slp_node) +vectorizable_store_1 (gimple stmt, stmt_vec_info stmt_info, + struct data_reference *dr, + gimple_stmt_iterator *gsi, gimple *vec_stmt, + slp_tree slp_node) ... +/* A wrapper function to vectorizable_store. */ + +static bool +vectorizable_store (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt, +slp_tree slp_node) +{ it shouldn't be necessary to split the functions, after if (!vec_stmt) /* transformation not required.
*/ { STMT_VINFO_TYPE (stmt_info) = store_vec_info_type; vect_model_store_cost (stmt_info, ncopies, store_lanes_p, dt, NULL, NULL, NULL); return true; } /** Transform. **/ the function is no longer allowed to fail, so at this point you can call ensure_base_align. Similar for the load case. Ok with those minor cases fixed. Thanks, Richard. thanks, David Richard. thanks, David
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
On Mon, Sep 2, 2013 at 3:02 PM, Jan Hubicka hubi...@ucw.cz wrote: Well, there is still fallout from this change so I'm not convinced this will stay this way. Also we stream the function-decl that refers to these fields in As far as I know there are two problems 1) problem with the thunk expansion not getting DECL_ARGUMENTS/DECL_RESULT addressed by this patch 2) ICE in gcc.dg/torture/pr8081.c Here the problem is in the variable type of the return value that gets streamed twice and we end up with a duplicate. As mentioned earlier, I want to handle this by introducing variable_sized_function_type_p that will make those go specially the old way (with DECL_ARGUMENTS/DECL_RESULT in the global stream). I did not send the patch, because I think the variable sized parameters were handled incorrectly before my changes, too, and the fix is not so trivial. It just makes us not ICE the same way as before. The testcase is: /* { dg-do run } */ extern void abort (void); int main (int argc, char **argv) { int size = 10; typedef struct { char val[size]; } block; block a, b; block __attribute__((noinline)) retframe_block () { return *(block *) &b; } b.val[0] = 1; b.val[9] = 2; a=retframe_block (); if (a.val[0] != 1 || a.val[9] != 2) abort (); return 0; } So here the size is not a parameter of the function, it is a local variable (later turned into a structure) that gets into the local function stream. ^^^ reading my email, perhaps in this case the correct fix is to make tree-nested fix up the TYPE_SIZE? It knows how to access the frame variables from the outer function via the invisible link pointer passed and it knows how to update gimple. That may also fix the issue with debug info. But we still refer to the local entity from the TREE_TYPE of the function decl, no? As for debug info we probably have to insert proper DEBUG_STMTs that tell us how to construct those. Don't we do this kind of dance already?
All the gimplified type-sizes in TYPE_SIZE are basically 'useless'; they exist only for the gimplifier's sake and as a 'marker' that the type has variable size. All references to the size are explicit in the code (and I believe debug info has to be emitted before gimplifying, or it doesn't properly handle such variadic types at all). That's some huge cleanup opportunity (I suppose NULLifying TYPE_SIZE doesn't work). Richard.
Re: Remove hash from remember_with_vars
Hi, unfortunately this patch ICEs on the following testcase /* This used to fail on SPARC with an unaligned memory access. */ void foo(int n) { struct S { int i[n]; unsigned int b:1; int i2; } __attribute__ ((packed)) __attribute__ ((aligned (4))); struct S s; s.i2 = 0; } int main(void) { foo(4); return 0; } (in a testsuite I must have missed during testing) because it checks that the field offset is always non-variable. I merely copied this sanity check from the identical one in lto_fixup_prevailing_decls that however does not fire because it needs mentions_vars_p_field_decl to record the var first, and that never happens. This patch fixes it by adding a fixup. Here the variable is local, but moving n to file scope is allowed. I have bootstrapped/regtested ppc64-linux, will commit it as obvious. I apologize for the breakage. I will prioritize fixing the fallout I caused this week. Honza Index: ChangeLog === --- ChangeLog (revision 202173) +++ ChangeLog (working copy) @@ -1,5 +1,10 @@ 2013-08-31 Jan Hubicka j...@suse.cz + * lto.c (mentions_vars_p_field_decl, lto_fixup_prevailing_decls): + DECL_FIELD_OFFSET can contain a reference to a variable. + +2013-08-31 Jan Hubicka j...@suse.cz + * lto.c (tree_with_vars): Turn into vector. (MAYBE_REMEMBER_WITH_VARS): Change to... (CHECK_VAR): ... this one.
Index: lto.c === --- lto.c (revision 202153) +++ lto.c (working copy) @@ -1389,7 +1389,7 @@ mentions_vars_p_field_decl (tree t) { if (mentions_vars_p_decl_common (t)) return true; - CHECK_NO_VAR (DECL_FIELD_OFFSET (t)); + CHECK_VAR (DECL_FIELD_OFFSET (t)); CHECK_NO_VAR (DECL_BIT_FIELD_TYPE (t)); CHECK_NO_VAR (DECL_QUALIFIER (t)); CHECK_NO_VAR (DECL_FIELD_BIT_OFFSET (t)); @@ -3207,7 +3207,7 @@ lto_fixup_prevailing_decls (tree t) LTO_SET_PREVAIL (DECL_FUNCTION_PERSONALITY (t)); if (CODE_CONTAINS_STRUCT (code, TS_FIELD_DECL)) { - LTO_NO_PREVAIL (DECL_FIELD_OFFSET (t)); + LTO_SET_PREVAIL (DECL_FIELD_OFFSET (t)); LTO_NO_PREVAIL (DECL_BIT_FIELD_TYPE (t)); LTO_NO_PREVAIL (DECL_QUALIFIER (t)); LTO_NO_PREVAIL (DECL_FIELD_BIT_OFFSET (t));
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
But we still refer to the local entity from TREE_TYPE of the function decl, no? depending on the definition of local entity. I thought we were discussing if PARM_DECL is local or not. I spent some time thinking about the whole streaming scheme and I think we do have important side cases handled incorrectly. Let's try to discuss things. I will try to summarize my understanding to make sure we are in sync and I do not miss something important. So we now have two types of streams - the global stream: this stream is one per compilation unit and it gets loaded into WPA and merged with other streams by tree merging and LTO-symtab resolution. - the local stream: this stream is local to a function (and hopefully to variable initializers soon, too); it is read when we need the function body and it is not in any way merged with the global stream, except that the references to global streams are resolved after merging. We do have a way to express references from the local stream to the global stream. We however can not express A) references from the global stream to the local stream B) references in between two different local streams. The decision what should go to the local or global stream is basically motivated by 1) everything needed for interprocedural optimization has to be global 2) everything related to function bodies should be local. For tree nodes, the decision about where the node goes is made by tree_is_indexable. The current set of rules is - parm_decl/result_decl is local (before my change result_decl was global and parm_decl was local; why those were not the same escapes me, but it needed an ugly fixup in ipa-prop that needed to reconnect global DECL pointers to local DECL pointers after streaming in) - function variables are local if they are not static. (Even external declarations are local, not only automatic vars.)
- Types are global unless they are variably modified types (The variably modified type code walks into subtypes, so even pointers to variably modified types are local, as are function declarations mentioning these.) - FIELD_DECL is global unless it is part of a variably modified type. - SSA_NAME is local despite the logic of tree_is_indexable because it is handled separately as part of gimple, and gimple is always local. - LABEL_DECLs are global even when their address is not taken and it would make more sense to hide them in the body sections. Now I think we are safe only when our references never violate A) and B) and moreover C) the same entity that needs to be unique (such as declarations, by the one-decl rule, or BLOCK) is not streamed into multiple streams. We never merge things between streams and duplicating those is forbidden by design. It is an irritating property of our streaming machinery that it is not diagnosing violations of A/B/C. Violations of type A and B are basically hidden by introducing violations of type C, and the compiler tends to be resistant to those in the sense that it usually does not ICE. Now I only once added a sanity check for C and learnt that 1) variably sized types may end up streamed twice - in the global stream because they are mentioned by FUNCTION_DECL/VARIABLE_DECL or TYPE_DECL, and in the local stream because of the tree_is_indexable rule. Moreover they violate A after landing in the global stream because they refer to temporaries in the gimple body. As you mentioned, we may want to clear TYPE_SIZE and variable DECL_FIELD_OFFSET. But sadly I think dwarf2out needs both and gimple land needs DECL_FIELD_OFFSET. 2) TYPE_DECLs of local types (common in C++) tend to drag into the global stream a BLOCK that later brings to the global stream the local variables. Making function-local TYPE_DECLs local seems cool, but it breaks A because function/local static variable types may refer to them. 3) _ORIGIN pointers violate B by design. 4) LABEL_DECLs with address taken violate A and B.
I am sure this list is not complete. Now we can play about what can be global or local. Like putting PARM_DECL into the global stream only, or returning RESULT_DECL into the global decl stream. But I do not really see how much chance it has to fix 1-4. I think the general flaw is that functions/static variables (which we absolutely need in the global stream) may refer to local entities via contexts and variably sized types. Either we need to consistently not stream the pointers causing the violations above - i.e. clear them in free_lang_data and make the backend not dependent on them. Or we need to find a way to make references of type A and perhaps B possible. I can think of a local_ref function_decl localid tree construct that we patch in and replace by the actual declaration after reading the function_decl body when it is needed (by dwarf2out or tree optimizers), but it does not seem cool to me either. We will likely soon notice that with comdats localid needs to be unique across the merged comdat bodies, or something similarly nasty. Another alternative I can
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
Hi, the following testcase illustrates the problem with the offset. Sadly it ICEs even w/o LTO: evans:/abuild/jh/trunk-3/build-inst12/gcc/:[1]# ./xgcc -B ./ -O2 ~/tt.c /root/tt.c: In function 'main': /root/tt.c:24:11: warning: overflow in implicit constant conversion [-Woverflow] b.b=0xdead; ^ /root/tt.c:25:12: internal compiler error: in assign_stack_temp_for_type, at function.c:762 a=retframe_block (); ^ 0x7ae808 assign_stack_temp_for_type(machine_mode, long, tree_node*) ../../gcc/function.c:762 /* { dg-do run } */ extern void abort (void); int main (int argc, char **argv) { int size = 10; typedef struct { char val[size]; char b; } block; block a, b; block __attribute__((noinline)) retframe_block () { if (b.b != 0xdead) abort (); return *(block *) &b; } b.val[0] = 1; b.val[9] = 2; b.b=0xdead; a=retframe_block (); if (a.val[0] != 1 || a.val[9] != 2) abort (); return 0; } Honza
Re: [RFC] Changes to the wide-int classes
On Sun, 1 Sep 2013, Richard Sandiford wrote: like to get rid of them and just keep the genuine operators. The problem is that I'd have liked the AND routine to be wi::and, but of course that isn't possible with and being a keyword, so I went for wi::bit_and instead. Same for not and wi::bit_not, and or and wi::bit_or. Then it seemed like the others should be bit_* too, and wi::bit_and_not just seems a bit unwieldy... Hmm, if we decide to forbid the use of and in gcc, perhaps we could #define it to something safe. But that would probably be too confusing. and in C++ is not a keyword, but an alternative token (like <% etc.). As such, it can't be defined as a macro, or used as a macro name in #define, #ifdef etc., and does not get converted to 0 in #if conditions but is interpreted as an operator there. (The status of new and delete in this regard is less clear; see http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#369.) -- Joseph S. Myers jos...@codesourcery.com
Re: Symtab cleanup 10/17 remove unnecessary DECL_ARGUMENTS and DECL_RESULT
Hi, also to avoid the ICE in the original testcase, we do not really need the DECL_ARGUMENTS/RESULT_DECL lists. All we need is RESULT_DECL in the global stream. The following one-liner fixes the testcase and all variants of my ultimate death testcase that did not ICE in tree from the beginning of August. Bootstrapped/regtested x86_64-linux with default, running testing on ppc64-linux including Ada. Does this seem reasonable? (though still symptomatic in my belief) * lto-streamer-out.c (tree_is_indexable): RESULT_DECL/PARM_DECL of variably modified types are global. Index: lto-streamer-out.c === --- lto-streamer-out.c (revision 202161) +++ lto-streamer-out.c (working copy) @@ -124,8 +124,11 @@ output_type_ref (struct output_block *ob static bool tree_is_indexable (tree t) { + /* Parameters and return values of functions of variably modified types + must go to global stream, because they may be used in the type + definition. */ if (TREE_CODE (t) == PARM_DECL || TREE_CODE (t) == RESULT_DECL) -return false; +return variably_modified_type_p (TREE_TYPE (DECL_CONTEXT (t)), NULL_TREE); else if (TREE_CODE (t) == VAR_DECL && decl_function_context (t) && !TREE_STATIC (t)) return false;
Re: [patch 4/4] -fstrict-volatile-bitfields cleanup v3: remove from defaults on all targets
On Mon, 2 Sep 2013, Richard Earnshaw wrote: On 01/09/13 14:10, Bernd Edlinger wrote: IMHO the AAPCS forbids packed structures. Therefore we need not interfere with the C++ memory model if we have unaligned data. The AAPCS neither forbids nor requires packed structures. They're a GNU extension and as such not part of standard C++. Thus the semantics of such an operation are irrelevant to the AAPCS: you get to choose what the behaviour is in this case... The trouble is that AAPCS semantics are incompatible with the default GNU semantics for non-packed structures as well - AAPCS strict-volatile-bitfields is only compatible with --param allow-store-data-races=1, which is not the default for any language variant accepted by GCC (and I say that the default language semantics here should not depend on the target architecture). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH]: Fix PR middle-end/56382 -- Only move MODE_COMPLEX_FLOAT by parts if we can create pseudos
Eric, your patch works for me. Tested on hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11. Thanks, also tested on x86-64/Linux and applied on the mainline. 2013-09-02 Eric Botcazou ebotca...@adacore.com PR middle-end/56382 * expr.c (emit_move_complex): Do not move complex FP values as parts if the source or the destination is a single hard register. -- Eric Botcazou Index: expr.c === --- expr.c (revision 202160) +++ expr.c (working copy) @@ -3232,11 +3232,16 @@ emit_move_complex (enum machine_mode mod if (push_operand (x, mode)) return emit_move_complex_push (mode, x, y); - /* See if we can coerce the target into moving both values at once. */ - - /* Move floating point as parts. */ + /* See if we can coerce the target into moving both values at once, except + for floating point where we favor moving as parts if this is easy. */ if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT - && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing) + && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing + && !(REG_P (x) + && HARD_REGISTER_P (x) + && hard_regno_nregs[REGNO (x)][mode] == 1) + && !(REG_P (y) + && HARD_REGISTER_P (y) + && hard_regno_nregs[REGNO (y)][mode] == 1)) try_int = false; /* Not possible if the values are inherently not adjacent. */ else if (GET_CODE (x) == CONCAT || GET_CODE (y) == CONCAT)
[linaro/gcc-4_8-branch] Backports from trunk
Hi, We have just committed backports of the following revisions from trunk to linaro/gcc-4_8-branch: r201636 (as 202175) r201501 (as 202176) (should have picked it up by 4.8 branch merge) r201341 (as 202181) Thanks, Christophe.
Re: [RFC] Changes to the wide-int classes
Joseph S. Myers jos...@codesourcery.com writes: On Sun, 1 Sep 2013, Richard Sandiford wrote: like to get rid of them and just keep the genuine operators. The problem is that I'd have liked the AND routine to be wi::and, but of course that isn't possible with and being a keyword, so I went for wi::bit_and instead. Same for not and wi::bit_not, and or and wi::bit_or. Then it seemed like the others should be bit_* too, and wi::bit_and_not just seems a bit unwieldy... Hmm, if we decide to forbid the use of and in gcc, perhaps we could #define it to something safe. But that would probably be too confusing. and in C++ is not a keyword, but an alternative token (like <% etc.). As such, it can't be defined as a macro, or used as a macro name in #define, #ifdef etc., and does not get converted to 0 in #if conditions but is interpreted as an operator there. Ah, thanks, hadn't realised that. In some ways I'm glad that such a bad idea would fail to work :-) Richard
Fix gcc.dg/lto/20090218-1.c
Hi, gcc.dg/lto/20090218-1.c contains a cross-module call into an always_inline function. At -O0 -flto we used to report nothing since optimize_inline_calls was not called. With my change we report it as an error. I am not sure what is the desired behaviour, but for now I am restoring the previous situation, which is consistent with a non-LTO build. I suppose it is a good thing to do, given that we promise always_inlines to be always inlined only within a given module. I am testing the patch on x86_64-linux and will commit it if testing passes. Honza * ipa-inline-transform.c (inline_transform): Do not optimize inline calls when not optimizing. Index: ipa-inline-transform.c === --- ipa-inline-transform.c (revision 202182) +++ ipa-inline-transform.c (working copy) @@ -432,7 +432,7 @@ inline_transform (struct cgraph_node *no ipa_remove_all_references (&node->symbol.ref_list); timevar_push (TV_INTEGRATION); - if (node->callees) + if (node->callees && optimize) todo = optimize_inline_calls (current_function_decl); timevar_pop (TV_INTEGRATION);
Re: [RFC] Changes to the wide-int classes
On Sep 2, 2013, at 1:22 AM, Richard Sandiford rdsandif...@googlemail.com wrote: What I'm saying is that the assert stops plain: max_wide_int x = wide_int_var; Sorry for being slow. I see the point; indeed, from the preexisting code, we did: fixed_wide_int (const wide_int_ro &w) : wide_int_ro (w) { /* We only allow the same size in, as otherwise we would not know how to extend it. */ gcc_assert (precision == bitsize); } to disallow it dynamically. Doing this, or pushing it further into the type system so that it is caught at compile time, is fine. I also like the passing of sign: max_wide_int x = max_wide_int::from (wide_int_var, SIGNED); max_wide_int x = max_wide_int::from (wide_int_var, UNSIGNED); like this. In the previous code we did: static inline fixed_wide_int from_wide_int (const wide_int &w) { if (w.neg_p (SIGNED)) return w.sforce_to_size (bitsize); return w.zforce_to_size (bitsize); } which is, as you point out, dangerous.
Re: Lambda templates and implicit function templates.
On 01.09.2013 21:05, Jason Merrill wrote: On 08/27/2013 03:42 PM, Adam Butcher wrote: I don't know if it's the correct thing to do but the implementation currently omits the conversion to function pointer operator if the argument list contains a parameter pack. I would expect that to work. Does the specification not provide for deduction in that case? It doesn't forbid it as far as I can see. But it also gives no example of a stateless lambda with a variadic parameter. TBH I just couldn't figure out the right implementation so thought it better to omit the conversion operator on the assumption that it is not a common use case rather than ICE. I'll have a rethink based on your follow-up to 2/4 and try to get a pack expansion working there. One other thing, assuming the 'auto...' syntax can be made to work, bug 41933 needs to be resolved for the expansion returned by the generic lambda in N3690 5.1.2.5 to compile. Currently (transforming the 'auto...' to an explicit '<typename... T> T...') appears to yield the bug. Bug 41933 is specifically about lambda capture; I think you're running into something else. The problem is in expanding the 'ts' capture from the 5.1.2.5. It looks like this: 1 auto vglambda = [](auto printer) { 2 return [=](auto ... ts) { // OK: ts is a function parameter pack 3 printer(std::forward<decltype(ts)>(ts)...); 4 return [=]() { 5 printer(ts ...); 6 }; 7 }; 8 }; It is the expansion of the captured 'ts' on line 5 that I think yields the bug. Might be slightly different to 41933 though due to not being explicitly captured as ts. Unsure. My point was that in order to conform this issue will need to be fixed as well as getting the 'auto...' syntax to work. Cheers, Adam
Re: [PATCH 1/4] Support lambda templates.
On 01.09.2013 21:22, Jason Merrill wrote: On 08/27/2013 03:42 PM, Adam Butcher wrote: + vec_safe_push (argvec, arg); I bet we want convert_from_reference in the non-generic lambda case, too. OK with that change. I think I had made that change originally to keep the two impls the same and I hit issues with non-generic lambdas. But I can't remember the details. I'll try again. If it works with convert_from_reference in both cases should I push or should I sort out the parameter pack conversion op issue and roll that up into this?
Re: [PATCH 1/4] Support lambda templates.
On 09/02/2013 02:30 PM, Adam Butcher wrote: I think I had made that change originally to keep the two impls the same and I hit issues with non-generic lambdas. But I can't remember the details. I'll try again. If it works with convert_from_reference in both cases should I push or should I sort out the parameter pack conversion op issue and roll that up into this? I think roll them together, since that patch rewrites parts of this one. Jason
*ping* [patch, fortran] PR 56519: Reject impure intrinsic subroutines in DO CONCURRENT
Ping**0.5714? The attached patch rejects impure intrinsic subroutines, such as RANDOM_NUMBER, within DO CONCURRENT. Regression-tested. OK for trunk?
Re: [PATCH 4/4] Support using 'auto' in a function parameter list to introduce an implicit template parameter.
On 01.09.2013 22:20, Jason Merrill wrote: On 08/27/2013 03:42 PM, Adam Butcher wrote: + else // extend current template parameter list + // pop the innermost template parms into tparms Most comments should start with a capital letter and end with a period. Will change. + for (size_t n = 0, end = TREE_VEC_LENGTH (inner_vec); n < end; ++n) + tparms = chainon (tparms, TREE_VEC_ELT (inner_vec, n)); Doing chainon in a loop has bad algorithmic complexity, as it walks through the whole tparms list each iteration. Better to build up a list from inner_vec and then chainon that list as a whole. Okay. +template <typename TreePredicate> +inline tree +find_type_usage (tree t, TreePredicate pred) I don't think this needs to be a template, since we know the predicates take a single tree and return bool. I didn't know whether to allow for someone passing in a stateful lambda (or other functor) in future, so I avoided the issue by making the predicate a template type param. If we're happy that only C-style functions (or stateless lambdas) will be passed then I'll put it back as 'bool (*) (const_tree)'. I don't see any diagnostic for the implicit function template extension; my earlier comment about not controlling it with -std=c++1y vs gnu++1y didn't mean it should go away entirely. :) Maybe we should call it part of c++1z, or just control the diagnostic with -pedantic. I must confess I was a bit unclear about how to proceed there. I'll reinstate the two messages and go with a specific diagnostic if -pedantic is set in the non-lambda case. Cheers, Adam
Re: Lambda templates and implicit function templates.
On 09/02/2013 02:27 PM, Adam Butcher wrote: Bug 41933 is specifically about lambda capture; I think you're running into something else. The problem is in expanding the 'ts' capture from the 5.1.2.5. It looks like this: 1 auto vglambda = [](auto printer) { 2 return [=](auto ... ts) { // OK: ts is a function parameter pack 3 printer(std::forward<decltype(ts)>(ts)...); 4 return [=]() { 5 printer(ts ...); 6 }; 7 }; 8 }; It is the expansion of the captured 'ts' on line 5 that I think yields the bug. Ah, yes, that is 41933. I thought you were saying that a testcase without the innermost lambda would fail. Jason
RE: [PATCH] Fix PR tree-optimization/58137
On Fri, 30 Aug 2013 12:34:51, Richard Biener wrote: On Tue, Aug 20, 2013 at 1:01 PM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: This patch fixes a bug in the vectorized pointer arithmetic in the forwprop optimization pass. Although it seems to be impossible to create a vector of pointers with the __attribute__((vector_size(x))) syntax, the vector loop optimization together with the loop unrolling can create code which adds a vector of ptrdiff_t repeatedly to a vector of pointers. The code in tree-ssa-forwprop.c handles program transformations of the form (PTR +- CST1) +- CST2 => PTR +- (CST1 +- CST2) where PTR is a vector of pointers and CST1 and CST2 are vectors of ptrdiff_t; without the fix the result type of CST1 +- CST2 was vector of pointer, which causes the ICE in tree-cfg.c, which sees an undefined pointer + pointer operation. Additionally the check in tree-cfg.c allows expressions of the form CST - PTR, which is clearly wrong. That should be fixed too. The patch was bootstrapped and regression tested on i686-pc-linux-gnu. It seems to me that the forwprop code does not handle the fact that we are permissive as to using PLUS_EXPR instead of POINTER_PLUS_EXPR for vector pointer + offset additions. So instead of doing this dance, the better (and more easily backportable) fix is to simply disable the transform on pointer-valued PLUS_EXPR. Like with if (POINTER_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))) return false; at the beginning of the function. The condition would have to be: if (TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt))) == VECTOR_TYPE && POINTER_TYPE_P (TREE_TYPE (TREE_TYPE (gimple_assign_lhs (stmt))))) return false; I tried this, and it fixes the ICE. However the generated code was still vectorized but much less efficient and used more registers. Fortunately there will be no need to backport this, since this bug does not happen in gcc 4.8.2; I checked that.
I believe it was introduced with the checkin r200059 by Marc Glisse where associate_plusminus was enhanced to handle vector values. Before that only TREE_CODE (rhs2) == INTEGER_CST was handled. Frankly I would prefer the initial version of the patch, because the code is more efficient this way. The vector data is folded correctly, only the data type was wrong and triggered the ICE in tree-cfg.c. Please advise. Thanks Bernd. The real fix is of course to make vector pointer operations properly use POINTER_PLUS_EXPR ... Richard. Regards Bernd Edlinger
[ping**n] reimplement -fstrict-volatile-bitfields, v3
On 09/02/2013 03:10 AM, Richard Biener wrote: Can someone, in a new thread, ping the patches that are still in flight? ISTR having approved bits of some patches before my leave. Here's the current state of the patch set I put together. I've lost track of where the canonical version of Bernd's followup patch is. On 07/09/2013 10:23 AM, Sandra Loosemore wrote: On 06/30/2013 09:24 PM, Sandra Loosemore wrote: Here is my third attempt at cleaning up -fstrict-volatile-bitfields. Part 1 removes the warnings and packedp flag. It is the same as in the last version, and has already been approved. I'll skip reposting it since the patch is here already: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00908.html Part 2 replaces parts 2, 3, and 4 in the last version. I've re-worked this code significantly to try to address Bernd Edlinger's comments on the last version in PR56997. Part 2: http://gcc.gnu.org/ml/gcc-patches/2013-07/msg1.html Part 3 is the test cases, which are the same as in the last version. Nobody has reviewed these but I assume they are OK if Part 2 is approved? http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00912.html Part 4 is new; it makes -fstrict-volatile-bitfields not be the default for any target any more. It is independent of the other changes. Part 4: http://gcc.gnu.org/ml/gcc-patches/2013-07/msg2.html -Sandra
Re: [patch, fortran] PR 56519: Reject impure intrinsic subroutines in DO CONCURRENT
Thomas Koenig wrote: the attached patch rejects impure subroutines, such as RANDOM_NUMBER, within DO CONCURRENT. Regression-tested. OK for trunk? Okay - and thanks for the patch. Unrelated to changes in your patch, I am a bit unhappy about: * Proliferation of global variables * Magic values (0, 1, 2) * Badly translatable strings: "… inside a DO CONCURRENT %s, …" with gfc_do_concurrent_flag == 2 ? "mask" : "block" But as those are unchanged by your patch … Tobias 2013-08-29 Thomas Koenig tkoe...@gcc.gnu.org PR fortran/PR56519 * gfortran.h: Declare gfc_do_concurrent_flag as extern. * resolve.c: Rename do_concurrent_flag to gfc_do_concurrent_flag. and make non-static. (resolve_function): Use gfc_do_concurrent_flag instead of do_concurrent_flag. (pure_subroutine): Likewise. (resolve_code): Likewise. (resolve_types): Likewise. * intrinsic.c (gfc_intrinsic_sub_interface): Raise error for non-pure intrinsic subroutines within DO CONCURRENT. 2013-08-29 Thomas Koenig tkoe...@gcc.gnu.org PR fortran/PR56519 * gfortran.dg/do_concurrent_3.f90: New test case.
Re: [patch, fortran, docs] Unformatted sequential and special files
Thomas Koenig wrote: +Unformatted sequential files are stored using record markers. Each +full record consists of a leading record marker, the data written +by the user program, and a trailing record marker. The record markers +are four-byte integers by default, and eight-byte integers if the +@option{-fmax-subrecord-length=8} option is in effect. Each record +marker contains the number of bytes of data in the record. I wonder whether one should discourage the use of -fmax-subrecord-length=8 more explicitly here - when reading until here one might think that there is a 2GB limit for =4. +The maximum number of bytes of user data in a record is 2147483639 for +a four-byte record marker. Why this number? I had expected it to be exactly 2 GiB (minus 1 Byte) = 2**31-1 = 2147483647. Your number is one byte shorter. Why? Actually, I had also mentioned GiB as those are simpler to remember, e.g. The maximal user data per record is 2 gigabytes (2147483647/2147483639 bytes) for the four-byte record marker. If this is exceeded, a record is split into +subrecords. Each subrecord also has a leading and a trailing record +marker. If the leading record marker contains a negative number, the +number of user data bytes in the subrecord equals the absolute value +of this number, and another subrecord follows the current one. If the +trailing record marker contains a negative number, then the number of +bytes of user data equals the absolute value of that number, and there +is a preceding subrecord. + +The format for unformatted sequential data can be duplicated using +unformatted stream, as shown in this example program: (which assumes records shorter than 2 GiB) + +@smallexample +program main + implicit none + integer :: i I wonder whether one should make this more explicit by using use iso_fortran_env, only: int32 integer(int32) :: i +Unformatted sequential file access is @emph{not} supported for special +files. 
If necessary, it can be simulated using unformatted stream, +see @ref{Unformatted sequential file format}. + +I/O to and from block devices are also not supported. + +@code{BACKSPACE}, @code{REWIND} and @code{ENDFILE} are not supported +for special files. I think I understand what you mean, but "I/O to and from block devices are also not supported." sounds as if it could also apply to all I/O, including stream I/O. How about adding block devices to the list of special files in the intro? You could also mention sockets, which behave similarly. The BACKSPACE/REWIND/ENDFILE part could be moved directly after the itemize, in the same paragraph as the POS=. And instead of "unformatted sequential", you could write "unformatted sequential and direct access", which reminds me that we should also document that file format for the sake of completeness. Tobias
Re: [Patch, Fortran, F03] PR 55603: Memory leak with scalar allocatable function result
Am 27.08.2013 15:09, schrieb Janus Weil: here is a patch for PR 55603, which plugs a memory leak with scalar allocatable function results. To accomplish this, several things are done: 1) Allocatable scalar function results are passed as argument now and returned by reference (just like array or character results, cf. gfc_return_by_reference). [...] In fact the patch is just a first step and does not handle more advanced cases yet (like polymorphic allocatable scalar results, finalization, etc). Hooray, an ABI breakage! (On the other hand, the finalizer already causes some breakage - but this is worse, as with an interface one can override the .mod-version check.) In my attempts to get this working, I kept the current version - but handled derived types and non-derived types separately. The reason was that functions can occur everywhere, but DT/CLASS can only occur at some places. On the other hand, DT/CLASS can have allocatable components and all other kinds of nasty things - and se->post comes too early for that. For some reason, it seems to work if there are no allocatable components and other nastiness. I am not sure which approach is better. In any case, here is my current draft - completely unclean and not touched for about a month. And of course not ready/fully working. (Otherwise, I had posted a patch.) I have not yet looked at your patch - and I will first look through the backlog of gfortran emails/patches before returning to this one. Tobias

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 74e95b0..96de076 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -4226,6 +4226,51 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
	}
       else
	gfc_conv_expr_reference (&parmse, e);
+#if 0
+      /* Finalize function results after their use as actual argument.  */
+      // FIXME: Cleanup of constructors
+      if (e->expr_type == EXPR_FUNCTION && fsym
+	  && (fsym->ts.type == BT_CLASS
+	      || (fsym->ts.type == BT_DERIVED
+		  && gfc_is_finalizable (e->ts.u.derived, NULL))))
+	{
+	  tree final_fndecl, size, array;
+	  gfc_expr *final_expr;
+
+	  if (fsym->ts.type == BT_CLASS)
+	    {
+	      gfc_is_finalizable (CLASS_DATA (e)->ts.u.derived, &final_expr);
+	      final_fndecl = gfc_vtable_final_get (parmse.expr);
+	      size = gfc_vtable_size_get (parmse.expr);
+	      array = gfc_class_data_get (parmse.expr);
+	    }
+	  else
+	    {
+	      gfc_se fse;
+	      gfc_is_finalizable (e->ts.u.derived, &final_expr);
+	      gfc_init_se (&fse, NULL);
+	      gfc_conv_expr (&fse, final_expr);
+	      final_fndecl = fse.expr;
+	      size = gfc_typenode_for_spec (&e->ts);
+	      size = TYPE_SIZE_UNIT (size);
+	      size = fold_convert (gfc_array_index_type, size);
+	      array = parmse.expr;
+	    }
+	  if (POINTER_TYPE_P (TREE_TYPE (final_fndecl)))
+	    final_fndecl
+	      = build_fold_indirect_ref_loc (input_location, final_fndecl);
+	  array = gfc_conv_scalar_to_descriptor (&parmse, array, fsym->attr);
+	  array = gfc_build_addr_expr (NULL_TREE, array);
+	  tmp = build_call_expr_loc (input_location, final_fndecl, 3,
+				     array, size, boolean_false_node);
+	  gfc_add_expr_to_block (&parmse.post, tmp);
+	}
+#endif
       /* Catch base objects that are not variables.  */
       if (e->ts.type == BT_CLASS
@@ -5562,6 +5607,29 @@ gfc_conv_function_expr (gfc_se * se, gfc_expr * expr)
     gfc_conv_procedure_call (se, sym, expr->value.function.actual, expr,
			      NULL);
+
+  /* Ensure that allocatable scalars get deallocated; we only handle
+     nonderived types, as for TYPE/CLASS one runs into ordering problems
+     with allocatable components.  On the other hand, TYPE and CLASS
+     can only occur with assignment and as actual argument, contrary to
+     intrinsic types.  */
+  if (sym->ts.type != BT_CLASS && sym->ts.type != BT_DERIVED
+      && ((sym->result && !sym->result->as && sym->result->attr.allocatable)
+	  || (!sym->result && !sym->as && sym->attr.allocatable)))
+    {
+      tree tmp;
+      bool undo_deref = !POINTER_TYPE_P (TREE_TYPE (se->expr));
+
+      if (undo_deref)
+	se->expr = gfc_build_addr_expr (NULL, se->expr);
+
+      tmp = gfc_create_var (TREE_TYPE (se->expr), NULL);
+      gfc_add_modify (&se->pre, tmp, se->expr);
+
+      se->expr = tmp;
+      if (undo_deref)
+	se->expr = build_fold_indirect_ref_loc (input_location, se->expr);
+      gfc_add_expr_to_block (&se->post, gfc_call_free (tmp));
+    }
 }
@@ -5665,7 +5733,18 @@ gfc_conv_initializer (gfc_expr * expr, gfc_typespec * ts, tree type,
   else if (pointer || procptr)
     {
       if (!expr || expr->expr_type == EXPR_NULL)
-	return fold_convert (type, null_pointer_node);
+	{
+	  if (ts->type == BT_CLASS)
+	    {
+	      gfc_init_se (&se, NULL);
+	      gfc_conv_structure (&se, gfc_class_null_initializer (ts, expr), 1);
+	      gcc_assert (TREE_CODE (se.expr) == CONSTRUCTOR);
+
Fix gcc.dg/lto/pr56297 failure
Hi, this patch fixes the gcc.dg/lto/pr56297 failure. Here we have two modules defining global variables assigned to hard registers. Those variables do not go into the assembler name hash because they have no real assembler name, and consequently they are not merged. We however may end up with two declarations being merged by tree merging, and then symtab needs to compensate. We solve precisely the same problem for builtins and abstract functions already. This patch thus makes us do the right thing for variables, too. I also added a comment since the code is a bit weird. Bootstrapped/regtested x86_64-linux, will commit it shortly. Honza

	* lto-symtab.c (lto_symtab_merge_symbols): Add comments;
	merge duplicated nodes for assembler names.
	* symtab.c (symtab_unregister_node): Do not attempt to unlink
	hard registers from assembler name hash.

Index: lto-symtab.c
===================================================================
--- lto-symtab.c	(revision 202182)
+++ lto-symtab.c	(working copy)
@@ -586,6 +586,9 @@ lto_symtab_merge_symbols (void)
       FOR_EACH_SYMBOL (node)
	{
	  cgraph_node *cnode, *cnode2;
+	  varpool_node *vnode;
+	  symtab_node node2;
+
	  if (!node->symbol.analyzed && node->symbol.alias_target)
	    {
	      symtab_node tgt = symtab_node_for_asm (node->symbol.alias_target);
@@ -594,22 +597,37 @@ lto_symtab_merge_symbols (void)
	      symtab_resolve_alias (node, tgt);
	    }
	  node->symbol.aux = NULL;
-
+
	  if (!(cnode = dyn_cast <cgraph_node> (node))
	      || !cnode->clone_of
	      || cnode->clone_of->symbol.decl != cnode->symbol.decl)
	    {
+	      /* Builtins are not merged via decl merging.  It is however
+		 possible that tree merging unified the declaration.  We
+		 do not want duplicate entries in symbol table.  */
	      if (cnode && DECL_BUILT_IN (node->symbol.decl)
		  && (cnode2 = cgraph_get_node (node->symbol.decl))
		  && cnode2 != cnode)
		lto_cgraph_replace_node (cnode2, cnode);
+
+	      /* The user defined assembler variables are also not unified by
+		 their symbol name (since it is irrelevant), but we need to
+		 unify symbol nodes if tree merging occurred.  */
+	      if ((vnode = dyn_cast <varpool_node> (node))
+		  && DECL_HARD_REGISTER (vnode->symbol.decl)
+		  && (node2 = symtab_get_node (vnode->symbol.decl))
+		  && node2 != node)
+		lto_varpool_replace_node (dyn_cast <varpool_node> (node2),
+					  vnode);
+
+	      /* Abstract functions may have duplicated cgraph nodes attached;
+		 remove them.  */
	      else if (cnode && DECL_ABSTRACT (cnode->symbol.decl)
		       && (cnode2 = cgraph_get_node (node->symbol.decl))
		       && cnode2 != cnode)
		cgraph_remove_node (cnode2);
+
	      symtab_insert_node_to_hashtable ((symtab_node) node);
	    }
	}
Index: symtab.c
===================================================================
--- symtab.c	(revision 202182)
+++ symtab.c	(working copy)
@@ -283,7 +283,8 @@ symtab_unregister_node (symtab_node node
       else
	*slot = replacement_node;
     }
-  unlink_from_assembler_name_hash (node, false);
+  if (!is_a <varpool_node> (node) || !DECL_HARD_REGISTER (node->symbol.decl))
+    unlink_from_assembler_name_hash (node, false);
 }

 /* Return symbol table node associated with DECL, if any,
Re: [PATCH 1/4] Support lambda templates.
On 02.09.2013 19:34, Jason Merrill wrote: On 09/02/2013 02:30 PM, Adam Butcher wrote: On 01.09.2013 21:22, Jason Merrill wrote: I bet we want convert_from_reference in the non-generic lambda case, too. I think I had made that change originally to keep the two impls the same and I hit issues with non-generic lambdas. But I can't remember the details. I'll try again. Okay, finally got around to trying this again. With convert_from_reference in the non-generic case, the code compiles okay but SEGVs on the attempt to branch to '_FUN'.

  105  auto lf0 = [] (float a, int const b) { return a += b; };
  106
  107  INVOKEi (lf, float, 7, 0);
  108  AS_FUNi (lf, float, 7, 0);
  109  AS_PTRi (lf, float, int, 7, 0);

  0x404500 <main()+14687>  mov    %eax,-0x4bc(%rbp)
  0x404506 <main()+14693>  mov    0x36f0(%rip),%eax   # 0x407bfc
  0x40450c <main()+14699>  mov    %eax,-0x4c0(%rbp)
  0x404512 <main()+14705>  movl   $0x7,-0x2a4(%rbp)
  0x40451c <main()+14715>  lea    -0x2a4(%rbp),%rdx
  0x404523 <main()+14722>  lea    -0x4bc(%rbp),%rax
  0x40452a <main()+14729>  mov    %rdx,%rsi
  0x40452d <main()+14732>  mov    %rax,%rdi
  0x404530 <main()+14735>  callq  0x400934 <lambda(float, int const)::_FUN(float&, const int&)>

If it works with convert_from_reference in both cases, should I push, or should I sort out the parameter pack conversion op issue and roll that up into this? I think roll them together, since that patch rewrites parts of this one. Will assume, for now, that the convert_from_reference call is not wanted in the non-generic case (maybe something to do with using 'build_call_a' instead of 'build_nt_call_vec', or the convert_from_reference on the call itself?) and will focus on the parameter pack stuff (when I get a chance). Cheers, Adam
Re: [Patch] Rewrite regex matchers
On Fri, Aug 30, 2013 at 10:04 PM, Tim Shen timshe...@gmail.com wrote: I didn't use any tool to check that, but adjust it by hand. It shouldn't break anything, but I'll test it again before committing. Tested under -m32, -m64 and debug mode and committed. -- Tim Shen
Cleanup CFG after profile read/instrumentation
Hi, reading the profile/instrumenting breaks basic blocks and introduces fake edges. The broken basic blocks are not re-merged until after LTO streaming, which is wasteful. Fixed thus. Profile-bootstrapped/regtested ppc64-linux, committed.

Index: tree-profile.c
===================================================================
--- tree-profile.c	(revision 202185)
+++ tree-profile.c	(working copy)
@@ -598,6 +598,8 @@ tree_profiling (void)
	}
     }

+  /* Re-merge split blocks.  */
+  cleanup_tree_cfg ();
   update_ssa (TODO_update_ssa);
   rebuild_cgraph_edges ();
[PATCH] Portable Volatility Warning
This is a follow-up patch for Sandra Loosemore's patch in this thread: reimplement -fstrict-volatile-bitfields, v3. It was already posted a few weeks ago, but in the wrong thread. Therefore I re-post it herewith. It was initially suggested by Hans-Peter Nilsson, and I had much help from him in putting everything together. Thanks again, H-P. Here is a short specification: The -Wportable-volatility warning is an optional warning about code for which separate, incompatible definitions exist on different platforms (or between C and C++), even within gcc. It will be usable for driver code you want to be portable across different architectures. This warning should only be emitted if the generated code differs significantly depending on whether -fstrict-volatile-bitfields is used. It should not be emitted for code which is not affected by this option. In other words, it should be emitted on all bit-fields when the definition or the context is volatile, except when the whole structure is not AAPCS ABI compliant, i.e. packed or unaligned. On the other hand, you should always get the same warnings whether or not you combine -Wportable-volatility with -fstrict-volatile-bitfields. And of course it should not depend on the specific target used to compile. I boot-strapped this on i686-pc-linux-gnu and all Wportable-volatility test cases pass for C and C++. Additionally, I used a cross-compiler for arm-eabi to manually cross-check that the warnings are independent of the target platform. Regards Bernd Edlinger 2013-09-03 Bernd Edlinger bernd.edlin...@hotmail.de Implement -Wportable-volatility warning to warn about code which accesses volatile structure members for which different ABI specifications exist. * expr.c (check_portable_volatility): New function. (expand_assignment): Call check_portable_volatility. (expand_expr_real_1): Likewise. * fold-const.c (optimize_bit_field_compare): Handle warn_portable_volatility.
Removed if-statement, because condition flag_strict_volatile_bitfields 0 is always false. * stor-layout.c (layout_decl): Handle warn_portable_volatility. * c-family/c.opt: Add -Wportable-volatility option. * doc/invoke.texi: Add documentation about -Wportable-volatility. testsuite: c-c++-common/ * Wportable-volatility-1.c: New testcase. * Wportable-volatility-2.c: New testcase. patch-portable-volatility.diff Description: Binary data
RE: [ping**n] reimplement -fstrict-volatile-bitfields, v3
On Mon, 2 Sep 2013 12:56:22 Sandra Loosemore wrote: On 09/02/2013 03:10 AM, Richard Biener wrote: Can someone, in a new thread, ping the patches that are still in flight? ISTR having approved bits of some patches before my leave. Here's the current state of the patch set I put together. I've lost track of where the canonical version of Bernd's followup patch is. On 07/09/2013 10:23 AM, Sandra Loosemore wrote: On 06/30/2013 09:24 PM, Sandra Loosemore wrote: Here is my third attempt at cleaning up -fstrict-volatile-bitfields. Part 1 removes the warnings and packedp flag. It is the same as in the last version, and has already been approved. I'll skip reposting it since the patch is here already: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00908.html Part 2 replaces parts 2, 3, and 4 in the last version. I've re-worked this code significantly to try to address Bernd Edlinger's comments on the last version in PR56997. Part 2: http://gcc.gnu.org/ml/gcc-patches/2013-07/msg1.html Part 3 is the test cases, which are the same as in the last version. Nobody has reviewed these but I assume they are OK if Part 2 is approved? regarding Part 3, I have a small comment on it: The test programs pr56997-*.c depend on stdint.h and other headers. I stumbled over it because I tried to compile the test programs with an eCos cross-compiler, and eCos happens to not have stdint.h. Many test cases try to avoid all dependencies on include files. http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00912.html Part 4 is new; it makes -fstrict-volatile-bitfields not be the default for any target any more. It is independent of the other changes. Part 4: http://gcc.gnu.org/ml/gcc-patches/2013-07/msg2.html -Sandra And the warnings part is re-posted here: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00100.html Bernd.
Re: [PATCH 1/4] Support lambda templates.
On 09/02/2013 05:18 PM, Adam Butcher wrote: Will assume, for now, that the convert_from_reference call is not wanted in the non-generic case (maybe something to do with using 'build_call_a' instead of 'build_nt_call_vec' or the convert_from_reference on the call itself?) Ah, yes; we are passing references through directly as pointers rather than doing the equivalent of *&. and will focus on the parameter pack stuff (when I get a chance). Sounds good. Jason
[bootstrap] Fix build for several targets (including pr58242)
Hi, Several builds are broken after r201838. Here is the fix, awaiting review: http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01245.html The targets that are broken now: 1) *linux* targets that do not include config/linux.h in their tm.h (e.g. alpha-linux, ppc64-linux etc). For them we have: ../../../../gcc/gcc/config/linux-android.c: In function ‘bool linux_android_libc_has_function(function_class)’: ../../../../gcc/gcc/config/linux-android.c:40:7: error: ‘OPTION_BIONIC’ was not declared in this scope if (OPTION_BIONIC) ^ make[2]: *** [linux-android.o] Error 1 This is addressed in the changes to config/linux-android.c: linux_libc, LIBC_GLIBC and LIBC_BIONIC seem to be declared for all *linux* targets. 2) *uclinux* targets that include config/linux.h. For *uclinux* we do not use linux-protos.h, and therefore linux_android_libc_has_function is not declared there. I don't want to add additional tmake_file, tm_p_file and extra_objs, so I added an explicit define of TARGET_LIBC_HAS_FUNCTION as no_c99_libc_has_function for those targets. 3) *linux* targets that do not append to tmake_file (bfin*-linux-uclibc* or crisv32-*-linux* | cris-*-linux*) Alexander