date:20130417

Re: LRA assign same hard register with live range overlapped pseduos

2013-04-17 Thread Vladimir Makarov


On 13-04-15 1:20 AM, shiva Chen wrote:

HI,

I'm trying to port a new 32bit target to GCC 4.8.0 with LRA enabled

There is an error case which generates following RTL


  (insn 536 267 643 3 (set (reg/f:SI 0 $r0 [477])  <== r477 assign to r0
  (plus:SI (reg/f:SI 31 $sp)
  (const_int 112 [0x70]))) test2.c:95 64 {*addsi3}
   (nil))
  (insn 643 536 537 3 (set (reg/f:SI 0 $r0 [565])   <== r565 assign to r0, and corrupt the usage of r477
  (reg/f:SI 31 $sp)) test2.c:95 44 {*movsi}
   (nil))
  (insn 537 643 538 3 (set (reg/v:SI 13 $r13 [orig:61 i14 ] [61])
  (mem/c:SI (plus:SI (reg/f:SI 0 $r0 [565])   <== use r565
  (const_int 136 [0x88])) [5 %sfp+24 S4 A32])) test2.c:95 39 {*load_si}
   (expr_list:REG_DEAD (reg/f:SI 0 $r0 [565])
  (nil)))
...
  (insn 539 540 270 3 (set (reg:SI 0 $r0 [479])
  (plus:SI (reg/f:SI 0 $r0 [477])
  (reg:SI 5 $r5 [480]))) test2.c:95 62 {*add_16bit}
   (expr_list:REG_DEAD (reg:SI 5 $r5 [480])
 (expr_list:REG_DEAD (reg/f:SI 0 $r0 [477]) <== use r477 which should be $sp +112

Note that the live ranges of r477 and r565 overlap, yet both are assigned
the same register $r0. (r31 is the stack pointer.)

By tracing the LRA process, I noticed that when r477 is created,
the lra_reg_info[r477].val = lra_reg_info[r31] due to (set r477 r31).
But after lra_eliminate(), the stack offset changes and
r477 is equal to r31+112 instead.

In the next LRA iteration, r565 is created, and r565 = r31.

In that case, the content of r477 should be treated as not equal to
that of r565, because the elimination offset has changed.

Otherwise, r565 and r477 may be assigned to the same hard register.


To recognize that, I record the elimination offset when the pseudo
register is created.

Register contents are the same only when both lra_reg_info[].val and
lra_reg_info[].offset are equal.
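
The rule stated above can be illustrated with a minimal, self-contained sketch. The struct and function names here (reg_value_info, same_content_p) are hypothetical stand-ins chosen for illustration, not identifiers from the GCC sources; only the two fields val and offset come from the message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two fields of struct lra_reg that the
   message discusses; not the real GCC structure.  */
struct reg_value_info
{
  int val;     /* Value number shared by pseudos assumed to hold the same value.  */
  int offset;  /* Elimination offset recorded for the pseudo.  */
};

/* Return true only when both the value number and the recorded offset
   agree, which is the condition the proposed patch tests before it
   treats two live pseudos as non-conflicting.  */
static bool
same_content_p (const struct reg_value_info *a,
                const struct reg_value_info *b)
{
  return a->val == b->val && a->offset == b->offset;
}
```

With this check, a pseudo recorded as $sp+112 and a pseudo recorded as $sp+0 share a value number but are no longer considered equal.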


  gcc/lra-assigns.c |6 --
  gcc/lra-int.h |2 ++
  gcc/lra.c |   12 +++-
  3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index b204513..daf0aa9 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -448,7 +448,7 @@ find_hard_regno_for (int regno, int *cost, int
try_only_hard_regno)
int hr, conflict_hr, nregs;
enum machine_mode biggest_mode;
unsigned int k, conflict_regno;
-  int val, biggest_nregs, nregs_diff;
+  int offset, val, biggest_nregs, nregs_diff;
enum reg_class rclass;
bitmap_iterator bi;
bool *rclass_intersect_p;
@@ -508,9 +508,11 @@ find_hard_regno_for (int regno, int *cost, int
try_only_hard_regno)
  #endif
sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno);
val = lra_reg_info[regno].val;
+  offset = lra_reg_info[regno].offset;
CLEAR_HARD_REG_SET (impossible_start_hard_regs);
EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
-if (val == lra_reg_info[conflict_regno].val)
+if ((val == lra_reg_info[conflict_regno].val)
+ && (offset == lra_reg_info[conflict_regno].offset))
{
 conflict_hr = live_pseudos_reg_renumber[conflict_regno];
 nregs = (hard_regno_nregs[conflict_hr]
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 98f2ff7..8ae4eb0 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -116,6 +116,8 @@ struct lra_reg
/* Value holding by register. If the pseudos have the same value
   they do not conflict.  */
int val;
+  /* Eliminate offset of the pseduo have been created.  */
+  int offset;
/* These members are set up in lra-lives.c and updated in
   lra-coalesce.c.  */
/* The biggest size mode in which each pseudo reg is referred in
diff --git a/gcc/lra.c b/gcc/lra.c
index 9df24b5..69962be 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -194,7 +194,17 @@ lra_create_new_reg (enum machine_mode md_mode,
rtx original,
new_reg
  = lra_create_new_reg_with_unique_value (md_mode, original, rclass, title);
if (original != NULL_RTX && REG_P (original))
-lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val;
+{
+  lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val;
+
+  rtx x = lra_eliminate_regs (original, VOIDmode, NULL_RTX);
+
+  if (GET_CODE (x) == PLUS
+  && GET_CODE (XEXP (x, 1)) == CONST_INT)
+   lra_reg_info[REGNO (new_reg)].offset = INTVAL (XEXP (x, 1));
+  else
+   lra_reg_info[REGNO (new_reg)].offset = 0;
+}
return new_reg;
  }

--
1.7.9.5


Comments?


Thanks for working on it, Shiva.  Could you send me the full dump for lra
(and ira if possible) so I can better understand the problem situation?
It is hard for me to say now that your solution is complete (e.g.
offsets can be changed again).

2 quick things: dead link + potential fix

2013-04-17 Thread Nicole Stoff

Hi there,

A quick note to say, first, thank you for providing such a great site! I am in 
the middle of a writing project (topic: finding - and keeping - a job in the 
modern world) and discovered your site while researching. And, second, I wanted 
to let you know that I did find a link that’s no longer working (on this page: 
http://gcc.gnu.org/ml/gcc/2002-11/msg00060.html) - you’re still linking to 
Yahoo’s old “hot jobs” site: http://hotjobs.yahoo.com/

This site was awesome in its day; unfortunately it is no longer live. If you 
would like to replace the site and need a recommendation, I wondered whether 
you might consider the job resource page provided by Answers.com? Check it out 
and see if you agree that it is a good replacement:

Job Search Help from Answers.com
http://jobs.answers.com/

Thanks for your time, and again, thank you for providing a great site!

Nicki

Nicole Stoff
Research Assistant
www.answers.com

Re: increasing testsuite-errors when optimizing for amdfam10/bdver2

2013-04-17 Thread Winfried Magerl

Hi,

looks like XOP/FMA4/FMA is responsible for the additional errors I
see when running gcc-testsuite or glibc-testsuite. I've opened Bug
56866 as a starting point, so the subject is a little bit misleading:

Bug 56866 - gcc 4.7.x/gcc-4.8.x with '-O3 -march=bdver2' misscompiles
glibc-2.17/crypt/sha512.c

Disabling XOP/FMA4/FMA shows no regression (compared with amdfam10) with
glibc-testsuite and no additional execution-errors in the gcc-testsuite.

Currently I'm running gcc-4.8-branch configured with '--with-arch=bdver2'
and with a simple patch disabling XOP/FMA4/FMA for bdver2 in
gcc/config/i386/i386.c.

regards

winfried

On Mon, Apr 01, 2013 at 08:44:59PM +0200, winfried.mag...@t-online.de wrote:
 Hi,
 
 replacing my AMD Phenom2 with a AMD Piledriver (Bulldozer Version2)
 was reason enough for me to recompile gcc (and the whole linux-system)
 with hard optimisation set to bdver2 (as I've done since my first
 linux on an 68030).
 But this time an increasing number of errors makes me a little bit nervous
 and after some additional errors when running the glibc-2.17-testsuite
 I've refused to use this optimisation as default on my system.
 
 The results might be interesting for the gcc-developer-community and I've
 mailed four results with different set of '--with-arch' and '--with-tune'
 to gcc-testresu...@gcc.gnu.org from stock gcc-4.8.0.
 I've set '--build=x86_64-winnix-linux-gnu' just to make it easier to search
 the archive for this specific results (results include the complete set
 of relevant libs/tools).
 
 Basic flags for every compile/test-run:
 
 --build=x86_64-winnix-linux-gnu --enable-languages=c,c++ --enable-shared 
 --prefix=/usr --enable-multilib=no
 
 optimization for phenom2 (I've used since I've replaced
 my Athlon-FX):
 --with-arch=amdfam10 --with-tune=amdfam10
 
 soft-optimization for bdver2 which is the current configuration
 I use on my system (no additional errors in glibc-2.17):
 --with-arch=amdfam10 --with-tune=bdver2
 
 optimization for bdver2:
 --with-arch=bdver2 --with-tune=bdver2
 
 The number of additional errors is always increasing. Mostly errors
 in scan-assembler and scan-tree-dump (maybe wrong expectations in the
 tests?) but with arch=bdver2 I see an increasing number of
 execution-tests failing.
 
 Surprisingly (at least for me) the difference is only visible in the
 gcc-testsuite and doesn't harm other languages.
 
 I've done some work to ensure errors are not related to the system-setup
 and maybe it's of interest what I've learned during this process:
 
 gcc.dg/guality/vla-1.c and vla-2.c depends on the gdb-version. Fails
 with stock gdb-7.5.1 (also tested prerelease gdb-7.5.91) and don't
 fail with gdb-patches from opensuse (fedora-patches works also).
 Using tcl8.6.0 as base for expect/dejagnu doesn't currently work,
 at least not with the gcc-testsuite.
 
 Please note that this is not a regression and that gcc-4.7.x gives
 very similar results.
 
 Thank you for listening and all the good work I appreciate since
 20 years with all sorts of cpu's and operating-systems gcc
 supports!
 
 best regards
 
 winfried

Delay slot filling - what still matters, and what doesn't matter so much anymore?

2013-04-17 Thread Steven Bosscher

Hello delay-slot target maintainers :-)

As you know, I'm playing with a new for-now-toy delay slot filling
pass that preserves the CFG, and uses DF and sched-deps instead of
resource.c. It's now beginning to take form enough that I run into the
to-be-expected unexpected problems and questions. The biggest problem
is that I have never been this far down into machine details since the
DFA scheduler conversions, and have never worked with targets that
have delay slots. I have no idea what really matters, and I hope you
can help me with some of those questions.


First of all: What is still important to handle?

It's clear that the expectations in reorg.c are "anything goes" but
modern RISCs (everything since the PA-8000, say) probably have some
limitations on what is helpful to have, or not have, in a delay slot.
According to the comments in pa.h about MASK_JUMP_IN_DELAY, having
jumps in delay slots of other jumps is one such thing: They don't
bring benefit to the PA-8000 and they don't work with DWARF2 CFI. As
far as I know, SPARC and MIPS don't allow jumps in delay slots, SH
looks like it doesn't allow it either, and CRIS can do it for short
branches but doesn't do it because the trade-off between benefit and
machine description complexity comes out negative. On the scheduler
implementation side: Branches as delayed insns in delay slots of other
branches is impossible to express in the CFG (at least in GCC, but I
think in general it can't be done cleanly). Therefore I want to drop
support for branches in delay slots. What do you think about this?

What about multiple delay slots? It looks like reorg.c has code to
handle insns with multiple delay slots, but there currently are no GCC
targets in the FSF tree that have insns with multiple delay slots and
that use define_delay. The C6X has many more delay slots than just 1
(it can have up to 5 delay slots IIRC) but it is much more flexible
than traditional RISCs when it comes to putting insns in delay slots
(it uses predication so it can annul delayed insns on various
conditions) and it uses a very clever (and effective??) delay slot
filling mechanism via the normal scheduler, using back-tracking and
jump shadows (see UNSPEC_JUMP_SHADOW in the c6x back end). But C6X
doesn't use reorg.c delay slot scheduling. I'm not aware of any
non-VLIW, non-DSP targets with more than one delay slot per insn, and
new VLIW/DSP ports with delay slots probably should look at c6x rather
than using define_delay. Supporting only a single delay slot per
delay_insn would make my scheduler a bit less complex. Would that be
enough for everyone, or is it necessary to continue to support
multiple delay slots per insn?


Another thing I completely fail to grasp, is how the pipeline
scheduler and delay slots interact. Doesn't dbr_schedule destroy all
the good work schedule_insns has tried to do? If so, how much does
that hurt on modern RISCs?


Related question: What, if anything, currently prevents dbr_schedule
from causing pipeline stalls by stuffing a long-latency insn in a
delay slot? I'm currently using a cost function using:

cost = insn_default_latency (trial_insn) - insn_default_latency (delay_insn);

saying that a trial_insn with greater latency than delay_insn, and
from the same basic block as delay_insn, should not be put in the
delay slot. But that's preventing my scheduler from filling slots that
reorg.c does fill. For example a case like this on sparc, where cost=1
is greater than the cost threshold I'm using (cost==0 i.e. no cost):

(gdb) p debug_rtx(delay_insn)
(jump_insn 18 0 0 2 (set (pc)
(if_then_else (gt (reg:CCX 100 %icc)
(const_int 0 [0]))
(label_ref:DI 77)
(pc))) t.c:18 48 {*normal_branch}
 (expr_list:REG_DEAD (reg:CCX 100 %icc)
(expr_list:REG_BR_PROB (const_int 2900 [0xb54])
(nil)))
 -> 77)
$5 = void
(gdb) p insn_default_latency(delay_insn)
$6 = 1
(gdb) p debug_rtx(trial_insn)
(insn/s:TI 16 13 17 2 (set (reg/v:DI 26 %i2 [orig:112 d ] [112])
(mem/c:DI (plus:DI (reg/f:DI 1 %g1 [122])
(const_int 24 [0x18])) [2 x+24 S8 A64])) t.c:14 72
{*movdi_insn_sp64}
 (expr_list:REG_DEAD (reg/f:DI 1 %g1 [122])
(nil)))
$7 = void
(gdb) p insn_default_latency(trial_insn)
$8 = 2
(gdb)

What do you think will be a good strategy to deal with this (short of
integrating delay slot filling in the scheduler proper)? Should I try
to find cost==0 delay slot candidates, and only fill slots with cost>0
candidates if nothing cheap is available? Prefer a nop over cost>0
candidates? Ignore insn_default_latency?
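
One of the options raised above (prefer a free candidate, fall back to the cheapest costly one, otherwise leave a nop) could be sketched roughly as follows. This is only an illustration under stated assumptions: struct slot_candidate, slot_cost, pick_slot_candidate and the max_cost knob are hypothetical names, and the latency field merely plays the role of the insn_default_latency value used in the formula above.

```c
#include <stddef.h>

/* Hypothetical candidate record; `latency` stands in for the value that
   insn_default_latency would return for the trial insn.  */
struct slot_candidate
{
  int uid;
  int latency;
};

/* cost = latency (trial_insn) - latency (delay_insn), as in the formula
   quoted in the message.  */
static int
slot_cost (const struct slot_candidate *trial, int delay_insn_latency)
{
  return trial->latency - delay_insn_latency;
}

/* Return the first candidate whose cost is <= 0.  If there is none,
   return the candidate with the smallest positive cost that does not
   exceed MAX_COST.  Return NULL when nothing qualifies, meaning the
   slot would keep its nop.  */
static const struct slot_candidate *
pick_slot_candidate (const struct slot_candidate *cands, size_t n,
                     int delay_insn_latency, int max_cost)
{
  const struct slot_candidate *fallback = NULL;
  int fallback_cost = 0;

  for (size_t i = 0; i < n; i++)
    {
      int cost = slot_cost (&cands[i], delay_insn_latency);
      if (cost <= 0)
        return &cands[i];
      if (cost <= max_cost && (fallback == NULL || cost < fallback_cost))
        {
          fallback = &cands[i];
          fallback_cost = cost;
        }
    }
  return fallback;
}
```

For the SPARC case shown above (branch latency 1, load latency 2), max_cost = 0 leaves the slot empty, while max_cost = 1 would accept the load, matching what reorg.c does there.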


Another thing I noticed about targets with delay slots that can be
nullified, is that at least some of the ifcvt.c transformations could
be applied to fill more delay slots (obviously if_case_1 and
if_case_2). In reorg.c, optimize_skip does some kind of if-conversion.
Has anyone looked at whether optimize_skip still does something, and
derived a test case for that?


Thanks for any

Re: LRA assign same hard register with live range overlapped pseduos

2013-04-17 Thread Shiva Chen

The full test2.c.209r.reload is about 296kb and I can't send it successfully.
Is there another way to send the dump file?

Shiva

2013/4/18 Shiva Chen shiva0...@gmail.com:
 Hi, Vladimir

 attachment is the ira dump of the case

 Shiva


Re: LRA assign same hard register with live range overlapped pseduos

2013-04-17 Thread Shiva Chen

Hi, Vladimir

The previous patch was probably incomplete.
The new patch records lra_reg_info[i].offset as the offset from the
eliminable register to pseudo i
and keeps updating it whenever the stack offset changes.
Therefore, lra-assign can use the latest offset to decide whether the
contents of two pseudos are equal.
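
The update step described here can be pictured with a small standalone sketch. The names pseudo_info and update_offsets are hypothetical; the test `val - 1 == from_regno` simply mirrors the condition used in the patch below and is taken on trust from it rather than verified against the LRA sources.

```c
#include <stddef.h>

/* Hypothetical stand-in for the val/offset pair kept per pseudo.  */
struct pseudo_info
{
  int val;
  int offset;
};

/* When the elimination offset of register FROM_REGNO changes from
   PREVIOUS_OFFSET to NEW_OFFSET, add the difference to every pseudo
   whose value number refers to that register.  The `val - 1` mapping
   copies the patch's own test and is an assumption here.  */
static void
update_offsets (struct pseudo_info *info, size_t n, int from_regno,
                int previous_offset, int new_offset)
{
  int delta = new_offset - previous_offset;

  for (size_t i = 0; i < n; i++)
    if (info[i].val - 1 == from_regno)
      info[i].offset += delta;
}
```

A pseudo created later with offset 0 then differs from one created before the stack grew, so the two are no longer treated as holding the same value.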

 gcc/lra-assigns.c  |6 --
 gcc/lra-eliminations.c |   12 ++--
 gcc/lra-int.h  |2 ++
 gcc/lra.c  |5 -
 4 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index b204513..daf0aa9 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -448,7 +448,7 @@ find_hard_regno_for (int regno, int *cost, int
try_only_hard_regno)
   int hr, conflict_hr, nregs;
   enum machine_mode biggest_mode;
   unsigned int k, conflict_regno;
-  int val, biggest_nregs, nregs_diff;
+  int offset, val, biggest_nregs, nregs_diff;
   enum reg_class rclass;
   bitmap_iterator bi;
   bool *rclass_intersect_p;
@@ -508,9 +508,11 @@ find_hard_regno_for (int regno, int *cost, int
try_only_hard_regno)
 #endif
   sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno);
   val = lra_reg_info[regno].val;
+  offset = lra_reg_info[regno].offset;
   CLEAR_HARD_REG_SET (impossible_start_hard_regs);
   EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
-if (val == lra_reg_info[conflict_regno].val)
+if ((val == lra_reg_info[conflict_regno].val)
+ && (offset == lra_reg_info[conflict_regno].offset))
   {
conflict_hr = live_pseudos_reg_renumber[conflict_regno];
nregs = (hard_regno_nregs[conflict_hr]
diff --git a/gcc/lra-eliminations.c b/gcc/lra-eliminations.c
index 9df0bae..2d34b51 100644
--- a/gcc/lra-eliminations.c
+++ b/gcc/lra-eliminations.c
@@ -1046,6 +1046,7 @@ spill_pseudos (HARD_REG_SET set)
 static void
 update_reg_eliminate (bitmap insns_with_changed_offsets)
 {
+  int i;
   bool prev;
   struct elim_table *ep, *ep1;
   HARD_REG_SET temp_hard_reg_set;
@@ -1124,8 +1125,15 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
   setup_elimination_map ();
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
 if (elimination_map[ep->from] == ep && ep->previous_offset != ep->offset)
-  bitmap_ior_into (insns_with_changed_offsets,
-  lra_reg_info[ep->from].insn_bitmap);
+  {
+bitmap_ior_into (insns_with_changed_offsets,
+lra_reg_info[ep->from].insn_bitmap);
+
+   /* Update offset when the eliminate offset have been changed.  */
+for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+ if (lra_reg_info[i].val - 1 == ep->from)
+   lra_reg_info[i].offset += (ep->offset - ep->previous_offset);
+  }
 }

 /* Initialize the table of hard registers to eliminate.
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 98f2ff7..944cad1 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -116,6 +116,8 @@ struct lra_reg
   /* Value holding by register. If the pseudos have the same value
  they do not conflict.  */
   int val;
+  /* Offset from relative eliminate register to pesudo reg.  */
+  int offset;
   /* These members are set up in lra-lives.c and updated in
  lra-coalesce.c.  */
   /* The biggest size mode in which each pseudo reg is referred in
diff --git a/gcc/lra.c b/gcc/lra.c
index 9df24b5..7a60281 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -194,7 +194,10 @@ lra_create_new_reg (enum machine_mode md_mode,
rtx original,
   new_reg
 = lra_create_new_reg_with_unique_value (md_mode, original, rclass, title);
   if (original != NULL_RTX && REG_P (original))
-lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val;
+{
+  lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val;
+  lra_reg_info[REGNO (new_reg)].offset = 0;
+}
   return new_reg;
 }


Thanks for the comment :)
Shiva


Re: Delay slot filling - what still matters, and what doesn't matter so much anymore?

2013-04-17 Thread Jeff Law


On 04/17/2013 03:52 PM, Steven Bosscher wrote:

> First of all: What is still important to handle?
>
> It's clear that the expectations in reorg.c are "anything goes" but
> modern RISCs (everything since the PA-8000, say) probably have some
> limitations on what is helpful to have, or not have, in a delay slot.
> According to the comments in pa.h about MASK_JUMP_IN_DELAY, having
> jumps in delay slots of other jumps is one such thing: They don't
> bring benefit to the PA-8000 and they don't work with DWARF2 CFI. As
> far as I know, SPARC and MIPS don't allow jumps in delay slots, SH
> looks like it doesn't allow it either, and CRIS can do it for short
> branches but doesn't do it because the trade-off between benefit and
> machine description complexity comes out negative.

Note that sparc and/or mips might use the "adjust the return pointer"
trick.  I know it wasn't my idea when I added it to the PA.

Now the PA really can do jumps in the delay slot of another jump, but
the semantics are such that it's not all that helpful and we've never
tried to model it.  You effectively get a single instruction executed at
the first branch target, then you transfer to the second branch target
IIRC.  It's actually pretty natural semantics once you look at how the pc
queues work on the PA.

> On the scheduler
> implementation side: Branches as delayed insns in delay slots of other
> branches is impossible to express in the CFG (at least in GCC, but I
> think in general it can't be done cleanly). Therefore I want to drop
> support for branches in delay slots. What do you think about this?

Certainly no need to support it in the generic case.  The only question
is whether or not it's worth supporting the "adjust the return pointer in
the delay slot" stuff.  Given a target without a call/ret predictor stack,
it can be a significant advantage.  Such things might exist in the
embedded space.

> What about multiple delay slots? It looks like reorg.c has code to
> handle insns with multiple delay slots, but there currently are no GCC
> targets in the FSF tree that have insns with multiple delay slots and
> that use define_delay.

Ping Hans, I think he was the last person who tried to deal with reorg
and multiple delay slots (c4x?).  I certainly wouldn't lose any sleep if
we killed the limited support for multiple delay slots.

> Another thing I completely fail to grasp, is how the pipeline
> scheduler and delay slots interact. Doesn't dbr_schedule destroy all
> the good work schedule_insns has tried to do? If so, how much does
> that hurt on modern RISCs?

It really depends on how the slot is filled and how far in the insn
chain you had to look.  You're usually just as likely to improve the
schedule as you are to muck it up.  Also remember you're dealing with stuff
at block boundaries, where the scheduler really isn't helping much anyway.

There's always a tradeoff here.  It could always be improved by having
the scheduler mark insns which are good candidates (scheduling-wise) for
filling slots.  I certainly pondered this a couple decades ago when I
cared about delay slot filling on in-order targets :-)  Oh yea, those
hints have to be directional since it may be good to move an insn
earlier to fill a path leading to the insn, but it may not be good to
move it later to fill a branch after the insn.

> Related question: What, if anything, currently prevents dbr_schedule
> from causing pipeline stalls by stuffing a long-latency insn in a
> delay slot? I'm currently using a cost function using:

This has generally been left to ports to sort out.   My experience was
that loads/stores were often OK to put into a delay slot.  A large part
of the reason for this is when we fill via the backwards walk, we're not
doing anything speculatively.

A nullified slot is different in that it's usually implemented by
cancelling out the last stage in the pipeline.  So even if you nullify,
you still have to go through the entire pipeline.  For something like an
fpsqrt or fpdiv, that's *really* bad.

> What do you think will be a good strategy to deal with this (short of
> integrating delay slot filling in the scheduler proper)? Should I try
> to find cost==0 delay slot candidates, and only fill slots with cost>0
> candidates if nothing cheap is available? Prefer a nop over cost>0
> candidates? Ignore insn_default_latency?

It's really been left to the backends to deal with.  So for example, on
the PA anything which touched the FPU was disallowed in a nullified slot.

> Another thing I noticed about targets with delay slots that can be
> nullified, is that at least some of the ifcvt.c transformations could
> be applied to fill more delay slots (obviously if_case_1 and
> if_case_2). In reorg.c, optimize_skip does some kind of if-conversion.
> Has anyone looked at whether optimize_skip still does something, and
> derived a test case for that?

I doubt anyone has looked at it recently.  It pre-dates our
if-conversion code by a decade or more.



Jeff

[Bug tree-optimization/56984] [4.8/4.9 Regression] ICE in tree_vrp.c

2013-04-17 Thread jakub at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56984

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-04-17
                 CC|                            |jakub at gcc dot gnu.org
   Target Milestone|---                         |4.8.1
            Summary|GCC-4.8.0 ICE in tree_vrp.c |[4.8/4.9 Regression] ICE in
                   |                            |tree_vrp.c
     Ever Confirmed|0                           |1

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-04-17 06:23:02 UTC ---
Started with http://gcc.gnu.org/r184927

[Bug rtl-optimization/56957] [4.9 regression] ICE in add_insn_after, at emit-rtl.c:3783

2013-04-17 Thread abel at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56957

Andrey Belevantsev <abel at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |abel at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #5 from Andrey Belevantsev <abel at gcc dot gnu.org> 2013-04-17 06:52:47 UTC ---
Created attachment 29886
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29886
proposed patch

Easy enough, we can have a speculation transformation that does not change insn
at all (e.g. we're asked to speculate an insn already speculated, so we just
changed the speculation probability, not the pattern itself), but
EXPR_WAS_CHANGED only tests that the transformation history vector is
non-empty, so it would report changes have actually happened.  So checking
additionally that the oldest insn form (last vector element) has the same
INSN_ID as the one of the current expr fixes the test.  I will throw this to
our Itanium for the full testing.

Steven, thanks for your insn emitting patches.  It was not that easy to catch
that kind of issues earlier, AFAIR we noticed it via corruption of our own
structures and we needed to trace that back to the offending move

[Bug debug/53453] darwin linker expects both AT_name and AT_comp_dir debug notes

2013-04-17 Thread ebotcazou at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53453

Eric Botcazou <ebotcazou at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebotcazou at gcc dot
                   |                            |gnu.org

--- Comment #17 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2013-04-17 07:17:18 UTC ---
The patch was silently backported yesterday, but the wrong ChangeLog has been
modified.  Please post a message on gcc-patches@ and fix the ChangeLog.  TIA.

[Bug tree-optimization/56984] [4.8/4.9 Regression] ICE in tree_vrp.c

2013-04-17 Thread jakub at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56984

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |jakub at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-04-17 07:30:20 UTC ---
Created attachment 29887
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29887
gcc49-pr56984.patch

Untested fix.  Another thing is that fold resp. gimple_fold aren't able to
optimize (x  N)  M into 0 if M  N is the minimum value, but that isn't
something VRP should handle.

[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()

2013-04-17 Thread rguenth at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |rguenth at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> 2013-04-17 08:26:20 UTC ---
I will have a look.

[Bug tree-optimization/50789] Gather vectorization

2013-04-17 Thread andrey.turetskiy at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789



Andrey Turetskiy andrey.turetskiy at gmail dot com changed:



   What|Removed |Added



 CC||andrey.turetskiy at gmail dot com



--- Comment #11 from Andrey Turetskiy andrey.turetskiy at gmail dot com 
2013-04-17 08:31:29 UTC ---

It looks like gathers can be used for vectorization in cases like:



#define N 1024



float x[4*N], y[N];



void foo ()

{

  int i;

  for (i = 0; i < N; i++)

    y[i] = x[179 + 3*i];

}



Now this code isn't vectorized.

In addition there are a lot of such examples in SPEC CPU2006. Vectorization with

gathers can give noticeable gain.

[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)

2013-04-17 Thread vincent-gcc at vinc17 dot net


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296

--- Comment #16 from Vincent Lefèvre vincent-gcc at vinc17 dot net 2013-04-17 
08:40:09 UTC ---
(In reply to comment #3)
> A way to tell gcc a variable is not uninitialized is to perform
> self-initialization like
>
>  int i = i;
>
> this will cause no code generation but inhibits the warning.  Other compilers
> may warn about this construct of course.

What makes things worse about this workaround is that even protecting this by a

#if defined(__GNUC__)

may not be sufficient as other compilers may claim GNUC compatibility and
behave differently. This is the case of clang (at least under Debian):
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=705583

The only good solution would be to fix the bug. I've checked that it is still
there in the trunk revision 197260 (current Debian's gcc-snapshot).

[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()

2013-04-17 Thread rguenth at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982



--- Comment #5 from Richard Biener rguenth at gcc dot gnu.org 2013-04-17 
08:48:33 UTC ---

So the questions are:

- is it desirable that uncprop does anything to SSA_NAME_VAR == NULL phis?



sure - it is all about improving out-of-SSA coalescing opportunities

and avoiding copies



- shouldn't something like that be not performed if current function calls

setjmp (or more narrowly, if there is a returns twice function somewhere in

between the considered setter and user)?



the testcase shows that uncprop extends the lifetime of an SSA name

across a setjmp call - but it can only do so because it's an SSA name.

Which means the testcase is questionable as 'n' is not declared volatile, no?



- what other optimizations might be similarly problematic across returns twice

calls?



every optimization pass that performs hoisting.  PRE comes to my mind for



  if (x)

tem = expr;

  setjmp ()

  var = expr;



which would happily eliminate the partial redundancy, moving expr to the

else arm of the if () and thus extending the lifetime of 'var' across

the setjmp call.



We do not explicitly model the abnormal control flow for setjmp / longjmp

which is the reason all these issues may appear.  So I believe the

correct fix is to either declare the testcase invalid or to model

the abnormal control flow explicitly. Add abnormal edges from all

call sites in the function that may end up calling longjmp _and_ eventually an

abnormal edge from function entry as we can call longjmp from callers

as well (though that may be invalid and thus we do not have to care?).



I don't see an easy fix for the issue (well, maybe the specific testcase).

That it happens only after my patch is probably pure luck because of for

example the PRE issue.  Testcase for that:



int f (int a, int flag)

{

  int tem;

  if (flag)

tem = a + 1;

  int x = setjmp (env);

  int tem2 = a + 1;

  if (x)

return tem2;

  return tem;

}



validity of course is questionable, but we clearly use tem only on the

normal path and tem2 on the abnormal path.  PRE does the transform

I indicated, proper abnormal edges would disable the transform.
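
For reference, a minimal C sketch of the rule behind the volatile question raised above: under ISO C, a non-volatile automatic variable that is modified between the setjmp call and the longjmp has an indeterminate value after the jump, so code that needs such a variable to survive declares it volatile. The function name count_attempts is hypothetical and is not part of the testcase in this PR.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper: jumps back to the setjmp point until `limit`
   attempts have been made, then returns the number of attempts.  */
int
count_attempts (int limit)
{
  /* Modified between setjmp and longjmp, so it is declared volatile
     to keep a determinate value after the jump.  */
  volatile int attempts = 0;

  assert (limit >= 1);

  if (setjmp (env) != 0)
    {
      /* Second and later returns arrive here via longjmp.  */
    }

  attempts++;
  if (attempts < limit)
    longjmp (env, 1);

  return attempts;
}
```

Variables that are only written after the setjmp call and never read on the path that preceded it, as in the testcases discussed in this PR, are a different situation; the sketch only shows the case where volatile is unquestionably required.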

[Bug tree-optimization/50789] Gather vectorization

2013-04-17 Thread rguenther at suse dot de



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789



--- Comment #12 from rguenther at suse dot de rguenther at suse dot de 
2013-04-17 08:53:21 UTC ---

On Wed, 17 Apr 2013, andrey.turetskiy at gmail dot com wrote:



 

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789
>
> Andrey Turetskiy andrey.turetskiy at gmail dot com changed:
>
>    What|Removed |Added
>
>      CC||andrey.turetskiy at gmail dot com
>
> --- Comment #11 from Andrey Turetskiy andrey.turetskiy at gmail dot com
> 2013-04-17 08:31:29 UTC ---
>
> It looks like gathers can be used for vectorization in cases like:
>
> #define N 1024
>
> float x[4*N], y[N];
>
> void foo ()
> {
>   int i;
>   for (i = 0; i < N; i++)
>     y[i] = x[179 + 3*i];
> }
>
> Now this code isn't vectorized.
> In addition there are a lot of such examples in SPEC CPU2006. Vectorization with
> gathers can give noticeable gain.



The above can be vectorized with the strided-load vectorization support

(just it doesn't trigger here).  And strided-load vectorization

code-generation can be improved by using gather vectorization by

first building a vector of addresses / indices and then performing

a gather load.  If building a vector of addresses / indices is

cheaper than performing scalar loads and building a vector from

the results, that is.



So the above is more related to strided load support (and the

not yet implemented strided store support as well, if there

are also gather stores ...)



Richard.
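
The scheme described in the message above (first build a vector of indices, then perform a gather load) can be sketched in plain C as follows. This is only a scalar model under stated assumptions: the vector length VF of 4 and the helper names gather_load and foo_gather are hypothetical, and a real vectorizer would emit a hardware gather instruction instead of the inner loop.

```c
#include <assert.h>

#define N 1024
#define VF 4                    /* assumed vector length, in elements */

float x[4 * N], y[N];

/* Scalar stand-in for a hardware gather: dst[lane] = base[idx[lane]].  */
static void
gather_load (float *dst, const float *base, const int *idx)
{
  for (int lane = 0; lane < VF; lane++)
    dst[lane] = base[idx[lane]];
}

/* Same result as the loop y[i] = x[179 + 3*i], processed VF elements
   at a time.  */
void
foo_gather (void)
{
  assert (N % VF == 0);

  for (int i = 0; i < N; i += VF)
    {
      int idx[VF];

      /* Step 1: build the vector of indices for this iteration.  */
      for (int lane = 0; lane < VF; lane++)
        idx[lane] = 179 + 3 * (i + lane);

      /* Step 2: one gather load fills VF consecutive elements of y.  */
      gather_load (&y[i], x, idx);
    }
}
```

Whether this beats VF scalar loads followed by a vector build depends on the cost of constructing the index vector, which is the trade-off named in the message.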

[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()

2013-04-17 Thread jakub at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982



--- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org 2013-04-17 
08:56:00 UTC ---

I don't see how we could declare the testcase invalid, why would n need to be

volatile?  It isn't live across the setjmp call, it is even declared after the

setjmp call, and it is always initialized after the setjmp call.

[Bug fortran/56814] [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure

2013-04-17 Thread janus at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56814



--- Comment #5 from janus at gcc dot gnu.org 2013-04-17 08:58:25 UTC ---

Alternative patch:



Index: gcc/fortran/interface.c
===================================================================
--- gcc/fortran/interface.c (revision 198007)
+++ gcc/fortran/interface.c (working copy)
@@ -1184,9 +1184,20 @@ check_result_characteristics (gfc_symbol *s1, gfc_
 {
   gfc_symbol *r1, *r2;
 
-  r1 = s1->result ? s1->result : s1;
-  r2 = s2->result ? s2->result : s2;
+  if (s1->ts.interface && s1->ts.interface->result)
+    r1 = s1->ts.interface->result;
+  else if (s1->result)
+    r1 = s1->result;
+  else
+    r1 = s1;
 
+  if (s2->ts.interface && s2->ts.interface->result)
+    r2 = s2->ts.interface->result;
+  else if (s2->result)
+    r2 = s2->result;
+  else
+    r2 = s2;
+
   if (r1->ts.type == BT_UNKNOWN)
     return true;





Regtesting now ...

[Bug bootstrap/56644] --disable-nls requires symbols from libintl

2013-04-17 Thread meisenmann....@fh-salzburg.ac.at



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56644



--- Comment #6 from Markus Eisenmann meisenmann@fh-salzburg.ac.at 
2013-04-17 09:01:04 UTC ---

Created attachment 29888

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29888

Prevent redirect to some libintl-functions if NLS isn't requested



This patch undefines some macros that would cause unneeded redirections to

libintl functions (like vsnprintf) when NLS isn't configured (i.e. ENABLE_NLS

is not set).

[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()

2013-04-17 Thread rguenther at suse dot de



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982



--- Comment #7 from rguenther at suse dot de rguenther at suse dot de 
2013-04-17 09:07:10 UTC ---

On Wed, 17 Apr 2013, jakub at gcc dot gnu.org wrote:



 

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982
>
> --- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org 2013-04-17
> 08:56:00 UTC ---
>
> I don't see how we could declare the testcase invalid, why would n need to be
> volatile?  It isn't live across the setjmp call, it is even declared after the
> setjmp call, and it is always initialized after the setjmp call.



Then there is no other way but to model the abnormal control flow

properly.  Even simple CSE can break things otherwise.  Consider



int tmp = a + 1;

setjmp ()

int tmp2 = a + 1;



even on RTL CSE would break that, no?  setjmp doesn't even

forcefully start a new basic-block.



Hmm, maybe doing that, start a new BB for all returns-twice

calls and add an abnormal edge from FN entry is enough to

avoid all possibly dangerous transforms.



Richard.

[Bug ada/40986] [4.6 regression] Assert_Failure sinfo.adb:360, error detected at a-unccon.ads:23:27

2013-04-17 Thread markus.schoepflin at comsoft dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40986

--- Comment #15 from Markus Schöpflin markus.schoepflin at comsoft dot de 
2013-04-17 09:15:22 UTC ---
I have bisected the problem using the git gcc repository, unfortunately 121
commits are left after bisecting, as in between the last known good and the
first known bad commit the gcc tree does not compile for a lot of commits.

Anyway, this is the last known good commit:

commit 244de65defd519a1245551886fce58113a4b7b2a
Author: charlet charlet@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Wed Jun 6 10:13:25 2007 +

This is the first known bad commit:

commit 7b29e7de1cf940343eeeb25058b7870877d15524
Author: charlet charlet@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Wed Jun 6 10:54:04 2007 +

The 120 commits in between do not compile, and all do massive changes in the
Ada part of gcc.

[Bug bootstrap/56644] --disable-nls requires symbols from libintl

2013-04-17 Thread meisenmann....@fh-salzburg.ac.at



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56644



--- Comment #7 from Markus Eisenmann meisenmann@fh-salzburg.ac.at 
2013-04-17 09:17:36 UTC ---

At least this error is caused by some libintl macros, which redirect several

stdio functions (like vsnprintf ...) to their libintl versions; i.e. the

header file libintl.h is available and included, but NLS/libintl isn't

requested.



Solution (as implemented in the attached patch):



Add the following undefs to the #else branch of #ifdef ENABLE_NLS in

gcc/intl.h, for example after line 54:



#undef fprintf
#undef vfprintf
#undef printf
#undef vprintf
#undef sprintf
#undef vsprintf
#undef snprintf
#undef vsnprintf
#undef asprintf
#undef vasprintf
#undef setlocale



Additional comment:

The header file gcc/intl.h already contains undefs to prevent the use of

libintl when it is not requested or configured, but not for the affected stdio

functions such as vsnprintf [...], which may cause linker errors (i.e. unresolved externals).

[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)

2013-04-17 Thread manu at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296

Manuel López-Ibáñez manu at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||diagnostic

--- Comment #17 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 
09:19:09 UTC ---
(In reply to comment #16)
> (In reply to comment #3)
> > A way to tell gcc a variable is not uninitialized is to perform
> > self-initialization like
> >
> >  int i = i;
> >
> > this will cause no code generation but inhibits the warning.  Other
> > compilers may warn about this construct of course.
>
> The only good solution would be to fix the bug. I've checked that it is still
> there in the trunk revision 197260 (current Debian's gcc-snapshot).

If you mean to fix the false warning, then you are likely to wait a long long
time (in order of years) because it doesn't seem a trivial thing to fix and
there are very very few people with enough GCC knowledge to fix it (and they
are busy with other things).

What would be trivial to fix (but require persistence, patience and time) is to
implement this idea:

http://gcc.gnu.org/ml/gcc/2010-08/msg00297.html

that is, either  __attribute__ ((initialized))

or _Pragma("GCC diagnostic ignored \"-Wuninitialized\"").

(Personally, I prefer the latter, since it reuses existing code).

And as a follow-up, get rid of the non-portable valgrind-unfriendly i=i idiom
that has caused so much grief over the years.

However, we still need someone with the persistence, patience and time to
implement this and get it past the powers that be.
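
As a rough illustration of the pragma-based alternative to the i=i idiom, here is a minimal C sketch that uses the diagnostic pragmas GCC already provides (push, ignored, pop), not the declaration-level proposal linked above. The function first_positive is hypothetical. Whether the pragma actually silences a given false positive is an assumption: it depends on the GCC version and on the location at which the warning is reported, and on GCC 4.7 and later the relevant option may be -Wmaybe-uninitialized rather than -Wuninitialized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: returns the first positive element of v,
   or -1 if there is none.  */
int
first_positive (const int *v, size_t n)
{
  int found;                    /* deliberately not initialized here */
  int have = 0;

  assert (v != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    if (v[i] > 0)
      {
        found = v[i];
        have = 1;
        break;
      }

  /* Suppress the warning only around the one use that is known to be
     safe, instead of writing "int found = found;".  */
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wuninitialized"
#endif
  int result = have ? found : -1;
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif

  return result;
}
```

Unlike the self-initialization idiom, this leaves the variable genuinely uninitialized, so tools such as valgrind can still detect a real uninitialized read.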

[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()

2013-04-17 Thread jakub at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982



--- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org 2013-04-17 
09:28:55 UTC ---

#include <stdio.h>
#include <setjmp.h>

static sigjmp_buf env;
static inline int g (int x)
{
  if (x)
    {
      fprintf (stderr, "Returning 0\n");
      return 0;
    }
  else
    {
      fprintf (stderr, "Returning 1\n");
      return 1;
    }
}
__attribute__ ((noinline))
void bar (int n)
{
  if (n == 0)
    exit (0);
  static int x;
  if (x++) abort ();
  longjmp (env, 42);
}
int
f (int *e)
{
  int n = *e;
  if (n)
    return 1;
  int x = setjmp (env);
  n = g (x);
  fprintf (stderr, "x = %i, n = %i\n", x, n);
  bar (n);
}
int
main ()
{
  int v = 0;
  return f (&v);
}



Adjusted testcase that fails even with GCC 4.7.2 at -O2, works with -O2

-fno-dominator-opts (which disables uncprop).  Again, I don't see how

this could be declared invalid, while n is declared before the setjmp, it is

not live across the setjmp call.  This adjusted testcase regressed in April

2005 (i.e. 4.1+ regression).

[Bug web/44269] Search for PR number in mailing lists fails

2013-04-17 Thread skannan at redhat dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44269



Shakthi Kannan skannan at redhat dot com changed:



   What|Removed |Added



 CC||skannan at redhat dot com



--- Comment #2 from Shakthi Kannan skannan at redhat dot com 2013-04-17 
09:31:28 UTC ---

Searching for 18249 in the web archive of the gcc-patches mailing list with

mnoGoSearch 3.3.13 does return the link to PR 18249.



http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01458.html

[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)

2013-04-17 Thread manu at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296

--- Comment #18 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 
09:31:59 UTC ---
In fact, we should have removed the i=i idiom a long time ago. The correct
thing to do (as Linus says) is to initialize the variable to a sensible value
to silence the warning: http://lwn.net/Articles/529954/

If GCC is smart enough to remove the initialization, then there is no harm. If
GCC is not smart enough, then the code is probably complex enough that GCC
cannot optimize it properly and this is why it gives a false positive, so the
fake initialization is the least of your worries.

[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)

2013-04-17 Thread manu at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296

--- Comment #19 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 
09:37:24 UTC ---
(In reply to comment #2)

> 1. Split the -Wuninitialized into two different warnings: one for which gcc
> knows that the variable is uninitialized and one for which it cannot decide.
> -Wuninitialized currently does both.

Note that -Wmaybe-uninitialized is available since at least GCC 4.8.0

[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-04-17 Thread burnus at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981



--- Comment #3 from Tobias Burnus burnus at gcc dot gnu.org 2013-04-17 
09:39:58 UTC ---

(In reply to comment #2)

> There is a seek inside next_record_w_unf. That function is used for DIRECT
> I/O.
> Looks conceptually wrong to me for sequential unformatted.  I won't have time
> for a few days to look at this further.



Well, what gfortran does is:



* write place-holder record length in the leading record marker
* write actual data
* write trailing record marker (1st call to write_us_marker in
  next_record_w_unf)
* write actual length of this record, i.e. seek back + write_us_marker + seek
  past the trailing record marker (all in next_record_w_unf)
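
The four steps above can be sketched in C with stdio as follows. This is an illustration only, not libgfortran code: the names struct chunk and write_record are hypothetical, and the 4-byte record marker in host byte order is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One item of an output list (hypothetical type).  */
struct chunk
{
  const void *data;
  size_t size;
};

/* Writes one sequential unformatted record made of n chunks.
   Returns 0 on success, -1 on error.  */
int
write_record (FILE *f, const struct chunk *chunks, size_t n)
{
  uint32_t len = 0;
  long start, end;

  assert (f != NULL);

  start = ftell (f);
  if (start < 0)
    return -1;

  /* 1. Placeholder length in the leading record marker.  */
  if (fwrite (&len, sizeof len, 1, f) != 1)
    return -1;

  /* 2. Actual data; the total length is only known afterwards.  */
  for (size_t i = 0; i < n; i++)
    {
      if (chunks[i].size > 0
          && fwrite (chunks[i].data, chunks[i].size, 1, f) != 1)
        return -1;
      len += (uint32_t) chunks[i].size;
    }

  /* 3. Trailing record marker.  */
  if (fwrite (&len, sizeof len, 1, f) != 1)
    return -1;
  end = ftell (f);
  if (end < 0)
    return -1;

  /* 4. Seek back, write the real length, seek past the trailing marker.  */
  if (fseek (f, start, SEEK_SET) != 0
      || fwrite (&len, sizeof len, 1, f) != 1
      || fseek (f, end, SEEK_SET) != 0)
    return -1;

  return 0;
}
```

Step 4 is the part that needs a seekable file, which is why devices that do not seek like regular files make this scheme awkward.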





I think what other compilers do is to make use of the following item in the

Fortran standard:



The value of the RECL= specifier shall be positive. It specifies the length of

each record in a file being connected for direct access, or specifies the

maximum length of a record in a file being connected for sequential access.

(F2008, 9.5.6.15 RECL= specifier in the OPEN statement)





I tried the following program:

---

integer, allocatable :: array(:)
integer :: rl, i
open(99,file="/dev/null",form="unformatted")
inquire(99,recl=rl)
allocate(array(1024*1024*100))
array = 0
print *,rl, size(array)/4
write(99) (array, i=1,1000)
close(99)
end

---



With gfortran, it takes only: 0.203s and one has:

 19 mmap

 26 open

392 lseek

   1784 write



The question is why there are that many seeks. There should be only a single

record!





With pathf95, it fails after 0.099s with the error:

 This request exceeds the maximum record size.



And with g95, it takes 4.946s (!) until it fails with "Writing more data than

the record size (RECL)":

 11 close

 17 fstat

 20 mprotect

 21 stat

 25 mmap

 30 write

 47 open

where the mmap+munmap pairs seem to take the lion's share of the time.





However, one can do better: NAG f95 only needs 0.007s and does:

  5 read

  6 lseek

  8 mprotect

 10 fstat

 23 stat

 29 mmap

 40 open

   2003 write





Maybe something like the following would work:

* Create a reasonable sized buffer

* Use it to buffer the writes, and if it fits, write the length, the buffer,

the length.

* If the argument is a (too) big array, write the length of data in the buffer

plus array byte size, then the data - and only if another item comes, seek to

the beginning and update the length.



That should take care of:

  write(99) i, j, k

  write(99) i, j, k, small_array

  write(99) big_array

and even

  write(99) i, j, k, big_array

but it will not help for

  write(99) big_array1, big_array2



I think that covers the most important cases. One question is how large the

buffer should be initially, whether it should be resizable - and how long it

should remain allocated. Even a small buffer of 1024 bytes (= 128 real(8)

values) will help when writing small data like in the example of comment 0.



If it is larger, the issue of freeing the data and/or resizing becomes more

important - and one needs to be careful not to require huge amount of memory

and/or do very frequent memory allocation+freeing, which causes the problems

with g95.



 * * *



Closer look at NAG: It does the following (allocate moved before open, inquire

removed):



open("/dev/null", O_RDWR)   = 3
mmap(NULL, 3856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ab0e000
mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ab22000
fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff645b1c90) = -1 ENOTTY (Inappropriate ioctl for device)
fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff645b2200) = -1 ENOTTY (Inappropriate ioctl for device)
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac22000
lseek(3, 0, SEEK_CUR)   = 0

write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 419426304) = 419426304

(and 999 further write lines)

lseek(3, 0, SEEK_SET)   = 0
read(3, "", 4096)   = 0
lseek(3, 12, SEEK_CUR)  = 0
write(3, "\0\0\0\250", 4)   = 4
lseek(3, 18446744072233156608, SEEK_SET) = 0
read(3, "", 4096)   = 0
lseek(3, 20, SEEK_CUR)  = 0
lseek(3, 0, SEEK_CUR)   = 0
ftruncate(3, 0) = -1 EINVAL (Invalid argument)
close(3)= 0
munmap(0x2ac22000, 4096)= 0

[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()

2013-04-17 Thread rguenth at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982



--- Comment #9 from Richard Biener rguenth at gcc dot gnu.org 2013-04-17 
09:57:52 UTC ---

Created attachment 29889

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29889

patch



Untested patch.  The patch handles setjmp similarly to a non-local label,

thus forcing it to start a new basic block and get abnormal edges from all

sites that can make a non-local goto or call longjmp.



Fixes the testcase for me.  Somewhat reduced:



#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

static sigjmp_buf env;

static inline int g(int x)
{
    if (x)
    {
        fprintf(stderr, "Returning 0\n");
        return 0;
    }
    else
    {
        fprintf(stderr, "Returning 1\n");
        return 1;
    }
}

int f(int *e)
{
    if (*e)
      return 1;

    int x = setjmp(env);
    int n = g(x);
    if (n == 0)
      exit(0);
    if (x)
      abort();
    longjmp(env, 42);
}

int main(int argc, char** argv)
{
    int v = 0;
    return f(&v);
}



but I cannot remove the remaining printfs, so it's not appropriate for the

testsuite yet.

[Bug translation/56985] New: gcc/fortran/resolve.c:920: '%s' in cannot appear in COMMON ...

2013-04-17 Thread stigge at antcom dot de



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56985



 Bug #: 56985

   Summary: gcc/fortran/resolve.c:920: '%s' in cannot appear in

COMMON ...

Classification: Unclassified

   Product: gcc

   Version: 4.8.1

Status: UNCONFIRMED

  Severity: normal

  Priority: P3

 Component: translation

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: sti...@antcom.de





In gcc/fortran/resolve.c:920: "'%s' in cannot appear in COMMON ..."



- I guess the "in" is not intended?

[Bug web/45655] GCC WIki Needs Text Colorizing Capability

2013-04-17 Thread skannan at redhat dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45655



Shakthi Kannan skannan at redhat dot com changed:



   What|Removed |Added



 CC||skannan at redhat dot com



--- Comment #1 from Shakthi Kannan skannan at redhat dot com 2013-04-17 
10:06:03 UTC ---

I tested with the following:



Color2(red courier on blue,col=red,bcol=blue,font=courier)



Color2(Green Font on Yellow Background,green,yellow)



Color2(Orange Text,orange)



Color2(Text with commas:one,two,three,red)



Color2(Optional parameters,bcol=yellow)



at:



  http://gcc.gnu.org/wiki/WikiSandBox



and can confirm that text colorizing macro isn't working on the GCC Wiki.

[Bug web/45688] Typo in attribute((version-id)) docs

2013-04-17 Thread skannan at redhat dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45688



Shakthi Kannan skannan at redhat dot com changed:



   What|Removed |Added



 CC||skannan at redhat dot com



--- Comment #2 from Shakthi Kannan skannan at redhat dot com 2013-04-17 
10:36:30 UTC ---

http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#Function-Attributes



now mentions version_id correctly:



  extern int foo () __attribute__((version_id (20040821)));

[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-04-17 Thread jb at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981



--- Comment #4 from Janne Blomqvist jb at gcc dot gnu.org 2013-04-17 10:50:07 
UTC ---

The reason why gfortran is slow here is that for non-regular files we use

unbuffered I/O. If you write to a regular file instead of /dev/null, you'll see

us doing ~8 KB writes at a time. On my system, timing writing to /dev/null

gives



real    0m0.727s
user    0m0.272s
sys     0m0.452s



whereas writing to a file gives



real    0m0.202s
user    0m0.180s
sys     0m0.020s





The reason for this is that non-regular files (a.k.a. special files) are

special in many ways wrt seeking. Some allow seeking just fine, some always

return 0, some return an error (and which special files behave in which way is

to some extent different on different OS'es). As the buffered IO keeps track of

the logical file pointer position, it can easily get out of sync with the

physical position if it doesn't behave as for a regular file.



Also, for special files users often expect non-buffered IO, e.g. they want

output on the terminal directly instead of waiting until the 8 KB buffer fills

up, programs communicating via pipes can deadlock if data sits in the buffers,

etc. One could of course make "unbuffered I/O" in gfortran really mean "flush

the buffer at the end of each I/O statement" rather than "not using a buffer at

all and instead using the raw POSIX I/O syscalls". This would perhaps not be a

bad idea per se, but would require making the buffered I/O code handle special

files in some sensible way.



Another reason for gfortran slowness is that we do quite a lot of checking in

data_transfer_init(), which means that there's quite a lot of per-record

overhead. Writing a single element unformatted is thus the worst case. One way

to speed up data_transfer_init, I think, is that instead of checking each flag

bit (which says which I/O specifiers are present) separately, create a variable

with forbidden flags for each I/O type (unformatted/formatted,

sequential/direct/stream = 6x), and check the entire flag variable once

((flag & forbidden_flags) == 0). Only if there is an error, do the bit-by-bit checking

in order to generate the error message.
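
The single-test idea from the paragraph above could look roughly like the following C sketch. All names here (the HAS_* bits, enum io_kind, forbidden_flags, check_specifiers) are hypothetical and do not match libgfortran's real flag layout; only three of the six I/O kinds are shown, and the table entries are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "specifier present" bits.  */
enum
{
  HAS_REC = 1u << 0,            /* REC= */
  HAS_POS = 1u << 1,            /* POS= */
  HAS_FMT = 1u << 2             /* FMT= */
};

enum io_kind
{
  UNFORMATTED_SEQUENTIAL,
  UNFORMATTED_DIRECT,
  UNFORMATTED_STREAM,
  IO_KIND_COUNT
};

/* One precomputed mask of forbidden specifiers per I/O kind.  */
static const uint32_t forbidden_flags[IO_KIND_COUNT] = {
  [UNFORMATTED_SEQUENTIAL] = HAS_REC | HAS_POS | HAS_FMT,
  [UNFORMATTED_DIRECT] = HAS_POS | HAS_FMT,
  [UNFORMATTED_STREAM] = HAS_REC | HAS_FMT
};

/* Returns 0 if no forbidden specifier is present, otherwise the
   lowest offending bit so the caller can build an error message.  */
uint32_t
check_specifiers (enum io_kind kind, uint32_t flags)
{
  assert ((unsigned) kind < IO_KIND_COUNT);

  uint32_t bad = flags & forbidden_flags[kind];

  /* Fast path: a single AND and compare per data transfer.  */
  if (bad == 0)
    return 0;

  /* Slow path, reached only on error: isolate the lowest set bit.  */
  return bad & (~bad + 1u);
}
```

The common, error-free case costs one mask test per record; the per-bit work is deferred to the path that has to produce a diagnostic anyway.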

[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)

2013-04-17 Thread vincent-gcc at vinc17 dot net


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296

--- Comment #20 from Vincent Lefèvre vincent-gcc at vinc17 dot net 2013-04-17 
11:17:14 UTC ---
(In reply to comment #18)
> In fact, we should have removed the i=i idiom a long time ago. The correct
> thing to do (as Linus says) is to initialize the variable to a sensible value
> to silence the warning: http://lwn.net/Articles/529954/

There is no real sensible value except some trap value. Letting the variable
uninitialized at that point (the declaration) allows some tools, like the
Formalin compiler described in WG14/N1637, to detect potential problems if the
variable is really used uninitialized.

(In reply to comment #19)
> Note that -Wmaybe-uninitialized is available since at least GCC 4.8.0

OK, so a solution would be to add a configure test for projects that don't want
such warnings (while still using -Wall) to see whether -Wno-maybe-uninitialized
is supported.

[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)

2013-04-17 Thread manu at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296

--- Comment #21 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 
11:26:01 UTC ---
(In reply to comment #20)
> OK, so a solution would be to add a configure test for projects that don't
> want such warnings (while still using -Wall) to see whether
> -Wno-maybe-uninitialized is supported.

When an unrecognized warning option is requested (e.g., -Wunknown-warning), GCC
will emit a diagnostic stating that the option is not recognized.  However, if
the -Wno- form is used, the behavior is slightly different: No diagnostic will
be produced for -Wno-unknown-warning unless other diagnostics are being produced.
This allows the use of new -Wno- options with old compilers, but if something
goes wrong, the compiler will warn that an unrecognized option was used.

[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)

2013-04-17 Thread manu at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296

--- Comment #22 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 
11:31:29 UTC ---
(In reply to comment #20)
> (In reply to comment #18)
> > In fact, we should have removed the i=i idiom a long time ago. The correct
> > thing to do (as Linus says) is to initialize the variable to a sensible
> > value to silence the warning: http://lwn.net/Articles/529954/
>
> There is no real sensible value except some trap value. Letting the variable
> uninitialized at that point (the declaration) allows some tools, like the
> Formalin compiler described in WG14/N1637, to detect potential problems if the
> variable is really used uninitialized.

That doesn't contradict my assessment above that the i=i idiom should die. With
the pragma, users can choose to ignore GCC warnings if they don't want to
initialize the value.

The trap value would be an additional improvement, but someone needs to
implement it. Clang has -fsanitize=undefined-trap:

http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation

[Bug fortran/56814] [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure

2013-04-17 Thread janus at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56814



--- Comment #6 from janus at gcc dot gnu.org 2013-04-17 11:41:21 UTC ---

(In reply to comment #5)

> Alternative patch:



In contrast to the patch in comment #3, this one regtests cleanly ...

[Bug rtl-optimization/56921] [4.9 Regression] ICE in rtx_cost called by doloop_optimize_loops for PPC

2013-04-17 Thread rguenth at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56921



Richard Biener rguenth at gcc dot gnu.org changed:



   What|Removed |Added



 Status|NEW |RESOLVED

 Resolution||FIXED



--- Comment #15 from Richard Biener rguenth at gcc dot gnu.org 2013-04-17 
12:02:05 UTC ---

Author: rguenth

Date: Wed Apr 17 12:01:46 2013

New Revision: 198025



URL: http://gcc.gnu.org/viewcvs?rev=198025&root=gcc&view=rev

Log:

2013-04-17  Richard Biener  rguent...@suse.de



PR rtl-optimization/56921

* cfgloop.h (struct loop): Add simple_loop_desc member.

(struct niter_desc): Mark with GTY(()).

(simple_loop_desc): Do not use aux field but simple_loop_desc.

* loop-iv.c (get_simple_loop_desc): Likewise.

(free_simple_loop_desc): Likewise.



Revert

2013-04-16  Richard Biener  rguent...@suse.de



PR rtl-optimization/56921

* loop-init.c (pass_rtl_move_loop_invariants): Add

TODO_do_not_ggc_collect to todo_flags_finish.

(pass_rtl_unswitch): Same.

(pass_rtl_unroll_and_peel_loops): Same.

(pass_rtl_doloop): Same.



Modified:

trunk/gcc/ChangeLog

trunk/gcc/cfgloop.h

trunk/gcc/loop-init.c

trunk/gcc/loop-iv.c

[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)

2013-04-17 Thread vincent-gcc at vinc17 dot net


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296

--- Comment #23 from Vincent Lefèvre vincent-gcc at vinc17 dot net 2013-04-17 
12:24:56 UTC ---
(In reply to comment #21)
> When an unrecognized warning option is requested (e.g., -Wunknown-warning),
> GCC will emit a diagnostic stating that the option is not recognized.  However,
> if the -Wno- form is used, the behavior is slightly different: No diagnostic
> will be produced for -Wno-unknown-warning unless other diagnostics are being
> produced.

That was mainly for pre-4.7 GCC versions, where without the i=i idiom, one
would get the usual "may be used uninitialized in this function" warning
because -Wno-maybe-uninitialized is not supported, but also the

  unrecognized command line option "-Wno-maybe-uninitialized"

warning because there was already a warning. However this may not really be
important.

[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)

2013-04-17 Thread vincent-gcc at vinc17 dot net


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296

--- Comment #24 from Vincent Lefèvre vincent-gcc at vinc17 dot net 2013-04-17 
12:34:40 UTC ---
BTW, since with the latest GCC versions (such as Debian's GCC 4.7.2), the
warning is no longer issued with -Wno-maybe-uninitialized, perhaps the bug
severity could be lowered to enhancement.

[Bug web/45688] Typo in attribute((version-id)) docs

2013-04-17 Thread manu at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45688

Manuel López-Ibáñez manu at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||manu at gcc dot gnu.org
 Resolution||FIXED

--- Comment #3 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 
13:04:42 UTC ---
So FIXED. Thanks!

[Bug target/56948] PPC V2DI ICE when loading zero into GPRs

2013-04-17 Thread dje at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56948



David Edelsohn dje at gcc dot gnu.org changed:



   What|Removed |Added



 Status|ASSIGNED|RESOLVED

 Resolution||FIXED



--- Comment #2 from David Edelsohn dje at gcc dot gnu.org 2013-04-17 13:27:24 
UTC ---

Patch applied to trunk and GCC 4.8 branch.

[Bug web/45688] Typo in attribute((version-id)) docs

2013-04-17 Thread manu at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45688

--- Comment #4 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 
13:30:35 UTC ---
Actually, the bug was version level functioning. Since it is obvious, I fixed
it.

http://gcc.gnu.org/r198028

[Bug fortran/40958] module files too large

2013-04-17 Thread burnus at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40958



Tobias Burnus burnus at gcc dot gnu.org changed:



   What|Removed |Added



 CC||burnus at gcc dot gnu.org



--- Comment #9 from Tobias Burnus burnus at gcc dot gnu.org 2013-04-17 
13:50:31 UTC ---

Author: jb

Date: Tue Mar 26 22:08:17 2013

New Revision: 197124



URL: http://gcc.gnu.org/viewcvs?rev=197124&root=gcc&view=rev

Log:

PR 25708 Use a temporary buffer when parsing module files.



2013-03-27  Janne Blomqvist  j...@gcc.gnu.org



PR fortran/25708

* module.c (module_locus): Use long for position.

(module_content): New variable.

(module_pos): Likewise.

(prev_character): Remove.

(bad_module): Free data instead of closing mod file.

(set_module_locus): Use module_pos.

(get_module_locus): Likewise.

(module_char): use buffer rather than stdio file.

(module_unget_char): Likewise.

(read_module_to_tmpbuf): New function.

(gfc_use_module): Call read_module_to_tmpbuf.



Modified:

trunk/gcc/fortran/ChangeLog

trunk/gcc/fortran/module.c

[Bug fortran/40958] module files too large

2013-04-17 Thread burnus at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40958



--- Comment #10 from Tobias Burnus burnus at gcc dot gnu.org 2013-04-17 
13:50:58 UTC ---

Author: jb

Date: Wed Apr 17 10:19:40 2013

New Revision: 198023



URL: http://gcc.gnu.org/viewcvs?rev=198023&root=gcc&view=rev

Log:

PR 40958 Compress module files with zlib.



frontend ChangeLog:



2013-04-17  Janne Blomqvist  j...@gcc.gnu.org



PR fortran/40958

* scanner.h: New file.

* Make-lang.in: Dependencies on scanner.h.

* scanner.c (gfc_directorylist): Move to scanner.h.

* module.c: Don't include md5.h, include scanner.h and zlib.h.

(MOD_VERSION): Add comment about backwards compatibility.

(module_fp): Change type to gzFile.

(ctx): Remove.

(gzopen_included_file_1): New function.

(gzopen_included_file): New function.

(gzopen_intrinsic_module): New function.

(write_char): Use gzputc.

(read_crc32_from_module_file): New function.

(read_md5_from_module_file): Remove.

(gfc_dump_module): Use gz* functions instead of stdio, check gzip

crc32 instead of md5.

(read_module_to_tmpbuf): Use gz* functions instead of stdio.

(gfc_use_module): Use gz* functions.



testsuite ChangeLog:



2013-04-17  Janne Blomqvist  j...@gcc.gnu.org



PR fortran/40958

* lib/gcc-dg.exp (scan-module): Uncompress module file before

scanning.

* gfortran.dg/module_md5_1.f90: Remove.



Added:

trunk/gcc/fortran/scanner.h

Removed:

trunk/gcc/testsuite/gfortran.dg/module_md5_1.f90

Modified:

trunk/gcc/fortran/ChangeLog

trunk/gcc/fortran/Make-lang.in

trunk/gcc/fortran/module.c

trunk/gcc/fortran/scanner.c

trunk/gcc/testsuite/ChangeLog

trunk/gcc/testsuite/lib/gcc-dg.exp

[Bug c++/54320] [c++11] range access to VLA

2013-04-17 Thread paolo.carlini at oracle dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54320



Paolo Carlini paolo.carlini at oracle dot com changed:



   What|Removed |Added



 Status|UNCONFIRMED |NEW

   Last reconfirmed||2013-04-17

 Ever Confirmed|0   |1



--- Comment #8 from Paolo Carlini paolo.carlini at oracle dot com 2013-04-17 
14:11:54 UTC ---

This is now covered (allowed) in N3497.

[Bug c++/54320] [c++11] range access to VLA

2013-04-17 Thread paolo.carlini at oracle dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54320



--- Comment #9 from Paolo Carlini paolo.carlini at oracle dot com 2013-04-17 
14:16:40 UTC ---

Sorry, the most recent paper in the series is actually N3639.

[Bug c++/55149] capturing VLA in lambda

2013-04-17 Thread paolo.carlini at oracle dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55149



--- Comment #5 from Paolo Carlini paolo.carlini at oracle dot com 2013-04-17 
14:19:07 UTC ---

Likewise capturing VLAs is covered in N3639 (only capture by reference allowed)

[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-04-17 Thread burnus at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981



--- Comment #5 from Tobias Burnus burnus at gcc dot gnu.org 2013-04-17 
14:50:16 UTC ---

(In reply to comment #4)
> The reason why gfortran is slow here is that for non-regular files we use
> unbuffered I/O. If you write to a regular file instead of /dev/null, you'll
> see us doing ~8 KB writes at a time.
>
> The reason for this is that non-regular files (a.k.a. special files) are
> special in many ways wrt seeking. Some allow seeking just fine, some always
> return 0, some return an error (and which special files behave in which way is
> to some extent different on different OS'es).

I do not understand the argument regarding seek. If seek doesn't work - why
should there be a problem with buffering but not without? At least with
SEQUENTIAL one cannot do without (buffer exceeded or no buffering) and with
STREAM no seek should be required.

> Also, for special files users often expect non-buffered IO, e.g. they want
> output on the terminal directly instead of waiting until the 8 KB buffer fills
> up, programs communicating via pipes can deadlock if data sits in the buffers,
> etc.

But the code should be able to wait until a complete record has been written?
That should be rather quick, unless one writes a 2GB array. I am not talking
about flushing the data only when 8kB are filled or when the file is closed.
And doing buffering within a record avoids seeks.

> One could of course make "unbuffered I/O" in gfortran really mean "flush
> the buffer at the end of each I/O statement" rather than "not using a buffer
> at all".



We should consider this.
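
To make the suggestion concrete, here is a minimal C sketch of "buffer within a record, flush at the end of each I/O statement". It is not libgfortran code: the names rec_writer, rec_put and rec_end and the 8 KB buffer size are illustrative assumptions only. The flushes counter is there just to show how many write batches a record costs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical record writer; names and sizes are illustrative and are
   not taken from libgfortran. */
#define REC_BUF_SIZE ((size_t)8192)

struct rec_writer {
    int fd;                 /* destination file descriptor */
    size_t used;            /* bytes currently held in buf */
    unsigned long flushes;  /* number of non-empty flushes performed */
    char buf[REC_BUF_SIZE];
};

void rec_init(struct rec_writer *w, int fd)
{
    w->fd = fd;
    w->used = 0;
    w->flushes = 0;
}

/* Write out everything buffered so far; 0 on success, -1 on error. */
int rec_flush(struct rec_writer *w)
{
    size_t off = 0;
    while (off < w->used) {
        ssize_t n = write(w->fd, w->buf + off, w->used - off);
        if (n <= 0)         /* treat a zero-length write as an error too */
            return -1;
        off += (size_t)n;
    }
    if (w->used > 0)
        w->flushes++;
    w->used = 0;
    return 0;
}

/* Append data to the current record; only a full buffer forces a flush. */
int rec_put(struct rec_writer *w, const void *data, size_t len)
{
    const char *p = data;
    while (len > 0) {
        assert(w->used <= REC_BUF_SIZE);
        if (w->used == REC_BUF_SIZE && rec_flush(w) < 0)
            return -1;
        size_t room = REC_BUF_SIZE - w->used;
        size_t n = len < room ? len : room;
        memcpy(w->buf + w->used, p, n);
        w->used += n;
        p += n;
        len -= n;
    }
    return 0;
}

/* End of the I/O statement: push the complete record out immediately. */
int rec_end(struct rec_writer *w)
{
    return rec_flush(w);
}
```

With this shape, a record made of many small items reaches the descriptor in one write call per record (or one per 8 KB for large records), while output on a terminal or pipe still appears as soon as the statement completes. No seek is needed at any point.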



 * * *



I have now updated timings with writing to a file.



Results for the example in comment 0, but writing to a file (test.dat,
tmpfs). Unformatted is much faster with a normal file, but some other
compilers are still significantly faster. And for formatted, all other
compilers are significantly faster.



 Timing in sec

Unformatted  Formatted
real / user  real / user  Compiler
-----------  -----------  --------
0.378/0.352  2.815/2.804  GCC 4.8.0 (-Ofast, 20130308, Rev. 196547)
0.307/0.296  1.303/1.288  g95 4.0.3 (g95 0.93!) Aug 17 2010 (-O3)
0.210/0.196  0.555/0.532  Sun Fortran 95 8.3 Linux_i386 2007/05/03
0.208/0.184  0.920/0.888  PathScale 3.2.99
0.176/0.152  2.185/2.168  NAGWare Fortran 5.1
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.127/0.125  1.091/1.080  GCC 4.9 (trunk, -Ofast)
0.120/0.118  0.465/0.459  g95 4.0.3 (g95 0.94!) Dec 17 2012
0.136/0.131  0.527/0.524  PathScale EKOPath 4.9.0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.335/0.316  2.866/2.860  GCC 4.7.2 20120920 (Cray Inc.)
0.204/0.188  0.659/0.628  Cray Fortran : Version 8.1.6
0.881/0.328  1.281/0.672  Intel 64, Version 13.1.1.163
0.444/0.432  0.884/0.864  pgf90 12.10-0
-----------  -----------  --------

[Bug c/35649] Incorrect printf warning: expect double has float

2013-04-17 Thread trevmrgn+bug at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35649



Trevor Morgan trevmrgn+bug at gmail dot com changed:



   What|Removed |Added



 Target|h8300-elf   |h8300-elf, rx-elf, avr



--- Comment #12 from Trevor Morgan trevmrgn+bug at gmail dot com 2013-04-17 
15:22:48 UTC ---

printf( "%f", 2.0D );

will also produce the erroneous warning (tried on rx-elf)

[Bug translation/56986] New: config/epiphany/epiphany.opt:108: floatig

2013-04-17 Thread stigge at antcom dot de



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56986



 Bug #: 56986

   Summary: config/epiphany/epiphany.opt:108: floatig

Classification: Unclassified

   Product: gcc

   Version: 4.8.1

Status: UNCONFIRMED

  Severity: normal

  Priority: P3

 Component: translation

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: sti...@antcom.de





Translatable string:



config/epiphany/epiphany.opt:108: floatig -> floating?

[Bug translation/56987] New: gcc/config/avr/avr.opt:80: change - changed?

2013-04-17 Thread stigge at antcom dot de



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56987



 Bug #: 56987

   Summary: gcc/config/avr/avr.opt:80: change - changed?

Classification: Unclassified

   Product: gcc

   Version: 4.8.1

Status: UNCONFIRMED

  Severity: normal

  Priority: P3

 Component: translation

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: sti...@antcom.de





Translatable string:



gcc/config/avr/avr.opt:80:
"Warn if the address space of an address is change." -> changed?

[Bug middle-end/10474] shrink wrapping for functions

2013-04-17 Thread jamborm at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10474



Martin Jambor jamborm at gcc dot gnu.org changed:



   What|Removed |Added



 CC||jamborm at gcc dot gnu.org

  Component|tree-optimization   |middle-end



--- Comment #11 from Martin Jambor jamborm at gcc dot gnu.org 2013-04-17 
15:52:44 UTC ---

I've submitted a patch that actually makes shrink wrapping happen, at

least on x86_64.  It would be great if someone checked whether it

helps on other platforms:



http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01033.html



(I'm also changing the component to middle-end, as this is hardly a
tree-optimization matter.)

[Bug debug/53453] darwin linker expects both AT_name and AT_comp_dir debug notes

2013-04-17 Thread mrs at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53453



m...@gcc.gnu.org mrs at gcc dot gnu.org changed:



   What|Removed |Added



  Known to work||4.7.4



--- Comment #18 from mrs at gcc dot gnu.org mrs at gcc dot gnu.org 2013-04-17 
15:55:25 UTC ---

Fixed the ChangeLog, thanks for spotting it.

[Bug middle-end/42371] dead code not eliminated during folding with whole-program

2013-04-17 Thread jamborm at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42371



Martin Jambor jamborm at gcc dot gnu.org changed:



   What|Removed |Added



URL||http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01032.html

 CC||jamborm at gcc dot gnu.org



--- Comment #16 from Martin Jambor jamborm at gcc dot gnu.org 2013-04-17 
15:58:17 UTC ---

I have submitted a patch to address this issue:



http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01032.html

[Bug tree-optimization/56718] Early inlining prevents type based devirtualization

2013-04-17 Thread jamborm at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56718



Martin Jambor jamborm at gcc dot gnu.org changed:



   What|Removed |Added



URL||http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01034.html



--- Comment #1 from Martin Jambor jamborm at gcc dot gnu.org 2013-04-17 
16:03:01 UTC ---

I have submitted a patch to address this issue:



http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01034.html

[Bug fortran/56814] [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure

2013-04-17 Thread janus at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56814



--- Comment #7 from janus at gcc dot gnu.org 2013-04-17 16:15:06 UTC ---

Fixed on trunk with:





Author: janus

Date: Wed Apr 17 16:13:07 2013

New Revision: 198032



URL: http://gcc.gnu.org/viewcvs?rev=198032&root=gcc&view=rev

Log:

2013-04-17  Janus Weil  ja...@gcc.gnu.org



PR fortran/56814

* interface.c (check_result_characteristics): Get result from interface

if present.





2013-04-17  Janus Weil  ja...@gcc.gnu.org



PR fortran/56814

* gfortran.dg/proc_ptr_42.f90: New.



Added:

trunk/gcc/testsuite/gfortran.dg/proc_ptr_42.f90

Modified:

trunk/gcc/fortran/ChangeLog

trunk/gcc/fortran/interface.c

trunk/gcc/testsuite/ChangeLog







Will backport to 4.8 soon.

[Bug debug/53453] darwin linker expects both AT_name and AT_comp_dir debug notes

2013-04-17 Thread mrs at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53453



--- Comment #19 from mrs at gcc dot gnu.org mrs at gcc dot gnu.org 2013-04-17 
16:21:39 UTC ---

I've sent a message to the gcc-patches list with a pointer to the gcc-patches

list for the work.

[Bug middle-end/56988] New: ipa-cp incorrectly propagates a field of an aggregate

2013-04-17 Thread eraman at google dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56988



 Bug #: 56988

   Summary: ipa-cp incorrectly propagates a field of an aggregate

Classification: Unclassified

   Product: gcc

   Version: 4.9.0

Status: UNCONFIRMED

  Severity: normal

  Priority: P3

 Component: middle-end

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: era...@google.com





Created attachment 29890

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29890

Reduced test case



$ trunk_g++ --version
trunk_g++ (GCC) 4.9.0 20130416 (experimental)

$ trunk_g++ -S -O2 -std=c++11 -fno-exceptions upstream_test_case.ii && grep "mov.* _ZTVN12_GLOBAL__N_18RCTesterE" upstream_test_case.s
movq    %rax, _ZTVN12_GLOBAL__N_18RCTesterE+24(%rip)



The generated assembly attempts to write into RCTester class's vtable.



From the dump generated by -fdump-ipa-whole-program-all (just before ipa-cp),

the caller has the following code:



  # .MEM_11 = VDEF <.MEM_10>
  obj_3->D.2045._vptr.ReferenceCountedD.2013 = &MEM[(voidD.45 *)&_ZTVN12_GLOBAL__N_18RCTesterED.2049 + 16B];
  # .MEM_12 = VDEF <.MEM_11>
  obj_3->destructed_D.2025 = 0B;
  # .MEM_13 = VDEF <.MEM_12>
  obj_3->owner_D.2026 = 0B;
  # .MEM_5 = VDEF <.MEM_13>
  # USE = nonlocal null { D.2015 D.2049 } (glob)
  # CLB = nonlocal null { D.2015 D.2049 } (glob)
  _ZN12_GLOBAL__N_19TestResetEPNS_8RCTesterED.2068 (obj_3);





At the callee, we see:



void {anonymous}::TestReset({anonymous}::RCTester*) (struct RCTesterD.2017 * objD.2067)
{
  const struct AssertionResultD.1962 gtest_arD.2071;
  boolD.1899 destructedD.2070;
  struct RCTesterD.2017 * obj.3D.2179;

  # .MEM_2 = VDEF <.MEM_1(D)>
  destructedD.2070 = 0;
  # VUSE <.MEM_2>
  # PT = nonlocal escaped
  obj.3_3 = objD.2067;
  # .MEM_8 = VDEF <.MEM_2>
  MEM[(boolD.1899 * *)obj.3_3 + 8B] = &destructedD.2070;



ipa-cp mistakenly thinks that the move statement

  obj.3_3 = objD.2067;

actually loads from offset 0 of objD.2067, and hence propagates
&MEM[(voidD.45 *)&_ZTVN12_GLOBAL__N_18RCTesterED.2049 + 16B] into obj.3_3,
which then gets propagated to the store of destructedD.2070.



The following patch fixes this, but I am not sure whether it could be too
restrictive:

Index: gcc/ipa-prop.c
===================================================================
--- gcc/ipa-prop.c    (revision 197495)
+++ gcc/ipa-prop.c    (working copy)
@@ -3892,7 +3892,7 @@ ipcp_transform_function (struct cgraph_node *node)
       {
         struct ipa_agg_replacement_value *v;
         gimple stmt = gsi_stmt (gsi);
-        tree rhs, val, t;
+        tree rhs, lhs, val, t;
         HOST_WIDE_INT offset;
         int index;
         bool by_ref, vce;
@@ -3900,6 +3900,7 @@ ipcp_transform_function (struct cgraph_node *node)
         if (!gimple_assign_load_p (stmt))
           continue;
         rhs = gimple_assign_rhs1 (stmt);
+        lhs = gimple_assign_lhs (stmt);
         if (!is_gimple_reg_type (TREE_TYPE (rhs)))
           continue;
 
@@ -3924,7 +3925,8 @@ ipcp_transform_function (struct cgraph_node *node)
           continue;
         for (v = aggval; v; v = v->next)
           if (v->index == index
-              && v->offset == offset)
+              && v->offset == offset
+              && TREE_TYPE (v->value) == TREE_TYPE (lhs))
             break;
         if (!v)
           continue;
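
The effect of the extra condition in the last hunk can be modelled outside GCC with a small C sketch. Everything here is a hypothetical stand-in: struct agg_replacement, enum value_type and find_replacement are not the real ipa-prop.h definitions. The sketch only shows how adding a type comparison to the (index, offset) match rejects a candidate whose type differs from the destination of the load.

```c
#include <stddef.h>

/* Hypothetical stand-ins for GCC's internal types; these are not the
   real definitions from ipa-prop.h. */
enum value_type { TYPE_POINTER, TYPE_BOOL };

struct agg_replacement {
    struct agg_replacement *next;
    int index;              /* which parameter the value belongs to */
    long offset;            /* offset within the aggregate */
    enum value_type type;   /* type of the known constant */
    long value;
};

/* Return the entry matching parameter index, offset and the type of the
   load's destination, or NULL if no entry matches all three. */
const struct agg_replacement *
find_replacement(const struct agg_replacement *list, int index,
                 long offset, enum value_type lhs_type)
{
    for (const struct agg_replacement *v = list; v; v = v->next)
        if (v->index == index
            && v->offset == offset
            && v->type == lhs_type)
            return v;
    return NULL;
}
```

A lookup with the right index and offset but a different destination type now returns NULL, so no replacement happens. Whether a strict type comparison like this is too restrictive for real GIMPLE is exactly the open question raised in the report.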

[Bug ada/40986] [4.6 regression] Assert_Failure sinfo.adb:360, error detected at a-unccon.ads:23:27

2013-04-17 Thread ludo...@ludovic-brenta.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40986



Ludovic Brenta ludo...@ludovic-brenta.org changed:



   What|Removed |Added



 Status|RESOLVED|REOPENED

  Known to work|4.7.2   |

 Resolution|FIXED   |

  Known to fail||4.7.2



--- Comment #16 from Ludovic Brenta ludo...@ludovic-brenta.org 2013-04-17 
18:30:40 UTC ---

gcc-4.7 -c -I./ -gnato -gnatwl -gnatwauJF -gnatef -g -fno-strict-aliasing

-gnatwA -I- ./test.adb

+===GNAT BUG DETECTED==+

| 4.7.2 (x86_64-linux-gnu) Assert_Failure sinfo.adb:388|

| Error detected at a-unccon.ads:23:27 |





Thanks, Markus, for noticing the interference of gnatchop.  I made the mistake
of gnatchopping the reproducer, which hid the problem.

[Bug target/56866] gcc 4.7.x/gcc-4.8.x with '-O3 -march=bdver2' misscompiles glibc-2.17/crypt/sha512.c

2013-04-17 Thread winfried.mag...@t-online.de



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56866



--- Comment #9 from Winfried Magerl winfried.mag...@t-online.de 2013-04-17 
18:41:06 UTC ---

Hi,



At least one confirmation: I've done some further checks on the float errors
in glibc, and the FMA/FMA4 extensions are responsible for the additional
float errors.

How to proceed?

From my point of view, and compared with '-march=amdfam10', the
extensions XOP/FMA4/FMA are responsible for the failed tests.

Disabling them in gcc-4.8-noxop/gcc/config/i386/i386.c brings me back
to the same test results I'm seeing with amdfam10 (excluding all
sorts of scan-*-errors).

I would propose the following patch for bdver2 support, because
features which are untested and known to break code (for example,
all the additional test errors in the gcc testsuite) should be
disabled:



--- gcc-4.8-noxop/gcc/config/i386/i386.c.orig   2013-04-12 20:49:09.181351855 +0200
+++ gcc-4.8-noxop/gcc/config/i386/i386.c        2013-04-12 23:15:09.112185980 +0200
@@ -2976,9 +2976,9 @@
       {"bdver2", PROCESSOR_BDVER2, CPU_BDVER2,
        PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
        | PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
-       | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4
-       | PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
-       | PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE},
+       | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
+       | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
+       | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE},
       {"bdver3", PROCESSOR_BDVER3, CPU_BDVER3,
        PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
        | PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1



This is just an example, because the features should be disabled in bdver1/3
too (XOP/FMA4/FMA are only available in bdver1/2/3). Maybe add the
gcc developers from @amd.com?



regards



winfried

[Bug c/56989] New: wrong location in error message

2013-04-17 Thread tromey at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56989

 Bug #: 56989
   Summary: wrong location in error message
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: tro...@gcc.gnu.org


Consider this program:

extern void voidf(void);
extern int intf(void);

int check(void)
{
  if (voidf() < 0
      || intf() < 0)
    return -1;
  return 0;
}


I compiled it with a recent git gcc and got:

barimba. gcc --syntax-only qq.c
qq.c: In function ‘check’:
qq.c:7:7: error: void value not ignored as it ought to be
       || intf() < 0)
       ^

I think the error message would be more helpful if it pointed
to the call to voidf.

[Bug sanitizer/56990] New: ICE: SIGFPE with -fsanitize=thread and empty struct

2013-04-17 Thread zsojka at seznam dot cz



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56990



 Bug #: 56990

   Summary: ICE: SIGFPE with -fsanitize=thread and empty struct

Classification: Unclassified

   Product: gcc

   Version: 4.9.0

Status: UNCONFIRMED

  Severity: normal

  Priority: P3

 Component: sanitizer

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: zso...@seznam.cz

CC: do...@gcc.gnu.org, dvyu...@gcc.gnu.org,

ja...@gcc.gnu.org, k...@gcc.gnu.org





Created attachment 29891

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29891

reduced testcase



Compiler output:

$ gcc -fsanitize=thread testcase.c

testcase.c: In function 'foo':

testcase.c:3:6: internal compiler error: Floating point exception

 void foo(struct S *p)

  ^

0xa24dbf crash_signal

/mnt/svn/gcc-trunk/gcc/toplev.c:332

0xa3af12 instrument_expr

/mnt/svn/gcc-trunk/gcc/tsan.c:134

0xa3c406 instrument_gimple

/mnt/svn/gcc-trunk/gcc/tsan.c:612

0xa3c406 instrument_memory_accesses

/mnt/svn/gcc-trunk/gcc/tsan.c:635

0xa3c406 tsan_pass

/mnt/svn/gcc-trunk/gcc/tsan.c:700

Please submit a full bug report,

with preprocessed source if appropriate.

Please include the complete backtrace with any bug report.

See http://gcc.gnu.org/bugs.html for instructions.





Program received signal SIGFPE, Arithmetic exception.

0x00a3af12 in instrument_expr (gsi=..., expr=0x76d9f000,

is_write=is_write@entry=true) at /mnt/svn/gcc-trunk/gcc/tsan.c:134

134   if (bitpos % (size * BITS_PER_UNIT)



Tested revisions:

r198018 - crash

4.8 r196898 - crash
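
The faulting line takes a remainder with size * BITS_PER_UNIT as the divisor. One plausible reading, which is an assumption and not something this report confirms, is that size is zero because GNU C gives an empty struct a size of zero, so the modulus divides by zero and raises SIGFPE. A minimal C sketch of that failure mode and a guard follows; access_is_misaligned is a hypothetical helper, not code from tsan.c, and CHAR_BIT stands in for BITS_PER_UNIT.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper modelled loosely on the failing check; the name,
   signature and early return are assumptions, not code from tsan.c. */
bool access_is_misaligned(long bitpos, long size_bytes)
{
    /* A zero-sized object (e.g. an empty struct in GNU C) would make the
       modulus below divide by zero, which raises SIGFPE on x86. */
    if (size_bytes == 0)
        return false;
    return bitpos % (size_bytes * CHAR_BIT) != 0;
}
```

Returning false for a zero-sized access is only one possible policy; skipping instrumentation of such accesses altogether would be another.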

[Bug fortran/40958] module files too large

2013-04-17 Thread Joost.VandeVondele at mat dot ethz.ch



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40958



--- Comment #11 from Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch 
2013-04-17 19:36:45 UTC ---

With these patches in, parallel compilation of multi-file cp2k becomes

significantly faster. Time for a full build goes from 70s to 50s. I think that

in a parallel build the IO bottleneck (bandwidth) was significant, while this

is now much improved. The effect will likely be even larger on mounted

filesystems.

[Bug ada/56909] [4.8 regression] s-atopri.adb:multiple undefined references on mingw32

2013-04-17 Thread mail2arthur at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56909



Arthur Zhang mail2arthur at gmail dot com changed:



   What|Removed |Added



 Status|RESOLVED|NEW

 Resolution|WONTFIX |



--- Comment #12 from Arthur Zhang mail2arthur at gmail dot com 2013-04-17 
19:44:34 UTC ---

I can build successfully with either '--with-arch=i686 --build=mingw32' or

'--build=i686-pc-mingw32', but as I mentioned in comment 10, change build

target cause packaging issue. 



What is the benefit to use '--build=i686-pc-mingw32' than '--with-arch=i686'?



Thanks.

[Bug c++/56991] New: constexpr std::initializer_list crashes on too complex initialization

2013-04-17 Thread morwenn29 at hotmail dot fr



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56991



 Bug #: 56991

   Summary: constexpr std::initializer_list crashes on too complex

initialization

Classification: Unclassified

   Product: gcc

   Version: 4.8.0

Status: UNCONFIRMED

  Severity: normal

  Priority: P3

 Component: c++

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: morwen...@hotmail.fr





I found some strange behaviour that, after a discussion on StackOverflow, seems

to be a bug (discussion here:

http://stackoverflow.com/questions/16057690/confusion-about-constant-expressions/16068953?noredirect=1#16068953).



It seems that GCC implements N3471, which means that every member function of
std::initializer_list is constexpr. When passing simple constexpr
values in the initializer_list, it works fine:



#include <array>
#include <initializer_list>

int main()
{
    constexpr std::array<int, 3> a = {{ 1, 2, 3 }};
    constexpr int a0 = a[0];
    constexpr int a1 = a[1];
    constexpr int a2 = a[2];
    constexpr std::initializer_list<int> b = { a0, a1, a2 };

    return 0;
}



However, without the intermediate variables a0, a1 and a2, the example above

crashes:



#include <array>
#include <initializer_list>

int main()
{
    constexpr std::array<int, 3> a = {{ 1, 2, 3 }};
    constexpr std::initializer_list<int> b = { a[0], a[1], a[2] };

    return 0;
}



The error is the following one:



error: 'const std::initializer_list<int>{((const int*)(anonymous)), 3u}' is
not a constant expression



This last example works fine if I remove the constexpr qualifier at the

beginning of the line or if I replace the initializer_list by a std::array. It

seems that the bug is only triggered when using std::initializer_list with

constexpr.

[Bug target/56866] gcc 4.7.x/gcc-4.8.x with '-O3 -march=bdver2' misscompiles glibc-2.17/crypt/sha512.c

2013-04-17 Thread mikpe at it dot uu.se



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56866



--- Comment #10 from Mikael Pettersson mikpe at it dot uu.se 2013-04-17 
20:15:47 UTC ---

(In reply to comment #9)
> How to proceed?

Derive a stand-alone test case from the failing glibc module and whatever glibc
code it requires, then minimize it.

[Bug ada/56909] [4.8 regression] s-atopri.adb:multiple undefined references on mingw32

2013-04-17 Thread ebotcazou at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56909



Eric Botcazou ebotcazou at gcc dot gnu.org changed:



   What|Removed |Added



 Status|NEW |RESOLVED

 Resolution||WONTFIX



--- Comment #13 from Eric Botcazou ebotcazou at gcc dot gnu.org 2013-04-17 
20:41:54 UTC ---

> I can build successfully with either '--with-arch=i686 --build=mingw32' or
> '--build=i686-pc-mingw32', but as I mentioned in comment 10, change build
> target cause packaging issue.

Too bad, but I don't think this will ultimately change the decision, as
i686-pc-mingw32 is the standard triplet for Windows these days.

> What is the benefit to use '--build=i686-pc-mingw32' than '--with-arch=i686'?

It doesn't force -march=i686 by default.

[Bug ada/56909] [4.8 regression] s-atopri.adb:multiple undefined references on mingw32

2013-04-17 Thread mail2arthur at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56909



--- Comment #14 from Arthur Zhang mail2arthur at gmail dot com 2013-04-17 
21:02:14 UTC ---

(In reply to comment #13)
> > What is the benefit to use '--build=i686-pc-mingw32' than
> > '--with-arch=i686'?
>
> It doesn't force -march=i686 by default.

Why does the output below have '-march=pentiumpro'?



bash-3.1$ gcc -v  -o t.exe ./test.c
Using built-in specs.
COLLECT_GCC=c:\MinGW\bin\gcc.exe
COLLECT_LTO_WRAPPER=c:/mingw/bin/../libexec/gcc/i686-pc-mingw32/4.8.0/lto-wrapper.exe
Target: i686-pc-mingw32
Configured with: ../gcc-4.8.0/configure --enable-languages=c,c++,ada,fortran,objc,obj-c++ --disable-sjlj-exceptions --with-dwarf2 --enable-shared --enable-libgomp --disable-win32-registry --enable-libstdcxx-debug --enable-version-specific-runtime-libs --build=i686-pc-mingw32 --prefix=/mingw
Thread model: win32
gcc version 4.8.0 (GCC)
COLLECT_GCC_OPTIONS='-v' '-o' 't.exe' '-mtune=generic' '-march=pentiumpro'
...

[Bug target/56866] gcc 4.7.x/gcc-4.8.x with '-O3 -march=bdver2' misscompiles glibc-2.17/crypt/sha512.c

2013-04-17 Thread winfried.mag...@t-online.de



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56866



--- Comment #11 from Winfried Magerl winfried.mag...@t-online.de 2013-04-17 
21:02:38 UTC ---

Hi Mike,



On Wed, Apr 17, 2013 at 08:15:47PM +0000, mikpe at it dot uu.se wrote:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56866
>
> --- Comment #10 from Mikael Pettersson mikpe at it dot uu.se 2013-04-17
> 20:15:47 UTC ---
> (In reply to comment #9)
> > How to proceed?
>
> Derive a stand-alone test case from the failing glibc module and whatever
> glibc code it requires, then minimize it.



If fixing gcc's broken XOP/FMA/FMA4 extensions on AMD CPUs depends on my
ability to extract a stand-alone test from the glibc testsuite, then I'm
really sorry for not having the necessary skills (as already stated).

Why not simply use the failing test cases from the gcc testsuite,
which are all stand-alone and depend on XOP:

+FAIL: gcc.c-torture/execute/pr51581-1.c execution,  -O3 -fomit-frame-pointer
+FAIL: gcc.c-torture/execute/pr51581-1.c execution,  -O3 -fomit-frame-pointer -funroll-loops
+FAIL: gcc.c-torture/execute/pr51581-1.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
+FAIL: gcc.c-torture/execute/pr51581-1.c execution,  -O3 -g
+FAIL: gcc.c-torture/execute/pr51581-2.c execution,  -O3 -fomit-frame-pointer
+FAIL: gcc.c-torture/execute/pr51581-2.c execution,  -O3 -fomit-frame-pointer -funroll-loops
+FAIL: gcc.c-torture/execute/pr51581-2.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
+FAIL: gcc.c-torture/execute/pr51581-2.c execution,  -O3 -g
+FAIL: gcc.c-torture/execute/pr53645.c execution,  -O1
+FAIL: gcc.c-torture/execute/pr53645.c execution,  -O2
+FAIL: gcc.c-torture/execute/pr53645.c execution,  -O3 -fomit-frame-pointer
+FAIL: gcc.c-torture/execute/pr53645.c execution,  -O3 -fomit-frame-pointer -funroll-loops
+FAIL: gcc.c-torture/execute/pr53645.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
+FAIL: gcc.c-torture/execute/pr53645.c execution,  -O3 -g
+FAIL: gcc.c-torture/execute/pr53645.c execution,  -Os
+FAIL: gcc.c-torture/execute/pr53645.c execution,  -Og -g
+FAIL: gcc.c-torture/execute/pr53645.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none
+FAIL: gcc.c-torture/execute/pr53645.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
+FAIL: gcc.dg/vect/pr51581-1.c execution test
+FAIL: gcc.dg/vect/pr51581-2.c execution test
+FAIL: gcc.dg/vect/pr51581-3.c execution test
+FAIL: gcc.dg/vect/pr51581-1.c -flto execution test
+FAIL: gcc.dg/vect/pr51581-2.c -flto execution test
+FAIL: gcc.dg/vect/pr51581-3.c -flto execution test
+FAIL: gcc.target/i386/avx-mul-1.c execution test
+FAIL: gcc.target/i386/avx-pr51581-1.c execution test
+FAIL: gcc.target/i386/avx-pr51581-2.c execution test
+FAIL: gcc.target/i386/sse2-mul-1.c execution test
+FAIL: gcc.target/i386/sse4_1-mul-1.c execution test



Or is this a formal problem, because the subject does not really match
the whole problem, which looks like a more general problem with
extensions specific to bdver1/2/3 (and therefore not reproducible
on other CPUs)?



regards



winfried

[Bug c/56989] wrong location in error message

2013-04-17 Thread manu at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56989

Manuel López-Ibáñez manu at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2013-04-17
 CC||manu at gcc dot gnu.org
 Ever Confirmed|0   |1

--- Comment #1 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 
21:08:52 UTC ---
Index: c-typeck.c
===
--- c-typeck.c  (revision 198021)
+++ c-typeck.c  (working copy)
@@ -1981,11 +1981,12 @@ default_conversion (tree exp)
   if (TREE_NO_WARNING (orig_exp))
 TREE_NO_WARNING (exp) = 1;

   if (code == VOID_TYPE)
 {
-      error ("void value not ignored as it ought to be");
+      error_at (EXPR_LOC_OR_HERE (exp),
+                "void value not ignored as it ought to be");
   return error_mark_node;
 }

   exp = require_complete_type (exp);
   if (exp == error_mark_node)



/home/manuel/void.c:6:12: error: void value not ignored as it ought to be
   if (voidf() < 0
            ^

The location could be even better, but that is what the c-parser records.

I like Clang's diagnostic much more:

/home/manuel/void.c:6:15: error: invalid operands to binary expression ('void'
and 'int')
  if (voidf() < 0
      ~~~~~~~ ^ ~

It is similar to what g++ produces:

/home/manuel/void.c:6:17: error: invalid operands of types ‘void’ and ‘int’ to
binary ‘operator<’
   if (voidf() < 0
                 ^

but with better locations.

[Bug ada/56909] [4.8 regression] s-atopri.adb:multiple undefined references on mingw32

2013-04-17 Thread ebotcazou at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56909



--- Comment #15 from Eric Botcazou ebotcazou at gcc dot gnu.org 2013-04-17 
21:35:54 UTC ---

> Why below output has '-march=pentiumpro'?



I think it's the autodetected arch, but maybe I'm confused.  Never mind.

[Bug target/56866] gcc 4.7.x/gcc-4.8.x with '-O3 -march=bdver2' misscompiles glibc-2.17/crypt/sha512.c

2013-04-17 Thread glisse at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56866



--- Comment #12 from Marc Glisse glisse at gcc dot gnu.org 2013-04-17 
21:49:15 UTC ---

(In reply to comment #11)

> If fixing broken gcc's XOP/FMA/FMA4-extensions on AMD-CPUs depends on my

> ability to extract a stand-alone-test from glibc-testsuite then I'm

> really sorry for not having the necessary skills (as already stated).



Skills can be learned, and the best way is through practice. Ideally someone

with the right combination of knowledge, hardware and free time would look at

it, and you seem to be the closest currently ;-)



> Why not simply use the failing test-cases from gcc-testsuite

> which are all standalone and depend on XOP:



Good idea. I suggest you pick a simple one:



> +FAIL: gcc.target/i386/sse2-mul-1.c execution test



it looks like a list of several tests in a row. If you can first replace the

aborts with printf to determine the first one that fails, then remove

everything after that point, you have already narrowed the issue quite a bit.

Then you can try to simplify what remains. Ideally, you would get a program

small enough that posting the dumps would show the obvious issue. Do make sure

while reducing the program that it still works correctly without the bdver2

option.
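
The first reduction step described above (replace the aborts with printf so the first failing check becomes visible) could look roughly like the sketch below. The CHECK macro, the failures counter and the mul3 stand-in are hypothetical names used only for illustration; they are not taken from sse2-mul-1.c.

```c
#include <stdio.h>

/* Hypothetical replacement for checks of the form
   "if (got != want) abort ();": report the failure and keep going,
   so the first failing check can be read off the output.  */
static int failures;

#define CHECK(index, got, want)                                  \
  do {                                                           \
    if ((got) != (want)) {                                       \
      printf ("check %d failed: got %ld, want %ld\n",            \
              (index), (long) (got), (long) (want));             \
      failures++;                                                \
    }                                                            \
  } while (0)

/* Stand-in for one of the computations under test (an assumption,
   not the real test body).  */
static int mul3 (int x) { return x * 3; }

/* Runs all checks and returns the number of failures.  */
int run_checks (void)
{
  failures = 0;
  CHECK (1, mul3 (2), 6);
  CHECK (2, mul3 (-4), -12);
  return failures;
}
```

Once the first failing index is known, everything after that check can be deleted and the remainder simplified, rebuilding with and without the bdver2 option after each step.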

[Bug rtl-optimization/56847] [4.8/4.9 Regression] '-fpie' triggers - internal compiler error: in gen_add2_insn, at optabs.c:4705

2013-04-17 Thread shenhan at google dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56847



--- Comment #8 from Han Shen shenhan at google dot com 2013-04-17 23:42:22 
UTC ---

Hi, any progress on this?



Thanks!

[Bug c/56992] New: building Wine with -Og causes GCC to seg fault

2013-04-17 Thread jimportal at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56992

 Bug #: 56992
   Summary: building Wine with -Og causes GCC to seg fault
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: jimpor...@gmail.com


GCC seg faults when building Wine with -Og.  The Wine developers pointed me at
http://gcc.gnu.org/wiki/A_guide_to_testcase_reduction which helped me reduce
the problem down to the attached file (~14 lines).

$ gcc -Og -c testcase-min.c
testcase-min.c: In function ‘DnsHostnameToComputerNameA’:
testcase-min.c:13:1: internal compiler error: Segmentation fault
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See https://bugs.archlinux.org/ for instructions.

I'm using gcc (GCC) 4.8.0 20130411 (prerelease) as supplied by ArchLinux.

My processor is an AMD Phenom(tm) II X4 955

For hints about how GCC was built, you can look for the configure line here:
https://projects.archlinux.org/svntogit/community.git/tree/trunk/PKGBUILD?h=packages/gcc-multilib

[Bug c/56992] building Wine with -Og causes GCC to seg fault

2013-04-17 Thread jimportal at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56992



--- Comment #1 from James Eder jimportal at gmail dot com 2013-04-17 23:47:39 
UTC ---

Created attachment 29892

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29892

testcase-min.c

[Bug target/56993] New: power gcc built 416.gamess generates wrong result

2013-04-17 Thread carrot at google dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56993



 Bug #: 56993

   Summary: power gcc built 416.gamess generates wrong result

Classification: Unclassified

   Product: gcc

   Version: 4.9.0

Status: UNCONFIRMED

  Severity: normal

  Priority: P3

 Component: target

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: car...@google.com

  Host: powerpc-linux-gnu

Target: powerpc-linux-gnu

 Build: powerpc-linux-gnu





When I use the trunk gcc to run spec2006 416.gamess, I get the following error:



$ runspec --config=test.cfg --tune=base --size=test --nofeedback --noreportable

game

runspec v6152 - Copyright 1999-2008 Standard Performance Evaluation Corporation

Using 'linux-ydl23-ppc' tools

Reading MANIFEST... 18357 files

Loading runspec modules

Locating benchmarks...found 31 benchmarks in 6 benchsets.

Reading config file '/usr/local/google/carrot/spec2006/config/test.cfg'

Benchmarks selected: 416.gamess

Compiling Binaries

  Building 416.gamess base Linux64 default: (build_base_Linux64.)



Build successes: 416.gamess(base)



Setting Up Run Directories

  Setting up 416.gamess test base Linux64 default: created

(run_base_test_Linux64.)

Running Benchmarks

  Running (#1) 416.gamess test base Linux64 default





Contents of exam29.err



STOP IN ABRT







*** Miscompare of exam29.out; for details see

   

/usr/local/google/carrot/spec2006/benchspec/CPU2006/416.gamess/run/run_base_test_Linux64./exam29.out.mis

Invalid run; unable to continue.

If you wish to ignore errors please use '-I' or ignore_errors



The log for this run is in

/usr/local/google/carrot/spec2006/result/CPU2006.111.log

The debug log for this run is in

/usr/local/google/carrot/spec2006/result/CPU2006.111.log.debug



*

* Temporary files were NOT deleted; keeping temporaries such as

* /usr/local/google/carrot/spec2006/result/CPU2006.111.log.debug and

* /usr/local/google/carrot/spec2006/tmp/CPU2006.111

* (These may be large!)

*

runspec finished at Wed Apr 17 16:37:27 2013; 93 total seconds elapsed







My gcc is configured as



$ gcc -v

Using built-in specs.

COLLECT_GCC=gcc

COLLECT_LTO_WRAPPER=/usr/lib/gcc/powerpc-linux-gnu/4.6/lto-wrapper

Target: powerpc-linux-gnu

Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.2-12'

--with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs

--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr

--program-suffix=-4.6 --enable-shared --enable-linker-build-id

--with-system-zlib --libexecdir=/usr/lib --without-included-gettext

--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6

--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug

--enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --enable-secureplt

--disable-softfloat --enable-targets=powerpc-linux,powerpc64-linux

--with-cpu=default32 --with-long-double-128 --enable-checking=release

--build=powerpc-linux-gnu --host=powerpc-linux-gnu --target=powerpc-linux-gnu

Thread model: posix

gcc version 4.6.2 (Debian 4.6.2-12)





GCC4.8 has the same error, but gcc4.7 is good.

[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()

2013-04-17 Thread bugfeed at online dot de



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982



--- Comment #10 from Leif Leonhardy bugfeed at online dot de 2013-04-18 
01:17:31 UTC ---

One proposed requirement on setjmp is that it be usable like any other

function, that is, that it be callable in *any* expression context, and that

the expression evaluate correctly whether the return from setjmp is direct or

via a call to longjmp. Unfortunately, any implementation of setjmp as a

conventional called function cannot know enough about the calling environment

to save any temporary registers or dynamic stack locations used part way

through an expression evaluation. [...] The temporaries may be correct on the

initial call to setjmp, but are not likely to be on any return initiated by a

corresponding call to longjmp. These considerations dictated the constraint

that setjmp be called only from within fairly simple expressions, ones not

likely to need temporary storage.



An alternative proposal considered by the C89 Committee was to require that

implementations recognize that calling setjmp is a special case, and hence that

they take whatever precautions are necessary to restore the setjmp environment

properly upon a longjmp call. This proposal was rejected on grounds of

consistency: implementations are currently allowed to implement library

functions specially, but no other situations require special treatment.





So according to this (The C99 Rationale [1], page 139 ff., likewise the Single

UNIX Specification), here setjmp() is simply used in an invalid context (i.e.,

in an assignment statement). ;-)



Still, with -Og at least, GCC 4.8.0 produces wrong code even if setjmp() is

used in an allowed context (as in e.g. if (setjmp(...)0) ..., or switch

(setjmp(...)) { ... }), and no matter whether n is declared volatile or not.
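
For reference, a minimal sketch of setjmp() used in one of the contexts the Rationale permits (as the entire controlling expression of a switch) is given below. The function name count_returns and its limit parameter are made up for this illustration; this is not the testcase from the bug report.

```c
#include <setjmp.h>

static jmp_buf env;

/* Jumps back to the setjmp call until the counter reaches LIMIT,
   then returns the counter.  */
int count_returns (int limit)
{
  /* volatile: the variable is modified between setjmp and longjmp,
     so a plain automatic variable would be indeterminate after the
     jump.  */
  volatile int n = 0;

  /* Allowed context: setjmp is the entire controlling expression.  */
  switch (setjmp (env))
    {
    case 0:   /* direct return */
    default:  /* return via longjmp */
      break;
    }

  if (n < limit)
    {
      n = n + 1;
      longjmp (env, 1);
    }
  return n;
}
```

With a conforming compiler count_returns (3) should yield 3; the report above is that -Og in 4.8.0 can get comparable code wrong.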





[1] http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf

[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well

2013-04-17 Thread jvdelisle at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981



--- Comment #6 from Jerry DeLisle jvdelisle at gcc dot gnu.org 2013-04-18 
01:21:42 UTC ---

I like Janne's idea with the flags.  Also, it seems that at the time we open a

file we know it is /dev/null or /dev/nul in some cases by the file name.  It

would be very low overhead in a few cases to disable some or all checks and

even disable the writing completely.  We would not get all situations, but the

low hanging fruit we could.  It could be done by setting a NULL bit.
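
A rough sketch of that idea, in plain C rather than libgfortran's real data structures: the struct unit type, its null_device bit and both functions are hypothetical, and only the two device names come from the paragraph above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-unit state; not libgfortran's unit structure.  */
struct unit
{
  bool null_device;     /* the "NULL bit" set at open time */
  long bytes_written;   /* stands in for real buffered output */
};

/* Decide once, at open time, whether output can be discarded.  */
void unit_open (struct unit *u, const char *name)
{
  u->null_device = strcmp (name, "/dev/null") == 0
                   || strcmp (name, "/dev/nul") == 0;
  u->bytes_written = 0;
}

/* Skip all work when the unit is known to be the null device.  */
long unit_write (struct unit *u, const char *buf, long len)
{
  (void) buf;
  if (!u->null_device)
    u->bytes_written += len;
  return len;
}
```

A name comparison like this only catches the obvious cases (no symlinks, no other spellings), which matches the "low hanging fruit" caveat.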



One could consider doing this at compile time in some cases where the frontend

could have more elaborate configuration checks that determine the name of the

null device on the target system and look for its use (probably not really

worth it for NULL I/O).



The other idea to consider is a compiler flag, say -fast-IO or similar that

also disables the extra error checking that is not critical to runtime after a

program has been debugged.

[Bug c/56682] -fsanitize documentation

2013-04-17 Thread pinskia at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56682



--- Comment #1 from Andrew Pinski pinskia at gcc dot gnu.org 2013-04-18 
01:58:03 UTC ---

-fsanitize=thread 



I think it requires -fPIE but really it should not.

[Bug fortran/56994] New: Incorrect documentation for Fortran NEAREST intrinsic function

2013-04-17 Thread spam.brian.taylor at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56994



 Bug #: 56994

   Summary: Incorrect documentation for Fortran NEAREST intrinsic

function

Classification: Unclassified

   Product: gcc

   Version: unknown

Status: UNCONFIRMED

  Severity: trivial

  Priority: P3

 Component: fortran

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: spam.brian.tay...@gmail.com





The GNU Fortran documentation** for the Fortran intrinsic function NEAREST(X,S)

says that S is an optional argument.  It is not optional according to the

Fortran standard.  It is implemented correctly in gfortran, so this is only an

error in the documentation.



** http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gfortran/NEAREST.html

Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output

2013-04-17 Thread Richard Henderson


On 2013-04-16 13:55, Jakub Jelinek wrote:

> On Tue, Apr 16, 2013 at 03:41:52PM +0400, Maksim Kuznetsov wrote:
>> Richard, Jeff, could you please have a look?
>
> I wonder if %{ and %} shouldn't be better handled in final.c
> for all #ifdef ASSEMBLER_DIALECT targets, rather than just for one specific.


Yes, please.


r~

Re: [patch] Fix PR middle-end/56474

2013-04-17 Thread Richard Biener

On Wed, Apr 17, 2013 at 1:12 AM, Eric Botcazou ebotca...@adacore.com wrote:
>> For the C family I found exactly one - the layout_type case, and fixed
>> it in the FEs by making empty arrays use [1, 0] domains or signed domains
>> (I don't remember exactly).  I believe the layout_type change was to make
>> Ada happy.
>
> I'm skeptical, I had to narrow down the initial kludge because it hurt Ada.
>
>> It may be that enabling overflow detection for even unsigned sizetype was
>> because of Ada as well.  After all only Ada changed its sizetype sign
>> recently.
>
> Not true, overflow detection has _always_ been enabled for sizetypes.
> But sizetypes, including unsigned ones, were sign-extended so 0 - 1 didn't
> overflow and we need that behavior back for Ada to work properly,

Yeah, well - they were effectively signed.

>> I don't like special casing 0 - 1 in a general compute function.  Maybe
>> you want to use size_diffop for the computation?  That would result in
>> a signed result and thus no overflow for 0 - 1.
>
> But it's not a general compute function, it's size_binop which is meant to be
> used for sizetypes only and which forces overflow on unsigned types.  We need
> overflow detection for sizetypes but we can also tailor it to fit our needs.

I'm not against tailoring it to fit our needs - I'm just against special casing
behavior for specific values.  That just sounds wrong.

Maybe we should detect overflow as if the input and output were signed
while computing an unsigned result.  As far as I can see int_const_binop_1
does detect overflow as if operations were signed (it passes 'false' as
uns to all double-int operations rather than TYPE_UNSIGNED).
For example sub_with_overflow simply does

  neg_double (b.low, b.high, &ret.low, &ret.high);
  add_double (low, high, ret.low, ret.high, &ret.low, &ret.high);
  *overflow = OVERFLOW_SUM_SIGN (ret.high, b.high, high);

which I believe is wrong.  Shouldn't it be

  neg_double (b.low, b.high, &ret.low, &ret.high);
  HOST_WIDE_INT tem = ret.high;
  add_double (low, high, ret.low, ret.high, &ret.low, &ret.high);
  *overflow = OVERFLOW_SUM_SIGN (ret.high, tem, high);

?  Because we are computing a + (-b) and thus OVERFLOW_SUM_SIGN
expects the sign of a and -b, not a and b to verify against the
sign of ret.
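
To make the difference concrete, here is a small stand-alone sketch on 64-bit integers instead of double_int. The helper names are invented for this illustration, and overflow_sum_sign is only modelled on what OVERFLOW_SUM_SIGN is understood to do (overflow if both addends have the same sign and the sum has a different one); it is not the GCC macro itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a + b = sum overflowed in signed arithmetic: the addends
   share a sign and the sum's sign differs from it.  */
static bool overflow_sum_sign (int64_t a, int64_t b, int64_t sum)
{
  return (~(a ^ b) & (a ^ sum)) < 0;
}

/* Wrapping a + (-b), done in unsigned arithmetic to avoid undefined
   behaviour; the conversions back to int64_t assume the usual
   two's-complement wraparound.  */
static int64_t wrap_sub (int64_t a, int64_t b, int64_t *neg_b)
{
  *neg_b = (int64_t) (0 - (uint64_t) b);
  return (int64_t) ((uint64_t) a + (uint64_t) *neg_b);
}

/* Check as currently written: passes the sign of b.  */
bool sub_overflows_sign_of_b (int64_t a, int64_t b)
{
  int64_t neg_b;
  int64_t sum = wrap_sub (a, b, &neg_b);
  return overflow_sum_sign (a, b, sum);
}

/* Proposed check: passes the sign of -b, the value actually added.
   Overflow of the negation itself (b == INT64_MIN) is not covered
   by this sketch.  */
bool sub_overflows_sign_of_neg_b (int64_t a, int64_t b)
{
  int64_t neg_b;
  int64_t sum = wrap_sub (a, b, &neg_b);
  return overflow_sum_sign (a, neg_b, sum);
}
```

For 0 - 1 the first variant reports an overflow and the second does not, while both agree that INT64_MIN - 1 overflows.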

>> The other option is to for example disable overflow handling for _all_
>> constants and MINUS_EXPR (and then please PLUS_EXPR as well)
>> in size_binop.  Maybe it's only the MULT_EXPR overflow we want to
>> know (byte-to-bit conversion / element size scaling IIRC).
>
> Well, we just need 0 - 1 because of the way we compute size expressions for
> variable-sized arrays.

I'm sceptical.  Where do you compute the size expression for variable-sized
arrays?  I suppose with the testcase in the initial patch I can then inspect
myself what actually happens?

Thanks,
Richard.

 --
 Eric Botcazou

Re: [patch] simplify emit_delay_sequence

2013-04-17 Thread Eric Botcazou

> This patch is also necessary for my new delay-slot scheduler to keep
> basic block boundaries correctly up-to-date. The emit-rtl API does
> that already.
>
> Cross-tested powerpc64 x mips. Currently running bootstrap&test on
> sparc64-unknown-linux-gnu. OK if it passes?

Yes, modulo

@@ -538,6 +502,8 @@ emit_delay_sequence (rtx insn, rtx list, int lengt
INSN_LOCATION (seq_insn) = INSN_LOCATION (tem);
   INSN_LOCATION (tem) = 0;
 
+  /* Remove any REG_DEAD notes because we can't rely on them now
+that the insn has been moved.  */
   for (note = REG_NOTES (tem); note; note = next)
{
  next = XEXP (note, 1);

Did you mean to move the comment instead of duplicating it?

-- 
Eric Botcazou

Re: [patch] Fix ICE during RTL expansion at -O1

2013-04-17 Thread Eric Botcazou

> +  if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
> +    goto may_overlap;
>
> ick, can TREE_CODE (type1) != RECORD_TYPE happen as well here?
> Please add a comment similar to the Fortran ??? above.

It can happen because we stop at unions (and qualified unions) and for them we 
cannot disambiguate based on the fields.  I'll add a regular comment.

> Can you please also add at least one testcase as
> gcc.dg/tree-ssa/ssa-fre-??.c that tests the functionality of this and that
> wasn't handled before? I suppose it would be sth like
>
> struct S { int i; int j; };
> struct U
> {
>   struct S a[10];
> } u;
>
> u.a[n].i = i;
> u.a[n].j = j;
> return u.a[n].i;
>
> where we miss to CSE the load from u.a[n].i.

Yes, the patch does eliminate the redundant load in .fre1:

  u.a[n_2(D)].i = i_3(D);
  u.a[n_2(D)].j = j_5(D);
  _7 = u.a[n_2(D)].i;
  return _7;

becomes:

  u.a[n_2(D)].i = i_3(D);
  u.a[n_2(D)].j = j_5(D);
  _7 = i_3(D);
  return _7;
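
Written out as a complete function, the fragment quoted above might look like the sketch below. The function name foo and its parameter list are assumptions, and the dg-* directives a real gcc.dg/tree-ssa testcase would carry are deliberately left out.

```c
struct S { int i; int j; };
struct U
{
  struct S a[10];
} u;

/* The store to u.a[n].j cannot clobber u.a[n].i, so the final load
   is redundant and can be replaced by the value of I.  */
int foo (int n, int i, int j)
{
  u.a[n].i = i;
  u.a[n].j = j;
  return u.a[n].i;
}
```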

> Otherwise the patch is ok.

Thanks.

-- 
Eric Botcazou

Re: [PATCH] Add a new option -fstack-protector-strong

2013-04-17 Thread Florian Weimer


On 04/17/2013 02:49 AM, Han Shen wrote:

+  if (flag_stack_protect == 3)
+    cpp_define (pfile, "__SSP_STRONG__=3");
   if (flag_stack_protect == 2)
     cpp_define (pfile, "__SSP_ALL__=2");


3 and 2 should be replaced by SPCT_FLAG_STRONG and SPCT_FLAG_ALL.

I define these SPCT_FLAG_XXX in cfgexpand.c locally, so they are not
visible to c-cppbuiltin.c, do you suggest define these inside
c-cppbuiltin.c also?


I see.  Let's use the constants for now.
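
For illustration only, the mapping under discussion could be written with named constants as in the sketch below. SPCT_FLAG_ALL and SPCT_FLAG_STRONG are the names mentioned above with the values 2 and 3 from the quoted hunk; SPCT_FLAG_DEFAULT = 1, the "__SSP__=1" string and the helper ssp_macro_for are assumptions added for this example.

```c
#include <stddef.h>

/* Values 2 and 3 follow the quoted patch; 1 is an assumption.  */
enum
{
  SPCT_FLAG_DEFAULT = 1,
  SPCT_FLAG_ALL = 2,
  SPCT_FLAG_STRONG = 3
};

/* Hypothetical helper: returns the macro definition string for a
   given stack-protector level, or NULL when protection is off.  */
const char *ssp_macro_for (int flag_stack_protect)
{
  switch (flag_stack_protect)
    {
    case SPCT_FLAG_DEFAULT:
      return "__SSP__=1";
    case SPCT_FLAG_ALL:
      return "__SSP_ALL__=2";
    case SPCT_FLAG_STRONG:
      return "__SSP_STRONG__=3";
    default:
      return NULL;
    }
}
```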


Indentation is off (unless both mail clients I tried are clobbering your
patch).  I think the GNU coding style prohibits the braces around the
single-statement body of the outer 'for'.


Done with indentation properly on and removed the braces. (GMail
composing window drops all the tabs when pasting... I have to use
Thunderbird to paste the patch. Hope it is right this time)


Thunderbird mangles patches as well, but I was able to repair the 
damage.  When using Thunderbird, please send the patch as a text file 
attachment.  You can put the changelog snippets at the beginning of the 
file as well.  This way, everything is sent out unchanged.



Can you make the conditional more similar to the comment, perhaps using a
switch statement on the value of the flag_stack_protect variable? That's
going to be much easier to read.


Re-coded. Now using 'switch-case'.


Thanks.  I think the comment is now redundant because it matches the 
code almost word-for-word. 8-)



No for 'struct-returning' functions. But I regard this as not an issue ---
at the programming level, there is no way to get one's hand on the
address of a returned structure ---
struct Node foo();
struct Node *p = &foo();  // compiler error - lvalue required as
unary '&' operand.


C++ const references can bind to rvalues.

But I'm more worried about the interaction with the return value 
optimization.  Consider this C++ code:


struct S {
  S();
  int a;
  int b;
  int c;
  int d;
  int e;
};

void f1(int *);

S f2()
{
  S s;
  f1(&s.a);
  return s;
}

S g2();

void g3()
{
  S s = g2();
}

void g3b(const S&);

void g3b()
{
  g3b(g2());
}

With your patch and -O2 -fstack-protector-strong, this generates the 
following assembly:


	.globl	_Z2f2v
	.type	_Z2f2v, @function
_Z2f2v:
.LFB0:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	.cfi_offset 3, -16
	movq	%rdi, %rbx
	call	_ZN1SC1Ev
	movq	%rbx, %rdi
	call	_Z2f1Pi
	movq	%rbx, %rax
	popq	%rbx
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE0:
	.size	_Z2f2v, .-_Z2f2v
	.p2align 4,,15
	.globl	_Z2g3v
	.type	_Z2g3v, @function
_Z2g3v:
.LFB1:
	.cfi_startproc
	subq	$40, %rsp
	.cfi_def_cfa_offset 48
	movq	%rsp, %rdi
	call	_Z2g2v
	addq	$40, %rsp
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE1:
	.size	_Z2g3v, .-_Z2g3v
	.p2align 4,,15
	.globl	_Z3g3bv
	.type	_Z3g3bv, @function
_Z3g3bv:
.LFB2:
	.cfi_startproc
	subq	$40, %rsp
	.cfi_def_cfa_offset 48
	movq	%rsp, %rdi
	movq	%fs:40, %rax
	movq	%rax, 24(%rsp)
	xorl	%eax, %eax
	call	_Z2g2v
	movq	%rsp, %rdi
	call	_Z3g3bRK1S
	movq	24(%rsp), %rax
	xorq	%fs:40, %rax
	jne	.L9
	addq	$40, %rsp
	.cfi_remember_state
	.cfi_def_cfa_offset 8
	ret
.L9:
	.cfi_restore_state
	.p2align 4,,6
	call	__stack_chk_fail
	.cfi_endproc
.LFE2:
	.size	_Z3g3bv, .-_Z3g3bv

Here, g3b() is correctly instrumented, and f2() does not need 
instrumentation (because space for the returned object is not part of 
the local frame).  But an address on the stack escapes in g3() and is 
used for the return value of the call to g2().  This requires 
instrumentation, which is missing in this example.
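
As a plain C counterpart to the cases discussed here, the sketch below shows two kinds of frame this thread is concerned with: a local whose address is handed to another function, and a local array. The function names are made up, and whether a particular compiler build actually instruments either function under -fstack-protector-strong is not something this sketch asserts.

```c
/* Callee that writes through a pointer into its caller's frame.  */
static void store (int *p, int v)
{
  *p = v;
}

/* The address of the local x escapes to store ().  */
int escape_local (int v)
{
  int x = 0;
  store (&x, v);
  return x;
}

/* A local array lives in this function's frame.  */
int sum_array (void)
{
  int a[4] = { 1, 2, 3, 4 };
  int s = 0;
  for (int k = 0; k < 4; k++)
    s += a[k];
  return s;
}
```

Compiling such a file with -S and looking for calls to __stack_chk_fail, as in the listing above, is one way to see which functions were instrumented.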


I suppose this can be handled in a follow-up patch if necessary.


ChangeLog and patch below --

gcc/ChangeLog
2013-04-16  Han Shen  shen...@google.com
 * cfgexpand.c (record_or_union_type_has_array_p): Helper function
 to check if a record or union contains an array field.


I think the GNU convention is to write only this:

* cfgexpand.c (record_or_union_type_has_array_p): New function.


 (expand_used_vars): Add logic handling '-fstack-protector-strong'.
 * common.opt (fstack-protector-all): New option.


Should be fstack-protector-strong.

--
Florian Weimer / Red Hat Product Security Team

RE: [PATCH, AArch64] Compare instruction in shift_extend mode

2013-04-17 Thread Hurugalawadi, Naveen

Hi,

> I suggest for this one test case either making it compile only and
> dropping main() such that the pattern match only looks in the
> assembled output of the cmp_* functions

The testcase will check only for assembly pattern of the instruction
as per your suggestion.

Please find attached the modified patch; let me know if any
further modifications are needed.

Thanks,
Naveen

--- gcc/config/aarch64/aarch64.md	2013-04-17 11:18:29.453576713 +0530
+++ gcc/config/aarch64/aarch64.md	2013-04-17 15:22:36.161492471 +0530
@@ -2311,6 +2311,18 @@
(set_attr "mode" "<GPI:MODE>")]
 )
 
+(define_insn "*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>"
+  [(set (reg:CC_SWP CC_REGNUM)
+	(compare:CC_SWP (ashift:GPI
+			 (ANY_EXTEND:GPI
+			  (match_operand:ALLX 0 "register_operand" "r"))
+			 (match_operand:QI 1 "aarch64_shift_imm_<mode>" "n"))
+	(match_operand:GPI 2 "register_operand" "r")))]
+  ""
+  "cmp\\t%<GPI:w>2, %<GPI:w>0, <su>xt<ALLX:size> %1"
+  [(set_attr "v8type" "alus_ext")
+   (set_attr "mode" "<GPI:MODE>")]
+)
 
 ;; ---
 ;; Store-flag and conditional select insns
--- gcc/testsuite/gcc.target/aarch64/cmp.c	1970-01-01 05:30:00.0 +0530
+++ gcc/testsuite/gcc.target/aarch64/cmp.c	2013-04-17 15:23:36.121492125 +0530
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+cmp_si_test1 (int a, int b, int c)
+{
+  if (a  b)
+return a + c;
+  else
+return a + b + c;
+}
+
+int
+cmp_si_test2 (int a, int b, int c)
+{
+  if ((a  3)  b)
+return a + c;
+  else
+return a + b + c;
+}
+
+typedef long long s64;
+
+s64
+cmp_di_test1 (s64 a, s64 b, s64 c)
+{
+  if (a  b)
+return a + c;
+  else
+return a + b + c;
+}
+
+s64
+cmp_di_test2 (s64 a, s64 b, s64 c)
+{
+  if ((a  3)  b)
+return a + c;
+  else
+return a + b + c;
+}
+
+int
+cmp_di_test3 (int a, s64 b, s64 c)
+{
+  if (a  b)
+return a + c;
+  else
+return a + b + c;
+}
+
+int
+cmp_di_test4 (int a, s64 b, s64 c)
+{
+  if (((s64)a  3)  b)
+return a + c;
+  else
+return a + b + c;
+}
+
+/* { dg-final { scan-assembler-times "cmp\tw\[0-9\]+, w\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-times "cmp\tx\[0-9\]+, x\[0-9\]+" 4 } } */

RE: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics

2013-04-17 Thread Kyrylo Tkachov

Hi Julian,

 From: Julian Brown [mailto:jul...@codesourcery.com]
 Sent: 13 April 2013 15:04
 To: Julian Brown
 Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana
 Radhakrishnan
 Subject: Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and
 vcvt_f32_f16 NEON intrinsics

 On Fri, 12 Apr 2013 20:09:39 +0100
 Julian Brown jul...@codesourcery.com wrote:

  On Fri, 12 Apr 2013 15:19:18 +0100
  Kyrylo Tkachov kyrylo.tkac...@arm.com wrote:

   Hi all,

   This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
   to arm_neon.h through the generator ML scripts and also adds the
   built-ins to which the intrinsics will map to. The generator ML
   scripts are updated and used to generate the relevant .texi
   documentation, arm_neon.h and the tests in gcc.target/arm/neon .

  FWIW, some of the changes to neon*.ml can be simplified somewhat --
 my
  attempt at an improved version of those bits is attached. I'm still
  not too happy with mode_suffix, but these new instructions require
  adding semantics to parts of the generator program which weren't
  really very well-defined to start with :-). I appreciate that it's a
  bit of a tangle...

 I thought of an improvement to the mode_suffix part from the last
 version of the patch, so here it is. I'm done fiddling with this now,
 so back to you!

Thanks for looking at it! My Ocaml-fu is rather limited.
It does look cleaner now.
Here it is together with all the other parts of the patch, plus some
minor formatting changes.

Ok for trunk now?

gcc/ChangeLog
2013-04-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com
Julian Brown  jul...@codesourcery.com

* config/arm/arm.c (neon_builtin_type_mode): Add T_V4HF.
(TB_DREG): Add T_V4HF.
(v4hf_UP): New macro.
(neon_itype): Add NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW.
(arm_init_neon_builtins): Handle NEON_FLOAT_WIDEN,
NEON_FLOAT_NARROW.
Handle initialisation of V4HF. Adjust initialisation of reinterpret
built-ins.
(arm_expand_neon_builtin): Handle NEON_FLOAT_WIDEN,
NEON_FLOAT_NARROW.
(arm_vector_mode_supported_p): Handle V4HF.
(arm_mangle_map): Handle V4HFmode.
* config/arm/arm.h (VALID_NEON_DREG_MODE): Add V4HF.
* config/arm/arm_neon_builtins.def: Add entries for
vcvtv4hfv4sf, vcvtv4sfv4hf.
* config/arm/neon.md (neon_vcvtv4sfv4hf): New pattern.
(neon_vcvtv4hfv4sf): Likewise.
* config/arm/neon-gen.ml: Handle half-precision floating point
features.
* config/arm/neon-testgen.ml: Handle Requires_FP_bit feature.
* config/arm/arm_neon.h: Regenerate.
* config/arm/neon.ml (type elts): Add F16.
(type vectype): Add T_float16x4, T_floatHF.
(type vecmode): Add V4HF.
(type features): Add Requires_FP_bit feature.
(elt_width): Handle F16.
(elt_class): Likewise.
(elt_of_class_width): Likewise.
(mode_of_elt): Refactor.
(type_for_elt): Handle F16, fix error messages.
(vectype_size): Handle T_float16x4.
(vcvt_sh): New function.
(ops): Add entries for vcvt_f16_f32, vcvt_f32_f16.
(string_of_vectype): Handle T_floatHF, T_float16, T_float16x4.
(string_of_mode): Handle V4HF.
* doc/arm-neon-intrinsics.texi: Regenerate.

gcc/testsuite/ChangeLog
2013-04-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com
Julian Brown  jul...@codesourcery.com

* gcc.target/arm/neon/vcvtf16_f32.c: New test. Generated.
* gcc.target/arm/neon/vcvtf32_f16.c: Likewise.

neon-vcvt-intrinsics.patch
Description: Binary data

[PATCH] Fix PR56982, handle setjmp like non-local labels

2013-04-17 Thread Richard Biener


This fixes PR56982 by properly modeling the control-flow
of setjmp.  It basically behaves as a non-local goto target
so this patch treats it so - it makes it start a basic-block
and get abnormal edges from possible sources of non-local
gotos.  The patch also fixes the bug that longjmp is marked
as leaf.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?
What about release branches (after it had some time to settle on
trunk of course)?

Thanks,
Richard.

2013-04-17  Richard Biener  rguent...@suse.de

PR tree-optimization/56982
* builtins.def (BUILT_IN_LONGJMP): longjmp is not a leaf
function.
* gimplify.c (gimplify_call_expr): Notice special calls.
(gimplify_modify_expr): Likewise.
* tree-cfg.c (make_abnormal_goto_edges): Handle setjmp-like
abnormal control flow receivers.
(call_can_make_abnormal_goto): Handle cfun->calls_setjmp
in the same way as cfun->has_nonlocal_labels.
(gimple_purge_dead_abnormal_call_edges): Likewise.
(stmt_starts_bb_p): Make setjmp-like abnormal control flow
receivers start a basic-block.

* gcc.c-torture/execute/pr56982.c: New testcase.

Index: gcc/gimplify.c
===
*** gcc/gimplify.c  (revision 198021)
--- gcc/gimplify.c  (working copy)
*** gimplify_call_expr (tree *expr_p, gimple
*** 2729,2734 
--- 2729,2735 
gimple_stmt_iterator gsi;
call = gimple_build_call_from_tree (*expr_p);
gimple_call_set_fntype (call, TREE_TYPE (fnptrtype));
+   notice_special_calls (call);
gimplify_seq_add_stmt (pre_p, call);
gsi = gsi_last (*pre_p);
fold_stmt (&gsi);
*** gimplify_modify_expr (tree *expr_p, gimp
*** 4968,4973 
--- 4969,4975 
STRIP_USELESS_TYPE_CONVERSION (CALL_EXPR_FN (*from_p));
assign = gimple_build_call_from_tree (*from_p);
gimple_call_set_fntype (assign, TREE_TYPE (fnptrtype));
+   notice_special_calls (assign);
if (!gimple_call_noreturn_p (assign))
gimple_call_set_lhs (assign, *to_p);
  }
Index: gcc/tree-cfg.c
===
*** gcc/tree-cfg.c  (revision 198021)
--- gcc/tree-cfg.c  (working copy)
*** make_abnormal_goto_edges (basic_block bb
*** 967,991 
gimple_stmt_iterator gsi;
  
FOR_EACH_BB (target_bb)
! for (gsi = gsi_start_bb (target_bb); !gsi_end_p (gsi); gsi_next (&gsi))
!   {
!   gimple label_stmt = gsi_stmt (gsi);
!   tree target;
  
!   if (gimple_code (label_stmt) != GIMPLE_LABEL)
! break;
  
!   target = gimple_label_label (label_stmt);
  
!   /* Make an edge to every label block that has been marked as a
!  potential target for a computed goto or a non-local goto.  */
!   if ((FORCED_LABEL (target) && !for_call)
!   || (DECL_NONLOCAL (target) && for_call))
! {
make_edge (bb, target_bb, EDGE_ABNORMAL);
!   break;
! }
!   }
  }
  
  /* Create edges for a goto statement at block BB.  */
--- 971,1005 
gimple_stmt_iterator gsi;
  
FOR_EACH_BB (target_bb)
! {
!   for (gsi = gsi_start_bb (target_bb); !gsi_end_p (gsi); gsi_next (&gsi))
!   {
! gimple label_stmt = gsi_stmt (gsi);
! tree target;
  
! if (gimple_code (label_stmt) != GIMPLE_LABEL)
!   break;
  
! target = gimple_label_label (label_stmt);
  
! /* Make an edge to every label block that has been marked as a
!potential target for a computed goto or a non-local goto.  */
! if ((FORCED_LABEL (target) && !for_call)
! || (DECL_NONLOCAL (target) && for_call))
!   {
! make_edge (bb, target_bb, EDGE_ABNORMAL);
! break;
!   }
!   }
!   if (!gsi_end_p (gsi))
!   {
! /* Make an edge to every setjmp-like call.  */
! gimple call_stmt = gsi_stmt (gsi);
! if (is_gimple_call (call_stmt)
!     && (gimple_call_flags (call_stmt) & ECF_RETURNS_TWICE))
make_edge (bb, target_bb, EDGE_ABNORMAL);
!   }
! }
  }
  
  /* Create edges for a goto statement at block BB.  */
*** call_can_make_abnormal_goto (gimple t)
*** 2147,2153 
  {
/* If the function has no non-local labels, then a call cannot make an
   abnormal transfer of control.  */
!   if (!cfun->has_nonlocal_label)
 return false;
  
/* Likewise if the call has no side effects.  */
--- 2161,2168 
  {
/* If the function has no non-local labels, then a call cannot make an
   abnormal transfer of control.  */
!   if (!cfun->has_nonlocal_label
!       && !cfun->calls_setjmp)
 return false;
  
/* Likewise if the call has no side effects.  */
*** stmt_starts_bb_p (gimple stmt, gimple pr
*** 2302,2307 
--- 2317,2327 
else

[PATCH] Fix PR56921

2013-04-17 Thread Richard Biener


This fixes PR56921 in a better way and restores the ability
to ggc-collect during RTL loop passes.  The patch stores
the simple-loop-description in a separate member of struct loop
and not its aux field which is not scanned by GC.
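
The point about the aux field can be pictured with a toy marker, sketched below. This is not GCC's ggc implementation: the marked flag, the mark_loop function and the reduced struct layouts are invented for illustration. It only shows why data reachable solely through an untyped void * is invisible to a collector that follows typed fields.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-ins for the real structures.  */
struct niter_desc
{
  bool marked;   /* set by the toy marker when reached */
  int niter;
};

struct loop
{
  void *aux;                            /* opaque: the marker cannot follow it */
  struct niter_desc *simple_loop_desc;  /* typed: the marker knows the pointee */
};

/* Toy mark phase: follows only the typed member.  */
void mark_loop (struct loop *l)
{
  if (l->simple_loop_desc != NULL)
    l->simple_loop_desc->marked = true;
}
```

In this model a description hung off aux stays unmarked and would be reclaimed by a collection, while one stored in simple_loop_desc survives.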

Bootstrapped and tested on x86_64-unknown-linux-gnu and
powerpc64-linux-gnu, applied.

Richard.

2013-04-17  Richard Biener  rguent...@suse.de

PR rtl-optimization/56921
* cfgloop.h (struct loop): Add simple_loop_desc member.
(struct niter_desc): Mark with GTY(()).
(simple_loop_desc): Do not use aux field but simple_loop_desc.
* loop-iv.c (get_simple_loop_desc): Likewise.
(free_simple_loop_desc): Likewise.

Revert
2013-04-16  Richard Biener  rguent...@suse.de

PR rtl-optimization/56921
* loop-init.c (pass_rtl_move_loop_invariants): Add
TODO_do_not_ggc_collect to todo_flags_finish.
(pass_rtl_unswitch): Same.
(pass_rtl_unroll_and_peel_loops): Same.
(pass_rtl_doloop): Same.

Index: gcc/cfgloop.h
===
*** gcc/cfgloop.h   (revision 198021)
--- gcc/cfgloop.h   (working copy)
*** struct GTY ((chain_next (%h.next))) lo
*** 172,177 
--- 172,180 
  
/* Head of the cyclic list of the exits of the loop.  */
struct loop_exit *exits;
+ 
+   /* Number of iteration analysis data for RTL.  */
+   struct niter_desc *simple_loop_desc;
  };
  
  /* Flags for state of loop structure.  */
*** struct rtx_iv
*** 372,378 
  /* The description of an exit from the loop and of the number of iterations
 till we take the exit.  */
  
! struct niter_desc
  {
/* The edge out of the loop.  */
edge out_edge;
--- 375,381 
  /* The description of an exit from the loop and of the number of iterations
 till we take the exit.  */
  
! struct GTY(()) niter_desc
  {
/* The edge out of the loop.  */
edge out_edge;
*** extern void free_simple_loop_desc (struc
*** 425,431 
  static inline struct niter_desc *
  simple_loop_desc (struct loop *loop)
  {
!   return (struct niter_desc *) loop->aux;
  }
  
  /* Accessors for the loop structures.  */
--- 428,434 
  static inline struct niter_desc *
  simple_loop_desc (struct loop *loop)
  {
!   return loop->simple_loop_desc;
  }
  
  /* Accessors for the loop structures.  */
Index: gcc/loop-iv.c
===
*** gcc/loop-iv.c   (revision 198021)
--- gcc/loop-iv.c   (working copy)
*** get_simple_loop_desc (struct loop *loop)
*** 3016,3025 
  
/* At least desc->infinite is not always initialized by
   find_simple_loop_exit.  */
!   desc = XCNEW (struct niter_desc);
iv_analysis_loop_init (loop);
find_simple_exit (loop, desc);
!   loop->aux = desc;
  
if (desc->simple_p && (desc->assumptions || desc->infinite))
  {
--- 3016,3025 
  
/* At least desc->infinite is not always initialized by
   find_simple_loop_exit.  */
!   desc = ggc_alloc_cleared_niter_desc ();
iv_analysis_loop_init (loop);
find_simple_exit (loop, desc);
!   loop->simple_loop_desc = desc;
  
if (desc->simple_p && (desc->assumptions || desc->infinite))
  {
*** free_simple_loop_desc (struct loop *loop
*** 3069,3074 
if (!desc)
  return;
  
!   free (desc);
!   loop->aux = NULL;
  }
--- 3069,3074 
if (!desc)
  return;
  
!   ggc_free (desc);
!   loop->simple_loop_desc = NULL;
  }
Index: gcc/loop-init.c
===
*** gcc/loop-init.c (revision 198021)
--- gcc/loop-init.c (working copy)
*** struct rtl_opt_pass pass_rtl_move_loop_i
*** 434,441 
0,/* properties_destroyed */
0,/* todo_flags_start */
TODO_df_verify |
!   TODO_df_finish | TODO_verify_rtl_sharing
!   | TODO_do_not_ggc_collect   /* todo_flags_finish */
   }
  };
  
--- 434,440 
0,/* properties_destroyed */
0,/* todo_flags_start */
TODO_df_verify |
!   TODO_df_finish | TODO_verify_rtl_sharing  /* todo_flags_finish */
   }
  };
  
*** struct rtl_opt_pass pass_rtl_unswitch =
*** 471,478 
0,/* properties_provided */
0,/* properties_destroyed */
0,/* todo_flags_start */
!   TODO_verify_rtl_sharing
!   | TODO_do_not_ggc_collect   /* todo_flags_finish */
   }
  };
  
--- 470,476 
0,/* properties_provided */
0,/* properties_destroyed */
0,/* todo_flags_start */
!   TODO_verify_rtl_sharing,  /* todo_flags_finish */
   }

Re: [PATCH][RFC] Handle commutative operations in SLP tree build

2013-04-17 Thread Richard Biener

On Wed, 10 Apr 2013, Richard Biener wrote:

 
 This handles commutative operations during SLP tree build in the
 way that if one configuration does not match, the build will
 try again with commutated operands.  This allows removing
 the special-casing of commutated loads in a complex addition
 that was in the end handled as permutation.  It of course
 also applies more generally.  Permutation is currently limited
 to 3 unsuccessful permutes to avoid running into the inherently
 exponential complexity of tree matching.
 
 The gcc.dg/vect/vect-complex-?.c testcases provide some testing
 coverage (previously handled by the special-casing).  I have
 seen failed SLP in the wild previously but it's usually on
 larger testcases and dependent on operand order of commutative
 operands.
 
 I've discussed ideas to restrict the cases where we try a permutation
 with Matz, but I'll rather defer that to an eventual followup.
 (compute per SSA name a value dependent on the shape of its
 use-def tree and use that as a quick check whether sub-trees
 can possibly match)
 
 Bootstrap and regtest running on x86_64-unknown-linux-gnu.
 
 Any comments?

Committed to trunk.

Richard.

 2013-04-10  Richard Biener  rguent...@suse.de
 
   * tree-vect-slp.c (vect_build_slp_tree_1): Split out from ...
   (vect_build_slp_tree): ... here.
   (vect_build_slp_tree_1): Compute which stmts of the SLP group
   match.  Remove special-casing of mismatched complex loads.
   (vect_build_slp_tree): Based on the result from vect_build_slp_tree_1
   re-try the match with swapped commutative operands.
   (vect_supported_load_permutation_p): Remove special-casing of
   mismatched complex loads.
   (vect_analyze_slp_instance): Adjust.
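
The retry-with-swapped-operands idea from the description above can be sketched on a toy expression tree. This is not the vectorizer's code: `toy_node`, `toy_match` and the budget parameter are hypothetical, leaves are compared by a made-up `leaf_id`, and the budget stands in for the limit on unsuccessful permutes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_code { TOY_LEAF, TOY_PLUS, TOY_MINUS };

struct toy_node
{
  enum toy_code code;
  int leaf_id;                  /* meaningful only for TOY_LEAF */
  const struct toy_node *op0;
  const struct toy_node *op1;
};

/* Only TOY_PLUS may have its operands swapped.  */
static bool
toy_commutative_p (enum toy_code code)
{
  return code == TOY_PLUS;
}

/* Return true if A and B match, allowing the operands of commutative
   nodes in B to be swapped.  *BUDGET is the number of unsuccessful
   swapped attempts still allowed; it bounds the otherwise exponential
   search.  */
bool
toy_match (const struct toy_node *a, const struct toy_node *b, int *budget)
{
  if (a->code != b->code)
    return false;
  if (a->code == TOY_LEAF)
    return a->leaf_id == b->leaf_id;

  /* First try the operands in their given order.  */
  if (toy_match (a->op0, b->op0, budget) && toy_match (a->op1, b->op1, budget))
    return true;

  if (!toy_commutative_p (a->code) || *budget <= 0)
    return false;

  /* Retry with B's operands swapped.  */
  if (toy_match (a->op0, b->op1, budget) && toy_match (a->op1, b->op0, budget))
    return true;

  /* The permutation did not help; charge it against the budget.  */
  --*budget;
  return false;
}
```

With a budget of 3, `x + y` matches `y + x` while `x - y` does not match `y - x`; with a budget of 0 no swap is attempted at all.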

Re: [patch] RFC: ix86 / x86_64 register pressure aware scheduling

2013-04-17 Thread Igor Zamyatin

These changes are what we tried here at Intel after a bunch of
changes which made the pre-alloc scheduler more stable. We benchmarked
both register pressure algorithms and the overall result was not that
promising.

We saw a number of regressions, e.g. for the optset -mavx -O3 -funroll-loops
-ffast-math -march=corei7 (for spec2000 not only lucas but also applu
regressed). The overall gain is negative even for x86_64. For 32 bits the
picture was worse, if I remember correctly.

In general, we doubt that this feature is good for an OOO machine.

Thanks,
Igor

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org
[mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Steven Bosscher
Sent: Monday, April 15, 2013 11:34 PM
To: GCC Patches
Cc: H.J. Lu; Uros Bizjak; Jan Hubicka
Subject: [patch] RFC: ix86 / x86_64 register pressure aware scheduling

Hello,

The attached patch enables register pressure aware scheduling for the
ix86 and x86_64 targets. It uses the optimistic algorithm to avoid
being overly conservative.

This is the same as what other CISCy targets, like s390, also do.

The motivation for this patch is the excessive spilling I've observed
in a few test cases with relatively large basic blocks, e.g.
encryption algorithms and codecs. The patch passes bootstrap+testing
on x86_64-unknown-linux-gnu and i686-unknown-linux-gnu, with a few new
failures due to PR56950.

Off-list, Uros, Honza and others have already looked at the patch and
benchmarked it. For x86_64 there is an overall improvement for SPEC2k
except that lucas regresses, but such a preliminary result is IMHO
very promising.

Comments/suggestions welcome :-)

Ciao!
Steven
* common/config/i386/i386-common.c (ix86_option_optimization_table):
Do not disable insns scheduling.  Enable register pressure aware
scheduling.
* config/i386/i386.c (ix86_option_override): Use the alternative,
optimistic scheduling-pressure algorithm by default.

Index: common/config/i386/i386-common.c
===
--- common/config/i386/i386-common.c  (revision 197941)
+++ common/config/i386/i386-common.c  (working copy)
@@ -707,9 +707,15 @@ static const struct default_options ix86
   {
 /* Enable redundant extension instructions removal at -O2 and higher.  */
 { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
-/* Turn off -fschedule-insns by default.  It tends to make the
-   problem with not enough registers even worse.  */
-{ OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
+/* Enable -fsched-pressure by default for all optimization levels.
+   Before SCHED_PRESSURE_MODEL register-pressure aware schedule was
+   available, -fschedule-insns was turned off completely by default for
+   this port, because scheduling before register allocation tends to
+   make the problem with not enough registers even worse.  However,
+   for very long basic blocks the scheduler can help bring register
+   pressure down significantly, and SCHED_PRESSURE_MODEL is still
+   conservative enough to avoid creating excessive register pressure.  */
+{ OPT_LEVELS_ALL, OPT_fsched_pressure, NULL, 1 },
 
 #ifdef SUBTARGET_OPTIMIZATION_OPTIONS
 SUBTARGET_OPTIMIZATION_OPTIONS,
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 197941)
+++ config/i386/i386.c  (working copy)
@@ -3936,6 +3936,10 @@ ix86_option_override (void)
 
   ix86_option_override_internal (true);
 
+  /* Use the alternative scheduling-pressure algorithm by default.  */
+  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 2,
+global_options.x_param_values,
+global_options_set.x_param_values);
 
   /* This needs to be done at start up.  It's convenient to do it here.  */
   register_pass (insert_vzeroupper_info);

[PATCH, ARM] emit LDRD epilogue instead of a single LDM return

2013-04-17 Thread Greta Yorsh

Currently, the epilogue is not generated in RTL for functions that can
return using a single instruction. This patch enables the RTL epilogue for
such functions on targets that can benefit from using a sequence of LDRD
instructions in the epilogue instead of a single LDM instruction.

No regression on qemu arm-none-eabi with cortex-a15.

Ok for trunk?

Thanks,
Greta

gcc/

2012-10-19  Greta Yorsh  Greta.Yorsh at arm.com

* config/arm/arm.c (use_return_insn): Return 0 for targets that
can benefit from using a sequence of LDRD instructions in epilogue
instead of a single LDM instruction.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 866385c..bca92af 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2296,6 +2296,10 @@ use_return_insn (int iscond, rtx sibling)
   if (IS_INTERRUPT (func_type) && (frame_pointer_needed || TARGET_THUMB))
     return 0;
 
+  if (TARGET_LDRD && current_tune->prefer_ldrd_strd
+      && !optimize_function_for_size_p (cfun))
+    return 0;
+
   offsets = arm_get_frame_offsets ();
   stack_adjust = offsets->outgoing_args - offsets->saved_regs;

[PATCH, ARM][10/n] Split scc patterns using cond_exec

2013-04-17 Thread Greta Yorsh

This patch converts define_insn into define_insn_and_split to split
some alternatives of movsicc_insn and some scc patterns that cannot be
expressed using movsicc. The patch emits cond_exec RTL insns.
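
The split bodies in this patch choose between reverse_condition and reverse_condition_maybe_unordered depending on whether the comparison is in a floating-point CC mode. A small toy model shows why the distinction matters. The `toy_*` names below are hypothetical and only four condition codes are modelled; the mappings LT to GE and LT to UNGE mirror what the two GCC functions return for LT.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

enum toy_cond { TOY_LT, TOY_GE, TOY_UNLT, TOY_UNGE };

/* Evaluate condition C on A and B.  The UN* codes are also true when
   the operands are unordered, i.e. when at least one is a NaN.  */
bool
toy_eval (enum toy_cond c, double a, double b)
{
  switch (c)
    {
    case TOY_LT:   return a < b;
    case TOY_GE:   return a >= b;
    case TOY_UNLT: return isunordered (a, b) || a < b;
    case TOY_UNGE: return isunordered (a, b) || a >= b;
    }
  return false;
}

/* Integer-style reversal; in this sketch it is defined only for the
   ordered codes.  */
enum toy_cond
toy_reverse (enum toy_cond c)
{
  assert (c == TOY_LT || c == TOY_GE);
  return c == TOY_LT ? TOY_GE : TOY_LT;
}

/* Reversal that stays a true complement when operands may be NaN.  */
enum toy_cond
toy_reverse_maybe_unordered (enum toy_cond c)
{
  switch (c)
    {
    case TOY_LT:   return TOY_UNGE;
    case TOY_GE:   return TOY_UNLT;
    case TOY_UNLT: return TOY_GE;
    case TOY_UNGE: return TOY_LT;
    }
  return c;
}
```

With a NaN operand both `a < b` and `a >= b` are false, so two cond_exec insns guarded by LT and its integer-style reverse would leave the destination unset; guarding the second one with UNGE covers that case.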

Ok for trunk?

Thanks,
Greta

gcc/

2013-02-19  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (movsicc_insn): Convert define_insn into
define_insn_and_split.
(and_scc,ior_scc,negscc): Likewise.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 83b36ca..c2e59ed 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -858,7 +858,7 @@
 
 ;; This is the canonicalization of addsi3_compare0_for_combiner when the
 ;; addend is a constant.
-(define_insn "*cmpsi2_addneg"
+(define_insn "cmpsi2_addneg"
   [(set (reg:CC CC_REGNUM)
        (compare:CC
         (match_operand:SI 1 "s_register_operand" "r,r")
@@ -1415,7 +1415,7 @@
    (set_attr "type"  "simple_alu_imm,*,*")]
 )
 
-(define_insn "*subsi3_compare"
+(define_insn "subsi3_compare"
   [(set (reg:CC CC_REGNUM)
        (compare:CC (match_operand:SI 1 "arm_rhs_operand" "r,r,I")
                    (match_operand:SI 2 "arm_rhs_operand" "I,r,r")))
@@ -8619,7 +8619,7 @@
    (set_attr "type" "f_sel<vfp_type>")]
 )
 
-(define_insn "*movsicc_insn"
+(define_insn_and_split "*movsicc_insn"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r,r,r,r,r,r,r")
        (if_then_else:SI
         (match_operator 3 "arm_comparison_operator"
@@ -8632,10 +8632,45 @@
mvn%D3\\t%0, #%B2
mov%d3\\t%0, %1
mvn%d3\\t%0, #%B1
-   mov%d3\\t%0, %1\;mov%D3\\t%0, %2
-   mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
-   mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
-   mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2"
+   #
+   #
+   #
+   #"
+   ; alt4: mov%d3\\t%0, %1\;mov%D3\\t%0, %2
+   ; alt5: mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
+   ; alt6: mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
+   ; alt7: mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+enum rtx_code rev_code;
+enum machine_mode mode;
+rtx rev_cond;
+
+emit_insn (gen_rtx_COND_EXEC (VOIDmode,
+  operands[3],
+  gen_rtx_SET (VOIDmode,
+   operands[0],
+   operands[1])));
+
+rev_code = GET_CODE (operands[3]);
+mode = GET_MODE (operands[4]);
+if (mode == CCFPmode || mode == CCFPEmode)
+  rev_code = reverse_condition_maybe_unordered (rev_code);
+else
+  rev_code = reverse_condition (rev_code);
+
+rev_cond = gen_rtx_fmt_ee (rev_code,
+   VOIDmode,
+   operands[4],
+   const0_rtx);
+emit_insn (gen_rtx_COND_EXEC (VOIDmode,
+  rev_cond,
+  gen_rtx_SET (VOIDmode,
+   operands[0],
+   operands[2])));
+DONE;
+  }
   [(set_attr "length" "4,4,4,4,8,8,8,8")
    (set_attr "conds" "use")
    (set_attr "insn" "mov,mvn,mov,mvn,mov,mov,mvn,mvn")
@@ -9604,27 +9639,64 @@
    (set_attr "type" "alu_shift,alu_shift_reg")])
 
 
-(define_insn "*and_scc"
+(define_insn_and_split "*and_scc"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
        (and:SI (match_operator:SI 1 "arm_comparison_operator"
-                [(match_operand 3 "cc_register" "") (const_int 0)])
-               (match_operand:SI 2 "s_register_operand" "r")))]
+                [(match_operand 2 "cc_register" "") (const_int 0)])
+               (match_operand:SI 3 "s_register_operand" "r")))]
   "TARGET_ARM"
-  "mov%D1\\t%0, #0\;and%d1\\t%0, %2, #1"
+  "#"   ; "mov%D1\\t%0, #0\;and%d1\\t%0, %3, #1"
+  "&& reload_completed"
+  [(cond_exec (match_dup 5) (set (match_dup 0) (const_int 0)))
+   (cond_exec (match_dup 4) (set (match_dup 0)
+                                 (and:SI (match_dup 3) (const_int 1))))]
+  {
+enum machine_mode mode = GET_MODE (operands[2]);
+enum rtx_code rc = GET_CODE (operands[1]);
+
+/* Note that operands[4] is the same as operands[1],
+   but with VOIDmode as the result. */
+operands[4] = gen_rtx_fmt_ee (rc, VOIDmode, operands[2], const0_rtx);
+if (mode == CCFPmode || mode == CCFPEmode)
+  rc = reverse_condition_maybe_unordered (rc);
+else
+  rc = reverse_condition (rc);
+operands[5] = gen_rtx_fmt_ee (rc, VOIDmode, operands[2], const0_rtx);
+  }
   [(set_attr "conds" "use")
    (set_attr "insn" "mov")
    (set_attr "length" "8")]
 )
 
-(define_insn "*ior_scc"
+(define_insn_and_split "*ior_scc"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r")
-       (ior:SI (match_operator:SI 2 "arm_comparison_operator"
-                [(match_operand 3 "cc_register" "") (const_int 0)])
-               (match_operand:SI 1 "s_register_operand" "0,?r")))]
+       (ior:SI (match_operator:SI 1 "arm_comparison_operator"
+                [(match_operand 2 "cc_register" "") (const_int 0)])
+               (match_operand:SI 3 "s_register_operand" "0,?r")))]
   "TARGET_ARM"
   "@
-

New German PO file for 'gcc' (version 4.8.0)

2013-04-17 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-4.8.0.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.
coordina...@translationproject.org

Re: [PATCH] Fix linking with -findirect-dispatch

2013-04-17 Thread Andreas Schwab

Bryce McKinlay bmckin...@gmail.com writes:

 It certainly _did_ work as intended previously.

Only by chance, when libtool has to relink the library during install.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.

[Patch, Fortran] PR 56814: [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure

2013-04-17 Thread Janus Weil

Hi all,

here is a patch for a recent regression with procedure pointers.
Regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.8?

Cheers,
Janus


2013-04-17  Janus Weil  ja...@gcc.gnu.org

PR fortran/56814
* interface.c (check_result_characteristics): Get result from interface
if present.


2013-04-17  Janus Weil  ja...@gcc.gnu.org

PR fortran/56814
* gfortran.dg/proc_ptr_42.f90: New.


pr56814_v2.diff
Description: Binary data


proc_ptr_42.f90
Description: Binary data

Re: [Patch, Fortran] PR 56814: [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure

2013-04-17 Thread Tobias Burnus


Janus Weil:

here is a patch for a recent regression with procedure pointers.
Regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.8?


Looks rather obvious. OK - and thanks for the patch.

Tobias

PS: If you have time, could you review my C_LOC patch at 
http://gcc.gnu.org/ml/fortran/2013-04/msg00073.html ?



2013-04-17  Janus Weil  ja...@gcc.gnu.org

 PR fortran/56814
 * interface.c (check_result_characteristics): Get result from interface
 if present.


2013-04-17  Janus Weil  ja...@gcc.gnu.org

 PR fortran/56814
 * gfortran.dg/proc_ptr_42.f90: New.

Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-17 Thread Vladimir Makarov


On 13-04-16 6:56 PM, Michael Meissner wrote:

I tracked down the bug with the spec 2006 benchmark WRF using the LRA register
allocator.

At one point LRA has decided to use the CTR to hold a CCmode value:

(insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411])
 (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 
{*movcc_internal1}
  (expr_list:REG_DEAD (reg:CC 66 ctr [4411])
 (nil)))

Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r,
but it doesn't have a move to cover the nop move of moving the ctr to the ctr.
IMHO, LRA should not be generating NOP moves that are later deleted.

There are two ways to solve the problem.  One is not to let anything but int
modes into CTR/LR, which will also eliminate the register allocator from
spilling floating point values there (which we've seen in the past, but the
last time I tried to eliminate it I couldn't).  The following patch does this,
and also changes the assertion to call fatal_insn_not_found to make it clearer
what the error is.

I imagine, I could add a NOP move insn to movcc_internal1, but that just
strikes me as wrong.

Note, this does not fix the 32-bit failure in dealII, and I also noticed that I
can't bootstrap the compiler using --with-cpu=power7, which I will get to
tomorrow.

2013-04-16  Michael Meissner  meiss...@linux.vnet.ibm.com

* config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to
control whether we only allow int modes to go in the CTR, LR,
VRSAVE, VSCR registers.
* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise.
(rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are
constrained.
(rs6000_option_override_internal): Set -mconstrain-regs if we are
using the LRA register allocator.

* lra.c (check_rtl): Use fatal_insn_not_found to report constraint
does not match.
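
The first approach described above, allowing only integer modes in the special registers, amounts to a small predicate. The sketch below is a toy model and not the rs6000 hook: the `toy_*` enums and functions are hypothetical, only three registers and three mode classes are modelled, and `constrain_regs` stands in for the proposed -mconstrain-regs switch.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_mode_class { TOY_MODE_INT, TOY_MODE_FLOAT, TOY_MODE_CC };
enum toy_reg { TOY_GPR, TOY_LR, TOY_CTR };

/* True for the special-purpose registers the constraint applies to.  */
bool
toy_special_reg_p (enum toy_reg reg)
{
  return reg == TOY_LR || reg == TOY_CTR;
}

/* Toy analogue of a hard_regno_mode_ok check.  When CONSTRAIN_REGS is
   set, special registers accept only integer modes, so the allocator
   can never place a CC or floating-point value there.  Everything
   else is accepted in this simplified model.  */
bool
toy_hard_regno_mode_ok (enum toy_reg reg, enum toy_mode_class mclass,
                        bool constrain_regs)
{
  if (constrain_regs && toy_special_reg_p (reg))
    return mclass == TOY_MODE_INT;
  return true;
}
```

With the constraint enabled, a CCmode value in CTR is rejected up front, so a ctr-to-ctr CC move like insn 11019 cannot arise in this model.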

Mike, thanks for the patch and all the SPEC2006 data (which are very
useful as I have no access to a Power machine which can be used for
benchmarking).  I guess that some benchmark scores may be lower
because LRA lacks some micro-optimizations which reload implements
through many power hooks (e.g. LRA does not use push reload).  Although
sometimes that is not a bad thing (e.g. LRA does not use
SECONDARY_MEMORY_NEEDED_RTX, which permits reusing the stack slots for
other useful things).


In general I got the impression that power7 is the most difficult port for
LRA.  If we manage to port it, LRA ports for other targets will be easier.


I also reproduced the bootstrap failure with --with-cpu=power7, and I am
going to work on this and after that on the SPEC2006 issues you wrote about.

Re: [PATCH, x86] Use vector moves in memmove expanding

2013-04-17 Thread Jan Hubicka

 
 Bootstrap/make check/Specs2k are passing on i686 and x86_64.
Thanks for returning to this!

glibc has quite a comprehensive testsuite for stringops.  It may be useful to
test it with -minline-all-stringops -mstringop-strategy=vector_loop

I tested the patch on my core notebook with my memcpy micro benchmark.
The vector loop is not a win since apparently we do not produce any SSE code
for 64bit compilation.  What CPUs and block sizes is this intended for?

Also the internal loop with -march=native seems to come out as:
.L7:
movq    (%rsi,%r8), %rax
movq    8(%rsi,%r8), %rdx
movq    48(%rsi,%r8), %r9
movq    56(%rsi,%r8), %r10
movdqu  16(%rsi,%r8), %xmm3
movdqu  32(%rsi,%r8), %xmm1
movq    %rax, (%rdi,%r8)
movq    %rdx, 8(%rdi,%r8)
movdqa  %xmm3, 16(%rdi,%r8)
movdqa  %xmm1, 32(%rdi,%r8)
movq    %r9, 48(%rdi,%r8)
movq    %r10, 56(%rdi,%r8)
addq    $64, %r8
cmpq    %r11, %r8

It is not that much of an SSE enablement since RA seems to home the vars in
integer regs.
Could you please look into it?
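
For reference, the shape of such a loop at the C level is roughly the following: a main loop that moves 64 bytes per iteration in four 16-byte chunks, followed by a byte-wise epilogue. This is only a sketch and not what the expander emits: `toy_copy` and the two constants are hypothetical, the buffers are assumed not to overlap, and whether each 16-byte memcpy becomes a vector move is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_CHUNK = 16, TOY_UNROLL = 4 };

/* Copy N bytes from SRC to DST (non-overlapping).  The main loop
   handles TOY_CHUNK * TOY_UNROLL = 64 bytes per iteration; the
   epilogue copies the remaining 0..63 bytes one at a time.  */
void *
toy_copy (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  const size_t step = TOY_CHUNK * TOY_UNROLL;
  size_t i = 0;

  for (; i + step <= n; i += step)
    for (size_t j = 0; j < step; j += TOY_CHUNK)
      memcpy (d + i + j, s + i + j, TOY_CHUNK);

  for (; i < n; i++)
    d[i] = s[i];

  return dst;
}
```

In the assembly above, only the middle two of the four 16-byte chunks go through %xmm registers; the first and last are split into 8-byte integer moves.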
 
 Changelog entry:
 
 2013-04-10  Michael Zolotukhin  michael.v.zolotuk...@gmail.com
 
 * config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
 * config/i386/i386.c (expand_set_or_movmem_via_loop): Use
 adjust_address instead of change_address to keep info about alignment.
 (emit_strmov): Remove.
 (emit_memmov): New function.
 (expand_movmem_epilogue): Refactor to properly handle bigger sizes.
 (expand_movmem_epilogue): Likewise and return updated rtx for
 destination.
 (expand_constant_movmem_prologue): Likewise and return updated rtx for
 destination and source.
 (decide_alignment): Refactor, handle vector_loop.
 (ix86_expand_movmem): Likewise.
 (ix86_expand_setmem): Likewise.
 * config/i386/i386.opt (Enum): Add vector_loop to option stringop_alg.
 * emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 73a59b5..edb59da 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1565,6 +1565,18 @@ get_mem_align_offset (rtx mem, unsigned int align)
  expr = inner;
}
 }
+  else if (TREE_CODE (expr) == MEM_REF)
+{
+  tree base = TREE_OPERAND (expr, 0);
+  tree byte_offset = TREE_OPERAND (expr, 1);
+  if (TREE_CODE (base) != ADDR_EXPR
+ || TREE_CODE (byte_offset) != INTEGER_CST)
+   return -1;
+  if (!DECL_P (TREE_OPERAND (base, 0))
+ || DECL_ALIGN (TREE_OPERAND (base, 0)) < align)

You can use TYPE_ALIGN here? In general can't we replace all the GIMPLE
handling by get_object_alignment?

+   return -1;
+  offset += tree_low_cst (byte_offset, 1);
+}
   else
 return -1;
 
This change ought to go in independently. I cannot review it.
I will take a first look over the patch shortly, but please send an updated
patch fixing the problem with integer regs.

Honza
