[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2008-01-08 Thread bergner at gcc dot gnu dot org


--- Comment #45 from bergner at gcc dot gnu dot org  2008-01-08 15:44 
---
This has been fixed in mainline (4.3), but has not been fixed in 4.2.  I'm ok
with not back porting this to 4.2.  I'll give everyone a few days to object,
otherwise I'll change the Target Milestone to 4.3 and close as FIXED.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2008-01-07 Thread steven at gcc dot gnu dot org


--- Comment #44 from steven at gcc dot gnu dot org  2008-01-07 17:34 ---
Hello world.  Please, a status update.


-- 

steven at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|ASSIGNED|WAITING


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2007-11-10 Thread steven at gcc dot gnu dot org


--- Comment #43 from steven at gcc dot gnu dot org  2007-11-10 17:05 ---
What is the status of this bug now?  Re. comment #39, a meta-bug for what? 
There is only one open bug left that depends on this one.

Are we still tracking an issue in this bug?  If so, what?  If not, please close
this bug report.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2007-10-09 Thread mmitchel at gcc dot gnu dot org


--- Comment #42 from mmitchel at gcc dot gnu dot org  2007-10-09 19:21 
---
Change target milestone to 4.2.3, as 4.2.2 has been released.


-- 

mmitchel at gcc dot gnu dot org changed:

   What|Removed |Added

   Target Milestone|4.2.2   |4.2.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2007-08-06 Thread bonzini at gnu dot org


--- Comment #39 from bonzini at gnu dot org  2007-08-06 08:08 ---
committed??


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2007-08-06 Thread pinskia at gcc dot gnu dot org


--- Comment #40 from pinskia at gcc dot gnu dot org  2007-08-06 11:35 
---
(In reply to comment #39)
 committed??

This is now more like a meta-bug, see the other two bugs which are opened for
the current issues (yes both are assigned to me and both are actively being
worked on, well one is depend on the other but still being worked on).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2007-08-06 Thread paolo dot bonzini at lu dot unisi dot ch


--- Comment #41 from paolo dot bonzini at lu dot unisi dot ch  2007-08-06 
11:52 ---
Subject: Re:  [4.2/4.3 Regression] Performace problem
 with indexed load/stores on powerpc


 This is now more like a meta-bug, see the other two bugs which are opened for
 the current issues (yes both are assigned to me and both are actively being
 worked on, well one is depend on the other but still being worked on).

Ah, I see.

Paolo


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2007-07-19 Thread mmitchel at gcc dot gnu dot org


-- 

mmitchel at gcc dot gnu dot org changed:

   What|Removed |Added

   Target Milestone|4.2.1   |4.2.2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2007-06-08 Thread bergner at gcc dot gnu dot org


--- Comment #38 from bergner at gcc dot gnu dot org  2007-06-09 04:08 
---
Created an attachment (id=13671)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13671action=view)
Updated patch to address x86_64 performance issues.

This updated patch reverts the swap_commutative_operands_with_target removal
from the previous patch, since it seems x86_64 still relies on this for some
benchmarks. The change doesn't seem to affect the POWER6 performance benefits
we were seeing with the older patch.


-- 

bergner at gcc dot gnu dot org changed:

   What|Removed |Added

  Attachment #12915|0   |1
is obsolete||
  Attachment #13101|0   |1
is obsolete||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2007-05-14 Thread mmitchel at gcc dot gnu dot org


--- Comment #37 from mmitchel at gcc dot gnu dot org  2007-05-14 22:25 
---
Will not be fixed in 4.2.0; retargeting at 4.2.1.


-- 

mmitchel at gcc dot gnu dot org changed:

   What|Removed |Added

   Target Milestone|4.2.0   |4.2.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2007-02-23 Thread bergner at gcc dot gnu dot org


--- Comment #36 from bergner at gcc dot gnu dot org  2007-02-23 17:14 
---
Created an attachment (id=13101)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13101action=view)
Alternate patch to commutative_operand_precedence to increase the precedence of
REG_POINTER and MEM_POINTER objects.

With help from Honza, this updated patch eliminates the ICE the previous patch
hit.


-- 

bergner at gcc dot gnu dot org changed:

   What|Removed |Added

  Attachment #13042|0   |1
is obsolete||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2007-02-12 Thread bergner at gcc dot gnu dot org


--- Comment #35 from bergner at gcc dot gnu dot org  2007-02-12 17:29 
---
Created an attachment (id=13042)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13042action=view)
Alternate patch to commutative_operand_precedence to increase the precedence of
REG_POINTER and MEM_POINTER objects.

Ok, now that the libjava multilib problems have been fixed, I've been
able to attempt to bootstrap the patch in Comment #34 with java enabled.
In doing so, I'm now hitting an ICE while building the 64-bit libgcj.
The ICE is occurring in the same location and for the same reason as
the ICE (optabs.c:emit_cmp_and_jump_insns()) I hit when I attempted
to change swap_commutative_operands_p() (as in this alternate
patch) so that it sorted REG's by register numbers similar to how
simplify_plus_minus_op_data_cmp() sorts them.

With this attached alternate patch, we have a simple testcase that 
exposes the problem:

  void
  gomp_sem_wait_slow (int *sem, int a, int b)
  {
__sync_bool_compare_and_swap (sem, a, b);
  }

For this testcase, the swap_commutative_operands_p (x, y) call that guards
the gcc_assert (label) we're failing in, x and y are simple REGs that
are swapped due to REGNO(y) is smaller then RGENO(x).  I don't understand
why the gcc_assert(label) is needed, but I'll try and track that down.


-- 

bergner at gcc dot gnu dot org changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |bergner at gcc dot gnu dot
   |dot org |org
 Status|NEW |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2007-01-17 Thread bergner at gcc dot gnu dot org


--- Comment #34 from bergner at gcc dot gnu dot org  2007-01-17 20:58 
---
Created an attachment (id=12915)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12915action=view)
Patch to commutative_operand_precedence to increase the precedence of
REG_POINTER and MEM_POINTER objects.

This patch modifies commutative_operand_precedence to increase the precedence
of REG_POINTER and MEM_POINTER objects. This obviates the need for
swap_commutative_operands_with_target which has been replaced by a call to
swap_commutative_operands_p.

This patch improves performance on POWER6 (using -mcpu=power6) by 30% across
both specint and specfp, with a 498% improvement on galgel. On POWER5 (using
-mcpu=power5), there was only negligible differences between the base and
patched compilers. It also correctly transforms all of the above test cases
except those with artificial pointers which will have to wait until Andrew's
PTR_PLUS_EXPR work is complete.

Paolo tested this patch on x86 and saw 4% degradation on galgel which he said
he'd look into. However, overall across both specint and specfp, the
performance change didn't seem that bad.


-- 

bergner at gcc dot gnu dot org changed:

   What|Removed |Added

  Attachment #12375|0   |1
is obsolete||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2006-12-05 Thread pthaugen at us dot ibm dot com


--- Comment #32 from pthaugen at us dot ibm dot com  2006-12-05 16:12 
---
Another example, pared down from ammp benchmark in cpu2000. 

void f2(int *, int *);
void mm_fv_update_nonbon(void)
{
 int j, nx;
 int naybor[27];

 f2(naybor, nx);

 for(j=0; j 27; j++)
 if( naybor[j]) break; /* Indexed load problem */


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2006-12-05 Thread pthaugen at us dot ibm dot com


--- Comment #33 from pthaugen at us dot ibm dot com  2006-12-05 16:30 
---
My prior comment is missing the closing bracket for the procedure, but example
is otherwise complete.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2006-12-04 Thread bergner at vnet dot ibm dot com


--- Comment #30 from bergner at vnet dot ibm dot com  2006-12-05 04:22 
---
Ok, the problem from comment #28 was due to a latent bug in
reload1.c:eliminate_regs_in_insn(). The bug is that eliminate_regs_in_insn()
calls single_set() on the passed in insn.  This has been fine before, but now
with the patch, we end up passing in a parallel insn for a load with update and
the load portion of the parallel has the REG_UNUSED flag set.  This causes
single_set() to return the update portion of the parallel instead of
returning NULL as it would do normally with parallels.  This causes us to only
eliminate the update portion of the parallel and we skip eliminating the load
portion.  The problem insn belfore eliminate_regs_in_insn() looks like:

(insn 12 62 13 2 (parallel [
(set (reg:SI 0 0 [125])
(mem/s/j:SI (plus:SI (reg/f:SI 113 sfp)
(const_int 8 [0x8])) [0 S4 A32]))
(set (reg/f:SI 9 9 [orig:124 D.965 ] [124])
(plus:SI (reg/f:SI 113 sfp)
(const_int 8 [0x8])))
]) 373 {*movsi_update1} (nil)
(expr_list:REG_UNUSED (reg:SI 0 0 [125])
(nil)))

After eliminate_regs_in_insn(), we have:

(insn 12 62 13 2 (parallel [
(set (reg:SI 0 0 [125])
(mem/s/j:SI (plus:SI (reg/f:SI 113 sfp)
(const_int 8 [0x8])) [0 S4 A32]))
(set (reg/f:SI 9 9 [orig:124 D.965 ] [124])
(plus:SI (reg/f:SI 1 1)
(const_int 8 [0x8])))
]) 373 {*movsi_update1} (nil)
(expr_list:REG_UNUSED (reg:SI 0 0 [125])
(nil)))

However, calculate_needs_all_insns() ends up backing out the eliminated
(reg/f:SI 1 1) with the non eliminated (reg/f:SI 113 sfp) and  (reg/f:SI 113
sfp) never gets eliminated after that and we generate bogus assembler.

In addition to the latest patch attached here, I added the following patch to
stop eliminate-regs_in_insn from calling single_set for parallel insns.  It
fixed the bug here and bootstrapped and regtested with no errors.  I'll post
the  combined patch to gcc-patches.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2006-12-04 Thread bergner at vnet dot ibm dot com


--- Comment #31 from bergner at vnet dot ibm dot com  2006-12-05 04:41 
---
...and here's the patch I mentioned in the previous comment:

Index: reload1.c
===
--- reload1.c   (revision 119497)
+++ reload1.c   (working copy)
@@ -2930,7 +2930,7 @@ eliminate_regs_in_insn (rtx insn, int re
   int icode = recog_memoized (insn);
   rtx old_body = PATTERN (insn);
   int insn_is_asm = asm_noperands (old_body) = 0;
-  rtx old_set = single_set (insn);
+  rtx old_set;
   rtx new_body;
   int val = 0;
   int i;
@@ -2949,6 +2949,12 @@ eliminate_regs_in_insn (rtx insn, int re
   return 0;
 }

+  /* Guard against a PARALLEL with a REG_UNUSED note.  */
+  if (GET_CODE (PATTERN (insn)) != PARALLEL)
+old_set = single_set (insn);
+  else
+old_set = 0;
+
   if (old_set != 0  REG_P (SET_DEST (old_set))
REGNO (SET_DEST (old_set))  FIRST_PSEUDO_REGISTER)
 {


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2006-11-29 Thread bergner at vnet dot ibm dot com


--- Comment #28 from bergner at vnet dot ibm dot com  2006-11-29 20:11 
---
Another problem with the current patch, is we get one testsuite regression
(gfortran.fortran-torture/compile/defined_type_2.f90 at -O1).  For this simple
testcase, we end up generating bad assembler:

mr 9,sfp

instead of:

mr 9,1

For some reason, the stack frame pseudo isn't reloaded correctly and we spit
out the sfp text instead of the correct register number.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2006-11-29 Thread bergner at vnet dot ibm dot com


--- Comment #29 from bergner at vnet dot ibm dot com  2006-11-29 22:23 
---
Talking with Andrew on IRC, he said the test case in comment #27 fails for the
same reason as the test case in comment #24 (ie, it looks like an artificial
decl) and should be fixed with his PTR_PLUS_EXPR work.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2006-11-28 Thread bonzini at gnu dot org


--- Comment #27 from bonzini at gnu dot org  2006-11-29 07:56 ---
This case is still not fixed:


struct s {
  int size;
  float *data;
};

void f(struct s *d, struct s *s)
{
  int i;
  for (i = 0; i  s-size; i++)
d-data[i] += s-data[i];
}

The body of the loop is compiled to:

L4:
slwi r2,r9,2
addi r9,r9,1
lfsx f0,r2,r3
lfsx f13,r4,r2
fadds f0,f0,f13
stfsx f0,r2,r3
bdnz L4

Note how r2 is twice in the first position, and once in the second.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2006-11-07 Thread bergner at vnet dot ibm dot com


--- Comment #24 from bergner at vnet dot ibm dot com  2006-11-08 03:30 
---
Ok, Anton hit another test case the last patch doesn't transform.  It's
actually the same test case as in comment #13, except, int *base is now a
global variable rather than a function parameter.

int *base;

int indexedload(int len)
{
  int i, sum = 0;
  for (i=0; i  len; i++)
sum += base[i];
  return sum;
}

With this case, we get the following RTL generated for the load of *base into a
register:

;; base.1 = base
(insn 24 22 25 (set (reg:SI 129)
(high:SI (symbol_ref:SI (base) [flags 0x84] var_decl 0x400a6930
base))) -1 (nil)
(nil))

(insn 25 24 26 (set (reg/f:SI 128)
(lo_sum:SI (reg:SI 129)
(symbol_ref:SI (base) [flags 0x84] var_decl 0x400a6930 base)))
-1 (nil)
(expr_list:REG_EQUAL (symbol_ref:SI (base) [flags 0x84] var_decl
0x400a6930 base)
(nil)))

(insn 26 25 0 (set (reg:SI 124 [ base.1 ])
(mem/c/i:SI (reg/f:SI 128) [3 base+0 S4 A32])) -1 (nil)
(nil))

There seem to be two problems here.  First, the mem/c/i:SI (reg/f:SI 128) seems
to be missing a MEM_POINTER attribute.  I tracked that down to
rtl.h:MEM_COPY_ATTRIBUTES(LHS,RHS) not copying the MEM_POINTER attribute from
RHS to LHS.  I added the code to do that, so the mem above does get the
MEM_POINTER flag set. However, the reg:SI 124 [base.1] is still missing the
REG_POINTER flag.

This second problem seems to be a problem in expand_one_register_var(tree var).
 In this specific case, var seems to be a pointer type, but looks to be
artifical, so we skip the code that would have marked the reg rtx with the
REG_POINTER:

Breakpoint 4, expand_one_register_var (var=0x400a6d90) at
/home/bergner/gcc/gcc-mainline-rtlanal/gcc/cfgexpand.c:643
643   tree type = TREE_TYPE (var);
(gdb) call debug_tree (var)
 var_decl 0x400a6d90 base.1
type pointer_type 0x4009ae38
type integer_type 0x4009a340 int sizes-gimplified public SI
size integer_cst 0x4008e5e0 constant invariant 32
unit size integer_cst 0x4008e2a0 constant invariant 4
align 32 symtab 0 alias set 2 precision 32 min integer_cst
0x4008e580 -2147483648 max integer_cst 0x4008e5a0 2147483647
pointer_to_this pointer_type 0x4009ae38
public unsigned SI size integer_cst 0x4008e5e0 32 unit size
integer_cst 0x4008e2a0 4
align 32 symtab 0 alias set 3
used unsigned ignored SI file indexedload5.c line 7 size integer_cst
0x4008e5e0 32 unit size integer_cst 0x4008e2a0 4
align 32 context function_decl 0x40151f00 indexedload chain var_decl
0x400a6e00 D.1531
(gdb) p var-decl_common.artificial_flag
$8 = 1

If I clear var-decl_common.artificial_flag with the debugger and continue,
then we get the REG_POINTER flag set and the indexed load is generated with the
base pointer first like we want.  Does anyone know why var above was marked as
artifical?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2006-11-07 Thread pinskia at gcc dot gnu dot org


--- Comment #25 from pinskia at gcc dot gnu dot org  2006-11-08 03:35 
---
(In reply to comment #24)
 Does anyone know why var above was marked as
 artifical?
Yes because it is an compiler generated decl for the load of the global.  The
main reason why we don't mark the RTL for artifical decls is because we get
still like:
(int*)int_var
Which causes problem.  This is what PTR_PLUS_EXPR which I am creating helps
solves.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690