[Bug tree-optimization/43854] New: names for compiler generated temporaries are too long

2010-04-22 Thread dann at godzilla dot ics dot uci dot edu

Looking at tree dumps, most variables used are compiler generated temporaries
and they have names like pretmp.DECIMAL_NUMBER

If the same number were printed in hex instead of DECIMAL_NUMBER, this would
reduce the memory used for those temporary names.

This simple patch (that does not take care of all the temporaries, only a
subset):

Index: defaults.h
===
--- defaults.h  (revision 158360)
+++ defaults.h  (working copy)
@@ -46,12 +46,12 @@

 #ifndef ASM_PN_FORMAT
 # ifndef NO_DOT_IN_LABEL
-#  define ASM_PN_FORMAT "%s.%lu"
+#  define ASM_PN_FORMAT "%s.%lx"
 # else
 #  ifndef NO_DOLLAR_IN_LABEL
-#   define ASM_PN_FORMAT "%s$%lu"
+#   define ASM_PN_FORMAT "%s$%lx"
 #  else
-#   define ASM_PN_FORMAT "__%s_%lu"
+#   define ASM_PN_FORMAT "__%s_%lx"
 #  endif
 # endif
 #endif /* ! ASM_PN_FORMAT */

has this effect on the string pool for an average-size C file (dispnew.c from
emacs):

Before:
avg. entry  17.04 bytes (+/- 8.46)

After:
avg. entry  16.99 bytes (+/- 8.50)

So it's something, given how small the change was.

The difference would be even bigger if base 32 or base 64 were used instead of
hex, but that's a larger change...

Also, the pretmp, prephitmp and ivtmp prefixes are too long; they could be one
or two letters...


-- 
   Summary: names for compiler generated temporaries are too long
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
  GCC host triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43854

[Bug middle-end/43855] New: assembly labels are too long

2010-04-22 Thread dann at godzilla dot ics dot uci dot edu

Assembly labels are generated like this: .LDECIMAL_NUMBER
If the hex version of the same number (or, even better, base 32 or base 64)
were used instead of DECIMAL_NUMBER, the total assembly size would be reduced.

For combine.s the file size difference for using hex is 1.5% for -O2 -S (if I
remember well).

Here's an emacs function that will estimate the size.  Evaluate the function,
open the .s file and do M-x my-estimate, it will show the size savings
estimate.

(defun my-estimate ()
  (interactive)
  (let ((crt-size (point-max)))
    (goto-char (point-min))
    (while (re-search-forward "\\([.]L[A-Z]*\\)\\([0-9]+\\)" nil t)
      (replace-match (format "%s%x" (match-string 1)
                             (string-to-number (match-string 2)))
                     nil nil))
    (message "Size %% change = %f"
             (/ (* 100.0 (- (point-max) crt-size)) (point-max)))))

This is a rather simple-minded estimate, but it shouldn't be too far off.
Things like .LBB and .LBE need to be considered carefully.
This should help speed up the assembler a bit...
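
For a rough feel of how the base affects label size, the sketch below counts
the digits needed for a run of consecutively numbered labels. Both function
names are hypothetical, and the model assumes each label number is printed
exactly once, which real assembly output does not guarantee.

```c
/* Number of digits needed to print n in the given base (base >= 2).  */
unsigned digits_in_base(unsigned long n, unsigned base)
{
  unsigned digits = 1;
  while (n >= base) {
    n /= base;
    digits++;
  }
  return digits;
}

/* Total digits used by labels numbered 0 .. count-1, assuming each
   number is printed exactly once (a simplification).  */
unsigned long total_label_digits(unsigned long count, unsigned base)
{
  unsigned long total = 0;
  for (unsigned long n = 0; n < count; n++)
    total += digits_in_base(n, base);
  return total;
}
```

Comparing total_label_digits (count, 10) with the result for base 16 or 32
gives an upper-level estimate similar in spirit to the emacs function above.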


-- 
   Summary: assembly labels are too long
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
  GCC host triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43855

[Bug tree-optimization/43854] names for compiler generated temporaries are too long

2010-04-22 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #2 from dann at godzilla dot ics dot uci dot edu  2010-04-22 
19:54 ---
(In reply to comment #1)
> Also pretmp prephitmp and ivtmp prefixes are too long,
>
> They might be too long but they are useful long without looking too much into
> the code to figure out what kind of temp they are.  We could just use D.XYZ
> instead without a long name.

Or using P. (or PR.), I. (or IV.)

> Really debug dumps should be used to debug the
> compiler which means having nice names sometimes makes it easier to debug.

As shown above, the names can be both shorter and nice; it's possible to have
both.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43854

[Bug tree-optimization/2462] restrict implementation bug

2009-06-25 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #8 from dann at godzilla dot ics dot uci dot edu  2009-06-25 
15:31 ---
(In reply to comment #7)
> With the new restrict implementation baz() works and all the rest would work
> as well if the calls to link_error () would not cause the malloced memory
> to be clobbered.  The artifact here is that malloced memory is considered
> global (we are not allowed to remove stores to it).

The intention for link_error was to just make it easier to write a test, not to
prohibit optimization.
Please feel free to adjust the code accordingly.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=2462

[Bug tree-optimization/39075] New: alignment for unsigned short a[10000

2009-02-02 Thread dann at godzilla dot ics dot uci dot edu




-- 
   Summary: alignment for unsigned short a[10000
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39075

[Bug tree-optimization/39075] alignment for unsigned short a[10000] vs extern unsigned short a[10000]

2009-02-02 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #1 from dann at godzilla dot ics dot uci dot edu  2009-02-02 
14:50 ---
This code:
unsigned short a[10000];
void test()
{
  int i;
  for (i = 0; i < 10000; ++i)  a[i] = 5;
}

will be vectorized with -O3 -march=core2 to this:

.L2:
        movdqa  %xmm0, a(%eax)
        addl    $16, %eax
        cmpl    $20000, %eax
        jne     .L2


but this one:

extern unsigned short a[10000];

void test()
{
  int i;
  for (i = 0; i < 10000; ++i) a[i] = 5;
}

will get a lot of extra code before the loop because the vectorizer thinks it
needs to do peeling for alignment:
test.c:7: note: Alignment of access forced using peeling.

Intel's compiler does not generate the extra peeling code.
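
For readers unfamiliar with the term, here is a minimal C sketch of what
"peeling for alignment" means. It is not what the vectorizer emits; the
function name fill_with_peeling and the VECTOR_BYTES constant are hypothetical,
and the inner loop merely stands in for one aligned vector store.

```c
#include <stddef.h>
#include <stdint.h>

enum { VECTOR_BYTES = 16 };  /* assumed width of one SSE2 register */

/* Store value into a[0 .. n-1], structured the way a vectorizer might
   do it when the alignment of a is unknown.  */
void fill_with_peeling(unsigned short *a, size_t n, unsigned short value)
{
  size_t i = 0;
  const size_t per_vector = VECTOR_BYTES / sizeof *a;

  /* Prologue: peel scalar iterations until the address is aligned.  */
  while (i < n && ((uintptr_t)(a + i) % VECTOR_BYTES) != 0)
    a[i++] = value;

  /* Main loop: each pass stands in for one aligned 16-byte store.  */
  for (; i + per_vector <= n; i += per_vector)
    for (size_t k = 0; k < per_vector; k++)
      a[i + k] = value;

  /* Epilogue: leftover elements that do not fill a whole vector.  */
  for (; i < n; i++)
    a[i] = value;
}
```

When the array is defined in the same translation unit the compiler can choose
its alignment, so the prologue is unnecessary; for the extern declaration it
apparently cannot assume that, which is where the extra code comes from.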


-- 

dann at godzilla dot ics dot uci dot edu changed:

           What    |Removed                     |Added

            Summary|alignment for unsigned      |alignment for unsigned
                   |short a[10000               |short a[10000] vs extern
                   |                            |unsigned short a[10000]


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39075

[Bug tree-optimization/39068] signed short plus and signed char plus not vectorized

2009-02-02 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #3 from dann at godzilla dot ics dot uci dot edu  2009-02-02 
16:42 ---
(In reply to comment #2)
> (reminds me of a couple missed-optimization PRs where vectorization is also
> failing due to casts - PR31873 , PR26128 - don't know if this is related)

Are the casts actually needed in this case?  It seems they get introduced very
early on; the .original dump already has:

  a[i] = (short int) ((short unsigned int) b[i] + (short unsigned int) c[i]);
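
To illustrate the question, the sketch below writes the addition both ways:
as it appears in the source and as it appears in the .original dump. The
function names are hypothetical. The assumption (not confirmed in this thread)
is that the unsigned form is used so that the narrowed addition wraps instead
of overflowing a signed type; converting an out-of-range value back to short
is implementation-defined in C, and GCC documents it as reduction modulo 2^16.

```c
/* As written in the source: operands are promoted to int, added, and
   the result is converted back to short on assignment.  */
short add_as_written(short b, short c)
{
  return (short)(b + c);
}

/* As shown in the .original dump: the addition is done in unsigned
   short, then the result is converted to short.  */
short add_as_dumped(short b, short c)
{
  return (short)(unsigned short)((unsigned short)b + (unsigned short)c);
}
```

With GCC both functions return the same value for every pair of inputs, so the
casts do not change the result; they only change the type in which the
vectorizer sees the addition.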


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39068

[Bug tree-optimization/39069] New: signed short plus and signed char plus not vectorized

2009-02-01 Thread dann at godzilla dot ics dot uci dot edu

gcc -march=core2 -O3 -ftree-vectorizer-verbose=6 
for this code: 

#define SIZE 1
signed short a[SIZE];
signed short b[SIZE];
signed short c[SIZE];

void add()
{
  int i;
  for (i = 0; i < SIZE; ++i)
    a[i] = b[i] + c[i];
}

cannot vectorize the loop:

add_sshort.c:9: note: vect_model_load_cost: aligned.
add_sshort.c:9: note: vect_model_load_cost: inside_cost = 1, outside_cost = 0 .
add_sshort.c:9: note: not vectorized: relevant stmt not supported: D.1580_6 =
(short unsigned int) D.1579_5
add_sshort.c:7: note: vectorized 0 loops in function.

The same happens if the type for a,b and c is signed char.

But if the type is unsigned short or unsigned char the loop is vectorized.


-- 
   Summary: signed short plus and signed char plus not vectorized
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39069

[Bug tree-optimization/39068] New: signed short plus and signed char plus not vectorized

2009-02-01 Thread dann at godzilla dot ics dot uci dot edu

gcc -march=core2 -O3 -ftree-vectorizer-verbose=6 
for this code: 

#define SIZE 1
signed short a[SIZE];
signed short b[SIZE];
signed short c[SIZE];

void add()
{
  int i;
  for (i = 0; i < SIZE; ++i)
    a[i] = b[i] + c[i];
}

cannot vectorize the loop:

add_sshort.c:9: note: vect_model_load_cost: aligned.
add_sshort.c:9: note: vect_model_load_cost: inside_cost = 1, outside_cost = 0 .
add_sshort.c:9: note: not vectorized: relevant stmt not supported: D.1580_6 =
(short unsigned int) D.1579_5
add_sshort.c:7: note: vectorized 0 loops in function.

The same happens if the type for a,b and c is signed char.

But if the type is unsigned short or unsigned char the loop is vectorized.


-- 
   Summary: signed short plus and signed char plus not vectorized
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39068

[Bug middle-end/38204] New: PRE for post dominating expressions

2008-11-20 Thread dann at godzilla dot ics dot uci dot edu

For this function:
int test (int a, int b, int c, int g)
{
  int d, e;
  if (a)
    d = b * c;
  else
    d = b - c;
  e = b * c + g;
  return d + e;
}

the multiply expression is moved to both branches of the if; it would be
better to move it before the if.  Intel's compiler does that.
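
At the source level, the desired result would look roughly like the sketch
below. The name test_hoisted is hypothetical; it only illustrates the
transformation and is not the output of any compiler.

```c
/* The multiply is needed on every path (it is used after the if), so
   it can be computed once before the branch.  */
int test_hoisted(int a, int b, int c, int g)
{
  int prod = b * c;            /* single multiply, before the if */
  int d = a ? prod : b - c;    /* the "then" arm reuses it */
  int e = prod + g;            /* and so does the code after the if */
  return d + e;
}
```

Because the expression after the if post-dominates both arms, hoisting it adds
no computation on any path; it only removes the copy that would otherwise be
inserted into the else arm.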


-- 
   Summary: PRE for post dominating expressions
   Product: gcc
   Version: 4.3.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
  GCC host triplet: i386-pc-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38204

[Bug tree-optimization/27810] inefficient gimplification of function calls

2008-11-20 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #4 from dann at godzilla dot ics dot uci dot edu  2008-11-20 
18:43 ---
Still happens with 4.4.0:

qqq (int a)
{
  int result.0;
  int D.1236;
  int result;

  result.0 = bar (a);
  result = result.0;
  D.1236 = result;
  return D.1236;
}


-- 

dann at godzilla dot ics dot uci dot edu changed:

   What|Removed |Added

Version|unknown |4.4.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27810

[Bug tree-optimization/15484] [tree-ssa] bool and short function arguments promoted to int

2008-11-20 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #6 from dann at godzilla dot ics dot uci dot edu  2008-11-20 
23:27 ---
Still happens in 4.4. 


-- 

dann at godzilla dot ics dot uci dot edu changed:

   What|Removed |Added

   Keywords||memory-hog


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15484

[Bug c++/13146] inheritance for nonoverlapping_component_refs_p

2008-03-14 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #8 from dann at godzilla dot ics dot uci dot edu  2008-03-15 
00:28 ---
(In reply to comment #7)
> The testcase is fixed by the SCCVN alias-oracle patch.

Are you sure? I still see the problem (.final_cleanup dump):

void bar(first*, multi*) (s1, s3)
{
<bb 2>:
  s1->f1 = 0;
  s3->f3 = 0;
  s1->f1 = s1->f1 + 1;
  s3->f3 = s3->f3 + 1;
  s1->f1 = s1->f1 + 1;
  s3->f3 = s3->f3 + 1;
  if (s1->f1 != 2)
    goto <bb 3>;
  else
    goto <bb 4>;
<bb 3>:
  link_error () [tail call];
<bb 4>:
  return;
}
void foo(first*, second*) (s1, s2)
{
<bb 2>:
  s1->f1 = 0;
  s2->f2 = 0;
  s1->f1 = s1->f1 + 1;
  s2->f2 = s2->f2 + 1;
  s1->f1 = s1->f1 + 1;
  s2->f2 = s2->f2 + 1;
  if (s1->f1 != 2)
    goto <bb 3>;
  else
    goto <bb 4>;
<bb 3>:
  link_error () [tail call];
<bb 4>:
  return;
}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13146

[Bug tree-optimization/27799] adding unused char field inhibits optimization

2008-03-04 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #5 from dann at godzilla dot ics dot uci dot edu  2008-03-04 
21:19 ---
(In reply to comment #4)
> http://gcc.gnu.org/ml/gcc-patches/2008-03/msg00243.html

Thanks for working on this!
Have you looked at the impact?
Probably the generated code won't be too different because the RTL alias
analysis probably catches this.
But it would be interesting to see what the difference is for the tree dumps
before and after this patch.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27799

[Bug tree-optimization/27799] adding unused char field inhibits optimization

2008-03-04 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #7 from dann at godzilla dot ics dot uci dot edu  2008-03-04 
21:32 ---
(In reply to comment #6)
> Actually RTL alias is just using the same routines.
Might be, but it is the RTL level code that optimizes away the abort() in both
testcases (if I remember well, nonoverlapping_component_refs_p).

>
>   # SMT.4_6 = VDEF <SMT.4_4(D)>
>   # SMT.5_7 = VDEF <SMT.5_5(D)>
>   x_1(D)->x = 0;
>   # SMT.5_8 = VDEF <SMT.5_7>
>   y_2(D)->y = 1;
>
> vs.
>
>   # SMT.18_5 = VDEF <SMT.18_4(D)>
>   x_1(D)->x = 0;
>   # SMT.19_7 = VDEF <SMT.19_6(D)>
>   y_2(D)->y = 1;

That is for this testcase, but what about the impact on .final_cleanup for
something big like combine.c? 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27799

[Bug tree-optimization/27799] adding unused char field inhibits optimization

2008-03-04 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #9 from dann at godzilla dot ics dot uci dot edu  2008-03-04 
21:43 ---
(In reply to comment #8)
> Subject: Re:  adding unused char field inhibits
>  optimization
>
> On Tue, 4 Mar 2008, dann at godzilla dot ics dot uci dot edu wrote:
>
> > --- Comment #7 from dann at godzilla dot ics dot uci dot edu
> > 2008-03-04 21:32 ---
> > (In reply to comment #6)
> > > Actually RTL alias is just using the same routines.
> > Might be, but the RTL level code that optimizes away the abort() in both
> > testcases (if I remember well nonoverlapping_component_refs_p).
>
> I still have the abort () with -O2.

Argghh, sorry, my bad: typo in the grep abort file.s command ...


> > That is for this testcase, but what about the impact on .final_cleanup for
> > something big like combine.c?
>
> No idea, but feel free to check.

I don't have a recent build...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27799

[Bug c/31575] New: Extra push+pop generated on x86

2007-04-14 Thread dann at godzilla dot ics dot uci dot edu

For this code: 

struct data {
    unsigned data[4][1];
    unsigned char valid[4];
    unsigned char flags[4];
};

void
def(struct data *info, int index, unsigned *v)
{
    if (info->flags[index]) {
        info->valid[index] = 1;
        info->data[index][0] = v[0];
    }
}
SVN HEAD generates an extra push+pop compared to 4.1.1 when compiling with 
-O2 -march=pentium4

4.1.1 generates this code (3.4.3 generates identical code):   
def:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %ecx
        movl    12(%ebp), %edx
        cmpb    $0, 20(%edx,%ecx)
        je      .L4
        movb    $1, 16(%edx,%ecx)
        movl    16(%ebp), %eax
        movl    (%eax), %eax
        movl    %eax, (%ecx,%edx,4)
.L4:
        popl    %ebp
        ret

SVN HEAD:
def:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx              <--- extra instruction
        movl    8(%ebp), %ecx
        movl    12(%ebp), %edx
        cmpb    $0, 20(%ecx,%edx)
        je      .L4
        movb    $1, 16(%ecx,%edx)
        movl    16(%ebp), %ebx
        movl    (%ebx), %eax
        movl    %eax, (%ecx,%edx,4)
.L4:
        popl    %ebx              <--- extra instruction
        popl    %ebp
        ret


This is a regression from (at least) 3.4.3 and 4.1.1


-- 
   Summary: Extra push+pop generated on x86
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31575

[Bug middle-end/31575] Extra push+pop generated on x86

2007-04-14 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #2 from dann at godzilla dot ics dot uci dot edu  2007-04-14 
21:03 ---
(In reply to comment #1)
> This looks completely a register allocator issue and I think 4.2.0 and before
> were just getting lucky.

Also note that the extra push+pop are NOT generated when using -march=i386,
-march=athlon, or -march=core2. But they ARE generated when using -march=i486,
-march=i686, -march=pentium4


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31575

[Bug rtl-optimization/30643] New: CSE regression

2007-01-30 Thread dann at godzilla dot ics dot uci dot edu

CSE used to eliminate all the ifs in the code below at least in gcc-3.x (and
probably even earlier). Now in SVN HEAD it does not do it anymore. 4.1 still
does it. 

struct s {  int a;  int b;};
void bar (struct s *ps,  int *p, int *__restrict__ rp, int *__restrict__ rq)
{
  ps->a = 0;
  ps->b = 1;
  if (ps->a != 0) abort ();
  p[0] = 0;
  p[1] = 1;
  if (p[0] != 0) abort ();
  rp[0] = 0;
  rq[0] = 1;
  if (rp[0] != 0) abort();
}

-O2 assembly for SVN HEAD:
bar:
        subl    $12, %esp
        movl    16(%esp), %eax
        movl    20(%esp), %edx
        movl    24(%esp), %ecx
        movl    $0, (%eax)
        movl    $1, 4(%eax)
        movl    (%eax), %eax
        testl   %eax, %eax
        jne     .L20
        movl    $0, (%edx)
        movl    (%edx), %eax
        movl    $1, 4(%edx)
        testl   %eax, %eax
        jne     .L20
        movl    $0, (%ecx)
        movl    (%ecx), %ecx
        movl    28(%esp), %eax
        testl   %ecx, %ecx
        movl    $1, (%eax)
        jne     .L20
        addl    $12, %esp
        ret
.L20:
        call    abort


-O2 assembly for 4.1.1
bar:
        movl    4(%esp), %eax
        movl    8(%esp), %edx
        movl    $0, (%eax)
        movl    $1, 4(%eax)
        movl    12(%esp), %eax
        movl    $0, (%edx)
        movl    $1, 4(%edx)
        movl    $0, (%eax)
        movl    16(%esp), %eax
        movl    $1, (%eax)
        ret
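
In source form, the code that 4.1.1 effectively compiles is sketched below.
The name bar_without_checks is hypothetical. Each check is removable for a
different reason: ps->a and ps->b are distinct fields, p[0] and p[1] are
distinct elements, and rp and rq are restrict-qualified, so the store through
rq cannot change rp[0].

```c
struct s { int a; int b; };

/* Same stores as bar, with the three always-false checks removed.  */
void bar_without_checks(struct s *ps, int *p,
                        int *restrict rp, int *restrict rq)
{
  ps->a = 0;
  ps->b = 1;   /* different field: cannot modify ps->a */
  p[0] = 0;
  p[1] = 1;    /* different element: cannot modify p[0] */
  rp[0] = 0;
  rq[0] = 1;   /* restrict: rq is assumed not to alias rp */
}
```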


-- 
   Summary: CSE regression
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30643

[Bug tree-optimization/27798] gimplifying return CONSTANT creates unneeded temporaties

2007-01-28 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #5 from dann at godzilla dot ics dot uci dot edu  2007-01-28 
22:04 ---
(In reply to comment #2)
> i.e. it misses to initialize the temporary with the result.  Otherwise you
> can play with variants of the following patch:

Richard, have you tried to make this patch work? It seems that with all the
work that goes into inlining now, this might help a bit by making some function
bodies smaller and allowing the inliner to better estimate the actual
size...

>
> Index: gimplify.c
> ===
> *** gimplify.c  (revision 114599)
> --- gimplify.c  (working copy)
> *** gimplify_return_expr (tree stmt, tree *p
> *** ,1116 
> --- ,1124 
> if (!result_decl
> || aggregate_value_p (result_decl, TREE_TYPE (current_function_decl)))
>   result = result_decl;
> +   else if (/*is_gimple_formal_tmp_reg (TREE_OPERAND (ret_expr, 1))
> +||*/ is_gimple_min_invariant (TREE_OPERAND (ret_expr, 1))
> +  /*is_gimple_val (TREE_OPERAND (ret_expr, 1))*/)
> + {
> +   TREE_OPERAND (stmt, 0) = TREE_OPERAND (ret_expr, 1);
> + 
> +   return GS_ALL_DONE;
> + }
> else if (gimplify_ctxp->return_temp)
>   result = gimplify_ctxp->return_temp;
> else
>


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27798

[Bug tree-optimization/30105] reassoc can sometimes get in the way of PRE

2006-12-11 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #6 from dann at godzilla dot ics dot uci dot edu  2006-12-12 
06:07 ---
(In reply to comment #5)
> (In reply to comment #1)
> > Confirmed (but it's not PRE).
> >

> The second is smaller, and no more or less efficient since the addition is
> calculated on both paths anyway.
>
> Both are valid results, and what RTL does with them is its business.
>
> I don't believe you can claim they should generate identical assembly.
>
> The actual thing this testcase is trying to test, that load-PRE is performed,
> has succeeded.
> Thus i am closing this bug as WORKSFORME.
> If you see something *actually wrong* with the result, rather than just
> disassembly, please feel free to reopen.

Here is a slightly modified example that shows that there's still a PRE
opportunity 

void motion_test22(int * data, int i)
{
  int j;
  if (data[1]) {
    data[data[2]] = 2;
    j = data[0] *   data[3];
    i = i *  j;
  }
  data[4] = data[0] * data[3];
  data[5] = i;
}

<L0>:;
  *((int *) ((unsigned int) *(data + 8B) * 4) + data) = 2;
  prephitmp.26 = *data;
  prephitmp.31 = *(data + 12B);
  i = prephitmp.26 * i * prephitmp.31;

<L1>:;
  *(data + 16B) = prephitmp.31 * prephitmp.26;
  *(data + 20B) = i;
  return;

There are 3 multiplications on the L0-L1 path. It should be possible
to only have 2 multiplications on that path.
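
At the source level, a version with only two multiplications on that path
could look like the sketch below. The name motion_result22 is hypothetical
(it is not one of Briggs' tests); the product is computed once after the store
through data[data[2]] and then reused.

```c
/* Same behaviour as motion_test22, with data[0] * data[3] computed
   once per path and reused for both i and data[4].  */
void motion_result22(int *data, int i)
{
  int t;
  if (data[1]) {
    data[data[2]] = 2;       /* may overwrite data[0] or data[3] */
    t = data[0] * data[3];   /* so load and multiply after the store */
    i = i * t;
  } else {
    t = data[0] * data[3];
  }
  data[4] = t;
  data[5] = i;
}
```

Nothing is stored between the computation of t and its reuse, so the second
multiplication of data[0] by data[3] is fully redundant on the taken path.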


-- 

dann at godzilla dot ics dot uci dot edu changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|WORKSFORME  |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30105

[Bug tree-optimization/30104] missed code motion optimization (invariant control structures)

2006-12-07 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #4 from dann at godzilla dot ics dot uci dot edu  2006-12-07 
18:24 ---
(In reply to comment #3)
> unswitching would duplicate the whole loop here, so not exactly I think.  But
> if-conversion to
>
>   j = COND_EXPR <p, 1, 2>
>
> or
>
>   j = 2 - (int)p;
>
> would make j loop invariant.

if-conversion would solve this particular testcase, but the more general case
of moving invariant control structures out of the loop is probably more
interesting. 
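
As a small illustration of the if-conversion mentioned in the quote, the
sketch below compares the branching and branch-free forms. The function names
are hypothetical. Note that the quoted 2 - (int)p only works when p is already
0 or 1; for a plain int such as data[1] it has to be normalized first, which
is what (p != 0) does here.

```c
/* Branching form, as in the loop body of the testcase.  */
int select_branch(int p)
{
  if (p)
    return 1;
  else
    return 2;
}

/* If-converted form: straight-line code, trivially loop invariant
   when p itself is loop invariant.  */
int select_branchless(int p)
{
  return 2 - (p != 0);
}
```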


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30104

[Bug tree-optimization/30098] New: missed value numbering optimization

2006-12-06 Thread dann at godzilla dot ics dot uci dot edu

The following 2 functions should be compiled to the same thing. 
This is a test from Briggs' compiler benchmarks.

void vnum_test8(int *data)
{
  int i;
  int stop = data[3];
  int m = data[4];
  int n = m;
  for (i=0; i<stop; i++) {
    int k = data[2];
    data[k] = 2;
    data[0] = m - n;
    k = data[1];
    m = m + k;
    n = n + k;
  }
}
void vnum_result8(int *data)
{
  int i;
  int stop = data[3];
  for (i=0; i<stop; i++) {
    int k = data[2];
    data[k] = 2;
    data[0] = 0;
  }
}


-- 
   Summary: missed value numbering optimization
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30098

[Bug tree-optimization/30099] New: missed value numbering optimization (conditional-based assertions)

2006-12-06 Thread dann at godzilla dot ics dot uci dot edu

The following 2 functions should be compiled to the same assembly. 
This is one of Briggs' compiler benchmarks.

void vnum_test10(int *data)
{
  int i = data[0];
  int m = i + 1;
  int j = data[1];
  int n = j + 1;
  data[2] = m + n;
  if (i == j)
    data[3] = (m - n) * 21;
}
void vnum_result10(int *data)
{
  int i = data[0];
  int m = i + 1;
  int j = data[1];
  int n = j + 1;
  data[2] = m + n;
  if (i == j)
    data[3] = 0;
}


-- 
   Summary: missed value numbering optimization (conditional-based
assertions)
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30099

[Bug tree-optimization/30100] New: missed value numbering optimization (conditional value numbers)

2006-12-06 Thread dann at godzilla dot ics dot uci dot edu

The following 2 functions should be compiled to the same assembly.
This is one of Briggs' compiler benchmarks.

void vnum_test11(int *data)
{
  int n;
  int stop = data[3];
  int j = data[1];
  int k = j;
  int i = 1;
  for (n=0; n<stop; n++) {
    if (j != k) i = 2;
    if (i != 1) k = 2;
    data[data[2]] = 2;
  }
  data[1] = i;
}
void vnum_result11(int *data)
{
  int n;
  int stop = data[3];
  for (n=0; n<stop; n++)
    data[data[2]] = 2;
  data[1] = 1;
}


-- 
   Summary: missed value numbering optimization (conditional value
numbers)
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30100

[Bug tree-optimization/30101] New: missed value numbering optimization (cprop+valnum)

2006-12-06 Thread dann at godzilla dot ics dot uci dot edu

The following 2 functions should be compiled to the same assembly. 
This is one of Briggs' compiler benchmarks.

void vnum_test12(int *data)
{
  int n;
  int stop = data[3];
  int j = data[1];
  int k = j;
  int i = 1;
  for (n=0; n<stop; n++) {
    if (j != k) i = 2;
    i = 2 - i;
    if (i != 1) k = 2;
    data[data[2]] = 2;
  }
  data[1] = i;
}
void vnum_result12(int *data)
{
  int n;
  int stop = data[3];
  for (n=0; n<stop; n++)
    data[data[2]] = 2;
  data[1] = 1;
}


-- 
   Summary: missed value numbering optimization (cprop+valnum)
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30101

[Bug tree-optimization/30102] New: missed strength reduction optimization (irreducible loops)

2006-12-06 Thread dann at godzilla dot ics dot uci dot edu

The following 2 functions should be compiled to the same assembly. 
This is one of Briggs' compiler benchmarks.


void strength_test4(int *data)
{
  int i;
  if (data[1]) {
    i = 2;
    goto here;
  }
  i = 0;
  do {
    i = i + 1;
here:
    data[data[2]] = 2;
  } while (i * 21 < data[3]);
}
void strength_result4(int *data)
{
  int i;
  if (data[1]) {
    i = 42;
    goto here;
  }
  i = 0;
  do {
    i = i + 21;
here:
    data[data[2]] = 2;
  } while (i < data[3]);
}


-- 
   Summary: missed strength reduction optimization (irreducible
loops)
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30102

[Bug tree-optimization/30103] New: missed strength reduction optimization (test replacement)

2006-12-06 Thread dann at godzilla dot ics dot uci dot edu

The following 2 functions should be compiled to the same assembly. 
This is one of Briggs' compiler benchmarks.

void strength_test10(int *data)
{
  int stop = data[3];
  int i = 0;
  do {
    data[data[2]] = 21 * i;
    i = i + 1;
  } while (i < stop);
}
void strength_result10(int *data)
{
  int stop = data[3] * 21;
  int i = 0;
  do {
    data[data[2]] = i;
    i = i + 21;
  } while (i < stop);
}


-- 
   Summary: missed strength reduction optimization (test
replacement)
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30103

[Bug tree-optimization/30104] New: missed code motion optimization (invariant control structures)

2006-12-06 Thread dann at godzilla dot ics dot uci dot edu

The following 2 functions should be compiled to the same assembly. 
This is one of Briggs' compiler benchmarks.

void motion_test10(int *data)
{
  int j;
  int p = data[1];
  int i = data[0];
  do {
    if (p)
      j = 1;
    else
      j = 2;
    i = i + j;
    data[data[2]] = 2;
  } while (i < data[3]);
}
void motion_result10(int *data)
{
  int j;
  int p = data[1];
  int i = data[0];
  if (p)
    j = 1;
  else
    j = 2;
  do {
    i = i + j;
    data[data[2]] = 2;
  } while (i < data[3]);
}


-- 
   Summary: missed code motion optimization (invariant control
structures)
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30104

[Bug tree-optimization/30105] New: missed PRE

2006-12-06 Thread dann at godzilla dot ics dot uci dot edu

The following 2 functions should be compiled to the same assembly. 
This is one of Briggs' compiler benchmarks.

void motion_test2(int *data)
{
  int j;
  int i = 1;
  if (data[1]) {
    data[data[2]] = 2;
    j = data[0] + data[3];
    i = i + j;
  }
  data[4] = data[0] + data[3];
  data[5] = i;
}
void motion_result2(int *data)
{
  int j;
  int i = 1;
  if (data[1]) {
    data[data[2]] = 2;
    j = data[0] + data[3];
    i = i + j;
  }
  else
    j = data[0] + data[3];
  data[4] = j;
  data[5] = i;
}


-- 
   Summary: missed PRE
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30105

[Bug target/28946] [4.0/4.1/4.2 Regression] assembler shifts set the flag ZF, no need to re-test to zero

2006-09-04 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #3 from dann at godzilla dot ics dot uci dot edu  2006-09-04 
17:56 ---
This specific case can probably be solved at the tree level by changing the
test:

(nb >> 5) != 0
to 
nb >= 32
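
A quick way to convince oneself that the two tests agree is sketched below.
The function names are hypothetical, and the sketch assumes nb is unsigned;
for a negative signed nb the right shift is implementation-defined and the
two forms would not be interchangeable.

```c
/* Test as currently written: shift, then compare the result to zero.  */
int test_shift(unsigned nb)
{
  return (nb >> 5) != 0;
}

/* Proposed replacement: a single comparison against 2^5.  */
int test_compare(unsigned nb)
{
  return nb >= 32;
}
```

The shift discards the low five bits, so the result is nonzero exactly when
some bit at position 5 or higher is set, which is the same as nb being at
least 32.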


-- 

dann at godzilla dot ics dot uci dot edu changed:

   What|Removed |Added

 CC||dann at godzilla dot ics dot
   ||uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28946

[Bug tree-optimization/27810] inefficient gimplification of function calls

2006-06-20 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #3 from dann at godzilla dot ics dot uci dot edu  2006-06-20 
19:09 ---
More data: for PR8361 the number of functions in the .gimple dump is 5045, the
number of functions in the cleanup_cfg dump is 1341. The majority of the
functions that are eliminated are small functions, for those the extra overhead
due to inefficiencies in gimplification is significant. Maybe the people
interested in compilation speed at -O0 (especially for C++) want to take a look
at this and the related PRs...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27810

[Bug middle-end/27896] inefficient lowering for return

2006-06-13 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #2 from dann at godzilla dot ics dot uci dot edu  2006-06-13 
14:18 ---
Add Diego to the CC list as per his request.


-- 

dann at godzilla dot ics dot uci dot edu changed:

   What|Removed |Added

 CC||dnovillo at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27896

[Bug tree-optimization/27809] inefficient gimplification of globals

2006-06-13 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #3 from dann at godzilla dot ics dot uci dot edu  2006-06-13 
14:22 ---
(In reply to comment #2)
> (In reply to comment #1)
> > Hmm, it should have produced G.3, G.n, at least I would have thought.
> >
>
> No, we intentionally use the same variable for the lexically identical
> expressions, see internal_get_tmp_var/lookup_tmp_var.  Original intention was
> to make PRE and other redundancy elimination optimization passes more
> efficient
> (this was essential especially for the old SSAPRE pass that used lexical
> equality of expressions to check for redundancies).  These reasons are no
> longer relevant, but keeping the code saves a significant amount of memory and
> compile time (I tried removing the code a few months ago, but since it slows
> down compilation by some 1-2%, I never bothered with posting the patch).

Using the same variable is surely good, but wouldn't it be even better to not
create the redundant G.2 = G assignments in this PR?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27809

[Bug tree-optimization/27798] gimplifying return CONSTANT creates unneeded temporaties

2006-06-13 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #3 from dann at godzilla dot ics dot uci dot edu  2006-06-13 
14:42 ---
One of the issues with this PR and also 27800, 27809 and 27810 is that this
extra work/memory allocation is done for a number of functions that are never
used, like all the inline functions present in the glibc headers. These
functions are thrown out after gimplification...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27798

[Bug tree-optimization/27896] New: inefficient lowering for return

2006-06-05 Thread dann at godzilla dot ics dot uci dot edu

The .lower dump for this code: int foo (void) { return 1;} looks like:

foo ()
{
  goto D1524;
  D1524:;
  return 1;

The goto to the next line is useless; it just increases the memory usage and
needs extra work to be eliminated in a subsequent pass...


-- 
   Summary: inefficient lowering for return
   Product: gcc
   Version: 4.0.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27896

[Bug tree-optimization/27810] inefficient gimplification of function calls

2006-05-31 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #2 from dann at godzilla dot ics dot uci dot edu  2006-05-31 
21:47 ---
My guesstimate is that for combine.i about 5-8% of the total number of
expressions in the gimple dump are due to the gimplification inefficiencies
shown in PRs 27798, 27800, 27809 and 27810, so fixing these issues might have
an impact on compilation time...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27810

[Bug tree-optimization/27798] New: gimplifying return CONSTANT creates unneeded temporaties

2006-05-29 Thread dann at godzilla dot ics dot uci dot edu

int zero (void) { return 0; } 

is gimplified to: 

zero ()
{
  int D.2115;

  D.2115 = 0;
  return D.2115;
}

The D.2115 temporary is not needed: the return value is constant, it is of the
same type as the function return type, and return CONSTANT is valid gimple. 

Not creating the temporary should save some memory and processing time later.


-- 
   Summary: gimplifying return CONSTANT creates unneeded
temporaties
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27798

[Bug tree-optimization/27799] New: adding unused char field inhibits optimization

2006-05-29 Thread dann at godzilla dot ics dot uci dot edu

For this code:

struct X {double m; int x;};
struct Y {int y; short d;};
struct YY {int y; short d; char c;};

int foo(struct X *x,  struct Y *y)
{
  x->x =  0;
  y->y =  1;
  if (x->x != 0)
    abort ();
}

int foo_no(struct X *x,  struct YY *y)
{
  x->x =  0;
  y->y =  1;
  if (x->x != 0)
    abort ();
}

The if does not get optimized away (by the dom1 pass) for the foo_no
function, but it is optimized away for foo.
The only difference between the 2 functions is that foo_no takes as a parameter
a pointer to a struct that has a char field that is not accessed in this
function.

It would be nice if both functions were optimized in the same way.


-- 
   Summary: adding unused char field inhibits optimization
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27799

[Bug tree-optimization/27800] New: extra temprorary created when gimplifying return

2006-05-29 Thread dann at godzilla dot ics dot uci dot edu

One would think that the temporaries created when gimplifying the following
2 functions would be the same:

void hhh (int a, int b, int c){ bar (a?b:c); }
int iii (int a, int b, int c){ return (a?b:c); }

But they are not:

hhh (a, b, c)
{
  int iftmp.0;
  if (a != 0)
{
  iftmp.0 = b;
}
  else
{
  iftmp.0 = c;
}
  bar (iftmp.0);
}

This one is fine. 
But this one:

iii (a, b, c)
{
  int D.2128;
  int iftmp.1;
  if (a != 0)
{
  iftmp.1 = b;
}
  else
{
  iftmp.1 = c;
}
  D.2128 = iftmp.1;
  return D.2128;
}

creates an extra temporary for the return expression. It would be more memory
efficient if it just used iftmp.1, as the first function does.


-- 
   Summary: extra temprorary created when gimplifying return
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27800

[Bug tree-optimization/27800] extra temprorary created when gimplifying return

2006-05-29 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #1 from dann at godzilla dot ics dot uci dot edu  2006-05-29 
20:51 ---
An even simpler example which occurs quite frequently in programs:

int jjj (int a){ return bar (a); }

jjj (a)
{
  int D.1891;
  int D.1892;

  D.1892 = bar (a);
  D.1891 = D.1892;
  return D.1891;
}


-- 

dann at godzilla dot ics dot uci dot edu changed:

   What|Removed |Added

Summary|extra temprorary created|extra temprorary created
   |when gimplifying return |when gimplifying return


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27800

[Bug tree-optimization/27809] New: inefficient gimplification of globals

2006-05-29 Thread dann at godzilla dot ics dot uci dot edu

This code:

int G;
int lll (int a) { bar (G, G, G, G); }

is gimplified like this:

lll (a)
{
  int G.2;

  G.2 = G;
  G.2 = G;
  G.2 = G;
  G.2 = G;
  bar (G.2, G.2, G.2, G.2);
}

Creating that many identical expressions is wasteful...


-- 
   Summary: inefficient gimplification of globals
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27809

[Bug tree-optimization/27810] New: inefficient gimplification of function calls

2006-05-29 Thread dann at godzilla dot ics dot uci dot edu

int qqq (int a) {  int result;   result = bar (a);   return result;}

is gimplified to:

qqq (a)
{
  int D.2147;
  int D.2148;
  int result;

  D.2147 = bar (a);
  result = D.2147;
  D.2148 = result;
  return D.2148;
}

The D.2147 variable is redundant: the result of bar can be directly assigned 
to result. Creating the temporary just increases the memory footprint... 
(PR27800 is about the fact that D.2148 is also redundant)


-- 
   Summary: inefficient gimplification of function calls
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27810

[Bug tree-optimization/27440] New: [4.0/4.1/4.2 regression] code quality regression due to ivopts

2006-05-04 Thread dann at godzilla dot ics dot uci dot edu

Compiling this code with 3.4.6:
void fill2 (unsigned int *arr, unsigned int val, unsigned int start,
            unsigned int limit)
{
  unsigned int i;
  for (i = start; i < start + limit; i++)
    arr[i] = val;
}
generates: 
.L10:
        movl    %ecx, (%ebx,%eax,4)
        incl    %eax
.L8:
        cmpl    %eax, %edx
        ja      .L10
4.0/4.1/4.2 -O2 generate:

.L4:
        incl    %edx
        movl    %esi, (%eax)
        addl    $4, %eax
        cmpl    %ecx, %edx
        jne     .L4
which is both slower and bigger. 

Using -O2 -fno-ivopts the result is much better:
.L4:
        movl    %ecx, (%ebx,%eax,4)
        incl    %eax
        cmpl    %edx, %eax
        jb      .L4

The difference in the .final_cleanup dump with and without ivopts is obvious:
With ivopts: 

  void * ivtmp.29;
  unsigned int ivtmp.26;
  unsigned int D.1290;

bb 0:
  D.1290 = start + limit;
  if (start < D.1290) goto L6; else goto L2;

L6:;
  ivtmp.29 = arr + (unsigned int *) (start * 4);
  ivtmp.26 = 0;

L0:;
  MEM[base: (unsigned int *) ivtmp.29] = val;
  ivtmp.26 = ivtmp.26 + 1;
  ivtmp.29 = ivtmp.29 + 4B;
  if (ivtmp.26 != D.1290 - start) goto L0; else goto L2;

L2:;
  return;

Without ivopts:
  unsigned int i;
  unsigned int D.1290;

bb 0:
  D.1290 = start + limit;
  if (start < D.1290) goto L11; else goto L2;

L11:;
  i = start;

L0:;
  *((unsigned int *) (i * 4) + arr) = val;
  i = i + 1;
  if (i < D.1290) goto L0; else goto L2;

L2:;
  return;


The void * ivtmp.29 is created by the ivopts pass. Why is it a void * when
it is known to be assigned an unsigned int *?

Note that loops like the one in this example are quite common. For example, in
the assembly for PR8361 there are about 37 fill functions with very similar
code (they are instantiations of 2 different templates, but still...)
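
For reference, the two loop shapes that the dumps above appear to show can be
written by hand in C. This is only a sketch, not compiler output: the names
fill_indexed, fill_two_ivs and check_fill are made up for illustration.

```c
#include <assert.h>

/* One induction variable: i is used both for the exit test and for
   addressing (the shape seen without ivopts).  */
void fill_indexed (unsigned int *arr, unsigned int val,
                   unsigned int start, unsigned int limit)
{
  unsigned int i;
  for (i = start; i < start + limit; i++)
    arr[i] = val;
}

/* Two induction variables: a pointer for addressing plus a separate
   counter for the exit test (the shape seen with ivopts).  */
void fill_two_ivs (unsigned int *arr, unsigned int val,
                   unsigned int start, unsigned int limit)
{
  unsigned int end = start + limit;
  if (start < end)
    {
      unsigned int *p = arr + start;
      unsigned int count = 0;
      do
        {
          *p = val;
          count++;
          p++;
        }
      while (count != end - start);
    }
}

/* Usage example: both variants must write the same elements.  */
void check_fill (void)
{
  unsigned int a[8] = { 0 }, b[8] = { 0 };
  int k;

  fill_indexed (a, 7u, 2u, 3u);
  fill_two_ivs (b, 7u, 2u, 3u);
  for (k = 0; k < 8; k++)
    assert (a[k] == b[k]);
}
```

The second variant keeps two values live across the loop and updates both on
every iteration, which is the extra cost visible in the 4.x assembly.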


-- 
   Summary: [4.0/4.1/4.2 regression] code quality regression due to
ivopts
   Product: gcc
   Version: 4.0.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27440

[Bug target/27440] [4.0/4.1/4.2 regression] code quality regression due to ivopts

2006-05-04 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #2 from dann at godzilla dot ics dot uci dot edu  2006-05-04 
23:09 ---
(In reply to comment #1)
> IV-OPTs just gets info from the target.  Now if the target says weird
> addressing mode is the same as cheap ones, what do you think will happen?

Does IV-OPTs also take into consideration the cost of having 2 IVs instead of
1? 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27440

[Bug tree-optimization/27441] New: VAR - 1 not identified as the same as VAR + -1

2006-05-04 Thread dann at godzilla dot ics dot uci dot edu

It seems that neither FRE nor PRE can determine that stride.115 - 1 is the same
as stride.115 + -1 in the example below (taken from the comm3 function in mgrid
from SPEC2000). (Or am I missing something?)

bb 2:
  stride.115 = *n;
  stride.117 = stride.115 * stride.115;
  offset.118 = ~stride.115 - stride.117;
  D.1969 = stride.115 - 1;
  if (D.1969  1) goto L68; else goto L24;

L68:;
  pretmp.221 = stride.115 + -1;
  pretmp.228 = offset.118 + pretmp.221;
  pretmp.236 = offset.118 + stride.115;
  i3 = 2;
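
One way to make the two spellings compare equal is to canonicalize before
comparing. The following is a minimal sketch under made-up assumptions: the
struct expr type and the canonicalize/same_value helpers are hypothetical and
are not how GCC's FRE/PRE value numbering is implemented.

```c
#include <assert.h>

/* Hypothetical miniature expression form: "var OP constant".  */
enum op { OP_PLUS, OP_MINUS };

struct expr
{
  enum op op;
  int var;      /* made-up variable id, e.g. 115 for stride.115 */
  long cst;
};

/* Rewrite VAR - C as VAR + (-C) so that both spellings share one form.
   This sketch ignores the overflow of -C for the most negative constant.  */
struct expr canonicalize (struct expr e)
{
  if (e.op == OP_MINUS)
    {
      e.op = OP_PLUS;
      e.cst = -e.cst;
    }
  return e;
}

/* Structural equality on the canonical forms.  */
int same_value (struct expr a, struct expr b)
{
  a = canonicalize (a);
  b = canonicalize (b);
  return a.op == b.op && a.var == b.var && a.cst == b.cst;
}

/* Usage example: stride.115 - 1 versus stride.115 + -1.  */
void check_same_value (void)
{
  struct expr minus_one = { OP_MINUS, 115, 1 };
  struct expr plus_neg_one = { OP_PLUS, 115, -1 };

  assert (same_value (minus_one, plus_neg_one));
}
```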


-- 
   Summary: VAR - 1 not identified as the same as VAR + -1
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27441

[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-05-03 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #5 from dann at godzilla dot ics dot uci dot edu  2006-05-03 
18:54 ---
IMO Comment #4 does not look closely enough at what is actually happening.
IMO tree-ch is the root cause here.

The code looks like this before .ch:
  goto bb 2 (L1);

L0:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_1] = Int_Loc_3;
  Int_Index_58 = Int_Index_1 + 1;

  # Int_Index_1 = PHI Int_Loc_3(0), Int_Index_58(1);
L1:;
  D.1305_26 = Int_Loc_3 + 1;
  if (Int_Index_1 <= D.1305_26) goto L0; else goto L2;

L2:;


after .ch it looks like this: 
  D.1305_41 = Int_Loc_3 + 1;
  if (Int_Loc_3 <= D.1305_41) goto L0; else goto L2; <-- this just
complicates the CFG. Look below to see what the effects of doing this are in
later passes. Plus just look at the comparison ...

  # Int_Index_37 = PHI Int_Index_58(1), Int_Loc_3(0);
L0:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  D.1305_26 = Int_Loc_3 + 1;
  if (D.1305_26 >= Int_Index_58) goto L0; else goto L2;

L2:;

Given the above CFG, critical edge splitting transforms this into:
  D.1305_41 = Int_Loc_3 + 1;
  if (Int_Loc_3 <= D.1305_41) goto L6; else goto L7;

L7:;
  goto bb 2 (L2);

L6:;

  # Int_Index_37 = PHI Int_Index_58(5), Int_Loc_3(3);
L0:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1305_41 >= Int_Index_58) goto L8; else goto L9;

L8:;
  goto bb 1 (L0);

L9:;

L2:;

Given the above CFG PRE will dutifully fill with code a lot of the empty basic
blocks: 

after pre
  D.1305_41 = Int_Loc_3 + 1;
  if (Int_Loc_3 <= D.1305_41) goto L6; else goto L7;

L7:;
  pretmp.34_45 = Int_Loc.0_4 * 200;
  pretmp.36_57 = (int[50] *) pretmp.34_45;
  pretmp.38_25 = Arr_2_Par_Ref_30 + pretmp.36_57;
  goto bb 2 (L2);

L6:;
  pretmp.30_26 = Int_Loc.0_4 * 200;
  pretmp.31_19 = (int[50] *) pretmp.30_26;
  pretmp.32_1 = pretmp.31_19 + Arr_2_Par_Ref_30;

  # Int_Index_37 = PHI Int_Index_58(5), Int_Loc_3(3);
L0:;
  D.1301_54 = pretmp.30_26;
  D.1302_55 = pretmp.31_19;
  D.1303_56 = pretmp.32_1;
  (*D.1303_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1305_41 >= Int_Index_58) goto L8; else goto L9;

L8:;
  goto bb 1 (L0);

L9:;

  # prephitmp.39_23 = PHI D.1303_56(6), pretmp.38_25(4);
  # prephitmp.37_53 = PHI D.1302_55(6), pretmp.36_57(4);
  # prephitmp.35_49 = PHI D.1301_54(6), pretmp.34_45(4);
L2:;


Now when using -fno-tree-ch 

before critical edge splitting the code looks like this:
  goto bb 2 (L1);

L0:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_1] = Int_Loc_3;
  Int_Index_58 = Int_Index_1 + 1;

  # Int_Index_1 = PHI Int_Loc_3(0), Int_Index_58(1);
L1:;
  D.1305_26 = Int_Loc_3 + 1;
  if (Int_Index_1 <= D.1305_26) goto L0; else goto L2;

L2:;


after crited it looks like this: (i.e. no change) 

  goto bb 2 (L1);

L0:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_1] = Int_Loc_3;
  Int_Index_58 = Int_Index_1 + 1;

  # Int_Index_1 = PHI Int_Loc_3(0), Int_Index_58(1);
L1:;
  D.1305_26 = Int_Loc_3 + 1;
  if (Int_Index_1 <= D.1305_26) goto L0; else goto L2;

L2:;

and after PRE

  goto bb 2 (L1);

L0:;
  D.1301_54 = pretmp.31_49;
  D.1302_55 = pretmp.32_45;
  D.1303_56 = pretmp.33_41;
  (*D.1303_56)[Int_Index_1] = Int_Loc_3;
  Int_Index_58 = Int_Index_1 + 1;

  # Int_Index_1 = PHI Int_Loc_3(0), Int_Index_58(1);
L1:;
  D.1305_26 = pretmp.30_19;
  if (Int_Index_1 <= D.1305_26) goto L0; else goto L2;

L2:;


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944

[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-05-03 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #8 from dann at godzilla dot ics dot uci dot edu  2006-05-03 
21:53 ---
WRT this code generated by tree-ch:
  D.1305_41 = Int_Loc_3 + 1;
  if (Int_Loc_3 <= D.1305_41) goto L0; else goto L2;

AFAICT there's exactly one value for which the comparison can be false. IMO it
would be better to test for that value directly instead of generating a new SSA
name and another expression.
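
To illustrate the point, here is a small sketch that assumes int addition
wraps around (as with -fwrapv). In ISO C signed overflow is undefined, so a
compiler is also allowed to treat the comparison as always true. The function
names are made up for this example.

```c
#include <assert.h>
#include <limits.h>

/* Evaluate "x <= x + 1" as if the addition wrapped around.  The wrap is
   spelled out by hand so that the sketch itself has no undefined
   behaviour.  */
int guard_with_wrap (int x)
{
  int next = (x == INT_MAX) ? INT_MIN : x + 1;
  return x <= next;
}

/* The direct test: the guard fails only for the largest value.  */
int guard_direct (int x)
{
  return x != INT_MAX;
}

/* Usage example: both forms agree on a few interesting inputs.  */
void check_guard (void)
{
  assert (guard_with_wrap (0) == guard_direct (0));
  assert (guard_with_wrap (INT_MIN) == guard_direct (INT_MIN));
  assert (guard_with_wrap (INT_MAX) == guard_direct (INT_MAX));
}
```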


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944

[Bug tree-optimization/27365] New: add a way to mark that a path cannot be taken, something like __builtin_unreachable()

2006-04-30 Thread dann at godzilla dot ics dot uci dot edu

It would be nice to have some form of a builtin that shows that a portion of
the code is not reachable, and it generates no code in the binary. 

gcc_unreachable() is used now in the gcc sources for this, but it will generate
assembly code that calls abort().

Another way to accomplish the same thing could be with attributes.
Can attributes be used for function calls? I believe right now they can't. 
If they could, then something like this could work:
myfunc(foo,bar,baz) __attribute__((noreturn));
Some functions are known not to return only in certain situations, so they
cannot be declared as being noreturn. An example where this would be useful
is the Fsignal function in emacs.
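
A minimal sketch of how such a builtin might be used, assuming a compiler that
provides __builtin_unreachable() (GCC releases newer than the ones discussed
here, and Clang, do). The mode_cost function and its values are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: callers guarantee that mode is 0, 1 or 2.  */
int mode_cost (int mode)
{
  switch (mode)
    {
    case 0:
      return 10;
    case 1:
      return 20;
    case 2:
      return 30;
    default:
      /* Tell the compiler this path cannot be taken.  Unlike
         gcc_unreachable (), no call to abort () is emitted here.  */
      __builtin_unreachable ();
    }
}

/* Usage example: only the valid modes are ever passed in.  */
void check_mode_cost (void)
{
  assert (mode_cost (0) == 10);
  assert (mode_cost (2) == 30);
}
```

Reaching the marked point at run time is undefined behaviour, so this only
fits paths that really cannot be taken.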


-- 
   Summary: add a way to mark that a path cannot be taken, something
like __builtin_unreachable()
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27365

[Bug tree-optimization/15911] VRP/DOM does not like TRUTH_AND_EXPR

2006-04-30 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #28 from dann at godzilla dot ics dot uci dot edu  2006-04-30 
19:25 ---
Just a note: fixing the problem in this PR would fix the only remaining failure
for cprop in Briggs' compiler benchmarks.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15911

[Bug tree-optimization/26944] New: -ftree-ch generates worse code

2006-03-30 Thread dann at godzilla dot ics dot uci dot edu

The loop from the code below is compiled to this when using gcc-4.1 -O2:
.L5:
        movl    16(%ebp), %eax
        addl    %ecx, %eax
        addl    $1, %ecx
        movl    %edx, 20(%ebx,%eax,4)
        leal    (%edx,%ecx), %eax
        cmpl    %edi, %eax
        jle     .L5
but the code is much better when using gcc -fno-tree-ch -O2:
.L3:
        addl    $1, %ecx
        movl    %ebx, -4(%edx)
        addl    $4, %edx
        cmpl    %eax, %ecx
        jle     .L3
This is a regression, as gcc-3.4.3 generates code similar to the second
version. 

The code is from Dhrystone as included in Unixbench.

The regression is quite important as embedded processor people still use
Dhrystone for benchmarking compiler/processor speed.

It's strange that tree-ch messes up; the loop is about as simple as loops can
get. 

typedef int One_Fifty;
typedef int Arr_1_Dim [50];
typedef int Arr_2_Dim [50] [50];
extern int Int_Glob;

void Proc_8 (Arr_1_Par_Ref, Arr_2_Par_Ref, Int_1_Par_Val, Int_2_Par_Val)
 Arr_1_Dim Arr_1_Par_Ref;
 Arr_2_Dim Arr_2_Par_Ref;
 int Int_1_Par_Val;
 int Int_2_Par_Val;
{
  register One_Fifty Int_Index;
  register One_Fifty Int_Loc;

  Int_Loc = Int_1_Par_Val + 5;
  Arr_1_Par_Ref [Int_Loc] = Int_2_Par_Val;
  Arr_1_Par_Ref [Int_Loc+1] = Arr_1_Par_Ref [Int_Loc];
  Arr_1_Par_Ref [Int_Loc+30] = Int_Loc;
  for (Int_Index = Int_Loc; Int_Index <= Int_Loc+1; ++Int_Index)
    Arr_2_Par_Ref [Int_Loc] [Int_Index] = Int_Loc;
  Arr_2_Par_Ref [Int_Loc] [Int_Loc-1] += 1;
  Arr_2_Par_Ref [Int_Loc+20] [Int_Loc] = Arr_1_Par_Ref [Int_Loc];
  Int_Glob = 5;
}


Intel's compiler generates even tighter code:

..B1.7: # Preds ..B1.10 ..B1.7
movl  %ebx, (%ecx,%edx,4)   #20.5
addl  $1, %edx  #19.55
cmpl  %eax, %edx#19.3
jle   ..B1.7# Prob 80%  #19.3


-- 
   Summary: -ftree-ch generates worse code
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944

[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-03-30 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #2 from dann at godzilla dot ics dot uci dot edu  2006-03-30 
16:43 ---
(In reply to comment #1)
> Note that this may be also PRE confusing SCEV in presence of loop headers. 

Talking about PRE, here's a maybe interesting observation in the PRE dump:

L7:;
  pretmp.30_53 = Int_Loc.0_4 * 200;
  pretmp.32_23 = (int[50] *) pretmp.30_53;
  pretmp.32_11 = pretmp.32_23 + Arr_2_Par_Ref_30;
  goto bb 4 (L2);

L6:;
  pretmp.27_59 = Int_Loc.0_4 * 200;
  pretmp.28_45 = (int[50] *) pretmp.27_59;
  pretmp.28_49 = Arr_2_Par_Ref_30 + pretmp.28_45;

  # Int_Index_37 = PHI Int_Index_58(7), Int_Loc_3(5);
L0:;
  D.1544_54 = pretmp.27_59;
  D.1545_55 = pretmp.28_45;
  D.1546_56 = pretmp.28_49;
  (*D.1546_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1548_41 >= Int_Index_58) goto L8; else goto L9;

L8:;
  goto bb 3 (L0);

L9:;

  # prephitmp.33_40 = PHI D.1546_56(8), pretmp.32_11(6);
  # prephitmp.33_18 = PHI D.1545_55(8), pretmp.32_23(6);
  # prephitmp.31_25 = PHI D.1544_54(8), pretmp.30_53(6);


Compare pretmp.28_49 with pretmp.32_11: why are the arguments in a different
order? Is there something unstable in the PRE algorithm?

One has to wonder what the effects of tree-ch are on more complex loops. 
It might be interesting to test SPEC with and without tree-ch...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944

[Bug target/26949] New: [4.2 regression] worse code generated for -march=pentium4

2006-03-30 Thread dann at godzilla dot ics dot uci dot edu

Compiling the code in PR26944 with -O2 -march=pentium4 -fno-tree-ch
generates this for the loop:
.L3:
        movl    %esi, -4(%eax)
        addl    $1, %edx
        addl    $4, %eax
        cmpl    -16(%ebp), %edx      <- note an extra memory access here
        jle     .L3

compiling for -march=i686 (or even just adding -fomit-frame-pointer) generates:

.L3:
        addl    $1, %ecx
        movl    %ebx, -4(%edx)
        addl    $4, %edx
        cmpl    %eax, %ecx           <- no memory access here
        jle     .L3

The above problem does not happen with gcc-4.0.3 or 4.1.0


-- 
   Summary: [4.2 regression] worse code generated for -
march=pentium4
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26949

[Bug tree-optimization/26850] New: unused function not eliminated with -fwhole-program --combine

2006-03-24 Thread dann at godzilla dot ics dot uci dot edu

Compile these 2 files with gcc -O2 -fwhole-program --combine a.c b.c 
a.c: 
int main (void) {  return 0;}

b.c:
static int tst1 (int x) {return x;}
static int global_static;
int global;
int tst2 (int x,  int y) {foo (tst1, x, y, global_static, global);}


The generated assembly still contains the tst1 function. tst2, global and
global_static have been eliminated. 

It seems that functions that have their address taken should be reconsidered
for elimination after eliminating the functions (and variables too) that took
their address. 

Note that in the above case compiling the files separately will generate less
code as the whole b.o file will be eliminated by the linker...


-- 
   Summary: unused function not eliminated with -fwhole-program --
combine
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26850

[Bug rtl-optimization/26537] New: Basic block reordering inserts redundant instruction

2006-03-02 Thread dann at godzilla dot ics dot uci dot edu

This code: 
extern char *nl_langinfo (int) __attribute__ ((__nothrow__));

char *
xtermEnvEncoding(void)
{
  static char *result;

  if (result == 0)
result = nl_langinfo(50);
  return result;
}
gets compiled by gcc-4.1.0 -march=i686 -mtune=i686 to:

xtermEnvEncoding:
[snip]
.L6:
        movl    $50, (%esp)
        call    nl_langinfo
        movl    %eax, result.1281
        movl    result.1281, %eax    <- note this
        leave
        ret

Note the redundant mov instruction. 4.0 does not generate that extra
instruction.

The extra instruction seems to be generated by the bbro pass. Here is the RTL
dump for the .44.rnreg pass: nothing unusual

(call_insn:HI 17 16 19 1 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:SI (nl_langinfo) [flags 0x41]
function_decl 0xb7f2f100
 nl_langinfo) [0 S1 A8])
(const_int 4 [0x4]))) 531 {*call_value_0} (nil)
(expr_list:REG_EH_REGION (const_int 0 [0x0])
(nil))
(nil))

(insn:HI 19 17 20 1 (set (mem/f/c/i:SI (symbol_ref:SI (result.1281) [flags
0x2] var_decl
 0xb7ebb108 result) [2 result+0 S4 A32])
(reg:SI 0 ax [orig:58 D.1283 ] [58])) 34 {*movsi_1}
(insn_list:REG_DEP_TRUE 18 (nil
))
(expr_list:REG_DEAD (reg:SI 0 ax [orig:58 D.1283 ] [58])
(nil)))


but the next dump, .45.bbro shows that an extra move instruction has been
inserted. 

(call_insn:HI 17 16 19 2 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:SI (nl_langinfo) [flags 0x41]
function_decl 0xb7f2f100
 nl_langinfo) [0 S1 A8])
(const_int 4 [0x4]))) 531 {*call_value_0} (nil)
(expr_list:REG_EH_REGION (const_int 0 [0x0])
(nil))
(nil))

(insn:HI 19 17 54 2 (set (mem/f/c/i:SI (symbol_ref:SI (result.1281) [flags
0x2] var_decl
 0xb7ebb108 result) [2 result+0 S4 A32])
(reg:SI 0 ax [orig:58 D.1283 ] [58])) 34 {*movsi_1}
(insn_list:REG_DEP_TRUE 18 (nil
))
(expr_list:REG_DEAD (reg:SI 0 ax [orig:58 D.1283 ] [58])
(nil)))

(insn 54 19 55 2 (set (reg/f:SI 0 ax [orig:61 result ] [61])
(mem/f/c/i:SI (symbol_ref:SI (result.1281) [flags 0x2] var_decl
0xb7ebb108 resul
t) [2 result+0 S4 A32])) 34 {*movsi_1} (nil)
(nil))

This problem is one of the causes for PR23153.


-- 
   Summary: Basic block reordering inserts redundant instruction
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26537

[Bug middle-end/23488] [4.1/4.2 Regression] GCSE load PRE does not work with non sets

2006-03-02 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #18 from dann at godzilla dot ics dot uci dot edu  2006-03-03 
02:14 ---
(In reply to comment #17)
> (In reply to comment #5)
> > It's strange that the load(*) does not get optimized, given that it's in the
> > same BB as the store that precedes it... 
> > 
> >         movl    %eax, result.1282   
> > (*)     movl    result.1282, %eax
> 
> This is because the copying of the trace is happening at the very end of the
> optimization phase so it does not optimized at all.

Right, the copying happens in .bbro (as shown in PR26537). 
gcc-4.0 did the same kind of copying in .bbro, but it did not generate the
redundant mov.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23488

[Bug c/26249] New: cc1 --help segfaults

2006-02-12 Thread dann at godzilla dot ics dot uci dot edu

On an up to date Fedora Core 4 system, with the latest update from svn today
cc1 --help segfaults.

The configure command line was:
 ../gcc/configure --enable-languages=c --disable-checking --disable-nls
--enable-gather-detailed-mem-stats --prefix=${HOME}/build/gcc-HEAD

A gdb session: 
Starting program:
/home/dann/build/gcc-HEAD/libexec/gcc/i686-pc-linux-gnu/4.2.0/cc1 --help
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0xa7e000
The following options are language-independent:


Program received signal SIGSEGV, Segmentation fault.
0x082bee9d in print_filtered_help (flag=536870912) at ../../gcc/gcc/opts.c:1335
1335  memset (printed, 0, cl_options_count);
(gdb) bt
#0  0x082bee9d in print_filtered_help (flag=536870912) at
../../gcc/gcc/opts.c:1335
#1  0x082c0147 in decode_options (argc=2, argv=0xbfac53c4) at
../../gcc/gcc/opts.c:1284
#2  0x083175a9 in toplev_main (argc=2, argv=0xbfac53c4) at
../../gcc/gcc/toplev.c:1970
#3  0x0809e2bf in main (argc=134857264, argv=0x219) at ../../gcc/gcc/main.c:35


-- 
   Summary: cc1 --help segfaults
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
  GCC host triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26249

[Bug tree-optimization/26251] New: code size increase with -Os

2006-02-12 Thread dann at godzilla dot ics dot uci dot edu

Compiling the function below with -Os -march=i686 -mtune=pentiumpro
generates bigger code for 4.2 than for 4.0. 
The reason seems to be that 4.2 peels off one loop iteration.

typedef unsigned Tabs [10];

void
TabZonk(Tabs tabs)
{
int i;

    for (i = 0; i < 10; ++i)
        tabs[i] = 0;
}

sdiff gcc-4.0.s gcc-4.2.s

TabZonk:                                TabZonk:
        pushl   %ebp                            pushl   %ebp
        movl    $1, %eax              |         movl    $2, %eax
        movl    %esp, %ebp                      movl    %esp, %ebp
        movl    8(%ebp), %edx                   movl    8(%ebp), %edx
                                      >         movl    $0, (%edx)
                                      >         .p2align 4,,15
.L2:                                    .L2:
        movl    $0, -4(%edx,%eax,4)   |         xorl    %ecx, %ecx
                                      >         movl    %ecx, -4(%edx,%eax,4)
        incl    %eax                            incl    %eax
        cmpl    $11, %eax                       cmpl    $11, %eax
        jne     .L2                             jne     .L2
        popl    %ebp                            popl    %ebp
        ret                                     ret
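
In source terms, the difference between the two columns looks roughly like the
sketch below. It is hand-written C meant to mirror the assembly, not something
the compiler emitted; the function names are made up for illustration.

```c
#include <assert.h>

typedef unsigned Tabs [10];

/* The loop as written: ten iterations, nothing peeled (the 4.0 shape).  */
void tab_zonk_rolled (Tabs tabs)
{
  int i;

  for (i = 0; i < 10; ++i)
    tabs[i] = 0;
}

/* First iteration peeled off in front of the loop (the 4.2 shape): one
   extra store instruction, and the loop itself does not get smaller.  */
void tab_zonk_peeled (Tabs tabs)
{
  int i;

  tabs[0] = 0;
  for (i = 1; i < 10; ++i)
    tabs[i] = 0;
}

/* Usage example: both variants clear the whole array.  */
void check_tab_zonk (void)
{
  Tabs a, b;
  int i;

  for (i = 0; i < 10; ++i)
    a[i] = b[i] = 0xffu;
  tab_zonk_rolled (a);
  tab_zonk_peeled (b);
  for (i = 0; i < 10; ++i)
    assert (a[i] == 0 && b[i] == 0);
}
```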


-- 
   Summary: code size increase with -Os
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
  GCC host triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26251

[Bug tree-optimization/26251] [4.2 Regression] code size increase with -Os

2006-02-12 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #4 from dann at godzilla dot ics dot uci dot edu  2006-02-13 
02:34 ---
Here's another testcase of what seems to be the same problem. 
The 4.2 assembly contains 2 calls to TabSet, 4.0 only has 1.
(Both this and the first example are functions from xterm, in case anybody
wonders.)

typedef unsigned Tabs [10];
void TabSet(Tabs tabs, int col);

void
TabReset(Tabs tabs)
{
  int i;

  for (i = 0; i < 10; ++i)
    tabs[i] = 0;

  for (i = 0; i < ((1 << 5) * 10); i += 8)
    TabSet(tabs, i);
}

void
TabSet(Tabs tabs, int col)
{
  tabs[((col) >> 5)] |= (1 << ((col) & ((1 << 5)-1)));
}


4.2 assembly:

TabReset:
        pushl   %ebp
        movl    $2, %eax
        movl    %esp, %ebp
        pushl   %esi
        movl    8(%ebp), %esi
        pushl   %ebx
        movl    $0, (%esi)
.L4:
        movl    $0, -4(%esi,%eax,4)
        incl    %eax
        cmpl    $11, %eax
        jne     .L4
        pushl   $0
        movl    $8, %ebx
        pushl   %esi
        call    TabSet
        popl    %ecx
        popl    %eax
.L6:
        pushl   %ebx
        addl    $8, %ebx
        pushl   %esi
        call    TabSet
        cmpl    $320, %ebx
        popl    %eax
        popl    %edx
        jne     .L6
        leal    -8(%ebp), %esp
        popl    %ebx
        popl    %esi
        popl    %ebp
        ret


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26251

[Bug target/11877] gcc should use xor trick with -Os

2006-01-05 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #9 from dann at godzilla dot ics dot uci dot edu  2006-01-05 
20:22 ---
(In reply to comment #7)
 *** Bug 23338 has been marked as a duplicate of this bug. ***
 

Bug 23338 contained a patch that might fix this issue. Here it is, so
that it can be evaluated.


*** i386.md 08 Aug 2005 16:38:37 -0700  1.652
--- i386.md 11 Aug 2005 11:27:11 -0700  
***
*** 18874,18881 
[(match_scratch:SI 1 r)
 (set (match_operand:SI 0 memory_operand )
  (const_int 0))]
!   ! optimize_size
! ! TARGET_USE_MOV0
  TARGET_SPLIT_LONG_MOVES
  get_attr_length (insn) = ix86_cost-large_insn
  peep2_regno_dead_p (0, FLAGS_REG)
--- 18874,18880 
[(match_scratch:SI 1 r)
 (set (match_operand:SI 0 memory_operand )
  (const_int 0))]
!   ! TARGET_USE_MOV0
  TARGET_SPLIT_LONG_MOVES
  get_attr_length (insn) = ix86_cost-large_insn
  peep2_regno_dead_p (0, FLAGS_REG)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11877

[Bug rtl-optimization/25489] New: Suboptimal code generated for coparisons on Sparc

2005-12-18 Thread dann at godzilla dot ics dot uci dot edu

This code:

typedef struct {
  int protected_mode;
  int x;
} TScreen;

extern void ClearRight (TScreen *screen, int n);
extern void ClearLeft(TScreen * screen);
extern void ClearLine(TScreen * screen);

void
do_erase_line(TScreen * screen, int param, int mode)
{
    int saved_mode = screen->protected_mode;

    if (saved_mode == 1
        && saved_mode != mode)
        screen->protected_mode = 0;

    switch (param) {
    case -1:                    /* DEFAULT */
    case 0:
        ClearRight(screen, -1);
        break;
    case 1:
        ClearLeft(screen);
        break;
    case 2:
        ClearLine(screen);
        break;
    }
    screen->protected_mode = saved_mode;
}

is compiled to the following (with -O2 -mcpu=ultrasparc, using gcc-4.0.2 and
gcc-4.2):
do_erase_line:
        save    %sp, -112, %sp
        ld      [%i0], %l0
        xor     %l0, 1, %g1     <- from here
        xor     %l0, %i2, %i2
        subcc   %g0, %g1, %g0
        subx    %g0, -1, %g2
        subcc   %g0, %i2, %g0
        addx    %g0, 0, %g1
        andcc   %g2, %g1, %g0   <- to here
        bne,a,pt %icc, .LL2
         st     %g0, [%i0]
.LL2:
        cmp     %i1, 1
        be,pn   %icc, .LL6
         nop
[snip]


The code generated for the if can be better implemented
as (pseudoassembly):
 xor save_mode, 1, tmp1
 xnor save_mode, mode, tmp2
 orcc tmp1, tmp2

I don't know if this is a Sparc specific problem, or a general problem.
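
For reference, the marked sequence (xor, subcc/subx, subcc/addx, andcc)
appears to turn each comparison into a 0/1 value and AND the two values,
instead of branching. The C sketch below only models that reading of the
listing; the function names are made up for illustration. It also shows a
simplification that follows from the source alone: once saved_mode == 1
holds, "saved_mode != mode" is the same test as "mode != 1".

```c
#include <assert.h>
#include <stdint.h>

/* Model of the emitted sequence: both comparisons are materialized
   as 0/1 values and combined with a bitwise AND (no short circuit). */
static int cond_as_emitted(int32_t saved_mode, int32_t mode)
{
    uint32_t t1 = (uint32_t)saved_mode ^ 1u;             /* xor %l0, 1, %g1   */
    uint32_t t2 = (uint32_t)saved_mode ^ (uint32_t)mode; /* xor %l0, %i2, %i2 */
    int is_one  = (t1 == 0);    /* subcc + subx: 1 when t1 is zero    */
    int differs = (t2 != 0);    /* subcc + addx: 1 when t2 is nonzero */
    return is_one & differs;    /* andcc */
}

/* The condition exactly as written in the testcase. */
static int cond_as_written(int32_t saved_mode, int32_t mode)
{
    return saved_mode == 1 && saved_mode != mode;
}

/* Equivalent form: with saved_mode known to be 1, compare mode against 1. */
static int cond_simplified(int32_t saved_mode, int32_t mode)
{
    return saved_mode == 1 && mode != 1;
}

/* Exhaustive check over a small range of inputs. */
static void check_all(void)
{
    for (int32_t s = -4; s <= 4; s++)
        for (int32_t m = -4; m <= 4; m++) {
            assert(cond_as_emitted(s, m) == cond_as_written(s, m));
            assert(cond_simplified(s, m) == cond_as_written(s, m));
        }
}
```

Calling check_all() exercises all three forms on a small grid of values; it
says nothing about which instruction sequence is cheapest on a given Sparc.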


-- 
   Summary: Suboptimal code generated for coparisons on Sparc
   Product: gcc
   Version: 4.0.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: sparc-sun-solaris2.8


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25489

[Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb

2005-12-18 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #6 from dann at godzilla dot ics dot uci dot edu  2005-12-18 
22:57 ---
(In reply to comment #5)
> Simplified testcase seems to work for me on 4.1 branch:
> restore_fpu:
>         movl    4(%esp), %edx
>         movl    boot_cpu_data+12, %eax
>         testl   $16777216, %eax

4.0 still does better: it uses a single testb instruction instead of two
dependent instructions (movl + testl).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24810

[Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb

2005-11-12 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #2 from dann at godzilla dot ics dot uci dot edu  2005-11-13 
02:47 ---
Simplified testcase: 
struct cpuinfo_x86 {
  unsigned char x86;
  unsigned char x86_vendor;
  unsigned char x86_model;
  unsigned char x86_mask;
  char wp_works_ok;
  char hlt_works_ok;
  char hard_math;
  char rfu;
  int cpuid_level;
  unsigned long x86_capability[7];
} __attribute__((__aligned__((1 << (7)))));

struct task_struct;
extern void foo (struct task_struct *tsk);
extern void bar (struct task_struct *tsk);

extern struct cpuinfo_x86 boot_cpu_data;

static inline __attribute__((always_inline)) int
constant_test_bit(int nr, const volatile unsigned long *addr)
{
 return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}

void
restore_fpu(struct task_struct *tsk)
{
  if (constant_test_bit(24, boot_cpu_data.x86_capability))
foo (tsk);
  else
bar (tsk);
}

The generated code for this simplified testcase shows one additional issue:

restore_fpu:
        movl    %eax, %edx
        movl    boot_cpu_data+12, %eax  ; edx could be used here
        testl   $16777216, %eax         ; and here
        je      .L2
        movl    %edx, %eax  ; then all the mov %eax, %edx and mov %edx, %eax
        jmp     foo         ; instructions could be eliminated.
        .p2align 4,,7
.L2:
        movl    %edx, %eax
        jmp     bar


-- 

dann at godzilla dot ics dot uci dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|mov + mov + testl generated |[4.1 Regression] mov + mov +
                   |instead of testb            |testl generated instead of
                   |                            |testb


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24810

[Bug rtl-optimization/24810] New: mov + mov + testl generated instead of testb

2005-11-11 Thread dann at godzilla dot ics dot uci dot edu

Compiling i387.c from the Linux kernel using: 
 -nostdinc -isystem /usr/lib/gcc/i386-redhat-linux/4.0.1/include -D__KERNEL__
-Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing
-fno-common -ffreestanding -O2 -fomit-frame-pointer -g -save-temps -msoft-float
-m32 -fno-builtin-sprintf -fno-builtin-log2 -fno-builtin-puts
-mpreferred-stack-boundary=2 -fno-unit-at-a-time -march=i686 -mtune=pentium4
-mregparm=3 -Iinclude/asm-i386/mach-default -Wdeclaration-after-statement
-Wno-pointer-sign -DKBUILD_BASENAME=i387 -DKBUILD_MODNAME=i387
-c arch/i386/kernel/i387.c
(these are the flags generated by rpmbuild on a Fedora Core 4 system) 

Using 4.0 the restore_fpu function looks like:
restore_fpu:
        testb   $1, boot_cpu_data+15
        je      .L23
[snip]

Using 4.1 it looks like:
restore_fpu:
        movl    %eax, %edx
        movl    boot_cpu_data+12, %eax
        testl   $16777216, %eax
        je      .L24
[snip]

Similar code sequences appear in other functions in the same file: 
get_fpu_mxcsr, get_fpu_swd, get_fpu_cwd, set_fpregs.
The size of these functions increases by 5 bytes (i.e.20%) 

It seems that some of these functions might be on some critical path in the
kernel, so the size increase (and maybe speed penalty) could have an impact.
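
The two sequences test the same bit: 16777216 is 1 << 24, and on a
little-endian target bit 24 of the 32-bit word at boot_cpu_data+12 is bit 0
of the byte at boot_cpu_data+15. The sketch below illustrates that
correspondence; the helper names are invented for illustration, and the
buffer stands in for the structure's raw bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit little-endian word from a byte buffer (x86 memory order). */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* 4.1 style: load the whole word at offset 12, then test bit 24. */
static int test_via_word(const unsigned char *base)
{
    return (load_le32(base + 12) & 16777216u) != 0;  /* testl $16777216 */
}

/* 4.0 style: test bit 0 of the single byte at offset 15. */
static int test_via_byte(const unsigned char *base)
{
    return (base[15] & 1u) != 0;                     /* testb $1, base+15 */
}

/* Byte offset holding bit `nr` of a little-endian word array that
   starts at byte offset `word_offset`. */
static int byte_offset_of_bit(int nr, int word_offset)
{
    assert(nr >= 0);
    return word_offset + nr / 8;
}

/* Mask of bit `nr` within that byte. */
static unsigned byte_mask_of_bit(int nr)
{
    assert(nr >= 0);
    return 1u << (nr % 8);
}
```

For nr = 24 and word_offset = 12 the helpers give offset 15 and mask 1, which
matches the operands of the testb instruction in the 4.0 listing.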

For 4.0 the 00.expand dump looks like:

(insn 9 7 10 1 (set (reg/f:SI 59)
        (const:SI (plus:SI (symbol_ref:SI ("boot_cpu_data") [flags 0x40] <var_decl 0xb7ee2d80 boot_cpu_data>)
                (const_int 12 [0xc])))) -1 (nil)
    (nil))

(insn 10 9 11 1 (set (reg:SI 60)
        (mem/s/j:SI (reg/f:SI 59) [0 boot_cpu_data.x86_capability+0 S4 A32])) -1 (nil)
    (nil))

(insn 11 10 12 1 (parallel [
            (set (reg:SI 61)
                (and:SI (reg:SI 60)
                    (const_int 16777216 [0x1000000])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

(insn 12 11 13 1 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 61)
            (const_int 0 [0x0]))) -1 (nil)
    (nil))


The 4.1 dump is identical except for insn 10, which has mem/s/v/j:SI
instead of mem/s/j:SI.

The combine pass of 4.0 deletes insn 10; that does not happen for 4.1.


For 4.1 the generated code does not change when using -Os or -march=pentium4

This is one of the causes for PR23153


-- 
   Summary: mov + mov + testl generated instead of testb
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24810

[Bug rtl-optimization/24810] mov + mov + testl generated instead of testb

2005-11-11 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #1 from dann at godzilla dot ics dot uci dot edu  2005-11-11 
19:29 ---
Created an attachment (id=10220)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10220&action=view)
Preprocessed code containing the functions that exhibit the problem


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24810

[Bug target/23153] [4.1 Regression] [meta-bug] code size regression from 4.0 on x86

2005-11-02 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #10 from dann at godzilla dot ics dot uci dot edu  2005-11-03 
00:53 ---
(In reply to comment #9)
> What are the flags for the sizes in comment #7 and comment #8? 

-O2 -march=i686


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23153

[Bug target/23153] [4.1 Regression] [meta-bug] code size regression from 4.0 on x86

2005-11-02 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #11 from dann at godzilla dot ics dot uci dot edu  2005-11-03 
00:59 ---
A very useful tool for comparing function sizes between two binaries or object files is:

 ftp://ftp.firstfloor.org/pub/ak/bloat-o-meter

it displays the function names, the size, the size difference as absolute value
and percentage. It would even be nice to have something like this in
gcc/contrib.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23153

[Bug rtl-optimization/23523] peephole2 causes code size on i686

2005-11-02 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #5 from dann at godzilla dot ics dot uci dot edu  2005-11-03 
01:27 ---
(In reply to comment #4)
> This is actually invalid as nothing happens for -Os case so what you are 
> seeing
> is just an artifact.

Sorry but this explanation for marking the PR invalid does not make sense.
The code in question is generated using -O2 not -Os.
IMO the observation in comment #3 is important, and there should be some
explanation for it. 


-- 

dann at godzilla dot ics dot uci dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23523

[Bug rtl-optimization/23523] peephole2 causes code size on i686

2005-11-02 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #8 from dann at godzilla dot ics dot uci dot edu  2005-11-03 
02:12 ---
(In reply to comment #6)
> The use of ax vs cx will not matter in the real world.

This is from a real world program (xterm) and it seems to matter: when using
eax the code is smaller.

Are you sure that the fact that eax is not used does not cover some other
problem? 
Are the free registers picked at random for peepholes? 
It might be the case that 4.0 was just using eax by chance, but that does not
mean the PR should be dismissed as invalid without understanding the underlying
problem.


-- 

dann at godzilla dot ics dot uci dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23523

[Bug rtl-optimization/23523] peephole2 causes code size on i686

2005-11-02 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #10 from dann at godzilla dot ics dot uci dot edu  2005-11-03 
02:34 ---
(In reply to comment #9)
> Have you tested the speed?  As I said I really doubt it makes a real world
> change in terms of speed.  This is different from code size.

I am not sure what kind of answer you expect here. 
Speed and code size are not disjoint. Think about I-cache and I-TLB misses.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23523

[Bug target/23524] [4.1 Regression]bigger version of mov + cmp produced

2005-11-02 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #16 from dann at godzilla dot ics dot uci dot edu  2005-11-03 
06:42 ---
(In reply to comment #15)
> (In reply to comment #11)
> > And FWIW there is also a problem with this insn, the length is wrong:
> > 
> > #(insn 11 46 47 0x2a955cc840 (set (reg:SI 0 eax [orig:61 x ] [61])
> > #(mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])) 44 {*movsi_1} 
> > (nil)
> > #(expr_list:REG_EQUIV (mem/f:SI (symbol_ref:SI ("x")) [5 x+0 S4 A32])
> > #(nil)))
> > A1      movl    x, %eax # 11    *movsi_1/1      [length = 6]
> 
> FYI: This problem is addressed in patch at
> http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00116.html.

Do you know if your patch also fixes this PR?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524

[Bug rtl-optimization/23523] peephole2 causes code size on i686

2005-11-02 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #12 from dann at godzilla dot ics dot uci dot edu  2005-11-03 
07:51 ---
(In reply to comment #11)
> (In reply to comment #10)
> > I am not sure what kind of answer you expect here. 
> > Speed and code size are not disjoint. Think about I-cache and I-TLB misses.
> But again who is using an pentiumpro machine any more.  People who really care

Compiling with -march=pentium3 or -march=pentium-m generates the same code.

If you want to close this bug please address the technical issues about
peepholes in comment #8.


-- 

dann at godzilla dot ics dot uci dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23523

[Bug target/23303] [4.1 Regression] 4.1 generates sall + addl instead of leal

2005-11-01 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #7 from dann at godzilla dot ics dot uci dot edu  2005-11-01 
15:15 ---
(In reply to comment #5)
> Hmm,
> I am still not sure if it matters too much, but since there are actually
> dupes of this problem, I think we can simply add peep2 fixing this
> particular common case.
> 
> I am testing attached patch.

Could you please try to measure the code size impact of this patch?
(like the examples in PR23153: xterm, PR8361 or kernel)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23303

[Bug target/23153] [4.1 Regression] [meta-bug] code size regression from 4.0 on x86

2005-10-30 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #8 from dann at godzilla dot ics dot uci dot edu  2005-10-31 
04:15 ---
More data, the Linux kernel compiled for i686: 
size -f *
   text    data     bss     dec     hex filename
2625471  534012  611768 3771251  398b73 vmlinux.4.0
3023306  429364  347384 3800054  39fbf6 vmlinux.4.1

It would be good if someone else can try to reproduce this.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23153

[Bug target/23524] [4.1 Regression]bigger version of mov + cmp produced

2005-10-30 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #13 from dann at godzilla dot ics dot uci dot edu  2005-10-31 
04:50 ---
(In reply to comment #12)
> A more interesting test would be to see the Linux kernel size difference,

There's such a comparison now in comment #8 in PR23153. It confirms the size
increase.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524

[Bug target/23524] [4.1 Regression]bigger version of mov + cmp produced

2005-10-27 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #8 from dann at godzilla dot ics dot uci dot edu  2005-10-27 
16:43 ---
(In reply to comment #7)
> Could the dear reported at least try to provide a small test case?

The testcase in the attachment contains only one 4-line function,
HandleDeIconify; the rest is just fluff to allow it to compile. Granted, a lot
of it could be pruned, but I don't think that prevents anyone from debugging
the problem.


> I think this should not be marked as a regression.  

Why not? It is a regression. 

> It's just sad that this
> kind of non-bug keeps the regression count high, when in reality GCC 4.1
> produces smaller code overall.

PR23153 tells a completely different story about codesize (at least for i686).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524

[Bug target/23524] [4.1 Regression]bigger version of mov + cmp produced

2005-10-27 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #12 from dann at godzilla dot ics dot uci dot edu  2005-10-27 
18:08 ---
(In reply to comment #9)
> And CSiBE tells you the story that GCC 4.1 produces smaller code overall.
> http://www.inf.u-szeged.hu/csibe/draw-diag.php?draw=sum-os&basephp=s-i686-linux

Well, it obviously depends on applications.
The point of PR23153 is to show that there is a code size regression, and all
the PRs that depend on it are showing very specific issues that cause a part of
the regression.

A more interesting test would be to see the Linux kernel size difference,
because if there's any difference there would be some people screaming
(unfortunately I won't be able to do that comparison anytime soon, hope someone
else will).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524

[Bug rtl-optimization/24209] New: strange instruction selected for an annuled slot on sparc

2005-10-04 Thread dann at godzilla dot ics dot uci dot edu

4.1 selects a strange instruction to put in the delay slot of a bl,a
instruction because in the non-taken case the same instruction will be executed
anyway...


-O2 code for 4.1

PointToRowCol:
        save    %sp, -112, %sp
        sethi   %hi(term), %g1
        ld      [%g1+%lo(term)], %l2
        add     %l2, 136, %l1
        ld      [%l1+572], %l3
        ld      [%l1+772], %l0
        sub     %i0, %l3, %o0
        call    .div, 0
         ld     [%l0+20], %o1
        sethi   %hi(firstValidRow), %g1
        ld      [%g1+%lo(firstValidRow)], %i0
        cmp     %o0, %i0
        bl,a    .LL118
         ldsb   [%l2+1823], %g1  ;; this instruction
        sethi   %hi(lastValidRow), %g1
        ld      [%g1+%lo(lastValidRow)], %g1
        cmp     %o0, %g1
        bg      .LL116
         mov    %o0, %i0
.LL105:
        ldsb    [%l2+1823], %g1  ;; this will be executed on the
                                 ;; non-taken path
.LL118:
        cmp     %g1, 0
        bne     .LL110
         mov    0, %o0
        ld      [%l0+32], %o0
.LL110:
        add     %o0, %l3, %o0
        ld      [%l0+16], %o1
        call    .div, 0
         sub    %i1, %o0, %o0
        cmp     %o0, 0
        bl      .LL113
         mov    0, %g2
        ld      [%l1+888], %g1
        add     %g1, 1, %g1
        cmp     %o0, %g1
        bg      .LL117
         mov    %o0, %g2
.LL113:
        st      %i0, [%i2]
        st      %g2, [%i3]
        jmp     %i7+8
         restore
.LL117:
        st      %i0, [%i2]
        mov     %g1, %g2
        st      %g2, [%i3]
        jmp     %i7+8
         restore
.LL116:
        b       .LL105
         mov    %g1, %i0


The 4.0 code is:
PointToRowCol:
        save    %sp, -112, %sp
        sethi   %hi(term), %g1
        ld      [%g1+%lo(term)], %l2
        add     %l2, 136, %l1
        ld      [%l1+572], %l3
        sub     %i0, %l3, %o0
        ld      [%l1+772], %i0
        call    .div, 0
         ld     [%i0+20], %o1
        sethi   %hi(firstValidRow), %g1
        ld      [%g1+%lo(firstValidRow)], %g1
        cmp     %o0, %g1
        bl      .LL42
         mov    %o0, %l0
        sethi   %hi(lastValidRow), %g1
        ld      [%g1+%lo(lastValidRow)], %g1
        cmp     %o0, %g1
        bg,a    .LL32
         mov    %g1, %l0
.LL32:
        ldsb    [%l2+1823], %g1
        cmp     %g1, 0
        bne     .LL36
         mov    0, %o0
        ld      [%i0+32], %o0
.LL36:
        add     %o0, %l3, %o0
        ld      [%i0+16], %o1
        call    .div, 0
         sub    %i1, %o0, %o0
        cmp     %o0, 0
        bl,a    .LL43
         st     %l0, [%i2]
        ld      [%l1+888], %g1
        add     %g1, 1, %g1
        cmp     %o0, %g1
        bg,a    .LL39
         mov    %g1, %o0
.LL39:
        st      %l0, [%i2]
        st      %o0, [%i3]
        jmp     %i7+8
         restore
.LL42:
        b       .LL32
         mov    %g1, %l0
.LL43:
        mov     0, %o0
        st      %o0, [%i3]
        jmp     %i7+8

(the 4.0 code is a few bytes smaller)

I'll attach the preprocessed code.


-- 
   Summary: strange instruction selected for an annuled slot on
sparc
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: sparc-sun-solaris2.8


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24209

[Bug rtl-optimization/24209] strange instruction selected for an annuled slot on sparc

2005-10-04 Thread dann at godzilla dot ics dot uci dot edu



--- Comment #1 from dann at godzilla dot ics dot uci dot edu  2005-10-05 
05:13 ---
Created an attachment (id=9889)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9889&action=view)
preprocessed code for this bug


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24209

[Bug c/24068] Unconditional warning when using -combine

2005-09-29 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-09-29 20:10 ---
(In reply to comment #9)
> Subject: Re:  Unconditional warning when using -combine
> 
> On Mon, Sep 26, 2005 at 08:46:20PM -, dann at godzilla dot ics dot uci dot
> edu wrote:
> > > So this about the following:
> > > int f(a)
> > > int a;
> > > {
> > >   return a;
> > > }
> > > int f(int);
> > > 
> > > Which is questionable.
> > > 
> > > So I don't think this is not an inappropriate warning.
> > 
> > It seems that the warning was designed for code like your example above. 
> > But if you have 1 K&R file and one C90 file, then there should be no 
> > warning... 
> > Another bad thing is that if you swap the files on the command line then 
> > you get
> > no warning.
> 
> There certainly should be a warning.  It's not obvious on most targets
> with int, but what you're doing here won't work with float arguments;
> if the prototype includes an argument list, the definition should also.
> 

Sorry, I am not sure I understand what you are referring to... In both the
original bug report and Andrew's example above, the definition and the
prototype each include an argument list with types for all the declarations.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24068

[Bug target/23302] [4.1 Regression] extra move generated on x86

2005-09-28 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-09-28 17:29 ---
(In reply to comment #2)
> While it might be probably possible to design peephole or combiner insn patter
> I am tempted to close this and PR 23303 as WONTFIX as it seems to me we was
> optimizing this by pure luck and the patch seems to have overall positive 
> effect
> on code size...

IMHO closing these bugs as WONTFIX is not the right thing to do. This is clearly
a missed optimization opportunity. The fact that it worked by chance before your
(overall good) patch does not make fixing this issue less desirable.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23302

[Bug c/24068] New: Unconditional warning when using -fwhole-program

2005-09-26 Thread dann at godzilla dot ics dot uci dot edu

When trying to compile the attached preprocessed files using
gcc -c -fwhole-program --combine xterm.i xlwmenu.i

These warnings are produced unconditionally:

/home/dann/build/Emacs-CVS/emacs/lwlib/xlwmenu.c:57: warning: prototype for
'x_alloc_nearest_color_for_widget' follows non-prototype definition
/home/dann/build/Emacs-CVS/emacs/lwlib/xlwmenu.c:58: warning: prototype for
'x_alloc_lighter_color_for_widget' follows non-prototype definition
/home/dann/build/Emacs-CVS/emacs/lwlib/xlwmenu.c:64: warning: prototype for
'x_clear_errors' follows non-prototype definition
/home/dann/build/Emacs-CVS/emacs/lwlib/xlwmenu.c:65: warning: prototype for
'x_copy_dpy_color' follows non-prototype definition

AFAICT the warnings don't make much sense. The code is correct. The functions in
question are defined in one file and then prototyped and used in the other
file. This kind of stuff appears in countless C programs. 
Can this warning be turned off by default?

-- 
   Summary: Unconditional warning when using -fwhole-program
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24068

[Bug c/24068] Unconditional warning when using -fwhole-program

2005-09-26 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-09-26 19:25 ---
Created an attachment (id=9807)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9807&action=view)
xterm.i


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24068

[Bug c/24068] Unconditional warning when using -fwhole-program

2005-09-26 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-09-26 19:25 ---
Created an attachment (id=9808)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9808&action=view)
xlwmenu.i


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24068

[Bug c/24068] Unconditional warning when using -combine

2005-09-26 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-09-26 19:54 ---
(In reply to comment #4)
> Because one file uses K&R style function definitions and the other uses a
> prototype which is ANSI/ISO style.
> Simple example:
[snip]
> So I don't think this is not an inappropriate warning.

The question is: can this EVER result in incorrect behavior? 
Is it incorrect from the standard point of view? 

If the answer to the above is no, then there is no reason to warn.

> 
> As an aside, I wish people would stop using K&R style C already.

Agreed.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24068

[Bug c/24068] Unconditional warning when using -combine

2005-09-26 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-09-26 20:46 ---
(In reply to comment #4)
> So this about the following:
> int f(a)
> int a;
> {
>   return a;
> }
> int f(int);
> 
> Which is questionable.
> 
> So I don't think this is not an inappropriate warning.

It seems that the warning was designed for code like your example above. 
But if you have one K&R file and one C90 file, then there should be no warning... 
Another bad thing is that if you swap the files on the command line then you get
no warning.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24068

[Bug target/23828] local calling convention not used when using --combine

2005-09-21 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-09-21 17:43 ---
(In reply to comment #8)
> (In reply to comment #4)
> 
> Instead of the above check, change it to:
> if (local_regparm == 3 && DECL_STRUCT_FUNCTION (fn)->static_chain_decl)
>   local_regparm = 2;

DECL_STRUCT_FUNCTION does not work, it ICEs when running the testsuite... 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23828

[Bug middle-end/23872] New: .t02.original dump weirdness

2005-09-13 Thread dann at godzilla dot ics dot uci dot edu

Using gcc -O2 -fdump-tree-all -S 
to compile: 

int bar (void) {   return 0;}

int foo (int reject) {   int result = 0;   return result;}

the .t02.original dump looks like:
;; Function bar (bar)
;; enabled by -tree-original
{
  return 0;
}
;; Function foo (foo)
;; enabled by -tree-original
{
  int result = 0;

  int result = 0; <--- this line appears twice...
  return result;
}

If the order of the 2 functions is reversed in the file then the dump looks 
like:

;; Function foo (foo)
;; enabled by -tree-original
{
  int result = 0;
  STATEMENT_LIST  <--- the return does not appear...
}
;; Function bar (bar)
;; enabled by -tree-original
{
  return 0;
}

When using just -fdump-tree-original, the dump for foo always looks like the
second version.

-- 
   Summary: .t02.original dump weirdness
   Product: gcc
   Version: 4.0.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23872

[Bug c/23872] .t02.original dump weirdness

2005-09-13 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-09-13 20:45 ---
The fact that the dump is different depending on function order or compilation
flags seems to point to either an uninitialized variable or some memory 
corruption.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23872

[Bug middle-end/23828] local calling convention not used when using -fwhole-program --combine

2005-09-13 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-09-13 22:36 ---
It looks like the -fwhole-program version of ClearLeft only passes the
first 2 arguments to the ClearInLine call in register, the 3rd one is
passed on the stack. 
The reason for that is this code in i386.c:ix86_function_regparm:

      /* We can't use regparm(3) for nested functions as these use
         static chain pointer in third argument.  */
      if (local_regparm == 3 && DECL_CONTEXT (decl)
          && !DECL_NO_STATIC_CHAIN (decl))
        local_regparm = 2;

The test for nested functions is incorrect, in the -fwhole-program
case DECL_CONTEXT (DECL_for_ClearLeft) is a TRANSLATION_UNIT_DECL so
the test is true even though it should not be.

Changing the code to:

  if (local_regparm == 3
      && DECL_CONTEXT (decl)
      && (TREE_CODE (DECL_CONTEXT (decl)) != TRANSLATION_UNIT_DECL)
      && !DECL_NO_STATIC_CHAIN (decl))
    local_regparm = 2;

fixes the testcase. 

But the above just fixes the symptoms, it's probably not the correct
way to test for a nested function.
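
To see why the extra TRANSLATION_UNIT_DECL check changes the outcome, here is
a toy model. It is not GCC code: toy_decl, toy_code, the sample objects, and
the helper names are all invented for illustration, and they only mimic the
shape of DECL_CONTEXT, TREE_CODE, and DECL_NO_STATIC_CHAIN as described in
this comment.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for GCC tree codes; not the real definitions. */
enum toy_code { TOY_TRANSLATION_UNIT_DECL, TOY_FUNCTION_DECL };

struct toy_decl {
    enum toy_code code;
    const struct toy_decl *context;  /* models DECL_CONTEXT */
    int no_static_chain;             /* models DECL_NO_STATIC_CHAIN */
};

/* Original test: any non-null context is treated as "nested". */
static int looks_nested_original(const struct toy_decl *d)
{
    assert(d != NULL);
    return d->context != NULL && !d->no_static_chain;
}

/* Patched test: a translation-unit context does not count as nesting. */
static int looks_nested_patched(const struct toy_decl *d)
{
    assert(d != NULL);
    return d->context != NULL
           && d->context->code != TOY_TRANSLATION_UNIT_DECL
           && !d->no_static_chain;
}

/* Mirrors the "if (...) local_regparm = 2;" step. */
static int regparm_for(int local_regparm, int nested)
{
    return (local_regparm == 3 && nested) ? 2 : local_regparm;
}

/* Sample declarations: a file-scope function whose context is the
   translation unit (the -fwhole-program case), and a truly nested one. */
static const struct toy_decl toy_unit  = { TOY_TRANSLATION_UNIT_DECL, NULL, 0 };
static const struct toy_decl toy_clear_left = { TOY_FUNCTION_DECL, &toy_unit, 0 };
static const struct toy_decl toy_outer = { TOY_FUNCTION_DECL, NULL, 0 };
static const struct toy_decl toy_inner = { TOY_FUNCTION_DECL, &toy_outer, 0 };
```

In this model the original test drops toy_clear_left to regparm 2, while the
patched test keeps it at 3 and still lowers toy_inner. Whether the real fix
should look at the context's tree code at all is exactly the open question
raised above.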


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23828

[Bug middle-end/23828] local calling convention not used when using -fwhole-program --combine

2005-09-13 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-09-13 22:57 ---
(In reply to comment #6)
> Maybe a better check would be check in the decl's function struct's 
> field
> static_chain_decl is set.

I am not sure I understand what you mean here... 

Maybe adding a test like this
(TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL)
should work.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23828

[Bug target/23153] [4.1 Regression] [meta-bug] code size regression from 4.0 on x86

2005-09-13 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-09-13 23:09 ---
Additional data: 
For the testcase in PR8361:

 size -f generate-3.4*.o
   text    data     bss     dec     hex filename
 297025       4     181  297210   488fa generate-3.4-4.0.o
 318366       8     181  318555   4dc5b generate-3.4-4.1.o

so about a 7% increase for 4.1
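
The percentage can be reproduced from the text column of the size output.
This is a small sketch; percent_increase is a made-up helper name, not part
of any tool mentioned here.

```c
#include <assert.h>

/* Relative growth from `before` to `after`, in percent. */
static double percent_increase(long before, long after)
{
    assert(before > 0);
    return 100.0 * (double)(after - before) / (double)before;
}
```

With the text sizes above, percent_increase(297025, 318366) comes out a
little under 7.2.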


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23153

[Bug middle-end/23828] local calling convention not used when using -fwhole-program --combine

2005-09-12 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-09-12 23:30 ---
(In reply to comment #1)
> If it changes calling-conventions
> in single-file compile mode the function must be declared static, so it
> definitely may be changed in whole-program mode, too.

Yep, both ClearLeft and ClearInLine are declared static.
It's interesting that both ClearLeft and ClearInLine appear on the
"Marking local functions:" line in the i00.cgraph dump.

Can you confirm this bug?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23828

[Bug rtl-optimization/23828] New: local calling convention not used when using -fwhole-program --combine

2005-09-11 Thread dann at godzilla dot ics dot uci dot edu

When compiling the files in the attachment for PR22574 using the command line:

gcc -fwhole-program --combine -march=i686 -O2 button.i charproc.i charsets.i
cursor.i data.i doublechr.i fontutils.i input.i main.i menu.i misc.i print.i
ptydata.i screen.i scrollbar.i tabs.i util.i xstrings.i VTPrsTbl.i -S

the function ClearLeft looks like:

ClearLeft.221553:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    3748(%eax), %ecx
        movl    3752(%eax), %edx
        movl    $0, (%esp)      ; <- 0 is passed on the stack
        incl    %ecx
        movl    %ecx, 4(%esp)
        call    ClearInLine.221545
        leave
        ret

When compiling just the file util.i that contains ClearLeft using 
-march=i686 -O2 the assembly is:

ClearLeft:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    3748(%eax), %ecx
        movl    3752(%eax), %edx
        incl    %ecx
        movl    %ecx, (%esp)
        xorl    %ecx, %ecx      ; 0 is passed in a register
        call    ClearInLine
        leave
        ret

When using -fwhole-program --combine the parameter 0 to the ClearInLine
function is passed on the stack instead of being passed in a register. 

Is there a reason for that? Wouldn't it be better to pass it in a
register?

-- 
   Summary: local calling convention not used when using -fwhole-program
--combine
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23828

[Bug rtl-optimization/23524] bigger version of mov + cmp produced

2005-09-07 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-09-07 22:05 ---
It seems that expand generates different insns in 4.0 and 4.1 for the comparison
in question:

4.0 generates: (from .00.expand)

(insn 15 13 16 1 (set (reg/f:SI 62)
        (mem/s/f:SI (plus:SI (reg/v/f:SI 58 [ gw ])
                (const_int 4 [0x4])) [5 <variable>.core.widget_class+0 S4 A32])) -1 (nil)
    (nil))

(insn 16 15 17 1 (set (reg/f:SI 63)
        (mem/f/i:SI (symbol_ref:SI ("xtermWidgetClass") [flags 0x40] <var_decl 0xb7a8b0d8 xtermWidgetClass>) [5 xtermWidgetClass+0 S4 A32])) -1 (nil)
    (nil))

(insn 17 16 18 1 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/f:SI 62)
            (reg/f:SI 63))) -1 (nil)
    (nil))

4.1 generates: 

(insn 15 13 16 1 (set (reg:SI 62)
        (mem/s/f:SI (plus:SI (reg/v/f:SI 58 [ gw ])
                (const_int 4 [0x4])) [5 <variable>.core.widget_class+0 S4 A32])) -1 (nil)
    (nil))

(insn 16 15 17 1 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 62)
            (mem/f/i:SI (symbol_ref:SI ("xtermWidgetClass") [flags 0x40] <var_decl 0xb7b510b0 xtermWidgetClass>) [5 xtermWidgetClass+0 S4 A32]))) -1 (nil)
    (nil))



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524

[Bug rtl-optimization/22563] [3.4/4.0/4.1 Regression] performance regression for gcc newer than 2.95

2005-08-24 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-08-25 02:49 ---
This message:
http://gcc.gnu.org/ml/gcc/2005-08/msg00208.html

was asking for the reason for the slowdown for S05e

AFAICT the inner loop for the benchmark (in s05e_test) gets compiled to: 

.L153:
        fstl    (%edx)
        leal    8(%edx), %eax
        fstl    (%eax)
        fstl    8(%eax)
        fstl    16(%eax)
        fstl    24(%eax)
        fstl    32(%eax)
        fstl    40(%eax)
        fstl    48(%eax)
        leal    56(%eax), %edx
        cmpl    %edx, %ecx
        jne     .L153

and to:

.L9:
        movl    $0, (%edx)
        movl    $1074266112, 4(%edx)
        movl    $0, 8(%edx)
        movl    $1074266112, 12(%edx)
        movl    $0, 16(%edx)
        movl    $1074266112, 20(%edx)
        movl    $0, 24(%edx)
        movl    $1074266112, 28(%edx)
        movl    $0, 32(%edx)
        movl    $1074266112, 36(%edx)
        movl    $0, 40(%edx)
        movl    $1074266112, 44(%edx)
        movl    $0, 48(%edx)
        movl    $1074266112, 52(%edx)
        movl    $0, 56(%edx)
        movl    $1074266112, 60(%edx)
        addl    $64, %edx
        cmpl    %edx, %ebx
        jne     .L9

by 4.1

The 4.1 code looks much worse...
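
For reference, the two immediates in the 4.1 loop are the low and high 32-bit words of the IEEE 754 double 3.0 (bit pattern 0x4008000000000000). Each movl pair therefore writes one 8-byte double, and the 16 movl instructions cover the same 64 bytes per iteration as the 8 fstl stores in the first loop. The sketch below checks that decomposition. The helper name double_words is made up for illustration, and the code assumes a 64-bit IEEE 754 double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split a double into its low and high 32-bit words (hypothetical helper). */
void double_words(double d, uint32_t *lo, uint32_t *hi)
{
    uint64_t bits;

    /* Assumption: double is 64 bits wide, as on x86. */
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);

    *lo = (uint32_t)(bits & 0xffffffffu);
    *hi = (uint32_t)(bits >> 32);
}
```

Calling double_words(3.0, &lo, &hi) gives lo == 0 and hi == 1074266112, matching the constants stored at offsets 0 and 4 in the loop.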

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22563

[Bug rtl-optimization/23523] code size regression on x86

2005-08-24 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-08-25 05:43 ---
The issue is the peephole2 pass in 4.1. Before that pass, the insn looks like:

(insn:HI 36 34 37 0 (set (mem/i:SI (symbol_ref:SI (waiting_for_initial_map)
[flags 0x40] var_decl 0xb7b71898 waiting_for_initial_map) [7
waiting_for_initial_map+0 S4 A32])
(const_int 0 [0x0])) 34 {*movsi_1} (nil)
(nil))

and after: 

(insn 58 34 59 0 (parallel [
(set (reg:SI 2 cx)
(const_int 0 [0x0]))
(clobber (reg:CC 17 flags))
]) -1 (nil)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))

(insn 59 58 37 0 (set (mem/i:SI (symbol_ref:SI (waiting_for_initial_map)
[flags 0x40] var_decl 0xb7b71898 waiting_for_initial_map) [7
waiting_for_initial_map+0 S4 A32])
(reg:SI 2 cx)) -1 (nil)
(expr_list:REG_DEAD (reg:SI 2 cx)
(nil)))

4.0 uses ax instead of cx. %eax is free at that point, so it is strange that
it's not used in 4.1.


-- 
What                     | Removed | Added
-------------------------+---------+------
BugsThisDependsOn        | 18427   |
OtherBugsDependingOnThis |         | 23153
Keywords                 | ra      |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23523

[Bug target/23525] New: inefficient parameter passing on x86

2005-08-23 Thread dann at godzilla dot ics dot uci dot edu

Compiling this code: 

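extern int waiting_for_initial_map;
extern int close (int __fd);
/* Assumed declarations: the report omits them, but the code uses both. */
extern int cp_pipe[2];
extern int pc_pipe[2];

void
first_map_occurred(void)
{
  close(cp_pipe[0]);
  close(pc_pipe[1]);
  waiting_for_initial_map = 0;
}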

using -O2 -march=i686, both 4.0 and 4.1 generate sequences like:

        movl    cp_pipe, %eax
        movl    %eax, (%esp)

to pass the argument to close, whereas the Intel compiler generates:

        pushl   cp_pipe

-- 
   Summary: inefficient parameter passing on x86
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P2
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23525

[Bug rtl-optimization/23524] bigger version of mov + cmp produced

2005-08-23 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-08-23 18:05 ---
(In reply to comment #2)
> You really should know that we only care about code size at -Os.  We care
> about performance regressions though at -O2.

Code size matters for performance on modern processors: for otherwise
equivalent code, a smaller I-cache (and I-TLB) footprint results in better
performance.

BTW, this is a 4.1 regression. 


-- 
What                     | Removed | Added
-------------------------+---------+------
OtherBugsDependingOnThis |         | 23153


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524

[Bug rtl-optimization/23524] bigger version of mov + cmp produced

2005-08-23 Thread dann at godzilla dot ics dot uci dot edu


--- Additional Comments From dann at godzilla dot ics dot uci dot edu  
2005-08-23 18:15 ---
(In reply to comment #4)
> Then use -Os everywhere instead.  You will see that the overall code size
> for 4.1 has reduced from 4.0.

That might be true, but -Os is not always an option. If there's a good reason
for -O2 to generate bigger code, then so be it, but that does not seem to be the
case for the code in this PR (at least AFAICT).

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23524
