[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-18 Thread sezeroz at gmail dot com


--- Comment #27 from sezeroz at gmail dot com  2010-09-18 20:51 ---
Are 4.4 and 4.5 going to be fixed?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-17 Thread rguenth at gcc dot gnu dot org


--- Comment #22 from rguenth at gcc dot gnu dot org  2010-09-17 09:00 ---
Subject: Bug 45678

Author: rguenth
Date: Fri Sep 17 09:00:23 2010
New Revision: 164356

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164356
Log:
2010-09-17  Richard Guenther  rguent...@suse.de

PR middle-end/45678
* builtins.c (fold_builtin_memory_op): Always properly adjust
alignment of memory accesses.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/builtins.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-17 Thread rguenth at gcc dot gnu dot org


--- Comment #23 from rguenth at gcc dot gnu dot org  2010-09-17 13:57 ---
Subject: Bug 45678

Author: rguenth
Date: Fri Sep 17 13:57:04 2010
New Revision: 164369

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164369
Log:
2010-09-17  Richard Guenther  rguent...@suse.de

PR middle-end/45678
* gcc.dg/torture/pr45678-1.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/torture/pr45678-1.c
Modified:
trunk/gcc/testsuite/ChangeLog


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-17 Thread hjl dot tools at gmail dot com


--- Comment #24 from hjl dot tools at gmail dot com  2010-09-17 16:35 ---
Created an attachment (id=21821)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21821&action=view)
A patch

The problem is that we fail to update the stack alignment when
we increase the alignment of a local variable.  This patch works
for me.
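
The bookkeeping described above can be sketched as a small standalone
model.  This is only an illustration, not the attached patch: the struct,
the toy_* function names and the supports_realign flag are hypothetical,
and the field names merely echo the ones discussed in this PR.

```c
#include <stdbool.h>

/* Toy stand-in for the per-function alignment state.  The field names
   echo the ones discussed in this PR, but this is NOT GCC's struct.
   All values are in bits.  */
struct align_state
{
  unsigned int stack_alignment_needed;
  unsigned int stack_alignment_estimated;
  unsigned int max_used_stack_slot_alignment;
};

/* Raise every recorded alignment to at least ALIGN bits.
   SUPPORTS_REALIGN models a target that can realign its stack.  */
static void
toy_update_stack_alignment (struct align_state *s, unsigned int align,
                            bool supports_realign)
{
  if (supports_realign && s->stack_alignment_estimated < align)
    s->stack_alignment_estimated = align;
  if (s->stack_alignment_needed < align)
    s->stack_alignment_needed = align;
  if (s->max_used_stack_slot_alignment < align)
    s->max_used_stack_slot_alignment = align;
}

/* Bump a variable's alignment and keep the bookkeeping in sync, so
   later passes are never told about an alignment the frame lacks.
   Returns the variable's resulting alignment.  */
static unsigned int
toy_raise_decl_align (struct align_state *s, unsigned int decl_align,
                      unsigned int new_align, bool supports_realign)
{
  if (new_align <= decl_align)
    return decl_align;
  toy_update_stack_alignment (s, new_align, supports_realign);
  return new_align;
}
```

The point of the model is only that the two updates travel together:
raising a variable from 32 to 128 bits also raises the recorded frame
requirements to 128 bits.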


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-17 Thread hjl dot tools at gmail dot com


--- Comment #25 from hjl dot tools at gmail dot com  2010-09-17 17:26 ---
A patch is posted at

http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01425.html


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01425.html


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-17 Thread hjl at gcc dot gnu dot org


--- Comment #26 from hjl at gcc dot gnu dot org  2010-09-17 17:49 ---
Subject: Bug 45678

Author: hjl
Date: Fri Sep 17 17:49:30 2010
New Revision: 164375

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164375
Log:
Update stack alignment when increasing local variable alignment.

gcc/

2010-09-17  H.J. Lu  hongjiu...@intel.com

PR middle-end/45678
* cfgexpand.c (update_stack_alignment): New.
(get_decl_align_unit): Use it.
(expand_one_stack_var_at): Call update_stack_alignment.

gcc/testsuite/

2010-09-17  H.J. Lu  hongjiu...@intel.com

PR middle-end/45678
* gcc.dg/torture/pr45678-2.c: New.

Added:
trunk/gcc/testsuite/gcc.dg/torture/pr45678-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cfgexpand.c
trunk/gcc/testsuite/ChangeLog


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread rguenth at gcc dot gnu dot org


--- Comment #3 from rguenth at gcc dot gnu dot org  2010-09-16 10:17 ---
DECL_ALIGN of d is set to 128 (but apparently nothing ensures it'll end up
that way).  DECL_ALIGN is adjusted here:

Old value = 32
New value = 128
expand_one_stack_var_at (decl=0x75ae90a0, offset=-16)
at /space/rguenther/src/svn/trunk/gcc/cfgexpand.c:739
739   DECL_USER_ALIGN (decl) = 0;

so on trunk get_object_alignment of the MEM_REF will return 128 and thus
we do not run into unaligned move expansion here:

if (mode != BLKmode
    && (unsigned) align < GET_MODE_ALIGNMENT (mode)
    /* If the target does not have special handling for unaligned
       loads of mode then it can use regular moves for them.  */
    && ((icode = optab_handler (movmisalign_optab, mode))
        != CODE_FOR_nothing))

manually setting alignment back to 32 in gdb results in ok asm.

movlps  (%esp), %xmm0
movhps  8(%esp), %xmm0
mulps   .LC4, %xmm0

instead of

mulps   (%esp), %xmm0

Apparently the stack alignment code doesn't work.
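
The decision made by the condition quoted above can be modelled in
isolation.  The following is a hedged sketch, not GCC code: the enum, the
function name and the boolean parameters are hypothetical stand-ins for
mode != BLKmode, the known alignment, GET_MODE_ALIGNMENT (mode) and the
presence of a movmisalign pattern.

```c
#include <stdbool.h>

/* Which kind of move the toy expander picks.  */
enum toy_move { TOY_MOVE_REGULAR, TOY_MOVE_MISALIGNED };

/* Mirror the shape of the quoted condition: take the misaligned path
   only when the access is not a block move, the known alignment (bits)
   is below what the mode wants, and the target has a dedicated
   misaligned-move pattern.  Otherwise a regular move is used, which
   for a 16-byte SSE mode may mean an aligned memory operand.  */
static enum toy_move
toy_choose_move (bool is_blkmode, unsigned int known_align,
                 unsigned int mode_align, bool has_movmisalign)
{
  if (!is_blkmode && known_align < mode_align && has_movmisalign)
    return TOY_MOVE_MISALIGNED;
  return TOY_MOVE_REGULAR;
}
```

With a known alignment of 32 bits the model takes the misaligned path;
once the recorded alignment is 128 bits it falls through to the regular
move, which is only safe if the object really is 128-bit aligned.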


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread rguenth at gcc dot gnu dot org


--- Comment #4 from rguenth at gcc dot gnu dot org  2010-09-16 10:18 ---
Created an attachment (id=21809)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21809&action=view)
patch to fix half STRICT_ALIGNMENT targets memcpy folding

Might need this patch to fix as well.  i?86 / x86_64 isn't really
!STRICT_ALIGNMENT.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread jakub at gcc dot gnu dot org


--- Comment #5 from jakub at gcc dot gnu dot org  2010-09-16 10:40 ---
Re: #c4, shouldn't there be srcvar = NULL_TREE; somewhere for the
STRICT_ALIGNMENT non-aligned case?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread rguenth at gcc dot gnu dot org


--- Comment #6 from rguenth at gcc dot gnu dot org  2010-09-16 10:50 ---
Missing some else indeed.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread jakub at gcc dot gnu dot org


--- Comment #7 from jakub at gcc dot gnu dot org  2010-09-16 11:57 ---
For the ix86/x86_64 alignment issue, I believe the problem here is that
  max_align = MAX (crtl->max_used_stack_slot_alignment,
                   PREFERRED_STACK_BOUNDARY);
is fine for !SUPPORTS_STACK_ALIGNMENT targets, but for ix86/x86_64 if
max_used_stack_slot_alignment is really small, we might end up deciding to
use a smaller alignment.
Perhaps for SUPPORTS_STACK_ALIGNMENT we should use here instead
  max_align = MAX (crtl->max_used_stack_slot_alignment,
                   INCOMING_STACK_BOUNDARY);
and if the align we compute is bigger than crtl->max_used_stack_slot_alignment
ensure we will keep using that alignment (e.g. by bumping also
crtl->stack_align_needed/estimated).  If INCOMING_STACK_BOUNDARY is 128 bits
aligned, I think using 128 bit alignment shouldn't cost us anything extra.
The problem with using INCOMING_STACK_BOUNDARY is that it is on ix86/x86-64
computed only too late (in expand_stack_alignment by
targetm.calls.update_stack_boundary (); ).  The comment above it says it is
computed again, but I can't actually find another call.
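
As a rough illustration of the concern above, the two choices of boundary
can be compared in a standalone model.  Everything here is hypothetical
(the TOY_MAX macro, the function names and the sample bit values); it
only shows why an assumed alignment larger than what the frame actually
guarantees is a problem.

```c
#include <stdbool.h>

#define TOY_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Alignment (bits) that the toy expander assumes for the stack slot
   base: the larger of what the slots themselves need and BOUNDARY.  */
static unsigned int
toy_max_align (unsigned int max_used_slot_align, unsigned int boundary)
{
  return TOY_MAX (max_used_slot_align, boundary);
}

/* The assumption is only safe if the frame is really aligned at least
   that much.  */
static bool
toy_assumption_holds (unsigned int assumed_align,
                      unsigned int actual_frame_align)
{
  return assumed_align <= actual_frame_align;
}
```

For example, if the slots only need 32 bits and the assumed boundary is
128 bits while the frame ends up 32-bit aligned, the assumption fails.
Using a boundary the frame is known to have on entry (or raising the
needed alignment to match the assumption) makes it hold.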


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread hjl dot tools at gmail dot com


--- Comment #8 from hjl dot tools at gmail dot com  2010-09-16 13:02 ---
This also failed:

---
typedef float V __attribute__ ((vector_size (16)));
V g;
float d[4] = { 4, 3, 2, 1 };

int
main ()
{
  V e;
  __builtin_memcpy (&e, d, sizeof (d));
  V f = { 5, 15, 25, 35 };
  e = e * f;
  g = e;
  return 0;
}
---

Program received signal SIGSEGV, Segmentation fault.
0x0804837e in main () at foo.c:11
11        e = e * f;
Missing separate debuginfos, use: debuginfo-install glibc-2.12.1-2.0.f13.i686
(gdb) disass
Dump of assembler code for function main:
   0x08048374 <+0>:     push   %ebp
   0x08048375 <+1>:     mov    %esp,%ebp
   0x08048377 <+3>:     movaps 0x8048470,%xmm0
=> 0x0804837e <+10>:    mulps  0x8049644,%xmm0
   0x08048385 <+17>:    movaps %xmm0,0x8049670
   0x0804838c <+24>:    mov    $0x0,%eax
   0x08048391 <+29>:    pop    %ebp
   0x08048392 <+30>:    ret
End of assembler dump.
(gdb) q

There is no stack involved. Somehow we failed to align
array of float properly.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread hjl dot tools at gmail dot com


--- Comment #9 from hjl dot tools at gmail dot com  2010-09-16 13:05 ---
If __builtin_memcpy generates instructions which
require bigger alignment than alignments of
source or destination, it should increase the
alignment of source or destination.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread hjl dot tools at gmail dot com


--- Comment #10 from hjl dot tools at gmail dot com  2010-09-16 13:10 ---
When __builtin_memcpy increases the alignment of source
or destination, it should update needed stack alignment if
source or destination is on stack.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread hjl dot tools at gmail dot com


--- Comment #11 from hjl dot tools at gmail dot com  2010-09-16 13:21 ---
This code:

  if (TREE_CODE (srcvar) == ADDR_EXPR
      && var_decl_component_p (TREE_OPERAND (srcvar, 0))
      && tree_int_cst_equal (TYPE_SIZE_UNIT (srctype), len)
      && (!STRICT_ALIGNMENT
          || !destvar
          || src_align >= TYPE_ALIGN (desttype)))
    srcvar = fold_build2 (MEM_REF, destvar ? desttype : srctype,
                          srcvar, off0);

does

float d[4];
__m128 *p = (__m128 *) d;

and treats p as properly aligned.  I don't see how it can ever
work with SSE. It has nothing to do with stack alignment.
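
To make the hazard concrete without relying on SSE intrinsics, here is a
portable C11 sketch.  The struct toy_vec4 and both helper functions are
hypothetical stand-ins: toy_vec4 plays the role of a 16-byte aligned
vector type, and the helpers show a run-time alignment check and a copy
that makes no alignment assumption about its source.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* 16 bytes wide and 16-byte aligned, standing in for a vector type.  */
struct toy_vec4
{
  _Alignas (16) float lane[4];
};

/* True if P is a multiple of ALIGN bytes (uses the conventional
   pointer-to-integer mapping).  */
static bool
toy_is_aligned (const void *p, size_t align)
{
  return ((uintptr_t) p % align) == 0;
}

/* Load four floats from SRC, which is only guaranteed to have the
   alignment of float.  Copying through memcpy avoids pretending that
   SRC points to a 16-byte aligned object.  */
static struct toy_vec4
toy_load_unaligned (const float *src)
{
  struct toy_vec4 v;
  memcpy (v.lane, src, sizeof v.lane);
  return v;
}
```

A plain float array is only required to be aligned for float, so a cast
of its address to a pointer to a 16-byte aligned type may or may not
happen to satisfy toy_is_aligned (p, 16); the memcpy-based load works
either way.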


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread hjl dot tools at gmail dot com


--- Comment #12 from hjl dot tools at gmail dot com  2010-09-16 13:32 ---
(In reply to comment #4)
> Created an attachment (id=21809)
> --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21809&action=view) [edit]
> patch to fix half STRICT_ALIGNMENT targets memcpy folding
>
> Might need this patch to fix as well.  i?86 / x86_64 isn't really
> !STRICT_ALIGNMENT.
>

We need a HARD_ALIGNMENT which depends on type for x86.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread rguenth at gcc dot gnu dot org


--- Comment #13 from rguenth at gcc dot gnu dot org  2010-09-16 13:39 ---
(In reply to comment #12)
> (In reply to comment #4)
> > Created an attachment (id=21809)
> > --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21809&action=view) [edit]
> > patch to fix half STRICT_ALIGNMENT targets memcpy folding
> >
> > Might need this patch to fix as well.  i?86 / x86_64 isn't really
> > !STRICT_ALIGNMENT.
> >
>
> We need a HARD_ALIGNMENT which depends on type for x86.

With that patch the assignment generated from memcpy doesn't need more
than int alignment, but cfgexpand.c still sets DECL_ALIGN of the
decl to 128, so expand uses aligned instructions.

cfgexpand.c should not increase alignment and not set 'needs stack
alignment' then, based on your comment #10.  So this _is_ about
stack alignment (but maybe not exclusively).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread jakub at gcc dot gnu dot org


--- Comment #14 from jakub at gcc dot gnu dot org  2010-09-16 13:54 ---
The reason why cfgexpand does increase the alignment is that it believes that
the base slot will be at least PREFERRED_STACK_BOUNDARY bytes aligned, which is
true on all targets but i?86/x86-64, which apparently sometimes chooses even
smaller alignment for the stack base.  So, we can either use there
MAX (..., STACK_BOUNDARY); for STACK_ALIGNMENT_SUPPORTED instead, which might
penalize some code though, or use INCOMING_STACK_BOUNDARY there (after making
sure we compute it before) and bump needed alignment to whatever we pick there
up.  During expansion expanders of course make use of the DECL_ALIGN info
cfgexpand provides, after all that's why we do that. 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread hjl dot tools at gmail dot com


--- Comment #15 from hjl dot tools at gmail dot com  2010-09-16 13:54 ---
Created an attachment (id=21810)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21810&action=view)
A patch

This patch adds HARD_ALIGNMENT_MODE_P and works for me.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread hjl dot tools at gmail dot com


--- Comment #16 from hjl dot tools at gmail dot com  2010-09-16 13:59 ---
(In reply to comment #13)

> With that patch the assignment generated from memcpy doesn't need more
> than int alignment, but cfgexpand.c still sets DECL_ALIGN of the
> decl to 128, so expand uses aligned instructions.
>
> cfgexpand.c should not increase alignment and not set 'needs stack
> alignment' then, based on your comment #10.  So this _is_ about
> stack alignment (but maybe not exclusively).
>

When we do

float d[4];
__m128 *p = (__m128 *) d;


all bets are off.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread jakub at gcc dot gnu dot org


--- Comment #17 from jakub at gcc dot gnu dot org  2010-09-16 14:08 ---
That's true.  But many expanders can make use of DECL_ALIGN information, e.g.
to choose faster code.  If cfgexpand keeps doing what it does now, namely
bumping DECL_ALIGN of variables up to PREFERRED_STACK_BOUNDARY even when in the
end the stack block doesn't end up being aligned that way, then it lies to the
expander and that will hit us again and again.  On x86-64/i686, I don't think
we want to prevent memcpy folding as your patch does, at least not for CPUs
where movu* is fast.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread rguenth at gcc dot gnu dot org


--- Comment #18 from rguenth at gcc dot gnu dot org  2010-09-16 14:13 ---
(In reply to comment #16)
> (In reply to comment #13)
>
> > With that patch the assignment generated from memcpy doesn't need more
> > than int alignment, but cfgexpand.c still sets DECL_ALIGN of the
> > decl to 128, so expand uses aligned instructions.
> >
> > cfgexpand.c should not increase alignment and not set 'needs stack
> > alignment' then, based on your comment #10.  So this _is_ about
> > stack alignment (but maybe not exclusively).
> >
>
> When we do
>
> float d[4];
> __m128 *p = (__m128 *) d;
>
>
> all bets are off.

?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread hjl dot tools at gmail dot com


--- Comment #19 from hjl dot tools at gmail dot com  2010-09-16 14:17 ---
(In reply to comment #17)
> That's true.  But many expanders can make use of DECL_ALIGN information, e.g.
> to choose faster code.  If cfgexpand keeps doing what it does now, namely
> bumping DECL_ALIGN of variables up to PREFERRED_STACK_BOUNDARY even when in
> the end the stack block doesn't end up being aligned that way, then it lies
> to the expander

The problem isn't limited to stack.

> and that will hit us again and again.  On x86-64/i686, I don't think we want
> to prevent memcpy folding as your patch does, at least not for CPUs where
> movu* is fast.

That is true. Whatever we do, we can't lie about
alignment, on stack or not. Once we fix that,
the rest shouldn't be too hard to fix.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread rguenth at gcc dot gnu dot org


--- Comment #20 from rguenth at gcc dot gnu dot org  2010-09-16 14:22 ---
The patch in comment #4 makes memcpy folding not lie about alignment.

cfgexpand still lies about alignment though.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-16 Thread hjl dot tools at gmail dot com


--- Comment #21 from hjl dot tools at gmail dot com  2010-09-16 14:30 ---
(In reply to comment #20)
> The patch in comment #4 makes memcpy folding not lie about alignment.

X86 only cares about alignment for vector modes.
Can we combine 2 patches into one?

> cfgexpand still lies about alignment though.
>

Let's open a new bug and fix it separately.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678



[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

2010-09-15 Thread jakub at gcc dot gnu dot org


--- Comment #2 from jakub at gcc dot gnu dot org  2010-09-15 14:23 ---
Actually
typedef float V __attribute__ ((vector_size (16)));
V g;

int
main ()
{
  float d[4] = { 4, 3, 2, 1 };
  V e;
  __builtin_memcpy (&e, d, sizeof (d));
  V f = { 5, 15, 25, 35 };
  e = e * f;
  g = e;
  return 0;
}

segfaults even with 4.5/4.6 at -O2 -m32 -msse2.


-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.4 Regression] crash on   |[4.4/4.5/4.6 Regression]
                   |vector code with -m32 -msse |crash on vector code with
                   |                            |-m32 -msse


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678