[Bug rtl-optimization/45678] [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse
--- Comment #27 from sezeroz at gmail dot com 2010-09-18 20:51 ---
Are 4.4 and 4.5 going to be fixed?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45678
--- Comment #22 from rguenth at gcc dot gnu dot org 2010-09-17 09:00 ---
Subject: Bug 45678

Author: rguenth
Date: Fri Sep 17 09:00:23 2010
New Revision: 164356

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164356
Log:
2010-09-17  Richard Guenther  <rguent...@suse.de>

        PR middle-end/45678
        * builtins.c (fold_builtin_memory_op): Always properly adjust
        alignment of memory accesses.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtins.c
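The ChangeLog entry only says that folded memory accesses now get their alignment "properly adjusted". As a rough illustration of one way to read that phrase, the hypothetical C model below records on a folded access no more alignment than is known for the object itself, and flags when expansion would then have to use an unaligned move. `fold_memcpy_access` and `struct folded_access` are invented for this sketch; it is not the code committed in r164356.

```c
#include <stdbool.h>

/* Hypothetical model of a folded memcpy access; not GCC code.
   All alignments are in bits.  */
struct folded_access
{
  unsigned align;        /* alignment recorded on the folded access */
  bool misaligned_move;  /* whether expansion must tolerate misalignment */
};

struct folded_access
fold_memcpy_access (unsigned known_object_align, unsigned type_align)
{
  struct folded_access a;
  /* Never claim more alignment than is known for the object itself,
     even if the access type would normally imply more.  */
  a.align = known_object_align < type_align ? known_object_align
                                            : type_align;
  /* The type's natural alignment was not established, so the expander
     has to pick a move that works on misaligned addresses.  */
  a.misaligned_move = a.align < type_align;
  return a;
}
```

For example, `fold_memcpy_access (32, 128)` (a float array copied through a 128-bit vector type) records 32-bit alignment and requests an unaligned move, whereas `fold_memcpy_access (128, 128)` keeps the aligned path.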
--- Comment #23 from rguenth at gcc dot gnu dot org 2010-09-17 13:57 ---
Subject: Bug 45678

Author: rguenth
Date: Fri Sep 17 13:57:04 2010
New Revision: 164369

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164369
Log:
2010-09-17  Richard Guenther  <rguent...@suse.de>

        PR middle-end/45678
        * gcc.dg/torture/pr45678-1.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr45678-1.c
Modified:
    trunk/gcc/testsuite/ChangeLog
--- Comment #24 from hjl dot tools at gmail dot com 2010-09-17 16:35 ---
Created an attachment (id=21821)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21821&action=view)
A patch

The problem is that we failed to update the stack alignment when we increased the alignment of a local variable. This patch works for me.
--- Comment #25 from hjl dot tools at gmail dot com 2010-09-17 17:26 ---
A patch is posted at

http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01425.html

hjl dot tools at gmail dot com changed:

What |Removed |Added
----------------------------------------------------------------------------
URL  |        |http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01425.html
--- Comment #26 from hjl at gcc dot gnu dot org 2010-09-17 17:49 ---
Subject: Bug 45678

Author: hjl
Date: Fri Sep 17 17:49:30 2010
New Revision: 164375

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164375
Log:
Update stack alignment when increasing local variable alignment.

gcc/

2010-09-17  H.J. Lu  <hongjiu...@intel.com>

        PR middle-end/45678
        * cfgexpand.c (update_stack_alignment): New.
        (get_decl_align_unit): Use it.
        (expand_one_stack_var_at): Call update_stack_alignment.

gcc/testsuite/

2010-09-17  H.J. Lu  <hongjiu...@intel.com>

        PR middle-end/45678
        * gcc.dg/torture/pr45678-2.c: New.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr45678-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/testsuite/ChangeLog
--- Comment #3 from rguenth at gcc dot gnu dot org 2010-09-16 10:17 ---
DECL_ALIGN of d is set to 128 (but apparently nothing ensures it actually ends up that way). DECL_ALIGN is adjusted here:

Old value = 32
New value = 128
expand_one_stack_var_at (decl=0x75ae90a0, offset=-16)
    at /space/rguenther/src/svn/trunk/gcc/cfgexpand.c:739
739       DECL_USER_ALIGN (decl) = 0;

so on trunk get_object_alignment of the MEM_REF returns 128, and thus we do not run into the unaligned move expansion here:

      if (mode != BLKmode
          && (unsigned) align < GET_MODE_ALIGNMENT (mode)
          /* If the target does not have special handling for unaligned
             loads of mode then it can use regular moves for them.  */
          && ((icode = optab_handler (movmisalign_optab, mode))
              != CODE_FOR_nothing))

Manually setting the alignment back to 32 in gdb results in correct asm:

        movlps  (%esp), %xmm0
        movhps  8(%esp), %xmm0
        mulps   .LC4, %xmm0

instead of

        mulps   (%esp), %xmm0

Apparently the stack alignment code doesn't work.

rguenth at gcc dot gnu dot org changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |hjl at gcc dot gnu dot org
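The expansion decision quoted above can be modeled in a few lines of C. This is a simplified, hypothetical sketch (the BLKmode check is dropped and all names are invented), meant only to show why raising DECL_ALIGN from 32 to 128 switches expansion from the movlps/movhps sequence to an aligned mulps memory operand.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the decision quoted in comment #3;
   the BLKmode check is omitted and the names are invented.  */
enum move_kind { MOVE_ALIGNED, MOVE_MISALIGNED };

/* believed_align: alignment (in bits) expand thinks the MEM_REF has;
   in this bug it comes from the bumped DECL_ALIGN.
   mode_align: natural alignment of the access mode (128 for V4SF).
   has_movmisalign: whether the target has a movmisalign pattern.  */
enum move_kind
choose_move (unsigned believed_align, unsigned mode_align,
             bool has_movmisalign)
{
  if (believed_align < mode_align && has_movmisalign)
    return MOVE_MISALIGNED;  /* e.g. movlps + movhps */
  return MOVE_ALIGNED;       /* e.g. mulps with a memory operand */
}
```

With the bumped DECL_ALIGN, `choose_move (128, 128, true)` returns `MOVE_ALIGNED` even if the slot is only 32-bit aligned at run time; `choose_move (32, 128, true)` returns `MOVE_MISALIGNED`, matching what forcing the alignment back to 32 in gdb produced.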
--- Comment #4 from rguenth at gcc dot gnu dot org 2010-09-16 10:18 ---
Created an attachment (id=21809)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21809&action=view)
patch to fix half STRICT_ALIGNMENT targets memcpy folding

We might need this patch as part of the fix as well. i?86 / x86_64 isn't really !STRICT_ALIGNMENT.
--- Comment #5 from jakub at gcc dot gnu dot org 2010-09-16 10:40 ---
Re: #c4, shouldn't there be a srcvar = NULL_TREE; somewhere for the STRICT_ALIGNMENT non-aligned case?
--- Comment #6 from rguenth at gcc dot gnu dot org 2010-09-16 10:50 ---
Indeed, an else is missing.
--- Comment #7 from jakub at gcc dot gnu dot org 2010-09-16 11:57 ---
For the ix86/x86_64 alignment issue, I believe the problem here is that

  max_align = MAX (crtl->max_used_stack_slot_alignment, PREFERRED_STACK_BOUNDARY);

is fine for !SUPPORTS_STACK_ALIGNMENT targets, but on ix86/x86_64, if max_used_stack_slot_alignment is really small, we might end up deciding to use a smaller stack alignment. Perhaps for SUPPORTS_STACK_ALIGNMENT we should use here instead

  max_align = MAX (crtl->max_used_stack_slot_alignment, INCOMING_STACK_BOUNDARY);

and, if the align we compute is bigger than crtl->max_used_stack_slot_alignment, ensure we keep using that alignment (e.g. by also bumping crtl->stack_align_needed/estimated). If INCOMING_STACK_BOUNDARY is 128 bits, I think using 128-bit alignment shouldn't cost us anything extra.

The problem with using INCOMING_STACK_BOUNDARY is that on ix86/x86-64 it is computed too late (in expand_stack_alignment, by targetm.calls.update_stack_boundary ();). The comment above it says it is computed again, but I can't actually find another call.
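A hypothetical model of the two schemes in this comment follows. It assumes, for illustration only, that a slot's alignment is taken from the lowest set bit of its frame offset and capped at max_align, and that `stack_alignment_needed` is the only alignment the prologue guarantees. The struct and function names are invented; the field names loosely follow the crtl fields mentioned above.

```c
#include <stdbool.h>

/* Hypothetical model of the frame bookkeeping discussed in comment #7;
   not GCC code.  All alignments are in bits.  */
struct frame_model
{
  unsigned max_used_stack_slot_alignment;
  unsigned stack_alignment_needed;  /* what the prologue will guarantee */
};

unsigned
max_u (unsigned a, unsigned b)
{
  return a > b ? a : b;
}

/* Alignment implied by a byte offset from the frame base (its lowest
   set bit), capped at max_align.  */
unsigned
align_from_offset (long offset, unsigned max_align)
{
  unsigned long low = (unsigned long) offset & -(unsigned long) offset;
  unsigned align = (unsigned) (low * 8);
  return (align == 0 || align > max_align) ? max_align : align;
}

/* Current scheme: assume the base is PREFERRED_STACK_BOUNDARY aligned,
   without recording that requirement anywhere.  */
unsigned
slot_align_current (const struct frame_model *f, long offset,
                    unsigned preferred)
{
  return align_from_offset (offset,
                            max_u (f->max_used_stack_slot_alignment,
                                   preferred));
}

/* Scheme suggested above: cap at INCOMING_STACK_BOUNDARY and bump the
   needed alignment so that the promise is actually kept.  */
unsigned
slot_align_proposed (struct frame_model *f, long offset, unsigned incoming)
{
  unsigned align
    = align_from_offset (offset,
                         max_u (f->max_used_stack_slot_alignment, incoming));
  if (align > f->stack_alignment_needed)
    f->stack_alignment_needed = align;
  return align;
}

/* The model "lies" when a slot claims more alignment than the frame
   is guaranteed to have.  */
bool
lies_about_alignment (const struct frame_model *f, unsigned slot_align)
{
  return slot_align > f->stack_alignment_needed;
}
```

With max_used_stack_slot_alignment = 32, the slot at offset -16 gets 128 bits under the current scheme while the frame only promises 32, which the model reports as a lie. Under the proposed scheme, an incoming boundary of 32 caps the slot at 32, and an incoming boundary of 128 yields 128 but raises the needed alignment to match.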
--- Comment #8 from hjl dot tools at gmail dot com 2010-09-16 13:02 ---
This also failed:

---
typedef float V __attribute__ ((vector_size (16)));
V g;
float d[4] = { 4, 3, 2, 1 };

int
main ()
{
  V e;
  __builtin_memcpy (&e, d, sizeof (d));
  V f = { 5, 15, 25, 35 };
  e = e * f;
  g = e;
  return 0;
}
---

Program received signal SIGSEGV, Segmentation fault.
0x0804837e in main () at foo.c:11
11        e = e * f;
Missing separate debuginfos, use: debuginfo-install glibc-2.12.1-2.0.f13.i686
(gdb) disass
Dump of assembler code for function main:
   0x08048374 <+0>:     push   %ebp
   0x08048375 <+1>:     mov    %esp,%ebp
   0x08048377 <+3>:     movaps 0x8048470,%xmm0
=> 0x0804837e <+10>:    mulps  0x8049644,%xmm0
   0x08048385 <+17>:    movaps %xmm0,0x8049670
   0x0804838c <+24>:    mov    $0x0,%eax
   0x08048391 <+29>:    pop    %ebp
   0x08048392 <+30>:    ret
End of assembler dump.
(gdb) q

There is no stack involved. Somehow we failed to align the float array properly.
--- Comment #9 from hjl dot tools at gmail dot com 2010-09-16 13:05 ---
If __builtin_memcpy generates instructions that require more alignment than the source or destination has, it should increase the alignment of the source or destination.
--- Comment #10 from hjl dot tools at gmail dot com 2010-09-16 13:10 ---
When __builtin_memcpy increases the alignment of the source or destination, it should also update the required stack alignment if the source or destination is on the stack.
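The rule in comments #9 and #10 can be sketched as a small hypothetical C helper: raise the object's alignment to what the generated instructions need and, for stack objects, propagate that requirement to the function's frame. The types and `require_alignment` are invented for illustration and are not GCC code.

```c
#include <stdbool.h>

/* Hypothetical model; alignments are in bits.  */
struct object_model
{
  unsigned align;
  bool on_stack;
};

struct function_model
{
  unsigned stack_alignment_needed;
};

/* Raise obj's alignment to `needed`; if obj lives on the stack, make
   sure the frame is required to provide that alignment too.  */
void
require_alignment (struct function_model *fn, struct object_model *obj,
                   unsigned needed)
{
  if (obj->align < needed)
    obj->align = needed;
  /* Without this step the object's recorded alignment is a promise the
     prologue never keeps.  */
  if (obj->on_stack && fn->stack_alignment_needed < obj->align)
    fn->stack_alignment_needed = obj->align;
}
```

A global (not on the stack) only needs its own alignment raised, since it is laid out with that alignment when emitted; a local additionally forces the frame alignment up.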
--- Comment #11 from hjl dot tools at gmail dot com 2010-09-16 13:21 ---
This code:

  if (TREE_CODE (srcvar) == ADDR_EXPR
      && var_decl_component_p (TREE_OPERAND (srcvar, 0))
      && tree_int_cst_equal (TYPE_SIZE_UNIT (srctype), len)
      && (!STRICT_ALIGNMENT
          || !destvar
          || src_align >= TYPE_ALIGN (desttype)))
    srcvar = fold_build2 (MEM_REF, destvar ? desttype : srctype,
                          srcvar, off0);

does

  float d[4];
  __m128 *p = (__m128 *) d;

and treats p as properly aligned. I don't see how it can ever work with SSE. It has nothing to do with stack alignment.
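At the source level, the folded form behaves like the pointer cast above. The sketch below (GNU C vector extensions plus C11 `_Alignof`) contrasts that cast with a memcpy into a vector-typed variable, which makes no alignment promise about the float array. The function names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

typedef float V __attribute__ ((vector_size (16)));

/* Nonzero if p meets the alignment the vector type V requires.  */
int
aligned_for_v (const void *p)
{
  return ((uintptr_t) p % _Alignof (V)) == 0;
}

/* What the folded memcpy effectively does: the cast tells the compiler
   that d is aligned for V, although a float[4] only guarantees float
   alignment.  Shown for illustration only; with a misaligned d an
   aligned SSE access such as movaps or mulps can fault.  */
V
load_by_cast (const float *d)
{
  return *(const V *) d;
}

/* memcpy makes no alignment promise about d, so the compiler has to use
   an access that tolerates misalignment unless it can prove otherwise.  */
V
load_by_memcpy (const float *d)
{
  V v;
  memcpy (&v, d, sizeof v);
  return v;
}
```

Calling `load_by_memcpy (buf + 1)` on a float array is fine wherever `buf` happens to be placed; `load_by_cast` should only be called with pointers for which `aligned_for_v` holds.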
--- Comment #12 from hjl dot tools at gmail dot com 2010-09-16 13:32 ---
(In reply to comment #4)
> Created an attachment (id=21809)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21809&action=view)
> patch to fix half STRICT_ALIGNMENT targets memcpy folding
>
> Might need this patch to fix as well.  i?86 / x86_64 isn't really
> !STRICT_ALIGNMENT.

We need a HARD_ALIGNMENT which depends on type for x86.
--- Comment #13 from rguenth at gcc dot gnu dot org 2010-09-16 13:39 ---
(In reply to comment #12)
> (In reply to comment #4)
> > Created an attachment (id=21809)
> >  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21809&action=view)
> > patch to fix half STRICT_ALIGNMENT targets memcpy folding
> >
> > Might need this patch to fix as well.  i?86 / x86_64 isn't really
> > !STRICT_ALIGNMENT.
>
> We need a HARD_ALIGNMENT which depends on type for x86.

With that patch, the assignment generated from memcpy doesn't need more than int alignment, but cfgexpand.c still sets DECL_ALIGN of the decl to 128, so expand uses aligned instructions. Based on your comment #10, cfgexpand.c should then not increase the alignment without also setting 'needs stack alignment'.

So this _is_ about stack alignment (but maybe not exclusively).
--- Comment #14 from jakub at gcc dot gnu dot org 2010-09-16 13:54 ---
The reason cfgexpand increases the alignment is that it believes the base slot will be at least PREFERRED_STACK_BOUNDARY bits aligned, which is true on all targets except i?86/x86-64, which apparently sometimes chooses an even smaller alignment for the stack base. So we can either use MAX (..., STACK_BOUNDARY) there instead for SUPPORTS_STACK_ALIGNMENT targets, which might penalize some code, or use INCOMING_STACK_BOUNDARY there (after making sure we compute it beforehand) and bump the needed alignment to whatever we pick there. During expansion, expanders of course make use of the DECL_ALIGN info cfgexpand provides; after all, that's why we compute it.
--- Comment #15 from hjl dot tools at gmail dot com 2010-09-16 13:54 ---
Created an attachment (id=21810)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21810&action=view)
A patch

This patch adds HARD_ALIGNMENT_MODE_P and works for me.
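As a rough illustration of the direction of comments #4, #11 and #15, the hypothetical predicate below extends the folding guard quoted in comment #11 with a per-mode "hard alignment" flag, so that a target which is not STRICT_ALIGNMENT in general still refuses to fold when the access mode needs alignment the source does not have. `hard_alignment_mode` only stands in for the idea; this is not the macro from the attached patch. Comment #17 argues against blocking the fold on CPUs with fast unaligned moves, so treat this purely as a model of the guard.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC code.  Alignments are in bits.  */
struct target_model
{
  bool strict_alignment;  /* STRICT_ALIGNMENT for the whole target */
};

/* Same shape as the guard quoted in comment #11, plus a per-mode flag:
   x86 is not STRICT_ALIGNMENT, yet aligned SSE accesses to vector
   modes still fault when the address is misaligned.  */
bool
can_fold_to_mem_ref (const struct target_model *t, bool have_destvar,
                     unsigned src_align, unsigned dest_type_align,
                     bool hard_alignment_mode)
{
  bool alignment_matters = t->strict_alignment || hard_alignment_mode;
  return !alignment_matters
         || !have_destvar
         || src_align >= dest_type_align;
}
```

For a float array (32-bit aligned) copied into a 128-bit vector type on x86, the model allows folding only when the vector mode is not flagged as needing hard alignment.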
--- Comment #16 from hjl dot tools at gmail dot com 2010-09-16 13:59 ---
(In reply to comment #13)
> With that patch the assignment generated from memcpy doesn't need more than
> int alignment, but still cfgexpand.c sets DECL_ALIGN of the decl to 128 so
> expand uses aligned instructions.  cfgexpand.c should not increase alignment
> and not set 'needs stack alignment' then, based on your comment #10.
>
> So this _is_ about stack alignment (but maybe not exclusively).

When we do

  float d[4];
  __m128 *p = (__m128 *) d;

all bets are off.
--- Comment #17 from jakub at gcc dot gnu dot org 2010-09-16 14:08 ---
That's true. But many expanders can make use of DECL_ALIGN information, e.g. to choose faster code. If cfgexpand keeps doing what it does now, namely bumping DECL_ALIGN of variables up to PREFERRED_STACK_BOUNDARY even when the stack block doesn't end up being aligned that way, then it lies to the expanders, and that will hit us again and again.

On x86-64/i686, I don't think we want to prevent memcpy folding as your patch does, at least not for CPUs where movu* is fast.
--- Comment #18 from rguenth at gcc dot gnu dot org 2010-09-16 14:13 ---
(In reply to comment #16)
> (In reply to comment #13)
> > With that patch the assignment generated from memcpy doesn't need more than
> > int alignment, but still cfgexpand.c sets DECL_ALIGN of the decl to 128 so
> > expand uses aligned instructions.  cfgexpand.c should not increase alignment
> > and not set 'needs stack alignment' then, based on your comment #10.
> >
> > So this _is_ about stack alignment (but maybe not exclusively).
>
> When we do
>
>   float d[4];
>   __m128 *p = (__m128 *) d;
>
> all bets are off.

?
--- Comment #19 from hjl dot tools at gmail dot com 2010-09-16 14:17 ---
(In reply to comment #17)
> That's true.  But many expanders can make use of DECL_ALIGN information,
> e.g. to choose faster code.  If cfgexpand keeps doing what it does now,
> namely bumping DECL_ALIGN of variables up to PREFERRED_STACK_BOUNDARY even
> when in the end the stack block doesn't end up being aligned that way, then
> it lies to the expander

The problem isn't limited to the stack.

> and that will hit us again and again.
>
> On x86-64/i686, I don't think we want to prevent memcpy folding as your
> patch does, at least not for CPUs where movu* is fast.

That is true.

Whatever we do, we can't lie about alignment, on the stack or not. Once we fix that, the rest shouldn't be too hard to fix.
--- Comment #20 from rguenth at gcc dot gnu dot org 2010-09-16 14:22 ---
The patch in comment #4 makes memcpy folding not lie about alignment. cfgexpand still lies about alignment, though.
--- Comment #21 from hjl dot tools at gmail dot com 2010-09-16 14:30 ---
(In reply to comment #20)
> The patch in comment #4 makes memcpy folding not lie about alignment.

x86 only cares about alignment for vector modes. Can we combine the two patches into one?

> cfgexpand still lies about alignment though.

Let's open a new bug and fix it separately.
--- Comment #2 from jakub at gcc dot gnu dot org 2010-09-15 14:23 ---
Actually,

typedef float V __attribute__ ((vector_size (16)));
V g;

int
main ()
{
  float d[4] = { 4, 3, 2, 1 };
  V e;
  __builtin_memcpy (&e, d, sizeof (d));
  V f = { 5, 15, 25, 35 };
  e = e * f;
  g = e;
  return 0;
}

segfaults even with 4.5/4.6 at -O2 -m32 -msse2.

jakub at gcc dot gnu dot org changed:

What    |Removed                      |Added
----------------------------------------------------------------------------
Summary |[4.4 Regression] crash on    |[4.4/4.5/4.6 Regression]
        |vector code with -m32 -msse |crash on vector code with
        |                             |-m32 -msse
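One possible source-level way to sidestep the folded memcpy in this testcase is to build the vector from the array elements directly, so no access is ever typed as V while pointing at float-aligned storage. This is an untested workaround sketch, not a fix, and it has not been checked against the affected 4.4/4.5 releases. `compute` is a hypothetical name standing in for the body of main.

```c
typedef float V __attribute__ ((vector_size (16)));

V g;

/* Variant of the comment #2 testcase that builds the vector from the
   array elements instead of calling __builtin_memcpy.  */
void
compute (void)
{
  float d[4] = { 4, 3, 2, 1 };
  V e = { d[0], d[1], d[2], d[3] };
  V f = { 5, 15, 25, 35 };
  e = e * f;  /* element-wise multiply */
  g = e;
}
```

After `compute ()`, g holds { 20, 45, 50, 35 }.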