[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #84 from mikulas at artax dot karlin dot mff dot cuni dot cz 2010-08-25 21:27 --- (In reply to comment #83) If the bug is not related to stack alignment (i.e. it crashes not on unaligned SSE access), simplify it and file another bugzilla entry. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #80 from mikulas at artax dot karlin dot mff dot cuni dot cz 2010-08-17 20:17 --- Comment #79: -mpreferred-stack-boundary=2 adheres to the sysv ABI but it doesn't adhere to the Linux ABI (that requires 16-byte alignment), so if you compile anything with -mpreferred-stack-boundary=2, it may crash when entering other dynamic libraries. -mstackrealign does the right thing, it realigns the stack when needed, but keeps it 16-byte aligned on function output. It should be used. It would be nice if gcc developers made -mstackrealign the default (with an option to turn it off for scientists who need maximum performance and don't care about ABI) so that this ABI madness will finally end when distributions get recompiled with it. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #82 from mikulas at artax dot karlin dot mff dot cuni dot cz 2010-08-17 21:17 --- -mstackrealign is available from gcc 4.5.0. For gcc 4.4 you can use my patch GCC-4.4.1-ALIGN-PATCH from this bugzilla or H.J.Lu's last patch. It basically does the same as -mstackrealign (but it doesn't add a command line option, it sets this behavior as default). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #68 from mikulas at artax dot karlin dot mff dot cuni dot cz 2010-04-20 07:48 --- gcc 4.5 is affected too. It would be nice if they fixed it. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug target/40667] [4.4/4.5 Regression] stack frames are generated even with -fomit-frame-pointer
--- Comment #26 from mikulas at artax dot karlin dot mff dot cuni dot cz 2010-02-15 10:34 --- Comment #25: I don't understand your logic, what does attribute((noreturn)) have to do with a stack frame? The only valid reasons for generating a stack frame are alloca() or needed stack realignment. Other function calls, either returning or noreturn, don't need a frame. Note that attribute((noreturn)) functions normally don't trigger a stack frame. That example function was actually carefully minimized from a larger real-world function. If you change the content of the loop, the stack frame won't be generated. It looks like there is something rotten in gcc.
--
mikulas at artax dot karlin dot mff dot cuni dot cz changed:
           What    |Removed                     |Added
         Status    |RESOLVED                    |REOPENED
         Resolution|FIXED                       |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40667
[Bug middle-end/41992] ICE on invalid dereferencing of void *
--- Comment #3 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-11-11 21:06 --- You can dereference void * in asm arguments --- i.e. void *p; ... asm volatile ("prefetch %0"::"m"(*p)); gcc warns in this case about the dereference and maybe it shouldn't (but it's trivial to suppress the warning with a cast to char *). If you change the "m" constraint to "mr", you get an ICE. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41992
[Bug target/40983] The scheduler incorrectly swaps MMX and floating point instructions
--- Comment #6 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-11-11 23:06 --- I think you can commit it to gcc. I don't see a reason why it shouldn't be committed. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40983
[Bug middle-end/41992] New: ICE on invalid dereferencing of void *
Hi

This piece of invalid code (the dereference shouldn't be there) triggers an ICE. The crash happens on 4.3.2, 4.4.1 and 4.4.2. It happens with or without optimizations.

static void MONITOR(void *ptr)
{
        __asm__ volatile ("                     \n\
                XORL    %%ECX, %%ECX            \n\
                XORL    %%EDX, %%EDX            \n\
                MONITOR                         \n\
        "::"a"(*ptr):"cx","dx","cc","memory");
}

s.e: In function 'MONITOR':
s.e:7: warning: dereferencing 'void *' pointer
s.e:7: internal compiler error: in gimplify_expr, at gimplify.c:7074
Please submit a full bug report, with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

-- Summary: ICE on invalid dereferencing of void * Product: gcc Version: 4.4.1 Status: UNCONFIRMED Severity: minor Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz GCC build triplet: i686-pc-linux-gnu GCC host triplet: i686-pc-linux-gnu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41992
[Bug target/41900] New: call *%esp shouldn't be generated because of CPU errata
Hi

The Intel P6 family of processors (Pentium Pro, 2, 3) has a bug in the call *%esp instruction. The instruction should put the current EIP on the stack, decrement ESP by 4 and jump to the value of ESP before the decrement. P6 processors will jump to the address after the decrement (so they will execute the return address as code). See Pentium Pro errata 70, Pentium 2 errata A33, Pentium 3 errata E17.

Gcc generates call *%esp for this example, when compiled with -O2 -fomit-frame-pointer -mpreferred-stack-boundary=2:

int main()
{
        volatile unsigned code = 0x00c3;
        ((void (*)(void))&code)();
        return 0;
}

The code crashes when executed on a P6 processor and executes correctly on other processors. GCC shouldn't allow the %esp register as a direct operand of the call instruction (addressing using %esp is fine).

---

Note: this bug comes from a piece of code used to call an arbitrary interrupt. I coded it like this. The call *%esp bug looks weird but it is not an artificial example, it comes from real code that was written and used.

static void INTR(unsigned int_no)
{
        volatile unsigned code = 0xc300cd | (int_no << 8);
        ((void (*)(void))&code)();
}

-- Summary: call *%esp shouldn't be generated because of CPU errata Product: gcc Version: 4.4.2 Status: UNCONFIRMED Severity: minor Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz GCC build triplet: i486-linux-gnu GCC host triplet: i486-linux-gnu GCC target triplet: i486-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41900
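The constant in the INTR example packs the machine code for "int N; ret" into one little-endian 32-bit word, and the erratum comes down to which value of ESP the indirect call uses as its target. The sketch below models both in portable C without executing any generated code. The names encode_int_ret and call_esp_target are hypothetical helpers for illustration, not part of gcc or of the report.

```c
#include <stdint.h>

/* Hypothetical helper: build the word that INTR() stores on the stack.
 * In memory (little-endian) it reads 0xcd, N, 0xc3, 0x00,
 * i.e. "int $N" followed by "ret". */
uint32_t encode_int_ret(uint8_t int_no)
{
    return 0xc300cdu | ((uint32_t)int_no << 8);
}

/* Hypothetical model of "call *%esp": returns the address the processor
 * jumps to. A correct CPU uses ESP as it was before the return address
 * is pushed; a CPU with the P6 erratum uses ESP after the push. */
uint32_t call_esp_target(uint32_t esp_before, int p6_erratum)
{
    uint32_t esp_after = esp_before - 4;   /* return address pushed */
    return p6_erratum ? esp_after : esp_before;
}
```

In this model the erratum moves the jump target four bytes down, onto the freshly pushed return address, which matches the description of the return address being executed as code.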
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #58 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-10-15 20:24 ---
(In reply to comment #53)
> Created an attachment (id=18656)
> --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18656&action=view)
> An updated patch for gcc 4.4

Seamonkey is correct with that patch. But it fails with structures containing an sse vector, i.e. the example below. My patch (in comment #46) handles this case correctly (it realigns the stack if any structure on it has alignment at least 16), so should we submit that patch instead? I think it's wrong to add an alignment test to the autovectorizer because vector data types can be created explicitly without the autovectorizer.

Mikulas

typedef int v4si __attribute__ ((vector_size (16)));
struct x { v4si v; v4si w; };
void y(void *);
v4si x(void)
{
        struct x x;
        y(&x);
}

--- the function x should realign the stack but doesn't.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
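To make the failing case concrete, here is a small sketch that asks the compiler what alignment it requires for such a structure and checks whether a given stack address would satisfy it. It relies on the GCC vector_size extension used in the comment and on C11 _Alignof. The names vec_pair, vec_pair_alignment and address_ok_for_vec_pair are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Same layout as "struct x" in the comment, renamed for clarity. */
struct vec_pair {
    v4si v;
    v4si w;
};

/* Alignment the compiler requires for the structure, in bytes. */
size_t vec_pair_alignment(void)
{
    return _Alignof(struct vec_pair);
}

/* Nonzero if an object placed at this address could be passed to a
 * callee that uses aligned vector accesses on it. */
int address_ok_for_vec_pair(uintptr_t addr)
{
    return (addr % _Alignof(struct vec_pair)) == 0;
}
```

A function that places such a structure at a merely 4-byte-aligned stack address and passes a pointer to it elsewhere hands out an address for which the second check fails, even if the function itself never touches an SSE register.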
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #61 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-10-16 02:10 ---
> Why should gcc align the stack when SSE registers aren't used at all?

Because it passes a pointer to the structure containing vector entries to someone else who expects it to be aligned. As for the updated patch --- why does it modify the autovectorizer? Anything that the autovectorizer does can be done manually without the autovectorizer. So, if there is a case where patching the autovectorizer is required to avoid a bug, there is definitely another case where the bug still persists if the programmer vectorizes the code explicitly.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #54 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-27 09:03 --- (In reply to comment #51) For 4.4, the designers held two wrong assumptions: 1) the incoming stack is always aligned on -mincoming-stack-boundary (wrong for functions called from assembler or code generated by other compilers). 2) all the variables must be aligned on their alignment (wrong for double, long double, long long, mmx: the processor may accept them misaligned). The assumption 1) generates crashing code (for example Seamonkey). The assumption 2) generates suboptimal code (bug #40667). The assumption 1) can be trivially quashed with parameter -mincoming-stack-boundary=2, then the code will be non-crashing, but you will be hitting problem 2) and the code will be ugly and slow: any function containing double or long double variable will generate code for stack realigning and a frame pointer. (for long long this inefficiency was partially fixed in 4.4.1, but only partially, single long long variables don't trigger the alignment after 4.4.1 but structures with long long do, see bug #40667). So: to fix problem 1), gcc should assume that the incoming stack is 4-byte aligned. To fix problem 2), instead of single alignment, types and variables should have two alignments: preferred alignment and enforced alignment (so that you don't realign the stack if there is double on it, but you realign it if there is 16-byte SSE). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
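The proposal of two alignments per type can be sketched as a small decision function. This is only a model of the idea in the comment, not gcc code: struct type_align and needs_stack_realign are hypothetical names, and the byte values used in the usage note are illustrative assumptions for 32-bit x86.

```c
#include <stddef.h>

/* Hypothetical description of a type's two alignments, in bytes. */
struct type_align {
    size_t preferred;  /* faster when met, but misalignment is tolerated */
    size_t enforced;   /* accesses may fault if this is not met */
};

/* Realign only when some stack variable's enforced alignment exceeds
 * what the incoming stack is assumed to provide. The preferred
 * alignment never forces a realignment on its own. */
int needs_stack_realign(const struct type_align *vars, size_t n,
                        size_t incoming_align)
{
    for (size_t i = 0; i < n; i++)
        if (vars[i].enforced > incoming_align)
            return 1;
    return 0;
}
```

With an assumed incoming alignment of 4, a double described as {8, 4} does not trigger a realignment, while a 16-byte SSE value described as {16, 16} does.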
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #55 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-27 09:07 ---
> If we assume incoming stack is 4byte aligned, we have to realign stack for
> every function due to #2, which isn't acceptable.

No, you don't. All you have to do is to allocate a stack frame that is a multiple of 16 bytes (gcc does that already).
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #56 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-27 09:36 --- As for this old code that assumes 16-byte alignment: this is wrong code generated by old versions of gcc. It only works as long as it is called from other gcc >= 3 code (call it from gcc < 3, icc or assembler and it crashes). So you don't have to realign the stack to make sure that the code always works (it doesn't, anyway). All you have to do with this old code is to make sure that you don't make things worse --- i.e. if it worked before, it should continue to work after the redesign. So all you need is to make a stack frame whose size is a multiple of 16 bytes. No realign needed. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
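The argument that a frame of a multiple of 16 bytes is enough "not to make things worse" is plain modular arithmetic, sketched below. The helpers round_frame_to_16 and sp_after_frame are hypothetical, and "frame" here is assumed to mean everything the function subtracts from the stack pointer before calling further, return address included.

```c
#include <stdint.h>

/* Round a frame size up to the next multiple of 16 bytes. */
uint32_t round_frame_to_16(uint32_t size)
{
    return (size + 15u) & ~15u;
}

/* Stack pointer handed to callees after allocating the rounded frame.
 * Subtracting a multiple of 16 leaves sp modulo 16 unchanged, so an
 * aligned caller stays aligned and an unaligned one is no worse off. */
uint32_t sp_after_frame(uint32_t sp_in, uint32_t frame_size)
{
    return sp_in - round_frame_to_16(frame_size);
}
```

The function neither creates alignment nor destroys it, which is the property the comment asks for when old code relies on its caller having been aligned.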
[Bug middle-end/41475] common variables cannot be expected to be aligned
--- Comment #9 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-27 09:51 --- The common linker definitions were made exactly to make code like this work and share the array between two objects. So if you think it is undefined, don't support it (make -fno-common the default and remove -fcommon altogether). Or, if you want to support it, support it correctly. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41475
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #50 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-26 15:44 ---
> Please find a testcase first. Otherwise, nothing will be done. Thanks.

I don't want anything to be done (unless the patch causes generation of wrong code --- I am not aware of such a case). I am saying that the patch could be included in 4.4 as a quick fix and that 4.5 needs a stack alignment redesign. You can't redesign it by incrementally testing against a set of examples. You (or someone else) need to sit down, draft the rules for propagation of preferred and enforced alignment across types down to the stack frame, and code it. If you understand what you're doing, you can create test examples on your own. If you don't understand, then don't hack it at all.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug middle-end/41475] common variables cannot be expected to be aligned
--- Comment #7 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-26 15:27 --- Richard Guenther: the bug is caused by a common symbol (in file commonalign1.o) with the same name as a data section symbol (in file commonalign2.o). In this case, the linker redirects the common symbol to the symbol in the data section. But the linker cannot change the content of the data section, so it cannot make both symbols array and array2 aligned. The linker is innocent in this and giving a warning is the best thing it can do. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41475
[Bug middle-end/41475] New: common variables cannot be expected to be aligned
Hi.

When I compile these two sources, one with -O3 -march=pentium3 and the other with -Os, the linker warns about nonmatching alignments and the program crashes because of misaligned SSE accesses.

/usr/bin/ld: Warning: alignment 4 of symbol `array' in commonalign2.o is smaller than 32 in commonalign1.o
/usr/bin/ld: Warning: alignment 4 of symbol `array2' in commonalign2.o is smaller than 32 in commonalign1.o

The reason is that gcc aligns arrays in the common section to 32 bytes and expects that they will be aligned so. This expectation is wrong: the common entries may be resolved to point to the data section of another module that doesn't meet the alignment (it may be compiled with a different compiler or with the same compiler with different flags (-Os)). For extern arrays, gcc correctly assumes that they are aligned to their ABI standard (4 bytes) and generates appropriate SSE code; for common arrays it should expect 4-byte alignment too.

-- Summary: common variables cannot be expected to be aligned Product: gcc Version: 4.4.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz GCC build triplet: i686-pc-linux-gnu GCC host triplet: i686-pc-linux-gnu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41475
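From the user's side, the safe assumption for data defined in another module is only the ABI alignment. The sketch below shows one way to check an address at run time and one way to read four floats without assuming more than byte alignment. It is an illustration, not the attached test case: is_aligned, sum4 and demo_array are hypothetical names, and the 32-byte alignment of demo_array is requested explicitly here with C11 _Alignas.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative array with an explicitly requested 32-byte alignment. */
_Alignas(32) float demo_array[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Nonzero if p is a multiple of alignment (alignment must be nonzero). */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Sum four floats starting at src. memcpy makes no alignment
 * assumption, so this is valid for any readable address. */
float sum4(const void *src)
{
    float tmp[4];
    memcpy(tmp, src, sizeof tmp);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```

Code that wants aligned SSE loads on such data could test is_aligned(ptr, 16) first and fall back to the unaligned path otherwise.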
[Bug middle-end/41475] common variables cannot be expected to be aligned
--- Comment #1 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-26 04:05 --- Created an attachment (id=18653) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18653&action=view) the first file The first file of a two-file program. Compile with gcc -c -O3 -march=pentium3. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41475
[Bug middle-end/41475] common variables cannot be expected to be aligned
--- Comment #2 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-26 04:06 --- Created an attachment (id=18654) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18654&action=view) the second file The second file. Compile with gcc -c -Os. Then, link both object files together and run it. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41475
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #48 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-26 04:25 --- It can be seen from the patch. I don't know how to detect that a structure contains an array with required SSE alignment, so I realign the stack for all types with BLKmode and alignment >= 16. That may also catch structures containing long double with -m128bit-long-double and do an unneeded realignment for them. Another point where it aligns and doesn't have to is: there are some SSE variables used --- that triggers frame generation to prepare for a possible realignment --- after register allocation, they are not spilled, so the alignment is not needed --- but there are some other aligned types on the stack (for example floating point; they do not need enforced alignment). Then, my patch simply realigns the stack. I think the patch is a good hack that may be added to 4.4 so that Gentoo people stop whining that gcc -O3 is unstable. But for 4.5, the stack realign needs to be redesigned. There are other cases ( PR/40667 ) where it triggers stack alignment that is not needed. As I said in comment #3: introducing preferred and enforced alignment for all types will do the right thing. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #46 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-25 00:56 --- Created an attachment (id=18646) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18646&action=view) A patch for gcc 4.4.1 I decided to make a patch on my own. Seamonkey works with it. It sometimes aligns unnecessarily but it should never miss an alignment. (someone with more knowledge about gcc than me could refine the patch to make fewer unneeded alignments) I searched the generated seamonkey code for 'movaps.*esp' and it shows that all the functions that store xmm on the stack use stack alignment. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #43 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-23 16:28 --- With the patch from comment #41, my test examples pass but seamonkey is still miscompiled, the function pow5mult still doesn't align the stack and spills xmm0 on it. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #39 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-20 06:30 --- The updated patch fixes align-counterexample1.c, but not align-counterexample2.c. Note that you must align the stack for all functions that have some SSE operations, because you never know if the registers will be spilled. The generated code is:

        .globl f
        .type f, @function
f:
        pxor    %xmm0, %xmm0
        subl    $28, %esp
        xorl    %eax, %eax
        .p2align 5,,24
.L2:
        movaps  %xmm0, p(%eax)
        addl    $16, %eax
        cmpl    $400, %eax
        jne     .L2
        movaps  %xmm0, (%esp)
        call    g
        movdqa  (%esp), %xmm0
        xorl    %eax, %eax
        .p2align 5,,24
.L3:
        movaps  %xmm0, q(%eax)
        addl    $16, %eax
        cmpl    $400, %eax
        jne     .L3
        addl    $28, %esp
        ret

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #32 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-13 13:58 --- Created an attachment (id=18578) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18578&action=view) A bug example for 4.4 patch Shows a bug in 4.4 patch -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #33 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-13 13:59 --- Created an attachment (id=18579) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18579&action=view) Another bug in 4.4 patch Another bug in 4.4 patch. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #34 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-09-13 14:07 --- So I posted these two examples that show that the patch is insufficient: 1) if the array is embedded in a structure and that structure is on the stack, the stack is not aligned. (if the array were out of structure, it would be) 2) gcc sometimes spills xmm registers over function calls (spilling zero this way is suboptimal, but that's only a performance issue) and the stack is not aligned in this case. Note, in align-counterexample2.c, there is another bug! Gcc shouldn't assume that arrays x and y are 16-byte aligned. Arrays x and y are in common section, that means that they can be declared in any other module and the linker joins these declarations. So, for example, if in another module someone declared int x[100] = { 1, 2, 3, 4 } and the module were compiled with different compiler that doesn't align the array (the ABI allows it) , the linker is forced to use the declaration with the initialization and the array would be misaligned. We can only assume alignment on the variables that are not in common section, i.e. with -fno-common or explicitly initialized. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug target/40983] The scheduler incorrectly swaps MMX and floating point instructions
--- Comment #4 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-23 19:18 --- The patch works fine. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40983
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #30 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-23 19:28 --- I tested the 4.4 patch and it works fine. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug target/41017] regparm=3 passes structures inconsistently
--- Comment #8 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-11 20:38 --- Created an attachment (id=18341) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18341&action=view) A patch for -freg-struct-return Another patch that makes -freg-struct-return consistent. Return structures with size 1, 2, 4, 8 bytes in EAX or EDX:EAX. No matter what they contain. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41017
[Bug target/41017] regparm=3 passes structures inconsistently
--- Comment #9 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-11 20:40 --- The Basic rule implemented in both patches is: when you have aggregate type, you MUST NOT look at mode to infer parameter or return method. It is unreliable. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41017
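The rule behind the -freg-struct-return patch, decide by size alone and never by mode or content, can be written down as a tiny function. This is a sketch of the rule as stated in the comment above, not the patch itself. The enum and function names are hypothetical, and mapping 8 bytes to EDX:EAX and the smaller sizes to EAX is the natural reading of "EAX or EDX:EAX", stated here as an assumption.

```c
#include <stddef.h>

enum ret_loc { RET_EAX, RET_EDX_EAX, RET_MEMORY };

/* Where an aggregate of the given size is returned. Only the size is
 * inspected, so struct { float a; } and struct { int a; } behave alike. */
enum ret_loc struct_return_location(size_t size)
{
    switch (size) {
    case 1:
    case 2:
    case 4:
        return RET_EAX;
    case 8:
        return RET_EDX_EAX;
    default:
        return RET_MEMORY;   /* caller passes a hidden return slot */
    }
}
```

Because the function has no access to the member types, the inconsistencies listed in this bug cannot arise in such a scheme by construction.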
[Bug target/41013] Fastcall calling convention is incompatible with Windows
--- Comment #2 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-11 20:48 --- Another inconsistency: struct { float f; }; is returned in ST(0) in GCC and in EAX on Windows. struct { double f; }; is returned in ST(0) in GCC and in EDX:EAX on Windows. See PR 41017 for more examples. Another one: On Windows, a structure with zero size ( struct { int a[0]; }) is treated like having size 4 when passing arguments and treated like being returned in registers (well, really, there is nothing returned because the structure is empty, but it differs from GCC: GCC will pass a dummy return place argument and Windows won't). Another one: struct { int a; int b[]; } is treated like having variable size and always returned by reference in GCC. In Windows b[] and b[0] seem to be equivalent and this is treated like having the size 4 and returned in EAX. I also didn't quite get the purpose of MS_AGGREGATE_RETURN --- it seems to be doing the same thing as -freg-struct-return in current gcc code. How are these two supposed to differ? But it could be used to differentiate this zero-sized-member thing. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41013
[Bug rtl-optimization/40667] [4.4/4.5 Regression] stack frames are generated even with -fomit-frame-pointer
--- Comment #24 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-11 21:01 --- Another case when a stack frame is spuriously generated:

/* -O2 -fomit-frame-pointer -mno-accumulate-outgoing-args */
void __attribute__((__noreturn__)) crash(__const__ char *, ...);

void F(int *q)
{
        while (1) {
                if (*q < 0) crash("buuu");
                if (!*q) break;
                q++;
        }
}

--- a stack frame is generated for no apparent reason. The switch that actually does it is -ftree-ch (with -fno-tree-ch, the stack frame is not generated). This is not misgenerated code but it may indicate that something bad is going on in the compiler. The conditions are very peculiar ...
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40667
[Bug target/41017] regparm=3 passes structures inconsistently
--- Comment #7 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-10 19:36 --- Worse, try to return these structures with -freg-struct-return and it also follows this inconsistent pattern: some are returned in EDX:EAX, some are returned in ST(0). It is even inconsistent between GCC releases: struct { float a[1]; float b[0]; } is returned in EAX on GCC-3.3 and in ST(0) on GCC-3.4 and GCC-4. It is much worse than the regparm inconsistency because some operating systems (for example FreeBSD) use -freg-struct-return as a default calling convention and GCC generates incompatible code for them. Where is this -freg-struct-return thing documented? The documentation in the manpage, "Return 'struct' and 'union' values in registers when possible.", is really inadequate. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41017
[Bug target/41013] New: Fastcall calling convention is incompatible with Windows
In Windows, the fastcall calling convention is implemented in the following way:
* an argument that has integer type with size less than or equal to 4 bytes is eligible for fastcall
* the first argument that is eligible for fastcall is passed in ECX
* the second argument that is eligible for fastcall is passed in EDX

GCC implements it badly: structures and 64-bit integer arguments are, correctly, not passed in registers, but they incorrectly increase the register number in function_arg_advance_32, so further arguments eligible for fastcall are not passed in registers.

Example:

#include <stdio.h>

struct s { int a; };

int __attribute__((fastcall,noinline)) f(struct s s, int a1, int a2)
{
        printf("args: %d, %d, %d\n", a1, a2, s.a);
        return 0;
}

int main()
{
        struct s s = { 3 };
        f(s, 1, 2);
        return 0;
}

--- on Windows, s goes on the stack, a1 goes in ECX and a2 goes in EDX.
--- in gcc, s goes on the stack (but it incorrectly increases the register number), a1 goes in EDX and a2 goes on the stack too because gcc runs out of fastcall registers.

-- Summary: Fastcall calling convention is incompatible with Windows Product: gcc Version: 4.4.1 Status: UNCONFIRMED Severity: minor Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz GCC build triplet: i686-pc-linux-gnu GCC host triplet: i686-pc-linux-gnu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41013
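The difference between the two behaviours described in this report can be modelled in a few lines of portable C. This is a sketch of the rule as reported, not gcc's function_arg_advance_32: fastcall_loc, the enums and the gcc44_quirk flag are all hypothetical names chosen for the illustration.

```c
#include <stddef.h>

enum arg_kind { ARG_ELIGIBLE, ARG_INELIGIBLE };  /* small integer vs. struct/64-bit */
enum arg_loc  { LOC_STACK, LOC_ECX, LOC_EDX };

/* Location of argument number `index` (0-based) in the list `args`.
 * With gcc44_quirk == 0, only eligible arguments consume ECX/EDX, as
 * described for Windows. With gcc44_quirk != 0, an ineligible argument
 * still goes on the stack but also uses up a register slot, which is
 * the behaviour this report attributes to gcc. */
enum arg_loc fastcall_loc(const enum arg_kind *args, size_t index,
                          int gcc44_quirk)
{
    static const enum arg_loc regs[2] = { LOC_ECX, LOC_EDX };
    size_t next = 0;

    for (size_t i = 0; i <= index; i++) {
        int eligible = (args[i] == ARG_ELIGIBLE);
        if (eligible && next < 2) {
            if (i == index)
                return regs[next];
            next++;
        } else {
            if (i == index)
                return LOC_STACK;
            if (!eligible && gcc44_quirk && next < 2)
                next++;
        }
    }
    return LOC_STACK;   /* not reached */
}
```

For the argument list (struct s, int, int) from the example, the model yields stack, ECX, EDX without the quirk and stack, EDX, stack with it, matching the two outcomes listed above.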
[Bug target/41013] Fastcall calling convention is incompatible with Windows
--- Comment #1 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-09 13:39 --- "an argument that has integer type" should really be "an argument that has integer or pointer type" ... pointers are passed in registers too. Anything else isn't, I think. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41013
[Bug target/41017] New: regparm=3 passes structures inconsistently
regparm=3 passes structures in registers if they fit there. There is an inconsistency in this rule: if a structure contains only a float or only a double, it is passed on the stack.

Example:
__attribute__((regparm(3))) void function(struct s s);
now, the argument passing varies wildly depending on the definition of struct s.

struct s { float a; float b; }                  --- passed in registers
struct s { float a; }                           --- passed on stack
struct s { float a[1]; }                        --- passed on stack
struct s { float a[2]; }                        --- passed in registers
struct s { float a; int b; }                    --- passed in registers
struct s { float a[1]; float b[0]; }            --- passed on stack
struct s { double a; }                          --- passed on stack
struct s { struct { float a; }; }               --- passed on stack
union s { float a; }                            --- passed in registers
struct s { union { float a; }; }                --- passed in registers
struct s { struct { float a; } q[1]; }          --- passed on stack
struct s { long double a; }                     --- passed on stack
struct s { union { long double a; }; }          --- passed in registers

--- actually it seems that if the structure contains only one floating point entry inside other structures or arrays (not unions), it is passed on stack, not in registers ... otherwise it is passed in registers.

It is unlikely that anyone designed it this way deliberately. Gcc internals are exposed to the ABI! If the structure contains just one entry, its mode is different from BLKmode and it takes a different path in function_arg_advance_32 and function_arg_32.

I'd propose to change it so that structures are always passed in registers; the current state makes it hard or impossible to do any automatic marshalling of arguments for regparm functions. I found this bug when trying to extend libffi to handle the regparm=3 calling convention.

(another way to fix this is to always pass structures on the stack, maybe it would generate faster code, but it would create more ABI-incompatibility pain)

-- Summary: regparm=3 passes structures inconsistently Product: gcc Version: 4.4.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz GCC build triplet: i686-pc-linux-gnu GCC host triplet: i686-pc-linux-gnu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41017
[Bug target/41017] regparm=3 passes structures inconsistently
--- Comment #1 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-09 18:40 --- Created an attachment (id=18331) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18331&action=view) a proposed patch. Fixes bug 41013 as well. Change it so that all the aggregate types take a common code path, so passing of the structure no longer depends on its content. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41017
[Bug target/41017] regparm=3 passes structures inconsistently
--- Comment #4 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-09 21:16 --- Regparm changed between gcc 3.x -> 4.x (I remember it too painfully, I had to rewrite some assembler files). In 3.x, all arguments were incrementing the register count, even if they were on the stack: if you had (float f1, int i2, int i3), f1 was on stack, i2 was in EDX and i3 was in ECX. In 4.x it changed so that f1 is on stack, i2 is in EAX and i3 is in EDX. Similarly, (double f1, int i2, int i3) was (stack, ECX, stack) in 3.x and (stack, EAX, EDX) in 4.x. If we change it so that structures are always in registers, it will be the least painful thing, because they already are in registers almost always (except a few pathological cases, like a struct containing only float or double). So it won't likely hurt too much, because few programmers rely on regparm(3) for external ABI, few programmers pass structures directly and few programmers declare a structure with only one member. And the programmer will be hurt only if he does all these three things. If you want to change it to be consistent with the documentation (not with the existing implementation) and always pass structures on the stack, I wouldn't object to it. Just don't change it later. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41017
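The 3.x and 4.x register counting described here can be reproduced with a small model. It is a sketch built only from the examples in this comment: regparm3_loc, struct arg and the gcc3_counting flag are hypothetical names, and the handling of multi-word register arguments (they must fit in the remaining registers) is a simplifying assumption.

```c
#include <stddef.h>

enum reg_loc { ON_STACK, IN_EAX, IN_EDX, IN_ECX };

/* One argument: whether its type may go in a register at all, and its
 * size in 32-bit words (float: 1, double: 2, int: 1). */
struct arg {
    int      reg_eligible;
    unsigned words;
};

/* Location of argument number `index` (0-based) under regparm(3).
 * gcc3_counting != 0: stack arguments also advance the register count.
 * gcc3_counting == 0: only register arguments advance it. */
enum reg_loc regparm3_loc(const struct arg *args, size_t index,
                          int gcc3_counting)
{
    static const enum reg_loc regs[3] = { IN_EAX, IN_EDX, IN_ECX };
    unsigned next = 0;

    for (size_t i = 0; i <= index; i++) {
        if (args[i].reg_eligible && next + args[i].words <= 3) {
            if (i == index)
                return regs[next];
            next += args[i].words;
        } else {
            if (i == index)
                return ON_STACK;
            if (gcc3_counting)
                next += args[i].words;
        }
    }
    return ON_STACK;   /* not reached */
}
```

Feeding it (float, int, int) and (double, int, int) gives (stack, EDX, ECX) and (stack, ECX, stack) with the 3.x counting, and (stack, EAX, EDX) in both cases with the 4.x counting, as listed in the comment.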
[Bug rtl-optimization/40667] [4.4/4.5 Regression] stack frames are generated even with -fomit-frame-pointer
--- Comment #21 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-08 10:41 --- Hmm, it still generates the stack frame (and the alignment itself) when there are structures containing long long and with -malign-double. Example, compile with -O2 -fomit-frame-pointer -mpreferred-stack-boundary=2 -malign-double:

struct s { long long a; int b; };
void f(struct s *x);
void g(void)
{
        struct s x;
        f(&x);
}

--- the stack is aligned although it doesn't have to be. Output:

g:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-8, %esp
        subl    $20, %esp
        leal    4(%esp), %eax
        movl    %eax, (%esp)
        call    f
        leave
        ret

I compile the whole project with -malign-double (so I must use it on all modules, even integer ones, as it's an ABI thing) and I compile integer-only parts with -mpreferred-stack-boundary=2. I don't know if you can extend that hack for structures containing long long ... but the whole stack alignment thing really needs to be redesigned for gcc-4.5.
--
mikulas at artax dot karlin dot mff dot cuni dot cz changed:
           What    |Removed                     |Added
         Status    |RESOLVED                    |REOPENED
         Resolution|FIXED                       |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40667
[Bug rtl-optimization/40667] [4.4/4.5 Regression] stack frames are generated even with -fomit-frame-pointer
--- Comment #23 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-08 14:15 --- (In reply to comment #22) It is because -malign-double will align long long to 8 byte. Yes, it aligns it in the structures ... but why on the stack? Those people who were writing it really didn't understand the difference between preferred alignment (long long, double, long double) that shouldn't trigger any stack realigns and enforced alignment (sse 16-byte) that should. So gcc aligns the stack when it's not needed and doesn't align it when it is (PR 40838). That's why I think it needs redesign, it can't be fixed with incremental hacks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40667
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #23 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-08 17:30 --- (In reply to comment #21) Unfortunately, that patch is wrong. It aligns the stack when there is some vector type in the function, but it doesn't align it if the autovectorizer creates SSE instructions. Try the obstack example in comment #12 and you will see that the function my_alloc uses 16-byte SSE instructions on the stack and doesn't have an aligned stack with the patch. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug target/40983] New: The scheduler incorrectly swaps MMX and floating point instructions
Hi This example fails, because in function f, the scheduler incorrectly swapped floating point store to c and load of mmx registers. Compile with -O2 -march=pentium-mmx -- Summary: The scheduler incorrectly swaps MMX and floating point instructions Product: gcc Version: 4.4.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz GCC build triplet: i686-pc-linux-gnu GCC host triplet: i686-pc-linux-gnu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40983
[Bug target/40983] The scheduler incorrectly swaps MMX and floating point instructions
--- Comment #1 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-06 04:12 --- Created an attachment (id=18310) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18310&action=view) A failing example -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40983
[Bug target/40983] The scheduler incorrectly swaps MMX and floating point instructions
--- Comment #2 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-08-06 04:15 ---
Assembler output:

f:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $16, %esp
        movq    %mm0, -8(%ebp)
        movq    %mm1, -16(%ebp)
        emms
        fldl    a
        faddl   b
        movq    -8(%ebp), %mm0
        paddd   -16(%ebp), %mm0
        fstpl   c
        leave
        ret

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40983
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #14 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-31 13:54 --- Jakub: And how many other bugs like this are there? 75% of binaries in /bin are buggy. Do you think it is really sensible to declare that majority of Linux software is buggy? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #16 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-31 15:22 ---
H.J. Lu: No, you only have to align the stack in functions that do 16-byte SSE. I mean this: there are two possible reasons for aligning the stack:

1) improved performance (double, long double, MMX, 8-byte SSE)
2) avoiding crashes (16-byte SSE)

To solve case 1), it is perfectly reasonable to not align the stack and simply make sure that every function adjusts the stack pointer by a multiple of 16 bytes. If, by chance, the code is called from some other code with an unaligned stack (like in that obstack example), the user will only suffer lower performance, not crashes. It is legal to make floating point calculations from an obstack allocation callback --- but it is not common. So we don't have to care about performance in this case.

To solve case 2), you need to explicitly realign the stack in the function prologue. But there are few functions that use SSE in a typical desktop/server environment, that's why I'm saying that it won't have a big impact on performance. Anyway, if some scientist needs high 16-byte SSE performance, we can make a flag for him that avoids the stack realign --- but don't compile a typical desktop/server distribution with this flag, because there ARE cases where the stack is misaligned.

That's what I'm proposing in comment #3: that every type has two alignments, preferred alignment and enforced alignment. Only enforced alignment will force a stack realign.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #17 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-31 15:31 --- Even if we align the incoming stack properly, we still have to align the outgoing stack to 16byte I'm not opposing it. What I mean is: every function will have stack frame size that is multiple of mpreferred-stack-boundary (16 bytes) --- it is what GCC is doing now. And additionally, there will be stack realign for functions that do 16-byte SSE math. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #19 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-31 16:17 --- (In reply to comment #18) Yes. But not an option. Make it default and make it optional to disable the alignment. Make it default, because such option would be useless if all libraries didn't use it. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #11 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-31 01:00 ---
So I did this experiment to see whether the stack is aligned in current Linux binaries. I applied this patch to gcc, so that generated code crashes on function entry if the function's stack is not aligned on 16 bytes.

diff -urp gcc-4.4.1/gcc/varasm.c gcc-4.4.1-test-align/gcc/varasm.c
--- gcc-4.4.1/gcc/varasm.c 2009-03-17 21:18:21.0 +0100
+++ gcc-4.4.1-test-align/gcc/varasm.c 2009-07-25 16:18:11.0 +0200
@@ -1760,6 +1760,8 @@ assemble_start_function (tree decl, cons
   /* Standard thing is just output label for the function. */
   ASM_OUTPUT_LABEL (asm_out_file, fnname);
 #endif /* ASM_DECLARE_FUNCTION_NAME */
+  if (!crtl->stack_realign_needed)
+    fputs("\tsubl\t$12, %esp\n\ttestl\t$15, %esp\n\tjz\t9f\n\tud2a\n9:\taddl\t$12, %esp\n", asm_out_file);
 }

 /* Output assembler code associated with defining the size of the

--- and the results are terrifying: Gcc didn't even bootstrap itself. It failed because it calls the glibc function obstack_init, which calls back to xmalloc - with a misaligned stack.

So I compiled gcc without bootstrap and tried to compile glibc-2.7 with it. Glibc compiles its integer-only code with -mpreferred-stack-boundary=2, so I changed it to -mpreferred-stack-boundary=4. Glibc didn't finish its build either (it failed when running some self-compiled scripts), but it at least produced libc.so.

So I tried to preload this libc.so with stack-alignment checking into various Linux binaries (with LD_PRELOAD) to see what happens. Out of 95 binaries in /bin/, only 23 succeeded! The remaining ones crashed because glibc was called with an unaligned stack (the distribution is up-to-date Debian Lenny).

The non-crashing binaries are: bzip2recover, cpio, dmesg, fgconsole, fuser, kill, loadkeys, lsmod, lvnet, mktemp, more (displays help only, crashes when attempting to display any file), mount, mountpoint, mt, mt-gnu, nbd-server, pidof, ping, ping6, run-parts, sed, su, tailf, umount

So anyone who is saying that the stack is aligned to 16 bytes has his mind disconnected from reality. It isn't. I find it very unreasonable that GCC developers try to declare their own ABI with an aligned stack --- and that conflicts with what is being used by the majority of Linux applications. GCC developers are trying to say that 3/4 of the programs in /bin/ are wrong because they don't align the stack.

I think you should really align the stack in the functions that do SSE math and not rely on the stack being already aligned. It is definitely easier to use the code for stack realignment than to declare that the majority of Linux binaries are BAD and need to be recompiled.

If some scientists need extreme performance and can't take the penalty of realigning the stack, you can add an option -massume-aligned-stack for them, and it is the responsibility of the given scientist that the code compiled with this option is never called back from libc or anything else. But don't assume stack alignment for general code. It just isn't true.

-- mikulas at artax dot karlin dot mff dot cuni dot cz changed:

           What    |Removed     |Added
         Status    |RESOLVED    |UNCONFIRMED
     Resolution    |DUPLICATE   |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
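The entry check that the experimental patch emits (subl $12, %esp; testl $15, %esp; jz; ud2a) can be modelled in C as follows. This is a sketch of my reading of the patch, not code from it: the function name `entry_stack_is_aligned` is hypothetical, and the explanation of the constant 12 assumes that the caller's stack was 16-byte aligned just before the call instruction pushed the 4-byte return address.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the check performed at function entry by the test patch.
 * On entry the 4-byte return address has just been pushed, so a caller
 * whose stack was 16-byte aligned before the call leaves %esp at 16n - 4.
 * Subtracting 12 more must then give a multiple of 16; otherwise the
 * patched code executes ud2a and the program crashes.
 */
bool entry_stack_is_aligned(uintptr_t esp_at_entry)
{
    return ((esp_at_entry - 12u) & 15u) == 0;
}
```

For example, an entry value of 0xbffff00c passes the check, while 0xbffff008 (a caller that kept only 4-byte alignment) does not.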
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #12 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-31 01:04 --- Created an attachment (id=18276) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18276&action=view) Crash because gcc assumes false stack alignment Here I'm submitting example code that, when compiled with gcc 4.4.1 with -O3 -march=pentium3, crashes on Debian Lenny. Don't close this bug unless this code is working! You can get it working either by modifying gcc to align the stack (IMHO the easier way) or by forcing all the distributions to recompile all their binaries because you want to declare a new ABI (IMHO harder or impossible). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug target/38496] Gcc misaligns arrays when stack is forced follow the x8632 ABI
--- Comment #26 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-31 01:18 --- Very unfortunately, gcc does assume stack alignment. The problem is not technical (the code to realign the stack is already there, it's easy to activate it), the problem is ideological, because some gcc developers think that they can declare their own ABI and the world will change according to them. See the comments in bug #40838. I added that alignment test at the beginning of every libc function; the result is that 75% of the programs in /bin misalign the stack and do not conform to the GCC developers' ideas about their new ABI. I also posted a simple example there that does floating point math in a function that is called as a callback from glibc --- and crashes because of SSE and an unaligned stack. I really say that they should align the stack in SSE functions instead of closing this bug with WONTFIX and shouting that everyone's code is bad because we declared it so! -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38496
[Bug rtl-optimization/40906] New: Wrong code generated for push of long double
Hi. Try this program:

#include <stdlib.h>

void f(long double a) { if (a != 1.0) abort(); }
int g(long double b) { f(b); return 0; }
int main(void) { g(1.0); return 0; }

Compile it with -O2 -mpush-args -mno-accumulate-outgoing-args -fomit-frame-pointer -fno-inline

In gcc 4.3.2 it works, in gcc-4.4.1 it aborts. If you add -m128bit-long-double, both gcc 4.3 and 4.4 fail.

The reason is that the push of the long double in the function g is badly generated --- it is pushing a value that is already on the stack, and while it is pushing it, the stack pointer changes. Gcc tries to compensate for it, but the code is buggy and it ends up pushing the wrong words.

-- Summary: Wrong code generated for push of long double Product: gcc Version: 4.4.1 Status: UNCONFIRMED Severity: major Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz GCC build triplet: i686-pc-linux-gnu GCC host triplet: i686-pc-linux-gnu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40906
[Bug rtl-optimization/40906] Wrong code generated for push of long double
--- Comment #1 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-29 17:07 --- Created an attachment (id=18270) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18270&action=view) A patch for the bug A patch for the problem. 1. Reverse the direction of the change_address loop. That fixes the 4.3-4.4 regression for 96-bit long double. 2. In the case of 128-bit long double, the value 4 is subtracted from the stack pointer before the copying starts. We need to compensate for it by adding 4 to the source address if the source address is on the stack. The fix for problem 2 should be backported to gcc-4.3 and earlier versions. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40906
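Why a push from a stack-relative source needs compensation can be seen in a toy model of a downward-growing stack. This is only an illustration of the general problem described in this bug, not the code from the attached patch; `struct machine`, `push` and `push_from_stack` are hypothetical names, and the stack is modelled in whole 4-byte words.

```c
#include <stdint.h>

enum { STACK_WORDS = 64 };

/* Tiny model of a stack that grows towards lower indices. */
struct machine {
    uint32_t mem[STACK_WORDS];
    unsigned sp; /* index of the word on top of the stack */
};

void push(struct machine *m, uint32_t v)
{
    m->mem[--m->sp] = v;
}

/*
 * Push a copy of an nwords-long value that already lives on the stack at
 * word offset `off` from the current stack pointer.
 * The highest word is pushed first. Every push lowers sp by one word, so
 * the next (lower) source word sits at the SAME sp-relative offset again:
 * off + nwords - 1. Using a decreasing offset here would read the wrong
 * words, which is the kind of mistake described in the report.
 */
void push_from_stack(struct machine *m, unsigned off, unsigned nwords)
{
    for (unsigned i = 0; i < nwords; i++)
        push(m, m->mem[m->sp + off + nwords - 1]);
}
```

After the call, the nwords words at the new top of the stack hold the same values, in the same order, as the source.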
[Bug rtl-optimization/40667] [4.4/4.5 Regression] stack frames are generated even with -fomit-frame-pointer
--- Comment #15 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-23 11:12 --- The patch just walks around the problem and doesn't really fix it. It is simply a hack that disables frame generation for long long, while for the other types the same problem persists: Example: void g(double); int f(double a) { g(a); return 0; } (compile with -O2 -fomit-frame-pointer -mpreferred-stack-boundary=2) --- here it still generates stack frame although it doesn't have to. Gcc 4.3 is correct. Stack frame needs to be generated if * the user requests it (no -fomit-frame-pointer) * there is alloca * there is stack realignment needed (either because of attribute that requests it or SSE variables are on the stack) --- in the above example, none of these three conditions are true, frame is not realigned, yet the frame pointer is generated. The logic for deciding whether to generate the frame pointer is flawed. The current double regression is not as critical as the long long but it is still incorrect. -- mikulas at artax dot karlin dot mff dot cuni dot cz changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED | http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40667
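The three conditions listed in the comment above can be written down as a single predicate. This is a minimal sketch of the rule the comment argues for, not GCC's actual logic; `struct frame_inputs` and `needs_frame_pointer` are hypothetical names.

```c
#include <stdbool.h>

/* Facts about one function that the rule from the comment looks at. */
struct frame_inputs {
    bool omit_frame_pointer;  /* -fomit-frame-pointer was given */
    bool uses_alloca;         /* the function calls alloca */
    bool needs_stack_realign; /* attribute request or SSE variables on the stack */
};

/* A frame pointer is needed only if at least one of the three conditions holds. */
bool needs_frame_pointer(const struct frame_inputs *in)
{
    return !in->omit_frame_pointer
        || in->uses_alloca
        || in->needs_stack_realign;
}
```

For the `f(double)` example in the comment, all three conditions are false when -fomit-frame-pointer is used, so this predicate says no frame pointer is needed.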
[Bug rtl-optimization/40667] [4.4/4.5 Regression] stack frames are generated even with -fomit-frame-pointer
--- Comment #16 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-23 11:18 ---
In the above example, the assembler output is:

f:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        fldl    8(%ebp)
        fstpl   (%esp)
        call    g
        xorl    %eax, %eax
        leave
        ret

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40667
[Bug rtl-optimization/40667] [4.4/4.5 Regression] stack frames are generated even with -fomit-frame-pointer
--- Comment #18 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-23 12:16 --- The bug is this: you don't align the stack and you generate the frame. Why? Why don't you do one of these?: * generate the frame and align * don't generate the frame and don't align these two would be reasonable, but generating the frame and not aligning is not. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40667
[Bug rtl-optimization/40838] New: gcc shouldn't assume that the stack is aligned
typedef int v4si __attribute__ ((vector_size (16)));

v4si y(v4si *s3) { return *s3; }
extern v4si s1, s2;
v4si x(void) { v4si s3 = s1 + s2; return y(&s3); }

Compile it with -O2 -fno-inline -msse2 -fomit-frame-pointer

The variable s3 is stored using an unaligned store (movdqu) and loaded using an aligned load (movdqa). -mpreferred-stack-boundary=4 doesn't guarantee stack alignment, it only advises that there is stack alignment (the function may be called from an OS callback, a signal, another library compiled with lesser alignment, etc... --- and i386 mandates only 4-byte stack alignment), so the use of movdqa is incorrect. (Does the GCC ABI mandate that all vector types must be aligned? If so, then movdqa is correct, but storing the variable on the stack and relying on the alignment from -mpreferred-stack-boundary=4 is not correct.)

Now, if you compile it with -mpreferred-stack-boundary=2, function x aligns the stack but uses movdqu to store on the aligned stack, so it generates suboptimal code.

-- Summary: gcc shouldn't assume that the stack is aligned Product: gcc Version: 4.4.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz GCC build triplet: i686-pc-linux-gnu GCC host triplet: i686-pc-linux-gnu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #3 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-23 13:15 ---
What I would propose to fix this and bug #40667:

Each type has an enforced alignment and a preferred alignment. Enforced alignment is what is needed to not crash and not violate the ABI, preferred alignment is the alignment that gives the best performance. For i386, all the types have an enforced alignment of 4 bytes, except the 128-bit SSE types, which have an enforced alignment of 16 bytes (the movdqa instruction crashes otherwise). Preferred alignment is 8 for double and 8-byte vector types and 16 for long double and 16-byte vector types.

Now, if the function has some variable with an enforced alignment greater than the ABI standard (4), the stack must be realigned. The ABI mandates 4-byte alignment and the function may be called from anywhere. As an optimization, the realign may be skipped if the function is static and it is proved to be called only from functions with greater or equal enforced alignment and an aligned stack size.

Each function aligns its stack size to -mpreferred-stack-boundary, which basically means that if the stack was aligned before (the most common case), performance will be good. But you can't rely on this for correctness, as in pathological cases the stack doesn't have to be aligned. As an optimization, if you can prove that the function will call only functions manipulating types with preferred alignment at most X and X is lower than -mpreferred-stack-boundary, you can lower the stack alignment to X (so that if there's a call graph of functions using only double, you don't have to align the stack on the default 16 bytes, 8 bytes is sufficient).

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
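The proposal above can be sketched as a small decision model. This is an illustration under stated assumptions, not GCC code: the names `type_align`, `must_realign_stack` and `round_frame_size` are hypothetical, the alignment values for double, long double and 16-byte vectors are taken from the comment, and the preferred alignment of 4 for int is my own assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* The two alignments the proposal attaches to every type (in bytes). */
struct type_align {
    unsigned enforced;  /* needed for correctness */
    unsigned preferred; /* best for performance */
};

enum { ABI_STACK_ALIGN = 4 }; /* what the i386 ABI guarantees */

const struct type_align ALIGN_INT         = { 4, 4 };   /* preferred 4 is assumed */
const struct type_align ALIGN_DOUBLE      = { 4, 8 };
const struct type_align ALIGN_LONG_DOUBLE = { 4, 16 };
const struct type_align ALIGN_V4SI        = { 16, 16 }; /* 128-bit SSE type */

/* Realign only if some local needs more than the ABI guarantees. */
bool must_realign_stack(const struct type_align *locals, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (locals[i].enforced > ABI_STACK_ALIGN)
            return true;
    return false;
}

/* Round the frame size up to the preferred boundary (a power of two). */
unsigned round_frame_size(unsigned size, unsigned boundary)
{
    return (size + boundary - 1) & ~(boundary - 1);
}
```

In this model a function with only int and double locals never realigns but still keeps its frame size a multiple of the preferred boundary, while a function with a 16-byte vector local always realigns.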
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #4 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-23 13:19 --- Linux/ix86 ABI says that stack must be 16 byte aligned. No it doesn't. There is a plenty of Linux code that doesn't have the stack aligned on 16-byte boundary. (at least anything that was compiled with the old gcc that didn't have -mpreferred-stack-boundary switch). Please don't change i386 ABI. AFAIK only MacOSX/x86 enforced aligned stack. -- mikulas at artax dot karlin dot mff dot cuni dot cz changed: What|Removed |Added Status|RESOLVED|UNCONFIRMED Resolution|DUPLICATE | http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug rtl-optimization/40667] [4.4/4.5 Regression] stack frames are generated even with -fomit-frame-pointer
--- Comment #20 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-23 13:28 --- I see, if it gets spilled to the stack as a local variable, it realigns the stack, if it doesn't get spilled, it doesn't. But shouldn't passing the variable as an argument on the stack be treated equal to spilling? It is the same instruction. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40667
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #7 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-23 13:49 --- See bug #27537, quoting GNU/Linux follows the SYSV x86 ABI which is documented, maybe you cannot find it but it does exist. The SYSV x86 ABI says the stack is aligned 4 byte aligned. That bug seems to reappear. As Agner noted, 16-byte stack alignment requirement also break compatibility with Intel CC. I found even some part of current glibc that violates this 16-byte alignment (calling push %eax; call exit from the assembler without aligning the stack size). Another point: if gcc realigns the stack, why then use movdqu to store the values on the stack? That is suboptimal. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug target/38496] Gcc misaligns arrays when stack is forced follow the x8632 ABI
--- Comment #23 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-23 14:34 ---
So, Joseph is basically arguing that it doesn't make sense to follow bad standards. Fine. So let's ignore the i386 ABI standard thing for a moment and look at the change from the practical point of view:

--- If we assume 16-byte stack alignment, who gets the advantage?
* some scientists doing number crunching, it will save the stack realign. Most desktop applications don't use SSE heavily (or at all). Maybe video players (most of them have SSE in assembler though and don't rely on gcc for SSE generation).

--- If we assume 16-byte stack alignment, what problems will it bring?
* anything called from inline assembler will have a possibility to fail. Assembler programmers don't know about this alignment requirement and have been writing pushl $0; pushl $1; call function; addl $8, %esp for ages.
* anything compiled by Intel CC, TCC or other compilers. Intel CC assumes 4-byte alignment and uses some algorithm to realign only at certain points (if the function can only be called from stack-aligned functions, it doesn't have to have the stack realigned). If Intel CC does only integer arithmetic, it aligns the stack only to 4 bytes. Intel CC-generated code calls glibc, which is compiled by gcc, so failures will come from there.
* anything autogenerated (java, dosbox, qemu, firefox 3.5...)
* anything compiled with gcc 2.95.* and earlier.

The worst thing about these failures is that they'll happen only very sporadically. The gcc autovectorizer doesn't generate vector code in most of glibc, so most of the code will be seemingly unaffected. If at a random place in some library gcc vectorizes something and that random place is called from any of the above code, the crash will happen. So you'll get crashes at random points.

To turn these random crashes into deterministic crashes, I suggest trying this: hack gcc to generate test $15, %esp; jnz abort at the beginning of every function. Compile the whole Linux distribution with this gcc. Test it (including various 3rd party Linux programs). If it works, come back to this debate later and propose how stacks should be 16-byte aligned.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38496
[Bug rtl-optimization/40838] gcc shouldn't assume that the stack is aligned
--- Comment #10 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-23 14:36 --- Jakub: so try that test $15, %esp; jnz abort at every function, as I proposed in bug #38496. There are much more places that will trigger this. Just go catch them. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[Bug regression/40665] dereferencing type-punned pointer warnings cannot be disabled
--- Comment #7 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-07 16:31 ---
> extern int c; int a(void) { return *(short *)(void *)&c; }
> This is a very bad example of a false positive as you are accessing an int as a short; that is undefined. I will look at your code later on, my laptop for home is currently broken.

Whether it is a false positive or not depends on the context. For example, if I call function a() from function b():

int b(void) { c = 0x12345678; __asm__ volatile ("":::"memory"); return a(); }

the code is valid and the function b() must return a fixed value depending on the endianness of the machine. That __asm__ statement works as a barrier that prevents the compiler from reordering the two accesses to c.

So I am not against the warning. The warning is good. The problem is that there is no way to shut up the warning.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40665
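For comparison, the same read can be expressed without any pointer cast by copying the bytes with memcpy. This is not the (void *) suppression that the report asks for, just a different technique that has defined behaviour and is not a type-punned dereference; the function name `read_leading_short` is hypothetical.

```c
#include <string.h>

/*
 * Read the first sizeof(short) bytes of an int object as a short.
 * Copying the object representation with memcpy has defined behaviour,
 * and the value obtained depends on the endianness of the machine.
 */
short read_leading_short(const int *p)
{
    short s;
    memcpy(&s, p, sizeof s);
    return s;
}
```

With a 32-bit int holding 0x12345678 and a 16-bit short, the result is 0x5678 on a little-endian machine and 0x1234 on a big-endian one.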
[Bug regression/40667] Performance regression: stack frames are generated even with -fomit-frame-pointer
--- Comment #3 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-07 16:48 --- Why do you limit your stack boundary artificially? Because if I don't use FP code there is no point in aligning the stack. Aligning the stack wastes stack space and code size and doesn't improve performance of integer code in any way. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40667
[Bug regression/40665] dereferencing type-punned pointer warnings cannot be disabled
--- Comment #8 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-07 16:45 --- Thus code is undefined you have an acess of a char array as a struct. Yes you are only taking the address of an element but it is still considered an acess by the standards. Why is it undefined? An object shall have its store value accessed only by an lvalue that has one of the following types ... * a character type So you say that converting the char * pointer to struct * pointer is understood as accessing the stored value by the standard? The only possible problem could be some hypothetical computer that cannot hold misaligned pointers to structs (common computers allow unaligned pointers and trap only on dereferencing unaligned pointers, not on generating them). But once I know that I have a computer that allows unaligned pointers to structs, there should be a method how to shut the warning up (with (void *) cast). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40665
[Bug regression/40665] dereferencing type-punned pointer warnings cannot be disabled
--- Comment #10 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-07 18:07 --- So you mean that that -x operator is invalid and break the standard? Anyway the standard means if you write your code according to the standard = the code will run correctly, but the inverse implication doesn't apply. It is sometimes required to break the C standard. For example, you write unsigned char *framebuffer = vga_getgraphmem(); and now you want to access the framebuffer. According to the standard, you could only do it by bytes. But that means one bus cycle to the videocard for every byte transfered. So people understand that common processors allow aligned accesses to 2, 4 or 8 bytes and that common videocards have their framebuffer base address aligned on same larger boundary --- and they simply cast the pointer to u_int32_t or u_int64_t and access the videoram faster. Each time you watch some video, remember those undefined memory accesses that are hapenning for you to get faster performance :) Another example --- the C standard says how it's not allowed to even produce a pointer that points before the allocated array or more than one entry after the last entry of the array and how it's not allowed to subtract two pointers from different arrays. It is perfectly rational --- unless you are writing the memory allocator itself! Then you inevitably must do some operations that are considered undefined by the standard. Regarding that int * to short * cast --- obviously, the code may run on some computer that has tagged memory and traps if access with invalid type is done. But common computers don't have tagged memory and the programmer should be allowed to do such casts if he understands the implications (for example, if its done only in arch-specific part of an operating system, it is perfectly legal). So there should be a method to silence the warning. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40665
[Bug regression/40665] dereferencing type-punned pointer warnings cannot be disabled
--- Comment #12 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-07 19:40 --- So if there was char *buffer = malloc(512) instead of char buffer[512], would it be correct to cast it to the pointer to structure? And it is not about the cast between the pointer types which causes it to be undefined but rather the accesses. What is considered the access in that code? Is it the - operator? operator? = operator? Cast operator? Or anything else? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40665
[Bug regression/40665] New: dereferencing type-punned pointer warnings cannot be disabled
Gcc recently (4.4) became very bad regarding false positive type-punned warnings. In previous versions, the warnings could be suppressed by casting to (void *): in 3.x and 4.1 it worked perfectly, in 4.3 it still worked somehow (except in -Wstrict-aliasing=3 mode), in 4.4 there are cases where it doesn't work at all.

I don't want to completely disable the warnings with -Wno-strict-aliasing (this could leave bugs unnoticed), but I need a method to disable them on a case-by-case basis once I have verified that the code in question is correct.

Simple example, compile with -O2 -Wall:

extern int c;
int a(void) { return *(short *)(void *)&c; }

In 4.4 the warning can't be disabled at all! The (void *) cast doesn't suppress the warning and none of the three options to -Wstrict-aliasing helps. In 4.3 the cast to (void *) suppressed the warning in -Wstrict-aliasing 1,2 modes (and didn't suppress it in the default mode 3).

Gcc developers tried to make these warnings more intelligent with fewer false positives, but unfortunately they completely broke the method to disable them in a specific case. For me, false positives are not a major problem --- when I get a false positive, I just read the code, check it and if I conclude that it's OK, I disable the warning with (void *). But if there's no way to disable false positives, it makes the warnings completely useless.

-- Summary: dereferencing type-punned pointer warnings cannot be disabled Product: gcc Version: 4.4.1 Status: UNCONFIRMED Severity: major Priority: P3 Component: regression AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz GCC build triplet: i686-pc-linux-gnu GCC host triplet: i686-pc-linux-gnu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40665
[Bug regression/40665] dereferencing type-punned pointer warnings cannot be disabled
--- Comment #1 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-07 01:22 --- Created an attachment (id=18145) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18145&action=view) a bug in -Wstrict-aliasing=3 This is an example of a flaw in -Wstrict-aliasing=3 (this mode is very bad, it produces many false positives on my project and I'm wondering why it is the default). The gcc man page says that -Wstrict-aliasing=3 produces fewer false positives than -Wstrict-aliasing=2. This is a counterexample: it produces a type-punned warning in -Wstrict-aliasing=3 mode and doesn't warn in -Wstrict-aliasing=2. I added (void *) casts everywhere, but they don't quash the warning. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40665
[Bug regression/40665] dereferencing type-punned pointer warnings cannot be disabled
--- Comment #2 from mikulas at artax dot karlin dot mff dot cuni dot cz 2009-07-07 01:34 --- Created an attachment (id=18146) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18146&action=view) a bug in -Wstrict-aliasing=3 This is an example of a flaw in -Wstrict-aliasing=3 (this mode is very bad, it produces many false positives on my project and I'm wondering why it is the default). The gcc man page says that -Wstrict-aliasing=3 produces fewer false positives than -Wstrict-aliasing=2. This is a counterexample: it produces a type-punned warning in -Wstrict-aliasing=3 mode and doesn't warn in -Wstrict-aliasing=2. I added (void *) casts everywhere, but they don't quash the warning. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40665
[Bug regression/40667] New: Performance regression: stack frames are generated even with -fomit-frame-pointer
This is a performance regression from 4.3 (which was better). On i386, when -O2 -fomit-frame-pointer -mpreferred-stack-boundary=2 is used and a function operates on long long values, a stack frame is generated, although it doesn't have to be.

Example:

int f(long long x);
int g(long long x) { f(x); return 0; }

Generated code:

        .text
        .p2align 4,,15
.globl g
        .type   g, @function
g:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    8(%ebp), %eax
        movl    12(%ebp), %edx
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        call    f
        xorl    %eax, %eax
        leave
        ret
        .size   g, .-g

Gcc 4.3 didn't generate a stack frame in this case. On i386, a spurious stack frame is especially bad because there are few registers and one register is lost to the frame pointer. One of my programs doing heavy 64-bit math shows an almost 1% code size increase because of these unneeded frames.

-- Summary: Performance regression: stack frames are generated even with -fomit-frame-pointer Product: gcc Version: 4.4.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: regression AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz GCC build triplet: i686-pc-linux-gnu GCC host triplet: i686-pc-linux-gnu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40667
[Bug other/36781] New: gcc can't be compiled in an environment that requires CFLAGS
If you have an environment where special CFLAGS are needed for compilation, gcc doesn't compile (in my case it was Sparc, which requires -m64). The configure/make scripts are so messed up that they eventually drop CFLAGS and try to compile parts of gcc without them.

How to reproduce on any system: move the file /usr/bin/gcc to /usr/bin/gcc.x. Create a new /usr/bin/gcc containing this script:

    #!/bin/sh
    # Refuse to run unless the marker flag was passed through.
    if [ -z "$(echo "$@" | grep -- -fomit-frame-pointer)" ]; then
        echo GCC WAS EXECUTED WITHOUT CFLAGS
        exit 1
    fi
    gcc.x "$@"

Unpack gcc, set CFLAGS, CPPFLAGS and LDFLAGS to -fomit-frame-pointer and try to compile. gcc attempts to compile libiberty without -fomit-frame-pointer and fails.

I tried setting the variables before configure and on the make command line, and also tried setting LIBCFLAGS and BOOT_CFLAGS, but none of it helps.

The build scripts should be fixed so that they do not lose CFLAGS, and the proper method for setting the flags (before ./configure, like most packages? or make CFLAGS=flags?) should be documented. The build stdout should be grepped for lines that contain the compiler but not the CFLAGS, to check that CFLAGS are not lost in any part of the compilation.

--
Summary: gcc can't be compiled in an environment that requires CFLAGS
Product: gcc
Version: 4.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz
GCC build triplet: sparc64-unknown-linux-gnu
GCC host triplet: sparc64-unknown-linux-gnu
GCC target triplet: sparc64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36781
[Bug target/35504] New: incorrect code generated on i386 for C++ multiple inheritance, large return structures and regparm or fastcall calling conventions
Hi,

When GCC generates virtual methods for objects with multiple inheritance, it creates special thunk functions that adjust the this pointer and jump to the original method. When a member function returns a structure, the first argument is a pointer to where the returned structure should be placed, the second argument is the this pointer, and the remaining arguments are the arguments of the member function.

When using regparm 2 or 3, the first argument (pointer to the return structure) is in EAX, the second argument (this) is in EDX, and additional arguments are in ECX and on the stack. The thunk is generated incorrectly and always tries to adjust EAX, corrupting both the this pointer and the return value. Similarly, with the fastcall convention, the pointer to the return structure is in ECX and this is in EDX, yet the thunk adjusts ECX.

This bug is present in all GCC releases.

--
Summary: incorrect code generated on i386 for C++ multiple inheritance, large return structures and regparm or fastcall calling conventions
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35504
[Bug target/35504] incorrect code generated on i386 for C++ multiple inheritance, large return structures and regparm or fastcall calling conventions
--- Comment #1 from mikulas at artax dot karlin dot mff dot cuni dot cz 2008-03-08 04:51 --- Created an attachment (id=15279) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15279&action=view) A testcase for the bug A testcase for thunk functions with all calling conventions. It fails with regparm(1), regparm(2), regparm(3) and fastcall conventions. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35504
[Bug target/35504] incorrect code generated on i386 for C++ multiple inheritance, large return structures and regparm or fastcall calling conventions
--- Comment #2 from mikulas at artax dot karlin dot mff dot cuni dot cz 2008-03-08 04:55 --- Created an attachment (id=15280) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15280&action=view) a patch for the bug

A patch for gcc 4.3.0. When the function returns an aggregate value:
--- in fastcall mode, adjust %EDX, not %ECX
--- in regparm(1) mode, adjust 4(%ESP), not %EAX
--- in regparm(2) and regparm(3) mode, adjust %EDX, not %EAX

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35504
[Bug target/12081] Gcc can't be compiled with -mregparm=3
--- Comment #16 from mikulas at artax dot karlin dot mff dot cuni dot cz 2007-10-28 16:09 --- Subject: Re: Gcc can't be compiled with -mregparm=3

arguments the function receives. We have gen_* functions taking 0, 1, 2, 3, ... arguments and with GCC being designed the way it is, they need to be prototyped and defined with the same arguments. You are most definitely welcome to post some non-varargs code which works. You are most definitely also welcome to time it against my patches.

You can cast them at the time of calling and store them as void * in the table; that is standard-compliant. Or you can define

    union gen_function {
        rtx (*one_arg)(rtx);
        rtx (*two_args)(rtx, rtx);
        rtx (*three_args)(rtx, rtx, rtx);
        ... etc;
    };

This will be correct C without the performance impact of varargs.

Mikulas

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12081
[Bug rtl-optimization/21299] New: internal error on invalid asm statement (3.2, 3.3, 3.4, 4.0)
The following incorrect code compiled with -O, -O2 or -O3 generates an internal error. The bug happens on gcc-3.2.3, gcc-3.3.3, gcc-3.4.3 and gcc-4.0.0.

    void f(unsigned long long a)
    {
    __asm__ ("nop"::"ad"(a));
    }

    as.c: In function 'f':
    as.c:4: error: unrecognizable insn:
    (insn:HI 12 7 13 0 (parallel [
                (asm_operands/v ("nop") ("") 0 [
                        (reg/v:DI 0 ax [orig:58 a ] [58])
                    ]
                     [
                        (asm_input:DI ("ad"))
                    ] ("as.c") 3)
                (clobber (reg:QI 19 dirflag))
                (clobber (reg:QI 18 fpsr))
                (clobber (reg:QI 17 flags))
            ]) -1 (insn_list:REG_DEP_TRUE 6 (nil))
        (nil))
    as.c:4: internal compiler error: in reload_cse_simplify_operands, at postreload.c:391
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <URL:http://gcc.gnu.org/bugs.html> for instructions.

--
Summary: internal error on invalid asm statement (3.2, 3.3, 3.4, 4.0)
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: minor
Priority: P2
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21299
[Bug c/20645] New: Complilation success depends on optimization being used
When this function is compiled with -O, it works. When it's compiled without -O, it reports the error

    error: can't find a register in class `GENERAL_REGS' while reloading `asm'

I think that the syntactic correctness of the language shouldn't depend on optimization flags: it should either report an error in both cases or always succeed.

    void f(char *p)
    {
        asm volatile (""::"m"(p[0]),"m"(p[0]),"m"(p[0]),"m"(p[0]),"m"(p[0]),"m"(p[0]),"m"(p[0]),"m"(p[0]));
    }

--
Summary: Complilation success depends on optimization being used
Product: gcc
Version: 3.4.3
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20645
[Bug c/18063] New: Gcc doesn't check overflowed size of structure
The following code compiles and runs, but shouldn't, because the size of struct a overflows the size_t type. An overflowed size is checked for arrays and for global and local variables, but not for structures.

    struct a {
        char x[0x7fff];
        char b[0x7fff];
        char c[3];
    };

    main()
    {
        struct a *b = malloc(sizeof(struct a));
        return sizeof (struct a);
    }

--
Summary: Gcc doesn't check overflowed size of structure
Product: gcc
Version: 3.4.2
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: mikulas at artax dot karlin dot mff dot cuni dot cz
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18063
[Bug c/18063] Gcc doesn't check overflowed size of structure
--- Additional Comments From mikulas at artax dot karlin dot mff dot cuni dot cz 2004-10-19 17:32 --- Subject: Re: Gcc doesn't check overflowed size of structure

If you rewrite it to

    int main(void)
    {
        size_t c = sizeof(struct a);
        struct a *b = malloc(c);
        return sizeof (struct a);
    }

it doesn't give a warning with -W -Wall (except for the unused b).

BTW, for an array that is too large it gives an error, so I think it should for a structure too.

Mikulas

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18063