Re: [fpc-devel] i386-linux switched to a 16 byte aligned stack
On Mon, 16 Sep 2019 at 14:58, Ben Grasset wrote:
> On Sun, Sep 15, 2019 at 1:36 PM Florian Klämpfl wrote:
>> In r43005 to 43014 I committed a couple of patches so FPC generates
>> stack frames aligned to 16 byte boundaries on i386-linux
>
> Good change! Means, for example, the long-standing issues with popular
> libraries like SDL2 on 32-bit Linux won't be a problem anymore.

Wow, I almost forgot about this train wreck. For us it was opencv and we luckily had our own library between fpc and opencv so we could add -mstackrealign.

Henry
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] i386-linux switched to a 16 byte aligned stack
On Sun, Sep 15, 2019 at 1:36 PM Florian Klämpfl wrote:
> In r43005 to 43014 I committed a couple of patches so FPC generates
> stack frames aligned to 16 byte boundaries on i386-linux

Good change! Means, for example, the long-standing issues with popular libraries like SDL2 on 32-bit Linux won't be a problem anymore.
Re: [fpc-devel] i386-linux switched to a 16 byte aligned stack
Ah whoops, misunderstood. Only for i386-linux, not i386-win32 as well. Would there be benefits to aligning the stack on that platform as well though?

Gareth aka. Kit

On 16/09/2019 13:32, J. Gareth Moreton wrote:
> It's a useful feature as far as hand-written and generated assembly language is concerned. The Intel SIMD instruction sets work far better with aligned memory (e.g. you can use MOVAPS instead of MOVUPS, the former being faster on older CPUs but triggering a segmentation fault if the memory is unaligned).
>
> Granted, while vectorcall currently only works on x86_64-win64 because I was able to re-use the code for the System V ABI, with an aligned stack it might make it potentially easier to port it to i386-win32 eventually (under Microsoft Visual C++, __vectorcall is supported on 32-bit platforms by only using ECX and EDX as the integer registers... the same as __fastcall... speaking of 'fastcall' I do wonder if it's worth implementing that calling convention in case one wants to communicate with an external library that uses the convention).
>
> Gareth aka. Kit
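The MOVAPS versus MOVUPS point raised in this thread comes down to whether an address sits on a 16-byte boundary. A minimal sketch in C of that check follows; the names `load_kind` and `pick_load` are hypothetical, and it relies on the usual (implementation-defined) conversion of a pointer to `uintptr_t`. It does not issue any SIMD instruction itself, it only shows the decision.

```c
#include <stdint.h>

/* Hypothetical names for this sketch: which kind of 128-bit load is safe. */
enum load_kind {
    LOAD_ALIGNED,   /* MOVAPS-style: requires a 16-byte aligned address */
    LOAD_UNALIGNED  /* MOVUPS-style: works for any address */
};

/* Pick the load kind for address p based on its low four bits. */
static enum load_kind pick_load(const void *p)
{
    return ((uintptr_t)p % 16u) == 0 ? LOAD_ALIGNED : LOAD_UNALIGNED;
}
```

With a 16-byte aligned stack, a compiler or assembly author can place such buffers in the frame and know at compile time that the aligned form is safe, instead of testing at run time as above.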
Re: [fpc-devel] i386-linux switched to a 16 byte aligned stack
It's a useful feature as far as hand-written and generated assembly language is concerned. The Intel SIMD instruction sets work far better with aligned memory (e.g. you can use MOVAPS instead of MOVUPS, the former being faster on older CPUs but triggering a segmentation fault if the memory is unaligned).

Granted, while vectorcall currently only works on x86_64-win64 because I was able to re-use the code for the System V ABI, an aligned stack might make it easier to port it to i386-win32 eventually (under Microsoft Visual C++, __vectorcall is supported on 32-bit platforms by only using ECX and EDX as the integer registers... the same as __fastcall... speaking of 'fastcall', I do wonder if it's worth implementing that calling convention in case one wants to communicate with an external library that uses the convention).

Gareth aka. Kit

On 15/09/2019 21:07, Florian Klämpfl wrote:
> On 15.09.19 at 19:35, Florian Klämpfl wrote:
>> In r43005 to 43014 I committed a couple of patches so FPC generates stack frames aligned to 16 byte boundaries on i386-linux (before a call instruction, esp is divisible by 16). This is done because it seems that Linux libraries are starting to depend on this property, which gcc has ensured for around 20 years. To ensure this, FPC uses the same approach as clang (and as FPC uses for i386-darwin): esp has a fixed value fulfilling the alignment requirements during the whole procedure. Outgoing parameters are copied onto the stack by mov instead of push instructions.
>>
>> The consequences of these changes are:
>> - For pure Pascal programs, this does not change anything. The resulting code might be slightly bigger, but in turn floating point code might be faster as double values can be properly aligned now.
>> - Most assembler code is not affected by the change. Only code using constants to access the stack via esp might be affected; such code is rare.
>> - Assembler code calling other procedures should be adapted to keep the stack aligned to 16 byte boundaries as well. Assembler code working on i386-darwin fulfills this requirement already. The define FPC_STACKALIGNMENT contains the alignment of the stack (16 in the case of i386-linux).
>> - To test if the stack is always properly aligned, compile with -Ct: the stack checking code for i386-linux now checks the stack alignment as well.
>
> One thing (and actually an important one) I forgot to mention: this also means that the regcall calling conventions we use by default on i386-linux now use a caller-cleared stack. I forgot about it because even our regression tests did not find this. OTOH it means that probably little code out there is affected by this; an exception might be PascalScript.
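For anyone adapting hand-written assembler, the invariant described above (esp divisible by 16 before a call, with esp adjusted once and then held fixed) boils down to a little arithmetic. The following is a minimal sketch in C of that arithmetic only. It is not FPC's code generator, and `STACK_ALIGN`, `esp_is_aligned` and `frame_adjust` are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Alignment discussed in the thread (FPC exposes it as the define
   FPC_STACKALIGNMENT). STACK_ALIGN is a hypothetical name for this sketch. */
#define STACK_ALIGN ((uintptr_t)16)

/* Nonzero if esp meets the "divisible by 16 before a call" invariant. */
static int esp_is_aligned(uintptr_t esp)
{
    return (esp % STACK_ALIGN) == 0;
}

/* Bytes to subtract from esp once, in the prologue, so that at least
   `needed` bytes (locals plus outgoing parameters) are reserved and the
   resulting esp is a multiple of STACK_ALIGN. Assumes esp >= needed. */
static size_t frame_adjust(uintptr_t esp, size_t needed)
{
    uintptr_t target = (esp - needed) & ~(STACK_ALIGN - 1);
    return (size_t)(esp - target);
}
```

As a worked case: a caller with esp = 0x1000 executes a call (esp = 0xFFC), the callee pushes ebp (esp = 0xFF8) and needs 20 bytes. `frame_adjust(0xFF8, 20)` gives 24, leaving esp = 0xFE0, which is aligned again before the callee makes its own calls.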