Re: [PATCH] Fix the GNU Stack markings on libgcc.a
On 05/02/2018 06:17 PM, Magnus Granberg wrote: > torsdag 3 maj 2018 kl. 01:07:51 CEST skrev Daniel Santos: >> Hello >> >> On 05/01/2018 06:32 AM, Magnus Granberg wrote: >>> New patch >>> libgcc/ChangeLog: >>> >>> 2018-05-01 Magnus Granberg <zo...@gentoo.org> >>> >>> * config/i386/resms64.h: Add .note.GNU-stack section >>> * config/i386/resms64f.h: Likewise. >>> * config/i386/resms64fx.h: Likewise. >>> * config/i386/resms64x.h: Likewise. >>> * config/i386/savms64.h: Likewise. >>> * config/i386/savms64f.h: Likewise. >>> >>> --- >> Well this isn't correct either because you are outside of the inclusion >> guard. Can you please move this up a line? >> >> Thanks, >> Daniel > /libgcc/ChangeLog: > 2018-05-01 Magnus Granberg <zo...@gentoo.org> > > * config/i386/resms64.h: Add .note.GNU-stack section > * config/i386/resms64f.h: Likewise. > * config/i386/resms64fx.h: Likewise. > * config/i386/resms64x.h: Likewise. > * config/i386/savms64.h: Likewise. > * config/i386/savms64f.h: Likewise. > > --- No, I meant to move the changes up a line so that, if for some reason the header was included twice, that it wouldn't output the section twice. Example: MS2SYSV_STUB_END(savms64_18) +#if·defined(__linux__)·&&·defined(__ELF__) +.section·.note.GNU-stack,"",%progbits +#endif #endif·/*·__x86_64__·*/ But upon further reflection, I think it can be cleanly added to i386-asm.h. Does that look sane Jakub? (I haven't tried it) Also, for the sake of my education, I don't exactly understand what the problem is as I haven't been keeping up with pax and hardening. I just want to clarify that the stack shouldn't be executable. These are not actual "functions" per-se (i.e., they do not adhere to any ABI), they operate on the stack of the calling function. Thanks, Daniel diff --git a/libgcc/config/i386/i386-asm.h b/libgcc/config/i386/i386-asm.h index 267133a9b75..7eb3c12fc85 100644 --- a/libgcc/config/i386/i386-asm.h +++ b/libgcc/config/i386/i386-asm.h @@ -80,6 +80,10 @@ ASMNAME(fn): #ifdef MS2SYSV_STUB_PREFIX +# if·defined(__linux__)·&&·defined(__ELF__) +.section·.note.GNU-stack,"",%progbits +# endif + # define MS2SYSV_STUB_BEGIN(base_name) \ HIDDEN_FUNC(PASTE2(MS2SYSV_STUB_PREFIX, base_name))
Re: [PATCH] Fix the GNU Stack markings on libgcc.a
Hello On 05/01/2018 06:32 AM, Magnus Granberg wrote: > New patch > libgcc/ChangeLog: > > 2018-05-01 Magnus Granberg> > * config/i386/resms64.h: Add .note.GNU-stack section > * config/i386/resms64f.h: Likewise. > * config/i386/resms64fx.h: Likewise. > * config/i386/resms64x.h: Likewise. > * config/i386/savms64.h: Likewise. > * config/i386/savms64f.h: Likewise. > > --- Well this isn't correct either because you are outside of the inclusion guard. Can you please move this up a line? Thanks, Daniel
Re: [PATCH] Correct debug for -mcall-ms2sysv-xlogues stubs (PR target/83917, take 2)
On 02/26/2018 02:20 AM, Jakub Jelinek wrote: > On Sun, Feb 25, 2018 at 05:56:28PM -0600, Daniel Santos wrote: >>> --- libgcc/config/i386/i386-asm.h.jj2018-01-03 10:42:56.317763517 >>> +0100 >>> +++ libgcc/config/i386/i386-asm.h 2018-02-22 15:33:43.812922298 +0100 >>> @@ -27,8 +27,47 @@ see the files COPYING3 and COPYING.RUNTI >>> #define I386_ASM_H >>> >>> #include "auto-target.h" >>> +#undef PACKAGE_VERSION >>> +#undef PACKAGE_NAME >>> +#undef PACKAGE_STRING >>> +#undef PACKAGE_TARNAME >>> +#undef PACKAGE_URL >> This is a beautiful, temporary(?) fix to an ugly problem! >> >>> #include "auto-host.h" >>> --- libgcc/config/i386/cygwin.S.jj 2018-01-03 10:42:56.309763515 +0100 >>> +++ libgcc/config/i386/cygwin.S 2018-02-22 15:30:34.597925496 +0100 >>> @@ -23,31 +23,13 @@ >>> * <http://www.gnu.org/licenses/>. >>> */ >>> >>> -#include "auto-host.h" >> The following include should be here. >> >> +#include "i386-asm.h" > I don't understand this. i386-asm.h needs (both before my patch and after > it) both auto-host.h and auto-target.h, as it tests > HAVE_GAS_SECTIONS_DIRECTIVE (this one newly, comes from cygwin.S) The problem is that HAVE_GAS_SECTIONS_DIRECTIVE gets defined (or not) in ../../gcc/auto-host.h, but you are testing it before including auto-host.h, either directly or via i386-asm.h. So if i386-asm.h depends upon HAVE_GAS_SECTIONS_DIRECTIVE first being defined then it is a circular dependency. In its current form, cygwin.S would never define USE_GAS_CFI_DIRECTIVES prior to including i386-asm.h and also never emit .cfi_sections .debug_frame and rather or not USE_GAS_CFI_DIRECTIVES ends up being defined to 1 or 0 depends upon the test of __GCC_HAVE_DWARF2_CFI_ASM in i386-asm.h. So this area is new for me, but I don't understand why we're testing HAVE_GAS_SECTIONS_DIRECTIVE in cygwin.S and __GCC_HAVE_DWARF2_CFI_ASM when included from one of the stubs. Is this an error, or a lack of my understanding or both? :) > HAVE_GAS_HIDDEN > macros defined in auto-host.h > and > HAVE_AS_AVX > macro defined in auto-target.h. > Including auto-host.h when i386-asm.h will include it again just doesn't > work, these headers don't have multiple inclusion guards. And only including > auto-target.h will work only if the > .hidden > and > .cfi_sections .debug_frame > tests are duplicated from gcc/configure.ac to libgcc/configure.ac, then we > could include just auto-target.h in i386-asm.h. > I've just followed what i386-asm.h has been doing. And it's possible that I failed to test something correctly before presuming it to be available, although I *think* the test for .hidden is good. > > Jakub > Thanks for your work on this. If we need to test for CFI directives differently when being included from cygwin.S, maybe we can just define a simple cpp macro to indicate this and let i386-asm.h encapsulate the implementation of it (e.g., testing HAVE_GAS_SECTIONS_DIRECTIVE or __GCC_HAVE_DWARF2_CFI_ASM as appropriate). Ultimately, the proper cleanup will be moving these tests out of {gcc,libgcc}/configure.ac and into .m4 files in the root config directory so that we don't uglify them with massive copy & pastes. These tests are also fairly complex as there are a lot of dependencies. m4 isn't my strong suite, but I can look at this after we're out of code freeze. Daniel
Re: [PATCH] Correct debug for -mcall-ms2sysv-xlogues stubs (PR target/83917, take 2)
Sorry for the dropping the ball on this and thank you Jakub for stepping in! I've had a patch set sort-of rotting in my local repo, but I like yours better. I think I had gotten hung up on trying to figure out how to write a test for this, and like you I just tested mine manually in gdb. I do have one correction though. On 02/22/2018 08:56 AM, Jakub Jelinek wrote: > Hi! > > On Sat, Jan 20, 2018 at 06:01:16PM -0600, Daniel Santos wrote: >> Thanks. I like the idea of commonizing the macros for consistency. > Didn't see a progress on this P3 for a while, so I've written this > version of the patch; no tests though, what I've been using in testing was: > /* { dg-do compile { target lp64 } } */ > /* { dg-options "-mno-avx -msse2 -mcall-ms2sysv-xlogues -O2" } */ > > void __attribute__((sysv_abi, noipa)) > foo (void) > { > } > > static void __attribute__((sysv_abi)) (*volatile foop) () = foo; > > void __attribute__((ms_abi, noipa)) > bar (void) > { > foop (); > } > > int > main () > { > bar (); > return 0; > } > > with/without -fno-omit-frame-pointer, disas bar; b on the tail > call in there, stepi; bt (which before the patch failed, now works), > also up; p $rbp to see if %rbp has been properly declared to be saved. > There is no need to cfi_startproc/cfi_endproc for every single entrypoint in > there, it is enough if the whole range is covered. On the other side > we need the cfi_offset for the frame pointer case, otherwise up; p/x $rbp > doesn't work properly. > > Ok for trunk if it passes bootstrap/regtest on x86_64-linux and i686-linux? > > 2018-02-22 Jakub Jelinek <ja...@redhat.com> > > PR debug/83917 > * config/i386/i386-asm.h (PACKAGE_VERSION, PACKAGE_NAME, > PACKAGE_STRING, PACKAGE_TARNAME, PACKAGE_URL): Undefine between > inclusion of auto-target.h and auto-host.h. > (USE_GAS_CFI_DIRECTIVES): Define if not defined already based on > __GCC_HAVE_DWARF2_CFI_ASM. > (cfi_startproc, cfi_endproc, cfi_adjust_cfa_offset, > cfi_def_cfa_register, cfi_def_cfa, cfi_register, cfi_offset, cfi_push, > cfi_pop): Define. > * config/i386/cygwin.S: Don't include auto-host.h here, just > define USE_GAS_CFI_DIRECTIVES to 1 or 0 and include i386-asm.h. > (cfi_startproc, cfi_endproc, cfi_adjust_cfa_offset, > cfi_def_cfa_register, cfi_register, cfi_push, cfi_pop): Remove. > * config/i386/resms64fx.h: Add cfi_* directives. > * config/i386/resms64x.h: Likewise. > > --- libgcc/config/i386/i386-asm.h.jj 2018-01-03 10:42:56.317763517 +0100 > +++ libgcc/config/i386/i386-asm.h 2018-02-22 15:33:43.812922298 +0100 > @@ -27,8 +27,47 @@ see the files COPYING3 and COPYING.RUNTI > #define I386_ASM_H > > #include "auto-target.h" > +#undef PACKAGE_VERSION > +#undef PACKAGE_NAME > +#undef PACKAGE_STRING > +#undef PACKAGE_TARNAME > +#undef PACKAGE_URL This is a beautiful, temporary(?) fix to an ugly problem! > #include "auto-host.h" > > +#ifndef USE_GAS_CFI_DIRECTIVES > +# ifdef __GCC_HAVE_DWARF2_CFI_ASM > +# define USE_GAS_CFI_DIRECTIVES 1 > +# else > +# define USE_GAS_CFI_DIRECTIVES 0 > +# endif > +#endif > +#if USE_GAS_CFI_DIRECTIVES > +# define cfi_startproc() .cfi_startproc > +# define cfi_endproc() .cfi_endproc > +# define cfi_adjust_cfa_offset(X).cfi_adjust_cfa_offset X > +# define cfi_def_cfa_register(X) .cfi_def_cfa_register X > +# define cfi_def_cfa(R,O).cfi_def_cfa R, O > +# define cfi_register(D,S) .cfi_register D, S > +# define cfi_offset(R,O) .cfi_offset R, O > +# ifdef __x86_64__ > +# define cfi_push(X).cfi_adjust_cfa_offset 8; > .cfi_rel_offset X, 0 > +# define cfi_pop(X) .cfi_adjust_cfa_offset -8; .cfi_restore X > +# else > +# define cfi_push(X).cfi_adjust_cfa_offset 4; > .cfi_rel_offset X, 0 > +# define cfi_pop(X) .cfi_adjust_cfa_offset -4; .cfi_restore X > +# endif > +#else > +# define cfi_startproc() > +# define cfi_endproc() > +# define cfi_adjust_cfa_offset(X) > +# define cfi_def_cfa_register(X) > +# define cfi_def_cfa(R,O) > +# define cfi_register(D,S) > +# define cfi_offset(R,O) > +# define cfi_push(X) > +# define cfi_pop(X) > +#endif > + > #define PASTE2(a, b) PASTE2a(a, b) > #define PASTE2a(a, b) a ## b > > --- libgcc/config/i386/cygwin.S.jj2018-01-03 10:42:56.309763515 +0100 > +++ libgcc/config/i386/cygwin.S 2018-02-22 15:30:34.597925496 +0100 > @@ -23,31 +23,13 @@ > * <http://www.gnu.org/licenses/>. > */ > > -#include "auto-host.h&quo
Re: [PATCH, x86, libgcc] PR target/83917 Correct debug for -mcall-ms2sysv-xlogues stubs
On 01/19/2018 05:35 PM, Jakub Jelinek wrote: > On Fri, Jan 19, 2018 at 05:33:10PM -0600, Daniel Santos wrote: >> When stepping through tail-call restore stubs the debugger has to assume >> that rsp - 8 is the CFA, although it is not. This is because I did not >> explicitly add any .cfi directives. This patch adds them to the >> tail-call restore stubs, but this is new territory for me, so I would >> appreciate feedback. >> >> I've reg-tested on x86_64, but I still need to test on Solaris and >> Darwin. OK to commit after those tests? > I think you can't assume that the assembler supports .cfi_* directives. > While e.g. libgcc/config/i386/morestack.S uses them unconditionally, > it is guarded with: > if test "$libgcc_cv_cfi" = "yes"; then > tmake_file="${tmake_file} t-stack i386/t-stack-i386" > fi Ah hah! That explains a lot. Yeah, I wasn't thinking all assemblers would support it but I saw them in the Solaris assembler manual and figured that they were maybe more widely supported than I had thought. > in config.host. E.g. cygwin.S has: > #ifdef HAVE_GAS_CFI_SECTIONS_DIRECTIVE > .cfi_sections .debug_frame > # define cfi_startproc().cfi_startproc > # define cfi_endproc() .cfi_endproc > # define cfi_adjust_cfa_offset(X) .cfi_adjust_cfa_offset X > # define cfi_def_cfa_register(X).cfi_def_cfa_register X > # define cfi_register(D,S) .cfi_register D, S > # ifdef __x86_64__ > # define cfi_push(X) .cfi_adjust_cfa_offset 8; .cfi_rel_offset X, 0 > # define cfi_pop(X).cfi_adjust_cfa_offset -8; .cfi_restore X > # else > # define cfi_push(X) .cfi_adjust_cfa_offset 4; .cfi_rel_offset X, 0 > # define cfi_pop(X).cfi_adjust_cfa_offset -4; .cfi_restore X > # endif > #else > # define cfi_startproc() > # define cfi_endproc() > # define cfi_adjust_cfa_offset(X) > # define cfi_def_cfa_register(X) > # define cfi_register(D,S) > # define cfi_push(X) > # define cfi_pop(X) > #endif /* HAVE_GAS_CFI_SECTIONS_DIRECTIVE */ > perhaps you need something similar or commonize that (though, without > .cfi_sections, you want the default). > > Jakub Thanks. I like the idea of commonizing the macros for consistency. As far as adding tests, I guess I would need to dig into lib/gcc-gdb-test.exp to figure out how to do that. Daniel
[PATCH, x86, libgcc] PR target/83917 Correct debug for -mcall-ms2sysv-xlogues stubs
When stepping through tail-call restore stubs the debugger has to assume that rsp - 8 is the CFA, although it is not. This is because I did not explicitly add any .cfi directives. This patch adds them to the tail-call restore stubs, but this is new territory for me, so I would appreciate feedback. I've reg-tested on x86_64, but I still need to test on Solaris and Darwin. OK to commit after those tests? Thanks, Daniel Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- libgcc/config/i386/resms64fx.h | 19 +++ libgcc/config/i386/resms64x.h | 22 ++ 2 files changed, 41 insertions(+) diff --git a/libgcc/config/i386/resms64fx.h b/libgcc/config/i386/resms64fx.h index c5f63d879fe..7dc8c7d89ed 100644 --- a/libgcc/config/i386/resms64fx.h +++ b/libgcc/config/i386/resms64fx.h @@ -34,21 +34,40 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see .text MS2SYSV_STUB_BEGIN(resms64fx_17) +.cfi_startproc +.cfi_def_cfa %rbp, 16 mov -0x68(%rsi),%r15 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64fx_16) +.cfi_startproc +.cfi_def_cfa %rbp, 16 mov -0x60(%rsi),%r14 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64fx_15) +.cfi_startproc +.cfi_def_cfa %rbp, 16 mov -0x58(%rsi),%r13 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64fx_14) +.cfi_startproc +.cfi_def_cfa %rbp, 16 mov -0x50(%rsi),%r12 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64fx_13) +.cfi_startproc +.cfi_def_cfa %rbp, 16 mov -0x48(%rsi),%rbx +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64fx_12) +.cfi_startproc +.cfi_def_cfa %rbp, 16 mov -0x40(%rsi),%rdi SSE_RESTORE mov -0x38(%rsi),%rsi leaveq +.cfi_def_cfa %rsp, 8 ret +.cfi_endproc MS2SYSV_STUB_END(resms64fx_12) MS2SYSV_STUB_END(resms64fx_13) MS2SYSV_STUB_END(resms64fx_14) diff --git a/libgcc/config/i386/resms64x.h b/libgcc/config/i386/resms64x.h index 1b44938ae7c..753be1f4c52 100644 --- a/libgcc/config/i386/resms64x.h +++ b/libgcc/config/i386/resms64x.h @@ -33,23 +33,45 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see .text MS2SYSV_STUB_BEGIN(resms64x_18) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x70(%rsi),%r15 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64x_17) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x68(%rsi),%r14 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64x_16) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x60(%rsi),%r13 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64x_15) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x58(%rsi),%r12 +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64x_14) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x50(%rsi),%rbp +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64x_13) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x48(%rsi),%rbx +.cfi_endproc MS2SYSV_STUB_BEGIN(resms64x_12) +.cfi_startproc +.cfi_def_cfa %r10, 8 mov -0x40(%rsi),%rdi SSE_RESTORE mov -0x38(%rsi),%rsi mov %r10,%rsp +.cfi_def_cfa_register %rsp ret +.cfi_endproc MS2SYSV_STUB_END(resms64x_12) MS2SYSV_STUB_END(resms64x_13) MS2SYSV_STUB_END(resms64x_14) -- 2.15.0
Re: [PATCH] Fix ms-sysv.exp testsuite FAILs (PR c/83117)
On 11/28/2017 05:22 AM, Jakub Jelinek wrote: > On Mon, Nov 27, 2017 at 05:02:32PM -0600, Daniel Santos wrote: >>> --- gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc.jj 2017-05-22 >>> 10:49:45.0 +0200 >>> +++ gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc 2017-11-27 >>> 11:57:14.889570915 +0100 >>> @@ -392,7 +392,7 @@ static void make_do_tests_decl (const ve >>> continue; >>> >>> comma.reset (); >>> - out << "static __attribute__ ((ms_abi)) long (*const do_test_" >>> + out << "static __attribute__ ((ms_abi)) long (*do_test_" >>> << (unaligned ? "u" : "") >>> << (varargs ? "v" : "") << i << ") ("; >> I don't have a problem with removing const, it's only there for >> const-correctness and caution. I just posted to the PR a bit ago and >> I'm curious if there is a better approach when using assembly stubs that >> are meant to be called in varying ways. CV would work also, although >> there's no real need to refetch the address before each use. >> >> If you don't have a better way to do this then please use this patch. > I've verified the resulting *.optimized dump as well as assembly is > practically identical without/with the patch, only differences are in > SSA_NAME versions, in assembly the .LC and .LCFI constants are > different but otherwise it is the same - the functions are emitted in > different orders by cgraph and committed the patch. > > Using assembly stubs that are meant to be called in varying ways should > just be avoided in portable programs, you could e.g. in the generator > instead of all those: > extern __attribute__ ((ms_abi)) long do_test_aligned (); > extern __attribute__ ((ms_abi)) long do_test_unaligned (); > static __attribute__ ((ms_abi)) long (*do_test_1) (long a) = > (void*)do_test_aligned; > static __attribute__ ((ms_abi)) long (*do_test_v1) (long a, ...) = > (void*)do_test_aligned; > static __attribute__ ((ms_abi)) long (*do_test_u1) (long a) = > (void*)do_test_unaligned; > static __attribute__ ((ms_abi)) long (*do_test_uv1) (long a, ...) = > (void*)do_test_unaligned; > emit: > extern __attribute__ ((ms_abi)) long do_test_1 (long a); > asm (".text; do_test_1: jmp do_test_aligned; .previous"); > extern __attribute__ ((ms_abi)) long do_test_v1 (long a, ...); > asm (".text; do_test_v1: jmp do_test_aligned; .previous"); > extern __attribute__ ((ms_abi)) long do_test_1 (long a); > asm (".text; do_test_u1: jmp do_test_unaligned; .previous"); > extern __attribute__ ((ms_abi)) long do_test_1 (long a, ...); > asm (".text; do_test_uv1: jmp do_test_unaligned; .previous"); > or something similar. > > Jakub Ah hah! That would indeed work. Thanks for the tip. I have some improvements to make to this set of tests, mostly tests triggered by GCC_TEST_RUN_EXPENSIVE, but perhaps I can make this modification as well. Come to think of it, attribute naked might work too. Thanks, Daniel
Re: [PATCH] Fix ms-sysv.exp testsuite FAILs (PR c/83117)
On 11/27/2017 04:34 PM, Jakub Jelinek wrote: > Hi! > > As mentioned in the PR, my C FE rvalue folding patch allows folding > const variable initializers into the uses of those variables in rvalue > contexts more than before, and so we get warnings about UB in the test, > because an unprototyped function is cast to a function type with ellipsis in > it. > > It isn't entirely clear what exactly the test wants to test, as mentioned > in the PR, this is one of the options how to solve it, by dropping the > const it can't be optimized in the FEs (the optimizers can still figure out > the static vars are never written to). Another option would be just > add -w to dg-options, another one is const volatile. > > Regtested on x86_64-linux and i686-linux, ok for trunk? > > 2017-11-27 Jakub Jelinek> > PR c/83117 > * gcc.target/x86_64/abi/ms-sysv/gen.cc (make_do_tests_decl): Drop > const from do_test_{u,v}*. > > --- gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc.jj 2017-05-22 > 10:49:45.0 +0200 > +++ gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc2017-11-27 > 11:57:14.889570915 +0100 > @@ -392,7 +392,7 @@ static void make_do_tests_decl (const ve > continue; > > comma.reset (); > - out << "static __attribute__ ((ms_abi)) long (*const do_test_" > + out << "static __attribute__ ((ms_abi)) long (*do_test_" > << (unaligned ? "u" : "") > << (varargs ? "v" : "") << i << ") ("; > > > Jakub > I don't have a problem with removing const, it's only there for const-correctness and caution. I just posted to the PR a bit ago and I'm curious if there is a better approach when using assembly stubs that are meant to be called in varying ways. CV would work also, although there's no real need to refetch the address before each use. If you don't have a better way to do this then please use this patch. Thanks! Daniel
Re: [PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN
On 11/03/2017 04:22 PM, Daniel Santos wrote: > ... > How does this patch look? (Also, I've updated comments for > choose_baseaddr.) Currently re-running tests. > > Thanks, > Daniel > > @@ -13110,10 +13125,26 @@ ix86_expand_prologue (void) >target. */ >if (TARGET_SEH) > m->fs.sp_valid = false; > -} > > - if (m->call_ms2sysv) > -ix86_emit_outlined_ms2sysv_save (frame); > + /* If SP offset is non-immediate after allocation of the stack frame, > + then emit SSE saves or stub call prior to allocating the rest of the > + stack frame. This is less efficient for the out-of-line stub because > + we can't combine allocations across the call barrier, but it's better > + than using a scratch register. */ > + else if (!x86_64_immediate_operand (GEN_INT > (frame.stack_pointer_offset - m->fs.sp_realigned_offset), Pmode)) Oops, and also after fixing this formatting... Daniel
Re: [PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN
On 11/03/2017 02:09 AM, Uros Bizjak wrote: > On Thu, Nov 2, 2017 at 11:43 PM, Daniel Santos <daniel.san...@pobox.com> > wrote: > >>>>int_registers_saved = (frame.nregs == 0); >>>>sse_registers_saved = (frame.nsseregs == 0); >>>> + save_stub_call_needed = (m->call_ms2sysv); >>>> + gcc_assert (!(!sse_registers_saved && save_stub_call_needed)); >>> Oooh, double negation :( >> I'm just saying that we shouldn't be saving SSE registers inline and via >> the stub. If I followed the naming convention of e.g., >> "see_registers_saved" then my variable would end up being called >> "save_stub_called" which would be incorrect and misleading, similar to >> how "see_registers_saved" is misleading when there are in fact no SSE >> register that need to be saved. Maybe I should rename >> (int|sse)_registers_saved to (int|sse)_register_saves_needed with >> inverted logic instead. > But, we can just say > > gcc_assert (sse_registers_saved || !save_stub_call_needed); > > No? > > Uros. > Oh yes, I see. Because "sse_registers_saved" really means that we've either already saved them or don't have to, and not literally that they have been saved. I ranted about it's name but didn't think it all the way through. :) How does this patch look? (Also, I've updated comments for choose_baseaddr.) Currently re-running tests. Thanks, Daniel diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2967876..fb81d4dba84 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -11515,12 +11515,15 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx _reg, an alignment value (in bits) that is preferred or zero and will recieve the alignment of the base register that was selected, irrespective of rather or not CFA_OFFSET is a multiple of that - alignment value. + alignment value. If it is possible for the base register offset to be + non-immediate then SCRATCH_REGNO should specify a scratch register to + use. The valid base registers are taken from CFUN->MACHINE->FS. */ static rtx -choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align, + unsigned int scratch_regno = INVALID_REGNUM) { rtx base_reg = NULL; HOST_WIDE_INT base_offset = 0; @@ -11534,6 +11537,19 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) choose_basereg (cfa_offset, base_reg, base_offset, 0, align); gcc_assert (base_reg != NULL); + + rtx base_offset_rtx = GEN_INT (base_offset); + + if (!x86_64_immediate_operand (base_offset_rtx, Pmode)) +{ + gcc_assert (scratch_regno != INVALID_REGNUM); + + rtx scratch_reg = gen_rtx_REG (Pmode, scratch_regno); + emit_move_insn (scratch_reg, base_offset_rtx); + + return gen_rtx_PLUS (Pmode, base_reg, scratch_reg); +} + return plus_constant (Pmode, base_reg, base_offset); } @@ -12793,23 +12809,19 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame ) rtx sym, addr; rtx rax = gen_rtx_REG (word_mode, AX_REG); const struct xlogue_layout = xlogue_layout::get_instance (); - HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset; /* AL should only be live with sysv_abi. */ gcc_assert (!ix86_eax_live_at_start_p ()); + gcc_assert (m->fs.sp_offset >= frame.sse_reg_save_offset); /* Setup RAX as the stub's base pointer. We use stack_realign_offset rather we've actually realigned the stack or not. */ align = GET_MODE_ALIGNMENT (V4SFmode); addr = choose_baseaddr (frame.stack_realign_offset - + xlogue.get_stub_ptr_offset (), ); + + xlogue.get_stub_ptr_offset (), , AX_REG); gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); - emit_insn (gen_rtx_SET (rax, addr)); - /* Allocate stack if not already done. */ - if (allocate > 0) - pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, -GEN_INT (-allocate), -1, false); + emit_insn (gen_rtx_SET (rax, addr)); /* Get the stub symbol. */ sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP @@ -12841,6 +12853,7 @@ ix86_expand_prologue (void) HOST_WIDE_INT allocate; bool int_registers_saved; bool sse_registers_saved; + bool save_stub_call_needed; rtx static_chain = NULL_RTX; if (ix86_function_naked (current_function_decl)) @@ -13016,6 +13029,8 @@ ix86_expand_prologue (void) int_registers_saved = (frame.nregs == 0); sse_registers_saved = (frame.nsseregs == 0); + save_stub_call_needed = (m->call_ms2sysv); + gcc_assert (sse_registers_saved || !save_stub_call_needed); if (frame_pointer_needed && !m->fs.fp_valid) { @@ -13110,10 +13125,26 @@ ix86_expand_prologue (void) target. */ if (TARGET_SEH) m->fs.sp_va
Re: [PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN
On 10/31/2017 04:31 AM, Uros Bizjak wrote: > On Tue, Oct 31, 2017 at 3:09 AM, Daniel Santos <daniel.san...@pobox.com> > wrote: >> When we are realigning the stack pointer, making an ms_abi to sysv_abi >> call and alllocating 2GiB or more on the stack we end up with an invalid >> INSN due to a non-immediate offset. This occurs both with and without >> -mcall-ms2sysv-xlogues. Additionally, I've discovered that the stack >> allocation with -mcall-ms2sysv-xlogues is incorrect as it ignores stack >> checking, stack clash checking and probing. >> >> This patch fixes these problems by >> >> 1. No longer allocate stack space in ix86_emit_outlined_ms2sysv_save. >> 2. Rearrange where we emit SSE saves or stub call: >>a. Before frame allocation when offset from frame to save area is >= 2GiB. >>b. After frame allocation when frame is < 2GiB. (Stack allocations >> prior to the stub call can't be combined with those afterwards, so >> this is better when possible.) >> 3. Modify choose_baseaddr to take an optional scratch_regno argument >>and never return rtx that cannot be used as an immediate. >> >> gcc: >> config/i386/i386.c (choose_basereg): Use optional scratch >> register and add assertion. >> (x86_emit_outlined_ms2sysv_save): use scratch register when >> needed, and don't allocate stack. >> (ix86_expand_prologue): Rearrange where SSE saves/stub call is >> emitted, correct wrong allocation with -mcall-ms2sysv-xlogues. >> (ix86_emit_outlined_ms2sysv_restore): Fix non-immediate offsets. >> >> gcc/testsuite: >> gcc.target/i386/pr82002-2a.c: Change from xfail to fail. >> gcc.target/i386/pr82002-2b.c: Likewise. >> >> Signed-off-by: Daniel Santos <daniel.san...@pobox.com> >> --- >> gcc/config/i386/i386.c | 76 >> -- >> gcc/testsuite/gcc.target/i386/pr82002-2a.c | 2 - >> gcc/testsuite/gcc.target/i386/pr82002-2b.c | 2 - >> 3 files changed, 62 insertions(+), 18 deletions(-) >> >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >> index 83a07afb3e1..abd8e937e0d 100644 >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -11520,7 +11520,8 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx >> _reg, >> The valid base registers are taken from CFUN->MACHINE->FS. */ >> >> static rtx >> -choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) >> +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align, >> +int scratch_regno = -1) >> { >>rtx base_reg = NULL; >>HOST_WIDE_INT base_offset = 0; >> @@ -11534,6 +11535,28 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned >> int *align) >> choose_basereg (cfa_offset, base_reg, base_offset, 0, align); >> >>gcc_assert (base_reg != NULL); >> + >> + if (TARGET_64BIT) >> +{ >> + rtx base_offset_rtx = GEN_INT (base_offset); >> + >> + if (scratch_regno >= 0) >> + { >> + if (!x86_64_immediate_operand (base_offset_rtx, DImode)) >> + { >> + rtx tmp; >> + rtx scratch_reg = gen_rtx_REG (DImode, scratch_regno); >> + >> + emit_insn (gen_rtx_SET (scratch_reg, base_offset_rtx)); >> + tmp = gen_rtx_PLUS (DImode, scratch_reg, base_reg); >> + emit_insn (gen_rtx_SET (scratch_reg, tmp)); >> + return scratch_reg; >> + } >> + } >> + else >> + gcc_assert (x86_64_immediate_operand (base_offset_rtx, DImode)); >> +} >> + >>return plus_constant (Pmode, base_reg, base_offset); >> } > This function doesn't need to return a register, it can return plus > RTX. I'd suggest the following implementation: > > --cut here-- > Index: i386.c > === > --- i386.c (revision 254243) > +++ i386.c (working copy) > @@ -11520,7 +11520,8 @@ > The valid base registers are taken from CFUN->MACHINE->FS. */ > > static rtx > -choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) > +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align, > +unsigned int scratch_regno = INVALID_REGNUM) > { >rtx base_reg = NULL; >HOST_WIDE_INT base_offset = 0; > @@ -11534,6 +11535,19 @@ > choose_basereg (cfa_offset, base_reg, base_offset, 0, align); > >gcc
Re: [PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN
On 10/30/2017 09:09 PM, Daniel Santos wrote: > 3. Modify choose_baseaddr to take an optional scratch_regno argument >and never return rtx that cannot be used as an immediate. I should amend this, it actually does a gcc_assert, so that won't happen if --enable-checking=no, but it would still fail later in expand. > static rtx > -choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) > +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align, > + int scratch_regno = -1) > { >rtx base_reg = NULL; >HOST_WIDE_INT base_offset = 0; > @@ -11534,6 +11535,28 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned > int *align) > choose_basereg (cfa_offset, base_reg, base_offset, 0, align); > >gcc_assert (base_reg != NULL); > + > + if (TARGET_64BIT) > +{ > + rtx base_offset_rtx = GEN_INT (base_offset); > + > + if (scratch_regno >= 0) > + { > + if (!x86_64_immediate_operand (base_offset_rtx, DImode)) > + { > + rtx tmp; > + rtx scratch_reg = gen_rtx_REG (DImode, scratch_regno); > + > + emit_insn (gen_rtx_SET (scratch_reg, base_offset_rtx)); > + tmp = gen_rtx_PLUS (DImode, scratch_reg, base_reg); > + emit_insn (gen_rtx_SET (scratch_reg, tmp)); > + return scratch_reg; > + } > + } > + else > + gcc_assert (x86_64_immediate_operand (base_offset_rtx, DImode)); > +} > + >return plus_constant (Pmode, base_reg, base_offset); > } Daniel
[PATCH 2/2] [i386] PR82002 Part 2: Correct non-immediate offset/invalid INSN
When we are realigning the stack pointer, making an ms_abi to sysv_abi call and alllocating 2GiB or more on the stack we end up with an invalid INSN due to a non-immediate offset. This occurs both with and without -mcall-ms2sysv-xlogues. Additionally, I've discovered that the stack allocation with -mcall-ms2sysv-xlogues is incorrect as it ignores stack checking, stack clash checking and probing. This patch fixes these problems by 1. No longer allocate stack space in ix86_emit_outlined_ms2sysv_save. 2. Rearrange where we emit SSE saves or stub call: a. Before frame allocation when offset from frame to save area is >= 2GiB. b. After frame allocation when frame is < 2GiB. (Stack allocations prior to the stub call can't be combined with those afterwards, so this is better when possible.) 3. Modify choose_baseaddr to take an optional scratch_regno argument and never return rtx that cannot be used as an immediate. gcc: config/i386/i386.c (choose_basereg): Use optional scratch register and add assertion. (x86_emit_outlined_ms2sysv_save): use scratch register when needed, and don't allocate stack. (ix86_expand_prologue): Rearrange where SSE saves/stub call is emitted, correct wrong allocation with -mcall-ms2sysv-xlogues. (ix86_emit_outlined_ms2sysv_restore): Fix non-immediate offsets. gcc/testsuite: gcc.target/i386/pr82002-2a.c: Change from xfail to fail. gcc.target/i386/pr82002-2b.c: Likewise. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 76 -- gcc/testsuite/gcc.target/i386/pr82002-2a.c | 2 - gcc/testsuite/gcc.target/i386/pr82002-2b.c | 2 - 3 files changed, 62 insertions(+), 18 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 83a07afb3e1..abd8e937e0d 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -11520,7 +11520,8 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx _reg, The valid base registers are taken from CFUN->MACHINE->FS. */ static rtx -choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) +choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align, +int scratch_regno = -1) { rtx base_reg = NULL; HOST_WIDE_INT base_offset = 0; @@ -11534,6 +11535,28 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) choose_basereg (cfa_offset, base_reg, base_offset, 0, align); gcc_assert (base_reg != NULL); + + if (TARGET_64BIT) +{ + rtx base_offset_rtx = GEN_INT (base_offset); + + if (scratch_regno >= 0) + { + if (!x86_64_immediate_operand (base_offset_rtx, DImode)) + { + rtx tmp; + rtx scratch_reg = gen_rtx_REG (DImode, scratch_regno); + + emit_insn (gen_rtx_SET (scratch_reg, base_offset_rtx)); + tmp = gen_rtx_PLUS (DImode, scratch_reg, base_reg); + emit_insn (gen_rtx_SET (scratch_reg, tmp)); + return scratch_reg; + } + } + else + gcc_assert (x86_64_immediate_operand (base_offset_rtx, DImode)); +} + return plus_constant (Pmode, base_reg, base_offset); } @@ -12793,23 +12816,22 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame ) rtx sym, addr; rtx rax = gen_rtx_REG (word_mode, AX_REG); const struct xlogue_layout = xlogue_layout::get_instance (); - HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset; /* AL should only be live with sysv_abi. */ gcc_assert (!ix86_eax_live_at_start_p ()); + gcc_assert (m->fs.sp_offset >= frame.sse_reg_save_offset); /* Setup RAX as the stub's base pointer. We use stack_realign_offset rather we've actually realigned the stack or not. */ align = GET_MODE_ALIGNMENT (V4SFmode); addr = choose_baseaddr (frame.stack_realign_offset - + xlogue.get_stub_ptr_offset (), ); + + xlogue.get_stub_ptr_offset (), , AX_REG); gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); - emit_insn (gen_rtx_SET (rax, addr)); - /* Allocate stack if not already done. */ - if (allocate > 0) - pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, - GEN_INT (-allocate), -1, false); + /* If choose_baseaddr returned our scratch register, then we don't need to + do another SET. */ + if (!REG_P (addr) || REGNO (addr) != AX_REG) +emit_insn (gen_rtx_SET (rax, addr)); /* Get the stub symbol. */ sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP @@ -12841,6 +12863,7 @@ ix86_expand_prologue (void) HOST_WIDE_INT allocate; bool int_registers_saved; bool sse_registers_saved; + bool save_stub_call_needed; rtx static_chain = NULL_RTX; if (ix86_function_naked (current_function_decl)) @@ -13016,6 +13039,8 @@ ix86_expand_prol
[PATCH 1/2] [i386] PR82002 Part 1: Correct ICE caused by wrong calculation.
This is a residual problem caused by the off-by-one error in sp_valid_at and fp_valid_at originally corrected in r252099. However, adding tests that include an ms_abi to sysv_abi call reveals an additional, more complex problem with an invalid INSN due to overflowing the s32 offset. Therefore I'm including all new tests, but marking ones that are broken by this additional problem as xfail and addressing that in the next patch. gcc: config/i386/i386.c (ix86_expand_epilogue): Correct stack calculation. gcc/testsuite: gcc.target/i386/pr82002-1.c: New test. gcc.target/i386/pr82002-2a.c: New xfail test. gcc.target/i386/pr82002-2b.c: New xfail test. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 2 +- gcc/testsuite/gcc.target/i386/pr82002-1.c | 12 gcc/testsuite/gcc.target/i386/pr82002-2a.c | 14 ++ gcc/testsuite/gcc.target/i386/pr82002-2b.c | 14 ++ 4 files changed, 41 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr82002-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr82002-2a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr82002-2b.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2de0dd0c283..83a07afb3e1 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13812,7 +13812,7 @@ ix86_expand_epilogue (int style) the stack pointer, if we will restore SSE regs via sp. */ if (TARGET_64BIT && m->fs.sp_offset > 0x7fff - && sp_valid_at (frame.stack_realign_offset) + && sp_valid_at (frame.stack_realign_offset + 1) && (frame.nsseregs + frame.nregs) != 0) { pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, diff --git a/gcc/testsuite/gcc.target/i386/pr82002-1.c b/gcc/testsuite/gcc.target/i386/pr82002-1.c new file mode 100644 index 000..86678a01992 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr82002-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile { target lp64 } } */ +/* { dg-options "-Ofast -mstackrealign -mabi=ms" } */ + +void a (char *); +void +b () +{ + char c[100]; + c[1099511627776] = 'b'; + a (c); + a (c); +} diff --git a/gcc/testsuite/gcc.target/i386/pr82002-2a.c b/gcc/testsuite/gcc.target/i386/pr82002-2a.c new file mode 100644 index 000..bc85080ba8e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr82002-2a.c @@ -0,0 +1,14 @@ +/* { dg-do compile { target lp64 } } */ +/* { dg-options "-Ofast -mstackrealign -mabi=ms" } */ +/* { dg-xfail-if "" { *-*-* } } */ +/* { dg-xfail-run-if "" { *-*-* } } */ + +void __attribute__((sysv_abi)) a (char *); +void +b () +{ + char c[100]; + c[1099511627776] = 'b'; + a (c); + a (c); +} diff --git a/gcc/testsuite/gcc.target/i386/pr82002-2b.c b/gcc/testsuite/gcc.target/i386/pr82002-2b.c new file mode 100644 index 000..10e44cd7b1d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr82002-2b.c @@ -0,0 +1,14 @@ +/* { dg-do compile { target lp64 } } */ +/* { dg-options "-Ofast -mstackrealign -mabi=ms -mcall-ms2sysv-xlogues" } */ +/* { dg-xfail-if "" { *-*-* } } */ +/* { dg-xfail-run-if "" { *-*-* } } */ + +void __attribute__((sysv_abi)) a (char *); +void +b () +{ + char c[100]; + c[1099511627776] = 'b'; + a (c); + a (c); +} -- 2.14.3
[PATCH 0/2] [i386] PR82002 Correct ICE with large stack frame
I originally intended to submit the first part of this patch set a few weeks ago as it was simpler, but here is the full fix. The first part is a really simple follow-up fix to an off-by-one error H.J. originally fixed with r252099, but in the process of testing I discovered a more complex problem when we add a ms_abi to sysv_abi call that resulted in a bad INSN because I didn't check for a non-immediate offset. I originally wrote a different solution where I added a mechanism to struct ix86_frame to track and reuse a scratch register in the pro/epilogue, but then I realized that I didn't need that if I just emitted the SSE saves or stub call after the SP realignment and prior to allocating the remainder of the frame. However, I still need to use a scratch register sometimes in the epilogue, so I've added a simplified mechanism to choose_baseaddr to manage that, but not to track and reuse it for subsequent calls. Unfortunately, this sat for so long that there's two duplicates in Bugzilla now (pr82485 and pr82712). Regression tested with {,-m32} and I've started one for x32 even though it *shouldn't* affect it (in theory). Thanks, Daniel
[PATCH] [testsuite/i386] PR 82268 Correct FAIL when configured --with-cpu
When I originally wrote this test I wasn't wasn't aware of the --with-cpu configure option, so this change explicitly disables avx to make sure we choose the sse implementation, even when --with-cpu specifies an arch that has avx support. OK for head? gcc/testsuite/ChangeLog: gcc.target/i386/pr82196-1.c (dg-options): Add -mno-avx. Thanks, Daniel --- gcc/testsuite/gcc.target/i386/pr82196-1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/i386/pr82196-1.c b/gcc/testsuite/gcc.target/i386/pr82196-1.c index 541d975480d..ff108132bb5 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c @@ -1,5 +1,5 @@ /* { dg-do compile { target lp64 } } */ -/* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */ +/* { dg-options "-mno-avx -msse -mcall-ms2sysv-xlogues -O2" } */ /* { dg-final { scan-assembler "call.*__sse_savms64f?_12" } } */ /* { dg-final { scan-assembler "jmp.*__sse_resms64f?x_12" } } */ -- 2.14.3
[PATCH try 3] [i386,testsuite,libgcc] Fix build breakage on Mac and test FAILS on Solaris caused by PR82196 patch
OK, hopefully I've gotten everything cleaned up. I should note that because I'm now including both auto-target.h from libgcc (for HAVE_AS_AVX) and auto-host.h from gcc (HAVE_GAS_HIDDEN) that there are several warnings for redefined macros, but I plan on fixing that once this is resolved. Thanks for all of the corrections and improvements! The previous test run was good, but I still need to recheck after these changes. gcc/testsuite: * gcc.target/i386/pr82196-1.c: (b): Remove volatile asm. * gcc.target/i386/pr82196-2.c: (b): Likewise. libgcc: * configure.ac: Add Check for HAVE_AS_AVX. * config.in: Regenerate. * configure: Likewise. * config/i386/i386-asm.h: Include auto-target.h from libgcc. (SSE_SAVE, SSE_RESTORE): Emit .byte sequence for !HAVE_AS_AVX. Correct out-of-date comments. Please let me know if you find any other issues. Thanks, Daniel diff --git a/gcc/testsuite/gcc.target/i386/pr82196-1.c b/gcc/testsuite/gcc.target/i386/pr82196-1.c index ef858328f00..541d975480d 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c @@ -1,7 +1,7 @@ /* { dg-do compile { target lp64 } } */ /* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */ -/* { dg-final { scan-assembler "call.*__sse_savms64_18" } } */ -/* { dg-final { scan-assembler "jmp.*__sse_resms64x_18" } } */ +/* { dg-final { scan-assembler "call.*__sse_savms64f?_12" } } */ +/* { dg-final { scan-assembler "jmp.*__sse_resms64f?x_12" } } */ void __attribute__((sysv_abi)) a() { } @@ -9,6 +9,5 @@ void __attribute__((sysv_abi)) a() { static void __attribute__((sysv_abi)) (*volatile a_noinfo)() = a; void __attribute__((ms_abi)) b() { - __asm__ __volatile__ ("" :::"rbx", "rbp", "r12", "r13", "r14", "r15"); a_noinfo (); } diff --git a/gcc/testsuite/gcc.target/i386/pr82196-2.c b/gcc/testsuite/gcc.target/i386/pr82196-2.c index 8fe58411d5e..7166d068bc1 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-2.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-2.c @@ -1,7 +1,7 @@ /* { dg-do compile { target lp64 } } */ /* { dg-options "-mavx -mcall-ms2sysv-xlogues -O2" } */ -/* { dg-final { scan-assembler "call.*__avx_savms64_18" } } */ -/* { dg-final { scan-assembler "jmp.*__avx_resms64x_18" } } */ +/* { dg-final { scan-assembler "call.*__avx_savms64f?_12" } } */ +/* { dg-final { scan-assembler "jmp.*__avx_resms64f?x_12" } } */ void __attribute__((sysv_abi)) a() { } @@ -9,6 +9,5 @@ void __attribute__((sysv_abi)) a() { static void __attribute__((sysv_abi)) (*volatile a_noinfo)() = a; void __attribute__((ms_abi)) b() { - __asm__ __volatile__ ("" :::"rbx", "rbp", "r12", "r13", "r14", "r15"); a_noinfo (); } diff --git a/libgcc/config.in b/libgcc/config.in index 7de22ee0a72..f9fb253874f 100644 --- a/libgcc/config.in +++ b/libgcc/config.in @@ -1,5 +1,8 @@ /* config.in. Generated from configure.ac by autoheader. */ +/* Define to 1 if the assembler supports AVX. */ +#undef HAVE_AS_AVX + /* Define to 1 if the target assembler supports thread-local storage. */ #undef HAVE_CC_TLS diff --git a/libgcc/config/i386/i386-asm.h b/libgcc/config/i386/i386-asm.h index 424e0f72aac..aad1a752106 100644 --- a/libgcc/config/i386/i386-asm.h +++ b/libgcc/config/i386/i386-asm.h @@ -26,6 +26,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #ifndef I386_ASM_H #define I386_ASM_H +#include "auto-target.h" #include "auto-host.h" #define PASTE2(a, b) PASTE2a(a, b) @@ -69,13 +70,15 @@ ASMNAME(fn): #ifdef MS2SYSV_STUB_AVX # define MS2SYSV_STUB_PREFIX __avx_ -# define MOVAPS vmovaps +# ifdef HAVE_AS_AVX +# define MOVAPS vmovaps +# endif #elif defined(MS2SYSV_STUB_SSE) # define MS2SYSV_STUB_PREFIX __sse_ # define MOVAPS movaps #endif -#if defined (MS2SYSV_STUB_PREFIX) && defined (MOVAPS) +#ifdef MS2SYSV_STUB_PREFIX # define MS2SYSV_STUB_BEGIN(base_name) \ HIDDEN_FUNC(PASTE2(MS2SYSV_STUB_PREFIX, base_name)) @@ -83,8 +86,10 @@ ASMNAME(fn): # define MS2SYSV_STUB_END(base_name) \ FUNC_END(PASTE2(MS2SYSV_STUB_PREFIX, base_name)) -/* Save SSE registers 6-15. off is the offset of rax to get to xmm6. */ -# define SSE_SAVE \ +/* If expanding for sse or avx and we have assembler support. */ +# ifdef MOVAPS +/* Save SSE registers 6-15 using rax as the base address. */ +# define SSE_SAVE \ MOVAPS %xmm15,-0x30(%rax); \ MOVAPS %xmm14,-0x20(%rax); \ MOVAPS %xmm13,-0x10(%rax); \ @@ -96,8 +101,8 @@ ASMNAME(fn): MOVAPS %xmm7, 0x50(%rax); \ MOVAPS %xmm6, 0x60(%rax) -/* Restore SSE registers 6-15. off is the offset of rsi to get to xmm6. */ -# define SSE_RESTORE \ +/* Restore SSE registers 6-15 using rsi as the base address. */ +# define SSE_RESTORE \ MOVAPS -0x30(%rsi), %xmm15; \ MOVAPS -0x20(%rsi), %xmm14; \ MOVAPS -0x10(%rsi), %xmm13; \ @@ -108,6 +113,31 @@ ASMNAME(fn): MOVAPS 0x40(%rsi), %xmm8 ; \ MOVAPS 0x50(%rsi), %xmm7 ; \ MOVAPS 0x60(%rsi), %xmm6 - -#endif /*
Re: [PATCH try 2] [i386,testsuite,libgcc] Fix build breakage on Mac and test FAILS on Solaris caused by PR82196 patch
On 09/22/2017 06:50 AM, Uros Bizjak wrote: > On Fri, Sep 22, 2017 at 1:27 PM, Uros Bizjak <ubiz...@gmail.com> wrote: >> On Fri, Sep 22, 2017 at 12:28 PM, Daniel Santos <daniel.san...@pobox.com> >> wrote: >>> On 09/22/2017 03:28 AM, Rainer Orth wrote: >>>> Hi Daniel, >>>> >>>>> On 09/22/2017 02:18 AM, Rainer Orth wrote: >>>>>> Hi Daniel, >>>>>> >>>>>>> On 09/21/2017 05:18 PM, Daniel Santos wrote: >>>>>>>> So libgcc doesn't use a config.in. :( >>>>>>> Scratch that, I forgot that we're using gcc/config.in via auto-host.h. >>>>>>> So I only have to add this to gcc/configure.ac and it will be available >>>>>>> for my libgcc header -- this is what I used to sniff out support for the >>>>>>> .hidden directive. >>>>>> Please don't go that route: it's totally the wrong direction. There's >>>>>> work going on to further decouple libgcc from gcc-private headers and >>>>>> configure results. libgcc already has its own configure tests for >>>>>> assembler features, and its own config.in. What's wrong with adapting >>>>>> libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for >>>>>> libgcc? Should be trivial... >>>>>> >>>>>> Rainer >>>>>> >>>>> Oops, I just saw your email after submitting my other patch. Yes, I am >>>>> mistaken about config.in, sorry about that. I didn't see a config.h >>>>> file, but examining further it looks like it outputs to auto-target.h. >>>>> Also, I was looking for some HAVE_AS* macros, but they are named >>>>> differently. >>>> Right: though some are for assembler features, the macros are named >>>> differently. >>>> >>>>> I had previously included gcc's auto-host.h since it was in the include >>>>> path in order to use HAVE_AS_HIDDEN, so in order to decouple this I'll >>>> HAVE_GAS_HIDDEN actually ;-) >>>> >>>>> need to add that check into libgcc/configure.ac as well. Again, >>>>> shouldn't be that much code. Sound sane to you? >>>> You could do that, but it was already used before your patches, so >>>> please separate it from the current issue if you go down that route. >>>> libgcc is still full of cleanup possibilities :-) >>>> >>>> Rainer >>> OK, so I'm just adding HAVE_AS_AVX mostly as-is from libitm (we don't >>> have $target_cpu so I'm using $target). I do have minor concerns about >>> how this test will work on a cross-build -- I'm not an autotools expert >>> and I don't understand which assembler it will invoke, but the results >>> of the test failing only means we use .byte instead of the real >>> mnemonic, so it really shouldn't be a problem. >>> >>> I've got tests started again, so presuming that *this* one passes, is it >>> OK for the trunk? >>> >>> gcc/testsuite: >>> * gcc.target/i386/pr82196-1.c: Simplify so that it doesn't break >>> on Solaris or with -mno-omit-frame-pointer. >> No need to explain the change in the ChangeLog. Just say "(b): Remove >> volatile asm." >> >>> * gcc.target/i386/pr82196-2.c: Likewise. >>> >>> libgcc: >>> * configure.ac: Add check for HAVE_AS_AVX. >>> * config.in: Regenerate. >>> * configure: Likewise. >>> * config/i386/i386-asm.h: Include auto-target.h from libgcc. >>> (SSE_SAVE, SSE_RESTORE): Sniff HAVE_AS_AVX and directly emit raw >>> .byte code when assembler doesn't support avx, correct >>> out-of-date comments. >> (SSE_SAVE, SSE_RESTORE): Emit .byte sequence for !HAVE_AS_AVX. >> Correct out-of-date comments. >> >>> gcc/testsuite/gcc.target/i386/pr82196-1.c | 5 ++- >>> gcc/testsuite/gcc.target/i386/pr82196-2.c | 5 ++- >>> libgcc/config.in | 3 ++ >>> libgcc/config/i386/i386-asm.h | 45 ++- >>> libgcc/configure | 39 +++ >>> libgcc/configure.ac | 16 ++ >>> 6 files changed, 100 insertions(+), 13 deletions(-) >> >> #ifdef MS2SYSV_STUB_AVX >> # define MS2SYSV_STUB_PREFIX __avx_ >> -# define MOVAPS vmovaps >> +#
Re: [PATCH try 2] [i386,testsuite,libgcc] Fix build breakage on Mac and test FAILS on Solaris caused by PR82196 patch
On 09/22/2017 05:33 AM, Jakub Jelinek wrote: > On Fri, Sep 22, 2017 at 05:28:00AM -0500, Daniel Santos wrote: >> +/* If the assembler doesn't support AVX then directly emit machine code >> + for the instructions above directly. */ > Just a nit: too many "directly" words. > > Jakub Well I hate nits in my code (or comments), so it's appreciated. :)
[PATCH try 2] [i386,testsuite,libgcc] Fix build breakage on Mac and test FAILS on Solaris caused by PR82196 patch
On 09/22/2017 03:28 AM, Rainer Orth wrote: > Hi Daniel, > >> On 09/22/2017 02:18 AM, Rainer Orth wrote: >>> Hi Daniel, >>> >>>> On 09/21/2017 05:18 PM, Daniel Santos wrote: >>>>> So libgcc doesn't use a config.in. :( >>>> Scratch that, I forgot that we're using gcc/config.in via auto-host.h. >>>> So I only have to add this to gcc/configure.ac and it will be available >>>> for my libgcc header -- this is what I used to sniff out support for the >>>> .hidden directive. >>> Please don't go that route: it's totally the wrong direction. There's >>> work going on to further decouple libgcc from gcc-private headers and >>> configure results. libgcc already has its own configure tests for >>> assembler features, and its own config.in. What's wrong with adapting >>> libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for >>> libgcc? Should be trivial... >>> >>> Rainer >>> >> Oops, I just saw your email after submitting my other patch. Yes, I am >> mistaken about config.in, sorry about that. I didn't see a config.h >> file, but examining further it looks like it outputs to auto-target.h. >> Also, I was looking for some HAVE_AS* macros, but they are named >> differently. > Right: though some are for assembler features, the macros are named > differently. > >> I had previously included gcc's auto-host.h since it was in the include >> path in order to use HAVE_AS_HIDDEN, so in order to decouple this I'll > HAVE_GAS_HIDDEN actually ;-) > >> need to add that check into libgcc/configure.ac as well. Again, >> shouldn't be that much code. Sound sane to you? > You could do that, but it was already used before your patches, so > please separate it from the current issue if you go down that route. > libgcc is still full of cleanup possibilities :-) > > Rainer OK, so I'm just adding HAVE_AS_AVX mostly as-is from libitm (we don't have $target_cpu so I'm using $target). I do have minor concerns about how this test will work on a cross-build -- I'm not an autotools expert and I don't understand which assembler it will invoke, but the results of the test failing only means we use .byte instead of the real mnemonic, so it really shouldn't be a problem. I've got tests started again, so presuming that *this* one passes, is it OK for the trunk? gcc/testsuite: * gcc.target/i386/pr82196-1.c: Simplify so that it doesn't break on Solaris or with -mno-omit-frame-pointer. * gcc.target/i386/pr82196-2.c: Likewise. libgcc: * configure.ac: Add check for HAVE_AS_AVX. * config.in: Regenerate. * configure: Likewise. * config/i386/i386-asm.h: Include auto-target.h from libgcc. (SSE_SAVE, SSE_RESTORE): Sniff HAVE_AS_AVX and directly emit raw .byte code when assembler doesn't support avx, correct out-of-date comments. gcc/testsuite/gcc.target/i386/pr82196-1.c | 5 ++- gcc/testsuite/gcc.target/i386/pr82196-2.c | 5 ++- libgcc/config.in | 3 ++ libgcc/config/i386/i386-asm.h | 45 ++- libgcc/configure | 39 +++ libgcc/configure.ac | 16 ++ 6 files changed, 100 insertions(+), 13 deletions(-) Thanks, Daniel diff --git a/gcc/testsuite/gcc.target/i386/pr82196-1.c b/gcc/testsuite/gcc.target/i386/pr82196-1.c index ef858328f00..541d975480d 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c @@ -1,7 +1,7 @@ /* { dg-do compile { target lp64 } } */ /* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */ -/* { dg-final { scan-assembler "call.*__sse_savms64_18" } } */ -/* { dg-final { scan-assembler "jmp.*__sse_resms64x_18" } } */ +/* { dg-final { scan-assembler "call.*__sse_savms64f?_12" } } */ +/* { dg-final { scan-assembler "jmp.*__sse_resms64f?x_12" } } */ void __attribute__((sysv_abi)) a() { } @@ -9,6 +9,5 @@ void __attribute__((sysv_abi)) a() { static void __attribute__((sysv_abi)) (*volatile a_noinfo)() = a; void __attribute__((ms_abi)) b() { - __asm__ __volatile__ ("" :::"rbx", "rbp", "r12", "r13", "r14", "r15"); a_noinfo (); } diff --git a/gcc/testsuite/gcc.target/i386/pr82196-2.c b/gcc/testsuite/gcc.target/i386/pr82196-2.c index 8fe58411d5e..7166d068bc1 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-2.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-2.c @@ -1,7 +1,7 @@ /* { dg-do compile { target lp64 } } */ /* { dg-options "-mavx -mcall-ms2sysv-xlogues -O2" } */ -/* { dg-final { scan-assembler "call.*__avx_savms64_18" } }
Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV
On 09/22/2017 02:18 AM, Rainer Orth wrote: > Hi Daniel, > >> On 09/21/2017 05:18 PM, Daniel Santos wrote: >>> So libgcc doesn't use a config.in. :( >> Scratch that, I forgot that we're using gcc/config.in via auto-host.h. >> So I only have to add this to gcc/configure.ac and it will be available >> for my libgcc header -- this is what I used to sniff out support for the >> .hidden directive. > Please don't go that route: it's totally the wrong direction. There's > work going on to further decouple libgcc from gcc-private headers and > configure results. libgcc already has its own configure tests for > assembler features, and its own config.in. What's wrong with adapting > libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for > libgcc? Should be trivial... > > Rainer > Oops, I just saw your email after submitting my other patch. Yes, I am mistaken about config.in, sorry about that. I didn't see a config.h file, but examining further it looks like it outputs to auto-target.h. Also, I was looking for some HAVE_AS* macros, but they are named differently. I had previously included gcc's auto-host.h since it was in the include path in order to use HAVE_AS_HIDDEN, so in order to decouple this I'll need to add that check into libgcc/configure.ac as well. Again, shouldn't be that much code. Sound sane to you? Thanks, Daniel
[PATCH] [i386, testsuite, libgcc] Fix build breakage on Mac and test FAILS on Solaris.
I've bootstrapped and reg-tested {,-m64} with this on 86_64-pc-linux-gnu, but I'm waiting for a reference test set to finish to compare them. I've verified that we're getting HAVE_AS_IX86_AVX in auto-host.h and I've also built and tested w/o to double-verify that the AVX code is correct. Now that I think of it, I didn't run the tests with -mno-omit-frame-pointer (to simulate Solaris), but since the tests pass as-is, I presume that it understands the 'f?' part of my regex. (A separate set of stubs are used when rbp is the frame pointer and these have 'f' appended to their names.) OK to commit once I get a clean compare? gcc: configure.ac: Add Check for HAVE_AS_IX86_AVX config.in: Regenerate. configure: Likewise. gcc/testsuite: gcc.target/i386/pr82196-1.c: Simplify so that it doesn't break on Solaris or with -mno-omit-frame-pointer. gcc.target/i386/pr82196-2.c: Likewise. libgcc: config/i386/i386-asm.h (SSE_SAVE, SSE_RESTORE): Sniff HAVE_AS_IX86_AVX and directly emit raw .byte code when assembler doesn't support avx, correct out-of-date comments. Thanks, Daniel Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config.in | 6 + gcc/configure | 32 ++ gcc/configure.ac | 6 + gcc/testsuite/gcc.target/i386/pr82196-1.c | 5 ++-- gcc/testsuite/gcc.target/i386/pr82196-2.c | 5 ++-- libgcc/config/i386/i386-asm.h | 44 ++- 6 files changed, 85 insertions(+), 13 deletions(-) diff --git a/gcc/config.in b/gcc/config.in index 89d7108e8db..df2e518baa6 100644 --- a/gcc/config.in +++ b/gcc/config.in @@ -406,6 +406,12 @@ #endif +/* Define if your assembler supports avx extensions. */ +#ifndef USED_FOR_TARGET +#undef HAVE_AS_IX86_AVX +#endif + + /* Define if your assembler supports the Sun syntax for cmov. */ #ifndef USED_FOR_TARGET #undef HAVE_AS_IX86_CMOV_SUN_SYNTAX diff --git a/gcc/configure b/gcc/configure index 13f97cd3663..e982b86c25c 100755 --- a/gcc/configure +++ b/gcc/configure @@ -25881,6 +25881,38 @@ if test $gcc_cv_as_ix86_swap = yes; then $as_echo "#define HAVE_AS_IX86_SWAP 1" >>confdefs.h +fi + + +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for avx extensions" >&5 +$as_echo_n "checking assembler for avx extensions... " >&6; } +if test "${gcc_cv_as_ix86_avx+set}" = set; then : + $as_echo_n "(cached) " >&6 +else + gcc_cv_as_ix86_avx=no + if test x$gcc_cv_as != x; then +$as_echo 'vzeroupper' > conftest.s +if { ac_try='$gcc_cv_as $gcc_cv_as_flags -o conftest.o conftest.s >&5' + { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5 + (eval $ac_try) 2>&5 + ac_status=$? + $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; }; } +then + gcc_cv_as_ix86_avx=yes +else + echo "configure: failed program was" >&5 + cat conftest.s >&5 +fi +rm -f conftest.o conftest.s + fi +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_ix86_avx" >&5 +$as_echo "$gcc_cv_as_ix86_avx" >&6; } +if test $gcc_cv_as_ix86_avx = yes; then + +$as_echo "#define HAVE_AS_IX86_AVX 1" >>confdefs.h + fi diff --git a/gcc/configure.ac b/gcc/configure.ac index 82711389281..a05f2ca10b2 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -4171,6 +4171,12 @@ foo: nop [AC_DEFINE(HAVE_AS_IX86_SWAP, 1, [Define if your assembler supports the swap suffix.])]) +gcc_GAS_CHECK_FEATURE([avx extensions], + gcc_cv_as_ix86_avx,,, + [vzeroupper],, + [AC_DEFINE(HAVE_AS_IX86_AVX, 1, +[Define if your assembler supports avx extensions.])]) + gcc_GAS_CHECK_FEATURE([different section symbol subtraction], gcc_cv_as_ix86_diff_sect_delta,,, [.section .rodata diff --git a/gcc/testsuite/gcc.target/i386/pr82196-1.c b/gcc/testsuite/gcc.target/i386/pr82196-1.c index ef858328f00..541d975480d 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c @@ -1,7 +1,7 @@ /* { dg-do compile { target lp64 } } */ /* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */ -/* { dg-final { scan-assembler "call.*__sse_savms64_18" } } */ -/* { dg-final { scan-assembler "jmp.*__sse_resms64x_18" } } */ +/* { dg-final { scan-assembler "call.*__sse_savms64f?_12" } } */ +/* { dg-final { scan-assembler "jmp.*__sse_resms64f?x_12" } } */ void __attribute__((sysv_abi)) a() { } @@ -9,6 +9,5 @@ void __attribute__((sysv_abi)) a() { static void __attribute__((sysv_abi)) (*volatile a_noinfo)() = a; void __attribute__((ms_abi)) b() {
Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV
On 09/21/2017 05:18 PM, Daniel Santos wrote: > So libgcc doesn't use a config.in. :( Scratch that, I forgot that we're using gcc/config.in via auto-host.h. So I only have to add this to gcc/configure.ac and it will be available for my libgcc header -- this is what I used to sniff out support for the .hidden directive. I still wouldn't mind centralizing assembler sniffing in config/as.m4 (or similar) later on. Thanks, Daniel
Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV
On 09/21/2017 11:14 AM, Rainer Orth wrote: > Hi Daniel, > >> On 09/19/2017 01:58 AM, Jakub Jelinek wrote: >>> What can be done in libgcc is detect in configure whether the assembler >>> supports AVX, and if not, provide some alternative (e.g. because the insns >>> are always the same, you could just code them as .byte or something >>> similar). >>> >>> Say like: >>> --- i386-asm.h 2017-09-18 18:34:30.917126996 +0200 >>> +++ i386-asm.h 2017-09-19 08:56:58.829559038 +0200 >>> @@ -70,6 +70,7 @@ ASMNAME(fn): >>> #ifdef MS2SYSV_STUB_AVX >>> # define MS2SYSV_STUB_PREFIX __avx_ >>> # define MOVAPS vmovaps >>> +# define BYTE .byte >>> #elif defined(MS2SYSV_STUB_SSE) >>> # define MS2SYSV_STUB_PREFIX __sse_ >>> # define MOVAPS movaps >>> @@ -84,7 +85,8 @@ ASMNAME(fn): >>> FUNC_END(PASTE2(MS2SYSV_STUB_PREFIX, base_name)) >>> >>> /* Save SSE registers 6-15. off is the offset of rax to get to xmm6. */ >>> -# define SSE_SAVE \ >>> +# ifdef HAVE_AS_AVX >> I'm not exactly an autotools expert, but libtim defines HAVE_AS_AVX from >> libitm/acinclude.m4 -- of course I need it in libgcc. Similarly, gcc >> has a nice generic gcc_GAS_CHECK_FEATURE macro in gcc/acinclude.m4 which >> it uses for all of its HAVE_AS_* macro tests defined in >> gcc/configure.ac. I can just copy, paste and edit what's in libitm, but >> I find that rather distasteful. Is there a cleaner way to do this? Can >> I suck gcc_GAS_CHECK_FEATURE and it's deps out of gcc/acinclude.m4 and >> put it somewhere central, like config/as.m4? The upside would be the >> ability to make HAVE_AS_* macros in other sub-projects more uniform. > This might be an option as a followup: less code duplication is > certainly a good thing ;-) > >> Alternatively, I can just do the copy and paste and deal with it -- it's >> not that much code. :) > However, given that the above will take some time and testing and your > patch has broken macOS bootstrap, I'd go this route now to unbreak the > tree ASAP. > > Rainer A very good point! So libgcc doesn't use a config.in. :( So what about committing my patch as is with HAVE_AS_AVX never defined and the avx version of the stubs always being built via the .byte directives so that the build is un-broken, and then figure out how (and where) to add HAVE_AS_AVX afterwards? I would still prefer to run a full bootstrap, but updating an already build bootstrap is good and the tests should hopefully be fixed on Solaris as well. Thanks, Daniel diff --git a/gcc/testsuite/gcc.target/i386/pr82196-1.c b/gcc/testsuite/gcc.target/i386/pr82196-1.c index ef858328f00..541d975480d 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c @@ -1,7 +1,7 @@ /* { dg-do compile { target lp64 } } */ /* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */ -/* { dg-final { scan-assembler "call.*__sse_savms64_18" } } */ -/* { dg-final { scan-assembler "jmp.*__sse_resms64x_18" } } */ +/* { dg-final { scan-assembler "call.*__sse_savms64f?_12" } } */ +/* { dg-final { scan-assembler "jmp.*__sse_resms64f?x_12" } } */ void __attribute__((sysv_abi)) a() { } @@ -9,6 +9,5 @@ void __attribute__((sysv_abi)) a() { static void __attribute__((sysv_abi)) (*volatile a_noinfo)() = a; void __attribute__((ms_abi)) b() { - __asm__ __volatile__ ("" :::"rbx", "rbp", "r12", "r13", "r14", "r15"); a_noinfo (); } diff --git a/gcc/testsuite/gcc.target/i386/pr82196-2.c b/gcc/testsuite/gcc.target/i386/pr82196-2.c index 8fe58411d5e..7166d068bc1 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-2.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-2.c @@ -1,7 +1,7 @@ /* { dg-do compile { target lp64 } } */ /* { dg-options "-mavx -mcall-ms2sysv-xlogues -O2" } */ -/* { dg-final { scan-assembler "call.*__avx_savms64_18" } } */ -/* { dg-final { scan-assembler "jmp.*__avx_resms64x_18" } } */ +/* { dg-final { scan-assembler "call.*__avx_savms64f?_12" } } */ +/* { dg-final { scan-assembler "jmp.*__avx_resms64f?x_12" } } */ void __attribute__((sysv_abi)) a() { } @@ -9,6 +9,5 @@ void __attribute__((sysv_abi)) a() { static void __attribute__((sysv_abi)) (*volatile a_noinfo)() = a; void __attribute__((ms_abi)) b() { - __asm__ __volatile__ ("" :::"rbx", "rbp", "r12", "r13", "r14", "r15"); a_noinfo (); } diff --git a/libgcc/config/i386/i386-asm.h b/libgcc/config/i386/i386-asm.h index 424e0f72aac..91e1c0123ff 100644 --- a/libgcc/config/i386/i386-asm.h +++ b/libgcc/config/i386/i386-asm.h @@ -69,13 +69,15 @@ ASMNAME(fn): #ifdef MS2SYSV_STUB_AVX # define MS2SYSV_STUB_PREFIX __avx_ -# define MOVAPS vmovaps +# ifdef HAVE_AS_AVX +# define MOVAPS vmovaps +# endif #elif defined(MS2SYSV_STUB_SSE) # define MS2SYSV_STUB_PREFIX __sse_ # define MOVAPS movaps #endif -#if defined (MS2SYSV_STUB_PREFIX) && defined (MOVAPS) +#if defined (MS2SYSV_STUB_PREFIX) # define MS2SYSV_STUB_BEGIN(base_name) \ HIDDEN_FUNC(PASTE2(MS2SYSV_STUB_PREFIX, base_name)) @@ -83,8 +85,10 @@ ASMNAME(fn): # define
Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV
On 09/19/2017 01:58 AM, Jakub Jelinek wrote: > What can be done in libgcc is detect in configure whether the assembler > supports AVX, and if not, provide some alternative (e.g. because the insns > are always the same, you could just code them as .byte or something similar). > > Say like: > --- i386-asm.h2017-09-18 18:34:30.917126996 +0200 > +++ i386-asm.h2017-09-19 08:56:58.829559038 +0200 > @@ -70,6 +70,7 @@ ASMNAME(fn): > #ifdef MS2SYSV_STUB_AVX > # define MS2SYSV_STUB_PREFIX __avx_ > # define MOVAPS vmovaps > +# define BYTE .byte > #elif defined(MS2SYSV_STUB_SSE) > # define MS2SYSV_STUB_PREFIX __sse_ > # define MOVAPS movaps > @@ -84,7 +85,8 @@ ASMNAME(fn): > FUNC_END(PASTE2(MS2SYSV_STUB_PREFIX, base_name)) > > /* Save SSE registers 6-15. off is the offset of rax to get to xmm6. */ > -# define SSE_SAVE \ > +# ifdef HAVE_AS_AVX I'm not exactly an autotools expert, but libtim defines HAVE_AS_AVX from libitm/acinclude.m4 -- of course I need it in libgcc. Similarly, gcc has a nice generic gcc_GAS_CHECK_FEATURE macro in gcc/acinclude.m4 which it uses for all of its HAVE_AS_* macro tests defined in gcc/configure.ac. I can just copy, paste and edit what's in libitm, but I find that rather distasteful. Is there a cleaner way to do this? Can I suck gcc_GAS_CHECK_FEATURE and it's deps out of gcc/acinclude.m4 and put it somewhere central, like config/as.m4? The upside would be the ability to make HAVE_AS_* macros in other sub-projects more uniform. Alternatively, I can just do the copy and paste and deal with it -- it's not that much code. :) Also, is there any sense in doing this same check for SSE support in the assembler? It's been out for 18 years now. Thanks, Daniel
Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV
On 09/19/2017 07:13 AM, Rainer Orth wrote: > Daniel Santos <daniel.san...@pobox.com> writes: > >> On 09/17/2017 10:53 AM, Uros Bizjak wrote: >>> OK. >>> >>> Thanks, >>> Uros. >> Thanks. I should have posted this Friday when my tests finished, but >> I'll be committing with one minor change so tests don't run on m32 or mx32: >> >> --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c >> +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c >> @@ -1,4 +1,4 @@ >> -/* { dg-do compile } */ >> +/* { dg-do compile { target lp64 } } */ >> /* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */ >> /* { dg-final { scan-assembler "call.*__sse_savms64_18" } } */ >> /* { dg-final { scan-assembler "jmp.*__sse_resms64x_18" } } */ >> diff --git a/gcc/testsuite/gcc.target/i386/pr82196-2.c >> b/gcc/testsuite/gcc.target/i386/pr82196-2.c >> index 31705bee29b..8fe58411d5e 100644 >> --- a/gcc/testsuite/gcc.target/i386/pr82196-2.c >> +++ b/gcc/testsuite/gcc.target/i386/pr82196-2.c >> @@ -1,4 +1,4 @@ >> -/* { dg-do compile } */ >> +/* { dg-do compile { target lp64 } } */ >> /* { dg-options "-mavx -mcall-ms2sysv-xlogues -O2" } */ >> /* { dg-final { scan-assembler "call.*__avx_savms64_18" } } */ >> /* { dg-final { scan-assembler "jmp.*__avx_resms64x_18" } } */ >> >> Other than that, full regression tests pass. > However, they do FAIL on 64-bit Solaris/x86: > > +FAIL: gcc.target/i386/pr82196-1.c (test for excess errors) > > Excess errors: > /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/pr82196-1.c:14:1: > error: bp cannot be used in asm here > > +UNRESOLVED: gcc.target/i386/pr82196-1.c scan-assembler call.*__sse_savms64_18 > +UNRESOLVED: gcc.target/i386/pr82196-1.c scan-assembler jmp.*__sse_resms64x_18 > +FAIL: gcc.target/i386/pr82196-2.c (test for excess errors) > +UNRESOLVED: gcc.target/i386/pr82196-2.c scan-assembler call.*__avx_savms64_18 > +UNRESOLVED: gcc.target/i386/pr82196-2.c scan-assembler jmp.*__avx_resms64x_18 > > Rainer Sorry about that, I forgot about Solaris' default enabled frame pointers. I don't even need a test this complicated, I'll make it much simpler. Thanks, Daniel
Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV
On 09/19/2017 01:58 AM, Jakub Jelinek wrote: > On Mon, Sep 18, 2017 at 06:10:29PM -0500, Daniel Santos wrote: >> Mike, can you take a look at this please? >> >> On 09/18/2017 10:17 AM, Dominique d'Humières wrote: >>> This patch (r252896) breaks bootstrap on x86_64-apple-darwin10 configured >>> with >>> >>> ../work/configure --prefix=/opt/gcc/gcc8w >>> --enable-languages=c,c++,fortran,objc,obj-c++,ada,lto >>> --with-gmp=/opt/mp-new --with-system-zlib --with-isl=/opt/mp-new >>> --enable-lto --enable-plugin >>> >>> /opt/gcc/build_w/./gcc/xgcc -B/opt/gcc/build_w/./gcc/ >>> -B/opt/gcc/gcc8w/x86_64-apple-darwin10.8.0/bin/ >>> -B/opt/gcc/gcc8w/x86_64-apple-darwin10.8.0/lib/ -isystem >>> /opt/gcc/gcc8w/x86_64-apple-darwin10.8.0/include -isystem >>> /opt/gcc/gcc8w/x86_64-apple-darwin10.8.0/sys-include-g -O2 -O2 -g -O2 >>> -DIN_GCC-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format >>> -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem >>> ./include -mmacosx-version-min=10.5 -pipe -fno-common -g -DIN_LIBGCC2 >>> -fbuilding-libgcc -fno-stack-protector -mmacosx-version-min=10.5 -pipe >>> -fno-common -I. -I. -I../.././gcc -I../../../work/libgcc >>> -I../../../work/libgcc/. -I../../../work/libgcc/../gcc >>> -I../../../work/libgcc/../include -DHAVE_CC_TLS -DUSE_EMUTLS -o >>> avx_savms64_s.o -MT avx_savms64_s.o -MD -MP -MF avx_savms64_s.dep -DSHARED >>> -c -xassembler-with-cpp ../../../work/libgcc/config/i386/avx_savms64.S >>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps >>> %xmm15,-0x30(%rax)' >>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps >>> %xmm14,-0x20(%rax)' >>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps >>> %xmm13,-0x10(%rax)' >>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps >>> %xmm12, (%rax)' >>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps >>> %xmm11, 0x10(%rax)' >>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps >>> %xmm10, 0x20(%rax)' >>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps >>> %xmm9, 0x30(%rax)' >>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps >>> %xmm8, 0x40(%rax)' >>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps >>> %xmm7, 0x50(%rax)' >>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps >>> %xmm6, 0x60(%rax)' >>> make[3]: *** [avx_savms64_s.o] Error 1 >>> >>> Dominique >> Thanks for the report. AVX has been out since early 2011 and Wikipedia >> claims that AVX support was added to OSX in version 10.6.8 in June 2011 >> and you seem to be using 10.8.0. I would presume that also means that >> the assembler supports it. So I'm going to guess that it's the >> "-mmacosx-version-min=10.5" parameter. Can you please try setting that >> to 10.6.8 and let me know the result? I don't know what the minimum >> system requirements for GCC 8 are going to be, but if it includes these >> older versions of OSX then I'll have to figure out how to cope with it >> in the libgcc build. > What can be done in libgcc is detect in configure whether the assembler > supports AVX, and if not, provide some alternative (e.g. because the insns > are always the same, you could just code them as .byte or something similar). > > Say like: > --- i386-asm.h2017-09-18 18:34:30.917126996 +0200 > +++ i386-asm.h2017-09-19 08:56:58.829559038 +0200 > @@ -70,6 +70,7 @@ ASMNAME(fn): > #ifdef MS2SYSV_STUB_AVX > # define MS2SYSV_STUB_PREFIX __avx_ > # define MOVAPS vmovaps > +# define BYTE .byte > #elif defined(MS2SYSV_STUB_SSE) > # define MS2SYSV_STUB_PREFIX __sse_ > # define MOVAPS movaps > @@ -84,7 +85,8 @@ ASMNAME(fn): > FUNC_END(PASTE2(MS2SYSV_STUB_PREFIX, base_name)) > > /* Save SSE registers 6-15. off is the offset of rax to get to xmm6. */ > -# define SSE_SAVE \ > +# ifdef HAVE_AS_AVX > +# define SSE_SAVE \ > MOVAPS %xmm15,-0x30(%rax); \ > MOVAPS %xmm14,-0x20(%rax); \ > MOVAPS %xmm13,-0x10(%rax); \ > @@ -95,6 +97,21 @@ ASMNAME(fn): > MOVAPS %xmm8, 0x40(%rax); \ > MOVAPS %xmm7, 0x50(%rax); \ > MOVAPS %xmm6, 0x60(%rax) > +# else > +/* If the assembler doesn't have AVX support,
Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV
Mike, can you take a look at this please? On 09/18/2017 10:17 AM, Dominique d'Humières wrote: > This patch (r252896) breaks bootstrap on x86_64-apple-darwin10 configured with > > ../work/configure --prefix=/opt/gcc/gcc8w > --enable-languages=c,c++,fortran,objc,obj-c++,ada,lto --with-gmp=/opt/mp-new > --with-system-zlib --with-isl=/opt/mp-new --enable-lto --enable-plugin > > /opt/gcc/build_w/./gcc/xgcc -B/opt/gcc/build_w/./gcc/ > -B/opt/gcc/gcc8w/x86_64-apple-darwin10.8.0/bin/ > -B/opt/gcc/gcc8w/x86_64-apple-darwin10.8.0/lib/ -isystem > /opt/gcc/gcc8w/x86_64-apple-darwin10.8.0/include -isystem > /opt/gcc/gcc8w/x86_64-apple-darwin10.8.0/sys-include-g -O2 -O2 -g -O2 > -DIN_GCC-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format > -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem > ./include -mmacosx-version-min=10.5 -pipe -fno-common -g -DIN_LIBGCC2 > -fbuilding-libgcc -fno-stack-protector -mmacosx-version-min=10.5 -pipe > -fno-common -I. -I. -I../.././gcc -I../../../work/libgcc > -I../../../work/libgcc/. -I../../../work/libgcc/../gcc > -I../../../work/libgcc/../include -DHAVE_CC_TLS -DUSE_EMUTLS -o > avx_savms64_s.o -MT avx_savms64_s.o -MD -MP -MF avx_savms64_s.dep -DSHARED -c > -xassembler-with-cpp ../../../work/libgcc/config/i386/avx_savms64.S > ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps > %xmm15,-0x30(%rax)' > ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps > %xmm14,-0x20(%rax)' > ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps > %xmm13,-0x10(%rax)' > ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps > %xmm12, (%rax)' > ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps > %xmm11, 0x10(%rax)' > ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps > %xmm10, 0x20(%rax)' > ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps > %xmm9, 0x30(%rax)' > ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps > %xmm8, 0x40(%rax)' > ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps > %xmm7, 0x50(%rax)' > ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps > %xmm6, 0x60(%rax)' > make[3]: *** [avx_savms64_s.o] Error 1 > > Dominique Thanks for the report. AVX has been out since early 2011 and Wikipedia claims that AVX support was added to OSX in version 10.6.8 in June 2011 and you seem to be using 10.8.0. I would presume that also means that the assembler supports it. So I'm going to guess that it's the "-mmacosx-version-min=10.5" parameter. Can you please try setting that to 10.6.8 and let me know the result? I don't know what the minimum system requirements for GCC 8 are going to be, but if it includes these older versions of OSX then I'll have to figure out how to cope with it in the libgcc build. Thanks, Daniel
Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV
On 09/17/2017 10:53 AM, Uros Bizjak wrote: > OK. > > Thanks, > Uros. Thanks. I should have posted this Friday when my tests finished, but I'll be committing with one minor change so tests don't run on m32 or mx32: --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target lp64 } } */ /* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */ /* { dg-final { scan-assembler "call.*__sse_savms64_18" } } */ /* { dg-final { scan-assembler "jmp.*__sse_resms64x_18" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr82196-2.c b/gcc/testsuite/gcc.target/i386/pr82196-2.c index 31705bee29b..8fe58411d5e 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-2.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-2.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target lp64 } } */ /* { dg-options "-mavx -mcall-ms2sysv-xlogues -O2" } */ /* { dg-final { scan-assembler "call.*__avx_savms64_18" } } */ /* { dg-final { scan-assembler "jmp.*__avx_resms64x_18" } } */ Other than that, full regression tests pass. Thanks, Daniel
[PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV
I made a silly mistake in libgcc by testing the cpp macro __AVX__ to determine rather to use movaps or vmovaps in the stubs. This resulted in the stubs choice of instruction being decided by the machine flags when the compiler was built rather than those being supplied at the command line. This patch splits stubs into separate sse and avx versions so that both are available. gcc: config/i386/i386.c: (xlogue_layout::STUB_NAME_MAX_LEN): Increase to 20 bytes. (xlogue_layout::s_stub_names): Add an additional size-2 diminsion. (xlogue_layout::get_stub_name): Modify to select the appropairate sse and avx version of the stub. gcc/testsuite: gcc.target/i386/pr82196-1.c: New test. gcc.target/i386/pr82196-2.c: Likewise. libgcc: config/i386/i386-asm.h (PASTE2): New macro. (ASMNAME): Modify to use PASTE2. (MS2SYSV_STUB_PREFIX): New macro for isa prefix. (MS2SYSV_STUB_BEGIN, MS2SYSV_STUB_END): New macros for stub headers. config/i386/resms64.S: Rename to a header file, use MS2SYSV_STUB_BEGIN instead of HIDDEN_FUNC and MS2SYSV_STUB_END instead of FUNC_END. config/i386/resms64f.S: Likewise. config/i386/resms64fx.S: Likewise. config/i386/resms64x.S: Likewise. config/i386/savms64.S: Likewise. config/i386/savms64f.S: Likewise. config/i386/avx_resms64.S: New file that only defines a macro and includes it's corresponding header file. config/i386/avx_resms64f.S: Likewise. config/i386/avx_resms64fx.S: Likewise. config/i386/avx_resms64x.S: Likewise. config/i386/avx_savms64.S: Likewise. config/i386/avx_savms64f.S: Likewise. config/i386/sse_resms64.S: Likewise. config/i386/sse_resms64f.S: Likewise. config/i386/sse_resms64fx.S: Likewise. config/i386/sse_resms64x.S: Likewise. config/i386/sse_savms64.S: Likewise. config/i386/sse_savms64f.S: Likewise. config/i386/t-msabi: Modified to add avx and sse versions of stubs. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 15 ++- gcc/testsuite/gcc.target/i386/pr82196-1.c | 14 ++ gcc/testsuite/gcc.target/i386/pr82196-2.c | 14 ++ libgcc/config/i386/avx_resms64.S| 2 ++ libgcc/config/i386/avx_resms64f.S | 2 ++ libgcc/config/i386/avx_resms64fx.S | 2 ++ libgcc/config/i386/avx_resms64x.S | 2 ++ libgcc/config/i386/avx_savms64.S| 2 ++ libgcc/config/i386/avx_savms64f.S | 2 ++ libgcc/config/i386/i386-asm.h | 34 - libgcc/config/i386/{resms64.S => resms64.h} | 28 ++-- libgcc/config/i386/{resms64f.S => resms64f.h} | 24 - libgcc/config/i386/{resms64fx.S => resms64fx.h} | 24 - libgcc/config/i386/{resms64x.S => resms64x.h} | 28 ++-- libgcc/config/i386/{savms64.S => savms64.h} | 28 ++-- libgcc/config/i386/{savms64f.S => savms64f.h} | 24 - libgcc/config/i386/sse_resms64.S| 2 ++ libgcc/config/i386/sse_resms64f.S | 2 ++ libgcc/config/i386/sse_resms64fx.S | 2 ++ libgcc/config/i386/sse_resms64x.S | 2 ++ libgcc/config/i386/sse_savms64.S| 2 ++ libgcc/config/i386/sse_savms64f.S | 2 ++ libgcc/config/i386/t-msabi | 18 - 23 files changed, 173 insertions(+), 102 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr82196-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr82196-2.c create mode 100644 libgcc/config/i386/avx_resms64.S create mode 100644 libgcc/config/i386/avx_resms64f.S create mode 100644 libgcc/config/i386/avx_resms64fx.S create mode 100644 libgcc/config/i386/avx_resms64x.S create mode 100644 libgcc/config/i386/avx_savms64.S create mode 100644 libgcc/config/i386/avx_savms64f.S rename libgcc/config/i386/{resms64.S => resms64.h} (76%) rename libgcc/config/i386/{resms64f.S => resms64f.h} (79%) rename libgcc/config/i386/{resms64fx.S => resms64fx.h} (79%) rename libgcc/config/i386/{resms64x.S => resms64x.h} (77%) rename libgcc/config/i386/{savms64.S => savms64.h} (76%) rename libgcc/config/i386/{savms64f.S => savms64f.h} (79%) create mode 100644 libgcc/config/i386/sse_resms64.S create mode 100644 libgcc/config/i386/sse_resms64f.S create mode 100644 libgcc/config/i386/sse_resms64fx.S create mode 100644 libgcc/config/i386/sse_resms64x.S create mode 100644 libgcc/config/i386/sse_savms64.S create mode 100644 libgcc/config/i386/sse_savms64f.S diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b2b02acc58a..f0d7d0eb196 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@
Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW
On 08/23/2017 01:12 AM, Uros Bizjak wrote: > On Wed, Aug 23, 2017 at 7:23 AM, Daniel Santos <daniel.san...@pobox.com> > wrote: >> On 08/22/2017 03:00 PM, Uros Bizjak wrote: >>> On Tue, Aug 22, 2017 at 9:47 PM, Daniel Santos <daniel.san...@pobox.com> >>> wrote: >>>>> Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to >>>>> UNKNOWN_ABI. >>>> It would seem to me that UNSPECIFIED_ABI would be a better value name. >>>> >>>> Also, I don't really understand what opts_set and opts are, except that I >>>> had >>>> guessed opts_set is what the user asked for (or didn't ask for) and opts is >>>> what we're going to actually use. Am I close? >>> Yes. opts_set is a flag that user specified an option at the command line. >>> >>> However, I fail to see what is the problem. If nothing was specified, >>> then opts->x_ix86_abi is set to DEFAULT_ABI. >> That is not what is happening. If -mabi=sysv is specified, then the >> test (!opts_set->x_ix86_abi) is true since the value of SYSV_ABI is >> zero. When that is evaluated as true, then the abi is set to >> DEFAULT_ABI, which on Windows is MS_ABI, thus ignoring the command line >> option. > Let's use the following patch: > > --cut here-- > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 3c82ae64f4f2..f8590f663285 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -5682,7 +5682,7 @@ ix86_option_override_internal (bool main_args_p, > ? PMODE_DI : PMODE_SI; > >if (!opts_set->x_ix86_abi) > -opts->x_ix86_abi = DEFAULT_ABI; > +printf ("Using default ABI\n"), opts->x_ix86_abi = DEFAULT_ABI; > >/* For targets using ms ABI enable ms-extensions, if not > explicit turned off. For non-ms ABI we turn off this > --cut here-- > > $ ./cc1 -O2 -quiet hello.c > Using default ABI > $ ./cc1 -O2 -mabi=sysv -quiet hello.c > $ > $ ./cc1 -O2 -mabi=sysv -quiet hello.c > $ > > Again, opts_set is set to true when the option is specified on the > command line, it has nothing to do with the value of the option. Interesting, I get the same result and in fact I can't reproduce the bug anymore. Either I made a mistake somewhere (likely) or something else fixed the problem (less likely). I'll try again from where the trunk was when I filed the bug and close it either invalid or fixed depending upon which it is. Thanks! Daniel >> I'm guessing that if we don't specify an Init() option then it will >> default to zero? We just need a valid way to differentiate when >> -mabi=sysv has been passed from when nothing has been passed. > Yes, it defaults to zero, but since we live in c++ world nowadays, we > can't initialize enum with integer zero... > > Uros.
Re: [PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available
On 08/23/2017 08:26 AM, Uros Bizjak wrote: >> @@ -1822,6 +1845,7 @@ proc check_avx2_hw_available { } { >> expr 0 >> } else { >> check_runtime_nocache avx2_hw_available { >> + #include > Why is the above include needed? It is only needed to #define NULL. Without the include, I've had this function fail due to NULL being undefined. Daniel
Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW
On 08/22/2017 03:00 PM, Uros Bizjak wrote: > On Tue, Aug 22, 2017 at 9:47 PM, Daniel Santos <daniel.san...@pobox.com> > wrote: >>> Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to >>> UNKNOWN_ABI. >> It would seem to me that UNSPECIFIED_ABI would be a better value name. >> >> Also, I don't really understand what opts_set and opts are, except that I had >> guessed opts_set is what the user asked for (or didn't ask for) and opts is >> what we're going to actually use. Am I close? > Yes. opts_set is a flag that user specified an option at the command line. > > However, I fail to see what is the problem. If nothing was specified, > then opts->x_ix86_abi is set to DEFAULT_ABI. That is not what is happening. If -mabi=sysv is specified, then the test (!opts_set->x_ix86_abi) is true since the value of SYSV_ABI is zero. When that is evaluated as true, then the abi is set to DEFAULT_ABI, which on Windows is MS_ABI, thus ignoring the command line option. > Probably we don't need > Init(SYSV_ABI) in mabi= declaration at all. I'm guessing that if we don't specify an Init() option then it will default to zero? We just need a valid way to differentiate when -mabi=sysv has been passed from when nothing has been passed. Daniel > > Uros. > >> I'm re-running tests, so if they pass is this OK? >> >> Thanks, >> Daniel >> --- >> gcc/config/i386/i386-opts.h | 5 +++-- >> gcc/config/i386/i386.c | 3 +-- >> gcc/config/i386/i386.opt| 2 +- >> 3 files changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h >> index 542cd0f3d67..a1d1552a3c6 100644 >> --- a/gcc/config/i386/i386-opts.h >> +++ b/gcc/config/i386/i386-opts.h >> @@ -44,8 +44,9 @@ last_alg >> /* Available call abi. */ >> enum calling_abi >> { >> - SYSV_ABI = 0, >> - MS_ABI = 1 >> + UNSPECIFIED_ABI = 0, >> + SYSV_ABI = 1, >> + MS_ABI = 2 >> }; >> >> enum fpmath_unit >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >> index 650bcbc65ae..c08ad55fcd9 100644 >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -5681,12 +5681,11 @@ ix86_option_override_internal (bool main_args_p, >> opts->x_ix86_pmode = TARGET_LP64_P (opts->x_ix86_isa_flags) >> ? PMODE_DI : PMODE_SI; >> >> - if (!opts_set->x_ix86_abi) >> + if (opts_set->x_ix86_abi == UNSPECIFIED_ABI) >> opts->x_ix86_abi = DEFAULT_ABI; >> >>if (opts->x_ix86_abi == MS_ABI && TARGET_X32_P (opts->x_ix86_isa_flags)) >> error ("-mabi=ms not supported with X32 ABI"); >> - gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI); >> >>/* For targets using ms ABI enable ms-extensions, if not >> explicit turned off. For non-ms ABI we turn off this >> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt >> index cd564315f04..f7b9f9707f7 100644 >> --- a/gcc/config/i386/i386.opt >> +++ b/gcc/config/i386/i386.opt >> @@ -525,7 +525,7 @@ Target Report Mask(IAMCU) >> Generate code that conforms to Intel MCU psABI. >> >> mabi= >> -Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) Init(SYSV_ABI) >> +Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) >> Init(UNSPECIFIED_ABI) >> Generate code that conforms to the given ABI. >> >> Enum >> -- >> 2.13.3 >>
[PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available
Changes to lib/target-supports.exp and documentation: * Add effective-targets avx512f and avx512f_runtime (needed for new tests). * Corrects bug in check_avx2_hw_available. * Adds documentation for effective-targets avx2, avx2_runtime (both missing), avx512f and avx512f_runtime. The following tests are added. The testcase in the PR is used as a base and relevant variants are added to test other factors affected by the patch set. pr80969-1.c Base test case. pr80969-2.c With ms to sysv call. pr80969-2a.c With ms to sysv call using stubs. pr80969-3.c With alloca (for DRAP test). pr80969-4.c With va_args passed via va_list pr80969-4a.c With va_args passed via va_list and ms to sysv call. pr80969-4b.c With va_args passed via va_list and ms to sysv call using stubs. pr80969-4.h Common header for pr80969-4*.c. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/doc/sourcebuild.texi | 12 +++ gcc/testsuite/gcc.target/i386/pr80969-1.c | 16 gcc/testsuite/gcc.target/i386/pr80969-2.c | 27 +++ gcc/testsuite/gcc.target/i386/pr80969-2a.c | 8 ++ gcc/testsuite/gcc.target/i386/pr80969-3.c | 32 gcc/testsuite/gcc.target/i386/pr80969-4.c | 9 +++ gcc/testsuite/gcc.target/i386/pr80969-4.h | 119 + gcc/testsuite/gcc.target/i386/pr80969-4a.c | 9 +++ gcc/testsuite/gcc.target/i386/pr80969-4b.c | 9 +++ gcc/testsuite/lib/target-supports.exp | 66 10 files changed, 307 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.h create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index e6313dc031e..0bf4d6afeb6 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1855,6 +1855,18 @@ Target supports compiling @code{avx} instructions. @item avx_runtime Target supports the execution of @code{avx} instructions. +@item avx2 +Target supports compiling @code{avx2} instructions. + +@item avx2_runtime +Target supports the execution of @code{avx2} instructions. + +@item avx512f +Target supports compiling @code{avx512f} instructions. + +@item avx512f_runtime +Target supports the execution of @code{avx512f} instructions. + @item cell_hw Test system can execute AltiVec and Cell PPU instructions. diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c b/gcc/testsuite/gcc.target/i386/pr80969-1.c new file mode 100644 index 000..e0520b45c40 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c @@ -0,0 +1,16 @@ +/* { dg-do run { target { ! x32 } } } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +int a[56]; +int b; +int main (int argc, char *argv[]) { + int c; + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c b/gcc/testsuite/gcc.target/i386/pr80969-2.c new file mode 100644 index 000..f885dee6512 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c @@ -0,0 +1,27 @@ +/* { dg-do run { target { { ! x32 } && avx512f_runtime } } } */ +/* { dg-do compile { target { { ! x32 } && { ! avx512f_runtime } } } } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test when calling a sysv func. */ + +int a[56]; +int b; + +static void __attribute__((sysv_abi)) sysv () +{ +} + +void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv; + +int main (int argc, char *argv[]) { + int c; + sysv_noinfo (); + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c b/gcc/testsuite/gcc.target/i386/pr80969-2a.c new file mode 100644 index 000..baea0796d24 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c @@ -0,0 +1,8 @@ +/* { dg-do run { target { lp64 && avx512f_runtime } } } */ +/* { dg-do compile { target { lp64 && { ! avx512f_runtime } } } } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test when calling a sysv func using save/restore stubs. */ + +#include "pr80969-2.c" diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c b/gcc/testsuite/gcc.target/i386/pr80969-3.c new file mode 100644 index 000..d902a771cc8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c @@ -0,0 +1,32 @
[PATCH 3/4] [i386] Modify SP realignment in ix86_expand_prologue, et. al.
My first version of this patch inited m->fs.sp_realigned_fp_last with the value of m->fs.sp_offset prior to performing the stack realignment. I had forgotten, however, that when we're saving GP regs using MOV that we delay SP modification as long as possible so that the value of m->fs.sp_offset at this point is correct when we've used push, but incorrect when we've used mov. This has been tested on both x86_64-pc-linux-gnu{,x32} with --target_board=unix/\{-m64,-mx32,-m32\}. Original patch description: The SP allocation calculation is now done in ix86_compute_frame_layout and the result stored in ix86_frame::stack_realign_allocate. This change also updates comments for choose_baseaddr to clarify that the alignment returned doesn't necessarily reflect the alignment of the cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an alignment of 64 bytes). Since the alignment required may be more than 16-bytes, we cannot defer SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so that function needs to be updated as well. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 58 -- 1 file changed, 32 insertions(+), 26 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 30e84dd5303..dbc771da8aa 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13359,10 +13359,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx _reg, } /* Return an RTX that points to CFA_OFFSET within the stack frame and - the alignment of address. If align is non-null, it should point to + the alignment of address. If ALIGN is non-null, it should point to an alignment value (in bits) that is preferred or zero and will - recieve the alignment of the base register that was selected. The - valid base registers are taken from CFUN->MACHINE->FS. */ + recieve the alignment of the base register that was selected, + irrespective of rather or not CFA_OFFSET is a multiple of that + alignment value. + + The valid base registers are taken from CFUN->MACHINE->FS. */ static rtx choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) @@ -14445,35 +14448,35 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame ) rtx sym, addr; rtx rax = gen_rtx_REG (word_mode, AX_REG); const struct xlogue_layout = xlogue_layout::get_instance (); - HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset; - HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset; - HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in (); + HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset; + + /* AL should only be live with sysv_abi. */ + gcc_assert (!ix86_eax_live_at_start_p ()); + + /* Setup RAX as the stub's base pointer. We use stack_realign_offset rather + we've actually realigned the stack or not. */ + align = GET_MODE_ALIGNMENT (V4SFmode); + addr = choose_baseaddr (frame.stack_realign_offset + + xlogue.get_stub_ptr_offset (), ); + gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); + emit_insn (gen_rtx_SET (rax, addr)); - /* Verify that the incoming stack 16-byte alignment offset matches the - layout we're using. */ - gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD)); + /* Allocate stack if not already done. */ + if (allocate > 0) + pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, + GEN_INT (-allocate), -1, false); /* Get the stub symbol. */ sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP : XLOGUE_STUB_SAVE); RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym); - /* Setup RAX as the stub's base pointer. */ - align = GET_MODE_ALIGNMENT (V4SFmode); - addr = choose_baseaddr (rax_offset, ); - gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); - insn = emit_insn (gen_rtx_SET (rax, addr)); - - gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ()); - pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, -GEN_INT (-stack_alloc_size), -1, -m->fs.cfa_reg == stack_pointer_rtx); for (i = 0; i < ncregs; ++i) { const xlogue_layout::reginfo = xlogue.get_reginfo (i); rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode), r.regno); - RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);; + RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset); } gcc_assert (vi == (unsigned)GET_NUM_ELEM (v)); @@ -14728,14 +14731,15 @@ ix86_expand_prologue (void) gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT); /* Record last valid frame pointer
[PATCH 2/4] [i386] Modify ix86_compute_frame_layout
These changes affect how the stack frame is calculated from the region starting at frame.reg_save_offset until frame.frame_pointer_offset, which includes either the stub save area or the (inline) SSE register save area and the va_args register save area. The calculation used when not realigning the stack pointer is the same, but when when realigning we calculate the 16-byte aligned space needed in reverse so that the stack realignment boundary at frame.stack_realign_offset may not necessarily be a multiple of stack_alignment_needed, but the value of frame.frame_pointer_offset will. This results in a properly aligned stack for the function body and avoids wasting stack space. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 116 + gcc/config/i386/i386.h | 2 +- 2 files changed, 80 insertions(+), 38 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 601e3ef47f6..30e84dd5303 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12960,6 +12960,14 @@ ix86_compute_frame_layout (void) gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT); gcc_assert (preferred_alignment <= stack_alignment_needed); + /* The only ABI saving SSE regs should be 64-bit ms_abi. */ + gcc_assert (TARGET_64BIT || !frame->nsseregs); + if (TARGET_64BIT && m->call_ms2sysv) +{ + gcc_assert (stack_alignment_needed >= 16); + gcc_assert (!frame->nsseregs); +} + /* For SEH we have to limit the amount of code movement into the prologue. At present we do this via a BLOCKAGE, at which point there's very little scheduling that can be done, which means that there's very little point @@ -13022,54 +13030,88 @@ ix86_compute_frame_layout (void) if (TARGET_SEH) frame->hard_frame_pointer_offset = offset; - /* When re-aligning the stack frame, but not saving SSE registers, this - is the offset we want adjust the stack pointer to. */ - frame->stack_realign_allocate_offset = offset; + /* Calculate the size of the va-arg area (not including padding, if any). */ + frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size; - /* The re-aligned stack starts here. Values before this point are not - directly comparable with values below this point. Use sp_valid_at - to determine if the stack pointer is valid for a given offset and - fp_valid_at for the frame pointer. */ if (stack_realign_fp) -offset = ROUND_UP (offset, stack_alignment_needed); - frame->stack_realign_offset = offset; - - if (TARGET_64BIT && m->call_ms2sysv) { - gcc_assert (stack_alignment_needed >= 16); - gcc_assert (!frame->nsseregs); + /* We may need a 16-byte aligned stack for the remainder of the +register save area, but the stack frame for the local function +may require a greater alignment if using AVX/2/512. In order +to avoid wasting space, we first calculate the space needed for +the rest of the register saves, add that to the stack pointer, +and then realign the stack to the boundary of the start of the +frame for the local function. */ + HOST_WIDE_INT space_needed = 0; + HOST_WIDE_INT sse_reg_space_needed = 0; - m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD); - offset += xlogue_layout::get_instance ().get_stack_space_used (); -} + if (TARGET_64BIT) + { + if (m->call_ms2sysv) + { + m->call_ms2sysv_pad_in = 0; + space_needed = xlogue_layout::get_instance ().get_stack_space_used (); + } - /* Align and set SSE register save area. */ - else if (frame->nsseregs) -{ - /* The only ABI that has saved SSE registers (Win64) also has a -16-byte aligned default stack. However, many programs violate -the ABI, and Wine64 forces stack realignment to compensate. + else if (frame->nsseregs) + /* The only ABI that has saved SSE registers (Win64) also has a + 16-byte aligned default stack. However, many programs violate + the ABI, and Wine64 forces stack realignment to compensate. */ + space_needed = frame->nsseregs * 16; + + sse_reg_space_needed = space_needed = ROUND_UP (space_needed, 16); + + /* 64-bit frame->va_arg_size should always be a multiple of 16, but +rounding to be pedantic. */ + space_needed = ROUND_UP (space_needed + frame->va_arg_size, 16); + } + else + space_needed = frame->va_arg_size; + + /* Record the allocation size required prior to the realignment AND. */ + frame->stack_realign_allocate = space_needed; + + /* The re-aligned stack starts at frame->stack_realign_offset. Values +before this point are not directly co
[PATCH 1/4] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at
When we realign the stack frame (without DRAP), there may be a range of CFA offsets that should never be touched because they are alignment padding and any reference to them is almost certainly an error. Previously, only the offset of where the realigned stack frame starts was recorded and checked in sp_valid_at and fp_valid_at. This change adds sp_realigned_fp_last to struct machine_frame_state to record the last valid offset from which the frame pointer can be used when the stack pointer is realigned and modifies sp_valid_at and fp_valid_at to fail an assertion when passed an offset in the "no-man's land" between these two values. Comments for struct machine_frame_state incorrectly stated that a realigned stack pointer could be used to access offsets equal to or greater than sp_realigned_offset, but it is only valid for offsets that are greater. This was the (incorrect) behaviour of sp_valid_at and fp_valid_at prior to r250587 and this change now corrects the documentation and adds clarification of the CFA-relative calculation. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 45 ++--- gcc/config/i386/i386.h | 18 +- 2 files changed, 43 insertions(+), 20 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index c08ad55fcd9..601e3ef47f6 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13177,26 +13177,36 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT offset) return len; } -/* Determine if the stack pointer is valid for accessing the cfa_offset. - The register is saved at CFA - CFA_OFFSET. */ +/* Determine if the stack pointer is valid for accessing the CFA_OFFSET in + the frame save area. The register is saved at CFA - CFA_OFFSET. */ -static inline bool +static bool sp_valid_at (HOST_WIDE_INT cfa_offset) { const struct machine_frame_state = cfun->machine->fs; - return fs.sp_valid && !(fs.sp_realigned - && cfa_offset <= fs.sp_realigned_offset); + if (fs.sp_realigned && cfa_offset <= fs.sp_realigned_offset) +{ + /* Validate that the cfa_offset isn't in a "no-man's land". */ + gcc_assert (cfa_offset <= fs.sp_realigned_fp_last); + return false; +} + return fs.sp_valid; } -/* Determine if the frame pointer is valid for accessing the cfa_offset. - The register is saved at CFA - CFA_OFFSET. */ +/* Determine if the frame pointer is valid for accessing the CFA_OFFSET in + the frame save area. The register is saved at CFA - CFA_OFFSET. */ static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset) { const struct machine_frame_state = cfun->machine->fs; - return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned - && cfa_offset > fs.sp_realigned_offset); + if (fs.sp_realigned && cfa_offset > fs.sp_realigned_fp_last) +{ + /* Validate that the cfa_offset isn't in a "no-man's land". */ + gcc_assert (cfa_offset >= fs.sp_realigned_offset); + return false; +} + return fs.fp_valid; } /* Choose a base register based upon alignment requested, speed and/or @@ -14675,6 +14685,9 @@ ix86_expand_prologue (void) int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT; gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT); + /* Record last valid frame pointer offset. */ + m->fs.sp_realigned_fp_last = m->fs.sp_offset; + /* The computation of the size of the re-aligned stack frame means that we must allocate the size of the register save area before performing the actual alignment. Otherwise we cannot guarantee @@ -14688,13 +14701,15 @@ ix86_expand_prologue (void) insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx, stack_pointer_rtx, GEN_INT (-align_bytes))); - /* For the purposes of register save area addressing, the stack -pointer can no longer be used to access anything in the frame -below m->fs.sp_realigned_offset and the frame pointer cannot be -used for anything at or above. */ m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes); m->fs.sp_realigned = true; m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16; + /* The stack pointer may no longer be equal to CFA - m->fs.sp_offset. +Beyond this point, stack access should be done via choose_baseaddr or +by using sp_valid_at and fp_valid_at to determine the correct base +register. Henceforth, any CFA offset should be thought of as logical +and not physical. */ + gcc_assert (m->fs.sp_realigned_offset >= m->fs.sp_realigned_fp_last); gcc_assert (m->fs.sp_realign
[PATCH v4 0/4] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
I had to fix a few things for x32 compatibility and I this is ready now. H.J. tested on machine with avx512 (including x32) and I've tested both native x32 and normal x86_64 with m64, m32 and mx32 and all is well. I've made more changes to the tests so I'm just submitting a version 2 of the whole patch set. OK for trunk? 2017-08-22 Daniel Santos <daniel.san...@pobox.com> * config/i386/i386.h (ix86_frame::stack_realign_allocate_offset): Remove field. (ix86_frame::stack_realign_allocate): New field. (struct machine_frame_state): Modify comments. (machine_frame_state::sp_realigned_fp_end): New field. * config/i386/i386.c (ix86_compute_frame_layout): Rework stack frame layout calculation. (sp_valid_at): Add assertion to assure no attempt to access invalid offset of a realigned stack. (fp_valid_at): Likewise. (choose_baseaddr): Modify comments. (ix86_emit_outlined_ms2sysv_save): Adjust to changes in ix86_expand_prologue. (ix86_expand_prologue): Modify stack realignment and allocation. (ix86_expand_epilogue): Modify comments. 2017-08-22 Daniel Santos <daniel.san...@pobox.com> * gcc.target/i386/pr80969-1.c: New testcase. * gcc.target/i386/pr80969-2a.c: Likewise. * gcc.target/i386/pr80969-2.c: Likewise. * gcc.target/i386/pr80969-3.c: Likewise. * gcc.target/i386/pr80969-4a.c: Likewise. * gcc.target/i386/pr80969-4b.c: Likewise. * gcc.target/i386/pr80969-4.c: Likewise. * gcc.target/i386/pr80969-4.h: New header common to pr80969-4*.c Thanks, Daniel
[PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW
> Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to > UNKNOWN_ABI. It would seem to me that UNSPECIFIED_ABI would be a better value name. Also, I don't really understand what opts_set and opts are, except that I had guessed opts_set is what the user asked for (or didn't ask for) and opts is what we're going to actually use. Am I close? I'm re-running tests, so if they pass is this OK? Thanks, Daniel --- gcc/config/i386/i386-opts.h | 5 +++-- gcc/config/i386/i386.c | 3 +-- gcc/config/i386/i386.opt| 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h index 542cd0f3d67..a1d1552a3c6 100644 --- a/gcc/config/i386/i386-opts.h +++ b/gcc/config/i386/i386-opts.h @@ -44,8 +44,9 @@ last_alg /* Available call abi. */ enum calling_abi { - SYSV_ABI = 0, - MS_ABI = 1 + UNSPECIFIED_ABI = 0, + SYSV_ABI = 1, + MS_ABI = 2 }; enum fpmath_unit diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 650bcbc65ae..c08ad55fcd9 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5681,12 +5681,11 @@ ix86_option_override_internal (bool main_args_p, opts->x_ix86_pmode = TARGET_LP64_P (opts->x_ix86_isa_flags) ? PMODE_DI : PMODE_SI; - if (!opts_set->x_ix86_abi) + if (opts_set->x_ix86_abi == UNSPECIFIED_ABI) opts->x_ix86_abi = DEFAULT_ABI; if (opts->x_ix86_abi == MS_ABI && TARGET_X32_P (opts->x_ix86_isa_flags)) error ("-mabi=ms not supported with X32 ABI"); - gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI); /* For targets using ms ABI enable ms-extensions, if not explicit turned off. For non-ms ABI we turn off this diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index cd564315f04..f7b9f9707f7 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -525,7 +525,7 @@ Target Report Mask(IAMCU) Generate code that conforms to Intel MCU psABI. mabi= -Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) Init(SYSV_ABI) +Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) Init(UNSPECIFIED_ABI) Generate code that conforms to the given ABI. Enum -- 2.13.3
Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS
OK, the problem is at line 4014 of gcc/Makefile.in: $(MAKE) TESTSUITEDIR="$(TESTSUITEDIR)" RUNTESTFLAGS="$(RUNTESTFLAGS)" \ check-parallel-$* \ Even worse, one can inject arbitrary shell commands here, not that I can think of a scenario where it would be an actual security problem: RUNTESTFLAGS="i386.exp=a b\"; beep\"" check-c I presume that the solution would be to re-escape the contents of RUNTESTFLAGS. Daniel
Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS
On 08/22/2017 12:32 PM, Mike Stump wrote: > On Aug 22, 2017, at 10:32 AM, Daniel Santos <daniel.san...@pobox.com> wrote: >>> I would suggest "escaped or quoted." >>> The whole argument to RUNTESTFLAGS can be quoted in either single >>> or double quotes and, AFAICT, so can the space-separated test >>> names within it. >> Well, mysteriously, double quotes do not work. > Did you try the obvious: > > "\"pdf pdf\" pdf" > > ? I think it should work fine. I have found one additional working mechanism: RUNTESTFLAGS='i386.exp=\"pr80969-[12]*.c pr80969-4.c\"' But using double quotes for both does NOT work: RUNTESTFLAGS="i386.exp=\"pr80969-[12]*.c pr80969-4.c\"" So the three working options appears to be: 1. Escaping whitespace 2. Using double quotes for the whole value and single quotes for the file.exp=patterns expression 3. Using single quotes for the whole value and double quotes for the file.exp=patterns expression Daniel
Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS
On 08/22/2017 12:32 PM, Mike Stump wrote: > On Aug 22, 2017, at 10:32 AM, Daniel Santos <daniel.san...@pobox.com> wrote: >>> I would suggest "escaped or quoted." >>> The whole argument to RUNTESTFLAGS can be quoted in either single >>> or double quotes and, AFAICT, so can the space-separated test >>> names within it. >> Well, mysteriously, double quotes do not work. > Did you try the obvious: > > "\"pdf pdf\" pdf" > > ? I think it should work fine. Yes. As I explained in the rest of my email I tried a great many variations. I can debug runtest some more and try to better understand how this is getting parsed. Daniel
[PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS
OK, how's this one? * doc/install.texi: Modify to add more details on running selected tests. Thanks, Daniel Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/doc/install.texi | 10 ++ 1 file changed, 10 insertions(+) diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 7c9e2f25d44..da360da1c50 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -2737,6 +2737,16 @@ the testsuite with filenames matching @samp{9805*}, you would use make check-g++ RUNTESTFLAGS="old-deja.exp=9805* @var{other-options}" @end smallexample +The file-matching expression following @var{filename}@command{.exp=} is treated +as a series of whitespace-delimited glob expressions so that multiple patterns +may be passed, although any whitespace must either be escaped or surrounded by +single quotes if multiple expressions are desired. For example, + +@smallexample +make check-g++ RUNTESTFLAGS="old-deja.exp=9805*\ virtual2.c @var{other-options}" +make check-g++ RUNTESTFLAGS="'old-deja.exp=9805* virtual2.c' @var{other-options}" +@end smallexample + The @file{*.exp} files are located in the testsuite directories of the GCC source, the most important ones being @file{compile.exp}, @file{execute.exp}, @file{dg.exp} and @file{old-deja.exp}. -- 2.13.3
Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS
On 08/22/2017 10:58 AM, Martin Sebor wrote: > On 08/21/2017 07:41 PM, Daniel Santos wrote: >> It took me a while to figure out how to do this so I figured that it >> should be >> in the docs. OK for trunk? >> >> * doc/install.texi: Add more details on selecting multiple tests. > > Thank you! It had taken me some time to figure this out. > >> +The file-matching expression following @var{filename}@command{.exp=} >> is treated >> +as a series of whitespace-delimited glob expressions so that >> multiple patterns >> +may be passed, although any whitespace must either be escaped or >> surrounded by >> +tick marks if multiple expressions are desired. For example, > > Do you mean single quotes? Yes. I guess I've heard the terms "tick marks" and "single quotes" used before. Perhaps using 'single quotes' would be a good way to express it (with the quotes). > I would suggest "escaped or quoted." > The whole argument to RUNTESTFLAGS can be quoted in either single > or double quotes and, AFAICT, so can the space-separated test > names within it. Well, mysteriously, double quotes do not work. So if I pass RUNTESTFLAGS='"i386.exp=pr80969-[12]*.c pr80969-4.c"' then the second pattern isn't used. I have NO idea what happens to it because it I pass RUNTESTFLAGS='i386.exp=pr80969-[12]*.c pr80969-4.c' then runtest properly demands that I tell it what in the hell pr80969-4.c is supposed to mean. As an experiment, I created a symlink named \"pr80969-4.c and using RUNTESTFLAGS='"i386.exp=pr80969-[12]*.c "pr80969-4.c' but it didn't pick it up. This is probably JAB (just another bug) in DejaGNU. Among the variations I've tried are enclosing the expressions in {braces}, \{escaped braces\} and comma-delimited \{escaped,braces\}, but none of these worked. Daniel > Martin >
Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW
On 08/22/2017 01:26 AM, Andreas Schwab wrote: > On Aug 21 2017, Daniel Santos <daniel.san...@pobox.com> wrote: > >> This is a problem that occured because of this code in >> ix86_option_override_internal: >> >> if (!opts_set->x_ix86_abi) >> opts->x_ix86_abi = DEFAULT_ABI; > Why is that a problem? Note opts_set vs opts. Just because the test !opts_set->x_ix86_abi will be true rather we supplied no -mabi parameter or we supplied -mabi=sysv. Daniel > Andreas.
[PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW
This is a problem that occured because of this code in ix86_option_override_internal: if (!opts_set->x_ix86_abi) opts->x_ix86_abi = DEFAULT_ABI; I tested this along with my other patches. OK for trunk? * config/i386/i386-opts.h (enum calling_abi): Modify so that no legal values are equivalent to zero. Thanks, Daniel Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386-opts.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h index 542cd0f3d67..8c2b5380e49 100644 --- a/gcc/config/i386/i386-opts.h +++ b/gcc/config/i386/i386-opts.h @@ -44,8 +44,8 @@ last_alg /* Available call abi. */ enum calling_abi { - SYSV_ABI = 0, - MS_ABI = 1 + SYSV_ABI = 1, + MS_ABI = 2 }; enum fpmath_unit -- 2.13.3
[PATCH] [i386, testsuite] [PR 71958] Error on -mx32 with -mabi=ms
We currently error when -mx32 and -mabi=sysv and we encounter a function with attribute ms_abi, but we are not erroring on -mx32 and -mabi=ms (either explicitly or when it is the default on Windows). In fact, it generates code that runs, but is of an undfined ABI. I'm also changing pr64409.c because if you explicitly supply -m64, then the test became ineffective. This is because the -mx32 parameter passed in dg-options is later overridden by the explicit -m64 parameter. I've bootstrapped and tested on * an x86_64-pc-linux-gnux32 system building gcc with --with-abi=mx32, * a "normal" x86_64-pc-linux-gnu testing with --target_board=unix/\{,-m32\}, and * on Windows. OK for trunk? gcc/ChangeLog: 2017-08-11 Daniel Santos <daniel.san...@pobox.com> * config/i386/i386.c (ix86_option_override_internal): Error when -mx32 is combined with -mabi=ms. (ix86_function_type_abi): Limit errors for mixing -mx32 with attribute ms_abi. gcc/testsuite/ChangeLog: 2017-08-11 Daniel Santos <daniel.san...@pobox.com> * gcc.target/i386/pr71958.c: New test to verify error on -mx32 and -mabi=ms * gcc.target/i386/pr64409.c: Modify to only run on x32. * gcc.target/i386/pr46470.c: Modify to skip x32 target. * gcc.target/i386/pr66275.c: Likewise. * gcc.target/i386/pr68018.c: Likewise. Thanks, Daniel Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 12 ++-- gcc/testsuite/gcc.target/i386/pr46470.c | 2 +- gcc/testsuite/gcc.target/i386/pr64409.c | 2 +- gcc/testsuite/gcc.target/i386/pr66275.c | 2 +- gcc/testsuite/gcc.target/i386/pr68018.c | 2 +- gcc/testsuite/gcc.target/i386/pr71958.c | 7 +++ 6 files changed, 21 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr71958.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 1d88e4f247a..3b537f2608f 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5684,6 +5684,10 @@ ix86_option_override_internal (bool main_args_p, if (!opts_set->x_ix86_abi) opts->x_ix86_abi = DEFAULT_ABI; + if (opts->x_ix86_abi == MS_ABI && TARGET_X32_P (opts->x_ix86_isa_flags)) +error ("-mabi=ms not supported with X32 ABI"); + gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI); + /* For targets using ms ABI enable ms-extensions, if not explicit turned off. For non-ms ABI we turn off this option. */ @@ -8777,8 +8781,12 @@ ix86_function_type_abi (const_tree fntype) if (abi == SYSV_ABI && lookup_attribute ("ms_abi", TYPE_ATTRIBUTES (fntype))) { - if (TARGET_X32) - error ("X32 does not support ms_abi attribute"); + static int warned; + if (TARGET_X32 && !warned) + { + error ("X32 does not support ms_abi attribute"); + warned = 1; + } abi = MS_ABI; } diff --git a/gcc/testsuite/gcc.target/i386/pr46470.c b/gcc/testsuite/gcc.target/i386/pr46470.c index 9e8e731188e..c66a378a1ad 100644 --- a/gcc/testsuite/gcc.target/i386/pr46470.c +++ b/gcc/testsuite/gcc.target/i386/pr46470.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target { ! x32 } } } */ /* The pic register save adds unavoidable stack pointer references. */ /* { dg-skip-if "" { ia32 && { ! nonpic } } } */ /* These options are selected to ensure 1 word needs to be allocated diff --git a/gcc/testsuite/gcc.target/i386/pr64409.c b/gcc/testsuite/gcc.target/i386/pr64409.c index 917472653f4..7bf9d1e398d 100644 --- a/gcc/testsuite/gcc.target/i386/pr64409.c +++ b/gcc/testsuite/gcc.target/i386/pr64409.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-do compile { target x32 } } */ /* { dg-require-effective-target maybe_x32 } */ /* { dg-options "-O0 -mx32" } */ diff --git a/gcc/testsuite/gcc.target/i386/pr66275.c b/gcc/testsuite/gcc.target/i386/pr66275.c index b8759aeb5ec..51ae1f6859c 100644 --- a/gcc/testsuite/gcc.target/i386/pr66275.c +++ b/gcc/testsuite/gcc.target/i386/pr66275.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */ +/* { dg-do compile { target { *-*-linux* && lp64 } } } */ /* { dg-options "-mabi=ms -fdump-rtl-dfinit" } */ void diff --git a/gcc/testsuite/gcc.target/i386/pr68018.c b/gcc/testsuite/gcc.target/i386/pr68018.c index a0fa21e0b00..04929c6c13c 100644 --- a/gcc/testsuite/gcc.target/i386/pr68018.c +++ b/gcc/testsuite/gcc.target/i386/pr68018.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */ +/* { dg-do compile { target { *-*-linux* && lp64 } } } */ /* { dg-options "-O -mabi=ms -mstackrealign" } */ typedef float V __attribute__ ((vector_size (16))); diff --git a/gcc/testsuite/gcc.target/i386/pr
[PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS
It took me a while to figure out how to do this so I figured that it should be in the docs. OK for trunk? * doc/install.texi: Add more details on selecting multiple tests. Thanks, Daniel Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/doc/install.texi | 10 ++ 1 file changed, 10 insertions(+) diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 7c9e2f25d44..6aefd213901 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -2737,6 +2737,16 @@ the testsuite with filenames matching @samp{9805*}, you would use make check-g++ RUNTESTFLAGS="old-deja.exp=9805* @var{other-options}" @end smallexample +The file-matching expression following @var{filename}@command{.exp=} is treated +as a series of whitespace-delimited glob expressions so that multiple patterns +may be passed, although any whitespace must either be escaped or surrounded by +tick marks if multiple expressions are desired. For example, + +@smallexample +make check-g++ RUNTESTFLAGS="old-deja.exp=9805*\ virtual2.c @var{other-options}" +make check-g++ RUNTESTFLAGS="'old-deja.exp=9805* virtual2.c' @var{other-options}" +@end smallexample + The @file{*.exp} files are located in the testsuite directories of the GCC source, the most important ones being @file{compile.exp}, @file{execute.exp}, @file{dg.exp} and @file{old-deja.exp}. -- 2.13.3
[PATCH] [i386,testsuite] [PR 71958] Error on -mx32 with -mabi=ms
We currently error when -mx32 -mabi=sysv and we encounter a function with attribute ms_abi, but we are not erroring on -mx32 and -mabi=ms (either explicitly or when it is the default on Windows). In fact, it generates code that runs, but is of an undfined ABI. I'm running -m64 and -m32 tests now and will run x32 tests when those are done. Presuming that I've corrected all existing tests that do not filter out x32 target and there are no additional failures, is this OK for head? Thanks, Daniel gcc/ChangeLog: 2017-08-11 Daniel Santos <daniel.san...@pobox.com> * config/i386/i386.c (ix86_option_override_internal): Modify. (ix86_function_type_abi): Likewise. gcc/testsuite/ChangeLog: 2017-08-11 Daniel Santos <daniel.san...@pobox.com> * gcc.target/i386/pr71958.c: New test. * gcc.target/i386/pr64409.c: Modify to skip on Windows. * gcc.target/i386/pr46470.c: Modify to skip x32 target. * gcc.target/i386/pr66275.c: Likewise. * gcc.target/i386/pr68018.c: Likewise. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 11 +-- gcc/testsuite/gcc.target/i386/pr46470.c | 2 +- gcc/testsuite/gcc.target/i386/pr64409.c | 3 ++- gcc/testsuite/gcc.target/i386/pr66275.c | 2 +- gcc/testsuite/gcc.target/i386/pr68018.c | 2 +- gcc/testsuite/gcc.target/i386/pr71958.c | 8 6 files changed, 22 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr71958.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b04321a8d40..311a52c2a1f 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5585,6 +5585,9 @@ ix86_option_override_internal (bool main_args_p, if (TARGET_X32_P (opts->x_ix86_isa_flags)) { + if (opts_set->x_ix86_abi == MS_ABI) + error ("-mx32 not supported with -mabi=ms"); + /* Always turn on OPTION_MASK_ISA_64BIT and turn off OPTION_MASK_ABI_64 for TARGET_X32. */ opts->x_ix86_isa_flags |= OPTION_MASK_ISA_64BIT; @@ -8777,8 +8780,12 @@ ix86_function_type_abi (const_tree fntype) if (abi == SYSV_ABI && lookup_attribute ("ms_abi", TYPE_ATTRIBUTES (fntype))) { - if (TARGET_X32) - error ("X32 does not support ms_abi attribute"); + static int warned; + if (TARGET_X32 && !warned) + { + error ("X32 does not support ms_abi attribute"); + warned = 1; + } abi = MS_ABI; } diff --git a/gcc/testsuite/gcc.target/i386/pr46470.c b/gcc/testsuite/gcc.target/i386/pr46470.c index 9e8e731188e..c66a378a1ad 100644 --- a/gcc/testsuite/gcc.target/i386/pr46470.c +++ b/gcc/testsuite/gcc.target/i386/pr46470.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target { ! x32 } } } */ /* The pic register save adds unavoidable stack pointer references. */ /* { dg-skip-if "" { ia32 && { ! nonpic } } } */ /* These options are selected to ensure 1 word needs to be allocated diff --git a/gcc/testsuite/gcc.target/i386/pr64409.c b/gcc/testsuite/gcc.target/i386/pr64409.c index 917472653f4..3dbd9a09f01 100644 --- a/gcc/testsuite/gcc.target/i386/pr64409.c +++ b/gcc/testsuite/gcc.target/i386/pr64409.c @@ -1,6 +1,7 @@ /* { dg-do compile { target { ! ia32 } } } */ /* { dg-require-effective-target maybe_x32 } */ /* { dg-options "-O0 -mx32" } */ +/* { xfail { "*-*-cygwin* *-*-mingw*" } } */ int a; -int* __attribute__ ((ms_abi)) fn1 () { return } /* { dg-error "X32 does not support ms_abi attribute" } */ +int* __attribute__ ((ms_abi)) fn1 () { return } /* { dg-error "X32 does not support ms_abi attribute" { target { ! "*-*-mingw* *-*-cygwin*" } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr66275.c b/gcc/testsuite/gcc.target/i386/pr66275.c index b8759aeb5ec..a1271857f6a 100644 --- a/gcc/testsuite/gcc.target/i386/pr66275.c +++ b/gcc/testsuite/gcc.target/i386/pr66275.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */ +/* { dg-do compile { target { *-*-linux* && { ! { ia32 || x32 } } } } } */ /* { dg-options "-mabi=ms -fdump-rtl-dfinit" } */ void diff --git a/gcc/testsuite/gcc.target/i386/pr68018.c b/gcc/testsuite/gcc.target/i386/pr68018.c index a0fa21e0b00..871fdddf643 100644 --- a/gcc/testsuite/gcc.target/i386/pr68018.c +++ b/gcc/testsuite/gcc.target/i386/pr68018.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */ +/* { dg-do compile { target { *-*-linux* && { ! { ia32 || x32 } } } } } */ /* { dg-options "-O -mabi=ms -mstackrealign" } */ typedef float V __attribute__ ((vector_size (16))); diff --git a/gcc/testsuite/gcc.target/i386/pr71958.c b/gcc/testsuite/gcc.target/i386/pr71958.c new file mode 100644 index 000..090d1970ca9 --- /dev/null +++ b/gcc
PING Re: [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
Original message: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02005.html Patches 2 and 3 have been committed and I have corrected the error in patch 5. I configuring with --enable-checking=yes,rtl --enable-languages=all and retested with RUNTESTFLAGS="--target_board=unix/\{,-m32\}" The updated patches fix an error when using mov instead of push and add documentation for changes to target-supports.exp. I have included modified ChangeLogs. In addition to to fixing the ICE, this patch set makes more efficient use of stack space in some cases the outgoing stack boundary is > 16 bytes and realignment is necessary. This adds new tests, some of which require avx512f (gcc/testsuite/gcc.target/i386/pr80969-4*.c) -- these I have only tested these using Intel SDE. Below is an updated list of the patches. 1. https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02006.html 2. Committed. 3. Committed. 4. https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02009.html 5. v2 -- https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00249.html 6. v2 -- https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00618.html Thanks, Daniel 2017-08-08 Daniel Santos <daniel.san...@pobox.com> * config/i386/i386.h (ix86_frame::stack_realign_allocate_offset): Remove (ix86_frame::stack_realign_allocate): New field. (struct machine_frame_state): Modify comments. (machine_frame_state::sp_realigned_fp_end): New field. * config/i386/i386.c (ix86_compute_frame_layout): Modify. (sp_valid_at): Likewise. (fp_valid_at): Likewise. (choose_baseaddr): Modify comments. (ix86_emit_outlined_ms2sysv_save): Modify. (ix86_expand_prologue): Likewise. * doc/sourcebuild.texi (avx2, avx2_runtime): Add missing items to effective-targets. (avx512f, avx512f_runtime): Add new items to effective-tarets. 2017-08-08 Daniel Santos <daniel.san...@pobox.com> * lib/target-supports.exp (check_avx512_os_support_available): New Procedure. (check_avx2_hw_available): Modify. (check_avx512f_hw_available): New Procedure. (check_effective_target_avx512f_runtime): Likewise. * gcc.target/i386/pr80969-1.c: New testcase. * gcc.target/i386/pr80969-2a.c: Likewise. * gcc.target/i386/pr80969-2.c: Likewise. * gcc.target/i386/pr80969-3.c: Likewise. * gcc.target/i386/pr80969-4a.c: Likewise. * gcc.target/i386/pr80969-4b.c: Likewise. * gcc.target/i386/pr80969-4.c: Likewise.
[PATCH 6/6 v2] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available
This update adds documentation for the new effective taregts in addition to a few existing effective targets that were undocumented. Changes to lib/target-supports.exp and documentation: * Add effective-targets avx512f and avx512f_runtime (needed for new tests). * Corrects bug in check_avx2_hw_available. * Adds documentation for effective-targets avx2, avx2_runtime (both missing), avx512f and avx512f_runtime. The following tests are added. The testcase in the PR is used as a base and relevant variants are added to test other factors affected by the patch set. pr80969-1.c Base test case. pr80969-2.c With ms to sysv call. pr80969-2a.c With ms to sysv call using stubs. pr80969-3.c With alloca (for DRAP test). pr80969-4.c With va_args passed via va_list pr80969-4a.c With va_args passed via va_list and ms to sysv call. pr80969-4b.c With va_args passed via va_list and ms to sysv call using stubs. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/doc/sourcebuild.texi | 12 +++ gcc/testsuite/gcc.target/i386/pr80969-1.c | 16 gcc/testsuite/gcc.target/i386/pr80969-2.c | 26 ++ gcc/testsuite/gcc.target/i386/pr80969-2a.c | 26 ++ gcc/testsuite/gcc.target/i386/pr80969-3.c | 31 gcc/testsuite/gcc.target/i386/pr80969-4.c | 123 gcc/testsuite/gcc.target/i386/pr80969-4a.c | 124 + gcc/testsuite/gcc.target/i386/pr80969-4b.c | 124 + gcc/testsuite/lib/target-supports.exp | 66 +++ 9 files changed, 548 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 85af8778167..66f040f212d 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1852,6 +1852,18 @@ Target supports compiling @code{avx} instructions. @item avx_runtime Target supports the execution of @code{avx} instructions. +@item avx2 +Target supports compiling @code{avx2} instructions. + +@item avx2_runtime +Target supports the execution of @code{avx2} instructions. + +@item avx512f +Target supports compiling @code{avx512f} instructions. + +@item avx512f_runtime +Target supports the execution of @code{avx512f} instructions. + @item cell_hw Test system can execute AltiVec and Cell PPU instructions. diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c b/gcc/testsuite/gcc.target/i386/pr80969-1.c new file mode 100644 index 000..eb8d767a778 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +int a[56]; +int b; +int main (int argc, char *argv[]) { + int c; + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c b/gcc/testsuite/gcc.target/i386/pr80969-2.c new file mode 100644 index 000..e868d6c7e5c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c @@ -0,0 +1,26 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test when calling a sysv func. */ + +int a[56]; +int b; + +static void __attribute__((sysv_abi)) sysv () +{ +} + +void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv; + +int main (int argc, char *argv[]) { + int c; + sysv_noinfo (); + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c b/gcc/testsuite/gcc.target/i386/pr80969-2a.c new file mode 100644 index 000..071a90534a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c @@ -0,0 +1,26 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test when calling a sysv func using save/restore stubs. */ + +int a[56]; +int b; + +static void __attribute__((sysv_abi)) sysv () +{ +} + +void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv; + +int main (int argc, char *argv[]) { + int c; + sysv_noinfo (); + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c b/gcc/testsuite/gcc.target/i386/pr80969-3.c new file mode 100644 index 000..5982981b55c --- /dev/null +++ b/gcc/testsuite/gcc.target/i
[PATCH 5/6 v2] [i386] Modify SP realignment in ix86_expand_prologue, et. al.
My first version of this patch inited m->fs.sp_realigned_fp_last with the value of m->fs.sp_offset prior to performing the stack realignment. I had forgotten, however, that when we're saving GP regs using MOV that we delay SP modification as long as possible so that the value of m->fs.sp_offset at this point is correct when we've used push, but incorrect when we've used mov. This time I've bootstraped with --enable-checking=yes,rtl --enable-languages=all and reg tested using the below command to test both 64- and 32-bit code. make -kj8 RUNTESTFLAGS="--target_board=unix/\{,-m32\}" check Original patch description: The SP allocation calculation is now done in ix86_compute_frame_layout and the result stored in ix86_frame::stack_realign_allocate. This change also updates comments for choose_baseaddr to clarify that the alignment returned doesn't necessarily reflect the alignment of the cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an alignment of 64 bytes). Since the alignment required may be more than 16-bytes, we cannot defer SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so that function needs to be updated as well. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 58 -- 1 file changed, 32 insertions(+), 26 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 0dc366cf16e..a1f39cd714c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13289,10 +13289,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx _reg, } /* Return an RTX that points to CFA_OFFSET within the stack frame and - the alignment of address. If align is non-null, it should point to + the alignment of address. If ALIGN is non-null, it should point to an alignment value (in bits) that is preferred or zero and will - recieve the alignment of the base register that was selected. The - valid base registers are taken from CFUN->MACHINE->FS. */ + recieve the alignment of the base register that was selected, + irrespective of rather or not CFA_OFFSET is a multiple of that + alignment value. + + The valid base registers are taken from CFUN->MACHINE->FS. */ static rtx choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) @@ -14338,35 +14341,35 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame ) rtx sym, addr; rtx rax = gen_rtx_REG (word_mode, AX_REG); const struct xlogue_layout = xlogue_layout::get_instance (); - HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset; - HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset; - HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in (); + HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset; + + /* AL should only be live with sysv_abi. */ + gcc_assert (!ix86_eax_live_at_start_p ()); + + /* Setup RAX as the stub's base pointer. We use stack_realign_offset rather + we've actually realigned the stack or not. */ + align = GET_MODE_ALIGNMENT (V4SFmode); + addr = choose_baseaddr (frame.stack_realign_offset + + xlogue.get_stub_ptr_offset (), ); + gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); + emit_insn (gen_rtx_SET (rax, addr)); - /* Verify that the incoming stack 16-byte alignment offset matches the - layout we're using. */ - gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD)); + /* Allocate stack if not already done. */ + if (allocate > 0) + pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, + GEN_INT (-allocate), -1, false); /* Get the stub symbol. */ sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP : XLOGUE_STUB_SAVE); RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym); - /* Setup RAX as the stub's base pointer. */ - align = GET_MODE_ALIGNMENT (V4SFmode); - addr = choose_baseaddr (rax_offset, ); - gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); - insn = emit_insn (gen_rtx_SET (rax, addr)); - - gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ()); - pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, -GEN_INT (-stack_alloc_size), -1, -m->fs.cfa_reg == stack_pointer_rtx); for (i = 0; i < ncregs; ++i) { const xlogue_layout::reginfo = xlogue.get_reginfo (i); rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode), r.regno); - RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);; + RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset); } gcc_assert (vi == (unsigned)GET_NUM_ELEM (v)); @@ -14621,14 +14624,15 @@ ix86_exp
Re: [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
Well I just learned how to test 32-bit earlier and I've uncovered a problem when running 32-bit tests. Do you want me to commit the the two patches (squashed together) in the mean time? Thanks, Daniel
[PATCH 6/6] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available
The testcase in the PR is used as a base and relevant variants are added to test other factors affected by the patch set. pr80969-1.c Base test case. pr80969-2.c With ms to sysv call. pr80969-2a.c With ms to sysv call using stubs. pr80969-3.c With alloca (for DRAP test). pr80969-4.c With va_args passed via va_list pr80969-4a.c With va_args passed via va_list and ms to sysv call. pr80969-4b.c With va_args passed via va_list and ms to sysv call using stubs. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/testsuite/gcc.target/i386/pr80969-1.c | 16 gcc/testsuite/gcc.target/i386/pr80969-2.c | 26 ++ gcc/testsuite/gcc.target/i386/pr80969-2a.c | 26 ++ gcc/testsuite/gcc.target/i386/pr80969-3.c | 31 gcc/testsuite/gcc.target/i386/pr80969-4.c | 123 gcc/testsuite/gcc.target/i386/pr80969-4a.c | 124 + gcc/testsuite/gcc.target/i386/pr80969-4b.c | 124 + gcc/testsuite/lib/target-supports.exp | 66 +++ 8 files changed, 536 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c b/gcc/testsuite/gcc.target/i386/pr80969-1.c new file mode 100644 index 000..eb8d767a778 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +int a[56]; +int b; +int main (int argc, char *argv[]) { + int c; + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c b/gcc/testsuite/gcc.target/i386/pr80969-2.c new file mode 100644 index 000..e868d6c7e5c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c @@ -0,0 +1,26 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test when calling a sysv func. */ + +int a[56]; +int b; + +static void __attribute__((sysv_abi)) sysv () +{ +} + +void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv; + +int main (int argc, char *argv[]) { + int c; + sysv_noinfo (); + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c b/gcc/testsuite/gcc.target/i386/pr80969-2a.c new file mode 100644 index 000..071a90534a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c @@ -0,0 +1,26 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test when calling a sysv func using save/restore stubs. */ + +int a[56]; +int b; + +static void __attribute__((sysv_abi)) sysv () +{ +} + +void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv; + +int main (int argc, char *argv[]) { + int c; + sysv_noinfo (); + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +a[b] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c b/gcc/testsuite/gcc.target/i386/pr80969-3.c new file mode 100644 index 000..5982981b55c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test with alloca (and DRAP). */ + +#include + +int a[56]; +volatile int b = -12345; +volatile const int d = 42; + +void foo (int *x, int y, int z) +{ +} + +void (*volatile const foo_noinfo)(int *, int, int) = foo; + +int main (int argc, char *argv[]) { + int c; + int *e = alloca (d); + foo_noinfo (e, d, 0); + for (; b; b++) { +c = b; +if (b & 1) + c = 2; +foo_noinfo (e, d, c); +a[-(b % 56)] = c; + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4.c b/gcc/testsuite/gcc.target/i386/pr80969-4.c new file mode 100644 index 000..1ec54d081cd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80969-4.c @@ -0,0 +1,123 @@ +/* { dg-do run { target avx512f_runtime } } */ +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +/* Test with avx512 and va_args. */ + +#include +#include + +#include "avx-check.h" + +int a[56]; +int b; + +__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 }; +__m512d n2 = { -93.83, 893.318,
[PATCH 5/6] [i386] Modify SP realignment in ix86_expand_prologue, et. al.
The SP allocation calculation is now done in ix86_compute_frame_layout and the result stored in ix86_frame::stack_realign_allocate. This change also updates comments for choose_baseaddr to clarify that the alignment returned doesn't necessarily reflect the alignment of the cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an alignment of 64 bytes). Since the alignment required may be more than 16-bytes, we cannot defer SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so that function needs to be updated as well. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 54 +++--- 1 file changed, 29 insertions(+), 25 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index e92f322de0c..7e1fc4dfbf5 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13273,10 +13273,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx _reg, } /* Return an RTX that points to CFA_OFFSET within the stack frame and - the alignment of address. If align is non-null, it should point to + the alignment of address. If ALIGN is non-null, it should point to an alignment value (in bits) that is preferred or zero and will - recieve the alignment of the base register that was selected. The - valid base registers are taken from CFUN->MACHINE->FS. */ + recieve the alignment of the base register that was selected, + irrespective of rather or not CFA_OFFSET is a multiple of that + alignment value. + + The valid base registers are taken from CFUN->MACHINE->FS. */ static rtx choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align) @@ -14322,35 +14325,35 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame ) rtx sym, addr; rtx rax = gen_rtx_REG (word_mode, AX_REG); const struct xlogue_layout = xlogue_layout::get_instance (); - HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset; - HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset; - HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in (); + HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset; + + /* AL should only be live with sysv_abi. */ + gcc_assert (!ix86_eax_live_at_start_p ()); + + /* Setup RAX as the stub's base pointer. We use stack_realign_offset rather + we've actually realigned the stack or not. */ + align = GET_MODE_ALIGNMENT (V4SFmode); + addr = choose_baseaddr (frame.stack_realign_offset + + xlogue.get_stub_ptr_offset (), ); + gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); + emit_insn (gen_rtx_SET (rax, addr)); - /* Verify that the incoming stack 16-byte alignment offset matches the - layout we're using. */ - gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD)); + /* Allocate stack if not already done. */ + if (allocate > 0) + pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, + GEN_INT (-allocate), -1, false); /* Get the stub symbol. */ sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP : XLOGUE_STUB_SAVE); RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym); - /* Setup RAX as the stub's base pointer. */ - align = GET_MODE_ALIGNMENT (V4SFmode); - addr = choose_baseaddr (rax_offset, ); - gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); - insn = emit_insn (gen_rtx_SET (rax, addr)); - - gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ()); - pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, -GEN_INT (-stack_alloc_size), -1, -m->fs.cfa_reg == stack_pointer_rtx); for (i = 0; i < ncregs; ++i) { const xlogue_layout::reginfo = xlogue.get_reginfo (i); rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode), r.regno); - RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);; + RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset); } gcc_assert (vi == (unsigned)GET_NUM_ELEM (v)); @@ -14608,8 +14611,8 @@ ix86_expand_prologue (void) that we must allocate the size of the register save area before performing the actual alignment. Otherwise we cannot guarantee that there's enough storage above the realignment point. */ - allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset; - if (allocate && !m->call_ms2sysv) + allocate = frame.stack_realign_allocate; + if (allocate) pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, GEN_INT (-allocate), -1, false); @@ -14618,8 +14621,7 @@ ix86_expand_prologu
[PATCH 4/6] [i386] Modify ix86_compute_frame_layout
These changes affect how the stack frame is calculated from the region starting at frame.reg_save_offset until frame.frame_pointer_offset, which includes either the stub save area or the (inline) SSE register save area and the va_args register save area. The calculation used when not realigning the stack pointer is the same, but when when realigning we calculate the 16-byte aligned space needed in reverse so that the stack realignment boundary at frame.stack_realign_offset may not necessarily be a multiple of stack_alignment_needed, but the value of frame.frame_pointer_offset will. This results in a properly aligned stack for the function body and avoids wasting stack space. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 116 + gcc/config/i386/i386.h | 2 +- 2 files changed, 80 insertions(+), 38 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index e2e9546a27c..e92f322de0c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12874,6 +12874,14 @@ ix86_compute_frame_layout (void) gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT); gcc_assert (preferred_alignment <= stack_alignment_needed); + /* The only ABI saving SSE regs should be 64-bit ms_abi. */ + gcc_assert (TARGET_64BIT || !frame->nsseregs); + if (TARGET_64BIT && m->call_ms2sysv) +{ + gcc_assert (stack_alignment_needed >= 16); + gcc_assert (!frame->nsseregs); +} + /* For SEH we have to limit the amount of code movement into the prologue. At present we do this via a BLOCKAGE, at which point there's very little scheduling that can be done, which means that there's very little point @@ -12936,54 +12944,88 @@ ix86_compute_frame_layout (void) if (TARGET_SEH) frame->hard_frame_pointer_offset = offset; - /* When re-aligning the stack frame, but not saving SSE registers, this - is the offset we want adjust the stack pointer to. */ - frame->stack_realign_allocate_offset = offset; + /* Calculate the size of the va-arg area (not including padding, if any). */ + frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size; - /* The re-aligned stack starts here. Values before this point are not - directly comparable with values below this point. Use sp_valid_at - to determine if the stack pointer is valid for a given offset and - fp_valid_at for the frame pointer. */ if (stack_realign_fp) -offset = ROUND_UP (offset, stack_alignment_needed); - frame->stack_realign_offset = offset; - - if (TARGET_64BIT && m->call_ms2sysv) { - gcc_assert (stack_alignment_needed >= 16); - gcc_assert (!frame->nsseregs); + /* We may need a 16-byte aligned stack for the remainder of the +register save area, but the stack frame for the local function +may require a greater alignment if using AVX/2/512. In order +to avoid wasting space, we first calculate the space needed for +the rest of the register saves, add that to the stack pointer, +and then realign the stack to the boundary of the start of the +frame for the local function. */ + HOST_WIDE_INT space_needed = 0; + HOST_WIDE_INT sse_reg_space_needed = 0; - m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD); - offset += xlogue_layout::get_instance ().get_stack_space_used (); -} + if (TARGET_64BIT) + { + if (m->call_ms2sysv) + { + m->call_ms2sysv_pad_in = 0; + space_needed = xlogue_layout::get_instance ().get_stack_space_used (); + } - /* Align and set SSE register save area. */ - else if (frame->nsseregs) -{ - /* The only ABI that has saved SSE registers (Win64) also has a -16-byte aligned default stack. However, many programs violate -the ABI, and Wine64 forces stack realignment to compensate. + else if (frame->nsseregs) + /* The only ABI that has saved SSE registers (Win64) also has a + 16-byte aligned default stack. However, many programs violate + the ABI, and Wine64 forces stack realignment to compensate. */ + space_needed = frame->nsseregs * 16; + + sse_reg_space_needed = space_needed = ROUND_UP (space_needed, 16); + + /* 64-bit frame->va_arg_size should always be a multiple of 16, but +rounding to be pedantic. */ + space_needed = ROUND_UP (space_needed + frame->va_arg_size, 16); + } + else + space_needed = frame->va_arg_size; + + /* Record the allocation size required prior to the realignment AND. */ + frame->stack_realign_allocate = space_needed; + + /* The re-aligned stack starts at frame->stack_realign_offset. Values +before this point are not directly co
[PATCH 3/6] [i386] Remove machine_function::call_ms2sysv_pad_out
The -mcall-ms2sysv-xlogues project added the boolean fields call_ms2sysv_pad_in and call_ms2sysv_pad_out to struct machine_function to track rather or not an additional 8 bytes of padding was needed for stack alignment prior to and after the stub save area. This design was based upon the faulty assumption the function body would not require a stack alignment greater than 16 bytes. This continues to work well for managing padding prior to the stub save area, but will not work for the outgoing alignment. Rather than changing machine_function::call_ms2sysv_pad_out to a larger type, this patch removes it, thus transferring responsibility for stack alignment following the stub save area from class xlogue_layout to the body of ix86_compute_frame_layout. Since the 64-bit va_arg register save area is always a multiple of 16-bytes in size (176 for System V ABI and 96 for Microsoft ABI), the ROUND_UP calculation for the stack offset at the start of the function body (frame.frame_pointer_offset) will assure there is enough room for any padding needed to keep the save area for SSE va_args 16-byte aligned, so no modification is needed for that calculation. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 18 -- gcc/config/i386/i386.h | 8 ++-- 2 files changed, 6 insertions(+), 20 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 47c5608c3cd..e2e9546a27c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2491,9 +2491,7 @@ public: unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1; gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS); -return m_regs[last_reg].offset - + (m->call_ms2sysv_pad_out ? 8 : 0) - + STUB_INDEX_OFFSET; +return m_regs[last_reg].offset + STUB_INDEX_OFFSET; } /* Returns the offset for the base pointer used by the stub. */ @@ -12849,13 +12847,12 @@ ix86_compute_frame_layout (void) { unsigned count = xlogue_layout::count_stub_managed_regs (); m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS; + m->call_ms2sysv_pad_in = 0; } } frame->nregs = ix86_nsaved_regs (); frame->nsseregs = ix86_nsaved_sseregs (); - m->call_ms2sysv_pad_in = 0; - m->call_ms2sysv_pad_out = 0; /* 64-bit MS ABI seem to require stack alignment to be always 16, except for function prologues, leaf functions and when the defult @@ -12957,15 +12954,7 @@ ix86_compute_frame_layout (void) gcc_assert (!frame->nsseregs); m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD); - - /* Select an appropriate layout for incoming stack offset. */ - const struct xlogue_layout = xlogue_layout::get_instance (); - - if ((offset + xlogue.get_stack_space_used ()) & UNITS_PER_WORD) - m->call_ms2sysv_pad_out = 1; - - offset += xlogue.get_stack_space_used (); - gcc_assert (!(offset & 0xf)); + offset += xlogue_layout::get_instance ().get_stack_space_used (); } /* Align and set SSE register save area. */ @@ -12993,6 +12982,7 @@ ix86_compute_frame_layout (void) /* Align start of frame for local function. */ if (stack_realign_fp + || m->call_ms2sysv || offset != frame->sse_reg_save_offset || size != 0 || !crtl->is_leaf diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 1648bdf1556..b08e45f68d4 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -2646,17 +2646,13 @@ struct GTY(()) machine_function { BOOL_BITFIELD arg_reg_available : 1; /* If true, we're out-of-lining reg save/restore for regs clobbered - by ms_abi functions calling a sysv function. */ + by 64-bit ms_abi functions calling a sysv_abi function. */ BOOL_BITFIELD call_ms2sysv : 1; /* If true, the incoming 16-byte aligned stack has an offset (of 8) and - needs padding. */ + needs padding prior to out-of-line stub save/restore area. */ BOOL_BITFIELD call_ms2sysv_pad_in : 1; - /* If true, the size of the stub save area plus inline int reg saves will - result in an 8 byte offset, so needs padding. */ - BOOL_BITFIELD call_ms2sysv_pad_out : 1; - /* This is the number of extra registers saved by stub (valid range is 0-6). Each additional register is only saved/restored by the stubs if all successive ones are. (Will always be zero when using a hard -- 2.13.3
[PATCH 2/6] [i386] Remove ix86_frame::outlined_save_offset
This value was used in an earlier incarnation of the -mcall-ms2sysv-xlogues patch set but is now set and never read. The value of ix86_frame::sse_reg_save_offset serves the same purpose. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 1 - gcc/config/i386/i386.h | 4 +--- 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 690631dfe43..47c5608c3cd 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12966,7 +12966,6 @@ ix86_compute_frame_layout (void) offset += xlogue.get_stack_space_used (); gcc_assert (!(offset & 0xf)); - frame->outlined_save_offset = offset; } /* Align and set SSE register save area. */ diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index ce5bb7f6677..1648bdf1556 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -2477,8 +2477,7 @@ enum avx_u128_state <- end of stub-saved/restored regs [padding1] ] - <- outlined_save_offset - <- sse_regs_save_offset + <- sse_reg_save_offset [padding2] |<- FRAME_POINTER [va_arg registers] | @@ -2504,7 +2503,6 @@ struct GTY(()) ix86_frame HOST_WIDE_INT reg_save_offset; HOST_WIDE_INT stack_realign_allocate_offset; HOST_WIDE_INT stack_realign_offset; - HOST_WIDE_INT outlined_save_offset; HOST_WIDE_INT sse_reg_save_offset; /* When save_regs_using_mov is set, emit prologue using -- 2.13.3
[PATCH 1/6] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at
When we realign the stack frame (without DRAP), there may be a range of CFA offsets that should never be touched because they are alignment padding and any reference to them is almost certainly an error. Previously, only the offset of where the realigned stack frame starts was recorded and checked in sp_valid_at and fp_valid_at. This change adds sp_realigned_fp_last to struct machine_frame_state to record the last valid offset from which the frame pointer can be used when the stack pointer is realigned and modifies sp_valid_at and fp_valid_at to fail an assertion when passed an offset in the "no-man's land" between these two values. Comments for struct machine_frame_state incorrectly stated that a realigned stack pointer could be used to access offsets equal to or greater than sp_realigned_offset, but it is only valid for offsets that are greater. This was the (incorrect) behaviour of sp_valid_at and fp_valid_at prior to r250587 and this change now corrects the documentation and adds clarification of the CFA-relative calculation. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 45 ++--- gcc/config/i386/i386.h | 18 +- 2 files changed, 43 insertions(+), 20 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index f1486ff3750..690631dfe43 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13102,26 +13102,36 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT offset) return len; } -/* Determine if the stack pointer is valid for accessing the cfa_offset. - The register is saved at CFA - CFA_OFFSET. */ +/* Determine if the stack pointer is valid for accessing the CFA_OFFSET in + the frame save area. The register is saved at CFA - CFA_OFFSET. */ -static inline bool +static bool sp_valid_at (HOST_WIDE_INT cfa_offset) { const struct machine_frame_state = cfun->machine->fs; - return fs.sp_valid && !(fs.sp_realigned - && cfa_offset <= fs.sp_realigned_offset); + if (fs.sp_realigned && cfa_offset <= fs.sp_realigned_offset) +{ + /* Validate that the cfa_offset isn't in a "no-man's land". */ + gcc_assert (cfa_offset <= fs.sp_realigned_fp_last); + return false; +} + return fs.sp_valid; } -/* Determine if the frame pointer is valid for accessing the cfa_offset. - The register is saved at CFA - CFA_OFFSET. */ +/* Determine if the frame pointer is valid for accessing the CFA_OFFSET in + the frame save area. The register is saved at CFA - CFA_OFFSET. */ static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset) { const struct machine_frame_state = cfun->machine->fs; - return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned - && cfa_offset > fs.sp_realigned_offset); + if (fs.sp_realigned && cfa_offset > fs.sp_realigned_fp_last) +{ + /* Validate that the cfa_offset isn't in a "no-man's land". */ + gcc_assert (cfa_offset >= fs.sp_realigned_offset); + return false; +} + return fs.fp_valid; } /* Choose a base register based upon alignment requested, speed and/or @@ -14560,6 +14570,9 @@ ix86_expand_prologue (void) int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT; gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT); + /* Record last valid frame pointer offset. */ + m->fs.sp_realigned_fp_last = m->fs.sp_offset; + /* The computation of the size of the re-aligned stack frame means that we must allocate the size of the register save area before performing the actual alignment. Otherwise we cannot guarantee @@ -14573,13 +14586,15 @@ ix86_expand_prologue (void) insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx, stack_pointer_rtx, GEN_INT (-align_bytes))); - /* For the purposes of register save area addressing, the stack -pointer can no longer be used to access anything in the frame -below m->fs.sp_realigned_offset and the frame pointer cannot be -used for anything at or above. */ m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes); m->fs.sp_realigned = true; m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16; + /* The stack pointer may no longer be equal to CFA - m->fs.sp_offset. +Beyond this point, stack access should be done via choose_baseaddr or +by using sp_valid_at and fp_valid_at to determine the correct base +register. Henceforth, any CFA offset should be thought of as logical +and not physical. */ + gcc_assert (m->fs.sp_realigned_offset >= m->fs.sp_realigned_fp_last); gcc_assert (m->fs.sp_realign
[PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
When working on the Wine64 project to use aligned SSE MOVs after SP realignment and adding -mcall-ms2sysv-xlogues, I overlooked the fact that the function body may require a stack alignment greater than 16-bytes. This can result in an ICE with -mabi=ms -mavx512f and some other cases. This patch set reworks the strategy for calculating the frame layout following normal (inline) integral register saves (at frame.reg_save_offset) to the start of the frame for the local function (frame.frame_pointer_offset). I've completed a bootstrap and full regression test with no additional failures, but I don't have access to a machine with avx512 extensions. I have manually run the tests that need it using the Intel SDE, but I haven't been able to validate that my check_effective_target_avx512f_runtime code in gcc/testsuite/lib/target-supports.exp is correctly enabling the tests for pr80969-4*.c. As an aside note, I still have some rework of the ms-sysv.exp tests that I haven't yet to submitted and in which I'm adding more tests for cases with uncommon stacks, as in PR 81563. Thanks, Daniel 2017-07-23 Daniel Santos <daniel.san...@pobox.com> * config/i386/i386.h (ix86_frame::outlined_save_offset): Remove field. (ix86_frame::stack_realign_allocate_offset): Likewise. (ix86_frame::stack_realign_allocate): New field. (struct machine_frame_state): Modify comments. (machine_frame_state::sp_realigned_fp_end): New field. (machine_function::call_ms2sysv_pad_out): Remove field. * config/i386/i386.c (xlogue_layout::get_stack_space_used): Modify. (ix86_compute_frame_layout): Likewise. (sp_valid_at): Likewise. (fp_valid_at): Likewise. (choose_baseaddr): Modify comments. (ix86_emit_outlined_ms2sysv_save): Modify. (ix86_expand_prologue): Likewise. (ix86_expand_epilogue): Modify comments. 2017-07-23 Daniel Santos <daniel.san...@pobox.com> * gcc.target/i386/pr80969-1.c: New testcase. * gcc.target/i386/pr80969-2a.c: Likewise. * gcc.target/i386/pr80969-2.c: Likewise. * gcc.target/i386/pr80969-3.c: Likewise. * gcc.target/i386/pr80969-4a.c: Likewise. * gcc.target/i386/pr80969-4b.c: Likewise. * gcc.target/i386/pr80969-4.c: Likewise.
Re: [PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 07/28/2017 09:41 AM, H.J. Lu wrote: On Fri, Jul 28, 2017 at 6:57 AM, Daniel Santos<daniel.san...@pobox.com> wrote: On 07/26/2017 02:03 PM, H.J. Lu wrote: This patch caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81563 Hello. I've rebased my patch set and I'm now retesting. I'm afraid that your changes are wrong because my my sp_valid_at and fp_valid_at functions are wrong -- these are supposed to be for the base offset and not the CFA offset, sorry about that. This means that the check in choose_basereg (and thus choose_baseaddr) have been wrong as well. I'm retesting now. Please check your change with gcc.target/i386/pr81563.c. Thanks. I'm still getting used to x86 stack math and and briefly I thought that my understanding of the CFA was wrong and that I had messed up sp_valid_at and fp_valid_at, but I was mistaken, so sorry for the false alarm. My rebased patches pass all tests, so it's OK.
Re: [PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 07/26/2017 02:03 PM, H.J. Lu wrote: This patch caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81563 Hello. I've rebased my patch set and I'm now retesting. I'm afraid that your changes are wrong because my my sp_valid_at and fp_valid_at functions are wrong -- these are supposed to be for the base offset and not the CFA offset, sorry about that. This means that the check in choose_basereg (and thus choose_baseaddr) have been wrong as well. I'm retesting now.
Re: [PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 07/26/2017 02:03 PM, H.J. Lu wrote: This patch caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81563 Yes, I discovered this flaw while working on PR 80969 but I hadn't found an actual testcase where it caused a problem yet. I'm about to submit my patchset for review, so sorry I didn't get it committed sooner. My patch set further improves sp_valid_at and fp_valid_at since it's possible that the the last offset the frame pointer can be used to access is not equal to realignment offset. I'll try to get this out tonight or tomorrow. Thanks! Daniel
PING: [PATCH v2 0/2] [testsuite, libgcc] PR80759 Fix FAILs on Solaris and Darwin
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00025.html Uros, Can you review changes for i386 please? Mike or Iain, Can one of you review changes for Darwin please? I'm not familiar with the platform, although Rainer tested on Darwin for me. Ian, Can you review changes to libgcc please? Thank you all! Daniel On 07/02/2017 12:11 AM, Daniel Santos wrote: This patchset addresses a number of testsuite issues for gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp, mostly occurring on Solaris and Darwin. Additionally, it solves a bug in libgcc that caused link failures on Darwin when building with -mcall-ms2sysv-xlogues. The issues are detailed in the notes for each patch. I would particularly appreciate any feedback for Darwin as I am unfamiliar with the platform and Rainer and I have fashioned some of these changes by looking at other Darwin code in gcc. .../gcc.target/x86_64/abi/ms-sysv/do-test.S | 200 --- .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c | 83 +++- .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp| 153 +- libgcc/config.host | 6 +- libgcc/config/i386/i386-asm.h| 89 + libgcc/config/i386/resms64.S | 2 +- libgcc/config/i386/resms64f.S| 2 +- libgcc/config/i386/resms64fx.S | 2 +- libgcc/config/i386/resms64x.S| 2 +- libgcc/config/i386/savms64.S | 2 +- libgcc/config/i386/savms64f.S| 2 +- 11 files changed, 274 insertions(+), 269 deletions(-) Many thanks to Rainer for all of his help on this! Thanks, Daniel
[PATCH 2/2] [libgcc]: PR80759 fixes for Solaris & Darwin
The -mcall-ms2sysv-xlogues option is supposed to work on Solaris and Darwin, but my changes to config.host weren't adding the sav/res stubs to libgcc and the assembly code wasn't compatible with their assemblers either. * Change config.host to build -mcall-ms2sysv-xlogues sav/res stubs on Solaris and Darwin. * Replace .macro/.endm with cpp macros * Replace .global with .globl * Append __USER_LABEL_PREFIX__ when defined (via ASMNAME macro). * Only use .size when __ELF__ is defined. * Only use .hidden when both __ELF__ and HAVE_GAS_HIDDEN are defined. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- libgcc/config.host | 6 +-- libgcc/config/i386/i386-asm.h | 89 ++ libgcc/config/i386/resms64.S | 2 +- libgcc/config/i386/resms64f.S | 2 +- libgcc/config/i386/resms64fx.S | 2 +- libgcc/config/i386/resms64x.S | 2 +- libgcc/config/i386/savms64.S | 2 +- libgcc/config/i386/savms64f.S | 2 +- 8 files changed, 64 insertions(+), 43 deletions(-) diff --git a/libgcc/config.host b/libgcc/config.host index cf62e0e54f7..bee3e931106 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -588,12 +588,12 @@ hppa*-*-openbsd*) tmake_file="$tmake_file pa/t-openbsd" ;; i[34567]86-*-darwin*) - tmake_file="$tmake_file i386/t-crtpc t-crtfm" + tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi" tm_file="$tm_file i386/darwin-lib.h" extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o" ;; x86_64-*-darwin*) - tmake_file="$tmake_file i386/t-crtpc t-crtfm" + tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi" tm_file="$tm_file i386/darwin-lib.h" extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o" ;; @@ -670,7 +670,7 @@ i[34567]86-*-rtems*) extra_parts="$extra_parts crti.o crtn.o" ;; i[34567]86-*-solaris2* | x86_64-*-solaris2.1[0-9]*) - tmake_file="$tmake_file i386/t-crtpc t-crtfm" + tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi" extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o" tm_file="${tm_file} i386/elf-lib.h" md_unwind_header=i386/sol2-unwind.h diff --git a/libgcc/config/i386/i386-asm.h b/libgcc/config/i386/i386-asm.h index c613e9fd83d..1387fd24b4f 100644 --- a/libgcc/config/i386/i386-asm.h +++ b/libgcc/config/i386/i386-asm.h @@ -26,22 +26,45 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #ifndef I386_ASM_H #define I386_ASM_H +#include "auto-host.h" + +/* These macros currently support GNU/Linux, Solaris and Darwin. */ + #ifdef __ELF__ -# define ELFFN(fn) .type fn,@function +# define FN_TYPE(fn) .type fn,@function +# define FN_SIZE(fn) .size fn,.-fn +# ifdef HAVE_GAS_HIDDEN +# define FN_HIDDEN(fn) .hidden fn +# endif +#else +# define FN_TYPE(fn) +# define FN_SIZE(fn) +#endif + +#ifndef FN_HIDDEN +# define FN_HIDDEN(fn) +#endif + +#ifdef __USER_LABEL_PREFIX__ +# define ASMNAME2(prefix, name)prefix ## name +# define ASMNAME1(prefix, name)ASMNAME2(prefix, name) +# define ASMNAME(name) ASMNAME1(__USER_LABEL_PREFIX__, name) #else -# define ELFFN(fn) +# define ASMNAME(name) name #endif -#define FUNC_START(fn) \ - .global fn; \ - ELFFN (fn); \ -fn: +#define FUNC_BEGIN(fn) \ + .globl ASMNAME(fn); \ + FN_TYPE (ASMNAME(fn)); \ +ASMNAME(fn): -#define HIDDEN_FUNC(fn)\ - FUNC_START (fn) \ - .hidden fn; \ +#define HIDDEN_FUNC(fn)\ + .globl ASMNAME(fn); \ + FN_TYPE(ASMNAME(fn)); \ + FN_HIDDEN(ASMNAME(fn)); \ +ASMNAME(fn): -#define FUNC_END(fn) .size fn,.-fn +#define FUNC_END(fn) FN_SIZE(ASMNAME(fn)) #ifdef __SSE2__ # ifdef __AVX__ @@ -51,32 +74,30 @@ fn: # endif /* Save SSE registers 6-15. off is the offset of rax to get to xmm6. */ -.macro SSE_SAVE off=0 - MOVAPS %xmm15,(\off - 0x90)(%rax) - MOVAPS %xmm14,(\off - 0x80)(%rax) - MOVAPS %xmm13,(\off - 0x70)(%rax) - MOVAPS %xmm12,(\off - 0x60)(%rax) - MOVAPS %xmm11,(\off - 0x50)(%rax) - MOVAPS %xmm10,(\off - 0x40)(%rax) - MOVAPS %xmm9, (\off - 0x30)(%rax) - MOVAPS %xmm8, (\off - 0x20)(%rax) - MOVAPS %xmm7, (\off - 0x10)(%rax) - MOVAPS %xmm6, \off(%rax) -.endm +#define SSE_SAVE \ + MOVAPS %xmm15,-0x30(%rax); \ + MOVAPS %xmm14,-0x20(%rax); \ + MOVAPS %xmm13,-0x10(%rax); \ + MOVAPS %xmm12, (%rax); \ + MOVAPS %xmm11, 0x10(%rax); \ + MOVAPS %xmm10, 0x20(%rax); \ + MOVAPS %xmm9, 0x30(%rax); \ + MOVAPS %xmm8, 0x40(%rax); \ + MOVAPS %xmm7, 0x50(%rax); \ + MOVAPS %xmm6, 0x60(%rax) /* Restore SSE r
[PATCH 1/2] [testsuite] PR80759 fix tests on Solaris and Darwin
The ms-sysv.exp tests were failing on Solaris and Darwin targets. In addition, a number of other problems have been identified. * Assembly failed on Solaris and Darwin when not using gas due to use of .cfi directives and .struct. * Tests were failing on Solaris due to hard frame pointer being always enabled on that platform and and not passing --omit-rbp-clobbers to the code generator. * Manual compilation (via remote_exec as opposed to dg-runtest, et. al.) was missing TEST_ALWAYS_FLAGS, resulting in color codes in log files. It was also missing -m64 in some cases where it was needed. * When built with make -j48 on an unsupported triplet, the "test unsupported" message appeared 48 times in the log (it appears that several other tests do this as well). * Using hard-coded offsets in do-tests.S is ugly. This is fixed by moving some code into inline assembly in ms-sysv.c. * Custom parallelization code broke when running make without -j * Accessing the test_data global from assembly requires(?) use of global offset table on Darwin. This patch corrects all of these problems. The custom parallelization code has been removed and replaced with calls to procs in gcc's standard testing framework: gcc_parallel_test_enable, runtest_file_p and dg-runtest. This results in much poorer parallelization, which I hope to address in a future patch, but has little effect when built without checking enabled. Previously, each test job compiled and executed around 20k individual tests. This high number resulted in test jobs far exceeding the default 5 minute timeout for remote_/local_exec when gcc was built with --enable-checking=rtl. This has been resolved by splitting the tests out to a maximum of around 3500 tests per job. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- .../gcc.target/x86_64/abi/ms-sysv/do-test.S| 200 + .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c| 83 - .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp | 153 +--- 3 files changed, 210 insertions(+), 226 deletions(-) diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S index 1395235fd1e..ffe011bcc68 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S @@ -23,141 +23,101 @@ a copy of the GCC Runtime Library Exception along with this program; see the files COPYING3 and COPYING.RUNTIME respectively. If not, see <http://www.gnu.org/licenses/>. */ -#ifdef __x86_64__ - -# ifdef __ELF__ -# define ELFFN_BEGIN(fn) .type fn,@function -# define ELFFN_END(fn) .size fn,.-fn -# else -# define ELFFN_BEGIN(fn) -# define ELFFN_END(fn) -# endif - -# define FUNC(fn) \ - .global fn; \ - ELFFN_BEGIN(fn);\ -fn: - -#define FUNC_END(fn) ELFFN_END(fn) - -# ifdef __AVX__ -# define MOVAPS vmovaps -# else -# define MOVAPS movaps -# endif - -/* TODO: Is there a cleaner way to provide these offsets? */ - .struct 0 -test_data_save: - - .struct test_data_save + 224 -test_data_input: - - .struct test_data_save + 448 -test_data_output: - - .struct test_data_save + 672 -test_data_fn: - - .struct test_data_save + 680 -test_data_retaddr: +#if defined(__x86_64__) && defined(__SSE2__) + +/* These macros currently support GNU/Linux, Solaris and Darwin. */ + +#ifdef __ELF__ +# define FN_TYPE(fn) .type fn,@function +# define FN_SIZE(fn) .size fn,.-fn +#else +# define FN_TYPE(fn) +# define FN_SIZE(fn) +#endif + +#ifdef __USER_LABEL_PREFIX__ +# define ASMNAME2(prefix, name)prefix ## name +# define ASMNAME1(prefix, name)ASMNAME2(prefix, name) +# define ASMNAME(name) ASMNAME1(__USER_LABEL_PREFIX__, name) +#else +# define ASMNAME(name) name +#endif + +#define FUNC_BEGIN(fn) \ + .globl ASMNAME(fn); \ + FN_TYPE (ASMNAME(fn)); \ +ASMNAME(fn): + +#define FUNC_END(fn) FN_SIZE(ASMNAME(fn)) + +#ifdef __AVX__ +# define MOVAPS vmovaps +#else +# define MOVAPS movaps +#endif .text -regs_to_mem: - MOVAPS %xmm6, (%rax) - MOVAPS %xmm7, 0x10(%rax) - MOVAPS %xmm8, 0x20(%rax) - MOVAPS %xmm9, 0x30(%rax) - MOVAPS %xmm10, 0x40(%rax) - MOVAPS %xmm11, 0x50(%rax) - MOVAPS %xmm12, 0x60(%rax) - MOVAPS %xmm13, 0x70(%rax) - MOVAPS %xmm14, 0x80(%rax) - MOVAPS %xmm15, 0x90(%rax) - mov %rsi, 0xa0(%rax) - mov %rdi, 0xa8(%rax) - mov %rbx, 0xb0(%rax) - mov %rbp, 0xb8(%rax) - mov %r12, 0xc0(%rax) - mov %r13, 0xc8(%rax) - mov %r14, 0xd0(%rax) - mov %r15, 0xd8(%rax) +FUNC_BEGIN(regs_to_mem) + MOVAPS %xmm6, (%r10) + MOVAPS %xmm7, 0x10(%r10) + MOVAPS %xmm8, 0x20(%r10) + MOVAPS %xmm9, 0x30(%r10) + MOVAPS %xmm10, 0x4
Re: [PATCH v2 0/2] [testsuite, libgcc] PR80759 Fix FAILs on Solaris and Darwin
This patchset addresses a number of testsuite issues for gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp, mostly occurring on Solaris and Darwin. Additionally, it solves a bug in libgcc that caused link failures on Darwin when building with -mcall-ms2sysv-xlogues. The issues are detailed in the notes for each patch. I would particularly appreciate any feedback for Darwin as I am unfamiliar with the platform and Rainer and I have fashioned some of these changes by looking at other Darwin code in gcc. .../gcc.target/x86_64/abi/ms-sysv/do-test.S | 200 --- .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c | 83 +++- .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp| 153 +- libgcc/config.host | 6 +- libgcc/config/i386/i386-asm.h| 89 + libgcc/config/i386/resms64.S | 2 +- libgcc/config/i386/resms64f.S| 2 +- libgcc/config/i386/resms64fx.S | 2 +- libgcc/config/i386/resms64x.S| 2 +- libgcc/config/i386/savms64.S | 2 +- libgcc/config/i386/savms64f.S| 2 +- 11 files changed, 274 insertions(+), 269 deletions(-) Many thanks to Rainer for all of his help on this! Thanks, Daniel 2017-06-28 Daniel Santos <daniel.san...@pobox.com> 2017-06-10 Daniel Santos <daniel.san...@pobox.com> PR testsuite/80759 * gcc.target/x86_64/abi/ms-sysv/do-test.S (ELFFN_BEGIN): Rename to FN_TYPE. (ELFFN_END): Rename to FN_SIZE. (ASMNAME): New macro. (FUNC): Rename to FUNC_BEGIN, use ASMNAME and use .globl instead of .global. (FUNC_END): Use ASMNAME. (test_data_save): Remove. (test_data_input): Likewise. (test_data_output: Likewise. (test_data_fn): Likewise. (test_data_retaddr): Likewise. (regs_to_mem): Make globals, use r10 instead of rax. (mem_to_regs): Likewise. (do_test_unaligned): Remove .cfi directives, remove pushf/popf, move body to ms-sysv.c. (do_test_aligned): Likewise. * gcc.target/x86_64/abi/ms-sysv/ms-sysv.c: Add dg-* directives. (PASTE_STR): New macro. (ASMNAME): Likewise. (LOAD_TEST_DATA_ADDR): Likewise. (TEST_DATA_OFFSET): Likewise. (do_test_body0): New C function. (do_test_body): New inline assembly routine. * gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp (runtest_ms_sysv): Modify. 2017-06-28 Daniel Santos <daniel.san...@pobox.com> PR testsuite/80759 * config.host: include i386/t-msabi for darwin and solaris. * config/i386/i386-asm.h (ELFFN): Rename to FN_TYPE. (FN_SIZE): New macro. (FN_HIDDEN): Likewise. (ASMNAME): Likewise. (FUNC_START): Rename to FUNC_BEGIN, use ASMNAME, replace .global with .globl. (HIDDEN_FUNC): Use ASMNAME and .globl instead of .global. (SSE_SAVE): Convert to cpp macro, hard-code offset (always 0x60). * config/i386/resms64.S: Use SSE_SAVE as cpp macro instead of gas .macro. * config/i386/resms64f.S: Likewise. * config/i386/resms64fx.S: Likewise. * config/i386/resms64x.S: Likewise. * config/i386/savms64.S: Likewise. * config/i386/savms64f.S: Likewise.
[PATCH try 2 resend] [i386] Remove warnings for ignoring -mcall-ms2sysv-xlogues.
I appear to have forgotten to cc gcc-patches, sorry about that. There are currently three cases where we issue a warning when disabling -mcall-ms2sysv-xlogues for a function, but I never added a proper warning, so there's no mechanism for disabling it. This is something that I meant to address sooner. I'm thinking that it's better to just remove the warning entirely and document these cases, rather than adding a new warning. Any thoughts? These are the conditions: * the use of -fsplit-stack, * the use of static call chains (not sure if we can ever have that), and * if the function calls __buildin_eh_return. Some of these cases can likely be supported, but they are just on the "not yet tested" list. 2017-06-11 Daniel Santos <daniel.san...@pobox.com * config/i386/i386.c (warn_once_call_ms2sysv_xlogues): Remove. (ix86_compute_frame_layout): Don't call warn_once_call_ms2sysv_xlogues. (ix86_expand_call): Likewise. Thanks, Daniel Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 26 +++--- gcc/doc/invoke.texi| 25 - 2 files changed, 23 insertions(+), 28 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index d5c2d46bf5e..2dc6e53c765 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12772,18 +12772,6 @@ ix86_builtin_setjmp_frame_value (void) return stack_realign_fp ? hard_frame_pointer_rtx : virtual_stack_vars_rtx; } -/* Emits a warning for unsupported msabi to sysv pro/epilogues. */ -static void warn_once_call_ms2sysv_xlogues (const char *feature) -{ - static bool warned_once = false; - if (!warned_once) -{ - warning (0, "-mcall-ms2sysv-xlogues is not compatible with %s", - feature); - warned_once = true; -} -} - /* When using -fsplit-stack, the allocation routines set a field in the TCB to the bottom of the stack plus this much space, measured in bytes. */ @@ -12814,18 +12802,10 @@ ix86_compute_frame_layout (void) gcc_assert (TARGET_SSE); gcc_assert (!ix86_using_red_zone ()); - if (crtl->calls_eh_return) + if (crtl->calls_eh_return || ix86_static_chain_on_stack) { gcc_assert (!reload_completed); m->call_ms2sysv = false; - warn_once_call_ms2sysv_xlogues ("__builtin_eh_return"); - } - - else if (ix86_static_chain_on_stack) - { - gcc_assert (!reload_completed); - m->call_ms2sysv = false; - warn_once_call_ms2sysv_xlogues ("static call chains"); } /* Finally, compute which registers the stub will manage. */ @@ -29290,9 +29270,9 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, else if (ix86_function_ms_hook_prologue (current_function_decl)) ; - /* TODO: Cases not yet examined. */ + /* TODO: Compatibility not yet examined. */ else if (flag_split_stack) - warn_once_call_ms2sysv_xlogues ("-fsplit-stack"); + ; else { diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index c1168823af7..eec02b43a4f 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -25389,11 +25389,26 @@ using the function attributes @code{ms_abi} and @code{sysv_abi}. @opindex mno-call-ms2sysv-xlogues Due to differences in 64-bit ABIs, any Microsoft ABI function that calls a System V ABI function must consider RSI, RDI and XMM6-15 as clobbered. By -default, the code for saving and restoring these registers is emitted inline, -resulting in fairly lengthy prologues and epilogues. Using -@option{-mcall-ms2sysv-xlogues} emits prologues and epilogues that -use stubs in the static portion of libgcc to perform these saves and restores, -thus reducing function size at the cost of a few extra instructions. +default, the instructions for saving and restoring these registers are emitted +inline, resulting in fairly lengthy pro- and epilogues. Using +@option{-mcall-ms2sysv-xlogues} emits pro- and epilogues that use stubs in the +static portion of libgcc to perform these saves and restores, thus reducing +function size at the cost of executing a few extra instructions. This cost is +theoretically mitigated or eliminated by reduced instruction cache utilization, +temporal locality of the stubs, and the stubs' use of MOV instructions over +PUSH and POP. + +This option is not supported with SEH, so it is completely unavailable on +Windows. It is also silently disabled if a function: + +@enumerate +@item is built with @option{-mno-sse2} or @option{-fsplit-stack}, +@item has @code{__attribute__ ((ms_hook_prologue))}, or +@item either throws an exception or explicitly calls @code{__builtin_eh_return}. +@end enumerate + +Support for @option{-fsplit-stack} and @code{__builtin_eh_return} may be +added at some time in the future, but has not ye
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/23/2017 09:31 AM, Bernd Edlinger wrote: Hi, this is the latest version of my patch. As already said, it attempts to compute the frame layout only when relevant data have changed. Apologies for doing more clean-up on Daniel's patch than absolutely necessary, but ... Bootstrap and reg-tested successfully on x86_64-pc-linux-gnu with unix\{,-m32\}. Is it OK for trunk? Thanks Bernd. OK with me. Thanks, Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/22/2017 01:32 PM, Bernd Edlinger wrote: On 05/19/17 05:17, Daniel Santos wrote: No, I'm not at all comfortable with you making so many seemingly unnecessary changes to this code. (Although I wish I got this much feedback during my RFCs! :) I can accept the changes to is/count_stub_managed_reg (with some caveats), but I do not at all see the rationale for changing m_stub_names to a static and adding another dimension for the object instance -- from an OO standpoint, that's just bad design. Can you please share your rationale for that? Hmm, sorry about that ... I just thought it would be nice to avoid the const-cast here. Well remember const-correctness isn't about an object's internal (bitwise) state, but it's externally visible (logical) state. So a const member function need not avoid altering it's internal state if the externally visible state remains unchanged, such as when caching some result or lazy initing. I have tended to prefer using const_cast for this, isolating its use to a single const accessor function (or if () block) to leave less room for the data members to be accidentally altered in another const member function. But mutable is generally preferred over const_cast, which opens up the danger of accidentally modifying an object's logical state (especially by a subsequent edit), so using mutable is probably a better practice anyway. However, ... This moved the m_stub_names from all 4 instances to one static array s_stub_names. But looking at it again, I think the extra dimension is not even necessary, because all instances share the same data, so removing that extra dimension again will be fine. You are correct! And I see that you're new patch has already changed get_stub_name to a static member function, so great! Incidentally, half of the space in that array is wasted and can be trimmed since a given instance of xlogue_layout either supports hard frame pointers or doesn't, I just never got around to splitting that apart. (The first three enum xlogue_stub values are for without HFP and the last three for with.) Also, if we wanted to further reduce the memory footprint of xlogue_layout objects, the offset field of struct reginfo could be changed to int, and if we really wanted to go nuts then 16 bits would work for both of its fields. So for is/count_stub_managed_reg(s), you are obviously much more familiar with gcc, its passes and the i386 backend, but my knowledge level makes me very uncomfortable having the result of xlogue_layout::is_stub_managed_reg() determined in a way that has the (apparent) potential to differ from from the state at the time the last call to ix86_compute_frame_layout() was made; I just don't understand I fund it technically difficult to add a HARD_REG_SET to struct machine_function, and tried to avoid the extra overhead of calling ix86_save_reg so often, which you also felt uncomfortable with. So, if you look at compute_stub_managed_regs I first saw that the first loop can never terminate thru the "return 0", because the registers in x86_64_ms_sysv_extra_clobbered_registers are guaranteed to be clobbered. Then I saw that the bits in stub_managed_regs are always added in the same sequence, thus the result depends only on the number call_ms2sysv_extra_regs and hfp so everything is already available in struct machine_function. Thanks Bernd. Yes, I agree with how you have refactored compute_stub_managed_regs given your rationale of not adding another header dependency to i386.h. It is only the overall scheme of calculating this outside of ix86_compute_frame_layout that I cannot validate due to my not having a good understanding of what can and cannot change in between the time that ix86_compute_frame_layout is last called and the last call to is_stub_managed_regs(). As Uros said, my patch set touches a "delicate part of the compiler, where lots of code-paths cross each other (and we have had quite some hard-to-fix bugs in this area)" (https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01924.html). I wrote it the way I did with my understanding of what was safe to do and your alterations move it's functionality outside of that understanding. So if you say that this won't break it, then I will have to trust you (and the testsuite) on that. On that note, the tests are undergoing some change and bug fixes and I'm planning on adding more tests to validate non-breakage with various other stack frame-related options and probably additional tests (and test options) triggered by GCC_TEST_RUN_EXPENSIVE or some such. Daniel
Re: [PATCH 2/2] [testsuite] PR 80759 Remove gas extensions from do-test.S, fix other problems
Thanks you for your assistance Rainer! On 05/19/2017 04:03 AM, Rainer Orth wrote: unfortunately, it still doesn't, as explained in the PR. The multilib support is still wrong/non-existant. I guess I thought for some reason that would magically appear in TEST_ALWAYS_FLAGS. :) I've explicitly added it for now, but I haven't yet found where -m64 gets fed in the normal flow of things and I would rather know I'm doing things as closely as possible to how the rest if the test harness does it. (I have SVN write privs now, so I can even commit it myself). Please always include ChangeLog entries with your patch submissions so one can easily see what you've change (cf. https://gcc.gnu.org/contribute.html). Thanks. Rainer I hate when I forget that! I'll be sure to remember when I resubmit. Use of .struct in do_test.S causes breakages when gas isn't the assembler (e.g., Solaris). I also wasn't including TEST_ALWAYS_FLAGS in my CFLAGS resulting in super-ugly log files. Finally, this patch eliminates spam of "test unsupported" (limiting it to one printing). Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- .../gcc.target/x86_64/abi/ms-sysv/do-test.S| 26 +- .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c| 7 ++ .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp | 24 3 files changed, 27 insertions(+), 30 deletions(-) diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S index 1395235fd1e..967eb959fbc 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S @@ -46,22 +46,6 @@ fn: # define MOVAPS movaps # endif -/* TODO: Is there a cleaner way to provide these offsets? */ - .struct 0 -test_data_save: - - .struct test_data_save + 224 -test_data_input: - - .struct test_data_save + 448 -test_data_output: - - .struct test_data_save + 672 -test_data_fn: - - .struct test_data_save + 680 -test_data_retaddr: - .text regs_to_mem: @@ -132,23 +116,23 @@ L0: callregs_to_mem # Load register with random data - lea test_data + test_data_input(%rip), %rax + lea test_data + 224(%rip), %rax callmem_to_regs # Save original return address pop %rax - movq%rax, test_data + test_data_retaddr(%rip) + movq%rax, test_data + 680(%rip) # Call the test function - call*test_data + test_data_fn(%rip) + call*test_data + 672(%rip) # Restore the original return address - movqtest_data + test_data_retaddr(%rip), %rcx + movqtest_data + 680(%rip), %rcx push%rcx # Save test function return value and store resulting register values push%rax - lea test_data + test_data_output(%rip), %rax + lea test_data + 448(%rip), %rax callregs_to_mem # Restore registers diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c index 2a011f5103d..7cec312c386 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c @@ -346,6 +346,13 @@ int main (int argc, char *argv[]) assert (!((long)_data.regdata[REG_SET_INPUT] & 15)); assert (!((long)_data.regdata[REG_SET_OUTPUT] & 15)); + /* Verify offsets hard-coded into assembly. */ + assert (__builtin_offsetof (struct test_data, regdata[REG_SET_SAVE]) == 0); + assert (__builtin_offsetof (struct test_data, regdata[REG_SET_INPUT]) == 224); + assert (__builtin_offsetof (struct test_data, regdata[REG_SET_OUTPUT]) == 448); + assert (__builtin_offsetof (struct test_data, fn) == 672); + assert (__builtin_offsetof (struct test_data, retaddr) == 680); + while ((c = getopt (argc, argv, "s:f")) != -1) { switch (c) while .struct is a gas extension and doesn't work with the Solaris/x86 /bin/as, having the same (mostly unexplained) constants hardcoded in two places isn't exactly helpful. I'd suggest moving them to (say) ms-sysv.h and include that from both do-test.S (which is preprocessed assembler after all) and ms-sysv.c. Rainer Well, I don't have an ms-sysv.h, but I suppose I can add one. I'm starting to lean more towards the idea of plucking out the portion of asm that uses these offsets, moving that to an inline asm function and having the code in do-test.S just jmp to it. I wish there was some sort of "naked" attribute for x86 since I'm not well versed in every way that the compiler can change it in a way that wouldn't be friendly. void __attribute__((optimize ("-O0 -fno-split-stack"))) do_test_body (void) { __asm__ __volatile__ ( "# Save registers\n" " lea %0, %%
[PATCH 2/2] [testsuite] PR 80759 Remove gas extensions from do-test.S, fix other problems
Use of .struct in do_test.S causes breakages when gas isn't the assembler (e.g., Solaris). I also wasn't including TEST_ALWAYS_FLAGS in my CFLAGS resulting in super-ugly log files. Finally, this patch eliminates spam of "test unsupported" (limiting it to one printing). Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- .../gcc.target/x86_64/abi/ms-sysv/do-test.S| 26 +- .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c| 7 ++ .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp | 24 3 files changed, 27 insertions(+), 30 deletions(-) diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S index 1395235fd1e..967eb959fbc 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S @@ -46,22 +46,6 @@ fn: # define MOVAPS movaps # endif -/* TODO: Is there a cleaner way to provide these offsets? */ - .struct 0 -test_data_save: - - .struct test_data_save + 224 -test_data_input: - - .struct test_data_save + 448 -test_data_output: - - .struct test_data_save + 672 -test_data_fn: - - .struct test_data_save + 680 -test_data_retaddr: - .text regs_to_mem: @@ -132,23 +116,23 @@ L0: callregs_to_mem # Load register with random data - lea test_data + test_data_input(%rip), %rax + lea test_data + 224(%rip), %rax callmem_to_regs # Save original return address pop %rax - movq%rax, test_data + test_data_retaddr(%rip) + movq%rax, test_data + 680(%rip) # Call the test function - call*test_data + test_data_fn(%rip) + call*test_data + 672(%rip) # Restore the original return address - movqtest_data + test_data_retaddr(%rip), %rcx + movqtest_data + 680(%rip), %rcx push%rcx # Save test function return value and store resulting register values push%rax - lea test_data + test_data_output(%rip), %rax + lea test_data + 448(%rip), %rax callregs_to_mem # Restore registers diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c index 2a011f5103d..7cec312c386 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c @@ -346,6 +346,13 @@ int main (int argc, char *argv[]) assert (!((long)_data.regdata[REG_SET_INPUT] & 15)); assert (!((long)_data.regdata[REG_SET_OUTPUT] & 15)); + /* Verify offsets hard-coded into assembly. */ + assert (__builtin_offsetof (struct test_data, regdata[REG_SET_SAVE]) == 0); + assert (__builtin_offsetof (struct test_data, regdata[REG_SET_INPUT]) == 224); + assert (__builtin_offsetof (struct test_data, regdata[REG_SET_OUTPUT]) == 448); + assert (__builtin_offsetof (struct test_data, fn) == 672); + assert (__builtin_offsetof (struct test_data, retaddr) == 680); + while ((c = getopt (argc, argv, "s:f")) != -1) { switch (c) diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp index 77c40dbf349..a9571f194b1 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp @@ -23,18 +23,12 @@ # see the files COPYING3 and COPYING.RUNTIME respectively. If not, see # <http://www.gnu.org/licenses/>. -# Exit immediately if this isn't a native x86_64 target. -if { (![istarget x86_64-*-*] && ![istarget i?86-*-*]) - || ![is-effective-target lp64] || ![isnative] } then { -unsupported "$subdir" -return -} - load_lib gcc-dg.exp load_lib parallelize.exp proc runtest_ms_sysv { cflags generator_args } { -global GCC_UNDER_TEST HOSTCXX HOSTCXXFLAGS tmpdir srcdir subdir +global GCC_UNDER_TEST HOSTCXX HOSTCXXFLAGS tmpdir srcdir subdir \ + TEST_ALWAYS_FLAGS set objdir "$tmpdir/ms-sysv" set generator "$tmpdir/ms-sysv-generate.exe" @@ -93,7 +87,7 @@ proc runtest_ms_sysv { cflags generator_args } { } } -set cc "$GCC_UNDER_TEST -I$objdir -I$srcdir/$subdir $cflags $warn_flags" +set cc "$GCC_UNDER_TEST -I$objdir -I$srcdir/$subdir $TEST_ALWAYS_FLAGS $cflags $warn_flags" # Assemble do-test.S set src "$srcdir/$subdir/do-test.S" @@ -142,6 +136,18 @@ if { [parallel-init "ms2sysv"] != 0 } then { return; } +# Exit if this isn't a native x86_64 target. +if { (![istarget x86_64-*-*] && ![istarget i?86-*-*]) + || ![is-effective-target lp64] || ![isnative] } then { + +# The first call to parallel-should-run-test is used so we only print the +# "
[PATCH 1/2] [testsuite] Move non-standard parallelization support into new lib and fix flaw
This fixes a flaw in my parallelization code that caused it to fail when GCC_RUNTEST_PARALLELIZE_DIR wasn't set. It worked fine with make -j1, but failed with just make. As there could be other tests that might need to do their own paralellization, I'm moving the that code into it's own file under gcc/testsuite/lib. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp | 48 gcc/testsuite/lib/parallelize.exp | 88 ++ 2 files changed, 103 insertions(+), 33 deletions(-) create mode 100644 gcc/testsuite/lib/parallelize.exp diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp index e317af9bd85..77c40dbf349 100644 --- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp @@ -30,13 +30,11 @@ if { (![istarget x86_64-*-*] && ![istarget i?86-*-*]) return } -global GCC_RUNTEST_PARALLELIZE_DIR - load_lib gcc-dg.exp +load_lib parallelize.exp proc runtest_ms_sysv { cflags generator_args } { -global GCC_UNDER_TEST HOSTCXX HOSTCXXFLAGS tmpdir srcdir subdir \ - parallel_dir next_test +global GCC_UNDER_TEST HOSTCXX HOSTCXXFLAGS tmpdir srcdir subdir set objdir "$tmpdir/ms-sysv" set generator "$tmpdir/ms-sysv-generate.exe" @@ -46,22 +44,6 @@ proc runtest_ms_sysv { cflags generator_args } { set ms_sysv_exe "$objdir/ms-sysv.exe" set status 0 set warn_flags "-Wall" -set this_test $next_test -incr next_test - -# Do parallelization here -if [catch {set fd [open "$parallel_dir/$this_test" \ - [list RDWR CREAT EXCL]]} ] { - if { [lindex $::errorCode 1] eq "EEXIST" } then { - # Another job is running this test - return - } else { - error "Failed to open $parallel_dir/$this_test: $::errorCode" - set status 1 - } -} else { - close $fd -} # Detect when hard frame pointers are enabled (or required) so we know not # to generate bp clobbers. @@ -73,9 +55,17 @@ proc runtest_ms_sysv { cflags generator_args } { set descr "$subdir CFLAGS=\"$cflags\" generator_args=\"$generator_args\"" verbose "$tmpdir: Running test $descr" 1 -# Cleanup any previous test in objdir -file delete -force $objdir -file mkdir $objdir +set status [parallel-should-run-test] + +if { $status == 1 } then { + return +} + +if { $status == 0 } then { + # Cleanup any previous test in objdir + file delete -force $objdir + file mkdir $objdir +} # Build the generator (only needs to be done once). set src "$srcdir/$subdir/gen.cc" @@ -148,16 +138,8 @@ proc runtest_ms_sysv { cflags generator_args } { } dg-init - -# Setup parallelization -set next_test 0 -set parallel_dir "$env(GCC_RUNTEST_PARALLELIZE_DIR)/abi-ms-sysv" -file mkdir "$env(GCC_RUNTEST_PARALLELIZE_DIR)" -file mkdir "$parallel_dir" - -if { ![file isdirectory "$parallel_dir"] } then { -error "Failed to create directory $parallel_dir: $::errorCode" -return +if { [parallel-init "ms2sysv"] != 0 } then { +return; } set gen_opts "-p0-5" diff --git a/gcc/testsuite/lib/parallelize.exp b/gcc/testsuite/lib/parallelize.exp new file mode 100644 index 000..346a06f0fa0 --- /dev/null +++ b/gcc/testsuite/lib/parallelize.exp @@ -0,0 +1,88 @@ +# Functions for parallelizing tests that cannot use the standard dg-run, +# dg-runtest or gcc-dg-runtest for some reason. +# +# Copyright (C) 2017 Free Software Foundation, Inc. +# Contributed by Daniel Santos <daniel.san...@pobox.com> +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GCC is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# Under Section 7 of GPL version 3, you are granted additional +# permissions described in the GCC Runtime Library Exception, version +# 3.1, as published by the Free Software Foundation. +# +# You should have received a copy of the GNU General Public License and +# a copy of the GCC Runtime Library Exception along with this program; +# see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +# <http://www.gnu.org/licenses/>. + +set is_parallel_build 0 +set parallel_next_test 0 +set parallel_dir ""
[PATCH 0/2] [testsuite] PR80759 Fix test breakages on i386-pc-solaris2.*
There are a few issues with my ms-sysv.exp tests: 1. Use of gas extensions in do_test.S cause breakages on Solaris, 2. Parallelization breaks when no make -j flag is passed, 3. Builds aren't adding TEST_ALWAYS_FLAGS, so log files filled with color escape codes, and 4. The "test unsupported" message is being spammed once for each -j https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80759 I've broken this apart into two patches because I don't know if you'll agree with the first one. I fixed the make -j issue and moved the parallelization code into a new gcc/target/lib/parallelize.exp in the first patch and fixed all of the other issues in the second. I've removed all usage of gas .struct in my assembly file, used hard-coded the offsets into the code and added asserts to main.c to make sure they don't change. I've bootstrapped and retested on x86_64 Linux and have asked Rainer to retest on Solaris. Presuming that succeeds, are you OK with this change? (I have SVN write privs now, so I can even commit it myself). Thanks, Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
PS: Oh! it might be due to the difference between -j1 and no -j argument. Yes, that's how I missed it. This flaw isn't exposed with make -j1, but is exposed with just make. Thanks for finding this! Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/18/2017 08:37 AM, Bernd Edlinger wrote: On 05/17/17 04:01, Daniel Santos wrote: - if (ignore_outlined && cfun->machine->call_ms2sysv - && in_hard_reg_set_p (stub_managed_regs, DImode, regno)) -return false; + if (ignore_outlined && cfun->machine->call_ms2sysv) +{ + /* Registers who's save & restore will be managed by stubs called from + pro/epilogue. */ + HARD_REG_SET stub_managed_regs; + xlogue_layout::compute_stub_managed_regs (stub_managed_regs); + if (in_hard_reg_set_p (stub_managed_regs, DImode, regno)) +return false; +} + if (crtl->drap_reg && regno == REGNO (crtl->drap_reg) && !cfun->machine->no_drap_save_restore) This makes no sense. The entire purpose of stub_managed_regs is to cache the result of xlogue_layout::compute_stub_managed_regs() and this would unnecessarily repeat that calculation for each time ix86_save_reg() is called. Since xlogue_layout::compute_stub_managed_regs() calls ix86_save_reg many times, this makes it even worse.Which registers are being saved out-of-line and inline MUST be known at the time the stack layout is determined. So stub_managed_regsshould either be left a TU static or just moved to struct machine_function. As an aside, I've noticed that xlogue_layout::compute_stub_managed_regs is calling ix86_save_reg (which isn't trivial) more often than it really has to, so I've refactored it. Well, meanwhile I think the stub_managed_regs contain zero information and need not be saved at all, because it can easily be reconstructed from m->call_ms2sysv_extra_regs. See the attached new version. Daniel does it work for you? No, I'm not at all comfortable with you making so many seemingly unnecessary changes to this code. (Although I wish I got this much feedback during my RFCs! :) I can accept the changes to is/count_stub_managed_reg (with some caveats), but I do not at all see the rationale for changing m_stub_names to a static and adding another dimension for the object instance -- from an OO standpoint, that's just bad design. Can you please share your rationale for that? Incidentally, half of the space in that array is wasted and can be trimmed since a given instance of xlogue_layout either supports hard frame pointers or doesn't, I just never got around to splitting that apart. (The first three enum xlogue_stub values are for without HFP and the last three for with.) Also, if we wanted to further reduce the memory footprint of xlogue_layout objects, the offset field of struct reginfo could be changed to int, and if we really wanted to go nuts then 16 bits would work for both of its fields. So for is/count_stub_managed_reg(s), you are obviously much more familiar with gcc, its passes and the i386 backend, but my knowledge level makes me very uncomfortable having the result of xlogue_layout::is_stub_managed_reg() determined in a way that has the (apparent) potential to differ from from the state at the time the last call to ix86_compute_frame_layout() was made; I just don't understand well enough what all can change in between the last call to ix86_compute_frame_layout() and the last call to xlogue_layout::is_stub_managed_reg(). I like your count_stub_managed_regs() is_stub_managed_regs() from a *performance* standpoint (and I know I get too uptight about that kind of thing, so appreciate that), but as to the change in scheme, I would have to trust you if you assert that this will always behave consistently. I also want to give you a little background on some of these seemingly repetitive computations. One of my design goals was for the code to be relatively easily to adapted to the management of out-of-line pro/epilogue stubs for other possible scenarios where there are a lot of clobbers and it could be useful. Granted, I don't have enough knowledge of x86 architectures to identify situations other than this one (in 64-bit Wine) where it could be helpful and I know that x86 push/pops are really small. So theoretically, struct machine_function's "call_ms2sysv" could be changed to something like "outline_savres" and any combination of clobbered registers for which there is a descent stub could be used if it was a good choice. I also realize that nobody likes complexity that isn't being used, and I respect that. So if you are comfortable with this change and you believe you understand how it works then I will agree to it, but I'll be trusting you well beyond my knowledge level and ability to confidently predict the outcome (probably what a programmer hates the most). Thanks, Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/17/2017 01:39 PM, Bernd Edlinger wrote: On 05/15/17 03:39, Daniel Santos wrote: I should add that if you want to run faster tests just on the ms to sysv abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if that succeeds run the full testsuite. Daniel Hmm, that's funny... If I use "make check-c RUNTESTFLAGS="ms-sysv.exp" -j8" it seems to work, but if I omit the -j8 it fails: make check-c RUNTESTFLAGS="ms-sysv.exp" ...Test Run By ed on Wed May 17 20:38:24 2017 Native configuration is x86_64-pc-linux-gnu === gcc tests === Schedule of variations: unix Running target unix Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target. Using /usr/share/dejagnu/config/unix.exp as generic interface file for target. Using /home/ed/gnu/gcc-trunk/gcc/testsuite/config/default.exp as tool-and-target-specific interface file. Running /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp ... ERROR: tcl error sourcing /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp. ERROR: no such variable (read trace on "env(GCC_RUNTEST_PARALLELIZE_DIR)") invoked from within "set parallel_dir "$env(GCC_RUNTEST_PARALLELIZE_DIR)/abi-ms-sysv"" (file "/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp" line 154) invoked from within "source /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp" ("uplevel" body line 1) invoked from within "uplevel #0 source /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp" invoked from within "catch "uplevel #0 source $test_file_name"" === gcc Summary === /home/ed/gnu/gcc-build/gcc/xgcc version 8.0.0 20170514 (experimental) (GCC) make[2]: Leaving directory `/home/ed/gnu/gcc-build/gcc' make[1]: Leaving directory `/home/ed/gnu/gcc-build/gcc' Hmm, that might be something I hadn't actually tried. And if I run it in a directory where I had previously run a multi-job check it doesn't blow up (maybe because the directory is already there?) Due to the nature of my test program, I had to break with tradition and implement something akin to the test that generates random structs (I forgot what that one is called). It ended up breaking the bastardized parallelization scheme, so I had to implement my own re-bastardized scheme. Looks like I can just skip parallelization if GCC_RUNTEST_PARALLELIZE_DIR isn't defined. I have another Solaris test issue on PR 80759 so I'll fix that along with it. Thanks, Daniel PS: Oh! it might be due to the difference between -j1 and no -j argument.
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/17/2017 12:41 PM, Bernd Edlinger wrote: Apologies if I ruined your patch... As I said before, I'm the new guy here. :) So when this is done I'll rebase my changes. I have some test stuff to fix and some refactoring and refinements to xlogue_layout::compute_stub_managed_regs(). And then I'll find a solution to the stub_managed_regs after that. Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c(revision 248031) +++ gcc/config/i386/i386.c(working copy) @@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] = /* Additional registers that are clobbered by SYSV calls. */ -unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] = +#define NUM_X86_64_MS_CLOBBERED_REGS 12 +static int const x86_64_ms_sysv_extra_clobbered_registers + [NUM_X86_64_MS_CLOBBERED_REGS] = Is there a reason you're changing this unsigned to signed int? While AX_REG and such are just preprocessor macros, everywhere else it seems that register numbers are dealt with as unsigned ints. I actually there seems to be confusion about "int" vs. "unsigned int" for regno, the advantage of int, is that it can contain -1 as a exceptional value. Furthermore there are 3 similar arrays just above that also use int: static int const x86_64_int_parameter_registers[6] = { DI_REG, SI_REG, DX_REG, CX_REG, R8_REG, R9_REG }; static int const x86_64_ms_abi_int_parameter_registers[4] = { CX_REG, DX_REG, R8_REG, R9_REG }; static int const x86_64_int_return_registers[4] = { AX_REG, DX_REG, DI_REG, SI_REG }; /* Additional registers that are clobbered by SYSV calls. */ #define NUM_X86_64_MS_CLOBBERED_REGS 12 static int const x86_64_ms_sysv_extra_clobbered_registers [NUM_X86_64_MS_CLOBBERED_REGS] = { SI_REG, DI_REG, XMM6_REG, XMM7_REG, XMM8_REG, XMM9_REG, XMM10_REG, XMM11_REG, XMM12_REG, XMM13_REG, XMM14_REG, XMM15_REG }; So IMHO it looked odd to have one array use a different type in the first place. OK. I think that when I originally started this I was using elements of this array in comparisons and got the signed/unsigned warning and changed them. None of the code gives that warning now however. @@ -2484,13 +2486,13 @@ class xlogue_layout { needs to store registers based upon data in the machine_function. */ HOST_WIDE_INT get_stack_space_used () const { -const struct machine_function = *cfun->machine; -unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1; +const struct machine_function *m = cfun->machine; +unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1; What is the reason for this change? Because a mixture of C and C++ (C wants "struct" machine_function) looks ugly, and everywhere else in this module, "m" is a pointer and no reference. I see, consistency with the rest of the file. -gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS); +gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS); return m_regs[last_reg].offset -+ (m.call_ms2sysv_pad_out ? 8 : 0) -+ STUB_INDEX_OFFSET; + + (m->call_ms2sysv_pad_out ? 8 : 0) + + STUB_INDEX_OFFSET; } /* Returns the offset for the base pointer used by the stub. */ @@ -2532,7 +2534,7 @@ class xlogue_layout { /* Lazy-inited cache of symbol names for stubs. */ char m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN]; - static const struct xlogue_layout GTY(()) s_instances[XLOGUE_SET_COUNT]; + static const struct GTY(()) xlogue_layout s_instances[XLOGUE_SET_COUNT]; Hmm, during development I originally had C-style xlogue_layout as a struct and later decided to make it a class and apparently forgot to remove the "struct" here. None the less, it's bazaar that the GTY() would go in between the "struct" and the "xlogue_layout." As I said before, I don't fully understand how this GTY works. Can we just remove the "struct" keyword? Also, if the way I had it was wrong, (and resulted in garbage collection not working right) then perhaps it was the cause of a problem I had with caching symbol rtx objects. I could not get this to work because my cached objects would somehow become stale and I've since removed that code (from xlogue_layout::get_stub_rtx). (i.e., does GTY effect lifespan of globals, TU statics and static C++ data members?) Yes, I have not noticed the "struct", and agree to remove it. I just saw every other place where GTY is used it is directly after "struct" or "static", so my impulse was just to follow that examples. Yeah, and not understanding how it worked I was just trying to follow suit. But neither version actually makes the class GC-able. Apparently this class construct is too complicated for the gengtype machinery. So I am inclined to remove the GTY keyword completely as it gives you only false security in GC's ability to garbage collect anything in this class.
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/16/2017 02:52 PM, Bernd Edlinger wrote: I think I solved the problem with -fsplit-stack, I am not sure if ix86_static_chain_on_stack might change after reload due to final.c possibly calling targetm.calls.static_chain, but if that is the case, that is an already pre-existing problem. The goal of this patch is to make all decisions regarding the frame layout before the reload pass, and to make sure that the frame layout does not change unexpectedly it asserts that the data that goes into the decision does not change after reload_completed. With the attached patch -fsplit-stack and the attribute ms_hook_prologue is handed directly at the ix86_expand_call, because that data is already known before expansion. The calls_eh_return and ix86_static_chain_on_stack may become known at a later time, but after reload it should not change any more. To be sure, I added an assertion at ix86_static_chain, which the regression test did not trigger, neither with -m64 nor with -m32. I have bootstrapped the patch several times, and a few times I encounterd a segfault in the garbage collection, but it did not happen every time. Currently I think that is unrelated to this patch. Bootstrapped and reg-tested on x86_64-pc-linux-gnu with -m64/-m32. Is it OK for trunk? Thanks Bernd. With as many formatting errors as I seem to have had, I would like to fix those then you patch on top of that if you wouldn't mind terribly. While gcc uses subversion, git-blame is still very helpful (then again, since Uros committed it for me, I guess that's already off). Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c(revision 248031) +++ gcc/config/i386/i386.c(working copy) @@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] = /* Additional registers that are clobbered by SYSV calls. */ -unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] = +#define NUM_X86_64_MS_CLOBBERED_REGS 12 +static int const x86_64_ms_sysv_extra_clobbered_registers + [NUM_X86_64_MS_CLOBBERED_REGS] = Is there a reason you're changing this unsigned to signed int? While AX_REG and such are just preprocessor macros, everywhere else it seems that register numbers are dealt with as unsigned ints. @@ -2484,13 +2486,13 @@ class xlogue_layout { needs to store registers based upon data in the machine_function. */ HOST_WIDE_INT get_stack_space_used () const { -const struct machine_function = *cfun->machine; -unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1; +const struct machine_function *m = cfun->machine; +unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1; What is the reason for this change? -gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS); +gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS); return m_regs[last_reg].offset -+ (m.call_ms2sysv_pad_out ? 8 : 0) -+ STUB_INDEX_OFFSET; + + (m->call_ms2sysv_pad_out ? 8 : 0) + + STUB_INDEX_OFFSET; } /* Returns the offset for the base pointer used by the stub. */ @@ -2532,7 +2534,7 @@ class xlogue_layout { /* Lazy-inited cache of symbol names for stubs. */ char m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN]; - static const struct xlogue_layout GTY(()) s_instances[XLOGUE_SET_COUNT]; + static const struct GTY(()) xlogue_layout s_instances[XLOGUE_SET_COUNT]; Hmm, during development I originally had C-style xlogue_layout as a struct and later decided to make it a class and apparently forgot to remove the "struct" here. None the less, it's bazaar that the GTY() would go in between the "struct" and the "xlogue_layout." As I said before, I don't fully understand how this GTY works. Can we just remove the "struct" keyword? Also, if the way I had it was wrong, (and resulted in garbage collection not working right) then perhaps it was the cause of a problem I had with caching symbol rtx objects. I could not get this to work because my cached objects would somehow become stale and I've since removed that code (from xlogue_layout::get_stub_rtx). (i.e., does GTY effect lifespan of globals, TU statics and static C++ data members?) /* Constructor for xlogue_layout. */ @@ -2639,11 +2643,11 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_ : m_hfp (hfp) , m_nregs (hfp ? 17 : 18), m_stack_align_off_in (stack_align_off_in) { + HOST_WIDE_INT offset = stack_align_off_in; + unsigned i, j; + memset (m_regs, 0, sizeof (m_regs)); memset (m_stub_names, 0, sizeof (m_stub_names)); - - HOST_WIDE_INT offset = stack_align_off_in; - unsigned i, j; for (i = j = 0; i < MAX_REGS; ++i) { unsigned regno = REG_ORDER[i]; @@ -2662,11 +2666,12 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_ m_regs[j].regno= regno; m_regs[j++].offset = offset - STUB_INDEX_OFFSET; } -gcc_assert (j
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/16/2017 12:19 PM, Ian Lance Taylor wrote: On Mon, May 15, 2017 at 10:00 PM, Daniel Santos <daniel.san...@pobox.com> wrote: Ian, would you mind looking at this please? A combination of my -mcall-ms2sysv-xlogues patch with Bernd's patch is causing problems when ix86_expand_split_stack_prologue() calls ix86_expand_call(). I don't have a lot of context here. I assume that ms2sysv is going to be used on Windows systems, where -fsplit-stack isn't really going to work anyhow, so I think it would probably be OK that reject that combination if it causes trouble. Sorry I wasn't more specific. This -mcall-ms2sysv-xlogues actually targets Wine, although they don't use -fsplit-stack. My patch set as-is is disabled when fsplit-stack is used, but during ix86_compute_frame_layout, which is too late in the case of -fsplit-stack. I think I should just change this to a sorry() in ix86_option_override_internal. Also, it's overkill for ix86_expand_split_stack_prologue to call ix86_expand_call. The call is always to __morestack, and __morestack is written in assembler, so we could use a simpler version of ix86_expand_call if that helps. In particular we can decide that __morestack doesn't clobber any unusual registers, if that is what is causing the problem. Ian Well aside from the conflict of the two patches, it just looks like it has the potential to generate clobbers where none are needed, but I'm having trouble actually *proving* that, so maybe I'm just wrong. Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/16/2017 03:34 AM, Bernd Edlinger wrote: It would be good to have test cases for each of the not-supported warnings that can happen, so far I only managed to get a test case for -fsplit-stack. Yes, I'm inclined to agree. I'll try to get this done today or tomorrow. I've also put in a limiter of one warning per TU. One problem is that there isn't a way to disable the warning, so I may want to add that. Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
Ian, would you mind looking at this please? A combination of my -mcall-ms2sysv-xlogues patch with Bernd's patch is causing problems when ix86_expand_split_stack_prologue() calls ix86_expand_call(). On 05/15/2017 06:46 PM, Daniel Santos wrote: Rather or not m->call_ms2sysv is set determines which stack layout is used when ix86_compute_frame_layout() runs. But if we can run expand_call after the final time ix86_compute_frame_layout() then we have a problem. It looks like ix86_expand_split_stack_prologue is the only function that manually calls ix86_expand_call, but maybe it would be better to modify the test to something like this: diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index a78819d6b3f..c36383f6962 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -29325,7 +29325,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, } /* Set here, but it may get cleared later. */ - if (TARGET_CALL_MS2SYSV_XLOGUES) + if (TARGET_CALL_MS2SYSV_XLOGUES && !reload_completed) cfun->machine->call_ms2sysv = true; } Actually, I think this is wrong. I happened to recall looking at the morestack code last year and remembered that it was all assembly. I looked at it again and I don't see that it calls anything outside of it's implementation file (libgcc/config/i386/morestack.S) except for _Unwind_Resume and the calling function its self (I think it calls its caller). It saves and restores rsi and rdi and doesn't use any sse registers, so it doesn't need to clobber all of the regs in the x86_64_ms_sysv_extra_clobbered_registers array. I'm guessing that this should have it's own pattern instead of calling ix86_expand_call in the first place. Of course, I'm the new guy here, so please enlighten me if I'm wrong. Thanks, Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/15/2017 03:39 PM, Bernd Edlinger wrote: On 05/15/17 03:39, Daniel Santos wrote: On 05/14/2017 11:31 AM, Bernd Edlinger wrote: Hi Daniel, there is one thing I don't understand in your patch: That is, it introduces a static value: /* Registers who's save & restore will be managed by stubs called from pro/epilogue. */ static HARD_REG_SET GTY(()) stub_managed_regs; This seems to be set as a side effect of ix86_compute_frame_layout, and depends on the register usage of the current function. But values that depend on the current function need usually be attached to cfun->machine, because the passes can run in parallel unless I am completely mistaken, and the stub_managed_regs may therefore be computed from a different function. Bernd. I should add that if you want to run faster tests just on the ms to sysv abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if that succeeds run the full testsuite. Daniel Unfortunately I encounter a serious problem when my patch is used ontop of your patch, Yes, the test suite ran without error, but then I tried to trigger the warning and that tripped an ICE. The reason is that cfun->machine->call_ms2sysv can be set to true *after* reload_completed, which can be seen using the following patch: Index: i386.c === --- i386.c (revision 248031) +++ i386.c (working copy) @@ -29320,7 +29320,10 @@ /* Set here, but it may get cleared later. */ if (TARGET_CALL_MS2SYSV_XLOGUES) + { + gcc_assert(!reload_completed); cfun->machine->call_ms2sysv = true; + } } if (vec_len > 1) That assertion is triggered in this test case: cat test.c int test() { __builtin_printf("test\n"); return 0; } gcc -mabi=ms -mcall-ms2sysv-xlogues -fsplit-stack -c test.c test.c: In function 'test': test.c:5:1: internal compiler error: in ix86_expand_call, at config/i386/i386.c:29324 } ^ 0x13390a4 ix86_expand_call(rtx_def*, rtx_def*, rtx_def*, rtx_def*, rtx_def*, bool) ../../gcc-trunk/gcc/config/i386/i386.c:29324 0x1317494 ix86_expand_split_stack_prologue() ../../gcc-trunk/gcc/config/i386/i386.c:15920 0x162ba21 gen_split_stack_prologue() ../../gcc-trunk/gcc/config/i386/i386.md:12556 0x12f3f30 target_gen_split_stack_prologue ../../gcc-trunk/gcc/config/i386/i386.md:12325 0xb237b3 make_split_prologue_seq ../../gcc-trunk/gcc/function.c:5822 0xb23a08 thread_prologue_and_epilogue_insns() ../../gcc-trunk/gcc/function.c:5958 0xb24840 rest_of_handle_thread_prologue_and_epilogue ../../gcc-trunk/gcc/function.c:6428 0xb248c0 execute ../../gcc-trunk/gcc/function.c:6470 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. so, in ix86_expand_split_stack_prologue we first call: ix86_finalize_stack_realign_flags (); ix86_compute_frame_layout (); and later: call_insn = ix86_expand_call (NULL_RTX, gen_rtx_MEM (QImode, fn), GEN_INT (UNITS_PER_WORD), constm1_rtx, pop, false); which changes a flag with a huge impact on the frame layout, but there is no absolutely no way how the frame layout can change once it is finalized. Any Thoughts? Bernd. Well, my intention was actually to punt on those cases, but I hadn't actually tested with -fsplit-stack. It looks like ix86_expand_split_stack_prologue calls ix86_expand_call, and I hadn't anticipated it getting called after the last call to ix86_compute_frame_layout(), which your patch has probably eliminated. In the case of -fsplit-stack, I'm testing the macro flag_split_stack which (currently) just expands to check the global flag, so this could instead be done in ix86_option_override_internal () instead, but I think it highlights a somewhat deeper problem. Rather or not m->call_ms2sysv is set determines which stack layout is used when ix86_compute_frame_layout() runs. But if we can run expand_call after the final time ix86_compute_frame_layout() then we have a problem. It looks like ix86_expand_split_stack_prologue is the only function that manually calls ix86_expand_call, but maybe it would be better to modify the test to something like this: diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index a78819d6b3f..c36383f6962 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -29325,7 +29325,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, } /* Set here, but it may get cleared later. */ - if (TARGET_CALL_MS2SYSV_XLOGUES) + if (TARGET_CALL_MS2SYSV_XLOGUES && !reload_completed) cfun->machine->call_ms2sysv = true; } Or even use the same incompatibility tests from ix86_c
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/14/2017 11:31 AM, Bernd Edlinger wrote: Hi Daniel, there is one thing I don't understand in your patch: That is, it introduces a static value: /* Registers who's save & restore will be managed by stubs called from pro/epilogue. */ static HARD_REG_SET GTY(()) stub_managed_regs; This seems to be set as a side effect of ix86_compute_frame_layout, and depends on the register usage of the current function. But values that depend on the current function need usually be attached to cfun->machine, because the passes can run in parallel unless I am completely mistaken, and the stub_managed_regs may therefore be computed from a different function. Bernd. I should add that if you want to run faster tests just on the ms to sysv abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if that succeeds run the full testsuite. Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/14/2017 11:31 AM, Bernd Edlinger wrote: Hi Daniel, there is one thing I don't understand in your patch: That is, it introduces a static value: /* Registers who's save & restore will be managed by stubs called from pro/epilogue. */ static HARD_REG_SET GTY(()) stub_managed_regs; This seems to be set as a side effect of ix86_compute_frame_layout, and depends on the register usage of the current function. But values that depend on the current function need usually be attached to cfun->machine, because the passes can run in parallel unless I am completely mistaken, and the stub_managed_regs may therefore be computed from a different function. Bernd. I'm relatively new to GCC and still learning. However, there are quite a lot of static TU variables in i386.c like this. I am not aware of gcc having parallelism support, but if it were to be added then all of these TU variables should probably be moved to some class or struct (like cfun->machine) to reduce the number of TLS lookups required (which I presume is a little more expensive than a this/offset calculation). Having this (as well as other variables) in such a struct is better design IMO, but as I said, I'm still learning GCC's architecture, idioms and patterns. (I should add that I don't really understand the GTY memory management either. :) To be clear on class xlogue_layout, the only instances of this class are const and could be shared across multiple threads. It is dependent upon the cfun->machine as well as the global struct rtl_data crtl, but is not so entangled that were these proper C++ classes (with private data) that it would need to be a friend -- it only needs read-access to their data members. To be honest, it's a strange feeling programming in a mixture of C and C++ idioms, but I know it was only recently converted to C++ so I think it's better to try to use only one or the other in a given function. But if I were going to do this all OO, then ix86_compute_frame_layout would be a member function of ix86_frame (which would be a specialization of some generic "frame" class), machine_function would be class ix86_machine_function with it's own compute_frame_layout that called ix86_frame::compute_frame_layout, etc. If I really wanted to go nuts, I would consider making class function, et.al. template classes with machine_function and machine_function_state part of the object instead of pointers to separate objects to reduce accesses down to a single this/offset, but now I I'm *really* digressing... Please free to move it. Thanks, Daniel
Re: [PATCH] [i386] Recompute the frame layout less often
On 05/14/2017 02:42 AM, Bernd Edlinger wrote: Hi, this patch uses the new TARGET_COMPUTE_FRAME_LAYOUT hook in the i386 backend to avoid re-computing the frame layout when not really necessary. It simplifies the logic in ix86_compute_frame_layout by removing the use_fast_prologue_epilogue_nregs, which is no longer necessary, because the frame layout can no longer change spontaneously. Bootstrapped and reg-tested on x86_64-pc-linux-gnu. Is it OK for trunk? Thanks Bernd. I think Uros is about to commit my improvements to ms to sysv abi calls, which is a large change and will conflict with your patch. I've added several new fields to struct ix86_frame that will need to be merged (and moved to i386.h). I believe that my only explicit check of crtl->stack_realign_finalized is during pro/epilogue expand, and not in ix86_compute_frame_layout. A former incarnation of my patches needed ix86_compute_frame_layout to be called *after* it was set, but I believe that is no longer the case, and so shouldn't conflict, but retesting should certainly be done. https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01338.html Thanks, Daniel
Re: [PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 05/13/2017 11:52 AM, Uros Bizjak wrote: On Sat, May 13, 2017 at 1:01 AM, Daniel Santos <daniel.san...@pobox.com> wrote: Ping? I have posted revisions of the following in patch set: 05/12 - https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01442.html 09/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00348.html 11/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00350.html I have retested them on Linux x86-64 in addition a Wine testsuite comparison resulting in fewer failed tests (31) than when using unpatched 7.1.0 (78) and 5.4.0 (78). A cursory examination of the now working failures with 7.1.0 seemed to be to be due to race conditions in Wine that are incidentally hidden after the patches. Is there anything else needed before we can commit these? They still rebase cleanly onto the HEAD, but I can repost as "v5" if you prefer. Please go ahead and commit the patches. However, please stay around to fix possible fallout. As said - you are touching quite complex part of the compiler ... Thanks, Uros. Thanks! I'll definitely be around, I have a lot more that I'm working on with C generics/pseudo-templates (all middle-end stuff). I also want to examine more ways that SSE saves/restores can be omitted in these ms to sysv calls through static analysis and such. Anyway, I don't yet have SVN write access, will you sponsor my request? Thanks, Daniel
[PING] [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
Ping? I have posted revisions of the following in patch set: 05/12 - https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01442.html 09/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00348.html 11/12 - https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00350.html I have retested them on Linux x86-64 in addition a Wine testsuite comparison resulting in fewer failed tests (31) than when using unpatched 7.1.0 (78) and 5.4.0 (78). A cursory examination of the now working failures with 7.1.0 seemed to be to be due to race conditions in Wine that are incidentally hidden after the patches. Is there anything else needed before we can commit these? They still rebase cleanly onto the HEAD, but I can repost as "v5" if you prefer. Thanks, Daniel
Re: [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 05/06/2017 03:22 PM, Daniel Santos wrote: gcc-5.4.0 CFLAGS="-march=native -O2 -g": 74 gcc-7.1.0 CFLAGS="-march=native -O2 -g": 74 gcc-7.1.0 CFLAGS="-march=nocona -mtune=generic -O2 -g": 79 gcc-7.1.0 CFLAGS="-march=native -O2 -g -mcall-ms2sysv-xlogues" (patched): 31 I'm building out a clean test environment on another machine to try to rule out clutter issues (and video driver issues) on my workstation. Daniel I've re-run Wine's tests with a new clean VM environment and some changes to include more tests and similar results: Compiler Failures gcc-4.9.4: 39 gcc-7.1.0: 78 gcc-7.1.0-patched (with -mcall-ms2sysv-xlogues): 40 The first error not present in the gcc-4.9.4 tests that I examined looked like a run-of-the-mill race condition in Wine that just happened to not crash when built with 4.9.4. So I'm going to guess that the disappearance of these failures with -mcall-ms2sysv-xlogues is just incidental. I think we're in good condition with this patch set. Daniel
Re: [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 05/05/2017 03:56 AM, Daniel Santos wrote: On 05/02/2017 05:40 AM, Kai Tietz wrote: Right, and Wine people will tell, if something doesn't work for them. So ok for me too. Kai Well, I haven't re-run these tests in a few months, but I got 272 failed wine tests with gcc 7.1 and 234 with my patch set rebased onto 7.1. So it looks like I'll be trying to diagnose these failures this weekend. Those are bad numbers. I had forgotten to filter out the testlist.o files. Below are my most recent numbers running Wine 2.7: gcc-5.4.0 CFLAGS="-march=native -O2 -g": 74 gcc-7.1.0 CFLAGS="-march=native -O2 -g": 74 gcc-7.1.0 CFLAGS="-march=nocona -mtune=generic -O2 -g": 79 gcc-7.1.0 CFLAGS="-march=native -O2 -g -mcall-ms2sysv-xlogues" (patched): 31 I'm building out a clean test environment on another machine to try to rule out clutter issues (and video driver issues) on my workstation. Daniel
Re: [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 05/02/2017 05:40 AM, Kai Tietz wrote: Right, and Wine people will tell, if something doesn't work for them. So ok for me too. Kai Well, I haven't re-run these tests in a few months, but I got 272 failed wine tests with gcc 7.1 and 234 with my patch set rebased onto 7.1. So it looks like I'll be trying to diagnose these failures this weekend. Daniel
[PATCH 11/12 rev1] [i386] Add remainder of -mcall-ms2sysv-xlogues implementation
Now generates RTL with appropriate stack restore and leave patterns. Slightly cleaned up code that calculates the number of vector elements for clarity. Tests are good when rebased onto gcc-7_1_0-release as HEAD currently fails to bootstrap. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/i386.c | 287 +++-- 1 file changed, 278 insertions(+), 9 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index f2772b2d10e..e43dc819f9a 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -14148,6 +14148,78 @@ ix86_elim_entry_set_got (rtx reg) } } +static rtx +gen_frame_set (rtx reg, rtx frame_reg, int offset, bool store) +{ + rtx addr, mem; + + if (offset) +addr = gen_rtx_PLUS (Pmode, frame_reg, GEN_INT (offset)); + mem = gen_frame_mem (GET_MODE (reg), offset ? addr : frame_reg); + return gen_rtx_SET (store ? mem : reg, store ? reg : mem); +} + +static inline rtx +gen_frame_load (rtx reg, rtx frame_reg, int offset) +{ + return gen_frame_set (reg, frame_reg, offset, false); +} + +static inline rtx +gen_frame_store (rtx reg, rtx frame_reg, int offset) +{ + return gen_frame_set (reg, frame_reg, offset, true); +} + +static void +ix86_emit_outlined_ms2sysv_save (const struct ix86_frame ) +{ + struct machine_function *m = cfun->machine; + const unsigned ncregs = NUM_X86_64_MS_CLOBBERED_REGS + + m->call_ms2sysv_extra_regs; + rtvec v = rtvec_alloc (ncregs + 1); + unsigned int align, i, vi = 0; + rtx_insn *insn; + rtx sym, addr; + rtx rax = gen_rtx_REG (word_mode, AX_REG); + const struct xlogue_layout = xlogue_layout::get_instance (); + HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset; + HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset; + HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in (); + + /* Verify that the incoming stack 16-byte alignment offset matches the + layout we're using. */ + gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD)); + + /* Get the stub symbol. */ + sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP + : XLOGUE_STUB_SAVE); + RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym); + + /* Setup RAX as the stub's base pointer. */ + align = GET_MODE_ALIGNMENT (V4SFmode); + addr = choose_baseaddr (rax_offset, ); + gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode)); + insn = emit_insn (gen_rtx_SET (rax, addr)); + + gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ()); + pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, +GEN_INT (-stack_alloc_size), -1, +m->fs.cfa_reg == stack_pointer_rtx); + for (i = 0; i < ncregs; ++i) +{ + const xlogue_layout::reginfo = xlogue.get_reginfo (i); + rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode), +r.regno); + RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);; +} + + gcc_assert (vi == (unsigned)GET_NUM_ELEM (v)); + + insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, v)); + RTX_FRAME_RELATED_P (insn) = true; +} + /* Expand the prologue into a bunch of separate insns. */ void @@ -14395,7 +14467,7 @@ ix86_expand_prologue (void) performing the actual alignment. Otherwise we cannot guarantee that there's enough storage above the realignment point. */ allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset; - if (allocate) + if (allocate && !m->call_ms2sysv) pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, GEN_INT (-allocate), -1, false); @@ -14403,7 +14475,6 @@ ix86_expand_prologue (void) insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx, stack_pointer_rtx, GEN_INT (-align_bytes))); - /* For the purposes of register save area addressing, the stack pointer can no longer be used to access anything in the frame below m->fs.sp_realigned_offset and the frame pointer cannot be @@ -14420,6 +14491,9 @@ ix86_expand_prologue (void) m->fs.sp_valid = false; } + if (m->call_ms2sysv) +ix86_emit_outlined_ms2sysv_save (frame); + allocate = frame.stack_pointer_offset - m->fs.sp_offset; if (flag_stack_usage_info) @@ -14740,17 +14814,19 @@ ix86_emit_restore_regs_using_pop (void) unsigned int regno; for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) -if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, false)) +if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, false, true)) ix86_emit_restore_reg_using_pop (gen_rtx_REG (word_mode, regno));
[PATCH 09/12 rev1] [i386] Add patterns and predicates mcall-ms2sysv-xlogues
I've cleaned up the patterns and predicates as per your instructions, resulting in 74 less lines of code. Adding explicit insns to restore the stack pointer and pointer perform the "leave" (to the patterns restore_multiple_and_return and restore_multiple_leave_return, respectively) disambiguates them just fine without the const_int tag while correctly describing exactly what the pattern does. Thanks for your guidance. I understand RTL much better now. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/predicates.md | 81 +++ gcc/config/i386/sse.md| 37 2 files changed, 118 insertions(+) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 8f250a2e720..e7371a41b16 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -1657,3 +1657,84 @@ (ior (match_operand 0 "register_operand") (and (match_code "const_int") (match_test "op == constm1_rtx" + +;; Return true if the vector ends with between 12 and 18 register saves using +;; RAX as the base address. +(define_predicate "save_multiple" + (match_code "parallel") +{ + const unsigned len = XVECLEN (op, 0); + unsigned i; + + /* Starting from end of vector, count register saves. */ + for (i = 0; i < len; ++i) +{ + rtx src, dest, addr; + rtx e = XVECEXP (op, 0, len - 1 - i); + + if (GET_CODE (e) != SET) + break; + + src = SET_SRC (e); + dest = SET_DEST (e); + + if (!REG_P (src) || !MEM_P (dest)) + break; + + addr = XEXP (dest, 0); + + /* Good if dest address is in RAX. */ + if (REG_P (addr) && REGNO (addr) == AX_REG) + continue; + + /* Good if dest address is offset of RAX. */ + if (GET_CODE (addr) == PLUS + && REG_P (XEXP (addr, 0)) + && REGNO (XEXP (addr, 0)) == AX_REG) + continue; + + break; +} + return (i >= 12 && i <= 18); +}) + + +;; Return true if the vector ends with between 12 and 18 register loads using +;; RSI as the base address. +(define_predicate "restore_multiple" + (match_code "parallel") +{ + const unsigned len = XVECLEN (op, 0); + unsigned i; + + /* Starting from end of vector, count register restores. */ + for (i = 0; i < len; ++i) +{ + rtx src, dest, addr; + rtx e = XVECEXP (op, 0, len - 1 - i); + + if (GET_CODE (e) != SET) + break; + + src = SET_SRC (e); + dest = SET_DEST (e); + + if (!MEM_P (src) || !REG_P (dest)) + break; + + addr = XEXP (src, 0); + + /* Good if src address is in RSI. */ + if (REG_P (addr) && REGNO (addr) == SI_REG) + continue; + + /* Good if src address is offset of RSI. */ + if (GET_CODE (addr) == PLUS + && REG_P (XEXP (addr, 0)) + && REGNO (XEXP (addr, 0)) == SI_REG) + continue; + + break; +} + return (i >= 12 && i <= 18); +}) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 094404bc913..d488b25c254 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -20010,3 +20010,40 @@ (match_operand:VI48_512 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512VPOPCNTDQ" "vpopcnt\t{%1, %0|%0, %1}") + +;; Save multiple registers out-of-line. +(define_insn "save_multiple" + [(match_parallel 0 "save_multiple" +[(use (match_operand:P 1 "symbol_operand"))])] + "TARGET_SSE && TARGET_64BIT" + "call\t%P1") + +;; Restore multiple registers out-of-line. +(define_insn "restore_multiple" + [(match_parallel 0 "restore_multiple" +[(use (match_operand:P 1 "symbol_operand"))])] + "TARGET_SSE && TARGET_64BIT" + "call\t%P1") + +;; Restore multiple registers out-of-line and return. +(define_insn "restore_multiple_and_return" + [(match_parallel 0 "restore_multiple" +[(return) + (use (match_operand:P 1 "symbol_operand")) + (set (reg:DI SP_REG) (reg:DI R10_REG)) +])] + "TARGET_SSE && TARGET_64BIT" + "jmp\t%P1") + +;; Restore multiple registers out-of-line when hard frame pointer is used, +;; perform the leave operation prior to returning (from the function). +(define_insn "restore_multiple_leave_return" + [(match_parallel 0 "restore_multiple" +[(return) + (use (match_operand:P 1 "symbol_operand")) + (set (reg:DI SP_REG) (plus:DI (reg:DI BP_REG) (const_int 8))) + (set (reg:DI BP_REG) (mem:DI (reg:DI BP_REG))) + (clobber (mem:BLK (scratch))) +])] + "TARGET_SSE && TARGET_64BIT" + "jmp\t%P1") -- 2.11.0
Re: [PATCH 09/12] [i386] Add patterns and predicates foutline-msabi-xlouges
On 05/03/2017 01:10 AM, Uros Bizjak wrote: The order of subexpressions of parallel in general does not matter. Thanks, this makes things much clearer. Also, I'm wondering if there's anything wrong with calling ix86_gen_leave () and plucking the insns out of the generated parallel insn and moving that into my own parallel rather than generating them in my own function. I guess all the matters is what is cleanest. Hm... I'd rather see subexpressions generated "by hand". OK. While we're on the topic, are you OK with my changes to ix86_emit_leave to generate the notes or would you prefer those by hand as well? Also, are these predicates what you had in mind? (I haven't actually tested them just yet.) (define_predicate "save_multiple" (match_code "parallel") { const unsigned len = XVECLEN (op, 0); unsigned i; /* Starting from end of vector, count register saves. */ for (i = 0; i < len; ++i) { rtx src, dest, addr; rtx e = XVECEXP (op, 0, len - 1 - i); if (GET_CODE (e) != SET) break; src = SET_SRC (e); dest = SET_DEST (e); if (!REG_P (src) || !MEM_P (dest)) break; addr = XEXP (dest, 0); /* Good if dest address is in RAX. */ if (REG_P (addr) && REGNO (addr) == AX_REG) continue; /* Good if dest address is offset of RAX. */ if (GET_CODE (addr) == PLUS && REG_P (XEXP (addr, 0)) && REGNO (XEXP (addr, 0)) == AX_REG) continue; break; } return (i >= 12 && i <= 18); }) (define_predicate "restore_multiple" (match_code "parallel") { const unsigned len = XVECLEN (op, 0); unsigned i; /* Starting from end of vector, count register restores. */ for (i = 0; i < len; ++i) { rtx src, dest, addr; rtx e = XVECEXP (op, 0, len - 1 - i); if (GET_CODE (e) != SET) break; src = SET_SRC (e); dest = SET_DEST (e); if (!MEM_P (src) || !REG_P (dest)) break; addr = XEXP (src, 0); /* Good if src address is in RSI. */ if (REG_P (addr) && REGNO (addr) == SI_REG) continue; /* Good if src address is offset of RSI. */ if (GET_CODE (addr) == PLUS && REG_P (XEXP (addr, 0)) && REGNO (XEXP (addr, 0)) == SI_REG) continue; break; } return (i >= 12 && i <= 18); }) Thanks, Daniel
Re: [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 05/02/2017 05:40 AM, Kai Tietz wrote: Right, and Wine people will tell, if something doesn't work for them. So ok for me too. Kai Yes, and I although I haven't repeated the Wine tests in a few months, little has changed since my last run. I'll be running them again soon anyway. Daniel
Re: [PATCH v4 0/12] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
On 05/02/2017 05:21 AM, JonY wrote: On 05/01/2017 11:31 AM, Uros Bizjak wrote: I also assume that Cygwin and MinGW people agree with the patch and the functionality itself. Uros. Cygwin and MinGW does not use SysV/MS transitions directly in their own code, changes should be OK. And to be clear, this did initially have a failed gcc test on Cygwin due to the aligned SSE MOVs portion of the patch set (this is the first three patches) and this is resolved by disabling that feature on SEH targets. This is the last two lines in the below chunk from 3/12: @@ -14080,11 +14102,19 @@ ix86_expand_prologue (void) GEN_INT (-align_bytes))); /* For the purposes of register save area addressing, the stack - pointer is no longer valid. As for the value of sp_offset, -see ix86_compute_frame_layout, which we need to match in order -to pass verification of stack_pointer_offset at the end. */ +pointer can no longer be used to access anything in the frame +below m->fs.sp_realigned_offset and the frame pointer cannot be +used for anything at or above. */ m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes); - m->fs.sp_valid = false; + m->fs.sp_realigned = true; + m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16; + gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset); + /* SEH unwind emit doesn't currently support REG_CFA_EXPRESSION, which +is needed to describe where a register is saved using a realigned +stack pointer, so we need to invalidate the stack pointer for that +target. */ + if (TARGET_SEH) + m->fs.sp_valid = false; } allocate = frame.stack_pointer_offset - m->fs.sp_offset; Still I have run complete tests on Cygwin with 32- and 64-bit using both the Cygwin and MinGW compilers. Daniel
Re: [PATCH 09/12] [i386] Add patterns and predicates foutline-msabi-xlouges
Thank you for the review. On 05/01/2017 06:18 AM, Uros Bizjak wrote: On Thu, Apr 27, 2017 at 10:09 AM, Daniel Santos <daniel.san...@pobox.com> wrote: Adds the predicates save_multiple and restore_multiple to predicates.md, which are used by following patterns in sse.md: * save_multiple - insn that calls a save stub * restore_multiple - call_insn that calls a save stub and returns to the function to allow a sibling call (which should typically offer better optimization than the restore stub as the tail call) * restore_multiple_and_return - a jump_insn that returns from the function as a tail-call. * restore_multiple_leave_return - like the above, but restores the frame pointer before returning. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/config/i386/predicates.md | 155 ++ gcc/config/i386/sse.md| 37 ++ 2 files changed, 192 insertions(+) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 8f250a2e720..36fe8abc3f4 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -1657,3 +1657,158 @@ (ior (match_operand 0 "register_operand") (and (match_code "const_int") (match_test "op == constm1_rtx" + +;; Return true if: +;; 1. first op is a symbol reference, +;; 2. >= 13 operands, and +;; 3. operands 2 to end is one of: +;; a. save a register to a memory location, or +;; b. restore stack pointer. +(define_predicate "save_multiple" + (match_code "parallel") +{ + const unsigned nregs = XVECLEN (op, 0); + rtx head = XVECEXP (op, 0, 0); + unsigned i; + + if (GET_CODE (head) != USE) +return false; + else +{ + rtx op0 = XEXP (head, 0); + if (op0 == NULL_RTX || GET_CODE (op0) != SYMBOL_REF) + return false; +} + + if (nregs < 13) +return false; + + for (i = 2; i < nregs; i++) +{ + rtx e, src, dest; + + e = XVECEXP (op, 0, i); + + switch (GET_CODE (e)) + { + case SET: + src = SET_SRC (e); + dest = SET_DEST (e); + + /* storing a register to memory. */ + if (GET_CODE (src) == REG && GET_CODE (dest) == MEM) Please use REG_P (...) and MEM_P (...) - and possible others - predicates in the code. + { + rtx addr = XEXP (dest, 0); + + /* Good if dest address is in RAX. */ + if (GET_CODE (addr) == REG + && REGNO (addr) == AX_REG) + continue; + + /* Good if dest address is offset of RAX. */ + if (GET_CODE (addr) == PLUS + && GET_CODE (XEXP (addr, 0)) == REG + && REGNO (XEXP (addr, 0)) == AX_REG) + continue; + } + break; + + default: + break; + } + return false; +} + return true; +}) + +;; Return true if: +;; * first op is (return) or a a use (symbol reference), +;; * >= 14 operands, and +;; * operands 2 to end are one of: +;; - restoring a register from a memory location that's an offset of RSI. +;; - clobbering a reg +;; - adjusting SP +(define_predicate "restore_multiple" + (match_code "parallel") +{ + const unsigned nregs = XVECLEN (op, 0); + rtx head = XVECEXP (op, 0, 0); + unsigned i; + + switch (GET_CODE (head)) +{ + case RETURN: + i = 3; + break; + + case USE: + { + rtx op0 = XEXP (head, 0); + + if (op0 == NULL_RTX || GET_CODE (op0) != SYMBOL_REF) + return false; + + i = 1; + break; + } + + default: + return false; +} + + if (nregs < i + 12) +return false; + + for (; i < nregs; i++) +{ + rtx e, src, dest; + + e = XVECEXP (op, 0, i); + + switch (GET_CODE (e)) + { + case CLOBBER: + continue; I don't see where CLOBBER is genreated in ix86_emit_outlined_ms2sysv_restore. I think this is clutter that I didn't remove after changing the stubs. + + case SET: + src = SET_SRC (e); + dest = SET_DEST (e); + + /* Restoring a register from memory. */ + if (GET_CODE (src) == MEM && GET_CODE (dest) == REG) + { + rtx addr = XEXP (src, 0); + + /* Good if src address is in RSI. */ + if (GET_CODE (addr) == REG + && REGNO (addr) == SI_REG) + continue; + + /* Good if src address is offset of RSI. */ + if (GET_CODE (addr) == PLUS + && GET_CODE (XEXP (addr, 0)) == REG + && REGNO (XEXP (addr, 0)) == SI_REG) + continue; + + /* Good if adjusting stack pointer. */ + if (GET_C
Re: [PATCH 05/12 rev 1] [i386] Add option -mcall-ms2sysv-xlogues
Oops. I blame my fingers. :) Daniel --- gcc/config/i386/i386.c | 6 +- gcc/config/i386/i386.opt | 4 gcc/doc/invoke.texi | 13 - 3 files changed, 21 insertions(+), 2 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 113f83742c2..521116195cb 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -4508,7 +4508,8 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2, { "-mstv", MASK_STV }, { "-mavx256-split-unaligned-load", MASK_AVX256_SPLIT_UNALIGNED_LOAD }, { "-mavx256-split-unaligned-store", MASK_AVX256_SPLIT_UNALIGNED_STORE }, -{ "-mprefer-avx128", MASK_PREFER_AVX128 } +{ "-mprefer-avx128", MASK_PREFER_AVX128 }, +{ "-mcall-ms2sysv-xlogues",MASK_CALL_MS2SYSV_XLOGUES } }; /* Additional flag options. */ @@ -6319,6 +6320,9 @@ ix86_option_override_internal (bool main_args_p, #endif } + if (TARGET_SEH && TARGET_CALL_MS2SYSV_XLOGUES) +sorry ("-mcall-ms2sysv-xlogues isn%'t currently supported with SEH"); + if (!(opts_set->x_target_flags & MASK_VZEROUPPER)) opts->x_target_flags |= MASK_VZEROUPPER; if (!(opts_set->x_target_flags & MASK_STV)) diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 9384e29b1de..65b228544a5 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -538,6 +538,10 @@ Enum(calling_abi) String(sysv) Value(SYSV_ABI) EnumValue Enum(calling_abi) String(ms) Value(MS_ABI) +mcall-ms2sysv-xlogues +Target Report Mask(CALL_MS2SYSV_XLOGUES) Save +Use libgcc stubs to save and restore registers clobbered by 64-bit Microsoft to System V ABI calls. + mveclibabi= Target RejectNegative Joined Var(ix86_veclibabi_type) Enum(ix86_veclibabi) Init(ix86_veclibabi_type_none) Vector library ABI to use. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 0eeea7b3b87..d9894f37ee5 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1209,7 +1209,7 @@ See RS/6000 and PowerPC Options. -msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol -mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol -malign-data=@var{type} -mstack-protector-guard=@var{guard} @gol --mmitigate-rop -mgeneral-regs-only} +-mmitigate-rop -mgeneral-regs-only -mcall-ms2sysv-xlogues} @emph{x86 Windows Options} @gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol @@ -25308,6 +25308,17 @@ You can control this behavior for specific functions by using the function attributes @code{ms_abi} and @code{sysv_abi}. @xref{Function Attributes}. +@item -mcall-ms2sysv-xlogues +@opindex mcall-ms2sysv-xlogues +@opindex mno-call-ms2sysv-xlogues +Due to differences in 64-bit ABIs, any Microsoft ABI function that calls a +System V ABI function must consider RSI, RDI and XMM6-15 as clobbered. By +default, the code for saving and restoring these registers is emitted inline, +resulting in fairly lengthy prologues and epilogues. Using +@option{-mcall-ms2sysv-xlogues} emits prologues and epilogues that +use stubs in the static portion of libgcc to perform these saves and restores, +thus reducing function size at the cost of a few extra instructions. + @item -mtls-dialect=@var{type} @opindex mtls-dialect Generate code to access thread-local storage using the @samp{gnu} or -- 2.11.0
Re: [PATCH v4 0/12 GCC8] [i386] Improve 64-bit Microsoft to System V ABI pro/epilogues
I probably should have mentioned that these are all for GCC8.
[PATCH 12/12] [i386,testsuite] Test program for ms to sysv abi function calls.
A comprehensive program for testing x86_64 ms_abi functions that call sysv_abi functions to help validate -mcall-ms2sysv-xlogues and use of aligned SSE MOVs after a (non-DRAP) realigned stack. Signed-off-by: Daniel Santos <daniel.san...@pobox.com> --- gcc/Makefile.in| 2 + .../gcc.target/x86_64/abi/ms-sysv/do-test.S| 163 + gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc | 807 + .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c| 373 ++ .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp | 178 + 5 files changed, 1523 insertions(+) create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp diff --git a/gcc/Makefile.in b/gcc/Makefile.in index f675e073ecc..7f7c238127b 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3807,7 +3807,9 @@ site.exp: ./config.status Makefile @echo "set CFLAGS \"\"" >> ./site.tmp @echo "set CXXFLAGS \"\"" >> ./site.tmp @echo "set HOSTCC \"$(CC)\"" >> ./site.tmp + @echo "set HOSTCXX \"$(CXX)\"" >> ./site.tmp @echo "set HOSTCFLAGS \"$(CFLAGS)\"" >> ./site.tmp + @echo "set HOSTCXXFLAGS \"$(CXXFLAGS)\"" >> ./site.tmp # TEST_ALWAYS_FLAGS are flags that should be passed to every compilation. # They are passed first to allow individual tests to override them. @echo "set TEST_ALWAYS_FLAGS \"$(SYSROOT_CFLAGS_FOR_TARGET)\"" >> ./site.tmp diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S new file mode 100644 index 000..1395235fd1e --- /dev/null +++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S @@ -0,0 +1,163 @@ +/* Assembly proxy functions for ms_abi tests. + Copyright (C) 2016-2017 Free Software Foundation, Inc. + Contributed by Daniel Santos <daniel.san...@pobox.com> + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. */ + +#ifdef __x86_64__ + +# ifdef __ELF__ +# define ELFFN_BEGIN(fn) .type fn,@function +# define ELFFN_END(fn) .size fn,.-fn +# else +# define ELFFN_BEGIN(fn) +# define ELFFN_END(fn) +# endif + +# define FUNC(fn) \ + .global fn; \ + ELFFN_BEGIN(fn);\ +fn: + +#define FUNC_END(fn) ELFFN_END(fn) + +# ifdef __AVX__ +# define MOVAPS vmovaps +# else +# define MOVAPS movaps +# endif + +/* TODO: Is there a cleaner way to provide these offsets? */ + .struct 0 +test_data_save: + + .struct test_data_save + 224 +test_data_input: + + .struct test_data_save + 448 +test_data_output: + + .struct test_data_save + 672 +test_data_fn: + + .struct test_data_save + 680 +test_data_retaddr: + + .text + +regs_to_mem: + MOVAPS %xmm6, (%rax) + MOVAPS %xmm7, 0x10(%rax) + MOVAPS %xmm8, 0x20(%rax) + MOVAPS %xmm9, 0x30(%rax) + MOVAPS %xmm10, 0x40(%rax) + MOVAPS %xmm11, 0x50(%rax) + MOVAPS %xmm12, 0x60(%rax) + MOVAPS %xmm13, 0x70(%rax) + MOVAPS %xmm14, 0x80(%rax) + MOVAPS %xmm15, 0x90(%rax) + mov %rsi, 0xa0(%rax) + mov %rdi, 0xa8(%rax) + mov %rbx, 0xb0(%rax) + mov %rbp, 0xb8(%rax) + mov %r12, 0xc0(%rax) + mov %r13, 0xc8(%rax) + mov %r14, 0xd0(%rax) + mov %r15, 0xd8(%rax) + retq + +mem_to_regs: + MOVAPS (%rax), %xmm6 + MOVAPS 0x10(%rax),%xmm7 + MOVAPS 0x20(%rax),%xmm8 + MOVAPS 0x30(%rax),%xmm9 + MOVAPS 0x40(%rax),%xmm10 + MOVAPS 0x50(%rax),%xmm11 + MOVAPS 0x60(%rax),%xmm12 + MOVAPS 0x70(%rax),%xmm13 + MOVAPS 0x80(%rax),%xmm14 + MOVAPS 0x90(%rax),%xmm15 + mov 0