Re: AMDGPU and 16B stack alignment
On Wed, Oct 16, 2019 at 11:55 AM Arvind Sankar wrote: > > On Tue, Oct 15, 2019 at 06:51:26PM -0700, Nick Desaulniers wrote: > > On Tue, Oct 15, 2019 at 1:26 PM Arvind Sankar wrote: > > > > > > On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote: > > > > Hmmm...I would have liked to remove it outright, as it is an ABI > > > > mismatch that is likely to result in instability and non-fun-to-debug > > > > runtime issues in the future. I suspect my patch does work for GCC > > > > 7.1+. The question is: Do we want to either: > > > > 1. mark AMDGPU broken for GCC < 7.1, or > > > > 2. continue supporting it via stack alignment mismatch? > > > > > > > > 2 is brittle, and may break at any point in the future, but if it's > > > > working for someone it does make me feel bad to outright disable it. > > > > What I'd image 2 looks like is (psuedo code in a Makefile): > > > > > > > > if CC_IS_GCC && GCC_VERSION < 7.1: > > > > set stack alignment to 16B and hope for the best > > > > > > > > So my diff would be amended to keep the stack alignment flags, but > > > > only to support GCC < 7.1. And that assumes my change compiles with > > > > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would > > > > feel even more confident if someone with hardware to test on and GCC > > > > 7.1+ could boot test). > > > > -- > > > > Thanks, > > > > ~Nick Desaulniers > > > > > > If we do keep it, would adding -mstackrealign make it more robust? > > > That's simple and will only add the alignment to functions that require > > > 16-byte alignment (at least on gcc). > > > > I think there's also `-mincoming-stack-boundary=`. > > https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-540038017 > > Yes, but -mstackrealign looks like it's supported by clang as well. Good to know, but I want less duct tape, not more. > > > > > > > > Alternative is to use > > > __attribute__((force_align_arg_pointer)) on functions that might be > > > called from 8-byte-aligned code. > > > > Which is hard to automate and easy to forget. Likely a large diff to fix > > today. > > Right, this is a no-go, esp to just fix old compilers. > > > > > > > > It looks like -mstackrealign should work from gcc 5.3 onwards. > > > > The kernel would generally like to support GCC 4.9+. > > > > There's plenty of different ways to keep layering on duct tape and > > bailing wire to support differing ABIs, but that's just adding > > technical debt that will have to be repaid one day. That's why the > > cleanest solution IMO is mark the driver broken for old toolchains, > > and use a code-base-consistent stack alignment. Bending over > > backwards to support old toolchains means accepting stack alignment > > mismatches, which is in the "unspecified behavior" ring of the > > "undefined behavior" Venn diagram. I have the same opinion on relying > > on explicitly undefined behavior. > > > > I'll send patches for fixing up Clang, but please consider my strong > > advice to generally avoid stack alignment mismatches, regardless of > > compiler. > > -- > > Thanks, > > ~Nick Desaulniers > > What I suggested was in reference to your proposal for dropping the > -mpreferred-stack-boundary=4 for modern compilers, but keeping it for > <7.1. -mstackrealign would at least let 5.3 onwards be less likely to > break (and it doesn't error before then, I think it just doesn't > actually do anything, so no worse than now at least). > > Simply dropping support for <7.1 would be cleanest, yes, but it sounds > like people don't want to go that far. That's fair. I've included your suggestions in the commit message of 02/03 of a series I just sent but forgot to in reply to this thread: https://lkml.org/lkml/2019/10/16/1700 Also, I do appreciate the suggestions and understand the value of brainstorming. -- Thanks, ~Nick Desaulniers
Re: AMDGPU and 16B stack alignment
On Tue, Oct 15, 2019 at 06:51:26PM -0700, Nick Desaulniers wrote: > On Tue, Oct 15, 2019 at 1:26 PM Arvind Sankar wrote: > > > > On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote: > > > Hmmm...I would have liked to remove it outright, as it is an ABI > > > mismatch that is likely to result in instability and non-fun-to-debug > > > runtime issues in the future. I suspect my patch does work for GCC > > > 7.1+. The question is: Do we want to either: > > > 1. mark AMDGPU broken for GCC < 7.1, or > > > 2. continue supporting it via stack alignment mismatch? > > > > > > 2 is brittle, and may break at any point in the future, but if it's > > > working for someone it does make me feel bad to outright disable it. > > > What I'd image 2 looks like is (psuedo code in a Makefile): > > > > > > if CC_IS_GCC && GCC_VERSION < 7.1: > > > set stack alignment to 16B and hope for the best > > > > > > So my diff would be amended to keep the stack alignment flags, but > > > only to support GCC < 7.1. And that assumes my change compiles with > > > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would > > > feel even more confident if someone with hardware to test on and GCC > > > 7.1+ could boot test). > > > -- > > > Thanks, > > > ~Nick Desaulniers > > > > If we do keep it, would adding -mstackrealign make it more robust? > > That's simple and will only add the alignment to functions that require > > 16-byte alignment (at least on gcc). > > I think there's also `-mincoming-stack-boundary=`. > https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-540038017 Yes, but -mstackrealign looks like it's supported by clang as well. > > > > > Alternative is to use > > __attribute__((force_align_arg_pointer)) on functions that might be > > called from 8-byte-aligned code. > > Which is hard to automate and easy to forget. Likely a large diff to fix > today. Right, this is a no-go, esp to just fix old compilers. > > > > > It looks like -mstackrealign should work from gcc 5.3 onwards. > > The kernel would generally like to support GCC 4.9+. > > There's plenty of different ways to keep layering on duct tape and > bailing wire to support differing ABIs, but that's just adding > technical debt that will have to be repaid one day. That's why the > cleanest solution IMO is mark the driver broken for old toolchains, > and use a code-base-consistent stack alignment. Bending over > backwards to support old toolchains means accepting stack alignment > mismatches, which is in the "unspecified behavior" ring of the > "undefined behavior" Venn diagram. I have the same opinion on relying > on explicitly undefined behavior. > > I'll send patches for fixing up Clang, but please consider my strong > advice to generally avoid stack alignment mismatches, regardless of > compiler. > -- > Thanks, > ~Nick Desaulniers What I suggested was in reference to your proposal for dropping the -mpreferred-stack-boundary=4 for modern compilers, but keeping it for <7.1. -mstackrealign would at least let 5.3 onwards be less likely to break (and it doesn't error before then, I think it just doesn't actually do anything, so no worse than now at least). Simply dropping support for <7.1 would be cleanest, yes, but it sounds like people don't want to go that far.
Re: AMDGPU and 16B stack alignment
On Tue, Oct 15, 2019 at 1:26 PM Arvind Sankar wrote: > > On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote: > > Hmmm...I would have liked to remove it outright, as it is an ABI > > mismatch that is likely to result in instability and non-fun-to-debug > > runtime issues in the future. I suspect my patch does work for GCC > > 7.1+. The question is: Do we want to either: > > 1. mark AMDGPU broken for GCC < 7.1, or > > 2. continue supporting it via stack alignment mismatch? > > > > 2 is brittle, and may break at any point in the future, but if it's > > working for someone it does make me feel bad to outright disable it. > > What I'd image 2 looks like is (psuedo code in a Makefile): > > > > if CC_IS_GCC && GCC_VERSION < 7.1: > > set stack alignment to 16B and hope for the best > > > > So my diff would be amended to keep the stack alignment flags, but > > only to support GCC < 7.1. And that assumes my change compiles with > > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would > > feel even more confident if someone with hardware to test on and GCC > > 7.1+ could boot test). > > -- > > Thanks, > > ~Nick Desaulniers > > If we do keep it, would adding -mstackrealign make it more robust? > That's simple and will only add the alignment to functions that require > 16-byte alignment (at least on gcc). I think there's also `-mincoming-stack-boundary=`. https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-540038017 > > Alternative is to use > __attribute__((force_align_arg_pointer)) on functions that might be > called from 8-byte-aligned code. Which is hard to automate and easy to forget. Likely a large diff to fix today. > > It looks like -mstackrealign should work from gcc 5.3 onwards. The kernel would generally like to support GCC 4.9+. There's plenty of different ways to keep layering on duct tape and bailing wire to support differing ABIs, but that's just adding technical debt that will have to be repaid one day. That's why the cleanest solution IMO is mark the driver broken for old toolchains, and use a code-base-consistent stack alignment. Bending over backwards to support old toolchains means accepting stack alignment mismatches, which is in the "unspecified behavior" ring of the "undefined behavior" Venn diagram. I have the same opinion on relying on explicitly undefined behavior. I'll send patches for fixing up Clang, but please consider my strong advice to generally avoid stack alignment mismatches, regardless of compiler. -- Thanks, ~Nick Desaulniers
Re: AMDGPU and 16B stack alignment
On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote: > Hmmm...I would have liked to remove it outright, as it is an ABI > mismatch that is likely to result in instability and non-fun-to-debug > runtime issues in the future. I suspect my patch does work for GCC > 7.1+. The question is: Do we want to either: > 1. mark AMDGPU broken for GCC < 7.1, or > 2. continue supporting it via stack alignment mismatch? > > 2 is brittle, and may break at any point in the future, but if it's > working for someone it does make me feel bad to outright disable it. > What I'd image 2 looks like is (psuedo code in a Makefile): > > if CC_IS_GCC && GCC_VERSION < 7.1: > set stack alignment to 16B and hope for the best > > So my diff would be amended to keep the stack alignment flags, but > only to support GCC < 7.1. And that assumes my change compiles with > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would > feel even more confident if someone with hardware to test on and GCC > 7.1+ could boot test). > -- > Thanks, > ~Nick Desaulniers If we do keep it, would adding -mstackrealign make it more robust? That's simple and will only add the alignment to functions that require 16-byte alignment (at least on gcc). Alternative is to use __attribute__((force_align_arg_pointer)) on functions that might be called from 8-byte-aligned code. It looks like -mstackrealign should work from gcc 5.3 onwards.
Re: AMDGPU and 16B stack alignment
On Tue, Oct 15, 2019 at 11:30 AM Alex Deucher wrote: > > On Tue, Oct 15, 2019 at 2:07 PM Nick Desaulniers > wrote: > > > > On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann wrote: > > > > > > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish wrote: > > > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote: > > > > > > > My gcc build fails with below errors: > > > > > > > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 > > > > and 12 > > > > > > > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between > > > > 4 and 12 > > > > I was able to reproduce this failure on pre-7.1 versions of GCC. It > > seems that when: > > 1. code is using doubles > > 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment > > than GCC produces that error: > > https://godbolt.org/z/7T8nbH > > > > That's already a tall order of constraints, so it's understandable > > that the compiler would just error likely during instruction > > selection, but was eventually taught how to solve such constraints. > > > > > > > > > > While GPF observed on clang builds seem to be fixed. > > > > Thanks for the report. Your testing these patches is invaluable, Shirish! > > > > > > > > Ok, so it seems that gcc insists on having at least 2^4 bytes stack > > > alignment when > > > SSE is enabled on x86-64, but does not actually rely on that for > > > correct operation > > > unless it's using sse2. So -msse always has to be paired with > > > -mpreferred-stack-boundary=3. > > > > Seemingly only for older versions of GCC, pre 7.1. > > > > > > > > For clang, it sounds like the opposite is true: when passing 16 byte > > > stack alignment > > > and having sse/sse2 enabled, it requires the incoming stack to be 16 > > > byte aligned, > > > > I don't think it requires the incoming stack to be 16B aligned for > > sse2, I think it requires the incoming and current stack alignment to > > match. Today it does not, which is why we observe GPFs. > > > > > but passing 8 byte alignment makes it do the right thing. > > > > > > So, should we just always pass $(call cc-option, > > > -mpreferred-stack-boundary=4) > > > to get the desired outcome on both? > > > > Hmmm...I would have liked to remove it outright, as it is an ABI > > mismatch that is likely to result in instability and non-fun-to-debug > > runtime issues in the future. I suspect my patch does work for GCC > > 7.1+. The question is: Do we want to either: > > 1. mark AMDGPU broken for GCC < 7.1, or > > 2. continue supporting it via stack alignment mismatch? > > > > 2 is brittle, and may break at any point in the future, but if it's > > working for someone it does make me feel bad to outright disable it. > > What I'd image 2 looks like is (psuedo code in a Makefile): > > Well, it's been working as is for years now, at least with gcc, so I'd > hate to break that. Ok, I'm happy to leave that as is for GCC, then. Would you prefer I modify it for GCC >7.1 or just leave it alone (maybe I'll add a comment about *why* it's done for GCC)? Would you prefer 1 patch or 4? > > Alex > > > > > if CC_IS_GCC && GCC_VERSION < 7.1: > > set stack alignment to 16B and hope for the best Ie, this ^ > > > > So my diff would be amended to keep the stack alignment flags, but > > only to support GCC < 7.1. And that assumes my change compiles with > > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would > > feel even more confident if someone with hardware to test on and GCC > > 7.1+ could boot test). -- Thanks, ~Nick Desaulniers
Re: AMDGPU and 16B stack alignment
On Tue, Oct 15, 2019 at 2:07 PM Nick Desaulniers wrote: > > On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann wrote: > > > > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish wrote: > > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote: > > > > > My gcc build fails with below errors: > > > > > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and > > > 12 > > > > > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 > > > and 12 > > I was able to reproduce this failure on pre-7.1 versions of GCC. It > seems that when: > 1. code is using doubles > 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment > than GCC produces that error: > https://godbolt.org/z/7T8nbH > > That's already a tall order of constraints, so it's understandable > that the compiler would just error likely during instruction > selection, but was eventually taught how to solve such constraints. > > > > > > > While GPF observed on clang builds seem to be fixed. > > Thanks for the report. Your testing these patches is invaluable, Shirish! > > > > > Ok, so it seems that gcc insists on having at least 2^4 bytes stack > > alignment when > > SSE is enabled on x86-64, but does not actually rely on that for > > correct operation > > unless it's using sse2. So -msse always has to be paired with > > -mpreferred-stack-boundary=3. > > Seemingly only for older versions of GCC, pre 7.1. > > > > > For clang, it sounds like the opposite is true: when passing 16 byte > > stack alignment > > and having sse/sse2 enabled, it requires the incoming stack to be 16 > > byte aligned, > > I don't think it requires the incoming stack to be 16B aligned for > sse2, I think it requires the incoming and current stack alignment to > match. Today it does not, which is why we observe GPFs. > > > but passing 8 byte alignment makes it do the right thing. > > > > So, should we just always pass $(call cc-option, > > -mpreferred-stack-boundary=4) > > to get the desired outcome on both? > > Hmmm...I would have liked to remove it outright, as it is an ABI > mismatch that is likely to result in instability and non-fun-to-debug > runtime issues in the future. I suspect my patch does work for GCC > 7.1+. The question is: Do we want to either: > 1. mark AMDGPU broken for GCC < 7.1, or > 2. continue supporting it via stack alignment mismatch? > > 2 is brittle, and may break at any point in the future, but if it's > working for someone it does make me feel bad to outright disable it. > What I'd image 2 looks like is (psuedo code in a Makefile): Well, it's been working as is for years now, at least with gcc, so I'd hate to break that. Alex > > if CC_IS_GCC && GCC_VERSION < 7.1: > set stack alignment to 16B and hope for the best > > So my diff would be amended to keep the stack alignment flags, but > only to support GCC < 7.1. And that assumes my change compiles with > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would > feel even more confident if someone with hardware to test on and GCC > 7.1+ could boot test). > -- > Thanks, > ~Nick Desaulniers > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMDGPU and 16B stack alignment
On Tue, Oct 15, 2019 at 11:05 AM Nick Desaulniers wrote: > > On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann wrote: > > > > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish wrote: > > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote: > > > > > My gcc build fails with below errors: > > > > > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and > > > 12 > > > > > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 > > > and 12 > > I was able to reproduce this failure on pre-7.1 versions of GCC. It > seems that when: > 1. code is using doubles > 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment > than GCC produces that error: > https://godbolt.org/z/7T8nbH > Also, I suspect that very error solves the mystery of "why was 16B stack alignment ever specified?" -- Thanks, ~Nick Desaulniers
Re: AMDGPU and 16B stack alignment
On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann wrote: > > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish wrote: > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote: > > > My gcc build fails with below errors: > > > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 > > and 12 I was able to reproduce this failure on pre-7.1 versions of GCC. It seems that when: 1. code is using doubles 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment than GCC produces that error: https://godbolt.org/z/7T8nbH That's already a tall order of constraints, so it's understandable that the compiler would just error likely during instruction selection, but was eventually taught how to solve such constraints. > > > > While GPF observed on clang builds seem to be fixed. Thanks for the report. Your testing these patches is invaluable, Shirish! > > Ok, so it seems that gcc insists on having at least 2^4 bytes stack > alignment when > SSE is enabled on x86-64, but does not actually rely on that for > correct operation > unless it's using sse2. So -msse always has to be paired with > -mpreferred-stack-boundary=3. Seemingly only for older versions of GCC, pre 7.1. > > For clang, it sounds like the opposite is true: when passing 16 byte > stack alignment > and having sse/sse2 enabled, it requires the incoming stack to be 16 > byte aligned, I don't think it requires the incoming stack to be 16B aligned for sse2, I think it requires the incoming and current stack alignment to match. Today it does not, which is why we observe GPFs. > but passing 8 byte alignment makes it do the right thing. > > So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4) > to get the desired outcome on both? Hmmm...I would have liked to remove it outright, as it is an ABI mismatch that is likely to result in instability and non-fun-to-debug runtime issues in the future. I suspect my patch does work for GCC 7.1+. The question is: Do we want to either: 1. mark AMDGPU broken for GCC < 7.1, or 2. continue supporting it via stack alignment mismatch? 2 is brittle, and may break at any point in the future, but if it's working for someone it does make me feel bad to outright disable it. What I'd image 2 looks like is (psuedo code in a Makefile): if CC_IS_GCC && GCC_VERSION < 7.1: set stack alignment to 16B and hope for the best So my diff would be amended to keep the stack alignment flags, but only to support GCC < 7.1. And that assumes my change compiles with GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would feel even more confident if someone with hardware to test on and GCC 7.1+ could boot test). -- Thanks, ~Nick Desaulniers ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: AMDGPU and 16B stack alignment
From: Arnd Bergmann > Sent: 15 October 2019 08:19 > > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish wrote: > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote: > > > My gcc build fails with below errors: > > > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 > > and 12 > > > > While GPF observed on clang builds seem to be fixed. > > Ok, so it seems that gcc insists on having at least 2^4 bytes stack > alignment when > SSE is enabled on x86-64, but does not actually rely on that for > correct operation > unless it's using sse2. So -msse always has to be paired with > -mpreferred-stack-boundary=3. > > For clang, it sounds like the opposite is true: when passing 16 byte > stack alignment > and having sse/sse2 enabled, it requires the incoming stack to be 16 > byte aligned, > but passing 8 byte alignment makes it do the right thing. > > So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4) > to get the desired outcome on both? It probably won't solve the problem. You need to find all the asm blocks that call back into C and ensure they maintain the required stack alignment. This might be possible in the kernel, but is almost impossible in userspace. ISTR that gcc arbitrarily changed the stack alignment for i386 a few years ago. While it helped code generation it broke a lot of things. I can't remember the correct set of options to get the stack alignment code added only where it was needed (generates a double %bp frame). David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMDGPU and 16B stack alignment
On Tue, Oct 15, 2019 at 9:08 AM S, Shirish wrote: > On 10/15/2019 3:52 AM, Nick Desaulniers wrote: > My gcc build fails with below errors: > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and > 12 > > While GPF observed on clang builds seem to be fixed. Ok, so it seems that gcc insists on having at least 2^4 bytes stack alignment when SSE is enabled on x86-64, but does not actually rely on that for correct operation unless it's using sse2. So -msse always has to be paired with -mpreferred-stack-boundary=3. For clang, it sounds like the opposite is true: when passing 16 byte stack alignment and having sse/sse2 enabled, it requires the incoming stack to be 16 byte aligned, but passing 8 byte alignment makes it do the right thing. So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4) to get the desired outcome on both? Arnd
Re: AMDGPU and 16B stack alignment
Hi Nick, On 10/15/2019 3:52 AM, Nick Desaulniers wrote: Hello! The x86 kernel is compiled with an 8B stack alignment via `-mpreferred-stack-boundary=3` for GCC since 3.6-rc1 via commit d9b0cde91c60 ("x86-64, gcc: Use -mpreferred-stack-boundary=3 if supported") or `-mstack-alignment=8` for Clang. Parts of the AMDGPU driver are compiled with 16B stack alignment. Generally, the stack alignment is part of the ABI. Linking together two different translation units with differing stack alignment is dangerous, particularly when the translation unit with the smaller stack alignment makes calls into the translation unit with the larger stack alignment. While 8B aligned stacks are sometimes also 16B aligned, they are not always. Multiple users have reported General Protection Faults (GPF) when using the AMDGPU driver compiled with Clang. Clang is placing objects in stack slots assuming the stack is 16B aligned, and selecting instructions that require 16B aligned memory operands. At runtime, syscalls handling 8B stack aligned code calls into code that assumes 16B stack alignment. When the stack is a multiple of 8B but not 16B, these instructions result in a GPF. GCC doesn't select instructions with alignment requirements, so the GPFs aren't observed, but it is still considered an ABI breakage to mix and match stack alignment. I have patches that basically remove -mpreferred-stack-boundary=4 and -mstack-alignment=16 from AMDGPU: https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-541247601 Yuxuan has tested with Clang and GCC and reported it fixes the GPF's observed. My gcc build fails with below errors: dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 While GPF observed on clang builds seem to be fixed. -- Regards, Shirish S I've split the patch into 4; same commit message but different Fixes tags so that they backport to stable on finer granularity. 2 questions BEFORE I send the series: 1. Would you prefer 4 patches with unique `fixes` tags, or 1 patch? 2. Was there or is there still a good reason for the stack alignment mismatch? (Further, I think we can use -msse2 for BOTH clang+gcc after my patch, but I don't have hardware to test on. I'm happy to write/send the follow up patch, but I'd need help testing). ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
AMDGPU and 16B stack alignment
Hello! The x86 kernel is compiled with an 8B stack alignment via `-mpreferred-stack-boundary=3` for GCC since 3.6-rc1 via commit d9b0cde91c60 ("x86-64, gcc: Use -mpreferred-stack-boundary=3 if supported") or `-mstack-alignment=8` for Clang. Parts of the AMDGPU driver are compiled with 16B stack alignment. Generally, the stack alignment is part of the ABI. Linking together two different translation units with differing stack alignment is dangerous, particularly when the translation unit with the smaller stack alignment makes calls into the translation unit with the larger stack alignment. While 8B aligned stacks are sometimes also 16B aligned, they are not always. Multiple users have reported General Protection Faults (GPF) when using the AMDGPU driver compiled with Clang. Clang is placing objects in stack slots assuming the stack is 16B aligned, and selecting instructions that require 16B aligned memory operands. At runtime, syscalls handling 8B stack aligned code calls into code that assumes 16B stack alignment. When the stack is a multiple of 8B but not 16B, these instructions result in a GPF. GCC doesn't select instructions with alignment requirements, so the GPFs aren't observed, but it is still considered an ABI breakage to mix and match stack alignment. I have patches that basically remove -mpreferred-stack-boundary=4 and -mstack-alignment=16 from AMDGPU: https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-541247601 Yuxuan has tested with Clang and GCC and reported it fixes the GPF's observed. I've split the patch into 4; same commit message but different Fixes tags so that they backport to stable on finer granularity. 2 questions BEFORE I send the series: 1. Would you prefer 4 patches with unique `fixes` tags, or 1 patch? 2. Was there or is there still a good reason for the stack alignment mismatch? (Further, I think we can use -msse2 for BOTH clang+gcc after my patch, but I don't have hardware to test on. I'm happy to write/send the follow up patch, but I'd need help testing). -- Thanks, ~Nick Desaulniers