Re: AMDGPU and 16B stack alignment

2019-10-16 Thread Nick Desaulniers
On Wed, Oct 16, 2019 at 11:55 AM Arvind Sankar  wrote:
>
> On Tue, Oct 15, 2019 at 06:51:26PM -0700, Nick Desaulniers wrote:
> > On Tue, Oct 15, 2019 at 1:26 PM Arvind Sankar  wrote:
> > >
> > > On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote:
> > > > Hmmm...I would have liked to remove it outright, as it is an ABI
> > > > mismatch that is likely to result in instability and non-fun-to-debug
> > > > runtime issues in the future.  I suspect my patch does work for GCC
> > > > 7.1+.  The question is: Do we want to either:
> > > > 1. mark AMDGPU broken for GCC < 7.1, or
> > > > 2. continue supporting it via stack alignment mismatch?
> > > >
> > > > 2 is brittle, and may break at any point in the future, but if it's
> > > > working for someone it does make me feel bad to outright disable it.
> > > > What I'd image 2 looks like is (psuedo code in a Makefile):
> > > >
> > > > if CC_IS_GCC && GCC_VERSION < 7.1:
> > > >   set stack alignment to 16B and hope for the best
> > > >
> > > > So my diff would be amended to keep the stack alignment flags, but
> > > > only to support GCC < 7.1.  And that assumes my change compiles with
> > > > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
> > > > feel even more confident if someone with hardware to test on and GCC
> > > > 7.1+ could boot test).
> > > > --
> > > > Thanks,
> > > > ~Nick Desaulniers
> > >
> > > If we do keep it, would adding -mstackrealign make it more robust?
> > > That's simple and will only add the alignment to functions that require
> > > 16-byte alignment (at least on gcc).
> >
> > I think there's also `-mincoming-stack-boundary=`.
> > https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-540038017
>
> Yes, but -mstackrealign looks like it's supported by clang as well.

Good to know, but I want less duct tape, not more.

> >
> > >
> > > Alternative is to use
> > > __attribute__((force_align_arg_pointer)) on functions that might be
> > > called from 8-byte-aligned code.
> >
> > Which is hard to automate and easy to forget.  Likely a large diff to fix 
> > today.
>
> Right, this is a no-go, esp to just fix old compilers.
> >
> > >
> > > It looks like -mstackrealign should work from gcc 5.3 onwards.
> >
> > The kernel would generally like to support GCC 4.9+.
> >
> > There's plenty of different ways to keep layering on duct tape and
> > bailing wire to support differing ABIs, but that's just adding
> > technical debt that will have to be repaid one day.  That's why the
> > cleanest solution IMO is mark the driver broken for old toolchains,
> > and use a code-base-consistent stack alignment.  Bending over
> > backwards to support old toolchains means accepting stack alignment
> > mismatches, which is in the "unspecified behavior" ring of the
> > "undefined behavior" Venn diagram.  I have the same opinion on relying
> > on explicitly undefined behavior.
> >
> > I'll send patches for fixing up Clang, but please consider my strong
> > advice to generally avoid stack alignment mismatches, regardless of
> > compiler.
> > --
> > Thanks,
> > ~Nick Desaulniers
>
> What I suggested was in reference to your proposal for dropping the
> -mpreferred-stack-boundary=4 for modern compilers, but keeping it for
> <7.1. -mstackrealign would at least let 5.3 onwards be less likely to
> break (and it doesn't error before then, I think it just doesn't
> actually do anything, so no worse than now at least).
>
> Simply dropping support for <7.1 would be cleanest, yes, but it sounds
> like people don't want to go that far.

That's fair.  I've included your suggestions in the commit message of
02/03 of a series I just sent but forgot to in reply to this thread:
https://lkml.org/lkml/2019/10/16/1700

Also, I do appreciate the suggestions and understand the value of brainstorming.
-- 
Thanks,
~Nick Desaulniers


Re: AMDGPU and 16B stack alignment

2019-10-16 Thread Arvind Sankar
On Tue, Oct 15, 2019 at 06:51:26PM -0700, Nick Desaulniers wrote:
> On Tue, Oct 15, 2019 at 1:26 PM Arvind Sankar  wrote:
> >
> > On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote:
> > > Hmmm...I would have liked to remove it outright, as it is an ABI
> > > mismatch that is likely to result in instability and non-fun-to-debug
> > > runtime issues in the future.  I suspect my patch does work for GCC
> > > 7.1+.  The question is: Do we want to either:
> > > 1. mark AMDGPU broken for GCC < 7.1, or
> > > 2. continue supporting it via stack alignment mismatch?
> > >
> > > 2 is brittle, and may break at any point in the future, but if it's
> > > working for someone it does make me feel bad to outright disable it.
> > > What I'd image 2 looks like is (psuedo code in a Makefile):
> > >
> > > if CC_IS_GCC && GCC_VERSION < 7.1:
> > >   set stack alignment to 16B and hope for the best
> > >
> > > So my diff would be amended to keep the stack alignment flags, but
> > > only to support GCC < 7.1.  And that assumes my change compiles with
> > > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
> > > feel even more confident if someone with hardware to test on and GCC
> > > 7.1+ could boot test).
> > > --
> > > Thanks,
> > > ~Nick Desaulniers
> >
> > If we do keep it, would adding -mstackrealign make it more robust?
> > That's simple and will only add the alignment to functions that require
> > 16-byte alignment (at least on gcc).
> 
> I think there's also `-mincoming-stack-boundary=`.
> https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-540038017

Yes, but -mstackrealign looks like it's supported by clang as well.
> 
> >
> > Alternative is to use
> > __attribute__((force_align_arg_pointer)) on functions that might be
> > called from 8-byte-aligned code.
> 
> Which is hard to automate and easy to forget.  Likely a large diff to fix 
> today.

Right, this is a no-go, esp to just fix old compilers.
> 
> >
> > It looks like -mstackrealign should work from gcc 5.3 onwards.
> 
> The kernel would generally like to support GCC 4.9+.
> 
> There's plenty of different ways to keep layering on duct tape and
> bailing wire to support differing ABIs, but that's just adding
> technical debt that will have to be repaid one day.  That's why the
> cleanest solution IMO is mark the driver broken for old toolchains,
> and use a code-base-consistent stack alignment.  Bending over
> backwards to support old toolchains means accepting stack alignment
> mismatches, which is in the "unspecified behavior" ring of the
> "undefined behavior" Venn diagram.  I have the same opinion on relying
> on explicitly undefined behavior.
> 
> I'll send patches for fixing up Clang, but please consider my strong
> advice to generally avoid stack alignment mismatches, regardless of
> compiler.
> --
> Thanks,
> ~Nick Desaulniers

What I suggested was in reference to your proposal for dropping the
-mpreferred-stack-boundary=4 for modern compilers, but keeping it for
<7.1. -mstackrealign would at least let 5.3 onwards be less likely to
break (and it doesn't error before then, I think it just doesn't
actually do anything, so no worse than now at least).

Simply dropping support for <7.1 would be cleanest, yes, but it sounds
like people don't want to go that far.


Re: AMDGPU and 16B stack alignment

2019-10-15 Thread Nick Desaulniers
On Tue, Oct 15, 2019 at 1:26 PM Arvind Sankar  wrote:
>
> On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote:
> > Hmmm...I would have liked to remove it outright, as it is an ABI
> > mismatch that is likely to result in instability and non-fun-to-debug
> > runtime issues in the future.  I suspect my patch does work for GCC
> > 7.1+.  The question is: Do we want to either:
> > 1. mark AMDGPU broken for GCC < 7.1, or
> > 2. continue supporting it via stack alignment mismatch?
> >
> > 2 is brittle, and may break at any point in the future, but if it's
> > working for someone it does make me feel bad to outright disable it.
> > What I'd image 2 looks like is (psuedo code in a Makefile):
> >
> > if CC_IS_GCC && GCC_VERSION < 7.1:
> >   set stack alignment to 16B and hope for the best
> >
> > So my diff would be amended to keep the stack alignment flags, but
> > only to support GCC < 7.1.  And that assumes my change compiles with
> > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
> > feel even more confident if someone with hardware to test on and GCC
> > 7.1+ could boot test).
> > --
> > Thanks,
> > ~Nick Desaulniers
>
> If we do keep it, would adding -mstackrealign make it more robust?
> That's simple and will only add the alignment to functions that require
> 16-byte alignment (at least on gcc).

I think there's also `-mincoming-stack-boundary=`.
https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-540038017

>
> Alternative is to use
> __attribute__((force_align_arg_pointer)) on functions that might be
> called from 8-byte-aligned code.

Which is hard to automate and easy to forget.  Likely a large diff to fix today.

>
> It looks like -mstackrealign should work from gcc 5.3 onwards.

The kernel would generally like to support GCC 4.9+.

There's plenty of different ways to keep layering on duct tape and
bailing wire to support differing ABIs, but that's just adding
technical debt that will have to be repaid one day.  That's why the
cleanest solution IMO is mark the driver broken for old toolchains,
and use a code-base-consistent stack alignment.  Bending over
backwards to support old toolchains means accepting stack alignment
mismatches, which is in the "unspecified behavior" ring of the
"undefined behavior" Venn diagram.  I have the same opinion on relying
on explicitly undefined behavior.

I'll send patches for fixing up Clang, but please consider my strong
advice to generally avoid stack alignment mismatches, regardless of
compiler.
--
Thanks,
~Nick Desaulniers


Re: AMDGPU and 16B stack alignment

2019-10-15 Thread Arvind Sankar
On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote:
> Hmmm...I would have liked to remove it outright, as it is an ABI
> mismatch that is likely to result in instability and non-fun-to-debug
> runtime issues in the future.  I suspect my patch does work for GCC
> 7.1+.  The question is: Do we want to either:
> 1. mark AMDGPU broken for GCC < 7.1, or
> 2. continue supporting it via stack alignment mismatch?
> 
> 2 is brittle, and may break at any point in the future, but if it's
> working for someone it does make me feel bad to outright disable it.
> What I'd image 2 looks like is (psuedo code in a Makefile):
> 
> if CC_IS_GCC && GCC_VERSION < 7.1:
>   set stack alignment to 16B and hope for the best
> 
> So my diff would be amended to keep the stack alignment flags, but
> only to support GCC < 7.1.  And that assumes my change compiles with
> GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
> feel even more confident if someone with hardware to test on and GCC
> 7.1+ could boot test).
> -- 
> Thanks,
> ~Nick Desaulniers

If we do keep it, would adding -mstackrealign make it more robust?
That's simple and will only add the alignment to functions that require
16-byte alignment (at least on gcc).

Alternative is to use
__attribute__((force_align_arg_pointer)) on functions that might be
called from 8-byte-aligned code.

It looks like -mstackrealign should work from gcc 5.3 onwards.


Re: AMDGPU and 16B stack alignment

2019-10-15 Thread Nick Desaulniers
On Tue, Oct 15, 2019 at 11:30 AM Alex Deucher  wrote:
>
> On Tue, Oct 15, 2019 at 2:07 PM Nick Desaulniers
>  wrote:
> >
> > On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann  wrote:
> > >
> > > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish  wrote:
> > > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote:
> > >
> > > > My gcc build fails with below errors:
> > > >
> > > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 
> > > > and 12
> > > >
> > > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 
> > > > 4 and 12
> >
> > I was able to reproduce this failure on pre-7.1 versions of GCC.  It
> > seems that when:
> > 1. code is using doubles
> > 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment
> > than GCC produces that error:
> > https://godbolt.org/z/7T8nbH
> >
> > That's already a tall order of constraints, so it's understandable
> > that the compiler would just error likely during instruction
> > selection, but was eventually taught how to solve such constraints.
> >
> > > >
> > > > While GPF observed on clang builds seem to be fixed.
> >
> > Thanks for the report.  Your testing these patches is invaluable, Shirish!
> >
> > >
> > > Ok, so it seems that gcc insists on having at least 2^4 bytes stack
> > > alignment when
> > > SSE is enabled on x86-64, but does not actually rely on that for
> > > correct operation
> > > unless it's using sse2. So -msse always has to be paired with
> > >  -mpreferred-stack-boundary=3.
> >
> > Seemingly only for older versions of GCC, pre 7.1.
> >
> > >
> > > For clang, it sounds like the opposite is true: when passing 16 byte
> > > stack alignment
> > > and having sse/sse2 enabled, it requires the incoming stack to be 16
> > > byte aligned,
> >
> > I don't think it requires the incoming stack to be 16B aligned for
> > sse2, I think it requires the incoming and current stack alignment to
> > match. Today it does not, which is why we observe GPFs.
> >
> > > but passing 8 byte alignment makes it do the right thing.
> > >
> > > So, should we just always pass $(call cc-option, 
> > > -mpreferred-stack-boundary=4)
> > > to get the desired outcome on both?
> >
> > Hmmm...I would have liked to remove it outright, as it is an ABI
> > mismatch that is likely to result in instability and non-fun-to-debug
> > runtime issues in the future.  I suspect my patch does work for GCC
> > 7.1+.  The question is: Do we want to either:
> > 1. mark AMDGPU broken for GCC < 7.1, or
> > 2. continue supporting it via stack alignment mismatch?
> >
> > 2 is brittle, and may break at any point in the future, but if it's
> > working for someone it does make me feel bad to outright disable it.
> > What I'd image 2 looks like is (psuedo code in a Makefile):
>
> Well, it's been working as is for years now, at least with gcc, so I'd
> hate to break that.

Ok, I'm happy to leave that as is for GCC, then.  Would you prefer I
modify it for GCC >7.1 or just leave it alone (maybe I'll add a
comment about *why* it's done for GCC)? Would you prefer 1 patch or 4?

>
> Alex
>
> >
> > if CC_IS_GCC && GCC_VERSION < 7.1:
> >   set stack alignment to 16B and hope for the best

Ie, this ^

> >
> > So my diff would be amended to keep the stack alignment flags, but
> > only to support GCC < 7.1.  And that assumes my change compiles with
> > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
> > feel even more confident if someone with hardware to test on and GCC
> > 7.1+ could boot test).

-- 
Thanks,
~Nick Desaulniers


Re: AMDGPU and 16B stack alignment

2019-10-15 Thread Alex Deucher
On Tue, Oct 15, 2019 at 2:07 PM Nick Desaulniers
 wrote:
>
> On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann  wrote:
> >
> > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish  wrote:
> > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote:
> >
> > > My gcc build fails with below errors:
> > >
> > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 
> > > 12
> > >
> > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 
> > > and 12
>
> I was able to reproduce this failure on pre-7.1 versions of GCC.  It
> seems that when:
> 1. code is using doubles
> 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment
> than GCC produces that error:
> https://godbolt.org/z/7T8nbH
>
> That's already a tall order of constraints, so it's understandable
> that the compiler would just error likely during instruction
> selection, but was eventually taught how to solve such constraints.
>
> > >
> > > While GPF observed on clang builds seem to be fixed.
>
> Thanks for the report.  Your testing these patches is invaluable, Shirish!
>
> >
> > Ok, so it seems that gcc insists on having at least 2^4 bytes stack
> > alignment when
> > SSE is enabled on x86-64, but does not actually rely on that for
> > correct operation
> > unless it's using sse2. So -msse always has to be paired with
> >  -mpreferred-stack-boundary=3.
>
> Seemingly only for older versions of GCC, pre 7.1.
>
> >
> > For clang, it sounds like the opposite is true: when passing 16 byte
> > stack alignment
> > and having sse/sse2 enabled, it requires the incoming stack to be 16
> > byte aligned,
>
> I don't think it requires the incoming stack to be 16B aligned for
> sse2, I think it requires the incoming and current stack alignment to
> match. Today it does not, which is why we observe GPFs.
>
> > but passing 8 byte alignment makes it do the right thing.
> >
> > So, should we just always pass $(call cc-option, 
> > -mpreferred-stack-boundary=4)
> > to get the desired outcome on both?
>
> Hmmm...I would have liked to remove it outright, as it is an ABI
> mismatch that is likely to result in instability and non-fun-to-debug
> runtime issues in the future.  I suspect my patch does work for GCC
> 7.1+.  The question is: Do we want to either:
> 1. mark AMDGPU broken for GCC < 7.1, or
> 2. continue supporting it via stack alignment mismatch?
>
> 2 is brittle, and may break at any point in the future, but if it's
> working for someone it does make me feel bad to outright disable it.
> What I'd image 2 looks like is (psuedo code in a Makefile):

Well, it's been working as is for years now, at least with gcc, so I'd
hate to break that.

Alex

>
> if CC_IS_GCC && GCC_VERSION < 7.1:
>   set stack alignment to 16B and hope for the best
>
> So my diff would be amended to keep the stack alignment flags, but
> only to support GCC < 7.1.  And that assumes my change compiles with
> GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
> feel even more confident if someone with hardware to test on and GCC
> 7.1+ could boot test).
> --
> Thanks,
> ~Nick Desaulniers
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMDGPU and 16B stack alignment

2019-10-15 Thread Nick Desaulniers
On Tue, Oct 15, 2019 at 11:05 AM Nick Desaulniers
 wrote:
>
> On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann  wrote:
> >
> > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish  wrote:
> > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote:
> >
> > > My gcc build fails with below errors:
> > >
> > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 
> > > 12
> > >
> > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 
> > > and 12
>
> I was able to reproduce this failure on pre-7.1 versions of GCC.  It
> seems that when:
> 1. code is using doubles
> 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment
> than GCC produces that error:
> https://godbolt.org/z/7T8nbH
>

Also, I suspect that very error solves the mystery of "why was 16B
stack alignment ever specified?"
-- 
Thanks,
~Nick Desaulniers


Re: AMDGPU and 16B stack alignment

2019-10-15 Thread Nick Desaulniers
On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann  wrote:
>
> On Tue, Oct 15, 2019 at 9:08 AM S, Shirish  wrote:
> > On 10/15/2019 3:52 AM, Nick Desaulniers wrote:
>
> > My gcc build fails with below errors:
> >
> > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
> >
> > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 
> > and 12

I was able to reproduce this failure on pre-7.1 versions of GCC.  It
seems that when:
1. code is using doubles
2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment
than GCC produces that error:
https://godbolt.org/z/7T8nbH

That's already a tall order of constraints, so it's understandable
that the compiler would just error likely during instruction
selection, but was eventually taught how to solve such constraints.

> >
> > While GPF observed on clang builds seem to be fixed.

Thanks for the report.  Your testing these patches is invaluable, Shirish!

>
> Ok, so it seems that gcc insists on having at least 2^4 bytes stack
> alignment when
> SSE is enabled on x86-64, but does not actually rely on that for
> correct operation
> unless it's using sse2. So -msse always has to be paired with
>  -mpreferred-stack-boundary=3.

Seemingly only for older versions of GCC, pre 7.1.

>
> For clang, it sounds like the opposite is true: when passing 16 byte
> stack alignment
> and having sse/sse2 enabled, it requires the incoming stack to be 16
> byte aligned,

I don't think it requires the incoming stack to be 16B aligned for
sse2, I think it requires the incoming and current stack alignment to
match. Today it does not, which is why we observe GPFs.

> but passing 8 byte alignment makes it do the right thing.
>
> So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4)
> to get the desired outcome on both?

Hmmm...I would have liked to remove it outright, as it is an ABI
mismatch that is likely to result in instability and non-fun-to-debug
runtime issues in the future.  I suspect my patch does work for GCC
7.1+.  The question is: Do we want to either:
1. mark AMDGPU broken for GCC < 7.1, or
2. continue supporting it via stack alignment mismatch?

2 is brittle, and may break at any point in the future, but if it's
working for someone it does make me feel bad to outright disable it.
What I'd image 2 looks like is (psuedo code in a Makefile):

if CC_IS_GCC && GCC_VERSION < 7.1:
  set stack alignment to 16B and hope for the best

So my diff would be amended to keep the stack alignment flags, but
only to support GCC < 7.1.  And that assumes my change compiles with
GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
feel even more confident if someone with hardware to test on and GCC
7.1+ could boot test).
-- 
Thanks,
~Nick Desaulniers
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: AMDGPU and 16B stack alignment

2019-10-15 Thread David Laight
From: Arnd Bergmann
> Sent: 15 October 2019 08:19
> 
> On Tue, Oct 15, 2019 at 9:08 AM S, Shirish  wrote:
> > On 10/15/2019 3:52 AM, Nick Desaulniers wrote:
> 
> > My gcc build fails with below errors:
> >
> > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
> >
> > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 
> > and 12
> >
> > While GPF observed on clang builds seem to be fixed.
> 
> Ok, so it seems that gcc insists on having at least 2^4 bytes stack
> alignment when
> SSE is enabled on x86-64, but does not actually rely on that for
> correct operation
> unless it's using sse2. So -msse always has to be paired with
>  -mpreferred-stack-boundary=3.
> 
> For clang, it sounds like the opposite is true: when passing 16 byte
> stack alignment
> and having sse/sse2 enabled, it requires the incoming stack to be 16
> byte aligned,
> but passing 8 byte alignment makes it do the right thing.
> 
> So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4)
> to get the desired outcome on both?

It probably won't solve the problem.
You need to find all the asm blocks that call back into C and ensure they
maintain the required stack alignment.
This might be possible in the kernel, but is almost impossible in userspace.

ISTR that gcc arbitrarily changed the stack alignment for i386 a few years ago.
While it helped code generation it broke a lot of things.
I can't remember the correct set of options to get the stack alignment
code added only where it was needed (generates a double %bp frame).

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: AMDGPU and 16B stack alignment

2019-10-15 Thread Arnd Bergmann
On Tue, Oct 15, 2019 at 9:08 AM S, Shirish  wrote:
> On 10/15/2019 3:52 AM, Nick Desaulniers wrote:

> My gcc build fails with below errors:
>
> dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
>
> dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 
> 12
>
> While GPF observed on clang builds seem to be fixed.

Ok, so it seems that gcc insists on having at least 2^4 bytes stack
alignment when
SSE is enabled on x86-64, but does not actually rely on that for
correct operation
unless it's using sse2. So -msse always has to be paired with
 -mpreferred-stack-boundary=3.

For clang, it sounds like the opposite is true: when passing 16 byte
stack alignment
and having sse/sse2 enabled, it requires the incoming stack to be 16
byte aligned,
but passing 8 byte alignment makes it do the right thing.

So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4)
to get the desired outcome on both?

   Arnd


Re: AMDGPU and 16B stack alignment

2019-10-15 Thread S, Shirish
Hi Nick,

On 10/15/2019 3:52 AM, Nick Desaulniers wrote:

Hello!

The x86 kernel is compiled with an 8B stack alignment via
`-mpreferred-stack-boundary=3` for GCC since 3.6-rc1 via
commit d9b0cde91c60 ("x86-64, gcc: Use
-mpreferred-stack-boundary=3 if supported")
or `-mstack-alignment=8` for Clang. Parts of the AMDGPU driver are
compiled with 16B stack alignment.

Generally, the stack alignment is part of the ABI. Linking together two
different translation units with differing stack alignment is dangerous,
particularly when the translation unit with the smaller stack alignment
makes calls into the translation unit with the larger stack alignment.
While 8B aligned stacks are sometimes also 16B aligned, they are not
always.

Multiple users have reported General Protection Faults (GPF) when using
the AMDGPU driver compiled with Clang. Clang is placing objects in stack
slots assuming the stack is 16B aligned, and selecting instructions that
require 16B aligned memory operands. At runtime, syscalls handling 8B
stack aligned code calls into code that assumes 16B stack alignment.
When the stack is a multiple of 8B but not 16B, these instructions
result in a GPF.

GCC doesn't select instructions with alignment requirements, so the GPFs
aren't observed, but it is still considered an ABI breakage to mix and
match stack alignment.

I have patches that basically remove -mpreferred-stack-boundary=4 and
-mstack-alignment=16 from AMDGPU:
https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-541247601
Yuxuan has tested with Clang and GCC and reported it fixes the GPF's observed.

My gcc build fails with below errors:

dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12

dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12

While GPF observed on clang builds seem to be fixed.

--
Regards,
Shirish S



I've split the patch into 4; same commit message but different Fixes
tags so that they backport to stable on finer granularity. 2 questions
BEFORE I send the series:

1. Would you prefer 4 patches with unique `fixes` tags, or 1 patch?
2. Was there or is there still a good reason for the stack alignment mismatch?

(Further, I think we can use -msse2 for BOTH clang+gcc after my patch,
but I don't have hardware to test on. I'm happy to write/send the
follow up patch, but I'd need help testing).




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

AMDGPU and 16B stack alignment

2019-10-14 Thread Nick Desaulniers
Hello!

The x86 kernel is compiled with an 8B stack alignment via
`-mpreferred-stack-boundary=3` for GCC since 3.6-rc1 via
commit d9b0cde91c60 ("x86-64, gcc: Use
-mpreferred-stack-boundary=3 if supported")
or `-mstack-alignment=8` for Clang. Parts of the AMDGPU driver are
compiled with 16B stack alignment.

Generally, the stack alignment is part of the ABI. Linking together two
different translation units with differing stack alignment is dangerous,
particularly when the translation unit with the smaller stack alignment
makes calls into the translation unit with the larger stack alignment.
While 8B aligned stacks are sometimes also 16B aligned, they are not
always.

Multiple users have reported General Protection Faults (GPF) when using
the AMDGPU driver compiled with Clang. Clang is placing objects in stack
slots assuming the stack is 16B aligned, and selecting instructions that
require 16B aligned memory operands. At runtime, syscalls handling 8B
stack aligned code calls into code that assumes 16B stack alignment.
When the stack is a multiple of 8B but not 16B, these instructions
result in a GPF.

GCC doesn't select instructions with alignment requirements, so the GPFs
aren't observed, but it is still considered an ABI breakage to mix and
match stack alignment.

I have patches that basically remove -mpreferred-stack-boundary=4 and
-mstack-alignment=16 from AMDGPU:
https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-541247601
Yuxuan has tested with Clang and GCC and reported it fixes the GPF's observed.

I've split the patch into 4; same commit message but different Fixes
tags so that they backport to stable on finer granularity. 2 questions
BEFORE I send the series:

1. Would you prefer 4 patches with unique `fixes` tags, or 1 patch?
2. Was there or is there still a good reason for the stack alignment mismatch?

(Further, I think we can use -msse2 for BOTH clang+gcc after my patch,
but I don't have hardware to test on. I'm happy to write/send the
follow up patch, but I'd need help testing).
-- 
Thanks,
~Nick Desaulniers