Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-18 Thread Peter Zijlstra
On Fri, Apr 17, 2020 at 04:27:28PM -0400, Rodrigo Siqueira wrote:
> I'm going to work on the FPU issues in the display code. In this sense,
> could you clarify a little bit more about the Makefile issues?

I think it might have been PEBKAC, I assumed allmodconfig would end up
selecting the driver, it doesn't!

Adding "AMD GPU" to a defconfig seems to make it work:

$ make O=defconfig-build/ 
drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o

Sorry about the confusion.

> Also, I applied the patch `[PATCH v4] x86: insn: Add insn_is_fpu()` and
> tried to reproduce the warning that you described but I didn't see it.
> Could you explain to me how I can check those warnings?

grab:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/fpu

Then build like above:

$ make O=defconfig-build/ 
drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o

And manually validate:

$ defconfig-build/tools/objtool/objtool check -Ffa 
defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o
defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o: 
warning: objtool: dcn_validate_bandwidth()+0x1b17: FPU instruction outside of 
kernel_fpu_{begin,end}()


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-17 Thread Rodrigo Siqueira
On 04/09, Peter Zijlstra wrote:
> On Thu, Apr 09, 2020 at 08:15:57PM +0200, Christian König wrote:
> > Am 09.04.20 um 19:09 schrieb Peter Zijlstra:
> > > On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote:
> > > [SNIP]
> > > > I'll need another approach, let me consider.
> > > Christian; it says these files are generated, does that generator know
> > > which functions are wholly in FPU context and which are not?
> > 
> > Well that "generator" is still a human being :)
> > 
> > It's just that the formulae for the calculation come from the hardware team
> > and we are not able to easily transcript them to fixed point calculations.
> 
> Well, if it's a human, can this human respect the kernel coding style a
> bit more :-) Some of that stuff is atrocious.
> 
> > > My current thinking is that if I annotate all functions that are wholly
> > > inside kernel_fpu_start() with an __fpu function attribute, then I can
> > > verify that any call from regular text to fpu text only happens inside
> > > kernel_fpu_begin()/end(). And I can ensure that all !__fpu annotation
> > > fuctions only contain !fpu instructions.
> > 
> > Yeah, that sounds like a good idea to me and should be easily doable.
> > 
> > > Can that generator add the __fpu function attribute or is that something
> > > that would need to be done manually (which seems like it would be
> > > painful, since it is quite a bit of code) ?
> > 
> > We are currently in the process of moving all the stuff which requires
> > floating point into a single C file(s) and then make sure that we only call
> > those within kernel_fpu_begin()/end() blocks.
> 
> Can you make the build system stick all those .o files in a single
> archive? That's the only way I can do call validation; external
> relocatoin records do not contain the section.
> 
> > Annotating those function with __fpu or even saying to gcc that all code of
> > those files should go into a special text.fpu segment shouldn't be much of a
> > problem.
> 
> Guess what the __fpu attribute does ;-)
> 
> With the below patch (which is on to of newer versions of the objtool
> patches send earlier, let me know if you want a full set) that only
> converts a few files, but fully converts:
> 
>   drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c
> 
> But building it (and this is an absolute pain; when you're reworking
> this, can you pretty please also fix the Makefiles?), we get:

Hi,

I'm going to work on the FPU issues in the display code. In this sense,
could you clarify a little bit more about the Makefile issues?

Also, I applied the patch `[PATCH v4] x86: insn: Add insn_is_fpu()` and
tried to reproduce the warning that you described but I didn't see it.
Could you explain to me how I can check those warnings?

Thanks

>   drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o: warning: 
> objtool: dcn_validate_bandwidth()+0x34fa: FPU instruction outside of 
> kernel_fpu_{begin,end}()
> 
> $ ./scripts/faddr2line 
> defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o 
> dcn_validate_bandwidth+0x34fa
> dcn_validate_bandwidth+0x34fa/0x57ce:
> dcn_validate_bandwidth at 
> /usr/src/linux-2.6/defconfig-build/../drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1293
>  (discriminator 5)
> 
> # ./objdump-func.sh 
> defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o 
> dcn_validate_bandwidth | grep 34fa
> 34fa 50fa:  f2 0f 10 b5 60 ff ffmovsd  -0xa0(%rbp),%xmm6
> 
> Which seems to indicate there's still problms with the current code.
> 
> 
> 
> ---
>  arch/x86/include/asm/fpu/api.h | 12 +++
>  arch/x86/kernel/vmlinux.lds.S  |  1 +
>  .../gpu/drm/amd/display/dc/calcs/dcn_calc_math.c   | 25 
> +++---
>  drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c   |  4 ++--
>  .../display/dc/dml/dcn20/display_rq_dlg_calc_20.c  | 10 -
>  .../amd/display/dc/dml/display_rq_dlg_helpers.c|  2 +-
>  .../gpu/drm/amd/display/dc/dml/dml_common_defs.c   |  2 +-
>  drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c|  2 +-
>  drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c   | 10 -
>  drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c   |  4 ++--
>  drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h |  2 ++
>  tools/objtool/check.c  |  7 +-
>  tools/objtool/elf.h|  2 +-
>  13 files changed, 52 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
> index 64be4426fda9..19eaf98bbb0a 100644
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h
> @@ -12,11 +12,23 @@
>  #define _ASM_X86_FPU_API_H
>  #include 
> 
> +#ifdef CONFIG_STACK_VALIDATION
> +
> +#define __fpu __section(".text.fpu")
> +
>  #define _ASM_ANNOTATE_FPU(at)
> \
>".pushsection .discard.fpu_safe\n"   

Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-15 Thread Peter Zijlstra
On Fri, Apr 10, 2020 at 04:31:39PM +0200, Christian König wrote:
> Can we put this new automated check will be behind a configuration flag
> initially? Or at least make it a warning and not a hard error.

I'll try and get the patches merged in mainline objtool but with a flag
that isn't used by default.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-10 Thread Christian König

Am 09.04.20 um 22:01 schrieb Peter Zijlstra:

On Thu, Apr 09, 2020 at 08:15:57PM +0200, Christian König wrote:

Am 09.04.20 um 19:09 schrieb Peter Zijlstra:

On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote:
[SNIP]

I'll need another approach, let me consider.

Christian; it says these files are generated, does that generator know
which functions are wholly in FPU context and which are not?

Well that "generator" is still a human being :)

It's just that the formulae for the calculation come from the hardware team
and we are not able to easily transcript them to fixed point calculations.

Well, if it's a human, can this human respect the kernel coding style a
bit more :-) Some of that stuff is atrocious.


Yes, I know. That's unfortunately something we still need to work on as 
well.



We are currently in the process of moving all the stuff which requires
floating point into a single C file(s) and then make sure that we only call
those within kernel_fpu_begin()/end() blocks.

Can you make the build system stick all those .o files in a single
archive? That's the only way I can do call validation; external
relocatoin records do not contain the section.


Need to double check that with the display team responsible for the 
code, but I think that shouldn't be much of a problem.



Annotating those function with __fpu or even saying to gcc that all code of
those files should go into a special text.fpu segment shouldn't be much of a
problem.

Guess what the __fpu attribute does ;-)


Good to know that my suspicion how this is implemented was correct :)


With the below patch (which is on to of newer versions of the objtool
patches send earlier, let me know if you want a full set


Getting a branch somewhere would be perfect.


) that only
converts a few files, but fully converts:

   drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c

But building it (and this is an absolute pain; when you're reworking
this, can you pretty please also fix the Makefiles?), we get:

   drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o: warning: 
objtool: dcn_validate_bandwidth()+0x34fa: FPU instruction outside of 
kernel_fpu_{begin,end}()

$ ./scripts/faddr2line 
defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o 
dcn_validate_bandwidth+0x34fa
dcn_validate_bandwidth+0x34fa/0x57ce:
dcn_validate_bandwidth at 
/usr/src/linux-2.6/defconfig-build/../drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1293
 (discriminator 5)

# ./objdump-func.sh 
defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o 
dcn_validate_bandwidth | grep 34fa
34fa 50fa:  f2 0f 10 b5 60 ff ffmovsd  -0xa0(%rbp),%xmm6

Which seems to indicate there's still problms with the current code.


Making an educated guess I would say the compiler has no idea that it 
shouldn't use instructions which touch fp registers outside of 
kernel_fpu_{begin,end}().


Going to talk with the display team about this whole topic internally 
once more. Since this discussion already raised attention in our 
technical management it shouldn't be to much of a problem to get 
manpower to get this fixed properly.


Can we put this new automated check will be behind a configuration flag 
initially? Or at least make it a warning and not a hard error.


Thanks,
Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-09 Thread Peter Zijlstra
On Thu, Apr 09, 2020 at 08:15:57PM +0200, Christian König wrote:
> Am 09.04.20 um 19:09 schrieb Peter Zijlstra:
> > On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote:
> > [SNIP]
> > > I'll need another approach, let me consider.
> > Christian; it says these files are generated, does that generator know
> > which functions are wholly in FPU context and which are not?
> 
> Well that "generator" is still a human being :)
> 
> It's just that the formulae for the calculation come from the hardware team
> and we are not able to easily transcript them to fixed point calculations.

Well, if it's a human, can this human respect the kernel coding style a
bit more :-) Some of that stuff is atrocious.

> > My current thinking is that if I annotate all functions that are wholly
> > inside kernel_fpu_start() with an __fpu function attribute, then I can
> > verify that any call from regular text to fpu text only happens inside
> > kernel_fpu_begin()/end(). And I can ensure that all !__fpu annotation
> > fuctions only contain !fpu instructions.
> 
> Yeah, that sounds like a good idea to me and should be easily doable.
> 
> > Can that generator add the __fpu function attribute or is that something
> > that would need to be done manually (which seems like it would be
> > painful, since it is quite a bit of code) ?
> 
> We are currently in the process of moving all the stuff which requires
> floating point into a single C file(s) and then make sure that we only call
> those within kernel_fpu_begin()/end() blocks.

Can you make the build system stick all those .o files in a single
archive? That's the only way I can do call validation; external
relocatoin records do not contain the section.

> Annotating those function with __fpu or even saying to gcc that all code of
> those files should go into a special text.fpu segment shouldn't be much of a
> problem.

Guess what the __fpu attribute does ;-)

With the below patch (which is on to of newer versions of the objtool
patches send earlier, let me know if you want a full set) that only
converts a few files, but fully converts:

  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c

But building it (and this is an absolute pain; when you're reworking
this, can you pretty please also fix the Makefiles?), we get:

  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o: warning: objtool: 
dcn_validate_bandwidth()+0x34fa: FPU instruction outside of 
kernel_fpu_{begin,end}()

$ ./scripts/faddr2line 
defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o 
dcn_validate_bandwidth+0x34fa
dcn_validate_bandwidth+0x34fa/0x57ce:
dcn_validate_bandwidth at 
/usr/src/linux-2.6/defconfig-build/../drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1293
 (discriminator 5)

# ./objdump-func.sh 
defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o 
dcn_validate_bandwidth | grep 34fa
34fa 50fa:  f2 0f 10 b5 60 ff ffmovsd  -0xa0(%rbp),%xmm6

Which seems to indicate there's still problms with the current code.



---
 arch/x86/include/asm/fpu/api.h | 12 +++
 arch/x86/kernel/vmlinux.lds.S  |  1 +
 .../gpu/drm/amd/display/dc/calcs/dcn_calc_math.c   | 25 +++---
 drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c   |  4 ++--
 .../display/dc/dml/dcn20/display_rq_dlg_calc_20.c  | 10 -
 .../amd/display/dc/dml/display_rq_dlg_helpers.c|  2 +-
 .../gpu/drm/amd/display/dc/dml/dml_common_defs.c   |  2 +-
 drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c|  2 +-
 drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c   | 10 -
 drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c   |  4 ++--
 drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h |  2 ++
 tools/objtool/check.c  |  7 +-
 tools/objtool/elf.h|  2 +-
 13 files changed, 52 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 64be4426fda9..19eaf98bbb0a 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -12,11 +12,23 @@
 #define _ASM_X86_FPU_API_H
 #include 

+#ifdef CONFIG_STACK_VALIDATION
+
+#define __fpu __section(".text.fpu")
+
 #define _ASM_ANNOTATE_FPU(at)  \
 ".pushsection .discard.fpu_safe\n" \
 ".long " #at " - .\n"  \
 ".popsection\n"\

+#else
+
+#define __fpu
+
+#define _ASM_ANNOTATE_FPU(at)
+
+#endif /* CONFIG_STACK_VALIDATION */
+
 #define annotate_fpu() ({  \
asm volatile("%c0:\n\t" \
 _ASM_ANNOTATE_FPU(%c0b)\
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 1bf7e312361f..8442f8633d07 100644

Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-09 Thread Christian König

Am 09.04.20 um 19:09 schrieb Peter Zijlstra:

On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote:
[SNIP]

I'll need another approach, let me consider.

Christian; it says these files are generated, does that generator know
which functions are wholly in FPU context and which are not?


Well that "generator" is still a human being :)

It's just that the formulae for the calculation come from the hardware 
team and we are not able to easily transcript them to fixed point 
calculations.



My current thinking is that if I annotate all functions that are wholly
inside kernel_fpu_start() with an __fpu function attribute, then I can
verify that any call from regular text to fpu text only happens inside
kernel_fpu_begin()/end(). And I can ensure that all !__fpu annotation
fuctions only contain !fpu instructions.


Yeah, that sounds like a good idea to me and should be easily doable.


Can that generator add the __fpu function attribute or is that something
that would need to be done manually (which seems like it would be
painful, since it is quite a bit of code) ?


We are currently in the process of moving all the stuff which requires 
floating point into a single C file(s) and then make sure that we only 
call those within kernel_fpu_begin()/end() blocks.


Annotating those function with __fpu or even saying to gcc that all code 
of those files should go into a special text.fpu segment shouldn't be 
much of a problem.


Regards,
Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-09 Thread Peter Zijlstra
On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 02, 2020 at 04:13:08PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 02, 2020 at 09:33:54AM +0200, Christian König wrote:
> 
> > > yes, using the floating point calculations in the display code has been a
> > > source of numerous problems and confusion in the past.
> > > 
> > > The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind the
> > > DC_FP_START() and DC_FP_END() macros which are supposed to hide the
> > > architecture depend handling for x86 and PPC64.
> > > 
> > > This originated from the graphics block integrated into AMD CPU (where we
> > > knew which fp unit we had), but as far as I know is now also used for
> > > dedicated AMD GPUs as well.
> > > 
> > > I'm not really a fan of this either, but so far we weren't able to 
> > > convince
> > > the hardware engineers to not use floating point calculations for the
> > > display stuff.

> I'll need another approach, let me consider.

Christian; it says these files are generated, does that generator know
which functions are wholly in FPU context and which are not?

My current thinking is that if I annotate all functions that are wholly
inside kernel_fpu_start() with an __fpu function attribute, then I can
verify that any call from regular text to fpu text only happens inside
kernel_fpu_begin()/end(). And I can ensure that all !__fpu annotation
fuctions only contain !fpu instructions.

Can that generator add the __fpu function attribute or is that something
that would need to be done manually (which seems like it would be
painful, since it is quite a bit of code) ?
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-09 Thread Peter Zijlstra
On Thu, Apr 02, 2020 at 04:13:08PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 02, 2020 at 09:33:54AM +0200, Christian König wrote:

> > yes, using the floating point calculations in the display code has been a
> > source of numerous problems and confusion in the past.
> > 
> > The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind the
> > DC_FP_START() and DC_FP_END() macros which are supposed to hide the
> > architecture depend handling for x86 and PPC64.
> > 
> > This originated from the graphics block integrated into AMD CPU (where we
> > knew which fp unit we had), but as far as I know is now also used for
> > dedicated AMD GPUs as well.
> > 
> > I'm not really a fan of this either, but so far we weren't able to convince
> > the hardware engineers to not use floating point calculations for the
> > display stuff.
> 
> Might I complain that:
> 
>   make O=allmodconfig-build drivers/gpu/drm/amd/display/dc/
> 
> does not in fact work?

Worse; allmodconfig doesn't select these, and hence I did not in fact
build-test them for a while :/

Anyway, I now have a config that includes them and I get plenty fail
with my objtool patch. In part because this is spread over multiple
object files and in part because of the forrest of indirect calls Jann
already mentioned.

The multi-unit issue can be fixed by simply sticking all the related .o
files in an archive and running objtool on that, but the pointer crap is
much harder.

I'll need another approach, let me consider.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-07 Thread Peter Zijlstra
On Wed, Apr 08, 2020 at 12:41:11AM +0900, Masami Hiramatsu wrote:
> On Tue, 7 Apr 2020 13:15:35 +0200
> Peter Zijlstra  wrote:

> > > > Also, all the VMX bits seems to qualify as FPU (I can't remember seeing
> > > > that previously):
> > > 
> > > Oops, let me check it.
> > 
> > I just send you another patch that could do with insn_is_vmx()
> > (sorry!!!)
> 
> Hmm, it is hard to find out the vmx insns. Maybe we need to clarify it by
> opcode pattern. (like "VM.*")

Yeah, I know. Maybe I should just keep it as I have for now.

One thing I thought of is we could perhaps add manual markers in
x86-opcode-map.txt. The '{','}' characters appear unused so far, we
perhaps we can use them to classify things.

That could maybe replace "mmx_expr" as well. That is, something like so:

---

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..e01b76e0a294 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -462,9 +462,9 @@ AVXcode: 1
 75: pcmpeqw Pq,Qq | vpcmpeqw Vx,Hx,Wx (66),(v1)
 76: pcmpeqd Pq,Qq | vpcmpeqd Vx,Hx,Wx (66),(v1)
 # Note: Remove (v), because vzeroall and vzeroupper becomes emms without VEX.
-77: emms | vzeroupper | vzeroall
-78: VMREAD Ey,Gy | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx 
(F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
-79: VMWRITE Gy,Ey | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx 
(F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
+77: emms {FPU} | vzeroupper | vzeroall
+78: VMREAD Ey,Gy {VMX} | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx 
(F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
+79: VMWRITE Gy,Ey {VMX} | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx 
(F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
 7a: vcvtudq2pd/uqq2pd Vpd,Wx (F3),(ev) | vcvtudq2ps/uqq2ps Vpd,Wx (F2),(ev) | 
vcvttps2qq/pd2qq Vx,Wx (66),(ev)
 7b: vcvtusi2sd Vpd,Hpd,Ev (F2),(ev) | vcvtusi2ss Vps,Hps,Ev (F3),(ev) | 
vcvtps2qq/pd2qq Vx,Wx (66),(ev)
 7c: vhaddpd Vpd,Hpd,Wpd (66) | vhaddps Vps,Hps,Wps (F2)
@@ -965,9 +965,9 @@ GrpTable: Grp6
 EndTable
 
 GrpTable: Grp7
-0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) 
| VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B)
+0: SGDT Ms | VMCALL (001),(11B) {VMX} | VMLAUNCH (010),(11B) {VMX} | VMRESUME 
(011),(11B) {VMX} | VMXOFF (100),(11B) {VMX} | PCONFIG (101),(11B) | ENCLV 
(000),(11B)
 1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC 
(011),(11B) | ENCLS (111),(11B)
-2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | 
XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
+2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) 
{VMX} | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
 3: LIDT Ms
 4: SMSW Mw/Rv
 5: rdpkru (110),(11B) | wrpkru (111),(11B) | SAVEPREVSSP (F3),(010),(11B) | 
RSTORSSP Mq (F3) | SETSSBSY (F3),(000),(11B)
@@ -987,8 +987,8 @@ GrpTable: Grp9
 3: xrstors
 4: xsavec
 5: xsaves
-6: VMPTRLD Mq | VMCLEAR Mq (66) | VMXON Mq (F3) | RDRAND Rv (11B)
-7: VMPTRST Mq | VMPTRST Mq (F3) | RDSEED Rv (11B)
+6: VMPTRLD Mq {VMX} | VMCLEAR Mq (66) {VMX} | VMXON Mq (F3) {VMX} | RDRAND Rv 
(11B)
+7: VMPTRST Mq {VMX} | VMPTRST Mq (F3) {VMX} | RDSEED Rv (11B)
 EndTable
 
 GrpTable: Grp10
@@ -1036,10 +1036,10 @@ GrpTable: Grp14
 EndTable
 
 GrpTable: Grp15
-0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+0: fxsave {FPU} | RDFSBASE Ry (F3),(11B)
+1: fxrstor {FPU} | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr {FPU} | vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
+3: stmxcsr {FPU} | vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv 
(F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-07 Thread Masami Hiramatsu
On Tue, 7 Apr 2020 13:15:35 +0200
Peter Zijlstra  wrote:

> On Tue, Apr 07, 2020 at 06:50:08PM +0900, Masami Hiramatsu wrote:
> > On Mon, 6 Apr 2020 12:21:07 +0200
> > Peter Zijlstra  wrote:
> 
> > > arch/x86/mm/extable.o: warning: objtool: ex_handler_fprestore()+0x8b: 
> > > fpu_safe hint not an FPU instruction
> > > 008b  36b:  48 0f ae 0d 00 00 00fxrstor64 0x0(%rip)# 373 
> > > 
> > > 
> > > arch/x86/kvm/x86.o: warning: objtool: kvm_load_guest_fpu.isra.0()+0x1fa: 
> > > fpu_safe hint not an FPU instruction
> > > 01fa1d2fa:  48 0f ae 4b 40  fxrstor64 0x40(%rbx)
> > 
> > Ah, fxstor will not chang the FPU/MMX/SSE regs but just store it on memory.
> > OK, I'll remove it from the list.
> 
> Yeah, I don't much care if its in or out, but the way I was reading that
> patch it _should_ be in, but then it doesn't seem to recognise it.

Oops, I misread. OK. I fixed the issue.

> 
> > > Also, all the VMX bits seems to qualify as FPU (I can't remember seeing
> > > that previously):
> > 
> > Oops, let me check it.
> 
> I just send you another patch that could do with insn_is_vmx()
> (sorry!!!)

Hmm, it is hard to find out the vmx insns. Maybe we need to clarify it by
opcode pattern. (like "VM.*")

Thank you,

-- 
Masami Hiramatsu 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-07 Thread Peter Zijlstra
On Tue, Apr 07, 2020 at 06:50:08PM +0900, Masami Hiramatsu wrote:
> On Mon, 6 Apr 2020 12:21:07 +0200
> Peter Zijlstra  wrote:

> > arch/x86/mm/extable.o: warning: objtool: ex_handler_fprestore()+0x8b: 
> > fpu_safe hint not an FPU instruction
> > 008b  36b:  48 0f ae 0d 00 00 00fxrstor64 0x0(%rip)# 373 
> > 
> > 
> > arch/x86/kvm/x86.o: warning: objtool: kvm_load_guest_fpu.isra.0()+0x1fa: 
> > fpu_safe hint not an FPU instruction
> > 01fa1d2fa:  48 0f ae 4b 40  fxrstor64 0x40(%rbx)
> 
> Ah, fxstor will not chang the FPU/MMX/SSE regs but just store it on memory.
> OK, I'll remove it from the list.

Yeah, I don't much care if its in or out, but the way I was reading that
patch it _should_ be in, but then it doesn't seem to recognise it.

> > Also, all the VMX bits seems to qualify as FPU (I can't remember seeing
> > that previously):
> 
> Oops, let me check it.

I just send you another patch that could do with insn_is_vmx()
(sorry!!!)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-06 Thread Peter Zijlstra
On Sun, Apr 05, 2020 at 12:19:30PM +0900, Masami Hiramatsu wrote:

> > @@ -269,14 +269,14 @@ d4: AAM Ib (i64)
> >  d5: AAD Ib (i64)
> >  d6:
> >  d7: XLAT/XLATB
> > -d8: ESC
> > -d9: ESC
> > -da: ESC
> > -db: ESC
> > -dc: ESC
> > -dd: ESC
> > -de: ESC
> > -df: ESC
> > +d8: FPU
> > +d9: FPU
> > +da: FPU
> > +db: FPU
> > +dc: FPU
> > +dd: FPU
> > +de: FPU
> > +df: FPU
> 
> I don't want to use FPU since Intel SDM is still using ESC because it
> is co-processor escape code.

But we all know that co-processor is x87. Can we then perhaps put in
'x87' as an escape code instead of 'ESC' ?

> Here is the new patch. 
> 
> From d7eca4946ab3f0d08ad1268f49418f8655aaf57c Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu 
> Date: Fri, 3 Apr 2020 16:58:22 +0900
> Subject: [PATCH] x86: insn: Add insn_is_fpu()
> 
> Add insn_is_fpu(insn) which tells that the insn is
> whether touch the MMX/XMM/YMM register or the instruction
> of FP coprocessor.
> 
> Signed-off-by: Masami Hiramatsu 

arch/x86/mm/extable.o: warning: objtool: ex_handler_fprestore()+0x8b: fpu_safe 
hint not an FPU instruction
008b  36b:  48 0f ae 0d 00 00 00fxrstor64 0x0(%rip)# 373 


arch/x86/kvm/x86.o: warning: objtool: kvm_load_guest_fpu.isra.0()+0x1fa: 
fpu_safe hint not an FPU instruction
01fa1d2fa:  48 0f ae 4b 40  fxrstor64 0x40(%rbx)



Also, all the VMX bits seems to qualify as FPU (I can't remember seeing
that previously):

arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_apic_eoi_induced()+0x20: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_apic_write()+0x20: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_invlpg()+0x20: FPU instruction 
outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_interrupt_shadow()+0x1c: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_decache_cr4_guest_bits()+0x5a: 
FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_decache_cr0_guest_bits()+0x5a: 
FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_io()+0x24: FPU instruction 
outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_apic_access()+0x39: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_idt()+0x57: FPU instruction 
outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_exit_info()+0x58: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_gdt()+0x57: FPU instruction 
outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_guest_apic_has_interrupt()+0xf8: 
FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_nmi_allowed()+0x98: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_handle_exit_irqoff()+0xb3: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_nmi_mask()+0x8a: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_rflags()+0x99: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_ept_misconfig()+0x22: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_interrupt_allowed()+0x8d: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_write_pml_buffer()+0x1c5: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_invpcid()+0x26a: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_read_guest_seg_ar()+0x9b: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_read_guest_seg_selector()+0x96: 
FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_read_guest_seg_base()+0x9b: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_segment()+0x2da: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmwrite_error()+0x161: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: exec_controls_set.isra.0()+0x5a: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_dr()+0x1bc: FPU instruction 
outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: update_exception_bitmap()+0x136: FPU 
instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_interrupt_shadow()+0x8b: FPU 
instruction outside of kernel_fpu_{begin,end}()

Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-05 Thread Randy Dunlap
On 4/3/20 8:08 PM, Masami Hiramatsu wrote:
> +static inline int insn_is_fpu(struct insn *insn)
> +{
> + if (!insn->opcode.got)
> + insn_get_opcode(insn);
> + if (inat_is_fpu(insn->attr)) {
> + if (insn->attr & INAT_FPUIFVEX)
> + return insn_is_avx(insn);
> + return 1;
> + }

return 0; // ??

> +}
> +


-- 
~Randy

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-05 Thread Peter Zijlstra
On Sat, Apr 04, 2020 at 12:08:08PM +0900, Masami Hiramatsu wrote:
> From c609be0b6403245612503fca1087628655bab96c Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu 
> Date: Fri, 3 Apr 2020 16:58:22 +0900
> Subject: [PATCH] x86: insn: Add insn_is_fpu()
> 
> Add insn_is_fpu(insn) which tells that the insn is
> whether touch the MMX/XMM/YMM register or the instruction
> of FP coprocessor.
> 
> Signed-off-by: Masami Hiramatsu 

With that I get a lot of warnings:

  FPU instruction outside of kernel_fpu_{begin,end}()

two random examples (x86-64-allmodconfig build):

arch/x86/xen/enlighten.o: warning: objtool: xen_vcpu_restore()+0x341: FPU 
instruction outside of kernel_fpu_{begin,end}()

$ ./objdump-func.sh defconfig-build/arch/x86/xen/enlighten.o xen_vcpu_restore | 
grep 341
0341  841:  0f 92 c3setb   %bl

arch/x86/events/core.o: warning: objtool: x86_pmu_stop()+0x6d: FPU instruction 
outside of kernel_fpu_{begin,end}()

$ ./objdump-func.sh defconfig-build/arch/x86/events/core.o x86_pmu_stop | grep 
6d
006d 23ad:  41 0f 92 c6 setb   %r14b

Which seems to suggest something goes wobbly with SETB, but I'm not
seeing what in a hurry.


---
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -12,6 +12,13 @@
 #define _ASM_X86_FPU_API_H
 #include 

+#define annotate_fpu() ({  \
+   asm volatile("%c0:\n\t" \
+".pushsection .discard.fpu_safe\n\t"   \
+".long %c0b - .\n\t"   \
+".popsection\n\t" : : "i" (__COUNTER__));  \
+})
+
 /*
  * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
  * disables preemption so be careful if you intend to use it for long periods
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -437,6 +437,7 @@ static inline int copy_fpregs_to_fpstate
 * Legacy FPU register saving, FNSAVE always clears FPU registers,
 * so we have to mark them inactive:
 */
+   annotate_fpu();
asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));

return 0;
@@ -462,6 +463,7 @@ static inline void copy_kernel_to_fpregs
 * "m" is a random variable that should be in L1.
 */
if (unlikely(static_cpu_has_bug(X86_BUG_FXSAVE_LEAK))) {
+   annotate_fpu();
asm volatile(
"fnclex\n\t"
"emms\n\t"
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,10 @@ static void fpu__init_cpu_generic(void)
fpstate_init_soft(>thread.fpu.state.soft);
else
 #endif
+   {
+   annotate_fpu();
asm volatile ("fninit");
+   }
 }

 /*
@@ -61,6 +64,7 @@ static bool fpu__probe_without_cpuid(voi
cr0 &= ~(X86_CR0_TS | X86_CR0_EM);
write_cr0(cr0);

+   annotate_fpu();
asm volatile("fninit ; fnstsw %0 ; fnstcw %1" : "+m" (fsw), "+m" (fcw));

pr_info("x86/fpu: Probing for FPU: FSW=0x%04hx FCW=0x%04hx\n", fsw, 
fcw);
--- a/tools/objtool/arch.h
+++ b/tools/objtool/arch.h
@@ -27,6 +27,7 @@ enum insn_type {
INSN_CLAC,
INSN_STD,
INSN_CLD,
+   INSN_FPU,
INSN_OTHER,
 };

--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -92,6 +92,11 @@ int arch_decode_instruction(struct elf *
*len = insn.length;
*type = INSN_OTHER;

+   if (insn_is_fpu()) {
+   *type = INSN_FPU;
+   return 0;
+   }
+
if (insn.vex_prefix.nbytes)
return 0;

@@ -357,48 +362,54 @@ int arch_decode_instruction(struct elf *

case 0x0f:

-   if (op2 == 0x01) {
-
+   switch (op2) {
+   case 0x01:
if (modrm == 0xca)
*type = INSN_CLAC;
else if (modrm == 0xcb)
*type = INSN_STAC;
+   break;

-   } else if (op2 >= 0x80 && op2 <= 0x8f) {
-
+   case 0x80 ... 0x8f: /* Jcc */
*type = INSN_JUMP_CONDITIONAL;
+   break;

-   } else if (op2 == 0x05 || op2 == 0x07 || op2 == 0x34 ||
-  op2 == 0x35) {
-
-   /* sysenter, sysret */
+   case 0x05: /* syscall */
+   case 0x07: /* sysret */
+   case 0x34: /* sysenter */
+   case 0x35: /* sysexit */
*type = INSN_CONTEXT_SWITCH;
+   break;

-   } else if (op2 == 0x0b || op2 == 0xb9) {
-
-   /* ud2 */
+   case 0xff: /* ud0 */
+   case 0xb9: /* ud1 */
+   case 0x0b: /* ud2 */
*type = 

Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-04 Thread Masami Hiramatsu
On Sat, 4 Apr 2020 16:36:20 +0200
Peter Zijlstra  wrote:

> On Sat, Apr 04, 2020 at 12:08:08PM +0900, Masami Hiramatsu wrote:
> > From c609be0b6403245612503fca1087628655bab96c Mon Sep 17 00:00:00 2001
> > From: Masami Hiramatsu 
> > Date: Fri, 3 Apr 2020 16:58:22 +0900
> > Subject: [PATCH] x86: insn: Add insn_is_fpu()
> > 
> > Add insn_is_fpu(insn) which tells that the insn is
> > whether touch the MMX/XMM/YMM register or the instruction
> > of FP coprocessor.
> > 
> > Signed-off-by: Masami Hiramatsu 
> 
> With that I get a lot of warnings:
> 
>   FPU instruction outside of kernel_fpu_{begin,end}()
> 
> two random examples (x86-64-allmodconfig build):
> 
> arch/x86/xen/enlighten.o: warning: objtool: xen_vcpu_restore()+0x341: FPU 
> instruction outside of kernel_fpu_{begin,end}()
> 
> $ ./objdump-func.sh defconfig-build/arch/x86/xen/enlighten.o xen_vcpu_restore 
> | grep 341
> 0341  841:  0f 92 c3setb   %bl
> 
> arch/x86/events/core.o: warning: objtool: x86_pmu_stop()+0x6d: FPU 
> instruction outside of kernel_fpu_{begin,end}()
> 
> $ ./objdump-func.sh defconfig-build/arch/x86/events/core.o x86_pmu_stop | 
> grep 6d
> 006d 23ad:  41 0f 92 c6 setb   %r14b
> 
> Which seems to suggest something goes wobbly with SETB, but I'm not
> seeing what in a hurry.

Yes, I also got same issue, please try the new one.

Thank you!

> 
> 
> ---
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h
> @@ -12,6 +12,13 @@
>  #define _ASM_X86_FPU_API_H
>  #include 
> 
> +#define annotate_fpu() ({\
> + asm volatile("%c0:\n\t" \
> +  ".pushsection .discard.fpu_safe\n\t"   \
> +  ".long %c0b - .\n\t"   \
> +  ".popsection\n\t" : : "i" (__COUNTER__));  \
> +})
> +
>  /*
>   * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
>   * disables preemption so be careful if you intend to use it for long periods
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -437,6 +437,7 @@ static inline int copy_fpregs_to_fpstate
>* Legacy FPU register saving, FNSAVE always clears FPU registers,
>* so we have to mark them inactive:
>*/
> + annotate_fpu();
>   asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));
> 
>   return 0;
> @@ -462,6 +463,7 @@ static inline void copy_kernel_to_fpregs
>* "m" is a random variable that should be in L1.
>*/
>   if (unlikely(static_cpu_has_bug(X86_BUG_FXSAVE_LEAK))) {
> + annotate_fpu();
>   asm volatile(
>   "fnclex\n\t"
>   "emms\n\t"
> --- a/arch/x86/kernel/fpu/init.c
> +++ b/arch/x86/kernel/fpu/init.c
> @@ -38,7 +38,10 @@ static void fpu__init_cpu_generic(void)
>   fpstate_init_soft(>thread.fpu.state.soft);
>   else
>  #endif
> + {
> + annotate_fpu();
>   asm volatile ("fninit");
> + }
>  }
> 
>  /*
> @@ -61,6 +64,7 @@ static bool fpu__probe_without_cpuid(voi
>   cr0 &= ~(X86_CR0_TS | X86_CR0_EM);
>   write_cr0(cr0);
> 
> + annotate_fpu();
>   asm volatile("fninit ; fnstsw %0 ; fnstcw %1" : "+m" (fsw), "+m" (fcw));
> 
>   pr_info("x86/fpu: Probing for FPU: FSW=0x%04hx FCW=0x%04hx\n", fsw, 
> fcw);
> --- a/tools/objtool/arch.h
> +++ b/tools/objtool/arch.h
> @@ -27,6 +27,7 @@ enum insn_type {
>   INSN_CLAC,
>   INSN_STD,
>   INSN_CLD,
> + INSN_FPU,
>   INSN_OTHER,
>  };
> 
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -92,6 +92,11 @@ int arch_decode_instruction(struct elf *
>   *len = insn.length;
>   *type = INSN_OTHER;
> 
> + if (insn_is_fpu()) {
> + *type = INSN_FPU;
> + return 0;
> + }
> +
>   if (insn.vex_prefix.nbytes)
>   return 0;
> 
> @@ -357,48 +362,54 @@ int arch_decode_instruction(struct elf *
> 
>   case 0x0f:
> 
> - if (op2 == 0x01) {
> -
> + switch (op2) {
> + case 0x01:
>   if (modrm == 0xca)
>   *type = INSN_CLAC;
>   else if (modrm == 0xcb)
>   *type = INSN_STAC;
> + break;
> 
> - } else if (op2 >= 0x80 && op2 <= 0x8f) {
> -
> + case 0x80 ... 0x8f: /* Jcc */
>   *type = INSN_JUMP_CONDITIONAL;
> + break;
> 
> - } else if (op2 == 0x05 || op2 == 0x07 || op2 == 0x34 ||
> -op2 == 0x35) {
> -
> - /* sysenter, sysret */
> + case 0x05: /* syscall */
> + case 0x07: /* sysret */
> + case 0x34: /* sysenter */
> + case 0x35: /* sysexit */
>   *type = 

Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-04 Thread Masami Hiramatsu
On Sat, 4 Apr 2020 16:32:24 +0200
Peter Zijlstra  wrote:

> On Sat, Apr 04, 2020 at 12:08:08PM +0900, Masami Hiramatsu wrote:
> > From c609be0b6403245612503fca1087628655bab96c Mon Sep 17 00:00:00 2001
> > From: Masami Hiramatsu 
> > Date: Fri, 3 Apr 2020 16:58:22 +0900
> > Subject: [PATCH] x86: insn: Add insn_is_fpu()
> > 
> > Add insn_is_fpu(insn) which tells that the insn is
> > whether touch the MMX/XMM/YMM register or the instruction
> > of FP coprocessor.
> 
> Looks good, although I changed it a little like so:

OK, and I found there is a mistake on my patch. I should not use (v) for
the instruction, which makes decoder insane.

> 
> --- a/arch/x86/include/asm/insn.h
> +++ b/arch/x86/include/asm/insn.h
> @@ -133,11 +133,12 @@ static inline int insn_is_fpu(struct ins
>  {
>   if (!insn->opcode.got)
>   insn_get_opcode(insn);
> - if (inat_is_fpu(insn->attr)) {
> + if (inat_is_fpu(insn->attr)) {
>   if (insn->attr & INAT_FPUIFVEX)
>   return insn_is_avx(insn);
>   return 1;
>   }
> + return 0;
>  }
>  
>  static inline int insn_has_emulate_prefix(struct insn *insn)
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -269,14 +269,14 @@ d4: AAM Ib (i64)
>  d5: AAD Ib (i64)
>  d6:
>  d7: XLAT/XLATB
> -d8: ESC
> -d9: ESC
> -da: ESC
> -db: ESC
> -dc: ESC
> -dd: ESC
> -de: ESC
> -df: ESC
> +d8: FPU
> +d9: FPU
> +da: FPU
> +db: FPU
> +dc: FPU
> +dd: FPU
> +de: FPU
> +df: FPU

I don't want to use FPU since Intel SDM is still using ESC because it
is co-processor escape code.

Here is the new patch. 

>From d7eca4946ab3f0d08ad1268f49418f8655aaf57c Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu 
Date: Fri, 3 Apr 2020 16:58:22 +0900
Subject: [PATCH] x86: insn: Add insn_is_fpu()

Add insn_is_fpu(insn) which tells that the insn is
whether touch the MMX/XMM/YMM register or the instruction
of FP coprocessor.

Signed-off-by: Masami Hiramatsu 
---
 Changes:
  - Fix SET* also not FPU (unless it has vex prefix.)
  - Fix to remove (v) (VEX only) flag.
---
 arch/x86/include/asm/inat.h|  7 +++
 arch/x86/include/asm/insn.h| 12 
 arch/x86/lib/x86-opcode-map.txt|  6 +++---
 arch/x86/tools/gen-insn-attr-x86.awk   | 22 +-
 tools/arch/x86/include/asm/inat.h  |  7 +++
 tools/arch/x86/include/asm/insn.h  | 12 
 tools/arch/x86/lib/x86-opcode-map.txt  |  6 +++---
 tools/arch/x86/tools/gen-insn-attr-x86.awk | 22 +-
 8 files changed, 78 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 4cf2ad521f65..ffce45178c08 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -77,6 +77,8 @@
 #define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY   (1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY  (1 << (INAT_FLAG_OFFS + 7))
+#define INAT_FPU   (1 << (INAT_FLAG_OFFS + 8))
+#define INAT_FPUIFVEX  (1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)  (pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)  (esc << INAT_ESC_OFFS)
@@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_is_fpu(insn_attr_t attr)
+{
+   return attr & INAT_FPU;
+}
 #endif
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 5c1ae3eff9d4..1752c54d2103 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -129,6 +129,18 @@ static inline int insn_is_evex(struct insn *insn)
return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_is_fpu(struct insn *insn)
+{
+   if (!insn->opcode.got)
+   insn_get_opcode(insn);
+   if (inat_is_fpu(insn->attr)) {
+   if (insn->attr & INAT_FPUIFVEX)
+   return insn_is_avx(insn);
+   return 1;
+   }
+   return 0;
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
return !!insn->emulate_prefix_size;
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..c3d36b4c894d 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -1037,9 +1037,9 @@ EndTable
 
 GrpTable: Grp15
 0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+1: fxrstor | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr | vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
+3: stmxcsr | vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv 
(F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
diff --git 

Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-04 Thread Masami Hiramatsu
On Fri, 3 Apr 2020 20:15:11 -0700
Randy Dunlap  wrote:

> On 4/3/20 8:08 PM, Masami Hiramatsu wrote:
> > +static inline int insn_is_fpu(struct insn *insn)
> > +{
> > +   if (!insn->opcode.got)
> > +   insn_get_opcode(insn);
> > +   if (inat_is_fpu(insn->attr)) {
> > +   if (insn->attr & INAT_FPUIFVEX)
> > +   return insn_is_avx(insn);
> > +   return 1;
> > +   }
> 
>   return 0; // ??

Oops, right!

Hm, I need to add a caller for this API...

Thanks,

> 
> > +}
> > +
> 
> 
> -- 
> ~Randy
> 


-- 
Masami Hiramatsu 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-03 Thread Masami Hiramatsu
On Fri, 3 Apr 2020 13:21:13 +0200
Peter Zijlstra  wrote:

> On Fri, Apr 03, 2020 at 02:28:37PM +0900, Masami Hiramatsu wrote:
> > On Thu, 2 Apr 2020 16:13:08 +0200
> > Peter Zijlstra  wrote:
> 
> > > Masami, Boris, is there any semi-sane way we can have insn_is_fpu() ?
> > > While digging through various opcode manuals is of course forever fun, I
> > > do feel like it might not be the best way.
> > 
> > Yes, it is possible to add INAT_FPU and insn_is_fpu().
> > But it seems that the below patch needs more classification based on
> > nmemonic or opcodes.
> 
> I went with opcode, and I think I did a fairly decent job, but I did
> find a few problems on a second look at things.
> 
> I don't think nmemonic are going to help, the x86 nmemonics are a mess
> (much like its opcode tables), there's no way to sanely detect what
> registers are effected by an instruction based on name.
> 
> The best I came up with is operand class, see below.

Yeah, so we need another map, current inat map is optimized for
decoding, and lack of some information for reducing size.
E.g. it mixed up the VEX prefix instruction with non-VEX one.

> 
> > IMHO, it is the time to expand gen-insn-attr.awk or clone it to
> > generate another opcode map, so that user will easily extend the
> > insn infrastructure.
> > (e.g. I had made an in-kernel disassembler, which generates a mnemonic
> >  maps from x86-opcode-map.txt)
> >  https://github.com/mhiramat/linux/commits/inkernel-disasm-20130414
> 
> Cute, and I'm thinking we might want that eventually, people have been
> asking for a kernel specific objdump, one that knows about and shows all
> the magical things the kernel does, like alternative, jump-labels and
> soon the static_call stuff, but also things like the exception handling.
> 
> Objtool actually knows about much of that, and pairing it with your
> disassembler could print it.
> 
> > > + if (insn.vex_prefix.nbytes) {
> > > + *type = INSN_FPU;
> > >   return 0;
> > > + }
> 
> So that's the AVX nonsense dealt with; right until they stick an integer
> instruction in the AVX space I suppose :/ Please tell me they didn't
> already do that..

I'm not so sure.
Theoretically, x86 instruction can be encoded with VEX prefix instead of
REX prefix (most compiler may not output such inefficient code.)

> > >   op1 = insn.opcode.bytes[0];
> > >   op2 = insn.opcode.bytes[1];
> > > @@ -357,48 +359,71 @@ int arch_decode_instruction(struct elf *elf, struct 
> > > section *sec,
> > >  
> > >   case 0x0f:
> > >  
> > > + switch (op2) {
> 
> > > + case 0xae:
> > > + /* insane!! */
> > > + if ((modrm_reg >= 0 && modrm_reg <= 3) && modrm_mod != 
> > > 3 && !insn.prefixes.nbytes)
> > > + *type = INSN_FPU;
> > > + break;
> 
> This is crazy, but I was trying to get at the x86 FPU control
> instructions:
> 
>   FXSAVE, FXRSTOR, LDMXCSR and STMXCSR
> 
> Which are in Grp15

Yes, that is a complex part.

> Now arguably, I could skip them, the compiler should never emit those,
> and the newer, fancier, XSAV family isn't marked as FPU either, even
> though it will save/restore the FPU/MMX/SSE/AVX states too.
> 
> So I think I'll remove this part, it'll also make the fpu_safe
> annotations easier.
> 
> > > + case 0x10 ... 0x17:
> > > + case 0x28 ... 0x2f:
> > > + case 0x3a:
> > > + case 0x50 ... 0x77:
> > > + case 0x7a ... 0x7f:
> > > + case 0xc2:
> > > + case 0xc4 ... 0xc6:
> > > + case 0xd0 ... 0xff:
> > > + /* MMX, SSE, VMX */
> 
> So afaict these are the MMX and SSE instruction (clearly the VMX is my
> brain loosing it).
> 
> I went with the coder64 opcode tables, but our x86-opcode-map.txt seems
> to agree, mostly.
> 
> I now see that 0f 3a is not all mmx/sse, it also includes RORX which is
> an integer instruction. Also, may I state that the opcode map is a
> sodding disgrace? Why is an integer instruction stuck in the middle of
> SSE instructions like that ?!?!
> 
> And I should shorten the last range to 0xd0 ... 0xfe, as 0f ff is UD0.
> 
> Other than that I think this is pretty accurate.
> 
> > > + *type = INSN_FPU;
> > > + break;
> > > +
> > > + default:
> > > + break;
> > > + }
> > >   break;
> > >  
> > >   case 0xc9:
> > > @@ -414,6 +439,10 @@ int arch_decode_instruction(struct elf *elf, struct 
> > > section *sec,
> > >  
> > >   break;
> > >  
> > > + case 0xd8 ... 0xdf: /* x87 FPU range */
> > > + *type = INSN_FPU;
> > > + break;
> 
> Our x86-opcode-map.txt lists that as ESC, but doesn't have an escape
> table for it. Per:
> 
>   http://ref.x86asm.net/coder64.html
> 
> these are all the traditional x87 FPU ops.

Yes, for decoding, we don't need those tables.

> > > +
> > >   case 0xe3:
> > >   /* jecxz/jrcxz */
> > >   *type = INSN_JUMP_CONDITIONAL;
> 
> 
> 

Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-03 Thread Peter Zijlstra
On Fri, Apr 03, 2020 at 02:28:37PM +0900, Masami Hiramatsu wrote:
> On Thu, 2 Apr 2020 16:13:08 +0200
> Peter Zijlstra  wrote:

> > Masami, Boris, is there any semi-sane way we can have insn_is_fpu() ?
> > While digging through various opcode manuals is of course forever fun, I
> > do feel like it might not be the best way.
> 
> Yes, it is possible to add INAT_FPU and insn_is_fpu().
> But it seems that the below patch needs more classification based on
> nmemonic or opcodes.

I went with opcode, and I think I did a fairly decent job, but I did
find a few problems on a second look at things.

I don't think nmemonic are going to help, the x86 nmemonics are a mess
(much like its opcode tables), there's no way to sanely detect what
registers are effected by an instruction based on name.

The best I came up with is operand class, see below.

> IMHO, it is the time to expand gen-insn-attr.awk or clone it to
> generate another opcode map, so that user will easily extend the
> insn infrastructure.
> (e.g. I had made an in-kernel disassembler, which generates a mnemonic
>  maps from x86-opcode-map.txt)
>  https://github.com/mhiramat/linux/commits/inkernel-disasm-20130414

Cute, and I'm thinking we might want that eventually, people have been
asking for a kernel specific objdump, one that knows about and shows all
the magical things the kernel does, like alternative, jump-labels and
soon the static_call stuff, but also things like the exception handling.

Objtool actually knows about much of that, and pairing it with your
disassembler could print it.

> > +   if (insn.vex_prefix.nbytes) {
> > +   *type = INSN_FPU;
> > return 0;
> > +   }

So that's the AVX nonsense dealt with; right until they stick an integer
instruction in the AVX space I suppose :/ Please tell me they didn't
already do that..

> >  
> > op1 = insn.opcode.bytes[0];
> > op2 = insn.opcode.bytes[1];
> > @@ -357,48 +359,71 @@ int arch_decode_instruction(struct elf *elf, struct 
> > section *sec,
> >  
> > case 0x0f:
> >  
> > +   switch (op2) {

> > +   case 0xae:
> > +   /* insane!! */
> > +   if ((modrm_reg >= 0 && modrm_reg <= 3) && modrm_mod != 
> > 3 && !insn.prefixes.nbytes)
> > +   *type = INSN_FPU;
> > +   break;

This is crazy, but I was trying to get at the x86 FPU control
instructions:

  FXSAVE, FXRSTOR, LDMXCSR and STMXCSR

Which are in Grp15

Now arguably, I could skip them, the compiler should never emit those,
and the newer, fancier, XSAV family isn't marked as FPU either, even
though it will save/restore the FPU/MMX/SSE/AVX states too.

So I think I'll remove this part, it'll also make the fpu_safe
annotations easier.

> > +   case 0x10 ... 0x17:
> > +   case 0x28 ... 0x2f:
> > +   case 0x3a:
> > +   case 0x50 ... 0x77:
> > +   case 0x7a ... 0x7f:
> > +   case 0xc2:
> > +   case 0xc4 ... 0xc6:
> > +   case 0xd0 ... 0xff:
> > +   /* MMX, SSE, VMX */

So afaict these are the MMX and SSE instruction (clearly the VMX is my
brain loosing it).

I went with the coder64 opcode tables, but our x86-opcode-map.txt seems
to agree, mostly.

I now see that 0f 3a is not all mmx/sse, it also includes RORX which is
an integer instruction. Also, may I state that the opcode map is a
sodding disgrace? Why is an integer instruction stuck in the middle of
SSE instructions like that ?!?!

And I should shorten the last range to 0xd0 ... 0xfe, as 0f ff is UD0.

Other than that I think this is pretty accurate.

> > +   *type = INSN_FPU;
> > +   break;
> > +
> > +   default:
> > +   break;
> > +   }
> > break;
> >  
> > case 0xc9:
> > @@ -414,6 +439,10 @@ int arch_decode_instruction(struct elf *elf, struct 
> > section *sec,
> >  
> > break;
> >  
> > +   case 0xd8 ... 0xdf: /* x87 FPU range */
> > +   *type = INSN_FPU;
> > +   break;

Our x86-opcode-map.txt lists that as ESC, but doesn't have an escape
table for it. Per:

  http://ref.x86asm.net/coder64.html

these are all the traditional x87 FPU ops.


> > +
> > case 0xe3:
> > /* jecxz/jrcxz */
> > *type = INSN_JUMP_CONDITIONAL;


Now; I suppose I need our x86-opcode-map.txt extended in at least two
ways:

 - all those x87 FPU instructions need adding
 - a way of detecting the affected register set

Now, I suspect we can do that latter by the instruction operands that
are already there, although I've not managed to untangle them fully
(hint, we really should improve the comments on top). Operands seem to
have one capital that denotes the class:

 - I: immediate
 - G: general purpose
 - E
 - P,Q: MMX
 - V,M,W,H: SSE

So if we can extend the awk magic to provide operand classes for each
decoded instruction, then that would simplify this lots.

New version below...

---
--- 

Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-02 Thread Masami Hiramatsu
On Thu, 2 Apr 2020 16:13:08 +0200
Peter Zijlstra  wrote:

> On Thu, Apr 02, 2020 at 09:33:54AM +0200, Christian König wrote:
> > Hi Jann,
> > 
> > Am 02.04.20 um 04:34 schrieb Jann Horn:
> > > [x86 folks in CC so that they can chime in on the precise rules for this 
> > > stuff]
> > > 
> > > Hi!
> > > 
> > > I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/
> > > turn on floating-point instructions in the compiler flags
> > > (-mhard-float, -msse and -msse2) in order to make the "float" and
> > > "double" types usable from C code without requiring helper functions.
> > > 
> > > However, as far as I know, code running in normal kernel context isn't
> > > allowed to use floating-point registers without special protection
> > > using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also
> > > require that the protected code never blocks). If you violate that
> > > rule, that can lead to various issues - among other things, I think
> > > the kernel will clobber userspace FPU register state, and I think the
> > > kernel code can blow up if a context switch happens at the wrong time,
> > > since in-kernel task switches don't preserve FPU state.
> > > 
> > > Is there some hidden trick I'm missing that makes it okay to use FPU
> > > registers here?
> > > 
> > > I would try testing this, but unfortunately none of the AMD devices I
> > > have here have the appropriate graphics hardware...
> > 
> > yes, using the floating point calculations in the display code has been a
> > source of numerous problems and confusion in the past.
> > 
> > The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind the
> > DC_FP_START() and DC_FP_END() macros which are supposed to hide the
> > architecture depend handling for x86 and PPC64.
> > 
> > This originated from the graphics block integrated into AMD CPU (where we
> > knew which fp unit we had), but as far as I know is now also used for
> > dedicated AMD GPUs as well.
> > 
> > I'm not really a fan of this either, but so far we weren't able to convince
> > the hardware engineers to not use floating point calculations for the
> > display stuff.
> 
> Might I complain that:
> 
>   make O=allmodconfig-build drivers/gpu/drm/amd/display/dc/
> 
> does not in fact work?
> 
> Anyway, how do people feel about something like the below?
> 
> Masami, Boris, is there any semi-sane way we can have insn_is_fpu() ?
> While digging through various opcode manuals is of course forever fun, I
> do feel like it might not be the best way.

Yes, it is possible to add INAT_FPU and insn_is_fpu().
But it seems that the below patch needs more classification based on
nmemonic or opcodes.

IMHO, it is the time to expand gen-insn-attr.awk or clone it to
generate another opcode map, so that user will easily extend the
insn infrastructure.
(e.g. I had made an in-kernel disassembler, which generates a mnemonic
 maps from x86-opcode-map.txt)
 https://github.com/mhiramat/linux/commits/inkernel-disasm-20130414

Thank you,

> 
> ---
>  arch/x86/include/asm/fpu/api.h  |  7 
>  arch/x86/include/asm/fpu/internal.h | 11 ++
>  arch/x86/kernel/fpu/init.c  |  5 +++
>  tools/objtool/arch.h|  1 +
>  tools/objtool/arch/x86/decode.c | 71 
> ++---
>  tools/objtool/check.c   | 58 ++
>  tools/objtool/check.h   |  3 +-
>  7 files changed, 134 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
> index b774c52e5411..b9bca1cdc875 100644
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h
> @@ -12,6 +12,13 @@
>  #define _ASM_X86_FPU_API_H
>  #include 
>  
> +#define annotate_fpu() ({\
> + asm volatile("%c0:\n\t" \
> +  ".pushsection .discard.fpu_safe\n\t"   \
> +  ".long %c0b - .\n\t"   \
> +  ".popsection\n\t" : : "i" (__COUNTER__));  \
> +})
> +
>  /*
>   * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
>   * disables preemption so be careful if you intend to use it for long periods
> diff --git a/arch/x86/include/asm/fpu/internal.h 
> b/arch/x86/include/asm/fpu/internal.h
> index 44c48e34d799..bc436213a0d4 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -102,6 +102,11 @@ static inline void fpstate_init_fxstate(struct 
> fxregs_state *fx)
>  }
>  extern void fpstate_sanitize_xstate(struct fpu *fpu);
>  
> +#define _ASM_ANNOTATE_FPU(at)
> \
> +  ".pushsection .discard.fpu_safe\n" \
> +  ".long " #at " - .\n"  \
> +  ".popsection\n"\
> +
>  #define 

Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-02 Thread Jann Horn
On Thu, Apr 2, 2020 at 11:36 AM Thomas Gleixner  wrote:
> Jann Horn  writes:
> > On Thu, Apr 2, 2020 at 9:34 AM Christian König  
> > wrote:
> >> Am 02.04.20 um 04:34 schrieb Jann Horn:
> >> > [x86 folks in CC so that they can chime in on the precise rules for
> >> > this stuff]
>
> They are pretty simple.
>
> Any code using FPU needs to be completely isolated from regular code
> either by using inline asm or by moving it to a different compilation
> unit. The invocations need fpu_begin/end() of course.
[...]
> We really need objtool support to validate that.
>
> Peter, now that we know how to do it (noinstr, clac/stac) we can emit
> annotations (see patch below) and validate that any FPU instruction is
> inside a safe region. Hmm?

One annoying aspect is that for the "move it to a different
compilation unit" method, objtool needs to know at compile time
(before linking) which functions are in FPU-enabled object files,
right? So we'd need to have some sort of function annotation that gets
plumbed from the function declaration in a header file through the
compiler into the ELF file, and then let objtool verify that calls to
FPU-enabled methods occur only when the FPU is available? (Ideally
something that covers indirect calls... but this would probably get
really complicated unless we can get the compiler to include that
annotation in its type checking.)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-02 Thread Peter Zijlstra
On Thu, Apr 02, 2020 at 09:33:54AM +0200, Christian König wrote:
> Hi Jann,
> 
> Am 02.04.20 um 04:34 schrieb Jann Horn:
> > [x86 folks in CC so that they can chime in on the precise rules for this 
> > stuff]
> > 
> > Hi!
> > 
> > I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/
> > turn on floating-point instructions in the compiler flags
> > (-mhard-float, -msse and -msse2) in order to make the "float" and
> > "double" types usable from C code without requiring helper functions.
> > 
> > However, as far as I know, code running in normal kernel context isn't
> > allowed to use floating-point registers without special protection
> > using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also
> > require that the protected code never blocks). If you violate that
> > rule, that can lead to various issues - among other things, I think
> > the kernel will clobber userspace FPU register state, and I think the
> > kernel code can blow up if a context switch happens at the wrong time,
> > since in-kernel task switches don't preserve FPU state.
> > 
> > Is there some hidden trick I'm missing that makes it okay to use FPU
> > registers here?
> > 
> > I would try testing this, but unfortunately none of the AMD devices I
> > have here have the appropriate graphics hardware...
> 
> yes, using the floating point calculations in the display code has been a
> source of numerous problems and confusion in the past.
> 
> The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind the
> DC_FP_START() and DC_FP_END() macros which are supposed to hide the
> architecture depend handling for x86 and PPC64.
> 
> This originated from the graphics block integrated into AMD CPU (where we
> knew which fp unit we had), but as far as I know is now also used for
> dedicated AMD GPUs as well.
> 
> I'm not really a fan of this either, but so far we weren't able to convince
> the hardware engineers to not use floating point calculations for the
> display stuff.

Might I complain that:

make O=allmodconfig-build drivers/gpu/drm/amd/display/dc/

does not in fact work?

Anyway, how do people feel about something like the below?

Masami, Boris, is there any semi-sane way we can have insn_is_fpu() ?
While digging through various opcode manuals is of course forever fun, I
do feel like it might not be the best way.

---
 arch/x86/include/asm/fpu/api.h  |  7 
 arch/x86/include/asm/fpu/internal.h | 11 ++
 arch/x86/kernel/fpu/init.c  |  5 +++
 tools/objtool/arch.h|  1 +
 tools/objtool/arch/x86/decode.c | 71 ++---
 tools/objtool/check.c   | 58 ++
 tools/objtool/check.h   |  3 +-
 7 files changed, 134 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index b774c52e5411..b9bca1cdc875 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -12,6 +12,13 @@
 #define _ASM_X86_FPU_API_H
 #include 
 
+#define annotate_fpu() ({  \
+   asm volatile("%c0:\n\t" \
+".pushsection .discard.fpu_safe\n\t"   \
+".long %c0b - .\n\t"   \
+".popsection\n\t" : : "i" (__COUNTER__));  \
+})
+
 /*
  * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
  * disables preemption so be careful if you intend to use it for long periods
diff --git a/arch/x86/include/asm/fpu/internal.h 
b/arch/x86/include/asm/fpu/internal.h
index 44c48e34d799..bc436213a0d4 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -102,6 +102,11 @@ static inline void fpstate_init_fxstate(struct 
fxregs_state *fx)
 }
 extern void fpstate_sanitize_xstate(struct fpu *fpu);
 
+#define _ASM_ANNOTATE_FPU(at)  \
+".pushsection .discard.fpu_safe\n" \
+".long " #at " - .\n"  \
+".popsection\n"\
+
 #define user_insn(insn, output, input...)  \
 ({ \
int err;\
@@ -116,6 +121,7 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
 "jmp  2b\n"\
 ".previous\n"  \
 _ASM_EXTABLE(1b, 3b)   \
+_ASM_ANNOTATE_FPU(1b)  \
 : [err] "=r" (err), output \
 : "0"(0), input);  \
  

Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-02 Thread Jann Horn
On Thu, Apr 2, 2020 at 9:34 AM Christian König  wrote:
> Am 02.04.20 um 04:34 schrieb Jann Horn:
> > [x86 folks in CC so that they can chime in on the precise rules for this 
> > stuff]
> > I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/
> > turn on floating-point instructions in the compiler flags
> > (-mhard-float, -msse and -msse2) in order to make the "float" and
> > "double" types usable from C code without requiring helper functions.
> >
> > However, as far as I know, code running in normal kernel context isn't
> > allowed to use floating-point registers without special protection
> > using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also
> > require that the protected code never blocks). If you violate that
> > rule, that can lead to various issues - among other things, I think
> > the kernel will clobber userspace FPU register state, and I think the
> > kernel code can blow up if a context switch happens at the wrong time,
> > since in-kernel task switches don't preserve FPU state.
> >
> > Is there some hidden trick I'm missing that makes it okay to use FPU
> > registers here?
> >
> > I would try testing this, but unfortunately none of the AMD devices I
> > have here have the appropriate graphics hardware...
>
> yes, using the floating point calculations in the display code has been
> a source of numerous problems and confusion in the past.
>
> The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind
> the DC_FP_START() and DC_FP_END() macros which are supposed to hide the
> architecture depend handling for x86 and PPC64.

Hmm... but as far as I can tell, you're using those macros from inside
functions that are already compiled with the FPU on:

 - drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c uses the macros,
but is already compiled with calcs_ccflags
 - drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c uses the
macros, but is already compiled with "-mhard-float -msse -msse2"
 - drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c uses the
macros, but is already compiled with "-mhard-float -msse -msse2"

AFAIK as soon as you enter any function in any file compiled with FPU
instructions, you may encounter SSE instructions, e.g. via things like
compiler-generated memory-zeroing code - not just when you're actually
using doubles or floats.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-02 Thread Thomas Gleixner
Jann Horn  writes:
> On Thu, Apr 2, 2020 at 9:34 AM Christian König  
> wrote:
>> Am 02.04.20 um 04:34 schrieb Jann Horn:
>> > [x86 folks in CC so that they can chime in on the precise rules for
>> > this stuff]

They are pretty simple.

Any code using FPU needs to be completely isolated from regular code
either by using inline asm or by moving it to a different compilation
unit. The invocations need fpu_begin/end() of course.

>> > I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/
>> > turn on floating-point instructions in the compiler flags
>> > (-mhard-float, -msse and -msse2) in order to make the "float" and
>> > "double" types usable from C code without requiring helper functions.
>> >
>> > However, as far as I know, code running in normal kernel context isn't
>> > allowed to use floating-point registers without special protection
>> > using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also
>> > require that the protected code never blocks). If you violate that
>> > rule, that can lead to various issues - among other things, I think
>> > the kernel will clobber userspace FPU register state, and I think the
>> > kernel code can blow up if a context switch happens at the wrong time,
>> > since in-kernel task switches don't preserve FPU state.
>> >
>> > Is there some hidden trick I'm missing that makes it okay to use FPU
>> > registers here?
>> >
>> > I would try testing this, but unfortunately none of the AMD devices I
>> > have here have the appropriate graphics hardware...
>>
>> yes, using the floating point calculations in the display code has been
>> a source of numerous problems and confusion in the past.
>>
>> The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind
>> the DC_FP_START() and DC_FP_END() macros which are supposed to hide the
>> architecture depend handling for x86 and PPC64.
>
> Hmm... but as far as I can tell, you're using those macros from inside
> functions that are already compiled with the FPU on:
>
>  - drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c uses the macros,
> but is already compiled with calcs_ccflags
>  - drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c uses the
> macros, but is already compiled with "-mhard-float -msse -msse2"
>  - drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c uses the
> macros, but is already compiled with "-mhard-float -msse -msse2"
>
> AFAIK as soon as you enter any function in any file compiled with FPU
> instructions, you may encounter SSE instructions, e.g. via things like
> compiler-generated memory-zeroing code - not just when you're actually
> using doubles or floats.

That's correct and this will silently cause FPU state corruption ...

We really need objtool support to validate that.

Peter, now that we know how to do it (noinstr, clac/stac) we can emit
annotations (see patch below) and validate that any FPU instruction is
inside a safe region. Hmm?

Thanks,

tglx

8<---
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -19,8 +19,27 @@
  * If you intend to use the FPU in softirq you need to check first with
  * irq_fpu_usable() if it is possible.
  */
-extern void kernel_fpu_begin(void);
-extern void kernel_fpu_end(void);
+extern void __kernel_fpu_begin(void);
+extern void __kernel_fpu_end(void);
+
+static inline void kernel_fpu_begin(void)
+{
+   asm volatile("%c0:\n\t"
+".pushsection .discard.fpu_begin\n\t"
+".long %c0b - .\n\t"
+".popsection\n\t" : : "i" (__COUNTER__));
+   __kernel_fpu_begin();
+}
+
+static inline void kernel_fpu_end(void)
+{
+   __kernel_fpu_end();
+   asm volatile("%c0:\n\t"
+".pushsection .discard.fpu_end\n\t"
+".long %c0b - .\n\t"
+".popsection\n\t" : : "i" (__COUNTER__));
+}
+
 extern bool irq_fpu_usable(void);
 extern void fpregs_mark_activate(void);
 
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -82,7 +82,7 @@ bool irq_fpu_usable(void)
 }
 EXPORT_SYMBOL(irq_fpu_usable);
 
-void kernel_fpu_begin(void)
+void __kernel_fpu_begin(void)
 {
preempt_disable();
 
@@ -102,16 +102,16 @@ void kernel_fpu_begin(void)
}
__cpu_invalidate_fpregs_state();
 }
-EXPORT_SYMBOL_GPL(kernel_fpu_begin);
+EXPORT_SYMBOL_GPL(__kernel_fpu_begin);
 
-void kernel_fpu_end(void)
+void __kernel_fpu_end(void)
 {
WARN_ON_FPU(!this_cpu_read(in_kernel_fpu));
 
this_cpu_write(in_kernel_fpu, false);
preempt_enable();
 }
-EXPORT_SYMBOL_GPL(kernel_fpu_end);
+EXPORT_SYMBOL_GPL(__kernel_fpu_end);
 
 /*
  * Save the FPU state (mark it for reload if necessary):



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-02 Thread Christian König

Hi Jann,

Am 02.04.20 um 04:34 schrieb Jann Horn:

[x86 folks in CC so that they can chime in on the precise rules for this stuff]

Hi!

I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/
turn on floating-point instructions in the compiler flags
(-mhard-float, -msse and -msse2) in order to make the "float" and
"double" types usable from C code without requiring helper functions.

However, as far as I know, code running in normal kernel context isn't
allowed to use floating-point registers without special protection
using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also
require that the protected code never blocks). If you violate that
rule, that can lead to various issues - among other things, I think
the kernel will clobber userspace FPU register state, and I think the
kernel code can blow up if a context switch happens at the wrong time,
since in-kernel task switches don't preserve FPU state.

Is there some hidden trick I'm missing that makes it okay to use FPU
registers here?

I would try testing this, but unfortunately none of the AMD devices I
have here have the appropriate graphics hardware...


yes, using the floating point calculations in the display code has been 
a source of numerous problems and confusion in the past.


The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind 
the DC_FP_START() and DC_FP_END() macros which are supposed to hide the 
architecture depend handling for x86 and PPC64.


This originated from the graphics block integrated into AMD CPU (where 
we knew which fp unit we had), but as far as I know is now also used for 
dedicated AMD GPUs as well.


I'm not really a fan of this either, but so far we weren't able to 
convince the hardware engineers to not use floating point calculations 
for the display stuff.


Regards,
Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx