Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Fri, Apr 17, 2020 at 04:27:28PM -0400, Rodrigo Siqueira wrote: > I'm going to work on the FPU issues in the display code. In this sense, > could you clarify a little bit more about the Makefile issues? I think it might have been PEBKAC, I assumed allmodconfig would end up selecting the driver, it doesn't! Adding "AMD GPU" to a defconfig seems to make it work: $ make O=defconfig-build/ drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o Sorry about the confusion. > Also, I applied the patch `[PATCH v4] x86: insn: Add insn_is_fpu()` and > tried to reproduce the warning that you described but I didn't see it. > Could you explain to me how I can check those warnings? grab: git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/fpu Then build like above: $ make O=defconfig-build/ drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o And manually validate: $ defconfig-build/tools/objtool/objtool check -Ffa defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o: warning: objtool: dcn_validate_bandwidth()+0x1b17: FPU instruction outside of kernel_fpu_{begin,end}() ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On 04/09, Peter Zijlstra wrote: > On Thu, Apr 09, 2020 at 08:15:57PM +0200, Christian König wrote: > > Am 09.04.20 um 19:09 schrieb Peter Zijlstra: > > > On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote: > > > [SNIP] > > > > I'll need another approach, let me consider. > > > Christian; it says these files are generated, does that generator know > > > which functions are wholly in FPU context and which are not? > > > > Well that "generator" is still a human being :) > > > > It's just that the formulae for the calculation come from the hardware team > > and we are not able to easily transcript them to fixed point calculations. > > Well, if it's a human, can this human respect the kernel coding style a > bit more :-) Some of that stuff is atrocious. > > > > My current thinking is that if I annotate all functions that are wholly > > > inside kernel_fpu_start() with an __fpu function attribute, then I can > > > verify that any call from regular text to fpu text only happens inside > > > kernel_fpu_begin()/end(). And I can ensure that all !__fpu annotation > > > fuctions only contain !fpu instructions. > > > > Yeah, that sounds like a good idea to me and should be easily doable. > > > > > Can that generator add the __fpu function attribute or is that something > > > that would need to be done manually (which seems like it would be > > > painful, since it is quite a bit of code) ? > > > > We are currently in the process of moving all the stuff which requires > > floating point into a single C file(s) and then make sure that we only call > > those within kernel_fpu_begin()/end() blocks. > > Can you make the build system stick all those .o files in a single > archive? That's the only way I can do call validation; external > relocatoin records do not contain the section. > > > Annotating those function with __fpu or even saying to gcc that all code of > > those files should go into a special text.fpu segment shouldn't be much of a > > problem. > > Guess what the __fpu attribute does ;-) > > With the below patch (which is on to of newer versions of the objtool > patches send earlier, let me know if you want a full set) that only > converts a few files, but fully converts: > > drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c > > But building it (and this is an absolute pain; when you're reworking > this, can you pretty please also fix the Makefiles?), we get: Hi, I'm going to work on the FPU issues in the display code. In this sense, could you clarify a little bit more about the Makefile issues? Also, I applied the patch `[PATCH v4] x86: insn: Add insn_is_fpu()` and tried to reproduce the warning that you described but I didn't see it. Could you explain to me how I can check those warnings? Thanks > drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o: warning: > objtool: dcn_validate_bandwidth()+0x34fa: FPU instruction outside of > kernel_fpu_{begin,end}() > > $ ./scripts/faddr2line > defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o > dcn_validate_bandwidth+0x34fa > dcn_validate_bandwidth+0x34fa/0x57ce: > dcn_validate_bandwidth at > /usr/src/linux-2.6/defconfig-build/../drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1293 > (discriminator 5) > > # ./objdump-func.sh > defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o > dcn_validate_bandwidth | grep 34fa > 34fa 50fa: f2 0f 10 b5 60 ff ffmovsd -0xa0(%rbp),%xmm6 > > Which seems to indicate there's still problms with the current code. > > > > --- > arch/x86/include/asm/fpu/api.h | 12 +++ > arch/x86/kernel/vmlinux.lds.S | 1 + > .../gpu/drm/amd/display/dc/calcs/dcn_calc_math.c | 25 > +++--- > drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c | 4 ++-- > .../display/dc/dml/dcn20/display_rq_dlg_calc_20.c | 10 - > .../amd/display/dc/dml/display_rq_dlg_helpers.c| 2 +- > .../gpu/drm/amd/display/dc/dml/dml_common_defs.c | 2 +- > drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c| 2 +- > drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c | 10 - > drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c | 4 ++-- > drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h | 2 ++ > tools/objtool/check.c | 7 +- > tools/objtool/elf.h| 2 +- > 13 files changed, 52 insertions(+), 31 deletions(-) > > diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h > index 64be4426fda9..19eaf98bbb0a 100644 > --- a/arch/x86/include/asm/fpu/api.h > +++ b/arch/x86/include/asm/fpu/api.h > @@ -12,11 +12,23 @@ > #define _ASM_X86_FPU_API_H > #include > > +#ifdef CONFIG_STACK_VALIDATION > + > +#define __fpu __section(".text.fpu") > + > #define _ASM_ANNOTATE_FPU(at) > \ >".pushsection .discard.fpu_safe\n"
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Fri, Apr 10, 2020 at 04:31:39PM +0200, Christian König wrote: > Can we put this new automated check will be behind a configuration flag > initially? Or at least make it a warning and not a hard error. I'll try and get the patches merged in mainline objtool but with a flag that isn't used by default. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
Am 09.04.20 um 22:01 schrieb Peter Zijlstra: On Thu, Apr 09, 2020 at 08:15:57PM +0200, Christian König wrote: Am 09.04.20 um 19:09 schrieb Peter Zijlstra: On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote: [SNIP] I'll need another approach, let me consider. Christian; it says these files are generated, does that generator know which functions are wholly in FPU context and which are not? Well that "generator" is still a human being :) It's just that the formulae for the calculation come from the hardware team and we are not able to easily transcript them to fixed point calculations. Well, if it's a human, can this human respect the kernel coding style a bit more :-) Some of that stuff is atrocious. Yes, I know. That's unfortunately something we still need to work on as well. We are currently in the process of moving all the stuff which requires floating point into a single C file(s) and then make sure that we only call those within kernel_fpu_begin()/end() blocks. Can you make the build system stick all those .o files in a single archive? That's the only way I can do call validation; external relocatoin records do not contain the section. Need to double check that with the display team responsible for the code, but I think that shouldn't be much of a problem. Annotating those function with __fpu or even saying to gcc that all code of those files should go into a special text.fpu segment shouldn't be much of a problem. Guess what the __fpu attribute does ;-) Good to know that my suspicion how this is implemented was correct :) With the below patch (which is on to of newer versions of the objtool patches send earlier, let me know if you want a full set Getting a branch somewhere would be perfect. ) that only converts a few files, but fully converts: drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c But building it (and this is an absolute pain; when you're reworking this, can you pretty please also fix the Makefiles?), we get: drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o: warning: objtool: dcn_validate_bandwidth()+0x34fa: FPU instruction outside of kernel_fpu_{begin,end}() $ ./scripts/faddr2line defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o dcn_validate_bandwidth+0x34fa dcn_validate_bandwidth+0x34fa/0x57ce: dcn_validate_bandwidth at /usr/src/linux-2.6/defconfig-build/../drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1293 (discriminator 5) # ./objdump-func.sh defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o dcn_validate_bandwidth | grep 34fa 34fa 50fa: f2 0f 10 b5 60 ff ffmovsd -0xa0(%rbp),%xmm6 Which seems to indicate there's still problms with the current code. Making an educated guess I would say the compiler has no idea that it shouldn't use instructions which touch fp registers outside of kernel_fpu_{begin,end}(). Going to talk with the display team about this whole topic internally once more. Since this discussion already raised attention in our technical management it shouldn't be to much of a problem to get manpower to get this fixed properly. Can we put this new automated check will be behind a configuration flag initially? Or at least make it a warning and not a hard error. Thanks, Christian. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Thu, Apr 09, 2020 at 08:15:57PM +0200, Christian König wrote: > Am 09.04.20 um 19:09 schrieb Peter Zijlstra: > > On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote: > > [SNIP] > > > I'll need another approach, let me consider. > > Christian; it says these files are generated, does that generator know > > which functions are wholly in FPU context and which are not? > > Well that "generator" is still a human being :) > > It's just that the formulae for the calculation come from the hardware team > and we are not able to easily transcript them to fixed point calculations. Well, if it's a human, can this human respect the kernel coding style a bit more :-) Some of that stuff is atrocious. > > My current thinking is that if I annotate all functions that are wholly > > inside kernel_fpu_start() with an __fpu function attribute, then I can > > verify that any call from regular text to fpu text only happens inside > > kernel_fpu_begin()/end(). And I can ensure that all !__fpu annotation > > fuctions only contain !fpu instructions. > > Yeah, that sounds like a good idea to me and should be easily doable. > > > Can that generator add the __fpu function attribute or is that something > > that would need to be done manually (which seems like it would be > > painful, since it is quite a bit of code) ? > > We are currently in the process of moving all the stuff which requires > floating point into a single C file(s) and then make sure that we only call > those within kernel_fpu_begin()/end() blocks. Can you make the build system stick all those .o files in a single archive? That's the only way I can do call validation; external relocatoin records do not contain the section. > Annotating those function with __fpu or even saying to gcc that all code of > those files should go into a special text.fpu segment shouldn't be much of a > problem. Guess what the __fpu attribute does ;-) With the below patch (which is on to of newer versions of the objtool patches send earlier, let me know if you want a full set) that only converts a few files, but fully converts: drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c But building it (and this is an absolute pain; when you're reworking this, can you pretty please also fix the Makefiles?), we get: drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o: warning: objtool: dcn_validate_bandwidth()+0x34fa: FPU instruction outside of kernel_fpu_{begin,end}() $ ./scripts/faddr2line defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o dcn_validate_bandwidth+0x34fa dcn_validate_bandwidth+0x34fa/0x57ce: dcn_validate_bandwidth at /usr/src/linux-2.6/defconfig-build/../drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1293 (discriminator 5) # ./objdump-func.sh defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o dcn_validate_bandwidth | grep 34fa 34fa 50fa: f2 0f 10 b5 60 ff ffmovsd -0xa0(%rbp),%xmm6 Which seems to indicate there's still problms with the current code. --- arch/x86/include/asm/fpu/api.h | 12 +++ arch/x86/kernel/vmlinux.lds.S | 1 + .../gpu/drm/amd/display/dc/calcs/dcn_calc_math.c | 25 +++--- drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c | 4 ++-- .../display/dc/dml/dcn20/display_rq_dlg_calc_20.c | 10 - .../amd/display/dc/dml/display_rq_dlg_helpers.c| 2 +- .../gpu/drm/amd/display/dc/dml/dml_common_defs.c | 2 +- drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c| 2 +- drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c | 10 - drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c | 4 ++-- drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h | 2 ++ tools/objtool/check.c | 7 +- tools/objtool/elf.h| 2 +- 13 files changed, 52 insertions(+), 31 deletions(-) diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h index 64be4426fda9..19eaf98bbb0a 100644 --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -12,11 +12,23 @@ #define _ASM_X86_FPU_API_H #include +#ifdef CONFIG_STACK_VALIDATION + +#define __fpu __section(".text.fpu") + #define _ASM_ANNOTATE_FPU(at) \ ".pushsection .discard.fpu_safe\n" \ ".long " #at " - .\n" \ ".popsection\n"\ +#else + +#define __fpu + +#define _ASM_ANNOTATE_FPU(at) + +#endif /* CONFIG_STACK_VALIDATION */ + #define annotate_fpu() ({ \ asm volatile("%c0:\n\t" \ _ASM_ANNOTATE_FPU(%c0b)\ diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index 1bf7e312361f..8442f8633d07 100644
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
Am 09.04.20 um 19:09 schrieb Peter Zijlstra: On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote: [SNIP] I'll need another approach, let me consider. Christian; it says these files are generated, does that generator know which functions are wholly in FPU context and which are not? Well that "generator" is still a human being :) It's just that the formulae for the calculation come from the hardware team and we are not able to easily transcript them to fixed point calculations. My current thinking is that if I annotate all functions that are wholly inside kernel_fpu_start() with an __fpu function attribute, then I can verify that any call from regular text to fpu text only happens inside kernel_fpu_begin()/end(). And I can ensure that all !__fpu annotation fuctions only contain !fpu instructions. Yeah, that sounds like a good idea to me and should be easily doable. Can that generator add the __fpu function attribute or is that something that would need to be done manually (which seems like it would be painful, since it is quite a bit of code) ? We are currently in the process of moving all the stuff which requires floating point into a single C file(s) and then make sure that we only call those within kernel_fpu_begin()/end() blocks. Annotating those function with __fpu or even saying to gcc that all code of those files should go into a special text.fpu segment shouldn't be much of a problem. Regards, Christian. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote: > On Thu, Apr 02, 2020 at 04:13:08PM +0200, Peter Zijlstra wrote: > > On Thu, Apr 02, 2020 at 09:33:54AM +0200, Christian König wrote: > > > > yes, using the floating point calculations in the display code has been a > > > source of numerous problems and confusion in the past. > > > > > > The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind the > > > DC_FP_START() and DC_FP_END() macros which are supposed to hide the > > > architecture depend handling for x86 and PPC64. > > > > > > This originated from the graphics block integrated into AMD CPU (where we > > > knew which fp unit we had), but as far as I know is now also used for > > > dedicated AMD GPUs as well. > > > > > > I'm not really a fan of this either, but so far we weren't able to > > > convince > > > the hardware engineers to not use floating point calculations for the > > > display stuff. > I'll need another approach, let me consider. Christian; it says these files are generated, does that generator know which functions are wholly in FPU context and which are not? My current thinking is that if I annotate all functions that are wholly inside kernel_fpu_start() with an __fpu function attribute, then I can verify that any call from regular text to fpu text only happens inside kernel_fpu_begin()/end(). And I can ensure that all !__fpu annotation fuctions only contain !fpu instructions. Can that generator add the __fpu function attribute or is that something that would need to be done manually (which seems like it would be painful, since it is quite a bit of code) ? ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Thu, Apr 02, 2020 at 04:13:08PM +0200, Peter Zijlstra wrote: > On Thu, Apr 02, 2020 at 09:33:54AM +0200, Christian König wrote: > > yes, using the floating point calculations in the display code has been a > > source of numerous problems and confusion in the past. > > > > The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind the > > DC_FP_START() and DC_FP_END() macros which are supposed to hide the > > architecture depend handling for x86 and PPC64. > > > > This originated from the graphics block integrated into AMD CPU (where we > > knew which fp unit we had), but as far as I know is now also used for > > dedicated AMD GPUs as well. > > > > I'm not really a fan of this either, but so far we weren't able to convince > > the hardware engineers to not use floating point calculations for the > > display stuff. > > Might I complain that: > > make O=allmodconfig-build drivers/gpu/drm/amd/display/dc/ > > does not in fact work? Worse; allmodconfig doesn't select these, and hence I did not in fact build-test them for a while :/ Anyway, I now have a config that includes them and I get plenty fail with my objtool patch. In part because this is spread over multiple object files and in part because of the forrest of indirect calls Jann already mentioned. The multi-unit issue can be fixed by simply sticking all the related .o files in an archive and running objtool on that, but the pointer crap is much harder. I'll need another approach, let me consider. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Wed, Apr 08, 2020 at 12:41:11AM +0900, Masami Hiramatsu wrote: > On Tue, 7 Apr 2020 13:15:35 +0200 > Peter Zijlstra wrote: > > > > Also, all the VMX bits seems to qualify as FPU (I can't remember seeing > > > > that previously): > > > > > > Oops, let me check it. > > > > I just send you another patch that could do with insn_is_vmx() > > (sorry!!!) > > Hmm, it is hard to find out the vmx insns. Maybe we need to clarify it by > opcode pattern. (like "VM.*") Yeah, I know. Maybe I should just keep it as I have for now. One thing I thought of is we could perhaps add manual markers in x86-opcode-map.txt. The '{','}' characters appear unused so far, we perhaps we can use them to classify things. That could maybe replace "mmx_expr" as well. That is, something like so: --- diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt index ec31f5b60323..e01b76e0a294 100644 --- a/arch/x86/lib/x86-opcode-map.txt +++ b/arch/x86/lib/x86-opcode-map.txt @@ -462,9 +462,9 @@ AVXcode: 1 75: pcmpeqw Pq,Qq | vpcmpeqw Vx,Hx,Wx (66),(v1) 76: pcmpeqd Pq,Qq | vpcmpeqd Vx,Hx,Wx (66),(v1) # Note: Remove (v), because vzeroall and vzeroupper becomes emms without VEX. -77: emms | vzeroupper | vzeroall -78: VMREAD Ey,Gy | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev) -79: VMWRITE Gy,Ey | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev) +77: emms {FPU} | vzeroupper | vzeroall +78: VMREAD Ey,Gy {VMX} | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev) +79: VMWRITE Gy,Ey {VMX} | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev) 7a: vcvtudq2pd/uqq2pd Vpd,Wx (F3),(ev) | vcvtudq2ps/uqq2ps Vpd,Wx (F2),(ev) | vcvttps2qq/pd2qq Vx,Wx (66),(ev) 7b: vcvtusi2sd Vpd,Hpd,Ev (F2),(ev) | vcvtusi2ss Vps,Hps,Ev (F3),(ev) | vcvtps2qq/pd2qq Vx,Wx (66),(ev) 7c: vhaddpd Vpd,Hpd,Wpd (66) | vhaddps Vps,Hps,Wps (F2) @@ -965,9 +965,9 @@ GrpTable: Grp6 EndTable GrpTable: Grp7 -0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) +0: SGDT Ms | VMCALL (001),(11B) {VMX} | VMLAUNCH (010),(11B) {VMX} | VMRESUME (011),(11B) {VMX} | VMXOFF (100),(11B) {VMX} | PCONFIG (101),(11B) | ENCLV (000),(11B) 1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B) -2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B) +2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) {VMX} | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B) 3: LIDT Ms 4: SMSW Mw/Rv 5: rdpkru (110),(11B) | wrpkru (111),(11B) | SAVEPREVSSP (F3),(010),(11B) | RSTORSSP Mq (F3) | SETSSBSY (F3),(000),(11B) @@ -987,8 +987,8 @@ GrpTable: Grp9 3: xrstors 4: xsavec 5: xsaves -6: VMPTRLD Mq | VMCLEAR Mq (66) | VMXON Mq (F3) | RDRAND Rv (11B) -7: VMPTRST Mq | VMPTRST Mq (F3) | RDSEED Rv (11B) +6: VMPTRLD Mq {VMX} | VMCLEAR Mq (66) {VMX} | VMXON Mq (F3) {VMX} | RDRAND Rv (11B) +7: VMPTRST Mq {VMX} | VMPTRST Mq (F3) {VMX} | RDSEED Rv (11B) EndTable GrpTable: Grp10 @@ -1036,10 +1036,10 @@ GrpTable: Grp14 EndTable GrpTable: Grp15 -0: fxsave | RDFSBASE Ry (F3),(11B) -1: fxstor | RDGSBASE Ry (F3),(11B) -2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B) -3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B) +0: fxsave {FPU} | RDFSBASE Ry (F3),(11B) +1: fxrstor {FPU} | RDGSBASE Ry (F3),(11B) +2: ldmxcsr {FPU} | vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B) +3: stmxcsr {FPU} | vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B) 4: XSAVE | ptwrite Ey (F3),(11B) 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B) 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3) ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Tue, 7 Apr 2020 13:15:35 +0200 Peter Zijlstra wrote: > On Tue, Apr 07, 2020 at 06:50:08PM +0900, Masami Hiramatsu wrote: > > On Mon, 6 Apr 2020 12:21:07 +0200 > > Peter Zijlstra wrote: > > > > arch/x86/mm/extable.o: warning: objtool: ex_handler_fprestore()+0x8b: > > > fpu_safe hint not an FPU instruction > > > 008b 36b: 48 0f ae 0d 00 00 00fxrstor64 0x0(%rip)# 373 > > > > > > > > > arch/x86/kvm/x86.o: warning: objtool: kvm_load_guest_fpu.isra.0()+0x1fa: > > > fpu_safe hint not an FPU instruction > > > 01fa1d2fa: 48 0f ae 4b 40 fxrstor64 0x40(%rbx) > > > > Ah, fxstor will not chang the FPU/MMX/SSE regs but just store it on memory. > > OK, I'll remove it from the list. > > Yeah, I don't much care if its in or out, but the way I was reading that > patch it _should_ be in, but then it doesn't seem to recognise it. Oops, I misread. OK. I fixed the issue. > > > > Also, all the VMX bits seems to qualify as FPU (I can't remember seeing > > > that previously): > > > > Oops, let me check it. > > I just send you another patch that could do with insn_is_vmx() > (sorry!!!) Hmm, it is hard to find out the vmx insns. Maybe we need to clarify it by opcode pattern. (like "VM.*") Thank you, -- Masami Hiramatsu ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Tue, Apr 07, 2020 at 06:50:08PM +0900, Masami Hiramatsu wrote: > On Mon, 6 Apr 2020 12:21:07 +0200 > Peter Zijlstra wrote: > > arch/x86/mm/extable.o: warning: objtool: ex_handler_fprestore()+0x8b: > > fpu_safe hint not an FPU instruction > > 008b 36b: 48 0f ae 0d 00 00 00fxrstor64 0x0(%rip)# 373 > > > > > > arch/x86/kvm/x86.o: warning: objtool: kvm_load_guest_fpu.isra.0()+0x1fa: > > fpu_safe hint not an FPU instruction > > 01fa1d2fa: 48 0f ae 4b 40 fxrstor64 0x40(%rbx) > > Ah, fxstor will not chang the FPU/MMX/SSE regs but just store it on memory. > OK, I'll remove it from the list. Yeah, I don't much care if its in or out, but the way I was reading that patch it _should_ be in, but then it doesn't seem to recognise it. > > Also, all the VMX bits seems to qualify as FPU (I can't remember seeing > > that previously): > > Oops, let me check it. I just send you another patch that could do with insn_is_vmx() (sorry!!!) ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Sun, Apr 05, 2020 at 12:19:30PM +0900, Masami Hiramatsu wrote: > > @@ -269,14 +269,14 @@ d4: AAM Ib (i64) > > d5: AAD Ib (i64) > > d6: > > d7: XLAT/XLATB > > -d8: ESC > > -d9: ESC > > -da: ESC > > -db: ESC > > -dc: ESC > > -dd: ESC > > -de: ESC > > -df: ESC > > +d8: FPU > > +d9: FPU > > +da: FPU > > +db: FPU > > +dc: FPU > > +dd: FPU > > +de: FPU > > +df: FPU > > I don't want to use FPU since Intel SDM is still using ESC because it > is co-processor escape code. But we all know that co-processor is x87. Can we then perhaps put in 'x87' as an escape code instead of 'ESC' ? > Here is the new patch. > > From d7eca4946ab3f0d08ad1268f49418f8655aaf57c Mon Sep 17 00:00:00 2001 > From: Masami Hiramatsu > Date: Fri, 3 Apr 2020 16:58:22 +0900 > Subject: [PATCH] x86: insn: Add insn_is_fpu() > > Add insn_is_fpu(insn) which tells that the insn is > whether touch the MMX/XMM/YMM register or the instruction > of FP coprocessor. > > Signed-off-by: Masami Hiramatsu arch/x86/mm/extable.o: warning: objtool: ex_handler_fprestore()+0x8b: fpu_safe hint not an FPU instruction 008b 36b: 48 0f ae 0d 00 00 00fxrstor64 0x0(%rip)# 373 arch/x86/kvm/x86.o: warning: objtool: kvm_load_guest_fpu.isra.0()+0x1fa: fpu_safe hint not an FPU instruction 01fa1d2fa: 48 0f ae 4b 40 fxrstor64 0x40(%rbx) Also, all the VMX bits seems to qualify as FPU (I can't remember seeing that previously): arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_apic_eoi_induced()+0x20: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_apic_write()+0x20: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_invlpg()+0x20: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_interrupt_shadow()+0x1c: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_decache_cr4_guest_bits()+0x5a: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_decache_cr0_guest_bits()+0x5a: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_io()+0x24: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_apic_access()+0x39: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_idt()+0x57: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_exit_info()+0x58: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_gdt()+0x57: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_guest_apic_has_interrupt()+0xf8: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_nmi_allowed()+0x98: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_handle_exit_irqoff()+0xb3: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_nmi_mask()+0x8a: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_rflags()+0x99: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_ept_misconfig()+0x22: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_interrupt_allowed()+0x8d: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_write_pml_buffer()+0x1c5: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_invpcid()+0x26a: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_read_guest_seg_ar()+0x9b: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_read_guest_seg_selector()+0x96: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_read_guest_seg_base()+0x9b: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_segment()+0x2da: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmwrite_error()+0x161: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: exec_controls_set.isra.0()+0x5a: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_dr()+0x1bc: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: update_exception_bitmap()+0x136: FPU instruction outside of kernel_fpu_{begin,end}() arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_interrupt_shadow()+0x8b: FPU instruction outside of kernel_fpu_{begin,end}()
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On 4/3/20 8:08 PM, Masami Hiramatsu wrote: > +static inline int insn_is_fpu(struct insn *insn) > +{ > + if (!insn->opcode.got) > + insn_get_opcode(insn); > + if (inat_is_fpu(insn->attr)) { > + if (insn->attr & INAT_FPUIFVEX) > + return insn_is_avx(insn); > + return 1; > + } return 0; // ?? > +} > + -- ~Randy ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Sat, Apr 04, 2020 at 12:08:08PM +0900, Masami Hiramatsu wrote: > From c609be0b6403245612503fca1087628655bab96c Mon Sep 17 00:00:00 2001 > From: Masami Hiramatsu > Date: Fri, 3 Apr 2020 16:58:22 +0900 > Subject: [PATCH] x86: insn: Add insn_is_fpu() > > Add insn_is_fpu(insn) which tells that the insn is > whether touch the MMX/XMM/YMM register or the instruction > of FP coprocessor. > > Signed-off-by: Masami Hiramatsu With that I get a lot of warnings: FPU instruction outside of kernel_fpu_{begin,end}() two random examples (x86-64-allmodconfig build): arch/x86/xen/enlighten.o: warning: objtool: xen_vcpu_restore()+0x341: FPU instruction outside of kernel_fpu_{begin,end}() $ ./objdump-func.sh defconfig-build/arch/x86/xen/enlighten.o xen_vcpu_restore | grep 341 0341 841: 0f 92 c3setb %bl arch/x86/events/core.o: warning: objtool: x86_pmu_stop()+0x6d: FPU instruction outside of kernel_fpu_{begin,end}() $ ./objdump-func.sh defconfig-build/arch/x86/events/core.o x86_pmu_stop | grep 6d 006d 23ad: 41 0f 92 c6 setb %r14b Which seems to suggest something goes wobbly with SETB, but I'm not seeing what in a hurry. --- --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -12,6 +12,13 @@ #define _ASM_X86_FPU_API_H #include +#define annotate_fpu() ({ \ + asm volatile("%c0:\n\t" \ +".pushsection .discard.fpu_safe\n\t" \ +".long %c0b - .\n\t" \ +".popsection\n\t" : : "i" (__COUNTER__)); \ +}) + /* * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It * disables preemption so be careful if you intend to use it for long periods --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -437,6 +437,7 @@ static inline int copy_fpregs_to_fpstate * Legacy FPU register saving, FNSAVE always clears FPU registers, * so we have to mark them inactive: */ + annotate_fpu(); asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave)); return 0; @@ -462,6 +463,7 @@ static inline void copy_kernel_to_fpregs * "m" is a random variable that should be in L1. */ if (unlikely(static_cpu_has_bug(X86_BUG_FXSAVE_LEAK))) { + annotate_fpu(); asm volatile( "fnclex\n\t" "emms\n\t" --- a/arch/x86/kernel/fpu/init.c +++ b/arch/x86/kernel/fpu/init.c @@ -38,7 +38,10 @@ static void fpu__init_cpu_generic(void) fpstate_init_soft(>thread.fpu.state.soft); else #endif + { + annotate_fpu(); asm volatile ("fninit"); + } } /* @@ -61,6 +64,7 @@ static bool fpu__probe_without_cpuid(voi cr0 &= ~(X86_CR0_TS | X86_CR0_EM); write_cr0(cr0); + annotate_fpu(); asm volatile("fninit ; fnstsw %0 ; fnstcw %1" : "+m" (fsw), "+m" (fcw)); pr_info("x86/fpu: Probing for FPU: FSW=0x%04hx FCW=0x%04hx\n", fsw, fcw); --- a/tools/objtool/arch.h +++ b/tools/objtool/arch.h @@ -27,6 +27,7 @@ enum insn_type { INSN_CLAC, INSN_STD, INSN_CLD, + INSN_FPU, INSN_OTHER, }; --- a/tools/objtool/arch/x86/decode.c +++ b/tools/objtool/arch/x86/decode.c @@ -92,6 +92,11 @@ int arch_decode_instruction(struct elf * *len = insn.length; *type = INSN_OTHER; + if (insn_is_fpu()) { + *type = INSN_FPU; + return 0; + } + if (insn.vex_prefix.nbytes) return 0; @@ -357,48 +362,54 @@ int arch_decode_instruction(struct elf * case 0x0f: - if (op2 == 0x01) { - + switch (op2) { + case 0x01: if (modrm == 0xca) *type = INSN_CLAC; else if (modrm == 0xcb) *type = INSN_STAC; + break; - } else if (op2 >= 0x80 && op2 <= 0x8f) { - + case 0x80 ... 0x8f: /* Jcc */ *type = INSN_JUMP_CONDITIONAL; + break; - } else if (op2 == 0x05 || op2 == 0x07 || op2 == 0x34 || - op2 == 0x35) { - - /* sysenter, sysret */ + case 0x05: /* syscall */ + case 0x07: /* sysret */ + case 0x34: /* sysenter */ + case 0x35: /* sysexit */ *type = INSN_CONTEXT_SWITCH; + break; - } else if (op2 == 0x0b || op2 == 0xb9) { - - /* ud2 */ + case 0xff: /* ud0 */ + case 0xb9: /* ud1 */ + case 0x0b: /* ud2 */ *type =
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Sat, 4 Apr 2020 16:36:20 +0200 Peter Zijlstra wrote: > On Sat, Apr 04, 2020 at 12:08:08PM +0900, Masami Hiramatsu wrote: > > From c609be0b6403245612503fca1087628655bab96c Mon Sep 17 00:00:00 2001 > > From: Masami Hiramatsu > > Date: Fri, 3 Apr 2020 16:58:22 +0900 > > Subject: [PATCH] x86: insn: Add insn_is_fpu() > > > > Add insn_is_fpu(insn) which tells that the insn is > > whether touch the MMX/XMM/YMM register or the instruction > > of FP coprocessor. > > > > Signed-off-by: Masami Hiramatsu > > With that I get a lot of warnings: > > FPU instruction outside of kernel_fpu_{begin,end}() > > two random examples (x86-64-allmodconfig build): > > arch/x86/xen/enlighten.o: warning: objtool: xen_vcpu_restore()+0x341: FPU > instruction outside of kernel_fpu_{begin,end}() > > $ ./objdump-func.sh defconfig-build/arch/x86/xen/enlighten.o xen_vcpu_restore > | grep 341 > 0341 841: 0f 92 c3setb %bl > > arch/x86/events/core.o: warning: objtool: x86_pmu_stop()+0x6d: FPU > instruction outside of kernel_fpu_{begin,end}() > > $ ./objdump-func.sh defconfig-build/arch/x86/events/core.o x86_pmu_stop | > grep 6d > 006d 23ad: 41 0f 92 c6 setb %r14b > > Which seems to suggest something goes wobbly with SETB, but I'm not > seeing what in a hurry. Yes, I also got same issue, please try the new one. Thank you! > > > --- > --- a/arch/x86/include/asm/fpu/api.h > +++ b/arch/x86/include/asm/fpu/api.h > @@ -12,6 +12,13 @@ > #define _ASM_X86_FPU_API_H > #include > > +#define annotate_fpu() ({\ > + asm volatile("%c0:\n\t" \ > + ".pushsection .discard.fpu_safe\n\t" \ > + ".long %c0b - .\n\t" \ > + ".popsection\n\t" : : "i" (__COUNTER__)); \ > +}) > + > /* > * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It > * disables preemption so be careful if you intend to use it for long periods > --- a/arch/x86/include/asm/fpu/internal.h > +++ b/arch/x86/include/asm/fpu/internal.h > @@ -437,6 +437,7 @@ static inline int copy_fpregs_to_fpstate >* Legacy FPU register saving, FNSAVE always clears FPU registers, >* so we have to mark them inactive: >*/ > + annotate_fpu(); > asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave)); > > return 0; > @@ -462,6 +463,7 @@ static inline void copy_kernel_to_fpregs >* "m" is a random variable that should be in L1. >*/ > if (unlikely(static_cpu_has_bug(X86_BUG_FXSAVE_LEAK))) { > + annotate_fpu(); > asm volatile( > "fnclex\n\t" > "emms\n\t" > --- a/arch/x86/kernel/fpu/init.c > +++ b/arch/x86/kernel/fpu/init.c > @@ -38,7 +38,10 @@ static void fpu__init_cpu_generic(void) > fpstate_init_soft(>thread.fpu.state.soft); > else > #endif > + { > + annotate_fpu(); > asm volatile ("fninit"); > + } > } > > /* > @@ -61,6 +64,7 @@ static bool fpu__probe_without_cpuid(voi > cr0 &= ~(X86_CR0_TS | X86_CR0_EM); > write_cr0(cr0); > > + annotate_fpu(); > asm volatile("fninit ; fnstsw %0 ; fnstcw %1" : "+m" (fsw), "+m" (fcw)); > > pr_info("x86/fpu: Probing for FPU: FSW=0x%04hx FCW=0x%04hx\n", fsw, > fcw); > --- a/tools/objtool/arch.h > +++ b/tools/objtool/arch.h > @@ -27,6 +27,7 @@ enum insn_type { > INSN_CLAC, > INSN_STD, > INSN_CLD, > + INSN_FPU, > INSN_OTHER, > }; > > --- a/tools/objtool/arch/x86/decode.c > +++ b/tools/objtool/arch/x86/decode.c > @@ -92,6 +92,11 @@ int arch_decode_instruction(struct elf * > *len = insn.length; > *type = INSN_OTHER; > > + if (insn_is_fpu()) { > + *type = INSN_FPU; > + return 0; > + } > + > if (insn.vex_prefix.nbytes) > return 0; > > @@ -357,48 +362,54 @@ int arch_decode_instruction(struct elf * > > case 0x0f: > > - if (op2 == 0x01) { > - > + switch (op2) { > + case 0x01: > if (modrm == 0xca) > *type = INSN_CLAC; > else if (modrm == 0xcb) > *type = INSN_STAC; > + break; > > - } else if (op2 >= 0x80 && op2 <= 0x8f) { > - > + case 0x80 ... 0x8f: /* Jcc */ > *type = INSN_JUMP_CONDITIONAL; > + break; > > - } else if (op2 == 0x05 || op2 == 0x07 || op2 == 0x34 || > -op2 == 0x35) { > - > - /* sysenter, sysret */ > + case 0x05: /* syscall */ > + case 0x07: /* sysret */ > + case 0x34: /* sysenter */ > + case 0x35: /* sysexit */ > *type =
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Sat, 4 Apr 2020 16:32:24 +0200 Peter Zijlstra wrote: > On Sat, Apr 04, 2020 at 12:08:08PM +0900, Masami Hiramatsu wrote: > > From c609be0b6403245612503fca1087628655bab96c Mon Sep 17 00:00:00 2001 > > From: Masami Hiramatsu > > Date: Fri, 3 Apr 2020 16:58:22 +0900 > > Subject: [PATCH] x86: insn: Add insn_is_fpu() > > > > Add insn_is_fpu(insn) which tells that the insn is > > whether touch the MMX/XMM/YMM register or the instruction > > of FP coprocessor. > > Looks good, although I changed it a little like so: OK, and I found there is a mistake on my patch. I should not use (v) for the instruction, which makes decoder insane. > > --- a/arch/x86/include/asm/insn.h > +++ b/arch/x86/include/asm/insn.h > @@ -133,11 +133,12 @@ static inline int insn_is_fpu(struct ins > { > if (!insn->opcode.got) > insn_get_opcode(insn); > - if (inat_is_fpu(insn->attr)) { > + if (inat_is_fpu(insn->attr)) { > if (insn->attr & INAT_FPUIFVEX) > return insn_is_avx(insn); > return 1; > } > + return 0; > } > > static inline int insn_has_emulate_prefix(struct insn *insn) > --- a/arch/x86/lib/x86-opcode-map.txt > +++ b/arch/x86/lib/x86-opcode-map.txt > @@ -269,14 +269,14 @@ d4: AAM Ib (i64) > d5: AAD Ib (i64) > d6: > d7: XLAT/XLATB > -d8: ESC > -d9: ESC > -da: ESC > -db: ESC > -dc: ESC > -dd: ESC > -de: ESC > -df: ESC > +d8: FPU > +d9: FPU > +da: FPU > +db: FPU > +dc: FPU > +dd: FPU > +de: FPU > +df: FPU I don't want to use FPU since Intel SDM is still using ESC because it is co-processor escape code. Here is the new patch. >From d7eca4946ab3f0d08ad1268f49418f8655aaf57c Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Fri, 3 Apr 2020 16:58:22 +0900 Subject: [PATCH] x86: insn: Add insn_is_fpu() Add insn_is_fpu(insn) which tells that the insn is whether touch the MMX/XMM/YMM register or the instruction of FP coprocessor. Signed-off-by: Masami Hiramatsu --- Changes: - Fix SET* also not FPU (unless it has vex prefix.) - Fix to remove (v) (VEX only) flag. --- arch/x86/include/asm/inat.h| 7 +++ arch/x86/include/asm/insn.h| 12 arch/x86/lib/x86-opcode-map.txt| 6 +++--- arch/x86/tools/gen-insn-attr-x86.awk | 22 +- tools/arch/x86/include/asm/inat.h | 7 +++ tools/arch/x86/include/asm/insn.h | 12 tools/arch/x86/lib/x86-opcode-map.txt | 6 +++--- tools/arch/x86/tools/gen-insn-attr-x86.awk | 22 +- 8 files changed, 78 insertions(+), 16 deletions(-) diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h index 4cf2ad521f65..ffce45178c08 100644 --- a/arch/x86/include/asm/inat.h +++ b/arch/x86/include/asm/inat.h @@ -77,6 +77,8 @@ #define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5)) #define INAT_VEXONLY (1 << (INAT_FLAG_OFFS + 6)) #define INAT_EVEXONLY (1 << (INAT_FLAG_OFFS + 7)) +#define INAT_FPU (1 << (INAT_FLAG_OFFS + 8)) +#define INAT_FPUIFVEX (1 << (INAT_FLAG_OFFS + 9)) /* Attribute making macros for attribute tables */ #define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS) #define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS) @@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr) { return attr & INAT_EVEXONLY; } + +static inline int inat_is_fpu(insn_attr_t attr) +{ + return attr & INAT_FPU; +} #endif diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h index 5c1ae3eff9d4..1752c54d2103 100644 --- a/arch/x86/include/asm/insn.h +++ b/arch/x86/include/asm/insn.h @@ -129,6 +129,18 @@ static inline int insn_is_evex(struct insn *insn) return (insn->vex_prefix.nbytes == 4); } +static inline int insn_is_fpu(struct insn *insn) +{ + if (!insn->opcode.got) + insn_get_opcode(insn); + if (inat_is_fpu(insn->attr)) { + if (insn->attr & INAT_FPUIFVEX) + return insn_is_avx(insn); + return 1; + } + return 0; +} + static inline int insn_has_emulate_prefix(struct insn *insn) { return !!insn->emulate_prefix_size; diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt index ec31f5b60323..c3d36b4c894d 100644 --- a/arch/x86/lib/x86-opcode-map.txt +++ b/arch/x86/lib/x86-opcode-map.txt @@ -1037,9 +1037,9 @@ EndTable GrpTable: Grp15 0: fxsave | RDFSBASE Ry (F3),(11B) -1: fxstor | RDGSBASE Ry (F3),(11B) -2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B) -3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B) +1: fxrstor | RDGSBASE Ry (F3),(11B) +2: ldmxcsr | vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B) +3: stmxcsr | vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B) 4: XSAVE | ptwrite Ey (F3),(11B) 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B) 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3) diff --git
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Fri, 3 Apr 2020 20:15:11 -0700 Randy Dunlap wrote: > On 4/3/20 8:08 PM, Masami Hiramatsu wrote: > > +static inline int insn_is_fpu(struct insn *insn) > > +{ > > + if (!insn->opcode.got) > > + insn_get_opcode(insn); > > + if (inat_is_fpu(insn->attr)) { > > + if (insn->attr & INAT_FPUIFVEX) > > + return insn_is_avx(insn); > > + return 1; > > + } > > return 0; // ?? Oops, right! Hm, I need to add a caller for this API... Thanks, > > > +} > > + > > > -- > ~Randy > -- Masami Hiramatsu ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Fri, 3 Apr 2020 13:21:13 +0200 Peter Zijlstra wrote: > On Fri, Apr 03, 2020 at 02:28:37PM +0900, Masami Hiramatsu wrote: > > On Thu, 2 Apr 2020 16:13:08 +0200 > > Peter Zijlstra wrote: > > > > Masami, Boris, is there any semi-sane way we can have insn_is_fpu() ? > > > While digging through various opcode manuals is of course forever fun, I > > > do feel like it might not be the best way. > > > > Yes, it is possible to add INAT_FPU and insn_is_fpu(). > > But it seems that the below patch needs more classification based on > > nmemonic or opcodes. > > I went with opcode, and I think I did a fairly decent job, but I did > find a few problems on a second look at things. > > I don't think nmemonic are going to help, the x86 nmemonics are a mess > (much like its opcode tables), there's no way to sanely detect what > registers are effected by an instruction based on name. > > The best I came up with is operand class, see below. Yeah, so we need another map, current inat map is optimized for decoding, and lack of some information for reducing size. E.g. it mixed up the VEX prefix instruction with non-VEX one. > > > IMHO, it is the time to expand gen-insn-attr.awk or clone it to > > generate another opcode map, so that user will easily extend the > > insn infrastructure. > > (e.g. I had made an in-kernel disassembler, which generates a mnemonic > > maps from x86-opcode-map.txt) > > https://github.com/mhiramat/linux/commits/inkernel-disasm-20130414 > > Cute, and I'm thinking we might want that eventually, people have been > asking for a kernel specific objdump, one that knows about and shows all > the magical things the kernel does, like alternative, jump-labels and > soon the static_call stuff, but also things like the exception handling. > > Objtool actually knows about much of that, and pairing it with your > disassembler could print it. > > > > + if (insn.vex_prefix.nbytes) { > > > + *type = INSN_FPU; > > > return 0; > > > + } > > So that's the AVX nonsense dealt with; right until they stick an integer > instruction in the AVX space I suppose :/ Please tell me they didn't > already do that.. I'm not so sure. Theoretically, x86 instruction can be encoded with VEX prefix instead of REX prefix (most compiler may not output such inefficient code.) > > > op1 = insn.opcode.bytes[0]; > > > op2 = insn.opcode.bytes[1]; > > > @@ -357,48 +359,71 @@ int arch_decode_instruction(struct elf *elf, struct > > > section *sec, > > > > > > case 0x0f: > > > > > > + switch (op2) { > > > > + case 0xae: > > > + /* insane!! */ > > > + if ((modrm_reg >= 0 && modrm_reg <= 3) && modrm_mod != > > > 3 && !insn.prefixes.nbytes) > > > + *type = INSN_FPU; > > > + break; > > This is crazy, but I was trying to get at the x86 FPU control > instructions: > > FXSAVE, FXRSTOR, LDMXCSR and STMXCSR > > Which are in Grp15 Yes, that is a complex part. > Now arguably, I could skip them, the compiler should never emit those, > and the newer, fancier, XSAV family isn't marked as FPU either, even > though it will save/restore the FPU/MMX/SSE/AVX states too. > > So I think I'll remove this part, it'll also make the fpu_safe > annotations easier. > > > > + case 0x10 ... 0x17: > > > + case 0x28 ... 0x2f: > > > + case 0x3a: > > > + case 0x50 ... 0x77: > > > + case 0x7a ... 0x7f: > > > + case 0xc2: > > > + case 0xc4 ... 0xc6: > > > + case 0xd0 ... 0xff: > > > + /* MMX, SSE, VMX */ > > So afaict these are the MMX and SSE instruction (clearly the VMX is my > brain loosing it). > > I went with the coder64 opcode tables, but our x86-opcode-map.txt seems > to agree, mostly. > > I now see that 0f 3a is not all mmx/sse, it also includes RORX which is > an integer instruction. Also, may I state that the opcode map is a > sodding disgrace? Why is an integer instruction stuck in the middle of > SSE instructions like that ?!?! > > And I should shorten the last range to 0xd0 ... 0xfe, as 0f ff is UD0. > > Other than that I think this is pretty accurate. > > > > + *type = INSN_FPU; > > > + break; > > > + > > > + default: > > > + break; > > > + } > > > break; > > > > > > case 0xc9: > > > @@ -414,6 +439,10 @@ int arch_decode_instruction(struct elf *elf, struct > > > section *sec, > > > > > > break; > > > > > > + case 0xd8 ... 0xdf: /* x87 FPU range */ > > > + *type = INSN_FPU; > > > + break; > > Our x86-opcode-map.txt lists that as ESC, but doesn't have an escape > table for it. Per: > > http://ref.x86asm.net/coder64.html > > these are all the traditional x87 FPU ops. Yes, for decoding, we don't need those tables. > > > + > > > case 0xe3: > > > /* jecxz/jrcxz */ > > > *type = INSN_JUMP_CONDITIONAL; > > >
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Fri, Apr 03, 2020 at 02:28:37PM +0900, Masami Hiramatsu wrote: > On Thu, 2 Apr 2020 16:13:08 +0200 > Peter Zijlstra wrote: > > Masami, Boris, is there any semi-sane way we can have insn_is_fpu() ? > > While digging through various opcode manuals is of course forever fun, I > > do feel like it might not be the best way. > > Yes, it is possible to add INAT_FPU and insn_is_fpu(). > But it seems that the below patch needs more classification based on > nmemonic or opcodes. I went with opcode, and I think I did a fairly decent job, but I did find a few problems on a second look at things. I don't think nmemonic are going to help, the x86 nmemonics are a mess (much like its opcode tables), there's no way to sanely detect what registers are effected by an instruction based on name. The best I came up with is operand class, see below. > IMHO, it is the time to expand gen-insn-attr.awk or clone it to > generate another opcode map, so that user will easily extend the > insn infrastructure. > (e.g. I had made an in-kernel disassembler, which generates a mnemonic > maps from x86-opcode-map.txt) > https://github.com/mhiramat/linux/commits/inkernel-disasm-20130414 Cute, and I'm thinking we might want that eventually, people have been asking for a kernel specific objdump, one that knows about and shows all the magical things the kernel does, like alternative, jump-labels and soon the static_call stuff, but also things like the exception handling. Objtool actually knows about much of that, and pairing it with your disassembler could print it. > > + if (insn.vex_prefix.nbytes) { > > + *type = INSN_FPU; > > return 0; > > + } So that's the AVX nonsense dealt with; right until they stick an integer instruction in the AVX space I suppose :/ Please tell me they didn't already do that.. > > > > op1 = insn.opcode.bytes[0]; > > op2 = insn.opcode.bytes[1]; > > @@ -357,48 +359,71 @@ int arch_decode_instruction(struct elf *elf, struct > > section *sec, > > > > case 0x0f: > > > > + switch (op2) { > > + case 0xae: > > + /* insane!! */ > > + if ((modrm_reg >= 0 && modrm_reg <= 3) && modrm_mod != > > 3 && !insn.prefixes.nbytes) > > + *type = INSN_FPU; > > + break; This is crazy, but I was trying to get at the x86 FPU control instructions: FXSAVE, FXRSTOR, LDMXCSR and STMXCSR Which are in Grp15 Now arguably, I could skip them, the compiler should never emit those, and the newer, fancier, XSAV family isn't marked as FPU either, even though it will save/restore the FPU/MMX/SSE/AVX states too. So I think I'll remove this part, it'll also make the fpu_safe annotations easier. > > + case 0x10 ... 0x17: > > + case 0x28 ... 0x2f: > > + case 0x3a: > > + case 0x50 ... 0x77: > > + case 0x7a ... 0x7f: > > + case 0xc2: > > + case 0xc4 ... 0xc6: > > + case 0xd0 ... 0xff: > > + /* MMX, SSE, VMX */ So afaict these are the MMX and SSE instruction (clearly the VMX is my brain loosing it). I went with the coder64 opcode tables, but our x86-opcode-map.txt seems to agree, mostly. I now see that 0f 3a is not all mmx/sse, it also includes RORX which is an integer instruction. Also, may I state that the opcode map is a sodding disgrace? Why is an integer instruction stuck in the middle of SSE instructions like that ?!?! And I should shorten the last range to 0xd0 ... 0xfe, as 0f ff is UD0. Other than that I think this is pretty accurate. > > + *type = INSN_FPU; > > + break; > > + > > + default: > > + break; > > + } > > break; > > > > case 0xc9: > > @@ -414,6 +439,10 @@ int arch_decode_instruction(struct elf *elf, struct > > section *sec, > > > > break; > > > > + case 0xd8 ... 0xdf: /* x87 FPU range */ > > + *type = INSN_FPU; > > + break; Our x86-opcode-map.txt lists that as ESC, but doesn't have an escape table for it. Per: http://ref.x86asm.net/coder64.html these are all the traditional x87 FPU ops. > > + > > case 0xe3: > > /* jecxz/jrcxz */ > > *type = INSN_JUMP_CONDITIONAL; Now; I suppose I need our x86-opcode-map.txt extended in at least two ways: - all those x87 FPU instructions need adding - a way of detecting the affected register set Now, I suspect we can do that latter by the instruction operands that are already there, although I've not managed to untangle them fully (hint, we really should improve the comments on top). Operands seem to have one capital that denotes the class: - I: immediate - G: general purpose - E - P,Q: MMX - V,M,W,H: SSE So if we can extend the awk magic to provide operand classes for each decoded instruction, then that would simplify this lots. New version below... --- ---
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Thu, 2 Apr 2020 16:13:08 +0200 Peter Zijlstra wrote: > On Thu, Apr 02, 2020 at 09:33:54AM +0200, Christian König wrote: > > Hi Jann, > > > > Am 02.04.20 um 04:34 schrieb Jann Horn: > > > [x86 folks in CC so that they can chime in on the precise rules for this > > > stuff] > > > > > > Hi! > > > > > > I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/ > > > turn on floating-point instructions in the compiler flags > > > (-mhard-float, -msse and -msse2) in order to make the "float" and > > > "double" types usable from C code without requiring helper functions. > > > > > > However, as far as I know, code running in normal kernel context isn't > > > allowed to use floating-point registers without special protection > > > using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also > > > require that the protected code never blocks). If you violate that > > > rule, that can lead to various issues - among other things, I think > > > the kernel will clobber userspace FPU register state, and I think the > > > kernel code can blow up if a context switch happens at the wrong time, > > > since in-kernel task switches don't preserve FPU state. > > > > > > Is there some hidden trick I'm missing that makes it okay to use FPU > > > registers here? > > > > > > I would try testing this, but unfortunately none of the AMD devices I > > > have here have the appropriate graphics hardware... > > > > yes, using the floating point calculations in the display code has been a > > source of numerous problems and confusion in the past. > > > > The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind the > > DC_FP_START() and DC_FP_END() macros which are supposed to hide the > > architecture depend handling for x86 and PPC64. > > > > This originated from the graphics block integrated into AMD CPU (where we > > knew which fp unit we had), but as far as I know is now also used for > > dedicated AMD GPUs as well. > > > > I'm not really a fan of this either, but so far we weren't able to convince > > the hardware engineers to not use floating point calculations for the > > display stuff. > > Might I complain that: > > make O=allmodconfig-build drivers/gpu/drm/amd/display/dc/ > > does not in fact work? > > Anyway, how do people feel about something like the below? > > Masami, Boris, is there any semi-sane way we can have insn_is_fpu() ? > While digging through various opcode manuals is of course forever fun, I > do feel like it might not be the best way. Yes, it is possible to add INAT_FPU and insn_is_fpu(). But it seems that the below patch needs more classification based on nmemonic or opcodes. IMHO, it is the time to expand gen-insn-attr.awk or clone it to generate another opcode map, so that user will easily extend the insn infrastructure. (e.g. I had made an in-kernel disassembler, which generates a mnemonic maps from x86-opcode-map.txt) https://github.com/mhiramat/linux/commits/inkernel-disasm-20130414 Thank you, > > --- > arch/x86/include/asm/fpu/api.h | 7 > arch/x86/include/asm/fpu/internal.h | 11 ++ > arch/x86/kernel/fpu/init.c | 5 +++ > tools/objtool/arch.h| 1 + > tools/objtool/arch/x86/decode.c | 71 > ++--- > tools/objtool/check.c | 58 ++ > tools/objtool/check.h | 3 +- > 7 files changed, 134 insertions(+), 22 deletions(-) > > diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h > index b774c52e5411..b9bca1cdc875 100644 > --- a/arch/x86/include/asm/fpu/api.h > +++ b/arch/x86/include/asm/fpu/api.h > @@ -12,6 +12,13 @@ > #define _ASM_X86_FPU_API_H > #include > > +#define annotate_fpu() ({\ > + asm volatile("%c0:\n\t" \ > + ".pushsection .discard.fpu_safe\n\t" \ > + ".long %c0b - .\n\t" \ > + ".popsection\n\t" : : "i" (__COUNTER__)); \ > +}) > + > /* > * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It > * disables preemption so be careful if you intend to use it for long periods > diff --git a/arch/x86/include/asm/fpu/internal.h > b/arch/x86/include/asm/fpu/internal.h > index 44c48e34d799..bc436213a0d4 100644 > --- a/arch/x86/include/asm/fpu/internal.h > +++ b/arch/x86/include/asm/fpu/internal.h > @@ -102,6 +102,11 @@ static inline void fpstate_init_fxstate(struct > fxregs_state *fx) > } > extern void fpstate_sanitize_xstate(struct fpu *fpu); > > +#define _ASM_ANNOTATE_FPU(at) > \ > + ".pushsection .discard.fpu_safe\n" \ > + ".long " #at " - .\n" \ > + ".popsection\n"\ > + > #define
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Thu, Apr 2, 2020 at 11:36 AM Thomas Gleixner wrote: > Jann Horn writes: > > On Thu, Apr 2, 2020 at 9:34 AM Christian König > > wrote: > >> Am 02.04.20 um 04:34 schrieb Jann Horn: > >> > [x86 folks in CC so that they can chime in on the precise rules for > >> > this stuff] > > They are pretty simple. > > Any code using FPU needs to be completely isolated from regular code > either by using inline asm or by moving it to a different compilation > unit. The invocations need fpu_begin/end() of course. [...] > We really need objtool support to validate that. > > Peter, now that we know how to do it (noinstr, clac/stac) we can emit > annotations (see patch below) and validate that any FPU instruction is > inside a safe region. Hmm? One annoying aspect is that for the "move it to a different compilation unit" method, objtool needs to know at compile time (before linking) which functions are in FPU-enabled object files, right? So we'd need to have some sort of function annotation that gets plumbed from the function declaration in a header file through the compiler into the ELF file, and then let objtool verify that calls to FPU-enabled methods occur only when the FPU is available? (Ideally something that covers indirect calls... but this would probably get really complicated unless we can get the compiler to include that annotation in its type checking.) ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Thu, Apr 02, 2020 at 09:33:54AM +0200, Christian König wrote: > Hi Jann, > > Am 02.04.20 um 04:34 schrieb Jann Horn: > > [x86 folks in CC so that they can chime in on the precise rules for this > > stuff] > > > > Hi! > > > > I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/ > > turn on floating-point instructions in the compiler flags > > (-mhard-float, -msse and -msse2) in order to make the "float" and > > "double" types usable from C code without requiring helper functions. > > > > However, as far as I know, code running in normal kernel context isn't > > allowed to use floating-point registers without special protection > > using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also > > require that the protected code never blocks). If you violate that > > rule, that can lead to various issues - among other things, I think > > the kernel will clobber userspace FPU register state, and I think the > > kernel code can blow up if a context switch happens at the wrong time, > > since in-kernel task switches don't preserve FPU state. > > > > Is there some hidden trick I'm missing that makes it okay to use FPU > > registers here? > > > > I would try testing this, but unfortunately none of the AMD devices I > > have here have the appropriate graphics hardware... > > yes, using the floating point calculations in the display code has been a > source of numerous problems and confusion in the past. > > The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind the > DC_FP_START() and DC_FP_END() macros which are supposed to hide the > architecture depend handling for x86 and PPC64. > > This originated from the graphics block integrated into AMD CPU (where we > knew which fp unit we had), but as far as I know is now also used for > dedicated AMD GPUs as well. > > I'm not really a fan of this either, but so far we weren't able to convince > the hardware engineers to not use floating point calculations for the > display stuff. Might I complain that: make O=allmodconfig-build drivers/gpu/drm/amd/display/dc/ does not in fact work? Anyway, how do people feel about something like the below? Masami, Boris, is there any semi-sane way we can have insn_is_fpu() ? While digging through various opcode manuals is of course forever fun, I do feel like it might not be the best way. --- arch/x86/include/asm/fpu/api.h | 7 arch/x86/include/asm/fpu/internal.h | 11 ++ arch/x86/kernel/fpu/init.c | 5 +++ tools/objtool/arch.h| 1 + tools/objtool/arch/x86/decode.c | 71 ++--- tools/objtool/check.c | 58 ++ tools/objtool/check.h | 3 +- 7 files changed, 134 insertions(+), 22 deletions(-) diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h index b774c52e5411..b9bca1cdc875 100644 --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -12,6 +12,13 @@ #define _ASM_X86_FPU_API_H #include +#define annotate_fpu() ({ \ + asm volatile("%c0:\n\t" \ +".pushsection .discard.fpu_safe\n\t" \ +".long %c0b - .\n\t" \ +".popsection\n\t" : : "i" (__COUNTER__)); \ +}) + /* * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It * disables preemption so be careful if you intend to use it for long periods diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 44c48e34d799..bc436213a0d4 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -102,6 +102,11 @@ static inline void fpstate_init_fxstate(struct fxregs_state *fx) } extern void fpstate_sanitize_xstate(struct fpu *fpu); +#define _ASM_ANNOTATE_FPU(at) \ +".pushsection .discard.fpu_safe\n" \ +".long " #at " - .\n" \ +".popsection\n"\ + #define user_insn(insn, output, input...) \ ({ \ int err;\ @@ -116,6 +121,7 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu); "jmp 2b\n"\ ".previous\n" \ _ASM_EXTABLE(1b, 3b) \ +_ASM_ANNOTATE_FPU(1b) \ : [err] "=r" (err), output \ : "0"(0), input); \
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
On Thu, Apr 2, 2020 at 9:34 AM Christian König wrote: > Am 02.04.20 um 04:34 schrieb Jann Horn: > > [x86 folks in CC so that they can chime in on the precise rules for this > > stuff] > > I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/ > > turn on floating-point instructions in the compiler flags > > (-mhard-float, -msse and -msse2) in order to make the "float" and > > "double" types usable from C code without requiring helper functions. > > > > However, as far as I know, code running in normal kernel context isn't > > allowed to use floating-point registers without special protection > > using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also > > require that the protected code never blocks). If you violate that > > rule, that can lead to various issues - among other things, I think > > the kernel will clobber userspace FPU register state, and I think the > > kernel code can blow up if a context switch happens at the wrong time, > > since in-kernel task switches don't preserve FPU state. > > > > Is there some hidden trick I'm missing that makes it okay to use FPU > > registers here? > > > > I would try testing this, but unfortunately none of the AMD devices I > > have here have the appropriate graphics hardware... > > yes, using the floating point calculations in the display code has been > a source of numerous problems and confusion in the past. > > The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind > the DC_FP_START() and DC_FP_END() macros which are supposed to hide the > architecture depend handling for x86 and PPC64. Hmm... but as far as I can tell, you're using those macros from inside functions that are already compiled with the FPU on: - drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c uses the macros, but is already compiled with calcs_ccflags - drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c uses the macros, but is already compiled with "-mhard-float -msse -msse2" - drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c uses the macros, but is already compiled with "-mhard-float -msse -msse2" AFAIK as soon as you enter any function in any file compiled with FPU instructions, you may encounter SSE instructions, e.g. via things like compiler-generated memory-zeroing code - not just when you're actually using doubles or floats. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
Jann Horn writes: > On Thu, Apr 2, 2020 at 9:34 AM Christian König > wrote: >> Am 02.04.20 um 04:34 schrieb Jann Horn: >> > [x86 folks in CC so that they can chime in on the precise rules for >> > this stuff] They are pretty simple. Any code using FPU needs to be completely isolated from regular code either by using inline asm or by moving it to a different compilation unit. The invocations need fpu_begin/end() of course. >> > I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/ >> > turn on floating-point instructions in the compiler flags >> > (-mhard-float, -msse and -msse2) in order to make the "float" and >> > "double" types usable from C code without requiring helper functions. >> > >> > However, as far as I know, code running in normal kernel context isn't >> > allowed to use floating-point registers without special protection >> > using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also >> > require that the protected code never blocks). If you violate that >> > rule, that can lead to various issues - among other things, I think >> > the kernel will clobber userspace FPU register state, and I think the >> > kernel code can blow up if a context switch happens at the wrong time, >> > since in-kernel task switches don't preserve FPU state. >> > >> > Is there some hidden trick I'm missing that makes it okay to use FPU >> > registers here? >> > >> > I would try testing this, but unfortunately none of the AMD devices I >> > have here have the appropriate graphics hardware... >> >> yes, using the floating point calculations in the display code has been >> a source of numerous problems and confusion in the past. >> >> The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind >> the DC_FP_START() and DC_FP_END() macros which are supposed to hide the >> architecture depend handling for x86 and PPC64. > > Hmm... but as far as I can tell, you're using those macros from inside > functions that are already compiled with the FPU on: > > - drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c uses the macros, > but is already compiled with calcs_ccflags > - drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c uses the > macros, but is already compiled with "-mhard-float -msse -msse2" > - drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c uses the > macros, but is already compiled with "-mhard-float -msse -msse2" > > AFAIK as soon as you enter any function in any file compiled with FPU > instructions, you may encounter SSE instructions, e.g. via things like > compiler-generated memory-zeroing code - not just when you're actually > using doubles or floats. That's correct and this will silently cause FPU state corruption ... We really need objtool support to validate that. Peter, now that we know how to do it (noinstr, clac/stac) we can emit annotations (see patch below) and validate that any FPU instruction is inside a safe region. Hmm? Thanks, tglx 8<--- --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -19,8 +19,27 @@ * If you intend to use the FPU in softirq you need to check first with * irq_fpu_usable() if it is possible. */ -extern void kernel_fpu_begin(void); -extern void kernel_fpu_end(void); +extern void __kernel_fpu_begin(void); +extern void __kernel_fpu_end(void); + +static inline void kernel_fpu_begin(void) +{ + asm volatile("%c0:\n\t" +".pushsection .discard.fpu_begin\n\t" +".long %c0b - .\n\t" +".popsection\n\t" : : "i" (__COUNTER__)); + __kernel_fpu_begin(); +} + +static inline void kernel_fpu_end(void) +{ + __kernel_fpu_end(); + asm volatile("%c0:\n\t" +".pushsection .discard.fpu_end\n\t" +".long %c0b - .\n\t" +".popsection\n\t" : : "i" (__COUNTER__)); +} + extern bool irq_fpu_usable(void); extern void fpregs_mark_activate(void); --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -82,7 +82,7 @@ bool irq_fpu_usable(void) } EXPORT_SYMBOL(irq_fpu_usable); -void kernel_fpu_begin(void) +void __kernel_fpu_begin(void) { preempt_disable(); @@ -102,16 +102,16 @@ void kernel_fpu_begin(void) } __cpu_invalidate_fpregs_state(); } -EXPORT_SYMBOL_GPL(kernel_fpu_begin); +EXPORT_SYMBOL_GPL(__kernel_fpu_begin); -void kernel_fpu_end(void) +void __kernel_fpu_end(void) { WARN_ON_FPU(!this_cpu_read(in_kernel_fpu)); this_cpu_write(in_kernel_fpu, false); preempt_enable(); } -EXPORT_SYMBOL_GPL(kernel_fpu_end); +EXPORT_SYMBOL_GPL(__kernel_fpu_end); /* * Save the FPU state (mark it for reload if necessary): ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
Hi Jann, Am 02.04.20 um 04:34 schrieb Jann Horn: [x86 folks in CC so that they can chime in on the precise rules for this stuff] Hi! I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/ turn on floating-point instructions in the compiler flags (-mhard-float, -msse and -msse2) in order to make the "float" and "double" types usable from C code without requiring helper functions. However, as far as I know, code running in normal kernel context isn't allowed to use floating-point registers without special protection using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also require that the protected code never blocks). If you violate that rule, that can lead to various issues - among other things, I think the kernel will clobber userspace FPU register state, and I think the kernel code can blow up if a context switch happens at the wrong time, since in-kernel task switches don't preserve FPU state. Is there some hidden trick I'm missing that makes it okay to use FPU registers here? I would try testing this, but unfortunately none of the AMD devices I have here have the appropriate graphics hardware... yes, using the floating point calculations in the display code has been a source of numerous problems and confusion in the past. The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind the DC_FP_START() and DC_FP_END() macros which are supposed to hide the architecture depend handling for x86 and PPC64. This originated from the graphics block integrated into AMD CPU (where we knew which fp unit we had), but as far as I know is now also used for dedicated AMD GPUs as well. I'm not really a fan of this either, but so far we weren't able to convince the hardware engineers to not use floating point calculations for the display stuff. Regards, Christian. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx