amdgpu kmemleaks

2024-02-28 Thread Borislav Petkov
Hi folks,

anyone interested in a bunch of amdgpu kmemleak reports from latest Linus tree
+ tip?

GPU is:

[   11.317312] [drm] amdgpu kernel modesetting enabled.
[   11.363627] [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 
0x103C:0x807E 0xC4).
[   11.364077] [drm] register mmio base: 0xD0C0
[   11.364547] [drm] register mmio size: 262144
[   11.365347] [drm] add ip block number 0 
[   11.365580] [drm] add ip block number 1 
[   11.365840] [drm] add ip block number 2 
[   11.366047] [drm] add ip block number 3 
[   11.366263] [drm] add ip block number 4 
[   11.366470] [drm] add ip block number 5 
[   11.32] [drm] add ip block number 6 
[   11.366835] [drm] add ip block number 7 
[   11.367022] [drm] add ip block number 8 
[   11.382774] [drm] BIOS signature incorrect 5b 7
[   11.383002] resource: resource sanity check: requesting [mem 
0x000c-0x000d], which spans more than PCI Bus :00 
[mem 0x000c-0x000cbfff window]
[   11.383655] caller pci_map_rom+0x68/0x1d0 mapping multiple BARs
[   11.384009] amdgpu :00:01.0: amdgpu: Fetched VBIOS from ROM BAR
[   11.384402] amdgpu: ATOM BIOS: SWBRT27354.001
[   11.385827] [drm] UVD is enabled in physical mode
[   11.386063] [drm] VCE enabled in physical mode
[   11.386886] amdgpu :00:01.0: vgaarb: deactivate vga console
[   11.389089] Console: switching to colour dummy device 80x25
[   11.389543] amdgpu :00:01.0: amdgpu: Trusted Memory Zone (TMZ) feature 
not supported
[   11.390482] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment 
size is 9-bit
[   11.390793] amdgpu :00:01.0: amdgpu: VRAM: 512M 0x00F4 - 
0x00F41FFF (512M used)
[   11.391129] amdgpu :00:01.0: amdgpu: GART: 1024M 0x00FF - 
0x00FF3FFF
[   11.391456] [drm] Detected VRAM RAM=512M, BAR=512M
[   11.391632] [drm] RAM width 128bits UNKNOWN
[   11.394546] [drm] amdgpu: 512M of VRAM memory ready
[   11.394751] [drm] amdgpu: 7622M of GTT memory ready.
[   11.395299] [drm] GART: num cpu pages 262144, num gpu pages 262144
[   11.395813] [drm] PCIE GART of 1024M enabled (table at 0x00F400A0).
[   11.404914] amdgpu: hwmgr_sw_init smu backed is smu8_smu
[   11.407177] [drm] Found UVD firmware Version: 1.91 Family ID: 11
[   11.407670] [drm] UVD ENC is disabled
[   11.409969] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[   11.412601] amdgpu: smu version 18.62.00
[   11.419275] [drm] DM_PPLIB: values for Engine clock
[   11.419480] [drm] DM_PPLIB:   30
[   11.419610] [drm] DM_PPLIB:   36
[   11.419740] [drm] DM_PPLIB:   423530
[   11.419869] [drm] DM_PPLIB:   514290
[   11.419998] [drm] DM_PPLIB:   626090
[   11.420127] [drm] DM_PPLIB:   72
[   11.420327] [drm] DM_PPLIB: Validation clocks:
[   11.420536] [drm] DM_PPLIB:engine_max_clock: 72000
[   11.420722] [drm] DM_PPLIB:memory_max_clock: 8
[   11.420907] [drm] DM_PPLIB:level   : 8
[   11.421083] [drm] DM_PPLIB: values for Display clock
[   11.421266] [drm] DM_PPLIB:   30
[   11.421395] [drm] DM_PPLIB:   40
[   11.421524] [drm] DM_PPLIB:   496560
[   11.421652] [drm] DM_PPLIB:   626090
[   11.421781] [drm] DM_PPLIB:   685720
[   11.421910] [drm] DM_PPLIB:   757900
[   11.422039] [drm] DM_PPLIB: Validation clocks:
[   11.422201] [drm] DM_PPLIB:engine_max_clock: 72000
[   11.422386] [drm] DM_PPLIB:memory_max_clock: 8
[   11.422572] [drm] DM_PPLIB:level   : 8
[   11.422746] [drm] DM_PPLIB: values for Memory clock
[   11.422923] [drm] DM_PPLIB:   333000
[   11.423052] [drm] DM_PPLIB:   80
[   11.423181] [drm] DM_PPLIB: Validation clocks:
[   11.423342] [drm] DM_PPLIB:engine_max_clock: 72000
[   11.423528] [drm] DM_PPLIB:memory_max_clock: 8
[   11.423713] [drm] DM_PPLIB:level   : 8
[   11.424561] [drm] Display Core v3.2.266 initialized on DCE 11.0
[   11.516117] [drm] UVD initialized successfully.
[   11.716119] [drm] VCE initialized successfully.


unreferenced object 0x88810e6faa80 (size 128):
  comm "systemd-udevd", pid 1219, jiffies 4294895080
  hex dump (first 32 bytes):
18 cb 03 00 00 0a 28 0a 48 0a c8 0a 00 00 a0 05  ..(.H...
aa 05 b4 05 dc 05 00 00 0a 00 00 00 00 00 00 00  
  backtrace (crc 5201319b):
[<6e1e4989>] kmalloc_trace+0x25a/0x300
[<7b61fcfc>] do_detailed_mode+0x323/0x670
[<79955120>] drm_for_each_detailed_block.part.0+0x34/0x180
[<9a087c6a>] _drm_edid_connector_add_modes.part.0+0x8f/0x10b0
[] drm_add_edid_modes+0x14e/0x160
[<0a49b747>] amdgpu_dm_connector_get_modes+0x13b/0x470 [amdgpu]
[<5f5da5a5>] amdgpu_dm_init.isra.0+0x12ed/0x1e50 [amdgpu]
[] dm_hw_init+0xe/0x20 [amdgpu]
[] amdgpu_device_init+0x1f17/0x2530 [amdgpu]
[<9c22ce56>] amdgpu_driver_load_kms+0x23/0x1a0 [amdgpu]
[<8bc75f74>] amdgpu_pci_probe+0x1b5/0x550 [amdg

Re: Error in amd driver?

2024-05-06 Thread Borislav Petkov
+ amd-gfx@lists.freedesktop.org

On Sun, May 05, 2024 at 09:59:22PM +0300, Tranton Baddy wrote:
> I have this in my dmesg since version 6.8.6, not sure when it appeared. Is 
> amdgpu driver has bug?
> [   64.253144] 
> ==
> [   64.253162] BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x51f/0x7a0
> 
> [   64.253183] Use-after-free read at 0x671c48dd (in kfence-#111):
> [   64.253192]  amdgpu_bo_move+0x51f/0x7a0
> [   64.253202]  ttm_bo_handle_move_mem+0xcf/0x180
> [   64.253211]  ttm_mem_evict_first+0x1c5/0x500
> [   64.253218]  ttm_resource_manager_evict_all+0xa3/0x1e0
> [   64.253228]  amdgpu_device_prepare+0x66/0x110
> [   64.253237]  amdgpu_pmops_runtime_suspend+0xbe/0x1c0
> [   64.253248]  pci_pm_runtime_suspend+0x74/0x200
> [   64.253259]  vga_switcheroo_runtime_suspend+0x21/0xb0
> [   64.253268]  __rpm_callback+0x5f/0x190
> [   64.253277]  rpm_callback+0x7f/0x90
> [   64.253283]  rpm_suspend+0x120/0x6a0
> [   64.253290]  pm_runtime_work+0x9c/0xa0
> [   64.253297]  process_one_work+0x164/0x330
> [   64.253310]  worker_thread+0x302/0x430
> [   64.253320]  kthread+0xe4/0x110
> [   64.253329]  ret_from_fork+0x4c/0x60
> [   64.253341]  ret_from_fork_asm+0x1b/0x30
> 
> [   64.253353] kfence-#111: 0xd018cf03-0x34e821d1, size=96, 
> cache=kmalloc-96
> 
> [   64.253363] allocated by task 152 on cpu 3 at 64.248952s:
> [   64.253418]  kmalloc_trace+0x283/0x340
> [   64.253427]  amdgpu_vram_mgr_new+0x8f/0x3f0
> [   64.253435]  ttm_resource_alloc+0x39/0x90
> [   64.253444]  ttm_bo_mem_space+0xa4/0x260
> [   64.253450]  ttm_mem_evict_first+0x18a/0x500
> [   64.253456]  ttm_resource_manager_evict_all+0xa3/0x1e0
> [   64.253465]  amdgpu_device_prepare+0x66/0x110
> [   64.253472]  amdgpu_pmops_runtime_suspend+0xbe/0x1c0
> [   64.253481]  pci_pm_runtime_suspend+0x74/0x200
> [   64.253489]  vga_switcheroo_runtime_suspend+0x21/0xb0
> [   64.253496]  __rpm_callback+0x5f/0x190
> [   64.253503]  rpm_callback+0x7f/0x90
> [   64.253509]  rpm_suspend+0x120/0x6a0
> [   64.253516]  pm_runtime_work+0x9c/0xa0
> [   64.253523]  process_one_work+0x164/0x330
> [   64.253532]  worker_thread+0x302/0x430
> [   64.253542]  kthread+0xe4/0x110
> [   64.253550]  ret_from_fork+0x4c/0x60
> [   64.253559]  ret_from_fork_asm+0x1b/0x30
> 
> [   64.253570] freed by task 152 on cpu 3 at 64.253117s:
> [   64.253582]  ttm_resource_free+0x67/0x90
> [   64.253591]  ttm_bo_move_accel_cleanup+0x247/0x2e0
> [   64.253598]  amdgpu_bo_move+0x1bd/0x7a0
> [   64.253605]  ttm_bo_handle_move_mem+0xcf/0x180
> [   64.253612]  ttm_mem_evict_first+0x1c5/0x500
> [   64.253618]  ttm_resource_manager_evict_all+0xa3/0x1e0
> [   64.253626]  amdgpu_device_prepare+0x66/0x110
> [   64.253634]  amdgpu_pmops_runtime_suspend+0xbe/0x1c0
> [   64.253642]  pci_pm_runtime_suspend+0x74/0x200
> [   64.253650]  vga_switcheroo_runtime_suspend+0x21/0xb0
> [   64.253658]  __rpm_callback+0x5f/0x190
> [   64.253664]  rpm_callback+0x7f/0x90
> [   64.253671]  rpm_suspend+0x120/0x6a0
> [   64.253677]  pm_runtime_work+0x9c/0xa0
> [   64.253684]  process_one_work+0x164/0x330
> [   64.253693]  worker_thread+0x302/0x430
> [   64.253703]  kthread+0xe4/0x110
> [   64.253711]  ret_from_fork+0x4c/0x60
> [   64.253723]  ret_from_fork_asm+0x1b/0x30
> 
> [   64.253735] CPU: 3 PID: 152 Comm: kworker/3:2 Tainted: P   OE  
> 6.8.9 #3 e7323d0d25f89e853881fc823e59523bdcc577c6
> [   64.253756] Hardware name: Hewlett-Packard HP Pavilion Notebook /80B9, 
> BIOS F.54 05/27/2019
> [   64.253761] Workqueue: pm pm_runtime_work
> [   64.253771] 
> ==
> 

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH 02/20] x86/mce: Define mce_setup() helpers for global and per-CPU fields

2023-11-22 Thread Borislav Petkov
On Sat, Nov 18, 2023 at 01:32:30PM -0600, Yazen Ghannam wrote:
> +void mce_setup_global(struct mce *m)

We usually call those things "common":

mce_setup_common().

> +{
> + memset(m, 0, sizeof(struct mce));
> +
> + m->cpuid= cpuid_eax(1);
> + m->cpuvendor= boot_cpu_data.x86_vendor;
> + m->mcgcap   = __rdmsr(MSR_IA32_MCG_CAP);
> + /* need the internal __ version to avoid deadlocks */
> + m->time = __ktime_get_real_seconds();
> +}
> +
> +void mce_setup_per_cpu(struct mce *m)

And call this

mce_setup_for_cpu(unsigned int cpu, struct mce *m);

so that it doesn't look like some per_cpu helper.

And yes, you should supply the CPU number as an argument. Because
otherwise, when you look at your next change:


+   mce_setup_global(&m);
+   m.cpu = m.extcpu = cpu;
+   mce_setup_per_cpu(&m);

This contains the "hidden" requirement that m.extcpu happens *always*
*before* the mce_setup_per_cpu() call and that is flaky and error prone.

So make that:

mce_setup_common(&m);
mce_setup_for_cpu(m.extcpu, &m);

and do m.cpu = m.extcpu = cpu inside the second function.

And then it JustWorks(tm) and you can't "forget" assigning m.extcpu and
there's no subtlety.

Ok?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 03/20] x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error()

2023-11-22 Thread Borislav Petkov
On Sat, Nov 18, 2023 at 01:32:31PM -0600, Yazen Ghannam wrote:
> Current AMD systems may report MCA errors using the ACPI Boot Error
> Record Table (BERT). The BERT entries for MCA errors will be an x86
> Common Platform Error Record (CPER) with an MSR register context that
> matches the MCAX/SMCA register space.
> 
> However, the BERT will not necessarily be processed on the CPU that
> reported the MCA errors. Therefore, the correct CPU number needs to be
> determined and the information saved in struct mce.
> 
> The CPU number is determined by searching all possible CPUs for a Local
> APIC ID matching the value in the x86 CPER.

Those below are explaining what the patch does. Not needed here.

> Set up the MCA record after searching for a CPU number. If no possible
> CPU was found, then return early.
> 
> Gather the global MCA information first, save the found CPU number, then
> gather the per-CPU information.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 05/20] x86/mce/amd: Use helper for UMC bank type check

2023-11-27 Thread Borislav Petkov
On Sat, Nov 18, 2023 at 01:32:33PM -0600, Yazen Ghannam wrote:
> @@ -714,14 +721,10 @@ static bool legacy_mce_is_memory_error(struct mce *m)
>   */
>  static bool smca_mce_is_memory_error(struct mce *m)
>  {
> - enum smca_bank_types bank_type;
> -
>   if (XEC(m->status, 0x3f))
>   return false;
>  
> - bank_type = smca_get_bank_type(m->extcpu, m->bank);
> -
> - return bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2;
> + return smca_umc_bank_type(m->ipid);

return FIELD_GET(MCI_IPID_HWID, ipid) == IPID_TYPE_UMC;

after having done:

#define IPID_TYPE_UMC   0x96;

and you don't need that silly helper.

And then you can do more cleanups ontop by doing

/* Unified Memory Controller MCA type */
{ SMCA_UMC,  HWID_MCATYPE(IPID_TYPE_UMC, 0x0)},
{ SMCA_UMC_V2,   HWID_MCATYPE(IPID_TYPE_UMC, 0x1)},

and have all the numbering properly defined and abstracted away.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 06/20] x86/mce/amd: Use helper for GPU UMC bank type checks

2023-11-27 Thread Borislav Petkov
On Sat, Nov 18, 2023 at 01:32:34PM -0600, Yazen Ghannam wrote:
> +/* GPU UMCs have MCATYPE=0x1.*/
> +bool smca_gpu_umc_bank_type(u64 ipid)
> +{
> + if (!smca_umc_bank_type(ipid))
> + return false;
> +
> + return FIELD_GET(MCI_IPID_MCATYPE, ipid) == 0x1;
> +}

And now this tells you that you want to use

u32 hwid_mcatype;   /* (hwid,mcatype) tuple */

everywhere and do your macros around that thing.

No need for yet another helper as this all is static information which
doesn't change.

> +EXPORT_SYMBOL_GPL(smca_gpu_umc_bank_type);

Definitely not another export.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features

2021-07-28 Thread Borislav Petkov
On Wed, Jul 28, 2021 at 02:17:27PM +0100, Christoph Hellwig wrote:
> So common checks obviously make sense, but I really hate the stupid
> multiplexer.  Having one well-documented helper per feature is much
> easier to follow.

We had that in x86 - it was called cpu_has_ where xxx is the
feature bit. It didn't scale with the sheer amount of feature bits that
kept getting added so we do cpu_feature_enabled(X86_FEATURE_XXX) now.

The idea behind this is very similar - those protected guest flags
will only grow in the couple of tens range - at least - so having a
multiplexer is a lot simpler, I'd say, than having a couple of tens of
helpers. And those PATTR flags should have good, readable names, btw.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 01/12] x86/ioremap: Selectively build arch override encryption functions

2021-08-16 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:20AM -0500, Tom Lendacky wrote:
> In prep for other uses of the prot_guest_has() function besides AMD's
> memory encryption support, selectively build the AMD memory encryption
> architecture override functions only when CONFIG_AMD_MEM_ENCRYPT=y. These
> functions are:
> - early_memremap_pgprot_adjust()
> - arch_memremap_can_ram_remap()
> 
> Additionally, routines that are only invoked by these architecture
> override functions can also be conditionally built. These functions are:
> - memremap_should_map_decrypted()
> - memremap_is_efi_data()
> - memremap_is_setup_data()
> - early_memremap_is_setup_data()
> 
> And finally, phys_mem_access_encrypted() is conditionally built as well,
> but requires a static inline version of it when CONFIG_AMD_MEM_ENCRYPT is
> not set.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Peter Zijlstra 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/io.h | 8 
>  arch/x86/mm/ioremap.c | 2 +-
>  2 files changed, 9 insertions(+), 1 deletion(-)

LGTM.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-16 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:22AM -0500, Tom Lendacky wrote:
> diff --git a/arch/x86/include/asm/protected_guest.h 
> b/arch/x86/include/asm/protected_guest.h
> new file mode 100644
> index ..51e4eefd9542
> --- /dev/null
> +++ b/arch/x86/include/asm/protected_guest.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Protected Guest (and Host) Capability checks
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky 
> + */
> +
> +#ifndef _X86_PROTECTED_GUEST_H
> +#define _X86_PROTECTED_GUEST_H
> +
> +#include 
> +
> +#ifndef __ASSEMBLY__
> +
> +static inline bool prot_guest_has(unsigned int attr)
> +{
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + if (sme_me_mask)
> + return amd_prot_guest_has(attr);
> +#endif
> +
> + return false;
> +}
> +
> +#endif   /* __ASSEMBLY__ */
> +
> +#endif   /* _X86_PROTECTED_GUEST_H */

I think this can be simplified more, diff ontop below:

- no need for the ifdeffery as amd_prot_guest_has() has versions for
both when CONFIG_AMD_MEM_ENCRYPT is set or not.

- the sme_me_mask check is pushed there too.

- and since this is vendor-specific, I'm checking the vendor bit. Yeah,
yeah, cross-vendor but I don't really believe that.

---
diff --git a/arch/x86/include/asm/protected_guest.h 
b/arch/x86/include/asm/protected_guest.h
index 51e4eefd9542..8541c76d5da4 100644
--- a/arch/x86/include/asm/protected_guest.h
+++ b/arch/x86/include/asm/protected_guest.h
@@ -12,18 +12,13 @@
 
 #include 
 
-#ifndef __ASSEMBLY__
-
 static inline bool prot_guest_has(unsigned int attr)
 {
-#ifdef CONFIG_AMD_MEM_ENCRYPT
-   if (sme_me_mask)
+   if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+   boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
return amd_prot_guest_has(attr);
-#endif
 
return false;
 }
 
-#endif /* __ASSEMBLY__ */
-
 #endif /* _X86_PROTECTED_GUEST_H */
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index edc67ddf065d..5a0442a6f072 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -392,6 +392,9 @@ bool noinstr sev_es_active(void)
 
 bool amd_prot_guest_has(unsigned int attr)
 {
+   if (!sme_me_mask)
+   return false;
+
switch (attr) {
case PATTR_MEM_ENCRYPT:
return sme_me_mask != 0;

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 02/12] mm: Introduce a function to check for virtualization protection features

2021-08-16 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:21AM -0500, Tom Lendacky wrote:
> In prep for other protected virtualization technologies, introduce a
> generic helper function, prot_guest_has(), that can be used to check
> for specific protection attributes, like memory encryption. This is
> intended to eliminate having to add multiple technology-specific checks
> to the code (e.g. if (sev_active() || tdx_active())).
> 
> Reviewed-by: Joerg Roedel 
> Co-developed-by: Andi Kleen 
> Signed-off-by: Andi Kleen 
> Co-developed-by: Kuppuswamy Sathyanarayanan 
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/Kconfig|  3 +++
>  include/linux/protected_guest.h | 35 +
>  2 files changed, 38 insertions(+)
>  create mode 100644 include/linux/protected_guest.h
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 98db63496bab..bd4f60c581f1 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1231,6 +1231,9 @@ config RELR
>  config ARCH_HAS_MEM_ENCRYPT
>   bool
>  
> +config ARCH_HAS_PROTECTED_GUEST
> + bool
> +
>  config HAVE_SPARSE_SYSCALL_NR
> bool
> help
> diff --git a/include/linux/protected_guest.h b/include/linux/protected_guest.h
> new file mode 100644
> index ..43d4dde94793
> --- /dev/null
> +++ b/include/linux/protected_guest.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Protected Guest (and Host) Capability checks
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky 
> + */
> +
> +#ifndef _PROTECTED_GUEST_H
> +#define _PROTECTED_GUEST_H
> +
> +#ifndef __ASSEMBLY__
   ^

Do you really need that guard? It builds fine without it too. Or
something coming later does need it...?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-16 Thread Borislav Petkov
On Sun, Aug 15, 2021 at 08:53:31AM -0500, Tom Lendacky wrote:
> It's not a cross-vendor thing as opposed to a KVM or other hypervisor
> thing where the family doesn't have to be reported as AMD or HYGON.

What would be the use case? A HV starts a guest which is supposed to be
encrypted using the AMD's confidential guest technology but the HV tells
the guest that it is not running on an AMD SVM HV but something else?

Is that even an actual use case?

Or am I way off?

I know we have talked about this in the past but this still sounds
insane.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 09/12] mm: Remove the now unused mem_encrypt_active() function

2021-08-17 Thread Borislav Petkov
On Tue, Aug 17, 2021 at 12:22:33PM +0200, Borislav Petkov wrote:
> This one wants to be part of the previous patch.

... and the three following patches too - the treewide patch does a
single atomic :) replacement and that's it.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 05/12] x86/sme: Replace occurrences of sme_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:24AM -0500, Tom Lendacky wrote:
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index edc67ddf065d..5635ca9a1fbe 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -144,7 +144,7 @@ void __init sme_unmap_bootdata(char *real_mode_data)
>   struct boot_params *boot_data;
>   unsigned long cmdline_paddr;
>  
> - if (!sme_active())
> + if (!amd_prot_guest_has(PATTR_SME))
>   return;
>  
>   /* Get the command line address before unmapping the real_mode_data */
> @@ -164,7 +164,7 @@ void __init sme_map_bootdata(char *real_mode_data)
>   struct boot_params *boot_data;
>   unsigned long cmdline_paddr;
>  
> - if (!sme_active())
> + if (!amd_prot_guest_has(PATTR_SME))
>   return;
>  
>   __sme_early_map_unmap_mem(real_mode_data, sizeof(boot_params), true);
> @@ -378,7 +378,7 @@ bool sev_active(void)
>   return sev_status & MSR_AMD64_SEV_ENABLED;
>  }
>  
> -bool sme_active(void)
> +static bool sme_active(void)

Just get rid of it altogether. Also, there's an

EXPORT_SYMBOL_GPL(sev_active);

which needs to go under the actual function. Here's a diff ontop:

---
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 5635ca9a1fbe..a3a2396362a5 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -364,8 +364,9 @@ int __init early_set_memory_encrypted(unsigned long vaddr, 
unsigned long size)
 /*
  * SME and SEV are very similar but they are not the same, so there are
  * times that the kernel will need to distinguish between SME and SEV. The
- * sme_active() and sev_active() functions are used for this.  When a
- * distinction isn't needed, the mem_encrypt_active() function can be used.
+ * PATTR_HOST_MEM_ENCRYPT and PATTR_GUEST_MEM_ENCRYPT flags to
+ * amd_prot_guest_has() are used for this. When a distinction isn't needed,
+ * the mem_encrypt_active() function can be used.
  *
  * The trampoline code is a good example for this requirement.  Before
  * paging is activated, SME will access all memory as decrypted, but SEV
@@ -377,11 +378,6 @@ bool sev_active(void)
 {
return sev_status & MSR_AMD64_SEV_ENABLED;
 }
-
-static bool sme_active(void)
-{
-   return sme_me_mask && !sev_active();
-}
 EXPORT_SYMBOL_GPL(sev_active);
 
 /* Needs to be called from non-instrumentable code */
@@ -398,7 +394,7 @@ bool amd_prot_guest_has(unsigned int attr)
 
case PATTR_SME:
case PATTR_HOST_MEM_ENCRYPT:
-   return sme_active();
+   return sme_me_mask && !sev_active();
 
case PATTR_SEV:
case PATTR_GUEST_MEM_ENCRYPT:

>  {
>   return sme_me_mask && !sev_active();
>  }
> @@ -428,7 +428,7 @@ bool force_dma_unencrypted(struct device *dev)
>* device does not support DMA to addresses that include the
>* encryption mask.
>*/
> - if (sme_active()) {
> + if (amd_prot_guest_has(PATTR_SME)) {

So I'm not sure: you add PATTR_SME which you call with
amd_prot_guest_has() and PATTR_HOST_MEM_ENCRYPT which you call with
prot_guest_has() and they both end up being the same thing on AMD.

So why even bother with PATTR_SME?

This is only going to cause confusion later and I'd say let's simply use
prot_guest_has(PATTR_HOST_MEM_ENCRYPT) everywhere...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 06/12] x86/sev: Replace occurrences of sev_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:25AM -0500, Tom Lendacky wrote:
> diff --git a/arch/x86/kernel/machine_kexec_64.c 
> b/arch/x86/kernel/machine_kexec_64.c
> index 8e7b517ad738..66ff788b79c9 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -167,7 +167,7 @@ static int init_transition_pgtable(struct kimage *image, 
> pgd_t *pgd)
>   }
>   pte = pte_offset_kernel(pmd, vaddr);
>  
> - if (sev_active())
> + if (prot_guest_has(PATTR_GUEST_MEM_ENCRYPT))
>   prot = PAGE_KERNEL_EXEC;
>  
>   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
> @@ -207,7 +207,7 @@ static int init_pgtable(struct kimage *image, unsigned 
> long start_pgtable)
>   level4p = (pgd_t *)__va(start_pgtable);
>   clear_page(level4p);
>  
> - if (sev_active()) {
> + if (prot_guest_has(PATTR_GUEST_MEM_ENCRYPT)) {
>   info.page_flag   |= _PAGE_ENC;
>   info.kernpg_flag |= _PAGE_ENC;
>   }
> @@ -570,12 +570,12 @@ void arch_kexec_unprotect_crashkres(void)
>   */
>  int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
>  {
> - if (sev_active())
> + if (!prot_guest_has(PATTR_HOST_MEM_ENCRYPT))
>   return 0;
>  
>   /*
> -  * If SME is active we need to be sure that kexec pages are
> -  * not encrypted because when we boot to the new kernel the
> +  * If host memory encryption is active we need to be sure that kexec
> +  * pages are not encrypted because when we boot to the new kernel the
>* pages won't be accessed encrypted (initially).
>*/

That hunk belongs logically into the previous patch which removes
sme_active().

>   return set_memory_decrypted((unsigned long)vaddr, pages);
> @@ -583,12 +583,12 @@ int arch_kexec_post_alloc_pages(void *vaddr, unsigned 
> int pages, gfp_t gfp)
>  
>  void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
>  {
> - if (sev_active())
> + if (!prot_guest_has(PATTR_HOST_MEM_ENCRYPT))
>   return;
>  
>   /*
> -  * If SME is active we need to reset the pages back to being
> -  * an encrypted mapping before freeing them.
> +  * If host memory encryption is active we need to reset the pages back
> +  * to being an encrypted mapping before freeing them.
>*/
>   set_memory_encrypted((unsigned long)vaddr, pages);
>  }
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index e8ccab50ebf6..b69f5ac622d5 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -457,7 +458,7 @@ static int has_svm(void)
>   return 0;
>   }
>  
> - if (sev_active()) {
> + if (prot_guest_has(PATTR_SEV)) {
>   pr_info("KVM is unsupported when running as an SEV guest\n");
>   return 0;

Same question as for PATTR_SME. PATTR_GUEST_MEM_ENCRYPT should be enough.

> @@ -373,7 +373,7 @@ int __init early_set_memory_encrypted(unsigned long 
> vaddr, unsigned long size)
>   * up under SME the trampoline area cannot be encrypted, whereas under SEV
>   * the trampoline area must be encrypted.
>   */
> -bool sev_active(void)
> +static bool sev_active(void)
>  {
>   return sev_status & MSR_AMD64_SEV_ENABLED;
>  }
> @@ -382,7 +382,6 @@ static bool sme_active(void)
>  {
>   return sme_me_mask && !sev_active();
>  }
> -EXPORT_SYMBOL_GPL(sev_active);

Just get rid of it altogether.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 07/12] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:26AM -0500, Tom Lendacky wrote:
> Replace occurrences of sev_es_active() with the more generic
> prot_guest_has() using PATTR_GUEST_PROT_STATE, except for in
> arch/x86/kernel/sev*.c and arch/x86/mm/mem_encrypt*.c where PATTR_SEV_ES
> will be used. If future support is added for other memory encyrption
> techonologies, the use of PATTR_GUEST_PROT_STATE can be updated, as
> required, to specifically use PATTR_SEV_ES.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/mem_encrypt.h | 2 --
>  arch/x86/kernel/sev.c  | 6 +++---
>  arch/x86/mm/mem_encrypt.c  | 7 +++
>  arch/x86/realmode/init.c   | 3 +--
>  4 files changed, 7 insertions(+), 11 deletions(-)

Same comments to this one as for the previous two.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 09/12] mm: Remove the now unused mem_encrypt_active() function

2021-08-17 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:28AM -0500, Tom Lendacky wrote:
> The mem_encrypt_active() function has been replaced by prot_guest_has(),
> so remove the implementation.
> 
> Reviewed-by: Joerg Roedel 
> Signed-off-by: Tom Lendacky 
> ---
>  include/linux/mem_encrypt.h | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
> index 5c4a18a91f89..ae4526389261 100644
> --- a/include/linux/mem_encrypt.h
> +++ b/include/linux/mem_encrypt.h
> @@ -16,10 +16,6 @@
>  
>  #include 
>  
> -#else/* !CONFIG_ARCH_HAS_MEM_ENCRYPT */
> -
> -static inline bool mem_encrypt_active(void) { return false; }
> -
>  #endif   /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
>  
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
> -- 

This one wants to be part of the previous patch.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 04/12] powerpc/pseries/svm: Add a powerpc version of prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:23AM -0500, Tom Lendacky wrote:
> Introduce a powerpc version of the prot_guest_has() function. This will
> be used to replace the powerpc mem_encrypt_active() implementation, so
> the implementation will initially only support the PATTR_MEM_ENCRYPT
> attribute.
> 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/powerpc/include/asm/protected_guest.h | 30 ++
>  arch/powerpc/platforms/pseries/Kconfig |  1 +
>  2 files changed, 31 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/protected_guest.h
> 
> diff --git a/arch/powerpc/include/asm/protected_guest.h 
> b/arch/powerpc/include/asm/protected_guest.h
> new file mode 100644
> index ..ce55c2c7e534
> --- /dev/null
> +++ b/arch/powerpc/include/asm/protected_guest.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Protected Guest (and Host) Capability checks
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky 
> + */
> +
> +#ifndef _POWERPC_PROTECTED_GUEST_H
> +#define _POWERPC_PROTECTED_GUEST_H
> +
> +#include 
> +
> +#ifndef __ASSEMBLY__

Same thing here. Pls audit the whole set whether those __ASSEMBLY__
guards are really needed and remove them if not.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Tue, Aug 17, 2021 at 10:22:52AM -0500, Tom Lendacky wrote:
> I can change it to be an AMD/HYGON check...  although, I'll have to check
> to see if any (very) early use of the function will work with that.

We can always change it later if really needed. It is just that I'm not
a fan of such "preemptive" changes.

> At a minimum, the check in arch/x86/kernel/head64.c will have to be
> changed or removed. I'll take a closer look.

Yeah, sme_me_mask, already discussed on IRC.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 06/12] x86/sev: Replace occurrences of sev_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Tue, Aug 17, 2021 at 10:26:18AM -0500, Tom Lendacky wrote:
> >>/*
> >> -   * If SME is active we need to be sure that kexec pages are
> >> -   * not encrypted because when we boot to the new kernel the
> >> +   * If host memory encryption is active we need to be sure that kexec
> >> +   * pages are not encrypted because when we boot to the new kernel the
> >> * pages won't be accessed encrypted (initially).
> >> */
> > 
> > That hunk belongs logically into the previous patch which removes
> > sme_active().
> 
> I was trying to keep the sev_active() changes separate... so even though
> it's an SME thing, I kept it here. But I can move it to the previous
> patch, it just might look strange.

Oh I meant only the comment because it is a SME-related change. But not
too important so whatever.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 05/12] x86/sme: Replace occurrences of sme_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Tue, Aug 17, 2021 at 09:46:58AM -0500, Tom Lendacky wrote:
> I'm ok with letting the TDX folks make changes to these calls to be SME or
> SEV specific, if necessary, later.

Yap, exactly. Let's add the specific stuff only when really needed.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-19 Thread Borislav Petkov
On Thu, Aug 19, 2021 at 10:52:53AM +0100, Christoph Hellwig wrote:
> Which suggest that the name is not good to start with.  Maybe protected
> hardware, system or platform might be a better choice?

Yah, coming up with a proper name here hasn't been easy.
prot_guest_has() is not the first variant.

>From all three things you suggest above, I guess calling it a "platform"
is the closest. As in, this is a confidential computing platform which
provides host and guest facilities etc.

So calling it

confidential_computing_platform_has()

is obviously too long.

ccp_has() clashes with the namespace of drivers/crypto/ccp/ which is
used by the technology too.

coco_platform_has() is too unserious.

So I guess

cc_platform_has()

ain't all that bad.

Unless you have a better idea, ofc.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

2021-08-23 Thread Borislav Petkov
On Mon, Aug 23, 2021 at 03:49:39PM -0400, Alex Deucher wrote:
> Maybe fixed with this patch?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5706cb3c910cc8283f344bc37a889a8d523a2c6d

Nope, this one is already in:

$ git tag --contains 5706cb3c910cc8283f344bc37a889a8d523a2c6d
v5.14-rc5
v5.14-rc6
v5.14-rc7

also, from only a quick poke so IMHO, the error says:

ERROR: modpost: "pm_suspend_target_state" 
[drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

which means you need the

EXPORT_SYMBOL_GPL(pm_suspend_target_state);

which is in kernel/power/suspend.c which gets enabled with

obj-$(CONFIG_SUSPEND)   += suspend.o

and if you look at PM_SLEEP:

config PM_SLEEP
def_bool y
depends on SUSPEND || HIBERNATE_CALLBACKS

(notice the ||)

and my randconfig has:

$ grep -E "(HIBERNATE_CALLBACKS|SUSPEND)" .config
# CONFIG_SUSPEND is not set
CONFIG_HIBERNATE_CALLBACKS=y

which means, you need:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index 4137e848f6a2..a9ce3b20d371 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -1040,7 +1040,7 @@ void amdgpu_acpi_detect(void)
  */
 bool amdgpu_acpi_is_s0ix_supported(struct amdgpu_device *adev)
 {
-#if IS_ENABLED(CONFIG_AMD_PMC) && IS_ENABLED(CONFIG_PM_SLEEP)
+#if IS_ENABLED(CONFIG_AMD_PMC) && IS_ENABLED(CONFIG_SUSPEND)
if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0) {
if (adev->flags & AMD_IS_APU)
return pm_suspend_target_state == PM_SUSPEND_TO_IDLE;


but whether that gives you what you want for amdgpu, you probably need
to ponder on a bit.

I sincerely hope that helps a little.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

2021-08-23 Thread Borislav Petkov
On Mon, Aug 23, 2021 at 04:31:42PM -0400, Alex Deucher wrote:
> Thanks. I think that should do the trick. Care to send that as a
> formal patch?

Sure, but let me run it through the randconfigs tests first to make sure
nothing else breaks. It is late here so if I don't manage now I'll send
you a formal version tomorrow morning, CET, the latest.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


[PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Borislav Petkov
From: Borislav Petkov 

Building a randconfig here triggered:

  ERROR: modpost: "pm_suspend_target_state" 
[drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

because the module export of that symbol happens in
kernel/power/suspend.c which is enabled with CONFIG_SUSPEND.

The ifdef guards in amdgpu_acpi_is_s0ix_supported(), however, test for
CONFIG_PM_SLEEP which is defined like this:

  config PM_SLEEP
  def_bool y
  depends on SUSPEND || HIBERNATE_CALLBACKS

and that randconfig has:

  # CONFIG_SUSPEND is not set
  CONFIG_HIBERNATE_CALLBACKS=y

leading to the module export missing.

Change the ifdeffery to depend directly on CONFIG_SUSPEND.

Fixes: 5706cb3c910c ("drm/amdgpu: fix checking pmops when PM_SLEEP is not 
enabled")
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/ysp6lv53qv0co...@zn.tnic
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index 4137e848f6a2..a9ce3b20d371 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -1040,7 +1040,7 @@ void amdgpu_acpi_detect(void)
  */
 bool amdgpu_acpi_is_s0ix_supported(struct amdgpu_device *adev)
 {
-#if IS_ENABLED(CONFIG_AMD_PMC) && IS_ENABLED(CONFIG_PM_SLEEP)
+#if IS_ENABLED(CONFIG_AMD_PMC) && IS_ENABLED(CONFIG_SUSPEND)
if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0) {
if (adev->flags & AMD_IS_APU)
return pm_suspend_target_state == PM_SUSPEND_TO_IDLE;
-- 
2.29.2


-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Borislav Petkov
On Tue, Aug 24, 2021 at 06:38:41PM +0530, Lazar, Lijo wrote:
> Without CONFIG_PM_SLEEP and with CONFIG_SUSPEND

Can you even create such a .config?

> I remember giving a reviewed-by for this one, looks like it never got in.
> https://www.spinics.net/lists/amd-gfx/msg66166.html

A better version of that one got in:

5706cb3c910c ("drm/amdgpu: fix checking pmops when PM_SLEEP is not enabled")

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Borislav Petkov
On Tue, Aug 24, 2021 at 07:22:46PM +0530, Lazar, Lijo wrote:
> 'pm_suspend_target_state' is only available when CONFIG_PM_SLEEP
> is set/enabled.

pm_suspend_target_state is available only when CONFIG_SUSPEND is
enabled. The extern thing is only a forward declaration.

> OTOH, when both SUSPEND and HIBERNATION are not set,
> PM_SLEEP is not set, so this variable cannot be used.

And it will not be used.

> ../drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c: In function
> ‘amdgpu_acpi_is_s0ix_active’:
> ../drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:1046:11: error:
> ‘pm_suspend_target_state’ undeclared (first use in this function); did you
> mean ‘__KSYM_pm_suspend_target_state’?
> return pm_suspend_target_state == PM_SUSPEND_TO_IDLE;
>^~~
>__KSYM_pm_suspend_target_state

That looks like the .config didn't have CONFIG_SUSPEND enabled.

> Also use shorter IS_ENABLED(CONFIG_foo) notation for checking the
> 2 config symbols.

What shorter notation?

> So it does look like that error can be extracted as well in some
> config.

Yah, when CONFIG_SUSPEND=n.

> Well, now it doesn't seem to be a better one. The original one checked
> both.

I don't see a reason for checking both.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 2/8] mm: Introduce a function to check for confidential computing features

2021-09-13 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:33PM -0500, Tom Lendacky wrote:
> In prep for other confidential computing technologies, introduce a generic

preparation

> helper function, cc_platform_has(), that can be used to check for specific
> active confidential computing attributes, like memory encryption. This is
> intended to eliminate having to add multiple technology-specific checks to
> the code (e.g. if (sev_active() || tdx_active())).

...

> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> new file mode 100644
> index ..253f3ea66cd8
> --- /dev/null
> +++ b/include/linux/cc_platform.h
> @@ -0,0 +1,88 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Confidential Computing Platform Capability checks
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky 
> + */
> +
> +#ifndef _CC_PLATFORM_H

_LINUX_CC_PLATFORM_H

> +#define _CC_PLATFORM_H

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 3/8] x86/sev: Add an x86 version of cc_platform_has()

2021-09-13 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:34PM -0500, Tom Lendacky wrote:
> diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
> new file mode 100644
> index ..3c9bacd3c3f3
> --- /dev/null
> +++ b/arch/x86/kernel/cc_platform.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Confidential Computing Platform Capability checks
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky 
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +bool cc_platform_has(enum cc_attr attr)
> +{
> + if (sme_me_mask)

Why are you still checking the sme_me_mask here? AFAIR, we said that
we'll do that only when the KVM folks come with a valid use case...

> + return amd_cc_platform_has(attr);
> +
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(cc_platform_has);
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index ff08dc463634..18fe19916bc3 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -389,6 +390,26 @@ bool noinstr sev_es_active(void)
>   return sev_status & MSR_AMD64_SEV_ES_ENABLED;
>  }
>  
> +bool amd_cc_platform_has(enum cc_attr attr)
> +{
> + switch (attr) {
> + case CC_ATTR_MEM_ENCRYPT:
> + return sme_me_mask != 0;

No need for the "!= 0"

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-14 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:35PM -0500, Tom Lendacky wrote:
> Introduce a powerpc version of the cc_platform_has() function. This will
> be used to replace the powerpc mem_encrypt_active() implementation, so
> the implementation will initially only support the CC_ATTR_MEM_ENCRYPT
> attribute.
> 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/powerpc/platforms/pseries/Kconfig   |  1 +
>  arch/powerpc/platforms/pseries/Makefile  |  2 ++
>  arch/powerpc/platforms/pseries/cc_platform.c | 26 
>  3 files changed, 29 insertions(+)
>  create mode 100644 arch/powerpc/platforms/pseries/cc_platform.c

Michael,

can I get an ACK for the ppc bits to carry them through the tip tree
pls?

Btw, on a related note, cross-compiling this throws the following error here:

$ make 
CROSS_COMPILE=/home/share/src/crosstool/gcc-9.4.0-nolibc/powerpc64-linux/bin/powerpc64-linux-
 V=1 ARCH=powerpc

...

/home/share/src/crosstool/gcc-9.4.0-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc
 -Wp,-MD,arch/powerpc/boot/.crt0.o.d -D__ASSEMBLY__ -Wall -Wundef 
-Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -O2 -msoft-float 
-mno-altivec -mno-vsx -pipe -fomit-frame-pointer -fno-builtin -fPIC -nostdinc 
-include ./include/linux/compiler_attributes.h -I./arch/powerpc/include 
-I./arch/powerpc/include/generated  -I./include -I./arch/powerpc/include/uapi 
-I./arch/powerpc/include/generated/uapi -I./include/uapi 
-I./include/generated/uapi -include ./include/linux/compiler-version.h -include 
./include/linux/kconfig.h -m32 -isystem 
/home/share/src/crosstool/gcc-9.4.0-nolibc/powerpc64-linux/bin/../lib/gcc/powerpc64-linux/9.4.0/include
 -mbig-endian -nostdinc -c -o arch/powerpc/boot/crt0.o arch/powerpc/boot/crt0.S
In file included from :
././include/linux/compiler_attributes.h:62:5: warning: "__has_attribute" is not 
defined, evaluates to 0 [-Wundef]
   62 | #if __has_attribute(__assume_aligned__)
  | ^~~
././include/linux/compiler_attributes.h:62:20: error: missing binary operator 
before token "("
   62 | #if __has_attribute(__assume_aligned__)
  |^
././include/linux/compiler_attributes.h:88:5: warning: "__has_attribute" is not 
defined, evaluates to 0 [-Wundef]
   88 | #if __has_attribute(__copy__)
  | ^~~
...

Known issue?

This __has_attribute() thing is supposed to be supported
in gcc since 5.1 and I'm using the crosstool stuff from
https://www.kernel.org/pub/tools/crosstool/ and gcc-9.4 above is pretty
new so that should not happen actually.

But it does...

Hmmm.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-14 Thread Borislav Petkov
On Tue, Sep 14, 2021 at 04:47:41PM +0200, Christophe Leroy wrote:
> Yes, see 
> https://lore.kernel.org/linuxppc-dev/20210914123919.58203...@canb.auug.org.au/T/#t

Aha, more compiler magic stuff ;-\

Oh well, I guess that fix will land upstream soon.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-14 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:36PM -0500, Tom Lendacky wrote:
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 18fe19916bc3..4b54a2377821 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -144,7 +144,7 @@ void __init sme_unmap_bootdata(char *real_mode_data)
>   struct boot_params *boot_data;
>   unsigned long cmdline_paddr;
>  
> - if (!sme_active())
> + if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
>   return;
>  
>   /* Get the command line address before unmapping the real_mode_data */
> @@ -164,7 +164,7 @@ void __init sme_map_bootdata(char *real_mode_data)
>   struct boot_params *boot_data;
>   unsigned long cmdline_paddr;
>  
> - if (!sme_active())
> + if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
>   return;
>  
>   __sme_early_map_unmap_mem(real_mode_data, sizeof(boot_params), true);
> @@ -377,11 +377,6 @@ bool sev_active(void)
>  {
>   return sev_status & MSR_AMD64_SEV_ENABLED;
>  }
> -
> -bool sme_active(void)
> -{
> - return sme_me_mask && !sev_active();
> -}
>  EXPORT_SYMBOL_GPL(sev_active);
>  
>  /* Needs to be called from non-instrumentable code */

You forgot this hunk:

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 5635ca9a1fbe..a3a2396362a5 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -364,8 +364,9 @@ int __init early_set_memory_encrypted(unsigned long vaddr, 
unsigned long size)
 /*
  * SME and SEV are very similar but they are not the same, so there are
  * times that the kernel will need to distinguish between SME and SEV. The
- * sme_active() and sev_active() functions are used for this.  When a
- * distinction isn't needed, the mem_encrypt_active() function can be used.
+ * PATTR_HOST_MEM_ENCRYPT and PATTR_GUEST_MEM_ENCRYPT flags to
+ * amd_prot_guest_has() are used for this. When a distinction isn't needed,
+ * the mem_encrypt_active() function can be used.
  *
  * The trampoline code is a good example for this requirement.  Before
  * paging is activated, SME will access all memory as decrypted, but SEV

because there's still a sme_active() mentioned there:

$ git grep sme_active
arch/x86/mm/mem_encrypt.c:367: * sme_active() and sev_active() functions are 
used for this.  When a

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-15 Thread Borislav Petkov
On Wed, Sep 15, 2021 at 10:28:59AM +1000, Michael Ellerman wrote:
> I don't love it, a new C file and an out-of-line call to then call back
> to a static inline that for most configuration will return false ... but
> whatever :)

Yeah, hch thinks it'll cause a big mess otherwise:

https://lore.kernel.org/lkml/ysscwvpxevxw%2f...@infradead.org/

I guess less ifdeffery is nice too.

> Acked-by: Michael Ellerman  (powerpc)

Thx.

> Yeah, fixed in mainline today, thanks for trying to cross compile :)

Always!

:-)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function

2021-09-15 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:31PM -0500, Tom Lendacky wrote:
> This patch series provides a generic helper function, cc_platform_has(),
> to replace the sme_active(), sev_active(), sev_es_active() and
> mem_encrypt_active() functions.
> 
> It is expected that as new confidential computing technologies are
> added to the kernel, they can all be covered by a single function call
> instead of a collection of specific function calls all called from the
> same locations.
> 
> The powerpc and s390 patches have been compile tested only. Can the
> folks copied on this series verify that nothing breaks for them. Also,
> a new file, arch/powerpc/platforms/pseries/cc_platform.c, has been
> created for powerpc to hold the out of line function.

...

> 
> Tom Lendacky (8):
>   x86/ioremap: Selectively build arch override encryption functions
>   mm: Introduce a function to check for confidential computing features
>   x86/sev: Add an x86 version of cc_platform_has()
>   powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
>   x86/sme: Replace occurrences of sme_active() with cc_platform_has()
>   x86/sev: Replace occurrences of sev_active() with cc_platform_has()
>   x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
>   treewide: Replace the use of mem_encrypt_active() with
> cc_platform_has()

Ok, modulo the minor things the plan is to take this through tip after
-rc2 releases in order to pick up the powerpc build fix and have a clean
base (-rc2) to base stuff on, at the same time.

Pls holler if something's still amiss.

Sathya,

if you want to prepare the Intel variant intel_cc_platform_has() ontop
of those and send it to me, that would be good because then I can
integrate it all in one branch which can be used to base future work
ontop.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-15 Thread Borislav Petkov
On Wed, Sep 15, 2021 at 07:18:34PM +0200, Christophe Leroy wrote:
> Could you please provide more explicit explanation why inlining such an
> helper is considered as bad practice and messy ?

Tom already told you to look at the previous threads. Let's read them
together. This one, for example:

https://lore.kernel.org/lkml/ysscwvpxevxw%2f...@infradead.org/

| > To take it out of line, I'm leaning towards the latter, creating a new
| > file that is built based on the ARCH_HAS_PROTECTED_GUEST setting.
| 
| Yes.  In general everytime architectures have to provide the prototype
| and not just the implementation of something we end up with a giant mess
| sooner or later.  In a few cases that is still warranted due to
| performance concerns, but i don't think that is the case here.

So I think what Christoph means here is that you want to have the
generic prototype defined in a header and arches get to implement it
exactly to the letter so that there's no mess.

As to what mess exactly, I'd let him explain that.

> Because as demonstrated in my previous response some days ago, taking that
> outline ends up with an unneccessary ugly generated code and we don't
> benefit front GCC's capability to fold in and opt out unreachable code.

And this is real fast path where a couple of instructions matter or what?

set_memory_encrypted/_decrypted doesn't look like one to me.

> I can't see your point here. Inlining the function wouldn't add any
> ifdeffery as far as I can see.

If the function is touching defines etc, they all need to be visible.
If that function needs to call other functions - which is the case on
x86, perhaps not so much on power - then you need to either ifdef around
them or provide stubs with ifdeffery in the headers. And you need to
make them global functions instead of keeping them static to the same
compilation unit, etc, etc.

With a separate compilation unit, you don't need any of that and it is
all kept in that single file.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function

2021-09-16 Thread Borislav Petkov
On Wed, Sep 15, 2021 at 10:26:06AM -0700, Kuppuswamy, Sathyanarayanan wrote:
> I have a Intel variant patch (please check following patch). But it includes
> TDX changes as well. Shall I move TDX changes to different patch and just
> create a separate patch for adding intel_cc_platform_has()?

Yes, please, so that I can expedite that stuff separately and so that it
can go in early in order for future work to be based ontop.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Borislav Petkov
On Tue, Sep 21, 2021 at 12:04:58PM -0500, Tom Lendacky wrote:
> Looks like instrumentation during early boot. I worked with Boris offline to
> exclude arch/x86/kernel/cc_platform.c from some of the instrumentation and
> that allowed an allyesconfig to boot.

And here's the lineup I have so far, I'd appreciate it if ppc and s390 folks
could run it too:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc2-cc

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Borislav Petkov
On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote:
> I still believe calling cc_platform_has() from __startup_64() is totally
> broken as it lacks proper wrapping while accessing global variables.

Well, one of the issues on the AMD side was using boot_cpu_data too
early and the Intel side uses it too. Can you replace those checks with
is_tdx_guest() or whatever was the helper's name which would check
whether the the kernel is running as a TDX guest, and see if that helps?

Thx.

-- 
Regards/Gruss,
Boris.



Re: [PATCHv2 1/2] x86/MCE/AMD: Export smca_get_bank_type symbol

2021-09-22 Thread Borislav Petkov
On Sun, Sep 12, 2021 at 10:13:10PM -0400, Mukul Joshi wrote:
> Export smca_get_bank_type for use in the AMD GPU
> driver to determine MCA bank while handling correctable
> and uncorrectable errors in GPU UMC.
> 
> v1->v2:
> - Drop the function is_smca_umc_v2().
> - Drop the patch to introduce a new MCE priority (MCE_PRIO_ACEL)
>   for GPU/accelarator cards.

Patch changelog information goes...

> 
> Signed-off-by: Mukul Joshi 
> ---

... under this line so that it gets automatically removed by git when
applying the patch.

Alex, how do you wanna handle this?

Want me to ACK this and you can carry it through your tree along with
the second patch?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCHv2 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-22 Thread Borislav Petkov
On Sun, Sep 12, 2021 at 10:13:11PM -0400, Mukul Joshi wrote:
> On Aldebaran, GPU driver will handle bad page retirement
> even though UMC is host managed. As a result, register a
> bad page retirement handler on the mce notifier chain to
> retire bad pages on Aldebaran.
> 
> v1->v2:
> - Use smca_get_bank_type() to determine MCA bank.
> - Envelope the changes under #ifdef CONFIG_X86_MCE_AMD.
> - Use MCE_PRIORITY_UC instead of MCE_PRIO_ACCEL as we are
>   only handling uncorrectable errors.
> - Use macros to determine UMC instance and channel instance
>   where the uncorrectable error occured.
> - Update the headline.

Same note as for the previous patch.

> +static int amdgpu_bad_page_notifier(struct notifier_block *nb,
> + unsigned long val, void *data)
> +{
> + struct mce *m = (struct mce *)data;
> + struct amdgpu_device *adev = NULL;
> + uint32_t gpu_id = 0;
> + uint32_t umc_inst = 0;
> + uint32_t ch_inst, channel_index = 0;
> + struct ras_err_data err_data = {0, 0, 0, NULL};
> + struct eeprom_table_record err_rec;
> + uint64_t retired_page;
> +
> + /*
> +  * If the error was generated in UMC_V2, which belongs to GPU UMCs,
> +  * and error occurred in DramECC (Extended error code = 0) then only
> +  * process the error, else bail out.
> +  */
> + if (!m || !((smca_get_bank_type(m->bank) == SMCA_UMC_V2) &&
> + (XEC(m->status, 0x1f) == 0x0)))
> + return NOTIFY_DONE;
> +
> + /*
> +  * GPU Id is offset by GPU_ID_OFFSET in MCA_IPID_UMC register.
> +  */
> + gpu_id = GET_MCA_IPID_GPUID(m->ipid) - GPU_ID_OFFSET;
> +
> + adev = find_adev(gpu_id);
> + if (!adev) {
> + dev_warn(adev->dev, "%s: Unable to find adev for gpu_id: %d\n",
> +  __func__, gpu_id);
> + return NOTIFY_DONE;
> + }
> +
> + /*
> +  * If it is correctable error, return.
> +  */
> + if (mce_is_correctable(m)) {
> + return NOTIFY_OK;
> + }

This can run before you find_adev().

> +static void amdgpu_register_bad_pages_mca_notifier(void)
> +{
> + /*
> +  * Register the x86 notifier only once
> +  * with MCE subsystem.
> +  */
> + if (notifier_registered == false) {
> + mce_register_decode_chain(&amdgpu_bad_page_nb);
> + notifier_registered = true;
> + }

I have a patchset which will get rid of the need to do this silliness -
only if I had some time to actually prepare it for submission... :-\

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCHv2 1/2] x86/MCE/AMD: Export smca_get_bank_type symbol

2021-09-22 Thread Borislav Petkov
On Wed, Sep 22, 2021 at 04:27:34PM +, Deucher, Alexander wrote:
> > On Sun, Sep 12, 2021 at 10:13:10PM -0400, Mukul Joshi wrote:
> > > Export smca_get_bank_type for use in the AMD GPU driver to determine
> > > MCA bank while handling correctable and uncorrectable errors in GPU
> > > UMC.
> > >
> > > v1->v2:
> > > - Drop the function is_smca_umc_v2().
> > > - Drop the patch to introduce a new MCE priority (MCE_PRIO_ACEL)
> > >   for GPU/accelarator cards.
> > 
> > Patch changelog information goes...
> > 
> > >
> > > Signed-off-by: Mukul Joshi 
> > > ---
> > 
> > ... under this line so that it gets automatically removed by git when 
> > applying
> > the patch.
> > 
> > Alex, how do you wanna handle this?
> > 
> > Want me to ACK this and you can carry it through your tree along with the
> > second patch?
> 
> That would be great.  Thanks!

Ok, with the above changelog removed:

Acked-by: Borislav Petkov 

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-22 Thread Borislav Petkov
On Wed, Sep 22, 2021 at 05:30:15PM +0300, Kirill A. Shutemov wrote:
> Not fine, but waiting to blowup with random build environment change.

Why is it not fine?

Are you suspecting that the compiler might generate something else and
not a rip-relative access?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCHv3 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-23 Thread Borislav Petkov
On Thu, Sep 23, 2021 at 02:29:07PM +, Yazen Ghannam wrote:
> > +   /*
> > +* If the error was generated in UMC_V2, which belongs to GPU UMCs,
> > +* and error occurred in DramECC (Extended error code = 0) then only
> > +* process the error, else bail out.
> > +*/
> > +   if (!m || !((smca_get_bank_type(m->bank) == SMCA_UMC_V2) &&
> > +   (XEC(m->status, 0x1f) == 0x0)))
> 
> The MCA_STATUS[ErrorCodeExt] field is bits [21:16], so the mask should be
> 0x3f.
> 
> > +   return NOTIFY_DONE;
> > +
> > +   /*
> > +* If it is correctable error, return.
> > +*/
> > +   if (mce_is_correctable(m))
> > +   return NOTIFY_OK;
> 
> Shouldn't this be "NOTIFY_DONE" if "don't care" about this error?

I think the logic here is to stop calling any further consumers on the
notify chain because this is a GPU correctable error and they can't do
anything about it anyway, right?

Or am I misreading it?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCHv3 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-23 Thread Borislav Petkov
On Thu, Sep 23, 2021 at 05:23:21PM +, Yazen Ghannam wrote:
> Shouldn't the error still be reported to EDAC for decoding and counting? I
> think users want this.

You know what happens with users getting ECCs reported, right? They
think immediately their hw is going bad and start wanting to replace
it...

So what does actually tell you if you were a simple user and you had 5
correctable errors in the GPU VRAM?

All you wanna do is play, I'd say.

:-)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-23 Thread Borislav Petkov
On Thu, Sep 23, 2021 at 12:05:58AM +0300, Kirill A. Shutemov wrote:
> Unless we find other way to guarantee RIP-relative access, we must use
> fixup_pointer() to access any global variables.

Yah, I've asked compiler folks about any guarantees we have wrt
rip-relative addresses but it doesn't look good. Worst case, we'd have
to do the fixup_pointer() thing.

In the meantime, Tom and I did some more poking at this and here's a
diff ontop.

The direction being that we'll stick both the AMD and Intel
*cc_platform_has() call into cc_platform.c for which instrumentation
will be disabled so no issues with that.

And that will keep all that querying all together in a single file.

---
diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index a73712b6ee0e..2d4f5c17d79c 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -51,7 +51,6 @@ void __init mem_encrypt_free_decrypted_mem(void);
 void __init mem_encrypt_init(void);
 
 void __init sev_es_init_vc_handling(void);
-bool amd_cc_platform_has(enum cc_attr attr);
 
 #define __bss_decrypted __section(".bss..decrypted")
 
@@ -74,7 +73,6 @@ static inline void __init sme_encrypt_kernel(struct 
boot_params *bp) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
 static inline void sev_es_init_vc_handling(void) { }
-static inline bool amd_cc_platform_has(enum cc_attr attr) { return false; }
 
 static inline int __init
 early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 
0; }
@@ -103,12 +101,6 @@ static inline u64 sme_get_me_mask(void)
return sme_me_mask;
 }
 
-#if defined(CONFIG_CPU_SUP_INTEL) && defined(CONFIG_ARCH_HAS_CC_PLATFORM)
-bool intel_cc_platform_has(enum cc_attr attr);
-#else
-static inline bool intel_cc_platform_has(enum cc_attr attr) { return false; }
-#endif
-
 #endif /* __ASSEMBLY__ */
 
 #endif /* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index da54a1805211..97ede7052f77 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -13,6 +13,52 @@
 
 #include 
 
+static bool intel_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_INTEL_TDX_GUEST
+   return false;
+#else
+   return false;
+#endif
+}
+
+/*
+ * SME and SEV are very similar but they are not the same, so there are
+ * times that the kernel will need to distinguish between SME and SEV. The
+ * cc_platform_has() function is used for this.  When a distinction isn't
+ * needed, the CC_ATTR_MEM_ENCRYPT attribute can be used.
+ *
+ * The trampoline code is a good example for this requirement.  Before
+ * paging is activated, SME will access all memory as decrypted, but SEV
+ * will access all memory as encrypted.  So, when APs are being brought
+ * up under SME the trampoline area cannot be encrypted, whereas under SEV
+ * the trampoline area must be encrypted.
+ */
+static bool amd_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   switch (attr) {
+   case CC_ATTR_MEM_ENCRYPT:
+   return sme_me_mask;
+
+   case CC_ATTR_HOST_MEM_ENCRYPT:
+   return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
+
+   case CC_ATTR_GUEST_MEM_ENCRYPT:
+   return sev_status & MSR_AMD64_SEV_ENABLED;
+
+   case CC_ATTR_GUEST_STATE_ENCRYPT:
+   return sev_status & MSR_AMD64_SEV_ES_ENABLED;
+
+   default:
+   return false;
+   }
+#else
+   return false;
+#endif
+}
+
+
 bool cc_platform_has(enum cc_attr attr)
 {
if (sme_me_mask)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 53756ff12295..8321c43554a1 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -60,13 +60,6 @@ static u64 msr_test_ctrl_cache __ro_after_init;
  */
 static bool cpu_model_supports_sld __ro_after_init;
 
-#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
-bool intel_cc_platform_has(enum cc_attr attr)
-{
-   return false;
-}
-#endif
-
 /*
  * Processors which have self-snooping capability can handle conflicting
  * memory type across CPUs by snooping its own cache. However, there exists
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 9417d404ea92..23d54b810f08 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -361,38 +361,6 @@ int __init early_set_memory_encrypted(unsigned long vaddr, 
unsigned long size)
return early_set_memory_enc_dec(vaddr, size, true);
 }
 
-/*
- * SME and SEV are very similar but they are not the same, so there are
- * times that the kernel will need to distinguish between SME and SEV. The
- * cc_platform_has() function is used for this.  When a distinction isn't
- * needed, the CC_ATTR_MEM_ENCRYPT attribute can be used.
- *
- * The trampoline code is a good example for this requirement.  Before
- * paging is activated, SME will access all memory as decrypted, but SEV
- * will access all memory a

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-24 Thread Borislav Petkov
On Fri, Sep 24, 2021 at 12:41:32PM +0300, Kirill A. Shutemov wrote:
> On Thu, Sep 23, 2021 at 08:21:03PM +0200, Borislav Petkov wrote:
> > On Thu, Sep 23, 2021 at 12:05:58AM +0300, Kirill A. Shutemov wrote:
> > > Unless we find other way to guarantee RIP-relative access, we must use
> > > fixup_pointer() to access any global variables.
> > 
> > Yah, I've asked compiler folks about any guarantees we have wrt
> > rip-relative addresses but it doesn't look good. Worst case, we'd have
> > to do the fixup_pointer() thing.
> > 
> > In the meantime, Tom and I did some more poking at this and here's a
> > diff ontop.
> > 
> > The direction being that we'll stick both the AMD and Intel
> > *cc_platform_has() call into cc_platform.c for which instrumentation
> > will be disabled so no issues with that.
> > 
> > And that will keep all that querying all together in a single file.
> 
> And still do cc_platform_has() calls in __startup_64() codepath?
> 
> It's broken.
> 
> Intel detection in cc_platform_has() relies on boot_cpu_data.x86_vendor
> which is not initialized until early_cpu_init() in setup_arch(). Given
> that X86_VENDOR_INTEL is 0 it leads to false-positive.

Yeah, Tom, I had the same question yesterday.

/me looks in his direction.

:-)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCHv3 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-27 Thread Borislav Petkov
On Fri, Sep 24, 2021 at 07:46:10PM +, Yazen Ghannam wrote:
> I agree with you in general. But this device isn't really a GPU. And
> users of this device seem to want to count *every* error, at least for
> now.

Aha, so something accelerator-y where they do general purpose computation.

So what's the big picture here: they count all the errors and when they
reach a certain amount, they decide to replace the GPUs just in case?

Or wait until they become uncorrectable? But then it doesn't matter
because we will handle it properly by excluding the VRAM range from
further use.

Or do they wanna see *when* they had the correctable errors so that they
can restart the computation, just in case.

Dunno, it would be a lot helpful if we had some RAS strategy for those
things...

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


[PATCH 7/8] x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

Replace uses of sev_es_active() with the more generic cc_platform_has()
using CC_ATTR_GUEST_STATE_ENCRYPT. If future support is added for other
memory encyrption techonologies, the use of CC_ATTR_GUEST_STATE_ENCRYPT
can be updated, as required.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/mem_encrypt.h |  2 --
 arch/x86/kernel/sev.c  |  6 +++---
 arch/x86/mm/mem_encrypt.c  | 24 +++-
 arch/x86/realmode/init.c   |  3 +--
 4 files changed, 7 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index a5a58ccd1ee3..da14ede311aa 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -51,7 +51,6 @@ void __init mem_encrypt_free_decrypted_mem(void);
 void __init mem_encrypt_init(void);
 
 void __init sev_es_init_vc_handling(void);
-bool sev_es_active(void);
 
 #define __bss_decrypted __section(".bss..decrypted")
 
@@ -74,7 +73,6 @@ static inline void __init sme_encrypt_kernel(struct 
boot_params *bp) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
 static inline void sev_es_init_vc_handling(void) { }
-static inline bool sev_es_active(void) { return false; }
 
 static inline int __init
 early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 
0; }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a6895e440bc3..53a6837d354b 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -11,7 +11,7 @@
 
 #include  /* For show_regs() */
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -615,7 +615,7 @@ int __init sev_es_efi_map_ghcbs(pgd_t *pgd)
int cpu;
u64 pfn;
 
-   if (!sev_es_active())
+   if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
return 0;
 
pflags = _PAGE_NX | _PAGE_RW;
@@ -774,7 +774,7 @@ void __init sev_es_init_vc_handling(void)
 
BUILD_BUG_ON(offsetof(struct sev_es_runtime_data, ghcb_page) % 
PAGE_SIZE);
 
-   if (!sev_es_active())
+   if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
return;
 
if (!sev_es_check_cpu_features())
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 932007a6913b..2d04c39bea1d 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -361,25 +361,6 @@ int __init early_set_memory_encrypted(unsigned long vaddr, 
unsigned long size)
return early_set_memory_enc_dec(vaddr, size, true);
 }
 
-/*
- * SME and SEV are very similar but they are not the same, so there are
- * times that the kernel will need to distinguish between SME and SEV. The
- * cc_platform_has() function is used for this.  When a distinction isn't
- * needed, the CC_ATTR_MEM_ENCRYPT attribute can be used.
- *
- * The trampoline code is a good example for this requirement.  Before
- * paging is activated, SME will access all memory as decrypted, but SEV
- * will access all memory as encrypted.  So, when APs are being brought
- * up under SME the trampoline area cannot be encrypted, whereas under SEV
- * the trampoline area must be encrypted.
- */
-
-/* Needs to be called from non-instrumentable code */
-bool noinstr sev_es_active(void)
-{
-   return sev_status & MSR_AMD64_SEV_ES_ENABLED;
-}
-
 /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
 bool force_dma_unencrypted(struct device *dev)
 {
@@ -449,7 +430,7 @@ static void print_mem_encrypt_feature_info(void)
pr_cont(" SEV");
 
/* Encrypted Register State */
-   if (sev_es_active())
+   if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
pr_cont(" SEV-ES");
 
pr_cont("\n");
@@ -468,7 +449,8 @@ void __init mem_encrypt_init(void)
 * With SEV, we need to unroll the rep string I/O instructions,
 * but SEV-ES supports them through the #VC handler.
 */
-   if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) && !sev_es_active())
+   if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
+   !cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
static_branch_enable(&sev_enable_key);
 
print_mem_encrypt_feature_info();
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index c878c5ee5a4c..4a3da7592b99 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -2,7 +2,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
@@ -48,7 +47,7 @@ static void sme_sev_setup_real_mode(struct trampoline_header 
*th)
if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
th->flags |= TH_FLAGS_SME_ACTIVE;
 
-   if (sev_es_active()) {
+   if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
/*
 * Skip the call to verify_cpu() in secondary_startup_64 as it
  

[PATCH 2/8] arch/cc: Introduce a function to check for confidential computing features

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

In preparation for other confidential computing technologies, introduce
a generic helper function, cc_platform_has(), that can be used to
check for specific active confidential computing attributes, like
memory encryption. This is intended to eliminate having to add multiple
technology-specific checks to the code (e.g. if (sev_active() ||
tdx_active() || ... ).

 [ bp: s/_CC_PLATFORM_H/_LINUX_CC_PLATFORM_H/g ]

Co-developed-by: Andi Kleen 
Signed-off-by: Andi Kleen 
Co-developed-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/Kconfig|  3 ++
 include/linux/cc_platform.h | 88 +
 2 files changed, 91 insertions(+)
 create mode 100644 include/linux/cc_platform.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 8df1c7102643..d1e69d6e8498 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1234,6 +1234,9 @@ config RELR
 config ARCH_HAS_MEM_ENCRYPT
bool
 
+config ARCH_HAS_CC_PLATFORM
+   bool
+
 config HAVE_SPARSE_SYSCALL_NR
bool
help
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
new file mode 100644
index ..a075b70b9a70
--- /dev/null
+++ b/include/linux/cc_platform.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Confidential Computing Platform Capability checks
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ */
+
+#ifndef _LINUX_CC_PLATFORM_H
+#define _LINUX_CC_PLATFORM_H
+
+#include 
+#include 
+
+/**
+ * enum cc_attr - Confidential computing attributes
+ *
+ * These attributes represent confidential computing features that are
+ * currently active.
+ */
+enum cc_attr {
+   /**
+* @CC_ATTR_MEM_ENCRYPT: Memory encryption is active
+*
+* The platform/OS is running with active memory encryption. This
+* includes running either as a bare-metal system or a hypervisor
+* and actively using memory encryption or as a guest/virtual machine
+* and actively using memory encryption.
+*
+* Examples include SME, SEV and SEV-ES.
+*/
+   CC_ATTR_MEM_ENCRYPT,
+
+   /**
+* @CC_ATTR_HOST_MEM_ENCRYPT: Host memory encryption is active
+*
+* The platform/OS is running as a bare-metal system or a hypervisor
+* and actively using memory encryption.
+*
+* Examples include SME.
+*/
+   CC_ATTR_HOST_MEM_ENCRYPT,
+
+   /**
+* @CC_ATTR_GUEST_MEM_ENCRYPT: Guest memory encryption is active
+*
+* The platform/OS is running as a guest/virtual machine and actively
+* using memory encryption.
+*
+* Examples include SEV and SEV-ES.
+*/
+   CC_ATTR_GUEST_MEM_ENCRYPT,
+
+   /**
+* @CC_ATTR_GUEST_STATE_ENCRYPT: Guest state encryption is active
+*
+* The platform/OS is running as a guest/virtual machine and actively
+* using memory encryption and register state encryption.
+*
+* Examples include SEV-ES.
+*/
+   CC_ATTR_GUEST_STATE_ENCRYPT,
+};
+
+#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
+
+/**
+ * cc_platform_has() - Checks if the specified cc_attr attribute is active
+ * @attr: Confidential computing attribute to check
+ *
+ * The cc_platform_has() function will return an indicator as to whether the
+ * specified Confidential Computing attribute is currently active.
+ *
+ * Context: Any context
+ * Return:
+ * * TRUE  - Specified Confidential Computing attribute is active
+ * * FALSE - Specified Confidential Computing attribute is not active
+ */
+bool cc_platform_has(enum cc_attr attr);
+
+#else  /* !CONFIG_ARCH_HAS_CC_PLATFORM */
+
+static inline bool cc_platform_has(enum cc_attr attr) { return false; }
+
+#endif /* CONFIG_ARCH_HAS_CC_PLATFORM */
+
+#endif /* _LINUX_CC_PLATFORM_H */
-- 
2.29.2



[PATCH 3/8] x86/sev: Add an x86 version of cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

Introduce an x86 version of the cc_platform_has() function. This will be
used to replace vendor specific calls like sme_active(), sev_active(),
etc.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/x86/Kconfig   |  1 +
 arch/x86/include/asm/mem_encrypt.h |  1 +
 arch/x86/kernel/Makefile   |  6 +++
 arch/x86/kernel/cc_platform.c  | 69 ++
 arch/x86/mm/mem_encrypt.c  |  1 +
 5 files changed, 78 insertions(+)
 create mode 100644 arch/x86/kernel/cc_platform.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ab83c22d274e..9f190ec4f953 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1518,6 +1518,7 @@ config AMD_MEM_ENCRYPT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
select INSTRUCTION_DECODER
select ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
+   select ARCH_HAS_CC_PLATFORM
help
  Say yes to enable support for the encryption of system memory.
  This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 9c80c68d75b5..3fb9f5ebefa4 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -13,6 +13,7 @@
 #ifndef __ASSEMBLY__
 
 #include 
+#include 
 
 #include 
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8f4e8fa6ed75..2ff3e600f426 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -21,6 +21,7 @@ CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
 CFLAGS_REMOVE_sev.o = -pg
+CFLAGS_REMOVE_cc_platform.o = -pg
 endif
 
 KASAN_SANITIZE_head$(BITS).o   := n
@@ -29,6 +30,7 @@ KASAN_SANITIZE_dumpstack_$(BITS).o:= n
 KASAN_SANITIZE_stacktrace.o:= n
 KASAN_SANITIZE_paravirt.o  := n
 KASAN_SANITIZE_sev.o   := n
+KASAN_SANITIZE_cc_platform.o   := n
 
 # With some compiler versions the generated code results in boot hangs, caused
 # by several compilation units. To be safe, disable all instrumentation.
@@ -47,6 +49,7 @@ endif
 KCOV_INSTRUMENT:= n
 
 CFLAGS_head$(BITS).o   += -fno-stack-protector
+CFLAGS_cc_platform.o   += -fno-stack-protector
 
 CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace
 
@@ -147,6 +150,9 @@ obj-$(CONFIG_UNWINDER_FRAME_POINTER)+= 
unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)   += unwind_guess.o
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += sev.o
+
+obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += cc_platform.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
new file mode 100644
index ..03bb2f343ddb
--- /dev/null
+++ b/arch/x86/kernel/cc_platform.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Confidential Computing Platform Capability checks
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_INTEL_TDX_GUEST
+   return false;
+#else
+   return false;
+#endif
+}
+
+/*
+ * SME and SEV are very similar but they are not the same, so there are
+ * times that the kernel will need to distinguish between SME and SEV. The
+ * cc_platform_has() function is used for this.  When a distinction isn't
+ * needed, the CC_ATTR_MEM_ENCRYPT attribute can be used.
+ *
+ * The trampoline code is a good example for this requirement.  Before
+ * paging is activated, SME will access all memory as decrypted, but SEV
+ * will access all memory as encrypted.  So, when APs are being brought
+ * up under SME the trampoline area cannot be encrypted, whereas under SEV
+ * the trampoline area must be encrypted.
+ */
+static bool amd_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   switch (attr) {
+   case CC_ATTR_MEM_ENCRYPT:
+   return sme_me_mask;
+
+   case CC_ATTR_HOST_MEM_ENCRYPT:
+   return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
+
+   case CC_ATTR_GUEST_MEM_ENCRYPT:
+   return sev_status & MSR_AMD64_SEV_ENABLED;
+
+   case CC_ATTR_GUEST_STATE_ENCRYPT:
+   return sev_status & MSR_AMD64_SEV_ES_ENABLED;
+
+   default:
+   return false;
+   }
+#else
+   return false;
+#endif
+}
+
+
+bool cc_platform_has(enum cc_attr attr)
+{
+   if (sme_me_mask)
+   return amd_cc_platform_has(attr);
+
+   return false;
+}
+EXPORT_SYMBOL_GPL(cc_platform_has);
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index ff08dc463634..e29b1418d00c 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem

[PATCH 6/8] x86/sev: Replace occurrences of sev_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

Replace uses of sev_active() with the more generic cc_platform_has()
using CC_ATTR_GUEST_MEM_ENCRYPT. If future support is added for other
memory encryption technologies, the use of CC_ATTR_GUEST_MEM_ENCRYPT
can be updated, as required.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/mem_encrypt.h |  2 --
 arch/x86/kernel/crash_dump_64.c|  4 +++-
 arch/x86/kernel/kvm.c  |  3 ++-
 arch/x86/kernel/kvmclock.c |  4 ++--
 arch/x86/kernel/machine_kexec_64.c |  4 ++--
 arch/x86/kvm/svm/svm.c |  3 ++-
 arch/x86/mm/ioremap.c  |  6 +++---
 arch/x86/mm/mem_encrypt.c  | 21 -
 arch/x86/platform/efi/efi_64.c |  9 +
 9 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 63c5b99ccae5..a5a58ccd1ee3 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -51,7 +51,6 @@ void __init mem_encrypt_free_decrypted_mem(void);
 void __init mem_encrypt_init(void);
 
 void __init sev_es_init_vc_handling(void);
-bool sev_active(void);
 bool sev_es_active(void);
 
 #define __bss_decrypted __section(".bss..decrypted")
@@ -75,7 +74,6 @@ static inline void __init sme_encrypt_kernel(struct 
boot_params *bp) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
 static inline void sev_es_init_vc_handling(void) { }
-static inline bool sev_active(void) { return false; }
 static inline bool sev_es_active(void) { return false; }
 
 static inline int __init
diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
index 045e82e8945b..a7f617a3981d 100644
--- a/arch/x86/kernel/crash_dump_64.c
+++ b/arch/x86/kernel/crash_dump_64.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
  unsigned long offset, int userbuf,
@@ -73,5 +74,6 @@ ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char 
*buf, size_t csize,
 
 ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0, sev_active());
+   return read_from_oldmem(buf, count, ppos, 0,
+   cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT));
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b656456c3a94..8863d1941f1b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -418,7 +419,7 @@ static void __init sev_map_percpu_data(void)
 {
int cpu;
 
-   if (!sev_active())
+   if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
return;
 
for_each_possible_cpu(cpu) {
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ad273e5861c1..fc3930c5db1b 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -16,9 +16,9 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
-#include 
 #include 
 #include 
 
@@ -232,7 +232,7 @@ static void __init kvmclock_init_mem(void)
 * hvclock is shared between the guest and the hypervisor, must
 * be mapped decrypted.
 */
-   if (sev_active()) {
+   if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
r = set_memory_decrypted((unsigned long) hvclock_mem,
 1UL << order);
if (r) {
diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index 7040c0fa921c..f5da4a18070a 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -167,7 +167,7 @@ static int init_transition_pgtable(struct kimage *image, 
pgd_t *pgd)
}
pte = pte_offset_kernel(pmd, vaddr);
 
-   if (sev_active())
+   if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
prot = PAGE_KERNEL_EXEC;
 
set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
@@ -207,7 +207,7 @@ static int init_pgtable(struct kimage *image, unsigned long 
start_pgtable)
level4p = (pgd_t *)__va(start_pgtable);
clear_page(level4p);
 
-   if (sev_active()) {
+   if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
info.page_flag   |= _PAGE_ENC;
info.kernpg_flag |= _PAGE_ENC;
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 05e8d4d27969..a2f78a8cfdaa 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -455,7 +456,7 @@ static int has_svm(void)
return 0;
}
 
-   if (sev_active()) {
+   if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
pr_info("KVM is unsupported whe

[PATCH 1/8] x86/ioremap: Selectively build arch override encryption functions

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

In preparation for other uses of the cc_platform_has() function
besides AMD's memory encryption support, selectively build the
AMD memory encryption architecture override functions only when
CONFIG_AMD_MEM_ENCRYPT=y. These functions are:

- early_memremap_pgprot_adjust()
- arch_memremap_can_ram_remap()

Additionally, routines that are only invoked by these architecture
override functions can also be conditionally built. These functions are:

- memremap_should_map_decrypted()
- memremap_is_efi_data()
- memremap_is_setup_data()
- early_memremap_is_setup_data()

And finally, phys_mem_access_encrypted() is conditionally built as well,
but requires a static inline version of it when CONFIG_AMD_MEM_ENCRYPT is
not set.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/io.h | 8 
 arch/x86/mm/ioremap.c | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 841a5d104afa..5c6a4af0b911 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -391,6 +391,7 @@ extern void arch_io_free_memtype_wc(resource_size_t start, 
resource_size_t size)
 #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
 #endif
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
 extern bool arch_memremap_can_ram_remap(resource_size_t offset,
unsigned long size,
unsigned long flags);
@@ -398,6 +399,13 @@ extern bool arch_memremap_can_ram_remap(resource_size_t 
offset,
 
 extern bool phys_mem_access_encrypted(unsigned long phys_addr,
  unsigned long size);
+#else
+static inline bool phys_mem_access_encrypted(unsigned long phys_addr,
+unsigned long size)
+{
+   return true;
+}
+#endif
 
 /**
  * iosubmit_cmds512 - copy data to single MMIO location, in 512-bit units
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 60ade7dd71bd..ccff76cedd8f 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -508,6 +508,7 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
memunmap((void *)((unsigned long)addr & PAGE_MASK));
 }
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
 /*
  * Examine the physical address to determine if it is an area of memory
  * that should be mapped decrypted.  If the memory is not part of the
@@ -746,7 +747,6 @@ bool phys_mem_access_encrypted(unsigned long phys_addr, 
unsigned long size)
return arch_memremap_can_ram_remap(phys_addr, size, 0);
 }
 
-#ifdef CONFIG_AMD_MEM_ENCRYPT
 /* Remap memory with encryption */
 void __init *early_memremap_encrypted(resource_size_t phys_addr,
  unsigned long size)
-- 
2.29.2



[PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
From: Borislav Petkov 

Hi all,

here's v4 of the cc_platform_has() patchset with feedback incorporated.

I'm going to route this through tip if there are no objections.

Thx.

Tom Lendacky (8):
  x86/ioremap: Selectively build arch override encryption functions
  arch/cc: Introduce a function to check for confidential computing
features
  x86/sev: Add an x86 version of cc_platform_has()
  powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
  x86/sme: Replace occurrences of sme_active() with cc_platform_has()
  x86/sev: Replace occurrences of sev_active() with cc_platform_has()
  x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
  treewide: Replace the use of mem_encrypt_active() with
cc_platform_has()

 arch/Kconfig |  3 +
 arch/powerpc/include/asm/mem_encrypt.h   |  5 --
 arch/powerpc/platforms/pseries/Kconfig   |  1 +
 arch/powerpc/platforms/pseries/Makefile  |  2 +
 arch/powerpc/platforms/pseries/cc_platform.c | 26 ++
 arch/powerpc/platforms/pseries/svm.c |  5 +-
 arch/s390/include/asm/mem_encrypt.h  |  2 -
 arch/x86/Kconfig |  1 +
 arch/x86/include/asm/io.h|  8 ++
 arch/x86/include/asm/kexec.h |  2 +-
 arch/x86/include/asm/mem_encrypt.h   | 12 +--
 arch/x86/kernel/Makefile |  6 ++
 arch/x86/kernel/cc_platform.c| 69 +++
 arch/x86/kernel/crash_dump_64.c  |  4 +-
 arch/x86/kernel/head64.c |  9 +-
 arch/x86/kernel/kvm.c|  3 +-
 arch/x86/kernel/kvmclock.c   |  4 +-
 arch/x86/kernel/machine_kexec_64.c   | 19 +++--
 arch/x86/kernel/pci-swiotlb.c|  9 +-
 arch/x86/kernel/relocate_kernel_64.S |  2 +-
 arch/x86/kernel/sev.c|  6 +-
 arch/x86/kvm/svm/svm.c   |  3 +-
 arch/x86/mm/ioremap.c| 18 ++--
 arch/x86/mm/mem_encrypt.c| 55 
 arch/x86/mm/mem_encrypt_identity.c   |  9 +-
 arch/x86/mm/pat/set_memory.c |  3 +-
 arch/x86/platform/efi/efi_64.c   |  9 +-
 arch/x86/realmode/init.c |  8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  |  4 +-
 drivers/gpu/drm/drm_cache.c  |  4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |  4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_msg.c  |  6 +-
 drivers/iommu/amd/init.c |  7 +-
 drivers/iommu/amd/iommu.c|  3 +-
 drivers/iommu/amd/iommu_v2.c |  3 +-
 drivers/iommu/iommu.c|  3 +-
 fs/proc/vmcore.c |  6 +-
 include/linux/cc_platform.h  | 88 
 include/linux/mem_encrypt.h  |  4 -
 kernel/dma/swiotlb.c |  4 +-
 40 files changed, 310 insertions(+), 129 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/cc_platform.c
 create mode 100644 arch/x86/kernel/cc_platform.c
 create mode 100644 include/linux/cc_platform.h

-- 
2.29.2



[PATCH 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

Introduce a powerpc version of the cc_platform_has() function. This will
be used to replace the powerpc mem_encrypt_active() implementation, so
the implementation will initially only support the CC_ATTR_MEM_ENCRYPT
attribute.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
Acked-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/Kconfig   |  1 +
 arch/powerpc/platforms/pseries/Makefile  |  2 ++
 arch/powerpc/platforms/pseries/cc_platform.c | 26 
 3 files changed, 29 insertions(+)
 create mode 100644 arch/powerpc/platforms/pseries/cc_platform.c

diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 5e037df2a3a1..2e57391e0778 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -159,6 +159,7 @@ config PPC_SVM
select SWIOTLB
select ARCH_HAS_MEM_ENCRYPT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+   select ARCH_HAS_CC_PLATFORM
help
 There are certain POWER platforms which support secure guests using
 the Protected Execution Facility, with the help of an Ultravisor
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index 4cda0ef87be0..41d8aee98da4 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -31,3 +31,5 @@ obj-$(CONFIG_FA_DUMP) += rtas-fadump.o
 
 obj-$(CONFIG_SUSPEND)  += suspend.o
 obj-$(CONFIG_PPC_VAS)  += vas.o
+
+obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += cc_platform.o
diff --git a/arch/powerpc/platforms/pseries/cc_platform.c 
b/arch/powerpc/platforms/pseries/cc_platform.c
new file mode 100644
index ..e8021af83a19
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/cc_platform.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Confidential Computing Platform Capability checks
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+
+bool cc_platform_has(enum cc_attr attr)
+{
+   switch (attr) {
+   case CC_ATTR_MEM_ENCRYPT:
+   return is_secure_guest();
+
+   default:
+   return false;
+   }
+}
+EXPORT_SYMBOL_GPL(cc_platform_has);
-- 
2.29.2



[PATCH 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

Replace uses of sme_active() with the more generic cc_platform_has()
using CC_ATTR_HOST_MEM_ENCRYPT. If future support is added for other
memory encryption technologies, the use of CC_ATTR_HOST_MEM_ENCRYPT
can be updated, as required.

This also replaces two usages of sev_active() that are really geared
towards detecting if SME is active.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/kexec.h |  2 +-
 arch/x86/include/asm/mem_encrypt.h   |  2 --
 arch/x86/kernel/machine_kexec_64.c   | 15 ---
 arch/x86/kernel/pci-swiotlb.c|  9 -
 arch/x86/kernel/relocate_kernel_64.S |  2 +-
 arch/x86/mm/ioremap.c|  6 +++---
 arch/x86/mm/mem_encrypt.c| 13 -
 arch/x86/mm/mem_encrypt_identity.c   |  9 -
 arch/x86/realmode/init.c |  5 +++--
 drivers/iommu/amd/init.c |  7 ---
 10 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 0a6e34b07017..11b7c06e2828 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -129,7 +129,7 @@ relocate_kernel(unsigned long indirection_page,
unsigned long page_list,
unsigned long start_address,
unsigned int preserve_context,
-   unsigned int sme_active);
+   unsigned int host_mem_enc_active);
 #endif
 
 #define ARCH_HAS_KIMAGE_ARCH
diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 3fb9f5ebefa4..63c5b99ccae5 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -51,7 +51,6 @@ void __init mem_encrypt_free_decrypted_mem(void);
 void __init mem_encrypt_init(void);
 
 void __init sev_es_init_vc_handling(void);
-bool sme_active(void);
 bool sev_active(void);
 bool sev_es_active(void);
 
@@ -76,7 +75,6 @@ static inline void __init sme_encrypt_kernel(struct 
boot_params *bp) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
 static inline void sev_es_init_vc_handling(void) { }
-static inline bool sme_active(void) { return false; }
 static inline bool sev_active(void) { return false; }
 static inline bool sev_es_active(void) { return false; }
 
diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index 131f30fdcfbd..7040c0fa921c 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -358,7 +359,7 @@ void machine_kexec(struct kimage *image)
   (unsigned long)page_list,
   image->start,
   image->preserve_context,
-  sme_active());
+  
cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
 
 #ifdef CONFIG_KEXEC_JUMP
if (image->preserve_context)
@@ -569,12 +570,12 @@ void arch_kexec_unprotect_crashkres(void)
  */
 int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
 {
-   if (sev_active())
+   if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
return 0;
 
/*
-* If SME is active we need to be sure that kexec pages are
-* not encrypted because when we boot to the new kernel the
+* If host memory encryption is active we need to be sure that kexec
+* pages are not encrypted because when we boot to the new kernel the
 * pages won't be accessed encrypted (initially).
 */
return set_memory_decrypted((unsigned long)vaddr, pages);
@@ -582,12 +583,12 @@ int arch_kexec_post_alloc_pages(void *vaddr, unsigned int 
pages, gfp_t gfp)
 
 void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
 {
-   if (sev_active())
+   if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
return;
 
/*
-* If SME is active we need to reset the pages back to being
-* an encrypted mapping before freeing them.
+* If host memory encryption is active we need to reset the pages back
+* to being an encrypted mapping before freeing them.
 */
set_memory_encrypted((unsigned long)vaddr, pages);
 }
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index c2cfa5e7c152..814ab46a0dad 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -6,7 +6,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 #include 
@@ -45,11 +45,10 @@ int __init pci_swiotlb_detect_4gb(void)
swiotlb = 1;
 
/*
-* If SME is active then swiotlb will be set to 1 so that bounce
-* buffers are allocated and used for devices that do not support
-* the addressing range required for the e

[PATCH 8/8] treewide: Replace the use of mem_encrypt_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

Replace uses of mem_encrypt_active() with calls to cc_platform_has() with
the CC_ATTR_MEM_ENCRYPT attribute.

Remove the implementation of mem_encrypt_active() across all arches.

For s390, since the default implementation of the cc_platform_has()
matches the s390 implementation of mem_encrypt_active(), cc_platform_has()
does not need to be implemented in s390 (the config option
ARCH_HAS_CC_PLATFORM is not set).

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/powerpc/include/asm/mem_encrypt.h  | 5 -
 arch/powerpc/platforms/pseries/svm.c| 5 +++--
 arch/s390/include/asm/mem_encrypt.h | 2 --
 arch/x86/include/asm/mem_encrypt.h  | 5 -
 arch/x86/kernel/head64.c| 9 +++--
 arch/x86/mm/ioremap.c   | 4 ++--
 arch/x86/mm/mem_encrypt.c   | 2 +-
 arch/x86/mm/pat/set_memory.c| 3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +++-
 drivers/gpu/drm/drm_cache.c | 4 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 6 +++---
 drivers/iommu/amd/iommu.c   | 3 ++-
 drivers/iommu/amd/iommu_v2.c| 3 ++-
 drivers/iommu/iommu.c   | 3 ++-
 fs/proc/vmcore.c| 6 +++---
 include/linux/mem_encrypt.h | 4 
 kernel/dma/swiotlb.c| 4 ++--
 18 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/mem_encrypt.h 
b/arch/powerpc/include/asm/mem_encrypt.h
index ba9dab07c1be..2f26b8fc8d29 100644
--- a/arch/powerpc/include/asm/mem_encrypt.h
+++ b/arch/powerpc/include/asm/mem_encrypt.h
@@ -10,11 +10,6 @@
 
 #include 
 
-static inline bool mem_encrypt_active(void)
-{
-   return is_secure_guest();
-}
-
 static inline bool force_dma_unencrypted(struct device *dev)
 {
return is_secure_guest();
diff --git a/arch/powerpc/platforms/pseries/svm.c 
b/arch/powerpc/platforms/pseries/svm.c
index 87f001b4c4e4..c083ecbbae4d 100644
--- a/arch/powerpc/platforms/pseries/svm.c
+++ b/arch/powerpc/platforms/pseries/svm.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -63,7 +64,7 @@ void __init svm_swiotlb_init(void)
 
 int set_memory_encrypted(unsigned long addr, int numpages)
 {
-   if (!mem_encrypt_active())
+   if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT))
return 0;
 
if (!PAGE_ALIGNED(addr))
@@ -76,7 +77,7 @@ int set_memory_encrypted(unsigned long addr, int numpages)
 
 int set_memory_decrypted(unsigned long addr, int numpages)
 {
-   if (!mem_encrypt_active())
+   if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT))
return 0;
 
if (!PAGE_ALIGNED(addr))
diff --git a/arch/s390/include/asm/mem_encrypt.h 
b/arch/s390/include/asm/mem_encrypt.h
index 2542cbf7e2d1..08a8b96606d7 100644
--- a/arch/s390/include/asm/mem_encrypt.h
+++ b/arch/s390/include/asm/mem_encrypt.h
@@ -4,8 +4,6 @@
 
 #ifndef __ASSEMBLY__
 
-static inline bool mem_encrypt_active(void) { return false; }
-
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
 
diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index da14ede311aa..2d4f5c17d79c 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -96,11 +96,6 @@ static inline void mem_encrypt_free_decrypted_mem(void) { }
 
 extern char __start_bss_decrypted[], __end_bss_decrypted[], 
__start_bss_decrypted_unused[];
 
-static inline bool mem_encrypt_active(void)
-{
-   return sme_me_mask;
-}
-
 static inline u64 sme_get_me_mask(void)
 {
return sme_me_mask;
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de01903c3735..fc5371a7e9d1 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -19,7 +19,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 #include 
@@ -284,8 +284,13 @@ unsigned long __head __startup_64(unsigned long physaddr,
 * The bss section will be memset to zero later in the initialization so
 * there is no need to zero it after changing the memory encryption
 * attribute.
+*
+* This is early code, use an open coded check for SME instead of
+* using cc_platform_has(). This eliminates worries about removing
+* instrumentation or checking boot_cpu_data in the cc_platform_has()
+* function.
 */
-   if (mem_encrypt_active()) {
+   if (sme_get_me_mask()) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;
for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index b59a5cbc6bc5..026031b3b782 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -694,7 +694,7 @@ static bool __i

Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
On Tue, Sep 28, 2021 at 12:19:49PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Intel CC support patch is not included in this series. You want me
> to address the issue raised by Joerg before merging it?

Did you not see my email to you today:

https://lkml.kernel.org/r/yvl4zughfsh1q...@zn.tnic

?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
On Tue, Sep 28, 2021 at 01:48:46PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Just read it. If you want to use cpuid_has_tdx_guest() directly in
> cc_platform_has(), then you want to rename intel_cc_platform_has() to
> tdx_cc_platform_has()?

Why?

You simply do:

if (cpuid_has_tdx_guest())
intel_cc_platform_has(...);

and lemme paste from that mail: " ...you should use
cpuid_has_tdx_guest() instead but cache its result so that you don't
call CPUID each time the kernel executes cc_platform_has()."

Makes sense?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
On Tue, Sep 28, 2021 at 02:01:57PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Yes. But, since the check is related to TDX, I just want to confirm whether
> you are fine with naming the function as intel_*().

Why is this such a big of a deal?!

There's amd_cc_platform_has() and intel_cc_platform_has() will be the
corresponding Intel version.

> Since this patch is going to have dependency on TDX code, I will include
> this patch in TDX patch set.

Ok.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-05 Thread Borislav Petkov
On Tue, Oct 05, 2021 at 04:29:41PM +0200, Paul Menzel wrote:
> Selecting the symbol `AMD_MEM_ENCRYPT` – as
> done in Debian 5.13.9-1~exp1 [1] – also selects
> `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT`, as it defaults to yes,

I'm assuming that "selecting" is done automatically: alldefconfig,
olddefconfig?

Because CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT only depends on
CONFIG_AMD_MEM_ENCRYPT and former can be disabled in oldconfig or
menuconfig etc.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Tue, Oct 05, 2021 at 10:48:15AM -0400, Alex Deucher wrote:
> It's not incompatible per se, but SEM requires the IOMMU be enabled
> because the C bit used for encryption is beyond the dma_mask of most
> devices.  If the C bit is not set, the en/decryption for DMA doesn't
> occur.  So you need IOMMU to be enabled in remapping mode to use SME
> with most devices.  Raven has further requirements in that it requires
> IOMMUv2 functionality to support some features which currently uses a
> direct mapping in the IOMMU and hence the C bit is not properly
> handled.

So lemme ask you this: do Raven-containing systems exist out there which
don't have IOMMUv2 functionality and which can cause boot failures when
SME is enabled in the kernel .config?

IOW, can we handle this at boot time properly, i.e., disable SME if we
detect Raven or IOMMUv2 support is missing?

If not, then we really will have to change the default.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 09:23:22AM -0400, Alex Deucher wrote:
> There could be some OEM systems that disable the IOMMU on the platform
> and don't provide a switch in the bios to enable it.  The GPU driver
> will still work in that case, it will just not be able to enable KFD
> support for ROCm compute.  SME won't work for most devices in that
> case however since most devices have a DMA mask too small to handle
> the C bit for encryption.  SME should be dependent on IOMMU being
> enabled.

Yeah, I'd let you hash this out with Tom.

> I'm not an SME expert, but I thought that that was already the case.

Yeah, I think Paul wants this:

---
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b79e88ee6627..e94c2df7a043 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1518,7 +1518,6 @@ config AMD_MEM_ENCRYPT
 
 config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
bool "Activate AMD Secure Memory Encryption (SME) by default"
-   default y
depends on AMD_MEM_ENCRYPT
help
  Say yes to have system memory encrypted by default if running on

---

The reason we did this is so that you don't want to supply
mem_encrypt=on on the cmdline but didn't anticipate any such fun with
some devices.

> We just added the error condition in the GPU driver to prevent the
> driver from loading when the user forced SME on.  IIRC, there were
> users that cared more about SME than graphics support.

Well, it's a distro kernel so we should at least try to make everyone
happy. :)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
Ok,

so I sat down and wrote something and tried to capture all the stuff we
so talked about that it is clear in the future why we did it.

Thoughts?

---
From: Borislav Petkov 
Date: Wed, 6 Oct 2021 19:34:55 +0200
Subject: [PATCH] x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
 automatically

This Kconfig option was added initially so that memory encryption is
enabled by default on machines which support it.

However, Raven-class GPUs, a.o., cannot handle DMA masks which are
shorter than the bit position of the encryption, aka C-bit. For that,
those devices need to have the IOMMU present.

If the IOMMU is disabled or in passthrough mode, though, the kernel
would switch to SWIOTLB bounce-buffering for those transfers.

In order to avoid that,

   2cc13bb4f59f ("iommu: Disable passthrough mode when SME is active")

disables the default IOMMU passthrough mode so that devices for which
the default 256K DMA is insufficient, can use the IOMMU instead.

However 2, there are cases where the IOMMU is disabled in the BIOS, etc,
think the usual hardware folk "oops, I dropped the ball there" cases.

Which means, it can happen that there are systems out there with devices
which need the IOMMU to function properly with SME enabled but the IOMMU
won't necessarily be enabled.

So in order for those devices to function, drop the "default y" for
the SME by default on option so that users who want to have SME, will
need to either enable it in their config or use "mem_encrypt=on" on the
kernel command line.

Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support")
Reported-by: Paul Menzel 
Signed-off-by: Borislav Petkov 
Cc: 
Link: 
https://lkml.kernel.org/r/8bbacd0e-4580-3194-19d2-a0ecad7df...@molgen.mpg.de
---
 arch/x86/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8055da49f1c0..6a336b1f3f28 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1525,7 +1525,6 @@ config AMD_MEM_ENCRYPT
 
 config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
bool "Activate AMD Secure Memory Encryption (SME) by default"
-   default y
depends on AMD_MEM_ENCRYPT
help
  Say yes to have system memory encrypted by default if running on
-- 
2.29.2


-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 02:10:30PM -0400, Alex Deucher wrote:
> This is not limited to Raven.

That's what the innocuous "a.o." wanted to state. :)

> All GPUs (and quite a few other
> devices) have a limited DMA mask.  AMD GPUs have between 32 and 48
> bits of DMA depending on what generation the hardware is.  So to
> support SME, you either need swiotlb with bounce buffers or you need
> IOMMU in remapping mode. The limitation with Raven is that if you want
> to use it with the IOMMU enabled it requires the IOMMU to be set up in
> passthrough mode to support IOMMUv2 functionality for compute support
> and due to other hardware limitations on the display side. So for all
> GPUs except raven, just having IOMMU enabled in remapping mode is
> fine.  GPUs from other vendors would likely run into similar
> limitations.  Raven just has further limitations.

Hmm, and in passthrough mode it would use bounce buffers when SME is
enabled. And when those 256K are not enough, it would fail there too,
even with IOMMUv2. At least this is how it looks from here.

Dunno, it feels like doing GPU compute and SME does not go hand-in-hand
real smoothly currently but that probably doesn't matter all too much
for both user camps. But that's just me with a hunch.

> Another option would be to enable SME by default on Epyc platforms,
> but disabled by default on client APU platforms or even just raven.

Thing is, we don't know at SME init time - very early during boot -
whether we're Epyc or client. Can we find that out reliably from the hw?

And even if we do, that's still not accurate enough - we wanna know
whether the IOMMU works.

So I guess we're all left to the user to decide. But I'm always open
to suggestions for solving things in sw and not requiring any user
interaction.

> Other than these comments, looks fine to me.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 02:36:56PM -0400, Alex Deucher wrote:
> From the x86 model and family info?  I think Raven has different
> families from other Zen based CPUs.

Yeah, I'd like to avoid a f/m/s mapping table, if possible. Those things
should be a last resort and they always need adjustment when new models
pop up.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 02:21:40PM -0400, Alex Deucher wrote:
> And just another general comment, swiotlb + bounce buffers isn't
> really useful on GPUs.  You may have 10-100s of MBs of memory mapped
> long term into the GPU's address space for random access.  E.g., you
> may have buffers in system memory that the display hardware is
> actively scanning out of.  For GPUs you should really only enable SME
> if IOMMU is enabled in remapping mode.  But that is probably beyond
> the discussion here.

Right, but insights into how these things work (or don't work) together
are always welcome. And yes, as 2cc13bb4f59f says:

"... The bounce buffer
code has an upper limit of 256kb for the size of DMA
allocations, which is too small for certain devices and
causes them to fail."

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-07 Thread Borislav Petkov
Hi folks,

commit in $Subject breaks rebooting an HP laptop here with a Carrizo
chipset: after typing "reboot" and pressing Enter, it powers off the
machine up to a certain point but the fans remain on, screen goes black
and nothing happens anymore. No reboot. I have to power it off by
holding the power key down for 4 seconds.

Reverting the patch fixes the issue.

GPU info on that machine:

[1.462214] [drm] amdgpu kernel modesetting enabled.
[1.465150] amdgpu :00:01.0: vgaarb: deactivate vga console
[1.466259] Console: switching to colour dummy device 80x25
[1.466844] [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 
0x103C:0x807E 0xC4).
[1.467242] amdgpu :00:01.0: amdgpu: Trusted Memory Zone (TMZ) feature 
not supported
[1.467552] [drm] register mmio base: 0xD0C0
[1.467750] [drm] register mmio size: 262144
[1.467901] [drm] add ip block number 0 
[1.468067] [drm] add ip block number 1 
[1.468266] [drm] add ip block number 2 
[1.468436] [drm] add ip block number 3 
[1.468603] [drm] add ip block number 4 
[1.468809] [drm] add ip block number 5 
[1.468975] [drm] add ip block number 6 
[1.469120] [drm] add ip block number 7 
[1.469282] [drm] add ip block number 8 
[1.485350] [drm] BIOS signature incorrect 20 7
[1.485494] resource sanity check: requesting [mem 0x000c-0x000d], 
which spans more than PCI Bus :00 [mem 0x000c-
0x000c3fff window]
[1.485922] caller pci_map_rom+0x68/0x1b0 mapping multiple BARs
[1.486273] amdgpu :00:01.0: amdgpu: Fetched VBIOS from ROM BAR
[1.486488] amdgpu: ATOM BIOS: SWBRT27354.001
[1.486701] [drm] UVD is enabled in physical mode
[1.486862] [drm] VCE enabled in physical mode
[1.487061] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment 
size is 9-bit
[1.487339] amdgpu :00:01.0: amdgpu: VRAM: 512M 0x00F4 - 
0x00F41FFF (512M used)
[1.487652] amdgpu :00:01.0: amdgpu: GART: 1024M 0x00FF - 
0x00FF3FFF
[1.487939] [drm] Detected VRAM RAM=512M, BAR=512M
[1.488101] [drm] RAM width 128bits UNKNOWN
[1.488309] [drm] amdgpu: 512M of VRAM memory ready
[1.488522] [drm] amdgpu: 3072M of GTT memory ready.
[1.488707] [drm] GART: num cpu pages 262144, num gpu pages 262144
[1.488997] [drm] PCIE GART of 1024M enabled (table at 0x00F40090).
[1.491962] amdgpu: hwmgr_sw_init smu backed is smu8_smu
[1.492544] [drm] Found UVD firmware Version: 1.91 Family ID: 11
[1.492764] [drm] UVD ENC is disabled
[1.494177] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[1.495765] amdgpu: smu version 18.62.00
[1.501983] [drm] DM_PPLIB: values for Engine clock
[1.502201] [drm] DM_PPLIB:   30
[1.502321] [drm] DM_PPLIB:   36
[1.502441] [drm] DM_PPLIB:   423530
[1.502561] [drm] DM_PPLIB:   514290
[1.502680] [drm] DM_PPLIB:   626090
[1.502799] [drm] DM_PPLIB:   72
[1.502919] [drm] DM_PPLIB: Validation clocks:
[1.503069] [drm] DM_PPLIB:engine_max_clock: 72000
[1.503242] [drm] DM_PPLIB:memory_max_clock: 8
[1.503415] [drm] DM_PPLIB:level   : 8
[1.503578] [drm] DM_PPLIB: values for Display clock
[1.503745] [drm] DM_PPLIB:   30
[1.503864] [drm] DM_PPLIB:   40
[1.503984] [drm] DM_PPLIB:   496560
[1.504147] [drm] DM_PPLIB:   626090
[1.504275] [drm] DM_PPLIB:   685720
[1.504403] [drm] DM_PPLIB:   757900
[1.504526] [drm] DM_PPLIB: Validation clocks:
[1.504678] [drm] DM_PPLIB:engine_max_clock: 72000
[1.504891] [drm] DM_PPLIB:memory_max_clock: 8
[1.505063] [drm] DM_PPLIB:level   : 8
[1.505225] [drm] DM_PPLIB: values for Memory clock
[1.505389] [drm] DM_PPLIB:   333000
[1.505508] [drm] DM_PPLIB:   80
[1.505628] [drm] DM_PPLIB: Validation clocks:
[1.505777] [drm] DM_PPLIB:engine_max_clock: 72000
[1.505950] [drm] DM_PPLIB:memory_max_clock: 8
[1.506123] [drm] DM_PPLIB:level   : 8
[1.506375] [drm] Display Core initialized with v3.2.149!
[1.584817] [drm] UVD initialized successfully.
[1.784234] [drm] VCE initialized successfully.
[1.784415] amdgpu :00:01.0: amdgpu: SE 1, SH per SE 1, CU per SH 8, 
active_cu_number 8
[1.787958] [drm] fb mappable at 0xA0EE4000
[1.788118] [drm] vram apper at 0xA000
[1.788258] [drm] size 14745600
[1.788367] [drm] fb depth is 24
[1.788503] [drm]pitch is 10240
[1.789198] fbcon: amdgpu (fb0) is primary device
[1.880014] Console: switching to colour frame buffer device 320x90
[1.903779] amdgpu :00:01.0: [drm] fb0: amdgpu frame buffer device
[1.918353] [drm] Initialized amdgpu 3.42.0 20150101 for :00:01.0 on 
minor 0

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-08 Thread Borislav Petkov
On Fri, Oct 08, 2021 at 11:12:35AM -0400, Alex Deucher wrote:
> Can you try swapping the order of
> amdgpu_device_ip_set_powergating_state() and
> amdgpu_device_ip_set_clockgating_state() in the patch?

Nope, the diff below didn't change things.

Should I comment them out one by one and see whether the clockgating or
the powergating causes it?

diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index bc571833632e..99e3d697cc24 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -561,10 +561,10 @@ static int uvd_v6_0_hw_fini(void *handle)
} else {
amdgpu_asic_set_uvd_clocks(adev, 0, 0);
/* shutdown the UVD block */
-   amdgpu_device_ip_set_powergating_state(adev, 
AMD_IP_BLOCK_TYPE_UVD,
-  AMD_PG_STATE_GATE);
amdgpu_device_ip_set_clockgating_state(adev, 
AMD_IP_BLOCK_TYPE_UVD,
   AMD_CG_STATE_GATE);
+   amdgpu_device_ip_set_powergating_state(adev, 
AMD_IP_BLOCK_TYPE_UVD,
+  AMD_PG_STATE_GATE);
}
 
if (RREG32(mmUVD_STATUS) != 0)
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index 9de66893ccd6..a36612357d0f 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -507,10 +507,10 @@ static int vce_v3_0_hw_fini(void *handle)
amdgpu_dpm_enable_vce(adev, false);
} else {
amdgpu_asic_set_vce_clocks(adev, 0, 0);
-   amdgpu_device_ip_set_powergating_state(adev, 
AMD_IP_BLOCK_TYPE_VCE,
-  AMD_PG_STATE_GATE);
amdgpu_device_ip_set_clockgating_state(adev, 
AMD_IP_BLOCK_TYPE_VCE,
   AMD_CG_STATE_GATE);
+   amdgpu_device_ip_set_powergating_state(adev, 
AMD_IP_BLOCK_TYPE_VCE,
+  AMD_PG_STATE_GATE);
}
 
r = vce_v3_0_wait_for_idle(handle);

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-11 Thread Borislav Petkov
On Sat, Oct 09, 2021 at 01:20:39AM +, Quan, Evan wrote:
> Maybe the change below can address your issue.
> https://lists.freedesktop.org/archives/amd-gfx/2021-September/069006.html

Nope, that one doesn't change anything.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-11 Thread Borislav Petkov
On Sat, Oct 09, 2021 at 09:54:13AM +, Quan, Evan wrote:
> Oops, I just found some necessary changes are missing from the patch of the 
> link below.
> https://lists.freedesktop.org/archives/amd-gfx/2021-September/069006.html
> 
> Could you try the patch from the link above + the attached patch?

Nope, still no joy. ;-\

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-11 Thread Borislav Petkov
On Mon, Oct 11, 2021 at 03:05:33PM +0200, Paul Menzel wrote:
> I think, the IOMMU is enabled on the MSI B350M MORTAR, but otherwise, yes
> this looks fine. The help text could also be updated to mention problems
> with AMD Raven devices.

This is not only about Raven GPUs but, as Alex explained, pretty much
about every device which doesn't support a 48 bit DMA mask. I'll expand
that aspect in the changelog.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


[PATCH -v2] x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically

2021-10-11 Thread Borislav Petkov
Ok,

here's v2, I've added "however" number 3 below which should summarize
Christian's note about coherent and concurrent use of memory by the GPU
and CPU, which obviously cannot work with bounce buffers.

I'll send it to Linus next week if there are no more complaints.

Thx.

---
From: Borislav Petkov 

This Kconfig option was added initially so that memory encryption is
enabled by default on machines which support it.

However, devices which have DMA masks that are less than the bit
position of the encryption bit, aka C-bit, require the use of an IOMMU
or the use of SWIOTLB.

If the IOMMU is disabled or in passthrough mode, the kernel would switch
to SWIOTLB bounce-buffering for those transfers.

In order to avoid that,

  2cc13bb4f59f ("iommu: Disable passthrough mode when SME is active")

disables the default IOMMU passthrough mode so that devices for which the
default 256K DMA is insufficient, can use the IOMMU instead.

However 2, there are cases where the IOMMU is disabled in the BIOS, etc.
(think the usual hardware folk "oops, I dropped the ball there" cases) or a
driver doesn't properly use the DMA APIs or a device has a firmware or
hardware bug, e.g.:

  ea68573d408f ("drm/amdgpu: Fail to load on RAVEN if SME is active")

However 3, in the above GPU use case, there are APIs like Vulkan and
some OpenGL/OpenCL extensions which are under the assumption that
user-allocated memory can be passed in to the kernel driver and both the
GPU and CPU can do coherent and concurrent access to the same memory.
That cannot work with SWIOTLB bounce buffers, of course.

So, in order for those devices to function, drop the "default y" for the
SME by default active option so that users who want to have SME enabled,
will need to either enable it in their config or use "mem_encrypt=on" on
the kernel command line.

 [ tlendacky: Generalize commit message. ]

Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support")
Reported-by: Paul Menzel 
Signed-off-by: Borislav Petkov 
Cc: 
Link: 
https://lkml.kernel.org/r/8bbacd0e-4580-3194-19d2-a0ecad7df...@molgen.mpg.de
---
 arch/x86/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bd70e8a39fbf..d9830e7e1060 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1525,7 +1525,6 @@ config AMD_MEM_ENCRYPT
 
 config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
bool "Activate AMD Secure Memory Encryption (SME) by default"
-   default y
depends on AMD_MEM_ENCRYPT
help
  Say yes to have system memory encrypted by default if running on
-- 
2.29.2

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-11 Thread Borislav Petkov
On Mon, Oct 11, 2021 at 08:03:51AM +, Quan, Evan wrote:
> OK... Then forget about previous patches. Let's try to narrow down the
> issue first. Please try the attached patch1 first. If it works,

It does.

> please undo the changes of patch1 and try patch2 to narrow down further.

It does too.

:-)

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-13 Thread Borislav Petkov
On Wed, Oct 13, 2021 at 09:19:45AM +, Quan, Evan wrote:
> So, I need your help to confirm the last two patches(I sent you) do not 
> affect the fix for the bug above.
> Please follow the steps below to verify it:
> 1. Launch a video playing
> 2. open another terminal and issue "sudo pm-suspend" command to force 
> suspending
> 3. verify the system can suspend and resume back correctly without errors or 
> hangs

Just to confirm: you want me to do that with the last two patches
applied?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-14 Thread Borislav Petkov
On Thu, Oct 14, 2021 at 02:02:48AM +, Quan, Evan wrote:
> [Quan, Evan] Yes, but not(apply them) at the same time. One by one as you did 
> before.
> - try the patch1 first

Ok, first patch worked fine.

> - undo the changes of patch1 and try patch2

Did that, worked fine too except after the first resume cycle, the video
didn't continue playing.

Then I restarted the video and did a couple more suspend cycles to see
if it would not continue again. In the subsequent tries it would resume
fine and the video would continue playing too.

So I'm going to chalk that single case of halted video with the second
patch to a resume glitch or so.

Btw, I don't have pm-suspend on that box but I did suspend to RAM
assuming this is what you wanted, which is done as root with two
commands:

# echo "suspend" > /sys/power/disk
# echo "mem" > /sys/power/state

If you want me to do more extensive testing, just shoot.

HTH.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH] drm/amdgpu: fix the hang observed on Carrizo due to UVD suspend failure

2021-10-18 Thread Borislav Petkov
On Mon, Oct 18, 2021 at 03:34:32PM +0800, Evan Quan wrote:
> It's confirmed that on some APUs the interaction with SMU(about DPM 
> disablement)
> will power off the UVD. That will make the succeeding interactions with UVD 
> on the
> suspend path impossible. And the system will hang due to that. To workaround 
> the
> issue, we will skip the UVD(or VCE) power off during interaction with SMU for
> suspend scenario.
> 
> Signed-off-by: Evan Quan 
> Change-Id: I7804d3835aadbc7cf4b686da4994e8461748
> ---
>  .../powerplay/hwmgr/smu7_clockpowergating.c   | 20 +--
>  .../drm/amd/pm/powerplay/hwmgr/smu8_hwmgr.c   | 16 +--
>  drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c |  4 ++--
>  3 files changed, 34 insertions(+), 6 deletions(-)

Want me to run it?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-11-05 Thread Borislav Petkov
On Fri, Nov 05, 2021 at 08:05:41AM +, Quan, Evan wrote:
> I'm wondering are you able to give the attached patch(alone) a try.

Yap, looks good.

Tested-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH] drm/amd/pm: avoid duplicate powergate/ungate setting

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 09:51:03AM +0100, Paul Menzel wrote:
> Please elaborate the kind of issues.

It fails to reboot on Carrizo-based laptops.

Whoever commits this, pls add

Link: https://lore.kernel.org/r/yv81vidwqlwva...@zn.tnic

so that it is clear what the whole story way.

Thx.

-- 
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH, GF: Ivo Totev, HRB 36809, AG Nürnberg


Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-12 Thread Borislav Petkov
Hi,

so this is a drive-by review using the lore.kernel.org mail because I
wasn't CCed on this.

On Tue, May 11, 2021 at 09:30:58PM -0400, Mukul Joshi wrote:
> +static int amdgpu_bad_page_notifier(struct notifier_block *nb,
> + unsigned long val, void *data)
> +{
> + struct mce *m = (struct mce *)data;
> + struct amdgpu_device *adev = NULL;
> + uint32_t gpu_id = 0;
> + uint32_t umc_inst = 0;
> + uint32_t chan_index = 0;
> + struct ras_err_data err_data = {0, 0, 0, NULL};
> + struct eeprom_table_record err_rec;
> + uint64_t retired_page;
> +
> + /*
> +  * If the error was generated in UMC_V2, which belongs to GPU UMCs,

Why does it matter if the error is a v2 UMC generated one?

IOW, can you detect the type of errors in GPU memory - I assume this
is what this is supposed to handle - from the error signature itself
instead of doing is_smca_umc_v2?

> +  * and error occurred in DramECC (Extended error code = 0) then only
> +  * process the error, else bail out.
> +  */
> + if (!m || !(is_smca_umc_v2(m->bank) && (XEC(m->status, 0x1f) == 0x0)))
> + return NOTIFY_DONE;
> +
> + gpu_id = GET_MCA_IPID_GPUID(m->ipid);
> +
> + /*
> +  * GPU Id is offset by GPU_ID_OFFSET in MCA_IPID_UMC register.
> +  */
> + gpu_id -= GPU_ID_OFFSET;
> +
> + adev = find_adev(gpu_id);
> + if (!adev) {
> + dev_warn(adev->dev, "%s: Unable to find adev for gpu_id: %d\n",
> +  __func__, gpu_id);
> + return NOTIFY_DONE;
> + }
> +
> + /*
> +  * If it is correctable error, then print a message and return.
> +  */
> + if (mce_is_correctable(m)) {
> + dev_info(adev->dev, "%s: UMC Correctable error detected.",
> + __func__);

So put yourself in the shoes of a support engineer who starts getting
all those calls about people's hardware getting correctable errors and
should they replace it and should AMD RMA the GPUs and so on and so
on..? Do you really wanna be on the receiving side of that call?

IOW, whom does printing the fact that the GPU had a corrected error
which got corrected by the hardware, help and do you really want to
upset people needlessly?

> + return NOTIFY_OK;
> + }
> +
> + /*
> +  * If it is uncorrectable error, then find out UMC instance and
> +  * channel index.
> +  */
> + find_umc_inst_chan_index(m, &umc_inst, &chan_index);

That's a void function but it could return a pointer to the instance and
channel bundled in a struct maybe...

> +
> + dev_info(adev->dev, "Uncorrectable error detected in UMC inst: %d,"
> + " chan_idx: %d", umc_inst, chan_index);
> +
> + memset(&err_rec, 0x0, sizeof(struct eeprom_table_record));
> +
> + /*
> +  * Translate UMC channel address to Physical address
> +  */
> + retired_page = ADDR_OF_8KB_BLOCK(m->addr) |
> + ADDR_OF_256B_BLOCK(chan_index) |
> + OFFSET_IN_256B_BLOCK(m->addr);
> +
> + err_rec.address = m->addr;
> + err_rec.retired_page = retired_page >> AMDGPU_GPU_PAGE_SHIFT;
> + err_rec.ts = (uint64_t)ktime_get_real_seconds();
> + err_rec.err_type = AMDGPU_RAS_EEPROM_ERR_NON_RECOVERABLE;
> + err_rec.cu = 0;
> + err_rec.mem_channel = chan_index;
> + err_rec.mcumc_id = umc_inst;
> +
> + err_data.err_addr = &err_rec;
> + err_data.err_addr_cnt = 1;
> +
> + if (amdgpu_bad_page_threshold != 0) {
> + amdgpu_ras_add_bad_pages(adev, err_data.err_addr,
> + err_data.err_addr_cnt);
> + amdgpu_ras_save_bad_pages(adev);
> + }
> +
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block amdgpu_bad_page_nb = {
> + .notifier_call  = amdgpu_bad_page_notifier,
> + .priority   = MCE_PRIO_ACCEL,

Nothing ever explains why this needs to be a separate priority. So how
about it?

> +};
> +
> +static void amdgpu_register_bad_pages_mca_notifier(void)
> +{
> + /*
> +  * Register the x86 notifier with MCE subsystem.
> +  * Please note a notifier can be registered only once
> +  * with the MCE subsystem.
> +  */

Why do you need this? Other users don't need that. Do you need to call
mce_unregister_decode_chain() when the driver gets removed or so?

> + if (notifier_registered == false) {

This is just silly and should be fixed properly once the issue is
understood.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-12 Thread Borislav Petkov
On Wed, May 12, 2021 at 07:00:58PM +, Joshi, Mukul wrote:
> SMCA UMCv2 corresponds to GPU's UMC MCA bank and the GPU driver is
> only interested in errors on GPU UMC.

So that thing should be called SMCA_GPU_UMC not SMCA_UMC_V2.

> We cannot know this without is_smca_umc_v2.

You don't need it - just export smca_get_bank_type() and test the bank
type at the call site.

> Maybe. I hope its not too much of a concern if it stays the way it is.

That was just a suggestion anyway - it is not code I maintain so not my
call.

> I wasn't really sure if I should use the EDAC priority here or create a new 
> one for Accelerator devices.
> I thought using EDAC priority might not be accepted by the maintainers as 
> EDAC and GPU (Accelerator) devices
> are two different class of devices.
> That is the reason I create a new one. 
> I am OK to use EDAC priority if that is acceptable.

I don't know what's acceptable because I still am unclear as to what
that thing is supposed to do. It seems you are interested only in
uncorrectable errors.

How are those errors reported? #MC exception, deferred interrupt, simply
logged in the bank and we find them by polling?

Then, the commit message is talking about some "bad page retirement".
What does that do? What can the user do when she sees the

"Uncorrectable error detected in UMC..."

message?

It depends on what "retiring" of GPU pages means...

In any case, dmesg should issue a human-understandable message about the
recovery action being done and what that means for the user: should she
replace the GPU, should she ignore, etc, etc.

> A system can have multiple GPUs and we only want a single notifier
> registered. I will change the comment to explicitly state this.

Actually, the notifier registration should be able to return a different
retval to state that a callback has already been registered but that
warns only currently so I'm guessing we're stuck with such ugly
"workarounds" for their shortcomings.

I'm gonna take a look whether they can be fixed though so that you don't
have to do this notifier_registered thing.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-13 Thread Borislav Petkov
On Thu, May 13, 2021 at 03:20:36AM +, Joshi, Mukul wrote:
> Exporting smca_get_bank_type() works fine when CONFIG_X86_MCE_AMD is defined.
> I would need to put #ifdef CONFIG_X86_MCE_AMD in my code to compile the amdgpu
> driver when CONFIG_X86_MCE_AMD is not defined.
> I can avoid all that by using is_smca_umc_v2().
> I think it would be cleaner with using is_smca_umc_v2().

See how smca_get_long_name() is exported and export that function the
same way.

To save you some energy: is_smca_umc_v2() is not going to happen.

> You can think of GPU device as a EDAC device here. It is mainly
> interested in handling uncorrectable errors.

An EDAC "device", as you call it, is not interested in handling UEs. If
anything, it counts them.

> It is a deferred interrupt that generates an MCE.

Is that the same deferred interrupt which calls amd_deferred_error_interrupt() ?

> When an uncorrectable error is detected on the GPU UMC, all we are
> doing is determining the physical address where the error occurred and
> then "retiring" the page that address belongs to.

What page is that? Normal DRAM page or a page in some special GPU memory?

> By retiring, we mean we reserve the page so that it is not available
> for allocations to any applications.

We do that for normal DRAM memory pages by poisoning them. I hope you
don't mean that.

Looking at

amdgpu_ras_add_bad_pages
|-> amdgpu_vram_mgr_reserve_range

that's some VRAM thing so I'm guessing special memory on the GPU.

If so, what happens with all those "retired" pages when you reboot?
They're getting used again and potentially trigger the same UEs and the
same retiring happens?

> We are providing information to the user by storing all the
> information about the retired pages in EEPROM. This can be accessed
> through sysfs.

Ok, I'm a user and I can access that information through sysfs. What can
I do with it?

> Hope it clears what "bad page retirement" is achieving.

It is getting there.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-13 Thread Borislav Petkov
On Thu, May 13, 2021 at 10:17:47AM -0400, Alex Deucher wrote:
> The bad pages are stored in an EEPROM on the board and the next time
> the driver loads it reads the EEPROM so that it can reserve the bad
> pages at init time so they don't get used again.

And that works automagically on the next boot? Because that sounds like
the right thing to do.

So practically, what happens to a GPU in such a case where the VRAM
starts going bad? It might get exhausted eventually and the driver will
say something along the lines of:

  "VRAM bad pages: 80%, consider replacing the GPU. It is operating
  currently with degrated performance."

or so?

Yap, from a RAS perspective, that makes good sense as you're prolonging
the life of the component while still remains operational as good as it
can and the only user interaction you need is she/he replacing it.

Sounds good.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-13 Thread Borislav Petkov
On Thu, May 13, 2021 at 10:32:45AM -0400, Alex Deucher wrote:
> Right.  The sys admin can query the bad page count and decide when to
> retire the card.

Yap, although the driver should actively "tell" the sysadmin when some
critical counts of retired VRAM pages are reached because I doubt all
admins would go look at those counts on their own.

Btw, you say "admin" - am I to understand that those are some high end
GPU cards with ECC memory? If consumer grade stuff has this too, then
the driver should very much warn on such levels on its own because
normal users won't know what and where to look.

Other than that, the big picture sounds good to me.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-14 Thread Borislav Petkov
On Thu, May 13, 2021 at 11:10:34PM +, Joshi, Mukul wrote:
> That's probably not the best example to look at.

Oh, it is the *perfect* example but...

> smca_get_long_name() is used in drivers/edac/mce_amd.c and this file
> doesn't get compiled when CONFIG_X86_MCE_AMD is not defined.
>
> And amdgpu driver has no dependency on CONFIG_X86_MCE_AMD.

... maybe this will make you see it the right way: how much of the
amdgpu RAS functionality you're adding, is going to even function
without CONFIG_X86_MCE_AMD?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-14 Thread Borislav Petkov
On Thu, May 13, 2021 at 11:14:30PM +, Joshi, Mukul wrote:
> Are you OK with a new MCE priority (MCE_PRIO_ACCEL) or do you want us to use
> something else?

I still don't know why a separate priority is needed. Maybe this still
needs answering:

> It is a deferred interrupt that generates an MCE.

Is that the same deferred interrupt which calls
amd_deferred_error_interrupt() ?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-14 Thread Borislav Petkov
On Fri, May 14, 2021 at 01:06:33PM +, Joshi, Mukul wrote:
> We have RAS functionality in other ASICs that is not dependent on
> CONFIG_X86_MCE_AMD. So, I don't think we would want to do that just
> for one ASIC.

Lemme try again: you said that those errors do get reported through a
deferred interrupt. Which is likely amd_deferred_error_interrupt().

If it is that interrupt and you don't have CONFIG_X86_MCE_AMD enabled,
then you won't get any errors reported and your RAS functionality will
simply sit there inactive.

So if that above is true - something to which I'm still not getting
an answer but maybe one fine day... - so if that above is true, your
RAS functionality *needs* CONFIG_X86_MCE_AMD to be enabled in order to
*actually* function.

So you *must* make your RAS functionality depend on CONFIG_X86_MCE_AMD
- otherwise no deferred interrupts and no errors reported. It is that
simple.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [5.13-rc1][bug] often hangs for no reason

2021-05-17 Thread Borislav Petkov
On Mon, May 17, 2021 at 03:27:23AM +0500, Mikhail Gavrilov wrote:
> Hi folks.
> 5.13-rc1 after 5.13-rc0 is a disaster because it hangs and hangs again
> after reboot.
> All hang's have in common is that they all happens in
> smp_call_function_many_cond function (I compared all trace [1], [2],
> [3], [4], [5])
> I do not know if this is a known problem or not, so I'm reporting here.

Looks like some splat, lockdep probably, in amdgpu when it gets loaded.
Maybe locking's botched although the beginning of the splats is missing
for whatever reason...

Adding them to Cc in case they have a better idea.

> Full logs here:
> [1] https://pastebin.com/u7SkKGDT
> [2] https://pastebin.com/fNae4dSL
> [3] https://pastebin.com/jEDMjQDy
> [4] https://pastebin.com/vEMhWGgE
> [5] https://pastebin.com/fvWx5ctR
> 

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: RIP: 0010:radeon_vm_fini+0x15/0x220 [radeon]

2022-01-17 Thread Borislav Petkov
On Mon, Jan 17, 2022 at 08:16:09AM +0100, Christian König wrote:
> Interesting to see that even that old stuff is still used.

Well, "used" is a stretch.

This is my way of testing on K8 as pretty much all the big K8 boxes to
which I had access to, got decommissioned so this baby is the only K8
real hw I have now. :-)

Lemme test your patch.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


first bad commit: [fc8c70526bd30733ea8667adb8b8ffebea30a8ed] drm/radeon: Prefer lower feedback dividers

2020-09-13 Thread Borislav Petkov
Hi,

this patch breaks X on my box - it is fully responsive and I can log in
into it from another machine but both monitors are black and show this:

"The current input timing is not supported by the monitor display. Please

change your input timing to 1920x1200@60Hz or any other monitor

listed timing as per the monitor specifications."

Reverting it fixes the issue.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[bugzilla-dae...@bugzilla.kernel.org: [Bug 211245] New: Fedora 33 amdgpu print warning at boot]

2021-01-18 Thread Borislav Petkov
Forwarding by mail because I can't find the respective AMD GPU assignee
mail on bugzilla.k.o.

- Forwarded message from bugzilla-dae...@bugzilla.kernel.org -

Date: Sun, 17 Jan 2021 21:13:06 +
From: bugzilla-dae...@bugzilla.kernel.org
To: b...@alien8.de
Subject: [Bug 211245] New: Fedora 33 amdgpu print warning at boot
Message-ID: 

https://bugzilla.kernel.org/show_bug.cgi?id=211245

Bug ID: 211245
   Summary: Fedora 33 amdgpu print warning at boot
   Product: Platform Specific/Hardware
   Version: 2.5
Kernel Version: 5.10.7-200.fc33.x86_64
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: x86-64
  Assignee: platform_x86...@kernel-bugs.osdl.org
  Reporter: aals...@gmail.com
Regression: No

Sorry for a non-descriptive bug report, but I have no idea what is going on
here, but I guess that it is better to report the error.

I get this in dmesg every time I boot

```
[   12.743879] [ cut here ]
[   12.743958] WARNING: CPU: 15 PID: 465 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_resource.c:3240
dcn20_validate_bandwidth_fp+0x8d/0xd0 [amdgpu]
[   12.743959] Modules linked in: hid_logitech_hidpp hid_logitech_dj
hid_plantronics amdgpu iommu_v2 gpu_sched ttm drm_kms_helper cec drm
crct10dif_pclmul crc32_pclmul crc32c_intel igb nvme ccp ghash_clmulni_intel
nvme_core dca xhci_pci i2c_algo_bit xhci_pci_renesas wmi pinctrl_amd fuse
[   12.743971] CPU: 15 PID: 465 Comm: kworker/15:2 Not tainted
5.10.7-200.fc33.x86_64 #1
[   12.743972] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./X570 Taichi, BIOS P3.61 11/06/2020
[   12.743985] Workqueue: events drm_mode_rmfb_work_fn [drm]
[   12.744058] RIP: 0010:dcn20_validate_bandwidth_fp+0x8d/0xd0 [amdgpu]
[   12.744071] Code: 00 7b 35 22 85 14 1f 00 00 75 2f 31 d2 f2 0f 11 85 58 26
00 00 48 89 ee 4c 89 e7 e8 9d f6 ff ff 89 c2 22 95 14 1f 00 00 75 30 <0f> 0b 48
89 9d 58 26 00 00 5b 5d 41 5c c3 75 c9 48 89 9d 58 26 00
[   12.744072] RSP: 0018:bf15c07c7c40 EFLAGS: 00010246
[   12.744073] RAX: 0001 RBX: 40794000 RCX:
07a6
[   12.744073] RDX:  RSI: 590effd4489bae10 RDI:
0002f1a0
[   12.744074] RBP: 985fd1e4 R08: 985fd25cb000 R09:
985fd037
[   12.744074] R10: 985fd25cb000 R11: 00010001 R12:
985fd037
[   12.744075] R13: 985fcd1134e0 R14: 985fcef93000 R15:
985fd1e4
[   12.744076] FS:  () GS:9866cedc()
knlGS:
[   12.744077] CS:  0010 DS:  ES:  CR0: 80050033
[   12.744077] CR2: 7f899f98c56c CR3: 00010c616000 CR4:
00350ee0
[   12.744078] Call Trace:
[   12.744154]  dcn20_validate_bandwidth+0x24/0x40 [amdgpu]
[   12.744223]  dc_validate_global_state+0x2f2/0x390 [amdgpu]
[   12.744295]  amdgpu_dm_atomic_check+0xacf/0xbd0 [amdgpu]
[   12.744308]  drm_atomic_check_only+0x55a/0x7d0 [drm]
[   12.744321]  ? drm_connector_list_iter_next+0x81/0xb0 [drm]
[   12.744330]  drm_atomic_commit+0x13/0x50 [drm]
[   12.744341]  drm_framebuffer_remove+0x388/0x3f0 [drm]
[   12.744351]  drm_mode_rmfb_work_fn+0x4f/0x60 [drm]
[   12.744354]  process_one_work+0x1b6/0x350
[   12.744356]  worker_thread+0x1e8/0x3e0
[   12.744358]  ? process_one_work+0x350/0x350
[   12.744359]  kthread+0x11b/0x140
[   12.744360]  ? __kthread_bind_mask+0x60/0x60
[   12.744362]  ret_from_fork+0x22/0x30
[   12.744364] ---[ end trace 9e0cfbae2c7a2d73 ]---
```

$ uname -r
5.10.7-200.fc33.x86_64

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

- End forwarded message -

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


amdgpu, WARNING: CPU: 12 PID: 389 at arch/x86/kernel/fpu/core.c:129 kernel_fpu_begin_mask+0xd5/0x100

2021-03-12 Thread Borislav Petkov
Hi folks,

I get the below on -rc2+tip/master. I added printks to your FPU macros:

---
diff --git a/drivers/gpu/drm/amd/display/dc/os_types.h 
b/drivers/gpu/drm/amd/display/dc/os_types.h
index 126c2f3a4dd3..49629dc03f99 100644
--- a/drivers/gpu/drm/amd/display/dc/os_types.h
+++ b/drivers/gpu/drm/amd/display/dc/os_types.h
@@ -53,8 +53,18 @@
 #if defined(CONFIG_DRM_AMD_DC_DCN)
 #if defined(CONFIG_X86)
 #include 
-#define DC_FP_START() kernel_fpu_begin()
-#define DC_FP_END() kernel_fpu_end()
+#define DC_FP_START()  \
+({ \
+   pr_emerg("%s: DC_FP_START\n", __func__);\
+   kernel_fpu_begin(); \
+})
+
+#define DC_FP_END()\
+({ \
+   pr_emerg("%s: DC_FP_END\n", __func__);  \
+   kernel_fpu_end();   \
+})
+
 #elif defined(CONFIG_PPC64)
 #include 
 #include 

and I get wrong nesting of FPU usage with amdgpu:

...
[2.480080] [drm] reserve 0x40 from 0xf41f80 for PSP TMR
[2.577011] amdgpu :06:00.0: amdgpu: RAS: optional ras ta ucode is not 
available
[2.585556] amdgpu :06:00.0: amdgpu: RAP: optional rap ta ucode is not 
available
[2.585567] amdgpu :06:00.0: amdgpu: SECUREDISPLAY: securedisplay ta 
ucode is not available
[2.586024] amdgpu :06:00.0: amdgpu: SMU is initialized successfully!
[2.587396] [drm] kiq ring mec 2 pipe 1 q 0
[2.588930] [drm] Display Core initialized with v3.2.122!
[2.601665] [drm] DMUB hardware initialized: version=0x0100
[2.620813] snd_hda_intel :06:00.1: bound :06:00.0 (ops 
amdgpu_dm_audio_component_bind_ops [amdgpu])
[2.698383] input: TPPS/2 Elan TrackPoint as 
/devices/platform/i8042/serio1/serio2/input/input15
[2.713147] [drm] VCN decode and encode initialized successfully(under DPG 
Mode).
[2.713180] [drm] JPEG decode initialized successfully.
[2.715003] kfd kfd: Allocated 3969056 bytes on gart
[2.715251] Virtual CRAT table created for GPU
[2.715412] amdgpu: Topology: Add dGPU node [0x1636:0x1002]
[2.715421] kfd kfd: added device 1002:1636
[2.715428] amdgpu :06:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 18, 
active_cu_number 27
[2.716496] [drm] fb mappable at 0x410CE
[2.716510] [drm] vram apper at 0x41000
[2.716515] [drm] size 8294400
[2.716518] [drm] fb depth is 24
[2.716522] [drm]pitch is 7680
[2.716710] fbcon: amdgpudrmfb (fb0) is primary device
[2.716922] dcn21_validate_bandwidth: DC_FP_START
[2.716969] patch_bounding_box: DC_FP_START

This should not happen. You need DC_FP_END before the next DC_FP_START
because FPU usage cannot nest.

But who knows, maybe this is fixed already...

[2.716973] [ cut here ]
[2.716974] WARNING: CPU: 12 PID: 389 at arch/x86/kernel/fpu/core.c:129 
kernel_fpu_begin_mask+0xd5/0x100
[2.716986] Modules linked in: joydev edac_mce_amd edac_core iwlmvm kvm_amd 
mac80211 libarc4 kvm irqbypass crct10dif_pclmul crc32_pclmul iwlwifi 
crc32c_intel snd_hda_codec_realtek snd_hda_codec_generic amdgpu(+) 
ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg 
snd_hda_codec rtsx_pci_sdmmc snd_hwdep mmc_core snd_hda_core aesni_intel libaes 
crypto_simd wmi_bmof thinkpad_acpi sp5100_tco snd_pcm cryptd nvram ucsi_acpi(+) 
ledtrig_audio watchdog rapl rtsx_pci platform_profile snd_timer pcspkr cfg80211 
efi_pstore typec_ucsi k10temp ccp i2c_piix4 gpu_sched mfd_core r8169 roles snd 
typec wmi soundcore ac battery video i2c_scmi acpi_cpufreq button psmouse 
serio_raw nvme nvme_core
[2.717057] CPU: 12 PID: 389 Comm: systemd-udevd Not tainted 5.12.0-rc2+ #1
[2.717062] Hardware name: LENOVO 20Y2CC/20Y2CC, BIOS R1BET58W(1.27 
) 10/20/2020
[2.717065] RIP: 0010:kernel_fpu_begin_mask+0xd5/0x100
[2.717070] Code: 40 75 af f0 80 4f 01 40 48 81 c7 c0 0b 00 00 e8 d1 fb ff 
ff eb 9c db e3 eb c7 0f 0b 66 0f 1f 84 00 00 00 00 00 e9 62 ff ff ff <0f> 0b 66 
0f 1f 84 00 00 00 00 00 e9 61 ff ff ff 66 66 2e 0f 1f 84
[2.717073] RSP: 0018:c90001427528 EFLAGS: 00010202
[2.717076] RAX: 0001 RBX: 0002 RCX: 
[2.717078] RDX:  RSI: 0082 RDI: 0001
[2.717080] RBP: 8881301e R08: 82a246a0 R09: 5f4344203a786f62
[2.717082] R10: 625f676e69646e75 R11: 6f625f6863746170 R12: 888181783000
[2.717083] R13:  R14: c900014275cc R15: a10c87c0
[2.717085] FS:  7f53f5f1a8c0() GS:8883fb10() 
knlGS:
[2.717088] CS:  0010 DS:  ES:  CR0: 80050033
[2.717091] CR2: 5572e3f6fc90 CR3: 0001026ce000 CR4: 00350ee0
[2.717093] Call Trace:
[2.717100]  dcn21_calculate_wm.cold+0x27/0x3a8 [amdgpu]
[2.717560]  dcn21_valida

Re: amdgpu, WARNING: CPU: 12 PID: 389 at arch/x86/kernel/fpu/core.c:129 kernel_fpu_begin_mask+0xd5/0x100

2021-03-12 Thread Borislav Petkov
On Fri, Mar 12, 2021 at 06:20:25PM +, Deucher, Alexander wrote:
> Should be fixed with these patches:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=15e8b95d5f7509e0b09289be8c422c459c9f0412
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=680174cfd1e1cea70a8f30ccb44d8fbdf996018e

Looks like it. Lemme know if I should run them explicitly now or
alternatively, I'll wait until they land in -rc3. They'll get tested
eventually as that machine is a test box...

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


8353d30e747f ("drm/amd/display: disable stream if pixel clock changed with link active")

2020-12-11 Thread Borislav Petkov
Hi,

patch in $Subject breaks booting on a laptop here, GPU details are
below. The machine stops booting right when it attempts to switch modes
during boot, to a higher mode than the default VGA one. Machine doesn't
ping and is otherwise unresponsive so that a hard reset is the only
thing that helps.

Reverting that patch ontop of -rc7 fixes it and the machine boots just fine.

Thx.

[1.628086] ata1.00: supports DRM functions and may not be fully accessible
[1.632050] ata1.00: supports DRM functions and may not be fully accessible
[1.895818] [drm] amdgpu kernel modesetting enabled.
[1.897628] [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 
0x103C:0x807E 0xC4).
[1.898256] [drm] register mmio base: 0xD0C0
[1.898422] [drm] register mmio size: 262144
[1.898583] [drm] add ip block number 0 
[1.898759] [drm] add ip block number 1 
[1.898931] [drm] add ip block number 2 
[1.899082] [drm] add ip block number 3 
[1.899241] [drm] add ip block number 4 
[1.899439] [drm] add ip block number 5 
[1.899573] [drm] add ip block number 6 
[1.899693] [drm] add ip block number 7 
[1.899827] [drm] add ip block number 8 
[1.911458] [drm] BIOS signature incorrect 5b 7
[1.912551] [drm] UVD is enabled in physical mode
[1.912707] [drm] VCE enabled in physical mode
[1.912921] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment 
size is 9-bit
[1.913837] [drm] Detected VRAM RAM=512M, BAR=512M
[1.913998] [drm] RAM width 128bits UNKNOWN
[1.915149] [drm] amdgpu: 512M of VRAM memory ready
[1.915306] [drm] amdgpu: 3072M of GTT memory ready.
[1.915468] [drm] GART: num cpu pages 262144, num gpu pages 262144
[1.916139] [drm] PCIE GART of 1024M enabled (table at 0x00F40090).
[1.918733] [drm] Found UVD firmware Version: 1.91 Family ID: 11
[1.918950] [drm] UVD ENC is disabled
[1.919680] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[1.925963] [drm] DM_PPLIB: values for Engine clock
[1.926106] [drm] DM_PPLIB:   30
[1.926205] [drm] DM_PPLIB:   36
[1.926304] [drm] DM_PPLIB:   423530
[1.926404] [drm] DM_PPLIB:   514290
[1.926516] [drm] DM_PPLIB:   626090
[1.926629] [drm] DM_PPLIB:   72
[1.926743] [drm] DM_PPLIB: Validation clocks:
[1.926952] [drm] DM_PPLIB:engine_max_clock: 72000
[1.927117] [drm] DM_PPLIB:memory_max_clock: 8
[1.927281] [drm] DM_PPLIB:level   : 8
[1.927435] [drm] DM_PPLIB: values for Display clock
[1.927594] [drm] DM_PPLIB:   30
[1.927708] [drm] DM_PPLIB:   40
[1.927822] [drm] DM_PPLIB:   496560
[1.927936] [drm] DM_PPLIB:   626090
[1.928048] [drm] DM_PPLIB:   685720
[1.928161] [drm] DM_PPLIB:   757900
[1.928275] [drm] DM_PPLIB: Validation clocks:
[1.928419] [drm] DM_PPLIB:engine_max_clock: 72000
[1.928584] [drm] DM_PPLIB:memory_max_clock: 8
[1.928748] [drm] DM_PPLIB:level   : 8
[1.928901] [drm] DM_PPLIB: values for Memory clock
[1.929058] [drm] DM_PPLIB:   333000
[1.929172] [drm] DM_PPLIB:   80
[1.929403] [drm] DM_PPLIB: Validation clocks:
[1.929549] [drm] DM_PPLIB:engine_max_clock: 72000
[1.929716] [drm] DM_PPLIB:memory_max_clock: 8
[1.929919] [drm] DM_PPLIB:level   : 8
[1.930148] [drm] Display Core initialized with v3.2.104!
[2.003938] [drm] UVD initialized successfully.
[2.204023] [drm] VCE initialized successfully.
[2.206228] [drm] fb mappable at 0xA0EE4000
[2.206375] [drm] vram apper at 0xA000
[2.206514] [drm] size 14745600
[2.206654] [drm] fb depth is 24
[2.206760] [drm]pitch is 10240
[2.207123] fbcon: amdgpudrmfb (fb0) is primary device
[2.301263] amdgpu :00:01.0: [drm] fb0: amdgpudrmfb frame buffer device
[2.320735] [drm] Initialized amdgpu 3.40.0 20150101 for :00:01.0 on 
minor 0


-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] Revert "drm/amd/display: disable stream if pixel clock changed with link active"

2020-12-14 Thread Borislav Petkov
On Mon, Dec 14, 2020 at 04:53:39PM -0500, Alex Deucher wrote:
> This reverts commit 8353d30e747f4e5cdd867c6b054dbb85cdcc76a9.
> 
> This causes a hang on a carrizo based laptop.  Revert until we can fix
> it properly.
> 
> Cc: Borislav Petkov 

Reported-by: me

> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/display/dc/core/dc.c | 1 +
>  1 file changed, 1 insertion(+)

Lemme know if I should test fixes.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: 8353d30e747f ("drm/amd/display: disable stream if pixel clock changed with link active")

2020-12-15 Thread Borislav Petkov
On Tue, Dec 15, 2020 at 10:47:03AM -0500, Rodrigo Siqueira wrote:
> Hi Boris,
> 
> Could you check if your branch has this commit:
> 
>  drm/amd/display: Fix module load hangs when connected to an eDP
> 
> If so, could you try this patch:
> 
>  https://patchwork.freedesktop.org/series/84965/

So I did a bisection between

git bisect start
# bad: [3650b228f83adda7e5ee532e2b90429c03f7b9ec] Linux 5.10-rc1
git bisect bad 3650b228f83adda7e5ee532e2b90429c03f7b9ec
# good: [bbf5c979011a099af5dc76498918ed7df445635b] Linux 5.9

and the patch in $Subject came in in 5.10-rc1.

I can test any tree you want me to so just tell me on which tree to
apply this patch and I'll run it.

In any case, it doesn't apply on v5.10 though:

$ test-apply.sh /tmp/rodrigo.siqueira 
checking file drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c
Hunk #1 FAILED at 120

You can push a tree of yours somewhere which I can try directly,
alternatively.

Lemme know.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: 8353d30e747f ("drm/amd/display: disable stream if pixel clock changed with link active")

2020-12-15 Thread Borislav Petkov
On Tue, Dec 15, 2020 at 12:04:23PM -0500, Alex Deucher wrote:
> That patch trivially backports to 5.10.  See attached backported
> patch.  @Borislav Petkov does the attached patch fix 5.10 for you?

Yes, thanks.

Reported-and-tested-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: 8353d30e747f ("drm/amd/display: disable stream if pixel clock changed with link active")

2020-12-15 Thread Borislav Petkov
On Tue, Dec 15, 2020 at 02:00:58PM -0500, Rodrigo Siqueira wrote:
> Thanks for reporting this issue and test the fix.

It was my pleasure. Thanks for the quick fix!

:-)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/radeon: stop re-init the TTM page pool

2021-01-11 Thread Borislav Petkov
On Tue, Jan 05, 2021 at 07:23:08PM +0100, Christian König wrote:
> Drivers are not supposed to init the page pool directly any more.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/radeon/radeon_ttm.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
> b/drivers/gpu/drm/radeon/radeon_ttm.c
> index d4328ff57757..35b715f82ed8 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -729,9 +729,6 @@ int radeon_ttm_init(struct radeon_device *rdev)
>   }
>   rdev->mman.initialized = true;
>  
> - ttm_pool_init(&rdev->mman.bdev.pool, rdev->dev, rdev->need_swiotlb,
> -   dma_addressing_limited(&rdev->pdev->dev));
> -
>   r = radeon_ttm_init_vram(rdev);
>   if (r) {
>   DRM_ERROR("Failed initializing VRAM heap.\n");
> -- 

Was finally able to test those during workstation hw maintenance so I
was able to install a new kernel and reboot.

Reported-by: Borislav Petkov 
Tested-by: Borislav Petkov 

Thanks for the fixes!

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


WARNING: CPU: 5 PID: 1464 at drivers/gpu/drm/ttm/ttm_bo.c:326 ttm_bo_release+0x27e/0x2d0 [ttm]

2023-06-05 Thread Borislav Petkov
Hi,

this below triggers with the latest Linus tree:

51f269a6ecc7 ("Merge tag 'probes-fixes-6.4-rc4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace")

...
[   16.173593] [drm] radeon kernel modesetting enabled.
[   16.173743] radeon :29:00.0: vgaarb: deactivate vga console
[   16.174300] MCE: In-kernel MCE decoding enabled.
[   16.175695] EDAC DEBUG: umc_read_base_mask:   DCSB0[0]=0x0001 reg: 
0x5
[   16.175698] EDAC DEBUG: umc_read_base_mask: DCSB_SEC0[0]=0x reg: 
0x50010
[   16.175700] EDAC DEBUG: umc_read_base_mask:   DCSB0[1]=0x reg: 
0x50004
[   16.175702] EDAC DEBUG: umc_read_base_mask: DCSB_SEC0[1]=0x reg: 
0x50014
[   16.175703] EDAC DEBUG: umc_read_base_mask:   DCSB0[2]=0x0201 reg: 
0x50008
[   16.175705] EDAC DEBUG: umc_read_base_mask: DCSB_SEC0[2]=0x reg: 
0x50018
[   16.175706] EDAC DEBUG: umc_read_base_mask:   DCSB0[3]=0x reg: 
0x5000c
[   16.175707] EDAC DEBUG: umc_read_base_mask: DCSB_SEC0[3]=0x reg: 
0x5001c
[   16.175709] EDAC DEBUG: umc_read_base_mask:   DCSM0[0]=0x03fffdfe reg: 
0x50020
[   16.175710] EDAC DEBUG: umc_read_base_mask: DCSM_SEC0[0]=0x reg: 
0x50028
[   16.175712] EDAC DEBUG: umc_read_base_mask:   DCSM0[1]=0x03fffdfe reg: 
0x50024
[   16.175713] EDAC DEBUG: umc_read_base_mask: DCSM_SEC0[1]=0x reg: 
0x5002c
[   16.175715] EDAC DEBUG: umc_read_base_mask:   DCSB1[0]=0x0001 reg: 
0x15
[   16.175716] EDAC DEBUG: umc_read_base_mask: DCSB_SEC1[0]=0x reg: 
0x150010
[   16.175718] EDAC DEBUG: umc_read_base_mask:   DCSB1[1]=0x reg: 
0x150004
[   16.175719] EDAC DEBUG: umc_read_base_mask: DCSB_SEC1[1]=0x reg: 
0x150014
[   16.175720] EDAC DEBUG: umc_read_base_mask:   DCSB1[2]=0x0201 reg: 
0x150008
[   16.175722] EDAC DEBUG: umc_read_base_mask: DCSB_SEC1[2]=0x reg: 
0x150018
[   16.175723] EDAC DEBUG: umc_read_base_mask:   DCSB1[3]=0x reg: 
0x15000c
[   16.175725] EDAC DEBUG: umc_read_base_mask: DCSB_SEC1[3]=0x reg: 
0x15001c
[   16.175726] EDAC DEBUG: umc_read_base_mask:   DCSM1[0]=0x03fffdfe reg: 
0x150020
[   16.175728] EDAC DEBUG: umc_read_base_mask: DCSM_SEC1[0]=0x reg: 
0x150028
[   16.175729] EDAC DEBUG: umc_read_base_mask:   DCSM1[1]=0x03fffdfe reg: 
0x150024
[   16.175730] EDAC DEBUG: umc_read_base_mask: DCSM_SEC1[1]=0x reg: 
0x15002c
[   16.175741] EDAC DEBUG: umc_determine_memory_type:   UMC0 DIMM type: 
Unbuffered-DDR4
[   16.175742] EDAC DEBUG: umc_determine_memory_type:   UMC1 DIMM type: 
Unbuffered-DDR4
[   16.177514] Console: switching to colour dummy device 80x25
[   16.177693] [drm] initializing kernel modesetting (CEDAR 0x1002:0x68E1 
0x174B:0x3000 0x00).
[   16.177733] ATOM BIOS: AMD
[   16.177795] radeon :29:00.0: VRAM: 1024M 0x - 
0x3FFF (1024M used)
[   16.177798] radeon :29:00.0: GTT: 1024M 0x4000 - 
0x7FFF
[   16.177800] [drm] Detected VRAM RAM=1024M, BAR=256M
[   16.177802] [drm] RAM width 64bits DDR
[   16.177835] [drm] radeon: 1024M of VRAM memory ready
[   16.177836] [drm] radeon: 1024M of GTT memory ready.
[   16.177839] [drm] Loading CEDAR Microcode
[   16.179106] [drm] Internal thermal controller without fan control
[   16.199812] [drm] radeon: dpm initialized
[   16.200179] [drm] GART: num cpu pages 262144, num gpu pages 262144
[   16.200399] [drm] enabling PCIE gen 2 link speeds, disable with 
radeon.pcie_gen2=0
[   16.218135] [drm] PCIE GART of 1024M enabled (table at 0x0014C000).
[   16.218239] radeon :29:00.0: WB enabled
[   16.218240] radeon :29:00.0: fence driver on ring 0 use gpu addr 
0x4c00
[   16.218242] radeon :29:00.0: fence driver on ring 3 use gpu addr 
0x4c0c
[   16.218606] radeon :29:00.0: fence driver on ring 5 use gpu addr 
0x0005c418
[   16.218657] radeon :29:00.0: radeon: MSI limited to 32-bit
[   16.218689] radeon :29:00.0: radeon: using MSI.
[   16.218707] [drm] radeon: irq initialized.
[   16.234730] [drm] ring test on 0 succeeded in 0 usecs
[   16.234738] [drm] ring test on 3 succeeded in 2 usecs
[   16.317725] r8169 :25:00.0 eth0: Link is Down
[   16.410486] [drm] ring test on 5 succeeded in 1 usecs
[   16.410492] [drm] UVD initialized successfully.
[   16.410555] [drm] ib test on ring 0 succeeded in 0 usecs
[   16.410596] [drm] ib test on ring 3 succeeded in 0 usecs
[   17.077422] [drm] ib test on ring 5 succeeded
[   17.077581] [drm] Radeon Display Connectors
[   17.077584] [drm] Connector 0:
[   17.077585] [drm]   HDMI-A-1
[   17.077586] [drm]   HPD4
[   17.077588] [drm]   DDC: 0x6440 0x6440 0x6444 0x6444 0x6448 0x6448 0x644c 
0x644c
[   17.077590] [drm]   Encoders:
[   17.077591] [drm] DFP1: INTERNAL_UNIPHY1
[   17.077593] [drm] Connector 1:
[   17.077594] [drm]   DVI-I-1
[   17.077595] [drm]   HPD1
[   17.077597] [drm]   DDC: 0x6460 0x6460 0x6464 0x6464 0x6468 0x6468 0x646c

Re: [PATCH] drm/radeon: Disable outputs when releasing fbdev client

2023-06-09 Thread Borislav Petkov
On Fri, Jun 09, 2023 at 04:03:56PM +0200, Thomas Zimmermann wrote:
> Disable the modesetting pipeline before release the radeon's fbdev
> client. Fixes the following error:
> 
> [   17.217408] WARNING: CPU: 5 PID: 1464 at drivers/gpu/drm/ttm/ttm_bo.c:326 
> ttm_bo_release+0x27e/0x2d0 [ttm]
> [   17.217418] Modules linked in: edac_mce_amd radeon(+) drm_ttm_helper ttm 
> video drm_suballoc_helper drm_display_helper kvm irqbypass drm_kms_helper 
> syscopyarea crc32_pclmul sysfillrect sha512_ssse3 sysimgblt sha512_generic 
> cfbfillrect cfbimgblt wmi_bmof aesni_intel cfbcopyarea crypto_simd cryptd 
> k10temp acpi_cpufreq wmi dm_mod
> [   17.217432] CPU: 5 PID: 1464 Comm: systemd-udevd Not tainted 6.4.0-rc4+ #1
> [   17.217436] Hardware name: Micro-Star International Co., Ltd. 
> MS-7A38/B450M PRO-VDH MAX (MS-7A38), BIOS B.G0 07/26/2022
> [   17.217438] RIP: 0010:ttm_bo_release+0x27e/0x2d0 [ttm]
> [   17.217444] Code: 48 89 43 38 48 89 43 40 48 8b 5c 24 30 48 8b b5 40 08 00 
> 00 48 8b 6c 24 38 48 83 c4 58 e9 7a 49 f7 e0 48 89 ef e9 6c fe ff ff <0f> 0b 
> 48 83 7b 20 00 0f 84 b7 fd ff ff 0f 0b 0f 1f 00 e9 ad fd ff
> [   17.217448] RSP: 0018:c995fbb0 EFLAGS: 00010202
> [   17.217451] RAX: 0001 RBX: 8881052c8de0 RCX: 
> 
> [   17.217453] RDX: 0001 RSI:  RDI: 
> 8881052c8de0
> [   17.217455] RBP: 888104a66e00 R08: 8881052c8de0 R09: 
> 888104a7cf08
> [   17.217457] R10: c995fbe0 R11: c995fbe8 R12: 
> 8881052c8c78
> [   17.217458] R13: 8881052c8c78 R14: dead0100 R15: 
> 88810528b108
> [   17.217460] FS:  7f319fcbb8c0() GS:1a54() 
> knlGS:
> [   17.217463] CS:  0010 DS:  ES:  CR0: 80050033
> [   17.217464] CR2: 55dc8b0224a0 CR3: 00010373d000 CR4: 
> 00750ee0
> [   17.217466] PKRU: 5554
> [   17.217468] Call Trace:
> [   17.217470]  
> [   17.217472]  ? __warn+0x97/0x160
> [   17.217476]  ? ttm_bo_release+0x27e/0x2d0 [ttm]
> [   17.217481]  ? report_bug+0x1ec/0x200
> [   17.217487]  ? handle_bug+0x3c/0x70
> [   17.217490]  ? exc_invalid_op+0x1f/0x90
> [   17.217493]  ? preempt_count_sub+0xb5/0x100
> [   17.217496]  ? asm_exc_invalid_op+0x16/0x20
> [   17.217500]  ? ttm_bo_release+0x27e/0x2d0 [ttm]
> [   17.217505]  ? ttm_resource_move_to_lru_tail+0x1ab/0x1d0 [ttm]
> [   17.217511]  radeon_bo_unref+0x1a/0x30 [radeon]
> [   17.217547]  radeon_gem_object_free+0x20/0x30 [radeon]
> [   17.217579]  radeon_fbdev_fb_destroy+0x57/0x90 [radeon]
> [   17.217616]  unregister_framebuffer+0x72/0x110
> [   17.217620]  drm_client_dev_unregister+0x6d/0xe0
> [   17.217623]  drm_dev_unregister+0x2e/0x90
> [   17.217626]  drm_put_dev+0x26/0x90
> [   17.217628]  pci_device_remove+0x44/0xc0
> [   17.217631]  really_probe+0x257/0x340
> [   17.217635]  __driver_probe_device+0x73/0x120
> [   17.217638]  driver_probe_device+0x2c/0xb0
> [   17.217641]  __driver_attach+0xa0/0x150
> [   17.217643]  ? __pfx___driver_attach+0x10/0x10
> [   17.217646]  bus_for_each_dev+0x67/0xa0
> [   17.217649]  bus_add_driver+0x10e/0x210
> [   17.217651]  driver_register+0x5c/0x120
> [   17.217653]  ? __pfx_radeon_module_init+0x10/0x10 [radeon]
> [   17.217681]  do_one_initcall+0x44/0x220
> [   17.217684]  ? kmalloc_trace+0x37/0xc0
> [   17.217688]  do_init_module+0x64/0x240
> [   17.217691]  __do_sys_finit_module+0xb2/0x100
> [   17.217694]  do_syscall_64+0x3b/0x90
> [   17.217697]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [   17.217700] RIP: 0033:0x7f319feaa5a9
> [   17.217702] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 
> f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 
> 01 f0 ff ff 73 01 c3 48 8b 0d 27 08 0d 00 f7 d8 64 89 01 48
> [   17.217706] RSP: 002b:7ffc6bf3e7f8 EFLAGS: 0246 ORIG_RAX: 
> 0139
> [   17.217709] RAX: ffda RBX: 5607204f3170 RCX: 
> 7f319feaa5a9
> [   17.217710] RDX:  RSI: 7f31a002eefd RDI: 
> 0018
> [   17.217712] RBP: 7f31a002eefd R08:  R09: 
> 5607204f1860
> [   17.217714] R10: 0018 R11: 0246 R12: 
> 0002
> [   17.217716] R13:  R14: 560720522450 R15: 
> 560720255899
> [   17.217718]  
> [   17.217719] ---[ end trace  ]---
> 
> The buffer object backing the fbdev emulation got pinned twice: by the
> fb_probe helper radeon_fbdev_create_pinned_object() and the modesetting
> code when the framebuffer got displayed. It only got unpinned once by
> the fbdev helper radeon_fbdev_destroy_pinned_object(). Hence TTM's BO-
> release function c

amdgpu refcount saturation

2022-12-18 Thread Borislav Petkov
Hi folks,

this is with Linus' tree from Wed:

041fae9c105a ("Merge tag 'f2fs-for-6.2-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs")

on a CZ laptop:

[7.782901] [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 
0x103C:0x807E 0xC4)

The splat is kinda messy:

---

[7.755306] [drm] amdgpu kernel modesetting enabled.
[7.779110] amdgpu :00:01.0: vgaarb: deactivate vga console
[7.780417] Console: switching to colour dummy device 80x25
[7.782901] [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 
0x103C:0x807E 0xC4).
[7.783244] [drm] register mmio base: 0xD0C0
[7.783405] [drm] register mmio size: 262144
[7.784182] [drm] add ip block number 0 
[7.784375] [drm] add ip block number 1 
[7.784555] [drm] add ip block number 2 
[7.784717] [drm] add ip block number 3 
[7.784925] [drm] add ip block number 4 
[7.785094] [drm] add ip block number 5 
[7.785264] [drm] add ip block number 6 
[7.785413] [drm] add ip block number 7 
[7.785580] [drm] add ip block number 8 
[7.800919] [drm] BIOS signature incorrect 5b 7
[7.801095] resource sanity check: requesting [mem 0x000c-0x000d], 
which spans more than PCI Bus :00 [mem 0x000c-0x000cbfff window]
[7.801544] caller pci_map_rom+0x68/0x1c0 mapping multiple BARs
[7.801838] amdgpu :00:01.0: amdgpu: Fetched VBIOS from ROM BAR
[7.802067] amdgpu: ATOM BIOS: SWBRT27354.001
[7.802272] [drm] UVD is enabled in physical mode
[7.802438] [drm] VCE enabled in physical mode
[7.802592] amdgpu :00:01.0: amdgpu: Trusted Memory Zone (TMZ) feature 
not supported
[7.803100] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment 
size is 9-bit
[7.803387] amdgpu :00:01.0: amdgpu: VRAM: 512M 0x00F4 - 
0x00F41FFF (512M used)
[7.803708] amdgpu :00:01.0: amdgpu: GART: 1024M 0x00FF - 
0x00FF3FFF
[7.804007] [drm] Detected VRAM RAM=512M, BAR=512M
[7.804174] [drm] RAM width 128bits UNKNOWN
[7.804703] [drm] amdgpu: 512M of VRAM memory ready
[7.804882] [drm] amdgpu: 7638M of GTT memory ready.
[7.805164] [drm] GART: num cpu pages 262144, num gpu pages 262144
[7.805484] [drm] PCIE GART of 1024M enabled (table at 0x00F400A0).
[7.808418] amdgpu: hwmgr_sw_init smu backed is smu8_smu
[7.809070] [drm] Found UVD firmware Version: 1.91 Family ID: 11
[7.809413] [drm] UVD ENC is disabled
[7.810321] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[7.812036] amdgpu: smu version 18.62.00
[7.818378] [drm] DM_PPLIB: values for Engine clock
[7.818566] [drm] DM_PPLIB:   30
[7.818689] [drm] DM_PPLIB:   36
[7.818811] [drm] DM_PPLIB:   423530
[7.818934] [drm] DM_PPLIB:   514290
[7.819056] [drm] DM_PPLIB:   626090
[7.819179] [drm] DM_PPLIB:   72
[7.819302] [drm] DM_PPLIB: Validation clocks:
[7.819456] [drm] DM_PPLIB:engine_max_clock: 72000
[7.819633] [drm] DM_PPLIB:memory_max_clock: 8
[7.819810] [drm] DM_PPLIB:level   : 8
[7.819977] [drm] DM_PPLIB: values for Display clock
[7.820148] [drm] DM_PPLIB:   30
[7.820271] [drm] DM_PPLIB:   40
[7.820394] [drm] DM_PPLIB:   496560
[7.820563] [drm] DM_PPLIB:   626090
[7.820694] [drm] DM_PPLIB:   685720
[7.820857] [drm] DM_PPLIB:   757900
[7.820979] [drm] DM_PPLIB: Validation clocks:
[7.821133] [drm] DM_PPLIB:engine_max_clock: 72000
[7.821310] [drm] DM_PPLIB:memory_max_clock: 8
[7.821487] [drm] DM_PPLIB:level   : 8
[7.821653] [drm] DM_PPLIB: values for Memory clock
[7.821821] [drm] DM_PPLIB:   333000
[7.821944] [drm] DM_PPLIB:   80
[7.822066] [drm] DM_PPLIB: Validation clocks:
[7.80] [drm] DM_PPLIB:engine_max_clock: 72000
[7.822397] [drm] DM_PPLIB:memory_max_clock: 8
[7.822574] [drm] DM_PPLIB:level   : 8
[7.823044] [drm] Display Core initialized with v3.2.215!
[7.903994] [drm] UVD initialized successfully.
[8.103416] [drm] VCE initialized successfully.
[8.104616] amdgpu :00:01.0: amdgpu: SE 1, SH per SE 1, CU per SH 8, 
active_cu_number 8
[8.109430] [drm] Initialized amdgpu 3.49.0 20150101 for :00:01.0 on 
minor 0
[8.120099] fbcon: amdgpudrmfb (fb0) is primary device
[8.886332] Console: switching to colour frame buffer device 320x90
[8.902118] amdgpu :00:01.0: [drm] fb0: amdgpudrmfb frame buffer device
[8.967565] process '/usr/bin/fstype' started with executable stack
[8.979419] PM: Image not found (code -22)
[9.043724] EXT4-fs (sda2): mounted filesystem 
c34989f9-7c8f-49ae-8285-7896af84c685 with ordered data mode. Quota mode: 
disabled.
[9.540346] systemd-udevd[1404]: /etc/udev/rules.d/storage_devices.rules:1 
Invalid value for OPTIONS key, ignoring: 'all_partitions'
[9.766687] input: Power Button as 
/devices/LNXSYSTM:00/L

  1   2   >