radeon 0000:1d:00.0: GPU lockup (current fence id 0x00000000010da43f last fence id 0x00000000010da52d on ring 0)

2018-06-06 Thread Borislav Petkov
Hi guys, X just froze here on top of 4.17-rc7+ tip/master (kernel is from last week) with the splat at the end. Box is an X470 chipset with Ryzen 2700X. GPU gets detected as [7.440971] [drm] radeon kernel modesetting enabled. [7.441220] [drm] initializing kernel modesetting (RV635

Re: radeon 0000:1d:00.0: GPU lockup (current fence id 0x00000000010da43f last fence id 0x00000000010da52d on ring 0)

2018-06-07 Thread Borislav Petkov
On Wed, Jun 06, 2018 at 10:26:15AM +0200, Christian König wrote: > Well what did you do to trigger the lockup? Looks like an application send > something to the hardware to crash the GFX block. So what I observed was (in that order): machine was building a kernel so was busy, X didn't respond for

Re: radeon 0000:1d:00.0: GPU lockup (current fence id 0x00000000017a66bf last fence id 0x00000000017a67a1 on ring 0)

2019-01-15 Thread Borislav Petkov
On Sat, Jan 12, 2019 at 09:50:51PM +0100, Borislav Petkov wrote: > Hi guys, > > my odyssey with the GPU continues. This time it didn't reset itself > but started spewing a single line about the hardware locking up. > > The machine was responsive to sysrq so I was able to wr

radeon 0000:1d:00.0: GPU lockup (current fence id 0x00000000017a66bf last fence id 0x00000000017a67a1 on ring 0)

2019-01-13 Thread Borislav Petkov
Hi guys, my odyssey with the GPU continues. This time it didn't reset itself but started spewing a single line about the hardware locking up. The machine was responsive to sysrq so I was able to write out /var/log/messages and reboot. This is still with 4.20-rc7 but I'm building 5.0-rc1 to see
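
For anyone decoding the lockup line in the subject: as far as I know, radeon prints the last fence sequence number the ring signalled ("current") and the last one emitted ("last"), so the difference is roughly the number of fences still outstanding when the lockup was detected. A minimal userspace sketch under that assumption; outstanding_fences() is a made-up helper for illustration, not a radeon function:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the radeon driver): pull the two
 * sequence numbers out of a "GPU lockup" log line and return
 * last - current. Returns -1 if the line does not have the expected
 * shape or the numbers are not ordered as assumed.
 */
static int64_t outstanding_fences(const char *line)
{
	unsigned long long cur, last;
	const char *p;

	assert(line != NULL);

	p = strstr(line, "current fence id ");
	if (!p)
		return -1;

	/* %llx accepts the leading "0x" of the logged values. */
	if (sscanf(p, "current fence id %llx last fence id %llx",
		   &cur, &last) != 2)
		return -1;

	if (last < cur)
		return -1;

	return (int64_t)(last - cur);
}
```

Fed the two subject lines from this listing, it gives 238 for the 2018 report (0x10da52d - 0x10da43f) and 226 for the 2019 one (0x17a67a1 - 0x17a66bf).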

[bugzilla-dae...@bugzilla.kernel.org: [Bug 202493] New: Soft lockup ryzen]

2019-02-02 Thread Borislav Petkov
FYI: First splat triggers the REG_WAIT timeout warning: [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:944 WARNING: CPU: 14 PID: 1613 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:249 generic_reg_wait+0xdc/0x140
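
The "10us * 3000 tries" in that error reads as a polling budget: 3000 polls spaced 10 us apart, so about 30 ms before the wait gives up. A userspace sketch of that poll-with-budget pattern; reg_wait() and the callbacks are illustrative names, not the amdgpu generic_reg_wait() signature, and the delay is only accounted for here, not actually slept:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Poll until (reg & mask) == expected or the try budget is used up.
 * Returns the number of failed polls before success, or -1 on timeout.
 * *waited_us (if non-NULL) receives the accumulated delay.
 */
static int reg_wait(read_reg_fn read_reg, void *ctx,
		    uint32_t mask, uint32_t expected,
		    unsigned int delay_us, unsigned int max_tries,
		    uint64_t *waited_us)
{
	uint64_t elapsed = 0;
	unsigned int i;

	assert(read_reg != NULL);

	for (i = 0; i < max_tries; i++) {
		if ((read_reg(ctx) & mask) == expected) {
			if (waited_us)
				*waited_us = elapsed;
			return (int)i;
		}
		/* A real driver would udelay(delay_us) here. */
		elapsed += delay_us;
	}

	if (waited_us)
		*waited_us = elapsed;
	return -1;
}

/* Mock register that never reaches the expected value. */
static uint32_t stuck_reg(void *ctx)
{
	(void)ctx;
	return 0;
}

/* Mock register that reads 0 five times, then 1. */
static uint32_t counting_reg(void *ctx)
{
	unsigned int *reads = ctx;

	return ++(*reads) > 5 ? 1 : 0;
}
```

With the stuck mock and the logged parameters (10, 3000) the wait times out after 30000 us of accumulated delay, which matches the 30 ms reading of the message.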

Re: 1c74ca7a1a9a ("drm/fb-helper: call vga_remove_vgacon automatically.")

2019-08-09 Thread Borislav Petkov
On Fri, Aug 09, 2019 at 10:54:41AM +0200, Gerd Hoffmann wrote: > A bit later: > >[8.198138] radeon 0000:00:01.0: Direct firmware load for > radeon/PALM_pfp.bin failed with error -2 >[8.198351] r600_cp: Failed to load firmware "radeon/PALM_pfp.bin" >[8.198512]

first bad commit: [fc8c70526bd30733ea8667adb8b8ffebea30a8ed] drm/radeon: Prefer lower feedback dividers

2020-09-13 Thread Borislav Petkov
Hi, this patch breaks X on my box - it is fully responsive and I can log into it from another machine but both monitors are black and show this: "The current input timing is not supported by the monitor display. Please change your input timing to 1920x1200@60Hz or any other monitor listed

8353d30e747f ("drm/amd/display: disable stream if pixel clock changed with link active")

2020-12-11 Thread Borislav Petkov
Hi, patch in $Subject breaks booting on a laptop here, GPU details are below. The machine stops booting right when it attempts to switch modes during boot, to a higher mode than the default VGA one. Machine doesn't ping and is otherwise unresponsive so that a hard reset is the only thing that

Re: 8353d30e747f ("drm/amd/display: disable stream if pixel clock changed with link active")

2020-12-15 Thread Borislav Petkov
On Tue, Dec 15, 2020 at 10:47:03AM -0500, Rodrigo Siqueira wrote: > Hi Boris, > > Could you check if your branch has this commit: > > drm/amd/display: Fix module load hangs when connected to an eDP > > If so, could you try this patch: > > https://patchwork.freedesktop.org/series/84965/ So I

Re: 8353d30e747f ("drm/amd/display: disable stream if pixel clock changed with link active")

2020-12-15 Thread Borislav Petkov
On Tue, Dec 15, 2020 at 02:00:58PM -0500, Rodrigo Siqueira wrote: > Thanks for reporting this issue and test the fix. It was my pleasure. Thanks for the quick fix! :-) -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Re: 8353d30e747f ("drm/amd/display: disable stream if pixel clock changed with link active")

2020-12-15 Thread Borislav Petkov
On Tue, Dec 15, 2020 at 12:04:23PM -0500, Alex Deucher wrote: > That patch trivially backports to 5.10. See attached backported > patch. @Borislav Petkov does the attached patch fix 5.10 for you? Yes, thanks. Reported-and-tested-by: Borislav Petkov -- Regards/Gruss, Boris.

Re: [PATCH] Revert "drm/amd/display: disable stream if pixel clock changed with link active"

2020-12-14 Thread Borislav Petkov
On Mon, Dec 14, 2020 at 04:53:39PM -0500, Alex Deucher wrote: > This reverts commit 8353d30e747f4e5cdd867c6b054dbb85cdcc76a9. > > This causes a hang on a carrizo based laptop. Revert until we can fix > it properly. > > Cc: Borislav Petkov Reported-by: me > Signed

Re: [PATCH 1/2] drm/radeon: stop re-init the TTM page pool

2021-01-11 Thread Borislav Petkov
DRM_ERROR("Failed initializing VRAM heap.\n"); > -- Was finally able to test those during workstation hw maintenance so I was able to install a new kernel and reboot. Reported-by: Borislav Petkov Tested-by: Borislav Petkov Thanks for

[bugzilla-dae...@bugzilla.kernel.org: [Bug 211245] New: Fedora 33 amdgpu print warning at boot]

2021-01-18 Thread Borislav Petkov
Forwarding by mail because I can't find the respective AMD GPU assignee mail on bugzilla.k.o. - Forwarded message from bugzilla-dae...@bugzilla.kernel.org - Date: Sun, 17 Jan 2021 21:13:06 +0000 From: bugzilla-dae...@bugzilla.kernel.org To: b...@alien8.de Subject: [Bug 211245] New:

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-13 Thread Borislav Petkov
On Thu, May 13, 2021 at 03:20:36AM +0000, Joshi, Mukul wrote: > Exporting smca_get_bank_type() works fine when CONFIG_X86_MCE_AMD is defined. > I would need to put #ifdef CONFIG_X86_MCE_AMD in my code to compile the amdgpu > driver when CONFIG_X86_MCE_AMD is not defined. > I can avoid all that by

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-13 Thread Borislav Petkov
On Thu, May 13, 2021 at 10:32:45AM -0400, Alex Deucher wrote: > Right. The sys admin can query the bad page count and decide when to > retire the card. Yap, although the driver should actively "tell" the sysadmin when some critical counts of retired VRAM pages are reached because I doubt all

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-13 Thread Borislav Petkov
On Thu, May 13, 2021 at 10:17:47AM -0400, Alex Deucher wrote: > The bad pages are stored in an EEPROM on the board and the next time > the driver loads it reads the EEPROM so that it can reserve the bad > pages at init time so they don't get used again. And that works automagically on the next

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-12 Thread Borislav Petkov
Hi, so this is a drive-by review using the lore.kernel.org mail because I wasn't CCed on this. On Tue, May 11, 2021 at 09:30:58PM -0400, Mukul Joshi wrote: > +static int amdgpu_bad_page_notifier(struct notifier_block *nb, > + unsigned long val, void *data) > +{ >

Re: [5.13-rc1][bug] often hangs for no reason

2021-05-17 Thread Borislav Petkov
On Mon, May 17, 2021 at 03:27:23AM +0500, Mikhail Gavrilov wrote: > Hi folks. > 5.13-rc1 after 5.13-rc0 is a disaster because it hangs and hangs again > after reboot. > All hang's have in common is that they all happens in > smp_call_function_many_cond function (I compared all trace [1], [2], >

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-12 Thread Borislav Petkov
On Wed, May 12, 2021 at 07:00:58PM +0000, Joshi, Mukul wrote: > SMCA UMCv2 corresponds to GPU's UMC MCA bank and the GPU driver is > only interested in errors on GPU UMC. So that thing should be called SMCA_GPU_UMC not SMCA_UMC_V2. > We cannot know this without is_smca_umc_v2. You don't need it

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-14 Thread Borislav Petkov
On Fri, May 14, 2021 at 01:06:33PM +0000, Joshi, Mukul wrote: > We have RAS functionality in other ASICs that is not dependent on > CONFIG_X86_MCE_AMD. So, I don't think we would want to do that just > for one ASIC. Lemme try again: you said that those errors do get reported through a deferred

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-14 Thread Borislav Petkov
On Thu, May 13, 2021 at 11:10:34PM +0000, Joshi, Mukul wrote: > That's probably not the best example to look at. Oh, it is the *perfect* example but... > smca_get_long_name() is used in drivers/edac/mce_amd.c and this file > doesn't get compiled when CONFIG_X86_MCE_AMD is not defined. > > And

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-14 Thread Borislav Petkov
On Thu, May 13, 2021 at 11:14:30PM +0000, Joshi, Mukul wrote: > Are you OK with a new MCE priority (MCE_PRIO_ACCEL) or do you want us to use > something else? I still don't know why a separate priority is needed. Maybe this still needs answering: > It is a deferred interrupt that generates an

amdgpu, WARNING: CPU: 12 PID: 389 at arch/x86/kernel/fpu/core.c:129 kernel_fpu_begin_mask+0xd5/0x100

2021-03-12 Thread Borislav Petkov
Hi folks, I get the below on -rc2+tip/master. I added printks to your FPU macros: --- diff --git a/drivers/gpu/drm/amd/display/dc/os_types.h b/drivers/gpu/drm/amd/display/dc/os_types.h index 126c2f3a4dd3..49629dc03f99 100644 --- a/drivers/gpu/drm/amd/display/dc/os_types.h +++

Re: amdgpu, WARNING: CPU: 12 PID: 389 at arch/x86/kernel/fpu/core.c:129 kernel_fpu_begin_mask+0xd5/0x100

2021-03-12 Thread Borislav Petkov
On Fri, Mar 12, 2021 at 06:20:25PM +0000, Deucher, Alexander wrote: > Should be fixed with these patches: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=15e8b95d5f7509e0b09289be8c422c459c9f0412 >

Re: [PATCH v2 06/12] x86/sev: Replace occurrences of sev_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:25AM -0500, Tom Lendacky wrote: > diff --git a/arch/x86/kernel/machine_kexec_64.c > b/arch/x86/kernel/machine_kexec_64.c > index 8e7b517ad738..66ff788b79c9 100644 > --- a/arch/x86/kernel/machine_kexec_64.c > +++ b/arch/x86/kernel/machine_kexec_64.c > @@ -167,7 +167,7

Re: [PATCH v2 07/12] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
ture support is added for other memory encryption > technologies, the use of PATTR_GUEST_PROT_STATE can be updated, as > required, to specifically use PATTR_SEV_ES. > > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: Borislav Petkov > Signed-off-by: Tom Lendacky > --- >

Re: [PATCH v2 09/12] mm: Remove the now unused mem_encrypt_active() function

2021-08-17 Thread Borislav Petkov
On Tue, Aug 17, 2021 at 12:22:33PM +0200, Borislav Petkov wrote: > This one wants to be part of the previous patch. ... and the three following patches too - the treewide patch does a single atomic :) replacement and that's it. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/no

Re: [PATCH v2 05/12] x86/sme: Replace occurrences of sme_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:24AM -0500, Tom Lendacky wrote: > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c > index edc67ddf065d..5635ca9a1fbe 100644 > --- a/arch/x86/mm/mem_encrypt.c > +++ b/arch/x86/mm/mem_encrypt.c > @@ -144,7 +144,7 @@ void __init sme_unmap_bootdata(char

Re: [PATCH v2 09/12] mm: Remove the now unused mem_encrypt_active() function

2021-08-17 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:28AM -0500, Tom Lendacky wrote: > The mem_encrypt_active() function has been replaced by prot_guest_has(), > so remove the implementation. > > Reviewed-by: Joerg Roedel > Signed-off-by: Tom Lendacky > --- > include/linux/mem_encrypt.h | 4 > 1 file changed, 4

Re: [PATCH v2 04/12] powerpc/pseries/svm: Add a powerpc version of prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:23AM -0500, Tom Lendacky wrote: > Introduce a powerpc version of the prot_guest_has() function. This will > be used to replace the powerpc mem_encrypt_active() implementation, so > the implementation will initially only support the PATTR_MEM_ENCRYPT > attribute. > >

Re: [PATCH v2 01/12] x86/ioremap: Selectively build arch override encryption functions

2021-08-16 Thread Borislav Petkov
ted() > - memremap_is_efi_data() > - memremap_is_setup_data() > - early_memremap_is_setup_data() > > And finally, phys_mem_access_encrypted() is conditionally built as well, > but requires a static inline version of it when CONFIG_AMD_MEM_ENCRYPT is > not set. > > Cc: Thomas Gleixner >

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-16 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:22AM -0500, Tom Lendacky wrote: > diff --git a/arch/x86/include/asm/protected_guest.h > b/arch/x86/include/asm/protected_guest.h > new file mode 100644 > index 000000000000..51e4eefd9542 > --- /dev/null > +++ b/arch/x86/include/asm/protected_guest.h > @@ -0,0 +1,29

Re: [PATCH v2 02/12] mm: Introduce a function to check for virtualization protection features

2021-08-16 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:21AM -0500, Tom Lendacky wrote: > In prep for other protected virtualization technologies, introduce a > generic helper function, prot_guest_has(), that can be used to check > for specific protection attributes, like memory encryption. This is > intended to eliminate

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-16 Thread Borislav Petkov
On Sun, Aug 15, 2021 at 08:53:31AM -0500, Tom Lendacky wrote: > It's not a cross-vendor thing as opposed to a KVM or other hypervisor > thing where the family doesn't have to be reported as AMD or HYGON. What would be the use case? A HV starts a guest which is supposed to be encrypted using the

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Tue, Aug 17, 2021 at 10:22:52AM -0500, Tom Lendacky wrote: > I can change it to be an AMD/HYGON check... although, I'll have to check > to see if any (very) early use of the function will work with that. We can always change it later if really needed. It is just that I'm not a fan of such

Re: [PATCH v2 06/12] x86/sev: Replace occurrences of sev_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Tue, Aug 17, 2021 at 10:26:18AM -0500, Tom Lendacky wrote: > >>/* > >> - * If SME is active we need to be sure that kexec pages are > >> - * not encrypted because when we boot to the new kernel the > >> + * If host memory encryption is active we need to be sure that kexec > >> + *

Re: [PATCH v2 05/12] x86/sme: Replace occurrences of sme_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Tue, Aug 17, 2021 at 09:46:58AM -0500, Tom Lendacky wrote: > I'm ok with letting the TDX folks make changes to these calls to be SME or > SEV specific, if necessary, later. Yap, exactly. Let's add the specific stuff only when really needed. Thx. -- Regards/Gruss, Boris.

[PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Borislav Petkov
From: Borislav Petkov Building a randconfig here triggered: ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined! because the module export of that symbol happens in kernel/power/suspend.c which is enabled with CONFIG_SUSPEND. The if

Re: [PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Borislav Petkov
On Tue, Aug 24, 2021 at 07:22:46PM +0530, Lazar, Lijo wrote: > 'pm_suspend_target_state' is only available when CONFIG_PM_SLEEP > is set/enabled. pm_suspend_target_state is available only when CONFIG_SUSPEND is enabled. The extern thing is only a forward declaration. > OTOH, when both SUSPEND

Re: [PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Borislav Petkov
On Tue, Aug 24, 2021 at 06:38:41PM +0530, Lazar, Lijo wrote: > Without CONFIG_PM_SLEEP and with CONFIG_SUSPEND Can you even create such a .config? > I remember giving a reviewed-by for this one, looks like it never got in. > https://www.spinics.net/lists/amd-gfx/msg66166.html A better version

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-19 Thread Borislav Petkov
On Thu, Aug 19, 2021 at 10:52:53AM +0100, Christoph Hellwig wrote: > Which suggest that the name is not good to start with. Maybe protected > hardware, system or platform might be a better choice? Yah, coming up with a proper name here hasn't been easy. prot_guest_has() is not the first variant.

Re: ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

2021-08-23 Thread Borislav Petkov
On Mon, Aug 23, 2021 at 03:49:39PM -0400, Alex Deucher wrote: > Maybe fixed with this patch? > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5706cb3c910cc8283f344bc37a889a8d523a2c6d Nope, this one is already in: $ git tag --contains

Re: ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

2021-08-23 Thread Borislav Petkov
On Mon, Aug 23, 2021 at 04:31:42PM -0400, Alex Deucher wrote: > Thanks. I think that should do the trick. Care to send that as a > formal patch? Sure, but let me run it through the randconfigs tests first to make sure nothing else breaks. It is late here so if I don't manage now I'll send you a

Re: [PATCH v3 3/8] x86/sev: Add an x86 version of cc_platform_has()

2021-09-13 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:34PM -0500, Tom Lendacky wrote: > diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c > new file mode 100644 > index 000000000000..3c9bacd3c3f3 > --- /dev/null > +++ b/arch/x86/kernel/cc_platform.c > @@ -0,0 +1,21 @@ > +//

Re: [PATCH v3 2/8] mm: Introduce a function to check for confidential computing features

2021-09-13 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:33PM -0500, Tom Lendacky wrote: > In prep for other confidential computing technologies, introduce a generic preparation > helper function, cc_platform_has(), that can be used to check for specific > active confidential computing attributes, like memory encryption.

Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-15 Thread Borislav Petkov
On Wed, Sep 15, 2021 at 10:28:59AM +1000, Michael Ellerman wrote: > I don't love it, a new C file and an out-of-line call to then call back > to a static inline that for most configuration will return false ... but > whatever :) Yeah, hch thinks it'll cause a big mess otherwise:

Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-14 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:35PM -0500, Tom Lendacky wrote: > Introduce a powerpc version of the cc_platform_has() function. This will > be used to replace the powerpc mem_encrypt_active() implementation, so > the implementation will initially only support the CC_ATTR_MEM_ENCRYPT > attribute. >

Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-15 Thread Borislav Petkov
On Wed, Sep 15, 2021 at 07:18:34PM +0200, Christophe Leroy wrote: > Could you please provide more explicit explanation why inlining such an > helper is considered as bad practice and messy ? Tom already told you to look at the previous threads. Let's read them together. This one, for example:

Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function

2021-09-15 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:31PM -0500, Tom Lendacky wrote: > This patch series provides a generic helper function, cc_platform_has(), > to replace the sme_active(), sev_active(), sev_es_active() and > mem_encrypt_active() functions. > > It is expected that as new confidential computing
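
To illustrate what the series is after: one query function keyed by an attribute replaces the per-feature predicates. Below is a userspace mock, not the kernel code. The CC_ATTR_* names are as I remember them from the merged include/linux/cc_platform.h and should be treated as an assumption; struct mock_platform and mock_cc_platform_has() are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Attribute names assumed to mirror include/linux/cc_platform.h. */
enum cc_attr {
	CC_ATTR_MEM_ENCRYPT,
	CC_ATTR_HOST_MEM_ENCRYPT,
	CC_ATTR_GUEST_MEM_ENCRYPT,
	CC_ATTR_GUEST_STATE_ENCRYPT,
};

/* Stand-in for the state the old sme/sev/sev_es predicates looked at. */
struct mock_platform {
	bool sme;	/* what sme_active() would have reported */
	bool sev;	/* what sev_active() would have reported */
	bool sev_es;	/* what sev_es_active() would have reported */
};

/* Single multiplexer replacing the three separate predicates. */
static bool mock_cc_platform_has(const struct mock_platform *p,
				 enum cc_attr attr)
{
	assert(p != NULL);

	switch (attr) {
	case CC_ATTR_MEM_ENCRYPT:
		return p->sme || p->sev;
	case CC_ATTR_HOST_MEM_ENCRYPT:
		return p->sme && !p->sev;
	case CC_ATTR_GUEST_MEM_ENCRYPT:
		return p->sev;
	case CC_ATTR_GUEST_STATE_ENCRYPT:
		return p->sev_es;
	}

	return false;
}
```

A caller that used to ask sme_active() would ask for CC_ATTR_HOST_MEM_ENCRYPT instead. Other vendors can then answer the same question from their own backend without touching the call sites.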

Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-14 Thread Borislav Petkov
On Tue, Sep 14, 2021 at 04:47:41PM +0200, Christophe Leroy wrote: > Yes, see > https://lore.kernel.org/linuxppc-dev/20210914123919.58203...@canb.auug.org.au/T/#t Aha, more compiler magic stuff ;-\ Oh well, I guess that fix will land upstream soon. Thx. -- Regards/Gruss, Boris.

Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function

2021-09-16 Thread Borislav Petkov
On Wed, Sep 15, 2021 at 10:26:06AM -0700, Kuppuswamy, Sathyanarayanan wrote: > I have a Intel variant patch (please check following patch). But it includes > TDX changes as well. Shall I move TDX changes to different patch and just > create a separate patch for adding intel_cc_platform_has()?

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-14 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:36PM -0500, Tom Lendacky wrote: > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c > index 18fe19916bc3..4b54a2377821 100644 > --- a/arch/x86/mm/mem_encrypt.c > +++ b/arch/x86/mm/mem_encrypt.c > @@ -144,7 +144,7 @@ void __init sme_unmap_bootdata(char

Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features

2021-07-28 Thread Borislav Petkov
On Wed, Jul 28, 2021 at 02:17:27PM +0100, Christoph Hellwig wrote: > So common checks obviously make sense, but I really hate the stupid > multiplexer. Having one well-documented helper per feature is much > easier to follow. We had that in x86 - it was called cpu_has_<xxx> where xxx is the feature

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Borislav Petkov
On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote: > I still believe calling cc_platform_has() from __startup_64() is totally > broken as it lacks proper wrapping while accessing global variables. Well, one of the issues on the AMD side was using boot_cpu_data too early and the

Re: [PATCHv3 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-23 Thread Borislav Petkov
On Thu, Sep 23, 2021 at 02:29:07PM +0000, Yazen Ghannam wrote: > > + /* > > +* If the error was generated in UMC_V2, which belongs to GPU UMCs, > > +* and error occurred in DramECC (Extended error code = 0) then only > > +* process the error, else bail out. > > +*/ > > + if (!m

Re: [PATCHv3 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-23 Thread Borislav Petkov
On Thu, Sep 23, 2021 at 05:23:21PM +0000, Yazen Ghannam wrote: > Shouldn't the error still be reported to EDAC for decoding and counting? I > think users want this. You know what happens with users getting ECCs reported, right? They think immediately their hw is going bad and start wanting to

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-23 Thread Borislav Petkov
On Thu, Sep 23, 2021 at 12:05:58AM +0300, Kirill A. Shutemov wrote: > Unless we find other way to guarantee RIP-relative access, we must use > fixup_pointer() to access any global variables. Yah, I've asked compiler folks about any guarantees we have wrt rip-relative addresses but it doesn't look

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-13 Thread Borislav Petkov
On Wed, Oct 13, 2021 at 09:19:45AM +0000, Quan, Evan wrote: > So, I need your help to confirm the last two patches(I sent you) do not > affect the fix for the bug above. > Please follow the steps below to verify it: > 1. Launch a video playing > 2. open another terminal and issue "sudo

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-11 Thread Borislav Petkov
On Mon, Oct 11, 2021 at 08:03:51AM +0000, Quan, Evan wrote: > OK... Then forget about previous patches. Let's try to narrow down the > issue first. Please try the attached patch1 first. If it works, It does. > please undo the changes of patch1 and try patch2 to narrow down further. It does too.

[PATCH -v2] x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically

2021-10-11 Thread Borislav Petkov
ll need to either enable it in their config or use "mem_encrypt=on" on the kernel command line. [ tlendacky: Generalize commit message. ] Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support") Reported-by: Paul Menzel Signed-off-by: Borislav Petkov Cc: Link: https:

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-14 Thread Borislav Petkov
On Thu, Oct 14, 2021 at 02:02:48AM +0000, Quan, Evan wrote: > [Quan, Evan] Yes, but not(apply them) at the same time. One by one as you did > before. > - try the patch1 first Ok, first patch worked fine. > - undo the changes of patch1 and try patch2 Did that, worked fine too except after the

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-11 Thread Borislav Petkov
On Sat, Oct 09, 2021 at 01:20:39AM +0000, Quan, Evan wrote: > Maybe the change below can address your issue. > https://lists.freedesktop.org/archives/amd-gfx/2021-September/069006.html Nope, that one doesn't change anything. Thx. -- Regards/Gruss, Boris.

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-11 Thread Borislav Petkov
On Sat, Oct 09, 2021 at 09:54:13AM +0000, Quan, Evan wrote: > Oops, I just found some necessary changes are missing from the patch of the > link below. > https://lists.freedesktop.org/archives/amd-gfx/2021-September/069006.html > > Could you try the patch from the link above + the attached

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-08 Thread Borislav Petkov
On Fri, Oct 08, 2021 at 11:12:35AM -0400, Alex Deucher wrote: > Can you try swapping the order of > amdgpu_device_ip_set_powergating_state() and > amdgpu_device_ip_set_clockgating_state() in the patch? Nope, the diff below didn't change things. Should I comment them out one by one and see

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-11 Thread Borislav Petkov
On Mon, Oct 11, 2021 at 03:05:33PM +0200, Paul Menzel wrote: > I think, the IOMMU is enabled on the MSI B350M MORTAR, but otherwise, yes > this looks fine. The help text could also be updated to mention problems > with AMD Raven devices. This is not only about Raven GPUs but, as Alex explained,

Re: [PATCH] drm/amdgpu: fix the hang observed on Carrizo due to UVD suspend failure

2021-10-18 Thread Borislav Petkov
On Mon, Oct 18, 2021 at 03:34:32PM +0800, Evan Quan wrote: > It's confirmed that on some APUs the interaction with SMU(about DPM > disablement) > will power off the UVD. That will make the succeeding interactions with UVD > on the > suspend path impossible. And the system will hang due to that.

[PATCH 1/8] x86/ioremap: Selectively build arch override encryption functions

2021-09-28 Thread Borislav Petkov
, phys_mem_access_encrypted() is conditionally built as well, but requires a static inline version of it when CONFIG_AMD_MEM_ENCRYPT is not set. Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/x86/include/asm/io.h | 8 arch/x86/mm/ioremap.c | 2 +- 2 files

[PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
From: Borislav Petkov Hi all, here's v4 of the cc_platform_has() patchset with feedback incorporated. I'm going to route this through tip if there are no objections. Thx. Tom Lendacky (8): x86/ioremap: Selectively build arch override encryption functions arch/cc: Introduce a function

[PATCH 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-28 Thread Borislav Petkov
-by: Borislav Petkov Acked-by: Michael Ellerman --- arch/powerpc/platforms/pseries/Kconfig | 1 + arch/powerpc/platforms/pseries/Makefile | 2 ++ arch/powerpc/platforms/pseries/cc_platform.c | 26 3 files changed, 29 insertions(+) create mode 100644 arch/powerpc/platforms

[PATCH 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
of sev_active() that are really geared towards detecting if SME is active. Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/x86/include/asm/kexec.h | 2 +- arch/x86/include/asm/mem_encrypt.h | 2 -- arch/x86/kernel/machine_kexec_64.c | 15 --- arch/x86/kernel

[PATCH 8/8] treewide: Replace the use of mem_encrypt_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
implementation of mem_encrypt_active(), cc_platform_has() does not need to be implemented in s390 (the config option ARCH_HAS_CC_PLATFORM is not set). Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/powerpc/include/asm/mem_encrypt.h | 5 - arch/powerpc/platforms/pseries/svm.c| 5

[PATCH 7/8] x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
Signed-off-by: Borislav Petkov --- arch/x86/include/asm/mem_encrypt.h | 2 -- arch/x86/kernel/sev.c | 6 +++--- arch/x86/mm/mem_encrypt.c | 24 +++- arch/x86/realmode/init.c | 3 +-- 4 files changed, 7 insertions(+), 28 deletions(-) diff --git

[PATCH 2/8] arch/cc: Introduce a function to check for confidential computing features

2021-09-28 Thread Borislav Petkov
-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/Kconfig| 3 ++ include/linux/cc_platform.h | 88 + 2 files changed, 91 insertions(+) create mode 100644 include/linux/cc_platform.h diff --git a/arch/Kconfig b/arch/Kconfig index

[PATCH 3/8] x86/sev: Add an x86 version of cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky Introduce an x86 version of the cc_platform_has() function. This will be used to replace vendor specific calls like sme_active(), sev_active(), etc. Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/x86/Kconfig | 1 + arch/x86/include

[PATCH 6/8] x86/sev: Replace occurrences of sev_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
-by: Borislav Petkov --- arch/x86/include/asm/mem_encrypt.h | 2 -- arch/x86/kernel/crash_dump_64.c| 4 +++- arch/x86/kernel/kvm.c | 3 ++- arch/x86/kernel/kvmclock.c | 4 ++-- arch/x86/kernel/machine_kexec_64.c | 4 ++-- arch/x86/kvm/svm/svm.c | 3

Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
On Tue, Sep 28, 2021 at 12:19:49PM -0700, Kuppuswamy, Sathyanarayanan wrote: > Intel CC support patch is not included in this series. You want me > to address the issue raised by Joerg before merging it? Did you not see my email to you today: https://lkml.kernel.org/r/yvl4zughfsh1q...@zn.tnic ?

Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
On Tue, Sep 28, 2021 at 02:01:57PM -0700, Kuppuswamy, Sathyanarayanan wrote: > Yes. But, since the check is related to TDX, I just want to confirm whether > you are fine with naming the function as intel_*(). Why is this such a big deal?! There's amd_cc_platform_has() and

Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
On Tue, Sep 28, 2021 at 01:48:46PM -0700, Kuppuswamy, Sathyanarayanan wrote: > Just read it. If you want to use cpuid_has_tdx_guest() directly in > cc_platform_has(), then you want to rename intel_cc_platform_has() to > tdx_cc_platform_has()? Why? You simply do: if

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 09:23:22AM -0400, Alex Deucher wrote: > There could be some OEM systems that disable the IOMMU on the platform > and don't provide a switch in the bios to enable it. The GPU driver > will still work in that case, it will just not be able to enable KFD > support for ROCm

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Tue, Oct 05, 2021 at 10:48:15AM -0400, Alex Deucher wrote: > It's not incompatible per se, but SME requires the IOMMU be enabled > because the C bit used for encryption is beyond the dma_mask of most > devices. If the C bit is not set, the en/decryption for DMA doesn't > occur. So you need
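
The arithmetic behind the quoted remark can be sketched as follows. This is a minimal illustration under stated assumptions, not driver code: the helper names dma_bit_mask(), dma_addr_reachable() and with_c_bit() are hypothetical, and the C-bit position is passed in as a parameter because it is platform-specific (bit 47 is used below purely as an assumed example value).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask covering the low `bits` address bits, in the spirit of DMA_BIT_MASK(). */
static uint64_t dma_bit_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/* True if a device with a `dma_bits`-wide mask can address `addr` directly. */
static bool dma_addr_reachable(uint64_t addr, unsigned int dma_bits)
{
	return (addr & ~dma_bit_mask(dma_bits)) == 0;
}

/* Mark a physical address as encrypted by setting the (assumed) C bit. */
static uint64_t with_c_bit(uint64_t phys, unsigned int c_bit_pos)
{
	assert(c_bit_pos < 64);
	return phys | (UINT64_C(1) << c_bit_pos);
}
```

With an assumed C bit at position 47, dma_addr_reachable(with_c_bit(0x12345000, 47), 40) is false: a device limited to 40 address bits cannot emit the address with the C bit set, so without an IOMMU translating for it the en/decryption described above would not take place.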

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 02:36:56PM -0400, Alex Deucher wrote: > From the x86 model and family info? I think Raven has different > families from other Zen based CPUs. Yeah, I'd like to avoid a f/m/s mapping table, if possible. Those things should be a last resort and they always need adjustment

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 02:21:40PM -0400, Alex Deucher wrote: > And just another general comment, swiotlb + bounce buffers isn't > really useful on GPUs. You may have 10-100s of MBs of memory mapped > long term into the GPU's address space for random access. E.g., you > may have buffers in

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
Ok, so I sat down and wrote something and tried to capture all the stuff we talked about so that it is clear in the future why we did it. Thoughts? --- From: Borislav Petkov Date: Wed, 6 Oct 2021 19:34:55 +0200 Subject: [PATCH] x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 02:10:30PM -0400, Alex Deucher wrote: > This is not limited to Raven. That's what the innocuous "a.o." wanted to state. :) > All GPUs (and quite a few other > devices) have a limited DMA mask. AMD GPUs have between 32 and 48 > bits of DMA depending on what generation the

bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-07 Thread Borislav Petkov
Hi folks, commit in $Subject breaks rebooting an HP laptop here with a Carrizo chipset: after typing "reboot" and pressing Enter, it powers off the machine up to a certain point but the fans remain on, screen goes black and nothing happens anymore. No reboot. I have to power it off by holding the

Re: [PATCHv2 1/2] x86/MCE/AMD: Export smca_get_bank_type symbol

2021-09-22 Thread Borislav Petkov
On Sun, Sep 12, 2021 at 10:13:10PM -0400, Mukul Joshi wrote: > Export smca_get_bank_type for use in the AMD GPU > driver to determine MCA bank while handling correctable > and uncorrectable errors in GPU UMC. > > v1->v2: > - Drop the function is_smca_umc_v2(). > - Drop the patch to introduce a

Re: [PATCHv2 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-22 Thread Borislav Petkov
On Sun, Sep 12, 2021 at 10:13:11PM -0400, Mukul Joshi wrote: > On Aldebaran, GPU driver will handle bad page retirement > even though UMC is host managed. As a result, register a > bad page retirement handler on the mce notifier chain to > retire bad pages on Aldebaran. > > v1->v2: > - Use

Re: [PATCHv2 1/2] x86/MCE/AMD: Export smca_get_bank_type symbol

2021-09-22 Thread Borislav Petkov
> > Want me to ACK this and you can carry it through your tree along with the > > second patch? > > That would be great. Thanks! Ok, with the above changelog removed: Acked-by: Borislav Petkov Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-22 Thread Borislav Petkov
On Wed, Sep 22, 2021 at 05:30:15PM +0300, Kirill A. Shutemov wrote: > Not fine, but waiting to blow up with random build environment change. Why is it not fine? Are you suspecting that the compiler might generate something else and not a rip-relative access? -- Regards/Gruss, Boris.

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-05 Thread Borislav Petkov
On Tue, Oct 05, 2021 at 04:29:41PM +0200, Paul Menzel wrote: > Selecting the symbol `AMD_MEM_ENCRYPT` – as > done in Debian 5.13.9-1~exp1 [1] – also selects > `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT`, as it defaults to yes, I'm assuming that "selecting" is done automatically: alldefconfig,

Re: [PATCHv3 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-27 Thread Borislav Petkov
On Fri, Sep 24, 2021 at 07:46:10PM +0000, Yazen Ghannam wrote: > I agree with you in general. But this device isn't really a GPU. And > users of this device seem to want to count *every* error, at least for > now. Aha, so something accelerator-y where they do general purpose computation. So

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-24 Thread Borislav Petkov
On Fri, Sep 24, 2021 at 12:41:32PM +0300, Kirill A. Shutemov wrote: > On Thu, Sep 23, 2021 at 08:21:03PM +0200, Borislav Petkov wrote: > > On Thu, Sep 23, 2021 at 12:05:58AM +0300, Kirill A. Shutemov wrote: > > > Unless we find other way to guarantee RIP-relative a

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Borislav Petkov
On Tue, Sep 21, 2021 at 12:04:58PM -0500, Tom Lendacky wrote: > Looks like instrumentation during early boot. I worked with Boris offline to > exclude arch/x86/kernel/cc_platform.c from some of the instrumentation and > that allowed an allyesconfig to boot. And here's the lineup I have so far,

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-11-05 Thread Borislav Petkov
On Fri, Nov 05, 2021 at 08:05:41AM +0000, Quan, Evan wrote: > I'm wondering are you able to give the attached patch (alone) a try. Yap, looks good. Tested-by: Borislav Petkov -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH] drm/amd/pm: avoid duplicate powergate/ungate setting

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 09:51:03AM +0100, Paul Menzel wrote: > Please elaborate the kind of issues. It fails to reboot on Carrizo-based laptops. Whoever commits this, pls add Link: https://lore.kernel.org/r/yv81vidwqlwva...@zn.tnic so that it is clear what the whole story was. Thx. --

Re: RIP: 0010:radeon_vm_fini+0x15/0x220 [radeon]

2022-01-17 Thread Borislav Petkov
On Mon, Jan 17, 2022 at 08:16:09AM +0100, Christian König wrote: > Interesting to see that even that old stuff is still used. Well, "used" is a stretch. This is my way of testing on K8 as pretty much all the big K8 boxes to which I had access got decommissioned so this baby is the only K8

Re: amdgpu refcount saturation

2022-12-23 Thread Borislav Petkov
On Thu, Dec 22, 2022 at 10:20:37PM +0100, Michal Kubecek wrote: > Unfortunately, just like Boris, I always seem to have multiple stack > traces tangled together. See if this fixes it: https://lore.kernel.org/r/20221219104718.21677-1-christian.koe...@amd.com Thx. -- Regards/Gruss, Boris.

amdgpu refcount saturation

2022-12-18 Thread Borislav Petkov
Hi folks, this is with Linus' tree from Wed: 041fae9c105a ("Merge tag 'f2fs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs") on a CZ laptop: [7.782901] [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 0x103C:0x807E 0xC4) The splat is kinda messy:

Re: [PATCH] drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependency

2022-12-19 Thread Borislav Petkov
u_vm_sdma.c | 2 ++ > 1 file changed, 2 insertions(+) Thanks, that fixes it. Reported-by: Borislav Petkov (AMD) Tested-by: Borislav Petkov (AMD) -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
