[Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered

2021-08-14 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213391

mcmar...@gmx.net changed:

   What|Removed |Added

 CC||mcmar...@gmx.net

--- Comment #35 from mcmar...@gmx.net ---
i have a Lenovo L340 and the same problem

here is the complete dmesg log

https://gist.github.com/McMarius11/36c8d21a2dcaf5c2289c91a74af4f7fb

Operating System: Manjaro Linux
KDE Plasma Version: 5.22.4
KDE Frameworks Version: 5.84.0
Qt Version: 5.15.2
Kernel Version: 5.11.22-2-MANJARO (64-bit)
Graphics Platform: X11
Processors: 8 × AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx
Memory: 5,6 GiB of RAM
Graphics Processor: AMD Radeon™ Vega 10 Graphics

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-14 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:22AM -0500, Tom Lendacky wrote:
> diff --git a/arch/x86/include/asm/protected_guest.h 
> b/arch/x86/include/asm/protected_guest.h
> new file mode 100644
> index ..51e4eefd9542
> --- /dev/null
> +++ b/arch/x86/include/asm/protected_guest.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Protected Guest (and Host) Capability checks
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky 
> + */
> +
> +#ifndef _X86_PROTECTED_GUEST_H
> +#define _X86_PROTECTED_GUEST_H
> +
> +#include 
> +
> +#ifndef __ASSEMBLY__
> +
> +static inline bool prot_guest_has(unsigned int attr)
> +{
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + if (sme_me_mask)
> + return amd_prot_guest_has(attr);
> +#endif
> +
> + return false;
> +}
> +
> +#endif   /* __ASSEMBLY__ */
> +
> +#endif   /* _X86_PROTECTED_GUEST_H */

I think this can be simplified more, diff ontop below:

- no need for the ifdeffery as amd_prot_guest_has() has versions for
both when CONFIG_AMD_MEM_ENCRYPT is set or not.

- the sme_me_mask check is pushed there too.

- and since this is vendor-specific, I'm checking the vendor bit. Yeah,
yeah, cross-vendor but I don't really believe that.

---
diff --git a/arch/x86/include/asm/protected_guest.h 
b/arch/x86/include/asm/protected_guest.h
index 51e4eefd9542..8541c76d5da4 100644
--- a/arch/x86/include/asm/protected_guest.h
+++ b/arch/x86/include/asm/protected_guest.h
@@ -12,18 +12,13 @@
 
 #include 
 
-#ifndef __ASSEMBLY__
-
 static inline bool prot_guest_has(unsigned int attr)
 {
-#ifdef CONFIG_AMD_MEM_ENCRYPT
-   if (sme_me_mask)
+   if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+   boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
return amd_prot_guest_has(attr);
-#endif
 
return false;
 }
 
-#endif /* __ASSEMBLY__ */
-
 #endif /* _X86_PROTECTED_GUEST_H */
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index edc67ddf065d..5a0442a6f072 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -392,6 +392,9 @@ bool noinstr sev_es_active(void)
 
 bool amd_prot_guest_has(unsigned int attr)
 {
+   if (!sme_me_mask)
+   return false;
+
switch (attr) {
case PATTR_MEM_ENCRYPT:
return sme_me_mask != 0;

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 02/12] mm: Introduce a function to check for virtualization protection features

2021-08-14 Thread Tom Lendacky

On 8/14/21 1:32 PM, Borislav Petkov wrote:

On Fri, Aug 13, 2021 at 11:59:21AM -0500, Tom Lendacky wrote:

diff --git a/include/linux/protected_guest.h b/include/linux/protected_guest.h
new file mode 100644
index ..43d4dde94793
--- /dev/null
+++ b/include/linux/protected_guest.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Protected Guest (and Host) Capability checks
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ */
+
+#ifndef _PROTECTED_GUEST_H
+#define _PROTECTED_GUEST_H
+
+#ifndef __ASSEMBLY__

   ^

Do you really need that guard? It builds fine without it too. Or
something coming later does need it...?


No, I probably did it out of habit. I can remove it in the next version.

Thanks,
Tom





Re: [PATCH v2 02/12] mm: Introduce a function to check for virtualization protection features

2021-08-14 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:21AM -0500, Tom Lendacky wrote:
> In prep for other protected virtualization technologies, introduce a
> generic helper function, prot_guest_has(), that can be used to check
> for specific protection attributes, like memory encryption. This is
> intended to eliminate having to add multiple technology-specific checks
> to the code (e.g. if (sev_active() || tdx_active())).
> 
> Reviewed-by: Joerg Roedel 
> Co-developed-by: Andi Kleen 
> Signed-off-by: Andi Kleen 
> Co-developed-by: Kuppuswamy Sathyanarayanan 
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/Kconfig|  3 +++
>  include/linux/protected_guest.h | 35 +
>  2 files changed, 38 insertions(+)
>  create mode 100644 include/linux/protected_guest.h
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 98db63496bab..bd4f60c581f1 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1231,6 +1231,9 @@ config RELR
>  config ARCH_HAS_MEM_ENCRYPT
>   bool
>  
> +config ARCH_HAS_PROTECTED_GUEST
> + bool
> +
>  config HAVE_SPARSE_SYSCALL_NR
> bool
> help
> diff --git a/include/linux/protected_guest.h b/include/linux/protected_guest.h
> new file mode 100644
> index ..43d4dde94793
> --- /dev/null
> +++ b/include/linux/protected_guest.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Protected Guest (and Host) Capability checks
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky 
> + */
> +
> +#ifndef _PROTECTED_GUEST_H
> +#define _PROTECTED_GUEST_H
> +
> +#ifndef __ASSEMBLY__
   ^

Do you really need that guard? It builds fine without it too. Or
something coming later does need it...?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 01/12] x86/ioremap: Selectively build arch override encryption functions

2021-08-14 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:20AM -0500, Tom Lendacky wrote:
> In prep for other uses of the prot_guest_has() function besides AMD's
> memory encryption support, selectively build the AMD memory encryption
> architecture override functions only when CONFIG_AMD_MEM_ENCRYPT=y. These
> functions are:
> - early_memremap_pgprot_adjust()
> - arch_memremap_can_ram_remap()
> 
> Additionally, routines that are only invoked by these architecture
> override functions can also be conditionally built. These functions are:
> - memremap_should_map_decrypted()
> - memremap_is_efi_data()
> - memremap_is_setup_data()
> - early_memremap_is_setup_data()
> 
> And finally, phys_mem_access_encrypted() is conditionally built as well,
> but requires a static inline version of it when CONFIG_AMD_MEM_ENCRYPT is
> not set.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Peter Zijlstra 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/io.h | 8 
>  arch/x86/mm/ioremap.c | 2 +-
>  2 files changed, 9 insertions(+), 1 deletion(-)

LGTM.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


[BUG - BISECTED] display not detected anymore

2021-08-14 Thread Heiko Carstens
Hello,

I have Fedora 33 running, and with the Fedore kernel update from 5.11
series to 5.12 my external monitor was not detected anymore. Same is
true with the Fedora supplied 5.13 kernel version.

So I tried with vanilla kernel 5.11 and latest git head from Linus'
tree. 5.11 works while latest git head does not. Bisecting the problem
points to commit 32c3d9b0f51e ("Merge tag 'drm-intel-next-2021-01-27'
of git://anongit.freedesktop.org/drm/drm-intel into drm-next").

Unfortunately it is a merge commit, so it looks like conflicting
changes have been made in the parent branches.

Hardware in use:

- ThinkPad X1 Yoga 4th / Carbon 7th
- Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz

The Thinkpad is connected to a ThinkPad Thunderbolt 3 Dock with a
Thunderbolt cable and a monitor (Eizo EV3285) is connected via
Displayport to the docking station.

The monitor is detected and works without any problems (4k@60HZ)
before the above mentioned merge commit. With the commit and
afterwards it is not detected anymore and only the Thinkpad builtin
display can be used.

Any idea what went wrong? I can provide more information, or test
debug patches if wanted. Just let me know.

Thanks,
Heiko


[PATCH] drm/i915: Release i915_gem_context from a worker

2021-08-14 Thread Daniel Vetter
The only reason for this really is the i915_gem_engines->fence
callback engines_notify(), which exists purely as a fairly funky
reference counting scheme for that. Otherwise all other callers are
from process context, and generally fairly benign locking context.

Unfortunately untangling that requires some major surgery, and we have
a few i915_gem_context reference counting bugs that need fixing, and
they blow in the current hardirq calling context, so we need a
stop-gap measure.

Put a FIXME comment in when this should be removable again.

v2: Fix mock_context(), noticed by intel-gfx-ci.

Signed-off-by: Daniel Vetter 
Cc: Jon Bloomfield 
Cc: Chris Wilson 
Cc: Maarten Lankhorst 
Cc: Joonas Lahtinen 
Cc: Daniel Vetter 
Cc: "Thomas Hellström" 
Cc: Matthew Auld 
Cc: Lionel Landwerlin 
Cc: Dave Airlie 
Cc: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 13 +++--
 drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 12 
 drivers/gpu/drm/i915/gem/selftests/mock_context.c |  1 +
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index fd169cf2f75a..051bc357ff65 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -986,9 +986,10 @@ static struct i915_gem_engines *user_engines(struct 
i915_gem_context *ctx,
return err;
 }
 
-void i915_gem_context_release(struct kref *ref)
+static void i915_gem_context_release_work(struct work_struct *work)
 {
-   struct i915_gem_context *ctx = container_of(ref, typeof(*ctx), ref);
+   struct i915_gem_context *ctx = container_of(work, typeof(*ctx),
+   release_work);
 
trace_i915_context_free(ctx);
GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
@@ -1002,6 +1003,13 @@ void i915_gem_context_release(struct kref *ref)
kfree_rcu(ctx, rcu);
 }
 
+void i915_gem_context_release(struct kref *ref)
+{
+   struct i915_gem_context *ctx = container_of(ref, typeof(*ctx), ref);
+
+   queue_work(ctx->i915->wq, &ctx->release_work);
+}
+
 static inline struct i915_gem_engines *
 __context_engines_static(const struct i915_gem_context *ctx)
 {
@@ -1303,6 +1311,7 @@ i915_gem_create_context(struct drm_i915_private *i915,
ctx->sched = pc->sched;
mutex_init(&ctx->mutex);
INIT_LIST_HEAD(&ctx->link);
+   INIT_WORK(&ctx->release_work, i915_gem_context_release_work);
 
spin_lock_init(&ctx->stale.lock);
INIT_LIST_HEAD(&ctx->stale.engines);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 94c03a97cb77..0c38789bd4a8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -288,6 +288,18 @@ struct i915_gem_context {
 */
struct kref ref;
 
+   /**
+* @release_work:
+*
+* Work item for deferred cleanup, since i915_gem_context_put() tends to
+* be called from hardirq context.
+*
+* FIXME: The only real reason for this is &i915_gem_engines.fence, all
+* other callers are from process context and need at most some mild
+* shuffling to pull the i915_gem_context_put() call out of a spinlock.
+*/
+   struct work_struct release_work;
+
/**
 * @rcu: rcu_head for deferred freeing.
 */
diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_context.c 
b/drivers/gpu/drm/i915/gem/selftests/mock_context.c
index fee070df1c97..067d68a6fe4c 100644
--- a/drivers/gpu/drm/i915/gem/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/mock_context.c
@@ -23,6 +23,7 @@ mock_context(struct drm_i915_private *i915,
kref_init(&ctx->ref);
INIT_LIST_HEAD(&ctx->link);
ctx->i915 = i915;
+   INIT_WORK(&ctx->release_work, i915_gem_context_release_work);
 
mutex_init(&ctx->mutex);
 
-- 
2.32.0



Why we didn't use embedded gem object for virtio gpu when making ttm bo a gem bo subclass?

2021-08-14 Thread lepton
Hi Gerd,

We found a bug in 5.4 kernel and virtgpu_gem_prime_mmap doesn't work
because it references vma_node in gem_base object while ttm code
initialized vma_node in tbo.base object. I am wondering, in your
original serial:
https://patchwork.kernel.org/project/dri-devel/cover/20190805124310.3275-1-kra...@redhat.com/
(drm/ttm: make ttm bo a gem bo subclass), why you changed to use
embedded gem object for most gpu drivers but skipping virtio gpu? Is
there some specific reason?

I am thinking about CL like this (http://crrev.com/c/3092457) to fix
it and not sure if I missed something.

Thanks for your help!