[PATCH V3] drm/amdgpu: support reserve bad page for virt

2020-06-04 Thread Stanley . Yang
Changed from V1: rename some functions, only init ras error handler data for supported asic. Changed from V2: fix potential memory leak. Signed-off-by: Stanley.Yang Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

RE: [PATCH 12/16] drm/amd/powerplay: better namings

2020-06-04 Thread Quan, Evan
[AMD Official Use Only - Internal Distribution Only] -----Original Message----- From: Alex Deucher Sent: Friday, June 5, 2020 5:07 AM To: Quan, Evan Cc: amd-gfx list; Deucher, Alexander Subject: Re: [PATCH 12/16] drm/amd/powerplay: better namings On Thu, Jun 4, 2020 at 12:47 AM Evan Quan

[PATCH V3] drm/amdgpu: support reserve bad page for virt

2020-06-04 Thread Stanley . Yang
Changed from V1: rename some functions, only init ras error handler data for supported asic. Changed from V2: fix potential memory leak. Signed-off-by: Stanley.Yang Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |

RE: [PATCH V2] drm/amdgpu: support reserve bad page for virt

2020-06-04 Thread Yang, Stanley
[AMD Public Use] Hi Tao, Thanks for your suggestion and reply inline. Regards, Stanley > -----Original Message----- > From: Zhou1, Tao > Sent: Friday, June 5, 2020 11:00 AM > To: Yang, Stanley; amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking; Chen, Guchun; Liu, Monk;

RE: [PATCH V2] drm/amdgpu: support reserve bad page for virt

2020-06-04 Thread Zhang, Hawking
[AMD Public Use] I would not suggest explicitly calling out SRIOV in the kernel message. That just confuses people. It doesn't matter that the message shares the same format as the bare-metal one -- We haven't made a unified amdgpu driver to support both host and guest for bare-metal and sriov use

RE: [PATCH V2] drm/amdgpu: support reserve bad page for virt

2020-06-04 Thread Yang, Stanley
[AMD Public Use] Thanks GuChun, Will fix potential memory leak and typo. Regards, Stanley > -----Original Message----- > From: Chen, Guchun > Sent: Friday, June 5, 2020 10:24 AM > To: Yang, Stanley; amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking; Liu, Monk; Clements, John; Zhou1, >

RE: [PATCH V2] drm/amdgpu: support reserve bad page for virt

2020-06-04 Thread Zhou1, Tao
[AMD Public Use] > -----Original Message----- > From: Stanley.Yang > Sent: June 4, 2020 20:36 > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking; Chen, Guchun; Liu, Monk; Clements, John; Zhou1, Tao; Li, Dennis; Yang, Stanley > Subject: [PATCH V2] drm/amdgpu: support reserve bad

Re: [PATCH 16/16] drm/amd/powerplay: skip BACO feature on DPMs disablement

2020-06-04 Thread Alex Deucher
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote: > > Instead of disabling and reenabling it later. > > Change-Id: I90775202178f3b7695f42f39ce240bbfd51a1346 > Signed-off-by: Evan Quan Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 72 ++ > 1

Re: [PATCH 14/16] drm/amd/powerplay: allocate the struct amdgpu_irq_src on the stack

2020-06-04 Thread Alex Deucher
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote: > > Since it is only several bytes in size. I think the subject and description should be clarified a bit. We are not allocating it on the stack. We are just moving the object to the smu structure allocation rather than allocating it dynamically

Re: [PATCH 12/16] drm/amd/powerplay: better namings

2020-06-04 Thread Alex Deucher
On Thu, Jun 4, 2020 at 5:07 PM Alex Deucher wrote: > > On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote: > > > > And some minor changes as dropping unused parameter and label > > internal used API as static. > > > > Change-Id: I0af0aea029dc4fc7d8e150ab6ec984e9a5f1a74a > > Signed-off-by: Evan Quan

Re: [PATCH 13/16] drm/amd/powerplay: max code sharing between .hw_fini and .suspend

2020-06-04 Thread Alex Deucher
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote: > > Thus redundant code can be dropped. > > Change-Id: I672f84ed5856da53b7f8f915b2f24ca11cd4b228 > Signed-off-by: Evan Quan Clarify subject: drm/amd/powerplay: maximize code sharing between .hw_fini and .suspend With that fixed: Reviewed-by:

Re: [PATCH 11/16] drm/amd/powerplay: resort those operations performed in hw setup

2020-06-04 Thread Alex Deucher
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote: > > Those common operations(for all ASICs) are placed first and followed > by ASIC specific ones. While the display related are placed at the last. > > Change-Id: Id45caee98273c8c0b9c1c9f2713fcf8106e02000 > Signed-off-by: Evan Quan Typo in the

Re: [PATCH 10/16] drm/amd/powerplay: max code sharing between .hw_init and .resume

2020-06-04 Thread Alex Deucher
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote: > > Then redundant code can be dropped. > > Change-Id: Icbafbb7ffc8189a09f4236786aea6702ee73f9f4 > Signed-off-by: Evan Quan Subject could be clarified as: drm/amd/powerplay: maximize code sharing between .hw_init and .resume With that fixed:

Re: [PATCH 09/16] drm/amd/powerplay: move those operations not needed for resume out

2020-06-04 Thread Alex Deucher
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote: > > Since smu_smc_table_hw_init() is needed for both .hw_init and .resume. > By doing this, we can drop unnecessary operations on resume. > > Change-Id: I2af6277efaa9adba2de69161e20e54c4aa10a411 > Signed-off-by: Evan Quan Reviewed-by: Alex

Re: [PATCH 08/16] drm/amd/powerplay: postpone operations not must for hw setup to late_init

2020-06-04 Thread Alex Deucher
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote: > > So that we do not need to perform those unnecessary operations again on > resume. > > Change-Id: I90f8a8d68762b5f88d7477934128a17bf67e3341 > Signed-off-by: Evan Quan For the patch subject, I think it would be clearer as: drm/amd/powerplay:

Re: [PATCH 03/16] drm/amd/powerplay: implement a common API for dpms disablement

2020-06-04 Thread Alex Deucher
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote: > > So that code can be shared between .hw_fini and .suspend. > > Change-Id: I4a0eeb7cdecbf5b24fac3d0fe1d8fcb1ca9f0b0a > Signed-off-by: Evan Quan Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 178

Re: [PATCH 02/16] drm/amd/powerplay: drop unused APIs and unnecessary checks

2020-06-04 Thread Alex Deucher
On Thu, Jun 4, 2020 at 12:46 AM Evan Quan wrote: > > Minor code cleanups. > > Change-Id: I6d240241e78cae17288c1d49dbae6ab1796b1128 > Signed-off-by: Evan Quan Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/powerplay/amdgpu_smu.c| 74 --- >

Re: [PATCH 01/16] drm/amd/powerplay: eliminate asic type check

2020-06-04 Thread Alex Deucher
On Thu, Jun 4, 2020 at 12:46 AM Evan Quan wrote: > > By moving ASIC specific code into its own file. You might want to clarify that the macros check if the asic has the callback, so no need to explicitly check. With that fixed: Reviewed-by: Alex Deucher > > Change-Id:

Re: [PATCH][next] drm/amd/display: fix spelling mistake: "propogation" -> "propagation"

2020-06-04 Thread Alex Deucher
Applied. thanks! Alex On Thu, Jun 4, 2020 at 6:35 AM Colin King wrote: > > From: Colin Ian King > > There is a spelling mistake in a dml_print message. Fix it. > > Signed-off-by: Colin Ian King > --- > drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c | 2 +- > 1 file changed,

[pull] amdgpu drm-fixes-5.8

2020-06-04 Thread Alex Deucher
Hi Dave, Daniel, Fixes for 5.8. The following changes since commit 9ca1f474cea0edc14a1d7ec933e5472c0ff115d3: Merge tag 'amd-drm-next-5.8-2020-05-27' of git://people.freedesktop.org/~agd5f/linux into drm-next (2020-05-28 16:10:17 +1000) are available in the Git repository at:

Re: [PATCH] sound/pci/hda: add sienna_cichlid audio asic id for sienna_cichlid up

2020-06-04 Thread Alex Deucher
On Wed, Jun 3, 2020 at 5:39 AM Takashi Iwai wrote: > > On Wed, 03 Jun 2020 03:31:37 +0200, > Alex Deucher wrote: > > > > From: Hersen Wu > > > > dp/hdmi ati hda is not shown in audio settings > > > > Signed-off-by: Hersen Wu > > Reviewed-by: Alex Deucher > > Signed-off-by: Alex Deucher > >

[PATCH][next] drm/amd/display: fix spelling mistake: "propogation" -> "propagation"

2020-06-04 Thread Colin King
From: Colin Ian King There is a spelling mistake in a dml_print message. Fix it. Signed-off-by: Colin Ian King --- drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

[PATCH V2] drm/amdgpu: support reserve bad page for virt

2020-06-04 Thread Stanley . Yang
Changed from V1: rename some functions, only init ras error handler data for supported asic. Signed-off-by: Stanley.Yang Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c |

Re: [Intel-gfx] [PATCH 03/18] dma-fence: basic lockdep annotations

2020-06-04 Thread Daniel Vetter
On Thu, Jun 4, 2020 at 11:27 AM Chris Wilson wrote: > > Quoting Daniel Vetter (2020-06-04 10:21:46) > > On Thu, Jun 4, 2020 at 10:57 AM Thomas Hellström (Intel) > > wrote: > > > > > > > > > On 6/4/20 10:12 AM, Daniel Vetter wrote: > > > ... > > > > Thread A: > > > > > > > > mutex_lock(A);

Re: [PATCH 03/18] dma-fence: basic lockdep annotations

2020-06-04 Thread Chris Wilson
Quoting Daniel Vetter (2020-06-04 10:21:46) > On Thu, Jun 4, 2020 at 10:57 AM Thomas Hellström (Intel) > wrote: > > > > > > On 6/4/20 10:12 AM, Daniel Vetter wrote: > > ... > > > Thread A: > > > > > > mutex_lock(A); > > > mutex_unlock(A); > > > > > > dma_fence_signal(); > > > >

Re: [PATCH 03/18] dma-fence: basic lockdep annotations

2020-06-04 Thread Daniel Vetter
On Thu, Jun 4, 2020 at 10:57 AM Thomas Hellström (Intel) wrote: > > > On 6/4/20 10:12 AM, Daniel Vetter wrote: > ... > > Thread A: > > > > mutex_lock(A); > > mutex_unlock(A); > > > > dma_fence_signal(); > > > > Thread B: > > > > mutex_lock(A); > > dma_fence_wait(); >

Re: [PATCH 03/18] dma-fence: basic lockdep annotations

2020-06-04 Thread Intel
On 6/4/20 10:12 AM, Daniel Vetter wrote: ... Thread A: mutex_lock(A); mutex_unlock(A); dma_fence_signal(); Thread B: mutex_lock(A); dma_fence_wait(); mutex_unlock(A); Thread B is blocked on A signalling the fence, but A never gets around to

[PATCH 18/18] drm/i915: Annotate dma_fence_work

2020-06-04 Thread Daniel Vetter
i915 does tons of allocations from this worker, which lockdep catches. Also generic infrastructure like this with big potential for how dma_fence or other cross driver contracts work, really should be reviewed on dri-devel. Implementing custom wheels for everything within the driver is a classic

[PATCH 15/18] drm/amdgpu: use dma-fence annotations for gpu reset code

2020-06-04 Thread Daniel Vetter
To improve coverage also annotate the gpu reset code itself, since that's called from other places than drm/scheduler (which is already annotated). Annotations nest, so this doesn't break anything, and allows easier testing. Cc: linux-me...@vger.kernel.org Cc: linaro-mm-...@lists.linaro.org Cc:

[PATCH 13/18] drm/amdgpu/dc: Stop dma_resv_lock inversion in commit_tail

2020-06-04 Thread Daniel Vetter
Trying to grab dma_resv_lock while in commit_tail before we've done all the code that leads to the eventual signalling of the vblank event (which can be a dma_fence) is deadlock-y. Don't do that. Here the solution is easy because just grabbing locks to read something races anyway. We don't need

[PATCH 16/18] Revert "drm/amdgpu: add fbdev suspend/resume on gpu reset"

2020-06-04 Thread Daniel Vetter
This is one from the department of "maybe play lottery if you hit this, karma compensation might work". Or at least lockdep ftw! This reverts commit 565d1941557756a584ac357d945bc374d5fcd1d0. It's not quite as low-risk as the commit message claims, because this grabs console_lock, which might be

[PATCH 17/18] drm/amdgpu: gpu recovery does full modesets

2020-06-04 Thread Daniel Vetter
... I think it's time to stop this little exercise. The lockdep splat, for the record: [ 132.583381] == [ 132.584091] WARNING: possible circular locking dependency detected [ 132.584775] 5.7.0-rc3+ #346 Tainted: GW [ 132.585461]

[PATCH 05/18] drm/vkms: Annotate vblank timer

2020-06-04 Thread Daniel Vetter
This is needed to signal the fences from page flips, annotate it accordingly. We need to annotate entire timer callback since if we get stuck anywhere in there, then the timer stops, and hence fences stop. Just annotating the top part that does the vblank handling isn't enough. Cc:

[PATCH 09/18] drm/scheduler: use dma-fence annotations in main thread

2020-06-04 Thread Daniel Vetter
If the scheduler rt thread gets stuck on a mutex that we're holding while waiting for gpu workloads to complete, we have a problem. Add dma-fence annotations so that lockdep can check this for us. I've tried to quite carefully review this, and I think it's at the right spot. But obviously no

[PATCH 06/18] drm/vblank: Annotate with dma-fence signalling section

2020-06-04 Thread Daniel Vetter
This is rather overkill since currently all drivers call this from hardirq (or at least timers). But maybe in the future we're going to have thread irq handlers and what not, doesn't hurt to be prepared. Plus this is an easy start for sprinkling these fence annotations into shared code. Cc:

[PATCH 07/18] drm/atomic-helper: Add dma-fence annotations

2020-06-04 Thread Daniel Vetter
This is a bit disappointing since we need to split the annotations over all the different parts. I was considering just leaking the critical section into the ->atomic_commit_tail callback of each driver. But that would mean we need to pass the fence_cookie into each driver (there's a total of 13

[PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-04 Thread Daniel Vetter
Two in one go: - it is allowed to call dma_fence_wait() while holding a dma_resv_lock(). This is fundamental to how eviction works with ttm, so required. - it is allowed to call dma_fence_wait() from memory reclaim contexts, specifically from shrinker callbacks (which i915 does), and from

[PATCH 02/18] dma-buf: minor doc touch-ups

2020-06-04 Thread Daniel Vetter
Just some tiny edits: - fix link to struct dma_fence - give slightly more meaningful title - the polling here is about implicit fences, explicit fences (in sync_file or drm_syncobj) also have their own polling Signed-off-by: Daniel Vetter --- drivers/dma-buf/dma-buf.c | 6 +++--- 1 file

[PATCH 03/18] dma-fence: basic lockdep annotations

2020-06-04 Thread Daniel Vetter
Design is similar to the lockdep annotations for workers, but with some twists: - We use a read-lock for the execution/worker/completion side, so that this explicit annotation can be more liberally sprinkled around. With read locks lockdep isn't going to complain if the read-side isn't

[PATCH 10/18] drm/amdgpu: use dma-fence annotations in cs_submit()

2020-06-04 Thread Daniel Vetter
This is a bit tricky, since ->notifier_lock is held while calling dma_fence_wait we must ensure that also the read side (i.e. dma_fence_begin_signalling) is on the same side. If we mix this up lockdep complains, and that's again why we want to have these annotations. A nice side effect of this

[PATCH 00/18] dma-fence lockdep annotations, round 2

2020-06-04 Thread Daniel Vetter
Hi all, Still very much early stuff, still very much looking for initial thoughts and maybe some ideas how this could all be rolled out across drivers. Full intro probably best from the RFC cover letter: https://lore.kernel.org/amd-gfx/20200512085944.222637-1-daniel.vet...@ffwll.ch/ Changes

[PATCH 11/18] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code

2020-06-04 Thread Daniel Vetter
My dma-fence lockdep annotations caught an inversion because we allocate memory where we really shouldn't: kmem_cache_alloc+0x2b/0x6d0 amdgpu_fence_emit+0x30/0x330 [amdgpu] amdgpu_ib_schedule+0x306/0x550 [amdgpu] amdgpu_job_run+0x10f/0x260 [amdgpu]

[PATCH 08/18] drm/amdgpu: add dma-fence annotations to atomic commit path

2020-06-04 Thread Daniel Vetter
I need a canary in a ttm-based atomic driver to make sure the dma_fence_begin/end_signalling annotations actually work. Cc: linux-me...@vger.kernel.org Cc: linaro-mm-...@lists.linaro.org Cc: linux-r...@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-...@lists.freedesktop.org Cc: Chris

[PATCH 01/18] mm: Track mmu notifiers in fs_reclaim_acquire/release

2020-06-04 Thread Daniel Vetter
fs_reclaim_acquire/release nicely catch recursion issues when allocating GFP_KERNEL memory against shrinkers (which gpu drivers tend to use to keep the excessive caches in check). For mmu notifier recursions we do have lockdep annotations since 23b68395c7c7 ("mm/mmu_notifiers: add a lockdep map

[PATCH 14/18] drm/scheduler: use dma-fence annotations in tdr work

2020-06-04 Thread Daniel Vetter
In the face of unprivileged userspace being able to submit bogus gpu workloads the kernel needs gpu timeout and reset (tdr) to guarantee that dma_fences actually complete. Annotate this worker to make sure we don't have any accidental locking inversions or other problems lurking. Originally this

[PATCH 12/18] drm/amdgpu: DC also loves to allocate stuff where it shouldn't

2020-06-04 Thread Daniel Vetter
Not going to bother with a complete commit message, just offending backtrace: kvmalloc_node+0x47/0x80 dc_create_state+0x1f/0x60 [amdgpu] dc_commit_state+0xcb/0x9b0 [amdgpu] amdgpu_dm_atomic_commit_tail+0xd31/0x2010 [amdgpu] commit_tail+0xa4/0x140

RE: [PATCH] drm/amdgpu: support reserve bad page for virt

2020-06-04 Thread Yang, Stanley
[AMD Public Use] Thanks Tao, calling amdgpu_virt_init_err_handler_data once in amdgpu_virt_add_bad_page is also an option; I will check whether it has any potential risk. And I'll distinguish the message from the one in bare-metal RAS when reserving a page fails. Regards, Stanley > -Original