Changed from V1:
rename some function names, only init ras error handler data for
supported asic.
Changed from V2:
fix potential memory leak.
Signed-off-by: Stanley.Yang
Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
[AMD Official Use Only - Internal Distribution Only]
-----Original Message-----
From: Alex Deucher
Sent: Friday, June 5, 2020 5:07 AM
To: Quan, Evan
Cc: amd-gfx list ; Deucher, Alexander
Subject: Re: [PATCH 12/16] drm/amd/powerplay: better namings
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan
Changed from V1:
rename some function names, only init ras error handler data for
supported asic.
Changed from V2:
fix potential memory leak.
Signed-off-by: Stanley.Yang
Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |
[AMD Public Use]
Hi Tao,
Thanks for your suggestion and reply inline.
Regards,
Stanley
> -----Original Message-----
> From: Zhou1, Tao
> Sent: Friday, June 5, 2020 11:00 AM
> To: Yang, Stanley ; amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Chen, Guchun
> ; Liu, Monk ;
[AMD Public Use]
I would not suggest explicitly calling out SRIOV in the kernel message. That
just confuses people. It doesn't matter that the message shares the same format with
the bare-metal one -- We haven't made a unified amdgpu driver to support both host
and guest for bare-metal and sriov use
[AMD Public Use]
Thanks GuChun,
Will fix potential memory leak and typo.
Regards,
Stanley
> -----Original Message-----
> From: Chen, Guchun
> Sent: Friday, June 5, 2020 10:24 AM
> To: Yang, Stanley ; amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Liu, Monk
> ; Clements, John ; Zhou1,
>
[AMD Public Use]
> -----Original Message-----
> From: Stanley.Yang
> Sent: June 4, 2020 20:36
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Chen, Guchun
> ; Liu, Monk ; Clements,
> John ; Zhou1, Tao ; Li,
> Dennis ; Yang, Stanley
> Subject: [PATCH V2] drm/amdgpu: support reserve bad
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote:
>
> Instead of disabling and reenabling it later.
>
> Change-Id: I90775202178f3b7695f42f39ce240bbfd51a1346
> Signed-off-by: Evan Quan
Acked-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 72 ++
> 1
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote:
>
> Since it is only several bytes in size.
I think the subject and description should be clarified a bit. We are
not allocating it on the stack. We are just moving the object to the
smu structure allocation rather than allocating it dynamically
On Thu, Jun 4, 2020 at 5:07 PM Alex Deucher wrote:
>
> On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote:
> >
> > And some minor changes, such as dropping an unused parameter and labeling
> > internally used APIs as static.
> >
> > Change-Id: I0af0aea029dc4fc7d8e150ab6ec984e9a5f1a74a
> > Signed-off-by: Evan Quan
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote:
>
> Thus redundant code can be dropped.
>
> Change-Id: I672f84ed5856da53b7f8f915b2f24ca11cd4b228
> Signed-off-by: Evan Quan
Clarify subject:
drm/amd/powerplay: maximize code sharing between .hw_fini and .suspend
With that fixed:
Reviewed-by:
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote:
>
> Those common operations (for all ASICs) are placed first, followed
> by ASIC specific ones, while the display related ones are placed last.
>
> Change-Id: Id45caee98273c8c0b9c1c9f2713fcf8106e02000
> Signed-off-by: Evan Quan
Typo in the
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote:
>
> Then redundant code can be dropped.
>
> Change-Id: Icbafbb7ffc8189a09f4236786aea6702ee73f9f4
> Signed-off-by: Evan Quan
Subject could be clarified as:
drm/amd/powerplay: maximize code sharing between .hw_init and .resume
With that fixed:
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote:
>
> Since smu_smc_table_hw_init() is needed for both .hw_init and .resume.
> By doing this, we can drop unnecessary operations on resume.
>
> Change-Id: I2af6277efaa9adba2de69161e20e54c4aa10a411
> Signed-off-by: Evan Quan
Reviewed-by: Alex
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote:
>
> So that we do not need to perform those unnecessary operations again on
> resume.
>
> Change-Id: I90f8a8d68762b5f88d7477934128a17bf67e3341
> Signed-off-by: Evan Quan
For the patch subject, I think it would be clearer as:
drm/amd/powerplay:
On Thu, Jun 4, 2020 at 12:47 AM Evan Quan wrote:
>
> So that code can be shared between .hw_fini and .suspend.
>
> Change-Id: I4a0eeb7cdecbf5b24fac3d0fe1d8fcb1ca9f0b0a
> Signed-off-by: Evan Quan
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 178
On Thu, Jun 4, 2020 at 12:46 AM Evan Quan wrote:
>
> Minor code cleanups.
>
> Change-Id: I6d240241e78cae17288c1d49dbae6ab1796b1128
> Signed-off-by: Evan Quan
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/powerplay/amdgpu_smu.c| 74 ---
>
On Thu, Jun 4, 2020 at 12:46 AM Evan Quan wrote:
>
> By moving ASIC specific code into its own file.
You might want to clarify that the macros check if the asic has the
callback, so no need to explicitly check. With that fixed:
Reviewed-by: Alex Deucher
>
> Change-Id:
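As a rough illustration of the review comment above (the macro itself checks whether the ASIC provides the callback, so callers need no explicit check), here is a minimal user-space sketch. Every name in it (struct smu_ctx, struct asic_funcs, smu_set_power, demo_set_power) is hypothetical and chosen for illustration only; it is not the powerplay code from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct smu_ctx;

/* Hypothetical per-ASIC callback table; an ASIC may leave entries NULL. */
struct asic_funcs {
	int (*set_power)(struct smu_ctx *smu, int level);
};

/* Hypothetical context object holding the callback table. */
struct smu_ctx {
	const struct asic_funcs *funcs;
	int last_level;
};

/*
 * The wrapper macro tests for the callback itself and evaluates to 0
 * (treated here as "nothing to do") when the ASIC does not implement it.
 */
#define smu_set_power(smu, level) \
	((smu)->funcs->set_power ? (smu)->funcs->set_power((smu), (level)) : 0)

/* Example callback for an ASIC that does implement the hook. */
static int demo_set_power(struct smu_ctx *smu, int level)
{
	assert(smu != NULL);
	smu->last_level = level;
	return 0;
}

static const struct asic_funcs demo_funcs = { .set_power = demo_set_power };
static const struct asic_funcs empty_funcs = { .set_power = NULL };
```

With this shape a caller writes smu_set_power(&ctx, 3) unconditionally. On a context using empty_funcs the call is a no-op that evaluates to 0, so the "if the callback exists" test lives in one place instead of at every call site.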
Applied. Thanks!
Alex
On Thu, Jun 4, 2020 at 6:35 AM Colin King wrote:
>
> From: Colin Ian King
>
> There is a spelling mistake in a dml_print message. Fix it.
>
> Signed-off-by: Colin Ian King
> ---
> drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c | 2 +-
> 1 file changed,
Hi Dave, Daniel,
Fixes for 5.8.
The following changes since commit 9ca1f474cea0edc14a1d7ec933e5472c0ff115d3:
Merge tag 'amd-drm-next-5.8-2020-05-27' of
git://people.freedesktop.org/~agd5f/linux into drm-next (2020-05-28 16:10:17
+1000)
are available in the Git repository at:
On Wed, Jun 3, 2020 at 5:39 AM Takashi Iwai wrote:
>
> On Wed, 03 Jun 2020 03:31:37 +0200,
> Alex Deucher wrote:
> >
> > From: Hersen Wu
> >
> > dp/hdmi ati hda is not shown in audio settings
> >
> > Signed-off-by: Hersen Wu
> > Reviewed-by: Alex Deucher
> > Signed-off-by: Alex Deucher
>
>
From: Colin Ian King
There is a spelling mistake in a dml_print message. Fix it.
Signed-off-by: Colin Ian King
---
drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
Changed from V1:
rename some function names, only init ras error handler data for
supported asic.
Signed-off-by: Stanley.Yang
Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c |
On Thu, Jun 4, 2020 at 11:27 AM Chris Wilson wrote:
>
> Quoting Daniel Vetter (2020-06-04 10:21:46)
> > On Thu, Jun 4, 2020 at 10:57 AM Thomas Hellström (Intel)
> > wrote:
> > >
> > >
> > > On 6/4/20 10:12 AM, Daniel Vetter wrote:
> > > ...
> > > > Thread A:
> > > >
> > > > mutex_lock(A);
Quoting Daniel Vetter (2020-06-04 10:21:46)
> On Thu, Jun 4, 2020 at 10:57 AM Thomas Hellström (Intel)
> wrote:
> >
> >
> > On 6/4/20 10:12 AM, Daniel Vetter wrote:
> > ...
> > > Thread A:
> > >
> > > mutex_lock(A);
> > > mutex_unlock(A);
> > >
> > > dma_fence_signal();
> > >
>
On Thu, Jun 4, 2020 at 10:57 AM Thomas Hellström (Intel)
wrote:
>
>
> On 6/4/20 10:12 AM, Daniel Vetter wrote:
> ...
> > Thread A:
> >
> > mutex_lock(A);
> > mutex_unlock(A);
> >
> > dma_fence_signal();
> >
> > Thread B:
> >
> > mutex_lock(A);
> > dma_fence_wait();
>
On 6/4/20 10:12 AM, Daniel Vetter wrote:
...
Thread A:
mutex_lock(A);
mutex_unlock(A);
dma_fence_signal();
Thread B:
mutex_lock(A);
dma_fence_wait();
mutex_unlock(A);
Thread B is blocked on A signalling the fence, but A never gets around
to
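The Thread A / Thread B scenario above can be modelled as a cycle in a small dependency graph, which is roughly the kind of check lockdep performs. The sketch below is a user-space toy under that assumption, not kernel code; the names dep_graph, dep_add and dep_would_deadlock are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Two things a thread can hold or wait for in the example. */
enum { NODE_MUTEX_A, NODE_FENCE, NODE_COUNT };

/* edge[x][y] means: some path acquires/waits on y while holding x. */
struct dep_graph {
	bool edge[NODE_COUNT][NODE_COUNT];
};

/* Record that `wanted` was acquired or waited on while `held` was held. */
static void dep_add(struct dep_graph *g, int held, int wanted)
{
	assert(held >= 0 && held < NODE_COUNT);
	assert(wanted >= 0 && wanted < NODE_COUNT);
	g->edge[held][wanted] = true;
}

/* Depth-first search: is `to` reachable from `from` via recorded edges? */
static bool dep_reaches(const struct dep_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;
	for (int n = 0; n < NODE_COUNT; n++)
		if (g->edge[from][n] && dep_reaches(g, n, to, seen))
			return true;
	return false;
}

/*
 * Waiting on `wanted` while holding `held` can deadlock if some recorded
 * path already needs `held` in order to complete `wanted`.
 */
static bool dep_would_deadlock(const struct dep_graph *g, int held, int wanted)
{
	bool seen[NODE_COUNT] = { false };

	return dep_reaches(g, wanted, held, seen);
}
```

In terms of the example: Thread A needs mutex A before it can signal, recorded as dep_add(&g, NODE_FENCE, NODE_MUTEX_A). Thread B then waits on the fence while holding A, and dep_would_deadlock(&g, NODE_MUTEX_A, NODE_FENCE) reports the inversion even if the two threads never actually race at runtime.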
i915 does tons of allocations from this worker, which lockdep catches.
Also generic infrastructure like this with big potential for how
dma_fence or other cross driver contracts work, really should be
reviewed on dri-devel. Implementing custom wheels for everything
within the driver is a classic
To improve coverage also annotate the gpu reset code itself, since
that's called from other places than drm/scheduler (which is already
annotated). Annotations nest, so this doesn't break anything, and
allows easier testing.
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Cc:
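The point that annotations nest can be sketched with a single-threaded user-space model of a begin/end pair that passes a cookie. This is only a stand-in: the model_* names are hypothetical, the depth counter replaces the lockdep state, and the real dma_fence_begin_signalling()/dma_fence_end_signalling() implementation is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for "is the signalling section currently open?". */
static int model_signalling_depth;

/* Returns true (the cookie) if we were already inside a section. */
static bool model_begin_signalling(void)
{
	if (model_signalling_depth > 0)
		return true;	/* nested: nothing new is opened */
	model_signalling_depth = 1;
	return false;
}

/* Only the outermost caller (cookie == false) closes the section. */
static void model_end_signalling(bool cookie)
{
	assert(model_signalling_depth > 0);	/* end without begin is a bug */
	if (cookie)
		return;
	model_signalling_depth = 0;
}

static bool model_in_signalling(void)
{
	return model_signalling_depth > 0;
}

/* Inner path, e.g. a reset routine that is annotated on its own. */
static bool model_gpu_reset(void)
{
	bool cookie = model_begin_signalling();
	bool inside = model_in_signalling();

	model_end_signalling(cookie);
	return inside;
}

/* Outer path that is already annotated and calls the inner one. */
static bool model_timeout_handler(void)
{
	bool cookie = model_begin_signalling();

	model_gpu_reset();
	bool still_inside = model_in_signalling();

	model_end_signalling(cookie);
	return still_inside;
}
```

In this model the inner routine can be called directly or from the already annotated outer path. In the nested case its end call leaves the outer section open, which is why annotating both levels does not break anything.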
Trying to grab dma_resv_lock while in commit_tail before we've done
all the code that leads to the eventual signalling of the vblank event
(which can be a dma_fence) is deadlock-y. Don't do that.
Here the solution is easy because just grabbing locks to read
something races anyway. We don't need
This is one from the department of "maybe play lottery if you hit
this, karma compensation might work". Or at least lockdep ftw!
This reverts commit 565d1941557756a584ac357d945bc374d5fcd1d0.
It's not quite as low-risk as the commit message claims, because this
grabs console_lock, which might be
...
I think it's time to stop this little exercise.
The lockdep splat, for the record:
[ 132.583381] ==
[ 132.584091] WARNING: possible circular locking dependency detected
[ 132.584775] 5.7.0-rc3+ #346 Tainted: GW
[ 132.585461]
This is needed to signal the fences from page flips, annotate it
accordingly. We need to annotate entire timer callback since if we get
stuck anywhere in there, then the timer stops, and hence fences stop.
Just annotating the top part that does the vblank handling isn't
enough.
Cc:
If the scheduler rt thread gets stuck on a mutex that we're holding
while waiting for gpu workloads to complete, we have a problem.
Add dma-fence annotations so that lockdep can check this for us.
I've tried to quite carefully review this, and I think it's at the
right spot. But obviously no
This is rather overkill since currently all drivers call this from
hardirq (or at least timers). But maybe in the future we're going to
have thread irq handlers and what not, doesn't hurt to be prepared.
Plus this is an easy start for sprinkling these fence annotations into
shared code.
Cc:
This is a bit disappointing since we need to split the annotations
over all the different parts.
I was considering just leaking the critical section into the
->atomic_commit_tail callback of each driver. But that would mean we
need to pass the fence_cookie into each driver (there's a total of 13
Two in one go:
- it is allowed to call dma_fence_wait() while holding a
dma_resv_lock(). This is fundamental to how eviction works with ttm,
so required.
- it is allowed to call dma_fence_wait() from memory reclaim contexts,
specifically from shrinker callbacks (which i915 does), and from
Just some tiny edits:
- fix link to struct dma_fence
- give slightly more meaningful title - the polling here is about
implicit fences, explicit fences (in sync_file or drm_syncobj) also
have their own polling
Signed-off-by: Daniel Vetter
---
drivers/dma-buf/dma-buf.c | 6 +++---
1 file
Design is similar to the lockdep annotations for workers, but with
some twists:
- We use a read-lock for the execution/worker/completion side, so that
this explicit annotation can be more liberally sprinkled around.
With read locks lockdep isn't going to complain if the read-side
isn't
This is a bit tricky, since ->notifier_lock is held while calling
dma_fence_wait we must ensure that also the read side (i.e.
dma_fence_begin_signalling) is on the same side. If we mix this up
lockdep complains, and that's again why we want to have these
annotations.
A nice side effect of this
Hi all,
Still very much early stuff, still very much looking for initial thoughts
and maybe some ideas how this could all be rolled out across drivers.
Full intro probably best from the RFC cover letter:
https://lore.kernel.org/amd-gfx/20200512085944.222637-1-daniel.vet...@ffwll.ch/
Changes
My dma-fence lockdep annotations caught an inversion because we
allocate memory where we really shouldn't:
kmem_cache_alloc+0x2b/0x6d0
amdgpu_fence_emit+0x30/0x330 [amdgpu]
amdgpu_ib_schedule+0x306/0x550 [amdgpu]
amdgpu_job_run+0x10f/0x260 [amdgpu]
I need a canary in a ttm-based atomic driver to make sure the
dma_fence_begin/end_signalling annotations actually work.
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Cc: linux-r...@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-...@lists.freedesktop.org
Cc: Chris
fs_reclaim_acquire/release nicely catch recursion issues when
allocating GFP_KERNEL memory against shrinkers (which gpu drivers tend
to use to keep the excessive caches in check). For mmu notifier
recursions we do have lockdep annotations since 23b68395c7c7
("mm/mmu_notifiers: add a lockdep map
In the face of unprivileged userspace being able to submit bogus gpu
workloads the kernel needs gpu timeout and reset (tdr) to guarantee
that dma_fences actually complete. Annotate this worker to make sure
we don't have any accidental locking inversions or other problems
lurking.
Originally this
Not going to bother with a complete commit message, just
offending backtrace:
kvmalloc_node+0x47/0x80
dc_create_state+0x1f/0x60 [amdgpu]
dc_commit_state+0xcb/0x9b0 [amdgpu]
amdgpu_dm_atomic_commit_tail+0xd31/0x2010 [amdgpu]
commit_tail+0xa4/0x140
[AMD Public Use]
Thanks Tao. Calling amdgpu_virt_init_err_handler_data once in
amdgpu_virt_add_bad_page is also an option; I will check whether it has any potential
risk.
And I'll distinguish the message from the one in bare-metal RAS when
reserving a page fails.
Regards,
Stanley
> -Original
47 matches