Hi Monk,
in general an interesting idea, but I see two major problems with that:
1. It would make the reset take much longer.
2. Things often get stuck because of timing issues, so a guilty job
might pass perfectly when run a second time.
Apart from that the whole ring mirror list turned
On 23.02.21 at 18:31, Alex Deucher wrote:
On Wed, Feb 10, 2021 at 8:14 AM Daniel Vetter wrote:
On Wed, Feb 10, 2021 at 08:45:56AM +0100, Christian König wrote:
Reviewed-by: Christian König for the series.
Smash it into -misc?
@Christian Koenig did these ever land? I don't see them in
Well, coding style cleanups are usually welcome, but not necessarily one
by one.
We can probably merge this if you clean up all checkpatch.pl warnings in
the whole file.
Christian.
On 26.02.21 at 07:05, wangjingyu wrote:
drm_property_create_range(rdev->ddev, 0 , "coherent", 0, 1);
By using the information provided by PMFW when available.
V2: put those structures shared around SMU V11 ASICs in
smu_v11_0.h
Change-Id: I1afec4cd07ac9608861ee07c449e320e3f36a932
Signed-off-by: Evan Quan
Acked-by: Alex Deucher
---
drivers/gpu/drm/amd/pm/inc/smu_v11_0.h| 10
To make sure they are naturally aligned.
V2: minimize the possible influence on existing applications which
were developed based on those data structures. With this change,
only 32-bit OSes are affected, while 64-bit OSes are not.
Change-Id: I0a139e1e1f09fe27deffdce1cec6ea4594947625
Signed-off-by:
[AMD Public Use]
Reviewed-by: Hawking Zhang
Regards,
Hawking
-Original Message-
From: Dennis Li
Sent: Friday, February 26, 2021 14:42
To: amd-gfx@lists.freedesktop.org; Chen, Guchun ; Zhang,
Hawking ; Koenig, Christian
Cc: Li, Dennis
Subject: [PATCH v2] drm/amdgpu: remove
If the number of bad page records exceeds the threshold, the driver has
already updated both the eeprom header and control->tbl_hdr.header before
the gpu reset, so the GPU recovery thread does not need to read the eeprom
header directly.
v2: merge amdgpu_ras_check_err_threshold into
amdgpu_ras_eeprom_check_err_threshold
New changes were involved for the SmuMetrics structure.
Change-Id: Ib45443db03977ccd18618bcfdfd3574ac13d50d1
Signed-off-by: Evan Quan
---
.../drm/amd/pm/inc/smu11_driver_if_navi10.h | 98 ++-
drivers/gpu/drm/amd/pm/inc/smu_v11_0.h| 6 +-
.../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
[AMD Public Use]
Hi all
The NAVI2X project has hit a really hard-to-solve issue, and it turns out to
be a general headache of our TDR mechanism. Consider the scenario below:
1. There is a job1 running on compute1 ring at timestamp
2. There is a job2 running on gfx ring at timestamp
3.
[AMD Official Use Only - Internal Distribution Only]
Reviewed-by: Evan Quan
-Original Message-
From: amd-gfx On Behalf Of Shirish S
Sent: Thursday, February 25, 2021 11:45 PM
To: Deucher, Alexander ;
amd-gfx@lists.freedesktop.org
Cc: S, Shirish
Subject: [PATCH] amdgpu/pm:
[AMD Public Use]
Acked-by: Evan Quan
-Original Message-
From: Horace Chen
Sent: Thursday, February 25, 2021 8:05 PM
To: amd-gfx@lists.freedesktop.org
Cc: Grodzovsky, Andrey ; Quan, Evan
; Chen, Horace ; Tuikov, Luben
; Koenig, Christian ; Deucher,
Alexander ; Xiao, Jack ; Zhang,
Hi, Hawking,
I agree with your suggestion, and it could further simplify our code. I
will refactor it again.
Best Regards
Dennis Li
-Original Message-
From: Zhang, Hawking
Sent: Friday, February 26, 2021 12:30 PM
To: Li, Dennis ; amd-gfx@lists.freedesktop.org; Chen, Guchun
;
[AMD Public Use]
What about merging this function with amdgpu_ras_check_err_threshold?
Regards,
Hawking
-Original Message-
From: Dennis Li
Sent: Friday, February 26, 2021 09:26
To: amd-gfx@lists.freedesktop.org; Chen, Guchun ; Zhang,
Hawking ; Koenig, Christian
Cc: Li, Dennis
[AMD Public Use]
Reviewed-by: Hawking Zhang
Regards,
Hawking
-Original Message-
From: amd-gfx On Behalf Of Kevin Wang
Sent: Friday, February 26, 2021 12:25
To: amd-gfx@lists.freedesktop.org
Cc: Wang, Kevin(Yang)
Subject: [PATCH] drm/amdgpu: remove unused variable in
[AMD Public Use]
Reviewed-by: Hawking Zhang
-Original Message-
From: Wang, Kevin(Yang)
Sent: Friday, February 26, 2021 12:24
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Li, Candice ;
Wang, Kevin(Yang)
Subject: [PATCH] drm/amdgpu: add RAP TA version print in
Clean up unused variable in amdgpu_dma_buf_unmap().
Fixes:
drm/amdgpu: Remove amdgpu_device arg from free_sgt api
Signed-off-by: Kevin Wang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4
1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
add RAP TA version print in amdgpu_firmware_info.
Signed-off-by: Kevin Wang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 8
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index ce031a77cda5..a5ed9530f542
If the number of bad page records exceeds the threshold, the driver has
already updated both the eeprom header and control->tbl_hdr.header before
the gpu reset, so the GPU recovery thread does not need to read the eeprom
header directly.
Signed-off-by: Dennis Li
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
Not used so remove them.
Fixes: d2d0c920a127 ("drm/amdgpu: Remove amdgpu_device arg from free_sgt api")
Cc: Ramesh Errabolu
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4
1 file changed, 4 deletions(-)
diff --git
On Thu, Feb 25, 2021 at 10:01 AM Arnd Bergmann wrote:
>
> From: Arnd Bergmann
>
> clang points out that the new logic uses an always-uninitialized
> array index:
>
> drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:9810:38: warning:
> variable 'i' is uninitialized when used here
On Thu, Feb 25, 2021 at 10:34 PM 'Nick Desaulniers' via Clang Built
Linux wrote:
> return parse_edid_cea(aconnector, edid_ext, EDID_LENGTH, vsdb_info) ? i :
> -ENODEV;
>
> would suffice, but the patch is still fine as is.
Right, I did not want to change more than necessary here, and the
[AMD Official Use Only - Internal Distribution Only]
Hi Arnd,
I have all the patches ready and I have tested them with the feature/platform
I'm working on and Bindu helped to test the 32bit build.
I'm in the process of submitting the latest change.
Thanks,
Vladimir.
On 2021-02-25, 4:36 PM, "Arnd
Problem: If the scheduler is already stopped by the time sched_entity
is released and the entity's job_queue is not empty, I encountered
a hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle
never becomes false.
Fix: In drm_sched_fini detach all sched_entities from the
scheduler's run
On Thu, Feb 25, 2021 at 3:33 PM Arnd Bergmann wrote:
>
> From: Arnd Bergmann
>
> The new display synchronization code caused a regression
> on all 32-bit architectures:
>
> ld.lld: error: undefined symbol: __aeabi_uldivmod
> >>> referenced by dce_clock_source.c
> >>>
> >>>
On Thu, Feb 25, 2021 at 4:19 AM Jiapeng Chong
wrote:
>
> Fix the following coccicheck warnings:
>
> ./drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp_cm.c:243:67-72:
> WARNING: conversion to bool not needed here.
>
> Reported-by: Abaci Robot
> Signed-off-by: Jiapeng Chong
Applied. Thanks!
On Thu, Feb 25, 2021 at 4:02 AM Yang Li wrote:
>
> Fix the following coccicheck warning:
> ./drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1589:0-23: WARNING:
> fops_ib_preempt should be defined with DEFINE_DEBUGFS_ATTRIBUTE
> ./drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1592:0-23: WARNING:
>
On 2021-02-25 1:42 p.m., Christian König wrote:
On 25.02.21 at 17:03, Andrey Grodzovsky wrote:
On 2021-02-25 2:53 a.m., Christian König wrote:
On 24.02.21 at 16:13, Andrey Grodzovsky wrote:
Ping
Sorry, I've been on vacation this week.
Andrey
On 2021-02-20 7:12 a.m., Andrey
On Tue, Feb 23, 2021 at 4:34 PM Jonathan Kim wrote:
>
> Request to stop DF performance counters is missing the actual write to the
> controller register.
>
> Reported-by: Chris Freehill
> Signed-off-by: Jonathan Kim
Acked-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 2 ++
On 25.02.21 at 21:16, Alex Deucher wrote:
This will be used by a new INFO ioctl query to fetch the decode
and encode capabilities from the kernel driver rather than
hardcoding them in mesa. This gives us more fine grained control
of capabilities using information that is only available in the
And just use the ioctl index. They are the same.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 9 -
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 16 -
drivers/gpu/drm/amd/amdgpu/cik.c| 12 ---
drivers/gpu/drm/amd/amdgpu/nv.c | 36
For each asic family. Will be used to populate tables
for the new INFO ioctl query.
v2: add max_pixels_per_frame to handle the portrait case
v3: fix copy paste typos
Reviewed-by: Leo Liu (v1)
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/cik.c | 75 ++
So mesa can check when to query the kernel vs use hardcoded
codec bandwidth data.
Reviewed-by: Leo Liu
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
We currently hardcode these in mesa, but querying them from
the kernel makes more sense since there may be board specific
limitations that the kernel driver is better suited to
determining.
Userspace patches that use this interface:
This will be used by a new INFO ioctl query to fetch the decode
and encode capabilities from the kernel driver rather than
hardcoding them in mesa. This gives us more fine grained control
of capabilities using information that is only available in the
kernel (e.g., platform limitations or
On 25.02.21 at 19:33, Felix Kuehling wrote:
[SNIP]
This in turn can lead to starvation of the work handler and so a
livelock as well.
I won't touch rptr or wptr at all for this.
Not sure what your idea is here, using ih->lock. Is it to completely
drain all IRQs until the IH ring is
From: Oak Zeng
With the 2-level gart page table, vram is squeezed into gart aperture
and FB aperture is disabled. Therefore all VRAM virtual addresses are
in the GART aperture. However currently PSP requires TMR addresses
in FB aperture. So we need some design change at PSP FW level to support
From: Dennis Li
When connected to a host via xGMI, system fatal errors may trigger a
warm reset, and the driver has no chance to query edc status before reset.
Therefore in this case, the driver should harvest the previous error logging
registers during boot, instead of only resetting them.
v2:
1. IP's
From: Felix Kuehling
This is needed for best machine learning performance. XNACK can still
be enabled per-process if needed.
Signed-off-by: Felix Kuehling
Reviewed-by: Alex Deucher
Reviewed-by: Philip Yang
Tested-by: Alex Sierra
Signed-off-by: Alex Deucher
---
From: Harish Kasiviswanathan
Signed-off-by: Harish Kasiviswanathan
Reviewed-by: Hawking Zhang
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 7 +++
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git
On 2021-02-25 at 1:32 p.m., Deucher, Alexander wrote:
>
> [AMD Official Use Only - Internal Distribution Only]
>
>
> I dropped the KFD debugger hunks and just added the gfx 9.4.2 changes
> since these were required for a bunch of later patches that build on
> that file that are not dependent on
On 25.02.21 at 17:03, Andrey Grodzovsky wrote:
On 2021-02-25 2:53 a.m., Christian König wrote:
On 24.02.21 at 16:13, Andrey Grodzovsky wrote:
Ping
Sorry, I've been on vacation this week.
Andrey
On 2021-02-20 7:12 a.m., Andrey Grodzovsky wrote:
On 2/20/21 3:38 AM, Christian König
On 2021-02-25 at 11:48 a.m., Christian König wrote:
>
>
> On 25.02.21 at 16:35, Felix Kuehling wrote:
>> On 2021-02-25 at 8:53 a.m., Christian König wrote:
>>>
>>> On 25.02.21 at 04:15, Felix Kuehling wrote:
On 2021-02-24 10:54 a.m., Kim, Jonathan wrote:
> [AMD Official Use Only -
[AMD Official Use Only - Internal Distribution Only]
I dropped the KFD debugger hunks and just added the gfx 9.4.2 changes since
these were required for a bunch of later patches that build on that file that
are not dependent on debugger. I can rework the commit message if you'd like.
Alex
On 2021-02-12 8:08 p.m., Aurabindo Pillai wrote:
[Why]
A seamless transition between modes can be performed if the new incoming
mode has the same timing parameters as the optimized mode on a display with a
variable vtotal min/max.
Smooth video playback usecases can be enabled with this seamless
On 25.02.21 at 03:49, Ramesh Errabolu wrote:
Currently callers have to provide handle of amdgpu_device,
which is not used by the implementation. It is unlikely this
parameter will become useful in future, thus removing it
Signed-off-by: Ramesh Errabolu
Reviewed-by: Christian König
---
On 25.02.21 at 16:35, Felix Kuehling wrote:
On 2021-02-25 at 8:53 a.m., Christian König wrote:
On 25.02.21 at 04:15, Felix Kuehling wrote:
On 2021-02-24 10:54 a.m., Kim, Jonathan wrote:
[AMD Official Use Only - Internal Distribution Only]
-Original Message-
From: Koenig,
This has been stable for a while.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 10 ++
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
On 2021-02-25 5:25 a.m., Daniel Vetter wrote:
On Wed, Feb 24, 2021 at 11:30:50AM -0500, Andrey Grodzovsky wrote:
On 2021-02-19 5:24 a.m., Daniel Vetter wrote:
On Thu, Feb 18, 2021 at 9:03 PM Andrey Grodzovsky
wrote:
Looked a bit into it, I want to export sync_object to FD and import from
On 2021-02-25 2:53 a.m., Christian König wrote:
On 24.02.21 at 16:13, Andrey Grodzovsky wrote:
Ping
Sorry, I've been on vacation this week.
Andrey
On 2021-02-20 7:12 a.m., Andrey Grodzovsky wrote:
On 2/20/21 3:38 AM, Christian König wrote:
On 18.02.21 at 17:41, Andrey
Report -ENOTSUPP instead of -EINVAL, so that if userspace
fails to read sensor data it can correctly identify the failure.
Signed-off-by: Shirish S
---
drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c | 2 +-
drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c | 2 +-
On 2021-02-25 at 8:53 a.m., Christian König wrote:
>
>
> On 25.02.21 at 04:15, Felix Kuehling wrote:
>> On 2021-02-24 10:54 a.m., Kim, Jonathan wrote:
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
-Original Message-
From: Koenig, Christian
Sent: Wednesday,
From: Arnd Bergmann
clang points out that the new logic uses an always-uninitialized
array index:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:9810:38: warning:
variable 'i' is uninitialized when used here [-Wuninitialized]
timing = &edid->detailed_timings[i];
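
For illustration only, a minimal sketch of the pattern behind this warning. This is not the amdgpu_dm.c code; `struct timing_sketch` and `first_valid_timing()` are hypothetical names made up for the example. The point is just that the index gets a defined value before its first use.

```c
#include <stddef.h>

/* Hypothetical stand-in for an EDID detailed timing entry. */
struct timing_sketch {
	int pixel_clock;
};

/*
 * The pattern clang warns about looks like this:
 *
 *     int i;
 *     timing = &timings[i];    // 'i' is read before any assignment
 *
 * Here the loop initializes the index before it is ever used.
 */
static const struct timing_sketch *
first_valid_timing(const struct timing_sketch *timings, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (timings[i].pixel_clock != 0)
			return &timings[i];
	}
	return NULL; /* no usable entry */
}
```

Returning NULL for the empty case is a choice made for this sketch; the actual patch may handle it differently.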
[AMD Official Use Only - Internal Distribution Only]
Acked-by: Alex Deucher
From: Horace Chen
Sent: Thursday, February 25, 2021 7:04 AM
To: amd-gfx@lists.freedesktop.org
Cc: Grodzovsky, Andrey ; Quan, Evan
; Chen, Horace ; Tuikov, Luben
; Koenig, Christian ;
From: Arnd Bergmann
The new display synchronization code caused a regression
on all 32-bit architectures:
ld.lld: error: undefined symbol: __aeabi_uldivmod
>>> referenced by dce_clock_source.c
>>>
>>> gpu/drm/amd/display/dc/dce/dce_clock_source.o:(get_pixel_clk_frequency_100hz)
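
Some background, as a hedged sketch: on 32-bit ARM a plain `/` on 64-bit operands is compiled into a call to a runtime helper such as `__aeabi_uldivmod`, which the kernel does not link against. Kernel code normally goes through `div_u64()` from `<linux/math64.h>` instead. The userspace stand-in below only mirrors that helper's shape; `div_u64_sketch()` and `clk_hz_to_100hz_sketch()` are hypothetical names, and the conversion is not taken from `get_pixel_clk_frequency_100hz`.

```c
#include <stdint.h>

/*
 * Userspace stand-in (hypothetical name) with the same shape as the
 * kernel's div_u64(u64 dividend, u32 divisor). It is plain division here
 * so the sketch runs anywhere; in the kernel the real helper is what
 * avoids the compiler-generated runtime call.
 */
static inline uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Illustrative only: convert a clock in Hz to units of 100 Hz. */
static inline uint32_t clk_hz_to_100hz_sketch(uint64_t clk_hz)
{
	return (uint32_t)div_u64_sketch(clk_hz, 100);
}
```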
Fix the following coccicheck warnings:
./drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp_cm.c:243:67-72:
WARNING: conversion to bool not needed here.
Reported-by: Abaci Robot
Signed-off-by: Jiapeng Chong
---
drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp_cm.c | 2 +-
1 file changed, 1
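
As a small illustration of what this coccicheck warning is about (not the dcn30_dpp_cm.c code itself; both function names are hypothetical): a comparison already yields a boolean, so wrapping it in `? true : false` adds nothing.

```c
#include <stdbool.h>

/* Flagged style: the ternary is redundant, the comparison is already 0 or 1. */
static bool mode_matches_verbose(int mode, int wanted)
{
	return (mode == wanted) ? true : false;
}

/* Equivalent, simpler form. */
static bool mode_matches_simple(int mode, int wanted)
{
	return mode == wanted;
}
```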
On 25.02.21 at 04:15, Felix Kuehling wrote:
On 2021-02-24 10:54 a.m., Kim, Jonathan wrote:
[AMD Official Use Only - Internal Distribution Only]
-Original Message-
From: Koenig, Christian
Sent: Wednesday, February 24, 2021 4:17 AM
To: Kim, Jonathan ; amd-
g...@lists.freedesktop.org
The whole patch set needs a rebase since the TTM_PL_FLAG_* for
controlling the caching doesn't exist any more upstream.
How should we approach that?
Thanks,
Christian.
On 24.02.21 at 23:17, Alex Deucher wrote:
From: Oak Zeng
If TTM placement flag is cached, buffer is intended to be
On Wed, Feb 24, 2021 at 11:30:50AM -0500, Andrey Grodzovsky wrote:
>
> On 2021-02-19 5:24 a.m., Daniel Vetter wrote:
> > On Thu, Feb 18, 2021 at 9:03 PM Andrey Grodzovsky
> > wrote:
> > > Looked a bit into it, I want to export sync_object to FD and import from
> > > that FD
> > > such that I
On 25.02.21 at 10:16, Jingwen Chen wrote:
Move gpu_reset_counter after drm_sched_stop to avoid race
condition caused by job submitted between reset_count +1 and
drm_sched_stop.
Signed-off-by: Jingwen Chen
Reviewed-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3
Move gpu_reset_counter after drm_sched_stop to avoid race
condition caused by job submitted between reset_count +1 and
drm_sched_stop.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git
Fix the following coccicheck warning:
./drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1589:0-23: WARNING:
fops_ib_preempt should be defined with DEFINE_DEBUGFS_ATTRIBUTE
./drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1592:0-23: WARNING:
fops_sclk_set should be defined with DEFINE_DEBUGFS_ATTRIBUTE
Hi,
On Wed, Feb 24, 2021 at 12:33:45PM +0100, Thomas Zimmermann wrote:
> Hi Maxime,
>
> for the whole series:
>
> Acked-by: Thomas Zimmermann
Applied the whole series, thanks to everyone involved in the review,
it's been a pretty daunting one :)
Maxime
[AMD Official Use Only - Internal Distribution Only]
Yeah, that sounds better than original fix
Thanks Christian
--
Monk Liu | Cloud-GPU Core team
--
-Original Message-
From: Koenig, Christian
Sent:
Good catch, but the approach for the fix is incorrect.
The device reset count can only be incremented after taking the reset
lock and stopping the scheduler, otherwise a whole bunch of different
race conditions can happen.
Christian.
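
The ordering described above could be sketched roughly as follows. This is a simplified userspace model, not amdgpu code: `struct dev_sketch`, `begin_reset_sketch()` and `end_reset_sketch()` are hypothetical, an `atomic_flag` stands in for the reset lock, and a plain flag stands in for the stopped scheduler.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device state; none of these names come from amdgpu. */
struct dev_sketch {
	atomic_flag in_reset;    /* stands in for the reset lock */
	bool sched_stopped;      /* stands in for a stopped scheduler */
	atomic_int reset_count;  /* stands in for the reset counter */
};

/* Returns false if another reset already holds the lock. */
static bool begin_reset_sketch(struct dev_sketch *dev)
{
	/* 1. Take the reset lock first. */
	if (atomic_flag_test_and_set(&dev->in_reset))
		return false;

	/* 2. Stop the scheduler so no new job can slip in. */
	dev->sched_stopped = true;

	/* 3. Only now bump the counter. */
	atomic_fetch_add(&dev->reset_count, 1);
	return true;
}

static void end_reset_sketch(struct dev_sketch *dev)
{
	dev->sched_stopped = false;
	atomic_flag_clear(&dev->in_reset);
}
```

With this order, a caller that loses the race for the lock never touches the counter, and no job can be submitted between the increment and the scheduler stop.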
On 25.02.21 at 08:56, Chen, JingWen wrote:
[AMD
64 matches