Re: [RFC 02/17] dma-fence: basic lockdep annotations

2020-05-28 Thread Daniel Vetter
On Thu, May 28, 2020 at 11:54 PM Luben Tuikov  wrote:
>
> On 2020-05-12 4:59 a.m., Daniel Vetter wrote:
> > Design is similar to the lockdep annotations for workers, but with
> > some twists:
> >
> > - We use a read-lock for the execution/worker/completion side, so that
> >   this explicit annotation can be more liberally sprinkled around.
> >   With read locks lockdep isn't going to complain if the read-side
> >   isn't nested the same way under all circumstances, so ABBA deadlocks
> >   are ok. Which they are, since this is an annotation only.
> >
> > - We're using non-recursive lockdep read lock mode, since in recursive
> >   read lock mode lockdep does not catch read side hazards. And we
> >   _very_ much want read side hazards to be caught. For full details of
> >   this limitation see
> >
> >   commit e91498589746065e3ae95d9a00b068e525eec34f
> >   Author: Peter Zijlstra 
> >   Date:   Wed Aug 23 13:13:11 2017 +0200
> >
> >   locking/lockdep/selftests: Add mixed read-write ABBA tests
> >
> > - To allow nesting of the read-side explicit annotations we explicitly
> >   keep track of the nesting. lock_is_held() allows us to do that.
> >
> > - The wait-side annotation is a write lock, and entirely done within
> >   dma_fence_wait() for everyone by default.
> >
> > - To be able to freely annotate helper functions I want to make it ok
> >   to call dma_fence_begin/end_signalling from soft/hardirq context.
> >   First attempt was using the hardirq locking context for the write
> >   side in lockdep, but this forces all normal spinlocks nested within
> >   dma_fence_begin/end_signalling to be spinlocks. That bollocks.
> >
> >   The approach now is to simply check in_atomic(), and for these cases
> >   entirely rely on the might_sleep() check in dma_fence_wait(). That
> >   will catch any wrong nesting against spinlocks from soft/hardirq
> >   contexts.
> >
> > The idea here is that every code path that's critical for eventually
> > signalling a dma_fence should be annotated with
> > dma_fence_begin/end_signalling. The annotation ideally starts right
> > after a dma_fence is published (added to a dma_resv, exposed as a
> > sync_file fd, attached to a drm_syncobj fd, or anything else that
> > makes the dma_fence visible to other kernel threads), up to and
> > including the dma_fence_wait(). Examples are irq handlers, the
> > scheduler rt threads, the tail of execbuf (after the corresponding
> > fences are visible), any workers that end up signalling dma_fences and
> > really anything else. Not annotated should be code paths that only
> > complete fences opportunistically as the gpu progresses, like e.g.
> > shrinker/eviction code.
> >
> > The main class of deadlocks this is supposed to catch are:
> >
> > Thread A:
> >
> >   mutex_lock(A);
> >   mutex_unlock(A);
> >
> >   dma_fence_signal();
> >
> > Thread B:
> >
> >   mutex_lock(A);
> >   dma_fence_wait();
> >   mutex_unlock(A);
> >
> > Thread B is blocked on A signalling the fence, but A never gets around
> > to that because it cannot acquire the lock A.
> >
> > Note that dma_fence_wait() is allowed to be nested within
> > dma_fence_begin/end_signalling sections. To allow this to happen the
> > read lock needs to be upgraded to a write lock, which means that if any
> > other lock is acquired between the dma_fence_begin_signalling() call and
> > the call to dma_fence_wait(), and is still held, this will result in an
> > immediate lockdep complaint. The only other option would be to not
> > annotate such calls, defeating the point. Therefore these annotations
> > cannot be sprinkled over the code entirely mindlessly to avoid false
> > positives.
> >
> > v2: handle soft/hardirq ctx better against write side and dont forget
> > EXPORT_SYMBOL, drivers can't use this otherwise.
> >
> > Cc: linux-me...@vger.kernel.org
> > Cc: linaro-mm-...@lists.linaro.org
> > Cc: linux-r...@vger.kernel.org
> > Cc: amd-gfx@lists.freedesktop.org
> > Cc: intel-...@lists.freedesktop.org
> > Cc: Chris Wilson 
> > Cc: Maarten Lankhorst 
> > Cc: Christian König 
> > Signed-off-by: Daniel Vetter 
> > ---
> >  drivers/dma-buf/dma-fence.c | 53 +
> >  include/linux/dma-fence.h   | 12 +
> >  2 files changed, 65 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index 6802125349fb..d5c0fd2efc70 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -110,6 +110,52 @@ u64 dma_fence_context_alloc(unsigned num)
> >  }
> >  EXPORT_SYMBOL(dma_fence_context_alloc);
> >
> > +#ifdef CONFIG_LOCKDEP
> > +struct lockdep_map   dma_fence_lockdep_map = {
> > + .name = "dma_fence_map"
> > +};
> > +
> > +bool dma_fence_begin_signalling(void)
> > +{
> > + /* explicitly nesting ... */
> > + if (lock_is_held_type(&dma_fence_lockdep_map, 1))
> > + return true;
> > +
> > + /* rely on might_sleep check for soft/hardirq locks */
> > + if (in_atomic())
> > + return true;
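
For readers trying to picture the intended usage: below is a minimal,
illustrative sketch of how a driver worker might use these annotations. It is
not part of the patch; the my_job structure and my_hw_submit() are made-up
names, and it assumes dma_fence_end_signalling() takes the cookie returned by
dma_fence_begin_signalling(), as in the complete patch.

#include <linux/dma-fence.h>
#include <linux/workqueue.h>

/* Hypothetical job structure; only ->fence matters here. */
struct my_job {
	struct work_struct work;
	struct dma_fence *fence;
};

static void my_hw_submit(struct my_job *job)
{
	/* ring programming elided; must not wait on other fences */
}

static void my_job_worker(struct work_struct *work)
{
	struct my_job *job = container_of(work, struct my_job, work);
	bool cookie;

	/*
	 * The fence was published before this worker was queued, so
	 * everything up to dma_fence_signal() is on the signalling
	 * critical path: no allocations that can recurse into reclaim,
	 * no locks that are held anywhere while waiting for fences.
	 */
	cookie = dma_fence_begin_signalling();

	my_hw_submit(job);
	dma_fence_signal(job->fence);

	dma_fence_end_signalling(cookie);
}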

[PATCH] drm/amdgpu: enable renoir discovery for gc info retrieved

2020-05-28 Thread Prike.Liang
Use the IP discovery GC table instead of the gpu info firmware
for exporting gpu info to the inquiry interface. As the Renoir
discovery table has the same version as Navi1x, enable it the
same way as Navi1x.

Signed-off-by: Prike.Liang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2f0e8da..bff740ccd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1528,7 +1528,7 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
 {
	const char *chip_name;
	char fw_name[30];
-	int err;
+	int err, r;
	const struct gpu_info_firmware_header_v1_0 *hdr;
 
	adev->firmware.gpu_info_fw = NULL;
@@ -1578,6 +1578,23 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
		chip_name = "arcturus";
		break;
	case CHIP_RENOIR:
+		if (amdgpu_discovery) {
+			/**
+			 * For the RENOIR series it seems we needn't reinitialize the reg base again as it is already set during
+			 * early init; if there is any concern here, amdgpu_discovery_init() will need to be exported for this case.
+			 */
+			r = amdgpu_discovery_reg_base_init(adev);
+			if (r) {
+				DRM_WARN("failed to get ip discovery table, "
+					 "fallback to get gpu info in legacy method\n");
+				goto legacy_gpuinfo;
+			}
+
+			amdgpu_discovery_get_gfx_info(adev);
+
+			return 0;
+		}
+legacy_gpuinfo:
		chip_name = "renoir";
		break;
	case CHIP_NAVI10:
@@ -1617,7 +1634,7 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
		(const struct gpu_info_firmware_v1_0 *)(adev->firmware.gpu_info_fw->data +
			le32_to_cpu(hdr->header.ucode_array_offset_bytes));
 
-	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10) {
+	if (amdgpu_discovery && adev->asic_type >= CHIP_RENOIR && !r) {
		amdgpu_discovery_get_gfx_info(adev);
		goto parse_soc_bounding_box;
	}
@@ -3364,7 +3381,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
	if (IS_ENABLED(CONFIG_PERF_EVENTS))
		amdgpu_pmu_fini(adev);
-	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
+	if (amdgpu_discovery && adev->asic_type >= CHIP_RENOIR)
		amdgpu_discovery_fini(adev);
 }
 
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
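
Stripped of diff context, the control flow the patch aims for looks roughly
like the sketch below. This is a simplified rendering of the same logic, using
only functions that appear in the patch, not a replacement for it:

	case CHIP_RENOIR:
		if (amdgpu_discovery) {
			/* prefer the IP discovery GC table */
			r = amdgpu_discovery_reg_base_init(adev);
			if (!r) {
				amdgpu_discovery_get_gfx_info(adev);
				return 0;	/* gpu_info firmware not needed */
			}
			/* discovery failed: fall back to the legacy path */
		}
		chip_name = "renoir";	/* legacy gpu_info firmware */
		break;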


RE: [PATCH 6/6] drm/amdgpu/nv: enable init reset check

2020-05-28 Thread Quan, Evan
[AMD Official Use Only - Internal Distribution Only]

Patch 1-4 are acked-by: Evan Quan 
Patch 5,6 are reviewed-by: Evan Quan 

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Friday, May 29, 2020 5:35 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH 6/6] drm/amdgpu/nv: enable init reset check

gpu reset is implemented for navi so we can enable this.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 0f927fcff0d5..fd3b9e21a5bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -41,6 +41,7 @@
 #include "hdp/hdp_5_0_0_offset.h"
 #include "hdp/hdp_5_0_0_sh_mask.h"
 #include "smuio/smuio_11_0_0_offset.h"
+#include "mp/mp_11_0_offset.h"

 #include "soc15.h"
 #include "soc15_common.h"
@@ -514,7 +515,6 @@ static bool nv_need_full_reset(struct amdgpu_device *adev)

 static bool nv_need_reset_on_init(struct amdgpu_device *adev)
 {
-#if 0
 u32 sol_reg;

 if (adev->flags & AMD_IS_APU)
@@ -526,8 +526,7 @@ static bool nv_need_reset_on_init(struct amdgpu_device *adev)
 sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
 if (sol_reg)
 return true;
-#endif
-/* TODO: re-enable it when mode1 reset is functional */
+
 return false;
 }

--
2.25.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 6/6] drm/amdgpu/nv: enable init reset check

2020-05-28 Thread Zhang, Hawking
[AMD Official Use Only - Internal Distribution Only]

The series is

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Friday, May 29, 2020 05:35
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH 6/6] drm/amdgpu/nv: enable init reset check

gpu reset is implemented for navi so we can enable this.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c 
index 0f927fcff0d5..fd3b9e21a5bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -41,6 +41,7 @@
 #include "hdp/hdp_5_0_0_offset.h"
 #include "hdp/hdp_5_0_0_sh_mask.h"
 #include "smuio/smuio_11_0_0_offset.h"
+#include "mp/mp_11_0_offset.h"
 
 #include "soc15.h"
 #include "soc15_common.h"
@@ -514,7 +515,6 @@ static bool nv_need_full_reset(struct amdgpu_device *adev)
 
 static bool nv_need_reset_on_init(struct amdgpu_device *adev)
 {
-#if 0
u32 sol_reg;
 
if (adev->flags & AMD_IS_APU)
@@ -526,8 +526,7 @@ static bool nv_need_reset_on_init(struct amdgpu_device *adev)
sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
if (sol_reg)
return true;
-#endif
-   /* TODO: re-enable it when mode1 reset is functional */
+
return false;
 }
 
--
2.25.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu/pm: don't bail for in_suspend

2020-05-28 Thread Quan, Evan
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Evan Quan 

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Thursday, May 28, 2020 9:00 PM
To: amd-gfx list 
Cc: Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu/pm: don't bail for in_suspend

Ping?

On Wed, May 27, 2020 at 6:52 PM Alex Deucher  wrote:
>
> Otherwise we disable sysfs/debugfs access with runtime pm.
>
> Fixes: f7c8d853b029df ("drm/amdgpu/pm: return an error during GPU
> reset or suspend")
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 114 -
>  1 file changed, 57 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index 808884aaf36d..775e389c9a13 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -163,7 +163,7 @@ static ssize_t amdgpu_get_power_dpm_state(struct device *dev,
> enum amd_pm_state_type pm;
> int ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -199,7 +199,7 @@ static ssize_t amdgpu_set_power_dpm_state(struct device *dev,
> enum amd_pm_state_type  state;
> int ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> if (strncmp("battery", buf, strlen("battery")) == 0)
> @@ -303,7 +303,7 @@ static ssize_t amdgpu_get_power_dpm_force_performance_level(struct device *dev,
> enum amd_dpm_forced_level level = 0xff;
> int ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -343,7 +343,7 @@ static ssize_t amdgpu_set_power_dpm_force_performance_level(struct device *dev,
> enum amd_dpm_forced_level current_level = 0xff;
> int ret = 0;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> if (strncmp("low", buf, strlen("low")) == 0) {
> @@ -445,7 +445,7 @@ static ssize_t amdgpu_get_pp_num_states(struct device *dev,
> struct pp_states_info data;
> int i, buf_len, ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -487,7 +487,7 @@ static ssize_t amdgpu_get_pp_cur_state(struct device *dev,
> enum amd_pm_state_type pm = 0;
> int i = 0, ret = 0;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -526,7 +526,7 @@ static ssize_t amdgpu_get_pp_force_state(struct device *dev,
> struct drm_device *ddev = dev_get_drvdata(dev);
> struct amdgpu_device *adev = ddev->dev_private;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> if (adev->pp_force_state_enabled)
> @@ -546,7 +546,7 @@ static ssize_t amdgpu_set_pp_force_state(struct device *dev,
> unsigned long idx;
> int ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> if (strlen(buf) == 1)
> @@ -604,7 +604,7 @@ static ssize_t amdgpu_get_pp_table(struct device *dev,
> char *table = NULL;
> int size, ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -646,7 +646,7 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
> struct amdgpu_device *adev = ddev->dev_private;
> int ret = 0;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -751,7 +751,7 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
> const char delimiter[3] = {' ', '\n', '\0'};
> uint32_t type;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> if (count > 127)
> @@ -843,7 +843,7 @@ static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
> ssize_t size;
> int ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -895,7 +895,7 @@ static ssize_t amdgpu_set_pp_features(struct device *dev,
> uint64_t featuremask;
> int ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;

RE: [PATCH] drm/amdgpu/fru: fix header guard and include header

2020-05-28 Thread Quan, Evan
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Evan Quan 

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Thursday, May 28, 2020 9:45 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH] drm/amdgpu/fru: fix header guard and include header

Fix the fru eeprom header guard and include it in the .c file.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
index 815c072ac4da..6ae80b33182c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
@@ -26,6 +26,7 @@
 #include "amdgpu_i2c.h"
 #include "smu_v11_0_i2c.h"
 #include "atom.h"
+#include "amdgpu_fru_eeprom.h"

 #define I2C_PRODUCT_INFO_ADDR0xAC
 #define I2C_PRODUCT_INFO_ADDR_SIZE0x2
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h
index 968115c97e33..f29a8611d69b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h
@@ -21,8 +21,8 @@
  *
  */

-#ifndef __AMDGPU_PRODINFO_H__
-#define __AMDGPU_PRODINFO_H__
+#ifndef __AMDGPU_FRU_EEPROM_H__
+#define __AMDGPU_FRU_EEPROM_H__

 int amdgpu_fru_get_product_info(struct amdgpu_device *adev);

--
2.25.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH][next] drm/amdkfd: fix a dereference of pdd before it is null checked

2020-05-28 Thread Felix Kuehling

On 2020-05-28 18:24, Colin King wrote:

From: Colin Ian King 

Currently pointer pdd is being dereferenced when assigning pointer
dqm and then pdd is being null checked.  Fix this by checking if
pdd is null before the dereference of pdd occurs.

Addresses-Coverity: ("Dereference before null check")
Fixes: 522b89c63370 ("drm/amdkfd: Track SDMA utilization per process")
Signed-off-by: Colin Ian King 


Reviewed-by: Felix Kuehling 

I applied the patch to our internal amd-staging-drm-next.

Regards,
  Felix



---
  drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 25636789f3d3..bdc58741b32e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -103,10 +103,11 @@ static void kfd_sdma_activity_worker(struct work_struct *work)
return;
  
  	pdd = workarea->pdd;

+   if (!pdd)
+   return;
dqm = pdd->dev->dqm;
	qpd = &pdd->qpd;
-
-   if (!pdd || !dqm || !qpd)
+   if (!dqm || !qpd)
return;
  
  	mm = get_task_mm(pdd->process->lead_thread);

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
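
The defect class Coverity flags here ("dereference before null check") has
this general shape; a standalone illustration in plain C, not the kfd code
itself:

struct thing { int value; };

/* Buggy: p->value executes before the NULL check can reject p. */
static int read_value_buggy(struct thing *p)
{
	int v = p->value;	/* dereference happens here... */

	if (!p)			/* ...so this check is dead code */
		return -1;
	return v;
}

/* Fixed: reject NULL before the first dereference, as the patch does. */
static int read_value_fixed(struct thing *p)
{
	if (!p)
		return -1;
	return p->value;
}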


Re: [RFC 02/17] dma-fence: basic lockdep annotations

2020-05-28 Thread Luben Tuikov
On 2020-05-12 4:59 a.m., Daniel Vetter wrote:
> Design is similar to the lockdep annotations for workers, but with
> some twists:
> 
> - We use a read-lock for the execution/worker/completion side, so that
>   this explicit annotation can be more liberally sprinkled around.
>   With read locks lockdep isn't going to complain if the read-side
>   isn't nested the same way under all circumstances, so ABBA deadlocks
>   are ok. Which they are, since this is an annotation only.
> 
> - We're using non-recursive lockdep read lock mode, since in recursive
>   read lock mode lockdep does not catch read side hazards. And we
>   _very_ much want read side hazards to be caught. For full details of
>   this limitation see
> 
>   commit e91498589746065e3ae95d9a00b068e525eec34f
>   Author: Peter Zijlstra 
>   Date:   Wed Aug 23 13:13:11 2017 +0200
> 
>   locking/lockdep/selftests: Add mixed read-write ABBA tests
> 
> - To allow nesting of the read-side explicit annotations we explicitly
>   keep track of the nesting. lock_is_held() allows us to do that.
> 
> - The wait-side annotation is a write lock, and entirely done within
>   dma_fence_wait() for everyone by default.
> 
> - To be able to freely annotate helper functions I want to make it ok
>   to call dma_fence_begin/end_signalling from soft/hardirq context.
>   First attempt was using the hardirq locking context for the write
>   side in lockdep, but this forces all normal spinlocks nested within
>   dma_fence_begin/end_signalling to be spinlocks. That bollocks.
> 
>   The approach now is to simply check in_atomic(), and for these cases
>   entirely rely on the might_sleep() check in dma_fence_wait(). That
>   will catch any wrong nesting against spinlocks from soft/hardirq
>   contexts.
> 
> The idea here is that every code path that's critical for eventually
> signalling a dma_fence should be annotated with
> dma_fence_begin/end_signalling. The annotation ideally starts right
> after a dma_fence is published (added to a dma_resv, exposed as a
> sync_file fd, attached to a drm_syncobj fd, or anything else that
> makes the dma_fence visible to other kernel threads), up to and
> including the dma_fence_wait(). Examples are irq handlers, the
> scheduler rt threads, the tail of execbuf (after the corresponding
> fences are visible), any workers that end up signalling dma_fences and
> really anything else. Not annotated should be code paths that only
> complete fences opportunistically as the gpu progresses, like e.g.
> shrinker/eviction code.
> 
> The main class of deadlocks this is supposed to catch are:
> 
> Thread A:
> 
>   mutex_lock(A);
>   mutex_unlock(A);
> 
>   dma_fence_signal();
> 
> Thread B:
> 
>   mutex_lock(A);
>   dma_fence_wait();
>   mutex_unlock(A);
> 
> Thread B is blocked on A signalling the fence, but A never gets around
> to that because it cannot acquire the lock A.
> 
> Note that dma_fence_wait() is allowed to be nested within
> dma_fence_begin/end_signalling sections. To allow this to happen the
> read lock needs to be upgraded to a write lock, which means that if any
> other lock is acquired between the dma_fence_begin_signalling() call and
> the call to dma_fence_wait(), and is still held, this will result in an
> immediate lockdep complaint. The only other option would be to not
> annotate such calls, defeating the point. Therefore these annotations
> cannot be sprinkled over the code entirely mindlessly to avoid false
> positives.
> 
> v2: handle soft/hardirq ctx better against write side and dont forget
> EXPORT_SYMBOL, drivers can't use this otherwise.
> 
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Cc: linux-r...@vger.kernel.org
> Cc: amd-gfx@lists.freedesktop.org
> Cc: intel-...@lists.freedesktop.org
> Cc: Chris Wilson 
> Cc: Maarten Lankhorst 
> Cc: Christian König 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/dma-buf/dma-fence.c | 53 +
>  include/linux/dma-fence.h   | 12 +
>  2 files changed, 65 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 6802125349fb..d5c0fd2efc70 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -110,6 +110,52 @@ u64 dma_fence_context_alloc(unsigned num)
>  }
>  EXPORT_SYMBOL(dma_fence_context_alloc);
>  
> +#ifdef CONFIG_LOCKDEP
> +struct lockdep_map   dma_fence_lockdep_map = {
> + .name = "dma_fence_map"
> +};
> +
> +bool dma_fence_begin_signalling(void)
> +{
> + /* explicitly nesting ... */
> + if (lock_is_held_type(&dma_fence_lockdep_map, 1))
> + return true;
> +
> + /* rely on might_sleep check for soft/hardirq locks */
> + if (in_atomic())
> + return true;
> +
> + /* ... and non-recursive readlock */
> + lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _RET_IP_);
> +
> + return false;
> +}
> +EXPORT_SYMBOL(dma_fence_begin_signalling);

Hi Daniel,

This is great work and 

[PATCH 6/6] drm/amdgpu/nv: enable init reset check

2020-05-28 Thread Alex Deucher
gpu reset is implemented for navi so we can enable this.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 0f927fcff0d5..fd3b9e21a5bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -41,6 +41,7 @@
 #include "hdp/hdp_5_0_0_offset.h"
 #include "hdp/hdp_5_0_0_sh_mask.h"
 #include "smuio/smuio_11_0_0_offset.h"
+#include "mp/mp_11_0_offset.h"
 
 #include "soc15.h"
 #include "soc15_common.h"
@@ -514,7 +515,6 @@ static bool nv_need_full_reset(struct amdgpu_device *adev)
 
 static bool nv_need_reset_on_init(struct amdgpu_device *adev)
 {
-#if 0
u32 sol_reg;
 
if (adev->flags & AMD_IS_APU)
@@ -526,8 +526,7 @@ static bool nv_need_reset_on_init(struct amdgpu_device *adev)
sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
if (sol_reg)
return true;
-#endif
-   /* TODO: re-enable it when mode1 reset is functional */
+
return false;
 }
 
-- 
2.25.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/6] drm/amdgpu: skip gpu_info firmware if discovery info is available

2020-05-28 Thread Alex Deucher
The GPU info firmware is only applicable at bring up when the
IP discovery table is not present.  If the discovery table is
available, use that first and then fall back to parsing the gpu
info firmware.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2f0e8da7bacf..716f1f7ebe3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1533,6 +1533,11 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
 
adev->firmware.gpu_info_fw = NULL;
 
+   if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10) {
+   amdgpu_discovery_get_gfx_info(adev);
+   return 0;
+   }
+
switch (adev->asic_type) {
 #ifdef CONFIG_DRM_AMDGPU_SI
case CHIP_VERDE:
@@ -1617,11 +1622,6 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
	(const struct gpu_info_firmware_v1_0 *)(adev->firmware.gpu_info_fw->data +
		le32_to_cpu(hdr->header.ucode_array_offset_bytes));
 
-   if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10) {
-   amdgpu_discovery_get_gfx_info(adev);
-   goto parse_soc_bounding_box;
-   }
-
adev->gfx.config.max_shader_engines = 
le32_to_cpu(gpu_info_fw->gc_num_se);
adev->gfx.config.max_cu_per_sh = 
le32_to_cpu(gpu_info_fw->gc_num_cu_per_sh);
adev->gfx.config.max_sh_per_se = 
le32_to_cpu(gpu_info_fw->gc_num_sh_per_se);
@@ -1650,10 +1650,9 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
le32_to_cpu(gpu_info_fw->num_packer_per_sc);
}
 
-parse_soc_bounding_box:
/*
 * soc bounding box info is not integrated in disocovery table,
-* we always need to parse it from gpu info firmware.
+* we always need to parse it from gpu info firmware if needed.
 */
if (hdr->version_minor == 2) {
const struct gpu_info_firmware_v1_2 *gpu_info_fw =
-- 
2.25.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 4/6] drm/amdgpu/nv: allow access to SDMA status registers

2020-05-28 Thread Alex Deucher
For access via ioctl for tools like umr and mesa.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 6655dd2009b6..61eea26922ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -188,10 +188,8 @@ static struct soc15_allowed_register_entry nv_allowed_read_registers[] = {
{ SOC15_REG_ENTRY(GC, 0, mmGRBM_STATUS_SE1)},
{ SOC15_REG_ENTRY(GC, 0, mmGRBM_STATUS_SE2)},
{ SOC15_REG_ENTRY(GC, 0, mmGRBM_STATUS_SE3)},
-#if 0  /* TODO: will set it when SDMA header is available */
{ SOC15_REG_ENTRY(SDMA0, 0, mmSDMA0_STATUS_REG)},
{ SOC15_REG_ENTRY(SDMA1, 0, mmSDMA1_STATUS_REG)},
-#endif
{ SOC15_REG_ENTRY(GC, 0, mmCP_STAT)},
{ SOC15_REG_ENTRY(GC, 0, mmCP_STALLED_STAT1)},
{ SOC15_REG_ENTRY(GC, 0, mmCP_STALLED_STAT2)},
-- 
2.25.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/6] drm/amdgpu: clean up discovery testing

2020-05-28 Thread Alex Deucher
Rather than checking if the module parameter is enabled and the
chip is the right family, check for the presence of the
discovery table.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 716f1f7ebe3d..febcecc5c6b0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1533,7 +1533,7 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
 
adev->firmware.gpu_info_fw = NULL;
 
-   if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10) {
+   if (adev->discovery_bin) {
amdgpu_discovery_get_gfx_info(adev);
return 0;
}
@@ -3363,7 +3363,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
if (IS_ENABLED(CONFIG_PERF_EVENTS))
amdgpu_pmu_fini(adev);
-   if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
+   if (adev->discovery_bin)
amdgpu_discovery_fini(adev);
 }
 
-- 
2.25.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 5/6] drm/amdgpu/nv: remove some dead code

2020-05-28 Thread Alex Deucher
navi never supported the pci config reset.  Neither did
vega.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 34 -
 1 file changed, 34 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 61eea26922ce..0f927fcff0d5 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -254,31 +254,6 @@ static int nv_read_register(struct amdgpu_device *adev, u32 se_num,
return -EINVAL;
 }
 
-#if 0
-static void nv_gpu_pci_config_reset(struct amdgpu_device *adev)
-{
-   u32 i;
-
-   dev_info(adev->dev, "GPU pci config reset\n");
-
-   /* disable BM */
-   pci_clear_master(adev->pdev);
-   /* reset */
-   amdgpu_pci_config_reset(adev);
-
-   udelay(100);
-
-   /* wait for asic to come out of reset */
-   for (i = 0; i < adev->usec_timeout; i++) {
-   u32 memsize = nbio_v2_3_get_memsize(adev);
-   if (memsize != 0x)
-   break;
-   udelay(1);
-   }
-
-}
-#endif
-
 static int nv_asic_mode1_reset(struct amdgpu_device *adev)
 {
u32 i;
@@ -336,15 +311,6 @@ nv_asic_reset_method(struct amdgpu_device *adev)
 
 static int nv_asic_reset(struct amdgpu_device *adev)
 {
-
-   /* FIXME: it doesn't work since vega10 */
-#if 0
-   amdgpu_atombios_scratch_regs_engine_hung(adev, true);
-
-   nv_gpu_pci_config_reset(adev);
-
-   amdgpu_atombios_scratch_regs_engine_hung(adev, false);
-#endif
int ret = 0;
	struct smu_context *smu = &adev->smu;
 
-- 
2.25.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 3/6] drm/amdgpu: use IP discovery table for renoir

2020-05-28 Thread Alex Deucher
Rather than relying on gpu info firmware.

Signed-off-by: Alex Deucher 
---

Can someone test this on renoir?

 drivers/gpu/drm/amd/amdgpu/soc15.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index c7c9e07962b9..623745b2d8b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -670,14 +670,25 @@ static uint32_t soc15_get_rev_id(struct amdgpu_device *adev)
 
 int soc15_set_ip_blocks(struct amdgpu_device *adev)
 {
+   int r;
+
/* Set IP register base before any HW register access */
switch (adev->asic_type) {
case CHIP_VEGA10:
case CHIP_VEGA12:
case CHIP_RAVEN:
-   case CHIP_RENOIR:
vega10_reg_base_init(adev);
break;
+   case CHIP_RENOIR:
+   if (amdgpu_discovery) {
+   r = amdgpu_discovery_reg_base_init(adev);
+   if (r) {
+   DRM_WARN("failed to init reg base from ip discovery table, "
+   "fallback to legacy init method\n");
+   vega10_reg_base_init(adev);
+   }
+   }
+   break;
case CHIP_VEGA20:
vega20_reg_base_init(adev);
break;
-- 
2.25.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: using amdgpu headless (no monitor)

2020-05-28 Thread Alex Deucher
On Thu, May 28, 2020 at 3:47 AM Ian Rogers  wrote:
>
> Hi,
>
> Why can't virtual_display be used along with a physical display?
>

We've never had a use case to mix the two.

> I see from here: https://bugzilla.kernel.org/show_bug.cgi?id=203339 that the 
> intention of the virtual_display module option is to allow for virtual 
> displays and purposely disable physical/real displays.
> Is there a technical limitation that required this?
>

It was easier to implement them separately because they are self
contained and we don't have to deal with any interactions or
dependencies between real and fake elements of the display pipeline.

> I ask because I want to be able to run a system that is sometimes headless 
> and sometimes not.  And I'd like to be able to access a current X session 
> (either logged in or at the login greeter) both remotely (say via VNC) and 
> locally via the physical display (when it is plugged in).without having to 
> reboot or lose that X session.
>
> However I've noticed that (at least with a Ryzen 3 3200G with Radeon Vega 8) 
> an X session does not login successfully when accessed remotely if there is 
> no monitor connected.
> I assume this is caused by something in the amdgpu driver but I haven't been 
> able to figure out what.

It's your display manager (X, mutter, kwin, etc.).  They generally
won't start if they don't detect a monitor.  You might be able to
force one via whatever configuration mechanism is provided by your
environment.

> So I was hoping that perhaps a solution would involve using virtual_display, 
> but that won't work as it's currently designed because of it disabling the 
> physical display.
>
> But maybe I've gone down the wrong track and there is another solution.  I've 
> tried using the xorg dummy driver in addition to amdgpu, but when doing so, I 
> can never get a physical display working.
> This is an Ubuntu 20.04 system with kernel 5.4.0.
>
> I can provide lots more detail on what I've tried and my config where needed.
>
> Thanks much
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
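
For reference, one generic mechanism for the "force a monitor" suggestion
above is the DRM video= kernel parameter, which can force a connector to be
reported as connected even with nothing plugged in, e.g.:

    video=HDMI-A-1:1920x1080@60e

The trailing 'e' forces the connector to enabled; HDMI-A-1 is an example
name, and the real connector names are listed under /sys/class/drm/. Whether
this resolves the reporter's remote-login issue is an assumption on the
editor's part, not something verified in this thread.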


Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs

2020-05-28 Thread Marek Olšák
On Thu, May 28, 2020 at 2:12 PM Christian König 
wrote:

> Am 28.05.20 um 18:06 schrieb Marek Olšák:
>
> On Thu, May 28, 2020 at 10:40 AM Christian König 
> wrote:
>
>> Am 28.05.20 um 12:06 schrieb Michel Dänzer:
>> > On 2020-05-28 11:11 a.m., Christian König wrote:
>> >> Well we still need implicit sync [...]
>> > Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu
>> > doesn't ensure later jobs fully see the effects of previous implicitly
>> > synced jobs", requiring userspace to do pessimistic flushing.
>>
>> Yes, exactly that.
>>
>> For the background: We also do this flushing for explicit syncs. And
>> when this was implemented 2-3 years ago we first did the flushing for
>> implicit sync as well.
>>
>> That was immediately reverted and then implemented differently because
>> it caused severe performance problems in some use cases.
>>
>> I'm not sure of the root cause of these performance problems. My
>> assumption was always that we then insert too many pipeline syncs, but
>> Marek doesn't seem to think it could be that.
>>
>> On the one hand I'm rather keen to remove the extra handling and just
>> always use the explicit handling for everything because it simplifies
>> the kernel code quite a bit. On the other hand I don't want to run into
>> this performance problem again.
>>
>> Additional to that what the kernel does is a "full" pipeline sync, e.g.
>> we busy wait for the full hardware pipeline to drain. That might be
>> overkill if you just want to do some flushing so that the next shader
>> sees the stuff written, but I'm not an expert on that.
>>
>
> Do we busy-wait on the CPU or in WAIT_REG_MEM?
>
> WAIT_REG_MEM is what UMDs do and should be faster.
>
>
> We use WAIT_REG_MEM to wait for an EOP fence value to reach memory.
>
> We use this for a couple of things, especially to make sure that the
> hardware is idle before changing VMID to page table associations.
>
> What about your idea of having an extra dw in the shared BOs indicating
> that they are flushed?
>
> As far as I understand it an EOS or other event might be sufficient for
> the caches as well. And you could insert the WAIT_REG_MEM directly before
> the first draw using the texture and not before the whole IB.
>
> Could be that we can optimize this even more than what we do in the kernel.
>
> Christian.
>

Adding fences into BOs would be bad, because all UMDs would have to handle
them.

Is it possible to do this in the ring buffer:
if (fence_signalled) {
   indirect_buffer(dependent_IB);
   indirect_buffer(other_IB);
} else {
   indirect_buffer(other_IB);
   wait_reg_mem(fence);
   indirect_buffer(dependent_IB);
}

Or we might have to wait for a hw scheduler.

Does the kernel sync when the driver fd is different, or when the context
is different?

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
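
Expanding Marek's sketch slightly: the point is to pay for a GPU-side wait
only when the dependency has not yet signalled at submission time, and to
place that wait directly in front of the dependent IB rather than the whole
submission. A rough, hypothetical rendering in C; struct ring, struct ib,
emit_ib() and emit_wait_reg_mem() are made-up stand-ins for the real ring
emission helpers, which the thread does not spell out:

#include <linux/compiler.h>
#include <linux/types.h>

struct ring;	/* opaque stand-ins for the real driver types */
struct ib;

void emit_ib(struct ring *ring, struct ib *ib);
void emit_wait_reg_mem(struct ring *ring, u64 gpu_addr, u32 value);

struct gpu_fence {
	u64 gpu_addr;	/* GPU address a WAIT_REG_MEM would poll */
	u32 *cpu_addr;	/* CPU mapping of the same location */
	u32 seqno;	/* value that means "signalled" */
};

static bool fence_signalled(const struct gpu_fence *f)
{
	/* CPU-side check at submission time; ignores wraparound for brevity */
	return READ_ONCE(*f->cpu_addr) >= f->seqno;
}

static void submit(struct ring *ring, struct ib *dependent_ib,
		   struct ib *other_ib, const struct gpu_fence *f)
{
	if (fence_signalled(f)) {
		/* dependency already done: no GPU-side wait at all */
		emit_ib(ring, dependent_ib);
		emit_ib(ring, other_ib);
	} else {
		/* run independent work first, then have the GPU wait
		 * (WAIT_REG_MEM) only in front of the dependent IB */
		emit_ib(ring, other_ib);
		emit_wait_reg_mem(ring, f->gpu_addr, f->seqno);
		emit_ib(ring, dependent_ib);
	}
}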


Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs

2020-05-28 Thread Christian König

Am 28.05.20 um 18:06 schrieb Marek Olšák:
On Thu, May 28, 2020 at 10:40 AM Christian König <christian.koe...@amd.com> wrote:


Am 28.05.20 um 12:06 schrieb Michel Dänzer:
> On 2020-05-28 11:11 a.m., Christian König wrote:
>> Well we still need implicit sync [...]
> Yeah, this isn't about "we don't want implicit sync", it's about
"amdgpu
> doesn't ensure later jobs fully see the effects of previous
implicitly
> synced jobs", requiring userspace to do pessimistic flushing.

Yes, exactly that.

For the background: We also do this flushing for explicit syncs. And
when this was implemented 2-3 years ago we first did the flushing for
implicit sync as well.

That was immediately reverted and then implemented differently
because
it caused severe performance problems in some use cases.

I'm not sure of the root cause of these performance problems. My
assumption was always that we then insert too many pipeline syncs, but
Marek doesn't seem to think it could be that.

On the one hand I'm rather keen to remove the extra handling and just
always use the explicit handling for everything because it simplifies
the kernel code quite a bit. On the other hand I don't want to run
into
this performance problem again.

Additional to that what the kernel does is a "full" pipeline sync,
e.g.
we busy wait for the full hardware pipeline to drain. That might be
overkill if you just want to do some flushing so that the next shader
sees the stuff written, but I'm not an expert on that.


Do we busy-wait on the CPU or in WAIT_REG_MEM?

WAIT_REG_MEM is what UMDs do and should be faster.


We use WAIT_REG_MEM to wait for an EOP fence value to reach memory.

We use this for a couple of things, especially to make sure that the 
hardware is idle before changing VMID to page table associations.


What about your idea of having an extra dw in the shared BOs indicating 
that they are flushed?


As far as I understand it an EOS or other event might be sufficient for 
the caches as well. And you could insert the WAIT_REG_MEM directly 
before the first draw using the texture and not before the whole IB.


Could be that we can optimize this even more than what we do in the kernel.

Christian.



Marek


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs

2020-05-28 Thread Marek Olšák
On Thu, May 28, 2020 at 10:40 AM Christian König 
wrote:

> Am 28.05.20 um 12:06 schrieb Michel Dänzer:
> > On 2020-05-28 11:11 a.m., Christian König wrote:
> >> Well we still need implicit sync [...]
> > Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu
> > doesn't ensure later jobs fully see the effects of previous implicitly
> > synced jobs", requiring userspace to do pessimistic flushing.
>
> Yes, exactly that.
>
> For the background: We also do this flushing for explicit syncs. And
> when this was implemented 2-3 years ago we first did the flushing for
> implicit sync as well.
>
> That was immediately reverted and then implemented differently because
> it caused severe performance problems in some use cases.
>
> I'm not sure of the root cause of these performance problems. My
> assumption was always that we then insert too many pipeline syncs, but
> Marek doesn't seem to think it could be that.
>
> On the one hand I'm rather keen to remove the extra handling and just
> always use the explicit handling for everything because it simplifies
> the kernel code quite a bit. On the other hand I don't want to run into
> this performance problem again.
>
> Additional to that what the kernel does is a "full" pipeline sync, e.g.
> we busy wait for the full hardware pipeline to drain. That might be
> overkill if you just want to do some flushing so that the next shader
> sees the stuff written, but I'm not an expert on that.
>

Do we busy-wait on the CPU or in WAIT_REG_MEM?

WAIT_REG_MEM is what UMDs do and should be faster.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs

2020-05-28 Thread Christian König

Am 28.05.20 um 12:06 schrieb Michel Dänzer:

On 2020-05-28 11:11 a.m., Christian König wrote:

Well we still need implicit sync [...]

Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu
doesn't ensure later jobs fully see the effects of previous implicitly
synced jobs", requiring userspace to do pessimistic flushing.


Yes, exactly that.

For the background: We also do this flushing for explicit syncs. And 
when this was implemented 2-3 years ago we first did the flushing for 
implicit sync as well.


That was immediately reverted and then implemented differently because 
it caused severe performance problems in some use cases.


I'm not sure of the root cause of these performance problems. My 
assumption was always that we then insert too many pipeline syncs, but 
Marek doesn't seem to think it could be that.


On the one hand I'm rather keen to remove the extra handling and just 
always use the explicit handling for everything because it simplifies 
the kernel code quite a bit. On the other hand I don't want to run into 
this performance problem again.


Additional to that what the kernel does is a "full" pipeline sync, e.g. 
we busy wait for the full hardware pipeline to drain. That might be 
overkill if you just want to do some flushing so that the next shader 
sees the stuff written, but I'm not an expert on that.


Regards,
Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] Revert "drm/amd/display: disable dcn20 abm feature for bring up"

2020-05-28 Thread Harry Wentland



On 2020-05-28 10:13 a.m., Alexander Monakov wrote:
> 
> 
> On Thu, 28 May 2020, Harry Wentland wrote:
> 
>> On 2020-05-28 9:54 a.m., Alexander Monakov wrote:
>>>
>>>
>>> On Thu, 28 May 2020, Harry Wentland wrote:
>>>
 This reverts commit 96cb7cf13d8530099c256c053648ad576588c387.

 This change was used for DCN2 bringup and is no longer desired.
 In fact it breaks backlight on DCN2 systems.
>>>
>>> Reported-and-tested-by: Alexander Monakov 
>>>
>>
>> Thanks, Alex.
>>
>> Just to confirm, this fixes the backlight issue you were seeing?
> 
> I applied a similar fix to my kernel yesterday, and it worked fine.
> Tested by changing brightness a few times. If any problems come up,
> I'll send new reports.
> 

Thanks for the confirmation.

Harry

> Alexander
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC 02/17] dma-fence: basic lockdep annotations

2020-05-28 Thread Daniel Vetter
On Thu, May 28, 2020 at 3:37 PM Thomas Hellström (Intel)
 wrote:
>
> On 2020-05-12 10:59, Daniel Vetter wrote:
> > Design is similar to the lockdep annotations for workers, but with
> > some twists:
> >
> > - We use a read-lock for the execution/worker/completion side, so that
> >this explicit annotation can be more liberally sprinkled around.
> >With read locks lockdep isn't going to complain if the read-side
> >isn't nested the same way under all circumstances, so ABBA deadlocks
> >are ok. Which they are, since this is an annotation only.
> >
> > - We're using non-recursive lockdep read lock mode, since in recursive
> >read lock mode lockdep does not catch read side hazards. And we
> >_very_ much want read side hazards to be caught. For full details of
> >this limitation see
> >
> >commit e91498589746065e3ae95d9a00b068e525eec34f
> >Author: Peter Zijlstra 
> >Date:   Wed Aug 23 13:13:11 2017 +0200
> >
> >locking/lockdep/selftests: Add mixed read-write ABBA tests
> >
> > - To allow nesting of the read-side explicit annotations we explicitly
> >keep track of the nesting. lock_is_held() allows us to do that.
> >
> > - The wait-side annotation is a write lock, and entirely done within
> >dma_fence_wait() for everyone by default.
> >
> > - To be able to freely annotate helper functions I want to make it ok
> >to call dma_fence_begin/end_signalling from soft/hardirq context.
> >First attempt was using the hardirq locking context for the write
> >side in lockdep, but this forces all normal spinlocks nested within
> >dma_fence_begin/end_signalling to be spinlocks. That bollocks.
> >
> >The approach now is to simply check in_atomic(), and for these cases
> >entirely rely on the might_sleep() check in dma_fence_wait(). That
> >will catch any wrong nesting against spinlocks from soft/hardirq
> >contexts.
> >
> > The idea here is that every code path that's critical for eventually
> > signalling a dma_fence should be annotated with
> > dma_fence_begin/end_signalling. The annotation ideally starts right
> > after a dma_fence is published (added to a dma_resv, exposed as a
> > sync_file fd, attached to a drm_syncobj fd, or anything else that
> > makes the dma_fence visible to other kernel threads), up to and
> > including the dma_fence_wait(). Examples are irq handlers, the
> > scheduler rt threads, the tail of execbuf (after the corresponding
> > fences are visible), any workers that end up signalling dma_fences and
> > really anything else. Not annotated should be code paths that only
> > complete fences opportunistically as the gpu progresses, like e.g.
> > shrinker/eviction code.
> >
> > The main class of deadlocks this is supposed to catch are:
> >
> > Thread A:
> >
> >   mutex_lock(A);
> >   mutex_unlock(A);
> >
> >   dma_fence_signal();
> >
> > Thread B:
> >
> >   mutex_lock(A);
> >   dma_fence_wait();
> >   mutex_unlock(A);
> >
> > Thread B is blocked on A signalling the fence, but A never gets around
> > to that because it cannot acquire the lock A.
> >
> > Note that dma_fence_wait() is allowed to be nested within
> > dma_fence_begin/end_signalling sections. To allow this to happen the
> > read lock needs to be upgraded to a write lock, which means that if any
> > other lock is acquired between the dma_fence_begin_signalling() call and
> > the call to dma_fence_wait(), and is still held, this will result in an
> > immediate lockdep complaint. The only other option would be to not
> > annotate such calls, defeating the point. Therefore these annotations
> > cannot be sprinkled over the code entirely mindlessly to avoid false
> > positives.
> >
> > v2: handle soft/hardirq ctx better against write side and dont forget
> > EXPORT_SYMBOL, drivers can't use this otherwise.
> >
> > Cc: linux-me...@vger.kernel.org
> > Cc: linaro-mm-...@lists.linaro.org
> > Cc: linux-r...@vger.kernel.org
> > Cc: amd-gfx@lists.freedesktop.org
> > Cc: intel-...@lists.freedesktop.org
> > Cc: Chris Wilson 
> > Cc: Maarten Lankhorst 
> > Cc: Christian König 
> > Signed-off-by: Daniel Vetter 
>
> LGTM. Perhaps some in-code documentation on how to use the new functions
> is called for.

See cover letter, that's going to be done for next round. For this one
here I just wanted to showcase a bit how it's used in a few different
places, mostly selected to get as much feedback from across different
drivers. Hence e.g. annotating drm/scheduler.

> Otherwise for patch 2 and 3,
>
> Reviewed-by: Thomas Hellstrom 

I think I'll just cc you for the next round with docs, so you can make
sure it looks ok :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
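
For readers following the thread: per the commit message, the wait side of
the scheme amounts to taking the same lockdep map as a write lock around the
wait, inside dma_fence_wait() itself. Below is a sketch of that shape, not
the literal patch; the function is a made-up wrapper, and it assumes the
standard dma_fence_wait_timeout() helper:

#include <linux/dma-fence.h>
#include <linux/lockdep.h>
#include <linux/sched.h>

extern struct lockdep_map dma_fence_lockdep_map;	/* from the patch */

static long example_annotated_wait(struct dma_fence *fence, bool intr)
{
	long ret;

	/* catches wrong nesting against spinlocks from soft/hardirq ctx */
	might_sleep();

	/*
	 * Write-side acquisition of dma_fence_map: it conflicts with every
	 * read-side dma_fence_begin/end_signalling() section, so lockdep
	 * flags waits that could deadlock against a signalling path.
	 */
	lock_acquire(&dma_fence_lockdep_map, 0, 0, 0, 1, NULL, _RET_IP_);
	ret = dma_fence_wait_timeout(fence, intr, MAX_SCHEDULE_TIMEOUT);
	lock_release(&dma_fence_lockdep_map, _RET_IP_);

	return ret;
}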


Re: [PATCH] Revert "drm/amd/display: disable dcn20 abm feature for bring up"

2020-05-28 Thread Alexander Monakov



On Thu, 28 May 2020, Harry Wentland wrote:

> On 2020-05-28 9:54 a.m., Alexander Monakov wrote:
> > 
> > 
> > On Thu, 28 May 2020, Harry Wentland wrote:
> > 
> >> This reverts commit 96cb7cf13d8530099c256c053648ad576588c387.
> >>
> >> This change was used for DCN2 bringup and is no longer desired.
> >> In fact it breaks backlight on DCN2 systems.
> > 
> > Reported-and-tested-by: Alexander Monakov 
> > 
> 
> Thanks, Alex.
> 
> Just to confirm, this fixes the backlight issue you were seeing?

I applied a similar fix to my kernel yesterday, and it worked fine.
Tested by changing brightness a few times. If any problems come up,
I'll send new reports.

Alexander
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] Revert "drm/amd/display: disable dcn20 abm feature for bring up"

2020-05-28 Thread Kazlauskas, Nicholas

On 2020-05-28 10:08 a.m., Alex Deucher wrote:

On Thu, May 28, 2020 at 9:47 AM Harry Wentland  wrote:


This reverts commit 96cb7cf13d8530099c256c053648ad576588c387.

This change was used for DCN2 bringup and is no longer desired.
In fact it breaks backlight on DCN2 systems.

Cc: Alexander Monakov 
Cc: Hersen Wu 
Cc: Anthony Koo 
Cc: Michael Chiu 
Signed-off-by: Harry Wentland 


Acked-by: Alex Deucher 


Reviewed-by: Nicholas Kazlauskas 




---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 11 ---
  1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index ddc979e3eebe..acd4874e0743 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1356,7 +1356,7 @@ static int dm_late_init(void *handle)
 unsigned int linear_lut[16];
 int i;
 struct dmcu *dmcu = NULL;
-   bool ret = false;
+   bool ret;

 if (!adev->dm.fw_dmcu)
 return detect_mst_link_for_all_connectors(adev->ddev);
@@ -1377,13 +1377,10 @@ static int dm_late_init(void *handle)
  */
 params.min_abm_backlight = 0x28F;

-   /* todo will enable for navi10 */
-   if (adev->asic_type <= CHIP_RAVEN) {
-   ret = dmcu_load_iram(dmcu, params);
+   ret = dmcu_load_iram(dmcu, params);

-   if (!ret)
-   return -EINVAL;
-   }
+   if (!ret)
+   return -EINVAL;

 return detect_mst_link_for_all_connectors(adev->ddev);
  }
--
2.26.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] Revert "drm/amd/display: disable dcn20 abm feature for bring up"

2020-05-28 Thread Alex Deucher
On Thu, May 28, 2020 at 9:47 AM Harry Wentland  wrote:
>
> This reverts commit 96cb7cf13d8530099c256c053648ad576588c387.
>
> This change was used for DCN2 bringup and is no longer desired.
> In fact it breaks backlight on DCN2 systems.
>
> Cc: Alexander Monakov 
> Cc: Hersen Wu 
> Cc: Anthony Koo 
> Cc: Michael Chiu 
> Signed-off-by: Harry Wentland 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index ddc979e3eebe..acd4874e0743 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -1356,7 +1356,7 @@ static int dm_late_init(void *handle)
> unsigned int linear_lut[16];
> int i;
> struct dmcu *dmcu = NULL;
> -   bool ret = false;
> +   bool ret;
>
> if (!adev->dm.fw_dmcu)
> return detect_mst_link_for_all_connectors(adev->ddev);
> @@ -1377,13 +1377,10 @@ static int dm_late_init(void *handle)
>  */
> params.min_abm_backlight = 0x28F;
>
> -   /* todo will enable for navi10 */
> -   if (adev->asic_type <= CHIP_RAVEN) {
> -   ret = dmcu_load_iram(dmcu, params);
> +   ret = dmcu_load_iram(dmcu, params);
>
> -   if (!ret)
> -   return -EINVAL;
> -   }
> +   if (!ret)
> +   return -EINVAL;
>
> return detect_mst_link_for_all_connectors(adev->ddev);
>  }
> --
> 2.26.2
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] Revert "drm/amd/display: disable dcn20 abm feature for bring up"

2020-05-28 Thread Alexander Monakov



On Thu, 28 May 2020, Harry Wentland wrote:

> This reverts commit 96cb7cf13d8530099c256c053648ad576588c387.
> 
> This change was used for DCN2 bringup and is no longer desired.
> In fact it breaks backlight on DCN2 systems.

Reported-and-tested-by: Alexander Monakov 

Thanks.

> Cc: Alexander Monakov 
> Cc: Hersen Wu 
> Cc: Anthony Koo 
> Cc: Michael Chiu 
> Signed-off-by: Harry Wentland 
> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index ddc979e3eebe..acd4874e0743 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -1356,7 +1356,7 @@ static int dm_late_init(void *handle)
>   unsigned int linear_lut[16];
>   int i;
>   struct dmcu *dmcu = NULL;
> - bool ret = false;
> + bool ret;
>  
>   if (!adev->dm.fw_dmcu)
>   return detect_mst_link_for_all_connectors(adev->ddev);
> @@ -1377,13 +1377,10 @@ static int dm_late_init(void *handle)
>*/
>   params.min_abm_backlight = 0x28F;
>  
> - /* todo will enable for navi10 */
> - if (adev->asic_type <= CHIP_RAVEN) {
> - ret = dmcu_load_iram(dmcu, params);
> + ret = dmcu_load_iram(dmcu, params);
>  
> - if (!ret)
> - return -EINVAL;
> - }
> + if (!ret)
> + return -EINVAL;
>  
>   return detect_mst_link_for_all_connectors(adev->ddev);
>  }
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] Revert "drm/amd/display: disable dcn20 abm feature for bring up"

2020-05-28 Thread Harry Wentland
On 2020-05-28 9:54 a.m., Alexander Monakov wrote:
> 
> 
> On Thu, 28 May 2020, Harry Wentland wrote:
> 
>> This reverts commit 96cb7cf13d8530099c256c053648ad576588c387.
>>
>> This change was used for DCN2 bringup and is no longer desired.
>> In fact it breaks backlight on DCN2 systems.
> 
> Reported-and-tested-by: Alexander Monakov 
> 

Thanks, Alex.

Just to confirm, this fixes the backlight issue you were seeing?

Harry

> Thanks.
> 
>> Cc: Alexander Monakov 
>> Cc: Hersen Wu 
>> Cc: Anthony Koo 
>> Cc: Michael Chiu 
>> Signed-off-by: Harry Wentland 
>> ---
>>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 11 ---
>>  1 file changed, 4 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index ddc979e3eebe..acd4874e0743 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -1356,7 +1356,7 @@ static int dm_late_init(void *handle)
>>  unsigned int linear_lut[16];
>>  int i;
>>  struct dmcu *dmcu = NULL;
>> -bool ret = false;
>> +bool ret;
>>  
>>  if (!adev->dm.fw_dmcu)
>>  return detect_mst_link_for_all_connectors(adev->ddev);
>> @@ -1377,13 +1377,10 @@ static int dm_late_init(void *handle)
>>   */
>>  params.min_abm_backlight = 0x28F;
>>  
>> -/* todo will enable for navi10 */
>> -if (adev->asic_type <= CHIP_RAVEN) {
>> -ret = dmcu_load_iram(dmcu, params);
>> +ret = dmcu_load_iram(dmcu, params);
>>  
>> -if (!ret)
>> -return -EINVAL;
>> -}
>> +if (!ret)
>> +return -EINVAL;
>>  
>>  return detect_mst_link_for_all_connectors(adev->ddev);
>>  }
>>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu/display: drop the reduction loop when setting the sync groups

2020-05-28 Thread Alex Deucher
The logic for a blanked OTG is not the same as for one without a
plane_state. Technically you can drive an OTG with nothing connected in
the front end and it'll just draw out the back color, which is distinct
from the OTG being blanked. If we add planes or unblank the OTG later
then we'll still want the synchronization.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/781
Fixes: 5fc0cbfad45648 ("drm/amd/display: determine if a pipe is synced by plane state")
Cc: nicholas.kazlaus...@amd.com
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 04c3d9f7e323..6279520f7873 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1040,14 +1040,6 @@ static void program_timing_sync(
status->timing_sync_info.master = false;
 
}
-   /* remove any other pipes with plane as they have already been 
synced */
-   for (j = j + 1; j < group_size; j++) {
-   if (pipe_set[j]->plane_state) {
-   group_size--;
-   pipe_set[j] = pipe_set[group_size];
-   j--;
-   }
-   }
 
if (group_size > 1) {
dc->hwss.enable_timing_synchronization(
-- 
2.25.4
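
For readers skimming the diff: the dropped hunk used the classic
swap-with-last removal idiom. A standalone sketch of that idiom on a
plain int array (illustrative values only, not dc code):

#include <stdio.h>

int main(void)
{
	int v[] = { 1, 2, 3, 4, 5, 6 };
	int n = 6;

	for (int j = 0; j < n; j++) {
		if (v[j] % 2 == 0) {	/* stand-in for pipe_set[j]->plane_state */
			n--;
			v[j] = v[n];	/* move the last element into the hole */
			j--;		/* re-examine the swapped-in element */
		}
	}
	for (int j = 0; j < n; j++)
		printf("%d ", v[j]);	/* prints "1 5 3 ": the odd survivors */
	printf("\n");
	return 0;
}

The j-- is what keeps the removal safe mid-iteration; the patch simply
drops the filtering because, per the message above, pipes with a
plane_state should stay in the sync group.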

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Improve the MTYPE comments

2020-05-28 Thread Christian König

On 2020-05-27 02:57, Yong Zhao wrote:

Use words instead of acronyms for better understanding.

Signed-off-by: Yong Zhao 


Good idea, Reviewed-by: Christian König 


---
  include/uapi/drm/amdgpu_drm.h | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index d65f9b4ba05c..0072ddb59747 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -564,15 +564,15 @@ struct drm_amdgpu_gem_op {
  #define AMDGPU_VM_MTYPE_MASK  (0xf << 5)
  /* Default MTYPE. Pre-AI must use this.  Recommended for newer ASICs. */
  #define AMDGPU_VM_MTYPE_DEFAULT   (0 << 5)
-/* Use NC MTYPE instead of default MTYPE */
+/* Use Non Coherent MTYPE instead of default MTYPE */
  #define AMDGPU_VM_MTYPE_NC(1 << 5)
-/* Use WC MTYPE instead of default MTYPE */
+/* Use Write Combine MTYPE instead of default MTYPE */
  #define AMDGPU_VM_MTYPE_WC(2 << 5)
-/* Use CC MTYPE instead of default MTYPE */
+/* Use Cache Coherent MTYPE instead of default MTYPE */
  #define AMDGPU_VM_MTYPE_CC(3 << 5)
-/* Use UC MTYPE instead of default MTYPE */
+/* Use UnCached MTYPE instead of default MTYPE */
  #define AMDGPU_VM_MTYPE_UC(4 << 5)
-/* Use RW MTYPE instead of default MTYPE */
+/* Use Read Write MTYPE instead of default MTYPE */
  #define AMDGPU_VM_MTYPE_RW(5 << 5)
  
  struct drm_amdgpu_gem_va {
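
Since the MTYPE packs into bits 5..8 of the mapping-flags word (mask
0xf << 5, per the defines above), selecting a type is a plain
mask-then-or. A minimal userspace sketch reusing the same encoding:

#include <stdint.h>
#include <stdio.h>

#define AMDGPU_VM_MTYPE_MASK	(0xf << 5)
#define AMDGPU_VM_MTYPE_WC	(2 << 5)	/* Write Combine */

int main(void)
{
	uint64_t flags = 0;

	/* clear any previous MTYPE, then select Write Combine */
	flags = (flags & ~(uint64_t)AMDGPU_VM_MTYPE_MASK) | AMDGPU_VM_MTYPE_WC;
	printf("mtype index = %u\n",
	       (unsigned int)((flags & AMDGPU_VM_MTYPE_MASK) >> 5)); /* 2 */
	return 0;
}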


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] Revert "drm/amd/display: disable dcn20 abm feature for bring up"

2020-05-28 Thread Harry Wentland
This reverts commit 96cb7cf13d8530099c256c053648ad576588c387.

This change was used for DCN2 bringup and is no longer desired.
In fact it breaks backlight on DCN2 systems.

Cc: Alexander Monakov 
Cc: Hersen Wu 
Cc: Anthony Koo 
Cc: Michael Chiu 
Signed-off-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index ddc979e3eebe..acd4874e0743 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1356,7 +1356,7 @@ static int dm_late_init(void *handle)
unsigned int linear_lut[16];
int i;
struct dmcu *dmcu = NULL;
-   bool ret = false;
+   bool ret;
 
if (!adev->dm.fw_dmcu)
return detect_mst_link_for_all_connectors(adev->ddev);
@@ -1377,13 +1377,10 @@ static int dm_late_init(void *handle)
 */
params.min_abm_backlight = 0x28F;
 
-   /* todo will enable for navi10 */
-   if (adev->asic_type <= CHIP_RAVEN) {
-   ret = dmcu_load_iram(dmcu, params);
+   ret = dmcu_load_iram(dmcu, params);
 
-   if (!ret)
-   return -EINVAL;
-   }
+   if (!ret)
+   return -EINVAL;
 
return detect_mst_link_for_all_connectors(adev->ddev);
 }
-- 
2.26.2
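
A side note on the bool ret hunk: with the ASIC check gone, ret is
assigned on every path, so dropping the "= false" initializer is
presumably deliberate -- an uninitialized ret lets the compiler
(e.g. -Wmaybe-uninitialized) flag any future path that forgets to set
it before the test. The resulting flow, in sketch form:

	bool ret;	/* no initializer on purpose */

	ret = dmcu_load_iram(dmcu, params);
	if (!ret)
		return -EINVAL;

	return detect_mst_link_for_all_connectors(adev->ddev);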

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu/fru: fix header guard and include header

2020-05-28 Thread Alex Deucher
Fix the fru eeprom header guard and include it in the .c file.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
index 815c072ac4da..6ae80b33182c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
@@ -26,6 +26,7 @@
 #include "amdgpu_i2c.h"
 #include "smu_v11_0_i2c.h"
 #include "atom.h"
+#include "amdgpu_fru_eeprom.h"
 
 #define I2C_PRODUCT_INFO_ADDR  0xAC
 #define I2C_PRODUCT_INFO_ADDR_SIZE 0x2
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h
index 968115c97e33..f29a8611d69b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h
@@ -21,8 +21,8 @@
  *
  */
 
-#ifndef __AMDGPU_PRODINFO_H__
-#define __AMDGPU_PRODINFO_H__
+#ifndef __AMDGPU_FRU_EEPROM_H__
+#define __AMDGPU_FRU_EEPROM_H__
 
 int amdgpu_fru_get_product_info(struct amdgpu_device *adev);
 
-- 
2.25.4
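
For context, including a header from its own .c file lets the compiler
cross-check that prototype and definition agree, and the guard rename
keeps the macro in sync with the file name. The resulting shape,
abridged from the files above:

/* amdgpu_fru_eeprom.h */
#ifndef __AMDGPU_FRU_EEPROM_H__
#define __AMDGPU_FRU_EEPROM_H__

int amdgpu_fru_get_product_info(struct amdgpu_device *adev);

#endif

/* amdgpu_fru_eeprom.c */
#include "amdgpu_fru_eeprom.h"	/* prototype now checked against the definition */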

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC 02/17] dma-fence: basic lockdep annotations

2020-05-28 Thread Intel

On 2020-05-12 10:59, Daniel Vetter wrote:

Design is similar to the lockdep annotations for workers, but with
some twists:

- We use a read-lock for the execution/worker/completion side, so that
   this explicit annotation can be more liberally sprinkled around.
   With read locks lockdep isn't going to complain if the read-side
   isn't nested the same way under all circumstances, so ABBA deadlocks
   are ok. Which they are, since this is an annotation only.

- We're using non-recursive lockdep read lock mode, since in recursive
   read lock mode lockdep does not catch read side hazards. And we
   _very_ much want read side hazards to be caught. For full details of
   this limitation see

   commit e91498589746065e3ae95d9a00b068e525eec34f
   Author: Peter Zijlstra 
   Date:   Wed Aug 23 13:13:11 2017 +0200

   locking/lockdep/selftests: Add mixed read-write ABBA tests

- To allow nesting of the read-side explicit annotations we explicitly
   keep track of the nesting. lock_is_held() allows us to do that.

- The wait-side annotation is a write lock, and entirely done within
   dma_fence_wait() for everyone by default.

- To be able to freely annotate helper functions I want to make it ok
   to call dma_fence_begin/end_signalling from soft/hardirq context.
   First attempt was using the hardirq locking context for the write
   side in lockdep, but this forces all normal spinlocks nested within
   dma_fence_begin/end_signalling to be spinlocks. That bollocks.

   The approach now is to simple check in_atomic(), and for these cases
   entirely rely on the might_sleep() check in dma_fence_wait(). That
   will catch any wrong nesting against spinlocks from soft/hardirq
   contexts.

The idea here is that every code path that's critical for eventually
signalling a dma_fence should be annotated with
dma_fence_begin/end_signalling. The annotation ideally starts right
after a dma_fence is published (added to a dma_resv, exposed as a
sync_file fd, attached to a drm_syncobj fd, or anything else that
makes the dma_fence visible to other kernel threads), up to and
including the dma_fence_wait(). Examples are irq handlers, the
scheduler rt threads, the tail of execbuf (after the corresponding
fences are visible), any workers that end up signalling dma_fences and
really anything else. Not annotated should be code paths that only
complete fences opportunistically as the gpu progresses, like e.g.
shrinker/eviction code.

The main class of deadlocks this is supposed to catch are:

Thread A:

mutex_lock(A);
mutex_unlock(A);

dma_fence_signal();

Thread B:

mutex_lock(A);
dma_fence_wait();
mutex_unlock(A);

Thread B is blocked on A signalling the fence, but A never gets around
to that because it cannot acquire the lock A.

Note that dma_fence_wait() is allowed to be nested within
dma_fence_begin/end_signalling sections. To allow this to happen the
read lock needs to be upgraded to a write lock, which means that any
other lock is acquired between the dma_fence_begin_signalling() call and
the call to dma_fence_wait(), and still held, this will result in an
immediate lockdep complaint. The only other option would be to not
annotate such calls, defeating the point. Therefore these annotations
cannot be sprinkled over the code entirely mindless to avoid false
positives.

v2: handle soft/hardirq ctx better against write side and dont forget
EXPORT_SYMBOL, drivers can't use this otherwise.

Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Cc: linux-r...@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-...@lists.freedesktop.org
Cc: Chris Wilson 
Cc: Maarten Lankhorst 
Cc: Christian König 
Signed-off-by: Daniel Vetter 


LGTM. Perhaps add some in-code documentation on how the new functions 
are meant to be called.


Otherwise, for patches 2 and 3,

Reviewed-by: Thomas Hellstrom 
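
To make the documentation request concrete, a minimal usage sketch
going by the names in the patch description (kernel context assumed;
my_driver_submit_tail is a hypothetical caller, not a tested example):

static void my_driver_submit_tail(struct dma_fence *fence)
{
	bool cookie;

	cookie = dma_fence_begin_signalling();
	/* ... publish the fence (dma_resv, sync_file, drm_syncobj),
	 * then kick off the work that will complete it ... */
	dma_fence_signal(fence);
	dma_fence_end_signalling(cookie);
}

Everything between the begin/end pair is then a fence-signalling
critical section from lockdep's point of view.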


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/3] drm/amd/powerplay: stop thermal IRQs on suspend

2020-05-28 Thread Deucher, Alexander
[AMD Public Use]

Series is:
Reviewed-by: Alex Deucher 

From: Quan, Evan 
Sent: Thursday, May 28, 2020 5:02 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Quan, Evan 

Subject: [PATCH 1/3] drm/amd/powerplay: stop thermal IRQs on suspend

Add the missing thermal IRQ disablement on suspend.

Change-Id: I959a1d56930de434cc8534334220d3faeadf79f8
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c 
b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 8ce907280dc9..12511407683d 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -1537,6 +1537,12 @@ static int smu_suspend(void *handle)

>  smu_i2c_eeprom_fini(smu, &adev->pm.smu_i2c);

+   ret = smu_stop_thermal_control(smu);
+   if (ret) {
> +   pr_warn("Failed to stop thermal control!\n");
+   return ret;
+   }
+
 ret = smu_disable_dpm(smu);
 if (ret)
 return ret;
--
2.26.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu/pm: don't bail for in_suspend

2020-05-28 Thread Alex Deucher
Ping?

On Wed, May 27, 2020 at 6:52 PM Alex Deucher  wrote:
>
> Otherwise we disable sysfs/debugfs access with runtime pm.
>
> Fixes: f7c8d853b029df ("drm/amdgpu/pm: return an error during GPU reset or suspend")
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 114 -
>  1 file changed, 57 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index 808884aaf36d..775e389c9a13 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -163,7 +163,7 @@ static ssize_t amdgpu_get_power_dpm_state(struct device 
> *dev,
> enum amd_pm_state_type pm;
> int ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -199,7 +199,7 @@ static ssize_t amdgpu_set_power_dpm_state(struct device 
> *dev,
> enum amd_pm_state_type  state;
> int ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> if (strncmp("battery", buf, strlen("battery")) == 0)
> @@ -303,7 +303,7 @@ static ssize_t 
> amdgpu_get_power_dpm_force_performance_level(struct device *dev,
> enum amd_dpm_forced_level level = 0xff;
> int ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -343,7 +343,7 @@ static ssize_t 
> amdgpu_set_power_dpm_force_performance_level(struct device *dev,
> enum amd_dpm_forced_level current_level = 0xff;
> int ret = 0;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> if (strncmp("low", buf, strlen("low")) == 0) {
> @@ -445,7 +445,7 @@ static ssize_t amdgpu_get_pp_num_states(struct device 
> *dev,
> struct pp_states_info data;
> int i, buf_len, ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -487,7 +487,7 @@ static ssize_t amdgpu_get_pp_cur_state(struct device *dev,
> enum amd_pm_state_type pm = 0;
> int i = 0, ret = 0;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -526,7 +526,7 @@ static ssize_t amdgpu_get_pp_force_state(struct device 
> *dev,
> struct drm_device *ddev = dev_get_drvdata(dev);
> struct amdgpu_device *adev = ddev->dev_private;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> if (adev->pp_force_state_enabled)
> @@ -546,7 +546,7 @@ static ssize_t amdgpu_set_pp_force_state(struct device 
> *dev,
> unsigned long idx;
> int ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> if (strlen(buf) == 1)
> @@ -604,7 +604,7 @@ static ssize_t amdgpu_get_pp_table(struct device *dev,
> char *table = NULL;
> int size, ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -646,7 +646,7 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
> struct amdgpu_device *adev = ddev->dev_private;
> int ret = 0;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -751,7 +751,7 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device 
> *dev,
> const char delimiter[3] = {' ', '\n', '\0'};
> uint32_t type;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> if (count > 127)
> @@ -843,7 +843,7 @@ static ssize_t amdgpu_get_pp_od_clk_voltage(struct device 
> *dev,
> ssize_t size;
> int ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = pm_runtime_get_sync(ddev->dev);
> @@ -895,7 +895,7 @@ static ssize_t amdgpu_set_pp_features(struct device *dev,
> uint64_t featuremask;
> int ret;
>
> -   if (adev->in_gpu_reset || adev->in_suspend)
> +   if (adev->in_gpu_reset)
> return -EPERM;
>
> ret = kstrtou64(buf, 0, &featuremask);
> @@ -938,7 +938,7 @@ static ssize_t amdgpu_get_pp_features(struct device *dev,
> ssize_t size;
> int ret;
>
> -   if 
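
The same two-line change repeats across every handler in this file.
The resulting pattern, in sketch form (amdgpu_get_something is a
hypothetical handler name; the real handlers return their sysfs
payload rather than 0):

static ssize_t amdgpu_get_something(struct device *dev,
				    struct device_attribute *attr, char *buf)
{
	struct drm_device *ddev = dev_get_drvdata(dev);
	struct amdgpu_device *adev = ddev->dev_private;
	int ret;

	if (adev->in_gpu_reset)			/* reset still blocks access */
		return -EPERM;

	ret = pm_runtime_get_sync(ddev->dev);	/* wakes a suspended GPU */
	if (ret < 0)
		return ret;

	/* ... query or program the power state ... */

	pm_runtime_mark_last_busy(ddev->dev);
	pm_runtime_put_autosuspend(ddev->dev);
	return 0;
}

Dropping the in_suspend check works because runtime PM resumes the
device on demand, which is exactly what sysfs/debugfs access under
runtime suspend needs.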

using amdgpu headless (no monitor)

2020-05-28 Thread Ian Rogers
Hi,

Why can't virtual_display be used along with a physical display?

I see from here: https://bugzilla.kernel.org/show_bug.cgi?id=203339 that the 
intention of the virtual_display module option is to allow for virtual displays 
and purposely disable physical/real displays.
Is there a technical limitation that required this?

I ask because I want to be able to run a system that is sometimes headless and 
sometimes not.  And I'd like to be able to access a current X session (either 
logged in or at the login greeter) both remotely (say via VNC) and locally via 
the physical display (when it is plugged in), without having to reboot or lose 
that X session.

However, I've noticed that (at least with a Ryzen 3 3200G with Radeon Vega 8) an 
X session does not log in successfully when accessed remotely if there is no 
monitor connected.
I assume this is caused by something in the amdgpu driver but I haven't been 
able to figure out what.
So I was hoping that perhaps a solution would involve using virtual_display, 
but that won't work as currently designed, because it disables the physical 
display.

But maybe I've gone down the wrong track and there is another solution.  I've 
tried using the xorg dummy driver in addition to amdgpu, but when doing so, I 
can never get a physical display working.
This is an Ubuntu 20.04 system with kernel 5.4.0.

I can provide lots more detail on what I've tried and my config where needed.

Thanks much
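
For reference, the virtual_display option discussed above is set per
GPU as a PCI bus id plus a CRTC count. A sketch (the bus id below is an
assumption -- substitute the one lspci reports on the actual system):

# /etc/modprobe.d/amdgpu.conf
options amdgpu virtual_display=0000:04:00.0,1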
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs

2020-05-28 Thread Michel Dänzer
On 2020-05-28 11:11 a.m., Christian König wrote:
> Well we still need implicit sync [...]

Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu
doesn't ensure later jobs fully see the effects of previous implicitly
synced jobs", requiring userspace to do pessimistic flushing.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs

2020-05-28 Thread Christian König
Well we still need implicit sync, as otherwise the GPU scheduler would 
pick up the jobs in the wrong order.


See, without this, when we have the following sequence of submissions:

Client IB1 using buffer A
Client IB2
X IB1 using buffer A

We could end up with the execution order

X IB1 using buffer A
Client IB1 using buffer A
Client IB2

And that is not correct. The scheduler is only allowed to do the following:

Client IB1 using buffer A
X IB1 using buffer A
Client IB2

And that's what implicit sync is taking care of.

Christian.

On 2020-05-26 00:07, Marek Olšák wrote:
If a user mode driver is changed to rely on the existence of implicit 
sync, it results in corruption and flickering as reported here: 
https://gitlab.freedesktop.org/mesa/mesa/-/issues/2950


Marek

On Mon, May 25, 2020 at 6:05 PM Marek Olšák wrote:


Hi Christian,

Bas and Michel wanted to discuss this. The main disadvantage of no
implicit (pipeline) sync within the same queue is that we get
lower performance and lower GPU utilization in some cases.

We actually never really needed the kernel to have implicit sync,
because all user mode drivers contained hacks to work without it.

Marek
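
The constraint Christian spells out is a partial order: execution may
reorder jobs only if they share no buffer. A self-contained toy check
of that rule (plain C; the job list is hypothetical):

#include <stdbool.h>
#include <stdio.h>

struct job { const char *name; int buffer; };	/* buffer < 0: none shared */

/* exec[] lists submission indices in execution order; the order is
 * valid iff any two jobs touching the same buffer keep their
 * submission order -- the guarantee implicit sync provides. */
static bool order_valid(const struct job *submit, const int *exec, int n)
{
	for (int i = 0; i < n; i++)
		for (int j = i + 1; j < n; j++) {
			const struct job *a = &submit[exec[i]];
			const struct job *b = &submit[exec[j]];

			if (a->buffer >= 0 && a->buffer == b->buffer &&
			    exec[i] > exec[j])
				return false;	/* later submission ran first */
		}
	return true;
}

int main(void)
{
	struct job submit[] = {
		{ "Client IB1", 0 },	/* buffer A */
		{ "Client IB2", -1 },
		{ "X IB1",      0 },	/* buffer A */
	};
	int bad[]  = { 2, 0, 1 };	/* X IB1 before Client IB1 */
	int good[] = { 0, 2, 1 };	/* the order from the mail above */

	printf("bad:  %s\n", order_valid(submit, bad, 3)  ? "ok" : "violates");
	printf("good: %s\n", order_valid(submit, good, 3) ? "ok" : "violates");
	return 0;
}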



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/3] drm/amd/powerplay: stop thermal IRQs on suspend

2020-05-28 Thread Evan Quan
Add the missing thermal IRQ disablement on suspend.

Change-Id: I959a1d56930de434cc8534334220d3faeadf79f8
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c 
b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 8ce907280dc9..12511407683d 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -1537,6 +1537,12 @@ static int smu_suspend(void *handle)
 
smu_i2c_eeprom_fini(smu, &adev->pm.smu_i2c);
 
+   ret = smu_stop_thermal_control(smu);
+   if (ret) {
+   pr_warn("Failed to stop thermal control!\n");
+   return ret;
+   }
+
ret = smu_disable_dpm(smu);
if (ret)
return ret;
-- 
2.26.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/3] drm/amd/powerplay: use the common APIs for IRQ disablement/enablement

2020-05-28 Thread Evan Quan
Also, new logic for MP1 SW IRQ disablement/enablement is added.

Change-Id: I57ef8f21ab3d51aa0d557f511d89f5fa2ce08144
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 79 ---
 1 file changed, 57 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c 
b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
index edc9782743d2..728965ab6d83 100644
--- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
+++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
@@ -1167,8 +1167,6 @@ static int smu_v11_0_set_thermal_range(struct smu_context 
*smu,
val = RREG32_SOC15(THM, 0, mmTHM_THERMAL_INT_CTRL);
val = REG_SET_FIELD(val, THM_THERMAL_INT_CTRL, MAX_IH_CREDIT, 5);
val = REG_SET_FIELD(val, THM_THERMAL_INT_CTRL, THERM_IH_HW_ENA, 1);
-   val = REG_SET_FIELD(val, THM_THERMAL_INT_CTRL, THERM_INTH_MASK, 0);
-   val = REG_SET_FIELD(val, THM_THERMAL_INT_CTRL, THERM_INTL_MASK, 0);
val = REG_SET_FIELD(val, THM_THERMAL_INT_CTRL, DIG_THERM_INTH, (high & 
0xff));
val = REG_SET_FIELD(val, THM_THERMAL_INT_CTRL, DIG_THERM_INTL, (low & 
0xff));
val = val & (~THM_THERMAL_INT_CTRL__THERM_TRIGGER_MASK_MASK);
@@ -1178,20 +1176,6 @@ static int smu_v11_0_set_thermal_range(struct 
smu_context *smu,
return 0;
 }
 
-static int smu_v11_0_enable_thermal_alert(struct smu_context *smu)
-{
-   struct amdgpu_device *adev = smu->adev;
-   uint32_t val = 0;
-
-   val |= (1 << THM_THERMAL_INT_ENA__THERM_INTH_CLR__SHIFT);
-   val |= (1 << THM_THERMAL_INT_ENA__THERM_INTL_CLR__SHIFT);
-   val |= (1 << THM_THERMAL_INT_ENA__THERM_TRIGGER_CLR__SHIFT);
-
-   WREG32_SOC15(THM, 0, mmTHM_THERMAL_INT_ENA, val);
-
-   return 0;
-}
-
 int smu_v11_0_start_thermal_control(struct smu_context *smu)
 {
int ret = 0;
@@ -1209,7 +1193,7 @@ int smu_v11_0_start_thermal_control(struct smu_context 
*smu)
if (ret)
return ret;
 
-   ret = smu_v11_0_enable_thermal_alert(smu);
+   ret = amdgpu_irq_get(adev, smu->irq_source, 0);
if (ret)
return ret;
 
@@ -1233,11 +1217,7 @@ int smu_v11_0_start_thermal_control(struct smu_context 
*smu)
 
 int smu_v11_0_stop_thermal_control(struct smu_context *smu)
 {
-   struct amdgpu_device *adev = smu->adev;
-
-   WREG32_SOC15(THM, 0, mmTHM_THERMAL_INT_ENA, 0);
-
-   return 0;
+   return amdgpu_irq_put(smu->adev, smu->irq_source, 0);
 }
 
 static uint16_t convert_to_vddc(uint8_t vid)
@@ -1508,6 +1488,59 @@ int smu_v11_0_set_xgmi_pstate(struct smu_context *smu,
return ret;
 }
 
+static int smu_v11_0_set_irq_state(struct amdgpu_device *adev,
+  struct amdgpu_irq_src *source,
+  unsigned type,
+  enum amdgpu_interrupt_state state)
+{
+   uint32_t val = 0;
+
+   switch (state) {
+   case AMDGPU_IRQ_STATE_DISABLE:
+   /* For THM irqs */
+   val = RREG32_SOC15(THM, 0, mmTHM_THERMAL_INT_CTRL);
+   val = REG_SET_FIELD(val, THM_THERMAL_INT_CTRL, THERM_INTH_MASK, 
1);
+   val = REG_SET_FIELD(val, THM_THERMAL_INT_CTRL, THERM_INTL_MASK, 
1);
+   WREG32_SOC15(THM, 0, mmTHM_THERMAL_INT_CTRL, val);
+
+   WREG32_SOC15(THM, 0, mmTHM_THERMAL_INT_ENA, 0);
+
+   /* For MP1 SW irqs */
+   val = RREG32_SOC15(MP1, 0, mmMP1_SMN_IH_SW_INT_CTRL);
+   val = REG_SET_FIELD(val, MP1_SMN_IH_SW_INT_CTRL, INT_MASK, 1);
+   WREG32_SOC15(MP1, 0, mmMP1_SMN_IH_SW_INT_CTRL, val);
+
+   break;
+   case AMDGPU_IRQ_STATE_ENABLE:
+   /* For THM irqs */
+   val = RREG32_SOC15(THM, 0, mmTHM_THERMAL_INT_CTRL);
+   val = REG_SET_FIELD(val, THM_THERMAL_INT_CTRL, THERM_INTH_MASK, 
0);
+   val = REG_SET_FIELD(val, THM_THERMAL_INT_CTRL, THERM_INTL_MASK, 
0);
+   WREG32_SOC15(THM, 0, mmTHM_THERMAL_INT_CTRL, val);
+
+   val = (1 << THM_THERMAL_INT_ENA__THERM_INTH_CLR__SHIFT);
+   val |= (1 << THM_THERMAL_INT_ENA__THERM_INTL_CLR__SHIFT);
+   val |= (1 << THM_THERMAL_INT_ENA__THERM_TRIGGER_CLR__SHIFT);
+   WREG32_SOC15(THM, 0, mmTHM_THERMAL_INT_ENA, val);
+
+   /* For MP1 SW irqs */
+   val = RREG32_SOC15(MP1, 0, mmMP1_SMN_IH_SW_INT);
+   val = REG_SET_FIELD(val, MP1_SMN_IH_SW_INT, ID, 0xFE);
+   val = REG_SET_FIELD(val, MP1_SMN_IH_SW_INT, VALID, 0);
+   WREG32_SOC15(MP1, 0, mmMP1_SMN_IH_SW_INT, val);
+
+   val = RREG32_SOC15(MP1, 0, mmMP1_SMN_IH_SW_INT_CTRL);
+   val = REG_SET_FIELD(val, MP1_SMN_IH_SW_INT_CTRL, INT_MASK, 0);
+   WREG32_SOC15(MP1, 0, mmMP1_SMN_IH_SW_INT_CTRL, val);
+
+   break;
+   default:
+   break;
+   }
+
+   return 0;
+}
+
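
The hunk above leans on the usual read-modify-write register helpers
(RREG32_SOC15 / REG_SET_FIELD / WREG32_SOC15). A standalone model of
what such a field update amounts to -- the shift and mask below are
illustrative, not the real THM register layout:

#include <stdint.h>
#include <stdio.h>

#define FIELD_SHIFT	24
#define FIELD_MASK	(1u << FIELD_SHIFT)

/* clear the field's bits, then OR in the new value */
static uint32_t set_field(uint32_t reg, uint32_t mask, unsigned int shift,
			  uint32_t val)
{
	return (reg & ~mask) | ((val << shift) & mask);
}

int main(void)
{
	uint32_t val = 0xffffffffu;	/* stands in for RREG32_SOC15(...) */

	val = set_field(val, FIELD_MASK, FIELD_SHIFT, 0);
	printf("0x%08x\n", val);	/* bit 24 cleared: 0xfeffffff */
	/* ... and would be written back with WREG32_SOC15(...) */
	return 0;
}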

[PATCH 3/3] drm/amd/powerplay: give better names for the thermal IRQ related APIs

2020-05-28 Thread Evan Quan
Thermal control is performed by the PMFW. What is handled in the driver
is just whether or not to enable the alert (to the driver).

Change-Id: Icf857054b74f021e7fee2bf3aa9b314aa0d5ef09
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 8 
 drivers/gpu/drm/amd/powerplay/arcturus_ppt.c   | 4 ++--
 drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h | 4 ++--
 drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h  | 4 ++--
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c | 4 ++--
 drivers/gpu/drm/amd/powerplay/smu_internal.h   | 8 
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c  | 4 ++--
 7 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c 
b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 12511407683d..5294aa7cdde1 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -1350,7 +1350,7 @@ static int smu_hw_init(void *handle)
if (ret)
goto failed;
 
-   ret = smu_start_thermal_control(smu);
+   ret = smu_enable_thermal_alert(smu);
if (ret)
goto failed;
 
@@ -1396,7 +1396,7 @@ static int smu_hw_fini(void *handle)
 
smu_i2c_eeprom_fini(smu, &adev->pm.smu_i2c);
 
-   ret = smu_stop_thermal_control(smu);
+   ret = smu_disable_thermal_alert(smu);
if (ret) {
pr_warn("Failed to stop thermal control!\n");
return ret;
@@ -1537,7 +1537,7 @@ static int smu_suspend(void *handle)
 
smu_i2c_eeprom_fini(smu, &adev->pm.smu_i2c);
 
-   ret = smu_stop_thermal_control(smu);
+   ret = smu_disable_thermal_alert(smu);
if (ret) {
pr_warn("Failed to stop thermal control!\n");
return ret;
@@ -1582,7 +1582,7 @@ static int smu_resume(void *handle)
if (ret)
goto failed;
 
-   ret = smu_start_thermal_control(smu);
+   ret = smu_enable_thermal_alert(smu);
if (ret)
goto failed;
 
diff --git a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c 
b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
index 5c1b2d7abcaa..302b7e9cb5ba 100644
--- a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
@@ -2442,8 +2442,8 @@ static const struct pptable_funcs arcturus_ppt_funcs = {
.set_power_limit = smu_v11_0_set_power_limit,
.get_current_clk_freq = smu_v11_0_get_current_clk_freq,
.init_max_sustainable_clocks = smu_v11_0_init_max_sustainable_clocks,
-   .start_thermal_control = smu_v11_0_start_thermal_control,
-   .stop_thermal_control = smu_v11_0_stop_thermal_control,
+   .enable_thermal_alert = smu_v11_0_enable_thermal_alert,
+   .disable_thermal_alert = smu_v11_0_disable_thermal_alert,
.set_deep_sleep_dcefclk = smu_v11_0_set_deep_sleep_dcefclk,
.display_clock_voltage_request = 
smu_v11_0_display_clock_voltage_request,
.get_fan_control_mode = smu_v11_0_get_fan_control_mode,
diff --git a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
index 1223d298c03f..5bb1ac821aeb 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
@@ -530,8 +530,8 @@ struct pptable_funcs {
int (*set_power_limit)(struct smu_context *smu, uint32_t n);
int (*get_current_clk_freq)(struct smu_context *smu, enum smu_clk_type 
clk_id, uint32_t *value);
int (*init_max_sustainable_clocks)(struct smu_context *smu);
-   int (*start_thermal_control)(struct smu_context *smu);
-   int (*stop_thermal_control)(struct smu_context *smu);
+   int (*enable_thermal_alert)(struct smu_context *smu);
+   int (*disable_thermal_alert)(struct smu_context *smu);
int (*set_deep_sleep_dcefclk)(struct smu_context *smu, uint32_t clk);
int (*set_active_display_count)(struct smu_context *smu, uint32_t 
count);
int (*store_cc6_data)(struct smu_context *smu, uint32_t separation_time,
diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h 
b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
index f3f34a0f5602..71f829ab306e 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
@@ -207,9 +207,9 @@ int smu_v11_0_get_current_clk_freq(struct smu_context *smu,
 
 int smu_v11_0_init_max_sustainable_clocks(struct smu_context *smu);
 
-int smu_v11_0_start_thermal_control(struct smu_context *smu);
+int smu_v11_0_enable_thermal_alert(struct smu_context *smu);
 
-int smu_v11_0_stop_thermal_control(struct smu_context *smu);
+int smu_v11_0_disable_thermal_alert(struct smu_context *smu);
 
 int smu_v11_0_read_sensor(struct smu_context *smu,
 enum amd_pp_sensors sensor,
diff --git a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c 
b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
index ef4952afb365..68142f6798c6 100644
---