RE: [PATCH 1/2] drm/amdgpu: update smu_v11_0_pptable.h

2020-02-06 Thread Quan, Evan
> SMU_11_0_ODFEATURE_COUNT= 14,
This seems a little weird. 
Maybe it should be "SMU_11_0_ODFEATURE_COUNT = 1 << SMU_11_0_ODCAP_COUNT, "
With the above confirmed, the patch series is Reviewed-by: Evan Quan 
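
For reference, the relationship the patch establishes between the two enums
is index vs. bitmask (names taken from the patch below; the helper is only
an illustration, not code from the series). Note that SMU_11_0_ODCAP_COUNT
and SMU_11_0_ODFEATURE_COUNT are both 14 in the patch as posted:

/* ODCAP_* values are array indices (0..13); ODFEATURE_* values are the
 * matching single-bit masks. */
static inline unsigned int odcap_to_feature_mask(unsigned int cap)
{
	return 1u << cap;	/* e.g. SMU_11_0_ODCAP_FAN_CURVE (13) -> 1 << 13 */
}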


>-Original Message-
>From: amd-gfx  On Behalf Of Alex
>Deucher
>Sent: Friday, February 7, 2020 3:55 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander 
>Subject: [PATCH 1/2] drm/amdgpu: update smu_v11_0_pptable.h
>
>Update to the latest changes.
>
>Signed-off-by: Alex Deucher 
>---
> .../drm/amd/powerplay/inc/smu_v11_0_pptable.h | 46 +--
> 1 file changed, 32 insertions(+), 14 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h
>b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h
>index b2f96a101124..7a63cf8e85ed 100644
>--- a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h
>+++ b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h
>@@ -39,21 +39,39 @@
> #define SMU_11_0_PP_OVERDRIVE_VERSION   0x0800
> #define SMU_11_0_PP_POWERSAVINGCLOCK_VERSION0x0100
>
>+enum SMU_11_0_ODFEATURE_CAP {
>+SMU_11_0_ODCAP_GFXCLK_LIMITS = 0,
>+SMU_11_0_ODCAP_GFXCLK_CURVE,
>+SMU_11_0_ODCAP_UCLK_MAX,
>+SMU_11_0_ODCAP_POWER_LIMIT,
>+SMU_11_0_ODCAP_FAN_ACOUSTIC_LIMIT,
>+SMU_11_0_ODCAP_FAN_SPEED_MIN,
>+SMU_11_0_ODCAP_TEMPERATURE_FAN,
>+SMU_11_0_ODCAP_TEMPERATURE_SYSTEM,
>+SMU_11_0_ODCAP_MEMORY_TIMING_TUNE,
>+SMU_11_0_ODCAP_FAN_ZERO_RPM_CONTROL,
>+SMU_11_0_ODCAP_AUTO_UV_ENGINE,
>+SMU_11_0_ODCAP_AUTO_OC_ENGINE,
>+SMU_11_0_ODCAP_AUTO_OC_MEMORY,
>+SMU_11_0_ODCAP_FAN_CURVE,
>+SMU_11_0_ODCAP_COUNT,
>+};
>+
> enum SMU_11_0_ODFEATURE_ID {
>-SMU_11_0_ODFEATURE_GFXCLK_LIMITS= 1 << 0, //GFXCLK Limit feature
>-SMU_11_0_ODFEATURE_GFXCLK_CURVE = 1 << 1, //GFXCLK Curve feature
>-SMU_11_0_ODFEATURE_UCLK_MAX = 1 << 2, //UCLK Limit feature
>-SMU_11_0_ODFEATURE_POWER_LIMIT  = 1 << 3, //Power Limit feature
>-SMU_11_0_ODFEATURE_FAN_ACOUSTIC_LIMIT   = 1 << 4, //Fan Acoustic RPM feature
>-SMU_11_0_ODFEATURE_FAN_SPEED_MIN= 1 << 5, //Minimum Fan Speed feature
>-SMU_11_0_ODFEATURE_TEMPERATURE_FAN  = 1 << 6, //Fan Target Temperature Limit feature
>-SMU_11_0_ODFEATURE_TEMPERATURE_SYSTEM   = 1 << 7, //Operating Temperature Limit feature
>-SMU_11_0_ODFEATURE_MEMORY_TIMING_TUNE   = 1 << 8, //AC Timing Tuning feature
>-SMU_11_0_ODFEATURE_FAN_ZERO_RPM_CONTROL = 1 << 9, //Zero RPM feature
>-SMU_11_0_ODFEATURE_AUTO_UV_ENGINE   = 1 << 10,//Auto Under Volt GFXCLK feature
>-SMU_11_0_ODFEATURE_AUTO_OC_ENGINE   = 1 << 11,//Auto Over Clock GFXCLK feature
>-SMU_11_0_ODFEATURE_AUTO_OC_MEMORY   = 1 << 12,//Auto Over Clock MCLK feature
>-SMU_11_0_ODFEATURE_FAN_CURVE= 1 << 13,//VICTOR TODO
>+SMU_11_0_ODFEATURE_GFXCLK_LIMITS= 1 << SMU_11_0_ODCAP_GFXCLK_LIMITS,//GFXCLK Limit feature
>+SMU_11_0_ODFEATURE_GFXCLK_CURVE = 1 << SMU_11_0_ODCAP_GFXCLK_CURVE, //GFXCLK Curve feature
>+SMU_11_0_ODFEATURE_UCLK_MAX = 1 << SMU_11_0_ODCAP_UCLK_MAX, //UCLK Limit feature
>+SMU_11_0_ODFEATURE_POWER_LIMIT  = 1 << SMU_11_0_ODCAP_POWER_LIMIT,  //Power Limit feature
>+SMU_11_0_ODFEATURE_FAN_ACOUSTIC_LIMIT   = 1 << SMU_11_0_ODCAP_FAN_ACOUSTIC_LIMIT,   //Fan Acoustic RPM feature
>+SMU_11_0_ODFEATURE_FAN_SPEED_MIN= 1 << SMU_11_0_ODCAP_FAN_SPEED_MIN,//Minimum Fan Speed feature
>+SMU_11_0_ODFEATURE_TEMPERATURE_FAN  = 1 << SMU_11_0_ODCAP_TEMPERATURE_FAN,  //Fan Target Temperature Limit feature
>+SMU_11_0_ODFEATURE_TEMPERATURE_SYSTEM   = 1 << SMU_11_0_ODCAP_TEMPERATURE_SYSTEM,   //Operating Temperature Limit feature
>+SMU_11_0_ODFEATURE_MEMORY_TIMING_TUNE   = 1 << SMU_11_0_ODCAP_MEMORY_TIMING_TUNE,   //AC Timing Tuning feature
>+SMU_11_0_ODFEATURE_FAN_ZERO_RPM_CONTROL = 1 << SMU_11_0_ODCAP_FAN_ZERO_RPM_CONTROL, //Zero RPM feature
>+SMU_11_0_ODFEATURE_AUTO_UV_ENGINE   = 1 << SMU_11_0_ODCAP_AUTO_UV_ENGINE,   //Auto Under Volt GFXCLK feature
>+SMU_11_0_ODFEATURE_AUTO_OC_ENGINE   = 1 << SMU_11_0_ODCAP_AUTO_OC_ENGINE,   //Auto Over Clock GFXCLK feature
>+SMU_11_0_ODFEATURE_AUTO_OC_MEMORY   = 1 << SMU_11_0_ODCAP_AUTO_OC_MEMORY,   //Auto Over Clock MCLK feature
>+SMU_11_0_ODFEATURE_FAN_CURVE= 1 << SMU_11_0_ODCAP_FAN_CURVE,//Fan Curve feature
> SMU_11_0_ODFEATURE_COUNT= 14,
> };
> #define SMU_11_0_MAX_ODFEATURE32  //Maximum Number of OD Features
>--
>2.24.1
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: always reset asic when going into suspend

2020-02-06 Thread Daniel Drake
On Thu, Jan 16, 2020 at 11:15 PM Alex Deucher  wrote:
> It's just papering over the problem.  It would be better from a power
> perspective for the driver to just not suspend and keep running like
> normal.  When the driver is not suspended runtime things like clock
> and power gating are active which keep the GPU power at a minimum.

Until we have a better solution, are there any strategies we could
apply here to avoid the suspend as you say?
e.g. DMI quirk these products to disable suspend? Or disable suspend
on all s2idle setups?

This would certainly be better than the current situation of the
machine becoming unusable on resume.
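
For what it's worth, the DMI-quirk approach mentioned above would roughly
take the following shape. This is purely a hypothetical sketch: the
vendor/product strings, the amdgpu_no_s2idle_suspend flag and the callback
are invented for illustration, and where exactly such a quirk would hook
into the suspend path is the open question.

#include <linux/dmi.h>

static bool amdgpu_no_s2idle_suspend;

static int amdgpu_dmi_disable_suspend(const struct dmi_system_id *id)
{
	amdgpu_no_s2idle_suspend = true;	/* consulted in the suspend path */
	return 1;
}

static const struct dmi_system_id amdgpu_s2idle_quirks[] = {
	{
		.callback = amdgpu_dmi_disable_suspend,
		.ident = "Example s2idle laptop",
		.matches = {
			DMI_MATCH(DMI_SYS_VENDOR, "ExampleVendor"),
			DMI_MATCH(DMI_PRODUCT_NAME, "ExampleModel"),
		},
	},
	{ }
};

/* at init time: dmi_check_system(amdgpu_s2idle_quirks); */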

> I talked to our sbios team and they seem to think our S0ix
> implementation works pretty differently from Intel's.  I'm not really
> an expert on this area however.  We have a new team ramping on up this
> for Linux however.

Thanks for following up on this internally! Can I lend a product
sample to the new team so that they have direct access?

Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[Patch v3 3/4] drm/amdkfd: refactor runtime pm for baco

2020-02-06 Thread Rajneesh Bhardwaj
So far the kfd driver has implemented the same routines for runtime and
system-wide suspend and resume (s2idle or mem). During system-wide suspend
the kfd driver acquires an atomic lock that prevents any further user
processes from creating queues and interacting with the kfd driver and the
amd gpu. This mechanism creates a problem when the amdgpu device is runtime
suspended with BACO enabled: any application that relies on the kfd driver
fails to load, because the driver reports a locked kfd device while the gpu
is runtime suspended.

However, in the ideal case, when the gpu is runtime suspended the kfd driver
should be able to:

 - auto-resume the amdgpu driver whenever a client requests compute service
 - prevent runtime suspend of amdgpu while kfd is in use

This change refactors the amdgpu and amdkfd drivers to support BACO and
runtime power management.
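
As a rough illustration of the two bullets above, the shape of the
runtime-pm handshake looks something like the sketch below. This is not
the patch itself: the helper names (kfd_dev_grab_gpu/kfd_dev_release_gpu)
and the kfd->ddev field access are made up for illustration; the real
series routes this through the kgd2kfd_suspend()/kgd2kfd_resume() and
amdgpu_amdkfd_* wrappers shown in the diff.

#include <linux/pm_runtime.h>
#include "kfd_priv.h"		/* struct kfd_dev */

/* Called when a compute client starts using the device: wake the GPU
 * (exits BACO if runtime suspended) and hold a runtime-pm reference so
 * the GPU cannot runtime suspend while kfd is in use. */
static int kfd_dev_grab_gpu(struct kfd_dev *kfd)
{
	int ret = pm_runtime_get_sync(kfd->ddev->dev);

	if (ret < 0) {
		pm_runtime_put_noidle(kfd->ddev->dev);
		return ret;
	}
	return 0;
}

/* Called when the last compute client goes away: drop the reference and
 * let the autosuspend timer put the GPU back into BACO. */
static void kfd_dev_release_gpu(struct kfd_dev *kfd)
{
	pm_runtime_mark_last_busy(kfd->ddev->dev);
	pm_runtime_put_autosuspend(kfd->ddev->dev);
}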

Reviewed-by: Oak Zeng 
Reviewed-by: Felix Kuehling 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 12 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  8 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 +--
 drivers/gpu/drm/amd/amdkfd/kfd_device.c| 29 +---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c   | 40 --
 6 files changed, 68 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 8609287620ea..314c4a2a0354 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -178,18 +178,18 @@ void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev,
kgd2kfd_interrupt(adev->kfd.dev, ih_ring_entry);
 }
 
-void amdgpu_amdkfd_suspend(struct amdgpu_device *adev)
+void amdgpu_amdkfd_suspend(struct amdgpu_device *adev, bool run_pm)
 {
if (adev->kfd.dev)
-   kgd2kfd_suspend(adev->kfd.dev);
+   kgd2kfd_suspend(adev->kfd.dev, run_pm);
 }
 
-int amdgpu_amdkfd_resume(struct amdgpu_device *adev)
+int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool run_pm)
 {
int r = 0;
 
if (adev->kfd.dev)
-   r = kgd2kfd_resume(adev->kfd.dev);
+   r = kgd2kfd_resume(adev->kfd.dev, run_pm);
 
return r;
 }
@@ -713,11 +713,11 @@ void kgd2kfd_exit(void)
 {
 }
 
-void kgd2kfd_suspend(struct kfd_dev *kfd)
+void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 {
 }
 
-int kgd2kfd_resume(struct kfd_dev *kfd)
+int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
 {
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 47b0f2957d1f..9e8db702d878 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -122,8 +122,8 @@ struct amdkfd_process_info {
 int amdgpu_amdkfd_init(void);
 void amdgpu_amdkfd_fini(void);
 
-void amdgpu_amdkfd_suspend(struct amdgpu_device *adev);
-int amdgpu_amdkfd_resume(struct amdgpu_device *adev);
+void amdgpu_amdkfd_suspend(struct amdgpu_device *adev, bool run_pm);
+int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool run_pm);
 void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev,
const void *ih_ring_entry);
 void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev);
@@ -249,8 +249,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 struct drm_device *ddev,
 const struct kgd2kfd_shared_resources *gpu_resources);
 void kgd2kfd_device_exit(struct kfd_dev *kfd);
-void kgd2kfd_suspend(struct kfd_dev *kfd);
-int kgd2kfd_resume(struct kfd_dev *kfd);
+void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm);
+int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm);
 int kgd2kfd_pre_reset(struct kfd_dev *kfd);
 int kgd2kfd_post_reset(struct kfd_dev *kfd);
 void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f33c49ed4f94..18fa78806317 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3312,7 +3312,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
}
}
 
-   amdgpu_amdkfd_suspend(adev);
+   amdgpu_amdkfd_suspend(adev, !fbcon);
 
amdgpu_ras_suspend(adev);
 
@@ -3396,7 +3396,7 @@ int amdgpu_device_resume(struct drm_device *dev, bool 
fbcon)
}
}
}
-   r = amdgpu_amdkfd_resume(adev);
+   r = amdgpu_amdkfd_resume(adev, !fbcon);
if (r)
return r;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 798ad1c8f799..42ee9ea5c45a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -732,7 +732,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 void 

[Patch v3 1/4] drm/amdgpu: Fix missing error check in suspend

2020-02-06 Thread Rajneesh Bhardwaj
amdgpu_device_suspend might return an error code since it can be called
from both the runtime and system suspend flows. Add the missing check and
propagate the return code in case of a failure.

Reviewed-by: Oak Zeng 
Reviewed-by: Felix Kuehling 
Reviewed-by: Alex Deucher 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index f28d040de3ce..0026ff56542c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1235,6 +1235,9 @@ static int amdgpu_pmops_runtime_suspend(struct device 
*dev)
drm_kms_helper_poll_disable(drm_dev);
 
ret = amdgpu_device_suspend(drm_dev, false);
+   if (ret)
+   return ret;
+
if (amdgpu_device_supports_boco(drm_dev)) {
/* Only need to handle PCI state in the driver for ATPX
 * PCI core handles it for _PR3.
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[Patch v3 0/4] Enable BACO with KFD

2020-02-06 Thread Rajneesh Bhardwaj
Changes in v3:
 * Addressed review comments.
 * Rebased to latest amd-staging-drm-next.
 * Slightly modified patch 4 to avoid runpm on devices where baco is not
   yet fully supported for runtime pm but still is used for gpu reset.

Changes in v2:
 * Rebased on latest amd-staging-drm-next
 * Addressed review comments from Felix, Oak and Alex for v1
 * Removed 60 second hack for auto-suspend delay and simplified the
   logic
 * Dropped kfd debugfs patch
 * Folded in Alex's patch from this series to enable and test with kfd.
   https://patchwork.freedesktop.org/series/67885/ also fixed a
   checkpatch warning.

Link to v1: https://www.spinics.net/lists/amd-gfx/msg44551.html

This series aims to enable BACO by default on supported AMD platforms
and ensures that the AMD Kernel Fusion Driver (KFD) can co-exist with this
feature when the GPU devices are runtime suspended and the firmware pushes
the envelope to save more power with the BACO entry sequence. The current
approach makes sure that if KFD is using GPU services for compute, it
won't let the AMDGPU driver suspend, and if the AMDGPU driver is already
runtime suspended with the GPUs in deep power saving mode with BACO, the
KFD driver wakes up AMDGPU and then starts the compute workload
execution.

This series has been tested with a single GPU (fiji) and dual GPUs (fiji
and fiji/tonga) and seems to work fine with kfdtest. System-wide suspend
to mem and suspend to idle were also tested with this series and worked
fine.


Alex Deucher (1):
  drm/amdgpu/runpm: enable runpm on baco capable VI+ asics

Rajneesh Bhardwaj (3):
  drm/amdgpu: Fix missing error check in suspend
  drm/amdkfd: show warning when kfd is locked
  drm/amdkfd: refactor runtime pm for baco

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 12 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  8 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 10 --
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_device.c| 29 +---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c   | 40 --
 9 files changed, 81 insertions(+), 28 deletions(-)

-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[Patch v3 2/4] drm/amdkfd: show warning when kfd is locked

2020-02-06 Thread Rajneesh Bhardwaj
During system suspend the kfd driver acquires a lock that prohibits
further kfd actions until the gpu is resumed. This adds some info which
can be useful while debugging.

Reviewed-by: Oak Zeng 
Reviewed-by: Felix Kuehling 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 275f79ab0900..86b919d82129 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -127,6 +127,8 @@ static int kfd_open(struct inode *inode, struct file *filep)
return PTR_ERR(process);
 
if (kfd_is_locked()) {
+   dev_dbg(kfd_device, "kfd is locked!\n"
+   "process %d unreferenced", process->pasid);
kfd_unref_process(process);
return -EAGAIN;
}
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[Patch v3 4/4] drm/amdgpu/runpm: enable runpm on baco capable VI+ asics

2020-02-06 Thread Rajneesh Bhardwaj
From: Alex Deucher 

BACO seems to work reliably on VI and newer asics, except for a few, so
enable runpm there, barring the asics where baco is not supported for
runtime power management.

[rajneesh] Picked https://patchwork.freedesktop.org/patch/335402/ to
enable runtime pm with baco for kfd. Also fixed a checkpatch warning and
added extra checks for VEGA20 and ARCTURUS.

Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 3a0ea9096498..0f3563926ad1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -170,10 +170,16 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
unsigned long flags)
}
 
if (amdgpu_device_supports_boco(dev) &&
-   (amdgpu_runtime_pm != 0)) /* enable runpm by default */
+   (amdgpu_runtime_pm != 0)) /* enable runpm by default for boco */
adev->runpm = true;
else if (amdgpu_device_supports_baco(dev) &&
-(amdgpu_runtime_pm > 0)) /* enable runpm if runpm=1 */
+(amdgpu_runtime_pm != 0) &&
+(adev->asic_type >= CHIP_TOPAZ) &&
+(adev->asic_type != CHIP_VEGA20) &&
+(adev->asic_type != CHIP_ARCTURUS)) /* enable runpm on VI+ */
+   adev->runpm = true;
+   else if (amdgpu_device_supports_baco(dev) &&
+(amdgpu_runtime_pm > 0))  /* enable runpm if runpm=1 on CI */
adev->runpm = true;
 
/* Call ACPI methods: require modeset init
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 3/5] drm/amd/display: update HDCP DTM immediately after hardware programming

2020-02-06 Thread Bhawanpreet Lakha
From: Wenjing Liu 

[why]
The HDCP DTM needs to be aware of the up-to-date display topology
information in order to validate hardware consistency.

[how]
Update the HDCP DTM on the update_stream_config call.

Signed-off-by: Wenjing Liu 
Reviewed-by: Jun Lei 
---
 .../gpu/drm/amd/display/modules/hdcp/hdcp.c   |  46 ++-
 .../gpu/drm/amd/display/modules/hdcp/hdcp.h   |  10 +-
 .../display/modules/hdcp/hdcp1_execution.c|   4 -
 .../display/modules/hdcp/hdcp1_transition.c   |   6 +-
 .../display/modules/hdcp/hdcp2_execution.c|   6 +-
 .../display/modules/hdcp/hdcp2_transition.c   |   6 +-
 .../drm/amd/display/modules/hdcp/hdcp_log.h   |   9 ++
 .../drm/amd/display/modules/hdcp/hdcp_psp.c   | 129 ++
 8 files changed, 106 insertions(+), 110 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c 
b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c
index a7d24734c7cd..83eaec4c6ad7 100644
--- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c
+++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c
@@ -104,8 +104,6 @@ static enum mod_hdcp_status execution(struct mod_hdcp *hdcp,
event_ctx->unexpected_event = 1;
goto out;
}
-   /* update topology event if hdcp is not desired */
-   status = mod_hdcp_add_display_topology(hdcp);
} else if (is_in_hdcp1_states(hdcp)) {
		status = mod_hdcp_hdcp1_execution(hdcp, event_ctx, &hdcp->hdcp1);
} else if (is_in_hdcp1_dp_states(hdcp)) {
@@ -192,14 +190,7 @@ static enum mod_hdcp_status reset_authentication(struct 
mod_hdcp *hdcp,
mod_hdcp_hdcp1_destroy_session(hdcp);
 
}
-   if (hdcp->auth.trans_input.hdcp1.add_topology == PASS) {
-   status = mod_hdcp_remove_display_topology(hdcp);
-   if (status != MOD_HDCP_STATUS_SUCCESS) {
-   output->callback_needed = 0;
-   output->watchdog_timer_needed = 0;
-   goto out;
-   }
-   }
+
HDCP_TOP_RESET_AUTH_TRACE(hdcp);
	memset(&hdcp->auth, 0, sizeof(struct mod_hdcp_authentication));
	memset(&hdcp->state, 0, sizeof(struct mod_hdcp_state));
@@ -213,25 +204,12 @@ static enum mod_hdcp_status reset_authentication(struct 
mod_hdcp *hdcp,
goto out;
}
}
-   if (hdcp->auth.trans_input.hdcp2.add_topology == PASS) {
-   status = mod_hdcp_remove_display_topology(hdcp);
-   if (status != MOD_HDCP_STATUS_SUCCESS) {
-   output->callback_needed = 0;
-   output->watchdog_timer_needed = 0;
-   goto out;
-   }
-   }
+
HDCP_TOP_RESET_AUTH_TRACE(hdcp);
	memset(&hdcp->auth, 0, sizeof(struct mod_hdcp_authentication));
	memset(&hdcp->state, 0, sizeof(struct mod_hdcp_state));
set_state_id(hdcp, output, HDCP_INITIALIZED);
} else if (is_in_cp_not_desired_state(hdcp)) {
-   status = mod_hdcp_remove_display_topology(hdcp);
-   if (status != MOD_HDCP_STATUS_SUCCESS) {
-   output->callback_needed = 0;
-   output->watchdog_timer_needed = 0;
-   goto out;
-   }
HDCP_TOP_RESET_AUTH_TRACE(hdcp);
	memset(&hdcp->auth, 0, sizeof(struct mod_hdcp_authentication));
	memset(&hdcp->state, 0, sizeof(struct mod_hdcp_state));
@@ -338,16 +316,19 @@ enum mod_hdcp_status mod_hdcp_add_display(struct mod_hdcp 
*hdcp,
if (status != MOD_HDCP_STATUS_SUCCESS)
goto out;
 
-   /* add display to connection */
-   hdcp->connection.link = *link;
-   *display_container = *display;
-
/* reset retry counters */
reset_retry_counts(hdcp);
 
/* reset error trace */
	memset(&hdcp->connection.trace, 0, sizeof(hdcp->connection.trace));
 
+   /* add display to connection */
+   hdcp->connection.link = *link;
+   *display_container = *display;
+   status = mod_hdcp_add_display_to_topology(hdcp, display->index);
+   if (status != MOD_HDCP_STATUS_SUCCESS)
+   goto out;
+
/* request authentication */
if (current_state(hdcp) != HDCP_INITIALIZED)
set_state_id(hdcp, output, HDCP_INITIALIZED);
@@ -380,15 +361,18 @@ enum mod_hdcp_status mod_hdcp_remove_display(struct 
mod_hdcp *hdcp,
if (status != MOD_HDCP_STATUS_SUCCESS)
goto out;
 
-   /* remove display */
-   display->state = MOD_HDCP_DISPLAY_INACTIVE;
-
/* clear retry counters */
reset_retry_counts(hdcp);
 
/* reset error trace */
memset(>connection.trace, 0, 

[PATCH 5/5] drm/amd/display: Fix message for encryption

2020-02-06 Thread Bhawanpreet Lakha
-msg_in is not needed for enabling encryption.
-Use hdcp2_set_encryption instead of hdcp1_enable_encryption for hdcp2.2

Signed-off-by: Bhawanpreet Lakha 
Reviewed-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c 
b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c
index acbe3e8a8eb7..d9cb2383d6de 100644
--- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c
+++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c
@@ -662,20 +662,15 @@ enum mod_hdcp_status 
mod_hdcp_hdcp2_enable_encryption(struct mod_hdcp *hdcp)
 {
struct psp_context *psp = hdcp->config.psp.handle;
struct ta_hdcp_shared_memory *hdcp_cmd;
-   struct 
ta_hdcp_cmd_hdcp2_process_prepare_authentication_message_input_v2 *msg_in;
struct mod_hdcp_display *display = get_first_added_display(hdcp);
 
hdcp_cmd = (struct ta_hdcp_shared_memory 
*)psp->hdcp_context.hdcp_shared_buf;
memset(hdcp_cmd, 0, sizeof(struct ta_hdcp_shared_memory));
 
-   msg_in = &hdcp_cmd->in_msg.hdcp2_prepare_process_authentication_message_v2;
-
-   hdcp2_message_init(hdcp, msg_in);
-
if (!display)
return MOD_HDCP_STATUS_DISPLAY_NOT_FOUND;
 
-   hdcp_cmd->in_msg.hdcp1_enable_encryption.session_handle = hdcp->auth.id;
+   hdcp_cmd->in_msg.hdcp2_set_encryption.session_handle = hdcp->auth.id;
 
hdcp_cmd->cmd_id = TA_HDCP_COMMAND__HDCP2_SET_ENCRYPTION;
psp_hdcp_invoke(psp, hdcp_cmd->cmd_id);
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 0/5] HDCP fixes

2020-02-06 Thread Bhawanpreet Lakha
Summary of changes:
 * handle revoked receivers
 * don't retry if revoked
 * fix rx_caps check typo
 * fix enable encryption call to psp for 2.2
 * update DTM right after HW programming


Bhawanpreet Lakha (3):
  drm/amd/display: Handle revoked receivers
  drm/amd/display: fix backwards byte order in rx_caps.
  drm/amd/display: Fix message for encryption

Wenjing Liu (2):
  drm/amd/display: no hdcp retry if bksv or ksv list is revoked
  drm/amd/display: update HDCP DTM immediately after hardware
programming

 .../gpu/drm/amd/display/modules/hdcp/hdcp.c   |  49 ++
 .../gpu/drm/amd/display/modules/hdcp/hdcp.h   |  11 +-
 .../display/modules/hdcp/hdcp1_execution.c|   4 -
 .../display/modules/hdcp/hdcp1_transition.c   |  12 +-
 .../display/modules/hdcp/hdcp2_execution.c|  10 +-
 .../display/modules/hdcp/hdcp2_transition.c   |   6 +-
 .../drm/amd/display/modules/hdcp/hdcp_log.c   |   4 +
 .../drm/amd/display/modules/hdcp/hdcp_log.h   |   9 +
 .../drm/amd/display/modules/hdcp/hdcp_psp.c   | 164 +++---
 .../drm/amd/display/modules/hdcp/hdcp_psp.h   |   6 +-
 .../drm/amd/display/modules/inc/mod_hdcp.h|   2 +
 11 files changed, 150 insertions(+), 127 deletions(-)

-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/5] drm/amd/display: Handle revoked receivers

2020-02-06 Thread Bhawanpreet Lakha
[Why]
PSP added a new return code for revoked receivers (SRM). We need to
handle that so we don't retry hdcp.

This is already being handled on Windows.

[How]
Add the enums to the psp interface header and handle them.

Signed-off-by: Bhawanpreet Lakha 
Reviewed-by: Nicholas Kazlauskas 
---
 .../drm/amd/display/modules/hdcp/hdcp_psp.c   | 28 ---
 .../drm/amd/display/modules/hdcp/hdcp_psp.h   |  6 ++--
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c 
b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c
index 7911dc157d5a..844454e0a5ba 100644
--- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c
+++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c
@@ -210,6 +210,10 @@ enum mod_hdcp_status mod_hdcp_hdcp1_validate_rx(struct 
mod_hdcp *hdcp)
} else if 
(hdcp_cmd->out_msg.hdcp1_first_part_authentication.authentication_status ==
   TA_HDCP_AUTHENTICATION_STATUS__HDCP1_AUTHENTICATED) {
hdcp->connection.is_repeater = 0;
+   } else if 
(hdcp_cmd->out_msg.hdcp1_first_part_authentication.authentication_status ==
+  TA_HDCP_AUTHENTICATION_STATUS__HDCP1_KSV_REVOKED) {
+   hdcp->connection.is_hdcp1_revoked = 1;
+   return MOD_HDCP_STATUS_HDCP1_BKSV_REVOKED;
} else
return MOD_HDCP_STATUS_HDCP1_VALIDATE_RX_FAILURE;
 
@@ -245,6 +249,7 @@ enum mod_hdcp_status 
mod_hdcp_hdcp1_validate_ksvlist_vp(struct mod_hdcp *hdcp)
 {
struct psp_context *psp = hdcp->config.psp.handle;
struct ta_hdcp_shared_memory *hdcp_cmd;
+   enum mod_hdcp_status status = MOD_HDCP_STATUS_SUCCESS;
 
hdcp_cmd = (struct ta_hdcp_shared_memory 
*)psp->hdcp_context.hdcp_shared_buf;
memset(hdcp_cmd, 0, sizeof(struct ta_hdcp_shared_memory));
@@ -264,10 +269,19 @@ enum mod_hdcp_status 
mod_hdcp_hdcp1_validate_ksvlist_vp(struct mod_hdcp *hdcp)
 
psp_hdcp_invoke(psp, hdcp_cmd->cmd_id);
 
-   if (hdcp_cmd->hdcp_status != TA_HDCP_STATUS__SUCCESS)
-   return MOD_HDCP_STATUS_HDCP1_VALIDATE_KSV_LIST_FAILURE;
+   if (hdcp_cmd->hdcp_status == TA_HDCP_STATUS__SUCCESS &&
+   
hdcp_cmd->out_msg.hdcp1_second_part_authentication.authentication_status ==
+   TA_HDCP_AUTHENTICATION_STATUS__HDCP1_AUTHENTICATED) {
+   status = MOD_HDCP_STATUS_SUCCESS;
+   } else if 
(hdcp_cmd->out_msg.hdcp1_second_part_authentication.authentication_status ==
+  TA_HDCP_AUTHENTICATION_STATUS__HDCP1_KSV_REVOKED) {
+   hdcp->connection.is_hdcp1_revoked = 1;
+   status = MOD_HDCP_STATUS_HDCP1_KSV_LIST_REVOKED;
+   } else {
+   status = MOD_HDCP_STATUS_HDCP1_VALIDATE_KSV_LIST_FAILURE;
+   }
 
-   return MOD_HDCP_STATUS_SUCCESS;
+   return status;
 }
 
 enum mod_hdcp_status mod_hdcp_hdcp1_enable_dp_stream_encryption(struct 
mod_hdcp *hdcp)
@@ -473,9 +487,12 @@ enum mod_hdcp_status 
mod_hdcp_hdcp2_validate_ake_cert(struct mod_hdcp *hdcp)
hdcp->connection.is_km_stored = msg_out->process.is_km_stored ? 
1 : 0;
hdcp->connection.is_repeater = msg_out->process.is_repeater ? 1 
: 0;
return MOD_HDCP_STATUS_SUCCESS;
+   } else if (msg_out->process.msg1_status == 
TA_HDCP2_MSG_AUTHENTICATION_STATUS__RECEIVERID_REVOKED) {
+   hdcp->connection.is_hdcp2_revoked = 1;
+   return MOD_HDCP_STATUS_HDCP2_AKE_CERT_REVOKED;
}
 
-   return MOD_HDCP_STATUS_FAILURE;
+   return MOD_HDCP_STATUS_HDCP2_VALIDATE_AKE_CERT_FAILURE;
 }
 
 enum mod_hdcp_status mod_hdcp_hdcp2_validate_h_prime(struct mod_hdcp *hdcp)
@@ -695,6 +712,9 @@ enum mod_hdcp_status 
mod_hdcp_hdcp2_validate_rx_id_list(struct mod_hdcp *hdcp)
hdcp->connection.is_km_stored = msg_out->process.is_km_stored ? 
1 : 0;
hdcp->connection.is_repeater = msg_out->process.is_repeater ? 1 
: 0;
return MOD_HDCP_STATUS_SUCCESS;
+   } else if (msg_out->process.msg1_status == 
TA_HDCP2_MSG_AUTHENTICATION_STATUS__RECEIVERID_REVOKED) {
+   hdcp->connection.is_hdcp2_revoked = 1;
+   return MOD_HDCP_STATUS_HDCP2_RX_ID_LIST_REVOKED;
}
 
 
diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.h 
b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.h
index d5cb3f46606f..1a663dbbf810 100644
--- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.h
+++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.h
@@ -240,7 +240,8 @@ enum ta_hdcp_authentication_status {
TA_HDCP_AUTHENTICATION_STATUS__HDCP22_AUTHENTICATION_PENDING = 0x06,
TA_HDCP_AUTHENTICATION_STATUS__HDCP22_AUTHENTICATION_FAILED = 0x07,
TA_HDCP_AUTHENTICATION_STATUS__HDCP22_AUTHENTICATED = 0x08,
-   TA_HDCP_AUTHENTICATION_STATUS__HDCP1_KSV_VALIDATION_FAILED = 0x09
+   TA_HDCP_AUTHENTICATION_STATUS__HDCP1_KSV_VALIDATION_FAILED = 0x09,
+   

[PATCH 4/5] drm/amd/display: fix backwards byte order in rx_caps.

2020-02-06 Thread Bhawanpreet Lakha
We were using the wrong byte order after we started using the drm defines,
so fix it.

Fixes: 02837a91ae75 ("drm/amd/display: add and use defines from drm_hdcp.h")
Signed-off-by: JinZe.Xu 
Signed-off-by: Bhawanpreet Lakha 
Reviewed-by: Wenjing Liu 
---
 drivers/gpu/drm/amd/display/modules/hdcp/hdcp2_execution.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp2_execution.c 
b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp2_execution.c
index 432b2a016e56..340df6d406f9 100644
--- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp2_execution.c
+++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp2_execution.c
@@ -46,8 +46,8 @@ static inline enum mod_hdcp_status check_hdcp2_capable(struct 
mod_hdcp *hdcp)
enum mod_hdcp_status status;
 
if (is_dp_hdcp(hdcp))
-   status = (hdcp->auth.msg.hdcp2.rxcaps_dp[2] & 
HDCP_2_2_RX_CAPS_VERSION_VAL) &&
-   
HDCP_2_2_DP_HDCP_CAPABLE(hdcp->auth.msg.hdcp2.rxcaps_dp[0]) ?
+   status = (hdcp->auth.msg.hdcp2.rxcaps_dp[0] == 
HDCP_2_2_RX_CAPS_VERSION_VAL) &&
+   
HDCP_2_2_DP_HDCP_CAPABLE(hdcp->auth.msg.hdcp2.rxcaps_dp[2]) ?
MOD_HDCP_STATUS_SUCCESS :
MOD_HDCP_STATUS_HDCP2_NOT_CAPABLE;
else
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/5] drm/amd/display: no hdcp retry if bksv or ksv list is revoked

2020-02-06 Thread Bhawanpreet Lakha
From: Wenjing Liu 

[why]
According to the specs, when the bksv or ksv list fails the SRM check,
the HDCP TX should abort hdcp immediately.
However, with the current code HDCP will be reattempted up to 4 times.

[how]
Add logic to stop the HDCP retry if the bksv or ksv list
is revoked.

Signed-off-by: Wenjing Liu 
Reviewed-by: Jun Lei 
---
 drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c | 3 ++-
 drivers/gpu/drm/amd/display/modules/hdcp/hdcp.h | 1 +
 drivers/gpu/drm/amd/display/modules/hdcp/hdcp1_transition.c | 6 --
 drivers/gpu/drm/amd/display/modules/hdcp/hdcp_log.c | 4 
 drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h  | 2 ++
 5 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c 
b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c
index 8aa528e874c4..a7d24734c7cd 100644
--- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c
+++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c
@@ -61,7 +61,8 @@ static uint8_t is_cp_desired_hdcp1(struct mod_hdcp *hdcp)
 
return (hdcp->connection.hdcp1_retry_count < MAX_NUM_OF_ATTEMPTS) &&
is_auth_needed &&
-   !hdcp->connection.link.adjust.hdcp1.disable;
+   !hdcp->connection.link.adjust.hdcp1.disable &&
+   !hdcp->connection.is_hdcp1_revoked;
 }
 
 static uint8_t is_cp_desired_hdcp2(struct mod_hdcp *hdcp)
diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.h 
b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.h
index af78e4f1be68..4d717ec8f14b 100644
--- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.h
+++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.h
@@ -170,6 +170,7 @@ struct mod_hdcp_connection {
struct mod_hdcp_display displays[MAX_NUM_OF_DISPLAYS];
uint8_t is_repeater;
uint8_t is_km_stored;
+   uint8_t is_hdcp1_revoked;
uint8_t is_hdcp2_revoked;
struct mod_hdcp_trace trace;
uint8_t hdcp1_retry_count;
diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp1_transition.c 
b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp1_transition.c
index 76edcbe51f71..d66a9f954ade 100644
--- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp1_transition.c
+++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp1_transition.c
@@ -210,7 +210,8 @@ enum mod_hdcp_status mod_hdcp_hdcp1_dp_transition(struct 
mod_hdcp *hdcp,
fail_and_restart_in_ms(0, , output);
break;
} else if (input->rx_validation != PASS) {
-   if (hdcp->state.stay_count < 2) {
+   if (hdcp->state.stay_count < 2 &&
+   !hdcp->connection.is_hdcp1_revoked) {
/* allow 2 additional retries */
callback_in_ms(0, output);
increment_stay_counter(hdcp);
@@ -290,7 +291,8 @@ enum mod_hdcp_status mod_hdcp_hdcp1_dp_transition(struct 
mod_hdcp *hdcp,
fail_and_restart_in_ms(0, , output);
break;
} else if (input->ksvlist_vp_validation != PASS) {
-   if (hdcp->state.stay_count < 2) {
+   if (hdcp->state.stay_count < 2 &&
+   !hdcp->connection.is_hdcp1_revoked) {
/* allow 2 additional retries */
callback_in_ms(0, output);
increment_stay_counter(hdcp);
diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_log.c 
b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_log.c
index 724ebcee9a19..44956f9ba178 100644
--- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_log.c
+++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_log.c
@@ -90,10 +90,14 @@ char *mod_hdcp_status_to_str(int32_t status)
return "MOD_HDCP_STATUS_HDCP1_R0_PRIME_PENDING";
case MOD_HDCP_STATUS_HDCP1_VALIDATE_RX_FAILURE:
return "MOD_HDCP_STATUS_HDCP1_VALIDATE_RX_FAILURE";
+   case MOD_HDCP_STATUS_HDCP1_BKSV_REVOKED:
+   return "MOD_HDCP_STATUS_HDCP1_BKSV_REVOKED";
case MOD_HDCP_STATUS_HDCP1_KSV_LIST_NOT_READY:
return "MOD_HDCP_STATUS_HDCP1_KSV_LIST_NOT_READY";
case MOD_HDCP_STATUS_HDCP1_VALIDATE_KSV_LIST_FAILURE:
return "MOD_HDCP_STATUS_HDCP1_VALIDATE_KSV_LIST_FAILURE";
+   case MOD_HDCP_STATUS_HDCP1_KSV_LIST_REVOKED:
+   return "MOD_HDCP_STATUS_HDCP1_KSV_LIST_REVOKED";
case MOD_HDCP_STATUS_HDCP1_ENABLE_ENCRYPTION:
return "MOD_HDCP_STATUS_HDCP1_ENABLE_ENCRYPTION";
case MOD_HDCP_STATUS_HDCP1_ENABLE_STREAM_ENCRYPTION_FAILURE:
diff --git a/drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h 
b/drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h
index f2a0e1a064da..891bca555e17 100644
--- a/drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h

RE: [PATCH] drm/amdgpu: fix amdgpu pmu to use hwc->config instead of hwc->conf

2020-02-06 Thread Kim, Jonathan
[AMD Official Use Only - Approved for External Use]

Thanks for pointing that out Felix.  I'll append that as well to the comments 
for the commit.

Jon

-Original Message-
From: Kuehling, Felix  
Sent: Thursday, February 6, 2020 3:08 PM
To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: fix amdgpu pmu to use hwc->config instead of 
hwc->conf

On 2020-02-06 12:08, Jonathan Kim wrote:
> hwc->conf was designated specifically for AMD APU IOMMU purposes.  This
> could cause problems in performance and/or function since APU IOMMU
> implementation is elsewhere.

It's actually worse than that. hwc is a union of anonymous structures. 
hwc->conf and hwc->config are in different members of that union. So 
hwc->conf aliases some other variable in the structure that hwc->config
is in. If I did the math right, hwc->conf aliases hwc->last_tag.

Anyway, the patch is

Reviewed-by: Felix Kuehling 


>
> Signed-off-by: Jonathan Kim 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c | 15 ---
>   1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
> index 07914e34bc25..1311d6aec5d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
> @@ -52,7 +52,7 @@ static int amdgpu_perf_event_init(struct perf_event *event)
>   return -ENOENT;
>   
>   /* update the hw_perf_event struct with config data */
> - hwc->conf = event->attr.config;
> + hwc->config = event->attr.config;
>   
>   return 0;
>   }
> @@ -74,9 +74,9 @@ static void amdgpu_perf_start(struct perf_event *event, int 
> flags)
>   switch (pe->pmu_perf_type) {
>   case PERF_TYPE_AMDGPU_DF:
>   if (!(flags & PERF_EF_RELOAD))
> - pe->adev->df.funcs->pmc_start(pe->adev, hwc->conf, 1);
> + pe->adev->df.funcs->pmc_start(pe->adev, hwc->config, 1);
>   
> - pe->adev->df.funcs->pmc_start(pe->adev, hwc->conf, 0);
> + pe->adev->df.funcs->pmc_start(pe->adev, hwc->config, 0);
>   break;
>   default:
>   break;
> @@ -101,7 +101,7 @@ static void amdgpu_perf_read(struct perf_event *event)
>   
>   switch (pe->pmu_perf_type) {
>   case PERF_TYPE_AMDGPU_DF:
> - pe->adev->df.funcs->pmc_get_count(pe->adev, hwc->conf,
> + pe->adev->df.funcs->pmc_get_count(pe->adev, hwc->config,
> );
>   break;
>   default:
> @@ -126,7 +126,7 @@ static void amdgpu_perf_stop(struct perf_event *event, 
> int flags)
>   
>   switch (pe->pmu_perf_type) {
>   case PERF_TYPE_AMDGPU_DF:
> - pe->adev->df.funcs->pmc_stop(pe->adev, hwc->conf, 0);
> + pe->adev->df.funcs->pmc_stop(pe->adev, hwc->config, 0);
>   break;
>   default:
>   break;
> @@ -156,7 +156,8 @@ static int amdgpu_perf_add(struct perf_event *event, int 
> flags)
>   
>   switch (pe->pmu_perf_type) {
>   case PERF_TYPE_AMDGPU_DF:
> - retval = pe->adev->df.funcs->pmc_start(pe->adev, hwc->conf, 1);
> + retval = pe->adev->df.funcs->pmc_start(pe->adev,
> +hwc->config, 1);
>   break;
>   default:
>   return 0;
> @@ -184,7 +185,7 @@ static void amdgpu_perf_del(struct perf_event *event, int 
> flags)
>   
>   switch (pe->pmu_perf_type) {
>   case PERF_TYPE_AMDGPU_DF:
> - pe->adev->df.funcs->pmc_stop(pe->adev, hwc->conf, 1);
> + pe->adev->df.funcs->pmc_stop(pe->adev, hwc->config, 1);
>   break;
>   default:
>   break;
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: fix amdgpu pmu to use hwc->config instead of hwc->conf

2020-02-06 Thread Felix Kuehling

On 2020-02-06 12:08, Jonathan Kim wrote:

hwc->conf was designated specifically for AMD APU IOMMU purposes.  This
could cause problems in performance and/or function since APU IOMMU
implementation is elsewhere.


It's actually worse than that. hwc is a union of anonymous structures. 
hwc->conf and hwc->config are in different members of that union. So 
hwc->conf aliases some other variable in the structure that hwc->config 
is in. If I did the math right, hwc->conf aliases hwc->last_tag.
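
To make the aliasing concrete, a reduced stand-in layout behaves like this
(hypothetical sketch only -- this is not the real struct hw_perf_event
definition, and the field names/offsets are simplified; the point is just
that two anonymous structs sharing one union overlap):

#include <stdint.h>
#include <stdio.h>

struct hw_perf_event_like {
	union {
		struct {			/* hardware-event arm */
			uint64_t config;
			uint64_t last_tag;
		};
		struct {			/* IOMMU arm */
			uint8_t  iommu_bank;
			uint8_t  iommu_cntr;
			uint16_t padding;
			uint64_t conf;		/* ends up overlapping last_tag */
		};
	};
};

int main(void)
{
	struct hw_perf_event_like hwc = { .config = 0x1234 };

	hwc.conf = 0xdead;	/* write meant for the IOMMU arm... */
	printf("config=0x%llx last_tag=0x%llx\n",
	       (unsigned long long)hwc.config,
	       (unsigned long long)hwc.last_tag);	/* ...corrupts last_tag */
	return 0;
}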


Anyway, the patch is

Reviewed-by: Felix Kuehling 




Signed-off-by: Jonathan Kim 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c | 15 ---
  1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
index 07914e34bc25..1311d6aec5d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
@@ -52,7 +52,7 @@ static int amdgpu_perf_event_init(struct perf_event *event)
return -ENOENT;
  
  	/* update the hw_perf_event struct with config data */

-   hwc->conf = event->attr.config;
+   hwc->config = event->attr.config;
  
  	return 0;

  }
@@ -74,9 +74,9 @@ static void amdgpu_perf_start(struct perf_event *event, int 
flags)
switch (pe->pmu_perf_type) {
case PERF_TYPE_AMDGPU_DF:
if (!(flags & PERF_EF_RELOAD))
-   pe->adev->df.funcs->pmc_start(pe->adev, hwc->conf, 1);
+   pe->adev->df.funcs->pmc_start(pe->adev, hwc->config, 1);
  
-		pe->adev->df.funcs->pmc_start(pe->adev, hwc->conf, 0);

+   pe->adev->df.funcs->pmc_start(pe->adev, hwc->config, 0);
break;
default:
break;
@@ -101,7 +101,7 @@ static void amdgpu_perf_read(struct perf_event *event)
  
  		switch (pe->pmu_perf_type) {

case PERF_TYPE_AMDGPU_DF:
-   pe->adev->df.funcs->pmc_get_count(pe->adev, hwc->conf,
+   pe->adev->df.funcs->pmc_get_count(pe->adev, hwc->config,
  );
break;
default:
@@ -126,7 +126,7 @@ static void amdgpu_perf_stop(struct perf_event *event, int 
flags)
  
  	switch (pe->pmu_perf_type) {

case PERF_TYPE_AMDGPU_DF:
-   pe->adev->df.funcs->pmc_stop(pe->adev, hwc->conf, 0);
+   pe->adev->df.funcs->pmc_stop(pe->adev, hwc->config, 0);
break;
default:
break;
@@ -156,7 +156,8 @@ static int amdgpu_perf_add(struct perf_event *event, int 
flags)
  
  	switch (pe->pmu_perf_type) {

case PERF_TYPE_AMDGPU_DF:
-   retval = pe->adev->df.funcs->pmc_start(pe->adev, hwc->conf, 1);
+   retval = pe->adev->df.funcs->pmc_start(pe->adev,
+  hwc->config, 1);
break;
default:
return 0;
@@ -184,7 +185,7 @@ static void amdgpu_perf_del(struct perf_event *event, int 
flags)
  
  	switch (pe->pmu_perf_type) {

case PERF_TYPE_AMDGPU_DF:
-   pe->adev->df.funcs->pmc_stop(pe->adev, hwc->conf, 1);
+   pe->adev->df.funcs->pmc_stop(pe->adev, hwc->config, 1);
break;
default:
break;

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Rearm IRQ in Navi10 SR-IOV if IRQ lost

2020-02-06 Thread Alex Deucher
On Thu, Feb 6, 2020 at 3:00 PM Samir Dhume  wrote:
>
> Ported from Vega10. SDMA stress tests sometimes see IRQ lost.
>
> Signed-off-by: Samir Dhume 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 36 ++
>  1 file changed, 36 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> index cf557a428298..e08245a446fc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> @@ -32,6 +32,7 @@
>  #include "soc15_common.h"
>  #include "navi10_ih.h"
>
> +#define MAX_REARM_RETRY 10
>
>  static void navi10_ih_set_interrupt_funcs(struct amdgpu_device *adev);
>
> @@ -283,6 +284,38 @@ static void navi10_ih_decode_iv(struct amdgpu_device 
> *adev,
> ih->rptr += 32;
>  }
>
> +/**
> + * navi10_ih_irq_rearm - rearm IRQ if lost
> + *
> + * @adev: amdgpu_device pointer
> + *
> + */
> +static void navi10_ih_irq_rearm(struct amdgpu_device *adev,
> +  struct amdgpu_ih_ring *ih)
> +{
> +   uint32_t reg_rptr = 0;
> +   uint32_t v = 0;
> +   uint32_t i = 0;
> +
> +   if (ih == >irq.ih)
> +   reg_rptr = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_RPTR);
> +   else if (ih == >irq.ih1)
> +   reg_rptr = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_RPTR_RING1);
> +   else if (ih == >irq.ih2)
> +   reg_rptr = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_RPTR_RING2);
> +   else
> +   return;
> +
> +   /* Rearm IRQ / re-write doorbell if doorbell write is lost */
> +   for (i = 0; i < MAX_REARM_RETRY; i++) {
> +   v = RREG32_NO_KIQ(reg_rptr);
> +   if ((v < ih->ring_size) && (v != ih->rptr))
> +   WDOORBELL32(ih->doorbell_index, ih->rptr);
> +   else
> +   break;
> +   }
> +}
> +
>  /**
>   * navi10_ih_set_rptr - set the IH ring buffer rptr
>   *
> @@ -297,6 +330,9 @@ static void navi10_ih_set_rptr(struct amdgpu_device *adev,
> /* XXX check if swapping is necessary on BE */
> *ih->rptr_cpu = ih->rptr;
> WDOORBELL32(ih->doorbell_index, ih->rptr);
> +
> +   if (amdgpu_sriov_vf(adev))
> +   navi10_ih_irq_rearm(adev, ih);
> } else
> WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr);
>  }
> --
> 2.20.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: Rearm IRQ in Navi10 SR-IOV if IRQ lost

2020-02-06 Thread Samir Dhume
Ported from Vega10. SDMA stress tests sometimes see IRQ lost.

Signed-off-by: Samir Dhume 
---
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 36 ++
 1 file changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
index cf557a428298..e08245a446fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
@@ -32,6 +32,7 @@
 #include "soc15_common.h"
 #include "navi10_ih.h"
 
+#define MAX_REARM_RETRY 10
 
 static void navi10_ih_set_interrupt_funcs(struct amdgpu_device *adev);
 
@@ -283,6 +284,38 @@ static void navi10_ih_decode_iv(struct amdgpu_device *adev,
ih->rptr += 32;
 }
 
+/**
+ * navi10_ih_irq_rearm - rearm IRQ if lost
+ *
+ * @adev: amdgpu_device pointer
+ *
+ */
+static void navi10_ih_irq_rearm(struct amdgpu_device *adev,
+  struct amdgpu_ih_ring *ih)
+{
+   uint32_t reg_rptr = 0;
+   uint32_t v = 0;
+   uint32_t i = 0;
+
+   if (ih == >irq.ih)
+   reg_rptr = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_RPTR);
+   else if (ih == >irq.ih1)
+   reg_rptr = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_RPTR_RING1);
+   else if (ih == >irq.ih2)
+   reg_rptr = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_RPTR_RING2);
+   else
+   return;
+
+   /* Rearm IRQ / re-write doorbell if doorbell write is lost */
+   for (i = 0; i < MAX_REARM_RETRY; i++) {
+   v = RREG32_NO_KIQ(reg_rptr);
+   if ((v < ih->ring_size) && (v != ih->rptr))
+   WDOORBELL32(ih->doorbell_index, ih->rptr);
+   else
+   break;
+   }
+}
+
 /**
  * navi10_ih_set_rptr - set the IH ring buffer rptr
  *
@@ -297,6 +330,9 @@ static void navi10_ih_set_rptr(struct amdgpu_device *adev,
/* XXX check if swapping is necessary on BE */
*ih->rptr_cpu = ih->rptr;
WDOORBELL32(ih->doorbell_index, ih->rptr);
+
+   if (amdgpu_sriov_vf(adev))
+   navi10_ih_irq_rearm(adev, ih);
} else
WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr);
 }
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/2] drm/amdgpu: update smu_v11_0_pptable.h

2020-02-06 Thread Alex Deucher
Update to the latest changes.

Signed-off-by: Alex Deucher 
---
 .../drm/amd/powerplay/inc/smu_v11_0_pptable.h | 46 +--
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h 
b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h
index b2f96a101124..7a63cf8e85ed 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h
@@ -39,21 +39,39 @@
 #define SMU_11_0_PP_OVERDRIVE_VERSION   0x0800
 #define SMU_11_0_PP_POWERSAVINGCLOCK_VERSION0x0100
 
+enum SMU_11_0_ODFEATURE_CAP {
+SMU_11_0_ODCAP_GFXCLK_LIMITS = 0,
+SMU_11_0_ODCAP_GFXCLK_CURVE,
+SMU_11_0_ODCAP_UCLK_MAX,
+SMU_11_0_ODCAP_POWER_LIMIT,
+SMU_11_0_ODCAP_FAN_ACOUSTIC_LIMIT,
+SMU_11_0_ODCAP_FAN_SPEED_MIN,
+SMU_11_0_ODCAP_TEMPERATURE_FAN,
+SMU_11_0_ODCAP_TEMPERATURE_SYSTEM,
+SMU_11_0_ODCAP_MEMORY_TIMING_TUNE,
+SMU_11_0_ODCAP_FAN_ZERO_RPM_CONTROL,
+SMU_11_0_ODCAP_AUTO_UV_ENGINE,
+SMU_11_0_ODCAP_AUTO_OC_ENGINE,
+SMU_11_0_ODCAP_AUTO_OC_MEMORY,
+SMU_11_0_ODCAP_FAN_CURVE,
+SMU_11_0_ODCAP_COUNT,
+};
+
 enum SMU_11_0_ODFEATURE_ID {
-SMU_11_0_ODFEATURE_GFXCLK_LIMITS= 1 << 0, //GFXCLK Limit 
feature
-SMU_11_0_ODFEATURE_GFXCLK_CURVE = 1 << 1, //GFXCLK Curve 
feature
-SMU_11_0_ODFEATURE_UCLK_MAX = 1 << 2, //UCLK Limit 
feature
-SMU_11_0_ODFEATURE_POWER_LIMIT  = 1 << 3, //Power Limit 
feature
-SMU_11_0_ODFEATURE_FAN_ACOUSTIC_LIMIT   = 1 << 4, //Fan Acoustic 
RPM feature
-SMU_11_0_ODFEATURE_FAN_SPEED_MIN= 1 << 5, //Minimum Fan 
Speed feature
-SMU_11_0_ODFEATURE_TEMPERATURE_FAN  = 1 << 6, //Fan Target 
Temperature Limit feature
-SMU_11_0_ODFEATURE_TEMPERATURE_SYSTEM   = 1 << 7, //Operating 
Temperature Limit feature
-SMU_11_0_ODFEATURE_MEMORY_TIMING_TUNE   = 1 << 8, //AC Timing 
Tuning feature
-SMU_11_0_ODFEATURE_FAN_ZERO_RPM_CONTROL = 1 << 9, //Zero RPM 
feature
-SMU_11_0_ODFEATURE_AUTO_UV_ENGINE   = 1 << 10,//Auto Under 
Volt GFXCLK feature
-SMU_11_0_ODFEATURE_AUTO_OC_ENGINE   = 1 << 11,//Auto Over 
Clock GFXCLK feature
-SMU_11_0_ODFEATURE_AUTO_OC_MEMORY   = 1 << 12,//Auto Over 
Clock MCLK feature
-SMU_11_0_ODFEATURE_FAN_CURVE= 1 << 13,//VICTOR TODO
+SMU_11_0_ODFEATURE_GFXCLK_LIMITS= 1 << 
SMU_11_0_ODCAP_GFXCLK_LIMITS,//GFXCLK Limit feature
+SMU_11_0_ODFEATURE_GFXCLK_CURVE = 1 << 
SMU_11_0_ODCAP_GFXCLK_CURVE, //GFXCLK Curve feature
+SMU_11_0_ODFEATURE_UCLK_MAX = 1 << SMU_11_0_ODCAP_UCLK_MAX,
 //UCLK Limit feature
+SMU_11_0_ODFEATURE_POWER_LIMIT  = 1 << SMU_11_0_ODCAP_POWER_LIMIT, 
 //Power Limit feature
+SMU_11_0_ODFEATURE_FAN_ACOUSTIC_LIMIT   = 1 << 
SMU_11_0_ODCAP_FAN_ACOUSTIC_LIMIT,   //Fan Acoustic RPM feature
+SMU_11_0_ODFEATURE_FAN_SPEED_MIN= 1 << 
SMU_11_0_ODCAP_FAN_SPEED_MIN,//Minimum Fan Speed feature
+SMU_11_0_ODFEATURE_TEMPERATURE_FAN  = 1 << 
SMU_11_0_ODCAP_TEMPERATURE_FAN,  //Fan Target Temperature Limit feature
+SMU_11_0_ODFEATURE_TEMPERATURE_SYSTEM   = 1 << 
SMU_11_0_ODCAP_TEMPERATURE_SYSTEM,   //Operating Temperature Limit feature
+SMU_11_0_ODFEATURE_MEMORY_TIMING_TUNE   = 1 << 
SMU_11_0_ODCAP_MEMORY_TIMING_TUNE,   //AC Timing Tuning feature
+SMU_11_0_ODFEATURE_FAN_ZERO_RPM_CONTROL = 1 << 
SMU_11_0_ODCAP_FAN_ZERO_RPM_CONTROL, //Zero RPM feature
+SMU_11_0_ODFEATURE_AUTO_UV_ENGINE   = 1 << 
SMU_11_0_ODCAP_AUTO_UV_ENGINE,   //Auto Under Volt GFXCLK feature
+SMU_11_0_ODFEATURE_AUTO_OC_ENGINE   = 1 << 
SMU_11_0_ODCAP_AUTO_OC_ENGINE,   //Auto Over Clock GFXCLK feature
+SMU_11_0_ODFEATURE_AUTO_OC_MEMORY   = 1 << 
SMU_11_0_ODCAP_AUTO_OC_MEMORY,   //Auto Over Clock MCLK feature
+SMU_11_0_ODFEATURE_FAN_CURVE= 1 << SMU_11_0_ODCAP_FAN_CURVE,   
 //Fan Curve feature
 SMU_11_0_ODFEATURE_COUNT= 14,
 };
 #define SMU_11_0_MAX_ODFEATURE32  //Maximum Number of OD Features
-- 
2.24.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/2] drm/amdgpu:/navi10: use the ODCAP enum to index the caps array

2020-02-06 Thread Alex Deucher
Rather than the FEATURE_ID bitmask flags. This avoids possibly reading
past the end of the array.
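
To spell out the out-of-bounds read this avoids, here is a minimal sketch
(the array size mirrors SMU_11_0_MAX_ODFEATURE from the header; the struct
and helper are simplified stand-ins, the real check is
navi10_od_feature_is_supported() in the diff below):

#include <stdbool.h>

#define SMU_11_0_MAX_ODFEATURE 32

struct overdrive_table_like {
	unsigned char cap[SMU_11_0_MAX_ODFEATURE];
};

/* Correct: index with an ODCAP_* value (0..13). */
static inline bool od_cap_supported(const struct overdrive_table_like *t,
				    unsigned int cap_index)
{
	return t->cap[cap_index];
}

/* Broken pattern this patch removes: indexing with an ODFEATURE_* mask,
 * e.g. cap[1 << 13] == cap[8192], far past the end of a 32-byte array. */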

Reported-by: Aleksandr Mezin 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c 
b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
index 19a9846b730e..0d73a49166af 100644
--- a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
@@ -736,9 +736,9 @@ static bool navi10_is_support_fine_grained_dpm(struct 
smu_context *smu, enum smu
return dpm_desc->SnapToDiscrete == 0 ? true : false;
 }
 
-static inline bool navi10_od_feature_is_supported(struct 
smu_11_0_overdrive_table *od_table, enum SMU_11_0_ODFEATURE_ID feature)
+static inline bool navi10_od_feature_is_supported(struct 
smu_11_0_overdrive_table *od_table, enum SMU_11_0_ODFEATURE_CAP cap)
 {
-   return od_table->cap[feature];
+   return od_table->cap[cap];
 }
 
 static void navi10_od_setting_get_range(struct smu_11_0_overdrive_table 
*od_table,
@@ -846,7 +846,7 @@ static int navi10_print_clk_levels(struct smu_context *smu,
case SMU_OD_SCLK:
if (!smu->od_enabled || !od_table || !od_settings)
break;
-   if (!navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODFEATURE_GFXCLK_LIMITS))
+   if (!navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_GFXCLK_LIMITS))
break;
size += sprintf(buf + size, "OD_SCLK:\n");
size += sprintf(buf + size, "0: %uMhz\n1: %uMhz\n", 
od_table->GfxclkFmin, od_table->GfxclkFmax);
@@ -854,7 +854,7 @@ static int navi10_print_clk_levels(struct smu_context *smu,
case SMU_OD_MCLK:
if (!smu->od_enabled || !od_table || !od_settings)
break;
-   if (!navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODFEATURE_UCLK_MAX))
+   if (!navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_UCLK_MAX))
break;
size += sprintf(buf + size, "OD_MCLK:\n");
size += sprintf(buf + size, "1: %uMHz\n", od_table->UclkFmax);
@@ -862,7 +862,7 @@ static int navi10_print_clk_levels(struct smu_context *smu,
case SMU_OD_VDDC_CURVE:
if (!smu->od_enabled || !od_table || !od_settings)
break;
-   if (!navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODFEATURE_GFXCLK_CURVE))
+   if (!navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_GFXCLK_CURVE))
break;
size += sprintf(buf + size, "OD_VDDC_CURVE:\n");
for (i = 0; i < 3; i++) {
@@ -887,7 +887,7 @@ static int navi10_print_clk_levels(struct smu_context *smu,
break;
size = sprintf(buf, "%s:\n", "OD_RANGE");
 
-   if (navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODFEATURE_GFXCLK_LIMITS)) {
+   if (navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_GFXCLK_LIMITS)) {
navi10_od_setting_get_range(od_settings, 
SMU_11_0_ODSETTING_GFXCLKFMIN,
_value, NULL);
navi10_od_setting_get_range(od_settings, 
SMU_11_0_ODSETTING_GFXCLKFMAX,
@@ -896,14 +896,14 @@ static int navi10_print_clk_levels(struct smu_context 
*smu,
min_value, max_value);
}
 
-   if (navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODFEATURE_UCLK_MAX)) {
+   if (navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_UCLK_MAX)) {
navi10_od_setting_get_range(od_settings, 
SMU_11_0_ODSETTING_UCLKFMAX,
_value, _value);
size += sprintf(buf + size, "MCLK: %7uMhz %10uMhz\n",
min_value, max_value);
}
 
-   if (navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODFEATURE_GFXCLK_CURVE)) {
+   if (navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_GFXCLK_CURVE)) {
navi10_od_setting_get_range(od_settings, 
SMU_11_0_ODSETTING_VDDGFXCURVEFREQ_P1,
_value, _value);
size += sprintf(buf + size, "VDDC_CURVE_SCLK[0]: %7uMhz 
%10uMhz\n",
@@ -2056,7 +2056,7 @@ static int navi10_od_edit_dpm_table(struct smu_context 
*smu, enum PP_OD_DPM_TABL
 
switch (type) {
case PP_OD_EDIT_SCLK_VDDC_TABLE:
-   if (!navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODFEATURE_GFXCLK_LIMITS)) {
+   if (!navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_GFXCLK_LIMITS)) {

Re: [PATCH] amdgpu: Prevent build errors regarding soft/hard-float FP ABI tags

2020-02-06 Thread Alex Deucher
On Thu, Feb 6, 2020 at 2:21 PM Daniel Kolesa  wrote:
>
> On PowerPC, the compiler will tag object files with whether they
> use hard or soft float FP ABI and whether they use 64 or 128-bit
> long double ABI. On systems with 64-bit long double ABI, a tag
> will get emitted whenever a double is used, as on those systems
> a long double is the same as a double. This will prevent linkage
> as other files are being compiled with hard-float.
>
> On ppc64, this code will never actually get used for the time
> being, as the only currently existing hardware using it are the
> Renoir APUs. Therefore, until this is testable and can be fixed
> properly, at least make sure the build will not fail.
>
> Signed-off-by: Daniel Kolesa 

Applied.  Thanks!

Alex

> ---
>  drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile 
> b/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile
> index b864869..6fa7422 100644
> --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile
> @@ -91,6 +91,12 @@ ifdef CONFIG_DRM_AMD_DC_DCN2_1
>  
> ###
>  CLK_MGR_DCN21 = rn_clk_mgr.o rn_clk_mgr_vbios_smu.o
>
> +# prevent build errors regarding soft-float vs hard-float FP ABI tags
> +# this code is currently unused on ppc64, as it applies to Renoir APUs only
> +ifdef CONFIG_PPC64
> +CFLAGS_$(AMDDALPATH)/dc/clk_mgr/dcn21/rn_clk_mgr.o := $(call 
> cc-option,-mno-gnu-attribute)
> +endif
> +
>  AMD_DAL_CLK_MGR_DCN21 = $(addprefix 
> $(AMDDALPATH)/dc/clk_mgr/dcn21/,$(CLK_MGR_DCN21))
>
>  AMD_DISPLAY_FILES += $(AMD_DAL_CLK_MGR_DCN21)
> --
> 2.25.0
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] amdgpu: Prevent build errors regarding soft/hard-float FP ABI tags

2020-02-06 Thread Daniel Kolesa
On PowerPC, the compiler will tag object files with whether they
use hard or soft float FP ABI and whether they use 64 or 128-bit
long double ABI. On systems with 64-bit long double ABI, a tag
will get emitted whenever a double is used, as on those systems
a long double is the same as a double. This will prevent linkage
as other files are being compiled with hard-float.

On ppc64, this code will never actually get used for the time
being, as the only currently existing hardware using it are the
Renoir APUs. Therefore, until this is testable and can be fixed
properly, at least make sure the build will not fail.

Signed-off-by: Daniel Kolesa 
---
 drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile
index b864869..6fa7422 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile
@@ -91,6 +91,12 @@ ifdef CONFIG_DRM_AMD_DC_DCN2_1
 ###
 CLK_MGR_DCN21 = rn_clk_mgr.o rn_clk_mgr_vbios_smu.o
 
+# prevent build errors regarding soft-float vs hard-float FP ABI tags
+# this code is currently unused on ppc64, as it applies to Renoir APUs only
+ifdef CONFIG_PPC64
+CFLAGS_$(AMDDALPATH)/dc/clk_mgr/dcn21/rn_clk_mgr.o := $(call 
cc-option,-mno-gnu-attribute)
+endif
+
 AMD_DAL_CLK_MGR_DCN21 = $(addprefix 
$(AMDDALPATH)/dc/clk_mgr/dcn21/,$(CLK_MGR_DCN21))
 
 AMD_DISPLAY_FILES += $(AMD_DAL_CLK_MGR_DCN21)
-- 
2.25.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: linux-next: Tree for Feb 6 (amdgpu)

2020-02-06 Thread Randy Dunlap
On 2/5/20 8:09 PM, Stephen Rothwell wrote:
> Hi all,
> 
> Please do not add any v5.7 material to your linux-next included
> branches until after v5.6-rc1 has been released.
> 
> Changes since 20200205:
> 

on i386:

ERROR: "dtn_debugfs_init" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

Full randconfig file is attached.

-- 
~Randy
Reported-by: Randy Dunlap 
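Link errors of this kind typically mean a symbol that is only built under CONFIG_DEBUG_FS is being referenced unconditionally. Purely as a hedged sketch of the usual pattern for avoiding that (not the actual fix that was applied, and the signature here is assumed from context), the header could provide a no-op stub when debugfs is disabled:

#if defined(CONFIG_DEBUG_FS)
void dtn_debugfs_init(struct amdgpu_device *adev);
#else
static inline void dtn_debugfs_init(struct amdgpu_device *adev) {}
#endif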
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 5.5.0 Kernel Configuration
#

#
# Compiler: gcc (SUSE Linux) 7.5.0
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=70500
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_WARN_MAYBE_UNINITIALIZED=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_UAPI_HEADER_TEST=y
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
CONFIG_KERNEL_LZMA=y
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_SIM=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
# end of Timers subsystem

# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_PSI=y
# CONFIG_PSI_DEFAULT_DISABLED is not set
# end of CPU/Task time and stats accounting

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
CONFIG_RCU_EXPERT=y
CONFIG_SRCU=y
CONFIG_TINY_SRCU=y
# end of RCU Subsystem

CONFIG_IKCONFIG=m
CONFIG_IKCONFIG_PROC=y
CONFIG_IKHEADERS=y
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# end of Scheduler features

CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
# CONFIG_MEMCG_SWAP_ENABLED is not set
CONFIG_MEMCG_KMEM=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
CONFIG_RT_GROUP_SCHED=y
# CONFIG_CGROUP_PIDS is not set
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
# CONFIG_CGROUP_CPUACCT is not set
CONFIG_CGROUP_PERF=y
# CONFIG_CGROUP_BPF is not set
CONFIG_CGROUP_DEBUG=y
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_CHECKPOINT_RESTORE=y
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_RD_GZIP is not set
# CONFIG_RD_BZIP2 is not set
CONFIG_RD_LZMA=y
# CONFIG_RD_XZ is not set
# CONFIG_RD_LZO is not set
CONFIG_RD_LZ4=y
# CONFIG_BOOT_CONFIG is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
CONFIG_EXPERT=y
# CONFIG_MULTIUSER is not set
# CONFIG_SGETMASK_SYSCALL is not set
CONFIG_SYSFS_SYSCALL=y
# CONFIG_FHANDLE is not set
# CONFIG_POSIX_TIMERS is not set
# CONFIG_PRINTK is not set
# CONFIG_BUG is not set
CONFIG_ELF_CORE=y
# CONFIG_PCSPKR_PLATFORM is not set
CONFIG_BASE_FULL=y
# CONFIG_FUTEX is not set
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
# CONFIG_AIO is not set
# CONFIG_IO_URING is not set
CONFIG_ADVISE_SYSCALLS=y
# CONFIG_MEMBARRIER is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
# CONFIG_USERFAULTFD is not set
CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
# CONFIG_RSEQ is not set
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y
# CONFIG_PC104 is not set

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# end of Kernel Performance Events And Counters

# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_COMPAT_BRK is not set

Re: [Dali] Raven 2 detection Patch

2020-02-06 Thread Alex Deucher
On Thu, Feb 6, 2020 at 1:00 PM Tawfik, Aly  wrote:
>
>
> ***Reattached patch with corrections***
>
> From b828a2b3df3057461dfceb4d1394fe858ced9d03 Mon Sep 17 00:00:00 2001
> From: Ali-Tawfik 
> Date: Thu, 6 Feb 2020 12:53:02 -0500
> Subject: [PATCH] drm/amdgpu: [DALI] Dali Variant Detection
>
> Problem Description:
>
> Currently we are checking the internal fused rev id against the pci rev id. However,
> the fused internal rev id is the same on all raven2 parts (which Dali is
> based on too), thus Dali detection fails.
>
> Fix:
>
> To detect this chip we need to use the pci rev id, but it is not defined in the
> scope of DC. To work around this issue, alter the fused
> rev id using the pci rev id for all Dali chips before DC init,
> then use the internal fused id for chip detection in DC.
>
> Signed-off-by: Ali-Tawfik 

Reviewed-by: Alex Deucher 

Once it's reviewed, feel free to commit it.

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c| 9 -
>  drivers/gpu/drm/amd/display/include/dal_asic_id.h | 4 ++--
>  2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 317803f6a561..f85c27fbe64c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -1094,8 +1094,15 @@ static int soc15_common_early_init(void *handle)
> break;
> case CHIP_RAVEN:
> adev->asic_funcs = &soc15_asic_funcs;
> -   if (adev->rev_id >= 0x8)
> +   if (adev->rev_id >= 0x8) {
> +   if ((adev->pdev->device == 0x15d8) &&
> +((adev->pdev->revision == 0xCF) ||
> +(adev->pdev->revision == 0xE3)||
> +(adev->pdev->revision == 0xE4))) {
> +   adev->rev_id = 0x10;
> +   }
> adev->external_rev_id = adev->rev_id + 0x79;
> +   }
> else if (adev->pdev->device == 0x15d8)
> adev->external_rev_id = adev->rev_id + 0x41;
> else if (adev->rev_id == 1)
> diff --git a/drivers/gpu/drm/amd/display/include/dal_asic_id.h 
> b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
> index a2903985b9e8..0329f26bfacd 100644
> --- a/drivers/gpu/drm/amd/display/include/dal_asic_id.h
> +++ b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
> @@ -143,6 +143,7 @@
>  #define RAVEN2_15D8_REV_EB 0xEB
>  #define RAVEN1_F0 0xF0
>  #define RAVEN_UNKNOWN 0xFF
> +#define RAVEN2_15D8_B0_LW 0x89
>  #ifndef ASICREV_IS_RAVEN
>  #define ASICREV_IS_RAVEN(eChipRev) ((eChipRev >= RAVEN_A0) && eChipRev < 
> RAVEN_UNKNOWN)
>  #endif
> @@ -152,8 +153,7 @@
>  #define ASICREV_IS_RAVEN2(eChipRev) ((eChipRev >= RAVEN2_A0) && (eChipRev < 
> RAVEN1_F0))
>  #endif
>  #define ASICREV_IS_RV1_F0(eChipRev) ((eChipRev >= RAVEN1_F0) && (eChipRev < 
> RAVEN_UNKNOWN))
> -#define ASICREV_IS_DALI(eChipRev) ((eChipRev == RAVEN2_15D8_REV_E3) \
> -   || (eChipRev == RAVEN2_15D8_REV_E4))
> +#define ASICREV_IS_DALI(eChipRev) ((eChipRev == RAVEN2_15D8_B0_LW))
>  #define ASICREV_IS_POLLOCK(eChipRev) (eChipRev == RAVEN2_15D8_REV_94 \
> || eChipRev == RAVEN2_15D8_REV_95 \
> || eChipRev == RAVEN2_15D8_REV_E9 \
> --
> 2.17.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [Dali] Raven 2 detection Patch

2020-02-06 Thread Tawfik, Aly

***Reattached patch with corrections***

From b828a2b3df3057461dfceb4d1394fe858ced9d03 Mon Sep 17 00:00:00 2001
From: Ali-Tawfik 
Date: Thu, 6 Feb 2020 12:53:02 -0500
Subject: [PATCH] drm/amdgpu: [DALI] Dali Variant Detection

Problem Description:

Currently we are checking the internal fused rev id against the pci rev id. However, the
fused internal rev id is the same on all raven2 parts (which Dali is based on
too), thus Dali detection fails.

Fix:

To detect this chip we need to use the pci rev id, but it is not defined in the
scope of DC. To work around this issue, alter the fused
rev id using the pci rev id for all Dali chips before DC init,
then use the internal fused id for chip detection in DC.
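Worked through with the numbers from the diff below: a Dali part (device 0x15d8, PCI revision 0xCF, 0xE3 or 0xE4) gets its fused rev_id overridden to 0x10, the existing external_rev_id = rev_id + 0x79 computation then yields 0x10 + 0x79 = 0x89, and 0x89 is exactly the new RAVEN2_15D8_B0_LW value that ASICREV_IS_DALI() now matches inside DC.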

Signed-off-by: Ali-Tawfik 
---
 drivers/gpu/drm/amd/amdgpu/soc15.c| 9 -
 drivers/gpu/drm/amd/display/include/dal_asic_id.h | 4 ++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 317803f6a561..f85c27fbe64c 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1094,8 +1094,15 @@ static int soc15_common_early_init(void *handle)
break;
case CHIP_RAVEN:
adev->asic_funcs = &soc15_asic_funcs;
-   if (adev->rev_id >= 0x8)
+   if (adev->rev_id >= 0x8) {
+   if ((adev->pdev->device == 0x15d8) &&
+((adev->pdev->revision == 0xCF) ||
+(adev->pdev->revision == 0xE3)||
+(adev->pdev->revision == 0xE4))) {
+   adev->rev_id = 0x10;
+   }
adev->external_rev_id = adev->rev_id + 0x79;
+   }
else if (adev->pdev->device == 0x15d8)
adev->external_rev_id = adev->rev_id + 0x41;
else if (adev->rev_id == 1)
diff --git a/drivers/gpu/drm/amd/display/include/dal_asic_id.h 
b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
index a2903985b9e8..0329f26bfacd 100644
--- a/drivers/gpu/drm/amd/display/include/dal_asic_id.h
+++ b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
@@ -143,6 +143,7 @@
 #define RAVEN2_15D8_REV_EB 0xEB
 #define RAVEN1_F0 0xF0
 #define RAVEN_UNKNOWN 0xFF
+#define RAVEN2_15D8_B0_LW 0x89
 #ifndef ASICREV_IS_RAVEN
 #define ASICREV_IS_RAVEN(eChipRev) ((eChipRev >= RAVEN_A0) && eChipRev < 
RAVEN_UNKNOWN)
 #endif
@@ -152,8 +153,7 @@
 #define ASICREV_IS_RAVEN2(eChipRev) ((eChipRev >= RAVEN2_A0) && (eChipRev < 
RAVEN1_F0))
 #endif
 #define ASICREV_IS_RV1_F0(eChipRev) ((eChipRev >= RAVEN1_F0) && (eChipRev < 
RAVEN_UNKNOWN))
-#define ASICREV_IS_DALI(eChipRev) ((eChipRev == RAVEN2_15D8_REV_E3) \
-   || (eChipRev == RAVEN2_15D8_REV_E4))
+#define ASICREV_IS_DALI(eChipRev) ((eChipRev == RAVEN2_15D8_B0_LW))
 #define ASICREV_IS_POLLOCK(eChipRev) (eChipRev == RAVEN2_15D8_REV_94 \
|| eChipRev == RAVEN2_15D8_REV_95 \
|| eChipRev == RAVEN2_15D8_REV_E9 \
-- 
2.17.1



0001-drm-amdgpu-DALI-Dali-Variant-Detection.patch
Description: 0001-drm-amdgpu-DALI-Dali-Variant-Detection.patch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: fix amdgpu pmu to use hwc->config instead of hwc->conf

2020-02-06 Thread Jonathan Kim
hwc->conf was designated specifically for AMD APU IOMMU purposes, so reusing it
here could cause functional or performance problems, since the APU IOMMU
implementation lives elsewhere.
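For context, config and conf live in different members of the union inside struct hw_perf_event; the sketch below is from memory and the exact layout is an assumption rather than a quote of include/linux/perf_event.h, but it shows why writing conf from a PMU that is not the APU IOMMU one touches the wrong member:

struct hw_perf_event {
	union {
		struct {		/* generic hardware events */
			u64	config;
			/* ... */
		};
		/* ... */
		struct {		/* amd_iommu events */
			u8	iommu_bank;
			u8	iommu_cntr;
			u16	padding;
			u64	conf;
			u64	conf1;
		};
	};
	/* ... */
};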

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
index 07914e34bc25..1311d6aec5d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
@@ -52,7 +52,7 @@ static int amdgpu_perf_event_init(struct perf_event *event)
return -ENOENT;
 
/* update the hw_perf_event struct with config data */
-   hwc->conf = event->attr.config;
+   hwc->config = event->attr.config;
 
return 0;
 }
@@ -74,9 +74,9 @@ static void amdgpu_perf_start(struct perf_event *event, int 
flags)
switch (pe->pmu_perf_type) {
case PERF_TYPE_AMDGPU_DF:
if (!(flags & PERF_EF_RELOAD))
-   pe->adev->df.funcs->pmc_start(pe->adev, hwc->conf, 1);
+   pe->adev->df.funcs->pmc_start(pe->adev, hwc->config, 1);
 
-   pe->adev->df.funcs->pmc_start(pe->adev, hwc->conf, 0);
+   pe->adev->df.funcs->pmc_start(pe->adev, hwc->config, 0);
break;
default:
break;
@@ -101,7 +101,7 @@ static void amdgpu_perf_read(struct perf_event *event)
 
switch (pe->pmu_perf_type) {
case PERF_TYPE_AMDGPU_DF:
-   pe->adev->df.funcs->pmc_get_count(pe->adev, hwc->conf,
+   pe->adev->df.funcs->pmc_get_count(pe->adev, hwc->config,
  &count);
break;
default:
@@ -126,7 +126,7 @@ static void amdgpu_perf_stop(struct perf_event *event, int 
flags)
 
switch (pe->pmu_perf_type) {
case PERF_TYPE_AMDGPU_DF:
-   pe->adev->df.funcs->pmc_stop(pe->adev, hwc->conf, 0);
+   pe->adev->df.funcs->pmc_stop(pe->adev, hwc->config, 0);
break;
default:
break;
@@ -156,7 +156,8 @@ static int amdgpu_perf_add(struct perf_event *event, int 
flags)
 
switch (pe->pmu_perf_type) {
case PERF_TYPE_AMDGPU_DF:
-   retval = pe->adev->df.funcs->pmc_start(pe->adev, hwc->conf, 1);
+   retval = pe->adev->df.funcs->pmc_start(pe->adev,
+  hwc->config, 1);
break;
default:
return 0;
@@ -184,7 +185,7 @@ static void amdgpu_perf_del(struct perf_event *event, int 
flags)
 
switch (pe->pmu_perf_type) {
case PERF_TYPE_AMDGPU_DF:
-   pe->adev->df.funcs->pmc_stop(pe->adev, hwc->conf, 1);
+   pe->adev->df.funcs->pmc_stop(pe->adev, hwc->config, 1);
break;
default:
break;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/display: Indicate use of TMZ buffers to DC

2020-02-06 Thread Kazlauskas, Nicholas

On 2020-02-06 11:54 a.m., Bhawanpreet Lakha wrote:

From: Harry Wentland 

[Why]
Hubp needs to know whether a buffer is being scanned out from the trusted
memory zone or not.

[How]
Check for the TMZ flag on the amdgpu_bo and set the tmz_surface flag in
dc_plane_address accordingly.

Signed-off-by: Harry Wentland 
Signed-off-by: Bhawanpreet Lakha 
Acked-by: Bhawanpreet Lakha 


Reviewed-by: Nicholas Kazlauskas 


---
  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 38 +--
  1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 7f6d3b0f9efc..73000f1e1734 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -3166,7 +3166,7 @@ static int fill_dc_scaling_info(const struct 
drm_plane_state *state,
  }
  
  static int get_fb_info(const struct amdgpu_framebuffer *amdgpu_fb,

-  uint64_t *tiling_flags)
+  uint64_t *tiling_flags, bool *tmz_surface)
  {
struct amdgpu_bo *rbo = gem_to_amdgpu_bo(amdgpu_fb->base.obj[0]);
int r = amdgpu_bo_reserve(rbo, false);
@@ -3181,6 +3181,9 @@ static int get_fb_info(const struct amdgpu_framebuffer 
*amdgpu_fb,
if (tiling_flags)
amdgpu_bo_get_tiling_flags(rbo, tiling_flags);
  
+	if (tmz_surface)

+   *tmz_surface = amdgpu_bo_encrypted(rbo);
+
amdgpu_bo_unreserve(rbo);
  
  	return r;

@@ -3263,7 +3266,8 @@ fill_plane_buffer_attributes(struct amdgpu_device *adev,
 union dc_tiling_info *tiling_info,
 struct plane_size *plane_size,
 struct dc_plane_dcc_param *dcc,
-struct dc_plane_address *address)
+struct dc_plane_address *address,
+bool tmz_surface)
  {
	const struct drm_framebuffer *fb = &afb->base;
int ret;
@@ -3273,6 +3277,8 @@ fill_plane_buffer_attributes(struct amdgpu_device *adev,
memset(dcc, 0, sizeof(*dcc));
memset(address, 0, sizeof(*address));
  
+	address->tmz_surface = tmz_surface;

+
if (format < SURFACE_PIXEL_FORMAT_VIDEO_BEGIN) {
plane_size->surface_size.x = 0;
plane_size->surface_size.y = 0;
@@ -3461,7 +3467,8 @@ fill_dc_plane_info_and_addr(struct amdgpu_device *adev,
const struct drm_plane_state *plane_state,
const uint64_t tiling_flags,
struct dc_plane_info *plane_info,
-   struct dc_plane_address *address)
+   struct dc_plane_address *address,
+   bool tmz_surface)
  {
const struct drm_framebuffer *fb = plane_state->fb;
const struct amdgpu_framebuffer *afb =
@@ -3540,7 +3547,7 @@ fill_dc_plane_info_and_addr(struct amdgpu_device *adev,
   plane_info->rotation, tiling_flags,
   &plane_info->tiling_info,
   &plane_info->plane_size,
-  &plane_info->dcc, address);
+  &plane_info->dcc, address, tmz_surface);
if (ret)
return ret;
  
@@ -3563,6 +3570,7 @@ static int fill_dc_plane_attributes(struct amdgpu_device *adev,

struct dc_plane_info plane_info;
uint64_t tiling_flags;
int ret;
+   bool tmz_surface = false;
  
	ret = fill_dc_scaling_info(plane_state, &scaling_info);

if (ret)
@@ -3573,13 +3581,14 @@ static int fill_dc_plane_attributes(struct 
amdgpu_device *adev,
dc_plane_state->clip_rect = scaling_info.clip_rect;
dc_plane_state->scaling_quality = scaling_info.scaling_quality;
  
-	ret = get_fb_info(amdgpu_fb, &tiling_flags);

+   ret = get_fb_info(amdgpu_fb, &tiling_flags, &tmz_surface);
if (ret)
return ret;
  
  	ret = fill_dc_plane_info_and_addr(adev, plane_state, tiling_flags,

  _info,
- &dc_plane_state->address);
+ &dc_plane_state->address,
+ tmz_surface);
if (ret)
return ret;
  
@@ -5174,6 +5183,7 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,

uint64_t tiling_flags;
uint32_t domain;
int r;
+   bool tmz_surface = false;
  
  	dm_plane_state_old = to_dm_plane_state(plane->state);

dm_plane_state_new = to_dm_plane_state(new_state);
@@ -5222,6 +5232,8 @@ static int dm_plane_helper_prepare_fb(struct drm_plane 
*plane,
  
	amdgpu_bo_get_tiling_flags(rbo, &tiling_flags);
  
+	tmz_surface = amdgpu_bo_encrypted(rbo);

+
	ttm_eu_backoff_reservation(&ticket, &list);
  
	afb->address = amdgpu_bo_gpu_offset(rbo);

[PATCH] drm/amd/display: Indicate use of TMZ buffers to DC

2020-02-06 Thread Bhawanpreet Lakha
From: Harry Wentland 

[Why]
Hubp needs to know whether a buffer is being scanned out from the trusted
memory zone or not.

[How]
Check for the TMZ flag on the amdgpu_bo and set the tmz_surface flag in
dc_plane_address accordingly.
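The helper used below is essentially a flag test on the buffer object; roughly (an assumption about its body, not a quote of amdgpu_object.h) it reduces to:

static inline bool amdgpu_bo_encrypted(struct amdgpu_bo *bo)
{
	return bo->flags & AMDGPU_GEM_CREATE_ENCRYPTED;
}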

Signed-off-by: Harry Wentland 
Signed-off-by: Bhawanpreet Lakha 
Acked-by: Bhawanpreet Lakha 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 38 +--
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 7f6d3b0f9efc..73000f1e1734 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -3166,7 +3166,7 @@ static int fill_dc_scaling_info(const struct 
drm_plane_state *state,
 }
 
 static int get_fb_info(const struct amdgpu_framebuffer *amdgpu_fb,
-  uint64_t *tiling_flags)
+  uint64_t *tiling_flags, bool *tmz_surface)
 {
struct amdgpu_bo *rbo = gem_to_amdgpu_bo(amdgpu_fb->base.obj[0]);
int r = amdgpu_bo_reserve(rbo, false);
@@ -3181,6 +3181,9 @@ static int get_fb_info(const struct amdgpu_framebuffer 
*amdgpu_fb,
if (tiling_flags)
amdgpu_bo_get_tiling_flags(rbo, tiling_flags);
 
+   if (tmz_surface)
+   *tmz_surface = amdgpu_bo_encrypted(rbo);
+
amdgpu_bo_unreserve(rbo);
 
return r;
@@ -3263,7 +3266,8 @@ fill_plane_buffer_attributes(struct amdgpu_device *adev,
 union dc_tiling_info *tiling_info,
 struct plane_size *plane_size,
 struct dc_plane_dcc_param *dcc,
-struct dc_plane_address *address)
+struct dc_plane_address *address,
+bool tmz_surface)
 {
	const struct drm_framebuffer *fb = &afb->base;
int ret;
@@ -3273,6 +3277,8 @@ fill_plane_buffer_attributes(struct amdgpu_device *adev,
memset(dcc, 0, sizeof(*dcc));
memset(address, 0, sizeof(*address));
 
+   address->tmz_surface = tmz_surface;
+
if (format < SURFACE_PIXEL_FORMAT_VIDEO_BEGIN) {
plane_size->surface_size.x = 0;
plane_size->surface_size.y = 0;
@@ -3461,7 +3467,8 @@ fill_dc_plane_info_and_addr(struct amdgpu_device *adev,
const struct drm_plane_state *plane_state,
const uint64_t tiling_flags,
struct dc_plane_info *plane_info,
-   struct dc_plane_address *address)
+   struct dc_plane_address *address,
+   bool tmz_surface)
 {
const struct drm_framebuffer *fb = plane_state->fb;
const struct amdgpu_framebuffer *afb =
@@ -3540,7 +3547,7 @@ fill_dc_plane_info_and_addr(struct amdgpu_device *adev,
   plane_info->rotation, tiling_flags,
   &plane_info->tiling_info,
   &plane_info->plane_size,
-  &plane_info->dcc, address);
+  &plane_info->dcc, address, tmz_surface);
if (ret)
return ret;
 
@@ -3563,6 +3570,7 @@ static int fill_dc_plane_attributes(struct amdgpu_device 
*adev,
struct dc_plane_info plane_info;
uint64_t tiling_flags;
int ret;
+   bool tmz_surface = false;
 
	ret = fill_dc_scaling_info(plane_state, &scaling_info);
if (ret)
@@ -3573,13 +3581,14 @@ static int fill_dc_plane_attributes(struct 
amdgpu_device *adev,
dc_plane_state->clip_rect = scaling_info.clip_rect;
dc_plane_state->scaling_quality = scaling_info.scaling_quality;
 
-   ret = get_fb_info(amdgpu_fb, &tiling_flags);
+   ret = get_fb_info(amdgpu_fb, &tiling_flags, &tmz_surface);
if (ret)
return ret;
 
ret = fill_dc_plane_info_and_addr(adev, plane_state, tiling_flags,
  _info,
- &dc_plane_state->address);
+ &dc_plane_state->address,
+ tmz_surface);
if (ret)
return ret;
 
@@ -5174,6 +5183,7 @@ static int dm_plane_helper_prepare_fb(struct drm_plane 
*plane,
uint64_t tiling_flags;
uint32_t domain;
int r;
+   bool tmz_surface = false;
 
dm_plane_state_old = to_dm_plane_state(plane->state);
dm_plane_state_new = to_dm_plane_state(new_state);
@@ -5222,6 +5232,8 @@ static int dm_plane_helper_prepare_fb(struct drm_plane 
*plane,
 
	amdgpu_bo_get_tiling_flags(rbo, &tiling_flags);
 
+   tmz_surface = amdgpu_bo_encrypted(rbo);
+
	ttm_eu_backoff_reservation(&ticket, &list);
 
afb->address = amdgpu_bo_gpu_offset(rbo);
@@ -5236,7 +5248,7 @@ static int 

Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2020-02-06 Thread Andrey Grodzovsky


On 2/6/20 9:51 AM, Christian König wrote:

On 06.02.20 15:49, Alex Deucher wrote:

On Thu, Feb 6, 2020 at 6:50 AM Christian König
 wrote:

On 06.02.20 12:10, Lucas Stach wrote:

Hi all,

On Mi, 2020-02-05 at 19:24 +0100, Lucas Stach wrote:

Hi Andrey,

This commit breaks all drivers, which may bail out of the timeout
processing as they wish to extend the timeout (etnaviv, v3d).

Those drivers currently just return from the timeout handler before
calling drm_sched_stop(), which means with this commit applied we are
removing the first job from the ring_mirror_list, but never put it
back. This leads to jobs getting lost from the ring mirror, which 
then

causes quite a bit of fallout like unsignaled fences.

Not sure yet what to do about it, we can either add a function to add
the job back to the ring_mirror if the driver wants to extend the
timeout, or we could look for another way to stop
drm_sched_cleanup_jobs from freeing jobs that are currently in 
timeout

processing.

So after thinking about this a bit more my opinion is that we need to
revert this change for now and go back to the drawing board for the
scheduler timeout handling.

Right now this starts to feel like a big midlayer mistake with all the
very intricate intertwining between the drivers and the scheduler. The
rules on when it's safe to manipulate the ring mirror and when
completed jobs are signaled and freed are not really well specified.
The fact that we need to mutate state in order to get rid of races
instead of having a single big "timeout processing is owner of the
scheduler state for now" is a big fat warning sign IMHO.

Yes, that strongly feels like a hack to me as well. But I didn't have
time, and still haven't, to take a closer look and suggest something
better.



In that case, can someone send me a revert?


Well a revert would break our driver.

The real solution is that somebody needs to sit down, gather ALL the 
requirements and then come up with a solution which is clean and works 
for everyone.


Christian.



I can take this on, as indeed our general design here becomes more
and more entangled as GPU reset scenarios grow in complexity (at least
in the AMD driver). Currently I am on a high priority internal task which
should take me around a week or two to finish, and after that I can get to it.


Regarding a temporary solution - I looked into the v3d and etnaviv use cases,
and we in AMD actually face the same scenario, where we decide to skip the HW
reset if the guilty job did finish by the time we are processing the
timeout (see amdgpu_device_gpu_recover and the skip_hw_reset goto) - the
difference is that we always call drm_sched_stop/start irrespective of
whether we are actually going to HW reset or not (same for extending the
timeout). I wonder if something like this can be done for v3d and
etnaviv too?
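Roughly, the pattern described above would look like this in a driver's timedout_job callback (a hedged sketch only, with a hypothetical job_really_hung() helper; this is not code from amdgpu, v3d or etnaviv):

static void example_timedout_job(struct drm_sched_job *sched_job)
{
	struct drm_gpu_scheduler *sched = sched_job->sched;

	/* Always park the scheduler thread and detach the mirrored jobs,
	 * even if the HW reset ends up being skipped. */
	drm_sched_stop(sched, sched_job);

	if (job_really_hung(sched_job)) {
		/* ... perform the actual HW reset here ... */
	}

	/* Re-queue the mirrored jobs and unpark the scheduler. */
	drm_sched_resubmit_jobs(sched);
	drm_sched_start(sched, true);
}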


Andrey






Alex



Christian.


It took me far longer than I'd like to admit to understand the failure
mode with fences not getting signaled after a GPU hang. The back and
forth between scheduler and driver code makes things really hard to
follow.

Regards,
Lucas


Regards,
Lucas

On Mo, 2019-11-25 at 15:51 -0500, Andrey Grodzovsky wrote:

Problem:
Due to a race between drm_sched_cleanup_jobs in sched thread and
drm_sched_job_timedout in timeout work there is a possibility that
bad job was already freed while still being accessed from the
timeout thread.

Fix:
Instead of just peeking at the bad job in the mirror list
remove it from the list under lock and then put it back later when
we are guaranteed no race with main sched thread is possible which
is after the thread is parked.

v2: Lock around processing ring_mirror_list in 
drm_sched_cleanup_jobs.


v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
drm_sched_get_cleanup_job already has a lock there.

v4: Fix comments to reflect latest code in drm-misc.

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Christian König 
Tested-by: Emily Deng 
---
   drivers/gpu/drm/scheduler/sched_main.c | 27 
+++

   1 file changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index 6774955..1bf9c40 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct 
work_struct *work)

 unsigned long flags;

 sched = container_of(work, struct drm_gpu_scheduler, 
work_tdr.work);

+
+   /* Protects against concurrent deletion in 
drm_sched_get_cleanup_job */

+   spin_lock_irqsave(&sched->job_list_lock, flags);
 job = list_first_entry_or_null(&sched->ring_mirror_list,
    struct drm_sched_job, node);

 if (job) {
+   /*
+    * Remove the bad job so it cannot be freed by 
concurrent
+    * drm_sched_cleanup_jobs. It will be reinserted back 
after sched->thread

+    * is parked at which point it's safe.
+    */
+   

Re: [PATCH 4/4] drm/amdgpu: use amdgpu_device_vram_access in amdgpu_ttm_access_memory v2

2020-02-06 Thread Felix Kuehling

On 2020-02-06 9:30, Christian König wrote:

Make use of the better performance here as well.

This patch is only compile tested!

v2: fix calculation bug pointed out by Felix

Signed-off-by: Christian König 
Acked-by: Jonathan Kim 


The series is

Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 38 +++--
  1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 58d143b24ba0..2c1d1eb1a7e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1565,7 +1565,7 @@ static int amdgpu_ttm_access_memory(struct 
ttm_buffer_object *bo,
  
  	while (len && pos < adev->gmc.mc_vram_size) {

uint64_t aligned_pos = pos & ~(uint64_t)3;
-   uint32_t bytes = 4 - (pos & 3);
+   uint64_t bytes = 4 - (pos & 3);
uint32_t shift = (pos & 3) * 8;
uint32_t mask = 0xffffffff << shift;
  
@@ -1574,20 +1574,28 @@ static int amdgpu_ttm_access_memory(struct ttm_buffer_object *bo,

bytes = len;
}
  
-		spin_lock_irqsave(&adev->mmio_idx_lock, flags);

-   WREG32_NO_KIQ(mmMM_INDEX, ((uint32_t)aligned_pos) | 0x80000000);
-   WREG32_NO_KIQ(mmMM_INDEX_HI, aligned_pos >> 31);
-   if (!write || mask != 0xffffffff)
-   value = RREG32_NO_KIQ(mmMM_DATA);
-   if (write) {
-   value &= ~mask;
-   value |= (*(uint32_t *)buf << shift) & mask;
-   WREG32_NO_KIQ(mmMM_DATA, value);
-   }
-   spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
-   if (!write) {
-   value = (value & mask) >> shift;
-   memcpy(buf, , bytes);
+   if (mask != 0xffffffff) {
+   spin_lock_irqsave(&adev->mmio_idx_lock, flags);
+   WREG32_NO_KIQ(mmMM_INDEX, ((uint32_t)aligned_pos) | 0x80000000);
+   WREG32_NO_KIQ(mmMM_INDEX_HI, aligned_pos >> 31);
+   if (!write || mask != 0xffffffff)
+   value = RREG32_NO_KIQ(mmMM_DATA);
+   if (write) {
+   value &= ~mask;
+   value |= (*(uint32_t *)buf << shift) & mask;
+   WREG32_NO_KIQ(mmMM_DATA, value);
+   }
+   spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
+   if (!write) {
+   value = (value & mask) >> shift;
+   memcpy(buf, , bytes);
+   }
+   } else {
+   bytes = (nodes->start + nodes->size) << PAGE_SHIFT;
+   bytes = min(bytes - pos, (uint64_t)len & ~0x3ull);
+
+   amdgpu_device_vram_access(adev, pos, (uint32_t *)buf,
+ bytes, write);
}
  
  		ret += bytes;

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [Patch v2 3/4] drm/amdkfd: refactor runtime pm for baco

2020-02-06 Thread Zeng, Oak
[AMD Official Use Only - Internal Distribution Only]

Hi Alex,

I am trying to understand why we prevent runtime pm when xgmi is active. Is it
because other devices access the suspended device's HBM? Here is my
understanding: after the device is suspended, the DF and HBM will still be alive. So
as long as gL2 probing to the device is disabled (this should be guaranteed by
the gL2 flush during suspend), other devices should still be able to access HBM on
the suspended device.

Regards,
Oak

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Thursday, February 6, 2020 9:27 AM
To: Kuehling, Felix 
Cc: Deucher, Alexander ; Bhardwaj, Rajneesh 
; amd-gfx list 
Subject: Re: [Patch v2 3/4] drm/amdkfd: refactor runtime pm for baco

On Tue, Feb 4, 2020 at 11:46 PM Alex Deucher  wrote:
>
> On Tue, Feb 4, 2020 at 4:28 PM Felix Kuehling  wrote:
> >
> > On 2020-01-31 10:37 p.m., Rajneesh Bhardwaj wrote:
> > > So far the kfd driver implemented same routines for runtime and 
> > > system wide suspend and resume (s2idle or mem). During system wide 
> > > suspend the kfd acquires an atomic lock that prevents any more user 
> > > processes from creating queues and interacting with the kfd driver and amd 
> > > gpu. This mechanism created a problem when the amdgpu device is runtime 
> > > suspended with BACO enabled. Any application that relies on kfd 
> > > driver fails to load because the driver reports a locked kfd device since 
> > > gpu is runtime suspended.
> > >
> > > However, in an ideal case, when gpu is runtime  suspended the kfd 
> > > driver should be able to:
> > >
> > >   - auto resume amdgpu driver whenever a client requests compute service
> > >   - prevent runtime suspend for amdgpu  while kfd is in use
> > >
> > > This change refactors the amdgpu and amdkfd drivers to support 
> > > BACO and runtime power management.
> > >
> > > Signed-off-by: Rajneesh Bhardwaj 
> >
> > One small comment inline. Other than that patches 1-3 are
> >
> > Reviewed-by: Felix Kuehling 
> >
> > Also, I believe patch 1 is unchanged from v1 and already got a 
> > Reviewed-by from Alex. Please remember to add that tag before you submit.
> >
> > The last patch that enabled runtime PM by default, I'd leave the 
> > decision to submit that up to Alex. There may be other 
> > considerations than just KFD.
>
> KFD was the only thing left.  I've been running with runpm forced on 
> for a while now with no problems across a wide variety of hardware.

Actually, we need to prevent runtime pm if xgmi is active.  One more thing to 
check.

Alex


>
> Alex
>
> >
> > See inline ...
> >
> >
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 12 +++---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  8 ++--
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 +-
> > >   drivers/gpu/drm/amd/amdkfd/kfd_device.c| 29 +--
> > >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  1 +
> > >   drivers/gpu/drm/amd/amdkfd/kfd_process.c   | 43 --
> > >   6 files changed, 70 insertions(+), 27 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > > index 8609287620ea..314c4a2a0354 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > > @@ -178,18 +178,18 @@ void amdgpu_amdkfd_interrupt(struct amdgpu_device 
> > > *adev,
> > >   kgd2kfd_interrupt(adev->kfd.dev, ih_ring_entry);
> > >   }
> > >
> > > -void amdgpu_amdkfd_suspend(struct amdgpu_device *adev)
> > > +void amdgpu_amdkfd_suspend(struct amdgpu_device *adev, bool 
> > > +run_pm)
> > >   {
> > >   if (adev->kfd.dev)
> > > - kgd2kfd_suspend(adev->kfd.dev);
> > > + kgd2kfd_suspend(adev->kfd.dev, run_pm);
> > >   }
> > >
> > > -int amdgpu_amdkfd_resume(struct amdgpu_device *adev)
> > > +int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool run_pm)
> > >   {
> > >   int r = 0;
> > >
> > >   if (adev->kfd.dev)
> > > - r = kgd2kfd_resume(adev->kfd.dev);
> > > + r = kgd2kfd_resume(adev->kfd.dev, run_pm);
> > >
> > >   return r;
> > >   }
> > > @@ -713,11 +713,11 @@ void kgd2kfd_exit(void)
> > >   {
> > >   }
> > >
> > > -void kgd2kfd_suspend(struct kfd_dev *kfd)
> > > +void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
> > >   {
> > >   }
> > >
> > > -int kgd2kfd_resume(struct kfd_dev *kfd)
> > > +int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
> > >   {
> > >   return 0;
> > >   }
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > index 47b0f2957d1f..9e8db702d878 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > @@ -122,8 +122,8 @@ struct amdkfd_process_info {
> > >   int amdgpu_amdkfd_init(void);
> > >   void amdgpu_amdkfd_fini(void);
> > >
> > > -void amdgpu_amdkfd_suspend(struct 

Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2020-02-06 Thread Christian König

On 06.02.20 15:49, Alex Deucher wrote:

On Thu, Feb 6, 2020 at 6:50 AM Christian König
 wrote:

On 06.02.20 12:10, Lucas Stach wrote:

Hi all,

On Mi, 2020-02-05 at 19:24 +0100, Lucas Stach wrote:

Hi Andrey,

This commit breaks all drivers, which may bail out of the timeout
processing as they wish to extend the timeout (etnaviv, v3d).

Those drivers currently just return from the timeout handler before
calling drm_sched_stop(), which means with this commit applied we are
removing the first job from the ring_mirror_list, but never put it
back. This leads to jobs getting lost from the ring mirror, which then
causes quite a bit of fallout like unsignaled fences.

Not sure yet what to do about it, we can either add a function to add
the job back to the ring_mirror if the driver wants to extend the
timeout, or we could look for another way to stop
drm_sched_cleanup_jobs from freeing jobs that are currently in timeout
processing.

So after thinking about this a bit more my opinion is that we need to
revert this change for now and go back to the drawing board for the
scheduler timeout handling.

Right now this starts to feel like a big midlayer mistake with all the
very intricate intertwining between the drivers and the scheduler. The
rules on when it's safe to manipulate the ring mirror and when
completed jobs are signaled and freed are not really well specified.
The fact that we need to mutate state in order to get rid of races
instead of having a single big "timeout processing is owner of the
scheduler state for now" is a big fat warning sign IMHO.

Yes, that strongly feels like a hack to me as well. But I didn't have
time, and still haven't, to take a closer look and suggest something better.


In that case, can someone send me a revert?


Well a revert would break our driver.

The real solution is that somebody needs to sit down, gather ALL the 
requirements and then come up with a solution which is clean and works 
for everyone.


Christian.



Alex



Christian.


It took me far longer than I'd like to admit to understand the failure
mode with fences not getting signaled after a GPU hang. The back and
forth between scheduler and driver code makes things really hard to
follow.

Regards,
Lucas


Regards,
Lucas

On Mo, 2019-11-25 at 15:51 -0500, Andrey Grodzovsky wrote:

Problem:
Due to a race between drm_sched_cleanup_jobs in sched thread and
drm_sched_job_timedout in timeout work there is a possibility that
bad job was already freed while still being accessed from the
timeout thread.

Fix:
Instead of just peeking at the bad job in the mirror list
remove it from the list under lock and then put it back later when
we are guaranteed no race with main sched thread is possible which
is after the thread is parked.

v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.

v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
drm_sched_get_cleanup_job already has a lock there.

v4: Fix comments to reflect latest code in drm-misc.

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Christian König 
Tested-by: Emily Deng 
---
   drivers/gpu/drm/scheduler/sched_main.c | 27 +++
   1 file changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 6774955..1bf9c40 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
 unsigned long flags;

 sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
+
+   /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+   spin_lock_irqsave(&sched->job_list_lock, flags);
 job = list_first_entry_or_null(&sched->ring_mirror_list,
struct drm_sched_job, node);

 if (job) {
+   /*
+* Remove the bad job so it cannot be freed by concurrent
+* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
+* is parked at which point it's safe.
+*/
+   list_del_init(&job->node);
+   spin_unlock_irqrestore(&sched->job_list_lock, flags);
+
 job->sched->ops->timedout_job(job);

 /*
@@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
 job->sched->ops->free_job(job);
 sched->free_guilty = false;
 }
+   } else {
+   spin_unlock_irqrestore(&sched->job_list_lock, flags);
 }

 spin_lock_irqsave(&sched->job_list_lock, flags);
@@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, 
struct drm_sched_job *bad)
 kthread_park(sched->thread);

 /*
+* Reinsert back the bad job here - now it's safe as
+* drm_sched_get_cleanup_job cannot race against us and release the
+* bad job at this point - we parked (waited for) any in progress
+* (earlier) cleanups and drm_sched_get_cleanup_job will 

Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2020-02-06 Thread Alex Deucher
On Thu, Feb 6, 2020 at 6:50 AM Christian König
 wrote:
>
> On 06.02.20 12:10, Lucas Stach wrote:
> > Hi all,
> >
> > On Mi, 2020-02-05 at 19:24 +0100, Lucas Stach wrote:
> >> Hi Andrey,
> >>
> >> This commit breaks all drivers, which may bail out of the timeout
> >> processing as they wish to extend the timeout (etnaviv, v3d).
> >>
> >> Those drivers currently just return from the timeout handler before
> >> calling drm_sched_stop(), which means with this commit applied we are
> >> removing the first job from the ring_mirror_list, but never put it
> >> back. This leads to jobs getting lost from the ring mirror, which then
> >> causes quite a bit of fallout like unsignaled fences.
> >>
> >> Not sure yet what to do about it, we can either add a function to add
> >> the job back to the ring_mirror if the driver wants to extend the
> >> timeout, or we could look for another way to stop
> >> drm_sched_cleanup_jobs from freeing jobs that are currently in timeout
> >> processing.
> > So after thinking about this a bit more my opinion is that we need to
> > revert this change for now and go back to the drawing board for the
> > scheduler timeout handling.
> >
> > Right now this starts to feel like a big midlayer mistake with all the
> > very intricate intertwining between the drivers and the scheduler. The
> > rules on when it's safe to manipulate the ring mirror and when
> > completed jobs are signaled and freed are not really well specified.
> > The fact that we need to mutate state in order to get rid of races
> > instead of having a single big "timeout processing is owner of the
> > scheduler state for now" is a big fat warning sign IMHO.
>
> Yes, that strongly feels like a hack to me as well. But I didn't have
> time, and still haven't, to take a closer look and suggest something better.
>

In that case, can someone send me a revert?

Alex


> Christian.
>
> >
> > It took me far longer than I'd like to admit to understand the failure
> > mode with fences not getting signaled after a GPU hang. The back and
> > forth between scheduler and driver code makes things really hard to
> > follow.
> >
> > Regards,
> > Lucas
> >
> >> Regards,
> >> Lucas
> >>
> >> On Mo, 2019-11-25 at 15:51 -0500, Andrey Grodzovsky wrote:
> >>> Problem:
> >>> Due to a race between drm_sched_cleanup_jobs in sched thread and
> >>> drm_sched_job_timedout in timeout work there is a possibility that
> >>> bad job was already freed while still being accessed from the
> >>> timeout thread.
> >>>
> >>> Fix:
> >>> Instead of just peeking at the bad job in the mirror list
> >>> remove it from the list under lock and then put it back later when
> >>> we are guaranteed no race with main sched thread is possible which
> >>> is after the thread is parked.
> >>>
> >>> v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
> >>>
> >>> v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
> >>> drm_sched_get_cleanup_job already has a lock there.
> >>>
> >>> v4: Fix comments to reflect latest code in drm-misc.
> >>>
> >>> Signed-off-by: Andrey Grodzovsky 
> >>> Reviewed-by: Christian König 
> >>> Tested-by: Emily Deng 
> >>> ---
> >>>   drivers/gpu/drm/scheduler/sched_main.c | 27 +++
> >>>   1 file changed, 27 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> >>> b/drivers/gpu/drm/scheduler/sched_main.c
> >>> index 6774955..1bf9c40 100644
> >>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>> @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct 
> >>> work_struct *work)
> >>> unsigned long flags;
> >>>
> >>> sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
> >>> +
> >>> +   /* Protects against concurrent deletion in drm_sched_get_cleanup_job 
> >>> */
> >>> +   spin_lock_irqsave(&sched->job_list_lock, flags);
> >>> job = list_first_entry_or_null(&sched->ring_mirror_list,
> >>>struct drm_sched_job, node);
> >>>
> >>> if (job) {
> >>> +   /*
> >>> +* Remove the bad job so it cannot be freed by concurrent
> >>> +* drm_sched_cleanup_jobs. It will be reinserted back after 
> >>> sched->thread
> >>> +* is parked at which point it's safe.
> >>> +*/
> >>> +   list_del_init(&job->node);
> >>> +   spin_unlock_irqrestore(&sched->job_list_lock, flags);
> >>> +
> >>> job->sched->ops->timedout_job(job);
> >>>
> >>> /*
> >>> @@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct 
> >>> *work)
> >>> job->sched->ops->free_job(job);
> >>> sched->free_guilty = false;
> >>> }
> >>> +   } else {
> >>> +   spin_unlock_irqrestore(&sched->job_list_lock, flags);
> >>> }
> >>>
> >>> spin_lock_irqsave(&sched->job_list_lock, flags);
> >>> @@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, 
> >>> struct 

[PATCH 4/4] drm/amdgpu: use amdgpu_device_vram_access in amdgpu_ttm_access_memory v2

2020-02-06 Thread Christian König
Make use of the better performance here as well.

This patch is only compile tested!

v2: fix calculation bug pointed out by Felix
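For readers following the hunk below: on the fast path the end of the current VRAM node is first computed in bytes as (nodes->start + nodes->size) << PAGE_SHIFT, and the copy length is then clamped to min(end - pos, len & ~0x3), so a single amdgpu_device_vram_access() call never crosses the node boundary and stays dword-aligned; the slower masked read-modify-write path is only left handling the unaligned head and tail.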

Signed-off-by: Christian König 
Acked-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 38 +++--
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 58d143b24ba0..2c1d1eb1a7e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1565,7 +1565,7 @@ static int amdgpu_ttm_access_memory(struct 
ttm_buffer_object *bo,
 
while (len && pos < adev->gmc.mc_vram_size) {
uint64_t aligned_pos = pos & ~(uint64_t)3;
-   uint32_t bytes = 4 - (pos & 3);
+   uint64_t bytes = 4 - (pos & 3);
uint32_t shift = (pos & 3) * 8;
uint32_t mask = 0xffffffff << shift;
 
@@ -1574,20 +1574,28 @@ static int amdgpu_ttm_access_memory(struct 
ttm_buffer_object *bo,
bytes = len;
}
 
-   spin_lock_irqsave(&adev->mmio_idx_lock, flags);
-   WREG32_NO_KIQ(mmMM_INDEX, ((uint32_t)aligned_pos) | 0x80000000);
-   WREG32_NO_KIQ(mmMM_INDEX_HI, aligned_pos >> 31);
-   if (!write || mask != 0xffffffff)
-   value = RREG32_NO_KIQ(mmMM_DATA);
-   if (write) {
-   value &= ~mask;
-   value |= (*(uint32_t *)buf << shift) & mask;
-   WREG32_NO_KIQ(mmMM_DATA, value);
-   }
-   spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
-   if (!write) {
-   value = (value & mask) >> shift;
-   memcpy(buf, , bytes);
+   if (mask != 0xffffffff) {
+   spin_lock_irqsave(&adev->mmio_idx_lock, flags);
+   WREG32_NO_KIQ(mmMM_INDEX, ((uint32_t)aligned_pos) | 0x80000000);
+   WREG32_NO_KIQ(mmMM_INDEX_HI, aligned_pos >> 31);
+   if (!write || mask != 0xffffffff)
+   value = RREG32_NO_KIQ(mmMM_DATA);
+   if (write) {
+   value &= ~mask;
+   value |= (*(uint32_t *)buf << shift) & mask;
+   WREG32_NO_KIQ(mmMM_DATA, value);
+   }
+   spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
+   if (!write) {
+   value = (value & mask) >> shift;
+   memcpy(buf, , bytes);
+   }
+   } else {
+   bytes = (nodes->start + nodes->size) << PAGE_SHIFT;
+   bytes = min(bytes - pos, (uint64_t)len & ~0x3ull);
+
+   amdgpu_device_vram_access(adev, pos, (uint32_t *)buf,
+ bytes, write);
}
 
ret += bytes;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/4] drm/amdgpu: optimize amdgpu_device_vram_access a bit.

2020-02-06 Thread Christian König
Only write the _HI register when necessary.
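The idea of the change below: MM_INDEX_HI holds the bits of the VRAM offset above bit 31, so during a linear walk it only changes when the offset crosses a 2 GiB boundary (pos >> 31 ticks over). Caching the last value written in the new hi variable drops one register write per dword, and the spinlock is now taken once around the whole loop instead of once per iteration.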

Signed-off-by: Christian König 
Reviewed-by: Alex Deucher 
Acked-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f33c49ed4f94..be4e6c33d7e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -184,20 +184,25 @@ bool amdgpu_device_supports_baco(struct drm_device *dev)
 void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
   uint32_t *buf, size_t size, bool write)
 {
-   uint64_t last;
unsigned long flags;
+   uint32_t hi = ~0;
+   uint64_t last;
+
+   spin_lock_irqsave(&adev->mmio_idx_lock, flags);
+   for (last = pos + size; pos < last; pos += 4) {
+   uint32_t tmp = pos >> 31;
 
-   last = size - 4;
-   for (last += pos; pos <= last; pos += 4) {
-   spin_lock_irqsave(&adev->mmio_idx_lock, flags);
WREG32_NO_KIQ(mmMM_INDEX, ((uint32_t)pos) | 0x80000000);
-   WREG32_NO_KIQ(mmMM_INDEX_HI, pos >> 31);
+   if (tmp != hi) {
+   WREG32_NO_KIQ(mmMM_INDEX_HI, tmp);
+   hi = tmp;
+   }
if (write)
WREG32_NO_KIQ(mmMM_DATA, *buf++);
else
*buf++ = RREG32_NO_KIQ(mmMM_DATA);
-   spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
}
+   spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
 }
 
 /*
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/4] drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2

2020-02-06 Thread Christian König
This should speed up debugging VRAM access a lot.

v2: add HDP flush/invalidate
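A note on the v2 addition: when the CPU accesses VRAM through the BAR, data can sit in the host data path (HDP) cache, so the write path below flushes HDP after memcpy_toio() to make the bytes visible to the GPU, and the read path invalidates HDP before memcpy_fromio() so the CPU does not see stale data; the mb() keeps the flush/invalidate ordered against the copy.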

Signed-off-by: Christian König 
Acked-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index be4e6c33d7e3..8f4849f94fb0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -188,6 +188,32 @@ void amdgpu_device_vram_access(struct amdgpu_device *adev, 
loff_t pos,
uint32_t hi = ~0;
uint64_t last;
 
+
+#ifdef CONFIG_64BIT
+   last = min(pos + size, adev->gmc.visible_vram_size);
+   if (last > pos) {
+   void __iomem *addr = adev->mman.aper_base_kaddr + pos;
+   size_t count = last - pos;
+
+   if (write) {
+   memcpy_toio(addr, buf, count);
+   mb();
+   amdgpu_asic_flush_hdp(adev, NULL);
+   } else {
+   amdgpu_asic_invalidate_hdp(adev, NULL);
+   mb();
+   memcpy_fromio(buf, addr, count);
+   }
+
+   if (count == size)
+   return;
+
+   pos += count;
+   buf += count / 4;
+   size -= count;
+   }
+#endif
+
	spin_lock_irqsave(&adev->mmio_idx_lock, flags);
for (last = pos + size; pos < last; pos += 4) {
uint32_t tmp = pos >> 31;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 3/4] drm/amdgpu: use amdgpu_device_vram_access in amdgpu_ttm_vram_read

2020-02-06 Thread Christian König
This speeds up the access quite a bit, from 2.2 MB/s to
2.9 MB/s on 32bit and 12.8 MB/s on 64bit.
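The gain comes from batching: instead of one MMIO index/data round trip plus one put_user() per dword, the loop below pulls up to AMDGPU_TTM_VRAM_MAX_DW_READ (128) dwords, i.e. 512 bytes, into a stack buffer via amdgpu_device_vram_access() and pushes them to userspace with a single copy_to_user().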

Signed-off-by: Christian König 
Reviewed-by: Alex Deucher 
Acked-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 27 ++---
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index ae1b00def5d8..58d143b24ba0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -59,6 +59,8 @@
 #include "amdgpu_ras.h"
 #include "bif/bif_4_1_d.h"
 
#define AMDGPU_TTM_VRAM_MAX_DW_READ	(size_t)128
+
 static int amdgpu_map_buffer(struct ttm_buffer_object *bo,
 struct ttm_mem_reg *mem, unsigned num_pages,
 uint64_t offset, unsigned window,
@@ -2255,27 +2257,20 @@ static ssize_t amdgpu_ttm_vram_read(struct file *f, 
char __user *buf,
if (*pos >= adev->gmc.mc_vram_size)
return -ENXIO;
 
+   size = min(size, (size_t)(adev->gmc.mc_vram_size - *pos));
while (size) {
-   unsigned long flags;
-   uint32_t value;
-
-   if (*pos >= adev->gmc.mc_vram_size)
-   return result;
-
-   spin_lock_irqsave(&adev->mmio_idx_lock, flags);
-   WREG32_NO_KIQ(mmMM_INDEX, ((uint32_t)*pos) | 0x80000000);
-   WREG32_NO_KIQ(mmMM_INDEX_HI, *pos >> 31);
-   value = RREG32_NO_KIQ(mmMM_DATA);
-   spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
+   size_t bytes = min(size, AMDGPU_TTM_VRAM_MAX_DW_READ * 4);
+   uint32_t value[AMDGPU_TTM_VRAM_MAX_DW_READ];
 
-   r = put_user(value, (uint32_t *)buf);
+   amdgpu_device_vram_access(adev, *pos, value, bytes, false);
+   r = copy_to_user(buf, value, bytes);
if (r)
return r;
 
-   result += 4;
-   buf += 4;
-   *pos += 4;
-   size -= 4;
+   result += bytes;
+   buf += bytes;
+   *pos += bytes;
+   size -= bytes;
}
 
return result;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Patch v2 3/4] drm/amdkfd: refactor runtime pm for baco

2020-02-06 Thread Alex Deucher
On Tue, Feb 4, 2020 at 11:46 PM Alex Deucher  wrote:
>
> On Tue, Feb 4, 2020 at 4:28 PM Felix Kuehling  wrote:
> >
> > On 2020-01-31 10:37 p.m., Rajneesh Bhardwaj wrote:
> > > So far the kfd driver implemented same routines for runtime and system
> > > wide suspend and resume (s2idle or mem). During system wide suspend the
> > > kfd acquires an atomic lock that prevents any more user processes from
> > > creating queues and interacting with the kfd driver and amd gpu. This mechanism
> > > created a problem when the amdgpu device is runtime suspended with BACO
> > > enabled. Any application that relies on kfd driver fails to load because
> > > the driver reports a locked kfd device since gpu is runtime suspended.
> > >
> > > However, in an ideal case, when gpu is runtime  suspended the kfd driver
> > > should be able to:
> > >
> > >   - auto resume amdgpu driver whenever a client requests compute service
> > >   - prevent runtime suspend for amdgpu  while kfd is in use
> > >
> > > This change refactors the amdgpu and amdkfd drivers to support BACO and
> > > runtime power management.
> > >
> > > Signed-off-by: Rajneesh Bhardwaj 
> >
> > One small comment inline. Other than that patches 1-3 are
> >
> > Reviewed-by: Felix Kuehling 
> >
> > Also, I believe patch 1 is unchanged from v1 and already got a
> > Reviewed-by from Alex. Please remember to add that tag before you submit.
> >
> > The last patch that enabled runtime PM by default, I'd leave the
> > decision to submit that up to Alex. There may be other considerations
> > than just KFD.
>
> KFD was the only thing left.  I've been running with runpm forced on
> for a while now with no problems across a wide variety of hardware.

Actually, we need to prevent runtime pm if xgmi is active.  One more
thing to check.

Alex


>
> Alex
>
> >
> > See inline ...
> >
> >
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 12 +++---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  8 ++--
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 +-
> > >   drivers/gpu/drm/amd/amdkfd/kfd_device.c| 29 +--
> > >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  1 +
> > >   drivers/gpu/drm/amd/amdkfd/kfd_process.c   | 43 --
> > >   6 files changed, 70 insertions(+), 27 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > > index 8609287620ea..314c4a2a0354 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > > @@ -178,18 +178,18 @@ void amdgpu_amdkfd_interrupt(struct amdgpu_device 
> > > *adev,
> > >   kgd2kfd_interrupt(adev->kfd.dev, ih_ring_entry);
> > >   }
> > >
> > > -void amdgpu_amdkfd_suspend(struct amdgpu_device *adev)
> > > +void amdgpu_amdkfd_suspend(struct amdgpu_device *adev, bool run_pm)
> > >   {
> > >   if (adev->kfd.dev)
> > > - kgd2kfd_suspend(adev->kfd.dev);
> > > + kgd2kfd_suspend(adev->kfd.dev, run_pm);
> > >   }
> > >
> > > -int amdgpu_amdkfd_resume(struct amdgpu_device *adev)
> > > +int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool run_pm)
> > >   {
> > >   int r = 0;
> > >
> > >   if (adev->kfd.dev)
> > > - r = kgd2kfd_resume(adev->kfd.dev);
> > > + r = kgd2kfd_resume(adev->kfd.dev, run_pm);
> > >
> > >   return r;
> > >   }
> > > @@ -713,11 +713,11 @@ void kgd2kfd_exit(void)
> > >   {
> > >   }
> > >
> > > -void kgd2kfd_suspend(struct kfd_dev *kfd)
> > > +void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
> > >   {
> > >   }
> > >
> > > -int kgd2kfd_resume(struct kfd_dev *kfd)
> > > +int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
> > >   {
> > >   return 0;
> > >   }
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > index 47b0f2957d1f..9e8db702d878 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > @@ -122,8 +122,8 @@ struct amdkfd_process_info {
> > >   int amdgpu_amdkfd_init(void);
> > >   void amdgpu_amdkfd_fini(void);
> > >
> > > -void amdgpu_amdkfd_suspend(struct amdgpu_device *adev);
> > > -int amdgpu_amdkfd_resume(struct amdgpu_device *adev);
> > > +void amdgpu_amdkfd_suspend(struct amdgpu_device *adev, bool run_pm);
> > > +int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool run_pm);
> > >   void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev,
> > >   const void *ih_ring_entry);
> > >   void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev);
> > > @@ -249,8 +249,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
> > >struct drm_device *ddev,
> > >const struct kgd2kfd_shared_resources 
> > > *gpu_resources);
> > >   void kgd2kfd_device_exit(struct kfd_dev *kfd);
> > > -void kgd2kfd_suspend(struct kfd_dev *kfd);
> > > -int kgd2kfd_resume(struct 

Re: [PATCH] drm/amd/powerplay: handle features disablement for baco reset in SMU FW

2020-02-06 Thread Alex Deucher
On Thu, Feb 6, 2020 at 3:19 AM Evan Quan  wrote:
>
> SMU FW will handle the features disablement for baco reset
> on Arcturus.
>
> Change-Id: Ifd87a09de2fca0c67c041afbd5e711973769119a
> Signed-off-by: Evan Quan 
> ---
>  drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 53 +-
>  1 file changed, 42 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c 
> b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> index 9d1075823681..efd10e6fa9ef 100644
> --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> @@ -1467,24 +1467,39 @@ void smu_late_fini(void *handle)
> smu_send_smc_msg(smu, SMU_MSG_PrepareMp1ForShutdown);
>  }
>
> -static int smu_suspend(void *handle)
> +static int smu_disabled_dpms(struct smu_context *smu)
>  {
> -   int ret;
> -   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> -   struct smu_context *smu = &adev->smu;
> -   bool baco_feature_is_enabled = false;
> +   struct amdgpu_device *adev = smu->adev;
> +   uint32_t smu_version;
> +   int ret = 0;
>
> -   if (!smu->pm_enabled)
> -   return 0;
> +   ret = smu_get_smc_version(smu, NULL, &smu_version);
> +   if (ret) {
> +   pr_err("Failed to get smu version.\n");
> +   return ret;
> +   }
>
> -   if(!smu->is_apu)
> -   baco_feature_is_enabled = smu_feature_is_enabled(smu, 
> SMU_FEATURE_BACO_BIT);
> +   /*
> +* For baco reset on Arcturus, this operation
> +* (disable all smu feature) will be handled by SMU FW.
> +*/
> +   if (adev->in_gpu_reset &&
> +   (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) &&
> +   (adev->asic_type == CHIP_ARCTURUS && smu_version > 0x360e00))
> +   return 0;

Do we need the in_gpu_reset check here?  Is there ever a case where we
would want to execute the rest of this?  What if we enable BACO
for power savings on Arcturus?
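
To make the question concrete: dropping the in_gpu_reset gate would leave an
early return along these lines (a hypothetical illustration of the alternative
being asked about, not a proposed hunk):

        /* Skip the driver-side feature disablement on Arcturus with new
         * enough SMU FW, regardless of why we are suspending. */
        if ((amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) &&
            (adev->asic_type == CHIP_ARCTURUS && smu_version > 0x360e00))
                return 0;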

>
> +   /* Disable all enabled SMU features */
> ret = smu_system_features_control(smu, false);
> -   if (ret)
> +   if (ret) {
> +   pr_err("Failed to disable smu features.\n");
> return ret;
> +   }
>
> -   if (baco_feature_is_enabled) {
> +   /* For baco reset, need to leave BACO feature enabled */
> +   if (adev->in_gpu_reset &&
> +   (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) &&
> +   !smu->is_apu &&
> +   smu_feature_is_enabled(smu, SMU_FEATURE_BACO_BIT)) {

This change will break BACO for power savings on navi1x.  I think we
can drop this hunk.

> ret = smu_feature_set_enabled(smu, SMU_FEATURE_BACO_BIT, 
> true);
> if (ret) {
> pr_warn("set BACO feature enabled failed, return 
> %d\n", ret);
> @@ -1492,6 +1507,22 @@ static int smu_suspend(void *handle)
> }
> }
>
> +   return ret;
> +}
> +
> +static int smu_suspend(void *handle)
> +{
> +   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +   struct smu_context *smu = &adev->smu;
> +   int ret;
> +
> +   if (!smu->pm_enabled)
> +   return 0;
> +
> +   ret = smu_disabled_dpms(smu);
> +   if (ret)
> +   return ret;
> +
> smu->watermarks_bitmap &= ~(WATERMARKS_LOADED);
>
> if (adev->asic_type >= CHIP_NAVI10 &&
> --
> 2.25.0
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu/smu11: fix SMU_11_0_ODFEATURE_ID/navi10_od_feature_is_supported()

2020-02-06 Thread Aleksandr Mezin
The navi10_od_feature_is_supported() function uses enum values as array
indices, but the enum values are defined as bit flags.

Starting from SMU_11_0_ODFEATURE_FAN_SPEED_MIN, the function would read past
the end of the 'cap' array.

I've been unable to find any uses of this enum except in the aforementioned
function, so fixing the enum seems to be the correct solution.
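
The shape of the bug, as a simplified sketch; this is not the actual
navi10_od_feature_is_supported() body, only an illustration of indexing a
32-entry cap array with a bit-flag value:

#define SMU_11_0_MAX_ODFEATURE 32      /* size of the cap array */

static bool od_feature_is_supported_sketch(const uint8_t cap[SMU_11_0_MAX_ODFEATURE],
                                           uint32_t feature)
{
        /* With the bit-flag enum, SMU_11_0_ODFEATURE_FAN_SPEED_MIN is
         * 1 << 5 == 32, so this reads cap[32], one past the end. */
        return cap[feature];
}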

Signed-off-by: Aleksandr Mezin 
---
 .../drm/amd/powerplay/inc/smu_v11_0_pptable.h | 28 +--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h 
b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h
index b2f96a101124..ba720630cc4b 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h
@@ -40,20 +40,20 @@
 #define SMU_11_0_PP_POWERSAVINGCLOCK_VERSION 0x0100
 
 enum SMU_11_0_ODFEATURE_ID {
-SMU_11_0_ODFEATURE_GFXCLK_LIMITS= 1 << 0, //GFXCLK Limit 
feature
-SMU_11_0_ODFEATURE_GFXCLK_CURVE = 1 << 1, //GFXCLK Curve 
feature
-SMU_11_0_ODFEATURE_UCLK_MAX = 1 << 2, //UCLK Limit 
feature
-SMU_11_0_ODFEATURE_POWER_LIMIT  = 1 << 3, //Power Limit 
feature
-SMU_11_0_ODFEATURE_FAN_ACOUSTIC_LIMIT   = 1 << 4, //Fan Acoustic 
RPM feature
-SMU_11_0_ODFEATURE_FAN_SPEED_MIN= 1 << 5, //Minimum Fan 
Speed feature
-SMU_11_0_ODFEATURE_TEMPERATURE_FAN  = 1 << 6, //Fan Target 
Temperature Limit feature
-SMU_11_0_ODFEATURE_TEMPERATURE_SYSTEM   = 1 << 7, //Operating 
Temperature Limit feature
-SMU_11_0_ODFEATURE_MEMORY_TIMING_TUNE   = 1 << 8, //AC Timing 
Tuning feature
-SMU_11_0_ODFEATURE_FAN_ZERO_RPM_CONTROL = 1 << 9, //Zero RPM 
feature
-SMU_11_0_ODFEATURE_AUTO_UV_ENGINE   = 1 << 10,//Auto Under 
Volt GFXCLK feature
-SMU_11_0_ODFEATURE_AUTO_OC_ENGINE   = 1 << 11,//Auto Over 
Clock GFXCLK feature
-SMU_11_0_ODFEATURE_AUTO_OC_MEMORY   = 1 << 12,//Auto Over 
Clock MCLK feature
-SMU_11_0_ODFEATURE_FAN_CURVE= 1 << 13,//VICTOR TODO
+SMU_11_0_ODFEATURE_GFXCLK_LIMITS= 0, //GFXCLK Limit feature
+SMU_11_0_ODFEATURE_GFXCLK_CURVE = 1, //GFXCLK Curve feature
+SMU_11_0_ODFEATURE_UCLK_MAX = 2, //UCLK Limit feature
+SMU_11_0_ODFEATURE_POWER_LIMIT  = 3, //Power Limit feature
+SMU_11_0_ODFEATURE_FAN_ACOUSTIC_LIMIT   = 4, //Fan Acoustic RPM 
feature
+SMU_11_0_ODFEATURE_FAN_SPEED_MIN= 5, //Minimum Fan Speed 
feature
+SMU_11_0_ODFEATURE_TEMPERATURE_FAN  = 6, //Fan Target 
Temperature Limit feature
+SMU_11_0_ODFEATURE_TEMPERATURE_SYSTEM   = 7, //Operating 
Temperature Limit feature
+SMU_11_0_ODFEATURE_MEMORY_TIMING_TUNE   = 8, //AC Timing Tuning 
feature
+SMU_11_0_ODFEATURE_FAN_ZERO_RPM_CONTROL = 9, //Zero RPM feature
+SMU_11_0_ODFEATURE_AUTO_UV_ENGINE   = 10,//Auto Under Volt 
GFXCLK feature
+SMU_11_0_ODFEATURE_AUTO_OC_ENGINE   = 11,//Auto Over Clock 
GFXCLK feature
+SMU_11_0_ODFEATURE_AUTO_OC_MEMORY   = 12,//Auto Over Clock 
MCLK feature
+SMU_11_0_ODFEATURE_FAN_CURVE= 13,//VICTOR TODO
 SMU_11_0_ODFEATURE_COUNT= 14,
 };
 #define SMU_11_0_MAX_ODFEATURE 32  //Maximum Number of OD Features
-- 
2.25.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2020-02-06 Thread Christian König

On 06.02.20 at 12:10, Lucas Stach wrote:

Hi all,

On Mi, 2020-02-05 at 19:24 +0100, Lucas Stach wrote:

Hi Andrey,

This commit breaks all drivers which may bail out of the timeout
processing as they wish to extend the timeout (etnaviv, v3d).

Those drivers currently just return from the timeout handler before
calling drm_sched_stop(), which means with this commit applied we are
removing the first job from the ring_mirror_list, but never put it
back. This leads to jobs getting lost from the ring mirror, which then
causes quite a bit of fallout like unsignaled fences.
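
A minimal sketch of the pattern described above (hypothetical driver code in
the spirit of etnaviv/v3d, not a quote of either driver): a timedout_job
callback that returns early to extend the timeout never reaches
drm_sched_stop(), so the job that drm_sched_job_timedout() pulled off the
ring_mirror_list is never reinserted.

static void example_timedout_job(struct drm_sched_job *sched_job)
{
        struct drm_gpu_scheduler *sched = sched_job->sched;

        /* hw_is_making_progress() stands in for the driver's own check.
         * Bailing out here extends the timeout, but the bad job is already
         * off the mirror list and nothing puts it back. */
        if (hw_is_making_progress(sched_job))
                return;

        drm_sched_stop(sched, sched_job);
        /* ... reset the hardware, resubmit jobs and restart the scheduler ... */
}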

Not sure yet what to do about it, we can either add a function to add
the job back to the ring_mirror if the driver wants to extend the
timeout, or we could look for another way to stop
drm_sched_cleanup_jobs from freeing jobs that are currently in timeout
processing.

So after thinking about this a bit more my opinion is that we need to
revert this change for now and go back to the drawing board for the
scheduler timeout handling.

Right now this starts to feel like a big midlayer mistake with all the
very intricate intertwining between the drivers and the scheduler. The
rules on when it's safe to manipulate the ring mirror and when
completed jobs are signaled and freed are not really well specified.
The fact that we need to mutate state in order to get rid of races
instead of having a single big "timeout processing is owner of the
scheduler state for now" is a big fat warning sign IMHO.


Yes, that strongly feels like a hack to me as well. But I didn't have
time, and still haven't, to take a closer look and suggest something better.


Christian.



It took me far longer than I'd like to admit to understand the failure
mode with fences not getting signaled after a GPU hang. The back and
forth between scheduler and driver code makes things really hard to
follow.

Regards,
Lucas


Regards,
Lucas

On Mo, 2019-11-25 at 15:51 -0500, Andrey Grodzovsky wrote:

Problem:
Due to a race between drm_sched_cleanup_jobs in the sched thread and
drm_sched_job_timedout in the timeout work, there is a possibility that
the bad job was already freed while still being accessed from the
timeout thread.

Fix:
Instead of just peeking at the bad job in the mirror list,
remove it from the list under the lock and then put it back later, when
we are guaranteed that no race with the main sched thread is possible,
which is after the thread is parked.

v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.

v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
drm_sched_get_cleanup_job already has a lock there.

v4: Fix comments to reflect the latest code in drm-misc.

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Christian König 
Tested-by: Emily Deng 
---
  drivers/gpu/drm/scheduler/sched_main.c | 27 +++
  1 file changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 6774955..1bf9c40 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
unsigned long flags;
  
  	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);

+
+   /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+   spin_lock_irqsave(&sched->job_list_lock, flags);
job = list_first_entry_or_null(&sched->ring_mirror_list,
   struct drm_sched_job, node);
  
  	if (job) {

+   /*
+* Remove the bad job so it cannot be freed by concurrent
+* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
+* is parked at which point it's safe.
+*/
+   list_del_init(&job->node);
+   spin_unlock_irqrestore(&sched->job_list_lock, flags);
+
job->sched->ops->timedout_job(job);
  
  		/*

@@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
job->sched->ops->free_job(job);
sched->free_guilty = false;
}
+   } else {
+   spin_unlock_irqrestore(&sched->job_list_lock, flags);
}
  
spin_lock_irqsave(&sched->job_list_lock, flags);

@@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, 
struct drm_sched_job *bad)
kthread_park(sched->thread);
  
  	/*

+* Reinsert back the bad job here - now it's safe as
+* drm_sched_get_cleanup_job cannot race against us and release the
+* bad job at this point - we parked (waited for) any in progress
+* (earlier) cleanups and drm_sched_get_cleanup_job will not be called
+* now until the scheduler thread is unparked.
+*/
+   if (bad && bad->sched == sched)
+   /*
+* Add at the head of the queue to reflect it was the earliest
+* job extracted.
+*/
+ 

Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2020-02-06 Thread Lucas Stach
Hi all,

On Mi, 2020-02-05 at 19:24 +0100, Lucas Stach wrote:
> Hi Andrey,
> 
> This commit breaks all drivers which may bail out of the timeout
> processing as they wish to extend the timeout (etnaviv, v3d).
> 
> Those drivers currently just return from the timeout handler before
> calling drm_sched_stop(), which means with this commit applied we are
> removing the first job from the ring_mirror_list, but never put it
> back. This leads to jobs getting lost from the ring mirror, which then
> causes quite a bit of fallout like unsignaled fences.
> 
> Not sure yet what to do about it, we can either add a function to add
> the job back to the ring_mirror if the driver wants to extend the
> timeout, or we could look for another way to stop
> drm_sched_cleanup_jobs from freeing jobs that are currently in timeout
> processing.

So after thinking about this a bit more my opinion is that we need to
revert this change for now and go back to the drawing board for the
scheduler timeout handling.

Right now this starts to feel like a big midlayer mistake with all the
very intricate intertwining between the drivers and the scheduler. The
rules on when it's safe to manipulate the ring mirror and when
completed jobs are signaled and freed are not really well specified.
The fact that we need to mutate state in order to get rid of races
instead of having a single big "timeout processing is owner of the
scheduler state for now" is a big fat warning sign IMHO.

It took me far longer than I'd like to admit to understand the failure
mode with fences not getting signaled after a GPU hang. The back and
forth between scheduler and driver code makes things really hard to
follow.

Regards,
Lucas

> Regards,
> Lucas
> 
> On Mo, 2019-11-25 at 15:51 -0500, Andrey Grodzovsky wrote:
> > Problem:
> > Due to a race between drm_sched_cleanup_jobs in the sched thread and
> > drm_sched_job_timedout in the timeout work, there is a possibility that
> > the bad job was already freed while still being accessed from the
> > timeout thread.
> > 
> > Fix:
> > Instead of just peeking at the bad job in the mirror list,
> > remove it from the list under the lock and then put it back later, when
> > we are guaranteed that no race with the main sched thread is possible,
> > which is after the thread is parked.
> > 
> > v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
> > 
> > v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
> > drm_sched_get_cleanup_job already has a lock there.
> > 
> > v4: Fix comments to reflect the latest code in drm-misc.
> > 
> > Signed-off-by: Andrey Grodzovsky 
> > Reviewed-by: Christian König 
> > Tested-by: Emily Deng 
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 27 +++
> >  1 file changed, 27 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index 6774955..1bf9c40 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct 
> > *work)
> > unsigned long flags;
> >  
> > sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
> > +
> > +   /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
> > +   spin_lock_irqsave(&sched->job_list_lock, flags);
> > job = list_first_entry_or_null(&sched->ring_mirror_list,
> >struct drm_sched_job, node);
> >  
> > if (job) {
> > +   /*
> > +* Remove the bad job so it cannot be freed by concurrent
> > +* drm_sched_cleanup_jobs. It will be reinserted back after 
> > sched->thread
> > +* is parked at which point it's safe.
> > +*/
> > +   list_del_init(&job->node);
> > +   spin_unlock_irqrestore(&sched->job_list_lock, flags);
> > +
> > job->sched->ops->timedout_job(job);
> >  
> > /*
> > @@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct 
> > *work)
> > job->sched->ops->free_job(job);
> > sched->free_guilty = false;
> > }
> > +   } else {
> > +   spin_unlock_irqrestore(&sched->job_list_lock, flags);
> > }
> >  
> > spin_lock_irqsave(&sched->job_list_lock, flags);
> > @@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, 
> > struct drm_sched_job *bad)
> > kthread_park(sched->thread);
> >  
> > /*
> > +* Reinsert back the bad job here - now it's safe as
> > +* drm_sched_get_cleanup_job cannot race against us and release the
> > +* bad job at this point - we parked (waited for) any in progress
> > +* (earlier) cleanups and drm_sched_get_cleanup_job will not be called
> > +* now until the scheduler thread is unparked.
> > +*/
> > +   if (bad && bad->sched == sched)
> > +   /*
> > +* Add at the head of the queue to reflect it was the earliest
> > +* job 

RE: [PATCH] drm/amd/powerplay: handle features disablement for baco reset in SMU FW

2020-02-06 Thread Zhang, Hawking
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Quan, Evan  
Sent: Thursday, February 6, 2020 16:19
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Quan, Evan 
Subject: [PATCH] drm/amd/powerplay: handle features disablement for baco reset 
in SMU FW

SMU FW will handle the features disablement for baco reset on Arcturus.

Change-Id: Ifd87a09de2fca0c67c041afbd5e711973769119a
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 53 +-
 1 file changed, 42 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c 
b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 9d1075823681..efd10e6fa9ef 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -1467,24 +1467,39 @@ void smu_late_fini(void *handle)
smu_send_smc_msg(smu, SMU_MSG_PrepareMp1ForShutdown);
 }
 
-static int smu_suspend(void *handle)
+static int smu_disabled_dpms(struct smu_context *smu)
 {
-   int ret;
-   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-   struct smu_context *smu = &adev->smu;
-   bool baco_feature_is_enabled = false;
+   struct amdgpu_device *adev = smu->adev;
+   uint32_t smu_version;
+   int ret = 0;
 
-   if (!smu->pm_enabled)
-   return 0;
+   ret = smu_get_smc_version(smu, NULL, &smu_version);
+   if (ret) {
+   pr_err("Failed to get smu version.\n");
+   return ret;
+   }
 
-   if(!smu->is_apu)
-   baco_feature_is_enabled = smu_feature_is_enabled(smu, 
SMU_FEATURE_BACO_BIT);
+   /*
+* For baco reset on Arcturus, this operation
+* (disable all smu feature) will be handled by SMU FW.
+*/
+   if (adev->in_gpu_reset &&
+   (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) &&
+   (adev->asic_type == CHIP_ARCTURUS && smu_version > 0x360e00))
+   return 0;
 
+   /* Disable all enabled SMU features */
ret = smu_system_features_control(smu, false);
-   if (ret)
+   if (ret) {
+   pr_err("Failed to disable smu features.\n");
return ret;
+   }
 
-   if (baco_feature_is_enabled) {
+   /* For baco reset, need to leave BACO feature enabled */
+   if (adev->in_gpu_reset &&
+   (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) &&
+   !smu->is_apu &&
+   smu_feature_is_enabled(smu, SMU_FEATURE_BACO_BIT)) {
ret = smu_feature_set_enabled(smu, SMU_FEATURE_BACO_BIT, true);
if (ret) {
pr_warn("set BACO feature enabled failed, return %d\n", 
ret);
@@ -1492,6 +1507,22 @@ static int smu_suspend(void *handle)
}
}
 
+   return ret;
+}
+
+static int smu_suspend(void *handle)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+   struct smu_context *smu = &adev->smu;
+   int ret;
+
+   if (!smu->pm_enabled)
+   return 0;
+
+   ret = smu_disabled_dpms(smu);
+   if (ret)
+   return ret;
+
smu->watermarks_bitmap &= ~(WATERMARKS_LOADED);
 
if (adev->asic_type >= CHIP_NAVI10 &&
--
2.25.0
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd/powerplay: handle features disablement for baco reset in SMU FW

2020-02-06 Thread Evan Quan
SMU FW will handle the features disablement for baco reset
on Arcturus.

Change-Id: Ifd87a09de2fca0c67c041afbd5e711973769119a
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 53 +-
 1 file changed, 42 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c 
b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 9d1075823681..efd10e6fa9ef 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -1467,24 +1467,39 @@ void smu_late_fini(void *handle)
smu_send_smc_msg(smu, SMU_MSG_PrepareMp1ForShutdown);
 }
 
-static int smu_suspend(void *handle)
+static int smu_disabled_dpms(struct smu_context *smu)
 {
-   int ret;
-   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-   struct smu_context *smu = &adev->smu;
-   bool baco_feature_is_enabled = false;
+   struct amdgpu_device *adev = smu->adev;
+   uint32_t smu_version;
+   int ret = 0;
 
-   if (!smu->pm_enabled)
-   return 0;
+   ret = smu_get_smc_version(smu, NULL, &smu_version);
+   if (ret) {
+   pr_err("Failed to get smu version.\n");
+   return ret;
+   }
 
-   if(!smu->is_apu)
-   baco_feature_is_enabled = smu_feature_is_enabled(smu, 
SMU_FEATURE_BACO_BIT);
+   /*
+* For baco reset on Arcturus, this operation
+* (disable all smu feature) will be handled by SMU FW.
+*/
+   if (adev->in_gpu_reset &&
+   (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) &&
+   (adev->asic_type == CHIP_ARCTURUS && smu_version > 0x360e00))
+   return 0;
 
+   /* Disable all enabled SMU features */
ret = smu_system_features_control(smu, false);
-   if (ret)
+   if (ret) {
+   pr_err("Failed to disable smu features.\n");
return ret;
+   }
 
-   if (baco_feature_is_enabled) {
+   /* For baco reset, need to leave BACO feature enabled */
+   if (adev->in_gpu_reset &&
+   (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) &&
+   !smu->is_apu &&
+   smu_feature_is_enabled(smu, SMU_FEATURE_BACO_BIT)) {
ret = smu_feature_set_enabled(smu, SMU_FEATURE_BACO_BIT, true);
if (ret) {
pr_warn("set BACO feature enabled failed, return %d\n", 
ret);
@@ -1492,6 +1507,22 @@ static int smu_suspend(void *handle)
}
}
 
+   return ret;
+}
+
+static int smu_suspend(void *handle)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+   struct smu_context *smu = &adev->smu;
+   int ret;
+
+   if (!smu->pm_enabled)
+   return 0;
+
+   ret = smu_disabled_dpms(smu);
+   if (ret)
+   return ret;
+
smu->watermarks_bitmap &= ~(WATERMARKS_LOADED);
 
if (adev->asic_type >= CHIP_NAVI10 &&
-- 
2.25.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx