RE: [PATCH] drm/amdgpu: cancel late_init_work before gpu reset

2019-05-16 Thread Xu, Feifei
Reviewed-by: Feifei Xu 

-Original Message-
From: amd-gfx  On Behalf Of Pan, Xinhui
Sent: Friday, May 17, 2019 11:04 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, JinHuiEric 

Subject: [PATCH] drm/amdgpu: cancel late_init_work before gpu reset

[CAUTION: External Email]

gpu reset will run late_init and schedule the late_init_work.  if we keep 
triggering gpu reset in a short time, there are potenial races.

Signed-off-by: xinhui pan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ecb99aeab6cf..241cd2c75433 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3636,6 +3636,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,

dev_info(adev->dev, "GPU reset begin!\n");

+   cancel_delayed_work_sync(>late_init_work);
+
hive = amdgpu_get_xgmi_hive(adev, false);

/*
--
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: keep stolen memory on picasso

2019-05-16 Thread Xu, Feifei
Reviewed-by: Feifei Xu 

-Original Message-
From: amd-gfx  On Behalf Of Cui, Flora
Sent: Friday, May 17, 2019 11:48 AM
To: amd-gfx@lists.freedesktop.org
Cc: Cui, Flora 
Subject: [PATCH] drm/amdgpu: keep stolen memory on picasso

[CAUTION: External Email]

otherwise screen corrupts during modprobe.

Change-Id: I73bcf3ab0c666077dfe85436a3457a0379382304
Signed-off-by: Flora Cui 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index c221570..5afbb59 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -626,6 +626,7 @@ static bool gmc_v9_0_keep_stolen_memory(struct 
amdgpu_device *adev)
case CHIP_VEGA10:
return true;
case CHIP_RAVEN:
+   return (adev->pdev->device == 0x15d8);
case CHIP_VEGA12:
case CHIP_VEGA20:
default:
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amd/powerplay: fix sw SMU wrong UVD/VCE powergate setting

2019-05-16 Thread Xu, Feifei
Reviewed-by: Feifei Xu 

-Original Message-
From: amd-gfx  On Behalf Of Evan Quan
Sent: Friday, May 17, 2019 1:45 PM
To: amd-gfx@lists.freedesktop.org
Cc: Quan, Evan 
Subject: [PATCH] drm/amd/powerplay: fix sw SMU wrong UVD/VCE powergate setting

[CAUTION: External Email]

The UVD/VCE bits are set wrongly. This causes the UVD/VCE clocks are not 
brought back correctly on needed.

Change-Id: I6eda67ea3be45fd5f422cdb78356915bf06ff41e
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c 
b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
index 0eea93c8dff7..a3a7afca7516 100644
--- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
+++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
@@ -1814,24 +1814,24 @@ static int smu_v11_0_update_od8_settings(struct 
smu_context *smu,

 static int smu_v11_0_dpm_set_uvd_enable(struct smu_context *smu, bool enable)  
{
-   if (!smu_feature_is_supported(smu, FEATURE_DPM_VCE_BIT))
+   if (!smu_feature_is_supported(smu, FEATURE_DPM_UVD_BIT))
return 0;

-   if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_VCE_BIT))
+   if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_UVD_BIT))
return 0;

-   return smu_feature_set_enabled(smu, FEATURE_DPM_VCE_BIT, enable);
+   return smu_feature_set_enabled(smu, FEATURE_DPM_UVD_BIT, 
+ enable);
 }

 static int smu_v11_0_dpm_set_vce_enable(struct smu_context *smu, bool enable)  
{
-   if (!smu_feature_is_supported(smu, FEATURE_DPM_UVD_BIT))
+   if (!smu_feature_is_supported(smu, FEATURE_DPM_VCE_BIT))
return 0;

-   if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_UVD_BIT))
+   if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_VCE_BIT))
return 0;

-   return smu_feature_set_enabled(smu, FEATURE_DPM_UVD_BIT, enable);
+   return smu_feature_set_enabled(smu, FEATURE_DPM_VCE_BIT, 
+ enable);
 }

 static int smu_v11_0_get_current_rpm(struct smu_context *smu,
--
2.21.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amd/powerplay: fix sw SMU wrong UVD/VCE powergate setting

2019-05-16 Thread Evan Quan
The UVD/VCE bits are set wrongly. This causes the UVD/VCE clocks
are not brought back correctly on needed.

Change-Id: I6eda67ea3be45fd5f422cdb78356915bf06ff41e
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c 
b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
index 0eea93c8dff7..a3a7afca7516 100644
--- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
+++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
@@ -1814,24 +1814,24 @@ static int smu_v11_0_update_od8_settings(struct 
smu_context *smu,
 
 static int smu_v11_0_dpm_set_uvd_enable(struct smu_context *smu, bool enable)
 {
-   if (!smu_feature_is_supported(smu, FEATURE_DPM_VCE_BIT))
+   if (!smu_feature_is_supported(smu, FEATURE_DPM_UVD_BIT))
return 0;
 
-   if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_VCE_BIT))
+   if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_UVD_BIT))
return 0;
 
-   return smu_feature_set_enabled(smu, FEATURE_DPM_VCE_BIT, enable);
+   return smu_feature_set_enabled(smu, FEATURE_DPM_UVD_BIT, enable);
 }
 
 static int smu_v11_0_dpm_set_vce_enable(struct smu_context *smu, bool enable)
 {
-   if (!smu_feature_is_supported(smu, FEATURE_DPM_UVD_BIT))
+   if (!smu_feature_is_supported(smu, FEATURE_DPM_VCE_BIT))
return 0;
 
-   if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_UVD_BIT))
+   if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_VCE_BIT))
return 0;
 
-   return smu_feature_set_enabled(smu, FEATURE_DPM_UVD_BIT, enable);
+   return smu_feature_set_enabled(smu, FEATURE_DPM_VCE_BIT, enable);
 }
 
 static int smu_v11_0_get_current_rpm(struct smu_context *smu,
-- 
2.21.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: keep stolen memory on picasso

2019-05-16 Thread Cui, Flora
otherwise screen corrupts during modprobe.

Change-Id: I73bcf3ab0c666077dfe85436a3457a0379382304
Signed-off-by: Flora Cui 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index c221570..5afbb59 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -626,6 +626,7 @@ static bool gmc_v9_0_keep_stolen_memory(struct 
amdgpu_device *adev)
case CHIP_VEGA10:
return true;
case CHIP_RAVEN:
+   return (adev->pdev->device == 0x15d8);
case CHIP_VEGA12:
case CHIP_VEGA20:
default:
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: cancel late_init_work before gpu reset

2019-05-16 Thread Pan, Xinhui

gpu reset will run late_init and schedule the late_init_work.  if we
keep triggering gpu reset in a short time, there are potenial races.

Signed-off-by: xinhui pan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ecb99aeab6cf..241cd2c75433 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3636,6 +3636,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
dev_info(adev->dev, "GPU reset begin!\n");
 
+   cancel_delayed_work_sync(>late_init_work);
+
hive = amdgpu_get_xgmi_hive(adev, false);
 
/*
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

2019-05-16 Thread Tao, Yintian
Hi Christian


I have modified it according to your suggestion. Can you help review this 
again? Thanks in advance.


Best Regards
Yintian Tao

-Original Message-
From: Yintian Tao  
Sent: Thursday, May 16, 2019 7:54 PM
To: amd-gfx@lists.freedesktop.org
Cc: Tao, Yintian ; Liu, Monk 
Subject: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

PSP fw primary buffer is not used under SRIOV.
Therefore, we don't need to allocate memory for it.

v2: remove superfluous check for amdgpu_bo_free_kernel().

Signed-off-by: Yintian Tao 
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index c567a55..af9835c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -905,13 +905,16 @@ static int psp_load_fw(struct amdgpu_device *adev)
if (!psp->cmd)
return -ENOMEM;
 
-   ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
-   AMDGPU_GEM_DOMAIN_GTT,
-   >fw_pri_bo,
-   >fw_pri_mc_addr,
-   >fw_pri_buf);
-   if (ret)
-   goto failed;
+   /* this fw pri bo is not used under SRIOV */
+   if (!amdgpu_sriov_vf(psp->adev)) {
+   ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
+ AMDGPU_GEM_DOMAIN_GTT,
+ >fw_pri_bo,
+ >fw_pri_mc_addr,
+ >fw_pri_buf);
+   if (ret)
+   goto failed;
+   }
 
ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE,
AMDGPU_GEM_DOMAIN_VRAM,
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: don't read DF register for SRIOV

2019-05-16 Thread Tao, Yintian
Ping...

Hi Christian and Alex


Can you help review this? Thanks in advance.


Best Regards
Yintian Tao

-Original Message-
From: amd-gfx  On Behalf Of Yintian Tao
Sent: Thursday, May 16, 2019 8:11 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk ; Tao, Yintian 
Subject: [PATCH] drm/amdgpu: don't read DF register for SRIOV

[CAUTION: External Email]

Under SRIOV, reading DF register has chance to lead to AER error in host side, 
just skip reading it.

Signed-off-by: Monk Liu 
Signed-off-by: Yintian Tao 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index a417763..b5bf9ed 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -837,7 +837,7 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev)

if (amdgpu_emu_mode != 1)
adev->gmc.vram_width = amdgpu_atomfirmware_get_vram_width(adev);
-   if (!adev->gmc.vram_width) {
+   if (!adev->gmc.vram_width && !amdgpu_sriov_vf(adev)) {
/* hbm memory channel size */
if (adev->flags & AMD_IS_APU)
chansize = 64;
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: set correct vram_width for vega10 under sriov

2019-05-16 Thread Tao, Yintian
Ping...

Hi Christian and Alex


Can you help review this? Thanks in advance.


Best Regards
Yintian Tao

-Original Message-
From: Yintian Tao  
Sent: Thursday, May 16, 2019 8:03 PM
To: amd-gfx@lists.freedesktop.org
Cc: Tao, Yintian ; Huang, Trigger 
Subject: [PATCH] drm/amdgpu: set correct vram_width for vega10 under sriov

For Vega10 SR-IOV, vram_width can't be read from ATOM as RAVEN, and DF related 
registers is not readable, seems hardcord is the only way to set the correct 
vram_width

Signed-off-by: Trigger Huang 
Signed-off-by: Yintian Tao 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index c221570..a417763 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -848,6 +848,13 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev)
adev->gmc.vram_width = numchan * chansize;
}
 
+   /* For Vega10 SR-IOV, vram_width can't be read from ATOM as RAVEN,
+* and DF related registers is not readable, seems hardcord is the
+* only way to set the correct vram_width */
+   if (amdgpu_sriov_vf(adev) && (adev->asic_type == CHIP_VEGA10)) {
+   adev->gmc.vram_width = 2048;
+   }
+
/* size in MB on si */
adev->gmc.mc_vram_size =
adev->nbio_funcs->get_memsize(adev) * 1024ULL * 1024ULL;
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 2/7] drm/dp_mst: Register AUX devices for MST ports

2019-05-16 Thread Lyude Paul
So a couple of things:

On Thu, 2019-05-16 at 11:17 -0400, sunpeng...@amd.com wrote:
> From: Ville Syrjälä 
> 
> All available downstream ports - physical and logical - are exposed for
> each MST device. They are listed in /dev/, following the same naming
> scheme as SST devices by appending an incremental ID.
> 
> Although all downstream ports are exposed, only some will work as
> expected. Consider the following topology:
> 
>+-+
>|  ASIC   |
>+-+
>   Conn-0|
> |
>+v+
>   +| MST HUB |+
>   |+-+|
>   |   |
>   |Port-1   Port-2|
> +-v-+   +-v-+
> |  MST  |   |  SST  |
> |  Display  |   |  Display  |
> +---+   +---+
>   |Port-1
>   x
> 
>  MST Path  | MST Device
>  --+--
>  sst:0 | MST Hub
>  mst:0-1   | MST Display
>  mst:0-1-1 | MST Display's disconnected DP out
>  mst:0-1-8 | MST Display's internal sink
>  mst:0-2   | SST Display
> 
> On certain MST displays, the upstream physical port will ACK DPCD reads.
> However, reads on the local logical port to the internal sink will
> *NAK*. i.e. reading mst:0-1 ACKs, but mst:0-1-8 NAKs.
> 
> There may also be duplicates. Some displays will return the same GUID
> when reading DPCD from both mst:0-1 and mst:0-1-8.
> 
> There are some device-dependent behavior as well. The MST hub used
> during testing will actually *ACK* read requests on a disconnected
> physical port, whereas the MST displays will NAK.
> 
> In light of these discrepancies, it's simpler to expose all downstream
> ports - both physical and logical - and let the user decide what to use.
> 
> Cc: Lyude Paul 
> Signed-off-by: Ville Syrjälä 
> Signed-off-by: Leo Li 
> ---
>  drivers/gpu/drm/drm_dp_aux_dev.c  |  14 -
>  drivers/gpu/drm/drm_dp_mst_topology.c | 103 +
> -
>  include/drm/drm_dp_helper.h   |   4 ++
>  include/drm/drm_dp_mst_helper.h   |   6 ++
>  4 files changed, 112 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_aux_dev.c
> b/drivers/gpu/drm/drm_dp_aux_dev.c
> index 6d84611..01c02b9 100644
> --- a/drivers/gpu/drm/drm_dp_aux_dev.c
> +++ b/drivers/gpu/drm/drm_dp_aux_dev.c
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -114,6 +115,7 @@ static ssize_t name_show(struct device *dev,
>  
>   return res;
>  }
> +
>  static DEVICE_ATTR_RO(name);
>  
>  static struct attribute *drm_dp_aux_attrs[] = {
> @@ -160,7 +162,11 @@ static ssize_t auxdev_read_iter(struct kiocb *iocb,
> struct iov_iter *to)
>   break;
>   }
>  
> - res = drm_dp_dpcd_read(aux_dev->aux, pos, buf, todo);
> + if (aux_dev->aux->is_remote)
> + res = drm_dp_mst_dpcd_read(aux_dev->aux, pos, buf,
> todo);
> + else
> + res = drm_dp_dpcd_read(aux_dev->aux, pos, buf, todo);
> +

It's still not at all clear to me why we're trying to avoid specifying actual
callbacks for the aux device. We should remove this part entirely, remove the
is_remote entry from struct drm_dp_aux, and then just specify our own hook in
struct drm_dp_aux->transfer().

>   if (res <= 0)
>   break;
>  
> @@ -207,7 +213,11 @@ static ssize_t auxdev_write_iter(struct kiocb *iocb,
> struct iov_iter *from)
>   break;
>   }
>  
> - res = drm_dp_dpcd_write(aux_dev->aux, pos, buf, todo);
> + if (aux_dev->aux->is_remote)
> + res = drm_dp_mst_dpcd_write(aux_dev->aux, pos, buf,
> todo);
> + else
> + res = drm_dp_dpcd_write(aux_dev->aux, pos, buf, todo);
> +
>   if (res <= 0)
>   break;
>  
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 2ab16c9..54da68e 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -35,6 +35,8 @@
>  #include 
>  #include 
>  
> +#include "drm_crtc_helper_internal.h"
> +

Unless I'm mistaken, this looks like some sort of ditritus.

>  /**
>   * DOC: dp mst helper
>   *
> @@ -52,6 +54,9 @@ static int drm_dp_dpcd_write_payload(struct
> drm_dp_mst_topology_mgr *mgr,
>int id,
>struct drm_dp_payload *payload);
>  
> +static int drm_dp_send_dpcd_read(struct drm_dp_mst_topology_mgr *mgr,
> +  struct drm_dp_mst_port *port,
> +  int offset, int size, u8 *bytes);
>  static int drm_dp_send_dpcd_write(struct drm_dp_mst_topology_mgr *mgr,
> struct drm_dp_mst_port *port,
>

Re: [PATCH 7/7] drm/nouveau: Use connector kdev as aux device parent

2019-05-16 Thread Lyude Paul
Reviewed-by: Lyude Paul 

On Thu, 2019-05-16 at 11:18 -0400, sunpeng...@amd.com wrote:
> From: Leo Li 
> 
> Set the connector's kernel device as the parent for the aux kernel
> device. This allows udev rules to access connector attributes when
> creating symlinks to aux devices.
> 
> Cc: Ben Skeggs 
> Cc: Lyude Paul 
> Signed-off-by: Leo Li 
> ---
>  drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c
> b/drivers/gpu/drm/nouveau/nouveau_connector.c
> index 3f463c9..738782a 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_connector.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
> @@ -1345,7 +1345,7 @@ nouveau_connector_create(struct drm_device *dev,
>   break;
>   case DRM_MODE_CONNECTOR_DisplayPort:
>   case DRM_MODE_CONNECTOR_eDP:
> - nv_connector->aux.dev = dev->dev;
> + nv_connector->aux.dev = connector->kdev;
>   nv_connector->aux.transfer = nouveau_connector_aux_xfer;
>   snprintf(aux_name, sizeof(aux_name), "sor-%04x-%04x",
>dcbe->hasht, dcbe->hashm);
-- 
Cheers,
Lyude Paul

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 3/7] drm/dp-mst: Use connector kdev as aux device parent

2019-05-16 Thread Lyude Paul
aghsorry, but I need to take back my Reviewed-by. Noticed an issue when 
reloading drm and i915. I'll explain it when I respond to patch 2/7 in a
moment

On Thu, 2019-05-16 at 11:17 -0400, sunpeng...@amd.com wrote:
> From: Leo Li 
> 
> Placing the MST aux device as a child of the connector gives udev the
> ability to access the connector device's attributes. This will come in
> handy when writing udev rules to create more descriptive symlinks to the
> MST aux devices.
> 
> Cc: Ville Syrjälä 
> Cc: Lyude Paul 
> Signed-off-by: Leo Li 
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 54da68e..cd2f3c4 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1269,6 +1269,9 @@ static void drm_dp_add_port(struct drm_dp_mst_branch
> *mstb,
>   }
>   (*mstb->mgr->cbs->register_connector)(port->connector);
>  
> + if (port->connector->registration_state ==
> DRM_CONNECTOR_REGISTERED)
> + port->aux.dev = port->connector->kdev;
> +
>   drm_dp_aux_register_devnode(>aux);
>   }
>  
-- 
Cheers,
Lyude Paul

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: cast to unsigned int for 32-bit portability

2019-05-16 Thread Abramov, Slava
Without casting, 64-bit division is used implicitly causing DEPMOD
failure when building 32-bit kernel.

Signed-off-by: Slava Abramov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index da1dc40b9b14..0499620ec972 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -764,8 +764,9 @@ static ssize_t amdgpu_ras_sysfs_badpages_read(struct file 
*f,
  struct amdgpu_device *adev = con->adev;
  const unsigned int element_size =
  sizeof("0xabcdabcd : 0x12345678 : R\n") - 1;
- unsigned int start = (ppos + element_size - 1) / element_size;
- unsigned int end = (ppos + count - 1) / element_size;
+ unsigned int start =
+ (unsigned int) (ppos + element_size - 1) / element_size;
+ unsigned int end = (unsigned int) (ppos + count - 1) / element_size;
  ssize_t s = 0;
  struct ras_badpage *bps = NULL;
  unsigned int bps_count = 0;
--
2.17.1


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: GPU passthrough support for Stoney [Radeon R2/R3/R4/R5 Graphics]?

2019-05-16 Thread Alex Deucher
On Thu, May 16, 2019 at 4:07 PM Micah Morton  wrote:
>
> On Wed, May 15, 2019 at 7:19 PM Alex Deucher  wrote:
> >
> > On Wed, May 15, 2019 at 2:26 PM Micah Morton  wrote:
> > >
> > > Hi folks,
> > >
> > > I'm interested in running a VM on a system with an integrated Stoney
> > > [Radeon R2/R3/R4/R5 Graphics] card and passing through the graphics
> > > card to the VM using the IOMMU. I'm wondering whether this is feasible
> > > and supposed to be doable with the right setup (as opposed to passing
> > > a discrete GPU to the VM, which I think is definitely doable?).
> > >
> > > So far, I can do all the qemu/kvm/vfio/iommu stuff to run the VM and
> > > pass the integrated GPU to it, but the drm driver in the VM fails
> > > during amdgpu_device_init(). Specifically, the logs show the SMU being
> > > unresponsive, which leads to a 'SMU firmware load failed' error
> > > message and kernel panic. I can share VM logs and the invocation of
> > > qemu and such if helpful, but first wanted to know at a high level if
> > > this should be feasible?
> > >
> > > P.S.: I'm not initializing the GPU in the host bios or host kernel at
> > > all, so I should be passing a fresh GPU to the VM. Also, I'm pretty
> > > sure I'm running the correct VGA bios for this GPU in the guest VM
> > > bios before guest boot.
> > >
> > > Any comments/suggestions would be appreciated!
> >
> > It should work in at least once as long as your vm is properly set up.
>
> Is there any reason running coreboot vs UEFI at host boot would make a
> difference? I was running a modified version of coreboot that avoids
> doing any GPU initialization in firmware -- so the first POST happens
> inside the guest.

The GPU on APUs shares a bunch of resources with the CPU.  There are a
bunch of blocks which are shared and need to be initialized on both
for everything to work properly.

>
> > Note that the driver needs access to the vbios image in the guest to
> > get device specific configuration details (clocks, display connector
> > configuration, etc.).
>
> Is there anything I need to do to ensure this besides passing '-device
> vfio-pci,...,romfile=/path/to/vgarom' to qemu?

You need the actual vbios rom image from your system.  The image is
board specific.

Alex
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 3/7] drm/dp-mst: Use connector kdev as aux device parent

2019-05-16 Thread Lyude Paul
On Thu, 2019-05-16 at 11:17 -0400, sunpeng...@amd.com wrote:
> From: Leo Li 
> 
> Placing the MST aux device as a child of the connector gives udev the
> ability to access the connector device's attributes. This will come in
> handy when writing udev rules to create more descriptive symlinks to the
> MST aux devices.
> 
> Cc: Ville Syrjälä 
> Cc: Lyude Paul 
> Signed-off-by: Leo Li 
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 54da68e..cd2f3c4 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1269,6 +1269,9 @@ static void drm_dp_add_port(struct drm_dp_mst_branch
> *mstb,
>   }
>   (*mstb->mgr->cbs->register_connector)(port->connector);
>  
> + if (port->connector->registration_state ==
> DRM_CONNECTOR_REGISTERED)
> + port->aux.dev = port->connector->kdev;
> +

Line wrapping please! Probably worth running checkpatch on all of these
patches as well.

With that nitpick addressed:

Reviewed-by: Lyude Paul 

>   drm_dp_aux_register_devnode(>aux);
>   }
>  
-- 
Cheers,
Lyude Paul

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 0/7] Add DP MST AUX devices

2019-05-16 Thread Li, Sun peng (Leo)


On 2019-05-16 3:54 p.m., Lyude Paul wrote:
> [CAUTION: External Email]
> 
> Hi, could we (and for future versions of this series and others) get a respin
> of this that's actually rebased against drm-tip? That is the defacto standard
> branch to do development on, and otherwise anyone trying to test these patches
> has to resolve merge conflicts (along with maintainers). The branch this
> appears to be based off of doesn't even have the new kref scheme for branch
> devices and ports.
> 

Sorry, this was laziness on my part :)
Rebasing this now.

Leo

> On Thu, 2019-05-16 at 11:17 -0400, sunpeng...@amd.com wrote:
>> From: Leo Li 
>>
>> This series adds support for MST AUX devices.
>>
>> A more descriptive 'mstpath' attribute is also added to MST connector
>> devices. In addition, the DP aux device is made to be a child of the
>> corresponding connector's device where possible (*). This allows udev
>> rules to provide descriptive symlinks to the AUX devices.
>>
>> (*) This can only be done on drivers that register their connector and
>> aux devices via drm_connector_register() and drm_dp_aux_register()
>> respectively. The connector also needs to be registered before the aux
>> device.
>>
>> Leo Li (6):
>>drm/dp: Use non-cyclic idr
>>drm/dp-mst: Use connector kdev as aux device parent
>>drm/sysfs: Add mstpath attribute to connector devices
>>drm/amd/display: Use connector kdev as aux device parent
>>drm/bridge/analogix-anx78xx: Use connector kdev as aux device parent
>>drm/nouveau: Use connector kdev as aux device parent
>>
>> Ville Syrjälä (1):
>>drm/dp_mst: Register AUX devices for MST ports
>>
>>   .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c|   2 +-
>>   drivers/gpu/drm/bridge/analogix-anx78xx.c  |  22 ++---
>>   drivers/gpu/drm/drm_dp_aux_dev.c   |  17 +++-
>>   drivers/gpu/drm/drm_dp_mst_topology.c  | 106
>> ++---
>>   drivers/gpu/drm/drm_sysfs.c|  23 +
>>   drivers/gpu/drm/nouveau/nouveau_connector.c|   2 +-
>>   include/drm/drm_dp_helper.h|   4 +
>>   include/drm/drm_dp_mst_helper.h|   6 ++
>>   8 files changed, 152 insertions(+), 30 deletions(-)
>>
> --
> Cheers,
>  Lyude Paul
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 0/7] Add DP MST AUX devices

2019-05-16 Thread Lyude Paul
Whoops-one more thing I forgot to mention. This is just personal preference
for me, but if you're ccing me on any of the patches in the series feel free
to just do it for all of them. Makes my inbox a little less confusing to look
at

On Thu, 2019-05-16 at 15:54 -0400, Lyude Paul wrote:
> Hi, could we (and for future versions of this series and others) get a
> respin
> of this that's actually rebased against drm-tip? That is the defacto
> standard
> branch to do development on, and otherwise anyone trying to test these
> patches
> has to resolve merge conflicts (along with maintainers). The branch this
> appears to be based off of doesn't even have the new kref scheme for branch
> devices and ports.
> 
> On Thu, 2019-05-16 at 11:17 -0400, sunpeng...@amd.com wrote:
> > From: Leo Li 
> > 
> > This series adds support for MST AUX devices. 
> > 
> > A more descriptive 'mstpath' attribute is also added to MST connector
> > devices. In addition, the DP aux device is made to be a child of the
> > corresponding connector's device where possible (*). This allows udev
> > rules to provide descriptive symlinks to the AUX devices.
> > 
> > (*) This can only be done on drivers that register their connector and
> > aux devices via drm_connector_register() and drm_dp_aux_register()
> > respectively. The connector also needs to be registered before the aux
> > device.
> > 
> > Leo Li (6):
> >   drm/dp: Use non-cyclic idr
> >   drm/dp-mst: Use connector kdev as aux device parent
> >   drm/sysfs: Add mstpath attribute to connector devices
> >   drm/amd/display: Use connector kdev as aux device parent
> >   drm/bridge/analogix-anx78xx: Use connector kdev as aux device parent
> >   drm/nouveau: Use connector kdev as aux device parent
> > 
> > Ville Syrjälä (1):
> >   drm/dp_mst: Register AUX devices for MST ports
> > 
> >  .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c|   2 +-
> >  drivers/gpu/drm/bridge/analogix-anx78xx.c  |  22 ++---
> >  drivers/gpu/drm/drm_dp_aux_dev.c   |  17 +++-
> >  drivers/gpu/drm/drm_dp_mst_topology.c  | 106
> > ++---
> >  drivers/gpu/drm/drm_sysfs.c|  23 +
> >  drivers/gpu/drm/nouveau/nouveau_connector.c|   2 +-
> >  include/drm/drm_dp_helper.h|   4 +
> >  include/drm/drm_dp_mst_helper.h|   6 ++
> >  8 files changed, 152 insertions(+), 30 deletions(-)
> > 
-- 
Cheers,
Lyude Paul

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: GPU passthrough support for Stoney [Radeon R2/R3/R4/R5 Graphics]?

2019-05-16 Thread Micah Morton
On Wed, May 15, 2019 at 7:19 PM Alex Deucher  wrote:
>
> On Wed, May 15, 2019 at 2:26 PM Micah Morton  wrote:
> >
> > Hi folks,
> >
> > I'm interested in running a VM on a system with an integrated Stoney
> > [Radeon R2/R3/R4/R5 Graphics] card and passing through the graphics
> > card to the VM using the IOMMU. I'm wondering whether this is feasible
> > and supposed to be doable with the right setup (as opposed to passing
> > a discrete GPU to the VM, which I think is definitely doable?).
> >
> > So far, I can do all the qemu/kvm/vfio/iommu stuff to run the VM and
> > pass the integrated GPU to it, but the drm driver in the VM fails
> > during amdgpu_device_init(). Specifically, the logs show the SMU being
> > unresponsive, which leads to a 'SMU firmware load failed' error
> > message and kernel panic. I can share VM logs and the invocation of
> > qemu and such if helpful, but first wanted to know at a high level if
> > this should be feasible?
> >
> > P.S.: I'm not initializing the GPU in the host bios or host kernel at
> > all, so I should be passing a fresh GPU to the VM. Also, I'm pretty
> > sure I'm running the correct VGA bios for this GPU in the guest VM
> > bios before guest boot.
> >
> > Any comments/suggestions would be appreciated!
>
> It should work in at least once as long as your vm is properly set up.

Is there any reason running coreboot vs UEFI at host boot would make a
difference? I was running a modified version of coreboot that avoids
doing any GPU initialization in firmware -- so the first POST happens
inside the guest.

> Note that the driver needs access to the vbios image in the guest to
> get device specific configuration details (clocks, display connector
> configuration, etc.).

Is there anything I need to do to ensure this besides passing '-device
vfio-pci,...,romfile=/path/to/vgarom' to qemu?

>
> Alex
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 0/7] Add DP MST AUX devices

2019-05-16 Thread Lyude Paul
Hi, could we (and for future versions of this series and others) get a respin
of this that's actually rebased against drm-tip? That is the defacto standard
branch to do development on, and otherwise anyone trying to test these patches
has to resolve merge conflicts (along with maintainers). The branch this
appears to be based off of doesn't even have the new kref scheme for branch
devices and ports.

On Thu, 2019-05-16 at 11:17 -0400, sunpeng...@amd.com wrote:
> From: Leo Li 
> 
> This series adds support for MST AUX devices. 
> 
> A more descriptive 'mstpath' attribute is also added to MST connector
> devices. In addition, the DP aux device is made to be a child of the
> corresponding connector's device where possible (*). This allows udev
> rules to provide descriptive symlinks to the AUX devices.
> 
> (*) This can only be done on drivers that register their connector and
> aux devices via drm_connector_register() and drm_dp_aux_register()
> respectively. The connector also needs to be registered before the aux
> device.
> 
> Leo Li (6):
>   drm/dp: Use non-cyclic idr
>   drm/dp-mst: Use connector kdev as aux device parent
>   drm/sysfs: Add mstpath attribute to connector devices
>   drm/amd/display: Use connector kdev as aux device parent
>   drm/bridge/analogix-anx78xx: Use connector kdev as aux device parent
>   drm/nouveau: Use connector kdev as aux device parent
> 
> Ville Syrjälä (1):
>   drm/dp_mst: Register AUX devices for MST ports
> 
>  .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c|   2 +-
>  drivers/gpu/drm/bridge/analogix-anx78xx.c  |  22 ++---
>  drivers/gpu/drm/drm_dp_aux_dev.c   |  17 +++-
>  drivers/gpu/drm/drm_dp_mst_topology.c  | 106
> ++---
>  drivers/gpu/drm/drm_sysfs.c|  23 +
>  drivers/gpu/drm/nouveau/nouveau_connector.c|   2 +-
>  include/drm/drm_dp_helper.h|   4 +
>  include/drm/drm_dp_mst_helper.h|   6 ++
>  8 files changed, 152 insertions(+), 30 deletions(-)
> 
-- 
Cheers,
Lyude Paul

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: Hard lockups with ROCM

2019-05-16 Thread Kuehling, Felix
Hi Daniel,

On 2019-05-12 9:44 p.m., Daniel Kasak wrote:
> [CAUTION: External Email]
> Hi all. I had version 2.2.0 of the ROCM stack running on a 5.0.x and 
> 5.1.0 kernel. Things were going great with various boinc GPU tasks. 
> But there is a setiathome GPU task which reliably gives me a hard 
> lockup within about 30 minutes of running. I actually had to do *two* 
> emergency re-installs over the past week.

Sorry to hear about your trouble. Do you have a second computer you can 
use to remote login into your system? Chances are that it's still 
responsive and only the screen is frozen.

Also, you could try booting in console mode (without an xserver). The 
console usually still works even when the GPU compute units or SDMA 
engines are hanging.

If you manage to do an emergency reboot with sysrq (remount-RO and 
reboot), you should see the kernel log of your previous session in 
/var/log. On Ubuntu it's in /var/log/kern.log. Not sure where it is on 
Gentoo. There is a good chance the log contains helpful information 
(e.g. if the driver detected a hang but failed to reset the GPU, or 
maybe a driver bug that leads to a deadlock or kernel panic).

> Perhaps part of this was my fault ( running btrfs with lzo compression 
> on my root partition ... ). But absolutely part of this was the hard 
> lockups. I've tested all kinds of other things ( eg rebuilding lots of 
> stuff under Gentoo ) ... I don't have a general stability issue even 
> under hours of high load. But after restarting boinc with that same 
> setiathome task ... !
>
> If someone wants me to sacrifice another installation, they can point 
> me to instructions for trying to gather more information.

If you want to risk another installation, it may be a good idea to do it 
on a spare hard drive, or a spare partition on your existing hard drive. 
Also, use a more conventional choice of file system. A simple ext4 is 
pretty robust in my experience. We get hard lockups all the time. I 
usually only reinstall my system for big OS upgrades or if I'm stupid 
and mess something up myself.

Which GPU are you using?

There are some things you could try to narrow down the cause of your 
problem.

 1. Monitor GPU temperature while running setiathome
 2. If you're building your own kernel, enable some helpful kernel debug
options that can provide very helpful diagnostic info: lock
debugging, memory debugging, lockup/hang debugging
 3. Try running with lower GPU clocks (rocm-smi --setperflevel low). If
that fixes it, you may have inadequate cooling or power supply
 4. Try running in console mode (without Xserver or other graphical UI
running). If that fixes it, there may be a bad interaction between
graphics and compute
 5. Try updating your firmware. The DKMS package included in our ROCm
releases includes the latest firmware. You should be able to extract
it from there and drop it into /lib/firmware/amdgpu
 6. Try to find a regression point. Is there any known version of ROCm
or the kernel where it worked correctly?

Regards,
   Felix


>
> Anyway ... perhaps more work around detecting and recovering from GPU 
> lockups is in order?
>
> Dan
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 5/7] drm/amd/display: Use connector kdev as aux device parent

2019-05-16 Thread Kazlauskas, Nicholas
On 5/16/19 11:18 AM, sunpeng...@amd.com wrote:
> 
> From: Leo Li 
> 
> Set the connector's kernel device as the parent for the aux kernel
> device. This allows udev rules to access connector attributes when
> creating symlinks to aux devices.
> 
> For example, the following udev rule:
> 
> SUBSYSTEM=="drm_dp_aux_dev", SUBSYSTEMS=="drm", ATTRS{edid}=="*",
>  SYMLINK+="drm_dp_aux/by-name/$id"
> 
> Will create the following symlinks using the connector's name:
> 
> $ ls /dev/drm_dp_aux/by-name/
> card0-DP-1  card0-DP-2  card0-DP-3
> 
> Cc: Ville Syrjälä 
> Cc: Lyude Paul 
> Cc: Nicholas Kazlauskas 
> Cc: Harry Wentland 
> Cc: Jerry (Fangzhi) Zuo 
> Signed-off-by: Leo Li 

Reviewed-by: Nicholas Kazlauskas 

No idea why it wasn't like this in the first place.

Nicholas Kazlauskas

> ---
>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> index a6f44a4..083fb97 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> @@ -385,7 +385,7 @@ void amdgpu_dm_initialize_dp_connector(struct 
> amdgpu_display_manager *dm,
> struct amdgpu_dm_connector 
> *aconnector)
>   {
>  aconnector->dm_dp_aux.aux.name = "dmdc";
> -   aconnector->dm_dp_aux.aux.dev = dm->adev->dev;
> +   aconnector->dm_dp_aux.aux.dev = aconnector->base.kdev;
>  aconnector->dm_dp_aux.aux.transfer = dm_dp_aux_transfer;
>  aconnector->dm_dp_aux.ddc_service = aconnector->dc_link->ddc;
> 
> --
> 2.7.4
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: Hard lockups with ROCM

2019-05-16 Thread Paul Menzel
Dear Daniel,


On 05/16/2019 01:52 PM, Daniel Kasak wrote:
> On Thu, May 16, 2019 at 11:43 AM Alex Deucher  wrote:
> 
>> On Wed, May 15, 2019 at 8:33 PM Daniel Kasak 
>> wrote:
>>>
>>> On Mon, May 13, 2019 at 11:44 AM Daniel Kasak 
>> wrote:

 Hi all. I had version 2.2.0 of the ROCM stack running on a 5.0.x and
>> 5.1.0 kernel. Things were going great with various boinc GPU tasks. But
>> there is a setiathome GPU task which reliably gives me a hard lockup within
>> about 30 minutes of running. I actually had to do *two* emergency
>> re-installs over the past week. Perhaps part of this was my fault ( running
>> btrfs with lzo compression on my root partition ... ). But absolutely part
>> of this was the hard lockups. I've tested all kinds of other things ( eg
>> rebuilding lots of stuff under Gentoo ) ... I don't have a general
>> stability issue even under hours of high load. But after restarting boinc
>> with that same setiathome task ... !

 If someone wants me to sacrifice another installation, they can point
>> me to instructions for trying to gather more information.

 Anyway ... perhaps more work around detecting and recovering from GPU
>> lockups is in order?

>>> 
>>>
>>> That's what I was afraid of :(
>>
>> Not sure what you were afraid of.  I don't think anyone has looked at
>> setiathome on ROCm.  I'd suggest filing a bug
>> (https://bugs.freedesktop.org) and attaching your dmesg output and
>> xorg log (if using X).  If there is a GPU reset, note that you will
>> need to restart your desktop environment because currently neither
>> glamor or any compositors support GL robustness extensions to reset
>> their contexts after a GPU reset.

> Hi Alex. dmesg output is not available ... this is a *hard* lockup. I need
> to power-cycle after it happens ( ALT + SysRq + { S , U , B } doesn't even
> work ). That's why I asked for instructions to possibly gather more info. I
> did check the xorg log after I did an emergency export of my filesystem ...
> nothing of interest in there. It seems like I currently don't really have
> enough info to make a bug report worthwhile.

Does your board have a serial port? If yes, please use the serial console to
gather the messages on another system.

Sometimes the netconsole [1] is also supposed to be able to send the last
Linux messages out.


Kind regards,

Paul


[1]: https://www.kernel.org/doc/Documentation/networking/netconsole.txt



smime.p7s
Description: S/MIME Cryptographic Signature
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 6/7] drm/bridge/analogix-anx78xx: Use connector kdev as aux device parent

2019-05-16 Thread sunpeng.li
From: Leo Li 

Set the connector's kernel device as the parent for the aux kernel
device. This allows udev rules to access connector attributes when
creating symlinks to aux devices.

To do so, the connector needs to be registered beforehand. Therefore,
shift aux registration to be after connector registration.

Cc: Enric Balletbo i Serra 
Cc: Nicolas Boichat 
Signed-off-by: Leo Li 
---
 drivers/gpu/drm/bridge/analogix-anx78xx.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/bridge/analogix-anx78xx.c 
b/drivers/gpu/drm/bridge/analogix-anx78xx.c
index f8433c9..9fc8b4c 100644
--- a/drivers/gpu/drm/bridge/analogix-anx78xx.c
+++ b/drivers/gpu/drm/bridge/analogix-anx78xx.c
@@ -1018,17 +1018,6 @@ static int anx78xx_bridge_attach(struct drm_bridge 
*bridge)
return -ENODEV;
}
 
-   /* Register aux channel */
-   anx78xx->aux.name = "DP-AUX";
-   anx78xx->aux.dev = >client->dev;
-   anx78xx->aux.transfer = anx78xx_aux_transfer;
-
-   err = drm_dp_aux_register(>aux);
-   if (err < 0) {
-   DRM_ERROR("Failed to register aux channel: %d\n", err);
-   return err;
-   }
-
err = drm_connector_init(bridge->dev, >connector,
 _connector_funcs,
 DRM_MODE_CONNECTOR_DisplayPort);
@@ -1048,6 +1037,17 @@ static int anx78xx_bridge_attach(struct drm_bridge 
*bridge)
 
anx78xx->connector.polled = DRM_CONNECTOR_POLL_HPD;
 
+   /* Register aux channel */
+   anx78xx->aux.name = "DP-AUX";
+   anx78xx->aux.dev = >connector.kdev;
+   anx78xx->aux.transfer = anx78xx_aux_transfer;
+
+   err = drm_dp_aux_register(>aux);
+   if (err < 0) {
+   DRM_ERROR("Failed to register aux channel: %d\n", err);
+   return err;
+   }
+
err = drm_connector_attach_encoder(>connector,
   bridge->encoder);
if (err) {
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 4/7] drm/sysfs: Add mstpath attribute to connector devices

2019-05-16 Thread sunpeng.li
From: Leo Li 

This can be used to create more descriptive symlinks for MST aux
devices. Consider the following udev rule:

SUBSYSTEM=="drm_dp_aux_dev", SUBSYSTEMS=="drm", ATTRS{mstpath}=="?*",
SYMLINK+="drm_dp_aux/by-path/$attr{mstpath}"

The following symlinks will be created (depending on your MST topology):

$ ls /dev/drm_dp_aux/by-path/
card0-mst:0-1  card0-mst:0-1-1  card0-mst:0-1-8  card0-mst:0-8

Cc: Ville Syrjälä 
Cc: Lyude Paul 
Signed-off-by: Leo Li 
---
 drivers/gpu/drm/drm_sysfs.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
index ecb7b33..e000e0c 100644
--- a/drivers/gpu/drm/drm_sysfs.c
+++ b/drivers/gpu/drm/drm_sysfs.c
@@ -229,16 +229,39 @@ static ssize_t modes_show(struct device *device,
return written;
 }
 
+static ssize_t mstpath_show(struct device *device,
+   struct device_attribute *attr,
+   char *buf)
+{
+   struct drm_connector *connector = to_drm_connector(device);
+   ssize_t ret = 0;
+   char *path;
+
+   mutex_lock(>dev->mode_config.mutex);
+   if (!connector->path_blob_ptr)
+   goto unlock;
+
+   path = connector->path_blob_ptr->data;
+   ret = snprintf(buf, PAGE_SIZE, "card%d-%s\n",
+  connector->dev->primary->index, path);
+
+unlock:
+   mutex_unlock(>dev->mode_config.mutex);
+   return ret;
+}
+
 static DEVICE_ATTR_RW(status);
 static DEVICE_ATTR_RO(enabled);
 static DEVICE_ATTR_RO(dpms);
 static DEVICE_ATTR_RO(modes);
+static DEVICE_ATTR_RO(mstpath);
 
 static struct attribute *connector_dev_attrs[] = {
_attr_status.attr,
_attr_enabled.attr,
_attr_dpms.attr,
_attr_modes.attr,
+   _attr_mstpath.attr,
NULL
 };
 
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 1/7] drm/dp: Use non-cyclic idr

2019-05-16 Thread sunpeng.li
From: Leo Li 

In preparation for adding aux devices for DP MST, make the IDR
non-cyclic. That way, hotplug cycling MST devices won't needlessly
increment the minor version index.

Signed-off-by: Leo Li 
Reviewed-by: Lyude Paul 
Reviewed-by: Ville Syrjälä 
---
 drivers/gpu/drm/drm_dp_aux_dev.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_aux_dev.c b/drivers/gpu/drm/drm_dp_aux_dev.c
index 0e4f25d..6d84611 100644
--- a/drivers/gpu/drm/drm_dp_aux_dev.c
+++ b/drivers/gpu/drm/drm_dp_aux_dev.c
@@ -80,8 +80,7 @@ static struct drm_dp_aux_dev *alloc_drm_dp_aux_dev(struct 
drm_dp_aux *aux)
kref_init(_dev->refcount);
 
mutex_lock(_idr_mutex);
-   index = idr_alloc_cyclic(_idr, aux_dev, 0, DRM_AUX_MINORS,
-GFP_KERNEL);
+   index = idr_alloc(_idr, aux_dev, 0, DRM_AUX_MINORS, GFP_KERNEL);
mutex_unlock(_idr_mutex);
if (index < 0) {
kfree(aux_dev);
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 5/7] drm/amd/display: Use connector kdev as aux device parent

2019-05-16 Thread sunpeng.li
From: Leo Li 

Set the connector's kernel device as the parent for the aux kernel
device. This allows udev rules to access connector attributes when
creating symlinks to aux devices.

For example, the following udev rule:

SUBSYSTEM=="drm_dp_aux_dev", SUBSYSTEMS=="drm", ATTRS{edid}=="*",
SYMLINK+="drm_dp_aux/by-name/$id"

Will create the following symlinks using the connector's name:

$ ls /dev/drm_dp_aux/by-name/
card0-DP-1  card0-DP-2  card0-DP-3

Cc: Ville Syrjälä 
Cc: Lyude Paul 
Cc: Nicholas Kazlauskas 
Cc: Harry Wentland 
Cc: Jerry (Fangzhi) Zuo 
Signed-off-by: Leo Li 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index a6f44a4..083fb97 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -385,7 +385,7 @@ void amdgpu_dm_initialize_dp_connector(struct 
amdgpu_display_manager *dm,
   struct amdgpu_dm_connector *aconnector)
 {
aconnector->dm_dp_aux.aux.name = "dmdc";
-   aconnector->dm_dp_aux.aux.dev = dm->adev->dev;
+   aconnector->dm_dp_aux.aux.dev = aconnector->base.kdev;
aconnector->dm_dp_aux.aux.transfer = dm_dp_aux_transfer;
aconnector->dm_dp_aux.ddc_service = aconnector->dc_link->ddc;
 
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 7/7] drm/nouveau: Use connector kdev as aux device parent

2019-05-16 Thread sunpeng.li
From: Leo Li 

Set the connector's kernel device as the parent for the aux kernel
device. This allows udev rules to access connector attributes when
creating symlinks to aux devices.

Cc: Ben Skeggs 
Cc: Lyude Paul 
Signed-off-by: Leo Li 
---
 drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c 
b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 3f463c9..738782a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -1345,7 +1345,7 @@ nouveau_connector_create(struct drm_device *dev,
break;
case DRM_MODE_CONNECTOR_DisplayPort:
case DRM_MODE_CONNECTOR_eDP:
-   nv_connector->aux.dev = dev->dev;
+   nv_connector->aux.dev = connector->kdev;
nv_connector->aux.transfer = nouveau_connector_aux_xfer;
snprintf(aux_name, sizeof(aux_name), "sor-%04x-%04x",
 dcbe->hasht, dcbe->hashm);
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 3/7] drm/dp-mst: Use connector kdev as aux device parent

2019-05-16 Thread sunpeng.li
From: Leo Li 

Placing the MST aux device as a child of the connector gives udev the
ability to access the connector device's attributes. This will come in
handy when writing udev rules to create more descriptive symlinks to the
MST aux devices.

Cc: Ville Syrjälä 
Cc: Lyude Paul 
Signed-off-by: Leo Li 
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c 
b/drivers/gpu/drm/drm_dp_mst_topology.c
index 54da68e..cd2f3c4 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -1269,6 +1269,9 @@ static void drm_dp_add_port(struct drm_dp_mst_branch 
*mstb,
}
(*mstb->mgr->cbs->register_connector)(port->connector);
 
+   if (port->connector->registration_state == 
DRM_CONNECTOR_REGISTERED)
+   port->aux.dev = port->connector->kdev;
+
drm_dp_aux_register_devnode(>aux);
}
 
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 0/7] Add DP MST AUX devices

2019-05-16 Thread sunpeng.li
From: Leo Li 

This series adds support for MST AUX devices. 

A more descriptive 'mstpath' attribute is also added to MST connector
devices. In addition, the DP aux device is made to be a child of the
corresponding connector's device where possible (*). This allows udev
rules to provide descriptive symlinks to the AUX devices.

(*) This can only be done on drivers that register their connector and
aux devices via drm_connector_register() and drm_dp_aux_register()
respectively. The connector also needs to be registered before the aux
device.

Leo Li (6):
  drm/dp: Use non-cyclic idr
  drm/dp-mst: Use connector kdev as aux device parent
  drm/sysfs: Add mstpath attribute to connector devices
  drm/amd/display: Use connector kdev as aux device parent
  drm/bridge/analogix-anx78xx: Use connector kdev as aux device parent
  drm/nouveau: Use connector kdev as aux device parent

Ville Syrjälä (1):
  drm/dp_mst: Register AUX devices for MST ports

 .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c|   2 +-
 drivers/gpu/drm/bridge/analogix-anx78xx.c  |  22 ++---
 drivers/gpu/drm/drm_dp_aux_dev.c   |  17 +++-
 drivers/gpu/drm/drm_dp_mst_topology.c  | 106 ++---
 drivers/gpu/drm/drm_sysfs.c|  23 +
 drivers/gpu/drm/nouveau/nouveau_connector.c|   2 +-
 include/drm/drm_dp_helper.h|   4 +
 include/drm/drm_dp_mst_helper.h|   6 ++
 8 files changed, 152 insertions(+), 30 deletions(-)

-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 2/7] drm/dp_mst: Register AUX devices for MST ports

2019-05-16 Thread sunpeng.li
From: Ville Syrjälä 

All available downstream ports - physical and logical - are exposed for
each MST device. They are listed in /dev/, following the same naming
scheme as SST devices by appending an incremental ID.

Although all downstream ports are exposed, only some will work as
expected. Consider the following topology:

   +-+
   |  ASIC   |
   +-+
  Conn-0|
|
   +v+
  +| MST HUB |+
  |+-+|
  |   |
  |Port-1   Port-2|
+-v-+   +-v-+
|  MST  |   |  SST  |
|  Display  |   |  Display  |
+---+   +---+
  |Port-1
  x

 MST Path  | MST Device
 --+--
 sst:0 | MST Hub
 mst:0-1   | MST Display
 mst:0-1-1 | MST Display's disconnected DP out
 mst:0-1-8 | MST Display's internal sink
 mst:0-2   | SST Display

On certain MST displays, the upstream physical port will ACK DPCD reads.
However, reads on the local logical port to the internal sink will
*NAK*. i.e. reading mst:0-1 ACKs, but mst:0-1-8 NAKs.

There may also be duplicates. Some displays will return the same GUID
when reading DPCD from both mst:0-1 and mst:0-1-8.

There are some device-dependent behavior as well. The MST hub used
during testing will actually *ACK* read requests on a disconnected
physical port, whereas the MST displays will NAK.

In light of these discrepancies, it's simpler to expose all downstream
ports - both physical and logical - and let the user decide what to use.

Cc: Lyude Paul 
Signed-off-by: Ville Syrjälä 
Signed-off-by: Leo Li 
---
 drivers/gpu/drm/drm_dp_aux_dev.c  |  14 -
 drivers/gpu/drm/drm_dp_mst_topology.c | 103 +-
 include/drm/drm_dp_helper.h   |   4 ++
 include/drm/drm_dp_mst_helper.h   |   6 ++
 4 files changed, 112 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_aux_dev.c b/drivers/gpu/drm/drm_dp_aux_dev.c
index 6d84611..01c02b9 100644
--- a/drivers/gpu/drm/drm_dp_aux_dev.c
+++ b/drivers/gpu/drm/drm_dp_aux_dev.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -114,6 +115,7 @@ static ssize_t name_show(struct device *dev,
 
return res;
 }
+
 static DEVICE_ATTR_RO(name);
 
 static struct attribute *drm_dp_aux_attrs[] = {
@@ -160,7 +162,11 @@ static ssize_t auxdev_read_iter(struct kiocb *iocb, struct 
iov_iter *to)
break;
}
 
-   res = drm_dp_dpcd_read(aux_dev->aux, pos, buf, todo);
+   if (aux_dev->aux->is_remote)
+   res = drm_dp_mst_dpcd_read(aux_dev->aux, pos, buf, 
todo);
+   else
+   res = drm_dp_dpcd_read(aux_dev->aux, pos, buf, todo);
+
if (res <= 0)
break;
 
@@ -207,7 +213,11 @@ static ssize_t auxdev_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
break;
}
 
-   res = drm_dp_dpcd_write(aux_dev->aux, pos, buf, todo);
+   if (aux_dev->aux->is_remote)
+   res = drm_dp_mst_dpcd_write(aux_dev->aux, pos, buf, 
todo);
+   else
+   res = drm_dp_dpcd_write(aux_dev->aux, pos, buf, todo);
+
if (res <= 0)
break;
 
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c 
b/drivers/gpu/drm/drm_dp_mst_topology.c
index 2ab16c9..54da68e 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -35,6 +35,8 @@
 #include 
 #include 
 
+#include "drm_crtc_helper_internal.h"
+
 /**
  * DOC: dp mst helper
  *
@@ -52,6 +54,9 @@ static int drm_dp_dpcd_write_payload(struct 
drm_dp_mst_topology_mgr *mgr,
 int id,
 struct drm_dp_payload *payload);
 
+static int drm_dp_send_dpcd_read(struct drm_dp_mst_topology_mgr *mgr,
+struct drm_dp_mst_port *port,
+int offset, int size, u8 *bytes);
 static int drm_dp_send_dpcd_write(struct drm_dp_mst_topology_mgr *mgr,
  struct drm_dp_mst_port *port,
  int offset, int size, u8 *bytes);
@@ -941,6 +946,8 @@ static void drm_dp_destroy_port(struct kref *kref)
struct drm_dp_mst_topology_mgr *mgr = port->mgr;
 
if (!port->input) {
+   drm_dp_aux_unregister_devnode(>aux);
+
port->vcpi.num_slots = 0;
 
kfree(port->cached_edid);
@@ -1095,6 +1102,46 @@ static bool drm_dp_port_setup_pdt(struct drm_dp_mst_port 
*port)
return send_link;
 }
 
+/**
+ * drm_dp_mst_dpcd_read() - read a series of bytes from the DPCD via sideband
+ * @aux: Fake sideband AUX CH
+ * @offset: address of the 

Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

2019-05-16 Thread Kenny Ho
On Thu, May 16, 2019 at 10:10 AM Tejun Heo  wrote:
> I haven't gone through the patchset yet but some quick comments.
>
> On Wed, May 15, 2019 at 10:29:21PM -0400, Kenny Ho wrote:
> > Given this controller is specific to the drm kernel subsystem which
> > uses minor to identify drm device, I don't see a need to complicate
> > the interfaces more by having major and a key.  As you can see in the
> > examples below, the drm device minor corresponds to the line number.
> > I am not sure how strict cgroup upstream is about the convention but I
>
> We're pretty strict.
>
> > am hoping there are flexibility here to allow for what I have
> > implemented.  There are a couple of other things I have done that is
>
> So, please follow the interface conventions.  We can definitely add
> new ones but that would need functional reasons.
>
> > not described in the convention: 1) inclusion of read-only *.help file
> > at the root cgroup, 2) use read-only (which I can potentially make rw)
> > *.default file instead of having a default entries (since the default
> > can be different for different devices) inside the control files (this
> > way, the resetting of cgroup values for all the drm devices, can be
> > done by a simple 'cp'.)
>
> Again, please follow the existing conventions.  There's a lot more
> harm than good in every controller being creative in their own way.
> It's trivial to build convenience features in userspace.  Please do it
> there.
I can certainly remove the ro *.help file and leave the documentation
to Documentation/, but for the *.default I do have a functional reason
to it.  As far as I can tell from the convention, the default is per
cgroup and there is no way to describe per device default.  Although,
perhaps we are talking about two different kinds of defaults.  Anyway,
I can leave the discussion to a more detailed review.

Regards,
Kenny
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

2019-05-16 Thread Kenny Ho
On Thu, May 16, 2019 at 10:12 AM Christian König
 wrote:
> Am 16.05.19 um 16:03 schrieb Kenny Ho:
> > On Thu, May 16, 2019 at 3:25 AM Christian König
> >  wrote:
> >> Am 16.05.19 um 09:16 schrieb Koenig, Christian:
> >> We need something like the Linux sysfs location or similar to have a
> >> stable implementation.
> > I get that, which is why I don't use minor to identify cards in user
> > space apps I wrote:
> > https://github.com/RadeonOpenCompute/k8s-device-plugin/blob/c2659c9d1d0713cad36fb5256681125121e6e32f/internal/pkg/amdgpu/amdgpu.go#L85
>
> Yeah, that is certainly a possibility.
>
> > But within the kernel, I think my use of minor is consistent with the
> > rest of the drm subsystem.  I hope I don't need to reform the way the
> > drm subsystem use minor in order to introduce a cgroup controller.
>
> Well I would try to avoid using the minor and at least look for
> alternatives. E.g. what does udev uses to identify the devices for
> example? And IIRC we have something like a "device-name" in the kernel
> as well (what's printed in the logs).
>
> The minimum we need to do is get away from the minor=linenum approach,
> cause as Daniel pointed out the minor allocation is quite a mess and not
> necessary contiguous.

I noticed :) but looks like there isn't much of a choice from what
Tejun/cgroup replied about convention.

Regards,
Kenny
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

2019-05-16 Thread Christian König

Am 16.05.19 um 16:03 schrieb Kenny Ho:

On Thu, May 16, 2019 at 3:25 AM Christian König
 wrote:

Am 16.05.19 um 09:16 schrieb Koenig, Christian:

Am 16.05.19 um 04:29 schrieb Kenny Ho:

On Wed, May 15, 2019 at 5:26 PM Welty, Brian  wrote:

On 5/9/2019 2:04 PM, Kenny Ho wrote:

Each file is multi-lined with one entry/line per drm device.

Multi-line is correct for multiple devices, but I believe you need
to use a KEY to denote device for both your set and get routines.
I didn't see your set functions reading a key, or the get functions
printing the key in output.
cgroups-v2 conventions mention using KEY of major:minor, but I think
you can use drm_minor as key?

Given this controller is specific to the drm kernel subsystem which
uses minor to identify drm device,

Wait a second, using the DRM minor is a good idea in the first place.

Well that should have read "is not a good idea"..

I have a test system with a Vega10 and a Vega20. Which device gets which
minor is not stable, but rather defined by the scan order of the PCIe bus.

Normally the scan order is always the same, but adding or removing
devices or delaying things just a little bit during init is enough to
change this.

We need something like the Linux sysfs location or similar to have a
stable implementation.

I get that, which is why I don't use minor to identify cards in user
space apps I wrote:
https://github.com/RadeonOpenCompute/k8s-device-plugin/blob/c2659c9d1d0713cad36fb5256681125121e6e32f/internal/pkg/amdgpu/amdgpu.go#L85


Yeah, that is certainly a possibility.


But within the kernel, I think my use of minor is consistent with the
rest of the drm subsystem.  I hope I don't need to reform the way the
drm subsystem use minor in order to introduce a cgroup controller.


Well I would try to avoid using the minor and at least look for 
alternatives. E.g. what does udev uses to identify the devices for 
example? And IIRC we have something like a "device-name" in the kernel 
as well (what's printed in the logs).


The minimum we need to do is get away from the minor=linenum approach, 
cause as Daniel pointed out the minor allocation is quite a mess and not 
necessary contiguous.


Regards,
Christian.



Regards,
Kenny
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

2019-05-16 Thread Tejun Heo
Hello,

I haven't gone through the patchset yet but some quick comments.

On Wed, May 15, 2019 at 10:29:21PM -0400, Kenny Ho wrote:
> Given this controller is specific to the drm kernel subsystem which
> uses minor to identify drm device, I don't see a need to complicate
> the interfaces more by having major and a key.  As you can see in the
> examples below, the drm device minor corresponds to the line number.
> I am not sure how strict cgroup upstream is about the convention but I

We're pretty strict.

> am hoping there are flexibility here to allow for what I have
> implemented.  There are a couple of other things I have done that is

So, please follow the interface conventions.  We can definitely add
new ones but that would need functional reasons.

> not described in the convention: 1) inclusion of read-only *.help file
> at the root cgroup, 2) use read-only (which I can potentially make rw)
> *.default file instead of having a default entries (since the default
> can be different for different devices) inside the control files (this
> way, the resetting of cgroup values for all the drm devices, can be
> done by a simple 'cp'.)

Again, please follow the existing conventions.  There's a lot more
harm than good in every controller being creative in their own way.
It's trivial to build convenience features in userspace.  Please do it
there.

> > Is this really useful for an administrator to control?
> > Isn't the resource we want to control actually the physical backing store?
> That's correct.  This is just the first level of control since the
> backing store can be backed by different type of memory.  I am in the
> process of adding at least two more resources.  Stay tuned.  I am
> doing the charge here to enforce the idea of "creator is deemed owner"
> at a place where the code is shared by all (the init function.)

Ideally, controller should only control hard resources which impact
behaviors and performance which are immediately visible to users.

Thanks.

-- 
tejun
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

2019-05-16 Thread Koenig, Christian
Am 16.05.19 um 14:28 schrieb Daniel Vetter:
> [CAUTION: External Email]
>
> On Thu, May 16, 2019 at 09:25:31AM +0200, Christian König wrote:
>> Am 16.05.19 um 09:16 schrieb Koenig, Christian:
>>> Am 16.05.19 um 04:29 schrieb Kenny Ho:
 [CAUTION: External Email]

 On Wed, May 15, 2019 at 5:26 PM Welty, Brian  wrote:
> On 5/9/2019 2:04 PM, Kenny Ho wrote:
>> There are four control file types,
>> stats (ro) - display current measured values for a resource
>> max (rw) - limits for a resource
>> default (ro, root cgroup only) - default values for a resource
>> help (ro, root cgroup only) - help string for a resource
>>
>> Each file is multi-lined with one entry/line per drm device.
> Multi-line is correct for multiple devices, but I believe you need
> to use a KEY to denote device for both your set and get routines.
> I didn't see your set functions reading a key, or the get functions
> printing the key in output.
> cgroups-v2 conventions mention using KEY of major:minor, but I think
> you can use drm_minor as key?
 Given this controller is specific to the drm kernel subsystem which
 uses minor to identify drm device,
>>> Wait a second, using the DRM minor is a good idea in the first place.
>> Well that should have read "is not a good idea"..
> What else should we use?

Well what does for example udev uses to identify a device?

>> Christian.
>>
>>> I have a test system with a Vega10 and a Vega20. Which device gets which
>>> minor is not stable, but rather defined by the scan order of the PCIe bus.
>>>
>>> Normally the scan order is always the same, but adding or removing
>>> devices or delaying things just a little bit during init is enough to
>>> change this.
>>>
>>> We need something like the Linux sysfs location or similar to have a
>>> stable implementation.
> You can go from sysfs location to drm class directory (in sysfs) and back.
> That means if you care you need to walk sysfs yourself a bit, but using
> the drm minor isn't a blocker itself.

Yeah, agreed that userspace could do this. But I think if there is an of 
hand alternative we should use this instead.

> One downside with the drm minor is that it's pretty good nonsense once you
> have more than 64 gpus though, due to how we space render and legacy nodes
> in the minor ids :-)

Ok, another good reason to at least not use the minor=linenum approach.

Christian.

> -Daniel
>>> Regards,
>>> Christian.
>>>
 I don't see a need to complicate
 the interfaces more by having major and a key.  As you can see in the
 examples below, the drm device minor corresponds to the line number.
 I am not sure how strict cgroup upstream is about the convention but I
 am hoping there are flexibility here to allow for what I have
 implemented.  There are a couple of other things I have done that is
 not described in the convention: 1) inclusion of read-only *.help file
 at the root cgroup, 2) use read-only (which I can potentially make rw)
 *.default file instead of having a default entries (since the default
 can be different for different devices) inside the control files (this
 way, the resetting of cgroup values for all the drm devices, can be
 done by a simple 'cp'.)

>> Usage examples:
>> // set limit for card1 to 1GB
>> sed -i '2s/.*/1073741824/' /sys/fs/cgroup//drm.buffer.total.max
>>
>> // set limit for card0 to 512MB
>> sed -i '1s/.*/536870912/' /sys/fs/cgroup//drm.buffer.total.max
>> /** @file drm_gem.c
>> @@ -154,6 +156,9 @@ void drm_gem_private_object_init(struct drm_device 
>> *dev,
>>  obj->handle_count = 0;
>>  obj->size = size;
>>  drm_vma_node_reset(>vma_node);
>> +
>> + obj->drmcgrp = get_drmcgrp(current);
>> + drmcgrp_chg_bo_alloc(obj->drmcgrp, dev, size);
> Why do the charging here?
> There is no backing store yet for the buffer, so this is really tracking 
> something akin to allowed virtual memory for GEM objects?
> Is this really useful for an administrator to control?
> Isn't the resource we want to control actually the physical backing store?
 That's correct.  This is just the first level of control since the
 backing store can be backed by different type of memory.  I am in the
 process of adding at least two more resources.  Stay tuned.  I am
 doing the charge here to enforce the idea of "creator is deemed owner"
 at a place where the code is shared by all (the init function.)

>> + while (i <= max_minor && limits != NULL) {
>> + sval =  strsep(, "\n");
>> + rc = kstrtoll(sval, 0, );
> Input should be "KEY VALUE", so KEY will determine device to apply this 
> to.
> Also, per cgroups-v2 documentation of limits, I believe need to parse and 
> handle the special "max" input value.
>
> parse_resources() in rdma 

Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

2019-05-16 Thread Kenny Ho
On Thu, May 16, 2019 at 3:25 AM Christian König
 wrote:
> Am 16.05.19 um 09:16 schrieb Koenig, Christian:
> > Am 16.05.19 um 04:29 schrieb Kenny Ho:
> >> On Wed, May 15, 2019 at 5:26 PM Welty, Brian  wrote:
> >>> On 5/9/2019 2:04 PM, Kenny Ho wrote:
>  Each file is multi-lined with one entry/line per drm device.
> >>> Multi-line is correct for multiple devices, but I believe you need
> >>> to use a KEY to denote device for both your set and get routines.
> >>> I didn't see your set functions reading a key, or the get functions
> >>> printing the key in output.
> >>> cgroups-v2 conventions mention using KEY of major:minor, but I think
> >>> you can use drm_minor as key?
> >> Given this controller is specific to the drm kernel subsystem which
> >> uses minor to identify drm device,
> > Wait a second, using the DRM minor is a good idea in the first place.
> Well that should have read "is not a good idea"..
>
> I have a test system with a Vega10 and a Vega20. Which device gets which
> minor is not stable, but rather defined by the scan order of the PCIe bus.
>
> Normally the scan order is always the same, but adding or removing
> devices or delaying things just a little bit during init is enough to
> change this.
>
> We need something like the Linux sysfs location or similar to have a
> stable implementation.

I get that, which is why I don't use minor to identify cards in user
space apps I wrote:
https://github.com/RadeonOpenCompute/k8s-device-plugin/blob/c2659c9d1d0713cad36fb5256681125121e6e32f/internal/pkg/amdgpu/amdgpu.go#L85

But within the kernel, I think my use of minor is consistent with the
rest of the drm subsystem.  I hope I don't need to reform the way the
drm subsystem use minor in order to introduce a cgroup controller.

Regards,
Kenny
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

2019-05-16 Thread Daniel Vetter
On Thu, May 16, 2019 at 09:25:31AM +0200, Christian König wrote:
> Am 16.05.19 um 09:16 schrieb Koenig, Christian:
> > Am 16.05.19 um 04:29 schrieb Kenny Ho:
> > > [CAUTION: External Email]
> > > 
> > > On Wed, May 15, 2019 at 5:26 PM Welty, Brian  
> > > wrote:
> > > > On 5/9/2019 2:04 PM, Kenny Ho wrote:
> > > > > There are four control file types,
> > > > > stats (ro) - display current measured values for a resource
> > > > > max (rw) - limits for a resource
> > > > > default (ro, root cgroup only) - default values for a resource
> > > > > help (ro, root cgroup only) - help string for a resource
> > > > > 
> > > > > Each file is multi-lined with one entry/line per drm device.
> > > > Multi-line is correct for multiple devices, but I believe you need
> > > > to use a KEY to denote device for both your set and get routines.
> > > > I didn't see your set functions reading a key, or the get functions
> > > > printing the key in output.
> > > > cgroups-v2 conventions mention using KEY of major:minor, but I think
> > > > you can use drm_minor as key?
> > > Given this controller is specific to the drm kernel subsystem which
> > > uses minor to identify drm device,
> > Wait a second, using the DRM minor is a good idea in the first place.
> 
> Well that should have read "is not a good idea"..

What else should we use?
> 
> Christian.
> 
> > 
> > I have a test system with a Vega10 and a Vega20. Which device gets which
> > minor is not stable, but rather defined by the scan order of the PCIe bus.
> > 
> > Normally the scan order is always the same, but adding or removing
> > devices or delaying things just a little bit during init is enough to
> > change this.
> > 
> > We need something like the Linux sysfs location or similar to have a
> > stable implementation.

You can go from sysfs location to drm class directory (in sysfs) and back.
That means if you care you need to walk sysfs yourself a bit, but using
the drm minor isn't a blocker itself.

One downside with the drm minor is that it's pretty good nonsense once you
have more than 64 gpus though, due to how we space render and legacy nodes
in the minor ids :-)
-Daniel
> > 
> > Regards,
> > Christian.
> > 
> > >I don't see a need to complicate
> > > the interfaces more by having major and a key.  As you can see in the
> > > examples below, the drm device minor corresponds to the line number.
> > > I am not sure how strict cgroup upstream is about the convention but I
> > > am hoping there are flexibility here to allow for what I have
> > > implemented.  There are a couple of other things I have done that is
> > > not described in the convention: 1) inclusion of read-only *.help file
> > > at the root cgroup, 2) use read-only (which I can potentially make rw)
> > > *.default file instead of having a default entries (since the default
> > > can be different for different devices) inside the control files (this
> > > way, the resetting of cgroup values for all the drm devices, can be
> > > done by a simple 'cp'.)
> > > 
> > > > > Usage examples:
> > > > > // set limit for card1 to 1GB
> > > > > sed -i '2s/.*/1073741824/' 
> > > > > /sys/fs/cgroup//drm.buffer.total.max
> > > > > 
> > > > > // set limit for card0 to 512MB
> > > > > sed -i '1s/.*/536870912/' /sys/fs/cgroup//drm.buffer.total.max
> > > > >/** @file drm_gem.c
> > > > > @@ -154,6 +156,9 @@ void drm_gem_private_object_init(struct 
> > > > > drm_device *dev,
> > > > > obj->handle_count = 0;
> > > > > obj->size = size;
> > > > > drm_vma_node_reset(>vma_node);
> > > > > +
> > > > > + obj->drmcgrp = get_drmcgrp(current);
> > > > > + drmcgrp_chg_bo_alloc(obj->drmcgrp, dev, size);
> > > > Why do the charging here?
> > > > There is no backing store yet for the buffer, so this is really 
> > > > tracking something akin to allowed virtual memory for GEM objects?
> > > > Is this really useful for an administrator to control?
> > > > Isn't the resource we want to control actually the physical backing 
> > > > store?
> > > That's correct.  This is just the first level of control since the
> > > backing store can be backed by different type of memory.  I am in the
> > > process of adding at least two more resources.  Stay tuned.  I am
> > > doing the charge here to enforce the idea of "creator is deemed owner"
> > > at a place where the code is shared by all (the init function.)
> > > 
> > > > > + while (i <= max_minor && limits != NULL) {
> > > > > + sval =  strsep(, "\n");
> > > > > + rc = kstrtoll(sval, 0, );
> > > > Input should be "KEY VALUE", so KEY will determine device to apply this 
> > > > to.
> > > > Also, per cgroups-v2 documentation of limits, I believe need to parse 
> > > > and handle the special "max" input value.
> > > > 
> > > > parse_resources() in rdma controller is example for both of above.
> > > Please see my previous reply for the rationale of my hope to not need
> > > a key.  I can certainly add handling of "max" and 

Re: [PATCH 2/2] drm/amd/amdgpu: Fix maybe-uninitialized warning in df_v3_6_pmc_start

2019-05-16 Thread Alex Deucher
Series is:
Reviewed-by: Alex Deucher 

On Wed, May 15, 2019 at 4:24 PM Bhawanpreet Lakha
 wrote:
>
> This fixes the warning below
>
> error: ‘ret’ may be used uninitialized in this function 
> [-Werror=maybe-uninitialized]
>   int xgmi_tx_link, ret;
> ^~~
> Signed-off-by: Bhawanpreet Lakha 
> ---
>  drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c 
> b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> index 201c00411720..a8442508eea5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> +++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> @@ -459,7 +459,7 @@ static int df_v3_6_stop_xgmi_link_cntr(struct 
> amdgpu_device *adev,
>  static int df_v3_6_pmc_start(struct amdgpu_device *adev, uint64_t config,
>  int is_enable)
>  {
> -   int xgmi_tx_link, ret;
> +   int xgmi_tx_link, ret = 0;
>
> switch (adev->asic_type) {
> case CHIP_VEGA20:
> --
> 2.17.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: don't read DF register for SRIOV

2019-05-16 Thread Yintian Tao
Under SRIOV, reading DF register has chance to lead to
AER error in host side, just skip reading it.

Signed-off-by: Monk Liu 
Signed-off-by: Yintian Tao 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index a417763..b5bf9ed 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -837,7 +837,7 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev)
 
if (amdgpu_emu_mode != 1)
adev->gmc.vram_width = amdgpu_atomfirmware_get_vram_width(adev);
-   if (!adev->gmc.vram_width) {
+   if (!adev->gmc.vram_width && !amdgpu_sriov_vf(adev)) {
/* hbm memory channel size */
if (adev->flags & AMD_IS_APU)
chansize = 64;
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: set correct vram_width for vega10 under sriov

2019-05-16 Thread Yintian Tao
For Vega10 SR-IOV, vram_width can't be read from ATOM as
RAVEN, and DF related registers is not readable, seems hardcord
is the only way to set the correct vram_width

Signed-off-by: Trigger Huang 
Signed-off-by: Yintian Tao 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index c221570..a417763 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -848,6 +848,13 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev)
adev->gmc.vram_width = numchan * chansize;
}
 
+   /* For Vega10 SR-IOV, vram_width can't be read from ATOM as RAVEN,
+* and DF related registers is not readable, seems hardcord is the
+* only way to set the correct vram_width */
+   if (amdgpu_sriov_vf(adev) && (adev->asic_type == CHIP_VEGA10)) {
+   adev->gmc.vram_width = 2048;
+   }
+
/* size in MB on si */
adev->gmc.mc_vram_size =
adev->nbio_funcs->get_memsize(adev) * 1024ULL * 1024ULL;
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

2019-05-16 Thread Yintian Tao
PSP fw primary buffer is not used under SRIOV.
Therefore, we don't need to allocate memory for it.

v2: remove superfluous check for amdgpu_bo_free_kernel().

Signed-off-by: Yintian Tao 
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index c567a55..af9835c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -905,13 +905,16 @@ static int psp_load_fw(struct amdgpu_device *adev)
if (!psp->cmd)
return -ENOMEM;
 
-   ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
-   AMDGPU_GEM_DOMAIN_GTT,
-   >fw_pri_bo,
-   >fw_pri_mc_addr,
-   >fw_pri_buf);
-   if (ret)
-   goto failed;
+   /* this fw pri bo is not used under SRIOV */
+   if (!amdgpu_sriov_vf(psp->adev)) {
+   ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
+ AMDGPU_GEM_DOMAIN_GTT,
+ >fw_pri_bo,
+ >fw_pri_mc_addr,
+ >fw_pri_buf);
+   if (ret)
+   goto failed;
+   }
 
ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE,
AMDGPU_GEM_DOMAIN_VRAM,
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: Hard lockups with ROCM

2019-05-16 Thread Daniel Kasak
On Thu, May 16, 2019 at 11:43 AM Alex Deucher  wrote:

> On Wed, May 15, 2019 at 8:33 PM Daniel Kasak 
> wrote:
> >
> > On Mon, May 13, 2019 at 11:44 AM Daniel Kasak 
> wrote:
> >>
> >> Hi all. I had version 2.2.0 of the ROCM stack running on a 5.0.x and
> 5.1.0 kernel. Things were going great with various boinc GPU tasks. But
> there is a setiathome GPU task which reliably gives me a hard lockup within
> about 30 minutes of running. I actually had to do *two* emergency
> re-installs over the past week. Perhaps part of this was my fault ( running
> btrfs with lzo compression on my root partition ... ). But absolutely part
> of this was the hard lockups. I've tested all kinds of other things ( eg
> rebuilding lots of stuff under Gentoo ) ... I don't have a general
> stability issue even under hours of high load. But after restarting boinc
> with that same setiathome task ... !
> >>
> >> If someone wants me to sacrifice another installation, they can point
> me to instructions for trying to gather more information.
> >>
> >> Anyway ... perhaps more work around detecting and recovering from GPU
> lockups is in order?
> >>
> >> Dan
> >
> >
> > 
> >
> > That's what I was afraid of :(
>
> Not sure what you were afraid of.  I don't think anyone has looked at
> setiathome on ROCm.  I'd suggest filing a bug
> (https://bugs.freedesktop.org) and attaching your dmesg output and
> xorg log (if using X).  If there is a GPU reset, note that you will
> need to restart your desktop environment because currently neither
> glamor or any compositors support GL robustness extensions to reset
> their contexts after a GPU reset.
>
> Alex
>

Hi Alex. dmesg output is not available ... this is a *hard* lockup. I need
to power-cycle after it happens ( ALT + SysRq + { S , U , B } doesn't even
work ). That's why I asked for instructions to possibly gather more info. I
did check the xorg log after I did an emergency export of my filesystem ...
nothing of interest in there. It seems like I currently don't really have
enough info to make a bug report worthwhile.

Dan
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: Compiler warnings with current amd-staging-drm-next

2019-05-16 Thread Michel Dänzer
On 2019-05-10 1:17 p.m., Michel Dänzer wrote:
> 
> drivers/gpu/drm//amd/amdgpu/df_v3_6.c: In function ‘df_v3_6_pmc_start’:
> drivers/gpu/drm//amd/amdgpu/df_v3_6.c:482:9: warning: ‘ret’ may be used 
> uninitialized in this function [-Wmaybe-uninitialized]
>   return ret;
>  ^~~

This warning is still there; the others have been fixed.


Other warnings have been introduced though:

drivers/gpu/drm//amd/amdgpu/../powerplay/vega20_ppt.c: In function 
‘vega20_get_ppfeature_status’:
drivers/gpu/drm//amd/amdgpu/../powerplay/vega20_ppt.c:2387:103: warning: 
‘feature_mask’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]
  *features_enabled = uint64_t)feature_mask[0] << SMU_FEATURES_LOW_SHIFT) & 
SMU_FEATURES_LOW_MASK) |
  
~^
(((uint64_t)feature_mask[1] << SMU_FEATURES_HIGH_SHIFT) & 
SMU_FEATURES_HIGH_MASK));

~~
  
drivers/gpu/drm//amd/amdgpu/../powerplay/vega20_ppt.c: In function 
‘vega20_set_ppfeature_status’:
drivers/gpu/drm//amd/amdgpu/../powerplay/vega20_ppt.c:2387:103: warning: 
‘feature_mask’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]
  *features_enabled = uint64_t)feature_mask[0] << SMU_FEATURES_LOW_SHIFT) & 
SMU_FEATURES_LOW_MASK) |
  
~^
(((uint64_t)feature_mask[1] << SMU_FEATURES_HIGH_SHIFT) & 
SMU_FEATURES_HIGH_MASK));

~~
  


-- 
Earthling Michel Dänzer   |  https://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

2019-05-16 Thread Christian König

Am 16.05.19 um 09:16 schrieb Koenig, Christian:

Am 16.05.19 um 04:29 schrieb Kenny Ho:

[CAUTION: External Email]

On Wed, May 15, 2019 at 5:26 PM Welty, Brian  wrote:

On 5/9/2019 2:04 PM, Kenny Ho wrote:

There are four control file types,
stats (ro) - display current measured values for a resource
max (rw) - limits for a resource
default (ro, root cgroup only) - default values for a resource
help (ro, root cgroup only) - help string for a resource

Each file is multi-lined with one entry/line per drm device.

Multi-line is correct for multiple devices, but I believe you need
to use a KEY to denote device for both your set and get routines.
I didn't see your set functions reading a key, or the get functions
printing the key in output.
cgroups-v2 conventions mention using KEY of major:minor, but I think
you can use drm_minor as key?

Given this controller is specific to the drm kernel subsystem which
uses minor to identify drm device,

Wait a second, using the DRM minor is a good idea in the first place.


Well that should have read "is not a good idea"..

Christian.



I have a test system with a Vega10 and a Vega20. Which device gets which
minor is not stable, but rather defined by the scan order of the PCIe bus.

Normally the scan order is always the same, but adding or removing
devices or delaying things just a little bit during init is enough to
change this.

We need something like the Linux sysfs location or similar to have a
stable implementation.

Regards,
Christian.


   I don't see a need to complicate
the interfaces more by having major and a key.  As you can see in the
examples below, the drm device minor corresponds to the line number.
I am not sure how strict cgroup upstream is about the convention but I
am hoping there are flexibility here to allow for what I have
implemented.  There are a couple of other things I have done that is
not described in the convention: 1) inclusion of read-only *.help file
at the root cgroup, 2) use read-only (which I can potentially make rw)
*.default file instead of having a default entries (since the default
can be different for different devices) inside the control files (this
way, the resetting of cgroup values for all the drm devices, can be
done by a simple 'cp'.)


Usage examples:
// set limit for card1 to 1GB
sed -i '2s/.*/1073741824/' /sys/fs/cgroup//drm.buffer.total.max

// set limit for card0 to 512MB
sed -i '1s/.*/536870912/' /sys/fs/cgroup//drm.buffer.total.max
   /** @file drm_gem.c
@@ -154,6 +156,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
obj->handle_count = 0;
obj->size = size;
drm_vma_node_reset(>vma_node);
+
+ obj->drmcgrp = get_drmcgrp(current);
+ drmcgrp_chg_bo_alloc(obj->drmcgrp, dev, size);

Why do the charging here?
There is no backing store yet for the buffer, so this is really tracking 
something akin to allowed virtual memory for GEM objects?
Is this really useful for an administrator to control?
Isn't the resource we want to control actually the physical backing store?

That's correct.  This is just the first level of control since the
backing store can be backed by different type of memory.  I am in the
process of adding at least two more resources.  Stay tuned.  I am
doing the charge here to enforce the idea of "creator is deemed owner"
at a place where the code is shared by all (the init function.)


+ while (i <= max_minor && limits != NULL) {
+ sval =  strsep(, "\n");
+ rc = kstrtoll(sval, 0, );

Input should be "KEY VALUE", so KEY will determine device to apply this to.
Also, per cgroups-v2 documentation of limits, I believe need to parse and handle the 
special "max" input value.

parse_resources() in rdma controller is example for both of above.

Please see my previous reply for the rationale of my hope to not need
a key.  I can certainly add handling of "max" and "default".



+void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
+ size_t size)

Shouldn't this return an error and be implemented with same semantics as the
try_charge() functions of other controllers?
Below will allow stats_total_allocated to overrun limits_total_allocated.

This is because I am charging the buffer at the init of the buffer
which does not fail so the "try" (drmcgrp_bo_can_allocate) is separate
and placed earlier and nearer other condition where gem object
allocation may fail.  In other words, there are multiple possibilities
for which gem allocation may fail (cgroup limit being one of them) and
satisfying cgroup limit does not mean a charge is needed.  I can
certainly combine the two functions to have an additional try_charge
semantic as well if that is really needed.

Regards,
Kenny

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list

Re: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

2019-05-16 Thread Christian König

Am 16.05.19 um 07:11 schrieb Yintian Tao:

PSP fw primary buffer is not used under SRIOV
Therefore, we don't need to allocate memory for it.

Signed-off-by: Yintian Tao 
Signed-off-by: Monk Liu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 22 +-
  1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index c567a55..d3c77d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -905,13 +905,16 @@ static int psp_load_fw(struct amdgpu_device *adev)
if (!psp->cmd)
return -ENOMEM;
  
-	ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,

-   AMDGPU_GEM_DOMAIN_GTT,
-   >fw_pri_bo,
-   >fw_pri_mc_addr,
-   >fw_pri_buf);
-   if (ret)
-   goto failed;
+   /* this fw pri bo is not used under SRIOV */
+   if (!amdgpu_sriov_vf(psp->adev)) {
+   ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
+ AMDGPU_GEM_DOMAIN_GTT,
+ >fw_pri_bo,
+ >fw_pri_mc_addr,
+ >fw_pri_buf);
+   if (ret)
+   goto failed;
+   }
  
  	ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE,

AMDGPU_GEM_DOMAIN_VRAM,
@@ -1012,8 +1015,9 @@ static int psp_hw_fini(void *handle)
psp_ring_destroy(psp, PSP_RING_TYPE__KM);
  
  	amdgpu_bo_free_kernel(>tmr_bo, >tmr_mc_addr, >tmr_buf);

-   amdgpu_bo_free_kernel(>fw_pri_bo,
- >fw_pri_mc_addr, >fw_pri_buf);
+   if (!amdgpu_sriov_vf(psp->adev))
+   amdgpu_bo_free_kernel(>fw_pri_bo,
+ >fw_pri_mc_addr, >fw_pri_buf);


This is superfluous, amdgpu_bo_free_kernel() does a NULL check anyway.

Apart from that looks good to me,
Christian.


amdgpu_bo_free_kernel(>fence_buf_bo,
  >fence_buf_mc_addr, >fence_buf);
amdgpu_bo_free_kernel(>asd_shared_bo, >asd_shared_mc_addr,


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

2019-05-16 Thread Koenig, Christian
Am 16.05.19 um 04:29 schrieb Kenny Ho:
> [CAUTION: External Email]
>
> On Wed, May 15, 2019 at 5:26 PM Welty, Brian  wrote:
>> On 5/9/2019 2:04 PM, Kenny Ho wrote:
>>> There are four control file types,
>>> stats (ro) - display current measured values for a resource
>>> max (rw) - limits for a resource
>>> default (ro, root cgroup only) - default values for a resource
>>> help (ro, root cgroup only) - help string for a resource
>>>
>>> Each file is multi-lined with one entry/line per drm device.
>> Multi-line is correct for multiple devices, but I believe you need
>> to use a KEY to denote device for both your set and get routines.
>> I didn't see your set functions reading a key, or the get functions
>> printing the key in output.
>> cgroups-v2 conventions mention using KEY of major:minor, but I think
>> you can use drm_minor as key?
> Given this controller is specific to the drm kernel subsystem which
> uses minor to identify drm device,

Wait a second, using the DRM minor is a good idea in the first place.

I have a test system with a Vega10 and a Vega20. Which device gets which 
minor is not stable, but rather defined by the scan order of the PCIe bus.

Normally the scan order is always the same, but adding or removing 
devices or delaying things just a little bit during init is enough to 
change this.

We need something like the Linux sysfs location or similar to have a 
stable implementation.

Regards,
Christian.

>   I don't see a need to complicate
> the interfaces more by having major and a key.  As you can see in the
> examples below, the drm device minor corresponds to the line number.
> I am not sure how strict cgroup upstream is about the convention but I
> am hoping there are flexibility here to allow for what I have
> implemented.  There are a couple of other things I have done that is
> not described in the convention: 1) inclusion of read-only *.help file
> at the root cgroup, 2) use read-only (which I can potentially make rw)
> *.default file instead of having a default entries (since the default
> can be different for different devices) inside the control files (this
> way, the resetting of cgroup values for all the drm devices, can be
> done by a simple 'cp'.)
>
>>> Usage examples:
>>> // set limit for card1 to 1GB
>>> sed -i '2s/.*/1073741824/' /sys/fs/cgroup//drm.buffer.total.max
>>>
>>> // set limit for card0 to 512MB
>>> sed -i '1s/.*/536870912/' /sys/fs/cgroup//drm.buffer.total.max
>
>>>   /** @file drm_gem.c
>>> @@ -154,6 +156,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
>>>obj->handle_count = 0;
>>>obj->size = size;
>>>drm_vma_node_reset(>vma_node);
>>> +
>>> + obj->drmcgrp = get_drmcgrp(current);
>>> + drmcgrp_chg_bo_alloc(obj->drmcgrp, dev, size);
>> Why do the charging here?
>> There is no backing store yet for the buffer, so this is really tracking 
>> something akin to allowed virtual memory for GEM objects?
>> Is this really useful for an administrator to control?
>> Isn't the resource we want to control actually the physical backing store?
> That's correct.  This is just the first level of control since the
> backing store can be backed by different type of memory.  I am in the
> process of adding at least two more resources.  Stay tuned.  I am
> doing the charge here to enforce the idea of "creator is deemed owner"
> at a place where the code is shared by all (the init function.)
>
>>> + while (i <= max_minor && limits != NULL) {
>>> + sval =  strsep(, "\n");
>>> + rc = kstrtoll(sval, 0, );
>> Input should be "KEY VALUE", so KEY will determine device to apply this to.
>> Also, per cgroups-v2 documentation of limits, I believe need to parse and 
>> handle the special "max" input value.
>>
>> parse_resources() in rdma controller is example for both of above.
> Please see my previous reply for the rationale of my hope to not need
> a key.  I can certainly add handling of "max" and "default".
>
>
>>> +void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
>>> + size_t size)
>> Shouldn't this return an error and be implemented with same semantics as the
>> try_charge() functions of other controllers?
>> Below will allow stats_total_allocated to overrun limits_total_allocated.
> This is because I am charging the buffer at the init of the buffer
> which does not fail so the "try" (drmcgrp_bo_can_allocate) is separate
> and placed earlier and nearer other condition where gem object
> allocation may fail.  In other words, there are multiple possibilities
> for which gem allocation may fail (cgroup limit being one of them) and
> satisfying cgroup limit does not mean a charge is needed.  I can
> certainly combine the two functions to have an additional try_charge
> semantic as well if that is really needed.
>
> Regards,
> Kenny

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org