RE: [PATCH] drm/amdgpu: cancel late_init_work before gpu reset
Reviewed-by: Feifei Xu -Original Message- From: amd-gfx On Behalf Of Pan, Xinhui Sent: Friday, May 17, 2019 11:04 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Huang, JinHuiEric Subject: [PATCH] drm/amdgpu: cancel late_init_work before gpu reset [CAUTION: External Email] gpu reset will run late_init and schedule the late_init_work. if we keep triggering gpu reset in a short time, there are potenial races. Signed-off-by: xinhui pan --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index ecb99aeab6cf..241cd2c75433 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3636,6 +3636,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, dev_info(adev->dev, "GPU reset begin!\n"); + cancel_delayed_work_sync(>late_init_work); + hive = amdgpu_get_xgmi_hive(adev, false); /* -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amdgpu: keep stolen memory on picasso
Reviewed-by: Feifei Xu -Original Message- From: amd-gfx On Behalf Of Cui, Flora Sent: Friday, May 17, 2019 11:48 AM To: amd-gfx@lists.freedesktop.org Cc: Cui, Flora Subject: [PATCH] drm/amdgpu: keep stolen memory on picasso [CAUTION: External Email] otherwise screen corrupts during modprobe. Change-Id: I73bcf3ab0c666077dfe85436a3457a0379382304 Signed-off-by: Flora Cui --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index c221570..5afbb59 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -626,6 +626,7 @@ static bool gmc_v9_0_keep_stolen_memory(struct amdgpu_device *adev) case CHIP_VEGA10: return true; case CHIP_RAVEN: + return (adev->pdev->device == 0x15d8); case CHIP_VEGA12: case CHIP_VEGA20: default: -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amd/powerplay: fix sw SMU wrong UVD/VCE powergate setting
Reviewed-by: Feifei Xu -Original Message- From: amd-gfx On Behalf Of Evan Quan Sent: Friday, May 17, 2019 1:45 PM To: amd-gfx@lists.freedesktop.org Cc: Quan, Evan Subject: [PATCH] drm/amd/powerplay: fix sw SMU wrong UVD/VCE powergate setting [CAUTION: External Email] The UVD/VCE bits are set wrongly. This causes the UVD/VCE clocks are not brought back correctly on needed. Change-Id: I6eda67ea3be45fd5f422cdb78356915bf06ff41e Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c index 0eea93c8dff7..a3a7afca7516 100644 --- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c +++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c @@ -1814,24 +1814,24 @@ static int smu_v11_0_update_od8_settings(struct smu_context *smu, static int smu_v11_0_dpm_set_uvd_enable(struct smu_context *smu, bool enable) { - if (!smu_feature_is_supported(smu, FEATURE_DPM_VCE_BIT)) + if (!smu_feature_is_supported(smu, FEATURE_DPM_UVD_BIT)) return 0; - if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_VCE_BIT)) + if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_UVD_BIT)) return 0; - return smu_feature_set_enabled(smu, FEATURE_DPM_VCE_BIT, enable); + return smu_feature_set_enabled(smu, FEATURE_DPM_UVD_BIT, + enable); } static int smu_v11_0_dpm_set_vce_enable(struct smu_context *smu, bool enable) { - if (!smu_feature_is_supported(smu, FEATURE_DPM_UVD_BIT)) + if (!smu_feature_is_supported(smu, FEATURE_DPM_VCE_BIT)) return 0; - if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_UVD_BIT)) + if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_VCE_BIT)) return 0; - return smu_feature_set_enabled(smu, FEATURE_DPM_UVD_BIT, enable); + return smu_feature_set_enabled(smu, FEATURE_DPM_VCE_BIT, + enable); } static int smu_v11_0_get_current_rpm(struct smu_context *smu, -- 2.21.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amd/powerplay: fix sw SMU wrong UVD/VCE powergate setting
The UVD/VCE bits are set wrongly. This causes the UVD/VCE clocks are not brought back correctly on needed. Change-Id: I6eda67ea3be45fd5f422cdb78356915bf06ff41e Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c index 0eea93c8dff7..a3a7afca7516 100644 --- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c +++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c @@ -1814,24 +1814,24 @@ static int smu_v11_0_update_od8_settings(struct smu_context *smu, static int smu_v11_0_dpm_set_uvd_enable(struct smu_context *smu, bool enable) { - if (!smu_feature_is_supported(smu, FEATURE_DPM_VCE_BIT)) + if (!smu_feature_is_supported(smu, FEATURE_DPM_UVD_BIT)) return 0; - if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_VCE_BIT)) + if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_UVD_BIT)) return 0; - return smu_feature_set_enabled(smu, FEATURE_DPM_VCE_BIT, enable); + return smu_feature_set_enabled(smu, FEATURE_DPM_UVD_BIT, enable); } static int smu_v11_0_dpm_set_vce_enable(struct smu_context *smu, bool enable) { - if (!smu_feature_is_supported(smu, FEATURE_DPM_UVD_BIT)) + if (!smu_feature_is_supported(smu, FEATURE_DPM_VCE_BIT)) return 0; - if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_UVD_BIT)) + if (enable == smu_feature_is_enabled(smu, FEATURE_DPM_VCE_BIT)) return 0; - return smu_feature_set_enabled(smu, FEATURE_DPM_UVD_BIT, enable); + return smu_feature_set_enabled(smu, FEATURE_DPM_VCE_BIT, enable); } static int smu_v11_0_get_current_rpm(struct smu_context *smu, -- 2.21.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: keep stolen memory on picasso
otherwise screen corrupts during modprobe. Change-Id: I73bcf3ab0c666077dfe85436a3457a0379382304 Signed-off-by: Flora Cui --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index c221570..5afbb59 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -626,6 +626,7 @@ static bool gmc_v9_0_keep_stolen_memory(struct amdgpu_device *adev) case CHIP_VEGA10: return true; case CHIP_RAVEN: + return (adev->pdev->device == 0x15d8); case CHIP_VEGA12: case CHIP_VEGA20: default: -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: cancel late_init_work before gpu reset
gpu reset will run late_init and schedule the late_init_work. if we keep triggering gpu reset in a short time, there are potenial races. Signed-off-by: xinhui pan --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index ecb99aeab6cf..241cd2c75433 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3636,6 +3636,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, dev_info(adev->dev, "GPU reset begin!\n"); + cancel_delayed_work_sync(>late_init_work); + hive = amdgpu_get_xgmi_hive(adev, false); /* -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV
Hi Christian I have modified it according to your suggestion. Can you help review this again? Thanks in advance. Best Regards Yintian Tao -Original Message- From: Yintian Tao Sent: Thursday, May 16, 2019 7:54 PM To: amd-gfx@lists.freedesktop.org Cc: Tao, Yintian ; Liu, Monk Subject: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV PSP fw primary buffer is not used under SRIOV. Therefore, we don't need to allocate memory for it. v2: remove superfluous check for amdgpu_bo_free_kernel(). Signed-off-by: Yintian Tao Signed-off-by: Monk Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index c567a55..af9835c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -905,13 +905,16 @@ static int psp_load_fw(struct amdgpu_device *adev) if (!psp->cmd) return -ENOMEM; - ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG, - AMDGPU_GEM_DOMAIN_GTT, - >fw_pri_bo, - >fw_pri_mc_addr, - >fw_pri_buf); - if (ret) - goto failed; + /* this fw pri bo is not used under SRIOV */ + if (!amdgpu_sriov_vf(psp->adev)) { + ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG, + AMDGPU_GEM_DOMAIN_GTT, + >fw_pri_bo, + >fw_pri_mc_addr, + >fw_pri_buf); + if (ret) + goto failed; + } ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE, AMDGPU_GEM_DOMAIN_VRAM, -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amdgpu: don't read DF register for SRIOV
Ping... Hi Christian and Alex Can you help review this? Thanks in advance. Best Regards Yintian Tao -Original Message- From: amd-gfx On Behalf Of Yintian Tao Sent: Thursday, May 16, 2019 8:11 PM To: amd-gfx@lists.freedesktop.org Cc: Liu, Monk ; Tao, Yintian Subject: [PATCH] drm/amdgpu: don't read DF register for SRIOV [CAUTION: External Email] Under SRIOV, reading DF register has chance to lead to AER error in host side, just skip reading it. Signed-off-by: Monk Liu Signed-off-by: Yintian Tao --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index a417763..b5bf9ed 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -837,7 +837,7 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev) if (amdgpu_emu_mode != 1) adev->gmc.vram_width = amdgpu_atomfirmware_get_vram_width(adev); - if (!adev->gmc.vram_width) { + if (!adev->gmc.vram_width && !amdgpu_sriov_vf(adev)) { /* hbm memory channel size */ if (adev->flags & AMD_IS_APU) chansize = 64; -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amdgpu: set correct vram_width for vega10 under sriov
Ping... Hi Christian and Alex Can you help review this? Thanks in advance. Best Regards Yintian Tao -Original Message- From: Yintian Tao Sent: Thursday, May 16, 2019 8:03 PM To: amd-gfx@lists.freedesktop.org Cc: Tao, Yintian ; Huang, Trigger Subject: [PATCH] drm/amdgpu: set correct vram_width for vega10 under sriov For Vega10 SR-IOV, vram_width can't be read from ATOM as RAVEN, and DF related registers is not readable, seems hardcord is the only way to set the correct vram_width Signed-off-by: Trigger Huang Signed-off-by: Yintian Tao --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index c221570..a417763 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -848,6 +848,13 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev) adev->gmc.vram_width = numchan * chansize; } + /* For Vega10 SR-IOV, vram_width can't be read from ATOM as RAVEN, +* and DF related registers is not readable, seems hardcord is the +* only way to set the correct vram_width */ + if (amdgpu_sriov_vf(adev) && (adev->asic_type == CHIP_VEGA10)) { + adev->gmc.vram_width = 2048; + } + /* size in MB on si */ adev->gmc.mc_vram_size = adev->nbio_funcs->get_memsize(adev) * 1024ULL * 1024ULL; -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 2/7] drm/dp_mst: Register AUX devices for MST ports
So a couple of things: On Thu, 2019-05-16 at 11:17 -0400, sunpeng...@amd.com wrote: > From: Ville Syrjälä > > All available downstream ports - physical and logical - are exposed for > each MST device. They are listed in /dev/, following the same naming > scheme as SST devices by appending an incremental ID. > > Although all downstream ports are exposed, only some will work as > expected. Consider the following topology: > >+-+ >| ASIC | >+-+ > Conn-0| > | >+v+ > +| MST HUB |+ > |+-+| > | | > |Port-1 Port-2| > +-v-+ +-v-+ > | MST | | SST | > | Display | | Display | > +---+ +---+ > |Port-1 > x > > MST Path | MST Device > --+-- > sst:0 | MST Hub > mst:0-1 | MST Display > mst:0-1-1 | MST Display's disconnected DP out > mst:0-1-8 | MST Display's internal sink > mst:0-2 | SST Display > > On certain MST displays, the upstream physical port will ACK DPCD reads. > However, reads on the local logical port to the internal sink will > *NAK*. i.e. reading mst:0-1 ACKs, but mst:0-1-8 NAKs. > > There may also be duplicates. Some displays will return the same GUID > when reading DPCD from both mst:0-1 and mst:0-1-8. > > There are some device-dependent behavior as well. The MST hub used > during testing will actually *ACK* read requests on a disconnected > physical port, whereas the MST displays will NAK. > > In light of these discrepancies, it's simpler to expose all downstream > ports - both physical and logical - and let the user decide what to use. > > Cc: Lyude Paul > Signed-off-by: Ville Syrjälä > Signed-off-by: Leo Li > --- > drivers/gpu/drm/drm_dp_aux_dev.c | 14 - > drivers/gpu/drm/drm_dp_mst_topology.c | 103 + > - > include/drm/drm_dp_helper.h | 4 ++ > include/drm/drm_dp_mst_helper.h | 6 ++ > 4 files changed, 112 insertions(+), 15 deletions(-) > > diff --git a/drivers/gpu/drm/drm_dp_aux_dev.c > b/drivers/gpu/drm/drm_dp_aux_dev.c > index 6d84611..01c02b9 100644 > --- a/drivers/gpu/drm/drm_dp_aux_dev.c > +++ b/drivers/gpu/drm/drm_dp_aux_dev.c > @@ -34,6 +34,7 @@ > #include > #include > #include > +#include > #include > #include > > @@ -114,6 +115,7 @@ static ssize_t name_show(struct device *dev, > > return res; > } > + > static DEVICE_ATTR_RO(name); > > static struct attribute *drm_dp_aux_attrs[] = { > @@ -160,7 +162,11 @@ static ssize_t auxdev_read_iter(struct kiocb *iocb, > struct iov_iter *to) > break; > } > > - res = drm_dp_dpcd_read(aux_dev->aux, pos, buf, todo); > + if (aux_dev->aux->is_remote) > + res = drm_dp_mst_dpcd_read(aux_dev->aux, pos, buf, > todo); > + else > + res = drm_dp_dpcd_read(aux_dev->aux, pos, buf, todo); > + It's still not at all clear to me why we're trying to avoid specifying actual callbacks for the aux device. We should remove this part entirely, remove the is_remote entry from struct drm_dp_aux, and then just specify our own hook in struct drm_dp_aux->transfer(). > if (res <= 0) > break; > > @@ -207,7 +213,11 @@ static ssize_t auxdev_write_iter(struct kiocb *iocb, > struct iov_iter *from) > break; > } > > - res = drm_dp_dpcd_write(aux_dev->aux, pos, buf, todo); > + if (aux_dev->aux->is_remote) > + res = drm_dp_mst_dpcd_write(aux_dev->aux, pos, buf, > todo); > + else > + res = drm_dp_dpcd_write(aux_dev->aux, pos, buf, todo); > + > if (res <= 0) > break; > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > b/drivers/gpu/drm/drm_dp_mst_topology.c > index 2ab16c9..54da68e 100644 > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > @@ -35,6 +35,8 @@ > #include > #include > > +#include "drm_crtc_helper_internal.h" > + Unless I'm mistaken, this looks like some sort of ditritus. > /** > * DOC: dp mst helper > * > @@ -52,6 +54,9 @@ static int drm_dp_dpcd_write_payload(struct > drm_dp_mst_topology_mgr *mgr, >int id, >struct drm_dp_payload *payload); > > +static int drm_dp_send_dpcd_read(struct drm_dp_mst_topology_mgr *mgr, > + struct drm_dp_mst_port *port, > + int offset, int size, u8 *bytes); > static int drm_dp_send_dpcd_write(struct drm_dp_mst_topology_mgr *mgr, > struct drm_dp_mst_port *port, >
Re: [PATCH 7/7] drm/nouveau: Use connector kdev as aux device parent
Reviewed-by: Lyude Paul On Thu, 2019-05-16 at 11:18 -0400, sunpeng...@amd.com wrote: > From: Leo Li > > Set the connector's kernel device as the parent for the aux kernel > device. This allows udev rules to access connector attributes when > creating symlinks to aux devices. > > Cc: Ben Skeggs > Cc: Lyude Paul > Signed-off-by: Leo Li > --- > drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c > b/drivers/gpu/drm/nouveau/nouveau_connector.c > index 3f463c9..738782a 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_connector.c > +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c > @@ -1345,7 +1345,7 @@ nouveau_connector_create(struct drm_device *dev, > break; > case DRM_MODE_CONNECTOR_DisplayPort: > case DRM_MODE_CONNECTOR_eDP: > - nv_connector->aux.dev = dev->dev; > + nv_connector->aux.dev = connector->kdev; > nv_connector->aux.transfer = nouveau_connector_aux_xfer; > snprintf(aux_name, sizeof(aux_name), "sor-%04x-%04x", >dcbe->hasht, dcbe->hashm); -- Cheers, Lyude Paul ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 3/7] drm/dp-mst: Use connector kdev as aux device parent
aghsorry, but I need to take back my Reviewed-by. Noticed an issue when reloading drm and i915. I'll explain it when I respond to patch 2/7 in a moment On Thu, 2019-05-16 at 11:17 -0400, sunpeng...@amd.com wrote: > From: Leo Li > > Placing the MST aux device as a child of the connector gives udev the > ability to access the connector device's attributes. This will come in > handy when writing udev rules to create more descriptive symlinks to the > MST aux devices. > > Cc: Ville Syrjälä > Cc: Lyude Paul > Signed-off-by: Leo Li > --- > drivers/gpu/drm/drm_dp_mst_topology.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > b/drivers/gpu/drm/drm_dp_mst_topology.c > index 54da68e..cd2f3c4 100644 > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > @@ -1269,6 +1269,9 @@ static void drm_dp_add_port(struct drm_dp_mst_branch > *mstb, > } > (*mstb->mgr->cbs->register_connector)(port->connector); > > + if (port->connector->registration_state == > DRM_CONNECTOR_REGISTERED) > + port->aux.dev = port->connector->kdev; > + > drm_dp_aux_register_devnode(>aux); > } > -- Cheers, Lyude Paul ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: cast to unsigned int for 32-bit portability
Without casting, 64-bit division is used implicitly causing DEPMOD failure when building 32-bit kernel. Signed-off-by: Slava Abramov --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index da1dc40b9b14..0499620ec972 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -764,8 +764,9 @@ static ssize_t amdgpu_ras_sysfs_badpages_read(struct file *f, struct amdgpu_device *adev = con->adev; const unsigned int element_size = sizeof("0xabcdabcd : 0x12345678 : R\n") - 1; - unsigned int start = (ppos + element_size - 1) / element_size; - unsigned int end = (ppos + count - 1) / element_size; + unsigned int start = + (unsigned int) (ppos + element_size - 1) / element_size; + unsigned int end = (unsigned int) (ppos + count - 1) / element_size; ssize_t s = 0; struct ras_badpage *bps = NULL; unsigned int bps_count = 0; -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: GPU passthrough support for Stoney [Radeon R2/R3/R4/R5 Graphics]?
On Thu, May 16, 2019 at 4:07 PM Micah Morton wrote: > > On Wed, May 15, 2019 at 7:19 PM Alex Deucher wrote: > > > > On Wed, May 15, 2019 at 2:26 PM Micah Morton wrote: > > > > > > Hi folks, > > > > > > I'm interested in running a VM on a system with an integrated Stoney > > > [Radeon R2/R3/R4/R5 Graphics] card and passing through the graphics > > > card to the VM using the IOMMU. I'm wondering whether this is feasible > > > and supposed to be doable with the right setup (as opposed to passing > > > a discrete GPU to the VM, which I think is definitely doable?). > > > > > > So far, I can do all the qemu/kvm/vfio/iommu stuff to run the VM and > > > pass the integrated GPU to it, but the drm driver in the VM fails > > > during amdgpu_device_init(). Specifically, the logs show the SMU being > > > unresponsive, which leads to a 'SMU firmware load failed' error > > > message and kernel panic. I can share VM logs and the invocation of > > > qemu and such if helpful, but first wanted to know at a high level if > > > this should be feasible? > > > > > > P.S.: I'm not initializing the GPU in the host bios or host kernel at > > > all, so I should be passing a fresh GPU to the VM. Also, I'm pretty > > > sure I'm running the correct VGA bios for this GPU in the guest VM > > > bios before guest boot. > > > > > > Any comments/suggestions would be appreciated! > > > > It should work in at least once as long as your vm is properly set up. > > Is there any reason running coreboot vs UEFI at host boot would make a > difference? I was running a modified version of coreboot that avoids > doing any GPU initialization in firmware -- so the first POST happens > inside the guest. The GPU on APUs shares a bunch of resources with the CPU. There are a bunch of blocks which are shared and need to be initialized on both for everything to work properly. > > > Note that the driver needs access to the vbios image in the guest to > > get device specific configuration details (clocks, display connector > > configuration, etc.). > > Is there anything I need to do to ensure this besides passing '-device > vfio-pci,...,romfile=/path/to/vgarom' to qemu? You need the actual vbios rom image from your system. The image is board specific. Alex ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 3/7] drm/dp-mst: Use connector kdev as aux device parent
On Thu, 2019-05-16 at 11:17 -0400, sunpeng...@amd.com wrote: > From: Leo Li > > Placing the MST aux device as a child of the connector gives udev the > ability to access the connector device's attributes. This will come in > handy when writing udev rules to create more descriptive symlinks to the > MST aux devices. > > Cc: Ville Syrjälä > Cc: Lyude Paul > Signed-off-by: Leo Li > --- > drivers/gpu/drm/drm_dp_mst_topology.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > b/drivers/gpu/drm/drm_dp_mst_topology.c > index 54da68e..cd2f3c4 100644 > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > @@ -1269,6 +1269,9 @@ static void drm_dp_add_port(struct drm_dp_mst_branch > *mstb, > } > (*mstb->mgr->cbs->register_connector)(port->connector); > > + if (port->connector->registration_state == > DRM_CONNECTOR_REGISTERED) > + port->aux.dev = port->connector->kdev; > + Line wrapping please! Probably worth running checkpatch on all of these patches as well. With that nitpick addressed: Reviewed-by: Lyude Paul > drm_dp_aux_register_devnode(>aux); > } > -- Cheers, Lyude Paul ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 0/7] Add DP MST AUX devices
On 2019-05-16 3:54 p.m., Lyude Paul wrote: > [CAUTION: External Email] > > Hi, could we (and for future versions of this series and others) get a respin > of this that's actually rebased against drm-tip? That is the defacto standard > branch to do development on, and otherwise anyone trying to test these patches > has to resolve merge conflicts (along with maintainers). The branch this > appears to be based off of doesn't even have the new kref scheme for branch > devices and ports. > Sorry, this was laziness on my part :) Rebasing this now. Leo > On Thu, 2019-05-16 at 11:17 -0400, sunpeng...@amd.com wrote: >> From: Leo Li >> >> This series adds support for MST AUX devices. >> >> A more descriptive 'mstpath' attribute is also added to MST connector >> devices. In addition, the DP aux device is made to be a child of the >> corresponding connector's device where possible (*). This allows udev >> rules to provide descriptive symlinks to the AUX devices. >> >> (*) This can only be done on drivers that register their connector and >> aux devices via drm_connector_register() and drm_dp_aux_register() >> respectively. The connector also needs to be registered before the aux >> device. >> >> Leo Li (6): >>drm/dp: Use non-cyclic idr >>drm/dp-mst: Use connector kdev as aux device parent >>drm/sysfs: Add mstpath attribute to connector devices >>drm/amd/display: Use connector kdev as aux device parent >>drm/bridge/analogix-anx78xx: Use connector kdev as aux device parent >>drm/nouveau: Use connector kdev as aux device parent >> >> Ville Syrjälä (1): >>drm/dp_mst: Register AUX devices for MST ports >> >> .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c| 2 +- >> drivers/gpu/drm/bridge/analogix-anx78xx.c | 22 ++--- >> drivers/gpu/drm/drm_dp_aux_dev.c | 17 +++- >> drivers/gpu/drm/drm_dp_mst_topology.c | 106 >> ++--- >> drivers/gpu/drm/drm_sysfs.c| 23 + >> drivers/gpu/drm/nouveau/nouveau_connector.c| 2 +- >> include/drm/drm_dp_helper.h| 4 + >> include/drm/drm_dp_mst_helper.h| 6 ++ >> 8 files changed, 152 insertions(+), 30 deletions(-) >> > -- > Cheers, > Lyude Paul > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 0/7] Add DP MST AUX devices
Whoops-one more thing I forgot to mention. This is just personal preference for me, but if you're ccing me on any of the patches in the series feel free to just do it for all of them. Makes my inbox a little less confusing to look at On Thu, 2019-05-16 at 15:54 -0400, Lyude Paul wrote: > Hi, could we (and for future versions of this series and others) get a > respin > of this that's actually rebased against drm-tip? That is the defacto > standard > branch to do development on, and otherwise anyone trying to test these > patches > has to resolve merge conflicts (along with maintainers). The branch this > appears to be based off of doesn't even have the new kref scheme for branch > devices and ports. > > On Thu, 2019-05-16 at 11:17 -0400, sunpeng...@amd.com wrote: > > From: Leo Li > > > > This series adds support for MST AUX devices. > > > > A more descriptive 'mstpath' attribute is also added to MST connector > > devices. In addition, the DP aux device is made to be a child of the > > corresponding connector's device where possible (*). This allows udev > > rules to provide descriptive symlinks to the AUX devices. > > > > (*) This can only be done on drivers that register their connector and > > aux devices via drm_connector_register() and drm_dp_aux_register() > > respectively. The connector also needs to be registered before the aux > > device. > > > > Leo Li (6): > > drm/dp: Use non-cyclic idr > > drm/dp-mst: Use connector kdev as aux device parent > > drm/sysfs: Add mstpath attribute to connector devices > > drm/amd/display: Use connector kdev as aux device parent > > drm/bridge/analogix-anx78xx: Use connector kdev as aux device parent > > drm/nouveau: Use connector kdev as aux device parent > > > > Ville Syrjälä (1): > > drm/dp_mst: Register AUX devices for MST ports > > > > .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c| 2 +- > > drivers/gpu/drm/bridge/analogix-anx78xx.c | 22 ++--- > > drivers/gpu/drm/drm_dp_aux_dev.c | 17 +++- > > drivers/gpu/drm/drm_dp_mst_topology.c | 106 > > ++--- > > drivers/gpu/drm/drm_sysfs.c| 23 + > > drivers/gpu/drm/nouveau/nouveau_connector.c| 2 +- > > include/drm/drm_dp_helper.h| 4 + > > include/drm/drm_dp_mst_helper.h| 6 ++ > > 8 files changed, 152 insertions(+), 30 deletions(-) > > -- Cheers, Lyude Paul ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: GPU passthrough support for Stoney [Radeon R2/R3/R4/R5 Graphics]?
On Wed, May 15, 2019 at 7:19 PM Alex Deucher wrote: > > On Wed, May 15, 2019 at 2:26 PM Micah Morton wrote: > > > > Hi folks, > > > > I'm interested in running a VM on a system with an integrated Stoney > > [Radeon R2/R3/R4/R5 Graphics] card and passing through the graphics > > card to the VM using the IOMMU. I'm wondering whether this is feasible > > and supposed to be doable with the right setup (as opposed to passing > > a discrete GPU to the VM, which I think is definitely doable?). > > > > So far, I can do all the qemu/kvm/vfio/iommu stuff to run the VM and > > pass the integrated GPU to it, but the drm driver in the VM fails > > during amdgpu_device_init(). Specifically, the logs show the SMU being > > unresponsive, which leads to a 'SMU firmware load failed' error > > message and kernel panic. I can share VM logs and the invocation of > > qemu and such if helpful, but first wanted to know at a high level if > > this should be feasible? > > > > P.S.: I'm not initializing the GPU in the host bios or host kernel at > > all, so I should be passing a fresh GPU to the VM. Also, I'm pretty > > sure I'm running the correct VGA bios for this GPU in the guest VM > > bios before guest boot. > > > > Any comments/suggestions would be appreciated! > > It should work in at least once as long as your vm is properly set up. Is there any reason running coreboot vs UEFI at host boot would make a difference? I was running a modified version of coreboot that avoids doing any GPU initialization in firmware -- so the first POST happens inside the guest. > Note that the driver needs access to the vbios image in the guest to > get device specific configuration details (clocks, display connector > configuration, etc.). Is there anything I need to do to ensure this besides passing '-device vfio-pci,...,romfile=/path/to/vgarom' to qemu? > > Alex ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 0/7] Add DP MST AUX devices
Hi, could we (and for future versions of this series and others) get a respin of this that's actually rebased against drm-tip? That is the defacto standard branch to do development on, and otherwise anyone trying to test these patches has to resolve merge conflicts (along with maintainers). The branch this appears to be based off of doesn't even have the new kref scheme for branch devices and ports. On Thu, 2019-05-16 at 11:17 -0400, sunpeng...@amd.com wrote: > From: Leo Li > > This series adds support for MST AUX devices. > > A more descriptive 'mstpath' attribute is also added to MST connector > devices. In addition, the DP aux device is made to be a child of the > corresponding connector's device where possible (*). This allows udev > rules to provide descriptive symlinks to the AUX devices. > > (*) This can only be done on drivers that register their connector and > aux devices via drm_connector_register() and drm_dp_aux_register() > respectively. The connector also needs to be registered before the aux > device. > > Leo Li (6): > drm/dp: Use non-cyclic idr > drm/dp-mst: Use connector kdev as aux device parent > drm/sysfs: Add mstpath attribute to connector devices > drm/amd/display: Use connector kdev as aux device parent > drm/bridge/analogix-anx78xx: Use connector kdev as aux device parent > drm/nouveau: Use connector kdev as aux device parent > > Ville Syrjälä (1): > drm/dp_mst: Register AUX devices for MST ports > > .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c| 2 +- > drivers/gpu/drm/bridge/analogix-anx78xx.c | 22 ++--- > drivers/gpu/drm/drm_dp_aux_dev.c | 17 +++- > drivers/gpu/drm/drm_dp_mst_topology.c | 106 > ++--- > drivers/gpu/drm/drm_sysfs.c| 23 + > drivers/gpu/drm/nouveau/nouveau_connector.c| 2 +- > include/drm/drm_dp_helper.h| 4 + > include/drm/drm_dp_mst_helper.h| 6 ++ > 8 files changed, 152 insertions(+), 30 deletions(-) > -- Cheers, Lyude Paul ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: Hard lockups with ROCM
Hi Daniel, On 2019-05-12 9:44 p.m., Daniel Kasak wrote: > [CAUTION: External Email] > Hi all. I had version 2.2.0 of the ROCM stack running on a 5.0.x and > 5.1.0 kernel. Things were going great with various boinc GPU tasks. > But there is a setiathome GPU task which reliably gives me a hard > lockup within about 30 minutes of running. I actually had to do *two* > emergency re-installs over the past week. Sorry to hear about your trouble. Do you have a second computer you can use to remote login into your system? Chances are that it's still responsive and only the screen is frozen. Also, you could try booting in console mode (without an xserver). The console usually still works even when the GPU compute units or SDMA engines are hanging. If you manage to do an emergency reboot with sysrq (remount-RO and reboot), you should see the kernel log of your previous session in /var/log. On Ubuntu it's in /var/log/kern.log. Not sure where it is on Gentoo. There is a good chance the log contains helpful information (e.g. if the driver detected a hang but failed to reset the GPU, or maybe a driver bug that leads to a deadlock or kernel panic). > Perhaps part of this was my fault ( running btrfs with lzo compression > on my root partition ... ). But absolutely part of this was the hard > lockups. I've tested all kinds of other things ( eg rebuilding lots of > stuff under Gentoo ) ... I don't have a general stability issue even > under hours of high load. But after restarting boinc with that same > setiathome task ... ! > > If someone wants me to sacrifice another installation, they can point > me to instructions for trying to gather more information. If you want to risk another installation, it may be a good idea to do it on a spare hard drive, or a spare partition on your existing hard drive. Also, use a more conventional choice of file system. A simple ext4 is pretty robust in my experience. We get hard lockups all the time. I usually only reinstall my system for big OS upgrades or if I'm stupid and mess something up myself. Which GPU are you using? There are some things you could try to narrow down the cause of your problem. 1. Monitor GPU temperature while running setiathome 2. If you're building your own kernel, enable some helpful kernel debug options that can provide very helpful diagnostic info: lock debugging, memory debugging, lockup/hang debugging 3. Try running with lower GPU clocks (rocm-smi --setperflevel low). If that fixes it, you may have inadequate cooling or power supply 4. Try running in console mode (without Xserver or other graphical UI running). If that fixes it, there may be a bad interaction between graphics and compute 5. Try updating your firmware. The DKMS package included in our ROCm releases includes the latest firmware. You should be able to extract it from there and drop it into /lib/firmware/amdgpu 6. Try to find a regression point. Is there any known version of ROCm or the kernel where it worked correctly? Regards, Felix > > Anyway ... perhaps more work around detecting and recovering from GPU > lockups is in order? > > Dan > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 5/7] drm/amd/display: Use connector kdev as aux device parent
On 5/16/19 11:18 AM, sunpeng...@amd.com wrote: > > From: Leo Li > > Set the connector's kernel device as the parent for the aux kernel > device. This allows udev rules to access connector attributes when > creating symlinks to aux devices. > > For example, the following udev rule: > > SUBSYSTEM=="drm_dp_aux_dev", SUBSYSTEMS=="drm", ATTRS{edid}=="*", > SYMLINK+="drm_dp_aux/by-name/$id" > > Will create the following symlinks using the connector's name: > > $ ls /dev/drm_dp_aux/by-name/ > card0-DP-1 card0-DP-2 card0-DP-3 > > Cc: Ville Syrjälä > Cc: Lyude Paul > Cc: Nicholas Kazlauskas > Cc: Harry Wentland > Cc: Jerry (Fangzhi) Zuo > Signed-off-by: Leo Li Reviewed-by: Nicholas Kazlauskas No idea why it wasn't like this in the first place. Nicholas Kazlauskas > --- > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c > index a6f44a4..083fb97 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c > @@ -385,7 +385,7 @@ void amdgpu_dm_initialize_dp_connector(struct > amdgpu_display_manager *dm, > struct amdgpu_dm_connector > *aconnector) > { > aconnector->dm_dp_aux.aux.name = "dmdc"; > - aconnector->dm_dp_aux.aux.dev = dm->adev->dev; > + aconnector->dm_dp_aux.aux.dev = aconnector->base.kdev; > aconnector->dm_dp_aux.aux.transfer = dm_dp_aux_transfer; > aconnector->dm_dp_aux.ddc_service = aconnector->dc_link->ddc; > > -- > 2.7.4 > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: Hard lockups with ROCM
Dear Daniel, On 05/16/2019 01:52 PM, Daniel Kasak wrote: > On Thu, May 16, 2019 at 11:43 AM Alex Deucher wrote: > >> On Wed, May 15, 2019 at 8:33 PM Daniel Kasak >> wrote: >>> >>> On Mon, May 13, 2019 at 11:44 AM Daniel Kasak >> wrote: Hi all. I had version 2.2.0 of the ROCM stack running on a 5.0.x and >> 5.1.0 kernel. Things were going great with various boinc GPU tasks. But >> there is a setiathome GPU task which reliably gives me a hard lockup within >> about 30 minutes of running. I actually had to do *two* emergency >> re-installs over the past week. Perhaps part of this was my fault ( running >> btrfs with lzo compression on my root partition ... ). But absolutely part >> of this was the hard lockups. I've tested all kinds of other things ( eg >> rebuilding lots of stuff under Gentoo ) ... I don't have a general >> stability issue even under hours of high load. But after restarting boinc >> with that same setiathome task ... ! If someone wants me to sacrifice another installation, they can point >> me to instructions for trying to gather more information. Anyway ... perhaps more work around detecting and recovering from GPU >> lockups is in order? >>> >>> >>> That's what I was afraid of :( >> >> Not sure what you were afraid of. I don't think anyone has looked at >> setiathome on ROCm. I'd suggest filing a bug >> (https://bugs.freedesktop.org) and attaching your dmesg output and >> xorg log (if using X). If there is a GPU reset, note that you will >> need to restart your desktop environment because currently neither >> glamor or any compositors support GL robustness extensions to reset >> their contexts after a GPU reset. > Hi Alex. dmesg output is not available ... this is a *hard* lockup. I need > to power-cycle after it happens ( ALT + SysRq + { S , U , B } doesn't even > work ). That's why I asked for instructions to possibly gather more info. I > did check the xorg log after I did an emergency export of my filesystem ... > nothing of interest in there. It seems like I currently don't really have > enough info to make a bug report worthwhile. Does your board have a serial port? If yes, please use the serial console to gather the messages on another system. Sometimes the netconsole [1] is also supposed to be able to send the last Linux messages out. Kind regards, Paul [1]: https://www.kernel.org/doc/Documentation/networking/netconsole.txt smime.p7s Description: S/MIME Cryptographic Signature ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 6/7] drm/bridge/analogix-anx78xx: Use connector kdev as aux device parent
From: Leo Li Set the connector's kernel device as the parent for the aux kernel device. This allows udev rules to access connector attributes when creating symlinks to aux devices. To do so, the connector needs to be registered beforehand. Therefore, shift aux registration to be after connector registration. Cc: Enric Balletbo i Serra Cc: Nicolas Boichat Signed-off-by: Leo Li --- drivers/gpu/drm/bridge/analogix-anx78xx.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/bridge/analogix-anx78xx.c b/drivers/gpu/drm/bridge/analogix-anx78xx.c index f8433c9..9fc8b4c 100644 --- a/drivers/gpu/drm/bridge/analogix-anx78xx.c +++ b/drivers/gpu/drm/bridge/analogix-anx78xx.c @@ -1018,17 +1018,6 @@ static int anx78xx_bridge_attach(struct drm_bridge *bridge) return -ENODEV; } - /* Register aux channel */ - anx78xx->aux.name = "DP-AUX"; - anx78xx->aux.dev = >client->dev; - anx78xx->aux.transfer = anx78xx_aux_transfer; - - err = drm_dp_aux_register(>aux); - if (err < 0) { - DRM_ERROR("Failed to register aux channel: %d\n", err); - return err; - } - err = drm_connector_init(bridge->dev, >connector, _connector_funcs, DRM_MODE_CONNECTOR_DisplayPort); @@ -1048,6 +1037,17 @@ static int anx78xx_bridge_attach(struct drm_bridge *bridge) anx78xx->connector.polled = DRM_CONNECTOR_POLL_HPD; + /* Register aux channel */ + anx78xx->aux.name = "DP-AUX"; + anx78xx->aux.dev = >connector.kdev; + anx78xx->aux.transfer = anx78xx_aux_transfer; + + err = drm_dp_aux_register(>aux); + if (err < 0) { + DRM_ERROR("Failed to register aux channel: %d\n", err); + return err; + } + err = drm_connector_attach_encoder(>connector, bridge->encoder); if (err) { -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 4/7] drm/sysfs: Add mstpath attribute to connector devices
From: Leo Li This can be used to create more descriptive symlinks for MST aux devices. Consider the following udev rule: SUBSYSTEM=="drm_dp_aux_dev", SUBSYSTEMS=="drm", ATTRS{mstpath}=="?*", SYMLINK+="drm_dp_aux/by-path/$attr{mstpath}" The following symlinks will be created (depending on your MST topology): $ ls /dev/drm_dp_aux/by-path/ card0-mst:0-1 card0-mst:0-1-1 card0-mst:0-1-8 card0-mst:0-8 Cc: Ville Syrjälä Cc: Lyude Paul Signed-off-by: Leo Li --- drivers/gpu/drm/drm_sysfs.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c index ecb7b33..e000e0c 100644 --- a/drivers/gpu/drm/drm_sysfs.c +++ b/drivers/gpu/drm/drm_sysfs.c @@ -229,16 +229,39 @@ static ssize_t modes_show(struct device *device, return written; } +static ssize_t mstpath_show(struct device *device, + struct device_attribute *attr, + char *buf) +{ + struct drm_connector *connector = to_drm_connector(device); + ssize_t ret = 0; + char *path; + + mutex_lock(>dev->mode_config.mutex); + if (!connector->path_blob_ptr) + goto unlock; + + path = connector->path_blob_ptr->data; + ret = snprintf(buf, PAGE_SIZE, "card%d-%s\n", + connector->dev->primary->index, path); + +unlock: + mutex_unlock(>dev->mode_config.mutex); + return ret; +} + static DEVICE_ATTR_RW(status); static DEVICE_ATTR_RO(enabled); static DEVICE_ATTR_RO(dpms); static DEVICE_ATTR_RO(modes); +static DEVICE_ATTR_RO(mstpath); static struct attribute *connector_dev_attrs[] = { _attr_status.attr, _attr_enabled.attr, _attr_dpms.attr, _attr_modes.attr, + _attr_mstpath.attr, NULL }; -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 1/7] drm/dp: Use non-cyclic idr
From: Leo Li In preparation for adding aux devices for DP MST, make the IDR non-cyclic. That way, hotplug cycling MST devices won't needlessly increment the minor version index. Signed-off-by: Leo Li Reviewed-by: Lyude Paul Reviewed-by: Ville Syrjälä --- drivers/gpu/drm/drm_dp_aux_dev.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_dp_aux_dev.c b/drivers/gpu/drm/drm_dp_aux_dev.c index 0e4f25d..6d84611 100644 --- a/drivers/gpu/drm/drm_dp_aux_dev.c +++ b/drivers/gpu/drm/drm_dp_aux_dev.c @@ -80,8 +80,7 @@ static struct drm_dp_aux_dev *alloc_drm_dp_aux_dev(struct drm_dp_aux *aux) kref_init(_dev->refcount); mutex_lock(_idr_mutex); - index = idr_alloc_cyclic(_idr, aux_dev, 0, DRM_AUX_MINORS, -GFP_KERNEL); + index = idr_alloc(_idr, aux_dev, 0, DRM_AUX_MINORS, GFP_KERNEL); mutex_unlock(_idr_mutex); if (index < 0) { kfree(aux_dev); -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 5/7] drm/amd/display: Use connector kdev as aux device parent
From: Leo Li Set the connector's kernel device as the parent for the aux kernel device. This allows udev rules to access connector attributes when creating symlinks to aux devices. For example, the following udev rule: SUBSYSTEM=="drm_dp_aux_dev", SUBSYSTEMS=="drm", ATTRS{edid}=="*", SYMLINK+="drm_dp_aux/by-name/$id" Will create the following symlinks using the connector's name: $ ls /dev/drm_dp_aux/by-name/ card0-DP-1 card0-DP-2 card0-DP-3 Cc: Ville Syrjälä Cc: Lyude Paul Cc: Nicholas Kazlauskas Cc: Harry Wentland Cc: Jerry (Fangzhi) Zuo Signed-off-by: Leo Li --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c index a6f44a4..083fb97 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c @@ -385,7 +385,7 @@ void amdgpu_dm_initialize_dp_connector(struct amdgpu_display_manager *dm, struct amdgpu_dm_connector *aconnector) { aconnector->dm_dp_aux.aux.name = "dmdc"; - aconnector->dm_dp_aux.aux.dev = dm->adev->dev; + aconnector->dm_dp_aux.aux.dev = aconnector->base.kdev; aconnector->dm_dp_aux.aux.transfer = dm_dp_aux_transfer; aconnector->dm_dp_aux.ddc_service = aconnector->dc_link->ddc; -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 7/7] drm/nouveau: Use connector kdev as aux device parent
From: Leo Li Set the connector's kernel device as the parent for the aux kernel device. This allows udev rules to access connector attributes when creating symlinks to aux devices. Cc: Ben Skeggs Cc: Lyude Paul Signed-off-by: Leo Li --- drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c index 3f463c9..738782a 100644 --- a/drivers/gpu/drm/nouveau/nouveau_connector.c +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c @@ -1345,7 +1345,7 @@ nouveau_connector_create(struct drm_device *dev, break; case DRM_MODE_CONNECTOR_DisplayPort: case DRM_MODE_CONNECTOR_eDP: - nv_connector->aux.dev = dev->dev; + nv_connector->aux.dev = connector->kdev; nv_connector->aux.transfer = nouveau_connector_aux_xfer; snprintf(aux_name, sizeof(aux_name), "sor-%04x-%04x", dcbe->hasht, dcbe->hashm); -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 3/7] drm/dp-mst: Use connector kdev as aux device parent
From: Leo Li Placing the MST aux device as a child of the connector gives udev the ability to access the connector device's attributes. This will come in handy when writing udev rules to create more descriptive symlinks to the MST aux devices. Cc: Ville Syrjälä Cc: Lyude Paul Signed-off-by: Leo Li --- drivers/gpu/drm/drm_dp_mst_topology.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 54da68e..cd2f3c4 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -1269,6 +1269,9 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb, } (*mstb->mgr->cbs->register_connector)(port->connector); + if (port->connector->registration_state == DRM_CONNECTOR_REGISTERED) + port->aux.dev = port->connector->kdev; + drm_dp_aux_register_devnode(>aux); } -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 0/7] Add DP MST AUX devices
From: Leo Li This series adds support for MST AUX devices. A more descriptive 'mstpath' attribute is also added to MST connector devices. In addition, the DP aux device is made to be a child of the corresponding connector's device where possible (*). This allows udev rules to provide descriptive symlinks to the AUX devices. (*) This can only be done on drivers that register their connector and aux devices via drm_connector_register() and drm_dp_aux_register() respectively. The connector also needs to be registered before the aux device. Leo Li (6): drm/dp: Use non-cyclic idr drm/dp-mst: Use connector kdev as aux device parent drm/sysfs: Add mstpath attribute to connector devices drm/amd/display: Use connector kdev as aux device parent drm/bridge/analogix-anx78xx: Use connector kdev as aux device parent drm/nouveau: Use connector kdev as aux device parent Ville Syrjälä (1): drm/dp_mst: Register AUX devices for MST ports .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c| 2 +- drivers/gpu/drm/bridge/analogix-anx78xx.c | 22 ++--- drivers/gpu/drm/drm_dp_aux_dev.c | 17 +++- drivers/gpu/drm/drm_dp_mst_topology.c | 106 ++--- drivers/gpu/drm/drm_sysfs.c| 23 + drivers/gpu/drm/nouveau/nouveau_connector.c| 2 +- include/drm/drm_dp_helper.h| 4 + include/drm/drm_dp_mst_helper.h| 6 ++ 8 files changed, 152 insertions(+), 30 deletions(-) -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 2/7] drm/dp_mst: Register AUX devices for MST ports
From: Ville Syrjälä All available downstream ports - physical and logical - are exposed for each MST device. They are listed in /dev/, following the same naming scheme as SST devices by appending an incremental ID. Although all downstream ports are exposed, only some will work as expected. Consider the following topology: +-+ | ASIC | +-+ Conn-0| | +v+ +| MST HUB |+ |+-+| | | |Port-1 Port-2| +-v-+ +-v-+ | MST | | SST | | Display | | Display | +---+ +---+ |Port-1 x MST Path | MST Device --+-- sst:0 | MST Hub mst:0-1 | MST Display mst:0-1-1 | MST Display's disconnected DP out mst:0-1-8 | MST Display's internal sink mst:0-2 | SST Display On certain MST displays, the upstream physical port will ACK DPCD reads. However, reads on the local logical port to the internal sink will *NAK*. i.e. reading mst:0-1 ACKs, but mst:0-1-8 NAKs. There may also be duplicates. Some displays will return the same GUID when reading DPCD from both mst:0-1 and mst:0-1-8. There are some device-dependent behavior as well. The MST hub used during testing will actually *ACK* read requests on a disconnected physical port, whereas the MST displays will NAK. In light of these discrepancies, it's simpler to expose all downstream ports - both physical and logical - and let the user decide what to use. Cc: Lyude Paul Signed-off-by: Ville Syrjälä Signed-off-by: Leo Li --- drivers/gpu/drm/drm_dp_aux_dev.c | 14 - drivers/gpu/drm/drm_dp_mst_topology.c | 103 +- include/drm/drm_dp_helper.h | 4 ++ include/drm/drm_dp_mst_helper.h | 6 ++ 4 files changed, 112 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/drm_dp_aux_dev.c b/drivers/gpu/drm/drm_dp_aux_dev.c index 6d84611..01c02b9 100644 --- a/drivers/gpu/drm/drm_dp_aux_dev.c +++ b/drivers/gpu/drm/drm_dp_aux_dev.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include @@ -114,6 +115,7 @@ static ssize_t name_show(struct device *dev, return res; } + static DEVICE_ATTR_RO(name); static struct attribute *drm_dp_aux_attrs[] = { @@ -160,7 +162,11 @@ static ssize_t auxdev_read_iter(struct kiocb *iocb, struct iov_iter *to) break; } - res = drm_dp_dpcd_read(aux_dev->aux, pos, buf, todo); + if (aux_dev->aux->is_remote) + res = drm_dp_mst_dpcd_read(aux_dev->aux, pos, buf, todo); + else + res = drm_dp_dpcd_read(aux_dev->aux, pos, buf, todo); + if (res <= 0) break; @@ -207,7 +213,11 @@ static ssize_t auxdev_write_iter(struct kiocb *iocb, struct iov_iter *from) break; } - res = drm_dp_dpcd_write(aux_dev->aux, pos, buf, todo); + if (aux_dev->aux->is_remote) + res = drm_dp_mst_dpcd_write(aux_dev->aux, pos, buf, todo); + else + res = drm_dp_dpcd_write(aux_dev->aux, pos, buf, todo); + if (res <= 0) break; diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 2ab16c9..54da68e 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -35,6 +35,8 @@ #include #include +#include "drm_crtc_helper_internal.h" + /** * DOC: dp mst helper * @@ -52,6 +54,9 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr, int id, struct drm_dp_payload *payload); +static int drm_dp_send_dpcd_read(struct drm_dp_mst_topology_mgr *mgr, +struct drm_dp_mst_port *port, +int offset, int size, u8 *bytes); static int drm_dp_send_dpcd_write(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port, int offset, int size, u8 *bytes); @@ -941,6 +946,8 @@ static void drm_dp_destroy_port(struct kref *kref) struct drm_dp_mst_topology_mgr *mgr = port->mgr; if (!port->input) { + drm_dp_aux_unregister_devnode(>aux); + port->vcpi.num_slots = 0; kfree(port->cached_edid); @@ -1095,6 +1102,46 @@ static bool drm_dp_port_setup_pdt(struct drm_dp_mst_port *port) return send_link; } +/** + * drm_dp_mst_dpcd_read() - read a series of bytes from the DPCD via sideband + * @aux: Fake sideband AUX CH + * @offset: address of the
Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit
On Thu, May 16, 2019 at 10:10 AM Tejun Heo wrote: > I haven't gone through the patchset yet but some quick comments. > > On Wed, May 15, 2019 at 10:29:21PM -0400, Kenny Ho wrote: > > Given this controller is specific to the drm kernel subsystem which > > uses minor to identify drm device, I don't see a need to complicate > > the interfaces more by having major and a key. As you can see in the > > examples below, the drm device minor corresponds to the line number. > > I am not sure how strict cgroup upstream is about the convention but I > > We're pretty strict. > > > am hoping there are flexibility here to allow for what I have > > implemented. There are a couple of other things I have done that is > > So, please follow the interface conventions. We can definitely add > new ones but that would need functional reasons. > > > not described in the convention: 1) inclusion of read-only *.help file > > at the root cgroup, 2) use read-only (which I can potentially make rw) > > *.default file instead of having a default entries (since the default > > can be different for different devices) inside the control files (this > > way, the resetting of cgroup values for all the drm devices, can be > > done by a simple 'cp'.) > > Again, please follow the existing conventions. There's a lot more > harm than good in every controller being creative in their own way. > It's trivial to build convenience features in userspace. Please do it > there. I can certainly remove the ro *.help file and leave the documentation to Documentation/, but for the *.default I do have a functional reason to it. As far as I can tell from the convention, the default is per cgroup and there is no way to describe per device default. Although, perhaps we are talking about two different kinds of defaults. Anyway, I can leave the discussion to a more detailed review. Regards, Kenny ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit
On Thu, May 16, 2019 at 10:12 AM Christian König wrote: > Am 16.05.19 um 16:03 schrieb Kenny Ho: > > On Thu, May 16, 2019 at 3:25 AM Christian König > > wrote: > >> Am 16.05.19 um 09:16 schrieb Koenig, Christian: > >> We need something like the Linux sysfs location or similar to have a > >> stable implementation. > > I get that, which is why I don't use minor to identify cards in user > > space apps I wrote: > > https://github.com/RadeonOpenCompute/k8s-device-plugin/blob/c2659c9d1d0713cad36fb5256681125121e6e32f/internal/pkg/amdgpu/amdgpu.go#L85 > > Yeah, that is certainly a possibility. > > > But within the kernel, I think my use of minor is consistent with the > > rest of the drm subsystem. I hope I don't need to reform the way the > > drm subsystem use minor in order to introduce a cgroup controller. > > Well I would try to avoid using the minor and at least look for > alternatives. E.g. what does udev uses to identify the devices for > example? And IIRC we have something like a "device-name" in the kernel > as well (what's printed in the logs). > > The minimum we need to do is get away from the minor=linenum approach, > cause as Daniel pointed out the minor allocation is quite a mess and not > necessary contiguous. I noticed :) but looks like there isn't much of a choice from what Tejun/cgroup replied about convention. Regards, Kenny ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit
Am 16.05.19 um 16:03 schrieb Kenny Ho: On Thu, May 16, 2019 at 3:25 AM Christian König wrote: Am 16.05.19 um 09:16 schrieb Koenig, Christian: Am 16.05.19 um 04:29 schrieb Kenny Ho: On Wed, May 15, 2019 at 5:26 PM Welty, Brian wrote: On 5/9/2019 2:04 PM, Kenny Ho wrote: Each file is multi-lined with one entry/line per drm device. Multi-line is correct for multiple devices, but I believe you need to use a KEY to denote device for both your set and get routines. I didn't see your set functions reading a key, or the get functions printing the key in output. cgroups-v2 conventions mention using KEY of major:minor, but I think you can use drm_minor as key? Given this controller is specific to the drm kernel subsystem which uses minor to identify drm device, Wait a second, using the DRM minor is a good idea in the first place. Well that should have read "is not a good idea".. I have a test system with a Vega10 and a Vega20. Which device gets which minor is not stable, but rather defined by the scan order of the PCIe bus. Normally the scan order is always the same, but adding or removing devices or delaying things just a little bit during init is enough to change this. We need something like the Linux sysfs location or similar to have a stable implementation. I get that, which is why I don't use minor to identify cards in user space apps I wrote: https://github.com/RadeonOpenCompute/k8s-device-plugin/blob/c2659c9d1d0713cad36fb5256681125121e6e32f/internal/pkg/amdgpu/amdgpu.go#L85 Yeah, that is certainly a possibility. But within the kernel, I think my use of minor is consistent with the rest of the drm subsystem. I hope I don't need to reform the way the drm subsystem use minor in order to introduce a cgroup controller. Well I would try to avoid using the minor and at least look for alternatives. E.g. what does udev uses to identify the devices for example? And IIRC we have something like a "device-name" in the kernel as well (what's printed in the logs). The minimum we need to do is get away from the minor=linenum approach, cause as Daniel pointed out the minor allocation is quite a mess and not necessary contiguous. Regards, Christian. Regards, Kenny ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit
Hello, I haven't gone through the patchset yet but some quick comments. On Wed, May 15, 2019 at 10:29:21PM -0400, Kenny Ho wrote: > Given this controller is specific to the drm kernel subsystem which > uses minor to identify drm device, I don't see a need to complicate > the interfaces more by having major and a key. As you can see in the > examples below, the drm device minor corresponds to the line number. > I am not sure how strict cgroup upstream is about the convention but I We're pretty strict. > am hoping there are flexibility here to allow for what I have > implemented. There are a couple of other things I have done that is So, please follow the interface conventions. We can definitely add new ones but that would need functional reasons. > not described in the convention: 1) inclusion of read-only *.help file > at the root cgroup, 2) use read-only (which I can potentially make rw) > *.default file instead of having a default entries (since the default > can be different for different devices) inside the control files (this > way, the resetting of cgroup values for all the drm devices, can be > done by a simple 'cp'.) Again, please follow the existing conventions. There's a lot more harm than good in every controller being creative in their own way. It's trivial to build convenience features in userspace. Please do it there. > > Is this really useful for an administrator to control? > > Isn't the resource we want to control actually the physical backing store? > That's correct. This is just the first level of control since the > backing store can be backed by different type of memory. I am in the > process of adding at least two more resources. Stay tuned. I am > doing the charge here to enforce the idea of "creator is deemed owner" > at a place where the code is shared by all (the init function.) Ideally, controller should only control hard resources which impact behaviors and performance which are immediately visible to users. Thanks. -- tejun ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit
Am 16.05.19 um 14:28 schrieb Daniel Vetter: > [CAUTION: External Email] > > On Thu, May 16, 2019 at 09:25:31AM +0200, Christian König wrote: >> Am 16.05.19 um 09:16 schrieb Koenig, Christian: >>> Am 16.05.19 um 04:29 schrieb Kenny Ho: [CAUTION: External Email] On Wed, May 15, 2019 at 5:26 PM Welty, Brian wrote: > On 5/9/2019 2:04 PM, Kenny Ho wrote: >> There are four control file types, >> stats (ro) - display current measured values for a resource >> max (rw) - limits for a resource >> default (ro, root cgroup only) - default values for a resource >> help (ro, root cgroup only) - help string for a resource >> >> Each file is multi-lined with one entry/line per drm device. > Multi-line is correct for multiple devices, but I believe you need > to use a KEY to denote device for both your set and get routines. > I didn't see your set functions reading a key, or the get functions > printing the key in output. > cgroups-v2 conventions mention using KEY of major:minor, but I think > you can use drm_minor as key? Given this controller is specific to the drm kernel subsystem which uses minor to identify drm device, >>> Wait a second, using the DRM minor is a good idea in the first place. >> Well that should have read "is not a good idea".. > What else should we use? Well what does for example udev uses to identify a device? >> Christian. >> >>> I have a test system with a Vega10 and a Vega20. Which device gets which >>> minor is not stable, but rather defined by the scan order of the PCIe bus. >>> >>> Normally the scan order is always the same, but adding or removing >>> devices or delaying things just a little bit during init is enough to >>> change this. >>> >>> We need something like the Linux sysfs location or similar to have a >>> stable implementation. > You can go from sysfs location to drm class directory (in sysfs) and back. > That means if you care you need to walk sysfs yourself a bit, but using > the drm minor isn't a blocker itself. Yeah, agreed that userspace could do this. But I think if there is an of hand alternative we should use this instead. > One downside with the drm minor is that it's pretty good nonsense once you > have more than 64 gpus though, due to how we space render and legacy nodes > in the minor ids :-) Ok, another good reason to at least not use the minor=linenum approach. Christian. > -Daniel >>> Regards, >>> Christian. >>> I don't see a need to complicate the interfaces more by having major and a key. As you can see in the examples below, the drm device minor corresponds to the line number. I am not sure how strict cgroup upstream is about the convention but I am hoping there are flexibility here to allow for what I have implemented. There are a couple of other things I have done that is not described in the convention: 1) inclusion of read-only *.help file at the root cgroup, 2) use read-only (which I can potentially make rw) *.default file instead of having a default entries (since the default can be different for different devices) inside the control files (this way, the resetting of cgroup values for all the drm devices, can be done by a simple 'cp'.) >> Usage examples: >> // set limit for card1 to 1GB >> sed -i '2s/.*/1073741824/' /sys/fs/cgroup//drm.buffer.total.max >> >> // set limit for card0 to 512MB >> sed -i '1s/.*/536870912/' /sys/fs/cgroup//drm.buffer.total.max >> /** @file drm_gem.c >> @@ -154,6 +156,9 @@ void drm_gem_private_object_init(struct drm_device >> *dev, >> obj->handle_count = 0; >> obj->size = size; >> drm_vma_node_reset(>vma_node); >> + >> + obj->drmcgrp = get_drmcgrp(current); >> + drmcgrp_chg_bo_alloc(obj->drmcgrp, dev, size); > Why do the charging here? > There is no backing store yet for the buffer, so this is really tracking > something akin to allowed virtual memory for GEM objects? > Is this really useful for an administrator to control? > Isn't the resource we want to control actually the physical backing store? That's correct. This is just the first level of control since the backing store can be backed by different type of memory. I am in the process of adding at least two more resources. Stay tuned. I am doing the charge here to enforce the idea of "creator is deemed owner" at a place where the code is shared by all (the init function.) >> + while (i <= max_minor && limits != NULL) { >> + sval = strsep(, "\n"); >> + rc = kstrtoll(sval, 0, ); > Input should be "KEY VALUE", so KEY will determine device to apply this > to. > Also, per cgroups-v2 documentation of limits, I believe need to parse and > handle the special "max" input value. > > parse_resources() in rdma
Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit
On Thu, May 16, 2019 at 3:25 AM Christian König wrote: > Am 16.05.19 um 09:16 schrieb Koenig, Christian: > > Am 16.05.19 um 04:29 schrieb Kenny Ho: > >> On Wed, May 15, 2019 at 5:26 PM Welty, Brian wrote: > >>> On 5/9/2019 2:04 PM, Kenny Ho wrote: > Each file is multi-lined with one entry/line per drm device. > >>> Multi-line is correct for multiple devices, but I believe you need > >>> to use a KEY to denote device for both your set and get routines. > >>> I didn't see your set functions reading a key, or the get functions > >>> printing the key in output. > >>> cgroups-v2 conventions mention using KEY of major:minor, but I think > >>> you can use drm_minor as key? > >> Given this controller is specific to the drm kernel subsystem which > >> uses minor to identify drm device, > > Wait a second, using the DRM minor is a good idea in the first place. > Well that should have read "is not a good idea".. > > I have a test system with a Vega10 and a Vega20. Which device gets which > minor is not stable, but rather defined by the scan order of the PCIe bus. > > Normally the scan order is always the same, but adding or removing > devices or delaying things just a little bit during init is enough to > change this. > > We need something like the Linux sysfs location or similar to have a > stable implementation. I get that, which is why I don't use minor to identify cards in user space apps I wrote: https://github.com/RadeonOpenCompute/k8s-device-plugin/blob/c2659c9d1d0713cad36fb5256681125121e6e32f/internal/pkg/amdgpu/amdgpu.go#L85 But within the kernel, I think my use of minor is consistent with the rest of the drm subsystem. I hope I don't need to reform the way the drm subsystem use minor in order to introduce a cgroup controller. Regards, Kenny ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit
On Thu, May 16, 2019 at 09:25:31AM +0200, Christian König wrote: > Am 16.05.19 um 09:16 schrieb Koenig, Christian: > > Am 16.05.19 um 04:29 schrieb Kenny Ho: > > > [CAUTION: External Email] > > > > > > On Wed, May 15, 2019 at 5:26 PM Welty, Brian > > > wrote: > > > > On 5/9/2019 2:04 PM, Kenny Ho wrote: > > > > > There are four control file types, > > > > > stats (ro) - display current measured values for a resource > > > > > max (rw) - limits for a resource > > > > > default (ro, root cgroup only) - default values for a resource > > > > > help (ro, root cgroup only) - help string for a resource > > > > > > > > > > Each file is multi-lined with one entry/line per drm device. > > > > Multi-line is correct for multiple devices, but I believe you need > > > > to use a KEY to denote device for both your set and get routines. > > > > I didn't see your set functions reading a key, or the get functions > > > > printing the key in output. > > > > cgroups-v2 conventions mention using KEY of major:minor, but I think > > > > you can use drm_minor as key? > > > Given this controller is specific to the drm kernel subsystem which > > > uses minor to identify drm device, > > Wait a second, using the DRM minor is a good idea in the first place. > > Well that should have read "is not a good idea".. What else should we use? > > Christian. > > > > > I have a test system with a Vega10 and a Vega20. Which device gets which > > minor is not stable, but rather defined by the scan order of the PCIe bus. > > > > Normally the scan order is always the same, but adding or removing > > devices or delaying things just a little bit during init is enough to > > change this. > > > > We need something like the Linux sysfs location or similar to have a > > stable implementation. You can go from sysfs location to drm class directory (in sysfs) and back. That means if you care you need to walk sysfs yourself a bit, but using the drm minor isn't a blocker itself. One downside with the drm minor is that it's pretty good nonsense once you have more than 64 gpus though, due to how we space render and legacy nodes in the minor ids :-) -Daniel > > > > Regards, > > Christian. > > > > >I don't see a need to complicate > > > the interfaces more by having major and a key. As you can see in the > > > examples below, the drm device minor corresponds to the line number. > > > I am not sure how strict cgroup upstream is about the convention but I > > > am hoping there are flexibility here to allow for what I have > > > implemented. There are a couple of other things I have done that is > > > not described in the convention: 1) inclusion of read-only *.help file > > > at the root cgroup, 2) use read-only (which I can potentially make rw) > > > *.default file instead of having a default entries (since the default > > > can be different for different devices) inside the control files (this > > > way, the resetting of cgroup values for all the drm devices, can be > > > done by a simple 'cp'.) > > > > > > > > Usage examples: > > > > > // set limit for card1 to 1GB > > > > > sed -i '2s/.*/1073741824/' > > > > > /sys/fs/cgroup//drm.buffer.total.max > > > > > > > > > > // set limit for card0 to 512MB > > > > > sed -i '1s/.*/536870912/' /sys/fs/cgroup//drm.buffer.total.max > > > > >/** @file drm_gem.c > > > > > @@ -154,6 +156,9 @@ void drm_gem_private_object_init(struct > > > > > drm_device *dev, > > > > > obj->handle_count = 0; > > > > > obj->size = size; > > > > > drm_vma_node_reset(>vma_node); > > > > > + > > > > > + obj->drmcgrp = get_drmcgrp(current); > > > > > + drmcgrp_chg_bo_alloc(obj->drmcgrp, dev, size); > > > > Why do the charging here? > > > > There is no backing store yet for the buffer, so this is really > > > > tracking something akin to allowed virtual memory for GEM objects? > > > > Is this really useful for an administrator to control? > > > > Isn't the resource we want to control actually the physical backing > > > > store? > > > That's correct. This is just the first level of control since the > > > backing store can be backed by different type of memory. I am in the > > > process of adding at least two more resources. Stay tuned. I am > > > doing the charge here to enforce the idea of "creator is deemed owner" > > > at a place where the code is shared by all (the init function.) > > > > > > > > + while (i <= max_minor && limits != NULL) { > > > > > + sval = strsep(, "\n"); > > > > > + rc = kstrtoll(sval, 0, ); > > > > Input should be "KEY VALUE", so KEY will determine device to apply this > > > > to. > > > > Also, per cgroups-v2 documentation of limits, I believe need to parse > > > > and handle the special "max" input value. > > > > > > > > parse_resources() in rdma controller is example for both of above. > > > Please see my previous reply for the rationale of my hope to not need > > > a key. I can certainly add handling of "max" and
Re: [PATCH 2/2] drm/amd/amdgpu: Fix maybe-uninitialized warning in df_v3_6_pmc_start
Series is: Reviewed-by: Alex Deucher On Wed, May 15, 2019 at 4:24 PM Bhawanpreet Lakha wrote: > > This fixes the warning below > > error: ‘ret’ may be used uninitialized in this function > [-Werror=maybe-uninitialized] > int xgmi_tx_link, ret; > ^~~ > Signed-off-by: Bhawanpreet Lakha > --- > drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c > b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c > index 201c00411720..a8442508eea5 100644 > --- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c > +++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c > @@ -459,7 +459,7 @@ static int df_v3_6_stop_xgmi_link_cntr(struct > amdgpu_device *adev, > static int df_v3_6_pmc_start(struct amdgpu_device *adev, uint64_t config, > int is_enable) > { > - int xgmi_tx_link, ret; > + int xgmi_tx_link, ret = 0; > > switch (adev->asic_type) { > case CHIP_VEGA20: > -- > 2.17.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: don't read DF register for SRIOV
Under SRIOV, reading DF register has chance to lead to AER error in host side, just skip reading it. Signed-off-by: Monk Liu Signed-off-by: Yintian Tao --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index a417763..b5bf9ed 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -837,7 +837,7 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev) if (amdgpu_emu_mode != 1) adev->gmc.vram_width = amdgpu_atomfirmware_get_vram_width(adev); - if (!adev->gmc.vram_width) { + if (!adev->gmc.vram_width && !amdgpu_sriov_vf(adev)) { /* hbm memory channel size */ if (adev->flags & AMD_IS_APU) chansize = 64; -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: set correct vram_width for vega10 under sriov
For Vega10 SR-IOV, vram_width can't be read from ATOM as RAVEN, and DF related registers is not readable, seems hardcord is the only way to set the correct vram_width Signed-off-by: Trigger Huang Signed-off-by: Yintian Tao --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index c221570..a417763 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -848,6 +848,13 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev) adev->gmc.vram_width = numchan * chansize; } + /* For Vega10 SR-IOV, vram_width can't be read from ATOM as RAVEN, +* and DF related registers is not readable, seems hardcord is the +* only way to set the correct vram_width */ + if (amdgpu_sriov_vf(adev) && (adev->asic_type == CHIP_VEGA10)) { + adev->gmc.vram_width = 2048; + } + /* size in MB on si */ adev->gmc.mc_vram_size = adev->nbio_funcs->get_memsize(adev) * 1024ULL * 1024ULL; -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV
PSP fw primary buffer is not used under SRIOV. Therefore, we don't need to allocate memory for it. v2: remove superfluous check for amdgpu_bo_free_kernel(). Signed-off-by: Yintian Tao Signed-off-by: Monk Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index c567a55..af9835c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -905,13 +905,16 @@ static int psp_load_fw(struct amdgpu_device *adev) if (!psp->cmd) return -ENOMEM; - ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG, - AMDGPU_GEM_DOMAIN_GTT, - >fw_pri_bo, - >fw_pri_mc_addr, - >fw_pri_buf); - if (ret) - goto failed; + /* this fw pri bo is not used under SRIOV */ + if (!amdgpu_sriov_vf(psp->adev)) { + ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG, + AMDGPU_GEM_DOMAIN_GTT, + >fw_pri_bo, + >fw_pri_mc_addr, + >fw_pri_buf); + if (ret) + goto failed; + } ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE, AMDGPU_GEM_DOMAIN_VRAM, -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: Hard lockups with ROCM
On Thu, May 16, 2019 at 11:43 AM Alex Deucher wrote: > On Wed, May 15, 2019 at 8:33 PM Daniel Kasak > wrote: > > > > On Mon, May 13, 2019 at 11:44 AM Daniel Kasak > wrote: > >> > >> Hi all. I had version 2.2.0 of the ROCM stack running on a 5.0.x and > 5.1.0 kernel. Things were going great with various boinc GPU tasks. But > there is a setiathome GPU task which reliably gives me a hard lockup within > about 30 minutes of running. I actually had to do *two* emergency > re-installs over the past week. Perhaps part of this was my fault ( running > btrfs with lzo compression on my root partition ... ). But absolutely part > of this was the hard lockups. I've tested all kinds of other things ( eg > rebuilding lots of stuff under Gentoo ) ... I don't have a general > stability issue even under hours of high load. But after restarting boinc > with that same setiathome task ... ! > >> > >> If someone wants me to sacrifice another installation, they can point > me to instructions for trying to gather more information. > >> > >> Anyway ... perhaps more work around detecting and recovering from GPU > lockups is in order? > >> > >> Dan > > > > > > > > > > That's what I was afraid of :( > > Not sure what you were afraid of. I don't think anyone has looked at > setiathome on ROCm. I'd suggest filing a bug > (https://bugs.freedesktop.org) and attaching your dmesg output and > xorg log (if using X). If there is a GPU reset, note that you will > need to restart your desktop environment because currently neither > glamor or any compositors support GL robustness extensions to reset > their contexts after a GPU reset. > > Alex > Hi Alex. dmesg output is not available ... this is a *hard* lockup. I need to power-cycle after it happens ( ALT + SysRq + { S , U , B } doesn't even work ). That's why I asked for instructions to possibly gather more info. I did check the xorg log after I did an emergency export of my filesystem ... nothing of interest in there. It seems like I currently don't really have enough info to make a bug report worthwhile. Dan ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: Compiler warnings with current amd-staging-drm-next
On 2019-05-10 1:17 p.m., Michel Dänzer wrote: > > drivers/gpu/drm//amd/amdgpu/df_v3_6.c: In function ‘df_v3_6_pmc_start’: > drivers/gpu/drm//amd/amdgpu/df_v3_6.c:482:9: warning: ‘ret’ may be used > uninitialized in this function [-Wmaybe-uninitialized] > return ret; > ^~~ This warning is still there; the others have been fixed. Other warnings have been introduced though: drivers/gpu/drm//amd/amdgpu/../powerplay/vega20_ppt.c: In function ‘vega20_get_ppfeature_status’: drivers/gpu/drm//amd/amdgpu/../powerplay/vega20_ppt.c:2387:103: warning: ‘feature_mask’ may be used uninitialized in this function [-Wmaybe-uninitialized] *features_enabled = uint64_t)feature_mask[0] << SMU_FEATURES_LOW_SHIFT) & SMU_FEATURES_LOW_MASK) | ~^ (((uint64_t)feature_mask[1] << SMU_FEATURES_HIGH_SHIFT) & SMU_FEATURES_HIGH_MASK)); ~~ drivers/gpu/drm//amd/amdgpu/../powerplay/vega20_ppt.c: In function ‘vega20_set_ppfeature_status’: drivers/gpu/drm//amd/amdgpu/../powerplay/vega20_ppt.c:2387:103: warning: ‘feature_mask’ may be used uninitialized in this function [-Wmaybe-uninitialized] *features_enabled = uint64_t)feature_mask[0] << SMU_FEATURES_LOW_SHIFT) & SMU_FEATURES_LOW_MASK) | ~^ (((uint64_t)feature_mask[1] << SMU_FEATURES_HIGH_SHIFT) & SMU_FEATURES_HIGH_MASK)); ~~ -- Earthling Michel Dänzer | https://www.amd.com Libre software enthusiast | Mesa and X developer ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit
Am 16.05.19 um 09:16 schrieb Koenig, Christian: Am 16.05.19 um 04:29 schrieb Kenny Ho: [CAUTION: External Email] On Wed, May 15, 2019 at 5:26 PM Welty, Brian wrote: On 5/9/2019 2:04 PM, Kenny Ho wrote: There are four control file types, stats (ro) - display current measured values for a resource max (rw) - limits for a resource default (ro, root cgroup only) - default values for a resource help (ro, root cgroup only) - help string for a resource Each file is multi-lined with one entry/line per drm device. Multi-line is correct for multiple devices, but I believe you need to use a KEY to denote device for both your set and get routines. I didn't see your set functions reading a key, or the get functions printing the key in output. cgroups-v2 conventions mention using KEY of major:minor, but I think you can use drm_minor as key? Given this controller is specific to the drm kernel subsystem which uses minor to identify drm device, Wait a second, using the DRM minor is a good idea in the first place. Well that should have read "is not a good idea".. Christian. I have a test system with a Vega10 and a Vega20. Which device gets which minor is not stable, but rather defined by the scan order of the PCIe bus. Normally the scan order is always the same, but adding or removing devices or delaying things just a little bit during init is enough to change this. We need something like the Linux sysfs location or similar to have a stable implementation. Regards, Christian. I don't see a need to complicate the interfaces more by having major and a key. As you can see in the examples below, the drm device minor corresponds to the line number. I am not sure how strict cgroup upstream is about the convention but I am hoping there are flexibility here to allow for what I have implemented. There are a couple of other things I have done that is not described in the convention: 1) inclusion of read-only *.help file at the root cgroup, 2) use read-only (which I can potentially make rw) *.default file instead of having a default entries (since the default can be different for different devices) inside the control files (this way, the resetting of cgroup values for all the drm devices, can be done by a simple 'cp'.) Usage examples: // set limit for card1 to 1GB sed -i '2s/.*/1073741824/' /sys/fs/cgroup//drm.buffer.total.max // set limit for card0 to 512MB sed -i '1s/.*/536870912/' /sys/fs/cgroup//drm.buffer.total.max /** @file drm_gem.c @@ -154,6 +156,9 @@ void drm_gem_private_object_init(struct drm_device *dev, obj->handle_count = 0; obj->size = size; drm_vma_node_reset(>vma_node); + + obj->drmcgrp = get_drmcgrp(current); + drmcgrp_chg_bo_alloc(obj->drmcgrp, dev, size); Why do the charging here? There is no backing store yet for the buffer, so this is really tracking something akin to allowed virtual memory for GEM objects? Is this really useful for an administrator to control? Isn't the resource we want to control actually the physical backing store? That's correct. This is just the first level of control since the backing store can be backed by different type of memory. I am in the process of adding at least two more resources. Stay tuned. I am doing the charge here to enforce the idea of "creator is deemed owner" at a place where the code is shared by all (the init function.) + while (i <= max_minor && limits != NULL) { + sval = strsep(, "\n"); + rc = kstrtoll(sval, 0, ); Input should be "KEY VALUE", so KEY will determine device to apply this to. Also, per cgroups-v2 documentation of limits, I believe need to parse and handle the special "max" input value. parse_resources() in rdma controller is example for both of above. Please see my previous reply for the rationale of my hope to not need a key. I can certainly add handling of "max" and "default". +void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev, + size_t size) Shouldn't this return an error and be implemented with same semantics as the try_charge() functions of other controllers? Below will allow stats_total_allocated to overrun limits_total_allocated. This is because I am charging the buffer at the init of the buffer which does not fail so the "try" (drmcgrp_bo_can_allocate) is separate and placed earlier and nearer other condition where gem object allocation may fail. In other words, there are multiple possibilities for which gem allocation may fail (cgroup limit being one of them) and satisfying cgroup limit does not mean a charge is needed. I can certainly combine the two functions to have an additional try_charge semantic as well if that is really needed. Regards, Kenny ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list
Re: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV
Am 16.05.19 um 07:11 schrieb Yintian Tao: PSP fw primary buffer is not used under SRIOV Therefore, we don't need to allocate memory for it. Signed-off-by: Yintian Tao Signed-off-by: Monk Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 22 +- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index c567a55..d3c77d2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -905,13 +905,16 @@ static int psp_load_fw(struct amdgpu_device *adev) if (!psp->cmd) return -ENOMEM; - ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG, - AMDGPU_GEM_DOMAIN_GTT, - >fw_pri_bo, - >fw_pri_mc_addr, - >fw_pri_buf); - if (ret) - goto failed; + /* this fw pri bo is not used under SRIOV */ + if (!amdgpu_sriov_vf(psp->adev)) { + ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG, + AMDGPU_GEM_DOMAIN_GTT, + >fw_pri_bo, + >fw_pri_mc_addr, + >fw_pri_buf); + if (ret) + goto failed; + } ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE, AMDGPU_GEM_DOMAIN_VRAM, @@ -1012,8 +1015,9 @@ static int psp_hw_fini(void *handle) psp_ring_destroy(psp, PSP_RING_TYPE__KM); amdgpu_bo_free_kernel(>tmr_bo, >tmr_mc_addr, >tmr_buf); - amdgpu_bo_free_kernel(>fw_pri_bo, - >fw_pri_mc_addr, >fw_pri_buf); + if (!amdgpu_sriov_vf(psp->adev)) + amdgpu_bo_free_kernel(>fw_pri_bo, + >fw_pri_mc_addr, >fw_pri_buf); This is superfluous, amdgpu_bo_free_kernel() does a NULL check anyway. Apart from that looks good to me, Christian. amdgpu_bo_free_kernel(>fence_buf_bo, >fence_buf_mc_addr, >fence_buf); amdgpu_bo_free_kernel(>asd_shared_bo, >asd_shared_mc_addr, ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit
Am 16.05.19 um 04:29 schrieb Kenny Ho: > [CAUTION: External Email] > > On Wed, May 15, 2019 at 5:26 PM Welty, Brian wrote: >> On 5/9/2019 2:04 PM, Kenny Ho wrote: >>> There are four control file types, >>> stats (ro) - display current measured values for a resource >>> max (rw) - limits for a resource >>> default (ro, root cgroup only) - default values for a resource >>> help (ro, root cgroup only) - help string for a resource >>> >>> Each file is multi-lined with one entry/line per drm device. >> Multi-line is correct for multiple devices, but I believe you need >> to use a KEY to denote device for both your set and get routines. >> I didn't see your set functions reading a key, or the get functions >> printing the key in output. >> cgroups-v2 conventions mention using KEY of major:minor, but I think >> you can use drm_minor as key? > Given this controller is specific to the drm kernel subsystem which > uses minor to identify drm device, Wait a second, using the DRM minor is a good idea in the first place. I have a test system with a Vega10 and a Vega20. Which device gets which minor is not stable, but rather defined by the scan order of the PCIe bus. Normally the scan order is always the same, but adding or removing devices or delaying things just a little bit during init is enough to change this. We need something like the Linux sysfs location or similar to have a stable implementation. Regards, Christian. > I don't see a need to complicate > the interfaces more by having major and a key. As you can see in the > examples below, the drm device minor corresponds to the line number. > I am not sure how strict cgroup upstream is about the convention but I > am hoping there are flexibility here to allow for what I have > implemented. There are a couple of other things I have done that is > not described in the convention: 1) inclusion of read-only *.help file > at the root cgroup, 2) use read-only (which I can potentially make rw) > *.default file instead of having a default entries (since the default > can be different for different devices) inside the control files (this > way, the resetting of cgroup values for all the drm devices, can be > done by a simple 'cp'.) > >>> Usage examples: >>> // set limit for card1 to 1GB >>> sed -i '2s/.*/1073741824/' /sys/fs/cgroup//drm.buffer.total.max >>> >>> // set limit for card0 to 512MB >>> sed -i '1s/.*/536870912/' /sys/fs/cgroup//drm.buffer.total.max > >>> /** @file drm_gem.c >>> @@ -154,6 +156,9 @@ void drm_gem_private_object_init(struct drm_device *dev, >>>obj->handle_count = 0; >>>obj->size = size; >>>drm_vma_node_reset(>vma_node); >>> + >>> + obj->drmcgrp = get_drmcgrp(current); >>> + drmcgrp_chg_bo_alloc(obj->drmcgrp, dev, size); >> Why do the charging here? >> There is no backing store yet for the buffer, so this is really tracking >> something akin to allowed virtual memory for GEM objects? >> Is this really useful for an administrator to control? >> Isn't the resource we want to control actually the physical backing store? > That's correct. This is just the first level of control since the > backing store can be backed by different type of memory. I am in the > process of adding at least two more resources. Stay tuned. I am > doing the charge here to enforce the idea of "creator is deemed owner" > at a place where the code is shared by all (the init function.) > >>> + while (i <= max_minor && limits != NULL) { >>> + sval = strsep(, "\n"); >>> + rc = kstrtoll(sval, 0, ); >> Input should be "KEY VALUE", so KEY will determine device to apply this to. >> Also, per cgroups-v2 documentation of limits, I believe need to parse and >> handle the special "max" input value. >> >> parse_resources() in rdma controller is example for both of above. > Please see my previous reply for the rationale of my hope to not need > a key. I can certainly add handling of "max" and "default". > > >>> +void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev, >>> + size_t size) >> Shouldn't this return an error and be implemented with same semantics as the >> try_charge() functions of other controllers? >> Below will allow stats_total_allocated to overrun limits_total_allocated. > This is because I am charging the buffer at the init of the buffer > which does not fail so the "try" (drmcgrp_bo_can_allocate) is separate > and placed earlier and nearer other condition where gem object > allocation may fail. In other words, there are multiple possibilities > for which gem allocation may fail (cgroup limit being one of them) and > satisfying cgroup limit does not mean a charge is needed. I can > certainly combine the two functions to have an additional try_charge > semantic as well if that is really needed. > > Regards, > Kenny ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org