RE: [PATCH] drm/amd/powerplay: correct Arcturus OD support
I think Kent has more insight with demand for this feature than I do, so I defer to his opinion. I haven't received any feedback on that API. Main thing is that if it's not supported, don't have the file visible. Thanks, Chris -Original Message- From: Russell, Kent Sent: Thursday, November 7, 2019 8:05 AM To: Alex Deucher ; Quan, Evan Cc: Li, Candice ; amd-gfx@lists.freedesktop.org; Freehill, Chris Subject: RE: [PATCH] drm/amd/powerplay: correct Arcturus OD support While we do like OverDrive being available (mostly our open-source community users who are using consumer cards vs server cards), if it's not supported on the HW for whatever reason, we'll adapt. If we get a request later for its enablement (since OverDrive is consumer-card-only in general, and I personally haven't seen any Arcturus consumer cards), we can discuss then. It's fine with Chris and I in that regard. Kent -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Thursday, November 7, 2019 8:58 AM To: Quan, Evan Cc: Li, Candice ; amd-gfx@lists.freedesktop.org; Freehill, Chris Subject: Re: [PATCH] drm/amd/powerplay: correct Arcturus OD support On Thu, Nov 7, 2019 at 2:38 AM Quan, Evan wrote: > > OD is not supported on Arcturus. Thus the pp_od_clk_voltage sysfs > interface is also not supported. > > Change-Id: Ib70632a55a0980cf04c3432d43dbcf869cd1b4bf > Signed-off-by: Evan Quan You might want to check with Chris and Kent about this. I think there is a use case for OD on ROCm in some cases. Assuming they are ok with it, Reviewed-by: Alex Deucher Alex > --- > drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c > b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c > index c21fe7ac5df8..76a4154b3be2 100644 > --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c > +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c > @@ -714,6 +714,9 @@ static int smu_set_funcs(struct amdgpu_device > *adev) { > struct smu_context *smu = >smu; > > + if (adev->pm.pp_feature & PP_OVERDRIVE_MASK) > + smu->od_enabled = true; > + > switch (adev->asic_type) { > case CHIP_VEGA20: > vega20_set_ppt_funcs(smu); @@ -725,6 +728,8 @@ static > int smu_set_funcs(struct amdgpu_device *adev) > break; > case CHIP_ARCTURUS: > arcturus_set_ppt_funcs(smu); > + /* OD is not supported on Arcturus */ > + smu->od_enabled =false; > break; > case CHIP_RENOIR: > renoir_set_ppt_funcs(smu); @@ -733,9 +738,6 @@ static > int smu_set_funcs(struct amdgpu_device *adev) > return -EINVAL; > } > > - if (adev->pm.pp_feature & PP_OVERDRIVE_MASK) > - smu->od_enabled = true; > - > return 0; > } > > -- > 2.23.0 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amdgpu: update ras sysfs feature info
Hi Hawking, I don't want to divert from this important discussion, but would like clarification... In the rocm-smi library, I previously had this interpretation of the values (I think someone had confirmed this): none, //!< No current errors disabled //!< ECC is disabled parity //!< ECC errors present, but type unknown single_correctable, //!< Single correctable error multi_correctable//!< Multiple uncorrectable errors poison //!< Firmware detected error and isolated //!< page. Treat as uncorrectable. It seems now we are just saying it's enabled/disabled, but no indication of whether there are errors or not. Is that right, or am I misunderstanding something? Chris -Original Message- From: Koenig, Christian Sent: Thursday, August 8, 2019 10:24 AM To: Zhang, Hawking ; Russell, Kent ; Zhou1, Tao ; amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Freehill, Chris Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info Hi Hawking, yeah, but this isn't sufficient to comply to the upstream rules. See what we need to do is to remove the extra text and the per IP information. In other words something like this: diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 1a4412e47810..c8c7f9655ffc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -819,22 +819,7 @@ static ssize_t amdgpu_ras_sysfs_features_read(struct device *dev, ssize_t s; struct ras_manager *obj; - s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", con->features); - - for (i = 0; i < ras_block_count; i++) { - head.block = i; - - if (amdgpu_ras_is_feature_enabled(adev, )) { - obj = amdgpu_ras_find_obj(adev, ); - s += scnprintf([s], PAGE_SIZE - s, - "%s: %s\n", - ras_block_str(i), - ras_err_str(obj->head.type)); - } else - s += scnprintf([s], PAGE_SIZE - s, - "%s: disabled\n", - ras_block_str(i)); - } + s = scnprintf(buf, PAGE_SIZE, "0x%x\n", con->features); return s; } And I'm not talking about some rule we could bend. This is an absolutely *MUST* have for upstreaming. Regards, Christian. Am 08.08.19 um 17:17 schrieb Zhang, Hawking: > Hi Chris, > > I thought I've stated as "The feature mask is already good enough for this > node". It is just a uint32 value. We don't expect ROCm SMI to read any other > information from feature node except for the feature mask. > > Regards, > Hawking > > -Original Message- > From: Koenig, Christian > Sent: 2019年8月8日 23:11 > To: Zhang, Hawking ; Russell, Kent > ; Zhou1, Tao ; > amd-gfx@lists.freedesktop.org; Pan, Xinhui ; > Freehill, Chris > Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info > > Hi Hawking, > > a multi line value is not the problem, but here you have multiple values. > > E.g. in the case of the pp tables that is one big array of power profiles and > we actually had a discussion if exposing them like this is ok or not. > > But in the case of the ras features you got multiple different things in the > same file. And that in turn is a clear violation of the sysfs rules. > > I don't think we can't upstream the interface like this. See here for > a good summary of the rules as well: https://lwn.net/Articles/378884/ > > Regards, > Christian. > > Am 08.08.19 um 17:00 schrieb Zhang, Hawking: >> Hi Chris, >> >> I'm not aware of how ROCM SMI using feature nodes. but not all the sysfs are >> intended to be used by upper level apps/libs. >> >> There are bunches of sysfs entries that have multiple line value. The most >> complicate one would be pp_power_profile_mode, which looks like. >> >> 0 BOOTUP_DEFAULT*: >> 0( GFXCLK) 0 0 1 0 >> 4 800 4587520 -65536 0 >> 1( SOCCLK) 0 0 1 0 >> 4 800 327680 -6553 0 >> 2( UCLK) 0 0 1 0 >> 4 800 327680 -65536 0 >> ... >> 1 3D_FULL_SCREEN : >> 0( GFXCLK) 0 1 1 0 >> 4 800 4587520 -65536 0 >> 1( SOCCLK) 0 1 4 850 >> 4 800 327680 -65536 0 >> >> Regards, >> Hawking >> ---
RE: [PATCH] drm/amdgpu: expose sclk and mclk via hwmon
Ditto...except if those files Felix mentioned were to go away as pp_od_clk_voltage more widely used, then we would need this. I think Alex had mentioned there was some redundancy between those, but pp_od_clk_voltage didn't indicate the current frequencies. Chris -Original Message- From: Kuehling, Felix Sent: Tuesday, January 8, 2019 3:44 PM To: Deucher, Alexander ; Alex Deucher ; amd-gfx@lists.freedesktop.org; Russell, Kent ; Freehill, Chris Subject: Re: [PATCH] drm/amdgpu: expose sclk and mclk via hwmon I'm not saying it's useless. But rocm-smi can already get the clocks from sysfs (pp_dpm_sclk and pp_dpm_mclk). Are these clocks any different? Regards, Felix On 2019-01-08 4:32 p.m., Deucher, Alexander wrote: > > Ping? Useful for rocm-smi? > > -- > -- > *From:* Alex Deucher > *Sent:* Monday, December 10, 2018 4:17:27 PM > *To:* amd-gfx@lists.freedesktop.org > *Cc:* Deucher, Alexander > *Subject:* [PATCH] drm/amdgpu: expose sclk and mclk via hwmon > > Expose sclk (gfx clock) and mclk (memory clock) via hwmon compatible > interface. hwmon does not actually formally specify a frequency type > attribute, but these are compatible with the format of the other > attributes exposed via hwmon. Units are hertz. > > freq1_input - GPU gfx/compute clock in hertz freq2_input - GPU memory > clock in hertz (dGPU only) > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 93 > ++ > 1 file changed, 93 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > index 1f61ed95727c..6d52428fc45b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > @@ -1516,6 +1516,75 @@ static ssize_t > amdgpu_hwmon_set_power_cap(struct device *dev, > return count; > } > > +static ssize_t amdgpu_hwmon_show_sclk(struct device *dev, > + struct device_attribute *attr, > + char *buf) { > + struct amdgpu_device *adev = dev_get_drvdata(dev); > + struct drm_device *ddev = adev->ddev; > + uint32_t sclk; > + int r, size = sizeof(sclk); > + > + /* Can't get voltage when the card is off */ > + if ((adev->flags & AMD_IS_PX) && > + (ddev->switch_power_state != DRM_SWITCH_POWER_ON)) > + return -EINVAL; > + > + /* sanity check PP is enabled */ > + if (!(adev->powerplay.pp_funcs && > + adev->powerplay.pp_funcs->read_sensor)) > + return -EINVAL; > + > + /* get the sclk */ > + r = amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_GFX_SCLK, > + (void *), ); > + if (r) > + return r; > + > + return snprintf(buf, PAGE_SIZE, "%d\n", sclk * 10 * 1000); } > + > +static ssize_t amdgpu_hwmon_show_sclk_label(struct device *dev, > + struct device_attribute > +*attr, > + char *buf) { > + return snprintf(buf, PAGE_SIZE, "sclk\n"); } > + > +static ssize_t amdgpu_hwmon_show_mclk(struct device *dev, > + struct device_attribute *attr, > + char *buf) { > + struct amdgpu_device *adev = dev_get_drvdata(dev); > + struct drm_device *ddev = adev->ddev; > + uint32_t mclk; > + int r, size = sizeof(mclk); > + > + /* Can't get voltage when the card is off */ > + if ((adev->flags & AMD_IS_PX) && > + (ddev->switch_power_state != DRM_SWITCH_POWER_ON)) > + return -EINVAL; > + > + /* sanity check PP is enabled */ > + if (!(adev->powerplay.pp_funcs && > + adev->powerplay.pp_funcs->read_sensor)) > + return -EINVAL; > + > + /* get the sclk */ > + r = amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_GFX_MCLK, > + (void *), ); > + if (r) > + return r; > + > + return snprintf(buf, PAGE_SIZE, "%d\n", mclk * 10 * 1000); } > + > +static ssize_t amdgpu_hwmon_show_mclk_label(struct device *dev, > + struct device_attribute > +*attr, > + char *buf) { > + return snprintf(buf, PAGE_SIZE, "mclk\n"); } > > /** > * DOC: hwmon > @@ -1532,6 +1601,10 @@ static