RE: [PATCH] drm/amd/powerplay: correct Arcturus OD support

2019-11-07 Thread Freehill, Chris
I think Kent has more insight with demand for this feature than I do, so I 
defer to his opinion. I haven't received any feedback on that API.

Main thing is that if it's not supported, don't have the file visible. 

Thanks,
Chris
-Original Message-
From: Russell, Kent  
Sent: Thursday, November 7, 2019 8:05 AM
To: Alex Deucher ; Quan, Evan 
Cc: Li, Candice ; amd-gfx@lists.freedesktop.org; Freehill, 
Chris 
Subject: RE: [PATCH] drm/amd/powerplay: correct Arcturus OD support

While we do like OverDrive being available (mostly our open-source community 
users who are using consumer cards vs server cards), if it's not supported on 
the HW for whatever reason, we'll adapt. If we get a request later for its 
enablement (since OverDrive is consumer-card-only in general, and I personally 
haven't seen any Arcturus consumer cards), we can discuss then. It's fine with 
Chris and I in that regard.

 Kent

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Thursday, November 7, 2019 8:58 AM
To: Quan, Evan 
Cc: Li, Candice ; amd-gfx@lists.freedesktop.org; Freehill, 
Chris 
Subject: Re: [PATCH] drm/amd/powerplay: correct Arcturus OD support

On Thu, Nov 7, 2019 at 2:38 AM Quan, Evan  wrote:
>
> OD is not supported on Arcturus. Thus the pp_od_clk_voltage sysfs 
> interface is also not supported.
>
> Change-Id: Ib70632a55a0980cf04c3432d43dbcf869cd1b4bf
> Signed-off-by: Evan Quan 

You might want to check with Chris and Kent about this.  I think there is a use 
case for OD on ROCm in some cases.  Assuming they are ok with it,
Reviewed-by: Alex Deucher 

Alex

> ---
>  drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> index c21fe7ac5df8..76a4154b3be2 100644
> --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> @@ -714,6 +714,9 @@ static int smu_set_funcs(struct amdgpu_device
> *adev)  {
> struct smu_context *smu = >smu;
>
> +   if (adev->pm.pp_feature & PP_OVERDRIVE_MASK)
> +   smu->od_enabled = true;
> +
> switch (adev->asic_type) {
> case CHIP_VEGA20:
> vega20_set_ppt_funcs(smu); @@ -725,6 +728,8 @@ static 
> int smu_set_funcs(struct amdgpu_device *adev)
> break;
> case CHIP_ARCTURUS:
> arcturus_set_ppt_funcs(smu);
> +   /* OD is not supported on Arcturus */
> +   smu->od_enabled =false;
> break;
> case CHIP_RENOIR:
> renoir_set_ppt_funcs(smu); @@ -733,9 +738,6 @@ static 
> int smu_set_funcs(struct amdgpu_device *adev)
> return -EINVAL;
> }
>
> -   if (adev->pm.pp_feature & PP_OVERDRIVE_MASK)
> -   smu->od_enabled = true;
> -
> return 0;
>  }
>
> --
> 2.23.0
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-08 Thread Freehill, Chris
Hi Hawking,

I don't want to divert from this important discussion, but would like 
clarification...

In the rocm-smi library, I previously had this interpretation of the values (I 
think someone had confirmed this):
none,   //!< No current errors
disabled   //!< ECC is disabled
parity //!< ECC errors present, but type unknown
 single_correctable, //!< Single correctable error
multi_correctable//!< Multiple uncorrectable errors
poison //!< Firmware detected error and isolated
 //!< page. Treat as uncorrectable.

It seems now we are just saying it's enabled/disabled, but no indication of 
whether there are errors or not. Is that right, or am I misunderstanding 
something?

Chris

-Original Message-
From: Koenig, Christian  
Sent: Thursday, August 8, 2019 10:24 AM
To: Zhang, Hawking ; Russell, Kent 
; Zhou1, Tao ; 
amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Freehill, 
Chris 
Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info

Hi Hawking,

yeah, but this isn't sufficient to comply to the upstream rules.

See what we need to do is to remove the extra text and the per IP information. 
In other words something like this:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1a4412e47810..c8c7f9655ffc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -819,22 +819,7 @@ static ssize_t
amdgpu_ras_sysfs_features_read(struct device *dev,
     ssize_t s;
     struct ras_manager *obj;

-   s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", 
con->features);
-
-   for (i = 0; i < ras_block_count; i++) {
-   head.block = i;
-
-   if (amdgpu_ras_is_feature_enabled(adev, )) {
-   obj = amdgpu_ras_find_obj(adev, );
-   s += scnprintf([s], PAGE_SIZE - s,
-   "%s: %s\n",
-   ras_block_str(i),
- ras_err_str(obj->head.type));
-   } else
-   s += scnprintf([s], PAGE_SIZE - s,
-   "%s: disabled\n",
-   ras_block_str(i));
-   }
+   s = scnprintf(buf, PAGE_SIZE, "0x%x\n", con->features);

     return s;
  }

And I'm not talking about some rule we could bend. This is an absolutely
*MUST* have for upstreaming.

Regards,
Christian.

Am 08.08.19 um 17:17 schrieb Zhang, Hawking:
> Hi Chris,
>
> I thought I've stated as "The feature mask is already good enough for this 
> node". It is just a uint32 value. We don't expect ROCm SMI to read any other 
> information from feature node except for the feature mask.
>
> Regards,
> Hawking
>
> -Original Message-
> From: Koenig, Christian 
> Sent: 2019年8月8日 23:11
> To: Zhang, Hawking ; Russell, Kent 
> ; Zhou1, Tao ; 
> amd-gfx@lists.freedesktop.org; Pan, Xinhui ; 
> Freehill, Chris 
> Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> Hi Hawking,
>
> a multi line value is not the problem, but here you have multiple values.
>
> E.g. in the case of the pp tables that is one big array of power profiles and 
> we actually had a discussion if exposing them like this is ok or not.
>
> But in the case of the ras features you got multiple different things in the 
> same file. And that in turn is a clear violation of the sysfs rules.
>
> I don't think we can't upstream the interface like this. See here for 
> a good summary of the rules as well: https://lwn.net/Articles/378884/
>
> Regards,
> Christian.
>
> Am 08.08.19 um 17:00 schrieb Zhang, Hawking:
>> Hi Chris,
>>
>> I'm not aware of how ROCM SMI using feature nodes. but not all the sysfs are 
>> intended to be used by upper level apps/libs.
>>
>> There are bunches of sysfs entries that have multiple line value. The most 
>> complicate one would be pp_power_profile_mode, which looks like.
>>
>> 0 BOOTUP_DEFAULT*:
>>   0(   GFXCLK)   0   0   1   0   
>> 4 800 4587520  -65536   0
>>   1(   SOCCLK)   0   0   1   0   
>> 4 800  327680   -6553   0
>>   2( UCLK)   0   0   1   0   
>> 4 800  327680  -65536   0
>> ...
>> 1 3D_FULL_SCREEN :
>>   0(   GFXCLK)   0   1   1   0   
>> 4 800 4587520  -65536   0
>>   1(   SOCCLK)   0   1   4 850   
>> 4 800  327680  -65536   0
>>
>> Regards,
>> Hawking
>> ---

RE: [PATCH] drm/amdgpu: expose sclk and mclk via hwmon

2019-01-08 Thread Freehill, Chris
Ditto...except if those files Felix mentioned were to go away as 
pp_od_clk_voltage more widely used, then we would need this.

I think Alex had mentioned there was some redundancy between those, but 
pp_od_clk_voltage didn't indicate the current frequencies.

Chris



-Original Message-
From: Kuehling, Felix  
Sent: Tuesday, January 8, 2019 3:44 PM
To: Deucher, Alexander ; Alex Deucher 
; amd-gfx@lists.freedesktop.org; Russell, Kent 
; Freehill, Chris 
Subject: Re: [PATCH] drm/amdgpu: expose sclk and mclk via hwmon

I'm not saying it's useless. But rocm-smi can already get the clocks from sysfs 
(pp_dpm_sclk and pp_dpm_mclk). Are these clocks any different?


Regards,
  Felix


On 2019-01-08 4:32 p.m., Deucher, Alexander wrote:
>
> Ping?  Useful for rocm-smi?
>
> --
> --
> *From:* Alex Deucher 
> *Sent:* Monday, December 10, 2018 4:17:27 PM
> *To:* amd-gfx@lists.freedesktop.org
> *Cc:* Deucher, Alexander
> *Subject:* [PATCH] drm/amdgpu: expose sclk and mclk via hwmon
>  
> Expose sclk (gfx clock) and mclk (memory clock) via hwmon compatible 
> interface.  hwmon does not actually formally specify a frequency type 
> attribute, but these are compatible with the format of the other 
> attributes exposed via hwmon.  Units are hertz.
>
> freq1_input - GPU gfx/compute clock in hertz freq2_input - GPU memory 
> clock in hertz (dGPU only)
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 93
> ++
>  1 file changed, 93 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index 1f61ed95727c..6d52428fc45b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -1516,6 +1516,75 @@ static ssize_t
> amdgpu_hwmon_set_power_cap(struct device *dev,
>  return count;
>  }
>  
> +static ssize_t amdgpu_hwmon_show_sclk(struct device *dev,
> + struct device_attribute *attr,
> + char *buf) {
> +   struct amdgpu_device *adev = dev_get_drvdata(dev);
> +   struct drm_device *ddev = adev->ddev;
> +   uint32_t sclk;
> +   int r, size = sizeof(sclk);
> +
> +   /* Can't get voltage when the card is off */
> +   if  ((adev->flags & AMD_IS_PX) &&
> +    (ddev->switch_power_state != DRM_SWITCH_POWER_ON))
> +   return -EINVAL;
> +
> +   /* sanity check PP is enabled */
> +   if (!(adev->powerplay.pp_funcs &&
> + adev->powerplay.pp_funcs->read_sensor))
> + return -EINVAL;
> +
> +   /* get the sclk */
> +   r = amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_GFX_SCLK,
> +  (void *), );
> +   if (r)
> +   return r;
> +
> +   return snprintf(buf, PAGE_SIZE, "%d\n", sclk * 10 * 1000); }
> +
> +static ssize_t amdgpu_hwmon_show_sclk_label(struct device *dev,
> +   struct device_attribute 
> +*attr,
> +   char *buf) {
> +   return snprintf(buf, PAGE_SIZE, "sclk\n"); }
> +
> +static ssize_t amdgpu_hwmon_show_mclk(struct device *dev,
> + struct device_attribute *attr,
> + char *buf) {
> +   struct amdgpu_device *adev = dev_get_drvdata(dev);
> +   struct drm_device *ddev = adev->ddev;
> +   uint32_t mclk;
> +   int r, size = sizeof(mclk);
> +
> +   /* Can't get voltage when the card is off */
> +   if  ((adev->flags & AMD_IS_PX) &&
> +    (ddev->switch_power_state != DRM_SWITCH_POWER_ON))
> +   return -EINVAL;
> +
> +   /* sanity check PP is enabled */
> +   if (!(adev->powerplay.pp_funcs &&
> + adev->powerplay.pp_funcs->read_sensor))
> + return -EINVAL;
> +
> +   /* get the sclk */
> +   r = amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_GFX_MCLK,
> +  (void *), );
> +   if (r)
> +   return r;
> +
> +   return snprintf(buf, PAGE_SIZE, "%d\n", mclk * 10 * 1000); }
> +
> +static ssize_t amdgpu_hwmon_show_mclk_label(struct device *dev,
> +   struct device_attribute 
> +*attr,
> +   char *buf) {
> +   return snprintf(buf, PAGE_SIZE, "mclk\n"); }
>  
>  /**
>   * DOC: hwmon
> @@ -1532,6 +1601,10 @@ static