Re: lima_bo memory leak after drm_sched job destruction rework

2019-05-17 Thread Grodzovsky, Andrey
I don't have the code in front of me right now, but as far as I remember it
will only terminate prematurely in drm_sched_cleanup_jobs if there is timeout
work in progress, which would not be the case if nothing hangs.

Andrey


From: Erico Nunes 
Sent: 17 May 2019 17:42:48
To: Grodzovsky, Andrey
Cc: Deucher, Alexander; Koenig, Christian; Zhou, David(ChunMing); David Airlie; 
Daniel Vetter; Lucas Stach; Russell King; Christian Gmeiner; Qiang Yu; Rob 
Herring; Tomeu Vizoso; Eric Anholt; Rex Zhu; Huang, Ray; Deng, Emily; Nayan 
Deshmukh; Sharat Masetty; amd-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org; l...@lists.freedesktop.org
Subject: Re: lima_bo memory leak after drm_sched job destruction rework

[CAUTION: External Email]

On Fri, May 17, 2019 at 10:43 PM Grodzovsky, Andrey
 wrote:
> On 5/17/19 3:35 PM, Erico Nunes wrote:
> > Lima currently defaults to an "infinite" timeout. Setting a 500ms
> > default timeout like most other drm_sched users do fixed the leak for
> > me.
>
> I am not very clear about the problem - so you basically never allow a
> timeout handler to run? And then when the job hangs forever you get
> this memory leak? How did it work for you before this refactoring? As
> far as I remember, sched->ops->free_job before this change was called
> from drm_sched_job_finish, which is the work scheduled from the HW fence
> signaled callback - drm_sched_process_job. So if your job hangs forever
> anyway and this work is never scheduled, when was your free_job callback
> called?

In this particular case, the jobs run successfully, nothing hangs.
Lima currently specifies an "infinite" timeout to the drm scheduler,
so if a job did hang, it would hang forever, I suppose. But this
is not the issue.

If I understand correctly, it worked well before the rework because the
cleanup was triggered at the end of drm_sched_process_job,
independently of the timeout.

What I'm observing now is that even when jobs run successfully, they
are not cleaned up by the drm scheduler because drm_sched_cleanup_jobs
seems to give up based on the status of the timeout worker.
I would expect the timeout value to only be relevant in error/hung job cases.

I will probably set the timeout to a reasonable value anyway; I just
posted here to report that this may be a bug in the drm
scheduler introduced by that rework.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: lima_bo memory leak after drm_sched job destruction rework

2019-05-17 Thread Erico Nunes
On Fri, May 17, 2019 at 10:43 PM Grodzovsky, Andrey
 wrote:
> On 5/17/19 3:35 PM, Erico Nunes wrote:
> > Lima currently defaults to an "infinite" timeout. Setting a 500ms
> > default timeout like most other drm_sched users do fixed the leak for
> > me.
>
> I am not very clear about the problem - so you basically never allow a
> timeout handler to run? And then when the job hangs forever you get
> this memory leak? How did it work for you before this refactoring? As
> far as I remember, sched->ops->free_job before this change was called
> from drm_sched_job_finish, which is the work scheduled from the HW fence
> signaled callback - drm_sched_process_job. So if your job hangs forever
> anyway and this work is never scheduled, when was your free_job callback
> called?

In this particular case, the jobs run successfully, nothing hangs.
Lima currently specifies an "infinite" timeout to the drm scheduler,
so if a job did hang, it would hang forever, I suppose. But this
is not the issue.

If I understand correctly, it worked well before the rework because the
cleanup was triggered at the end of drm_sched_process_job,
independently of the timeout.

What I'm observing now is that even when jobs run successfully, they
are not cleaned up by the drm scheduler because drm_sched_cleanup_jobs
seems to give up based on the status of the timeout worker.
I would expect the timeout value to only be relevant in error/hung job cases.

I will probably set the timeout to a reasonable value anyway; I just
posted here to report that this may be a bug in the drm
scheduler introduced by that rework.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

lima_bo memory leak after drm_sched job destruction rework

2019-05-17 Thread Erico Nunes
Hello,

I have recently observed a memory leak issue with lima using
drm-misc-next, which I initially reported here:
https://gitlab.freedesktop.org/lima/linux/issues/24
It is an easily reproducible memory leak which I was able to bisect to commit:

5918045c4ed4 drm/scheduler: rework job destruction

After some investigation, it seems that after the refactor,
sched->ops->free_job (in lima: lima_sched_free_job) is no longer
called.
With some more debugging I found that it is not being called because
the job freeing is now in drm_sched_cleanup_jobs, which for lima
always aborts in the initial "Don't destroy jobs while the timeout
worker is running" condition.

Lima currently defaults to an "infinite" timeout. Setting a 500ms
default timeout like most other drm_sched users do fixed the leak for
me.
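
For reference, the interaction being described looks roughly like this; a
paraphrased sketch of the scheduler logic after commit 5918045c4ed4, not the
verbatim upstream source:

/* drm_sched_start_timeout() only queues the timeout worker when a real
 * timeout is set, so with lima's MAX_SCHEDULE_TIMEOUT the work_tdr work
 * is never pending. */
static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
{
	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
	    !list_empty(&sched->ring_mirror_list))
		schedule_delayed_work(&sched->work_tdr, sched->timeout);
}

static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
{
	/* "Don't destroy jobs while the timeout worker is running": since
	 * work_tdr was never queued, cancel_delayed_work() returns false
	 * and we return here every time, so sched->ops->free_job() (for
	 * lima, lima_sched_free_job) is never reached and the jobs leak. */
	if (!cancel_delayed_work(&sched->work_tdr))
		return;

	/* ... otherwise finished jobs would be removed from the mirror
	 * list and freed via sched->ops->free_job() ... */
}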

I can send a patch to set a 500ms timeout, which would probably get it
working again, but I am wondering now if this is the expected behaviour for
drm_sched after the refactor.
In particular I also noticed that drm_sched_suspend_timeout is not
called anywhere. Is it expected that we now rely on a timeout
parameter to clean up jobs that ran successfully?

Erico
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: lima_bo memory leak after drm_sched job destruction rework

2019-05-17 Thread Grodzovsky, Andrey

On 5/17/19 3:35 PM, Erico Nunes wrote:
> [CAUTION: External Email]
>
> Hello,
>
> I have recently observed a memory leak issue with lima using
> drm-misc-next, which I initially reported here:
> https://gitlab.freedesktop.org/lima/linux/issues/24
> It is an easily reproducible memory leak which I was able to bisect to 
> commit:
>
> 5918045c4ed4 drm/scheduler: rework job destruction
>
> After some investigation, it seems that after the refactor,
> sched->ops->free_job (in lima: lima_sched_free_job) is no longer
> called.
> With some more debugging I found that it is not being called because
> the job freeing is now in drm_sched_cleanup_jobs, which for lima
> always aborts in the initial "Don't destroy jobs while the timeout
> worker is running" condition.
>
> Lima currently defaults to an "infinite" timeout. Setting a 500ms
> default timeout like most other drm_sched users do fixed the leak for
> me.


I am not very clear about the problem - so you basically never allow a
timeout handler to run? And then when the job hangs forever you get
this memory leak? How did it work for you before this refactoring? As
far as I remember, sched->ops->free_job before this change was called
from drm_sched_job_finish, which is the work scheduled from the HW fence
signaled callback - drm_sched_process_job. So if your job hangs forever
anyway and this work is never scheduled, when was your free_job callback
called?


>
> I can send a patch to set a 500ms timeout, which would probably get it
> working again, but I am wondering now if this is the expected behaviour for
> drm_sched after the refactor.
> In particular I also noticed that drm_sched_suspend_timeout is not
> called anywhere. Is it expected that we now rely on a timeout
> parameter to clean up jobs that ran successfully?


AFAIK drm_sched_suspend_timeout is used by a driver in a staging
branch; Christian can give more details.

Andrey


>
> Erico
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH v2 4/4] drm/amdkfd: Check against device cgroup

2019-05-17 Thread Kuehling, Felix
Patches 1,2,4 will be submitted through amd-staging-drm-next. Patch 3 
goes through the cgroup tree. Patch 4 depends on patch 3. So submitting 
patch 4 will need to wait until we rebase amd-staging-drm-next on a new 
enough kernel release that includes patch 3.

Patch 1 and 2 could be submitted now or wait for patch 3 as well so we 
submit all our cgroup stuff in amdgpu and KFD together. It probably 
makes most sense to wait since unused code tends to rot.

Patches 1,2,4 are already reviewed by me. Feel free to add my Acked-by 
to patch 3.

Thanks,
   Felix

On 2019-05-17 12:15 p.m., Kasiviswanathan, Harish wrote:
> [CAUTION: External Email]
>
> Participate in device cgroup. All kfd devices are exposed via /dev/kfd.
> So use /dev/dri/renderN node.
>
> Before exposing the device to a task check if it has permission to
> access it. If the task (based on its cgroup) can access /dev/dri/renderN
> then expose the device via kfd node.
>
> If the task cannot access /dev/dri/renderN then process device data
> (pdd) is not created. This will ensure that task cannot use the device.
>
> In sysfs topology, all device nodes are visible irrespective of the task
> cgroup. The sysfs node directories are created at driver load time and
> cannot be changed dynamically. However, access to information inside
> nodes is controlled based on the task's cgroup permissions.
>
> Change-Id: Ia7ed40930542111ac9f74b0706c3fa5ebf186b3c
> Signed-off-by: Harish Kasiviswanathan 
> Reviewed-by: Felix Kuehling 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c |  9 +++--
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 17 +
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c| 12 
>   3 files changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
> index 22a8e88b6a67..85c8838323a2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
> @@ -369,8 +369,13 @@ int kfd_init_apertures(struct kfd_process *process)
>
>  /*Iterating over all devices*/
>  while (kfd_topology_enum_kfd_devices(id, ) == 0) {
> -   if (!dev) {
> -   id++; /* Skip non GPU devices */
> +   if (!dev || kfd_devcgroup_check_permission(dev)) {
> +   /* Skip non GPU devices and devices to which the
> +* current process have no access to. Access can be
> +* limited by placing the process in a specific
> +* cgroup hierarchy.
> +*/
> +   id++;
>  continue;
>  }
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index cbba0047052d..85f55022014a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -35,6 +35,8 @@
>   #include 
>   #include 
>   #include 
> +#include 
> +#include 
>   #include 
>
>   #include "amd_shared.h"
> @@ -989,6 +991,21 @@ bool kfd_is_locked(void);
>   void kfd_inc_compute_active(struct kfd_dev *dev);
>   void kfd_dec_compute_active(struct kfd_dev *dev);
>
> +/* Cgroup Support */
> +/* Check with device cgroup if @kfd device is accessible */
> +static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
> +{
> +#if defined(CONFIG_CGROUP_DEVICE)
> +   struct drm_device *ddev = kfd->ddev;
> +
> +   return devcgroup_check_permission(DEVCG_DEV_CHAR, ddev->driver->major,
> + ddev->render->index,
> + DEVCG_ACC_WRITE | DEVCG_ACC_READ);
> +#else
> +   return 0;
> +#endif
> +}
> +
>   /* Debugfs */
>   #if defined(CONFIG_DEBUG_FS)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 64d667ae0d36..a3a14a76ece1 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -269,6 +269,8 @@ static ssize_t iolink_show(struct kobject *kobj, struct 
> attribute *attr,
>  buffer[0] = 0;
>
>  iolink = container_of(attr, struct kfd_iolink_properties, attr);
> +   if (iolink->gpu && kfd_devcgroup_check_permission(iolink->gpu))
> +   return -EPERM;
>  sysfs_show_32bit_prop(buffer, "type", iolink->iolink_type);
>  sysfs_show_32bit_prop(buffer, "version_major", iolink->ver_maj);
>  sysfs_show_32bit_prop(buffer, "version_minor", iolink->ver_min);
> @@ -305,6 +307,8 @@ static ssize_t mem_show(struct kobject *kobj, struct 
> attribute *attr,
>  buffer[0] = 0;
>
>  mem = container_of(attr, struct kfd_mem_properties, attr);
> +   if (mem->gpu && kfd_devcgroup_check_permission(mem->gpu))
> +   return -EPERM;
>  sysfs_show_32bit_prop(buffer, "heap_type", 

Re: [PATCH v2 0/4] AMDKFD (AMD GPU compute) support for device cgroup.

2019-05-17 Thread Kasiviswanathan, Harish
Hi Tejun,

Thanks for the comments. I can definitely add more documentation, but I just
need a bit of clarification on what you had in mind:

1) Documentation for users on how to use the device cgroup with the amdkfd
device (I have some more information on this in patch 4),
or
2) the reason devcgroup_check_permission() needs to be exported,
or
3) something else entirely that I missed.

Best Regards,
Harish


From: Tejun Heo  on behalf of Tejun Heo 
Sent: Friday, May 17, 2019 12:49 PM
To: Kasiviswanathan, Harish
Cc: cgro...@vger.kernel.org; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v2 0/4] AMDKFD (AMD GPU compute) support for device cgroup.
  

[CAUTION: External Email]

On Fri, May 17, 2019 at 04:14:52PM +, Kasiviswanathan, Harish wrote:
> amdkfd (part of amdgpu) driver supports the AMD GPU compute stack.
> amdkfd exposes only a single device /dev/kfd even if multiple AMD GPU
> (compute) devices exist in a system. However, the amdgpu driver exposes a
> separate render device file /dev/dri/renderDN for each device. To participate
> in device cgroup the amdkfd driver will rely on these render device files.
>
> v2: Exporting devcgroup_check_permission() instead of
> __devcgroup_check_permission() as per review comments.

Looks fine to me but given how non-obvious it is, some documentation
would be great.

Thanks.

--
tejun

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: use div64_ul for 32-bit compatibility v1

2019-05-17 Thread Alex Deucher
On Fri, May 17, 2019 at 2:34 PM Abramov, Slava  wrote:
>
> v1: replace casting to unsigned long with div64_ul
>
> Change-Id: Ia48671ed0756bb73c7b4760a800bcb6f600cbef2
> Signed-off-by: Slava Abramov 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index da1dc40b9b14..d5719b0fb82c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -764,8 +764,8 @@ static ssize_t amdgpu_ras_sysfs_badpages_read(struct file 
> *f,
>   struct amdgpu_device *adev = con->adev;
>   const unsigned int element_size =
>   sizeof("0xabcdabcd : 0x12345678 : R\n") - 1;
> - unsigned int start = (ppos + element_size - 1) / element_size;
> - unsigned int end = (ppos + count - 1) / element_size;
> + unsigned int start = div64_ul(ppos + element_size - 1, element_size);
> + unsigned int end = div64_ul(ppos + count - 1, element_size);
>   ssize_t s = 0;
>   struct ras_badpage *bps = NULL;
>   unsigned int bps_count = 0;
> --
> 2.17.1
>
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu/soc15: skip reset on init

2019-05-17 Thread Lin, Amber
Acked-by: Amber Lin 

On 2019-05-17 1:02 p.m., Alex Deucher wrote:
> [CAUTION: External Email]
>
> On Fri, May 17, 2019 at 10:47 AM Lin, Amber  wrote:
>>
>>
>> On 2019-05-17 10:26 a.m., Alex Deucher wrote:
>>> [CAUTION: External Email]
>>>
>>> Not necessary on soc15 and breaks driver reload on server cards.
>>>
>>> Signed-off-by: Alex Deucher 
>>> Cc: sta...@vger.kernel.org
>>> ---
>>>drivers/gpu/drm/amd/amdgpu/soc15.c | 5 +
>>>1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
>>> b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> index 32dc5a128249..78bd4fc07bab 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> @@ -761,6 +761,11 @@ static bool soc15_need_reset_on_init(struct 
>>> amdgpu_device *adev)
>>>{
>>>   u32 sol_reg;
>>>
>>> +   /* Just return false for soc15 GPUs.  Reset does not seem to
>>> +* be necessary.
>>> +*/
>>> +   return false;
>>> +
>>>   if (adev->flags & AMD_IS_APU)
>>>   return false;
>> Should we remove the rest of the code in this function, and sol_reg as well,
>> and simply return false?
> I was thinking we'd leave it in place for now in case we need to
> re-enable it for something else in the future, but I guess we can just
> revert the change if need be.  I don't have a strong opinion either
> way.
>
> Alex
>
>>> --
>>> 2.20.1
>>>
>>> ___
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: Add Unique Identifier sysfs file unique_id v2

2019-05-17 Thread Alex Deucher
On Fri, May 17, 2019 at 2:31 PM Russell, Kent  wrote:
>
> Add a file that provides a Unique ID for the GPU.
> This will persist across machines and is guaranteed to be unique.
> This is only available for GFX9 and newer, so older ASICs will not
> have this file in the sysfs pool
>
> v2: Store it in adev for ASICs that don't have a hwmgr
>
> Change-Id: I3c673f78efcc5bf93ca58d65edbe39fc3a86b42a
> Signed-off-by: Kent Russell 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c| 32 +++
>  .../drm/amd/powerplay/hwmgr/vega10_hwmgr.c|  9 ++
>  .../drm/amd/powerplay/hwmgr/vega12_hwmgr.c| 10 ++
>  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 10 ++
>  5 files changed, 63 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 96394ef3fae3..d355e9a09ad1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -960,6 +960,8 @@ struct amdgpu_device {
> longsdma_timeout;
> longvideo_timeout;
> longcompute_timeout;
> +
> +   uint64_tunique_id;
>  };
>
>  static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device 
> *bdev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index adba9ea03e63..a73e1903d29b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -1368,6 +1368,29 @@ static ssize_t amdgpu_get_pcie_bw(struct device *dev,
> count0, count1, pcie_get_mps(adev->pdev));
>  }
>
> +/**
> + * DOC: unique_id
> + *
> + * The amdgpu driver provides a sysfs API for providing a unique ID for the 
> GPU
> + * The file unique_id is used for this.
> + * This will provide a Unique ID that will persist from machine to machine
> + *
> + * NOTE: This will only work for GFX9 and newer. This file will be absent
> + * on unsupported ASICs (GFX8 and older)
> + */
> +static ssize_t amdgpu_get_unique_id(struct device *dev,
> +   struct device_attribute *attr,
> +   char *buf)
> +{
> +   struct drm_device *ddev = dev_get_drvdata(dev);
> +   struct amdgpu_device *adev = ddev->dev_private;
> +
> +   if (adev->unique_id)
> +   return snprintf(buf, PAGE_SIZE, "%016llx\n", adev->unique_id);
> +
> +   return 0;
> +}
> +
>  static DEVICE_ATTR(power_dpm_state, S_IRUGO | S_IWUSR, amdgpu_get_dpm_state, 
> amdgpu_set_dpm_state);
>  static DEVICE_ATTR(power_dpm_force_performance_level, S_IRUGO | S_IWUSR,
>amdgpu_get_dpm_forced_performance_level,
> @@ -1418,6 +1441,7 @@ static DEVICE_ATTR(pcie_bw, S_IRUGO, 
> amdgpu_get_pcie_bw, NULL);
>  static DEVICE_ATTR(ppfeatures, S_IRUGO | S_IWUSR,
> amdgpu_get_ppfeature_status,
> amdgpu_set_ppfeature_status);
> +static DEVICE_ATTR(unique_id, S_IRUGO, amdgpu_get_unique_id, NULL);
>
>  static ssize_t amdgpu_hwmon_show_temp(struct device *dev,
>   struct device_attribute *attr,
> @@ -2814,6 +2838,12 @@ int amdgpu_pm_sysfs_init(struct amdgpu_device *adev)
> return ret;
> }
> }
> +   if (adev->unique_id)
> +   ret = device_create_file(adev->dev, _attr_unique_id);
> +   if (ret) {
> +   DRM_ERROR("failed to create device file unique_id\n");
> +   return ret;
> +   }
> ret = amdgpu_debugfs_pm_init(adev);
> if (ret) {
> DRM_ERROR("Failed to register debugfs file for dpm!\n");
> @@ -2875,6 +2905,8 @@ void amdgpu_pm_sysfs_fini(struct amdgpu_device *adev)
> device_remove_file(adev->dev, _attr_mem_busy_percent);
> if (!(adev->flags & AMD_IS_APU))
> device_remove_file(adev->dev, _attr_pcie_bw);
> +   if (adev->unique_id)
> +   device_remove_file(adev->dev, _attr_unique_id);
> if ((adev->asic_type >= CHIP_VEGA10) &&
> !(adev->flags & AMD_IS_APU))
> device_remove_file(adev->dev, _attr_ppfeatures);
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c 
> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> index 9585ba51d853..ce6aeb5a0362 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> @@ -356,6 +356,7 @@ static void vega10_init_dpm_defaults(struct pp_hwmgr 
> *hwmgr)
> struct vega10_hwmgr *data = hwmgr->backend;
> int i;
> uint32_t sub_vendor_id, hw_revision;
> +   uint32_t top32, bottom32;
> struct amdgpu_device *adev = hwmgr->adev;
>
> vega10_initialize_power_tune_defaults(hwmgr);
> @@ -499,6 +500,14 @@ static void vega10_init_dpm_defaults(struct pp_hwmgr 
> *hwmgr)

[PATCH] drm/amdgpu: use div64_ul for 32-bit compatibility v1

2019-05-17 Thread Abramov, Slava
v1: replace casting to unsigned long with div64_ul

Change-Id: Ia48671ed0756bb73c7b4760a800bcb6f600cbef2
Signed-off-by: Slava Abramov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index da1dc40b9b14..d5719b0fb82c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -764,8 +764,8 @@ static ssize_t amdgpu_ras_sysfs_badpages_read(struct file 
*f,
  struct amdgpu_device *adev = con->adev;
  const unsigned int element_size =
  sizeof("0xabcdabcd : 0x12345678 : R\n") - 1;
- unsigned int start = (ppos + element_size - 1) / element_size;
- unsigned int end = (ppos + count - 1) / element_size;
+ unsigned int start = div64_ul(ppos + element_size - 1, element_size);
+ unsigned int end = div64_ul(ppos + count - 1, element_size);
  ssize_t s = 0;
  struct ras_badpage *bps = NULL;
  unsigned int bps_count = 0;
--
2.17.1


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: Add Unique Identifier sysfs file unique_id v2

2019-05-17 Thread Russell, Kent
Add a file that provides a Unique ID for the GPU.
This will persist across machines and is guaranteed to be unique.
This is only available for GFX9 and newer, so older ASICs will not
have this file in the sysfs pool

v2: Store it in adev for ASICs that don't have a hwmgr

Change-Id: I3c673f78efcc5bf93ca58d65edbe39fc3a86b42a
Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c| 32 +++
 .../drm/amd/powerplay/hwmgr/vega10_hwmgr.c|  9 ++
 .../drm/amd/powerplay/hwmgr/vega12_hwmgr.c| 10 ++
 .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 10 ++
 5 files changed, 63 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 96394ef3fae3..d355e9a09ad1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -960,6 +960,8 @@ struct amdgpu_device {
longsdma_timeout;
longvideo_timeout;
longcompute_timeout;
+
+   uint64_tunique_id;
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
index adba9ea03e63..a73e1903d29b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
@@ -1368,6 +1368,29 @@ static ssize_t amdgpu_get_pcie_bw(struct device *dev,
count0, count1, pcie_get_mps(adev->pdev));
 }
 
+/**
+ * DOC: unique_id
+ *
+ * The amdgpu driver provides a sysfs API for providing a unique ID for the GPU
+ * The file unique_id is used for this.
+ * This will provide a Unique ID that will persist from machine to machine
+ *
+ * NOTE: This will only work for GFX9 and newer. This file will be absent
+ * on unsupported ASICs (GFX8 and older)
+ */
+static ssize_t amdgpu_get_unique_id(struct device *dev,
+   struct device_attribute *attr,
+   char *buf)
+{
+   struct drm_device *ddev = dev_get_drvdata(dev);
+   struct amdgpu_device *adev = ddev->dev_private;
+
+   if (adev->unique_id)
+   return snprintf(buf, PAGE_SIZE, "%016llx\n", adev->unique_id);
+
+   return 0;
+}
+
 static DEVICE_ATTR(power_dpm_state, S_IRUGO | S_IWUSR, amdgpu_get_dpm_state, 
amdgpu_set_dpm_state);
 static DEVICE_ATTR(power_dpm_force_performance_level, S_IRUGO | S_IWUSR,
   amdgpu_get_dpm_forced_performance_level,
@@ -1418,6 +1441,7 @@ static DEVICE_ATTR(pcie_bw, S_IRUGO, amdgpu_get_pcie_bw, 
NULL);
 static DEVICE_ATTR(ppfeatures, S_IRUGO | S_IWUSR,
amdgpu_get_ppfeature_status,
amdgpu_set_ppfeature_status);
+static DEVICE_ATTR(unique_id, S_IRUGO, amdgpu_get_unique_id, NULL);
 
 static ssize_t amdgpu_hwmon_show_temp(struct device *dev,
  struct device_attribute *attr,
@@ -2814,6 +2838,12 @@ int amdgpu_pm_sysfs_init(struct amdgpu_device *adev)
return ret;
}
}
+   if (adev->unique_id)
+   ret = device_create_file(adev->dev, _attr_unique_id);
+   if (ret) {
+   DRM_ERROR("failed to create device file unique_id\n");
+   return ret;
+   }
ret = amdgpu_debugfs_pm_init(adev);
if (ret) {
DRM_ERROR("Failed to register debugfs file for dpm!\n");
@@ -2875,6 +2905,8 @@ void amdgpu_pm_sysfs_fini(struct amdgpu_device *adev)
device_remove_file(adev->dev, _attr_mem_busy_percent);
if (!(adev->flags & AMD_IS_APU))
device_remove_file(adev->dev, _attr_pcie_bw);
+   if (adev->unique_id)
+   device_remove_file(adev->dev, _attr_unique_id);
if ((adev->asic_type >= CHIP_VEGA10) &&
!(adev->flags & AMD_IS_APU))
device_remove_file(adev->dev, _attr_ppfeatures);
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
index 9585ba51d853..ce6aeb5a0362 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
@@ -356,6 +356,7 @@ static void vega10_init_dpm_defaults(struct pp_hwmgr *hwmgr)
struct vega10_hwmgr *data = hwmgr->backend;
int i;
uint32_t sub_vendor_id, hw_revision;
+   uint32_t top32, bottom32;
struct amdgpu_device *adev = hwmgr->adev;
 
vega10_initialize_power_tune_defaults(hwmgr);
@@ -499,6 +500,14 @@ static void vega10_init_dpm_defaults(struct pp_hwmgr 
*hwmgr)
(hw_revision == 0) &&
(sub_vendor_id != 0x1002))
data->smu_features[GNLD_PCC_LIMIT].supported = true;
+
+   /* Get the SN to turn into a Unique ID */
+   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumTop32);
+   

Re: [PATCH] drm/amdgpu: Add Unique Identifier sysfs file unique_id

2019-05-17 Thread Alex Deucher
On Fri, May 17, 2019 at 9:45 AM Russell, Kent  wrote:
>
> Add a file that provides a Unique ID for the GPU.
> This will persist across machines and is guaranteed to be unique.
> This is only available for GFX9 and newer, so older ASICs will not
> have this file in the sysfs pool
>
> Change-Id: I3c673f78efcc5bf93ca58d65edbe39fc3a86b42a
> Signed-off-by: Kent Russell 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c| 33 +++
>  .../drm/amd/powerplay/hwmgr/vega10_hwmgr.c|  9 +
>  .../drm/amd/powerplay/hwmgr/vega12_hwmgr.c|  9 +
>  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c|  9 +
>  drivers/gpu/drm/amd/powerplay/inc/hwmgr.h |  1 +
>  5 files changed, 61 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index adba9ea03e63..e4e5dbbc69c1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -1368,6 +1368,30 @@ static ssize_t amdgpu_get_pcie_bw(struct device *dev,
> count0, count1, pcie_get_mps(adev->pdev));
>  }
>
> +/**
> + * DOC: unique_id
> + *
> + * The amdgpu driver provides a sysfs API for providing a unique ID for the 
> GPU
> + * The file unique_id is used for this.
> + * This will provide a Unique ID that will persist from machine to machine
> + *
> + * NOTE: This will only work for GFX9 and newer. This file will be absent
> + * on unsupported ASICs (GFX8 and older)
> + */
> +static ssize_t amdgpu_get_unique_id(struct device *dev,
> +   struct device_attribute *attr,
> +   char *buf)
> +{
> +   struct drm_device *ddev = dev_get_drvdata(dev);
> +   struct amdgpu_device *adev = ddev->dev_private;
> +   struct pp_hwmgr *hwmgr = adev->powerplay.pp_handle;
> +
> +   if (hwmgr->serial)
> +   return snprintf(buf, PAGE_SIZE, "%016llx\n", hwmgr->serial);
> +
> +   return 0;
> +}
> +
>  static DEVICE_ATTR(power_dpm_state, S_IRUGO | S_IWUSR, amdgpu_get_dpm_state, 
> amdgpu_set_dpm_state);
>  static DEVICE_ATTR(power_dpm_force_performance_level, S_IRUGO | S_IWUSR,
>amdgpu_get_dpm_forced_performance_level,
> @@ -1418,6 +1442,7 @@ static DEVICE_ATTR(pcie_bw, S_IRUGO, 
> amdgpu_get_pcie_bw, NULL);
>  static DEVICE_ATTR(ppfeatures, S_IRUGO | S_IWUSR,
> amdgpu_get_ppfeature_status,
> amdgpu_set_ppfeature_status);
> +static DEVICE_ATTR(unique_id, S_IRUGO, amdgpu_get_unique_id, NULL);
>
>  static ssize_t amdgpu_hwmon_show_temp(struct device *dev,
>   struct device_attribute *attr,
> @@ -2814,6 +2839,12 @@ int amdgpu_pm_sysfs_init(struct amdgpu_device *adev)
> return ret;
> }
> }
> +   if (hwmgr->serial)
> +   ret = device_create_file(adev->dev, _attr_unique_id);
> +   if (ret) {
> +   DRM_ERROR("failed to create device file unique_id\n");
> +   return ret;
> +   }

I would suggest storing serial in adev rather than hwmgr.  For some
asics (e.g., SI or CIK APUs) there is no hwmgr and we'll crash if we
try and dereference it.  Additionally, the new smu11 code for vega20
does not use hwmgr.  You can get to adev from hwmgr->adev.

Alex

> ret = amdgpu_debugfs_pm_init(adev);
> if (ret) {
> DRM_ERROR("Failed to register debugfs file for dpm!\n");
> @@ -2875,6 +2906,8 @@ void amdgpu_pm_sysfs_fini(struct amdgpu_device *adev)
> device_remove_file(adev->dev, _attr_mem_busy_percent);
> if (!(adev->flags & AMD_IS_APU))
> device_remove_file(adev->dev, _attr_pcie_bw);
> +   if (hwmgr->serial)
> +   device_remove_file(adev->dev, _attr_unique_id);
> if ((adev->asic_type >= CHIP_VEGA10) &&
> !(adev->flags & AMD_IS_APU))
> device_remove_file(adev->dev, _attr_ppfeatures);
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c 
> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> index 9585ba51d853..6b5facca7a81 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> @@ -356,6 +356,7 @@ static void vega10_init_dpm_defaults(struct pp_hwmgr 
> *hwmgr)
> struct vega10_hwmgr *data = hwmgr->backend;
> int i;
> uint32_t sub_vendor_id, hw_revision;
> +   uint32_t top32, bottom32;
> struct amdgpu_device *adev = hwmgr->adev;
>
> vega10_initialize_power_tune_defaults(hwmgr);
> @@ -499,6 +500,14 @@ static void vega10_init_dpm_defaults(struct pp_hwmgr 
> *hwmgr)
> (hw_revision == 0) &&
> (sub_vendor_id != 0x1002))
> data->smu_features[GNLD_PCC_LIMIT].supported = true;
> +
> +   /* Get the SN to turn into a Unique ID */
> +   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumTop32);
> +   top32 = smum_get_argument(hwmgr);
> 

XDC 2019: Registration & Call for Proposals now open!

2019-05-17 Thread Mark Filion
Hello!

Registration & Call for Proposals are now open for XDC 2019, which will
take place at the Concordia University Conference Centre in Montréal,
Canada on October 2-4, 2019.

Thanks to LWN.net, this year we have a brand new website using the
Indico platform, a fully open source event management tool!

https://xdc2019.x.org

If you plan on attending, please make sure to register as early as
possible as space is limited! As usual, the conference is free of
charge and open to the general public.

Please note, as we are now based on Indico, we will no longer use the
wiki for registration. You will therefore need to register via the XDC
website. However, as XDC is sharing the same Indico infrastructure as
LPC, if you previously registered on the LPC website
(linuxplumbersconference.org), then you already have an active account
and can use the same username & password to login!

https://xdc2019.x.org/event/5/registrations/2/

In addition to registration, the CfP is now open for talks, workshops
and demos at XDC 2019. While any serious proposal will be gratefully
considered, topics of interest to X.Org and freedesktop.org developers
are encouraged. The program focus is on new development, ongoing
challenges and anything else that will spark discussions among
attendees in the hallway track.

We are open to talks across all layers of the graphics stack, from the
kernel to desktop environments / graphical applications and about how
to make things better for the developers who build them. Head to the
CfP page to learn more: 

https://xdc2019.x.org/event/5/abstracts/

The deadline for submissions is Sunday, 7 July 2019.

We are looking forward to seeing you (and sharing a poutine!) in
beautiful Montréal! If you have any questions, please send me an email
to mark.fil...@collabora.com, adding on CC the X.org board (board at
foundation.x.org).

And don't forget, you can follow us on Twitter for all the latest
updates and to stay connected:

https://twitter.com/xdc2019

Best,

Mark

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu/gmc9: set vram_width properly for SR-IOV

2019-05-17 Thread Alex Deucher
For SR-IOV, vram_width can't be read from ATOM (as on
RAVEN), and the DF-related registers are not readable, so hardcoding
is the only way to set the correct vram_width.

Signed-off-by: Trigger Huang 
Signed-off-by: Yintian Tao 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 59c580bd5a3b..9750b632e9aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -835,8 +835,16 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev)
int chansize, numchan;
int r;
 
-   if (amdgpu_emu_mode != 1)
+   if (amdgpu_sriov_vf(adev)) {
+   /* For Vega10 SR-IOV, vram_width can't be read from ATOM as 
RAVEN,
+* and DF related registers is not readable, seems hardcord is 
the
+* only way to set the correct vram_width
+*/
+   adev->gmc.vram_width = 2048;
+   } else if (amdgpu_emu_mode != 1) {
adev->gmc.vram_width = amdgpu_atomfirmware_get_vram_width(adev);
+   }
+
if (!adev->gmc.vram_width) {
/* hbm memory channel size */
if (adev->flags & AMD_IS_APU)
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH v2 3/4] device_cgroup: Export devcgroup_check_permission

2019-05-17 Thread Roman Gushchin
On Fri, May 17, 2019 at 04:15:06PM +, Kasiviswanathan, Harish wrote:
> For AMD compute (amdkfd) driver.
> 
> All AMD compute devices are exported via single device node /dev/kfd. As
> a result devices cannot be controlled individually using device cgroup.
> 
> AMD compute devices will rely on its graphics counterpart that exposes
> /dev/dri/renderN node for each device. For each task (based on its
> cgroup), KFD driver will check if /dev/dri/renderN node is accessible
> before exposing it.
> 
> Change-Id: I1b9705b2c30622a27655f4f878980fa138dbf373
> Signed-off-by: Harish Kasiviswanathan 
> ---
>  include/linux/device_cgroup.h | 19 ---
>  security/device_cgroup.c  | 15 +--
>  2 files changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
> index 8557efe096dc..bd19897bd582 100644
> --- a/include/linux/device_cgroup.h
> +++ b/include/linux/device_cgroup.h
> @@ -12,26 +12,15 @@
>  #define DEVCG_DEV_ALL   4  /* this represents all devices */
>  
>  #ifdef CONFIG_CGROUP_DEVICE
> -extern int __devcgroup_check_permission(short type, u32 major, u32 minor,
> - short access);
> +extern int devcgroup_check_permission(short type, u32 major, u32 minor,
> +   short access);
>  #else
> -static inline int __devcgroup_check_permission(short type, u32 major, u32 
> minor,
> -short access)
> +static inline int devcgroup_check_permission(short type, u32 major, u32 
> minor,
> +  short access)
>  { return 0; }
>  #endif
>  
>  #if defined(CONFIG_CGROUP_DEVICE) || defined(CONFIG_CGROUP_BPF)
> -static inline int devcgroup_check_permission(short type, u32 major, u32 
> minor,
> -  short access)
> -{
> - int rc = BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type, major, minor, access);
> -
> - if (rc)
> - return -EPERM;
> -
> - return __devcgroup_check_permission(type, major, minor, access);
> -}
> -
>  static inline int devcgroup_inode_permission(struct inode *inode, int mask)
>  {
>   short type, access = 0;
> diff --git a/security/device_cgroup.c b/security/device_cgroup.c
> index cd97929fac66..3c57e05bf73b 100644
> --- a/security/device_cgroup.c
> +++ b/security/device_cgroup.c
> @@ -801,8 +801,8 @@ struct cgroup_subsys devices_cgrp_subsys = {
>   *
>   * returns 0 on success, -EPERM case the operation is not permitted
>   */
> -int __devcgroup_check_permission(short type, u32 major, u32 minor,
> -  short access)
> +static int __devcgroup_check_permission(short type, u32 major, u32 minor,
> + short access)
>  {
>   struct dev_cgroup *dev_cgroup;
>   bool rc;
> @@ -824,3 +824,14 @@ int __devcgroup_check_permission(short type, u32 major, 
> u32 minor,
>  
>   return 0;
>  }
> +
> +int devcgroup_check_permission(short type, u32 major, u32 minor, short 
> access)
> +{
> + int rc = BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type, major, minor, access);
> +
> + if (rc)
> + return -EPERM;
> +
> + return __devcgroup_check_permission(type, major, minor, access);
> +}
> +EXPORT_SYMBOL(devcgroup_check_permission);

Perfect, now looks good to me!

Please, feel free to use my acks for patches 3 and 4:
Acked-by: Roman Gushchin 

Thanks!
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu/soc15: skip reset on init

2019-05-17 Thread Alex Deucher
On Fri, May 17, 2019 at 10:47 AM Lin, Amber  wrote:
>
>
>
> On 2019-05-17 10:26 a.m., Alex Deucher wrote:
> > [CAUTION: External Email]
> >
> > Not necessary on soc15 and breaks driver reload on server cards.
> >
> > Signed-off-by: Alex Deucher 
> > Cc: sta...@vger.kernel.org
> > ---
> >   drivers/gpu/drm/amd/amdgpu/soc15.c | 5 +
> >   1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
> > b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > index 32dc5a128249..78bd4fc07bab 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > @@ -761,6 +761,11 @@ static bool soc15_need_reset_on_init(struct 
> > amdgpu_device *adev)
> >   {
> >  u32 sol_reg;
> >
> > +   /* Just return false for soc15 GPUs.  Reset does not seem to
> > +* be necessary.
> > +*/
> > +   return false;
> > +
> >  if (adev->flags & AMD_IS_APU)
> >  return false;
> Should we remove the rest of the code in this function, and sol_reg as well,
> and simply return false?

I was thinking we'd leave it in place for now in case we need to
re-enable it for something else in the future, but I guess we can just
revert the change if need be.  I don't have a strong opinion either
way.

Alex

> >
> > --
> > 2.20.1
> >
> > ___
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: GPU passthrough support for Stoney [Radeon R2/R3/R4/R5 Graphics]?

2019-05-17 Thread Alex Deucher
On Fri, May 17, 2019 at 11:36 AM Micah Morton  wrote:
>
> On Thu, May 16, 2019 at 1:39 PM Alex Deucher  wrote:
> >
> > On Thu, May 16, 2019 at 4:07 PM Micah Morton  wrote:
> > >
> > > On Wed, May 15, 2019 at 7:19 PM Alex Deucher  
> > > wrote:
> > > >
> > > > On Wed, May 15, 2019 at 2:26 PM Micah Morton  
> > > > wrote:
> > > > >
> > > > > Hi folks,
> > > > >
> > > > > I'm interested in running a VM on a system with an integrated Stoney
> > > > > [Radeon R2/R3/R4/R5 Graphics] card and passing through the graphics
> > > > > card to the VM using the IOMMU. I'm wondering whether this is feasible
> > > > > and supposed to be doable with the right setup (as opposed to passing
> > > > > a discrete GPU to the VM, which I think is definitely doable?).
> > > > >
> > > > > So far, I can do all the qemu/kvm/vfio/iommu stuff to run the VM and
> > > > > pass the integrated GPU to it, but the drm driver in the VM fails
> > > > > during amdgpu_device_init(). Specifically, the logs show the SMU being
> > > > > unresponsive, which leads to a 'SMU firmware load failed' error
> > > > > message and kernel panic. I can share VM logs and the invocation of
> > > > > qemu and such if helpful, but first wanted to know at a high level if
> > > > > this should be feasible?
> > > > >
> > > > > P.S.: I'm not initializing the GPU in the host bios or host kernel at
> > > > > all, so I should be passing a fresh GPU to the VM. Also, I'm pretty
> > > > > sure I'm running the correct VGA bios for this GPU in the guest VM
> > > > > bios before guest boot.
> > > > >
> > > > > Any comments/suggestions would be appreciated!
> > > >
> > > > It should work at least once, as long as your VM is properly set up.
> > >
> > > Is there any reason running coreboot vs UEFI at host boot would make a
> > > difference? I was running a modified version of coreboot that avoids
> > > doing any GPU initialization in firmware -- so the first POST happens
> > > inside the guest.
> >
> > The GPU on APUs shares a bunch of resources with the CPU.  There are a
> > bunch of blocks which are shared and need to be initialized on both
> > for everything to work properly.
>
> Interesting. So skipping running the vbios in the host and waiting
> until running it for the first time in the guest SeaBIOS is a bad
> idea? Would it be better to let APU+CPU initialize normally in the
> host and then skip trying to run the vbios in guest SeaBIOS and just
> do some kind of reset before the drm driver starts accessing it from
> the guest?

If you let the sbios initialize things, it should work.  The driver
will do the right thing to init the card when it loads, whether it's
running on bare metal or in a VM.  We've never tested any scenarios
where the GPU on APUs is not handled by the sbios.  Note that the GPU
does not have to be posted per se, it just needs to have been properly
taken into account when the sbios comes up so that shared components
are initialized correctly.  I don't know what your patched system does
or doesn't do with respect to the platform initialization.

>
> >
> > >
> > > > Note that the driver needs access to the vbios image in the guest to
> > > > get device specific configuration details (clocks, display connector
> > > > configuration, etc.).
> > >
> > > Is there anything I need to do to ensure this besides passing '-device
> > > vfio-pci,...,romfile=/path/to/vgarom' to qemu?
> >
> > You need the actual vbios rom image from your system.  The image is
> > board specific.
>
> I should have the correct vbios rom image for my board. I'm extracting
> it from the firmware image (that works for regular graphics init
> without this VM stuff) for the board at build time (rather than
> grabbing it from /sys/devices/pci... at runtime), so it shouldn't be
> modified or corrupted in any way.

The vbios image is patched at boot time by the sbios image for run
time configuration stuff.  For example, some of the pcie lanes are
shared with display lanes and can be used for either display or pcie
add in cards.  The sbios determines this at boot and patches the vbios
display tables so the driver knows that the displays are not
available.  Also things like flat panels on laptops.  OEMs may have
several different flat panel models they use with a particular
platform and the sbios patches the vbios display tables with the
proper parameters for the panel in use.  The sbios also patches tables
related to bandwidth.  E.g., the type and speed and number of channels
of the system ram so that the GPU driver can set proper limits on
things like display modes.  So you need to use the vbios image that is
provided by the sbios at boot.

Alex
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH v2 0/4] AMDKFD (AMD GPU compute) support for device cgroup.

2019-05-17 Thread Tejun Heo
On Fri, May 17, 2019 at 04:14:52PM +, Kasiviswanathan, Harish wrote:
> amdkfd (part of amdgpu) driver supports the AMD GPU compute stack.
> amdkfd exposes only a single device /dev/kfd even if multiple AMD GPU
> (compute) devices exist in a system. However, the amdgpu driver exposes a
> separate render device file /dev/dri/renderDN for each device. To participate
> in device cgroup the amdkfd driver will rely on these render device files.
> 
> v2: Exporting devcgroup_check_permission() instead of
> __devcgroup_check_permission() as per review comments.

Looks fine to me but given how non-obvious it is, some documentation
would be great.

Thanks.

-- 
tejun
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] gpu: drm: use struct_size() in kmalloc()

2019-05-17 Thread Pan, Xinhui
I am going to put more members, which are also arrays, after this struct, not
only objs[]. It looks like struct_size() does not help in the multiple-array
case. Thanks anyway.
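
For illustration, a rough sketch of the point being made: struct_size() only
accounts for the single declared trailing array, so any additional per-block
region appended in the same allocation still needs manual sizing
(extra_per_block below is a made-up name for this example):

/* struct_size(con, objs, n) evaluates to
 * sizeof(*con) + n * sizeof(con->objs[0]), with overflow checking. */
size_t size = struct_size(con, objs, AMDGPU_RAS_BLOCK_COUNT);

/* A second array placed after objs[] in the same allocation is not
 * covered by the helper and has to be added by hand. */
size += AMDGPU_RAS_BLOCK_COUNT * sizeof(struct extra_per_block);

con = kmalloc(size, GFP_KERNEL | __GFP_ZERO);
if (!con)
	return -ENOMEM;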

From: xiaolinkui 
Sent: Friday, May 17, 2019 4:46:00 PM
To: Deucher, Alexander; Koenig, Christian; Zhou, David(ChunMing); 
airl...@linux.ie; dan...@ffwll.ch; Pan, Xinhui; Quan, Evan
Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; xiaolin...@kylinos.cn
Subject: [PATCH] gpu: drm: use struct_size() in kmalloc()

[CAUTION: External Email]

Use struct_size() helper to keep code simple.

Signed-off-by: xiaolinkui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 22bd21e..4717a64 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1375,8 +1375,7 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
if (con)
return 0;

-   con = kmalloc(sizeof(struct amdgpu_ras) +
-   sizeof(struct ras_manager) * AMDGPU_RAS_BLOCK_COUNT,
+   con = kmalloc(struct_size(con, objs, AMDGPU_RAS_BLOCK_COUNT),
GFP_KERNEL|__GFP_ZERO);
if (!con)
return -ENOMEM;
--
2.7.4



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: cast to unsigned int for 32-bit portability

2019-05-17 Thread Pan, Xinhui
It would be better to use div64_ul(x, y) for compatibility.
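
A sketch of that suggestion as applied to the lines below, assuming the usual
linux/math64.h helper u64 div64_ul(u64 dividend, unsigned long divisor):

/* ppos is a 64-bit loff_t: a plain '/' would pull in a libgcc 64-bit
 * division helper that the kernel does not provide on 32-bit builds,
 * while casting ppos down to unsigned int would truncate large offsets.
 * div64_ul() keeps the full 64-bit dividend. */
unsigned int start = div64_ul(ppos + element_size - 1, element_size);
unsigned int end = div64_ul(ppos + count - 1, element_size);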

From: Abramov, Slava
Sent: Friday, May 17, 2019 5:19:54 AM
To: amd-gfx@lists.freedesktop.org
Cc: Pan, Xinhui; Deucher, Alexander
Subject: [PATCH] drm/amdgpu: cast to unsigned int for 32-bit portability


Without casting, 64-bit division is used implicitly, causing a DEPMOD
failure when building a 32-bit kernel.

Signed-off-by: Slava Abramov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index da1dc40b9b14..0499620ec972 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -764,8 +764,9 @@ static ssize_t amdgpu_ras_sysfs_badpages_read(struct file 
*f,
  struct amdgpu_device *adev = con->adev;
  const unsigned int element_size =
  sizeof("0xabcdabcd : 0x12345678 : R\n") - 1;
- unsigned int start = (ppos + element_size - 1) / element_size;
- unsigned int end = (ppos + count - 1) / element_size;
+ unsigned int start =
+ (unsigned int) (ppos + element_size - 1) / element_size;
+ unsigned int end = (unsigned int) (ppos + count - 1) / element_size;
  ssize_t s = 0;
  struct ras_badpage *bps = NULL;
  unsigned int bps_count = 0;
--
2.17.1


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH v2 4/4] drm/amdkfd: Check against device cgroup

2019-05-17 Thread Kasiviswanathan, Harish
Participate in device cgroup. All kfd devices are exposed via /dev/kfd.
So use /dev/dri/renderN node.

Before exposing the device to a task check if it has permission to
access it. If the task (based on its cgroup) can access /dev/dri/renderN
then expose the device via kfd node.

If the task cannot access /dev/dri/renderN then process device data
(pdd) is not created. This will ensure that task cannot use the device.

In sysfs topology, all device nodes are visible irrespective of the task
cgroup. The sysfs node directories are created at driver load time and
cannot be changed dynamically. However, access to information inside
nodes is controlled based on the task's cgroup permissions.
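
As an aside, a minimal userspace sketch of how a task would end up failing
this check. The cgroup path and group name ("gpu-restricted") are made up for
illustration; it assumes the cgroup-v1 devices controller is mounted at
/sys/fs/cgroup/devices and that DRM render nodes use char major 226 with
minors starting at 128:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	/* Hypothetical group, created beforehand with mkdir under the
	 * devices controller mount point. */
	const char *deny = "/sys/fs/cgroup/devices/gpu-restricted/devices.deny";
	FILE *f = fopen(deny, "w");

	if (!f) {
		perror("fopen");
		return EXIT_FAILURE;
	}
	/* Deny read/write on /dev/dri/renderD128 (char 226:128).  Tasks in
	 * this group then fail kfd_devcgroup_check_permission() for the
	 * corresponding GPU, so amdkfd hides it from them. */
	fprintf(f, "c 226:128 rw\n");
	fclose(f);
	return 0;
}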

Change-Id: Ia7ed40930542111ac9f74b0706c3fa5ebf186b3c
Signed-off-by: Harish Kasiviswanathan 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c |  9 +++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 17 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c| 12 
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 22a8e88b6a67..85c8838323a2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -369,8 +369,13 @@ int kfd_init_apertures(struct kfd_process *process)
 
/*Iterating over all devices*/
while (kfd_topology_enum_kfd_devices(id, ) == 0) {
-   if (!dev) {
-   id++; /* Skip non GPU devices */
+   if (!dev || kfd_devcgroup_check_permission(dev)) {
+   /* Skip non GPU devices and devices to which the
+* current process have no access to. Access can be
+* limited by placing the process in a specific
+* cgroup hierarchy.
+*/
+   id++;
continue;
}
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index cbba0047052d..85f55022014a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -35,6 +35,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #include "amd_shared.h"
@@ -989,6 +991,21 @@ bool kfd_is_locked(void);
 void kfd_inc_compute_active(struct kfd_dev *dev);
 void kfd_dec_compute_active(struct kfd_dev *dev);
 
+/* Cgroup Support */
+/* Check with device cgroup if @kfd device is accessible */
+static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
+{
+#if defined(CONFIG_CGROUP_DEVICE)
+   struct drm_device *ddev = kfd->ddev;
+
+   return devcgroup_check_permission(DEVCG_DEV_CHAR, ddev->driver->major,
+ ddev->render->index,
+ DEVCG_ACC_WRITE | DEVCG_ACC_READ);
+#else
+   return 0;
+#endif
+}
+
 /* Debugfs */
 #if defined(CONFIG_DEBUG_FS)
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 64d667ae0d36..a3a14a76ece1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -269,6 +269,8 @@ static ssize_t iolink_show(struct kobject *kobj, struct 
attribute *attr,
buffer[0] = 0;
 
iolink = container_of(attr, struct kfd_iolink_properties, attr);
+   if (iolink->gpu && kfd_devcgroup_check_permission(iolink->gpu))
+   return -EPERM;
sysfs_show_32bit_prop(buffer, "type", iolink->iolink_type);
sysfs_show_32bit_prop(buffer, "version_major", iolink->ver_maj);
sysfs_show_32bit_prop(buffer, "version_minor", iolink->ver_min);
@@ -305,6 +307,8 @@ static ssize_t mem_show(struct kobject *kobj, struct 
attribute *attr,
buffer[0] = 0;
 
mem = container_of(attr, struct kfd_mem_properties, attr);
+   if (mem->gpu && kfd_devcgroup_check_permission(mem->gpu))
+   return -EPERM;
sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
sysfs_show_32bit_prop(buffer, "flags", mem->flags);
@@ -334,6 +338,8 @@ static ssize_t kfd_cache_show(struct kobject *kobj, struct 
attribute *attr,
buffer[0] = 0;
 
cache = container_of(attr, struct kfd_cache_properties, attr);
+   if (cache->gpu && kfd_devcgroup_check_permission(cache->gpu))
+   return -EPERM;
sysfs_show_32bit_prop(buffer, "processor_id_low",
cache->processor_id_low);
sysfs_show_32bit_prop(buffer, "level", cache->cache_level);
@@ -416,12 +422,16 @@ static ssize_t node_show(struct kobject *kobj, struct 
attribute *attr,
if (strcmp(attr->name, "gpu_id") == 0) {
dev = container_of(attr, struct kfd_topology_device,
attr_gpuid);

[PATCH v2 3/4] device_cgroup: Export devcgroup_check_permission

2019-05-17 Thread Kasiviswanathan, Harish
For AMD compute (amdkfd) driver.

All AMD compute devices are exported via single device node /dev/kfd. As
a result devices cannot be controlled individually using device cgroup.

AMD compute devices will rely on its graphics counterpart that exposes
/dev/dri/renderN node for each device. For each task (based on its
cgroup), KFD driver will check if /dev/dri/renderN node is accessible
before exposing it.

Change-Id: I1b9705b2c30622a27655f4f878980fa138dbf373
Signed-off-by: Harish Kasiviswanathan 
---
 include/linux/device_cgroup.h | 19 ---
 security/device_cgroup.c  | 15 +--
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
index 8557efe096dc..bd19897bd582 100644
--- a/include/linux/device_cgroup.h
+++ b/include/linux/device_cgroup.h
@@ -12,26 +12,15 @@
 #define DEVCG_DEV_ALL   4  /* this represents all devices */
 
 #ifdef CONFIG_CGROUP_DEVICE
-extern int __devcgroup_check_permission(short type, u32 major, u32 minor,
-   short access);
+extern int devcgroup_check_permission(short type, u32 major, u32 minor,
+ short access);
 #else
-static inline int __devcgroup_check_permission(short type, u32 major, u32 
minor,
-  short access)
+static inline int devcgroup_check_permission(short type, u32 major, u32 minor,
+short access)
 { return 0; }
 #endif
 
 #if defined(CONFIG_CGROUP_DEVICE) || defined(CONFIG_CGROUP_BPF)
-static inline int devcgroup_check_permission(short type, u32 major, u32 minor,
-short access)
-{
-   int rc = BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type, major, minor, access);
-
-   if (rc)
-   return -EPERM;
-
-   return __devcgroup_check_permission(type, major, minor, access);
-}
-
 static inline int devcgroup_inode_permission(struct inode *inode, int mask)
 {
short type, access = 0;
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index cd97929fac66..3c57e05bf73b 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -801,8 +801,8 @@ struct cgroup_subsys devices_cgrp_subsys = {
  *
  * returns 0 on success, -EPERM case the operation is not permitted
  */
-int __devcgroup_check_permission(short type, u32 major, u32 minor,
-short access)
+static int __devcgroup_check_permission(short type, u32 major, u32 minor,
+   short access)
 {
struct dev_cgroup *dev_cgroup;
bool rc;
@@ -824,3 +824,14 @@ int __devcgroup_check_permission(short type, u32 major, 
u32 minor,
 
return 0;
 }
+
+int devcgroup_check_permission(short type, u32 major, u32 minor, short access)
+{
+   int rc = BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type, major, minor, access);
+
+   if (rc)
+   return -EPERM;
+
+   return __devcgroup_check_permission(type, major, minor, access);
+}
+EXPORT_SYMBOL(devcgroup_check_permission);
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH v2 1/4] drm/amdkfd: Store kfd_dev in iolink and cache properties

2019-05-17 Thread Kasiviswanathan, Harish
This is required to check against cgroup permissions.

Signed-off-by: Harish Kasiviswanathan 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 ++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 2c06d6c16eab..64d667ae0d36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1104,6 +1104,9 @@ static struct kfd_topology_device *kfd_assign_gpu(struct 
kfd_dev *gpu)
 {
struct kfd_topology_device *dev;
struct kfd_topology_device *out_dev = NULL;
+   struct kfd_mem_properties *mem;
+   struct kfd_cache_properties *cache;
+   struct kfd_iolink_properties *iolink;
 
down_write(&topology_lock);
list_for_each_entry(dev, &topology_device_list, list) {
@@ -1117,6 +1120,13 @@ static struct kfd_topology_device *kfd_assign_gpu(struct 
kfd_dev *gpu)
if (!dev->gpu && (dev->node_props.simd_count > 0)) {
dev->gpu = gpu;
out_dev = dev;
+
+   list_for_each_entry(mem, &dev->mem_props, list)
+   mem->gpu = dev->gpu;
+   list_for_each_entry(cache, &dev->cache_props, list)
+   cache->gpu = dev->gpu;
+   list_for_each_entry(iolink, &dev->io_link_props, list)
+   iolink->gpu = dev->gpu;
break;
}
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 949e885dfb53..5fd3df80bb0e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -101,6 +101,7 @@ struct kfd_mem_properties {
uint32_tflags;
uint32_twidth;
uint32_tmem_clk_max;
+   struct kfd_dev  *gpu;
struct kobject  *kobj;
struct attributeattr;
 };
@@ -122,6 +123,7 @@ struct kfd_cache_properties {
uint32_tcache_latency;
uint32_tcache_type;
uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
+   struct kfd_dev  *gpu;
struct kobject  *kobj;
struct attributeattr;
 };
@@ -140,6 +142,7 @@ struct kfd_iolink_properties {
uint32_tmax_bandwidth;
uint32_trec_transfer_size;
uint32_tflags;
+   struct kfd_dev  *gpu;
struct kobject  *kobj;
struct attributeattr;
 };
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH v2 2/4] drm/amd: Pass drm_device to kfd

2019-05-17 Thread Kasiviswanathan, Harish
kfd needs drm_device to call into drm_cgroup functions

Signed-off-by: Harish Kasiviswanathan 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c| 2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  | 3 +++
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 98326e3b5619..df92eb65a897 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -193,7 +193,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
adev->doorbell_index.last_non_cp;
}
 
-   kgd2kfd_device_init(adev->kfd.dev, &gpu_resources);
+   kgd2kfd_device_init(adev->kfd.dev, adev->ddev, &gpu_resources);
}
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index f57f29763769..bad378596f46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -234,6 +234,7 @@ void kgd2kfd_exit(void);
 struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev,
  const struct kfd2kgd_calls *f2g);
 bool kgd2kfd_device_init(struct kfd_dev *kfd,
+struct drm_device *ddev,
 const struct kgd2kfd_shared_resources *gpu_resources);
 void kgd2kfd_device_exit(struct kfd_dev *kfd);
 void kgd2kfd_suspend(struct kfd_dev *kfd);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 7b4ea24c87f8..abc28b96c491 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -527,10 +527,12 @@ static void kfd_cwsr_init(struct kfd_dev *kfd)
 }
 
 bool kgd2kfd_device_init(struct kfd_dev *kfd,
+struct drm_device *ddev,
 const struct kgd2kfd_shared_resources *gpu_resources)
 {
unsigned int size;
 
+   kfd->ddev = ddev;
kfd->mec_fw_version = amdgpu_amdkfd_get_fw_version(kfd->kgd,
KGD_ENGINE_MEC1);
kfd->sdma_fw_version = amdgpu_amdkfd_get_fw_version(kfd->kgd,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 8f02d7817162..cbba0047052d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -46,6 +46,8 @@
 /* GPU ID hash width in bits */
 #define KFD_GPU_ID_HASH_WIDTH 16
 
+struct drm_device;
+
 /* Use upper bits of mmap offset to store KFD driver specific information.
  * BITS[63:62] - Encode MMAP type
  * BITS[61:46] - Encode gpu_id. To identify to which GPU the offset belongs to
@@ -211,6 +213,7 @@ struct kfd_dev {
 
const struct kfd_device_info *device_info;
struct pci_dev *pdev;
+   struct drm_device *ddev;
 
unsigned int id;/* topology stub index */
 
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH v2 0/4] AMDKFD (AMD GPU compute) support for device cgroup.

2019-05-17 Thread Kasiviswanathan, Harish
The amdkfd (part of amdgpu) driver supports the AMD GPU compute stack.
amdkfd exposes only a single device, /dev/kfd, even if multiple AMD GPU
(compute) devices exist in a system. However, the amdgpu driver exposes a
separate render device file /dev/dri/renderDN for each device. To participate
in the device cgroup, the amdkfd driver will rely on these render device files.

v2: Exporting devcgroup_check_permission() instead of
__devcgroup_check_permission() as per review comments.

Harish Kasiviswanathan (4):
  drm/amdkfd: Store kfd_dev in iolink and cache properties
  drm/amd: Pass drm_device to kfd
  device_cgroup: Export devcgroup_check_permission
  drm/amdkfd: Check against device cgroup

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h   |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c |  9 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 20 ++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c| 22 
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h|  3 +++
 include/linux/device_cgroup.h| 19 -
 security/device_cgroup.c | 15 +++--
 9 files changed, 73 insertions(+), 20 deletions(-)

-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: GPU passthrough support for Stoney [Radeon R2/R3/R4/R5 Graphics]?

2019-05-17 Thread Micah Morton
On Thu, May 16, 2019 at 1:39 PM Alex Deucher  wrote:
>
> On Thu, May 16, 2019 at 4:07 PM Micah Morton  wrote:
> >
> > On Wed, May 15, 2019 at 7:19 PM Alex Deucher  wrote:
> > >
> > > On Wed, May 15, 2019 at 2:26 PM Micah Morton  wrote:
> > > >
> > > > Hi folks,
> > > >
> > > > I'm interested in running a VM on a system with an integrated Stoney
> > > > [Radeon R2/R3/R4/R5 Graphics] card and passing through the graphics
> > > > card to the VM using the IOMMU. I'm wondering whether this is feasible
> > > > and supposed to be doable with the right setup (as opposed to passing
> > > > a discrete GPU to the VM, which I think is definitely doable?).
> > > >
> > > > So far, I can do all the qemu/kvm/vfio/iommu stuff to run the VM and
> > > > pass the integrated GPU to it, but the drm driver in the VM fails
> > > > during amdgpu_device_init(). Specifically, the logs show the SMU being
> > > > unresponsive, which leads to a 'SMU firmware load failed' error
> > > > message and kernel panic. I can share VM logs and the invocation of
> > > > qemu and such if helpful, but first wanted to know at a high level if
> > > > this should be feasible?
> > > >
> > > > P.S.: I'm not initializing the GPU in the host bios or host kernel at
> > > > all, so I should be passing a fresh GPU to the VM. Also, I'm pretty
> > > > sure I'm running the correct VGA bios for this GPU in the guest VM
> > > > bios before guest boot.
> > > >
> > > > Any comments/suggestions would be appreciated!
> > >
> > > It should work in at least once as long as your vm is properly set up.
> >
> > Is there any reason running coreboot vs UEFI at host boot would make a
> > difference? I was running a modified version of coreboot that avoids
> > doing any GPU initialization in firmware -- so the first POST happens
> > inside the guest.
>
> The GPU on APUs shares a bunch of resources with the CPU.  There are a
> bunch of blocks which are shared and need to be initialized on both
> for everything to work properly.

Interesting. So skipping running the vbios in the host and waiting
until running it for the first time in the guest SeaBIOS is a bad
idea? Would it be better to let APU+CPU initialize normally in the
host and then skip trying to run the vbios in guest SeaBIOS and just
do some kind of reset before the drm driver starts accessing it from
the guest?

>
> >
> > > Note that the driver needs access to the vbios image in the guest to
> > > get device specific configuration details (clocks, display connector
> > > configuration, etc.).
> >
> > Is there anything I need to do to ensure this besides passing '-device
> > vfio-pci,...,romfile=/path/to/vgarom' to qemu?
>
> You need the actual vbios rom image from your system.  The image is
> board specific.

I should have the correct vbios rom image for my board. I'm extracting
it from the firmware image (that works for regular graphics init
without this VM stuff) for the board at build time (rather than
grabbing it from /sys/devices/pci... at runtime), so it shouldn't be
modified or corrupted in any way.

>
> Alex
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: set correct vram_width for vega10 under sriov

2019-05-17 Thread Tao, Yintian
Hi Alex



Many thanks for your review. I will merge these two patches into one and submit 
again.



Best Regards

Yintian Tao


From: Alex Deucher 
Sent: 17 May 2019 22:34:30
To: Tao, Yintian
Cc: amd-gfx@lists.freedesktop.org; Koenig, Christian; Deucher, Alexander; Huang, Trigger
Subject: Re: [PATCH] drm/amdgpu: set correct vram_width for vega10 under sriov

[CAUTION: External Email]

How about combining these two patches into one?  This seems cleaner.

Alex

On Thu, May 16, 2019 at 10:39 PM Tao, Yintian  wrote:
>
> Ping...
>
> Hi Christian and Alex
>
>
> Can you help review this? Thanks in advance.
>
>
> Best Regards
> Yintian Tao
>
> -Original Message-
> From: Yintian Tao 
> Sent: Thursday, May 16, 2019 8:03 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Tao, Yintian ; Huang, Trigger 
> Subject: [PATCH] drm/amdgpu: set correct vram_width for vega10 under sriov
>
> For Vega10 SR-IOV, vram_width can't be read from ATOM as on RAVEN, and the DF
> related registers are not readable; hardcoding seems to be the only way to set
> the correct vram_width
>
> Signed-off-by: Trigger Huang 
> Signed-off-by: Yintian Tao 
> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index c221570..a417763 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -848,6 +848,13 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev)
> adev->gmc.vram_width = numchan * chansize;
> }
>
> +   /* For Vega10 SR-IOV, vram_width can't be read from ATOM as on RAVEN,
> +* and the DF related registers are not readable; hardcoding is the
> +* only way to set the correct vram_width */
> +   if (amdgpu_sriov_vf(adev) && (adev->asic_type == CHIP_VEGA10)) {
> +   adev->gmc.vram_width = 2048;
> +   }
> +
> /* size in MB on si */
> adev->gmc.mc_vram_size =
> adev->nbio_funcs->get_memsize(adev) * 1024ULL * 1024ULL;
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH v15 00/17] arm64: untag user pointers passed to the kernel

2019-05-17 Thread Catalin Marinas
Hi Andrey,

On Mon, May 06, 2019 at 06:30:46PM +0200, Andrey Konovalov wrote:
> One of the alternative approaches to untagging that was considered is to
> completely strip the pointer tag as the pointer enters the kernel with
> some kind of a syscall wrapper, but that won't work with the countless
> number of different ioctl calls. With this approach we would need a custom
> wrapper for each ioctl variation, which doesn't seem practical.

The more I look at this problem, the less convinced I am that we can
solve it in a way that results in a stable ABI covering ioctls(). While
for the Android kernel codebase it could be simpler as you don't upgrade
the kernel version every 2.5 months, for the mainline kernel this
doesn't scale. Any run-time checks are relatively limited in terms of
drivers covered. Better static checking would be nice as a long term
solution but we didn't get anywhere with the discussion last year.

IMO (RFC for now), I see two ways forward:

1. Make this a user space problem and do not allow tagged pointers into
   the syscall ABI. A libc wrapper would have to convert structures,
   parameters before passing them into the kernel. Note that we can
   still support the hardware MTE in the kernel by enabling tagged
   memory ranges, saving/restoring tags etc. but not allowing tagged
   addresses at the syscall boundary.

2. Similar shim to the above libc wrapper but inside the kernel
   (arch/arm64 only; most pointer arguments could be covered with an
   __SC_CAST similar to the s390 one). There are two differences from
   what we've discussed in the past:

   a) this is an opt-in by the user which would have to explicitly call
  prctl(). If it returns -ENOTSUPP etc., the user won't be allowed
  to pass tagged pointers to the kernel. This would probably be the
  responsibility of the C lib to make sure it doesn't tag heap
  allocations. If the user did not opt-in, the syscalls are routed
  through the normal path (no untagging address shim).

   b) ioctl() and other blacklisted syscalls (prctl) will not accept
  tagged pointers (to be documented in Vicenzo's ABI patches).

It doesn't solve the problems we are trying to address but 2.a saves us
from blindly relaxing the ABI without knowing how to easily assess new
code being merged (over 500K lines between kernel versions). Existing
applications (who don't opt-in) won't inadvertently start using the new
ABI which could risk becoming de-facto ABI that we need to support on
the long run.

Option 1 wouldn't solve the ioctl() problem either and while it makes
things simpler for the kernel, I am aware that it's slightly more
complicated in user space (but I really don't mind if you prefer option
1 ;)).

The tagged pointers (whether hwasan or MTE) should ideally be a
transparent feature for the application writer but I don't think we can
solve it entirely and make it seamless for the multitude of ioctls().
I'd say you only opt in to such feature if you know what you are doing
and the user code takes care of specific cases like ioctl(), hence the
prctl() proposal even for the hwasan.
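
For illustration only, a libc-side untagging shim along the lines of option 1
would amount to clearing the top byte before the pointer crosses the syscall
boundary (a sketch, assuming the tag lives in bits 63:56 as with arm64 TBI;
the wrapper names are made up):

#include <stdint.h>
#include <sys/ioctl.h>

static inline void *untag_ptr(void *p)
{
	/* Clear bits 63:56, where the tag is stored. */
	return (void *)((uintptr_t)p & ~(0xffULL << 56));
}

int wrapped_ioctl(int fd, unsigned long request, void *argp)
{
	/* Only the top-level pointer is untagged here; pointers embedded in
	 * the pointed-to structure still need per-ioctl handling, which is
	 * exactly the scaling problem described above. */
	return ioctl(fd, request, untag_ptr(argp));
}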

Comments welcomed.

-- 
Catalin


Re: [PATCH] drm/amdgpu/soc15: skip reset on init

2019-05-17 Thread Lin, Amber


On 2019-05-17 10:26 a.m., Alex Deucher wrote:
> [CAUTION: External Email]
>
> Not necessary on soc15 and breaks driver reload on server cards.
>
> Signed-off-by: Alex Deucher 
> Cc: sta...@vger.kernel.org
> ---
>   drivers/gpu/drm/amd/amdgpu/soc15.c | 5 +
>   1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 32dc5a128249..78bd4fc07bab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -761,6 +761,11 @@ static bool soc15_need_reset_on_init(struct 
> amdgpu_device *adev)
>   {
>  u32 sol_reg;
>
> +   /* Just return false for soc15 GPUs.  Reset does not seem to
> +* be necessary.
> +*/
> +   return false;
> +
>  if (adev->flags & AMD_IS_APU)
>  return false;
Should we remove the rest of the code in this function, and sol_reg as well,
and simply return false?
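
Something like this, I suppose (a sketch of the suggestion only, not the posted
patch):

static bool soc15_need_reset_on_init(struct amdgpu_device *adev)
{
	/* Reset on init does not seem to be necessary on soc15. */
	return false;
}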
>
> --
> 2.20.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: set correct vram_width for vega10 under sriov

2019-05-17 Thread Alex Deucher
How about combining these two patches into one?  This seems cleaner.

Alex

On Thu, May 16, 2019 at 10:39 PM Tao, Yintian  wrote:
>
> Ping...
>
> Hi Christian and Alex
>
>
> Can you help review this? Thanks in advance.
>
>
> Best Regards
> Yintian Tao
>
> -Original Message-
> From: Yintian Tao 
> Sent: Thursday, May 16, 2019 8:03 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Tao, Yintian ; Huang, Trigger 
> Subject: [PATCH] drm/amdgpu: set correct vram_width for vega10 under sriov
>
> For Vega10 SR-IOV, vram_width can't be read from ATOM as on RAVEN, and the DF
> related registers are not readable; hardcoding seems to be the only way to set
> the correct vram_width
>
> Signed-off-by: Trigger Huang 
> Signed-off-by: Yintian Tao 
> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index c221570..a417763 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -848,6 +848,13 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev)
> adev->gmc.vram_width = numchan * chansize;
> }
>
> +   /* For Vega10 SR-IOV, vram_width can't be read from ATOM as on RAVEN,
> +* and the DF related registers are not readable; hardcoding is the
> +* only way to set the correct vram_width */
> +   if (amdgpu_sriov_vf(adev) && (adev->asic_type == CHIP_VEGA10)) {
> +   adev->gmc.vram_width = 2048;
> +   }
> +
> /* size in MB on si */
> adev->gmc.mc_vram_size =
> adev->nbio_funcs->get_memsize(adev) * 1024ULL * 1024ULL;
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
From 28cff589e564087d22e9be35ba8f90e0a30409e9 Mon Sep 17 00:00:00 2001
From: Alex Deucher 
Date: Fri, 17 May 2019 09:31:43 -0500
Subject: [PATCH] drm/amdgpu/gmc9: set vram_width properly for SR-IOV

For SR-IOV, vram_width can't be read from ATOM as on
RAVEN, and the DF related registers are not readable, so hardcoding
is the only way to set the correct vram_width.

Signed-off-by: Trigger Huang 
Signed-off-by: Yintian Tao 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 59c580bd5a3b..9750b632e9aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -835,8 +835,16 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev)
 	int chansize, numchan;
 	int r;
 
-	if (amdgpu_emu_mode != 1)
+	if (amdgpu_sriov_vf(adev)) {
+		/* For Vega10 SR-IOV, vram_width can't be read from ATOM as on RAVEN,
+		 * and the DF related registers are not readable; hardcoding is the
+		 * only way to set the correct vram_width
+		 */
+		adev->gmc.vram_width = 2048;
+	} else if (amdgpu_emu_mode != 1) {
 		adev->gmc.vram_width = amdgpu_atomfirmware_get_vram_width(adev);
+	}
+
 	if (!adev->gmc.vram_width) {
 		/* hbm memory channel size */
 		if (adev->flags & AMD_IS_APU)
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu/soc15: skip reset on init

2019-05-17 Thread Alex Deucher
Not necessary on soc15 and breaks driver reload on server cards.

Signed-off-by: Alex Deucher 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 32dc5a128249..78bd4fc07bab 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -761,6 +761,11 @@ static bool soc15_need_reset_on_init(struct amdgpu_device 
*adev)
 {
u32 sol_reg;
 
+   /* Just return false for soc15 GPUs.  Reset does not seem to
+* be necessary.
+*/
+   return false;
+
if (adev->flags & AMD_IS_APU)
return false;
 
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH 01/11] drm/ttm: Make LRU removal optional.

2019-05-17 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Christian König 
> Sent: Tuesday, May 14, 2019 8:31 PM
> To: Olsak, Marek ; Zhou, David(ChunMing)
> ; Liang, Prike ; dri-
> de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
> Subject: [PATCH 01/11] drm/ttm: Make LRU removal optional.
> 
> [CAUTION: External Email]
> 
> We are already doing this for DMA-buf imports and also for amdgpu VM BOs
> for quite a while now.
> 
> If this doesn't run into any problems we are probably going to stop removing
> BOs from the LRU altogether.
> 
> Signed-off-by: Christian König 
> ---
[snip]
> diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> index 0075eb9a0b52..957ec375a4ba 100644
> --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> @@ -69,7 +69,8 @@ void ttm_eu_backoff_reservation(struct
> ww_acquire_ctx *ticket,
> list_for_each_entry(entry, list, head) {
> struct ttm_buffer_object *bo = entry->bo;
> 
> -   ttm_bo_add_to_lru(bo);
> +   if (list_empty(&bo->lru))
> +   ttm_bo_add_to_lru(bo);
> reservation_object_unlock(bo->resv);
> }
> spin_unlock(>lru_lock);
> @@ -93,7 +94,7 @@ EXPORT_SYMBOL(ttm_eu_backoff_reservation);
> 
>  int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
>struct list_head *list, bool intr,
> -  struct list_head *dups)
> +  struct list_head *dups, bool del_lru)
>  {
> struct ttm_bo_global *glob;
> struct ttm_validate_buffer *entry; @@ -172,11 +173,11 @@ int
> ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
> list_add(&entry->head, list);
> }
> 
> -   if (ticket)
> -   ww_acquire_done(ticket);
> -   spin_lock(&glob->lru_lock);
> -   ttm_eu_del_from_lru_locked(list);
> -   spin_unlock(&glob->lru_lock);
> +   if (del_lru) {
> +   spin_lock(&glob->lru_lock);
> +   ttm_eu_del_from_lru_locked(list);
> +   spin_unlock(&glob->lru_lock);
> +   }

Can you move the bo to the LRU tail here when del_lru is false?

The busy iteration in evict_first will then try other processes' BOs first,
which could save loop time.
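
Roughly something like this, I think (just a sketch of the idea on top of your
hunk, not tested):

	if (del_lru) {
		spin_lock(&glob->lru_lock);
		ttm_eu_del_from_lru_locked(list);
		spin_unlock(&glob->lru_lock);
	} else {
		/* Keep the BOs on the LRU but bump them to the tail while
		 * they are reserved, so eviction tries other processes'
		 * BOs first. */
		spin_lock(&glob->lru_lock);
		list_for_each_entry(entry, list, head)
			ttm_bo_move_to_lru_tail(entry->bo, NULL);
		spin_unlock(&glob->lru_lock);
	}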

> return 0;
>  }
>  EXPORT_SYMBOL(ttm_eu_reserve_buffers);
> @@ -203,7 +204,10 @@ void ttm_eu_fence_buffer_objects(struct
> ww_acquire_ctx *ticket,
> reservation_object_add_shared_fence(bo->resv, fence);
> else
> reservation_object_add_excl_fence(bo->resv, fence);
> -   ttm_bo_add_to_lru(bo);
> +   if (list_empty(&bo->lru))
> +   ttm_bo_add_to_lru(bo);
> +   else
> +   ttm_bo_move_to_lru_tail(bo, NULL);

If this is done above, then we don't need it here.

-David
> reservation_object_unlock(bo->resv);
> }
> spin_unlock(&glob->lru_lock);
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index 161b80fee492..5cffaa24259f 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -63,7 +63,7 @@ static int virtio_gpu_object_list_validate(struct
> ww_acquire_ctx *ticket,
> struct virtio_gpu_object *qobj;
> int ret;
> 
> -   ret = ttm_eu_reserve_buffers(ticket, head, true, NULL);
> +   ret = ttm_eu_reserve_buffers(ticket, head, true, NULL, true);
> if (ret != 0)
> return ret;
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> index a7c30e567f09..d28cbedba0b5 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> @@ -465,7 +465,8 @@ vmw_resource_check_buffer(struct ww_acquire_ctx
> *ticket,
> val_buf->bo = &res->backup->base;
> val_buf->num_shared = 0;
> list_add_tail(&val_buf->head, &val_list);
> -   ret = ttm_eu_reserve_buffers(ticket, &val_list, interruptible, NULL);
> +   ret = ttm_eu_reserve_buffers(ticket, &val_list, interruptible, NULL,
> +true);
> if (unlikely(ret != 0))
> goto out_no_reserve;
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> index 3b396fea40d7..ac435b51f4eb 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> @@ -165,7 +165,7 @@ vmw_validation_bo_reserve(struct
> vmw_validation_context *ctx,
>   bool intr)
>  {
> return ttm_eu_reserve_buffers(&ctx->ticket, &ctx->bo_list, intr,
> - NULL);
> + NULL, true);
>  }
> 
>  /**
> diff --git a/include/drm/ttm/ttm_bo_driver.h
> b/include/drm/ttm/ttm_bo_driver.h index c008346c2401..fc0d995ac90d
> 100644
> --- 

[PATCH] drm/amdgpu: Add Unique Identifier sysfs file unique_id

2019-05-17 Thread Russell, Kent
Add a file that provides a Unique ID for the GPU.
This will persist across machines and is guaranteed to be unique.
This is only available for GFX9 and newer, so older ASICs will not
have this file in the sysfs pool
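
Userspace can then read the ID with a plain sysfs read, for example (a minimal
sketch; the /sys/class/drm/card0/device/unique_id path is assumed from the
device_create_file() call on adev->dev below):

#include <stdio.h>

int main(void)
{
	char id[32] = {0};
	/* Assumed path for the first GPU's sysfs device directory. */
	FILE *f = fopen("/sys/class/drm/card0/device/unique_id", "r");

	if (!f)
		return 1;
	if (fgets(id, sizeof(id), f))
		printf("GPU unique_id: %s", id);
	fclose(f);
	return 0;
}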

Change-Id: I3c673f78efcc5bf93ca58d65edbe39fc3a86b42a
Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c| 33 +++
 .../drm/amd/powerplay/hwmgr/vega10_hwmgr.c|  9 +
 .../drm/amd/powerplay/hwmgr/vega12_hwmgr.c|  9 +
 .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c|  9 +
 drivers/gpu/drm/amd/powerplay/inc/hwmgr.h |  1 +
 5 files changed, 61 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
index adba9ea03e63..e4e5dbbc69c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
@@ -1368,6 +1368,30 @@ static ssize_t amdgpu_get_pcie_bw(struct device *dev,
count0, count1, pcie_get_mps(adev->pdev));
 }
 
+/**
+ * DOC: unique_id
+ *
+ * The amdgpu driver provides a sysfs API for providing a unique ID for the GPU
+ * The file unique_id is used for this.
+ * This will provide a Unique ID that will persist from machine to machine
+ *
+ * NOTE: This will only work for GFX9 and newer. This file will be absent
+ * on unsupported ASICs (GFX8 and older)
+ */
+static ssize_t amdgpu_get_unique_id(struct device *dev,
+   struct device_attribute *attr,
+   char *buf)
+{
+   struct drm_device *ddev = dev_get_drvdata(dev);
+   struct amdgpu_device *adev = ddev->dev_private;
+   struct pp_hwmgr *hwmgr = adev->powerplay.pp_handle;
+
+   if (hwmgr->serial)
+   return snprintf(buf, PAGE_SIZE, "%016llx\n", hwmgr->serial);
+
+   return 0;
+}
+
 static DEVICE_ATTR(power_dpm_state, S_IRUGO | S_IWUSR, amdgpu_get_dpm_state, 
amdgpu_set_dpm_state);
 static DEVICE_ATTR(power_dpm_force_performance_level, S_IRUGO | S_IWUSR,
   amdgpu_get_dpm_forced_performance_level,
@@ -1418,6 +1442,7 @@ static DEVICE_ATTR(pcie_bw, S_IRUGO, amdgpu_get_pcie_bw, 
NULL);
 static DEVICE_ATTR(ppfeatures, S_IRUGO | S_IWUSR,
amdgpu_get_ppfeature_status,
amdgpu_set_ppfeature_status);
+static DEVICE_ATTR(unique_id, S_IRUGO, amdgpu_get_unique_id, NULL);
 
 static ssize_t amdgpu_hwmon_show_temp(struct device *dev,
  struct device_attribute *attr,
@@ -2814,6 +2839,12 @@ int amdgpu_pm_sysfs_init(struct amdgpu_device *adev)
return ret;
}
}
+   if (hwmgr->serial)
+   ret = device_create_file(adev->dev, &dev_attr_unique_id);
+   if (ret) {
+   DRM_ERROR("failed to create device file unique_id\n");
+   return ret;
+   }
ret = amdgpu_debugfs_pm_init(adev);
if (ret) {
DRM_ERROR("Failed to register debugfs file for dpm!\n");
@@ -2875,6 +2906,8 @@ void amdgpu_pm_sysfs_fini(struct amdgpu_device *adev)
device_remove_file(adev->dev, &dev_attr_mem_busy_percent);
if (!(adev->flags & AMD_IS_APU))
device_remove_file(adev->dev, &dev_attr_pcie_bw);
+   if (hwmgr->serial)
+   device_remove_file(adev->dev, &dev_attr_unique_id);
if ((adev->asic_type >= CHIP_VEGA10) &&
!(adev->flags & AMD_IS_APU))
device_remove_file(adev->dev, &dev_attr_ppfeatures);
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
index 9585ba51d853..6b5facca7a81 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
@@ -356,6 +356,7 @@ static void vega10_init_dpm_defaults(struct pp_hwmgr *hwmgr)
struct vega10_hwmgr *data = hwmgr->backend;
int i;
uint32_t sub_vendor_id, hw_revision;
+   uint32_t top32, bottom32;
struct amdgpu_device *adev = hwmgr->adev;
 
vega10_initialize_power_tune_defaults(hwmgr);
@@ -499,6 +500,14 @@ static void vega10_init_dpm_defaults(struct pp_hwmgr 
*hwmgr)
(hw_revision == 0) &&
(sub_vendor_id != 0x1002))
data->smu_features[GNLD_PCC_LIMIT].supported = true;
+
+   /* Get the SN to turn into a Unique ID */
+   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumTop32);
+   top32 = smum_get_argument(hwmgr);
+   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ReadSerialNumBottom32);
+   bottom32 = smum_get_argument(hwmgr);
+
+   hwmgr->serial = ((uint64_t)bottom32 << 32) | top32;
 }
 
 #ifdef PPLIB_VEGA10_EVV_SUPPORT
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c
index 1a909dda37c7..8625e18f52d9 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c
@@ -289,6 +289,7 @@ static int 

Re: [PATCH] drm/amd/display: Allow faking displays as VRR capable.

2019-05-17 Thread Harry Wentland


On 2019-04-30 3:56 p.m., Mario Kleiner wrote:
> [CAUTION: External Email]
> 
> On Tue, Apr 30, 2019 at 2:22 PM Kazlauskas, Nicholas
>  wrote:
>>
>> On 4/30/19 3:44 AM, Michel Dänzer wrote:
>>> [CAUTION: External Email]
>>>
>>> On 2019-04-30 9:37 a.m., Mario Kleiner wrote:
 Allow to detect any connected display to be marked as
 VRR capable. This is useful for testing the basics of
 VRR mode, e.g., scheduling and timestamping, BTR, and
 transition logic, on non-VRR capable displays, e.g.,
 to perform IGT test-suit kms_vrr test runs.

 This fake VRR display mode is enabled by setting the
 optional module parameter amdgpu.fakevrrdisplay=1.

 It will try to use VRR range info parsed from EDID on
 DisplayPort displays which have a compatible EDID,
 but not compatible DPCD caps for Adaptive Sync. E.g.,
 NVidia G-Sync compatible displays expose a proper EDID,
 but not proper DPCD caps.

 It will use a hard-coded VRR range of 30 Hz - 144 Hz on
 other displays without suitable EDID, e.g., standard
 DisplayPort, HDMI, DVI monitors.

 Signed-off-by: Mario Kleiner 

 [...]

   struct amdgpu_mgpu_info mgpu_info = {
.mutex = __MUTEX_INITIALIZER(mgpu_info.mutex),
 @@ -665,6 +666,16 @@ MODULE_PARM_DESC(halt_if_hws_hang, "Halt if HWS hang 
 is detected (0 = off (defau
   MODULE_PARM_DESC(dcfeaturemask, "all stable DC features enabled 
 (default))");
   module_param_named(dcfeaturemask, amdgpu_dc_feature_mask, uint, 0444);

 +/**
 + * DOC: fakevrrdisplay (int)
 + * Override detection of VRR displays to mark any display as VRR capable, 
 even
 + * if it is not. Useful for basic testing of VRR without need to attach 
 such a
 + * display, e.g., for igt tests.
 + * Setting 1 enables faking VRR. Default value, 0, does normal detection.
 + */
 +module_param_named(fakevrrdisplay, amdgpu_fake_vrr_display, int, 0644);
 +MODULE_PARM_DESC(fakevrrdisplay, "Detect any display as VRR capable (0 = 
 off (default), 1 = on)");
>>>
>>> amdgpu has too many module parameters already; IMHO this kind of niche
>>> use-case doesn't justify adding yet another one. For the vast majority
>>> of users, this would just be another knob to break things, resulting in
>>> support burden for us.
>>>
>>> How about e.g. making the vrr_capable property mutable, or adding
>>> another property for this?
>>>
>>>
>>> --
>>> Earthling Michel Dänzer   |  https://www.amd.com
>>> Libre software enthusiast | Mesa and X developer
>>>
>>
>> Since vrr_capable is already an optional property I think making it
>> mutable could potentially be an option. It would allow for userspace to
>> be able to disable capability as well that way.
> 
> Yes, that would have been useful for at least my case. In my own
> toolkit i will need to control vrr on/off on a run-by-run basis,
> depending on what the users experiment scripts want. So i'll add code
> to manipulate my fullscreen windows attached XAtom directly and
> override Mesa's choices. A bit of a hack, but should hopefully work.
> 
> At least for my special niche, more easily accessible (== RandR output
> props) info is always helpful. Other things that would probably make
> my case easier would be optional properties to report the "vrr_active"
> state back, so that the toolkit can know cheaply via a simple query
> without doubt at any point in time if vrr is active or not, because i
> need to use very different ways of scheduling swapbuffers and
> correctness checking the results for vrr vs. non-vrr.
> 
> Or some vrr_range property, so the toolkit can know if it is operating
> the hw in normal vrr range, or in BTR mode, and adapt its swapbuffers
> scheduling, e.g., to try to help the current vrr/btr code a bit to
> make the "right" decisions for stable timing.
> 
> Of course that makes userspace clients more dependent on current hw
> implementation details, so i can see why it's probably not a popular
> choice for a generic api. The best long-term solution is to have
> proper api for the client to just provide a target presentation
> timestamp and leave the rest to the magic little elves inside the
> machine.
> 
>>
>> It's a pretty niche usecase though. However, as Michel said, it would
>> probably just end up being another setting that allows users to break
>> their own setup.
>>
>> Nicholas Kazlauskas
> 
> Ok, fair enough, thank you two for the feedback. I assumed users
> wouldn't mess with this module parameter, given that it has zero
> practical end-user value and can essentially only be used for cheap
> IGT regression testing, but then one never knows if users think it is
> some magic "get Freesync for free" switch and poke the button anyway.
> I'll keep the patch locally for my own testing then.
> 

Users will absolutely set this, report varying levels of success and
then report bugs when they 

[PATCH] gpu: drm: use struct_size() in kmalloc()

2019-05-17 Thread xiaolinkui
Use struct_size() helper to keep code simple.

Signed-off-by: xiaolinkui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 22bd21e..4717a64 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1375,8 +1375,7 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
if (con)
return 0;
 
-   con = kmalloc(sizeof(struct amdgpu_ras) +
-   sizeof(struct ras_manager) * AMDGPU_RAS_BLOCK_COUNT,
+   con = kmalloc(struct_size(con, objs, AMDGPU_RAS_BLOCK_COUNT),
GFP_KERNEL|__GFP_ZERO);
if (!con)
return -ENOMEM;
-- 
2.7.4



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

2019-05-17 Thread Liu, Monk
Instead, if the guest loads “sos” and “sysdrv” there are two possible results:

  1.  With an older PSP boot loader version, the guest would hit a failure.
  2.  With a newer PSP boot loader version, those ucodes won’t be used at all, so
loading them is redundant.

/Monk

From: Tao, Yintian
Sent: Friday, May 17, 2019 3:17 PM
To: Koenig, Christian 
Cc: Liu, Monk ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV


Hi  Christian



Many thanks for your review.



The background is that this bo is to let psp load sos and sysdrv but under 
sriov, sos and sysdrv is loaded by VBIOS or hypervisor driver.



The reason why not let guest driver to load it under SRIOV is that it is not 
safe.





Best Regards

Yintian Tao


From: Koenig, Christian
Sent: 17 May 2019 14:53:35
To: Tao, Yintian
Cc: Liu, Monk; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

Looks good to me now, but I don't know the technical background why this
BO is not needed under SRIOV.

So this patch is Acked-by: Christian König .

Regards,
Christian.

Am 17.05.19 um 04:41 schrieb Tao, Yintian:
> Hi Christian
>
>
> I have modified it according to your suggestion. Can you help review this 
> again? Thanks in advance.
>
>
> Best Regards
> Yintian Tao
>
> -Original Message-
> From: Yintian Tao 
> Sent: Thursday, May 16, 2019 7:54 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Tao, Yintian ; Liu, Monk 
> Subject: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV
>
> PSP fw primary buffer is not used under SRIOV.
> Therefore, we don't need to allocate memory for it.
>
> v2: remove superfluous check for amdgpu_bo_free_kernel().
>
> Signed-off-by: Yintian Tao 
> Signed-off-by: Monk Liu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 ++---
>   1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index c567a55..af9835c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -905,13 +905,16 @@ static int psp_load_fw(struct amdgpu_device *adev)
>if (!psp->cmd)
>return -ENOMEM;
>
> - ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> - AMDGPU_GEM_DOMAIN_GTT,
> - >fw_pri_bo,
> - >fw_pri_mc_addr,
> - >fw_pri_buf);
> - if (ret)
> - goto failed;
> + /* this fw pri bo is not used under SRIOV */
> + if (!amdgpu_sriov_vf(psp->adev)) {
> + ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> +   AMDGPU_GEM_DOMAIN_GTT,
> +   >fw_pri_bo,
> +   >fw_pri_mc_addr,
> +   >fw_pri_buf);
> + if (ret)
> + goto failed;
> + }
>
>ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE,
>AMDGPU_GEM_DOMAIN_VRAM,
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

2019-05-17 Thread Liang, Prike
Hi Christian,

With this patch series, amdgpu_vm_validate_pt_bos occasionally fails to evict
amdgpu BOs and can't find a valid first busy BO. Another problem is that we can
run into a deadlock while taking the lock of the first BO.

/* check if other user occupy memory too long time */
if (!first_bo || !request_resv || !request_resv->lock.ctx) {
if (first_bo)
ttm_bo_put(first_bo);
return -EBUSY;
}
if (first_bo->resv == request_resv) {
ttm_bo_put(first_bo);
return -EBUSY;
}
if (ctx->interruptible)
ret = ww_mutex_lock_interruptible(&first_bo->resv->lock,
  request_resv->lock.ctx);
else
ret = ww_mutex_lock(&first_bo->resv->lock,
request_resv->lock.ctx);
if (ret) {
ttm_bo_put(first_bo);
if (ret == -EDEADLK) {
ret = -EAGAIN;
}

return ret;
}

Thanks
Prike

From: Christian König 
Sent: Wednesday, May 15, 2019 3:05 PM
To: Liang, Prike ; Marek Olšák 
Cc: Zhou, David(ChunMing) ; dri-devel 
; amd-gfx mailing list 

Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

[CAUTION: External Email]
Hi Prike,

no, that can lead to massive problems in a real OOM situation and is not 
something we can do here.

Christian.

Am 15.05.19 um 04:00 schrieb Liang, Prike:
Hi Christian ,

I just wonder: when we encounter an ENOMEM error while pinning amdgpu BOs, can
we retry the validation as below?
With the following simple patch the Abaqus pinning issue is not observed.

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 11cbf63..72a32f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -902,11 +902,15 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
bo->placements[i].lpfn = lpfn;
bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT;
}
-
+retry:
r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
if (unlikely(r)) {
-   dev_err(adev->dev, "%p pin failed\n", bo);
-   goto error;
+if (r == -ENOMEM){
+goto retry;
+} else {
+   dev_err(adev->dev, "%p pin failed\n", bo);
+   goto error;
+}
}

bo->pin_count = 1;


Thanks,
Prike

From: Marek Olšák 
Sent: Wednesday, May 15, 2019 3:33 AM
To: Christian König 

Cc: Zhou, David(ChunMing) ; 
Liang, Prike ; dri-devel 
; 
amd-gfx mailing list 

Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

[CAUTION: External Email]
This series fixes the OOM errors. However, if I torture the kernel driver more, 
I can get it to deadlock and end up with unkillable processes. I can also get 
an OOM error. I just ran the test 5 times:

AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & 
AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & 
AMD_DEBUG=testgdsmm glxgears

Marek

On Tue, May 14, 2019 at 8:31 AM Christian König  wrote:
This avoids OOM situations when we have lots of threads
submitting at the same time.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index fff558cf385b..f9240a94217b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
}

r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
-  &duplicates, true);
+  &duplicates, false);
if (unlikely(r != 0)) {
if (r != -ERESTARTSYS)
DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
--
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [v3] drm/amdgpu: add badpages sysfs interface

2019-05-17 Thread Nathan Chancellor
On Thu, May 09, 2019 at 10:31:05AM +, Pan, Xinhui wrote:
> add badpages node.
> it will output badpages list in format
> gpu pfn : gpu page size : flags
> 
> example
> 0x : 0x1000 : R
> 0x0001 : 0x1000 : R
> 0x0002 : 0x1000 : R
> 0x0003 : 0x1000 : R
> 0x0004 : 0x1000 : R
> 0x0005 : 0x1000 : R
> 0x0006 : 0x1000 : R
> 0x0007 : 0x1000 : P
> 0x0008 : 0x1000 : P
> 0x0009 : 0x1000 : P
> 
> flags can be one of below characters
> R: reserved.
> P: pending for reserve.
> F: failed to reserve for some reasons.
> 
> Signed-off-by: xinhui pan 
> Reviewed-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 146 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h |   1 +
>  2 files changed, 147 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index c60d5f813801..c9e24f60938e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -90,6 +90,12 @@ struct ras_manager {
>   struct ras_err_data err_data;
>  };
>  
> +struct ras_badpage {
> + unsigned int bp;
> + unsigned int size;
> + unsigned int flags;
> +};
> +
>  const char *ras_error_string[] = {
>   "none",
>   "parity",
> @@ -710,6 +716,77 @@ int amdgpu_ras_query_error_count(struct amdgpu_device 
> *adev,
>  
>  /* sysfs begin */
>  
> +static int amdgpu_ras_badpages_read(struct amdgpu_device *adev,
> + struct ras_badpage **bps, unsigned int *count);
> +
> +static char *amdgpu_ras_badpage_flags_str(unsigned int flags)
> +{
> + switch (flags) {
> + case 0:
> + return "R";
> + case 1:
> + return "P";
> + case 2:
> + default:
> + return "F";
> + };
> +}
> +
> +/*
> + * DOC: ras sysfs gpu_vram_bad_pages interface
> + *
> + * It allows user to read the bad pages of vram on the gpu through
> + * /sys/class/drm/card[0/1/2...]/device/ras/gpu_vram_bad_pages
> + *
> + * It outputs multiple lines, and each line stands for one gpu page.
> + *
> + * The format of one line is below,
> + * gpu pfn : gpu page size : flags
> + *
> + * gpu pfn and gpu page size are printed in hex format.
> + * flags can be one of below character,
> + * R: reserved, this gpu page is reserved and not able to use.
> + * P: pending for reserve, this gpu page is marked as bad, will be reserved
> + *in next window of page_reserve.
> + * F: unable to reserve. this gpu page can't be reserved due to some reasons.
> + *
> + * examples:
> + * 0x0001 : 0x1000 : R
> + * 0x0002 : 0x1000 : P
> + */
> +
> +static ssize_t amdgpu_ras_sysfs_badpages_read(struct file *f,
> + struct kobject *kobj, struct bin_attribute *attr,
> + char *buf, loff_t ppos, size_t count)
> +{
> + struct amdgpu_ras *con =
> + container_of(attr, struct amdgpu_ras, badpages_attr);
> + struct amdgpu_device *adev = con->adev;
> + const unsigned int element_size =
> + sizeof("0xabcdabcd : 0x12345678 : R\n") - 1;
> + unsigned int start = (ppos + element_size - 1) / element_size;
> + unsigned int end = (ppos + count - 1) / element_size;

I believe these two lines cause a link time error with arm32 defconfig +
CONFIG_DRM_AMDGPU (filtered down from allyesconfig):

arm-linux-gnueabi-ld: drivers/gpu/drm/amd/amdgpu/amdgpu_ras.o: in function 
`amdgpu_ras_sysfs_badpages_read':
amdgpu_ras.c:(.text+0x804): undefined reference to `__aeabi_ldivmod'
arm-linux-gnueabi-ld: amdgpu_ras.c:(.text+0x830): undefined reference to 
`__aeabi_ldivmod'

The assignments of start and end involve a 64-bit dividend because loff_t
is defined as long long, meaning one of the 64-bit division functions
from include/linux/math64.h should be used. I am not sure of which one
otherwise I would have sent a patch :)
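
If I had to guess, div_u64() would probably do it, something like this
(untested sketch, with <linux/math64.h> included):

	/* div_u64() avoids the 64-by-32 division helpers that 32-bit arm
	 * lacks (__aeabi_ldivmod). */
	unsigned int start = div_u64(ppos + element_size - 1, element_size);
	unsigned int end = div_u64(ppos + count - 1, element_size);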

Cheers,
Nathan

> + ssize_t s = 0;
> + struct ras_badpage *bps = NULL;
> + unsigned int bps_count = 0;
> +
> + memset(buf, 0, count);
> +
> + if (amdgpu_ras_badpages_read(adev, &bps, &bps_count))
> + return 0;
> +
> + for (; start < end && start < bps_count; start++)
> + s += scnprintf(&buf[s], element_size + 1,
> + "0x%08x : 0x%08x : %1s\n",
> + bps[start].bp,
> + bps[start].size,
> + amdgpu_ras_badpage_flags_str(bps[start].flags));
> +
> + kfree(bps);
> +
> + return s;
> +}
> +
>  static ssize_t amdgpu_ras_sysfs_features_read(struct device *dev,
>   struct device_attribute *attr, char *buf)
>  {
> @@ -750,9 +827,14 @@ static int amdgpu_ras_sysfs_create_feature_node(struct 
> amdgpu_device *adev)
>   &con->features_attr.attr,
>   NULL
>   };
> + struct bin_attribute *bin_attrs[] = {
> + &con->badpages_attr,
> + NULL
> + };
>   struct 

Re: Re: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

2019-05-17 Thread Tao, Yintian
Hi Christian



Yes, of course. Thanks for your reminder.



Best Regards
Yintian Tao


From: Christian König 
Sent: 17 May 2019 15:20:54
To: Tao, Yintian; Koenig, Christian
Cc: amd-gfx@lists.freedesktop.org; Liu, Monk
Subject: Re: Re: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

[CAUTION: External Email]
Hi Yintian,

please add this as a code comment to the patch.

Christian.

Am 17.05.19 um 09:17 schrieb Tao, Yintian:

Hi  Christian


Many thanks for your review.


The background is that this bo is to let psp load sos and sysdrv but under 
sriov, sos and sysdrv is loaded by VBIOS or hypervisor driver.


The reason why not let guest driver to load it under SRIOV is that it is not 
safe.



Best Regards

Yintian Tao


From: Koenig, Christian
Sent: 17 May 2019 14:53:35
To: Tao, Yintian
Cc: Liu, Monk; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

Looks good to me now, but I don't know the technical background why this
BO is not needed under SRIOV.

So this patch is Acked-by: Christian König 
.

Regards,
Christian.

Am 17.05.19 um 04:41 schrieb Tao, Yintian:
> Hi Christian
>
>
> I have modified it according to your suggestion. Can you help review this 
> again? Thanks in advance.
>
>
> Best Regards
> Yintian Tao
>
> -Original Message-
> From: Yintian Tao 
> Sent: Thursday, May 16, 2019 7:54 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Tao, Yintian ; Liu, Monk 
> 
> Subject: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV
>
> PSP fw primary buffer is not used under SRIOV.
> Therefore, we don't need to allocate memory for it.
>
> v2: remove superfluous check for amdgpu_bo_free_kernel().
>
> Signed-off-by: Yintian Tao 
> Signed-off-by: Monk Liu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 ++---
>   1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index c567a55..af9835c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -905,13 +905,16 @@ static int psp_load_fw(struct amdgpu_device *adev)
>if (!psp->cmd)
>return -ENOMEM;
>
> - ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> - AMDGPU_GEM_DOMAIN_GTT,
> - >fw_pri_bo,
> - >fw_pri_mc_addr,
> - >fw_pri_buf);
> - if (ret)
> - goto failed;
> + /* this fw pri bo is not used under SRIOV */
> + if (!amdgpu_sriov_vf(psp->adev)) {
> + ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> +   AMDGPU_GEM_DOMAIN_GTT,
> +   >fw_pri_bo,
> +   >fw_pri_mc_addr,
> +   >fw_pri_buf);
> + if (ret)
> + goto failed;
> + }
>
>ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE,
>AMDGPU_GEM_DOMAIN_VRAM,




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: Re: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

2019-05-17 Thread Christian König

Hi Yintian,

please add this as a code comment to the patch.

Christian.

Am 17.05.19 um 09:17 schrieb Tao, Yintian:


Hi  Christian


Many thanks for your review.


The background is that this bo is to let psp load sos and sysdrv but 
under sriov, sos and sysdrv is loaded by VBIOS or hypervisor driver.



The reason why not let guest driver to load it under SRIOV is that it 
is not safe.




Best Regards

Yintian Tao


*From:* Koenig, Christian
*Sent:* 17 May 2019 14:53:35
*To:* Tao, Yintian
*Cc:* Liu, Monk; amd-gfx@lists.freedesktop.org
*Subject:* Re: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV
Looks good to me now, but I don't know the technical background why this
BO is not needed under SRIOV.

So this patch is Acked-by: Christian König .

Regards,
Christian.

Am 17.05.19 um 04:41 schrieb Tao, Yintian:
> Hi Christian
>
>
> I have modified it according to your suggestion. Can you help review 
this again? Thanks in advance.

>
>
> Best Regards
> Yintian Tao
>
> -Original Message-
> From: Yintian Tao 
> Sent: Thursday, May 16, 2019 7:54 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Tao, Yintian ; Liu, Monk 
> Subject: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV
>
> PSP fw primary buffer is not used under SRIOV.
> Therefore, we don't need to allocate memory for it.
>
> v2: remove superfluous check for amdgpu_bo_free_kernel().
>
> Signed-off-by: Yintian Tao 
> Signed-off-by: Monk Liu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 ++---
>   1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c

> index c567a55..af9835c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -905,13 +905,16 @@ static int psp_load_fw(struct amdgpu_device *adev)
>    if (!psp->cmd)
>    return -ENOMEM;
>
> - ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> - AMDGPU_GEM_DOMAIN_GTT,
> - >fw_pri_bo,
> - >fw_pri_mc_addr,
> - >fw_pri_buf);
> - if (ret)
> - goto failed;
> + /* this fw pri bo is not used under SRIOV */
> + if (!amdgpu_sriov_vf(psp->adev)) {
> + ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> + AMDGPU_GEM_DOMAIN_GTT,
> + >fw_pri_bo,
> + >fw_pri_mc_addr,
> + >fw_pri_buf);
> + if (ret)
> + goto failed;
> + }
>
>    ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, 
PAGE_SIZE,

> AMDGPU_GEM_DOMAIN_VRAM,


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

2019-05-17 Thread Tao, Yintian
Hi  Christian


Many thanks for your review.


The background is that this BO is used to let the PSP load sos and sysdrv, but
under SRIOV, sos and sysdrv are loaded by the VBIOS or the hypervisor driver.


The reason for not letting the guest driver load them under SRIOV is that it is
not safe.



Best Regards

Yintian Tao


From: Koenig, Christian
Sent: 17 May 2019 14:53:35
To: Tao, Yintian
Cc: Liu, Monk; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

Looks good to me now, but I don't know the technical background why this
BO is not needed under SRIOV.

So this patch is Acked-by: Christian König .

Regards,
Christian.

Am 17.05.19 um 04:41 schrieb Tao, Yintian:
> Hi Christian
>
>
> I have modified it according to your suggestion. Can you help review this 
> again? Thanks in advance.
>
>
> Best Regards
> Yintian Tao
>
> -Original Message-
> From: Yintian Tao 
> Sent: Thursday, May 16, 2019 7:54 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Tao, Yintian ; Liu, Monk 
> Subject: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV
>
> PSP fw primary buffer is not used under SRIOV.
> Therefore, we don't need to allocate memory for it.
>
> v2: remove superfluous check for amdgpu_bo_free_kernel().
>
> Signed-off-by: Yintian Tao 
> Signed-off-by: Monk Liu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 ++---
>   1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index c567a55..af9835c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -905,13 +905,16 @@ static int psp_load_fw(struct amdgpu_device *adev)
>if (!psp->cmd)
>return -ENOMEM;
>
> - ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> - AMDGPU_GEM_DOMAIN_GTT,
> - >fw_pri_bo,
> - >fw_pri_mc_addr,
> - >fw_pri_buf);
> - if (ret)
> - goto failed;
> + /* this fw pri bo is not used under SRIOV */
> + if (!amdgpu_sriov_vf(psp->adev)) {
> + ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> +   AMDGPU_GEM_DOMAIN_GTT,
> +   >fw_pri_bo,
> +   >fw_pri_mc_addr,
> +   >fw_pri_buf);
> + if (ret)
> + goto failed;
> + }
>
>ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE,
>AMDGPU_GEM_DOMAIN_VRAM,

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

2019-05-17 Thread Koenig, Christian
Looks good to me now, but I don't know the technical background why this 
BO is not needed under SRIOV.

So this patch is Acked-by: Christian König .

Regards,
Christian.

Am 17.05.19 um 04:41 schrieb Tao, Yintian:
> Hi Christian
>
>
> I have modified it according to your suggestion. Can you help review this 
> again? Thanks in advance.
>
>
> Best Regards
> Yintian Tao
>
> -Original Message-
> From: Yintian Tao 
> Sent: Thursday, May 16, 2019 7:54 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Tao, Yintian ; Liu, Monk 
> Subject: [PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV
>
> PSP fw primary buffer is not used under SRIOV.
> Therefore, we don't need to allocate memory for it.
>
> v2: remove superfluous check for amdgpu_bo_free_kernel().
>
> Signed-off-by: Yintian Tao 
> Signed-off-by: Monk Liu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 ++---
>   1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index c567a55..af9835c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -905,13 +905,16 @@ static int psp_load_fw(struct amdgpu_device *adev)
>   if (!psp->cmd)
>   return -ENOMEM;
>   
> - ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> - AMDGPU_GEM_DOMAIN_GTT,
> - >fw_pri_bo,
> - >fw_pri_mc_addr,
> - >fw_pri_buf);
> - if (ret)
> - goto failed;
> + /* this fw pri bo is not used under SRIOV */
> + if (!amdgpu_sriov_vf(psp->adev)) {
> + ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> +   AMDGPU_GEM_DOMAIN_GTT,
> +   >fw_pri_bo,
> +   >fw_pri_mc_addr,
> +   >fw_pri_buf);
> + if (ret)
> + goto failed;
> + }
>   
>   ret = amdgpu_bo_create_kernel(adev, PSP_FENCE_BUFFER_SIZE, PAGE_SIZE,
>   AMDGPU_GEM_DOMAIN_VRAM,

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: don't read DF register for SRIOV

2019-05-17 Thread Koenig, Christian
Hi Yintian,

sorry I have not the slightest idea how this part of the hw works. Maybe 
try to explain more what the DF register is actually doing in the commit 
message. I suspect that it is only about detecting which VRAM connection 
is used, but I'm not 100% sure.

Regards,
Christian.

Am 17.05.19 um 04:38 schrieb Tao, Yintian:
> Ping...
>
> Hi Christian and Alex
>
>
> Can you help review this? Thanks in advance.
>
>
> Best Regards
> Yintian Tao
>
> -Original Message-
> From: amd-gfx  On Behalf Of Yintian Tao
> Sent: Thursday, May 16, 2019 8:11 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Monk ; Tao, Yintian 
> Subject: [PATCH] drm/amdgpu: don't read DF register for SRIOV
>
> [CAUTION: External Email]
>
> Under SRIOV, reading the DF register has a chance to trigger an AER error on
> the host side, so just skip reading it.
>
> Signed-off-by: Monk Liu 
> Signed-off-by: Yintian Tao 
> ---
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index a417763..b5bf9ed 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -837,7 +837,7 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev)
>
>  if (amdgpu_emu_mode != 1)
>  adev->gmc.vram_width = 
> amdgpu_atomfirmware_get_vram_width(adev);
> -   if (!adev->gmc.vram_width) {
> +   if (!adev->gmc.vram_width && !amdgpu_sriov_vf(adev)) {
>  /* hbm memory channel size */
>  if (adev->flags & AMD_IS_APU)
>  chansize = 64;
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx