[PATCH] drm/amdgpu: skip fw pri bo alloc for SRIOV

2019-05-15 Thread Yintian Tao
PSP fw primary buffer is not used under SRIOV. Therefore, we don't need to allocate memory for it. Signed-off-by: Yintian Tao Signed-off-by: Monk Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 22 +- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git

Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

2019-05-15 Thread Kenny Ho
On Wed, May 15, 2019 at 5:26 PM Welty, Brian wrote: > On 5/9/2019 2:04 PM, Kenny Ho wrote: > > There are four control file types, > > stats (ro) - display current measured values for a resource > > max (rw) - limits for a resource > > default (ro, root cgroup only) - default values for a resource

Re: GPU passthrough support for Stoney [Radeon R2/R3/R4/R5 Graphics]?

2019-05-15 Thread Alex Deucher
On Wed, May 15, 2019 at 2:26 PM Micah Morton wrote: > > Hi folks, > > I'm interested in running a VM on a system with an integrated Stoney > [Radeon R2/R3/R4/R5 Graphics] card and passing through the graphics > card to the VM using the IOMMU. I'm wondering whether this is feasible > and supposed

Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

2019-05-15 Thread Welty, Brian
On 5/9/2019 2:04 PM, Kenny Ho wrote: > The drm resource being measured and limited here is the GEM buffer > objects. User applications allocate and free these buffers. In > addition, a process can allocate a buffer and share it with another > process. The consumer of a shared buffer can also

Re: Hard lockups with ROCM

2019-05-15 Thread Alex Deucher
On Wed, May 15, 2019 at 8:33 PM Daniel Kasak wrote: > > On Mon, May 13, 2019 at 11:44 AM Daniel Kasak wrote: >> >> Hi all. I had version 2.2.0 of the ROCM stack running on a 5.0.x and 5.1.0 >> kernel. Things were going great with various boinc GPU tasks. But there is a >> setiathome GPU task

Re: Hard lockups with ROCM

2019-05-15 Thread Daniel Kasak
On Mon, May 13, 2019 at 11:44 AM Daniel Kasak wrote: > Hi all. I had version 2.2.0 of the ROCM stack running on a 5.0.x and 5.1.0 > kernel. Things were going great with various boinc GPU tasks. But there is > a setiathome GPU task which reliably gives me a hard lockup within about 30 > minutes

Re: [PATCH][next] drm/amdgpu: fix spelling mistake "retrived" -> "retrieved"

2019-05-15 Thread Alex Deucher
On Fri, May 10, 2019 at 3:07 AM Colin King wrote: > > From: Colin Ian King > > There is a spelling mistake in a DRM_ERROR error message. Fix this. > > Signed-off-by: Colin Ian King Applied. thanks! Alex > --- > drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 2 +- > 1 file changed, 1 insertion(+),

[PATCH 2/7] drm/amdgpu: Add interface to alloc gws from amdgpu

2019-05-15 Thread Zeng, Oak
Add amdgpu_amdkfd interface to alloc and free gws from amdgpu Change-Id: I4eb418356e5a6051aa09c5e2c4a454263631d6ab Signed-off-by: Oak Zeng --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 34 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 ++ 2 files changed, 36

[PATCH 7/7] drm/amdkfd: PM4 packets change to support GWS

2019-05-15 Thread Zeng, Oak
Add a field in map_queues packet to indicate whether this is a gws control queue. Only one queue per process can be the gws control queue. Change num_gws field in map_process packet to 7 bits. Change-Id: I0db91d99c6962d14f9202b2eb950f8e7e497b79e Signed-off-by: Oak Zeng ---

[PATCH 4/7] drm/amdkfd: Add function to set queue gws

2019-05-15 Thread Zeng, Oak
Add functions in process queue manager to set/get queue gws. Also set process's number of gws used. Currently only one queue in a process can use gws, and that queue uses all of it. Change-Id: I03e480c8692db3eabfc3a188cce8904d5d962ab7 Signed-off-by: Oak Zeng --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h |

[PATCH 1/7] drm/amdkfd: Add gws number to kfd topology node properties

2019-05-15 Thread Zeng, Oak
Add amdgpu_amdkfd interface to get num_gws and add num_gws to /sys/class/kfd/kfd/topology/nodes/x/properties. Only report num_gws if MEC FW supports GWS barriers. Currently this is determined by an environment variable, which will be replaced with a MEC FW version check when firmware is ready.

[PATCH 5/7] drm/amdgpu: Add function to add/remove gws to kfd process

2019-05-15 Thread Zeng, Oak
GWS bo is shared between all kfd processes. Add function to add gws to kfd process's bo list so gws can be evicted from and restored for the process. Change-Id: I75d74cfdadb7075ff8b2b68634e205deb73dc1ea Signed-off-by: Oak Zeng --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +

[PATCH 3/7] drm/amdkfd: Allocate gws on device initialization

2019-05-15 Thread Zeng, Oak
On device initialization, KFD allocates all (64) gws, which are shared by all KFD processes. Change-Id: I1f1274b8d4f6a8ad08785e2791a9a79def75e913 Signed-off-by: Oak Zeng --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 14 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 +++ 2 files

[PATCH 6/7] drm/amdkfd: New IOCTL to allocate queue GWS

2019-05-15 Thread Zeng, Oak
Add a new kfd ioctl to allocate queue GWS. Queue GWS is released on queue destroy. Change-Id: I60153c26a577992ad873e4292e759e5c3d5bbd15 Signed-off-by: Oak Zeng --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 39 ++ .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c

Re: [PATCH] drm/amdgpu/vega20: use mode1 reset for RAS and XGMI

2019-05-15 Thread Grodzovsky, Andrey
Reviewed-by: Andrey Grodzovsky Andrey On 5/15/19 4:29 PM, Alex Deucher wrote: > [CAUTION: External Email] > > If RAS or XGMI are enabled, you have to use mode1 reset rather > than BACO. > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/soc15.c | 9 + > 1 file

[PATCH] drm/amdgpu/vega20: use mode1 reset for RAS and XGMI

2019-05-15 Thread Alex Deucher
If RAS or XGMI are enabled, you have to use mode1 reset rather than BACO. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/soc15.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c index
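
The rule stated in this patch description can be sketched as a small selection function. This is only an illustration: the enum, struct, and function names are hypothetical and are not taken from soc15.c, and falling back to BACO when neither feature is enabled is an assumption made for the example.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the amdgpu driver's API. */
enum reset_method {
    RESET_METHOD_BACO,
    RESET_METHOD_MODE1,
};

struct gpu_features {
    bool ras_enabled;   /* RAS is enabled on this device */
    bool xgmi_enabled;  /* XGMI is enabled on this device */
};

/*
 * Use mode1 reset whenever RAS or XGMI is enabled.
 * Assumption: BACO is the choice in every other case.
 */
static enum reset_method pick_reset_method(const struct gpu_features *f)
{
    if (f->ras_enabled || f->xgmi_enabled)
        return RESET_METHOD_MODE1;
    return RESET_METHOD_BACO;
}
```

Keeping the decision in one helper means a further condition that rules out BACO would only need to be added in one place.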

[PATCH 2/2] drm/amd/amdgpu: Fix maybe-uninitialized warning in df_v3_6_pmc_start

2019-05-15 Thread Bhawanpreet Lakha
This fixes the warning below: error: ‘ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized] int xgmi_tx_link, ret; ^~~ Signed-off-by: Bhawanpreet Lakha --- drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
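
For readers unfamiliar with this class of warning, here is a minimal sketch of how it arises and of one common fix, which is to initialize the variable at its declaration. The function below is hypothetical and does not reproduce the code in df_v3_6.c; the snippet above does not show which fix the patch actually applies.

```c
#include <stddef.h>

/*
 * Hypothetical example: 'ret' is assigned only inside the loop, so the
 * compiler cannot prove it is set when 'count' is 0. Initializing it at
 * the declaration removes the warning and gives the empty case a
 * defined result.
 */
static int start_counters(const int *ids, size_t count)
{
    int ret = 0;  /* was: 'int ret;' */
    size_t i;

    for (i = 0; i < count; i++) {
        /* Stand-in for the real per-item work: negative ids fail. */
        ret = (ids[i] < 0) ? -1 : 0;
        if (ret)
            break;
    }
    return ret;
}
```

With the initializer in place, calling the function with `count == 0` returns 0 instead of an indeterminate value.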

[PATCH 1/2] drm/amd/powerplay: Fix maybe-uninitialized in get_ppfeature_status

2019-05-15 Thread Bhawanpreet Lakha
This fixes the warning below: error: ‘feature_mask’ may be used uninitialized in this function [-Werror=maybe-uninitialized] *features_enabled = ((((uint64_t)feature_mask[0] << SMU_FEATURES_LOW_SHIFT) & SMU_FEATURES_LOW_MASK) |

GPU passthrough support for Stoney [Radeon R2/R3/R4/R5 Graphics]?

2019-05-15 Thread Micah Morton
Hi folks, I'm interested in running a VM on a system with an integrated Stoney [Radeon R2/R3/R4/R5 Graphics] card and passing through the graphics card to the VM using the IOMMU. I'm wondering whether this is feasible and supposed to be doable with the right setup (as opposed to passing a

Re:[PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

2019-05-15 Thread Zhou, David(ChunMing)
Ah, sorry, I missed "+ ttm_bo_move_to_lru_tail(bo, NULL);". Right, moving them to end before releasing is fixing my concern. Sorry for noise. -David Original Message Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From:

Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

2019-05-15 Thread Koenig, Christian
BO list? No, we stop removing them from the LRU. But we still move them to the end of the LRU before releasing them. Christian. Am 15.05.19 um 16:21 schrieb Zhou, David(ChunMing): Isn't this patch trying to stop removing for all BOs from bo list? -David Original Message

Re:[PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

2019-05-15 Thread Zhou, David(ChunMing)
Isn't this patch trying to stop removing for all BOs from bo list? -David Original Message Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: Christian König To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Olsak, Marek" ,"Liang, Prike"

Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

2019-05-15 Thread Christian König
That is a good point, but actually not a problem in practice. See the change to ttm_eu_fence_buffer_objects: - ttm_bo_add_to_lru(bo); + if (list_empty(&bo->lru)) + ttm_bo_add_to_lru(bo); + else +

RE: [PATCH] drm/amd/powerplay: fix locking in smu_feature_set_supported()

2019-05-15 Thread Huang, Ray
> -Original Message- > From: Dan Carpenter > Sent: Wednesday, May 15, 2019 5:52 PM > To: Deucher, Alexander ; Wang, Kevin(Yang) > > Cc: Koenig, Christian ; Zhou, David(ChunMing) > ; David Airlie ; Daniel Vetter > ; Huang, Ray ; Gao, Likun > ; Gui, Jack ; amd- >

[PATCH] drm/amd/powerplay: fix locking in smu_feature_set_supported()

2019-05-15 Thread Dan Carpenter
There is a typo so the code unlocks twice instead of taking the lock and then releasing it. Fixes: f14a323db5b0 ("drm/amd/powerplay: implement update enabled feature state to smc for smu11") Signed-off-by: Dan Carpenter --- drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 2 +- 1 file changed, 1
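
The kind of bug described here can be illustrated with a toy lock that only tracks its depth. This is a sketch under stated assumptions: `toy_lock`, `feature_table`, and `feature_set_supported` are hypothetical names, and the real function in amdgpu_smu.c uses a kernel mutex rather than a counter.

```c
#include <stdint.h>

/* Toy lock that only tracks depth; a stand-in for a kernel mutex. */
struct toy_lock {
    int depth;
};

static void toy_lock_acquire(struct toy_lock *l) { l->depth++; }
static void toy_lock_release(struct toy_lock *l) { l->depth--; }

struct feature_table {
    struct toy_lock lock;
    uint64_t supported;  /* one bit per feature */
};

/* Mark one feature bit as supported while holding the lock. */
static int feature_set_supported(struct feature_table *t, unsigned int bit)
{
    if (bit >= 64)
        return -1;

    toy_lock_acquire(&t->lock);  /* the buggy form called release here */
    t->supported |= (uint64_t)1 << bit;
    toy_lock_release(&t->lock);
    return 0;
}
```

With the acquire/release pair balanced, the depth returns to 0 after every call. In the buggy form it would drop to -2, which for a real mutex means releasing a lock that was never taken.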

Re: [PATCH 02/11] drm/ttm: fix busy memory to fail other user v8

2019-05-15 Thread Christian König
Am 15.05.19 um 10:45 schrieb Daniel Vetter: On Wed, May 15, 2019 at 10:38:28AM +0200, Daniel Vetter wrote: On Tue, May 14, 2019 at 02:31:18PM +0200, Christian König wrote: From: Chunming Zhou heavy gpu job could occupy memory long time, which lead other user fail to get memory. basically

Re: [PATCH 09/11] drm/ttm: convert EDEADLK into EAGAIN

2019-05-15 Thread Christian König
Am 15.05.19 um 10:40 schrieb Daniel Vetter: On Tue, May 14, 2019 at 02:31:25PM +0200, Christian König wrote: Let userspace try again if we really run into a deadlock during eviction. This has a low chance of live locking, but with guaranteed forward process. Signed-off-by: Christian König

Re: [PATCH 02/11] drm/ttm: fix busy memory to fail other user v8

2019-05-15 Thread Christian König
Am 15.05.19 um 11:27 schrieb Christian König: Am 15.05.19 um 10:45 schrieb Daniel Vetter: On Wed, May 15, 2019 at 10:38:28AM +0200, Daniel Vetter wrote: On Tue, May 14, 2019 at 02:31:18PM +0200, Christian König wrote: From: Chunming Zhou heavy gpu job could occupy memory long time, which

Re: [PATCH 02/11] drm/ttm: fix busy memory to fail other user v8

2019-05-15 Thread Christian König
Am 15.05.19 um 10:45 schrieb Daniel Vetter: On Wed, May 15, 2019 at 10:38:28AM +0200, Daniel Vetter wrote: On Tue, May 14, 2019 at 02:31:18PM +0200, Christian König wrote: From: Chunming Zhou heavy gpu job could occupy memory long time, which lead other user fail to get memory. basically

Re: [PATCH 02/11] drm/ttm: fix busy memory to fail other user v8

2019-05-15 Thread Daniel Vetter
On Wed, May 15, 2019 at 10:38:28AM +0200, Daniel Vetter wrote: > On Tue, May 14, 2019 at 02:31:18PM +0200, Christian König wrote: > > From: Chunming Zhou > > > > heavy gpu job could occupy memory long time, which lead other user fail to > > get memory. > > > > basically pick up Christian idea:

Re: [PATCH 09/11] drm/ttm: convert EDEADLK into EAGAIN

2019-05-15 Thread Daniel Vetter
On Tue, May 14, 2019 at 02:31:25PM +0200, Christian König wrote: > Let userspace try again if we really run into a deadlock during eviction. > > This has a low chance of live locking, but with guaranteed forward process. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/ttm/ttm_bo.c |
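
The idea in the quoted description, reporting a deadlock as a retryable error, can be sketched as a small translation helper. The helper name is hypothetical and this is not the code from ttm_bo.c; it only shows the error-code mapping, using the kernel convention of negative errno return values.

```c
#include <errno.h>

/*
 * Hypothetical helper: report a detected deadlock as "try again" so
 * that the caller can restart the operation. All other results pass
 * through unchanged.
 */
static int translate_evict_error(int err)
{
    if (err == -EDEADLK)
        return -EAGAIN;
    return err;
}
```

A caller that already retries on `-EAGAIN` then needs no separate path for the deadlock case.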

Re: [PATCH 02/11] drm/ttm: fix busy memory to fail other user v8

2019-05-15 Thread Daniel Vetter
On Tue, May 14, 2019 at 02:31:18PM +0200, Christian König wrote: > From: Chunming Zhou > > heavy gpu job could occupy memory long time, which lead other user fail to > get memory. > > basically pick up Christian idea: > > 1. Reserve the BO in DC using a ww_mutex ticket (trivial). > 2. If we

Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

2019-05-15 Thread Christian König
Hi Prike, no, that can lead to massive problems in a real OOM situation and is not something we can do here. Christian. Am 15.05.19 um 04:00 schrieb Liang, Prike: Hi Christian , I just wonder when encounter ENOMEM error during pin amdgpu BOs can we retry validate again as below. With