Re: [Intel-xe] [RFC PATCH 3/4] drm/ttm: Handle -EAGAIN in ttm_resource_alloc as -ENOSPC.

2023-05-03 Thread Maarten Lankhorst


On 2023-05-03 11:11, Thomas Hellström wrote:

Hi, Maarten

On 5/3/23 10:34, Maarten Lankhorst wrote:

This allows the drm cgroup controller to return no space is available..

XXX: This is a hopeless simplification that changes behavior, and
returns -ENOSPC even if we could evict ourselves from the current
cgroup.

Ideally, the eviction code becomes cgroup aware, and will force eviction
from the current cgroup or its parents.

Signed-off-by: Maarten Lankhorst 


Thinking of the shrinker analogy, do non-cgroup aware shrinkers just 
shrink blindly or do they reject shrinking like this patch when a 
cgroup limit is reached?


When I made the cgroup controller return -ENOSPC I just hit an infinite 
loop since it sees enough memory is free and tries to allocate memory 
again. Hence the -EAGAIN handling here. It returns -ENOSPC, without the 
infinite looping.


I think there should be 2 code paths:

- OOM, generic case: Handle like we do now. No need for special cgroup 
handling needed right now.


Might change if we implement cgroup memory semantics. See the memory 
section of Documentation/admin-guide/cgroup-v2.rst


It could be useful regardless.

- OOM, cgroup limit reached: Check for each BO if it's valuable to evict to 
unblock the relevant limit.


 / cg1.0
root - cg1 --  cg1.1
   \ \ cg1.2
\  cg2

If we hit the cg limit in cg1.0 for only cg1.0, it doesn't make sense to evict 
from any other cgroup.
If we hit the limit in cg1.0 for the entirety of cg1, it makes sense to evict 
from any of the cg1 nodes, but not from cg2.

This should be relatively straightforward to implement. We identify which 
cgroup hit a limit, and then let the shrinker
run only on that cgroup and its childs.

This could be simplified to the OOM generic case, for root/NULL cg.


~Maarten


Re: [Intel-xe] [RFC PATCH 3/4] drm/ttm: Handle -EAGAIN in ttm_resource_alloc as -ENOSPC.

2023-05-03 Thread Thomas Hellström

Hi, Maarten

On 5/3/23 10:34, Maarten Lankhorst wrote:

This allows the drm cgroup controller to return no space is available..

XXX: This is a hopeless simplification that changes behavior, and
returns -ENOSPC even if we could evict ourselves from the current
cgroup.

Ideally, the eviction code becomes cgroup aware, and will force eviction
from the current cgroup or its parents.

Signed-off-by: Maarten Lankhorst 


Thinking of the shrinker analogy, do non-cgroup aware shrinkers just 
shrink blindly or do they reject shrinking like this patch when a cgroup 
limit is reached?


/Thomas



---
  drivers/gpu/drm/ttm/ttm_bo.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bd5dae4d1624..e057d5d8f09a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -731,6 +731,8 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object 
*bo,
ret = ttm_resource_alloc(bo, place, mem);
if (likely(!ret))
break;
+   if (ret == -EAGAIN)
+   return -ENOSPC;
if (unlikely(ret != -ENOSPC))
return ret;
ret = ttm_mem_evict_first(bdev, man, place, ctx,
@@ -783,7 +785,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
  
  		type_found = true;

ret = ttm_resource_alloc(bo, place, mem);
-   if (ret == -ENOSPC)
+   if (ret == -ENOSPC || ret == -EAGAIN)
continue;
if (unlikely(ret))
goto error;