Re: [Intel-gfx] [RFC v5 00/17] DRM cgroup controller with scheduling control and memory stats

2023-07-20 Thread T.J. Mercier
On Thu, Jul 20, 2023 at 3:55 AM Tvrtko Ursulin
 wrote:
>
>
> Hi,
>
> On 19/07/2023 21:31, T.J. Mercier wrote:
> > On Wed, Jul 12, 2023 at 4:47 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>drm.memory.stat
> >>  A nested file containing cumulative memory statistics for the 
> >> whole
> >>  sub-hierarchy, broken down into separate GPUs and separate memory
> >>  regions supported by the latter.
> >>
> >>  For example::
> >>
> >>$ cat drm.memory.stat
> >>card0 region=system total=12898304 shared=0 active=0 
> >> resident=12111872 purgeable=167936
> >>card0 region=stolen-system total=0 shared=0 active=0 resident=0 
> >> purgeable=0
> >>
> >>  Card designation corresponds to the DRM device names and multiple 
> >> line
> >>  entries can be present per card.
> >>
> >>  Memory region names should be expected to be driver specific with 
> >> the
> >>  exception of 'system' which is standardised and applicable for 
> >> GPUs
> >>  which can operate on system memory buffers.
> >>
> >>  Sub-keys 'resident' and 'purgeable' are optional.
> >>
> >>  Per category region usage is reported in bytes.
> >>
> >>   * Feedback from people interested in drm.active_us and drm.memory.stat is
> >> required to understand the use cases and their usefulness (of the 
> >> fields).
> >>
> >> Memory stats are something which was easy to add to my series, since I 
> >> was
> >> already working on the fdinfo memory stats patches, but the question 
> >> is how
> >> useful it is.
> >>
> > Hi Tvrtko,
> >
> > I think this style of driver-defined categories for reporting of
> > memory could potentially allow us to eliminate the GPU memory tracking
> > tracepoint used on Android (gpu_mem_total). This would involve reading
> > drm.memory.stat at the root cgroup (I see it's currently disabled on
>
> I can put it available under root too, don't think there is any
> technical reason to not have it. In fact, now that I look at it again,
> memory.stat is present on root so that would align with my general
> guideline to keep the two as similar as possible.
>
> > the root), which means traversing the whole cgroup tree under the
> > cgroup lock to generate the values on-demand. This would be done
> > rarely, but I still wonder what the cost of that would turn out to be.
>
> Yeah that's ugly. I could eliminate cgroup_lock by being a bit smarter.
> Just didn't think it worth it for the RFC.
>
> Basically to account memory stats for any sub-tree I need the equivalent
> one struct drm_memory_stats per DRM device present in the hierarchy. So
> I could pre-allocate a few and restart if run out of spares, or
> something. They are really small so pre-allocating a good number, based
> on past state or something, should would good enough. Or even total
> number of DRM devices in a system as a pessimistic and safe option for
> most reasonable deployments.
>
> > The drm_memory_stats categories in the output don't seem like a big
> > value-add for this use-case, but no real objection to them being
>
> You mean the fact there are different categories is not a value add for
> your use case because you would only use one?
>
Exactly, I guess that'd be just "private" (or pick another one) for
the driver-defined "regions" where
shared/private/resident/purgeable/active aren't really applicable.
That doesn't seem like a big problem to me since you already need an
understanding of what a driver-defined region means. It's just adding
a requirement to understand what fields are used, and a driver can
document that in the same place as the region itself. That does mean
performing arithmetic on values from different drivers might not make
sense. But this is just my perspective from trying to fit the
gpu_mem_total tracepoint here. I think we could probably change the
way drivers that use it report memory to fit closer into the
drm_memory_stats categories.

> The idea was to align 1:1 with DRM memory stats fdinfo and somewhat
> emulate how memory.stat also offers a breakdown.
>
> > there. I know it's called the DRM cgroup controller, but it'd be nice
> > if there were a way to make the mem tracking part work for any driver
> > that wishes to participate as many of our devices don't use a DRM
> > driver. But making that work doesn't look like it would fit very
>
> Ah that would be a cha

Re: [Intel-gfx] [RFC v5 00/17] DRM cgroup controller with scheduling control and memory stats

2023-07-19 Thread T.J. Mercier
On Wed, Jul 12, 2023 at 4:47 AM Tvrtko Ursulin
 wrote:
>
>   drm.memory.stat
> A nested file containing cumulative memory statistics for the whole
> sub-hierarchy, broken down into separate GPUs and separate memory
> regions supported by the latter.
>
> For example::
>
>   $ cat drm.memory.stat
>   card0 region=system total=12898304 shared=0 active=0 
> resident=12111872 purgeable=167936
>   card0 region=stolen-system total=0 shared=0 active=0 resident=0 
> purgeable=0
>
> Card designation corresponds to the DRM device names and multiple line
> entries can be present per card.
>
> Memory region names should be expected to be driver specific with the
> exception of 'system' which is standardised and applicable for GPUs
> which can operate on system memory buffers.
>
> Sub-keys 'resident' and 'purgeable' are optional.
>
> Per category region usage is reported in bytes.
>
>  * Feedback from people interested in drm.active_us and drm.memory.stat is
>required to understand the use cases and their usefulness (of the fields).
>
>Memory stats are something which was easy to add to my series, since I was
>already working on the fdinfo memory stats patches, but the question is how
>useful it is.
>
Hi Tvrtko,

I think this style of driver-defined categories for reporting of
memory could potentially allow us to eliminate the GPU memory tracking
tracepoint used on Android (gpu_mem_total). This would involve reading
drm.memory.stat at the root cgroup (I see it's currently disabled on
the root), which means traversing the whole cgroup tree under the
cgroup lock to generate the values on-demand. This would be done
rarely, but I still wonder what the cost of that would turn out to be.
The drm_memory_stats categories in the output don't seem like a big
value-add for this use-case, but no real objection to them being
there. I know it's called the DRM cgroup controller, but it'd be nice
if there were a way to make the mem tracking part work for any driver
that wishes to participate as many of our devices don't use a DRM
driver. But making that work doesn't look like it would fit very
cleanly into this controller, so I'll just shut up now.

Thanks!
-T.J.


Re: [Intel-gfx] [PATCH v4 2/6] dma-buf/heaps: Don't assert held reservation lock for dma-buf mmapping

2023-06-21 Thread T.J. Mercier
On Wed, Jun 21, 2023 at 11:16 AM Dmitry Osipenko
 wrote:
>
> Hi,
>
> On 6/21/23 20:21, T.J. Mercier wrote:
> > On Mon, May 29, 2023 at 3:46 PM Dmitry Osipenko
> >  wrote:
> >>
> >> Don't assert held dma-buf reservation lock on memory mapping of exported
> >> buffer.
> >>
> >> We're going to change dma-buf mmap() locking policy such that exporters
> >> will have to handle the lock. The previous locking policy caused deadlock
> >> problem for DRM drivers in a case of self-imported dma-bufs once these
> >> drivers are moved to use reservation lock universally. The problem
> >> solved by moving the lock down to exporters. This patch prepares dma-buf
> >> heaps for the locking policy update.
> >>
> > Hi Dmitry,
> >
> > I see that in patch 6 of this series calls to
> > dma_resv_lock/dma_resv_unlock have been added to the
> > drm_gem_shmem_helper functions and some exporters. But I'm curious why
> > no dma_resv_lock/dma_resv_unlock calls were added to these two dma-buf
> > heap exporters for mmap?
>
> DMA-buf heaps are exporters, drm_gem_shmem_helper is importer. Locking
> rules are different for importers and exporters.
>
> DMA-heaps use own locking, they can be moved to resv lock in the future.
>
> DMA-heaps don't protect internal data in theirs mmap() implementations,
> nothing to protect there.
>
Thanks.


Re: [Intel-gfx] [PATCH v4 2/6] dma-buf/heaps: Don't assert held reservation lock for dma-buf mmapping

2023-06-21 Thread T.J. Mercier
On Mon, May 29, 2023 at 3:46 PM Dmitry Osipenko
 wrote:
>
> Don't assert held dma-buf reservation lock on memory mapping of exported
> buffer.
>
> We're going to change dma-buf mmap() locking policy such that exporters
> will have to handle the lock. The previous locking policy caused deadlock
> problem for DRM drivers in a case of self-imported dma-bufs once these
> drivers are moved to use reservation lock universally. The problem
> solved by moving the lock down to exporters. This patch prepares dma-buf
> heaps for the locking policy update.
>
Hi Dmitry,

I see that in patch 6 of this series calls to
dma_resv_lock/dma_resv_unlock have been added to the
drm_gem_shmem_helper functions and some exporters. But I'm curious why
no dma_resv_lock/dma_resv_unlock calls were added to these two dma-buf
heap exporters for mmap?

Thanks,
T.J.

> Reviewed-by: Emil Velikov 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/dma-buf/heaps/cma_heap.c| 3 ---
>  drivers/dma-buf/heaps/system_heap.c | 3 ---
>  2 files changed, 6 deletions(-)
>
> diff --git a/drivers/dma-buf/heaps/cma_heap.c 
> b/drivers/dma-buf/heaps/cma_heap.c
> index 1131fb943992..28fb04eccdd0 100644
> --- a/drivers/dma-buf/heaps/cma_heap.c
> +++ b/drivers/dma-buf/heaps/cma_heap.c
> @@ -13,7 +13,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -183,8 +182,6 @@ static int cma_heap_mmap(struct dma_buf *dmabuf, struct 
> vm_area_struct *vma)
>  {
> struct cma_heap_buffer *buffer = dmabuf->priv;
>
> -   dma_resv_assert_held(dmabuf->resv);
> -
> if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
> return -EINVAL;
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c 
> b/drivers/dma-buf/heaps/system_heap.c
> index e8bd10e60998..fcf836ba9c1f 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -13,7 +13,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -202,8 +201,6 @@ static int system_heap_mmap(struct dma_buf *dmabuf, 
> struct vm_area_struct *vma)
> struct sg_page_iter piter;
> int ret;
>
> -   dma_resv_assert_held(dmabuf->resv);
> -
> for_each_sgtable_page(table, , vma->vm_pgoff) {
> struct page *page = sg_page_iter_page();
>
> --
> 2.40.1
>


Re: [Intel-gfx] [RFC 13/17] cgroup/drm: Ability to periodically scan cgroups for over budget GPU usage

2022-10-24 Thread T.J. Mercier
On Wed, Oct 19, 2022 at 10:34 AM Tvrtko Ursulin
 wrote:
>
> From: Tvrtko Ursulin 
>
> Add a scanning worker, which if enabled, periodically queries the cgroup
> for GPU usage and if over budget (as configured by it's relative weight
> share) notifies the drm core about the fact.
>
> This is off by default and can be enabled by configuring a scanning
> period using the drm.period_us cgroup control file.
>
> Signed-off-by: Tvrtko Ursulin 
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  35 +-
>  kernel/cgroup/drm.c | 426 +++-
>  2 files changed, 459 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst 
> b/Documentation/admin-guide/cgroup-v2.rst
> index 1f3cca4e2572..318f463a1316 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -2401,7 +2401,8 @@ HugeTLB Interface Files
>  DRM
>  ---
>
> -The DRM controller allows configuring static hierarchical scheduling 
> priority.
> +The DRM controller allows configuring static hierarchical scheduling priority
> +and scheduling soft limits.
>
>  DRM static priority control
>  ~~~
> @@ -2458,6 +2459,38 @@ DRM static priority interface files
> Read only integer showing the current effective priority level for the
> group. Effective meaning taking into account the chain of inherited
>
> +DRM scheduling soft limits
> +~~
> +
> +Because of the heterogenous hardware and driver DRM capabilities, soft limits
> +are implemented as a loose co-operative (bi-directional) interface between 
> the
> +controller and DRM core.
> +
> +The controller configures the GPU time allowed per group and periodically 
> scans
> +the belonging tasks to detect the over budget condition, at which point it
> +invokes a callback notifying the DRM core of the condition.
> +
> +DRM core provides an API to query per process GPU utilization and 2nd API to
> +receive notification from the cgroup controller when the group enters or 
> exits
> +the over budget condition.
> +
> +Individual DRM drivers which implement the interface are expected to act on 
> this
> +in the best-effort manner only. There are no guarantees that the soft limits
> +will be respected.
> +
> +DRM scheduling soft limits interface files
> +~~
> +
> +  drm.weight
> +   Standard cgroup weight based control [1, 1] used to configure the
> +   relative distributing of GPU time between the sibling groups.
> +
> +  drm.period_us
> +   An integer representing the period with which the controller should 
> look
> +   at the GPU usage by the group and potentially send the over/under 
> budget
> +   signal.
> +   Value of zero (defaul) disables the soft limit checking.
> +
>  Misc
>  
>
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> index 48f1eaaa1c07..af50ead1564a 100644
> --- a/kernel/cgroup/drm.c
> +++ b/kernel/cgroup/drm.c
> @@ -18,6 +18,29 @@ struct drm_cgroup_state {
> int priority;
> int effective_priority;
> unsigned int weight;
> +   unsigned int period_us;
> +
> +   bool scanning_suspended;
> +   unsigned int suspended_period_us;
> +
> +   struct delayed_work scan_work;
> +
> +   /*
> +* Below fields are owned and updated by the scan worker. Either the
> +* worker accesses them, or worker needs to be suspended and synced
> +* before they can be touched from the outside.
> +*/
> +   bool scanned;
> +
> +   ktime_t prev_timestamp;
> +
> +   u64 sum_children_weights;
> +   u64 children_active_us;
> +   u64 per_s_budget_ns;
> +   u64 prev_active_us;
> +   u64 active_us;
> +
> +   bool over_budget;
>  };
>
>  static DEFINE_MUTEX(drmcg_mutex);
> @@ -33,6 +56,31 @@ static inline struct drm_cgroup_state 
> *get_task_drmcs(struct task_struct *task)
> return css_to_drmcs(task_get_css(task, drm_cgrp_id));
>  }
>
> +static u64 drmcs_get_active_time_us(struct drm_cgroup_state *drmcs)
> +{
> +   struct cgroup *cgrp = drmcs->css.cgroup;
> +   struct task_struct *task;
> +   struct css_task_iter it;
> +   u64 total = 0;
> +
> +   css_task_iter_start(>self,
> +   CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED,
> +   );
> +   while ((task = css_task_iter_next())) {
> +   u64 time;
> +
> +   /* Ignore kernel threads here. */
> +   if (task->flags & PF_KTHREAD)
> +   continue;
> +
> +   time = drm_pid_get_active_time_us(task_pid(task));
> +   total += time;
> +   }
> +   css_task_iter_end();
> +
> +   return total;
> +}
> +
>  int drmcgroup_lookup_effective_priority(struct task_struct *task)
>  {
> struct drm_cgroup_state *drmcs = get_task_drmcs(task);
> @@ -202,9 +250,301 @@ static int