Re: [RFC 6/8] drm: Document fdinfo format specification

2021-07-23 Thread Nieto, David M
[AMD Official Use Only]

I just want to make a comment that with this approach (the ns values) calculating
the percentage will take at least two reads of the fdinfo per pid, spaced over
some time. Some engines may be able to provide a single-shot percentage usage
over an internal integration period; that is, for example, what we currently
have implemented for exactly that reason.

I'd like to propose that we add an optional set of fields for this. Also, I may
have missed a message, but why did we remove the timestamp? It is needed for
accurate measurements of engine usage.
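
As a rough sketch of what the two-read approach looks like from userspace
(read_engine_ns() here is a hypothetical helper that parses one drm-engine-<name>
value out of /proc/<pid>/fdinfo/<fd>; the clamp follows the non-monotonicity
rule quoted further down in the proposed spec):

  #include <stdint.h>
  #include <time.h>

  /* Hypothetical helper: returns the drm-engine-<engine> value, in
   * nanoseconds, read from /proc/<pid>/fdinfo/<fd>. */
  uint64_t read_engine_ns(int pid, int fd, const char *engine);

  static double engine_busy_percent(int pid, int fd, const char *engine,
                                    unsigned int interval_ms)
  {
          struct timespec ts = { .tv_sec = interval_ms / 1000,
                                 .tv_nsec = (interval_ms % 1000) * 1000000L };
          uint64_t t0, t1;

          t0 = read_engine_ns(pid, fd, engine);
          nanosleep(&ts, NULL);
          t1 = read_engine_ns(pid, fd, engine);

          if (t1 < t0)            /* stick with the larger previous value */
                  t1 = t0;

          return 100.0 * (double)(t1 - t0) / (interval_ms * 1000000.0);
  }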

David

From: Daniel Vetter 
Sent: Friday, July 23, 2021 9:47 AM
To: Daniel Stone 
Cc: Tvrtko Ursulin ; intel-gfx 
; Tvrtko Ursulin ; 
Koenig, Christian ; dri-devel 
; Nieto, David M 
Subject: Re: [RFC 6/8] drm: Document fdinfo format specification

On Fri, Jul 23, 2021 at 05:43:01PM +0100, Daniel Stone wrote:
> Hi Tvrtko,
> Thanks for typing this up!
>
> On Thu, 15 Jul 2021 at 10:18, Tvrtko Ursulin
>  wrote:
> > +Mandatory fully standardised keys
> > +---------------------------------
> > +
> > +- drm-driver: <str>
> > +
> > +String shall contain a fixed string uniquely identifying the driver handling
> > +the device in question. For example, the name of the respective kernel module.
>
> I think let's be more prescriptive and just say that it is the module name.

Just a quick comment on this one.

drm_driver.name is already uapi, so let's please not invent a new one. The
shared code should probably make sure drivers don't get this wrong. Maybe
good if we document the getversion ioctl, which also exposes this, and then
link between the two.
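
As an aside, a minimal userspace sketch of fetching that same name via the
version ioctl, using libdrm's drmGetVersion() wrapper (error handling trimmed):

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>
  #include <xf86drm.h>

  int main(void)
  {
          int fd = open("/dev/dri/card0", O_RDWR);
          drmVersionPtr v;

          if (fd < 0)
                  return 1;

          v = drmGetVersion(fd);  /* wraps DRM_IOCTL_VERSION */
          if (v) {
                  /* v->name is drm_driver.name, i.e. the same string the
                   * proposed drm-driver fdinfo key would report. */
                  printf("driver: %s\n", v->name);
                  drmFreeVersion(v);
          }
          close(fd);
          return 0;
  }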
-Daniel

>
> > +Optional fully standardised keys
> > +--------------------------------
> > +
> > +- drm-pdev: <aaaa:bb.cc.d>
> > +
> > +For PCI devices this should contain the PCI slot address of the device in
> > +question.
>
> How about just major:minor of the DRM render node device it's attached to?
>
> > +- drm-client-id: <uint>
> > +
> > +Unique value relating to the open DRM file descriptor used to distinguish
> > +duplicated and shared file descriptors. Conceptually the value should map 1:1
> > +to the in kernel representation of `struct drm_file` instances.
> > +
> > +The value shall be either globally unique, or unique within the scope of each
> > +device, in which case `drm-pdev` shall be present as well.
> > +
> > +Userspace should make sure to not double account any usage statistics by using
> > +the above described criteria in order to associate data to individual clients.
> > +
> > +- drm-engine-<str>: <uint> ns
> > +
> > +GPUs usually contain multiple execution engines. Each shall be given a stable
> > +and unique name (str), with possible values documented in the driver specific
> > +documentation.
> > +
> > +Value shall be the time, in the specified units, which the respective GPU
> > +engine spent busy executing workloads belonging to this client.
> > +
> > +Values are not required to be constantly monotonic if it makes the driver
> > +implementation easier, but are required to catch up with the previously reported
> > +larger value within a reasonable period. Upon observing a value lower than what
> > +was previously read, userspace is expected to stay with that larger previous
> > +value until a monotonic update is seen.
>
> Yeah, that would work well for Mali/Panfrost. We can queue multiple
> jobs in the hardware, which can either be striped across multiple
> cores with an affinity mask (e.g. 3 cores for your client and 1 for
> your compositor), or picked according to priority, or ...
>
> The fine-grained performance counters (e.g. time spent waiting for
> sampler) are only GPU-global. So if you have two jobs running
> simultaneously, you have no idea who's responsible for what.
>
> But it does give us coarse-grained counters which are accounted
> per-job-slot, including exactly this metric: amount of 'GPU time'
> (whatever that means) occupied by that job slot during the sampling
> period. So we could support that nicely if we fenced job-slot updates
> with register reads/writes.
>
> Something I'm missing though is how we enable this information. Seems
> like it would be best to either only do it whilst fdinfo is open (and
> re-read it whenever you need an update), or on a per-driver sysfs
> toggle, or ... ?
>
> > +- drm-memory-<str>: <uint> [KiB|MiB]
> > +
> > +Each possible memory type which can be used to store buffer objects by the
> > +GPU in question shall be given a stable and unique name to be returned as the

Re: [RFC 7/7] drm/i915: Expose client engine utilisation via fdinfo

2021-05-20 Thread Nieto, David M
[AMD Official Use Only]

I would like to add a unit marker for the stats that we monitor in the fd. As
we discussed, currently we are displaying the usage percentage because we
wanted to provide single-query percentages, but this may evolve with time.

May I suggest adding two new fields:

drm-stat-interval: <64 bit> ns
drm-stat-timestamp: <64 bit> ns

If interval is set, engine utilization is calculated in a single read:
  <utilization> = 100 * <drm-engine-XX> / <drm-stat-interval>
If interval is not set, two reads are needed:
  <utilization> = 100 * delta(<drm-engine-XX>) / delta(<drm-stat-timestamp>)
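
A sketch of the two calculations side by side (field names as proposed above,
values assumed to be already parsed from fdinfo):

  #include <stdint.h>

  /* Single-shot mode: the driver exposes drm-stat-interval. */
  static double busy_single(uint64_t engine_ns, uint64_t interval_ns)
  {
          return 100.0 * (double)engine_ns / (double)interval_ns;
  }

  /* Two-read mode: sample twice and use drm-stat-timestamp deltas. */
  static double busy_sampled(uint64_t engine_ns0, uint64_t engine_ns1,
                             uint64_t ts0, uint64_t ts1)
  {
          return 100.0 * (double)(engine_ns1 - engine_ns0) /
                 (double)(ts1 - ts0);
  }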


Regards,

David



From: Tvrtko Ursulin 
Sent: Thursday, May 20, 2021 8:12 AM
To: intel-...@lists.freedesktop.org 
Cc: dri-devel@lists.freedesktop.org ; Tvrtko 
Ursulin ; Nieto, David M ; 
Koenig, Christian ; Daniel Vetter 
Subject: [RFC 7/7] drm/i915: Expose client engine utilisation via fdinfo

From: Tvrtko Ursulin 

Similar to AMD commit
874442541133 ("drm/amdgpu: Add show_fdinfo() interface"), using the
infrastructure added in previous patches, we add basic client info
and GPU engine utilisation for i915.

Example of the output:

  pos:0
  flags:  012
  mnt_id: 21
  drm-driver: i915
  drm-pdev:   :00:02.0
  drm-client-id:  7
  drm-engine-render:  9288864723 ns
  drm-engine-copy:2035071108 ns
  drm-engine-video:   0 ns
  drm-engine-video-enhance:   0 ns

DRM related fields are appropriately prefixed for easy parsing and
separation from generic fdinfo fields.

The idea is for some fields to become either fully or partially standardised
in order to enable writing of generic top-like tools.

Initial proposal for fully standardised common fields:

 drm-driver: <str>
 drm-pdev: <aaaa:bb.cc.d>

Optional fully standardised:

 drm-client-id: <uint>

Optional partially standardised:

 engine-<str>: <u64> ns
 memory-<str>: <u64> KiB

Once agreed the format would need to go to some README or kerneldoc in
DRM core.
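
For reference, a minimal sketch of a parser pulling the drm-* keys out of a
single fdinfo file (a path such as /proc/<pid>/fdinfo/<fd>); a real tool would
then aggregate per drm-client-id as discussed elsewhere in the thread:

  #include <stdio.h>
  #include <string.h>

  /* Print all "drm-*" key/value pairs from one fdinfo file. */
  static void dump_drm_keys(const char *path)
  {
          char line[256];
          FILE *f = fopen(path, "r");

          if (!f)
                  return;

          while (fgets(line, sizeof(line), f)) {
                  char *colon = strchr(line, ':');

                  if (strncmp(line, "drm-", 4) != 0 || !colon)
                          continue;

                  *colon = '\0';
                  /* line holds the key (e.g. "drm-engine-render"),
                   * colon + 1 the value (e.g. " 9288864723 ns"). */
                  printf("%s =%s", line, colon + 1);
          }
          fclose(f);
  }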

Signed-off-by: Tvrtko Ursulin 
Cc: David M Nieto 
Cc: Christian König 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_drm_client.c | 68 ++
 drivers/gpu/drm/i915/i915_drm_client.h |  4 ++
 drivers/gpu/drm/i915/i915_drv.c|  3 ++
 3 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
index 1e5db7753276..5e9cfba1116b 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -9,6 +9,11 @@

 #include 

+#include 
+
+#include "gem/i915_gem_context.h"
+#include "gt/intel_engine_user.h"
+
 #include "i915_drm_client.h"
 #include "i915_drv.h"
 #include "i915_gem.h"
@@ -168,3 +173,66 @@ void i915_drm_clients_fini(struct i915_drm_clients *clients)

 xa_destroy(&clients->xarray);
 }
+
+#ifdef CONFIG_PROC_FS
+static const char * const uabi_class_names[] = {
+   [I915_ENGINE_CLASS_RENDER] = "render",
+   [I915_ENGINE_CLASS_COPY] = "copy",
+   [I915_ENGINE_CLASS_VIDEO] = "video",
+   [I915_ENGINE_CLASS_VIDEO_ENHANCE] = "video-enhance",
+};
+
+static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
+{
+   struct i915_gem_engines_iter it;
+   struct intel_context *ce;
+   u64 total = 0;
+
+   for_each_gem_engine(ce, rcu_dereference(ctx->engines), it) {
+   if (ce->engine->uabi_class != class)
+   continue;
+
+   total += intel_context_get_total_runtime_ns(ce);
+   }
+
+   return total;
+}
+
+static void
+show_client_class(struct seq_file *m,
+ struct i915_drm_client *client,
+ unsigned int class)
+{
+   const struct list_head *list = &client->ctx_list;
+   u64 total = atomic64_read(&client->past_runtime[class]);
+   struct i915_gem_context *ctx;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(ctx, list, client_link)
+   total += busy_add(ctx, class);
+   rcu_read_unlock();
+
+   seq_printf(m, "drm-engine-%s:\t%llu ns\n",
+  uabi_class_names[class], total);
+}
+
+void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
+{
+   struct drm_file *file = f->private_data;
+   struct drm_i915_file_private *file_priv = file->driver_priv;
+   struct drm_i915_private *i915 = file_priv->dev_priv;
+   struct i915_drm_client *client = file_priv->client;
+   struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
+   unsigned int i;
+
+   seq_printf(m, "drm-driver:\ti915\n");
+   seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
+  pci_domain_nr(pdev->bus), pdev->bus->number,
+  PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
+
+   seq_printf(m, "drm-client-id:\t%u\n", client->id);
+
+   for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
+   show_client_class(m, client, i);
+}
+#endif
diff --git a/drivers/gpu/drm/i915

Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness

2021-05-19 Thread Nieto, David M
[AMD Official Use Only]

Parsing over 550 processes for fdinfo is taking between 40-100 ms single
threaded on a 2 GHz Skylake (IBRS) within a VM, using simple string comparisons
and dirent parsing. And that is pretty much the worst case scenario; more
optimized implementations would do better.

David

From: Daniel Vetter 
Sent: Wednesday, May 19, 2021 11:23 AM
To: Tvrtko Ursulin 
Cc: Daniel Stone ; jhubb...@nvidia.com 
; nouv...@lists.freedesktop.org 
; Intel Graphics Development 
; Maling list - DRI developers 
; Simon Ser ; Koenig, 
Christian ; arit...@nvidia.com ; 
Nieto, David M 
Subject: Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness

On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
 wrote:
>
>
> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> >
> > On 18/05/2021 10:16, Daniel Stone wrote:
> >> Hi,
> >>
> >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> >>  wrote:
> >>> I was just wondering if stat(2) and a chrdev major check would be a
> >>> solid criteria to more efficiently (compared to parsing the text
> >>> content) detect drm files while walking procfs.
> >>
> >> Maybe I'm missing something, but is the per-PID walk actually a
> >> measurable performance issue rather than just a bit unpleasant?
> >
> > Per pid and per each open fd.
> >
> > As said in the other thread what bothers me a bit in this scheme is that
> > the cost of obtaining GPU usage scales based on non-GPU criteria.
> >
> > For use case of a top-like tool which shows all processes this is a
> > smaller additional cost, but then for a gpu-top like tool it is somewhat
> > higher.
>
> To further expand, not only cost would scale per pid multiplies per open
> fd, but to detect which of the fds are DRM I see these three options:
>
> 1) Open and parse fdinfo.
> 2) Name based matching ie /dev/dri/.. something.
> 3) Stat the symlink target and check for DRM major.

stat with symlink following should be plenty fast.
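
A minimal sketch of that check, assuming DRM_MAJOR is 226 (as questioned
elsewhere in the thread):

  #include <stdbool.h>
  #include <stdio.h>
  #include <sys/stat.h>
  #include <sys/sysmacros.h>
  #include <sys/types.h>

  #define DRM_MAJOR 226

  /* stat(2) follows the /proc/<pid>/fd/<fd> symlink, so st_rdev is that
   * of the target device node. */
  static bool fd_is_drm(int pid, int fd)
  {
          char path[64];
          struct stat st;

          snprintf(path, sizeof(path), "/proc/%d/fd/%d", pid, fd);
          if (stat(path, &st))
                  return false;

          return S_ISCHR(st.st_mode) && major(st.st_rdev) == DRM_MAJOR;
  }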

> All sound quite sub-optimal to me.
>
> Name based matching is probably the least evil on system resource usage
> (Keeping the dentry cache too hot? Too many syscalls?), even though
> fundamentally I don't think it is the right approach.
>
> What happens with dup(2) is another question.

We need benchmark numbers showing that on anything remotely realistic
it's an actual problem. Until we've demonstrated it's a real problem
we don't need to solve it.

E.g. top with any sorting enabled also parses way more than it
displays on every update. It seems to be doing Just Fine (tm).

> Does anyone have any feedback on the /proc//gpu idea at all?

When we know we have a problem to solve we can take a look at solutions.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 0/7] Per client engine busyness

2021-05-17 Thread Nieto, David M
[Public]

Cycling some of the Nvidia/nouveau guys here too.

I think there is a benefit in trying to standardize how fdinfo can be used to
expose per-engine and device memory utilization.

Another advantage of going the /proc/ way instead of the sysfs/debugfs approach
is that you inherit the access lists directly from the distribution and you
don't need to start messing with ownership and group access. By default a user
can monitor their own processes as long as /proc is mounted.

I am not saying that fdinfo or the way we implemented it is 100% the way to go,
but I'd rather have a solution within the confines of proc first.

David




From: Nieto, David M 
Sent: Monday, May 17, 2021 11:02 AM
To: Tvrtko Ursulin ; Daniel Vetter 
; Koenig, Christian 
Cc: Alex Deucher ; Intel Graphics Development 
; Maling list - DRI developers 

Subject: Re: [PATCH 0/7] Per client engine busyness

The format is simple:

<ring name> <index>: <utilization> %

we also have entries for the memory mapped:
mem <memory type>: <size> KiB
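
Purely as an illustration of that shape (the ring and pool names here are
hypothetical, not the actual amdgpu output):

  gfx 0: 27 %
  compute 0: 0 %
  dma 0: 3 %
  mem gtt: 204800 KiB
  mem vram: 524288 KiB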

On my submission 
https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html I added a 
python script to print out the info. It has a lower CPU usage than top, for
example.

To be absolutely honest, I agree that there is an overhead, but it might not be
as much as you fear.

From: Tvrtko Ursulin 
Sent: Monday, May 17, 2021 9:00 AM
To: Nieto, David M ; Daniel Vetter ; 
Koenig, Christian 
Cc: Alex Deucher ; Intel Graphics Development 
; Maling list - DRI developers 

Subject: Re: [PATCH 0/7] Per client engine busyness


On 17/05/2021 15:39, Nieto, David M wrote:
> [AMD Official Use Only]
>
>
> Maybe we could try to standardize how the different submission ring
>   usage gets exposed in the fdinfo? We went the simple way of just
> adding name and index, but if someone has a suggestion on how else we
> could format them so there is commonality across vendors we could just
> amend those.

Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also
like people to look at the procfs proposal from Chris,
  - link to which I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an
extra cost in DRM client discovery (compared to my sysfs series and also
procfs RFC from Chris). It would require reading all processes (well
threads, then maybe aggregating threads into parent processes), all fd
symlinks, and doing a stat on them to figure out which ones are DRM devices.

Btw is DRM_MAJOR 226 considered uapi? I don't see it in uapi headers.

> I’d really like to have the process manager tools display GPU usage
> regardless of what vendor is installed.

Definitely.

Regards,

Tvrtko


Re: [PATCH 0/7] Per client engine busyness

2021-05-17 Thread Nieto, David M
[AMD Official Use Only]

The format is simple:

<ring name> <index>: <utilization> %

we also have entries for the memory mapped:
mem <memory type>: <size> KiB

On my submission 
https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html I added a 
python script to print out the info. It has a lower CPU usage than top, for
example.

To be absolutely honest, I agree that there is an overhead, but it might not be
as much as you fear.

From: Tvrtko Ursulin 
Sent: Monday, May 17, 2021 9:00 AM
To: Nieto, David M ; Daniel Vetter ; 
Koenig, Christian 
Cc: Alex Deucher ; Intel Graphics Development 
; Maling list - DRI developers 

Subject: Re: [PATCH 0/7] Per client engine busyness


On 17/05/2021 15:39, Nieto, David M wrote:
> [AMD Official Use Only]
>
>
> Maybe we could try to standardize how the different submission ring
>   usage gets exposed in the fdinfo? We went the simple way of just
> adding name and index, but if someone has a suggestion on how else we
> could format them so there is commonality across vendors we could just
> amend those.

Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also
like people to look at the procfs proposal from Chris,
  - link to which I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an
extra cost in DRM client discovery (compared to my sysfs series and also
procfs RFC from Chris). It would require reading all processes (well
threads, then maybe aggregating threads into parent processes), all fd
symlinks, and doing a stat on them to figure out which ones are DRM devices.

Btw is DRM_MAJOR 226 considered uapi? I don't see it in uapi headers.

> I’d really like to have the process manager tools display GPU usage
> regardless of what vendor is installed.

Definitely.

Regards,

Tvrtko


Re: [PATCH 0/7] Per client engine busyness

2021-05-17 Thread Nieto, David M
[AMD Official Use Only]

Maybe we could try to standardize how the different submission ring usage gets
exposed in the fdinfo? We went the simple way of just adding name and index, 
but if someone has a suggestion on how else we could format them so there is 
commonality across vendors we could just amend those.

I’d really like to have the process manager tools display GPU usage regardless
of what vendor is installed.


From: Daniel Vetter 
Sent: Monday, May 17, 2021 7:30:47 AM
To: Koenig, Christian 
Cc: Tvrtko Ursulin ; Nieto, David M 
; Alex Deucher ; Intel Graphics 
Development ; Maling list - DRI developers 
; Daniel Vetter 
Subject: Re: [PATCH 0/7] Per client engine busyness

On Fri, May 14, 2021 at 05:10:29PM +0200, Christian König wrote:
> Am 14.05.21 um 17:03 schrieb Tvrtko Ursulin:
> >
> > On 14/05/2021 15:56, Christian König wrote:
> > > Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:
> > > >
> > > > On 14/05/2021 14:53, Christian König wrote:
> > > > > >
> > > > > > David also said that you considered sysfs but were wary
> > > > > > of exposing process info in there. To clarify, my patch
> > > > > > is not exposing sysfs entry per process, but one per
> > > > > > open drm fd.
> > > > > >
> > > > >
> > > > > Yes, we discussed this as well, but then rejected the approach.
> > > > >
> > > > > To have useful information related to the open drm fd you
> > > > > need to relate that to process(es) which have that file
> > > > > descriptor open. Just tracking who opened it first like DRM
> > > > > does is pretty useless on modern systems.
> > > >
> > > > We do update the pid/name for fds passed over unix sockets.
> > >
> > > Well I just double checked and that is not correct.
> > >
> > > Could be that i915 has some special code for that, but on my laptop
> > > I only see the X server under the "clients" debugfs file.
> >
> > Yes we have special code in i915 for this. Part of this series we are
> > discussing here.
>
> Ah, yeah you should mention that. Could we please separate that into common
> code instead? Cause I really see that as a bug in the current handling
> independent of the discussion here.
>
> As far as I know all IOCTLs go through some common place in DRM anyway.

Yeah, might be good to fix that confusion in debugfs. But since that's
non-uapi, I guess no one ever cared (enough).

> > > > > But an "lsof /dev/dri/renderD128" for example does exactly
> > > > > what top does as well, it iterates over /proc and sees which
> > > > > process has that file open.
> > > >
> > > > Lsof is quite inefficient for this use case. It has to open
> > > > _all_ open files for _all_ processes on the system to find a
> > > > handful of ones which may have the DRM device open.
> > >
> > > Completely agree.
> > >
> > > The key point is you either need to have all references to an open
> > > fd, or at least track whoever last used that fd.
> > >
> > > At least the last time I looked even the fs layer didn't know which
> > > fd is open by which process. So there wasn't really any alternative
> > > to the lsof approach.
> >
> > I asked you about the use case you have in mind which you did not
> > answer. Otherwise I don't understand when do you need to walk all files.
> > What information you want to get?
>
> Per fd debugging information, e.g. instead of the top use case you know
> which process you want to look at.
>
> >
> > For the use case of knowing which DRM file is using how much GPU time on
> > engine X we do not need to walk all open files either with my sysfs
> > approach or the proc approach from Chris. (In the former case we
> > optionally aggregate by PID at presentation time, and in the latter case
> > aggregation is implicit.)
>
> I'm unsure if we should go with the sysfs, proc or some completely different
> approach.
>
> In general it would be nice to have a way to find all the fd references for
> an open inode.

Yeah, but that maybe needs to be an ioctl or syscall or something on the
inode, that gives you a list of (procfd, fd_nr) pairs pointing back at all
open files? That is, if this really is a real world problem; given that
top/lsof and everyone else haven't asked for it yet, maybe it's not.

Also I replied in some other thread, I really like the fdinfo stuff, and I
think trying to somewhat standardized this across drivers would be neat.
Especially since i915 is going to adopt drm/scheduler for front-end
scheduling too, so at least some of this should be fairly easy to share.

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 0/7] Per client engine busyness

2021-05-14 Thread Nieto, David M
[AMD Official Use Only - Internal Distribution Only]

We had entertained the idea of exposing the processes as sysfs nodes as you 
proposed, but we had concerns about exposing process info in there, especially 
since /proc already exists for that purpose.

I think if you were to follow that approach, we could have tools like top that 
support exposing GPU engine usage.

From: Alex Deucher 
Sent: Thursday, May 13, 2021 10:58 PM
To: Tvrtko Ursulin ; Nieto, David M 
; Koenig, Christian 
Cc: Intel Graphics Development ; Maling list - 
DRI developers ; Daniel Vetter 

Subject: Re: [PATCH 0/7] Per client engine busyness

+ David, Christian

On Thu, May 13, 2021 at 12:41 PM Tvrtko Ursulin
 wrote:
>
>
> Hi,
>
> On 13/05/2021 16:48, Alex Deucher wrote:
> > On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
> >  wrote:
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> Resurrection of the previously merged per client engine busyness patches. In a
> >> nutshell it enables intel_gpu_top to be more top(1)-like useful and show not
> >> only physical GPU engine usage but a per process view as well.
> >>
> >> Example screen capture:
> >> 
> >> intel-gpu-top -  906/ 955 MHz;0% RC6;  5.30 Watts;  933 irqs/s
> >>
> >>IMC reads: 4414 MiB/s
> >>   IMC writes: 3805 MiB/s
> >>
> >>ENGINE  BUSY  MI_SEMA  MI_WAIT
> >>   Render/3D/0   93.46% |▋  |  0%  0%
> >> Blitter/00.00% |   |  0%  0%
> >>   Video/00.00% |   |  0%  0%
> >>VideoEnhance/00.00% |   |  0%  0%
> >>
> >>PIDNAME  Render/3D  BlitterVideo  VideoEnhance
> >>   2733   neverball |██▌ |||||| |
> >>   2047Xorg |███▊|||||| |
> >>   2737glxgears |█▍  |||||| |
> >>   2128   xfwm4 ||||||| |
> >>   2047Xorg ||||||| |
> >> 
> >>
> >> Internally we track time spent on engines for each struct intel_context, both
> >> for current and past contexts belonging to each open DRM file.
> >>
> >> This can serve as a building block for several features from the wanted list:
> >> smarter scheduler decisions, getrusage(2)-like per-GEM-context functionality
> >> wanted by some customers, setrlimit(2) like controls, cgroups controller,
> >> dynamic SSEU tuning, ...
> >>
> >> To enable userspace access to the tracked data, we expose time spent on GPU
> >> per client and per engine class in sysfs with a hierarchy like the below:
> >>
> >>  # cd /sys/class/drm/card0/clients/
> >>  # tree
> >>  .
> >>  ├── 7
> >>  │   ├── busy
> >>  │   │   ├── 0
> >>  │   │   ├── 1
> >>  │   │   ├── 2
> >>  │   │   └── 3
> >>  │   ├── name
> >>  │   └── pid
> >>  ├── 8
> >>  │   ├── busy
> >>  │   │   ├── 0
> >>  │   │   ├── 1
> >>  │   │   ├── 2
> >>  │   │   └── 3
> >>  │   ├── name
> >>  │   └── pid
> >>  └── 9
> >>  ├── busy
> >>  │   ├── 0
> >>  │   ├── 1
> >>  │   ├── 2
> >>  │   └── 3
> >>  ├── name
> >>  └── pid
> >>
> >> Files in 'busy' directories are numbered using the engine class ABI values and
> >> they contain accumulated nanoseconds each client spent on engines of a
> >> respective class.
> >> respective class.
> >
> > We did something similar in amdgpu using the gpu scheduler.  We the