Re: [PATCH 0/7] Per client engine busyness

2021-06-28 Thread Daniel Vetter
On Mon, Jun 28, 2021 at 12:18 PM Tvrtko Ursulin
 wrote:
>
>
>
> On 14/05/2021 16:10, Christian König wrote:
> > Am 14.05.21 um 17:03 schrieb Tvrtko Ursulin:
> >>
> >> On 14/05/2021 15:56, Christian König wrote:
> >>> Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:
> 
>  On 14/05/2021 14:53, Christian König wrote:
> >>
> >> David also said that you considered sysfs but were wary of
> >> exposing process info in there. To clarify, my patch is not
> >> exposing sysfs entry per process, but one per open drm fd.
> >>
> >
> > Yes, we discussed this as well, but then rejected the approach.
> >
> > To have useful information related to the open drm fd you need to
> > related that to process(es) which have that file descriptor open.
> > Just tracking who opened it first like DRM does is pretty useless
> > on modern systems.
> 
>  We do update the pid/name for fds passed over unix sockets.
> >>>
> >>> Well I just double checked and that is not correct.
> >>>
> >>> Could be that i915 has some special code for that, but on my laptop I
> >>> only see the X server under the "clients" debugfs file.
> >>
> >> Yes we have special code in i915 for this. Part of this series we are
> >> discussing here.
> >
> > Ah, yeah you should mention that. Could we please separate that into
> > common code instead? Cause I really see that as a bug in the current
> > handling independent of the discussion here.
>
> What we do in i915 is update the pid and name when a task different to
> the one which opened the fd does a GEM context create ioctl.
>
> Moving that to DRM core would be along the lines of doing the same check
> and update on every ioctl. Maybe allow the update to be one time only if
> that would work. Would this be desirable and acceptable? If so I can
> definitely sketch it out.

If we go with fdinfo for these it becomes clear who all owns the file,
since it's then a per-process thing. Not sure how much smarts we
should have for internal debugfs output. Maybe one-shot update on
first driver ioctl (since if you're on render nodes then X does the
drm auth dance, so "first ioctl" is wrong).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 0/7] Per client engine busyness

2021-06-28 Thread Tvrtko Ursulin




On 14/05/2021 16:10, Christian König wrote:

Am 14.05.21 um 17:03 schrieb Tvrtko Ursulin:


On 14/05/2021 15:56, Christian König wrote:

Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:


On 14/05/2021 14:53, Christian König wrote:


David also said that you considered sysfs but were wary of 
exposing process info in there. To clarify, my patch is not 
exposing sysfs entry per process, but one per open drm fd.




Yes, we discussed this as well, but then rejected the approach.

To have useful information related to the open drm fd you need to 
related that to process(es) which have that file descriptor open. 
Just tracking who opened it first like DRM does is pretty useless 
on modern systems.


We do update the pid/name for fds passed over unix sockets.


Well I just double checked and that is not correct.

Could be that i915 has some special code for that, but on my laptop I 
only see the X server under the "clients" debugfs file.


Yes we have special code in i915 for this. Part of this series we are 
discussing here.


Ah, yeah you should mention that. Could we please separate that into 
common code instead? Cause I really see that as a bug in the current 
handling independent of the discussion here.


What we do in i915 is update the pid and name when a task different to 
the one which opened the fd does a GEM context create ioctl.


Moving that to DRM core would be along the lines of doing the same check 
and update on every ioctl. Maybe allow the update to be one time only if 
that would work. Would this be desirable and acceptable? If so I can 
definitely sketch it out.


Regards,

Tvrtko


Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness

2021-05-24 Thread Tvrtko Ursulin



On 20/05/2021 09:35, Tvrtko Ursulin wrote:

On 19/05/2021 19:23, Daniel Vetter wrote:

On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
 wrote:



On 18/05/2021 10:40, Tvrtko Ursulin wrote:


On 18/05/2021 10:16, Daniel Stone wrote:

Hi,

On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
 wrote:

I was just wondering if stat(2) and a chrdev major check would be a
solid criteria to more efficiently (compared to parsing the text
content) detect drm files while walking procfs.


Maybe I'm missing something, but is the per-PID walk actually a
measurable performance issue rather than just a bit unpleasant?


Per pid and per each open fd.

As said in the other thread what bothers me a bit in this scheme is 
that

the cost of obtaining GPU usage scales based on non-GPU criteria.

For use case of a top-like tool which shows all processes this is a
smaller additional cost, but then for a gpu-top like tool it is 
somewhat

higher.


To further expand, not only cost would scale per pid multiplies per open
fd, but to detect which of the fds are DRM I see these three options:

1) Open and parse fdinfo.
2) Name based matching ie /dev/dri/.. something.
3) Stat the symlink target and check for DRM major.


stat with symlink following should be plenty fast.


Maybe. I don't think my point about keeping the dentry cache needlessly 
hot is getting through at all. On my lightly loaded desktop:


  $ sudo lsof | wc -l
  599551

  $ sudo lsof | grep "/dev/dri/" | wc -l
  1965

It's going to look up ~600k pointless dentries in every iteration. Just 
to find a handful of DRM ones. Hard to say if that is better or worse 
than just parsing fdinfo text for all files. Will see.


CPU usage looks passable under a production kernel (non-debug). Once a 
second refresh period, on a not really that loaded system (115 running 
processes, 3096 open file descriptors as reported by lsof, none of which 
are DRM), results in a system call heavy load:


real0m55.348s
user0m0.100s
sys 0m0.319s

Once per second loop is essentially along the lines of:

  for each pid in /proc/:
for each fd in /proc//fdinfo:
  if fstatat(fd) is drm major:
read fdinfo text in one sweep and parse it

I'll post the quick intel_gpu_top patch for reference but string parsing 
in C leaves a few things to be desired there.


Regards,

Tvrtko


Re: [Nouveau] [Intel-gfx] [PATCH 0/7] Per client engine busyness

2021-05-21 Thread arabek
> Well if it becomes a problem fixing the debugfs "clients" file and
> making it sysfs shouldn't be much of a problem later on.

Why not to try using something in terms of perf / opensnoop or bpf
to do the work. Should be optimal enough.

ie.
http://www.brendangregg.com/blog/2014-07-25/opensnoop-for-linux.html
https://man7.org/linux/man-pages/man2/bpf.2.html


Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness

2021-05-20 Thread Christian König




Am 20.05.21 um 16:11 schrieb Daniel Vetter:

On Wed, May 19, 2021 at 11:17:24PM +, Nieto, David M wrote:

[AMD Official Use Only]

Parsing over 550 processes for fdinfo is taking between 40-100ms single
threaded in a 2GHz skylake IBRS within a VM using simple string
comparisons and DIRent parsing. And that is pretty much the worst case
scenario with some more optimized implementations.

I think this is plenty ok, and if it's not you could probably make this
massively faster with io_uring for all the fs operations and whack a
parser-generator on top for real parsing speed.


Well if it becomes a problem fixing the debugfs "clients" file and 
making it sysfs shouldn't be much of a problem later on.


Christian.



So imo we shouldn't worry about algorithmic inefficiency of the fdinfo
approach at all, and focuse more on trying to reasonably (but not too
much, this is still drm render stuff after all) standardize how it works
and how we'll extend it all. I think there's tons of good suggestions in
this thread on this topic already.

/me out
-Daniel


David

From: Daniel Vetter 
Sent: Wednesday, May 19, 2021 11:23 AM
To: Tvrtko Ursulin 
Cc: Daniel Stone ; jhubb...@nvidia.com ; nouv...@lists.freedesktop.org 
; Intel Graphics Development ; Maling list - DRI 
developers ; Simon Ser ; Koenig, Christian 
; arit...@nvidia.com ; Nieto, David M 
Subject: Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness

On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
 wrote:


On 18/05/2021 10:40, Tvrtko Ursulin wrote:

On 18/05/2021 10:16, Daniel Stone wrote:

Hi,

On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
 wrote:

I was just wondering if stat(2) and a chrdev major check would be a
solid criteria to more efficiently (compared to parsing the text
content) detect drm files while walking procfs.

Maybe I'm missing something, but is the per-PID walk actually a
measurable performance issue rather than just a bit unpleasant?

Per pid and per each open fd.

As said in the other thread what bothers me a bit in this scheme is that
the cost of obtaining GPU usage scales based on non-GPU criteria.

For use case of a top-like tool which shows all processes this is a
smaller additional cost, but then for a gpu-top like tool it is somewhat
higher.

To further expand, not only cost would scale per pid multiplies per open
fd, but to detect which of the fds are DRM I see these three options:

1) Open and parse fdinfo.
2) Name based matching ie /dev/dri/.. something.
3) Stat the symlink target and check for DRM major.

stat with symlink following should be plenty fast.


All sound quite sub-optimal to me.

Name based matching is probably the least evil on system resource usage
(Keeping the dentry cache too hot? Too many syscalls?), even though
fundamentally I don't it is the right approach.

What happens with dup(2) is another question.

We need benchmark numbers showing that on anything remotely realistic
it's an actual problem. Until we've demonstrated it's a real problem
we don't need to solve it.

E.g. top with any sorting enabled also parses way more than it
displays on every update. It seems to be doing Just Fine (tm).


Does anyone have any feedback on the /proc//gpu idea at all?

When we know we have a problem to solve we can take a look at solutions.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll.ch%2Fdata=04%7C01%7CChristian.Koenig%40amd.com%7Ced2eccaa081d4cd336d408d91b991ee0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637571166744508313%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=ZihrnanU70nJAM6bHYCjRnURDDCIdwGI85imjGd%2FNgs%3Dreserved=0




Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness

2021-05-20 Thread Daniel Vetter
On Wed, May 19, 2021 at 11:17:24PM +, Nieto, David M wrote:
> [AMD Official Use Only]
> 
> Parsing over 550 processes for fdinfo is taking between 40-100ms single
> threaded in a 2GHz skylake IBRS within a VM using simple string
> comparisons and DIRent parsing. And that is pretty much the worst case
> scenario with some more optimized implementations.

I think this is plenty ok, and if it's not you could probably make this
massively faster with io_uring for all the fs operations and whack a
parser-generator on top for real parsing speed.

So imo we shouldn't worry about algorithmic inefficiency of the fdinfo
approach at all, and focuse more on trying to reasonably (but not too
much, this is still drm render stuff after all) standardize how it works
and how we'll extend it all. I think there's tons of good suggestions in
this thread on this topic already.

/me out
-Daniel

> 
> David
> 
> From: Daniel Vetter 
> Sent: Wednesday, May 19, 2021 11:23 AM
> To: Tvrtko Ursulin 
> Cc: Daniel Stone ; jhubb...@nvidia.com 
> ; nouv...@lists.freedesktop.org 
> ; Intel Graphics Development 
> ; Maling list - DRI developers 
> ; Simon Ser ; Koenig, 
> Christian ; arit...@nvidia.com 
> ; Nieto, David M 
> Subject: Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
> 
> On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
>  wrote:
> >
> >
> > On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> > >
> > > On 18/05/2021 10:16, Daniel Stone wrote:
> > >> Hi,
> > >>
> > >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> > >>  wrote:
> > >>> I was just wondering if stat(2) and a chrdev major check would be a
> > >>> solid criteria to more efficiently (compared to parsing the text
> > >>> content) detect drm files while walking procfs.
> > >>
> > >> Maybe I'm missing something, but is the per-PID walk actually a
> > >> measurable performance issue rather than just a bit unpleasant?
> > >
> > > Per pid and per each open fd.
> > >
> > > As said in the other thread what bothers me a bit in this scheme is that
> > > the cost of obtaining GPU usage scales based on non-GPU criteria.
> > >
> > > For use case of a top-like tool which shows all processes this is a
> > > smaller additional cost, but then for a gpu-top like tool it is somewhat
> > > higher.
> >
> > To further expand, not only cost would scale per pid multiplies per open
> > fd, but to detect which of the fds are DRM I see these three options:
> >
> > 1) Open and parse fdinfo.
> > 2) Name based matching ie /dev/dri/.. something.
> > 3) Stat the symlink target and check for DRM major.
> 
> stat with symlink following should be plenty fast.
> 
> > All sound quite sub-optimal to me.
> >
> > Name based matching is probably the least evil on system resource usage
> > (Keeping the dentry cache too hot? Too many syscalls?), even though
> > fundamentally I don't it is the right approach.
> >
> > What happens with dup(2) is another question.
> 
> We need benchmark numbers showing that on anything remotely realistic
> it's an actual problem. Until we've demonstrated it's a real problem
> we don't need to solve it.
> 
> E.g. top with any sorting enabled also parses way more than it
> displays on every update. It seems to be doing Just Fine (tm).
> 
> > Does anyone have any feedback on the /proc//gpu idea at all?
> 
> When we know we have a problem to solve we can take a look at solutions.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll.ch%2Fdata=04%7C01%7CDavid.Nieto%40amd.com%7Cf6aea97532cf41f916de08d91af32cc1%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637570453997158377%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=4CFrY9qWbJREcIcSzeO9KIn2P%2Fw6k%2BYdNlh6rdS%2BEh4%3Dreserved=0

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness

2021-05-20 Thread Tvrtko Ursulin



On 19/05/2021 19:23, Daniel Vetter wrote:

On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
 wrote:



On 18/05/2021 10:40, Tvrtko Ursulin wrote:


On 18/05/2021 10:16, Daniel Stone wrote:

Hi,

On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
 wrote:

I was just wondering if stat(2) and a chrdev major check would be a
solid criteria to more efficiently (compared to parsing the text
content) detect drm files while walking procfs.


Maybe I'm missing something, but is the per-PID walk actually a
measurable performance issue rather than just a bit unpleasant?


Per pid and per each open fd.

As said in the other thread what bothers me a bit in this scheme is that
the cost of obtaining GPU usage scales based on non-GPU criteria.

For use case of a top-like tool which shows all processes this is a
smaller additional cost, but then for a gpu-top like tool it is somewhat
higher.


To further expand, not only cost would scale per pid multiplies per open
fd, but to detect which of the fds are DRM I see these three options:

1) Open and parse fdinfo.
2) Name based matching ie /dev/dri/.. something.
3) Stat the symlink target and check for DRM major.


stat with symlink following should be plenty fast.


Maybe. I don't think my point about keeping the dentry cache needlessly 
hot is getting through at all. On my lightly loaded desktop:


 $ sudo lsof | wc -l
 599551

 $ sudo lsof | grep "/dev/dri/" | wc -l
 1965

It's going to look up ~600k pointless dentries in every iteration. Just 
to find a handful of DRM ones. Hard to say if that is better or worse 
than just parsing fdinfo text for all files. Will see.



All sound quite sub-optimal to me.

Name based matching is probably the least evil on system resource usage
(Keeping the dentry cache too hot? Too many syscalls?), even though
fundamentally I don't it is the right approach.

What happens with dup(2) is another question.


We need benchmark numbers showing that on anything remotely realistic
it's an actual problem. Until we've demonstrated it's a real problem
we don't need to solve it.


Point about dup(2) is whether it is possible to distinguish the 
duplicated fds in fdinfo. If a DRM client dupes, and we found two 
fdinfos each saying client is using 20% GPU, we don't want to add it up 
to 40%.



E.g. top with any sorting enabled also parses way more than it
displays on every update. It seems to be doing Just Fine (tm).


Ha, perceptions differ. I see it using 4-5% while building the kernel on 
a Xeon server which I find quite a lot. :)



Does anyone have any feedback on the /proc//gpu idea at all?


When we know we have a problem to solve we can take a look at solutions.


Yes I don't think the problem would be to add a better solution later, 
so happy to try the fdinfo first. I am simply pointing out a fundamental 
design inefficiency. Even if machines are getting faster and faster I 
don't think that should be an excuse to waste more and more under the 
hood, when a more efficient solution can be designed from the start.


Regards,

Tvrtko


Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness

2021-05-19 Thread Nieto, David M
[AMD Official Use Only]

Parsing over 550 processes for fdinfo is taking between 40-100ms single 
threaded in a 2GHz skylake IBRS within a VM using simple string comparisons and 
DIRent parsing. And that is pretty much the worst case scenario with some more 
optimized implementations.

David

From: Daniel Vetter 
Sent: Wednesday, May 19, 2021 11:23 AM
To: Tvrtko Ursulin 
Cc: Daniel Stone ; jhubb...@nvidia.com 
; nouv...@lists.freedesktop.org 
; Intel Graphics Development 
; Maling list - DRI developers 
; Simon Ser ; Koenig, 
Christian ; arit...@nvidia.com ; 
Nieto, David M 
Subject: Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness

On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
 wrote:
>
>
> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> >
> > On 18/05/2021 10:16, Daniel Stone wrote:
> >> Hi,
> >>
> >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> >>  wrote:
> >>> I was just wondering if stat(2) and a chrdev major check would be a
> >>> solid criteria to more efficiently (compared to parsing the text
> >>> content) detect drm files while walking procfs.
> >>
> >> Maybe I'm missing something, but is the per-PID walk actually a
> >> measurable performance issue rather than just a bit unpleasant?
> >
> > Per pid and per each open fd.
> >
> > As said in the other thread what bothers me a bit in this scheme is that
> > the cost of obtaining GPU usage scales based on non-GPU criteria.
> >
> > For use case of a top-like tool which shows all processes this is a
> > smaller additional cost, but then for a gpu-top like tool it is somewhat
> > higher.
>
> To further expand, not only cost would scale per pid multiplies per open
> fd, but to detect which of the fds are DRM I see these three options:
>
> 1) Open and parse fdinfo.
> 2) Name based matching ie /dev/dri/.. something.
> 3) Stat the symlink target and check for DRM major.

stat with symlink following should be plenty fast.

> All sound quite sub-optimal to me.
>
> Name based matching is probably the least evil on system resource usage
> (Keeping the dentry cache too hot? Too many syscalls?), even though
> fundamentally I don't it is the right approach.
>
> What happens with dup(2) is another question.

We need benchmark numbers showing that on anything remotely realistic
it's an actual problem. Until we've demonstrated it's a real problem
we don't need to solve it.

E.g. top with any sorting enabled also parses way more than it
displays on every update. It seems to be doing Just Fine (tm).

> Does anyone have any feedback on the /proc//gpu idea at all?

When we know we have a problem to solve we can take a look at solutions.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll.ch%2Fdata=04%7C01%7CDavid.Nieto%40amd.com%7Cf6aea97532cf41f916de08d91af32cc1%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637570453997158377%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=4CFrY9qWbJREcIcSzeO9KIn2P%2Fw6k%2BYdNlh6rdS%2BEh4%3Dreserved=0


Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness

2021-05-19 Thread Daniel Vetter
On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
 wrote:
>
>
> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> >
> > On 18/05/2021 10:16, Daniel Stone wrote:
> >> Hi,
> >>
> >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> >>  wrote:
> >>> I was just wondering if stat(2) and a chrdev major check would be a
> >>> solid criteria to more efficiently (compared to parsing the text
> >>> content) detect drm files while walking procfs.
> >>
> >> Maybe I'm missing something, but is the per-PID walk actually a
> >> measurable performance issue rather than just a bit unpleasant?
> >
> > Per pid and per each open fd.
> >
> > As said in the other thread what bothers me a bit in this scheme is that
> > the cost of obtaining GPU usage scales based on non-GPU criteria.
> >
> > For use case of a top-like tool which shows all processes this is a
> > smaller additional cost, but then for a gpu-top like tool it is somewhat
> > higher.
>
> To further expand, not only cost would scale per pid multiplies per open
> fd, but to detect which of the fds are DRM I see these three options:
>
> 1) Open and parse fdinfo.
> 2) Name based matching ie /dev/dri/.. something.
> 3) Stat the symlink target and check for DRM major.

stat with symlink following should be plenty fast.

> All sound quite sub-optimal to me.
>
> Name based matching is probably the least evil on system resource usage
> (Keeping the dentry cache too hot? Too many syscalls?), even though
> fundamentally I don't it is the right approach.
>
> What happens with dup(2) is another question.

We need benchmark numbers showing that on anything remotely realistic
it's an actual problem. Until we've demonstrated it's a real problem
we don't need to solve it.

E.g. top with any sorting enabled also parses way more than it
displays on every update. It seems to be doing Just Fine (tm).

> Does anyone have any feedback on the /proc//gpu idea at all?

When we know we have a problem to solve we can take a look at solutions.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 0/7] Per client engine busyness

2021-05-19 Thread Tvrtko Ursulin



On 18/05/2021 10:40, Tvrtko Ursulin wrote:


On 18/05/2021 10:16, Daniel Stone wrote:

Hi,

On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
 wrote:

I was just wondering if stat(2) and a chrdev major check would be a
solid criteria to more efficiently (compared to parsing the text
content) detect drm files while walking procfs.


Maybe I'm missing something, but is the per-PID walk actually a
measurable performance issue rather than just a bit unpleasant?


Per pid and per each open fd.

As said in the other thread what bothers me a bit in this scheme is that 
the cost of obtaining GPU usage scales based on non-GPU criteria.


For use case of a top-like tool which shows all processes this is a 
smaller additional cost, but then for a gpu-top like tool it is somewhat 
higher.


To further expand, not only cost would scale per pid multiplies per open 
fd, but to detect which of the fds are DRM I see these three options:


1) Open and parse fdinfo.
2) Name based matching ie /dev/dri/.. something.
3) Stat the symlink target and check for DRM major.

All sound quite sub-optimal to me.

Name based matching is probably the least evil on system resource usage 
(Keeping the dentry cache too hot? Too many syscalls?), even though 
fundamentally I don't it is the right approach.


What happens with dup(2) is another question.

Does anyone have any feedback on the /proc//gpu idea at all?

Regards,

Tvrtko


Re: [PATCH 0/7] Per client engine busyness

2021-05-18 Thread Christian König

Am 18.05.21 um 11:35 schrieb Tvrtko Ursulin:


On 17/05/2021 19:02, Nieto, David M wrote:

[AMD Official Use Only]


The format is simple:

:  %


Hm what time period does the percent relate to?

The i915 implementation uses accumulated nanoseconds active. That way 
who reads the file can calculate the percentage relative to the time 
period between two reads of the file.


That sounds much saner to me as well. The percentage calculation inside 
the kernel looks suspiciously misplaced.





we also have entries for the memory mapped:
mem  :  KiB


Okay so in general key values per line in text format. Colon as 
delimiter.


What common fields could be useful between different drivers and what 
common naming scheme, in order to enable as easy as possible creation 
of a generic top-like tool?


driver: 
pdev: 
ring-: N 
...
mem-: N 
...

What else?
Is ring a good common name? We actually more use engine in i915 but I 
am not really bothered about it.


I would prefer engine as well. We are currently in the process of moving 
away from kernel rings, so that notion doesn't make much sense to keep 
forward.


Christian.



Aggregated GPU usage could be easily and generically done by userspace 
by adding all rings and normalizing.


On my submission 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Farchives%2Famd-gfx%2F2021-May%2F063149.htmldata=04%7C01%7CChristian.Koenig%40amd.com%7Cbad72cde9a7248b20c7f08d919e03deb%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637569273164210285%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=TW3HaPkqyr6jwhTUVRue3fGTyRfV4KnhEuRtTTI5fMY%3Dreserved=0 
<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Farchives%2Famd-gfx%2F2021-May%2F063149.htmldata=04%7C01%7CChristian.Koenig%40amd.com%7Cbad72cde9a7248b20c7f08d919e03deb%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637569273164210285%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=TW3HaPkqyr6jwhTUVRue3fGTyRfV4KnhEuRtTTI5fMY%3Dreserved=0> I 
added a python script to print out the info. It has a CPU usage lower 
that top, for example.


To be absolutely honest, I agree that there is an overhead, but It 
might not be as much as you fear.


For me more the issue is that the extra number of operations grows 
with the number of open files on the system, which has no relation to 
the number of drm clients.


Extra so if the monitoring tool wants to show _only_ DRM processes. 
Then the cost scales with total number of processes time total number 
of files on the server.


This design inefficiency bothers me yes. This is somewhat alleviated 
by the proposal from Chris 
(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fpatch%2F419042%2F%3Fseries%3D86692%26rev%3D1data=04%7C01%7CChristian.Koenig%40amd.com%7Cbad72cde9a7248b20c7f08d919e03deb%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637569273164210285%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=jNfe8h2BalOOc1Y0Idcs3wxnNOi74XhulkRlebmpgJM%3Dreserved=0) 
although there are downsides there as well. Like needing to keep a map 
of pids to drm files in drivers.


Btw what do you do in that tool for same fd in a multi-threaded process
or so? Do you show duplicate entries or detect and ignore? I guess I 
did not figure out if you show by pid/tgid or by fd.


Regards,

Tvrtko



*From:* Tvrtko Ursulin 
*Sent:* Monday, May 17, 2021 9:00 AM
*To:* Nieto, David M ; Daniel Vetter 
; Koenig, Christian 
*Cc:* Alex Deucher ; Intel Graphics 
Development ; Maling list - DRI 
developers 

*Subject:* Re: [PATCH 0/7] Per client engine busyness

On 17/05/2021 15:39, Nieto, David M wrote:

[AMD Official Use Only]


Maybe we could try to standardize how the different submission ring 
   usage gets exposed in the fdinfo? We went the simple way of just 
adding name and index, but if someone has a suggestion on how else 
we could format them so there is commonality across vendors we could 
just amend those.


Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also
like people to look at the procfs proposal from Chris,
   - link to which I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an
extra cost in DRM client discovery (compared to my sysfs series and also
procfs RFC from Chris). It would require reading all processes (well
threads, then maybe aggregating threads into parent processes), all fd
symlinks, and doing a stat on them to figure out which ones are DRM 
devices.


Btw is DRM_MAJOR 226 consider uapi? I don't see it in uapi headers.

I’d really like to have the process managers tools display GPU usage 
regardless of what 

Re: [PATCH 0/7] Per client engine busyness

2021-05-18 Thread Tvrtko Ursulin



On 18/05/2021 10:16, Daniel Stone wrote:

Hi,

On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
 wrote:

I was just wondering if stat(2) and a chrdev major check would be a
solid criteria to more efficiently (compared to parsing the text
content) detect drm files while walking procfs.


Maybe I'm missing something, but is the per-PID walk actually a
measurable performance issue rather than just a bit unpleasant?


Per pid and per each open fd.

As said in the other thread what bothers me a bit in this scheme is that 
the cost of obtaining GPU usage scales based on non-GPU criteria.


For use case of a top-like tool which shows all processes this is a 
smaller additional cost, but then for a gpu-top like tool it is somewhat 
higher.


Regards,

Tvrtko


Re: [PATCH 0/7] Per client engine busyness

2021-05-18 Thread Tvrtko Ursulin



On 17/05/2021 19:02, Nieto, David M wrote:

[AMD Official Use Only]


The format is simple:

:  %


Hm what time period does the percent relate to?

The i915 implementation uses accumulated nanoseconds active. That way 
who reads the file can calculate the percentage relative to the time 
period between two reads of the file.



we also have entries for the memory mapped:
mem  :  KiB


Okay so in general key values per line in text format. Colon as delimiter.

What common fields could be useful between different drivers and what 
common naming scheme, in order to enable as easy as possible creation of 
a generic top-like tool?


driver: 
pdev: 
ring-: N 
...
mem-: N 
...

What else?
Is ring a good common name? We actually more use engine in i915 but I am 
not really bothered about it.


Aggregated GPU usage could be easily and generically done by userspace 
by adding all rings and normalizing.


On my submission 
https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html 
<https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html> I 
added a python script to print out the info. It has a CPU usage lower 
that top, for example.


To be absolutely honest, I agree that there is an overhead, but It might 
not be as much as you fear.


For me more the issue is that the extra number of operations grows with 
the number of open files on the system, which has no relation to the 
number of drm clients.


Extra so if the monitoring tool wants to show _only_ DRM processes. Then 
the cost scales with total number of processes time total number of 
files on the server.


This design inefficiency bothers me yes. This is somewhat alleviated by 
the proposal from Chris 
(https://patchwork.freedesktop.org/patch/419042/?series=86692=1) 
although there are downsides there as well. Like needing to keep a map 
of pids to drm files in drivers.


Btw what do you do in that tool for same fd in a multi-threaded process
or so? Do you show duplicate entries or detect and ignore? I guess I did 
not figure out if you show by pid/tgid or by fd.


Regards,

Tvrtko



*From:* Tvrtko Ursulin 
*Sent:* Monday, May 17, 2021 9:00 AM
*To:* Nieto, David M ; Daniel Vetter 
; Koenig, Christian 
*Cc:* Alex Deucher ; Intel Graphics Development 
; Maling list - DRI developers 


*Subject:* Re: [PATCH 0/7] Per client engine busyness

On 17/05/2021 15:39, Nieto, David M wrote:

[AMD Official Use Only]


Maybe we could try to standardize how the different submission ring 
   usage gets exposed in the fdinfo? We went the simple way of just 
adding name and index, but if someone has a suggestion on how else we 
could format them so there is commonality across vendors we could just 
amend those.


Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also
like people to look at the procfs proposal from Chris,
   - link to which I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an
extra cost in DRM client discovery (compared to my sysfs series and also
procfs RFC from Chris). It would require reading all processes (well
threads, then maybe aggregating threads into parent processes), all fd
symlinks, and doing a stat on them to figure out which ones are DRM devices.

Btw is DRM_MAJOR 226 consider uapi? I don't see it in uapi headers.

I’d really like to have the process managers tools display GPU usage 
regardless of what vendor is installed.


Definitely.

Regards,

Tvrtko


Re: [PATCH 0/7] Per client engine busyness

2021-05-18 Thread Daniel Stone
Hi,

On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
 wrote:
> I was just wondering if stat(2) and a chrdev major check would be a
> solid criteria to more efficiently (compared to parsing the text
> content) detect drm files while walking procfs.

Maybe I'm missing something, but is the per-PID walk actually a
measurable performance issue rather than just a bit unpleasant?

Cheers,
Daniel


Re: [PATCH 0/7] Per client engine busyness

2021-05-18 Thread Tvrtko Ursulin



On 17/05/2021 20:03, Simon Ser wrote:

On Monday, May 17th, 2021 at 8:16 PM, Nieto, David M  
wrote:


Btw is DRM_MAJOR 226 consider uapi? I don't see it in uapi headers.


It's not in the headers, but it's de facto uAPI, as seen in libdrm:

 > git grep 226
 xf86drm.c
 99:#define DRM_MAJOR 226 /* Linux */


I suspected it would be yes, thanks.

I was just wondering if stat(2) and a chrdev major check would be a 
solid criteria to more efficiently (compared to parsing the text 
content) detect drm files while walking procfs.


Regards,

Tvrtko


Re: [PATCH 0/7] Per client engine busyness

2021-05-17 Thread Christian König

Am 17.05.21 um 16:30 schrieb Daniel Vetter:

[SNIP]

Could be that i915 has some special code for that, but on my laptop
I only see the X server under the "clients" debugfs file.

Yes we have special code in i915 for this. Part of this series we are
discussing here.

Ah, yeah you should mention that. Could we please separate that into common
code instead? Cause I really see that as a bug in the current handling
independent of the discussion here.

As far as I know all IOCTLs go though some common place in DRM anyway.

Yeah, might be good to fix that confusion in debugfs. But since that's
non-uapi, I guess no one ever cared (enough).


Well we cared, problem is that we didn't know how to fix it properly and 
pretty much duplicated it in the VM code :)



For the use case of knowing which DRM file is using how much GPU time on
engine X we do not need to walk all open files either with my sysfs
approach or the proc approach from Chris. (In the former case we
optionally aggregate by PID at presentation time, and in the latter case
aggregation is implicit.)

I'm unsure if we should go with the sysfs, proc or some completely different
approach.

In general it would be nice to have a way to find all the fd references for
an open inode.

Yeah, but that maybe needs to be an ioctl or syscall or something on the
inode, that givey you a list of (procfd, fd_nr) pairs pointing back at all
open files? If this really is a real world problem, but given that
top/lsof and everyone else hasn't asked for it yet maybe it's not.


Well has anybody already measured how much overhead it would be to 
iterate over the relevant data structures in the kernel instead of 
userspace?


I mean we don't really need the tracking when a couple of hundred fd 
tables can be processed in just a few ms because of lockless RCU protection.



Also I replied in some other thread, I really like the fdinfo stuff, and I
think trying to somewhat standardized this across drivers would be neat.
Especially since i915 is going to adopt drm/scheduler for front-end
scheduling too, so at least some of this should be fairly easy to share.


Yeah, that sounds like a good idea to me as well.

Regards,
Christian.



Cheers, Daniel




Re: [PATCH 0/7] Per client engine busyness

2021-05-17 Thread Simon Ser
On Monday, May 17th, 2021 at 8:16 PM, Nieto, David M  
wrote:

> Btw is DRM_MAJOR 226 consider uapi? I don't see it in uapi headers.

It's not in the headers, but it's de facto uAPI, as seen in libdrm:

> git grep 226
xf86drm.c
99:#define DRM_MAJOR 226 /* Linux */


Re: [PATCH 0/7] Per client engine busyness

2021-05-17 Thread Nieto, David M
[Public]

Cycling some of the Nvidia/nouveau guys here too.

I think there is a benefit on trying to estandarize how fdinfo can be used to 
expose per engine and device memory utilization.

Another of the advantages of going the /proc/ way instead of the sysfs debugfs 
approach is that you inherit the access lists directly from the distribution 
and you don't need to start messing with ownership and group access. By default 
an user can monitor its own processes as long as /proc is mounted.

I am not saying that fdinfo or the way we implemented is 100% the way to go, 
but I'd rather have a solution within the confines of proc first.

David




From: Nieto, David M 
Sent: Monday, May 17, 2021 11:02 AM
To: Tvrtko Ursulin ; Daniel Vetter 
; Koenig, Christian 
Cc: Alex Deucher ; Intel Graphics Development 
; Maling list - DRI developers 

Subject: Re: [PATCH 0/7] Per client engine busyness

The format is simple:

:  %

we also have entries for the memory mapped:
mem  :  KiB

On my submission 
https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html I added a 
python script to print out the info. It has a CPU usage lower that top, for 
example.

To be absolutely honest, I agree that there is an overhead, but It might not be 
as much as you fear.

From: Tvrtko Ursulin 
Sent: Monday, May 17, 2021 9:00 AM
To: Nieto, David M ; Daniel Vetter ; 
Koenig, Christian 
Cc: Alex Deucher ; Intel Graphics Development 
; Maling list - DRI developers 

Subject: Re: [PATCH 0/7] Per client engine busyness


On 17/05/2021 15:39, Nieto, David M wrote:
> [AMD Official Use Only]
>
>
> Maybe we could try to standardize how the different submission ring
>   usage gets exposed in the fdinfo? We went the simple way of just
> adding name and index, but if someone has a suggestion on how else we
> could format them so there is commonality across vendors we could just
> amend those.

Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also
like people to look at the procfs proposal from Chris,
  - link to which I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an
extra cost in DRM client discovery (compared to my sysfs series and also
procfs RFC from Chris). It would require reading all processes (well
threads, then maybe aggregating threads into parent processes), all fd
symlinks, and doing a stat on them to figure out which ones are DRM devices.

Btw is DRM_MAJOR 226 consider uapi? I don't see it in uapi headers.

> I’d really like to have the process managers tools display GPU usage
> regardless of what vendor is installed.

Definitely.

Regards,

Tvrtko


Re: [PATCH 0/7] Per client engine busyness

2021-05-17 Thread Nieto, David M
[AMD Official Use Only]

The format is simple:

:  %

we also have entries for the memory mapped:
mem  :  KiB

On my submission 
https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html I added a 
python script to print out the info. It has a CPU usage lower that top, for 
example.

To be absolutely honest, I agree that there is an overhead, but It might not be 
as much as you fear.

From: Tvrtko Ursulin 
Sent: Monday, May 17, 2021 9:00 AM
To: Nieto, David M ; Daniel Vetter ; 
Koenig, Christian 
Cc: Alex Deucher ; Intel Graphics Development 
; Maling list - DRI developers 

Subject: Re: [PATCH 0/7] Per client engine busyness


On 17/05/2021 15:39, Nieto, David M wrote:
> [AMD Official Use Only]
>
>
> Maybe we could try to standardize how the different submission ring
>   usage gets exposed in the fdinfo? We went the simple way of just
> adding name and index, but if someone has a suggestion on how else we
> could format them so there is commonality across vendors we could just
> amend those.

Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also
like people to look at the procfs proposal from Chris,
  - link to which I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an
extra cost in DRM client discovery (compared to my sysfs series and also
procfs RFC from Chris). It would require reading all processes (well
threads, then maybe aggregating threads into parent processes), all fd
symlinks, and doing a stat on them to figure out which ones are DRM devices.

Btw is DRM_MAJOR 226 consider uapi? I don't see it in uapi headers.

> I’d really like to have the process managers tools display GPU usage
> regardless of what vendor is installed.

Definitely.

Regards,

Tvrtko


Re: [PATCH 0/7] Per client engine busyness

2021-05-17 Thread Tvrtko Ursulin



On 17/05/2021 15:39, Nieto, David M wrote:

[AMD Official Use Only]


Maybe we could try to standardize how the different submission ring 
  usage gets exposed in the fdinfo? We went the simple way of just 
adding name and index, but if someone has a suggestion on how else we 
could format them so there is commonality across vendors we could just 
amend those.


Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also 
like people to look at the procfs proposal from Chris,

 - link to which I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an 
extra cost in DRM client discovery (compared to my sysfs series and also 
procfs RFC from Chris). It would require reading all processes (well 
threads, then maybe aggregating threads into parent processes), all fd 
symlinks, and doing a stat on them to figure out which ones are DRM devices.


Btw is DRM_MAJOR 226 consider uapi? I don't see it in uapi headers.

I’d really like to have the process managers tools display GPU usage 
regardless of what vendor is installed.


Definitely.

Regards,

Tvrtko


Re: [PATCH 0/7] Per client engine busyness

2021-05-17 Thread Nieto, David M
[AMD Official Use Only]

Maybe we could try to standardize how the different submission ring  usage gets 
exposed in the fdinfo? We went the simple way of just adding name and index, 
but if someone has a suggestion on how else we could format them so there is 
commonality across vendors we could just amend those.

I’d really like to have the process managers tools display GPU usage regardless 
of what vendor is installed.


From: Daniel Vetter 
Sent: Monday, May 17, 2021 7:30:47 AM
To: Koenig, Christian 
Cc: Tvrtko Ursulin ; Nieto, David M 
; Alex Deucher ; Intel Graphics 
Development ; Maling list - DRI developers 
; Daniel Vetter 
Subject: Re: [PATCH 0/7] Per client engine busyness

On Fri, May 14, 2021 at 05:10:29PM +0200, Christian König wrote:
> Am 14.05.21 um 17:03 schrieb Tvrtko Ursulin:
> >
> > On 14/05/2021 15:56, Christian König wrote:
> > > Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:
> > > >
> > > > On 14/05/2021 14:53, Christian König wrote:
> > > > > >
> > > > > > David also said that you considered sysfs but were wary
> > > > > > of exposing process info in there. To clarify, my patch
> > > > > > is not exposing sysfs entry per process, but one per
> > > > > > open drm fd.
> > > > > >
> > > > >
> > > > > Yes, we discussed this as well, but then rejected the approach.
> > > > >
> > > > > To have useful information related to the open drm fd you
> > > > > need to related that to process(es) which have that file
> > > > > descriptor open. Just tracking who opened it first like DRM
> > > > > does is pretty useless on modern systems.
> > > >
> > > > We do update the pid/name for fds passed over unix sockets.
> > >
> > > Well I just double checked and that is not correct.
> > >
> > > Could be that i915 has some special code for that, but on my laptop
> > > I only see the X server under the "clients" debugfs file.
> >
> > Yes we have special code in i915 for this. Part of this series we are
> > discussing here.
>
> Ah, yeah you should mention that. Could we please separate that into common
> code instead? Cause I really see that as a bug in the current handling
> independent of the discussion here.
>
> As far as I know all IOCTLs go though some common place in DRM anyway.

Yeah, might be good to fix that confusion in debugfs. But since that's
non-uapi, I guess no one ever cared (enough).

> > > > > But an "lsof /dev/dri/renderD128" for example does exactly
> > > > > what top does as well, it iterates over /proc and sees which
> > > > > process has that file open.
> > > >
> > > > Lsof is quite inefficient for this use case. It has to open
> > > > _all_ open files for _all_ processes on the system to find a
> > > > handful of ones which may have the DRM device open.
> > >
> > > Completely agree.
> > >
> > > The key point is you either need to have all references to an open
> > > fd, or at least track whoever last used that fd.
> > >
> > > At least the last time I looked even the fs layer didn't know which
> > > fd is open by which process. So there wasn't really any alternative
> > > to the lsof approach.
> >
> > I asked you about the use case you have in mind which you did not
> > answer. Otherwise I don't understand when do you need to walk all files.
> > What information you want to get?
>
> Per fd debugging information, e.g. instead of the top use case you know
> which process you want to look at.
>
> >
> > For the use case of knowing which DRM file is using how much GPU time on
> > engine X we do not need to walk all open files either with my sysfs
> > approach or the proc approach from Chris. (In the former case we
> > optionally aggregate by PID at presentation time, and in the latter case
> > aggregation is implicit.)
>
> I'm unsure if we should go with the sysfs, proc or some completely different
> approach.
>
> In general it would be nice to have a way to find all the fd references for
> an open inode.

Yeah, but that maybe needs to be an ioctl or syscall or something on the
inode, that givey you a list of (procfd, fd_nr) pairs pointing back at all
open files? If this really is a real world problem, but given that
top/lsof and everyone else hasn't asked for it yet maybe it's not.

Also I replied in some other thread, I really like the fdinfo stuff, and I
think trying to somewhat standardized this across drivers would be neat.
Especially since i915 is going to adopt drm/scheduler for front-end
scheduling too, so at least some of this should be fairly easy to share.

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll.ch%2Fdata=04%7C01%7CDavid.Nieto%40amd.com%7C3711fdd207484d6bb5fd08d919405eb0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637568586536251118%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=so13elRn0Ffl7w51QEk%2F%2FXmxOav9n5p6fNXrnDBVY%2B0%3Dreserved=0


Re: [PATCH 0/7] Per client engine busyness

2021-05-17 Thread Daniel Vetter
On Fri, May 14, 2021 at 05:10:29PM +0200, Christian König wrote:
> Am 14.05.21 um 17:03 schrieb Tvrtko Ursulin:
> > 
> > On 14/05/2021 15:56, Christian König wrote:
> > > Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:
> > > > 
> > > > On 14/05/2021 14:53, Christian König wrote:
> > > > > > 
> > > > > > David also said that you considered sysfs but were wary
> > > > > > of exposing process info in there. To clarify, my patch
> > > > > > is not exposing sysfs entry per process, but one per
> > > > > > open drm fd.
> > > > > > 
> > > > > 
> > > > > Yes, we discussed this as well, but then rejected the approach.
> > > > > 
> > > > > To have useful information related to the open drm fd you
> > > > > need to related that to process(es) which have that file
> > > > > descriptor open. Just tracking who opened it first like DRM
> > > > > does is pretty useless on modern systems.
> > > > 
> > > > We do update the pid/name for fds passed over unix sockets.
> > > 
> > > Well I just double checked and that is not correct.
> > > 
> > > Could be that i915 has some special code for that, but on my laptop
> > > I only see the X server under the "clients" debugfs file.
> > 
> > Yes we have special code in i915 for this. Part of this series we are
> > discussing here.
> 
> Ah, yeah you should mention that. Could we please separate that into common
> code instead? Cause I really see that as a bug in the current handling
> independent of the discussion here.
> 
> As far as I know all IOCTLs go though some common place in DRM anyway.

Yeah, might be good to fix that confusion in debugfs. But since that's
non-uapi, I guess no one ever cared (enough).

> > > > > But an "lsof /dev/dri/renderD128" for example does exactly
> > > > > what top does as well, it iterates over /proc and sees which
> > > > > process has that file open.
> > > > 
> > > > Lsof is quite inefficient for this use case. It has to open
> > > > _all_ open files for _all_ processes on the system to find a
> > > > handful of ones which may have the DRM device open.
> > > 
> > > Completely agree.
> > > 
> > > The key point is you either need to have all references to an open
> > > fd, or at least track whoever last used that fd.
> > > 
> > > At least the last time I looked even the fs layer didn't know which
> > > fd is open by which process. So there wasn't really any alternative
> > > to the lsof approach.
> > 
> > I asked you about the use case you have in mind which you did not
> > answer. Otherwise I don't understand when do you need to walk all files.
> > What information you want to get?
> 
> Per fd debugging information, e.g. instead of the top use case you know
> which process you want to look at.
> 
> > 
> > For the use case of knowing which DRM file is using how much GPU time on
> > engine X we do not need to walk all open files either with my sysfs
> > approach or the proc approach from Chris. (In the former case we
> > optionally aggregate by PID at presentation time, and in the latter case
> > aggregation is implicit.)
> 
> I'm unsure if we should go with the sysfs, proc or some completely different
> approach.
> 
> In general it would be nice to have a way to find all the fd references for
> an open inode.

Yeah, but that maybe needs to be an ioctl or syscall or something on the
inode, that givey you a list of (procfd, fd_nr) pairs pointing back at all
open files? If this really is a real world problem, but given that
top/lsof and everyone else hasn't asked for it yet maybe it's not.

Also I replied in some other thread, I really like the fdinfo stuff, and I
think trying to somewhat standardized this across drivers would be neat.
Especially since i915 is going to adopt drm/scheduler for front-end
scheduling too, so at least some of this should be fairly easy to share.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 0/7] Per client engine busyness

2021-05-17 Thread Daniel Vetter
On Thu, May 13, 2021 at 11:48:08AM -0400, Alex Deucher wrote:
> On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
>  wrote:
> >
> > From: Tvrtko Ursulin 
> >
> > Resurrect of the previosuly merged per client engine busyness patches. In a
> > nutshell it enables intel_gpu_top to be more top(1) like useful and show not
> > only physical GPU engine usage but per process view as well.
> >
> > Example screen capture:
> > 
> > intel-gpu-top -  906/ 955 MHz;0% RC6;  5.30 Watts;  933 irqs/s
> >
> >   IMC reads: 4414 MiB/s
> >  IMC writes: 3805 MiB/s
> >
> >   ENGINE  BUSY  MI_SEMA 
> > MI_WAIT
> >  Render/3D/0   93.46% |▋  |  0% 
> >  0%
> >Blitter/00.00% |   |  0% 
> >  0%
> >  Video/00.00% |   |  0% 
> >  0%
> >   VideoEnhance/00.00% |   |  0% 
> >  0%
> >
> >   PIDNAME  Render/3D  BlitterVideo  VideoEnhance
> >  2733   neverball |██▌ ||||||   
> >  |
> >  2047Xorg |███▊||||||   
> >  |
> >  2737glxgears |█▍  ||||||   
> >  |
> >  2128   xfwm4 |||||||   
> >  |
> >  2047Xorg |||||||   
> >  |
> > 
> >
> > Internally we track time spent on engines for each struct intel_context, 
> > both
> > for current and past contexts belonging to each open DRM file.
> >
> > This can serve as a building block for several features from the wanted 
> > list:
> > smarter scheduler decisions, getrusage(2)-like per-GEM-context functionality
> > wanted by some customers, setrlimit(2) like controls, cgroups controller,
> > dynamic SSEU tuning, ...
> >
> > To enable userspace access to the tracked data, we expose time spent on GPU 
> > per
> > client and per engine class in sysfs with a hierarchy like the below:
> >
> > # cd /sys/class/drm/card0/clients/
> > # tree
> > .
> > ├── 7
> > │   ├── busy
> > │   │   ├── 0
> > │   │   ├── 1
> > │   │   ├── 2
> > │   │   └── 3
> > │   ├── name
> > │   └── pid
> > ├── 8
> > │   ├── busy
> > │   │   ├── 0
> > │   │   ├── 1
> > │   │   ├── 2
> > │   │   └── 3
> > │   ├── name
> > │   └── pid
> > └── 9
> > ├── busy
> > │   ├── 0
> > │   ├── 1
> > │   ├── 2
> > │   └── 3
> > ├── name
> > └── pid
> >
> > Files in 'busy' directories are numbered using the engine class ABI values 
> > and
> > they contain accumulated nanoseconds each client spent on engines of a
> > respective class.
> 
> We did something similar in amdgpu using the gpu scheduler.  We then
> expose the data via fdinfo.  See
> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704

Yeah the reason I've dropped these patches was because they looked like
prime material for at least a bit of standardization across drivers.

Also fdinfo sounds like very good interface for these, I didn't even know
that's doable. Might also be interesting to even standardize the fdinfo
stuff across drivers.

Also since drm/i915 will adopt drm/scheduler, we could build that on top
of that code too. So no restrictions there from i915 side.

Anyway discussion kicked off, I'll let yout figure out what we'll do here.
-Daniel

> 
> Alex
> 
> 
> >
> > Tvrtko Ursulin (7):
> >   drm/i915: Expose list of clients in sysfs
> >   drm/i915: Update client name on context create
> >   drm/i915: Make GEM contexts track DRM clients
> >   drm/i915: Track runtime spent in closed and unreachable GEM contexts
> >   drm/i915: Track all user contexts per client
> >   drm/i915: Track context current active time
> >   drm/i915: Expose per-engine client busyness
> >
> >  drivers/gpu/drm/i915/Makefile |   5 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
> >  .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
> >  drivers/gpu/drm/i915/gt/intel_context.c   |  27 +-
> >  drivers/gpu/drm/i915/gt/intel_context.h   |  15 +-
> >  drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
> >  .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
> >  .../gpu/drm/i915/gt/intel_gt_clock_utils.c|   4 +
> >  drivers/gpu/drm/i915/gt/intel_lrc.c   |  27 +-
> >  

Re: [PATCH 0/7] Per client engine busyness

2021-05-14 Thread Christian König

Am 14.05.21 um 17:03 schrieb Tvrtko Ursulin:


On 14/05/2021 15:56, Christian König wrote:

Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:


On 14/05/2021 14:53, Christian König wrote:


David also said that you considered sysfs but were wary of 
exposing process info in there. To clarify, my patch is not 
exposing sysfs entry per process, but one per open drm fd.




Yes, we discussed this as well, but then rejected the approach.

To have useful information related to the open drm fd you need to 
related that to process(es) which have that file descriptor open. 
Just tracking who opened it first like DRM does is pretty useless 
on modern systems.


We do update the pid/name for fds passed over unix sockets.


Well I just double checked and that is not correct.

Could be that i915 has some special code for that, but on my laptop I 
only see the X server under the "clients" debugfs file.


Yes we have special code in i915 for this. Part of this series we are 
discussing here.


Ah, yeah you should mention that. Could we please separate that into 
common code instead? Cause I really see that as a bug in the current 
handling independent of the discussion here.


As far as I know all IOCTLs go though some common place in DRM anyway.

But an "lsof /dev/dri/renderD128" for example does exactly what top 
does as well, it iterates over /proc and sees which process has 
that file open.


Lsof is quite inefficient for this use case. It has to open _all_ 
open files for _all_ processes on the system to find a handful of 
ones which may have the DRM device open.


Completely agree.

The key point is you either need to have all references to an open 
fd, or at least track whoever last used that fd.


At least the last time I looked even the fs layer didn't know which 
fd is open by which process. So there wasn't really any alternative 
to the lsof approach.


I asked you about the use case you have in mind which you did not 
answer. Otherwise I don't understand when do you need to walk all 
files. What information you want to get?


Per fd debugging information, e.g. instead of the top use case you know 
which process you want to look at.




For the use case of knowing which DRM file is using how much GPU time 
on engine X we do not need to walk all open files either with my sysfs 
approach or the proc approach from Chris. (In the former case we 
optionally aggregate by PID at presentation time, and in the latter 
case aggregation is implicit.)


I'm unsure if we should go with the sysfs, proc or some completely 
different approach.


In general it would be nice to have a way to find all the fd references 
for an open inode.


Regards,
Christian.



Regards,

Tvrtko




Re: [PATCH 0/7] Per client engine busyness

2021-05-14 Thread Tvrtko Ursulin



On 14/05/2021 15:56, Christian König wrote:

Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:


On 14/05/2021 14:53, Christian König wrote:


David also said that you considered sysfs but were wary of exposing 
process info in there. To clarify, my patch is not exposing sysfs 
entry per process, but one per open drm fd.




Yes, we discussed this as well, but then rejected the approach.

To have useful information related to the open drm fd you need to 
related that to process(es) which have that file descriptor open. 
Just tracking who opened it first like DRM does is pretty useless on 
modern systems.


We do update the pid/name for fds passed over unix sockets.


Well I just double checked and that is not correct.

Could be that i915 has some special code for that, but on my laptop I 
only see the X server under the "clients" debugfs file.


Yes we have special code in i915 for this. Part of this series we are 
discussing here.


But an "lsof /dev/dri/renderD128" for example does exactly what top 
does as well, it iterates over /proc and sees which process has that 
file open.


Lsof is quite inefficient for this use case. It has to open _all_ open 
files for _all_ processes on the system to find a handful of ones 
which may have the DRM device open.


Completely agree.

The key point is you either need to have all references to an open fd, 
or at least track whoever last used that fd.


At least the last time I looked even the fs layer didn't know which fd 
is open by which process. So there wasn't really any alternative to the 
lsof approach.


I asked you about the use case you have in mind which you did not 
answer. Otherwise I don't understand when do you need to walk all files. 
What information you want to get?


For the use case of knowing which DRM file is using how much GPU time on 
engine X we do not need to walk all open files either with my sysfs 
approach or the proc approach from Chris. (In the former case we 
optionally aggregate by PID at presentation time, and in the latter case 
aggregation is implicit.)


Regards,

Tvrtko


Re: [PATCH 0/7] Per client engine busyness

2021-05-14 Thread Christian König

Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:


On 14/05/2021 14:53, Christian König wrote:


David also said that you considered sysfs but were wary of exposing 
process info in there. To clarify, my patch is not exposing sysfs 
entry per process, but one per open drm fd.




Yes, we discussed this as well, but then rejected the approach.

To have useful information related to the open drm fd you need to 
related that to process(es) which have that file descriptor open. 
Just tracking who opened it first like DRM does is pretty useless on 
modern systems.


We do update the pid/name for fds passed over unix sockets.


Well I just double checked and that is not correct.

Could be that i915 has some special code for that, but on my laptop I 
only see the X server under the "clients" debugfs file.


But an "lsof /dev/dri/renderD128" for example does exactly what top 
does as well, it iterates over /proc and sees which process has that 
file open.


Lsof is quite inefficient for this use case. It has to open _all_ open 
files for _all_ processes on the system to find a handful of ones 
which may have the DRM device open.


Completely agree.

The key point is you either need to have all references to an open fd, 
or at least track whoever last used that fd.


At least the last time I looked even the fs layer didn't know which fd 
is open by which process. So there wasn't really any alternative to the 
lsof approach.


Regards,
Christian.



So even with sysfs aid for discovery you are back to just going over 
all files again.


For what use case?

To enable GPU usage in top we can do much better than iterate over all 
open files in the system. We can start with a process if going with 
the /proc proposal, or with the opened DRM file directly with the 
sysfs proposal. Both are significantly fewer than total number of open 
files across all processes.


Regards,

Tvrtko




Re: [PATCH 0/7] Per client engine busyness

2021-05-14 Thread Tvrtko Ursulin



On 14/05/2021 14:53, Christian König wrote:


David also said that you considered sysfs but were wary of exposing 
process info in there. To clarify, my patch is not exposing sysfs 
entry per process, but one per open drm fd.




Yes, we discussed this as well, but then rejected the approach.

To have useful information related to the open drm fd you need to 
related that to process(es) which have that file descriptor open. Just 
tracking who opened it first like DRM does is pretty useless on modern 
systems.


We do update the pid/name for fds passed over unix sockets.

But an "lsof /dev/dri/renderD128" for example does exactly what top does 
as well, it iterates over /proc and sees which process has that file open.


Lsof is quite inefficient for this use case. It has to open _all_ open 
files for _all_ processes on the system to find a handful of ones which 
may have the DRM device open.


So even with sysfs aid for discovery you are back to just going over all 
files again.


For what use case?

To enable GPU usage in top we can do much better than iterate over all 
open files in the system. We can start with a process if going with the 
/proc proposal, or with the opened DRM file directly with the sysfs 
proposal. Both are significantly fewer than total number of open files 
across all processes.


Regards,

Tvrtko


Re: [PATCH 0/7] Per client engine busyness

2021-05-14 Thread Christian König


David also said that you considered sysfs but were wary of exposing 
process info in there. To clarify, my patch is not exposing sysfs 
entry per process, but one per open drm fd.




Yes, we discussed this as well, but then rejected the approach.

To have useful information related to the open drm fd you need to 
related that to process(es) which have that file descriptor open. Just 
tracking who opened it first like DRM does is pretty useless on modern 
systems.


But an "lsof /dev/dri/renderD128" for example does exactly what top does 
as well, it iterates over /proc and sees which process has that file open.


So even with sysfs aid for discovery you are back to just going over all 
files again.


Regards,
Christian.

Am 14.05.21 um 15:42 schrieb Tvrtko Ursulin:


On 14/05/2021 09:04, Christian König wrote:
Well in my opinion exposing it through fdinfo turned out to be a 
really clean approach.


It describes exactly the per file descriptor information we need.


Yeah fdinfo certainly is mostly simple and neat.

I say mostly because main problem I see with it is discoverability. 
Alex commented in another sub-thread - "If you know the process, you can
look it up in procfs." - so that's fine for introspection but a bit 
challenging for a top(1) like tool.


David also said that you considered sysfs but were wary of exposing 
process info in there. To clarify, my patch is not exposing sysfs 
entry per process, but one per open drm fd.


Top level hierarchy is under /sys/class/drm/card0/clients/ and each 
opened drm fd gets a directory in there. Process data I expose there 
are the name and pid, but these are for convenience, not as a primary 
information.


But yes, I agree this part of the approach is definitely questionable. 
(As a side note, I am not sure if I could put a symlink to proc in 
there. I think sysfs and symlinks did not really work.)


Another data point is that this "client root" we think would be useful 
for adding other stuff in the future. For instance per client debug 
log stream is occasionally talked about.



Making that device driver independent is potentially useful as well.


Alternative to my sysfs approach, the idea of exposing this in proc 
was floated by Chris in this series 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fseries%2F86692%2Fdata=04%7C01%7Cchristian.koenig%40amd.com%7C69e215ff2b434ba36ecc08d916de1754%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637565965394961215%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=JsARknf2q%2FwtGfgLM6ZUOSkaivV%2B6yakpqYh%2B6yQlEc%3Dreserved=0.


That would be generic enough so any GPU vendor can slot in, and common 
enough that GPU agnostic tools should be able to use it. Modulo some 
discussion around naming the "channels" (GPU engines) or not.


It wouldn't be able to support things like the before mentioned per 
client debug log stream but I guess that's not the most important 
thing. Most important would be allowing GPU usage to be wired to 
top(1) like tools which is probably even overdue given the modern 
computing landscape.


Would you guys be interested to give a more detailed look over both 
proposals and see if any would interest you?


Regards,

Tvrtko


Regards,
Christian.

Am 14.05.21 um 09:22 schrieb Nieto, David M:


[AMD Official Use Only - Internal Distribution Only]


We had entertained the idea of exposing the processes as sysfs nodes 
as you proposed, but we had concerns about exposing process info in 
there, especially since /proc already exists for that purpose.


I think if you were to follow that approach, we could have tools 
like top that support exposing GPU engine usage.
 


*From:* Alex Deucher 
*Sent:* Thursday, May 13, 2021 10:58 PM
*To:* Tvrtko Ursulin ; Nieto, David 
M ; Koenig, Christian 
*Cc:* Intel Graphics Development ; 
Maling list - DRI developers ; 
Daniel Vetter 

*Subject:* Re: [PATCH 0/7] Per client engine busyness
+ David, Christian

On Thu, May 13, 2021 at 12:41 PM Tvrtko Ursulin
 wrote:
>
>
> Hi,
>
> On 13/05/2021 16:48, Alex Deucher wrote:
> > On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
> >  wrote:
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> Resurrect of the previosuly merged per client engine busyness 
patches. In a
> >> nutshell it enables intel_gpu_top to be more top(1) like useful 
and show not

> >> only physical GPU engine usage but per process view as well.
> >>
> >> Example screen capture:
> >> 

> >> intel-gpu-top -  906/ 955 MHz;    0% RC6; 5.30 Watts;  933 
irqs/s

> >>
> >>    IMC reads: 4414 MiB/s
> >>   IMC writes:

Re: [PATCH 0/7] Per client engine busyness

2021-05-14 Thread Tvrtko Ursulin



On 14/05/2021 09:04, Christian König wrote:
Well in my opinion exposing it through fdinfo turned out to be a really 
clean approach.


It describes exactly the per file descriptor information we need.


Yeah fdinfo certainly is mostly simple and neat.

I say mostly because main problem I see with it is discoverability. Alex 
commented in another sub-thread - "If you know the process, you can
look it up in procfs." - so that's fine for introspection but a bit 
challenging for a top(1) like tool.


David also said that you considered sysfs but were wary of exposing 
process info in there. To clarify, my patch is not exposing sysfs entry 
per process, but one per open drm fd.


Top level hierarchy is under /sys/class/drm/card0/clients/ and each 
opened drm fd gets a directory in there. Process data I expose there are 
the name and pid, but these are for convenience, not as a primary 
information.


But yes, I agree this part of the approach is definitely questionable. 
(As a side note, I am not sure if I could put a symlink to proc in 
there. I think sysfs and symlinks did not really work.)


Another data point is that this "client root" we think would be useful 
for adding other stuff in the future. For instance per client debug log 
stream is occasionally talked about.



Making that device driver independent is potentially useful as well.


Alternative to my sysfs approach, the idea of exposing this in proc was 
floated by Chris in this series 
https://patchwork.freedesktop.org/series/86692/.


That would be generic enough so any GPU vendor can slot in, and common 
enough that GPU agnostic tools should be able to use it. Modulo some 
discussion around naming the "channels" (GPU engines) or not.


It wouldn't be able to support things like the before mentioned per 
client debug log stream but I guess that's not the most important thing. 
Most important would be allowing GPU usage to be wired to top(1) like 
tools which is probably even overdue given the modern computing landscape.


Would you guys be interested to give a more detailed look over both 
proposals and see if any would interest you?


Regards,

Tvrtko


Regards,
Christian.

Am 14.05.21 um 09:22 schrieb Nieto, David M:


[AMD Official Use Only - Internal Distribution Only]


We had entertained the idea of exposing the processes as sysfs nodes 
as you proposed, but we had concerns about exposing process info in 
there, especially since /proc already exists for that purpose.


I think if you were to follow that approach, we could have tools like 
top that support exposing GPU engine usage.


*From:* Alex Deucher 
*Sent:* Thursday, May 13, 2021 10:58 PM
*To:* Tvrtko Ursulin ; Nieto, David M 
; Koenig, Christian 
*Cc:* Intel Graphics Development ; 
Maling list - DRI developers ; Daniel 
Vetter 

*Subject:* Re: [PATCH 0/7] Per client engine busyness
+ David, Christian

On Thu, May 13, 2021 at 12:41 PM Tvrtko Ursulin
 wrote:
>
>
> Hi,
>
> On 13/05/2021 16:48, Alex Deucher wrote:
> > On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
> >  wrote:
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> Resurrect of the previosuly merged per client engine busyness 
patches. In a
> >> nutshell it enables intel_gpu_top to be more top(1) like useful 
and show not

> >> only physical GPU engine usage but per process view as well.
> >>
> >> Example screen capture:
> >> 


> >> intel-gpu-top -  906/ 955 MHz;    0% RC6; 5.30 Watts;  933 irqs/s
> >>
> >>    IMC reads: 4414 MiB/s
> >>   IMC writes: 3805 MiB/s
> >>
> >>    ENGINE BUSY  
MI_SEMA MI_WAIT
> >>   Render/3D/0   93.46% |▋  
|  0%  0%
> >> Blitter/0    0.00% |   
|  0%  0%
> >>   Video/0    0.00% |   
|  0%  0%
> >>    VideoEnhance/0    0.00% |   
|  0%  0%

> >>
> >>    PID    NAME  Render/3D Blitter    Video  
VideoEnhance
> >>   2733   neverball |██▌ ||    ||
||    |
> >>   2047    Xorg |███▊ ||    ||
||    |
> >>   2737    glxgears |█▍ ||    ||
||    |

> >>   2128   xfwm4 | ||    ||    ||    |
> >>   2047    Xorg | ||    ||    ||    |
> >> 


> >>
> >> I

Re: [PATCH 0/7] Per client engine busyness

2021-05-14 Thread Christian König
Well in my opinion exposing it through fdinfo turned out to be a really 
clean approach.


It describes exactly the per file descriptor information we need.

Making that device driver independent is potentially useful as well.

Regards,
Christian.

Am 14.05.21 um 09:22 schrieb Nieto, David M:


[AMD Official Use Only - Internal Distribution Only]


We had entertained the idea of exposing the processes as sysfs nodes 
as you proposed, but we had concerns about exposing process info in 
there, especially since /proc already exists for that purpose.


I think if you were to follow that approach, we could have tools like 
top that support exposing GPU engine usage.


*From:* Alex Deucher 
*Sent:* Thursday, May 13, 2021 10:58 PM
*To:* Tvrtko Ursulin ; Nieto, David M 
; Koenig, Christian 
*Cc:* Intel Graphics Development ; 
Maling list - DRI developers ; Daniel 
Vetter 

*Subject:* Re: [PATCH 0/7] Per client engine busyness
+ David, Christian

On Thu, May 13, 2021 at 12:41 PM Tvrtko Ursulin
 wrote:
>
>
> Hi,
>
> On 13/05/2021 16:48, Alex Deucher wrote:
> > On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
> >  wrote:
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> Resurrect of the previosuly merged per client engine busyness 
patches. In a
> >> nutshell it enables intel_gpu_top to be more top(1) like useful 
and show not

> >> only physical GPU engine usage but per process view as well.
> >>
> >> Example screen capture:
> >> 


> >> intel-gpu-top -  906/ 955 MHz;    0% RC6; 5.30 Watts;  933 irqs/s
> >>
> >>    IMC reads: 4414 MiB/s
> >>   IMC writes: 3805 MiB/s
> >>
> >>    ENGINE BUSY  
MI_SEMA MI_WAIT
> >>   Render/3D/0   93.46% |▋  
|  0%  0%
> >> Blitter/0    0.00% |   
|  0%  0%
> >>   Video/0    0.00% |   
|  0%  0%
> >>    VideoEnhance/0    0.00% |   
|  0%  0%

> >>
> >>    PID    NAME  Render/3D Blitter    Video  
VideoEnhance
> >>   2733   neverball |██▌ ||    ||    
||    |
> >>   2047    Xorg |███▊ ||    ||    
||    |
> >>   2737    glxgears |█▍ ||    ||    
||    |

> >>   2128   xfwm4 | ||    ||    ||    |
> >>   2047    Xorg | ||    ||    ||    |
> >> 


> >>
> >> Internally we track time spent on engines for each struct 
intel_context, both

> >> for current and past contexts belonging to each open DRM file.
> >>
> >> This can serve as a building block for several features from the 
wanted list:
> >> smarter scheduler decisions, getrusage(2)-like per-GEM-context 
functionality
> >> wanted by some customers, setrlimit(2) like controls, cgroups 
controller,

> >> dynamic SSEU tuning, ...
> >>
> >> To enable userspace access to the tracked data, we expose time 
spent on GPU per

> >> client and per engine class in sysfs with a hierarchy like the below:
> >>
> >>  # cd /sys/class/drm/card0/clients/
> >>  # tree
> >>  .
> >>  ├── 7
> >>  │   ├── busy
> >>  │   │   ├── 0
> >>  │   │   ├── 1
> >>  │   │   ├── 2
> >>  │   │   └── 3
> >>  │   ├── name
> >>  │   └── pid
> >>  ├── 8
> >>  │   ├── busy
> >>  │   │   ├── 0
> >>  │   │   ├── 1
> >>  │   │   ├── 2
> >>  │   │   └── 3
> >>  │   ├── name
> >>  │   └── pid
> >>  └── 9
> >>  ├── busy
> >>  │   ├── 0
> >>  │   ├── 1
> >>  │   ├── 2
> >>  │   └── 3
> >>  ├── name
> >>  └── pid
> >>
> >> Files in 'busy' directories are numbered using the engine class 
ABI values and
> >> they contain accumulated nanoseconds each client spent on engines 
of a

> >> respective class.
> >
> > We did something similar in amd

Re: [PATCH 0/7] Per client engine busyness

2021-05-14 Thread Nieto, David M
[AMD Official Use Only - Internal Distribution Only]

We had entertained the idea of exposing the processes as sysfs nodes as you 
proposed, but we had concerns about exposing process info in there, especially 
since /proc already exists for that purpose.

I think if you were to follow that approach, we could have tools like top that 
support exposing GPU engine usage.

From: Alex Deucher 
Sent: Thursday, May 13, 2021 10:58 PM
To: Tvrtko Ursulin ; Nieto, David M 
; Koenig, Christian 
Cc: Intel Graphics Development ; Maling list - 
DRI developers ; Daniel Vetter 

Subject: Re: [PATCH 0/7] Per client engine busyness

+ David, Christian

On Thu, May 13, 2021 at 12:41 PM Tvrtko Ursulin
 wrote:
>
>
> Hi,
>
> On 13/05/2021 16:48, Alex Deucher wrote:
> > On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
> >  wrote:
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> Resurrect of the previosuly merged per client engine busyness patches. In a
> >> nutshell it enables intel_gpu_top to be more top(1) like useful and show 
> >> not
> >> only physical GPU engine usage but per process view as well.
> >>
> >> Example screen capture:
> >> 
> >> intel-gpu-top -  906/ 955 MHz;0% RC6;  5.30 Watts;  933 irqs/s
> >>
> >>IMC reads: 4414 MiB/s
> >>   IMC writes: 3805 MiB/s
> >>
> >>ENGINE  BUSY  MI_SEMA 
> >> MI_WAIT
> >>   Render/3D/0   93.46% |▋  |  0%   
> >>0%
> >> Blitter/00.00% |   |  0%   
> >>0%
> >>   Video/00.00% |   |  0%   
> >>0%
> >>VideoEnhance/00.00% |   |  0%   
> >>0%
> >>
> >>PIDNAME  Render/3D  BlitterVideo  
> >> VideoEnhance
> >>   2733   neverball |██▌ |||||| 
> >>|
> >>   2047Xorg |███▊|||||| 
> >>|
> >>   2737glxgears |█▍  |||||| 
> >>|
> >>   2128   xfwm4 ||||||| 
> >>|
> >>   2047Xorg ||||||| 
> >>|
> >> 
> >>
> >> Internally we track time spent on engines for each struct intel_context, 
> >> both
> >> for current and past contexts belonging to each open DRM file.
> >>
> >> This can serve as a building block for several features from the wanted 
> >> list:
> >> smarter scheduler decisions, getrusage(2)-like per-GEM-context 
> >> functionality
> >> wanted by some customers, setrlimit(2) like controls, cgroups controller,
> >> dynamic SSEU tuning, ...
> >>
> >> To enable userspace access to the tracked data, we expose time spent on 
> >> GPU per
> >> client and per engine class in sysfs with a hierarchy like the below:
> >>
> >>  # cd /sys/class/drm/card0/clients/
> >>  # tree
> >>  .
> >>  ├── 7
> >>  │   ├── busy
> >>  │   │   ├── 0
> >>  │   │   ├── 1
> >>  │   │   ├── 2
> >>  │   │   └── 3
> >>  │   ├── name
> >>  │   └── pid
> >>  ├── 8
> >>  │   ├── busy
> >>  │   │   ├── 0
> >>  │   │   ├── 1
> >>  │   │   ├── 2
> >>  │   │   └── 3
> >>  │   ├── name
> >>  │   └── pid
> >>  └── 9
> >>  ├── busy
> >>  │   ├── 0
> >>  │   ├── 1
> >>  │   ├── 2
> >>  │   └── 3
> >>  ├── name
> >>  └── pid
> >>
> >> Files in 'busy' directories are numbered using the engine class ABI values 
> >> and
> >> they contain accumulated nanoseconds each client spent on engines of a
> >> respective class.
> >
> > We did something similar in amdgpu using the gpu scheduler.  We the

Re: [PATCH 0/7] Per client engine busyness

2021-05-13 Thread Alex Deucher
+ David, Christian

On Thu, May 13, 2021 at 12:41 PM Tvrtko Ursulin
 wrote:
>
>
> Hi,
>
> On 13/05/2021 16:48, Alex Deucher wrote:
> > On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
> >  wrote:
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> Resurrect of the previosuly merged per client engine busyness patches. In a
> >> nutshell it enables intel_gpu_top to be more top(1) like useful and show 
> >> not
> >> only physical GPU engine usage but per process view as well.
> >>
> >> Example screen capture:
> >> 
> >> intel-gpu-top -  906/ 955 MHz;0% RC6;  5.30 Watts;  933 irqs/s
> >>
> >>IMC reads: 4414 MiB/s
> >>   IMC writes: 3805 MiB/s
> >>
> >>ENGINE  BUSY  MI_SEMA 
> >> MI_WAIT
> >>   Render/3D/0   93.46% |▋  |  0%   
> >>0%
> >> Blitter/00.00% |   |  0%   
> >>0%
> >>   Video/00.00% |   |  0%   
> >>0%
> >>VideoEnhance/00.00% |   |  0%   
> >>0%
> >>
> >>PIDNAME  Render/3D  BlitterVideo  
> >> VideoEnhance
> >>   2733   neverball |██▌ |||||| 
> >>|
> >>   2047Xorg |███▊|||||| 
> >>|
> >>   2737glxgears |█▍  |||||| 
> >>|
> >>   2128   xfwm4 ||||||| 
> >>|
> >>   2047Xorg ||||||| 
> >>|
> >> 
> >>
> >> Internally we track time spent on engines for each struct intel_context, 
> >> both
> >> for current and past contexts belonging to each open DRM file.
> >>
> >> This can serve as a building block for several features from the wanted 
> >> list:
> >> smarter scheduler decisions, getrusage(2)-like per-GEM-context 
> >> functionality
> >> wanted by some customers, setrlimit(2) like controls, cgroups controller,
> >> dynamic SSEU tuning, ...
> >>
> >> To enable userspace access to the tracked data, we expose time spent on 
> >> GPU per
> >> client and per engine class in sysfs with a hierarchy like the below:
> >>
> >>  # cd /sys/class/drm/card0/clients/
> >>  # tree
> >>  .
> >>  ├── 7
> >>  │   ├── busy
> >>  │   │   ├── 0
> >>  │   │   ├── 1
> >>  │   │   ├── 2
> >>  │   │   └── 3
> >>  │   ├── name
> >>  │   └── pid
> >>  ├── 8
> >>  │   ├── busy
> >>  │   │   ├── 0
> >>  │   │   ├── 1
> >>  │   │   ├── 2
> >>  │   │   └── 3
> >>  │   ├── name
> >>  │   └── pid
> >>  └── 9
> >>  ├── busy
> >>  │   ├── 0
> >>  │   ├── 1
> >>  │   ├── 2
> >>  │   └── 3
> >>  ├── name
> >>  └── pid
> >>
> >> Files in 'busy' directories are numbered using the engine class ABI values 
> >> and
> >> they contain accumulated nanoseconds each client spent on engines of a
> >> respective class.
> >
> > We did something similar in amdgpu using the gpu scheduler.  We then
> > expose the data via fdinfo.  See
> > https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
> > https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704
>
> Interesting!
>
> Is yours wall time or actual GPU time taking preemption and such into
> account? Do you have some userspace tools parsing this data and how to
> do you client discovery? Presumably there has to be a better way that
> going through all open file descriptors?

Wall time.  It uses the fences in the scheduler to calculate engine
time.  We have some python scripts to make it look pretty, but mainly
just reading the files directly.  If you know the process, you can
look it up in procfs.

>
> Our implementation was merged in January but Daniel took it out recently
> because he wanted to have discussion about a common vendor framework for
> this whole story on dri-devel. I think. +Daniel to comment.
>
> I couldn't find the patch you pasted on the mailing list to see if there
> was any such discussion around your version.

It was on the amd-gfx mailing list.

Alex

>
> Regards,
>
> Tvrtko
>
> >
> > Alex
> >
> >
> >>
> >> Tvrtko Ursulin (7):
> >>drm/i915: Expose list of clients in sysfs
> >>drm/i915: Update client name on context create
> >>drm/i915: Make GEM contexts track DRM clients
> >>drm/i915: Track runtime spent in closed and unreachable GEM contexts
> >>drm/i915: Track all user contexts per client
> >>drm/i915: Track context current 

Re: [PATCH 0/7] Per client engine busyness

2021-05-13 Thread Tvrtko Ursulin



Hi,

On 13/05/2021 16:48, Alex Deucher wrote:

On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
 wrote:


From: Tvrtko Ursulin 

Resurrect of the previosuly merged per client engine busyness patches. In a
nutshell it enables intel_gpu_top to be more top(1) like useful and show not
only physical GPU engine usage but per process view as well.

Example screen capture:

intel-gpu-top -  906/ 955 MHz;0% RC6;  5.30 Watts;  933 irqs/s

   IMC reads: 4414 MiB/s
  IMC writes: 3805 MiB/s

   ENGINE  BUSY  MI_SEMA MI_WAIT
  Render/3D/0   93.46% |▋  |  0%  0%
Blitter/00.00% |   |  0%  0%
  Video/00.00% |   |  0%  0%
   VideoEnhance/00.00% |   |  0%  0%

   PIDNAME  Render/3D  BlitterVideo  VideoEnhance
  2733   neverball |██▌ |||||||
  2047Xorg |███▊|||||||
  2737glxgears |█▍  |||||||
  2128   xfwm4 ||||||||
  2047Xorg ||||||||


Internally we track time spent on engines for each struct intel_context, both
for current and past contexts belonging to each open DRM file.

This can serve as a building block for several features from the wanted list:
smarter scheduler decisions, getrusage(2)-like per-GEM-context functionality
wanted by some customers, setrlimit(2) like controls, cgroups controller,
dynamic SSEU tuning, ...

To enable userspace access to the tracked data, we expose time spent on GPU per
client and per engine class in sysfs with a hierarchy like the below:

 # cd /sys/class/drm/card0/clients/
 # tree
 .
 ├── 7
 │   ├── busy
 │   │   ├── 0
 │   │   ├── 1
 │   │   ├── 2
 │   │   └── 3
 │   ├── name
 │   └── pid
 ├── 8
 │   ├── busy
 │   │   ├── 0
 │   │   ├── 1
 │   │   ├── 2
 │   │   └── 3
 │   ├── name
 │   └── pid
 └── 9
 ├── busy
 │   ├── 0
 │   ├── 1
 │   ├── 2
 │   └── 3
 ├── name
 └── pid

Files in 'busy' directories are numbered using the engine class ABI values and
they contain accumulated nanoseconds each client spent on engines of a
respective class.


We did something similar in amdgpu using the gpu scheduler.  We then
expose the data via fdinfo.  See
https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704


Interesting!

Is yours wall time or actual GPU time taking preemption and such into 
account? Do you have some userspace tools parsing this data and how to 
do you client discovery? Presumably there has to be a better way that 
going through all open file descriptors?


Our implementation was merged in January but Daniel took it out recently 
because he wanted to have discussion about a common vendor framework for 
this whole story on dri-devel. I think. +Daniel to comment.


I couldn't find the patch you pasted on the mailing list to see if there 
was any such discussion around your version.


Regards,

Tvrtko



Alex




Tvrtko Ursulin (7):
   drm/i915: Expose list of clients in sysfs
   drm/i915: Update client name on context create
   drm/i915: Make GEM contexts track DRM clients
   drm/i915: Track runtime spent in closed and unreachable GEM contexts
   drm/i915: Track all user contexts per client
   drm/i915: Track context current active time
   drm/i915: Expose per-engine client busyness

  drivers/gpu/drm/i915/Makefile |   5 +-
  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
  .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
  drivers/gpu/drm/i915/gt/intel_context.c   |  27 +-
  drivers/gpu/drm/i915/gt/intel_context.h   |  15 +-
  drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
  .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
  .../gpu/drm/i915/gt/intel_gt_clock_utils.c|   4 +
  drivers/gpu/drm/i915/gt/intel_lrc.c   |  27 +-
  drivers/gpu/drm/i915/gt/intel_lrc.h   |  24 ++
  drivers/gpu/drm/i915/gt/selftest_lrc.c|  10 +-
  drivers/gpu/drm/i915/i915_drm_client.c| 365 ++
  drivers/gpu/drm/i915/i915_drm_client.h| 123 ++
  drivers/gpu/drm/i915/i915_drv.c   |   6 +
  

Re: [PATCH 0/7] Per client engine busyness

2021-05-13 Thread Alex Deucher
On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
 wrote:
>
> From: Tvrtko Ursulin 
>
> Resurrect of the previosuly merged per client engine busyness patches. In a
> nutshell it enables intel_gpu_top to be more top(1) like useful and show not
> only physical GPU engine usage but per process view as well.
>
> Example screen capture:
> 
> intel-gpu-top -  906/ 955 MHz;0% RC6;  5.30 Watts;  933 irqs/s
>
>   IMC reads: 4414 MiB/s
>  IMC writes: 3805 MiB/s
>
>   ENGINE  BUSY  MI_SEMA 
> MI_WAIT
>  Render/3D/0   93.46% |▋  |  0%  
> 0%
>Blitter/00.00% |   |  0%  
> 0%
>  Video/00.00% |   |  0%  
> 0%
>   VideoEnhance/00.00% |   |  0%  
> 0%
>
>   PIDNAME  Render/3D  BlitterVideo  VideoEnhance
>  2733   neverball |██▌ |||||||
>  2047Xorg |███▊|||||||
>  2737glxgears |█▍  |||||||
>  2128   xfwm4 ||||||||
>  2047Xorg ||||||||
> 
>
> Internally we track time spent on engines for each struct intel_context, both
> for current and past contexts belonging to each open DRM file.
>
> This can serve as a building block for several features from the wanted list:
> smarter scheduler decisions, getrusage(2)-like per-GEM-context functionality
> wanted by some customers, setrlimit(2) like controls, cgroups controller,
> dynamic SSEU tuning, ...
>
> To enable userspace access to the tracked data, we expose time spent on GPU 
> per
> client and per engine class in sysfs with a hierarchy like the below:
>
> # cd /sys/class/drm/card0/clients/
> # tree
> .
> ├── 7
> │   ├── busy
> │   │   ├── 0
> │   │   ├── 1
> │   │   ├── 2
> │   │   └── 3
> │   ├── name
> │   └── pid
> ├── 8
> │   ├── busy
> │   │   ├── 0
> │   │   ├── 1
> │   │   ├── 2
> │   │   └── 3
> │   ├── name
> │   └── pid
> └── 9
> ├── busy
> │   ├── 0
> │   ├── 1
> │   ├── 2
> │   └── 3
> ├── name
> └── pid
>
> Files in 'busy' directories are numbered using the engine class ABI values and
> they contain accumulated nanoseconds each client spent on engines of a
> respective class.

We did something similar in amdgpu using the gpu scheduler.  We then
expose the data via fdinfo.  See
https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704

Alex


>
> Tvrtko Ursulin (7):
>   drm/i915: Expose list of clients in sysfs
>   drm/i915: Update client name on context create
>   drm/i915: Make GEM contexts track DRM clients
>   drm/i915: Track runtime spent in closed and unreachable GEM contexts
>   drm/i915: Track all user contexts per client
>   drm/i915: Track context current active time
>   drm/i915: Expose per-engine client busyness
>
>  drivers/gpu/drm/i915/Makefile |   5 +-
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
>  .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
>  drivers/gpu/drm/i915/gt/intel_context.c   |  27 +-
>  drivers/gpu/drm/i915/gt/intel_context.h   |  15 +-
>  drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
>  .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
>  .../gpu/drm/i915/gt/intel_gt_clock_utils.c|   4 +
>  drivers/gpu/drm/i915/gt/intel_lrc.c   |  27 +-
>  drivers/gpu/drm/i915/gt/intel_lrc.h   |  24 ++
>  drivers/gpu/drm/i915/gt/selftest_lrc.c|  10 +-
>  drivers/gpu/drm/i915/i915_drm_client.c| 365 ++
>  drivers/gpu/drm/i915/i915_drm_client.h| 123 ++
>  drivers/gpu/drm/i915/i915_drv.c   |   6 +
>  drivers/gpu/drm/i915/i915_drv.h   |   5 +
>  drivers/gpu/drm/i915/i915_gem.c   |  21 +-
>  drivers/gpu/drm/i915/i915_gpu_error.c |  31 +-
>  drivers/gpu/drm/i915/i915_gpu_error.h |   2 +-
>  drivers/gpu/drm/i915/i915_sysfs.c |   8 +
>  19 files changed, 716 insertions(+), 81 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
>  create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h
>
> --
> 2.30.2
>


[PATCH 0/7] Per client engine busyness

2021-05-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Resurrect of the previosuly merged per client engine busyness patches. In a
nutshell it enables intel_gpu_top to be more top(1) like useful and show not
only physical GPU engine usage but per process view as well.

Example screen capture:

intel-gpu-top -  906/ 955 MHz;0% RC6;  5.30 Watts;  933 irqs/s

  IMC reads: 4414 MiB/s
 IMC writes: 3805 MiB/s

  ENGINE  BUSY  MI_SEMA MI_WAIT
 Render/3D/0   93.46% |▋  |  0%  0%
   Blitter/00.00% |   |  0%  0%
 Video/00.00% |   |  0%  0%
  VideoEnhance/00.00% |   |  0%  0%

  PIDNAME  Render/3D  BlitterVideo  VideoEnhance
 2733   neverball |██▌ |||||||
 2047Xorg |███▊|||||||
 2737glxgears |█▍  |||||||
 2128   xfwm4 ||||||||
 2047Xorg ||||||||


Internally we track time spent on engines for each struct intel_context, both
for current and past contexts belonging to each open DRM file.

This can serve as a building block for several features from the wanted list:
smarter scheduler decisions, getrusage(2)-like per-GEM-context functionality
wanted by some customers, setrlimit(2) like controls, cgroups controller,
dynamic SSEU tuning, ...

To enable userspace access to the tracked data, we expose time spent on GPU per
client and per engine class in sysfs with a hierarchy like the below:

# cd /sys/class/drm/card0/clients/
# tree
.
├── 7
│   ├── busy
│   │   ├── 0
│   │   ├── 1
│   │   ├── 2
│   │   └── 3
│   ├── name
│   └── pid
├── 8
│   ├── busy
│   │   ├── 0
│   │   ├── 1
│   │   ├── 2
│   │   └── 3
│   ├── name
│   └── pid
└── 9
├── busy
│   ├── 0
│   ├── 1
│   ├── 2
│   └── 3
├── name
└── pid

Files in 'busy' directories are numbered using the engine class ABI values and
they contain accumulated nanoseconds each client spent on engines of a
respective class.

Tvrtko Ursulin (7):
  drm/i915: Expose list of clients in sysfs
  drm/i915: Update client name on context create
  drm/i915: Make GEM contexts track DRM clients
  drm/i915: Track runtime spent in closed and unreachable GEM contexts
  drm/i915: Track all user contexts per client
  drm/i915: Track context current active time
  drm/i915: Expose per-engine client busyness

 drivers/gpu/drm/i915/Makefile |   5 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
 .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
 drivers/gpu/drm/i915/gt/intel_context.c   |  27 +-
 drivers/gpu/drm/i915/gt/intel_context.h   |  15 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
 .../gpu/drm/i915/gt/intel_gt_clock_utils.c|   4 +
 drivers/gpu/drm/i915/gt/intel_lrc.c   |  27 +-
 drivers/gpu/drm/i915/gt/intel_lrc.h   |  24 ++
 drivers/gpu/drm/i915/gt/selftest_lrc.c|  10 +-
 drivers/gpu/drm/i915/i915_drm_client.c| 365 ++
 drivers/gpu/drm/i915/i915_drm_client.h| 123 ++
 drivers/gpu/drm/i915/i915_drv.c   |   6 +
 drivers/gpu/drm/i915/i915_drv.h   |   5 +
 drivers/gpu/drm/i915/i915_gem.c   |  21 +-
 drivers/gpu/drm/i915/i915_gpu_error.c |  31 +-
 drivers/gpu/drm/i915/i915_gpu_error.h |   2 +-
 drivers/gpu/drm/i915/i915_sysfs.c |   8 +
 19 files changed, 716 insertions(+), 81 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h

-- 
2.30.2